# Implementing cat in pure bash
Implementing a function in bash that simply echoes its standard input to standard output is surprisingly difficult. Here’s one possible golfed implementation:
```bash
bashcat(){ local z=\\0;while [ $z ];do LC_ALL=C IFS= read -rd '' l||z=;printf "%s$z" "$l";done;}
```
Cleaned up, it could look like this:
```bash
bashcat () {
    local trailing_nul='\0'
    while [ $trailing_nul ]; do
        LC_ALL=C IFS= read -rd '' chunk || trailing_nul=
        printf "%s$trailing_nul" "$chunk"
    done
}
```
There’s a lot going on.
- `LC_ALL=C` is required for `read` to parse the input byte by byte, instead of doing locale-specific things (i.e. parsing UTF-8). `LC_ALL` specifically is used because it overrides all other locale settings like `LANG`, `LC_CTYPE`, and so on.
- `IFS=` is required, otherwise spaces, tabs, and newlines get stripped in certain cases.
- `-r` is there so that backslashes are not treated specially.
- `-d ''` is required so that `read` consumes input until a NUL byte, consuming it but not storing it in `chunk` (bash variables are NUL-terminated, so that wouldn’t be possible anyway). The space is required if the delimiter is just `''`.
- `printf` is required because `echo` cannot reliably print arbitrary data.
- `$trailing_nul` is required because `read` exiting successfully (status code `0`) means that the delimiter (in this case the NUL byte) was consumed. If `read` exits with status code `1`, it means the delimiter wasn’t consumed because EOF was hit first. In that case (`||`), we reset `trailing_nul` so it doesn’t get printed, and the next iteration of the `while` loop will not happen. Notice how a failing `read` doesn’t mean we’re done: it could still have read a bunch of bytes into `chunk`, which must be flushed, hence the weird logic. The sketch below demonstrates these behaviors.
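To make these behaviors concrete, here is a small demo (a sketch assuming a reasonably recent bash; the variable names are made up for illustration):

```bash
# Default IFS strips surrounding whitespace; IFS= preserves it:
printf ' a\\b \n' | { read -r l; printf '[%s]\n' "$l"; }        # [a\b]
printf ' a\\b \n' | { IFS= read -r l; printf '[%s]\n' "$l"; }   # [ a\b ]

# Without -r, read eats the backslash:
printf 'a\\b\n' | { IFS= read l; printf '[%s]\n' "$l"; }        # [ab]

# read -d '' succeeds (status 0) when it consumes a NUL delimiter, but
# fails (status 1) at EOF -- while possibly still having filled the variable:
printf 'chunk\0rest' | { LC_ALL=C IFS= read -rd '' c; printf '%s [%s]\n' "$?" "$c"; }  # 0 [chunk]
printf 'no NUL here' | { LC_ALL=C IFS= read -rd '' c; printf '%s [%s]\n' "$?" "$c"; }  # 1 [no NUL here]
```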
Regarding speed, on my system I see the following numbers:
```
~$ # Throughput with random data
~$ </dev/urandom pv >/dev/null
[ 545MiB/s ]
~$ </dev/urandom cat | pv >/dev/null
[ 440MiB/s ]
~$ </dev/urandom bashcat | pv >/dev/null
[ 1.45MiB/s ]

~$ # Throughput with zero data
~$ </dev/zero pv >/dev/null
[ 31.4GiB/s ]
~$ </dev/zero cat | pv >/dev/null
[ 1.63GiB/s ]
~$ </dev/zero bashcat | pv >/dev/null
[ 26.6KiB/s ]
```
Therefore, our `bashcat` is roughly 300x slower for random data and more than 60000x slower for just NULs.
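Slow as it is, it is at least byte-exact. A quick way to convince yourself (a sketch; the file name is made up, and `head -c 1M` assumes GNU coreutils):

```bash
head -c 1M /dev/urandom > /tmp/rand.bin           # hypothetical test file
bashcat < /tmp/rand.bin | cmp - /tmp/rand.bin && echo identical
```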
If binary data is not a problem, of course, the following works as well, although it cannot be piped:
```bash
printf %s "$(</dev/stdin)"
```
This will skip over any NUL-bytes and strip all trailing newlines.
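Both caveats are easy to see in a quick test (a sketch; the brace group ensures the redirection is in place before the command substitution reads `/dev/stdin`, and `xxd` is only used to show the raw bytes):

```bash
printf 'a\0b\n\n' > /tmp/sample     # hypothetical input: embedded NUL, trailing newlines
{ printf %s "$(</dev/stdin)"; } < /tmp/sample | xxd
# 00000000: 6162    ab    -- the NUL was skipped and the trailing newlines stripped
# (recent bash versions also print a warning about the ignored NUL byte)
```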