Implementing cat in pure bash
Implementing a function in bash that simply echoes its standard input to standard output is surprisingly difficult. Here's one possible golfed implementation:
bashcat(){ local z=\\0;while [ $z ];do LC_ALL=C IFS= read -rd '' l||z=;printf "%s$z" "$l";done;}
Cleaned up, it could look like this:
bashcat () {
    local trailing_nul='\0'
    while [ $trailing_nul ]; do
        LC_ALL=C IFS= read -rd '' chunk || trailing_nul=
        printf "%s$trailing_nul" "$chunk"
    done
}
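As a quick sanity check (a sketch, assuming POSIX `cmp` is available), we can round-trip some awkward bytes through the function and compare byte-for-byte with the original:

```shell
#!/usr/bin/env bash
bashcat () {
    local trailing_nul='\0'
    while [ $trailing_nul ]; do
        LC_ALL=C IFS= read -rd '' chunk || trailing_nul=
        printf "%s$trailing_nul" "$chunk"
    done
}

# Input containing a NUL byte, a backslash, and trailing newlines --
# exactly the things naive approaches mangle.
printf 'a\0b\\c\n\n' > original
bashcat < original > copy
cmp original copy && echo "identical"   # prints "identical"
```

The file names `original` and `copy` are just placeholders for the test.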
There’s a lot going on.
- `LC_ALL=C` is required for `read` to parse the input byte-by-byte, instead of doing locale-specific things (i.e. parsing UTF-8). `LC_ALL` specifically is used because it overrides all other locale settings like `LANG`, `LC_CTYPE`, and so on.
- `IFS=` is required, otherwise spaces, tabs, and newlines will get stripped in certain cases.
- `-r` is so that backslash is not treated specially.
- `-d ''` is required so `read` consumes input until a NUL-byte, consuming it but not storing it in `chunk` (bash variables are NUL-terminated so it wouldn't be possible anyway). The space is required if the delimiter is just `''`.
- `printf` is required because `echo` cannot reliably print data.
- `$trailing_nul` is required because `read` exiting successfully (status code `0`) means the delimiter was consumed (in this case the NUL-byte). If `read` exits with status code `1`, the delimiter wasn't consumed, i.e. we hit EOF. In that case (`||`), we reset `trailing_nul` so it doesn't get printed, and the next iteration of `while` will not happen. Notice how a failing `read` doesn't mean we're done: it could still have read a bunch of bytes into `chunk`, which must be flushed, hence the weird logic.
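The exit-status behavior of `read` can be seen in isolation with a small sketch (nothing here beyond plain bash builtins):

```shell
# read with -d '' returns 0 when it consumes a NUL delimiter, and nonzero
# at EOF -- even though it may still have stored bytes in the variable.
printf 'foo\0bar' | {
    LC_ALL=C IFS= read -rd '' part
    echo "first:  status=$? part=$part"    # status=0 part=foo
    LC_ALL=C IFS= read -rd '' part
    echo "second: status=$? part=$part"    # status=1 part=bar
}
```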
Regarding speed, on my system I measure the following:
~$ # Throughput with random data
~$ </dev/urandom pv >/dev/null
[ 545MiB/s ]
~$ </dev/urandom cat | pv >/dev/null
[ 440MiB/s ]
~$ </dev/urandom bashcat | pv >/dev/null
[ 1.45MiB/s ]

~$ # Throughput with zero data
~$ </dev/zero pv >/dev/null
[ 31.4GiB/s ]
~$ </dev/zero cat | pv >/dev/null
[ 1.63GiB/s ]
~$ </dev/zero bashcat | pv >/dev/null
[ 26.6KiB/s ]
Therefore, our bashcat is roughly 300x slower for random data and more than 60000x slower for just NULs.
If binary data is not a problem, of course, the following works as well, although it cannot be piped:
printf %s "$(</dev/stdin)"
This will skip over any NUL-bytes and strip all trailing newlines.
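Both effects are easy to demonstrate (a sketch; the `input` file name is a placeholder, and note that newer bash versions also print a "ignored null byte in input" warning to stderr when command substitution drops a NUL):

```shell
# Command substitution eats NUL bytes and strips all trailing newlines:
printf 'a\0b\n\n' > input
printf %s "$(< input)" | od -An -c
# od shows only the two bytes:  a   b
```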