Implementing cat in pure bash
Implementing a function in bash that simply echoes its standard input to standard output is surprisingly difficult. Here's one possible golfed implementation:
bashcat(){ local z=\\0;while [ $z ];do LC_ALL=C IFS= read -rd '' l||z=;printf "%s$z" "$l";done;}
Cleaned up, it could look like this:
bashcat () {
    local trailing_nul='\0'
    while [ $trailing_nul ]; do
        LC_ALL=C IFS= read -rd '' chunk || trailing_nul=
        printf "%s$trailing_nul" "$chunk"
    done
}
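As a quick sanity check (a sketch, assuming POSIX `cmp` is available), we can round-trip some awkward bytes through the function and compare byte-for-byte with the original:

```shell
#!/usr/bin/env bash
bashcat () {
    local trailing_nul='\0'
    while [ $trailing_nul ]; do
        LC_ALL=C IFS= read -rd '' chunk || trailing_nul=
        printf "%s$trailing_nul" "$chunk"
    done
}

# Input containing a NUL byte, a backslash, and trailing newlines --
# exactly the things naive approaches mangle.
printf 'a\0b\\c\n\n' > original
bashcat < original > copy
cmp original copy && echo "identical"   # prints "identical"
```

The file names `original` and `copy` are just placeholders for the test.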
There’s a lot going on.
- `LC_ALL=C` is required for `read` to parse the input byte-by-byte, instead of doing locale-specific things (i.e. parsing UTF-8). `LC_ALL` specifically is used because it overrides all other locale settings like `LANG`, `LC_CTYPE`, and so on.
- `IFS=` is required, otherwise spaces, tabs, and newlines will get stripped in certain cases.
- `-r` is so that backslash is not treated specially.
- `-d ''` is required so `read` consumes input until a NUL-byte, consuming it but not storing it in `chunk` (bash variables are NUL-terminated so it wouldn't be possible anyway). The space is required if the delimiter is just `''`.
- `printf` is required because `echo` cannot reliably print data.
- `$trailing_nul` is required because `read` exiting successfully (status code `0`) means the delimiter was consumed (in this case the NUL-byte). If `read` exits with status code `1`, the delimiter wasn't consumed, i.e. we hit EOF. In that case (`||`), we reset `trailing_nul` so it doesn't get printed, and the next iteration of `while` will not happen. Notice how a failing `read` doesn't mean we're done: it could still have read a bunch of bytes into `chunk`, which must be flushed, hence the weird logic.
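The exit-status behavior of `read` can be seen in isolation with a small sketch (nothing here beyond plain bash builtins):

```shell
# read with -d '' returns 0 when it consumes a NUL delimiter, and nonzero
# at EOF -- even though it may still have stored bytes in the variable.
printf 'foo\0bar' | {
    LC_ALL=C IFS= read -rd '' part
    echo "first:  status=$? part=$part"    # status=0 part=foo
    LC_ALL=C IFS= read -rd '' part
    echo "second: status=$? part=$part"    # status=1 part=bar
}
```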
Regarding speed, on my system I measure the following:
~$ # Throughput with random data
~$ </dev/urandom pv >/dev/null
[ 545MiB/s ]
~$ </dev/urandom cat | pv >/dev/null
[ 440MiB/s ]
~$ </dev/urandom bashcat | pv >/dev/null
[ 1.45MiB/s ]

~$ # Throughput with zero data
~$ </dev/zero pv >/dev/null
[ 31.4GiB/s ]
~$ </dev/zero cat | pv >/dev/null
[ 1.63GiB/s ]
~$ </dev/zero bashcat | pv >/dev/null
[ 26.6KiB/s ]
Therefore, our bashcat is roughly 300x slower for random data and more than 60000x slower for just NULs.
If binary data is not a problem, of course, the following works as well, although it cannot be piped:
printf %s "$(</dev/stdin)"
This will skip over any NUL-bytes and strip all trailing newlines.
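Both effects are easy to demonstrate (a sketch; the `input` file name is a placeholder, and note that newer bash versions also print a "ignored null byte in input" warning to stderr when command substitution drops a NUL):

```shell
# Command substitution eats NUL bytes and strips all trailing newlines:
printf 'a\0b\n\n' > input
printf %s "$(< input)" | od -An -c
# od shows only the two bytes:  a   b
```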