oss-sec mailing list archives

Re: shell wildcard expansion (un)safety


From: Fay Stegerman <flx () obfusk net>
Date: Sun, 10 Nov 2024 19:42:13 +0100

* Eli Schwartz <eschwartz () gentoo org> [2024-11-10 00:59]:
[...]
> Overall, wildcards are just a classic "here is a programming language
> footgun, we cannot fix it because the language is backwards compatible
> to the 90s and earlier" which amounts to:
>
> people love bash because it's "simple" and "easy" and "anyone can write
> a bash script without knowing what they are doing".
[...]

Obviously, shell scripts and wildcards are one of the easiest ways to trip up
here.  But the underlying issue is that CLIs mix options and arguments: there
is no clean separation between data and code/commands.  (Another example is
printing unescaped control characters to stdout, something discussed on this
list before, and far too common IME, as I recently found out playing with
control characters in X.509 certificate DNs.)
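
To illustrate the control character case, here is a minimal Python sketch that
escapes such characters before printing.  The helper name escape_control is
hypothetical, and treating every Unicode "C" (control/format/unassigned)
category character as unsafe is an assumption; note it also escapes newlines
and tabs.

```python
import unicodedata


def escape_control(text):
    """Replace control-like characters with visible backslash escapes."""
    out = []
    for ch in text:
        # Categories starting with "C" cover Cc (control), Cf (format),
        # Cs (surrogate), Co (private use) and Cn (unassigned).
        if unicodedata.category(ch).startswith("C"):
            out.append(ch.encode("unicode_escape").decode("ascii"))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # A made-up DN containing an escape sequence that would clear a terminal.
    print(escape_control("CN=evil\x1b[2Jcorp"))
```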

This isn't just a shell problem; shell scripts simply call external programs
much more frequently.  I certainly don't call external tools like grep or find
from Python the way I do in shell scripts, but it's not exactly uncommon to
have to call some external program (e.g. git) to do something.

And when I do, I always make sure to use "--" before any arguments that come from
external sources (user, filesystem) to ensure they're not interpreted as
options, because that problem isn't limited to shell scripts (and for shell
scripts shellcheck can at least provide warnings in common cases).
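
A minimal Python sketch of that habit, assuming the external program is git
and the file names come from an untrusted source.  The helper names
git_add_argv and git_add are hypothetical.

```python
import subprocess


def git_add_argv(paths):
    """Build an argv for `git add` with every path placed after "--"."""
    # git treats everything after "--" as a pathspec, never as an option,
    # even if a name starts with a dash.
    return ["git", "add", "--", *paths]


def git_add(paths):
    # Passing a list (and not using shell=True) also avoids a second round
    # of shell parsing on the untrusted names.
    return subprocess.run(git_add_argv(paths), check=True)
```

With this, a file literally named "-A" is staged as a file instead of being
read as the -A option.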

Nor is it limited to wildcards: e.g. you would have the exact same problem if
you're reading the arguments as lines from a file instead, or getting them from
an HTTP request.  The real problem isn't that a wildcard can expand to things
that start with dashes, the problem is that it matters because the program
receiving the arguments will interpret those as options.  That's the footgun.
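
The receiving side can be sketched with Python's argparse, with no shell or
wildcard involved.  The tool name demo-rm and its -f option are made up for
illustration; the "untrusted" list stands in for lines read from a file or
taken from an HTTP request.

```python
import argparse


def build_parser():
    # Hypothetical tool: one boolean option plus any number of file names.
    parser = argparse.ArgumentParser(prog="demo-rm")
    parser.add_argument("-f", "--force", action="store_true")
    parser.add_argument("files", nargs="*")
    return parser


# Pretend these came from an external source, not from a wildcard.
untrusted = ["-f", "notes.txt"]

# Without a separator the parser cannot tell data from options.
mixed = build_parser().parse_args(untrusted)

# With "--" everything that follows is treated as a positional argument.
separated = build_parser().parse_args(["--", *untrusted])

if __name__ == "__main__":
    print(mixed.force, mixed.files)
    print(separated.force, separated.files)
```

In the first call "-f" switches on the option and only notes.txt is left as a
file name; in the second call both strings arrive as file names and the option
stays off.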

Everything that deals with data from external sources and passes it to something
that may interpret some of that data as code/commands has to
validate/sanitise/escape that data.  Ideally one would use an interface that
doesn't mix data and code/commands, which "--" more or less provides (but it's
easy to forget, and of course not all programs support it).
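
For programs that don't support "--", one common workaround (not something
from the discussion above, and only valid when the argument is a file path) is
to prefix names that start with a dash with "./".  A minimal Python sketch,
with the helper name as_path_operand being hypothetical:

```python
def as_path_operand(name):
    """Return a file name in a form that cannot be parsed as an option."""
    # "./-rf" names the same file as "-rf" relative to the current
    # directory, but it no longer starts with a dash.
    if name.startswith("-"):
        return "./" + name
    return name


if __name__ == "__main__":
    # Made-up names, e.g. from a directory listing.
    for name in ["-rf", "notes.txt", "/tmp/-x"]:
        print(as_path_operand(name))
```

This only helps for path arguments; other kinds of data still need validation
or escaping appropriate to the program receiving them.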

- Fay

