Some folders on macOS have custom icons that are stored in a file named Icon?, where the ? is actually a CR character, and only prints as "?" in most cases (in Terminal and Finder).

But when printing such a file name in hex in Terminal, you'll get:

$ ls -l1 Icon* | xxd
00000000: 4963 6f6e 0d0a  Icon..

The 0d is the CR at the end of the name, and the 0a is the LF that's printed by ls at the end of each line.
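To reproduce this without touching a real Finder icon file, here is a minimal sketch. The scratch directory and the variable names are made up for illustration, and od stands in for xxd in case the latter is not installed:

```shell
# Hypothetical scratch directory; nothing outside it is touched.
tmp=$(mktemp -d)

# Build the name "Icon" + CR. Command substitution strips only trailing
# newlines, so the CR byte survives.
name=$(printf 'Icon\r')
touch "$tmp/$name"

# Dump the bytes that ls writes to a pipe: the name, then the LF ls adds.
hex=$(cd "$tmp" && ls | od -An -tx1 | tr -d ' \n')
echo "$hex"

rm -rf "$tmp"
```

The dump should show the same six bytes as above: the four letters, the 0d and the trailing 0a.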

Now, I'd like to find such files using find.

I'd think that this would be the way:

find -E . -iregex '.*/Icon\x0d' 

However, this won't find it. Nor does:

find -E . -iregex '.*/Icon\r' 

But this finds it (using . as a wildcard char):

find -E . -iregex '.*/Icon.' 

But something is wrong with looking for hex chars in general, because this doesn't work either:

find -E . -iregex ".*/\x49con." 

\x49 is the code for I, so this should work.

So, if you want to try this yourself, take any file and try to find it using the find command with the -regex option and specifying at least one character in hex, e.g. looking for the file named 'a' with the regex \x61 or whatever is correct. Can you accomplish it?

Note, on SO, where I had asked first, someone suggested using this form:

find -E . -iregex $'.*/Icon\r' 

That does indeed work, but it's not really what I want, because I construct this command in a program where I let the user enter a regex pattern and then apply a regex function to check whether a file name matches the pattern. I'd prefer to use the very same pattern when invoking the find command.

Using the $'…' wrapping might change the interpretation of regex patterns (not sure if that'll actually cause problems), so I'd rather figure out why I cannot use the \x.. notation in the command, because according to man find and man re_format this should work.

    1 Answer

    In a POSIX-compliant find implementation (doesn't include FreeBSD's, but should include macos which has been certified as being compliant even if its POSIX utilities are based on FreeBSD's):

    find . -name '*[![:print:]]*' 

    Would report files whose base name contains at least one non-printable character.

    Beware that how file names are decoded into characters and what's classified as [:print:] depends on the current locale (LC_CTYPE category).

    In locales with multi-byte characters (such as UTF-8 encoded ones), it's possible for sequences of bytes not to form valid characters, and with some find (fnmatch()) implementations that causes * (which matches 0 or more characters) not to match across them.

    If you're only concerned about ASCII control characters such as CR, you can fix the locale to C.
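As a rough illustration of the [:print:] approach with the locale fixed to C, here is a sketch that runs in a throwaway directory. The directory, the file names and the variable names are assumptions made up for the example:

```shell
# Hypothetical scratch directory with one CR-terminated name and one plain name.
tmp=$(mktemp -d)
touch "$tmp/$(printf 'Icon\r')" "$tmp/plain.txt"

# In the C locale every byte is one character and CR is not printable,
# so only the first file should be reported.
found=$(cd "$tmp" && LC_ALL=C find . -name '*[![:print:]]*')

# Make the control character visible when showing the result.
printf '%s\n' "$found" | od -An -c

rm -rf "$tmp"
```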

    To match on CR specifically, use the ksh93-style $'...' form of quotes (also available in zsh and bash):

    LC_ALL=C find . -name $'*\r*' 

    Then that's a literal CR character that will be passed in the argument to -name.
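The difference between the two quoting styles can be sketched as follows, assuming bash (or zsh) and a scratch directory whose file names are invented for the example. With plain single quotes, find receives a backslash followed by r, and in a -name pattern a backslash just quotes the next character, so that pattern matches names containing the letter r:

```shell
# Hypothetical scratch directory: one name ending in CR, one containing "r".
tmp=$(mktemp -d)
touch "$tmp/Icon"$'\r' "$tmp/bar"

# Plain quotes: the pattern is * \ r *, i.e. "contains the letter r".
plain=$(cd "$tmp" && LC_ALL=C find . -name '*\r*')

# $'...' quotes: the shell turns \r into a CR byte before find sees it.
literal=$(cd "$tmp" && LC_ALL=C find . -name $'*\r*')

# %q is a bash/zsh printf extension that prints control characters visibly.
printf 'plain:   %q\n' "$plain"
printf 'literal: %q\n' "$literal"

rm -rf "$tmp"
```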

    If you want to allow the user to enter things like \r, \xff, \u200b in the pattern and have it expanded in the argument to -name, you can do (in both zsh and bash¹):

    printf -v expanded_user_input %b "$user_input"
    find . -name "$expanded_user_input"

    Beware that this expands \\ to \, and \ is also special to -name (and -regex), so for instance if the user wanted to find files with backslashes in them, they'd need to enter *\\\\*.
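A small sketch of both effects, assuming bash and a scratch directory. The names tmp, expanded and hits, and the file names, are hypothetical:

```shell
# What the user typed, verbatim: backslash + r, six characters in total.
user_input='Icon\r'
printf -v expanded_user_input %b "$user_input"
# %b has replaced the two characters \r with one CR byte.
printf '%s' "$expanded_user_input" | od -An -c

# Backslash caveat: four typed backslashes become two after %b,
# and -name then reads those two as one literal backslash.
user_input='*\\\\*'
printf -v expanded %b "$user_input"

tmp=$(mktemp -d)
touch "$tmp"/'a\b' "$tmp/ab"
hits=$(cd "$tmp" && find . -name "$expanded")
printf '%s\n' "$hits"

rm -rf "$tmp"
```

Only the file whose name contains a backslash should be listed.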

    In any case beware that -regex matches on the full path.

    To find files with CR in their name with that, you'd need:

    find . -regex $'.*\r[^/]*' 
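To see why the trailing [^/]* matters, here is a sketch that assumes bash and a made-up directory layout in which one directory name and one file name each contain a CR. The variables loose and strict are illustrative only:

```shell
# Hypothetical layout:
#   ./dir<CR>/plain     CR only in a parent directory's name
#   ./other/Icon<CR>    CR in the file's own name
tmp=$(mktemp -d)
mkdir "$tmp/dir"$'\r' "$tmp/other"
touch "$tmp/dir"$'\r'/plain "$tmp/other/Icon"$'\r'

# .* after the CR may cross slashes, so ./dir<CR>/plain matches as well.
loose=$(cd "$tmp" && find . -regex $'.*\r.*' | wc -l)

# [^/]* forbids any slash after the CR, so the CR has to sit in the
# last path component.
strict=$(cd "$tmp" && find . -regex $'.*\r[^/]*' | wc -l)

echo "loose=$loose strict=$strict"

rm -rf "$tmp"
```

The loose pattern reports one path more than the strict one, namely the plain file inside the CR-named directory.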

    find itself on BSDs supports only basic POSIX regexps (REG_BASIC, the default) or extended POSIX regexps (REG_EXTENDED, with -E). Supporting the enhanced regexps mentioned in macos' re_format man page would need something like a -P option for perl-like regexps, as found in some grep/sed implementations (which looks like it would translate to REG_EXTENDED|REG_ENHANCED with that regexp implementation).

    If you want to use perl-style regexps, an alternative is to use zsh which has that builtin (assuming PCRE/PCRE2 support has been enabled at build time):

    set -o rematchpcre
    print -rC1 - **/*(NDe['[[ $REPLY:t =~ "\A(?i:$user_input)\z" ]]')

    That would, for instance, find files whose tail (basename) matches the perl-style regexp stored in $user_input, case insensitively with (?i) and anchored at both the start (\A) and the end (\z) of the subject, like find's -regex does.


    ¹ Beware the list of supported \...s varies between shells (and their printf builtin) and versions thereof; and that with %b, it's the echo-style of \0123 octal handling (which requires a leading 0) that you get and not the usual one: the user would need to enter \0200 for instance, not \200 for \x80.

    • I believe the proper pattern for -regex to check only on the last path component is to use .*/name-to-find, or do you see an issue with that? Also, thanks for explaining all the options in detail, but my challenge is that my program, which invokes the find tool, lets the user enter an extended RE and then I want to make sure it'll work. Therefore, I cannot use the $'...' escape because then a search using \w+\r instead of Icon\r won't work. For the same reason I cannot use any of your other suggestions, unfortunately. Commented May 31, 2024 at 15:50
    • Huh, looks like \w isn't part of extended regex, so if that means that ERE doesn't support any backslash shortcuts, I can use $'…' after all. I guess this will need more testing. Sigh. Commented May 31, 2024 at 16:13
    • @ThomasTempelmann, see there for the specification of standard EREs. You can compare with the perl ones which you seem to be expecting (perldoc perlre) or compatible (like man pcre2pattern). \t, \n, \a as representations of some control characters are much older, from C in the 70s. The \d, \w perl extensions are not representations of control characters. See also \b, \v which don't mean the BS, VT control characters in perl but match word boundary or vertical whitespace. Commented Jun 3, 2024 at 7:13
