Equivalent of Java's String.getBytes() in Unix Shell (Cygwin)

Question

Let's say I convert my string into a byte array:

byte[] bytes = sUserID.getBytes("UTF-8"); // Convert user ID string to byte array

Now I need to write a shell script that has exactly the same functionality as my Java code. At some stage I must hash my byte array (using MessageDigest.getInstance("SHA-256") in Java and openssl dgst -sha256 -binary in the shell), but because the digests in the Java code are generated from byte arrays, they don't match the results I get in the shell (there I simply hash strings at the moment, so the input formats don't match).

Because my input to openssl in the shell should match the Java input, I want to know whether there is a way to "simulate" the getBytes() method in the shell. I don't have much experience with shell scripting, so I don't know what the best approach would be. Any ideas? Cheers!

@glennjackman I would say it is duplicated at stackoverflow :) For some reason I think that the Unix community has more expertise in this topic than the SO community. - C_U, Commented Jan 23, 2015 at 15:17

Stéphane Chazelas · Accepted Answer · 2015-01-23 17:29:47Z

openssl's stdin is a byte stream.

The content of $user is a sequence of non-0 bytes (which may or may not form valid characters in UTF-8 or any other character set/encoding).

printf %s "$user"'s stdout is a byte stream.

printf %s "$user" | openssl dgst -sha256 -binary

That pipeline connects printf's stdout to openssl's stdin. openssl's stdout is another byte stream.
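To check which bytes actually reach openssl, the stream can be dumped with od. The sketch below is only an illustration: the value abc and the variable names printf_hex and echo_hex are made up for the example. It also shows why printf %s is a safer choice than echo here, since echo appends a newline byte that Java's getBytes() would not produce.

```shell
#!/bin/sh
# Illustrative value only; any user ID would do.
user=abc

# printf %s writes exactly the bytes held in $user.
printf_hex=$(printf %s "$user" | od -An -v -tx1 | tr -d ' \n')

# echo appends a newline (0x0a), so the hashed input would differ.
echo_hex=$(echo "$user" | od -An -v -tx1 | tr -d ' \n')

printf 'printf: %s\n' "$printf_hex"   # printf: 616263
printf 'echo:   %s\n' "$echo_hex"     # echo:   6162630a
```

The first line of output is the same byte sequence that "abc".getBytes("UTF-8") holds on the Java side.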

Now, if you're reading $user from a user at a terminal, the user will enter it by pressing keys on their keyboard. The terminal will send the corresponding characters (as written on the key labels) encoded in its configured character set. Usually, that character set will be based on the character set of the current locale. You can find out what that is with locale charmap.

For instance, with a locale like fr_FR.iso885915@euro, and an xterm started in that locale, locale charmap will return ISO-8859-15. If the user enters stéphane as the username, that é will likely be encoded as the 0xe9 byte because that's how it's defined in the ISO-8859-15 character set.
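As a rough way to see this for yourself, the sketch below prints the locale's charset and then produces the ISO-8859-15 byte for é from its UTF-8 form. The octal escapes and the variable name e_latin9_hex are chosen for the example, and the output of locale charmap depends on your environment.

```shell
#!/bin/sh
# Charset of the current locale (for example ISO-8859-15 or UTF-8).
locale charmap

# "é" in UTF-8 is the two bytes 0xc3 0xa9 (octal \303 \251).
# Converted to ISO-8859-15, it becomes the single byte 0xe9.
e_latin9_hex=$(printf '\303\251' |
    iconv -f UTF-8 -t ISO-8859-15 |
    od -An -v -tx1 | tr -d ' \n')

printf '%s\n' "$e_latin9_hex"   # e9
```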

If you want that é to be encoded as UTF-8 before passing it to openssl, that's where you'd use iconv to convert that 0xe9 byte to the corresponding encoding in UTF-8 (two bytes: 0xc3 0xa9):

IFS= read -r user  # read username from stdin as a sequence of bytes
                   # assumed to be encoded from characters as per the
                   # locale's encoding
printf %s "$user" |
  iconv -t utf-8 | # convert from locale encoding to UTF-8
  openssl dgst -sha256 -binary
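If you'd rather compare digests by eye against a hex string printed from Java, something like the following could work. It is only a sketch: sha256_utf8_hex is a hypothetical helper name, and the optional second argument (the encoding the input is currently in) is an assumption added for the example; when it is omitted, the locale's charset is used.

```shell
#!/bin/sh
# Hypothetical helper: hash "$1" as UTF-8 and print the digest in hex.
# "$2" names the encoding "$1" is currently in (assumption for this
# sketch); it defaults to the charset of the current locale.
sha256_utf8_hex() {
    from=${2:-$(locale charmap)}
    printf %s "$1" |
        iconv -f "$from" -t UTF-8 |
        openssl dgst -sha256 -binary |
        od -An -v -tx1 |
        tr -d ' \n'
    echo
}

sha256_utf8_hex abc UTF-8
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

With this, the single byte 0xe9 declared as ISO-8859-15 and the two bytes 0xc3 0xa9 declared as UTF-8 should give the same digest, because both reach openssl as the same UTF-8 byte sequence.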

Thank you so much! Works like a charm. And this explanation of how and why it works is the best I've seen so far. - C_U, Commented Jan 23, 2015 at 15:26
