Let's say I convert my string into a byte array.

byte[] bytes = sUserID.getBytes("UTF-8"); // Convert user ID string to byte array

Now I need to write a shell script that has exactly the same functionality as my Java code. At some stage I must hash my byte array (using MessageDigest.getInstance("SHA-256") in Java and openssl dgst -sha256 -binary in the shell), but because the digests in the Java code are generated from byte arrays, they won't match the results I get in the shell (there I simply hash strings at the moment, so the input formats don't match).

Because my input for openssl in the shell should match the Java input, I want to know whether there is a way to "simulate" the getBytes() method in the shell. I don't have much experience with shell scripting, so I don't know what the best approach would be in this case. Any ideas? Cheers!

  • Question duplicated from stackoverflow.com/q/28112029/7552
    Commented Jan 23, 2015 at 14:58
  • @glennjackman I would say it is duplicated at stackoverflow :) For some reason I think that Unix community has more expertise in this topic than SO community.
    – C_U
    Commented Jan 23, 2015 at 15:17

1 Answer

openssl's stdin is a byte stream.

The contents of $user is a sequence of non-0 bytes (which may or may not form valid characters in UTF-8 or other character set/encoding).

printf %s "$user"'s stdout is a byte stream.

printf %s "$user" | openssl dgst -sha256 -binary

Will connect printf's stdout with openssl's stdin. openssl's stdout is another byte stream.
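To check what actually gets hashed, a small sketch like the following may help. It dumps the bytes that reach openssl's stdin and renders the binary digest as hex so it can be compared with a hex dump of the Java digest. The value abc and the variable names bytes and digest are only illustrative assumptions, and od and tr are used here purely for display.

```shell
user='abc'  # hypothetical sample value

# Dump the exact bytes that reach openssl's stdin.
# printf %s adds no trailing newline, unlike a plain echo.
bytes=$(printf %s "$user" | od -An -tx1 | tr -d ' \n')
printf 'bytes hashed: %s\n' "$bytes"

# Same pipeline as above, with the binary digest rendered as hex.
digest=$(printf %s "$user" | openssl dgst -sha256 -binary | od -An -tx1 | tr -d ' \n')
printf 'sha256: %s\n' "$digest"
```

If the byte dump matches what getBytes("UTF-8") yields on the Java side, the two digests should match as well.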

Now, if you're reading $user from the user at a terminal, the user will enter it by pressing keys on their keyboard. The terminal will send the corresponding characters (as written on the key labels) encoded in its configured character set. Usually, that character set will be based on the character set in the current locale. You can find out what that is with locale charmap.

For instance, with a locale like fr_FR.iso885915@euro, and an xterm started in that locale, locale charmap will return ISO-8859-15. If the user enters stéphane as the username, that é will likely be encoded as the 0xe9 byte because that's how it's defined in the ISO-8859-15 character set.

If you want that é to be encoded as UTF-8 before passing to openssl, that's where you'd use iconv to convert that 0xe9 byte to the corresponding encoding in UTF-8 (two bytes: 0xc3 0xa9):

IFS= read -r user # read username from stdin as a sequence of bytes
                  # assumed to be encoded from characters as per the
                  # locale's encoding
printf %s "$user" |
  iconv -t utf-8 | # convert from locale encoding to UTF-8
  openssl dgst -sha256 -binary
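The effect of the conversion can be seen without depending on the terminal's locale. The sketch below assumes the source encoding is ISO-8859-15 and passes it explicitly with -f (the command above relies on the locale's default instead). The variable names latin and utf8 are illustrative, and od is only there to display the bytes.

```shell
# 'st\351phane' is "stéphane" in ISO-8859-15: \351 (octal) is the single byte 0xe9.
latin=$(printf 'st\351phane' | od -An -tx1 | tr -d ' \n')

# After conversion, the same character is the two-byte sequence 0xc3 0xa9.
utf8=$(printf 'st\351phane' | iconv -f ISO-8859-15 -t UTF-8 | od -An -tx1 | tr -d ' \n')

printf 'ISO-8859-15: %s\nUTF-8:       %s\n' "$latin" "$utf8"
```

The second dump is the byte sequence that getBytes("UTF-8") would be expected to produce for the same name in Java.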
  • Thank you so much! Works like a charm. And this explanation on how and why it works is the best I've seen so far.
    – C_U
    Commented Jan 23, 2015 at 15:26
