I'm using a command to convert Chinese characters into pinyin. The string has to be handled in a Unicode-safe way in bash, and I want to store the result in another variable. Running the command directly works fine:
    chinese="你好"
    to-pinyin.py $chinese
It prints the output as expected. However, since I want the output in a variable, I tried the following:
    chinese="你好"
    pinyin=$(to-pinyin.py $chinese)
With that, Python fails with:
    Traceback (most recent call last):
      File "/~/to-pinyin.py", line 31, in <module>
        print pinyin.get(hanzi, delimiter=" ").capitalize()
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u01d0' in position 1: ordinal not in range(128)
The same thing happens with backticks. I could work around it by writing the output to a file, doing the conversion there, and then loading the strings into a variable. How else can I fix this so that I can avoid the workaround?
EDIT:
Per request, here is the output of locale:

    $ locale
    LANG=en_US.UTF-8
    LANGUAGE=en_US
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=
SOLUTION USED
Thanks to muru's response and some help from this other answer, I added .encode('utf-8') to the end of the printed strings in my Python script.
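For anyone hitting the same error, here is a minimal sketch of that change. Inside $(...) the script's stdout is a pipe rather than a terminal, so Python 2 cannot detect an output encoding and falls back to ASCII when printing a unicode string. The sample string and the helper name to_utf8_bytes are made up for illustration; in my script the string comes from pinyin.get(...).

```python
# -*- coding: utf-8 -*-
import sys


def to_utf8_bytes(text):
    """Hypothetical helper: encode a unicode string to UTF-8 bytes explicitly."""
    return text.encode("utf-8")


# Stand-in for the value returned by pinyin.get(hanzi, delimiter=" ").capitalize()
sample = u"N\u01d0 h\u01ceo"

# This is the implicit conversion that blows up when stdout is a pipe in Python 2.
try:
    sample.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly sidesteps the ASCII fallback.
encoded = to_utf8_bytes(sample)

# Write raw bytes; sys.stdout.buffer exists on Python 3, plain sys.stdout on Python 2.
out = getattr(sys.stdout, "buffer", sys.stdout)
out.write(encoded + b"\n")
```

In the Python 2 script itself, the equivalent is simply appending .encode('utf-8') to the expression being printed.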
I wish I could switch to Python 3, but there is no default pinyin package there, and I couldn't find a good pinyin package that would get my job done quickly in Python 3. I remember trying for a while, but Python 3 kept refusing to import the package I had installed, so I just installed one in Python 2 and it worked straight out of the box.