
I'm using a command to convert Chinese characters into pinyin. This requires the string in my bash session to be Unicode-safe, and I want to store the result in another variable. I can run the following command normally:

    chinese="你好"
    to-pinyin.py $chinese

It prints the output as expected. However, since I want the output in a variable, I tried the following:

    chinese="你好"
    pinyin=$(to-pinyin.py $chinese)

Python then fails with:

    Traceback (most recent call last):
      File "/~/to-pinyin.py", line 31, in <module>
        print pinyin.get(hanzi, delimiter=" ").capitalize()
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u01d0' in position 1: ordinal not in range(128)

The same thing happens with backticks. I think I will work around it by writing the output to a file, doing the conversion there, and then loading the strings into a variable. How else can I fix this so that I can avoid the workaround?

EDIT:

Per request here is the output of locale:

    $ locale
    LANG=en_US.UTF-8
    LANGUAGE=en_US
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=

SOLUTION USED

Thanks to muru's response and some help from this other answer, I added .encode('utf-8') to the end of the printed strings in my Python script.
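For anyone landing here, this is roughly what that change amounts to. It is only a sketch, not my actual script: the literal string stands in for whatever pinyin.get(...) returns, and the variable names are made up for illustration.

```python
# -*- coding: utf-8 -*-
# Illustrative stand-in for the value returned by pinyin.get(...).capitalize();
# the real script is not reproduced here.
pinyin_text = u"N\u01d0 h\u01ceo"  # "Nǐ hǎo"

# What Python 2's print effectively attempts when stdout is not a terminal:
# an implicit ASCII encode, which fails on the first accented character.
try:
    pinyin_text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# The fix: encode explicitly, so the bytes written no longer depend on
# whether stdout is a terminal, a pipe, or a command substitution.
encoded = pinyin_text.encode("utf-8")
```

In the Python 2 script itself, the print statement is then given the encoded byte string instead of the unicode object.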

I wish I could switch to Python 3, but there is no default pinyin package there, and I can't seem to install any good pinyin package that would get my job done quickly in Python 3. I remember trying for a while, but Python 3 kept refusing to import the package I had installed, so I just installed one in Python 2 and it worked straight out of the box.

  • Can you add to your question the output of locale?
    – xenoid
    Commented Oct 18, 2019 at 0:14
  • @xenoid Here it is.
    Commented Oct 18, 2019 at 0:21
  • The problem is likely in your Python code. The message is about reading UTF-8 as ASCII. Is it Python 2 or Python 3? How/where do you handle encodings?
    – xenoid
    Commented Oct 18, 2019 at 0:43
  • @xenoid It is Python 2.7.12. It can read the input from the terminal correctly, as shown before. The issue arises when changing from output to the terminal to output to a variable. I can't see why Python would be the cause. To me it looks like bash is converting the input to ASCII when I try to get the output in a variable.
    Commented Oct 18, 2019 at 0:47
  • Are you sure you didn't quote it in the terminal? like to-pinyin.py "$chinese"
    – jesse_b
    Commented Oct 18, 2019 at 1:17

1 Answer


This is an issue with Python 2's print, and a reason to favour Python 3's consistent Unicode handling.

Now why does redirecting to a file cause problems? It’s because print() in python2 is treated specially. Whereas the other file-like objects in python always convert to ASCII unless you set them up differently, using print() to output to the terminal will use the user’s locale to convert before sending the output to the terminal. When print() is not outputting to the terminal (being redirected to a file, for instance), print() decides that it doesn’t know what locale to use for that file and so it tries to convert to ASCII instead.
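One way to "set them up differently", in the quote's words, is to wrap the byte stream in a writer that always encodes as UTF-8. A minimal sketch, assuming UTF-8 is the encoding you want: utf8_writer is a hypothetical helper name, and the BytesIO buffer stands in for a redirected stdout so the example is self-contained.

```python
import codecs
import io


def utf8_writer(binary_stream):
    """Wrap a byte stream so that unicode text written to it is encoded as UTF-8."""
    return codecs.getwriter("utf-8")(binary_stream)


# Stand-in for stdout when it is redirected to a file or captured by $(...).
buf = io.BytesIO()
out = utf8_writer(buf)

# The same kind of string that triggered the UnicodeEncodeError above.
out.write(u"N\u01d0 h\u01ceo\n")
captured = buf.getvalue()
```

In a Python 2 script the stream to wrap would be sys.stdout itself; in Python 3 it would be sys.stdout.buffer, although Python 3 normally does not need this.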

  • +1: Very interesting. But is there an easy way out of this bind ? E.g. to force interpretation of print in Python 2.7.x according to a locale of user's choice, when there is no tty stdout involved but a command expansion as in var = $(print ...) ?
    – Cbhihe
    Commented Oct 18, 2019 at 17:12
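On the question in the comment above: one option that needs no change to the script is the PYTHONIOENCODING environment variable, which the interpreter reads at startup to pick the encoding for stdin, stdout and stderr (supported since Python 2.6). The sketch below mimics a command substitution by capturing a child interpreter's output through a pipe. capture_with_utf8 is a hypothetical helper, and the child is whatever interpreter runs the example rather than the original to-pinyin.py.

```python
import os
import subprocess
import sys


def capture_with_utf8(code):
    """Run `code` in a child interpreter with PYTHONIOENCODING=utf-8 and
    return its captured stdout as text, similar to var=$(python -c ...)."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = "utf-8"  # force UTF-8 even though stdout is a pipe
    raw = subprocess.check_output([sys.executable, "-c", code], env=env)
    return raw.decode("utf-8").strip()


# The child prints a string containing u'\u01d0', the character from the traceback.
captured = capture_with_utf8("print(u'N\\u01d0 h\\u01ceo')")
```

From bash, the equivalent would be to set PYTHONIOENCODING=utf-8 in the environment of the command inside the $(...) substitution.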
