0

I'm using the Notepad++ editor on Windows with the format set to ASCII. I've read "PEP 263: Source Code Encodings" and amended my code accordingly (I think), but some characters are still printing in hex...

#!/usr/bin/python
# -*- coding: UTF-8 -*-
import os, sys

a_munge = [ "A", "4", "/\\", "\@", "/-\\", "^", "aye", "?" ]
b_munge = [ "B", "8", "13", "I3", "|3" , "P>", "|:", "!3", "(3", "/3", "3","]3" ]
c_munge = [ "C", "<", "(", "{", "(c)" ]
d_munge = [ "D", "|)", "|o", "?", "])", "[)", "I>", "|>", " ?", "T)", "0", "cl" ]
e_munge = [ "E", "3", "&", "€", "£", "[-", "|=-", "?" ]
. . .
1
  • So, did you actually try changing Notepad++'s file format to UTF-8?
    Commented Jan 23, 2010 at 13:44

3 Answers

2

Perhaps you should be using unicode literals (e.g. u'€') instead.

7
  • Well, I just tried that and I got: File "C:\Users\admin\Desktop\python\passgen.py", line 9 e_replace_list = [ "E", "3", "&", u"Ç", "ú", "[-", "|=-", "?" ] SyntaxError: (unicode error) 'utf8' codec can't decode byte 0x80 in positio unexpected code byte
    – volting
    Commented Jan 23, 2010 at 13:54
  • 3
    1) Your file isn't UTF-8. 2) They should all be unicode literals. farmdev.com/talks/unicode
    Commented Jan 23, 2010 at 14:00
  • ...informative presentation, thanks, although I'm not sure I'm much wiser... To clarify what you said, 'They should all be unicode literals': when you say 'all', do you mean all characters not included in the ASCII set? I've done this and it runs, but non-ASCII characters are still printed in unicode hex, e.g. € = u'\u20ac'
    – volting
    Commented Jan 23, 2010 at 14:57
  • Then you should consider showing the code that actually does the work.
    Commented Jan 23, 2010 at 15:10
  • print e_munge is the way I'm doing it at the moment, just for debugging purposes, but eventually the characters will be printed to a Tkinter GUI.
    – volting
    Commented Jan 23, 2010 at 19:41
2

The line:

# -*- coding: UTF-8 -*- 

declares that the source file is saved in UTF-8. Anything else is an error.
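As a rough sketch of why a mismatch is an error: the snippet below assumes the file was really saved as Windows-1252 (an assumption on my part; it is consistent with the "can't decode byte 0x80" message, because 0x80 is the euro sign in that code page). The names `saved_bytes`, `decoded_ok` and `recovered` are only illustrative.

```python
from __future__ import print_function

# Assumption: the editor saved the file as Windows-1252, where the euro
# sign is stored as the single byte 0x80.
saved_bytes = u'\u20ac'.encode('cp1252')

try:
    # This is what the UTF-8 coding declaration asks Python to do.
    saved_bytes.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    # 0x80 on its own is not a valid UTF-8 sequence.
    decoded_ok = False

# Decoding with the encoding that was actually used to save the bytes works.
recovered = saved_bytes.decode('cp1252')

print(decoded_ok, recovered == u'\u20ac')  # False True
```

So either re-save the file as UTF-8 in the editor, or change the declaration to name the encoding the file is actually saved in.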

When you declare byte strings in your source code:

e_munge = [ "E", "3", "&", "€", "£", "[-", "|=-", "?" ] 

then byte strings like "€" will actually contain the encoded bytes used to save the source file.

When you use Unicode strings instead:

 e_munge = [ u"E", u"3", u"&", u"€", u"£", u"[-", u"|=-", u"?" ] 

then when Python reads u"€" from the source file, it uses the declared encoding to decode the bytes of that character into a Unicode string.

An illustration:

# coding: utf-8
bs = '€'
us = u'€'
print repr(bs)
print repr(us)

OUTPUT:

'\xe2\x82\xac'
u'\u20ac'
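A small sketch of how the two values above relate, written with escapes so it does not depend on how the snippet itself is saved (it should run on Python 2 and 3). It reuses the names `bs` and `us` from the illustration; nothing else is assumed.

```python
from __future__ import print_function

bs = b'\xe2\x82\xac'   # the bytes that '€' holds when the source is UTF-8
us = u'\u20ac'         # the single code point that u'€' holds

print(len(bs), len(us))          # 3 1
print(bs.decode('utf-8') == us)  # True
print(us.encode('utf-8') == bs)  # True
```

The byte string is three bytes of encoded data; the Unicode string is one character, and the declared encoding is what connects them.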
1
  • OK, I already deduced that, but how do I get it to print out the character € and not the unicode code...
    – volting
    Commented Jan 23, 2010 at 19:39
1

print some_list is in effect print repr(some_list) -- that's why you see \u20ac instead of a Euro character. For debugging purposes, the "unicode hex" is exactly what you need for unambiguous display of your data.

You appear to have perfectly OK unicode objects in your list; I suggest that you don't "print" the list to Tkinter.
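A minimal sketch of the difference, assuming the list is made of unicode literals as suggested elsewhere in this thread (written here with escapes so the snippet does not depend on the file encoding). The names `debug_view`, `euro` and `display_text` are hypothetical.

```python
e_munge = [u"E", u"3", u"&", u"\u20ac", u"\u00a3", u"[-", u"|=-", u"?"]

# repr() of the list is what Python 2's `print e_munge` shows. It is a
# debugging view and leaves the stored items untouched.
debug_view = repr(e_munge)

# The items themselves are ordinary one-character text strings.
euro = e_munge[3]

# Build the text to display from the items, not from the list's repr().
display_text = u" ".join(e_munge)
```

`display_text`, or a single item such as `euro`, is what you would hand to print or insert into a Tkinter widget. Whether a console then shows € correctly still depends on the console's own encoding.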

3
  • Well, I won't be printing all the lists to Tkinter (at least not at one time). The program will be a simple password generator that lets a user input a word they would like to use for a password; the program will then do a pseudo-random munge of the word and output the result to a Tkinter text box so that the user can copy and paste it wherever... Why do you suggest that I don't output to Tkinter?
    – volting
    Commented Jan 24, 2010 at 1:13
  • You said that "the characters will be printed to a Tkinter GUI". I'm merely suggesting that you don't use the Python print statement to send the data to Tkinter for display.
    Commented Jan 24, 2010 at 10:17
  • OK, fair enough. I guess my previous comment was a little ambiguous. Thanks for your input.
    – volting
    Commented Jan 24, 2010 at 11:35
