Python, Source-Code Encoding Problem

Question

I'm using the Notepad++ editor on Windows with the format set to ASCII. I've read "PEP 263: Source Code Encodings" and amended my code accordingly (I think), but some characters are still printing in hex...

#!/usr/bin/python
# -*- coding: UTF-8 -*-
import os, sys

a_munge = [ "A", "4", "/\\", "\@", "/-\\", "^", "aye", "?" ]
b_munge = [ "B", "8", "13", "I3", "|3" , "P>", "|:", "!3", "(3", "/3", "3","]3" ]
c_munge = [ "C", "<", "(", "{", "(c)" ]
d_munge = [ "D", "|)", "|o", "?", "])", "[)", "I>", "|>", " ?", "T)", "0", "cl" ]
e_munge = [ "E", "3", "&", "€", "£", "[-", "|=-", "?" ]
. . .

So, did you try actually changing Notepad++'s file format to UTF-8? — SilentGhost, Commented Jan 23, 2010 at 13:44

Ignacio Vazquez-Abrams · Accepted Answer · 2010-01-23 13:38:25Z

2

Perhaps you should be using unicode literals (e.g. u'€') instead.


Well, I just tried that and I got: File "C:\Users\admin\Desktop\python\passgen.py", line 9 e_replace_list = [ "E", "3", "&", u"Ç", "ú", "[-", "|=-", "?" ] SyntaxError: (unicode error) 'utf8' codec can't decode byte 0x80 in positio unexpected code byte
– volting
Commented Jan 23, 2010 at 13:54
3
1) Your file isn't UTF-8. 2) They should all be unicode literals. farmdev.com/talks/unicode
– Ignacio Vazquez-Abrams
Commented Jan 23, 2010 at 14:00
...Informative presentation, thanks, although I'm not sure I'm much wiser... To clarify what you said, "They should all be unicode literals": when you say "all", do you mean all characters not included in the ASCII set? I've done this and it runs, but non-ASCII characters are still printed in Unicode hex, e.g. € = u'\u20ac'
– volting
Commented Jan 23, 2010 at 14:57
Then you should consider showing the code that actually does the work.
– Ignacio Vazquez-Abrams
Commented Jan 23, 2010 at 15:10
<code> print e_munge </code> This is the way I'm doing it at the moment, just for debugging purposes, but eventually the characters will be printed to a Tkinter GUI.
– volting
Commented Jan 23, 2010 at 19:41


Mark Tolonen · Accepted Answer · 2010-01-23 15:42:00Z

The line:

# -*- coding: UTF-8 -*-

declares that the source file is saved in UTF-8. Anything else is an error.
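A minimal sketch of why a mismatch between the declared and the actual encoding fails. It assumes the editor saved the file as Windows-1252 (often labelled "ANSI" on Windows); that is a guess about the asker's setup, not something the thread confirms. In that encoding the euro sign is the single byte 0x80, which is not valid UTF-8. The code runs on Python 2.6+ and Python 3.3+.

```python
# Assumption: the file was saved as Windows-1252 ("ANSI"), not as UTF-8.
euro = u"\u20ac"  # EURO SIGN, written as an escape so this snippet is pure ASCII

utf8_bytes = euro.encode("utf-8")      # bytes a UTF-8 editor would write
cp1252_bytes = euro.encode("cp1252")   # bytes a Windows-1252 editor would write

print(repr(utf8_bytes))
print(repr(cp1252_bytes))

# Python decodes the source with the declared encoding. Decoding the
# Windows-1252 byte as UTF-8 fails, much like the SyntaxError in the comments.
try:
    cp1252_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    print("cannot decode as UTF-8: %s" % exc.reason)
```

If the bytes on disk do not match the coding line, either re-save the file as UTF-8 in the editor or change the declaration to the encoding actually used.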

When you declare byte strings in your source code:

e_munge = [ "E", "3", "&", "€", "£", "[-", "|=-", "?" ]

then byte strings like "€" will actually contain the encoded bytes used to save the source file.

When you use Unicode strings instead:

 e_munge = [ u"E", u"3", u"&", u"€", u"£", u"[-", u"|=-", u"?" ]

then when Python reads u"€" from the source file, it uses the declared encoding to decode the bytes of that character into a Unicode string.

An illustration:

# coding: utf-8
bs = '€'
us = u'€'
print repr(bs)
print repr(us)

OUTPUT:

'\xe2\x82\xac'
u'\u20ac'
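The relationship between the two values can be sketched as a round trip: decoding the byte string with UTF-8 gives the Unicode string, and encoding the Unicode string gives the bytes back. The names decoded and encoded are illustrative only. The literals are written with escapes so the snippet does not depend on how it is saved, and it runs on Python 2.6+ and Python 3.3+.

```python
bs = b"\xe2\x82\xac"   # the three UTF-8 bytes of the euro sign
us = u"\u20ac"         # the single Unicode code point U+20AC

decoded = bs.decode("utf-8")   # bytes -> text
encoded = us.encode("utf-8")   # text -> bytes

print(decoded == us)
print(encoded == bs)
print("%d bytes, %d character" % (len(bs), len(us)))
```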

OK, I already deduced that, but how do I get it to print out the character € and not the Unicode code... — volting, Commented Jan 23, 2010 at 19:39

John Machin · Accepted Answer · 2010-01-24 00:40:53Z

1

print some_list is in effect print repr(some_list) -- that's why you see \u20ac instead of a Euro character. For debugging purposes, the "unicode hex" is exactly what you need for unambiguous display of your data.

You appear to have perfectly OK unicode objects in your list; I suggest that you don't "print" the list to Tkinter.
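A rough sketch of the difference, assuming the goal is to show the characters themselves: build one text string from the list elements rather than printing the list. The shortened list and the names list_view and text_view are illustrative only. The fallback branch covers consoles whose encoding cannot represent the euro sign. The code runs on Python 2.6+ and Python 3.3+.

```python
e_munge = [u"E", u"3", u"&", u"\u20ac", u"\u00a3"]

# The repr() of a list is built from the repr() of each element; this is what
# printing the whole list shows. Python 2 escapes non-ASCII characters here
# (u'\u20ac'), while Python 3 leaves printable ones as they are.
list_view = repr(e_munge)

# Joining the elements yields one text string holding the characters themselves.
text_view = u" ".join(e_munge)

try:
    print(text_view)
except UnicodeEncodeError:
    # The console encoding cannot represent some characters: show escapes instead.
    print(text_view.encode("unicode_escape").decode("ascii"))
```

A string built this way can be handed directly to a GUI widget (for example through a Tkinter Text widget's insert method), so no print call is involved in the display.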


Well, I won't be printing all the lists to Tkinter (at least not at one time). The program will be a simple password generator which will allow a user to input a word that they would like to use for a password; the program will then do a pseudo-random munge of the word and output the result to a Tkinter text box so that the user can copy and paste it wherever... Why do you suggest that I don't output to Tkinter?
– volting
Commented Jan 24, 2010 at 1:13
You said that "the characters will printed to a Tkinter GUI". I'm merely suggesting that you don't use the Python print statement to send the data to Tkinter for display.
– John Machin
Commented Jan 24, 2010 at 10:17
OK, fair enough. I guess my previous comment was a little ambiguous; thanks for your input.
– volting
Commented Jan 24, 2010 at 11:35
