Pandas: save to excel encoding issue

Question

I have a similar problem to the one mentioned here but none of the suggested methods work for me.

I have a medium-sized UTF-8 .csv file with a lot of non-ASCII characters. I am splitting the file by a particular value from one of the columns, and then I'd like to save each of the resulting dataframes as an .xlsx file with the characters preserved.

This doesn't work, as I am getting an error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 7: ordinal not in range(128)
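
For context, here is a minimal sketch (Python 3) of what this message means: some bytes are being decoded with the ASCII codec, and one of them is 0xff, which ASCII cannot represent. The byte string below is made up for illustration; it is not taken from my file, and it does not show where inside pandas the decode actually happens.

```python
# Hypothetical bytes: seven ASCII characters followed by 0xff at index 7.
data = b"abcdefg\xff"

message = None
try:
    # ASCII only covers bytes 0x00-0x7f, so 0xff cannot be decoded.
    data.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
    position = exc.start  # index of the first undecodable byte

print(message)
```

The "position" in the message is the byte offset within whatever string was being decoded, not a row or column number in the CSV.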

Here is what I tried:

1. Using the xlsxwriter engine explicitly. This doesn't seem to change anything.

2. Defining a function (below) to change the encoding and throw away bad characters. This also doesn't change anything.

    def changeencode(data):
        cols = data.columns
        for col in cols:
            # only object (string) columns are re-encoded
            if data[col].dtype == 'O':
                data[col] = data[col].str.decode('utf-8').str.encode('ascii', 'ignore')
        return data

3. Changing by hand all the offending characters to some others. Still no effect (the quoted error was obtained after this change).

4. Encoding the file as utf-16 (which, I believe, is the correct encoding since I want to be able to manipulate the file from within Excel afterwards) doesn't help either.
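
To show what the "throw away bad characters" idea in changeencode does to the data, here is a small sketch in plain Python 3, without pandas. The helper name strip_non_ascii is made up for this example, and the two sample strings are copied from the row pasted at the end of the question. It is an illustration of the lossy approach, not a recommendation.

```python
def strip_non_ascii(text):
    """Drop every character that has no ASCII encoding (lossy)."""
    # errors="ignore" silently skips characters such as 'ó', 'ś' and 'ł'.
    return text.encode("ascii", "ignore").decode("ascii")


# Values copied from the sample row in the question.
samples = ["Zózia kryś", "Wrocław"]
stripped = [strip_non_ascii(s) for s in samples]
print(stripped)
```

Even when this runs cleanly, the Polish characters are gone from the result, which is the opposite of what I want in the .xlsx output.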

I believe that the problem is in the file itself (because of 2 and 3) but I have no idea how to get around it. I'd appreciate any help. The beginning of the file is pasted below.

    "Submitted","your-name","youremail","phone","miasto","cityCF","innemiasto","languagesCF","morelanguages","wiek","partnerCF","messageCF","acceptance-795","Submitted Login","Submitted From",
    "2015-12-25 14:07:58 +00:00","Zózia kryś","[email protected]","4444444","Wrocław","","testujemy polskie znaki","Polski","testujemy polskie znaki","44","test","test","1","Justyna","99.111.155.132",

EDIT

Some code (one of the versions, without the splitting part):

    import pandas as pd
    import string
    import xlsxwriter

    df = pd.read_csv('path-to-file.csv')
    with pd.ExcelWriter('test.xlsx') as writer:
        df.to_excel(writer, sheet_name='sheet1', engine='xlsxwriter')

Have you already tried df.to_excel(path, encoding='utf8')? — Stefan, Commented Dec 28, 2015 at 0:26
@Stefan I have, thanks for asking. To be sure, I tried this one more time just now. Still nothing. — jjj, Commented Dec 28, 2015 at 0:57

jjj · Accepted Answer · 2017-03-01 16:10:59Z

Supposedly this was a bug in the version of pandas which I was using back then. Right now, in pandas ver. 0.19.2, the code below saves the csv from the question without any trouble (and with correct encoding).
NB: the openpyxl module has to be installed on your system.

    import pandas as pd

    df = pd.read_csv('Desktop/test.csv')
    df.to_excel('Desktop/test.xlsx', encoding='utf8')

@greghor hmm, weird. I installed the same version just now, and it works for me. Do you have openpyxl installed? — jjj, Commented Feb 12, 2018 at 16:10
Thanks for your reply, I do have openpyxl 2.5.0 installed. After struggling for a while, I noticed that if I specify the encoding when loading the data, df = pd.read_csv("test.csv", encoding="utf-8"), then it works fine. — greg hor, Commented Feb 12, 2018 at 16:36
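
Putting the answer and the comment above together, here is a rough sketch of the whole task from the question: read the CSV with an explicit encoding, split it by one column, and write each part to its own .xlsx file. It assumes Python 3 and a reasonably recent pandas. The helper names split_by_column and save_groups, the in-memory sample, and the value "Kraków" are made up for illustration; only "miasto", "your-name", "Wrocław", "Zózia kryś" and "Justyna" come from the question.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file: UTF-8 bytes with Polish characters.
RAW = (
    '"your-name","miasto"\n'
    '"Zózia kryś","Wrocław"\n'
    '"Justyna","Kraków"\n'
    '"Jan","Wrocław"\n'
).encode("utf-8")


def split_by_column(df, column):
    """Return a dict mapping each distinct value in `column` to its sub-frame."""
    return {value: group for value, group in df.groupby(column)}


def save_groups(groups, prefix):
    """Write each sub-frame to '<prefix><value>.xlsx' (needs openpyxl or xlsxwriter)."""
    for value, group in groups.items():
        group.to_excel(f"{prefix}{value}.xlsx", index=False)


# State the encoding explicitly when reading, as suggested in the comment above.
df = pd.read_csv(io.BytesIO(RAW), encoding="utf-8")
groups = split_by_column(df, "miasto")
print(sorted(groups))
```

Calling save_groups(groups, "out_") would then write one workbook per city. The sketch leaves out the encoding argument of to_excel on purpose: as far as I know it was deprecated in later pandas releases and then removed, so on a current install the call from the answer may need that argument dropped.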

Siva Arasu · Accepted Answer · 2015-12-28 07:39:02Z

Try encoding the columns with non-ASCII characters as

    df['col'] = df['col'].apply(lambda x: unicode(x))

and then save the file to xlsx format with encoding 'utf8'.

Thanks for your suggestion, unfortunately this doesn't work. The same error is returned but now it is triggered by the .apply line. – jjj, Commented Dec 28, 2015 at 22:46
Can you attach a snippet of the csv file here? – Siva Arasu, Commented Jan 4, 2016 at 4:54
Sorry for the late reply. A part of the file which should be enough to trigger the error is posted in my question. Do you want something more? – jjj, Commented Feb 25, 2016 at 23:18

rbinnun · Accepted Answer · 2015-12-28 02:20:03Z

What if you save the CSV files from pandas and then use win32com to convert them to Excel? It would look something like this:

    import win32com.client

    excel = win32com.client.Dispatch("Excel.Application")
    excel.Visible = 0
    for x in range(10):
        f = path + str(x)
        # not showing the pandas dataframe creation
        df.to_csv(f + '.csv')
        wb = excel.Workbooks.Open(f + '.csv')
        wb.SaveAs(f + '.xlsx', 51)  # xlOpenXMLWorkbook = 51
