
Every week I have to upload a CSV file to SQL Server, and I do the job using Python 3. The problem is that the upload takes too long (around 30 minutes) for a table of 49,000 rows and 80 columns.

Here is the relevant piece of the code, where I also have to transform the date format and strip quotes. I have already tried it with pandas, but that took even longer.

    import csv
    import os
    import pyodbc
    import time

    srv = 'server_name'
    db = 'database'
    tb = 'table'
    conn = pyodbc.connect('Trusted_Connection=yes', DRIVER='{SQL Server}', SERVER=srv, DATABASE=db)
    c = conn.cursor()
    csvfile = 'file.csv'
    with open(csvfile, 'r') as csvfile:
        reader = csv.reader(csvfile, delimiter=';')
        cnt = 0
        for row in reader:
            if cnt > 0:
                for r in range(0, len(row)):
                    # this is the part where I transform the date format from dd/mm/yyyy to yyyy-mm-dd
                    if (len(row[r]) == 10 or len(row[r]) == 19) and row[r][2] == '/' and row[r][5] == '/':
                        row[r] = row[r][6:10] + '-' + row[r][3:5] + '-' + row[r][0:2]
                    # here I replace the quote to nothing, since it is not important for the report
                    if row[r].find("'") > 0:
                        row[r] = row[r].replace("'", "")
                # at this part I query the index to increment by 1 on the table
                qcnt = "select count(1) from " + tb
                resq = c.execute(qcnt)
                rq = c.fetchone()
                rq = str(rq[0])
                # here I insert each row into the table that already exists
                insrt = ("insert into " + tb + " values(" + rq + ",'" + ("', '".join(row)) + "')")
                if cnt > 0:
                    res = c.execute(insrt)
                    conn.commit()
            cnt += 1
    conn.close()

Any help will be appreciated. Thanks!

Comments:
  • What is reader? – vnp, Feb 7, 2019 at 21:57
  • Sorry, I copied this from my code but forgot to insert it here. It comes at this part: with open(csvfile,'r') as csvfile: reader = csv.reader(csvfile, delimiter=';'). Just edited it now. – Feb 8, 2019 at 13:23

1 Answer


First of all, when in doubt, profile.
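For example, the standard library's cProfile will show where the 30 minutes actually go. A minimal sketch, assuming the whole job is wrapped in a function called upload() (hypothetical; it is not in the question's code):

    import cProfile
    import pstats

    # Run the job under the profiler and keep the raw stats on disk.
    cProfile.run('upload()', 'upload.prof')

    # Print the ten most expensive calls, sorted by cumulative time.
    pstats.Stats('upload.prof').sort_stats('cumulative').print_stats(10)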

Now a not-so-wild guess. Most of the time is wasted in

    qcnt = "select count(1) from " + tb
    resq = c.execute(qcnt)
    rq = c.fetchone()
    rq = str(rq[0])

In fact, rq just grows by one with each successful insert. Better to fetch it once and increment it locally:

    qcnt = "select count(1) from " + tb
    resq = c.execute(qcnt)
    rq = c.fetchone()[0]
    for row in reader:
        ....
        insert = ....
        c.execute(insert)
        rq += 1
        ....

Another guess is that committing each insert separately also does not help with performance. Do it once, after the loop. In any case, you must check the success of each commit.
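A minimal sketch of that change, with insrt built exactly as in the question:

    for row in reader:
        # ... build insrt exactly as in the original loop ...
        c.execute(insrt)
    conn.commit()  # commit once, after all rows are inserted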


Notice that if there is more than one client updating the table simultaneously, there is a data race (clients fetching the same rq), both with the original design, and with my suggestion. Moving rq into a column of its own may help; I don't know your DB design and requirements.
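For example, assuming the table can be altered, SQL Server can generate the counter itself with an IDENTITY column, so no client ever needs to fetch or track rq. The table and column names below are placeholders, not the question's schema:

    # One-time schema change (placeholder names). SQL Server then assigns
    # the id atomically on every insert, so concurrent clients cannot clash.
    c.execute("""
        CREATE TABLE my_table (
            id INT IDENTITY(1,1) PRIMARY KEY
            -- ... the 80 data columns ...
        )
    """)
    conn.commit()
    # Subsequent inserts simply omit the id column:
    #   INSERT INTO my_table (col1, col2) VALUES (?, ?)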

Consider a single INSERT ... VALUES statement covering all the rows, wrapped in one transaction, instead of thousands of independent inserts.
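With pyodbc the closest convenient equivalent is a parameterized executemany, which also removes the string concatenation (and the SQL-injection risk that comes with it). A sketch, assuming the rows have already been cleaned and the id is handled by the database (e.g. the IDENTITY column above), so only the CSV fields are inserted:

    # Collect all cleaned rows first, then send them in a single batch.
    rows = []
    for row in reader:
        # ... date fix and quote stripping as in the original loop ...
        rows.append(row)

    placeholders = ", ".join("?" * len(rows[0]))
    sql = "INSERT INTO " + tb + " VALUES (" + placeholders + ")"

    c.fast_executemany = True  # pyodbc >= 4.0.19: one round trip per batch
    c.executemany(sql, rows)
    conn.commit()              # a single transaction for the whole file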


Testing for cnt > 0 is also wasteful. Read and discard the first line; then loop for the remaining rows.
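With the csv module that is a one-liner; a sketch:

    reader = csv.reader(csvfile, delimiter=';')
    next(reader, None)  # discard the header row once
    for row in reader:  # every remaining row is data; no cnt check needed
        ...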


Figuring out at run time whether a field represents a date seems strange. You should know that in advance.
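If the date columns are known up front, convert only those and let datetime validate the values. The indices below are made-up examples, and this handles only the plain dd/mm/yyyy form:

    from datetime import datetime

    DATE_COLUMNS = (2, 15)  # hypothetical indices of the dd/mm/yyyy fields

    for i in DATE_COLUMNS:
        # strptime validates the input; strftime emits yyyy-mm-dd.
        row[i] = datetime.strptime(row[i], "%d/%m/%Y").strftime("%Y-%m-%d")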

Comments:
  • Thanks man, I made two modifications and the elapsed time was cut in half (mainly from the increment part). Awesome!!! – Feb 8, 2019 at 17:17
