3
\$\begingroup\$

I need to read a remote file using FTP and process it and insert the processed data to a PostgreSQL database in another server.

The below code is working fine, but i want some advice to modify few things.

  1. In this line i don't want to write the actual details, I want that to be hidden from my script. How can i do that?

    ftp = FTP('server.com','user_name','password') 
  2. The script downloads the file to my local system and process it as it is a local file. Is there any way to read and process it without downloading it to the source server (there might be permission issues also)?

  3. If there is no way rather than downloading it to the source, should I specify a path where this file would be downloaded rather than the way I have written it?
  4. Is there anything which i can do to improve this code?

Any guidance will be of great help.

import psycopg2 import time import os #import MySQLdb #import paramiko from ftplib import FTP #from utils.config import Configuration as Config #from utils.utils import get_global_config start_time = time.perf_counter() cnx_psql = psycopg2.connect(host="localhost", database="postgres", user="postgres", password="postgres", port="5432") # Cursors initializations cur_psql = cnx_psql.cursor() def getFile(ftp, filename): try: local_filename = os.path.join(r"/Users/linu/Downloads/", filename) lf = open(local_filename, "wb") ftp.retrbinary("RETR " + filename ,lf.write) print("file copied") except: print ("Error") try: filePath='''/Users/linu/Downloads/log''' #filePath='''/cmd/log/stk/log.txt''' table='staging.stock_dump' SQL="""DROP TABLE IF EXISTS """+ table + """;CREATE TABLE IF NOT EXISTS """+ table + """ (created_date TEXT, product_sku TEXT, previous_stock TEXT, current_stock TEXT );""" cur_psql.execute(SQL) cnx_psql.commit() ftp = FTP('server.com','user_name','password') print("FTP connection succesful") data = [] ftp.cwd('/stockitem/') getFile(ftp,'log') read_file = open(filePath, "r") my_file_data = read_file.readlines() for line in my_file_data: if 'Stock:' in line: fields=line.split(" ") date_part1=fields[0] date_part2=fields[1][:-1] sku=fields[3] prev_stock=fields[5] current_stock=fields[7] if prev_stock.strip()==current_stock.strip(): continue else: cur_psql.execute("insert into " + table+"(created_date, product_sku, previous_stock , current_stock)" + " select CAST('" + date_part1+ " "+ date_part2 + "' AS TEXT)" +", CAST('"+sku+"' AS TEXT),CAST('" + prev_stock +"' AS TEXT),CAST('" +current_stock + "' AS TEXT);") cnx_psql.commit() cur_psql.close() cnx_psql.close() print("Data loaded to DWH from text file") print("Data porting took %s seconds to finish---" % (time.perf_counter() - start_time)) except (Exception, psycopg2.Error) as error: print ("Error while fetching data from PostgreSQL", error) print("Error adding information.") quit() finally: ftp.close() 
\$\endgroup\$

    2 Answers 2

    2
    \$\begingroup\$

    I will not repeat what has been already said in the other, excellent, answer by @avazula.

    You open a few files, but never close them. This might lead to bad things. Instead just use the with keyword, which takes care of closing the file, even if an exception occurs. In addition, you can just iterate over a file and it will iterate over the lines. No need for readlines, which reads the whole file into memory (which might not be possible).

    with open(filePath) as read_file: for line in read_file: ... 

    Python has an official style-guide, PEP8, which programmers are encouraged to follow. It recommend using lower_case for variables and functions, not camelCase.

    An important thing to think about any time you use SQL are SQL injections. If there is any chance at all that a user can manipulate the content of the variables, they might be able to execute a malicious query if you are not careful.

    The psycopg2 documentation even has this not so subtle warning:

    Warning: Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.

    To alleviate this, you can use the second argument of execute, as mentioned in the documentation:

    cur_psql.execute(f"INSERT INTO {table}" " (created_date, product_sku, previous_stock , current_stock)" " VALUES (%s, %s, %s, %s)", (date_part1+ " "+ date_part2, sku, prev_stock, current_stock)) 

    Here I have used an f-string (Python 3.6+) to insert the table name (unescaped in this case). You can properly escape it using sql.Identifier(table).

    \$\endgroup\$
      2
      \$\begingroup\$
      1. In this line i don't want to write the actual details,i want that to be hidden from my script.How can i do that? ftp = FTP('server.com','user_name','password')

      You may want to declare constants, either at the beginning of your file or in a Credentials class that would be used as an enum:

      SERVER = 'server.com' USER_NAME = 'Linu' PASSWORD = 'CR.SEisGreat' ftp = FTP(SERVER, USER_NAME, PASSWORD) 

      You may also want to encrypt your password, with base64 or something alike. Storing passwords in files isn't a good idea anyway, so if you're using this file as a script you're going to call via bash or something alike, I would use a placeholder like argparse that would prompt your password when calling the script:

      parser = argparse.ArgumentParser(description='Reading a file from a distant FTP server.') parser.add_argument('--passwd', dest='password', type=str, help='password required for FTP connection') 
      1. The script downloads the file to my local system and process it as it is a local file, Is there anyway too read and process it without downloading it to the source server(there might be permission issues also)?

      I don't know much about FTP manipulation with Python so I'm not sure there is a way to read a file without having to download it.

      1. If there is no way rather than downloading it to the source, shall I able to mention specific path where this file would be downloaded rather than the way i have written?

      Again, you may place the path into a class constant (or in a separate class):

      FILE_PATH='''/path/to/file/''' 
      1. Is there anything which i can do to improve this code?

      Imports

      import psycopg2 import time import os 

      I would edit this to import only the parts you need, e.g.

      from psycopg2 import connect, sql from time import perf_counter 

      There is no need to import the whole module if you only use a couple of functions, so whenever you can, I'd suggest you only import what you will actually use.


      Raising exceptions

      try: # ... except: print ("Error") 

      I would raise an exception here, since it's a undesired behaviour:

       try: # ... except Exception as e: print ("Error: {}".format(e)) 

      You may want to do the same with your psycopg and FTP errors, I see you catch them but don't read their value. It may provide a more helpful message than the prints, which just indicate you encountered an error but don't tell more about what actually happened.

      SQL placeholders

      SQL="""DROP TABLE IF EXISTS """+ table + """;CREATE TABLE IF NOT EXISTS """+ table + """ (created_date TEXT, product_sku TEXT, previous_stock TEXT, current_stock TEXT );""" 

      Should you want to make your code more generic, you could transform the above mentioned statement into:

      sql.SQL="""DROP TABLE IF EXISTS {table_name} ; CREATE TABLE IF NOT EXISTS {table_name} (created_date TEXT, product_sku TEXT, previous_stock TEXT, current_stock TEXT );""".format( table_name = sql.Literal(table) ) 

      That way you may reuse the query on other tables. You may do the same on the table attributes, i.e.

      keys = ['created_date', 'product_sku', 'previous_stock', 'current_stock'] sql.SQL("""CREATE TABLE IF NOT EXISTS {} ({});""").format( sql.Identifier(table), sql.SQL('{} TEXT,').join(map(sql.Identifier, keys)) ) 

      It alleviates the query and makes it more generic and reusable. By the way, I don't think you need to add IF NOT EXISTS after CREATE since you already deleted the table in the first part of your query.

      \$\endgroup\$
      2
      • \$\begingroup\$I think this answer has a lot of great suggestions but I don't believe that giving the password as part of a bash command is an improvement over storing the password in the file, because now the password will end up in ~/.bash_history and can be discovered by anyone who has momentary access to your computer. I don't think secret storage is a perfectly solved problem but it may be safer just staying in the script.\$\endgroup\$CommentedJul 19, 2019 at 18:38
      • 1
        \$\begingroup\$base64 is not encryption, it is encoding. Converting a password to base64 does not offer any security benefits.\$\endgroup\$CommentedJul 19, 2019 at 20:13

      Start asking to get answers

      Find the answer to your question by asking.

      Ask question

      Explore related questions

      See similar questions with these tags.