sudipta.mml

Hi all,

I want to combine multiple files that are kept in a directory. Before producing the big combined file, I want to edit each file first and then combine them. For example:

Suppose I have two files, file1 and file2, in a directory called TEST.

file1 contains

```
1 C 8.95377612903e-07
2 C 2.54310967742e-06
3 C 1.07986354839e-05
4 C 5.07842354839e-05
5 G 0.000244339548387
6 T 0.00117745145161
7 G 0.00199657145161
8 T 0.0059136516129
9 A 0.0093243416129
10 T 0.0122818903226
11 A 0.0148356919355
12 C 0.0170317919355
13 A 0.0273163903226
14 C 0.0360846919355
15 T 0.0767962919355
16 G 0.111205
17 A 0.269571983871
18 A 0.402242983871
19 A 0.512890032258
20 A 0.604756
21 A 0.680681
22 A 0.743145983871
23 A 0.794299
```

and file2 contains

```
1 G 4.14724e-08
2 T 7.38683612903e-08
3 G 7.33707806452e-08
4 T 1.27077546774e-07
5 A 1.37361132258e-07
6 G 1.37420164516e-07
7 A 1.59060645161e-07
8 A 1.59608032258e-07
9 A 1.55923274194e-07
10 C 9.81361774194e-08
11 C 9.78695322581e-08
12 G 1.00416609677e-07
13 G 1.12081406452e-07
14 G 1.50283725806e-07
15 G 2.55789580645e-07
16 G 5.5415083871e-07
17 G 1.36960016129e-06
18 A 3.62490290323e-06
19 T 4.50988016129e-06
20 T 4.84488935484e-06
21 G 4.94761693548e-06
22 A 5.31025516129e-06
23 C 5.42889516129e-06
```

Each file contains three columns (an integer, a character, and a decimal number). I want to combine these two files so that the first column keeps increasing continuously (1 to 46 for these two files), while the other two columns stay unchanged.
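The renumbering itself only needs a counter that is not reset between files. A minimal sketch, assuming each file's contents are already read into a string and that the columns are separated by whitespace; the function name `renumber` is hypothetical, and the sample data is just the first two rows of file1 and file2:

```python
def renumber(contents):
    """Combine several three-column texts, renumbering the first column.

    `contents` is a list of strings, one per file, in the order they should
    be combined. The character and decimal columns are copied unchanged.
    """
    out = []
    count = 0  # running index, shared across all files
    for text in contents:
        for line in text.splitlines():
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            _old_index, base, value = fields
            count += 1
            out.append("\t".join((str(count), base, value)))
    return "\n".join(out)


file1 = "1 C 8.95377612903e-07\n2 C 2.54310967742e-06\n"
file2 = "1 G 4.14724e-08\n2 T 7.38683612903e-08\n"
print(renumber([file1, file2]))
```

The rows of file2 come out numbered 3 and 4. Keeping the decimal value as the original string, rather than converting it to a float, guarantees the third column is written back exactly as it was read.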

I really appreciate your help. I look forward to hearing from you soon.

Best wishes
Sudipta

Gribouillis

First, write code that reads each file in sequence and prints each line. We want to see some Python code.
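As an illustration of that first step, here is a minimal sketch that visits every regular file in a directory and yields its lines one by one. The names `iter_lines` and `print_all_lines` are hypothetical; the files are visited in plain alphabetical order only so that the order is predictable, since `os.listdir()` makes no ordering guarantee:

```python
import os


def iter_lines(path):
    """Yield (filename, line) for every line of every regular file in `path`.

    Files are visited in plain alphabetical order of their names; the
    trailing newline is stripped from each line.
    """
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        if not os.path.isfile(full_path):
            continue  # skip subdirectories and other non-files
        with open(full_path) as f:
            for line in f:
                yield name, line.rstrip("\n")


def print_all_lines(path):
    """Print each line, prefixed by the name of the file it came from."""
    for name, line in iter_lines(path):
        print(name + ": " + line)
```

For example, `print_all_lines('/home/sudipta/window')` would print the lines of every file in that directory, one file after another.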

sudipta.mml
```python
import os
import re
import string

path = '/home/sudipta/window'

for itemName in os.listdir(path):
    # Loops over each itemName in the path. Joins the path and the itemName
    # and assigns the value to itemName.
    itemName = os.path.join(path, itemName)
    if os.path.isfile(itemName):
        lines = open(itemName, 'r').readlines()
        for i in range(0, len(lines)):
            data = lines[i].split(',')
```

This is my initial code. I can't get any further than this.
Gribouillis

The fields don't look like they are separated by commas, but by tabs or spaces, so the `split(',')` line in your code probably won't work as expected. You can try something like this:

```python
import os

path = '/home/sudipta/window'
count = 0  # running line number across all files
for itemName in os.listdir(path):
    # Join the directory path and the entry name to get a full path.
    itemName = os.path.join(path, itemName)
    if os.path.isfile(itemName):
        with open(itemName, 'r') as f:
            for line in f:
                fields = line.split()  # split on any whitespace (tabs or spaces)
                if not fields:
                    continue  # skip blank lines
                lineno, nucleotid, value = fields
                count += 1
                print("\t".join((str(count), nucleotid, value)))
```
sudipta.mml

Thank you very much for the reply. It edits and combines the files, but the files are not combined in the right order. My directory contains two files named winjob_1_s and winjob_2_s. I want to combine them in that order, but the program combines them in reverse order. How can I solve this?

Gribouillis

os.listdir() returns the names in arbitrary order, so the solution is to sort the filenames. The most obvious way to do this is:

```python
filenames = os.listdir(path)
filenames = sorted(filenames)
```

However, this code sorts the filenames alphabetically (character by character), which is not necessarily what you want. For example:

```python
>>> L = ["winjob_1_s", "winjob_2_s", "winjob_12_s"]
>>> sorted(L)
['winjob_12_s', 'winjob_1_s', 'winjob_2_s']
```

winjob_12_s comes first because the character '2' comes before '_' in lexicographic order (in ASCII, digits sort before the underscore). What you can do instead is define a function that returns a numeric value for every filename and sort according to that value. For example:

```python
def score(filename):
    try:
        L = filename.split("_")  # winjob_2_s --> ['winjob', '2', 's']
        value = int(L[1])
    except (IndexError, ValueError):
        # L has fewer than two items, or L[1] is not an integer
        value = -1
    return value

filenames = sorted(filenames, key=score)
```

This sorts the filenames according to our score function.

```python
>>> L = ["winjob_1_s", "winjob_2_s", "winjob_12_s"]
>>> sorted(L, key=score)
['winjob_1_s', 'winjob_2_s', 'winjob_12_s']
```
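The score function assumes the number is always the second underscore-separated field. A more general alternative, usually called natural sorting, is a different technique from score: split each name into runs of digits and non-digits with the standard `re` module and compare the digit runs as integers. A sketch, where `natural_key` is a hypothetical name:

```python
import re


def natural_key(filename):
    """Split 'winjob_12_s' into ['winjob_', 12, '_s'] so that digit runs
    compare as integers and everything else compares as text."""
    # With a capturing group, re.split puts the captured digit runs at the
    # odd positions of the result and the text between them at even ones.
    parts = re.split(r"(\d+)", filename)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


names = ["winjob_1_s", "winjob_2_s", "winjob_12_s"]
print(sorted(names, key=natural_key))
# ['winjob_1_s', 'winjob_2_s', 'winjob_12_s']
```

Because text always sits at even positions and numbers at odd positions, two keys never compare a string against an integer, so mixed names such as winjob_1_s and winjob_1_s_z can be sorted together without a TypeError.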
sudipta.mml

Thank you very much for the reply. It works, but I have another problem. The directory contains different kinds of files, and I want to combine only one kind of file at a time.

For example:
The /home/sudipta/window directory contains winjob_1_s, winjob_2_s and winjob_3_s, and also another kind of file named winjob_1_s_z, winjob_2_s_z and winjob_3_s_z. I want to combine these two kinds of files separately. How can I do that?

Gribouillis

Write code to produce two lists of file names, then handle each list separately.
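One way to follow this advice is to build the two lists from the name endings: names ending in `_s_z` go in one list and names ending in `_s` in the other (winjob_1_s_z ends in `_z`, so it never matches the `_s` test). A minimal sketch, assuming every relevant file follows the winjob_<number>_s or winjob_<number>_s_z pattern from the question; `split_by_suffix` and `combine` are hypothetical helpers, and score is repeated so that the block runs on its own:

```python
import os


def score(filename):
    """Numeric part of 'winjob_<n>_...' names, or -1 if there is none."""
    try:
        return int(filename.split("_")[1])
    except (IndexError, ValueError):
        return -1


def split_by_suffix(path):
    """Return two lists of file names in `path`: the *_s files and the
    *_s_z files, each sorted by the number in the name."""
    names = [n for n in os.listdir(path) if os.path.isfile(os.path.join(path, n))]
    s_files = sorted((n for n in names if n.endswith("_s")), key=score)
    s_z_files = sorted((n for n in names if n.endswith("_s_z")), key=score)
    return s_files, s_z_files


def combine(path, names, out_path):
    """Write the files `names` (inside `path`) to `out_path` in order,
    renumbering the first column. Returns the number of lines written."""
    count = 0
    with open(out_path, "w") as out:
        for name in names:
            with open(os.path.join(path, name)) as f:
                for line in f:
                    fields = line.split()
                    if not fields:
                        continue  # skip blank lines
                    _old_index, base, value = fields
                    count += 1
                    out.write("\t".join((str(count), base, value)) + "\n")
    return count
```

For example, `s_files, s_z_files = split_by_suffix('/home/sudipta/window')` followed by one `combine(...)` call per list, each with its own output file. Writing the output files outside the input directory keeps them from being read as input on a later run.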

sudipta.mml

How do I produce two lists of file names? Can you help me with this?
