I'm dealing with a series of BED files, which look like this:
chr1 100 110 0.5
chr1 150 175 0.2
chr1 200 300 1.5
The columns are chromosome, start, end, and score. I have several files with different scores in each one, and I'd like to combine them like this:
> cat a.bed
chr1 100 110 0.5
chr1 150 175 0.2
chr1 200 300 1.5
> cat b.bed
chr1 100 110 0.4
chr1 150 175 0.7
chr1 200 300 0.9
> cat c.bed
chr1 100 110 1.5
chr1 150 175 1.2
chr1 200 300 0.1
> cat combined.bed
chr1 100 110 0.5 0.4 1.5
chr1 150 175 0.2 0.7 1.2
chr1 200 300 1.5 0.9 0.1
The score column (the last column) of every file is collected into a single file. I found this answer, which can merge one column from a single additional file into an existing file, but I'd like a command that can handle a variable number of files. So if I have 10 BED files to combine, I'd like a command that processes them all together and creates a single file with 10 score columns.
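One way to handle any number of files in a single pass is to let `paste` read all the files side by side and have awk keep only the coordinates plus the score fields. This is a minimal sketch, assuming every file is whitespace-separated with exactly four columns and that the lines appear in the same order in every file. With N files, each pasted line has 4N fields, so the scores sit in fields 4, 8, 12, and so on. The function name `combine_bed` is hypothetical.

```shell
#!/usr/bin/env bash
# combine_bed (hypothetical name): merge the score column of any number of
# 4-column BED files into one tab-separated stream.
# Assumes every file has exactly 4 whitespace-separated columns and that
# line N describes the same interval in every file.
combine_bed() {
  # paste puts line N of every file on one line; awk then keeps
  # chrom/start/end from the first file plus every 4th field (the scores).
  paste "$@" | awk 'BEGIN { OFS = "\t" }
  {
    line = $1 OFS $2 OFS $3
    for (i = 4; i <= NF; i += 4) line = line OFS $i
    print line
  }'
}
```

Usage: `combine_bed a.bed b.bed c.bed > combined.bed`. Each file is read exactly once and streamed, so nothing needs to be held in memory beyond the current line. The output uses tabs rather than spaces between columns.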
Each file should have the same number of lines, and each entry should have the same coordinates in all the files, so there should be no conflicts there. However, each file can contain many entries (generally 100K or more), so I'd like to avoid processing any file multiple times.
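Since the merge relies on every file having the same line count and the same coordinates on each line, that assumption can be checked in the same single pass instead of in a separate step. This is a sketch under the same assumptions (four whitespace-separated columns per file); `combine_bed_checked` is a hypothetical name. It stops with a non-zero exit status at the first line where the field count or the coordinates disagree. Lines before the mismatch will already have been written.

```shell
#!/usr/bin/env bash
# combine_bed_checked (hypothetical name): like a paste+awk merge of the
# score columns, but verifies line counts and coordinates while streaming.
# Assumes every file has exactly 4 whitespace-separated columns.
combine_bed_checked() {
  paste "$@" | awk -v n="$#" 'BEGIN { OFS = "\t" }
  {
    # A shorter file makes paste emit empty fields, which awk collapses,
    # so the field count drops below 4 per file.
    if (NF != 4 * n) {
      printf("line %d: expected %d fields, got %d\n", NR, 4 * n, NF) > "/dev/stderr"
      exit 1
    }
    line = $1 OFS $2 OFS $3
    for (i = 4; i <= NF; i += 4) {
      # Fields i-3..i-1 are chrom/start/end of the file whose score is field i.
      if ($(i - 3) != $1 || $(i - 2) != $2 || $(i - 1) != $3) {
        printf("line %d: coordinates differ between files\n", NR) > "/dev/stderr"
        exit 1
      }
      line = line OFS $i
    }
    print line
  }'
}
```

Usage: `combine_bed_checked a.bed b.bed c.bed > combined.bed || echo "inputs do not line up" >&2`.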
Is there a way to handle this cleanly? This will be in a script, so it doesn't need to be a one-liner.
a.bed should follow b.bed? Are those sample file names?