As @KarnKumar said in the comments, you should load only the columns you care about during `read_csv`, by using `usecols`.
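As a minimal sketch of what that looks like, the snippet below reads a small in-memory table and keeps only three columns. The sample rows and the `Unused` column are made up for illustration; only the three kept column names come from this thread.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample; 'Unused' is an invented extra column
raw = (
    "Server\tServer Name\tAppliance Name\tUnused\n"
    "ENC1003, bay 1\ttdm1024.example.com\tC7000-A\tx\n"
    "ENC2004, bay 1\ttdm2033.example.com\tC7000-B\ty\n"
)

# usecols tells the parser to keep only the named columns
df = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    usecols=["Server", "Server Name", "Appliance Name"],
)
print(df.columns.tolist())
```

Columns left out of `usecols` never reach the resulting dataframe, which keeps memory use down on wide files.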
The filename itself is a problem, because it isn't a CSV; it's a TSV.
The repeated `split()`-and-index calls aren't the worst, but I prefer regex parsing with named captures. Pandas has good support for this.
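To illustrate the named-capture approach in isolation, here is a small sketch using `Series.str.extract`. The input strings are hypothetical values shaped like the `Server` column, and the compact pattern is just one way to write it.

```python
import pandas as pd

# Hypothetical values shaped like 'Enclosure, bay N'; the last one has no comma
servers = pd.Series(["enc1003, bay 1", "ENC2004,Bay 2", "no-comma"])

# Each named group becomes a column named after the group
parts = servers.str.extract(r"^(?P<Enclosure>[^,]+),\s*(?P<Bay>[^,]+?)$")
print(parts)
```

Rows that do not match the pattern come back as NaN in every captured column, so malformed values stay visible instead of raising an `IndexError` the way a chained split-and-index would.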
You should do your `C7000` filter sooner, so that there's less data to process later.
The output format is bizarre and unhelpful. Perhaps this is an X/Y problem and there's a specific reason it looks that way, but in the absence of any justification, just output a two-column dataframe with a server name index:
```python
import pandas as pd

df = pd.read_csv(
    'testcreate.tsv', sep='\t',
    usecols=('Server', 'Server Name', 'Appliance Name'),
)
# Filter early so the regex extraction below runs on fewer rows
df = df[df['Appliance Name'].str.contains('C7000')]

server = df['Server Name'].str.extract(
    r'''(?x)
    ^            # start
    (?P<Server>  # named server capture
      [^.]+      # non-dots (first DNS portion), greedy
    )
    ''', expand=False,
)

enclosure_bay = df['Server'].str.extract(
    r'''(?x)
    ^               # start
    (?P<Enclosure>  # named capture
      [^,]+         # non-commas
    )
    ,\ *            # comma, optional spaces
    (?P<Bay>        # named capture
      [^,]+?        # non-commas, lazy
    )
    $               # end
    '''
)

out = pd.DataFrame(
    index=server,
    data={
        'Enclosure': enclosure_bay['Enclosure'].str.upper().values,
        'Bay': enclosure_bay['Bay'].str.lower().values,
    },
)
print(out)

'''
Original output:

Bay,ENC1003,ENC1006,ENC1007,ENC1011,ENC1012,ENC2004,ENC2006,ENC2010,ENC2011
bay 1,tdm1024,vds1009,vds1023,tdm1068,tdm1083,tdm2033,vds2009,tdm2066,tdm2081

New output:

        Enclosure    Bay
Server
tdm2066   ENC2010  bay 1
tdm1068   ENC1011  bay 1
tdm1083   ENC1012  bay 1
tdm2033   ENC2004  bay 1
vds2009   ENC2006  bay 1
tdm2081   ENC2011  bay 1
tdm1024   ENC1003  bay 1
vds1009   ENC1006  bay 1
vds1023   ENC1007  bay 1
'''
```