Python Pandas - Home
Python Pandas - Introduction
Python Pandas - Environment Setup
Python Pandas - Basics
Python Pandas - Introduction to Data Structures
Python Pandas - Index Objects
Python Pandas - Panel
Python Pandas - Basic Functionality
Python Pandas - Indexing & Selecting Data
Python Pandas - Series
Python Pandas - Series
Python Pandas - Slicing a Series Object
Python Pandas - Attributes of a Series Object
Python Pandas - Arithmetic Operations on Series Object
Python Pandas - Converting Series to Other Objects
Python Pandas - DataFrame
Python Pandas - DataFrame
Python Pandas - Accessing DataFrame
Python Pandas - Slicing a DataFrame Object
Python Pandas - Modifying DataFrame
Python Pandas - Removing Rows from a DataFrame
Python Pandas - Arithmetic Operations on DataFrame
Python Pandas - IO Tools
Python Pandas - IO Tools
Python Pandas - Working with CSV Format
Python Pandas - Reading & Writing JSON Files
Python Pandas - Reading Data from an Excel File
Python Pandas - Writing Data to Excel Files
Python Pandas - Working with HTML Data
Python Pandas - Clipboard
Python Pandas - Working with HDF5 Format
Python Pandas - Comparison with SQL
Python Pandas - Data Handling
Python Pandas - Sorting
Python Pandas - Reindexing
Python Pandas - Iteration
Python Pandas - Concatenation
Python Pandas - Statistical Functions
Python Pandas - Descriptive Statistics
Python Pandas - Working with Text Data
Python Pandas - Function Application
Python Pandas - Options & Customization
Python Pandas - Window Functions
Python Pandas - Aggregations
Python Pandas - Merging/Joining
Python Pandas - MultiIndex
Python Pandas - Basics of MultiIndex
Python Pandas - Indexing with MultiIndex
Python Pandas - Advanced Reindexing with MultiIndex
Python Pandas - Renaming MultiIndex Labels
Python Pandas - Sorting a MultiIndex
Python Pandas - Binary Operations
Python Pandas - Binary Comparison Operations
Python Pandas - Boolean Indexing
Python Pandas - Boolean Masking
Python Pandas - Data Reshaping & Pivoting
Python Pandas - Pivoting
Python Pandas - Stacking & Unstacking
Python Pandas - Melting
Python Pandas - Computing Dummy Variables
Python Pandas - Categorical Data
Python Pandas - Categorical Data
Python Pandas - Ordering & Sorting Categorical Data
Python Pandas - Comparing Categorical Data
Python Pandas - Handling Missing Data
Python Pandas - Missing Data
Python Pandas - Filling Missing Data
Python Pandas - Interpolation of Missing Values
Python Pandas - Dropping Missing Data
Python Pandas - Calculations with Missing Data
Python Pandas - Handling Duplicates
Python Pandas - Duplicated Data
Python Pandas - Counting & Retrieving Unique Elements
Python Pandas - Duplicated Labels
Python Pandas - Grouping & Aggregation
Python Pandas - GroupBy
Python Pandas - Time-series Data
Python Pandas - Date Functionality
Python Pandas - Timedelta
Python Pandas - Sparse Data Structures
Python Pandas - Sparse Data
Python Pandas - Visualization
Python Pandas - Visualization
Python Pandas - Additional Concepts
Python Pandas - Caveats & Gotchas

Python Pandas - Working with CSV Format

Working with the CSV format is a common task in data analysis and data science. CSV (Comma-Separated Values) files are widely used to store tabular data because they are lightweight, human-readable, and supported by almost all data analysis tools and programming languages.

The Python Pandas library is a powerful tool for working with data, it offers extensive functionality for reading, processing, and writing data in CSV format. With Pandas, you can easily handle complex operations like filtering, grouping, and manipulating data in CSV files.

A CSV file is a plain text file where data values are separated by commas, representing tabular data in plain text format. A CSV file has a .csv extension. Below you can see how the data present in the CSV file looks like −

 Sr.no,Name,Gender,Age 1,Braund,male,22 2,Cumings,female,38 3,Heikkinen,female,26 4,Futrelle,female,35

In this tutorial, we will learn how to work with CSV files using Pandas, including reading CSV files into DataFrames, understanding alternative reading methods, and handling large datasets, to exporting data back to CSV.

Reading a CSV File in Pandas

The pandas.read_csv() function is used to read the CSV format file into the Pandas DataFrame or TextFileReader. This function accepts CSV data from a URL or a local file path to load the data into the Pandas environment.

Example

The following example demonstrates how to read CSV data using the pandas.read_csv() function. Here we are using the StringIO to load the CSV string into a file-like object.

 import pandas as pd # Import StringIO to load a file-like object for reading CSV from io import StringIO # Create string representing CSV data data = """Name,Gender,Age Braund,male,22 Cumings,female,38 Heikkinen,female,26 Futrelle,female,35""" # Use StringIO to convert the string data into a file-like object obj = StringIO(data) # read CSV into a Pandas DataFrame df = pd.read_csv(obj) print(df)

Following is the output of the above code −

	Name	Gender	Age
0	Braund	male	22
1	Cumings	female	38
2	Heikkinen	female	26
3	Futrelle	female	35

Writing Data to a CSV File

Pandas provides a method called to_csv() to create or write CSV file using the Pandas data structures, such as DataFrame or Series objects. This function allows you to export your data to a CSV format.

Example

Here is an example demonstrating how to write a Pandas DataFrame to a CSV file using the DataFrame.to_csv() method.

 import pandas as pd # dictionary of lists d = {'Car': ['BMW', 'Lexus', 'Audi', 'Mercedes', 'Jaguar', 'Bentley'], 'Date_of_purchase': ['2024-10-10', '2024-10-12', '2024-10-17', '2024-10-16', '2024-10-19', '2024-10-22']} # creating dataframe from the above dictionary of lists dataFrame = pd.DataFrame(d) print("Original DataFrame:\n",dataFrame) # write dataFrame to SalesRecords CSV file dataFrame.to_csv("Output_written_CSV_File.csv") # display the contents of the output csv print("The output csv file written successfully...")

Following is the output of the above code −

 Original DataFrame:

	Car	Date of Purchase
0	BMW	2024-10-10
1	Lexus	2024-10-12
2	Audi	2024-10-17
3	Mercedes	2024-10-16
4	Jaguar	2024-10-19
5	Bentley	2024-10-22

The output csv file written successfully...

If you visit your working directory after executing the above code, you can see the created CSV file named Output_written_CSV_File.csv.

Handling Large CSV Files in Pandas

When working with large CSV files, loading the entire file may cause memory issues. Pandas provides option like chunksize in pandas.read_csv() function to process such files efficiently in smaller chunks.

Example

Below is an example that initially creates a large CSV file using the DataFrame.to_csv() method with random integers and then processes it in chunks using the Pandas read_csv() function.

 import pandas as pd import numpy as np # Generate a DataFrame with random integers data = np.random.randint(0, 100, size=(1000, 5)) column_names = [f"Col_{i}" for i in range(1, 5 + 1)] # Create a DataFrame and save it as a CSV file large_csv_file = "large_file.csv" df = pd.DataFrame(data, columns=column_names) df.to_csv(large_csv_file, index=False) print(f"Large CSV file is created successfully.\n") # Read large CSV file in chunks chunk_size = 200 print("Output CSV data in chunks:") for chunk in pd.read_csv('large_file.csv', chunksize=chunk_size): print('Data in chunks:') print(chunk.head(2))

While executing the above code we obtain the following output −

 Large CSV file is created successfully. Output CSV data in chunks: Data in chunks:

	Col_1	Col_2	Col_3	Col_4	Col_5
0	57	74	2	85	54
1	23	27	28	51	69

Data in chunks:

	Col_1	Col_2	Col_3	Col_4	Col_5
200	53	14	38	91	9
201	96	90	0	13	54

Data in chunks:

	Col_1	Col_2	Col_3	Col_4	Col_5
400	28	52	2	57	90
401	57	27	60	91	46

Data in chunks:

	Col_1	Col_2	Col_3	Col_4	Col_5
600	97	15	66	67	30
601	34	50	38	95	50

Data in chunks:

	Col_1	Col_2	Col_3	Col_4	Col_5
800	81	12	15	3	61
801	77	75	52	95	87

Alternatives to Reading CSV Files

In addition to the pandas.read_csv() function, Pandas provides an alternative method for reading CSV data using pandas.read_table() function.

The pandas.read_table() function is used to read general delimited files such as CSV, TSV, or other delimiter-separated formats into a Pandas DataFrame. It is a good alternative for loading CSV files, and it easily handles various delimiters using the sep parameter. Additionally, this function supports iterating or breaking of the file into chunks.

Example

This example shows an alternative way to load CSV data into the Pandas DataFrame using the pd.read_table() function. Here you need to specify the delimiter with the sep parameter to read comma-separated values (CSV).

 import pandas as pd url ="https://raw.githubusercontent.com/Opensourcefordatascience/Data-sets/master/blood_pressure.csv" # read CSV into a Pandas DataFrame using the read_table() function df = pd.read_table(url,sep=',') print(df.head(5))

Following is the output of the above code −

	Patient	Sex	Age Group	BP Before	BP After
0	1	Male	30-45	143	153
1	2	Male	30-45	163	170
2	3	Male	30-45	153	168
3	4	Male	30-45	153	142
4	5	Male	30-45	146	141

Print Page