Python Pandas - Stata Format

Stata is a widely used statistical software package for data analysis and visualization, developed by StataCorp. It stores datasets in its own file format, which uses the .dta extension. The Pandas library can read from and write to Stata data files, which makes it easy to exchange data between Pandas and Stata.

In this tutorial, we will learn how to use the read_stata() function and the DataFrame.to_stata() method to work with the Stata file format.

Exporting DataFrame to Stata Format

The Pandas library provides the DataFrame.to_stata() method for exporting the data in a DataFrame to a .dta file. By default, the file is written in format version 114, which can be read by Stata 10 and later. The version parameter selects a newer format (117, 118, or 119).

Example

Here is a basic example that exports a Pandas DataFrame to a Stata file. Setting write_index=False in the DataFrame.to_stata() method writes the file without the DataFrame index.

    import pandas as pd
    import numpy as np

    # Create a sample DataFrame
    df = pd.DataFrame({
        "a": list("abc"),
        "b": list(range(1, 4)),
        "c": np.arange(3, 6).astype("u1"),
        "d": np.arange(4.0, 7.0),
        "e": [True, False, True],
        "f": pd.Categorical(list("abc")),
        "g": pd.date_range("20250101", periods=3)
    })

    print("Original DataFrame:")
    print(df)

    # Save the DataFrame to a Stata file without the index
    df.to_stata("stata_file.dta", write_index=False)

    print("\nDataFrame has been successfully exported to a Stata file.")

When we run the above program, it produces the following result:

 Original DataFrame:

	a	b	c	d	e	f	g
0	a	1	3	4.0	True	a	2025-01-01
1	b	2	4	5.0	False	b	2025-01-02
2	c	3	5	6.0	True	c	2025-01-03

The script then prints a confirmation message. If you open the folder where the script was run, you will find the generated stata_file.dta file.
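As a minimal sketch of the other export options, the snippet below requests a newer file format with the version parameter and attaches variable labels with the variable_labels parameter. The sample data, label text, and file name are hypothetical and used only for illustration. The file is written to a temporary directory so the example cleans up after itself.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data; column names must be valid Stata variable names.
df = pd.DataFrame({"price": [9.5, 12.0, 7.25], "qty": [3, 1, 4]})

# Hypothetical variable labels, used only for illustration.
labels = {"price": "Price in USD", "qty": "Quantity sold"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "labelled.dta"

    # version=118 requests the Stata 14+ format instead of the default one.
    df.to_stata(path, write_index=False, version=118, variable_labels=labels)

    # iterator=True returns a StataReader, which exposes the file's metadata.
    with pd.read_stata(path, iterator=True) as reader:
        stored_labels = reader.variable_labels()

    result = pd.read_stata(path)

print(stored_labels)
print(result)
```

Variable labels are metadata of the Stata file. They do not become part of the DataFrame returned by read_stata(), which is why the sketch reads them through the reader object.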

Data Type Limitations of Stata Format

Stata data files restrict the data types that can be stored:

Supported data types: int8, int16, int32, float32, float64, and strings with 244 or fewer characters.
Missing values: Missing values in floating-point columns are stored as Stata's basic missing value ("." in Stata). Missing values in integer columns are not supported when exporting to the Stata format.
Automatic Data Type Conversion: If a column holds values outside the range Stata allows for its type, the column is cast to the next larger type. For example, Stata's int8 can hold non-missing values between -127 and 100, so an int8 column with values outside this range is written as int16.
Unsupported Data Types: Data types such as int64, bool, uint8, uint16, and uint32 are handled by casting them to the smallest supported type that can represent the data.
Precision Warning: Converting int64 to float64 may result in a loss of precision for values larger than 2**53.
String Limitation: Strings longer than 244 characters are not supported in the default file format. Attempting to write such data raises a ValueError.
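The casting rules above can be observed by writing a DataFrame and comparing the data types before and after a round trip. The following is a minimal sketch with hypothetical column names and values. The exact data types reported may depend on the installed Pandas version, so the code prints them rather than assuming them.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical columns chosen to exercise the casting rules.
df = pd.DataFrame({
    "small_uint8": np.array([3, 4, 5], dtype="uint8"),    # fits in Stata's int8 range
    "large_uint8": np.array([3, 4, 200], dtype="uint8"),  # 200 exceeds the int8 limit
    "flag": [True, False, True],                          # bool is not a Stata type
    "count64": np.array([10, 20, 30], dtype="int64"),     # int64 is not a Stata type
    "ratio": [0.5, np.nan, 1.5],                          # float column with a missing value
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "dtypes.dta"
    df.to_stata(path, write_index=False)
    result = pd.read_stata(path)

# Show the data type of each column before and after the round trip.
comparison = pd.DataFrame({"before": df.dtypes, "after": result.dtypes})
print(comparison)
print(result)
```

The values survive the round trip even though the data types change. The boolean column comes back as integers (1 and 0), and the missing floating-point value comes back as NaN.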

Importing Data from Stata Format

Pandas provides the read_stata() function to import data from .dta files. It returns a Pandas DataFrame, or a pandas.api.typing.StataReader object for incremental reading when the chunksize or iterator parameter is used.

Example

The following example demonstrates how to read a dataset from a Stata file using the read_stata() function.

    import pandas as pd
    import numpy as np

    # Create a sample DataFrame
    df = pd.DataFrame({
        "a": list("abc"),
        "b": list(range(1, 4)),
        "c": np.arange(3, 6).astype("u1"),
        "d": np.arange(4.0, 7.0),
        "e": [True, False, True],
        "f": pd.Categorical(list("abc")),
        "g": pd.date_range("20250101", periods=3)
    })

    print("Original DataFrame:")
    print(df)

    # Save the DataFrame to a Stata file
    df.to_stata("stata_file.dta", write_index=False)

    # Read the Stata file back into a DataFrame
    result = pd.read_stata("stata_file.dta")

    print("\nDataFrame read from Stata file:")
    print(result)

Executing the above code produces the following output. Note that the boolean column e is read back as integers, because Stata has no boolean type:

 Original DataFrame:

	a	b	c	d	e	f	g
0	a	1	3	4.0	True	a	2025-01-01
1	b	2	4	5.0	False	b	2025-01-02
2	c	3	5	6.0	True	c	2025-01-03

DataFrame read from Stata file:

	a	b	c	d	e	f	g
0	a	1	3	4.0	1	a	2025-01-01
1	b	2	4	5.0	0	b	2025-01-02
2	c	3	5	6.0	1	c	2025-01-03
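The read_stata() function also accepts a columns parameter to load only some of the variables and an index_col parameter to use one variable as the DataFrame index. Here is a minimal sketch of both, using a hypothetical dataset and file name:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "id": [101, 102, 103],
    "height": [1.6, 1.7, 1.8],
    "score": [4.0, 5.0, 6.0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "scores.dta"
    df.to_stata(path, write_index=False)

    # Load only two of the three variables.
    subset = pd.read_stata(path, columns=["id", "score"])

    # Use the "id" variable as the DataFrame index.
    indexed = pd.read_stata(path, index_col="id")

print(subset)
print(indexed)
```

Selecting columns at read time avoids building a full DataFrame only to discard most of it afterwards.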

Reading Data from Stata File in Chunks

For large datasets, you can use the chunksize parameter of the read_stata() method to create a pandas.api.typing.StataReader instance to read the file incrementally. This StataReader object can be used as an iterator.

Example

The following example reads a Stata file one row at a time by passing chunksize=1 to the read_stata() function and iterating over the resulting reader.

    import pandas as pd
    import numpy as np

    # Create a sample DataFrame
    df = pd.DataFrame({
        "a": list("abc"),
        "b": list(range(1, 4)),
        "c": np.arange(3, 6).astype("u1"),
        "d": np.arange(4.0, 7.0),
        "e": [True, False, True],
        "f": pd.Categorical(list("abc")),
        "g": pd.date_range("20250101", periods=3)
    })

    print("Original DataFrame:")
    print(df)

    # Save the DataFrame to a Stata file
    df.to_stata("stata_file.dta", write_index=False)

    print("\nOutput from a Stata file in chunks:")

    # Read the Stata file in chunks of one row each
    with pd.read_stata("stata_file.dta", chunksize=1) as reader:
        for chunk in reader:
            print(chunk.shape)

The above code produces the following output:

 Original DataFrame:

	a	b	c	d	e	f	g
0	a	1	3	4.0	True	a	2025-01-01
1	b	2	4	5.0	False	b	2025-01-02
2	c	3	5	6.0	True	c	2025-01-03

Output from a Stata file in chunks:
(1, 7)
(1, 7)
(1, 7)
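Chunked reading is mainly useful when each chunk can be processed and then discarded. As a minimal sketch, the code below computes the sum of a column without holding the whole file in memory at once. The dataset, the column name value, and the chunk size of 2 are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data with five rows.
df = pd.DataFrame({"value": [1.5, 2.5, 3.0, 4.0, 5.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "values.dta"
    df.to_stata(path, write_index=False)

    total = 0.0
    chunk_lengths = []

    # Each iteration yields a DataFrame with at most two rows.
    with pd.read_stata(path, chunksize=2) as reader:
        for chunk in reader:
            chunk_lengths.append(len(chunk))
            total += chunk["value"].sum()

print("Rows per chunk:", chunk_lengths)
print("Total:", total)
```

With five rows and a chunk size of 2, the last chunk holds the single remaining row.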

Handling Categorical Data in Stata Files

Pandas supports exporting and importing categorical data as value-labeled data in the Stata format. On export, the category codes are written as integer values and the categories are written as value labels. Stata supports only string value labels, so non-string categories are converted to strings, which can lose information if their string representations are not unique.

When importing, Pandas converts value-labeled Stata variables to Categorical columns by default, using the value labels as the categories. Pass convert_categoricals=False to read_stata() to import the underlying integer codes instead.

Example

This example demonstrates how categorical data in a DataFrame can be saved to a Stata file and read back without converting the categorical values to their string labels.

    import pandas as pd

    # Create a sample DataFrame with two categorical columns
    df = pd.DataFrame({
        "Col_a": pd.Categorical(list("abcdef")),
        "Col_b": list("aeioou")},
        dtype="category")

    print("Original DataFrame:")
    print(df)

    # Save the DataFrame to a Stata file
    df.to_stata("stata_file.dta", write_index=False)

    # Read the Stata file back without converting codes to categories
    result = pd.read_stata("stata_file.dta", convert_categoricals=False)

    print("\nDataFrame read from Stata file without converting categoricals:")
    print(result)

The above code produces the following output:

 Original DataFrame:

	Col_a	Col_b
0	a	a
1	b	e
2	c	i
3	d	o
4	e	o
5	f	u

DataFrame read from Stata file without converting categoricals:

	Col_a	Col_b
0	0	0
1	1	1
2	2	2
3	3	3
4	4	3
5	5	4
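For comparison, the following minimal sketch reads the same file twice, once with the default settings and once with convert_categoricals=False. The column name grade and its values are hypothetical and used only for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical categorical column; categories sort as high, low, mid.
df = pd.DataFrame({"grade": pd.Categorical(["low", "high", "mid", "low"])})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "grades.dta"
    df.to_stata(path, write_index=False)

    # Default: value labels are restored as a Categorical column.
    labelled = pd.read_stata(path)

    # Without conversion: the underlying integer codes are returned.
    codes = pd.read_stata(path, convert_categoricals=False)

print(labelled["grade"].tolist())
print(codes["grade"].tolist())
```

The integer codes follow the order of the categories, so the same label always maps to the same code within a column.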
