Python Pandas - Feather File Format



The Feather file format gives Pandas a fast and efficient way to store and retrieve DataFrame data in binary form. It is optimized for high-performance I/O operations and is portable across different platforms.

What is the Feather File Format?

Feather is a binary columnar file format designed for efficient storage and fast retrieval of tabular data. It supports nearly all Pandas data types, including extension types such as categorical and timezone-aware datetime; the exceptions are listed under Important Considerations. The format is based on the Apache Arrow columnar memory format, which is what enables its high-performance I/O.

Feather is a language-independent format designed for efficient data exchange. It is supported in both Python and R, so data can be shared easily between the two data analysis ecosystems. Reading and writing are fast and use comparatively little memory.

Important Considerations

When working with Feather files in Pandas, keep the following points in mind:

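  • Index Storage: A custom DataFrame index (Index or MultiIndex) is not stored in Feather files, and in many Pandas versions a non-default index raises an error on write. Call reset_index() to keep the index as regular columns, or reset_index(drop=True) to discard it.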

  • Unique Column Names: Duplicate or non-string column names are not supported.

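  • Object Data Types: Columns that hold arbitrary Python objects (for example, a mix of strings and numbers in one column) are not supported and raise an error during serialization. Columns of plain strings are written without problems, even when Pandas reports their dtype as object, as column "a" in the examples on this page shows.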
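The first two points can be handled before writing with a small helper. The sketch below is only an illustration: prepare_for_feather is a hypothetical name, not part of Pandas, and the sample data is invented. It uses plain Pandas calls, so it runs without pyarrow.

```python
import pandas as pd


def prepare_for_feather(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df that respects the index and column-name rules.

    Hypothetical helper for illustration; it is not part of Pandas.
    """
    # Move the index (or every MultiIndex level) into ordinary columns.
    out = df.reset_index()
    # Convert non-string labels, such as integers, to strings.
    out.columns = [str(name) for name in out.columns]
    # Duplicate names cannot be stored, so fail early with a clear message.
    duplicated = out.columns[out.columns.duplicated()].tolist()
    if duplicated:
        raise ValueError(f"Duplicate column names: {duplicated}")
    return out


# Invented sample: integer column labels and a named, non-default index.
sales = pd.DataFrame(
    {2023: [10, 20], 2024: [30, 40]},
    index=pd.Index(["north", "south"], name="region"),
)

ready = prepare_for_feather(sales)
print(ready.columns.tolist())  # ['region', '2023', '2024']
print(ready.index.tolist())    # [0, 1]
```

The returned frame can then be passed to to_feather(). Note that this helper always resets the index, so a frame that already has a default index gains an extra column named "index".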

Saving a Pandas DataFrame as a Feather File

To save a DataFrame to a Feather file, use the DataFrame.to_feather() method, which writes the contents of the DataFrame to a file in Feather format.

Note: Before saving or retrieving data from a Feather file, make sure that the 'pyarrow' library is installed. It is an optional dependency of Pandas and can be installed with the following command:

    pip install pyarrow
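Whether the dependency is already present can be checked from Python before calling to_feather(). This is a minimal sketch using only the standard library; has_pyarrow is a hypothetical helper name chosen for this example.

```python
import importlib.util


def has_pyarrow() -> bool:
    """Return True if the pyarrow package can be found in this environment."""
    return importlib.util.find_spec("pyarrow") is not None


if has_pyarrow():
    print("pyarrow is available; Feather reading and writing should work.")
else:
    print("pyarrow is missing; install it with: pip install pyarrow")
```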

Example

The following example uses the to_feather() method to save a Pandas DataFrame object to a Feather file.

    import pandas as pd
    import numpy as np

    # Create a sample DataFrame
    df = pd.DataFrame({
        "a": list("abc"),
        "b": list(range(1, 4)),
        "c": np.arange(3, 6).astype("u1"),
        "d": np.arange(4.0, 7.0),
        "e": [True, False, True],
        "f": pd.Categorical(list("abc")),
        "g": pd.date_range("20240101", periods=3)
    })

    print("Original DataFrame:")
    print(df)

    # Save the DataFrame as a feather file
    df.to_feather("df_feather_file.feather")

    print("\nDataFrame is successfully saved as a feather file.")

When we run the above program, it produces the following result:

    Original DataFrame:
       a  b  c    d      e  f          g
    0  a  1  3  4.0   True  a 2024-01-01
    1  b  2  4  5.0  False  b 2024-01-02
    2  c  3  5  6.0   True  c 2024-01-03

    DataFrame is successfully saved as a feather file.

Reading a Feather File into Pandas

To load a Feather file into a DataFrame, use the pd.read_feather() function. It accepts options for customizing how the data is read, such as the columns parameter for loading only a subset of columns.

Example

This example writes a DataFrame to a Feather file and reads it back using the Pandas read_feather() function.

    import pandas as pd
    import numpy as np

    # Create a sample DataFrame
    df = pd.DataFrame({
        "a": list("abc"),
        "b": list(range(1, 4)),
        "c": np.arange(3, 6).astype("u1"),
        "d": np.arange(4.0, 7.0),
        "e": [True, False, True],
        "f": pd.Categorical(list("abc")),
        "g": pd.date_range("20240101", periods=3)
    })

    # Save the DataFrame as a feather file
    df.to_feather("df_feather_file.feather")

    # Load the feather file
    result = pd.read_feather("df_feather_file.feather")

    # Display the DataFrame
    print(result)

    # Verify data types
    print("\nData Type of the each column:")
    print(result.dtypes)

While executing the above code, we get the following output:

       a  b  c    d      e  f          g
    0  a  1  3  4.0   True  a 2024-01-01
    1  b  2  4  5.0  False  b 2024-01-02
    2  c  3  5  6.0   True  c 2024-01-03

    Data Type of the each column:
    a            object
    b             int64
    c             uint8
    d           float64
    e              bool
    f          category
    g    datetime64[ns]
    dtype: object

Handling Feather Files in Memory

In-memory files keep their data in RAM instead of reading from or writing to a disk, which avoids the latency of physical I/O operations. Common options in the Python ecosystem include:

  • Memory-mapped files

  • StringIO

  • BytesIO

  • MemoryFS

Example

This example saves a DataFrame in Feather format to an in-memory buffer and loads it back, using the to_feather() and read_feather() methods together with io.BytesIO, which stores binary data in memory.

    import pandas as pd
    import io

    # Create a DataFrame
    df = pd.DataFrame({"Col_1": range(5), "Col_2": range(5, 10)})

    print("Original DataFrame:")
    print(df)

    # Save the DataFrame as In-Memory feather
    buf = io.BytesIO()
    df.to_feather(buf)

    # Rewind the buffer so reading starts at the beginning
    buf.seek(0)

    # Read the DataFrame from the in-memory buffer
    loaded_df = pd.read_feather(buf)

    print("\nDataFrame Loaded from In-Memory Feather:")
    print(loaded_df)

Following is the output of the above code:

    Original DataFrame:
       Col_1  Col_2
    0      0      5
    1      1      6
    2      2      7
    3      3      8
    4      4      9

    DataFrame Loaded from In-Memory Feather:
       Col_1  Col_2
    0      0      5
    1      1      6
    2      2      7
    3      3      8
    4      4      9