Python Pandas read_hdf() Method



The read_hdf() method in Python's Pandas library is used to read data from HDF5 (Hierarchical Data Format version 5) files into a Pandas object such as a Series or DataFrame. HDF5 is a widely used file format that efficiently stores large datasets, metadata, and heterogeneous data.

The read_hdf() method simplifies loading HDF5 data into Pandas for analysis and manipulation. It also provides options for querying and filtering the data stored in these files. This method only supports files on the local file system; remote URLs and file-like objects are not supported. Reading HDF5 files also requires the PyTables package (tables) to be installed.

Syntax

The syntax of the read_hdf() method is as follows −

 pandas.read_hdf(path_or_buf, key=None, mode='r', errors='strict', where=None, start=None, stop=None, columns=None, iterator=False, chunksize=None, **kwargs) 

Parameters

The Python Pandas read_hdf() method accepts the following parameters −

  • path_or_buf: The path to the HDF5 file, given as a string or path object, or an open pandas.HDFStore object.

  • key: The identifier for the dataset or table within the HDF5 file.

  • mode: Specifies the mode to open the file. Common values are 'r' (read-only), 'r+' (read/write), and 'a' (append).

  • errors: Specifies how to handle errors while encoding and decoding.

  • where: Conditions to filter data (like SQL WHERE clause).

  • start: Row number at which to start reading the data.

  • stop: Row number at which to stop reading the data.

  • columns: Specific columns to load from the HDF5 dataset.

  • iterator: If True, returns an iterator for reading data in chunks.

  • chunksize: Number of rows to include in each chunk when the data is read iteratively.

  • **kwargs: Additional keyword arguments passed to HDFStore.
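The start and stop parameters are not covered by the examples further down, so here is a minimal sketch of how they can be used to read a slice of rows. The sample data, the file name 'rows.h5', and the key 'people' are assumptions made for illustration, and the file is written to a temporary directory.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; the names and values are illustrative only
df = pd.DataFrame({'Name': ['Kiran', 'Priya', 'Naveen', 'Suman'],
                   'Age': [25, 30, 35, 28]})

# Write to a temporary directory so the sketch leaves nothing behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'rows.h5')

    # Save in table format (a choice made for this sketch)
    df.to_hdf(path, key='people', format='table', mode='w')

    # Read the rows at positions 1 and 2; stop behaves like a slice end
    subset = pd.read_hdf(path, key='people', start=1, stop=3)

print(subset)
```

Only the selected rows are loaded into memory, which can help when a quick look at part of a large dataset is enough.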

Return Value

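The Pandas read_hdf() method returns a Pandas object, such as a Series or DataFrame, containing the data from the HDF5 file. When iterator=True or chunksize is specified, it returns an iterator that yields such objects chunk by chunk.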
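As a small illustration of the return value, the sketch below stores a Series and a DataFrame in one file and checks the type of what comes back. The object contents, the file name 'objects.h5', and the keys 'series_data' and 'frame_data' are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample objects; names and values are illustrative only
series = pd.Series([1.5, 2.5, 3.5], name='readings')
frame = pd.DataFrame({'Name': ['Kiran', 'Priya'], 'Age': [25, 30]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'objects.h5')

    # Store both objects in the same file under different keys
    series.to_hdf(path, key='series_data', mode='w')
    frame.to_hdf(path, key='frame_data', mode='a')

    # The returned type mirrors what was stored under each key
    loaded_series = pd.read_hdf(path, key='series_data')
    loaded_frame = pd.read_hdf(path, key='frame_data')

print(type(loaded_series).__name__)
print(type(loaded_frame).__name__)
```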

Example: Reading a Simple HDF5 File

The following basic example shows how to read an entire HDF5 file using the pandas read_hdf() method. Because the file contains a single dataset, the key can be omitted when reading.

    import pandas as pd

    # Create a DataFrame
    data = {'Name': ['Kiran', 'Priya', 'Naveen'],
            'Age': [25, 30, 35],
            'City': ['New Delhi', 'Hyderabad', 'Chennai']}
    df = pd.DataFrame(data)

    # Save the DataFrame to an HDF5 file (mode='w' starts a fresh file,
    # so it holds a single dataset and no key is needed when reading)
    df.to_hdf('data.h5', key='dataset', mode='w')

    # Read the HDF5 file
    df = pd.read_hdf('data.h5')

    print("DataFrame from HDF5 File:")
    print(df)

Following is the output of the above code −

    DataFrame from HDF5 File:
         Name  Age       City
    0   Kiran   25  New Delhi
    1   Priya   30  Hyderabad
    2  Naveen   35    Chennai

Example: Reading HDF5 Data with Specific Key

This example shows how to read a specific dataset or table from an HDF5 file using the key parameter. It first saves two DataFrames to the "data.h5" file under the "dataset_1" and "dataset_2" keys, then retrieves one of them by its key.

    import pandas as pd

    # Create a DataFrame
    data = {'Name': ['Kiran', 'Priya', 'Naveen'],
            'Age': [25, 30, 35],
            'City': ['New Delhi', 'Hyderabad', 'Chennai']}
    df = pd.DataFrame(data)

    # Save DataFrame to HDF5 file under dataset_1 key
    df.to_hdf('data.h5', key='dataset_1')

    # Create a new DataFrame
    new_data = {'Name': ['Suman', 'Dev'], 'Score': [45, 76]}
    new_df = pd.DataFrame(new_data)

    # Append to existing HDF5 file under dataset_2 key
    new_df.to_hdf('data.h5', key='dataset_2', mode='a')

    # Reading specific key from HDF5 file
    result = pd.read_hdf('data.h5', key='dataset_1')

    print("DataFrame for Key 'dataset_1':")
    print(result)

While executing the above code, you will get the following output −

    DataFrame for Key 'dataset_1':
         Name  Age       City
    0   Kiran   25  New Delhi
    1   Priya   30  Hyderabad
    2  Naveen   35    Chennai
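When the keys stored in a file are not known in advance, one option is to open the file with pandas.HDFStore and list them before calling read_hdf(). The following is a minimal sketch of that idea; the file name 'store.h5' and the sample DataFrames are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data written under two keys
people = pd.DataFrame({'Name': ['Kiran', 'Priya'], 'Age': [25, 30]})
scores = pd.DataFrame({'Name': ['Suman', 'Dev'], 'Score': [45, 76]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'store.h5')
    people.to_hdf(path, key='dataset_1', mode='w')
    scores.to_hdf(path, key='dataset_2', mode='a')

    # Open the file read-only and list the keys it contains;
    # each key is reported with a leading '/'
    with pd.HDFStore(path, mode='r') as store:
        available_keys = store.keys()

    # Any of the listed keys can then be passed to read_hdf()
    result = pd.read_hdf(path, key='dataset_2')

print(available_keys)
print(result)
```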

Example: Querying Data While Reading HDF5 File

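Here is an example that filters data while reading an HDF5 file using the where parameter. Queries of this kind require the data to be saved in table format (format='table'), and the queried columns must be stored as data columns (data_columns=True).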

    import pandas as pd

    # Create a DataFrame
    data = {'Name': ['Kiran', 'Priya', 'Naveen'],
            'Age': [25, 30, 35],
            'City': ['New Delhi', 'Hyderabad', 'Chennai']}
    df = pd.DataFrame(data)

    # Save DataFrame to HDF5 file under dataset_1 key
    df.to_hdf('example_data.h5', format='table', key='dataset_1', data_columns=True)

    # Query the HDF5 data while reading
    result = pd.read_hdf('example_data.h5', 'dataset_1', where='Age < 32')

    print("Filtered DataFrame:")
    print(result)

Following is the output of the above code −

    Filtered DataFrame:
        Name  Age       City
    0  Kiran   25  New Delhi
    1  Priya   30  Hyderabad
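The where and columns parameters can also be combined in a single call. The sketch below assumes a file named 'query.h5' and a key 'people' (both hypothetical), marks only the Age column as a data column, and joins two conditions with & inside parentheses.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; the names and values are illustrative only
df = pd.DataFrame({'Name': ['Kiran', 'Priya', 'Naveen'],
                   'Age': [25, 30, 35],
                   'City': ['New Delhi', 'Hyderabad', 'Chennai']})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'query.h5')

    # Only the column used in the query needs to be a data column
    df.to_hdf(path, key='people', format='table',
              data_columns=['Age'], mode='w')

    # Keep rows with 25 < Age < 35 and load just two of the columns
    result = pd.read_hdf(path, key='people',
                         where='(Age > 25) & (Age < 35)',
                         columns=['Name', 'Age'])

print(result)
```

Filtering at read time means that rows and columns outside the selection are never loaded into the resulting DataFrame.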

Example: Reading Specific Columns

Here is another example that demonstrates how to load specific columns from an HDF5 file. For this, you can use the columns parameter of the read_hdf() method.

    import pandas as pd

    # Create a DataFrame
    data = {'Name': ['Kiran', 'Priya', 'Naveen'],
            'Age': [25, 30, 35],
            'City': ['New Delhi', 'Hyderabad', 'Chennai']}
    df = pd.DataFrame(data)

    # Save DataFrame to HDF5 file under dataset_1 key
    df.to_hdf('example_data.h5', format='table', key='dataset_1')

    # Reading specific columns from an HDF5 file
    df = pd.read_hdf('example_data.h5', key='dataset_1', columns=['Name', 'City'])

    print("DataFrame from HDF5 file with Specific Columns:")
    print(df)

Upon executing the above code you will get the following output −

    DataFrame from HDF5 file with Specific Columns:
         Name       City
    0   Kiran  New Delhi
    1   Priya  Hyderabad
    2  Naveen    Chennai

Example: Reading HDF5 Data in Chunks

You can use the chunksize parameter to read large datasets in smaller chunks. This requires the data to be saved in table format (format='table'). The following example demonstrates this.

    import pandas as pd

    # Create a DataFrame
    data = {'Name': ['Kiran', 'Priya', 'Naveen'],
            'Age': [25, 30, 35],
            'City': ['New Delhi', 'Hyderabad', 'Chennai']}
    df = pd.DataFrame(data)

    # Save DataFrame to HDF5 file under dataset_1 key
    df.to_hdf('example_data.h5', format='table', key='dataset_1')

    # Reading HDF5 data in chunks
    chunk_iterator = pd.read_hdf('example_data.h5', key='dataset_1', chunksize=1)

    for chunk in chunk_iterator:
        print("Chunk DataFrame:")
        print(chunk)

Following is the output of the above code −

    Chunk DataFrame:
        Name  Age       City
    0  Kiran   25  New Delhi
    Chunk DataFrame:
        Name  Age       City
    1  Priya   30  Hyderabad
    Chunk DataFrame:
         Name  Age     City
    2  Naveen   35  Chennai
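Chunked reading is mostly useful when each chunk is reduced to a small result before the next one is loaded. As a minimal sketch of that pattern, the code below computes an average over chunks of two rows; the file name 'chunks.h5', the key 'people', and the sample values are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; the names and values are illustrative only
df = pd.DataFrame({'Name': ['Kiran', 'Priya', 'Naveen', 'Suman', 'Dev'],
                   'Age': [25, 30, 35, 28, 42]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'chunks.h5')

    # Chunked reads need table format
    df.to_hdf(path, key='people', format='table', mode='w')

    total_age = 0
    row_count = 0

    # Each chunk is a DataFrame with at most two rows
    for chunk in pd.read_hdf(path, key='people', chunksize=2):
        total_age += int(chunk['Age'].sum())
        row_count += len(chunk)

average_age = total_age / row_count
print("Rows read:", row_count)
print("Average age:", average_age)
```

Only one chunk is held in memory at a time, so the same loop structure can be applied to files that are too large to load in one call.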