Python Pandas read_hdf() Method

The read_hdf() method in Python's Pandas library is used to read data from HDF5 (Hierarchical Data Format) files into a Pandas object such as Series or DataFrame. HDF5 is a widely used file format that supports the storage of large datasets, metadata, and heterogeneous data efficiently.

The read_hdf() method simplifies loading HDF5 data into Pandas for analysis and manipulation. It also provides options for querying and filtering the stored data while it is being read. The method reads only from the local file system; remote URLs and file-like objects are not supported. It relies on the optional PyTables package (installed as "tables").

Syntax

The syntax of the read_hdf() method is as follows −

 pandas.read_hdf(path_or_buf, key=None, mode='r', errors='strict', where=None, start=None, stop=None, columns=None, iterator=False, chunksize=None, **kwargs)

Parameters

The Python Pandas read_hdf() method accepts the following parameters −

path_or_buf: A string path, path object, or an open pandas.HDFStore to read from. Only the local file system is supported.
key: The identifier of the group (dataset or table) within the HDF5 file. It can be omitted if the file contains a single Pandas object.
mode: The mode used to open the file. Accepted values are 'r' (read-only, the default), 'r+' (read/write), and 'a' (append).
errors: Specifies how encoding and decoding errors are handled. The default is 'strict'.
where: Conditions used to filter the data, similar to an SQL WHERE clause. Works only with data stored in table format.
start: Row number at which to start the selection.
stop: Row number at which to stop the selection (the row at this position is not included).
columns: A list of column names to load from the HDF5 dataset.
iterator: If True, returns an iterator for reading the data in chunks.
chunksize: Number of rows per chunk. Setting it also makes the method return an iterator.
**kwargs: Additional keyword arguments passed to HDFStore.
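
The start and stop parameters are not covered by the examples further down, so here is a minimal sketch of how they might be used to load a range of rows by position. The file name rows.h5, the key people, and the sample values are placeholders chosen for illustration, and the file is written to a temporary directory so nothing is left behind.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; the values are for illustration only.
df = pd.DataFrame({"Age": [25, 30, 35, 28], "Score": [45, 76, 88, 52]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "rows.h5")  # placeholder file name

    # Store the data in table format under a placeholder key
    df.to_hdf(path, key="people", format="table", mode="w")

    # Rows are selected by position: start is included, stop is excluded
    middle = pd.read_hdf(path, key="people", start=1, stop=3)

print(middle)
```

Only the rows at positions 1 and 2 are returned, and they keep their original index labels.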

Return Value

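The Pandas read_hdf() method returns the object stored under the selected key, typically a Series or DataFrame. When iterator=True or chunksize is set, it returns an iterator over chunks instead.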

Example: Reading a Simple HDF5 File

The following basic example reads an entire HDF5 file using the pandas read_hdf() method. Because the file holds a single dataset, the key can be omitted when reading.

 import pandas as pd
 # Create a DataFrame
 data = {'Name': ['Kiran', 'Priya', 'Naveen'], 'Age': [25, 30, 35], 'City': ['New Delhi', 'Hyderabad', 'Chennai']}
 df = pd.DataFrame(data)
 # Save DataFrame to HDF5 file (mode='w' starts a fresh file, so it holds a single dataset)
 df.to_hdf('data.h5', key='dataset', mode='w')
 # Reading an HDF5 file (no key needed for a single dataset)
 df = pd.read_hdf('data.h5')
 print("DataFrame from HDF5 File:")
 print(df)

Following is the output of the above code −

 DataFrame from HDF5 File:

	Name	Age	City
0	Kiran	25	New Delhi
1	Priya	30	Hyderabad
2	Naveen	35	Chennai

Example: Reading HDF5 Data with Specific Key

This example shows how to read a specific dataset or table from an HDF5 file using the key parameter. It first saves two sets of data to the "data.h5" file under the "dataset_1" and "dataset_2" keys, then retrieves the data stored under one of them.

 import pandas as pd
 # Create a DataFrame
 data = {'Name': ['Kiran', 'Priya', 'Naveen'], 'Age': [25, 30, 35], 'City': ['New Delhi', 'Hyderabad', 'Chennai']}
 df = pd.DataFrame(data)
 # Save DataFrame to HDF5 file under dataset_1 key
 df.to_hdf('data.h5', key='dataset_1')
 # Create a new DataFrame
 new_data = {'Name': ['Suman', 'Dev'], 'Score': [45, 76]}
 new_df = pd.DataFrame(new_data)
 # Append to existing HDF5 file under dataset_2 key
 new_df.to_hdf('data.h5', key='dataset_2', mode='a')
 # Reading specific key from HDF5 file
 result = pd.read_hdf('data.h5', key='dataset_1')
 print("DataFrame for Key 'dataset_1':")
 print(result)

While executing the above code, you will get the following output −

 DataFrame for Key 'dataset_1':

	Name	Age	City
0	Kiran	25	New Delhi
1	Priya	30	Hyderabad
2	Naveen	35	Chennai
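
When the keys in a file are not known ahead of time, one possible approach is to open the file as an HDFStore, list its keys, and pass the open store to read_hdf(). The sketch below assumes a temporary file named multi.h5 and uses made-up numeric data; the key names simply mirror the ones used above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; the values are for illustration only.
ages = pd.DataFrame({"Age": [25, 30, 35]})
scores = pd.DataFrame({"Score": [45, 76]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "multi.h5")  # placeholder file name

    # Write two datasets into the same file under different keys
    ages.to_hdf(path, key="dataset_1", mode="w")
    scores.to_hdf(path, key="dataset_2", mode="a")

    # Open the file once, list the stored keys, then read each one
    with pd.HDFStore(path, mode="r") as store:
        available_keys = sorted(store.keys())
        frames = {key: pd.read_hdf(store, key=key) for key in available_keys}

print(available_keys)
for key, frame in frames.items():
    print(key)
    print(frame)
```

The keys reported by the store carry a leading slash (for example "/dataset_1"), and read_hdf() accepts them in that form.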

Example: Querying Data While Reading HDF5 File

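The following example filters data while reading an HDF5 file using the where parameter. Querying requires the data to be stored in table format, and the columns used in the condition must be saved as data columns (here, data_columns=True makes every column queryable).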

 import pandas as pd
 # Create a DataFrame
 data = {'Name': ['Kiran', 'Priya', 'Naveen'], 'Age': [25, 30, 35], 'City': ['New Delhi', 'Hyderabad', 'Chennai']}
 df = pd.DataFrame(data)
 # Save DataFrame to HDF5 file under dataset_1 key
 df.to_hdf('example_data.h5', format='table', key='dataset_1', data_columns=True)
 # Reading HDF5 data while Querying
 result = pd.read_hdf('example_data.h5', 'dataset_1', where='Age < 32')
 print("Filtered DataFrame:")
 print(result)

Following is the output of the above code −

 Filtered DataFrame:

	Name	Age	City
0	Kiran	25	New Delhi
1	Priya	30	Hyderabad
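
Conditions in the where parameter can also be combined. The following is a minimal sketch, assuming made-up numeric data and a temporary file named query.h5, that stores only the queried columns as data columns and joins two conditions with the & operator.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; the values are for illustration only.
df = pd.DataFrame({"Age": [25, 30, 35, 28], "Score": [45, 76, 88, 52]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "query.h5")  # placeholder file name

    # Table format is required for querying; list the columns to query on
    df.to_hdf(path, key="dataset_1", format="table",
              data_columns=["Age", "Score"], mode="w")

    # Both conditions must hold for a row to be returned
    result = pd.read_hdf(path, key="dataset_1",
                         where="Age < 32 & Score > 50")

print(result)
```

The filtering happens while the file is read, so rows that fail the condition are never loaded into the resulting DataFrame.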

Example: Reading Specific Columns

The following example demonstrates how to load only specific columns from an HDF5 file by using the columns parameter of the read_hdf() method.

 import pandas as pd
 # Create a DataFrame
 data = {'Name': ['Kiran', 'Priya', 'Naveen'], 'Age': [25, 30, 35], 'City': ['New Delhi', 'Hyderabad', 'Chennai']}
 df = pd.DataFrame(data)
 # Save DataFrame to HDF5 file under dataset_1 key
 df.to_hdf('example_data.h5', format='table', key='dataset_1')
 # Reading specific columns from a HDF5 file
 df = pd.read_hdf('example_data.h5', key='dataset_1', columns=['Name', 'City'])
 print("DataFrame from HDF5 file with Specific Columns:")
 print(df)

Upon executing the above code you will get the following output −

 DataFrame from HDF5 file with Specific Columns:

	Name	City
0	Kiran	New Delhi
1	Priya	Hyderabad
2	Naveen	Chennai

Example: Reading HDF5 Data in Chunks

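You can use the chunksize parameter to read large datasets in smaller chunks. When chunksize is set, read_hdf() returns an iterator that yields one DataFrame per chunk; this requires the data to be stored in table format. The following example demonstrates this.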

 import pandas as pd
 # Create a DataFrame
 data = {'Name': ['Kiran', 'Priya', 'Naveen'], 'Age': [25, 30, 35], 'City': ['New Delhi', 'Hyderabad', 'Chennai']}
 df = pd.DataFrame(data)
 # Save DataFrame to HDF5 file under dataset_1 key
 df.to_hdf('example_data.h5', format='table', key='dataset_1')
 # Reading HDF5 data in chunks
 chunk_iterator = pd.read_hdf('example_data.h5', key='dataset_1', chunksize=1)
 for chunk in chunk_iterator:
     print("Chunk DataFrame:")
     print(chunk)

Following is the output of the above code −

 Chunk DataFrame:

	Name	Age	City
0	Kiran	25	New Delhi

Chunk DataFrame:

	Name	Age	City
1	Priya	30	Hyderabad

Chunk DataFrame:

	Name	Age	City
2	Naveen	35	Chennai
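
Chunked reading is mostly useful when a result can be built up piece by piece. As a rough sketch, assuming made-up numeric data and a temporary file named chunks.h5, the snippet below accumulates a column total without holding the whole dataset in memory at once.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; the values are for illustration only.
df = pd.DataFrame({"Age": [25, 30, 35, 28, 41]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "chunks.h5")  # placeholder file name

    # Chunked reading needs the data to be stored in table format
    df.to_hdf(path, key="dataset_1", format="table", mode="w")

    total_age = 0
    chunk_sizes = []
    # Each iteration yields a DataFrame with at most two rows
    for chunk in pd.read_hdf(path, key="dataset_1", chunksize=2):
        total_age += int(chunk["Age"].sum())
        chunk_sizes.append(len(chunk))

print("Rows per chunk:", chunk_sizes)
print("Total age:", total_age)
```

The last chunk may hold fewer rows than chunksize when the row count is not an exact multiple of it.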
