SciPy - Working With Different File Formats



SciPy is versatile when it comes to working with different file formats. Beyond the .mat format handled by scipy.io and NumPy's .npy and .npz formats, the SciPy ecosystem supports other file types such as text files, CSV files, images and sound files, which are commonly encountered in scientific computing, data analysis and machine learning.

Let's have a look at how to use SciPy with these various file formats in detail −

Text and CSV Files

Text and CSV files are among the most common formats for storing and exchanging tabular data. NumPy, on which SciPy is built, provides efficient tools for reading from and writing to these formats, making it easy to handle datasets for scientific analysis and machine learning.

Reading Text and CSV Files

SciPy does not ship its own text loaders. The numpy.loadtxt() and numpy.genfromtxt() functions are used for loading data from text and CSV files, and the arrays they return work directly with SciPy routines.

Using numpy.loadtxt()

loadtxt() is suitable for well-formatted numeric data with no missing values. It loads data directly into a NumPy array, which can then be used for analysis. Here is an example of using the numpy.loadtxt() function −

    import numpy as np

    # Load data from a CSV file with a comma delimiter
    data = np.loadtxt('data.csv', delimiter=',')
    print(data)

The output depends on the contents of data.csv. By default, loadtxt() returns a NumPy array of floats, which is two-dimensional when the file contains several rows and columns.
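Real CSV files often begin with a header row or contain columns that are not needed. As a minimal sketch, the snippet below reads from an in-memory string instead of a file (the sample values and column names are made up for illustration) and uses the skiprows and usecols parameters of np.loadtxt() to skip the header and select two columns.

```python
import io
import numpy as np

# Hypothetical CSV content, held in memory so the example needs no file on disk
csv_text = io.StringIO(
    "id,height,weight\n"
    "1,1.70,65.0\n"
    "2,1.82,80.5\n"
    "3,1.65,58.2\n"
)

# Skip the header row and keep only the height and weight columns
data = np.loadtxt(csv_text, delimiter=",", skiprows=1, usecols=(1, 2))

print(data.shape)         # (3, 2)
print(data.mean(axis=0))  # mean of each selected column
```

The same call works with a file path in place of the StringIO object.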

Using numpy.genfromtxt()

genfromtxt() is more versatile: it handles missing values and various data types such as strings and floats, which makes it a good fit for text files with inconsistent data. Below is an example that handles the missing values in a .csv file −

    import numpy as np

    # Load data, filling missing values with zero
    data = np.genfromtxt('/files/data_with_missing.csv', delimiter=',', filling_values=0)
    print(data)

Following is the output of handling the missing values using the numpy.genfromtxt() function −

    [[ 0.  0.  0.  0.]
     [ 0.  0.  0. 88.]
     [ 0. 27.  0. 92.]
     [ 0. 22.  0. 95.]
     [ 0.  0.  0. 70.]]
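To see what filling_values changes, the following sketch reads the same text twice, once with the default behaviour and once with a fill value. The CSV content is a made-up in-memory sample, not the file used above.

```python
import io
import numpy as np

# Hypothetical CSV text with two empty fields
csv_text = "1,2,3\n4,,6\n7,8,\n"

# By default, missing numeric fields are read as nan
raw = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

# filling_values replaces the missing fields with a chosen value
filled = np.genfromtxt(io.StringIO(csv_text), delimiter=",", filling_values=0)

print(np.isnan(raw).sum())  # number of missing entries
print(filled)
```

Keeping the nan values is often preferable when the missing entries should be excluded from later statistics, for example with np.nanmean().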

Writing Text and CSV Files

SciPy itself doesn't provide direct functions for writing text and CSV files, so we typically rely on NumPy's savetxt() function or Python's built-in csv module.

Using NumPy's savetxt with SciPy Data

np.savetxt() is versatile and supports writing arrays to text and CSV files, allowing control over delimiters, formatting and headers. Following is an example of using NumPy's savetxt() function −

    import numpy as np

    # Sample 2D array
    data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

    # Save as CSV with comma delimiter
    np.savetxt('/files/output.csv', data, delimiter=',', fmt='%d')
    print("The file has been updated")

Following is the output of writing the data to the CSV file −

 The file has been updated 
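Python's built-in csv module, mentioned at the start of this section, is an alternative when a header row or non-array rows have to be written. The sketch below is one possible approach; the column names and the temporary file location are assumptions made for illustration.

```python
import csv
import os
import tempfile
import numpy as np

data = np.array([[1.5, 2.25], [3.0, 4.75]])

# Write to a temporary directory so the sketch does not touch real files
path = os.path.join(tempfile.mkdtemp(), "output.csv")

with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["x", "y"])      # header row (hypothetical column names)
    writer.writerows(data.tolist())  # one CSV row per array row

# Read the file back with NumPy, skipping the header
loaded = np.loadtxt(path, delimiter=",", skiprows=1)
print(loaded)
```

np.savetxt() can write a similar header through its header and comments parameters, so the csv module is mainly useful when rows mix text and numbers.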

Working with Pandas for Advanced CSV Operations

For complex CSV files, or files with headers and mixed data types, pandas is a useful library that offers additional features such as parsing dates and filtering columns by name. Here is an example that reads and writes a CSV file with pandas −

    import pandas as pd

    # Reading a CSV file with headers
    df = pd.read_csv('/files/data.csv')

    # Writing a DataFrame to CSV
    df.to_csv('/files/pandas_output.csv', index=False)
    print("The file has been updated")

Following is the output of reading and writing the CSV file with pandas −

 The file has been updated 
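The date parsing and column filtering mentioned above can be sketched as follows. The CSV content and column names are made up for illustration and read from memory. usecols and parse_dates are parameters of pd.read_csv().

```python
import io
import pandas as pd

# Hypothetical CSV content with a header, a date column and mixed types
csv_text = io.StringIO(
    "date,city,temperature\n"
    "2024-01-01,Oslo,-3.5\n"
    "2024-01-02,Rome,12.0\n"
    "2024-01-03,Cairo,19.5\n"
)

# Parse the date column and load only the columns named in usecols
df = pd.read_csv(csv_text, usecols=["date", "temperature"], parse_dates=["date"])
print(df.dtypes)

# Convert a numeric column to a NumPy array for use with NumPy or SciPy
temperatures = df["temperature"].to_numpy()
print(temperatures.mean())
```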

Image Files

SciPy's scipy.ndimage module processes images once they are held as NumPy arrays. The image reading and writing helpers that older SciPy releases exposed in scipy.misc have been removed, so external libraries such as Pillow or imageio are used for reading and writing images, which can then be converted to NumPy arrays for use with SciPy.

Reading Images

We can use the imageio or Pillow libraries to load images and convert them to NumPy arrays. Following is an example of loading an image using imageio −

    import imageio
    import numpy as np

    # Read an image
    image = imageio.v2.imread('/Images/2d_fft.jpeg')
    print(image.shape)  # Check dimensions of the image array

Here is the output of reading the image using the imageio library −

 (400, 1200, 3) 
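Once an image is available as a NumPy array, it can be passed to scipy.ndimage. The sketch below uses a randomly generated array in place of a loaded file, so the image content is an assumption. It converts the array to grayscale and smooths it with ndimage.gaussian_filter().

```python
import numpy as np
from scipy import ndimage

# Synthetic RGB image standing in for a loaded file: height 40, width 60, 3 channels
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(40, 60, 3), dtype=np.uint8)

# Convert to grayscale by averaging the colour channels
gray = image.mean(axis=2)

# Smooth the image with a Gaussian filter from scipy.ndimage
smoothed = ndimage.gaussian_filter(gray, sigma=2)

print(gray.shape, smoothed.shape)
print(gray.std(), smoothed.std())  # smoothing reduces the pixel-to-pixel variation
```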

Writing Images

To save a NumPy array as an image, we can use the imageio.imwrite() function. Here is an example that saves an array as an image with the help of imwrite() −

    from imageio import imwrite
    import numpy as np

    image = np.array([
        [[255, 0, 0], [255, 128, 0], [255, 255, 0], [128, 255, 0], [0, 255, 0]],
        [[0, 255, 128], [0, 255, 255], [0, 128, 255], [0, 0, 255], [128, 0, 255]],
        [[255, 0, 255], [128, 128, 128], [0, 0, 0], [128, 128, 128], [255, 0, 255]],
        [[128, 0, 255], [0, 0, 255], [0, 128, 255], [0, 255, 255], [0, 255, 128]],
        [[0, 255, 0], [128, 255, 0], [255, 255, 0], [255, 128, 0], [255, 0, 0]]
    ], dtype=np.uint8)

    # Save a NumPy array as an image
    imwrite('/Images/output.png', image)

When we execute the above code, the array is saved as an image that we can check in the specified location.

Sound Files

SciPy provides scipy.io.wavfile to read and write .wav audio files, a popular format for storing uncompressed sound data. For more complex audio formats, we can consider using the soundfile or librosa libraries.

Reading WAV files

To read an audio file, the scipy.io.wavfile.read() function can be used. This function returns the sample rate and the audio data as a NumPy array in which each element represents a sample of the audio waveform. Here is an example that reads the given input audio −

    from scipy.io import wavfile

    # Read the WAV file
    sampling_rate, audio_data = wavfile.read('/files/sample-3s.wav')

    # Display sampling rate and audio data details
    # Frequency of audio samples
    print(f"Sampling Rate: {sampling_rate} Hz")
    # Shape of the array, e.g., (n_samples,) or (n_samples, n_channels)
    print(f"Audio Data Shape: {audio_data.shape}")
    # Type of the data, often int16 or float32
    print(f"Data Type: {audio_data.dtype}")

Here is the output of reading the .wav audio file −

    Sampling Rate: 44100 Hz
    Audio Data Shape: (140928, 2)
    Data Type: int16
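After reading, the integer samples are often converted to floats and the duration is derived from the array shape. The following sketch first writes a short stereo tone to a temporary file so that it is self-contained. The tone, the 8000 Hz rate and the file name are assumptions for illustration.

```python
import os
import tempfile
import numpy as np
from scipy.io import wavfile

# Create a short 16-bit stereo file so the sketch is self-contained
rate = 8000
t = np.arange(rate) / rate  # one second of sample times
tone = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)
stereo = np.column_stack([tone, tone])
path = os.path.join(tempfile.mkdtemp(), "tone.wav")
wavfile.write(path, rate, stereo)

# Read it back and derive some common quantities
sampling_rate, audio_data = wavfile.read(path)
duration = audio_data.shape[0] / sampling_rate         # length in seconds
normalized = audio_data.astype(np.float32) / 32768.0   # scale int16 to [-1, 1)
mono = normalized.mean(axis=1)                         # average the two channels

print(duration, mono.shape, np.abs(mono).max())
```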

Writing WAV files

We can create or modify audio data and save it back to a .wav file using scipy.io.wavfile.write(). Here is an example that generates a sine wave and saves it −

    import numpy as np
    from scipy.io.wavfile import write

    # Set parameters
    sampling_rate = 44100  # 44.1 kHz standard sampling rate
    duration = 2           # 2 seconds
    frequency = 440        # 440 Hz tone (A4 note)

    # Generate a sine wave
    t = np.linspace(0, duration, int(sampling_rate * duration), endpoint=False)
    audio_data = 0.5 * np.sin(2 * np.pi * frequency * t)

    # Save the sine wave as a .wav file
    write('/files/generated_audio.wav', sampling_rate, audio_data.astype(np.float32))
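The example above stores 32-bit float samples, because wavfile.write() chooses the sample format from the dtype of the array. When 16-bit PCM is wanted instead, the float signal can be scaled to the int16 range first. This is a minimal sketch that writes to a temporary file chosen for illustration.

```python
import os
import tempfile
import numpy as np
from scipy.io import wavfile

sampling_rate = 44100
duration = 1
t = np.linspace(0, duration, sampling_rate * duration, endpoint=False)
audio = 0.5 * np.sin(2 * np.pi * 440 * t)  # float samples in [-0.5, 0.5]

# Scale to the int16 range; the array dtype selects the WAV sample format
pcm = np.int16(np.clip(audio, -1.0, 1.0) * 32767)

path = os.path.join(tempfile.mkdtemp(), "tone_int16.wav")
wavfile.write(path, sampling_rate, pcm)

# Read the file back to confirm the stored format
rate, data = wavfile.read(path)
print(rate, data.dtype, data.shape)
```

Clipping before the conversion guards against integer wrap-around if the signal ever exceeds the range from -1 to 1.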

HDF5 Files

HDF5 is a popular format for large datasets. While SciPy itself doesn't directly support HDF5, we can use the h5py library to work with these files. HDF5 allows hierarchical organization of data and is particularly useful for machine learning and high-performance computing.

Here is an example of writing and reading data in an HDF5 file −

    import h5py
    import numpy as np

    # Write data to an HDF5 file
    with h5py.File('data.h5', 'w') as file:
        file.create_dataset('array', data=np.array([1, 2, 3]))

    # Read data from an HDF5 file
    with h5py.File('data.h5', 'r') as file:
        array = file['array'][:]
        print(array)

Below is the output of writing to and reading from the .h5 file −

 [1 2 3] 
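The hierarchical organization mentioned above can be sketched with h5py groups and attributes. The group name, dataset name, attribute and temporary file path below are hypothetical and chosen only for illustration.

```python
import os
import tempfile
import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "experiment.h5")

with h5py.File(path, "w") as f:
    # Groups behave like folders inside the file
    run = f.create_group("run_1")
    dset = run.create_dataset("signal", data=np.arange(10.0))
    dset.attrs["units"] = "volts"  # metadata stored alongside the dataset

with h5py.File(path, "r") as f:
    signal = f["run_1/signal"][:]  # path-style access into the hierarchy
    units = f["run_1/signal"].attrs["units"]

print(signal.sum(), units)
```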

JSON and XML Files

For hierarchical or structured data, JSON and XML formats are often used. SciPy doesn't have native support for these, but Python's built-in json and xml libraries can parse them, and we can convert the parsed data into NumPy arrays if needed.

Following is an example of writing and reading JSON data −

    import json
    import numpy as np

    # Write JSON data
    data = {'array': [1, 2, 3]}
    with open('data.json', 'w') as f:
        json.dump(data, f)

    # Read JSON data
    with open('data.json', 'r') as f:
        data = json.load(f)
    array = np.array(data['array'])
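XML data can be handled in a similar way with the standard library's xml.etree.ElementTree module. The sketch below builds a small XML document from an array and parses it back. The tag names data and value are hypothetical.

```python
import xml.etree.ElementTree as ET
import numpy as np

values = np.array([1.5, 2.5, 3.5])

# Build a small XML document with one <value> element per array entry
root = ET.Element("data")
for v in values.tolist():
    ET.SubElement(root, "value").text = str(v)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Parse the XML text back into a NumPy array
parsed = ET.fromstring(xml_text)
array = np.array([float(node.text) for node in parsed.findall("value")])
print(array)
```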