Python data I/O Cheat Sheet

This Cheat Sheet is a collection of common Python data I/O functions.

Built-in Python I/O functions

Python’s built-in I/O functions can handle two types of files: plain text files and binary files.

Open and/or create a file with open():

filename = 'my_data_file.txt'
file_object = open(filename, mode='r')  # open a file
text = file_object.read()               # reading file content and 
                                        # saving it into a variable
file_object.close()                     # closes the open file 
                                        # (always necessary!)
print(text)                             # show file content

The mode argument defines how the file is opened. Available modes are:

mode | description
r | read-only mode (default); reads from the beginning of the file; raises an error if the file does not exist
r+ | read-and-write mode; reads from the beginning of the file; raises an error if the file does not exist
w | write-only mode; overwrites any existing file with the same name; creates a new file if none exists
w+ | read-and-write mode; overwrites any existing file with the same name; creates a new file if none exists
a | append mode; new entries are written at the end of the file (i.e., after any existing data); creates a new file if none exists
a+ | append-and-read mode; new entries are appended at the end of the file; creates a new file if none exists
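The difference between the modes is easiest to see by trying them out. The following is a minimal sketch, assuming a throwaway file in a temporary directory (the file names are hypothetical); it uses the with statement so that each file is closed automatically:

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, mode="w") as f:     # "w" creates the file
    f.write("first\n")
with open(path, mode="a") as f:     # "a" appends after the existing data
    f.write("second\n")
with open(path, mode="r") as f:
    after_append = f.read()

with open(path, mode="w") as f:     # "w" overwrites the existing content
    f.write("third\n")
with open(path, mode="r") as f:
    after_overwrite = f.read()

# "r" raises an error if the file does not exist.
missing = os.path.join(os.path.dirname(path), "does_not_exist.txt")
try:
    open(missing, mode="r")
    raised = False
except FileNotFoundError:
    raised = True

print(repr(after_append), repr(after_overwrite), raised)
```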

You can check whether your file is still open or not via:

print(file_object.closed) 
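Forgetting close() is a common mistake. A widely used alternative is the with statement, which closes the file automatically when the block is left, even if an error occurs. A minimal sketch, assuming a hypothetical file in a temporary directory:

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "my_data_file.txt")

with open(path, mode="w") as file_object:
    file_object.write("a word\n")
    closed_inside = file_object.closed   # still open inside the block

closed_outside = file_object.closed      # closed automatically on exit

with open(path, mode="r") as file_object:
    text = file_object.read()

print(closed_inside, closed_outside, repr(text))
```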

File handle: Files are opened or created with an internal file handle. The file handle acts like a pointer that marks the position in the file at which data will be read or written.
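The current position of the handle can be queried with tell() and changed with seek(). The sketch below is only an illustration; it assumes a hypothetical temporary file and opens it in binary mode ('rb'), where positions are plain byte offsets:

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "handle_demo.bin")

with open(path, mode="wb") as f:
    f.write(b"abcdef")

with open(path, mode="rb") as f:
    start = f.tell()      # position right after opening the file
    first = f.read(3)     # reads three bytes and advances the handle
    middle = f.tell()     # position after the partial read
    rest = f.read()       # reads up to the end of the file
    end = f.tell()        # position at the end of the file
    f.seek(0)             # rewind to the beginning
    again = f.read(2)

print(start, first, middle, rest, end, again)
```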

Write to a new or existing file with write():

file_object = open(filename, mode='w+')

entry = "a word"
file_object.write(entry + "\n")         # writes the content of entry as a
                                        # single line to the text file
L = ["word 1\n", "word 2\n", "word 3\n"]
file_object.writelines(L)               # writes a list of entries in one
                                        # call; "\n" forces a line break
                                        # after each entry in L
file_object.close()

Read from an existing file with read(), readline() and readlines():

file_object = open(filename, mode='r')

print(file_object.read())           # reads all lines of the file and puts
                                    # the current handle at the end of the 
                                    # file

file_object.seek(0)                 # rewinds the file and puts the current  
                                    # handle at the beginning of the file

N_lines = len(file_object.readlines())  # get the number of lines in the file
file_object.seek(0)                     # rewind again before reading
for line in range(N_lines):
  print(file_object.readline())     # reads one line at the current handle 
                                    # and puts the handle to the next line
  
file_object.seek(0)
 
print(file_object.readlines())      # reads all lines at once, stores them 
                                    # in a list and puts the handle at 
                                    # the end of the file

file_object.close()

Of course, you can read from a file and store the read entries into a variable:

file_object = open(filename, mode='r')

# by looping over individual lines:
my_data_list = []
N_lines = len(file_object.readlines())  
file_object.seek(0)
for line in range(N_lines):
  my_data_list.append(file_object.readline())
file_object.seek(0)
  
# or via the readlines() command:
my_data_list2 = file_object.readlines()
  
file_object.close()
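The loop above counts the lines first and then rewinds the file. A file object can also be iterated over directly, which yields one line per step and avoids the extra pass. A minimal sketch, assuming a hypothetical temporary file with three lines:

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "my_data_file.txt")

with open(path, mode="w") as file_object:
    file_object.writelines(["word 1\n", "word 2\n", "word 3\n"])

# Iterating over the file object yields one line at a time.
with open(path, mode="r") as file_object:
    my_data_list = [line.rstrip("\n") for line in file_object]

print(my_data_list)
```

Here rstrip("\n") removes the trailing line break of each entry; leave it out to keep the lines exactly as readlines() would return them.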

NumPy I/O functions

The following table contains the most common NumPy data I/O functions:

command | description | command | description
np.save() | write an array to a binary .npy file | np.load() | load arrays or pickled objects from .npy, .npz or pickled files
np.savetxt() | write an array to a text file | np.loadtxt() | load data from a text file of one data type
 | | np.genfromtxt() | load data from a text file of mixed data types

Examples:

import numpy as np
array_2D = np.random.random((20,2))

# save and load a binary npy-file:
filename_npy = "my_array.npy"
np.save(filename_npy, array_2D)
array_2D_from_file_npy = np.load(filename_npy)
print(array_2D_from_file_npy)

# save and load a human-readable txt-file for arrays of one data type:
filename_csv = "my_array.csv"
np.savetxt(filename_csv, array_2D, delimiter=", ",
           header="x, y")           # writes the header line "# x, y"
                                    # ("x" and "y" are example names)
array_2D_from_file_csv = np.loadtxt(filename_csv, delimiter=", ",
                                    skiprows=0, dtype=float)
print(array_2D_from_file_csv)       # the "#" header line is skipped

# load a human-readable txt-file for arrays of mixed data type:
array_2D_from_file_csv = np.genfromtxt(filename_csv, delimiter=", ",
                                       names=True)  # look for column header
print(array_2D_from_file_csv)
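To illustrate what "mixed data type" means for np.genfromtxt(), here is a minimal sketch that reads a small file with one text column and one numeric column. The file content and the column names are made up for this example; dtype=None asks NumPy to infer a type per column:

```python
import os
import tempfile

import numpy as np

# Hypothetical CSV file with a header line, a text column and a float column.
path = os.path.join(tempfile.mkdtemp(), "mixed.csv")
with open(path, mode="w") as f:
    f.write("name,value\n")
    f.write("a,1.5\n")
    f.write("b,2.5\n")

# names=True takes the column names from the first line,
# dtype=None lets NumPy determine the type of each column.
data = np.genfromtxt(path, delimiter=",", names=True,
                     dtype=None, encoding="utf-8")

print(data.dtype.names)
print(data["name"], data["value"])
```

The result is a structured array, so columns are accessed by name rather than by index.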

Note: while np.savetxt() creates human-readable text files, np.save() creates so-called npy files. These files cannot be read by, e.g., standard text editors. However, they bring some advantages:

  • NumPy arrays are saved with the full information needed to reconstruct them, including shape and dtype, even on a machine with a different architecture
  • the npy format is straightforward to reverse engineer, e.g., to write a new npy reader if the program that created the file no longer exists
  • the format allows memory-mapping of the data
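Two of these points can be checked with a few lines of code. The sketch below assumes a hypothetical temporary file; it saves a float32 array, verifies that shape and dtype survive the round trip, and opens the same file memory-mapped via the mmap_mode argument of np.load():

```python
import os
import tempfile

import numpy as np

# Hypothetical file inside a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "my_array.npy")

array_2D = np.arange(40, dtype=np.float32).reshape(20, 2)
np.save(path, array_2D)

# Shape and dtype are stored in the file and restored on loading.
loaded = np.load(path)

# With mmap_mode, the data stays on disk and is read only when accessed.
mapped = np.load(path, mmap_mode="r")
first_rows = np.array(mapped[:3])   # copies just the first three rows

print(loaded.shape, loaded.dtype, type(mapped).__name__)
```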

Note: A full overview of available NumPy I/O functions can be found on the NumPy documentation website.

Pandas I/O functions

The following table contains the most common Pandas data I/O functions:

command | description | command | description
DataFrame.to_excel() | write DataFrame to an Excel sheet | pd.read_excel() | read an Excel sheet into DataFrame
DataFrame.to_csv() | write DataFrame to a CSV file | pd.read_csv() | read a CSV file into DataFrame
DataFrame.to_hdf() | write DataFrame to an HDF store | pd.read_hdf() | read DataFrame from an HDF store
DataFrame.to_pickle() | pickle (serialize) DataFrame to file | pd.read_pickle() | load pickled Pandas object (or any object) from file

Examples:

import pandas as pd
import numpy as np
array_2D = np.random.random((20,2))
df = pd.DataFrame(data=array_2D, columns=["Column 1", "Column 2"])

# saving a Pandas DataFrame to file:
df.to_excel("my_array.xlsx")                  # needs an Excel engine, e.g. openpyxl
df.to_csv("my_array.csv", mode="w")
df.to_hdf("my_array.h5", key='df', mode="w")  # needs the PyTables package
df.to_pickle("my_array.pkl")

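# loading a Pandas DataFrame from file:
# (index_col=0 restores the index that to_excel() and to_csv() wrote as
# the first column; read_csv() has no mode argument)
df_read = pd.read_excel("my_array.xlsx", index_col=0)
df_read = pd.read_csv("my_array.csv", index_col=0)
df_read = pd.read_hdf("my_array.h5", key='df')
df_read = pd.read_pickle("my_array.pkl")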
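Because to_csv() writes the DataFrame index as an unnamed first column by default, a plain read_csv() returns one column more than was saved. The following minimal sketch, with made-up values and a hypothetical temporary file, shows the difference:

```python
import os
import tempfile

import pandas as pd

# Hypothetical file inside a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "my_array.csv")

df = pd.DataFrame({"Column 1": [0.1, 0.2], "Column 2": [0.3, 0.4]})
df.to_csv(path)                                # index is written as first column

df_plain = pd.read_csv(path)                   # index becomes a data column
df_indexed = pd.read_csv(path, index_col=0)    # index is restored

print(list(df_plain.columns))
print(list(df_indexed.columns))
```

Alternatively, df.to_csv(path, index=False) skips writing the index in the first place.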

Note: A full overview of available Pandas I/O functions can be found on the Pandas documentation website.

MATLAB files

import scipy.io
filename = 'workspace.mat'
mat = scipy.io.loadmat(filename)
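scipy.io.loadmat() returns a dictionary that maps MATLAB variable names to NumPy arrays, plus a few metadata entries whose keys start with a double underscore. A minimal round-trip sketch, assuming a hypothetical temporary file and an example variable named x:

```python
import os
import tempfile

import numpy as np
import scipy.io

# Hypothetical file inside a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "workspace.mat")

# Write a small example workspace with one variable called "x".
scipy.io.savemat(path, {"x": np.array([1.0, 2.0, 3.0])})

mat = scipy.io.loadmat(path)

# Skip metadata entries such as "__header__" and "__version__".
variable_names = [key for key in mat if not key.startswith("__")]
x = mat["x"]

print(variable_names, x.shape)
```

Note that the one-dimensional input comes back as a two-dimensional array, since MATLAB variables are at least 2D.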

HDF5 files

http://www.h5py.org

import h5py
filename = 'my_hdf_file.h5'
data = h5py.File(filename, 'r')
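An h5py.File object behaves much like a dictionary of datasets and groups, and it can be used in a with statement so that the file is closed automatically. A minimal sketch, assuming a hypothetical temporary file and an example dataset named measurements:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file inside a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "my_hdf_file.h5")

# Write a small example dataset.
with h5py.File(path, "w") as f:
    f.create_dataset("measurements", data=np.arange(6).reshape(2, 3))

# Read it back: list the top-level names and load one dataset into memory.
with h5py.File(path, "r") as data:
    dataset_names = list(data.keys())
    values = data["measurements"][()]   # [()] reads the whole dataset

print(dataset_names, values.shape)
```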

Pickled files

https://docs.python.org/3/library/pickle.html

import pickle
with open('pickled_data_file.pkl', 'rb') as file:
    pickled_data = pickle.load(file)
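For completeness, here is a minimal round-trip sketch that first writes a pickle file with pickle.dump() and then loads it again. The file location and the example dictionary are made up for this illustration:

```python
import os
import pickle
import tempfile

# Hypothetical file inside a temporary directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "pickled_data_file.pkl")

original = {"name": "run 1", "values": [1, 2, 3]}

with open(path, "wb") as file:      # pickle files are binary: use "wb"/"rb"
    pickle.dump(original, file)

with open(path, "rb") as file:
    pickled_data = pickle.load(file)

print(pickled_data)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.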
