Python File Reading: Essential Guide for AI/ML Data

8.4 Reading Files in Python

Handling files is a fundamental skill in Python programming. Whether you're reading simple text files or processing binary data, Python provides built-in functions and methods to make input/output (I/O) operations efficient and developer-friendly.

This guide covers essential techniques for reading files in Python, including:

  • Opening files for reading.
  • Reading entire files, single lines, and multiple lines.
  • Efficient file handling using the with statement.
  • Reading binary files.
  • Working with binary data for integers and floating-point numbers.
  • Simultaneous reading and writing.
  • Controlling file operations with file pointers.

Opening a File for Reading

To read a file, use Python's built-in open() function. It takes the filename and an optional mode; the default mode is 'r', which opens the file for reading as text.

Syntax:

file_object = open('filename.txt', 'r')

If the specified file does not exist, Python will raise a FileNotFoundError.
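
The error can be caught where a missing file is an expected situation. The following is a minimal sketch, not a prescribed pattern; the helper name read_text_or_default and the file name missing.txt are hypothetical and chosen only for illustration.

```python
import os
import tempfile


def read_text_or_default(path, default=""):
    """Return the file's text, or `default` if the file does not exist."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        # Raised by open() in 'r' mode when the path does not exist
        return default


# Build a path that is guaranteed not to exist (hypothetical file name)
with tempfile.TemporaryDirectory() as tmp:
    missing_path = os.path.join(tmp, "missing.txt")
    result = read_text_or_default(missing_path, default="(no data)")

print(result)
```

Here the call falls back to the default string instead of stopping the program with a traceback.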

Reading File Content

Python offers several methods to read data from an opened file object.

Reading the Entire File (read())

The read() method reads the entire content of a file into a single string. This is suitable for smaller files that can fit into memory.

Syntax:

file.read(size)
  • size: (Optional) The maximum number of characters to read in text mode, or bytes in binary mode. If omitted or negative, it reads the entire file.
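
For files too large to load at once, the size argument allows reading in fixed-size pieces. The sketch below assumes a hypothetical helper named read_in_chunks and a small temporary file created only for the demonstration.

```python
import os
import tempfile


def read_in_chunks(path, chunk_size=8):
    """Yield successive pieces of at most `chunk_size` characters."""
    with open(path, "r", encoding="utf-8") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # read() returns '' at end of file
                break
            yield chunk


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("welcome to Tutorialspoint.\nHi Surya.\n")
    # Consume the generator while the temporary file still exists
    chunks = list(read_in_chunks(path, chunk_size=8))

print(chunks)
```

Joining the chunks reproduces the original text, and no chunk is longer than chunk_size.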

Example:

# Assuming 'example.txt' contains:
# welcome to Tutorialspoint.
# Hi Surya.
# How are you?.

# Open outside the try block: if open() fails, there is no file to close
file = open('example.txt', 'r')
try:
    content = file.read()
    print(content)
finally:
    file.close()

Output:

welcome to Tutorialspoint.
Hi Surya.
How are you?.

Reading a Single Line (readline())

The readline() method reads one line at a time from the file, including the newline character (\n) at the end of the line. This is memory-efficient for processing large files line by line.

Syntax:

file.readline(size)
  • size: (Optional) The maximum number of characters to read in text mode, or bytes in binary mode. If omitted, it reads up to and including the next newline character.

Example:

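# Open outside the try block: if open() fails, there is no file to close
file = open('example.txt', 'r')
try:
    line1 = file.readline()
    print(line1)  # print() adds its own newline after the one in line1
    line2 = file.readline()
    print(line2)
finally:
    file.close()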

Output:

welcome to Tutorialspoint.

Hi Surya.
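
At the end of the file, readline() returns an empty string, while a blank line in the file comes back as "\n". A loop can rely on that difference to detect the end of the data. This is a small sketch; the function name count_nonblank_lines and the sample content are assumptions made for the example.

```python
import os
import tempfile


def count_nonblank_lines(path):
    """Count lines that contain something other than whitespace."""
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        while True:
            line = f.readline()
            if line == "":    # empty string means end of file
                break
            if line.strip():  # a blank line is "\n", not ""
                count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("first\n\nsecond\n")
    nonblank = count_nonblank_lines(path)

print(nonblank)
```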

Reading All Lines (readlines())

The readlines() method reads all lines from the file and returns them as a list of strings. Each string in the list represents a line from the file, including the newline character.

Syntax:

file.readlines(hint)
  • hint: (Optional) An approximate size limit. Once the total size of the lines read so far exceeds hint, no more lines are read. Lines are never split.

Example:

# Open outside the try block: if open() fails, there is no file to close
file = open('example.txt', 'r')
try:
    lines = file.readlines()
    for line in lines:
        print(line, end='') # end='' to prevent double newlines
finally:
    file.close()

Output:

welcome to Tutorialspoint.
Hi Surya.
How are you?.
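
Because readlines() builds the full list in memory, large files are often processed by iterating over the file object itself, which yields one line at a time. A minimal sketch, using a temporary copy of the example content and a hypothetical variable name longest:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("welcome to Tutorialspoint.\nHi Surya.\nHow are you?.\n")

    longest = ""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:  # the file object yields one line per iteration
            stripped = line.rstrip("\n")
            if len(stripped) > len(longest):
                longest = stripped

print(longest)
```

Only the current line and the longest line seen so far are held in memory at any time.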

Using the with Statement for File Reading

The with statement is the recommended way to handle file operations in Python. It ensures that the file is automatically closed after the block of code is executed, even if errors occur. This prevents resource leaks.

Example:

with open('example.txt', 'r') as file:
    content = file.read()
    print(content)
# The file is automatically closed here

Output:

welcome to Tutorialspoint.
Hi Surya.
How are you?.
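
The closing behaviour can be checked through the file object's closed attribute. The sketch below deliberately triggers an error inside the with block; the file name and its content are made up for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "value.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("not a number\n")

    handle = None
    try:
        with open(path, "r", encoding="utf-8") as f:
            handle = f                 # keep a reference for inspection
            value = int(f.readline())  # raises ValueError
    except ValueError:
        pass

    closed_after_error = handle.closed

print(closed_after_error)  # True: the file was closed despite the error
```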

Reading Binary Files

To read non-text files such as images, audio files, or executable binaries, you must open them in binary mode. Use the 'rb' mode for reading binary data.

Example – Reading Binary Data:

# Assuming 'test.bin' contains some binary data
with open('test.bin', 'rb') as f:
    data = f.read()
    # If the binary data represents text, you can decode it
    try:
        print(data.decode('utf-8'))
    except UnicodeDecodeError:
        print("Binary data could not be decoded as UTF-8:", data)

Example Output (if test.bin contains "Hello World" as UTF-8 bytes):

Hello World
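
Binary mode returns bytes objects, which makes it possible to inspect a file's leading bytes before deciding how to process it. As an illustration, PNG images begin with a fixed 8-byte signature. The helper name looks_like_png and both file names below are hypothetical.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of a PNG file


def looks_like_png(path):
    """Return True if the file starts with the PNG signature."""
    with open(path, "rb") as f:
        header = f.read(len(PNG_SIGNATURE))  # read only the first 8 bytes
    return header == PNG_SIGNATURE


with tempfile.TemporaryDirectory() as tmp:
    fake_png = os.path.join(tmp, "fake.png")
    with open(fake_png, "wb") as f:
        f.write(PNG_SIGNATURE + b"placeholder payload")

    text_file = os.path.join(tmp, "notes.txt")
    with open(text_file, "wb") as f:
        f.write(b"Hello World")

    png_result = looks_like_png(fake_png)
    text_result = looks_like_png(text_file)

print(png_result, text_result)
```

The check reads only the header, so it stays cheap even for very large files.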

Writing and Reading Integer Data in Binary Files

You can store and retrieve integers by converting them to bytes using .to_bytes() and back using int.from_bytes().

Writing an Integer:

n = 25
# Convert integer to 8 bytes, using big-endian byte order
data = n.to_bytes(8, 'big')

with open('test.bin', 'wb') as f:
    f.write(data)

Reading the Integer:

with open('test.bin', 'rb') as f:
    data = f.read()
    # Convert bytes back to an integer, using big-endian byte order
    n = int.from_bytes(data, 'big')
    print(n)

Output:

25
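
The same approach extends to several integers if each one is stored with a fixed width, so the reader knows where one value ends and the next begins. This sketch assumes a 4-byte width and signed values; both choices, along with the file name, are illustrative.

```python
import os
import tempfile

WIDTH = 4  # bytes per integer (an assumption for this example)
values = [25, -7, 1024]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ints.bin")

    with open(path, "wb") as f:
        for v in values:
            # signed=True is required to store negative numbers
            f.write(v.to_bytes(WIDTH, "big", signed=True))

    decoded = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(WIDTH)
            if len(chunk) < WIDTH:  # end of file (or a truncated record)
                break
            decoded.append(int.from_bytes(chunk, "big", signed=True))

print(decoded)
```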

Handling Floating-Point Data in Binary Files

Python's struct module is ideal for packing and unpacking binary data, including floating-point numbers.

Writing a Float:

import struct

x = 23.50
# Pack the float into a 4-byte binary representation (single precision, native byte order)
data = struct.pack('f', x)

with open('test.bin', 'wb') as f:
    f.write(data)

Reading the Float:

import struct

with open('test.bin', 'rb') as f:
    data = f.read()
    # Unpack the binary data as a float
    x = struct.unpack('f', data)[0] # unpack returns a tuple, get the first element
    print(x)

Output:

23.5
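
The value 23.5 survives the round trip because it is exactly representable in single precision; many decimals, such as 0.1, are not. When files move between machines, it can also help to state the byte order explicitly in the format string. The sketch below uses the format "<3d" (three little-endian doubles), which is a choice made for this example rather than something the file format requires.

```python
import os
import struct
import tempfile

FORMAT = "<3d"  # little-endian, three 8-byte doubles
values = (23.5, 0.1, -2.0)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "floats.bin")

    with open(path, "wb") as f:
        f.write(struct.pack(FORMAT, *values))

    with open(path, "rb") as f:
        data = f.read(struct.calcsize(FORMAT))  # exactly 24 bytes
        unpacked = struct.unpack(FORMAT, data)

# Single precision ('f') cannot store 0.1 exactly, so the value changes
single = struct.unpack("<f", struct.pack("<f", 0.1))[0]

print(unpacked)
print(single == 0.1)  # False
```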

Reading and Writing Simultaneously ('r+' Mode)

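The 'r+' mode opens an existing file for both reading and writing without truncating its content. The file pointer starts at the beginning, so a write made before any seek() overwrites existing data rather than appending. This is useful for modifying files in place.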
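
A short sketch of this behaviour, using hypothetical file names in a temporary directory: the existing content is still readable after opening with 'r+', and the mode does not create a file that is missing.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("line one\n")

    with open(path, "r+", encoding="utf-8") as f:
        before = f.read()  # existing content survives the open() call
        f.seek(0, 2)       # move to the end before writing
        f.write("line two\n")

    with open(path, "r", encoding="utf-8") as f:
        after = f.read()

    # 'r+' never creates a file: a missing path raises FileNotFoundError
    try:
        open(os.path.join(tmp, "absent.txt"), "r+")
        created = True
    except FileNotFoundError:
        created = False

print(repr(before), repr(after), created)
```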

Using seek() to Control the File Pointer

The seek() method allows you to reposition the file pointer within the file. This is crucial for advanced file operations, like reading from or writing to specific locations.

Syntax:

file.seek(offset, whence)
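  • offset: The position to move to, measured from the reference point given by whence. In binary mode this is a byte count.
  • whence: (Optional) Specifies the reference point for the offset:
    • 0: Beginning of the file (default).
    • 1: Current position of the file pointer.
    • 2: End of the file.

In text mode, whence values 1 and 2 accept only an offset of 0, and with whence 0 only zero or a value returned by tell() is guaranteed to be valid. Open the file in binary mode for arbitrary relative seeks.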
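
The three reference points can be tried out in binary mode, where any offset is allowed. The sketch below uses the named constants os.SEEK_CUR and os.SEEK_END in place of the numbers 1 and 2, and tell() to report the current position; the file name is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.bin")
    with open(path, "wb") as f:
        f.write(b"This is a test file.")  # 20 bytes

    with open(path, "rb") as f:
        f.seek(-5, os.SEEK_END)  # 5 bytes before the end
        tail_position = f.tell()
        tail = f.read()

        f.seek(0)                # back to the beginning (whence defaults to 0)
        f.seek(5, os.SEEK_CUR)   # 5 bytes forward from the current position
        word = f.read(2)

print(tail_position, tail, word)
```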

Example: Read from a Specific Position

# Assume 'foo.txt' contains: "This is a test file."

with open("foo.txt", "r") as fo:
    # Move the file pointer to offset 10 from the beginning of the file
    # (for plain ASCII text, one character occupies one byte)
    fo.seek(10, 0)
    # Read the next 3 characters
    data = fo.read(3)
    print(data)

Output:

tes

Example: Read and Write in the Same File

# Assume 'foo.txt' contains: "This is a test file."

with open("foo.txt", "r+") as fo:
    # Move the pointer to the end so the write appends instead of
    # overwriting the start of the file
    fo.seek(0, 2)
    fo.write(" new content")
    # Reset the file pointer to the beginning
    fo.seek(0)
    # Read the entire content
    data = fo.read()
    print(data)

Output:

This is a test file. new content

Rewriting Specific Parts of a File Using Offsets

You can overwrite existing data in a file by moving the file pointer to the desired position using seek() and then writing new data.

Example:

# Create or overwrite 'foo.txt'
with open("foo.txt", "w+") as fo:
    fo.write("This is a rat race")

    # Read 3 characters starting at offset 10
    fo.seek(10, 0)
    data_read = fo.read(3)
    print("Data read from position 10:", data_read)

    # Move the pointer back to position 10 to overwrite
    fo.seek(10, 0)
    fo.write("cat") # Overwrite "rat" with "cat"

    # Go back to the beginning to read the updated file
    fo.seek(0, 0)
    updated_data = fo.read()
    print("Updated file content:", updated_data)

Output:

Data read from position 10: rat
Updated file content: This is a cat race
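
Writing at an offset replaces characters; it never inserts them. The replacement above works cleanly because "cat" and "rat" have the same length. The sketch below shows what happens with a longer word, followed by one possible alternative (rewriting the whole text) for replacements of a different length. The word "mouse" and the variable names are chosen only for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.txt")

    # Overwriting in place with a longer word clobbers what follows
    with open(path, "w+", encoding="utf-8") as fo:
        fo.write("This is a rat race")
        fo.seek(10)
        fo.write("mouse")  # 5 characters written over "rat r"
        fo.seek(0)
        clobbered = fo.read()

    # Alternative: read everything, replace in memory, write it back
    with open(path, "w+", encoding="utf-8") as fo:
        fo.write("This is a rat race")
        fo.seek(0)
        text = fo.read().replace("rat", "mouse")
        fo.seek(0)
        fo.write(text)
        fo.truncate()  # drop leftover characters if the new text is shorter
        fo.seek(0)
        rewritten = fo.read()

print(clobbered)
print(rewritten)
```

The first result is "This is a mouseace", with the following word damaged; the second is "This is a mouse race".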

Conclusion

Python provides a versatile set of tools for reading files:

  • Text Files: Use read(), readline(), or readlines() for text-based data.
  • Binary Files: Employ binary modes like 'rb' for non-textual data (images, executables, etc.).
  • File Pointer Control: The seek() method allows precise positioning for reading and writing.
  • Simultaneous I/O: The 'r+' mode enables reading and writing within the same file.
  • Automatic Closing: Always use the with statement for safe and efficient file handling.
  • Binary Data Conversion: The struct module is invaluable for handling numbers (integers, floats) in binary formats.

Interview Questions

  • How do you open and read a file in Python?
  • What is the difference between read(), readline(), and readlines() in Python file handling?
  • Why should you use the with statement when working with files?
  • How do you read a binary file in Python?
  • How can you read and write to the same file simultaneously in Python?
  • Explain the use of the seek() method in file handling.
  • How do you handle reading floating-point numbers from a binary file?
  • What exception is raised if you try to open a file that does not exist in read mode?
  • How can you overwrite a specific part of a file in Python?
  • Describe how Python’s struct module is used in file operations.