Python File I/O

Topic: Input/Output

Introduction

Python's built-in open() function and standard-library modules such as csv and json cover most of the file reading and writing needed in data science workflows: loading datasets, saving processed results, and handling common file formats. Files can be opened in text or binary mode, and text mode accepts an explicit encoding.

Key Concepts

File modes: Read, write, append (text and binary)
Context managers: Automatic resource cleanup with 'with' statement
Text vs binary: Different handling for text and binary files
Encodings: UTF-8, ASCII, and other character encodings
Line-by-line processing: Efficient handling of large files
CSV and JSON: Common data file formats

Python Implementation

# Basic file reading
with open("data.txt", "r") as file:
    content = file.read()

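# Reading lines: two alternatives. Use one or the other per open file,
# because readlines() consumes the file and leaves nothing to iterate over.
with open("data.txt", "r") as file:
    lines = file.readlines()  # List of all lines, loaded into memory at once

with open("data.txt", "r") as file:
    for line in file:         # Iterate lazily, one line at a time
        print(line.strip())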

# Writing to files
with open("output.txt", "w") as file:
    file.write("Hello, World!\n")
    file.writelines(["Line 1\n", "Line 2\n"])

# Appending to files
with open("log.txt", "a") as file:
    file.write("New entry\n")

# CSV handling
import csv
with open("data.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["Name", "Age"])
    writer.writerows([["Alice", 25], ["Bob", 30]])

# Reading CSV (newline="" lets the csv module handle line endings itself)
with open("data.csv", "r", newline="") as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

# JSON handling
import json
data = {"name": "John", "age": 30}
with open("data.json", "w") as file:
    json.dump(data, file)

with open("data.json", "r") as file:
    loaded = json.load(file)
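
The examples above all open files in text mode with the default encoding. The following is a minimal sketch of the text versus binary distinction and of explicit encodings listed under Key Concepts. The scratch directory and the file name greeting.txt are assumptions made for illustration; they are not part of the original examples.

```python
import tempfile
from pathlib import Path

# Hypothetical scratch location so the sketch does not touch real files.
workdir = Path(tempfile.mkdtemp())
text_path = workdir / "greeting.txt"

# Text mode: str in, str out. The encoding is stated explicitly.
with open(text_path, "w", encoding="utf-8") as file:
    file.write("café")

with open(text_path, "r", encoding="utf-8") as file:
    text = file.read()

# Binary mode: bytes in, bytes out. No decoding is applied.
with open(text_path, "rb") as file:
    raw = file.read()

print(len(text), len(raw))          # 4 5 (the "é" takes two bytes in UTF-8)
print(raw.decode("utf-8") == text)  # True
```

Text mode returns str objects and needs an encoding to map bytes to characters; binary mode returns the stored bytes unchanged, which is what images, pickles, and other non-text formats require.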

When to Use

Loading datasets from disk for analysis
Saving processed data and results
Reading configuration files
Processing log files
Working with CSV and JSON data exports
Handling large files with streaming
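
As a rough sketch of the streaming and log-processing cases above, a file can be scanned one line at a time so that memory use stays roughly constant regardless of file size. The function name count_matching_lines, the file app.log, and its contents are hypothetical and exist only to make the example runnable.

```python
import tempfile
from pathlib import Path


def count_matching_lines(path, needle, encoding="utf-8"):
    """Count lines containing `needle`, reading one line at a time."""
    count = 0
    with open(path, "r", encoding=encoding) as file:
        for line in file:  # the file object yields lines lazily
            if needle in line:
                count += 1
    return count


# Hypothetical sample log written to a scratch directory.
log_path = Path(tempfile.mkdtemp()) / "app.log"
with open(log_path, "w", encoding="utf-8") as file:
    file.write("INFO start\nERROR disk full\nINFO retry\nERROR timeout\n")

error_count = count_matching_lines(log_path, "ERROR")
print(error_count)  # 2
```

Iterating over the file object avoids read() and readlines(), both of which load the whole file into memory.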

Key Takeaways

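Always use context managers (the with statement) for file operations so that files are closed even if an error occurs
Specify the encoding explicitly (for example, encoding="utf-8") when opening text files, especially with non-ASCII text; the default has historically depended on the platform's locale
Open CSV files with newline="" for both reading and writing; when writing, this avoids doubled line endings on Windows
For large files, process line by line to avoid loading the entire file into memory
JSON and CSV are among the most common data interchange formats in data science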
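
The CSV takeaway can be checked with a small sketch. The file name people.csv and the scratch directory are assumptions made for illustration. The example writes two rows with newline="", inspects the raw bytes, and reads the rows back.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical scratch file for the round trip.
csv_path = Path(tempfile.mkdtemp()) / "people.csv"

# newline="" stops the file object from translating the writer's "\r\n".
with open(csv_path, "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["Name", "Age"])
    writer.writerow(["Alice", 25])

# Each row ends with exactly one "\r\n" (the csv module's default terminator).
raw = csv_path.read_bytes()
print(raw)  # b'Name,Age\r\nAlice,25\r\n'

with open(csv_path, "r", newline="", encoding="utf-8") as file:
    rows = list(csv.reader(file))

# csv.reader returns every field as a string, so 25 comes back as "25".
print(rows)  # [['Name', 'Age'], ['Alice', '25']]
```

Because csv.reader returns strings, numeric columns need an explicit conversion such as int(row[1]) after loading.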
