File Input and Output in Python

File I/O refers to the process of reading from and writing to files, essential for data persistence, configuration loading, and processing large datasets. The open() function is central to File I/O in Python, allowing users to specify file names and modes for reading, writing, appending, and more. Best practices include using the 'with' statement to ensure files are properly closed after operations.

Uploaded by

ammuardra146
Copyright
© All Rights Reserved
What is File I/O?

File Input/Output (I/O) is the process of reading data from a file (input) and writing data to a file
(output). This is essential for any application that needs to:

 Persist data: Save data so it's not lost when the program closes.

 Read configuration: Load settings from a configuration file.

 Process large datasets: Read data from a file, process it, and write the results to another file.

The Core of File I/O: The open() Function

Everything starts with the built-in open() function. It opens a file and returns a "file object" (also
called a handle), which you can use to read from or write to the file.

Its basic syntax is:


file_object = open("filename.txt", "mode")

Key Parameters:

1. filename: The name of the file you want to open (e.g., "my_data.txt"). You can also provide
a full path (e.g., "C:/Users/YourUser/Documents/my_data.txt").

2. mode: A string that specifies how you want to interact with the file. This is crucial.

Common File Modes

Mode  Name    Description                            If the file doesn't exist    If the file exists
'r'   Read    (Default) Opens a file for reading.    Raises a FileNotFoundError.  Reads from the beginning.
'w'   Write   Opens a file for writing.              Creates a new file.          Erases the entire file and writes from the beginning.
'a'   Append  Opens a file for appending.            Creates a new file.          Adds new content to the end of the file.
'x'   Create  Opens a file for exclusive creation.   Creates a new file.          Raises a FileExistsError.
'+'   Update  Can be added to other modes (r+, w+, a+) to allow both reading and writing.
'b'   Binary  Can be added to other modes (rb, wb, ab) to work with binary files (like images, audio).
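The behavior in the table can be checked with a short experiment. The following is a minimal sketch, assuming a throwaway temporary directory and a hypothetical file name demo.txt (neither comes from the original text):

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    # 'r' on a missing file raises FileNotFoundError.
    try:
        open(path, "r")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

    # 'w' creates the file; 'a' adds to the end of it.
    with open(path, "w") as f:
        f.write("first\n")
    with open(path, "a") as f:
        f.write("second\n")

    # 'x' refuses to touch a file that already exists.
    try:
        open(path, "x")
        exists_raised = False
    except FileExistsError:
        exists_raised = True

    # Opening with 'w' again erases the previous contents.
    with open(path, "w") as f:
        f.write("replaced\n")
    with open(path, "r") as f:
        final_content = f.read()

print(missing_raised, exists_raised, repr(final_content))
```

Both exceptions are raised as the table describes, and the final read shows that the second 'w' discarded the two earlier lines.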
The Best Practice: Using the with Statement

It is critical to always close a file after you are done with it. If you don't, you can leak resources or
leave data in a corrupted state.

The best and safest way to do this in Python is with the with statement. It automatically closes the
file for you, even if an error occurs inside the block.

Syntax:

Generated python

with open("filename.txt", "mode") as file:
    # Perform operations on the 'file' object here

# The file is automatically closed when you exit this block
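To see the automatic closing in action, the sketch below checks the file object's closed attribute after a normal exit and after an exit caused by an exception. The temporary directory, the file name example.txt, and the simulated ValueError are assumptions made only for this illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")  # hypothetical file name

    # Normal exit: the file is closed as soon as the block ends.
    with open(path, "w") as file:
        file.write("hello\n")
    closed_after_block = file.closed

    # Exit through an exception: the file is still closed.
    try:
        with open(path, "r") as file:
            raise ValueError("simulated failure while the file is open")
    except ValueError:
        closed_after_error = file.closed

print(closed_after_block, closed_after_error)
```

In both cases the file ends up closed without an explicit call to file.close().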

1. Writing to a File (Output)

Example: Writing with 'w' (Erase and Write)

This mode is perfect for creating a new file or completely overwriting an existing one.

Generated python

# The text we want to write to the file
lines_to_write = [
    "Hello from Python!\n",
    "This is the second line.\n",
    "Writing files is easy.\n",
]

# Use 'w' mode to write to a new file (or overwrite an existing one)
try:
    with open("greetings.txt", "w") as file:
        file.write("This is the very first line.\n")  # write() writes a single string
        file.writelines(lines_to_write)  # writelines() writes a list of strings
    print("File 'greetings.txt' was written successfully.")
except IOError as e:
    print(f"An error occurred: {e}")

Result: A file named greetings.txt will be created with the following content:

Generated code

This is the very first line.

Hello from Python!


This is the second line.

Writing files is easy.

Important: write() and writelines() do not automatically add newline characters (\n). You have to
add them yourself.
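When the data does not already contain newline characters, there are two common ways to add them. This is a small sketch, assuming a sample list and a hypothetical file name shopping.txt in a temporary directory:

```python
import os
import tempfile

items = ["apples", "bananas", "cherries"]  # sample data, no trailing newlines

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "shopping.txt")  # hypothetical file name

    with open(path, "w", encoding="utf-8") as file:
        # Option 1: add "\n" to each item as it is handed to writelines().
        file.writelines(item + "\n" for item in items)
        # Option 2: join the items with "\n" and write one string.
        file.write("\n".join(items) + "\n")

    with open(path, "r", encoding="utf-8") as file:
        written = file.read()

print(written)
```

Both options produce one item per line, so the file ends up containing the list twice.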

Example: Appending with 'a' (Add to End)

This mode is used to add content to the end of an existing file without deleting its current contents.

Generated python

# Let's add more content to our existing file
with open("greetings.txt", "a") as file:
    file.write("Appending a new line at the end.\n")
print("Appended content to 'greetings.txt'.")

Result: The greetings.txt file will now look like this:

Generated code

This is the very first line.

Hello from Python!

This is the second line.

Writing files is easy.

Appending a new line at the end.

2. Reading from a File (Input)

Let's assume we have the greetings.txt file from the previous step.

Method 1: Reading the Entire File at Once (.read())

This is simple but can consume a lot of memory if the file is very large.

Generated python

try:
    with open("greetings.txt", "r") as file:
        content = file.read()  # Reads the entire file into a single string
    print("--- Reading entire file with .read() ---")
    print(content)
except FileNotFoundError:
    print("The file was not found!")

Output:

Generated code

--- Reading entire file with .read() ---

This is the very first line.

Hello from Python!

This is the second line.

Writing files is easy.

Appending a new line at the end.

Method 2: Reading Line by Line (The Pythonic Way)

This is the most common and memory-efficient way to read a file, especially large ones. You can
iterate directly over the file object.

Generated python

print("\n--- Reading file line-by-line ---")
try:
    with open("greetings.txt", "r") as file:
        for line in file:
            # The 'line' variable includes the newline character at the end.
            # We use .strip() to remove leading/trailing whitespace, including the newline.
            print(line.strip())
except FileNotFoundError:
    print("The file was not found!")

Output:

Generated code

--- Reading file line-by-line ---

This is the very first line.

Hello from Python!

This is the second line.

Writing files is easy.

Appending a new line at the end.
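Line-by-line iteration combines well with enumerate() when line numbers are needed, for example when searching a log. The sketch below is only an illustration: the log contents and the file name log.txt are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "log.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as file:
        file.write("INFO start\nERROR disk full\nINFO retry\nERROR timeout\n")

    error_lines = []
    with open(path, "r", encoding="utf-8") as file:
        # enumerate() numbers the lines; the loop handles one line at a time.
        for line_number, line in enumerate(file, start=1):
            if line.startswith("ERROR"):
                error_lines.append((line_number, line.rstrip("\n")))

print(error_lines)
```

Because the loop never holds the whole file in memory, the same pattern works for files far larger than this four-line sample.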


Method 3: Reading All Lines into a List (.readlines())

This reads the entire file and puts each line into a list of strings.

Generated python

print("\n--- Reading all lines into a list with .readlines() ---")
try:
    with open("greetings.txt", "r") as file:
        lines = file.readlines()  # Returns a list of strings
    print(lines)
    # You can then process this list
    print(f"The third line is: {lines[2].strip()}")
except FileNotFoundError:
    print("The file was not found!")

Output:

Generated code

--- Reading all lines into a list with .readlines() ---

['This is the very first line.\n', 'Hello from Python!\n', 'This is the second line.\n', 'Writing files is
easy.\n', 'Appending a new line at the end.\n']

The third line is: This is the second line.

Working with Structured Data (CSV and JSON)

While the methods above work for plain text, Python has special libraries for structured data like
CSV and JSON.

Example: CSV Files

The csv module makes it easy to read and write comma-separated values.

Generated python

import csv

# Writing to a CSV file
header = ['name', 'department', 'birth_month']
data = [
    ['John Doe', 'Engineering', 'November'],
    ['Jane Smith', 'Marketing', 'May'],
]

with open('employees.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(header)  # Write the header row
    writer.writerows(data)  # Write all data rows
print("employees.csv created.")

# Reading from a CSV file
print("\n--- Reading from employees.csv ---")
with open('employees.csv', 'r', newline='', encoding='utf-8') as file:
    reader = csv.reader(file)
    header = next(reader)  # Read the header row so the loop starts at the data
    print(f"Header: {header}")
    for row in reader:
        print(f"{row[0]} works in {row[1]}.")

Output:

Generated code

employees.csv created.

--- Reading from employees.csv ---

Header: ['name', 'department', 'birth_month']

John Doe works in Engineering.

Jane Smith works in Marketing.

 newline='' is important when writing CSVs: without it, extra blank rows can appear on platforms that use \r\n line endings (such as Windows).

 encoding='utf-8' is a best practice to ensure your code works with a wide range of
characters.
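The csv module also offers DictWriter and DictReader, which address columns by name instead of by position. The following is a sketch under the assumption that each record is a dictionary with the keys name and department; it writes to a temporary directory rather than the working directory:

```python
import csv
import os
import tempfile

rows = [
    {"name": "John Doe", "department": "Engineering"},
    {"name": "Jane Smith", "department": "Marketing"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "employees.csv")

    with open(path, "w", newline="", encoding="utf-8") as file:
        writer = csv.DictWriter(file, fieldnames=["name", "department"])
        writer.writeheader()    # Writes the fieldnames as the header row
        writer.writerows(rows)  # Writes one CSV row per dictionary

    with open(path, "r", newline="", encoding="utf-8") as file:
        # DictReader uses the header row as the keys of each record.
        loaded = [dict(row) for row in csv.DictReader(file)]

print(loaded)
```

Reading by column name (row["department"]) tends to survive column reordering better than reading by index (row[1]).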

Example: JSON Files

The json module is perfect for working with JSON data, which is common in web development and
APIs.
Generated python

import json

# Writing a Python dictionary to a JSON file
user_data = {
    "id": 123,
    "name": "Alice",
    "isAdmin": True,
    "courses": ["History", "CompSci"],
}

with open("user.json", "w") as file:
    json.dump(user_data, file, indent=4)  # 'dump' writes to a file
print("user.json created.")

# Reading from a JSON file into a Python dictionary
print("\n--- Reading from user.json ---")
with open("user.json", "r") as file:
    data = json.load(file)  # 'load' reads from a file

print(f"User's name is {data['name']}.")
print(f"Courses: {data['courses']}")
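The json module has string-based counterparts, dumps() and loads(), which are handy for seeing how Python values map to JSON. The sketch below reuses the shape of the user data above but stores the courses as a tuple, an assumption made here only to show the type conversion:

```python
import json

user_data = {
    "id": 123,
    "name": "Alice",
    "isAdmin": True,
    "courses": ("History", "CompSci"),  # a tuple, to show how JSON treats it
}

# dumps() returns the JSON text as a string instead of writing to a file.
text = json.dumps(user_data)
# loads() parses a JSON string back into Python objects.
restored = json.loads(text)

print(text)
print(restored)
```

Python's True is written as JSON true, and the tuple comes back as a list, because JSON has only one array type.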

What is a Function?

Think of a function as a reusable block of code that performs a specific task. You give it a name, and
you can "call" that name whenever you need to execute that task, instead of writing the code over
and over again. This makes your code more organized, efficient, and easier to read.

There are two main types of functions in Python:

1. Built-in Functions: Functions that are provided by Python itself.

2. User-Defined Functions: Functions that you, the programmer, create.

1. Built-in Functions
Built-in functions are built into the Python interpreter itself. They are always available for you to use
without needing to import any modules. They are designed to perform common and
essential tasks.

Key Characteristics:

 Pre-defined: They are part of the Python language.

 Always available: You don't need to write them or import them.

 Highly optimized: They are typically written in C and are very fast and efficient.

Examples of Common Built-in Functions

1. print()
Prints the specified message to the screen.

Generated python

print("Hello, World!")

# Output: Hello, World!

2. len()
Returns the length (the number of items) of an object like a string, list, or dictionary.

Generated python

my_list = [10, 20, 30, 40]

name = "Python"

print(f"Length of my_list: {len(my_list)}") # Output: 4

print(f"Length of the word '{name}': {len(name)}") # Output: 6

3. type()
Returns the data type of an object.

Generated python

x = 10

y = "hello"

z = [1, 2, 3]

print(f"Type of x: {type(x)}") # Output: <class 'int'>

print(f"Type of y: {type(y)}") # Output: <class 'str'>

print(f"Type of z: {type(z)}") # Output: <class 'list'>

4. int(), str(), float()


These functions convert values from one type to another.
Generated python

number_string = "123"

number_int = int(number_string) # Convert string to integer

print(f"Integer value: {number_int}") # Output: 123

print(f"Type is now: {type(number_int)}") # Output: <class 'int'>

float_val = float(number_int) # Convert integer to float

print(f"Float value: {float_val}") # Output: 123.0

5. sum(), max(), min()


Perform mathematical operations on a collection of numbers.

Generated python

numbers = [3, 1, 9, 4, 6]

print(f"Sum: {sum(numbers)}") # Output: 23

print(f"Max: {max(numbers)}") # Output: 9

print(f"Min: {min(numbers)}") # Output: 1

2. User-Defined Functions (UDFs)

A user-defined function is a function that you create yourself to perform a specific task that isn't
covered by a built-in function. This is the core of writing modular and reusable code, following the
DRY (Don't Repeat Yourself) principle.

Anatomy of a User-Defined Function

Generated python

# def is the keyword to define a function
# |   function_name
# |   |             parameters (inputs)
# |   |             |
# v   v             v
def function_name(parameter1, parameter2):
    """
    This is a docstring. It explains what the function does.
    It's a best practice to always include one!
    """
    # The indented block of code is the function's body
    # It contains the logic for the task.
    result = parameter1 + parameter2

    # The return statement sends a value back as the output.
    # This is optional.
    return result
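The pieces of the anatomy can be exercised with a small sketch. The function name add_values and its parameter names are hypothetical, chosen only for this illustration:

```python
def add_values(first, second):
    """Return the sum of the two arguments."""
    result = first + second
    return result

# Positional call: arguments are matched to parameters by order.
total = add_values(2, 3)
# Keyword call: arguments are matched to parameters by name.
joined = add_values(second="world", first="hello ")
# The docstring is stored on the function object.
doc = add_values.__doc__

print(total, joined, doc)
```

The same function works for numbers and strings here because the body only relies on the + operator.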

Examples of User-Defined Functions

Example 1: A Simple Function with No Inputs or Outputs

This function just performs an action (printing a message).

Generated python

def greet():
    """This function prints a simple greeting."""
    print("Hello! Welcome to the program.")

# To use the function, you "call" it by its name:
greet()
greet()

Output:

Generated code

Hello! Welcome to the program.

Hello! Welcome to the program.

Example 2: A Function with a Parameter (Input)

This function takes an input (name) to customize its behavior.

Generated python

def greet_person(name):
    """This function greets a person by their name."""
    print(f"Hello, {name}! It's nice to meet you.")

# Call the function and provide an "argument" (the actual value for the parameter)
greet_person("Alice")
greet_person("Bob")

Output:

Generated code

Hello, Alice! It's nice to meet you.

Hello, Bob! It's nice to meet you.

Example 3: A Function with a return Statement (Output)

This function takes two numbers, calculates their sum, and returns the result. The calling code can
then store and use this result.

Generated python

def add_numbers(num1, num2):
    """This function adds two numbers and returns the sum."""
    total = num1 + num2
    return total

# Call the function and store the returned value in a variable
sum_result = add_numbers(5, 7)
print(f"The sum is: {sum_result}")  # Output: 12

# You can use the function's result directly
another_sum = add_numbers(100, 50) + 10
print(f"Another calculation: {another_sum}")  # Output: 160

Example 4: A Function without a return Statement

If you don't include a return statement, the function automatically returns a special value: None.

Generated python
def say_goodbye(name):
    """This function just prints a message and doesn't return anything."""
    print(f"Goodbye, {name}!")

result = say_goodbye("Charlie")
print(f"The function returned: {result}")

Output:

Generated code

Goodbye, Charlie!

The function returned: None

Summary: Key Differences

Feature       Built-in Functions                                                User-Defined Functions
Origin        Part of the standard Python language.                             Created by you, the programmer.
Purpose       To perform common, general-purpose tasks (e.g., len(), print()).  To perform specific tasks unique to your program's logic.
Availability  Always available; no def needed.                                  Must be defined with def before you can call it.
Flexibility   Their behavior is fixed.                                          You have complete control over their logic, inputs, and outputs.

What is NumPy?

NumPy (short for Numerical Python) is the most fundamental package for scientific computing in
Python. It's a library that provides:

1. A powerful N-dimensional array object called ndarray.

2. Sophisticated functions for mathematical and logical operations on these arrays.

3. Tools for linear algebra, Fourier transforms, and random number generation.

Why Use NumPy instead of Python Lists?

Python lists are flexible but slow for numerical operations. NumPy arrays are superior for numerical
tasks for three main reasons:
 Speed: NumPy operations are implemented in C and Fortran, making them much faster than
iterating over a Python list.

 Memory Efficiency: NumPy arrays are stored in a contiguous block of memory. This is much
more memory-efficient than Python lists, which store pointers to objects.

 Convenience: NumPy provides a huge library of high-level mathematical functions that


operate on entire arrays without the need for loops (this is called vectorization).
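The difference between looping and vectorization can be sketched side by side. This example assumes NumPy is already installed, and the prices and quantities are made-up sample data:

```python
import numpy as np

prices = [19.99, 5.49, 3.50, 12.00]  # sample data
quantities = [3, 10, 4, 1]

# Plain Python: an explicit loop over paired elements.
totals_list = [p * q for p, q in zip(prices, quantities)]

# NumPy: one vectorized expression, no Python-level loop.
totals_array = np.array(prices) * np.array(quantities)

print(totals_list)
print(totals_array)
```

Both approaches compute the same values; the NumPy version pushes the loop into compiled code, which is where the speed advantage on large arrays comes from.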

The Core of NumPy: The ndarray

The ndarray is a grid of values, all of the same data type. It has important attributes:

 ndarray.ndim: The number of dimensions (or axes) of the array.

 ndarray.shape: A tuple of integers indicating the size of the array in each dimension.

 ndarray.size: The total number of elements in the array.

 ndarray.dtype: The data type of the elements in the array (e.g., int64, float64).

First, let's install and import NumPy.

Generated bash

# In your terminal or command prompt

pip install numpy

Generated python

# In your Python script or notebook

import numpy as np # 'np' is the standard alias for numpy
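With NumPy imported, the four ndarray attributes listed earlier can be inspected directly. A minimal sketch, using a small made-up 2x3 array of floats:

```python
import numpy as np

grid = np.array([[1.5, 2.0, 3.5],
                 [4.0, 5.5, 6.0]])

print(grid.ndim)   # number of axes
print(grid.shape)  # size along each axis
print(grid.size)   # total number of elements
print(grid.dtype)  # element type
```

Because every value in the nested list is a Python float, NumPy picks float64 as the element type.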

1. Creating NumPy Arrays

You can create NumPy arrays in several ways.

a) From a Python List

This is the most common way to get started.

Generated python

# A 1-dimensional array

a = np.array([1, 2, 3, 4, 5])

print(f"1D Array: {a}")

print(f"Shape: {a.shape}") # (5,)


print(f"Dimensions: {a.ndim}") # 1

# A 2-dimensional array (a matrix)

b = np.array([[1, 2, 3], [4, 5, 6]])

print(f"\n2D Array:\n{b}")

print(f"Shape: {b.shape}") # (2, 3) -> 2 rows, 3 columns

print(f"Dimensions: {b.ndim}") # 2

b) Using Built-in Creation Functions

These are useful for creating large arrays with initial placeholder content.

Generated python

# Create an array of zeros

zeros_arr = np.zeros((2, 4)) # A 2x4 matrix of zeros

print(f"Zeros Array:\n{zeros_arr}")

# Create an array of ones

ones_arr = np.ones((3, 3), dtype=np.int16) # Specify data type

print(f"\nOnes Array (integers):\n{ones_arr}")

# Create an array with a range of elements

range_arr = np.arange(10, 20, 2) # Start, stop (exclusive), step

print(f"\nRange Array: {range_arr}")

# Create an array with a specific number of elements between two points

linspace_arr = np.linspace(0, 10, 5) # Start, stop (inclusive), num_points

print(f"\nLinspace Array: {linspace_arr}")

# Create an array with random values

random_arr = np.random.rand(2, 3) # A 2x3 array of random floats between 0 and 1

print(f"\nRandom Array:\n{random_arr}")
2. Array Mathematics: The Power of Vectorization

This is where NumPy shines. You can perform operations on entire arrays without writing loops. This
is called vectorization.

Generated python

x = np.array([1, 2, 3, 4])

y = np.array([10, 20, 30, 40])

# --- Element-wise operations ---

# Addition

add_result = x + y

print(f"x + y = {add_result}") # [11 22 33 44]

# Subtraction

sub_result = y - x

print(f"y - x = {sub_result}") # [ 9 18 27 36]

# Multiplication

mul_result = x * y

print(f"x * y = {mul_result}") # [ 10 40 90 160]

# Division

div_result = y / x

print(f"y / x = {div_result}") # [10. 10. 10. 10.]

# --- Scalar operations (operating with a single number) ---

scalar_add = x + 5

print(f"\nx + 5 = {scalar_add}") # [6 7 8 9]

scalar_mul = x * 2
print(f"x * 2 = {scalar_mul}") # [2 4 6 8]

# --- Universal Functions (ufuncs) ---

# Apply functions like sin, cos, exp to every element

print(f"\nSin(x) = {np.sin(x)}")

3. Indexing and Slicing

Accessing elements in NumPy arrays is similar to Python lists but can be extended to multiple
dimensions.

Generated python

# Let's create a 2D array (a 3x4 matrix)

data = np.array([

[1, 2, 3, 4],

[5, 6, 7, 8],

[9, 10, 11, 12]

])

# Get a single element [row, column]

element = data[1, 2] # Row 1, Column 2

print(f"Element at (1, 2) is {element}") # Output: 7

# Get a specific row

row_1 = data[0, :] # Row 0, all columns

print(f"\nFirst row: {row_1}") # Output: [1 2 3 4]

# Get a specific column

col_2 = data[:, 1] # All rows, Column 1

print(f"Second column: {col_2}") # Output: [ 2 6 10]

# Slicing: Get a sub-matrix

# Get the top-right 2x2 matrix

sub_matrix = data[0:2, 2:4] # Rows 0 to 1, Columns 2 to 3


print(f"\nSub-matrix:\n{sub_matrix}")

# Output:

# [[3 4]

# [7 8]]

4. Boolean Indexing (Filtering)

This is an extremely powerful feature. You can use logical conditions to filter data from an array.

Generated python

arr = np.arange(1, 11) # Array from 1 to 10

print(f"Original array: {arr}")

# Find elements greater than 5

greater_than_5 = arr > 5

print(f"Boolean mask (arr > 5): {greater_than_5}")

# Output: [False False False False False True True True True True]

# Use the boolean mask to select elements

print(f"Elements greater than 5: {arr[greater_than_5]}") # Or more concisely: arr[arr > 5]

# Output: [ 6 7 8 9 10]

# You can also filter with any other element-wise condition

even_numbers = arr[arr % 2 == 0]

print(f"Even numbers: {even_numbers}") # Output: [ 2 4 6 8 10]
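Several conditions can be combined into one mask with the element-wise operators & (and), | (or), and ~ (not). The sketch below rebuilds the same 1-to-10 array so that it runs on its own; the variable names are chosen only for this illustration:

```python
import numpy as np

arr = np.arange(1, 11)

# Each condition needs its own parentheses; & means "and", | means "or".
even_and_large = arr[(arr % 2 == 0) & (arr > 5)]
small_or_large = arr[(arr < 3) | (arr > 8)]
# ~ inverts a mask.
not_even = arr[~(arr % 2 == 0)]

print(even_and_large, small_or_large, not_even)
```

Python's plain and/or keywords do not work here, because they try to reduce a whole array to a single True or False.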

5. Aggregation Functions

NumPy has fast built-in aggregation functions to summarize data.

Generated python

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(f"Matrix:\n{matrix}")

# Get sum of all elements

print(f"\nSum of all elements: {matrix.sum()}")


# Get min/max of all elements

print(f"Minimum element: {matrix.min()}")

print(f"Maximum element: {matrix.max()}")

# You can also perform aggregations along a specific axis

# axis=0 -> collapses the rows (computes down the columns)

# axis=1 -> collapses the columns (computes across the rows)

col_sums = matrix.sum(axis=0)

print(f"\nSum of each column: {col_sums}") # [1+4+7, 2+5+8, 3+6+9] -> [12 15 18]

row_means = matrix.mean(axis=1)

print(f"Mean of each row: {row_means}") # [(1+2+3)/3, (4+5+6)/3, (7+8+9)/3] -> [2. 5. 8.]

Practical Example: Simple Data Analysis

Let's tie it all together. Imagine we have daily temperature data (in Fahrenheit) for a week and want
to analyze it.

Generated python

# Daily temperatures in Fahrenheit for one week

temps_f = np.array([72, 75, 68, 65, 78, 82, 81])

print(f"Temperatures (F): {temps_f}")

# 1. Vectorized Operation: Convert temperatures to Celsius

# Formula: C = (F - 32) * 5/9

temps_c = (temps_f - 32) * 5/9

print(f"Temperatures (C): {np.round(temps_c, 2)}") # Round to 2 decimal places

# 2. Aggregation: Calculate statistics

avg_temp_c = temps_c.mean()

max_temp_c = temps_c.max()

min_temp_c = temps_c.min()
print(f"\nAverage temperature: {avg_temp_c:.2f}°C")

print(f"Highest temperature: {max_temp_c:.2f}°C")

print(f"Lowest temperature: {min_temp_c:.2f}°C")

# 3. Boolean Indexing: How many days were hotter than 25°C?

hot_days_mask = temps_c > 25

hot_days = temps_f[hot_days_mask] # Get the original F temps for hot days

print(f"\nThere were {hot_days.size} days hotter than 25°C.")

print(f"The temperatures on those days were: {hot_days}°F")

Let's take a closer look at creating NumPy arrays and performing operations on them.

First, ensure you have NumPy imported. The standard convention is to import it with the alias np.

Generated python

import numpy as np

Part 1: NumPy Array Creation

Here are the most common ways to create NumPy arrays.

1. From a Python List or Tuple

This is the most direct method. NumPy infers the data type automatically.

Generated python

# Create a 1-dimensional array (a vector)

my_list = [1, 2, 3, 4, 5]

arr1d = np.array(my_list)

print(f"1D Array: {arr1d}")

print(f"Data type: {arr1d.dtype}") # typically int64 (the default integer type is platform-dependent)

# Create a 2-dimensional array (a matrix)

my_nested_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

arr2d = np.array(my_nested_list)
print(f"\n2D Array:\n{arr2d}")

print(f"Shape: {arr2d.shape}") # (3, 3) -> 3 rows, 3 columns

2. Using Built-in Creation Functions

These are highly efficient for creating large, structured arrays.

Generated python

# Create an array of a specific size filled with zeros

zeros_arr = np.zeros((2, 4)) # A 2x4 matrix of floating-point zeros

print(f"Zeros Array:\n{zeros_arr}")

# Create an array filled with ones

ones_arr = np.ones((3, 2), dtype=np.int32) # Specify the data type as 32-bit integers

print(f"\nOnes Array:\n{ones_arr}")

# Create an array filled with a specific value

full_arr = np.full((2, 3), 7) # A 2x3 matrix filled with the number 7

print(f"\nFull Array:\n{full_arr}")

# Create an identity matrix (square matrix with ones on the diagonal)

identity_matrix = np.eye(4)

print(f"\nIdentity Matrix:\n{identity_matrix}")

3. Creating Arrays with Sequences of Numbers

Generated python

# Create an array with a range of values (similar to Python's range)

# np.arange(start, stop_exclusive, step)

range_arr = np.arange(0, 10, 2)

print(f"Range Array: {range_arr}") # [0 2 4 6 8]

# Create an array with a specific number of evenly spaced points

# np.linspace(start, stop_inclusive, num_points)

linspace_arr = np.linspace(0, 1, 5)
print(f"\nLinspace Array: {linspace_arr}") # [0. 0.25 0.5 0.75 1. ]

4. Creating Random Arrays

This is extremely useful for simulations, testing, and machine learning.

Generated python

# Create a 2x3 array with random floats between 0 and 1

rand_arr = np.random.rand(2, 3)

print(f"Random float array:\n{rand_arr}")

# Create a 3x4 array with random integers between a low (inclusive) and high (exclusive) value

randint_arr = np.random.randint(10, 20, size=(3, 4))

print(f"\nRandom integer array:\n{randint_arr}")

Part 2: NumPy Array Operations

This is where NumPy's power becomes evident. Operations are applied element-wise without
needing to write loops.

1. Element-wise Arithmetic (Vectorization)

Let's create two arrays to work with.

Generated python

a = np.array([1, 2, 3, 4])

b = np.array([10, 20, 30, 40])

# Addition

print(f"a + b = {a + b}") # [11 22 33 44]

# Subtraction

print(f"b - a = {b - a}") # [ 9 18 27 36]

# Multiplication

print(f"a * b = {a * b}") # [ 10 40 90 160]


# Division

print(f"b / a = {b / a}") # [10. 10. 10. 10.]

# Exponentiation

print(f"a ** 2 = {a ** 2}") # [ 1 4 9 16]

You can also perform operations with a single number (a scalar), which is broadcast to all
elements.

Generated python

print(f"a + 5 = {a + 5}") # [6 7 8 9]
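Broadcasting is not limited to scalars. As a sketch, assuming a made-up 2x3 matrix and a 3-element row, NumPy stretches the row across every row of the matrix:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
row = np.array([10, 20, 30])

# The 1D row (shape (3,)) is applied to both rows of the (2, 3) matrix.
shifted = matrix + row

print(shifted)
```

This works because the trailing dimensions match (3 and 3); shapes that cannot be aligned this way raise a ValueError instead.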

2. Indexing and Slicing

Accessing and modifying parts of an array.

Generated python

# Let's create a 2D array

matrix = np.arange(12).reshape(3, 4) # Create a 1D array 0-11 and reshape it

print(f"Original Matrix:\n{matrix}")

# [[ 0 1 2 3]

# [ 4 5 6 7]

# [ 8 9 10 11]]

# Access a single element [row, column]

print(f"\nElement at (1, 2): {matrix[1, 2]}") # 6

# Get an entire row

print(f"Row 0: {matrix[0]}") # or matrix[0, :] -> [0 1 2 3]

# Get an entire column

print(f"Column 1: {matrix[:, 1]}") # [1 5 9]

# Slicing: Get a sub-array

# Get rows 0 and 1, and columns 1 and 2


sub_matrix = matrix[0:2, 1:3]

print(f"\nSub-matrix (rows 0-1, cols 1-2):\n{sub_matrix}")

# [[1 2]

# [5 6]]

# You can also use slicing to modify values

matrix[0:2, 0] = 99 # Set the first two elements of the first column to 99

print(f"\nModified Matrix:\n{matrix}")

What is Pickling?

Pickling is the process of converting a Python object (like a list, dictionary, or even a custom object)
into a byte stream. This byte stream can be stored in a file, sent over a network, or saved in a
database.

The reverse process is called unpickling, where you convert the byte stream back into the original
Python object.

In simpler terms:

 Pickling: "Freezing" a Python object into a file.

 Unpickling: "Thawing" the object from the file back to its original state in your program.

This process is also known as serialization (pickling) and deserialization (unpickling).

Why Use Pickling?

The primary reason is to save the state of your program. Imagine you have:

 A complex dictionary of user settings that your program has built up.

 A list of custom objects representing game characters with their current health and
inventory.

 A trained machine learning model that took hours to create.

Without pickling, all this data is lost when your program closes. By pickling these objects, you can
save them to a file and load them back the next time your program runs, continuing exactly where
you left off.

The pickle Module

Python's built-in pickle module is used for this process. It has two main functions:

1. pickle.dump(obj, file): Writes the object obj to the file object file.

2. pickle.load(file): Reads a pickled object from the file object file and reconstructs it.

Crucial Note: Pickle files are binary files. You must always open them in binary mode:
 'wb' for Writing in Binary mode.

 'rb' for Reading in Binary mode.
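The pickle module also provides dumps() and loads(), which work with bytes in memory instead of a file. A minimal round-trip sketch, using a made-up settings dictionary:

```python
import pickle

settings = {"theme": "dark", "font_size": 14, "bookmarks": ["python.org"]}

# dumps() returns the pickled bytes instead of writing them to a file.
payload = pickle.dumps(settings)
# loads() rebuilds an equal, but separate, object from those bytes.
restored = pickle.loads(payload)

print(type(payload), restored == settings, restored is settings)
```

The payload is a bytes object, which is why files holding pickled data must be opened in binary mode.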

Example 1: Pickling a Simple Dictionary

Let's save a dictionary of user preferences to a file and then load it back.

Step 1: Pickling (Saving the Object)

Generated python

import pickle

# 1. The Python object we want to save
user_settings = {
    'theme': 'dark',
    'font_size': 14,
    'show_sidebar': True,
    'bookmarks': ['google.com', 'python.org'],
}

# 2. Open a file in binary write mode ('wb')
try:
    with open('settings.pkl', 'wb') as file:
        # 3. Use pickle.dump() to write the object to the file
        pickle.dump(user_settings, file)
    print("Settings have been saved successfully to 'settings.pkl'")
except IOError as e:
    print(f"An error occurred: {e}")

What happens here?

 We import the pickle module.

 We create a dictionary user_settings.


 We open a file named settings.pkl in 'wb' mode. The .pkl extension is a common convention
for pickle files.

 pickle.dump() takes our dictionary, converts it into a byte stream, and writes it into the
settings.pkl file.

If you try to open settings.pkl in a text editor, you'll see mostly unreadable binary data.

Step 2: Unpickling (Loading the Object)

Now, let's imagine we've started a new program and want to load these settings.

Generated python

import pickle

# 1. Open the file in binary read mode ('rb')
try:
    with open('settings.pkl', 'rb') as file:
        # 2. Use pickle.load() to read the object back from the file
        loaded_settings = pickle.load(file)

    print("Settings have been loaded successfully!")
    print("\n--- Loaded Settings ---")
    print(f"Theme: {loaded_settings['theme']}")
    print(f"Font Size: {loaded_settings['font_size']}")
    print(f"Bookmarks: {loaded_settings['bookmarks']}")
except FileNotFoundError:
    print("The settings file was not found. Using default settings.")
except IOError as e:
    print(f"An error occurred: {e}")

Output:

Generated code

Settings have been loaded successfully!

--- Loaded Settings ---


Theme: dark

Font Size: 14

Bookmarks: ['google.com', 'python.org']

Example 2: Pickling a Custom Object

Pickling is not limited to built-in types. You can also pickle instances of your own classes.

Generated python

import pickle

# A custom class to represent a game character
class Player:
    def __init__(self, name, level, hp):
        self.name = name
        self.level = level
        self.hp = hp
        self.inventory = []

    def display_status(self):
        print(f"Name: {self.name}")
        print(f"Level: {self.level}")
        print(f"HP: {self.hp}")
        print(f"Inventory: {self.inventory}")

# --- Pickling (Saving the game state) ---
player1 = Player('Aragorn', 15, 100)
player1.inventory.append('Sword of Anduril')
player1.inventory.append('Health Potion')

# We can even save a list of objects
game_state = [player1]

with open('gamestate.pkl', 'wb') as file:
    pickle.dump(game_state, file)
print("Game state saved.")

# --- Unpickling (Loading the game state in a new session) ---
print("\n--- A few moments later, loading game... ---\n")
with open('gamestate.pkl', 'rb') as file:
    loaded_game_state = pickle.load(file)

# The loaded object is a list containing a Player instance
loaded_player = loaded_game_state[0]
print("Game state loaded! Player status:")

# The object's methods are still intact! (pickle stores the instance's data,
# not the class code, so the Player class must be defined when loading.)
loaded_player.display_status()

Output:

Generated code

Game state saved.

--- A few moments later, loading game... ---

Game state loaded! Player status:

Name: Aragorn

Level: 15

HP: 100

Inventory: ['Sword of Anduril', 'Health Potion']

Important Warnings and Considerations


1. Security Risk: Never unpickle data from an untrusted or unauthenticated source.
Unpickling can execute arbitrary code. A malicious pickle file could be crafted to take over
your computer. It is not a secure format.

2. Python Version Compatibility: Pickle protocols can change between Python versions. A
pickle file created with a newer version of Python might not be readable by an older version.

3. Human Readability: Pickle is a binary format and is not human-readable. If you need a
human-readable format for configuration or data exchange, use JSON or YAML instead.
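The security warning can be illustrated with a deliberately harmless sketch. The class Surprise is hypothetical and exists only for this demonstration; it uses the __reduce__ hook to make pickle call a function of its choosing (here the built-in sorted) when the data is loaded:

```python
import pickle

class Surprise:
    """Hypothetical class used only to illustrate the mechanism."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call sorted([3, 1, 2])".
        # An attacker could name a dangerous callable here instead.
        return (sorted, ([3, 1, 2],))

payload = pickle.dumps(Surprise())
# Loading does not return a Surprise instance; it returns whatever the
# callable named in the payload produced.
result = pickle.loads(payload)

print(result)
```

The call to sorted happens inside pickle.loads(), before the program gets a chance to inspect the data, which is why untrusted pickle data should never be loaded.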

Pickle vs. JSON: A Quick Comparison

Feature           pickle                                                          json
Data Types        Can handle almost any Python object, including custom classes.  Limited to basic types: strings, numbers, booleans, lists, dictionaries.
Human-Readable    No (it's a binary format).                                      Yes (it's a text format).
Security          Not secure. Can execute arbitrary code.                         Secure. Only parses data.
Primary Use Case  Saving Python program state for later use by the same program.  Exchanging data between different programs, especially over the web.

3. Boolean Indexing (Filtering)

Returning to NumPy array operations: use conditions to select elements. This is extremely powerful.

Generated python

data = np.arange(1, 10).reshape(3, 3)

print(f"Data:\n{data}")

# [[1 2 3]

# [4 5 6]

# [7 8 9]]

# Find all elements greater than 5

bool_mask = data > 5

print(f"\nBoolean Mask (data > 5):\n{bool_mask}")

# Use the mask to select only the elements that are True

print(f"Elements > 5: {data[bool_mask]}") # or data[data > 5] -> [6 7 8 9]


# Use a condition to modify values

data[data % 2 == 0] = 0 # Set all even numbers to 0

print(f"\nData with even numbers set to 0:\n{data}")

4. Aggregation and Statistical Operations

Quickly compute summary statistics.

Generated python

arr = np.array([1, 5, 2, 9, 3, 7])

print(f"Sum: {arr.sum()}") # 27

print(f"Mean: {arr.mean()}") # 4.5

print(f"Max: {arr.max()}") # 9

print(f"Min: {arr.min()}") # 1

print(f"Standard Deviation: {arr.std()}") # ~2.81

print(f"Index of Max value: {arr.argmax()}") # 3

For 2D arrays, you can perform these operations on the entire matrix or along a specific axis:

 axis=0: Operation along the columns (collapses rows).

 axis=1: Operation along the rows (collapses columns).

Generated python

matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(f"\nMatrix:\n{matrix}")

# Sum of all elements

print(f"Total sum: {matrix.sum()}") # 21


# Sum along columns (axis=0)

print(f"Column sums: {matrix.sum(axis=0)}") # [1+4, 2+5, 3+6] -> [5 7 9]

# Sum along rows (axis=1)

print(f"Row sums: {matrix.sum(axis=1)}") # [1+2+3, 4+5+6] -> [ 6 15]

5. Reshaping and Transposing

Generated python

arr = np.arange(1, 7) # [1 2 3 4 5 6]

# Reshape to a 2x3 matrix

reshaped_arr = arr.reshape(2, 3)

print(f"Reshaped array (2x3):\n{reshaped_arr}")

# Transpose the matrix (swaps rows and columns)

transposed_arr = reshaped_arr.T

print(f"\nTransposed array (3x2):\n{transposed_arr}")
