Handbook - Course Tech
Handbook
Module A
1. Datatypes in Python:
1.1 Integer (int)
1.2 Float (float)
1.3 String (str)
1.4 Boolean (bool)
1.5 List (list)
1.6 Tuple
1.7 Dictionary (dict)
1.8 Set (set)
1.9 None Type (None)
1.10 Complex (complex)
2. Logical Operators Overview
3. Comparison Operators
4. Arithmetic Operators in Python
Examples of Usage
4.2 Advanced Operations
4.3 Operations on Python Data Types
5. Syntax of if-elif-else Statements
6. For Loop
Syntax:
7. While Loop
Syntax:
8. break Statement:
Syntax:
9. continue Statement:
Syntax:
10. pass Statement:
Syntax:
11. General Functions for Data Types
Common Functions
12. List Comprehensions
Syntax Example
13. Dictionary Comprehension:
Example:
Explanation:
14. Lambda Functions:
Example:
Syntax:
15. Recursion:
15.1 What is Recursion?
15.2 How does Recursion work?
15.3 Example: Fibonacci Sequence using Recursion
16. Variable-Length Arguments (*args) and Keyword Arguments (**kwargs):
16.1 Keyword Arguments (**kwargs):
16.2 Combining *args and **kwargs
Example:
16.3 Using with Functions:
17. File Handling in Python
Key Concepts
17.1 Opening and Closing Files: Use open() to open a file and close() to close it.
17.2 Modes:
17.3 Reading Files: Use read(), readline(), or readlines() to read file content.
17.4 Writing Files: Use write() to write data to a file.
17.5 Context Manager: Use with to handle files automatically (no need to call close()).
17.6 Opening a File and Reading Content:
17.7 Writing to a File:
17.8 Appending to a File:
17.9 Reading File Line by Line:
17.11 Seek and Tell:
18. Sorting Techniques and Analysis:
18.1 Bubble Sort Concept
18.2 Quick Sort Concept
18.3 Merge Sort Concept
18.4 Insertion Sort Concept:
18.5 Selection Sort
19. Markov Chain:
Key Concepts
19.1 States:
19.2 Transition Probabilities:
19.3 Initial State Distribution:
19.4 Stationary Distribution:
19.5 Properties
19.6 Steps to Solve Markov Chain Problems
20. NumPy
20.1 Array
20.2 Difference between NumPy arrays and lists
20.3 Higher dimension arrays
20.4 Array Properties/Attributes
20.5 Accessing Array Elements
20.6 Conditional Indexing
20.7 Mutability
20.8 Referencing
20.9 Reshaping
20.10 Data Types
Changing data types
Initialization
20.11 Zeros and Ones
20.12 Range
20.13 Operations
Arithmetic
20.14 Matrix
Matrix Multiplication
20.15 Transpose
20.16 Determinant
20.17 Universal Functions
21. Random Module
21.1 Distribution
21.2 Seed
22. Statistical Analysis
22.1 Mean
22.2 Median and Mode
22.3 Standard deviation and Variance
22.4 Visualization with Matplotlib
Example:
23. Pandas DataFrame
23.1 Accessing Multiple Columns Concept
23.2 Introduction to Pandas DataFrames
23.3 Creating a DataFrame:
23.4 Basic Information About the DataFrame:
23.5 Accessing Data
Accessing Columns:
Accessing Rows:
Summary Statistics:
Additional Pandas Topics:
23.6 Filtering Data:
Handling Missing Data:
23.7 Slicing Rows Concept
Syntax
Example
23.8 Accessing First and Last Few Rows
Accessing First Rows
23.9 Syntax
Example
23.10 Accessing Last Rows Concept
Syntax
Example
23.11 Statistical Analysis on Data Concept
Common Functions
Concept
Syntax
Examples
2. Details of a Specific Student (e.g., Bob):
3. Specific Columns for a Condition:
4. Students with a Specific Age:
23.12 Reading Data from CSV Concept
Syntax
23.13 Getting DataFrame Information Concept
Finding Specific Values in DataFrame Concept
Example
Module B
1 Regression: Linear and Polynomial
1.1 Linear Regression
Equation:
1.2 Polynomial Regression
Equation:
Properties:
1.3 Overfitting vs. Underfitting
1.3.1 Overfitting
1.3.2 Underfitting
2 Principal Component Analysis (PCA)
2.1 Process of PCA
2.2 Formula
2.3 Covariance Matrix
2.4 Important Properties
3 PageRank
3.1 Process
3.2 Components and Their Effect
4 Neural Networks
4.1 Foundations of the Perceptron
Prediction:
4.2 Enhancing the Perceptron with Learning Algorithms
Weight Update Rule:
4.3 Neuron: Basic Building Block
4.4 Components of a Neuron
4.5 Mathematical Representation
4.6 Structure of a Neural Network
5 Forward and Backward Propagation
5.1 Overview
5.2 Forward Pass
5.3 Backward Pass
5.4 Error Calculation
5.5 Activation Functions
6 Convolution neural networks (CNN)
6.1 Output Size Calculation
Where:
7 Support Vector Machine (SVM)
7.1 Process
7.2 Formula
7.3 Important Properties
8 Supervised Learning - Evaluation Metrics
8.1 Confusion Matrix
8.2 Accuracy
8.3 Precision
8.4 Recall
8.5 Mean Absolute Error (MAE)
8.6 Mean Squared Error (MSE)
9 Evaluation Metrics for Unsupervised Learning
9.1 Silhouette Score
Process:
Formula:
9.2 Reconstruction Error (Dimensionality Reduction)
Process:
Formula:
9.3 Important Properties
Silhouette Score:
Reconstruction Error:
10 K-Means Clustering
10.1 Process
10.2 Formula
10.3 Important Properties
Handbook
Module A
1. Datatypes in Python:
1.1 Integer (int)
Description: Integers represent whole numbers, both positive and negative, without decimals.
Example:
age = 25
temperature = -5
Explanation: In this example, age and temperature are integers because they don’t have any decimal
parts.
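1.2 Float (float)
Description: Floats represent real numbers that have a fractional (decimal) part.
Example:
price = 9.99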
weight = 70.5
Explanation: Both price and weight have decimal parts, which makes them float values.
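The built-in type() function reports which type Python assigned to a value. A minimal sketch reusing the integer and float example values above, also showing what happens when the two types are mixed or converted:

```python
age = 25
temperature = -5
price = 9.99
weight = 70.5

# type() reports the class of each value
print(type(age))    # <class 'int'>
print(type(price))  # <class 'float'>

# Mixing an int and a float in arithmetic produces a float
total = age + weight
print(total, type(total))  # 95.5 <class 'float'>

# int() drops the fractional part; float() adds one
print(int(weight))  # 70
print(float(age))   # 25.0
```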
1.3 String (str)
Description: Strings are sequences of characters enclosed within quotes. They are used for text.
Example:
name = "Alice"
greeting = "Hello, world!"
Explanation: Here, name and greeting are strings because they contain characters enclosed in quotes.
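Strings behave like read-only sequences of characters. This short sketch, using the same two variables, shows indexing, slicing, concatenation and the fact that string methods return new strings rather than changing the original:

```python
name = "Alice"
greeting = "Hello, world!"

# Strings can be indexed and sliced like other sequences
print(name[0])       # A
print(greeting[:5])  # Hello

# + concatenates strings; len() counts characters
message = greeting + " I am " + name
print(message)    # Hello, world! I am Alice
print(len(name))  # 5

# Strings are immutable: upper() returns a new string
shout = name.upper()
print(shout, name)  # ALICE Alice
```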
1.4 Boolean (bool)
Description: Booleans represent one of two values: True or False. They are often used in conditional
statements.
Example:
is_active = True
has_permission = False
Explanation: is_active and has_permission are Boolean values, representing True or False.
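Because Booleans are mostly used in conditions, it helps to see how other values convert to True or False. A minimal sketch with the two example variables; the printed message is illustrative only:

```python
is_active = True
has_permission = False

# Booleans drive conditional statements
if is_active and not has_permission:
    print("Active, but access is restricted")

# bool() converts other values: zero and empty values are False
print(bool(0), bool(""), bool([]))        # False False False
print(bool(42), bool("text"), bool([1]))  # True True True

# Comparisons also produce Boolean results
print(5 > 3)  # True
```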
1.5 List (list)
Description: Lists are ordered collections of items, which can be of any data type. Lists are mutable,
meaning their contents can be changed.
Example:
Explanation: Here, fruits is a list of strings, and numbers is a list of integers. Lists are written within square
brackets [].
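The following sketch illustrates the two lists and their mutability. The handbook names fruits and numbers but their contents are not given here, so the values below are assumed for illustration:

```python
# Illustrative values: only the variable names come from the handbook
fruits = ["apple", "banana", "cherry"]
numbers = [1, 2, 3, 4, 5]

# Lists are mutable: items can be replaced, added, or removed in place
fruits[0] = "mango"
fruits.append("orange")
numbers.remove(3)  # removes the first occurrence of the value 3

print(fruits)   # ['mango', 'banana', 'cherry', 'orange']
print(numbers)  # [1, 2, 4, 5]

# A single list may hold items of different data types
mixed = [1, "two", 3.0, True]
print(len(mixed))  # 4
```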
1.6 Tuple
Description: Tuples are ordered collections of items, similar to lists, but they are immutable, meaning they
cannot be changed once created.
Example:
Explanation: coordinates and colors are tuples. They are enclosed within parentheses () and cannot be
modified.
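A minimal sketch of the two tuples, with assumed example values (the handbook names coordinates and colors but their contents are not shown). It demonstrates that tuples can be read and unpacked, while any attempt to change an item raises a TypeError:

```python
# Illustrative values: only the variable names come from the handbook
coordinates = (10.0, 20.0)
colors = ("red", "green", "blue")

# Tuples support indexing and unpacking like other sequences
x, y = coordinates
print(x, y)       # 10.0 20.0
print(colors[1])  # green

# Item assignment is not allowed on a tuple
try:
    colors[0] = "yellow"
except TypeError as err:
    print("Cannot modify a tuple:", err)
```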
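1.7 Dictionary (dict)
Description: Dictionaries store data as key-value pairs. Each key is unique and maps to a value, and dictionaries are mutable.
Explanation: person is a dictionary where each piece of data (like "name" and "age") is associated with a unique key. Dictionaries are written within curly braces {}.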
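Values in a dictionary are retrieved by key rather than by position. A short sketch, assuming illustrative values for the "name" and "age" keys of person:

```python
# Illustrative values: the keys "name" and "age" come from the handbook
person = {"name": "Alice", "age": 25}

# Values are looked up by key
print(person["name"])  # Alice

# get() returns a default instead of raising KeyError for a missing key
print(person.get("city", "unknown"))  # unknown

# items() yields (key, value) pairs in insertion order
for key, value in person.items():
    print(key, value)

print(list(person.keys()))  # ['name', 'age']
```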
1.8 Set (set)
Description: Sets are collections of unique items. They are unordered, meaning there’s no guaranteed
order for items. Sets automatically eliminate duplicate values.
Example:
unique_numbers = {1, 2, 3, 4, 4, 5}
Explanation: unique_numbers is a set. Even though 4 is added twice, the set stores only unique values {1,
2, 3, 4, 5}.
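Since sets keep only unique values, they are a common way to remove duplicates and to test membership. In this sketch the readings list is a hypothetical example; sorted() is used before printing because a set has no guaranteed order:

```python
unique_numbers = {1, 2, 3, 4, 4, 5}

# The duplicate 4 is stored only once
print(sorted(unique_numbers))  # [1, 2, 3, 4, 5]
print(len(unique_numbers))     # 5

# Converting a list to a set drops duplicates (readings is hypothetical)
readings = [3, 1, 3, 2, 1]
print(sorted(set(readings)))   # [1, 2, 3]

# Membership test
print(4 in unique_numbers)     # True

# Adding a value that is already present changes nothing
unique_numbers.add(5)
print(len(unique_numbers))     # 5
```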
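1.9 None Type (None)
Description: None represents the absence of a value. It is the only value of the type NoneType and is often used as a default or placeholder.
Example:
result = None
1.10 Complex (complex)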
Description: Complex numbers consist of a real and an imaginary part, represented as a + bj where a is
the real part and b is the imaginary part.
Example:
z = 3 + 4j
Explanation: z is a complex number with real part 3 and imaginary part 4 (written 4j, where j denotes the imaginary unit).
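Python exposes the two parts of a complex number as attributes, and abs() returns its magnitude. A small sketch using the same z; the second number w is a hypothetical value added for the addition example:

```python
z = 3 + 4j

# The real and imaginary parts are stored as floats
print(z.real)  # 3.0
print(z.imag)  # 4.0

# abs() gives the magnitude sqrt(3**2 + 4**2)
print(abs(z))  # 5.0

# complex(real, imag) builds a complex number from its parts
w = complex(1, -2)
print(z + w)   # (4+2j)
```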
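2. Logical Operators Overview
Example:
has_lantern = False  # example values
has_map = True
has_key = False
if has_lantern or has_map:
    print("You can proceed carefully.")  # runs: at least one operand is True
if not has_key:
    print("You need the key.")  # runs: not False is True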
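3. Comparison Operators
Operator    Description            Example    Result
==          Equal to               5 == 5     True
Arithmetic Operators
Operator    Description            Example    Result
+           Addition               5 + 3      8
-           Subtraction            5 - 3      2
*           Multiplication         5 * 3      15
/           Division               5 / 3      1.6666666666666667
//          Floor Division         5 // 3     1
%           Modulus (Remainder)    5 % 3      2
**          Exponentiation         5 ** 3     125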
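The comparison table above lists only ==. The sketch below shows all six comparison operators on two sample integers. Each one returns a Boolean.

```python
a, b = 5, 3
results = {
    "a == b": a == b,   # False
    "a != b": a != b,   # True
    "a > b": a > b,     # True
    "a < b": a < b,     # False
    "a >= b": a >= b,   # True
    "a <= b": a <= b,   # False
}
for expr, value in results.items():
    print(expr, "->", value)
```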
Examples of Usage
a = 10
b = 3
print(a + b) # Output: 13
print(a - b) # Output: 7
x = 7
y = 2
print(x ** y) # 7 to the power of 2 = 49
print(x // y) # Floor division = 3
print(x % y) # Remainder = 1
4.4 Strings
Operation Description Example Result
4.6 Lists
4.7 Tuples
4.8 Sets
4.9 Dictionaries
Operation          Description                       Example                       Result
Add/Update Pair    Add or update a key-value pair    d["new_key"] = "new_value"    {'key': 'value', 'new_key': 'new_value'}
Delete Pair        Remove a key-value pair           del d["key"]                  {}
Results assume d = {"key": "value"} before each operation.
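The two dictionary rows can be checked directly. Each one starts from d = {"key": "value"}, which matches the results shown.

```python
d1 = {"key": "value"}
d1["new_key"] = "new_value"   # adds the pair (or updates it if the key already exists)
print(d1)                     # {'key': 'value', 'new_key': 'new_value'}

d2 = {"key": "value"}
del d2["key"]                 # removes the pair; raises KeyError if the key is missing
print(d2)                     # {}
```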
if condition1:
    # Code block for condition1
elif condition2:
    # Code block for condition2
else:
    # Code block for when none of the above conditions are true
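Here is a concrete instance of the if / elif / else pattern. The grade cut-offs are hypothetical. Python checks the conditions from top to bottom and runs only the first block whose condition is True.

```python
def grade(score):
    """Return a letter grade; the cut-offs are hypothetical."""
    if score >= 90:
        return "A"
    elif score >= 75:      # only reached when score < 90
        return "B"
    else:                  # none of the conditions above were True
        return "C"

print(grade(95), grade(80), grade(60))  # A B C
```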
6. For Loop
Syntax:
for item in iterable:
    # Code to execute for each item
7. While Loop
Syntax:
while condition:
    # Code to execute
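The sketch below puts both loop forms side by side. A for loop runs once per item of an iterable. A while loop repeats until its condition becomes False, so the loop variable must be updated inside the body.

```python
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:          # for: one iteration per item
    print(fruit)

count = 0
while count < 3:              # while: repeats while the condition is True
    print("count =", count)
    count += 1                # without this update the loop would never end
```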
8. break Statement:
The break statement is used to exit the loop early, regardless of the loop's condition. It terminates the loop
immediately and control is transferred to the next statement after the loop.
Syntax:
for i in range(10):
    if i == 5:
        break  # exit the loop when i is 5
    print(i)  # prints 0 through 4
9. continue Statement:
The continue statement skips the current iteration of the loop and continues to the next iteration, without
executing the remaining code for the current loop iteration.
Syntax:
for i in range(10):
    if i == 5:
        continue  # skip the rest of this iteration when i is 5
    print(i)  # prints 0-4 and 6-9
10. pass Statement:
The pass statement is a placeholder that does nothing when executed. It is used where Python's syntax requires a
statement but no action is needed, such as an empty function body or a branch you plan to fill in later.
Syntax:
for i in range(10):
    if i == 5:
        pass  # do nothing when i is 5
    else:
        print(i)  # prints 0-4 and 6-9
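A common use of pass is as the body of a function or class that is not written yet. The names load_data and Model below are hypothetical.

```python
def load_data():
    pass          # placeholder body; the function currently does nothing

class Model:
    pass          # an empty class definition is also valid with pass

print(load_data())   # None, because the body never returns anything
```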
Common Functions
List comprehensions provide a concise way to create lists. They combine loops and conditional statements into a
single line of code.
Syntax Example
numbers = [1, 2, 3, 4, 5]
squares = [x**2 for x in numbers if x > 2]
print(squares) # Output: [9, 16, 25]
Explanation: The above code iterates through the list numbers, selects elements greater than 2, squares
them, and stores the results in the new list squares.
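To see what the comprehension does, compare it with the equivalent explicit loop. Both produce the same list.

```python
numbers = [1, 2, 3, 4, 5]

# Comprehension
squares = [x**2 for x in numbers if x > 2]

# Equivalent explicit loop
squares_loop = []
for x in numbers:
    if x > 2:                     # the filter condition
        squares_loop.append(x**2)

print(squares == squares_loop)  # True
```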
Dictionary comprehensions provide a concise way to create dictionaries using a loop and an optional condition in a single line.
Syntax: {key_expression: value_expression for item in iterable if condition}
Example:
numbers = [1, 2, 3, 4, 5]
squares = {num: num**2 for num in numbers if num > 2}
print(squares)
Output:
{3: 9, 4: 16, 5: 25}
Explanation:
{num: num**2 for num in numbers if num > 2}: This dictionary comprehension iterates through the list
numbers and includes only those numbers greater than 2.
For each number, it creates a key-value pair where the key is the number (num) and the value is the square of
the number (num**2).
sorted() Function:
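Here is a minimal sketch of sorted(). It returns a new sorted list and leaves the original unchanged. The optional reverse and key parameters control the order.

```python
nums = [5, 2, 9, 1]
print(sorted(nums))                 # [1, 2, 5, 9]  (a new list)
print(sorted(nums, reverse=True))   # [9, 5, 2, 1]
print(nums)                         # [5, 2, 9, 1]  (original unchanged)

words = ["banana", "kiwi", "apple"]
print(sorted(words, key=len))       # ['kiwi', 'apple', 'banana']  (sorted by length)
```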
Recursion is a programming technique in which a function calls itself in order to solve smaller instances of the same
problem. A recursive function typically has two parts:
1. Base Case: The condition under which the function stops calling itself and returns a result. Without a base
case, the function would call itself indefinitely.
2. Recursive Case: The part where the function calls itself with modified arguments, progressively solving smaller
parts of the problem.
15.2 How does Recursion work?
A recursive function reduces the problem into a simpler or smaller version of the same problem.
The function keeps calling itself with simpler inputs until it hits the base case.
The function then starts returning the results and building up the solution step by step.
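Factorial is a standard example of the two parts described above. The base case stops the recursion, and each recursive call works on a smaller input.

```python
def factorial(n):
    if n <= 1:                    # base case: stops the recursion
        return 1
    return n * factorial(n - 1)   # recursive case: a smaller instance of the same problem

print(factorial(5))  # 120, built up as 5 * 4 * 3 * 2 * 1 on the way back
```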
The Fibonacci sequence is a series of numbers in which each number is the sum of the two preceding ones. The
sequence starts with 0 and 1, and the subsequent numbers are: 1, 2, 3, 5, 8, 13, 21, 34, ...
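The definition translates directly into a recursive function with two base cases.

```python
def fibonacci(n):
    """Return the n-th Fibonacci number, with fibonacci(0) == 0 and fibonacci(1) == 1."""
    if n < 2:                                    # base cases: F(0) = 0, F(1) = 1
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)   # recursive case

print([fibonacci(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

This plain version recomputes the same values many times, so its running time grows exponentially with n. Decorating it with functools.lru_cache is one common way to avoid that.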
You can use both *args and **kwargs in the same function to handle both positional and keyword arguments.
def describe(*args, **kwargs):
    print("Positional arguments:", args)
    print("Keyword arguments:", kwargs)
Example:
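Here is one possible call, with argument values chosen for illustration. This version also returns args and kwargs so the result can be inspected; that return is an addition to the function above.

```python
def describe(*args, **kwargs):
    print("Positional arguments:", args)    # args is a tuple
    print("Keyword arguments:", kwargs)     # kwargs is a dict
    return args, kwargs                     # added only so the result can be checked

describe(1, 2, name="Alice", age=25)
# Positional arguments: (1, 2)
# Keyword arguments: {'name': 'Alice', 'age': 25}
```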
16.3 Using with Functions:
*args and **kwargs make your functions flexible and reusable by allowing them to accept varying numbers
and types of inputs.
Key Concepts
17.1 Opening and Closing Files: Use open() to open a file and close() to close it.
17.2 Modes:
'r' : Read (default mode).
'w' : Write (creates the file, or overwrites it if it exists).
'a' : Append (adds content to the end of the file, creating it if needed).
'r+' : Read and write (the file must already exist).
17.3 Reading Files: Use read() , readline() , or readlines() to read file content.
17.4 Writing Files: Use write() to write data to a file.
17.5 Context Manager: Use with to handle files automatically (no need to call close() ).
Examples
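Here is a sketch that writes, appends to and reads back a file using with. The file name notes.txt is hypothetical, and the file is created in a temporary directory so no real files are touched.

```python
import os
import tempfile

folder = tempfile.mkdtemp()                 # throwaway directory for the example
path = os.path.join(folder, "notes.txt")    # hypothetical file name

with open(path, "w") as f:      # 'w' creates or overwrites the file
    f.write("first line\n")

with open(path, "a") as f:      # 'a' appends to the end
    f.write("second line\n")

with open(path, "r") as f:      # 'r' reads; the with block closes the file automatically
    lines = f.readlines()

print(lines)  # ['first line\n', 'second line\n']
```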
Notes:
Best case: O(n) – This occurs when the list is already sorted and the algorithm stops as soon as a pass makes no
swaps; one pass through the list is then enough to confirm that no swaps are needed.
Worst case: O(n²) – This occurs when the list is sorted in reverse order, meaning the algorithm has to compare
and swap each element in every pass.
Space Complexity: O(1) (in-place sorting).
How It Works
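The complexities above match bubble sort with an early-exit check. Here is a minimal sketch under that assumption. Adjacent out-of-order pairs are swapped, and the loop stops after a pass with no swaps, which gives the O(n) best case.

```python
def bubble_sort(items):
    """Sort a list in place with bubble sort and return it."""
    n = len(items)
    for i in range(n - 1):
        swapped = False
        # After pass i, the largest i + 1 elements are in their final places
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:   # no swaps in this pass: the list is already sorted
            break
    return items

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```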
18.2 Quick Sort
Concept
Quick Sort is a divide-and-conquer algorithm. It picks a "pivot" element and partitions the list into two sublists:
elements smaller than the pivot and elements greater than the pivot. Then, it recursively sorts the sublists.
Time Complexity
Best case: O(n log n) – This occurs when the pivot divides the list into two nearly equal halves. This results in
balanced partitioning at each step, leading to efficient sorting.
Worst case: O(n²) – This occurs when the pivot is repeatedly the smallest or largest element of the sublist being
partitioned (for example, always picking the last element of an already-sorted list). The partitioning is then
unbalanced (one side has almost all the elements, and the other side has very few), leading to poor performance.
Space Complexity: O(log n) on average for the recursion stack (O(n) in the worst case).
How It Works
1. Choose a pivot element from the list (commonly the last element).
2. Partition the list into two sublists: one with elements smaller than the pivot and one with elements greater than the
pivot.
3. Recursively apply the same process to the sublists.
4. Once the sublists have been sorted, the entire list is sorted.
Example
Pivot: 70
Partitioned: [10, 30, 40, 50, 70, 90, 80]
Now recursively sort the sublists: [10, 30, 40, 50] and [90, 80]
Final sorted list: [10, 30, 40, 50, 70, 80, 90]
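The initial list of this example is not shown. The partitioned list is exactly what the Lomuto partition scheme (pivot = last element) produces from [10, 80, 30, 90, 40, 50, 70], so the sketch assumes that input.

```python
def partition(items, low, high):
    """Lomuto partition: use items[high] as the pivot and return its final index."""
    pivot = items[high]
    i = low - 1                              # end of the "smaller than pivot" region
    for j in range(low, high):
        if items[j] < pivot:
            i += 1
            items[i], items[j] = items[j], items[i]
    items[i + 1], items[high] = items[high], items[i + 1]   # put the pivot in place
    return i + 1

def quick_sort(items, low=0, high=None):
    if high is None:
        high = len(items) - 1
    if low < high:
        p = partition(items, low, high)
        quick_sort(items, low, p - 1)        # sort the left sublist
        quick_sort(items, p + 1, high)       # sort the right sublist
    return items

data = [10, 80, 30, 90, 40, 50, 70]   # assumed input; the pivot is its last element, 70
partition(data, 0, len(data) - 1)
print(data)               # [10, 30, 40, 50, 70, 90, 80]
print(quick_sort(data))   # [10, 30, 40, 50, 70, 80, 90]
```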
18.3 Merge Sort
Concept
Merge Sort is another divide-and-conquer algorithm. It divides the list into two halves, recursively sorts each half,
and then merges the two sorted halves into a single sorted list.
Time Complexity
Best case: O(n log n) – The time complexity is always O(n log n) because the list is always divided in half, and
merging still takes linear time even in the best case.
Worst case: O(n log n) – Merge Sort performs the same regardless of the initial order of elements because the
list is always recursively divided and merged in a consistent manner.
Space Complexity: O(n) for the temporary lists used while merging.
How It Works
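Here is a minimal sketch of merge sort as described: split in half, sort each half recursively, then merge the two sorted halves in linear time. The input list is arbitrary.

```python
def merge_sort(items):
    """Return a sorted list using merge sort."""
    if len(items) <= 1:               # base case
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])    # divide and recursively sort each half
    right = merge_sort(items[mid:])

    # Merge the two sorted halves
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:       # <= keeps equal elements in their original order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])           # at most one of these two still has elements
    merged.extend(right[j:])
    return merged

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```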
18.4 Insertion Sort
Concept:
Insertion Sort is a simple, comparison-based sorting algorithm that builds the final sorted array one item at a time.
It is analogous to the way people sort playing cards in their hands.
Time Complexity:
Best case: O(n) – Occurs when the list is already sorted. Only one comparison is needed per element.
Worst case: O(n²) – Occurs when the list is sorted in reverse order. Each element is compared with all previous
elements.
Space Complexity: O(1) (in-place sorting).
How It Works:
1. Divide the list into a "sorted" and "unsorted" section. Initially, the first element is considered sorted.
2. Take the first element from the unsorted section and insert it into its correct position in the sorted section.
3. Repeat the process for all elements in the unsorted section until the entire list is sorted.
Example:
Initial list: 12, …
Result: 5, 6, 11, 12, 13
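The example gives only the first element (12) and the result. The input [12, 11, 13, 5, 6] used below is an assumption consistent with both. In the sketch, items[:i] plays the role of the sorted section.

```python
def insertion_sort(items):
    """Sort a list in place with insertion sort and return it."""
    for i in range(1, len(items)):          # items[:i] is the sorted section
        key = items[i]                      # first element of the unsorted section
        j = i - 1
        while j >= 0 and items[j] > key:    # shift larger sorted elements one step right
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = key                  # insert key into its correct position
    return items

print(insertion_sort([12, 11, 13, 5, 6]))  # [5, 6, 11, 12, 13]
```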
18.5 Selection Sort
Concept:
Selection Sort is a comparison-based sorting algorithm that repeatedly selects the smallest (or largest) element
from the unsorted section and moves it to the sorted section.
Time Complexity:
Best case: O(n²) – Even in the best case, the algorithm performs the same number of comparisons as it does in
the worst case.
Worst case: O(n²) – Each element is compared to all others to find the minimum.
How It Works:
1. Divide the list into a "sorted" and "unsorted" section. Initially, the entire list is unsorted.
2. Find the smallest element in the unsorted section and swap it with the first element of the unsorted section.
3. Move the boundary between the sorted and unsorted sections one element to the right.
4. Repeat the process until the entire list is sorted.
Example:
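Initial list: 64, 25, 12, 22, 11
1. Find the smallest element (11) and swap it with 64. Result: 11, 25, 12, 22, 64
2. Find the next smallest element (12) and swap it with 25. Result: 11, 12, 25, 22, 64
3. Find the next smallest element (22) and swap it with 25. Result: 11, 12, 22, 25, 64
4. The remaining elements (25, 64) are already in place, so the list is sorted.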
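Here is a minimal sketch of the steps above. Index i marks the boundary between the sorted and unsorted sections. Each pass swaps the smallest remaining element into position i.

```python
def selection_sort(items):
    """Sort a list in place with selection sort and return it."""
    for i in range(len(items) - 1):        # items[:i] is the sorted section
        min_index = i
        for j in range(i + 1, len(items)): # find the smallest element in the unsorted section
            if items[j] < items[min_index]:
                min_index = j
        items[i], items[min_index] = items[min_index], items[i]
    return items

print(selection_sort([64, 25, 12, 22, 11]))  # [11, 12, 22, 25, 64]
```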
A Markov chain is a stochastic process that satisfies the Markov property, which states that the future state of a
process depends only on the present state and not on the sequence of events that preceded it. Markov chains are
widely used for modeling probabilistic systems that transition between states.
Key Concepts
19.1 States:
P = [
[P(S1 -> S1), P(S1 -> S2), P(S1 -> S3)],
[P(S2 -> S1), P(S2 -> S2), P(S2 -> S3)],
[P(S3 -> S1), P(S3 -> S2), P(S3 -> S3)]
]
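The sketch below uses the matrix layout above with hypothetical probabilities. Multiplying a distribution over states by P gives the distribution one step later. Repeating the step approaches a stationary distribution π with πP = π.

```python
import numpy as np

# Hypothetical transition matrix; row i holds P(Si -> S1), P(Si -> S2), P(Si -> S3)
P = np.array([
    [0.90, 0.075, 0.025],
    [0.15, 0.80, 0.05],
    [0.25, 0.25, 0.50],
])
row_sums = P.sum(axis=1)          # each row must sum to 1

state = np.array([1.0, 0.0, 0.0]) # start in S1 with certainty
next_state = state @ P            # after one step: 0.9, 0.075, 0.025 (the first row of P)

dist = state
for _ in range(500):              # many steps: approaches the stationary distribution
    dist = dist @ P
print(dist.round(4))              # approximately 0.625, 0.3125, 0.0625
```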
19.5 Properties
20. NumPy
NumPy (Numerical Python) is a foundational library for numerical and scientific computing in Python. It provides
robust support for large multi-dimensional arrays and matrices, along with an extensive collection of high-level
mathematical functions to manipulate these arrays. NumPy is highly optimized for performance and is a cornerstone
in fields like data science, machine learning, and scientific research.
20.1 Array
An array is a central data structure in NumPy, enabling efficient storage and manipulation of homogeneous data
types.
import numpy as np
# Creating a 1D array
arr = np.array([1, 2, 3, 4])
# Creating a 2D array
arr_2d = np.array([[1, 2], [3, 4]])
# Creating a 3D array
arr_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
NumPy arrays offer better performance, more functionality, and memory efficiency compared to Python lists. Unlike
lists, NumPy arrays are homogeneous, allowing for optimized operations.
20.3 Higher dimension arrays
Higher-dimensional arrays are useful for applications like image processing or deep learning.
NumPy arrays have several attributes that provide information about their structure and data.
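Here are the most common attributes on a small 2D array. The exact integer dtype depends on the platform.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d.shape)   # (2, 3): 2 rows, 3 columns
print(arr_2d.ndim)    # 2: number of dimensions
print(arr_2d.size)    # 6: total number of elements
print(arr_2d.dtype)   # an integer type, e.g. int64 on most 64-bit Linux/macOS builds
```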
Accessing elements in a NumPy array uses zero-based indexing, similar to Python lists.
arr = np.array([10,20,30,40,50])
print(arr[arr > 30]) # Retrieves elements greater than 30
arr[arr > 30] = -1 # Updates elements greater than 30 to -1
print(arr)
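Besides the boolean mask above, arrays support plain zero-based indexing, negative indexing and slicing. A 2D array is indexed with one index per dimension.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
print(arr[0])       # 10 (first element)
print(arr[-1])      # 50 (last element)
print(arr[1:4])     # [20 30 40] (start included, stop excluded)

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d[1, 2])   # 6 (row 1, column 2)
print(arr_2d[:, 0])   # [1 4] (every row, column 0)
```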
20.7 Mutability
NumPy arrays are mutable, meaning you can modify their contents.
arr[0] = 99  # Modifying an element
print(arr)  # [99 20 30 -1 -1]
20.8 Referencing
Assigning an array to another variable creates a reference, not a copy. Changes in the new variable affect the
original.
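ref = arr
ref[1] = 200  # change made through the new name
print(arr)  # Original array is also modified: arr[1] is now 200
# Use arr.copy() when you need an independent array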
20.9 Reshaping
You can reshape arrays to change their structure without altering the data.
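Here is a minimal reshape() sketch. The new shape must hold the same number of elements, and -1 lets NumPy infer one dimension.

```python
import numpy as np

arr = np.arange(6)            # [0 1 2 3 4 5]
matrix = arr.reshape(2, 3)    # same 6 values arranged as 2 rows x 3 columns
print(matrix)
# [[0 1 2]
#  [3 4 5]]
flat = matrix.reshape(-1)     # -1: NumPy infers the length, giving a 1D array again
print(flat.shape)             # (6,)
```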
The astype() method converts an array to a different data type and returns a new array:
arr_float = arr.astype(float)
print(arr_float.dtype)  # float64
print(arr_float)
Initialization: Zeros and Ones
You can initialize arrays filled with zeros or ones, which is useful for default values.
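np.zeros and np.ones take a shape and an optional dtype. The default dtype is float64.

```python
import numpy as np

zeros = np.zeros((2, 3))          # 2x3 array of 0.0 (float64 by default)
ones = np.ones(4, dtype=int)      # [1 1 1 1] with an explicit integer dtype
print(zeros)
print(ones)
```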
20.12 Range
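This sketch assumes that "Range" refers to np.arange, which works like Python's range(). np.linspace is shown for comparison.

```python
import numpy as np

steps = np.arange(0, 10, 2)        # [0 2 4 6 8]: start, stop (excluded), step
points = np.linspace(0, 1, 5)      # 5 evenly spaced values from 0 to 1, stop included
print(steps)
print(points)                      # values 0, 0.25, 0.5, 0.75, 1
```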
20.13 Operations
Arithmetic
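Arithmetic operators on NumPy arrays work element by element. A scalar is applied to every element (broadcasting).

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])
print(a + b)    # [11 22 33]
print(b - a)    # [ 9 18 27]
print(a * b)    # [10 40 90] element-wise, not matrix multiplication
print(b / a)    # [10. 10. 10.]
print(a * 2)    # [2 4 6]
```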
20.14 Matrix
Matrix Multiplication
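Matrix multiplication uses the @ operator (or np.matmul / np.dot for 2D arrays). This is different from the element-wise *.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = A @ B          # row-by-column products, e.g. C[0, 0] = 1*5 + 2*7 = 19
print(C)
# [[19 22]
#  [43 50]]
```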
20.15 Transpose
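The transpose swaps rows and columns. It is available as the .T attribute or through np.transpose().

```python
import numpy as np

M = np.array([[1, 2, 3], [4, 5, 6]])
print(M.T)                      # shape (2, 3) becomes (3, 2)
# [[1 4]
#  [2 5]
#  [3 6]]
print(np.transpose(M).shape)    # (3, 2), same as M.T
```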
20.16 Determinant
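np.linalg.det computes the determinant of a square matrix. The result is a float and may carry tiny rounding error.

```python
import numpy as np

A = np.array([[4, 7], [2, 6]])
d = np.linalg.det(A)    # 4*6 - 7*2 = 10
print(round(d, 6))      # 10.0
```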
21.1 Distribution
numpy.random.exponential(scale=1.0, size=None)
# scale is the inverse of the rate parameter (scale = 1/λ); size determines how many samples are drawn (or the output shape)
arr = np.random.exponential(1, 10000)  # 10,000 samples with rate λ = 1
21.2 Seed
Setting a seed ensures that the random numbers generated are reproducible.
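Re-seeding with the same value reproduces the same draws. np.random.seed sets the legacy global generator used by the np.random.* functions above. np.random.default_rng(seed) is the newer Generator-based alternative.

```python
import numpy as np

np.random.seed(42)                      # seed the global generator
first = np.random.exponential(1, 5)
np.random.seed(42)                      # same seed again
second = np.random.exponential(1, 5)
print(np.array_equal(first, second))    # True: identical draws

rng = np.random.default_rng(42)         # newer Generator API, seeded explicitly
sample = rng.exponential(1, 5)
```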
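The median is the middle value of the sorted data, while the mode is the most frequent value.
arr = np.array([1,2,3,4,5])
median_val = np.median(arr)  # 3.0
from scipy.stats import mode
mode_val = mode(arr)  # every value appears once here, so SciPy returns the smallest value (1) with count 1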
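Variance is the average squared deviation from the mean; standard deviation is its square root, expressed in the same
units as the data.
arr = np.array([1,2,3,4,5])
std_val = np.std(arr)  # about 1.414 (population standard deviation, ddof=0)
var_val = np.var(arr)  # 2.0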
NumPy arrays can represent image data, which can be visualized using Matplotlib.
Example:
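A 2D array can be displayed as a grayscale image with Matplotlib's imshow. The 8x8 gradient is hypothetical data. The sketch assumes Matplotlib is installed and selects the non-interactive Agg backend so it runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")             # non-interactive backend; use plt.show() in a normal session
import matplotlib.pyplot as plt

# Hypothetical 8x8 grayscale "image": each row is a gradient from 0 (black) to 1 (white)
image = np.tile(np.linspace(0, 1, 8), (8, 1))

plt.imshow(image, cmap="gray")    # each array element is drawn as one pixel
plt.colorbar()
plt.title("8x8 gradient")
plt.savefig("gradient.png")
```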
23. Pandas
DataFrame
Pandas provides a powerful data structure called DataFrame, similar to a table or spreadsheet.
It allows for easy manipulation and analysis of structured data.
23.1 Accessing Multiple Columns
Concept
Pandas allows users to access multiple columns by passing a list of column names to the DataFrame. This
technique is particularly useful for isolating specific subsets of data for further analysis or visualization.
import numpy as np
import pandas as pd
student_data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': np.random.randint(18, 22, size=5),
'Major': ['Computer Science', 'Physics', 'Mathematics', 'Engineering', 'Biology'],
'GPA': np.random.uniform(2.5, 4.0, size=5).round(2)
}
df = pd.DataFrame(student_data)
print(type(df)) # Output: <class 'pandas.core.frame.DataFrame'>
print(df)
df.info()
Displays:
Number of rows and columns.
Non-null counts and data types of each column.
Memory usage.
Accessing Columns:
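The sketch uses a small fixed DataFrame (hypothetical values) so the output is predictable. A single column name returns a Series. A list of names returns a DataFrame with just those columns, which is the multiple-column access described in 23.1.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Major": ["Computer Science", "Physics", "Mathematics"],
    "GPA": [3.8, 3.2, 3.5],
})

gpa = df["GPA"]                 # one column -> Series
subset = df[["Name", "GPA"]]    # list of column names -> DataFrame
print(type(gpa).__name__, type(subset).__name__)  # Series DataFrame
print(subset)
```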
Accessing Rows:
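Rows can be selected by integer position with iloc or by index label with loc. The row labels s1 to s3 below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "GPA": [3.8, 3.2, 3.5]},
    index=["s1", "s2", "s3"],
)

print(df.iloc[0])            # first row, by integer position
print(df.loc["s2"])          # row with index label "s2"
print(df.loc["s2", "GPA"])   # a single value: 3.2
```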
Summary Statistics:
df['GPA'] = df['GPA'].fillna(df['GPA'].mean())  # replace missing GPAs with the column mean
df = df.dropna()  # drop any rows that still contain missing values
Concept
Row slicing allows you to extract specific rows from a DataFrame based on their index positions. This operation is
essential for exploring data subsets or preparing training and testing datasets.
Syntax
df.iloc[start:stop]
Example
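Here is a minimal sketch on a hypothetical 10-row DataFrame. As with Python slices, the stop position is excluded. The second part shows a simple positional 80/20 split. In practice the rows are usually shuffled first.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})

first_three = df.iloc[0:3]                 # rows at positions 0, 1, 2
split = int(len(df) * 0.8)                 # 80% of the rows for training
train_df, test_df = df.iloc[:split], df.iloc[split:]
print(len(first_three), len(train_df), len(test_df))   # 3 8 2
```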
Concept
The head() function retrieves the first few rows of the DataFrame. It is useful for previewing the dataset or verifying
its structure.
23.9 Syntax
df.head(n)
Example
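head() returns the first n rows; n defaults to 5. The DataFrame is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})
print(df.head())     # first 5 rows
print(df.head(3))    # first 3 rows: x = 0, 1, 2
```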
Concept
The tail() function retrieves the last few rows of the DataFrame. It is commonly used to check the end of the
dataset.
Syntax
df.tail(n)
Example
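tail() mirrors head() and returns the last n rows (default 5). The DataFrame is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})
print(df.tail())     # last 5 rows
print(df.tail(2))    # last 2 rows: x = 8 and 9
```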
Data Concept
Pandas offers built-in methods for performing statistical computations and summarizing data. These methods are
crucial for understanding data distributions and trends.
Common Functions
1. describe(): Provides a statistical summary of numerical columns, including count, mean, standard deviation,
minimum, and quartiles.
df.describe()
2. mean(), min(), max(): Return the average, smallest, and largest value of a column.
df['column_name'].mean()
df['column_name'].min()
df['column_name'].max()
Concept
Querying enables you to filter rows in a DataFrame based on specific conditions. This technique is pivotal for
isolating relevant data.
Syntax
df[condition]
Examples
1. Student with the Highest GPA:
df[df['GPA'] == df['GPA'].max()]
Retrieves the row where the "GPA" column has the maximum value.
2. Details of a Specific Student (e.g., Bob):
df.loc[df['Name'] == 'Bob', ['Major', 'GPA']]
Retrieves the "Major" and "GPA" columns for the student named "Bob".
3. Students Aged 18:
df[df['Age'] == 18]
Retrieves all rows where the "Age" column equals 18.
CSV Concept
The read_csv() function reads data from a CSV file and loads it into a DataFrame. This is a primary method for
importing datasets.
Syntax
pd.read_csv(filepath)
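read_csv accepts a file path or any file-like object. In the sketch below, an in-memory io.StringIO with hypothetical data stands in for a real file so it runs without one.

```python
import io
import pandas as pd

csv_text = """City,Temperature
Paris,21
Cairo,35
Oslo,12
"""
df = pd.read_csv(io.StringIO(csv_text))   # same call as pd.read_csv("data.csv") for a file
print(df)
print(df[df["Temperature"] == df["Temperature"].max()])   # the Cairo row
```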
Information Concept
info(): Displays metadata about the DataFrame, such as column names, data types, and non-null values.
df.info()
This is useful for assessing the dataset's structure and identifying missing data.
describe(): Provides a statistical summary of numerical columns.
df.describe()
This function is essential for gaining insights into numerical data distributions.
Concept
Conditional indexing enables you to locate rows that meet specific criteria.
Example
df[df['Temperature'] == df['Temperature'].max()]
Retrieves the row where the "Temperature" column has the maximum value.
Module B
1.3.2 Underfitting
Definition: The model is too simple to capture the underlying patterns in the data.
Causes:
• Low model complexity (e.g., linear regression for non-linear data).
Fixes:
• Increase model complexity.
• Use higher-degree polynomial terms.
• Improve feature engineering.
2.2 Formula
Z = XW
• Z: Transformed data (principal components)
• X: Original data matrix (standardized)
• W : Matrix of eigenvectors (principal components)
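Here is a minimal NumPy sketch of Z = XW on hypothetical random data. It standardizes X, takes the eigenvectors of the covariance matrix sorted by eigenvalue as W, and projects. The variance of each principal component then equals its eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(100, 3))        # hypothetical data: 100 samples, 3 features
X_raw[:, 1] += 2 * X_raw[:, 0]            # make two features correlated

# Standardize each feature (zero mean, unit variance)
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# Eigen-decomposition of the covariance matrix (eigh returns ascending eigenvalues)
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]         # largest explained variance first
eigvals, W = eigvals[order], eigvecs[:, order]

Z = X @ W                                 # principal components: Z = XW
Z_k = X @ W[:, :2]                        # keep only the first k = 2 components
print(eigvals / eigvals.sum())            # fraction of variance explained per component
```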
3 PageRank
3.1 Process
1. Graph Representation: Represent the web as a directed graph where nodes are
web pages and edges are hyperlinks.
2. Initial Rank Assignment: Assign an initial rank to each page (e.g., equally
distributed or random).
3. Rank Update: Iteratively update ranks based on incoming links and the rank of
linking pages.
4. Convergence: Continue until the rank values stabilize (i.e., the change is below a
small threshold).
• Outlinks (Outgoing Links): Pages share their rank among all pages they link
to.
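Here is a power-iteration sketch of the four steps on a hypothetical 3-page graph. Each page shares its rank equally among its outlinks. The damping factor d = 0.85 and the update (1 - d)/N + d·M·r are assumptions (the commonly used PageRank form), since no formula is given above. The sketch also assumes every page has at least one outlink.

```python
import numpy as np

# Hypothetical graph (step 1): A -> B, A -> C, B -> C, C -> A
pages = ["A", "B", "C"]
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
n = len(pages)
idx = {p: i for i, p in enumerate(pages)}

# M[dst, src] = 1 / (number of outlinks of src)
M = np.zeros((n, n))
for src, targets in links.items():
    for dst in targets:
        M[idx[dst], idx[src]] = 1 / len(targets)

d = 0.85                        # damping factor (assumed)
rank = np.full(n, 1 / n)        # step 2: equal initial ranks
for _ in range(200):            # step 3: iterative updates
    new_rank = (1 - d) / n + d * M @ rank
    converged = np.abs(new_rank - rank).sum() < 1e-10   # step 4: change below a threshold
    rank = new_rank
    if converged:
        break

print(dict(zip(pages, rank.round(3))))   # C and A rank highest, B lowest
```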
4 Neural Networks
4.1 Foundations of the Perceptron
Prediction:
y = sign(w · x + b)
• w: Weight vector
• x: Input vector
• b: Bias
4.2 Enhancing the Perceptron with Learning Algorithms
Weight Update Rule:
w = w + ∆w where ∆w = η(ytrue − ypred)x
• η: Learning rate
• ytrue: True label
• ypred: Predicted label
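The sketch below trains a perceptron with the update rule above on a hypothetical dataset (logical AND, labels +1/-1). Two choices are assumptions, not taken from the text: a score of exactly 0 is mapped to -1, and the bias is updated with the same rule, treating it as a weight whose input is always 1.

```python
import numpy as np

def predict(w, b, x):
    # sign(w . x + b); a score of exactly 0 is mapped to -1 (a convention chosen here)
    return 1 if np.dot(w, x) + b > 0 else -1

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # hypothetical data: logical AND
y = np.array([-1, -1, -1, 1])

w = np.zeros(2)
b = 0.0
eta = 0.1                                        # learning rate

for epoch in range(100):
    errors = 0
    for x_i, y_true in zip(X, y):
        delta = eta * (y_true - predict(w, b, x_i))   # zero when the prediction is correct
        w = w + delta * x_i                      # w = w + eta * (y_true - y_pred) * x
        b = b + delta                            # bias update (assumed, input fixed at 1)
        errors += int(delta != 0)
    if errors == 0:                              # a full pass without mistakes: stop
        break

print(w, b, [predict(w, b, x_i) for x_i in X])
```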
Figure 1: Neuron
4.5 Mathematical Representation
To compute the output of a neuron, first calculate the weighted sum of inputs and
bias:
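$$\text{Weighted Sum (Net Output)} = \sum_{i=1}^{n} w_i x_i + b$$
The output is then obtained by applying the activation function f:
$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$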
In words: The neuron’s output is the result of applying the activation function f to the
sum of the weighted inputs (wi × xi) and the bias (b).
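Here is a worked instance of the two formulas, with hypothetical inputs, weights and bias. Sigmoid is used as the activation function f.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([1.0, 2.0])      # inputs (hypothetical)
w = np.array([0.5, -0.25])    # weights (hypothetical)
b = 0.1                       # bias (hypothetical)

net = np.dot(w, x) + b        # 0.5*1 + (-0.25)*2 + 0.1 = 0.1
y = sigmoid(net)              # f(net), about 0.525
print(net, y)
```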
• Input Layer: Accepts raw data and passes it to the next layer.
• Hidden Layer(s): Processes inputs using weights and biases, and learns patterns.
• Output Layer: Produces the network's final prediction.
1. Weighted Sum of Inputs: For a neuron, the net input is calculated as the
weighted sum of the inputs to the neuron, plus a bias term:
$$net_j = \sum_i w_{ij} x_i + b_j$$
where:
• wij are the weights associated with the input xi,
• bj is the bias term for neuron j,
• xi are the input features.
In the backward pass, we calculate how much each weight in the network contributed
to the overall error, and adjust the weights accordingly.
(a) Error Calculation: The error for each output neuron is computed using the
Mean Squared Error (MSE) formula:
$$E_j = \frac{1}{2}(t_j - y_j)^2$$
where:
• tj is the target output,
• yj is the predicted output from the forward pass.
This error measures how far the predicted output yj is from the desired target
output tj.
(b) Total Error: The total error of the network across all output neurons is the
sum of the individual errors:
$$E_{total} = \sum_j E_j$$
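(c) Gradient Calculation: To adjust the weights, we need to compute the gradient of the total error with respect
to each weight. Using the chain rule of differentiation, the gradient of the total error with respect to a weight wij is:
$$\frac{\partial E_{total}}{\partial w_{ij}} = \frac{\partial E_j}{\partial y_j} \cdot \frac{\partial y_j}{\partial net_j} \cdot \frac{\partial net_j}{\partial w_{ij}}$$
where:
• $\frac{\partial E_j}{\partial y_j} = -(t_j - y_j)$, obtained by differentiating $E_j = \frac{1}{2}(t_j - y_j)^2$,
• $\frac{\partial y_j}{\partial net_j} = f'(net_j)$, where $f'(net_j)$ is the derivative of the activation function
(e.g., for the sigmoid, $f'(net_j) = y_j(1 - y_j)$),
• $\frac{\partial net_j}{\partial w_{ij}} = x_i$, the input to the neuron.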
(d) Weight Update: The weights are then updated using the gradient descent
algorithm:
$$w_{ij}^{new} = w_{ij} - \eta \cdot \frac{\partial E_{total}}{\partial w_{ij}}$$
where η is the learning rate, a hyperparameter that controls the step size of
the weight update.
$$E_{total} = \frac{1}{2}\sum_j (t_j - y_j)^2$$
where tj is the target output and yj is the predicted output. This error guides the
weight update in the backward pass.
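The chain-rule gradient can be checked numerically. This sketch uses a single sigmoid neuron with hypothetical values and E = ½(t − y)². It compares the analytic gradient with a central finite-difference estimate, then applies one gradient-descent update. The bias gradient follows the same rule with ∂net/∂b = 1.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def error(w, b, x, t):
    y = sigmoid(np.dot(w, x) + b)
    return 0.5 * (t - y) ** 2                 # E = 1/2 (t - y)^2

x = np.array([1.0, 2.0])                      # hypothetical input
t = 1.0                                       # hypothetical target
w = np.array([0.5, -0.25])
b = 0.1
eta = 0.5                                     # learning rate

y = sigmoid(np.dot(w, x) + b)                 # forward pass

# Backward pass: dE/dw_i = dE/dy * dy/dnet * dnet/dw_i
dE_dy = -(t - y)
dy_dnet = y * (1 - y)                         # sigmoid derivative
grad_w = dE_dy * dy_dnet * x                  # dnet/dw_i = x_i
grad_b = dE_dy * dy_dnet                      # dnet/db = 1

# Central finite-difference check of grad_w
h = 1e-6
numeric = np.array([
    (error(w + h * e, b, x, t) - error(w - h * e, b, x, t)) / (2 * h)
    for e in np.eye(len(w))
])

w_new = w - eta * grad_w                      # gradient-descent update
b_new = b - eta * grad_b
print(np.allclose(grad_w, numeric, atol=1e-8), error(w, b, x, t), error(w_new, b_new, x, t))
```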
7.1 Process
(a) Define Decision Boundary: Find the hyperplane that best separates classes.
(b) Maximize Margin: Ensure the largest distance (margin) between the hyperplane and nearest data points (support
vectors).
(c) Kernel Trick (if required): Map data to a higher dimension if it is not
linearly separable.
(d) Solve Optimization Problem: Use quadratic programming to find optimal
hyperplane parameters.
7.2 Formula
f (x) = sign(w · x + b)
• w: Weight vector, defining the orientation of the hyperplane.
• x: Input vector (data point).
• b: Bias term, shifts the hyperplane.
• sign(): Determines class (+1 or -1).
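The decision function can be evaluated directly once w and b are known. The values below are hypothetical, standing in for the result of the optimization step, which this sketch does not perform. Under the usual SVM scaling (|w · x + b| = 1 at the support vectors), the margin width is 2 / ||w||.

```python
import numpy as np

w = np.array([1.0, -1.0])    # hypothetical weight vector
b = 0.0                      # hypothetical bias

def f(x):
    return int(np.sign(np.dot(w, x) + b))   # +1 or -1 (0 only exactly on the hyperplane)

points = np.array([[2.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
labels = [f(p) for p in points]
print(labels)                # [1, -1, 1]

margin = 2 / np.linalg.norm(w)
print(margin)                # about 1.414
```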
8 Supervised Learning - Evaluation Metrics
8.1 Confusion Matrix
A table summarizing the performance of a classification model.
                      Actual Positive    Actual Negative
Predicted Positive    TP                 FP
Predicted Negative    FN                 TN
• TP (True Positive): Correctly predicted positive cases.
• FP (False Positive): Incorrectly predicted as positive.
• FN (False Negative): Incorrectly predicted as negative.
• TN (True Negative): Correctly predicted negative cases.
8.2 Accuracy
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
8.3 Precision
$$\text{Precision} = \frac{TP}{TP + FP}$$
8.4 Recall
$$\text{Recall} = \frac{TP}{TP + FN}$$
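Here is a sketch that counts TP, FP, FN and TN from hypothetical labels (1 = positive, 0 = negative) and applies the three formulas.

```python
y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # hypothetical ground truth
y_pred = [1, 1, 0, 1, 1, 1, 0, 0]   # hypothetical predictions

pairs = list(zip(y_true, y_pred))
TP = sum(1 for t, p in pairs if t == 1 and p == 1)
FP = sum(1 for t, p in pairs if t == 0 and p == 1)
FN = sum(1 for t, p in pairs if t == 1 and p == 0)
TN = sum(1 for t, p in pairs if t == 0 and p == 0)

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
print(TP, FP, FN, TN)                # 3 2 1 2
print(accuracy, precision, recall)   # 0.625 0.6 0.75
```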
• ŷᵢ: Predicted value.
• n: Total number of samples.
9 Evaluation Metrics for Unsupervised Learning
(c) Calculate the overall silhouette score as the mean of all s(i).
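The per-point formula is not shown here. This sketch uses the standard definition s(i) = (b(i) − a(i)) / max(a(i), b(i)). Here a(i) is the mean distance from point i to the other points in its cluster, and b(i) is the mean distance to the nearest other cluster. Singleton clusters get s(i) = 0 by convention, and at least two clusters are required. The data are hypothetical.

```python
import numpy as np

def silhouette_score(X, labels):
    """Mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                                  # exclude the point itself
        if not same.any():                               # singleton cluster
            scores.append(0.0)
            continue
        a = dist[i, same].mean()                         # cohesion within its own cluster
        b = min(dist[i, labels == c].mean()              # nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))                        # overall score: mean of all s(i)

X = [[0, 0], [0, 1], [10, 0], [10, 1]]
print(silhouette_score(X, [0, 0, 1, 1]))   # about 0.90: well-separated clusters
print(silhouette_score(X, [0, 1, 0, 1]))   # negative: points grouped with distant ones
```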
(a) Reduce the dimensionality of the data using a method like PCA or autoencoders.
(b) Reconstruct the data from the reduced dimensions.
(c) Calculate the error as the difference between the original and reconstructed
data.
Formula:
$$\text{Reconstruction Error} = \lVert X - \hat{X} \rVert^2$$
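Here is a sketch of steps (a) to (c) using PCA. It assumes the norm in the formula is the squared Frobenius norm, that is, the sum of squared element-wise differences. Data lying exactly on a line lose nothing when reduced to one component. Hypothetical noise off the line is discarded, so the error becomes positive.

```python
import numpy as np

def reconstruction_error(X, k):
    """Project X onto its top-k principal components, map back, return ||X - X_hat||^2."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # top-k eigenvectors
    Z = Xc @ W                                      # (a) reduce dimensionality
    X_hat = Z @ W.T + mean                          # (b) reconstruct
    return float(np.sum((X - X_hat) ** 2))          # (c) squared error

t = np.linspace(0, 1, 20)
line = np.column_stack([t, 2 * t])                  # hypothetical data on a line
noisy = line + np.random.default_rng(0).normal(scale=0.1, size=line.shape)
err_line = reconstruction_error(line, k=1)          # essentially 0
err_noisy = reconstruction_error(noisy, k=1)        # larger: off-line noise is lost
print(err_line, err_noisy)
```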
9.3 Important Properties
• Silhouette Score:
– Ranges from −1 to 1.
– Positive scores indicate well-separated clusters, while negative scores suggest that points may have been
assigned to the wrong cluster.
• Reconstruction Error:
– Lower values indicate better preservation of original data in reduced dimensions.
– Sensitive to noise and dimensionality reduction techniques.
10 K-Means Clustering
10.1 Process
(a) Initialize Centroids: Randomly select k centroids from the dataset.
(b) Assign Clusters: Assign each data point to the nearest centroid.
(c) Update Centroids: Recalculate the centroids as the mean of points in each
cluster.
(d) Repeat: Iterate steps (b) and (c) until centroids stabilize or a maximum number
of iterations is reached.
10.2 Formula
$$\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i$$
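Here is a minimal NumPy sketch of steps (a) to (d) on hypothetical 2D data. Initial centroids are k randomly chosen data points. Each centroid is updated to μ_j, the mean of its assigned points. A cluster that ends up empty keeps its previous centroid, which is one possible convention.

```python
import numpy as np

def kmeans(X, init_centroids, max_iter=100):
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float)         # (a) initial centroids
    for _ in range(max_iter):
        # (b) assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # (c) mu_j = mean of the points in cluster C_j
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        # (d) stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]])         # hypothetical data
rng = np.random.default_rng(0)
init = X[rng.choice(len(X), size=2, replace=False)]        # k = 2 random data points
centroids, labels = kmeans(X, init)
print(centroids[np.argsort(centroids[:, 0])])              # [[0, 0.5], [10, 10.5]]
```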
• Sensitive to Initialization: Results may vary with random centroids.
• Scalability: Efficient for large datasets but may struggle with very high dimensions.