
CAT1

01 December 2024 17:38

1. Association Rule Mining – Apriori Algorithm


From <https://www.youtube.com/watch?v=43CMKRHdH30&ab_channel=MaheshHuddar>

1. Association Rule Mining – Apriori Algorithm - Numerical Example Solved by Mahesh Huddar

Confusion Matrix:

How to Find / Calculate F1 Score for Multi-Class Classification in Machine Learning by Dr. Mahesh Huddar

Confusion Matrix Solved Example Accuracy Precision Recall F1 Score Prevalence by Mahesh Huddar

// more than this is not taught in class



Confusion Matrix for Multiclass Classification Precision Recall Weighted F1 Score by Mahesh Huddar
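The videos above work these metrics out by hand. As a quick companion, a minimal sklearn sketch (my addition, with made-up labels, not the videos' numbers):

from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 1, 2, 2, 1, 0, 2, 1]  # hypothetical ground-truth labels
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]  # hypothetical predictions

print(confusion_matrix(y_true, y_pred))  # rows = actual class, columns = predicted class
print(accuracy_score(y_true, y_pred))
# average='weighted' averages the per-class scores by class frequency,
# i.e. the weighted F1 discussed in the multiclass video above
print(precision_score(y_true, y_pred, average='weighted'))
print(recall_score(y_true, y_pred, average='weighted'))
print(f1_score(y_true, y_pred, average='weighted'))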

Linear Regression:

Linear Regression Algorithm – Solved Numerical Example in Machine Learning by Mahesh Huddar

Linear Regression Matrix Form | Solved Numerical Example using Matrix Method by Mahesh Huddar

Matrix method:

Multiple Linear Regression:

Multiple Linear Regression Solved Numerical Example in Machine Learning Data Mining by Mahesh Huddar
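A minimal numpy sketch of the matrix method above, i.e. the normal equation beta = (X^T X)^(-1) X^T y, on made-up data (my addition, not the videos' worked numbers):

import numpy as np

# Made-up data: 4 samples, 2 features
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
y = np.array([6.0, 5.0, 13.0, 12.0])

Xb = np.c_[np.ones(len(X)), X]  # prepend a column of ones for the intercept

# Normal equation: beta = (X^T X)^(-1) X^T y
beta = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y
print(beta)       # [intercept, coefficient 1, coefficient 2]
print(Xb @ beta)  # fitted values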

Feature Scaling:

1. Normalization:

   x' = (Xi - Xmin) / (Xmax - Xmin)

2. Standardization:

   x' = (Xi - Xmean) / s.d.

(s.d. = standard deviation of the feature; a code sketch of both scalers follows the preprocessing walkthrough below)

Codes:

This code is a step-by-step implementation of data preprocessing for a machine learning task using Python. Let me break it down:

1. Importing Libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
○ numpy: Used for numerical computations.
○ pandas: Used for handling and analyzing datasets.
○ matplotlib.pyplot: Used for visualizing data.

2. Loading Data:
a = pd.read_csv('Data.csv')
x = a.iloc[:, :-1].values
y = a.iloc[:, -1].values
○ pd.read_csv('Data.csv'): Loads the dataset from a file named Data.csv.
○ x: Contains all columns except the last one (features).
○ y: Contains only the last column (target/output variable).

3. Handling Missing Data:


from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(x[:, 1:3])
x[:, 1:3] = imputer.transform(x[:, 1:3])
○ SimpleImputer: Fills missing values in the dataset.
○ strategy='mean': Replaces missing values with the mean of the column.
○ This is applied to the 2nd and 3rd columns of x.

4. Encoding Categorical Data:


from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)
○ LabelEncoder: Converts categorical labels (in y) into numeric values.
○ Example: If y contains ['Yes', 'No', 'Yes'], it will be converted to [1, 0, 1].

5. Splitting the Dataset:


from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)
○ Splits the dataset into training and testing sets.
○ test_size=0.2: 20% of the data is used for testing.
○ random_state=1: Ensures the split is reproducible.

Summary:
• The code reads a dataset, cleans it (handles missing values), encodes categorical data, and splits it
into training and testing sets. These steps are essential for preparing data for machine learning
models.

From <https://chatgpt.com/c/674d069c-1af0-8007-b82d-198ffae1e8ab>
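Pulling the steps above together, plus the feature scaling formulas from earlier: a minimal end-to-end sketch, assuming a small made-up table in place of Data.csv (column names and values here are illustrative only):

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

# Made-up stand-in for Data.csv: two numeric features with gaps, a Yes/No target
a = pd.DataFrame({
    'Country': ['France', 'Spain', 'Germany', 'Spain', 'France', 'Germany'],
    'Age': [44.0, 27.0, np.nan, 38.0, 35.0, 48.0],
    'Salary': [72000.0, 48000.0, 54000.0, np.nan, 58000.0, 79000.0],
    'Purchased': ['No', 'Yes', 'No', 'No', 'Yes', 'Yes'],
})
x = a.iloc[:, :-1].values
y = a.iloc[:, -1].values

# Fill missing values in the 2nd and 3rd columns with the column mean
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
x[:, 1:3] = imputer.fit_transform(x[:, 1:3])

# Encode the Yes/No target as 1/0
y = LabelEncoder().fit_transform(y)

# Reproducible 80/20 split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)

# Standardization x' = (Xi - Xmean) / s.d., fit on the training set only;
# sklearn's MinMaxScaler would implement the normalization formula instead
sc = StandardScaler()
x_train[:, 1:3] = sc.fit_transform(x_train[:, 1:3])
x_test[:, 1:3] = sc.transform(x_test[:, 1:3])
print(x_train)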

This code snippet demonstrates how to preprocess a dataset (e.g., 50_Startups.csv) for machine learning. Below is a step-by-step explanation:

1. Importing Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
• numpy: Used for numerical computations and matrix operations.
• matplotlib.pyplot: For data visualization (not used here, but imported).
• pandas: For handling datasets and data preprocessing.

2. Importing the Dataset


dataset = pd.read_csv('50_Startups.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
• pd.read_csv: Reads the dataset from a CSV file named 50_Startups.csv.
• X: Independent variables (all columns except the last one).
• y: Dependent variable (the last column, assumed to be the output/target variable).
Example Values of X:
[[165349.2, 136897.8, 471784.1, 'New York'],
[162597.7, 151377.59, 443898.53, 'California'],
...]
• The last column ('New York', 'California', etc.) is categorical data.
Example Values of y:
[192261.83, 191792.06, ...]

3. Encoding Categorical Data


from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(
    transformers=[('encoder', OneHotEncoder(), [3])],
    remainder='passthrough'
)
X = np.array(ct.fit_transform(X))
Explanation:
• Categorical Data: The fourth column in X ([3]) contains city names ('New York', 'California', etc.),
which are categorical and need to be converted into numeric values for machine learning.
• OneHotEncoder: Converts categorical values into one-hot encoded vectors (binary columns).
○ Example:
'New York' -> [0, 0, 1]
'California' -> [1, 0, 0]
'Florida' -> [0, 1, 0]
• ColumnTransformer: Applies the transformation to the specified column(s) ([3]) while keeping the
rest of the dataset unchanged.
• remainder='passthrough': Leaves other columns as-is.
Example Output of Transformed X:
[[0.0, 0.0, 1.0, 165349.2, 136897.8, 471784.1],
[1.0, 0.0, 0.0, 162597.7, 151377.59, 443898.53],
[0.0, 1.0, 0.0, 153441.51, 101145.55, 487934.54],
...]
• The categorical column ('New York', 'California', etc.) is now represented as one-hot encoded
vectors, and the numerical columns remain unchanged.
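A self-contained version of the same transformation on three made-up rows (the numbers are the example values above), showing that the one-hot columns come first and the rest pass through unchanged:

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

X = np.array([[165349.2, 136897.8, 471784.1, 'New York'],
              [162597.7, 151377.59, 443898.53, 'California'],
              [153441.51, 101145.55, 487934.54, 'Florida']], dtype=object)

ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])],
                       remainder='passthrough')
X = np.array(ct.fit_transform(X))
print(X)
# Categories are sorted alphabetically, so California -> [1, 0, 0],
# Florida -> [0, 1, 0], New York -> [0, 0, 1], matching the output above.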

4. Key Observations
• Original Data Issues: There are some typos or incorrect values in the code snippet:
○ ydataset.iloc[:, -1].values should be y = dataset.iloc[:, -1].values.
○ The encoding output contains formatting errors (e.g., the 8.8 values appear to be typos for the 0.0 entries in the one-hot encoded data).
• The code correctly encodes the categorical column and prepares X for use in a machine learning
model.
If you'd like, I can clean this code further for better clarity. Let me know!

From <https://chatgpt.com/c/674d069c-1af0-8007-b82d-198ffae1e8ab>

The provided code implements Association Rule Learning using the Apriori algorithm to analyze
transactional data (e.g., market basket analysis). Here's an explanation of each part of the code:

1. Importing Libraries
!pip install apyori
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
• apyori: A library that implements the Apriori algorithm for discovering frequent itemsets and
association rules.
• numpy, pandas, matplotlib: Common Python libraries used for numerical calculations, data
manipulation, and visualization.

2. Data Preprocessing
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header=None)
transactions = []
for i in range(0, 7501):
    transactions.append([str(dataset.values[i, j]) for j in range(0, 20)])
What this does:
• pd.read_csv:
○ Reads the dataset from the file Market_Basket_Optimisation.csv.
○ header=None indicates the dataset does not have column headers.
• Creating transactions:
○ Initializes an empty list to store all transactions.
○ Loops through 7501 rows (transactions) in the dataset.
○ For each row, loops through 20 columns (items in a transaction).
○ Appends each transaction as a list of items (converted to strings) to transactions.
Example Output of transactions:
[['milk', 'bread', 'butter'],
['tea', 'sugar', 'biscuits'],
...]
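One caveat worth noting (my addition): with the loop above, empty cells become the literal string 'nan'. A variant that drops them and avoids hardcoding the row and column counts:

transactions = []
for _, row in dataset.iterrows():
    # keep only real items; skip the NaN padding in short transactions
    transactions.append([str(item) for item in row if pd.notna(item)])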

3. Training the Apriori Model


from apyori import apriori
rules = apriori(transactions=transactions,
                min_support=0.003,
                min_confidence=0.2,
                min_lift=3,
                min_length=2,
                max_length=2)
Parameters Explained:
• transactions: The list of transactions from the preprocessing step.
• min_support=0.003:
○ An itemset must appear in at least 0.3% of transactions (7501 × 0.003 ≈ 23 transactions).
• min_confidence=0.2:
○ Rules must have at least 20% confidence (i.e., if X is bought, there's at least a 20% chance Y
is also bought).
• min_lift=3:
○ The rule's lift (strength compared to random chance) must be at least 3.
• min_length=2, max_length=2:
○ Only consider rules with exactly 2 items in the itemset.
Output:
• The rules object is a generator containing association rules that meet these criteria.

4. Visualizing the Results


results = list(rules)
results
• list(rules): Converts the generator into a list to view all discovered rules.
Example Output:
[RelationRecord(items=frozenset({'chicken', 'light cream'}),
support=0.00457, confidence=0.39, lift=4.84),
RelationRecord(items=frozenset({'olive oil', 'whole wheat pasta'}),
support=0.003, confidence=0.27, lift=3.12)]
Breakdown of a Rule:
RelationRecord(items=frozenset({'chicken', 'light cream'}),
support=0.00457, confidence=0.39, lift=4.84)
• Items: 'chicken' and 'light cream' are frequently bought together.
• Support: 0.00457 (i.e., ~0.457% of all transactions include both items).
• Confidence: 0.39 (i.e., if someone buys 'chicken', there's a 39% chance they also buy 'light cream').
• Lift: 4.84 (i.e., buying 'chicken' makes buying 'light cream' 4.84 times more likely than random
chance).
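In the real apyori output, confidence and lift live inside each record's ordered_statistics list (the printout above is simplified). A small sketch, assuming the results list from the previous step, to turn the rules into a readable table:

import pandas as pd

rows = []
for r in results:
    stat = r.ordered_statistics[0]  # first (LHS -> RHS) statistic of the rule
    rows.append({'lhs': ', '.join(stat.items_base),
                 'rhs': ', '.join(stat.items_add),
                 'support': r.support,
                 'confidence': stat.confidence,
                 'lift': stat.lift})

df = pd.DataFrame(rows).sort_values('lift', ascending=False)
print(df.head(10))  # strongest rules first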

Key Insights
• The Apriori algorithm helps identify frequent itemsets and association rules in the dataset.
• Businesses can use this analysis for:
○ Cross-selling (e.g., "People who bought this also bought that").
○ Optimized product placement in stores.
○ Recommendation systems in e-commerce.
Let me know if you'd like clarification on any part!

From <https://chatgpt.com/c/674d069c-1af0-8007-b82d-198ffae1e8ab>

