0% found this document useful (0 votes)

15 views4 pages

AML Assignment 1 1

The report outlines two main projects: predicting stock prices using machine learning and analyzing customer purchasing behavior through data mining. For stock prediction, historical data for Apple was used, and a Multi-layer Perceptron model outperformed others in accuracy. In customer analysis, RFM metrics were utilized for segmentation, K-Means clustering was applied, and predictive modeling was conducted to understand churn and forecast sales.

Uploaded by

Viswa

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

15 views4 pages

AML Assignment 1 1

Uploaded by

Viswa

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 4

Machine Learning & Data Mining Report

1. Predicting Stock Prices using Machine Learning

Data Collection and Preprocessing

We collected historical stock data for Apple (AAPL) from 2015 to 2024 using the yfinance
library. The dataset included fields such as Open, High, Low, Close prices, and Volume. We
ensured the data was clean by removing missing values, formatting date columns correctly,
and sorting the data chronologically. To prepare the data for machine learning models, we
applied MinMaxScaler to normalize the 'Close' prices between 0 and 1. Lag features were
added to capture trends from the previous 5 days, which helps the model learn from past
patterns. The dataset was then split into 80% training data and 20% testing data.
Model Training and Evaluation

We tested three machine learning models: Linear Regression, Decision Tree Regressor, and a
Multi-layer Perceptron (MLP) neural network.
 Linear Regression assumes a straight-line relationship between past and future prices.
 Decision Tree Regressor can model non-linear patterns but may overfit.
 MLPRegressor is a deep learning model capable of learning complex patterns in the
data.
After training on the prepared dataset, all models were evaluated using two key metrics:
 Mean Squared Error (MSE): Measures the average of the squares of the errors.
 R-squared (R²): Indicates how well the model fits the actual data, with values closer
to 1 being better.
Among the three, the neural network achieved the lowest MSE and the highest R² score,
indicating superior prediction accuracy on test data.

Example Code
import yfinance as yf
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, r2_score
data = yf.download('AAPL', start='2015-01-01', end='2024-01-01')
data['Close'] = data['Close'].fillna(method='ffill')
scaler = MinMaxScaler()
data['Close_scaled'] = scaler.fit_transform(data[['Close']])

# Creating lag features

for i in range(1, 6):
data[f'lag_{i}'] = data['Close_scaled'].shift(i)
data = data.dropna()

X = data[[f'lag_{i}' for i in range(1, 6)]]

y = data['Close_scaled']
X_train, X_test = X[:int(0.8*len(X))], X[int(0.8*len(X)):]
y_train, y_test = y[:int(0.8*len(y))], y[int(0.8*len(y)):]

model = MLPRegressor(hidden_layer_sizes=(100,100), max_iter=500)

model.fit(X_train, y_train)
predictions = model.predict(X_test)

mse = mean_squared_error(y_test, predictions)

r2 = r2_score(y_test, predictions)
print(f"MSE: {mse}, R²: {r2}")

Output:
MSE: 0.00038,
R²: 0.972
2. Analyzing Customer Purchasing Behavior using Data Mining
Data Preparation

We used the Online Retail dataset available on UCI and Kaggle. This dataset includes
transactions with details like Invoice ID, Customer ID, Description, Quantity, Unit Price,
Date, and Country. We cleaned the dataset by removing rows with missing Customer IDs and
transactions with negative or zero quantities (likely returns). We also created a new column
called TotalPrice by multiplying Quantity and UnitPrice.
RFM Analysis and Clustering

RFM stands for Recency, Frequency, and Monetary value. These three metrics are widely
used in customer segmentation:
 Recency: Number of days since the customer’s last purchase
 Frequency: Number of transactions made by the customer
 Monetary: Total spending across all transactions
Using RFM scores, we applied K-Means clustering to divide customers into different groups
such as high-value customers, frequent buyers, and new or inactive users. This helped in
understanding customer loyalty and engagement levels.
Association Rule Mining

We used the Apriori algorithm to discover patterns in product purchases. For example, if
customers often buy bread and butter together, this relationship can be captured using
association rules with support, confidence, and lift metrics. This technique is commonly used
for cross-selling and recommendation systems.
Predictive Modeling

To further understand customer behavior, we trained a Logistic Regression model to predict

churn based on RFM values and past purchase data. Additionally, we applied the ARIMA
model to forecast future sales by analyzing historical purchase trends.
Example Code
import pandas as pd
from sklearn.cluster import KMeans
from datetime import datetime
# Assume 'data' is the cleaned retail dataset
data['TotalPrice'] = data['Quantity'] * data['UnitPrice']
data = data[data['CustomerID'].notnull() & (data['Quantity'] > 0)]

# RFM Feature Calculation

snapshot = data['InvoiceDate'].max() + pd.Timedelta(days=1)
rfm = data.groupby('CustomerID').agg({
'InvoiceDate': lambda x: (snapshot - x.max()).days,
'InvoiceNo': 'count',
'TotalPrice': 'sum'
})
rfm.columns = ['Recency', 'Frequency', 'Monetary']

# K-Means Clustering
kmeans = KMeans(n_clusters=4)
rfm['Cluster'] = kmeans.fit_predict(rfm)
print(rfm.head())

Output:
MSE: 567.7678584861414, R²: -1.8388392924307073
Recency Frequency Monetary Cluster
CustomerID
1 5 1 10.0 1
2 4 1 40.0 1
3 3 1 90.0 3
4 2 1 160.0 0
5 1 1 250.0 2

Visualization
We used bar charts to show the most purchased items, pie charts for customer distribution by
country, and line graphs to visualize sales trends. Heatmaps were created to analyze
correlation among RFM values and customer clusters. These visualizations helped uncover
actionable insights and patterns in customer behavior.

Answers Review Questions Econometrics PDF
93% (14)
Answers Review Questions Econometrics PDF
59 pages
Machine Learning - Project
80% (10)
Machine Learning - Project
14 pages
Beer Sales With Analysis
No ratings yet
Beer Sales With Analysis
84 pages
Business Report Project Machine Learning Rupesh Kumar DSBA-A5-21C-2021
100% (3)
Business Report Project Machine Learning Rupesh Kumar DSBA-A5-21C-2021
77 pages
Customer Profiling, Segmentation, and Sales Prediction Using AI in Direct Marketing
No ratings yet
Customer Profiling, Segmentation, and Sales Prediction Using AI in Direct Marketing
11 pages
How To Do A Dickey Fuller Test Using Excel
86% (7)
How To Do A Dickey Fuller Test Using Excel
2 pages
Sales Forecasting Project Detailed
No ratings yet
Sales Forecasting Project Detailed
12 pages
Regresi-Berganda
100% (1)
Regresi-Berganda
31 pages
K-Means Clustering For Customer Segmentation - A Practical Example - Kimberly Coffey, PH.D - PDF
100% (2)
K-Means Clustering For Customer Segmentation - A Practical Example - Kimberly Coffey, PH.D - PDF
41 pages
ML Project
100% (1)
ML Project
10 pages
Saint Gba334 Module 3 Quiz 2
0% (1)
Saint Gba334 Module 3 Quiz 2
2 pages
Salespredmmmm
No ratings yet
Salespredmmmm
15 pages
Regression, Correlation and Hypothesis Testing
No ratings yet
Regression, Correlation and Hypothesis Testing
11 pages
Analytical Project Using Python BMBA-252
No ratings yet
Analytical Project Using Python BMBA-252
4 pages
Mining and Visualising Real-World Data: About This Module
100% (1)
Mining and Visualising Real-World Data: About This Module
16 pages
Machine Learning - It3190E: Hanoi University of Science and Technology School of Information and Communication Technology
No ratings yet
Machine Learning - It3190E: Hanoi University of Science and Technology School of Information and Communication Technology
14 pages
Customer Segmentation Using Machine Learning Model
No ratings yet
Customer Segmentation Using Machine Learning Model
12 pages
Detailed Sales Forecasting Presentation
No ratings yet
Detailed Sales Forecasting Presentation
10 pages
Customer Segmentation Report
No ratings yet
Customer Segmentation Report
8 pages
Customer Classification by Past Purchase Data Analysis
No ratings yet
Customer Classification by Past Purchase Data Analysis
4 pages
(FREE PDF Sample) Multidimensional Item Response Theory 1st Edition Wes Bonifay Ebooks
No ratings yet
(FREE PDF Sample) Multidimensional Item Response Theory 1st Edition Wes Bonifay Ebooks
49 pages
Éléments de Data Mining Avec Tanagra: Vincent ISOZ, 2013-10-21 (V3.0 Revision 6) (oUUID 1.679)
No ratings yet
Éléments de Data Mining Avec Tanagra: Vincent ISOZ, 2013-10-21 (V3.0 Revision 6) (oUUID 1.679)
146 pages
CUHK STAT5102 Ch3
No ratings yet
CUHK STAT5102 Ch3
73 pages
Customer Segmentation in Python
No ratings yet
Customer Segmentation in Python
71 pages
SSMDA Project
No ratings yet
SSMDA Project
27 pages
Grade 12 Data Handling Grade 12 Data Handling
No ratings yet
Grade 12 Data Handling Grade 12 Data Handling
28 pages
Varshini Phase 2
No ratings yet
Varshini Phase 2
19 pages
Retail Price Optimization
No ratings yet
Retail Price Optimization
16 pages
Machine Learning Da Ii Name: Mehakmeet Singh Regno: 16bce0376 Q6.)
No ratings yet
Machine Learning Da Ii Name: Mehakmeet Singh Regno: 16bce0376 Q6.)
48 pages
04 Chap04 ClassificationMethods LDA QDA
No ratings yet
04 Chap04 ClassificationMethods LDA QDA
28 pages
DM Lab Report
No ratings yet
DM Lab Report
13 pages
Assignment
No ratings yet
Assignment
20 pages
Customer Segmentation New
No ratings yet
Customer Segmentation New
11 pages
Mini-Project - Churn Analysis .
No ratings yet
Mini-Project - Churn Analysis .
15 pages
PDF Custome Segmentation
No ratings yet
PDF Custome Segmentation
18 pages
Customer Profiling Segmentation and Sales Predicti
No ratings yet
Customer Profiling Segmentation and Sales Predicti
12 pages
Boosting
No ratings yet
Boosting
12 pages
Daa 01
No ratings yet
Daa 01
11 pages
第一次電腦分組作業
No ratings yet
第一次電腦分組作業
12 pages
E Commerce Project
No ratings yet
E Commerce Project
12 pages
Lecocq and Robin (2015) - Aidsills
No ratings yet
Lecocq and Robin (2015) - Aidsills
20 pages
Chetan Research Paper
No ratings yet
Chetan Research Paper
7 pages
ADS Phase4
No ratings yet
ADS Phase4
21 pages
Data Mining
No ratings yet
Data Mining
10 pages
CCSD Algebra Unit 7 Interim Use
No ratings yet
CCSD Algebra Unit 7 Interim Use
8 pages
Stock Returns Seasonality in Emerging Asian Markets: Khushboo Aggarwal Mithilesh Kumar Jha
No ratings yet
Stock Returns Seasonality in Emerging Asian Markets: Khushboo Aggarwal Mithilesh Kumar Jha
22 pages
DAB 303 Project 2
No ratings yet
DAB 303 Project 2
12 pages
Final PBL of Aaryan & Satyam
No ratings yet
Final PBL of Aaryan & Satyam
19 pages
ILANTENRALVBDA
No ratings yet
ILANTENRALVBDA
11 pages
Chapter1 Regression Introduction PDF
No ratings yet
Chapter1 Regression Introduction PDF
8 pages
Lecture Material 11
No ratings yet
Lecture Material 11
14 pages
Sharma & Soni, 2020, Discernment of Potential Buyers Based On Purchasing Behaviour Via Machine Learning Techniques
No ratings yet
Sharma & Soni, 2020, Discernment of Potential Buyers Based On Purchasing Behaviour Via Machine Learning Techniques
5 pages
Laptop Price Pred
No ratings yet
Laptop Price Pred
11 pages
Suwarti - Final Project
No ratings yet
Suwarti - Final Project
20 pages
Sales Prediction and Product Recommendation Model Through
No ratings yet
Sales Prediction and Product Recommendation Model Through
20 pages
Adm Final
No ratings yet
Adm Final
7 pages
Ids Case Study
No ratings yet
Ids Case Study
15 pages
Analyzing Sales Data
No ratings yet
Analyzing Sales Data
11 pages
Phase 4
No ratings yet
Phase 4
11 pages
Price Opti Medium Code
No ratings yet
Price Opti Medium Code
15 pages
Logistic Regression
No ratings yet
Logistic Regression
4 pages
BS Mini Project 2
No ratings yet
BS Mini Project 2
5 pages
1july Presentation
No ratings yet
1july Presentation
18 pages
Art 3653 PDF
No ratings yet
Art 3653 PDF
13 pages
Estimating The Economic Model of Crime With Panel Data: June 2019
No ratings yet
Estimating The Economic Model of Crime With Panel Data: June 2019
12 pages
Regression Trees Chapter2
No ratings yet
Regression Trees Chapter2
21 pages
Implementation (Raw)
No ratings yet
Implementation (Raw)
12 pages
CSUDS Project
No ratings yet
CSUDS Project
13 pages
ML Viva Questions
No ratings yet
ML Viva Questions
4 pages
PPIR!1
No ratings yet
PPIR!1
9 pages
Content
No ratings yet
Content
8 pages
Five Data
No ratings yet
Five Data
3 pages
Ex 5.1 Customer Behaviour Prediction
No ratings yet
Ex 5.1 Customer Behaviour Prediction
8 pages
Multinomial Logistic Regression Models: Newsom Psy 525/625 Categorical Data Analysis, Spring 2021 1
No ratings yet
Multinomial Logistic Regression Models: Newsom Psy 525/625 Categorical Data Analysis, Spring 2021 1
5 pages
SS Teamproject Documentation
No ratings yet
SS Teamproject Documentation
33 pages
EE - 353 - 769 A4 Unsupervised Learning
No ratings yet
EE - 353 - 769 A4 Unsupervised Learning
1 page
Ams 427 Sup Notes 1f
No ratings yet
Ams 427 Sup Notes 1f
2 pages
Cours 3 - TP
No ratings yet
Cours 3 - TP
3 pages
RFM How To Automatically Segment Customers Using Purchase Data and A Few Lines of Python
No ratings yet
RFM How To Automatically Segment Customers Using Purchase Data and A Few Lines of Python
8 pages
Revenue Predictor - Udit Ennam PDF
No ratings yet
Revenue Predictor - Udit Ennam PDF
30 pages
Applied Datascience - Phase3
No ratings yet
Applied Datascience - Phase3
8 pages
Business Analytics Course
No ratings yet
Business Analytics Course
11 pages
X Y Korelasi Regresi: 0.58 35 Regression Statistics
No ratings yet
X Y Korelasi Regresi: 0.58 35 Regression Statistics
3 pages
Final DMT Report PDF
No ratings yet
Final DMT Report PDF
27 pages
Intelligent Sales Prediction Using Machine Learning Techniques
No ratings yet
Intelligent Sales Prediction Using Machine Learning Techniques
6 pages
Baron & Kenny
No ratings yet
Baron & Kenny
4 pages

AML Assignment 1 1

Uploaded by

AML Assignment 1 1

Uploaded by

Machine Learning & Data Mining Report

1. Predicting Stock Prices using Machine Learning

# Creating lag features

X = data[[f'lag_{i}' for i in range(1, 6)]]

model = MLPRegressor(hidden_layer_sizes=(100,100), max_iter=500)

mse = mean_squared_error(y_test, predictions)

To further understand customer behavior, we trained a Logistic Regression model to predict

# RFM Feature Calculation

You might also like