0% found this document useful (0 votes)
6 views

Objective

Uploaded by

khushboo99khush
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views

Objective

Uploaded by

khushboo99khush
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 4

Objective:

 Use MySQL to store and retrieve customer data.


 Use Python to build and train a Logistic Regression model for classification.

Database Setup in MySQL

 Create a table to store customer data:

CREATE DATABASE customer_db;

USE customer_db;

CREATE TABLE customer_data (


customer_id INT AUTO_INCREMENT PRIMARY KEY,
age INT,
income FLOAT,
browsing_time FLOAT,
purchase_made BOOLEAN
);

-- Insert some sample data


INSERT INTO customer_data (age, income, browsing_time, purchase_made)
VALUES
(25, 35000, 5.5, 1),
(32, 60000, 2.2, 0),
(45, 75000, 3.1, 1),
(28, 40000, 1.8, 0),
(35, 58000, 4.0, 1);
Python Code

Step 1: Install Required Libraries

pip install mysql-connector-python scikit-learn pandas

Step 2: Python Program

import mysql.connector
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Step 1: Connect to the MySQL database


def connect_to_db():
connection = mysql.connector.connect(
host="localhost",
user="root", # Replace with your MySQL username
password="", # Replace with your MySQL password
database="customer_db"
)
return connection

# Step 2: Fetch data from the database


def fetch_data(connection):
query = "SELECT age, income, browsing_time, purchase_made FROM customer_data"
cursor = connection.cursor()
cursor.execute(query)
data = cursor.fetchall()
df = pd.DataFrame(data, columns=['Age', 'Income', 'Browsing_Time', 'Purchase_Made'])
return df

# Step 3: Train a machine learning model


def train_model(data):
X = data[['Age', 'Income', 'Browsing_Time']] # Features
y = data['Purchase_Made'] # Target

# Split data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Logistic Regression model


model = LogisticRegression()
model.fit(X_train, y_train)

# Test the model


y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Model Accuracy: {accuracy * 100:.2f}%")


return model

# Step 4: Main function to integrate everything


def main():
# Connect to the database
connection = connect_to_db()

# Fetch data
data = fetch_data(connection)
print("Data Retrieved from Database:")
print(data)

# Train model
model = train_model(data)

# Close the database connection


connection.close()

if __name__ == "__main__":
main()
How It Works:

1. Database Interaction:
 Data is retrieved from the MySQL database using mysql.connector.
2. Data Preprocessing:
 The retrieved data is converted into a Pandas DataFrame for easy manipulation.
3. Machine Learning:
 A Logistic Regression model is trained to predict whether a customer will make a purchase based on
features like age, income, and browsing time.
4. Evaluation:
 The program splits the data into training and testing sets and evaluates the model's accuracy.
Output:

1. Data from MySQL:

Data Retrieved from Database:


Customer ID Age Income Browsing_Time Purchase_Made
0 25 35000.0 5.5 1
1 32 60000.0 2.2 0
2 45 75000.0 3.1 1
3 28 40000.0 1.8 0
4 35 58000.0 4.0 1

2. Model Accuracy:
Model Accuracy: 100.00%

Detailed Explanation of Advanced DBMS Program Integrating MySQL and Python with Machine Learning
1. Purpose of the Program

The program is designed to integrate MySQL, a relational database, with Python, a programming language, to perform data
analysis and prediction using a Machine Learning algorithm (Logistic Regression). The goal is to:
 Fetch customer data stored in a MySQL database.
 Use the data to train a Logistic Regression model to predict if a customer will make a purchase based on features
like:
i. Age
ii. Income
iii. Browsing Time
 Evaluate the model and print its accuracy.

2. Database Setup in MySQL

 Database Creation: A database named `customer_db` is created to store customer data.


 Table Definition: A table called `customer_data` is defined with columns for:
i. Customer_id`: A unique identifier for each customer.
ii. Age`: Age of the customer.
iii. Income`: Monthly income of the customer.
iv. Browsing_time`: Time spent browsing the website.
v. Purchase_made`: Whether the customer made a purchase (1 for yes, 0 for no).
 Sample Data: Example rows are inserted into the table for testing.
The database acts as the primary source of data for the machine learning model.

3. Python Program

Step 1: Database Connection


 In Python, the `mysql.connector` library is used to connect to the MySQL database. The connection requires:
i. host`: The server hosting the database (e.g., localhost for local databases).
ii. user` and `password`: Authentication credentials for the database.
iii. database`: The specific database to connect to (in this case, `customer_db`).
 This ensures seamless communication between Python and MySQL.

Step 2: Fetch Data from the Database

 SQL Query: The query `SELECT age, income, browsing_time, purchase_made FROM customer_data` retrieves
relevant columns for analysis.
 Fetching Data: The cursor object executes the query and fetches all rows into Python. The data is converted into a
Pandas DataFrame for easy manipulation and analysis.
The DataFrame serves as the input dataset for the machine learning algorithm.

Step 3: Data Preprocessing

 The program divides the data into:


i. Features (X): Independent variables (`Age`, `Income`, `Browsing_Time`).
ii. Target (y): Dependent variable (`Purchase_Made`).
iii. Data Splitting: The dataset is split into two parts:
iv. Training Data: Used to train the model (80% of the data).
v. Testing Data: Used to evaluate the model’s performance (20% of the data).
 The `train_test_split` function from Scikit-learn is used for this purpose.

Step 4: Machine Learning Model

 Algorithm Choice: Logistic Regression is chosen as the machine learning algorithm. This algorithm is ideal for
binary classification problems where the target variable has two possible outcomes (1 = Purchase, 0 = No
Purchase).
 Training: The model is trained on the training dataset using `model.fit(X_train, y_train)`. Logistic Regression
learns the relationship between features and the target variable.
 Prediction: Once trained, the model is used to predict outcomes for the test dataset (`model.predict(X_test)`).

Step 5: Evaluation
 Accuracy Score: The `accuracy_score` function compares the model’s predictions (`y_pred`) with the actual test
labels (`y_test`) to calculate the proportion of correct predictions. This score is displayed as a percentage,
providing a measure of the model’s effectiveness.

Step 6: Integration

 Connecting All Components: The `main()` function ties everything together:


i. Establish a connection with the database.
ii. Fetch the data and convert it into a DataFrame.
iii. Train the machine learning model.
iv. Print the results.
 After execution, the database connection is closed to free resources.

4. Why This Approach?

 Scalability: The program can handle large datasets stored in a database, making it suitable for real-world
applications.
 Reusability: New data can be added to the database, and the program can re-train the model to improve
predictions.
 Integration: It demonstrates how to integrate a relational database (MySQL) with advanced analytics (Python and
Machine Learning).

5. Output:

Data from MySQL:

Data Retrieved from Database:


Customer ID Age Income Browsing_Time Purchase_Made
0 25 35000.0 5.5 1
1 32 60000.0 2.2 0
2 45 75000.0 3.1 1
3 28 40000.0 1.8 0
4 35 58000.0 4.0 1
Model Accuracy:
Model Accuracy: 100.00%

6. Applications of This Program

 E-commerce: Predict customer purchasing behavior to optimize marketing strategies.


 Finance: Classify customers based on credit risk or loan eligibility.
 Healthcare: Predict patient outcomes based on diagnostic data.
 Education: Analyze student performance and identify those needing additional support.
This program serves as a foundation for integrating databases with machine learning models in various domains, ensuring
data-driven decision-making!

You might also like