The document outlines various Python programs that demonstrate data manipulation and machine learning techniques using the Pandas library and other libraries such as Scikit-learn. It includes programs for importing/exporting data, data preprocessing, and implementing machine learning models like Linear Regression, K-Nearest Neighbors, Decision Trees, Naïve Bayes, Logistic Regression, Support Vector Machines, Random Forest, and Boosting. Each section provides a clear aim, procedure, and sample code for executing the respective tasks.


WRITE A PYTHON PROGRAM TO IMPORT AND EXPORT DATA USING PANDAS LIBRARY FUNCTIONS.

AIM:

To write a Python program to import and export data using the Pandas library
functions.

PROCEDURE:

1. Import the pandas library.
2. Create or load a dataset using the pandas.read_csv() function.
3. Display the contents of the dataset.
4. Export the DataFrame to a new file using the DataFrame.to_csv() function.

PROGRAM

import pandas as pd

# Step 1: Import data from a CSV file

df = pd.read_csv('student_data.csv')

print("Imported Data:")

print(df)

# Step 2: Export data to a new CSV file

df.to_csv('student_data_exported.csv', index=False)

print("\nData exported successfully to 'student_data_exported.csv'")


DEMONSTRATE VARIOUS DATA PRE-PROCESSING TECHNIQUES FOR A
GIVEN DATASET.

AIM

To demonstrate various data pre-processing techniques for a given dataset using Python.

PROCEDURE

1. Load the dataset using pandas.
2. Handle missing values.
3. Encode categorical variables.
4. Normalize or scale numerical features.
5. Remove duplicates.
6. Display the processed dataset.

PROGRAM

import pandas as pd

from sklearn.preprocessing import LabelEncoder, StandardScaler

# Step 1: Create a sample dataset

data = {

'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Alice'],

'Age': [25, 30, 35, None, 45, 40, 25],

'Gender': ['Female', 'Male', 'Male', 'Male', 'Female', 'Male', 'Female'],

'Salary': [50000, 60000, 70000, 80000, None, 90000, 50000]
}

df = pd.DataFrame(data)

print("Original Dataset:\n", df)


# Step 2: Handle missing values

df['Age'] = df['Age'].fillna(df['Age'].mean())

df['Salary'] = df['Salary'].fillna(df['Salary'].median())

# Step 3: Encode categorical data

le = LabelEncoder()

df['Gender'] = le.fit_transform(df['Gender']) # Female=0, Male=1

# Step 4: Remove duplicates

df = df.drop_duplicates()

# Step 5: Feature Scaling (Normalization)

scaler = StandardScaler()

df[['Age', 'Salary']] = scaler.fit_transform(df[['Age', 'Salary']])

# Final Processed Dataset

print("\nProcessed Dataset:\n", df)


WRITE A PROGRAM FOR HOUSE RENT PREDICTION USING LINEAR REGRESSION (ANY OTHER PREDICTION TASK MAY BE USED IN PLACE OF HOUSE RENT)

AIM

To write a Python program for House Rent Prediction using Linear Regression.

PROCEDURE

1. Import required libraries.
2. Load or create the dataset.
3. Preprocess the data (handle missing values, encode categorical data).
4. Split the data into training and testing sets.
5. Train a Linear Regression model.
6. Evaluate and make predictions.

PROGRAM

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

from sklearn.metrics import mean_squared_error, r2_score

# Step 1: Create a simple dataset

data = {

'Area (sqft)': [600, 800, 1000, 1200, 1500, 1800, 2000],

'Bedrooms': [1, 2, 2, 3, 3, 4, 4],

'Age of House (years)': [5, 10, 3, 8, 15, 7, 2],

'Rent (Rs)': [8000, 10000, 12000, 15000, 18000, 20000, 23000]


}

df = pd.DataFrame(data)

# Step 2: Define features and target

X = df[['Area (sqft)', 'Bedrooms', 'Age of House (years)']]

y = df['Rent (Rs)']

# Step 3: Split data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 4: Train the model

model = LinearRegression()

model.fit(X_train, y_train)

# Step 5: Predict and evaluate

y_pred = model.predict(X_test)

print("Predicted Rents:", y_pred)

print("Actual Rents :", y_test.values)

# Step 6: Performance metrics

print("\nMean Squared Error:", mean_squared_error(y_test, y_pred))

print("R² Score:", r2_score(y_test, y_pred))


WRITE A PROGRAM TO IMPLEMENT K – NEAREST NEIGHBOR
ALGORITHM TO CLASSIFY THE GIVEN DATA SET. PRINT BOTH
CORRECT AND WRONG PREDICTIONS.

AIM

To write a Python program to implement the K-Nearest Neighbor algorithm to classify the given dataset and print both correct and wrong predictions.

PROCEDURE

1. Import required libraries.
2. Load a sample dataset (Iris).
3. Split the data into training and testing sets.
4. Train the KNN model.
5. Make predictions.
6. Print correct and incorrect predictions.

PROGRAM

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split

from sklearn.neighbors import KNeighborsClassifier

# Step 1: Load dataset

iris = load_iris()

X = iris.data

y = iris.target

target_names = iris.target_names
# Step 2: Split into train and test

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Initialize KNN model

k=3

knn = KNeighborsClassifier(n_neighbors=k)

knn.fit(X_train, y_train)

# Step 4: Predict

y_pred = knn.predict(X_test)

# Step 5: Print correct and incorrect predictions

print("\n--- Prediction Results ---")

for i in range(len(y_test)):
    actual = target_names[y_test[i]]
    predicted = target_names[y_pred[i]]
    if y_test[i] == y_pred[i]:
        print(f"[Correct] Sample {i+1}: Actual = {actual}, Predicted = {predicted}")
    else:
        print(f"[Incorrect] Sample {i+1}: Actual = {actual}, Predicted = {predicted}")


DEVELOP DECISION TREE CLASSIFICATION MODEL FOR A GIVEN
DATASET AND USE IT TO CLASSIFY A NEW SAMPLE.

AIM

To develop a Decision Tree Classification model for a given dataset and use it to classify a new sample.

PROCEDURE

1. Import the necessary libraries.
2. Load a dataset (we’ll use the Iris dataset for simplicity).
3. Split the dataset into training and testing sets.
4. Train the Decision Tree model.
5. Test the model on a new sample.
6. Print the classification results.

PROGRAM

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split

from sklearn.tree import DecisionTreeClassifier

from sklearn.metrics import accuracy_score

# Step 1: Load dataset

iris = load_iris()

X = iris.data
y = iris.target

target_names = iris.target_names

# Step 2: Split the dataset

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Train Decision Tree Classifier

clf = DecisionTreeClassifier(random_state=42)

clf.fit(X_train, y_train)

# Step 4: Evaluate model

y_pred = clf.predict(X_test)

print("Accuracy on test data:", accuracy_score(y_test, y_pred))

# Step 5: Classify a new sample

# Example: [sepal length, sepal width, petal length, petal width]

new_sample = [[5.1, 3.5, 1.4, 0.2]]

prediction = clf.predict(new_sample)

print("\nNew Sample:", new_sample)

print("Predicted Class:", target_names[prediction[0]])


IMPLEMENT NAÏVE BAYES CLASSIFICATION IN PYTHON.

AIM

To implement Naïve Bayes Classification in Python using the Gaussian Naïve Bayes algorithm.

PROCEDURE

1. Import required libraries.
2. Load or create a dataset.
3. Split the dataset into training and testing sets.
4. Train the Naïve Bayes model.
5. Predict on the test set and evaluate the model.

PROGRAM

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split

from sklearn.naive_bayes import GaussianNB

from sklearn.metrics import accuracy_score, classification_report

# Step 1: Load dataset

iris = load_iris()

X = iris.data

y = iris.target

target_names = iris.target_names
# Step 2: Split dataset

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Initialize Naive Bayes classifier

model = GaussianNB()

# Step 4: Train the model

model.fit(X_train, y_train)

# Step 5: Predict the test data

y_pred = model.predict(X_test)

# Step 6: Evaluate the model

print("Accuracy:", accuracy_score(y_test, y_pred))

print("\nClassification Report:")

print(classification_report(y_test, y_pred, target_names=target_names))
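Gaussian Naïve Bayes fits one normal distribution per feature per class, so the learned parameters can be read straight off the fitted model. A short sketch (on recent scikit-learn versions the variance attribute is var_; older releases called it sigma_):

# Per-class mean and variance of each feature, one row per class
print("Class means:\n", model.theta_)
print("Class variances:\n", model.var_)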


DEVELOP LOGISTIC REGRESSION MODEL FOR A GIVEN DATASET.

AIM:

To develop a Logistic Regression model for a given dataset and evaluate its performance.

PROCEDURE:

1. Import necessary libraries.
2. Load a dataset (using the Iris dataset).
3. Preprocess if needed (optional).
4. Split the data into training and testing sets.
5. Train the Logistic Regression model.
6. Predict and evaluate the model.

PROGRAM

from sklearn.datasets import load_iris

from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import train_test_split

from sklearn.metrics import accuracy_score, classification_report

# Step 1: Load dataset

iris = load_iris()

X = iris.data

y = iris.target
target_names = iris.target_names

# Step 2: Split dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Create and train logistic regression model

model = LogisticRegression(max_iter=200)  # max_iter increased to ensure convergence

model.fit(X_train, y_train)

# Step 4: Make predictions

y_pred = model.predict(X_test)

# Step 5: Evaluate model

print("Accuracy:", accuracy_score(y_test, y_pred))

print("\nClassification Report:")

print(classification_report(y_test, y_pred, target_names=target_names))
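Logistic regression also yields class-membership probabilities, not just hard labels. A brief sketch printing the probabilities for the first three test samples:

# Probability of each class for a few test samples
for row in model.predict_proba(X_test[:3]):
    print({name: round(p, 3) for name, p in zip(target_names, row)})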


WRITE A PROGRAM TO CONSTRUCT A SUPPORT VECTOR MACHINE
CONSIDERING MEDICAL DATA. USE THIS MODEL TO DEMONSTRATE
THE DIAGNOSIS OF HEART PATIENTS USING THE STANDARD HEART
DISEASE DATA SET.

AIM

To write a Python program to construct a Support Vector Machine (SVM) model using heart disease medical data for classification.

PROCEDURE

1. Import required libraries.
2. Load the Heart Disease dataset.
3. Preprocess the data (handle missing values, normalize features).
4. Split the dataset into training and testing sets.
5. Train the SVM model.
6. Predict and evaluate performance.

PROGRAM

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler

from sklearn.svm import SVC

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report


# Step 1: Load dataset

# Download dataset from: https://www.kaggle.com/datasets/clevercoder/heart-disease-prediction

# or use the UCI dataset

url = "https://raw.githubusercontent.com/omairaasim/machine-learning-datasets/main/heart.csv"

df = pd.read_csv(url)

print("First 5 records:\n", df.head())

# Step 2: Features and labels

X = df.drop('target', axis=1) # Features

y = df['target'] # Labels (0 = No disease, 1 = Disease)

# Step 3: Data Scaling

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

# Step 4: Train-test split

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Step 5: Create and train SVM model

model = SVC(kernel='linear') # You can try 'rbf', 'poly', etc.

model.fit(X_train, y_train)

# Step 6: Predict

y_pred = model.predict(X_test)

# Step 7: Evaluate model


print("\nAccuracy:", accuracy_score(y_test, y_pred))

print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))

print("\nClassification Report:\n", classification_report(y_test, y_pred))

A) IMPLEMENT RANDOM FOREST ENSEMBLE METHOD ON A GIVEN DATASET

AIM

To implement the Random Forest ensemble method on a given dataset and use it for classification.

PROCEDURE

1. Import the necessary libraries.
2. Load and preprocess the dataset.
3. Split the data into training and testing sets.
4. Train a Random Forest classifier.
5. Evaluate and display the results.

PROGRAM

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

from sklearn.preprocessing import StandardScaler

# Step 1: Load the dataset (Heart Disease Dataset)

url = "https://raw.githubusercontent.com/omairaasim/machine-learning-datasets/
main/heart.csv"
df = pd.read_csv(url)

# Step 2: Define features and labels

X = df.drop('target', axis=1)

y = df['target']

# Step 3: Normalize the features

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

# Step 4: Train-test split

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Step 5: Train Random Forest model

rf = RandomForestClassifier(n_estimators=100, random_state=42)

rf.fit(X_train, y_train)

# Step 6: Predict and evaluate

y_pred = rf.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))

print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))

print("\nClassification Report:\n", classification_report(y_test, y_pred))


B) IMPLEMENT BOOSTING ENSEMBLE METHOD ON A GIVEN DATASET

AIM

To implement the Boosting ensemble method using the AdaBoost algorithm on a given dataset and evaluate its performance.

PROCEDURE

1. Import necessary libraries.
2. Load a dataset (e.g., the Iris dataset from sklearn.datasets).
3. Split the dataset into training and testing sets.
4. Create an AdaBoost classifier.
5. Train the classifier on the training data.
6. Predict the results on the test data.
7. Evaluate the performance using accuracy and a classification report.

PROGRAM

# Step 1: Import libraries

from sklearn.ensemble import AdaBoostClassifier

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split

from sklearn.metrics import accuracy_score, classification_report

# Step 2: Load dataset


iris = load_iris()

X = iris.data

y = iris.target

# Step 3: Split into train and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Step 4: Create AdaBoost model

model = AdaBoostClassifier(n_estimators=50, random_state=1)

# Step 5: Train the model

model.fit(X_train, y_train)

# Step 6: Predict on test data

y_pred = model.predict(X_test)

# Step 7: Evaluate the model

print("Accuracy:", accuracy_score(y_test, y_pred))

print("Classification Report:\n", classification_report(y_test, y_pred))


a) WRITE A PYTHON PROGRAM TO IMPLEMENT K-MEANS CLUSTERING ALGORITHM

AIM

To implement the K-Means clustering algorithm on a dataset and visualize the clusters.

PROCEDURE

1. Import required libraries.
2. Load or generate sample data.
3. Apply K-Means clustering.
4. Visualize the clusters (optional but helpful).
5. Print cluster centers and labels.

PROGRAM

# Step 1: Import required libraries

import numpy as np

import matplotlib.pyplot as plt

from sklearn.cluster import KMeans

from sklearn.datasets import make_blobs

# Step 2: Generate sample data (2D)

X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=0)


# Step 3: Apply K-Means clustering

kmeans = KMeans(n_clusters=3, random_state=0)

kmeans.fit(X)

y_kmeans = kmeans.predict(X)

# Step 4: Visualize the clusters

plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')

centers = kmeans.cluster_centers_

plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.75, marker='X')

plt.title("K-Means Clustering")

plt.xlabel("Feature 1")

plt.ylabel("Feature 2")

plt.grid(True)

plt.show()

# Step 5: Output cluster centers

print("Cluster Centers:\n", centers)


b) IMPLEMENT DIMENSIONALITY REDUCTION USING PRINCIPAL COMPONENT ANALYSIS (PCA) METHOD.

AIM

To implement Dimensionality Reduction using the Principal Component Analysis (PCA) method and visualize the transformed data.

PROCEDURE

1. Import necessary libraries.
2. Load a dataset (e.g., the Iris dataset).
3. Standardize the data.
4. Apply PCA to reduce dimensions.
5. Visualize the reduced data.
6. Print explained variance.

PROGRAM

# Step 1: Import libraries

import numpy as np

import matplotlib.pyplot as plt

from sklearn.decomposition import PCA

from sklearn.datasets import load_iris

from sklearn.preprocessing import StandardScaler


# Step 2: Load dataset

data = load_iris()

X = data.data

y = data.target

target_names = data.target_names

# Step 3: Standardize the data

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)

# Step 4: Apply PCA (reduce to 2 dimensions for visualization)

pca = PCA(n_components=2)

X_pca = pca.fit_transform(X_scaled)

# Step 5: Visualize the PCA-transformed data

plt.figure(figsize=(8, 6))

colors = ['r', 'g', 'b']

for i, target_name in enumerate(target_names):
    plt.scatter(X_pca[y == i, 0], X_pca[y == i, 1], alpha=0.7, color=colors[i],
                label=target_name)

plt.xlabel('Principal Component 1')

plt.ylabel('Principal Component 2')

plt.title('PCA of IRIS Dataset')

plt.legend()

plt.grid(True)
plt.show()

# Step 6: Print explained variance

print("Explained variance ratio:", pca.explained_variance_ratio_)
