Practical File OF Machine Learning

This practical file outlines various machine learning experiments conducted by Shubham Kumar Chaubey as part of a Bachelor of Technology program in Computer Science. It includes experiments on natural language processing, customer segmentation using K-Means clustering, and neural networks, detailing methodologies such as tokenization, stopword removal, and sentiment analysis. The document serves as a comprehensive guide to applying machine learning techniques to real-world data analysis tasks.


PRACTICAL FILE

OF
MACHINE LEARNING

BACHELOR OF TECHNOLOGY IN COMPUTER SCIENCE

SESSION 2022-2026
Department of Computer Science
Global Institute of Technology & Management,
(Gurugram University)
Farrukh Nagar, Haryana, India

SUBMITTED BY
Name: Shubham Kumar Chaubey
Roll No.: 221116
Semester: 6th - B

SUBMITTED TO
Name: Mr. Muzamil Aslam
Designation: Professor
INDEX

S. NO.  EXPERIMENT                                         DATE        SIGNATURE
1.      Automatic Word Analysis in NLP                     11-02-2025
2.      Classification Algorithms and ROC Interpretation   28-02-2025
3.      Customer Segmentation using K-Means Clustering     21-03-2025
4.      Neural Networks: Feedforward, CNN, and RNN         28-03-2025
5.      Feature Selection Techniques                       04-04-2025
6.      Linear and Logistic Regression Implementation      11-04-2025
7.      K-Means Clustering on Iris Dataset                 17-04-2025

Experiment 1: Automatic Word Analysis in NLP

import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from collections import Counter
from textblob import TextBlob

# Download necessary NLTK resources
nltk.download('punkt')
nltk.download('punkt_tab')  # tokenizer data required by newer NLTK releases
nltk.download('stopwords')
nltk.download('wordnet')

# Sample text for analysis
text = """Machine learning is a branch of artificial
intelligence that enables computers to learn from data.
It is used in various applications, including speech
recognition, recommendation systems, and automation."""

# 1. Tokenization (splitting text into words)
tokens = word_tokenize(text.lower())  # Convert to lowercase first

# 2. Removing stopwords (isalnum() also drops punctuation tokens)
stop_words = set(stopwords.words("english"))
filtered_words = [word for word in tokens
                  if word.isalnum() and word not in stop_words]

# 3. Stemming (reducing words to their stem)
stemmer = PorterStemmer()
stemmed_words = [stemmer.stem(word) for word in filtered_words]

# 4. Lemmatization (converting words to their dictionary form)
lemmatizer = WordNetLemmatizer()
lemmatized_words = [lemmatizer.lemmatize(word) for word in filtered_words]

# 5. Word Frequency Analysis
word_freq = Counter(lemmatized_words)

# 6. Sentiment Analysis
blob = TextBlob(text)
sentiment = blob.sentiment.polarity  # Ranges from -1 (negative) to 1 (positive)

# Output Results
print("Original Text:", text)
print("\nTokenized Words:", tokens)
print("\nFiltered Words (Without Stopwords):", filtered_words)
print("\nStemmed Words:", stemmed_words)
print("\nLemmatized Words:", lemmatized_words)
print("\nWord Frequency:", word_freq)
print("\nSentiment Analysis Score:", sentiment)
if sentiment > 0:
    print("Overall Sentiment: Positive")
elif sentiment < 0:
    print("Overall Sentiment: Negative")
else:
    print("Overall Sentiment: Neutral")

Explanation:

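• Tokenization: Splits the lowercased input text into individual word and
  punctuation tokens.
• Stopword Removal: Drops punctuation tokens and common English words (such
  as "is", "a", "of") that carry little meaning on their own.
• Stemming: Strips suffixes with the Porter stemmer, so the result need not
  be a dictionary word (for example, "machine" becomes "machin").
• Lemmatization: Uses the WordNet lemmatizer to map each word to its
  dictionary form. Without a part-of-speech tag it treats every word as a
  noun, which is why "learning" is left unchanged.
• Word Frequency Analysis: Counts occurrences of each lemmatized word.
• Sentiment Analysis: Uses TextBlob's polarity score to label the text as
  positive, negative, or neutral.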
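
The tokenization, stopword removal, and frequency steps can also be sketched with the standard library alone, which makes the mechanics easier to see. This is only a rough illustration: the regular expression, the tiny STOP_WORDS set, and the helper name analyze are assumptions made for this example, and NLTK's tokenizer and stopword list are considerably more thorough.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set (NLTK's English list is much longer).
STOP_WORDS = {"is", "a", "of", "the", "that", "to", "from", "it", "in", "and"}

def analyze(text):
    """Tokenize, drop stopwords, and count the remaining words."""
    # Lowercase the text and keep runs of letters/digits as tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    filtered = [t for t in tokens if t not in STOP_WORDS]
    return tokens, filtered, Counter(filtered)

sample = ("Machine learning is a branch of artificial intelligence. "
          "Learning from data is useful.")
tokens, filtered, freq = analyze(sample)
print("Filtered:", filtered)
print("Most common:", freq.most_common(1))
```

Because the regular expression keeps only alphanumeric runs, punctuation never becomes a token here, whereas the NLTK version above filters it out afterwards with isalnum().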

Expected Output:
Original Text: Machine learning is a branch of artificial
intelligence that enables computers to learn from data...

Tokenized Words: ['machine', 'learning', 'is', 'a', 'branch', 'of', ...]

Filtered Words (Without Stopwords): ['machine', 'learning', 'branch', 'artificial', ...]

Stemmed Words: ['machin', 'learn', 'branch', 'artifici', 'intellig', ...]

Lemmatized Words: ['machine', 'learning', 'branch', 'artificial', 'intelligence', ...]

Word Frequency: Counter({'machine': 1, 'learning': 1, 'branch': 1, ...})

Sentiment Analysis Score: 0.3
Overall Sentiment: Positive

Conclusion:
This experiment demonstrates automatic word analysis with standard NLP
techniques, covering:

✅ Tokenization
✅ Stopword Removal
✅ Stemming & Lemmatization
✅ Word Frequency Analysis
✅ Sentiment Analysis

These steps are common preprocessing for text classification, chatbots, and
other language-processing applications.

• Before Clustering: The dataset contained raw, ungrouped customer data
  without clear segmentation.
• After Clustering: The K-Means algorithm identified distinct customer
  segments based on annual income and spending patterns.
• Customer Segmentation: The resulting segments represent different shopper
  profiles, allowing targeted marketing strategies to be devised.

Note: The labels assigned to clusters ("Budget Shoppers," "Impulse Buyers,"
etc.) are interpretations based on the visual representation of the clusters.
Further analysis and domain expertise might be required to validate and
refine these labels.

This experiment demonstrates how K-Means clustering can be employed to
understand customer behavior and identify distinct segments within a dataset.
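
To show what K-Means does internally, the following is a minimal pure-Python sketch of the assign-and-update loop (Lloyd's algorithm). The customer values, the helper name kmeans, and the naive "first k points" initialisation are all assumptions made for illustration; they are not the experiment's dataset or code, and library implementations normally use a smarter initialisation such as k-means++.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    # Naive initialisation (an assumption for this sketch): the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, (x, y) in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels

# Hypothetical customers: (annual income in thousands, spending score 1-100).
customers = [(15, 80), (16, 77), (18, 82),
             (70, 20), (72, 15), (75, 18),
             (78, 90), (80, 88), (85, 93)]
centroids, labels = kmeans(customers, k=3)
for point, label in zip(customers, labels):
    print(point, "-> cluster", label)
```

The cluster numbers carry no meaning by themselves; names such as "Budget Shoppers" are attached afterwards by inspecting each centroid's income and spending values.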

Experiment 4: Three assignments on designing neural networks for solving
learning problems.

# Import necessary libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Dense, Flatten, Conv2D, MaxPooling2D,
                                     Embedding, LSTM)
from tensorflow.keras.datasets import mnist, cifar10, imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np
import matplotlib.pyplot as plt

# -----------------------------
# Assignment 1: Feedforward NN for MNIST
# -----------------------------
print("Training Feedforward NN on MNIST...")
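
As a rough illustration of what a feedforward network computes for a single input, the sketch below runs one forward pass through a dense layer with ReLU followed by a softmax output layer, using only the standard library. The weights, layer sizes, and helper names (dense, relu, softmax) are made up for this example; a network for MNIST would instead take 784 inputs (a flattened 28x28 image), produce 10 class probabilities, and learn its weights during training.

```python
import math

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy network: 3 inputs -> 2 hidden units -> 2 classes.
x = [0.5, -1.0, 2.0]
W1, b1 = [[0.1, 0.2, 0.3], [-0.3, 0.1, 0.2]], [0.0, 0.1]
W2, b2 = [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]

hidden = relu(dense(x, W1, b1))
probs = softmax(dense(hidden, W2, b2))
print("hidden activations:", hidden)
print("class probabilities:", probs)
```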

mean texture 0.323782 1.000000 ... 0.415185

Experiment 7: One Assignment to be done in Grouping

Grouping Data using K-Means Clustering

Objective:

To implement the K-Means Clustering algorithm for unsupervised learning and
to group similar data points based on selected features.

Software/Tools Required:

• Python
• Libraries: pandas, scikit-learn (sklearn), matplotlib, seaborn

Dataset:

Use the Iris dataset from sklearn.datasets

Tasks to Perform:

1. Load the Iris dataset.
2. Choose two features for clustering.
3. Apply K-Means clustering to group data into clusters.
4. Visualize the clusters using a scatter plot.
5. Display the cluster centroids.
6. Evaluate the clustering using Silhouette Score.

Sample Python Code:


import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

# Load dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Select features for clustering
X = df[['petal length (cm)', 'petal width (cm)']]

# Apply KMeans (n_init is set explicitly because its default differs
# across scikit-learn versions)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
df['cluster'] = kmeans.fit_predict(X)

# Visualize the clusters
plt.figure(figsize=(8, 6))
sns.scatterplot(x='petal length (cm)', y='petal width (cm)',
                hue='cluster', data=df, palette='Set2')
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200,
            marker='X', label='Centroids')
plt.title("K-Means Clustering")
plt.legend()
plt.show()

# Evaluate with Silhouette Score (rounded to two decimals)
score = silhouette_score(X, df['cluster'])
print(f"Silhouette Score: {score:.2f}")

Sample Output:
Silhouette Score: 0.66

A scatter plot will display 3 groups of data points with distinct colors
and black centroids.
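
For reference, the silhouette coefficient of a single point is s = (b - a) / max(a, b), where a is its mean distance to the other points in its own cluster and b is the smallest mean distance to the points of any other cluster; the reported score is the average of s over all points and lies between -1 and 1. The following is a minimal pure-Python sketch of that calculation. The helper name silhouette and the four sample points are assumptions for illustration, and the sketch expects at least two clusters.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for single-point clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself does not affect the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical example: two tight, well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = [0, 0, 1, 1]
print("Silhouette Score:", round(silhouette(points, labels), 2))
```

A score close to 1 means points sit much nearer to their own cluster than to the next one, so the value of 0.66 above indicates reasonably well-separated clusters.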

Expected Learning Outcomes:

• Understand unsupervised learning and clustering.
• Learn how to group similar data points without labeled data.
• Visualize and evaluate clustering performance using Silhouette Score.
