
Final Report

Internship Report

On
Machine Learning Internship
In partial fulfillment of requirements for the degree
Of

Bachelor of Technology
In
Computer Science & Engineering (Artificial Intelligence
& Machine Learning)

Submitted By:
Ms. Prachi Shreelochan
2201331530147
B. Tech CSE(AIML) 3rd Year
ACSE0559 Internship Assessment-II

Under the Guidance of:


Ms. Aarushi Thusu
(Assistant Professor, Department of CSE (AIML))
Mr. Faizan Ahmad
(Assistant Professor, Department of CSE (AIML))

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


(Artificial Intelligence & Machine Learning)
NOIDA INSTITUTE OF ENGG. & TECHNOLOGY, GREATER NOIDA, GAUTAM BUDDH NAGAR
(AN AUTONOMOUS INSTITUTE)

(Approved by AICTE and affiliated to Dr. A.P.J. Abdul Kalam Technical University,
Uttar Pradesh, Lucknow)

2024
Certificate

I hereby certify that the work being submitted in the Project Report
entitled “MACHINE LEARNING INTERNSHIP”, in partial fulfillment of
the requirements for the award of the Bachelor of Technology in Computer
Science and Engineering (Artificial Intelligence & Machine Learning) and
submitted to the Department of Computer Science & Engineering (Artificial
Intelligence & Machine Learning), Noida Institute of Engineering &
Technology, Greater Noida, is an authentic record of my internship carried out
during the Fourth/Fifth semester under the supervision of Ms. Aarushi
Thusu (Assistant Professor) and Mr. Faizan Ahmad (Assistant Professor),
Department of Computer Science and Engineering (Artificial Intelligence &
Machine Learning), Noida Institute of Engineering & Technology, Greater
Noida. The matter embodied in this Project Report is original and has not been
submitted for the award of any other degree or diploma.

Signature of Candidate
Prachi Shreelochan
2201331530147

This is to certify that the above statement made by the candidate is correct
and true to the best of my knowledge.

Signature of Supervisor: Ms. Aarushi Thusu & Mr. Faizan Ahmad (Assistant Professor)

Signature of HOD: Dr. Raju (Head, CSE-AIML)

DECLARATION

I hereby declare that this submission is my own project and that, to the best of
my own knowledge and belief, it contains no material previously published by
another person nor material which to a substantial extent has been accepted for
the award of any other degree or diploma of the university or other institute of
higher learning except where due acknowledgement has been made in the text.

Signature of Candidate

Acknowledgement

Successfully completing any task brings satisfaction as well as inner
strength for future challenges, but no one succeeds alone. Every achievement
is supported by a few people who offer help and suggestions along the way. It
is therefore my pleasure to thank all those who motivated me and supported me
at every stage of my internship project work. Firstly, I would like to honor my
institute, “Noida Institute of Engineering & Technology, Greater Noida”, which
provided me with a workplace and infrastructure to learn recent technologies
and the conceptual background to strengthen my programming and professional
skills. I am very grateful to Ms. Aarushi Thusu (Assistant Professor) and
Mr. Faizan Ahmad (Assistant Professor), Department of Computer Science and
Engineering (Artificial Intelligence & Machine Learning), and Dr. Raju
(Head, CSE (AIML)), Noida Institute of Engineering & Technology, Greater
Noida, for their helpful attitude and encouragement throughout my project.
Furthermore, I am thankful to all faculty members for motivating me and to
the staff of the department's computer labs for providing excellent
facilities, issuing me a well-configured computer system, and maintaining it
regularly. I would like to extend special thanks to all my batchmates for
their love, encouragement, and constant support. Last but not least, I would
like to thank my parents for supporting me in every way to complete my
project report.

Prachi Shreelochan

Internship Certificate

Abstract

During my machine learning internship, I gained extensive hands-on
experience in developing, training, and evaluating machine learning models to
address real-world problems. My work involved implementing a variety of
algorithms across supervised learning, unsupervised learning, and sentiment
analysis tasks, and using datasets to derive actionable insights and improve
model performance.
Key highlights include developing a simple linear regression model for salary
prediction with a reported accuracy of 98.8%, creating a decision tree
classifier on the Iris dataset with a testing accuracy of 96.6%, and performing
sentiment analysis on a 50k-review IMDb movie review dataset with an accuracy
of 88.48%. I also explored clustering techniques, implementing K-means
clustering on a mall customer dataset to identify customer segments.
Throughout the internship, I honed my skills in data preprocessing, feature
engineering, and hyperparameter tuning. Tools such as Python, Pandas,
Scikit-learn, and TensorFlow played a pivotal role in my projects. Additionally,
I strengthened my analytical abilities by visualizing results and validating
model predictions.
This internship gave me a deeper understanding of the machine learning
workflow, enhanced my problem-solving skills, and solidified my passion for
applying artificial intelligence to solve complex challenges efficiently.

Technology Background

• Python: Used as the primary programming language for its extensive libraries
and frameworks tailored for machine learning, such as Scikit-learn, Pandas,
and NumPy. Python's simplicity and flexibility made it ideal for data
preprocessing, model development, and performance evaluation.

• Scikit-learn: A robust library used for implementing machine learning models
like linear regression, decision trees, and K-means clustering. It provides tools
for model training, evaluation, and hyperparameter tuning.

• Pandas: Employed for data manipulation and analysis. Its data structures like
DataFrames facilitated efficient handling of datasets, including cleaning,
filtering, and transforming raw data into a structured format.

• NumPy: Essential for numerical computations, it was used for handling
multidimensional arrays and performing mathematical operations critical to
machine learning workflows.

• Matplotlib/Seaborn: Used for data visualization to understand data
distributions, relationships, and patterns. These libraries helped create
insightful graphs and plots for analysis and reporting.

• TensorFlow: Utilized in tasks like sentiment analysis for building and training
neural networks. TensorFlow's scalability and flexibility enabled efficient
model development.

• Natural Language Toolkit (NLTK): Applied for text preprocessing in
sentiment analysis, including tokenization, stemming, and stop-word removal
to prepare textual data for model input.

• Google Colab: A cloud-based platform used for writing and executing code. It
provided an environment for real-time development, debugging, and
visualization, with the added advantage of GPU/TPU support for faster
computations.

Project Problem Background

1. Linear Regression on Salary Dataset


Background: Predicting employee salaries based on experience and other
features is critical for HR analytics. Accurate salary predictions assist in budget
planning, talent acquisition, and ensuring equitable compensation. The task
required building a regression model to establish a relationship between
features (e.g., years of experience) and salaries, enabling efficient prediction of
compensation trends.

2. Decision Tree Classifier on Iris Dataset


Background: Classification of flower species based on physical characteristics
(sepal and petal dimensions) is a foundational task in machine learning.
Automating this classification provides a basis for understanding supervised
learning. The task focused on building a decision tree classifier to predict
flower species, helping in species identification and dataset segmentation.

3. Sentiment Analysis on IMDb Movie Review Dataset


Background: With the rise of user-generated content, analyzing sentiments in
movie reviews helps businesses gauge audience reactions. The task aimed to
classify reviews as positive or negative, facilitating decision-making for movie
producers, marketers, and streaming platforms to understand public opinion
and improve content strategies.

4. K-Means Clustering on Mall Customer Dataset


Background: Segmenting customers based on purchasing behavior and
demographic attributes is essential for personalized marketing. The task
involved clustering mall customers to identify distinct groups, allowing
businesses to design targeted campaigns, enhance customer retention, and
optimize resource allocation for better engagement strategies.

Project Modules

1. Linear Regression on Salary Dataset


Modules:
1. Data Collection and Preprocessing: Collect and clean salary dataset
(e.g., handling missing values, encoding categorical variables).
2. Feature Selection: Identify relevant features like years of experience and
filter irrelevant or redundant data.
3. Model Development: Implement a simple linear regression model to
predict salary.
4. Evaluation: Use metrics like Mean Squared Error (MSE) and R-squared
to evaluate the model's performance.
5. Visualization: Plot regression lines and residuals to analyze model
accuracy.
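
The modules above can be sketched roughly as follows. This is a minimal illustration, not the code used in the internship: the data is synthetic (generated with assumed values for the base salary, slope, and noise), and the 75/25 split is also an assumption. It fits a simple linear regression with scikit-learn and reports MSE and R-squared on the held-out portion.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the salary dataset (NOT the data used in the report):
# salary grows roughly linearly with years of experience, plus random noise.
rng = np.random.default_rng(seed=0)
years = rng.uniform(1, 10, size=60)
salary = 30_000 + 9_000 * years + rng.normal(0, 3_000, size=60)

# scikit-learn expects a 2-D feature matrix, so reshape the single feature.
X = years.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, salary, test_size=0.25, random_state=0
)

# Fit the model on the training split and predict on the held-out split.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate with the two metrics named in the Evaluation module.
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"slope={model.coef_[0]:.1f}, intercept={model.intercept_:.1f}")
print(f"MSE={mse:.1f}, R^2={r2:.3f}")
```

The residuals mentioned in the Visualization module would be `y_test - y_pred`, which could be plotted against `X_test` with Matplotlib.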

2. Decision Tree Classifier on Iris Dataset


Modules:
1. Data Loading and Exploration: Load and explore the Iris dataset,
analyzing feature distributions and class labels.
2. Data Splitting: Split the dataset into training and testing sets for model
validation.
3. Model Building: Construct a decision tree classifier to predict flower
species.
4. Pruning and Optimization: Fine-tune hyperparameters like tree depth
to prevent overfitting.
5. Performance Analysis: Evaluate using metrics such as accuracy,
confusion matrix, and classification report.
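
A rough sketch of these steps is shown below. It uses the copy of the Iris dataset bundled with scikit-learn; the 80/20 split, the random seeds, and `max_depth=3` are assumed values chosen for illustration and are not taken from the report.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the Iris dataset that ships with scikit-learn (150 samples, 3 classes).
iris = load_iris()

# Hold out 20% of the samples for testing, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

# Limiting max_depth is a simple way to restrain overfitting; 3 is an assumption.
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Evaluate with the metrics named in the Performance Analysis module.
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
print(cm)
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

Trying several values of `max_depth` and comparing test accuracy is one straightforward way to carry out the tuning described in module 4.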

3. Sentiment Analysis on IMDb Movie Review Dataset


Modules:
1. Text Data Preprocessing: Clean text reviews by removing special
characters, stop-words, and stemming.
2. Vectorization: Convert text data into numerical form using techniques
like Bag-of-Words or TF-IDF.
3. Model Training: Train a machine learning model (e.g., logistic
regression or neural networks) to classify sentiments.
4. Validation: Assess model performance using metrics like accuracy,
precision, recall, and F1-score.
5. Prediction and Deployment: Test the model on unseen reviews and
prepare for deployment.
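
The pipeline described above might look something like the following sketch. It is an illustration only: the eight reviews are hand-written toy examples rather than IMDb data, and TF-IDF with logistic regression is just one of the combinations the modules mention. Stop-word removal here relies on scikit-learn's built-in English list rather than NLTK, and stemming is omitted to keep the example short.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hand-written corpus for illustration only (NOT the IMDb dataset).
reviews = [
    "A wonderful film with brilliant acting",
    "Great story and a wonderful cast",
    "Brilliant, moving and beautifully shot",
    "I loved this great movie",
    "A terrible film with awful acting",
    "Boring story and a terrible cast",
    "Awful, dull and badly shot",
    "I hated this boring movie",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = positive, 0 = negative

# Vectorization (TF-IDF with lowercasing and stop-word removal) followed by
# a logistic regression classifier, chained so raw text can be passed in.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LogisticRegression(),
)
model.fit(reviews, labels)

# Classify reviews the model has not seen during training.
unseen = ["a brilliant and wonderful movie", "terrible and boring"]
predictions = model.predict(unseen)
print(predictions)
```

On a real dataset, the reviews would be split into training and test sets, and precision, recall, and F1-score could be obtained from `sklearn.metrics.classification_report` on the test predictions.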

4. K-Means Clustering on Mall Customer Dataset


Modules:
1. Data Understanding and Cleaning: Analyze customer demographics
and spending behavior, and clean any inconsistencies.
2. Feature Scaling: Normalize data to ensure fair clustering, as K-means is
sensitive to feature magnitudes.
3. Elbow Method: Determine the optimal number of clusters by analyzing
within-cluster sum of squares (WCSS).
4. Cluster Formation: Apply K-means clustering to segment customers
into meaningful groups.
5. Visualization and Insights: Visualize clusters and interpret them to
derive actionable business insights.
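
As a hedged sketch of the scaling, elbow-method, and clustering steps, the example below uses synthetic two-feature data (income and spending score drawn around three assumed centres) in place of the real mall customer dataset. The choice of `k = 3` follows from how the synthetic data was built and is not a result from the report.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the mall customer data (NOT the real dataset):
# columns are annual income and spending score, drawn around three centres.
rng = np.random.default_rng(seed=0)
centres = np.array([[20.0, 80.0], [60.0, 50.0], [100.0, 15.0]])
X = np.vstack([c + rng.normal(0, 4.0, size=(50, 2)) for c in centres])

# Feature scaling: K-means uses Euclidean distance, so put features on one scale.
X_scaled = StandardScaler().fit_transform(X)

# Elbow method: record WCSS (exposed by scikit-learn as inertia_) for k = 1..6.
wcss = []
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    wcss.append(km.inertia_)

# Cluster formation with k = 3, matching the three groups built into the data.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X_scaled)

print([round(w, 1) for w in wcss])   # WCSS per k, for locating the elbow
print(np.bincount(cluster_ids))      # number of customers in each cluster
```

Plotting `wcss` against `k` and looking for the point where the curve flattens is the usual way to read the elbow; the cluster labels in `cluster_ids` can then be used to colour a scatter plot of the two features.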

Snapshots of Project

1. Linear Regression

2. Decision Tree Classifier

3. Sentiment Analysis

4. K-Means Clustering

Applications

1. Linear Regression on Salary Dataset


Applications:
HR Analytics: Helps organizations predict employee salaries based on
experience, skill set, and other factors to plan budgets and compensation
policies.
Career Guidance: Assists professionals in forecasting potential earnings based
on industry trends and qualifications.
Financial Planning: Employers can use salary predictions to align with market
standards and optimize payroll expenses.
2. Decision Tree Classifier on Iris Dataset
Applications:
Educational Purposes: Widely used in teaching machine learning concepts for
supervised classification tasks.
Botanical Research: Enables quick and accurate classification of plant species
based on physical characteristics.
Automation in Agriculture: Assists in identifying plant varieties for crop
management and breeding programs.
3. Sentiment Analysis on IMDb Movie Review Dataset
Applications:
Entertainment Industry: Helps producers and marketers analyze audience
feedback and identify content that resonates with viewers.
Reputation Management: Enables platforms to monitor public sentiment about
movies, actors, and production houses.
Recommendation Systems: Enhances user experiences by recommending
content based on aggregated sentiment trends.
4. K-Means Clustering on Mall Customer Dataset
Applications:
Marketing and Personalization: Helps businesses create targeted campaigns by
grouping customers with similar behaviors.
Customer Retention: Identifies high-value customers and their preferences for
improved retention strategies.
Resource Allocation: Aids in optimizing store layouts and inventory
management by understanding customer demographics and spending habits.

References

1. Linear Regression on Salary Dataset
Books:
Introduction to Statistical Learning by Gareth James et al.
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by
Aurélien Géron.
Online Resources:
Scikit-learn Linear Regression Documentation
Kaggle - Linear Regression Tutorials

2. Decision Tree Classifier on Iris Dataset
Books:
Data Mining: Concepts and Techniques by Jiawei Han et al.
Python Machine Learning by Sebastian Raschka and Vahid Mirjalili.
Online Resources:
Scikit-learn Decision Trees Documentation
UCI Machine Learning Repository - Iris Dataset
Kaggle Kernel on Iris Classification

3. Sentiment Analysis on IMDb Movie Review Dataset
Books:
Speech and Language Processing by Jurafsky and Martin.
Text Mining and Analytics by ChengXiang Zhai and Sean Massung.
Online Resources:
NLTK Documentation
Kaggle - IMDb Dataset for Sentiment Analysis
TensorFlow Sentiment Analysis Tutorials

4. K-Means Clustering on Mall Customer Dataset


Books:
Machine Learning Yearning by Andrew Ng.
Pattern Recognition and Machine Learning by Christopher M. Bishop.
Online Resources:
Scikit-learn Clustering Documentation
Kaggle - Mall Customers Dataset Visualization
of Clustering with Python

Table of Contents:

1. Internship Certificate
2. Abstract
3. Technology Background
4. Project Problem Background
5. Project Modules
6. Snapshots of Project
7. Applications
8. References
