Internship Report

On
Machine Learning Internship
In partial fulfillment of requirements for the degree
Of

Bachelor of Technology
In
Computer Science & Engineering (Artificial Intelligence
& Machine Learning)

Submitted By:
Ms. Prachi Shreelochan
2201331530147
B. Tech CSE(AIML) 3rd Year
ACSE0559 Internship Assessment-II

Under the Guidance of:

Ms. Aarushi Thusu
(Assistant Professor, Department of CSE(AIML))
Mr. Faizan Ahmad
(Assistant Professor, Department of CSE(AIML))

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

(Artificial Intelligence & Machine Learning)
NOIDA INSTITUTE OF ENGG. & TECHNOLOGY, GREATER NOIDA, GAUTAM BUDDH NAGAR
(AN AUTONOMOUS INSTITUTE)

(Approved by AICTE and affiliated to Dr. A.P.J. Abdul Kalam Technical University, Uttar Pradesh, Lucknow)

2024
Certificate

I hereby certify that the work being submitted in the Project Report
entitled “MACHINE LEARNING INTERNSHIP”, in partial fulfillment of
the requirements for the award of the Bachelor of Technology in Computer
Science and Engineering (Artificial Intelligence & Machine Learning), and
submitted to the Department of Computer Science & Engineering (Artificial
Intelligence & Machine Learning), Noida Institute of Engineering &
Technology, Greater Noida, is an authentic record of my internship carried out
during the fourth/fifth semester under the supervision of Ms. Aarushi
Thusu (Assistant Professor) and Mr. Faizan Ahmad (Assistant Professor),
Department of Computer Science and Engineering (Artificial Intelligence &
Machine Learning), Noida Institute of Engineering & Technology, Greater
Noida. The matter embodied in this project report is original and has not been
submitted for the award of any other degree or diploma.

Signature of Candidate
Prachi Shreelochan
2201331530147

This is to certify that the above statement made by the candidate is correct
and true to the best of my knowledge.

Signature of Supervisor
Ms. Aarushi Thusu & Mr. Faizan Ahmad
(Assistant Professors)

Signature of HOD
Dr. Raju
(Head, CSE-AIML)

DECLARATION

I hereby declare that this submission is my own work and that, to the best of
my knowledge and belief, it contains no material previously published by
another person, nor material which to a substantial extent has been accepted for
the award of any other degree or diploma of the university or any other institute
of higher learning, except where due acknowledgement has been made in the text.

Signature of Candidate

Acknowledgement

Successfully completing any task brings satisfaction as well as the inner
strength to face future problems, but no one achieves this alone. Every person
is supported by a few others who offer help and suggestions that make it
possible to complete the work. So I take pleasure in thanking all the people who
motivated me and supported me at every stage of my internship project work.
First, I would like to honor my institute, “Noida Institute of Engineering &
Technology, Greater Noida”, which provided me with a workplace and the
infrastructure to learn recent technologies, along with the conceptual
background to strengthen my programming and professional skills. I am very
grateful to Ms. Aarushi Thusu (Assistant Professor) and Mr. Faizan Ahmad
(Assistant Professor), Computer Science and Engineering (Artificial
Intelligence & Machine Learning), and Dr. Raju (Head, CSE-AIML), Noida
Institute of Engineering & Technology, Greater Noida, for their helpful attitude
and encouragement throughout my project. Furthermore, I am thankful to all the
faculty members for motivating me, and to the computer lab staff of the
department for providing excellent facilities, issuing me a well-configured
computer system, and maintaining it regularly. I would like to extend special
thanks to all my batchmates for their love, encouragement, and constant
support. Last but not least, I would like to thank my parents for supporting me
in every way in completing this project report.

Prachi Shreelochan

Internship Certificate

Abstract

During my machine learning internship, I gained extensive hands-on
experience in developing, training, and evaluating machine learning models to
address real-world problems. My work involved implementing a variety of
algorithms across supervised learning, unsupervised learning, and sentiment
analysis tasks, using datasets to derive actionable insights and improve model
performance.
Key highlights include developing a simple linear regression model for salary
prediction with a score of 98.8%, creating a decision tree classifier on the Iris
dataset with a testing accuracy of 96.6%, and performing sentiment analysis on
the 50k IMDb movie review dataset with an accuracy of 88.48%. I also explored
clustering techniques, implementing K-means clustering on a mall customer
dataset to identify customer segments.
Throughout the internship, I honed my skills in data preprocessing, feature
engineering, and hyperparameter tuning. Tools such as Python, Pandas,
Scikit-learn, and TensorFlow played a pivotal role in my projects. Additionally,
I strengthened my analytical abilities by visualizing results and validating
model predictions.
This internship gave me a deeper understanding of the machine learning
workflow, enhanced my problem-solving skills, and solidified my passion for
applying artificial intelligence to solve complex challenges efficiently.

Technology Background

• Python: Used as the primary programming language for its extensive libraries
and frameworks tailored for machine learning, such as Scikit-learn, Pandas,
and NumPy. Python's simplicity and flexibility made it ideal for data
preprocessing, model development, and performance evaluation.

• Scikit-learn: A robust library used for implementing machine learning models
such as linear regression, decision trees, and K-means clustering. It provides
tools for model training, evaluation, and hyperparameter tuning.

• Pandas: Employed for data manipulation and analysis. Its data structures, such
as DataFrames, facilitated efficient handling of datasets, including cleaning,
filtering, and transforming raw data into a structured format.

• NumPy: Essential for numerical computations, it was used for handling
multidimensional arrays and performing the mathematical operations critical to
machine learning workflows.

• Matplotlib/Seaborn: Used for data visualization to understand data
distributions, relationships, and patterns. These libraries helped create
insightful graphs and plots for analysis and reporting.

• TensorFlow: Used in tasks such as sentiment analysis for building and training
neural networks. TensorFlow's scalability and flexibility enabled efficient
model development.

• Natural Language Toolkit (NLTK): Applied for text preprocessing in
sentiment analysis, including tokenization, stemming, and stop-word removal,
to prepare textual data for model input.

• Google Colab: A cloud-based platform used for writing and executing code. It
provided an environment for real-time development, debugging, and
visualization, with the added advantage of GPU/TPU support for faster
computations.

Project Problem Background

1. Linear Regression on Salary Dataset

Background: Predicting employee salaries based on experience and other
features is critical for HR analytics. Accurate salary predictions assist in budget
planning, talent acquisition, and ensuring equitable compensation. The task
required building a regression model to establish a relationship between
features (e.g., years of experience) and salaries, enabling efficient prediction of
compensation trends.

2. Decision Tree Classifier on Iris Dataset

Background: Classification of flower species based on physical characteristics
(sepal and petal dimensions) is a foundational task in machine learning.
Automating this classification provides a basis for understanding supervised
learning. The task focused on building a decision tree classifier to predict
flower species, helping in species identification and dataset segmentation.

3. Sentiment Analysis on IMDb Movie Review Dataset

Background: With the rise of user-generated content, analyzing sentiments in
movie reviews helps businesses gauge audience reactions. The task aimed to
classify reviews as positive or negative, facilitating decision-making for movie
producers, marketers, and streaming platforms to understand public opinion
and improve content strategies.

4. K-Means Clustering on Mall Customer Dataset

Background: Segmenting customers based on purchasing behavior and
demographic attributes is essential for personalized marketing. The task
involved clustering mall customers to identify distinct groups, allowing
businesses to design targeted campaigns, enhance customer retention, and
optimize resource allocation for better engagement strategies.

Project Modules

1. Linear Regression on Salary Dataset

Modules:
1. Data Collection and Preprocessing: Collect and clean the salary dataset
(e.g., handle missing values and encode categorical variables).
2. Feature Selection: Identify relevant features like years of experience and
filter irrelevant or redundant data.
3. Model Development: Implement a simple linear regression model to
predict salary.
4. Evaluation: Use metrics like Mean Squared Error (MSE) and R-squared
to evaluate the model's performance.
5. Visualization: Plot the regression line and residuals to analyze how well
the model fits the data.
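The five modules above can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not the code used in the internship: the salary data is synthetic, the column names `YearsExperience` and `Salary` are hypothetical, missing values are handled by simply dropping incomplete rows, and MSE and R-squared are computed on a held-out 20% split.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the salary dataset: column names and values are assumptions
rng = np.random.default_rng(42)
years = rng.uniform(1, 10, size=30).round(1)
salary = 25_000 + 9_500 * years + rng.normal(0, 3_000, size=30)
df = pd.DataFrame({"YearsExperience": years, "Salary": salary})
df.loc[3, "Salary"] = np.nan  # simulate one missing value

# Module 1: preprocessing; a row with a missing target cannot be used, so drop it
df = df.dropna()

# Module 2: feature selection; simple linear regression uses a single feature
X = df[["YearsExperience"]]
y = df["Salary"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Module 3: fit Salary = b0 + b1 * YearsExperience by ordinary least squares
model = LinearRegression().fit(X_train, y_train)

# Module 4: evaluate on the held-out test rows
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"intercept={model.intercept_:.0f}, slope={model.coef_[0]:.0f}")
print(f"MSE={mse:.0f}, R^2={r2:.3f}")

# Module 5: residuals (actual minus predicted), the values a residual plot would show
residuals = y_test - y_pred
print(residuals.round(0).tolist())
```

Dropping rows is the simplest choice here because the missing value is in the target; a missing feature value would more typically be imputed (for example with the column median) instead.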

2. Decision Tree Classifier on Iris Dataset

Modules:
1. Data Loading and Exploration: Load and explore the Iris dataset,
analyzing feature distributions and class labels.
2. Data Splitting: Split the dataset into training and testing sets for model
validation.
3. Model Building: Construct a decision tree classifier to predict flower
species.
4. Pruning and Optimization: Fine-tune hyperparameters like tree depth
to prevent overfitting.
5. Performance Analysis: Evaluate using metrics such as accuracy,
confusion matrix, and classification report.
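A minimal sketch of these modules, using the copy of the Iris dataset bundled with scikit-learn, is shown below. The stratified 80/20 split, the `random_state` values, and the `max_depth`/`min_samples_leaf` grid are illustrative assumptions; limiting tree depth through a cross-validated grid search is one common way to implement the pruning step, not necessarily the approach used in the project.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Module 1: load the Iris data bundled with scikit-learn (150 samples, 4 features, 3 species)
iris = load_iris()
X, y = iris.data, iris.target

# Module 2: hold out 20% for testing, stratified so every species appears in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Modules 3-4: build the tree and tune depth-related hyperparameters with 5-fold CV
param_grid = {"max_depth": [2, 3, 4, 5, None], "min_samples_leaf": [1, 2, 5]}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)
clf = search.best_estimator_

# Module 5: evaluate the tuned tree on the untouched test set
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("best params:", search.best_params_)
print(f"test accuracy: {acc:.3f}")
print(cm)
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

Tuning only on the training split keeps the test set independent, so the reported test accuracy is not inflated by the hyperparameter search.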

3. Sentiment Analysis on IMDb Movie Review Dataset

Modules:
1. Text Data Preprocessing: Clean the reviews by removing special
characters and stop words, and apply stemming.
2. Vectorization: Convert text data into numerical form using techniques
like Bag-of-Words or TF-IDF.
3. Model Training: Train a machine learning model (e.g., logistic
regression or neural networks) to classify sentiments.
4. Validation: Assess model performance using metrics such as accuracy,
precision, recall, and F1-score.
5. Prediction and Deployment: Test the model on unseen reviews and
prepare for deployment.
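The pipeline below sketches modules 1 to 5 with TF-IDF features and logistic regression, one of the two model families named above. It is an illustration, not the internship's model: the eight reviews are invented placeholders for the 50k IMDb dataset, and a regex cleaner plus scikit-learn's built-in English stop-word list stand in for the NLTK preprocessing (stemming is omitted to keep the sketch dependency-free).

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy reviews standing in for the IMDb dataset (1 = positive, 0 = negative)
reviews = [
    "A wonderful movie with brilliant acting!",
    "I loved this film, truly great and moving.",
    "Brilliant story, great cast, loved every minute.",
    "Wonderful direction and a moving soundtrack.",
    "A terrible movie with awful acting.",
    "Boring plot, I hated this film.",
    "Awful script and a boring, dull ending.",
    "Terrible pacing; the whole thing was dull.",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def clean_text(text):
    """Module 1: lowercase, strip HTML tags (common in IMDb reviews) and non-letters."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    return re.sub(r"[^a-z\s]", " ", text)


# Modules 2-3: TF-IDF vectorization (English stop words removed) + logistic regression
model = make_pipeline(
    TfidfVectorizer(preprocessor=clean_text, stop_words="english"),
    LogisticRegression(),
)
model.fit(reviews, labels)

# Module 4: scored on the training reviews only because this toy set is tiny;
# with real data, compute accuracy, precision, recall, and F1 on a held-out split
train_pred = model.predict(reviews)
print(f"training accuracy: {accuracy_score(labels, train_pred):.2f}")

# Module 5: predict the sentiment of unseen reviews
new_reviews = ["What a brilliant, wonderful film!", "Dull and boring from start to finish."]
print(model.predict(new_reviews))
```

Wrapping the vectorizer and classifier in one pipeline means the same cleaning and vocabulary are applied at prediction time, which is what a deployed model needs.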

4. K-Means Clustering on Mall Customer Dataset

Modules:
1. Data Understanding and Cleaning: Analyze customer demographics
and spending behavior, and clean any inconsistencies.
2. Feature Scaling: Normalize data to ensure fair clustering, as K-means is
sensitive to feature magnitudes.
3. Elbow Method: Determine the optimal number of clusters by analyzing
within-cluster sum of squares (WCSS).
4. Cluster Formation: Apply K-means clustering to segment customers
into meaningful groups.
5. Visualization and Insights: Visualize clusters and interpret them to
derive actionable business insights.
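A sketch of the scaling, elbow, and clustering steps follows. The data is synthetic and is an assumption: `make_blobs` generates five groups in two features that play the role of annual income and spending score, so the numbers say nothing about the real mall dataset. In scikit-learn, the WCSS of a fitted model is exposed as `inertia_`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the mall data: two features standing in for
# annual income (k$) and spending score (1-100), drawn around 5 synthetic centers
centers = [[25, 20], [25, 80], [55, 50], [85, 20], [85, 80]]
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=6, random_state=0)

# Module 2: standardize so both features contribute equally to Euclidean distances
X_scaled = StandardScaler().fit_transform(X)

# Module 3: elbow method, recording WCSS (inertia_) for k = 1..10
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    wcss.append(km.inertia_)
for k, w in zip(range(1, 11), wcss):
    print(f"k={k:2d}  WCSS={w:8.2f}")

# Module 4: fit the final model with the k read off the elbow (5 for this synthetic data)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# Module 5: describe each segment in the original (unscaled) units to interpret it
for c in range(5):
    members = X[labels == c]
    income, score = members.mean(axis=0)
    print(f"cluster {c}: {len(members)} customers, "
          f"mean income={income:.1f}, mean score={score:.1f}")
```

Plotting `wcss` against k (for example with Matplotlib) shows the elbow; for this synthetic data it should fall near k = 5, while the real dataset has to be inspected on its own.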

Snapshots of Project

1. Linear Regression

2. Decision Tree Classifier

3. Sentiment Analysis

4. K-Means Clustering
Applications

1. Linear Regression on Salary Dataset

Applications:
HR Analytics: Helps organizations predict employee salaries based on
experience, skill set, and other factors to plan budgets and compensation
policies.
Career Guidance: Assists professionals in forecasting potential earnings based
on industry trends and qualifications.
Financial Planning: Employers can use salary predictions to align pay with
market standards and optimize payroll expenses.
2. Decision Tree Classifier on Iris Dataset
Applications:
Educational Purposes: Widely used in teaching machine learning concepts for
supervised classification tasks.
Botanical Research: Enables quick and accurate classification of plant species
based on physical characteristics.
Automation in Agriculture: Assists in identifying plant varieties for crop
management and breeding programs.
3. Sentiment Analysis on IMDb Movie Review Dataset
Applications:
Entertainment Industry: Helps producers and marketers analyze audience
feedback and identify content that resonates with viewers.
Reputation Management: Enables platforms to monitor public sentiment about
movies, actors, and production houses.
Recommendation Systems: Enhances user experiences by recommending
content based on aggregated sentiment trends.
4. K-Means Clustering on Mall Customer Dataset
Applications:
Marketing and Personalization: Helps businesses create targeted campaigns by
grouping customers with similar behaviors.
Customer Retention: Identifies high-value customers and their preferences for
improved retention strategies.
Resource Allocation: Aids in optimizing store layouts and inventory
management by understanding customer demographics and spending habits.

References

1. Linear Regression on Salary Dataset

Books:
An Introduction to Statistical Learning by Gareth James et al.
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by
Aurélien Géron.
Online Resources:
Scikit-learn Linear Regression Documentation
Kaggle - Linear Regression Tutorials

2. Decision Tree Classifier on Iris Dataset

Books:
Data Mining: Concepts and Techniques by Jiawei Han et al.
Python Machine Learning by Sebastian Raschka and Vahid Mirjalili.
Online Resources:
Scikit-learn Decision Trees Documentation
UCI Machine Learning Repository - Iris Dataset
Kaggle Kernel on Iris Classification

3. Sentiment Analysis on IMDb Movie Review Dataset

Books:
Speech and Language Processing by Jurafsky and Martin.
Text Mining and Analytics by ChengXiang Zhai and Sean Massung.
Online Resources:
NLTK Documentation
Kaggle - IMDb Dataset for Sentiment Analysis
TensorFlow Sentiment Analysis Tutorials

4. K-Means Clustering on Mall Customer Dataset

Books:
Machine Learning Yearning by Andrew Ng.
Pattern Recognition and Machine Learning by Christopher M. Bishop.
Online Resources:
Scikit-learn Clustering Documentation
Kaggle - Mall Customers Dataset Visualization of Clustering with Python

Table of Contents

1. Internship Certificate
2. Abstract
3. Technology Background
4. Project Problem Background
5. Project Modules
6. Snapshots of Project
7. Applications
8. References
