GOOGLE AIML Report

Uploaded by

Phani Srikanth

Summer Industry Internship – II Report

On

AI-ML Virtual Internship


During

IV Year I Semester Summer

Submitted to

The Department of Computer Science and Engineering-IOT


In partial fulfillment of the academic requirements of
Jawaharlal Nehru Technological University

For

The award of the degree of


Bachelor of Technology

In

Computer Science and Engineering-IOT

By

Anuragh Kamble
(22315A6905)

Name of Internship Co-ordinator : Mrs. C. Swetha


Designation : Assistant Professor

Sreenidhi Institute of Science and Technology


Yamnampet, Ghatkesar, R.R. District, Hyderabad - 501301
CERTIFICATE

This is to certify that this Summer Industry Internship – II Report on “AI-ML Virtual Internship”,
submitted by Anuragh Kamble (22315A6905) in the year 2024 in partial fulfillment of the academic
requirements of Jawaharlal Nehru Technological University for the award of the degree of Bachelor of
Technology in Computer Science and Engineering-IOT, is a bonafide record of industry internship work
carried out during the IV B.Tech CSE-IOT I Semester Summer under our guidance, and will be evaluated in
the IV B.Tech CSE-IOT I Semester. This report has not been submitted to any other institute or university
for the award of any degree.

Mrs. C. Swetha Dr. T. Venkat Narayana Rao


Assistant Professor Head of Department CSE-IOT
Department of CSE-IOT
Internship Coordinator

External Examiner

Date:-
DECLARATION

I, Anuragh Kamble (22315A6905), a student of SREENIDHI INSTITUTE OF SCIENCE AND TECHNOLOGY,
YAMNAMPET, GHATKESAR, studying in IV year I semester, CSE-IOT, solemnly declare that the Summer Industry
Internship – II Report, titled “AI-ML Virtual Internship”, is submitted to SREENIDHI INSTITUTE OF
SCIENCE AND TECHNOLOGY in partial fulfillment of the requirements for the award of the degree of
Bachelor of Technology in COMPUTER SCIENCE AND ENGINEERING-INTERNET OF THINGS.

I declare, to the best of my knowledge, that the work reported does not form part of any dissertation
submitted to any other University or Institute for the award of any degree.

Kamble Anuragh
22315A6905
ACKNOWLEDGEMENT

I would like to express my gratitude to all the people behind the scenes who helped me transform an idea into a real
application.

I would like to thank my Project Co-ordinator, Mrs. C. Swetha, for her technical guidance, constant encouragement and
support in carrying out my project at college.

I profoundly thank Dr. T. Venkat Narayana Rao, Head of the Department of Computer Science & Engineering – IOT,
who has been an excellent guide and also a great source of inspiration for my work.
I would like to express my heartfelt gratitude to my parents, without whom I would not have been privileged to achieve
and fulfill my dreams. I am grateful to our principal, Dr. T. Ch. Siva Reddy, who most ably runs the institution and has
had a major hand in enabling me to do my project.
The satisfaction and euphoria that accompany the successful completion of a task would be incomplete without
mention of the people who made it possible, and whose constant guidance and encouragement crowned all the
efforts with success. In this context, I would like to thank all the other staff members, both teaching and non-teaching,
who extended their timely help and eased my task.

ANURAGH KAMBLE
22315A6905
ABSTRACT

The AI-ML virtual internship offered a comprehensive and practical understanding of artificial intelligence
(AI) and machine learning (ML), focusing on the transformative power of these technologies in real-world
applications. Participants explored foundational AI concepts such as natural language processing (NLP),
generative models, and supervised learning, bridging theory with practice. A notable highlight was the
development of an email spam detector, utilizing tools like Scikit-learn and NLTK. This project involved
key processes such as data preprocessing, vectorization, and the application of Naive Bayes classifiers to
distinguish spam from legitimate emails. The internship emphasized hands-on learning, fostering problem-
solving skills through practical projects. Participants gained experience in designing machine learning
pipelines, understanding feature engineering, and evaluating model performance.

By integrating AI technologies into meaningful applications, the program showcased how automation
enhances efficiency, such as in spam email filtering, while improving overall user experience. Beyond
technical skills, the internship cultivated critical thinking and encouraged participants to consider ethical
implications and real-world challenges of AI systems. It highlighted AI's role in automating tasks,
improving decision-making, and its potential across various industries. In conclusion, the AI-ML virtual
internship provided a transformative experience, equipping participants with the skills, confidence, and
knowledge to create scalable, impactful AI-driven solutions while understanding the broader societal impact
of these technologies.
TABLE OF CONTENTS

S.NO. CONTENT

1. Executive Summary
1.1 Course Learning Objectives
1.2 Course Outcomes
2. Overview of the Organization
2.1 Introduction of the Organization
2.2 Vision
2.3 Academy On Boarding
2.4 Skill Training
2.5 Industry Certificate
2.6 Placement Linkage
2.7 Awards & Recognitions
2.8 Future Plans of the Organization
3. Internship Part
3.1 Intern’s day-to-day Responsibilities include
3.2 Software Requirements
3.3 Hardware Requirements
3.4 Working Conditions
4. Activity Log and Weekly Report
4.1 Activity log for first week
4.2 Activity log for second week
4.3 Activity log for third week
4.4 Activity log for fourth week
4.5 Activity log for fifth week
4.6 Activity log for sixth week
4.7 Activity log for seventh week
4.8 Activity log for eighth week
4.9 Activity log for ninth week
4.10 Activity log for tenth week
5. Project
5.1 Install Required Libraries
5.2 Python code for email spam detector
5.3 Explanation of Code
5.4 Improving the Model
5.5 Output Screens
6. Outcomes Description
6.1 Describe the work environment you have experienced.
6.2 Describe the real time technical skills you have acquired.
6.3 Describe the managerial skills you have acquired.
7. Conclusion
7.1 Bibliography
8. Appendix
1. EXECUTIVE SUMMARY

The internship involved gaining a good understanding of a Machine Learning model for
employee promotion. My task was to design and develop this model, which involved:
• Understanding the data set
• Cleaning the data set
• Getting to know how the metrics of the data are evaluated
• Creating a model suitable for this problem statement
One of the important achievements of this internship was the development of the model
object so that it adapts to the data given to it. The objective is for it to accept any
input, even if it is not sufficiently pre-processed, and output the predicted labels.
A model was finally developed using the above object. It was a prototype solution to a real-
life problem: the promotion of employees based on their performance metrics.
I acquired many new technical skills while working with my team. I gained new
knowledge in the area of Machine Learning and brushed up my Python skills while
building the Machine Learning model. I was also introduced to research and
how to approach it. Most importantly, the work experience was particularly good and
included good fellowship, cooperative teamwork and accepting responsibilities.
Although I spent a lot of time learning new things, I found that I was well trained in certain
areas, which helped me substantially in my projects. Many programming skills that I used in
my projects, such as programming style and design, were ones that I had acquired during
my studies in Computing Science. I also learnt work techniques during this internship, such
as completing work ahead of time even when it is not due that day. The internship taught me
how to solve a particular problem based only on data as input. Here, data means raw data,
as in numbers. These techniques can be used in my future job, as the work of an analyst
depends entirely on them.
This is the internship report based on the two-month-long internship program that I
successfully completed at AICTE from 18/07/2022 to 24/09/2022 as a requirement of my
B.Tech. program in the Department of Computer Science and Engineering. I was completely
new to a practical, corporate-world setting, and every hour spent in the internship gave me
experience that cannot be fully explained in words. Nevertheless, it was all useful for
my career.
The report covers background information on the internship I was involved in, as well
as details of how the projects or tasks were developed. It concludes with my overall work
experience as well as my opinion of the Industrial Internship Program in general.

1.1 COURSE LEARNING OBJECTIVES

• Internships are generally thought to be reserved for college students looking to gain
experience in a particular field. However, a wide array of people can benefit from training
internships to receive real-world experience and develop their skills.
• An objective for this position should emphasize the skills you already possess in the area and
your interest in learning more.
• Internships are used in a number of different career fields, including architecture,
engineering, healthcare, economics, advertising and many more.
• Some internships allow individuals to perform scientific research, while others
are specifically designed to let people gain first-hand working experience.
• Internships are a great way to build your resume and develop skills that can be
emphasized in it for future jobs. When you are applying for a training
internship, make sure to highlight any special skills or talents that make you stand apart
from the rest of the applicants so that you have an improved chance of landing the position.

1.2 COURSE OUTCOMES


Students will be able to:
a. Enhance their technical knowledge by using modern tools
b. Become a team leader by participating in team work
c. Enhance communication skills by participating in group discussions
d. Acquire project skills and estimate project cost
e. Improve lifelong learning skills by learning new technologies on their own

2. OVERVIEW OF THE ORGANIZATION

2.1 INTRODUCTION OF THE ORGANIZATION


EduSkills is a non-profit organization that enables an Industry 4.0-ready digital
workforce in India. EduSkills' vision is to fill the gap between academia and industry by ensuring
access to a world-class curriculum for faculty and students.
2.2 VISION
To be a world-class organization leading the technological and socioeconomic
development of the country by enhancing the global competitiveness of technical manpower and
ensuring high-quality technical education for all sections of society.
2.3 ACADEMY ON BOARDING
Academy on boarding is the part of the process that establishes a platform to connect
academia with corporates and overcome the skill gap at the earliest.
2.4 SKILL TRAINING
EduSkills Foundation is realizing the vision of "Skilled India" through various
cutting-edge interdisciplinary skills to reduce the scarcity of skills and, in turn, make learners
self-reliant.

2.5 INDUSTRY CERTIFICATE

Skills alone, without certification, do not benefit a career in the long run. To maintain a
sustainable career, industry certifications are very much required. We provide a platform for
the required training as well as the corresponding certifications.
2.6 PLACEMENT LINKAGE
We are not confined to providing a skilling platform; we also connect IT/ITES and
core industries to hire from our trained candidate pool. Entrepreneurship: we promote more job
providers than job seekers by conducting several programs.
2.7 AWARDS & RECOGNITIONS
The Academies & Instructors are the backbone that makes every program successful. We
take care of our instructors, who really contribute to the growth of these programs. We connect
them with proper platforms, where the world builders are recognized & awarded.

WEBSITE : aicte-india.org

FOUNDED : November 1945

SECTOR : Technology Education

HEADQUARTERS : New Delhi

AGENCY EXECUTIVE : Anil Sahasrabudhe (Chairperson)

PARENT AGENCY : Department of Higher Education

TYPE : Statutory Corporation

2.8 FUTURE PLANS OF THE ORGANIZATION

The company:

• plans to expand its production facilities
• intends to continue its focus on training
• intends to enhance its value-added services
• intends to penetrate new industries, expand its sales network and enhance brand
awareness
• intends to grow its business through joint ventures and acquisitions.

3. INTERNSHIP PART

3.1 INTERN'S DAY-TO-DAY RESPONSIBILITIES INCLUDE


• Research and implement appropriate ML algorithms and tools
• Develop machine learning applications according to requirements
• Select appropriate datasets and data representation methods
• Run machine learning tests and experiments

3.2 SOFTWARE REQUIREMENTS


• PYTHON IDLE
• NOTEPAD++
• VISUAL STUDIO CODE
• NETBEANS IDE
• MYSQL
3.3 HARDWARE REQUIREMENTS
• RAM: At least 128 MB
• DISK SPACE: 124 MB for Python IDLE, 2 MB for importing Python modules
• PROCESSOR: Minimum Pentium 2 266 MHz processor

3.4 WORKING CONDITIONS


Working conditions at the company have been quite remarkable. The company has a
strict policy on workplace hygiene, and the health and safety of employees is a primary
concern. It provides health insurance for employees under some conditions. As for
remuneration, the company pays quite handsomely. Its remuneration model is based on the
employee’s experience and efficiency. The company also distributes the workload so that
employees do not feel overburdened. Management oversees each
employee’s work so that there are no discrepancies.
The work schedule follows normal business hours, although there are some extra hours
of work on occasional Sundays. The company compensates for these by providing time off
between working days or hours.

4. ACTIVITY LOG AND WEEKLY REPORT

WEEK-1

Table 4.1: Activity log for the first week

Day | Brief description of the daily activity | Learning Outcome
Day–1 | Introduction to Cloud Concepts Overview | Introduction to cloud computing, Advantages of the cloud
Day–2 | Introduction to AWS | Moving to the AWS Cloud
Day–3 | Introduction to Cloud Economics and Billing | Fundamentals of pricing, Total cost of ownership, Case study
Day–4 | AWS Organizations | AWS Billing & Cost Management, Billing Dashboard
Day–5 | AWS Organizations | Technical Support Models, Introduction to AWS Global Infrastructure
Day–6 | Introduction to AWS Global Infrastructure Overview | AWS Global Infrastructure

WEEKLY REPORT:
WEEK–1 (From Date 18-04-2024 to Date 23-04-2024)
Objective of the Activity Done:
Cloud Concepts Overview & Cloud Economics

Detailed Report:
In this week, I have learned how to:
• Define different types of cloud computing models
• Describe six advantages of cloud computing
• Recognize the main AWS service categories and core services
• Review the AWS Cloud Adoption Framework
• Explore the fundamentals of AWS pricing
• Review TCO concepts
• Review an AWS Pricing Calculator estimate
• Review the Billing dashboard
• Review Technical Support options and costs

WEEK-2

Table 4.2: Activity log for the second week

Day | Brief description of the daily activity | Learning Outcome
Day–1 | Introduction to AWS Global Infrastructure Overview | AWS Services & Service Categories
Day–2 | AWS Cloud Security | AWS Shared Responsibility Model, AWS IAM, Console Demonstration - Identity and Access Management
Day–3 | AWS Cloud Security | Securing a New AWS Account, Securing Data
Day–4 | AWS Cloud Security | Working to Ensure Compliance, Lab - Introduction to AWS IAM
Day–5 | Networking and Content Delivery | Network Basics, Amazon VPC, Console Demonstration - VPC Wizard, VPC Networking
Day–6 | Networking and Content Delivery | VPC Security, Route 53, CloudFront, Lab 2 - Build your VPC and Launch a Web Server

WEEKLY REPORT
WEEK–2 (From Date 25-04-2024 to Date 30-04-2024)
Objective of the Activity Done:
AWS Global Infrastructure Overview.
AWS Cloud Security & Network
Networking and Content Delivery

Detailed Report:

In this week, I have learned how to:

• Identify the difference between AWS Regions, Availability Zones, and edge locations
• Identify AWS services and service categories
• Recognize the shared responsibility model
• Identify the responsibilities of the customer and of AWS
• Recognize IAM users, groups, and roles
• Describe different types of security credentials in IAM
• Identify the steps to securing a new AWS account
• Explore IAM users and groups
• Recognize how to secure AWS data
• Recognize AWS compliance programs

WEEK-3

Table 4.3: Activity log for the third week

Day | Brief description of the daily activity | Learning Outcome
Day–1 | Introduction to Compute | Compute Services Overview, Amazon EC2 Part 1, Amazon EC2 Part 2, Amazon EC2 Part
Day–2 | Introduction to Compute | Console Demonstration - EC2, Lab 3 - Introduction to Amazon EC2, Amazon EC2 Cost Optimization
Day–3 | Introduction to Compute | Introduction to Container Services, Introduction to AWS Lambda, Activity - AWS Lambda, AWS Elastic Beanstalk
Day–4 | Introduction to Storage | AWS EBS, Console Demonstration - EBS, Lab 4 - Working with EBS, AWS S3
Day–5 | Introduction to Storage | Console Demonstration - S3, AWS EFS, Console Demonstration - S3 and EFS
Day–6 | Introduction to Storage | AWS S3 Glacier, Console Demonstration - Glacier

WEEKLY REPORT
WEEK–3 (From Date 01-05-2024 to Date 06-05-2024)
Objective of the Activity Done:

Introduction to Compute and Introduction to Storage

Detailed Report:

In this week, I have learned how to:

• Provide an overview of different AWS compute services in the cloud.
• Demonstrate why to use Amazon Elastic Compute Cloud (Amazon EC2).
• Identify the functionality in the Amazon EC2 console.
• Perform basic functions in Amazon EC2 to build a virtual computing environment.
• Identify Amazon EC2 cost optimization elements.
• Demonstrate when to use AWS Elastic Beanstalk.
• Demonstrate when to use AWS Lambda.
• Identify how to run containerized applications in a cluster of managed servers.
• Identify the different types of storage.
• Explain Amazon S3.
• Identify the functionality in Amazon S3.
• Explain Amazon EBS.
• Identify the functionality in Amazon EBS.
• Perform functions in Amazon EBS to build an Amazon EC2 storage solution.
• Explain Amazon EFS.
• Identify the functionality in Amazon EFS.
• Explain Amazon S3 Glacier.
• Identify the functionality in Amazon S3 Glacier.
• Differentiate between Amazon EBS, Amazon S3, Amazon EFS, and Amazon S3 Glacier.

WEEK-4

Table 4.4: Activity log for the fourth week

Day | Brief description of the daily activity | Learning Outcome
Day–1 | Databases | Amazon RDS, Console Demonstration - RDS, Lab 5 - Build, DynamoDB, Console Demonstration - DynamoDB, Amazon Redshift, Amazon Aurora
Day–2 | Introduction to Machine Learning | Introduction Video, What is machine learning?
Day–3 | Introduction to Machine Learning | Business problems solved with machine learning, Machine learning process
Day–4 | Introduction to Machine Learning | Machine learning tools & overview
Day–5 | Introduction to Machine Learning | Machine learning challenges
Day–6 | Introduction to Machine Learning | Amazon SageMaker

WEEKLY REPORT

WEEK–4 (From Date 08-05-2024 to Date 13-05-2024)

Objective of the Activity Done: Databases, Cloud Architecture and Introduction to Auto Scaling
and Monitoring.

Detailed Report:

In this week, I have learned about:

• Different business problems solved with machine learning
• The basic process of how machine learning works in real-time projects
• The machine learning tools used in basic projects
• Challenges in machine learning
• Amazon SageMaker
• Solving problems faced with machine learning
• The process, tools and problems of machine learning

WEEK-5

Table 4.5: Activity log for the fifth week

Day | Brief description of the daily activity | Learning Outcome
Day–1 | Implementing a Machine Learning pipeline with Amazon SageMaker | Introduction Video, Formulating machine learning problems
Day–2 | Implementing a Machine Learning pipeline with Amazon SageMaker | Collecting and securing data, Extracting, transforming and loading data
Day–3 | Implementing a Machine Learning pipeline with Amazon SageMaker | Securing your data, Amazon SageMaker - Creating and importing data
Day–4 | Implementing a Machine Learning pipeline with Amazon SageMaker | LAB: Amazon SageMaker - Creating and importing data, Evaluating your data
Day–5 | Implementing a Machine Learning pipeline with Amazon SageMaker | Describing your data, Finding correlations
Day–6 | Implementing a Machine Learning pipeline with Amazon SageMaker | LAB: Amazon SageMaker - Exploring data, Feature engineering

WEEKLY REPORT
WEEK–5 (From Date 15-05-2024 to Date 20-05-2024)
Objective of the Activity Done: Introduction to Machine Learning Pipeline

Detailed Report:
In this week, I have learned how to:
• Explain how to formulate machine learning problems.
• Identify the functionality of securing data.
• Identify the functionality of collecting data.
• Explain extracting, transforming and loading data.
• Describe securing your data using the above methods.
• Explain securing your data.
• Perform tasks on Amazon SageMaker: creating and importing data.
• Evaluate your data.
• Describe your data.
• Identify correlations.
• Perform tasks on Amazon SageMaker: exploring data.
• Explain the importance of feature engineering.
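
To make "describing your data" and "identifying correlations" concrete, here is a minimal sketch using only the Python standard library. The two columns (`links` and `score`) are made-up illustration data, not values from the lab; with pandas, `DataFrame.describe()` and `DataFrame.corr()` would typically play the same role.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical feature columns, assumed purely for illustration.
links = [0, 1, 2, 3, 4]            # e.g. number of links in an email
score = [0.1, 0.3, 0.5, 0.7, 0.9]  # e.g. some spam-likelihood score

# "Describing" a column: central tendency and spread.
print("mean:", statistics.fmean(links), "stdev:", statistics.stdev(links))

# "Finding correlations": values near +1 or -1 indicate a strong linear relation.
print("correlation:", pearson(links, score))
```

A correlation close to zero suggests the two columns carry little linear information about each other, which is one input when deciding which features to keep.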

WEEK-6

Table 4.6: Activity log book for the sixth week

Day | Brief description of the daily activity | Learning Outcome
Day–1 | Implementing a Machine Learning pipeline with Amazon SageMaker | Cleaning your data, Dealing with outliers and selecting features
Day–2 | Implementing a Machine Learning pipeline with Amazon SageMaker | LAB: Amazon SageMaker - Encoding categorical data
Day–3 | Implementing a Machine Learning pipeline with Amazon SageMaker | Training a model using Amazon SageMaker, LAB: Training a model
Day–4 | Implementing a Machine Learning pipeline with Amazon SageMaker | Hosting and using the model, LAB: Amazon SageMaker - Deploying a model
Day–5 | Implementing a Machine Learning pipeline with Amazon SageMaker | Evaluating the accuracy of the model, Calculating classification metrics, Selecting classification thresholds, LAB: Generating model performance metrics
Day–6 | Implementing a Machine Learning pipeline with Amazon SageMaker | Hyperparameter and model tuning, LAB: Hyperparameter and model tuning

WEEKLY REPORT
WEEK –6 (From Date 22-05-2024 to Date 27-05-2024)

Objective of the Activity Done: Extension of Machine Learning Pipeline.


Detailed Report:

In this week, I have learned how to:

• Explain cleaning your data.
• Explain dealing with outliers and selecting features.
• Give an overview of outliers.
• Perform tasks on Amazon SageMaker: training a model.
• Explain hosting and using a model.
• Identify how to perform tasks on Amazon SageMaker: deploying a model.
• Describe evaluating the accuracy of the model.
• Calculate classification metrics.
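
As an illustration of "dealing with outliers", the sketch below flags values that fall outside 1.5 times the interquartile range (IQR). This is one common convention and is an assumption here; the lab may have used a different rule. The sample values are invented for the example.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical numeric feature with one suspicious reading.
values = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(values))
```

Whether a flagged value should be dropped, capped, or kept depends on whether it is a measurement error or a genuine rare case, so the function only reports candidates and leaves that decision to the analyst.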

WEEK-7

Table 4.7: Activity log book for the seventh week

Day | Brief description of the daily activity | Learning Outcome
Day–1 | Introduction to Forecasting | Introduction Videos, Forecasting Overview
Day–2 | Introduction to Forecasting | Processing time series data and special considerations for time series data
Day–3 | Introduction to Forecasting | Using Amazon Forecast
Day–4 | Introduction to Forecasting | Demo of Amazon Forecast
Day–5 | Introduction to Forecasting | LAB: Creating a forecast with Amazon Forecast, part 1
Day–6 | Introduction to Forecasting | LAB: Creating a forecast with Amazon Forecast, part 2

WEEKLY REPORT
WEEK–7 (From Date 29-05-2024 to Date 03-06-2024)

Objective of the Activity Done: Introduction to AMAZON FORECAST

Detailed Report:
In this week, I have learned about:
• Introduction to forecasting
• Processing of time series data
• Special considerations for time series data
• The process of using Amazon Forecast
• Performing tasks on Amazon Forecast
• Introduction to Amazon Forecast
• A briefing on Amazon Forecast
• The process of creating a forecast with Amazon Forecast

WEEK-8

Table 4.8: Activity log book for the eighth week

Day | Brief description of the daily activity | Learning Outcome
Day–1 | Introduction to Computer Vision | Introduction to Computer Vision, Image and Video Analysis
Day–2 | Introduction to Computer Vision | Facial recognition and video analysis with Amazon Rekognition
Day–3 | Introduction to Computer Vision | Preparing custom datasets for computer vision
Day–4 | Introduction to Computer Vision | Creating the training dataset
Day–5 | Introduction to Computer Vision | Evaluating and improving your model, Labeling images with Amazon Ground Truth
Day–6 | Introduction to Computer Vision | LAB: Facial Recognition

WEEKLY REPORT
WEEK –8 (From Date 05-06-2024 to Date 10-06-2024)

Objective of the Activity Done: Introduction to Computer Vision.

Detailed Report:
In this week, I have learned about:
• The process of making datasets
• Using datasets
• Extracting datasets
• Processing datasets
• Importing datasets into a project
• Introduction to Computer Vision (CV)
• Basics of Computer Vision
• Modules used for implementing Computer Vision in real-time projects
• Tasks that can be performed using Computer Vision
• Image and video analysis
• Implementing custom datasets
• Labelling images with Amazon Ground Truth
• Performing tasks on facial recognition

WEEK-9

Table 4.9: Activity log book for the ninth week

Day | Brief description of the daily activity | Learning Outcome
Day–1 | Introduction to Natural Language Processing | Introduction to language processing
Day–2 | Introduction to Natural Language Processing | Introduction to Natural Language Processing
Day–3 | Introduction to Natural Language Processing | Overview of Natural Language Processing
Day–4 | Introduction to Natural Language Processing | Natural Language Processing Managed Services, part 1
Day–5 | Introduction to Natural Language Processing | Natural Language Processing Managed Services, part 2
Day–6 | Introduction to Natural Language Processing | Amazon Comprehend

WEEKLY REPORT
WEEK –9 (From Date 12-06-2024 to Date 17-06-2024 )

Objective of the Activity Done: Introduction to Natural Language Processing.

Detailed Report:
In this week, I have learned about:
• Identifying language
• Processing natural language
• Introduction to Natural Language Processing
• Modules used in NLP
• Overview of NLP (Natural Language Processing)

WEEK-10

Table 4.10: Activity log book for the tenth week

Day | Brief description of the daily activity | Learning Outcome
Day–1 | Introduction to Natural Language Processing | Overview of Amazon Comprehend
Day–2 | Introduction to Natural Language Processing | Introduction to Amazon Polly
Day–3 | Introduction to Natural Language Processing | Description of Amazon Polly
Day–4 | Introduction to Natural Language Processing | Amazon Translate
Day–5 | Introduction to Natural Language Processing | LAB part 1: Amazon Lex - Create a chatbot
Day–6 | Introduction to Natural Language Processing | Working on Amazon Lex - Create a chatbot

WEEKLY REPORT
WEEK –10 (From Date 19-06-2024 to Date 24-06 -2024)

Objective of the Activity Done: Extension of Natural Language Processing.


Detailed Report:
In this week, I have learned about:
• Introduction to Amazon Comprehend
• A briefing on Amazon Comprehend
• Overview of Amazon Comprehend
• Introduction to Amazon Polly
• Explanation of Amazon Polly
• Explanation of Amazon Translate
• Performing tasks on Amazon Lex
• Creating a chatbot

5. PROJECT

Through this course, I gained a deeper understanding of the tools and frameworks used in
AI-ML projects. Using these learnings, I implemented a project that detects whether an email is spam or
ham.

An email spam detector can be created using a machine learning model trained on a dataset containing labeled
spam and non-spam emails. Below is a Python implementation using the Natural Language Toolkit (NLTK)
and Scikit-learn, two popular libraries for natural language processing and machine learning.

STEP-BY-STEP IMPLEMENTATION:

5.1 INSTALL REQUIRED LIBRARIES:

Ensure the following libraries are installed. Run the following commands if needed:

pip install numpy pandas scikit-learn nltk

5.2 PYTHON CODE FOR EMAIL SPAM DETECTOR

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# Step 1: Load Dataset
# Sample dataset: can be replaced with a CSV file containing labeled email data
# (spam = spam email, ham = non-spam email)
data = {
    "Email": [
        "Win a $1000 cash prize now!",
        "Meeting at 3 PM tomorrow, don't forget the notes.",
        "You are pre-approved for a credit card.",
        "Hello team, please review the attached file.",
        "Get cheap meds online now!",
    ],
    "Label": ["spam", "ham", "spam", "ham", "spam"],
}

df = pd.DataFrame(data)

# Step 2: Preprocess Data
df['Label'] = df['Label'].map({'spam': 1, 'ham': 0})  # Convert labels to numeric
X = df['Email']  # Features
y = df['Label']  # Labels

# Step 3: Vectorize Text Data
vectorizer = CountVectorizer()
X_vectorized = vectorizer.fit_transform(X)

# Step 4: Split Data into Training and Test Sets
X_train, X_test, y_train, y_test = train_test_split(
    X_vectorized, y, test_size=0.25, random_state=42
)

# Step 5: Train the Model
model = MultinomialNB()
model.fit(X_train, y_train)

# Step 6: Evaluate the Model
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

# Step 7: Test with New Email
new_email = ["Congratulations! You have won a free trip to Hawaii."]
new_email_vectorized = vectorizer.transform(new_email)  # reuse the fitted vocabulary
prediction = model.predict(new_email_vectorized)
print("Prediction (1=spam, 0=ham):", prediction)

5.3 EXPLANATION OF CODE

1. Dataset:

• Replace the sample data with a larger dataset like the SpamAssassin Public Corpus or similar datasets
available online.
• The Label column uses 1 for spam and 0 for ham (non-spam).
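
As a rough sketch of swapping in an external dataset, the snippet below reads labeled emails from CSV text and maps the labels to 1 and 0. The column names `text` and `label`, the helper `load_labeled_emails`, and the inline sample rows are all assumptions for illustration; a real corpus will have its own layout. With pandas, `pd.read_csv` would typically take the place of the `csv` module.

```python
import csv
import io

# Hypothetical CSV content; in practice this would be read from a file on disk.
SAMPLE_CSV = """text,label
"Win a $1000 cash prize now!",spam
"Meeting at 3 PM tomorrow, don't forget the notes.",ham
"Get cheap meds online now!",spam
"""

LABEL_MAP = {"spam": 1, "ham": 0}


def load_labeled_emails(handle):
    """Return (texts, labels) from a CSV with 'text' and 'label' columns."""
    texts, labels = [], []
    for row in csv.DictReader(handle):
        label = row["label"].strip().lower()
        if label not in LABEL_MAP:
            continue  # skip rows with unknown labels instead of guessing
        texts.append(row["text"])
        labels.append(LABEL_MAP[label])
    return texts, labels


texts, labels = load_labeled_emails(io.StringIO(SAMPLE_CSV))
print(len(texts), labels)
```

To read from disk, the same function can be called with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the dataset was saved.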

2. Text Vectorization:

• Converts text into numerical data using CountVectorizer, which creates a bag-of-words representation.
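
To show what a bag-of-words representation looks like, here is a simplified stand-in written with the standard library. It is not scikit-learn's implementation; the tokenizer is assumed to approximate CountVectorizer's default behaviour (lowercasing, tokens of two or more word characters), and the two documents are made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of two or more word characters."""
    return re.findall(r"\b\w\w+\b", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, count_matrix) for a list of strings."""
    vocabulary = sorted({tok for doc in documents for tok in tokenize(doc)})
    index = {tok: i for i, tok in enumerate(vocabulary)}
    matrix = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for tok, count in Counter(tokenize(doc)).items():
            row[index[tok]] = count
        matrix.append(row)
    return vocabulary, matrix


docs = ["Get cheap meds now", "Get the notes now now"]
vocab, X = bag_of_words(docs)
print(vocab)  # one column per distinct word
print(X)      # one row per document, holding word counts
```

Word order is discarded: each email becomes a row of counts, which is the form a classifier such as Naive Bayes expects.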

3. Model Training:

• The MultinomialNB model is used, which is well-suited for text classification tasks like spam detection.
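
For intuition about why multinomial Naive Bayes suits word-count features, the sketch below implements the idea from scratch: a class prior plus per-word likelihoods with add-one (Laplace) smoothing. It is an illustrative approximation, not scikit-learn's code; the function names and the tiny training set are assumptions.

```python
import math
from collections import Counter, defaultdict


def train_nb(token_lists, labels, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods per class."""
    vocab = {tok for toks in token_lists for tok in toks}
    class_docs = Counter(labels)
    token_counts = defaultdict(Counter)
    for toks, y in zip(token_lists, labels):
        token_counts[y].update(toks)

    log_prior = {c: math.log(n / len(labels)) for c, n in class_docs.items()}
    log_like = {}
    for c in class_docs:
        # Smoothing keeps words unseen in a class from getting zero probability.
        total = sum(token_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {
            tok: math.log((token_counts[c][tok] + alpha) / total) for tok in vocab
        }
    return log_prior, log_like


def predict_nb(tokens, log_prior, log_like):
    """Pick the class with the highest log posterior; unknown words are skipped."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_like[c][tok] for tok in tokens if tok in log_like[c]
        )
    return max(scores, key=scores.get)


# Tiny made-up training set: 1 = spam, 0 = ham.
train = [
    ("win cash prize now".split(), 1),
    ("cheap meds online now".split(), 1),
    ("meeting notes tomorrow".split(), 0),
    ("please review attached file".split(), 0),
]
prior, like = train_nb([t for t, _ in train], [y for _, y in train])
print(predict_nb("win a free prize".split(), prior, like))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which matters once emails contain hundreds of words.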

4. Evaluation:

• Measures accuracy and provides a classification report (precision, recall, F1-score).
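
The metrics named above can be computed by hand from true and predicted labels, which helps when reading a classification report. The following is a minimal sketch with invented labels; `binary_metrics` is a hypothetical helper, not a scikit-learn function.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 1 = spam, 0 = ham.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```

For a spam filter, precision reflects how often flagged mail really is spam, while recall reflects how much spam gets caught; accuracy alone can hide a poor trade-off between the two.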

5. Prediction:

• Tests the model on new email text to determine if it's spam (1) or not (0).

5.4 IMPROVING THE MODEL:

 Use TF-IDF (Term Frequency-Inverse Document Frequency) for better text representation:

from sklearn.feature_extraction.text import TfidfVectorizer


vectorizer = TfidfVectorizer()
X_vectorized = vectorizer.fit_transform(X)

• Experiment with other machine learning models (e.g., Logistic Regression, Support Vector Machines).

• Use a larger, real-world dataset for better training and evaluation.
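
The first two suggestions above can be tried together in one short experiment. The sketch below is an illustration rather than part of the original project: it assumes scikit-learn's make_pipeline, LogisticRegression and LinearSVC, and the six emails with their labels are hypothetical and far too few to draw conclusions from.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data (1 = spam, 0 = ham), for illustration only
emails = [
    "Win a free prize now",
    "Claim your free money today",
    "Cheap meds online, buy now",
    "Please review the attached file",
    "Meeting rescheduled to Monday morning",
    "Your invoice is attached",
]
labels = [1, 1, 1, 0, 0, 0]

candidates = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "linear_svm": LinearSVC(),
}

predictions = {}
for name, classifier in candidates.items():
    # The pipeline fits TF-IDF and the classifier on the same training text
    pipeline = make_pipeline(TfidfVectorizer(), classifier)
    pipeline.fit(emails, labels)
    predictions[name] = int(pipeline.predict(["Claim your free prize now"])[0])
    print(name, predictions[name])
```

Wrapping the vectorizer and the classifier in one pipeline means new emails are transformed with the vocabulary learned from the training data, so each model can be swapped in without changing the rest of the code.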

5.5 OUTPUT SCREENS:

TEST – 1:

Accuracy: 1.0
Classification Report:
precision recall f1-score support
0 1.00 1.00 1.00 2
1 1.00 1.00 1.00 1
Prediction (1=spam, 0=ham): [1]

TEST – 2 :

SAMPLE INPUT:

"Congratulations! You’ve been selected as a lucky winner for our special offer. Act now to
claim your prize!"
OUTPUT:

Accuracy: 1.0
Classification Report:
precision recall f1-score support
0 1.00 1.00 1.00 2
1 1.00 1.00 1.00 1
Prediction (1=spam, 0=ham): [1]
TEST – 3:

SAMPLE INPUT:

"Your invoice for last month’s services is attached. Let us know if you have any questions."
OUTPUT:

Accuracy: 1.0
Classification Report:
precision recall f1-score support
0 1.00 1.00 1.00 2
1 1.00 1.00 1.00 1
Prediction (1=spam, 0=ham): [0]
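
The precision, recall and F1 columns in the reports above can be recomputed by hand from the true and predicted labels. The following sketch uses plain Python. The helper name precision_recall_f1 and the label lists are hypothetical and are not taken from the project's test set.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = spam, 0 = ham
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# A perfect prediction on three emails gives 1.0 for every metric
perfect = precision_recall_f1([0, 0, 1], [0, 0, 1])
print(perfect)
```

With only three test emails, a single misclassification would lower accuracy from 1.0 to about 0.67, so scores on a set this small say little about how the model would behave on real mail.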

6. OUTCOMES DESCRIPTION

6.1 DESCRIBE THE WORK ENVIRONMENT YOU HAVE EXPERIENCED

My work environment is one where I'm able to work as part of a team and that allows everyone's talents
to grow. As I researched your company, I noticed its devotion to cultivating each employee's skills and
abilities. I've found that this type of environment is most conducive to my productivity, especially in a
position that requires me to constantly improve my design skills. It allows me to remain passionate
about my job and helps me express my creativity to the best of my ability.
6.2 DESCRIBE THE REAL TIME TECHNICAL SKILLS YOU HAVE ACQUIRED

Technical skills I have acquired:

• Data extraction

• Data cleaning

• Classifications

• Regression

• Python implementation of various machine learning algorithms

• Numpy

• Pandas

• Sklearn

• Keras

6.3 DESCRIBE THE MANAGERIAL SKILLS YOU HAVE ACQUIRED


Managerial skills I have acquired:

a. Technical Skill.

b. Conceptual Skill.
c. Interpersonal and Communication Skills.

d. Decision-Making Skill.

e. Diagnostic and Analytical Skills.

7. CONCLUSION

The AI-ML virtual internship was an enriching experience that provided a robust
understanding of artificial intelligence (AI) and machine learning (ML) concepts, particularly
their practical applications in solving real-world problems. During this program, I explored
core principles such as natural language processing (NLP), generative models, and the
integration of these techniques into projects. The focus on NLP expanded my comprehension
of how AI systems interpret, process, and generate human language effectively, which is vital
for tasks like text classification, sentiment analysis, and conversational AI.

One of the key projects undertaken was the implementation of an email spam detector. This
project involved preprocessing data, applying machine learning models, and evaluating their
performance. Leveraging tools like Scikit-learn and Natural Language Toolkit (NLTK), I
gained hands-on experience in creating a classifier capable of distinguishing spam emails
from legitimate ones. The use of techniques like vectorization and supervised learning models
(e.g., Naive Bayes) was crucial in building a robust and efficient solution.

The spam detector project underscored the significance of AI in automating tedious and
repetitive tasks, such as filtering spam emails, which enhances productivity and user
experience. This hands-on exercise reinforced critical problem-solving skills and emphasized
the importance of data preprocessing, feature engineering, and model evaluation for achieving
accurate results.

Overall, the internship bridged theoretical knowledge with practical application, deepening
my understanding of AI's transformative capabilities. It also fostered critical thinking and
collaboration, equipping me with skills to innovate in AI-driven domains. With projects like
the spam detector, I now feel more confident in applying AI-ML techniques to build scalable
and impactful solutions in future endeavors.

7.1 BIBLIOGRAPHY

1. Bishop, Christopher M.
Pattern Recognition and Machine Learning.
This book provides an in-depth understanding of supervised learning models like Naive
Bayes, which are foundational for text classification tasks such as spam detection.
Publisher: Springer, 2006.
URL: [Springer Link](https://www.springer.com/gp/book/9780387310732)

2. Scikit-learn Documentation
Scikit-learn: Machine Learning in Python.
This official documentation explains the implementation and usage of machine learning
models, including Naive Bayes, vectorization techniques, and model evaluation metrics, used
in the spam detector.
URL: [Scikit-learn Documentation](https://scikit-learn.org/stable/)

3. NLTK Documentation
Natural Language Toolkit Documentation.
This source offers insights into processing text data, tokenization, and other natural
language processing techniques relevant to building the spam detection pipeline.
URL: [NLTK Documentation](https://www.nltk.org/)

4. SpamAssassin Public Corpus
The SpamAssassin Public Mail Corpus.
A publicly available dataset for spam email detection that is widely used for training and
testing spam classification models.
URL: [Apache SpamAssassin](https://spamassassin.apache.org/publiccorpus/)

5. Raschka, Sebastian
Python Machine Learning.
This book provides practical examples of implementing machine learning projects,
including text classification and data preprocessing, using Python libraries like Scikit-learn.
Publisher: Packt Publishing, 2015.
URL: [Python Machine Learning](https://www.packtpub.com/product/python-machine-learning-third-edition/9781789955750)

APPENDIX C: ABSTRACT

Sreenidhi Institute of Science and Technology


Summer Industry Internship – II
Batch No: 19-D1

Roll No       Name         Title
22315A6905    K.Anuragh    CREATING AN EMAIL SPAM DETECTOR USING NLTK AND SCIKIT-LEARN

ABSTRACT

The AI-ML virtual internship provided a practical understanding of artificial intelligence,
focusing on concepts like natural language processing (NLP) and machine learning (ML). A key
highlight was the development of an email spam detector using tools like Scikit-learn and NLTK,
showcasing the automation of spam filtering through supervised learning models. This project
emphasized the importance of data preprocessing, feature engineering, and model evaluation.
Overall, the internship bridged theoretical and practical AI-ML knowledge, fostering critical
thinking and equipping participants to create scalable, impactful solutions in AI-driven domains.

Student: K.Anuragh
Internal Guide: Assistant Professor, Dept of IOT
HOD-IT: Dr. T. Venkat Narayana Rao, Professor

APPENDIX D: CORRELATION BETWEEN THE SUMMER INDUSTRY INTERNSHIP-II AND THE PROGRAM OUTCOMES (POS), PROGRAM SPECIFIC OUTCOMES (PSOS)

Batch No: 19-D1

Roll No       Name         Title
22315A6905    K.Anuragh    CREATING AN EMAIL SPAM DETECTOR USING NLTK AND SCIKIT-LEARN

Table 1: Project/Internship correlation with appropriate POs/PSOs (Please specify level of Correlation, H/M/L against POs/PSOs)

H: High    M: Moderate    L: Low

SREENIDHI INSTITUTE OF SCIENCE AND TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Projects Correlation with POs/PSOs

PO1   PO2   PO3   PO4   PO5   PO6   PO7   PO8   PO9   PO10   PO11   PO12   PSO1   PSO2   PSO3
M     L     L     H     H     L     M     H     M     H      H      H      H      H      M

Student: K.Anuragh
Internal Guide: Assistant Professor, Dept of IOT
HOD-IT: Dr. T. Venkat Narayana Rao, Professor

APPENDIX E: DOMAIN OF INTERNSHIP AND NATURE OF INTERNSHIP

Batch No: 19-D1

Roll No       Name         Title
22315A6905    K.Anuragh    CREATING AN EMAIL SPAM DETECTOR USING NLTK AND SCIKIT-LEARN

Table 2: Nature of the Project/Internship work (Please tick √ Appropriate for your project)

Batch No.   Title                                                         Nature of project
20          CREATING AN EMAIL SPAM DETECTOR USING NLTK AND SCIKIT-LEARN   ✓

Nature of project columns: Product, Application, Research, Others (please specify)

Student: K.Anuragh
Internal Guide: Assistant Professor, Dept of IOT
HOD-IT: Dr. T. Venkat Narayana Rao, Professor

Table 3: Domain of the Project/Internship work (Please tick √ Appropriate for your project)

Batch No.   Title                                                         Domain of the project
20          CREATING AN EMAIL SPAM DETECTOR USING NLTK AND SCIKIT-LEARN   ✓

Domain of the project columns:
• Artificial Intelligence, Machine Learning and deep learning
• Computer Networks, Information security, Cyber security
• Data warehousing, Data mining, Big data analytics
• Cloud computing, Internet of things
• Software engineering, Image processing

Student: K.Anuragh
Internal Guide: Assistant Professor, Dept of IOT
HOD-IT: Dr. T. Venkat Narayana Rao, Professor
