Data Science: Industrial Training Report
Data Science
Submitted in partial fulfilment for the award
of the Degree Of
Bachelor of Technology In
ARTIFICIAL INTELLIGENCE AND
DATA SCIENCE
Certificate
Head Of Department
Training Certificate
Candidate’s Declaration
I hereby declare that the work, which is being presented in the
Industrial Training report, entitled “Data Science” in partial
fulfilment for the award of the Degree of “Bachelor of Technology”,
submitted to the Department of Artificial Intelligence and Data
Science, Arya College of Engineering, is a record of my own
investigations carried out under the guidance of Mr. Ankur Dutt
Sharma, Head of the Department of Artificial Intelligence and Data
Science.
(Signature of Candidate)
Candidate Name
Manoj Kumari
Abstract
Python, known for its simplicity, flexibility, and large ecosystem of libraries and modules, is
an excellent choice for creating AI and machine learning applications. In this report, we explore
the basics of AI as it relates to Python, discussing its core concepts, libraries for AI and ML,
and code examples showcasing basic principles.
Artificial Intelligence, Machine Learning, and Deep Learning are buzzwords that have
held the interest of many researchers for several years. Enabling computers to think,
decide, and act like humans has been one of the most significant and noteworthy developments
in the field of computer science. Various algorithms have been designed over time to make
machines imitate the human brain, and many programming languages have been used to
implement those algorithms. Python is one such programming language that provides a rich
library of modules and packages for use in scientific computing and machine learning. This
report aims at exploring the basic concepts related to machine learning and attempts to
implement a few of its applications using Python. It primarily uses the Scikit-Learn
library of Python for implementing the applications developed for the purpose of this research.
Data science is a multidisciplinary field that uses scientific methods, algorithms, and systems
to extract knowledge and insights from structured and unstructured data. It combines
statistics, computer science, and domain expertise to analyze data, build predictive models,
and solve complex problems. Essentially, it's all about turning raw data into actionable
intelligence.
Keywords: Machine Learning, Python, Scikit-Learn, AI, ML, Deep Learning, NumPy,
Matplotlib, workflow of machine learning, NLTK, statistics, multidisciplinary, data science,
predictive models.
ACKNOWLEDGEMENT
Manoj Kumari
21EAIAD020
Learning/Internship Objectives
TABLE OF CONTENTS
TITLE PAGE NO.
CERTIFICATE i
CANDIDATE’S DECLARATION ii
ABSTRACT iii
ACKNOWLEDGEMENT iv
LEARNING OBJECTIVES v
CHAPTER 1: INTRODUCTION TO DATA SCIENCE 1-2
CHAPTER 2: OVERVIEW OF AI&ML 3-4
CHAPTER 3: PYTHON OVERVIEW 5-6
CHAPTER 4: IMAGE PROCESSING 7-8
CHAPTER 5: STATISTICS 9-10
CHAPTER 6: APPLICATIONS 11-12
CHAPTER 7: LIBRARIES IN PYTHON 13-20
CHAPTER 8: MACHINE LEARNING ALGORITHMS 21-28
CHAPTER 9: NATURAL LANGUAGE PROCESSING 29-30
CHAPTER 10: DEEP LEARNING ALGORITHMS 31-35
CHAPTER 11: CONCLUSION 36
Chapter 1
INTRODUCTION TO
DATA SCIENCE
By following this structured approach, data scientists can ensure that their work is aligned with the
business objectives, leverages the most relevant data sources, and delivers actionable insights that
can be effectively communicated to stakeholders.
Data science is a game-changer across many fields. Here are some of its most impactful applications:
1) Healthcare:
Improving diagnostics, personalized treatments, and predicting disease outbreaks. Think AI-driven
medical imaging and patient data analysis.
2) Finance:
Fraud detection, risk management, and algorithmic trading. Banks use data science to spot fraudulent
transactions and optimize investment portfolios.
3) Marketing:
Personalized marketing campaigns, customer segmentation, and sentiment analysis. It's why you
get those eerily spot-on recommendations!
4) Retail:
Inventory management, demand forecasting, and customer behavior analysis. Helps companies
keep their shelves stocked with what you want.
5) Transportation:
Optimizing routes, predicting maintenance needs, and managing fleets. Makes logistics and
ridesharing highly efficient.
Chapter 2
OVERVIEW OF AI&ML
AI & ML are techniques, code or algorithms that enable machines to develop, demonstrate and mimic
human cognitive behavior or intelligence and hence the name “Artificial Intelligence”. Some of the
most successful applications of AI around us can be seen in Robotics, Computer Vision, Virtual
Reality, Speech Recognition, Automation, Gaming and so on…
Artificial Intelligence is constantly pushing the boundaries of what machines are capable of. The main
purpose of AI & ML is to train real-time smart machines to use their speed and capability. Most
importantly, machines can think and perform tasks like humans.
Through AI & ML we learn about building artificially intelligent systems, including computer
vision and natural language processing techniques. Machine Learning & Deep Learning are the key
parts of this course, and are implemented using Python scripting. Various libraries like NumPy, Pandas,
Matplotlib, Scikit-Learn, TensorFlow, etc. were used.
Introduction of AI & Machine Learning
Artificial Intelligence is a technique for building systems that mimic human behavior or
decision-making.
Machine Learning is a subset of AI that uses data to solve tasks. These solvers are trained models of
data that learn based on the information provided to them. This information is derived from probability
theory and linear algebra. ML algorithms use our data to learn and automatically solve predictive tasks.
Deep Learning is a subset of Machine Learning which relies on multi-layer neural networks to solve
tasks.
FIG 1
FIG 2: Relation between Artificial Intelligence, Machine Learning and Deep Learning
Chapter 3
Python Overview
FIG 3 FIG 4
FIG 5
o Comprehensions in Python:
1 List Comprehensions:
Syntax:
output_list = [output_exp for var in input_list if (var satisfies this condition)]
2 Dictionary Comprehensions:
Syntax:
output_dict = {key:value for (key, value) in iterable if (key, value satisfy this condition)}
3 Set Comprehensions:
Syntax:
newSet= { expression for element in iterable }
4 Generator comprehension:
Syntax:
generator= (expression for element in iterable if condition)
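A short illustrative sketch of all four comprehension forms, using assumed sample data:

nums = [1, 2, 3, 4, 5]

squares = [n * n for n in nums if n % 2 == 1]      # list: [1, 9, 25]
square_map = {n: n * n for n in nums if n > 2}     # dict: {3: 9, 4: 16, 5: 25}
remainders = {n % 3 for n in nums}                 # set: {0, 1, 2}
evens = (n for n in nums if n % 2 == 0)            # generator: evaluated lazily
print(squares, square_map, remainders, list(evens))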
Working with Python modules covers: creating a module, importing a module, renaming a module on
import, and importing a part of a module.
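A minimal sketch of these module operations, assuming a hypothetical file mymodule.py that defines a function greet(name):

import mymodule                      # import the whole (hypothetical) module
import mymodule as mm                # import the module under a renamed alias
from mymodule import greet           # import only a part of the module

print(mymodule.greet("Ada"))         # all three names refer to the same function
print(mm.greet("Ada"))
print(greet("Ada"))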
Chapter 4
Image Processing
• What Is Image Processing?
Image processing is the process of transforming an image into a digital form and performing certain
operations to get some useful information from it. The image processing system usually treats all
images as 2D signals when applying certain predetermined signal processing methods.
FIG 6
There are a few main types of image processing:
• Visualization: Objects not visible in the image are detected
• Recognition: Detect objects present in the image
• Sharpening and Restoration: Original images are enhanced
• Pattern Recognition: The patterns in the image are measured
• Retrieval: Find images that are similar to the original by searching a large database.
Some libraries that are used in image processing and data processing:
OpenCV:
OpenCV is often deployed for computer vision tasks like face detection, object detection, face
recognition, image segmentation, and much more.
Some of the main highlights of OpenCV:
1. Used by major companies like IBM, Google, and Toyota
2. Algorithmic efficiency and vast access to algorithms
3. Multiple interfaces
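A small illustrative sketch of basic OpenCV operations (assumes the opencv-python package is installed; input.jpg is a hypothetical image file):

import cv2

img = cv2.imread("input.jpg")                 # load the image as a BGR array
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale
edges = cv2.Canny(gray, 100, 200)             # Canny edge detection
cv2.imwrite("edges.jpg", edges)               # save the processed result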
Scikit-Image:
Scikit-Image, which uses NumPy arrays as image objects, offers many different algorithms for
segmentation, color space manipulation, geometric transformation, and analysis.
SciPy :
This image processing library is another great option if you’re looking for a wide range of applications
like image segmentation, convolution, reading images, face detection, feature extraction, and more.
Matplotlib:
Matplotlib is usually used for 2D visualizations like scatter plots, histograms, and
bar graphs, but it has proven to be useful for image processing by effectively pulling information out
of an image.
NumPy:
NumPy is an open-source Python library used for numerical analysis; it can also be used for image
processing tasks like image cropping, manipulating pixels, masking of pixel values, and more. NumPy
provides matrices and multi-dimensional arrays as data structures.
Pandas
Pandas is an open-source library commonly used in data science. It is primarily used for data analysis,
data manipulation, and data cleaning. Pandas allows for simple data modeling and data analysis
operations without needing to write a lot of code. As stated on their website, pandas is a fast, powerful,
flexible, and easy-to-use open-source data analysis and manipulation tool.
Scikit-Learn
The terms machine learning and scikit-learn are inseparable. Scikit-learn is one of the most used
machine learning libraries in Python. Built on NumPy, SciPy, and Matplotlib, it is an open-source
Python library that is commercially usable under the BSD license. It is a simple and efficient tool for
predictive data analysis tasks.
Chapter 5
STATISTICS
Statistics provides the framework for understanding and interpreting data. It enables us to quantify
uncertainty, spot trends, and draw conclusions about populations from samples. In data science, a
strong grasp of statistical concepts is crucial for making informed decisions, validating findings,
and building robust models.
1. Descriptive Statistics
Descriptive statistics help us summarize and describe the key characteristics of a dataset. This
includes measures of central tendency like mean (average), median (middle value), and mode
(most frequent value), which tell us about the typical or central value of a dataset. We also use
measures of variability, such as range (difference between maximum and minimum values),
variance, and standard deviation, to understand how spread out the data is. Additionally, data
visualization techniques like histograms, bar charts, and scatter plots provide visual
representations of data distributions and relationships, making it easier to grasp complex
patterns.
2. Inferential Statistics
Inferential statistics, on the other hand, allow us to make generalizations about a population
based on a sample. This involves understanding how to select representative samples and how
they relate to the overall population. Hypothesis testing is a key tool in inferential statistics,
allowing us to evaluate whether a hypothesis about a population is likely to be true based on
sample data. We also use confidence intervals to estimate the range of values within which a
population parameter is likely to fall. Finally, p-values and significance levels help us
determine the statistical significance of results and whether they are likely due to chance.
The Fundamental Statistics Concepts for Data Science:
1. Correlation
Correlation quantifies the relationship between two variables. The correlation coefficient, a
value between -1 and 1, indicates the strength and direction of this relationship. A positive
correlation means that as one variable increases, so does the other, while a negative correlation
means that as one variable increases, the other decreases. Pearson correlation measures linear
relationships, while Spearman correlation assesses monotonic relationships.
2. Regression
Regression analysis is a statistical method used to model the relationship between a dependent
variable and one or more independent variables. Linear regression models a linear relationship,
while multiple regression allows for multiple independent variables. Logistic regression is used
when the dependent variable is categorical, such as predicting whether a customer will churn or
not.
3. Bias
Bias refers to systematic errors in data collection, analysis, or interpretation that can lead to
inaccurate conclusions. Selection, measurement, and confirmation bias are examples of different
types of bias. Mitigating bias requires careful data collection and analysis practices, such as
random sampling, blinding, and robust statistical methods.
4. Probability
Probability is the study of random events and their likelihood of occurrence. Expected values,
variance, and probability distributions are examples of fundamental probability concepts.
Conditional probability and Bayes’ theorem allow us to update our beliefs about an event based
on new information.
5. Statistical Analysis
Statistical analysis is the process of testing hypotheses and making inferences about data using
statistical techniques. Analysis of variance (ANOVA) compares means between multiple groups,
while chi-square tests assess the relationship between categorical variables.
6. Normal Distribution
Numerous natural phenomena can be described by the normal distribution, commonly referred
to as the bell curve; it is one of the most common probability distributions. It is characterized by its
mean and standard deviation. Z-scores standardize values relative to the mean and standard deviation,
allowing us to compare values from different normal distributions.
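A brief sketch of two of the concepts above, correlation and z-scores, on assumed illustrative data (requires NumPy and SciPy):

import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 6])

pearson_r, _ = stats.pearsonr(x, y)    # linear relationship, between -1 and 1
spearman_r, _ = stats.spearmanr(x, y)  # monotonic relationship
print(pearson_r, spearman_r)

z_scores = (y - y.mean()) / y.std()    # z = (value - mean) / standard deviation
print(z_scores)                        # values in units of standard deviations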
Chapter 6
APPLICATIONS
When discussing applications in the context of Artificial Intelligence and Machine Learning, we're
referring to the practical uses and implementations of these technologies across various industries and
domains. Here are some notable AI and ML applications:
1. Healthcare:
• Medical Diagnosis: AI is used for diagnosing diseases, such as cancer, diabetes, and heart conditions,
by analysing medical images and patient data.
• Drug Discovery: ML models help identify potential drug candidates and predict their efficacy,
accelerating the drug development process.
• Personalized Medicine: AI assists in tailoring treatment plans and medications based on an individual's
genetic makeup and health history.
2. Finance:
• Algorithmic Trading: ML algorithms analyse financial data to make real-time trading decisions,
optimizing investment portfolios.
• Credit Scoring: AI assesses creditworthiness by analysing an applicant's financial history and
behaviour.
• Fraud Detection: ML models detect fraudulent transactions and activities by identifying unusual
patterns and anomalies.
3. Autonomous Vehicles:
• Self-Driving Cars: AI and ML enable vehicles to perceive their environment, make decisions, and
navigate without human intervention.
• Drones and UAVs: Unmanned aerial vehicles use AI for navigation, surveillance, and delivery tasks.
4. Natural Language Processing (NLP):
• Chatbots: NLP-powered chatbots provide customer support, answer queries, and automate interactions
in various industries.
• Language Translation: AI translates text and speech across languages, enabling global
communication.
• Sentiment Analysis: NLP algorithms analyse social media and customer reviews to gauge public
sentiment about products and services.
5. AI CHATBOT: AI chatbots are computer programs that use artificial intelligence to mimic human
conversation. They can be used for customer service, education, and entertainment. Some popular
AI chatbots include Bing, ChatGPT, Tay, ELIZA, and Cleverbot.
6. RECOMMENDATION SYSTEM: Various platforms that we use in our daily lives, like e-
commerce and entertainment websites, social media, and video sharing platforms such as YouTube,
all use recommendation systems to gather user data and provide customised recommendations to
users to increase engagement.
7. ROBOTICS: Robotics is another field where Artificial Intelligence applications are commonly
used. Robots powered by AI use real-time updates to sense obstacles in their path and instantly
pre-plan their journeys. They can be used for: carrying goods in hospitals, factories, and warehouses;
cleaning offices and large equipment; and inventory management.
8. AUTOMOBILES: AI is also used in self-driving vehicles. AI can be used along with the
vehicle’s camera, radar, cloud services, GPS, and control signals to operate the vehicle. AI can
improve the in-vehicle experience and provide additional systems like emergency braking, blind-
spot monitoring, and driver-assist steering.
9. SPAM FILTERS: The email services that we use in our day-to-day lives have AI that filters out
spam emails, sending them to spam or trash folders and letting us see the filtered content only. The
popular email provider Gmail has managed to reach a filtration accuracy of approximately 99.9%.
Chapter 7
Libraries in Python
Python has a rich ecosystem of libraries for data science, analysis, machine learning, and artificial
intelligence (AI). Here's a list of popular libraries in each of these categories:
7.1 Pandas
Pandas is a popular Python library for data manipulation and analysis. It provides data structures
and functions for working with structured data, such as spreadsheets or SQL tables, making it a
fundamental tool for data scientists and analysts. Below, I'll explain some of the key functions and
concepts in Pandas:
1. Data Structures:
• Series: A one-dimensional array-like object containing data and associated labels or indexes. It is
similar to a column in a spreadsheet or a single column of a database table.
• DataFrame: A two-dimensional, tabular data structure with rows and columns. It is similar to a
spreadsheet or a SQL table. DataFrames are the most commonly used Pandas data structure.
2. Data Import and Export:
• pd.read_csv(): Reads data from a CSV file into a DataFrame.
• pd.read_excel(): Reads data from an Excel file into a DataFrame.
• df.to_csv(): Writes data from a DataFrame to a CSV file.
• df.to_excel(): Writes data from a DataFrame to an Excel file.
3. Data Exploration:
• df.head(): Returns the first n rows of a DataFrame.
• df.tail(): Returns the last n rows of a DataFrame.
• df.info(): Provides information about the DataFrame, including data types and missing values.
• df.describe(): Generates summary statistics of numeric columns.
• df.shape: Returns the dimensions (number of rows and columns) of the DataFrame.
• df.columns: Returns the column names of the DataFrame.
4. Data Selection and Indexing:
• df['column_name'] or df.column_name: Selects a single column from the DataFrame.
• df[['column1', 'column2']]: Selects multiple columns.
• df.loc[row_label]: Selects rows by label.
• df.iloc[row_index]: Selects rows by integer index.
5. Data Manipulation and Transformation:
• df.drop(): Removes specified rows or columns from the DataFrame.
• df.rename(): Renames columns or indexes.
• df.sort_values(): Sorts the DataFrame by one or more columns.
• df.groupby(): Groups data based on a column or multiple columns.
• df.pivot_table(): Creates pivot tables to summarize data.
• df.apply(): Applies a function to each element or row in the DataFrame.
6. Data Cleaning:
• df.isnull(): Checks for missing values.
• df.dropna(): Removes rows or columns with missing values.
• df.fillna(): Fills missing values with specified values.
7. Data Aggregation:
• df.sum(), df.mean(), df.median(): Compute various summary statistics.
• df.max(), df.min(): Find the maximum and minimum values.
• df.count(): Counts the number of non-null elements.
8. Data Visualization Integration:
• Pandas integrates with data visualization libraries like Matplotlib and Seaborn to create plots and
charts directly from DataFrames.
9. Merging and Joining Data:
• pd.concat(): Concatenates DataFrames along rows or columns.
• pd.merge(): Performs database-style joins on DataFrames.
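A minimal sketch tying several of these functions together; the file name sales.csv and the columns region and revenue are assumptions for illustration:

import pandas as pd

df = pd.read_csv("sales.csv")              # import data into a DataFrame
print(df.head())                           # first rows for a quick look
print(df.describe())                       # summary statistics of numeric columns

df = df.dropna()                           # remove rows with missing values
by_region = df.groupby("region")["revenue"].mean()  # aggregate by a column (assumed names)
df.to_csv("sales_clean.csv", index=False)  # export the cleaned data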
7.2 NumPy
NumPy (Numerical Python) is a fundamental library in the Python ecosystem, particularly in the
context of data analysis and machine learning (ML). It provides support for working with
numerical data efficiently, making it an essential tool for data scientists and ML practitioners.
Here's how NumPy is used in data analysis and ML, along with some key functions:
Data Representation:
• ndarray: NumPy's core data structure is the ndarray (N-dimensional array). It allows for efficient
storage and manipulation of multi-dimensional data, such as matrices and tensors. This is crucial
in data analysis and ML where datasets are often multi-dimensional.
Data Cleaning and Transformation:
• Handling Missing Data: NumPy provides functions like np.isnan() and np.nan_to_num() for
identifying and handling missing data, a common preprocessing step in data analysis.
• Data Transformation: NumPy allows you to reshape and transform data using functions like
np.reshape(), np.transpose(), and np.concatenate(). This is useful for preparing data for various
analysis and modeling tasks.
Data Exploration:
• Descriptive Statistics: NumPy offers functions for computing basic statistics, such as np.mean(),
np.median(), np.std(), and np.var(), which are essential for exploring and summarizing data.
Machine Learning:
• Data Representation: In ML, datasets are often represented as NumPy arrays. Many ML libraries,
including Scikit-Learn, expect data in this format.
• Feature Engineering: NumPy is used to create new features and transform existing ones, a critical
aspect of feature engineering in ML.
• Performance Optimization: NumPy's efficient array operations are crucial for optimizing ML
algorithms, particularly when working with large datasets.
o Missing Data: NumPy provides tools to identify and manage missing data, which is a common issue
in real-world datasets.
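A short illustrative sketch of these NumPy operations on assumed data:

import numpy as np

a = np.array([[1.0, 2.0], [3.0, np.nan]])   # a 2x2 ndarray with a missing value
print(np.isnan(a))                          # locate the missing entry
a = np.nan_to_num(a, nan=0.0)               # replace NaN with 0.0

b = a.reshape(4)                            # reshape 2x2 into a 1-D array
print(b.mean(), b.std())                    # descriptive statistics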
7.3 Matplotlib
Matplotlib is a powerful Python library for creating data visualizations and plots. It provides various
functions and modules that enable users to customize, create, and display a wide range of
visualizations. Here are some key functions and concepts associated with Matplotlib in the context
of data analysis and visualization:
• plt.annotate(): Annotates specific data points with arrows and labels.
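A minimal illustrative sketch of a scatter plot with one annotated data point (assumed sample data):

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

plt.scatter(x, y)                                   # basic 2D scatter plot
plt.xlabel("x")
plt.ylabel("y")
plt.annotate("peak", xy=(5, 6), xytext=(3.5, 5.8),
             arrowprops=dict(arrowstyle="->"))      # label a specific point
plt.show()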
7.4 Seaborn
Seaborn is a Python data visualization library based on Matplotlib that provides a high-level
interface for creating informative and aesthetically pleasing statistical graphics. It is particularly
well-suited for data analysis and exploration, as it simplifies the process of creating complex
visualizations with concise code.
Here's an explanation of Seaborn in the context of data analysis and visualization, along with its
key functions.
Advantages of Seaborn:
1. High-Level Interface: Seaborn is designed to work seamlessly with Pandas DataFrames, making
it easier to visualize data directly from data structures commonly used in data analysis.
2. Beautiful Aesthetics: Seaborn provides attractive default styles and color palettes that enhance the
visual appeal of plots.
3. Statistical Plotting: Seaborn specializes in creating statistical plots that help users understand data
distributions, relationships, and patterns.
Statistical Enhancements:
• sns.regplot(): Combines a scatter plot with a linear regression fit line.
• sns.lmplot(): Creates regression plots for visualizing relationships between variables.
Pairwise Relationships:
• sns.pairplot(): Generates a grid of scatter plots for examining pairwise relationships between
numerical columns in a dataset, with histograms along the diagonal.
Heatmaps:
• sns.heatmap(): Generates heatmaps to visualize the correlation matrix or other 2D data
structures.
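A short illustrative sketch using Seaborn's bundled "tips" demo dataset (fetched on first use):

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")                  # a Pandas DataFrame
sns.regplot(x="total_bill", y="tip", data=tips)  # scatter plot plus regression line
plt.show()

corr = tips.select_dtypes("number").corr()       # correlation matrix
sns.heatmap(corr, annot=True)                    # visualize it as a heatmap
plt.show()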
7.5 Scikit-Learn
Scikit-Learn, often referred to as sklearn, is a Python library for machine learning that provides a wide
range of functions and tools for various aspects of machine learning tasks. Below, I'll explain
Scikit-Learn in the context of machine learning, along with some key functions and concepts:
Data Preparation:
• Data Splitting: train_test_split(): Splits a dataset into training and testing sets for model
evaluation.
• Data Preprocessing: Functions like StandardScaler() and MinMaxScaler() are used to scale
and normalize features. LabelEncoder() and OneHotEncoder() are used for encoding
categorical variables.
Supervised Learning:
• Classification: Scikit-Learn includes classifiers like LogisticRegression,
DecisionTreeClassifier, RandomForestClassifier, and more. Key functions include fit(),
predict(), and score().
• Regression: Regression models like LinearRegression, Ridge, and Lasso are available for
predictive modeling. Similar functions as in classification are used for regression tasks.
Unsupervised Learning:
• Clustering: Scikit-Learn provides clustering algorithms such as KMeans, DBSCAN, and
AgglomerativeClustering. Key functions include fit() and predict().
• Dimensionality Reduction: Techniques like PCA (Principal Component Analysis) and t-SNE
(t-distributed Stochastic Neighbor Embedding) are used for dimensionality reduction and
visualization.
Model Evaluation:
• Cross-Validation: cross_val_score() and KFold() are used for k-fold cross-validation to
estimate a model's performance on unseen data.
• Metrics: Scikit-Learn provides metrics like accuracy_score, precision_score, recall_score,
f1_score, and mean_squared_error for evaluating model performance.
Hyperparameter Tuning:
• Grid Search: GridSearchCV() allows you to perform hyperparameter tuning by specifying a
grid of hyperparameters to search over.
• Randomized Search: RandomizedSearchCV() performs hyperparameter tuning using
randomized search, which is often faster than grid search.
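A compact illustrative sketch of cross-validation and grid search, using the iris dataset bundled with Scikit-Learn:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation estimates performance on unseen data
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores.mean())

# Grid search tries every combination in the parameter grid
grid = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    param_grid={"max_depth": [2, 3, 4], "min_samples_split": [2, 5]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)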
7.6 TensorFlow
TensorFlow is an open-source machine learning framework developed by Google. It's designed for
creating, training, and deploying machine learning models, particularly deep learning models.
TensorFlow allows you to build and train neural networks for a wide range of machine learning
tasks. Here's an explanation of TensorFlow in the context of machine learning, along with some
key functions and concepts:
TensorFlow Core:
• Tensors: TensorFlow is named after its core concept, tensors, which are multi-dimensional
arrays. Tensors can be constants, variables, or placeholders.
• Computational Graph: TensorFlow builds a computational graph that represents the
operations to be performed on tensors. This allows for efficient execution and optimization.
Model Training:
• model.compile(): Configures the model with the chosen loss function, optimizer, and metrics.
• model.fit(): Trains the model on labeled training data, specifying the number of epochs and
batch size.
Model Evaluation:
• model.evaluate(): Evaluates the trained model on a test dataset to assess its performance using
metrics like accuracy, loss, etc.
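A minimal illustrative sketch of the compile/fit/evaluate workflow on small synthetic data:

import numpy as np
import tensorflow as tf

X = np.random.rand(100, 4).astype("float32")  # 100 samples, 4 features
y = (X.sum(axis=1) > 2.0).astype("int32")     # a synthetic binary label

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=16, verbose=0)  # train the model
print(model.evaluate(X, y, verbose=0))               # [loss, accuracy]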
Chapter 8
Machine Learning Algorithms
FIG 7
Import statements:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
FIG 8
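A minimal illustrative use of the imports above; a sketch on the iris dataset bundled with Scikit-Learn:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000)  # raise max_iter so the solver converges
clf.fit(X_train, y_train)                # train on the training split
print(clf.score(X_test, y_test))         # mean accuracy on the held-out split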
Types of Linear Regression
Linear regression can be further divided into two types of algorithm: Simple Linear Regression, which
uses a single independent variable, and Multiple Linear Regression, which uses more than one.
Cost function:
o The different values of the weights or coefficients of the line (a0, a1) give different lines of
regression, and the cost function is used to estimate the values of the coefficients for the best-fit line.
o The cost function optimizes the regression coefficients or weights. It measures how a linear
regression model is performing.
o We can use the cost function to find the accuracy of the mapping function, which maps the input
variable to the output variable. This mapping function is also known as the Hypothesis function.
o For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average
of the squared errors between the predicted values and the actual values.
Gradient Descent:
o Gradient descent is used to minimize the MSE by calculating the gradient of the cost function.
o A regression model uses gradient descent to update the coefficients of the line by reducing the
cost function.
o This is done by randomly selecting values for the coefficients and then iteratively updating the
values to reach the minimum of the cost function.
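A small illustrative NumPy sketch of gradient descent for simple linear regression on assumed data, minimizing the MSE cost (1/n) Σ (y_pred - y)²:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])  # roughly y = 2x

a0, a1, lr = 0.0, 0.0, 0.01               # initial coefficients and learning rate
for _ in range(2000):
    error = (a0 + a1 * x) - y             # prediction error
    a0 -= lr * 2 * error.mean()           # gradient of MSE w.r.t. the intercept
    a1 -= lr * 2 * (error * x).mean()     # gradient of MSE w.r.t. the slope
print(a0, a1)                             # approaches the best-fit coefficients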
R-squared method:
o R-squared is a statistical method that determines the goodness of fit.
o It measures the strength of the relationship between the dependent and independent variables
on a scale of 0-100%.
o A high value of R-squared indicates less difference between the predicted values and the actual
values and hence represents a good model.
o It is also called the coefficient of determination, or the coefficient of multiple determination for
multiple regression.
o It can be calculated from the below formula:
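In standard notation (the usual definition, where y_i are the actual values, ŷ_i the predicted values, and ȳ the mean of the actual values):

R² = 1 - (Σ(y_i - ŷ_i)²) / (Σ(y_i - ȳ)²)

that is, one minus the ratio of the residual sum of squares to the total sum of squares.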
FIG 9
8.4 Random Forest
Random Forest is a popular machine learning algorithm that belongs to the supervised learning
technique. It can be used for both Classification and Regression problems in ML. It is based on the
concept of ensemble learning, which is a process of combining multiple classifiers to solve a complex
problem and to improve the performance of the model.
As the name suggests, "Random Forest is a classifier that contains a number of decision trees on
various subsets of the given dataset and takes the average to improve the predictive accuracy of that
dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree
and, based on the majority vote of the predictions, predicts the final output. A greater number of
trees in the forest leads to higher accuracy and prevents the problem of overfitting.
FIG 10
Why use Random Forest?
Below are some points that explain why we should use the Random Forest algorithm:
o It takes less training time as compared to other algorithms.
o It predicts output with high accuracy; even for a large dataset it runs efficiently.
o It can also maintain accuracy when a large proportion of data is missing.
There are mainly four sectors where Random Forest is mostly used:
o Banking: The banking sector mostly uses this algorithm for the identification of loan risk.
o Medicine: With the help of this algorithm, disease trends and risks of the disease can be identified.
o Land Use: We can identify areas of similar land use with this algorithm.
o Marketing: Marketing trends can be identified using this algorithm.
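A minimal illustrative Random Forest sketch on Scikit-Learn's bundled iris dataset:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(n_estimators=100, random_state=42)  # 100 trees vote by majority
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))  # accuracy on the held-out split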
FIG 11
The SVM algorithm can be used for face detection, image classification, text categorization, etc.
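A short illustrative SVM sketch on Scikit-Learn's bundled digits dataset, an image classification task:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svm = SVC(kernel="rbf", C=1.0)    # RBF kernel; C controls regularization strength
svm.fit(X_train, y_train)
print(svm.score(X_test, y_test))  # accuracy on held-out digit images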
Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law. It is used to determine the probability
of a hypothesis with prior knowledge, and it depends on conditional probability.
o The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) × P(A) / P(B)

where:
P(A|B) is the posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the likelihood: the probability of the evidence given that hypothesis A is true.
P(A) is the prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the marginal probability: the probability of the evidence.
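As a small worked illustration with assumed numbers: suppose 20% of emails are spam (P(A) = 0.2), the word "offer" appears in 60% of spam emails (P(B|A) = 0.6), and "offer" appears in 25% of all emails (P(B) = 0.25). Then P(A|B) = (0.6 × 0.2) / 0.25 = 0.48, so an email containing "offer" is spam with probability 48%.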
Chapter 9
Natural Language Processing
Natural Language Processing (NLP) is a field of computer science that deals with the interaction
between computers and human language. It enables computers to understand, interpret, and
generate human language in a meaningful way. NLP encompasses a wide range of tasks, from basic
language analysis to complex tasks like machine translation, text summarization, and question
answering.
NLP systems leverage various techniques from linguistics, computer science, and artificial
intelligence to process text and speech data. They employ algorithms to analyze the structure,
meaning, and context of language, allowing computers to extract valuable information, perform
tasks, and communicate effectively with humans.
NLP Algorithms
1. Statistical Methods
Statistical methods, such as Hidden Markov Models (HMMs) and Conditional Random Fields
(CRFs), have been widely used in NLP for tasks like part-of-speech tagging and named entity
recognition. These methods rely on statistical probabilities to predict linguistic patterns.
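A small illustrative NLTK sketch of statistical part-of-speech tagging (assumes the nltk package is installed and its tokenizer and tagger data have been fetched once with nltk.download()):

import nltk

text = "NLP enables computers to understand human language."
tokens = nltk.word_tokenize(text)  # split the sentence into word tokens
tags = nltk.pos_tag(tokens)        # statistical part-of-speech tagging
print(tags)                        # e.g. [('NLP', 'NNP'), ('enables', 'VBZ'), ...]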
Chapter 10
Deep Learning Algorithms
The given figure illustrates a typical diagram of a Biological Neural Network.
The typical Artificial Neural Network looks something like the given figure.
FIG 12
Dendrites from the biological neural network represent inputs in artificial neural networks, the cell
nucleus represents nodes, synapses represent weights, and the axon represents output.
FIG 13
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the programmer.
Hidden Layer:
The hidden layer lies between the input and output layers. It performs all the calculations to find
hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally results in
output that is conveyed using this layer.
The artificial neural network takes the inputs, computes the weighted sum of the inputs, and includes a
bias. This computation is represented in the form of a transfer function.
The weighted total is then passed as an input to an activation function to produce the output.
Activation functions decide whether a node should fire or not. Only the nodes that fire make it to the
output layer. There are distinct activation functions available that can be applied depending on the
sort of task we are performing.
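A tiny illustrative sketch of a single artificial neuron with assumed numbers:

import numpy as np

inputs = np.array([0.5, 0.3, 0.2])
weights = np.array([0.4, 0.7, 0.2])
bias = 0.1

weighted_sum = np.dot(inputs, weights) + bias  # the transfer function
output = max(0.0, weighted_sum)                # ReLU activation: fire only if positive
print(weighted_sum, output)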
CNN architecture
Convolutional Neural Network consists of multiple layers like the input layer, Convolutional layer,
Pooling layer, and fully connected layers.
The Convolutional layer applies filters to the input image to extract features, the Pooling layer down
samples the image to reduce computation, and the fully connected layer makes the final prediction.
The network learns the optimal filters through backpropagation and gradient descent.
Types of layers:
• Input Layers: This is the layer in which we give input to our model. In a CNN, generally, the input
will be an image or a sequence of images. This layer holds the raw input of the image with width 32,
height 32, and depth 3.
• Convolutional Layers: This layer is used to extract features from the input dataset. It
applies a set of learnable filters known as kernels to the input images. The filters/kernels are smaller
matrices, usually of 2×2, 3×3, or 5×5 shape. Each kernel slides over the input image data and computes
the dot product between the kernel weights and the corresponding input image patch. The output of this
layer is referred to as feature maps. Suppose we use a total of 12 filters for this layer; we'll get an output
volume of dimension 32 x 32 x 12.
• Activation Layer: By adding an activation function to the output of the preceding layer, activation
layers add nonlinearity to the network. It applies an element-wise activation function to the output
of the convolution layer. Some common activation functions are ReLU: max(0, x), Tanh, Leaky
ReLU, etc. The volume remains unchanged, hence the output volume will have dimensions 32 x 32 x 12.
• Pooling Layer: This layer is periodically inserted in the convnets, and its main function is to reduce
the size of the volume, which makes the computation fast, reduces memory usage, and also prevents
overfitting. Two common types of pooling layers are max pooling and average pooling. If we use a
max pool with 2 x 2 filters and stride 2, the resultant volume will be of dimension 16x16x12.
• Flattening: The resulting feature maps are flattened into a one-dimensional vector after the
convolution and pooling layers so they can be passed into a fully connected layer for classification
or regression.
• Fully Connected Layers: It takes the input from the previous layer and computes the final
classification or regression task.
• Output Layer: The output from the fully connected layers is then fed into a logistic function for
classification tasks, such as sigmoid or softmax, which converts the output for each class into a
probability score.
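A minimal illustrative Keras sketch of the layer stack described above (32x32x3 input, 12 filters, 2x2 max pooling); a sketch, not a tuned model:

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),                  # input layer: 32x32 RGB image
    layers.Conv2D(12, (3, 3), padding="same",
                  activation="relu"),                 # 12 kernels -> 32x32x12 feature maps
    layers.MaxPooling2D(pool_size=(2, 2), strides=2), # downsample to 16x16x12
    layers.Flatten(),                                 # flatten to a 1-D vector
    layers.Dense(10, activation="softmax"),           # probability score per class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()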
Chapter 11
CONCLUSION
“Data science is not just a technical discipline; it's a strategic asset. It has the power to reshape
industries, enhance human experiences, and address global challenges. The ability to derive actionable
insights from data sets organizations apart in today’s competitive landscape.
As we move forward, the role of data science will continue to expand, integrating with emerging
technologies like artificial intelligence and the Internet of Things (IoT). This synergy will unlock even
greater potential, pushing the boundaries of what’s possible. In essence, data science is the key to
unlocking a future driven by data, where informed decisions lead to impactful and sustainable
outcomes.”
“The integration of Artificial Intelligence (AI) and Machine Learning (ML) with Python has opened up
a world of possibilities for solving complex problems, automating tasks, and making data-driven
decisions. AIML, which stands for Artificial Intelligence and Machine Learning, leverages Python's rich
ecosystem of libraries and tools to create intelligent systems, predictive models, and data-driven
applications. Whether it's natural language processing, image recognition, recommendation systems, or
predictive analytics, Python's versatility and extensive AI and ML libraries like TensorFlow, Scikit-Learn,
and Keras have made it a leading choice for researchers, data scientists, and developers. AIML using
Python empowers us to harness the power of data and create intelligent solutions that drive innovation
and transform industries."
“The field of artificial intelligence and machine learning has made substantial progress in the past five
years and is having real-world influence on people, institutions and culture. Even if the current state of
AI technology is still far short of the field’s founding aspiration of recreating full human-like
intelligence in machines, research and development teams are building on these advances and
absorbing them into society-facing applications. Artificial Intelligence has helped people to create
robotic and computer systems to make their businesses more economically efficient. Life was forever
changed by AI because humans could use the support of machines to complete repetitive, dangerous and
difficult tasks. With the help of AI machines, people could get jobs done faster and easier. Businesses
could improve the efficiency of manufacturing output, data processing and customer service.”