Machine Learning Report
Submitted by
Ishak Gauri
Department of Computer Science and Technology,
Quantum University, Roorkee
Asst. Prof. Bhanu Pratap
Department of Computer Science and Engineering
………………………………………
(Signature of Student)
Name of Student: Ishak Gauri
Date:………………..
SUMMER TRAINING CERTIFICATE
About CodSoft
Who We Are
CodSoft is an IT services and IT consultancy company that specializes in creating
innovative solutions for businesses. We are passionate about technology and believe
in the power of software to transform the world. Our internship program is one of
the ways in which we are investing in the future of the industry.
At CodSoft, we believe practical knowledge is the key to success in the tech
industry. Our aim is to help students lacking basic skills by offering hands-on
learning through live projects and real-world examples.
INTERNSHIP POSITION
Machine Learning Intern
Gain mastery in Machine Learning from the comfort of your home and
open doors to amazing job opportunities with our certification
program. Enroll in our intensive 4-week internship, where you'll
acquire knowledge in web application development and deployment.
Establish a strong base for your career and real-world implementation
within a supportive and collaborative setting.
TABLE OF CONTENTS
1. Introduction
  1.1. A Taste of Machine Learning
  1.2. Relation to Data Mining
  1.3. Relation to Optimization
  1.4. Relation to Statistics
  1.5. Future of Machine Learning
2. Technology Learnt
  2.1. Introduction to Artificial Intelligence and Machine Learning
    2.1.1. Definition of Artificial Intelligence
    2.1.2. Definition of Machine Learning
    2.1.3. Machine Learning Algorithms
    2.1.4. Applications of Machine Learning
  2.2. Techniques of Machine Learning
    2.2.1. Supervised Learning
    2.2.2. Unsupervised Learning
    2.2.3. Semi-supervised Learning
    2.2.4. Reinforcement Learning
    2.2.5. Some Important Considerations in Machine Learning
  2.3. Data Preprocessing
    2.3.1. Data Preparation
    2.3.2. Feature Engineering
    2.3.3. Feature Scaling
    2.3.4. Datasets
    2.3.5. Dimensionality Reduction with Principal Component Analysis
  2.4. Math Refresher
    2.4.1. Concept of Linear Algebra
    2.4.2. Eigenvalues, Eigenvectors, and Eigendecomposition
    2.4.3. Introduction to Calculus
    2.4.4. Probability and Statistics
  2.5. Supervised Learning
    2.5.1. Regression
      2.5.1.1. Linear Regression
      2.5.1.2. Multiple Linear Regression
      2.5.1.3. Polynomial Regression
      2.5.1.4. Decision Tree Regression
      2.5.1.5. Random Forest Regression
    2.5.2. Classification
      2.5.2.1. Linear Models
        2.5.2.1.1. Logistic Regression
        2.5.2.1.2. Support Vector Machines
      2.5.2.2. Nonlinear Models
        2.5.2.2.1. K-Nearest Neighbors (KNN)
        2.5.2.2.2. Kernel Support Vector Machines (SVM)
        2.5.2.2.3. Naïve Bayes
        2.5.2.2.4. Decision Tree Classification
1. Introduction
1.1. A Taste of Machine Learning
✓ Arthur Samuel, an American pioneer in the field of computer gaming and
artificial intelligence, coined the term "Machine Learning" in 1959.
✓ Over the past two decades, Machine Learning has become one of the mainstays
of information technology.
✓ With the ever-increasing amounts of data becoming available, there is good
reason to believe that smart data analysis will become even more pervasive as
a necessary ingredient for technological progress.
1.2. Relation to Data Mining
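• Data mining uses many machine learning methods, but with different goals: machine
learning focuses on prediction based on known properties learned from training data,
while data mining focuses on discovering previously unknown properties in the data.
On the other hand, machine learning also employs data mining methods as "unsupervised
learning" or as a preprocessing step to improve learner accuracy.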
1.3. Relation to Optimization
Machine learning also has intimate ties to optimization: many learning problems
are formulated as minimization of some loss function on a training set of examples.
Loss functions express the discrepancy between the predictions of the model being
trained and the actual problem instances.
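The loss-minimization idea can be made concrete with a small example. This is a minimal sketch under stated assumptions: a one-feature linear model y ≈ w·x + b, mean squared error as the loss, and plain gradient descent as the optimizer. The toy data, learning rate, and step count are illustrative choices and do not come from the report.

```python
def mse_loss(w, b, xs, ys):
    """Mean squared error between predictions w*x + b and the targets."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Minimize the MSE loss over (w, b) with plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE loss with respect to w and b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient so the loss decreases.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy training set generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = gradient_descent(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, loss={mse_loss(w, b, xs, ys):.6f}")
```

On this noise-free data, the learned parameters approach w = 2 and b = 1, and the training loss approaches zero.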
1.4. Relation to Statistics
Michael I. Jordan suggested the term "data science" as a placeholder name for the
overall field.
Leo Breiman distinguished two statistical modelling paradigms: the data model and the
algorithmic model, where "algorithmic model" roughly means machine learning
algorithms such as Random Forest.
1.5. Future of Machine Learning
➢ Machine Learning can be a competitive advantage for any company, be it a top MNC
or a startup, because tasks that are done manually today will be done by machines
tomorrow.
➢ The Machine Learning revolution will stay with us for a long time, and so will
Machine Learning's role in shaping the future.
2. Technology Learnt
2.1. Introduction to Artificial Intelligence and Machine Learning
Machine Learning is an approach or subset of Artificial Intelligence that is based on the idea
that machines can be given access to data along with the ability to learn from it.
❖ Define Machine Learning
Machine learning is an application of artificial intelligence (AI) that provides
systems the ability to automatically learn and improve from experience without
being explicitly programmed. Machine learning focuses on the development of
computer programs that can access data and use it to learn for themselves.
❖ Features of Machine Learning
✓ Machine Learning is computing-intensive and generally requires a large
amount of training data.
✓ It involves repetitive training to improve the learning and decision making
of algorithms.
✓ As more data gets added, Machine Learning training can be automated for
learning new data patterns and adapting its algorithm.
❖ Traditional Approach
Traditional programming relies on hard-coded rules.
❖ Machine Learning Approach
Machine Learning relies on learning patterns based on sample data.
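The contrast between the two approaches can be shown with a toy task. This sketch assumes a hypothetical spam filter that looks at a single number: how many times the word "free" appears in a message. The traditional version hard-codes a threshold. The Machine Learning version learns the threshold from labelled sample data. The functions and data are illustrative only.

```python
def rule_based_is_spam(free_count):
    """Traditional approach: a programmer hard-codes the rule."""
    return free_count >= 3


def learn_threshold(samples):
    """Machine Learning approach: choose the threshold that makes the fewest
    mistakes on labelled (feature, is_spam) training samples."""
    candidates = sorted({x for x, _ in samples})
    best_t, best_errors = None, None
    for t in candidates:
        errors = sum((x >= t) != label for x, label in samples)
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical labelled sample data: (count of "free", is_spam).
training = [(0, False), (1, False), (1, False), (2, True), (4, True), (5, True)]

threshold = learn_threshold(training)


def learned_is_spam(free_count):
    """Rule whose threshold came from the data rather than from a programmer."""
    return free_count >= threshold


print(threshold)                                  # 2
print(rule_based_is_spam(2), learned_is_spam(2))  # False True
```

When new labelled samples arrive, the learned rule can be refitted by calling `learn_threshold` again. The hard-coded rule only changes if someone edits the code.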
✓ If the model learns the training data too closely, it may lead to overfitting, a
situation where the algorithm cannot handle new test data that it has not seen
before. The technique of constraining the model so that it stays general is called
regularization.
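One common form of regularization is an L2 (ridge) penalty, which discourages large model weights. This minimal sketch assumes a single-feature linear model through the origin, y ≈ w·x. Under that assumption, minimizing Σ(w·x − y)² + λ·w² has the closed-form solution w = Σxy / (Σx² + λ). The data and λ values are illustrative.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form ridge solution for y ~ w * x (no intercept).

    Minimizes sum((w*x - y)^2) + lam * w^2; lam = 0 gives ordinary least squares.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Small, noisy training set (illustrative values).
xs = [1.0, 2.0, 3.0]
ys = [1.5, 1.8, 3.6]

for lam in (0.0, 1.0, 10.0):
    print(lam, round(ridge_weight(xs, ys, lam), 4))
```

A larger λ pulls the weight toward zero. This trades a little training accuracy for a simpler model that is less likely to overfit.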
❖ Examples of Supervised Learning
✓ Voice Assistants
✓ Gmail Filters
✓ Weather Apps
❖ Types of Supervised Learning
✓ Classification
➢ Answers “What class?”
➢ Applied when the output has finite and discrete values. Example: social
media sentiment analysis has three potential outcomes: positive,
negative, or neutral.
✓ Regression
➢ Answers “How much?”
➢ Applied when the output is a continuous numerical value. Example: predicting
the price of a house.
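The difference between the two question types shows up in the type of output a model produces. This minimal sketch assumes a hypothetical housing dataset with one feature, floor area. A 1-nearest-neighbour classifier returns a label from a finite set. A least-squares line returns a continuous number. Both models and the data are illustrative choices.

```python
# Hypothetical training data: floor area (m^2) -> price band and exact price.
areas = [50.0, 60.0, 80.0, 100.0, 120.0]
bands = ["low", "low", "medium", "high", "high"]   # classification target
prices = [100.0, 118.0, 160.0, 205.0, 240.0]       # regression target


def classify(area):
    """Classification: answer 'What class?' with the label of the nearest example."""
    nearest = min(range(len(areas)), key=lambda i: abs(areas[i] - area))
    return bands[nearest]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = fit_line(areas, prices)


def regress(area):
    """Regression: answer 'How much?' with a continuous value."""
    return slope * area + intercept


print(classify(85.0))   # a label from a finite set
print(regress(85.0))    # a real number
```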
❖ Types of Unsupervised Learning
✓ Clustering
The most common unsupervised learning method is cluster analysis. It is used to
find groups (clusters) in the data such that points within a cluster are more similar
to each other than to points in other clusters.
✓ Visualization Algorithms
Visualization algorithms are unsupervised learning algorithms that accept unlabeled
data and display it in an intuitive 2D or 3D format. The data is separated into
reasonably clear clusters to aid understanding.
✓ Anomaly Detection
Anomaly detection algorithms flag unusual data points without needing labeled
examples of anomalies for training.
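Two of these unsupervised methods can be sketched in a few lines of plain Python. This minimal illustration assumes 1-D data. For clustering, it uses k-means, one common algorithm among several, which groups points around k centres. For anomaly detection, it uses a simple z-score rule, one basic approach, which flags points far from the mean. Neither method uses labels. The data and the threshold of 2 standard deviations are made up.

```python
import statistics


def kmeans_1d(points, k, iters=20):
    """Plain k-means on 1-D data: alternate assignment and centre update."""
    # Simple spread-out initialisation; k-means++ and restarts are common in practice.
    centres = sorted(points)[:: max(1, len(points) // k)][:k]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its closest centre.
            idx = min(range(k), key=lambda i: abs(p - centres[i]))
            clusters[idx].append(p)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [statistics.mean(c) if c else centres[i] for i, c in enumerate(clusters)]
    return centres, clusters


def zscore_anomalies(points, threshold=2.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mu = statistics.mean(points)
    sigma = statistics.pstdev(points)
    return [p for p in points if abs(p - mu) > threshold * sigma]


# Illustrative unlabeled data.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
centres, clusters = kmeans_1d(points, k=2)
print(centres, clusters)

readings = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 50.0]
print(zscore_anomalies(readings))
```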
2.2.3. Semi-supervised Learning
❖ Define Semi-supervised Learning
Semi-supervised learning falls between unsupervised learning (without any labeled training
data) and supervised learning (with completely labeled training data).
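One simple semi-supervised strategy is self-training. A model is fitted on the few labelled points and then used to pseudo-label the unlabelled points, after which it is refitted on everything. This minimal sketch assumes 1-D data and a nearest-centroid classifier, both chosen only for illustration. Real systems usually pseudo-label only high-confidence points.

```python
import statistics


def centroids(labelled):
    """Mean feature value per class from (x, label) pairs."""
    by_class = {}
    for x, label in labelled:
        by_class.setdefault(label, []).append(x)
    return {label: statistics.mean(xs) for label, xs in by_class.items()}


def predict(cents, x):
    """Nearest-centroid prediction."""
    return min(cents, key=lambda label: abs(x - cents[label]))


def self_train(labelled, unlabelled, rounds=3):
    """Self-training: pseudo-label the unlabelled points, then refit on all of them."""
    cents = centroids(labelled)
    for _ in range(rounds):
        pseudo = [(x, predict(cents, x)) for x in unlabelled]
        cents = centroids(labelled + pseudo)
    return cents


labelled = [(1.0, "A"), (9.0, "B")]                 # only two labelled points
unlabelled = [1.5, 2.0, 2.5, 6.0, 7.0, 8.0, 8.5]    # more unlabelled data

cents = self_train(labelled, unlabelled)
print(cents)
print(predict(centroids(labelled), 4.8), predict(cents, 4.8))
```

In this toy case, the query 4.8 is classified as "A" using the labelled points alone. After the unlabelled data shifts the class centroids, it is classified as "B".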
❖ Example of Semi-supervised Learning
A model with many degrees of freedom (such as a high-degree polynomial model) is likely
to have high variance and thus overfit the training data.
❖ Bias & Variance Dependencies
➢ Increasing a model’s complexity will reduce its bias and increase its variance.
➢ Conversely, reducing a model’s complexity will increase its bias and reduce its
variance. This is why it is called a tradeoff.
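The tradeoff can be observed with any model that has a complexity setting. This minimal sketch uses k-nearest-neighbours regression in place of a polynomial model. A small k gives a flexible, high-variance model that can fit the training data exactly. A large k averages many neighbours, giving a smoother, higher-bias model. The data are synthetic (sin(x) plus Gaussian noise), and the random seed is arbitrary.

```python
import math
import random


def knn_predict(train_data, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train_data, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train_data, eval_data, k):
    """Mean squared error of k-NN predictions on eval_data."""
    return sum((knn_predict(train_data, x, k) - y) ** 2 for x, y in eval_data) / len(eval_data)


def sample(rng, n):
    """Synthetic data: y = sin(x) plus Gaussian noise."""
    xs = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]


rng = random.Random(0)
train_set = sample(rng, 40)
test_set = sample(rng, 200)

for k in (1, 5, 40):
    print(f"k={k:2d}  train MSE={mse(train_set, train_set, k):.3f}  "
          f"test MSE={mse(train_set, test_set, k):.3f}")
```

With k = 1, each training point is its own nearest neighbour, so the training error is zero. With k = 40, every prediction is the overall mean. Typically, test error is lowest somewhere in between, but the exact numbers depend on the random sample.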
❖ What is Representation Learning
In Machine Learning, representation refers to the way the data is presented. The
choice of representation often makes a huge difference in how well a model can
learn from the data.
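Here is a small example of why representation matters. It assumes a hypothetical dataset of points on two concentric circles. In (x, y) coordinates, no single straight line separates the two classes, because the inner points lie inside the outer ring. After each point is re-represented by its distance from the origin, a single threshold separates them. The data and the threshold are illustrative.

```python
import math

# Hypothetical points: class 0 on a circle of radius 1, class 1 on radius 3.
angles = [i * 2 * math.pi / 8 for i in range(8)]
points = [(math.cos(a), math.sin(a), 0) for a in angles] + [
    (3 * math.cos(a), 3 * math.sin(a), 1) for a in angles
]


def to_radius(x, y):
    """New representation: the distance of (x, y) from the origin."""
    return math.hypot(x, y)


# In the radius representation, a single threshold separates the classes.
THRESHOLD = 2.0
predictions = [int(to_radius(x, y) > THRESHOLD) for x, y, _ in points]
labels = [label for _, _, label in points]
print(predictions == labels)   # True
```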
✓ Unlabeled Data
✓ Test Data
✓ Validation Data
2.3.2. Feature Engineering
❖ Define Feature Engineering
The transformation stage in the data preparation process includes an important step
known as Feature Engineering.
Feature Engineering refers to selecting and extracting the right features from the data,
that is, features relevant to the task and the model under consideration.
❖ Aspects of Feature Engineering
✓ Feature Selection
The most useful and relevant features are selected from the available data.
✓ Feature Addition
New features are created by gathering new data.
✓ Feature Extraction
Existing features are combined to develop more useful ones.
✓ Feature Filtering
Irrelevant features are filtered out to make the modelling step easier.
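The four aspects can be illustrated on a toy table. This minimal sketch assumes hypothetical records with height, weight, a constant "country" column, and a "risk" target. BMI is used as an example of feature extraction. The selection rule, which keeps the feature most correlated with the target, is just one simple choice among many.

```python
import math

# Hypothetical raw records; "risk" is the target we want to predict.
records = [
    {"height_m": 1.60, "weight_kg": 50.0, "country": "IN", "risk": 0.1},
    {"height_m": 1.75, "weight_kg": 90.0, "country": "IN", "risk": 0.8},
    {"height_m": 1.80, "weight_kg": 70.0, "country": "IN", "risk": 0.2},
    {"height_m": 1.65, "weight_kg": 85.0, "country": "IN", "risk": 0.9},
]

# Feature addition: attach newly gathered data (hypothetical ages from another source).
new_ages = [25, 50, 40, 60]
for r, age in zip(records, new_ages):
    r["age"] = age

# Feature extraction: combine existing features into a more useful one (BMI).
for r in records:
    r["bmi"] = r["weight_kg"] / r["height_m"] ** 2

# Feature filtering: drop columns whose value never changes (they carry no information).
candidates = [f for f in records[0] if f != "risk" and len({r[f] for r in records}) > 1]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Feature selection: keep the feature most strongly correlated with the target.
target = [r["risk"] for r in records]
best = max(candidates, key=lambda f: abs(pearson([r[f] for r in records], target)))
print(candidates)
print(best)
```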
2.3.3. Feature Scaling
❖ Define Feature Scaling
✓ Feature scaling is an important step in the data transformation stage
of the data preparation process.
✓ Feature scaling is a method used in Machine Learning to standardize the range
of the independent variables (features) of the data.
❖ Techniques of Feature Scaling
✓ Standardization
▪ Standardization is a popular feature scaling method that rescales each feature
to the mean and standard deviation of a standard normal (Gaussian) distribution.
It shifts and rescales the values but does not change the shape of their distribution.
▪ The mean of each feature is centered at zero, and the feature column has
a standard deviation of one.
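Standardization replaces each value x of a feature with z = (x − μ) / σ, where μ and σ are that feature's mean and standard deviation. This is a minimal plain-Python sketch. It assumes the population standard deviation, as computed by `statistics.pstdev`. In practice, μ and σ should be computed on the training data only and then reused for the test data.

```python
import statistics


def standardize(column):
    """Return z-scores (x - mean) / std, using the population standard deviation."""
    mu = statistics.mean(column)
    sigma = statistics.pstdev(column)
    if sigma == 0:
        # A constant feature has no spread; map every value to 0.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


ages = [20.0, 30.0, 40.0, 50.0]          # illustrative feature values
scaled = standardize(ages)
print(scaled)
print(statistics.mean(scaled), statistics.pstdev(scaled))
```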
2.3.4. Datasets
➢ Machine Learning problems often need training and testing datasets.
➢ A dataset is a (often large) collection of structured data.
➢ In many cases, it has input and output labels that assist in Supervised Learning.
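Models are usually trained on one part of a dataset and evaluated on another, so a common first step is a random train/test split. This minimal sketch assumes an 80/20 split and a fixed random seed for reproducibility, both arbitrary choices. The helper below is a hypothetical stand-in for the equivalent utilities that libraries such as scikit-learn provide.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labelled dataset: (input features, output label) pairs.
dataset = [((i, i * 2), i % 2) for i in range(10)]
train, test = split_train_test(dataset)
print(len(train), len(test))   # 8 2
```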
❖ What is Clustering
✓ Clustering is a Machine Learning technique that involves grouping
data points.
✓ An artificial neural network is an interconnected group of nodes, akin to the vast
network of layers of neurons in a brain.
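An interconnected group of nodes, loosely modelled on neurons, can be made concrete with a tiny feed-forward network. This minimal sketch assumes one hidden layer with sigmoid activations. The weights are hand-picked, not trained, and they happen to compute XOR. Real networks learn their weights from data, typically with gradient descent and backpropagation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """Each node takes a weighted sum of all inputs plus a bias, then applies a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


# Hand-picked weights (hypothetical, not learned): the hidden nodes act like OR and
# NAND, and the output node combines them like AND, which together compute XOR.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]
OUTPUT_B = [-30.0]


def forward(x1, x2):
    """Pass two inputs through the hidden layer and then the output node."""
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(forward(a, b)))
```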
2.7.3. TensorFlow
TensorFlow is an open-source Deep Learning library developed by Google.
4. Learning Outcome
➢ Have a good understanding of the fundamental issues and challenges of machine
learning: data, model selection, model complexity, etc.
➢ Have an understanding of the strengths and weaknesses of many popular machine
learning approaches.
➢ Appreciate the underlying mathematical relationships within and across Machine
Learning algorithms and the paradigms of supervised and unsupervised learning.
➢ Be able to design and implement various machine learning algorithms in a range of
real-world applications.
➢ Be able to integrate machine learning libraries and mathematical and statistical
tools with modern technologies.
➢ Be able to understand and apply techniques for scaling up machine learning, along
with the associated computing techniques and technologies.
5. Gantt Chart
6. Bibliography
6.1. All content used in this report is from:
✓ https://www.simplilearn.com/
✓ https://www.wikipedia.org/
✓ https://towardsdatascience.com/
✓ https://www.expertsystem.com/
✓ https://www.coursera.org/
✓ https://www.edureka.co/
✓ https://subhadipml.tech/
✓ https://www.forbes.com/
✓ https://medium.com/
✓ https://www.google.com/
6.2. All pictures are from:
✓ https://www.simplilearn.com/
✓ https://www.google.com/
✓ https://www.wikipedia.org/
✓ https://www.youtube.com/
✓ https://www.edureka.co/
6.3. Books Referred
✓ Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron
✓ Python Machine Learning by Sebastian Raschka