Machine Learning Section

This document introduces a Machine Learning course focusing on Python and Spark's MLlib library, outlining the structure of the course, including suggested readings, theory lectures, and consulting projects. It explains the difference between supervised and unsupervised learning, detailing how algorithms learn from labeled and unlabeled data, respectively. The document emphasizes the importance of understanding Spark's MLlib documentation and provides guidance for students with varying backgrounds in mathematics.

Uploaded by

abhimanyu thakur

Machine Learning

Let’s learn something!

Python and Spark

● It is now time to begin with the Machine Learning Sections of the course!
● This introduction section will discuss a general introduction to machine learning and how Spark’s MLlib library works for Machine Learning.

Python and Spark

● Most Machine Learning Sections have:
○ Suggested Reading Assignment
○ Basic Theory Lecture
○ Documentation Walkthrough
○ More realistic custom code example
○ Consulting Project
○ Consulting Project Solutions

Python and Spark

● The Consulting Projects are looser, more realistic projects for you to attempt with the skills you just learned.
● A dataset, some background, and a problem is described, and you are free to solve it however you want.

Python and Spark

● If you prefer a more guided approach to problems, that’s totally okay!
● We have the custom code examples before each Consulting Project.
● Plus, you can treat the Consulting Project Solutions as an additional “code-along”!

Python and Spark

● Because different students have different backgrounds in math, we will keep the mathematics behind the machine learning algorithms light.

Python and Spark

● If you are interested in reading more about the math behind the algorithms we discuss, we will be using An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani as a companion book.
● It’s freely available online.

Companion Book

● Students who want the mathematical theory should do the suggested reading assignment that will appear for each machine learning section.
● Otherwise, feel free to watch the Intro Theory Lectures for the fundamentals.

Companion Book

● First Suggested Reading Assignment:
○ Read Chapters 1 & 2 to gain a background understanding before continuing to the Machine Learning Lectures.

What is Machine Learning?

● Machine learning is a method of data analysis that automates analytical model building.
● Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look.

What is it used for?

● Fraud detection.
● Web search results.
● Real-time ads on web pages.
● Credit scoring and next-best offers.
● Prediction of equipment failures.
● New pricing models.
● Network intrusion detection.
● Email spam filtering.
● Financial Modeling
● Recommendation Engines
● Customer Segmentation
● Text Sentiment Analysis
● Predicting Customer Churn
● Pattern and image recognition.

Machine Learning Process

[Figure: process diagram with the stages Data Acquisition, Data Cleaning, Model Training & Building, Model Testing (using Test Data), and Model Deployment]

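To make the role of the test data in this process concrete, here is a minimal sketch in plain Python that holds back part of a cleaned dataset for model testing. The function name `train_test_split`, the 70/30 ratio, the seed, and the toy dataset are all assumptions made for this illustration and do not come from the course. In Spark itself, DataFrames provide `randomSplit` for the same job.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical cleaned dataset: ten (feature, label) pairs.
data = [(x, 2 * x + 1) for x in range(10)]
train, test = train_test_split(data)
print(len(train), "training rows;", len(test), "test rows")
```

The model is built only on `train`; `test` stays untouched until the testing stage, so the evaluation reflects data the model has never seen.
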
Supervised Learning

● Spark’s MLlib is mainly designed for Supervised and Unsupervised Learning tasks, with most of its algorithms falling under those two categories.
● Let’s discuss them in more detail and describe how they are different!

Supervised Learning

● Supervised learning algorithms are trained using labeled examples, such as an input where the desired output is known.
● For example, a piece of equipment could have data points labeled either “F” (failed) or “R” (runs).

Supervised Learning

● The learning algorithm receives a set of inputs along with the corresponding correct outputs, and the algorithm learns by comparing its actual output with correct outputs to find errors.
● It then modifies the model accordingly.

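The compare-then-correct loop described above can be sketched with a perceptron, one of the simplest supervised learners. This is only an illustration, not the algorithm MLlib uses: the sensor readings (temperature and vibration, both scaled to 0..1), the learning rate, and the function names are invented for this example. Only the “F”/“R” labels come from the slides.

```python
def train_perceptron(samples, epochs=100, lr=0.1):
    """Fit weights on ((x1, x2), label) pairs, where label is 'F' or 'R'."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), label in samples:
            target = 1 if label == "F" else 0
            output = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = target - output  # compare actual output with the correct one
            if error != 0:
                # Modify the model in the direction that reduces the error.
                w[0] += lr * error * x1
                w[1] += lr * error * x2
                b += lr * error
                mistakes += 1
        if mistakes == 0:  # every labeled example is now predicted correctly
            break
    return w, b


def predict(w, b, x):
    """Return 'F' (failed) or 'R' (runs) for an unlabeled reading x."""
    return "F" if w[0] * x[0] + w[1] * x[1] + b > 0 else "R"


# Hypothetical labeled readings: (temperature, vibration) -> label.
samples = [
    ((0.20, 0.10), "R"), ((0.30, 0.20), "R"),
    ((0.10, 0.30), "R"), ((0.25, 0.15), "R"),
    ((0.80, 0.90), "F"), ((0.90, 0.70), "F"),
    ((0.70, 0.80), "F"), ((0.85, 0.95), "F"),
]
w, b = train_perceptron(samples)
print(predict(w, b, (0.15, 0.20)), predict(w, b, (0.90, 0.90)))
```

Each pass over the data repeats the same two steps from the slide: find the errors, then adjust the model.
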
Supervised Learning

● Through methods like classification, regression, prediction and gradient boosting, supervised learning uses patterns to predict the values of the label on additional unlabeled data.
● Supervised learning is commonly used in applications where historical data predicts likely future events.

Supervised Learning

● For example, it can anticipate when credit card transactions are likely to be fraudulent or which insurance customer is likely to file a claim.
● Or it can attempt to predict the price of a house based on different features for houses for which we have historical price data.

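As a rough sketch of the house price example, the code below fits a straight line to historical prices with ordinary least squares and then predicts the price of a new house. To keep it short it assumes a single feature (floor area); the numbers and the names `fit_line` and `predict_price` are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_price(intercept, slope, area):
    """Predict the label (price) for a house we have no price for."""
    return intercept + slope * area


# Hypothetical historical data: floor area (m^2) and sale price (thousands).
areas = [50.0, 80.0, 100.0, 120.0, 150.0]
prices = [150.0, 240.0, 310.0, 350.0, 450.0]

intercept, slope = fit_line(areas, prices)
print(round(predict_price(intercept, slope, 110.0), 1))
```

The historical prices play the role of the labels, and the fitted line is the model that is applied to new, unlabeled houses.
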
Unsupervised Learning

● Unsupervised learning is used against data that has no historical labels.
● The system is not told the "right answer." The algorithm must figure out what is being shown.
● The goal is to explore the data and find some structure within.

Unsupervised Learning

● For example, it can find the main attributes that separate customer segments from each other.
● Popular techniques include self-organizing maps, nearest-neighbor mapping, k-means clustering and singular value decomposition.
● One issue is that it can be difficult to evaluate results of an unsupervised model!

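To illustrate one of the techniques named above, here is a minimal k-means sketch in plain Python that groups customers into two segments without any labels. The customer data (spend and visits, in arbitrary units), the hand-picked starting centroids, and the function name `kmeans` are assumptions for this example. Real implementations usually choose the starting centroids more carefully.

```python
def kmeans(points, centroids, max_iters=100):
    """Plain k-means: returns (centroids, clusters) for the given start centroids."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iters):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # nothing moved, so we have converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (monthly spend, visits per month). No labels anywhere.
customers = [
    (1.0, 2.0), (1.5, 1.8), (1.2, 2.2),
    (8.0, 9.0), (8.5, 8.7), (9.0, 9.5),
]
centroids, clusters = kmeans(customers, centroids=[customers[0], customers[1]])
for center, members in zip(centroids, clusters):
    print(center, members)
```

Notice that nothing in the code checks the result against a correct answer, because there is none. That is exactly why evaluating an unsupervised model is hard.
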
Final Thoughts

● Machine Learning takes time to learn.
● Be patient with yourself and feel free to post to the QA forums.
● No one course can be a reference for all Machine Learning topics, but I’m always happy to point you in the right direction!

Machine Learning with Spark

Python and Spark

● Spark has its own MLlib for Machine Learning.
● Going forward, MLlib uses the DataFrame syntax: as of Spark 2.0, the DataFrame-based API is MLlib’s primary API.

Python and Spark

● One of the main “quirks” of using MLlib is that you need to format your data so that eventually it just has one or two columns:
○ Features, Labels (Supervised)
○ Features (Unsupervised)

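A minimal sketch of that formatting step, assuming PySpark and Java are installed locally: `VectorAssembler` from `pyspark.ml.feature` packs several numeric columns into a single vector column. The column names (`area`, `bedrooms`, `price`), the three sample rows, and the wrapper function `build_ml_frames` are hypothetical and exist only for this illustration.

```python
def build_ml_frames():
    """Return the column names of the supervised and unsupervised layouts."""
    # Imported here so the file still loads on machines without PySpark.
    from pyspark.ml.feature import VectorAssembler
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("mllib-format-sketch")
        .getOrCreate()
    )
    try:
        # Hypothetical raw data with one column per attribute.
        raw = spark.createDataFrame(
            [(50.0, 1.0, 150.0), (80.0, 2.0, 240.0), (120.0, 3.0, 350.0)],
            ["area", "bedrooms", "price"],
        )
        # Pack the input columns into a single vector column named "features".
        assembler = VectorAssembler(
            inputCols=["area", "bedrooms"], outputCol="features"
        )
        assembled = assembler.transform(raw)

        # Supervised layout: features plus a label column.
        supervised = assembled.select("features", assembled["price"].alias("label"))
        # Unsupervised layout: features only.
        unsupervised = assembled.select("features")
        return supervised.columns, unsupervised.columns, supervised.count()
    finally:
        spark.stop()
```

Calling `build_ml_frames()` returns the column names of both layouts, which makes the “one or two columns” rule easy to check before handing the data to an MLlib algorithm.
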
Python and Spark

● This requires a little more data processing work than some other machine learning libraries, but the big upside is that this exact same syntax works with distributed data, which is no small feat for what is going on “under the hood”!

Python and Spark

● When working with Python, Spark, and MLlib, the documentation examples always use nicely formatted data.
● However, we’ll have our own custom examples that have messier, more realistic data!

Python and Spark

● We will also have consulting projects, which set you loose on a real world data project with a data set and a problem to solve, without explicitly telling you what to do!

Python and Spark

● A huge part of learning MLlib is getting comfortable with the documentation!
● Being able to master the skill of finding information (not memorization) is the key to becoming a great Spark and Python developer!

Python and Spark

● Fortunately, the Spark MLlib documentation is quite good, and we’ll constantly teach you how to refer to it during each Machine Learning Algorithm Section.
● Let’s jump to it now!

spark.apache.org
