BECE352E Module 3
• Supervised and Unsupervised ML Algorithms
A canonical definition by Tom Mitchell (1997): “A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P, if its performance at tasks in T,
as measured by P, improves with experience E.” One has to be very careful about defining the set of tasks T and
the performance measure P: with experience E, the performance P has to improve.
What is Machine Learning?
Applications of Machine Learning
Real world Applications of Machine Learning
Types of Machine Learning based on Learning
What is Supervised Machine Learning?
• Learning a mapping from inputs to outputs.
• It deals with labelled data.
What is Unsupervised Machine Learning?
• Discovering patterns in the data.
• It deals with unlabelled data.
Difference between Supervised and Unsupervised
Supervised and Unsupervised Algorithms
Classification Vs. Regression
• If a patient has a stiff neck (S), what is the probability that he/she has meningitis (M)?
With P(S | M) = 0.5, P(M) = 1/50000 and P(S) = 1/20:
$$P(M \mid S) = \frac{P(S \mid M)\,P(M)}{P(S)} = \frac{0.5 \times 1/50000}{1/20} = 0.0002$$
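The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `posterior` and the variable names are illustrative choices, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    return likelihood * prior / evidence

p_s_given_m = Fraction(1, 2)   # P(S | M): stiff neck given meningitis
p_m = Fraction(1, 50000)       # P(M): prior probability of meningitis
p_s = Fraction(1, 20)          # P(S): probability of a stiff neck

p_m_given_s = posterior(p_s_given_m, p_m, p_s)
print(p_m_given_s, float(p_m_given_s))   # prints: 1/5000 0.0002
```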
Naïve Bayes Classification model
•Naïve Bayes Classifier is one of the simplest and most effective classification algorithms; it helps
in building fast machine learning models that can make quick predictions.
•It is a probabilistic classifier, which means it predicts on the basis of the probability that an object belongs to each class.
•Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent
of the occurrence of other features (given the class). For example, if a fruit is identified on the basis of color, shape,
and taste, then a red, spherical, and sweet fruit is recognized as an apple. Each feature
individually contributes to identifying it as an apple, without depending on the others.
•Bayes: It is called Bayes because it depends on the principle of Bayes' theorem.
•Bayes' theorem is also known as Bayes' rule or Bayes' law. It is used to determine the probability
of a hypothesis given prior knowledge, and it depends on conditional probability.
•The formula for Bayes' theorem is given as:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
•Where,
•P(A) is the prior probability: probability of the hypothesis before observing the evidence.
•P(B) is the marginal probability: probability of the evidence.
•P(A|B) is the posterior probability: probability of hypothesis A given the observed event B.
•P(B|A) is the likelihood: probability of the evidence given that the hypothesis is true.
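A minimal sketch of how the naïve independence assumption turns Bayes' theorem into a classifier, using the fruit example. The training rows are hypothetical toy data invented for illustration, the function names are not from the slides, and add-one (Laplace) smoothing is an added assumption so that unseen feature values do not produce a zero probability.

```python
from collections import Counter, defaultdict

# Hypothetical toy data (not from the slides): (color, shape, taste) -> fruit
TRAIN = [
    (("red", "spherical", "sweet"), "apple"),
    (("red", "spherical", "sweet"), "apple"),
    (("green", "spherical", "sour"), "apple"),
    (("yellow", "long", "sweet"), "banana"),
    (("yellow", "long", "sweet"), "banana"),
    (("green", "long", "sweet"), "banana"),
]

def fit(rows):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for features, label in rows:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab

def predict_proba(model, features):
    """Posterior P(class | features) under the naive independence assumption."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, value in enumerate(features):
            # likelihood P(value | class) with add-one smoothing (assumption)
            score *= (value_counts[(i, label)][value] + 1) / (n + len(vocab[i]))
        scores[label] = score
    evidence = sum(scores.values())  # normalizer, plays the role of P(B)
    return {label: s / evidence for label, s in scores.items()}

model = fit(TRAIN)
probs = predict_proba(model, ("red", "spherical", "sweet"))
print(max(probs, key=probs.get), probs)
```

Multiplying the per-feature likelihoods is exactly where the "naïve" assumption enters: each feature contributes independently of the others.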
Naïve Bayes Classification model Solved Example#1
Problem: If the weather is sunny, should the player play or not?
Step-1 Frequency table for the Weather Conditions:
Step-3 Applying Bayes Theorem
K-Nearest Neighbor(KNN) Algorithm
• Choosing the value of k:
• If k is too small, the classifier is sensitive to noise points
• If k is too large, the neighborhood may include points from other classes
• Higher values of k provide smoothing that reduces the risk of overfitting due to noise in the training data
• The value of k can be chosen based on error rate measures
• We should also avoid over-smoothing, which happens in the extreme case k = n, where n is the total number of tuples in the training data set
K-Nearest Neighbor(KNN) Algorithm-Solved Example
The distance between the new point and each training point is
calculated.
The distance between the new point and each training point is calculated using either of
the forms
The closest k data points are selected (based on the distance). In this
example, points 1, 5, 6 will be selected if the value of k is 3.
• Select the k value. This determines the number of neighbors we look at when we assign a value to any new observation.
• In our example, for a value k = 3, the closest points are ID1, ID5 and ID6.
• In our example, for a value k = 5, the closest points are ID1, ID4, ID5, ID6 and ID10.
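The procedure (compute distances, pick the k closest points, take a majority vote) can be sketched as follows. The training points are hypothetical and are not the table from the slides; Euclidean and Manhattan distance are assumed here as two common distance forms, and `knn_predict` is an illustrative name.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train, query, k, distance=euclidean):
    """Return (majority label, ids of the k nearest training points)."""
    ranked = sorted(train, key=lambda row: distance(row[1], query))
    nearest = ranked[:k]
    votes = Counter(label for _, _, label in nearest)
    return votes.most_common(1)[0][0], [pid for pid, _, _ in nearest]

# Hypothetical training points (id, features, label); not the slide's data.
TRAIN = [
    (1, (1.0, 1.0), "A"),
    (2, (1.5, 2.0), "A"),
    (3, (2.0, 1.0), "A"),
    (4, (6.0, 6.0), "B"),
    (5, (7.0, 6.5), "B"),
    (6, (6.5, 7.5), "B"),
]

query = (2.0, 2.0)
for k in (1, 3, 5):
    label, ids = knn_predict(TRAIN, query, k)
    print(k, label, ids)
```

Re-running the loop with `distance=manhattan`, or with a different k, shows how both choices can change which neighbors are selected.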
K-Nearest Neighbor(KNN) Algorithm
Advantages:
• It is simple to implement.
• It is robust to noisy training data, provided k is not too small.
• It can be more effective if the training data is large.
Disadvantages:
• The value of k always has to be determined, which can sometimes be difficult.
• The computation cost is high, because the distance from the new point to every training sample has to be calculated.
What is Regression?
Example #2
What does Logistic Regression predict?
• Probability of Y occurring given known values for X(s).
• In Logistic Regression, the dependent variable is transformed into the natural log of the odds.
This is called the logit (short for logistic probability unit).
• Probabilities, which range between 0.0 and 1.0, are first transformed into
odds, which range between 0 and infinity. The log of the odds is modeled as a linear
combination of the input features, and applying the sigmoid function to that linear
combination maps it back to a probability in the range 0 to 1.
• If the probability for group membership in the modeled category is above
some cut point (the default is 0.50), the subject is predicted to be a member
of the modeled group. Example: the customer defaults on payment.
• If the probability is below the cut point, the subject is predicted to be a
member of the other group. Example: the customer does not default on payment.
• For any given case, logistic regression computes the probability that a case
with a particular set of values for the independent variables is a member of
the modeled category.
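The relationship between probability, odds, logit and the cut point can be sketched as below. The weights and bias in the usage lines are hypothetical values chosen only for illustration, and the function names are not from the slides.

```python
import math

def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Natural log of the odds p / (1 - p); the inverse of sigmoid."""
    return math.log(p / (1.0 - p))

def predict(features, weights, bias, cut_point=0.5):
    """Return (probability, predicted class) for one case."""
    z = bias + sum(w * x for w, x in zip(weights, features))  # linear combination
    p = sigmoid(z)
    return p, int(p > cut_point)  # 1 = modeled group, 0 = other group

# Hypothetical coefficients, for illustration only.
prob, label = predict(features=[1.0, 2.0], weights=[0.8, -0.5], bias=0.4)
print(round(prob, 4), label)
```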
Logistic Regression-Solved Example#1
A dataset consists of women and men Instagram users, with a sample size of
1069. Let the probabilities of men and women using Instagram
be $P_{men}$ and $P_{women}$ respectively. The sample proportion of women who
are Instagram users is 61.08%, and the sample proportion for men
is 43.98%. The difference is 0.170951, and the 95% confidence interval is
(0.111429, 0.2292). Establish a logistic regression model that specifies the
relationship between p and x.
$$\text{Odds} = \frac{P_0}{1 - P_0} = \frac{\text{success}}{\text{failure}}$$
Solution
Let x = 1 for women and x = 0 for men, so that $\beta_1$ appears only in the equation for women.
Logistic regression equation for women:
$$\log\left(\frac{P_{women}}{1 - P_{women}}\right) = \beta_0 + \beta_1$$
Logistic regression equation for men:
$$\log\left(\frac{P_{men}}{1 - P_{men}}\right) = \beta_0$$
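Odds for women:
$$\frac{P_0}{1 - P_0} = \frac{0.6108}{1 - 0.6108} = 1.5694$$
Odds for men:
$$\frac{P_0}{1 - P_0} = \frac{0.4398}{1 - 0.4398} = 0.7851$$
Log of odds for women:
$$\log\left(\frac{P_{women}}{1 - P_{women}}\right) = \log(1.5694) = 0.4507 = \beta_0 + \beta_1$$
Log of odds for men:
$$\log\left(\frac{P_{men}}{1 - P_{men}}\right) = \log(0.7851) = -0.2419 = \beta_0$$
Intercept:
$$b_0 = -0.2419$$
Slope:
$$b_1 = \log(\text{odds for women}) - \log(\text{odds for men}) = 0.4507 - (-0.2419) = 0.6926$$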
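The worked values can be reproduced with a short script. This is a sketch that assumes the coding x = 1 for women and x = 0 for men and uses natural logarithms; because intermediate values are not rounded here, the results may differ from the hand calculation in the fourth decimal place.

```python
import math

p_women, p_men = 0.6108, 0.4398  # sample proportions from the example

def odds(p):
    """Odds = P / (1 - P)."""
    return p / (1.0 - p)

b0 = math.log(odds(p_men))         # intercept: log-odds when x = 0 (men)
b1 = math.log(odds(p_women)) - b0  # slope: change in log-odds when x = 1 (women)

def predicted_probability(x):
    """Invert the logit: p = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

print("odds:", round(odds(p_women), 4), round(odds(p_men), 4))
print("b0, b1:", round(b0, 4), round(b1, 4))
print("odds ratio exp(b1):", round(math.exp(b1), 4))
print("p(x=1), p(x=0):", predicted_probability(1), predicted_probability(0))
```

Plugging x = 1 and x = 0 back into the fitted model recovers the two sample proportions, which is a quick sanity check on the coefficients.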
Model Estimation and Evaluation
Principal Component Analysis (PCA)#Solved Example-1
Consider the two dimensional patterns (2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8).
Compute the principal component using PCA Algorithm.
Get data.
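A minimal sketch of the PCA steps for these six patterns in plain Python. Two assumptions are made here: the covariance matrix uses the sample divisor n − 1, and the eigenvalues of the 2×2 covariance matrix are obtained from the closed-form trace/determinant formula rather than a library routine.

```python
import math

points = [(2, 1), (3, 5), (4, 3), (5, 6), (6, 7), (7, 8)]
n = len(points)

# Step 1: mean of each dimension
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n

# Step 2: sample covariance matrix (divisor n - 1 is an assumption)
dx = [x - mean_x for x, _ in points]
dy = [y - mean_y for _, y in points]
var_x = sum(a * a for a in dx) / (n - 1)
var_y = sum(b * b for b in dy) / (n - 1)
cov_xy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)

# Step 3: eigenvalues of [[var_x, cov_xy], [cov_xy, var_y]]
trace = var_x + var_y
det = var_x * var_y - cov_xy ** 2
root = math.sqrt(trace * trace / 4 - det)
lam1, lam2 = trace / 2 + root, trace / 2 - root

# Step 4: unit eigenvector of the largest eigenvalue = first principal component
vx, vy = cov_xy, lam1 - var_x
norm = math.hypot(vx, vy)
pc1 = (vx / norm, vy / norm)

# Step 5: project the centered data onto the first principal component
scores = [a * pc1[0] + b * pc1[1] for a, b in zip(dx, dy)]

print("mean:", (mean_x, mean_y))
print("covariance:", [[var_x, cov_xy], [cov_xy, var_y]])
print("eigenvalues:", lam1, lam2)
print("first principal component:", pc1)
print("projected data:", scores)
```

The variance of the projected data equals the largest eigenvalue, which is why that eigenvector is chosen as the principal component.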
IoT Data Analytics
• IoT data analytics refers to the process of gathering, examining, and interpreting data produced by IoT devices to gain knowledge and make informed decisions.
• It uses a combination of hardware, software, and data science techniques to extract accurate information from the massive data created by IoT devices.
IoT Data Analytics-Components
•Data Collection − IoT devices are embedded with various sensors that collect data on
different parameters such as temperature, humidity, pressure, and motion. This data is
transmitted to a central server or cloud platform for further processing.
•Data Storage − The data generated by IoT devices is massive and needs to be stored
efficiently.
•Data Processing − IoT data analytics involves processing data to extract valuable insights.
To make sure the data is correct, consistent, and prepared for analysis, data processing
procedures including data cleansing, data transformation & data normalization are utilized.
•Data analysis − To find patterns and trends in the data, statistical & machine learning
algorithms are employed.
•Data Visualization − IoT data analytics involves the use of data visualization tools to
present insights and findings in a user-friendly and understandable format. Visualization
tools like dashboards, charts & graphs help users understand the data quickly, so that they
can make informed, practical decisions based on the insights derived from IoT data analysis.
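The data processing step (cleansing, transformation, normalization) can be sketched as below. The readings, the plausible-range limits and the function names are hypothetical; min-max scaling is assumed as one common normalization choice.

```python
def clean(readings, low, high):
    """Data cleansing: drop missing values and values outside a plausible range."""
    return [r for r in readings if r is not None and low <= r <= high]

def fahrenheit_to_celsius(values):
    """Data transformation: convert all readings to one consistent unit."""
    return [(v - 32.0) * 5.0 / 9.0 for v in values]

def min_max_normalize(values):
    """Data normalization: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical temperature readings in °F with a missing value and a sensor glitch.
raw_f = [68.0, None, 71.6, 999.0, 75.2, 69.8]

cleaned = clean(raw_f, low=-40.0, high=140.0)
celsius = fahrenheit_to_celsius(cleaned)
normalized = min_max_normalize(celsius)
print(cleaned, celsius, normalized)
```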
IoT Data Analytics-Challenges
•Data Security − IoT devices generate sensitive data that can be vulnerable to cyber-attacks.
Every organization must make sure that IoT data is stored securely and that only
authorized people can access it.
•Data Privacy − IoT devices collect personal data such as location, health, and behaviour.
Organizations should ensure that all such data is collected and used in compliance
with privacy regulations.
•Data Quality − IoT data can be noisy and inconsistent. Organizations need to ensure that
IoT data is accurate, consistent, and reliable for analysis.
•Scalability − IoT data is generated at a massive scale. Organizations need to ensure that
their IoT data analytics infrastructure can scale to handle large volumes of data.
•Interoperability − IoT devices come from different manufacturers and have different
protocols & standards. All these make it difficult to integrate & analyze data from different
sources. Interoperability challenges can lead to data silos, reduced efficiency, and increased
costs. Organizations need to ensure that their IoT data analytics infrastructure can integrate
data from different sources and platforms seamlessly.
IoT Data Analytics-Applications
• Predictive Maintenance − IoT data analytics is used to predict when equipment is likely to fail. By analyzing
the data generated by sensors embedded in machines, organizations can identify patterns that indicate potential
equipment failure. It enables organizations to schedule maintenance before a failure occurs, reducing
downtime and increasing efficiency.
• Energy Management − IoT data analytics is used to monitor and optimize energy consumption in buildings.
By analyzing data on energy usage, temperature, and occupancy, organizations can identify areas where
energy usage can be reduced. It helps organizations save money on energy costs and reduce their carbon
footprint.
• Supply Chain Optimization − IoT data analytics is used to optimize supply chain operations. By analyzing
data on inventory levels, transportation routes & delivery times, organizations can identify areas where supply
chain processes can be improved. It helps organizations reduce costs and improve customer satisfaction.
• Smart Cities − IoT data analytics is used to make cities more efficient and sustainable. By analyzing
traffic patterns, air quality, and energy usage, cities can identify the areas that need improvement.
• Healthcare − IoT data analytics is used to monitor patients remotely, collect vital signs data & provide
personalized healthcare. By analyzing patient data, healthcare providers can identify patterns that indicate
potential health issues, enabling them to intervene early and provide more effective treatment. IoT data
analytics can also help healthcare providers improve operational efficiency by optimizing resource allocation
and reducing wait times.
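As an illustration of the predictive maintenance idea, the sketch below flags sensor readings that deviate strongly from a baseline window. The vibration values are hypothetical, and the z-score rule with a threshold of three standard deviations is an assumed, deliberately simple stand-in for the pattern-detection models a real deployment would use.

```python
import statistics

def find_anomalies(readings, baseline_size, threshold=3.0):
    """Flag readings after the baseline window that lie more than
    `threshold` standard deviations from the baseline mean."""
    baseline = readings[:baseline_size]
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return [
        (i, value)
        for i, value in enumerate(readings[baseline_size:], start=baseline_size)
        if abs(value - mean) > threshold * stdev
    ]

# Hypothetical vibration amplitudes from one machine; index 8 is a spike.
vibration = [0.50, 0.52, 0.49, 0.51, 0.50, 0.48, 0.51, 0.53, 0.95, 0.52]
alerts = find_anomalies(vibration, baseline_size=6)
print(alerts)   # prints: [(8, 0.95)]
```

An alert like this could be used to schedule maintenance before the equipment actually fails.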
Cloud Computing for IoT
• Cloud Internet of Things (IoT) uses cloud computing services to collect and process data from IoT devices, and to manage the devices remotely.
• The scalability of cloud IoT platforms enables the processing of large amounts of data, as well as artificial intelligence (AI) and analytics capabilities.
• Cloud IoT is a technology architecture that connects IoT devices to servers housed in cloud data centers. This enables real-time data analytics, allowing better, information-driven decision making, optimization, and risk mitigation. Cloud IoT also simplifies management of connected devices at scale.
Cloud IoT is different from traditional, or non-cloud-based IoT in a few key ways:
• Data Storage: the cloud collects IoT data generated by thousands or millions of IoT sensors, with
the data being stored and processed in a central location, whereas in other types of IoT
architectures data may be stored and processed on-premises
• Scalability: cloud IoT is highly scalable, as cloud infrastructure (compute, storage, and
networking resources) can easily handle thousands of devices and process their data across large
systems
• Flexibility: cloud IoT provides a high level of flexibility, as it allows devices to be added or
removed as needed, without having to reconfigure the entire system
• Maintenance: in cloud IoT, the maintenance of servers and networking equipment is handled by
the cloud service provider (CSP), whereas in other types of IoT architectures maintenance may be
the responsibility of the end user
• Cost: cloud IoT can be more cost-effective over the long term, as users only pay for the
resources they actually consume and do not have to invest upfront in their own expensive
compute, storage, and networking infrastructure
• Cloud IoT connects IoT devices – which collect and transmit data – to cloud-based servers via
communication protocols such as MQTT and HTTP and over wired and wireless networks.
These IoT devices can be managed and controlled remotely and integrated with other cloud
services.
• A cloud IoT system typically includes the following elements:
• IoT Devices: physical devices, such as sensors and actuators, that generate and transmit
data to the cloud
• Connectivity: communication protocols and standards used to connect the IoT devices to
the cloud. Examples of protocols include MQTT and HTTP, while examples of standards
are Wi-Fi, 4G/LTE, 5G, Zigbee, and LoRa (long range)
• Cloud Platforms: cloud service providers (CSPs) that offer infrastructure and services to
connect to the IoT devices. Examples include AWS IoT and Azure IoT
• Data Storage: cloud-based storage for data generated by the IoT devices, which can be
housed in repositories such as a database, data warehouse, or data lake
• Application Layer or API: cloud IoT platforms typically provide a native application –
for analytics, machine learning (ML), and visualization – or application programming
interface (API) – for data processing. Usually, applications offer the ability to manage and
monitor the IoT devices for provisioning, software updates, and troubleshooting
• Security: measures put in place to secure the data and IoT devices, such as encryption,
authentication, and access control
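The device-to-cloud path described above can be sketched without committing to a particular client library. In this sketch the topic layout `devices/<id>/telemetry`, the JSON field names and the `publish` callable are all hypothetical; in practice `publish` would wrap whatever MQTT or HTTP client the chosen platform provides.

```python
import json
import time

def build_message(device_id, readings, timestamp=None):
    """Build a (topic, JSON payload) pair for one telemetry sample."""
    topic = f"devices/{device_id}/telemetry"  # hypothetical topic layout
    payload = json.dumps(
        {
            "device_id": device_id,
            "ts": int(timestamp if timestamp is not None else time.time()),
            "readings": readings,
        },
        sort_keys=True,
    )
    return topic, payload

def send(publish, device_id, readings, timestamp=None):
    """Send one sample through `publish`, a caller-supplied callable
    taking (topic, payload), e.g. a thin wrapper around an MQTT or HTTP client."""
    topic, payload = build_message(device_id, readings, timestamp)
    publish(topic, payload)
    return topic, payload

# Stand-in transport for illustration: collect messages in a list.
outbox = []
send(lambda t, p: outbox.append((t, p)), "sensor-42",
     {"temperature_c": 21.5}, timestamp=1700000000)
print(outbox)
```

Keeping the transport behind a callable means the same device code can be pointed at a different protocol or platform without changing how messages are built.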
Cloud Based platforms
• A cloud platform hosts the server hardware and operating system in a web-based data center.
• The platform enables the coexistence of hardware and software, and it offers remote connectivity to compute services at scale.
• Businesses leveraging a cloud platform can remotely access a variety of pay-per-use computing services, including databases, servers, analytics, storage, intelligence, networking, and software.
• Organizations do not have to build and own computing infrastructure or data centres. They only pay for the services they use.
• Cloud platforms permit enterprises to create and test applications, as well as store, retrieve, analyze, and back up data. Companies can also embed business intelligence into operations, stream video or audio, and deliver on-demand software on a worldwide scale.
• Public Cloud Platforms: A public cloud platform is run by a third-party cloud service provider that delivers scalable computing resources via the Internet. Typical examples of public cloud platforms include IBM Bluemix (now IBM Cloud), Microsoft Azure, Google Cloud Platform, and AWS (Amazon Web Services).
• Private Cloud Platforms: Private cloud platforms are managed by an organization's internal IT department. They use infrastructure and resources that already exist at a company's on-premises data center. Because the organization controls the entire environment, private cloud platforms can offer the highest level of control over security.
• Hybrid Cloud Platforms: Hybrid clouds, which combine private and public cloud platforms, provide both scalability and security. They allow enterprises to move applications and data between private and public cloud platforms, offering increased flexibility and better optimization of infrastructure, compliance, and security.
Top Cloud Platforms
1.Amazon Web Services (AWS) IoT Platform
2.Microsoft Azure IoT
3.Google IoT
4.IBM Watson IoT
5.Cisco IoT Cloud Connect
6.ThingsBoard Open-Source IoT Platform
7.Oracle IoT Intelligent Applications
• Platform-as-a-Service (PaaS): PaaS is a cloud computing solution that provides users with a
suite of hardware, software, and resources to develop, deploy, and manage applications
without additional hardware or software investments.
• PaaS is valuable for developers and others tasked with creating custom
applications or integrating existing ones into the cloud environment.
• With PaaS, users can build and run applications without the complexities of
infrastructure provisioning.
Use Cases for PaaS
1.Web Application Hosting: PaaS can host applications requiring frequent updates without managing the
underlying infrastructure. This makes it easier to deploy and scale applications.
2.Mobile App Development: PaaS can be used to develop and deploy mobile applications more quickly, as it
provides access to ready-made components and services.
3.Big Data Analytics: PaaS can process and analyze large amounts of data quickly and cost-effectively, as it
provides access to powerful computing resources.
4.IoT Solutions: PaaS can be used to develop and manage connected devices and applications, as it provides
access to scalable and secure communication infrastructure.
5.DevOps Automation: PaaS can be used to automate development and operations processes, such as
deployment, testing, and monitoring, which helps to ensure faster and more reliable software releases.
• Infrastructure-as-a-Service (IaaS): IaaS is a cloud computing solution that furnishes users with
virtualized computing components such as servers, storage, networks, and operating systems.
• It is optimal for those seeking more control over their infrastructure while avoiding physical
hardware costs.
Use Cases for IaaS
1.Web Hosting: IaaS can host web-based applications and websites, providing users access to the underlying
infrastructure and computing resources.
2.Application Development and Testing: IaaS can be used to develop and test software applications, as it
provides users with access to the underlying infrastructure and computing resources.
3.Database Hosting: IaaS can host databases as it provides users access to the underlying infrastructure and
computing resources.
4.Disaster Recovery: IaaS can be used for disaster recovery, as it allows users to quickly provision additional
resources from the cloud to restore their data and systems.
5.Big Data Analytics: IaaS can store, process and analyze large amounts of data, providing users with access to
the underlying infrastructure and computing resources.
6.IoT Deployment: IaaS can deploy and manage large-scale Internet of Things (IoT) solutions, as it provides
users with access to the underlying infrastructure and computing resources.
• Software-as-a-Service (SaaS): SaaS is a cloud computing solution that provides users with
access to software applications via the internet. These web-based programs can be used from
any device with an internet connection, eliminating the need for local installations. SaaS caters
to individuals and organizations seeking efficient access to specific software programs, enabling
collaboration, scalability, and flexibility without the burden of software management.
Use Cases for SaaS
1.Email and Collaboration: Email and collaboration tools such as Google Apps and Office 365 are popular SaaS
applications for communication and productivity.
2.CRM: Customer relationship management (CRM) tools such as Salesforce and Zendesk provide businesses with
a platform to manage customer data, automate sales and marketing operations, and track customer engagement.
3.E-commerce: E-commerce platforms such as Shopify, BigCommerce, and Magento provide businesses with a
complete solution to create and manage their online stores.
4.Project Management: Project management and task management tools such as Asana, Trello, and Basecamp
are popular SaaS applications used to manage projects, tasks, and timelines.
5.Accounting: Accounting and bookkeeping tools such as QuickBooks Online and Xero provide businesses with
an easy way to track financials and keep their books in order.
6.Human Resources: Human resource management (HRM) tools such as BambooHR and Zenefits provide
businesses with a platform to manage employee data and automate HR processes.
ML for Cloud IoT Analytics
• IoT and machine learning deliver insights otherwise hidden in data for rapid, automated responses and improved decision making. Machine learning for IoT can be used to project future trends, detect anomalies, and augment intelligence by ingesting image, video and audio.
• Machine learning can help demystify the hidden patterns in IoT data by analyzing massive volumes of data using sophisticated algorithms.
• Machine learning inference can supplement or replace manual processes with automated systems using statistically derived actions in critical processes.
• With machine learning for IoT, you can:
• Ingest and transform data into a consistent format
• Build a machine learning model
• Deploy this machine learning model on cloud, edge and device
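The three steps listed above (ingest and transform, build, deploy) can be sketched end to end with the standard library. Everything in this sketch is an assumption made for illustration: the record field names, the deliberately trivial "model" (a mean ± k·stdev band), and JSON as the serialized form that a cloud, edge, or device runtime would load.

```python
import json
import statistics

def ingest(raw_records):
    """Ingest and transform: map mixed-format records to one consistent schema."""
    rows = []
    for rec in raw_records:
        value = rec.get("value", rec.get("val"))
        if value is None:
            continue  # skip records without a reading
        device = rec.get("device", rec.get("id"))
        rows.append({"device": str(device), "value": float(value)})
    return rows

def build_model(rows, k=3.0):
    """Build: learn a normal operating band from historical values."""
    values = [r["value"] for r in rows]
    mean, stdev = statistics.mean(values), statistics.stdev(values)
    return {"low": mean - k * stdev, "high": mean + k * stdev}

def export_model(model):
    """Deploy: serialize the model so another runtime can load it."""
    return json.dumps(model)

def score(serialized_model, value):
    """Inference on the target: True if the value is outside the learned band."""
    model = json.loads(serialized_model)
    return not (model["low"] <= value <= model["high"])

# Hypothetical records from two device firmware versions with different field names.
raw = [
    {"device": "a", "value": "20.1"},
    {"id": "b", "val": 19.9},
    {"device": "a", "value": 20.0},
    {"id": "b", "val": None},
    {"device": "c", "value": 20.2},
]
artifact = export_model(build_model(ingest(raw)))
print(score(artifact, 20.1), score(artifact, 25.0))   # prints: False True
```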
Benefits of ML for Cloud IoT Analytics
• Simplify machine learning model training: Cumulocity IoT Machine Learning is designed
to help you build new machine learning models quickly and easily. AutoML support
allows the right machine learning model to be chosen for you based on your data, whether that
is operational device data captured on the Cumulocity IoT platform or historical data stored in
big data archives.
• Flexibility to use your data science library of choice: A wide variety of data
science libraries (e.g., TensorFlow®, Keras, Scikit-learn) are available for developing machine
learning models. Cumulocity IoT Machine Learning allows models to be developed in the data
science frameworks of your choice. These models can be transformed into industry-standard
formats using open source tools and made available for scoring within Cumulocity IoT.
• Rapid model deployment to operationalize machine learning quickly: Whether created
within Cumulocity IoT Machine Learning itself or imported from other data science
frameworks, model deployment into production environments is possible wherever needed in
one click, either in the cloud or at the edge. Operationalized models can be easily monitored
and updated if underlying patterns shift. Additionally, pretrained and verified models are
available for immediate model deployment to accelerate adoption.
• Prebuilt connectors for operational & historical datastores: Cumulocity IoT Machine
Learning provides easy access to data residing in operational and historical datastores for
model training. It can retrieve this data on a periodic basis and route it through an automated
pipeline to transform the data and train a machine learning model. Data can be hosted on
Amazon® S3 or Microsoft® Azure® Data Lake Storage, as well as local data storage, and
retrieved using prebuilt Cumulocity IoT DataHub connectors.
• Integration with Cumulocity IoT Streaming Analytics: Cumulocity IoT Machine Learning
enables high-performance scoring of real-time IoT data within Cumulocity IoT Streaming
Analytics. Cumulocity IoT Streaming Analytics provides a “Machine Learning” building block
in its visual analytics builder that allows the user to invoke a specified machine learning model
to score real-time data. This provides a no-code environment to integrate machine learning
models with streaming analytics workflows.
• Notebook integration: Jupyter Notebook, a de facto standard in data science, provides an
interactive environment across programming languages. Notebooks can be used to prepare and
process data and to train, deploy and validate machine learning models. This open-source web
application is integrated with Cumulocity IoT Machine Learning.
ML for Cloud IoT Analytics-Challenges
ML-based cloud computing also poses some challenges for IoT applications, such as latency, bandwidth, reliability, and interoperability.
• Latency means the delay between sending and receiving data, which can affect the performance and responsiveness of IoT devices and applications.
• Bandwidth means the amount of data that can be transferred over a network, which can limit the data volume and quality of IoT devices and applications.
• Reliability means the availability and consistency of cloud services, which can be affected by network failures, outages, or disruptions.
• Interoperability means the ability of different cloud services and IoT devices to communicate and work together, which can be hindered by incompatible standards, protocols, or formats.
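A rough back-of-the-envelope model shows how latency and bandwidth together limit what a device can send to the cloud. The formula below is a simplifying assumption (one-way latency plus serialization time, ignoring protocol overhead, retransmissions and congestion), and the link figures are hypothetical.

```python
def transfer_time_s(payload_bytes, bandwidth_bps, latency_s):
    """Approximate time to deliver one payload: latency + size / bandwidth."""
    return latency_s + (payload_bytes * 8) / bandwidth_bps

def max_messages_per_second(payload_bytes, bandwidth_bps):
    """Upper bound on message rate imposed by bandwidth alone."""
    return bandwidth_bps / (payload_bytes * 8)

# Hypothetical link: 2 Mbit/s uplink with 50 ms latency.
bandwidth, latency = 2_000_000, 0.05

small = transfer_time_s(200, bandwidth, latency)        # 200-byte sensor reading
image = transfer_time_s(1_000_000, bandwidth, latency)  # 1 MB camera frame
print(small, image, max_messages_per_second(200, bandwidth))
```

For the small reading the latency dominates, while for the camera frame the bandwidth dominates, which is one reason large media are often processed at the edge rather than sent to the cloud.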