Chapter 1
By: Tinsae D.
Course description
Chapter 1: Foundations of Data Science
1.1. Introduction - Data Science
Chapter Objectives:
This chapter gently introduces you to Data Science through
real-world examples of where it is used, and by highlighting
the main concepts involved.
Fig. A general structure of a machine learning based predictive model.
… Machine Learning
Machine learning is the term used to describe
a series of processes in which a computer
learns from evidence, or from many examples
of data, to help it perform certain
data-based tasks.
Common to all machine learning algorithms is
a training step. Training is where the computer
learns something about the world or a
particular problem, based on data drawn from
that world.
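As a rough illustration of a training step, the sketch below fits a straight line to a handful of examples and then uses it to predict a new value. The data (hours studied versus exam score) and the function names `train` and `predict` are hypothetical and chosen only for this example; ordinary least squares is just one simple way to "learn from data", not the only one.

```python
# Hypothetical toy data, made up for illustration only:
# hours studied -> exam score.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]


def train(xs, ys):
    """Training step: estimate a slope and intercept from example data
    using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Use the learned parameters on an input not seen during training."""
    slope, intercept = model
    return slope * x + intercept


model = train(hours, scores)   # the computer "learns" from the examples
print(predict(model, 5.0))     # prediction for 5 hours of study
```

Here the learned "knowledge" is just two numbers (slope and intercept). Larger models follow the same pattern: parameters are estimated from data during training and then reused to make predictions.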
… Machine Learning
Question 2
Which of the following is true:
o Clustering organizes data using pre-selected
labelling information.
o Regression is a supervised method for modelling
and predicting continuous valued data.
o Classification is the process of making a decision
based on data, and returning a categorical or
discrete output.
Neural Networks and Deep Learning
Fig. An artificial neural network model with multiple processing layers.
How data science can play a significant role
K-Means …..
1.4. K-Means Clustering
Assignment (Group)
List some real-world data sets available to data scientists, along with
where each can be found.
Assignment #1 - 5%, individual (select all that apply)
Question 1
The K-means algorithm is an example of:
o unsupervised learning
o supervised learning
o data clustering
o classification
Question 2
The "k" in K-means represents:
o the number of data points in a cluster
o the number of clusters to find in a dataset
o none of the above
o the number of steps in the K-means algorithm
Question 3
Before running K-means, data scaling is applied
to the data in order to:
o standardize features to make them more comparable.
o make the plots look nicer.
o remove noisy data.
Question 4 (select the two that apply)
Supervised machine learning includes:
o clustering
o regression
o none of the above
o classification
Question 5
K-means is run on a dataset. One of the clusters
contains 6 items of data with the following values:
2, 3, 2, 3, 1, 1. What number corresponds to the
data center, or centroid, of this cluster? (Justify
your answer.)