Chapter 1

The document outlines a course on Data Science Fundamentals with Python, focusing on K-means clustering and basic Python programming. It introduces key concepts of data science, including types of data, analytics methods, and machine learning techniques. The course aims to equip learners with the skills to analyze data and derive actionable insights for real-world applications.

Uploaded by

yosefmuluye42


Data Science Fundamentals with Python

Code: ITec 5032

By: Tinsae D.
Course description

• The objective of this course is to introduce you to one of the emerging technologies: data science. There are many data science techniques, and this course introduces one of the most commonly used basic techniques, K-means clustering, from several perspectives (machine learning, mathematics, statistics, and programming).
• In parallel, you will be introduced to the basics of Python programming for data science.
Chapter 1: Foundations of Data Science
1.1. Introduction - Data Science
Chapter Objectives:
The objective of this chapter is to gently introduce you to data science through some real-world examples of where it is used, and by highlighting some of the main concepts involved:

• How data science might be applied to real-world situations.
• How the K-means clustering algorithm works.
• What we mean by data.
• What we mean by machine learning.
• How to use Python programming for data science.
Data science
• We are living in the age of “data science and advanced analytics”, where almost everything in our daily lives is digitally recorded as data.
• The data can be structured, semi-structured, or unstructured, and its volume increases day by day.
• Data science is typically a “concept to unify statistics, data analysis, and their related methods” to understand and analyze actual phenomena with data. The outcome can be a discovery, prediction, service, suggestion, insight into decision-making, thought, model, paradigm, tool, or system.
… Data science
• With our lives being increasingly digitized, through constant smartphone use and electronic payments, we have all become data-generating machines. This data says a lot about you.
• When gathered from large populations, this data becomes even more valuable. It gives the companies and governments collecting it the ability not only to analyze what we have done, but also to predict what we are about to do.
Data science

• Data science is all about extracting intelligence from data.
• Data science draws on methods from statistics and computer science to process the vast amounts of data that we generate in our daily lives and to turn this into something meaningful and valuable.
• Whatever you do in today's world, you generate data.
… Data science
• Most of us have no idea exactly how our data is being used. The algorithms used by social media companies, banks, and health insurers can seem like mysterious black boxes.
• What are the different applications of data science? One of the most commonly used is data clustering, using the K-means algorithm.
What do we mean by data?
• The word "data" is often used interchangeably with "information".
• With respect to data science, “data can refer to a collection of factual information that can be used by a computer as a basis for calculation.”
• Data can be represented either as a number, such as an age or a price, or as a category, such as hair color or type of fruit.
… Data

• Data becomes particularly interesting when we are dealing with large volumes of it. Take information on the hair color and age of a handful of people. With only a few examples, not much can be said. But if this is scaled up to 10,000 people, then patterns emerge in the data which can be useful to, for example, a marketing company that is trying to market new hair care products.
• The company might use this to decide which age group it should market its products to.
Types of data
• There are two broad types of data:
i. Categorical and
ii. Numeric.
i. Categorical
• Categorical data is when a textual description or label is used to represent specific categories of objects. Categories include things like
primary colors: red, green, and blue; or
fruit: banana, apple, orange.
ii. Numeric
• Numeric data can be continuous: something that is measured on a continuous scale, like air temperatures at the North Pole or the weight of a cow.
• Numeric data can also be discrete: something that is countable, like the number of beans in a jar, or the number of people who survived the sinking of the Titanic.
• Ultimately, all data is stored on a computer as discrete binary numbers. This means that when we are working with categorical data or continuous numeric data, the computer is actually mapping this to some discrete value behind the scenes.
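To make the idea of mapping categories to discrete values concrete, here is a minimal Python sketch. The fruit list and the particular integer codes are only illustrative assumptions; real tools may choose a different mapping.

```python
# Hypothetical categorical data: a list of fruit labels.
fruits = ["banana", "apple", "orange", "apple", "banana"]

# Give every distinct label a small integer code.
# sorted() makes the mapping deterministic: apple=0, banana=1, orange=2.
codes = {label: index for index, label in enumerate(sorted(set(fruits)))}

# Replace each label with its code. This is the kind of discrete value
# a computer can work with behind the scenes.
encoded = [codes[label] for label in fruits]

print(codes)
print(encoded)
```

Note that these codes are just names for the categories. The fact that orange has a larger code than apple does not mean that an orange is "more" than an apple.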
Check point

Note: You can select more than one.

Question 1
• Which of the following is best represented as categorical data?
o Volvo, Citroen, Honda, Toyota
o 1.02, 0.30, 0.03, 12.10
o The length of a swimming pool
o A person's age in months
… Check point
Question 2
• Which of the following might best be represented by continuous numeric data?
o Types of fruit
o Cost of coffee
o A, B, C, D, E
o Height of a bird in flight
o 1.03, 2.01, 13/2, 19, 101.10, 1/3
Data science
• Data science is typically a “concept to unify statistics, data analysis, and their related methods” to understand and analyze actual phenomena with data. The outcome can be a discovery, prediction, service, suggestion, insight into decision-making, thought, model, paradigm, tool, or system.
• The popularity of “data science” is increasing day by day, as shown in the next figure of Google Trends data over the last 5 years.
In the figure, the average popularity score is 71 for machine learning, 60 for data science, 30 for data analytics, and 12 for data mining. This shows how popular data science has become, and that machine learning, a recent advanced data analytics technology used in data science, is more popular still.
Data science
• Usually, data science is the field of applying advanced analytics methods and scientific concepts to derive useful business information from data.
• Advanced analytics is a step forward in offering a deeper understanding of data and helping to analyze granular data, while basic analytics offers a description of data in general.
Data science: Types of analytics

• In the field of data science, several types of analytics are popular:

i. "Descriptive analytics", which answers the question of what happened;
ii. "Diagnostic analytics", which answers the question of why it happened;
iii. "Predictive analytics", which predicts what will happen in the future; and
iv. "Prescriptive analytics", which prescribes what action should be taken.

Although the area of “data science” is huge, the main focus is on deriving useful insights through advanced analytics, where the results are used to make smart decisions in various real-world application areas.
…. Types of analytics
Data science
• Neural network or deep learning analysis can provide deeper knowledge about data, and thus can be used to develop data-driven intelligent applications.
• More specifically, regression analysis, classification, clustering analysis, association rules, time-series analysis, sentiment analysis, behavioral patterns, anomaly detection, factor analysis, log analysis, and deep learning, which originated from the artificial neural network, are playing a major role.
Data science related terms
• “Data analysis” refers to the processing of data by conventional (e.g., classic statistical, empirical, or logical) theories, technologies, and tools for extracting useful information and for practical purposes.
• “Data analytics”, on the other hand, refers to the theories, technologies, instruments, and processes that allow for an in-depth understanding and exploration of actionable data insight. Statistical and mathematical analysis of the data is the major concern in this process.
Data science related terms
• “Data mining” is also referred to as knowledge mining from data, knowledge extraction, knowledge discovery from data (KDD), data/pattern analysis, data archaeology, and data dredging. It is the process of discovering interesting patterns and knowledge from large amounts of data.
• “Big data” refers to data that is “massive, high dimensional, heterogeneous, complex, unstructured, incomplete, noisy, and erroneous”. Several unique features, including volume, velocity, variety, veracity, and value (the 5Vs), together with complexity, are used to understand and describe big data.
Data science related terms

• “Machine learning”, a branch of artificial intelligence (AI), is one of the major techniques used in advanced analytics, and it can automate analytical model building. It is based on the premise that systems can learn from data, recognize trends, and make decisions with minimal human involvement.
• “Deep learning” is a subfield of machine learning concerned with algorithms, called artificial neural networks, that are inspired by the structure and function of the human brain.
…… Data science
• Unlike the above data-related terms, “data science” is an umbrella term that encompasses advanced data analytics, data mining, machine learning and deep learning modeling, and several other related disciplines like statistics, to extract insights or useful knowledge from datasets and transform them into actionable business strategies.
• From the disciplinary perspective, data science can be defined as “a new interdisciplinary field that synthesizes and builds on statistics, informatics, computing, communication, management, and sociology to study data and its environments to transform data to insights and decisions by following a data-to-knowledge-to-wisdom thinking and methodology.”
Data science
• How can data science play a significant role in real-world business processes?
How data science can play a significant role …
• Understanding business problems: This involves getting a clear understanding of the problem. To understand and identify the business problems, data scientists formulate relevant questions while working with end-users and other stakeholders.
• Understanding data: Real-world data sets are often noisy, have missing values or inconsistencies, or have other data issues that need to be handled effectively. The first step in data understanding could be to establish what data is available and how it aligns with the business problem, then what data would be needed and the best ways to acquire it.
How data science can play a significant role …
• Data pre-processing and exploration: This examines a broad data collection to discover initial trends, attributes, points of interest, etc. in an unstructured manner in order to construct meaningful summaries of the data. It includes visualizing and interpreting the data through graphical representations such as charts, plots, and histograms, and using data summarization and visualization to audit the quality of the data.
• Machine learning modeling and evaluation: Once the data is prepared for building the model, data scientists design a model, algorithm, or set of models to address the business problem. Model building depends on the type of analytics required. Data scientists typically separate the given dataset into training and test subsets, usually in the ratio of 80:20. The held-out test subset is used to observe whether the model performs well on data it was not trained on, which guides efforts to maximize model performance. Model validation and assessment metrics include error rate, accuracy, true positives, false positives, true negatives, false negatives, precision, recall, F-score, ROC (receiver operating characteristic) curve analysis, and applicability analysis.
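The 80:20 split and a few of the metrics listed above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `train_test_split` and `classification_metrics` and the small label lists are made up for this example and are not taken from any particular library.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def classification_metrics(actual, predicted, positive=1):
    """Count TP/FP/TN/FN and derive accuracy, precision, recall and F-score."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }

# 100 hypothetical records split 80:20.
train, test = train_test_split(range(100))
print(len(train), len(test))

# Hypothetical true labels and model predictions for 8 test records.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(actual, predicted)
print(metrics)
```

The shuffle uses a fixed seed so that the split is repeatable. In practice the seed and the split ratio are choices the data scientist makes.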
More on ML

• “Advanced analytics” can be defined as the autonomous or semi-autonomous analysis of data or content using advanced techniques and methods to discover deeper insights, make predictions, or produce recommendations, where machine learning-based analytical modeling is considered the key technology in the area.
• A wide range of methods, such as regression and classification analysis, association rule analysis, time-series analysis, behavioral analysis, log analysis, and so on, can be applied.
ML

Fig. A general structure of a machine learning based predictive model.
… Machine Learning
• Machine learning is the term used to describe a series of processes in which a computer learns from evidence, or from lots of examples of data, to help it perform certain data-based tasks.
• Common to all machine learning algorithms is a training step. Training is where the computer learns something about the world or a particular problem, based on data drawn from that world.
… Machine Learning

• Training may allow the computer to build some internal representation, or model, of that world.
• Alternatively, the computer can be trained to search for patterns in the data to help structure the world.
• The outcome of training is an algorithm that can be used for a variety of tasks such as predicting future events, automatically recognizing objects, or structuring data in a manageable way.
Examples of machine learning
Example 1 (regression)
• Machine learning can be used to predict the future. Take earthquake prediction. This has important applications in predicting natural disasters and helping people plan to minimize the impact.
• When trained on information such as the time, location, and magnitude of historical earthquakes in a region, a machine learning algorithm may be used by geologists to work out the probability of another earthquake occurring at a certain time and place in the future. This sort of machine learning, where the output is a continuous value such as a probability, is known as regression.
Example 2 (classification)
• Machine learning can also be used to recognize and classify previously unseen objects.
• Given a large database of possible road signs, a machine learning algorithm in a self-driving car may be able to correctly recognize a stop sign in an unfamiliar location, or recognize a stop sign that is drawn slightly differently from the others it has seen.
• This information can be used to direct the car to take appropriate action, such as stopping. This sort of machine learning, where the output is a discrete label (here, a stop sign), is known as classification.
Example 3 (clustering)

• Machine learning can also be used to structure lots of data into manageable chunks or clusters.
• Information from thousands of people's shopping habits, for example, can be used to link groups of those people into specific clusters for more targeted marketing. Clustering is an example of unsupervised learning. It allows us to find patterns of behavior in data, based on similarities in that data.
• The same approach may make use of data drawn from social media posts to automatically cluster people into groups of similar political affiliation, or use sequences of information from samples of DNA to cluster people into groups with similar genetics.
• In summary, machine learning is really just about building algorithms that can help a computer to learn from a body of data so that it may make sense of, or make predictions about, new and previously unseen data.
• Put simply, machine learning is when a machine learns from examples.
1.3. Supervised vs. Unsupervised Learning
• Machine learning algorithms fall into two broad categories:
i. supervised learning and
ii. unsupervised learning.
• What do we mean by supervised?
Consider a parent teaching an infant the difference between dogs and cats. Every time the child sees a dog, the parent will point out "dog", and similarly for a cat.
…. i. Supervised learning

• All that is needed are lots of examples of data: images of animals that have been pre-labelled as either cat or dog by the child's supervisor or parent.
• Supervised learning is essentially this: providing the computer with lots of training data (images of dogs and cats in this example) alongside the class labels that we have assigned (either dog or cat).
…. i. Supervised learning

• When completely new data is then presented, the trained algorithm can infer the most likely corresponding class or value. The trained algorithm will make a decision on whether it is a dog or a cat. This is an example of classification.
• Alternatively, the algorithm can give some measure indicating the degree of dogishness or catishness. This is an example of regression, where a continuous-valued output or a probability is given.
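The dog-versus-cat idea can be illustrated with a very small supervised learner. The sketch below assumes each animal is described by two made-up features (weight and height) instead of an image, and it uses a simple nearest-class-mean rule. The data, the function names, and the `dog_score` measure are all hypothetical choices for illustration, not a standard method for image recognition.

```python
import math

# Hypothetical labelled training data: (weight in kg, height in cm) -> label.
training = [
    ((4.0, 25.0), "cat"),
    ((5.0, 23.0), "cat"),
    ((3.5, 24.0), "cat"),
    ((20.0, 55.0), "dog"),
    ((30.0, 60.0), "dog"),
    ((25.0, 58.0), "dog"),
]

def class_means(examples):
    """Training step: compute the average feature vector of each class."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(axis) / len(rows) for axis in zip(*rows))
        for label, rows in grouped.items()
    }

def classify(means, features):
    """Classification: return the label whose class mean is closest."""
    return min(means, key=lambda label: math.dist(means[label], features))

def dog_score(means, features):
    """Continuous output in [0, 1]: higher means closer to the dog mean."""
    d_dog = math.dist(means["dog"], features)
    d_cat = math.dist(means["cat"], features)
    return d_cat / (d_dog + d_cat) if d_dog + d_cat else 0.5

means = class_means(training)
new_animal = (22.0, 50.0)  # previously unseen example
print(classify(means, new_animal))
print(dog_score(means, new_animal))
```

`classify` returns a discrete label, while `dog_score` returns a continuous value. The score is only a rough distance-based measure and should not be read as a calibrated probability.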
ii. Unsupervised learning
• Unsupervised learning is when we don't know the structure of the data beforehand.
• We don't use any labels. In our child learning example, the parent never tells the child what it is they are looking at.
• The child may be exposed to lots of dogs and cats, but they have to learn the similarities and differences between examples of the creatures based on some other criteria. Over time, the child may well learn the traditional dichotomy of dog versus cat. But equally, they could form a different clustering, for example furry versus non-furry. They might not even be limited to just two classes or clusters. Perhaps the data suggests that three or more clusters may be appropriate.
… ii. Unsupervised learning
• This could happen, for example, if the similarity criterion were color of fur. This division of data into groups based on some measure of similarity is why this type of unsupervised learning is referred to as data clustering.
• Data clustering is a very powerful tool and exemplifies many of the most important aspects of machine learning and data science.
• In this course, we will explore clustering in greater detail, and in particular the most common and useful clustering algorithm, K-means clustering.
Check point
Select all that apply (note that you can select more than one).
Question 1
• Which of the following are commonly associated with supervised learning?
o classification
o clustering
o regression
Check point

Question 2
• Which of the following is true?
o Clustering organizes data using pre-selected labelling information.
o Regression is a supervised method for modelling and predicting continuous-valued data.
o Classification is the process of making a decision based on data, and returning a categorical or discrete output.
Neural Networks and Deep Learning

• Deep learning is a form of machine learning that uses artificial neural networks to create a computational architecture that learns from data by combining multiple processing layers, such as the input, hidden, and output layers.
• The key benefit of deep learning over conventional machine learning methods is that it often performs better in a variety of situations, particularly when learning from large datasets.
• The most common deep learning algorithms are the multilayer perceptron (MLP), the convolutional neural network (CNN or ConvNet), and the long short-term memory recurrent neural network (LSTM-RNN).
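To give a feel for what "combining multiple processing layers" means, here is a minimal sketch of data flowing through an input layer, one hidden layer, and an output layer. The weights and biases are hypothetical fixed numbers chosen only for illustration; in a real network they would be learned from data during training, which this sketch does not show.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical parameters: 2 inputs -> 3 hidden neurons -> 1 output neuron.
hidden_weights = [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [[0.7, -0.4, 0.2]]
output_biases = [0.05]

def forward(inputs):
    """Pass the inputs through the hidden layer, then the output layer."""
    hidden = dense(inputs, hidden_weights, hidden_biases)
    return dense(hidden, output_weights, output_biases)

prediction = forward([1.0, 2.0])
print(prediction)
```

Each layer transforms the output of the previous one. Deep networks simply stack many more such layers.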
Neural Networks …..

Fig. An artificial neural network modeling with multiple processing layers.
How data science can play a significant role

• Data product and automation: A data product is typically the output of any data science activity. A data product, whether data-enabled or data-guided, can be a discovery, prediction, service, suggestion, insight into decision-making, thought, model, paradigm, tool, application, or system that processes data and generates results. Businesses can use the results of such data analysis.
1.4. K-Means Clustering

• Data clustering is a method of unsupervised machine learning, where data is separated into groups or clusters based on some similarity measure.
• K-means clustering is probably the most common example of such a method.
• To show how K-means clustering works, let's start with an example.
… K-means clustering (Example)

• Imagine a one-dimensional axis, a line representing income. Each dot shown on this line represents a population of people with that level of income.
• Say we want to uncover some pattern in this data. Specifically, can we find out something about the relationships between these people based only on their level of income? Using traditional statistics, we can compute something like the overall average income.
• But the average income fails to capture the fact that there are clearly two groupings of income here, which we might label as wealthy and everyone else.
• The K-means algorithm is pretty good at finding such patterns without us having to tell it beforehand what they are.
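A few lines of Python show why the average alone can be misleading. The income values below are hypothetical (in thousands) and were chosen so that they form two obvious groups.

```python
from statistics import mean

# Hypothetical incomes in thousands: six modest values and three large ones.
incomes = [18, 20, 22, 25, 27, 30, 180, 200, 220]

overall = mean(incomes)

# How far is the closest actual income from the overall average?
nearest_gap = min(abs(x - overall) for x in incomes)

print(overall)
print(nearest_gap)
```

The average falls in the empty gap between the two groups, so nobody in the data actually earns anything close to it.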
Steps of K-means
• K-means has five basic steps and works as follows.
• Step one.
First, we select the number of clusters we want to look for. This is the k in K-means. Here we choose k equals two. The algorithm then randomly selects k points on our data axis. Note that it doesn't matter that these points do not necessarily correspond to existing data. These points are called our data centers or centroids.
• Step two.
The distance from each data point to each of our k centroids is calculated. In this case, distance is simply a measure of the difference in income between the points.
• Step three.
Clusters are formed by assigning each data point to either centroid one or two, depending on which is closest.
• Step four.
This is the update step. The average value calculated over the members of each cluster is then set as the new centroid. We discard the previous centroid value.
• Finally, step five.
We then repeat steps two to four, recalculating the centroids until eventually the centroid positions do not change. When the centroids remain stable like this, the algorithm is said to have converged. In our example, we were able to discover two clusters. But if we were to set k equal to three, looking for three clusters, we could find another pattern in the data, which we could then map to wealthy, average, and poor.
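The five steps above can be sketched for one-dimensional data as follows. This is a minimal illustration, not a production implementation: the function name `kmeans_1d`, the hypothetical income values, the fixed random seed, the iteration cap, and the rule of keeping the old centroid when a cluster ends up empty are all assumptions made for this example.

```python
import random

def kmeans_1d(points, k, seed=0, max_iterations=100):
    rng = random.Random(seed)
    # Step 1: pick k random starting centroids somewhere on the data axis.
    centroids = [rng.uniform(min(points), max(points)) for _ in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        # Steps 2 and 3: measure the distance from each point to each
        # centroid and assign the point to the closest one.
        clusters = [[] for _ in range(k)]
        for x in points:
            nearest = min(range(k), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Step 4: the mean of each cluster becomes its new centroid
        # (an empty cluster keeps its old centroid).
        updated = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        # Step 5: stop once the centroids no longer move (convergence).
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

# Hypothetical incomes in thousands, forming two obvious groups.
incomes = [18, 20, 22, 25, 27, 30, 180, 200, 220]
centroids, clusters = kmeans_1d(incomes, k=2)
print(sorted(centroids))
print(clusters)
```

Calling `kmeans_1d(incomes, k=3)` instead looks for three groups in the same data.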
• The most useful cases of K-means involve more than one dimension or feature.
• The same basic principle can be applied to two dimensions.
• The distance measure between points here might be a simple Euclidean distance.
• It turns out that K-means can be applied to any number of dimensions, provided there is sufficient data to train the algorithm.
• K-means converges to what is known as a local minimum. This basically means that although the algorithm seems to have found the best groupings, a better result may yet be found if the algorithm were to be started again with different initial centroid positions. It turns out that the selection of initial centroids in step one is crucial to finding a good solution.
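One common way to reduce the risk of a poor local minimum is to run K-means several times from different starting centroids and keep the run whose points lie closest to their centroids. The sketch below does this for points with any number of dimensions, using Euclidean distance. It makes some assumptions that are not in the text above: the starting centroids are drawn from the data points themselves, the quality of a run is measured by the sum of squared distances to the centroids, and the function names and the two-dimensional data are hypothetical.

```python
import math
import random

def kmeans(points, k, seed, max_iterations=100):
    """One K-means run; points are tuples of equal length."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k data points
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        # Assign every point to its nearest centroid (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster, axis by axis.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

def within_cluster_error(centroids, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(
        math.dist(p, c) ** 2
        for c, members in zip(centroids, clusters)
        for p in members
    )

def best_of(points, k, restarts=10):
    """Run K-means from several seeds and keep the lowest-error result."""
    runs = [kmeans(points, k, seed) for seed in range(restarts)]
    return min(runs, key=lambda run: within_cluster_error(*run))

# Hypothetical two-dimensional data with two well separated groups.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
centroids, clusters = best_of(points, k=2)
print(sorted(centroids))
print(within_cluster_error(centroids, clusters))
```

With data this clearly separated, every restart is likely to find the same answer. Restarts matter more when the groups overlap or when k is larger.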
Standardization
• Having good data to begin with is crucial to the success of any data science analysis.
• If the data going in is bad, then the algorithms won't work as well as you'd like.
• Much of the work carried out by data scientists is spent cleaning and adjusting the data to make it usable.
• For K-means, it is particularly important that the data used is comparable across different features. Continuous-valued data such as income, times, or weights can be compared using the Euclidean distance.
… Standardization
• However, when there are several features with different ranges, for example income levels between 1,000 and a million and weights from 20 kilograms to 150 kilograms, it is important to scale the data so that the features are comparable. Usually, this means adjusting all values to fall between zero and one. This scaling of the data is sometimes referred to as standardization (rescaling to the range zero to one is also known as min-max normalization).
• Categorical feature data like oranges, apples, or cat, dog, are not as easily handled by K-means. If the categories fall on some kind of scale, like very dog, slightly dog, less dog, these may be converted into a number range like 1, 0.8, 0.6, 0.4, and K-means can then be used.
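Scaling values to fall between zero and one can be sketched as follows. The function name `min_max_scale` and the sample incomes and weights are hypothetical, and the choice to map a constant feature to all zeros is just one reasonable convention.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]

# Hypothetical features with very different ranges.
incomes = [1_000, 50_000, 250_000, 1_000_000]
weights_kg = [20, 60, 85, 150]

scaled_incomes = min_max_scale(incomes)
scaled_weights = min_max_scale(weights_kg)

print(scaled_incomes)
print(scaled_weights)
```

Without scaling, a Euclidean distance computed on raw values would be dominated by income, because its numbers are thousands of times larger than the weights. After scaling, both features contribute on the same zero-to-one range.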
Check point
• Question 1
These are the final cluster assignments from a run of K-means on 12 data points (the red marks). What value of K was used in the algorithm?
• Question 2
K-means clustering is run on the data shown below. During the first round of the algorithm, the cluster centroids are placed in the positions indicated by the large blue and green crosses.

o (a,b,c,d) and (e,f,g)
o (a,b,c) and (d,e,f,g)
o (a,b,c) and (d) and (e,f,g)
o (a,b) and (c,d) and (e,f) and (g)
1.5. Real world data set

• This publicly available real-world dataset is based on two sources of information: the World Bank income inequality index and the Gallup poll of happiness, covering over 120 countries.

Assignment (Group)
List some real-world data sets available to data scientists, together with their locations.
Assignment #1 - 5% individual - select all that apply
Question 1
• The K-means algorithm is an example of:
o unsupervised learning
o supervised learning
o data clustering
o classification
Question 2
• The "k" in K-means represents:
o the number of data points in a cluster
o the number of clusters to find in a dataset
o none of the above
o the number of steps in the K-means algorithm
Question 3
• Before running K-means, data scaling is applied to the data in order to:
o standardize features to make them more comparable.
o make the plots look nicer.
o remove noisy data.
Question 4 (select the two that apply)
• Supervised machine learning includes:
o clustering
o regression
o none of the above
o classification
Question 5
• K-means is run on a dataset. One of the clusters contains 6 items of data with the following values: 2, 3, 2, 3, 1, 1. What number corresponds to the data center, or centroid, of this cluster? (Justify your answer.)
End of Chapter One

IT Project Assignment
7 pages
Final 1 Inctional Requirment
No ratings yet
Final 1 Inctional Requirment
7 pages
Portifolio Format and Components
No ratings yet
Portifolio Format and Components
12 pages
Data Science Book
No ratings yet
Data Science Book
383 pages
Applied - Data - Science MODULE 1 SEM8
No ratings yet
Applied - Data - Science MODULE 1 SEM8
16 pages
4 5861438557154971483
No ratings yet
4 5861438557154971483
2 pages
Practice Questions II
No ratings yet
Practice Questions II
14 pages
Unit-1 - Introduction To Data Science
No ratings yet
Unit-1 - Introduction To Data Science
17 pages
DS B&V-1
No ratings yet
DS B&V-1
30 pages
Data Science Basics
No ratings yet
Data Science Basics
25 pages
2 Marks Foundations of Data Science
No ratings yet
2 Marks Foundations of Data Science
13 pages
Applied Data Analysis
No ratings yet
Applied Data Analysis
128 pages
Ds1 - Shahana
No ratings yet
Ds1 - Shahana
36 pages
Data Mining: Concepts, Fundamentals And Applications
From Everand
Data Mining: Concepts, Fundamentals And Applications
Enrico Guardelli
No ratings yet
Data Science Essentials: Machine Learning and Natural Language Processing
From Everand
Data Science Essentials: Machine Learning and Natural Language Processing
Angel Gabaldon
No ratings yet



End of Chapter One
