Lecture 1 & 2
Objective:
To reinforce the concepts developed in theory with
experiments on data preparation and data analysis using different
techniques in Python.
Class
Assignments 10%
Quizzes 10%
Midterm Exam 30%
Project 10%
Final Exam 40%
General Guidelines
Internet of Things / M2M
Health/Scientific Computing
• Reporting
• Monitoring (fine-grained)
• Exploration
• Finding Patterns
• Root Cause Analysis
• Closed-loop Control
• Model construction
• Prediction
Data vs. Information
Data
• Raw facts
• Have not yet been processed to reveal their meaning to the end user
• Building blocks of information
Information
• Produced by processing raw data to reveal its meaning
• Requires context
• Bedrock of knowledge
• Should be accurate, relevant, and timely to enable good decision making
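The distinction above can be illustrated with a minimal Python sketch. The sensor readings, the location, and the alert threshold are all made-up values chosen for illustration, and `to_information` is a hypothetical helper, not part of any library. The raw strings are data; the summary produced by processing them in context is information.

```python
# Raw data: unprocessed facts, as they might arrive from a (hypothetical) sensor log.
raw_readings = ["21.5", "22.0", "23.5", "25.0"]

# Context that gives the numbers meaning (assumed values, for illustration only).
context = {"location": "server room", "unit": "°C", "alert_above": 24.0}


def to_information(readings, context):
    """Process raw readings into a summary that can support a decision."""
    values = [float(r) for r in readings]  # parse the raw strings into numbers
    average = sum(values) / len(values)
    return {
        "location": context["location"],
        "average": average,
        "unit": context["unit"],
        # True if any single reading exceeded the alert threshold
        "alert": max(values) > context["alert_above"],
    }


info = to_information(raw_readings, context)
print(info)
```

Without the context dictionary, the four strings say nothing about where they were measured or whether action is needed; with it, the same values support a decision.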
Data, Information, and Beyond
• Information: Data that has been “cleaned” of errors and further processed so that it is easier to measure, visualize, and analyze for a specific purpose.
• Knowledge: “How” is the information derived from the collected data relevant to our goals? “How” are the pieces of this information connected to other pieces to add more meaning and value? And, maybe most importantly, “how” can we apply the information to achieve our goal?
• Wisdom: Answers questions such as “why do something” and “what is best”. In other words, wisdom is knowledge applied in action.
• If data and information are a look back at the past, knowledge and wisdom are associated with what we do now and what we want to achieve in the future.
DATA SCIENCE
Data Science and Big Data
• Data-driven science
• Interdisciplinary field
• Extract knowledge or insight from data in various forms
Data Science Workflow
Data Science vs Data Analysis vs Data Engineering
e.g., Baseball
• How to best measure an individual player’s skill, value, or performance?
• What is the trajectory of a player’s performance as they mature and age?
• To what extent does batting performance correlate with the position played?
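As a first step toward the last question, batting performance can be grouped by position and compared. The sketch below uses invented records (not real player statistics) and a hypothetical helper `batting_by_position`. A difference in group averages is only a starting point and does not by itself measure the strength of any relationship.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (position, batting average). Not real player data.
players = [
    ("catcher", 0.240),
    ("catcher", 0.260),
    ("shortstop", 0.270),
    ("shortstop", 0.290),
    ("outfield", 0.300),
    ("outfield", 0.280),
]


def batting_by_position(records):
    """Return the mean batting average for each position."""
    grouped = defaultdict(list)
    for position, average in records:
        grouped[position].append(average)
    return {position: round(mean(values), 3) for position, values in grouped.items()}


summary = batting_by_position(players)
for position, average in summary.items():
    print(f"{position}: {average:.3f}")
```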
Data Science Life Cycle
• Understanding the Problem: It all starts with understanding the problem, the questions we are asking, and the answers we are trying to find in the dataset at hand.
• Data Acquisition: As the name suggests, this step is about retrieving the data, with the help of data engineers where required, and consolidating all of the data needed to answer the question or solve the problem.
• Data Wrangling (Preparation): Data wrangling uses domain knowledge to preprocess data. It involves looking for missing values and asking business questions such as why they are missing. It also shapes the dataset so that it is suitable for visualization and for the later steps in the life cycle.
• Data Exploration: Data exploration uses visualization and statistical measures to check whether the questions we asked at the beginning are being answered.
• Feature Engineering and Selection: A preprocessing step before modeling in both Machine Learning and Deep Learning. We will look into these fields in the coming sections. Its steps are similar to those of data wrangling, with the addition of algorithms for feature selection and transformation.
• Modeling: Modeling is the process that uncovers the meaning of the data. It captures the underlying trends and behavior of the data in a model that can be used for predictive analytics, as described in the previous section.
• Deployment: After we build the model, we deploy it in the most efficient and optimized manner so that people can use it in the real world. It can be deployed in mobile applications and web applications.
• Monitoring: After we have deployed the model, we will want to monitor it. Monitoring involves familiarizing the model with new data and tracking the number of requests it receives. It also involves making changes to the analysis and starting over if required.
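A few of the steps above (wrangling, exploration, modeling, and a stand-in for deployment) can be sketched in plain Python. This is a toy illustration under stated assumptions, not a recommended pipeline: the observations are invented, missing values are simply dropped, and the model is an ordinary least-squares line fitted by hand. The names `fit_line` and `predict` are hypothetical helpers.

```python
# Hypothetical observations (x, y); None marks a missing value.
raw = [(1, 2.0), (2, 4.0), (3, None), (4, 8.0), (5, 10.0)]

# Data wrangling: drop rows whose y value is missing.
# (In practice, one would first ask why the value is missing.)
clean = [(x, y) for x, y in raw if y is not None]

# Data exploration: simple summary statistics.
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
print(f"rows kept: {len(clean)}, mean x: {mean_x}, mean y: {mean_y}")


def fit_line(xs, ys):
    """Modeling: fit y = slope * x + intercept by ordinary least squares."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = fit_line(xs, ys)


def predict(x):
    """Stand-in for deployment: expose the fitted model as a function."""
    return slope * x + intercept


print(predict(3))  # estimate for the x value whose y was missing
```

In a real project each step would be more involved, and monitoring would track how `predict` performs as new observations arrive.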
Applications of Data Science
Nate Silver
Commerce & Retail: Big data and data science are used to optimize business processes
and to support profitable decision making.
• Descriptive: A set of techniques for reviewing and examining the data set(s)
to understand the data and describe what happened or is happening.