Lesson 1 Introduction To Data Science
Data Science
Module 1
Week 1
Overview of Data Science
Introduction to the R Language
Module Objectives
At the end of this module, students must be able to:
1. Explain the meaning of and differentiate the concepts of
big data, data analytics and data science;
2. Differentiate the domain areas of statistics and data
science;
3. Execute basic commands in R;
4. Perform basic data processing using both Excel and R.
Big Data
What is Big Data?
- refers to extremely large volumes of data that cannot be processed effectively with traditional applications (usually raw, unaggregated data that most often cannot fit in the memory of a single computer)
- immense volumes of data, both unstructured and structured, that inundate a business on a day-to-day basis
- something that can be analyzed for insights that lead to better decisions and strategic business moves
- (Gartner) “high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation”
The Ten-V’s of Big Data
Common Types of Big Data
Data Science
- deals with both unstructured and structured data
- a field that comprises everything related to data cleansing, preparation, and analysis
- the combination of statistics, mathematics, programming, problem-solving, and capturing data in ingenious ways
- the umbrella of techniques used when trying to extract insights and information from data
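The cleansing, preparation, and analysis steps named above can be illustrated with a minimal R sketch. The data frame `survey` and its columns are hypothetical, made up only for illustration.

```r
# Hypothetical raw data: one missing age and inconsistent labels for sex.
survey <- data.frame(
  age = c(18, 19, NA, 20, 18),
  sex = c("F", "m", "M", "f", "F"),
  stringsAsFactors = FALSE
)

# Cleansing: drop rows that contain a missing value.
clean <- survey[complete.cases(survey), ]

# Preparation: standardize the labels and store them as a factor.
clean$sex <- factor(toupper(clean$sex), levels = c("F", "M"))

# Analysis: average age per group.
mean_age <- tapply(clean$age, clean$sex, mean)
mean_age
```

Each step works on the result of the previous one, which is why the cleaned copy is kept in a separate object (`clean`) rather than overwriting the raw data.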
The Data Science Process
Data Analytics
- the science of examining raw data with the purpose of drawing conclusions about that information
- involves applying an algorithmic or mechanical process to derive insights (e.g., running through a number of data sets to look for meaningful correlations among them)
- used in a number of industries to allow organizations and companies to make better decisions, as well as to verify or disprove existing theories or models
- its focus lies in inference, which is the process of deriving conclusions that are solely based on what the researcher already knows
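As a rough illustration of looking for meaningful correlations, the sketch below uses `mtcars`, a data set that ships with R. It is a stand-in chosen for convenience, not a data set from this lesson.

```r
# mtcars is a built-in R data set; it is used here only as a stand-in.
vars <- mtcars[, c("mpg", "wt", "hp")]

# Pairwise correlation coefficients between the selected columns.
corr <- cor(vars)
round(corr, 2)

# Correlation between fuel efficiency (mpg) and weight (wt),
# together with a significance test.
result <- cor.test(mtcars$mpg, mtcars$wt)
result$estimate
result$p.value
```

A coefficient near 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or none. A correlation on its own does not show that one variable causes the other.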
Analytics Value Chain
Why R?
Download: http://www.r-project.org/
Getting Started
This appears when you open R.
- Click “File”
- Click “New Script”
To start, clear the console: press “Ctrl+L” on your keyboard.
Now we have a better workspace.
Reading Stored Data
Reading csv file
Type the following in the script panel:
data <- read.csv("D://data/fish.csv")
data
summary(data)
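Because the file path above may not exist on your computer, the sketch below first writes a small made-up file to a temporary location and then reads it back the same way. The column names and values are hypothetical and are not the contents of the actual fish.csv.

```r
# Create a small stand-in CSV file (hypothetical columns and values).
path <- tempfile(fileext = ".csv")
writeLines(c("species,length_cm",
             "tilapia,21.5",
             "bangus,30.2",
             "tilapia,19.8"), path)

# Read the file into a data frame; the first line is taken as the header.
fish <- read.csv(path)

fish            # prints the whole table
summary(fish)   # per-column summary
str(fish)       # column names and types
```

Typing the name of an object prints its contents, while `summary()` condenses each column into a few descriptive values.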
After typing a command in the script panel, run it by clicking the Run button on the toolbar or by pressing “Ctrl+R”.
- At this point, you have asked R to read your data file and assign it to the name “data” (you may use any name or label).
- The symbol “<-” is the assignment operator in R (the equivalent of “=” in MATLAB).
Now, try executing the other two commands. What are the outputs?
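The assignment operator described above can also be tried on its own. In this short sketch the names `x`, `y`, and `scores` are arbitrary and chosen only for illustration.

```r
x <- 5                    # assign the value 5 to the name x
y = 5                     # "=" also assigns at the top level; "<-" is the usual R style
scores <- c(80, 92, 75)   # any valid name may be used for an object

x == y                    # "==" compares two values; it does not assign
mean(scores)              # use the stored object in a computation
```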
Samples of Basic Commands in R
Exercises
1. Perform a quick survey among members of the class on the following
variables:
a. age
b. sex
c. income of parents
d. educational background of parents
e. number of siblings in the family
f. grade profile in mathematics and English subjects enrolled in the
previous quarter