Lesson 1 Introduction To Data Science

This document provides an overview of the objectives and content covered in the first week of a data science module. The module introduces key concepts related to big data, data analytics, and data science, and it distinguishes between statistics and data science. The document also demonstrates how to execute basic commands and perform basic data processing in R. Exercises help students apply these skills, for example by surveying classmates and summarizing the data in tabular and cross-tabular forms.

Uploaded by

Andrei Viloria
Copyright © All Rights Reserved

Introduction to Data Science
Module 1
Week 1
Overview of Data Science
Introduction to the R Language
Module Objectives
At the end of this module, students must be able to:
1. Explain the meaning of and differentiate the concepts of
big data, data analytics and data science;
2. Differentiate the domain areas of statistics and data
science;
3. Execute basic commands in R;
4. Perform basic data processing using both Excel and R.
Big Data
What is Big Data?
- refers to enormous volumes of data that cannot be processed effectively with traditional applications (usually raw, unaggregated data that is often impossible to store in the memory of a single computer)
- immense volumes of data, both structured and unstructured, that inundate a business on a day-to-day basis
- data that can be analyzed for insights that lead to better decisions and strategic business moves
- (Gartner) “high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation”
The Ten V’s of Big Data
Common Types of Big Data
Data Science
- deals with both structured and unstructured data
- a field that comprises everything related to data cleansing, preparation, and analysis
- the combination of statistics, mathematics, programming, problem-solving, and capturing data in ingenious ways
- the umbrella of techniques used when trying to extract insights and information from data
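The three stages named above (cleansing, preparation, analysis) can be sketched in a few lines of R. This is a minimal illustration under stated assumptions: the `raw` scores, the use of -1 as an invalid entry, and the pass mark of 80 are all hypothetical.

```r
# Hypothetical raw scores: NA means "not recorded", -1 is an invalid entry
raw <- data.frame(
  id    = 1:6,
  score = c(78, 85, NA, 92, -1, 88)
)

# Cleansing: recode impossible scores as missing, then drop incomplete rows
raw$score[which(raw$score < 0)] <- NA
clean <- na.omit(raw)

# Preparation: derive a new variable (hypothetical pass mark of 80)
clean$passed <- clean$score >= 80

# Analysis: summarize the prepared data
mean(clean$score)     # average of the valid scores
table(clean$passed)   # counts of FALSE (did not pass) and TRUE (passed)
```

Using `which()` keeps the recoding step safe when the column already contains `NA`, because `which()` returns only the positions where the condition is `TRUE`.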
The Data Science Process
Data Analytics
- the science of examining raw data with the purpose of drawing conclusions about that information
- involves applying an algorithmic or mechanical process to derive insights (e.g., running through several data sets to look for meaningful correlations among them)
- used in many industries to help organizations and companies make better decisions and to verify or disprove existing theories or models
- its focus lies in inference, the process of deriving conclusions based solely on what the researcher already knows
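Looking for correlations, the example given in the second bullet, can be sketched with base R's `cor()` function. The `study` data below (hours studied vs. exam score) is hypothetical and serves only to show the mechanics.

```r
# Hypothetical data: hours studied vs. exam score for eight students
study <- data.frame(
  hours = c(1, 2, 3, 4, 5, 6, 7, 8),
  score = c(52, 55, 61, 64, 70, 74, 79, 85)
)

# Pearson correlation between two variables (ranges from -1 to 1)
r <- cor(study$hours, study$score)
r

# Correlation matrix for every pair of numeric columns
cor(study)
```

A value of `r` close to 1 indicates a strong positive linear relationship; it does not by itself show that studying causes higher scores.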
Data Analytics
Analytics Value Chain
Why R?
- It is free and powerful.
- It can produce professional graphs.
- Programming is easy.

Download: http://www.r-project.org/
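As a small sketch of R's graphing, the code below draws a labelled histogram and saves it as a PDF file. The exam scores are simulated (hypothetical), and the output file name `scores_histogram.pdf` is an arbitrary choice written to a temporary folder.

```r
# Simulated data: 200 hypothetical exam scores
set.seed(42)
scores <- round(rnorm(200, mean = 75, sd = 10))

# Open a PDF graphics device in a temporary folder
out_file <- file.path(tempdir(), "scores_histogram.pdf")
pdf(out_file, width = 7, height = 5)

# Draw a labelled histogram
hist(scores,
     breaks = 15,
     col = "steelblue",
     border = "white",
     main = "Distribution of exam scores",
     xlab = "Score",
     ylab = "Number of students")

dev.off()   # close the device so the file is written
```

Leaving out the `pdf()` and `dev.off()` lines draws the same histogram in an on-screen graphics window instead.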
Getting Started
This window appears when you open R.
Getting Started
- Click “File”.
- Click “New Script”.

You now see two work pads:
- the console, where commands are executed;
- the script, where commands are written.
Getting Started
Let us first arrange the workspace.
- Click “Windows”.
- Click “Tile Vertically”.
Getting Started
Now we have a better workspace.

To start, clear the console: press “Ctrl+L” on your keyboard.
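Reading Stored Data
Reading a CSV file
Type the following in the script panel:
data <- read.csv("D:/data/fish.csv")   # read the CSV file into a data frame named data
data                                   # print the data frame
summary(data)                          # summary statistics for each column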
After typing a command in the script panel, click the “Run line or selection” button or press “Ctrl+R”.

- At this point you have asked R to assign the contents of your data file to the name “data” (you may use any name or label).
- The symbol “<-” is the assignment operator in R (the equivalent of “=” in MATLAB).

Now, try executing the two other commands. What is the output of each?
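If `fish.csv` is not on your machine, the same commands can be tried on a small inline data set. This sketch assumes hypothetical fish measurements and uses the `text =` argument, which `read.csv()` passes on to `read.table()`, so the CSV content can come from a string instead of a file. The last lines show that `<-`, `=`, and `->` all assign a value.

```r
# Hypothetical fish measurements, standing in for the contents of fish.csv
csv_text <- "species,length_cm,weight_g
bass,30.5,450
trout,25.0,300
bass,32.0,500
perch,20.5,150"

# read.csv() can read from a string via text = ... instead of a file path
fish <- read.csv(text = csv_text)

fish            # prints every row of the data frame
str(fish)       # shows each column's name and type
summary(fish)   # min, quartiles, mean, and max for the numeric columns

# Three equivalent ways to assign a value in R
a <- 5
b = 5
5 -> c_val
```

`<-` is the conventional style in R scripts; `=` also works at the top level but is mainly used for naming function arguments, as in `text = csv_text`.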
Samples of Basic Commands in R
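A few basic R commands of the kind typically run at this stage can be sketched as follows. This is an illustrative selection, not a reproduction of the original slides; the vector `x` and the `students` data frame are hypothetical.

```r
# Arithmetic at the console
2 + 3 * 4        # multiplication before addition: 14
10 / 4           # division: 2.5
10 %/% 4         # integer division: 2
10 %% 4          # remainder (modulo): 2
sqrt(16)         # square root: 4

# Vectors: c() combines values
x <- c(4, 8, 15, 16, 23, 42)
length(x)        # number of elements
mean(x)          # average
sum(x)           # total
x[2]             # indexing starts at 1, so this is 8
x[x > 10]        # keep only elements greater than 10

# Sequences and repetition
seq(1, 10, by = 3)   # 1 4 7 10
rep("a", 3)          # "a" "a" "a"

# Data frames: one row per observation, one column per variable
students <- data.frame(name = c("Ana", "Ben", "Cy"), age = c(19, 21, 20))
nrow(students)               # number of rows
students$age                 # one column as a vector
students[students$age > 19, ]   # rows where age is above 19

# Workspace and help
ls()             # list objects in the workspace
# ?mean          # opens the help page for mean()
```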
Exercises
1. Perform a quick survey among members of the class on the following
variables:
a. age
b. sex
c. income of parents
d. educational background of parents
e. number of siblings in the family
f. grades in the mathematics and English subjects taken in the previous quarter

Provide a summary of the data in tabular and cross-tabular forms.
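One way to produce the requested summaries in R is sketched below with base functions `table()`, `addmargins()`, `prop.table()`, and `tapply()`. The five responses are hypothetical placeholders; replace them with the data actually collected from the class.

```r
# Hypothetical responses from five classmates (replace with real survey data)
survey <- data.frame(
  age      = c(18, 19, 18, 20, 19),
  sex      = c("F", "M", "F", "F", "M"),
  siblings = c(2, 0, 1, 3, 1),
  math     = c("A", "B", "A", "C", "B")
)

# Tabular summary: frequency of each category of one variable
table(survey$sex)

# Cross-tabular summary: counts for every combination of two variables
xt <- table(sex = survey$sex, math = survey$math)
xt

# Add row and column totals (labelled "Sum")
addmargins(xt)

# Proportions within each row (each sex sums to 1)
prop.table(xt, margin = 1)

# Average number of siblings for each sex
sib_by_sex <- tapply(survey$siblings, survey$sex, mean)
sib_by_sex
```

The same tables can be built in Excel with a PivotTable; the R version has the advantage that rerunning the script updates every summary when new responses are added.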
