
Unit - I: Topic - 1

The document provides a learning plan for an introduction to data science and basic data analytics using R session. The session aims to: 1) Understand the role of data science and big data in analytics. 2) Familiarize students with basic R syntax and commands for importing different file formats. 3) Help students understand major data types, R's graphical user interfaces, and attributes for data import and export.

Uploaded by

301047 ktr.it.17

Unit_I

Session Learning Plan & Materials

Topic_1: Introduction of Data Science – Basic Data Analytics using R


Session _1:

Session Learning Rationale :

The purpose of learning this Session on Introduction of Data Science and Basic Data Analytics using R is to:
1. Understand data science and the characteristics that are fueling big data analytics.
2. Convey the importance of R as a data analysis tool and familiarize students with basic R syntax.

Session Learning Outcomes:

At the end of this class hour on Introduction of Data Science and Basic Data Analytics using R, my students will be able to:

1. Understand the role of data science and big data in analytics
2. Understand the major types of data involved in analytics
3. Understand the need for R in statistical computing and analytics, along with basic commands in R

Columns: Conceive (What Students Absorb), Design (What Students would design/solve), Implement (What Students would implement / test), Operate (What Students would demonstrate / prove)

| Time (min) | Topic | Conceive / Design / Implement / Operate |
|---|---|---|
| 10 | Introduction of Data Science | Role of data science in analytics; about driving data deluge |
| 10 | Big data & its Characteristics | Understanding attributes standing out as defining big data characteristics |
| 10 | Data Structures | Different types of data structures; why data growth is increasingly unstructured |
| 10 | Introduction to R | Programming features of R; understand packages and GUIs of R |
| 10 | Basic R Commands and Importing the different file formats in R | From a comma-delimited text file; from Excel; from CSV. Able to import any kind of file format into R |

Basic / Key Learning Material: (Descriptive/ Concept Map/ Mind-Map etc., to be prepared by faculty member)

Quiz: Choose all Correct / Multiple Choice / True or False / Match the following

| No. | Question | Answer 1 | Answer 2 | Answer 3 | Answer 4 |
|---|---|---|---|---|---|
| 1. | Which of the following is one of the key data science skills? | a) Statistics | b) Machine Learning | c) Data Visualization | d) All of the Mentioned |
| 2. | If I execute the expression x <- 4 in R, what is the class of the object 'x' as determined by the 'class()' function? | NUMERIC | INTEGER | REAL | COMPLEX |
| 3. | The R language is a dialect of which of the following programming languages? | S | C | LISP | SAS |
| 4. | In R Language the following are all atomic data types EXCEPT | integer | logical | data frame | character |
| 5. | Suppose I have a vector x <- 1:4 and y <- 2:3. What is produced by the expression x + y? | A numeric vector with the values 3, 5, 3, 4. | An integer vector with the values 3, 5, 5, 7. | A numeric vector with the values 1, 2, 5, 7. | An error. |
| 6. | R has how many atomic classes of objects? | 1 | 2 | 3 | 5 |
| 7. | Numbers in R are generally treated as _______ precision real numbers. | Single | Double | Real | All of the mentioned |
| 8. | R objects can have attributes, which are like ________ for the object. | metadata | features | Expression | None |
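The R behaviour that several quiz items rely on can be checked directly at the console. The following is a minimal sketch, not part of the original plan; the variable names are chosen only for illustration.

```r
# Class of a number typed without the L suffix (quiz item 2)
x <- 4
class(x)     # "numeric"
typeof(x)    # "double": numbers default to double precision (quiz item 7)
class(4L)    # "integer": the L suffix requests an integer

# Vector recycling (quiz item 5): the shorter vector is reused
a <- 1:4
b <- 2:3
a + b        # computes 1+2, 2+3, 3+2, 4+3

# A data frame is not an atomic type (quiz item 4)
is.atomic(c(TRUE, FALSE))
is.atomic(data.frame(v = 1:2))

# Attributes act as metadata attached to an object (quiz item 8)
m <- 1:6
attr(m, "dim") <- c(2L, 3L)   # adding a dim attribute turns the vector into a matrix
attributes(m)
```

Running the lines one at a time in the console makes it easy to compare each result against the answer options.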

Short Question & Answer Hints


1. Define the term Data Science.
Data science, also known as data-driven science, is an interdisciplinary field about scientific processes and systems to
extract knowledge or insights from data in various forms, either structured or unstructured.
2. List out the characteristics of Big Data.
Volume, Variety, Velocity
3. With appropriate syntax explain how to read a .txt file in R.
df <- read.table("<Filename>.txt", header = TRUE)
4. Explain the three main attributes of big data characteristics
5. Compare the differences between BI and Data Science
6. Explain about data import in R language.
R Commander is one GUI that can be used to import data in R. To start the R Commander GUI, the user loads the package by typing library(Rcmdr) into the
console. There are 3 different ways in which data can be imported with it:
• Users can select the data set in the dialog box or enter the name of the data set (if they know).
• Data can also be entered directly using the editor of R Commander via Data->New Data Set. However, this works well when
the data set is not too large.
• Data can also be imported from a URL or from a plain text file (ASCII), from any other statistical package or from the
clipboard.
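Data can also be imported from the console without a GUI. The sketch below assumes a small hypothetical data frame (sample_df) written to temporary files so that it runs anywhere; in class the paths would point to real files. Reading Excel workbooks needs an add-on package (readxl is one commonly used option) and is left out here so the example depends only on base R.

```r
# Hypothetical sample data, written to temporary files for the demonstration
sample_df <- data.frame(id = 1:3, score = c(90.5, 72, 88))
txt_path <- tempfile(fileext = ".txt")
csv_path <- tempfile(fileext = ".csv")

# Export: tab-delimited text and comma-separated values
write.table(sample_df, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)
write.csv(sample_df, csv_path, row.names = FALSE)

# Import: header = TRUE tells read.table that the first line holds column names
from_txt <- read.table(txt_path, header = TRUE, sep = "\t")
from_csv <- read.csv(csv_path)   # read.csv assumes a header row by default

str(from_txt)
str(from_csv)
```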

Topic_2: R Graphical User Interfaces – Data Import and Export – Attribute and Data Types
Session _2:

Session Learning Plan & Materials Session 2


Session Learning Rationale :
The purpose of learning this Session on R Graphical User Interfaces - Data Import and Export - Attribute and Data Types, is to
understand basic R programming, its data types and attributes with respect to Data analytics.
Session Learning Outcomes:
At the end of this class hour on R Graphical User Interfaces - Data Import and Export - Attribute and Data Types, my students will be able to:
1. Understand the basics of R Programming, and its data types
2. Import and export data using R programming
Columns: Conceive (What Students Absorb), Design (What Students would design/solve), Implement (What Students would implement / test), Operate (What Students would demonstrate / prove)

| Time (min) | Topic | Conceive / Design / Implement / Operate |
|---|---|---|
| 05 | Recap / Introduction: Introduction of Data Science and Big Data Analytics using R | Oral Questioning / Discussion |
| 10 | Sub-topic 1 (Lecture/Demonstration): R graphical Interfaces, Data Import and Export | Window panes of RStudio |
| 10 | Sub-topic 1 (Participative / Verify) | Activity: Basic commands in R |
| 10 | Sub-topic 2 (Lecture/Demonstration): Attributes and Data Types | NOIR Attributes |
| 10 | Sub-topic 2 (Participative / Verify) | Activity: Code for an Example |
| 05 | Conclusion & Summary | Review Activity: Short questions and answers |

Basic / Key Learning Material: (Descriptive/ Concept Map/ Mind-Map etc., to be prepared by faculty member)

PPT

Quiz: Choose all Correct / Multiple Choice / True or False / Match the following

| No. | Question | Answer 1 | Answer 2 | Answer 3 | Answer 4 |
|---|---|---|---|---|---|
| 1. | The __________ function returns a list of all the formal arguments of a function. | a) formals() | b) funct() | c) formal() | d) All of the mentioned |
| 2. | You can check to see whether an R object is NULL with the _________ function. | a) is.null() | b) is.nullobj() | c) null() | d) empty() |
| 3. | What will be the output of the following code? f <- function(a, b) { a^2 }; f(2) | a) 4 | b) 3 | c) 2 | d) All of the mentioned |
| 4. | Which function in R language is used to find out whether the means of 2 groups are equal to each other or not? | a) t.tests() | b) t.equals() | c) t.compare() | d) None |
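The functions named in this quiz can be tried out in a few lines. This is an illustrative sketch only; the two sample groups are made-up numbers, not data from the course.

```r
# A function with two formal arguments, of which only the first is used
f <- function(a, b) {
  a^2
}

names(formals(f))   # the formal arguments of f
f(2)                # works without b: arguments are evaluated lazily, only when used

# Testing for NULL
is.null(NULL)       # TRUE
is.null(list())     # FALSE: an empty list is not NULL

# Comparing the means of two groups (hypothetical measurements)
group1 <- c(5.1, 4.9, 5.6, 5.8, 6.0)
group2 <- c(6.2, 6.8, 7.1, 6.5, 7.0)
result <- t.test(group1, group2)
result$p.value      # a small p-value suggests the means differ
```

Note that the base R function for this test is t.test(), which is spelled differently from every option offered in question 4.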

Short Question & Answers


1. Describe read.csv() and write.csv() functions in RStudio
2. Define NOIR
3. Write the R code to create a two dimensional matrix. Give a suitable example
4. What is the difference between a matrix and a dataframe?
5. What is f(3) where:
y <- 5
f <- function(x) { y <- 2; y^2 + g(x) }
g <- function(x) { x + y }
Why?

Answer: 12. In f(3), y is 2, so y^2 is 4. When evaluating g(3), y is the globally scoped y (5) instead of the y that is locally scoped to f,
so g(3) evaluates to 3 + 5 or 8. The rest is just 4 + 8, or 12.

6. If I have a data.frame df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c(7, 8, 9))...

a.) How do I select the c(4, 5, 6)?


b.) How do I select the 1?
c.) How do I select the 5?
d.) What is df[, 3]?
e.) What is df[1,]?
f.) What is df[2, 2]?
Answers: (a) df[[2]] or df$b, (b) df[[1]][[1]] or df$a[[1]], (c) df[[2]][[2]] or df$b[[2]], (d) the vector 7 8 9, (e) a one-row data frame holding 1, 4 and 7, (f) 5.
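Both answers above can be verified by running the code from the questions. The sketch below reuses the definitions given in questions 5 and 6 unchanged and only adds comments.

```r
# Question 5: lexical scoping
y <- 5
f <- function(x) { y <- 2; y^2 + g(x) }
g <- function(x) { x + y }   # y is looked up where g was defined, not where it is called
f(3)                         # 2^2 + (3 + 5)

# Question 6: selecting from a data frame
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c(7, 8, 9))
names(df)      # the unnamed third column receives an automatically generated name
df$b           # (a) the whole column b
df[[1]][[1]]   # (b) first element of the first column
df$b[[2]]      # (c) second element of column b
df[, 3]        # (d) third column, returned as a vector
df[1, ]        # (e) first row, returned as a one-row data frame
df[2, 2]       # (f) row 2, column 2
```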

Long Question & Answer Hints


1. Explain NOIR attributes in Detail

2. Write short notes on Arrays and Matrices in R programming with suitable example.

3. What is a contingency table? Explain with an example.
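As a starting point for questions 2 and 3, the sketch below builds a matrix, a three-dimensional array and a contingency table. The survey responses are invented for illustration.

```r
# A 2 x 3 matrix; values are filled column by column by default
m <- matrix(1:6, nrow = 2, ncol = 3)
m
m[2, 3]        # element in row 2, column 3

# An array generalises a matrix to more than two dimensions
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)

# A contingency table counts how often combinations of categories occur
# (hypothetical survey responses)
gender <- c("F", "M", "F", "M", "F", "M")
smoker <- c("yes", "no", "no", "no", "yes", "yes")
tab <- table(gender, smoker)
tab
```

A matrix holds values of a single type, whereas a data frame can mix column types, which is the main point to bring out when comparing the two.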

Session Home Learning Materials

Topic_3: Descriptive Statistics


Session _3:

Session Learning Rationale :


The purpose of learning this Session on Descriptive Statistics is to understand how to summarize data in R using basic statistics and generic functions.
Session Learning Outcomes:
At the end of this class hour on Descriptive Statistics my students will be able to:

1. Understand the functions for viewing and summarizing data, basic statistics, and correlation
2. Understand generic functions
3. Generate descriptive statistics: averages, data ranges, and quartiles
Columns: Conceive (What Students Absorb), Design (What Students would design/solve), Implement (What Students would implement / test), Operate (What Students would demonstrate / prove)

| Time (min) | Topic | Conceive / Design / Implement / Operate |
|---|---|---|
| 05 | Recap / Introduction: Introduction of Statistics | Oral Questioning / Discussion |
| 20 | Sub-topic 1 (Lecture/Demonstration): Functions on view, summary, basic statistics, correlation | Activity: Basic function commands using R |
| 10 | Sub-topic 1 (Participative / Verify): Usage of Generic functions | Activity: Generic functions using R |
| 20 | Sub-topic 2 (Lecture/Demonstration): Summary statistics on Mean vs Median, Standard Deviation, Quartiles, Min/Max, Correlations between variables | Example that demonstrates summary statistics |
| 05 | Conclusion & Summary | Review Activity: Short questions and answers |

Session Learning Plan & Materials Descriptive Statistics Session 3

Session Home Learning Materials


Basic / Key Learning Material: (Descriptive/ Concept Map/ Mind-Map etc., to be prepared by faculty member)

PPT,PDF
Quiz: Choose all Correct / Multiple Choice / True or False / Match the following

| No. | Question | Answer 1 | Answer 2 | Answer 3 | Answer 4 |
|---|---|---|---|---|---|
| 1 | Which of the following is one of the key data science skills? | a) Statistics | b) Machine Learning | c) Data Visualization | d) All of the Mentioned |
| 2 | The advantage of using a model-based approach is that it is more closely tied to the model performance. | True | False | | |
| 3 | Which of the following is performed by a Data Scientist? | a) Define the question | b) Create reproducible code | c) Challenge results | d) All of the Mentioned |
| 4 | Which of the following is a characteristic of Processed Data? | a) Data is not ready for analysis | b) All steps should be noted | c) Hard to use for data analysis | d) None of the mentioned |

Short Question & Answers


7. Define correlation
8. State the R functions for mean, median, and range.
9. State the syntax for finding the covariance between variables.
10. What is the difference between sapply and lapply? When should you use one versus the other? Bonus: When should you use vapply?
Answer: Use lapply when you want the output to be a list, and sapply when you want the output simplified to a vector or a matrix where possible.
Generally vapply is preferred over sapply because you can specify the output type of vapply (but not sapply). The drawback
is that vapply is more verbose and harder to use.

11. Explain t-tests in R.

In R, the t.test() function produces a variety of t-tests. The t-test is one of the most common tests in statistics and is used to determine whether
the means of two groups are equal to each other.
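The difference between lapply, sapply and vapply described in answer 10 is easiest to see side by side. This is a small sketch using a hypothetical list of marks.

```r
# Hypothetical marks per subject, stored as a list of numeric vectors
scores <- list(
  math      = c(70, 85, 90),
  physics   = c(60, 75),
  chemistry = c(88, 92, 79, 81)
)

as_list <- lapply(scores, mean)   # always returns a list
as_vec  <- sapply(scores, mean)   # simplifies to a named numeric vector here
as_mat  <- sapply(scores, range)  # two values per element, so the result is a matrix

# vapply checks that every result matches the declared template
checked <- vapply(scores, mean, FUN.VALUE = numeric(1))

str(as_list)
as_vec
as_mat
```

If a function passed to vapply returns something other than a single number, the call stops with an error instead of silently changing the output shape, which is why it is often preferred in scripts.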

Long Question & Answer Hints


1. Discuss in detail the descriptive statistics provided by the summary() function.
2. For an example dataset (sales data frame), justify the results obtained using the summary() function.
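A possible starting point for these questions is sketched below. The sales data frame and its column names (units, revenue) are assumptions made for illustration; they are not the dataset used in the course.

```r
# Hypothetical sales data
sales <- data.frame(
  units   = c(12, 15, 9, 20, 14, 30),
  revenue = c(240, 310, 180, 405, 275, 610)
)

summary(sales)           # min, quartiles, median, mean and max for each column

mean(sales$units)        # average
median(sales$units)      # middle value, less affected by the large value 30
sd(sales$units)          # standard deviation
range(sales$units)       # minimum and maximum
quantile(sales$units)    # quartiles

cov(sales$units, sales$revenue)   # covariance between two variables
cor(sales$units, sales$revenue)   # correlation, between -1 and 1
```

Comparing the mean with the median of units shows how a single large value pulls the mean upward while the median stays closer to the bulk of the data.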

Topic_4: Exploratory Data Analysis – Visualization Before Analysis

Session _4:

Session Learning Rationale :

The purpose of learning this Session on Exploratory data analysis is to learn about graphical representation of data

Session Learning Outcomes:

At the end of this class hour on Exploratory data analysis: Visualizing before analysis, my students will be able to:

1. Understand ggplot() and Anscombe’s dataset.


Columns: Conceive (What Students Absorb), Design (What Students would design/solve), Implement (What Students would implement / test), Operate (What Students would demonstrate / prove)

| Time (min) | Topic | Conceive / Design / Implement / Operate |
|---|---|---|
| 05 | Introduction: Exploratory data analysis | Able to understand the better way of graphical representation of data |
| 15 | Sub-topic 1 (Lecture/Demonstration): ggplot2 package in detail | Able to know about the functions in the ggplot2 package |
| 10 | Sub-topic 1 (Participative / Verify): Geometric object and Aesthetic mapping | Able to implement geom_point, geom_line and geom_boxplot |
| 10 | Sub-topic 2 (Lecture/Demonstration): Comparison of Base graphics and ggplot2 | Understand the differences and specialities of the ggplot2 package |
| 10 | Conclusion & Summary: Usage of Anscombe’s dataset for execution of statistical properties | Able to implement the statistical properties like mean, variance, correlation etc. by using Anscombe’s set |
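The closing activity on Anscombe's dataset can be sketched in base R, since the data ships with R as the data frame anscombe (columns x1 to x4 and y1 to y4). The helper names below are illustrative.

```r
# Anscombe's quartet from the built-in datasets package
quartet <- datasets::anscombe

# Compute the same statistics for each of the four x/y pairs
stats <- sapply(1:4, function(i) {
  x <- quartet[[paste0("x", i)]]
  y <- quartet[[paste0("y", i)]]
  c(mean_x = mean(x), var_x = var(x),
    mean_y = mean(y), var_y = var(y),
    cor_xy = cor(x, y))
})
colnames(stats) <- paste0("set", 1:4)

round(stats, 2)   # the four sets agree almost exactly on every statistic
```

Plotting each pair afterwards shows four very different shapes, which is the argument for visualizing before analysis.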

Session Learning Plan & Materials


Basic / Key Learning Material: (Descriptive/ Concept Map/ Mind-Map etc., to be prepared by faculty member)

PPT and PDF

Quiz: Choose all Correct / Multiple Choice / True or False / Match the following

| No. | Question | Answer 1 | Answer 2 | Answer 3 | Answer 4 |
|---|---|---|---|---|---|
| 5. | _______ grammar makes a clear distinction between your data and what gets displayed on the screen or page. | ggplot1 | ggplot2 | d3.js | D4.js |
| 6. | Which of the following is a plot to investigate the order in which observations were recorded? | ggplot | ggsave | ggpcp | ggorder |
| 7. | ________ is used for translating between qplot and base graphics. | translate_qplot_base | translate_qplot_gpl | translate_qplot_lattice | translate_qplot_ggplot |
| 8. | ________ is used to create a plot to illustrate patterns of missing values. | ggmissplot | ggmissing | ggfluctuation | None |
| 9. | __________ creates a complete ggplot appropriate to a particular data type. | autoplot | is.ggplot | printplot | All of the above |

Short Question & Answers


1. What is the grammar of graphics?
2. Give the advantages of ggplot2.
3. Compare ggplot2 and base graphics.
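For question 3, the same scatter plot can be drawn with base graphics and with ggplot2. The sketch below uses the built-in mtcars data and assumes ggplot2 may or may not be installed, so the ggplot2 part is guarded; the choice of variables is only an example.

```r
# Draw to a null PDF device so the sketch leaves no files behind;
# drop this line (and dev.off) when working interactively in RStudio
pdf(NULL)

# Base graphics: one call with arguments for each visual detail
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight", ylab = "Miles per gallon", main = "Base graphics")

# ggplot2: data, aesthetic mapping and geometric object are separate layers
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)
if (has_ggplot2) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    labs(title = "ggplot2", x = "Weight", y = "Miles per gallon")
  print(p)
}

invisible(dev.off())
```

Swapping geom_point() for geom_line() changes the geometric object without touching the data or the mapping, which is the separation the grammar of graphics refers to. geom_boxplot() works the same way but needs a grouping variable on x, for example aes(x = factor(cyl), y = mpg).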

Long Question & Answer Hints


1. With proper examples explain the ggplot2 package

Topic_5: Visualizing a Single Variable – Examining Multiple Variables

Session _5:

Session Learning Plan & Materials

Session Learning Rationale :

The purpose of this session would be to learn to visualize data corresponding to single variable and multiple variables

Session Learning Outcomes:

At the end of this class hour on visualizing, students will be able to:


1. Understand visualizing single and multiple variables
2. Understand handling of dirty data

Columns: Conceive (What Students Absorb), Design (What Students would design/solve), Implement (What Students would implement / test), Operate (What Students would demonstrate / prove)

| Time (min) | Topic | Conceive / Design / Implement / Operate |
|---|---|---|
| 15 | Visualizing Single Variable | Dotchart, Barplot, Histogram, Density Plot, Scatterplot |
| 20 | Visualizing Multiple Variables | Box and Whisker Plot, Hexbinplot, Scatterplot matrix |
| 10 | Dirty Data | How plots help identify outliers and other dirty data |
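The single-variable plots listed in the plan are all available in base R. The following sketch uses the built-in mtcars data as a stand-in for whatever dataset is used in class.

```r
# Null device: nothing is written to disk; remove when plotting interactively
pdf(NULL)

mpg <- mtcars$mpg

dotchart(mpg, labels = rownames(mtcars), cex = 0.6, main = "Dot chart")
barplot(table(mtcars$cyl), main = "Bar plot of cylinder counts")

h <- hist(mpg, breaks = 10, main = "Histogram")   # breaks is a suggestion, not exact
d <- density(mpg)                                  # kernel density estimate
plot(d, main = "Density plot")
rug(mpg)                                           # tick marks for the raw values

invisible(dev.off())
```

The density plot is the smooth counterpart of the histogram, and rug() adds the individual observations along the axis so that gaps and clusters remain visible.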

Session Home Learning Materials


Basic / Key Learning Material: (Descriptive/ Concept Map/ Mind-Map etc., to be prepared by faculty member)

Quiz: Choose all Correct / Multiple Choice / True or False / Match the following

| No. | Question | Answer 1 | Answer 2 | Answer 3 | Answer 4 |
|---|---|---|---|---|---|
| 10. | The -------------- plot is used for visualizing multiple variables when the volume of data is high. | Scatter | Hexbin | Barplot | Histogram |
| 11. | A company’s sales plot shows that there are small peaks at the end of the year and large peaks at the middle of the year. This effect is called ------------. | Seasonality effect | Peak Effect | Quarterly effect | Year effect |
| 12. | The R command used to plot a continuous histogram is ----------- | plot(data) | barplot(data) | rug(data) | plot(density(data)) |
| 13. | In dirty data cleaning, the wrong data can generally be replaced with -------------------- | Approximation based on nearest neighbour | Ones | Zeros | NULL |

Long Question & Answer Hints


1. With examples, explain the ways of plotting multiple variables.
• Small data sets and sample diagrams
• Box and Whisker Plot, Hexbinplot, Scatterplot matrix
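Two of the multi-variable plots named above, and the use of a box plot rule to flag dirty data, can be sketched in base R. The ages vector is a made-up example containing one entry error. A hexbin plot needs an add-on package and is left out so that the sketch depends only on base R.

```r
# Null device: nothing is written to disk; remove when plotting interactively
pdf(NULL)

# Box-and-whisker plot: one numeric variable split by a grouping variable
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")

# Scatterplot matrix: every pair of the selected variables
pairs(mtcars[, c("mpg", "wt", "hp")])

# Dirty data: hypothetical ages with one implausible value
ages <- c(23, 31, 27, 45, 38, 29, 250, 34)
boxplot(ages, main = "Ages with an outlier")
flagged <- boxplot.stats(ages)$out   # values beyond 1.5 times the hinge spread
flagged

invisible(dev.off())
```

Values flagged this way are candidates for review rather than automatic deletion; the plot only shows where to look.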

Topic_6: Data Exploration Versus Presentation.

Session _6:
Session Learning Rationale :
The purpose of learning this Session on Data Exploration Versus Presentation is to understand the difference between exploring data and presenting results with respect to Data analytics.
Session Learning Outcomes:
At the end of this class hour on Data Exploration Versus presentation students will be able to:
1. Know what the user needs to know
2. Present data using RStudio
Columns: Conceive (What Students Absorb), Design (What Students would design/solve), Implement (What Students would implement / test), Operate (What Students would demonstrate / prove)

| Time (min) | Topic | Conceive / Design / Implement / Operate |
|---|---|---|
| 05 | Recap / Introduction: Refreshing the various plot functions | Oral Questioning / Discussion |
| 20 | Sub-topic 1 (Lecture/Demonstration): Discuss Density plots and Histogram | Activity on density and histogram functions using R |
| 20 | Sub-topic 2 (Lecture/Demonstration): Using a suitable example, discuss Data Presentation and Exploration | |
| 10 | Sub-topic 2 (Participative / Verify): Functions related to presentation and exploration | Activity: Code for an Example |
| 05 | Conclusion & Summary | Review Activity: Short questions and answers |

Session Learning Plan & Materials Session 6

Basic / Key Learning Material: (Descriptive/ Concept Map/ Mind-Map etc., to be prepared by faculty member)
PPT,PDF

Quiz: Choose all Correct / Multiple Choice / True or False / Match the following

| No. | Question | Answer 1 | Answer 2 | Answer 3 | Answer 4 |
|---|---|---|---|---|---|
| 1 | The reports generated by a reporting system are usually not delivered in which of the following media? | A. Web portal | B. Commercial courier service | C. Digital dashboard | D. E-Mail |
| 2 | _________ is a category of applications and technologies for presenting and analyzing corporate and external data. | a) Data warehouse | b) MIS | c) EIS | d) All of the mentioned |
| 3 | ................... are responsible for running queries and reports against data warehouse tables. | A) Hardware | B) Software | C) End users | D) Middle ware |
| 4 | ............................. is the process of finding a model that describes and distinguishes data classes or concepts. | A) Data Characterization | B) Data Classification | C) Data discrimination | D) Data selection |
| 5. | __________ is a nonparametric hypothesis test that checks whether two populations are identically distributed. | a) Student’s t-test | b) Welch’s t-test | c) Wilcoxon rank sum test | |

Short Question & Answers


12. Explain the difference between using visualization for data exploration and for presenting results to stakeholders.
13. Do you think the regression line significantly captures the relationship between two variables?
14. In the Iris slide example, how would you characterize the relationship between sepal width and sepal length?
15. What is a confidence interval? Explain with an example.
16. What are Type I and Type II errors?
17. Explain the Wilcoxon rank-sum test.
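Several of these questions can be explored with the built-in iris data. The sketch below is one possible set-up, not the slide example itself; the choice of species and variables is an assumption made for illustration.

```r
setosa     <- iris$Sepal.Length[iris$Species == "setosa"]
versicolor <- iris$Sepal.Length[iris$Species == "versicolor"]

# Question 15: 95% confidence interval for the mean sepal length of setosa
ci <- t.test(setosa)$conf.int
ci

# Question 17: Wilcoxon rank-sum test, a nonparametric comparison of two groups
# exact = FALSE uses the normal approximation, which avoids a warning about ties
w <- wilcox.test(setosa, versicolor, exact = FALSE)
w$p.value

# Questions 13 and 14: regression of sepal length on sepal width, all species pooled
fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)
summary(fit)$coefficients   # slope estimate, standard error, t value and p-value
cor(iris$Sepal.Width, iris$Sepal.Length)
```

The p-value in the Sepal.Width row of the coefficient table is what question 13 asks students to interpret. Repeating the regression within a single species is a useful follow-up, because pooling the species can hide the within-group relationship.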

Long Question & Answer Hints


1. Discuss the various techniques of data exploration and presentation with a suitable example.

2. Explain the difference of means with an example.

3. Briefly explain ANOVA.
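Questions 2 and 3 can be illustrated with the built-in iris data: a two-sample t-test for the difference of means between two species, and a one-way ANOVA across all three. This is a sketch under that assumption, not the example used in the course.

```r
# Difference of means: compare two species only
two_species <- droplevels(subset(iris, Species != "setosa"))

# Welch's t-test is the default; it does not assume equal variances
welch <- t.test(Sepal.Length ~ Species, data = two_species)
# Student's t-test assumes equal variances in both groups
student <- t.test(Sepal.Length ~ Species, data = two_species, var.equal = TRUE)

welch$method
welch$p.value
student$p.value

# One-way ANOVA: is mean sepal length the same across all three species?
anova_fit <- aov(Sepal.Length ~ Species, data = iris)
anova_table <- summary(anova_fit)[[1]]
anova_table
```

ANOVA answers only whether at least one group mean differs; identifying which pairs differ requires a follow-up comparison.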
