Da Handbook

DA HANDBOOK

Uploaded by

Ajay Kumar Reddy

DATA ANALYTICS

Subject Code: CS513PE


Regulations : R18 - JNTUH
Class: III Year B.Tech CSE I Semester

Department of Computer Science and Engineering


Bharat Institute of Engineering and Technology
Ibrahimpatnam-501510,Hyderabad
DATA ANALYTICS (CS513PE)
B.Tech. III Year I Sem
COURSE PLANNER

I. COURSE PURPOSE:

More and more organizations use their data as a decision-support tool and to build
data-intensive products and services. The collection of skills required by organizations to
support these functions has been grouped under the term “Data Analytics”. This course will
cover the basic concepts of data analytics and methodologies for analyzing structured and
unstructured data, with emphasis on the relationship between the data scientist and the
business needs.

II. PRE-REQUISITES:

1. A course on “Database Management Systems”


2. Knowledge of probability and statistics

III. COURSE OBJECTIVES:

1. To explore the fundamental concepts of data analytics


2. To learn the principles and methods of statistical analysis
3. To discover interesting patterns, analyze supervised and unsupervised models, and estimate
the accuracy of the algorithms
4. To understand the various search methods and visualization techniques

IV. COURSE OUTCOMES:

S. No. | Course Outcomes | Bloom’s Taxonomy Levels | Program Outcomes, Program Specific Outcomes
1 | Understand the impact of data analytics for business decisions and strategy | L1-Remembering, L2-Understanding, L5-Evaluating | PO1-PO6, PO9-PO12, PSO1-PSO3
2 | Carry out data analysis/statistical analysis | L3-Applying, L5-Evaluating | PO1-PO6, PO9-PO12, PSO1-PSO3
3 | Carry out standard data visualization and formal inference procedures | L4-Analyzing, L5-Evaluating | PO1-PO6, PO9-PO12, PSO1-PSO3
4 | Design Data Architecture | L4-Analyzing, L6-Creating, L1-Remembering | PO1-PO6, PO9-PO12, PSO1-PSO3
5 | Understand various Data Sources | L6-Creating, L1-Knowledge, L3-Applying | PO1-PO6, PO9-PO12, PSO1-PSO3

V. COURSE CONTENT:
UNIT – I

Data Management: Design Data Architecture and manage the data for analysis, understand
various sources of Data like Sensors/Signals/GPS etc. Data Management, Data Quality
(noise, outliers, missing values, duplicate data) and Data Preprocessing & Processing

UNIT – II

Data Analytics: Introduction to Analytics, Introduction to Tools and Environment, Application
of Modeling in Business, Databases & Types of Data and Variables, Data Modeling
Techniques, Missing Imputations etc. Need for Business Modeling

UNIT – III

Regression – Concepts, BLUE property assumptions, Least Square Estimation, Variable
Rationalization, and Model Building etc.

Logistic Regression: Model Theory, Model Fit Statistics, Model Construction, Analytics
applications to various Business Domains etc.

UNIT – IV

Object Segmentation: Regression Vs Segmentation – Supervised and Unsupervised Learning,
Tree Building – Regression, Classification, Overfitting, Pruning and Complexity, Multiple
Decision Trees etc.

Time Series Methods: ARIMA, Measures of Forecast Accuracy, STL approach, Extract features
from generated model as Height, Average Energy etc., and analyze for prediction
UNIT – V

Data Visualization: Pixel-Oriented Visualization Techniques, Geometric Projection Visualization
Techniques, Icon-Based Visualization Techniques, Hierarchical Visualization Techniques,
Visualizing Complex Data and Relations.

VI. LESSON PLAN:

Teaching methodologies (all units): Chalk and board, PPT presentation

S.No | Week | Topics | Course Learning Outcomes | References

UNIT-1
1 | 1 | Design Data Architecture | Understanding and remembering the basics of data architecture | T1
2 | 1 | Design Data Architecture | Applying the definition of data mining | T1
3 | 1 | Managing Data Analysis | Understanding data analysis | T1
4 | 2 | Understanding Various Sources of Data | Analyzing the different sources of data | T1
5 | 2 | Data Management | Understanding the management | T1
6 | 2 | Data Quality | Applying the data for quality processing | T1
7 | 3 | Data Processing | Analyzing data processing | T1
8 | 3 | Data Processing | Analyzing data processing | T1
9 | 3 | Revision of Unit-1 | | T1
10 | 4 | MOCK TEST-1 | |
11 | 4 | Tutorial/bridge class #1 | | T1

UNIT-2
12 | 4 | Introduction to Data Analytics | Understanding data analytics | T1,T2
13 | 5 | Introduction to Tools and Environment | Understanding and analyzing tools environment | T1,T2
14 | 5 | Application of Modeling in Business | Understanding real-time applications | T1,T2
15 | 5 | Application of Modeling in Business | Understanding real-time applications | T1,T2
16 | 6 | Databases & Types of Data and Variables | Understanding and analyzing databases | T1,T2
17 | 6 | Data Modeling techniques | Analyzing modeling techniques | T1,T2
18 | 6 | Data Modeling techniques | Analyzing modeling techniques | T1,T2
19 | 7 | Missing Imputations | Understanding the consequences of missing imputations | T1,T2
20 | 7 | Need for Business Modeling | Creating a representation of business model | T1,T2
21 | 7 | Revision of Unit-II | | T1,T2
22 | 8 | Tutorial/bridge class #2 | |

I MID EXAMINATIONS (WEEK 9)

UNIT-3
23 | 8 | Regression Concepts | Understanding the concepts | T1,T2
24 | 8 | BLUE Property Assumptions | Evaluating the assumptions | T1,T2
25 | 9 | Least Square Estimation | Understanding the algorithm | T1,T2
26 | 9 | Variable Rationalization | Creating and understanding the concepts | T1,T2
27 | 9 | Model Building | Understanding model creation | T1,T2
28 | 10 | Logistic Regression | Evaluating model theory | T1,T2
29 | 10 | Model Fit Statistics | Evaluating model theory | T1,T2
30 | 10 | Model Construction | Understanding model construction | T1,T2
31 | 11 | Analytics Applications | Understanding applications to business domains | T1,T2
32 | 11 | Tutorial/bridge class #3 | |

UNIT-4
33 | 11 | Regression vs Segmentation | Understanding object segmentation | T1,T2
34 | 12 | Supervised and Unsupervised Learning | Evaluating algorithms | T1,T2
35 | 12 | Tree Building - Regression | Analyzing the concepts |
36 | 12 | Classification & Overfitting | Understanding classification methods | T1,T2
37 | 13 | Pruning and Complexity | Understanding and evaluating the complexity | T1,T2
38 | 13 | Multiple Decision Trees | Evaluating decision trees | T1,T2
39 | 14 | Time Series Methods - ARIMA | Understanding algorithms | T1,T2
40 | 14 | Measures of Forecast Accuracy | Understanding and implementing real world examples | T1,T2
41 | 14 | STL Approach | Understanding various approaches | T1,T2
42 | 14 | Extract Features from Models | Analyzing extraction | T1,T2

UNIT-5
43 | 15 | Data Visualization | | T1,T2
44 | 15 | Geometric Projection | Understanding, analyzing and evaluating | T1,T2
45 | 15 | Icon-Based Techniques | | T1,T2
46 | | MOCK TEST-2 | | T1,T2
47 | 16 | Hierarchical Visualization | Understanding, analyzing and evaluating | T1,T2
48 | 16 | Complex Data Relationships | Understanding, analyzing and evaluating | T1,T2
49 | 16 | Tutorial/bridge class #6 | |
50 | 16 | Tutorial/bridge class #7 | |
II MID EXAMINATIONS (WEEK 17)

TEXT BOOKS:
1. Student’s Handbook for Associate Analytics – II, III.
2. Data Mining Concepts and Techniques, Han, Kamber, 3rd Edition, Morgan Kaufmann
Publishers.

REFERENCES:
1. Introduction to Data Mining, Tan, Steinbach and Kumar, Addison-Wesley, 2006.
2. Data Mining Analysis and Concepts, M. Zaki and W. Meira.
3. Mining of Massive Datasets, Jure Leskovec (Stanford Univ.), Anand Rajaraman (Milliway Labs),
Jeffrey D. Ullman (Stanford Univ.).

Department of Computer Science and Engineering
COURSE OUTCOMES ASSESSMENT
Academic Year: 2019 - 2020
Semester: ODD / EVEN

VII. HOW PROGRAM OUTCOMES ARE ASSESSED:

Program Outcomes (PO) | Level | Proficiency assessed by
PO1 Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems related to Computer Science and Engineering. | 2.5 | Lectures, Assignments, Exams
PO2 Problem analysis: Identify, formulate, review research literature, and analyze complex engineering problems related to Computer Science and Engineering, reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences. | 1.5 | Lectures, Assignments, Exams
PO3 Design/development of solutions: Design solutions for complex engineering problems related to Computer Science and Engineering and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations. | 3 | Lectures, Assignments, Exams
PO4 Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions. | 1.5 | Lectures, Assignments, Exams
PO5 Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations. | 2.5 | Lectures, Assignments, Exams
PO6 The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the Computer Science and Engineering professional engineering practice. | 1 | Lectures, Assignments, Exams
PO7 Environment and sustainability: Understand the impact of the Computer Science and Engineering professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable development. | - | -
PO8 Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice. | - | -
PO9 Individual and team work: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings. | 1.5 | Lectures, Assignments, Exams
PO10 Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions. | 2.0 | Lectures, Assignments, Exams
PO11 Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one’s own work, as a member and leader in a team, to manage projects and in multidisciplinary environments. | 1.5 | Lectures, Assignments, Exams
PO12 Life-long learning: Recognize the need for, and have the preparation and ability to engage in independent and life-long learning in the broadest context of technological change. | 2.5 | Lectures, Assignments, Exams

VIII. HOW PROGRAM SPECIFIC OUTCOMES ARE ASSESSED:

Program Specific Outcomes (PSO) | Level | Proficiency assessed by
PSO1 Foundation of mathematical concepts: To use mathematical methodologies to crack problems using suitable mathematical analysis, data structures and suitable algorithms. | 2.5 | Lectures, Assignments, Exams
PSO2 Foundation of Computer System: The ability to interpret the fundamental concepts and methodology of computer systems. Students can understand the functionality of hardware and software aspects of computer systems. | 3.0 | Lectures, Assignments, Exams
PSO3 Foundations of Software development: The ability to grasp the software development lifecycle and methodologies of software systems. Possess competent skills and knowledge of software design process. Familiarity and practical proficiency with a broad area of programming concepts and provide new ideas and innovations towards research. | 2.0 | Lectures, Assignments, Exams

MAPPING COURSE OUTCOMES LEADING TO THE ACHIEVEMENT OF PROGRAM OUTCOMES AND PROGRAM SPECIFIC OUTCOMES:

Course Outcomes (CO) against Program Outcomes (PO1-PO12) and Program Specific Outcomes (PSO1-PSO3)
CO PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
1 3 1 2 1 2 1 - - 2 2 2 3 1 3 2
2 2 2 3 2 3 1 - - 1 2 2 3 2 2 2
3 2 1 2 1 2 1 - - 2 2 2 2 1 2 2
4 3 2 3 2 2 1 - - 1 1 1 2 3 3 2
5 2 1 2 2 2 1 - - 2 2 1 1 1 2 2
AVG 2.4 1.4 2.5 1.6 2.5 1 - - 1.6 1.8 1.6 2.2 1.6 2.4 2

DESCRIPTIVE QUESTIONS

UNIT-1
Short Answer Questions
Questions | Bloom’s taxonomy level | Course outcome
1. What is Data Management? | Understand | 1
2. What is Big Data? | Understand | 1
3. List out Enterprise Requirements. | Knowledge | 1
4. What is workplace safety? | Knowledge | 1
5. What do you understand by Big Data tools? | Analyze | 1

Long Answer Questions

1. Explain Data Architecture in detail. | Understanding | 1
2. Explain the sources of primary data. | Creating | 1
3. Write about data preprocessing needs. | Analyzing | 1
4. Explain in detail how primary data is generated. | Understanding | 1
5. Explain survey methods and the experimental method. | Analyzing | 1

UNIT-2
Short Answer Questions
Questions | Bloom’s taxonomy level | Course outcome
1. What is data analytics? | Understanding | 2
2. Explain the tools used for data analytics. | Knowledge | 2
3. Name some data modeling techniques. | Understand | 2
4. Explain missing imputations. | Analyze | 2
5. Define data variables. Interpret the use of variables for business modeling. | Understand | 2

Long Answer Questions

1. Discuss the importance of data analytics. | Analysis | 2
2. Describe the tools used for data analytics with an example. | Analysis | 2
3. Explain how and where missing imputations are involved in real-world scenarios. | Understand | 2
4. Explain databases and the types of data and variables involved in data analytics. | Understand | 2
5. Explain with an example the need for business modeling. | Analysis | 2

UNIT-3
Short Answer Questions
Questions | Bloom’s taxonomy level | Course outcome
1. State the BLUE property assumptions. | Understand | 3
2. What is variable rationalization? | Knowledge | 3
3. Explain theoretically an analytics application in a business domain. | Analysis | 3
4. How is an LSE regression line calculated? | Knowledge | 3
5. Explain OLS. | Understand | 3

Long Answer Questions

1. Explain regression and discuss it with an example. | Analysis | 3
2. Summarize how LSE works. | Understanding | 3
3. Describe the working procedures of Logistic Regression in the business world. | Analysis | 3
4. Discuss variable rationalization. | Evaluate | 3
5. Explain the model fit statistics used for regression with an example and also discuss model construction. | Knowledge | 3

UNIT-4
Short Answer Questions
Questions | Bloom’s taxonomy level | Course outcome
1. What is regression? | Knowledge | 4
2. Describe segmentation with an example. | Knowledge | 4
3. Give real-time examples of supervised learning. | Knowledge | 4
4. What are decision trees? | Understand | 4
5. Briefly describe the ARIMA method. | Understand | 4

Long Answer Questions

1. What is Linear Regression? Explain with an example. | Analysis | 4
2. Differentiate between supervised and unsupervised learning. | Understand | 4
3. Explain overfitting and pruning in detail. | Analysis | 4
4. Explain a time series method with an example. | Knowledge | 4
5. Generate a model to measure forecast accuracy. | Knowledge | 4

UNIT-5
Short Answer Questions
Questions | Bloom’s taxonomy level | Course outcome
1. Name some frequently used 2-D space-filling curves. | Knowledge | 5
2. What are a scatter plot and a scatter-plot matrix? | Understand | 5
3. Specify the dimensionality of Chernoff faces. | Analysis | 5
4. Write a short note on hierarchical visualization techniques. | Understand | 5
5. Explain tag cloud briefly. | Understand | 5

Long Answer Questions

1. Explain complex data and deduce its relationships. | Evaluate | 5
2. Explain a visualization technique using parallel coordinates. | Knowledge | 5
3. Explain asymmetrical Chernoff faces. | Analysis | 5
4. Explain geometric projection visualization. | Understand | 5
5. Write notes on a) circle segment technique and b) space-filling curves. | Knowledge | 5

UNIT-1
1. Data architecture describes
a. Data extraction
b. data processing, storage and utilization
c. ETL
d. Business
Answer: B
2. Which of the following is not one of the processes into which data architecture is broken down?
a. conceptual
b. logical
c. Application
d. Physical
Answer: C
3. Which experimental designs do market researchers use most frequently?
a. All of the below
b. CRD
c. RBD
d. LSD
Answer: A

4. Which is a 2-way classification scheme?


a. LSD
b. CRD
c. RBD
d. None of the above
Answer: A
5. What is factorial design?
a. test 2 or more variables
b. test a database
c. test a business model
d. none of the above
Answer: A
UNIT-2
1. According to analysts, for what can traditional IT systems provide a foundation when
they’re integrated with big data technologies like Hadoop?

a. Big data management and data mining

b. Data warehousing and business intelligence

c. Management of Hadoop clusters

d. Collecting and storing unstructured data

Answer: A
2. All of the following accurately describe Hadoop, EXCEPT:
a. Open source
b. Real-time
c. Java-based
d. Distributed computing approach
Answer: B
3. __________ has the world’s largest Hadoop cluster.
a. Apple
b. Datamatics
c. Facebook
d. None of the mentioned
Answer: C
4. Which of the following are among the five V’s of Big Data?

a. Volume

b. Velocity

c. Variety

d. All the above

Answer: D
5. _________ hides the limitations of Java behind a powerful and concise Clojure API for
Cascading.

a. Scalding

b. Cascalog

c. Hcatalog

d. Hcalding

Answer: B
UNIT-3
1) True-False: Is Logistic regression a supervised machine learning algorithm?
A) TRUE
B) FALSE

Answer: A - True. Logistic regression is a supervised learning algorithm because it uses true
labels for training. A supervised learning algorithm needs input variables (x) and a target
variable (Y) when the model is trained.

2) True-False: Is Logistic regression mainly used for Regression?


A) TRUE
B) FALSE

Answer: B - False. Logistic regression is a classification algorithm; do not be misled by the
word “regression” in its name.

3) True-False: Is it possible to design a logistic regression algorithm using a Neural Network
Algorithm?
A) TRUE
B) FALSE

Answer: A - True. A neural network is a universal approximator, so it can implement a logistic
regression algorithm.

4) True-False: Is it possible to apply a logistic regression algorithm on a 3-class classification
problem?
A) TRUE
B) FALSE

Answer: A - True. Logistic regression can be applied to a 3-class classification problem by using
the one-vs-all method.

5) Which of the following methods do we use to best fit the data in Logistic Regression?
A) Least Square Error
B) Maximum Likelihood
C) Jaccard distance
D) Both A and B

Answer: B - Logistic regression is trained using maximum likelihood estimation.

6) Which of the following evaluation metrics cannot be applied in case of logistic regression
output to compare with target?
A) AUC-ROC
B) Accuracy
C) Logloss
D) Mean-Squared-Error

Answer: D - Since logistic regression is a classification algorithm, its output is not a
real-valued quantity, so mean squared error is not used to evaluate it.
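
As a rough illustration of the maximum likelihood fitting mentioned in question 5, the sketch below fits a one-variable logistic regression by gradient ascent on the log-likelihood. The toy data, the learning rate, the number of steps and the function names (`sigmoid`, `log_likelihood`, `fit`) are assumptions made for this example only; they do not come from the course material, and a library routine would normally be used in practice.

```python
import math


def sigmoid(z):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(w, b, xs, ys):
    """Log-likelihood of binary labels ys under the model p = sigmoid(w*x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit(xs, ys, lr=0.05, steps=5000):
    """Maximize the log-likelihood with plain gradient ascent (illustrative settings)."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Gradient of the log-likelihood: sum of (label - predicted probability) terms.
        errors = [y - sigmoid(w * x + b) for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs))
        grad_b = sum(errors)
        w += lr * grad_w
        b += lr * grad_b
    return w, b


# Toy data (assumed for illustration): larger x tends to go with label 1.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit(xs, ys)
print("fitted w, b:", w, b)
print("log-likelihood before:", log_likelihood(0.0, 0.0, xs, ys))
print("log-likelihood after: ", log_likelihood(w, b, xs, ys))
```

The fitted parameters give a higher log-likelihood than the starting point (w = 0, b = 0), which is the sense in which maximum likelihood "best fits" the data here.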

UNIT-4
1. Choose the options that are correct regarding machine learning (ML) and artificial intelligence
(AI):

(A) ML is an alternate way of programming intelligent machines.
(B) ML and AI have very different goals.
(C) ML is a set of techniques that turns a dataset into software.
(D) AI is software that can emulate the human mind.

Answer: (A), (C), (D)

2. Which of the following sentences is FALSE regarding regression?

(A) It relates inputs to outputs.


(B) It is used for prediction.
(C) It may be used for interpretation.
(D) It discovers causal relationships.

Answer: (D)

3. Supervised learning and unsupervised clustering both require at least one

a. Hidden attribute
b. Output attribute
c. Input attribute
d. Categorical attribute

Answer: C

4. Supervised learning differs from unsupervised clustering in that supervised learning requires

a. at least one input attribute
b. input attributes to be categorical
c. at least one output attribute
d. output attributes to be categorical

Answer: C
5. Which of the following statements is correct?

1. If autoregressive parameter (p) in an ARIMA model is 1, it means that there is no
auto-correlation in the series.
2. If moving average component (q) in an ARIMA model is 1, it means that there is
auto-correlation in the series with lag 1.
3. If integrated component (d) in an ARIMA model is 0, it means that the series is not
stationary.

A) Only 1
B) Both 1 and 2
C) Only 2
D) All of the statements

Solution: (C)

Autoregressive component: AR stands for autoregressive. The autoregressive parameter is denoted
by p. When p=0, it means that there is no auto-correlation in the series. When p=1, it
means that the auto-correlation in the series extends up to one lag.

Integrated: In ARIMA time series analysis, the integrated component is denoted by d. Integration
is the inverse of differencing. When d=0, it means the series is stationary and we do not need to
take the difference of it. When d=1, it means that the series is not stationary and, to make it
stationary, we need to take the first difference. When d=2, it means that the series has been
differenced twice. Usually, differencing more than twice is not reliable.

Moving average component: MA stands for moving average, and its parameter is denoted by q. In
ARIMA, q=1 means that the model includes an error term with one lag, so there is
auto-correlation with one lag.
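
The roles of d (differencing) and lag-1 auto-correlation described above can be sketched in a few lines of plain Python. The helper names `difference` and `autocorrelation` and the example series are assumptions chosen for illustration; this is not a full ARIMA implementation.

```python
def difference(series, d=1):
    """Apply first differencing d times (the 'integrated' order d in ARIMA)."""
    result = list(series)
    for _ in range(d):
        result = [later - earlier for earlier, later in zip(result, result[1:])]
    return result


def autocorrelation(series, lag=1):
    """Sample auto-correlation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # a constant series has no variation to correlate
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


# Illustrative series with a steady upward trend (not stationary).
trend = [2, 4, 6, 8, 10, 12]

print(autocorrelation(trend, lag=1))   # strong positive lag-1 auto-correlation
print(difference(trend, d=1))          # first difference removes the trend
print(autocorrelation(difference(trend, d=1), lag=1))
```

For this trending series, one round of differencing (d=1) leaves a constant series, which matches the idea that a non-stationary series may become stationary after taking the first difference.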

UNIT-5
1. Worlds-within-Worlds is also known as? Answer: a
a. n-Vision
b. influence graph
c. binary attribute
d. 6-D dataset

2. Tree-maps display hierarchical data as a set of? Answer: a


a. nested rectangles
b. grouped variables
c. representation dimensions
d. customer object
3. IBVT represents? Answer: a
a. multidimensional data values
b. tree-maps
c. pages related to a particular subject
d. Kylie Minogue

4. Which is an extension to scatter plot? Answer: c


a. Distiller
b. Hub pages
c. scatter plot matrix
d. scores

5. What is the main objective of data visualization? Answer: d


a. Web Component, Score and Usage Mining
b. Web Control, Text and Utility Mining
c. Web Content, Score and Utility Mining
d. communicate data clearly

Fill in the blanks:

1. For a data set of m dimensions, pixel-oriented techniques create _______________ windows on the screen.

Answer: m

2. The ______________ technique uses windows in the shape of segments of a circle.

Answer: circle segment

3. Chernoff faces display multidimensional data of up to ____________ variables/dimensions.

Answer: 18

4. The __________ visualization technique maps multidimensional data to five-piece stick figures.

Answer: Stick figure

5. A ____________ is a visualization of statistics of user-generated tags.

Answer: Tag cloud

WEBSITES:
1. Associate Analytics – II
https://satyasai2.files.wordpress.com/2017/02/associate-analytics-m1-sh.pdf
2. Associate Analytics – III
http://jntuhsd.in/uploads/programmes/Associate_Analytics_M3_final.pdf
LIST OF TOPICS FOR STUDENT SEMINARS (Optional):
1. Data Management
2. Data Analytics
3. Regression
4. Logistic Regression
5. Object Segmentation
6. Time Series Methods
7. Data Visualization

CASE STUDIES / SMALL PROJECTS (Optional):

Case study-1: Titanic Data Set

As the name suggests, this data set provides data on the passengers who were aboard the
RMS Titanic when it sank on 15 April 1912 after colliding with an iceberg in the North
Atlantic Ocean. It is one of the most commonly used data sets for beginners in data science.
With 891 rows and 12 columns, it provides a combination of variables based on personal
characteristics such as age, class of ticket and sex, and tests one’s classification skills.

Objective: Predict the survival of the passengers aboard RMS Titanic.


Reference: https://www.kaggle.com/c/titanic
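
A possible first step for this case study is to compare survival rates across groups before building a classifier. The sketch below assumes the Kaggle training file has been downloaded locally as `train.csv` and that it uses the column names `Survived`, `Sex` and `Pclass`; the helper names `load_rows` and `survival_rate_by` are made up for this example, and the rows in `toy_rows` are invented for illustration and are not real passenger records.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read a CSV file into a list of dictionaries, one per row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def survival_rate_by(rows, column):
    """Return the fraction of survivors for each distinct value of a column."""
    counts = defaultdict(lambda: [0, 0])  # value -> [survivors, total]
    for row in rows:
        counts[row[column]][0] += int(row["Survived"])
        counts[row[column]][1] += 1
    return {value: survived / total for value, (survived, total) in counts.items()}


# Invented rows for illustration only (not taken from the real data set).
toy_rows = [
    {"Sex": "female", "Pclass": "1", "Survived": "1"},
    {"Sex": "female", "Pclass": "3", "Survived": "0"},
    {"Sex": "male", "Pclass": "1", "Survived": "1"},
    {"Sex": "male", "Pclass": "3", "Survived": "0"},
    {"Sex": "male", "Pclass": "3", "Survived": "0"},
]

rates = survival_rate_by(toy_rows, "Sex")
print(rates)

# With the real file in place (assumed path), the same helper could be used as:
# rows = load_rows("train.csv")
# print(survival_rate_by(rows, "Pclass"))
```

Group-wise rates like these only suggest which variables may be useful; the prediction objective itself calls for a classification model such as logistic regression or a decision tree from Units 3 and 4.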
