Data Science

Data Science question bank

Uploaded by

thangam suresh

SRM VALLIAMMAI ENGINEERING COLLEGE

(An Autonomous Institution)


SRM Nagar, Kattankulathur – 603 203

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

QUESTION BANK

(COMMON TO DEPARTMENT OF INFORMATION TECHNOLOGY)

VI SEMESTER

1904607- DATA SCIENCE

Regulation – 2019

Academic Year 2021 – 2022

Prepared by

Mr. N. LEO BRIGHT TENNISSON (AP Sr. G. / CSE)

Dr. D. SRIDEVI (AP Sr. G. / IT)

SUBJECT : 1904607- DATA SCIENCE


SEM/YEAR : VI / III
UNIT - I: INTRODUCTION TO DATA SCIENCE
Introduction to Data Science-Concept of Data Science-Traits of Big Data-Web Scraping- Analysis vs
Reporting

PART – A
Q. No Question BT Level Competence
1 What is Data Science process? Explain. BTL-1 Remember
2 Differentiate Business Intelligence (BI) and Data Science. BTL-2 Understand

3 Compare Data Science and Statistics. BTL-4 Analyze


4 Define Data Science. BTL-1 Remember
5 List out the areas in which Data Science can be applied. BTL-1 Remember
6 Who is a Data Scientist? BTL-1 Remember
7 Compare Big Data with Data Science. BTL-5 Evaluate
8 State the purpose of reporting and analysis. BTL-1 Remember
9 List out the advantages of web scraping. BTL-1 Remember
10 Can Data Science Predict the Stock Market? Examine. BTL-3 Apply
11 Discuss about analysis and reporting. BTL-2 Understand
12 Give Drew Conway’s Venn diagram of Data Science. BTL-2 Understand
13 Specify the life cycle of Data Science. BTL-6 Create
14 Illustrate the use of Data Science with an example. BTL-3 Apply
15 Show the ways in which decision making and predictions are made in Data Science. BTL-3 Apply
16 Differentiate Data Mining and Data Science. BTL-2 Understand
17 Analyze Data Science ethics. BTL-4 Analyze

18 Analyze the roles of Data Science. BTL-4 Analyze


19 Bootstrap is more thorough in terms of the magnitude of replication. Justify. BTL-5 Evaluate
20 Develop a general algorithm for Data Science process. BTL-6 Create

PART – B
1 i. What is Big Data? (3) BTL-1 Remember
ii. Describe the main features of big data in detail. (10)
2 Describe life cycle of Data Science with neat diagram. (13) BTL-1 Remember

3 List the main characteristics of Big Data. BTL-1 Remember

4 i. Discuss the nature of data. (7) BTL-2 Understand
ii. Give a detailed description of the applications of data. (6)
5 i. Give the difference between Traditional Business Intelligence (BI) and Big Data. (7) BTL-2 Understand
ii. Give the various drawbacks of using the traditional system approach. (6)
6 i. Demonstrate the ETL (Extract, Transform and Load) system. (7) BTL-3 Apply
ii. Explain Big Data Technology Landscape. (6)
7 Analyze and write short notes on the following. BTL-4 Analyze
i. Hadoop Distributed File System (HDFS). (6)
ii. YARN.(7)
8 Explain the following in detail. BTL-4 Analyze
i. Map Reduce. (7)
ii. YARN. (6)
9 i. Assess the difference between analysis and analytics. (6) BTL-5 Evaluate
ii. Discuss the importance of big data analytics. (7)
10 Extrapolate big data analytics and develop a summary of various applications in real-world scenarios. (13) BTL-6 Create
11 Describe the roles and stages in a data science project. (13) BTL-1 Remember
12 i. Illustrate the importance of big data. (6) BTL-3 Apply
ii. List out the various challenges faced in big data in detail. (7)
13 Explain storage consideration in Big Data. (13) BTL-4 Analyze
14 Discuss Data Cleaning and Sampling. (13) BTL-2 Understand

PART – C

1 Create a brief summary of the challenges faced in processing big data nowadays. (15) BTL-6 Create
2 Evaluate in detail about the case study of big data solutions. (15) BTL-5 Evaluate
3 Explain Traditional Vs Big data business approach with its drawbacks. (15) BTL-6 Create
4 Evaluate the various formats of data and illustrate with real-time examples. (15) BTL-5 Evaluate

UNIT - II: MATHEMATICAL FOUNDATIONS

Linear Algebra: Vectors, Matrices- Statistics: Describing a Single Set of Data, Correlation, Simpson’s
Paradox-Correlation and Causation- Probability: Dependence and Independence, Conditional Probability,
Bayes’s Theorem, Random Variables-Continuous Distributions-The Normal Distribution-The Central Limit
Theorem.
PART – A
Q. No Questions BT Level Competence
1 Point out application of vectors. BTL-4 Analyze
2 Point out the rules for dot product of two vectors. BTL-4 Analyze
3 Compare variance and covariance. BTL-5 Evaluate
4 Develop a matrix to demonstrate binary relationship. BTL-6 Create
5 What is statistics? What are the ways to describe a single set of data? BTL-1 Remember
6 List applications of matrices. BTL-1 Remember
7 Given a single set of data, explain the central tendencies of the data. BTL-2 Understand
8 Describe dispersion in a single set of data. BTL-2 Understand
9 Give example of a continuous distribution. BTL-2 Understand
10 Define Bayes’s Theorem. BTL-1 Remember
11 List some applications of conditional probability. BTL-1 Remember
12 In what way can we think of probability with respect to Data Science? BTL-1 Remember
13 What is correlation? BTL-1 Remember
14 Why is the normal distribution important? BTL-2 Understand
15 Classify the different distributions of values of random variables. BTL-3 Apply
16 Illustrate the normal distribution with a diagram. BTL-3 Apply
17 Complete a routine to display a histogram for a sample of people and their respective numbers of friends. BTL-3 Apply
18 Analyze and write the importance of matrices in representing data sets. BTL-4 Analyze
19 The central limit theorem is the reason for the importance of the normal distribution. Justify. BTL-5 Evaluate
20 Develop a routine to plot Probability Density Function. BTL-6 Create
PART – B
1 Describe vectors and various operations on vectors with routines, examples and diagrams. (13) BTL-1 Remember
2 i. Explain matrices with respect to Data Science. (6) BTL-3 Apply
ii. Explain statistics and single set of Data. (7)
3 i. Describe about correlation in detail.(7) BTL-1 Remember
ii. Explain any one application of correlation.(6)
4 Explain normal distribution with an example. (13) BTL-3 Apply

5 i. Explain conditional probability.(8) BTL-5 Evaluate
ii. Justify the need for normal distribution. (5)
6 i. Give routine to display a histogram. (7) BTL-2 Understand
ii. Discuss about Dependence and Independence. (6)
7 i. Describe the application of matrices to represent a binary relationship with an example. (7) BTL-2 Understand
ii. Describe Bayes’s Theorem. (6)
8 i. Write a routine to plot Probability Density Function and illustrate with an example. (7) BTL-4 Analyze
ii. Write a routine to plot a Histogram that compares Binomial Distribution and Normal Distribution. (6)
9 i. Describe Normal Distribution in detail. (7) BTL-1 Remember
ii. Explain any one application of Bayes’s theorem. (6)
10 Briefly describe the use of statistics in Data Science. (13) BTL-1 Remember
11 Analyze and write a routine to implement various Probability Functions with example. (13) BTL-4 Analyze
12 Develop a data set and demonstrate correlation. (13) BTL-6 Create
13 Discuss in detail about the variance, covariance, and correlation. (13) BTL-2 Understand
14 Illustrate various distributions of values of random variables. (13) BTL-4 Analyze

PART – C
1 Develop a routine to demonstrate Binomial Distribution and Normal Distribution. (15) BTL-6 Create
2 Assess the routines to implement various random variable distribution functions. (15) BTL-5 Evaluate
3 Assess the difference between variance and covariance. Show a data set of values and demonstrate its correlation. (15) BTL-5 Evaluate
4 Develop your own scenarios to demonstrate use of Vectors and Matrices in Data Science. (15) BTL-6 Create
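
Several questions in this unit ask for routines on variance, covariance and correlation. The following is only a minimal sketch of one possible routine, written in plain Python and using the sample (n - 1) definitions. The data set and the names `hours_studied` and `exam_score` are hypothetical and chosen purely for illustration.

```python
import math


def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)


def variance(xs):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def covariance(xs, ys):
    """Sample covariance: how two variables vary together around their means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance rescaled by both standard deviations; lies in [-1, 1]."""
    sx, sy = math.sqrt(variance(xs)), math.sqrt(variance(ys))
    if sx == 0 or sy == 0:
        return 0.0  # no variation in one variable, so correlation is taken as zero
    return covariance(xs, ys) / (sx * sy)


# Hypothetical data set (an assumption, not taken from the question bank).
hours_studied = [1, 2, 3, 4, 5]
exam_score = [2, 4, 6, 8, 10]

print(variance(hours_studied))
print(covariance(hours_studied, exam_score))
print(correlation(hours_studied, exam_score))
```

Variance describes the spread of one variable, covariance describes how two variables move together in their original units, and correlation removes the units so that different pairs of variables can be compared.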

UNIT - III: MACHINE LEARNING


Overview of Machine learning concepts –Types of Machine learning - Linear Regression- model
assumptions-Classification and Regression algorithms- Naïve Bayes, K-Nearest Neighbors, logistic
regression- support vector machines (SVM), decision trees, and random forest.
PART – A
Q. No Questions BT Level Competence
1 A common danger in machine learning is overfitting. Justify. BTL 5 Evaluate
2 Usually the choice of a model involves a trade-off between precision and recall. Justify. BTL 5 Evaluate
3 What is Machine Learning? BTL 1 Remember

4 Create a chart that demonstrates overfitting. BTL 6 Create
5 How do supervised models differ from unsupervised models? BTL 4 Analyze
6 What is the reason for the word “Naïve” in Naïve Bayes classification? BTL 1 Remember
7 List the major categories of Machine Learning. BTL 1 Remember
8 What is a model with respect to Machine Learning? Give example. BTL 2 Understand
9 Define simple linear Regression. BTL 1 Remember
10 How do we find the hyperplane dimension given the dimension of the data in Support Vector Machine classification? BTL 2 Understand
11 Simulate the idea behind nearest neighbors classification. BTL 6 Create
12 Discuss about random forests. BTL 2 Understand
13 Give the formula for Conditional probability. BTL 2 Understand
14 Explain Bayes’s theorem. BTL 4 Analyze
15 How do we get random trees in Random Forest classification? BTL 3 Apply
16 List major categories of supervised learning. BTL 1 Remember
17 List out various regression models under supervised learning. BTL 1 Remember
18 Illustrate all possible decisions that can be made by the following decision tree. BTL 3 Apply
Is a Person Physically Fit?
Age < 30?
  Yes: Eats a lot of pizzas?
    Yes: Unfit!
    No: Fit!
  No: Exercises in the morning?
    Yes: Fit!
    No: Unfit!

19 Differentiate regression and Classification. BTL 4 Analyze


20 Show the formula for maximum likelihood estimation given sample data v1, …, vn that come from a distribution that depends on some unknown parameter θ. BTL 3 Apply
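
The "Is a Person Physically Fit?" tree in question 18 can be read as nested conditions. The sketch below is one possible encoding, assuming the leaves read left to right as Unfit, Fit, Fit, Unfit. The function names and the representative ages 25 and 40 are hypothetical.

```python
def is_physically_fit(age, eats_lots_of_pizza, exercises_in_morning):
    """Follow the tree from the root: the first split is on age."""
    if age < 30:
        # Under 30: the outcome depends only on pizza consumption.
        return "Unfit" if eats_lots_of_pizza else "Fit"
    # 30 or older: the outcome depends only on morning exercise.
    return "Fit" if exercises_in_morning else "Unfit"


def yes_no(flag):
    return "Yes" if flag else "No"


def all_decisions():
    """Enumerate the four root-to-leaf paths of the tree."""
    paths = []
    for under_30 in (True, False):
        age = 25 if under_30 else 40  # hypothetical representative ages
        question = "Eats a lot of pizzas?" if under_30 else "Exercises in the morning?"
        for answer in (True, False):
            # Only the question on the active branch affects the outcome.
            outcome = is_physically_fit(age, answer, answer)
            paths.append(("Age < 30? " + yes_no(under_30),
                          question + " " + yes_no(answer),
                          outcome))
    return paths


for path in all_decisions():
    print(" -> ".join(path))
```

Each printed line corresponds to one decision the tree can make, so the tree yields four decisions in total.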
PART – B
1 Write a routine for logistic regression and explain it with the necessary data and charts. (13) BTL 1 Remember
2 Explain the following with suitable example. BTL 4 Analyze
i. Simple Linear Regression. (6)
ii. Multiple Regression. (7)
3 Describe the K-Nearest Neighbors predictive model with a suitable routine and example. (13) BTL 1 Remember
4 Write the formula for Bayes’s Theorem and explain the Naïve Bayes classifier with the necessary routine. (13) BTL 1 Remember
5 Discuss in detail the various Supervised Machine Learning techniques. (13) BTL 2 Understand
6 Construct a decision tree for the following data and explain the various paths in the tree that lead to various decisions. (13) BTL 5 Evaluate

Outlook   Temp  Humidity  Windy  Play Golf
Rainy     Hot   High      False  No
Rainy     Hot   High      True   No
Overcast  Hot   High      False  Yes
Sunny     Mild  High      False  Yes
Sunny     Cool  Normal    False  Yes
Sunny     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Rainy     Mild  High      False  No
Rainy     Cool  Normal    False  Yes
Sunny     Mild  Normal    False  Yes
Rainy     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Sunny     Mild  High      True   No

7 Discuss random forest with suitable algorithms and examples. (13) BTL 2 Understand
8 Develop a routine for Support Vector Machine for two-dimensional data. Validate the algorithm with a suitable example. (13) BTL 6 Create
9 Describe in detail about the following. BTL 1 Remember
i. Support Vector Machine. (7)
ii. Hyper Plane. (6)
10 Differentiate the classification model and regression model of machine learning with suitable examples. (13) BTL 4 Analyze
11 i. Write short notes on Random Trees. (6) BTL 2 Understand
ii. Explain random forest with example. (7)
12 Explain the Support Vector Machine classification for three-dimensional data with the necessary routine. (13) BTL 4 Analyze
13 Illustrate decision trees with suitable examples. (13) BTL 3 Apply
14 i. Show a formula for maximum likelihood estimation. (6) BTL 3 Apply
ii. Prove the working of Naïve Bayes classifier with necessary routine. (7)
PART – C
1 Construct a decision tree for sample data of your own and evaluate the various decisions that can be arrived at based on the decision tree. (15) BTL 5 Evaluate
2 Create your own three-dimensional data and classify them using Support Vector Machine. (15) BTL 6 Create
3 Evaluate random trees and explain random forest. (15) BTL 5 Evaluate

4 Construct a decision tree for the following data and explain the various paths in the tree that lead to various decisions. (15) BTL 6 Create

Outlook   Temp  Humidity  Windy  Play Golf
Rainy     Hot   High      False  No
Rainy     Hot   High      True   No
Overcast  Hot   High      False  Yes
Sunny     Mild  High      False  Yes
Sunny     Cool  Normal    False  Yes
Sunny     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Rainy     Mild  High      False  No
Rainy     Cool  Normal    False  Yes
Sunny     Mild  Normal    False  Yes
Rainy     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Sunny     Mild  High      True   No
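
The question does not say which splitting criterion to use. As one possible starting point, the sketch below assumes an ID3-style approach and computes the entropy-based information gain of each attribute in the Play Golf table, to suggest which attribute could sit at the root. The "Over roast" entries are read as "Overcast". The names `ROWS`, `entropy` and `information_gain` are illustrative.

```python
import math
from collections import Counter

COLUMNS = ["Outlook", "Temp", "Humidity", "Windy", "Play Golf"]
ROWS = [
    ("Rainy", "Hot", "High", "False", "No"),
    ("Rainy", "Hot", "High", "True", "No"),
    ("Overcast", "Hot", "High", "False", "Yes"),
    ("Sunny", "Mild", "High", "False", "Yes"),
    ("Sunny", "Cool", "Normal", "False", "Yes"),
    ("Sunny", "Cool", "Normal", "True", "No"),
    ("Overcast", "Cool", "Normal", "True", "Yes"),
    ("Rainy", "Mild", "High", "False", "No"),
    ("Rainy", "Cool", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "Normal", "False", "Yes"),
    ("Rainy", "Mild", "Normal", "True", "Yes"),
    ("Overcast", "Mild", "High", "True", "Yes"),
    ("Overcast", "Hot", "Normal", "False", "Yes"),
    ("Sunny", "Mild", "High", "True", "No"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attr_index, label_index=-1):
    """Entropy of the labels minus the weighted entropy after splitting on one attribute."""
    base = entropy([row[label_index] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[label_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Gain for each of the four input attributes (the last column is the label).
gains = {COLUMNS[i]: information_gain(ROWS, i) for i in range(len(COLUMNS) - 1)}
best_attribute = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("Suggested root:", best_attribute)
```

Under this criterion Outlook has the largest gain (about 0.247), so it is a natural choice for the root. The Overcast branch is already pure (all Yes), and the Rainy and Sunny branches can be split again by applying the same calculation to their subsets.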

UNIT - IV: PROGRAMMING TOOLS FOR DATA SCIENCE


Introduction to Programming Tools for Data Science-Toolkits using Python: Matplotlib, NumPy, Scikit-
learn, NLTK-Visualizing Data: Bar Charts, Line Charts and Scatterplots-Working with data: Reading Files,
Scraping the Web, Using APIs (Example: Using the Twitter APIs).

PART – A
Q. No Questions BT Level Competence
1 What is SAS? BTL1 Remember
2 Define data visualization in machine learning. BTL1 Remember
3 Give the features of NumPy. BTL2 Understand
4 What is meant by Matplotlib? Give features of Matplotlib. BTL1 Remember
5 Give the expansion for NLTK in machine learning and explain. BTL2 Understand
6 List any four data science tools. BTL1 Remember
7 Describe about Apache Spark. BTL1 Remember
8 Predict the features of Scikit. BTL2 Understand
9 Compare R and Python. BTL4 Analyze

10 Distinguish Statistics and Data Science. BTL2 Understand
11 Classify the different visualization tools. BTL3 Apply
12 Develop a line chart for the following data. BTL6 Create
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]
13 Which language is best for learning data science? Illustrate why. BTL3 Apply
14 Summarize MATLAB. BTL5 Evaluate
15 Point out the components of Data Science. BTL4 Analyze
16 Compare various data science languages. BTL4 Analyze
17 Select the best tool or language for data science and give justification. BTL5 Evaluate
18 Illustrate line charts with an example. BTL3 Apply
19 Identify the tools for Data Science. BTL1 Remember
20 Develop a bar chart for the following data. BTL6 Create
movies = ["Annie Hall", "Ben-Hur", "Casablanca", "Gandhi", "West Side Story"]
num_oscars = [5, 11, 3, 8, 10]
PART – B
1 i. Describe NumPy in detail. (6) BTL1 Remember
ii. Write a Python program that uses NumPy and explain it. (7)
2 Describe the following. BTL1 Remember
i. NumPy. (7)
ii. Scikit. (6)
3 i. List the different types of charts. (7) BTL1 Remember
ii. Explain any one chart in detail with an Example.(6)
4 Discuss various Toolkits in Python in detail.(13) BTL2 Understand
5 Describe various web scraping methods in detail.(13) BTL2 Understand
6 Illustrate Matplotlib with an example. (13) BTL3 Apply
7 Explain different visualization tools in detail with an example. (13) BTL4 Analyze
8 Point out various features of Toolkits that can be used with Python. (13) BTL4 Analyze
9 Write about estimators and explain how they can be fitted to some data using the fit method. (13) BTL5 Evaluate
10 Write a program that loads the Iris dataset, splits it into train and test sets, and computes the accuracy score of a pipeline on the test data. (13) BTL6 Create
11 i. Write a Python program to read a file. (7) BTL3 Apply
ii. Illustrate the flow of the program. (6)
12 Describe the following. BTL1 Remember
i. MATLAB. (7)
ii. Python. (6)
13 Explain in detail about the following. BTL4 Analyze
i. Line chart. (6)
ii. Bar chart. (7)
14 Describe NLTK. Explain the steps to use it in Python. (13) BTL2 Understand
PART – C
1 Develop a line chart to visualize a data set of your choice and give a detailed explanation of observations from the chart. (15) BTL6 Create
2 Analyze how to construct a bar chart for a data set and explain it in detail. (15) BTL4 Analyze
3 Explain various methods of scraping the web in detail. (15) BTL5 Evaluate
4 Prepare a program to read a file and discuss its working. (15) BTL6 Create

UNIT - V: CASE STUDIES OF DATA SCIENCE APPLICATION


Weather forecasting-Stock market prediction-Object recognition- Real Time Sentiment Analysis.
PART – A
Q. No Questions BT Level Competence
1 What is weather forecasting? BTL1 Remember
2 Define precipitation. BTL1 Remember
3 Give the advantages of weather forecasting. BTL2 Understand
4 What is Object Recognition? BTL1 Remember
5 Give need for opinion mining. BTL2 Understand
6 Name the applications of Sentiment Analysis. BTL1 Remember
7 Name the applications of Object Detection. BTL1 Remember
8 Predict the importance of opinions. BTL2 Understand
9 Point out the role of the web in Sentiment Analysis. BTL4 Analyze
10 Distinguish between the computer vision tasks Image Classification and Object Localization. BTL2 Understand
11 Classify the different computer vision tasks. BTL3 Apply
12 Develop sample input and output for Object Detection. BTL6 Create
13 Which is said to be the primary source of atmospheric science? BTL3 Apply
14 Summarize the Role of Modeling to Predict Stock Prices. BTL5 Evaluate
15 Point out the importance of Stock Market. BTL4 Analyze

16 Compare different computer vision tasks. BTL4 Analyze
17 Can Data Science be used in Stock Market Analysis? Justify. BTL5 Evaluate
18 How are weather forecasts made? BTL3 Apply
19 List three modules of R-CNN. BTL1 Remember
20 Develop sample input and output for Object Localization. BTL6 Create
PART – B
1 i. Describe how data is a crucial part of weather predictions. (6) BTL1 Remember
ii. How is weather data an aid for many events? (7)
2 Describe the following. BTL1 Remember
i. Image Classification. (6)
ii. Object Localization. (7)
3 Describe the following. BTL1 Remember
i. A Twitter NLP chain. (5)
ii. NL processor and Ad-hoc NL processor. (8)
4 Discuss various subprocesses involved in the complete process of data science for weather prediction. (13) BTL2 Understand
5 Describe YOLO Model Family. (13) BTL2 Understand
6 Write in detail about R-CNN Model Family. (13) BTL3 Apply
7 Explain the schema that shows the process of the water cycle and precipitation occurrence. (13) BTL4 Analyze
8 Compare the following computer vision tasks and discuss each task in detail. BTL4 Analyze
i. Object Localization. (6)
ii. Object Detection. (7)
9 Summarize the predictions made by the YOLO model. (13) BTL5 Evaluate
10 Develop code to prepare the input for the LSTM model. (13) BTL6 Create
11 i. Write short notes on R-CNN. (7) BTL4 Analyze
ii. Illustrate Satellite Imagery and Sensor Data. (6)
12 Describe the following. BTL1 Remember
i. Image Classification. (6)
ii. Object Localization. (7)
13 i. Discuss in detail about Satellite Imagery and Sensor Data in weather forecasting. (7) BTL3 Apply
ii. Explain the Stock Market with a suitable example. (6)
14 Describe various computer vision tasks in object recognition. (13) BTL2 Understand

PART – C
1 Develop a case study of Sentiment Analysis in Twitter. (15) BTL6 Create
2 Explain how condensation and coalescence are important parts of the water cycle and how data is collected from it. (15) BTL5 Evaluate
3 Explain Fast R-CNN. (15) BTL5 Evaluate
4 Develop a case study on Google Stock Price Prediction Using LSTM. (15) BTL6 Create
