FDS Unit 1 QB

The document is a question bank for the course AD3491 - Fundamentals of Data Science and Analytics, covering various topics such as characteristics of quality data, definitions of data science, applications, and techniques like outlier detection and data cleansing. It also compares data science and big data, outlines challenges and advantages of data science, and discusses the data science process and methodologies. Additionally, it includes detailed questions for both Part A and Part B, focusing on practical applications and theoretical understanding of data science concepts.

Uploaded by

poornima.priyanka


www.BrainKart.com

DEPARTMENT OF INFORMATION TECHNOLOGY

Academic Year: 2024-25 (Even Semester)
QUESTION BANK
Course Code & Name: AD3491-Fundamentals of Data Science and Analytics
Name of the Faculty member: Mrs.R.Poornima @ Priyanka

UNIT I- INTRODUCTION TO DATA SCIENCE

Part A

1. What are the characteristics of quality data?

• Validity - The degree to which the data conforms to defined business rules
or constraints.
• Accuracy - The degree to which the data is close to the true values.
• Completeness - The degree to which all required data is known.
• Consistency - The degree to which the data is consistent within the same data
set and/or across multiple data sets.
• Uniformity - The degree to which the data is specified using the same unit of
measure.
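
Three of these characteristics can be measured directly on a small data set. The following is a minimal sketch only: the records, the field names (`age`, `height`) and the age rule of 0 to 120 are illustrative assumptions, not part of the course material.

```python
# Hypothetical records; field names and the business rule are assumptions.
records = [
    {"id": 1, "age": 34, "height": "170 cm"},
    {"id": 2, "age": None, "height": "1.65 m"},
    {"id": 3, "age": -5, "height": "180 cm"},
]

def completeness(rows, field):
    """Fraction of rows in which the field is known (not None)."""
    known = sum(1 for r in rows if r.get(field) is not None)
    return known / len(rows)

def validity(rows, field, rule):
    """Fraction of the known values that satisfy a business rule."""
    values = [r[field] for r in rows if r.get(field) is not None]
    return sum(1 for v in values if rule(v)) / len(values)

def units_used(rows, field):
    """Set of unit suffixes found in a 'value unit' text field."""
    return {r[field].split()[-1] for r in rows if r.get(field)}

age_completeness = completeness(records, "age")
age_validity = validity(records, "age", lambda a: 0 <= a <= 120)
# More than one unit in the set means the field is not uniform.
height_units = units_used(records, "height")
```

Here one age is missing (completeness), one age is negative (validity), and the heights mix centimetres and metres (uniformity).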

2. What do you mean by Data Science? Or Define Data Science.

• Data science is the domain of study that deals with vast volumes of data using
modern tools and techniques to find hidden patterns, derive meaningful
information, and make business decisions.
• Data science can be explained as the entire process of gathering actionable
insights from raw data, involving concepts such as pre-processing of data, data
modelling, statistical analysis, data analysis, and machine learning algorithms.
• The main purpose of data science is to support better decision making.

3. List out at least five applications of data science.

• Finance and Fraud & Risk Detection.
• Healthcare.
• Internet Search and Website Recommendations.
• Retail Marketing and Targeted Advertising.
• Advanced Image Recognition.
• Speech Recognition.
• Airline Route Planning.

4. Write a short note on outlier detection and state its real-time applications.
• In statistics, an outlier is a data point that differs significantly from other
observations.
• An outlier detection technique (ODT) is used to detect anomalous
observations/samples that do not fit the typical/normal statistical distribution of a
dataset.
• Applications of outlier detection include spam detection, credit card fraud
detection, and intrusion detection in cyber security.
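
One common outlier detection rule is the interquartile range (IQR) rule. The sketch below is only an illustration of that rule, not necessarily the technique the course intends: the sample values and the conventional factor of 1.5 are assumptions.

```python
import statistics

def iqr_outliers(values, factor=1.5):
    """Return the values lying outside the fences Q1 - factor*IQR and Q3 + factor*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical transaction amounts with one suspicious value.
amounts = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(amounts)
```

The same rule could be used as a first filter in fraud detection, flagging amounts that lie far outside a customer's usual range for manual review.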

5. What contents should be included in a project charter?

A project charter requires teamwork, and your input covers at least the following:
i. A clear research goal
ii. The project mission and context
iii. How you're going to perform your analysis
iv. What resources you expect to use
v. Proof that it's an achievable project, or proof of concepts
vi. Deliverables and a measure of success
vii. A timeline

6. Define Data Cleansing.

• Data cleansing is the process of fixing or removing incorrect, corrupted, incorrectly
formatted, duplicate, or incomplete data within a dataset.
• When combining multiple data sources, there are many opportunities for data to be
duplicated or mislabeled. If data is incorrect, outcomes and algorithms are unreliable,
even though they may look correct.
• Data cleansing is also referred to as data cleaning or data scrubbing.
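
The definition above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the rows and field names are hypothetical, incomplete rows are dropped (fixing them would be an equally valid choice), and formatting is normalized by trimming spaces and applying title case.

```python
# Hypothetical rows merged from two sources; names and fields are assumptions.
raw_rows = [
    {"name": " Alice ", "city": "chennai"},
    {"name": "Alice", "city": "Chennai"},   # duplicate once formatting is fixed
    {"name": "Bob", "city": None},          # incomplete record
    {"name": "carol", "city": " Madurai"},
]

def clean(rows):
    """Drop incomplete rows, normalize text formatting, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Incomplete: any value is missing or blank.
        if any(v is None or not str(v).strip() for v in row.values()):
            continue
        # Incorrectly formatted: trim whitespace and use consistent casing.
        fixed = {k: str(v).strip().title() for k, v in row.items()}
        # Duplicate: the same normalized record was already kept.
        key = tuple(sorted(fixed.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(fixed)
    return cleaned

cleaned_rows = clean(raw_rows)
```

Normalizing before checking for duplicates matters: the first two rows only become identical after the formatting is fixed.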
7. What is brushing and linking in exploratory data analysis? (April/May 2023)
Brushing and Linking is the connection of two or more views of the same data, such that
a change to the representation in one view affects the representation in the other.
Brushing and linking is also an important technique in interactive visual analysis, a
method for performing visual exploration and analysis of large, structured data sets.
Linking and brushing is one of the most powerful interactive tools for doing exploratory
data analysis using visualization.
8. How does confusion matrix define the performance of classification
algorithm? (April/May 2023)
A confusion matrix is a matrix that summarizes the performance of a machine learning
model on a set of test data. It is often used to measure the performance of classification
models, which aim to predict a categorical label for each input instance. The matrix
displays the number of true positives (TP), true negatives (TN), false positives (FP), and
false negatives (FN) produced by the model on the test data.
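
The four counts, and the usual measures derived from them, can be computed as follows. This is a minimal sketch for a binary classifier: the label lists are made-up test data, and label 1 is assumed to be the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary classifier."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1   # predicted positive, actually negative
        elif actual == positive:
            fn += 1   # predicted negative, actually positive
        else:
            tn += 1
    return tp, tn, fp, fn

# Hypothetical actual and predicted labels on a test set.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # share of predicted positives that are correct
recall = tp / (tp + fn)                      # share of actual positives that are found
```

Accuracy alone can mislead when the classes are imbalanced, which is why precision and recall are usually reported alongside it.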
9. Specify the facets of data with an example for each, and how benchmarking
tools and scheduling tools support the data science process.
The various facets of data are:
⚫ Structured
o SQL, or Structured Query Language, is the preferred way to manage and
query data that resides in databases.
⚫ Unstructured
o Example: E-mail
⚫ Natural language
o It is another type of unstructured data.
⚫ Machine-generated
⚫ Graph-based
⚫ Audio, video, and images
⚫ Streaming
10. Outline the data cleansing techniques, describe the types of errors, and
give the solution for each.
Data cleansing is a subprocess of the data science process. It focuses on removing
errors in the data so that the data becomes a true and consistent representation of the
processes it originates from.


11. Define outlier and show the distribution with an example. How does it differ
from a sanity check?
An outlier is an observation that lies an abnormal distance from other values in a random
sample from a population. A machine learning model sanity check, by contrast, is a set of
tests performed in a pre-production environment to detect systematic errors and biases,
so you can ensure models work as expected before deploying them to production.

12. Define Big Data.

Big data is huge, voluminous data, information, or relevant statistics acquired by
large organizations and ventures. Many software tools and data storage systems have been
created for it, because it is difficult to process big data manually.
13. Compare Data Science vs Big Data

Data Science | Big Data
Data Science is an area/domain. | Big Data is a technique to collect, maintain and process huge information.
It is about the collection, processing, analyzing and utilizing of data in various operations. It is more conceptual. | It is about extracting vital and valuable information from a huge amount of data.
It is a field of study just like Computer Science, Applied Statistics, Applied Mathematics, or Data Base Management Systems. | It is a technique of tracking and discovering trends in complex data sets.
The goal is to build data-dominant products for a venture. | The goal is to make data more vital and usable, i.e. by extracting only important information from the huge data within existing traditional aspects.
Tools mainly used in Data Science include SAS, R, Python, etc. | Tools mostly used in Big Data include Hadoop, Spark, Flink, etc.
It is a superset of Big Data, as data science consists of data scraping, cleaning, visualization, statistics and many more techniques. | It is a subset of Data Science, as its mining activities form part of the data science pipeline.
It is mainly used for scientific purposes. | It is mainly used for business purposes and customer satisfaction.
Uses mathematics and statistics extensively along with programming skills to develop a model to test the hypothesis and make decisions in the business. | Used by businesses to track their presence in the market, which helps them develop agility and gain a competitive advantage over others.
Applications: Internet search, digital advertisements, text-to-speech recognition, risk detection, and other activities. | Applications: Telecommunication, financial services, health and sports, research and development, and security and law enforcement.

14. List out the characteristics of big data.

• Volume
• Variety
• Velocity
• Veracity
• Value
15. Give the various challenges of big data.
• Data Capture
• Curation
• Storage
• Search
• Sharing
• Transfer
• Visualization
16. State the need of Data Science.
• Data Science is used in many industries in the world today, e.g. banking,
consultancy, healthcare, and manufacturing.
• Examples of where Data Science is needed:
o For route planning: to discover the best routes to ship
o To foresee delays for flight/ship/train etc. (through predictive analysis)
o To create promotional offers
o To find the best suited time to deliver goods
o To forecast the next year's revenue for a company
o To analyze the health benefits of training
o To predict who will win elections
17. What are the advantages/benefits of data science?
• Commercial companies in every business wish to
o analyze and gain insights into their customers, processes, staff,
competition, and products.
• Many companies use data science to
o offer customers a better user experience,
o cross-sell, up-sell, and personalize their offerings.
• Human resource professionals use people analytics and text mining to
o screen candidates,
o monitor the mood of employees,
o study informal networks among coworkers.
• Financial institutions use data science to
o predict stock markets,
o determine the risk of lending money,
o learn how to attract new clients for their services.
• Many governmental organizations not only rely on internal data scientists to
discover valuable information, but also share their data with the public. You
can use this data to gain insights or build data-driven applications.
• Nongovernmental organizations (NGOs)
o can use data as a means to get funding.
o Many data scientists devote part of their time to helping NGOs, because NGOs
often lack the resources to collect data and employ data scientists.
• Universities use data science in their research but also to
o enhance the study experience of their students.
o The rise of massive open online courses (MOOCs) produces a lot of data, which
allows universities to study how this type of learning can complement
traditional classes.
18. Give an overview of the techniques that handle missing data and mention the pros and
cons of each.

Part B
1. Discuss in detail about step-by-step process in Data Science with neat diagram
(Analyze)

2. Discuss briefly about: (Analyze)
i. Life cycle of Data Science
ii. Machine Learning in Data Science
3. Exemplify in detail about different facets of data with examples. (Analyze)
(April/May 2023)
4. Sketch and outline the step-by-step activities in the data science process. (Remember)
(April/May 2023)
5. Explain in detail about cleansing, integrating, and transforming data with example.
(Analyze) (April/May 2023)
6. Discuss the execution of a linear prediction model on semi-random data and give the
Python code for the same with model diagnostics and comparison. (Analyze)
7. Give a detailed view on the methodologies of transforming data with examples.
(Understand)
8. Discuss in detail about the characteristics of data, benefits, applications. (Understand)
9. Discuss the execution of a K-Nearest Neighbour model with a confusion matrix on
semi-random data and give the Python code for the same with model diagnostics and
comparison. (Analyze)
10. Give a detailed case study of building a recommender system inside a database with all
required steps for a data science model. (Analyze)
11. Give a detailed case study of predicting malicious URLs from the set of URLs data
with all the required steps of data science process. (Analyze)
