FDS Unit 1 QB

The document is a question bank for the course AD3491 - Fundamentals of Data Science and Analytics, covering various topics such as characteristics of quality data, definitions of data science, applications, and techniques like outlier detection and data cleansing. It also compares data science and big data, outlines challenges and advantages of data science, and discusses the data science process and methodologies. Additionally, it includes detailed questions for both Part A and Part B, focusing on practical applications and theoretical understanding of data science concepts.
www.BrainKart.com

DEPARTMENT OF INFORMATION TECHNOLOGY


Academic Year: 2024-25 (Even Semester)
QUESTION BANK
Course Code & Name: AD3491-Fundamentals of Data Science and Analytics
Name of the Faculty member: Mrs.R.Poornima @ Priyanka

UNIT I- INTRODUCTION TO DATA SCIENCE


Part A

1. What are the characteristics of quality data?

• Validity - The degree to which your data conforms to defined business rules or constraints.
• Accuracy - The degree to which your data is close to the true values.
• Completeness - The degree to which all required data is known.
• Consistency - The degree to which your data is consistent within the same data set and/or across multiple data sets.
• Uniformity - The degree to which the data is specified using the same unit of measure.
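
As a rough illustration of how two of these characteristics might be measured, the sketch below computes completeness and validity for a small set of records. The field names, the sample records, and the rule that an age must lie between 0 and 120 are assumptions made up for this example, not part of the question bank.

```python
# Hypothetical records; None marks a value that is not known.
records = [
    {"name": "Asha", "age": 34},
    {"name": "Ravi", "age": None},
    {"name": "Meena", "age": 250},  # violates the assumed age rule
    {"name": "John", "age": 41},
]


def completeness(rows, field):
    """Fraction of rows in which the field is known (not None)."""
    known = sum(1 for row in rows if row.get(field) is not None)
    return known / len(rows)


def validity(rows, field, rule):
    """Fraction of the known values that satisfy the given business rule."""
    values = [row[field] for row in rows if row.get(field) is not None]
    return sum(1 for value in values if rule(value)) / len(values)


# Assumed business rule: a valid age lies between 0 and 120 inclusive.
age_completeness = completeness(records, "age")
age_validity = validity(records, "age", lambda age: 0 <= age <= 120)
print(age_completeness, age_validity)
```

Here three of the four ages are known, and two of those three known ages satisfy the assumed rule.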

2. What do you mean by Data Science? Or Define Data Science.

• Data science is the domain of study that deals with vast volumes of data using modern tools and techniques to find hidden patterns, derive meaningful information, and make business decisions.
• Data science can be explained as the entire process of gathering actionable insights from raw data, which involves concepts like pre-processing of data, data modelling, statistical analysis, data analysis, machine learning algorithms, etc.
• The main purpose of data science is to support better decision making.

3. List out at least five applications of data science.

• Finance and Fraud & Risk Detection.
• Healthcare.
• Internet Search and Website Recommendations.
• Retail Marketing and Targeted Advertising.
• Advanced Image Recognition.
• Speech Recognition.
• Airline Route Planning.

4. Write a short note on outlier detection and state its real-time applications.
• In statistics, an outlier is a data point that differs significantly from other observations.
• An outlier detection technique (ODT) is used to detect anomalous observations/samples that do not fit the typical/normal statistical distribution of a dataset.
• Applications of outlier detection include spam detection, credit card fraud detection, and intrusion detection in cyber security.
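
One common outlier detection technique, among many, is Tukey's interquartile-range (IQR) rule. The sketch below applies it to a list of transaction amounts. The amounts are made-up sample data and the multiplier 1.5 is the conventional choice for this rule. It is an illustration, not a prescribed method.

```python
import statistics

# Hypothetical transaction amounts; 500 is the suspicious one.
amounts = [20, 22, 19, 24, 21, 23, 20, 500]


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [value for value in values if value < lower or value > upper]


outliers = iqr_outliers(amounts)
print(outliers)
```

In a fraud detection setting, the flagged amounts would be passed on for review rather than removed automatically.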

5. What contents should be included in a project charter?

A project charter requires teamwork, and your input covers at least the following:
i. A clear research goal
ii. The project mission and context
iii. How you’re going to perform your analysis
iv. What resources you expect to use
v. Proof that it’s an achievable project, or proof of concepts
vi. Deliverables and a measure of success
vii. A timeline

6. Define Data Cleansing.

• Data cleansing is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset.
• When combining multiple data sources, there are many opportunities for data to be duplicated or mislabeled. If data is incorrect, outcomes and algorithms are unreliable, even though they may look correct.
• Data cleansing is also referred to as data cleaning or data scrubbing.
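
The definition above can be sketched in a few lines of code. The example below removes incomplete records, fixes inconsistent formatting, and drops the duplicates that appear once the formatting is fixed. The records, the field names, and the choice to drop incomplete rows instead of repairing them are assumptions made for illustration.

```python
# Hypothetical raw rows, as they might look after merging two sources.
raw_rows = [
    {"customer": " Asha ", "city": "chennai"},
    {"customer": "Asha", "city": "Chennai"},   # duplicate once normalized
    {"customer": "Ravi", "city": None},        # incomplete record
    {"customer": "meena", "city": " Madurai"},
]


def clean(rows):
    """Drop incomplete rows, normalize text fields, and remove duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        # Remove incomplete records (any missing field).
        if any(value is None for value in row.values()):
            continue
        # Fix formatting: trim whitespace and use consistent capitalization.
        fixed = {key: value.strip().title() for key, value in row.items()}
        # Remove duplicates, keeping the first occurrence.
        fingerprint = tuple(sorted(fixed.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(fixed)
    return cleaned


cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```

Note that the two "Asha" rows only become recognizable as duplicates after normalization, which is why the formatting fix runs before the duplicate check.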
7. What is brushing and linking in exploratory data analysis? (April/May 2023)
Brushing and Linking is the connection of two or more views of the same data, such that
a change to the representation in one view affects the representation in the other.
Brushing and linking is also an important technique in interactive visual analysis, a
method for performing visual exploration and analysis of large, structured data sets.
Linking and brushing is one of the most powerful interactive tools for doing exploratory
data analysis using visualization.
8. How does a confusion matrix define the performance of a classification
algorithm? (April/May 2023)
A confusion matrix is a matrix that summarizes the performance of a machine learning
model on a set of test data. It is often used to measure the performance of classification
models, which aim to predict a categorical label for each input instance. The matrix
displays the number of true positives (TP), true negatives (TN), false positives (FP), and
false negatives (FN) produced by the model on the test data.
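
A minimal sketch of how the four counts can be obtained for a binary classifier is given below. The label lists are made-up test data, and the function name `confusion_counts` is hypothetical. Accuracy, precision, and recall are the standard measures derived from the four counts.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP, and FN for a binary classification task."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted positive, actually positive
        elif p == positive:
            fp += 1   # predicted positive, actually negative
        elif a == positive:
            fn += 1   # predicted negative, actually positive
        else:
            tn += 1   # predicted negative, actually negative
    return tp, tn, fp, fn


# Hypothetical test labels (1 = positive class, 0 = negative class).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, tn, fp, fn, accuracy, precision, recall)
```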
9. Specify the facets of data with an example for each, and explain how benchmarking
tools and scheduling tools support the data science process.
The various facets of data are:
⚫ Structured
o SQL, or Structured Query Language, is the preferred way to manage and query data that resides in databases.
⚫ Unstructured
o Example: E-mail
⚫ Natural language
o It is another type of unstructured data.
⚫ Machine-generated
⚫ Graph-based
⚫ Audio, video, and images
⚫ Streaming
10. Outline the data cleansing techniques, describe the types of errors, and
give a solution for each.
Data cleansing is a subprocess of the data science process. It focuses on removing errors
in the data so that the data becomes a true and consistent representation of the processes
it originates from.


11. Define Outlier and show the distribution with an example. How does it differ
from a sanity check?
An outlier is an observation that lies an abnormal distance from other values in a random
sample from a population. A machine learning model sanity check, by contrast, is a set of
tests performed in a pre-production environment to detect systematic errors and biases,
so you can ensure models work as expected before deploying them to production.

12. Define Big Data.

Big data is huge, large, or voluminous data, information, or relevant statistics acquired
by large organizations and ventures. Many software tools and data storage systems have
been created for it, because it is difficult to process big data manually.
13. Compare Data Science vs Big Data

Data Science | Big Data
-------------|---------
Data Science is an area/domain. | Big Data is a technique to collect, maintain and process huge amounts of information.
It is about the collection, processing, analyzing and utilizing of data in various operations. It is more conceptual. | It is about extracting the vital and valuable information from a huge amount of data.
It is a field of study, just like Computer Science, Applied Statistics, Applied Mathematics, or Database Management Systems. | It is a technique for tracking and discovering trends in complex data sets.
The goal is to build data-dominant products for a venture. | The goal is to make data more vital and usable, i.e. by extracting only the important information from the huge data within existing traditional aspects.
Tools mainly used in Data Science include SAS, R, Python, etc. | Tools mostly used in Big Data include Hadoop, Spark, Flink, etc.
It is a superset of Big Data, as data science consists of data scraping, cleaning, visualization, statistics and many more techniques. | It is a subset of Data Science, as its mining activities are one stage in the Data Science pipeline.
It is mainly used for scientific purposes. | It is mainly used for business purposes and customer satisfaction.
Uses mathematics and statistics extensively, along with programming skills, to develop a model to test the hypothesis and make decisions in the business. | Used by businesses to track their presence in the market, which helps them develop agility and gain a competitive advantage over others.
Applications: Internet search, digital advertisements, text-to-speech recognition, risk detection, and other activities. | Applications: telecommunication, financial services, health and sports, research and development, and security and law enforcement.

14. List out the characteristics of big data.

• Volume
• Variety
• Velocity
• Veracity
• Value
15. Give the various challenges of big data.
• Data Capture
• Curation
• Storage
• Search
• Sharing
• Transfer
• Visualization
16. State the need for Data Science.
• Data Science is used in many industries in the world today, e.g. banking, consultancy, healthcare, and manufacturing.
• Examples of where Data Science is needed:
o For route planning: to discover the best routes to ship
o To foresee delays for flights/ships/trains etc. (through predictive analysis)
o To create promotional offers
o To find the best suited time to deliver goods
o To forecast the next year's revenue for a company
o To analyze the health benefits of training
o To predict who will win elections
17. What are the advantages/benefits of data science?
• Commercial companies in all businesses wish to
o analyze and gain insights into their customers, processes, staff, competition, and products.
• Many companies use data science to
o offer customers a better user experience,
o cross-sell, up-sell, and personalize their offerings.
• Human resource professionals use people analytics and text mining to
o screen candidates,
o monitor the mood of employees,
o study informal networks among coworkers.
• Financial institutions use data science to
o predict stock markets,
o determine the risk of lending money,
o learn how to attract new clients for their services.
• Many governmental organizations
o not only rely on internal data scientists to discover valuable information, but also share their data with the public. You can use this data to gain insights or build data-driven applications.
• Nongovernmental organizations (NGOs)
o can use data science as a means of raising funds.
o Many data scientists devote part of their time to helping NGOs, because NGOs often lack the resources to collect data and employ data scientists.
• Universities use data science in their research but also to
o enhance the study experience of their students.
o The rise of massive open online courses (MOOCs) produces a lot of data, which allows universities to study how this type of learning can complement traditional classes.
18. Give an overview of the techniques that handle missing data and mention the pros and
cons of each.

Part B
1. Discuss in detail the step-by-step process in Data Science with a neat diagram.
(Analyze)
2. Discuss briefly about: (Analyze)
i. Life cycle of Data Science
ii. Machine Learning in Data Science
3. Exemplify in detail about different facets of data with examples. (Analyze)
(April/May 2023)
4. Sketch and outline the step-by-step activities in the data science process. (Remember)
(April/May 2023)
5. Explain in detail about cleansing, integrating, and transforming data with example.
(Analyze) (April/May 2023)
6. Discuss the execution of a linear prediction model on semi-random data and give the
Python code for the same with model diagnostics and comparison. (Analyze)
7. Give a detailed view on the methodologies of transforming data with examples.
(Understand)
8. Discuss in detail about the characteristics of data, benefits, applications. (Understand)
9. Discuss the execution of a K-Nearest Neighbour model with a confusion matrix on
semi-random data and give the Python code for the same with model diagnostics and
comparison. (Analyze)
10. Give a detailed case study of building a recommender system inside a database with all
required steps for a data science model. (Analyze)
11. Give a detailed case study of predicting malicious URLs from the set of URLs data
with all the required steps of data science process. (Analyze)
