FDS Unit 1 QB
4. Write a short note on outlier detection and state its real-time applications.
In statistics, an outlier is a data point that differs significantly from other
observations.
An outlier detection technique (ODT) is used to detect anomalous
observations/samples that do not fit the typical/normal statistical distribution of a
dataset.
Applications of outlier detection include spam detection, credit card fraud
detection, and intrusion detection in cyber security.
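As an illustration of the idea above, here is a minimal sketch that flags outliers with the interquartile range (IQR) rule. The IQR rule is only one common technique and is an assumption here, since the answer does not name a specific method. The function name iqr_outliers, the multiplier k=1.5, and the transaction amounts are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical card transaction amounts; 950 lies far from the rest.
amounts = [12, 15, 14, 13, 16, 15, 14, 950]
flagged = iqr_outliers(amounts)
print(flagged)
```

In a fraud-detection setting, the flagged amounts would typically be passed on for review rather than rejected automatically, because an outlier is unusual but not necessarily wrong.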
11. Define outlier and show the distribution with an example. How does it differ
from a sanity check?
An outlier is an observation that lies an abnormal distance from other values in a random
sample from a population. A machine learning model sanity check, by contrast, is a set of
tests performed in a pre-production environment to detect systematic errors and biases, so
that you can ensure the model works as expected before deploying it to production.
An outlier is therefore a property of an individual data point, whereas a sanity check is
a test of the model's behaviour.
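The contrast can be sketched in a few lines of Python. This is an illustrative sketch only: the marks, the text_histogram helper, the toy_model stand-in, and the two checks inside sanity_check are all hypothetical and chosen just to show the difference between inspecting a distribution and testing a model.

```python
from collections import Counter

# Hypothetical sample: exam marks with one value far from the rest.
marks = [52, 55, 55, 58, 58, 58, 61, 61, 64, 98]

def text_histogram(values, bin_width=10):
    """Return one line per bin, e.g. '50-59 | ******', including empty bins."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        lines.append(f"{start}-{start + bin_width - 1} | {'*' * counts[start]}")
    return lines

# The lone mark in the 90-99 bin, separated by empty bins, is the outlier.
print("\n".join(text_histogram(marks)))

def sanity_check(predict):
    """Pre-deployment checks on a model's behaviour, not on single data points."""
    preds = [predict(hours) for hours in (0, 5, 10)]
    assert all(0 <= p <= 100 for p in preds), "prediction outside valid mark range"
    assert preds == sorted(preds), "more study hours should not lower the prediction"
    return True

# Hypothetical stand-in for a trained model: predicted mark from hours studied.
def toy_model(hours):
    return min(100, 40 + 5 * hours)

print(sanity_check(toy_model))
```

The histogram answers "which data point is unusual?", while sanity_check answers "does the model behave sensibly before deployment?".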
System.
Data Science: The goal is to build data-dominant products for a venture.
Big Data: The goal is to make data more vital and usable, i.e. by extracting only the
important information from huge data within existing traditional aspects.

Data Science: Tools mainly used in Data Science include SAS, R, Python, etc.
Big Data: Tools mostly used in Big Data include Hadoop, Spark, Flink, etc.

Data Science: It is a superset of Big Data, as data science consists of data scraping,
cleaning, visualization, statistics, and many more techniques.
Big Data: It is a subset of Data Science, as its mining activities form one stage in the
Data Science pipeline.

Data Science: It is mainly used for scientific purposes.
Big Data: It is mainly used for business purposes and customer satisfaction.

Data Science: Uses mathematics and statistics extensively, along with programming skills,
to develop a model to test the hypothesis and make decisions in the business.
Big Data: Used by businesses to track their presence in the market, which helps them
develop agility and gain a competitive advantage over others.

Data Science: Applied in internet search, digital advertisements, text-to-speech
recognition, risk detection, and other activities.
Big Data: Applied in telecommunication, financial services, health and sports, research
and development, and security and law enforcement.
Part B
1. Discuss in detail the step-by-step process in Data Science with a neat diagram.
(Analyze)
2. Discuss briefly about: (Analyze)
i. Life cycle of Data Science
ii. Machine Learning in Data Science
3. Exemplify in detail the different facets of data. (Analyze)
(April/May 2023)
4. Sketch and outline the step-by-step activities in the data science process. (Remember)
(April/May 2023)
5. Explain in detail the cleansing, integrating, and transforming of data with an example.
(Analyze) (April/May 2023)
6. Discuss the execution of a linear prediction model on semi-random data and give the
Python code for the same with model diagnostics and comparison. (Analyze)
7. Give a detailed view on the methodologies of transforming data with examples.
(Understand)
8. Discuss in detail the characteristics, benefits, and applications of data. (Understand)
9. Discuss the execution of a K-Nearest Neighbour model with a confusion matrix on
semi-random data and give the Python code for the same with model diagnostics and
comparison. (Analyze)
10. Give a detailed case study of building a recommender system inside a database with all
required steps for a data science model. (Analyze)
11. Give a detailed case study of predicting malicious URLs from a set of URL data
with all the required steps of the data science process. (Analyze)