
FDS 1

The document consists of a series of questions and answers related to data science concepts, including applications, definitions, and methods. Key topics covered include data types, outlier detection, data visualization libraries, and data cleaning techniques. Additionally, it discusses the 3Vs of data science and exploratory data analysis (EDA).

Uploaded by f95850369
Copyright © All Rights Reserved

Q1) Attempt any eight of the following : [8 × 1 = 8]

a) List any two applications of Data Science.


Answer: Healthcare analytics and fraud detection.

b) What is an outlier?
Answer: An outlier is a data point that differs significantly from other observations in a
dataset.

c) What are missing values?


Answer: Missing values are data points where no value is stored for a variable in an
observation.

d) Define variance.
Answer: Variance measures how far the values in a dataset are spread out from their mean. It is the average of the squared deviations from the mean.
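As a worked illustration, the sketch below computes variance by hand for a small made-up dataset and cross-checks it against Python's standard `statistics` module. The helper names are illustrative only. It shows both the population form (divide by n) and the sample form (divide by n − 1), since which one applies depends on whether the data is the whole population or a sample from it.

```python
from statistics import pvariance, variance

# Small made-up dataset used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]


def population_variance(values):
    """Mean of the squared deviations from the mean (divides by n)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Same sum of squares, divided by n - 1 (Bessel's correction)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / (len(values) - 1)


print(population_variance(data))  # 4.0: mean is 5, squared deviations sum to 32, 32 / 8
print(sample_variance(data))      # 32 / 7, about 4.571
# The standard-library functions compute the same two quantities.
print(pvariance(data), variance(data))
```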

e) What is a nominal attribute?


Answer: A nominal attribute is a categorical variable without any order, e.g., colors or
names.

f) What is data transformation?


Answer: It is the process of converting data into a format or structure suitable for analysis.

g) What is one-hot encoding?


Answer: It converts a categorical variable into binary vectors with one position per category; the position of the observed category is set to 1 and all others to 0.
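A minimal plain-Python sketch of one-hot encoding on a made-up list of colors; the helper name `one_hot_encode` is hypothetical. Categories are sorted so the column order is deterministic. In pandas, `pandas.get_dummies` performs the same kind of conversion on a column.

```python
def one_hot_encode(values):
    """Map each category to a binary vector containing a single 1."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {cat: i for i, cat in enumerate(categories)}
    vectors = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # mark the column of the observed category
        vectors.append(row)
    return categories, vectors


colors = ["red", "green", "blue", "green"]  # made-up data
cats, encoded = one_hot_encode(colors)
print(cats)     # ['blue', 'green', 'red']
print(encoded)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```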

h) What is the use of Bubble plot?


Answer: A bubble plot visualizes relationships between three variables using x, y, and
bubble size.

i) Define data visualisation.


Answer: It is the graphical representation of data and information to identify patterns
and insights.

j) Define standard deviation.


Answer: It measures the amount of variation or dispersion in a dataset. It is the square root of the variance, so it is expressed in the same units as the data.
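Because the standard deviation is the square root of the variance, it returns to the original units of the data (variance is in squared units). The sketch below computes the population standard deviation of some made-up numbers directly and compares it with `statistics.pstdev`; `population_std` is a hypothetical helper name.

```python
import math
from statistics import pstdev

# Small made-up dataset used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]


def population_std(values):
    """Square root of the population variance."""
    mu = sum(values) / len(values)
    var = sum((x - mu) ** 2 for x in values) / len(values)
    return math.sqrt(var)


print(population_std(data))  # 2.0, since the population variance is 4.0
print(pstdev(data))          # standard-library equivalent
```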

Q2) Attempt any four of the following : [4 × 2 = 8]

a) Differentiate structured and unstructured data.


Answer:

• Structured Data: Organized according to a fixed schema, typically rows and columns (e.g., relational databases, spreadsheets).

• Unstructured Data: Has no predefined schema or tabular organization (e.g., images, videos, emails).

b) What is inferential statistics?


Answer:
It uses a random sample of data to make inferences or predictions about a larger
population.
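A common inferential task is estimating a population mean from a sample. The sketch below computes an approximate 95% confidence interval for the mean using the normal approximation; the heights are made up, and `mean_confidence_interval` is a hypothetical helper. For small samples a t-distribution critical value is usually preferred, which Python's standard library does not provide, so treat this as an illustration of the idea rather than a recommended procedure.

```python
import math
from statistics import NormalDist, mean, stdev

# Made-up sample of 10 heights in cm, standing in for a larger population.
sample = [168, 172, 165, 180, 175, 170, 169, 174, 171, 176]


def mean_confidence_interval(data, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(data)
    m = mean(data)
    se = stdev(data) / math.sqrt(n)                 # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m - z * se, m + z * se


low, high = mean_confidence_interval(sample)
print(f"sample mean = {mean(sample)}, 95% CI approx. ({low:.1f}, {high:.1f})")
```

The sample mean is a single point estimate; the interval expresses how uncertain that estimate is about the population value.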

c) What do you mean by data preprocessing?


Answer:
Data preprocessing is a technique to clean, transform, and organize raw data into a usable
format.

d) Define data discretization.


Answer:
Data discretization is the process of converting continuous data into discrete buckets or
intervals.
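One simple discretization scheme is equal-width binning: split the range of the data into k intervals of the same width and replace each value with the index of its interval. The sketch below uses made-up ages and a hypothetical helper `equal_width_bins`; pandas offers `pandas.cut`, which performs equal-width binning when given an integer number of bins.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k equal-width intervals.

    Returns (bin_indices, bin_edges). The maximum value is placed in the
    last bin rather than in a bin of its own.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    indices = []
    for x in values:
        i = int((x - lo) / width) if width > 0 else 0
        indices.append(min(i, k - 1))  # clamp the maximum into the last bin
    edges = [lo + j * width for j in range(k + 1)]
    return indices, edges


ages = [5, 17, 23, 31, 45, 52, 68, 80]  # made-up ages
labels, edges = equal_width_bins(ages, 3)
print(edges)   # [5.0, 30.0, 55.0, 80.0]
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2]
```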

e) What is visual encoding?


Answer:
Visual encoding refers to how data values are mapped to visual elements like position, size,
shape, or color in a chart or graph.

Q3) Attempt any two of the following : [2 × 4 = 8]

a) Explain outlier detection methods in brief.


Answer:

1. Z-Score Method: Measures how many standard deviations a point lies from the mean; points whose absolute z-score exceeds a chosen threshold (commonly 3) are flagged as outliers.

2. IQR Method (Interquartile Range): With IQR = Q3 − Q1, values lying below Q1 − 1.5×IQR or above Q3 + 1.5×IQR are treated as outliers.

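3. Box Plot: Graphical method that shows the quartiles of the data; points plotted beyond the whiskers (commonly 1.5×IQR from the box) appear as potential outliers.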

4. DBSCAN: A density-based clustering algorithm that labels points not belonging to any dense region as noise, i.e., outliers.
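The first two methods can be sketched with Python's `statistics` module on a small made-up dataset; the helper names are hypothetical. Note a practical difference: the extreme value 95 inflates the standard deviation, so in a sample this small its z-score is only about 2.83 and the usual threshold of 3 misses it, while the quartile-based IQR rule is not distorted by it. The example therefore also runs the z-score check with a lower threshold of 2.5.

```python
from statistics import mean, pstdev, quantiles


def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [x for x in values if abs(x - mu) / sigma > threshold]


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]


data = [10, 12, 11, 13, 12, 11, 10, 12, 95]  # made-up data
print(zscore_outliers(data))                 # [] with the default threshold of 3
print(zscore_outliers(data, threshold=2.5))  # [95]
print(iqr_outliers(data))                    # [95]
```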

b) Write different data visualization libraries in Python.


Answer:

1. Matplotlib – Basic plotting library.

2. Seaborn – Built on Matplotlib; used for statistical graphics.

3. Plotly – Interactive web-based visualizations.

4. Altair – Declarative statistical visualization.

5. Bokeh – Interactive visualization for modern web browsers.

c) What is data cleaning? Explain any two data cleaning methods.


Answer:
Data Cleaning involves detecting and correcting inaccurate or incomplete data.
Two methods:

• Handling Missing Data: Fill missing entries with the column mean or median, or drop the affected rows.

• Removing Duplicates: Use tools such as drop_duplicates() in pandas to remove repeated entries.
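Both methods can be sketched in plain Python on a few made-up records; the helper names are hypothetical stand-ins, not the pandas API (in pandas, `DataFrame.drop_duplicates()` and `DataFrame.fillna()` cover these steps). The sketch removes duplicates first, so that repeated rows do not skew the median used for filling.

```python
from statistics import median

# Made-up records; None marks a missing value.
records = [
    {"id": 1, "age": 25},
    {"id": 2, "age": None},
    {"id": 3, "age": 31},
    {"id": 1, "age": 25},  # exact duplicate of the first record
    {"id": 4, "age": 40},
]


def remove_duplicate_rows(rows):
    """Keep the first occurrence of each identical record."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_missing_with_median(rows, column):
    """Replace None in `column` with the median of the known values."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


cleaned = fill_missing_with_median(remove_duplicate_rows(records), "age")
print(cleaned)  # 4 rows; id 2 now has age 31 (median of 25, 31, 40)
```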

Q4) Attempt any two of the following : [2 × 4 = 8]

a) Explain the 3V’s of Data Science.


Answer:

1. Volume: Refers to the amount of data (large scale).

2. Velocity: Speed at which data is generated and processed.

3. Variety: Different types of data – structured, unstructured, and semi-structured.

b) Explain data cube aggregation method in detail.


Answer:
Data cube aggregation summarizes data along multiple dimensions. It uses aggregation
functions (sum, average) to compute statistics across various levels of detail.
For example, sales data can be aggregated by region, time, and product to support OLAP
(Online Analytical Processing).
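The roll-up idea can be sketched in plain Python: each fact row carries dimension values plus a measure, and aggregating to a coarser level means summing the measure over the dimensions that are dropped. The sales figures, dimension names, and the `aggregate` helper are hypothetical. OLAP systems typically precompute and store some of these aggregates instead of recomputing them for every query.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, quarter, product, amount)
sales = [
    ("North", "Q1", "Laptop", 1200),
    ("North", "Q1", "Phone", 800),
    ("North", "Q2", "Laptop", 1500),
    ("South", "Q1", "Laptop", 900),
    ("South", "Q2", "Phone", 700),
    ("South", "Q2", "Phone", 300),
]

DIMENSIONS = ("region", "quarter", "product")


def aggregate(facts, keep):
    """Sum the amount over every dimension NOT listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for *dims, amount in facts:
        key = tuple(dims[i] for i in idx)
        totals[key] += amount
    return dict(totals)


print(aggregate(sales, ["region", "quarter"]))  # one cell per (region, quarter)
print(aggregate(sales, ["region"]))             # {('North',): 3500, ('South',): 1900}
print(aggregate(sales, []))                     # {(): 5400}, the grand total
```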

c) Explain any two data transformation techniques in detail.


Answer:

1. Normalization: Scaling data to fit within a specific range, such as 0 to 1.
Example: Min-max normalization, x' = (x − min) / (max − min).

2. Encoding Categorical Variables: Converting categories into a numeric format, e.g., one-hot encoding or label encoding.
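A plain-Python sketch of min-max normalization and label encoding on made-up values; the helper names are hypothetical. The optional `order` argument is an illustrative addition: without it, label codes follow alphabetical order, which would not respect the natural ranking of ordinal categories such as sizes.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale linearly so the minimum maps to new_min and the maximum to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [new_min for _ in values]
    return [new_min + (x - lo) / (hi - lo) * (new_max - new_min) for x in values]


def label_encode(values, order=None):
    """Map each category to an integer code.

    Without `order`, codes follow alphabetical order; pass `order` for
    ordinal data so the codes respect the natural ranking.
    """
    categories = list(order) if order is not None else sorted(set(values))
    mapping = {cat: code for code, cat in enumerate(categories)}
    return [mapping[v] for v in values]


incomes = [20000, 35000, 50000, 80000]  # made-up values
print(min_max_normalize(incomes))       # [0.0, 0.25, 0.5, 1.0]

sizes = ["small", "large", "medium", "small"]
print(label_encode(sizes))                                # [2, 0, 1, 2]
print(label_encode(sizes, ["small", "medium", "large"]))  # [0, 2, 1, 0]
```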

Q5) Attempt any one of the following : [1 × 3 = 3]

a) Write a short note on feature extraction.


Answer:
Feature extraction is the process of transforming raw data into a set of useful features that
represent the underlying problem.
For example, extracting keywords from text, or edges from images. It can improve model accuracy by supplying the model with relevant inputs rather than raw data.
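The keyword example can be sketched as a simple count-based feature extractor: free text goes in, and a fixed-length numeric vector comes out that a model can use. The vocabulary and the `keyword_features` helper are hypothetical; libraries such as scikit-learn provide a fuller version of this idea in `CountVectorizer`.

```python
import re
from collections import Counter


def keyword_features(text, vocabulary):
    """Turn free text into a fixed-length vector of keyword counts."""
    tokens = re.findall(r"[a-z]+", text.lower())  # lowercase alphabetic words only
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary, e.g. for a spam filter.
vocab = ["free", "offer", "meeting"]
print(keyword_features("FREE offer! Claim your free gift", vocab))  # [2, 1, 0]
print(keyword_features("Team meeting moved to Friday", vocab))      # [0, 0, 1]
```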

b) Explain Exploratory Data Analysis (EDA) in detail.


Answer:
EDA is an approach to analyzing and summarizing datasets, mainly through summary statistics and visual methods, before formal modeling.
Key steps include:

• Understanding the structure of data (types, missing values).

• Detecting outliers and anomalies.

• Visualizing distributions using histograms, box plots, scatter plots.

• Identifying relationships between variables using correlation matrices.


It helps form hypotheses and guides further analysis or modeling.
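The non-visual parts of these steps can be sketched with the standard library: per-column counts, missing values, range, and centre, plus a Pearson correlation between two variables. The dataset and helper names are hypothetical; a correlation matrix simply collects such pairwise coefficients for every pair of numeric columns.

```python
import math
from statistics import mean, median

# Hypothetical dataset: hours studied and exam score per student.
# None marks a missing value.
hours = [2, 4, 6, 8, 10, None]
scores = [50, 58, 66, 75, 90, 70]


def summarize(column):
    """Basic structure checks: counts, missing values, range, centre."""
    present = [x for x in column if x is not None]
    return {
        "count": len(present),
        "missing": len(column) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "median": median(present),
    }


def pearson(xs, ys):
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs))
    return cov / (sx * sy)


print(summarize(hours))   # 5 present values, 1 missing
print(summarize(scores))
print(round(pearson(hours, scores), 2))  # 0.99: strong positive relationship
```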
