FDS - 2 Solved

TYBCS Foundation of Data Science solved question paper

Q1) Attempt any EIGHT of the following:

a) What is Data science?

Sol:

Data science is an interdisciplinary field that combines domain expertise, programming
skills, and knowledge of mathematics and statistics to extract meaningful insights from
data. It involves processes such as data collection, data cleaning, data analysis, data
visualization, and the application of machine learning algorithms to solve complex
problems and make data-driven decisions.

b) Define Data source?

Sol:

A data source is any location or system that provides data for analysis. Examples include
databases, spreadsheets, APIs, sensors, and web services. Data sources can be internal
(e.g., company databases) or external (e.g., public datasets, social media).

c) What are missing values?

Sol:

Missing values refer to data points that are not recorded or are absent in a dataset. They
can occur due to various reasons, such as data entry errors, data corruption, or incomplete
data collection. Handling missing values is crucial for accurate data analysis.
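For illustration, a minimal pandas sketch of detecting and imputing missing values (the
column names and values are hypothetical):

import pandas as pd
import numpy as np

# Hypothetical dataset with a missing age value
df = pd.DataFrame({"name": ["John", "Jane", "Sam"],
                   "age": [25, np.nan, 30]})

print(df.isna().sum())                          # count missing values per column
df["age"] = df["age"].fillna(df["age"].mean())  # impute with the column mean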

d) List the visualization libraries in python.

Sol:

1. Matplotlib

2. Seaborn

3. Plotly

4. Bokeh

5. Altair
e) List applications of data science.

Sol:

1. Healthcare: Predictive analytics, disease diagnosis, personalized treatment plans.

2. Finance: Fraud detection, credit risk assessment, algorithmic trading.

3. Marketing: Customer segmentation, sentiment analysis, recommendation systems.

4. Retail: Inventory management, demand forecasting, customer behavior analysis.

5. Manufacturing: Predictive maintenance, quality control, supply chain optimization.

f) What is data transformation?

Sol:

Data transformation is the process of converting data from one format or structure into
another. This process includes various techniques such as normalization, aggregation, and
scaling to prepare data for analysis, improve its quality, and ensure consistency across
datasets.

g) Define Hypothesis Testing?

Sol:

Hypothesis testing is a statistical method used to make inferences or draw conclusions
about a population based on sample data. It involves formulating a null hypothesis (H0)
and an alternative hypothesis (H1), selecting a significance level, calculating a test
statistic, and deciding whether to reject or fail to reject the null hypothesis based on
the test results.
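For illustration, a minimal one-sample t-test sketch with SciPy (the sample values and the
hypothesized mean of 3.0 are made up):

from scipy import stats

sample = [2.9, 3.1, 3.4, 2.8, 3.2, 3.0]                   # hypothetical sample
t_stat, p_value = stats.ttest_1samp(sample, popmean=3.0)  # H0: population mean = 3.0

alpha = 0.05  # significance level
if p_value < alpha:
    print("Reject H0")
else:
    print("Fail to reject H0")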

h) What is the use of a bubble plot?

Sol:

A bubble plot is a type of scatter plot where each point is represented by a bubble. The
position of the bubble indicates the values of two variables, while the size of the bubble
represents the value of a third variable. Bubble plots are useful for visualizing the
relationships between three variables in a single plot.
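For illustration, a minimal Matplotlib sketch with made-up data, where the s argument maps
the third variable to bubble size:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 35]
sizes = [40, 100, 250, 500, 900]  # third variable, mapped to bubble area

plt.scatter(x, y, s=sizes, alpha=0.5)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Bubble Plot')
plt.show()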
i) Define Data cleaning?

Sol:

Data cleaning is the process of identifying, correcting, or removing errors and
inconsistencies in a dataset. This involves handling missing values, outliers, duplicate
records, and formatting issues to ensure that the data is accurate, complete, and ready for
analysis.
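For illustration, a minimal pandas sketch of common cleaning steps on hypothetical data:

import pandas as pd

# Hypothetical data with a duplicate, a missing name, and inconsistent formatting
df = pd.DataFrame({"name": ["John ", "John ", "jane", None],
                   "age": [25, 25, 30, 28]})

df = df.drop_duplicates()                        # remove duplicate records
df = df.dropna(subset=["name"])                  # drop rows with a missing name
df["name"] = df["name"].str.strip().str.title()  # fix formatting issues
print(df)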

j) Define standard deviation?

Sol:

Standard deviation is a measure of the dispersion or spread of a set of values. It quantifies
the amount of variation or dispersion in a dataset relative to its mean. A low standard
deviation indicates that the data points are close to the mean, while a high standard
deviation indicates that the data points are spread out over a wide range of values.
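A quick worked example with NumPy: the dataset [2, 4, 4, 4, 5, 5, 7, 9] has mean 5 and
population standard deviation 2.

import numpy as np

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(np.mean(data))         # 5.0
print(np.std(data))          # 2.0  (population standard deviation)
print(np.std(data, ddof=1))  # ~2.14 (sample standard deviation)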

Q2) Attempt any FOUR of the following:

a) List the tools for a data scientist.

Sol:

• Tableau: A powerful data visualization tool that helps create interactive and
shareable dashboards.
• Power BI: A business analytics service by Microsoft that provides interactive
visualizations and business intelligence capabilities with a simple interface.
• Matplotlib: A plotting library for Python that provides tools to create static,
interactive, and animated visualizations.
• Seaborn: Built on top of Matplotlib, this Python library provides a high-level
interface for drawing attractive and informative statistical graphics.
b) Define statistical data analysis?

Sol:

Statistical data analysis is the process of collecting, organizing, analyzing, interpreting, and
presenting data using statistical methods. It involves using techniques such as descriptive
statistics, inferential statistics, hypothesis testing, regression analysis, and more to derive
insights and make data-driven decisions.
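For illustration, a minimal descriptive-statistics sketch in pandas (the scores are made up):

import pandas as pd

scores = pd.Series([55, 60, 62, 70, 71, 75, 80, 95])
print(scores.describe())  # count, mean, std, min, quartiles, max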

c) What is a data cube?

Sol:

A data cube is a multi-dimensional array of values, commonly used to describe data in a
data warehouse. It allows for complex queries and analyses on multi-dimensional data,
providing a way to view and analyze data from different perspectives (dimensions) such as
time, geography, and product categories.
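A data-cube-style aggregation can be sketched with a pandas pivot table (the sales data
below is hypothetical; a production data cube would typically live in a data warehouse or
OLAP engine):

import pandas as pd

sales = pd.DataFrame({
    "year":    [2023, 2023, 2024, 2024],
    "region":  ["East", "West", "East", "West"],
    "product": ["A", "A", "B", "B"],
    "revenue": [100, 150, 120, 180],
})

# One 2-D slice of the cube: revenue aggregated along time and geography
print(sales.pivot_table(values="revenue", index="year",
                        columns="region", aggfunc="sum"))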

d) Give the purpose of data preprocessing?

Sol:

The purpose of data preprocessing is to prepare raw data for analysis by transforming it into
a clean and usable format. This involves steps such as data cleaning, normalization,
transformation, feature extraction, and encoding. Data preprocessing ensures that the data
is accurate, consistent, and ready for analysis, which improves the quality and reliability of
the results.
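For illustration, a minimal scikit-learn sketch (with hypothetical data) that chains two
common preprocessing steps into one pipeline:

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

X = np.array([[25.0], [np.nan], [30.0], [45.0]])  # raw data with a gap

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),  # fill missing values
    ("scale", StandardScaler()),                 # standardize to mean 0, std 1
])
print(pipeline.fit_transform(X))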

e) What is the purpose of data visualization?

Sol:

The purpose of data visualization is to represent data graphically, making it easier to
understand and interpret. It helps in identifying patterns, trends, and outliers in the data,
facilitating data-driven decision-making. Data visualization also aids in communicating
complex information effectively to stakeholders and enhances the overall analytical
process.
Q3) Attempt any TWO of the following:

a) What are the measures of central tendency? Explain any two of them in brief.

Sol:

Measures of central tendency are statistical metrics used to describe the center of a
dataset's distribution. The three main measures are:

1. Mean: The average of all data points, calculated by summing the values and dividing
by the number of observations.

o Example: For the dataset [1, 2, 3, 4, 5], the mean is (1+2+3+4+5)/5 = 3.

2. Median: The middle value in an ordered dataset, which separates the data into two
halves.

o Example: For the dataset [1, 2, 3, 4, 5], the median is 3.

3. Mode: The most frequently occurring value in a dataset.

o Example: For the dataset [1, 2, 2, 3, 4], the mode is 2.
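All three measures are also available in Python's built-in statistics module:

import statistics

data = [1, 2, 2, 3, 4]
print(statistics.mean(data))    # 2.4
print(statistics.median(data))  # 2
print(statistics.mode(data))    # 2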

b) What are the various types of data available? Give an example of each.

Sol:

1. Nominal Data:

o Definition: Categorical data without any intrinsic ordering. It represents
categories or labels.

o Example: Gender (Male, Female), Colors (Red, Blue, Green)

2. Ordinal Data:

o Definition: Categorical data with a clear ordering or ranking between the
categories.

o Example: Education levels (High School, Bachelor's, Master's, Ph.D.),
Satisfaction levels (Poor, Fair, Good, Excellent)
3. Interval Data:

o Definition: Numeric data with meaningful differences between values, but no
true zero point.

o Example: Temperature in Celsius (0°C does not mean 'no temperature')

4. Ratio Data:

o Definition: Numeric data with meaningful differences between values and a
true zero point.

o Example: Height (in centimeters), Weight (in kilograms), Age (in years)

c) What is Venn diagram? How to create it? Explain with example.

Sol:

Venn Diagram: A Venn diagram is a graphical representation used to show the relationships
between different sets. It uses overlapping circles to illustrate the commonalities and
differences between the sets.

How to Create a Venn Diagram:

1. Draw circles for each set. The number of circles corresponds to the number of sets.

2. Label each circle with the set name.

3. Place elements in the appropriate sections of the circles to indicate membership in
the sets.

Example: To represent sets A = {1, 2, 3} and B = {2, 3, 4}:

• Draw two overlapping circles, one for A and one for B.

• Place 1 in the non-overlapping part of A, 4 in the non-overlapping part of B, and 2
and 3 in the overlapping section.

A only | A ∩ B | B only
   1   | 2, 3  |   4

This diagram shows that 2 and 3 are common to both sets A and B, while 1 is unique to A
and 4 is unique to B.
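For a programmatic version, the sketch below assumes the third-party matplotlib-venn
package is installed (pip install matplotlib-venn):

# Assumes the third-party matplotlib-venn package is installed
import matplotlib.pyplot as plt
from matplotlib_venn import venn2

A = {1, 2, 3}
B = {2, 3, 4}
venn2([A, B], set_labels=("A", "B"))  # region contents derived from the set overlaps
plt.show()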
Q4) Attempt any TWO of the following:

a) Explain different data formats in brief.

Sol:

1. CSV (Comma-Separated Values):

o Description: A plain text format where each line represents a record, and fields are
separated by commas. It is widely used for data exchange between applications.

o Example:

Name,Age,Country
John,25,USA
Jane,30,UK

2. JSON (JavaScript Object Notation):

o Description: A structured, text-based format that uses key-value pairs to represent
objects and arrays. It is commonly used for data interchange between web services
and applications.

o Example:

"name": "John",

"age": 25,

"country": "USA"

3. XML (eXtensible Markup Language):

o Description: A flexible text format that uses tags to define the structure and content of
data. It is used for data representation and exchange.

o Example:

<person>
  <name>John</name>
  <age>25</age>
  <country>USA</country>
</person>

4. SQL (Structured Query Language):

o Description: A language used for managing and querying relational databases. It
allows for data manipulation and retrieval.

o Example:

SELECT name, age, country
FROM people
WHERE age > 20;

5. Parquet:

o Description: A columnar storage file format optimized for efficient data storage and
retrieval, particularly in big data environments.

o Example:

// JSON-like sketch of a Parquet file's logical structure (Parquet itself is binary)
{
  "columns": ["name", "age", "country"],
  "data": [
    ["John", 25, "USA"],
    ["Jane", 30, "UK"]
  ]
}
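In practice, Parquet files are read and written through a library rather than by hand; a
minimal pandas sketch, assuming a Parquet engine such as pyarrow is installed:

import pandas as pd

df = pd.DataFrame({"name": ["John", "Jane"],
                   "age": [25, 30],
                   "country": ["USA", "UK"]})

df.to_parquet("people.parquet")           # requires pyarrow or fastparquet
print(pd.read_parquet("people.parquet"))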
b) What is data quality? Which factors affect data quality?

Sol:

Data Quality: Data quality refers to the accuracy, completeness, reliability, and relevance
of data for its intended use. High-quality data ensures that analyses and decisions based
on the data are accurate and reliable.

Factors Affecting Data Quality:

1. Accuracy: The degree to which data correctly represents the real-world values it is
intended to model. Inaccurate data can lead to incorrect conclusions.

2. Completeness: The extent to which all required data is available. Missing data can
result in biased analyses.

3. Consistency: The degree to which data is uniform and free from contradictions.
Inconsistent data can arise from different data sources or entry errors.

4. Timeliness: The degree to which data is up-to-date and available when needed.
Outdated data can render analyses irrelevant.

5. Validity: The extent to which data conforms to defined formats and standards.
Invalid data can result from incorrect data entry or format mismatches.
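Several of these factors can be checked programmatically. A minimal pandas sketch with
hypothetical data:

import pandas as pd

# Hypothetical data with a missing name, a duplicate row, and an invalid age
df = pd.DataFrame({"name": ["John", "Jane", "Jane", None],
                   "age": [25, 30, 30, -5]})

print(df.isna().mean())       # completeness: fraction missing per column
print(df.duplicated().sum())  # consistency: number of duplicate records
print((df["age"] < 0).sum())  # validity: ages outside the valid range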

c) Write detailed notes on basic data visualization tools.

Sol:

Basic Data Visualization Tools:

1. Matplotlib:

o Description: Matplotlib is one of the most widely used data visualization libraries in
Python. It provides a flexible platform for creating static, animated, and interactive
visualizations.

o Capabilities: Line plots, scatter plots, bar charts, histograms, pie charts, and more.

o Example:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 35]

plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Sample Line Plot')
plt.show()

2. Seaborn:

o Description: Built on top of Matplotlib, Seaborn offers a high-level interface for
drawing attractive and informative statistical graphics.

o Capabilities: Enhanced visualizations, including heatmaps, violin plots, box plots, and
pair plots.

o Example:

import seaborn as sns
import matplotlib.pyplot as plt

data = sns.load_dataset("iris")
sns.pairplot(data, hue="species")
plt.show()

3. Plotly:

o Description: Plotly is an interactive, open-source plotting library that supports a wide
range of visualization types and interactive features.

o Capabilities: 3D plots, geographic maps, interactive charts, and dashboards.

o Example:

import plotly.express as px

df = px.data.iris()
fig = px.scatter(df, x='sepal_width', y='sepal_length', color='species')
fig.show()
4. Bokeh:

o Description: Bokeh provides an elegant and concise way to create interactive
visualizations for modern web browsers.

o Capabilities: Interactive plots, dashboards, and data applications.

o Example:

from bokeh.plotting import figure, show

p = figure(title="Bokeh Plot Example", x_axis_label='x', y_axis_label='y')
p.line([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], legend_label="Line", line_width=2)
show(p)

5. Altair:

o Description: Altair is a declarative statistical visualization library based on Vega and
Vega-Lite, designed for creating simple yet powerful visualizations.

o Capabilities: Interactive visualizations with concise code.

o Example:

import altair as alt
import pandas as pd

data = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [3, 4, 5, 6, 7]
})

chart = alt.Chart(data).mark_line().encode(
    x='a',
    y='b'
)
chart.save('chart.html')  # or display `chart` directly in a notebook
Q5) Attempt any ONE of the following:

a) What is outlier? State types of outliers.

Sol:

Outlier: An outlier is a data point that significantly differs from other observations in a
dataset. It can indicate variability in the data, measurement errors, or experimental errors.
Outliers can skew statistical analyses and affect the accuracy of models.

Types of Outliers:

1. Global Outliers:

o Definition: Data points that deviate significantly from the rest of the dataset. These
outliers are also known as point outliers.

o Example: In a dataset of student grades, a score of 100 when most scores range
between 60-80.

2. Contextual Outliers:

o Definition: Data points that are outliers in a specific context or condition. These
outliers are context-dependent and may appear normal in other contexts.

o Example: A high temperature reading in a normally cold region during winter.

3. Collective Outliers:

o Definition: A group of data points that deviate significantly from the rest of the
dataset. These outliers may not be outliers individually but form a collective anomaly
when considered together.

o Example: A sudden spike in network traffic at a specific time, indicating a potential
cyber-attack.
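A common way to flag global outliers programmatically is the 1.5 × IQR rule; a minimal
NumPy sketch with made-up grades:

import numpy as np

data = np.array([62, 65, 68, 70, 71, 73, 75, 100])  # hypothetical grades

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(data[(data < lower) | (data > upper)])  # flags 100 as a global outlier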
b) State and explain any three data transformation techniques.

Sol:

1. Normalization:

o Description: Normalization is the process of scaling numeric data to a standard range,
typically between 0 and 1. It helps in bringing all features to the same scale, which
can improve the performance of machine learning algorithms.

o Example:

from sklearn.preprocessing import MinMaxScaler

data = [[100], [200], [300], [400], [500]]
scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(data)
print(normalized_data)

2. Standardization:

o Description: Standardization involves scaling data to have a mean of 0 and a standard
deviation of 1. This technique is useful when the data has different units or scales.

o Example:

from sklearn.preprocessing import StandardScaler

data = [[100], [200], [300], [400], [500]]
scaler = StandardScaler()
standardized_data = scaler.fit_transform(data)
print(standardized_data)
3. Log Transformation:

o Description: Log transformation is used to stabilize variance and make the data more
normally distributed. It is particularly useful for data that follows a skewed
distribution.

o Example:

import numpy as np

data = [1, 10, 100, 1000, 10000]
log_transformed_data = np.log(data)
print(log_transformed_data)
