
FDS paper solve

1 marks

1]what is data science?

ans:Data science is the study of data to extract knowledge and insights that can be used to inform
decisions and predictions.

2]define data source?

ans:In the foundation of data science, a "data source" refers to the specific location or system where raw
data originates, essentially the point of origin for the information that is used for analysis, whether it's a
database, file, sensor, website, or any other digital or physical repository where data is stored and
accessed.

3]what is missing values?

ans:Missing data, or missing values, occur when you don't have data stored for certain variables or
participants. Data can go missing due to incomplete data entry, equipment malfunctions, lost files, and
many other reasons. In any dataset, there are usually some missing data.
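
As a rough illustration (assuming pandas and NumPy are installed; the column names and values below are invented for the example), missing values can be detected and handled like this:

import pandas as pd
import numpy as np

# a small frame with a missing age and a missing city
df = pd.DataFrame({"age": [25, np.nan, 31], "city": ["Pune", "Mumbai", None]})

print(df.isnull().sum())                        # count missing values per column
filled = df.fillna({"age": df["age"].mean()})   # fill the numeric gap with the mean
dropped = df.dropna()                           # or drop rows containing any missing value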

4]list visualization libraries in python?

ans:Some of the most popular data visualization libraries in Python include: Matplotlib, Seaborn, Plotly,
Bokeh, Altair, ggplot, Holoviews, and Folium; with Matplotlib being the most established and Seaborn
building on top of it for more aesthetic statistical graphs.
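
A minimal sketch, assuming Matplotlib and Seaborn are installed, of how the two are typically used together (Seaborn draws the statistical plot, Matplotlib controls the figure):

import matplotlib.pyplot as plt
import seaborn as sns

# retirement ages reused from the central tendency example later in this paper
ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

sns.histplot(ages)                        # Seaborn plot built on top of Matplotlib
plt.title("Distribution of retirement age")
plt.show()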

5]list applications of data science?

ans:Data science applications include: fraud detection, healthcare analytics, targeted advertising,
product recommendation systems, risk assessment, image recognition, sentiment analysis, customer
behavior analysis, predictive maintenance, airline route planning, and optimizing supply chains across
various industries like finance, marketing, and technology.

6]what is data transformation?

ans:the process of converting raw data into a structured, usable format by cleaning, manipulating, and
structuring it, allowing for easier analysis and decision-making.
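
A small pandas sketch (the column names and values are assumptions made for the example) that converts raw text fields into a structured, usable format:

import pandas as pd

raw = pd.DataFrame({"name": [" Asha ", "Ravi"], "salary": ["50,000", "65,000"]})

clean = raw.copy()
clean["name"] = clean["name"].str.strip()                            # remove stray spaces
clean["salary"] = clean["salary"].str.replace(",", "").astype(int)   # text -> numeric
clean["band"] = pd.cut(clean["salary"], bins=[0, 60000, 100000],
                       labels=["low", "high"])                       # derive a new column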

7]what is hypothesis testing?

ans:Hypothesis testing is a statistical procedure that helps determine if a hypothesis about a population
is valid based on a sample of data.
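
A minimal sketch using SciPy (the sample values are invented): a one-sample t-test of whether the population mean equals 50.

from scipy import stats

sample = [52, 49, 51, 53, 48, 50, 54, 47, 52, 51]    # hypothetical sample

# H0: population mean = 50   H1: population mean != 50
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)

if p_value < 0.05:
    print("Reject H0")          # the sample gives evidence against the null hypothesis
else:
    print("Fail to reject H0")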

8]what is the use of a bubble plot?

ans:A bubble plot, also known as a bubble chart, is a data visualization tool that can be used to show
relationships between three or more numeric variables.
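
A minimal Matplotlib sketch (the numbers are made up) where the x position, y position, and bubble size each encode one numeric variable:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]             # first variable
y = [10, 20, 25, 30]         # second variable
size = [40, 100, 300, 600]   # third variable, shown as bubble area

plt.scatter(x, y, s=size, alpha=0.5)
plt.title("Bubble plot: size encodes a third variable")
plt.show()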

9]what is data cleaning?


ans:Data cleaning is the process of fixing or removing incorrect, incomplete, or duplicated data in a
dataset to improve its quality and reliability. It's also known as data cleansing or data scrubbing.
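
A short pandas sketch (invented data) showing two typical cleaning steps, removing duplicates and filling missing values:

import pandas as pd
import numpy as np

df = pd.DataFrame({"id": [1, 1, 2, 3], "score": [88, 88, np.nan, 91]})

df = df.drop_duplicates()                                # remove the repeated row
df["score"] = df["score"].fillna(df["score"].median())   # fill the missing score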

10]what is standard deviation?

ans:In data science, standard deviation is a statistical measurement that shows how spread out a set of
data is in relation to its mean.
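
A quick check with Python's statistics module, reusing the retirement ages from the central tendency answer below:

import statistics

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

print(round(statistics.mean(ages), 1))    # the mean that the spread is measured around
print(round(statistics.stdev(ages), 2))   # sample standard deviation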

2 MARKS

1]list the tools for a data scientist

ans:Apache Spark

TensorFlow

Tableau

SAS

BigML

Power BI

Apache Hadoop

Git

Microsoft Excel
2]define statistical data analysis?

ans:Statistical data analysis is the process of collecting, analyzing, and presenting data to identify
patterns and trends, and to derive conclusions. It's a scientific tool used by data scientists, researchers,
businesses, and governments to make decisions.
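
A minimal pandas sketch (the columns and numbers are invented) of the collect-analyze-conclude loop: summarize each variable, then look for a pattern between them:

import pandas as pd

df = pd.DataFrame({"hours_studied": [2, 4, 5, 7, 9],
                   "exam_score":    [50, 58, 65, 78, 88]})

print(df.describe())    # count, mean, std, min, quartiles, max for each column
print(df.corr())        # correlation suggests whether a trend exists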

3]what is data cube?

ans:A data cube is a multidimensional data structure that organizes a measure along several dimensions
(such as time, product, and region) so that aggregated values can be retrieved and analyzed efficiently.
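
As a rough analogy (assuming pandas; the sales figures are invented), a pivot table computes one two-dimensional slice of a data cube, aggregating a measure along two dimensions:

import pandas as pd

sales = pd.DataFrame({
    "year":   [2023, 2023, 2024, 2024],
    "region": ["East", "West", "East", "West"],
    "amount": [100, 150, 120, 180],
})

# aggregate the "amount" measure along the year and region dimensions
cube_slice = sales.pivot_table(values="amount", index="year",
                               columns="region", aggfunc="sum")
print(cube_slice)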

4]give purpose of data preprocessing?

ans:The purpose of data preprocessing is to clean, transform, and organize raw data into a format
suitable for further analysis or modeling by removing inconsistencies, handling missing values, and
ensuring data quality, making it ready for machine learning algorithms or other data analysis techniques;
essentially, it prepares the data to be more usable and reliable for further processing.
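
A minimal scikit-learn sketch (assuming scikit-learn is installed; the feature matrix is invented) of two common preprocessing steps, imputing missing values and scaling features:

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0], [2.0, np.nan], [3.0, 260.0]])   # raw features with a gap

X = SimpleImputer(strategy="mean").fit_transform(X)   # handle missing values
X = StandardScaler().fit_transform(X)                 # zero mean, unit variance features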

5]what is the purpose of data visualization?


ans:The purpose of data visualization in data science is to help people and organizations understand,
explore, and monitor data. Data visualization is the process of using visuals like charts, maps, and graphs
to represent data and information.

4 MARKS

1]what are the measures of central tendency? explain any two.

ans:There are three main measures of central tendency: the mode, the median, and the mean.

Median

The median is the middle value in distribution when the values are arranged in ascending or descending
order.

The median divides the distribution in half (there are 50% of observations on either side of the median
value). In a distribution with an odd number of observations, the median value is the middle value.

Looking at the retirement age distribution (which has 11 observations), the median is the middle value,
which is 57 years:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

When the distribution has an even number of observations, the median value is the mean of the two
middle values. In the following distribution, the two middle values are 56 and 57, therefore the median
equals 56.5 years:

52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

Mean

The mean is the sum of the value of each observation in a dataset divided by the number of
observations. This is also known as the arithmetic average.

Looking at the retirement age distribution again:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
The mean is calculated by adding together all the values (54+54+54+55+56+57+57+58+58+60+60 = 623)
and dividing by the number of observations (11) which equals 56.6 years.
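
The same results can be checked with Python's statistics module (a small verification sketch, using nothing beyond the numbers already given above):

import statistics

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

print(statistics.median(ages))            # 57, the middle of the 11 ordered values
print(round(statistics.mean(ages), 1))    # 56.6, i.e. 623 / 11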

2]what are the various types of data available? give example of each?

ans:There are several types of data, including:

Quantitative data: This data can be further divided into discrete and continuous data. Discrete data
represents countable items, while continuous data represents measurements that can take any value in a range.

Categorical data: This data is always qualitative, for example blood group or eye colour.

Nominal data: This data is categorized without a natural order or ranking, for example country or gender.

Ordinal data: This data involves order but not fixed intervals, for example exam grades (A, B, C) or satisfaction ratings.

Discrete data: This data consists of distinct and separate values, for example the number of students in a class.

Boolean data: This data contains only two values: true and false.

Multimedia data: This data includes photographs, audio, video, and numerous specialized formats.

Ratio scales: This data has a "true zero": a value of zero means the quantity is absent, so ratios are
meaningful. Examples are height and weight.

Confidential data: This information should only be accessed by a limited audience that has obtained
proper authorization.

Internal data: This data often relates to a company, business or organization. Only those employees who
work for the company typically have access to internal data.

3]what is a venn diagram? how to create it?

ans:A Venn diagram is a visual representation of how sets of items relate to each other, using
overlapping shapes to show how they are similar and different. Here's how to create a Venn diagram:

Draw overlapping shapes: Usually circles, but can also be ellipses, spheres, or triangles.

Label each shape: Each shape represents a set of items.

Show the overlap: The overlapping area shows what the sets have in common.

Show the differences: The parts of the shapes that don't overlap show the differences between the sets.

Here's an example of a Venn diagram:

Circle 1: Represents every number between 1 and 25

Circle 2: Represents every number between 1 and 100 that is divisible by 5

Overlapping area: Contains the numbers 5, 10, 15, 20, and 25


Venn diagrams are used in many fields, including mathematics, statistics, logic, linguistics, computer
science, and business. They are often used in presentations and reports to help visualize data.
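
A minimal sketch of the example above in Python, assuming the third-party matplotlib-venn package is installed (pip install matplotlib-venn):

import matplotlib.pyplot as plt
from matplotlib_venn import venn2

set_a = set(range(1, 26))        # every number between 1 and 25
set_b = set(range(5, 101, 5))    # numbers between 1 and 100 divisible by 5

venn2([set_a, set_b], set_labels=("1 to 25", "Divisible by 5"))
plt.show()                       # the overlap contains 5, 10, 15, 20 and 25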

4]explain different data formats in brief?

ans:Data formats define the structure of data in a database or file system, and can refer to a number of
things, including:

File format: How data is encoded and stored in a computer file. Some common file formats include:

JSON: A simple format that's easy for programming languages to read

XML: A widely used format for exchanging data

Comma separated files (CSV): A compact format that's good for transferring large amounts of data

HTML: A markup format used for web pages; data published as HTML is easy to view in a browser but
usually has to be parsed or scraped before it can be analyzed

Data type: A constraint placed on how data is interpreted in a type system

Recording format: How data is encoded for storage on a storage medium

Content format: How media content is represented as data

Audio format: How encoded sound data is formatted

Video format: How encoded video data is formatted

Signal format: How signal data is formatted for use in signal processing

Data formats are important because data scientists need to convert source data to a common format for
each model to process.
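
A small sketch (the records are invented) of reading two of the common file formats in Python:

import io
import json
import pandas as pd

# JSON: nested key/value text that programs can parse directly
record = json.loads('{"name": "Asha", "age": 30}')

# CSV: compact comma separated rows, read straight into a table
csv_text = "name,age\nAsha,30\nRavi,28\n"
df = pd.read_csv(io.StringIO(csv_text))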

5]what is data quality? which factors affect data quality?

ans:Data quality is a measure of how well a data set meets its intended purpose. It is based on a number
of factors, including:

Accuracy: Whether the data accurately represents the entities or events it's supposed to represent

Completeness: Whether the data includes all the values and types of data it's expected to contain

Consistency: Whether the data is uniform across systems and data sets

Validity: Whether the data conforms to defined business rules and parameters

Uniqueness: Measures the number of duplicates

Timeliness: How timely the data is

Accessibility: Whether the data is obtainable at the time it is needed and by those who need it

Data quality is important because it ensures that the data used for analysis, reporting, and
decision-making is reliable and trustworthy. Poor data quality can negatively impact customer service,
employee productivity, and key strategies.

Factors that can affect data quality include:

Incomplete information: Missing data can make a dataset unusable. This can be due to poor data standards
or participants dropping out of a study.

Bias: Systematic error in how data is collected or sampled skews the results.

Use of language: Ambiguous or leading wording in questions and labels distorts the responses collected.

Ethics: Ethical constraints limit what data may be collected and how it may be used.

Cost: A limited budget restricts how much data can be collected and how carefully it is checked.

Time and timing: Data collected too slowly, or at the wrong moment, may be outdated or unrepresentative.

Privacy issues: Restrictions on personal data can leave gaps in the information available.

Cultural sensitivity: Questions that ignore cultural context may produce inaccurate or incomplete answers.
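
A short pandas sketch (invented records) that checks three of the quality dimensions listed above, completeness, uniqueness, and validity:

import pandas as pd

df = pd.DataFrame({"email": ["a@x.com", "a@x.com", None], "age": [25, 25, -3]})

completeness = 1 - df.isnull().mean()    # share of non-missing values per column
duplicates = df.duplicated().sum()       # uniqueness: count of repeated rows
invalid_age = (df["age"] < 0).sum()      # validity: values breaking a business rule
print(completeness, duplicates, invalid_age)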

6]write detailed notes on data visualization tools and techniques

ans:Data visualization is the graphical representation of information and data. By using visual elements
like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand
trends, outliers, and patterns in data.

Together with the demand for data visualization and analysis, the tools and solutions in this area are
developing fast and extensively. Novel 3D visualizations, immersive experiences, and shared VR offices are
becoming common alongside traditional web and desktop interfaces. Visualization technologies range from
programming libraries such as Matplotlib and Seaborn, to self-service tools such as Tableau and Power BI,
serving different types of users and purposes.

3 MARKS

1]what are outliers and their types?

ans:In data science, outliers are data points that are significantly different from the rest of the data set.
There are different types of outliers, including:

Global outliers

These are data points that are extreme compared to the entire data distribution. For example, if a
person's height is 7 feet in a dataset of heights that range from 5 to 6 feet, the 7 foot height would be a
global outlier.

Contextual outliers

These outliers depend on the context of the data and may not be outliers in a different context.

Collective outliers

These are groups of data points that are significantly different from the rest of the dataset when
considered together. For example, a group of customers who consistently make purchases that are
significantly larger than the rest of the customers could be considered a collective outlier.

Univariate outliers

These outliers are exceptional with respect to a single variable. For example, a recorded height of 3
meters in a dataset of human heights would likely be a univariate outlier.

Multivariate outliers

These outliers only appear abnormal when considering the relationship between two or more variables.
For example, a person's weight might not be an outlier by itself, but when considered in relation to their
height, it might be identified as an outlier.
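
A minimal sketch (invented heights) of flagging a univariate outlier with the common 1.5 * IQR rule, one of several possible detection methods:

import numpy as np

heights = np.array([1.60, 1.65, 1.70, 1.72, 1.75, 1.78, 3.00])   # 3.00 m is suspicious

q1, q3 = np.percentile(heights, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = heights[(heights < lower) | (heights > upper)]
print(outliers)    # values outside the [lower, upper] fence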

2]state and explain any three data transformation techniques

ans: The most common types of data transformation are:

Constructive: The data transformation process adds, copies, or replicates data.

Destructive: The system deletes fields or records.

Aesthetic: The transformation standardizes the data to meet requirements or parameters.

Structural: The database is reorganized by renaming, moving, or combining columns.
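
A small pandas sketch (invented columns) showing a structural, a constructive, and a destructive transformation in sequence:

import pandas as pd

df = pd.DataFrame({"fname": ["Asha"], "lname": ["Patil"], "dob": ["1990-05-01"]})

df = df.rename(columns={"dob": "date_of_birth"})     # structural: rename a column
df["full_name"] = df["fname"] + " " + df["lname"]    # constructive: add a derived field
df = df.drop(columns=["fname", "lname"])             # destructive: delete unneeded fields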
