
FDS paper solve

1 marks

1]what is data science?

ans:Data science is the study of data to extract knowledge and insights that can be used to inform
decisions and predictions.

2]define data source?

ans:In the foundation of data science, a "data source" refers to the specific location or system where raw
data originates, essentially the point of origin for the information that is used for analysis, whether it's a
database, file, sensor, website, or any other digital or physical repository where data is stored and
accessed.

3]what is missing values?

ans:Missing data, or missing values, occur when you don't have data stored for certain variables or
participants. Data can go missing due to incomplete data entry, equipment malfunctions, lost files, and
many other reasons. In any dataset, there are usually some missing data.
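
As a rough illustration (assuming pandas and NumPy are installed; the column names and values below are invented for the example), missing values can be detected and handled like this:

import pandas as pd
import numpy as np

# a small frame with a missing age and a missing city
df = pd.DataFrame({"age": [25, np.nan, 31], "city": ["Pune", "Mumbai", None]})

print(df.isnull().sum())                        # count missing values per column
filled = df.fillna({"age": df["age"].mean()})   # fill the numeric gap with the mean
dropped = df.dropna()                           # or drop rows containing any missing value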

4]list visualization libraries in python?

ans:Some of the most popular data visualization libraries in Python include: Matplotlib, Seaborn, Plotly,
Bokeh, Altair, ggplot, Holoviews, and Folium; with Matplotlib being the most established and Seaborn
building on top of it for more aesthetic statistical graphs.
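
A minimal sketch, assuming Matplotlib and Seaborn are installed, of how the two are typically used together (Seaborn draws the statistical plot, Matplotlib controls the figure):

import matplotlib.pyplot as plt
import seaborn as sns

# retirement ages reused from the central tendency example later in this paper
ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

sns.histplot(ages)                        # Seaborn plot built on top of Matplotlib
plt.title("Distribution of retirement age")
plt.show()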

5]list applications of data science?

ans:Data science applications include: fraud detection, healthcare analytics, targeted advertising,
product recommendation systems, risk assessment, image recognition, sentiment analysis, customer
behavior analysis, predictive maintenance, airline route planning, and optimizing supply chains across
various industries like finance, marketing, and technology.

6]what is data transformation?

ans:the process of converting raw data into a structured, usable format by cleaning, manipulating, and
structuring it, allowing for easier analysis and decision-making.
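
A small pandas sketch (the column names and values are assumptions made for the example) that converts raw text fields into a structured, usable format:

import pandas as pd

raw = pd.DataFrame({"name": [" Asha ", "Ravi"], "salary": ["50,000", "65,000"]})

clean = raw.copy()
clean["name"] = clean["name"].str.strip()                            # remove stray spaces
clean["salary"] = clean["salary"].str.replace(",", "").astype(int)   # text -> numeric
clean["band"] = pd.cut(clean["salary"], bins=[0, 60000, 100000],
                       labels=["low", "high"])                       # derive a new column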

7]what is hypothesis testing?

ans:Hypothesis testing is a statistical procedure that helps determine if a hypothesis about a population
is valid based on a sample of data.
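
A minimal sketch using SciPy (the sample values are invented): a one-sample t-test of whether the population mean equals 50.

from scipy import stats

sample = [52, 49, 51, 53, 48, 50, 54, 47, 52, 51]    # hypothetical sample

# H0: population mean = 50   H1: population mean != 50
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)

if p_value < 0.05:
    print("Reject H0")          # the sample gives evidence against the null hypothesis
else:
    print("Fail to reject H0")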

8]what is the use of a bubble plot?

ans:A bubble plot, also known as a bubble chart, is a data visualization tool that can be used to show
relationships between three or more numeric variables.
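
A minimal Matplotlib sketch (the numbers are made up) where the x position, y position, and bubble size each encode one numeric variable:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4]             # first variable
y = [10, 20, 25, 30]         # second variable
size = [40, 100, 300, 600]   # third variable, shown as bubble area

plt.scatter(x, y, s=size, alpha=0.5)
plt.title("Bubble plot: size encodes a third variable")
plt.show()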

9]what is data cleaning?


ans:Data cleaning is the process of fixing or removing incorrect, incomplete, or duplicated data in a
dataset to improve its quality and reliability. It's also known as data cleansing or data scrubbing.
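
A short pandas sketch (invented data) showing two typical cleaning steps, removing duplicates and filling missing values:

import pandas as pd
import numpy as np

df = pd.DataFrame({"id": [1, 1, 2, 3], "score": [88, 88, np.nan, 91]})

df = df.drop_duplicates()                                # remove the repeated row
df["score"] = df["score"].fillna(df["score"].median())   # fill the missing score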

10]what is standard deviation?

ans:In data science, standard deviation is a statistical measurement that shows how spread out a set of
data is in relation to its mean.
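
A quick check with Python's statistics module, reusing the retirement ages from the central tendency answer below:

import statistics

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

print(round(statistics.mean(ages), 1))    # the mean that the spread is measured around
print(round(statistics.stdev(ages), 2))   # sample standard deviation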

2 MARKS

1]list the tools for a data scientist

ans:Apache Spark

TensorFlow

Tableau

SAS

BigML

Power BI

Apache Hadoop

Git

Microsoft Excel
2]define statistical data analysis?

ans:Statistical data analysis is the process of collecting, analyzing, and presenting data to identify
patterns and trends, and to derive conclusions. It's a scientific tool used by data scientists, researchers,
businesses, and governments to make decisions.
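
A minimal pandas sketch (the columns and numbers are invented) of the collect-analyze-conclude loop: summarize each variable, then look for a pattern between them:

import pandas as pd

df = pd.DataFrame({"hours_studied": [2, 4, 5, 7, 9],
                   "exam_score":    [50, 58, 65, 78, 88]})

print(df.describe())    # count, mean, std, min, quartiles, max for each column
print(df.corr())        # correlation suggests whether a trend exists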

3]what is data cube?

ans:A data cube is a multidimensional data structure that organizes a measure along several dimensions
(such as time, product, and region) so that aggregated values can be retrieved and analyzed efficiently.
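
As a rough analogy (assuming pandas; the sales figures are invented), a pivot table computes one two-dimensional slice of a data cube, aggregating a measure along two dimensions:

import pandas as pd

sales = pd.DataFrame({
    "year":   [2023, 2023, 2024, 2024],
    "region": ["East", "West", "East", "West"],
    "amount": [100, 150, 120, 180],
})

# aggregate the "amount" measure along the year and region dimensions
cube_slice = sales.pivot_table(values="amount", index="year",
                               columns="region", aggfunc="sum")
print(cube_slice)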

4]give purpose of data preprocessing?

ans:The purpose of data preprocessing is to clean, transform, and organize raw data into a format
suitable for further analysis or modeling by removing inconsistencies, handling missing values, and
ensuring data quality, making it ready for machine learning algorithms or other data analysis techniques;
essentially, it prepares the data to be more usable and reliable for further processing.
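
A minimal scikit-learn sketch (assuming scikit-learn is installed; the feature matrix is invented) of two common preprocessing steps, imputing missing values and scaling features:

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0], [2.0, np.nan], [3.0, 260.0]])   # raw features with a gap

X = SimpleImputer(strategy="mean").fit_transform(X)   # handle missing values
X = StandardScaler().fit_transform(X)                 # zero mean, unit variance features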

5]what is the purpose of data visualization?


ans:The purpose of data visualization in data science is to help people and organizations understand,
explore, and monitor data. Data visualization is the process of using visuals like charts, maps, and graphs
to represent data and information.

4 MARKS

1]what are the measures of central tendency? explain any two.

ans:There are three main measures of central tendency: the mode, the median, and the mean.

Median

The median is the middle value in distribution when the values are arranged in ascending or descending
order.

The median divides the distribution in half (there are 50% of observations on either side of the median
value). In a distribution with an odd number of observations, the median value is the middle value.

Looking at the retirement age distribution (which has 11 observations), the median is the middle value,
which is 57 years:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

When the distribution has an even number of observations, the median value is the mean of the two
middle values. In the following distribution, the two middle values are 56 and 57, therefore the median
equals 56.5 years:

52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

Mean

The mean is the sum of the value of each observation in a dataset divided by the number of
observations. This is also known as the arithmetic average.

Looking at the retirement age distribution again:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
The mean is calculated by adding together all the values (54+54+54+55+56+57+57+58+58+60+60 = 623)
and dividing by the number of observations (11) which equals 56.6 years.
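
The same results can be checked with Python's statistics module (a small verification sketch, using nothing beyond the numbers already given above):

import statistics

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

print(statistics.median(ages))            # 57, the middle of the 11 ordered values
print(round(statistics.mean(ages), 1))    # 56.6, i.e. 623 / 11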

2]what are the various types of data available? give example of each?

ans:There are several types of data, including:

Quantitative data: This data can be further divided into discrete and continuous data. Discrete data
represents countable items, while continuous data represents measurements that can take any value in a range.

Categorical data: This data is always qualitative, for example blood group or eye colour.

Nominal data: This data is categorized without a natural order or ranking, for example country or gender.

Ordinal data: This data involves order but not fixed intervals, for example exam grades (A, B, C) or satisfaction ratings.

Discrete data: This data consists of distinct and separate values, for example the number of students in a class.

Boolean data: This data contains only two values: true and false.

Multimedia data: This data includes photographs, audio, video, and numerous specialized formats.

Ratio scales: This data has a "true zero": a value of zero means the quantity is absent, so ratios are
meaningful. Examples are height and weight.

Confidential data: This information should only be accessed by a limited audience that has obtained
proper authorization.

Internal data: This data often relates to a company, business or organization. Only those employees who
work for the company typically have access to internal data.

3]what is a venn diagram? how to create it?

ans:A Venn diagram is a visual representation of how sets of items relate to each other, using
overlapping shapes to show how they are similar and different. Here's how to create a Venn diagram:

Draw overlapping shapes: Usually circles, but can also be ellipses, spheres, or triangles.

Label each shape: Each shape represents a set of items.

Show the overlap: The overlapping area shows what the sets have in common.

Show the differences: The parts of the shapes that don't overlap show the differences between the sets.

Here's an example of a Venn diagram:

Circle 1: Represents every number between 1 and 25

Circle 2: Represents every number between 1 and 100 that is divisible by 5

Overlapping area: Contains the numbers 5, 10, 15, 20, and 25


Venn diagrams are used in many fields, including mathematics, statistics, logic, linguistics, computer
science, and business. They are often used in presentations and reports to help visualize data.
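
A minimal sketch of the example above in Python, assuming the third-party matplotlib-venn package is installed (pip install matplotlib-venn):

import matplotlib.pyplot as plt
from matplotlib_venn import venn2

set_a = set(range(1, 26))        # every number between 1 and 25
set_b = set(range(5, 101, 5))    # numbers between 1 and 100 divisible by 5

venn2([set_a, set_b], set_labels=("1 to 25", "Divisible by 5"))
plt.show()                       # the overlap contains 5, 10, 15, 20 and 25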

4]explain different data formats in brief?

ans:Data formats define the structure of data in a database or file system, and can refer to a number of
things, including:

File format: How data is encoded and stored in a computer file. Some common file formats include:

JSON: A simple format that's easy for programming languages to read

XML: A widely used format for exchanging data

Comma separated files (CSV): A compact format that's good for transferring large amounts of data

HTML: A markup format used for web pages; data published as HTML is easy to view in a browser but
usually has to be parsed or scraped before it can be analyzed

Data type: A constraint placed on how data is interpreted in a type system

Recording format: How data is encoded for storage on a storage medium

Content format: How media content is represented as data

Audio format: How encoded sound data is formatted

Video format: How encoded video data is formatted

Signal format: How signal data is formatted for use in signal processing

Data formats are important because data scientists need to convert source data to a common format for
each model to process.
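
A small sketch (the records are invented) of reading two of the common file formats in Python:

import io
import json
import pandas as pd

# JSON: nested key/value text that programs can parse directly
record = json.loads('{"name": "Asha", "age": 30}')

# CSV: compact comma separated rows, read straight into a table
csv_text = "name,age\nAsha,30\nRavi,28\n"
df = pd.read_csv(io.StringIO(csv_text))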

5]what is data quality? which factors affect data quality?

ans:Data quality is a measure of how well a data set meets its intended purpose. It is based on a number
of factors, including:

Accuracy: Whether the data accurately represents the entities or events it's supposed to represent

Completeness: Whether the data includes all the values and types of data it's expected to contain

Consistency: Whether the data is uniform across systems and data sets

Validity: Whether the data conforms to defined business rules and parameters

Uniqueness: Measures the number of duplicates

Timeliness: How timely the data is

Accessibility: Whether the data is obtainable at the time it is needed and by those who need it

Data quality is important because it ensures that the data used for analysis, reporting, and
decision-making is reliable and trustworthy. Poor data quality can negatively impact customer service,
employee productivity, and key strategies.

Factors that can affect data quality include:

Incomplete information: Missing data can make a dataset unusable. This can be due to poor data standards
or participants dropping out of a study.

Bias: Systematic error in how data is collected or sampled skews the results.

Use of language: Ambiguous or leading wording in questions and labels distorts the responses collected.

Ethics: Ethical constraints limit what data may be collected and how it may be used.

Cost: A limited budget restricts how much data can be collected and how carefully it is checked.

Time and timing: Data collected too slowly, or at the wrong moment, may be outdated or unrepresentative.

Privacy issues: Restrictions on personal data can leave gaps in the information available.

Cultural sensitivity: Questions that ignore cultural context may produce inaccurate or incomplete answers.
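
A short pandas sketch (invented records) that checks three of the quality dimensions listed above, completeness, uniqueness, and validity:

import pandas as pd

df = pd.DataFrame({"email": ["a@x.com", "a@x.com", None], "age": [25, 25, -3]})

completeness = 1 - df.isnull().mean()    # share of non-missing values per column
duplicates = df.duplicated().sum()       # uniqueness: count of repeated rows
invalid_age = (df["age"] < 0).sum()      # validity: values breaking a business rule
print(completeness, duplicates, invalid_age)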

6]write detailed notes on data visualization tools and techniques

ans:Data visualization is the graphical representation of information and data. By using visual elements
like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand
trends, outliers, and patterns in data.

Together with the demand for data visualization and analysis, the tools and solutions in this area are
developing fast and extensively. Novel 3D visualizations, immersive experiences, and shared VR offices are
becoming common alongside traditional web and desktop interfaces. Visualization technologies range from
programming libraries such as Matplotlib and Seaborn, to self-service tools such as Tableau and Power BI,
serving different types of users and purposes.

3 MARKS

1]what are outliers and their types?

ans:In data science, outliers are data points that are significantly different from the rest of the data set.
There are different types of outliers, including:

Global outliers

These are data points that are extreme compared to the entire data distribution. For example, if a
person's height is 7 feet in a dataset of heights that range from 5 to 6 feet, the 7 foot height would be a
global outlier.

Contextual outliers

These outliers depend on the context of the data and may not be outliers in a different context.

Collective outliers

These are groups of data points that are significantly different from the rest of the dataset when
considered together. For example, a group of customers who consistently make purchases that are
significantly larger than the rest of the customers could be considered a collective outlier.

Univariate outliers

These outliers are exceptional with respect to a single variable. For example, a recorded height of 3
meters in a dataset of human heights would likely be a univariate outlier.

Multivariate outliers

These outliers only appear abnormal when considering the relationship between two or more variables.
For example, a person's weight might not be an outlier by itself, but when considered in relation to their
height, it might be identified as an outlier.
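
A minimal sketch (invented heights) of flagging a univariate outlier with the common 1.5 * IQR rule, one of several possible detection methods:

import numpy as np

heights = np.array([1.60, 1.65, 1.70, 1.72, 1.75, 1.78, 3.00])   # 3.00 m is suspicious

q1, q3 = np.percentile(heights, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = heights[(heights < lower) | (heights > upper)]
print(outliers)    # values outside the [lower, upper] fence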

2]state and explain any three data transformation techniques

ans: The most common types of data transformation are:

Constructive: The data transformation process adds, copies, or replicates data.

Destructive: The system deletes fields or records.

Aesthetic: The transformation standardizes the data to meet requirements or parameters.

Structural: The database is reorganized by renaming, moving, or combining columns.
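
A small pandas sketch (invented columns) showing a structural, a constructive, and a destructive transformation in sequence:

import pandas as pd

df = pd.DataFrame({"fname": ["Asha"], "lname": ["Patil"], "dob": ["1990-05-01"]})

df = df.rename(columns={"dob": "date_of_birth"})     # structural: rename a column
df["full_name"] = df["fname"] + " " + df["lname"]    # constructive: add a derived field
df = df.drop(columns=["fname", "lname"])             # destructive: delete unneeded fields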
