Data Science One Mark Question

Uploaded by

Omkar Shinde
Copyright
© All Rights Reserved

Q1.

ONE MARK QUESTIONS

1. What do you mean by Primary Data?


Primary data is data collected firsthand by a researcher for a specific purpose or
project. It is original and has not been previously published or gathered.

2. What do you mean by Data Quality?


Data quality refers to the condition of a dataset and how suitable it is for its
intended use. It is defined by characteristics such as accuracy, completeness,
consistency, timeliness, and relevance.

3. Define outlier.
An outlier is a data point that significantly deviates from other observations in a
dataset. It can indicate variability in measurement, errors, or a unique feature in
the data.

4. Define Interquartile range.


The interquartile range (IQR) is a measure of statistical dispersion and is
calculated as the difference between the third quartile (Q3) and the first quartile
(Q1). IQR = Q3 - Q1.
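
As a small illustration of the formula above, the sketch below computes the quartiles and the IQR with Python's standard `statistics` module. The sample values are hypothetical, and the choice of `method="inclusive"` is an assumption: other quartile conventions can give slightly different results.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [4, 7, 9, 11, 12, 15, 18, 21, 25]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
# method="inclusive" treats the data as the whole population; the default
# "exclusive" method may return slightly different quartiles.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

iqr = q3 - q1  # IQR = Q3 - Q1
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
```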

5. What do you mean by missing values?


Missing values occur when no data value is stored for a variable in a particular
observation. This can happen because of data entry errors, non-response, or other
reasons.
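
A minimal sketch of how missing values might be counted and handled, assuming a hypothetical list of ages in which `None` marks a value that was never recorded. Dropping and mean imputation are only two of several possible strategies.

```python
import statistics

# Hypothetical survey ages; None marks a value that was not recorded.
ages = [23, None, 31, 27, None, 35]

# Count how many observations are missing.
missing_count = sum(1 for a in ages if a is None)

# Option 1: drop the incomplete observations.
observed = [a for a in ages if a is not None]

# Option 2: replace each missing value with the mean of the observed ones.
mean_age = statistics.mean(observed)
imputed = [mean_age if a is None else a for a in ages]

print(missing_count, observed, imputed)
```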

6. What are the uses of zip files?


Zip files are used to compress and combine multiple files into a single file. This
reduces storage space and simplifies file transfers and archiving.
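
The sketch below shows one way to bundle and compress files with Python's standard `zipfile` module. The file names and contents are hypothetical and are created in a temporary directory so the example leaves nothing behind.

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)

    # Two small hypothetical files with repetitive content (compresses well).
    (tmp_path / "a.csv").write_text("id,value\n1,10\n" * 200)
    (tmp_path / "b.txt").write_text("notes\n" * 200)

    # Combine both files into a single compressed archive.
    archive = tmp_path / "bundle.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in ("a.csv", "b.txt"):
            zf.write(tmp_path / name, arcname=name)

    # Inspect the archive: member names and sizes before/after compression.
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        original_size = sum(info.file_size for info in zf.infolist())
        compressed_size = sum(info.compress_size for info in zf.infolist())

print(names, original_size, compressed_size)
```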

7. What do you mean by the XML file data format?


XML (Extensible Markup Language) is a text-based markup language and file format
used to store and transport data. It uses nested tags to define elements and the
structure of the data, making it both human- and machine-readable.
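
To make the tag structure concrete, here is a short sketch that parses a small XML string with Python's standard `xml.etree.ElementTree` module. The `students` document and its element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: tags describe the structure of the data.
xml_text = """
<students>
  <student id="1"><name>Asha</name><marks>82</marks></student>
  <student id="2"><name>Ravi</name><marks>74</marks></student>
</students>
""".strip()

root = ET.fromstring(xml_text)

# Turn each <student> element into a plain dictionary.
records = [
    {
        "id": s.get("id"),                  # attribute value
        "name": s.findtext("name"),         # text of the <name> child
        "marks": int(s.findtext("marks")),  # text of <marks>, as a number
    }
    for s in root.findall("student")
]
print(records)
```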

8. Define data discretization.


Data discretization is the process of converting continuous data into discrete
buckets or intervals to reduce data complexity and make patterns in the data more
discernible.
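
One simple form of discretization is equal-width binning, sketched below. The helper `equal_width_bins`, the ages, and the three labels are hypothetical; other schemes, such as equal-frequency binning, are equally valid.

```python
def equal_width_bins(values, n_bins):
    """Return a bin index (0 .. n_bins-1) for each value, using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    indices = []
    for v in values:
        idx = int((v - lo) / width) if width else 0
        # The maximum value would land in bin n_bins, so clamp it into the last bin.
        indices.append(min(idx, n_bins - 1))
    return indices


# Hypothetical continuous attribute (ages) mapped to three labelled intervals.
ages = [18, 22, 25, 31, 38, 45, 52, 60]
labels = ["young", "middle", "senior"]
binned = [labels[i] for i in equal_width_bins(ages, 3)]
print(binned)
```
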

9. What is a tag cloud?

A tag cloud is a visual representation of textual data, typically used to depict the
most frequently occurring words in a text. The size of each word in the cloud
corresponds to its frequency.
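
The sizing idea behind a tag cloud can be sketched without any plotting library: count the words, then scale each count to a font size. The text and the 10 to 40 point range are assumptions made for illustration, and the linear scaling is just one possible choice.

```python
import re
from collections import Counter

# Hypothetical input text.
text = "data science uses data to find patterns in data and science helps"

# Count how often each word occurs.
words = re.findall(r"[a-z]+", text.lower())
counts = Counter(words)

# Map frequency linearly onto a font-size range (assumed: 10 to 40 points).
min_size, max_size = 10, 40
top = counts.most_common(1)[0][1]  # highest frequency in the text
font_sizes = {
    word: min_size + (max_size - min_size) * (count - 1) / max(top - 1, 1)
    for word, count in counts.items()
}
print(font_sizes)
```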

10. What is visual encoding?


Visual encoding is the representation of data using visual attributes such as
position, size, shape, color, and orientation to convey information effectively in
charts and graphs.

11. What is Data Science?


Data Science is a multidisciplinary field that involves extracting knowledge,
insights, and meaningful patterns from structured and unstructured data using
techniques from statistics, machine learning, and data visualization.

12. Define Data Source.


A data source is the location from which data is gathered for analysis, such as a
database, a file, or an external web API.

13. What are missing values?


Missing values are the absence of data for a specific variable within a dataset,
which may arise due to various reasons like data entry errors, skipped questions,
or non-response.

14. List the visualization libraries in Python.


Some common visualization libraries in Python are Matplotlib, Seaborn, Plotly, and
Bokeh.

15. List applications of Data Science.


Applications of Data Science include predictive analytics, recommendation
systems, fraud detection, healthcare diagnosis, and customer segmentation.

16. What is data transformation?


Data transformation involves converting data from one format or structure to
another to make it suitable for analysis.

17. Define Hypothesis Testing.


Hypothesis testing is a statistical method used to determine whether there is
enough evidence to reject a null hypothesis based on sample data.
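
As one concrete instance, the sketch below runs a two-sided one-sample z-test using only the standard library. The sample, the hypothesized mean `mu0`, the known population standard deviation `sigma`, and the significance level `alpha` are all assumed values; with an unknown standard deviation a t-test would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical measurements.
sample = [52.1, 49.8, 53.4, 51.2, 50.9, 52.7, 51.5, 50.3]

mu0 = 50.0    # mean claimed by the null hypothesis (assumed)
sigma = 1.5   # population standard deviation, assumed to be known
alpha = 0.05  # significance level (assumed)

# Test statistic: how many standard errors the sample mean lies from mu0.
z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))

# Two-sided p-value under the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Reject the null hypothesis when the p-value falls below alpha.
reject_null = p_value < alpha
print(f"z={z:.3f}, p={p_value:.4f}, reject H0: {reject_null}")
```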

18. What is the use of a bubble plot?


A bubble plot is used to represent three dimensions of data, where the x and y
axes show two variables, and the size of the bubble indicates the third variable.

19. Define Data Cleaning.

Data cleaning is the process of identifying and correcting errors, inconsistencies,
and missing values in a dataset to improve its quality and make it suitable for
analysis.
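
A minimal cleaning pass might look like the sketch below. The records, field names, and the `clean` helper are hypothetical; the example trims whitespace, normalizes capitalization, turns empty or invalid ages into missing values, and drops exact duplicates.

```python
# Hypothetical raw records with typical quality problems.
raw_rows = [
    {"name": " Asha ", "city": "pune", "age": "29"},
    {"name": "Ravi", "city": "Pune", "age": ""},          # missing age
    {"name": " Asha ", "city": "pune", "age": "29"},      # duplicate row
    {"name": "Meera", "city": "MUMBAI", "age": "abc"},    # invalid age
]


def clean(rows):
    """Return de-duplicated rows with trimmed text and numeric (or None) ages."""
    cleaned, seen = [], set()
    for row in rows:
        name = row["name"].strip()
        city = row["city"].strip().title()
        age_text = row["age"].strip()
        # Keep the age only if it is a valid non-negative integer.
        age = int(age_text) if age_text.isdigit() else None
        key = (name, city, age)
        if key in seen:  # skip rows already seen after normalization
            continue
        seen.add(key)
        cleaned.append({"name": name, "city": city, "age": age})
    return cleaned


cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```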

20. Define Standard Deviation.


Standard deviation is a measure of the amount of variation or dispersion in a set of
values, calculated as the square root of the variance. A low standard deviation
indicates that values are close to the mean, while a high standard deviation
indicates that values are more spread out.
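
The contrast between a low and a high standard deviation can be seen in a short sketch. Both datasets are hypothetical and share the same mean; `statistics.pstdev` computes the population standard deviation.

```python
import statistics

# Two hypothetical datasets with the same mean (50) but different spread.
tight = [48, 49, 50, 51, 52]
spread = [10, 30, 50, 70, 90]

# Population standard deviation of each dataset.
tight_sd = statistics.pstdev(tight)
spread_sd = statistics.pstdev(spread)

print(f"tight: mean={statistics.mean(tight)}, sd={tight_sd:.2f}")
print(f"spread: mean={statistics.mean(spread)}, sd={spread_sd:.2f}")
```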

21. List any two applications of Data Science.


1. Fraud detection in financial institutions.
2. Personalized recommendation systems in e-commerce.

22. What is an outlier?


An outlier is a data point that is significantly different from the rest of the data
points in a dataset, either due to variability in the data or errors.
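
One widely used convention for flagging outliers, not the only one, is the 1.5 × IQR rule: values more than 1.5 × IQR below Q1 or above Q3 are treated as outliers. The sketch below applies it to hypothetical data; the 1.5 multiplier and the inclusive quartile method are assumptions.

```python
import statistics

# Hypothetical measurements; 95 looks suspicious.
values = [12, 14, 15, 15, 16, 17, 18, 19, 95]

# Quartiles and interquartile range (inclusive method, an assumed convention).
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Fences of the 1.5 * IQR rule.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = [v for v in values if v < lower or v > upper]
print(f"fences=({lower}, {upper}), outliers={outliers}")
```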

23. Define Variance.


Variance measures the spread of a set of numbers. It is the average of the squared
differences from the mean; for a sample, the sum of squared differences is usually
divided by n - 1 instead of n.
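
The definition can be worked through step by step, as in the sketch below. The values are hypothetical; the results are cross-checked against `statistics.pvariance` (population, divides by n) and `statistics.variance` (sample, divides by n - 1).

```python
import math
import statistics

# Hypothetical data.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: the mean.
mean = sum(values) / len(values)

# Step 2: squared differences from the mean.
squared_diffs = [(x - mean) ** 2 for x in values]

# Step 3: average them (population) or divide by n - 1 (sample).
population_variance = sum(squared_diffs) / len(values)
sample_variance = sum(squared_diffs) / (len(values) - 1)

# Cross-check against the standard library.
assert math.isclose(population_variance, statistics.pvariance(values))
assert math.isclose(sample_variance, statistics.variance(values))
print(population_variance, sample_variance)
```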

24. What is a nominal attribute?


A nominal attribute is a categorical attribute that has no inherent order or ranking
among its values. Examples include gender or color.

25. What is one-hot coding?


One-hot coding (also called one-hot encoding) is a process of converting a
categorical variable into a binary format: each unique category becomes a separate
0/1 feature, and exactly one of these features is 1 for each observation.
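
A minimal sketch of the idea in plain Python, using a hypothetical `colors` column. Sorting the categories is only a choice made here to get a stable column order.

```python
# Hypothetical categorical column.
colors = ["red", "green", "blue", "green"]

# One binary feature per unique category, in a fixed (sorted) order.
categories = sorted(set(colors))

# Each row has a 1 in the column of its category and 0 elsewhere.
one_hot = [[1 if value == cat else 0 for cat in categories] for value in colors]

print(categories)
for value, row in zip(colors, one_hot):
    print(value, row)
```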

26. Define Data Visualization.


Data visualization is the graphical representation of data and information using
charts, graphs, and other visual tools to help identify patterns, trends, and
insights.
