01.ad3491 Fdsa QB

The document discusses fundamentals of data science and analytics. It covers topics like data science definitions and processes, different data types, descriptive analytics techniques, and more. Several questions are provided with explanations on key concepts in data science.

Uploaded by

kandasamy.1229

AD3491 FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS

QUESTION BANK

UNIT I
INTRODUCTION TO DATA SCIENCE
Part A
1. Define Data Science.
Data science is an interdisciplinary field that uses scientific methods, processes,
algorithms and systems to extract knowledge from noisy, structured or unstructured
data and apply knowledge from data across a wide range of application domains.
Data science is related to Data mining, Machine learning and Big data.
2. What is big data?
Big data is a comprehensive term for any collection of large or complex data sets
which is difficult to process using traditional data management techniques such as, for
example, the RDBMS (relational database management systems).
3. What is machine learning?
Machine learning is a branch of artificial intelligence (AI) and computer science
which focuses on the use of data and algorithms to imitate the way that humans learn,
gradually improving its accuracy.
4. What is data mining?
Data mining is the process of reviewing large data sets to identify patterns and
relationships that can solve business problems through data analysis. Data mining
techniques and tools enable enterprises to predict future trends and make
more informed business decisions.
5. What are the characteristics of big data?
The characteristics of big data are:
- Volume - Quantity of data
- Variety - Different types of data
- Velocity - Speed at which new data is generated
- Veracity - Accuracy of data (a fourth V that complements the first three)
6. List the categories of data.
The main categories of data are:
a) Structured Data
b) Unstructured Data
c) Natural language Data
d) Machine-generated Data
e) Graph-based Data
f) Audio, video and images Data
g) Streaming Data
7. List some of the application domains of data science?
The application domains of data science are:
- Health care applications
- Transportation
- Education
- Government Organization
- Commercial applications
8. What is structured data? Give examples.
Structured data is data that depends on a data model and resides in a fixed field
within a record.
Examples: Tables within databases or Excel files.
9. What is unstructured data? Give an example.
Unstructured data is data that is not easy to fit into a data model because the
content is context-specific or varying. One example of unstructured data is the regular
email.
10. What is machine generated data?
Machine-generated data is information that is automatically created by a
computer, process, application, or other machine without human intervention.
Machine-generated data is becoming a major data resource. The analysis of machine data
relies on highly scalable tools, due to its high volume and speed.
Examples: Web server logs, call detail records, network event logs, and telemetry.
11. State the importance of setting the research goal.
The goal of a data science project is to fulfill a precise and measurable objective
that is clearly connected to the purpose, workflows, and decision-making processes of
the business. This step defines the scope of the project.
12. List the phases involved in the data science process.
- Setting the research goal
- Retrieving data
- Data Preparation
- Data Exploration
- Data Modeling
- Presentation and automation
13. What is meant by data cleaning?
Data cleaning is the process of fixing or removing incorrect, corrupted,
incorrectly formatted, duplicate, or incomplete data within a dataset. When combining
multiple data sources, there are many opportunities for data to be duplicated or
mislabeled.
14. What is project charter?
Project charter is a document that lays out the project vision, scope, objectives,
project team, and their responsibilities, key stakeholders, and how it will be carried out
or the implementation plan.
15. Identify the important contents of a project charter.
Project charter must include the following:
- Clear research goal
- Project mission and context
- Performing the analysis
- Resources to be used
- Proof that it is an achievable project, or proof of concepts
- Deliverables and a measure of success
- Timeline
16. List some of the visualization techniques.
- Line graphs
- Histograms
- Bar graphs
- Boxplots
- Sankey and network graphs.
17. List the problems associated with real world data.
The problems with the real world data are:
- Incomplete data: Missing attribute values.
- Noisy: Contains errors.
- Inconsistent: Contain discrepancies in codes and names.
18. Define data warehouse, data mart and data lake.
- Data warehouse is constructed by integrating data from multiple
heterogeneous sources that support analytical reporting, structured and/or ad
hoc queries, and decision making.
- Data mart supplies subject-oriented data necessary to support a specific
business unit.
- A data lake stores an organization's raw and processed (unstructured and
structured) data at both large and small scales.
19. List some of the factors involved in selecting the modeling technique.
- Model performance
- Moving the model to a production environment for easy implementation
- Maintenance of the model
20. What is dummy variable?
Dummy variables can take only two values: true (1) or false (0). They are used to
indicate the presence or absence of a categorical effect that may explain the observation.
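As an illustration of how a categorical column can be turned into dummy variables, here is a minimal sketch in plain Python. The function name `dummy_encode` and the weekday data are hypothetical, chosen only for this example.

```python
def dummy_encode(values):
    """Turn a list of category labels into one 0/1 column per category."""
    categories = sorted(set(values))
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }


# Hypothetical observations of a categorical variable.
weekdays = ["Mon", "Tue", "Mon", "Wed"]
encoded = dummy_encode(weekdays)
for category, column in encoded.items():
    print(category, column)
```

Each observation has a 1 in exactly one of the generated columns and 0 in the others.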
21. What do you mean by Exploratory Data Analysis?
Exploratory Data Analysis is the critical process of performing initial
investigations on data so as to discover patterns, spot anomalies, test hypotheses
and check assumptions with the help of summary statistics and graphical
representations. It is used to discover trends and patterns in the dataset.
22. List the methods for combining data from different tables.
- Joining tables
- Appending tables
- Using views to simulate data joins and appends
- Enriching aggregated measures
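The first two methods in the list above, joining and appending, can be sketched with tables held as lists of dictionaries. This is only an illustrative sketch: the helper names `inner_join` and `append_tables` and all of the sample rows are assumptions, not part of the syllabus material.

```python
def inner_join(left, right, key):
    """Join two tables (lists of dicts) on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


def append_tables(first, second):
    """Stack the rows of two tables that share the same columns."""
    return first + second


# Hypothetical tables.
clients = [{"client_id": 1, "name": "Asha"}, {"client_id": 2, "name": "Ravi"}]
orders = [{"client_id": 1, "amount": 250}, {"client_id": 1, "amount": 90}]
january = [{"client_id": 1, "amount": 250}]
february = [{"client_id": 2, "amount": 40}]

print(inner_join(clients, orders, "client_id"))  # joining adds columns
print(append_tables(january, february))          # appending adds rows
```

Joining enriches each row with columns from another table, whereas appending adds more observations with the same columns.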
Part B & C
1. Discuss the applications of data science and big data with suitable examples.
2. Illustrate the overview of the data science process.
3. Elaborate any 5 application domains of data science.
4. Describe the categories of data for data mining.
5. Discuss the significance of setting the research goal for the data science project.
6. Discuss the strategies involved in retrieving relevant data from different sources of
data.
7. Explain the different stages of data preparation phase.
8. Elucidate the techniques involved in data cleansing.
9. Illustrate the steps involved in combining data from different data sources.
10. Explain impact of variable reduction on data science project highlighting its pros
and cons.
11. Elaborate on the steps involved in model building with suitable diagrams.
UNIT II
DESCRIPTIVE ANALYTICS
Part A
1. What is meant by frequency distribution?
A frequency distribution is a collection of observations produced by sorting
observations into classes and showing their frequency ($f$) of occurrence in each
class.
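A short sketch of how an ungrouped frequency distribution might be tabulated, together with relative and cumulative frequencies. The function name `frequency_table` is hypothetical; the scores are the ones listed in Part B, question 5 of this unit.

```python
from collections import Counter


def frequency_table(observations):
    """Return (value, f, relative f, cumulative f) rows in ascending order."""
    counts = Counter(observations)
    total = len(observations)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value], counts[value] / total, running))
    return rows


scores = [139, 145, 150, 145, 136, 150, 152, 144, 138, 138]
for row in frequency_table(scores):
    print(row)
```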
2. What is Qualitative data? Give example.
Qualitative or Categorical data is a set of observations where any single
observation is a word, letter, or numerical code that represents a class or a category.
Example: Words - Yes or No, Letters - Y or N, Numerical code - 0 or 1
3. What is quantitative data? Give an example.
Quantitative Data is a set of observations where any single observation is a
number that represents an amount or a count. It can be expressed in numerical values,
which makes it countable and suitable for statistical analysis. It is also known as
numerical data.
Example: Weights: 35, 56, ..., 70 kg
4. Compare Discrete and Continuous Variables.
Discrete Variable is a variable that consists of isolated numbers separated by
gaps.
- Example: Number of students in a class
Continuous Variable is a variable that consists of numbers whose values, at least
in theory, have no restrictions.
- Example: Height of students in a class
5. State the differences between Nominal and Ordinal data.
| Nominal Data | Ordinal Data |
| --- | --- |
| Cannot be quantified and there is no intrinsic ordering | There is a sequential order by their position on the scale |
| Qualitative data or categorical data | It is "in-between" qualitative data and quantitative data |
| They do not provide any quantitative value and cannot perform any arithmetical operation | They provide sequence and can assign numbers to ordinal data but cannot perform arithmetical operation |
| Cannot be used to compare with one another | Can compare one item with another by ranking or ordering |
| Examples: Gender, Nationality | Examples: Economic status, customer satisfaction |
6. What are the types of Frequency Distribution?
- Grouped frequency distribution
- Ungrouped frequency distribution
- Cumulative frequency distribution
- Relative frequency distribution
- Relative cumulative frequency distribution
7. Define an outlier.
An outlier is a data point that differs significantly from other observations. An
outlier can occur due to variability in the measurement or it may indicate an
experimental error.
8. What is percentile rank?
Percentile Rank (PR) of an observation is the percentage of scores in the entire
distribution with equal or smaller values than that score. Its mathematical formula is
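given as:
$$PR = \frac{\text{number of scores with values} \le X}{N} \times 100$$
where $X$ is the score of interest and $N$ is the total number of scores in the distribution.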
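The definition of percentile rank (the percentage of scores with equal or smaller values) translates directly into code. This is a minimal sketch; the function name `percentile_rank` is hypothetical and the scores are borrowed from Part B, question 5 of this unit.

```python
def percentile_rank(scores, x):
    """Percentage of scores that are less than or equal to x."""
    at_or_below = sum(1 for s in scores if s <= x)
    return 100 * at_or_below / len(scores)


scores = [139, 145, 150, 145, 136, 150, 152, 144, 138, 138]
print(percentile_rank(scores, 145))  # 7 of the 10 scores are <= 145
```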
9. State the differences between a histogram and bar graph.

| Histogram | Bar graph |
| --- | --- |
| Histogram refers to a graphical representation that displays data by way of bars to show the frequency of numerical data. | Bar graph is a pictorial representation of data that uses bars to compare different categories of data. |
| Distribution of non-discrete variables | Comparison of discrete variables |
| Quantitative data | Categorical data |
| Bars touch each other, hence there are no spaces between bars. | Bars do not touch each other; hence there are spaces between bars. |
| Elements are grouped together, so that they are considered as ranges. | Elements are taken as individual entities. |

10. What are the measures of central tendency?

Mean, Median and Mode
11. Define Mode.
The mode represents the value of the most frequently occurring score.
12. Define Median.
Median represents the middle value when observations are ordered from least to
most.
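The three measures of central tendency can be computed with Python's standard `statistics` module. The sketch below uses the first data set from Part B, question 7 of this unit.

```python
import statistics

data = [45, 55, 60, 60, 63, 63, 63, 63, 65, 65, 70]

mean = statistics.mean(data)      # sum of scores divided by their number
median = statistics.median(data)  # middle value of the ordered scores
mode = statistics.mode(data)      # most frequently occurring score

print(round(mean, 2), median, mode)
```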
13. What is a Positively Skewed Distribution?
Positively Skewed Distribution is a distribution that includes a few extreme
observations in the positive direction (to the right of the majority of observations).
14. What is a Negatively Skewed Distribution?
Negatively Skewed Distribution is a distribution that includes a few extreme
observations in the negative direction (to the left of the majority of observations).
15. What is Variance?
Variance is the mean of all squared deviation scores.

16. What is standard deviation?

Standard Deviation is a rough measure of the average (or standard) amount by
which scores deviate on either side of their mean.
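A sketch of the variance and standard deviation calculations, assuming the usual computation formula for the sum of squares, SS = ΣX² − (ΣX)²/n. Dividing SS by n gives the population variance (the mean of the squared deviations); dividing by n − 1 gives the sample variance. The function names are hypothetical, and the scores are the first set from Part B, question 9 of this unit.

```python
import math


def sum_of_squares(scores):
    """Computation formula: sum of X^2 minus (sum of X)^2 / n."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n


def population_sd(scores):
    """Square root of the population variance, SS / n."""
    return math.sqrt(sum_of_squares(scores) / len(scores))


def sample_sd(scores):
    """Square root of the sample variance, SS / (n - 1)."""
    return math.sqrt(sum_of_squares(scores) / (len(scores) - 1))


scores = [1, 3, 7, 2, 0, 4, 3, 7]
print(sum_of_squares(scores))
print(round(population_sd(scores), 4), round(sample_sd(scores), 4))
```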

17. What is a Normal Curve?

A theoretical curve noted for its symmetrical bell-shaped form.
18. What is z Score?
A z score is a unit-free, standardized score that, regardless of the original units of
measurement, indicates how many standard deviations a score is above or below the
mean of its distribution.
$$z = \frac{X - \mu}{\sigma}$$
where $X$ is the original score and $\mu$ and $\sigma$ are the mean and the standard deviation,
respectively, for the normal distribution of the original scores.
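19. How will you convert a z score to the original score?
$$X = \mu + z\sigma$$
where $X$ is the original score, $\mu$ and $\sigma$ are the mean and the standard deviation of the original distribution, and $z$ is the standard score.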
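Converting between original scores and z scores is a one-line calculation in each direction. The sketch below uses hypothetical function names and the mean of 500 and standard deviation of 100 given in Part B, question 10 of this unit.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above or below the mean."""
    return (x - mean) / sd


def original_score(z, mean, sd):
    """Inverse conversion: recover the original score from a z score."""
    return mean + z * sd


z = z_score(550, mean=500, sd=100)
print(z)
print(original_score(z, mean=500, sd=100))
```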
20. What is Correlation?

Correlation measures the relationship between two variables.
Example: The relationship between the computer skills and GPA of the student
21. What are the types of correlation?
- Positive correlation
- Negative correlation
- No correlation
22. What is a scatterplot?
A scatterplot is a graph containing a cluster of dots that represents all pairs of
scores. Scatter plots are used to observe relationships between variables.
23. What is a curvilinear relationship?
Curvilinear Relationship is a relationship that can be described best with a
curved line.
24. What are the key properties of the correlation coefficient r?
The two properties are:
- The sign of r indicates the type of linear relationship, whether positive or
negative.
- The numerical value of r, without regard to sign, indicates the strength of the
linear relationship.
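To make the sign and size of r concrete, here is a minimal sketch of the Pearson correlation coefficient computed from deviation scores. The function name `pearson_r` is hypothetical; the data are the study hours and sleeping hours from Part B, question 14 of this unit.

```python
import math


def pearson_r(xs, ys):
    """Pearson r: sum of cross products divided by sqrt(SS_x * SS_y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_x * ss_y)


study_hours = [2, 4, 6, 8, 10]
sleeping_hours = [10, 9, 8, 7, 6]
r = pearson_r(study_hours, sleeping_hours)
print(r)
```

For these data the points fall exactly on a descending straight line, so r comes out at its negative limit: the sign shows the direction of the relationship and the magnitude shows its strength.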
25. What is regression?
Regression is a statistical method to determine the relationship between one
dependent variable and a series of other variables known as independent (explanatory)
variables.
26. What are the types of regression models?
- Linear model
- Non Linear model
27. What is restricted range?
Restricted range refers to a range of values that has been condensed, or
shortened. Example: The entire range of GPA scores is 0 to 10.0. A restricted range
could be 6.0 to 10.0 or 8.0 to 10.0.
28. What is a regression line?
A regression line is a line which is used to describe the behavior of a set of data.
It displays the connection between scattered data points in any set. It gives the best
trend of the given data.
29. What is the interpretation of $r^2$?
The squared correlation coefficient, $r^2$, provides us with not only a key
interpretation of the correlation coefficient but also a measure of predictive accuracy
that supplements the standard error of estimate, $s_{y|x}$. It gives the proportion of
the total variance in Y that is predictable from its relationship with X.
30. What is Standard Error of Estimate?
Standard Error of Estimate, $s_{y|x}$, is a rough measure of the average amount of
predictive error, i.e., a rough measure of the average amount by which known Y
values deviate from their predicted Y values.
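31. Give the Least Squares Regression Equation.
$$Y' = bX + a$$
where $Y'$ is the predicted value, $X$ is the known value, and the slope $b$ and intercept $a$ are given by
$$b = r\sqrt{\frac{SS_y}{SS_x}}, \qquad a = \bar{Y} - b\bar{X}$$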
32. State the desirable property of least squares regression.
The desirable property is that it automatically minimizes the total of all squared
predictive errors for known Y scores in the original correlation analysis.
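The minimizing property can be checked numerically. The sketch below fits a line of the standard form Y' = bX + a, with the slope computed as the sum of cross products divided by SS_x, and then compares its total squared error with that of a slightly different line. The function names are hypothetical; the data are from Part B, question 17 of this unit, assuming the X values read 1 through 7.

```python
def least_squares(xs, ys):
    """Return slope b and intercept a of the least squares line Y' = bX + a."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp_xy / ss_x
    a = mean_y - b * mean_x
    return b, a


def squared_error(xs, ys, b, a):
    """Total of squared predictive errors for the line Y' = bX + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5, 6, 7]       # assumed reading of the garbled X row
ys = [9, 8, 10, 12, 11, 13, 14]
b, a = least_squares(xs, ys)
print(round(b, 4), round(a, 4))
print(squared_error(xs, ys, b, a) < squared_error(xs, ys, b + 0.1, a))
```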
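33. State the Multiple Regression Equation.
$$Y' = b_1X_1 + b_2X_2 + \dots + b_kX_k + a$$
where $Y'$ is the predicted value, $X_1, \dots, X_k$ are the predictor variables, $b_1, \dots, b_k$ are their regression coefficients, and $a$ is the intercept.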

34. When does Regression Fallacy occur?

Regression fallacy occurs whenever regression towards the mean is interpreted
as a real, rather than a chance, effect. The regression fallacy can be avoided by splitting
the subset of extreme observations into two groups.

Part B & C
1. Explain the different types of frequency distribution with suitable examples and
diagrams.
2. Elaborate the different ways to describe or represent data using tables with suitable
examples.
3. Explain the various ways by which data can be represented or described using graphs
with suitable examples and diagrams.
4. Explain the different measures of central tendency and describe the suitable
measures for the different types of data distribution.
5. Construct the frequency table and draw bar graph and stem and leaf displays for the
following data: 139,145,150,145,136,150,152,144,138,138
6. Construct the histogram and convert it to a frequency polygon for the following data:
138,139,139,145,145,150,145,136,150,152,144,138,138,150,149,133,134,152,
155,151.
7. Compute the mean, median and mode for the following data sets:
- 45,55,60,60,63,63,63,63,65,65,70
- 26.9,26.3,28.7,27.4,26.6,27.4,26.9,26.9
8. Explain the various measures of variability with suitable examples.
9. Using the computation formula for the sum of squares, calculate the population
standard deviation and the sample standard deviation for the scores:
- 1,3,7,2,0,4,3,7
- 10,8,5,0,1,7,9,2,1
10. Consider the test scores approximating a normal curve with a mean of 500 and a
standard deviation of 100. Sketch a normal curve and shade in the target area
described by the following:
- more than 550
- less than 525
- between 520 and 540
Plan solutions for the target areas. Convert to z scores and find proportions that
correspond to the target areas.
11. Elaborate in detail the significance of correlation and the various types of
correlation.
12. What are scatterplots? Illustrate on the various types with suitable examples.
13. Elaborate on the correlation coefficient r. Compare the various correlation
coefficients.
14. Calculate and analyze the correlation coefficient between the number of study hours
and the number of sleeping hours of different students.
Number of Study Hours 2 4 6 8 10
Number of Sleeping Hours 10 9 8 7 6

15. What is the significance of $r^2$? Give a detailed interpretation of $r^2$.

16. Discuss the importance of regression. Elaborate on the types of regression.
17. Calculate the regression coefficient and obtain the lines of regression for the
following data.
X 1 2 3 4 5 6 7
Y 9 8 10 12 11 13 14

18. Explain the significance of regression line and Least squares regression equation.
19. Find the standard error of the estimate of the mean weight of high school football
players using the given data on the players' weights.
Player Number Weight in Pounds
1 150
2 203
3 176
4 190
5 168
6 193
7 189
8 178
9 197
10 172

20. Elaborate on multiple regression equations.

21. Elucidate regression towards the mean. Explain regression fallacy and state how it
can be avoided.
UNIT III

INFERENTIAL STATISTICS

Part A

1. Define Population. Give an example.

A population is characterized by any complete set of observations (or potential
observations). It includes all the elements from the data set, and the measurable
characteristics of the population, such as mean and standard deviation, are known as
parameters.

Example: All students in a college; all people living in India (the population of
India).

2. What is real population?

A real population is one in which all potential observations are accessible at the time of
sampling.

Examples of real populations: The ages of all visitors to a Park on a given day, the
ethnicity of all employees in an organization.

3. List the different types of population.

The different types of population are:

- Finite Population
- Infinite Population
- Existent Population
- Hypothetical Population

4. What is Hypothetical Population?

A hypothetical population is one whose units do not exist in concrete form, so not all
potential observations are accessible at the time of sampling. A population consists of
sets of observations, objects, etc. that all have something in common; in some
situations, such populations are only hypothetical.

Examples: The outcomes of rolling a die, the outcomes of tossing a coin.

5. Define Sample.

Any subset of observations from a population may be characterized as a sample. In
typical applications of inferential statistics, the sample size is small relative to the
population size.
6. List the categories of sampling.

There are two categories of sampling generally used:

- Probability sampling
- Non-probability sampling

7. What is random sampling?

Probability sampling, also known as random sampling, is a kind of sample selection
where randomization is used instead of deliberate choice.

8. List the types of random sampling.

Types of Probability/Random sampling:

- Simple random sampling
- Systematic sampling
- Stratified random sampling
- Cluster sampling

9. Differentiate population and sample.

| Population | Sample |
| --- | --- |
| Measurable quantity is called a parameter | Measurable quantity is called a statistic |
| It is a complete set | It is a subset of the population |
| It is the true representation of opinion | It has a margin of error and a confidence interval |
| Data collection is by complete enumeration or census | Data collection is by means of sampling or sample survey |

10. List the types of non-probability sampling.

Types of Non-probability sampling:

- Convenience sampling
- Quota sampling
- Purposive sampling
- Snowball sampling

11. What is Snowball sampling?

Snowball sampling is also known as referral sampling. This technique helps researchers
find participants who are difficult to locate, because existing participants refer
further participants. Researchers use this technique when the target group is small and
not easily accessible.
12. What is the difference between non-probability sampling and probability
sampling?

| Non-probability sampling | Probability sampling |
| --- | --- |
| Sample selection is based on the subjective judgment of the researcher. | Sample is selected at random. |
| Not everyone has an equal chance to participate. | Everyone in the population has an equal chance of getting selected. |
| The researcher does not consider sampling bias. | Used when sampling bias has to be reduced. |
| Useful when the population has similar traits. | Useful when the population is diverse. |
| The sample does not accurately represent the population. | Used to create an accurate sample. |
| Finding respondents is easy. | Finding the right respondents is not easy. |

13. What is the Optimal Sample Size?

The sample sizes are in hundreds or thousands for surveys, but less than 100 for most
experiments. Optimal sample size depends on the estimated variability among
observations and the acceptable amount of error in the conclusion.

14. What is Systematic sampling?

Systematic sampling is also known as systematic clustering. In this method, random
selection applies only to the first item chosen. A rule is then applied so that every $n$th
item or person after that is picked.
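A minimal sketch of this procedure: only the starting position is random, and every following pick is determined by the fixed step. The function name `systematic_sample` and the population of 100 numbered units are hypothetical.

```python
import random


def systematic_sample(population, step, rng=random):
    """Pick a random start within the first `step` items, then every step-th item."""
    start = rng.randrange(step)
    return population[start::step]


population = list(range(1, 101))  # hypothetical units numbered 1 to 100
sample = systematic_sample(population, step=10, rng=random.Random(42))
print(sample)
```

Passing a seeded `random.Random` instance makes the draw repeatable; omitting it falls back to the module-level generator.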

15. What is Cluster sampling?

Groups rather than individual units of the target population are selected at random.
Cluster sampling is similar to stratified sampling, except that the population is divided
into a large number of subgroups (for example, hundreds of thousands of strata or
subgroups). Some of these subgroups are then chosen at random, and simple
random samples are gathered within them. These subgroups are
known as clusters. It is mainly used to reduce the cost of data collection.

16. What are the advantages of random sampling?

Advantages of random sampling are:

- It helps to reduce the bias involved in the sample compared to other methods of
sampling and it is considered as a fair method of sampling.
- This method does not require any technical knowledge, as it is a fundamental
method of collecting the data.
- The data collected through this method is well informed.
- As the population size is large in the simple random sampling method,
researchers can create the sample size that they want.
- It is easy to pick the smaller sample size from the existing larger population.

17. What is Consecutive sampling?

This non-probability sampling method is very similar to convenience sampling, with a
slight variation. In this case, the researcher picks a single person or a group as the
sample, conducts research over a period, analyzes the results, and then moves on to
another subject or group if needed. Consecutive sampling technique gives the researcher a
chance to work with many topics and fine-tune their research by collecting results that
have vital insights.

18. What is the Standard Error of the Mean?

The standard error of the mean equals the standard deviation of the population divided
by the square root of the sample size. It is a rough measure of the average amount by
which sample means deviate from the mean of the sampling distribution or from the
population mean.

19. What is hypothesis testing?

Hypothesis testing is an act in statistics whereby an analyst tests an assumption
regarding a population parameter. The methodology employed by the analyst depends
on the nature of the data used and the reason for the analysis.

20. What is one tailed test?

A one-tailed test is a statistical test in which the critical area of a distribution is one-
sided so that it is either greater than or less than a certain value, but not both. If the
sample being tested falls into the one-sided critical area, the alternative hypothesis will
be accepted instead of the null hypothesis.

21. What is two tailed test?

A two-tailed test, in statistics, is a method in which the critical area of a distribution is
two-sided and tests whether a sample is greater than or less than a certain range of
values. It is used in null-hypothesis testing and testing for statistical significance. If the
sample being tested falls into either of the critical areas, the alternative hypothesis is
accepted instead of the null hypothesis.
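The difference between one-tailed and two-tailed tests shows up in how the p-value is taken from the normal curve. The sketch below is a simple z test for a single mean that assumes the population standard deviation is known. The function name `z_test` and its `tail` argument are hypothetical; the numbers come from Part B, question 6 of this unit (hypothesized mean 80, standard deviation 20, sample of 75 with mean 90).

```python
import math
from statistics import NormalDist


def z_test(sample_mean, hypothesized_mean, sigma, n, tail="two"):
    """Return the z statistic and p-value for a one-sample z test."""
    z = (sample_mean - hypothesized_mean) / (sigma / math.sqrt(n))
    cdf = NormalDist().cdf(z)
    if tail == "upper":      # alternative: mean is greater than hypothesized
        p_value = 1 - cdf
    elif tail == "lower":    # alternative: mean is less than hypothesized
        p_value = cdf
    else:                    # two-tailed: mean differs in either direction
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value


z, p = z_test(90, 80, sigma=20, n=75, tail="upper")
print(round(z, 2), p < 0.05)
```

When the p-value falls below the chosen significance level, the null hypothesis is rejected in favor of the alternative.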

22. State the Central Limit Theorem.

The Central Limit Theorem states that the sampling distribution of the sample mean
approximates a normal distribution (i.e., a bell curve) as the sample size becomes larger,
assuming that all samples are identical in size, and regardless of the population's actual
distribution shape.
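A small simulation can make the theorem tangible. The sketch below repeatedly draws samples from a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen here purely as an assumption for illustration) and looks at how the sample means behave. The function name `simulate_sample_means` is hypothetical.

```python
import random
import statistics


def simulate_sample_means(sample_size, n_samples, seed=0):
    """Draw repeated samples from a skewed population; return each sample's mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


means = simulate_sample_means(sample_size=50, n_samples=2000)
print(round(statistics.fmean(means), 3))  # close to the population mean of 1
print(round(statistics.stdev(means), 3))  # close to 1 / sqrt(50), about 0.141
```

Even though individual observations are far from normal, the sample means cluster symmetrically around the population mean, with a spread near the population standard deviation divided by the square root of the sample size.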
23. What is confidence interval?

Confidence intervals measure the degree of uncertainty or certainty in a sampling
method. They can take any number of probability limits, with the most common being a
95% or 99% confidence level. Confidence intervals are constructed using statistical
methods, such as a t-test.

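24. What is the formula for the confidence interval for µ (based on z)?

$$\bar{X} \pm z_{conf}\,\sigma_{\bar{X}}$$

where $\bar{X}$ is the sample mean, $z_{conf}$ is the value from the standard normal table that corresponds to the chosen confidence level (for example, 1.96 for 95%), and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ is the standard error of the mean.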

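A sketch of the z-based confidence interval for the population mean: the sample mean plus and minus a z value times the standard error of the mean defined in question 18 (the population standard deviation divided by the square root of the sample size). The function name `confidence_interval` and the example numbers are hypothetical; `NormalDist.inv_cdf` from the standard library supplies the z value.

```python
import math
from statistics import NormalDist


def confidence_interval(sample_mean, sigma, n, level=0.95):
    """Return (lower, upper) limits of the z-based interval for the mean."""
    z_conf = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    standard_error = sigma / math.sqrt(n)
    margin = z_conf * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical example: sample mean 100, sigma 15, sample size 25.
lower, upper = confidence_interval(100, sigma=15, n=25)
print(round(lower, 2), round(upper, 2))
```

Raising the confidence level widens the interval, while a larger sample size narrows it.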
25. What is point estimate?

A point estimate is defined as a calculation where a sample statistic is used to estimate
or approximate an unknown population parameter.

26. List the methods to calculate point estimates.

- Method of Moments
- Maximum Likelihood
- Bayes Estimators
- Best Unbiased Estimators

27. What are the properties of point estimators?

- Bias
- Consistency
- Most efficient or unbiased

28. What is the drawback of point estimates? How can it be resolved?

Point estimates convey no information about the degree of inaccuracy due to sampling
variability. Statisticians supplement point estimates with another, more realistic type of
estimate, known as interval estimates or confidence intervals.

29. What is interval estimator?

Interval estimator uses sample data to calculate the interval of the possible values of an
unknown parameter of a population. It gives the range of values for the parameter.
Interval estimates are intervals within which the parameter is expected to fall, with a
certain degree of confidence. The interval is constructed so that it captures the
parameter with a probability of 95% or higher, known as the confidence level.

The level of confidence indicates the percent of time that a series of confidence intervals
includes the unknown population characteristic, such as the population mean.

Part B & C

1. Discuss population and samples with suitable examples.

2. Discuss the different types of random sampling techniques.

3. Elaborate on the different types of non-probability based sampling techniques.

4. Illustrate hypothesis testing with an example.

5. Explain the procedure of z-test with an example.

6. A teacher claims that the mean score of students in the class is greater than 80 with a
standard deviation of 20. If a sample of 75 students was selected with a mean score of
90 then check if there is enough evidence to support this claim at a 0.05 significance
level.

7. An online food delivery company claims that the mean delivery time is less than 30
minutes with a standard deviation of 10 minutes. Is there enough evidence to support
this claim at a 0.05 significance level if 49 orders were examined with a mean of 20
minutes?

8. A company wants to improve the quality of products by reducing defects and
monitoring the efficiency of assembly lines. In assembly line A, there were 9 defects
reported out of 100 samples and in line B, 25 defects out of 600 samples were
identified. Check if there is a difference in the procedures at a 0.05 alpha level.

9. Explain in detail about Estimation and the significance of point estimates.

10. Elaborate on Confidence interval and level of confidence.

2 pages
STS All Midterms
No ratings yet
STS All Midterms
26 pages
D2969 Zrow1858
No ratings yet
D2969 Zrow1858
17 pages
New Syllabus 2024-25 Final
No ratings yet
New Syllabus 2024-25 Final
52 pages
Abstract 1
No ratings yet
Abstract 1
2 pages
Chapter 14 - Measures of Central Tendency and Dispersion
No ratings yet
Chapter 14 - Measures of Central Tendency and Dispersion
140 pages
MBA Syllabus
No ratings yet
MBA Syllabus
26 pages
Statistics For Technology A Course in Applied Statistics Third Edition 3rd Ed Chatfield PDF Download
No ratings yet
Statistics For Technology A Course in Applied Statistics Third Edition 3rd Ed Chatfield PDF Download
78 pages
Finals MMW Final
No ratings yet
Finals MMW Final
4 pages
Sample Questions of Math 2205 For Probability
No ratings yet
Sample Questions of Math 2205 For Probability
9 pages
Course Pattern - BM (2024 - 2025)
No ratings yet
Course Pattern - BM (2024 - 2025)
75 pages
Hkdse m1 Notes
No ratings yet
Hkdse m1 Notes
3 pages
Coefficient of Variation - Definition, Formula, Interpretation, Examples & FAQs
No ratings yet
Coefficient of Variation - Definition, Formula, Interpretation, Examples & FAQs
19 pages


9. State the differences between a histogram and a bar graph.

10. What are the measures of central tendency?

16. What is standard deviation?

17. What is a Normal Curve?

20. What is Correlation?

34. When does Regression Fallacy occur?

15. What is the significance of r²? Give a detailed interpretation of r².

20. Elaborate on multiple regression equations.

1. Define Population. Give an example.

A population is characterized by any complete set of observations (or potential

2. What is a real population?

3. List the different types of population.

The different types of population are:

4. What is a hypothetical population?

Examples: The outcome of rolling a die, the outcome of tossing a coin.

Any subset of observations from a population may be characterized as a sample. In

There are two categories of sampling generally used:

7. What is random sampling?

Probability sampling, also known as random sampling, is a kind of sample selection

8. List the types of random sampling.

Types of Probability/Random sampling:

 Simple random sampling

9. Differentiate population and sample.

10. List the types of non-probability sampling.

Types of Non-probability sampling:

11. What is Snowball sampling?

Non-probability sampling Probability sampling

13. What is the Optimal Sample Size?

14. What is Systematic sampling?

Systematic sampling is also known as systematic clustering. In this method, random

15. What is Cluster sampling?

16. What are the advantages of random sampling?

Advantages of random sampling are:

17. What is Consecutive sampling?

This non-probability sampling method is very similar to convenience sampling, with a

18. What is the Standard Error of the Mean?
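As a rough illustration of the quantity asked about in question 18, the standard error of the mean is commonly estimated as the sample standard deviation divided by the square root of the sample size. The sketch below assumes that definition; the function name and the sample values are hypothetical and are not taken from the question bank.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Estimate the standard error of the mean as s / sqrt(n).

    s is the sample standard deviation (n - 1 in the denominator).
    """
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed")
    return statistics.stdev(sample) / math.sqrt(n)


# Hypothetical sample, used only to show the call.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
sem = standard_error_of_mean(scores)
print(f"mean = {statistics.mean(scores)}, standard error = {sem:.4f}")
```

A larger sample shrinks the standard error, which is why sample means vary less than individual observations do.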

19. What is hypothesis testing?

Hypothesis testing is an act in statistics whereby an analyst tests an assumption

20. What is a one-tailed test?

21. What is a two-tailed test?

A two-tailed test, in statistics, is a method in which the critical area of a distribution is

22. State the Central Limit Theorem.

Confidence intervals measure the degree of uncertainty or certainty in a sampling

25. What is a point estimate?

A point estimate is defined as a calculation where a sample statistic is used to estimate

26. List the methods to calculate point estimates.

27. What are the properties of point estimators?

28. What is the drawback of point estimates? How can it be resolved?

29. What is an interval estimator?

1. Discuss populations and samples with suitable examples.

2. Discuss the different types of random sampling techniques.

3. Elaborate on the different types of non-probability based sampling techniques.

4. Illustrate hypothesis testing with an example.

5. Explain the procedure of z-test with an example.

8. A company wants to improve the quality of products by reducing defects and

9. Explain in detail about Estimation and the significance of point estimates.

10. Elaborate on Confidence interval and level of confidence.
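For the confidence interval questions above, a minimal sketch of one common case may help: a z-based interval for a population mean when the population standard deviation is assumed to be known. The function name and all numbers below are hypothetical; other settings (unknown standard deviation, small samples) would normally call for a t-based interval instead.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, level=0.95):
    """Return (low, high) for a z-based confidence interval of the mean.

    Assumes the population standard deviation sigma is known.
    """
    # Critical z value that leaves (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical figures: mean 50, sigma 10, sample of 100 observations.
low, high = z_confidence_interval(sample_mean=50.0, sigma=10.0, n=100, level=0.95)
print(f"95% confidence interval: ({low:.2f}, {high:.2f})")
```

Raising the level of confidence (for example from 0.95 to 0.99) widens the interval, while a larger sample narrows it.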
