Data Science (Unit 02) Notes

Uploaded by

ankushpandey900

Data Science Using Python (MSCPHY- 404)

Unit - 02
Definition and scope of Statistics

Statistics is the branch of mathematics concerned with collecting, organizing, analyzing,
interpreting, and presenting data. It provides tools and methods for drawing conclusions from
data, with probability theory supplying the mathematical foundation.

Definition

Statistics is usually divided into two main branches:

1. Descriptive Statistics: Methods for summarizing and describing the important features of
data. Examples include measures of central tendency (mean, median, mode) and measures
of variability (range, variance, standard deviation).
2. Inferential Statistics: Techniques that allow us to use data from a sample to make
generalizations or inferences about a population. This includes hypothesis testing,
confidence intervals, and regression analysis.
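
The descriptive measures named above can be sketched with Python's standard `statistics` module. The list of test scores is made up for illustration and is not taken from these notes.

```python
import statistics

# Hypothetical test scores, invented purely for illustration.
scores = [62, 70, 70, 75, 81, 88, 95]

# Measures of central tendency
mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequent value

# Measures of variability
score_range = max(scores) - min(scores)        # largest minus smallest value
sample_variance = statistics.variance(scores)  # sample variance (divides by n - 1)
sample_std = statistics.stdev(scores)          # square root of the sample variance

print("mean:", round(mean_score, 2))
print("median:", median_score, "mode:", mode_score, "range:", score_range)
print("variance:", round(sample_variance, 2), "std dev:", round(sample_std, 2))
```

Inferential statistics would go one step further and use such sample values to make statements about the wider population the scores were drawn from.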

Scope

The scope of statistics is broad and encompasses many areas:

• Data Collection: Designing surveys, experiments, and observational studies to gather data.
• Data Analysis: Using statistical techniques to examine data and discover patterns or
  relationships.
• Probability: Applying the principles of probability to predict and model random events.
• Inferential Methods: Making predictions or inferences about a population based on
  sample data.
• Statistical Modelling: Creating models to understand the relationships between variables
  and to forecast future trends.
• Decision Making: Using statistical information to inform and guide decisions in various
  fields such as business, healthcare, economics, and government.

Statistics is integral to many fields, including medicine, economics, social sciences, engineering,
and natural sciences. It helps in making informed decisions, improving processes, and advancing
knowledge in various disciplines.

Statistical Population

• Definition: A statistical population is the entire set of individuals, items, or data that is of
  interest in a particular study. It includes all possible observations that match a set of criteria.
• Example: If you're studying the average height of adult men in India, the population would
  be all adult men living in India.

Sample

• Definition: A sample is a subset of the population that is selected for the actual study. It
  should be representative of the population to ensure that the results can be generalized back
  to the population.
• Example: Continuing from the previous example, a sample would be a group of adult men
  in India whose heights are measured and analyzed.

Key Differences

1. Size: A population includes all members that fit the criteria, while a sample consists of
only a part of the population.
2. Purpose: The population represents the entire group of interest, and the sample is used to
make inferences about the population.
3. Representation: A sample should be a fair representation of the population to avoid bias
and ensure accurate results.

Importance of Sampling

• Practicality: Studying an entire population is often impractical or impossible. Sampling
  makes research feasible and cost-effective.
• Efficiency: Sampling allows for quicker data collection and analysis, making the research
  process more efficient.
• Accuracy: With proper sampling techniques, a sample can provide highly accurate
  estimates and insights about the population.

Effective sampling requires careful planning to ensure that the sample accurately reflects the
population, and various sampling methods (like random sampling, stratified sampling, etc.) are
used to achieve this.
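
As a rough sketch of the two sampling methods mentioned above, the following simulates a population and draws a simple random sample and a proportional stratified sample from it. The population values, the "urban"/"rural" strata, and the sample size of 200 are all assumptions made for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: simulated heights (cm) of 10,000 adults.
population = [random.gauss(165, 7) for _ in range(10_000)]

# Simple random sampling: every member has the same chance of selection.
sample = random.sample(population, k=200)

# Stratified sampling: sample each subgroup (stratum) in proportion to its size.
# The split into "urban" and "rural" is an invented example of two strata.
strata = {"urban": population[:6_000], "rural": population[6_000:]}
stratified = []
for name, members in strata.items():
    k = round(200 * len(members) / len(population))
    stratified.extend(random.sample(members, k))

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
stratified_mean = statistics.mean(stratified)

print(f"population mean:        {population_mean:.2f} cm")
print(f"random sample mean:     {sample_mean:.2f} cm")
print(f"stratified sample mean: {stratified_mean:.2f} cm")
```

Both sample means should land close to the population mean even though only 2% of the population was measured, which is the practical argument for sampling.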

Quantitative Data

• Definition: Quantitative data is numerical and can be measured and counted. It represents
  quantities and is often used for statistical analysis and mathematical calculations.
• Examples: Height, weight, temperature, number of students in a class, annual income, test
  scores.
• Characteristics:
  o Can be discrete (countable items) or continuous (measurable quantities).
  o Allows for mathematical operations like addition, subtraction, multiplication, and
    division.
  o Can be represented using graphs like bar charts, histograms, and scatter plots.

Qualitative Data

• Definition: Qualitative data is descriptive and conceptual. It represents qualities or
  characteristics that are not numerical and often involves categorizing or labeling attributes.
• Examples: Colors, gender, nationality, favorite foods, opinions, feedback, types of pets.
• Characteristics:
  o Describes properties or attributes.
  o Often categorized into different groups or themes.
  o Can be represented using graphs like pie charts and bar charts.

Comparison

• Nature:
  o Quantitative: Numerical and measurable.
  o Qualitative: Descriptive and categorical.
• Usage:
  o Quantitative: Used for statistical and mathematical analysis.
  o Qualitative: Used for understanding concepts, opinions, and experiences.
• Representation:
  o Quantitative: Numbers, graphs, and charts.
  o Qualitative: Words, themes, categories, and charts.

Understanding the difference between these two types of data is crucial for selecting appropriate
methods of analysis and representation. Quantitative data provides hard numbers that can be
statistically analyzed, while qualitative data offers rich, detailed insights into the subject matter.
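
The difference in how the two kinds of data are summarized can be sketched as follows: quantitative values are averaged, while qualitative values are counted by category. The survey records are hypothetical.

```python
from collections import Counter
import statistics

# Hypothetical survey records, invented for illustration.
heights_cm = [158.0, 162.5, 171.2, 165.0, 180.3]  # quantitative (continuous)
pets = ["dog", "cat", "dog", "fish", "dog"]       # qualitative (categorical)

# Quantitative data supports arithmetic summaries such as the mean.
average_height = statistics.mean(heights_cm)

# Qualitative data is summarized by counting how often each category occurs.
pet_counts = Counter(pets)
most_common_pet, frequency = pet_counts.most_common(1)[0]

print(f"average height: {average_height:.1f} cm")
print(f"most common pet: {most_common_pet} ({frequency} of {len(pets)})")
```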

Attributes

• Definition: Attributes are the specific qualities or characteristics that describe or
  differentiate entities in a dataset. They are often used in qualitative data.
• Examples:
  o Color (red, blue, green)
  o Gender (male, female, non-binary)
  o Brand (Nike, Adidas, Puma)

Variables

• Definition: Variables are measurable quantities that can take on different values. They are
  often used in quantitative data.
• Examples:
  o Height (in centimeters)
  o Weight (in kilograms)
  o Age (in years)

Types of Variables

The word "variable" is also used more broadly for any recorded characteristic, including
attributes. In that broader sense, variables can be categorized into several types:

1. Categorical (Qualitative) Variables:
  o Nominal: Categories without a specific order (e.g., colors, brands).
  o Ordinal: Categories with a specific order (e.g., rankings, satisfaction levels).
2. Numerical (Quantitative) Variables:
  o Discrete: Countable values (e.g., number of students, number of cars).
  o Continuous: Measurable values that can take on any value within a range (e.g.,
    height, weight, temperature).
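
The classification above can be illustrated with a small helper that guesses a variable's type from how its values are stored. The function name `classify_variable` and its rules are assumptions made for this sketch. It is only a heuristic: an age recorded in whole years is stored as an integer but measures a continuous quantity, so real datasets need domain knowledge as well.

```python
def classify_variable(values, ordered_levels=None):
    """Guess the variable type from the Python types of its values.

    `ordered_levels` is a hypothetical optional argument: pass the category
    order (lowest to highest) to mark a text variable as ordinal.
    """
    if all(isinstance(v, str) for v in values):
        return "ordinal" if ordered_levels else "nominal"
    if all(isinstance(v, int) for v in values):
        return "discrete"
    if all(isinstance(v, (int, float)) for v in values):
        return "continuous"
    raise TypeError("mixed or unsupported value types")


# Illustrative columns, invented for this example.
print(classify_variable(["red", "blue", "green"]))
print(classify_variable(["low", "high"], ordered_levels=["low", "middle", "high"]))
print(classify_variable([30, 28, 35, 33]))
print(classify_variable([158.0, 162.5, 171.2]))
```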

Key Differences

• Nature:
  o Attributes: Qualitative, descriptive.
  o Variables: Quantitative, measurable.
• Representation:
  o Attributes: Categories or labels.
  o Variables: Numbers or quantities.

Understanding attributes and variables is crucial in statistics as they determine the type of data
analysis and statistical methods to be used. If you're looking at a dataset, recognizing whether you
have attributes or variables (or a mix of both) will guide you on how to process and analyze your
data effectively.

1. Nominal Scale

• Definition: The nominal scale classifies data into distinct categories in which no order or
  ranking can be inferred.
• Characteristics:
  o Categories are mutually exclusive.
  o Categories do not have a logical order.
• Examples:
  o Gender (male, female, non-binary)
  o Blood type (A, B, AB, O)
  o Colors (red, blue, green)

2. Ordinal Scale

• Definition: The ordinal scale arranges data into categories that have a meaningful order or
  ranking, but the intervals between the categories are not necessarily equal.
• Characteristics:
  o Categories are ordered.
  o The differences between the categories are not necessarily uniform or measurable.
• Examples:
  o Socioeconomic status (low, middle, high)
  o Education level (high school, bachelor's, master's, PhD)
  o Customer satisfaction ratings (satisfied, neutral, dissatisfied)

3. Interval Scale

• Definition: The interval scale measures data where the intervals between values are
  meaningful and equal, but there is no true zero point.
• Characteristics:
  o Differences between values are consistent.
  o No true zero point (zero does not mean absence of the quantity).
• Examples:
  o Temperature in Celsius or Fahrenheit (0°C does not mean 'no temperature')
  o Calendar years (2023, 2024, etc.)

4. Ratio Scale

• Definition: The ratio scale has all the characteristics of the interval scale, with the addition
  of a true zero point, which means zero indicates the absence of the quantity being measured.
• Characteristics:
  o Equal intervals between values.
  o A true zero point exists.
• Examples:
  o Weight (0 kg means no weight)
  o Height (0 cm means no height)
  o Income (0 dollars means no income)

Summary

| Scale    | Description                               | Examples                                                          |
|----------|-------------------------------------------|-------------------------------------------------------------------|
| Nominal  | Categories without a specific order       | Gender, Blood type, Colors                                        |
| Ordinal  | Ordered categories without equal intervals | Socioeconomic status, Education level, Customer satisfaction ratings |
| Interval | Equal intervals without a true zero       | Temperature, Calendar years                                       |
| Ratio    | Equal intervals with a true zero          | Weight, Height, Income                                            |

Each scale of measurement provides different levels of information and requires specific statistical
techniques for analysis. Understanding these scales helps in selecting appropriate statistical
methods and interpreting data accurately.
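
To make the point about scale-specific techniques concrete, the sketch below applies one meaningful summary to each scale. All data values are invented. The kelvin conversion is not part of these notes. It is included only to show why ratios of Celsius temperatures are not meaningful.

```python
import statistics

# Nominal: only counting and the mode are meaningful.
blood_types = ["A", "O", "B", "O", "AB", "O"]
nominal_mode = statistics.mode(blood_types)

# Ordinal: order is meaningful, so a median category can be found,
# but the labels themselves cannot be averaged.
levels = ["dissatisfied", "neutral", "satisfied"]  # lowest to highest
ratings = ["satisfied", "neutral", "satisfied", "dissatisfied", "neutral"]
ranks = [levels.index(r) for r in ratings]
ordinal_median = levels[statistics.median_low(ranks)]

# Interval: differences are meaningful, ratios are not.
temp_morning_c, temp_noon_c = 10.0, 20.0
difference_c = temp_noon_c - temp_morning_c  # "10 degrees warmer" is valid
# 20 °C is not "twice as hot" as 10 °C: on the kelvin scale, which has a
# true zero, the ratio of the same two temperatures is close to 1.
ratio_in_kelvin = (temp_noon_c + 273.15) / (temp_morning_c + 273.15)

# Ratio: a true zero makes ratios meaningful.
weight_a_kg, weight_b_kg = 40.0, 80.0
weight_ratio = weight_b_kg / weight_a_kg  # b is twice as heavy as a

print("nominal mode:", nominal_mode)
print("ordinal median:", ordinal_median)
print("interval difference:", difference_c, "°C")
print("kelvin ratio:", round(ratio_in_kelvin, 3))
print("weight ratio:", weight_ratio)
```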

Tabular Presentation

• Definition: Organizing data into tables, which include rows and columns.
• Advantages:
  o Easy to compare data points.
  o Suitable for detailed and complex data.
• Example: A table showing the number of students in different classes.

Example Table

| Class | Number of Students |
|-------|--------------------|
| 1st   | 30                 |
| 2nd   | 28                 |
| 3rd   | 35                 |
| 4th   | 33                 |
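
The example table can also be produced programmatically. This is a minimal sketch using only string formatting. The column widths and the added "Total" row are choices made for this illustration.

```python
# Data from the example table above.
students_per_class = {"1st": 30, "2nd": 28, "3rd": 35, "4th": 33}

# Build the table row by row: left-align the class, right-align the count.
header = f"{'Class':<6} | {'Number of Students':>18}"
lines = [header, "-" * len(header)]
for class_name, count in students_per_class.items():
    lines.append(f"{class_name:<6} | {count:>18}")

# A total row is a common addition that tables make easy to read off.
total_students = sum(students_per_class.values())
lines.append(f"{'Total':<6} | {total_students:>18}")

print("\n".join(lines))
```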

Graphical Presentation

• Definition: Using visual aids like charts, graphs, and diagrams to represent data.
• Advantages:
  o Quickly conveys trends and patterns.
  o Engages the audience more effectively.
• Example Types:
  o Bar Chart: Good for comparing different categories.
  o Pie Chart: Ideal for showing proportions.
  o Histogram: Useful for displaying frequency distribution of data.
  o Line Graph: Great for showing trends over time.
  o Scatter Plot: Effective for showing relationships between two variables.

Example Graphs

Bar Chart:

| Categories | Values |
|--------------|----------|
| Category A | ###### |
| Category B | ### |
| Category C | ######## |
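
A text bar chart like the one above can be generated from the underlying values. In this sketch the values 6, 3, and 8 are read off the number of `#` symbols in each row, and the helper name `text_bar_chart` is an assumption.

```python
def text_bar_chart(data, symbol="#"):
    """Return one line per category, with a bar whose length equals the value."""
    width = max(len(label) for label in data)  # pad labels to a common width
    return [f"{label:<{width}} | {symbol * value}" for label, value in data.items()]


# Values inferred from the bar lengths in the chart above.
values = {"Category A": 6, "Category B": 3, "Category C": 8}

for row in text_bar_chart(values):
    print(row)
```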

Pie Chart:

Category A: 40%
Category B: 20%
Category C: 40%
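
The percentages in a pie chart are each category's share of the total. The sketch below derives the shares listed above from hypothetical raw counts. The counts 8, 4, and 8 are invented so that the result matches the 40%, 20%, 40% split.

```python
# Hypothetical raw counts, chosen to reproduce the percentages above.
counts = {"Category A": 8, "Category B": 4, "Category C": 8}

# Each slice of the pie is the category's count divided by the total.
total = sum(counts.values())
percentages = {label: 100 * n / total for label, n in counts.items()}

for label, share in percentages.items():
    print(f"{label}: {share:.0f}%")
```

If matplotlib is installed, the same counts could be passed to `matplotlib.pyplot.pie` to draw the chart itself.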

In summary, the choice between tabular and graphical presentation depends on the nature of the
data and the audience. Tables are excellent for detailed comparisons, while graphs are ideal for
visualizing trends, patterns, and relationships. Both methods can complement each other to provide
a comprehensive understanding of the data.
