Unit-2 Solution

fods

Uploaded by

AntonyManickaraj

PART-A

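1. Will treating categorical variables as continuous variables result in a better predictive model?
Justify your answer. (APR-2024)
Generally, no. Treating categorical variables as continuous variables is not advisable and often leads to
misleading results:
a. Nature of Categorical Variables: category codes (e.g., 1 = red, 2 = blue, 3 = green) are labels, not measured quantities.
b. Misinterpretation of Relationships: the model assumes an order and equal spacing between categories that do not exist.
c. Loss of Information: a single coefficient on the code replaces a separate effect for each category.
d. Model Performance: the spurious order and spacing usually reduce predictive accuracy rather than improve it.
e. Statistical Assumptions: methods such as linear regression assume their inputs lie on a meaningful numeric scale.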
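To make point (b) concrete, here is a minimal sketch in plain Python (no libraries assumed) that contrasts arbitrary integer codes with one-hot encoding. The colour values are hypothetical. With integer codes a linear model would treat "green" (3) as being as far from "blue" (2) as "blue" is from "red" (1); one-hot indicators impose no such order or spacing.

```python
# Hypothetical categorical feature, used only for illustration.
colours = ["red", "blue", "green", "blue"]

# Treating the category as continuous: arbitrary integer codes.
codes = {"red": 1, "blue": 2, "green": 3}
as_numbers = [codes[c] for c in colours]
# A linear model would now assume green - blue == blue - red, which has no meaning.


def one_hot(values):
    """Return (categories, rows): one 0/1 indicator column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return categories, rows


categories, encoded = one_hot(colours)
print(categories)  # ['blue', 'green', 'red']
for colour, row in zip(colours, encoded):
    print(colour, row)
```

In a regression that also has an intercept, one indicator column is usually dropped to avoid perfect collinearity among the indicators.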

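2. Issue: Feeding data whose variables are correlated with one another is not a good statistical practice,
since it gives multiple weight to the same information.
Solution: Correlation Analysis
Show how such issues are prevented by the correlation analysis technique. Justify with a small instance
dataset. (Apr-2024)
Nov-2023
Correlation analysis computes a correlation coefficient (such as Pearson's r) for every pair of input variables; a pair with |r| close to 1 carries largely redundant information (multicollinearity). To prevent issues caused by multicollinearity, you might: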
1. Remove Highly Correlated Variables:
• If two variables are highly correlated, you can remove one. For instance, you might
choose to keep only "Size" and exclude "Bedrooms" if they are providing redundant
information.
2. Combine Correlated Variables:
• Create a new variable that combines the information. For example, a "Size per
Bedroom" variable might provide a new perspective.
3. Principal Component Analysis (PCA):
• PCA can transform correlated variables into a set of linearly uncorrelated
components.
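The correlation-analysis step can be sketched in plain Python. The house values below are hypothetical, made up only to show the mechanics, and the 0.8 cut-off for "highly correlated" is an assumed threshold rather than a fixed rule. The sketch computes Pearson's r for every pair of variables, flags pairs above the threshold, and drops the second variable of each flagged pair (option 1 above).

```python
import math
from itertools import combinations

# Hypothetical house data, invented only to illustrate the technique.
houses = {
    "Size": [1000, 1500, 1800, 2400, 3000],  # square feet
    "Bedrooms": [2, 3, 3, 4, 5],
    "Age": [10, 5, 20, 15, 8],               # years
}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def highly_correlated_pairs(data, threshold=0.8):
    """Return (a, b, r) for every pair of columns with |r| >= threshold (assumed cut-off)."""
    pairs = []
    for a, b in combinations(data, 2):
        r = pearson_r(data[a], data[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


flagged = highly_correlated_pairs(houses)
for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r:.3f}")

# Keep the first variable of each flagged pair and drop the second.
to_drop = {b for _, b, _ in flagged}
reduced = {k: v for k, v in houses.items() if k not in to_drop}
print("Variables kept:", list(reduced))
```

With these made-up values only Size and Bedrooms exceed the threshold, so Bedrooms is dropped and Size and Age remain as predictors.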

Example Solution
Let’s simplify and fit a linear regression model to predict Price using Size and Age, after deciding to
exclude Bedrooms due to its high correlation with Size:
Create the Regression Model:
3. Explain the types of data.
a. Quantitative Data
b. Qualitative Data
c. Text Data
d. Time Series Data
e. Spatial Data
f. Binary Data
g. Structured Data
h. Unstructured Data
4. Define median with an example.
The median is a measure of central tendency that represents the middle value in a dataset when it is
ordered from smallest to largest. If the dataset has an odd number of observations, the median is the
middle one. If the dataset has an even number of observations, the median is the average of the two
middle values.

Example 1: Odd Number of Observations

Consider the dataset: 3, 7, 5, 9, 1
1. First, order the numbers: 1, 3, 5, 7, 9
2. Since there are 5 numbers (an odd number), the median is the middle one, which is 5.
So, the median of this dataset is 5.
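The ordering-then-middle rule can be written as a short Python function. The even-length list in the second call is a hypothetical extra example showing the "average of the two middle values" case.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 7, 5, 9, 1]))  # odd n: sorted 1, 3, 5, 7, 9 -> 5
print(median([3, 7, 5, 9]))     # even n: sorted 3, 5, 7, 9 -> (5 + 7) / 2 = 6.0
```

Python's standard library also provides `statistics.median`, which follows the same rule.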
Apr-2023
5. Compare and contrast qualitative data and quantitative data with an example.
Qualitative Data
Nature:
• Descriptive: Qualitative data is non-numeric and describes qualities or characteristics.
• Subjective: It often involves subjective interpretation of data.
• Exploratory: Typically used to explore concepts, understand experiences, and gather in-depth insights.
Collection Methods:
• Interviews
• Focus Groups

Quantitative Data
Nature:
• Numeric: Quantitative data is numeric and can be measured.
• Objective: It tends to be more objective and can be statistically analyzed.
• Hypothesis Testing: Often used to test hypotheses and make predictions.
Collection Methods:
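• Surveys/Questionnaires
• Experiments
• Existing Data
Example: In a customer study, written comments such as "the checkout page was confusing" are qualitative data, while the number of minutes each customer spent at checkout is quantitative data.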
6. List the differences between a discrete variable and a continuous variable with an
example. (Nov-22)
Discrete Variables
Definition: Discrete variables are those that can only take on a countable number of distinct values.
They often represent counts or categories.
Characteristics:
1. Countable: You can list or count all possible values.
2. No Intermediate Values: There are no possible values between two adjacent values.
3. Examples: Number of students in a class, number of cars in a parking lot.
Example:
• Number of books

Continuous Variables
Definition: Continuous variables can take on an infinite number of values within a given range.
They are usually measurements and can be divided into smaller and smaller parts.
Characteristics:
1. Uncountable: There are infinitely many possible values within a given range.
2. Intermediate Values: There are possible values between any two adjacent values.
3. Examples: Height, weight, temperature.
Example:
• Height
Nov-2022
7. Classify the list of data below into their types: a) ethnic group b) age c) family size
d) academic major e) sexual preference f) IQ score g) net worth (dollars) h) third-place finish
i) gender j) temperature
And write a brief note on them.
a) Ethnic group - Categorical (nominal)
b) Age - Quantitative (continuous)
c) Family size - Quantitative (discrete)
d) Academic major - Categorical (nominal)
e) Sexual preference - Categorical (nominal)
f) IQ score - Quantitative (continuous)
g) Net worth (dollars) - Quantitative (continuous)
h) Third-place finish - Categorical (ordinal)
i) Gender - Categorical (nominal)
j) Temperature - Quantitative (continuous)
8. What is a percentile rank? Give an example.
A percentile rank is a statistical measure used to interpret a data point's position
within a dataset relative to the other data points. Specifically, it is the percentage of scores in the
dataset that fall below a particular score.
Eg: If 1000 students took a test and your percentile rank is 75, then 750 students (75%) scored below you,
and the remaining 250 students, including you, scored at or above your score.
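The definition above ("percentage of scores that fall below") translates directly into a small function. The 1,000 scores here are hypothetical, chosen as the distinct values 1 to 1000 so the example numbers are easy to check; note that some textbooks instead count half of any tied scores, which this sketch does not do.

```python
def percentile_rank(scores, x):
    """Percentage of scores in the data set that fall strictly below x."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)


# Hypothetical class of 1,000 students with distinct scores 1..1000.
scores = list(range(1, 1001))
print(percentile_rank(scores, 751))  # 750 of 1000 scores are below 751 -> 75.0
```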
Part-B
1. (a) i) Indicate whether each of the following distributions is positively or negatively skewed.
The distribution of
(1) Incomes of taxpayers has a mean of $48,000 and a median of $43,000.
To determine the skewness of the distribution, compare the mean and median:
• Positively skewed (right-skewed): The mean is greater than the median.
• Negatively skewed (left-skewed): The mean is less than the median.
Given:
• Mean income = $48,000
• Median income = $43,000
Since the mean ($48,000) is greater than the median ($43,000), the distribution of incomes is
positively skewed (right-skewed). This means that there are some high-income outliers pulling the
mean to the right, creating a longer tail on the higher end of the distribution.
(2) GPAs for all students at some college have a mean of 3.01 and a median of 3.20.
• Mean GPA = 3.01
• Median GPA = 3.20
Since the mean (3.01) is less than the median (3.20), the distribution of GPAs is negatively skewed
(left-skewed). This indicates that there are some low GPAs pulling the mean down, creating a
longer tail on the lower end of the distribution.
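The mean-versus-median comparison used in both parts can be captured in a tiny helper. This is a sketch of the rule of thumb only; the optional `tolerance` parameter is an assumption added for treating near-equal values as symmetric, and the rule can mislead for multimodal distributions.

```python
def skew_direction(mean, median, tolerance=0.0):
    """Classify skewness with the mean-vs-median rule of thumb."""
    if mean - median > tolerance:
        return "positively skewed (right-skewed)"
    if median - mean > tolerance:
        return "negatively skewed (left-skewed)"
    return "roughly symmetric"


print(skew_direction(48_000, 43_000))  # incomes: mean > median
print(skew_direction(3.01, 3.20))      # GPAs: mean < median
```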
ii) During their swim through a water maze, 15 laboratory rats made the following number of errors
(blind alleyway entrances): 2, 17, 5, 3, 28, 7, 5, 8, 5, 6, 2, 12, 10, 4, 3.
(1) Find the mode, median and mean for these data.

To find the mode, median, and mean for the given data set, follow these steps:

Data Set
2, 17, 5, 3, 28, 7, 5, 8, 5, 6, 2, 12, 10, 4, 3
1. Mode
The mode is the number that appears most frequently in the data set.
• Frequency of each number:
• 2: 2 times
• 3: 2 times
• 4: 1 time
• 5: 3 times
• 6: 1 time
• 7: 1 time
• 8: 1 time
• 10: 1 time
• 12: 1 time
• 17: 1 time
• 28: 1 time
The number 5 appears most frequently (3 times), so the mode is 5.
2. Median
The median is the middle value when the data is ordered from smallest to largest. If there is an even
number of observations, the median is the average of the two middle numbers.
• First, sort the data: 2, 2, 3, 3, 4, 5, 5, 5, 6, 7, 8, 10, 12, 17, 28
• There are 15 data points (an odd number), so the median is the 8th value in this sorted list.
The median is 5.
3. Mean
To find the mean, calculate the sum of all the values and then divide by the number of data points.
Sum of the data: 2 + 17 + 5 + 3 + 28 + 7 + 5 + 8 + 5 + 6 + 2 + 12 + 10 + 4 + 3 = 117
Number of data points = 15
Mean = Total sum / Number of data points = 117 / 15 = 7.8

So, the mean is 7.8.

Summary
• Mode: 5
• Median: 5
• Mean: 7.8
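The three measures can be double-checked with Python's standard `statistics` module, which also confirms the sum of 117.

```python
import statistics

errors = [2, 17, 5, 3, 28, 7, 5, 8, 5, 6, 2, 12, 10, 4, 3]

print("Sum:", sum(errors))                   # 117
print("Mode:", statistics.mode(errors))      # 5 appears three times
print("Median:", statistics.median(errors))  # 8th of 15 sorted values -> 5
print("Mean:", statistics.mean(errors))      # 117 / 15 -> 7.8
```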
(2) Without constructing a frequency distribution or graph, would it be possible to characterize the
shape of this distribution as balanced, positively skewed, or negatively skewed?
To characterize the shape of the distribution without constructing a frequency distribution or graph,
you can use the relationship between the mean, median, and mode to determine the skewness.
Here’s how:
1. Calculate the Mode, Median, and Mean:
• Mode: 5
• Median: 5
• Mean: 7.8
2. Determine Skewness:
• Positively Skewed (Right-Skewed): Mean > Median
• Negatively Skewed (Left-Skewed): Mean < Median
• Balanced (Symmetrical): Mean ≈ Median
In this case:
• Mean = 7.8
• Median = 5
The mean (7.8) is greater than the median (5).
This indicates that the distribution is positively skewed (right-skewed). This is because the
mean is pulled in the direction of the higher values, suggesting that there are some high
outliers (such as the number 28) that are stretching the distribution to the right.

Summary
Based on the mean and median comparison, the distribution of errors is positively skewed (right-
skewed).
(b) i) Assume that SAT math scores approximate a normal curve with a mean of 500 and a standard
deviation of 100.
Sketch a normal curve and shade in the target area(s) described by each of the following statements:
* More than 570
* Less than 515
* Between 520 and 540
* Convert to z scores and find the target areas specific to the above values.

To address the problem of shading areas under the normal curve based on SAT math scores, we’ll
start by sketching the normal distribution and then use z-scores to find the specific areas.

1. Sketching the Normal Curve

The SAT math scores are normally distributed with:
• Mean (μ) = 500
• Standard Deviation (σ) = 100
Here’s a rough sketch of the normal curve:
Between 520 and 540: Area between z = 0.2 and z = 0.4 (approx. 7.61%).
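The z-score conversions and target areas for all three statements can be checked with Python's `statistics.NormalDist` (available from Python 3.8). Exact values differ from a four-digit z-table only in the last decimal place.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)


def z(x, mu=500, sigma=100):
    """Convert a raw SAT score to a z score."""
    return (x - mu) / sigma


for x in (570, 515, 520, 540):
    print(f"X = {x}: z = {z(x):.2f}")

above_570 = 1 - sat.cdf(570)                   # area to the right of z = 0.70
below_515 = sat.cdf(515)                       # area to the left of z = 0.15
between_520_540 = sat.cdf(540) - sat.cdf(520)  # area between z = 0.20 and z = 0.40

print(f"More than 570:       {above_570:.4f}")
print(f"Less than 515:       {below_515:.4f}")
print(f"Between 520 and 540: {between_520_540:.4f}")
```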

ii) Assume that the burning times of electric light bulbs approximate a normal curve with a mean of
1200 hours and a standard deviation of 120 hours. If a large number of new lights are installed at the
same time (possibly along a newly opened freeway), at what time will:
• 1 percent fail?
• 50 percent fail?
• 95 percent fail?
To determine the time at which a certain percentage of light bulbs will fail, given that the burning
times follow a normal distribution with a mean (μ) of 1200 hours and a standard deviation (σ) of
120 hours, we use the properties of the normal distribution. Here's how we find the
times at which 1 percent, 50 percent, and 95 percent of the light bulbs will fail:
1. 1 Percent Failure Time: This is the time below which 1% of the bulbs will fail. In terms of
the normal distribution, this corresponds to the 1st percentile.
2. 50 Percent Failure Time: This is the median of the distribution, which for a normal
distribution is the mean. Thus, 50% of the bulbs will fail by this time.
3. 95 Percent Failure Time: This is the time below which 95% of the bulbs will fail. In terms
of the normal distribution, this corresponds to the 95th percentile.
We’ll use the Z-scores associated with these percentiles to find the actual times.

1. 1 Percent Failure Time

The Z-score for the 1st percentile (which is the 0.01 quantile) is approximately -2.33. To find the
corresponding time:
X = μ + Z⋅σ
Substitute the values:
X = 1200 + (−2.33)(120) = 1200 − 279.6 = 920.4

So, approximately 1 percent of the light bulbs will fail by 920.4 hours.

2. 50 Percent Failure Time

The 50th percentile is the mean of the distribution.
So, 50 percent of the light bulbs will fail by 1200 hours.

3. 95 Percent Failure Time

The Z-score for the 95th percentile (which is the 0.95 quantile) is approximately 1.645. To find the
corresponding time:
X = μ + Z⋅σ
Substitute the values:
X = 1200 + (1.645)(120) = 1200 + 197.4 = 1397.4

So, approximately 95 percent of the light bulbs will fail by 1397.4 hours.

Summary
• 1 percent of bulbs fail by approximately 920.4 hours.
• 50 percent of bulbs fail by 1200 hours.
• 95 percent of bulbs fail by approximately 1397.4 hours.
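The same three failure times can be obtained directly from the inverse cumulative distribution, here with Python's `statistics.NormalDist.inv_cdf`. Because the exact z for the 1st percentile is about −2.326 rather than the rounded −2.33, the exact answer is about 920.8 hours instead of 920.4; the other two agree with the table-based values.

```python
from statistics import NormalDist

bulbs = NormalDist(mu=1200, sigma=120)

# inv_cdf(p) returns the burning time X below which a fraction p of bulbs fail.
failure_times = {p: bulbs.inv_cdf(p) for p in (0.01, 0.50, 0.95)}

for p, x in failure_times.items():
    z = (x - bulbs.mean) / bulbs.stdev
    print(f"{p:.0%} fail by {x:.1f} hours (z = {z:.3f})")
```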
3. a) i) Explain the normal curve and z-score.
ii) Using the standard normal curve table, find the proportion of the total area identified with the
following statements:
1) Above a z score of 1.80
2) Between the mean and a z score of 1.65
3) Between z scores of 0 and -1.96

1. Normal Curve and Z-Score

Normal Curve:
• The normal curve, or normal distribution, is a continuous probability distribution that is
symmetric about its mean. It has a bell-shaped curve.
• The mean, median, and mode of the distribution are all located at the center of the curve.
• The total area under the curve equals 1 (or 100% when expressed as a percentage). This area
represents the total probability of all possible outcomes.
• The curve is characterized by two parameters: the mean (μ) and the standard deviation (σ).
Z-Score:
• The z-score (or standard score) measures how many standard deviations an individual data
point is from the mean of the distribution.
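• It is calculated using the formula:
z = (X − μ) / σ
where:
• X is the data point,
• μ is the mean of the distribution,
• σ is the standard deviation of the distribution.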
• A z-score of 0 corresponds to the mean, positive z-scores correspond to values above the
mean, and negative z-scores correspond to values below the mean.

2. Finding Proportions Using the Standard Normal Curve Table

The standard normal curve table (or z-table) provides the area (or probability) to the left of a given
z-score in the standard normal distribution. To find the area to the right of a z-score, we subtract the
value from 1.
Let’s use the z-table to find the following proportions:
1. Above a z-score of 1.80:
• Look up the z-score of 1.80 in the z-table. It typically gives the cumulative
probability (area) to the left of 1.80.
• The cumulative probability for z = 1.80 is approximately 0.9641.
• The proportion of the area above z = 1.80 is:
1 − 0.9641 = 0.0359
So, about 3.59% of the area is above a z-score of 1.80.
2. Between the mean and a z-score of 1.65:
• The mean corresponds to a z-score of 0, so we need to find the cumulative
probability for z = 1.65.
• From the z-table, the cumulative probability for z = 1.65 is approximately 0.9505.
• The area between the mean (z = 0) and z = 1.65 is:
0.9505 − 0.5 = 0.4505
So, about 45.05% of the area is between the mean and a z-score of 1.65.
3. Between z-scores of 0 and -1.96:
• First, find the cumulative probability for z = -1.96.
• For z = -1.96, the cumulative probability is approximately 0.0250.
• Since we are interested in the area between z = 0 and z = -1.96, and the cumulative
probability for z = 0 is 0.5, the area between these z-scores is:
0.5 − 0.0250 = 0.4750
So, about 47.50% of the area is between z-scores of 0 and -1.96.
These proportions help us understand the distribution of data relative to the mean in a normal
distribution.
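The three table look-ups can be reproduced with the standard normal distribution in Python's `statistics` module, where `cdf` plays the role of the z-table (area to the left of z).

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

above_1_80 = 1 - std_normal.cdf(1.80)                       # right tail beyond z = 1.80
mean_to_1_65 = std_normal.cdf(1.65) - std_normal.cdf(0)      # from the mean up to z = 1.65
neg_1_96_to_0 = std_normal.cdf(0) - std_normal.cdf(-1.96)    # from z = -1.96 up to the mean

print(f"Above z = 1.80:          {above_1_80:.4f}")
print(f"Between 0 and z = 1.65:  {mean_to_1_65:.4f}")
print(f"Between z = -1.96 and 0: {neg_1_96_to_0:.4f}")
```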
b) i) Describe the types of variables.
ii) Suppose a hospital tested the age and body fat data for randomly selected adults with the
following result:
Age 23 27 39 49 50 52 54 56 57 58 60
%Fat 9.5 17.8 31.4 27.2 31.2 34.6 42.5 33.4 30.2 34.1 41
Draw the boxplot for age.

Types of Variables
In statistics, variables are typically categorized into different types based on their nature and the
kind of data they represent. The most common types are:
1. Qualitative (Categorical) Variables:
• Nominal: These variables represent categories without a specific order. Examples
include gender, color, or type of fruit.
• Ordinal: These variables represent categories with a meaningful order or ranking,
but the distances between categories are not necessarily equal. Examples include
education level (high school, bachelor's, master's, etc.) or satisfaction ratings (poor,
fair, good, excellent).
2. Quantitative (Numerical) Variables:
• Discrete: These variables represent countable quantities and often involve integers.
Examples include the number of children in a family or the number of cars in a
parking lot.
• Continuous: These variables represent measurable quantities and can take on an
infinite number of values within a range. Examples include height, weight, and age.
To draw the boxplot for the age data, follow these steps:

1. Organize the Data

First, let's arrange the age data in ascending order:
Age: 23, 27, 39, 49, 50, 52, 54, 56, 57, 58, 60

2. Calculate Key Statistics

1. Median (Q2)
The median is the middle value of the sorted data. Since there are 11 data points (an odd number),
the median is the 6th value:
Median (Q2)=52

2. Lower Quartile (Q1)

The lower quartile is the median of the lower half of the data, excluding the overall median. The
lower half of the data is:
23, 27, 39, 49, 50
The median of this subset (5 values) is the 3rd value:
Lower Quartile (Q1)=39
3. Upper Quartile (Q3)
The upper quartile is the median of the upper half of the data, excluding the overall median. The
upper half of the data is:
54, 56, 57, 58, 60
The median of this subset (5 values) is the 3rd value:
Upper Quartile (Q3)=57
4. Interquartile Range (IQR)
The IQR is calculated as:
IQR=Q3−Q1=57−39=18
5. Whiskers
To determine the whiskers, calculate:
• Lower whisker: the smallest data point at or above Q1 − 1.5×IQR.
• Q1 − 1.5×IQR = 39 − 1.5×18 = 39 − 27 = 12
• The smallest value is 23, which is greater than 12, so the lower whisker is 23.
• Upper whisker: the largest data point at or below Q3 + 1.5×IQR.
• Q3 + 1.5×IQR = 57 + 1.5×18 = 57 + 27 = 84
• The largest value is 60, which is less than 84, so the upper whisker is 60.

3. Draw the Boxplot

Boxplot Components:
• Box: Draw a box from Q1 (39) to Q3 (57).
• Median Line: Draw a line inside the box at the median (52).
• Whiskers: Extend the whiskers from the smallest data point (23) to Q1 and from Q3
to the largest data point (60).

In this boxplot:
• The box starts at 39 and ends at 57.
• The median (52) is marked inside the box.
• The whiskers extend from 23 to 39 on the lower side and from 57 to 60 on the upper side.
This boxplot provides a visual summary of the distribution of ages in the dataset.
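The boxplot statistics can be computed with a short Python sketch that uses the same quartile rule as above (median of each half, excluding the overall median when n is odd). Plotting libraries such as matplotlib compute quartiles by linear interpolation by default, so a box drawn by a library may differ slightly (for these ages, Q1 = 44 and Q3 = 56.5 under linear interpolation).

```python
def median(values):
    """Median of an already sorted list."""
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2


def boxplot_stats(data):
    """Quartiles, IQR, 1.5*IQR fences, whiskers and outliers (exclude-the-median rule)."""
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half + 1:] if n % 2 else s[half:]
    q1, q2, q3 = median(lower), median(s), median(upper)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in s if low_fence <= v <= high_fence]
    return {
        "Q1": q1, "median": q2, "Q3": q3, "IQR": iqr,
        "fences": (low_fence, high_fence),
        "whiskers": (min(inside), max(inside)),
        "outliers": [v for v in s if v < low_fence or v > high_fence],
    }


ages = [23, 27, 39, 49, 50, 52, 54, 56, 57, 58, 60]
stats = boxplot_stats(ages)
for key, value in stats.items():
    print(f"{key}: {value}")
```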
4. a) i) What is a frequency distribution? Customers who have purchased a particular product
rated the usability of the product on a 10-point scale, ranging from 1 (poor) to 10
(excellent), as follows:
3 7 2 7 8
3 1 4 10 3
2 5 3 5 8
9 7 6 3 7
8 9 7 3 6
Construct a frequency distribution for the above data.

A frequency distribution is a way to organize and summarize a set of data by showing how
often each value or range of values occurs. It provides a clear picture of the data's
distribution, making it easier to analyze and interpret.

To construct a frequency distribution for the usability ratings provided, we need to tally how many
times each rating appears in the dataset. Here’s the step-by-step process:
1. List the Ratings: The ratings are:
3, 7, 2, 7, 8, 3, 1, 4, 10, 3, 2, 5, 3, 5, 8, 9, 7, 6, 3, 7, 8, 9, 7, 3, 6
2. Create a Table to Count Frequencies: We’ll count how many times each rating from 1 to
10 appears in the list.

Rating Frequency
10 1
9 2
8 3
7 5
6 2
5 2
4 1
3 6
2 2
1 1
Total 25

Verify the Count:

• Rating 1 appears once.
• Rating 2 appears twice.
• Rating 3 appears six times.
• Rating 4 appears once.
• Rating 5 appears twice.
• Rating 6 appears twice.
• Rating 7 appears five times.
• Rating 8 appears three times.
• Rating 9 appears twice.
• Rating 10 appears once.
The frequencies add up to 25, the number of ratings. Thus, the frequency distribution for the usability ratings is summarized in the table above.
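The tally can be automated with `collections.Counter` from Python's standard library, which also guards against counting mistakes such as totals that do not match the 25 ratings.

```python
from collections import Counter

ratings = [3, 7, 2, 7, 8,
           3, 1, 4, 10, 3,
           2, 5, 3, 5, 8,
           9, 7, 6, 3, 7,
           8, 9, 7, 3, 6]

freq = Counter(ratings)
print("Rating  Frequency")
for rating in range(10, 0, -1):  # list from 10 (excellent) down to 1 (poor)
    print(f"{rating:>6}  {freq[rating]:>9}")
print(f"{'Total':>6}  {sum(freq.values()):>9}")
```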
ii) What is a relative frequency distribution? The GRE scores for a group of graduate school
applicants are distributed as follows:
GRE Score Frequency
725-749 1
700-724 3
675-699 14
650-674 30
625-649 34
600-624 42
575-599 30
550-574 27
525-549 13
500-524 4
475-499 3
Total 200

Explain the procedure to convert a frequency distribution into a relative frequency distribution,
and convert the data presented in the above table to a relative frequency distribution. Do not
round numbers to two digits to the right of the decimal point.

To convert a frequency distribution into a relative frequency distribution, follow these steps:
1. Calculate the Total Number of Observations: This is the sum of all frequencies. Here, it is
given as 200.
2. Calculate Relative Frequency for Each Interval: Divide the frequency of each interval by
the total number of observations.
3. Present the Results: Create a table where each frequency is replaced by its corresponding
relative frequency.
Here’s how you can convert the given frequency distribution into a relative frequency distribution:

Frequency Distribution Table

GRE Score Range Frequency

725-749 1
700-724 3
675-699 14
650-674 30
625-649 34
600-624 42
575-599 30
550-574 27
525-549 13
500-524 4
475-499 3
Total 200

Convert to Relative Frequency Distribution

1. Relative Frequency Formula:
Relative Frequency = Frequency / Total Number of Observations
2. Compute Relative Frequencies:
• For 725-749: 1/200 = 0.005
• For 475-499: 3/200 = 0.015
The remaining intervals are computed in the same way.
Relative Frequency Distribution Table
GRE Score Range Frequency Relative Frequency
725-749 1 0.005
700-724 3 0.015
675-699 14 0.07
650-674 30 0.15
625-649 34 0.17
600-624 42 0.21
575-599 30 0.15
550-574 27 0.135
525-549 13 0.065
500-524 4 0.02
475-499 3 0.015
By converting frequencies to relative frequencies, you get a better sense of how each category
compares proportionally to the whole dataset.
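The conversion is a single division per interval. The Python sketch below divides by the stated total of 200 so that it reproduces the table above; note that the listed frequencies actually add up to 201, so relative frequencies computed this way sum to 1.005 rather than exactly 1, and the final line prints that sum for checking.

```python
# Frequencies copied from the GRE table above.
gre = {
    "725-749": 1, "700-724": 3, "675-699": 14, "650-674": 30,
    "625-649": 34, "600-624": 42, "575-599": 30, "550-574": 27,
    "525-549": 13, "500-524": 4, "475-499": 3,
}
stated_total = 200

# Relative frequency = frequency / total number of observations (no rounding).
relative = {interval: f / stated_total for interval, f in gre.items()}
for interval, rf in relative.items():
    print(f"{interval}  {gre[interval]:>3}  {rf}")

print("Sum of listed frequencies:", sum(gre.values()))
```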

b) i) What is a z-score? Outline the steps to obtain a z-score.

A z-score is a statistical measure that describes how many standard deviations a data point is from
the mean of a dataset. It helps in understanding the position of a particular value within a
distribution and is commonly used in various statistical analyses and hypothesis testing.

Steps to Obtain a Z-Score

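1. Find the Mean (μ) of the dataset.
2. Find the Standard Deviation (σ) of the dataset.
3. Subtract the mean from the data point and divide by the standard deviation: z = (X − μ) / σ.
ii) Express each of the following scores as a z score: First, Mary’s intelligence quotient is
135, given a mean of 100 and a standard deviation of 15. Second,
Mary obtained a score of 470 in the competitive examination conducted in April 2022, given
a mean of 500 and a standard deviation of 100.

Applying the formula:
• IQ: z = (135 − 100) / 15 = 35 / 15 ≈ 2.33
• Examination: z = (470 − 500) / 100 = −30 / 100 = −0.30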
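The three steps reduce to one formula, sketched here as a small Python function and applied to Mary's two scores.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


print(round(z_score(135, 100, 15), 2))   # IQ: 35 / 15 -> 2.33
print(round(z_score(470, 500, 100), 2))  # exam: -30 / 100 -> -0.3
```

A positive z means the score lies above the mean and a negative z means it lies below, so Mary is well above average in IQ but slightly below average in the examination.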

12. a) Demonstrate the different types of variables used in data analysis with an example for
each.
In data analysis, different types of variables are used to categorize and interpret data. Here’s an
overview of common variable types with examples for each:
1. Nominal Variables
Nominal variables are categorical variables with no inherent order or ranking. They simply
represent different categories.
Example:
• Variable: Favorite Color
• Categories: Red, Blue, Green, Yellow, etc.
• Usage: You might use this variable to analyze the most popular color among a group of
people.

2. Ordinal Variables
Ordinal variables are categorical variables with a meaningful order but no consistent difference
between categories.
Example:
• Variable: Customer Satisfaction Rating
• Categories: Very Unsatisfied, Unsatisfied, Neutral, Satisfied, Very Satisfied
• Usage: You might use this variable to gauge overall customer satisfaction and identify
trends over time.

3. Interval Variables
Interval variables are numeric variables where the intervals between values are consistent, but there
is no true zero point.
Example:
• Variable: Temperature in Celsius
• Values: -5°C, 0°C, 25°C, 40°C, etc.
• Usage: You might use this variable to analyze temperature patterns and their effects on
various outcomes.

4. Ratio Variables
Ratio variables are numeric variables with a true zero point, which allows for meaningful
comparisons of ratios.
Example:
• Variable: Height
• Values: 150 cm, 170 cm, 180 cm, etc.
• Usage: You might use this variable to study the correlation between height and other factors,
such as weight.

5. Binary Variables
Binary variables are a special type of nominal variable with only two possible values.
Example:
• Variable: Has a Pet
• Values: Yes, No
• Usage: You might use this variable to analyze pet ownership trends or its effects on other
variables, like happiness.

6. Continuous Variables
Continuous variables can take on an infinite number of values within a given range and can be
measured with fine precision.
Example:
• Variable: Annual Income
• Values: $30,000, $45,678, $100,000, etc.
• Usage: You might use this variable to analyze income distribution and its impact on
spending behavior.

7. Discrete Variables
Discrete variables are numeric variables that can only take on specific, distinct values, often
integers.
Example:
• Variable: Number of Children
• Values: 0, 1, 2, 3, etc.
• Usage: You might use this variable to study family size and its effects on household
spending.
These different types of variables are fundamental in designing data analysis methods and
interpreting results, allowing analysts to make meaningful conclusions from data.
b) The number of friends reported by Facebook users is summarized in the following
frequency distribution.
FRIENDS f
400-above 2
350-399 5
300-349 12
250-299 17
200-249 23
150-199 49
100-149 27
50-99 29
0-49 36
Total 200

i) What is the shape of this distribution?
ii) Find the relative frequencies.
iii) Find the approximate percentile rank of the interval 300-349.
iv) Convert to a histogram.
v) Why would it not be possible to convert to a stem-and-leaf display?
Let's address each part of the question about the frequency distribution of the number of friends
reported by Facebook users.
i) Shape of the Distribution
To determine the shape of the distribution, we analyze the frequency data:
• 400-above: 2
• 350-399: 5
• 300-349: 12
• 250-299: 17
• 200-249: 23
• 150-199: 49
• 100-149: 27
• 50-99: 29
• 0-49: 36
Observing the frequencies:
• Reading from the lowest interval upward, the frequencies are 36, 29, 27, 49, 23, 17, 12, 5 and 2. Most users report fewer than 200 friends, the highest frequency is in the 150-199 interval, and the frequencies then taper off into a long tail toward the high values (400 and above).
The distribution is therefore positively skewed (right-skewed): the long tail extends toward larger numbers of friends, so it is not symmetric.
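ii) Relative Frequencies
Relative frequency is calculated by dividing the frequency of each class by the total number of
observations and then converting it to a percentage.
Total number of observations = 200
Relative frequency for each interval:
• 400-above: 2/200 = 1.0%
• 350-399: 5/200 = 2.5%
• 300-349: 12/200 = 6.0%
• 250-299: 17/200 = 8.5%
• 200-249: 23/200 = 11.5%
• 150-199: 49/200 = 24.5%
• 100-149: 27/200 = 13.5%
• 50-99: 29/200 = 14.5%
• 0-49: 36/200 = 18.0%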
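iii) Approximate Percentile Rank of the Interval 300-349

To find the approximate percentile rank of the interval 300-349, we find the cumulative frequency of
all intervals below 300-349, then add the frequency of the 300-349 interval itself.
Cumulative frequencies:
• 0-49: 36
• 50-99: 65
• 100-149: 92
• 150-199: 141
• 200-249: 164
• 250-299: 181
• 300-349: 181 + 12 = 193
Cumulative percentage up to the top of 300-349 = 193/200 = 96.5%.
So scores at the upper end of the interval 300-349 have an approximate percentile rank of 96.5 (about 97), and the interval as a whole covers roughly the 90.5th (181/200) to the 96.5th percentile.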
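The relative frequencies and cumulative percentages can be generated together with a short Python sketch. It lists intervals from the lowest upward and takes the second interval as 50-99 so that every class has a width of 50.

```python
# Facebook friends table, listed from the lowest interval upward.
intervals = ["0-49", "50-99", "100-149", "150-199", "200-249",
             "250-299", "300-349", "350-399", "400-above"]
freqs = [36, 29, 27, 49, 23, 17, 12, 5, 2]
total = sum(freqs)  # 200

running = 0
table = {}
for interval, f in zip(intervals, freqs):
    running += f  # cumulative frequency up to the top of this interval
    table[interval] = (f, 100 * f / total, running, 100 * running / total)

print("Interval    f   rel%  cum f   cum%")
for interval, (f, rel, cum, cum_pct) in table.items():
    print(f"{interval:<9} {f:>3} {rel:>6.1f} {cum:>6} {cum_pct:>6.1f}")
```

The cum% column for 300-349 is the approximate percentile rank of the top of that interval.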

iv) Convert to Histogram
To convert the frequency distribution to a histogram:
1. Choose the intervals as the x-axis bins (400-above, 350-399, etc.).
2. Plot the frequency for each interval as the height of the bars on the y-axis.
The bars should be contiguous (no spaces between intervals), and each bar's height should
correspond to the frequency of that interval.
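Without a plotting library, the histogram can be sketched as text in Python, rotated so the class intervals run down the page from lowest to highest and bar length equals frequency. The `unit` parameter (observations per `#`) is an illustrative assumption; note also that the open-ended 400-above class has no natural width, so any bar drawn for it is an approximation.

```python
intervals = ["0-49", "50-99", "100-149", "150-199", "200-249",
             "250-299", "300-349", "350-399", "400-above"]
freqs = [36, 29, 27, 49, 23, 17, 12, 5, 2]


def text_histogram(labels, counts, unit=1):
    """One row per class interval, lowest first; bar length = count // unit."""
    width = max(len(label) for label in labels)
    return [f"{label:>{width}} | {'#' * (count // unit)} {count}"
            for label, count in zip(labels, counts)]


rows = text_histogram(intervals, freqs)
for row in rows:
    print(row)
```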
v) Stem-and-Leaf Display
It is not feasible to convert this frequency distribution into a stem-and-leaf display because:
1. Large Number of Data Points: A stem-and-leaf plot is typically used for a smaller dataset
where each data point can be represented individually. Given the range of intervals and the
total number of observations (200), it's impractical to list out each data point in a stem-and-leaf
plot.
2. Grouped Data: The frequency distribution provided is grouped data. Stem-and-leaf plots
are used for ungrouped or less aggregated data, where individual values can be directly
displayed. Here, you only have interval-based frequencies, not the specific individual data
points.
Thus, while histograms can effectively represent grouped data, stem-and-leaf displays are better
suited for raw, ungrouped data.

No ratings yet
Alternative Methodology To Avoid Convergence Problems Caused For WELDRAW Keyword
3 pages
Lesson Plan: Present Absent
No ratings yet
Lesson Plan: Present Absent
3 pages
Ee/Rppf: Extended Essay - Reflections On Planning and Progress Form
No ratings yet
Ee/Rppf: Extended Essay - Reflections On Planning and Progress Form
3 pages
L-2.9 Hmac Cmac
No ratings yet
L-2.9 Hmac Cmac
14 pages
III Sem Syllabus RNSIT New
No ratings yet
III Sem Syllabus RNSIT New
19 pages
Stat Chapter 5-9
No ratings yet
Stat Chapter 5-9
32 pages
DS&BD Lab Manul
No ratings yet
DS&BD Lab Manul
98 pages
The Effects of Participation in Athletics On Academic Performance Among High School Sophomores and Juniors
No ratings yet
The Effects of Participation in Athletics On Academic Performance Among High School Sophomores and Juniors
2 pages
N Final Report Undp Hrva
No ratings yet
N Final Report Undp Hrva
156 pages
A Model For Network Security
No ratings yet
A Model For Network Security
1 page
Computer Science (Optional II) Grade 9-10: Micro Syllabus - Academic Year 2069
100% (1)
Computer Science (Optional II) Grade 9-10: Micro Syllabus - Academic Year 2069
6 pages
FDS Lesson Plan
No ratings yet
FDS Lesson Plan
8 pages
Computer Vision Module Application For Finding A Target in A Live Camera
No ratings yet
Computer Vision Module Application For Finding A Target in A Live Camera
8 pages
TYBSc (CS) Sem VI - Practical - Slips-1
No ratings yet
TYBSc (CS) Sem VI - Practical - Slips-1
30 pages
12-Exploratory Data Analysis, Anomaly Detection-28!03!2023
No ratings yet
12-Exploratory Data Analysis, Anomaly Detection-28!03!2023
79 pages
Take Home Quiz
No ratings yet
Take Home Quiz
1 page
Sri Raaja Raajan College of Engineering and Technology Department of Computer Science and Engineering
No ratings yet
Sri Raaja Raajan College of Engineering and Technology Department of Computer Science and Engineering
1 page
Unit 2 - Knowledge Delivery
No ratings yet
Unit 2 - Knowledge Delivery
31 pages
20IT503 - Big Data Analytics - Unit2
No ratings yet
20IT503 - Big Data Analytics - Unit2
62 pages
Unit-2 Solution
No ratings yet
Unit-2 Solution
21 pages
JAVA Sample Questions For Practice (II CSE - A' & II IT - B')
No ratings yet
JAVA Sample Questions For Practice (II CSE - A' & II IT - B')
5 pages
Data Analytics and Reporting - Notes Unit 1 and 2
No ratings yet
Data Analytics and Reporting - Notes Unit 1 and 2
11 pages
Cs2358 Internet Programming Lab Anna University Syllabus
No ratings yet
Cs2358 Internet Programming Lab Anna University Syllabus
12 pages
Data Mining & Business Intelligence (2170715) : Unit-5 Concept Description and Association Rule Mining
No ratings yet
Data Mining & Business Intelligence (2170715) : Unit-5 Concept Description and Association Rule Mining
39 pages
Question Bank 1to11
No ratings yet
Question Bank 1to11
19 pages
CS1403 CASE Tools Lab Manual
100% (2)
CS1403 CASE Tools Lab Manual
67 pages
Chapter 1 BFC34303 (Lyy)
No ratings yet
Chapter 1 BFC34303 (Lyy)
104 pages
Nikhil MOOC Report
No ratings yet
Nikhil MOOC Report
16 pages
Ad3411 - Student
No ratings yet
Ad3411 - Student
27 pages
Big Data Unit 2
No ratings yet
Big Data Unit 2
19 pages
Chapter 1 BFC34303
No ratings yet
Chapter 1 BFC34303
104 pages
Toaz - Info Ge 4 Topic 2 Statistics PR
No ratings yet
Toaz - Info Ge 4 Topic 2 Statistics PR
11 pages
Fs Lab Manual
No ratings yet
Fs Lab Manual
57 pages
BDA Unit 1-1
No ratings yet
BDA Unit 1-1
21 pages
Data Analytics Lab File Rohit
No ratings yet
Data Analytics Lab File Rohit
23 pages
Data Warehousing and Data Mining
No ratings yet
Data Warehousing and Data Mining
4 pages
Textbook of Engineering Chemistry
From Everand
Textbook of Engineering Chemistry
C. Parameswara Murthy
No ratings yet
Optimizing Hadoop for MapReduce
From Everand
Optimizing Hadoop for MapReduce
Khaled Tannir
No ratings yet
Introduction to Linux: Installation and Programming
From Everand
Introduction to Linux: Installation and Programming
N. B. Venkateswarlu
No ratings yet