
Pivot Table

Uploaded by Cherry
Copyright © All Rights Reserved

Box plot

Pivot table
Box Plot
A box and whisker plot (box plot) is a method for summarizing a set of data measured on an interval scale.

Parts of Box Plots


Minimum: The smallest value in the dataset.
First Quartile (Q1): The median of the lower half of the dataset.
Median: The middle value of the ordered dataset, which divides it into two equal halves. The median is also the second quartile (Q2).
Third Quartile (Q3): The median of the upper half of the dataset.
Maximum: The largest value in the dataset.
Interquartile Range (IQR): The difference between the third quartile and the first quartile, i.e. IQR = Q3 - Q1.

Outlier: Values that lie far out on the left or right side of the ordered data are tested as possible outliers. Generally, an outlier lies more than a specified distance below the first quartile or above the third quartile.

Using the common 1.5 × IQR rule, outliers are values greater than Q3 + 1.5 × IQR or less than Q1 - 1.5 × IQR.
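The outlier rule can be sketched in R as follows. This is a minimal illustration, assuming the "median of each half" quartile method described above (when the count is odd, the overall median is left out of both halves). The function name `iqr_fences()` and the sample vector are made up for the example.

```r
# Hypothetical helper: quartiles by the "median of each half" method,
# followed by the 1.5 * IQR outlier fences.
iqr_fences <- function(x) {
  stopifnot(length(x) >= 2)
  x <- sort(x)
  n <- length(x)
  half <- n %/% 2                      # number of values in each half
  q1 <- median(x[seq_len(half)])       # median of the lower half
  q3 <- median(x[(n - half + 1):n])    # median of the upper half
  iqr <- q3 - q1
  lower <- q1 - 1.5 * iqr              # lower fence
  upper <- q3 + 1.5 * iqr              # upper fence
  list(q1 = q1, q3 = q3, iqr = iqr, lower = lower, upper = upper,
       outliers = x[x < lower | x > upper])
}

# Made-up data with one suspiciously large value
f <- iqr_fences(c(2, 4, 5, 7, 8, 9, 40))
f$iqr        # 5
f$outliers   # 40
```

Here Q1 = 4 and Q3 = 9, so the fences are -3.5 and 16.5, and only 40 falls outside them.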
Suppose you have the math test results for a class of 15
students. Here are the results:
91 95 54 69 80 85 88 73 71 70 66 90 86 84 73
Step 1: Order the data points from least to greatest.
54 66 69 70 71 73 73 80 84 85 86 88 90 91 95
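In R, the ordering step can be sketched with `sort()`. The variable names `scores` and `sorted` are labels chosen for this example.

```r
# Math test scores for the 15 students
scores <- c(91, 95, 54, 69, 80, 85, 88, 73, 71, 70, 66, 90, 86, 84, 73)

# Step 1: order the values from least to greatest
sorted <- sort(scores)
sorted   # values: 54 66 69 70 71 73 73 80 84 85 86 88 90 91 95
```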
Step 2: Find the median of the data. With 15 values, the median is the 8th value in the ordered list: 80.
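For an odd count n, the median sits at position (n + 1) / 2 of the ordered data. A short sketch that checks this by position and against R's built-in `median()`:

```r
scores <- c(91, 95, 54, 69, 80, 85, 88, 73, 71, 70, 66, 90, 86, 84, 73)
sorted <- sort(scores)

n   <- length(sorted)   # 15 values
mid <- (n + 1) / 2      # middle position for an odd count: 8
sorted[mid]             # 80
median(scores)          # 80, the same result
```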


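Step 3: Find the middle points of the two halves divided by the median (the lower and upper quartiles). The lower half, 54 66 69 70 71 73 73, has median Q1 = 70. The upper half, 84 85 86 88 90 91 95, has median Q3 = 88.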
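Step 3 can be sketched in R by splitting the ordered data around the median and taking the median of each half. The median itself is left out because the count is odd. R's default `quantile()` uses a different interpolation rule (type 7) and returns 70.5 and 87 for these scores, so hand-computed and software quartiles can differ slightly.

```r
scores <- c(91, 95, 54, 69, 80, 85, 88, 73, 71, 70, 66, 90, 86, 84, 73)
sorted <- sort(scores)

lower_half <- sorted[1:7]    # 54 66 69 70 71 73 73 (below the 8th value)
upper_half <- sorted[9:15]   # 84 85 86 88 90 91 95 (above the 8th value)
q1 <- median(lower_half)     # 70
q3 <- median(upper_half)     # 88

# For comparison: R's default quartile rule interpolates between values
quantile(scores, c(0.25, 0.75))   # 70.5 and 87
```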
Step 4: Find the extreme values.

This is the easiest part. You need to find the largest and
smallest data values.

Extreme values = 54 and 95.

So the five-number summary (minimum, Q1, median, Q3, maximum) for the class is 54, 70, 80, 88, 95.
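Steps 1 to 4 can be combined into one function. This is a sketch: the name `five_number_summary()` is made up, and it follows the median-of-halves method used above. R's built-in `fivenum()` uses Tukey's hinges instead. When the count is odd, Tukey's hinges include the median in both halves, so `fivenum()` gives 70.5 and 87 for the quartiles here.

```r
# Hypothetical helper: minimum, Q1, median, Q3, maximum using the
# "median of each half" method (the median is excluded when n is odd).
five_number_summary <- function(x) {
  x <- sort(x)
  n <- length(x)
  half <- n %/% 2
  c(min    = x[1],
    q1     = median(x[seq_len(half)]),
    median = median(x),
    q3     = median(x[(n - half + 1):n]),
    max    = x[n])
}

scores <- c(91, 95, 54, 69, 80, 85, 88, 73, 71, 70, 66, 90, 86, 84, 73)
five_number_summary(scores)   # 54 70 80 88 95
fivenum(scores)               # 54.0 70.5 80.0 87.0 95.0 (Tukey's hinges)
```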

Now we are ready to draw the box and whisker plot.
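Here is a minimal base R sketch that draws the plot. The title, axis label, and colour are arbitrary choices. `boxplot()` places the box edges at Tukey's hinges (70.5 and 87 for these data), so the box will look slightly different from one drawn by hand at 70 and 88.

```r
scores <- c(91, 95, 54, 69, 80, 85, 88, 73, 71, 70, 66, 90, 86, 84, 73)

# boxplot() invisibly returns the statistics it drew
b <- boxplot(scores,
             horizontal = TRUE,        # lay the plot out left to right
             main = 'Math test scores',
             xlab = 'Score',
             col = 'lightblue')

b$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out     # points plotted as outliers (none here)
```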
The plot is divided into four groups: a lower whisker, a lower box half, an upper box half, and an upper whisker. Each group covers about 25% of the data, because the quartiles split the ordered data into four roughly equal parts.
Interpreting the box and whisker plot results:

✔ The box shows that the middle half of the students scored between 70 and 88 points.

✔ In addition, about 75% scored lower than 88 points, and about 50% scored above 80. So, if your test result falls in the lower whisker (below 70), you may need to study more.
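With only 15 scores, the data cannot split into exact quarters, so these percentages are approximate. The following sketch checks the actual shares. It relies on the fact that the mean of a logical vector is the proportion of TRUE values.

```r
scores <- c(91, 95, 54, 69, 80, 85, 88, 73, 71, 70, 66, 90, 86, 84, 73)

in_box    <- mean(scores >= 70 & scores <= 88)  # share inside the box: 0.6
below_q3  <- mean(scores < 88)                  # share below Q3: about 0.73
above_med <- mean(scores > 80)                  # share above the median: about 0.47
c(in_box, below_q3, above_med)
```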
Comparative double box and whisker plot
Suppose an IT company has two stores that sell computers. The company recorded how many computers each store sold each month. Over the past 12 months, the stores sold the following numbers of computers:
Store 1:
350, 460, 20, 160, 580, 250, 210, 120, 200, 510, 290, 380.
Store 2:
520, 180, 260, 380, 80, 500, 630, 420, 210, 70, 440, 140.
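A sketch of the comparative double box plot in base R. When you pass a named list to `boxplot()`, the list names appear as the labels under the boxes. The title, axis label, and colours are arbitrary choices. With 12 values per store, Store 1's box runs from 180 to 420 with a median of 270, and Store 2's box runs from 160 to 470 with a median of 320.

```r
store1 <- c(350, 460, 20, 160, 580, 250, 210, 120, 200, 510, 290, 380)
store2 <- c(520, 180, 260, 380, 80, 500, 630, 420, 210, 70, 440, 140)

# One box per store, drawn side by side on a shared axis
sales <- boxplot(list('Store 1' = store1, 'Store 2' = store2),
                 main = 'Monthly computer sales, last 12 months',
                 ylab = 'Computers sold',
                 col = c('steelblue', 'orange'))

sales$stats   # one column per store: whisker, Q1, median, Q3, whisker
```

Store 2 has the higher median and the wider box, so its monthly sales were typically higher but also more variable.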
Syntax: boxplot(x, ...), with these commonly used arguments:

x: a numeric vector, or a formula such as y ~ group.
data: the data frame that holds the variables named in the formula.
main: the title of the chart.
names: the group labels shown under each box.
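The following sketch shows the `x`, `data`, `main`, and `names` arguments together. It uses the formula interface on the built-in mtcars data, which is introduced next. Grouping mpg by cyl produces three boxes, one each for 4, 6, and 8 cylinders. The label text is an arbitrary choice.

```r
# mpg grouped by number of cylinders, using the formula interface
b <- boxplot(mpg ~ cyl,
             data  = mtcars,
             main  = 'Fuel economy by cylinder count',
             names = c('4 cyl', '6 cyl', '8 cyl'),   # labels under each box
             xlab  = 'Cylinders',
             ylab  = 'Miles per gallon')

b$n   # number of cars in each group
```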

✔ The mtcars dataset is a built-in dataset in R that contains measurements on 11 different attributes for 32 different cars.
✔ Load the mtcars dataset:
data(mtcars)
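Summarize the mtcars Dataset

We can use the summary() function to quickly summarize each variable in the dataset. dim() and names() show the size of the dataset and its column names:

summary(mtcars)   # min, quartiles, median, mean and max of each column
dim(mtcars)       # number of rows and columns: 32 11
names(mtcars)     # column (variable) names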

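# histogram of fuel economy (miles per gallon)
hist(mtcars$mpg,
     col='steelblue',
     main='Histogram',
     xlab='mpg',
     ylab='Frequency')

# box plot of the same variable
boxplot(mtcars$mpg,
        main='Distribution of mpg values',
        ylab='mpg',
        col='steelblue',
        border='black')

# scatterplot of mpg against car weight (wt, in 1000 lbs)
plot(mtcars$mpg, mtcars$wt,
     col='steelblue',
     main='Scatterplot',
     xlab='mpg',
     ylab='wt',
     pch=19)   # pch=19 draws solid circles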
Pivot table

✔ The pivot table is one of Microsoft Excel's most powerful features. It lets us extract meaningful summaries from a large and detailed dataset.
✔ A pivot table typically groups the values of a column and shows a statistic, such as a sum or a mean, for each group. In the R programming language, we do this with the group_by() and summarize() functions of the dplyr package.
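Here is a sketch of an Excel-style pivot in R. It assumes dplyr is installed and summarizes the built-in mtcars data by number of cylinders, showing a count and an average for each group. The column names `cars` and `avg_mpg` are arbitrary.

```r
library(dplyr)

# One row per cylinder count, like the rows of an Excel pivot table
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(cars = n(),            # number of cars in each group
            avg_mpg = mean(mpg))   # average fuel economy in each group
by_cyl
```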
✔ The dplyr package in the R programming language is a grammar of data manipulation. It provides a consistent set of verbs for preprocessing large datasets.
✔ The group_by() function groups the data by one or more variables. summarize() then computes a summary for each group, using the aggregate function passed to it.
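group_by() can take more than one variable. This sketch groups mtcars by both cylinder count and transmission type. In mtcars, am is 0 for automatic and 1 for manual. The output has one row per combination, much like a two-level row field in an Excel pivot table. The `.groups = 'drop'` argument (available in dplyr 1.0 and later) returns an ungrouped result.

```r
library(dplyr)

# Group by two variables: cylinders and transmission (0 = automatic, 1 = manual)
by_cyl_am <- mtcars %>%
  group_by(cyl, am) %>%
  summarize(avg_mpg = mean(mpg), .groups = 'drop')   # drop grouping afterwards
by_cyl_am
```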
Syntax:
df %>% group_by(grouping_variables) %>% summarize(label = aggregate_fun())

Parameters:
df: the data frame in use.
grouping_variables: the variable(s) used to group the data.
label: the name of the new summary column.
aggregate_fun(): the function used for the summary, for example sum() or mean().

# create sample data frame
sample_data <- data.frame(label=c('x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'),
                          value=c(222, 18, 51, 52, 44, 19, 100, 98, 34))

# load library dplyr
library(dplyr)

# create pivot table with sum of value as summary
sample_data %>% group_by(label) %>%
  summarize(sum_values = sum(value))

  label sum_values
1 x            374
2 y            160
3 z            104
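For comparison, base R's `aggregate()` function produces the same totals without dplyr. This is a different technique from group_by() and summarize(), shown here only as a cross-check.

```r
sample_data <- data.frame(label = c('x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'),
                          value = c(222, 18, 51, 52, 44, 19, 100, 98, 34))

# Formula interface: sum 'value' within each 'label'
totals <- aggregate(value ~ label, data = sample_data, FUN = sum)
totals   # x: 374, y: 160, z: 104
```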

# create sample data frame
sample_data <- data.frame(label=c('x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'),
                          value=c(222, 18, 51, 52, 44, 19, 100, 98, 34))

# load library dplyr
library(dplyr)

# create pivot table with mean of value as summary
sample_data %>% group_by(label) %>%
  summarize(average_values = mean(value))

  label average_values
1 x              125.
2 y               53.3
3 z               34.7
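A single summarize() call can also compute several statistics at once, much like adding several value fields to one Excel pivot table. This is a sketch. The column name `n_rows` is arbitrary, and the other two columns reuse the names from the examples above.

```r
library(dplyr)

sample_data <- data.frame(label = c('x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'),
                          value = c(222, 18, 51, 52, 44, 19, 100, 98, 34))

# One row per label, with a count, a total and an average side by side
pivot <- sample_data %>%
  group_by(label) %>%
  summarize(n_rows = n(),
            sum_values = sum(value),
            average_values = mean(value))
pivot
```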
