DSAAct 6
Chelsea C. Ancheta
2024-11-30
Preparations
Ensure that the ggplot2 library is loaded (the code is already provided).
If the ggplot2 package is not yet installed, click the Console tab and enter: install.packages("ggplot2")
library(ggplot2)
For this activity, we will go back to the Family Income and Expenditure Survey (FIES) data set, so import the
fies data frame.
Task 1
We start by limiting our data set to Region 1 only. Create a subset data frame, fies_ilocos, that contains only
the rows where Region is "I - Ilocos Region".
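A minimal sketch of the subset step. The real FIES file is not reproduced here, so the block builds a tiny made-up fies data frame with only a Region column and a Household.Head.Age column (illustrative values, not survey data). The file name in the commented read.csv() line is hypothetical; with the real data you would skip the made-up frame and start from the imported fies.

```r
# Hypothetical import step; the real file name and location may differ:
# fies <- read.csv("fies.csv")

# Made-up stand-in for the fies data frame (illustration only, not FIES data)
fies <- data.frame(
  Region = c("I - Ilocos Region", "II - Cagayan Valley",
             "I - Ilocos Region", "NCR"),
  Household.Head.Age = c(45, 60, 38, 52)
)

# Keep only the rows whose Region matches the Ilocos label exactly
fies_ilocos <- subset(fies, Region == "I - Ilocos Region")

# Equivalent base-R indexing form
fies_ilocos2 <- fies[fies$Region == "I - Ilocos Region", ]

nrow(fies_ilocos)
```

The two forms differ only when the condition contains NA: subset() drops those rows, while bracket indexing keeps them as rows filled with NA.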
Task 2 (Histogram)
Let's study our demographics. First, create a simple histogram of Household.Head.Age.
hist(fies_ilocos$Household.Head.Age)
[Figure: Histogram of fies_ilocos$Household.Head.Age; x-axis: fies_ilocos$Household.Head.Age, y-axis: Frequency]
Improve the graph by defining the start and end value of each bar with the breaks = seq() argument.
Copy your original script/code and add the needed code.
# Bin edges every 5 years, from 10 to 100
hist(fies_ilocos$Household.Head.Age,
     breaks = seq(10, 100, by = 5),
     xlab = "Age",
     ylab = "Frequency",
     col = "lightblue",
     main = "Histogram of Household Head Age")
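hist() stops with an error if any value falls outside the breaks, so seq(10, 100, by = 5) only works because every household head age lies between 10 and 100. The sketch below derives the edges from the data instead, rounding the minimum down and the maximum up to multiples of the bin width. The ages are made up for illustration.

```r
# Made-up ages standing in for fies_ilocos$Household.Head.Age
ages <- c(23, 37, 41, 45, 52, 58, 63, 67, 71, 88, 99)

width <- 5
# Round the data range outward to the nearest multiple of the bin width
lower <- floor(min(ages, na.rm = TRUE) / width) * width
upper <- ceiling(max(ages, na.rm = TRUE) / width) * width
age_breaks <- seq(lower, upper, by = width)

# plot = FALSE returns the bin counts without drawing anything
h <- hist(ages, breaks = age_breaks, plot = FALSE)
h$counts
```

By default hist() uses right-closed bins (right = TRUE), so an age of exactly 25 is counted in the 20 to 25 bar rather than the 25 to 30 bar.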
[Figure: Histogram of Household Head Age; x-axis: Age, y-axis: Frequency]
Let's improve it further by adding colors. Select a basic color of your choice, use that color for the bar
outlines (color), and use a lighter shade of it for the fill.
Copy your original script/code and add the needed code.
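The two plotting systems name these arguments differently. In base hist(), col sets the bar fill and border sets the outline. In ggplot2's geom_histogram(), fill sets the inside of each bar and color sets the outline. The sketch below shows both with blue outlines and a light blue fill; the color choice and the made-up ages are only examples.

```r
library(ggplot2)

# Made-up ages standing in for fies_ilocos$Household.Head.Age
ages_df <- data.frame(Household.Head.Age = c(23, 37, 41, 45, 52, 58, 63, 67, 71, 88))

# Base R: 'border' is the outline, 'col' is the fill
hist(ages_df$Household.Head.Age,
     breaks = seq(10, 100, by = 5),
     border = "blue", col = "lightblue",
     xlab = "Age", main = "Histogram of Household Head Age")

# ggplot2: 'color' is the outline, 'fill' is the inside of each bar
p <- ggplot(ages_df, aes(x = Household.Head.Age)) +
  geom_histogram(breaks = seq(10, 100, by = 5),
                 color = "blue", fill = "lightblue")
print(p)
```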
[Figures: two versions of Histogram of fies_ilocos$Household.Head.Age; x-axis: Age, y-axis: Frequency]
[Figure: histogram; x-axis: Household.Head.Age, y-axis: count]
For statisticians, we can also add vertical lines marking the mean and the median. Code is provided below;
edit the ... parts to complete it.
ggplot(fies_ilocos, aes(x = Household.Head.Age)) +
  geom_histogram(breaks = seq(10, 100, by = 5), color = "purple", fill = "lightblue") +
  # Dashed line at the mean age, dotted line at the median age
  geom_vline(aes(xintercept = mean(Household.Head.Age)), color = "yellow", linetype = "dashed") +
  geom_vline(aes(xintercept = median(Household.Head.Age)), color = "red", linetype = "dotted") +
  # Text labels placed 7.5 years to the right of each line
  annotate("text", x = mean(fies_ilocos$Household.Head.Age) + 7.5, y = 325, label = "Mean", color = "green") +
  annotate("text", x = median(fies_ilocos$Household.Head.Age) + 7.5, y = 350, label = "Median", color = "orange")
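When mean(Household.Head.Age) is placed inside aes(), the single value is recycled to every row, so geom_vline() draws one identical line per household on top of itself. Also, if any age is NA, mean() and median() return NA and no line appears. The sketch below computes both statistics once with na.rm = TRUE and passes them as constants. It uses made-up ages and keeps the line styles from the provided template. The y positions of the labels are scaled down to fit the tiny sample.

```r
library(ggplot2)

# Made-up ages standing in for fies_ilocos (one NA included on purpose)
ages_df <- data.frame(Household.Head.Age = c(23, 37, 41, 45, 52, 58, 63, 67, 71, 88, NA))

# Compute each statistic once, ignoring missing ages
age_mean   <- mean(ages_df$Household.Head.Age, na.rm = TRUE)
age_median <- median(ages_df$Household.Head.Age, na.rm = TRUE)

p <- ggplot(ages_df, aes(x = Household.Head.Age)) +
  geom_histogram(breaks = seq(10, 100, by = 5), color = "purple",
                 fill = "lightblue", na.rm = TRUE) +
  # Constants passed outside aes(): exactly one line per statistic
  geom_vline(xintercept = age_mean, color = "yellow", linetype = "dashed") +
  geom_vline(xintercept = age_median, color = "red", linetype = "dotted") +
  annotate("text", x = age_mean + 7.5, y = 3, label = "Mean", color = "green") +
  annotate("text", x = age_median + 7.5, y = 3.5, label = "Median", color = "orange")
print(p)
```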
[Figure: histogram of Household.Head.Age with vertical lines labeled Mean and Median; x-axis: Household.Head.Age, y-axis: y]
[Figure: Bar Graph of Household Head Marital Status; x-axis: Marital Status]
Let's prepare the chart for a Pareto chart by arranging the bars from largest to smallest.
Copy your original script/code and add the needed code.
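Here is a sketch of the ordering step. It assumes the marital-status column is called Household.Head.Marital.Status (check names(fies_ilocos) for the exact name) and uses made-up rows. Setting the factor levels in descending order of frequency makes ggplot2 draw the largest bar first. The last lines compute the cumulative percentages that a full Pareto chart would add as a line.

```r
library(ggplot2)

# Made-up rows; the column name is an assumption about the FIES data
status_df <- data.frame(Household.Head.Marital.Status = c(
  rep("Married", 5), rep("Widowed", 3), rep("Single", 2), "Divorced/Separated"))

# Frequencies sorted from largest to smallest
status_counts <- sort(table(status_df$Household.Head.Marital.Status),
                      decreasing = TRUE)

# Reorder the factor levels so the bars follow that order
status_df$Household.Head.Marital.Status <- factor(
  status_df$Household.Head.Marital.Status, levels = names(status_counts))

p <- ggplot(status_df, aes(x = Household.Head.Marital.Status)) +
  geom_bar(color = "purple", fill = "lightblue") +
  labs(title = "Bar Graph of Household Head Marital Status",
       x = "Marital Status", y = "Count")
print(p)

# Cumulative percentage of households covered by the first k categories
cum_pct <- cumsum(status_counts) / sum(status_counts) * 100
round(cum_pct, 1)
```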
[Figure: Bar Graph of Household Head Marital Status; x-axis: Marital Status]
[Figure: Bar Graph of Household Head Marital Status; x-axis: Marital Status, y-axis: Count]
[Figure: Bar Graph of Household Head Marital Status with count labels 1713, 425, 136, 73, and 1 on the bars; x-axis: Marital Status, y-axis: Count]
[Figure: Bar Graph of Household Head Marital Status; y-axis: Count]
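The labeled version of the bar graph places each bar's count on top of it. One way to do this is geom_text() with stat = "count", which reuses the counts that geom_bar() computes. The sketch below uses made-up rows and the same assumed column name, Household.Head.Marital.Status. after_stat() requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Made-up rows; the column name is an assumption about the FIES data
status_df <- data.frame(Household.Head.Marital.Status = c(
  rep("Married", 5), rep("Widowed", 3), rep("Single", 2), "Divorced/Separated"))

# Largest-first ordering, as in the Pareto step
status_levels <- names(sort(table(status_df$Household.Head.Marital.Status),
                            decreasing = TRUE))
status_df$Household.Head.Marital.Status <- factor(
  status_df$Household.Head.Marital.Status, levels = status_levels)

p <- ggplot(status_df, aes(x = Household.Head.Marital.Status)) +
  geom_bar(color = "purple", fill = "lightblue") +
  # Same counting stat as geom_bar(), drawn as text just above each bar
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5) +
  labs(title = "Bar Graph of Household Head Marital Status",
       x = "Marital Status", y = "Count")
print(p)
```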
Now that the demographics have been explored, let's look for patterns between two variables in the data set.
Create a basic scatterplot using geom_point() with Total.Household.Income on the x-axis and
Total.Food.Expenditure on the y-axis.
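Here is a minimal sketch of the basic scatterplot, with Total.Household.Income on the x-axis and Total.Food.Expenditure on the y-axis. The eight households are made up. One very high income is included to mimic the kind of skew that bunches points in one corner.

```r
library(ggplot2)

# Made-up incomes and food expenditures (illustration only, not FIES values)
scatter_df <- data.frame(
  Total.Household.Income = c(85000, 120000, 150000, 210000, 260000, 400000, 950000, 3200000),
  Total.Food.Expenditure = c(40000, 52000, 61000, 75000, 90000, 110000, 180000, 420000)
)

# Income on the x-axis, food expenditure on the y-axis
p <- ggplot(scatter_df, aes(x = Total.Household.Income, y = Total.Food.Expenditure)) +
  geom_point()
print(p)
```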
[Figure: scatterplot; x-axis: Total.Household.Income, y-axis: Total.Food.Expenditure]
Since the points are bunched in one corner of the plot, let's apply a log10() transformation.
Copy your original script/code and add the needed code.
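The sketch below shows two ways to apply log10(), using made-up values. Wrapping the variables inside aes() plots the logged numbers themselves. scale_x_log10() and scale_y_log10() transform the axes but keep the tick labels in the original units, which matches the 1e+04, 3e+04, 1e+05 labels in the log-transformed figure. Both place the points identically. Zero or negative values have no logarithm (log10(0) is -Inf, and the log10 of a negative number is NaN), so the block counts them before transforming.

```r
library(ggplot2)

# Made-up incomes and food expenditures (illustration only, not FIES values)
scatter_df <- data.frame(
  Total.Household.Income = c(85000, 120000, 150000, 210000, 260000, 400000, 950000, 3200000),
  Total.Food.Expenditure = c(40000, 52000, 61000, 75000, 90000, 110000, 180000, 420000)
)

# Logarithms need strictly positive values; count any that would break
n_nonpositive <- sum(scatter_df$Total.Household.Income <= 0 |
                     scatter_df$Total.Food.Expenditure <= 0)

# Option 1: log10 axes; tick labels stay in the original units
p_scales <- ggplot(scatter_df, aes(x = Total.Household.Income,
                                   y = Total.Food.Expenditure)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  labs(title = "Total Food Expenditure vs Total Household Income (Log-Transformed)")
print(p_scales)

# Option 2: log10 the variables themselves; axes show the logged values
p_logged <- ggplot(scatter_df, aes(x = log10(Total.Household.Income),
                                   y = log10(Total.Food.Expenditure))) +
  geom_point()
print(p_logged)
```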
[Figure: Total Food Expenditure vs Total Household Income (Log-Transformed); y-axis: Total.Food.Expenditure on a log scale]
[Figure: Income vs. Food Expenditure by Marital Status; x-axis: Total Household Income, y-axis: Total Food Expenditure, legend: Marital Status (Annulled, Divorced/Separated, Married, Single, Widowed)]
Add labels.
Copy your original script/code and add the needed code.
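The ggrepel warning that follows this step means geom_text_repel() skipped most labels because they would overlap. The sketch below colors the points by marital status and labels only a few points (here, the three highest incomes) so the labels stay readable. Labeling a subset is one option, not necessarily the approach used in the activity. It uses made-up rows, assumes the column name Household.Head.Marital.Status, and assumes the ggrepel package is installed. max.overlaps is the argument the warning refers to. Raising it forces more labels onto the plot at the cost of clutter.

```r
library(ggplot2)
library(ggrepel)

# Made-up rows; Household.Head.Marital.Status is an assumed column name
df <- data.frame(
  Total.Household.Income = c(85000, 120000, 150000, 210000, 260000, 400000, 950000, 3200000),
  Total.Food.Expenditure = c(40000, 52000, 61000, 75000, 90000, 110000, 180000, 420000),
  Household.Head.Marital.Status = c("Single", "Widowed", "Married", "Married",
                                    "Divorced/Separated", "Widowed", "Married", "Married")
)

# Label only the three highest-income households
top_income <- df[order(df$Total.Household.Income, decreasing = TRUE)[1:3], ]

p <- ggplot(df, aes(x = Total.Household.Income, y = Total.Food.Expenditure,
                    color = Household.Head.Marital.Status)) +
  geom_point() +
  # max.overlaps = Inf: never drop any of these few labels for overlapping
  geom_text_repel(data = top_income,
                  aes(label = Household.Head.Marital.Status),
                  max.overlaps = Inf, show.legend = FALSE) +
  labs(title = "Income vs. Food Expenditure by Marital Status",
       x = "Total Household Income", y = "Total Food Expenditure",
       color = "Marital Status")
print(p)
```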
## Warning: ggrepel: 2343 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
[Figure: Income vs. Food Expenditure by Marital Status with text labels (e.g., Widowed, Married) on points; x-axis: Total Household Income, y-axis: Total Food Expenditure]
[Figures: two further versions of Income vs. Food Expenditure by Marital Status; x-axis: Total Household Income, y-axis: Total Food Expenditure, legend: Marital Status (Annulled, Divorced/Separated, Married, Single, Widowed)]