Week 3 - Home Work

The document provides instructions for cleaning and analyzing a dataset on US state populations from 2010-2011. It involves: 1) Creating a function to read the CSV file from a URL into a dataframe; 2) Cleaning the dataframe by removing unnecessary columns and rows, and changing column names; 3) Storing the cleaned dataframe and calculating summary statistics like the mean population; 4) Finding the most populous state in 2011 and sorting the data by 2011 population.

Uploaded by

Anu Maria

Step 1: Create a function (named readStates) to read a CSV file into R

1. Note that you are to read a URL, not a file local to your computer.
2. The file is a dataset on state populations (within the United States).

The URL is: http://www2.census.gov/programs-surveys/popest/tables/2010-2011/state/totals/nst-est2011-01.csv

> ReadStates <- function()
+ {
+ # Read the CSV directly from the URL; header = FALSE keeps the title rows as data rows
+ StatePopulation <- read.csv("http://www2.census.gov/programs-surveys/popest/tables/2010-2011/state/totals/nst-est2011-01.csv", header = FALSE)
+ return(StatePopulation)
+ }
> MyData <- ReadStates()
> #MyData
>

Step 2: Clean the dataframe

3. Note the issues that need to be fixed (removing columns, removing rows,
changing column names).
4. Within your function, make sure there are 51 rows (one per state plus the District of
Columbia). Make sure there are only 5 columns, with the following names:
stateName, Jul2010, Jul2011, base2010, base2011.
5. Make sure the last four columns are numbers (i.e. not strings).
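
The cleaning requirements above can be sketched as a single function. This is only an illustration under stated assumptions: `cleanStates` is a hypothetical name, the sample rows are made up (not real Census rows), and the function assumes that state rows are the ones whose first column starts with a period, which is how the names appear in the printed output of this homework.

```r
# Hypothetical helper: clean a raw Census-style data frame.
# Assumption: state rows are identified by a leading "." in column 1.
cleanStates <- function(raw) {
  # Keep only the state rows and the first five columns
  df <- raw[grepl("^\\.", raw[[1]]), 1:5]
  names(df) <- c("stateName", "Jul2010", "Jul2011", "base2010", "base2011")
  # Drop the leading period from the state name
  df$stateName <- sub("^\\.", "", df$stateName)
  # Strip thousands separators and convert the four count columns to numbers
  df[2:5] <- lapply(df[2:5], function(x) as.numeric(gsub(",", "", x)))
  rownames(df) <- NULL
  df
}

# Tiny made-up sample in the same shape as the raw file
raw <- data.frame(
  V1 = c("table title", ".Aaa", ".Bbb", "footnote"),
  V2 = c("", "1,000", "2,500", ""),
  V3 = c("", "1,001", "2,501", ""),
  V4 = c("", "1,010", "2,510", ""),
  V5 = c("", "1,020", "2,520", ""),
  V6 = c("", "", "", ""),
  stringsAsFactors = FALSE
)
clean <- cleanStates(raw)
str(clean)
```

Selecting rows by pattern rather than by position avoids hard-coding how many header and footer rows the file has, but it only works if the period prefix really is unique to the rows you want, so check `nrow()` against 51 on the real data.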

> # Clean the dataframe

> MyData <- MyData[-1:-9,]
> MyData <- MyData[-52:-58,]
> MyData <- MyData[1:5]
> Datastored <- MyData
>
> # Change the column names
> MyData <- Datastored
> colnames(MyData,do.NULL = TRUE, prefix = "col")
[1] "V1" "V2" "V3" "V4" "V5"
> oldnames = c("V1", "V2", "V3", "V4", "V5")
> newnames = c("StateName","July_2010","July_2011","Base_2010","Base_2011")
> for(i in 1:5) names(MyData)[names(MyData) == oldnames[i]] = newnames[i]
> MyData
StateName July_2010 July_2011 Base_2010 Base_2011
10 .Alabama 4,779,736 4,779,735 4,785,401 4,802,740
11 .Alaska 710,231 710,231 714,146 722,718
12 .Arizona 6,392,017 6,392,013 6,413,158 6,482,505
13 .Arkansas 2,915,918 2,915,921 2,921,588 2,937,979
14 .California 37,253,956 37,253,956 37,338,198 37,691,912
15 .Colorado 5,029,196 5,029,196 5,047,692 5,116,796
16 .Connecticut 3,574,097 3,574,097 3,575,498 3,580,709
17 .Delaware 897,934 897,934 899,792 907,135
18 .District of Columbia 601,723 601,723 604,912 617,996
19 .Florida 18,801,310 18,801,311 18,838,613 19,057,542
20 .Georgia 9,687,653 9,687,660 9,712,157 9,815,210
21 .Hawaii 1,360,301 1,360,301 1,363,359 1,374,810
22 .Idaho 1,567,582 1,567,582 1,571,102 1,584,985
23 .Illinois 12,830,632 12,830,632 12,841,980 12,869,257
24 .Indiana 6,483,802 6,483,800 6,490,622 6,516,922
25 .Iowa 3,046,355 3,046,350 3,050,202 3,062,309
26 .Kansas 2,853,118 2,853,118 2,859,143 2,871,238
27 .Kentucky 4,339,367 4,339,362 4,347,223 4,369,356
28 .Louisiana 4,533,372 4,533,372 4,545,343 4,574,836
29 .Maine 1,328,361 1,328,361 1,327,379 1,328,188
30 .Maryland 5,773,552 5,773,552 5,785,681 5,828,289
31 .Massachusetts 6,547,629 6,547,629 6,555,466 6,587,536
32 .Michigan 9,883,640 9,883,635 9,877,143 9,876,187
33 .Minnesota 5,303,925 5,303,925 5,310,658 5,344,861
34 .Mississippi 2,967,297 2,967,297 2,970,072 2,978,512
35 .Missouri 5,988,927 5,988,927 5,995,715 6,010,688
36 .Montana 989,415 989,415 990,958 998,199
37 .Nebraska 1,826,341 1,826,341 1,830,141 1,842,641
38 .Nevada 2,700,551 2,700,551 2,704,283 2,723,322
39 .New Hampshire 1,316,470 1,316,472 1,316,807 1,318,194
40 .New Jersey 8,791,894 8,791,894 8,799,593 8,821,155
41 .New Mexico 2,059,179 2,059,180 2,065,913 2,082,224
42 .New York 19,378,102 19,378,104 19,395,206 19,465,197
43 .North Carolina 9,535,483 9,535,475 9,560,234 9,656,401
44 .North Dakota 672,591 672,591 674,629 683,932
45 .Ohio 11,536,504 11,536,502 11,537,968 11,544,951
46 .Oklahoma 3,751,351 3,751,354 3,760,184 3,791,508
47 .Oregon 3,831,074 3,831,074 3,838,332 3,871,859
48 .Pennsylvania 12,702,379 12,702,379 12,717,722 12,742,886
49 .Rhode Island 1,052,567 1,052,567 1,052,528 1,051,302
50 .South Carolina 4,625,364 4,625,364 4,637,106 4,679,230
51 .South Dakota 814,180 814,180 816,598 824,082
52 .Tennessee 6,346,105 6,346,110 6,357,436 6,403,353
53 .Texas 25,145,561 25,145,561 25,253,466 25,674,681
54 .Utah 2,763,885 2,763,885 2,775,479 2,817,222
55 .Vermont 625,741 625,741 625,909 626,431
56 .Virginia 8,001,024 8,001,030 8,023,953 8,096,604
57 .Washington 6,724,540 6,724,540 6,742,950 6,830,038
58 .West Virginia 1,852,994 1,852,996 1,854,368 1,855,364
59 .Wisconsin 5,686,986 5,686,986 5,691,659 5,711,767
60 .Wyoming 563,626 563,626 564,554 568,158
> # Clean the StateName, change Census data to numeric.
>
> MyData$StateName <- gsub("\\.","",MyData[,1])
> MyData$July_2010 <- as.numeric(gsub(",","",MyData[,2]))
> MyData$July_2011 <- as.numeric(gsub(",","",MyData[,3]))
> MyData$Base_2010 <- as.numeric(gsub(",","",MyData[,4]))
> MyData$Base_2011 <- as.numeric(gsub(",","",MyData[,5]))
> DataSorted <- MyData
>

Step 3: Store and Explore the dataset

6. Store the dataset into a dataframe, called dfStates.

7. Test your dataframe by calculating the mean for the July2011 data, by doing:
mean(dfStates$Jul2011) → you should get an answer of 6,053,834
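
R prints a mean without thousands separators, so the console result will not look exactly like the comma-grouped figure quoted above. A minimal sketch of one way to format it for comparison, using a made-up vector (the values are illustrative, not Census data):

```r
# Made-up values, for illustration only
pop <- c(1200345, 2500111, 4000001)

avg <- mean(pop)

# formatC with format = "d" prints a whole number and inserts the separators
avgText <- formatC(round(avg), format = "d", big.mark = ",")
print(avgText)   # "2,566,819"
```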

> MyData <- DataSorted

> #Store the dataset into a dataframe, called dfStates
> dfStates <- data.frame(MyData)
>
> #Test your dataframe by calculating the mean for the July2011 data
> Mean <- mean(dfStates$July_2011)
> Mean
[1] 6053834
>

Step 4: Find the state with the Highest Population

8. Based on the July2011 data, what is the population of the state with the highest
population? What is the name of that state?
9. Sort the data, in increasing order, based on the July2011 data.
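
Both questions come down to two base R functions: `which.max()` returns the position of the largest value, and `order()` returns the row positions that put a column in ascending order. A small sketch on a made-up data frame (the names and numbers are hypothetical):

```r
# Made-up populations, for illustration only
df <- data.frame(
  stateName = c("Aaa", "Bbb", "Ccc"),
  Jul2011 = c(300, 900, 500),
  stringsAsFactors = FALSE
)

# Row holding the largest Jul2011 value
top <- df[which.max(df$Jul2011), ]

# Rows in increasing order of Jul2011 (order() is ascending by default)
sorted <- df[order(df$Jul2011), ]

print(top$stateName)      # "Bbb"
print(sorted$stateName)   # "Aaa" "Ccc" "Bbb"
```

Note that `which.max()` returns only the first position if the maximum is tied.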

>
> #what is the population of the state with the highest population?
> MaxPop <- max(MyData$July_2011,na.rm = FALSE)
> MaxPop
[1] 37253956
>
> #What is the name of that state?
> MaxPopState <- MyData$StateName[which.max(MyData$July_2011)]
> MaxPopState
[1] "California"
>
> #Sort data in increasing order
> SortedData <- MyData[order(MyData$July_2011),]
> SortedData
StateName July_2010 July_2011 Base_2010 Base_2011
60 Wyoming 563626 563626 564554 568158
18 District of Columbia 601723 601723 604912 617996
55 Vermont 625741 625741 625909 626431
44 North Dakota 672591 672591 674629 683932
11 Alaska 710231 710231 714146 722718
51 South Dakota 814180 814180 816598 824082
17 Delaware 897934 897934 899792 907135
36 Montana 989415 989415 990958 998199
49 Rhode Island 1052567 1052567 1052528 1051302
39 New Hampshire 1316470 1316472 1316807 1318194
29 Maine 1328361 1328361 1327379 1328188
21 Hawaii 1360301 1360301 1363359 1374810
22 Idaho 1567582 1567582 1571102 1584985
37 Nebraska 1826341 1826341 1830141 1842641
58 West Virginia 1852994 1852996 1854368 1855364
41 New Mexico 2059179 2059180 2065913 2082224
38 Nevada 2700551 2700551 2704283 2723322
54 Utah 2763885 2763885 2775479 2817222
26 Kansas 2853118 2853118 2859143 2871238
13 Arkansas 2915918 2915921 2921588 2937979
34 Mississippi 2967297 2967297 2970072 2978512
25 Iowa 3046355 3046350 3050202 3062309
16 Connecticut 3574097 3574097 3575498 3580709
46 Oklahoma 3751351 3751354 3760184 3791508
47 Oregon 3831074 3831074 3838332 3871859
27 Kentucky 4339367 4339362 4347223 4369356
28 Louisiana 4533372 4533372 4545343 4574836
50 South Carolina 4625364 4625364 4637106 4679230
10 Alabama 4779736 4779735 4785401 4802740
15 Colorado 5029196 5029196 5047692 5116796
33 Minnesota 5303925 5303925 5310658 5344861
59 Wisconsin 5686986 5686986 5691659 5711767
30 Maryland 5773552 5773552 5785681 5828289
35 Missouri 5988927 5988927 5995715 6010688
52 Tennessee 6346105 6346110 6357436 6403353
12 Arizona 6392017 6392013 6413158 6482505
24 Indiana 6483802 6483800 6490622 6516922
31 Massachusetts 6547629 6547629 6555466 6587536
57 Washington 6724540 6724540 6742950 6830038
56 Virginia 8001024 8001030 8023953 8096604
40 New Jersey 8791894 8791894 8799593 8821155
43 North Carolina 9535483 9535475 9560234 9656401
20 Georgia 9687653 9687660 9712157 9815210
32 Michigan 9883640 9883635 9877143 9876187
45 Ohio 11536504 11536502 11537968 11544951
48 Pennsylvania 12702379 12702379 12717722 12742886
23 Illinois 12830632 12830632 12841980 12869257
19 Florida 18801310 18801311 18838613 19057542
42 New York 19378102 19378104 19395206 19465197
53 Texas 25145561 25145561 25253466 25674681
14 California 37253956 37253956 37338198 37691912
>

Step 5: Explore the distribution of the states

10. Write a function that takes two parameters. The first is a vector and the
second is a number.
11. The function will return the percentage of the elements within the vector
that are less than the number (i.e. the cumulative distribution below the value
provided).
12. For example, if the vector had 5 elements (1,2,3,4,5), with 2 being the
number passed into the function, the function would return 0.2 (since 20% of
the numbers were below 2).
13. Test the function with the vector ‘dfStates$Jul2011Num’, and the mean of
‘dfStates$Jul2011Num’.
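
The worked example in item 12 can be reproduced with a very short function. This is a sketch, not the required solution; `fractionBelow` is a hypothetical name. It relies on the fact that the mean of a logical vector is the share of `TRUE` values.

```r
# Hypothetical helper: share of elements strictly below num, as a fraction
fractionBelow <- function(vec, num) {
  mean(vec < num)
}

result <- fractionBelow(c(1, 2, 3, 4, 5), 2)
print(result)   # 0.2
```

Multiply the result by 100 if a percentage on a 0 to 100 scale is wanted instead of a fraction.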

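> MyFunction <- function(MyVector,MyNumber)
+ {
+ # TRUE for each element strictly below MyNumber
+ Value <- MyVector < MyNumber
+ MyVal <- length(which(Value,arr.ind = FALSE, useNames = TRUE))
+ NumVect <- length(MyVector)
+ # Share of elements below MyNumber, as a percentage (0-100) rather than a fraction
+ CumulativeDist <- (MyVal/NumVect)*100
+ return(CumulativeDist)
+ }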
> MyFunction(dfStates$July_2011,Mean)
[1] 66.66667
