Week 3 - Home Work
1. Note that you are to read a URL, not a file local to your computer.
2. The file is a dataset on state populations (within the United States).
3. Note the issues that need to be fixed (removing columns, removing rows,
changing column names).
4. Within your function, make sure there are 51 rows (one per state plus the District of
Columbia). Make sure there are exactly 5 columns, named as follows:
stateName, Jul2010, Jul2011, base2010, base2011.
5. Make sure the last four columns are numbers (i.e. not strings).
8. Based on the Jul2011 data, what is the population of the state with the highest
population? What is the name of that state?
9. Sort the data, in increasing order, based on the Jul2011 data.
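Steps 1 to 5 above can be sketched as a pair of functions. This is a minimal sketch, not the assigned solution: the names cleanStates and readStates, the keepRows and keepCols arguments, and the assumption that the numbers arrive as strings with comma separators are all hypothetical and must be adapted to the actual file.

```r
# Sketch only: which rows and columns to keep depends on the real file,
# so both are passed in as (hypothetical) arguments.
cleanStates <- function(raw, keepRows, keepCols = 1:5) {
  # Drop the unwanted rows and columns
  df <- raw[keepRows, keepCols]
  # Assumes the kept columns already appear in this order
  names(df) <- c("stateName", "Jul2010", "Jul2011", "base2010", "base2011")
  # Strip thousands separators, then convert the last four columns to numbers
  for (col in names(df)[-1]) {
    df[[col]] <- as.numeric(gsub(",", "", df[[col]]))
  }
  rownames(df) <- NULL
  df
}

# Reads from a URL (not a local file), cleans, and checks the final shape
readStates <- function(url, keepRows, keepCols = 1:5) {
  raw <- read.csv(url, header = FALSE, stringsAsFactors = FALSE)
  df <- cleanStates(raw, keepRows, keepCols)
  stopifnot(nrow(df) == 51, ncol(df) == 5)
  df
}
```

Keeping the cleaning separate from the download makes it possible to try cleanStates on a small hand-built data frame before pointing readStates at the real URL.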
>
> #what is the population of the state with the highest population?
> MaxPop <- max(MyData$July_2011,na.rm = FALSE)
> MaxPop
[1] 37253956
>
> #What is the name of that state?
> MaxPopState <- MyData$StateName[which.max(MyData$July_2011)]
> MaxPopState
[1] "California"
>
> #Sort data in increasing order
> SortedData <- MyData[order(MyData$July_2011),]
> SortedData
StateName July_2010 July_2011 Base_2010 Base_2011
60 Wyoming 563626 563626 564554 568158
18 District of Columbia 601723 601723 604912 617996
55 Vermont 625741 625741 625909 626431
44 North Dakota 672591 672591 674629 683932
11 Alaska 710231 710231 714146 722718
51 South Dakota 814180 814180 816598 824082
17 Delaware 897934 897934 899792 907135
36 Montana 989415 989415 990958 998199
49 Rhode Island 1052567 1052567 1052528 1051302
39 New Hampshire 1316470 1316472 1316807 1318194
29 Maine 1328361 1328361 1327379 1328188
21 Hawaii 1360301 1360301 1363359 1374810
22 Idaho 1567582 1567582 1571102 1584985
37 Nebraska 1826341 1826341 1830141 1842641
58 West Virginia 1852994 1852996 1854368 1855364
41 New Mexico 2059179 2059180 2065913 2082224
38 Nevada 2700551 2700551 2704283 2723322
54 Utah 2763885 2763885 2775479 2817222
26 Kansas 2853118 2853118 2859143 2871238
13 Arkansas 2915918 2915921 2921588 2937979
34 Mississippi 2967297 2967297 2970072 2978512
25 Iowa 3046355 3046350 3050202 3062309
16 Connecticut 3574097 3574097 3575498 3580709
46 Oklahoma 3751351 3751354 3760184 3791508
47 Oregon 3831074 3831074 3838332 3871859
27 Kentucky 4339367 4339362 4347223 4369356
28 Louisiana 4533372 4533372 4545343 4574836
50 South Carolina 4625364 4625364 4637106 4679230
10 Alabama 4779736 4779735 4785401 4802740
15 Colorado 5029196 5029196 5047692 5116796
33 Minnesota 5303925 5303925 5310658 5344861
59 Wisconsin 5686986 5686986 5691659 5711767
30 Maryland 5773552 5773552 5785681 5828289
35 Missouri 5988927 5988927 5995715 6010688
52 Tennessee 6346105 6346110 6357436 6403353
12 Arizona 6392017 6392013 6413158 6482505
24 Indiana 6483802 6483800 6490622 6516922
31 Massachusetts 6547629 6547629 6555466 6587536
57 Washington 6724540 6724540 6742950 6830038
56 Virginia 8001024 8001030 8023953 8096604
40 New Jersey 8791894 8791894 8799593 8821155
43 North Carolina 9535483 9535475 9560234 9656401
20 Georgia 9687653 9687660 9712157 9815210
32 Michigan 9883640 9883635 9877143 9876187
45 Ohio 11536504 11536502 11537968 11544951
48 Pennsylvania 12702379 12702379 12717722 12742886
23 Illinois 12830632 12830632 12841980 12869257
19 Florida 18801310 18801311 18838613 19057542
42 New York 19378102 19378104 19395206 19465197
53 Texas 25145561 25145561 25253466 25674681
14 California 37253956 37253956 37338198 37691912
>
10. Write a function that takes two parameters. The first is a vector and the
second is a number.
11. The function will return the percentage of the elements within the vector
that are less than that number (i.e. the cumulative distribution below the value
provided).
12. For example, if the vector had 5 elements (1,2,3,4,5), with 2 being the
number passed into the function, the function would return 0.2 (since 20% of
the numbers were below 2).
13. Test the function with the vector ‘dfStates$Jul2011Num’ and the mean of
‘dfStates$Jul2011Num’.
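Steps 10 to 13 can be sketched as follows. The function name percentBelow is hypothetical, and ignoring missing values is an assumption the assignment does not state. The result is returned as a fraction (0.2 for 20%), matching the example in step 12.

```r
# Fraction of the elements in `vec` that are strictly less than `value`.
# `vec < value` yields a logical vector; its mean is the share of TRUEs.
percentBelow <- function(vec, value) {
  mean(vec < value, na.rm = TRUE)  # assumption: missing values are ignored
}

# The example from step 12: only 1 is below 2, so one element in five
percentBelow(c(1, 2, 3, 4, 5), 2)  # 0.2

# Step 13, once dfStates exists with a numeric Jul2011Num column:
# percentBelow(dfStates$Jul2011Num, mean(dfStates$Jul2011Num))
```

Because a few very large states pull the mean upward, the result for step 13 will likely be well above 0.5.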