Programming for Data Science Assignment-2
After I have this file in RStudio, I will likely need to make the following changes:
i. Since we only want data from the 50 US states, I'll need to filter out rows related to the District
of Columbia and Puerto Rico and also the rows above and below the data.
ii. Columns such as population estimates, changes, and percentages may be read in as
character strings. These will need to be converted into numeric types for calculations and
analysis.
iii. We observe missing values represented by a dash ('-'), indicating zero or no data. We will need to
replace these dashes with NA or 0.
iv. The column headers are spread across multiple rows. We will need to collapse these into a single
header row and assign clear column names.
v. We will have to make sure the data types align with our needs for analysis.
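As a rough sketch of steps i, ii, iii and v, the code below works on a small made-up data frame rather than the real file. The column names (area, population, change) and all values are assumptions for illustration only; state.name is the built-in R vector of the 50 state names, which conveniently excludes the District of Columbia and Puerto Rico.

```r
# Toy stand-in for the raw import; column names and values are made up.
raw <- data.frame(
  area       = c("Table title", "Alabama", "Alaska", "District of Columbia",
                 "Puerto Rico", "Footnote text"),
  population = c(NA, "5,000,000", "700,000", "600,000", "3,000,000", NA),
  change     = c(NA, "1,200", "-", "300", "-", NA),
  stringsAsFactors = FALSE
)

# i. Keep only rows whose area is one of the 50 states.
#    This also drops the title and footnote rows above and below the data.
states <- raw[raw$area %in% state.name, ]

# iii. Treat a lone dash as missing (NA is used here; 0 is the alternative).
states$change[states$change == "-"] <- NA

# ii. Strip thousands separators and convert character columns to numeric.
to_number <- function(x) as.numeric(gsub(",", "", x))
states$population <- to_number(states$population)
states$change     <- to_number(states$change)

# v. Confirm the resulting column types.
sapply(states, class)
```

I chose NA over 0 in this sketch so that a missing value is not mistaken for a true zero in later calculations; which one is appropriate depends on what the dash means in the source table.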
2. Importing Dataset:
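A minimal sketch of one way the import could look, assuming the file is a CSV with a title row and two header rows above the data. To keep the example runnable on its own it writes a small stand-in file to a temporary path; the file contents, the number of rows skipped, and the column names are all hypothetical and would need to match the real file.

```r
# Write a small stand-in file so the sketch runs on its own;
# the path to the real file would replace `path`.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Hypothetical table title",
  "Area,Population,Population",
  ",Estimate,Change",
  "Alabama,5000000,1200",
  "Alaska,700000,-",
  "Hypothetical footnote"
), path)

# Skip the title and the two header rows, then supply one clean set of names.
# nrows leaves out the footnote; na.strings turns "-" into NA during the read.
df <- read.csv(
  path,
  skip = 3,
  nrows = 2,
  header = FALSE,
  col.names = c("area", "pop_estimate", "pop_change"),
  na.strings = "-",
  stringsAsFactors = FALSE
)
```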
3. Using str(), summary(), and View() functions:
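The calls below illustrate what each function reports, using a small hypothetical data frame in place of the imported dataset (the column names and values are assumptions, not the real data).

```r
# Hypothetical cleaned data frame standing in for the imported dataset.
df <- data.frame(
  area         = c("Alabama", "Alaska", "Arizona"),
  pop_estimate = c(5000000, 700000, 7000000),
  pop_change   = c(1200, NA, 3400),
  stringsAsFactors = FALSE
)

str(df)      # column types and a preview of the first values
summary(df)  # min, quartiles, mean and max per numeric column, plus NA counts

# View() opens a spreadsheet-style viewer, so it is only called
# when the code runs in an interactive session such as RStudio.
if (interactive()) View(df)
```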