Lab4-Factors & DataFrames
> f <- factor(c("small", "small", "medium", "large", "small", "large"))
> f
## [1] small small medium large small large
## Levels: large medium small
> levels(f)
## [1] "large" "medium" "small"
By default, factor levels for character vectors are created in alphabetical order.
To see the underlying integer representation of a factor, use unclass():
> unclass(f)
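A minimal sketch of what unclass() exposes for the factor f above: the integer codes plus a "levels" attribute that maps each code to its label. The variable name `codes` is illustrative only.

```r
# Recreate the factor f from this section
f <- factor(c("small", "small", "medium", "large", "small", "large"))

# unclass() drops the "factor" class, leaving the integer codes plus a
# "levels" attribute that maps each code to its label
codes <- unclass(f)   # 'codes' is an illustrative name
print(codes)

# as.integer() returns the same codes without the attribute
as.integer(f)         # 3 3 2 1 3 1

# Indexing the levels by the codes recovers the original labels
levels(f)[as.integer(f)]
```

Because the levels are sorted alphabetically (large, medium, small), "small" is stored as 3 and "large" as 1.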
For vectors representing ordinal variables, add the argument ordered=TRUE
to the factor() function. Given the vector
status <- c("Poor", "Improved", "Excellent", "Poor")
the statement
status <- factor(status, ordered=TRUE)
will encode the vector as (3, 2, 1, 3) and associate these values internally
as 1=Excellent, 2=Improved, and 3=Poor.
You can override the default by specifying a levels option. For
example,
status <- factor(status, ordered=TRUE,
levels=c("Poor", "Improved", "Excellent"))
would assign the levels as 1=Poor, 2=Improved, 3=Excellent.
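The two encodings of status described above can be checked directly. This is a small sketch; the names `s_default` and `s_custom` are illustrative. It also shows that, for an ordered factor, comparisons follow the level order.

```r
status <- c("Poor", "Improved", "Excellent", "Poor")

# Default: levels sorted alphabetically
s_default <- factor(status, ordered = TRUE)   # illustrative name
levels(s_default)       # "Excellent" "Improved" "Poor"
as.integer(s_default)   # 3 2 1 3

# Explicit levels: the order follows the levels argument
s_custom <- factor(status, ordered = TRUE,
                   levels = c("Poor", "Improved", "Excellent"))
as.integer(s_custom)    # 1 2 3 1

# For an ordered factor, < and > compare positions in the level order
s_custom < "Excellent"  # TRUE TRUE FALSE TRUE
max(s_custom)           # Excellent (the highest level present)
```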
Changing the order of the levels like this changes the way many functions handle
the factor. In practice, the level order mostly affects how summary
information is printed and how factors are plotted.
f <- factor(c("small", "small", "medium",
"large", "small",
"large"))
f
## [1] small small medium large small large
## Levels: large medium small
summary(f)
## large medium small
## 2 1 3
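To see the effect of level order on summary(), the same data can be given its levels in a natural size order. A minimal sketch; `sizes` and `f2` are illustrative names.

```r
sizes <- c("small", "small", "medium", "large", "small", "large")

# Default: alphabetical levels, so summary() counts large, medium, small
summary(factor(sizes))

# 'f2' is an illustrative name; levels listed in a natural size order
f2 <- factor(sizes, levels = c("small", "medium", "large"))
summary(f2)           # small = 3, medium = 1, large = 2
levels(f2)            # "small" "medium" "large"

# A bar plot of f2, e.g. plot(f2), would draw the bars in the same order
```

The counts are unchanged; only the order in which they are reported differs.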
Data Frame
A data frame is a list of equal-length vectors. It can be thought of as
a rectangular structure in which each column is a variable and each row is an
observation.
Example 1:
data <- data.frame(x = 1:3, y = 4:6, z = c("one", "two", "three"),
                   stringsAsFactors = TRUE)  # needed in R >= 4.0.0 for z to be a factor
str(data)
## 'data.frame': 3 obs. of 3 variables:
## $ x: int 1 2 3
## $ y: int 4 5 6
## $ z: Factor w/ 3 levels "one","three",..: 1 3 2
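The description of a data frame as "a list of equal-length vectors" can be checked with a few base R functions. A small sketch; `df` and `res` are illustrative names, and z stays a character column here because stringsAsFactors is left at its R >= 4.0.0 default.

```r
# 'df' is an illustrative name; z stays character in R >= 4.0.0
df <- data.frame(x = 1:3, y = 4:6, z = c("one", "two", "three"))

is.list(df)          # TRUE: a data frame is a list underneath
length(df)           # 3: one list element per column
sapply(df, length)   # every column has length 3 (one value per row)
dim(df)              # 3 3: rows (observations) by columns (variables)
df[[2]]              # the second column as a plain vector: 4 5 6
df[2, ]              # the second row, i.e. one observation

# Columns of different lengths (that cannot be recycled) are rejected
res <- tryCatch(data.frame(a = 1:3, b = 1:2), error = function(e) e)
inherits(res, "error")  # TRUE
```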
Example 2:
> match_stat <- data.frame(name = character(0), matches = numeric(0),
    innings = numeric(0), highestscore = numeric(0), avg = numeric(0))
creates an empty data frame with the given variable names and modes, and then the command
> match_stat <- edit(match_stat)
opens a spreadsheet-like data editor that allows you to enter your data manually.
Invoking match_stat <- edit(match_stat) again allows you to edit the data you’ve entered and
to add new data.
Note:
The edit() function operates on a copy of the object, so the result of the editing must be
assigned back to the object (match_stat) itself. If you don’t assign it to a destination, all
the edits will be lost.
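Since edit() needs an interactive session, a script can build the same five rows directly with data.frame(). This is a sketch of a non-interactive alternative, not the lab's method; the values are copied from the match_stat table printed below.

```r
# Scripted alternative to edit(): enter the same five rows directly
match_stat <- data.frame(
  name         = c("Tendulkar", "Ponting", "kallis", "Dravid", "cook"),
  matches      = c(200, 168, 166, 164, 161),
  innings      = c(329, 287, 280, 286, 291),
  highestscore = c(248, 257, 224, 270, 294),
  avg          = c(53.78, 51.85, 55.37, 52.31, 45.35)
)

str(match_stat)            # 5 obs. of 5 variables
tail(match_stat, n = 3)    # last three rows: kallis, Dravid, cook
```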
> tail(match_stat, n = 3)   # display the last 3 rows
> match_stat$half_cent <- c(68, 62, 58, 63, 57)
> match_stat$cent <- c(51, 41, 45, 36, 33)
> match_stat
       name matches innings highestscore   avg half_cent cent
1 Tendulkar     200     329          248 53.78        68   51
2   Ponting     168     287          257 51.85        62   41
3    kallis     166     280          224 55.37        58   45
4    Dravid     164     286          270 52.31        63   36
5      cook     161     291          294 45.35        57   33
After two more players are added (for example, by calling edit() again), the data frame is:
> match_stat
        name matches innings highestscore   avg half_cent cent
1  Tendulkar     200     329          248 53.78        68   51
2    Ponting     168     287          257 51.85        62   41
3     kallis     166     280          224 55.37        58   45
4     Dravid     164     286          270 52.31        63   36
5       cook     161     291          294 45.35        57   33
6 sangakkara     134     233          319 57.40        52   38
7       lara     131     232          400 52.80        48   34
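As a scripted alternative to calling edit() again, the two extra rows can be appended with rbind(), which matches columns by name. A minimal sketch; `new_rows` is an illustrative name and the values come from the 7-row table above.

```r
# Five-row table after the half_cent and cent columns were added
match_stat <- data.frame(
  name         = c("Tendulkar", "Ponting", "kallis", "Dravid", "cook"),
  matches      = c(200, 168, 166, 164, 161),
  innings      = c(329, 287, 280, 286, 291),
  highestscore = c(248, 257, 224, 270, 294),
  avg          = c(53.78, 51.85, 55.37, 52.31, 45.35),
  half_cent    = c(68, 62, 58, 63, 57),
  cent         = c(51, 41, 45, 36, 33)
)

# 'new_rows' is an illustrative name; its column names must match
new_rows <- data.frame(
  name = c("sangakkara", "lara"), matches = c(134, 131),
  innings = c(233, 232), highestscore = c(319, 400),
  avg = c(57.40, 52.80), half_cent = c(52, 48), cent = c(38, 34)
)

# rbind() appends the rows, matching columns by name
match_stat <- rbind(match_stat, new_rows)
nrow(match_stat)   # 7
match_stat
```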
3. Accessing by condition
> match_stat[match_stat$name == "Tendulkar", ]   # access the row corresponding to player Tendulkar
> match_stat[which(match_stat$highestscore == max(match_stat$highestscore)), c(1, 5)]   # display the name and average of the player with the maximum highestscore
> match_stat[match_stat$name == "Tendulkar", 2] <- 201   # modify Tendulkar’s number of matches to 201
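Conditional access works because a comparison returns one TRUE/FALSE per row, and only TRUE rows are kept. The sketch below uses a reduced copy of the table (four columns) and selects columns by name; `is_tendulkar`, `best` and `good` are illustrative names. which.max() is shown as a shorter equivalent of which(x == max(x)) for a single maximum.

```r
# Reduced copy of the 7-row table (name, matches, highestscore, avg only)
match_stat <- data.frame(
  name         = c("Tendulkar", "Ponting", "kallis", "Dravid", "cook",
                   "sangakkara", "lara"),
  matches      = c(200, 168, 166, 164, 161, 134, 131),
  highestscore = c(248, 257, 224, 270, 294, 319, 400),
  avg          = c(53.78, 51.85, 55.37, 52.31, 45.35, 57.40, 52.80)
)

# A condition produces one TRUE/FALSE per row; TRUE rows are kept
is_tendulkar <- match_stat$name == "Tendulkar"   # illustrative name
match_stat[is_tendulkar, ]

# which.max() gives the row index of the (first) maximum directly
best <- match_stat[which.max(match_stat$highestscore), c("name", "avg")]
best                                             # lara, 52.8

# Conditions combine with & (and) and | (or)
good <- match_stat[match_stat$avg > 52 & match_stat$matches > 160, "name"]
good                                             # "Tendulkar" "kallis" "Dravid"

# The same logical vector can select cells to modify
match_stat[is_tendulkar, "matches"] <- 201

# Comparisons are case-sensitive: no row matches, so nothing would change
any(match_stat$name == "tendulkar")              # FALSE
```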
4. Subset command:
> subset(match_stat, matches > 165, select = c(name, matches))   # select the names and matches of players with more than 165 matches
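subset() lets you refer to columns by bare name, without the match_stat$ prefix. The sketch below compares it with the equivalent bracket indexing on a reduced copy of the table; `s1` and `s2` are illustrative names. One difference worth knowing: subset() treats an NA in the condition as FALSE, whereas bracket indexing with an NA produces a row of NAs.

```r
match_stat <- data.frame(
  name    = c("Tendulkar", "Ponting", "kallis", "Dravid", "cook"),
  matches = c(200, 168, 166, 164, 161),
  avg     = c(53.78, 51.85, 55.37, 52.31, 45.35)
)

# subset(): column names are used directly, without match_stat$
s1 <- subset(match_stat, matches > 165, select = c(name, matches))

# Equivalent bracket indexing ('s1'/'s2' are illustrative names)
s2 <- match_stat[match_stat$matches > 165, c("name", "matches")]

s1                   # Tendulkar 200, Ponting 168, kallis 166
all.equal(s1, s2)    # TRUE
```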