Lab4-Factors & DataFrames

Factors represent categorical data and can be ordered or unordered. They are created using the factor() function, which stores categorical values as integers mapped to character strings representing the categories, or levels. A data frame is a list of equal-length vectors that can be thought of as a rectangular structure with columns as variables and rows as observations. It allows variables of different types to be stored together. New rows can be added to an existing data frame with rbind(), and new columns with the $ operator.


Factors

• Factors are used to represent categorical data and can be unordered or ordered.
• Factors can be considered as an integer vector where each integer has a label.
Factor objects can be created with the factor() function. The function factor()
stores the categorical values as a vector of integers in the range [1... k] (where
k is the number of unique values in the categorical variable), and an internal
vector of character strings (the original values) mapped to these integers.
• A factor is usually constructed by giving it a vector of strings. These are
translated into the different categories, and the factor becomes a vector of
these categories. The categories are called “levels”.
Example:
>f <- factor(c("small", "small", "medium", "large", "small","large"))

>f
## [1] small small medium large small large
## Levels: large medium small

>levels(f)
## [1] "large" "medium" "small"

By default, factor levels for character vectors are created in alphabetical order.
To see the underlying integer representation of a factor, use the command
> unclass(f)
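The integer-plus-labels representation described above can be inspected directly. The following is a minimal sketch using the same vector as the example; the variable names codes, labs and restored are introduced here only for illustration.

```r
# Same factor as in the example above
f <- factor(c("small", "small", "medium", "large", "small", "large"))

codes <- as.integer(f)   # one integer code per element
labs  <- levels(f)       # one character label per level

print(codes)   # 3 3 2 1 3 1
print(labs)    # "large" "medium" "small"

# Indexing the labels by the codes recovers the original strings
restored <- labs[codes]
print(restored)
```

Because the levels are sorted alphabetically, "large" receives code 1 and "small" receives code 3.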

For vectors representing ordinal variables, add the argument ordered=TRUE
to the factor() function. Given the vector
status <- c("Poor", "Improved", "Excellent", "Poor")
the statement
status <- factor(status, ordered=TRUE)
will encode the vector as (3, 2, 1, 3) and associate these values internally
as 1=Excellent, 2=Improved, and 3=Poor, because levels default to alphabetical order.
You can override the default by specifying the levels argument. For
example,
status <- factor(status, ordered=TRUE,
levels=c("Poor", "Improved", "Excellent"))
would assign the levels as 1=Poor, 2=Improved, 3=Excellent.
Changing the order of the levels in this way changes how a number of functions
treat the factor. It mostly affects how summary information is printed and how
factors are plotted.
f <- factor(c("small", "small", "medium",
"large", "small",
"large"))
f
## [1] small small medium large small large
## Levels: large medium small
summary(f)
## large medium small
## 2 1 3

ff <- factor(c("small", "small", "medium",
"large", "small",
"large"), levels = c("small", "medium",
"large"))
ff
## [1] small small medium large small large
## Levels: small medium large
summary(ff)
## small medium large
## 3 1 2
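Level order matters for more than printing once a factor is ordered. The sketch below returns to the status vector from the ordinal example and assumes the levels run from "Poor" to "Excellent"; it shows that comparisons and max() follow that order.

```r
# Ordered factor with an explicit level order (lowest to highest)
status <- factor(c("Poor", "Improved", "Excellent", "Poor"),
                 levels = c("Poor", "Improved", "Excellent"),
                 ordered = TRUE)

codes <- as.integer(status)        # 1 2 3 1
below_top <- status < "Excellent"  # TRUE TRUE FALSE TRUE
best <- max(status)                # highest level present: Excellent

print(codes)
print(below_top)
print(best)
```

For an unordered factor the same comparison is not meaningful, which is why ordered=TRUE is needed for ordinal data.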

Data Frame
A data frame is a list of equal-length vectors. A data frame can be thought of as
a rectangular structure where each column is a variable and each row is an
observation.

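Example 1:
data <- data.frame(x = 1:3, y = 4:6, z=c("one", "two", "three"), stringsAsFactors = TRUE)
str(data)
## 'data.frame': 3 obs. of 3 variables:
## $ x: int 1 2 3
## $ y: int 4 5 6
## $ z: Factor w/ 3 levels "one","three",..: 1 3 2
Note: stringsAsFactors = TRUE is what makes z a factor. It was the default before
R 4.0.0; from R 4.0.0 onwards the default is FALSE, and z would be stored as chr.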
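The description of a data frame as a list of equal-length vectors can be checked directly. This is a small sketch; the names col_lengths and result are introduced here for illustration only.

```r
data <- data.frame(x = 1:3, y = 4:6, z = c("one", "two", "three"))

print(is.list(data))   # a data frame is a list: TRUE
print(length(data))    # list length = number of columns: 3

# Every column has the same length, equal to the number of rows
col_lengths <- sapply(data, length)
print(col_lengths)

# Columns whose lengths cannot be reconciled are rejected
result <- tryCatch(data.frame(x = 1:3, y = 1:2),
                   error = function(e) "error")
print(result)
```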

Example 2:

Consider the following data from the ESPN cricinfo website:

Name       Matches  Innings  Highestscore  Average
Tendulkar  200      329      248           53.78
Ponting    168      287      257           51.85
Kallis     166      280      224           55.37
Dravid     164      286      270           52.31
Cook       161      291      294           45.35
A data frame for these batsmen (those with the most runs) can be created as follows:
> match_stat<-data.frame(name=c("Tendulkar","Ponting","kallis","Dravid","cook"),
matches=c(200,168,166,164,161),innings=c(329,287,280,286,291),
highestscore=c(248,257,224,270,294),avg=c(53.78,51.85,55.37,52.31,45.35),
stringsAsFactors=TRUE)
> match_stat
name matches innings highestscore avg
1 Tendulkar 200 329 248 53.78
2 Ponting 168 287 257 51.85
3 kallis 166 280 224 55.37
4 Dravid 164 286 270 52.31
5 cook 161 291 294 45.35

Alternatively, the data can be entered interactively. The command
>match_stat<-data.frame(name=character(0),matches=numeric(0),
innings=numeric(0),highestscore=numeric(0),avg=numeric(0))
creates an empty data frame with the given variable names and modes, and the command
> match_stat <- edit(match_stat)
invokes a text editor that allows you to enter your data manually.
Invoking match_stat <- edit(match_stat) again allows you to edit the data you have entered and
to add new data.
Note:
The result of the editing is assigned back to the object (match_stat) itself. The edit()
function operates on a copy of the object, so if you do not assign the result to a destination,
all the edits will be lost.

Getting structure of data frame:

The structure of the data frame can be obtained with the function str(). The name column is
reported as a Factor when its strings were converted to factors at creation (the default
before R 4.0.0); otherwise it is reported as chr.
>str(match_stat)
'data.frame': 5 obs. of 5 variables:
$ name : Factor w/ 5 levels "cook","Dravid",..: 5 4 3 2 1
$ matches : num 200 168 166 164 161
$ innings : num 329 287 280 286 291
$ highestscore: num 248 257 224 270 294
$ avg : num 53.8 51.9 55.4 52.3 45.4

Getting summary of data in data frame:

The summary of the data in a data frame can be obtained with the function summary().
>summary(match_stat)
        name      matches         innings       highestscore        avg
 cook     :1   Min.   :161.0   Min.   :280.0   Min.   :224.0   Min.   :45.35
 Dravid   :1   1st Qu.:164.0   1st Qu.:286.0   1st Qu.:248.0   1st Qu.:51.85
 kallis   :1   Median :166.0   Median :287.0   Median :257.0   Median :52.31
 Ponting  :1   Mean   :171.8   Mean   :294.6   Mean   :258.6   Mean   :51.73
 Tendulkar:1   3rd Qu.:168.0   3rd Qu.:291.0   3rd Qu.:270.0   3rd Qu.:53.78
               Max.   :200.0   Max.   :329.0   Max.   :294.0   Max.   :55.37
If a data frame has too many rows to display in full, the first or last few rows can be shown
with the functions head() and tail() as follows:
>head(match_stat,n=2)
name matches innings highestscore avg
1 Tendulkar 200 329 248 53.78
2 Ponting 168 287 257 51.85

>tail(match_stat,n=3)
name matches innings highestscore avg
3 kallis 166 280 224 55.37
4 Dravid 164 286 270 52.31
5 cook 161 291 294 45.35

Adding New columns:

A new column can be added to the data frame as shown below. Suppose we want to add the number of
50s and 100s for every player in the data frame. We use the ‘$’ operator to introduce a new column:

>match_stat$half_cent<-c(68,62,58,63,57)
> match_stat$cent<-c(51,41,45,36,33)
> match_stat
name matches innings highestscore avg half_cent cent
1 Tendulkar 200 329 248 53.78 68 51
2 Ponting 168 287 257 51.85 62 41
3 kallis 166 280 224 55.37 58 45
4 Dravid 164 286 270 52.31 63 36
5 cook 161 291 294 45.35 57 33
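A new column does not have to be typed in by hand; it can also be computed from existing columns. The sketch below uses a two-row excerpt of the data, and the column names innings_per_match and played_200 are hypothetical, chosen only for this illustration.

```r
# Two-row excerpt of the batting data
match_stat <- data.frame(name = c("Tendulkar", "Ponting"),
                         matches = c(200, 168),
                         innings = c(329, 287))

# Derived numeric column: element-wise arithmetic on existing columns
match_stat$innings_per_match <- match_stat$innings / match_stat$matches

# Derived logical column, using [[ ]] as an alternative to $
match_stat[["played_200"]] <- match_stat$matches >= 200

print(match_stat)
```

The assigned vector must have one value per row (or a length that recycles evenly), otherwise R raises an error.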

Adding New rows:

New rows can be added to an existing data frame. Suppose we want to add two more players,
Sangakkara and Lara. This can be done with the rbind() function.
>new_match_stat<-data.frame(name=c("sangakkara","lara"),matches=c(134,131),
innings=c(233,232),highestscore=c(319,400),avg=c(57.4,52.8),
half_cent=c(52,48),cent=c(38,34))
> match_stat<-rbind(match_stat,new_match_stat)

> match_stat
name matches innings highestscore avg half_cent cent
1 Tendulkar 200 329 248 53.78 68 51
2 Ponting 168 287 257 51.85 62 41
3 kallis 166 280 224 55.37 58 45
4 Dravid 164 286 270 52.31 63 36
5 cook 161 291 294 45.35 57 33
6 sangakkara 134 233 319 57.40 52 38
7 lara 131 232 400 52.80 48 34
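When two data frames are combined with rbind(), columns are matched by name, so the new rows need the same column names as the existing data frame. The following sketch uses small hypothetical data frames a, b and bad to show both the working case and the failing one.

```r
a <- data.frame(name = c("Tendulkar", "Ponting"), matches = c(200, 168))

# Same column names in a different order: rbind() matches them by name
b <- data.frame(matches = 134, name = "sangakkara")
combined <- rbind(a, b)
print(combined)

# A mismatched column name ("games" instead of "matches") is an error
bad <- data.frame(name = "lara", games = 131)
res <- tryCatch(rbind(a, bad),
                error = function(e) conditionMessage(e))
print(res)
```

This is why the new rows in the example above include the half_cent and cent columns as well.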

Accessing data from data frame:

The data from the data frame can be accessed as follows:
1. Accessing by position(indices)
>match_stat[4,] #accessing 4th row
>match_stat[,4] #accessing 4th column
>match_stat[c(4,5),c(1,2)] #accessing 4th , 5th row and 1st ,2nd column
2. Accessing by column name:
>match_stat$name #access the “name” column

3. Accessing by condition
>match_stat[match_stat$name=="Tendulkar",] #access the row corresponding to player Tendulkar

>match_stat[match_stat$name=="Tendulkar",4] #find the highest score of Tendulkar

>which(match_stat$highestscore>=270) #find the row numbers for which highestscore is greater than or equal to 270

>match_stat[which(match_stat$highestscore==max(match_stat$highestscore)),c(1,5)] #display the name and average of the player with the maximum highestscore

>match_stat[match_stat$name=="Tendulkar",2]<-201 #change Tendulkar's number of matches to 201

4. Subset command:
>subset(match_stat,matches>165,select=c(name,matches)) #select names and matches of players with matches>165

>subset(match_stat,highestscore>250,select=name) #select names of players with highest score >250
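The access methods above can be combined. The sketch below rebuilds part of the batting data and shows two conditions joined with &, the equivalent subset() call, and sorting rows with order(); the result names sel, via_subset and sorted are introduced here for illustration.

```r
match_stat <- data.frame(
  name = c("Tendulkar", "Ponting", "kallis", "Dravid", "cook"),
  matches = c(200, 168, 166, 164, 161),
  highestscore = c(248, 257, 224, 270, 294),
  avg = c(53.78, 51.85, 55.37, 52.31, 45.35)
)

# Rows meeting both conditions, selecting columns by name
sel <- match_stat[match_stat$avg > 50 & match_stat$highestscore > 250,
                  c("name", "avg")]
print(sel)   # Ponting and Dravid

# The same query written with subset()
via_subset <- subset(match_stat, avg > 50 & highestscore > 250,
                     select = c(name, avg))
print(via_subset)

# Rows sorted by average, highest first
sorted <- match_stat[order(-match_stat$avg), ]
print(sorted)
```

Selecting columns by name rather than by position keeps the code correct if columns are later added or reordered.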
