Table Function
Here we look at some examples of how to work with two way tables. We assume that you can enter data
and understand the different data types.
The idea is that 356 people have been polled on their smoking status (Smoke) and their socioeconomic
status (SES). For each person it was determined whether or not they are current smokers, former
smokers, or have never smoked. Also, for each person their socioeconomic status was determined (low,
middle, or high). The data file contains only two columns, and when it is read R interprets them both as
factors (in R 4.0 and later this requires passing stringsAsFactors=TRUE to read.csv):
> smokerData <- read.csv(file='smoker.csv',sep=',',header=T)
> summary(smokerData)
     Smoke         SES
 current:116   High  :211
 former :141   Low   : 93
 never  : 99   Middle: 52
You can create a two way table of occurrences using the table command and the two columns in the data
frame:
> smoke <- table(smokerData$Smoke,smokerData$SES)
> smoke
          High Low Middle
  current   51  43     22
  former    92  28     21
  never     68  22      9
In this example, there are 51 people who are current smokers and are in the high SES. Note that it is
assumed that the two lists given in the table command are both factors. (More information on this is
available in the chapter on data types.)
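As a small illustration of tabulating two factors, here is a minimal sketch. The data frame toy is made up for this example and is not the contents of smoker.csv:

```r
# Hypothetical toy data: six people, NOT taken from smoker.csv.
toy <- data.frame(
  Smoke = factor(c("current", "never", "former", "current", "never", "current")),
  SES   = factor(c("High", "Low", "High", "High", "Middle", "Low"))
)

# table() counts how often each combination of factor levels occurs.
toyTable <- table(toy$Smoke, toy$SES)
toyTable

# Naming the arguments labels the two dimensions in the printed output.
table(Smoke = toy$Smoke, SES = toy$SES)
```

Rows follow the levels of the first factor and columns follow the levels of the second.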
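If you already have the counts, you can also build the table directly from a matrix:
> smoke <- matrix(c(51,43,22,92,28,21,68,22,9),ncol=3,byrow=TRUE)
> colnames(smoke) <- c("High","Low","Middle")
> rownames(smoke) <- c("current","former","never")
> smoke <- as.table(smoke)
> smoke
        High Low Middle
current   51  43     22
former    92  28     21
never     68  22      9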
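The row and column labels can also be supplied when the matrix is created, through the dimnames argument of matrix. This is only an alternative sketch of the same construction, using the same counts:

```r
# Same counts as above, with the labels supplied when the matrix is created.
smoke <- matrix(c(51, 43, 22,
                  92, 28, 21,
                  68, 22,  9),
                ncol = 3, byrow = TRUE,
                dimnames = list(c("current", "former", "never"),
                                c("High", "Low", "Middle")))

# Convert the labelled matrix to a table object.
smoke <- as.table(smoke)
smoke
```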
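A table can also be viewed graphically, for example with a bar plot or with the plot command:
> barplot(smoke,legend=T,beside=T,main='Smoking Status by SES')
> plot(smoke,main="Smoking Status By Socioeconomic Status")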
There are a number of ways to get the marginal distributions using the margin.table command. If you just
give the command the table it calculates the total number of observations. You can also calculate the
marginal distributions across the rows or columns based on the one optional argument:
> margin.table(smoke)
[1] 356
> margin.table(smoke,1)
current  former   never
    116     141      99
> margin.table(smoke,2)
  High    Low Middle
   211     93     52
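The same totals can be checked in other ways. The sketch below rebuilds the table from the counts shown earlier and uses rowSums, colSums, and addmargins, which are standard R functions but are not part of the original walk-through:

```r
# Rebuild the table from the counts used in this section.
smoke <- as.table(matrix(c(51, 43, 22, 92, 28, 21, 68, 22, 9),
                         ncol = 3, byrow = TRUE,
                         dimnames = list(c("current", "former", "never"),
                                         c("High", "Low", "Middle"))))

# Row and column totals; these match margin.table(smoke, 1) and margin.table(smoke, 2).
rowTotals <- rowSums(smoke)
colTotals <- colSums(smoke)

# addmargins() appends a "Sum" row and a "Sum" column to the table.
withTotals <- addmargins(smoke)
withTotals
```

The bottom right entry of withTotals is the total number of observations.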
Dividing by the total number of observations gives the proportions:
> smoke/margin.table(smoke)
              High        Low     Middle
current 0.14325843 0.12078652 0.06179775
former  0.25842697 0.07865169 0.05898876
never   0.19101124 0.06179775 0.02528090
> margin.table(smoke,1)/margin.table(smoke)
  current    former     never
0.3258427 0.3960674 0.2780899
> margin.table(smoke,2)/margin.table(smoke)
     High       Low    Middle
0.5926966 0.2612360 0.1460674
That is a little obtuse, so fortunately, there is a better way to get the proportions using the prop.table
command. You can specify the proportions with respect to the different marginal distributions using the
optional argument:
> prop.table(smoke)
              High        Low     Middle
current 0.14325843 0.12078652 0.06179775
former  0.25842697 0.07865169 0.05898876
never   0.19101124 0.06179775 0.02528090
> prop.table(smoke,1)
             High       Low    Middle
current 0.4396552 0.3706897 0.1896552
former  0.6524823 0.1985816 0.1489362
never   0.6868687 0.2222222 0.0909091
> prop.table(smoke,2)
             High       Low    Middle
current 0.2417062 0.4623656 0.4230769
former  0.4360190 0.3010753 0.4038462
never   0.3222749 0.2365591 0.1730769
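One way to see what the optional argument does is to check which direction sums to one. The following is a sketch that rebuilds the table from the counts above; the use of sweep is an illustration of dividing each row by its total and is not part of the original text:

```r
# Rebuild the table from the counts used in this section.
smoke <- as.table(matrix(c(51, 43, 22, 92, 28, 21, 68, 22, 9),
                         ncol = 3, byrow = TRUE,
                         dimnames = list(c("current", "former", "never"),
                                         c("High", "Low", "Middle"))))

rowProps <- prop.table(smoke, 1)  # each row is divided by its row total
colProps <- prop.table(smoke, 2)  # each column is divided by its column total

rowSums(rowProps)  # every row adds up to 1
colSums(colProps)  # every column adds up to 1

# The same row proportions, computed by dividing each row by its total.
manual <- sweep(smoke, 1, as.vector(margin.table(smoke, 1)), "/")

# Row proportions as percentages, rounded to one decimal place.
round(100 * rowProps, 1)
```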
If you want to do a chi-squared test to determine if the proportions are different, there is an easy way to
do this. If we want to test at the 95% confidence level we need only look at a summary of the table:
> summary(smoke)
Number of cases in table: 356
Number of factors: 2
Test for independence of all factors:
        Chisq = 18.51, df = 4, p-value = 0.0009808
Since the p-value is less than 5% we can reject the null hypothesis at the 95% confidence level and can say
that the proportions vary.
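The same test is also available through the chisq.test command, which returns an object whose pieces can be pulled out individually. This is a sketch that rebuilds the table from the counts above; chisq.test is a standard R function but is not used in the original text:

```r
# Rebuild the table from the counts used in this section.
smoke <- as.table(matrix(c(51, 43, 22, 92, 28, 21, 68, 22, 9),
                         ncol = 3, byrow = TRUE,
                         dimnames = list(c("current", "former", "never"),
                                         c("High", "Low", "Middle"))))

# Pearson's chi-squared test of independence on the two way table.
result <- chisq.test(smoke)
result

result$statistic  # the chi-squared statistic
result$parameter  # the degrees of freedom
result$p.value    # the p-value
result$expected   # the table of expected counts under independence
```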
Of course, there is a hard way to do this. This is not for the faint of heart and involves some linear algebra
which we will not describe. If you wish to calculate the table of expected values then you need to multiply
the vectors of the margins and divide by the total number of observations:
> expected <- as.array(margin.table(smoke,1)) %*% t(as.array(margin.table(smoke,2))) / margin.table(smoke)
> expected
            High      Low   Middle
current 68.75281 30.30337 16.94382
former  83.57022 36.83427 20.59551
never   58.67697 25.86236 14.46067
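The result is an array that can be directly compared to the existing table. We need the square of the
difference between the two tables divided by the expected values. The sum of all these values is the
chi-squared statistic. The p-value is then the upper tail probability of a chi-squared distribution with
(3-1)*(3-1)=4 degrees of freedom: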
> chi <- sum((expected - as.array(smoke))^2/expected)
> chi
[1] 18.50974
> 1-pchisq(chi,df=4)
[1] 0.0009808236
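The same calculation can be written without the explicit matrix product. The sketch below is one alternative under the same counts: it uses outer to form the products of the margins and computes the degrees of freedom from the table dimensions rather than typing 4 by hand. The variable names are chosen for this example only:

```r
# Rebuild the table from the counts used in this section.
smoke <- as.table(matrix(c(51, 43, 22, 92, 28, 21, 68, 22, 9),
                         ncol = 3, byrow = TRUE,
                         dimnames = list(c("current", "former", "never"),
                                         c("High", "Low", "Middle"))))

# Expected counts under independence: (row total * column total) / grand total.
expected <- outer(rowSums(smoke), colSums(smoke)) / sum(smoke)

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
chi <- sum((smoke - expected)^2 / expected)

# Degrees of freedom for a two way table: (rows - 1) * (columns - 1).
degrees <- (nrow(smoke) - 1) * (ncol(smoke) - 1)

# Upper tail probability; equivalent to 1 - pchisq(chi, df = degrees).
pValue <- pchisq(chi, df = degrees, lower.tail = FALSE)
pValue
```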
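Mosaic plots give another graphical view of a two way table. Here we read the data again, build the
table, and look at only a few of the available options:
> smokerData <- read.csv(file='smoker.csv',sep=',',header=T)
> smoke <- table(smokerData$Smoke,smokerData$SES)
> mosaicplot(smoke)
> help(mosaicplot)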
The mosaicplot command takes many of the same arguments for annotating a plot:
> mosaicplot(smoke,main="Smokers",xlab="Status",ylab="Economic Class")
If you wish to switch which side (horizontal versus vertical) determines the primary proportion, you
can use the sort option. This switches whether the width or the height is used for the first
proportional length:
> mosaicplot(smoke,main="Smokers",xlab="Status",ylab="Economic Class")
> mosaicplot(smoke,sort=c(2,1))
Finally, if you wish to switch which side is used for the vertical and horizontal axes you can use the dir option:
> mosaicplot(smoke,main="Smokers",xlab="Status",ylab="Economic Class")
> mosaicplot(smoke,dir=c("v","h"))