Classes - Correspondence Analysis - Data Analysis in Management - MBM
Correspondence analysis – exercises
Exercise 1 (Source: Greenacre, 1984)
A survey of all 193 staff members of a company was
conducted in order to formulate a smoking policy. The smoking
habits of the staff members were recorded. The staff
members were classified according to:
1. their rank (five levels): Senior managers, Junior managers, Senior
employees, Junior employees and Secretaries, and
2. a categorization of their smoking habits (four groups): None,
Light, Medium and Heavy.
The results of the survey can be found in the
„Exercise_1_smoking.csv” file.
Conduct a simple correspondence analysis. Interpret all results.
NOTE!
The „Exercise_1_smoking.csv” file can be
downloaded from our course on the
Moodle platform.
NOTE!
Copy the Exercise_1_smoking.csv file to
drive D, because the R code reads it from D:/Exercise_1_smoking.csv.
The R code required to create and analyze the
contingency table and the correspondence matrix, and
then to conduct a simple correspondence analysis, is
provided in the „CA_classes_exercise_1.R” file, which
can be downloaded from our course on the Moodle
platform.
NOTE!
Remember that the “#” symbol
signifies a comment in R, and everything
on a line after it is ignored.
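Solution: THE R CODE
install.packages("ca") # install the package for correspondence analysis (needed only once)
library(ca) # load the functions for CA
# Read the data set Exercise_1_smoking.csv into the data frame data.smoking
data.smoking <- read.csv2("D:/Exercise_1_smoking.csv", header=TRUE)
# Build the contingency table of Staff_group (rows) versus Smoking_habits (columns).
# NOTE: this step is not shown in the listing; the call below is an assumed form.
contingency.table <- table(data.smoking$Staff_group, data.smoking$Smoking_habits,
                           dnn=c("Staff_group","Smoking_habits"))
# Add row and column marginal totals and create a table with sums for both variables
addmargins(contingency.table, FUN=sum)
# Create a correspondence matrix, that is, create a table of proportions
round(addmargins(prop.table(contingency.table)),4)
# Calculate row profiles, that is, create a table of proportions over rows
round(addmargins(prop.table(contingency.table, margin=1)),4)
# Calculate column profiles, that is, create a table of proportions over columns
round(addmargins(prop.table(contingency.table, margin=2)),4)
# Conduct the correspondence analysis
print("Conducting CA:",quote=FALSE)
model.ca <- ca(contingency.table)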
THE R CODE
Commands in R (lines of the code):
# Read the data set Exercise_1_smoking.csv into the object (data frame) data.smoking
data.smoking <- read.csv2("D:/Exercise_1_smoking.csv", header=TRUE)
# Show the first six records
data.smoking[1:6,]
Output:
> data.smoking[1:6,]
      Staff_group Smoking_habits
1 Senior_managers           None
2 Senior_managers           None
3 Senior_managers           None
4 Senior_managers           None
5 Senior_managers          Light
6 Senior_managers          Light
Solution:
Print the categories of the „Smoking_habits” variable and the
„Staff_group” variable:
THE R CODE
Commands in R (lines of the code):
print("Categories of Smoking_habits variable:",quote=FALSE)
print(table(data.smoking$Smoking_habits))
print("Categories of Staff_group variable:",quote=FALSE)
print(table(data.smoking$Staff_group))
Output:
> print("Categories of Smoking_habits variable:",quote=FALSE)
[1] Categories of Smoking_habits variable:
> print(table(data.smoking$Smoking_habits))
 Heavy  Light Medium   None
    25     45     62     61
> print("Categories of Staff_group variable:",quote=FALSE)
[1] Categories of Staff_group variable:
> print(table(data.smoking$Staff_group))
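Commands in R (lines of the code):
# Add row and column marginal totals
addmargins(contingency.table, FUN=sum)
Output:
> addmargins(contingency.table, FUN=sum)
Margins computed over dimensions
in the following order:
1: Staff_group
2: Smoking_habits
                  Smoking_habits
Staff_group        Heavy Light Medium None sum
  Junior_employees    13    24     33   18  88
  Junior_managers      4     3      7    4  18
  Secretaries          2     6      7   10  25
  Senior_employees     4    10     12   25  51
  Senior_managers      2     2      3    4  11
  sum                 25    45     62   61 193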
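For readers who do not have the CSV file at hand, the same contingency table can be typed in directly from the counts shown above. This is only a sketch in base R: the object name with.totals is introduced here for illustration, and the table is built with matrix() and as.table() rather than from data.smoking as in the course script.

```r
# Counts copied from the contingency table above
# (rows: staff groups, columns: smoking habits). No CSV file is needed.
counts <- matrix(
  c(13, 24, 33, 18,   # Junior_employees
     4,  3,  7,  4,   # Junior_managers
     2,  6,  7, 10,   # Secretaries
     4, 10, 12, 25,   # Senior_employees
     2,  2,  3,  4),  # Senior_managers
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Staff_group    = c("Junior_employees", "Junior_managers", "Secretaries",
                       "Senior_employees", "Senior_managers"),
    Smoking_habits = c("Heavy", "Light", "Medium", "None")
  )
)
contingency.table <- as.table(counts)

# Same call as in the exercise; quiet = TRUE suppresses the message
# about the order in which the margins are computed
with.totals <- addmargins(contingency.table, FUN = sum, quiet = TRUE)
print(with.totals)
```

The row totals give the number of staff members in each group and the column totals give the number in each smoking category; both should add up to 193.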
Solution, continued:
Based on the contingency table, let us develop a
correspondence matrix:
THE R CODE
Commands in R (lines of the code):
# Create a correspondence matrix, that is, create a table of proportions
round(addmargins(prop.table(contingency.table)),4)
Output:
> round(addmargins(prop.table(contingency.table)),4)
                  Smoking_habits
Staff_group         Heavy  Light Medium   None    Sum
  Junior_employees 0.0674 0.1244 0.1710 0.0933 0.4560
  Junior_managers  0.0207 0.0155 0.0363 0.0207 0.0933
  Secretaries      0.0104 0.0311 0.0363 0.0518 0.1295
  Senior_employees 0.0207 0.0518 0.0622 0.1295 0.2642
  Senior_managers  0.0104 0.0104 0.0155 0.0207 0.0570
  Sum              0.1295 0.2332 0.3212 0.3161 1.0000
Solution, continued:
Selected interpretations:
• 6.74% of all staff members are junior employees who are heavy
smokers;
• 23.32% of all staff members are light smokers (see the column
masses in the Sum row);
• 5.70% of all staff members are senior managers (see the row
masses in the Sum column);
NOTE! Please write other interpretations.
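The three percentages above can be checked with a few lines of base R. The sketch below types the counts in by hand; the names N, P, row.masses and col.masses are chosen here for illustration and do not appear in the course script.

```r
# Counts from the contingency table of Exercise 1
N <- matrix(c(13, 24, 33, 18,
               4,  3,  7,  4,
               2,  6,  7, 10,
               4, 10, 12, 25,
               2,  2,  3,  4),
            nrow = 5, byrow = TRUE,
            dimnames = list(
              Staff_group    = c("Junior_employees", "Junior_managers",
                                 "Secretaries", "Senior_employees",
                                 "Senior_managers"),
              Smoking_habits = c("Heavy", "Light", "Medium", "None")))

P <- N / sum(N)            # correspondence matrix: each cell as a share of all 193
row.masses <- rowSums(P)   # share of each staff group
col.masses <- colSums(P)   # share of each smoking category

# The three interpretations, as percentages
joint.pct <- round(100 * P["Junior_employees", "Heavy"], 2)
light.pct <- round(100 * col.masses["Light"], 2)
sm.pct    <- round(100 * row.masses["Senior_managers"], 2)
print(c(joint = joint.pct, light.pct, sm.pct))
```

Every cell of P is a joint proportion, so the cells sum to 1, and the masses are simply its row and column sums.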
Solution, continued:
Let us now compute row profiles:
THE R CODE
Commands in R (lines of the code):
# Calculate row profiles, that is, create a table of proportions over rows, i.e. row percentages
round(addmargins(prop.table(contingency.table, margin=1)),4)
Output:
> round(addmargins(prop.table(contingency.table, margin=1)),4)
                  Smoking_habits
Staff_group         Heavy  Light Medium   None    Sum
  Junior_employees 0.1477 0.2727 0.3750 0.2045 1.0000
  Junior_managers  0.2222 0.1667 0.3889 0.2222 1.0000
  Secretaries      0.0800 0.2400 0.2800 0.4000 1.0000
  Senior_employees 0.0784 0.1961 0.2353 0.4902 1.0000
  Senior_managers  0.1818 0.1818 0.2727 0.3636 1.0000
  Sum              0.7102 1.0573 1.5519 1.6806 5.0000
Solution, continued :
Selected interpretations:
• 14.77% of junior employees are heavy smokers;
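Row profiles are easier to read when they are set against the average row profile, which equals the column masses. The following is a sketch under the assumption that the counts are typed in by hand; the names row.profiles, average.profile and deviation are illustrative. It also passes margin = 2 to addmargins so that only the Sum column is added, since the bottom Sum row of a row-profile table has no interpretation.

```r
# Counts from the contingency table of Exercise 1
N <- as.table(matrix(c(13, 24, 33, 18,
                        4,  3,  7,  4,
                        2,  6,  7, 10,
                        4, 10, 12, 25,
                        2,  2,  3,  4),
                     nrow = 5, byrow = TRUE,
                     dimnames = list(
                       Staff_group    = c("Junior_employees", "Junior_managers",
                                          "Secretaries", "Senior_employees",
                                          "Senior_managers"),
                       Smoking_habits = c("Heavy", "Light", "Medium", "None"))))

# Row profiles: each row divided by its own total
row.profiles <- prop.table(N, margin = 1)

# Add only the Sum column (every row profile sums to 1)
print(round(addmargins(row.profiles, margin = 2), 4))

# Average row profile = column masses of the correspondence matrix
average.profile <- colSums(N) / sum(N)

# Positive values: the category is more common in this group than on average
deviation <- sweep(row.profiles, 2, average.profile)
print(round(deviation, 4))
```

A group whose deviations are all close to zero smokes much like the staff as a whole; large deviations are what the correspondence analysis later displays as distance from the origin of the map.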
Solution, continued:
Let us now compute column profiles:
THE R CODE
Commands in R (lines of the code):
# Calculate column profiles, that is, create a table of proportions over columns, i.e. column percentages
round(addmargins(prop.table(contingency.table, margin=2)),4)
Output:
> round(addmargins(prop.table(contingency.table, margin=2)),4)
                  Smoking_habits
Staff_group         Heavy  Light Medium   None    Sum
  Junior_employees 0.5200 0.5333 0.5323 0.2951 1.8807
  Junior_managers  0.1600 0.0667 0.1129 0.0656 0.4051
  Secretaries      0.0800 0.1333 0.1129 0.1639 0.4902
  Senior_employees 0.1600 0.2222 0.1935 0.4098 0.9856
  Senior_managers  0.0800 0.0444 0.0484 0.0656 0.2384
  Sum              1.0000 1.0000 1.0000 1.0000 4.0000
Solution, continued :
Selected interpretations:
• 16.0% of heavy smokers are junior managers;
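A column profile value is most informative when it is compared with the share of that staff group in the whole company (its row mass). One possible way to do this, sketched below with illustrative names (col.profiles, row.masses, ratio), is to divide each column profile by the row masses: a ratio above 1 means the group is over-represented in that smoking category.

```r
# Counts from the contingency table of Exercise 1
N <- as.table(matrix(c(13, 24, 33, 18,
                        4,  3,  7,  4,
                        2,  6,  7, 10,
                        4, 10, 12, 25,
                        2,  2,  3,  4),
                     nrow = 5, byrow = TRUE,
                     dimnames = list(
                       Staff_group    = c("Junior_employees", "Junior_managers",
                                          "Secretaries", "Senior_employees",
                                          "Senior_managers"),
                       Smoking_habits = c("Heavy", "Light", "Medium", "None"))))

# Column profiles: each column divided by its own total
col.profiles <- prop.table(N, margin = 2)

# Row masses: share of each staff group among all staff members
row.masses <- rowSums(N) / sum(N)

# Ratio of the column profile to the row mass, group by group
ratio <- sweep(col.profiles, 1, row.masses, "/")
print(round(ratio, 2))
```

For example, junior managers make up 16.0% of heavy smokers but a smaller share of all staff members, so their ratio in the Heavy column is above 1.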
THE R CODE
Commands in R (lines of the code):
summary(model.ca)
Output:
> summary(model.ca)
Rows:
    name     mass  qlt  inr    k=1 cor ctr    k=2 cor ctr
1 | Jnr_mp |  456 1000  308 | -233 942 331 |   58  58 152 |
2 | Jnr_mn |   93  991  139 | -259 526  84 | -243 465 551 |
3 | Scrt   |  130  999   71 |  201 865  70 |   79 133  81 |
4 | Snr_mp |  264 1000  450 |  381 999 512 |  -11   1   3 |
5 | Snr_mn |   57  893   31 |   66  92   3 | -194 800 214 |
Columns:
    name     mass  qlt  inr    k=1 cor ctr    k=2 cor ctr
1 | Hevy   |  130  995  192 | -294 684 150 | -198 310 506 |
2 | Lght   |  233  984   83 |  -99 327  31 |  141 657 463 |
3 | Medm   |  321  983  148 | -196 982 166 |    7   1   2 |
4 | None   |  316 1000  577 |  393 994 654 |  -30   6  29 |
Notes on the output:
• Eigenvalues = inertias. Eigenvalues correspond to the amount of
information retained by each axis. Eigenvalues are large for the first
axis and small for the subsequent axes.
• Row masses (mass column of the Rows table): the mass shows the
relative weight of each category in the sample.
• Coordinates for the map: the k=1 and k=2 columns give the coordinates
for the first two dimensions. In the Columns table these are the
column scores.
• The second dimension is mainly characterized by Junior_managers and
Senior_managers (see the ctr column for k=2 in the Rows table).
NOTE!
The quantities in the tables for rows and columns are multiplied by
1000 (e.g., the coordinates and masses)!
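To see where the mass, k, cor and ctr columns come from, the analysis can be reproduced in base R with a singular value decomposition of the standardized residuals. This is a sketch of the standard CA computation, not the code of the ca package; all object names are illustrative and the counts are typed in by hand. The signs of the coordinates are arbitrary, so they may be flipped relative to the table above.

```r
# Counts from the contingency table of Exercise 1
N <- matrix(c(13, 24, 33, 18,
               4,  3,  7,  4,
               2,  6,  7, 10,
               4, 10, 12, 25,
               2,  2,  3,  4),
            nrow = 5, byrow = TRUE,
            dimnames = list(
              c("Junior_employees", "Junior_managers", "Secretaries",
                "Senior_employees", "Senior_managers"),
              c("Heavy", "Light", "Medium", "None")))

P <- N / sum(N)              # correspondence matrix
row.masses <- rowSums(P)
col.masses <- colSums(P)

# Standardized residuals: (observed - expected) / sqrt(expected), in proportions
S <- diag(1 / sqrt(row.masses)) %*%
  (P - outer(row.masses, col.masses)) %*%
  diag(1 / sqrt(col.masses))

dec <- svd(S)
eig <- dec$d^2               # eigenvalues = principal inertias of the axes

# Row principal coordinates (the k=1, k=2 columns of the Rows table)
row.coord <- diag(1 / sqrt(row.masses)) %*% dec$u %*% diag(dec$d)
dimnames(row.coord) <- list(rownames(N), paste0("k=", seq_along(eig)))

# ctr: share of an axis' inertia that is due to each row
ctr.rows <- sweep(row.masses * row.coord[, 1:2]^2, 2, eig[1:2], "/")

# cor: share of each row's own inertia that is explained by an axis
cor.rows <- row.coord[, 1:2]^2 / rowSums(row.coord^2)

# Multiply by 1000 and round, as summary() does
print(round(1000 * row.masses))
print(round(1000 * row.coord[, 1:2]))
print(round(1000 * ctr.rows))
print(round(1000 * cor.rows))
print(round(eig, 4))
```

Up to sign and rounding, the printed values should match the Rows table. The ctr values for the second axis show why that dimension is said to be characterized mainly by Junior_managers and Senior_managers.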
THE R CODE
Commands in R (lines of the code):
print("Print a perceptual map of CA:",quote=FALSE)
plot(model.ca, mass=c(TRUE, TRUE), contrib=c("absolute","absolute"), las=1)
Output:
> print("Print a perceptual map of CA:",quote=FALSE)
[1] Print a perceptual map of CA:
> plot(model.ca, mass=c(TRUE, TRUE), contrib=c("absolute","absolute"), las=1)
Perceptual map (figure)
Solution, continued :
Interpretations:
Exercise 2
A survey of 162 randomly selected customers was conducted in
order to study the relationship between the supermarket in which a
customer shops and the main reason for shopping in that particular
supermarket. Customers of several supermarkets (Aldi, Auchan,
Biedronka, Carrefour, Kaufland, Lewiatan, Lidl, Tesco and Others)
were asked to indicate only one (the most important to them) of seven
reasons for shopping there: frequent promotions (R1), high-quality
products (R2), nearby location (R3), opening hours (R4), low prices
(R5), a wide range of goods (R6) and parking space (R7).
The results of the survey can be found in the
„Exercise_2_supermarkets.csv” file.
1. Create a contingency table of „Supermarket” versus
„Reason_of_shopping”. Interpret results.
2. Develop the correspondence matrix. Interpret results.
3. Calculate row profiles and column profiles. Interpret results.
4. Conduct a correspondence analysis and interpret results.
NOTE!
The „Exercise_2_supermarkets.csv” file can be
downloaded from our course on the Moodle platform.
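Steps 1 to 3 repeat the table computations of Exercise 1, so they can be wrapped in a small helper. The function ca_tables below is hypothetical (it is not part of the course materials), and the data frame toy contains made-up records that only demonstrate the call; it is NOT the survey data. With the real file, the data frame would be read in the same way as in Exercise 1.

```r
# Hypothetical helper: builds the four tables for any two categorical columns
ca_tables <- function(data, row.var, col.var) {
  tab <- table(data[[row.var]], data[[col.var]], dnn = c(row.var, col.var))
  list(
    contingency    = addmargins(tab),
    correspondence = round(addmargins(prop.table(tab)), 4),
    # margin = 2 adds only the Sum column; margin = 1 adds only the Sum row
    row.profiles   = round(addmargins(prop.table(tab, margin = 1), margin = 2), 4),
    col.profiles   = round(addmargins(prop.table(tab, margin = 2), margin = 1), 4)
  )
}

# Made-up toy records, only to show how the helper is called
toy <- data.frame(
  Supermarket        = c("Aldi", "Aldi", "Lidl", "Lidl", "Lidl", "Tesco"),
  Reason_of_shopping = c("R5", "R3", "R5", "R5", "R1", "R3")
)

result <- ca_tables(toy, "Supermarket", "Reason_of_shopping")
print(result$contingency)
print(result$row.profiles)
```

Returning the tables in a list keeps the four results together, so each one can be printed and interpreted in turn.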
Exercise 3
A large survey of randomly selected people on attitudes to debt
was conducted in order to study the relationship between the
level of income and housing tenure. Each respondent was
classified according to two criteria:
1. income group: the lowest, low, medium, high, the highest, and
2. housing tenure: rent, mortgage, owned outright.
The results of the survey can be found in the
„Exercise_3_debt.csv” file.
1. Create a contingency table of „Income_group” (rows) versus
„Housing_tenure” (columns). Interpret results.
2. Develop the correspondence matrix. Interpret results.
3. Calculate row profiles and column profiles. Interpret results.
4. Conduct a correspondence analysis and interpret results.
NOTE!
The „Exercise_3_debt.csv” file can be downloaded from
our course on the Moodle platform.
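Income group is an ordered variable, but R sorts factor levels alphabetically by default, as the Heavy, Light, Medium, None order in Exercise 1 shows. The sketch below fixes the order of the categories before the table is built. The values in income and tenure are made up for illustration, and the spelling of the labels is an assumption taken from the exercise text; the labels in the actual file may differ.

```r
# Made-up toy values, only to illustrate the ordering of categories
income <- c("low", "the highest", "medium", "the lowest", "high", "low")
tenure <- c("rent", "owned outright", "mortgage", "rent", "mortgage", "mortgage")

# Default behaviour: rows and columns are sorted alphabetically
default.table <- table(income, tenure)
print(rownames(default.table))

# Explicit level order, following the order used in the exercise text
income.levels <- c("the lowest", "low", "medium", "high", "the highest")
tenure.levels <- c("rent", "mortgage", "owned outright")

income.f <- factor(income, levels = income.levels)
tenure.f <- factor(tenure, levels = tenure.levels)

ordered.table <- table(Income_group = income.f, Housing_tenure = tenure.f)
print(ordered.table)
```

The order of rows and columns does not change the result of the correspondence analysis; it only makes the contingency table and the profiles easier to read.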