Classes - Correspondence Analysis - Data Analysis in Management - MBM

This document provides instructions for conducting correspondence analysis exercises using survey data on staff smoking habits. It includes: 1) Loading and exploring the survey data, which classified 193 staff members on smoking habits (none, light, medium, heavy) and rank (senior manager to secretary). 2) Creating a contingency table and correspondence matrix to analyze relationships between smoking habits and rank. 3) Running a correspondence analysis in R and interpreting the results, including a perceptual map to visualize relationships between variables.

Uploaded by

Antoine Bloyet
Data Analysis in Management

Programme: Modern Business Management


2022/2023

Roman Huptas, Department of Statistics,


Cracow University of Economics
Classes 1, 2 and 3

Correspondence analysis
Correspondence analysis –
exercises
Exercise 1 (Source: Greenacre, 1984)
A survey of all 193 staff members of a company was
conducted in order to formulate a smoking policy. The smoking
habits of the staff members were recorded. The staff
members were classified according to:
1. their rank (five levels): Senior managers, Junior managers, Senior
employees, Junior employees and Secretaries, and
2. a categorization of their smoking habits (four groups): None,
Light, Medium and Heavy.
The results of the survey can be found in the
„Exercise_1_smoking.csv” file.
Conduct a simple correspondence analysis. Interpret all results.
NOTE!
The „Exercise_1_smoking.csv” file can be
downloaded from our course on the
Moodle platform.

NOTE!
Copy the Exercise_1_smoking.csv file to
drive D: (the R code reads it from D:/Exercise_1_smoking.csv).
The code in R required to create and analyze
contingency table and correspondence matrix, and
then to conduct a simple correspondence analysis is
provided in „CA_classes_exercise_1.R” file, which
can be downloaded from our course on the Moodle
platform.
NOTE!
Remember that the “#” symbol
signifies a comment in R, and everything
on a line after it is ignored.
Solution: THE R CODE
install.packages("ca") # install the package for correspondence analysis (needed only once)
library(ca) # load the functions for CA

# Input data set Exercise_1_smoking.csv into object (Data Frame) data.smoking


data.smoking <- read.csv2("D:/Exercise_1_smoking.csv", header=TRUE)

# Show the first six records


data.smoking[1:6,]

print("Categories of Smoking_habits variable:",quote=FALSE)


print(table(data.smoking$Smoking_habits))

print("Categories of Staff_group variable:",quote=FALSE)


print(table(data.smoking$Staff_group))

# Create a contingency table for counts of Staff_group and Smoking_habits


contingency.table <- xtabs(~Staff_group+Smoking_habits, data=data.smoking)

# Print a contingency table for counts of Staff_group and Smoking_habits


print("Create a contingency table for counts of Staff_group and Smoking_habits :",quote=FALSE)
print(contingency.table)

# Add row and column marginal totals and create a table with sums for both variables
addmargins(contingency.table, FUN=sum)
Solution:
THE R CODE
# Create a correspondence matrix, that is, create a table of proportions
round(addmargins(prop.table(contingency.table)),4)

# Calculate row profiles, that is, create a table of proportions over rows (row percentages)
round(addmargins(prop.table(contingency.table, margin=1)),4)

# Calculate column profiles, that is, create a table of proportions over columns (column percentages)
round(addmargins(prop.table(contingency.table, margin=2)),4)

attach(data.smoking)

print("Conducting CA:",quote=FALSE)
model.ca<-ca(contingency.table)

print("Extended results of CA:",quote=FALSE)


summary(model.ca)

print("Print a perceptual map of CA:",quote=FALSE)


plot(model.ca, mass=c(TRUE, TRUE),contrib=c("absolute","absolute"),las=1)
Solution:
First, input data set Exercise_1_smoking.csv into object
(Data Frame) data.smoking:

THE R CODE
Commands in R (lines of the code):
# Input data set Exercise_1_smoking.csv into object (Data Frame) data.smoking
data.smoking <- read.csv2("D:/Exercise_1_smoking.csv", header=TRUE)
# Show the first six records
data.smoking[1:6,]

Output:
> data.smoking[1:6,]
      Staff_group Smoking_habits
1 Senior_managers           None
2 Senior_managers           None
3 Senior_managers           None
4 Senior_managers           None
5 Senior_managers          Light
6 Senior_managers          Light
Solution:
Print categories of „Smoking_habits” variable and
„Staff_group” variable:

THE R CODE
Commands in R (lines of the code):
print("Categories of Smoking_habits variable:",quote=FALSE)
print(table(data.smoking$Smoking_habits))
print("Categories of Staff_group variable:",quote=FALSE)
print(table(data.smoking$Staff_group))

Output:
> print("Categories of Smoking_habits variable:",quote=FALSE)
[1] Categories of Smoking_habits variable:
> print(table(data.smoking$Smoking_habits))

 Heavy  Light Medium   None
    25     45     62     61
> print("Categories of Staff_group variable:",quote=FALSE)
[1] Categories of Staff_group variable:
> print(table(data.smoking$Staff_group))

Junior_employees  Junior_managers      Secretaries Senior_employees  Senior_managers
              88               18               25               51               11
Solution:
Then, let us create a contingency table of „Staff_group”
versus „Smoking_habits”:
THE R CODE
Commands in R (lines of the code):
# Create a contingency table for counts of Staff_group and Smoking_habits
contingency.table <- xtabs(~Staff_group+Smoking_habits, data=data.smoking)
# Print a contingency table for counts of Staff_group and Smoking_habits
print("Create a contingency table for counts of Staff_group and Smoking_habits :",quote=FALSE)
print(contingency.table)

Output:
> print("Create a contingency table for counts of Staff_group and Smoking_habits :",quote=FALSE)
[1] Create a contingency table for counts of Staff_group and Smoking_habits :
> print(contingency.table)
                  Smoking_habits
Staff_group        Heavy Light Medium None
  Junior_employees    13    24     33   18
  Junior_managers      4     3      7    4
  Secretaries          2     6      7   10
  Senior_employees     4    10     12   25
  Senior_managers      2     2      3    4
Solution:
Then let us calculate row and column marginal totals and
develop a full contingency table:
THE R CODE
Commands in R (lines of the code):
# Add row and column marginal totals and create a table with sums for both variables
addmargins(contingency.table, FUN=sum)

Output:
> addmargins(contingency.table, FUN=sum)
Margins computed over dimensions
in the following order:
1: Staff_group
2: Smoking_habits
                  Smoking_habits
Staff_group        Heavy Light Medium None sum
  Junior_employees    13    24     33   18  88
  Junior_managers      4     3      7    4  18
  Secretaries          2     6      7   10  25
  Senior_employees     4    10     12   25  51
  Senior_managers      2     2      3    4  11
  sum                 25    45     62   61 193
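As a cross-check that does not need the CSV file, the same table can be rebuilt by hand from the counts printed above. This is only a minimal sketch: the object names counts, row.totals, col.totals and n are assumptions made for illustration and do not come from CA_classes_exercise_1.R.

```r
# Counts copied from the contingency table above
# (rows: staff groups, columns: smoking habits).
counts <- matrix(
  c(13, 24, 33, 18,
     4,  3,  7,  4,
     2,  6,  7, 10,
     4, 10, 12, 25,
     2,  2,  3,  4),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Staff_group = c("Junior_employees", "Junior_managers", "Secretaries",
                    "Senior_employees", "Senior_managers"),
    Smoking_habits = c("Heavy", "Light", "Medium", "None")
  )
)
contingency.table <- as.table(counts)

row.totals <- rowSums(contingency.table)  # totals per staff group
col.totals <- colSums(contingency.table)  # totals per smoking habit
n <- sum(contingency.table)               # grand total (number of staff members)

# quiet = TRUE suppresses the "Margins computed over dimensions" message
print(addmargins(contingency.table, FUN = sum, quiet = TRUE))
```

The grand total should equal the 193 staff members mentioned in the exercise, which is a quick way to confirm that no record was lost when the file was read.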
Solution, continued :
Based on the contingency table let us develop a
correspondence matrix:

THE R CODE
Commands in R (lines of the code):
# Create a correspondence matrix, that is, create a table of proportions
round(addmargins(prop.table(contingency.table)),4)

Output:
> round(addmargins(prop.table(contingency.table)),4)
                  Smoking_habits
Staff_group         Heavy  Light Medium   None    Sum
  Junior_employees 0.0674 0.1244 0.1710 0.0933 0.4560
  Junior_managers  0.0207 0.0155 0.0363 0.0207 0.0933
  Secretaries      0.0104 0.0311 0.0363 0.0518 0.1295
  Senior_employees 0.0207 0.0518 0.0622 0.1295 0.2642
  Senior_managers  0.0104 0.0104 0.0155 0.0207 0.0570
  Sum              0.1295 0.2332 0.3212 0.3161 1.0000
Solution, continued :
Selected interpretations:
• 6.74% of all employees are junior employees and are heavy
smokers;
• 23.32% of all employees are light smokers (see the column
masses);
• 5.70% of all employees are senior managers (see the row masses);
NOTE! Please write other interpretations.
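The interpretations above can be checked numerically: each cell of the correspondence matrix is a count divided by the grand total, and the masses are its row and column sums. The following is a minimal sketch in base R; the names N, P, row.mass and col.mass are assumptions chosen for illustration.

```r
# Counts copied from the contingency table of Exercise 1.
N <- matrix(
  c(13, 24, 33, 18,
     4,  3,  7,  4,
     2,  6,  7, 10,
     4, 10, 12, 25,
     2,  2,  3,  4),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Staff_group = c("Junior_employees", "Junior_managers", "Secretaries",
                    "Senior_employees", "Senior_managers"),
    Smoking_habits = c("Heavy", "Light", "Medium", "None")
  )
)

P <- N / sum(N)          # correspondence matrix: proportions of the grand total
row.mass <- rowSums(P)   # row masses (shares of each staff group)
col.mass <- colSums(P)   # column masses (shares of each smoking habit)

print(round(P, 4))
print(round(row.mass, 4))
print(round(col.mass, 4))
```

For example, P["Junior_employees", "Heavy"] is 13/193, which is the 6.74% quoted in the first interpretation.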
Solution, continued :
Let us now compute row profiles:

THE R CODE
Commands in R (lines of the code):
# Calculate row profiles, that is, create a table of proportions over rows, i.e. row percentages
round(addmargins(prop.table(contingency.table, margin=1)),4)

Output:
> round(addmargins(prop.table(contingency.table, margin=1)),4)
                  Smoking_habits
Staff_group         Heavy  Light Medium   None    Sum
  Junior_employees 0.1477 0.2727 0.3750 0.2045 1.0000
  Junior_managers  0.2222 0.1667 0.3889 0.2222 1.0000
  Secretaries      0.0800 0.2400 0.2800 0.4000 1.0000
  Senior_employees 0.0784 0.1961 0.2353 0.4902 1.0000
  Senior_managers  0.1818 0.1818 0.2727 0.3636 1.0000
  Sum              0.7102 1.0573 1.5519 1.6806 5.0000
Solution, continued :
Selected interpretations:
• 14.77% of junior employees are heavy smokers;

NOTE! Please write other interpretations.
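In the row-profile table only the Sum column is meaningful (each profile adds up to 1); the bottom Sum row merely adds up proportions from different rows. A possible variant, sketched below under the assumption that only the row totals are wanted, passes margin = 2 to addmargins(). The names N, row.profiles and avg.profile are illustrative, and the comparison with the average row profile is an addition that is not part of the course script.

```r
# Counts copied from the contingency table of Exercise 1.
N <- as.table(matrix(
  c(13, 24, 33, 18,
     4,  3,  7,  4,
     2,  6,  7, 10,
     4, 10, 12, 25,
     2,  2,  3,  4),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Staff_group = c("Junior_employees", "Junior_managers", "Secretaries",
                    "Senior_employees", "Senior_managers"),
    Smoking_habits = c("Heavy", "Light", "Medium", "None")
  )
))

# Row profiles: each row divided by its own total.
row.profiles <- prop.table(N, margin = 1)

# margin = 2 adds only a Sum column (row totals), no Sum row.
print(round(addmargins(row.profiles, margin = 2), 4))

# Average row profile = column masses; useful as a reference for each row.
avg.profile <- colSums(N) / sum(N)
print(round(avg.profile, 4))
```

Comparing a row with the average profile supports the interpretation step: for instance, 40% of secretaries are non-smokers, against about 31.6% of all staff.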


Solution, continued :
Let us now compute column profiles:

THE R CODE
Commands in R (lines of the code):
# Calculate column profiles, that is, create a table of proportions over columns, i.e. column percentages
round(addmargins(prop.table(contingency.table, margin=2)),4)

Output:
> round(addmargins(prop.table(contingency.table, margin=2)),4)
                  Smoking_habits
Staff_group         Heavy  Light Medium   None    Sum
  Junior_employees 0.5200 0.5333 0.5323 0.2951 1.8807
  Junior_managers  0.1600 0.0667 0.1129 0.0656 0.4051
  Secretaries      0.0800 0.1333 0.1129 0.1639 0.4902
  Senior_employees 0.1600 0.2222 0.1935 0.4098 0.9856
  Senior_managers  0.0800 0.0444 0.0484 0.0656 0.2384
  Sum              1.0000 1.0000 1.0000 1.0000 4.0000
Solution, continued :
Selected interpretations:
• 16.0% of heavy smokers are junior managers;

NOTE! Please write other interpretations.


Solution, continued :
To compute correspondence analysis, let us type the following
commands:
THE R CODE
Commands in R (lines of the code):
attach(data.smoking)
print("Conducting CA:",quote=FALSE)
model.ca<-ca(contingency.table)
print("Extended results of CA:",quote=FALSE)

Output:
> attach(data.smoking)
> print("Conducting CA:",quote=FALSE)
[1] Conducting CA:
> model.ca<-ca(contingency.table)
> print("Extended results of CA:",quote=FALSE)
[1] Extended results of CA:
THE R CODE
Commands in R (lines of the code):
summary(model.ca)

Output:
> summary(model.ca)

Principal inertias (eigenvalues):


dim value % cum% scree plot
1 0.074759 87.8 87.8 **********************
2 0.010017 11.8 99.5 ***
3 0.000414 0.5 100.0
-------- -----
Total: 0.085190 100.0

Rows:
name mass qlt inr k=1 cor ctr k=2 cor ctr
1 | Jnr_mp | 456 1000 308 | -233 942 331 | 58 58 152 |
2 | Jnr_mn | 93 991 139 | -259 526 84 | -243 465 551 |
3 | Scrt | 130 999 71 | 201 865 70 | 79 133 81 |
4 | Snr_mp | 264 1000 450 | 381 999 512 | -11 1 3 |
5 | Snr_mn | 57 893 31 | 66 92 3 | -194 800 214 |

Columns:
name mass qlt inr k=1 cor ctr k=2 cor ctr
1 | Hevy | 130 995 192 | -294 684 150 | -198 310 506 |
2 | Lght | 233 984 83 | -99 327 31 | 141 657 463 |
3 | Medm | 321 983 148 | -196 982 166 | 7 1 2 |
4 | None | 316 1000 577 | 393 994 654 | -30 6 29 |
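To see where the numbers in this summary come from, the principal inertias can be reproduced in base R with the standard textbook route: a singular value decomposition of the standardized residuals of the correspondence matrix. This is a sketch of that computation, not a description of the internals of ca(); the names N, P, S, principal.inertias and row.coords are assumptions. The sign of each axis is arbitrary in an SVD, so coordinates may come out with flipped signs.

```r
# Counts copied from the contingency table of Exercise 1.
N <- matrix(
  c(13, 24, 33, 18,
     4,  3,  7,  4,
     2,  6,  7, 10,
     4, 10, 12, 25,
     2,  2,  3,  4),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Staff_group = c("Junior_employees", "Junior_managers", "Secretaries",
                    "Senior_employees", "Senior_managers"),
    Smoking_habits = c("Heavy", "Light", "Medium", "None")
  )
)

P <- N / sum(N)               # correspondence matrix
r <- rowSums(P)               # row masses
c.mass <- colSums(P)          # column masses
E <- outer(r, c.mass)         # expected proportions under independence
S <- (P - E) / sqrt(E)        # standardized residuals

dec <- svd(S)
principal.inertias <- dec$d^2            # eigenvalues; the last one is ~0
total.inertia <- sum(principal.inertias)
chi2 <- sum(N) * total.inertia           # chi-square statistic = n * total inertia

# Principal row coordinates on the first two dimensions.
row.coords <- (dec$u[, 1:2] / sqrt(r)) %*% diag(dec$d[1:2])
rownames(row.coords) <- rownames(N)

print(round(principal.inertias[1:3], 6))
print(round(total.inertia, 6))
print(round(row.coords, 3))
```

The first three squared singular values should agree with the "value" column of the summary, and the row coordinates (times 1000, up to sign) with the k=1 and k=2 columns of the Rows table.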
THE R CODE
Interpretation of the "Principal inertias (eigenvalues)" part of the summary(model.ca) output:

Principal inertias (eigenvalues):

 dim    value      %   cum%   scree plot
 1      0.074759  87.8  87.8  **********************
 2      0.010017  11.8  99.5  ***
 3      0.000414   0.5 100.0
        -------- -----
 Total: 0.085190 100.0

• Eigenvalues = inertias. Eigenvalues correspond to the amount of information retained by each axis. Eigenvalues are large for the first axis and small for the subsequent axes.
• Dimensions are ordered decreasingly and listed according to the amount of inertia (variance) explained in the solution.
• Proportion of total inertia (column %): the first dimension explains 87.8% of total inertia, the second dimension explains 11.8% of total inertia.
• The cumulative percentage of total inertia (column cum%): the first dimension explains 87.8% of total inertia, the first two dimensions explain 99.5% of total inertia.
• Total inertia: the value in the Total row (0.085190).
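The percentage columns of the eigenvalue table follow directly from the eigenvalues themselves. A minimal sketch of the arithmetic, using the three values printed by summary(model.ca); the names eigenvalues, pct and cum.pct are assumptions:

```r
# Principal inertias copied from the summary(model.ca) output.
eigenvalues <- c(0.074759, 0.010017, 0.000414)

total.inertia <- sum(eigenvalues)            # 0.085190
pct <- 100 * eigenvalues / total.inertia     # share of total inertia per dimension
cum.pct <- cumsum(pct)                       # cumulative share

print(data.frame(dim = 1:3,
                 value = eigenvalues,
                 pct = round(pct, 1),
                 cum.pct = round(cum.pct, 1)))
```

Because the first two dimensions together retain about 99.5% of the inertia, a two-dimensional map loses very little information here.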
ROWS SCORES
NOTE!
The quantities in the tables for rows and columns are multiplied by
1000 (e.g., the coordinates and masses)!

Rows part of the summary(model.ca) output:

Rows:
    name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr
1 | Jnr_mp |  456 1000  308 | -233 942 331 |   58  58 152 |
2 | Jnr_mn |   93  991  139 | -259 526  84 | -243 465 551 |
3 |   Scrt |  130  999   71 |  201 865  70 |   79 133  81 |
4 | Snr_mp |  264 1000  450 |  381 999 512 |  -11   1   3 |
5 | Snr_mn |   57  893   31 |   66  92   3 | -194 800 214 |

• mass: rows' masses. The mass shows the relative weight of each category in the sample.
• inr: these quantities show how total inertia has been distributed across rows (shares of total inertia, here in per mille).
• k=1 and k=2: coordinates for the map, that is, coordinates for the first two dimensions.
• ctr: contributions of rows to the dimensions (in per mille, e.g. 512 means 51.2%).
• These two categories (Senior_employees and Junior_employees) strongly contribute to explaining the first dimension (ctr = 512 and 331).
• The second dimension is mainly characterized by Junior_managers and Senior_managers (ctr = 551 and 214).
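The contributions can be recomputed from the masses, the coordinates and the eigenvalues: the contribution of a row to a dimension is its mass times its squared coordinate, divided by the eigenvalue of that dimension. The sketch below applies this standard formula to the rounded values printed in the Rows table, so the results match the ctr column only up to rounding; the names mass, coord1, coord2, lambda, ctr1 and ctr2 are assumptions.

```r
# Values copied from the Rows table of summary(model.ca), rescaled from per mille.
groups <- c("Junior_employees", "Junior_managers", "Secretaries",
            "Senior_employees", "Senior_managers")
mass   <- c(456,   93, 130, 264,   57) / 1000
coord1 <- c(-233, -259, 201, 381,   66) / 1000   # k=1 coordinates
coord2 <- c(  58, -243,  79, -11, -194) / 1000   # k=2 coordinates
lambda <- c(0.074759, 0.010017)                  # first two principal inertias

# Contribution of each row to each dimension: mass * coordinate^2 / eigenvalue.
ctr1 <- mass * coord1^2 / lambda[1]
ctr2 <- mass * coord2^2 / lambda[2]
names(ctr1) <- groups
names(ctr2) <- groups

print(round(1000 * ctr1))   # compare with the ctr column for k=1
print(round(1000 * ctr2))   # compare with the ctr column for k=2
```

Within each dimension the contributions add up to roughly 1, so a category with a contribution well above 1/5 (for five rows) is a natural candidate for naming that axis.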
COLUMNS SCORES
NOTE!
The quantities in the tables for rows and columns are multiplied by
1000 (e.g., the coordinates and masses)!

Columns part of the summary(model.ca) output:

Columns:
    name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr
1 | Hevy |  130  995  192 | -294 684 150 | -198 310 506 |
2 | Lght |  233  984   83 |  -99 327  31 |  141 657 463 |
3 | Medm |  321  983  148 | -196 982 166 |    7   1   2 |
4 | None |  316 1000  577 |  393 994 654 |  -30   6  29 |

• mass: columns' masses. The mass shows the relative weight of each category in the sample.
• inr: these quantities show how total inertia has been distributed across columns (shares of total inertia, here in per mille).
• k=1 and k=2: coordinates for the map, that is, coordinates for the first two dimensions.
• ctr: contributions of columns to the dimensions (in per mille, e.g. 654 means 65.4%).
• These two categories (None and Medium) strongly contribute to explaining the first dimension (ctr = 654 and 166).
• The second dimension is mainly characterized by the Heavy and Light categories (ctr = 506 and 463).
Solution, continued :
Let us now plot a perceptual map in the two-
dimensional space:

THE R CODE
Commands in R (lines of the code):
print("Print a perceptual map of CA:",quote=FALSE)
plot(model.ca, mass=c(TRUE, TRUE),contrib=c("absolute","absolute"),las=1)

Output:
> print("Print a perceptual map of CA:",quote=FALSE)
[1] Print a perceptual map of CA:
> plot(model.ca, mass=c(TRUE, TRUE),contrib=c("absolute","absolute"),las=1)
Perceptual map
Solution, continued :
Interpretations:
Exercise 2
A survey of 162 randomly selected customers was conducted in
order to study the relationship between the supermarket in which the
customer shops and the main reason why the customer shops in
this particular supermarket. A sample of customers of
several supermarkets (Aldi, Auchan, Biedronka, Carrefour, Kaufland,
Lewiatan, Lidl, Tesco and Others) was asked to indicate only one (the
most important for them) of seven reasons for shopping there: frequent
promotions (R1), high-quality products (R2), nearby location (R3), opening
hours (R4), low prices (R5), a wide range of goods (R6) and parking space
(R7).
The results of the survey can be found in the
„Exercise_2_supermarkets.csv” file.
Exercise 2
1. Create a contingency table of „Supermarket” versus
„Reason_of_shopping”. Interpret results.
2. Develop the correspondence matrix. Interpret results.
3. Calculate row profiles and column profiles. Interpret results.
4. Conduct a correspondence analysis and interpret results.

NOTE!
The „Exercise_2_supermarkets.csv” file can be
downloaded from our course on the Moodle platform.
Exercise 3
A large survey of randomly selected people on attitudes
to debt was conducted in order to study the
relationship between the level of income and housing
tenure. Each respondent was classified according
to two criteria:
1. income group: the lowest, low, medium, high, the highest, and
2. housing tenure: rent, mortgage, owned outright.
The results of the survey can be found in the
„Exercise_3_debt.csv” file.
Exercise 3
1. Create a contingency table of „Income_group” (rows) versus
„Housing_tenure” (columns). Interpret results.
2. Develop the correspondence matrix. Interpret results.
3. Calculate row profiles and column profiles. Interpret results.
4. Conduct a correspondence analysis and interpret results.

NOTE!
The „Exercise_3_debt.csv” file can be downloaded from
our course on the Moodle platform.
