Tutorial7 ClassificationTrees

The document discusses using classification trees to develop models for two classification problems: screening applicants for the position of personal assistant to a CEO, and deciding whether to play tennis. For the CEO assistant model, the tree is constructed from criteria rated on application forms and client ratings of past placements, and classifies applicants as suitable or unsuitable. For the tennis model, outlook, temperature, humidity and wind data are used to classify days as suitable for playing or not.

Uploaded by

alutakaunda
Copyright: © All Rights Reserved

RESEARCH AND SURVEY STATISTICS – STA3022F

TUTORIAL #7

CHAPTER 9. CLASSIFICATION TREES

QUESTION 1: PERSONAL ASSISTANTS TO CEOs RECRUITMENT MODEL STUDY

A recruitment agency for PAs (Personal Assistants to a Chief Executive Officer (CEO)) proposes to
develop a selection model to assist with the screening of applicants for PA positions in Commerce and
Industry. The following attributes are measured through a set of questions included in the written
application form, which an applicant submits together with their CV. A recruitment officer assesses each
application and rates each applicant on these criteria.

Criteria                                   Variable Code   Rating

Quality of work experience in this field   exper           1=Little, 2=Moderate, 3=Extensive
Personality                                personality     1=Reserved, 2=Average, 3=Extrovert
IT skills                                  itskill         1=Excellent, 2=Very Good, 3=Good, 4=Poor, 5=Very poor
Interpersonal skills                       ipskill         1=Very low, 2=Low, 3=Moderate, 4=High, 5=Very high

An analyst used the data mining tool of Classification Trees to identify an appropriate decision rule to
classify future applicants for the ‘PA to the CEO’ position as suitable or not suitable. Applicants who
are considered suitable are granted an interview, are placed on the agency’s files, and are
recommended to clients who approach the agency for ‘PA to the CEO’ candidates.

In the training data set, 200 previous PA applications were reviewed and a rating was assigned to each
of the four selection criteria above. In addition, the client companies that employed each of these 200
applicants were approached and asked to indicate the extent to which they were satisfied with these
placements. This resulted in a response variable with the categories “suitable” and “unsuitable”
(variable code suitable).

The four selection criteria were used as the input variables. The relevant results from the
Classification Tree module of Statistica are given below.

1) root 200 98 Yes (0.4900000 0.5100000)
  2) exper=Little 68 28 No (0.5882353 0.4117647)
    4) ipskill=Low,Moderate,Very Low 39 11 No (0.7179487 0.2820513)
      8) itskill=Good,Poor,Very Gd,Very Pr 32 6 No (0.8125000 0.1875000) *
      9) itskill=Excellnt 7 2 Yes (0.2857143 0.7142857) *
    5) ipskill=High,Very Hi 29 12 Yes (0.4137931 0.5862069)
      10) itskill=Good,Very Pr 14 6 No (0.5714286 0.4285714) *
      11) itskill=Excellnt,Poor,Very Gd 15 4 Yes (0.2666667 0.7333333) *
  3) exper=EXtensve,Moderate 132 58 Yes (0.4393939 0.5606061)
    6) itskill=Excellnt 25 10 No (0.6000000 0.4000000)
      12) personality=Average,Introvrt 18 5 No (0.7222222 0.2777778) *
      13) personality=EXtrovrt 7 2 Yes (0.2857143 0.7142857) *
    7) itskill=Good,Poor,Very Gd,Very Pr 107 43 Yes (0.4018692 0.5981308) *
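The output above can be tabulated with a short script. The following is a minimal Python sketch, not part of the tutorial. It assumes each output line follows the layout `node) split n loss predicted-class (P(No) P(Yes))`, with terminal nodes marked by `*` and `loss` being the number of cases in the node that do not belong to the predicted class. That reading is an assumption, although it agrees with the figures (for node 2, 28/68 = 0.4117647). The names `TERMINAL_NODES`, `classification_matrix` and `accuracy` are hypothetical.

```python
# Terminal nodes copied from the output above: node id -> (n, loss, predicted class).
# ASSUMPTION: "loss" is the number of cases in the node NOT in the predicted class.
TERMINAL_NODES = {
    8: (32, 6, "No"),
    9: (7, 2, "Yes"),
    10: (14, 6, "No"),
    11: (15, 4, "Yes"),
    12: (18, 5, "No"),
    13: (7, 2, "Yes"),
    7: (107, 43, "Yes"),
}


def classification_matrix(nodes):
    """Return matrix[actual][predicted] built from terminal-node counts."""
    matrix = {"No": {"No": 0, "Yes": 0}, "Yes": {"No": 0, "Yes": 0}}
    for n, loss, predicted in nodes.values():
        other = "Yes" if predicted == "No" else "No"
        matrix[predicted][predicted] += n - loss  # cases the node classifies correctly
        matrix[other][predicted] += loss          # cases of the other class in the node
    return matrix


def accuracy(matrix):
    """Proportion of all cases that lie on the diagonal of the matrix."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = matrix["No"]["No"] + matrix["Yes"]["Yes"]
    return correct / total


if __name__ == "__main__":
    m = classification_matrix(TERMINAL_NODES)
    for actual, row in m.items():
        print(f"actual {actual}: {row}")
    print(f"overall proportion correctly classified: {accuracy(m):.3f}")
```

The row totals of the matrix should reproduce the class counts at the root node, which is a quick check that the terminal nodes were transcribed correctly.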

Questions:

1. Construct the classification tree from the relevant data. Annotate your tree diagram clearly,
labelling each split with the appropriate criterion and the levels of the split, the number of cases
in each split, and the number and percentage of cases in each node (both intermediate and root
nodes).
2. Interpret the Classification Tree and define an appropriate decision rule for selecting suitable
Personal Assistants from their application forms for further assessment through an interview.
3. What percentage of applicants is correctly identified as suitable by the chosen criteria? Justify.
Use this finding to comment on the reliability of the derived decision rule.
4. What is the likelihood of being considered suitable for a particular applicant who has little
relevant PA experience, but high interpersonal skills? Briefly explain your reasoning.

QUESTION 2: WHEN TO PLAY TENNIS

We want to classify days as suitable for playing tennis or not (play or not play). The following
attributes are observed for 10 days:
Play: yes – no
Outlook: rainy – sunny – overcast
Temperature: cool – hot – mild
Humidity: normal – high
Windy: true – false

Play   Outlook    Temperature   Humidity   Windy

yes    rainy      cool          normal     FALSE
no     rainy      cool          normal     TRUE
yes    overcast   hot           high       FALSE
no     sunny      mild          high       FALSE
yes    rainy      cool          normal     FALSE
yes    sunny      cool          normal     FALSE
yes    rainy      cool          normal     FALSE
yes    sunny      hot           normal     FALSE
yes    overcast   mild          high       TRUE
no     sunny      mild          high       TRUE
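As an illustration of how a CART-style split can be scored on the table above, here is a minimal Python sketch. It assumes the Gini index is used as the impurity measure and that every split is binary, so a three-level attribute gives three candidate partitions. The course may use a different impurity measure. The names `DAYS`, `COLUMNS`, `gini`, `binary_partitions` and `split_scores` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Rows copied from the table above: (play, outlook, temperature, humidity, windy).
DAYS = [
    ("yes", "rainy", "cool", "normal", "FALSE"),
    ("no", "rainy", "cool", "normal", "TRUE"),
    ("yes", "overcast", "hot", "high", "FALSE"),
    ("no", "sunny", "mild", "high", "FALSE"),
    ("yes", "rainy", "cool", "normal", "FALSE"),
    ("yes", "sunny", "cool", "normal", "FALSE"),
    ("yes", "rainy", "cool", "normal", "FALSE"),
    ("yes", "sunny", "hot", "normal", "FALSE"),
    ("yes", "overcast", "mild", "high", "TRUE"),
    ("no", "sunny", "mild", "high", "TRUE"),
]
COLUMNS = {"outlook": 1, "temperature": 2, "humidity": 3, "windy": 4}


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def binary_partitions(levels):
    """Yield the 'left' side of every two-way partition of the levels.

    The first level (in sorted order) is always kept on the left so that
    mirror-image partitions are not produced twice.
    """
    levels = sorted(levels)
    first, rest = levels[0], levels[1:]
    for size in range(len(rest) + 1):
        for extra in combinations(rest, size):
            left = {first, *extra}
            if len(left) < len(levels):  # the right side must not be empty
                yield left


def split_scores(rows):
    """Return (weighted Gini, attribute, left levels) for every candidate split, best first."""
    scores = []
    for name, col in COLUMNS.items():
        for left_levels in binary_partitions({row[col] for row in rows}):
            left = [row[0] for row in rows if row[col] in left_levels]
            right = [row[0] for row in rows if row[col] not in left_levels]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            scores.append((weighted, name, sorted(left_levels)))
    return sorted(scores)


if __name__ == "__main__":
    print(f"root Gini: {gini([row[0] for row in DAYS]):.4f}")
    for weighted, name, left_levels in split_scores(DAYS):
        print(f"{name:12s} {left_levels} vs rest: weighted Gini {weighted:.4f}")
```

Calling `split_scores` again on the rows of each child node repeats the search one level down. Ties between candidate splits are possible on a data set this small, and the sketch does not prescribe a tie-breaking rule.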

Questions:

1. Using the CART algorithm, construct a classification tree with a stopping rule of node size = 1.
Important annotations are: criteria for each split, levels of the split, total number of cases in
each node, and number and percentage of cases in each of the groups at each node.
2. Construct the classification matrix. What is the probability that a day chosen at random is
correctly classified by the decision rules? Justify.
3. Write down the set of decision rules that should be presented.
4. How does CHAID differ from CART in the way it chooses the split variables?
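As background to the last question, CHAID selects its split variable with chi-square tests of independence between each predictor and the response, rather than with an impurity measure. The following is a minimal Python sketch of the Pearson chi-square statistic for each attribute of the tennis data. It leaves out CHAID's category merging and Bonferroni adjustment, and with only 10 days the expected counts are far too small for the chi-square approximation to be reliable, so the numbers are purely illustrative. The names `DAYS`, `COLUMNS`, `chi_square` and `p_value` are hypothetical.

```python
import math
from collections import Counter

# Rows copied from the tennis table: (play, outlook, temperature, humidity, windy).
DAYS = [
    ("yes", "rainy", "cool", "normal", "FALSE"),
    ("no", "rainy", "cool", "normal", "TRUE"),
    ("yes", "overcast", "hot", "high", "FALSE"),
    ("no", "sunny", "mild", "high", "FALSE"),
    ("yes", "rainy", "cool", "normal", "FALSE"),
    ("yes", "sunny", "cool", "normal", "FALSE"),
    ("yes", "rainy", "cool", "normal", "FALSE"),
    ("yes", "sunny", "hot", "normal", "FALSE"),
    ("yes", "overcast", "mild", "high", "TRUE"),
    ("no", "sunny", "mild", "high", "TRUE"),
]
COLUMNS = {"outlook": 1, "temperature": 2, "humidity": 3, "windy": 4}


def chi_square(rows, col):
    """Pearson chi-square statistic and degrees of freedom for attribute vs play."""
    n = len(rows)
    cells = Counter((row[col], row[0]) for row in rows)   # observed counts
    level_totals = Counter(row[col] for row in rows)      # attribute margins
    class_totals = Counter(row[0] for row in rows)        # play / not-play margins
    stat = 0.0
    for level, level_n in level_totals.items():
        for cls, cls_n in class_totals.items():
            expected = level_n * cls_n / n
            stat += (cells[(level, cls)] - expected) ** 2 / expected
    df = (len(level_totals) - 1) * (len(class_totals) - 1)
    return stat, df


def p_value(stat, df):
    """Upper-tail chi-square probability, using closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("only df = 1 or 2 is handled in this sketch")


if __name__ == "__main__":
    for name, col in COLUMNS.items():
        stat, df = chi_square(DAYS, col)
        print(f"{name:12s} chi-square = {stat:.4f}, df = {df}, p = {p_value(stat, df):.4f}")
```

In this simplified view the attribute with the smallest p-value would be the first split candidate, and all of its levels could become separate branches, whereas CART always produces two branches per split.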
