Feature Selection: Mean Square Between Groups Mean Square Within Groups

This document discusses feature selection using statistical tests to identify the features that have the strongest relationship to the target variable. It calculates the F-value for different features to measure their predictive power. The features with the highest F-values are the states of DE, KA, and area code, indicating they have the strongest influence on the target variable compared to other features.

Uploaded by Hsu Let Yee Hnin
Copyright © All Rights Reserved

Feature Selection

Features with the strongest relationship to the output variable are selected using statistical tests.
$$\text{F\_value} = \frac{\text{mean square between groups}}{\text{mean square within groups}}$$
The F_value of each feature is calculated in this way. The worked example that follows uses three groups of five observations each.
| | Group1 | Group2 | Group3 |
|---|---|---|---|
| | 2 | 10 | 10 |
| | 3 | 8 | 13 |
| | 7 | 7 | 14 |
| | 2 | 5 | 13 |
| | 6 | 10 | 15 |
| Sum | 20 | 40 | 65 |
| Mean | 4 | 8 | 13 |
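The Sum and Mean rows can be reproduced with a few lines of Python. This is only a sketch: the dictionary name `groups` and the variable names are chosen here for illustration and do not come from the original document.

```python
# Observations from the worked example, one list per group.
groups = {
    "Group1": [2, 3, 7, 2, 6],
    "Group2": [10, 8, 7, 5, 10],
    "Group3": [10, 13, 14, 13, 15],
}

# Per-group sum and mean, matching the Sum and Mean rows of the table.
sums = {name: sum(values) for name, values in groups.items()}
means = {name: sum(values) / len(values) for name, values in groups.items()}

for name in groups:
    print(name, "sum =", sums[name], "mean =", means[name])
```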

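| Group | X - mean | (X - mean)² |
|---|---|---|
| Group1 | -2 | 4 |
| Group1 | -1 | 1 |
| Group1 | 3 | 9 |
| Group1 | -2 | 4 |
| Group1 | 2 | 4 |
| Group2 | 2 | 4 |
| Group2 | 0 | 0 |
| Group2 | -1 | 1 |
| Group2 | -3 | 9 |
| Group2 | 2 | 4 |
| Group3 | -3 | 9 |
| Group3 | 0 | 0 |
| Group3 | 1 | 1 |
| Group3 | 0 | 0 |
| Group3 | 2 | 4 |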

Sum of squares within groups (SSW) = (4+1+9+4+4) + (4+0+1+9+4) + (9+0+1+0+4) = 54
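The within-group sum of squares can be checked in code by measuring each observation against the mean of its own group. The helper name `within_group_ss` is hypothetical and used only for this sketch.

```python
# Observations from the worked example, one list per group.
groups = [
    [2, 3, 7, 2, 6],       # Group1, mean 4
    [10, 8, 7, 5, 10],     # Group2, mean 8
    [10, 13, 14, 13, 15],  # Group3, mean 13
]


def within_group_ss(values):
    """Sum of squared deviations of the values from their own group mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


# One partial sum per group, then the total SSW.
per_group = [within_group_ss(g) for g in groups]
ssw = sum(per_group)
print(per_group, ssw)
```

The three partial sums correspond to the three bracketed terms in the SSW line.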


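| Observation | Mean | X - mean | (X - mean)² |
|---|---|---|---|
| 2 | 8.3 | -6.3 | 40.1 |
| 3 | 8.3 | -5.3 | 28.4 |
| 7 | 8.3 | -1.3 | 1.8 |
| 2 | 8.3 | -6.3 | 40.1 |
| 6 | 8.3 | -2.3 | 5.4 |
| 10 | 8.3 | 1.7 | 2.8 |
| 8 | 8.3 | -0.3 | 0.1 |
| 7 | 8.3 | -1.3 | 1.8 |
| 5 | 8.3 | -3.3 | 11.1 |
| 10 | 8.3 | 1.7 | 2.8 |
| 10 | 8.3 | 1.7 | 2.8 |
| 13 | 8.3 | 4.7 | 21.8 |
| 14 | 8.3 | 5.7 | 32.1 |
| 13 | 8.3 | 4.7 | 21.8 |
| 15 | 8.3 | 6.7 | 44.4 |
| Sum | | | 257.3 |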

Total sum of squares (SST) = 257.3
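The total sum of squares uses the same squared-deviation idea, but measures every observation against the grand mean of all 15 values (about 8.3). A minimal sketch, with variable names assumed for illustration:

```python
# All 15 observations from the worked example, pooled across the groups.
observations = [2, 3, 7, 2, 6, 10, 8, 7, 5, 10, 10, 13, 14, 13, 15]

# Grand mean over every observation, ignoring group membership.
grand_mean = sum(observations) / len(observations)

# Total sum of squares: squared deviation of each value from the grand mean.
sst = sum((x - grand_mean) ** 2 for x in observations)

print(round(grand_mean, 1), round(sst, 1))
```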


SST = SSW + SSB
SSB = SST – SSW = 257.3 – 54 = 203.3
Degrees of freedom between groups = dfb = number of groups - 1 = 3 - 1 = 2
Degrees of freedom within groups = dfw = number of observations - number of groups = 15 - 3 = 12
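As a cross-check on SSB = SST - SSW, SSB can also be computed directly as the group-size-weighted squared distance of each group mean from the grand mean. The sketch below takes that direct route, which is not the one used in the text, and derives both degrees of freedom from the data. Variable names are assumptions.

```python
# Observations from the worked example, one list per group.
groups = [
    [2, 3, 7, 2, 6],
    [10, 8, 7, 5, 10],
    [10, 13, 14, 13, 15],
]

n_total = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# Direct SSB: n_j * (group mean - grand mean)^2, summed over the groups.
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# Degrees of freedom between and within groups.
dfb = len(groups) - 1
dfw = n_total - len(groups)

print(round(ssb, 1), dfb, dfw)
```

Both routes give the same SSB of about 203.3.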
$$\text{Mean square between groups} = \frac{SSB}{dfb} = \frac{203.3}{2} = 101.667$$

$$\text{Mean square within groups} = \frac{SSW}{dfw} = \frac{54}{12} = 4.5$$

$$\text{F\_value} = \frac{\text{mean square between groups}}{\text{mean square within groups}} = \frac{101.667}{4.5} = 22.59$$
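The whole calculation can be collected into one function. This is a minimal pure-Python sketch of the one-way F statistic as worked through in this example. The function name `f_value` is chosen here and is not taken from the original document.

```python
def f_value(groups):
    """One-way ANOVA F statistic for a list of groups of numeric observations."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # SST: every observation against the grand mean.
    sst = sum((x - grand_mean) ** 2 for g in groups for x in g)

    # SSW: every observation against its own group mean.
    ssw = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        ssw += sum((x - mean) ** 2 for x in g)

    # SSB follows from SST = SSW + SSB.
    ssb = sst - ssw

    dfb = len(groups) - 1        # degrees of freedom between groups
    dfw = n_total - len(groups)  # degrees of freedom within groups

    msb = ssb / dfb  # mean square between groups
    msw = ssw / dfw  # mean square within groups
    return msb / msw


groups = [
    [2, 3, 7, 2, 6],
    [10, 8, 7, 5, 10],
    [10, 13, 14, 13, 15],
]
print(round(f_value(groups), 2))
```

If SciPy is installed, `scipy.stats.f_oneway(*groups)` should report the same statistic and can serve as an independent check.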

F_value for Telecom_Dataset

| No | Feature_Name | F_value |
|---|---|---|
| 1 | State_DE | 11.69798 |
| 2 | State_KA | 9.304263 |
| 3 | area code | 7.305324 |
| 4 | State_MI | 6.923433 |
| 5 | State_OK | 6.0412 |
| 6 | State_OH | 6.02171 |
| 7 | total night minutes | 5.447898 |
| 8 | total intl charge | 5.40735 |
| 9 | State_AK | 5.400937 |
| 10 | State_IL | 4.457331 |
| 11 | total eve charge | 4.297015 |
| 12 | total night calls | 4.290958 |
| 13 | total intl minutes | 3.784666 |
| 14 | customer service calls | 3.458785 |
| 15 | State_AL | 2.455233 |
| 16 | State_ID | 2.455233 |
| 17 | State_MA | 2.455233 |
| 18 | State_MD | 2.455233 |
| 19 | State_NE | 2.455233 |
| 20 | total day minutes | 1.825441 |
| 21 | Voice mail plan 1 | 1.825441 |
| 22 | total intl calls | 1.623302 |
| 23 | total night charge | 1.621888 |
| 24 | State_FL | 1.182149 |
| 25 | State_GA | 1.182149 |
| 26 | State_HI | 1.182149 |
| 27 | State_IA | 1.182149 |
| 28 | State_IN | 1.182149 |
| 29 | State_MO | 1.182149 |
| 30 | State_MT | 1.182149 |
| 31 | State_OR | 1.182149 |
| 32 | State_VT | 1.182149 |
| 33 | State_WY | 1.182149 |
| 34 | International plan 0 | 1.182149 |
| 35 | State_AZ | 1.182149 |
| 36 | State_NH | 1.182149 |
| 37 | State_SC | 1.182149 |
| 38 | State_VA | 1.182149 |
| 39 | total eve minutes | 0.775825 |
| 40 | total day calls | 0.733302 |
| 41 | State_CO | 0.599227 |
| 42 | account length | 0.096057 |
| 43 | total day charge | 0.014973 |
| 44 | total eve calls | 0.014891 |
| 45 | number vmail messages | 0.012495 |
| 46 | International plan 1 | 0.012495 |
| 47 | Voice mail plan 0 | 0.012495 |
| 48 | State_AR | -19.6667 |
| 49 | State_LA | -19.6667 |
| 50 | State_NJ | -19.6667 |
| 51 | State_RI | -19.6667 |
| 52 | State_TX | -19.6667 |
| 53 | State_WI | -19.6667 |
| 54 | State_WV | -19.6667 |
| 55 | State_NY | -19.667 |
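Once each feature has an F_value, selection comes down to ranking the features and keeping the highest-scoring ones. The sketch below uses a handful of values copied from the table. The helper `top_k_features` and the cutoff `k` are hypothetical, since the document does not state how many features were kept.

```python
# A few (feature, F_value) pairs copied from the table; not the full list.
f_values = {
    "State_DE": 11.69798,
    "State_KA": 9.304263,
    "area code": 7.305324,
    "State_MI": 6.923433,
    "total night minutes": 5.447898,
    "account length": 0.096057,
    "total eve calls": 0.014891,
}


def top_k_features(scores, k):
    """Return the names of the k features with the largest scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# k = 3 is an arbitrary illustrative cutoff, not a value from the document.
print(top_k_features(f_values, 3))
```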
