1.4 Feature Selection

The document discusses feature selection in SKlearn. It covers several feature selection techniques including SelectPercentile, GenericUnivariateSelect, SelectKBest, and SelectFromModel. SelectPercentile selects the most important features based on a given percentile. GenericUnivariateSelect selects a specified number of features based on a scoring function. SelectKBest and SelectFromModel also select a given number of features, with SelectFromModel selecting features based on an existing model. Examples are provided to demonstrate each technique.

Uploaded by

mohamed

Section 10: The SKlearn Library

A. Data Preparation
1. Data files from SKlearn
2. Data cleaning
3. Metrics module
4. Feature selection
5. Data Scaling
6. Data Split

B. ML Algorithms
1. Linear Regression
2. Logistic Regression
3. Neural Network
4. SVR
5. SVC
6. K-means
7. PCA
8. Decision Tree
9. Ensemble Regression
10. Ensemble Classifier
11. K Nearest Neighbors
12. Naïve Bayes
13. LDA , QDA
14. Hierarchical Clusters
15. DbScan
16. NLP
17. Apriori

C. Algorithm Evaluation
1. Model Check
2. Grid Search
3. Pipeline
4. Model Save

D. Time Series

1.4) Feature Selection
This is about selecting the required, influential features and discarding the rest; features are chosen according to how strongly they relate to the output y. It is done through the feature_selection module.

1.4.1 feature_selection.SelectPercentile
1.4.2 feature_selection.GenericUnivariateSelect
1.4.3 feature_selection.SelectKBest
1.4.4 feature_selection.SelectFromModel

1.4.1) Select Percentile
The SelectPercentile tool selects the most important features related to the target according to the given percentile. The importance can be measured in several ways, for example with the f_classif or chi2 tools.

Syntax:
# Import Libraries
from sklearn.feature_selection import SelectPercentile
from sklearn.feature_selection import chi2 , f_classif
#----------------------------------------------------

#Feature Selection by Percentile


print('Original X Shape is ' , X.shape)
FeatureSelection = SelectPercentile(score_func = chi2, percentile=20) # score_func can = f_classif
X = FeatureSelection.fit_transform(X, y)

#showing X Dimension

print('X Shape is ' , X.shape)
print('Selected Features are : ' , FeatureSelection.get_support())

Example
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectPercentile, chi2
X, y = load_digits(return_X_y=True)
Here the number of features in the data is shown, which is 64:
X.shape

We make it select the most important 10% of them, i.e. 7 features:


X_new = SelectPercentile(score_func =chi2, percentile=10).fit_transform(X, y)

print(X_new.shape)

Another example

from sklearn.datasets import load_breast_cancer


from sklearn.feature_selection import SelectPercentile , chi2

data = load_breast_cancer()
X = data.data
y = data.target
X.shape
sel = SelectPercentile(score_func = chi2 , percentile = 20).fit_transform(X,y)
sel.shape

And if we want to know which features were selected and which were discarded:


from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectPercentile, chi2
X, y = load_digits(return_X_y=True)
X.shape

X_new = SelectPercentile(score_func =chi2, percentile=10)


X_new.fit(X, y)
selected = X_new.transform(X)
X_new.get_support()
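The boolean mask returned by get_support() can be turned into the actual indices of the kept features. A small sketch continuing the digits example (the variable names here are illustration choices):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectPercentile, chi2

X, y = load_digits(return_X_y=True)
selector = SelectPercentile(score_func=chi2, percentile=10).fit(X, y)

# Boolean mask: True for kept features, False for discarded ones
mask = selector.get_support()

# Indices of the kept features; get_support(indices=True) gives the same result
kept = np.where(mask)[0]
print('Number of selected features:', mask.sum())
print('Selected feature indices   :', kept)
```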

1.4.2) Generic Univariate Select
Here a specified number of features is selected based on one of the scoring tools.

Syntax:
#Import Libraries
from sklearn.feature_selection import GenericUnivariateSelect
from sklearn.feature_selection import chi2 , f_classif
#----------------------------------------------------
#Feature Selection by Generic
#print('Original X Shape is ' , X.shape)
FeatureSelection = GenericUnivariateSelect(score_func=chi2, mode='k_best', param=3) # score_func can = f_classif ; mode can = 'percentile', 'fpr', 'fdr', 'fwe'
X = FeatureSelection.fit_transform(X, y)
#showing X Dimension
#print('X Shape is ' , X.shape)
#print('Selected Features are : ' , FeatureSelection.get_support())

And here the number of features is set to 5:

from sklearn.datasets import load_breast_cancer


from sklearn.feature_selection import GenericUnivariateSelect, chi2
X, y = load_breast_cancer(return_X_y=True)
X.shape

transformer = GenericUnivariateSelect(chi2, mode='k_best', param=5)


X_new = transformer.fit_transform(X, y)

X_new.shape

transformer.get_support()
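The point of GenericUnivariateSelect is that the same object can switch selection strategies through mode. As a sketch, the same data with mode='percentile' instead of 'k_best' (the 20% value is an arbitrary illustration choice):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import GenericUnivariateSelect, chi2

X, y = load_breast_cancer(return_X_y=True)

# Same tool, different mode: keep the top 20% of features instead of a fixed k
transformer = GenericUnivariateSelect(chi2, mode='percentile', param=20)
X_new = transformer.fit_transform(X, y)
print(X_new.shape)
```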

1.4.3) Select KBest
This likewise selects a specified number of features, using a different mathematical approach.

Syntax:
#Import Libraries
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2 , f_classif
#----------------------------------------------------
#Feature Selection by KBest
#print('Original X Shape is ' , X.shape)
FeatureSelection = SelectKBest(score_func= chi2 ,k=3) # score_func can = f_classif
X = FeatureSelection.fit_transform(X, y)

#showing X Dimension
#print('X Shape is ' , X.shape)
#print('Selected Features are : ' , FeatureSelection.get_support())

Example
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2
X, y = load_digits(return_X_y=True)
X.shape

X_new = SelectKBest(chi2, k=30).fit_transform(X, y)

X_new.shape
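After fitting, SelectKBest also exposes the chi2 score of every feature in its scores_ attribute, which shows why a feature was kept: the 30 kept features are the highest scorers. A sketch on the same digits data:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_digits(return_X_y=True)
selector = SelectKBest(chi2, k=30).fit(X, y)

# chi2 score per feature (NaN for any constant all-zero pixel, never selected)
scores = selector.scores_
kept = selector.get_support(indices=True)

# Every kept feature scores at least as high as every dropped one
print('Lowest kept score    :', np.nanmin(scores[kept]))
print('Highest dropped score:', np.nanmax(np.delete(scores, kept)))
```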

1.4.4) Select From Model
Here the features are selected based on a given model, so that the model itself decides which features matter. This is done with the SelectFromModel tool.

Syntax:
#Import Libraries
from sklearn.feature_selection import SelectFromModel
#----------------------------------------------------

#Feature Selection by SelectFromModel


#print('Original X Shape is ' , X.shape)

'''
from sklearn.linear_model import LinearRegression
thismodel = LinearRegression()
'''

FeatureSelection = SelectFromModel(estimator = thismodel, max_features = None) # make sure that thismodel is well-defined
X = FeatureSelection.fit_transform(X, y)

#showing X Dimension
#print('X Shape is ' , X.shape)
#print('Selected Features are : ' , FeatureSelection.get_support())

Example
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
data = load_breast_cancer()
X = data.data
y = data.target
The random forest model is specified, with 20 estimators, and on its basis the strongest features will be selected:

sel = SelectFromModel(RandomForestClassifier(n_estimators = 20))


sel.fit(X,y)
selected_features = sel.transform(X)
sel.get_support()
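By default SelectFromModel keeps the features whose feature_importances_ in the fitted estimator are at or above the mean importance. A sketch inspecting that decision; the model is refit here (with a fixed random_state, an illustration choice) so the block stands alone:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

data = load_breast_cancer()
X, y = data.data, data.target

sel = SelectFromModel(RandomForestClassifier(n_estimators=20, random_state=0))
sel.fit(X, y)

# The forest fitted inside the selector, and its per-feature importances
importances = sel.estimator_.feature_importances_
mask = sel.get_support()

# Default threshold is the mean importance; kept features sit at or above it
print('Threshold    :', sel.threshold_)
print('Kept features:', list(data.feature_names[mask]))
```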

Note that you do not have to use the same model for feature selection as for training: a different model is fine, so selecting the features with one model and training another is perfectly normal. Also, don't forget that steps can be chained one after another, e.g. polynomial features to generate a very large number of features, then a model that selects among them.
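The chaining idea above (first generate many polynomial features, then let a model pick among them) can be sketched as follows; the degree and the estimator are arbitrary illustration choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.preprocessing import PolynomialFeatures

X, y = load_breast_cancer(return_X_y=True)

# Step 1: blow up the feature space (30 features -> 495 polynomial terms)
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

# Step 2: let a model keep only the terms it finds important
sel = SelectFromModel(RandomForestClassifier(n_estimators=20, random_state=0))
X_selected = sel.fit_transform(X_poly, y)

print('Original  :', X.shape[1])
print('Polynomial:', X_poly.shape[1])
print('Selected  :', X_selected.shape[1])
```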

