Feature Engineering
3. Temporal Features
4. Text Features
● Tokenization: df['text'].apply(nltk.word_tokenize)
● TF-IDF Vectorization:
sklearn.feature_extraction.text.TfidfVectorizer().fit_transform(df['text'])
5. Numerical Features
6. Feature Interaction
● Polynomial Features:
sklearn.preprocessing.PolynomialFeatures().fit_transform(df[['feature1', 'feature2']])
● Interaction Between Categorical and Numerical Features:
df['new_feature'] = df['cat_feature'].astype(str) + '_' + df['num_feature'].astype(str)
● Feature Crosses for Categorical Features:
pd.get_dummies(df['feature1'].astype(str) + '_' + df['feature2'].astype(str))
● Pairwise Products of Numerical Features: df['feature1'] * df['feature2']
● Creating Ratios of All Pairwise Features: df['feature1'] / df['feature2']
7. Dimensionality Reduction