6 - Machine Learning 2

The document discusses machine learning concepts including scikit-learn library, splitting datasets, features and targets, feature extraction, feature scaling, encoding categorical features, choosing models, and improving models through validation, hyperparameter tuning, and regularization.

Machine Learning

(Continued)
Welcome to Machine Learning!

Big Data, Machine Learning, and their Real World Applications


Pre-College Program
Columbia University, SPS
Let’s review: scikit-learn library

https://machinelearningmastery.com/a-gentle-introduction-to-scikit-learn-a-python-machine-learning-library/
Splitting a Dataset with scikit-learn

https://towardsdatascience.com/splitting-a-dataset-e328dab2760a
Scikit-learn
• Train/test split
• Features, target
• fit() - for training
• model
• model.predict(new_features) - for testing on unseen data
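A minimal sketch of this workflow in Python. The iris dataset and KNeighborsClassifier below are stand-ins for illustration, not part of the slides:

# Train/test split, fit, and predict with scikit-learn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

features, target = load_iris(return_X_y=True)

# Hold out part of the data for testing
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=42
)

model = KNeighborsClassifier()
model.fit(X_train, y_train)          # fit() - for training
predictions = model.predict(X_test)  # predict() - for testing on new features
print(predictions[:5], y_test[:5])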
Concepts on Features
• Feature extraction
• Numerical features: feature scaling
• Categorical features:
• One-Hot Encoding
• Ordinal Encoding
A Note on Feature Extraction
• Not all of your data will be ready to feed into an algorithm. Preprocessing!
• More complex data (audio, images, sentences of text, biosignals, etc.) may require extracting features before you can input them into the algorithm.
• “Traditional” (non-neural-network) algorithms usually require features represented as numerical or categorical values rather than raw, complex signals.
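One hedged illustration of feature extraction, not specifically named in the slides: turning raw text into numerical word-count features with scikit-learn's CountVectorizer (the sentences are made up; get_feature_names_out needs scikit-learn 1.0+):

# Bag-of-words feature extraction from raw text
from sklearn.feature_extraction.text import CountVectorizer

sentences = ["the cat sat on the mat", "the dog chased the cat"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(sentences)   # sparse matrix of word counts

print(vectorizer.get_feature_names_out()) # the vocabulary (one column per word)
print(X.toarray())                        # numerical features ready for an algorithm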
Numerical Features: Feature Scaling
Feature scaling is one of the most important pre-processing steps before building a machine learning model. Scaling can make the difference between a weak model and a better one.
StandardScaler()
• fit() finds the mean and variance of each feature
• transform() uses that mean and variance to standardize the data (zero mean, unit variance)
• fit_transform() does both!
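A short sketch of StandardScaler in action (the toy array is made up for illustration):

# Standardize numerical features to zero mean and unit variance
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)   # same as scaler.fit(X) followed by scaler.transform(X)

print(scaler.mean_)   # per-column means learned by fit()
print(X_scaled)       # each column now has mean 0 and unit variance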
Dealing with Categorical Variables as Features
• Feature Encoding:
• One-Hot Encoding
• Ordinal Encoding
https://machinelearningmastery.com/one-hot-encoding-for-categorical-data/
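A hedged sketch of both encoders on a made-up "color" feature (sparse_output assumes a recent scikit-learn; older versions use sparse=False instead):

# One-Hot vs. Ordinal encoding of a categorical feature
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

colors = np.array([["red"], ["green"], ["blue"], ["green"]])

onehot = OneHotEncoder(sparse_output=False)   # one binary column per category
print(onehot.fit_transform(colors))

ordinal = OrdinalEncoder()                    # one integer per category
print(ordinal.fit_transform(colors))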
A shortcut from Pandas
• get_dummies()
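For example, on a made-up DataFrame:

# pandas one-hot encodes string columns directly
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue"], "size": [1, 2, 3]})
print(pd.get_dummies(df, columns=["color"]))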
Encoding the Labels
• LabelEncoder can be used to normalize labels
• Or to transform categorical labels into numerical labels
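A small sketch (the label strings are made up for illustration):

# Encode categorical labels (the target column) as integers
from sklearn.preprocessing import LabelEncoder

labels = ["cat", "dog", "cat", "bird"]
le = LabelEncoder()
y = le.fit_transform(labels)

print(y)                        # [1 2 1 0]  (bird=0, cat=1, dog=2)
print(le.inverse_transform(y))  # back to the original strings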
Which Model to Choose? Underfitting vs. Overfitting
Model Improvement - if you care to know…
• Validation dataset
• Hyperparameter tuning
• Cross-validation
• Cost functions
• Regularization
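A hedged sketch tying two of these ideas together: cross-validation plus a simple hyperparameter search over a decision tree's max_depth (the iris dataset is a stand-in for illustration):

# Cross-validation and hyperparameter tuning with scikit-learn
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation of a single model
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores.mean())

# Hyperparameter tuning: try several values of max_depth, each cross-validated
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 3, 5, None]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)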
Activity for Group Project
Run a decision tree algorithm with scikit-learn on the dataset you chose for your project (a starter sketch follows the checklist below).
• Remember to separate features from targets. You might also have to convert your data to a numpy array.
• Remember to do an adequate train/test split.
• Fit the decision tree (either classifier or regressor) to your data.
• Predict using the testing features.
• Compare the expected values (test_target) to your predictions.
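A starter sketch under assumptions: the CSV path and the column names feature1, feature2, and label are placeholders you would replace with your own dataset:

# Decision tree starter for the group project
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("your_project_dataset.csv")   # placeholder path

# Separate features from the target; .to_numpy() converts to numpy arrays
features = df[["feature1", "feature2"]].to_numpy()
target = df["label"].to_numpy()

# Train/test split
train_features, test_features, train_target, test_target = train_test_split(
    features, target, test_size=0.2, random_state=42
)

# Fit the decision tree (use DecisionTreeRegressor for a numeric target)
model = DecisionTreeClassifier()
model.fit(train_features, train_target)

# Predict on the testing features and compare to the expected values
predictions = model.predict(test_features)
print(accuracy_score(test_target, predictions))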
