Predictive Modeling Machine Learning

Predictive modeling involves: 1. Using machine learning algorithms to extract patterns from historical data and using those patterns to make predictions on new data. 2. Using statistical tools to summarize training data into an executable predictive model that predicts outcomes without relying on hard-coded rules. 3. Building predictive models from labeled training data with common algorithms such as support vector machines, linear classifiers, and random forests, then evaluating them on held-out test data whose labels the model did not see during training.

Uploaded by Indra Sar
Copyright © All Rights Reserved

Predictive modeling
~= machine learning

• Make predictions of outcomes on new data
• Extract the structure of historical data
• Statistical tools to summarize the training data into an executable predictive model
• Alternative to hard-coded rules written by experts
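To make the contrast in the last bullet concrete, here is a minimal sketch assuming scikit-learn is installed. The surfaces, labels and the 100 m2 threshold of the hypothetical `expert_rule` are invented for illustration; the point is that the decision tree stump picks its own threshold from the data instead of from an expert.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical hand-written rule: a person chose the 100 m2 threshold.
def expert_rule(surface_m2):
    return "house" if surface_m2 > 100 else "apartment"

# Hypothetical historical data: surface (m2) and the observed property type.
X_hist = [[30], [45], [50], [200], [240], [300]]
y_hist = ["apartment", "apartment", "apartment", "house", "house", "house"]

# A depth-1 decision tree ("stump") learns a threshold from the data itself.
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
stump.fit(X_hist, y_hist)

# Both the rule and the fitted model are executable on new samples.
new_surfaces = [[60], [180]]
print([expert_rule(s[0]) for s in new_surfaces])  # ['apartment', 'house']
print(list(stump.predict(new_surfaces)))
```

The learned threshold lies between the largest apartment surface and the smallest house surface seen in training, so retraining on new history updates the "rule" without anyone rewriting it.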


Features: type, # rooms, surface, public trans. Target: sold.

split   type (category)   # rooms (int)   surface (float, m2)   public trans (boolean)   sold (float, k€)
train   Apartment         3               50                    TRUE                     450
train   House             5               254                   FALSE                    430
train   Duplex            4               68                    TRUE                     712
train   Apartment         2               32                    TRUE                     234
test    Apartment         2               33                    TRUE                     ?
test    House             4               210                   TRUE                     ?
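One way to turn these rows into the numeric feature vectors a learning algorithm needs is sketched below. The one-hot encoding of `type`, the 0/1 encoding of `public trans`, the hypothetical helper `to_feature_vector` and the choice of `LinearRegression` are illustrative assumptions, not the method of the original slides; with only four training samples the predicted prices carry no real meaning.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Rows of the table: (type, # rooms, surface m2, public trans); target = sold (k€)
train_rows = [
    ("Apartment", 3, 50.0, True),
    ("House", 5, 254.0, False),
    ("Duplex", 4, 68.0, True),
    ("Apartment", 2, 32.0, True),
]
y_train = np.array([450.0, 430.0, 712.0, 234.0])
test_rows = [
    ("Apartment", 2, 33.0, True),
    ("House", 4, 210.0, True),
]

CATEGORIES = ["Apartment", "Duplex", "House"]

def to_feature_vector(row):
    """One-hot encode the category, keep numbers as-is, map the boolean to 0/1."""
    kind, rooms, surface, public_trans = row
    one_hot = [1.0 if kind == c else 0.0 for c in CATEGORIES]
    return one_hot + [float(rooms), surface, float(public_trans)]

X_train = np.array([to_feature_vector(r) for r in train_rows])
X_test = np.array([to_feature_vector(r) for r in test_rows])

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)  # one predicted price per "?" row
print(X_train.shape, y_pred.shape)  # (4, 6) (2,)
```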


Predictive Modeling Data Flow

Training: text docs, images, sounds, transactions → feature vectors + labels → machine learning algorithm → model
Prediction: new text doc, image, sound or transaction → feature vector → model → expected label
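The same flow can be sketched for text documents, assuming scikit-learn's `CountVectorizer` as the feature extractor and `MultinomialNB` as the learning algorithm (any other pair would do). The documents and labels are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training documents and their labels.
train_docs = ["cheap pills buy now", "win money now",
              "meeting at noon", "lunch with the team"]
train_labels = ["spam", "spam", "ham", "ham"]

# CountVectorizer turns each document into a word-count feature vector;
# MultinomialNB is the learning algorithm that produces the model.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_docs, train_labels)

# New documents go through the same feature extraction before prediction.
new_docs = ["buy cheap money", "team meeting"]
print(model.predict(new_docs))  # ['spam' 'ham']
```

Because the vectorizer sits inside the pipeline, new documents are converted with the vocabulary learned at training time, which keeps the training and prediction feature vectors comparable.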


Predictive modeling in the wild

• Virality and reader engagement
• Personalized radios
• Fraud detection
• Inventory forecasting & trend detection
• Predictive maintenance
• Personality matching

scikit-learn

• Library of machine learning algorithms
• Focus on established methods (e.g. those covered in ESL-II, The Elements of Statistical Learning, 2nd ed.)
• Open source (BSD license)
• Simple fit / predict / transform API
• Python / NumPy / SciPy / Cython
• Model assessment, selection & ensembles
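A short sketch of the fit / transform and fit / predict halves of that API, assuming a tiny made-up dataset; `StandardScaler` and `LogisticRegression` are just convenient examples of a transformer and a classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up data: two features on very different scales, binary labels.
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0], [4.0, 500.0]])
y = np.array([0, 0, 1, 1])

# Transformer API: fit() learns each column's mean and standard deviation,
# transform() uses them to rescale the data.
scaler = StandardScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)

# Estimator API: fit() learns model parameters, predict() returns labels.
clf = LogisticRegression()
clf.fit(X_scaled, y)
print(clf.predict(scaler.transform([[1.5, 250.0], [3.5, 450.0]])))

# A pipeline chains both steps behind a single fit / predict interface.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X, y)
print(pipe.predict([[1.5, 250.0], [3.5, 450.0]]))
```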


Train data + train labels → model → fitted model:
model = ModelClass(**hyperparams)
model.fit(X_train, y_train)

Test data → fitted model → predicted labels:
y_pred = model.predict(X_test)

Predicted labels + test labels → evaluation:
accuracy_score(y_test, y_pred)
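A runnable version of this workflow, assuming the iris dataset bundled with scikit-learn and `KNeighborsClassifier` standing in for the generic `ModelClass`; the 25% test split and `n_neighbors=5` are arbitrary choices:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# Hold out a test set so evaluation uses samples the model never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

hyperparams = {"n_neighbors": 5}
ModelClass = KNeighborsClassifier  # any scikit-learn classifier fits here

model = ModelClass(**hyperparams)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))
```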


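Support Vector Machine

from sklearn.svm import SVC
from sklearn.metrics import f1_score

# Assumes an existing binary-labeled split X_train, y_train, X_test, y_test
model = SVC(kernel="rbf", C=1.0, gamma=1e-4)
model.fit(X_train, y_train)

y_predicted = model.predict(X_test)

# f1_score defaults to average="binary"; multiclass labels need e.g. average="macro"
f1_score(y_test, y_predicted)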
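With `kernel="rbf"`, `SVC` measures the similarity of two samples as K(x, x') = exp(-gamma · ||x - x'||²), so `gamma` controls how quickly similarity decays with distance. A small check of this formula against `sklearn.metrics.pairwise.rbf_kernel`, using two made-up points:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[0.0, 0.0]])
x_prime = np.array([[3.0, 4.0]])  # squared distance to x is 25

similarities = {}
for gamma in (1e-4, 1e-1, 1.0):
    # RBF kernel computed by hand from the formula ...
    by_hand = np.exp(-gamma * np.sum((x - x_prime) ** 2))
    # ... and with scikit-learn's implementation.
    from_sklearn = rbf_kernel(x, x_prime, gamma=gamma)[0, 0]
    similarities[gamma] = from_sklearn
    print(f"gamma={gamma:g}: by hand={by_hand:.4f}, rbf_kernel={from_sklearn:.4f}")
```

A small value such as `gamma=1e-4` keeps similarities close to 1 even for distant points, which tends to give smoother decision boundaries; a suitable value depends on the scale of the features.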

Linear Classifier

from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score

# Assumes an existing binary-labeled split X_train, y_train, X_test, y_test
model = SGDClassifier(alpha=1e-4,
                      penalty="elasticnet")
model.fit(X_train, y_train)

y_predicted = model.predict(X_test)

f1_score(y_test, y_predicted)
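`SGDClassifier` fits a linear model (with its default `loss="hinge"`, a linear SVM). For binary labels its prediction reduces to the sign of w · x + b, which the sketch below checks on synthetic data; the `make_classification` dataset and the `random_state` values are assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

model = SGDClassifier(alpha=1e-4, penalty="elasticnet", random_state=0)
model.fit(X, y)

# A linear classifier scores each sample with w . x + b ...
scores = (X @ model.coef_.T + model.intercept_).ravel()
# ... and predicts the second class in classes_ when the score is > 0.
manual_pred = model.classes_[(scores > 0).astype(int)]

print(np.allclose(scores, model.decision_function(X)))  # True
print(np.array_equal(manual_pred, model.predict(X)))    # True
```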

Random Forests

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Assumes an existing binary-labeled split X_train, y_train, X_test, y_test
model = RandomForestClassifier(n_estimators=200)
model.fit(X_train, y_train)

y_predicted = model.predict(X_test)

f1_score(y_test, y_predicted)
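A random forest is an ensemble: for classification, scikit-learn's `predict_proba` averages the class probabilities of the individual trees stored in `estimators_`. A sketch checking this on synthetic data, with 20 trees instead of 200 only to keep it fast:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Fewer trees than the 200 used above, just to keep the demo fast.
model = RandomForestClassifier(n_estimators=20, random_state=0)
model.fit(X, y)

# Each fitted tree is available in model.estimators_.
per_tree = np.stack([tree.predict_proba(X) for tree in model.estimators_])
averaged = per_tree.mean(axis=0)

print(len(model.estimators_))                         # 20
print(np.allclose(averaged, model.predict_proba(X)))  # True
```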
