Efficient Python Tricks and Tools For Data Scientists
Machine Learning
This section shows some tricks and libraries for building and
visualizing a machine learning model.
causalimpact: Find Causal Relation of an Event and a Variable in Python
!pip install pycausalimpact
When working with time series data, you might want to determine
whether an event has an impact on some response variable or not.
For example, if your company creates an advertisement, you might
want to track whether the advertisement results in an increase in
sales or not.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima_process import ArmaProcess
from causalimpact import CausalImpact
np.random.seed(0)
ar = np.r_[1, 0.9]
ma = np.array([1])
arma_process = ArmaProcess(ar, ma)
X = 50 + arma_process.generate_sample(nsample=1000)
y = 1.6 * X + np.random.normal(size=1000)
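# Assumed setup: response in the first column, covariate in the second,
# and the first 700 observations treated as the pre-event period
data = pd.DataFrame({"y": y, "X": X}, columns=["y", "X"])
pre_period = [0, 699]
post_period = [700, 999]
ci = CausalImpact(data, pre_period, post_period)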
print(ci.summary())
ci.plot()
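To make the idea behind this kind of analysis concrete, here is a minimal sketch of a pre/post comparison. It is not the structural time series model that causalimpact fits. It assumes simulated data, a hypothetical event at index 70 that lifts the response by 5, and a plain least-squares line fitted on the pre-event period to predict what the response would have been without the event.

```python
import random
from statistics import fmean

random.seed(0)

# Simulated data (every value here is an assumption for illustration)
n, event_start, true_effect = 100, 70, 5.0
X = [50 + random.gauss(0, 1) for _ in range(n)]
y = [1.6 * x + random.gauss(0, 0.1) for x in X]
# The hypothetical event lifts the response from index 70 onward
y = [v + true_effect if i >= event_start else v for i, v in enumerate(y)]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = fmean(xs), fmean(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum(
        (a - mx) ** 2 for a in xs
    )
    return my - slope * mx, slope


# Learn the X -> y relationship from the pre-event period only
intercept, slope = fit_line(X[:event_start], y[:event_start])

# Counterfactual: what y would have looked like if nothing had changed
predicted_post = [intercept + slope * x for x in X[event_start:]]
differences = [a - p for a, p in zip(y[event_start:], predicted_post)]
estimated_effect = fmean(differences)
print(f"Estimated average effect: {estimated_effect:.2f}")
```

The gap between the observed post-event values and the counterfactual prediction is the estimated effect; causalimpact additionally reports uncertainty around that estimate.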
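from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split

# load data
df = load_iris()
X = df.data
y = df.target
# split into train and test sets (random_state=0 is an assumed choice)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# hyperparameter tuning: make_pipe is the pipeline, grid_params its parameter grid
grid = GridSearchCV(make_pipe, grid_params, cv=5)
grid.fit(X_train, y_train)
# predict
y_pred = grid.predict(X_test)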
The estimator is now the entire pipeline instead of just the machine
learning model.
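As a minimal end-to-end sketch of tuning a whole pipeline, the example below assumes the pipeline is a StandardScaler followed by an SVC and uses a small, illustrative parameter grid (both are assumptions, not necessarily the original setup). Parameters of a pipeline step are addressed as `<step name>__<parameter>`, and make_pipeline names each step after its lowercased class name.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Assumed pipeline: scale the features, then fit a support vector classifier
make_pipe = make_pipeline(StandardScaler(), SVC())

# Illustrative grid; "svc__C" means parameter C of the step named "svc"
grid_params = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.1, 1]}

grid = GridSearchCV(make_pipe, grid_params, cv=5)
grid.fit(X_train, y_train)
y_pred = grid.predict(X_test)

print(grid.best_params_)
print(type(grid.best_estimator_).__name__)
```

Because the scaler is part of the estimator, it is refit on each training fold during cross-validation, so no information from the validation fold leaks into the scaling.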
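squared=False: Get RMSE from Sklearn’s mean_squared_error method
To get the root mean squared error (RMSE) with sklearn, pass
squared=False to sklearn’s mean_squared_error method. Note that the
squared parameter is deprecated as of scikit-learn 1.4, which adds a
dedicated root_mean_squared_error function.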
from sklearn.metrics import mean_squared_error

y_actual = [1, 2, 3]
y_predicted = [1.5, 2.5, 3.5]
rmse = mean_squared_error(y_actual, y_predicted, squared=False)
rmse
0.5
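To see where this number comes from, the same value can be computed by hand with the standard library. This is only a sketch of the definition (the square root of the mean of the squared errors), not how sklearn implements it.

```python
import math

y_actual = [1, 2, 3]
y_predicted = [1.5, 2.5, 3.5]

# Mean of the squared errors (MSE)
mse = sum((a - p) ** 2 for a, p in zip(y_actual, y_predicted)) / len(y_actual)

# RMSE is the square root of the MSE, so it is in the same units as the target
rmse = math.sqrt(mse)
print(rmse)  # 0.5
```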
modelkit: Build Production ML Systems in Python
!pip install modelkit textblob
from modelkit import Model, ModelLibrary
from textblob import TextBlob, WordList

noun_extractor = NounPhraseExtractor()
noun_extractor("What are your learning strategies?")
WordList(['learning strategies'])
You can also create test cases for your model and make sure all test
cases are passed.
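class NounPhraseExtractor(Model):
    # Name under which the model is registered
    CONFIGURATIONS = {"noun_phrase_extractor": {}}
    TEST_CASES = [
        {"item": "There is a red apple on the tree", "result": WordList(["red apple"])}
    ]

    # Assumed prediction logic: return TextBlob's noun phrases
    def _predict(self, text):
        return TextBlob(text).noun_phrases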
noun_extractor = NounPhraseExtractor()
noun_extractor.test()
TEST 1: SUCCESS
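The pattern of storing test cases on the model class can be sketched without any dependencies. TinyModel and WordCounter below are hypothetical names invented for illustration, and this is not how modelkit implements its test method; it only shows the idea of pairing each input item with an expected result and checking them in a loop.

```python
class TinyModel:
    """Hypothetical stand-in for a model base class (not modelkit's)."""

    TEST_CASES = []

    def _predict(self, item):
        raise NotImplementedError

    def __call__(self, item):
        return self._predict(item)

    def test(self):
        """Run every test case and report whether the prediction matches."""
        outcomes = []
        for i, case in enumerate(self.TEST_CASES, start=1):
            passed = self(case["item"]) == case["result"]
            print(f"TEST {i}: {'SUCCESS' if passed else 'FAILED'}")
            outcomes.append(passed)
        return outcomes


class WordCounter(TinyModel):
    # Each case pairs an input item with the result the model should return
    TEST_CASES = [
        {"item": "There is a red apple on the tree", "result": 8},
        {"item": "", "result": 0},
    ]

    def _predict(self, item):
        return len(item.split())


results = WordCounter().test()
```

Keeping the cases next to the prediction logic means the expected behavior travels with the model wherever it is loaded.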
model_collections = ModelLibrary(models=[NounPhraseExtractor, SentimentAnalyzer])
noun_extractor = model_collections.get("noun_phrase_extractor")
noun_extractor("What are your learning strategies?")
WordList(['learning strategies'])
sentiment_analyzer = model_collections.get("sentiment_analyzer")
sentiment_analyzer("Today is a beautiful day!")
Sentiment(polarity=1.0, subjectivity=1.0)
Link to modelkit.
Decompose high dimensional data into two or three dimensions
!pip install yellowbrick
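from sklearn.tree import DecisionTreeClassifier
from yellowbrick.datasets import load_credit
from yellowbrick.model_selection import FeatureImportances

X, y = load_credit()
classes = ["account in default", "current with bills"]
model = DecisionTreeClassifier()
viz = FeatureImportances(model)
viz.fit(X, y)
viz.show();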
From the plot above, light appears to be the most important feature
for DecisionTreeClassifier, followed by CO2 and temperature.
Link to Yellowbrick.
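from mlxtend.classifier import EnsembleVoteClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Initializing Classifiers
clf1 = LogisticRegression(random_state=0)
clf2 = RandomForestClassifier(random_state=0)
clf3 = SVC(random_state=0, probability=True)
eclf = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], weights=[2, 1, 1], voting='soft')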
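The effect of weights=[2, 1, 1] with voting='soft' can be sketched in plain Python. Soft voting averages the class probabilities predicted by each classifier, weighted by the given weights, and picks the class with the highest average. The probabilities below are made up for illustration, and soft_vote is a hypothetical helper, not part of mlxtend.

```python
# Hypothetical class probabilities from three classifiers for one sample
# (columns: class 0, class 1); the numbers are invented for illustration
probas = [
    [0.2, 0.8],  # clf1
    [0.6, 0.4],  # clf2
    [0.8, 0.2],  # clf3
]
weights = [2, 1, 1]


def soft_vote(probas, weights):
    """Return (winning class index, weighted average probability per class)."""
    total = sum(weights)
    n_classes = len(probas[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    return averaged.index(max(averaged)), averaged


label, averaged = soft_vote(probas, weights)
equal_label, _ = soft_vote(probas, [1, 1, 1])
print(label, equal_label)
```

With equal weights the same probabilities favor class 0, so giving the first classifier twice the weight is what tips the vote to class 1 here.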