Eduonix - Model Development Production Case Study - Part 2 CODE - Jupyter Notebook
In [2]: df_penguins=sns.load_dataset("penguins")
In [3]: df_penguins.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 344 entries, 0 to 343
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 species 344 non-null object
1 island 344 non-null object
2 bill_length_mm 342 non-null float64
3 bill_depth_mm 342 non-null float64
4 flipper_length_mm 342 non-null float64
5 body_mass_g 342 non-null float64
6 sex 333 non-null object
dtypes: float64(4), object(3)
memory usage: 18.9+ KB
localhost:8888/notebooks/Desktop/Eduonix_Model Development Production Case Study - Part 2 - from Desktop/Eduonix_Model Development Produ… 1/15
4/7/2021 Eduonix_Model Development Production Case Study - Part 2 CODE - Jupyter Notebook
In [4]: df_penguins.head(10)
Out[4]:
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
Out[5]:
bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
Out[6]:
species island sex
unique 3 3 2
In [7]: #looks like we are missing the sex for a few penguins
#since we still have a fairly large dataset, let's remove these rows to avoid
#odd visualizations
df_penguins = df_penguins[df_penguins["sex"].notnull()]
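The row filter above can be tried on a small made-up frame. This is only a sketch: the `toy` frame and its values are hypothetical and are not the penguins data. It also shows `dropna(subset=...)`, which should give the same rows as the boolean mask.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_penguins; the values are made up for illustration.
toy = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap", "Adelie"],
    "body_mass_g": [3750.0, 5000.0, 3700.0, np.nan],
    "sex": ["Male", None, "Female", "Female"],
})

# Boolean-mask version, as in the cell above.
masked = toy[toy["sex"].notnull()]

# Equivalent: drop only the rows whose "sex" value is missing.
dropped = toy.dropna(subset=["sex"])

# Count the missing values that remain in each column.
remaining_nulls = dropped.isnull().sum()
print(remaining_nulls)
```

Filtering on `sex` alone leaves missing values in other columns untouched, so a per-column null count afterwards is a useful check.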
Out[8]: species 0
island 0
bill_length_mm 0
bill_depth_mm 0
flipper_length_mm 0
body_mass_g 0
sex 0
dtype: int64
In [10]: df_penguins.columns
In [13]: sns.set(color_codes=True)
#choosing a color palette from seaborn
#options: pastel, muted, bright, deep, colorblind, dark
#tons of other ways to do this as well!
colors = sns.color_palette("bright")
#first let's set up our plotting area; a 2x2 grid gives us 4 potential spots to plot
fig,axes = plt.subplots(2,2, figsize = (10,10))
In [14]: sns.set(color_codes=True)
#choosing a color palette from seaborn
#options: pastel, muted, bright, deep, colorblind, dark
#tons of other ways to do this as well!
colors = sns.color_palette("bright")
#first let's set up our plotting area; a 2x2 grid gives us 4 potential spots to plot
fig,axes = plt.subplots(2,2, figsize = (10,10))
#first let's set up our plotting area; a 1x3 grid gives us 3 potential spots to plot
fig,axes = plt.subplots(1,3, figsize = (10,6))
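The number of plotting spots follows directly from the grid shape passed to `plt.subplots`. A minimal sketch, assuming a headless `Agg` backend so that it runs without a display (that backend choice is an assumption and is not in the notebook):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# A 2x2 grid yields a 2-D array holding 4 Axes objects.
fig_a, axes_a = plt.subplots(2, 2, figsize=(10, 10))
print(axes_a.shape)

# A 1x3 grid yields a 1-D array holding 3 Axes objects.
fig_b, axes_b = plt.subplots(1, 3, figsize=(10, 6))
print(axes_b.shape)

# Each plot is directed to a slot by indexing into the array.
axes_a[0, 1].set_title("top-right panel")
axes_b[2].set_title("third panel")

plt.close("all")
```

With seaborn, the chosen slot is usually passed through the `ax=` argument of the plotting function.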
#not too concerned about the islands for our purposes today
#then looking closer at the species column, to create a new sample data set that
In [19]: df_penguins["species"].value_counts()
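In [20]: #looks like we have 68 chinstrap, so let's get 68 randomly selected from the other
adelie = df_penguins[df_penguins["species"] == "Adelie"].sample(n=68)
gentoo = df_penguins[df_penguins["species"] == "Gentoo"].sample(n=68)
#assumed: all Chinstrap rows are kept (this definition is not visible in the export)
chinstrap = df_penguins[df_penguins["species"] == "Chinstrap"]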
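The per-species downsampling can also be written as a single grouped sample. A minimal sketch on a made-up frame (the `toy` data, the `n_min` name and the `random_state` value are all assumptions for illustration):

```python
import pandas as pd

# Hypothetical, imbalanced stand-in for df_penguins.
toy = pd.DataFrame({
    "species": ["Adelie"] * 6 + ["Gentoo"] * 5 + ["Chinstrap"] * 3,
    "body_mass_g": list(range(14)),
})

# Size of the smallest class; every class is downsampled to this count.
n_min = toy["species"].value_counts().min()

# Draw n_min rows from each species; random_state makes the draw repeatable.
balanced = toy.groupby("species").sample(n=n_min, random_state=0)

print(balanced["species"].value_counts())
```

Passing `random_state` keeps the balanced sample identical across reruns; the `.sample(n=68)` calls in the notebook draw a different sample each time.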
Out[21]: pandas.core.frame.DataFrame
In [22]: #now we need to merge these together for our new dataframe
#axis = 0 implies a vertical concat
new_peng = pd.concat([adelie, gentoo, chinstrap], axis = 0)
In [23]: new_peng.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 204 entries, 37 to 192
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 species 204 non-null object
1 bill_length_mm 204 non-null float64
2 bill_depth_mm 204 non-null float64
3 flipper_length_mm 204 non-null float64
4 body_mass_g 204 non-null float64
5 sex 204 non-null object
dtypes: float64(4), object(2)
memory usage: 11.2+ KB
In [24]: #now the last thing we need to do is our one hot encoding of the variables "sex",
#that means we'll do this in portions, then merge into our final dataframe we wil
peng_sex = pd.get_dummies(new_peng["sex"])
#dropping sex column from new_peng now before putting the one hot encoded column
new_peng = new_peng.drop(columns="sex")
In [25]: final_df
Out[25]:
species bill_length_mm bill_depth_mm flipper_length_mm body_mass_g Female Male
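The export does not show how `final_df` was assembled from the two pieces. One plausible route, sketched on a made-up frame, is a column-wise `pd.concat`; that choice and the `toy` data are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for new_peng.
toy = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap"],
    "body_mass_g": [3750.0, 5000.0, 3700.0],
    "sex": ["Male", "Female", "Female"],
})

# One indicator column per category, named after the category values.
sex_dummies = pd.get_dummies(toy["sex"])

# Drop the original column, then attach the indicators side by side (axis=1).
encoded = pd.concat([toy.drop(columns="sex"), sex_dummies], axis=1)

print(encoded.columns.tolist())
```

Because both pieces share the same index, the column-wise concat lines the rows up without any reordering.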
In [26]: #random state, pick a #, ensures that if we go back and re execute code that the
#this creates our two dataframes to pull from to build models, and then unbiasedly
#we use a very high test size of 0.5, this data is actually pretty easy to classi
train_df, test_df = train_test_split(final_df, test_size=0.5, random_state=32)
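The effect of `random_state` can be checked directly: two calls with the same seed should return the same rows. The sketch below uses a made-up frame. It also shows the optional `stratify` argument, which the notebook does not use, as one way to keep the species balanced in both halves.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical balanced stand-in for final_df.
toy = pd.DataFrame({
    "species": ["Adelie"] * 10 + ["Gentoo"] * 10 + ["Chinstrap"] * 10,
    "body_mass_g": [float(i) for i in range(30)],
})

# Same seed, same split: rerunning the cell reproduces the partition.
train_a, test_a = train_test_split(toy, test_size=0.5, random_state=32)
train_b, test_b = train_test_split(toy, test_size=0.5, random_state=32)

# Optional (not in the notebook): stratify keeps class proportions equal in both halves.
train_s, test_s = train_test_split(
    toy, test_size=0.5, random_state=32, stratify=toy["species"]
)

print(test_s["species"].value_counts())
```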
In [27]: #separating our testing data into two data frames, separating the species column fo
#make sure to copy the column before dropping it in the second line here
#for now we'll also drop the categorical variables, those will require additional
Y_test = test_df["species"]
Y_train = train_df["species"]
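The target/feature separation described in the comments can be sketched as follows. The frame `toy_train` and the names `X_toy` and `y_toy` are hypothetical; the notebook's own `X_train` and `X_test` cells are not visible in the export.

```python
import pandas as pd

# Hypothetical stand-in for train_df after one hot encoding.
toy_train = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap"],
    "bill_length_mm": [39.1, 46.1, 49.5],
    "Female": [0, 1, 1],
    "Male": [1, 0, 0],
})

# Copy the target first so it is kept separately from the features.
y_toy = toy_train["species"].copy()

# drop() returns a new frame; toy_train itself is left unchanged.
X_toy = toy_train.drop(columns="species")

print(X_toy.columns.tolist())
```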
Out[30]: MultinomialNB()
#The F1 score is a weighted harmonic mean of precision and recall such that the b
#are lower than accuracy measures as they embed precision and recall into their c
#be used to compare classifier models, not global accuracy.
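To make the relation between precision, recall, F1 and accuracy concrete, here is a small hand computation in plain Python. The labels are made up and the helper `per_class_scores` is hypothetical. It reports the unweighted (macro) average of the per-class F1 scores, which is one of several averaging choices.

```python
def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, f1) for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels for illustration only.
y_true = ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"]
y_pred = ["Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo", "Adelie"]

labels = sorted(set(y_true))
f1_by_class = {lab: per_class_scores(y_true, y_pred, lab)[2] for lab in labels}

# Global accuracy: share of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Macro average: every class counts equally, however rare it is.
macro_f1 = sum(f1_by_class.values()) / len(labels)

print(f1_by_class, accuracy, macro_f1)
```

In this toy case the single Chinstrap sample is never predicted, so its F1 is 0 and the macro F1 falls well below the accuracy.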
In [36]: #let's rename some things here to make this model more understandable
features = X_train.copy()
targets = Y_train.copy()
models = [
    MultinomialNB(),
    LogisticRegression(multi_class='multinomial', max_iter=10000),
    KNeighborsClassifier(),
    SVC(),
    RandomForestClassifier(),
]
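The cell that scores the `models` list is not visible in the export. One common way to build a `model_name` / `accuracy` table from such a list is a cross-validation loop, sketched below. Treat the use of `cross_val_score`, the 5-fold setting and the bundled iris data (standing in for the penguin features) as assumptions rather than the notebook's actual procedure.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data: iris ships with scikit-learn and has non-negative features,
# which MultinomialNB requires.
X_demo, y_demo = load_iris(return_X_y=True)

demo_models = [
    MultinomialNB(),
    LogisticRegression(max_iter=10000),
    KNeighborsClassifier(),
]

rows = []
for model in demo_models:
    # Mean accuracy over 5 cross-validation folds for this model.
    scores = cross_val_score(model, X_demo, y_demo, scoring="accuracy", cv=5)
    rows.append({"model_name": type(model).__name__, "accuracy": scores.mean()})

# Best model first.
results = pd.DataFrame(rows).sort_values("accuracy", ascending=False)
print(results)
```

Distance- and kernel-based models such as `KNeighborsClassifier` and `SVC` are sensitive to feature scale, so scaling the features first may change a ranking like this one.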
Out[38]:
model_name accuracy
1 LogisticRegression 0.980000
3 RandomForestClassifier 0.960952
2 MultinomialNB 0.814286
0 KNeighborsClassifier 0.744762
4 SVC 0.657619