
E-Commerce Shipping Dataset:

Product Shipment Delivered On Time or Not? Meeting E-Commerce Customer Demand
The data contains the following information:

1. ID : ID number of the customer.
2. Warehouse block : The company has a large warehouse divided into blocks A, B, C, D and F.
3. Mode of shipment : The company ships products in multiple ways: Ship, Flight and Road.
4. Customer care calls : The number of calls the customer made to enquire about the shipment.
5. Customer rating : The rating given by each customer; 1 is the lowest (worst), 5 is the highest (best).
6. Cost of the product : Cost of the product in US dollars.
7. Prior purchases : The number of prior purchases.
8. Product importance : The company categorizes products into low, medium and high importance.
9. Gender : Male or Female.
10. Discount offered : Discount offered on that specific product.
11. Weight in gms : The weight of the product in grams.
12. Reached on time : The target variable, where 1 indicates that the product has NOT reached on
time and 0 indicates that it has.

Data Pre-processing

Load & Describe Data

Import library
In [53]: import numpy as np
import pandas as pd

#Visualization
import matplotlib.pyplot as plt
import matplotlib.style as style
style.use('fivethirtyeight') # use style fivethirtyeight
import seaborn as sns
from matplotlib import rcParams
import warnings
warnings.filterwarnings("ignore")

# Scaling
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Selection
from scipy.stats import chi2_contingency

# Splitting the data into Train and Test


from sklearn.model_selection import train_test_split
# Algorithm
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Evaluation metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import roc_curve, auc, confusion_matrix, classification_report

# Hyperparameter tuning
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV

Import file
In [54]: df = pd.read_csv(r"C:\Users\HP\Downloads\Train.csv")

Rename target column


In [55]: df.rename(columns={'Reached.on.Time_Y.N':'is_late'}, inplace=True)
df.head()

Out[55]:    ID Warehouse_block Mode_of_Shipment Customer_care_calls Customer_rating Cost_of_the_Product ...
         0   1               D           Flight                   4               2                 177 ...
         1   2               F           Flight                   4               5                 216 ...
         2   3               A           Flight                   2               2                 183 ...
         3   4               B           Flight                   3               3                 176 ...
         4   5               C           Flight                   2               2                 184 ...

The target's original name, 'Reached.on.Time_Y.N', is unwieldy, so we shorten it to is_late to ease the next steps.

Get the shape of dataset


In [56]: df.shape

Out[56]: (10999, 12)

Get list of columns


In [57]: df.columns

Out[57]: Index(['ID', 'Warehouse_block', 'Mode_of_Shipment', 'Customer_care_calls',
                'Customer_rating', 'Cost_of_the_Product', 'Prior_purchases',
                'Product_importance', 'Gender', 'Discount_offered', 'Weight_in_gms',
                'is_late'],
               dtype='object')

Change all column names to lower case


In [58]: df.columns = df.columns.str.lower()

Get dataset information


In [59]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10999 entries, 0 to 10998
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 10999 non-null int64
1 warehouse_block 10999 non-null object
2 mode_of_shipment 10999 non-null object
3 customer_care_calls 10999 non-null int64
4 customer_rating 10999 non-null int64
5 cost_of_the_product 10999 non-null int64
6 prior_purchases 10999 non-null int64
7 product_importance 10999 non-null object
8 gender 10999 non-null object
9 discount_offered 10999 non-null int64
10 weight_in_gms 10999 non-null int64
11 is_late 10999 non-null int64
dtypes: int64(8), object(4)
memory usage: 1.0+ MB

In [60]: df.describe()

Out[60]:                id  customer_care_calls  customer_rating  cost_of_the_product  prior_purchases  discount_offered ...
         count  10999.00000         10999.000000     10999.000000         10999.000000     10999.000000      10999.000000 ...
         mean    5500.00000             4.054459         2.990545           210.196836         3.567597         13.373216 ...
         std     3175.28214             1.141490         1.413603            48.063272         1.522860         16.205527 ...
         min        1.00000             2.000000         1.000000            96.000000         2.000000          1.000000 ...
         25%     2750.50000             3.000000         2.000000           169.000000         3.000000          4.000000 ...
         50%     5500.00000             4.000000         3.000000           214.000000         3.000000          7.000000 ...
         75%     8249.50000             5.000000         4.000000           251.000000         4.000000         10.000000 ...
         max    10999.00000             7.000000         5.000000           310.000000        10.000000         65.000000 ...

Based on the information above :

1. The dataframe has 10999 rows and 12 columns.
2. No missing values are found.
3. There are only 2 data types, integer and object.
4. The classification target is is_late; all other columns are features.

Separate numeric & categorical column


In [61]: # Categorical data
         categorical = ['warehouse_block', 'mode_of_shipment', 'product_importance',
                        'gender', 'is_late', 'customer_rating']
         # Numerical data
         numeric = ['customer_care_calls', 'cost_of_the_product', 'prior_purchases',
                    'discount_offered', 'weight_in_gms']
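The lists above are written out by hand. As a side note (not in the original notebook), pandas can derive the same split from the dtypes; a minimal sketch, assuming the lower-cased df from above, where the int64-typed customer_rating and is_late are moved to the categorical side manually:

         # Derive the column split from dtypes instead of hand-written lists
         numeric_auto = df.select_dtypes(include='number').columns.tolist()
         categorical_auto = df.select_dtypes(include='object').columns.tolist()

         # customer_rating (ordinal) and is_late (binary target) are stored as int64,
         # so a purely dtype-based split would call them numeric; move them manually
         for col in ['customer_rating', 'is_late']:
             numeric_auto.remove(col)
             categorical_auto.append(col)
         numeric_auto.remove('id')  # the ID column carries no predictive signal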

Data Cleansing & Feature Engineering


Copy dataset

In [62]: df_dt = df.copy()

Identify missing values

In [63]: df_dt.isna().values.any() # Missing value detection

Out[63]: False

In [64]: df_dt.isna().sum() # Count missing values per column

Out[64]: id                     0
         warehouse_block        0
         mode_of_shipment       0
         customer_care_calls    0
         customer_rating        0
         cost_of_the_product    0
         prior_purchases        0
         product_importance     0
         gender                 0
         discount_offered       0
         weight_in_gms          0
         is_late                0
         dtype: int64

This simply confirms that no missing values are present.

Identify duplicated values

In [65]: # Select all duplicate rows based on all columns
         df_dt[df_dt.duplicated(keep=False)]

Out[65]: id warehouse_block mode_of_shipment customer_care_calls customer_rating cost_of_the_product ...
         (0 rows)

In [66]: # Select all duplicate rows based on the id column
         df_dt[df_dt.duplicated(subset=['id'], keep=False)]

Out[66]: id warehouse_block mode_of_shipment customer_care_calls customer_rating cost_of_the_product ...
         (0 rows)

Fortunately, there are no duplicated rows in the dataframe.

Identify outliers

In [67]: # Identify outliers using boxplots
         plt.figure(figsize=(20,8))
         for i in range(0, len(numeric)):
             plt.subplot(1, len(numeric), i+1)
             sns.boxplot(y=df_dt[numeric[i]], color='orange')
         plt.tight_layout()

In [68]: # Identify outliers using the IQR rule
         for col in numeric:

             # Compute the IQR
             Q1 = df_dt[col].quantile(0.25)
             Q3 = df_dt[col].quantile(0.75)
             IQR = Q3 - Q1

             # Define values and limits
             nilai_min = df_dt[col].min()
             nilai_max = df_dt[col].max()
             lower_lim = Q1 - (1.5*IQR)
             upper_lim = Q3 + (1.5*IQR)

             # Identify low outliers
             if (nilai_min < lower_lim):
                 print('Low outlier is found in column', col, '<', lower_lim, '\n')
                 # Display the number of low outliers
                 print('Total of Low Outlier in column', col, ':', len(df_dt[df_dt[col] < lower_lim]))
             # Identify high outliers
             elif (nilai_max > upper_lim):
                 print('High outlier is found in column', col, '>', upper_lim, '\n')
                 # Display the number of high outliers
                 print('Total of High Outlier in column', col, ':', len(df_dt[df_dt[col] > upper_lim]))
             else:
                 print('Outlier is not found in column', col, '\n')

Outlier is not found in column customer_care_calls

Outlier is not found in column cost_of_the_product

High outlier is found in column prior_purchases > 5.5

Total of High Outlier in column prior_purchases : 1003

High outlier is found in column discount_offered > 19.0

Total of High Outlier in column discount_offered : 2209

Outlier is not found in column weight_in_gms

We found outliers in discount_offered and prior_purchases, together affecting almost 30% of the rows.

In [69]: # Handle the outliers by replacing them with the upper_bound or lower_bound
         for col in ['prior_purchases', 'discount_offered']:
             # Initiate Q1
             Q1 = df_dt[col].quantile(0.25)
             # Initiate Q3
             Q3 = df_dt[col].quantile(0.75)
             # Initiate IQR
             IQR = Q3 - Q1
             # Initiate lower_bound & upper_bound
             lower_bound = Q1 - (IQR * 1.5)
             upper_bound = Q3 + (IQR * 1.5)

             # Filter the outliers & replace them with upper_bound or lower_bound
             df_dt[col] = np.where(df_dt[col] >= upper_bound, upper_bound, df_dt[col])
             df_dt[col] = np.where(df_dt[col] <= lower_bound, lower_bound, df_dt[col])

In [70]: # Re-check using boxplots
         plt.figure(figsize=(20,8))
         for i in range(0, len(numeric)):
             plt.subplot(1, len(numeric), i+1)
             sns.boxplot(y=df_dt[numeric[i]], color='orange')
         plt.tight_layout()

In [71]: sns.boxplot(y=df_dt['prior_purchases'], color='orange', orient='h');

In [72]: sns.boxplot(y=df_dt['discount_offered'], color='orange', orient='h');

We did not remove the outliers; instead we capped them at the upper and lower bounds. As the
boxplots above show, no outliers remain.
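Capping at the IQR fences like this is a form of winsorizing. As a side note beyond the original notebook, pandas' Series.clip can replace the two np.where calls; a minimal equivalent sketch:

         # Equivalent capping with Series.clip: values below the lower fence are
         # raised to it, values above the upper fence are lowered to it
         for col in ['prior_purchases', 'discount_offered']:
             Q1, Q3 = df_dt[col].quantile([0.25, 0.75])
             IQR = Q3 - Q1
             df_dt[col] = df_dt[col].clip(lower=Q1 - 1.5 * IQR, upper=Q3 + 1.5 * IQR)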

Feature Transformation : Log transform

In [73]: # Check the data distribution
         plt.figure(figsize=(20,5))
         for i in range(0, len(numeric)):
             plt.subplot(1, len(numeric), i+1)
             sns.distplot(df_dt[numeric[i]], color='orange')
         plt.tight_layout()

In [74]: # Apply a log transformation (log of x+1, so it is defined for zero values)
         for col in numeric:
             df_dt[col] = (df_dt[col] + 1).apply(np.log)

In [75]: # Visualize after the log transformation
         plt.figure(figsize=(20,5))
         for i in range(0, len(numeric)):
             plt.subplot(1, len(numeric), i+1)
             sns.distplot(df_dt[numeric[i]], color='orange')
         plt.tight_layout()
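The (x+1)-then-log pattern above is exactly what np.log1p computes, with better numerical behaviour near zero; a one-line equivalent that could be used instead of the loop (not in addition to it):

         # np.log1p(x) == np.log(x + 1), vectorized over all numeric columns at once
         df_dt[numeric] = np.log1p(df_dt[numeric])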

Feature Scaling : Standardization

In [76]: # Apply standardization
         for col in numeric:
             df_dt[col] = StandardScaler().fit_transform(df_dt[col].values.reshape(len(df_dt), 1))

In [77]: df_dt.describe()

Out[77]:                id  customer_care_calls  customer_rating  cost_of_the_product  prior_purchases  discount_offered ...
         count  10999.00000         1.099900e+04     10999.000000         1.099900e+04     1.099900e+04      1.099900e+04 ...
         mean    5500.00000        -6.375198e-15         2.990545        -6.128706e-15    -3.886356e-15      -5.67272...
         std     3175.28214         1.000045e+00         1.413603         1.000045e+00     1.000045e+00       1.000045...
         min        1.00000        -2.177630e+00         1.000000        -3.086779e+00    -1.387728e+00      -1.947658...
         25%     2750.50000        -9.146463e-01         2.000000        -7.773403e-01    -2.636003e-01      -6.23428...
         50%     5500.00000         6.500001e-02         3.000000         1.892606e-01    -2.636003e-01       5.58237...
         75%     8249.50000         8.654293e-01         4.000000         8.428454e-01     6.083412e-01       5.16055...
         max    10999.00000         2.128413e+00         5.000000         1.708704e+00     1.633539e+00       1.380053...

As expected after standardization, each transformed column now has a mean of roughly 0 and a
standard deviation of roughly 1.

Feature Selection : Chi squared method

In [78]: # Selection for categorical features

         # Import module
         from scipy.stats import chi2_contingency

         category = ['warehouse_block', 'mode_of_shipment', 'product_importance',
                     'gender', 'customer_rating']
         chi2_check = []
         # Iterate over the categorical columns
         for col in category:
             # If p-value < 0.05, the feature is associated with the target
             if chi2_contingency(pd.crosstab(df_dt['is_late'], df_dt[col]))[1] < 0.05:
                 chi2_check.append('Reject Null Hypothesis')
             # If p-value >= 0.05
             else:
                 chi2_check.append('Fail to Reject Null Hypothesis')

         # Put the result into a dataframe
         res = pd.DataFrame(data=[category, chi2_check]).T
         # Rename columns
         res.columns = ['Column', 'Hypothesis']
         res

Out[78]:                Column                      Hypothesis
         0     warehouse_block  Fail to Reject Null Hypothesis
         1    mode_of_shipment  Fail to Reject Null Hypothesis
         2  product_importance          Reject Null Hypothesis
         3              gender  Fail to Reject Null Hypothesis
         4     customer_rating  Fail to Reject Null Hypothesis

In [79]: # Post-hoc check with Bonferroni-adjusted p-values

         # Initiate an empty dictionary
         check = {}
         # Iterate over the columns that rejected the null hypothesis
         for i in res[res['Hypothesis'] == 'Reject Null Hypothesis']['Column']:
             # One-hot encode the product_importance column
             dummies = pd.get_dummies(df_dt[i])
             # Bonferroni-adjusted threshold: 0.05 / number of categories
             # (here 0.05 / 3 ≈ 0.0167)
             bon_p_value = 0.05 / df_dt[i].nunique()
             for series in dummies:
                 if chi2_contingency(pd.crosstab(df_dt['is_late'], dummies[series]))[1] < bon_p_value:
                     check['{}-{}'.format(i, series)] = 'Reject Null Hypothesis'
                 else:
                     check['{}-{}'.format(i, series)] = 'Fail to Reject Null Hypothesis'
         # Put the result into a dataframe
         res_chi_ph = pd.DataFrame(data=[check.keys(), check.values()]).T
         # Rename the columns
         res_chi_ph.columns = ['Pair', 'Hypothesis']
         res_chi_ph

Out[79]:                        Pair                      Hypothesis
         0    product_importance-high          Reject Null Hypothesis
         1     product_importance-low  Fail to Reject Null Hypothesis
         2  product_importance-medium  Fail to Reject Null Hypothesis

From the result above, only the high category of product_importance has a significant association with our target.

Feature Encoding : One hot encoding

In [80]: # One-hot encode product_importance and keep only the high category
         onehots = pd.get_dummies(df_dt['product_importance'], prefix='product_importance')
         df_dt = df_dt.join(onehots)

         # Drop all categorical columns & 'id', except product_importance_high
         df_dt.drop(columns=['warehouse_block', 'gender', 'mode_of_shipment',
                             'product_importance', 'product_importance_low',
                             'product_importance_medium', 'id'], inplace=True)
         # Check the dataframe after encoding
         df_dt.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10999 entries, 0 to 10998
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 customer_care_calls 10999 non-null float64
1 customer_rating 10999 non-null int64
2 cost_of_the_product 10999 non-null float64
3 prior_purchases 10999 non-null float64
4 discount_offered 10999 non-null float64
5 weight_in_gms 10999 non-null float64
6 is_late 10999 non-null int64
7 product_importance_high 10999 non-null uint8
dtypes: float64(5), int64(2), uint8(1)
memory usage: 612.4 KB

Exploratory Data Analysis (EDA)


In [81]: # Copy dataset
df_eda = df.copy()

Target Visualization
In [82]: delay = pd.DataFrame(df_eda.groupby(['is_late'])['id'].count()/len(df_eda)).reset_index()
         plt.pie(delay['id'], labels=delay['is_late'], autopct='%.2f%%');

The target classes look reasonably balanced (roughly 60% late vs 40% on time).

Descriptive Statistic

Categorical values

In [83]: # For each categorical column
         for col in categorical:
             print('Value counts for column', col, ':')
             print(df_eda[col].value_counts())
             print()

Value counts for column warehouse_block :
F    3666
D    1834
A    1833
B    1833
C    1833
Name: warehouse_block, dtype: int64

Value counts for column mode_of_shipment :
Ship      7462
Flight    1777
Road      1760
Name: mode_of_shipment, dtype: int64

Value counts for column product_importance :
low       5297
medium    4754
high       948
Name: product_importance, dtype: int64

Value counts for column gender :
F    5545
M    5454
Name: gender, dtype: int64

Value counts for column is_late :
1    6563
0    4436
Name: is_late, dtype: int64

Value counts for column customer_rating :
3    2239
1    2235
4    2189
5    2171
2    2165
Name: customer_rating, dtype: int64

In [84]: # Plot categorical columns
         for col in categorical:
             plt.figure(figsize=(15, 5))

             plt.subplot(141)
             sns.countplot(df_eda[col], palette='colorblind', orient='v')
             plt.title('Countplot')
             plt.tight_layout()

             plt.subplot(143)
             df_eda[col].value_counts().plot.pie(autopct='%1.2f%%')
             plt.title('Pie chart')
             plt.legend()
Summary :

Warehouse_block has 5 unique values and is dominated by block F.
Mode_of_shipment has 3 unique values; Ship is used the most.
Product_importance has 3 unique values; most products have low priority.
Female customers shop slightly more often than male customers.

Numeric values

In [85]: df_eda[numeric].describe()

Out[85]:        customer_care_calls  cost_of_the_product  prior_purchases  discount_offered  weight_in_gms
         count         10999.000000         10999.000000     10999.000000      10999.000000   10999.000000
         mean              4.054459           210.196836         3.567597         13.373216    3634.016729
         std               1.141490            48.063272         1.522860         16.205527    1635.377251
         min               2.000000            96.000000         2.000000          1.000000    1001.000000
         25%               3.000000           169.000000         3.000000          4.000000    1839.500000
         50%               4.000000           214.000000         3.000000          7.000000    4149.000000
         75%               5.000000           251.000000         4.000000         10.000000    5050.000000
         max               7.000000           310.000000        10.000000         65.000000    7846.000000

Summary :
The distributions of customer_care_calls, customer_rating, cost_of_the_product and
prior_purchases look roughly normal, because the mean and the median are close, while
discount_offered and weight_in_gms appear skewed.

Correlation Heatmap

In [86]: plt.figure(figsize=(7,6));
sns.heatmap(df_eda.corr(), annot = True, fmt = '.2f', cmap = 'Reds');

Based on the correlation heatmap above:

1. The target is_late has a moderate positive correlation with discount_offered and a weak negative
correlation with weight_in_gms.
2. The feature customer_care_calls has a weak positive correlation with cost_of_the_product and a
negative correlation with weight_in_gms.
3. The feature discount_offered has a moderate negative correlation with weight_in_gms.

Categorical - Categorical

Based on Gender

In [87]: i = 1
         plt.figure(figsize=(15,10))
         for col in ['mode_of_shipment', 'warehouse_block', 'product_importance']:
             plt.subplot(2, 2, i)
             sns.countplot(df_eda[col], hue=df_eda['gender'], palette="ch:.25")
             i += 1

Summary :

In most warehouse blocks, parcels from female customers outnumber those from male customers,
except in warehouse_block B.

Based on Product Importance

In [88]: i = 1
         plt.figure(figsize=(15,10))
         for col in ['mode_of_shipment', 'warehouse_block']:
             plt.subplot(2, 2, i)
             sns.countplot(df_eda[col], hue=df_eda['product_importance'], palette="ch:.25")
             i += 1

Summary :

Most parcels, whether high or low priority, are shipped by Ship.

Warehouse block - Mode of Shipment

In [89]: sns.catplot(x="warehouse_block", kind="count", hue='mode_of_shipment',
                     palette="ch:.25", data=df_eda);

Based on target 'is_late'

In [90]: i = 1
         plt.figure(figsize=(15,10))
         for col in ['mode_of_shipment', 'warehouse_block', 'product_importance',
                     'gender', 'customer_rating']:
             plt.subplot(2, 3, i)
             sns.countplot(df_eda[col], hue=df_eda['is_late'], palette="ch:.25")
             i += 1
         plt.legend(['on_time', 'late']);

Summary :

Most parcels are stored in warehouse_block F.
Shipping by Ship contributes the most late deliveries.
In every shipment priority class, most parcels are delivered late.
Machine Learning Modelling & Evaluation
Separate feature & target column
In [91]: # Initiate the feature matrix & target
         X = df_dt.drop(columns='is_late')
         y = df_dt['is_late']

Split train & test data


In [92]: # Split train & test data
         Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.30, random_state=42)
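A side note that is not in the original notebook: since the classes split roughly 60/40, passing stratify=y preserves that ratio exactly in both partitions; a minimal sketch:

         # Stratified variant: train and test keep the same late/on-time proportions
         Xtrain, Xtest, ytrain, ytest = train_test_split(
             X, y, test_size=0.30, random_state=42, stratify=y)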

Fit & Evaluation Model


In [93]: # Create a function to fit a model & evaluate it
         def fit_evaluation(Model, Xtrain, ytrain, Xtest, ytest):
             model = Model # initiate the model
             model.fit(Xtrain, ytrain) # fit the model
             y_pred = model.predict(Xtest)
             y_pred_train = model.predict(Xtrain)
             train_score = model.score(Xtrain, ytrain) # Train accuracy
             test_score = model.score(Xtest, ytest) # Test accuracy
             fpr, tpr, thresholds = roc_curve(ytest, y_pred, pos_label=1)
             AUC = auc(fpr, tpr) # AUC
             return round(train_score,2), round(test_score,2), round(precision_score(ytest, y_pred),2), \
                    round(recall_score(ytrain, y_pred_train),2), round(recall_score(ytest, y_pred),2), \
                    round(f1_score(ytest, y_pred),2), round(AUC,2)
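One caveat worth noting, beyond the original notebook: the AUC above is computed from hard 0/1 predictions, which collapses the ROC curve to a single point. Computing it from predicted probabilities gives the conventional AUC; a minimal sketch, assuming the fitted model exposes predict_proba (as LogisticRegression, DecisionTreeClassifier, RandomForestClassifier and KNeighborsClassifier do):

         from sklearn.metrics import roc_auc_score

         # AUC from class probabilities rather than hard labels
         def auc_from_proba(model, Xtest, ytest):
             proba = model.predict_proba(Xtest)[:, 1]  # probability of the late class
             return roc_auc_score(ytest, proba)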

Default Parameter

In [94]: # Initiate the algorithms
         lr = LogisticRegression(random_state=42)
         dt = DecisionTreeClassifier(random_state=42)
         rf = RandomForestClassifier(random_state=42)
         knn = KNeighborsClassifier(n_neighbors=5)
         svc = SVC(random_state=42)

         # Create a function to collect the results as a dataframe
         def model_comparison_default(X, y):

             # Logistic Regression
             lr_train_score, lr_test_score, lr_pr, lrtr_re, lrte_re, lr_f1, lr_auc = fit_evaluation(
                 lr, Xtrain, ytrain, Xtest, ytest)
             # Decision Tree
             dt_train_score, dt_test_score, dt_pr, dttr_re, dtte_re, dt_f1, dt_auc = fit_evaluation(
                 dt, Xtrain, ytrain, Xtest, ytest)
             # Random Forest
             rf_train_score, rf_test_score, rf_pr, rftr_re, rfte_re, rf_f1, rf_auc = fit_evaluation(
                 rf, Xtrain, ytrain, Xtest, ytest)
             # KNN
             knn_train_score, knn_test_score, knn_pr, knntr_re, knnte_re, knn_f1, knn_auc = fit_evaluation(
                 knn, Xtrain, ytrain, Xtest, ytest)
             # SVC
             svc_train_score, svc_test_score, svc_pr, svctr_re, svcte_re, svc_f1, svc_auc = fit_evaluation(
                 svc, Xtrain, ytrain, Xtest, ytest)

             models = ['Logistic Regression', 'Decision Tree', 'Random Forest',
                       'KNN', 'SVC']
             train_score = [lr_train_score, dt_train_score, rf_train_score,
                            knn_train_score, svc_train_score]
             test_score = [lr_test_score, dt_test_score, rf_test_score,
                           knn_test_score, svc_test_score]
             precision = [lr_pr, dt_pr, rf_pr, knn_pr, svc_pr]
             recall_train = [lrtr_re, dttr_re, rftr_re, knntr_re, svctr_re]
             recall_test = [lrte_re, dtte_re, rfte_re, knnte_re, svcte_re]
             f1 = [lr_f1, dt_f1, rf_f1, knn_f1, svc_f1]
             auc = [lr_auc, dt_auc, rf_auc, knn_auc, svc_auc]

             model_comparison = pd.DataFrame(data=[models, train_score, test_score,
                                                   precision, recall_train, recall_test,
                                                   f1, auc]).T.rename({0: 'Model',
                                                                       1: 'Accuracy_Train',
                                                                       2: 'Accuracy_Test',
                                                                       3: 'Precision',
                                                                       4: 'Recall_Train',
                                                                       5: 'Recall_Test',
                                                                       6: 'F1 Score',
                                                                       7: 'AUC'},
                                                                      axis=1)

             return model_comparison

In [95]: model_comparison_default(X, y)

Out[95]:                 Model Accuracy_Train Accuracy_Test Precision Recall_Train Recall_Test F1 Score   AUC
         0  Logistic Regression           0.63          0.63      0.67         0.75        0.75     0.71   0.6
         1        Decision Tree            1.0          0.66      0.72          1.0        0.71     0.71  0.64
         2        Random Forest            1.0          0.67      0.76          1.0        0.66     0.71  0.67
         3                  KNN           0.76          0.64      0.72         0.77        0.67     0.69  0.64
         4                  SVC           0.68          0.66      0.88         0.52        0.51     0.65   0.7

From the result above, only Logistic Regression and SVC are neither overfitting nor
underfitting. Logistic Regression has the highest recall. Let's see how the models perform
with tuned parameters.

Hyperparameter

Logistic Regression

In [96]: # List hyperparameters
         penalty = ['l2', 'l1', 'elasticnet']
         C = [0.0001, 0.001, 0.002] # Inverse of regularization strength; smaller values specify stronger regularization
         hyperparameters = dict(penalty=penalty, C=C)

         # Initialize the model: logistic regression wrapped in RandomizedSearchCV with 5-fold cross-validation
         logres = LogisticRegression(random_state=42)
         model = RandomizedSearchCV(logres, hyperparameters, cv=5, random_state=42, scoring='recall')

         # Fit the model & evaluate
         model.fit(Xtrain, ytrain)
         y_pred = model.predict(Xtest)
         model.best_estimator_

Out[96]: LogisticRegression(C=0.0001, random_state=42)
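One caution on this grid, based on sklearn's documented behaviour rather than anything in the original notebook: LogisticRegression's default lbfgs solver only supports the l2 penalty, so the l1 and elasticnet candidates cannot actually be fitted and the search effectively falls back to l2. A sketch of a grid the solver can fit, using the saga solver (the l1_ratio value here is an illustrative assumption):

         # saga supports l1, l2 and elasticnet; elasticnet additionally needs l1_ratio
         logres = LogisticRegression(solver='saga', l1_ratio=0.5, max_iter=5000,
                                     random_state=42)
         hyperparameters = dict(penalty=['l2', 'l1', 'elasticnet'],
                                C=[0.0001, 0.001, 0.002])
         model = RandomizedSearchCV(logres, hyperparameters, cv=5, random_state=42,
                                    scoring='recall')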

Decision Tree

In [97]: # Hyperparameter tuning using RandomizedSearchCV

         # Hyperparameter lists to be tested
         max_depth = list(range(1,10))
         min_samples_split = list(range(5,10))
         min_samples_leaf = list(range(5,15))
         max_features = ['auto', 'sqrt', 'log2']
         criterion = ['gini', 'entropy']
         splitter = ['best', 'random']

         # Initiate hyperparameters
         hyperparameters = dict(max_depth=max_depth,
                                min_samples_split=min_samples_split,
                                min_samples_leaf=min_samples_leaf,
                                max_features=max_features,
                                criterion=criterion,
                                splitter=splitter)

         # Initiate model
         dt_tun = DecisionTreeClassifier(random_state=42)
         model = RandomizedSearchCV(dt_tun, hyperparameters, cv=10, scoring='recall', random_state=42)
         model.fit(Xtrain, ytrain)
         y_pred_tun = model.predict(Xtest)
         model.best_estimator_

Out[97]: DecisionTreeClassifier(criterion='entropy', max_depth=4, max_features='sqrt',
                                min_samples_leaf=12, min_samples_split=6,
                                random_state=42)

Random Forest

In [98]: # Initiate hyperparameters
         params = {'max_depth': [50], 'n_estimators': [100, 150],
                   'criterion': ['gini', 'entropy']}
         # Initiate model
         model = GridSearchCV(estimator=RandomForestClassifier(random_state=42),
                              param_grid=params, scoring='recall', cv=5)
         # Fit model
         model.fit(Xtrain, ytrain)
         y_pred = model.predict(Xtest)
         # Get best estimator
         model.best_estimator_

Out[98]: RandomForestClassifier(max_depth=50, random_state=42)

KNN

In [99]: # List hyperparameters that we want to tune
         leaf_size = list(range(1,50))
         n_neighbors = list(range(1,30))
         p = [1, 2]

         # Convert to dictionary
         hyperparameters = dict(leaf_size=leaf_size, n_neighbors=n_neighbors, p=p)

         # Create a new KNN object
         KNN_2 = KNeighborsClassifier()

         # Use RandomizedSearchCV
         clf = RandomizedSearchCV(KNN_2, hyperparameters, cv=10, scoring='recall')

         # Fit the model
         best_model = clf.fit(X, y)
         # Get best estimator
         clf.best_estimator_

Out[99]: KNeighborsClassifier(leaf_size=36, n_neighbors=3)
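A methodological caveat not raised in the original: unlike the other searches, this one is fitted on the full X and y, so the held-out test rows leak into the cross-validation that selects the KNN hyperparameters. A sketch of the leak-free variant, searching on the training partition only:

         # Search on the training partition only, so Xtest stays truly unseen
         clf = RandomizedSearchCV(KNeighborsClassifier(), hyperparameters,
                                  cv=10, scoring='recall', random_state=42)
         clf.fit(Xtrain, ytrain)
         clf.best_estimator_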
SVC

In [100]: # Hyperparameter lists to be tested
          kernel = ['linear', 'poly', 'rbf', 'sigmoid']
          C = [0.0001, 0.001, 0.002]
          gamma = ['scale', 'auto']

          # Convert to dictionary
          hyperparameters = dict(kernel=kernel, C=C, gamma=gamma)

          # Initiate model
          svc = SVC(random_state=42)
          model = RandomizedSearchCV(svc, hyperparameters, cv=5, random_state=42,
                                     scoring='recall')

          # Fit the model & evaluate
          model.fit(Xtrain, ytrain)
          y_pred = model.predict(Xtest)
          model.best_estimator_

Out[100]: SVC(C=0.0001, kernel='linear', random_state=42)

Tuned Parameter

In [103]: # Initiate the best estimators
          lr_tune = LogisticRegression(C=0.0001, random_state=42)
          dt_tune = DecisionTreeClassifier(criterion='entropy', max_depth=4, max_features='sqrt',
                                           min_samples_leaf=12, min_samples_split=6,
                                           random_state=42)
          rf_tune = RandomForestClassifier(max_depth=50, random_state=42)
          knn_tune = KNeighborsClassifier(leaf_size=24, n_neighbors=3, p=1)
          svc_tune = SVC(C=0.0001, kernel='linear', random_state=42)

          # Create a function to collect the results as a dataframe
          def model_comparison_tuned(X, y):

              # Logistic Regression
              lr_train_score, lr_test_score, lr_pr, lrtr_re, lrte_re, lr_f1, lr_auc = fit_evaluation(
                  lr_tune, Xtrain, ytrain, Xtest, ytest)
              # Decision Tree
              dt_train_score, dt_test_score, dt_pr, dttr_re, dtte_re, dt_f1, dt_auc = fit_evaluation(
                  dt_tune, Xtrain, ytrain, Xtest, ytest)
              # Random Forest
              rf_train_score, rf_test_score, rf_pr, rftr_re, rfte_re, rf_f1, rf_auc = fit_evaluation(
                  rf_tune, Xtrain, ytrain, Xtest, ytest)
              # KNN
              knn_train_score, knn_test_score, knn_pr, knntr_re, knnte_re, knn_f1, knn_auc = fit_evaluation(
                  knn_tune, Xtrain, ytrain, Xtest, ytest)
              # SVC
              svc_train_score, svc_test_score, svc_pr, svctr_re, svcte_re, svc_f1, svc_auc = fit_evaluation(
                  svc_tune, Xtrain, ytrain, Xtest, ytest)

              models = ['Logistic Regression', 'Decision Tree', 'Random Forest',
                        'KNN', 'SVC']
              train_score = [lr_train_score, dt_train_score, rf_train_score,
                             knn_train_score, svc_train_score]
              test_score = [lr_test_score, dt_test_score, rf_test_score,
                            knn_test_score, svc_test_score]
              precision = [lr_pr, dt_pr, rf_pr, knn_pr, svc_pr]
              recall_train = [lrtr_re, dttr_re, rftr_re, knntr_re, svctr_re]
              recall_test = [lrte_re, dtte_re, rfte_re, knnte_re, svcte_re]
              f1 = [lr_f1, dt_f1, rf_f1, knn_f1, svc_f1]
              auc = [lr_auc, dt_auc, rf_auc, knn_auc, svc_auc]

              model_comparison = pd.DataFrame(data=[models, train_score, test_score,
                                                    precision, recall_train, recall_test,
                                                    f1, auc]).T.rename({0: 'Model',
                                                                        1: 'Accuracy_Train',
                                                                        2: 'Accuracy_Test',
                                                                        3: 'Precision',
                                                                        4: 'Recall_Train',
                                                                        5: 'Recall_Test',
                                                                        6: 'F1 Score',
                                                                        7: 'AUC'},
                                                                       axis=1)

              return model_comparison

In [104]: model_comparison_tuned(X, y)

Out[104]:                 Model Accuracy_Train Accuracy_Test Precision Recall_Train Recall_Test F1 Score   AUC
          0  Logistic Regression           0.59           0.6       0.6          1.0         1.0     0.75   0.5
          1        Decision Tree           0.67          0.68       0.8         0.62        0.62      0.7  0.69
          2        Random Forest            1.0          0.67      0.76          1.0        0.66     0.71  0.67
          3                  KNN           0.81          0.65      0.72         0.82        0.69      0.7  0.64
          4                  SVC           0.59           0.6       0.6          1.0         1.0     0.75   0.5

The tuned Decision Tree shows a good balance across its scores and is neither underfitting
nor overfitting.

Confusion matrix
In [105]: from sklearn.metrics import confusion_matrix
          import matplotlib.pyplot as plt
          import seaborn as sns

          def get_confusion_matrix(model, X_train, y_train, X_test, y_test, labels=None):

              # Train the model on the training data
              model.fit(X_train, y_train)

              # Make predictions on the test data
              y_pred = model.predict(X_test)

              # Create the confusion matrix
              cm = confusion_matrix(y_test, y_pred)

              # Set display labels
              if labels is None:
                  labels = ['Negative', 'Positive']

              # Plot the confusion matrix using a Seaborn heatmap
              plt.figure(figsize=(3, 3))
              sns.heatmap(cm, annot=True, cmap='Blues', fmt='g', xticklabels=labels, yticklabels=labels)
              plt.xlabel('Predicted labels')
              plt.ylabel('True labels')
              plt.title('Confusion Matrix')
              plt.show()

In [106]: # After hyperparameter tuning
          get_confusion_matrix(dt_tune, Xtrain, ytrain, Xtest, ytest, labels=['on_time','late'])

In [107]: # Before hyperparameter tuning
          get_confusion_matrix(dt, Xtrain, ytrain, Xtest, ytest, labels=['on_time','late'])

Feature Importance

In [108]: feat_importances = pd.Series(dt_tune.feature_importances_, index=X.columns)
          ax = feat_importances.nlargest(25).plot(kind='barh', figsize=(10, 8))
          ax.invert_yaxis()

          plt.xlabel('Score');
          plt.ylabel('Feature');
          plt.title('Feature Importance Score');
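Impurity-based importances from a single tree can be unstable; as an optional cross-check not part of the original notebook, sklearn's permutation_importance measures how much the test recall drops when each feature is shuffled:

          from sklearn.inspection import permutation_importance

          # Shuffle each feature on the held-out test set and record the drop in recall
          result = permutation_importance(dt_tune, Xtest, ytest, scoring='recall',
                                          n_repeats=10, random_state=42)
          perm = pd.Series(result.importances_mean, index=X.columns).sort_values()
          perm.plot(kind='barh', figsize=(10, 8));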
Recommendation for E-Commerce :

The operations team should add more manpower when there is a sale program, especially for
discounts above 10% and parcel weights of 1 - 4 kg.
Parcels should not be concentrated in warehouse block F, so that handling does not become so
crowded that it causes late shipments.
Adding more features could improve model performance, such as delivery time estimation, delivery
date, customer address, and courier.
Name of the Project: Product Shipment Delivered on Time or Not?

The E-Commerce Shipping Problem

By Pooja Keer

Batch: PGA19 (THANE)

E-COMMERCE INDUSTRY

" Ecommerce" or "electronic commerce" is the trading of goods and services on the internet.

The e-commerce industry has seen significant growth in recent years, with more and
more people shopping online. As a result, there is demand for efficient and reliable
shipment services to deliver goods to customers in a timely manner.
One of the major challenges in e-commerce shipment is the management of the
delivery process. The shipment process involves multiple stages, from receiving the
order to packaging the goods and finally delivering them to the customer.
Each stage of the process must be carefully coordinated to ensure timely delivery and
minimize the risk of errors or delays.
E-COMMERCE SHIPPING PROCESS

The shipping process involves everything from receiving a customer order to
preparing it for last-mile delivery. It can be broken down into three primary stages:
• Order receiving: make sure items are in stock to fulfill the order
• Order processing: verify order data and make sure it is accurate
  (e.g., verifying the shipping address)
• Order fulfillment: a picking list is generated and items are picked,
  packed, and prepared to be shipped
HOW DOES IT AFFECT THE INDUSTRY?

◼ Customer dissatisfaction: Customers expect their orders to be delivered on time. If products are
not delivered on time, customers may become dissatisfied with the service and the e-commerce
platform. This can lead to negative reviews and decreased customer loyalty.
◼ Loss of sales: Delayed product delivery can lead to canceled orders, which can result in a loss of
sales for the e-commerce platform. Customers may also choose to shop with competitors who are
better at delivering products on time.
◼ Increased shipping costs: If products are not delivered on time, the e-commerce platform may
need to pay for expedited shipping or other additional costs to meet the delivery deadline. This can
increase the shipping costs, which can negatively impact the company's bottom line.
◼ Reputation damage: The timely delivery of products is critical to maintaining the e-commerce
platform's reputation. If products are not delivered on time, it can damage the company's
reputation and lead to decreased trust from customers.
WHAT ARE THE BENEFITS OF DELIVERING PRODUCTS ON TIME?

◼ Increased customer satisfaction: Timely delivery of products is essential for meeting
customer expectations and delivering a positive shopping experience.
◼ Improved brand reputation: Consistently delivering products on time can help build a
positive brand reputation for the e-commerce platform.
◼ Increased efficiency and cost savings: An efficient logistics system that ensures timely
delivery can help reduce shipping and handling costs, reduce product returns and
exchanges, and streamline order fulfillment processes.
◼ Competitive advantage: Timely delivery of products can be a competitive advantage for
e-commerce platforms in a crowded market.
◼ Improved operational performance: Delivering products on time can help e-commerce
platforms improve their operational performance.
Solution towards the business problem
◼ I have worked on the dataset called e-commerce shipping data, with about 11000 records and the features below:
◼ ID : ID number of the customer.
◼ Warehouse block : The company has a large warehouse divided into blocks A, B, C, D and F.
◼ Mode of shipment : The company ships products in multiple ways: Ship, Flight and Road.
◼ Customer care calls : The number of calls the customer made to enquire about the shipment.
◼ Customer rating : The rating given by each customer; 1 is the lowest (worst), 5 is the highest (best).
◼ Cost of the product : Cost of the product in US dollars.
◼ Prior purchases : The number of prior purchases.
◼ Product importance : The company categorizes products into low, medium and high importance.
◼ Gender : Male or Female.
◼ Discount offered : Discount offered on that specific product.
◼ Weight in gms : The weight of the product in grams.
◼ Reached on time : The target variable, where 1 indicates that the product has NOT reached on time and 0 indicates that it has.
The following steps are carried out –

1. Data Preprocessing

2. Data Visualization

3. Model Fitting

4. Prediction

5. Hyperparameter Tuning
1. Data Pre-processing

— Imported the CSV file and renamed the target column
'Reached.on.Time_Y.N' to 'is_late'.
— Obtained the shape of the dataset, (10999 rows, 12 columns), and changed all
column names to lower case, since column access is case sensitive.
— Described the data as below & separated the numeric & categorical columns:
1. The dataframe has 10999 rows and 12 columns.
2. No missing values are found.
3. There are only 2 data types, integer and object.
4. The classification target is is_late; all other columns are features.
2. Data Visualization

— Data cleansing was done by checking for missing values & duplicates in the
dataset.
— No missing values or duplicates were found in the data.
— Outlier identification was carried out for the numeric columns using
boxplots & the IQR (interquartile range) method; the outliers identified in the
prior_purchases and discount_offered columns were replaced by their upper & lower
bounds.
— So now no outlier is identified in any numeric column.
3. Feature Engineering

— Log transformation was carried out for the numeric columns, normalizing the
skewed distributions, & the numeric values were standardized using StandardScaler.
— Feature selection was carried out with the CHI-SQUARE METHOD, in which a
categorical column that 'rejects the null hypothesis' is an important feature for
further feature encoding.
— Here, the product_importance column rejected the null hypothesis, making it the
important feature of the data, while the columns 'warehouse_block',
'mode_of_shipment', 'gender' and 'customer_rating' failed to reject the null hypothesis.
— Further, one-hot encoding (feature encoding) was done, dropping the categorical
columns that failed to reject the null hypothesis.
4. Exploratory Data Analysis

— For the EDA, a copy of the data was used; the TARGET VISUALISATION shows that
the target classes look reasonably balanced.
— Descriptive statistics were computed for the categorical columns to get the
value counts of each column, and countplots were drawn, as summarized below:
I) Warehouse_block has 5 unique values and is dominated by block F.
II) Mode_of_shipment has 3 unique values; Ship is used the most.
III) Product_importance has 3 unique values; most products have low priority.
● From the EDA of the numeric columns, the distributions of customer_care_calls, customer_rating,
cost_of_the_product and prior_purchases look normal, because the mean and the median are close,
while discount_offered and weight_in_gms appear skewed.
● The correlation heatmap was obtained to study the correlations with the target variable, as follows:
● The Correlation Heatmap is obtained for studying the correlation with target variable as follows-
Based on the correlation heatmap above:

1. The target is_late has a moderate positive correlation with discount_offered & a weak negative correlation with weight_in_gms.
2. The feature customer_care_calls has a weak positive correlation with cost_of_the_product and a negative correlation
with weight_in_gms.
3. The feature discount_offered has a moderate negative correlation with weight_in_gms.

— Further, graphs were plotted for the categorical columns 'mode_of_shipment',
'warehouse_block', 'product_importance', 'gender' and 'customer_rating',
with the following findings:
● Most parcels are stored in warehouse_block F.
● Shipping by Ship contributes the most late deliveries.
● In every shipment priority class, most parcels are delivered late.
5. Model Fitting

— Through the fit_evaluation function the models were initiated & evaluated,
and through the model-comparison function the following models were assembled into a
dataframe: LogisticRegression, DecisionTreeClassifier, RandomForestClassifier,
KNeighborsClassifier and Support Vector Classifier,
with 'Accuracy_Train', 'Accuracy_Test', 'Precision', 'Recall_Train', 'Recall_Test',
'F1 Score' and 'AUC'. The result: only Logistic Regression and SVC are neither
overfitting nor underfitting, and Logistic Regression has the highest recall.
6. Parameter Tuning

— The Decision Tree algorithm with hyperparameter tuning has a good balance between its
scores and is neither underfitting nor overfitting.
— Confusion matrix after parameter tuning.

7. Feature Importance

● The operations team should add more manpower when there is a sale program, especially for
discounts above 10% and parcel weights of 1 - 4 kg.
● Parcels should not be concentrated in warehouse block F, so that handling does not become so
crowded that it causes late shipments.
● Adding more features could improve model performance, such as delivery time estimation,
delivery date, customer address, and courier.
CONCLUSION

◼ The timely delivery of products can also have a significant financial impact on the e-commerce
business. If products are delivered on time, it can lead to increased sales, repeat business, and revenue
growth. However, if products are consistently delivered late, it can lead to canceled orders, lost sales,
and increased shipping costs, which can negatively impact the company's bottom line.
◼ Late delivery of products can have legal consequences, particularly if there are contractual obligations
regarding delivery times. Failure to meet these obligations can result in breach of contract lawsuits,
financial penalties, and damage to the business reputation.
◼ Timely delivery of products is critical to maintaining a positive business reputation. If products are
consistently delivered on time, it can lead to a positive brand reputation and increased customer
trust. However, if products are frequently delivered late, it can damage the business reputation and
lead to decreased customer trust.
FURTHER ENHANCING MODEL

◼ The models applied to this business problem were machine-learning based. While maintaining
accuracy, we could further enhance them with several deep learning models that may be useful for
predicting whether a product is delivered on time.
◼ Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), Long Short-Term Memory
(LSTM) networks, Autoencoders.
◼ Time-series models are useful for analyzing data that changes over time, such as the frequency and timing
of deliveries. You could use time-series models to predict the delivery time based on historical data,
identifying patterns and trends that can help you make accurate predictions.
◼ RapidMiner can be useful for predicting the delivery of products on time. RapidMiner is a data science
platform that offers a variety of tools and techniques for data preprocessing, modeling, and evaluation. By
leveraging RapidMiner's predictive modeling capabilities, you can build and train a model that can predict
the likelihood of delivery delays based on various factors such as shipping method, location, product type,
and weather conditions.
THANK YOU
