Basic coding in Python: part 4.
Making loops in Python
• while loops: a while loop executes a set of statements as long as a condition is true
• += is shorthand: x += value means x = x + value
• Loops allow for repeating a piece of code
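A minimal sketch of a while loop using += (the variable name and the limit 5 are illustrative):

```python
# Count from 1 to 5 with a while loop.
x = 1
while x <= 5:   # the loop body repeats as long as this condition is true
    print(x)
    x += 1      # shorthand for x = x + 1
```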
Loops and breaks
A break statement can stop the loop even while the while condition is still true
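A short sketch of break in action (the cutoff value 3 is illustrative):

```python
# break exits the loop as soon as x reaches 3,
# even though the while condition (x < 10) is still true.
x = 0
while x < 10:
    if x == 3:
        break
    print(x)
    x += 1
# prints 0, 1, 2
```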
Loops and continue statements
A continue statement stops the current iteration and continues with the next one
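A sketch of the kind of loop the slide refers to, skipping one value with continue:

```python
# continue skips the rest of the current iteration (here, the print)
# and jumps back to the while condition.
i = 0
while i < 6:
    i += 1
    if i == 3:
        continue   # skip printing 3
    print(i)
# prints 1, 2, 4, 5, 6
```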
Note that the number 3 is missing from the printed series
Loops and else statements
An else block after a while loop runs once, when the condition stops being true (it does not run if the loop is ended with break)
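A minimal sketch of a while loop with an else block:

```python
# The else block runs once, after the while condition becomes false.
# (It would be skipped if the loop exited via break.)
x = 0
while x < 3:
    print(x)
    x += 1
else:
    print("condition is no longer true")
```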
Basic coding in Python: part 5.
Step 1. Click on the symbol with 3 dots
Step 2. Click on Upload File > Upload file “data.csv” from your local computer
(the file is also available on the Course page)
Step 3. Load the data file into Replit by executing the following 3 lines of code:
import pandas as pd
df = pd.read_csv('data.csv')
print(df.to_string())
Step 4. Import libraries with the following 4 lines of code
import numpy
from scipy import stats
import math
import matplotlib.pyplot as plt
Step 5: Statistics for the Age column - compute the mean, median and standard deviation (std)
by executing the following 6 lines of code:
median = numpy.median(df['age'])
print(median)
mean = numpy.mean(df['age'])
print(mean)
std = numpy.std(df['age'])
print(std)
Results for the Age column:
Mean = 55 years
Median = 54.3 years
Stand. Dev = 9 years
Step 6: Let’s organize the data into categorical and continuous columns
Input the following code:
categorical_val = []
continous_val = []
for column in df.columns:
    print('==============================')
    print(f"{column} : {df[column].unique()}")
    if len(df[column].unique()) <= 10:
        categorical_val.append(column)
    else:
        continous_val.append(column)
Step 6 (continued): What the columns mean
Cp - chest pain type
Trestbps - resting blood pressure (above 135 is of concern)
Chol - cholesterol (greater than 200 is of concern)
Restecg - resting EKG (1 = an abnormal heart rhythm, which can range from mild symptoms to severe problems)
Thalach - maximum heart rate achieved (over 140 is more likely to have heart disease)
Exang - exercise-induced angina
Ca - number of major vessels (more blood movement = better, so people with Ca = 0 are more likely to have heart disease)
Thal - thallium stress test result (flow of blood through the coronary arteries)
Step 7: Before applying a machine learning algorithm, let’s prepare a function to report classification results:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

def print_score(clf, X_train, y_train, X_test, y_test, train=True):
    if train:
        pred = clf.predict(X_train)
        clf_report = pd.DataFrame(classification_report(y_train, pred, output_dict=True))
        print("Train Result:\n================================================")
        print(f"Accuracy Score: {accuracy_score(y_train, pred) * 100:.2f}%")
        print("_______________________________________________")
        print(f"CLASSIFICATION REPORT:\n{clf_report}")
        print("_______________________________________________")
        print(f"Confusion Matrix: \n {confusion_matrix(y_train, pred)}\n")
    else:
        pred = clf.predict(X_test)
        clf_report = pd.DataFrame(classification_report(y_test, pred, output_dict=True))
        print("Test Result:\n================================================")
        print(f"Accuracy Score: {accuracy_score(y_test, pred) * 100:.2f}%")
        print("_______________________________________________")
        print(f"CLASSIFICATION REPORT:\n{clf_report}")
        print("_______________________________________________")
        print(f"Confusion Matrix: \n {confusion_matrix(y_test, pred)}\n")
Step 8: Let’s split the data into training and test sets
Usually, the data is split 70% for the training set and 30% for the test set
Class #4, slide #5
Input the following code:
from sklearn.model_selection import train_test_split
X = df.drop('target', axis=1)
y = df.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
Step 9: Let’s train a machine learning model using logistic regression:
Class #2, slide #27
(In that earlier example, the model accounted for only 2 parameters: tumor size and abnormality score.)
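As a reminder of that earlier two-parameter example, here is a minimal sketch; the feature values and labels below are made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up training data: each row is [tumor size, abnormality score]
X = np.array([[1.0, 0.2], [1.5, 0.4], [2.0, 0.3],
              [3.5, 0.8], [4.0, 0.9], [4.5, 0.7]])
y = np.array([0, 0, 0, 1, 1, 1])   # 0 = benign, 1 = malignant

clf = LogisticRegression(solver='liblinear')
clf.fit(X, y)

# Classify two new (invented) cases: one small/low-score, one large/high-score
print(clf.predict([[1.2, 0.25], [4.2, 0.85]]))
```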
from sklearn.linear_model import LogisticRegression
lr_clf = LogisticRegression(solver='liblinear')
lr_clf.fit(X_train, y_train)
print_score(lr_clf, X_train, y_train, X_test, y_test, train=True)
print_score(lr_clf, X_train, y_train, X_test, y_test, train=False)
The model performs well: the test set has almost the same accuracy as the training set
Class #5, slide #9
Class #5, slide #7: fewer examples (30% of the whole dataset)