TP1 - Machine Learning
Lab 1: Review of the core modules NumPy, Pandas, and Matplotlib for
successful data preparation
Objective of this lab: get started with the Python libraries most commonly used in the
preprocessing phase of machine learning projects.
Note: Before starting, open a new Google Colab or Jupyter notebook (Python), and import the
datasets: adults.csv, amazon.csv and apple.csv
Math_grades = [80, 50, 60, 70, 60, 100, 70, 70, 60, 70]
Science_grades = [90, 80, 50, 50, 60, 50, 90, 70, 80, 80]
History_grades = [60, 90, 50, 90, 100, 100, 100, 100, 90, 70]
1 - Write code using NumPy that calculates and reports each student's average grade.
2 - Produce a more readable report in the following format (e.g. Average grade of student Jevon
is: … )
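One possible way to approach both questions is sketched below. It assumes that position i in each list belongs to the same student, so stacking the three lists gives one column per student. The student names are not listed in this handout, so the `names` list here is a hypothetical placeholder to be replaced with the real names.

```python
import numpy as np

Math_grades = [80, 50, 60, 70, 60, 100, 70, 70, 60, 70]
Science_grades = [90, 80, 50, 50, 60, 50, 90, 70, 80, 80]
History_grades = [60, 90, 50, 90, 100, 100, 100, 100, 90, 70]

# Assumption: position i in each list belongs to the same student.
# Hypothetical placeholder names; substitute the real student names.
names = [f"#{i + 1}" for i in range(len(Math_grades))]

# Stack the three subjects into a 3 x 10 array (subjects x students).
grades = np.array([Math_grades, Science_grades, History_grades])

# Averaging over axis 0 collapses the subjects, leaving one mean per student.
student_averages = grades.mean(axis=0)

for name, avg in zip(names, student_averages):
    print(f"Average grade of student {name} is: {avg:.2f}")
```

Using `axis=0` avoids an explicit loop over students; `axis=1` would instead give the average per subject.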
The Pandas library: loading a dataset and navigating its rows and columns
Example 1: We start by importing the pandas library and loading our dataset as follows:
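A minimal sketch of this loading step, assuming the file sits next to the notebook. So that the snippet runs without the real file, it reads a small made-up CSV from memory; the rows and the `age` and `education` columns are illustrative assumptions, not the contents of the actual dataset.

```python
import io
import pandas as pd

# In the lab you would load the real file, for example:
#     adult_df = pd.read_csv("adult.csv")
# The inline text below is a made-up stand-in so the sketch runs anywhere.
sample_csv = io.StringIO(
    "age,education,sex,income\n"
    "39,Bachelors,Male,<=50K\n"
    "50,Bachelors,Male,<=50K\n"
    "38,HS-grad,Male,<=50K\n"
)
adult_df = pd.read_csv(sample_csv)

print(adult_df.head())   # first rows of the table
print(adult_df.shape)    # (number of rows, number of columns)
print(adult_df.columns)  # column names as read from the header line
```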
Example 2: To navigate the dataset, we use the .loc and .iloc indexers:
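The sketch below illustrates the two indexers on a small hypothetical table rather than on the real dataset. The row labels (100 to 103) are deliberately different from the row positions (0 to 3), so that selection by label and selection by position are easy to tell apart.

```python
import pandas as pd

# Hypothetical toy table; the row labels deliberately differ from positions.
df = pd.DataFrame(
    {
        "relationship": ["Husband", "Wife", "Own-child", "Unmarried"],
        "race": ["White", "Black", "White", "Asian"],
        "sex": ["Male", "Female", "Male", "Female"],
    },
    index=[100, 101, 102, 103],
)

# .loc selects by label (row labels and column names).
by_label = df.loc[100:102, "relationship":"race"]

# .iloc selects by integer position (0-based row and column numbers).
by_position = df.iloc[0:2, 0:2]

# A single cell can be read either way.
cell_by_label = df.loc[101, "sex"]
cell_by_position = df.iloc[1, 2]

print(by_label)
print(by_position)
print(cell_by_label, cell_by_position)
```

Comparing the shapes of `by_label` and `by_position` is a quick way to see how each indexer treats the end of a slice.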
Exercise: .loc vs .iloc
Use the adult.csv dataset and run the code shown in the following screenshots. Then answer
the questions.
a) Using the output, explain how .loc and .iloc differ in behavior
when it comes to slicing.
b) Without running the code, and only by looking at the data, what will be the output of
adult_df.loc['10000':'10003', 'relationship':'sex']?
c) Without running the code, and only by looking at the data, what will be the output of
adult_df.iloc[0:3, 7:9]?
Example 3: Exploring the dataset further using Pandas and Matplotlib
Exercise:
a- Write Python code that groups the adults by race, sex, and income, and computes the mean of
the fnlwgt feature for each group.
b- Calculate the mean and median of capitalLoss and capitalGain for every race in the
data.
c- Visualise the distribution of capitalGain in the adults dataset.
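The three tasks above can be sketched as follows, as one possible approach and not the only one. Only the column names (race, sex, income, fnlwgt, capitalGain, capitalLoss) come from the exercise; the rows are made up so that the snippet runs without the real file, and the race labels "A" and "B" are placeholders. A histogram is used for part c as an assumption about what "distribution" should look like here.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows standing in for the adult dataset; only the column names
# come from the exercise text.
adult_df = pd.DataFrame(
    {
        "race": ["A", "A", "B", "B", "B"],
        "sex": ["Female", "Male", "Female", "Female", "Male"],
        "income": ["<=50K", ">50K", "<=50K", "<=50K", ">50K"],
        "fnlwgt": [100, 200, 300, 500, 400],
        "capitalGain": [0, 1000, 0, 200, 5000],
        "capitalLoss": [0, 0, 100, 300, 0],
    }
)

# a) Mean fnlwgt for every (race, sex, income) combination.
mean_fnlwgt = adult_df.groupby(["race", "sex", "income"])["fnlwgt"].mean()
print(mean_fnlwgt)

# b) Mean and median of both capital columns, one row per race.
capital_stats = adult_df.groupby("race")[["capitalLoss", "capitalGain"]].agg(
    ["mean", "median"]
)
print(capital_stats)

# c) A histogram is one common way to look at a distribution.
fig, ax = plt.subplots()
ax.hist(adult_df["capitalGain"], bins=10)
ax.set_xlabel("capitalGain")
ax.set_ylabel("Number of adults")
ax.set_title("Distribution of capitalGain")
plt.show()
```

On the real data, the same calls apply once `adult_df` has been loaded with `pd.read_csv`, provided the column names match the file's header.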
Exercise:
Given two datasets, Amazon Stock.csv and Apple Stock.csv:
d- Can we specify the number of rows displayed by the head function? Is there a
default value?
e- Write Python code to display statistical information about the two datasets.
f- Display a list containing only the first two features of the Amazon stock dataset.
g- Build a boxplot of the closing price for Apple. What does this
graph reflect?
h- Build a plot displaying the closing price for Amazon. What can you say about it?
i- Make a plot displaying both distributions of the closing price, for Apple and Amazon.
j- Make sure you add a title and a legend to the plot.
k- Calculate the mean and median of the closing price.
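Several of the steps above can be sketched as follows. This is only an illustration under stated assumptions: the prices are made up, and the column name `Close` is a guess at how the closing price is labelled, so check the real header with `df.columns` after loading each file with `pd.read_csv`.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up prices standing in for the two stock files. The column name
# "Close" is an assumption; check the real header with df.columns.
apple_df = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 11.5, 13.0, 30.0]})
amazon_df = pd.DataFrame({"Close": [20.0, 19.0, 21.0, 22.0, 23.0, 24.0]})

# d) head() takes an optional number of rows, n.
print(apple_df.head())
print(apple_df.head(3))

# e) Summary statistics (count, mean, std, min, quartiles, max).
print(apple_df.describe())
print(amazon_df.describe())

# k) Mean and median of the closing price.
apple_mean = apple_df["Close"].mean()
apple_median = apple_df["Close"].median()
print(apple_mean, apple_median)

# g) Boxplot of one closing-price series.
fig, ax = plt.subplots()
ax.boxplot(apple_df["Close"])
ax.set_ylabel("Closing price")
ax.set_title("Apple closing price")

# i, j) Both closing-price distributions on one figure, with title and legend.
fig, ax = plt.subplots()
ax.hist(apple_df["Close"], bins=5, alpha=0.5, label="Apple")
ax.hist(amazon_df["Close"], bins=5, alpha=0.5, label="Amazon")
ax.set_xlabel("Closing price")
ax.set_ylabel("Frequency")
ax.set_title("Closing price distributions: Apple vs Amazon")
ax.legend()
plt.show()
```

The semi-transparent histograms (`alpha=0.5`) are one design choice for overlaying two distributions; the `label` arguments are what `ax.legend()` picks up.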