CB0494 Notes
Discussion 1
Question 1
Discussion 2
Classification
Question 2
More Library
NumPy : Library for Numeric Computations in Python
Pandas : Library for Data Acquisition and Preparation
Matplotlib : Low-level library for Data Visualization
Seaborn : Higher-level library for Data Visualization
Warning Messages?
# Basic Libraries
import matplotlib.pyplot as plt  # we only need pyplot (a smaller toolset than the full matplotlib)
# Data Preparation
1(a) Import CSV file.
1(c) Extract only the needed data – 2 methods
2(c) lotarea
Present 6 plots together – prepare the figure first
f, axes = plt.subplots(2, 3)
(2 output variables: the figure and the axes; 2 = rows, 3 = columns; these numbers cannot be changed)
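A minimal sketch of preparing the 2-by-3 figure described above. The `figsize` value, the title text, and the `Agg` backend line are assumptions added so the example runs on its own; they are not from the tutorial.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

# plt.subplots (plural) returns two outputs: the figure and an array of axes.
f, axes = plt.subplots(2, 3, figsize=(12, 6))  # 2 rows, 3 columns -> 6 plots

# axes is a 2-D array, so one plot is picked with axes[row, column].
# Counting row by row, the fifth plot sits at row 1, column 1.
axes[1, 1].set_title("fifth plot")
```

Each later plotting call can then be pointed at one slot of `axes`.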
For the fifth figure: for the histogram to be green, need to include
x='lot area'
Statistics = .describe()
Statistics is not a function but a variable (it stores the output of .describe())
Same library same data
.skew
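A small sketch of the two calls above on the same data. The numbers are hypothetical values standing in for the real LotArea column, and the variable names are only illustrative.

```python
import pandas as pd

# Hypothetical values standing in for the LotArea column.
lot_area = pd.DataFrame({"LotArea": [8450, 9600, 11250, 9550, 14260, 50271]})

# .describe() is the function; statistics is just a variable holding its result.
statistics = lot_area.describe()

# .skew() gives one skewness value per column (positive = long right tail).
skewness = lot_area.skew()

print(statistics)
print(skewness)
```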
Find the total number of outliers (for-loop)
Temp = pd. (extract data)
Compute Q1,Q3 using .quantile
Use | (OR) to check whether a value is an outlier
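The outlier steps above can be sketched roughly as follows. The 1.5 × IQR cut-off is an assumption (the usual boxplot rule; the notes do not state which rule was used), and the data and the second column name are hypothetical.

```python
import pandas as pd

# Hypothetical extracted data; only the column name LotArea comes from the notes.
temp = pd.DataFrame({
    "LotArea":   [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
    "GrLivArea": [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
})

outlier_counts = {}
for var in temp:
    q1 = temp[var].quantile(0.25)
    q3 = temp[var].quantile(0.75)
    iqr = q3 - q1
    # Assumed rule: a value is an outlier if it is below Q1 - 1.5*IQR
    # OR above Q3 + 1.5*IQR. The | combines the two checks element by element.
    is_outlier = (temp[var] < q1 - 1.5 * iqr) | (temp[var] > q3 + 1.5 * iqr)
    outlier_counts[var] = int(is_outlier.sum())

print(outlier_counts)
```

The brackets around each comparison matter, because | binds more tightly than < and > in Python.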
*Extract data from .describe*
Plot the LotArea using a for loop again
count = 0
Additional input (x=var, color=color[count])
count += 1 to choose a different column when moving to the next plot
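A rough sketch of the counter pattern above. It uses matplotlib's own `hist` as a stand-in for the seaborn call that takes `x=var, color=color[count]`; the data, the extra column names, and the color list are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; only LotArea is a name taken from the notes.
data = pd.DataFrame({
    "LotArea":   [8450, 9600, 11250, 9550, 14260],
    "GrLivArea": [1710, 1262, 1786, 1717, 2198],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})
colors = ["red", "green", "blue"]

f, axes = plt.subplots(1, 3, figsize=(12, 4))

count = 0
for var in data:
    # count picks both the plot slot and the color for this variable.
    axes[count].hist(data[var], color=colors[count])
    axes[count].set_title(var)
    count += 1  # move on to the next slot and color
```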
*Last Week’s Homework*
.corr
sb.heatmap(..., linewidths=1) (draws white lines between the boxes)
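A small sketch of the correlation step, leaving out the plotting call itself. The matrix returned by `.corr()` is what would be passed to the heatmap. The data and the column names other than LotArea are hypothetical.

```python
import pandas as pd

# Hypothetical numeric data.
data = pd.DataFrame({
    "LotArea":   [1, 2, 3, 4, 5],
    "SalePrice": [2, 4, 6, 8, 10],  # rises in step with LotArea
    "Age":       [5, 4, 3, 2, 1],   # falls in step with LotArea
})

# .corr() returns a square table of pairwise correlation coefficients,
# each between -1 and 1, with 1s on the diagonal.
corr = data.corr()
print(corr.round(2))
```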
.dtypes to check the column types; change object columns to categorical data (e.g. with .astype('category'))
sb.catplot
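A minimal sketch of checking the types and converting a text column to categorical before plotting. Using `.astype("category")` is one common way to do this and is an assumption about the method used in class; the data and the `Street` column are hypothetical.

```python
import pandas as pd

# Hypothetical data with one text column and one numeric column.
data = pd.DataFrame({
    "Street":  ["Pave", "Grvl", "Pave"],
    "LotArea": [8450, 9600, 11250],
})

print(data.dtypes)  # shows the type of every column

# Convert the text column to categorical data.
data["Street"] = data["Street"].astype("category")

print(data.dtypes)
print(list(data["Street"].cat.categories))  # the distinct levels found
```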
Tutorial 4:
Indicators of fitness
R² (need to know its upper and lower limits, and whether a value is logical in
data science; only the positive range is of interest) > MSE
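To make the limits concrete, here is a small sketch that computes both indicators by hand using the standard definitions, R² = 1 − SS_res / SS_tot and MSE = mean of squared errors. R² is at most 1, equals 0 when the model does no better than always predicting the mean, and goes negative when it does worse. The function names and the numbers are illustrative only.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: average of the squared differences.
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total variation
    return float(1 - ss_res / ss_tot)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
perfect = y_true.copy()                  # exact predictions
baseline = np.full(4, y_true.mean())     # always predict the mean
bad = np.array([4.0, 3.0, 2.0, 1.0])     # predictions in reverse order

for name, pred in [("perfect", perfect), ("baseline", baseline), ("bad", bad)]:
    print(name, r_squared(y_true, pred), mse(y_true, pred))
```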
Linear Regression is supervised learning (uses historical data to
train the model)
1(c) Split the data into train and test sets (orderly splitting)
Retrieve the rows become individual values
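.fit to do LR on the training set
linreg.intercept_ to extract the y-intercept (attribute name assumes scikit-learn's LinearRegression)
linreg.coef_ to extract the coefficients (attribute name assumes scikit-learn's LinearRegression)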
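The tutorial steps above can be sketched as follows, assuming the model is scikit-learn's `LinearRegression` (the notes do not name the library). The data is made up so that y = 2x + 1 exactly, and the 8/2 split point is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression  # assumed library

# Hypothetical data following y = 2x + 1 exactly.
x = np.arange(10, dtype=float)
data = pd.DataFrame({"x": x, "y": 2.0 * x + 1.0})

# Orderly split: first 8 rows for training, last 2 rows for testing.
train = data.iloc[:8]
test = data.iloc[8:]

linreg = LinearRegression()
linreg.fit(train[["x"]], train["y"])  # fit on the training set only

print(linreg.intercept_)  # y-intercept
print(linreg.coef_)       # one coefficient per predictor

# Predict on the held-out rows to check the fit.
test_pred = linreg.predict(test[["x"]])
print(test_pred)
```

Note the double brackets in `train[["x"]]`: the predictors are passed as a table with one column per predictor, while the response is a single column.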
HW:?
Underfitting vs overfitting