CB0494 Notes

The document discusses various machine learning techniques, including classification, clustering, and anomaly detection, emphasizing their applications in data science. It covers the use of libraries like Pandas, NumPy, Matplotlib, and Seaborn for data manipulation and visualization, along with methods for data preparation and statistical analysis. Additionally, it touches on linear regression, model fitting, and the importance of understanding fitness indicators in supervised learning.

Uploaded by

ansonsee236

Tutorial 1

Discussion 1

• Classification is a supervised learning technique: a model is trained on historical data and then used to predict future trends. E.g. a property agent uses historical data to build a classifier, then inputs the features of a property into the classifier to get a result.
• Clustering is an unsupervised learning technique: it groups similar data together, e.g. comparing house prices within the same block.
• Anomaly detection, e.g. a bank tracking our transactions (to prevent further losses).
• AI through an adaptive learning process, e.g. aviation co-piloting using AI to improve safety.
Relating a problem to the right data science question is the hardest part.
• To know how many types of whales are in a place at a given time: clustering is the most suitable technique, given the lack of data.

Question 1

• The text box is just HTML code.
• Import libraries: Pandas (standard).
• A string value (e.g. "red color") is written in quotes, so the computer will not read it as a variable.
• Store the house data in a variable: it is a DataFrame variable, so it is able to call all the pandas functions directly.
• .head() checks the data by showing a short preview of the first rows (a number can be input to see more rows).
• Dimensions of the data: .shape (rows, columns).
• .dtypes gives the data type of each column in the dataset.
• .info() gives the info of the dataset, including the number of non-null values and the data type of each column: float64 (numbers with decimal places, stored in 64 bits of memory), int64 (integers), object (typically strings).
• .describe() gives the statistics of the columns with numerical values (not the object columns).
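The inspection calls listed above can be tried on a small table. This is only a sketch: the variable name housedata and the three columns are made-up stand-ins, not the actual tutorial file.

```python
import pandas as pd

# Hypothetical stand-in for the house data used in the tutorial.
housedata = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "SalePrice": [208500.0, 181500.0, 223500.0, 140000.0],
    "Street": ["Pave", "Pave", "Grvl", "Pave"],
})

print(housedata.head(2))      # first 2 rows (the default is 5)
print(housedata.shape)        # (rows, columns)
print(housedata.dtypes)       # data type of each column
housedata.info()              # non-null counts, dtypes and memory usage
print(housedata.describe())   # statistics for the numeric columns only
```

Note that .shape and .dtypes are attributes (no brackets), while .head(), .info() and .describe() are functions and need ().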

• Import data from the internet (remember to check the Wi-Fi connection); it is an easier method to import files.
• Use type(dataset) to check the validity of the data.
• len(dataset) tells how many tables there are.
• () is for functions, [] is for lists and arrays; the computer counts from 0.
• We want [:top] to get the top results, e.g. top = 20 for the top 20.
• Bonus problem: the for-loop method uses memory one time only.
• X is an int, so convert X into a string before inputting it into the import of the files.
• Check the data after importing the files.
• Use .append to insert each selected dataset into the blank list.
• Integrate the printing of the result into the for-loop code.
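The for-loop steps above could look roughly like the sketch below. It assumes numbered CSV files (data1.csv to data3.csv, hypothetical names created in a temporary folder so the code runs on its own) instead of the tables imported from the internet in the tutorial.

```python
import os
import tempfile

import pandas as pd

# Create three small CSV files so the sketch is self-contained.
# The file names data1.csv ... data3.csv are hypothetical.
folder = tempfile.mkdtemp()
for x in range(1, 4):
    small = pd.DataFrame({"id": [x, x + 10], "value": [x * 1.5, x * 2.5]})
    small.to_csv(os.path.join(folder, "data" + str(x) + ".csv"), index=False)

datasets = []                                   # the blank list
for x in range(1, 4):                           # x is an int
    path = os.path.join(folder, "data" + str(x) + ".csv")  # int -> string
    table = pd.read_csv(path)
    datasets.append(table)                      # insert into the list
    print(path, type(table), table.shape)       # check inside the loop

print(len(datasets))           # how many tables were collected
top = 2
print(datasets[0][:top])       # first `top` rows of the first table (index 0)
```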

Tutorial 2

Discussion 2

• Classification

Question 2

• More libraries
NumPy: library for numeric computations in Python
Pandas: library for data acquisition and preparation
Matplotlib: low-level library for data visualization
Seaborn: higher-level library for data visualization
• Warning messages?
• # Basic Libraries
import matplotlib.pyplot as plt # we only need pyplot (fewer tools needed)
• # Data Preparation
1(a) Import the CSV file.
1(c) Extract only the needed data: 2 methods

M1: .loc command (very lean and resource efficient, but not so user friendly; needs a loop for multiple string values)
: selects all the rows.
== is a matching statement between the LHS and RHS variables: each LHS value is compared with the RHS np.int64, and the matching columns are extracted into housedataNUM. Writing 'np.int64' in quotes would make it a string value instead of a variable.

M2: select_dtypes (user friendly, but consumes more resources)
include= is an argument and not a mathematical equation, so it is able to extract multiple dtypes given as strings (e.g. also 'float64').

1(e) Drop the unwanted columns (not 'int64')

axis=1 means dropping columns; axis=0 means dropping rows.
Either assign the result of drop back to the original variable (this updates the existing one and uses fewer variables, but everything has to be rerun from the start if amendments are made) or define a new variable.
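A minimal sketch of the two extraction methods and the drop step, assuming a tiny made-up table (the columns Id, LotArea, LotFrontage and Street are illustrative, and dropping Id is only an example of an unwanted column):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the house data.
housedata = pd.DataFrame({
    "Id": [1, 2, 3],
    "LotArea": [8450, 9600, 11250],
    "LotFrontage": [65.0, 80.0, 68.0],
    "Street": ["Pave", "Pave", "Grvl"],
})

# M1: .loc with ":" for all rows and a True/False mask over the columns.
housedataNUM = housedata.loc[:, housedata.dtypes == np.int64]

# M2: select_dtypes, which also accepts several dtypes at once.
housedataNUM2 = housedata.select_dtypes(include=["int64"])
intsAndFloats = housedata.select_dtypes(include=["int64", "float64"])

# Drop a column (axis=1) and store the result in a new variable.
housedataClean = housedataNUM.drop("Id", axis=1)
print(housedataClean)
```

Assigning to a new variable (housedataClean) keeps housedataNUM intact, so the drop cell can be rerun without restarting everything.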

• Find statistics, using a pd.DataFrame variable
A DataFrame variable is shown as a proper table, while the values of a Series variable are not presented as well.
.describe() extracts the statistical values (how to extract solely the median from the dataset?)
Data visualization: plt.figure(figsize=...) (do not use subplot here) sets the size of the canvas (width × height).
Call the plotting function, e.g. sb.boxplot, and adjust 3 parameters: which data to use, the orientation and the color. They are assignment statements, so their sequence does not matter.
sb.histplot plots a histogram; sb.violinplot is a combination of the box plot and the distribution shown by histplot.
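On the question in the statistics step of how to extract solely the median: one possible way, sketched here on made-up numbers, is to read the "50%" row of the .describe() table, or to call .median() directly.

```python
import pandas as pd

# Hypothetical numeric house data.
housedataNUM = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})

stats = housedataNUM.describe()          # a DataFrame (proper table)
print(type(housedataNUM["LotArea"]))     # a single column is a Series

# The median is the "50%" row of the .describe() table.
medians = stats.loc["50%"]                    # median of every column
lotAreaMedian = stats.loc["50%", "LotArea"]   # median of one column

# The same value without going through .describe().
print(lotAreaMedian, housedataNUM["LotArea"].median())
```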

2(c) LotArea
Present 6 plots together: prepare the figure first.
f, axes = plt.subplots(2, 3) gives 2 output variables; 2 is the number of rows and 3 the number of columns (the numbers cannot be changed afterwards).
For the fifth figure: in order for the histogram to be green, x='LotArea' needs to be included.

2(e) .reindex (only needed if combining data from different tables that have different indexes)
sb.jointplot
A strong positive relationship shows up as points increasing linearly.
Correlation – CRI
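A small sketch of combining two single-column tables and checking their correlation. The numbers and the name jointDF are assumptions for illustration.

```python
import pandas as pd

# Hypothetical house data.
housedata = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})
lotArea = pd.DataFrame(housedata["LotArea"])
salePrice = pd.DataFrame(housedata["SalePrice"])

# Combine the two tables side by side. .reindex only matters when the
# two tables do not share the same index.
jointDF = pd.concat([lotArea, salePrice], axis=1).reindex(lotArea.index)

# Correlation: close to +1 or -1 means a strong linear relationship,
# close to 0 means a weak one.
corrMatrix = jointDF.corr()
print(corrMatrix)
```

With seaborn imported as sb, sb.jointplot(data=jointDF, x="LotArea", y="SalePrice") would draw the matching scatter plot with the two distributions on the sides.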

Tutorial 3

• statistics = .describe() (the output of .describe() is stored in a variable)
• statistics is not a function but a variable
• Same library, same data
• .skew() gives the skewness
• Find the total number of outliers (for loop)
temp = pd. (extract data)
Compute Q1 and Q3 using .quantile
Use | (or) to check whether a value is an outlier
*Extract data from .describe*
• Plot LotArea using a for loop again
count = 0
Additional input: (x=var, color=color[count])
count += 1 to choose a different column when moving to the next plot
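The outlier count could be sketched as below. The notes do not state the cut-off, so the usual 1.5 × IQR rule is assumed here, and the data is made up.

```python
import pandas as pd

# Hypothetical numeric house data with one very large value per column.
housedataNUM = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260, 215245],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 375000],
})

statistics = housedataNUM.describe()   # a variable (table), not a function
outlierCount = {}

for var in housedataNUM.columns:
    temp = housedataNUM[var]           # extract one column
    q1 = temp.quantile(0.25)           # same as statistics.loc["25%", var]
    q3 = temp.quantile(0.75)           # same as statistics.loc["75%", var]
    iqr = q3 - q1
    # Assumed rule: below Q1 - 1.5*IQR OR above Q3 + 1.5*IQR is an outlier.
    isOutlier = (temp < q1 - 1.5 * iqr) | (temp > q3 + 1.5 * iqr)
    outlierCount[var] = int(isOutlier.sum())

print(outlierCount)
```

Each comparison needs its own brackets, because | is evaluated before < and >.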
*Last Week's Homework*
• .corr()
sb.heatmap(linewidths=1) (white lines between the boxes)
• Check .dtypes, then change object data to categorical data (.astype('category'))
• sb.catplot
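Changing an object column to categorical data can be sketched as follows, on a made-up column:

```python
import pandas as pd

# Hypothetical house data with one text column.
housedata = pd.DataFrame({
    "Street": ["Pave", "Pave", "Grvl", "Pave"],
    "SalePrice": [208500, 181500, 223500, 140000],
})

print(housedata.dtypes)                        # before the change
housedata["Street"] = housedata["Street"].astype("category")
print(housedata.dtypes)                        # Street is now category
print(housedata["Street"].value_counts())      # rows per category
```

With seaborn imported as sb, sb.catplot(y="Street", data=housedata, kind="count") would plot the same counts.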


Tutorial 4
• Indicators of fitness
R² (need to know its upper and lower limits, and whether a value is logical in data science; only the range in the positive region is of interest) > MSE
• Linear regression is supervised learning (it uses historical data to train the model)
• 1(c) Split the data into train and test sets (orderly splitting)
The retrieved rows become individual values
• .fit does the linear regression on the training set
• linreg.intercept_ extracts the y-intercept
linreg.coef_ extracts the coefficients
HW:?
• Underfitting vs overfitting
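A minimal sketch of the steps above using scikit-learn. The data is synthetic (y = 3x + 5 plus noise, an assumption), and the 150/50 orderly split is only an example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic data (an assumption): y = 3x + 5 plus random noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=200)
X = pd.DataFrame({"x": x})
y = pd.Series(3.0 * x + 5.0 + rng.normal(0, 10, size=200), name="y")

# Orderly split: first 150 rows for training, last 50 rows for testing.
X_train, X_test = X.iloc[:150], X.iloc[150:]
y_train, y_test = y.iloc[:150], y.iloc[150:]

linreg = LinearRegression()
linreg.fit(X_train, y_train)            # fit on the training set only

print("Intercept:", linreg.intercept_)
print("Coefficients:", linreg.coef_)

# Indicators of fitness on both sets: R^2 and MSE.
for name, Xs, ys in [("train", X_train, y_train), ("test", X_test, y_test)]:
    r2 = linreg.score(Xs, ys)
    mse = mean_squared_error(ys, linreg.predict(Xs))
    print(name, "R^2:", r2, "MSE:", mse)
```

A model that scores well on the training set but much worse on the test set is likely overfitting; poor scores on both suggest underfitting.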
