COMP4332/RMBI4310: Big Data Mining and Management Advanced Data Mining For Risk Management and Business Intelligence
TA
Tianwen CHEN
Dandan LIN
Min XIE
Sepanta ZEIGHAMI
Yinghua ZHANG
Overview 2
Course Details
Webpage
http://course.cse.ust.hk/comp4332/
Course Details
Lecture
Time: Wednesday and Friday (13:30-14:50)
Venue: G010 (CYT Building)
Tutorial
Time: Tue (18:00-18:50)
Venue: Rm 4619 (Lift 31/32)
Course Details
This course has two course codes:
COMP4332
RMBI4310
These two courses are co-listed.
This is a 4000-level course.
It is an “advanced” Computer Science course.
It will be very challenging if your “basic” skills need improvement.
Course Details
This course involves a lot of programming tasks.
You are therefore expected to be able to program well in at least one programming language (e.g., C++, Java or Python).
In this course, we focus on Python, which is becoming more and more popular nowadays.
Course Details
Even if you do not know Python well, you can learn it quickly (since you already know another programming language).
We will hold a “special” lecture on Python programming on the following date.
Date: 3 Feb (Sat)
Course Details
Grading Scheme:
Assignment 10%
Project 30%
Mid-Term Exam 20%
Final Exam 40%
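As a worked illustration of how these weights combine, the sketch below computes a weighted course total. It assumes each component is marked out of 100; the scores and the function name `course_total` are hypothetical and introduced here for illustration only.

```python
# Weights taken from the grading scheme above (they sum to 100%).
WEIGHTS = {
    "assignment": 0.10,
    "project": 0.30,
    "midterm": 0.20,
    "final": 0.40,
}


def course_total(scores):
    """Return the weighted total, assuming each score is out of 100."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {"assignment": 90, "project": 80, "midterm": 70, "final": 60}
print(round(course_total(example_scores), 2))
```

Here the total is 0.10 × 90 + 0.30 × 80 + 0.20 × 70 + 0.40 × 60 = 9 + 24 + 14 + 24 = 71.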
Assignment
2 assignments
Assignment 1
Content before the mid-term exam
Assignment 2
Content after the mid-term exam
NOTE: No late submissions are allowed.
Assignment
If a student correctly answers a selected question in class,
I will give him/her one coupon for each correct answer.
A coupon can be used to waive one question in an assignment,
which means that s/he gets full marks for that question without answering it.
Assignment
Guideline
For each assignment, each student can waive at most one question.
S/he can waive any question s/he wants and obtain full marks for it (no matter whether s/he answers it or not).
S/he may still answer the waived question. We will mark it, but s/he will receive full marks for it regardless.
When submitting the assignment, the student should
staple the coupon to the submitted assignment
write down on the coupon the number of the question s/he wants to waive
Project
Phase 1 (program)
Phase 2 (design report and script)
Phase 3 (program)
Phase 4 (design report and program)
Phase 5 (design report)
Phase 6 (final report and program)
Project
You are required to form a group.
Each group has 1 or 2 members.
Groups of 3 members are NOT allowed.
Please fill in the following information for each member using the link
https://docs.google.com/forms/d/e/1FAIpQLSemuEZ5S0_w7Qu47qlo3iFNnrWfxnk8U40Yge6eqrveUjuk3w/viewform
student ID
student name
Email
Project
Each coupon can be used to add 20% of the total score of the final program part.
Each coupon can be used only once.
After it is used, it cannot be used again.
Coupons are non-transferable. That is, a coupon with a unique ID can be used only by the student who obtained it in class.
Please bring your coupon(s) to the demonstration session.
Midterm and Final Exam
You are allowed to bring a calculator with you.
Please remember to prepare a calculator for the exam.
It is an open-notes/open-book exam.
Please remember to bring a “printed” version of your notes/books to the exam.
No electronic devices (except your calculator) may be used in the exam.
For example, you may not use your laptop or your phone in the exam.
Midterm Exam
In-class Midterm
Course Content
In this course, you are expected to
learn something related to “Big Data”.
Not only this!
Big Data
There is a lot of data everywhere nowadays
Web data (e.g., webpages and social network data)
Purchase records from supermarkets and shops
Transaction records from banks/credit card companies
Nature of Big Data
3Vs
Volume
Velocity
Variety
Big Data (Volume)
In the internet age, a large amount of data is generated by the many electronic devices in use.
Big Data (Velocity)
Speed of Data Arrival
Static Data
Data Stream
Big Data (Velocity) - Data Mining over Static Data
[Figure: Static Data -> data mining (1. Association, 2. Clustering, 3. Classification) -> Output (Data Mining Results)]
Big Data (Velocity) - Data Mining over Data Streams
[Figure: Unbounded Data (…) -> Real-time Processing (1. Association, 2. Clustering, 3. Classification) -> Output (Data Mining Results)]
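The difference between static data and a data stream can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the course: the running mean stands in for a real mining task, and `sensor_readings` is a hypothetical stream source. The static version can scan the whole data set, while the streaming version sees each item once and keeps only constant-size state.

```python
import random


def static_mean(data):
    """Static data: the whole data set is available, so it can be scanned freely."""
    return sum(data) / len(data)


def streaming_means(stream):
    """Data stream: items arrive one at a time and are never stored.

    Yields the running mean after each arrival, using O(1) memory.
    """
    count = 0
    mean = 0.0
    for x in stream:
        count += 1
        mean += (x - mean) / count  # incremental update of the mean
        yield mean


def sensor_readings(n, seed=0):
    """Hypothetical stream source: n pseudo-random readings."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.uniform(0.0, 100.0)


readings = list(sensor_readings(1000))        # static copy, kept only for comparison
last = None
for last in streaming_means(iter(readings)):  # processed item by item
    pass

# Both approaches agree up to floating-point rounding.
print(abs(last - static_mean(readings)) < 1e-9)
```

In a real stream the list `readings` would not exist; only `count` and `mean` would be kept, so a result is available after every arrival.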
Big Data (Variety)
Type of Data
Traditional data form
Analyzing Big Data
Since there is a lot of data, we can discover non-trivial knowledge from big data (with data mining/data analytics techniques).
This knowledge can help us in decision-making.
Process of Data Analytics
Data analytics involves the following processes:
Data Collection
Data Processing
Data Mining (or Data Analytics)
Result Presenting
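The four processes above can be sketched as a tiny end-to-end pipeline. This is only an illustration using the Python standard library: the purchase records are made up, the function names are hypothetical, and counting item frequencies stands in for a real data mining model.

```python
import csv
import io
from collections import Counter

# Hypothetical purchase records standing in for collected data.
RAW_CSV = """customer,item
alice,milk
bob,bread
alice,bread
carol,milk
bob,milk
"""


def collect():
    """Data Collection: obtain raw data (here, an in-memory CSV string)."""
    return RAW_CSV


def process(raw):
    """Data Processing: parse the raw text into a list of records."""
    return list(csv.DictReader(io.StringIO(raw)))


def mine(records):
    """Data Mining: a trivial 'model' that counts how often each item is bought."""
    return Counter(r["item"] for r in records)


def present(counts):
    """Result Presenting: format the result as text lines, most frequent first."""
    return [f"{item}: {n}" for item, n in counts.most_common()]


for line in present(mine(process(collect()))):
    print(line)
```

Each stage consumes the output of the previous one, which is why the processes are usually drawn as a chain.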
[Figure: Data Collection -> Data Processing -> Data Mining -> Result Presenting]
In this course, we will learn the
following in “Data Processing”
Data Reading
Data Transforming
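A minimal sketch of these two steps, using only the Python standard library, is given below. The slides do not say which libraries or transformations the course uses, so the CSV format, the column names, and the choice of min-max scaling are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical raw file contents; a real file would be opened with open(path).
RAW = "name,height_cm\nAnn,150\nBen,175\nCal,200\n"


def read_rows(text):
    """Data Reading: parse CSV text into dictionaries (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def min_max_scale(values):
    """Data Transforming: rescale numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # avoid division by zero for constant columns
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


rows = read_rows(RAW)
heights = [float(r["height_cm"]) for r in rows]  # convert strings to numbers
scaled = min_max_scale(heights)
print(scaled)
```

Reading yields strings, so a type conversion is needed before any numeric transformation can be applied.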
[Figure: Data Collection -> Data Processing -> Data Mining -> Result Presenting; current stage: Data Mining]
In this course, we will learn the
following in “Data Mining”
Some data mining models
A high-level tool for data mining models
(called “Keras”) based on a low-level tool
for data mining models (called
“TensorFlow”)
[Figure: Data Collection -> Data Processing -> Data Mining -> Result Presenting; current stage: Result Presenting]
We have just introduced and elaborated on the following processes in data analytics:
Data Collection
Data Processing
Data Mining (or Data Analytics)
Result Presenting
Note that the above processes “explicitly” handle one of the V’s of Big Data.
Which V?
There are 2 remaining V’s to be
handled.
Which two V’s?
In this course, we will learn the following technology, which can be used to handle them
Distributed Data Management (Spark)
Which V?
Which V?
Summary
In this course, we will learn the following.
Data Collection
Data Crawling
SQL
NoSQL
Data Processing
Python Libraries
Data Mining
Data Mining Models
Result Presenting
Matplotlib