
COMP4332/RMBI4310: Big Data Mining and Management Advanced Data Mining For Risk Management and Business Intelligence

This document provides an overview of the COMP4332/RMBI4310 Big Data Mining and Management course. It outlines details such as the instructor, TAs, schedule, topics covered including big data characteristics and analytics processes, assignments, project, and exams. The course focuses on advanced data mining techniques for big data and risk management using Python programming.


COMP4332/RMBI4310
Big Data Mining and Management
Advanced Data Mining for Risk Management and Business Intelligence
Overview
Prepared by Raymond Wong
Presented by Raymond Wong
Overview 1
Course Details
 Instructor: Dr. Raymond Wong
 TAs: Tianwen CHEN, Ken Chung Hin KWOK, Dandan LIN, Min XIE, Sepanta ZEIGHAMI, Yinghua ZHANG

Overview 2
Course Details
 Webpage
 http://course.cse.ust.hk/comp4332/

Overview 3
Course Details
 Lecture
 Time: Wednesday and Friday (13:30-14:50)
 Venue: G010 (CYT Building)
 Tutorial
 Time: Tue (18:00-18:50)
 Venue: Rm 4619 (Lift 31/32)

Overview 4
Course Details
 This course has two course codes:
 COMP4332
 RMBI4310
 These 2 courses are co-listed.
 This is a 4000-level course.
 This course is an “advanced” Computer Science course.
 This course is very challenging if your “basic” skills need improvement.

Overview 5
Course Details
 This course involves a lot of programming tasks.
 Thus, you are expected to be able to write programs well in at least one programming language (e.g., C++, Java or Python).
 In this course, we focus on Python, which is becoming more and more popular nowadays.

Overview 6
Course Details
 Even if you do not know Python well, you can learn it quickly (since you already know another programming language).
 We will have a “special” lecture about Python programming:
 Date: 3 Feb (Sat)
 Time: 9am-6pm (with breaks, including a lunch period from 12 noon to 1pm)
 Venue: G010 (CYT Building)

Overview 7
Course Details
 Grading Scheme:
 Assignment 10%
 Project 30%
 Mid-Term Exam 20%
 Final Exam 40%
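
To see how the four weights combine into one course total, here is a minimal sketch. Only the weights come from the grading scheme above; the function name `course_total` and the sample scores are hypothetical.

```python
# Weights taken from the grading scheme; everything else is illustrative.
WEIGHTS = {
    "assignment": 0.10,
    "project": 0.30,
    "midterm": 0.20,
    "final": 0.40,
}


def course_total(scores):
    """Return the weighted total (0-100) given component scores out of 100."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: the scores below are made-up sample values.
example = {"assignment": 90, "project": 80, "midterm": 70, "final": 60}
print(round(course_total(example), 2))
```

Since the final exam carries the largest weight, a change of 10 points there moves the total by 4 points, while the same change in the assignments moves it by only 1 point.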

Overview 8
Assignment
 2 assignments
 Assignment 1
 Content before the mid-term exam
 Assignment 2
 Content after the mid-term exam
 NOTE: No late submissions are allowed.
Overview 9
Assignment
 If a student answers a selected question in class correctly, I will give him/her a coupon for each correct answer.
 This coupon can be used to waive one question in an assignment,
 which means that s/he can get full marks for this question without answering it.

Overview 10
Assignment
 Guideline
 For each assignment, each student can waive at most one question.
 S/he can waive any question s/he wants and obtain full marks for it (no matter whether s/he answers it or not).
 S/he may still answer this question. We will mark it, but will give full marks for it regardless.
 When submitting the assignment, the student should
 staple the coupon to the submitted assignment
 write down on the coupon the number of the question s/he wants to waive

Overview 11
Project
 Phase 1 (program)
 Phase 2 (design report and script)
 Phase 3 (program)
 Phase 4 (design report and program)
 Phase 5 (design report)
 Phase 6 (final report and program)
Overview 12
Project
 You are required to form a group.
 Each group contains 1 or 2 members.
 3-member groups are NOT allowed.
 Please fill in the following information for each member at the link
https://docs.google.com/forms/d/e/1FAIpQLSemuEZ5S0_w7Qu47qlo3iFNnrWfxnk8U40Yge6eqrveUjuk3w/viewform
 student ID
 student name
 email
 Each group needs to submit the grouping information only ONCE.
 The group forming deadline is 15 Feb (Thu) 11am.
Overview 13
Project
 In the project, you could use coupons to obtain scores in the “final program” part of Phase 6 (not the “final report” part).
 Each group is allowed to use at most 2 coupons in this part.

Overview 14
Project
 Each coupon can be used to add 20% of the total score of the final program part.
 Each coupon can be used only once.
 After it is used, it cannot be used again.
 The coupon is non-transferable. That is, a coupon with a unique ID can be used only by the student who obtained it in class.
 Please bring your coupon(s) to the demonstration session.

Overview 15
Midterm and Final Exam
 You are allowed to bring a calculator with you.
 Please remember to prepare a calculator for the exam.
 It is an open notes/book exam.
 Please remember to bring the “printed” version of the notes/books for the exam.
 No electronic devices (except your calculator) may be used in the exam.
 For example, you may not use your laptop or your phone in the exam.
Overview 16
Midterm Exam
 In-class Midterm
 Date: 23 March (Fri)
 Time: 1:30pm-2:50pm
 Venue: G010 (CYT Building) and Rm 5619 (Academic Building)

Overview 17
Course Content
 In this course, you are expected to learn something related to “Big Data”.
Not only this!
 In this course, you are also expected to learn how to solve problems and how to analyze problems.
This is very important to your future.

Overview 18
Big Data
 There is a lot of data everywhere nowadays
 Web data (e.g., webpages and social
network data)
 Purchase records from supermarkets and
shops
 Transaction records from bank/credit card
companies

Overview 19
Nature of Big Data
 3Vs
 Volume
 Velocity
 Variety

Overview 20
Big Data (Volume)
 In the internet age, a huge amount of data is generated by the large number of electronic devices in use

Overview 21
Big Data (Velocity)
 Speed of Data Arrival
 Static Data
 Data Stream

Overview 22
Big Data (Velocity) - Data Mining over Static Data
Static Data → Data Mining → Output (Data Mining Results)
Data mining tasks:
1. Association
2. Clustering
3. Classification
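
As a small illustration of mining over static data, the sketch below counts which pairs of items are bought together often, a very simple form of the association task. The function name `frequent_pairs`, the `min_support` threshold and the purchase records are hypothetical, and this is not necessarily the association algorithm taught in the course.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support transactions."""
    counts = Counter()
    for basket in transactions:
        # Sort so that ("bread", "milk") and ("milk", "bread") count as one pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Made-up purchase records; the whole data set is available before mining starts.
transactions = [
    ["bread", "milk"],
    ["bread", "diaper", "beer"],
    ["milk", "diaper", "beer"],
    ["bread", "milk", "diaper"],
]
print(frequent_pairs(transactions, min_support=2))
```

Because the data is static, the code is free to read every record before producing its output.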

Overview 23
Big Data (Velocity) - Data Mining over Data Streams
… Unbounded Data → Data Mining (Real-time Processing) → Output (Data Mining Results)
Data mining tasks:
1. Association
2. Clustering
3. Classification
Overview 24
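
For the data-stream setting, the key constraint is that the data is unbounded, so each item can be seen only once and cannot be stored. The sketch below keeps a running mean in constant memory. It is a simple summary rather than a full mining algorithm, and `RunningMean` and `sensor_stream` are hypothetical names.

```python
import itertools


class RunningMean:
    """Maintain the mean of a stream in O(1) memory, one item at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        # Incremental update: no past items need to be stored.
        self.mean += (value - self.mean) / self.count
        return self.mean


def sensor_stream():
    """Hypothetical unbounded source: yields 1, 2, 3, ... forever."""
    yield from itertools.count(1)


stats = RunningMean()
# Take only the first 5 items here; a real stream would never end.
for reading in itertools.islice(sensor_stream(), 5):
    current = stats.update(reading)
print(stats.count, current)
```

An up-to-date result is available after every item, which is what real-time processing requires.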
Big Data (Variety)
 Type of Data
 Relational Data (e.g., purchase records and transaction records): traditional data form
 Non-relational Data: new data form
 Document data (e.g., webpages)
 Graph data (e.g., social network data)
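
To make the three forms of data concrete, here is a minimal sketch of how each one might look inside a Python program. All records, names and URLs are made up for illustration.

```python
# Relational data: fixed columns, one tuple per row (made-up purchase records).
purchases = [
    ("c001", "milk", 2),
    ("c002", "bread", 1),
]

# Document data: nested, free-form fields (a made-up webpage record).
webpage = {
    "url": "http://example.com/",
    "title": "Example",
    "links": ["http://example.com/a", "http://example.com/b"],
}

# Graph data: adjacency lists (a made-up friendship network).
friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice"],
    "carol": ["alice"],
}

# Each form answers a different kind of question.
total_items = sum(quantity for _, _, quantity in purchases)
outgoing_links = len(webpage["links"])
alice_degree = len(friends["alice"])
print(total_items, outgoing_links, alice_degree)
```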

Overview 25
Analyzing Big Data
 Since there is a lot of data, we could discover non-trivial knowledge from big data (with data mining/data analytics techniques)
 This knowledge could help us make decisions

Overview 26
Process of Data Analytics
 Data analytics consists of the following processes
 Data Collection
 Data Processing
 Data Mining (or Data Analytics)
 Result Presenting

Overview 27
Raw Data → [Data Collection] → Collected Data
Collected Data → [Data Processing] → Processed Data
Processed Data → [Data Mining] → Data Mining Results
Data Mining Results → [Result Presenting] → Presentable Forms of Data Mining Results
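
The four processes can be read as four functions chained together, where the output of one is the input of the next. The sketch below is a toy end-to-end pipeline. The function names, the sample records and the "analysis" (a simple total per item) are all hypothetical.

```python
from collections import Counter


def collect():
    """Data Collection: return raw records (made-up lines standing in for a file)."""
    return ["milk,2", "bread,1", "milk,3", "bad line"]


def process(raw_lines):
    """Data Processing: keep well-formed lines and convert quantities to int."""
    rows = []
    for line in raw_lines:
        parts = line.split(",")
        if len(parts) == 2 and parts[1].isdigit():
            rows.append((parts[0], int(parts[1])))
    return rows


def mine(rows):
    """Data Mining: total quantity sold per item (a deliberately simple analysis)."""
    totals = Counter()
    for item, quantity in rows:
        totals[item] += quantity
    return totals


def present(totals):
    """Result Presenting: one readable line per item, largest total first."""
    return [f"{item}: {total}" for item, total in totals.most_common()]


report = present(mine(process(collect())))
print("\n".join(report))
```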
Overview 28
Data Collection: Raw Data → Collected Data

Relational data (e.g., purchase records and transaction records)
 Relational data is stored with the technology of a “traditional” relational database management system.
 This system could be manipulated with a database programming language called SQL (Structured Query Language).

Non-relational data (e.g., webpages and social network data)
 Non-relational data is stored with the technology of a “new” non-relational database management system.
 Such systems are collectively called NoSQL (Not Only SQL) systems; each is manipulated through its own query interface rather than one shared language.

We know how to “access” a TEXT file (e.g., file reading).
We could also “access” webpages (i.e., data crawling).
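
For the relational case, the sketch below shows what "manipulating the system with SQL" looks like from Python. It uses SQLite through the standard `sqlite3` module purely as an assumption to keep the example self-contained. The course may use a different database system, and the table and records are made up.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (customer TEXT, item TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?)",
    [("c001", "milk", 2), ("c002", "bread", 1), ("c001", "bread", 3)],
)

# SQL query: total quantity bought by each customer.
rows = conn.execute(
    "SELECT customer, SUM(quantity) FROM purchase "
    "GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()
print(rows)
```

The `?` placeholders let the database driver insert the values safely instead of building the SQL text by hand.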
Overview 30
 In this course, we will learn the
following in “Data Collection”
 Data Crawling
 SQL
 NoSQL
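
Data crawling usually means downloading a page, extracting its links, and repeating the process on those links. The sketch below shows only the link-extraction step, using the standard library on a made-up page so that no network access is needed. The class name `LinkExtractor` and the URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the absolute URL of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


# Made-up page content; a real crawler would download this over HTTP.
page = '<html><body><a href="/a.html">A</a> <a href="http://example.org/b">B</a></body></html>'
extractor = LinkExtractor("http://example.com/index.html")
extractor.feed(page)
print(extractor.links)
```

A full crawler would add a queue of pages still to visit and a set of pages already seen, so that no page is fetched twice.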

Overview 31
Data Processing: Collected Data → Processed Data
We have to transform the collected data and extract from it the “correct” form, so that this form can be used by the data mining models in the next process.

Overview 33
 In this course, we will learn the
following in “Data Processing”
 Data Reading
 Data Transforming
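
As a minimal sketch of these two steps, the code below reads CSV text and transforms it into numeric feature vectors. The column names, the records and the choice of min-max scaling are assumptions for illustration, not necessarily the transformations used in the course.

```python
import csv
import io

# Made-up collected data; io.StringIO stands in for an opened file.
raw = io.StringIO("age,income,label\n20,1000,no\n40,3000,yes\n30,2000,yes\n")

# Data reading: parse each CSV line into a dict keyed by column name.
records = list(csv.DictReader(raw))


def min_max_scale(values):
    """Rescale numbers to [0, 1]; assumes at least two distinct values."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


# Data transforming: strings -> numbers, then scale each feature column.
ages = min_max_scale([float(r["age"]) for r in records])
incomes = min_max_scale([float(r["income"]) for r in records])
labels = [1 if r["label"] == "yes" else 0 for r in records]

features = [list(pair) for pair in zip(ages, incomes)]
print(features, labels)
```

Scaling puts both columns on the same range, so a model is not dominated by the column with the larger raw numbers.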

Overview 34
Data Mining: Processed Data → Data Mining Results
We have to define some “data mining” models to perform some “data mining” tasks.
We could call many existing libraries to complete these “data mining” tasks.

Overview 36
 In this course, we will learn the
following in “Data Mining”
 Some data mining models
 A high-level tool for data mining models
(called “Keras”) based on a low-level tool
for data mining models (called
“TensorFlow”)
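
To give a feel for what such tools automate, the sketch below trains a one-input classifier by gradient descent in plain Python. It is a stand-in written without Keras or TensorFlow so that it runs anywhere, and it does not show the Keras API. The function names, the training data and the hyperparameters are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, learning_rate=0.5, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean cross-entropy loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict(w, b, x):
    """Return class 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Made-up, linearly separable training data.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 1]
w, b = train(xs, ys)
print([predict(w, b, x) for x in xs])
```

A high-level tool hides the gradient computation and the update loop, so the user mainly describes the model and supplies the data.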

Overview 37
Result Presenting: Data Mining Results → Presentable Forms of Data Mining Results
We have to present the data mining results in a “readable” and “presentable” form.
Some data mining results could be presented directly.
Some other data mining results could be presented better by using some existing visualization libraries.
Overview 39
 In this course, we will learn the
following in “Result Presenting”
 Visualization tools which could present data
mining results in a better form (e.g.,
Matplotlib)

Overview 40
 We have just described the following processes in data analytics
 Data Collection
 Data Processing
 Data Mining (or Data Analytics)
 Result Presenting

Overview 41
 Note that the above processes handled one of the V’s in Big Data “explicitly”
 Which V?

Overview 42
 There are 2 remaining V’s to be
handled.
 Which two V’s?

Overview 43
 In this course, we will learn the following technology which could be used to handle them
 Distributed Data Management (Spark)
Which V?
This is a technology using distributed data management for handling data streams.
Which V?
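
The core idea behind distributed data management is to split the data into partitions, let each worker process its own partition, and then merge the partial results. The sketch below imitates this on one machine with threads. It does not use Spark or its API, and the function names and data are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(partition):
    """Map step: each worker counts words in its own partition only."""
    counts = Counter()
    for line in partition:
        counts.update(line.split())
    return counts


def merge(left, right):
    """Reduce step: combine two partial results."""
    return left + right


# Made-up data, already split into partitions as if stored on different machines.
partitions = [
    ["big data", "data mining"],
    ["data stream", "big data mining"],
]

# Threads stand in for cluster workers in this sketch.
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_counts = list(pool.map(count_words, partitions))

total = reduce(merge, partial_counts, Counter())
print(sorted(total.items()))
```

No worker ever needs to see the whole data set, which is why this pattern scales to data that does not fit on one machine.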

Overview 44
Summary
 In this course, we will learn the following.
 Data Collection: Data Crawling, SQL, NoSQL
 Data Processing: Python Libraries
 Data Mining: Data Mining Models, Keras (on TensorFlow)
 Result Presenting: Matplotlib
 Distributed Data Management: Spark

Overview 45
