Aula 1 - Programa Mestrado Data Mining I 201617 v2


DATA MINING I

Knowledge Discovery in Databases

SYLLABUS
2016-2017
1
INSTRUCTOR INFORMATION FERNANDO LUCAS BAÇÃO
2nd floor, room 10
Phone: 21 3870413 (ext. 222)
[email protected]
http://www.novaims.unl.pt/fbacao/
FREDERICO JESUS, VASCO JESUS AND JOÃO SANTOS
[email protected]; [email protected]
SCHEDULE Tuesdays 18:30h – 20:15h; 20:30h – 22:15h
OFFICE HOURS Tuesdays from 17:00h – 18:00h (schedule appointment by email)
2nd Floor, Room 10
CONTACT The course has its own email address, [email protected], which
students should use to contact the teachers and to
submit homework and projects.
DESCRIPTION The Data Mining course studies the main methods and tools
available in data mining (knowledge discovery in databases), in
particular descriptive models. The course does not assume that
students are familiar with the topic, but it is highly
recommended that they have knowledge of inferential
statistics, as well as basic computer skills.
The course seeks a middle ground between courses dedicated to in-depth
analysis of the algorithms and courses for managers, which
aim to raise awareness of the importance of the tools. This is
a technical course for all who work, or seek to work, on developing
descriptive models and exploring large databases. As such, during the
course students will carry out the activities of a typical data analyst,
so practice is a central component of the course.
The main concern in this course is to present the algorithms in a clear
and comprehensible way to a wide audience with different academic
backgrounds. Students are expected to understand the
fundamentals of the inner workings of the different
methods, because only then will they be able to apply them
judiciously.
The course program covers the main methodological aspects as well
as the most widely used tools, including visualization tools, algorithms for
clustering, association rules and link analysis, among others. The course
also gives students the opportunity to use the Enterprise
Miner software from SAS Institute, so that they can develop the
practical skills related to the use of these tools.
OBJECTIVES At the end of the course, students should be able to:
• Discuss the most relevant ideas and concepts associated with
data mining;
• Execute basic and intermediate data preparation
and pre-processing tasks;
• Describe the principles and execute an RFM analysis;
• Describe in detail the hierarchical, k-means and
self-organizing map algorithms;
• Create a segmentation, being able to explain the options used
and to explain alternatives whenever available;
• Describe the apriori algorithm and how the association rules are
generated;
• Calculate and explain the most relevant performance
measures of association rules.
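The syllabus does not say which performance measures of association rules are meant. As a minimal sketch, assuming the commonly used support, confidence and lift, the calculation for a rule "antecedent implies consequent" could look like the following. The function name `rule_measures` and the toy transactions are hypothetical and serve only as an illustration.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumption: these three measures are the ones intended; the syllabus
    does not name them. `transactions` is a list of item collections.
    """
    a = set(antecedent)
    c = set(consequent)
    n = len(transactions)

    # Count the transactions containing the antecedent, the consequent, and both.
    count_a = sum(1 for t in transactions if a <= set(t))
    count_c = sum(1 for t in transactions if c <= set(t))
    count_both = sum(1 for t in transactions if (a | c) <= set(t))

    if n == 0 or count_a == 0 or count_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")

    support = count_both / n             # share of transactions with both sides
    confidence = count_both / count_a    # estimate of P(consequent | antecedent)
    lift = confidence / (count_c / n)    # confidence relative to the consequent's base rate
    return support, confidence, lift


# Hypothetical toy data, not taken from the course material.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence, lift = rule_measures(baskets, {"bread"}, {"milk"})
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift below 1, as in this toy example, suggests that the antecedent makes the consequent slightly less likely than its base rate.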
COURSE SUCCESS In this course, success depends on a number of factors:
• Having basic knowledge of statistics;
• Attending classes;
• Working during the semester and not only when exams are
about to start;
• Developing the course project during the semester, making the
most of the practical classes;
• Reading the suggested references.
CONTENTS 1. The context for analytics
a. The data deluge
b. Information as a strategic resource
c. Data-driven decision making
d. The relation between analytics and company
performance
e. Data analytic thinking
f. Data mining and data science
2. Business problems and analytical solutions
a. From business problems to data mining tasks
b. Supervised versus unsupervised methods
i. Knowledge discovery (Clustering and
Summary)
ii. Predictive Modeling (Classification and
Regression)
c. The data mining process
i. Business understanding
ii. Data understanding
iii. Data preparation
iv. Modeling
v. Evaluation
vi. Deployment
3. Data visualization
a. Motivation
b. Guidelines for presenting information
c. Graphics for presentation
d. Graphics for analysis
4. Data preparation and preprocessing
a. Motivation
b. Types of measurements
c. Noise vs signal
d. Descriptive statistics
e. Variable Distribution
f. Outliers
g. Missing data
h. Data discretization
i. Standardization
j. Transformations
k. Dimensionality reduction
i. Feature extraction and selection
ii. Business transformations
5. Cluster analysis
a. Motivation
b. Components of a Clustering Task
c. The User’s Dilemma and the Role of Expertise
d. History
e. Similarity Measures
6. Clustering techniques
a. Hierarchical Clustering Algorithms
b. Partitional Algorithms (k-means)
c. Fuzzy Clustering
d. Artificial Neural Networks (Self-Organizing Maps)
7. Analysis and validation of clustering solutions
a. The number of clusters
b. Analysis and profiling of the clustering solution
c. Classification trees
d. Validity of the solution
e. Supervised classification through k-nearest
neighbors
8. Association rules
a. Motivation
b. Apriori algorithm
c. Interpretation measures
d. Types of rules
e. Temporal extension
9. Introduction to network analysis
a. Structural importance
b. Degree centrality
c. Geometric centrality
10. Introduction to text mining
a. Distance functions
b. Clustering algorithms
c. Visualization techniques
BIBLIOGRAPHY References:
• Provost, F. and Fawcett, T. Data Science for Business.
O’Reilly Media, New York, 2013.
• M.J.A. Berry, G.S. Linoff, Data Mining Techniques: For
Marketing, Sales, and Customer Relationship Management,
second edition. Wiley, 2004. Chap. 1, 2, 3, 4, 5, 8 and 10.
• A. K. Jain, M. N. Murty and P. J. Flynn, 1999. Data
Clustering: A Review, ACM Computing Surveys.
• Course Notes Enterprise Miner™: Applying Data Mining
Techniques
Additional References:
• Mitchell, T. (1997) Machine Learning, McGraw Hill.
• Hand, D. J., Mannila, H., Smyth, P. (2001) Principles of Data
Mining (Adaptive Computation and Machine Learning),
MIT Press.
• Kohonen, T. (1988) Self-Organization and Associative
Memory (2nd Edition). Springer-Verlag: New York.
Note: all references are available at the ISEGI-NOVA library or are
provided by the teacher.
STUDENT EVALUATION 1st Session – Exam (65%), Project (35%)
2nd Session – Exam (65%), Project (35%)
CALENDAR Lec. 1 13 Sep. Course presentation (Syllabus)
Evaluation
Course project
The context for analytics
The data deluge
Information as a strategic resource
Data-driven decision making
The relation between analytics and
company performance
Data analytic thinking
Data mining and data science
Lec. 2 20 Sep. Business problems and analytical solutions
From business problems to data mining
tasks
Supervised versus unsupervised methods
Knowledge discovery
(Clustering and Summary)
Predictive Modeling
(Classification and Regression)
The data mining process
Business understanding
Data understanding
Data preparation
Modeling
Evaluation
Deployment
Lec. 3 27 Sep. Data visualization
Motivation
Guidelines for presenting information
Graphics for presentation
Graphics for analysis
Lec. 4 4 Oct. Data preparation and preprocessing
Motivation
Types of measurements
Noise vs signal
Descriptive statistics
Variable Distribution
Outliers
Missing data
Data discretization
Standardization
Transformations
Dimensionality reduction
Feature extraction and
selection
Business transformations
Lec. 5 11 Oct. Practical Class Enterprise Miner
Lec. 6 18 Oct. Cluster analysis
Motivation
Components of a Clustering Task
The User’s Dilemma and the Role of
Expertise
History
Similarity Measures
Clustering techniques
Hierarchical Clustering Algorithms
Partitional Algorithms (k-means)
Lec. 7 25 Oct. Clustering techniques
Fuzzy Clustering
Artificial Neural Networks
(Self-Organizing Maps)
Lec. 8 8 Nov. Practical Class Enterprise Miner
Lec. 9 15 Nov. Analysis and validation of clustering solutions
The number of clusters
Analysis and profiling of the clustering
solution
Classification trees
Validity of the solution
Supervised classification through
k-nearest neighbors
Lec. 10 22 Nov. Practical Class Enterprise Miner
Lec. 11 29 Nov. Association rules
Motivation
Apriori algorithm
Interpretation measures
Types of rules
Temporal extension
Lec. 12 6 Dec. Introduction to network analysis
Structural importance
Degree centrality
Geometric centrality
Introduction to text mining
Distance functions
Clustering algorithms
Visualization techniques
Lec. 13 13 Dec. Practical Class Enterprise Miner
Lec. 14 19 Dec. Practical Class Enterprise Miner

Course Projects

The project is a practical assignment using SAS Enterprise Miner. In this project, students will
complete the segmentation of a customer database, following all the usual steps of a real-world
project. For this, students will receive the data and a set of specific guidelines that they should
follow. The guidelines describe the type of tasks students should do and
the general results they should achieve. The end product of the project should be a report about
the database and the different customer segments of the company. With this project, students should
develop their analytical skills, as well as their proficiency in working with large datasets, extract,
transform and load (ETL) tasks, visualization and reporting conclusions.

Due: 31st of December

Tasks. In both practical and theoretical classes, students will frequently be assigned homework,
which will consist of simple tasks related to the course material. Students are expected to
complete these tasks.

Final Exam. The exam will be a one-hour in-class exam covering all the course material. The
exam will consist of 15 multiple-choice questions, 5 true-or-false questions and a short essay.

Grading

Project: 35%
Exam: 65%

Both components of the evaluation are mandatory. There are two opportunities to take the exam.
Late delivery of the project is subject to a penalty of 10% of the grade for each day of
delay. Please note that the project will be developed in groups of no more
than 3 members. To pass the course, a student must score at least 8 (40%) on
the exam.
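
The grading rules above can be sketched as a small calculation. This is only an illustration under stated assumptions: grades are on a 0 to 20 scale (implied by 8 corresponding to 40%), the late penalty removes 10% of the project grade per day of delay and cannot push it below zero, and the function name `final_grade` is hypothetical. The syllabus states no overall pass mark, so the sketch reports only whether the exam minimum is met.

```python
def final_grade(exam, project, days_late=0):
    """Return (weighted grade, exam_minimum_met) on an assumed 0-20 scale.

    Assumptions (not stated explicitly in the syllabus):
    - the late penalty is 10% of the project grade per day of delay,
      floored at zero;
    - the minimum exam grade is 8 out of 20.
    """
    # Apply the late-delivery penalty to the project component only.
    penalty_factor = max(0.0, 1.0 - 0.10 * days_late)
    adjusted_project = project * penalty_factor

    # Weights taken from the syllabus: exam 65%, project 35%.
    grade = 0.65 * exam + 0.35 * adjusted_project
    return round(grade, 2), exam >= 8


# Example: exam 12, project 16, delivered two days late.
print(final_grade(12, 16, days_late=2))
```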
