CS429: Data Mining: About Instructor
Sibt ul Hussain
http://sites.google.com/SibtulHussain
About Instructor
• Instructor: Sibt ul Hussain (Leading Research Scientist of ReVeaL
Research Group)
Experience:
2006-2007 SUPELEC, Paris, France
2007 INRIA (LEAR), Grenoble, France
2007-2012 LJK, Grenoble, France
2012-2013 GREYC, Caen, France
2013-now Assistant Professor, FAST
Research Grants
2014 -- 2016 NVIDIA ~US$ 11000 for research
in deep learning.
1/27/2017
Contrast: Databases
Databases: querying the past
Data Science: querying the future
Not
Publish a paper
Evaluate
Interpret
▫ Medical Data
▫ Reviews Data
▫ Mobile Sensor Data
▫ Company Consumer Data
▫ …
In Labs
Course Contents
•Data
▫Geometric & Algebraic View
  Numeric Attributes
  Categorical Attributes
▫Probabilistic View
•Classification
▫Bayes Networks (Naive Bayes)
▫Decision Trees
▫Random Forests and Boosting
▫Linear Regression
▫Perceptrons
▫SVM
▫Logistic Regression and Softmax
▫Neural Networks
•Clustering
▫K-means
▫Hierarchical
•Pattern Mining
▫Apriori
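As a small illustration of the first topic above (the geometric view of data with numeric and categorical attributes), here is a minimal sketch in Python. The toy records, the attribute names, and the helper functions `one_hot`, `to_vector`, and `euclidean` are all hypothetical and are not taken from the course material. They only show one common way to turn mixed records into points in a vector space so that distances can be computed.

```python
import math

# Hypothetical toy records: (age, height_cm, blood_group).
# The first two attributes are numeric, the last one is categorical.
records = [
    (25, 170.0, "A"),
    (32, 182.0, "B"),
    (25, 170.0, "B"),
]

def one_hot(value, categories):
    """Encode a categorical value as a 0/1 indicator vector."""
    return [1.0 if value == c else 0.0 for c in categories]

def to_vector(record, categories):
    """Map a mixed record to a point in R^d.

    Numeric attributes are kept as they are; the categorical
    attribute is replaced by its one-hot encoding.
    """
    age, height, group = record
    return [float(age), height] + one_hot(group, categories)

def euclidean(u, v):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Sorted list of the distinct categorical values seen in the data.
categories = sorted({r[2] for r in records})

# Data matrix: one row (point) per record.
X = [to_vector(r, categories) for r in records]

print(X[0])
print(euclidean(X[0], X[2]))
```

Records 0 and 2 agree on both numeric attributes and differ only in the categorical one, so their distance comes entirely from the one-hot coordinates. In practice the numeric attributes would usually be rescaled first so that no single attribute dominates the distance.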
Resources
•Quora
▫What is Data Science?
▫How do I become a Data Scientist?
▫How does Data Science differ from traditional statistical analysis?
•Related Courses
▫Concepts in Computing with Data, Berkeley
▫Practical Machine Learning, Berkeley
▫Artificial Intelligence, Berkeley
▫Visualization, Berkeley
▫Data Mining and Analytics in Intelligent Business Services, Berkeley
▫Statistical Learning Theory and Applications, MIT
▫Data Literacy, MIT
▫Prediction, MIT
▫Introduction to Data Mining, UIUC
▫Learning from Data, Caltech
▫Introduction to Statistics, Harvard
▫Introduction to Computing, Modeling, and Visualization, Harvard
▫Data-Intensive Information Processing Applications, University of Maryland
▫Statistical Inference, UPenn
▫Introduction to Data Science,
Resources
•Coursera
▫Data Analysis, Johns Hopkins
▫Computing for Data Analysis, Johns Hopkins
▫Machine Learning, Stanford
▫Introduction to Data Science, University of Washington
▫Computational Methods for Data Analysis, University of Washington
▫Machine Learning, University of Washington
•Related Workshops
▫Data Bootcamp, Strata 2011
▫Machine Learning Summer School, Purdue 2011
▫Looking at Data
•Books
▫Competing on Analytics
▫Analytics at Work
▫Super Crunchers
▫The Numerati
▫Data Driven
▫Data Source Handbook
▫Programming Collective Intelligence
▫Mining the Social Web
▫Data Analysis with Open Source Tools
▫Visualizing Data
▫The Visual Display of Quantitative Information
▫Envisioning Information
▫Visual Explanations: Images and Quantities, Evidence and Narrative
▫Beautiful Evidence
Prerequisites
1. Linear Algebra
2. Calculus
3. Probability
4. Solid knowledge of programming (if you are not comfortable with
programming, do not take this course)
General Information
Grading
Programming Assignments --- 20% to 30%
Quizzes --- 10%
Mid-term exams --- 20% to 30%
Final projects (most probably on individual basis) --- 10%
Final --- 40%
Lab --- 15%
Piazza group (up to 2 bonus marks):
For open communication, questions, feedback, polls, suggestions, etc.
https://piazza.com/class/ (Slate integrated)
Please make the best use of Piazza for learning among yourselves.
Work Submission
IPython Notebooks and Kaggle Competition Score.
Soft copy only (we will run your program).
Plagiarism: If our system finds that your code is plagiarized, you
risk:
Zero in all assignments.
Referral to DC committee.
A straight F in the course.
Warning: The majority of failures will be due to plagiarism.
Feel free to discuss assignments with each other (on Piazza), but coding
must be done individually (except for the final project).
Late submissions of assignments are not allowed.