Week 4 - Introduction to Data Mining and Data Mining Techniques (3)
Dr Mariam Adedoyin-Olowe
[email protected]
Recap
• Week 1: Introduction to AI
• Week 2:
• AI Early Concepts
• What triggers the real artificial intelligence (The 3 key ingredients)
• How did AI evolve over time (Modern Historical Perspective)
• Rise of Machine Learning (2000s - 2010s)
• Current Landscape (2020s)
• Future Directions
Recap
• Week 1: Introduction to AI
– Definition
– Why AI?
– Principles of AI?
– Types of AI
– AI Life Cycle
• Week 2: Evolution of AI & AI Systems
– AI Early Concepts
– What triggers the real artificial intelligence (The 3 key ingredients)
– How did AI evolve over time (Modern Historical Perspective)
Learning Objectives
• Learn the origin and significance of KDD and Data Mining
• Examine the different Data Mining techniques
What is KDD?
[Figure: the KDD (knowledge discovery in databases) process. Raw Data → Selection → Target Data → Preprocessing → Preprocessed Data → Transformation → Transformed Data → Data Mining → Patterns → Knowledge]
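The stages in the KDD diagram can be sketched as a chain of small functions. This is a minimal illustration only, not a prescribed implementation: the records, the field names (`age`, `city`, `spend`), the 200 threshold and the counting step standing in for "data mining" are all hypothetical.

```python
from collections import Counter

# Hypothetical raw records; None marks a missing value.
raw_data = [
    {"id": 1, "age": 23, "city": "Leeds", "spend": 120.0},
    {"id": 2, "age": None, "city": "Leeds", "spend": 80.0},
    {"id": 3, "age": 41, "city": "York", "spend": 310.0},
    {"id": 4, "age": 37, "city": "York", "spend": 290.0},
    {"id": 5, "age": 29, "city": "Leeds", "spend": 95.0},
]

def select(records, fields):
    """Selection: keep only the fields relevant to the task (target data)."""
    return [{f: r[f] for f in fields} for r in records]

def preprocess(records):
    """Preprocessing: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Transformation: bin numeric spend into a categorical level (assumed threshold)."""
    return [
        {"city": r["city"], "spend_level": "high" if r["spend"] >= 200 else "low"}
        for r in records
    ]

def mine(records):
    """Data mining (toy version): count each (city, spend_level) combination."""
    return Counter((r["city"], r["spend_level"]) for r in records)

target = select(raw_data, ["age", "city", "spend"])
preprocessed = preprocess(target)
transformed = transform(preprocessed)
patterns = mine(transformed)

for (city, level), count in patterns.most_common():
    print(f"{city}: {level} spend, {count} record(s)")
```

Each function consumes the output of the previous one, mirroring the arrows in the diagram. In practice the process is iterative, so a weak result at the mining step usually sends the analyst back to an earlier stage.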
What is data mining?
• It is the iterative and interactive process of discovering valid, novel, useful, and understandable knowledge (patterns, models, rules, etc.) in huge databases
• It is knowledge discovery from data
Data Mining
• It draws on multiple disciplines
– Database systems
– Statistics
• Other Applications
– Text mining (newsgroups, email, documents) and Web mining
– Stream data mining
– DNA and bio-data analysis
Need for data mining tools
• Human analysis breaks down as volume and dimensionality grow
– How quickly can a human assimilate 2 million records, each with 200 elements?
– High rate of growth, changing sources
The Challenge
51020188905212001539458199000000001419881
22944882199608162100000010100010000000110
00031111100000000010031302000000000000002
02001000000000000000000000000000043438888
88884242434243330122020222000010100100000
00441000000001100000000000000000100000100
00000000000000000000000000000000000000000
00000019981027510201896060120021269409680
00000159019980903379811998091731001000001
00010000000110000320002000000100000001239
90000000000002002222003131003120000000000
00000042438888888888424342423321212122220
00000101100000024410000000001002000000000
00000000000100000000000000000000000000000
00000000000000000000199812305102018970203
20001862692920000004709199802135697119980
22731000001001000100000000011011000000200
00100000000021011000100000000000100000000
00001000110000000111003388882222331132334
33300000011000001110100110010200010000000
01000000001000000000000000000000000000000
00000000000000000000000000000000001998122
15102018990930200520089867300000194101999
01127598119990126310010001010001000000000
The Challenge
[Figure: Original DB → build/select target database → select sample]
Prediction: used to forecast future values from past data. For example, past sales of a particular smartphone model can be used to predict demand for its new release.
https://www.datasciencecentral.com/profiles/blogs/the-7-most-important-data-mining-techniques
• Classification
– Assign a new data record to one of several predefined groups or classes
– We know X and Y belong together; find other things in the same group
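One simple way to assign a new record to an existing class is k-nearest neighbours (k-NN). It is chosen here only as an illustration, since the slides do not name a specific classifier. The customer records, the two features and the class labels below are hypothetical.

```python
import math
from collections import Counter

def knn_classify(labelled, new_point, k=3):
    """Return the majority class among the k labelled points nearest to new_point."""
    by_distance = sorted(labelled, key=lambda item: math.dist(item[0], new_point))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical records: (monthly_visits, avg_basket_value) -> known customer class
labelled = [
    ((2, 15.0), "occasional"),
    ((3, 20.0), "occasional"),
    ((1, 10.0), "occasional"),
    ((12, 80.0), "loyal"),
    ((15, 95.0), "loyal"),
    ((10, 70.0), "loyal"),
]

print(knn_classify(labelled, (11, 75.0)))  # new record close to the "loyal" group
print(knn_classify(labelled, (2, 12.0)))   # new record close to the "occasional" group
```

The classes are known in advance and the new record is placed in the group of its closest labelled neighbours. The features here are unscaled for brevity; with real data they would normally be put on comparable scales first.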
https://magoosh.com/statistics/time-series-analysis-and-forecasting-definition-and-examples/
Ex. Time series analysis
• Example: Stock Market
• Predict future values
• Determine similar patterns over time
• Classify behaviour
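Two of the tasks above, predicting a future value and finding similar patterns over time, can be sketched in a few lines. This is a rough illustration under simple assumptions: the forecast is a plain moving average, similarity is a sum of squared differences between level-adjusted windows, and the price series is made up.

```python
def moving_average_forecast(series, window=3):
    """Predict the next value as the mean of the last `window` values."""
    recent = series[-window:]
    return sum(recent) / len(recent)

def most_similar_window(series, pattern):
    """Return the start index of the window in `series` whose shape best matches `pattern`.

    Windows and pattern are shifted to start at zero, so the comparison
    looks at the shape of the movement rather than the price level.
    """
    def shape(values):
        return [v - values[0] for v in values]

    target = shape(pattern)
    best_start, best_cost = None, float("inf")
    for start in range(len(series) - len(pattern) + 1):
        window = shape(series[start:start + len(pattern)])
        cost = sum((a - b) ** 2 for a, b in zip(window, target))
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start

# Hypothetical daily closing prices
prices = [100, 102, 105, 103, 101, 104, 107, 105, 103, 106]

print(moving_average_forecast(prices))
# Where in the earlier history did the last three days' movement occur before?
print(most_similar_window(prices[:-3], prices[-3:]))
```

Real stock data is far noisier than this toy series, so these functions only show the idea of each task, not a usable trading model.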
Regression
• Regression is used to estimate the value (or likelihood) of a target variable, given the values of other variables.
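The simplest case is linear regression with one explanatory variable, fitted by ordinary least squares. The sketch below assumes that form. The variable names and figures (advertising spend against units sold) are hypothetical and constructed to lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: advertising spend vs units sold (both in thousands),
# constructed so that units_sold = 10 + 2 * ad_spend exactly.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [12.0, 14.0, 16.0, 18.0, 20.0]

intercept, slope = fit_line(ad_spend, units_sold)
print(f"units_sold = {intercept:.1f} + {slope:.1f} * ad_spend")

# Estimate the target variable for a value of the other variable not seen before
predicted = intercept + slope * 6.0
print(predicted)
```

The fitted slope says how much the target changes per unit change in the explanatory variable. With real, noisy data the points would scatter around the line rather than sit on it.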