Knowledge Discovery in Databases
We Are Data Rich but Information Poor
Terrorbytes
Principles of Knowledge Discovery in Data
What Is Our Need?
Knowledge
Data
• Business transactions
• Scientific data (biology, physics, etc.)
• Medical and personal data
• Surveillance video and pictures
• Satellite sensing
• Games
• Digital media
• CAD and Software engineering
• Virtual worlds
• Text reports and memos
• The World Wide Web
• Data mining is the core of the KDD process: it involves applying
algorithms that explore the data, build a model, and find previously
unknown patterns.
• The model is then used to extract knowledge from the data, analyze
the data, and make predictions.
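As a rough illustration of the two bullets above, the sketch below finds item pairs that co-occur in a handful of transactions and then uses those pairs as a very simple predictive model. The transactions, the support threshold, and the helper names `frequent_pairs` and `predict_companion` are all hypothetical and chosen only for illustration; pair counting stands in for whatever mining algorithm is actually used.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions (illustrative only, not from the slides).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]


def frequent_pairs(baskets, min_support):
    """Count item pairs; keep those found in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair has one canonical (a, b) ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


def predict_companion(item, pairs):
    """Return the item most often seen together with `item`, or None."""
    best, best_count = None, 0
    # Iterate in sorted order so ties are broken deterministically.
    for (a, b), n in sorted(pairs.items()):
        if item in (a, b) and n > best_count:
            best, best_count = (b if a == item else a), n
    return best


# "Develop the model": the discovered pairs act as the model.
patterns = frequent_pairs(transactions, min_support=2)

# "Predict": suggest a likely companion item for a given item.
suggestion = predict_companion("beer", patterns)
```

Here the discovered pairs play the role of the model, and `predict_companion` shows one way such a model might be used for prediction. Real data mining tasks would use far larger data and more robust algorithms.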
The Challenge (Humans aren’t particularly well suited to finding
patterns in data. Computers, on the other hand…)
510201889052120015394581990000000014198812294488219960816210000001010001000000011000031111100000
000010031302000000000000002020010000000000000000000000000000434388888888424243424333012202022200
001010010000000441000000001100000000000000000100000100000000000000000000000000000000000000000000
000001998102751020189606012002126940968000000159019980903379811998091731001000001000100000001100
003200020000001000000012399000000000000200222200313100312000000000000000042438888888888424342423
321212122220000001011000000244100000000010020000000000000000000010000000000000000000000000000000
000000000000000000199812305102018970203200018626929200000047091998021356971199802273100000100100
010000000001101100000020000100000000021011000100000000000100000000000010001100000001110033888822
223311323343330000001100000111010011001020001000000001000000001000000000000000000000000000000000
000000000000000000000000000000019981221510201899093020052008986730000019410199901127598119990126
310010001010001000000000111110111112201010000011123001001000000102100022000000000020000000000000
111334388884342424243424233000000111100000101100100002441000000000100200000001001010000000100000
000000000000000000000000000000000000000000001999052551020189912272009354051583000001448419970527
179711997061031000000101100100000001000003111201200001001001012000111100100001101001200000000000
100000000001010132438888888888224242433100000001002100001110010011230100000010000020001000000000
011000010000000010000010000000000000000100000000000000000199811175102018991227200935405158300000
144841997052717972199806163100000010110010000000110100311111121000100000202210012220220020221222
201000000000000000001010011003243434321324221424242330021002100001111011000001122310011000001000
00010000000000110000100000000100000100000000000000000000000000000000001998122351020190001
Context
• Where you stand on Data Mining depends on where you sit:
• A business user will be interested in efficiency and results; validity
may not be as important.
• A researcher clearly will be interested in a different type of results,
and validity will be important.
• A computer scientist may be interested in introducing new algorithms
or computational approaches and achieving improved results or more
efficient processing.
KDD: A Definition
KDD is the automatic extraction of non-obvious,
hidden knowledge from large volumes of data.
Data, Information, Knowledge
We often see data as a string of bits, or numbers and
symbols, or “objects” which we collect daily.
The KDD Process
• The knowledge discovery process (illustrated in the given figure) is
iterative and interactive, and comprises nine steps.
• The process is iterative at each stage, implying that moving back to the
previous actions might be required.
• The process has many creative aspects, in the sense that one cannot
present a single formula or a complete scientific taxonomy of the
correct decisions for each step and application type.