Topic 10-Data Mining
Readings
Pal, C.J., Hall, M.A., Frank, E., Witten, I.H., 2016, Data Mining, 4th
Edition, Morgan Kaufmann. Chapter 1, available from Topic 08
Learning outcomes
Why bother?
“We are overwhelmed with data. The amount of data in the world, in our
lives, seems ever-increasing—and there’s no end in sight.” Pal et al. (2016)
• Global competition
• Untapped value of organisational data
• Increasing consolidation of data
• Vast improvements in processing and storage capabilities and reduction in
cost
• Data mining has been, and continues to be, used in a wide variety of
contexts. Some examples are:
• Customer relationship management
• Banking and other financial services
• Retailing/logistics
• Insurance
• Brokerage and securities trading
• Manufacturing and Maintenance
CRM
https://www.linkedin.com/pulse/20140403185417-4785379-diapers-and-beer
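The diapers-and-beer story linked above is usually told as an example of finding items that are bought together. A minimal sketch of how such a pairing could be measured, assuming a tiny made-up set of baskets; the data and the function names `support` and `confidence` are illustrative only and do not come from the readings:

```python
# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


print("support(diapers, beer):", support({"diapers", "beer"}, baskets))
print("confidence(diapers -> beer):", confidence({"diapers"}, {"beer"}, baskets))
```

In this toy data, two of the five baskets contain both items, and two of the three baskets with diapers also contain beer. Real retail data would need far more transactions before a pattern like this could be trusted.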
Manufacturing and maintenance
• Page ranking
• Social media
• “And then there are social networks and other personal data. We live in the
age of self-revelation: people share their innermost thoughts in blogs and
tweets, their photographs, their music and movie tastes, their opinions of
books, software, gadgets, and hotels, their social life. They may believe they
are doing this anonymously, or pseudonymously, but often they are
incorrect... There is huge commercial interest in making money by mining the
Web.” Pal et al.
Topic 10: Part 03
Data Mining Issues
Privacy issues
• Common myths: data mining …
• provides instant solutions/predictions
• is not yet viable for business applications
• requires a separate, dedicated database
• can only be done by those with advanced degrees
• is only for large firms that have lots of customer data
• is another name for good old statistics
Blunders
• Re-identification
• More than 85% of Americans can be identified from publicly available records
using three pieces of information: ZIP code, birthdate and gender
• There are many examples of companies releasing “de-identified” data in good
faith, only to find that the records were re-identifiable
• Using Personal Information
• Individuals whose information is collected and used should be told
what is being done to protect their data
• Wider issues
• E.g., Cambridge Analytica
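The re-identification point above can be checked on any dataset by counting how many records are unique on ZIP code, birthdate and gender. A minimal sketch, assuming a small invented table of "de-identified" records; the field names, the values and the helper `unique_fraction` are hypothetical:

```python
from collections import Counter

# Hypothetical "de-identified" records: names removed, quasi-identifiers kept.
records = [
    {"zip": "02138", "birthdate": "1945-07-31", "gender": "F", "diagnosis": "A"},
    {"zip": "02138", "birthdate": "1962-03-14", "gender": "M", "diagnosis": "B"},
    {"zip": "02139", "birthdate": "1962-03-14", "gender": "M", "diagnosis": "C"},
    {"zip": "02139", "birthdate": "1962-03-14", "gender": "M", "diagnosis": "D"},
    {"zip": "02140", "birthdate": "1980-11-02", "gender": "F", "diagnosis": "E"},
]

QUASI_IDENTIFIERS = ("zip", "birthdate", "gender")


def unique_fraction(rows, keys=QUASI_IDENTIFIERS):
    """Return the fraction of rows whose combination of keys appears exactly once."""
    if not rows:
        return 0.0
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    unique = sum(1 for row in rows if combos[tuple(row[k] for k in keys)] == 1)
    return unique / len(rows)


fraction = unique_fraction(records)
print(f"{fraction:.0%} of records are unique on {QUASI_IDENTIFIERS}")
```

A record that is unique on these three fields can be matched to a named person by anyone holding a public list with the same fields, which is why removing names alone does not make data anonymous.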
Topic 10: Part 04
Topic Summary
Learning outcomes