Data Science Applications
Some highlights:
• Topics for data scientists
• R
• IBM Cognos Workspace, IBM SPSS Modeler, Watson Analytics
• VCL cloud
• Course projects
Evaluation
Course Project
Olga Baysal
Email: [email protected]
Office hours: By appointment or via Slack
Office: HP 5125D
Website: http://olgabaysal.com/teaching/winter16/data5000.html
Boyan Bejanov
Email: [email protected]
Office hours: By appointment or via Slack
Office: none
Website: http://scs.carleton.ca/~boyanbejanov/data5000
What is Data Science?
Business efficiency: Wal-Mart
http://www.nytimes.com/2004/11/14/business/yourmoney/14wal.html
Business marketing: Target
http://tinyurl.com/7jbntx3
Recommendations: Netflix
http://www2.research.att.com/~volinsky/netflix/bpc.html
Sports analytics
Many others
• Cities: http://data.cityofchicago.org/
• Physics: http://particlefever.com/
• Politics: http://53eig.ht/1zPmuCD
• Social networks
• Biology
• Medicine
• etc.
Cholera outbreak in London, 1854
• Physician John Snow links the outbreak to a contaminated well by plotting the number of cases on a map
• Helped found the science of epidemiology
The Winchester Roll of 1086
• Commissioned in 1085 by William the Conqueror
• Record of the Great Survey of England
• Last used to settle a dispute in court in the 1960s!
http://www.domesdaybook.co.uk/
Data in the 20th century
http://research.microsoft.com/en-us/collaboration/fourthparadigm/
For example
Network security:
• 20th century: based on rules and signatures
• 21st century: data mining of traffic logs, cf. http://www.bro.org/
Artificial Intelligence:
VS.
A good question
Skills:
• make discoveries while swimming in data
• don’t allow technical limitations to bog down solutions
• often fashion their own tools
• skilled in storytelling with data
Some data-driven companies:
• Google, Wal-Mart, Twitter, LinkedIn, Amazon
What data scientists do
• Ask a question
• Get relevant data
• Prepare data for analysis
- outliers, missing values, incorrect values
• Explore data
- understand the world as it is (was)
• Statistical model
- estimate/train and validate model
- predict what will (likely) happen
• Communicate results
- tell a story
- recommend
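The workflow above can be sketched end to end in Python using only the standard library. This is a minimal sketch under stated assumptions, not the course's method: the records are hypothetical synthetic data, rows with missing values are simply dropped, outliers are flagged with the common 1.5×IQR rule, and the "statistical model" is an ordinary least-squares line validated on held-out rows by mean absolute error. The helper names (`prepare`, `fit_line`, `mean_abs_error`) are illustrative only.

```python
import statistics

# Hypothetical synthetic records (x, y) following y = 2x + 1, plus one
# record with a missing value and one gross outlier, for illustration only.
records = [(x, 2 * x + 1) for x in range(20)] + [(7, None), (25, 500)]


def prepare(rows):
    """Drop rows with missing values, then drop y outliers via the 1.5*IQR rule."""
    complete = [(x, y) for x, y in rows if x is not None and y is not None]
    q1, _, q3 = statistics.quantiles([y for _, y in complete], n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [(x, y) for x, y in complete if low <= y <= high]


def fit_line(rows):
    """Estimate slope and intercept of y = slope * x + intercept by least squares."""
    xs = [x for x, _ in rows]
    mx = statistics.fmean(xs)
    my = statistics.fmean(y for _, y in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_abs_error(model, rows):
    """Validate: average absolute gap between predicted and observed y."""
    slope, intercept = model
    return statistics.fmean(abs(slope * x + intercept - y) for x, y in rows)


# Prepare data: remove the missing value and the outlier
clean = prepare(records)

# Explore: summary statistics of the cleaned response
ys = [y for _, y in clean]
print(f"n={len(clean)}, mean y={statistics.fmean(ys):.2f}, sd y={statistics.stdev(ys):.2f}")

# Model: hold out every 4th record for validation, train on the rest
validate = clean[::4]
train = [row for i, row in enumerate(clean) if i % 4 != 0]
model = fit_line(train)

# Communicate: report the fitted line, its validation error and a prediction
slope, intercept = model
print(f"y ~ {slope:.2f} * x + {intercept:.2f}")
print(f"validation MAE = {mean_abs_error(model, validate):.3f}")
print(f"predicted y at x = 30: {slope * 30 + intercept:.1f}")
```

Dropping incomplete rows and using a single fixed split keep the sketch short; real analyses often impute missing values and use cross-validation instead.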
Data scientist skills
• Computer science
- programming, hacking skills
• Statistics
- probability, distributions, modelling
• Mathematics
- linear algebra, calculus, optimization
• Domain expertise
- storytelling, posing questions, interpreting results
• Communication
- presentation, data visualization
Drew Conway’s Venn diagram
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Tentative course schedule