What Is Data Science GDI

Data science involves extracting knowledge from large amounts of data through data mining, predictive modeling, and data visualization. It is an interdisciplinary field that uses techniques from statistics, computer science, mathematics, and domain expertise to solve problems. Data scientists come from diverse backgrounds and use their skills to collect, clean, analyze, and interpret data to discover patterns and make business decisions. Getting started in data science involves learning programming, statistics, math, and doing hands-on projects with publicly available data sets.
What is Data Science?

Girl Develop It! Meetup


Renée M. P. Teate, March 2015
Let’s start with: “What is Data?”

http://upload.wikimedia.org/wikipedia/commons/f/f0/DARPA_Big_Data.jpg
https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcS9dKu3_Tzi-sWW-yAqee5y0EhuvoIZNSya_rAKnuBBd0JYxPX7pw
http://www.freefoto.com/images/1351/06/1351_06_2---Books--Shakespeare-and-Company-Bookstore--The-Latin-Quarter--Paris_web.jpg
http://fc01.deviantart.net/fs71/i/2012/326/3/4/cute_dog_by_thomasmeadows345-d5lsah9.jpg
http://upload.wikimedia.org/wikipedia/commons/9/96/Bill_Nye,_Barack_Obama_and_Neil_deGrasse_Tyson_selfie_2014.jpg
https://c2.staticflickr.com/4/3273/3017878633_65beb1c7d6.jpg
https://c1.staticflickr.com/1/2/1349370_0703fce74c.jpg
http://upload.wikimedia.org/wikipedia/commons/e/e4/Green_Bank_100m_diameter_Radio_Telescope.jpg
• Around 100 hours of video are uploaded to YouTube every minute
  • It would take about 15 years to watch every video uploaded in one day
• AT&T is thought to hold the world’s largest volume of data in one unique database: its phone records database is 312 terabytes in size and contains almost 2 trillion rows.
• Every minute we send 204,000,000 emails, generate 1,800,000 Facebook likes, send 278,000 Tweets, and upload 200,000 photos to Facebook
• 570 new websites spring into existence every minute of every day.

http://smartdatacollective.com/bernardmarr/277731/big-data-25-facts-everyone-needs-know
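
The YouTube and AT&T figures above can be sanity-checked with a little arithmetic. This is only a rough sketch: it assumes nonstop viewing (24 hours a day, 365 days a year), decimal terabytes, and a round 2 trillion rows, none of which the slide states.

```python
# Figures quoted on the slide.
HOURS_UPLOADED_PER_MINUTE = 100
DATABASE_TERABYTES = 312
DATABASE_ROWS = 2 * 10**12  # assumption: "almost 2 trillion" rounded up

# Assumptions made for this check (not from the slide).
MINUTES_PER_DAY = 60 * 24
HOURS_PER_YEAR = 24 * 365    # watching around the clock
BYTES_PER_TERABYTE = 10**12  # decimal terabytes

# Hours of video uploaded during one day.
hours_uploaded_per_day = HOURS_UPLOADED_PER_MINUTE * MINUTES_PER_DAY

# Years of nonstop viewing needed to watch one day's uploads.
years_to_watch = hours_uploaded_per_day / HOURS_PER_YEAR

# Average storage per phone record.
bytes_per_row = DATABASE_TERABYTES * BYTES_PER_TERABYTE / DATABASE_ROWS

print(f"{hours_uploaded_per_day:,} hours uploaded per day")
print(f"{years_to_watch:.1f} years of nonstop viewing")
print(f"{bytes_per_row:.0f} bytes per row on average")
```

Under these assumptions one day of uploads comes to a little over 16 years of viewing, the same order as the "about 15 years" on the slide, and each phone record averages roughly 156 bytes.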
http://pixabay.com/static/uploads/photo/2014/03/13/01/12/datacenter-286386_640.jpg
https://c2.staticflickr.com/2/1296/533233247_b6baa30fdb_z.jpg?zz=1
https://c1.staticflickr.com/3/2300/2596366618_2d6cb01735.jpg
http://upload.wikimedia.org/wikipedia/commons/9/90/Kencf0618FacebookNetwork.jpg
http://upload.wikimedia.org/wikipedia/commons/b/bf/USDA_Hardiness_zone_map.jpg
http://upload.wikimedia.org/wikipedia/commons/1/1c/CMS_Higgs-event.jpg
Databases You Use
• Pretty much every website you interact with
  • Social Media
  • Banking
  • File Sharing
  • Search Engines
  • Online Shopping
  • Course Registration/Canvas
  • Travel
  • Etc.
• You broadcast/generate data everywhere you go
  • Cell phones
  • Purchases
  • Driving (GPS)
  • Streaming music
  • Email
  • Posting status updates
  • Attending events
  • Etc.

How is data collected about you used to help you?

https://www.google.com/maps/@38.8905569,-77.1721577,13z/data=!5m1!1e1
http://upload.wikimedia.org/wikipedia/commons/6/69/Netflix_logo.svg
https://c2.staticflickr.com/4/3324/3507973704_563846fe14_z.jpg?zz=1
Who builds these systems?
Data Scientist

Computer Scientist
• Data collection systems
• Machine Learning Algorithms
• Interface Design
• Design/Manage/Query Databases
• Data Aggregation
• Data Mining

Mathematician
• Statistical Models
• Evaluation Metrics
• Predictive Analytics
• Data Visualizations

Business Person
• Domain Expertise
• Knowing what questions to ask
• Interpreting results for business decisions
• Presenting outcomes

Examples – not a complete definition, and not all simultaneously necessary skills
Data Science Venn Diagram by Drew Conway
http://static.squarespace.com/static/5150aec6e4b0e340ec52710a/t/51525c33e4b0b3e0d10f77ab/1364352052403/Data_Science_VD.png?format=750w
From “Doing Data Science” by Cathy O’Neil & Rachel Schutt
http://semanticommunity.info/@api/deki/files/27057/Figure1-4.png?size=bestfit&width=484&height=541&revision=1
http://www.becomingadatascientist.com/wp-content/uploads/2014/06/DS_profile.png

No need to be a “unicorn”, but you do need to know something about all of these areas, and become expert in some
Some other names for “Data Scientist”
• Statistician
• Data Mining Specialist
• Biostatistician
• Social Science Researcher
• Big Data Analyst
• Spatial/GIS Analyst
• Natural Language Programmer
• Neuroscientist
• Computational Physicist
• Data Visualization Designer
• Pythonista
• Financial Analyst
• Recommendation System Engineer
• Information Architect
• Artificial Intelligence Researcher
Data Science jobs pay an average of $118,000 per year

It is estimated that by 2018, the US could have a shortage of 140,000+ people with advanced analytical skills, and need 1.5M managers/analysts who can make decisions based on data analysis
“Extraction of Knowledge”
• Also known as “knowledge discovery”
• Goes beyond queries
• Data Mining
  • Business Understanding
  • Data Understanding
  • Data Preparation
  • Modeling
    • Clustering
    • Classification
    • Regression
  • Evaluation
  • From “Data Science for Business” by Provost & Fawcett

Images from ODU ECE 607 Lecture Slides by Prof. Jiang Li
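
The data mining steps listed above can be walked through on a toy problem. The sketch below is illustrative only: the data is generated on the spot (hypothetical hours studied vs. exam score), the model is a plain least-squares line standing in for the "Regression" step, and the names `fit_line`, `train`, and `test` are invented for this example rather than taken from the slides or the book.

```python
import random
import statistics

random.seed(0)  # make the toy run repeatable

# Data understanding: made-up (hours_studied, exam_score) pairs,
# generated here for illustration only; this is not real data.
raw = [(h, 50 + 5 * h + random.gauss(0, 3)) for h in range(1, 21)]
raw.append((None, 70.0))  # a "messy" record with a missing value

# Data preparation: drop incomplete rows, then hold out a test set.
clean = [(x, y) for x, y in raw if x is not None]
random.shuffle(clean)
train, test = clean[:15], clean[15:]


# Modeling (regression): ordinary least squares for y = a + b * x.
def fit_line(points):
    mean_x = statistics.mean(x for x, _ in points)
    mean_y = statistics.mean(y for _, y in points)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(train)

# Evaluation: mean absolute error on data the model has not seen.
mae = statistics.mean(abs(y - (intercept + slope * x)) for x, y in test)
print(f"score = {intercept:.1f} + {slope:.1f} * hours, test MAE = {mae:.1f}")
```

Business understanding, the first step, is the one part code cannot supply: deciding that predicting a score from study hours is a question worth answering.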
Video clip: Interview with Neha Kothari, LinkedIn Data Scientist
http://youtu.be/8dxKe5cGHdA?t=17s
Examples
• Galaxy Classification using Convolutional Neural Networks
  http://benanne.github.io/2014/04/05/galaxy-zoo.html
• Choosing Facebook Audience for Content Promotion using Random Forests
  http://citizennet.com/blog/2012/11/10/random-forests-ensembles-and-performance-metrics/
• Predicting Wine Quality with Principal Component Analysis
  http://fastml.com/predicting-wine-quality/
• Readmission Risk Score to decide which patients to give additional follow-up help at Mt. Sinai hospital
  http://www.technologyreview.com/news/518916/a-hospital-takes-its-own-big-data-medicine/

http://xkcd.com/1425/
How to get started
Topics to learn about
• Programming
  • Any language is good to start with. Gain core understanding.
  • Python or R data analysis experience a plus
  • Database design, SQL
• Math
  • Calculus
  • Linear Algebra
  • Statistics
  • Advanced: Optimization / Linear Programming
• Research and Analysis
  • Science involving data collection and interpretation
  • Working with “messy” real life data
  • Business Analytics
  • Data Mining
• Others
  • Business / Communication
  • Graphic Design
Read, read, read
• Doing Data Science by Cathy O’Neil* & Rachel Schutt
• Data Science for Business by Foster Provost & Tom Fawcett
• Data Smart by John Foreman* (uses Excel)
• I review other books as I read them:
  http://www.becomingadatascientist.com/learning/
• Blogs & News Feeds (FlowingData.com is a good one to start with)
• Twitter – look for curated lists of people to follow
  https://twitter.com/BecomingDataSci/lists/women-in-data-science/members

*on Twitter and willing to chat!
Free Online Courses
• Python Fundamentals – Codecademy http://www.codecademy.com/tracks/python
• Machine Learning – Coursera / Stanford https://www.coursera.org/course/ml
• Data Analyst Nanodegree – Udacity https://www.udacity.com/course/nd002 (includes Hadoop mini-course)
• Applied Data Mining and Statistical Learning – Penn State https://onlinecourses.science.psu.edu/stat857/
• Pretty comprehensive list here: http://www.kdnuggets.com/education/online.html
• TED talks on Data http://www.ted.com/search?q=data
  • Susan Etlinger* http://www.ted.com/talks/susan_etlinger_what_do_we_do_with_all_this_big_data
    • “Need to spend more time on critical thinking skills…[because we have the] potential to make bad decisions far more quickly, efficiently, and with far greater impact than we did in the past.”
    • “…we need to be clear about ..the methodologies that we use, …because if I don't know what …questions you asked, I don't know what questions you didn't ask.”
Explore
• Volunteer to Analyze Data (DataKind)
• Play with public data sets
  • http://101.datascience.community/2014/10/17/data-sources-for-cool-data-science-projects-part-1-guest-post/
  • https://www.opensciencedatacloud.org/publicdata/
  • http://catalog.data.gov/dataset
  • https://archive.ics.uci.edu/ml/datasets.html?format=&task=clu&att=&area=&numAtt=&numIns=&type=&sort=nameUp&view=table
• Data Science Competitions (Kaggle also has “knowledge competitions” for learning)
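
A first look at a public data set often needs nothing more than Python's standard library. The sketch below is a hedged illustration: the rows, city names, and the `rainfall_mm` column are invented stand-ins for a downloaded CSV file, not data from any of the sources listed above.

```python
import csv
import io
import statistics
from collections import Counter

# Stand-in for a downloaded file: these rows are invented for illustration.
# With a real data set, open the file instead, e.g. open(path, newline="").
sample_csv = io.StringIO(
    "city,year,rainfall_mm\n"
    "Norfolk,2013,1190\n"
    "Norfolk,2014,\n"          # missing value, common in real data
    "Richmond,2013,1105\n"
    "Richmond,2014,1120\n"
)

rows = list(csv.DictReader(sample_csv))

# First questions to ask of any data set: how many rows, what is missing?
missing = sum(1 for row in rows if row["rainfall_mm"] == "")
print(f"{len(rows)} rows, {missing} missing rainfall values")

# Keep only complete rows and convert text to numbers.
complete = [(row["city"], float(row["rainfall_mm"]))
            for row in rows if row["rainfall_mm"] != ""]

# Simple summaries: complete rows per city and the overall average.
rows_per_city = Counter(city for city, _ in complete)
mean_rainfall = statistics.mean(value for _, value in complete)
print(dict(rows_per_city), f"mean = {mean_rainfall:.1f} mm")
```

Counting rows and missing values before computing anything else is a small habit that pays off with "messy" real-life data.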
Questions?

Renee Teate
[email protected], @becomingdatasci
http://www.becomingadatascientist.com
