CSE6242 400 AnalyticsConcepts

CSE6242/CX4242 at Georgia Tech focuses on Data and Visual Analytics, covering eight key data analytics concepts including classification, regression, similarity matching, clustering, co-occurrence grouping, profiling, link prediction, and data reduction. The course emphasizes practical applications and encourages students to think about real-world problems they want to solve using large datasets and appropriate techniques. It is free for Georgia Tech students and is partly based on materials from various experts in the field.


http://poloclub.gatech.edu/cse6242

CSE6242/CX4242: Data & Visual Analytics

Data Analytics Concepts

Duen Horng (Polo) Chau
Professor, College of Computing
Associate Director, MS Analytics
Georgia Tech

Partly based on materials by Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos
8 concept classes (non-mutually exclusive)
Free for GT students
1. Classification
(or Probability Estimation)

Predict which of a (small) set of classes an entity belongs to.

• email spam (y, n)
• sentiment analysis (+, -, neutral)
• news (politics, sports, …)
• medical diagnosis (cancer or not)
• shirt size (s, m, l)
• cat detection
• face detection (baby, middle-aged, etc.)
• buy / not buy (commerce)
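A minimal sketch of the shirt-size example, assuming a k-nearest-neighbors classifier (one of many possible techniques; the slide does not name one). The training data, feature choice (height, weight), and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training data: (height_cm, weight_kg) -> shirt size.
TRAIN = [
    ((160, 52), "s"), ((163, 55), "s"), ((158, 50), "s"),
    ((172, 68), "m"), ((175, 72), "m"), ((170, 66), "m"),
    ((185, 88), "l"), ((188, 92), "l"), ((183, 85), "l"),
]

def nearest_labels(point, k=3):
    """Labels of the k training entities closest to `point`."""
    ranked = sorted(TRAIN, key=lambda item: math.dist(item[0], point))
    return [label for _, label in ranked[:k]]

def predict_size(point, k=3):
    """Classification: the majority class among the k nearest neighbors."""
    return Counter(nearest_labels(point, k)).most_common(1)[0][0]

def class_probabilities(point, k=3):
    """Probability estimation: the fraction of neighbors in each class."""
    votes = Counter(nearest_labels(point, k))
    return {label: count / k for label, count in votes.items()}

print(predict_size((174, 70)))
print(class_probabilities((178, 78)))
```

The second function shows why the slide pairs classification with probability estimation: the same neighbors yield either a single label or a score per class.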
2. Regression (“value estimation”)
Predict the numerical value of some variable for an entity.
• point value of wine (50-100)
• credit score
• stock prices
• relationship between price and sales
• weather
• sports and game scores
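A minimal sketch of the price-versus-sales example, assuming a straight-line relationship fitted by ordinary least squares. The prices, sales figures, and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Estimate the numerical value for a new entity."""
    return slope * x + intercept

# Hypothetical data: unit price in dollars vs. units sold.
PRICES = [1.0, 2.0, 3.0, 4.0, 5.0]
SALES = [90.0, 70.0, 50.0, 30.0, 10.0]

slope, intercept = fit_line(PRICES, SALES)
print(slope, intercept)
print(predict(2.5, slope, intercept))
```

Unlike classification, the output is a number on a continuous scale rather than one of a few labels.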
3. Similarity Matching
Find similar entities (from a large dataset) based on what we know about them.

• find similar gene sequences (that may be repeating, or do similar things)
• online dating
• patent search
• carpool matching (find people to carpool)
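A minimal sketch of the gene-sequence example, assuming similarity is measured as the Jaccard overlap between sets of length-3 substrings (k-mers). This is an illustrative measure, not one the slide prescribes; the sequences and names are hypothetical.

```python
def kmers(seq, k=3):
    """Set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(query, database, k=3):
    """Name of the database entry whose k-mer set best matches the query."""
    q = kmers(query, k)
    return max(database, key=lambda name: jaccard(q, kmers(database[name], k)))

# Hypothetical sequences.
DATABASE = {
    "gene_a": "ATGCGTACGT",
    "gene_b": "TTTTGGGGCC",
    "gene_c": "ATGCGTTCGT",
}

print(most_similar("ATGCGTACGA", DATABASE))
```

On a large dataset, the linear scan in `most_similar` would typically be replaced by an index, but the idea of scoring entities by what we know about them stays the same.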
4. Clustering (unsupervised learning)
Group entities together by their similarity.
(For most algorithms, user provides # of clusters)
• groupings of similar bugs in code
• topical analysis (tweets?)
• land cover: tree/road/…
• for advertising: grouping users for marketing purposes
• cluster people by accents (y’all, you all)
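A minimal sketch of clustering, assuming k-means (an algorithm where the user provides the number of clusters, as noted above). The points are hypothetical, and the centroids are initialized from the first k points only to keep the example deterministic; real implementations usually initialize more carefully.

```python
import math

def kmeans(points, k, iterations=10):
    """Basic k-means: assign points to the nearest centroid, then re-average."""
    centroids = list(points[:k])  # simplification: first k points as seeds
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical 2-D entities forming two visible groups.
POINTS = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]

centroids, clusters = kmeans(POINTS, k=2)
print(centroids)
print(clusters)
```

No labels are supplied anywhere, which is what makes this unsupervised: the groups emerge from similarity alone.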
5. Co-occurrence grouping
(Many names: frequent itemset mining, association rule discovery, market-basket analysis)

Find associations between entities based on transactions that involve them
(e.g., bread and milk often bought together)

http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/
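A minimal sketch of the bread-and-milk example, assuming we simply count how often each pair of items appears in the same transaction and keep pairs above a support threshold. The baskets, the threshold, and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (market baskets).
BASKETS = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
    {"bread", "butter"},
]

def frequent_pairs(baskets, min_support=0.5):
    """Item pairs that co-occur in at least `min_support` of all baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair has one canonical (alphabetical) ordering.
        counts.update(combinations(sorted(basket), 2))
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

print(frequent_pairs(BASKETS))
```

Counting every pair is only practical for small baskets; frequent itemset algorithms exist largely to prune this search on real transaction logs.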
6. Profiling / Pattern Mining / Anomaly Detection (unsupervised)
Characterize typical behaviors of an entity (person, computer router, etc.) so you can find trends and outliers.

• Google sign-in alert
• Computer instruction prediction
• Removing noisy data (data cleaning)
• Detect anomalies in network traffic
• Moneyball
• Smart security camera
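A minimal sketch of the network-traffic example, assuming the "profile" of typical behavior is just a mean and standard deviation, and that a reading is an outlier when it falls more than three standard deviations from the mean. The traffic numbers, the threshold, and the function names are hypothetical.

```python
import statistics

def build_profile(history):
    """Summarize typical behavior as (mean, sample standard deviation)."""
    return statistics.mean(history), statistics.stdev(history)

def is_anomaly(value, profile, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean, stdev = profile
    return abs(value - mean) > threshold * stdev

# Hypothetical requests-per-minute readings for one router.
HISTORY = [100, 102, 98, 101, 99, 100, 103, 97]

profile = build_profile(HISTORY)
print(profile)
print(is_anomaly(104, profile))
print(is_anomaly(150, profile))
```

The same two-step shape (learn what is typical, then compare new observations against it) underlies alerts such as an unusual sign-in.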
7. Link Prediction / Recommendation
Predict if two entities should be connected, and how strong that link should be.
LinkedIn/Facebook: people you may know
Amazon/Netflix/Pandora: because you like Terminator… suggest other movies you may also like
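A minimal sketch of "people you may know", assuming link strength is scored by the number of friends two people have in common. This is a simple common-neighbors heuristic chosen for illustration, not the method any named service uses; the graph and function name are hypothetical.

```python
# Hypothetical undirected friendship graph.
FRIENDS = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"bob", "cat", "eve"},
    "eve": {"dan"},
}

def suggest_links(graph, person):
    """Rank non-friends by number of shared friends (the link strength)."""
    scores = {}
    for other in graph:
        if other == person or other in graph[person]:
            continue  # skip self and existing links
        common = len(graph[person] & graph[other])
        if common:
            scores[other] = common
    # Strongest predicted link first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

print(suggest_links(FRIENDS, "ann"))
print(suggest_links(FRIENDS, "eve"))
```

The returned score answers both halves of the definition: whether a link is predicted at all, and how strong it should be.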
8. Data reduction (“dimensionality reduction”)
Shrink a large dataset into a smaller one, with as little loss of information as possible
1. if you want to visualize the data (in 2D/3D)
Most popular: UMAP, t-SNE
2. faster computation/less storage
3. reduce noise
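A minimal sketch of dimensionality reduction, assuming principal component analysis (PCA) rather than UMAP or t-SNE, since PCA is short enough to write with the standard library. It finds the single direction of greatest variance by power iteration and projects 2-D points onto it. The data and function names are hypothetical, and the code assumes the data is not constant.

```python
import math

def top_component(rows, iterations=100):
    """Column means and the direction of greatest variance (first PC)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):  # power iteration toward the top eigenvector
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(rows, means, v):
    """Reduce each row to one number: its coordinate along direction v."""
    return [sum((r[j] - means[j]) * v[j] for j in range(len(v))) for r in rows]

# Hypothetical 2-D data lying along the line y = 2x.
ROWS = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

means, direction = top_component(ROWS)
print(direction)
print(project(ROWS, means, direction))
```

Here two columns collapse into one with no loss, because the second column is fully determined by the first; real data loses some information, and the goal is to keep that loss small.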
Start Thinking About Project!

• What problems do you want to solve?

• Using what large, real datasets?
• What techniques do you need?
