Intro Bigdata PDF

This document provides an overview of basic concepts in big data. It defines big data as high-volume, high-velocity, and high-variety information that requires new processing methods to enable enhanced decision making and insights. Examples are given of big data in government, private sector, and science. The lifecycle of data is described as acquisition, aggregation, analysis, and application. Computational and analytical techniques for big data are discussed, including visualization, classification, clustering, and predictive modeling. Related courses on big data topics are also listed.

Basic Concepts in Big Data

ChengXiang (“Cheng”) Zhai
Department of Computer Science
University of Illinois at Urbana-Champaign
http://www.cs.uiuc.edu/homes/czhai
[email protected]

What is “big data”?

• “Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization” (Gartner 2012)
• Complicated (intelligent) analysis of data may make a small data set “appear” to be “big”
• Bottom line: Any data that exceeds our current capability of processing can be regarded as “big”

Why is “big data” a “big deal”?

• Government
  – The Obama administration announced a “big data” initiative
  – Many different big data programs launched
• Private Sector
  – Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data
  – Facebook handles 40 billion photos from its user base
  – Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts worldwide
• Science
  – Large Synoptic Survey Telescope will generate 140 terabytes of data every 5 days
  – Biomedical computation, such as decoding the human genome and personalized medicine
  – Social science revolution
  – …

Lifecycle of Data: 4 “A”s

Acquisition → Aggregation → Analysis → Application

Computational View of Big Data

[Diagram, top to bottom]
Data Visualization
Data Access     Data Analysis
Data Understanding     Data Integration
Formatting, Cleaning
Storage     Data

Big Data & Related Topics/Courses

[Diagram: CS199 and “Many Applications!” shown with the labels Data Visualization; Data Access, Data Analysis; Data Understanding, Data Integration; Formatting, Cleaning; Storage, Data, surrounded by the related topics below]
• Human-Computer Interaction
• Databases
• Information Retrieval
• Machine Learning
• Data Mining
• Computer Vision
• Speech Recognition
• Natural Language Processing
• Data Warehousing
• Signal Processing
• Information Theory

Some Data Analysis Techniques

• Visualization
• Classification
• Predictive Modeling
• Time Series
• Clustering

Example of Analysis: Clustering & Latent Factor Analysis

                      Group M1              Group M2
                      Movie 1    Movie 2    …    Movie m
Group U1    User1     3.5        4               5
            User2     5          1
            …
Group U2    User n    2          1               4
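
One way to make the grouping of users and movies concrete is k-means clustering, sketched here in plain Python. This is a minimal illustration and not necessarily the method intended by the slide: the 4×4 rating matrix is toy data invented for the example (with no missing entries), the helper names `squared_distance`, `mean_vector`, and `kmeans` are hypothetical, and the initial centroids are simply the first k rows. Clustering the rows groups users; clustering the transposed matrix groups movies.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length rating vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_vector(rows):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def kmeans(points, k, max_iters=100):
    """Plain k-means; initial centroids are the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


# Toy ratings invented for illustration: rows are users, columns are movies.
ratings = [
    [5, 4, 1, 1],  # user A
    [4, 5, 2, 1],  # user B
    [1, 1, 5, 4],  # user C
    [2, 1, 4, 5],  # user D
]

user_labels, user_centroids = kmeans(ratings, k=2)

# Transpose so that each movie becomes a vector of the ratings it received.
movie_vectors = [list(col) for col in zip(*ratings)]
movie_labels, _ = kmeans(movie_vectors, k=2)

print("user groups: ", user_labels)
print("movie groups:", movie_labels)
```

On this toy matrix, users A and B end up in one group and users C and D in the other, and the first two movies are separated from the last two. Real rating matrices are mostly empty, so a practical version would need to handle missing ratings, for example by comparing users only on movies both have rated.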

Example of Analysis: Predictive Modeling

                      Group M1              Group M2
                      Movie 1    Movie 2    …    Movie m
Group U1    User1     3.5        4               5
            User2     5          1               ?
            …
Group U2    User n    2          1               4

• Does user2 like movie m? → (Binary) Classification
• What rating is user2 likely to give movie m? → Regression
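
The two questions above can be sketched with a similarity-weighted average of other users' ratings. This is only one simple possibility, not a method the slides prescribe. The ratings are the ones shown on the slide, assuming that the three values in the User1 and User n rows belong to Movie 1, Movie 2, and Movie m. The similarity formula, the function names, and the 3.5 "like" threshold are all assumptions made for the example.

```python
# Ratings from the slide; which column each value belongs to is an assumption.
ratings = {
    "user1": {"movie1": 3.5, "movie2": 4.0, "movie_m": 5.0},
    "user2": {"movie1": 5.0, "movie2": 1.0},  # movie_m is the unknown
    "user_n": {"movie1": 2.0, "movie2": 1.0, "movie_m": 4.0},
}

LIKE_THRESHOLD = 3.5  # hypothetical cut-off for "likes the movie"


def similarity(a, b):
    """1 / (1 + mean squared rating difference) over movies both users rated."""
    shared = [m for m in a if m in b]
    if not shared:
        return 0.0
    msd = sum((a[m] - b[m]) ** 2 for m in shared) / len(shared)
    return 1.0 / (1.0 + msd)


def predict_rating(ratings, user, movie):
    """Regression: similarity-weighted average of other users' ratings."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        w = similarity(ratings[user], other_ratings)
        weighted_sum += w * other_ratings[movie]
        weight_total += w
    if weight_total == 0.0:
        return None  # nobody comparable has rated this movie
    return weighted_sum / weight_total


def predict_like(ratings, user, movie, threshold=LIKE_THRESHOLD):
    """Binary classification: threshold the predicted rating."""
    predicted = predict_rating(ratings, user, movie)
    return None if predicted is None else predicted >= threshold


predicted = predict_rating(ratings, "user2", "movie_m")
likes = predict_like(ratings, "user2", "movie_m")
print(f"predicted rating: {predicted:.2f}, likes movie m: {likes}")
```

Under these assumptions the predicted rating lies between the two known ratings for movie m (4 and 5), so the thresholded answer is "likes". Thresholding a regression output is just one way to get a binary answer; a classifier trained directly on like/dislike labels is another.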

Some topics we’ll cover
