
Big Data and IoT

Baris Aksanli
02/10/2016

Why is there big data?

• Number of devices increasing exponentially
  – They continuously generate data
  – For example, on average, 72 hours of video are uploaded to YouTube every minute

How much data is big?

• 2010, Apache Hadoop: "datasets which could not be captured, managed, and processed by general computers within an acceptable scope"
• 3V model: Volume, Velocity, Variety [META]
  – +1V: Value [IDC]

Value of Big Data

• New business and efficiency opportunities
• An estimated $300B in potential value in the US medical industry
• Increased efficiency of government operations
• Search engines personalized for users
• Personalized ads, products, etc.

IoT and Big Data

• IoT applications continuously generate data
  – Even the smallest device generates data
• The problem: data processing capacity is lower than data generation speed

Big Data Classification

[Figure: classification of big data]

Path of the Data

Data collection & acquisition → Data transfer → Data processing & analysis

Data Generation

• Enterprise data: big companies, e.g. Facebook, Amazon
  – Business data is expected to double every 1.2 years
  – Walmart processes 1M customer transactions per hour
  – Akamai analyzes 75M events per day
• IoT data: pervasive applications, clinical medical care, R&D
  – Large-scale, heterogeneous, and strongly correlated data
  – 30 billion RFID tags and 4.6 billion camera phones are in use around the world today
  – If Walmart operated RFID at the item level, it would be expected to generate 7 terabytes (TB) of data every day
• Bio-medical data: human gene sequencing
  – One sequencing of a human gene may generate 100 sequences of 600GB of raw data
• Other areas: physics, bio-informatics, etc.
  – Astronomy: in the Sloan Digital Sky Survey (SDSS), the data volume generated per night surpasses 20TB

Data Acquisition

• Log files: almost all digital devices provide logging capability (parsing sketch after this slide)
  – Web activity recording, financial applications, network monitoring
• Sensing: converting physical quantities into readable digital signals
  – Sound waves, voice, vibration, automobile, chemical, current, weather, pressure, temperature, etc.
  – Localization
• Mobile platforms: similar to sensing
  – More personalized, specific to a user

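As an illustration of log-file acquisition, here is a minimal Python sketch that pulls structured records out of web-server access logs; the log format (Apache common log format) and the sample line are assumptions for illustration, not taken from the slides.

```python
import re

# Apache common log format: host, timestamp, request line, status, response size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log(lines):
    """Yield one dict per well-formed log line; silently skip malformed lines."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            yield match.groupdict()

sample = ['192.168.1.10 - - [10/Feb/2016:13:55:36 -0800] "GET /index.html HTTP/1.1" 200 2326']
print(list(parse_log(sample)))
```
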
Data Transportation

• Data is transferred to a storage infrastructure for processing and analysis
• Inter-DCN (data center network) transmissions:
  – Source to data center
  – Using WAN links: 40-100 Gbps
• Intra-DCN transmissions:
  – Data center interconnect
  – Top-of-rack vs. aggregator switches
  – 1-10-100 Gbps

[Figure: data transportation example]

Data Preprocessing

• Eliminate or reduce redundancy, noise, and meaningless data
  – Increases storage efficiency and data analysis speed
• Integration: combining data from different sources
  – Data warehouse: ETL (Extract, Transform and Load)
  – Data federation
  – Mostly used by search engines
• Cleaning: how can data be cleaned?
  – Define error types -> identify errors -> correct errors -> document errors -> modify infrastructure to prevent errors
• Redundancy elimination (see the sketch after this slide)
  – Redundancy detection, data filtering, data compression
  – Areas: images, videos
• One solution: compression!

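A minimal sketch of the cleaning and redundancy-elimination steps above, applied to a stream of records before storage; the record fields and the validity range are illustrative assumptions.

```python
import hashlib

def preprocess(records):
    """Drop malformed records and exact duplicates before storage or analysis."""
    seen = set()
    for rec in records:
        # Cleaning: discard records with a missing or out-of-range reading
        if rec.get("value") is None or not (-50.0 <= rec["value"] <= 150.0):
            continue
        # Redundancy elimination: fingerprint the record, skip exact duplicates
        digest = hashlib.sha1(repr(sorted(rec.items())).encode()).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        yield rec

readings = [
    {"sensor": "t1", "value": 21.5},
    {"sensor": "t1", "value": 21.5},   # exact duplicate, removed
    {"sensor": "t2", "value": None},   # missing value, removed
]
print(list(preprocess(readings)))      # [{'sensor': 't1', 'value': 21.5}]
```

Compression (the last bullet) would then be applied to whatever survives this pass.
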
Preprocessing Capabilities

Arduino (16 MHz, 32 KB flash) --1 Gbps network--> Raspberry Pi 2 (600 MHz, 1 GB RAM) --10 Gbps network--> Commodity server (3 GHz, 32 GB RAM)

• Assume there is a job with 1TB total size
• 100K Arduinos, 1K Raspberry Pi 2s, 100 servers
• Time spent in computation vs. networking (worked sketch after this slide)
  – Arduino level
  – Raspberry Pi 2 level
  – Server level

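A back-of-the-envelope sketch of the networking side of this comparison, assuming the 1TB job is split evenly across the devices of a tier and each device ships its share over the link speed shown above (the server-side link speed is an assumption). The computation side would need a per-device processing throughput, which the slide leaves open.

```python
TOTAL_BYTES = 1 * 10**12   # 1 TB job

tiers = {
    # tier: (device count, assumed uplink speed in bits/s)
    "Arduino":        (100_000, 1e9),    # 1 Gbps link, per the diagram
    "Raspberry Pi 2": (1_000,   10e9),   # 10 Gbps link, per the diagram
    "Server":         (100,     10e9),   # assumed 10 Gbps
}

for name, (count, bps) in tiers.items():
    share = TOTAL_BYTES / count          # bytes handled per device
    transfer_s = share * 8 / bps         # time to move that share over the link
    print(f"{name:>14}: {share / 1e6:10.1f} MB/device, "
          f"network time ~ {transfer_s:8.3f} s/device")
```

Note that the 10 MB per-device share already dwarfs the Arduino's 32 KB of flash, which hints at why preprocessing capability differs so much across the tiers.
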
Big Data Storage

• Storage and management of large-scale data sets while achieving reliability and availability of data access
  – Traditionally on servers with structured RDBMSs
• Existing storage systems for massive data
  – Direct-attached storage (DAS)
    • Several hard disks directly connected to servers
    • Only suitable for interconnecting servers at a small scale
  – Network-attached storage (NAS)
    • NAS uses the network to provide a unified interface for data access and sharing
    • The I/O burden on the server is reduced extensively, since the server accesses the storage device indirectly through the network
  – Storage area network (SAN)
    • Designed for data storage with a scalable, bandwidth-intensive network
    • Data storage management is relatively independent within the storage local area network

Distributed Storage System

• CAP: Consistency, Availability, Partition tolerance
  – At most two of the three requirements can be satisfied simultaneously
• CA vs. CP vs. AP systems
  – CA: for single servers
  – CP: useful for moderate load [BigTable and HBase]
  – AP: useful when there is no high demand on accuracy [Dynamo and Cassandra]

File systems for Big Data

• Google File System (GFS)
  – Files broken into chunks (typically 64MB)
  – A master manages the metadata
  – Data transfers happen directly between clients and chunkservers (toy sketch after this slide)
• Other examples:
  – HDFS and Kosmos
  – Extensions to GFS
  – Cosmos from MS
  – Haystack from FB

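A toy sketch of the chunking idea only (not the actual GFS protocol): the master keeps metadata mapping each 64MB chunk to a chunkserver, while the chunk bytes themselves would flow directly between clients and chunkservers. The chunkserver names are placeholders.

```python
import itertools
import uuid

CHUNK_SIZE = 64 * 1024 * 1024               # 64 MB, as on the slide
CHUNKSERVERS = ["cs-0", "cs-1", "cs-2"]     # hypothetical chunkserver names

def register_file(path):
    """Master-side bookkeeping: map each chunk of a file to a chunkserver."""
    metadata = []
    servers = itertools.cycle(CHUNKSERVERS)  # naive round-robin placement
    with open(path, "rb") as f:
        index = 0
        while True:
            data = f.read(CHUNK_SIZE)
            if not data:
                break
            metadata.append({
                "file": path,
                "chunk_index": index,
                "chunk_id": uuid.uuid4().hex,
                "server": next(servers),
            })
            index += 1
    return metadata

# register_file("sensor_dump.bin") -> list of {file, chunk_index, chunk_id, server}
```
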
Database Technology

• Key-value databases: data is stored and looked up by a unique key -> shorter query response time
  – Provide expandability by distributing keys across nodes (see the sketch after this slide)
  – Dynamo [Amazon] and Voldemort [LinkedIn]
• Column-oriented databases: store and process data by column rather than by row
  – Both columns and rows are segmented across multiple nodes to realize expandability
  – BigTable [Google] and Cassandra [Facebook]
• Document databases: can support more complex data forms, and key-value pairs can still be saved
  – Structured data storage with objects
  – MongoDB [binary JSON objects], SimpleDB [Amazon] and CouchDB [Apache]

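A minimal sketch of the key-value idea: hash each key to pick the node responsible for it, so data and load spread across nodes. Real systems such as Dynamo use consistent hashing and replication; this toy version uses plain modulo placement and in-memory dicts as "nodes".

```python
import hashlib

class TinyKeyValueStore:
    """Toy partitioned key-value store: each key lives on exactly one node."""

    def __init__(self, num_nodes=3):
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]   # modulo placement, not consistent hashing

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        return self._node_for(key).get(key)

store = TinyKeyValueStore()
store.put("user:42", {"name": "Ada"})
print(store.get("user:42"))   # {'name': 'Ada'}
```
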
Programming Models

• Traditional parallel models do not perform well
  – Scalability issues: big data is generally stored across hundreds or even thousands of commodity servers (see the sketch after this slide)

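The slide does not name a replacement model, but the map/reduce style is the usual answer to this scalability problem: ship the computation to the servers that already hold the data, then combine the partial results. A minimal single-process sketch of the idea (word counting), not any particular framework's API:

```python
from collections import defaultdict

def map_phase(document):
    # map: emit (word, 1) for every word in one document
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(pairs):
    # reduce: sum the counts emitted for each word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["IoT devices generate data", "big data needs big storage"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
print(reduce_phase(pairs))   # e.g. 'big' -> 2, 'data' -> 2, 'iot' -> 1, ...
```

In a real deployment the map calls run in parallel on the machines storing each document, which is what gives the model its scalability.
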
Data Analysis

• The goal is to extract useful value, with suggestions or decisions
• Traditional data analysis
  – Cluster analysis: grouping objects
  – Factor analysis: describe the relations among many elements with a few factors
  – Correlation analysis: dependence among variables
  – Regression analysis: dependence relationships among variables hidden by randomness (see the sketch after this slide)
  – A/B testing: improve target variables by comparing a tested group against a control group
  – Statistical analysis: summarize and describe data sets

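A small worked example of correlation and regression analysis on made-up sensor data (requires Python 3.10+ for the statistics helpers); the numbers are illustrative only.

```python
from statistics import correlation, linear_regression   # Python 3.10+

# Hypothetical readings: ambient temperature vs. device power draw
temperature = [18.0, 20.5, 23.1, 25.0, 27.4, 30.2]
power_watts = [4.1, 4.4, 4.9, 5.2, 5.8, 6.3]

r = correlation(temperature, power_watts)            # strength of linear dependence
fit = linear_regression(temperature, power_watts)    # power ~ slope*temp + intercept

print(f"correlation r = {r:.3f}")
print(f"power ~ {fit.slope:.3f} * temperature + {fit.intercept:.3f}")
```
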
Big Data Analytics

• Bloom filter: uses hash functions to perform lossy, compressed storage of data (see the sketch after this slide)
  – High space efficiency and high query speed
• Hashing: transforms data into shorter fixed-length numerical or index values
  – Rapid reading, but it is hard to find a good hash function
• Index: fast data retrieval and modification
  – Additional cost for storing index files, which must be maintained dynamically when data is updated
• Trie (trie tree): a variant of the hash tree
  – Fast string operations
  – Leverages common prefixes of character strings to reduce string comparisons

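A minimal Bloom filter sketch matching the first bullet: k hash functions set k bits per inserted item, so membership tests may return false positives but never false negatives, at a fraction of the space of storing the items themselves. The sizes and hash construction are arbitrary choices for illustration.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter over a fixed-size bit array."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("sensor-17")
print("sensor-17" in bf)   # True
print("sensor-99" in bf)   # False with high probability (false positives possible)
```
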
Tools for Big Data Analysis

• The top five most widely used software packages, according to a 2012 KDnuggets survey of 798 professionals asking "What analytics, data mining, big data software have you used in the past 12 months for a real project?":
• R [30.7%]
• Excel [29.8%]
• Rapid-I RapidMiner [26.7%]
• KNIME [21.8%]
• Weka/Pentaho [14.8%]

Summary

• Big data is different from traditional massive data
  – It cannot be processed by general computers within an acceptable time
  – Big data is an inevitable result of the IoT
• The basics of big data and analytics
  – Data generation/acquisition
  – Data storage
  – Data analytics
• Many systems have been built, each addressing a different aspect of big data
