01 - Introduction To Big Data Analytics PDF
Course Outlines
1. Introduction to big data analytics
2. Hadoop Ecosystem
3. MapReduce (Distributed processing)
4. Hadoop DB
5. Spark (Big data processing)
6. Pig (HLL for Data Processing)
7. Hive (Data warehouse system)
8. HBase (Distributed database)
9. Big data use cases
Source: IBM Big Data & Analytics Course, Level (1) & (2)
[Figure: infographic of data generated online. Panels: number of emails sent every second; data consumed by households each day; video uploaded to YouTube every minute; data per day processed by Google; tweets sent per day; total minutes spent on Facebook each month; data sent and received by mobile internet users; products ordered on Amazon per second.]
Characteristics of Big Data
Volume
The main characteristic of big data is its huge
volume, collected from various sources. We are
used to measuring data in gigabytes or terabytes.
However, according to various studies, the volume of big data
created so far is measured in zettabytes, where one zettabyte
is equivalent to a trillion gigabytes.
Tabular representation of various data sizes
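As a quick check of the zettabyte figure, the following minimal Python sketch lists the common storage units and converts between them. It assumes decimal (SI) prefixes, where each unit is 1000 times the previous one; the names `UNITS`, `to_bytes` and `convert` are illustrative and not part of the course material.

```python
# Storage units in ascending order.
# Assumption: decimal (SI) prefixes, each unit is 1000x the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(value, unit):
    """Convert a value expressed in `unit` to bytes."""
    return value * 1000 ** UNITS.index(unit)


def convert(value, src, dst):
    """Convert a value from unit `src` to unit `dst`."""
    return to_bytes(value, src) / to_bytes(1, dst)


# Print each unit as a power of ten in bytes.
for unit in UNITS:
    print(f"1 {unit} = 10^{3 * UNITS.index(unit)} bytes")

# One zettabyte expressed in gigabytes (a trillion, i.e. 10^12).
print(convert(1, "ZB", "GB"))
```

With binary prefixes (powers of 1024) the numbers differ slightly, so the "trillion gigabytes" figure should be read as the decimal approximation.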
Variety
Big data is collected and created in various
formats and from various sources. It includes structured
data as well as unstructured data such as text,
multimedia, social media, and business reports.
Structured data, such as bank records, demographic data,
inventory databases, business data, and product data feeds,
has a defined structure and can be stored and analyzed
using traditional data management and analysis methods.
Unstructured data includes captured content such as images, tweets
or Facebook status updates, instant messenger
conversations, blogs, video uploads, voice recordings, and
sensor data. These types of data do not have any defined
pattern.
Note:
• Unstructured data is, most of the time, a reflection of human
thoughts, emotions and feelings, which can be
difficult to express in exact words.
• One of the main objectives of big data is to collect all this
unstructured data and analyze it using the appropriate
technology. Data crawling, also known as web crawling, is a
popular technique that uses data mining algorithms designed to
reach the maximum depth of a page and extract useful data
worth analyzing.
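To make the crawling idea concrete, here is a minimal sketch of the extraction step only, using Python's standard `html.parser` module. It parses an inline sample page rather than fetching anything from the network; the class name `LinkAndTextExtractor` and the sample HTML are hypothetical, and a real crawler would also need to download pages, follow the collected links, and respect site policies.

```python
from html.parser import HTMLParser


class LinkAndTextExtractor(HTMLParser):
    """Collects link targets and visible text from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []  # href values of <a> tags, candidates to crawl next
        self.text = []   # non-empty text fragments, candidates for analysis

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


# Hypothetical page content standing in for a downloaded document.
SAMPLE_PAGE = """
<html><body>
<h1>Product reviews</h1>
<p>Great phone, battery lasts two days.</p>
<a href="/reviews/page2">Next page</a>
<a href="/products/42">Product details</a>
</body></html>
"""

parser = LinkAndTextExtractor()
parser.feed(SAMPLE_PAGE)
print(parser.links)
print(parser.text)
```

The collected links are what a crawler would visit next to go deeper into a site, while the text fragments are the unstructured data passed on for analysis.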
Velocity
In today’s fast-paced world, speed is one of the key
drivers of business success, because time is
equivalent to money.
Expectations of quick results and quick deliverables
create considerable pressure.
Big data technology allows you to process real-time data,
sometimes without even storing it in a database.
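The idea of processing data as it arrives, without storing it first, can be sketched with Python generators. This is only an illustration under simple assumptions: `sensor_stream` is a hypothetical stand-in for a live feed, and the running mean is just one example of a statistic that can be updated one reading at a time.

```python
def sensor_stream():
    """Hypothetical stand-in for a live feed; yields one reading at a time."""
    for reading in [21.5, 22.0, 23.5, 21.0]:
        yield reading


def running_mean(stream):
    """Yield the mean of all readings seen so far, keeping only two numbers in memory."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


# Each reading is consumed once and then discarded; nothing is written to a database.
means = list(running_mean(sensor_stream()))
for mean in means:
    print(mean)
```

Only the count and the running total are kept, so memory use stays constant no matter how many readings pass through.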
• Structured
• Semi-structured
• Unstructured
Structured Data
Any data that can be stored, accessed, and processed in a
fixed format is termed 'structured' data.
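As an illustration of a fixed format, the sketch below stores a few bank-style records in a relational table and queries them with Python's built-in `sqlite3` module. The table name, column names and sample rows are invented for the example.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema is the "fixed format": every row has the same typed columns.
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, owner TEXT NOT NULL, balance REAL NOT NULL)"
)

# Hypothetical sample records.
conn.executemany(
    "INSERT INTO accounts (owner, balance) VALUES (?, ?)",
    [("Alice", 1200.0), ("Bob", 350.5), ("Carol", 980.0)],
)

# Because the structure is known in advance, a plain SQL query can analyze it.
rows = conn.execute(
    "SELECT owner, balance FROM accounts WHERE balance > ? ORDER BY balance DESC",
    (500,),
).fetchall()
print(rows)

conn.close()
```

A tweet or a voice recording has no such schema, which is why unstructured data needs different tools.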
Semi-Structured Data
Traditional vs. Big Data approaches to using data
People’s history on the internet, including what they buy and what they search for, gives a rough
view of their attitude toward a product.
Moreover, this output can be used to study
customer satisfaction, churn prediction, financial performance, and stock performance.
PREPARE YOURSELF TO SURF THE DATA ERA!