Session 01 Introduction To Big Data Analytics


Course : COMP6678

Effective Period : 2021

Introduction to Big Data Analytics

Session 1
LEARNING OBJECTIVES
• LO 1: Define big data analytics concepts and their use in business
Topics

• Big Data Overview


– Data Structures
– Analyst Perspective on Data Repositories
• State of the Practice in Analytics
– BI Versus Data Science
– Current Analytical Architecture
– Drivers of Big Data
– Emerging Big Data Ecosystem and a New Approach to
Analytics
• Key Roles for the New Big Data Ecosystem
• Examples of Big Data Analytics
Big Data Overview
• Data is created constantly, and at an ever-increasing rate.
– Mobile phones, social media, and imaging technologies used to
determine a medical diagnosis all create new data, which must be
stored somewhere for some purpose.
• Several industries have led the way in developing
their ability to gather and exploit data:
– Credit card companies monitor every purchase their customers
make and can identify fraudulent purchases with a high degree of
accuracy using rules derived by processing billions of transactions.
– Mobile phone companies analyze subscribers’ calling patterns to
determine, for example, whether a caller’s frequent contacts are
on a rival network. If that rival network is offering an attractive
promotion that might cause the subscriber to defect, the mobile
phone company can proactively offer the subscriber an incentive
to remain in her contract.
– For companies such as LinkedIn and Facebook, data itself is their
primary product. The valuations of these companies are heavily
derived from the data they gather and host, which contains more
and more intrinsic value as the data grows.
• Three attributes stand out as defining Big Data
characteristics:
– Huge volume of data: Rather than thousands or millions of rows,
Big Data can be billions of rows and millions of columns.
– Complexity of data types and structures: Big Data reflects the
variety of new data sources, formats, and structures, including
digital traces being left on the web and other digital repositories
for subsequent analysis.
– Speed of new data creation and growth: Big Data can describe
high velocity data, with rapid data ingestion and near real time
analysis.
• Although the volume of Big Data tends to attract the most
attention, generally the variety and velocity of the data
provide a more apt definition of Big Data. (Big Data is
sometimes described as having 3 Vs: volume, variety, and
velocity.)
• Another definition of Big Data comes from the McKinsey
Global Institute report from 2011:

Big Data is data whose scale, distribution, diversity, and/or
timeliness require the use of new technical architectures and
analytics to enable insights that unlock new sources of
business value.
Data Structures
• Big Data can come in multiple forms, including structured and
unstructured data such as financial data, text files, multimedia
files, and genetic mappings.

Big Data Growth is increasingly unstructured


Data Structures
• Here are examples of how
each of the four main types
of data structures may look.
– Structured data: Data containing a defined data type, format, and
structure (for example, transaction data, online analytical
processing [OLAP] data cubes, traditional RDBMS, CSV files, and
even simple spreadsheets).

Example of structured data
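Because structured data has a fixed set of named columns, it can be queried directly by field name. The sketch below illustrates this with a small CSV table; the column names and values are invented for illustration and do not come from the slides.

```python
import csv
import io

# Hypothetical transaction records (assumption: invented columns and values).
CSV_TEXT = """order_id,customer,amount
1001,alice,25.50
1002,bob,14.00
1003,alice,9.75
"""


def total_by_customer(csv_text):
    """Sum the amount column per customer, relying on the fixed schema."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        customer = row["customer"]
        totals[customer] = totals.get(customer, 0.0) + float(row["amount"])
    return totals


totals = total_by_customer(CSV_TEXT)
print(totals)
```

No parsing rules beyond the header row are needed here, which is what distinguishes this kind of data from the less structured types that follow.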


Data Structures
– Semi-structured data: Textual data files with a discernible pattern
that enables parsing (such as Extensible Markup Language [XML] data
files that are self-describing and defined by an XML schema).

Example of semi-structured data
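As a minimal sketch of why a discernible pattern enables parsing, the snippet below reads a small XML document with Python's standard library. The element names (`catalog`, `book`, `title`, `price`) are assumptions made up for this example, and the code only parses the tags; it does not validate the document against an XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical self-describing document (assumption: invented tags and values).
XML_TEXT = """<catalog>
  <book id="b1"><title>Data Basics</title><price>30.00</price></book>
  <book id="b2"><title>Graph Notes</title><price>45.50</price></book>
</catalog>"""


def parse_books(xml_text):
    """Turn each <book> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": book.get("id"),
            "title": book.findtext("title"),
            "price": float(book.findtext("price")),
        }
        for book in root.findall("book")
    ]


books = parse_books(XML_TEXT)
print(books)
```

The tags carry the field names inside the data itself, so the parser needs no separate column definition.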
Data Structures
– Quasi-structured data: Textual data with erratic data formats that
can be formatted with effort, tools, and time (for instance, web
clickstream data that may contain inconsistencies in data values
and formats).
Example of click stream data (quasi-structured data)
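The "effort, tools, and time" needed for quasi-structured data can be illustrated with a small normalization step. The sketch below assumes a handful of made-up clickstream URLs with inconsistent schemes, letter case, and repeated parameters; the cleaning rules are one possible choice, not a standard.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical raw clicks (assumption: invented URLs with typical inconsistencies).
CLICKS = [
    "https://www.example.com/search?q=big+data&page=2",
    "http://example.com/Search?Q=hadoop",
    "example.com/product/42?ref=home&ref=banner",
]


def normalize_click(raw):
    """Convert one raw URL into a consistent record."""
    # Add a scheme when it is missing so urlparse can locate the host.
    if "://" not in raw:
        raw = "http://" + raw
    parts = urlparse(raw)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Lowercase parameter names; keep the last value when a key repeats.
    params = {k.lower(): v[-1] for k, v in parse_qs(parts.query).items()}
    return {"host": host, "path": parts.path.lower(), "params": params}


records = [normalize_click(click) for click in CLICKS]
for record in records:
    print(record)
```

Each rule here (default scheme, dropping `www.`, last value wins) is a judgment call that an analyst would have to make and document for real clickstream data.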
Data Structures
– Unstructured data: Data that has no inherent structure, which
may include text documents, PDFs, images, and video.

Example of unstructured data: video about Antarctica expedition


Analyst Perspective on Data Repositories
• Table 1-1 summarizes the characteristics of the data repositories
mentioned in this section.

Types of Data Repositories, from an Analyst Perspective


State of the Practice in Analytics
• Current business problems provide many opportunities
for organizations to become more analytical and data
driven, as shown in the table below.

Business Drivers for Advanced Analytics


BI Versus Data Science
• Comparing BI with Data Science
Current Analytical Architecture

Typical analytic architecture


Drivers of Big Data
• The data now comes from multiple sources, such as these:
– Medical information, such as genomic sequencing and diagnostic
imaging
– Photos and video footage uploaded to the World Wide Web
– Video surveillance, such as the thousands of video cameras spread
across a city
– Mobile devices, which provide geospatial location data of the
users, as well as metadata about text messages, phone calls, and
application usage on smart phones
– Smart devices, which provide sensor-based collection of
information from smart electric grids, smart buildings, and many
other public and industry infrastructures
– Nontraditional IT devices, including the use of radio-frequency
identification (RFID) readers, GPS navigation systems, and seismic
processing
Drivers of Big Data

Data evolution and the rise of Big Data sources


Emerging Big Data Ecosystem and a New
Approach to Analytics
As the new ecosystem takes shape, there are four main groups of players
within this interconnected web.

Emerging Big Data ecosystem


Key Roles for the New Big Data Ecosystem
The Big Data ecosystem demands three categories of roles.

Key roles of the new Big Data ecosystem


Examples of Big Data Analytics
Example (1)
As mentioned earlier, Big Data presents many opportunities to
improve sales and marketing analytics.
• An example of this is the U.S. retailer Target. Charles Duhigg’s
book The Power of Habit discusses how Target used Big Data
and advanced analytical methods to drive new revenue. After
analyzing consumer purchasing behavior, Target’s statisticians
determined that the retailer made a great deal of money from
three main life-event situations.
– Marriage, when people tend to buy many new products
– Divorce, when people buy new products and change their spending
habits
– Pregnancy, when people have many new things to buy and have an
urgency to buy them
Example (2)
• Hadoop represents another example of Big Data innovation in IT
infrastructure.
– Apache Hadoop is an open-source framework that allows companies to
process vast amounts of information in a highly parallelized way.
– Hadoop represents a specific implementation of the MapReduce paradigm
and was designed by Doug Cutting and Mike Cafarella in 2005 to use data
with varying structures.
– It is an ideal technical framework for many Big Data projects, which rely on
large or unwieldy datasets with unconventional data structures.
– One of the main benefits of Hadoop is that it employs a distributed file
system, meaning it can use a distributed cluster of servers and commodity
hardware to process large amounts of data.
– Some of the most common examples of Hadoop implementations are in the
social media space, where Hadoop can manage transactions, give textual
updates, and develop social graphs among millions of users.
– Twitter and Facebook generate massive amounts of unstructured data and
use Hadoop and its ecosystem of tools to manage this high volume.
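The MapReduce paradigm mentioned above can be sketched on a single machine with a word count. This is plain Python, not Hadoop's API: the function names `map_phase`, `shuffle`, and `reduce_phase` are chosen for this illustration, and a real cluster would run the map and reduce steps in parallel across many servers.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over all documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both steps can be spread over separate machines, which is the property a framework such as Hadoop exploits.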
Example (3)
• Finally, social media represents a tremendous opportunity to leverage
social and professional interactions to derive new insights.
– LinkedIn exemplifies a company in which data itself is the product. Early on, LinkedIn
founder Reid Hoffman saw the opportunity to create a social network for working
professionals.
– As of 2014, LinkedIn had more than 250 million user accounts and had added many
additional features and data-related products, such as recruiting, job seeker tools,
advertising, and InMaps, which show a social graph of a user’s professional network.
– The figure below is an example of an InMap visualization that enables a LinkedIn user
to get a broader view of the interconnectedness of his contacts and understand how he
knows most of them.

Data visualization of a user’s social network using InMaps


REFERENCES
• Data Science and Big Data Analytics: Discovering, Analyzing,
Visualizing and Presenting Data. Ch. 1.
