Chapter - 01 - Introduction To Big Data


INTRODUCTION TO BIG DATA ANALYTICS
Author : FU
Date : Mar-2022
Objectives

After studying this chapter, the student should be able to:
• Explain what is meant by Big Data
• Explain why advanced analytics are needed
• Describe how Data Science differs from Business Intelligence (BI)
• Identify the new roles needed for the new Big Data ecosystem
Content

1. Big Data Overview
2. State of the Practice in Analytics
3. Key Roles for the New Big Data Ecosystem
4. Examples of Big Data Analytics
1. Big Data Overview – Where does Big Data come from?

• Data is created constantly and at an ever-increasing rate. Mobile phones, social media, and imaging technologies all create new data that must be stored somewhere for some purpose. Devices and sensors automatically generate diagnostic information that needs to be stored and processed in real time.
• Merely keeping up with this huge influx of data is difficult. Analyzing vast amounts of it is even more challenging, especially when the data does not conform to traditional data structures.
• The challenges of the data deluge present an opportunity to transform business, government, science, and everyday life.
• Several industries have led the way in the ability to gather and exploit data:
o Credit card companies
o Mobile phone companies
o Companies such as LinkedIn and Facebook
1. Big Data Overview – What is Big Data?

• Three attributes stand out as defining Big Data characteristics:
o Huge volume of data
o Complexity of data types and structures (a variety of new data sources, formats, and structures)
o Speed of new data creation and growth

• Big Data is sometimes described as having 3 Vs:
o Volume
o Variety
o Velocity

• Another definition of Big Data comes from the McKinsey Global report from 2011:
o Big Data is data whose scale, distribution, diversity, and/or timeliness require the use of new technical architectures and analytics to enable insights that unlock new sources of business value.
1. Big Data Overview – What methods and tools are used for Big Data analysis?

• Big Data cannot be efficiently analyzed using only traditional databases or methods.
• It requires new tools and technologies to store and manage the data and to realize its benefits.
• These new tools and technologies enable the creation, manipulation, and management of large datasets and the storage environments that house them.
• McKinsey’s definition of Big Data implies that organizations will need new data architectures and analytic sandboxes, new tools, new analytical methods, and an integration of multiple skills into the new role of the data scientist.
1. Big Data Overview – What’s Driving the Data Deluge?

• The Big Data deluge has several sources. The rate of data creation is accelerating, driven by many of the items in Figure 1-1.
1. Big Data Overview - Fastest-growing sources of Big Data?

• Social media and genetic sequencing are among the fastest-growing sources of Big Data
o Social media data: In 2012, Facebook users posted 700 status updates per second worldwide. Facebook constructs social graphs to analyze user data.
o Genetic sequencing: Genetic sequencing and human genome mapping provide a detailed understanding of genetic makeup and lineage. The health care industry is looking toward these advances to help predict which illnesses a person is likely to get in their lifetime, and to take steps to avoid these maladies or reduce their impact through personalized medicine and treatment.
1.1 Data Structures

• Big Data comes in several forms, including structured and non-structured data (financial data, text files, multimedia files, and genetic mappings)
• Most Big Data is unstructured or semi-structured, which requires different techniques and tools to process and analyze
1.1 Data Structures – Data types

• Structured data: Data containing a defined data type, format, and structure
• Semi-structured data: Textual data files with a discernible pattern that enables parsing (such as Extensible Markup Language [XML] data files that are self-describing and defined by an XML schema)
• Quasi-structured data: Textual data with erratic data formats that can be formatted with effort, tools, and time (for instance, web clickstream data that may contain inconsistencies in data values and formats)
• Unstructured data: Data that has no inherent structure, which may include text documents, PDFs, images, and video
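
The four data types above can be made concrete with a small sketch. The sample records, field names, and the regular expression below are all made up for illustration and are not taken from the chapter. The point is only that each type needs progressively more effort to parse.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: fixed columns and types, so a CSV reader parses it directly.
structured = "account_id,amount\n1001,25.50\n1002,13.00\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total = sum(float(r["amount"]) for r in rows)

# Semi-structured: self-describing XML, where the tags label each value.
semi = "<orders><order id='7'><item>book</item></order></orders>"
items = [order.findtext("item") for order in ET.fromstring(semi).iter("order")]

# Quasi-structured: clickstream lines with inconsistent date, case, and URL
# formats. A hand-written pattern plus cleanup is needed to normalize them.
clicks = [
    "2022-03-01 10:00:01 GET /Products?id=12",
    "03/01/2022 10:00:05 get /products/?ID=12&ref=home",
]
pattern = re.compile(r"\bget\s+(/[^\s?]*)", re.IGNORECASE)
paths = [pattern.search(line).group(1).rstrip("/").lower() for line in clicks]

# Unstructured: free text has no fields at all. The simplest possible
# analysis is something like a token count.
note = "Customer called about a late delivery and asked for a refund."
word_count = len(note.split())

print(total, items, paths, word_count)
```

Both clickstream lines normalize to the same path only because the pattern and cleanup rules were written for these two samples. Real clickstream data would typically need many more rules.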
1.2 Analyst Perspective on Data Repositories
2. State of the Practice in Analytics

• Business problems provide many opportunities for organizations to become more analytical and data driven, as shown in Table 1-2
2.1 BI versus Data Science - What is BI?
• There are several ways to compare BI and Data Science as groups of analytical techniques
• BI tends to provide reports, dashboards, and queries on business questions for the current period or the past. These questions are closed-ended and explain current or past behavior
• BI systems are used to answer questions about quarter-to-date revenue, progress toward quarterly targets, and how much of a given product was sold in a prior quarter or year
• BI provides hindsight and some insight, and generally answers questions related to “when” and “where” events occurred
• BI problems tend to require highly structured data organized in rows and columns for accurate reporting
2.1 BI versus Data Science - What is Data Science?

• Data Science tends to use disaggregated data in a more forward-looking, exploratory way, focusing on analyzing the present and enabling informed decisions about the future.
• It is more exploratory in nature and may use scenario optimization to deal with more open-ended questions, focusing on questions related to “how” and “why” events occur.
• Data Science projects tend to use many types of data sources, including large or unconventional datasets
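
A toy contrast may help show the difference in the kind of question asked. The revenue figures are invented, and the straight-line trend is an assumption chosen for brevity. Real Data Science work would use richer data and models, so treat this as a sketch rather than a recommended method.

```python
# Hypothetical monthly revenue figures (made up for illustration).
revenue = {"2022-01": 100.0, "2022-02": 110.0, "2022-03": 120.0}

# BI-style question (closed-ended, hindsight): what was Q1 revenue?
q1_revenue = sum(revenue.values())

# Data Science-style question (forward-looking): what might April look like?
# Assumption: fit a least-squares line y = intercept + slope * x to the series.
xs = list(range(len(revenue)))
ys = list(revenue.values())
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    / sum((x - mean_x) ** 2 for x in xs)
)
intercept = mean_y - slope * mean_x

# Extrapolate one step past the last observed month (x = n).
april_forecast = intercept + slope * n

print(f"Q1 revenue (BI report): {q1_revenue}")
print(f"April forecast (trend model): {april_forecast}")
```

The first result is a report on what already happened. The second is an estimate that depends on the modeling assumption and should be questioned and validated.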
2.1 BI versus Data Science - Summary
2.2 Analytical Architecture

FIGURE 1-9 Typical analytic architecture


2.3 Drivers of Big Data

• Data now comes from multiple sources:
o Medical information
o Photos and video
o Video surveillance
o Mobile devices
o Smart devices
o Nontraditional IT devices

FIGURE 1-10 Data evolution and the rise of Big Data sources
2.4 Emerging Big Data Ecosystem and a New Approach to Analytics

• As the new ecosystem takes shape, there are four main groups of players within this interconnected web:
o Data devices
o Data collectors
o Data aggregators
o Data users and buyers

FIGURE 1-11 Emerging Big Data ecosystem
3. Key Roles for the New Big Data Ecosystem (1)

FIGURE 1-12 Key roles of the new Big Data ecosystem
3. Key Roles for the New Big Data Ecosystem (2)

• Three recurring sets of activities for data scientists:
o Reframe business challenges as analytics challenges
o Design, implement, and deploy statistical models and data mining techniques on Big Data
o Develop insights that lead to actionable recommendations

• Main skills required of data scientists:
o Quantitative skill
o Technical aptitude
o Skeptical mind-set and critical thinking
o Curious and creative
o Communicative and collaborative

FIGURE 1-13 Profile of a Data Scientist


4. Examples of Big Data Analytics

• One example is the U.S. retailer Target. Charles Duhigg’s book The Power of Habit discusses how Target used Big Data and advanced analytical methods to drive new revenue. After analyzing consumer purchasing behavior, Target’s statisticians determined that the retailer made a great deal of money from three main life-event situations:
o Marriage, when people tend to buy many new products
o Divorce, when people buy new products and change their spending habits
o Pregnancy, when people have many new things to buy and have an urgency to buy them

• Hadoop represents another example of Big Data innovation, this time in IT infrastructure. Apache Hadoop is an open source framework that allows companies to process vast amounts of information in a highly parallelized way
• Social media, e.g. LinkedIn (250 million user accounts), represents a tremendous opportunity to leverage social and professional interactions to derive new insights
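
The "highly parallelized" processing mentioned for Hadoop is commonly illustrated with a map-and-reduce word count. The sketch below is plain Python, not Hadoop code or its API: a thread pool stands in for cluster nodes, and the function names and sample text are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_count(chunk: str) -> Counter:
    """Map step: count the words within one chunk of the input."""
    return Counter(chunk.lower().split())


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial counts into one."""
    return left + right


def word_count(chunks: list[str]) -> Counter:
    # Each chunk is processed independently, so the map step can run in
    # parallel. Threads here are only a stand-in for separate machines.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_count, chunks))
    return reduce(reduce_counts, partials, Counter())


chunks = ["big data big insights", "data drives insights", "big ideas"]
totals = word_count(chunks)
print(totals.most_common(3))
```

Because each chunk is counted on its own and the partial results are merged afterwards, the same pattern can in principle be spread over many machines, which is the idea frameworks such as Hadoop build on.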
Summary

• Big Data comes from myriad sources, including social media, sensors, the Internet of Things, video surveillance, and many other sources of data
• As organizations evolve their processes and see the opportunities that Big Data provides, they move beyond traditional BI activities, such as using data to populate reports and dashboards, and toward Data Science-driven projects that attempt to answer more open-ended and complex questions.
• Big Data requires new data architectures, including analytic sandboxes, new ways of working, and people with new skill sets
• Organizations need to build Data Science teams, but a growing talent gap makes finding and hiring data scientists in a timely manner difficult
