Module 13 FusionInsight HD Solution Overview

The document provides an overview of Huawei's FusionInsight HD, a big data solution that includes features, architecture, and success cases. It highlights the platform's capabilities in data integration, processing, and analysis, as well as its contributions to the open-source community. Additionally, it discusses system reliability, security measures, and advanced functionalities such as Spark SQL and Apache CarbonData for optimized data handling.

Uploaded by

Lucas Oliveira

FusionInsight HD

Solution Overview

www.huawei.com

Copyright © 2018 Huawei Technologies Co., Ltd. All rights reserved.


Objectives
 After completing this course, you will be able to understand:
 Huawei big data solution FusionInsight HD
 The features of FusionInsight HD
 Success cases of FusionInsight HD

Contents
1. FusionInsight Overview

2. FusionInsight Features

3. Success Cases of FusionInsight

Apache Hadoop - Prosperous Open-Source Ecosystem

Big Data Is an Important Pillar for Huawei ICT Strategy
[Figure: Huawei Strategy Map. Labels include Things (M2M Module), People (Smart Device), FBB, MBB, Enterprise Network, IP+Optical, Core Network, Data Center Infrastructure, Big Data Analytics Platform, Professional Service, Enterprise Apps, SDP, BSS/OSS, Content and App, Global Distribution Partners, and Third ISVs. Source: Huawei corporate presentation]

Huawei Big Data R&D Team
 There are eight research centers with thousands of employees around the world.
 World-class data mining and artificial intelligence experts, such as PMC Committer and IEEE Fellow

FusionInsight HD: From Open-Source to Enterprise Versions
[Figure: stages from the initial open-source version, through the prosperous community, to the enterprise version. Labels: version mapping, security, configuration, patch selection, performance optimization, baseline selection; Hadoop, HBase, Log.]

FusionInsight Platform Architecture
Industry applications: Safe city, Power, Financial industry, Telecom industry, Big data cloud services

Big data cloud services
 Data integration services: Data Ingest services, DPS services, ...
 Data processing services: MapReduce Service (MRS), CloudTable, ...
 Real-time computing services: Stream services, RTD services, ...
 Data analysis services: DWS, MOLAP services, ...
 Machine learning services: MLS, Log analysis, ...
 Artificial Intelligence Service (AIS): Image tagging service, NLP service, ...

FusionInsight components
 FusionInsight Porter (data integration): Sqoop (batch collection), Flume (real-time collection), Kafka (message queue), Oozie (job scheduling)
 FusionInsight Miner (data insight): Weaver graphics analysis engine, Miner Studio mining platform
 FusionInsight Farmer (data intelligence): RTD real-time decision engine, Farmer Base reasoning framework
 FusionInsight HD (data processing): Spark (one-stop analysis framework), FusionInsight Elk (standard SQL engine), Storm/Flink (stream processing framework), Yarn resource management, CarbonData new file format, HBase NoSQL database, HDFS distributed file system, ZooKeeper (collaboration service)
 FusionInsight LibrA: parallel database
 FusionInsight Manager (management platform): security management, performance management, fault management, tenant management, configuration management

Contribution to the Open-Source Community
[Figure: capability staircase, lowest step first.]
 Be able to use Hadoop
 Locate peripheral problems
 Be able to resolve kernel-level problems (outstanding individuals)
 Be able to resolve kernel-level problems by teams
 Perform kernel-level development to support key service features
 Lead the community to complete future-oriented kernel-level feature development
 Create top community projects and be recognized by the ecosystem

Apache open-source community ecosystem
 Large number of components and codes
 Frequent component update
 Efficient feature integration

Outstanding product development and delivery capabilities and carrier-class operation support capabilities empowered by the Hadoop kernel team

Contents
1. FusionInsight Overview

2. FusionInsight Features

3. Success Cases of FusionInsight

System and Data Reliability
System Reliability
 All components without SPOF
 HA for all management nodes
 Software and hardware health status monitoring
 Network plane isolation
Data Reliability
 Cross-data center DR
 Third-party backup system integration
 Key data power-off protection
 Hot-swappable hard disks

Security
System Security
 Fully open-source component enhancement
 Operating system security hardening
Permission Authentication
 Authentication management of user permission
 User permission control of different components
Data Security
 Data integrity verification
 File data encryption

Network Security and Reliability - Dual-Plane Networking
[Figure: network diagram. Labels: App-Server, OMS-Server, WebUI-Client, cluster service plane, cluster management plane, maintenance network outside the cluster.]

Network Type | Trustworthiness | Description
Cluster service plane | High | Hadoop cluster core components for the storage and transfer of service data
Cluster management plane | Medium | It only manages the cluster and is involved with no service data.
Maintenance network outside the cluster | Low | Only web services provided by the OMS server can be accessed.

Visualized Cluster Management, Simplifying O&M

Graphical Health Check Tool (1)

Graphical Health Check Tool (2)
[Figure: two charts. The first shows the qualification ratio and disqualification ratio of inspection items. The second shows the node qualification rate and node disqualification rate.]

Easy Development
Native APIs of HBase
    try {
        table = new HTable(conf, TABLE);
        // 1. Generate RowKey.
        {......}
        // 2. Create Put instance.
        Put put = new Put(rowKey);
        // 3. Convert columns into qualifiers (need to consider merging cold columns).
        // 3.1. Add hot columns.
        {.......}
        // 3.2. Merge cold columns.
        {.......}
        put.add(COLUMN_FAMILY, Bytes.toBytes("QA"), hotCol);
        // 3.3. Add cold columns.
        put.add(COLUMN_FAMILY, Bytes.toBytes("QB"), coldCols)

Enhanced APIs
    try {
        table = new ClusterTable(conf, CLUSTER_TABLE);
        // 1. Create CTRow instance.
        CTRow row = new CTRow();
        // 2. Add columns.
        {........}
        // 3. Put into HBase.
        table.put(TABLE, row);
    } catch (IOException e) {
        // Does not care connection re-creation.

[Figure: Enhanced HBase SDK (Recoverable Connection Manager, Schema Data, HBase table design tool) layered on the HBase API and HBase.]

The HBase table design tool, connection pool management function, and enhanced SDK are used to simplify development of complex data tables.

FusionInsight Spark SQL
 SQL compatibility – All 99 TPC-DS cases of the standard SQL:2003 are passed.
 Data update and deletion – Spark SQL supports data insertion, update, and deletion when the CarbonData file format is used.
 Large-scale Spark with stable and high performance – TPC-DS long-term stability is tested at a data volume of 100 TB.
 Long-term stability test:
   Memory optimization – resolves memory leakage problems, decentralizes broadcasting, and optimizes Spark heap memory.
   Communication optimization – RPC enhancement, shuffle fetch optimization, and shuffle network configuration
   Scheduling optimization – acceleration of GetSplits() and AddPendingTask(), DAG serialization reuse
   Extreme pressure test – 24/7 pressure test, HA test
   O&M enhancement – Log security review and DAG UI optimization

Spark SQL Multi-Tenant
[Figure: Beeline and JDBC clients connect to Spark JDBC Proxy instances (Proxy 1, Proxy 2, ..., Proxy X) in the JDBCServer (Proxy) layer. On Yarn, Tenant A and Tenant B each have a YarnQuery and their own Spark JDBCServer instances (Spark JDBCServer 1, Spark JDBCServer 2).]

The community's Spark JDBCServer supports only a single tenant. A tenant is bound to a Yarn resource queue.
FusionInsight Spark JDBCServer supports multiple tenants, and resources are isolated among different tenants.

Spark SQL Small File Optimization
[Figure: a Text/Parquet/ORC/Json table on HDFS consists of many 1 MB files. Each file is read into a 1 MB partition of RDD1, and the partitions are then merged (1 MB + 1 MB ...) into fewer, larger partitions in RDD2.]
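
The merge step in the figure can be pictured as packing small inputs into fewer partitions of a target size. The following is a minimal sketch under assumptions, not the FusionInsight or Spark implementation: the class name SmallFileCombiner, the method combine, and the 2 MB target are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class SmallFileCombiner {
    /**
     * Greedily packs consecutive file sizes (in bytes) into groups whose
     * total does not exceed targetBytes. A file larger than the target
     * ends up alone in its own group.
     */
    static List<List<Long>> combine(List<Long> fileSizes, long targetBytes) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Close the current group when adding this file would overflow it.
            if (!current.isEmpty() && currentBytes + size > targetBytes) {
                groups.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Six 1 MB files, as in the figure, with a hypothetical 2 MB target.
        List<List<Long>> groups = combine(List.of(mb, mb, mb, mb, mb, mb), 2 * mb);
        System.out.println("partitions after merge: " + groups.size());
    }
}
```

Fewer, larger partitions mean fewer tasks to schedule, which is the usual motivation for merging small files before processing.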

Apache CarbonData - Converging Data Formats of Data Warehouse (1)
CarbonData: A single file format meets the requirements of different access types.
[Figure: three access types converge on CarbonData: OLAP (multidimensional analysis), sequential access (large-scale scanning), and random access (small-range scanning).]

 Random access (small-scale scanning): 7.9 to 688 times
 OLAP/Interactive query: 20 to 33 times
 Sequential access (large-scale scanning): 1.4 to 6 times

Apache CarbonData - Converging Data Formats of Data Warehouse (2)
 Apache Incubator Project since June 2016
 Apache releases
   4 stable releases
   Latest 1.0.0, Jan 28, 2017
 Contributors: [logos not preserved]
 In Production: [logos not preserved]
[Figure labels: Compute, Storage]

CarbonData supports IUD statements and provides data update and deletion capabilities in big data scenarios. Pre-generated dictionaries and batch sort improve CarbonData import efficiency while global sort improves query efficiency and concurrency.

CarbonData Enhancement

 Quick query response: CarbonData features high-performance query. The query speed of CarbonData is ten times that of Spark SQL. The dedicated data format used by CarbonData is designed for high-performance queries and includes multiple index technologies, global dictionary encoding, and multiple push-down optimizations, so it responds quickly to TB-level data queries.
 Efficient data compression: CarbonData compresses data by combining lightweight and heavyweight compression algorithms. This saves 60% to 80% of data storage space and significantly reduces hardware storage costs.

Flink – Distributed Real-Time Processing System
 Flink is a distributed real-time processing system with low latency (measured in milliseconds), high throughput, and high reliability. Huawei promotes Flink in the IT field and sells it as an integrated part of FusionInsight HD.
 Flink is a unified computing framework that supports both batch processing and stream processing. It provides a stream data processing engine that supports data distribution and parallel computing. Flink is a top open-source stream processing engine in the industry and is suitable for low-latency data processing scenarios. It provides high-concurrency pipeline data processing, millisecond-level latency, and high reliability.

Visible HBase Modeling
[Figure: visual mapping from a user list to an HBase table.]
 Column Family: a collection of columns that have service association relationships
 Column (user list): each column indicates an attribute of service data.
 Qualifier (HBase column): each column indicates a KeyValue.
 Mapping expression shown: reverse(Column1, 4) Column2 Column3

HBase Cold Field Merging Transparent to Applications
[Figure: user data with the columns ID, Name, Phone, and ColA to ColH is mapped to the HBase KeyValues A, B, C, and D.]

Problems
 High expansion rate and poor data query performance as the number of HBase columns increases
 Increased development complexity and metadata maintenance when the application layer merges cold data columns
Features
 Cold field merging transparent to applications
 Real-time write and batch import interfaces

Hive/HBase Fine-Grained Encryption
[Figure: in Hive/HBase, sensitive data writes and reads pass through encryption/decryption, and sensitive data is stored in HDFS as ciphertext. Insensitive data is stored without encryption.]
Application scenarios
 Data saved in plaintext mode may cause security risks of sensitive data leakage.
Solution
 Hive encryption of tables and columns
 HBase encryption of tables, column families, and columns
 Encryption algorithms of AES and SM4, and user-defined encryption algorithms
Customer benefits
 Sensitive data is encrypted and stored by table or column.
 Algorithm diversity and system security
 Encryption and decryption transparency to data services
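
To illustrate what encrypting a single column value involves, the following is a minimal sketch using only the standard JDK crypto API. It is an assumption-laden example, not the Hive/HBase feature itself: the slides name AES but not a mode, so AES-GCM with a random 12-byte IV is assumed here, and the class name ColumnCipher and its methods are hypothetical. SM4 is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class ColumnCipher {
    private static final int IV_BYTES = 12;   // assumed IV length for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Generates a fresh 128-bit AES key. */
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts one column value. The stored form is IV followed by ciphertext. */
    static byte[] encrypt(SecretKey key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            byte[] stored = new byte[IV_BYTES + ciphertext.length];
            System.arraycopy(iv, 0, stored, 0, IV_BYTES);
            System.arraycopy(ciphertext, 0, stored, IV_BYTES, ciphertext.length);
            return stored;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Decrypts a value produced by encrypt(). */
    static byte[] decrypt(SecretKey key, byte[] stored) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] stored = encrypt(key, "sensitive value".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(key, stored), StandardCharsets.UTF_8));
    }
}
```

In a transparent scheme, calls like these would sit inside the storage layer so that applications read and write plaintext while only ciphertext reaches HDFS.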

HBase Secondary Indexing
UserTable (data table, one ColumnFamily)
RowKey | colA | colB | colC
a0001 | 01 | |
a0002 | 02 | |
a0003 | 06 | |
a0004 | 08 | |
a0005 | 04 | |
a0006 | 03 | B | C

UserTable_idx (index table, Data CF), RowKeys in stored order
a0001#coluA01#a0001
a0001#coluA02#a0002
a0001#coluA03#a0006
a0001#coluA04#a0005
a0001#coluA06#a0003
a0001#coluA08#a0004

[Figure: without an index, the scanning area covers the whole UserTable. With the index, the lookup goes from the index entry to the destination line in UserTable.]

No index: "Scan+Filter" scans a large amount of data.
Secondary index: The target data can be located with two I/Os.

 Index Region and Data Region as companions under a unified processing mechanism
 Original HBase API interfaces, user-friendly
 Coprocessor-based plug-ins, easy to upgrade
 Write optimization, supporting real-time write
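
The difference between "Scan+Filter" and an index lookup can be sketched with two sorted maps standing in for the data table and the index table. This is an in-memory illustration under assumptions, not the HBase or FusionInsight implementation: the class SecondaryIndexSketch, its methods, and the simplified index key layout "colA=<value>#<rowKey>" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SecondaryIndexSketch {
    // Data table: rowKey -> value of colA, sorted by rowKey.
    private final TreeMap<String, String> dataTable = new TreeMap<>();
    // Index table: "colA=<value>#<rowKey>" -> rowKey, sorted by indexed value.
    private final TreeMap<String, String> indexTable = new TreeMap<>();

    private static String indexKey(String colA, String rowKey) {
        return "colA=" + colA + "#" + rowKey;
    }

    /** Writes the data row and keeps the index entry in step with it. */
    void put(String rowKey, String colA) {
        String previous = dataTable.put(rowKey, colA);
        if (previous != null) {
            indexTable.remove(indexKey(previous, rowKey)); // drop the stale entry
        }
        indexTable.put(indexKey(colA, rowKey), rowKey);
    }

    /** No index: visits every row and filters on colA. */
    List<String> scanAndFilter(String colA) {
        List<String> rowKeys = new ArrayList<>();
        for (Map.Entry<String, String> row : dataTable.entrySet()) {
            if (row.getValue().equals(colA)) {
                rowKeys.add(row.getKey());
            }
        }
        return rowKeys;
    }

    /** Secondary index: a short range read over the index yields the row keys. */
    List<String> findByIndex(String colA) {
        String prefix = indexKey(colA, "");
        return new ArrayList<>(
                indexTable.subMap(prefix, prefix + Character.MAX_VALUE).values());
    }

    public static void main(String[] args) {
        SecondaryIndexSketch table = new SecondaryIndexSketch();
        String[][] rows = {{"a0001", "01"}, {"a0002", "02"}, {"a0003", "06"},
                           {"a0004", "08"}, {"a0005", "04"}, {"a0006", "03"}};
        for (String[] row : rows) {
            table.put(row[0], row[1]);
        }
        System.out.println(table.findByIndex("04"));
    }
}
```

Both methods return the same row keys. The index version reads only the matching slice of the index, at the cost of one extra write per put.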

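CTBase Simplifies HBase Multi-Table Service Development
[Figure: the Transaction and AccountInfo tables are combined into one CTBase table, in which each account's AccountInfo record is followed by its Transaction records.]

Transaction
Account_id | Amount | Time
A0001 | $100 | 12/12/2014 18:00:02
A0001 | $1020 | 10/12/2014 15:30:05
A0001 | $89 | 09/12/2014 13:00:07
A0002 | $105 | 11/12/2014 20:15:00

AccountInfo
Account_id | Account_name | Account_balance
A0001 | Andy | $100232
A0002 | Lily | $902323
A0003 | Selin | $90000

CTBase
A0001 | Andy | $100232 (AccountInfo record)
A0001 | $100 | 12/12/2014 18:00:02 (Transaction record)
A0001 | $1020 | 10/12/2014 15:30:05
A0001 | $89 | 09/12/2014 13:00:07
A0002 | Lily | $902323 (AccountInfo record)
A0002 | $105 | 11/12/2014 20:15:00 (Transaction record)
A0002 | $129 | 11/11/2014 18:15:00
A0003 | Selin | $90000 (AccountInfo record)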

HFS Small File Storage and Retrieval Engine
[Figure: HFS handles metadata and small files separately from medium/large files.]
Application scenario
 A large number of small files and associated description information need to be stored.
Current problem
 Storing a large number of small files in the Hadoop Distributed File System (HDFS) puts great pressure on the NameNode. When HBase stores a large number of small files, compaction wastes I/O resources.
HFS solution value
 The HFS stores not only small files but also metadata description information related to the files.
 The HFS provides a unified and friendly access API.
 The HFS selects the optimal storage solution based on the file size.
   Small files are directly stored in the Medium-sized Objects (MOB).
   Large files are directly stored in the HDFS.

Label-based Storage
[Figure: with HDFS common storage, online applications and offline batch processing applications share the same nodes, and I/O conflicts affect online services. With HDFS label-based storage, online application data and batch processing application data are placed on separately labeled nodes.]

The data of online applications is stored only on nodes labeled with "Online Application" and is isolated from the data of offline applications. This design prevents I/O competition and improves the local hit ratio.

 Solution description: Label cluster nodes based on applications or physical characteristics, for example, label a node with "Online Application." Then application data is stored only on nodes with specified labels.
 Application scenarios: 1. Online and offline applications share a cluster. 2. Specific services (such as online applications) run on specific nodes.
 Customer benefits: 1. I/Os of different applications are isolated to ensure the application SLA. 2. The system performance is improved by improving the hit ratio of application data.

Label-based Scheduling
[Figure: common scheduling versus label-based scheduling of Spark and MapReduce applications across nodes labeled "Large memory" and "Default".]

Fine-grained scheduling based on application awareness, improving resource utilization

 Different applications, such as online and batch processing applications, run on nodes with their specific labels. This isolates the computing resources of different applications and improves the service SLA.
 Applications that have special requirements on node hardware run only on nodes with that hardware. For example, Spark applications need to run on nodes with large memory. Resources are scheduled on demand, improving resource utilization and system performance.
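
The core of label-based placement is a filter over node labels. The following is a minimal sketch under assumptions, not the Yarn or FusionInsight scheduler: the class LabelSchedulerSketch, the method eligibleNodes, and the node names are hypothetical, while the labels "Large memory" and "Default" come from the figure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LabelSchedulerSketch {
    /**
     * Returns the names of the nodes that carry the label an application
     * requires, sorted so that the result is deterministic.
     */
    static List<String> eligibleNodes(Map<String, Set<String>> nodeLabels,
                                      String requiredLabel) {
        List<String> eligible = new ArrayList<>();
        for (Map.Entry<String, Set<String>> node : nodeLabels.entrySet()) {
            if (node.getValue().contains(requiredLabel)) {
                eligible.add(node.getKey());
            }
        }
        Collections.sort(eligible);
        return eligible;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> nodeLabels = Map.of(
                "node1", Set.of("Large memory"),
                "node2", Set.of("Default"),
                "node3", Set.of("Large memory"));
        // A Spark application that needs large-memory nodes.
        System.out.println(eligibleNodes(nodeLabels, "Large memory"));
    }
}
```

A real scheduler would then pick among the eligible nodes by free resources. The sketch stops at the label filter, which is the part the slide describes.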

CPU Resource Configuration Period Adjustment
[Figure: CPU shares of Cgroup1 and Cgroup2 over time, for batch processing applications (Hive/Spark/...) and real-time applications (HBase), each with queues QA to QD. The shares are 40% and 60% from 7:00, and 80% and 20% from 20:00.]

 Solution description: Different services are allocated different proportions of resources in different time segments. For example, from 7:00 to 20:00, when real-time services are at peak hours, they can be allocated 60% of the resources. From 20:00 to 7:00, when real-time services are at off-peak hours, 80% of the resources can be allocated to batch processing applications.
 Application scenario: The peak hours and off-peak hours of different services are different.
 Customer benefit: Services can obtain as many resources as possible at peak hours, boosting the average resource utilization of the system.
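
The time-segment rule above amounts to a lookup from the time of day to a CPU share. The following is a minimal sketch that only encodes the percentages and the 7:00 and 20:00 boundaries from the slide. The class CpuShareSchedule and its methods are hypothetical, and how the shares would be applied to cgroups is not shown.

```java
import java.time.LocalTime;

final class CpuShareSchedule {
    private static final LocalTime DAY_START = LocalTime.of(7, 0);
    private static final LocalTime DAY_END = LocalTime.of(20, 0);

    /** CPU share (percent) for real-time services: 60 from 7:00 to 20:00, else 20. */
    static int realTimeSharePercent(LocalTime now) {
        boolean daytime = !now.isBefore(DAY_START) && now.isBefore(DAY_END);
        return daytime ? 60 : 20;
    }

    /** Batch processing applications receive the remainder: 40 by day, 80 by night. */
    static int batchSharePercent(LocalTime now) {
        return 100 - realTimeSharePercent(now);
    }

    public static void main(String[] args) {
        LocalTime noon = LocalTime.of(12, 0);
        LocalTime lateEvening = LocalTime.of(23, 0);
        System.out.println("12:00 real-time=" + realTimeSharePercent(noon)
                + "% batch=" + batchSharePercent(noon) + "%");
        System.out.println("23:00 real-time=" + realTimeSharePercent(lateEvening)
                + "% batch=" + batchSharePercent(lateEvening) + "%");
    }
}
```

The sketch treats 7:00 as the first instant of the daytime segment and 20:00 as the first instant of the night segment, which is an assumption about boundary handling that the slide does not specify.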

Resource Distribution Monitoring

 Benefits
 Quick focusing on the most critical resource consumption
 Quick locating of the node with the highest resource consumption to take
appropriate measures

Dynamic Adjustment of the Log Level

 Application scenario: When a fault occurs in the Hadoop cluster, the log level often needs to be changed to locate the fault quickly. Changing the log level normally requires restarting the process, which interrupts services. How can this problem be resolved?
 Solution: Dynamically adjust the log level on the WebUI.
 Benefits: When locating a fault, you can quickly change the log level of a specified service or node without restarting the service or interrupting services.
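
Changing a log level in a running process, without a restart, can be shown with the JDK's built-in logging API. This is a stand-in sketch under assumptions: the FusionInsight WebUI mechanism is not described in the slides, and the class DynamicLogLevelSketch and the logger name "sketch.service" are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class DynamicLogLevelSketch {
    // A strong reference keeps the logger, and the level set on it, alive.
    private static final Logger SERVICE_LOG = Logger.getLogger("sketch.service");

    /** Applies a new level (for example "INFO" or "FINE") to the running process. */
    static void setLevel(String levelName) {
        SERVICE_LOG.setLevel(Level.parse(levelName));
    }

    /** Reports whether messages of the given level would currently be logged. */
    static boolean isLoggable(String levelName) {
        return SERVICE_LOG.isLoggable(Level.parse(levelName));
    }

    public static void main(String[] args) {
        setLevel("INFO");
        System.out.println("FINE enabled at INFO: " + isLoggable("FINE"));
        // Raise verbosity while locating a fault; no restart is involved.
        setLevel("FINE");
        System.out.println("FINE enabled at FINE: " + isLoggable("FINE"));
    }
}
```

A management UI would call something like setLevel through a remote interface on each service or node. The essential point is that the level is mutable state inside the process.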

Wizard-based Cluster Data Backup

Wizard-based Cluster Data Restoration

Multi-Tenant Management
Multi-level tenant management
[Figure: the organization hierarchy Company, Dept. A, Sub-department A_1 maps to the tenant hierarchy Enterprise tenant, Tenant A, Tenant A_1.]

 Computing resources: Yarn queue (CPU/memory/I/O)
 Storage resources: HDFS (storage space/file overview)
 Service resources: HBase ...

One Stop Tenant Management

Visualized, Centralized User Rights Management
Visualized, centralized user rights management is easy to use, flexible, and refined:
 Easy to use: visualized multi-component unified user rights management
 Flexible: role-based access control (RBAC) and predefined privilege sets (roles) which can be used repeatedly
 Refined: multi-level (database/table/column-level) and fine-grained (Select/Delete/Update/Insert/Grant) authorization
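
Role-based, multi-level authorization can be sketched as roles that hold privileges on named resources, with users assigned to roles. The following is a minimal illustration under assumptions, not the FusionInsight permission model: the class RbacSketch, the dotted resource names such as "sales.orders.amount", and the rule that a grant on a database or table also covers what it contains are all hypothetical. The five privileges are the ones listed above.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class RbacSketch {
    enum Privilege { SELECT, DELETE, UPDATE, INSERT, GRANT }

    // role -> resource ("db", "db.table", or "db.table.column") -> privileges
    private final Map<String, Map<String, EnumSet<Privilege>>> roleGrants = new HashMap<>();
    // user -> roles
    private final Map<String, Set<String>> userRoles = new HashMap<>();

    /** Adds privileges on a resource to a reusable role. */
    void grant(String role, String resource, Privilege... privileges) {
        roleGrants.computeIfAbsent(role, r -> new HashMap<>())
                  .computeIfAbsent(resource, r -> EnumSet.noneOf(Privilege.class))
                  .addAll(Arrays.asList(privileges));
    }

    /** Assigns a role to a user. */
    void assign(String user, String role) {
        userRoles.computeIfAbsent(user, u -> new HashSet<>()).add(role);
    }

    /** Checks the resource itself, then its table, then its database. */
    boolean isAllowed(String user, String resource, Privilege privilege) {
        for (String role : userRoles.getOrDefault(user, Set.of())) {
            Map<String, EnumSet<Privilege>> grants =
                    roleGrants.getOrDefault(role, Map.of());
            String scope = resource;
            while (true) {
                EnumSet<Privilege> granted = grants.get(scope);
                if (granted != null && granted.contains(privilege)) {
                    return true;
                }
                int dot = scope.lastIndexOf('.');
                if (dot < 0) {
                    break;
                }
                scope = scope.substring(0, dot); // move up one level
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RbacSketch rbac = new RbacSketch();
        rbac.grant("analyst", "sales.orders", Privilege.SELECT);
        rbac.assign("alice", "analyst");
        System.out.println(rbac.isAllowed("alice", "sales.orders.amount", Privilege.SELECT));
        System.out.println(rbac.isAllowed("alice", "sales.orders", Privilege.DELETE));
    }
}
```

Because privileges attach to roles rather than users, the same "analyst" role can be assigned to any number of users, which is the reuse the slide refers to.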

Automatic NTP Configuration
[Figure: the active management node is an NTP client of the external NTP server and acts as the NTP server inside the cluster. The standby management node, the data nodes, and the control node are NTP clients.]

Automatically Configuring Mapping of Hosts
 Benefits
   Shorten environment preparation time to install the Hadoop cluster.
   Reduce probability of user configuration errors.
   Reduce the risk of manually configuring mapping for stable running nodes after capacity expansion in a large-scale cluster.

Rolling Restart/Upgrade/Patch
Without interrupting services:
 Modifying a Configuration
 Performing the Upgrade
 Installing the Patch

HDFS rolling upgrade example:
[Figure: a client and an HDFS cluster with two NameNodes and five DataNodes during an upgrade from C60 to C70.]

Service interruption duration of core components: no interruption in 12 hours
 Components listed: ZooKeeper, HDFS, Yarn, HBase, Storm, Flume, Loader, Spark, Hive, Solr

Contents
1. FusionInsight Overview

2. FusionInsight Features

3. Success Cases of FusionInsight

Huawei Smart Transportation Solution

Secure
 Challenges to key vehicle identification: insufficient capability of key vehicle automatic identification
 Insufficient traffic accident detection capability: blind spots, weak detection technology, and manual accident reporting and handling
 Low efficiency of special attacks: information fragmentation and poor special attack platform

Organized
 Challenges to checkpoint and e-police capabilities: rigid algorithm
 Challenges to violation review and handling capabilities: heavy workload
 Challenges to special attack data analysis capabilities: manual analysis that takes 7 to 30 days

Smooth
 Challenges to traffic detection capability: faulty detection devices, low detection efficiency, and unreliable detection results
 Challenges to traffic analysis capabilities: traffic information is not shared among cities
 Challenges to traffic signal optimization

Intelligent
 Computing intelligence challenges: closed system and technology and fragmented information
 Perceptual intelligence challenges: weak awareness of traffic, events, and violations
 Cognitive intelligence challenges: lack of traffic awareness in regions and intersections

Traffic Awareness in the Whole City: Deep Learning and Digital Transformation
 No camera is added. By deep learning and intelligent analysis, about 50 billion real-time pavement traffic parameters are added every month, which lays a foundation for digital transformation of traffic.

[Figure: layered platform, top to bottom.]
 Applications: vehicle traffic and event awareness, traffic flow analysis, traffic accident perception and analysis, traffic signal optimization
 Deep learning platform: algorithm warehouse, deep learning training engine, deep learning reasoning engine, deep learning search engine
 Video cloud storage and cloud computing platform; traffic big data attacks modeling engine and time and space analysis engine
 Data sources: monitoring of more than 6000 roads, more than 4000 traffic checkpoints, more than 3000 channels of HD e-police
Note: The preceding figures use a city as an example.

Traffic Big Data Analysis Platform
National transportation integrated command
 Key vehicle traffic analysis: number of vehicles (400 million) + pass records (12.6 billion)
 Key vehicle violation analysis: number of vehicles (400 million) + illegal records (2.6 billion)
 Detection replacement analysis: number of vehicles (400 million) + illegal records (2.6 billion) + detection records (1.1 billion) (20 minutes)
 Buy and sell analysis: number of vehicles (400 million) + illegal records (2.6 billion) + number of drivers who cleared the license point (110 million)

Serving 400 million vehicles in provinces and cities in China, the traffic big data analysis platform analyzes 2.6 billion illegal records and 12.6 billion traffic records, greatly improving the security and orderly management capability of cross-province traffic and reaching the world's leading level.

Limitations of Traditional Marketing Systems

Low accuracy
 Customer groups are obtained through data collection and filtering, which is time-consuming and labor-consuming.
 Precise sales cannot be implemented.

Non-real-time
 Advertisements can be pushed only according to the preset rules.
 Real-time marketing by event or location cannot be implemented.

[Two further groups; their labels are not preserved.]
 Mainly structured data, unable to handle semi-structured data.
 Customer behavior involved in rule operation and configurations, low support rate.

 Marketing strategies and rules are fixed. New rules need to be developed and implemented.

Marketing System Architecture
[Figure: layered architecture, top to bottom.]
 Application layer: marketing plan, marketing execution, marketing analysis, statistical analysis, scheduling monitoring, ...
 Model layer: marketing model, event detection model, recommendation engine, rule engine
 Ark: Chinasoft big data middleware (Ark)
 Big data platform: Huawei enterprise-class big data platform (FusionInsight)
   Real-time stream processing component: Flume, Storm/Flink, Kafka, Redis, ZooKeeper
   Offline processing component: Spark, Loader, Hive, HBase, MapReduce, HDFS/Yarn, ZooKeeper
   FusionInsight Farmer RTD: Farmer, MPPDB, RTD, MQ, Redis
   Manager
 Infrastructure/Cloud platform: x86 server, ..., x86 server, network device, security device

Big Data Analysis, Mining, and Machine Learning Make Marketing More Accurate
[Figure: closed-loop marketing process.]
 Stages: data analysis, predictive modeling, model application, model effect monitoring and evaluation
 Flow: data source (customer data), customer group filtering and correlation analysis, marketing activity plan, marketing activity over multiple channels (SMS, App, Twitter), effect evaluation and continuous optimization (analysis report)
 Feedback loop: model effect evaluation, customer data update, and model improvement

Solution Benefits
Precise marketing

Precise: precise customer group mining
 Customer-based 360-degree view
 Customer type-based mining

Easy to use: self-learning of rules
 Customizable/Development variables, rules, and rule modes
 Rule auto-learning and optimization

Comprehensive: supporting various types of data
 Support of various types of data (structured, unstructured, and semi-structured)
 Support of multi-channel comprehensive analysis
 Support of statistics analysis

Reliable: uninterrupted services
 Always-on service

Real-time: real-time marketing information push
 Event-based
 Location-based
 Millisecond-level analysis based on full data

A Carrier: Big Data Convergence to Achieve Big Values
[Figure: layered architecture, top to bottom.]
 Applications: credit investigation, crowd gathering, service experience quality computing, ..., signaling log query, Internet access log query, domain name log query, ...
 Platforms: basic analysis platform, real-time query platform
 Interfaces: KV interface, SQL interface
 Hadoop resource pool: Hive, Spark SQL, MapReduce, Spark, HBase, HDFS, Yarn/ZooKeeper, Manager
 ETL
 Data source: traditional data (BOM), new data (Internet)

Philippine PLDT: Converting and Archiving Massive CDRs
[Figure: data flow, top to bottom.]
 Applications: report, interactive analysis, forecast analysis, text mining; CSP
 Data Federation across DWH, Hadoop Archiving, and CSSD
 Aggregation: Periodically obtain the source files from the transit server, convert the files to the T0/T1 format, and upload the converted files to the CSSD/DWH server.
 Structured data: SUN, NSN, E///, PLP, ODS, ..., AURA
 Unstructured data: voice, mobile Internet, social media, text, ...

Hadoop stores original CDRs and structured and unstructured data, improving storage capacity and processing performance, and reducing hardware costs.
A total of 1.1 billion records (664,300 MB) are extracted, converted, and loaded at an overall processing speed of 113 MB/s, much higher than the 11 MB/s expected by the customer.

Summary
 These slides describe the enterprise edition of Huawei FusionInsight
HD, focus on FusionInsight HD features and application scenarios, and
describe Huawei FusionInsight HD success cases in the industry.

Quiz
1. What are the features of FusionInsight HD?

2. Which encryption algorithms are supported by Hive/HBase fine-grained encryption?

3. A large number of small files are stored in the Hadoop HDFS, which
brings great pressure to the NameNode. HBase stores a large number
of small files, and Compaction wastes I/O resources. What are the
technical solutions to this problem?

4. What are the levels of logs that can be adjusted?

Quiz
1. True or False
① Hive supports encryption of tables and columns. HBase supports encryption of
tables, column families, and columns. (T or F)
② User rights management is role-based access control and provides visualized and
unified user rights management for multiple components. (T or F)
2. Multiple-Answer Question
Which of the following indicate the high reliability of FusionInsight HD? ( )
A. All components are free of SPOFs.
B. All management nodes support HA.
C. Health status monitoring for the software and hardware
D. Network plane isolation

More Information
 Training materials:
   http://support.huawei.com/learning/Certificate!showCertificate?lang=en&pbiPath=term1000025450&id=Node1000011796
 Exam outline:
   http://support.huawei.com/learning/Certificate!toExamOutlineDetail?lang=en&nodeId=Node1000011797
 Mock exam:
   http://support.huawei.com/learning/Certificate!toSimExamDetail?lang=en&nodeId=Node1000011798
 Authentication process:
   http://support.huawei.com/learning/NavigationAction!createNavi#navi[id]=_40

Thank You
www.huawei.com
