Module 13 FusionInsight HD Solution Overview
www.huawei.com
Copyright © 2018 Huawei Technologies Co., Ltd. All rights reserved. Page 2
Contents
1. FusionInsight Overview
2. FusionInsight Features
Apache Hadoop: A Prosperous Open-Source Ecosystem
Big Data Is an Important Pillar of Huawei's ICT Strategy
Figure: Huawei Strategy Map, covering content and apps (global content distribution, third-party apps and ISV partners, enterprise apps, SDP, BSS/OSS, and professional service), the network (IP+Optical, enterprise, FBB, and MBB), and things and people (M2M modules and smart devices). Source: Huawei corporate presentation.
Huawei Big Data R&D Team
There are eight research centers with thousands of employees around the world.
The team includes world-class data mining and artificial intelligence experts, such as PMC committers and IEEE Fellows.
FusionInsight HD: From Open-Source to Enterprise Versions
Figure: enterprise enhancements to the open-source versions, including version mapping, security, and configuration.
FusionInsight Platform Architecture
Industry solutions: power, financial industry, safe city, telecom industry, and big data cloud services
Big data cloud services:
Data integration services: Data Ingest services, DPS services, ...
Data processing services: MapReduce Service (MRS), CloudTable, ...
Real-time computing services: Stream services, RTD services, ...
Data analysis services: DWS, MOLAP services, ...
Machine learning services: MLS, log analysis, ...
Artificial Intelligence Service (AIS): image tagging service, NLP service, ...
FusionInsight Porter data integration: Sqoop (batch collection) and Flume (real-time collection)
FusionInsight Miner data insight: Weaver graph analysis engine and Miner Studio mining platform
FusionInsight Farmer data intelligence: RTD real-time decision engine and Farmer Base reasoning framework
FusionInsight HD data processing: Spark one-stop analysis framework, FusionInsight Elk standard SQL engine, Storm/Flink stream processing framework, ZooKeeper collaboration service, Kafka message queue, Yarn resource management, CarbonData new file format, HBase NoSQL database, HDFS distributed file system, and Oozie job scheduling
FusionInsight LibrA: parallel database
FusionInsight Manager management platform: security, performance, fault, tenant, and configuration management
Contribution to the Open-Source Community
The Apache open-source Hadoop community ecosystem has a large number of components and a large code base, frequent component updates, and efficient feature integration. Capability levels in this ecosystem, from basic to advanced:
1. Be able to use Hadoop.
2. Locate peripheral code problems.
3. Be able to resolve kernel-level problems (outstanding individuals).
4. Be able to resolve kernel-level problems by teams.
5. Perform kernel-level development to support key service features.
6. Lead the community to complete future-oriented kernel-level feature development.
7. Create top community projects and be recognized by the ecosystem.
System and Data Reliability
Security
Fully open-source component enhancement
Authentication management of user permissions
Data integrity verification
Network Security and Reliability: Dual-Plane Networking
Figure: App-Servers, the OMS-Server, and the Hadoop cluster connected by the cluster service plane, the cluster management plane, and the maintenance network outside the cluster.
Network types:
Cluster service plane (trustworthiness: high): used by the Hadoop cluster core components to store and transfer service data.
Maintenance network outside the cluster (trustworthiness: low): only web services provided by the OMS server can be accessed.
Visualized Cluster Management, Simplifying O&M
Graphical Health Check Tool (1)
Graphical Health Check Tool (2)
Figure: qualification ratio of inspection items.
Easy Development
Native HBase APIs:
try {
    table = new HTable(conf, TABLE);
    // 1. Generate the RowKey.
    {......}
    // 2. Create a Put instance.
    Put put = new Put(rowKey);
    // 3. Convert columns into qualifiers (cold columns need to be merged).
    // 3.1. Add hot columns.
    {.......}
    // 3.2. Merge cold columns.
    {.......}
    put.add(COLUMN_FAMILY, Bytes.toBytes("QA"), hotCol);
    // 3.3. Add cold columns.
    put.add(COLUMN_FAMILY, Bytes.toBytes("QB"), coldCols);
    ...

Enhanced APIs:
try {
    table = new ClusterTable(conf, CLUSTER_TABLE);
    // 1. Create a CTRow instance.
    CTRow row = new CTRow();
    // 2. Add columns.
    {........}
    // 3. Put into HBase.
    table.put(TABLE, row);
} catch (IOException e) {
    // No need to handle connection re-creation.
    ...

Figure: the Enhanced HBase SDK on top of HBase, with a recoverable connection manager, table schema design, and data tools.
FusionInsight Spark SQL
SQL compatibility: all 99 TPC-DS queries (based on the SQL:2003 standard) pass.
Data update and deletion: Spark SQL supports data insertion, update, and deletion when the CarbonData file format is used.
Large-scale stability and high performance: the long-term stability of Spark is tested with TPC-DS at a scale of 100 TB of data.
Long-term stability test:
Memory optimization: resolves memory leaks, decentralizes broadcasting, and optimizes Spark heap memory.
Communication optimization: RPC enhancement, shuffle fetch optimization, and shuffle network configuration.
Scheduling optimization: acceleration of GetSplits() and AddPendingTask(), and DAG serialization reuse.
Extreme pressure test: 24/7 pressure test and HA test.
O&M enhancement: log security review and DAG UI optimization.
Spark SQL Multi-Tenant
Figure: JDBC and Beeline clients connect through Spark JDBC proxies (Proxy 1, Proxy 2, ..., Proxy X) to Spark JDBCServers that run in each tenant's Yarn queue (Tenant A: JDBCServer 1 and JDBCServer 2; Tenant B: JDBCServer 1 and JDBCServer 2).
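The figure implies that each tenant's sessions reach only JDBCServer instances running in that tenant's own Yarn queue. The following is a minimal Java sketch of such a proxy-side routing rule. The round-robin policy, the class name TenantJdbcRouter, and the server addresses are assumptions for illustration, not the FusionInsight proxy implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical tenant-aware router: each tenant owns a pool of JDBCServer
// addresses (running in that tenant's Yarn queue), and sessions are spread
// round-robin across that pool only, never across other tenants' servers.
final class TenantJdbcRouter {
    private final Map<String, List<String>> serversByTenant;
    private final Map<String, AtomicInteger> cursors = new ConcurrentHashMap<>();

    TenantJdbcRouter(Map<String, List<String>> serversByTenant) {
        this.serversByTenant = serversByTenant;
    }

    /** Returns the JDBCServer address that should serve the tenant's next session. */
    String route(String tenant) {
        List<String> servers = serversByTenant.get(tenant);
        if (servers == null || servers.isEmpty()) {
            throw new IllegalArgumentException("no JDBCServer registered for tenant " + tenant);
        }
        int n = cursors.computeIfAbsent(tenant, t -> new AtomicInteger()).getAndIncrement();
        // floorMod keeps the index valid even after the counter overflows.
        return servers.get(Math.floorMod(n, servers.size()));
    }
}
```

Keeping a separate counter per tenant means that a busy tenant cannot shift another tenant's sessions onto different servers.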
Spark SQL Small File Optimization
Figure: six 1 MB small files stored in HDFS and the corresponding RDD (RDD1).
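The slide does not spell out the mechanism. A common way to optimize small files in Spark SQL, however, is to pack many small files into one partition so that a single task reads several files. The following hedged sketch packs files greedily by size. The class name and the 4 MB target in the usage note are arbitrary assumptions. Open-source Spark exposes a related setting, spark.sql.files.maxPartitionBytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: greedily pack small files, in order, into partitions of at
// most targetBytes so that one task reads several small files instead of one.
final class SmallFileCoalescer {
    static List<List<Long>> pack(List<Long> fileSizes, long targetBytes) {
        List<List<Long>> partitions = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Start a new partition when this file would overflow the current one.
            if (!current.isEmpty() && currentBytes + size > targetBytes) {
                partitions.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size); // a file larger than the target ends up in a partition of its own
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            partitions.add(current);
        }
        return partitions;
    }
}
```

For example, packing twelve 1 MB files with a 4 MB target yields three partitions of four files each, so three tasks run instead of twelve.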
Apache CarbonData: Converging Data Formats of the Data Warehouse (1)
CarbonData: a single file format meets the requirements of different access types.
Figure: OLAP (multidimensional analysis), sequential access (large-scale scanning), and random access (small-range scanning).
Apache CarbonData: Converging Data Formats of the Data Warehouse (2)
Apache Incubator project since June 2016
Apache releases: 4 stable releases
Figure: compute and storage layers.
In production: CarbonData supports IUD (insert, update, and delete) statements and provides data update and deletion capabilities in big data scenarios. Pre-generated dictionaries and batch sort improve CarbonData import efficiency, while global sort improves query efficiency and concurrency.
CarbonData Enhancement
Quick query response: CarbonData features high-performance queries, and its query speed is ten times that of Spark SQL. Its dedicated data format is designed for high-performance queries and uses multiple index technologies, global dictionary encoding, and multiple push-down optimizations, so it responds quickly to queries over terabytes of data.
Efficient data compression: CarbonData combines lightweight and heavyweight compression algorithms. This saves 60% to 80% of storage space and significantly reduces hardware storage costs.
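Global dictionary encoding replaces repeated string values with small integer codes drawn from one dictionary shared by all data files. Filters and group-bys can then compare integers instead of strings. The following is a simplified Java sketch that assumes an in-memory dictionary. The class name is hypothetical, and CarbonData's actual dictionary files and on-disk format are not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical dictionary encoder: each distinct value gets the next
// integer code, and one dictionary is shared across all files ("global").
final class GlobalDictionary {
    private final Map<String, Integer> codes = new HashMap<>();
    private final List<String> values = new ArrayList<>(); // code -> value

    int encode(String value) {
        return codes.computeIfAbsent(value, v -> {
            values.add(v);
            return values.size() - 1;
        });
    }

    String decode(int code) {
        return values.get(code);
    }

    /** Encodes a whole column, so the column can be stored as compact integers. */
    int[] encodeColumn(List<String> column) {
        int[] out = new int[column.size()];
        for (int i = 0; i < column.size(); i++) {
            out[i] = encode(column.get(i));
        }
        return out;
    }
}
```

Encoding Shenzhen, Beijing, Shenzhen, Shanghai yields the codes 0, 1, 0, 2. Low-cardinality columns therefore shrink to small integers that also compress well.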
Flink: Distributed Real-Time Processing System
Flink is a distributed real-time processing system with low (millisecond-level) latency, high throughput, and high reliability. Huawei promotes Flink in the IT field, and Flink is integrated into FusionInsight HD as a commercial offering.
Flink is a unified computing framework that supports both batch processing and stream processing. It provides a stream data processing engine that supports data distribution and parallel computing. Flink focuses on stream processing and is one of the industry's top open-source stream processing engines. It suits low-latency data processing scenarios and provides high-concurrency pipelined data processing with millisecond-level latency and high reliability.
Visible HBase Modeling
Figure: mapping service data columns to the HBase data model.
User list: each column indicates an attribute of service data.
Column family: a collection of columns that have service association relationships.
HBase column (qualifier): each column indicates a KeyValue.
HBase Cold Field Merging Transparent to Applications
Figure: user data columns (ID, Name, Phone, and ColA to ColH) stored as a smaller number of HBase KeyValues (A, B, C, and D).
Problems
As the number of HBase columns increases, the data expansion rate rises and query performance degrades.
Merging cold data columns in the application layer increases development complexity and metadata maintenance effort.
Features
Cold field merging transparent to applications
Real-time write and batch import interfaces
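To illustrate what merging cold fields into one qualifier means, here is a hypothetical codec that packs rarely queried fields into a single value and unpacks them on read. The separator characters and the class name are assumptions. The sketch also assumes that field names and values never contain those separators. FusionInsight's actual encoding is not described on the slide.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical codec: many cold fields become one byte[] that can be written as a
// single HBase qualifier value, instead of one KeyValue per field.
final class ColdFieldCodec {
    private static final char FIELD_SEP = '\u0001'; // between fields (assumed unused in data)
    private static final char KV_SEP = '\u0002';    // between name and value (assumed unused)

    static byte[] merge(Map<String, String> coldFields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : coldFields.entrySet()) {
            if (sb.length() > 0) sb.append(FIELD_SEP);
            sb.append(e.getKey()).append(KV_SEP).append(e.getValue());
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    static Map<String, String> split(byte[] merged) {
        Map<String, String> fields = new LinkedHashMap<>();
        String s = new String(merged, StandardCharsets.UTF_8);
        if (s.isEmpty()) return fields;
        for (String pair : s.split(String.valueOf(FIELD_SEP))) {
            int i = pair.indexOf(KV_SEP);
            fields.put(pair.substring(0, i), pair.substring(i + 1));
        }
        return fields;
    }
}
```

Making this transparent to applications means that the SDK, rather than each application, calls merge on write and split on read.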
Hive/HBase Fine-Grained Encryption
Figure: sensitive data written to and read from Hive/HBase passes through encryption/decryption and is stored in HDFS as ciphertext, while insensitive data is stored without encryption.
Application scenarios
Data saved in plaintext poses a risk of sensitive data leakage.
Solution
Hive encryption of tables and columns
HBase encryption of tables, column families, and columns
AES and SM4 encryption algorithms, as well as user-defined encryption algorithms
Customer benefits
Sensitive data is encrypted and stored by table or column.
Algorithm diversity and system security
Encryption and decryption are transparent to services.
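Column-level encryption can be sketched with the JDK's built-in AES-GCM cipher. This illustrates the idea under stated assumptions and is not FusionInsight's implementation. SM4 is not available in the standard JDK providers, and key storage, key rotation, and the user-defined algorithm plug-in are out of scope. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical cell-level cipher using AES-GCM. A fresh random IV is generated per
// value and stored in front of the ciphertext, so every cell decrypts on its own.
final class ColumnCipher {
    private static final int IV_BYTES = 12;  // recommended IV size for GCM
    private static final int TAG_BITS = 128; // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();
    private final SecretKeySpec key;

    ColumnCipher(byte[] rawKey) {
        this.key = new SecretKeySpec(rawKey, "AES"); // 16, 24, or 32 bytes
    }

    byte[] encrypt(String plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = c.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[IV_BYTES + ct.length]; // layout: IV || ciphertext+tag
            System.arraycopy(iv, 0, out, 0, IV_BYTES);
            System.arraycopy(ct, 0, out, IV_BYTES, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    String decrypt(byte[] stored) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            byte[] pt = c.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed (wrong key or tampered data)", e);
        }
    }
}
```

Because a fresh IV is used for each value, encrypting the same value twice produces different ciphertexts. This hides repeated values in HDFS, but it also means that encrypted columns cannot be compared for equality without first decrypting them.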
HBase Secondary Indexing
Figure: a query on colA scans a range of the index table UserTable_idx and then reads the destination rows from UserTable.
UserTable (RowKey: colA): a0001: 01, a0002: 02, a0003: 06, a0004: 08, a0005: 04, a0006: 03
UserTable_idx rowkeys: a0001#coluA01#a0001, a0001#coluA02#a0002, a0001#coluA03#a0006, a0001#coluA04#a0005, a0001#coluA06#a0003, a0001#coluA08#a0004
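Reading the figure, each index rowkey appears to combine three parts: a prefix (a0001, assumed here to be the region start key), the indexed column and its value, and the user rowkey. Because HBase keeps rowkeys sorted, an equality query becomes a short prefix scan. The following sketch simulates the index table with a Java TreeMap. The key layout and class name are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical index table with rowkeys of the form
// <region start key>#<column><value>#<user rowkey>. A TreeMap stands in for
// HBase's sorted rowkey order.
final class SecondaryIndex {
    private final String regionStartKey;
    private final TreeMap<String, String> indexTable = new TreeMap<>();

    SecondaryIndex(String regionStartKey) {
        this.regionStartKey = regionStartKey;
    }

    void put(String rowKey, String column, String value) {
        indexTable.put(regionStartKey + "#" + column + value + "#" + rowKey, rowKey);
    }

    /** Returns user rowkeys whose column equals value, via a prefix scan of the index. */
    List<String> lookup(String column, String value) {
        String prefix = regionStartKey + "#" + column + value + "#";
        List<String> rowKeys = new ArrayList<>();
        for (Map.Entry<String, String> e : indexTable.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break; // left the scanning range
            rowKeys.add(e.getValue());
        }
        return rowKeys;
    }
}
```

A real index must also delete the old index row when a user row's indexed value changes. That bookkeeping is omitted here.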
CTBase Simplifies HBase Multi-Table Service Development
Figure: the Transaction and AccountInfo tables stored together in CTBase.
Transaction table (Account_id, Amount, Time): A0001, $100, 12/12/2014 18:00:02; A0001, $1020, 10/12/2014 15:30:05; A0001, $89, 09/12/2014 13:00:07; A0002, $105, 11/12/2014 20:15:00
AccountInfo table: A0001, Andy, $100232; A0002, Lily, $902323
In CTBase, each AccountInfo record is followed by the transaction records of the same account.
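The idea behind storing both tables together can be sketched with a sorted map standing in for an HBase table. If rowkeys start with the account ID, an account's info row and its transaction rows sit next to each other, and one range scan returns them all. The rowkey layout (account ID, record type, timestamp) and the class name are hypothetical. Timestamps are written year-first so that they sort chronologically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical clustered layout: two logical tables share one sorted keyspace.
// Rowkey = accountId#0#info for the account row, accountId#1#<time> for transactions.
final class ClusteredStore {
    private final TreeMap<String, String> rows = new TreeMap<>();

    void putAccount(String accountId, String name, String balance) {
        rows.put(accountId + "#0#info", name + "," + balance);
    }

    void putTransaction(String accountId, String isoTime, String amount) {
        rows.put(accountId + "#1#" + isoTime, amount);
    }

    /** One range scan returns the account row followed by its transactions in time order. */
    List<String> scanAccount(String accountId) {
        // '$' is the character right after '#', so [id#, id$) covers every id# rowkey.
        SortedMap<String, String> range = rows.subMap(accountId + "#", accountId + "$");
        return new ArrayList<>(range.values());
    }
}
```

Using the figure's data, scanning A0001 returns Andy's account row followed by the $89 and $100 transactions, while A0002's rows are never touched.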
HFS Small File Storage and Retrieval Engine
Application scenario
A large number of small files and their associated description information need to be stored.
Current problem
Storing a large number of small files in the Hadoop Distributed File System (HDFS) puts great pressure on the NameNode. Storing them in HBase instead causes compaction to waste I/O resources.
Figure: metadata and small files versus medium/large files.
HFS solution value
The HFS stores not only small files but also the metadata description information related to the files.
The HFS provides a unified and friendly access API.
The HFS selects the optimal storage solution based on the file size.
Small files are stored directly in HBase Medium-sized Objects (MOB).
Large files are stored directly in HDFS.
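The size-based choice described above reduces to a threshold test. The following is a minimal sketch. The class name is hypothetical, and the threshold is an assumed parameter because the slide does not say where HFS draws the line between small and large files.

```java
// Hypothetical sketch of HFS-style placement by file size. The threshold is an
// assumed constructor parameter, not a documented FusionInsight value.
final class HfsPlacement {
    enum Store { HBASE_MOB, HDFS }

    private final long mobThresholdBytes;

    HfsPlacement(long mobThresholdBytes) {
        this.mobThresholdBytes = mobThresholdBytes;
    }

    Store choose(long fileSizeBytes) {
        // Small files go to HBase MOB to spare the NameNode; large files go to HDFS.
        return fileSizeBytes <= mobThresholdBytes ? Store.HBASE_MOB : Store.HDFS;
    }
}
```

With a 10 MB threshold, a 1 MB file goes to HBase MOB and a 100 MB file goes to HDFS. Either way, the caller sees the same access API.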
Label-based Storage
The data of online applications is stored only on nodes labeled "Online Application" and is isolated from the data of offline applications. This design prevents I/O competition and improves the local hit ratio.
Figure: with HDFS common storage, online and offline (batch processing) applications share nodes, and I/O conflicts affect online services; with HDFS label-based storage, each kind of application uses its own labeled nodes.
Solution description: Label cluster nodes based on applications or physical characteristics, for example, label a node "Online Application". Application data is then stored only on nodes with the specified labels.
Application scenarios: 1. Online and offline applications share a cluster. 2. Specific services (such as online applications) run on specific nodes.
Customer benefits: 1. I/Os of different applications are isolated to ensure the application SLA. 2. System performance improves because the hit ratio of application data increases.
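Label-based placement amounts to filtering candidate DataNodes by label before choosing replicas. The following is a hedged sketch. The node names, label strings, class name, and the simple rule of picking the first eligible nodes are assumptions. Real HDFS placement also weighs racks and load.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical label-aware placement: only nodes carrying the required label are
// eligible, so data of online applications never lands on offline nodes.
final class LabelPlacement {
    /** Nodes that carry the required label, sorted by name for a deterministic order. */
    static List<String> candidates(Map<String, Set<String>> labelsByNode, String requiredLabel) {
        return labelsByNode.entrySet().stream()
                .filter(e -> e.getValue().contains(requiredLabel))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    /** Picks target nodes for one block's replicas, failing if too few labeled nodes exist. */
    static List<String> place(Map<String, Set<String>> labelsByNode, String requiredLabel, int replicas) {
        List<String> eligible = candidates(labelsByNode, requiredLabel);
        if (eligible.size() < replicas) {
            throw new IllegalStateException(
                    "only " + eligible.size() + " nodes labeled " + requiredLabel);
        }
        // Simplification: take the first eligible nodes; real HDFS also considers racks and load.
        return eligible.subList(0, replicas);
    }
}
```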
Label-based Scheduling
Figure: Spark and MapReduce applications under common scheduling and under label-based scheduling, with nodes labeled "Large memory" and "Default".
CPU Resource Configuration Period Adjustment
Figure: real-time applications (HBase) and batch processing applications (Hive/Spark/...) run in separate cgroups (Cgroup1 and Cgroup2) whose CPU shares switch at 7:00 and 20:00.
Solution description: Different services receive different proportions of resources in different time segments. For example, from 7:00 to 20:00, the peak hours of real-time services, real-time services can be allocated 60% of resources. From 20:00 to 7:00, when real-time services are off-peak, 80% of resources can be allocated to batch processing applications.
Application scenario: Different services have different peak and off-peak hours.
Customer benefit: Services can obtain as many resources as possible at peak hours, boosting the average resource utilization of the system.
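The time-based split in the example above can be expressed as a small policy function. This sketch only computes the percentages given on the slide. How they would be applied (for example, as Linux cgroup CPU weights) is an assumption and is not shown, and the class name is hypothetical.

```java
import java.time.LocalTime;

// Hypothetical policy for the slide's example: 7:00-20:00 is peak time for
// real-time services (60% real-time, 40% batch); 20:00-7:00 favors batch
// processing (20% real-time, 80% batch).
final class CpuSharePolicy {
    private static final LocalTime PEAK_START = LocalTime.of(7, 0);
    private static final LocalTime PEAK_END = LocalTime.of(20, 0);

    /** Returns {realTimePercent, batchPercent} for the given time of day. */
    static int[] sharesAt(LocalTime t) {
        boolean peak = !t.isBefore(PEAK_START) && t.isBefore(PEAK_END); // [7:00, 20:00)
        return peak ? new int[] {60, 40} : new int[] {20, 80};
    }
}
```

A scheduler would call sharesAt periodically and push the new weights to the two cgroups, so the switch at 7:00 and 20:00 needs no service restart.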
Resource Distribution Monitoring
Benefits
Quickly focus on the most critical resource consumption.
Quickly locate the node with the highest resource consumption and take appropriate measures.
Dynamic Adjustment of the Log Level
Application scenario: When a fault occurs in the Hadoop cluster, locating it quickly often requires changing the log level. Normally, a new log level takes effect only after the process is restarted, which interrupts services. How can this be avoided?
Solution: Adjust the log level dynamically on the WebUI.
Benefits: When locating a fault, you can quickly change the log level of a specified service or node without restarting processes or interrupting services.
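The key idea is that a logger's level is mutable state inside the running process, so it can be changed without a restart. The following sketch uses java.util.logging as a dependency-free stand-in for the log4j framework that Hadoop components use, and the class name is hypothetical. For comparison, open-source Hadoop offers the hadoop daemonlog -setlevel command for a similar purpose.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical runtime log-level controller. A WebUI handler could call setLevel;
// the running process keeps serving requests while the level changes.
final class LogLevelController {
    // java.util.logging holds loggers weakly; keep strong references so the new
    // level is not lost if the logger is garbage-collected.
    private static final Map<String, Logger> PINNED = new ConcurrentHashMap<>();

    /** Sets the level (for example "FINE") and returns the previous one, which may be null. */
    static Level setLevel(String loggerName, String levelName) {
        Logger logger = PINNED.computeIfAbsent(loggerName, name -> Logger.getLogger(name));
        Level previous = logger.getLevel();
        logger.setLevel(Level.parse(levelName));
        return previous;
    }
}
```

Returning the previous level lets the caller restore the original verbosity once the fault has been located.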
Wizard-based Cluster Data Backup
Wizard-based Cluster Data Restoration
Multi-Tenant Management
Multi-level tenant management: a department maps to a tenant (Dept. A to Tenant A), and a sub-department maps to a sub-tenant (Sub-department A_1 to Tenant A_1).
Computing resources: Yarn queue (CPU/memory/I/O)
Storage resources: HDFS (storage space/file overview)
One-Stop Tenant Management
Visualized, Centralized User Rights Management
Visualized, centralized user rights management is easy to use, flexible, and refined:
Easy to use: visualized, unified user rights management across multiple components
Flexible: role-based access control (RBAC) and predefined privilege sets (roles) that can be reused
Refined: multi-level (database/table/column-level) and fine-grained (Select/Delete/Update/Insert/Grant) authorization
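The RBAC model described above, with privileges at the database, table, and column levels, can be sketched as follows. The resource naming (db, db.table, db.table.column), the class name, and the rule that a privilege on a parent resource covers its children are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical RBAC sketch: privileges are granted to roles, roles to users, and a
// check succeeds if any of the user's roles holds the action on the resource or
// on one of its parents (column -> table -> database).
final class RbacAuthorizer {
    enum Action { SELECT, INSERT, UPDATE, DELETE, GRANT }

    private final Map<String, Set<String>> rolePrivileges = new HashMap<>();
    private final Map<String, Set<String>> userRoles = new HashMap<>();

    void grantToRole(String role, Action action, String resource) {
        rolePrivileges.computeIfAbsent(role, r -> new HashSet<>()).add(action + "@" + resource);
    }

    void assignRole(String user, String role) {
        userRoles.computeIfAbsent(user, u -> new HashSet<>()).add(role);
    }

    /** resource is "db", "db.table", or "db.table.column". */
    boolean isAllowed(String user, Action action, String resource) {
        for (String role : userRoles.getOrDefault(user, Set.of())) {
            Set<String> privileges = rolePrivileges.getOrDefault(role, Set.of());
            for (String r = resource; r != null; r = parent(r)) {
                if (privileges.contains(action + "@" + r)) return true;
            }
        }
        return false;
    }

    private static String parent(String resource) {
        int i = resource.lastIndexOf('.');
        return i < 0 ? null : resource.substring(0, i);
    }
}
```

Because privileges attach to roles rather than to users, a role such as analyst can be defined once and reused for many users.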
Automatic NTP Configuration
Figure: the active management node synchronizes with an external NTP server as an NTP client and acts as the NTP server for the cluster; the standby management node, the data nodes, and the control node are NTP clients.
Automatically Configuring Mapping of Hosts
Benefits
Shorten the environment preparation time for installing the Hadoop cluster.
Reduce the probability of user configuration errors.
Reduce the risk that manually configuring mappings after capacity expansion in a large-scale cluster poses to stably running nodes.
Rolling Restart/Upgrade/Patch
Modifying a configuration, performing an upgrade, and installing a patch are all done without interrupting services.
HDFS rolling upgrade example: a cluster with a client, ZooKeeper, HDFS (NameNodes), Yarn, HBase, and Storm is upgraded from C60 to C70.
Service interruption duration of core components: no interruption during a 12-hour upgrade.
Huawei Smart Transportation Solution
Secure
Challenges to key vehicle identification: insufficient capability of automatic key vehicle identification
Insufficient traffic accident detection capability: blind spots, weak detection technology, and manual accident reporting and handling
Low efficiency of special attacks: information fragmentation and a poor special attack platform
Organized
Challenges to checkpoint and e-police capabilities: rigid algorithms
Challenges to violation review and handling capabilities: heavy workload
Challenges to special attack data analysis capabilities: manual analysis taking 7 to 30 days
Smooth
Challenges to traffic detection capability: faulty detection devices, low detection efficiency, and unreliable detection results
Challenges to traffic analysis capabilities: traffic information not shared among cities
Challenges to traffic signal optimization
Intelligent
Computing intelligence challenges: closed systems and technology, and fragmented information
Perceptual intelligence challenges: weak awareness of traffic, events, and violations
Cognitive intelligence challenges: lack of traffic awareness in regions and intersections
Traffic Awareness in the Whole City: Deep Learning and Digital Transformation
Without adding any cameras, deep learning and intelligent analysis produce about 50 billion additional real-time road traffic parameters every month, laying a foundation for the digital transformation of traffic.
Platform: a video cloud storage and cloud computing platform, a traffic big data special attack modeling engine, and a time and space analysis engine.
Coverage: more than 6000 monitored roads, more than 4000 traffic checkpoints, and more than 3000 channels of HD e-police.
Note: The preceding figures use a city as an example.
Traffic Big Data Analysis Platform
Limitations of Traditional Marketing Systems
Marketing System Architecture
Figure:
Middleware: Chinasoft big data middleware (Ark)
Big data platform: Flume, Kafka, Storm/Flink, Spark, Loader, Hive, HBase, MapReduce, HDFS/Yarn, ZooKeeper, Redis, Farmer, RTD, MPPDB, MQ, and Manager
Infrastructure/cloud platform: x86 servers, network devices, and security devices
Big Data Analysis, Mining, and Machine Learning Make Marketing More Accurate
Figure: data analysis, predictive modeling, model application, and model effect monitoring and evaluation.
Solution Benefits
Precise: precise customer group mining, with a customer-based 360-degree view and customer type-based mining
Easy to use: self-learning of rules, with customizable variables, rules, and rule modes, and automatic rule learning and optimization
A Carrier: Big Data Convergence to Achieve Big Values
Figure:
Applications: crowd gathering computing, credit investigation, service experience quality query, Internet access log query, domain name log query, signaling log query, ...
Platform: Manager, MapReduce, Spark, Yarn/ZooKeeper, HBase, and HDFS
Data sources, loaded through ETL
Philippine PLDT: Converting and Archiving Massive CDRs
Figure: structured data sources (SUN, NSN, E///, PLP, ODS, ..., AURA) and unstructured data (voice, Internet, mobile, social media, text, ...) feed a data federation layer that supports reports, interactive analysis, forecast analysis, and text mining for the CSP.
The system periodically obtains source files from the transit server, converts them to the T0/T1 format, and uploads the converted files to the CSSD/DWH server.
Hadoop stores the original CDRs as well as structured and unstructured data, improving storage capacity and processing performance while reducing hardware costs.
A total of 1.1 billion records (664,300 MB) are extracted, converted, and loaded at an overall processing speed of 113 MB/s, much higher than the 11 MB/s expected by the customer.
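As a rough check of what the two throughput figures mean for this data volume, assume that the whole 664,300 MB is processed at the stated average rate:

```latex
t_{113} = \frac{664\,300\ \text{MB}}{113\ \text{MB/s}} \approx 5\,879\ \text{s} \approx 1.6\ \text{h},
\qquad
t_{11} = \frac{664\,300\ \text{MB}}{11\ \text{MB/s}} \approx 60\,391\ \text{s} \approx 16.8\ \text{h}
```

The achieved rate therefore cuts the processing time by a factor of about 10 (113 / 11 ≈ 10.3).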
Summary
These slides describe the enterprise edition of Huawei FusionInsight
HD, focus on FusionInsight HD features and application scenarios, and
describe Huawei FusionInsight HD success cases in the industry.
Quiz
1. What are the features of FusionInsight HD?
3. Storing a large number of small files in HDFS puts great pressure on the NameNode, and storing them in HBase causes compaction to waste I/O resources. What are the technical solutions to this problem?
Quiz
1. True or False
① Hive supports encryption of tables and columns. HBase supports encryption of
tables, column families, and columns. (T or F)
② User rights management is role-based access control and provides visualized and
unified user rights management for multiple components. (T or F)
2. Multiple-Answer Question
Which of the following indicate the high reliability of FusionInsight HD? ( )
A. All components are free of SPOFs.
B. All management nodes support HA.
C. Health status monitoring for the software and hardware
D. Network plane isolation
More Information
Training materials:
http://support.huawei.com/learning/Certificate!showCertificate?lang=en&pbiPath=term1000025450&id=Node1000011796
Exam outline:
http://support.huawei.com/learning/Certificate!toExamOutlineDetail?lang=en&nodeId=Node1000011797
Mock exam:
http://support.huawei.com/learning/Certificate!toSimExamDetail?lang=en&nodeId=Node1000011798
Authentication process:
http://support.huawei.com/learning/NavigationAction!createNavi#navi[id]=_40
Thank You