Module 13 FusionInsight HD Solution Overview

The document provides an overview of Huawei's FusionInsight HD, a big data solution that includes features, architecture, and success cases. It highlights the platform's capabilities in data integration, processing, and analysis, as well as its contributions to the open-source community. Additionally, it discusses system reliability, security measures, and advanced functionalities such as Spark SQL and Apache CarbonData for optimized data handling.

Uploaded by

Lucas Oliveira

FusionInsight HD

Solution Overview

www.huawei.com

Copyright © 2018 Huawei Technologies Co., Ltd. All rights reserved.

Objectives
• After completing this course, you will be able to understand:
  - Huawei big data solution FusionInsight HD
  - The features of FusionInsight HD
  - Success cases of FusionInsight HD

2. FusionInsight Features

3. Success Cases of FusionInsight

Apache Hadoop - Prosperous Open-Source Ecosystem

Big Data Is an Important Pillar for Huawei ICT Strategy

Huawei Strategy Map
[Figure: Huawei strategy map. Labels, top to bottom: Content and App (third-party partners, third-party ISVs); Global Distribution; Enterprise Apps, SDP, BSS/OSS, Professional Service; Big Data Analytics Platform; Data Center Infrastructure; Core Network; IP+Optical; Enterprise Network, FBB, MBB; Things (M2M Module), People (Smart Device).]
Source: Huawei corporate presentation

Huawei Big Data R&D Team
• There are eight research centers with thousands of employees around the world.
• World-class data mining and artificial intelligence experts, such as PMC Committer and IEEE Fellow

FusionInsight HD: From Open-Source to Enterprise Versions
[Figure: evolution from the initial open-source components (Hadoop, HBase), through the prosperous open-source community, to the enterprise version. Work items shown: version mapping, baseline selection, patch selection, configuration, performance optimization, security, and log.]

FusionInsight Platform Architecture
Industries served: power, financial industry, safe city, telecom industry, and big data cloud services.

Big data cloud services
• Data integration services: Data Ingest services, DPS services, ...
• Data processing services: MapReduce Service (MRS), CloudTable
• Real-time computing services: Stream services, RTD services, ...
• Data analysis services: DWS, MOLAP services, ...
• Machine learning services: MLS, Log analysis, ...
• Artificial Intelligence Service (AIS): Image tagging service, NLP service, ...

FusionInsight Porter (data integration)
• Sqoop: batch collection
• Flume: real-time collection
• Kafka: message queue
• Oozie: job scheduling

FusionInsight Miner (data insight)
• Weaver: graphics analysis engine
• Miner Studio: mining platform

FusionInsight Farmer (data intelligence)
• RTD: real-time decision engine
• Farmer Base: reasoning framework

FusionInsight HD (data processing)
• Spark: one-stop analysis framework
• FusionInsight Elk: standard SQL engine
• Storm/Flink: stream processing framework
• ZooKeeper: collaboration service
• Yarn: resource management
• FusionInsight LibrA: parallel database
• CarbonData: new file format
• HBase: NoSQL database
• HDFS: distributed file system

FusionInsight Manager (management platform)
• Security management
• Performance management
• Fault management
• Tenant management
• Configuration management

Contribution to the Open-Source Community
Capability levels, from lowest to highest:
1. Be able to use Hadoop.
2. Locate peripheral problems.
3. Be able to resolve kernel-level problems (outstanding individuals).
4. Be able to resolve kernel-level problems by teams.
5. Perform kernel-level development to support key service features.
6. Lead the community to complete future-oriented kernel-level feature development.
7. Create top community projects and be recognized by the ecosystem.

Apache open-source community ecosystem: large number of components and codes, frequent component update, efficient feature integration.

Outstanding product development and delivery capabilities and carrier-class operation support capabilities empowered by the Hadoop kernel team

2. FusionInsight Features

3. Success Cases of FusionInsight

System Reliability
• All components without SPOF
• HA for all management nodes
• Software and hardware health status monitoring
• Network plane isolation

Data Reliability
• Cross-data center DR
• Third-party backup system integration
• Key data power-off protection
• Hot-swappable hard disks

System Security
• Fully open-source component enhancement
• Operating system security hardening

Permission Authentication
• Authentication management of user permission
• User permission control of different components

Data Security
• Data integrity verification
• File data encryption

Network Security and Reliability - Dual-Plane Networking
[Figure: App-Server, OMS-Server, and WebUI-Client nodes attached to the cluster service plane, the cluster management plane, and the maintenance network outside the cluster.]

Network Type | Trustworthiness | Description
Cluster service plane | High | Hadoop cluster core components for the storage and transfer of service data
Cluster management plane | Medium | It only manages the cluster and is involved with no service data.
Maintenance network outside the cluster | Low | Only web services provided by the OMS server can be accessed.

Visualized Cluster Management,
Simplifying O&M

Graphical Health Check Tool (2)
[Figure: two charts. Qualification ratio of inspection items (qualification ratio vs. disqualification ratio); node qualification rate (node qualification rate vs. node disqualification rate).]

Easy Development

Native APIs of HBase

    try {
        table = new HTable(conf, TABLE);
        // 1. Generate RowKey.
        {......}
        // 2. Create Put instance.
        Put put = new Put(rowKey);
        // 3. Convert columns into qualifiers
        //    (need to consider merging cold columns).
        // 3.1. Add hot columns.
        {.......}
        // 3.2. Merge cold columns.
        {.......}
        put.add(COLUMN_FAMILY, Bytes.toBytes("QA"), hotCol);
        // 3.3. Add cold columns.
        put.add(COLUMN_FAMILY, Bytes.toBytes("QB"), coldCols);

Enhanced APIs

    try {
        table = new ClusterTable(conf, CLUSTER_TABLE);
        // 1. Create CTRow instance.
        CTRow row = new CTRow();
        // 2. Add columns.
        {........}
        // 3. Put into HBase.
        table.put(TABLE, row);
    } catch (IOException e) {
        // Does not care connection re-creation.

[Figure: the Enhanced HBase SDK, with a recoverable connection manager and a table schema design tool, sits on top of the HBase API and HBase.]

The HBase table design tool, connection pool management function, and enhanced SDK are used to simplify development of complex data tables.

FusionInsight Spark SQL
• SQL compatibility – All 99 TPC-DS cases of the standard SQL:2003 are passed.
• Data update and deletion – Spark SQL supports data insertion, update, and deletion when the CarbonData file format is used.
• Large-scale Spark with stable and high performance – long-term stability is tested with TPC-DS at a data volume of 100 TB.
• Long-term stability test:
  - Memory optimization – resolves memory leakage problems, decentralizes broadcasting, and optimizes Spark heap memory.
  - Communication optimization – RPC enhancement, shuffle fetch optimization, and shuffle network configuration
  - Scheduling optimization – GetSplits() and AddPendingTask() acceleration, DAG serialization reuse
  - Extreme pressure test – 24/7 pressure test, HA test
  - O&M enhancement – Log security review and DAG UI optimization

Spark SQL Multi-Tenant
[Figure: Beeline and JDBC clients connect to the JDBCServer (Proxy) layer (Spark JDBC Proxy 1, Proxy 2, ..., Proxy X). On Yarn, the YarnQuery instances of Tenant A and Tenant B each run their own Spark JDBCServer 1 and Spark JDBCServer 2.]

The community's Spark JDBCServer supports only single tenants. A tenant is bound to a Yarn resource queue.
FusionInsight Spark JDBCServer supports multiple tenants, and resources are isolated among different tenants.

[Figure: a Text/Parquet/ORC/JSON table on HDFS is stored as 1 MB files. RDD1 has one 1 MB partition per file, and RDD2 merges them pairwise into 1 MB+1 MB partitions.]

Apache CarbonData - Converging Data Formats of Data Warehouse (1)
CarbonData: A single file format meets the requirements of different access types:
• OLAP (multidimensional analysis)
• Sequential access (large-scale scanning)
• Random access (small-range scanning)

• Random access (small-scale scanning): 7.9 to 688 times
• OLAP/Interactive query: 20 to 33 times
• Sequential access (large-scale scanning): 1.4 to 6 times

Apache CarbonData - Converging Data Formats of Data Warehouse (2)
• Apache Incubator Project since June 2016
• Apache releases
  - 4 stable releases
  - Latest 1.0.0, Jan 28, 2017
• Contributors: [logos]
• In Production: [logos]
[Figure: CarbonData sits between the compute layer and the storage layer.]

CarbonData supports IUD statements and provides data update and deletion capabilities in big data scenarios. Pre-generated dictionaries and batch sort improve CarbonData import efficiency, while global sort improves query efficiency and concurrency.

• Quick query response: CarbonData features high-performance query. The query speed of CarbonData is ten times that of Spark SQL. The dedicated data format used by CarbonData is designed for high-performance queries and includes multiple index technologies, global dictionary codes, and multiple push-down optimizations, so that it responds quickly to TB-level data queries.
• Efficient data compression: CarbonData compresses data by combining lightweight and heavyweight compression algorithms. This compression method saves 60% to 80% of data storage space, which significantly reduces hardware storage costs.

Flink – Distributed Real-Time Processing System
• Flink is a distributed real-time processing system with low latency (measured in milliseconds), high throughput, and high reliability. Huawei promotes Flink in the IT field and sells it as an integrated component of FusionInsight HD.
• Flink is a unified computing framework that supports both batch processing and stream processing. It provides a stream data processing engine that supports data distribution and parallel computing. Flink is a top open-source stream processing engine in the industry and suits low-latency data processing scenarios: it provides high-concurrency pipeline data processing, millisecond-level latency, and high reliability.

Visible HBase Modeling
[Figure: mapping from a user list to an HBase table.]
• User list column: each column indicates an attribute of service data.
• Column family: a collection of columns that have service association relationships.
• Qualifier (HBase column): each column indicates a KeyValue.
• Mapping example: reverse(Column1, 4), Column2, Column3

HBase Cold Field Merging Transparent to Applications
[Figure: user data columns ID, Name, Phone, and ColA to ColH are mapped to the HBase KeyValues A, B, C, and D.]

Problems
• High expansion rate and poor data query performance due to the increase in HBase columns
• Increased development complexity and metadata maintenance because the application layer merges cold data columns
Features
• Cold field merging transparent to applications
• Real-time write and batch import interfaces
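As an illustration of what "merging cold columns" means, the sketch below packs several rarely read columns into one value and unpacks them again. It is a minimal sketch under stated assumptions: the class name ColdFieldMergeSketch and the length-prefixed encoding are hypothetical and are not the format FusionInsight uses. It only shows why merging reduces the number of KeyValues per row.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: packs cold columns into a single byte[] value.
final class ColdFieldMergeSketch {

    // Serializes the cold columns as: count, then (name, value) pairs.
    static byte[] merge(Map<String, String> coldColumns) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(coldColumns.size());
            for (Map.Entry<String, String> e : coldColumns.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Restores the cold columns from a merged value, in their original order.
    static Map<String, String> split(byte[] merged) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(merged))) {
            int count = in.readInt();
            Map<String, String> coldColumns = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                coldColumns.put(name, in.readUTF());
            }
            return coldColumns;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> cold = new LinkedHashMap<>();
        cold.put("ColE", "e-value");
        cold.put("ColF", "f-value");
        byte[] merged = merge(cold);           // one value instead of two KeyValues
        System.out.println(split(merged));
    }
}
```

When the platform performs this step itself, as the slide describes, the application keeps reading and writing individual columns and no longer maintains the merge metadata.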

Hive/HBase Fine-Grained Encryption
Application scenarios
• Data saved in plaintext mode may cause security risks of sensitive data leakage.
Solution
• Hive encryption of tables and columns
• HBase encryption of tables, column families, and columns
• Encryption algorithms of AES and SM4, and user-defined encryption algorithms
Customer benefits
• Sensitive data is encrypted and stored by table or column.
• Algorithm diversity and system security
• Encryption and decryption transparency to data services
[Figure: sensitive data passes through the Hive/HBase encryption/decryption layer on write and read and is stored as ciphertext in HDFS; insensitive data is stored without encryption.]
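To make column-level encryption concrete, the sketch below encrypts and decrypts a single cell value with AES from the standard Java crypto API. It is a sketch under stated assumptions, not the FusionInsight implementation: the class name ColumnEncryptionSketch, the AES-GCM mode, the 128-bit key, and the "IV followed by ciphertext" layout are all choices made here for illustration. The slides name only AES and SM4; SM4 is left out because it usually requires an additional security provider.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper: encrypts one cell value of a sensitive column.
final class ColumnEncryptionSketch {
    private static final int IV_BYTES = 12;    // assumed IV size for GCM
    private static final int TAG_BITS = 128;   // assumed authentication tag size
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    // Returns a fresh random IV followed by the ciphertext and tag.
    static byte[] encrypt(SecretKey key, byte[] plain) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] body = cipher.doFinal(plain);
        byte[] stored = Arrays.copyOf(iv, iv.length + body.length);
        System.arraycopy(body, 0, stored, iv.length, body.length);
        return stored;
    }

    // Reads the IV from the first bytes and decrypts the rest.
    static byte[] decrypt(SecretKey key, byte[] stored) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
        return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
    }

    // True if the stored form differs from the plaintext and decrypts back to it.
    static boolean roundTrips(String value) {
        try {
            SecretKey key = newKey();
            byte[] plain = value.getBytes(StandardCharsets.UTF_8);
            byte[] stored = encrypt(key, plain);
            return !Arrays.equals(stored, plain) && Arrays.equals(decrypt(key, stored), plain);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrips("sensitive phone number"));
    }
}
```

Only the columns marked as sensitive would go through encrypt and decrypt; insensitive columns are written unchanged, which matches the per-table and per-column granularity described above.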

HBase Secondary Indexing

UserTable (ColumnFamily)
RowKey | colA | colB | colC
a0001 | 01 | |
a0002 | 02 | |
a0003 | 06 | |
a0004 | 08 | |
a0005 | 04 | |
a0006 | 03 | B | C    (destination line)

No index: "Scan+Filter". The whole UserTable is the scanning area, so a large amount of data is scanned.

UserTable_idx (Data CF), RowKey:
a0001#coluA01#a0001
a0001#coluA02#a0002
a0001#coluA03#a0006
a0001#coluA04#a0005
a0001#coluA06#a0003
a0001#coluA08#a0004

Secondary index: the target data can be located with two I/Os, one on the index table and one on UserTable.

• Index Region and Data Region as companions under a unified processing mechanism
• Original HBase API interfaces, user-friendly
• Coprocessor-based plug-ins, easy to upgrade
• Write optimization, supporting real-time write
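The difference between "Scan+Filter" and an index lookup can be sketched with two sorted maps standing in for the data table and the index table. This is a minimal in-memory sketch, not HBase code: the class name SecondaryIndexSketch and the index key layout "<colA value>#<data RowKey>" are assumptions for illustration and differ from the row keys shown on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model: TreeMap keeps row keys sorted, as an HBase table does.
final class SecondaryIndexSketch {
    private final TreeMap<String, String> userTable = new TreeMap<>();   // RowKey -> colA
    private final TreeMap<String, String> indexTable = new TreeMap<>();  // index key -> RowKey

    // Assumed index key layout: "<colA value>#<data RowKey>".
    static String indexKey(String colA, String rowKey) {
        return colA + "#" + rowKey;
    }

    // Writes the data row and its index row together (updates of colA are not handled).
    void put(String rowKey, String colA) {
        userTable.put(rowKey, colA);
        indexTable.put(indexKey(colA, rowKey), rowKey);
    }

    // No index: visit every row and filter on colA.
    List<String> scanAndFilter(String colA) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> row : userTable.entrySet()) {
            if (row.getValue().equals(colA)) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    // With index: a short range scan on the index table, then direct lookups.
    List<String> lookupByIndex(String colA) {
        String start = colA + "#";
        String stop = colA + "$";   // '$' is the character right after '#'
        List<String> hits = new ArrayList<>();
        for (String rowKey : indexTable.subMap(start, stop).values()) {
            if (userTable.containsKey(rowKey)) {
                hits.add(rowKey);
            }
        }
        return hits;
    }

    // Loads the six rows shown in the UserTable example.
    static SecondaryIndexSketch sample() {
        SecondaryIndexSketch table = new SecondaryIndexSketch();
        table.put("a0001", "01");
        table.put("a0002", "02");
        table.put("a0003", "06");
        table.put("a0004", "08");
        table.put("a0005", "04");
        table.put("a0006", "03");
        return table;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch table = sample();
        System.out.println(table.scanAndFilter("03"));
        System.out.println(table.lookupByIndex("03"));
    }
}
```

Both methods return the same rows. The indexed path touches only the index entries that share the colA prefix plus the matching data rows, whereas the unindexed path reads the whole table.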

CTBase Simplifies HBase Multi-Table Service Development

Transaction
Account_id | Amount | Time
A0001 | $100 | 12/12/2014 18:00:02
A0001 | $1020 | 10/12/2014 15:30:05
A0001 | $89 | 09/12/2014 13:00:07
A0002 | $105 | 11/12/2014 20:15:00

AccountInfo
Account_id | Account_name | Account_balance
A0001 | Andy | $100232
A0002 | Lily | $902323
A0003 | Selin | $90000

CTBase (AccountInfo records and Transaction records stored together by Account_id)
A0001 | Andy | $100232                  (AccountInfo record)
A0001 | $100 | 12/12/2014 18:00:02       (Transaction record)
A0001 | $1020 | 10/12/2014 15:30:05      (Transaction record)
A0001 | $89 | 09/12/2014 13:00:07        (Transaction record)
A0002 | Lily | $902323                  (AccountInfo record)
A0002 | $105 | 11/12/2014 20:15:00       (Transaction record)
A0002 | $129 | 11/11/2014 18:15:00       (Transaction record)
A0003 | Selin | $90000                  (AccountInfo record)

HFS Small File Storage and Retrieval Engine
Application scenario
• A large number of small files and associated description information need to be stored.
Current problem
• Storing a large number of small files in the Hadoop Distributed File System (HDFS) puts great pressure on the NameNode. Storing a large number of small files in HBase wastes I/O resources on compaction.
HFS solution value
• The HFS stores not only small files but also metadata description information related to the files.
• The HFS provides a unified and friendly access API.
• The HFS selects the optimal storage solution based on the file size.
  - Small files are directly stored in the Medium-sized Objects (MOB).
  - Large files are directly stored in the HDFS.
[Figure: HFS handles metadata and small files separately from medium/large files.]
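The size-based choice of storage can be sketched as a single routing function. This is a sketch under stated assumptions: the class name HfsRoutingSketch and the 10 MB cut-off are hypothetical, since the slides do not state the threshold HFS uses.

```java
// Hypothetical router: picks a store for a file from its size alone.
final class HfsRoutingSketch {
    enum Store { HBASE_MOB, HDFS }

    // Assumed cut-off between "small" and "large"; not taken from the slides.
    static final long LARGE_FILE_BYTES = 10L * 1024 * 1024;

    static Store chooseStore(long fileSizeBytes) {
        if (fileSizeBytes < 0) {
            throw new IllegalArgumentException("file size must not be negative");
        }
        return fileSizeBytes < LARGE_FILE_BYTES ? Store.HBASE_MOB : Store.HDFS;
    }

    public static void main(String[] args) {
        System.out.println(chooseStore(200L * 1024));           // a 200 KB file
        System.out.println(chooseStore(512L * 1024 * 1024));    // a 512 MB file
    }
}
```

Keeping small files out of HDFS avoids one NameNode entry per file, and keeping large files out of HBase avoids rewriting them during compaction, which are the two problems listed above.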

Label-based Storage
[Figure: with HDFS common storage, the data of online applications and offline (batch processing) applications shares the same nodes, and I/O conflicts affect online services. With HDFS label-based storage, the data of online applications is stored only on nodes labeled with "Online Application" and is isolated from the data of offline applications. This design prevents I/O competition and improves the local hit ratio.]

• Solution description: Label cluster nodes based on applications or physical characteristics, for example, label a node with "Online Application." Then application data is stored only on nodes with specified labels.
• Application scenarios: 1. Online and offline applications share a cluster. 2. Specific services (such as online applications) run on specific nodes.
• Customer benefits: 1. I/Os of different applications are isolated to ensure the application SLA. 2. The system performance is improved by improving the hit ratio of application data.
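The placement rule in the solution description can be sketched as a filter over node labels. This is a minimal sketch, not the HDFS implementation: the class name LabelPlacementSketch, the node names, and the "Offline Application" label are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper: selects the nodes that may hold data for a given label.
final class LabelPlacementSketch {

    // Returns, in name order, the nodes that carry the required label.
    static List<String> eligibleNodes(Map<String, Set<String>> nodeLabels, String requiredLabel) {
        Map<String, Set<String>> sorted = new TreeMap<>(nodeLabels);
        List<String> nodes = new ArrayList<>();
        for (Map.Entry<String, Set<String>> node : sorted.entrySet()) {
            if (node.getValue().contains(requiredLabel)) {
                nodes.add(node.getKey());
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> labels = new HashMap<>();
        labels.put("node1", new HashSet<>(Arrays.asList("Online Application")));
        labels.put("node2", new HashSet<>(Arrays.asList("Offline Application")));
        labels.put("node3", new HashSet<>(Arrays.asList("Online Application")));
        // Data of an online application may be placed only on node1 and node3.
        System.out.println(eligibleNodes(labels, "Online Application"));
    }
}
```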

Label-based Scheduling
[Figure: with common scheduling, Spark and MapReduce applications are placed on large-memory and default nodes alike. With label-based scheduling, each application runs on the nodes whose label (large memory or default) matches its needs.]

Fine-grained scheduling based on application awareness, improving resource utilization

• Different applications, such as online and batch processing applications, run on nodes with their specific labels. This fully isolates the computing resources of different applications and improves service SLA.
• Applications that have special requirements on node hardware run only on nodes with that hardware. For example, Spark applications need to run on nodes with large memory. Resources are scheduled on demand, improving resource utilization and system performance.

CPU Resource Configuration Period Adjustment
[Figure: CPU shares over time. Cgroup1 (batch processing applications: Hive/Spark/...) and Cgroup2 (real-time applications: HBase), with queues QA to QD, get 40% and 60% of the CPU from 7:00 to 20:00, and 80% and 20% after 20:00.]

• Solution description: Different services receive different proportions of resources in different time segments. For example, from 7:00 to 20:00, when real-time services are at peak hours, they can be allocated 60% of the resources. From 20:00 to 7:00, when real-time services are at off-peak hours, 80% of the resources can be allocated to batch processing applications.
• Application scenario: The peak hours and off-peak hours of different services are different.
• Customer benefit: Services can obtain as many resources as possible at peak hours, boosting the average resource utilization of the system.
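The time-segmented allocation in the example can be sketched as a lookup from the time of day to a pair of CPU shares. This is a sketch under stated assumptions: the class name CpuShareScheduleSketch is hypothetical, the percentages and the 7:00 and 20:00 boundaries come from the example above, and applying the shares to real cgroups is out of scope.

```java
import java.time.LocalTime;

// Hypothetical schedule: maps a time of day to CPU shares for two service groups.
final class CpuShareScheduleSketch {
    static final LocalTime DAY_START = LocalTime.of(7, 0);
    static final LocalTime DAY_END = LocalTime.of(20, 0);

    // Returns {batchPercent, realTimePercent} for the given time.
    static int[] shares(LocalTime now) {
        // Daytime covers 7:00 (inclusive) to 20:00 (exclusive).
        boolean daytime = !now.isBefore(DAY_START) && now.isBefore(DAY_END);
        return daytime ? new int[] {40, 60} : new int[] {80, 20};
    }

    public static void main(String[] args) {
        int[] noon = shares(LocalTime.of(12, 0));
        int[] night = shares(LocalTime.of(23, 30));
        System.out.println("12:00 batch=" + noon[0] + "% real-time=" + noon[1] + "%");
        System.out.println("23:30 batch=" + night[0] + "% real-time=" + night[1] + "%");
    }
}
```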

• Benefits
  - Quick focusing on the most critical resource consumption
  - Quick locating of the node with the highest resource consumption to take appropriate measures

• Application scenario: When a fault occurs in the Hadoop cluster, locating it quickly requires changing the log level. If a log level change takes effect only after the process is restarted, the restart interrupts services. How can this problem be resolved?
• Solution: Dynamically adjust the log level on the WebUI.
• Benefits: When locating a fault, you can quickly change the log level of a specified service or node without restarting the service or interrupting services.

Multi-Tenant Management
Multi-level tenant management
[Figure: the organization hierarchy maps to a tenant hierarchy: Company to enterprise tenant, Dept. A to Tenant A, Sub-department A_1 to Tenant A_1.]

• Computing resources: Yarn queue (CPU/memory/I/O)
• Storage resources: HDFS (storage space/file overview)
• Service resources: HBase ...

Visualized, Centralized User Rights Management
Visualized, centralized user rights management is easy to use, flexible, and refined:
• Easy to use: visualized multi-component unified user rights management
• Flexible: role-based access control (RBAC) and predefined privilege sets (roles) which can be used repeatedly
• Refined: multi-level (database/table/column-level) and fine-grained (Select/Delete/Update/Insert/Grant) authorization
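Role-based access control with reusable roles and per-resource actions can be sketched as two lookups: user to roles, and role to privileges. This is a minimal sketch, not the FusionInsight Manager model: the class name RbacSketch, the role and user names, and the "database.table.column" resource naming are hypothetical, and inheritance between database, table, and column levels is not modeled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical RBAC model: privileges attach to roles, and roles attach to users.
final class RbacSketch {
    enum Action { SELECT, DELETE, UPDATE, INSERT, GRANT }

    private final Map<String, Map<String, EnumSet<Action>>> rolePrivileges = new HashMap<>();
    private final Map<String, Set<String>> userRoles = new HashMap<>();

    // Adds actions on one resource (for example "db.table.column") to a role.
    void grantToRole(String role, String resource, Action... actions) {
        rolePrivileges
            .computeIfAbsent(role, r -> new HashMap<>())
            .computeIfAbsent(resource, r -> EnumSet.noneOf(Action.class))
            .addAll(Arrays.asList(actions));
    }

    // A role is a reusable privilege set: assign it to as many users as needed.
    void assignRole(String user, String role) {
        userRoles.computeIfAbsent(user, u -> new HashSet<>()).add(role);
    }

    // A user may act if any of the user's roles allows the action on the resource.
    boolean isAllowed(String user, String resource, Action action) {
        for (String role : userRoles.getOrDefault(user, Collections.emptySet())) {
            EnumSet<Action> allowed =
                rolePrivileges.getOrDefault(role, Collections.emptyMap()).get(resource);
            if (allowed != null && allowed.contains(action)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RbacSketch rbac = new RbacSketch();
        rbac.grantToRole("analyst", "sales.orders.amount", Action.SELECT);
        rbac.assignRole("alice", "analyst");
        System.out.println(rbac.isAllowed("alice", "sales.orders.amount", Action.SELECT));
        System.out.println(rbac.isAllowed("alice", "sales.orders.amount", Action.DELETE));
    }
}
```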

Automatic NTP Configuration
[Figure: the active management node is an NTP client of the external NTP server and acts as the NTP server of the cluster. The standby management node, the data nodes, and the control node are NTP clients.]

Automatically Configuring Mapping of Hosts
• Benefits
  - Shorten environment preparation time to install the Hadoop cluster.
  - Reduce probability of user configuration errors.
  - Reduce the risk of manually configuring mapping for stable running nodes after capacity expansion in a large-scale cluster.

Rolling Restart/Upgrade/Patch
• Modifying a configuration, performing the upgrade, and installing the patch without interrupting services
• Service interruption duration of core components: no interruption in 12 hours

HDFS rolling upgrade example:
[Figure: a client accesses an HDFS cluster (two NameNodes and five DataNodes) while the cluster is upgraded from C60 to C70.]

Components listed on the slide: ZooKeeper, HDFS, Yarn, HBase, Storm, Flume, Loader, Spark, Hive, Solr

2. FusionInsight Features

3. Success Cases of FusionInsight

Secure
• Challenges to key vehicle identification: insufficient capability of key vehicle automatic identification
• Insufficient traffic accident detection capability: blind spots, weak detection technology, and manual accident reporting and handling
• Low efficiency of special attacks: information fragmentation and poor special attack platform

Organized
• Challenges to checkpoint and e-police capabilities: rigid algorithm
• Challenges to violation review and handling capabilities: heavy workload
• Challenges to special attack data analysis capabilities: manual analysis that takes 7 to 30 days

Smooth
• Challenges to traffic detection capability: faulty detection devices, low detection efficiency, and unreliable detection results
• Challenges to traffic analysis capabilities: traffic information is not shared among cities
• Challenges to traffic signal optimization

Intelligent
• Computing intelligence challenges: closed system and technology and fragmented information
• Perceptual intelligence challenges: weak awareness of traffic, events, and violations
• Cognitive intelligence challenges: lack of traffic awareness in regions and intersections

Traffic Awareness in the Whole City: Deep Learning and Digital Transformation
• No camera is added. Through deep learning and intelligent analysis, about 50 billion real-time pavement traffic parameters are added every month, which lays a foundation for the digital transformation of traffic.

• Applications: vehicle traffic and event awareness, traffic flow analysis, traffic accident perception and analysis, traffic signal optimization
• Deep learning platform: algorithm warehouse, deep learning training engine, deep learning reasoning engine, deep learning search engine
• Platforms: video cloud storage and cloud computing platform; traffic big data attacks modeling engine and time and space analysis engine
• Data sources: monitoring of more than 6000 roads, more than 4000 traffic checkpoints, more than 3000 channels of HD e-police
Note: The preceding figures use a city as an example.

National transportation integrated command
• Key vehicle traffic analysis: number of vehicles (400 million) + pass records (12.6 billion)
• Key vehicle violation analysis: number of vehicles (400 million) + illegal records (2.6 billion)
• Detection replacement analysis: number of vehicles (400 million) + illegal records (2.6 billion) + detection records (1.1 billion) (20 minutes)
• Buy and sell analysis: number of vehicles (400 million) + illegal records (2.6 billion) + number of drivers who cleared the license point (110 million)

Serving 400 million vehicles in provinces and cities in China, the traffic big data analysis platform analyzes 2.6 billion illegal records and 12.6 billion traffic records, greatly improving the security and orderly management capability of cross-province traffic and reaching the world's leading level.

• Low accuracy
  - Customer groups are obtained through data collection and filtering, which is time-consuming and labor-consuming.
  - Precise sales cannot be implemented.
• Non-real-time
  - Advertisements can be pushed only according to the preset rules.
  - Real-time marketing by event or location cannot be implemented.
• Mainly structured data, unable to handle semi-structured data.
• Customer behavior involved in rule operation and configurations, low support rate.
• Marketing strategies and rules are fixed. New rules need to be developed and implemented.

[Figure: solution architecture, top to bottom]
• Application layer: marketing plan, marketing execution, marketing analysis, statistical analysis, scheduling monitoring, ...
• Model layer: marketing model, event detection model, rule engine, recommendation engine
• Ark: Chinasoft big data middleware (Ark)
• Big data platform: Huawei enterprise-class big data platform (FusionInsight)
  - Real-time stream processing component: Flume, Kafka, Storm/Flink, Redis, ZooKeeper
  - Offline processing component: Loader, Spark, Hive, HBase, MapReduce, HDFS/Yarn, ZooKeeper
  - FusionInsight Farmer RTD: Farmer, MPPDB, RTD, MQ, Redis
  - Manager
• Infrastructure/Cloud platform: x86 servers, network device, security device

Big Data Analysis, Mining, and Machine Learning Make Marketing More Accurate
Process: data analysis, predictive modeling, model application, model effect monitoring and evaluation
[Figure: customer data from the data source goes through customer group filtering and correlation analysis into a marketing activity plan. Marketing activities run over multiple channels (SMS, app, Twitter). Analysis reports feed effect evaluation and continuous optimization: model effect evaluation, customer data update, and model improvement.]

Solution Benefits
Precise marketing:
• Precise: precise customer group mining
  - Customer-based 360-degree view
  - Customer type-based mining
• Easy to use: self-learning of rules
  - Customizable/Development variables, rules, and rule modes
  - Rule auto-learning and optimization
• Comprehensive: supporting various types of data
  - Support of various types of data (structured, unstructured, and semi-structured)
  - Support of multi-channel comprehensive analysis
  - Support of statistics analysis
• Reliable: uninterrupted services
  - Always-on service
• Real-time: real-time marketing information push
  - Event-based
  - Location-based
  - Millisecond-level analysis based on full data

A Carrier: Big Data Convergence to Achieve Big Values
[Figure: platform architecture, top to bottom]
• Applications: credit investigation, crowd gathering, service experience quality computing, ..., signaling log query, Internet access log query, domain name log query, ...
• Real-time query platform and basic analysis platform
• Hadoop resource pool: Hive, Spark SQL, MapReduce, Spark, HBase, HDFS, Yarn/ZooKeeper, Manager
• Interfaces: KV interface, SQL interface
• ETL
• Data source: traditional data (BOM), new data (Internet)

Data Federation
[Figure: DWH, Hadoop, Archiving, and CSSD above an aggregation layer.]
Periodically obtain the source files from the transit server, convert the files to the T0/T1 format, and upload the converted files to the CSSD/DWH server.
• Structured data: SUN, NSN, E///, PLP, ODS, ..., AURA
• Unstructured data: voice, mobile Internet, social media, text, ...
Text

Hadoop stores original CDRs and structured and unstructured data, improving storage capacity
and processing performance, and reducing hardware costs.
A total of 1.1 billion records (664300 MB) are extracted, converted, and loaded at an overall
processing speed of 113 MB/s, much higher than the 11 MB/s expected by the customer.
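The gap between the two speeds is easier to judge as elapsed time. The sketch below redoes the arithmetic with the figures quoted above (664300 MB, 113 MB/s, 11 MB/s); the class name EtlThroughputSketch is hypothetical. At 113 MB/s the load takes roughly 1.6 hours, against roughly 16.8 hours at the expected 11 MB/s.

```java
// Hypothetical helper: converts a data volume and a throughput into elapsed hours.
final class EtlThroughputSketch {

    static double hours(double megabytes, double megabytesPerSecond) {
        if (megabytesPerSecond <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return megabytes / megabytesPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        double volumeMb = 664300;   // total volume quoted in the case
        System.out.printf("at 113 MB/s: %.2f hours%n", hours(volumeMb, 113));
        System.out.printf("at  11 MB/s: %.2f hours%n", hours(volumeMb, 11));
        System.out.printf("speed-up: %.1fx%n", 113.0 / 11.0);
    }
}
```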

Summary
• These slides describe the enterprise edition of Huawei FusionInsight HD, focus on FusionInsight HD features and application scenarios, and describe Huawei FusionInsight HD success cases in the industry.

2. Which encryption algorithms are supported by Hive/HBase fine-grained encryption?

3. A large number of small files are stored in the Hadoop HDFS, which brings great pressure to the NameNode. HBase stores a large number of small files, and Compaction wastes I/O resources. What are the technical solutions to this problem?

4. What are the levels of logs that can be adjusted?

Quiz
1. True or False
① Hive supports encryption of tables and columns. HBase supports encryption of
tables, column families, and columns. (T or F)
② User rights management is role-based access control and provides visualized and
unified user rights management for multiple components. (T or F)
2. Multiple-Answer Question
Which of the following indicate the high reliability of FusionInsight HD? ( )
A. All components are free of SPOFs.
B. All management nodes support HA.
C. Health status monitoring for the software and hardware
D. Network plane isolation

More Information
• Training materials:
  - http://support.huawei.com/learning/Certificate!showCertificate?lang=en&pbiPath=term1000025450&id=Node1000011796
• Exam outline:
  - http://support.huawei.com/learning/Certificate!toExamOutlineDetail?lang=en&nodeId=Node1000011797
• Mock exam:
  - http://support.huawei.com/learning/Certificate!toSimExamDetail?lang=en&nodeId=Node1000011798
• Authentication process:
  - http://support.huawei.com/learning/NavigationAction!createNavi#navi[id]=_40

No ratings yet
Module 06 Hive - Distributed Data Warehouse
36 pages
02 - Campus Network Intelligent O&M and CampusInsight
No ratings yet
02 - Campus Network Intelligent O&M and CampusInsight
59 pages
Chapter 14 Huawei Big Data Solution
No ratings yet
Chapter 14 Huawei Big Data Solution
31 pages
Part2 HDFS
No ratings yet
Part2 HDFS
33 pages
Chapter 6 Cloud Native and Digital Transformation
No ratings yet
Chapter 6 Cloud Native and Digital Transformation
22 pages
PACIFIC
No ratings yet
PACIFIC
26 pages
Huawei Oceanstor 9000 Storage System
No ratings yet
Huawei Oceanstor 9000 Storage System
4 pages
DBMS Unit-5
No ratings yet
DBMS Unit-5
92 pages
HPE Servers and Storage: Portfolio at A Glance: October 2019
No ratings yet
HPE Servers and Storage: Portfolio at A Glance: October 2019
14 pages
Huawei IT Product Line Introduction
No ratings yet
Huawei IT Product Line Introduction
32 pages
B1 HPE Storage Overview. How To Position The Different Storage Products Van Der Lugt 1
No ratings yet
B1 HPE Storage Overview. How To Position The Different Storage Products Van Der Lugt 1
70 pages
Hadoop Chapter 1
No ratings yet
Hadoop Chapter 1
6 pages
Huawei FusionStorage Data Sheet
No ratings yet
Huawei FusionStorage Data Sheet
6 pages
Cloud
No ratings yet
Cloud
1 page
Big Data
No ratings yet
Big Data
27 pages
CL01 FusionSphere 6.3 Overview
No ratings yet
CL01 FusionSphere 6.3 Overview
29 pages
Huawei Green & Cloud Data Center Introduction: Huawei Technologies Co., LTD
No ratings yet
Huawei Green & Cloud Data Center Introduction: Huawei Technologies Co., LTD
27 pages
Huawei FusionCloud Solution Brochure PDF
No ratings yet
Huawei FusionCloud Solution Brochure PDF
12 pages
1 CNC Press Break
No ratings yet
1 CNC Press Break
27 pages
DAA Practical
No ratings yet
DAA Practical
68 pages
Chapter 8 Mapreduce Service (MRS)
No ratings yet
Chapter 8 Mapreduce Service (MRS)
23 pages
CT 2
No ratings yet
CT 2
8 pages
R12 Employee Suppliers
No ratings yet
R12 Employee Suppliers
24 pages
Toolbox PLUS Users Manual 3.11.0
No ratings yet
Toolbox PLUS Users Manual 3.11.0
238 pages
ATC Course Structures
No ratings yet
ATC Course Structures
8 pages
Cics Mock Test III
No ratings yet
Cics Mock Test III
6 pages
MT8121XE3 Datasheet ENG
No ratings yet
MT8121XE3 Datasheet ENG
2 pages
Data Communication & Networking (CEN-222)
No ratings yet
Data Communication & Networking (CEN-222)
12 pages
Notes - 5 Unit
No ratings yet
Notes - 5 Unit
54 pages
Module 11 Kafka - Distributed Message Subscription System
No ratings yet
Module 11 Kafka - Distributed Message Subscription System
34 pages
A Wormhole Attack Detection and Prevention Techniq
No ratings yet
A Wormhole Attack Detection and Prevention Techniq
9 pages
Altr BRSFT IPO9
No ratings yet
Altr BRSFT IPO9
31 pages
Module 12 Zookeeper - Cluster Distributed Coordination Service
No ratings yet
Module 12 Zookeeper - Cluster Distributed Coordination Service
26 pages
Module 01 Big Data Industry and Technological Trends
No ratings yet
Module 01 Big Data Industry and Technological Trends
50 pages
A6K Product Manual
No ratings yet
A6K Product Manual
16 pages
Module 08 Flink - Stream Processing and Batch Processing Platform
No ratings yet
Module 08 Flink - Stream Processing and Batch Processing Platform
40 pages
Boucherit Oussama F1
No ratings yet
Boucherit Oussama F1
55 pages
Appendix 4 Apply For HCAI Certificate Guide
No ratings yet
Appendix 4 Apply For HCAI Certificate Guide
8 pages
Huawei Academy Instructor Assessment Guide (Only For HCNA-cloud) - 20180416
No ratings yet
Huawei Academy Instructor Assessment Guide (Only For HCNA-cloud) - 20180416
8 pages
TopupArticle Latest PDF
No ratings yet
TopupArticle Latest PDF
23 pages
Module 07 Streaming - Distributed Stream Computing Engine
No ratings yet
Module 07 Streaming - Distributed Stream Computing Engine
33 pages
Module 10 Flume - Massive Logs Aggregation
No ratings yet
Module 10 Flume - Massive Logs Aggregation
42 pages
12th IP Unit-1 Numpy - Array
No ratings yet
12th IP Unit-1 Numpy - Array
21 pages
Config Windows Server 2022 SP Us
No ratings yet
Config Windows Server 2022 SP Us
26 pages
ANT260 Answer Key 2
No ratings yet
ANT260 Answer Key 2
5 pages
Aleph 20 Syslib Guide - Search
No ratings yet
Aleph 20 Syslib Guide - Search
21 pages
LPD8 Editor User Guide: To Download and Install The Editor Software
No ratings yet
LPD8 Editor User Guide: To Download and Install The Editor Software
2 pages
Cyber Security Jan 2024
No ratings yet
Cyber Security Jan 2024
8 pages
Module 1 - Introduction To React JS
No ratings yet
Module 1 - Introduction To React JS
8 pages
Game Requirements For Venge Io (Clone)
No ratings yet
Game Requirements For Venge Io (Clone)
3 pages
Bus Com
No ratings yet
Bus Com
3 pages
Resumen Del Hardware Del Ordenador
No ratings yet
Resumen Del Hardware Del Ordenador
2 pages
Cutman Course Hosting Guidelines
No ratings yet
Cutman Course Hosting Guidelines
4 pages
Windows Basic Notes December 2024
No ratings yet
Windows Basic Notes December 2024
3 pages
Learn IoT Programming Using Node-RED: Begin to Code Full Stack IoT Apps and Edge Devices with Raspberry Pi, NodeJS, and Grafana
From Everand
Learn IoT Programming Using Node-RED: Begin to Code Full Stack IoT Apps and Edge Devices with Raspberry Pi, NodeJS, and Grafana
Bernardo Ronquillo Japón
No ratings yet

Big Data Analytics Platform

Patch selection Performance Baseline

Hadoop HBase Log

Initial Prosperous Enterprise

Outstanding product development and delivery capabilities and carrier-class

All components Cross–data center

HA for all Third-party backup system

Network plane Hot-swappable hard

System Permission Data

Operating User permission

WebUI-Client It only manages the

Qualification ratio of inspection items

Node qualification rate

The HBase table design tool, HBase API

The community's Spark JDBCServer supports only single tenants. A tenant is

1 MB+1 MB … 1 MB+1 MB … RDD2

Text/Parquet/ORC/Json Table on HDFS

Random access (small-scale scanning):

Latest 1.0.0, Jan 28, 2017

reverse(Column1, 4) Column2 Column3

No index: "Scan+Filter", scanning Secondary index: The target data can

Index Region and Data Region as companions under a unified processing

AccountInfo A0002 $105

A0003 Selin $90000

Fine-grained scheduling based on application awareness, improving resource utilization

Company Enterprise tenant

Service resources HBase ...

DataNode DataNode DataNode DataNode DataNode Flume

Vehicle traffic and event awareness Traffic flow analysis

Traffic accident perception and analysis Traffic signal optimization

Key vehicle Key vehicle

Number of vehicles Number of vehicles (400

 Customer groups are  Advertisements can be

Application Marketing Marketing Statistical Scheduling

Event detection Recommendation

Huawei enterprise-class big data platform (FusionInsight)

Data source Marketing Effect evaluation and

Customer group filtering

Model effect evaluation, customer data update, and model improvement

Comprehensive: supporting Precise Reliable: uninterrupted

Real-time query platform

Hive Spark SQL

Traditional data (BOM) New data (Internet)

DWH Hadoop Archiving CSSD

2. Which encryption algorithms are supported by Hive/HBase fine-

4. What are the levels of logs that can be adjusted?
