Module 09 Loader - Data Transformation

The document provides an overview of Loader, a tool for data and file exchange between FusionInsight HD and various databases and file systems. It details Loader's features, system architecture, job management, and monitoring capabilities, including job creation and conversion rules. Additionally, it includes information on client scripts for managing jobs and data sources.

Uploaded by

Lucas Oliveira

Technical Principles of Loader

www.huawei.com

Copyright © 2018 Huawei Technologies Co., Ltd. All rights reserved.


Objectives
 Upon completion of this course, you will understand:
 What Loader is
 What Loader can be used for
 Position of Loader in FusionInsight
 System architecture of Loader
 Main features of Loader
 How to manage Loader jobs
 How to monitor Loader jobs

Contents
1. Introduction to Loader

2. Loader Job Management

What Is Loader
 Loader is a tool for exchanging data and files between
FusionInsight HD and relational databases and file systems. Loader
provides a wizard-based WebUI for job configuration management and
supports scheduled and periodic execution of Loader jobs. On the
WebUI, users can specify multiple data sources, configure data
cleaning and conversion steps, and configure the cluster storage
system.

Application Scenarios of Loader

[Figure: Loader connects external data sources (RDB, SFTP Server, FTP Server, Customized Data Source) with Hadoop (HDFS, HBase, Hive).]
Position of Loader in FusionInsight
[Figure: FusionInsight stack. Labels recovered from the figure: Application service layer; OpenAPI/SDK; REST/SNMP/Syslog; Data, Information, Knowledge, Wisdom; DataFarm (Loader, Miner, Farmer); Manager (System management, Service governance, Security management); Hadoop API; Plugin API; Hadoop (Hive, M/R, Spark, Storm, Flink, YARN/Zookeeper, HDFS/HBase); LibrA.]

 Loader is a loading tool for data and file exchange between
FusionInsight HD and relational databases and file systems.

Features of Loader

 GUI
 Provides a GUI that facilitates operations.
 High Performance
 Uses MapReduce for parallel data processing.
 High Reliability
 Deploys Loader Servers in active/standby mode.
 Uses MapReduce to execute jobs and supports retry after failure.
 Leaves no junk data after a job failure occurs.
 Security
 Uses Kerberos authentication.
Module Architecture of Loader
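[Figure: Loader module architecture. The Loader Client (Tool, WebUI) reaches the Loader Server through the REST API. The Loader Server contains the Transform Engine, Execution Engine, Submission Engine, Job Scheduler, Job Manager, Metadata Repository, and HA Manager. Jobs run on Yarn as Map Tasks and Reduce Tasks, connected to external data sources (JDBC, File, SFTP/FTP) and to HBase, HDFS, and Hive.]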

Module Architecture of Loader - Module Description
 Loader Client: Provides a web user interface (WebUI) and a command-line interface (CLI).
 Loader Server: Processes operation requests sent from the client, manages connectors and metadata, submits MapReduce jobs, and monitors MapReduce job status.
 REST API: Provides a RESTful (Representational State Transfer) interface (HTTP + JSON) to process the operation requests from the client.
 Job Scheduler: Periodically executes Loader jobs.
 Transform Engine: A data transformation engine that supports field combination, string cutting, and string reversal.
 Execution Engine: Executes Loader jobs as MapReduce jobs.
 Submission Engine: Submits Loader jobs to MapReduce.
 Job Manager: Manages Loader jobs, including creating, querying, updating, deleting, activating/deactivating, starting, and stopping jobs.
 Metadata Repository: Metadata warehouse, which stores and manages connectors, conversion steps, and Loader jobs.
 HA Manager: Manages the active and standby status of Loader Servers. Two Loader Servers are deployed in active/standby mode.

Service Status WebUI of Loader
 Choose Services > Loader to go to the Loader Status page.

Copyright © 2018 Huawei Technologies Co., Ltd. All rights reserved. Page 11
Job Management WebUI of Loader
 On the Loader Status page, click LoaderServer (Active) to
go to the job management WebUI of Loader.

Job Management WebUI of Loader - Job
 A job describes the process of extracting, transforming,
and loading data from the data source to the target end.
It includes data source location and attributes, rules for
source-to-target data conversion, and target end
attributes.
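
As a rough illustration of the three parts of a job named above, the sketch below models a job as a plain data structure. The class name `LoaderJob`, its field names, and the example values are hypothetical and do not reflect Loader's actual metadata schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LoaderJob:
    """Hypothetical in-memory picture of a Loader job (not Loader's real schema)."""
    name: str
    source: Dict[str, str]        # data source location and attributes
    conversion_rules: List[str]   # rules for source-to-target data conversion
    target: Dict[str, str]        # target end attributes

    def describe(self) -> str:
        steps = " -> ".join(self.conversion_rules)
        return f"{self.name}: {self.source['type']} -> [{steps}] -> {self.target['type']}"


job = LoaderJob(
    name="orders_to_hdfs",
    source={"type": "sftp", "path": "/data/in/orders.csv"},
    conversion_rules=["If Null", "String Cut"],
    target={"type": "hdfs", "path": "/user/loader/orders"},
)
print(job.describe())
```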

Job Management WebUI of Loader - Job Conversion Rules
 Loader conversion operators:
 Long Date Conversion: converts between long integers and dates.
 If Null: converts null values into specified values.
 Add Constants: generates constant fields.
 Generate Random: generates random value fields.
 Concatenate Fields: concatenates existing fields to generate new fields.
 Extract Fields: separates an existing field by using specified delimiters to generate new
fields.
 Modulo Integer: performs modulo operations on existing fields to generate new fields.
 String Cut: cuts existing string fields by the specified start position and end position to
generate new fields.
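
The operators listed above can be approximated by ordinary functions applied to one record. This is a minimal sketch for intuition only. The function names, the epoch-millisecond interpretation of "long integer", and the 0-based, end-exclusive cut positions are assumptions, not Loader's documented behavior.

```python
import random
from datetime import datetime, timezone


def long_to_date(epoch_ms: int) -> str:
    # Assumption: the long integer is milliseconds since the Unix epoch (UTC).
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")


def if_null(value, default):
    return default if value is None else value


def concatenate_fields(values, separator=""):
    return separator.join(str(v) for v in values)


def extract_fields(value: str, delimiter: str):
    return value.split(delimiter)


def modulo_integer(value: int, divisor: int) -> int:
    return value % divisor


def string_cut(value: str, start: int, end: int) -> str:
    # Assumption: 0-based start, exclusive end (Python slice semantics).
    return value[start:end]


def convert(row, rng):
    """Apply each operator once to a single input record."""
    city, province = extract_fields(row["addr"], ",")
    return {
        "id": row["id"],
        "name": if_null(row["name"], "unknown"),
        "created": long_to_date(row["created_ms"]),
        "source": "demo",                # Add Constants
        "sample": rng.random(),          # Generate Random
        "label": concatenate_fields([row["id"], city], "-"),
        "city": city,
        "province": province,
        "bucket": modulo_integer(row["id"], 4),
        "city_prefix": string_cut(city, 0, 3),
    }


row = {"id": 17, "name": None, "created_ms": 86_400_000, "addr": "Shenzhen,Guangdong"}
result = convert(row, random.Random(0))
print(result)
```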

Creating a Loader Job - Basic Information

Creating a Loader Job - From

Creating a Loader Job - Transform

Creating a Loader Job - To

Monitoring Job Execution Status
 Check the execution status of all jobs:
1. Go to the Loader job management page.
2. The page displays all current jobs and their last execution status.
3. Select a job, and click a button in the Operation column to
perform the corresponding operation.

Monitoring Job Execution Status - Job Execution History
 View execution records of specified jobs:
1. Select a job, and then click the History button in the Operation
column.

2. The historical record page displays the start time, duration (s),
status, failure cause, number of read/written/skipped rows/files,
dirty data link, and MapReduce log link of each execution.
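
The fields shown on the historical record page can be pictured as one record per execution. The sketch below is illustrative only: the class name `ExecutionRecord`, the field names, and the status strings are hypothetical, and the link fields are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExecutionRecord:
    """Hypothetical shape of one row on the job history page."""
    start_time: str
    duration_s: int
    status: str                   # status strings here are placeholders
    failure_cause: Optional[str]
    rows_read: int
    rows_written: int
    rows_skipped: int


def needs_attention(history: List[ExecutionRecord]) -> List[ExecutionRecord]:
    """Return executions that failed or skipped at least one row."""
    return [r for r in history if r.status != "SUCCEEDED" or r.rows_skipped > 0]


history = [
    ExecutionRecord("2018-01-01 02:00:00", 42, "SUCCEEDED", None, 100, 100, 0),
    ExecutionRecord("2018-01-02 02:00:00", 40, "SUCCEEDED", None, 100, 97, 3),
    ExecutionRecord("2018-01-03 02:00:00", 5, "FAILED", "source unreachable", 0, 0, 0),
]
flagged = needs_attention(history)
print([r.start_time for r in flagged])
```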

Monitoring Job Execution Status - Dirty Data
 Dirty data is data that does not meet Loader conversion rules. It can be
checked with the following steps:
1. If the number of skipped records is not 0 on the job history page, click the
dirty data button to go to the dirty data directory.
2. Dirty data is stored in HDFS, and the dirty data generated by each Map task is
stored in a separate file.
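
The behavior described above, where rows that fail conversion are skipped and collected separately for each Map task, can be sketched as follows. This is a simplified stand-in that uses in-memory dictionaries instead of HDFS. The function names and the `dirty-map-N` file names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def run_map_task(rows: List[Row], convert: Callable[[Row], dict]) -> Tuple[List[dict], List[Row]]:
    """Convert each row; rows that break the conversion rule are set aside as dirty."""
    clean, dirty = [], []
    for row in rows:
        try:
            clean.append(convert(row))
        except (ValueError, KeyError):
            dirty.append(row)
    return clean, dirty


def to_int_amount(row: Row) -> dict:
    # Example conversion rule: "amount" must parse as an integer.
    return {"id": row["id"], "amount": int(row["amount"])}


# Each key stands for the input split of one Map task.
splits = {
    0: [{"id": "a", "amount": "10"}, {"id": "b", "amount": "n/a"}],
    1: [{"id": "c", "amount": "7"}],
}

dirty_files: Dict[str, List[Row]] = {}
rows_written = 0
for task_id, rows in splits.items():
    clean, dirty = run_map_task(rows, to_int_amount)
    rows_written += len(clean)
    if dirty:
        # One separate dirty-data file per Map task (hypothetical file name).
        dirty_files[f"dirty-map-{task_id}"] = dirty

rows_skipped = sum(len(v) for v in dirty_files.values())
print(rows_written, rows_skipped, sorted(dirty_files))
```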

Monitoring Job Execution Status - MapReduce Log
 On the job history page, click the log button. The
MapReduce log page for the execution is displayed.

Monitoring Job Execution Status - Job Execution Failure Alarm
 When a job fails to be executed, an alarm is reported.

Introduction to Client Scripts
 Loader provides a GUI and a complete set of shell scripts. These scripts can be
used to add, delete, modify, and query data sources and jobs, start
and stop jobs, and check the job status and whether a job is running.

 These scripts are described as follows:
 lt-ctl: Controls jobs and is used to query job status, start and stop jobs,
and check whether jobs are running.
 lt-ucj: Manages jobs and is used to query, create, modify, and delete jobs.
 lt-ucc: Manages data sources and is used to query, create, modify, and
delete data source connection information.
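
The division of responsibilities among the three scripts can be summarized as a lookup table. The operation labels below are descriptive phrases chosen for this sketch, not real command-line arguments, and `script_for` is a hypothetical helper.

```python
# Which client script covers which kind of operation, per the descriptions above.
# The operation labels are illustrative phrases, not actual script options.
RESPONSIBILITIES = {
    "lt-ctl": {"query job status", "start job", "stop job", "check job running"},
    "lt-ucj": {"query job", "create job", "modify job", "delete job"},
    "lt-ucc": {
        "query data source",
        "create data source",
        "modify data source",
        "delete data source",
    },
}


def script_for(operation: str) -> str:
    """Return the name of the script responsible for the given operation label."""
    for script, operations in RESPONSIBILITIES.items():
        if operation in operations:
            return script
    raise ValueError(f"no client script handles: {operation}")


print(script_for("start job"))
print(script_for("create job"))
print(script_for("modify data source"))
```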

Summary
 This module describes the following information about Loader:
main functions and features, job management and monitoring.

Quiz
True or False:

1. FusionInsight Loader supports only data import and export between
relational databases and Hadoop HDFS and HBase.

2. Conversion steps must be configured for Loader jobs.

Quiz
1. Which of the following statements are CORRECT?
A. No residual original files are left when a job fails after proper running for
some time.
B. Dirty data refers to the data that does not comply with conversion rules.
C. Loader client scripts can only be used to submit jobs.
D. A human-machine account can be used to perform operations on all
Loader jobs.

Quiz
2. Which of the following statements is CORRECT?
A. If Loader is faulty after it submits a job to MapReduce, the job will fail to be
executed.
B. If a Mapper execution fails after Loader submits a job to MapReduce, a
second execution is automatically performed.
C. Residual data generated after a Loader job fails to be executed needs to be
manually cleared.
D. After Loader submits a job to MapReduce for execution, it cannot submit
other jobs before the execution is complete.

More Information
 Training materials:
 http://support.huawei.com/learning/Certificate!showCertificate?lang=en&pbiPath=term1000025450&id=Node1000011796
 Exam outline:
 http://support.huawei.com/learning/Certificate!toExamOutlineDetail?lang=en&nodeId=Node1000011797
 Mock exam:
 http://support.huawei.com/learning/Certificate!toSimExamDetail?lang=en&nodeId=Node1000011798
 Authentication process:
 http://support.huawei.com/learning/NavigationAction!createNavi#navi[id]=_40

Thank You
www.huawei.com
