azkaban dev details

#### jobtype
* spark_to_mysql_and_oracle_full
* spark_to_mysql_and_oracle_incremental
* spark_to_mysql_full
* spark_to_mysql_full_no_partition
* spark_to_oracle_full_no_partition
* spark_to_mysql_incremental
* spark_to_oracle_full
* spark_to_oracle_incremental
* spark_to_spark
* spark_to_oracle_full_all_partition
* spark_to_mysql_full_all_partition
* oracle_to_spark_full
* oracle_to_spark_incremental
* mysql_to_spark_full
* mysql_to_spark_incremental

> Some points to note

1. By default, data is imported into the kettle schema in Oracle. To change the Oracle schema and password, add KETTLE_USER=\$\{KETTLE_DW_USER\} and KETTLE_PASSWORD=\$\{KETTLE_DW_PASSWORD\} to the job file.

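2. Because every job can be re-run, the target data in Oracle is truncated before data is exported to Oracle. A create_timestamp column must be added in both Oracle and MySQL to record when the data was imported.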

3. For incremental jobs, the exported Spark table partition is dw_audit_cre_date=${YESTERDAY}; for full jobs it is dw_audit_cre_date=${CURRENT_DATE}. To use a different value, set PARTITION_INCREMENTAL_VALUE. For example, to have an incremental job export today's partition, set PARTITION_INCREMENTAL_VALUE=\\$\\{CURRENT_DATE\\}; to export the data from three days ago, set PARTITION_INCREMENTAL_VALUE=\\$\\{LAST_THREE_DAYS\\}.

4. By default, the partition column of the Spark table that a job exports is dw_audit_cre_date. To change it, set PARTITION_COLUMN=XXX in the job file.
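
The two partition overrides can be combined in one job file. The fragment below is only a sketch: the `type=` key is the standard Azkaban way to select a job type, `etl_date` is a made-up column name, and any other parameters the job type requires (table names, connection settings) are omitted because these notes do not list them. The backslash escaping of the variable is copied from the note on PARTITION_INCREMENTAL_VALUE; check it against an existing job file before relying on it.

```properties
# Hypothetical job file for an incremental export to MySQL
type=spark_to_mysql_incremental

# Export today's partition instead of the default ${YESTERDAY}
PARTITION_INCREMENTAL_VALUE=\$\{CURRENT_DATE\}

# Read from a partition column other than dw_audit_cre_date
# (etl_date is a placeholder, not a real column from this project)
PARTITION_COLUMN=etl_date
```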

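5. Spark's date type carries no hours, minutes, or seconds, whereas Oracle's date type requires them, so a Spark date column cannot be exported directly to an Oracle date column. Use Spark's datetimestamp instead.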

**oracle_to_spark_full, oracle_to_spark_incremental**: the following parameters must be added to the job file
*id: primary key of the Oracle table*
*spark-script:*

For oracle_to_spark_full:
spark-script=/home/ubuntu/etl/scala/**import_full.sc**

For oracle_to_spark_incremental:
spark-script=/home/ubuntu/etl/scala/**import_incremental.sc**
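
Put together, a job file for an incremental Oracle import might look like the sketch below. `ORDER_ID` is a hypothetical primary-key column used only for illustration, and the fragment shows just the two parameters named above plus the standard Azkaban `type=` key. A real job will likely need further settings that these notes do not cover.

```properties
# Hypothetical job file for an incremental Oracle-to-Spark import
type=oracle_to_spark_incremental

# Primary key of the Oracle table (ORDER_ID is a made-up column name)
id=ORDER_ID

# Import script for the incremental job type
spark-script=/home/ubuntu/etl/scala/import_incremental.sc
```

For oracle_to_spark_full, the same layout applies with `type=oracle_to_spark_full` and `spark-script=/home/ubuntu/etl/scala/import_full.sc`.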
