
Installation Guide

Software Requirements

 Hadoop: cdh5.x, cdh6.x, hdp2.x, EMR5.x, EMR6.x, HDI4.x
 Hive: 0.13 - 1.2.1+
 Spark: 2.4.7
 MySQL: 5.1.17 or above
 JDK: 1.8+
 OS: Linux only, CentOS 6.5+ or Ubuntu 16.04+

Tests passed on Hortonworks HDP2.4, Cloudera CDH 5.7 and 6.3.2, AWS EMR 5.31 and
6.0, Azure HDInsight 4.0.

We recommend trying out or developing Kylin in an integrated sandbox, such as the HDP sandbox, with at least 10 GB of memory allocated to it. When configuring the sandbox, we recommend using the Bridged Adapter model instead of the NAT model.

Hardware Requirements

The minimum configuration for a server running Kylin is a 4-core CPU, 16 GB RAM and 100 GB disk. For high-load scenarios, a 24-core CPU and 64 GB RAM or higher are recommended.

Hadoop Environment

Kylin relies on a Hadoop cluster to handle large data sets. You need to prepare a Hadoop cluster with HDFS, YARN, Hive, ZooKeeper and other services for Kylin to run.
Kylin can be launched on any node of the Hadoop cluster. For convenience, you can run Kylin on the master node. For better stability, it is recommended to deploy Kylin on a clean Hadoop client node, with the Hive, HDFS and other command-line tools installed, and with the client configuration (such as core-site.xml, hive-site.xml and others) properly set up and kept in sync with the other nodes.

The Linux account running Kylin must have access to the Hadoop cluster, including the permission to create and write HDFS folders and Hive tables.

Kylin Installation

 Download an Apache Kylin 4.0.0 binary package from the Apache Kylin Download
Site. For example, the following command line can be used:

cd /usr/local/
wget http://mirror.bit.edu.cn/apache/kylin/apache-kylin-4.0.0/apache-kylin-4.0.0-bin.tar.gz
 Unzip the tarball and configure the environment variable $KYLIN_HOME to point to the Kylin
folder.

tar -zxvf apache-kylin-4.0.0-bin.tar.gz
cd apache-kylin-4.0.0-bin
export KYLIN_HOME=`pwd`

 Run the script to download spark:

$KYLIN_HOME/bin/download-spark.sh

Alternatively, set the SPARK_HOME environment variable to point to an existing Spark 2.4.7 installation.

 Configure MySQL metastore

Kylin 4.0 uses MySQL as metadata storage; make the following configuration in
kylin.properties:

kylin.metadata.url=kylin_metadata@jdbc,driverClassName=com.mysql.jdbc.Driver,url=jdbc:mysql://localhost:3306/kylin_test,username=,password=
kylin.env.zookeeper-connect-string=ip:2181

You need to change the MySQL user name and password, as well as the database and table
where the metadata is stored, and put the MySQL JDBC connector into $KYLIN_HOME/ext/; if
there is no such directory, please create it.
Please refer to "Configure MySQL as Metastore" to learn about the detailed configuration of MySQL
as a metastore.

Kylin tarball structure

 bin: shell scripts to start/stop the Kylin service, back up/restore metadata, and other utility scripts.
 conf: XML configuration files. The function of these files can be found on the configuration page.
 lib: Kylin jar files for external use, such as the Hadoop job jar, JDBC driver, HBase coprocessor jar, etc.
 meta_backups: default backup folder when running “bin/metastore.sh backup”.
 sample_cube: files to create the sample cube and its tables.
 spark: the Spark distribution downloaded by $KYLIN_HOME/bin/download-spark.sh.
 tomcat: the Tomcat web server that runs the Kylin application.
 tool: the jar file for running the utility CLI.

Perform additional steps for some environments

For CDH6.X, EMR5.X and EMR6.X Hadoop environments, you need to perform some
additional steps before starting Kylin.
For a CDH6.X environment, please check the document: Deploy Kylin 4.0 on CDH6
For an EMR environment, please check the document: Deploy Kylin 4.0 on EMR

Checking the operating environment

Kylin runs on a Hadoop cluster and has certain requirements for the version, access rights,
and CLASSPATH of each component. To avoid environmental problems, you can run the script
$KYLIN_HOME/bin/check-env.sh to test your environment. If there are any problems, the
script will print a detailed error message; if there is no error message, your environment is
suitable for Kylin to run.

Start Kylin

Run the script $KYLIN_HOME/bin/kylin.sh start to start Kylin. The console output is
as follows:

Retrieving hadoop conf dir...
KYLIN_HOME is set to /usr/local/apache-kylin-4.0.0-bin
......
A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
Check the log at /usr/local/apache-kylin-4.0.0-bin/logs/kylin.log
Web UI is at http://<hostname>:7070/kylin

Using Kylin

Once Kylin is launched, you can access it in a browser at http://<hostname>:7070/kylin,
replacing <hostname> with the IP address or domain name; the default port is 7070.
The initial username and password are ADMIN/KYLIN.
After the server is started, you can view the runtime log at $KYLIN_HOME/logs/kylin.log.

Stop Kylin

Run the $KYLIN_HOME/bin/kylin.sh stop script to stop Kylin. The console output is as
follows:

Retrieving hadoop conf dir...
KYLIN_HOME is set to /usr/local/apache-kylin-4.0.0-bin
Stopping Kylin: 25964
Stopping in progress. Will check after 2 secs again...
Kylin with pid 25964 has been stopped.

You can run ps -ef | grep kylin to see if the Kylin process has stopped.

HDFS folder structure


Kylin generates files on HDFS. The default root directory is “/kylin”, and the
metadata table name of the Kylin cluster is used as the second-level directory name,
“kylin_metadata” by default (this can be customized in conf/kylin.properties).

Generally, the /kylin/kylin_metadata directory stores data per project; for example,
the data directory of the “learn_kylin” project is /kylin/kylin_metadata/learn_kylin,
which usually includes the following subdirectories:
1. job_tmp: temporary files generated during the execution of tasks.
2. parquet: the cuboid files of each cube.
3. table_snapshot: the dimension table snapshots.

Deploy Kylin on AWS EC2 without Hadoop


Compared with Kylin 3.x, Kylin 4.0 implements a new Spark build engine and Parquet
storage, making it possible to deploy Kylin without a Hadoop environment. Compared
with deploying Kylin 3.x on AWS EMR, deploying Kylin 4 directly on AWS EC2 instances
has the following advantages:
1. Cost saving. AWS EC2 nodes have a lower cost than AWS EMR nodes.
2. More flexible. On EC2 nodes, users can independently select the services and
components they need to install and deploy.
3. No Hadoop dependency. The Hadoop ecosystem is heavy and needs to be maintained at a
certain labor cost. Removing Hadoop brings the deployment closer to cloud-native.

After implementing support for building and querying in Spark Standalone mode, we
tried deploying Kylin 4.0 without Hadoop on AWS EC2 instances, and successfully
built cubes and ran queries.

Environment preparation

 Apply for AWS EC2 Linux instances as required
 Create an Amazon RDS for MySQL instance as the Kylin and Hive metastore
 Use S3 as Kylin’s storage

Component version information

The component versions listed here are the ones we selected during the test. If you
need to use other versions, you can replace them yourself, ensuring compatibility
between component versions.

 JDK 1.8
 Hive 2.3.9
 Zookeeper 3.4.13
 Kylin 4.0 for spark3
 Spark 3.1.1
 Hadoop 3.2.0 (no services need to be started)
Deployment process

1 Configure environment variables

 Modify the profile
 vim /etc/profile

 # Add the following at the end of the profile file
 export JAVA_HOME=/usr/local/java/jdk1.8.0_291
 export JRE_HOME=${JAVA_HOME}/jre
 export HADOOP_HOME=/etc/hadoop/hadoop-3.2.0
 export HIVE_HOME=/etc/hadoop/hive
 export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
 export PATH=$HIVE_HOME/bin:$HIVE_HOME/conf:${HADOOP_HOME}/bin:${JAVA_HOME}/bin:$PATH

 # Execute after saving the contents of the above file
 source /etc/profile
2 Install JDK 1.8

 Download JDK 1.8 to the prepared EC2 instance and unzip it to the
/usr/local/java directory:
 mkdir /usr/local/java
 tar -xvf java-1.8.0-openjdk.tar -C /usr/local/java
3 Configure Hadoop

 Download Hadoop and unzip it

 wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz
 mkdir /etc/hadoop
 tar -xvf hadoop-3.2.0.tar.gz -C /etc/hadoop

 Copy the jar packages required by S3 to the Hadoop class loading path; otherwise
ClassNotFound errors may occur
 cd /etc/hadoop
 cp hadoop-3.2.0/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.375.jar hadoop-3.2.0/share/hadoop/common/lib/
 cp hadoop-3.2.0/share/hadoop/tools/lib/hadoop-aws-3.2.0.jar hadoop-3.2.0/share/hadoop/common/lib/

 Modify core-site.xml to configure the AWS account information and endpoint. The
following is an example:
 <?xml version="1.0" encoding="UTF-8"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <!--
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing,
software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
 See the License for the specific language governing permissions
and
 limitations under the License. See accompanying LICENSE file.
 -->

 <!-- Put site-specific property overrides in this file. -->

 <configuration>
 <property>
 <name>fs.s3a.access.key</name>
 <value>SESSION-ACCESS-KEY</value>
 </property>
 <property>
 <name>fs.s3a.secret.key</name>
 <value>SESSION-SECRET-KEY</value>
 </property>
 <property>
 <name>fs.s3a.endpoint</name>
 <value>s3.$REGION.amazonaws.com</value>
 </property>
 </configuration>
4 Install Hive

 Download Hive and unzip it

 wget https://downloads.apache.org/hive/hive-2.3.9/apache-hive-2.3.9-bin.tar.gz
 tar -xvf apache-hive-2.3.9-bin.tar.gz -C /etc/hadoop
 mv /etc/hadoop/apache-hive-2.3.9-bin /etc/hadoop/hive

 Configure environment variables


 vim /etc/profile

 # Add the following at the end of the profile file
 export HIVE_HOME=/etc/hadoop/hive
 export PATH=$PATH:$HIVE_HOME/bin:$HIVE_HOME/conf

 # Execute after saving the contents of the above file
 source /etc/profile
 Modify hive-site.xml: vim ${HIVE_HOME}/conf/hive-site.xml. Please start the
Amazon RDS for MySQL database in advance to obtain the MySQL connection URI,
user name and password.

Note: Please configure the VPC and security group correctly to ensure that the EC2
instance can access the database.

The sample content of hive-site.xml is as follows:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<configuration>
  <!-- Hive Execution Parameters -->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://host-name:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>admin</value>
    <description>Username to use against metastore database</description>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>
      Enforce metastore schema version consistency.
      True: Verify that version information stored in metastore matches with one from
      Hive jars. Also disable automatic schema migration attempt. Users are required to
      manually migrate schema after Hive upgrade which ensures proper metastore schema
      migration. (Default)
      False: Warn if the version information stored in metastore doesn't match with one
      from in Hive jars.
    </description>
  </property>
</configuration>

 Hive metadata initialization

 # Download the MySQL JDBC jar and place it in the $HIVE_HOME/lib directory
 cp mysql-connector-java-5.1.47.jar $HIVE_HOME/lib
 $HIVE_HOME/bin/schematool -dbType mysql -initSchema
 mkdir $HIVE_HOME/logs
 nohup $HIVE_HOME/bin/hive --service metastore >> $HIVE_HOME/logs/hivemetastorelog.log 2>&1 &

Note: if the following error is reported in this step:

java.lang.NoSuchMethodError:
com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V

this is caused by an inconsistency between the Guava version shipped with Hive 2 and
the one shipped with Hadoop 3. Please replace the Guava jar in the directory
$HIVE_HOME/lib with the Guava jar from the directory
$HADOOP_HOME/share/hadoop/common/lib/.
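As a sketch, the swap can be scripted as follows; swap_guava is a hypothetical helper, and wildcards are used because the exact Guava jar file names depend on your Hive and Hadoop versions:

```shell
# swap_guava HIVE_LIB HADOOP_COMMON_LIB
# Moves Hive's own Guava jar(s) aside and copies in Hadoop's Guava jar.
swap_guava() {
  hive_lib="$1"
  hadoop_lib="$2"
  mkdir -p "$hive_lib/../lib_backup"
  mv "$hive_lib"/guava-*.jar "$hive_lib/../lib_backup/"
  cp "$hadoop_lib"/guava-*.jar "$hive_lib/"
}

# Usage with the paths from this guide:
# swap_guava "$HIVE_HOME/lib" "$HADOOP_HOME/share/hadoop/common/lib"
```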

 To prevent jar package conflicts later on, remove some Spark- and Scala-related
jars from Hive’s class loading path:
 mkdir $HIVE_HOME/spark_jar
 mv $HIVE_HOME/lib/spark-* $HIVE_HOME/spark_jar
 mv $HIVE_HOME/lib/jackson-module-scala_2.11-2.6.5.jar $HIVE_HOME/spark_jar

Note: only the conflicting jars encountered during our test are listed here. If you
encounter similar jar conflicts, you can determine which jars conflict from the class
loading path and remove the relevant ones. When the same jar has version conflicts, it
is recommended to keep the version on the Spark class loading path.

5 Deploy Spark Standalone

 Download Spark 3.1.1 and unzip it

 wget http://archive.apache.org/dist/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz
 tar -xvf spark-3.1.1-bin-hadoop3.2.tgz -C /etc/hadoop
 mv /etc/hadoop/spark-3.1.1-bin-hadoop3.2 /etc/hadoop/spark
 export SPARK_HOME=/etc/hadoop/spark

 Copy the jar packages required by S3:

 cp $HADOOP_HOME/share/hadoop/tools/lib/hadoop-aws-3.2.0.jar $SPARK_HOME/jars
 cp $HADOOP_HOME/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.375.jar $SPARK_HOME/jars
 cp mysql-connector-java-5.1.47.jar $SPARK_HOME/jars

 Copy hive-site.xml to Spark’s conf directory (the MySQL JDBC connector was
already copied to $SPARK_HOME/jars in the previous step)

 cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf

 Start the Spark master and worker

 $SPARK_HOME/sbin/start-master.sh
 $SPARK_HOME/sbin/start-worker.sh spark://hostname:7077
6 Deploy ZooKeeper

 Download ZooKeeper and unzip it

 wget http://archive.apache.org/dist/zookeeper/zookeeper-3.4.13/zookeeper-3.4.13.tar.gz
 tar -xvf zookeeper-3.4.13.tar.gz -C /etc/hadoop
 mv /etc/hadoop/zookeeper-3.4.13 /etc/hadoop/zookeeper

 Prepare the ZooKeeper configuration files. Since only one EC2 node is used in the
test, a ZooKeeper pseudo-cluster is deployed here.
 cp /etc/hadoop/zookeeper/conf/zoo_sample.cfg /etc/hadoop/zookeeper/conf/zoo1.cfg
 cp /etc/hadoop/zookeeper/conf/zoo_sample.cfg /etc/hadoop/zookeeper/conf/zoo2.cfg
 cp /etc/hadoop/zookeeper/conf/zoo_sample.cfg /etc/hadoop/zookeeper/conf/zoo3.cfg
 Modify the three configuration files in sequence and add the following contents.
Note that each file must use its own dataDir, dataLogDir and clientPort (e.g. 2181 for
zk1, 2182 for zk2, 2183 for zk3), since the three instances run on one host:
 server.1=localhost:2287:3387
 server.2=localhost:2288:3388
 server.3=localhost:2289:3389
 dataDir=/tmp/zookeeper/zk1/data
 dataLogDir=/tmp/zookeeper/zk1/log
 clientPort=2181
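The three configuration files can also be generated with a short script. This is a sketch under stated assumptions: write_zoo_cfgs is a hypothetical helper, it writes a minimal config (standard tickTime/initLimit/syncLimit values) instead of editing a copy of zoo_sample.cfg, and it gives each instance its own client port (2181-2183) since all three share one host:

```shell
# write_zoo_cfgs CONF_DIR — generate zoo1.cfg, zoo2.cfg and zoo3.cfg
# for a single-host ZooKeeper pseudo-cluster.
write_zoo_cfgs() {
  conf_dir="$1"
  for i in 1 2 3; do
    cat > "$conf_dir/zoo$i.cfg" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper/zk$i/data
dataLogDir=/tmp/zookeeper/zk$i/log
clientPort=$((2180 + i))
server.1=localhost:2287:3387
server.2=localhost:2288:3388
server.3=localhost:2289:3389
EOF
  done
}

# Usage with the conf path from this guide:
# write_zoo_cfgs /etc/hadoop/zookeeper/conf
```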

 Create the required folders and files; write the corresponding server id (1, 2 or 3)
into each myid file:

 mkdir -p /tmp/zookeeper/zk1/data
 mkdir -p /tmp/zookeeper/zk1/log
 mkdir -p /tmp/zookeeper/zk2/data
 mkdir -p /tmp/zookeeper/zk2/log
 mkdir -p /tmp/zookeeper/zk3/data
 mkdir -p /tmp/zookeeper/zk3/log
 vim /tmp/zookeeper/zk1/data/myid
 vim /tmp/zookeeper/zk2/data/myid
 vim /tmp/zookeeper/zk3/data/myid
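The myid files can also be written non-interactively; a minimal sketch using the paths above (each file holds only its instance's server id, matching the server.N entries):

```shell
# Create the data/log directories and write each instance's server id
# (1, 2 or 3) into its myid file.
for i in 1 2 3; do
  mkdir -p /tmp/zookeeper/zk$i/data /tmp/zookeeper/zk$i/log
  echo $i > /tmp/zookeeper/zk$i/data/myid
done
```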

 Start the ZooKeeper cluster

 /etc/hadoop/zookeeper/bin/zkServer.sh start /etc/hadoop/zookeeper/conf/zoo1.cfg
 /etc/hadoop/zookeeper/bin/zkServer.sh start /etc/hadoop/zookeeper/conf/zoo2.cfg
 /etc/hadoop/zookeeper/bin/zkServer.sh start /etc/hadoop/zookeeper/conf/zoo3.cfg
7 Set up Kylin

 Download the Kylin 4.0 binary package and unzip it

 wget https://mirror-hk.koddos.net/apache/kylin/apache-kylin-4.0.0/apache-kylin-4.0.0-bin.tar.gz
 tar -xvf apache-kylin-4.0.0-bin.tar.gz -C /etc/hadoop
 export KYLIN_HOME=/etc/hadoop/apache-kylin-4.0.0-bin
 mkdir $KYLIN_HOME/ext
 cp mysql-connector-java-5.1.47.jar $KYLIN_HOME/ext

 Modify kylin.properties: vim $KYLIN_HOME/conf/kylin.properties

 kylin.metadata.url=kylin_metadata@jdbc,url=jdbc:mysql://hostname:3306/kylin,username=root,password=password,maxActive=10,maxIdle=10
 kylin.env.zookeeper-connect-string=hostname
 kylin.engine.spark-conf.spark.master=spark://hostname:7077
 kylin.engine.spark-conf.spark.submit.deployMode=client
 kylin.env.hdfs-working-dir=s3://bucket/kylin
 kylin.engine.spark-conf.spark.eventLog.dir=s3://bucket/kylin/spark-history
 kylin.engine.spark-conf.spark.history.fs.logDirectory=s3://bucket/kylin/spark-history
 kylin.engine.spark-conf.spark.yarn.jars=s3://bucket/spark2_jars/*
 kylin.query.spark-conf.spark.master=spark://hostname:7077
 kylin.query.spark-conf.spark.yarn.jars=s3://bucket/spark2_jars/*

 Execute bin/kylin.sh start

 Kylin may encounter ClassNotFound errors during startup. Please refer to the
following methods, then restart Kylin:
 # Download commons-collections-3.2.2.jar
 cp commons-collections-3.2.2.jar $KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/lib/
 # Download commons-configuration-1.3.jar
 cp commons-configuration-1.3.jar $KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/lib/
 cp $HADOOP_HOME/share/hadoop/common/lib/aws-java-sdk-bundle-1.11.375.jar $KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/lib/
 cp $HADOOP_HOME/share/hadoop/common/lib/hadoop-aws-3.2.0.jar $KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/lib/

Deploy in Cluster Mode

Kylin instances are stateless services; runtime state information is stored in the MySQL
metastore. For load balancing purposes, you can start multiple Kylin instances that share
one metastore, so that the nodes share the query pressure and back each other up,
improving service availability. The following figure depicts a typical scenario of a
Kylin cluster-mode deployment:

Kylin Node Configuration

If you need to cluster multiple Kylin nodes, make sure they use the same Hadoop cluster.
Then do the following steps in each node’s configuration file
$KYLIN_HOME/conf/kylin.properties:

1. Configure the same kylin.metadata.url value, so that all Kylin nodes use the
same MySQL metastore.
2. Configure the Kylin node list kylin.server.cluster-servers, including all
nodes (the current node included). When an event changes, the node receiving the
change needs to notify all other nodes.
3. Configure the running mode kylin.server.mode of the Kylin node. Optional
values are all, job and query; the default is all.
The job mode means the service is only used for job scheduling, not for queries;
the query mode means the service is only used for queries, not for scheduling
jobs; the all mode serves both job scheduling and queries.
Note: by default, only one instance can be used for job scheduling (i.e.,
kylin.server.mode set to all or job).
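As a sketch of the three steps above (the hostnames and credentials here are placeholders, not values from this guide), each node's kylin.properties in a two-node cluster might contain:

```properties
# 1. Same MySQL metastore on every node.
kylin.metadata.url=kylin_metadata@jdbc,url=jdbc:mysql://mysql-host:3306/kylin,username=kylin,password=password,driverClassName=com.mysql.jdbc.Driver
# 2. All nodes, the current node included.
kylin.server.cluster-servers=kylin-node1:7070,kylin-node2:7070
# 3. Running mode: "all" or "job" on the scheduling node, "query" elsewhere.
kylin.server.mode=all
```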

Enable Job Engine HA

Since v2.0, Kylin supports multiple job engines running together, which is more extensible,
available and reliable than the default job scheduler.

To enable the distributed job scheduler, you need to set or update two configs in
kylin.properties:

kylin.job.scheduler.default=2
kylin.job.lock=org.apache.kylin.job.lock.zookeeper.ZookeeperJobLock

Then please add all job servers and query servers to the kylin.server.cluster-servers.

Use CuratorScheduler

Since v3.0.0-alpha, Kylin has introduced a leader/follower multi-job-engine
scheduler based on Curator. Users can set the following configuration to enable
CuratorScheduler:

kylin.job.scheduler.default=100
kylin.server.self-discovery-enabled=true

For more details about the kylin job scheduler, please refer to Apache Kylin Wiki.

Installing a load balancer

To send query requests to a cluster instead of a single node, you can deploy a load balancer
such as Nginx, F5 or cloudlb, etc., so that the client and load balancer communication
instead communicate with a specific Kylin instance.

Read and write separation deployment

There are some differences between the read/write separation deployments of Kylin 4 and
Kylin 3. Please refer to: Read Write Separation Deployment for Kylin 4

Run Kylin with Docker


To allow users to easily try Kylin, and to facilitate developers in verifying and
debugging modified source code, we provide Kylin’s Docker image. In this image, each
service that Kylin relies on is properly installed and deployed, including:

 JDK 1.8
 Hadoop 2.8.5
 Hive 1.2.1
 Spark 2.4.7
 Kafka 1.1.1
 MySQL 5.1.73
 Zookeeper 3.4.6

Quickly try Kylin

We have pushed the Kylin image to Docker Hub. Users do not need to build the image
locally; just execute the following command to pull it from Docker Hub:

docker pull apachekylin/apache-kylin-standalone:4.0.0

After the pull is successful, execute the following command to start the container:

docker run -d \
-m 8G \
-p 7070:7070 \
-p 8088:8088 \
-p 50070:50070 \
-p 8032:8032 \
-p 8042:8042 \
-p 2181:2181 \
apachekylin/apache-kylin-standalone:4.0.0

The following services are automatically started when the container starts:

 NameNode, DataNode
 ResourceManager, NodeManager
 Kylin

and $KYLIN_HOME/bin/sample.sh is run automatically.

After the container is started, we can enter it through the docker exec -it
<container_id> bash command. Of course, since the specified container ports are mapped
to local ports, we can open the pages of each service directly in the local browser,
such as:

 Kylin Web UI: http://127.0.0.1:7070/kylin/login


 Hdfs NameNode Web UI: http://127.0.0.1:50070
 Yarn ResourceManager Web UI: http://127.0.0.1:8088

Container resource recommendation

To allow Kylin to build cubes smoothly, the memory resource configured for the Yarn
NodeManager is 6 GB. Adding the memory occupied by each service, please ensure that the
container has no less than 8 GB of memory, so as to avoid errors due to insufficient
memory.

For the resource setting method for the container, please refer to:

 Mac user: https://docs.docker.com/docker-for-mac/#advanced


 Linux user: https://docs.docker.com/config/containers/resource_constraints/#memory

For how to customize the image, please check the github page kylin/docker.

Advanced Settings
Overwrite default kylin.properties at Cube level

In conf/kylin.properties there are many parameters which control or impact Kylin’s
behavior. Most parameters are global configs, such as security- or job-related ones,
while some are Cube-related. The Cube-related parameters can be customized at each Cube
level, so you can control the behavior more flexibly. The GUI for this is the
“Configuration Overwrites” step of the Cube wizard, as in the screenshot below.

Overwrite default Spark conf at Cube level

The configurations for Spark are managed in conf/kylin.properties with the prefix
kylin.engine.spark-conf.. For example, if you want to use the job queue “myQueue” to
run Spark, setting “kylin.engine.spark-conf.spark.yarn.queue=myQueue” will let Spark
receive “spark.yarn.queue=myQueue” when submitting applications. These parameters can be
configured at the Cube level, which overrides the default values in
conf/kylin.properties.
Allocate more memory to Kylin instance

Open bin/setenv.sh, which has two sample settings for the KYLIN_JVM_SETTINGS
environment variable. The default setting is small (4 GB max); you can comment it out
and un-comment the next line to allocate 16 GB:

export KYLIN_JVM_SETTINGS="-Xms1024M -Xmx4096M -Xss1024K -XX:MaxPermSize=128M -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"
# export KYLIN_JVM_SETTINGS="-Xms16g -Xmx16g -XX:MaxPermSize=512m -XX:NewSize=3g -XX:MaxNewSize=3g -XX:SurvivorRatio=4 -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=70 -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError"
Enable multiple job engines (HA)

Since Kylin 2.0, Kylin has supported multiple job engines running together, which is more
extensible, available and reliable than the default job scheduler.

To enable the distributed job scheduler, you need to set or update the configs in the
kylin.properties:

kylin.job.scheduler.default=2
kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperJobLock

Please add all job servers and query servers to the kylin.server.cluster-servers.

Enable LDAP or SSO authentication

Check How to Enable Security with LDAP and SSO

Enable email notification

Kylin can send email notifications on job completion or failure. To enable this, edit
conf/kylin.properties and set the following parameters:

mail.enabled=true
mail.host=your-smtp-server
mail.username=your-smtp-account
mail.password=your-smtp-pwd
mail.sender=your-sender-address
kylin.job.admin.dls=administrator-address

Restart the Kylin server for the change to take effect. To disable, set mail.enabled
back to false.
The administrator will get notifications for all jobs. Modelers and analysts need to
enter their email address into the “Notification List” on the first page of the cube
wizard, and will then get notified for that cube.

Enable MySQL as Kylin metadata storage

Kylin can use MySQL as the metadata storage for scenarios where HBase is not the best
option. To enable this, perform the following steps:

 Install a MySQL server, e.g. v5.1.17;
 Create a new MySQL database for Kylin metadata, for example “kylin_metadata”;
 Download and copy the MySQL JDBC connector “mysql-connector-java-<version>.jar” to
$KYLIN_HOME/ext (if the folder does not exist, create it yourself);
 Edit conf/kylin.properties and set the following parameters:

kylin.metadata.url={your_metadata_tablename}@jdbc,url=jdbc:mysql://localhost:3306/kylin,username={your_username},password={your_password},driverClassName=com.mysql.jdbc.Driver
kylin.metadata.jdbc.dialect=mysql
kylin.metadata.jdbc.json-always-small-cell=true
kylin.metadata.jdbc.small-cell-meta-size-warning-threshold=100mb
kylin.metadata.jdbc.small-cell-meta-size-error-threshold=1gb
kylin.metadata.jdbc.max-cell-size=1mb

In “kylin.metadata.url” more configuration items can be added. The url, username and
password are required items; if not configured, the default values will be used:

url: the JDBC connection URL;
username: the JDBC user name;
password: the JDBC password; if encryption is selected, put the encrypted password here;
driverClassName: the JDBC driver class name; the default value is com.mysql.jdbc.Driver;
maxActive: the maximum number of database connections; the default value is 5;
maxIdle: the maximum number of idle connections; the default value is 5;
maxWait: the maximum number of milliseconds to wait for a connection; the default value is 1000;
removeAbandoned: whether to automatically reclaim timed-out connections; the default value is true;
removeAbandonedTimeout: the number of seconds before timeout; the default is 300;
passwordEncrypted: whether the JDBC password is encrypted; the default is false;

 You can encrypt your password:

cd $KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/lib
java -classpath kylin-server-base-<version>.jar:kylin-core-common-<version>.jar:spring-beans-4.3.10.RELEASE.jar:spring-core-4.3.10.RELEASE.jar:commons-codec-1.7.jar org.apache.kylin.rest.security.PasswordPlaceholderConfigurer AES <your_password>

 Start Kylin
