100% found this document useful (1 vote)

679 views17 pages

Installation Guide Apache Kylin

Uploaded by

Jose

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

100% found this document useful (1 vote)

679 views17 pages

Installation Guide Apache Kylin

Uploaded by

Jose

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 17

Installation Guide

Software Requirements

 Hadoop: cdh5.x, cdh6.x, hdp2.x, EMR5.x, EMR6.x, HDI4.x

 Hive: 0.13 - 1.2.1+
 Spark: 2.4.7
 Mysql: 5.1.17 及以上
 JDK: 1.8+
 OS: Linux only, CentOS 6.5+ or Ubuntu 16.0.4+

Tests passed on Hortonworks HDP2.4, Cloudera CDH 5.7 and 6.3.2, AWS EMR 5.31 and
6.0, Azure HDInsight 4.0.

We recommend you to try out Kylin or develop it using the integrated sandbox, such as
HDP sandbox, and make sure it has at least 10 GB of memory. When configuring a
sandbox, we recommend that you use the Bridged Adapter model instead of the NAT
model.

Hardware Requirements

The minimum configuration of a server running Kylin is 4 core CPU, 16 GB RAM and 100
GB disk. For high-load scenarios, a 24-core CPU, 64 GB RAM or higher is recommended.

Hadoop Environment

Kylin relies on Hadoop clusters to handle large data sets. You need to prepare a Hadoop
cluster with HDFS, YARN, Hive, Zookeeper and other services for Kylin to run.
Kylin can be launched on any node in a Hadoop cluster. For convenience, you can run
Kylin on the master node. For better stability, it is recommended to deploy Kylin on a clean
Hadoop client node with Hive, HDFS and other command lines installed and client
configuration (such as core-site.xml, hive-site.xmland others) are also reasonably
configured and can be automatically synchronized with other nodes.

Linux accounts running Kylin must have access to the Hadoop cluster, including the
permission to create/write HDFS folders, Hive tables.

Kylin Installation

 Download a Apache kylin 4.0.0 binary package from the Apache Kylin Download
Site. For example, the following command line can be used:

cd /usr/local/
wget http://mirror.bit.edu.cn/apache/kylin/apache-kylin-4.0.0/apache-
kylin-4.0.0-bin.tar.gz
 Unzip the tarball and configure the environment variable $KYLIN_HOME to the Kylin
folder.

tar -zxvf apache-kylin-4.0.0-bin.tar.gz

cd apache-kylin-4.0.0-bin
export KYLIN_HOME=`pwd`

 Run the script to download spark:

$KYLIN_HOME/bin/download-spark.sh

Or configure SPARK_HOME points to the path of spark2.4.7 in the environment.

 Configure MySQL metastore

Kylin 4.0 uses MySQL as metadata storage, make the following configuration in
kylin.properties:

kylin.metadata.url=kylin_metadata@jdbc,driverClassName=com.mysql.jdbc.Dri
ver,url=jdbc:mysql//localhost:3306/kylin_test,username=,password=
kylin.env.zookeeper-connect-string=ip:2181

You need to change the Mysql user name and password, as well as the database and table
where the metadata is stored. And put mysql jdbc connector into $KYLIN_HOME/ext/, if
there is no such directory, please create it.
Please refer to 配置 Mysql 为 Metastore learn about the detailed configuration of MySQL
as a Metastore.

Kylin tarball structure

 bin: shell scripts to start/stop Kylin service, backup/restore metadata, as well as

some utility scripts.
 conf: XML configuration files. The function of these xml files can be found in
configuration page
 lib: Kylin jar files for external use, like the Hadoop job jar, JDBC driver, HBase
coprocessor jar, etc.
 meta_backups: default backup folder when run “bin/metastore.sh backup”;
 sample_cube: files to create the sample cube and its tables.
 spark: Spark by $KYLIN_HOME/bin/download.sh download.
 tomcat the tomcat web server that run Kylin application.
 tool: the jar file for running utility CLI.

Perform additional steps for some environments

For Hadoop environment of CDH6.X, EMR5.X, EMR6.X, you need to perform some
additional steps before starting kylin.
For CDH6.X environment, please check the document: Deploy kylin4.0 on CDH6
For EMR environment, please check the document: Deploy kylin4.0 on EMR

Checking the operating environment

Kylin runs on a Hadoop cluster and has certain requirements for the version, access rights,
and CLASSPATH of each component. To avoid various environmental problems, you can
run the script, $KYLIN_HOME/bin/check-env.sh to have a test on your environment, if
there are any problems with your environment, the script will print a detailed error
message. If there is no error message, it means that your environment is suitable for Kylin
to run.

Start Kylin

Run the script, $KYLIN_HOME/bin/kylin.sh start , to start Kylin. The interface output is
as follows:

Retrieving hadoop conf dir...

KYLIN_HOME is set to /usr/local/apache-kylin-4.0.0-bin
......
A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
Check the log at /usr/local/apache-kylin-4.0.0-bin/logs/kylin.log
Web UI is at http://<hostname>:7070/kylin

Using Kylin

Once Kylin is launched, you can access it via the browser

http://<hostname>:7070/kylin with
specifying <hostname> with IP address or domain name, and the default port is 7070.
The initial username and password are ADMIN/KYLIN.
After the server is started, you can view the runtime log, $KYLIN_HOME/logs/kylin.log.

Stop Kylin

Run the $KYLIN_HOME/bin/kylin.sh stop script to stop Kylin. The console output is as
follows:

Retrieving hadoop conf dir...

KYLIN_HOME is set to /usr/local/apache-kylin-4.0.0-bin
Stopping Kylin: 25964
Stopping in progress. Will check after 2 secs again...
Kylin with pid 25964 has been stopped.

You can run ps -ef | grep kylin to see if the Kylin process has stopped.

HDFS folder structure

Kylin will generate files on HDFS. The default root directory is “kylin/”, and then the
metadata table name of kylin cluster will be used as the second layer directory name, and
the default is “kylin_metadata”(can be customized in conf/kylin.properties)

Generally, /kylin/kylin_metadata directory stores data according to different projects,

such as data directory of “learn_kylin” project is /kylin/kylin_metadata/learn_kylin,
which usually includes the following subdirectories:
1.job_tmp: store temporary files generated during the execution of tasks.
2.parquet: the cuboid file of each cube.
3.table_snapshot: stores the dimension table snapshot.

Deploy kylin on AWS EC2 without hadoop

Compared with Kylin 3.x, Kylin 4.0 implements a new Spark build engine and parquet
storage, making it possible for Kylin to deploy without Hadoop environment. Compared
with deploying Kylin 3.x on AWS EMR, deploying kylin4 directly on AWS EC2 instances
has the following advantages:
1. Cost saving. Compared with AWS EMR node, AWS EC2 node has lower cost.
2. More flexible. On the EC2 node, users can more independently select the services and
components they need for installation and deployment.
3. Remove Hadoop dependency. Hadoop ecology is heavy and needs to be maintained at a
certain labor cost. Remove hadoop can be closer to the cloud-native.

After realizing the feature of supporting build and query in Spark Standalone mode, we
tried to deploy Kylin 4.0 without Hadoop on the EC2 instance of AWS, and successfully
built the cube and query.

Environment preparation

 Apply for AWS EC2 Linux instances as required

 Create Amazon RDS for MySQL as kylin and hive metabases
 S3 as kylin’s storage

Component version information

The component version information provided here is that we selected during the test. If
users need to use other versions for deployment, you can replace them by yourself and
ensure the compatibility between component versions.

 JDK 1.8
 Hive 2.3.9
 Zookeeper 3.4.13
 Kylin 4.0 for spark3
 Spark 3.1.1
 Hadoop 3.2.0（No startup required）
Deployment process

1 Configure environment variables

 Modify profile
 vim /etc/profile

 # Add the following at the end of the profile file
 export JAVA_HOME=/usr/local/java/jdk1.8.0_291
 export JRE_HOME=${JAVA_HOME}/jre
 export HADOOP_HOME=/etc/hadoop/hadoop-3.2.0
 export HIVE_HOME=/etc/hadoop/hive
 export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
 export PATH=$HIVE_HOME/bin:$HIVE_HOME/conf:${HADOOP_HOME}/bin:$
{JAVA_HOME}/bin:$PATH

 # Execute after saving the contents of the above file
 source /etc/profile
2 Install JDK 1.8

 Download JDK1.8 to the prepared EC2 instance and unzip it to the

/usr/local/Java directory:
 mkdir /usr/local/java
 tar -xvf java-1.8.0-openjdk.tar -C /usr/local/java
3 Config Hadoop

 Download Hadoop and unzip it

 wget https://archive.apache.org/dist/hadoop/common/hadoop-
3.2.0/hadoop-3.2.0.tar.gz
 mkdir /etc/hadoop
 tar -xvf hadoop-3.2.0.tar.gz -C /etc/hadoop

 Copy the jar package required by S3 to the Hadoop class loading path, otherwise an
error of ClassNotFound type may occur
 cd /etc/hadoop
 cp hadoop-3.2.0/share/hadoop/tools/lib/aws-java-sdk-bundle-
1.11.375.jar hadoop-3.2.0/share/hadoop/common/lib/
 cp hadoop-3.2.0/share/hadoop/tools/lib/hadoop-aws-3.2.0.jar hadoop-
3.2.0/share/hadoop/common/lib/

 Modify core-site.xml，config AWS account information and endpoint. The

following is an example:
 <?xml version="1.0" encoding="UTF-8"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 

 

 <configuration>
 <property>
 <name>fs.s3a.access.key</name>
 <value>SESSION-ACCESS-KEY</value>
 </property>
 <property>
 <name>fs.s3a.secret.key</name>
 <value>SESSION-SECRET-KEY</value>
 </property>
 <property>
 <name>fs.s3a.endpoint</name>
 <value>s3.$REGION.amazonaws.com</value>
 </property>
 </configuration>
4 Install Hive

 Download Hive and unzip it

 wget https://downloads.apache.org/hive/hive-2.3.9/apache-hive-
2.3.9-bin.tar.gz
 tar -xvf apache-hive-2.3.9-bin.tar.gz -C /etc/hadoop
 mv /etc/hadoop/apache-hive-2.3.9-bin /etc/hadoop/hive

 Configure environment variables

 vim /etc/profile

 # Add the following at the end of the profile file
 export HIVE_HOME=/etc/hadoop/hive
 export PATH=$PATH:$HIVE_HOME/bin:$HIVE_HOME/conf

 # Execute after saving the contents of the above file
 source /etc/profile
 Modify hive-site.xml, vim ${HIVE_HOME}/conf/hive-site.xml. Please start
Amazon RDS for MySQL database in advance to obtain the mysql connection URI,
user name and password.

Note: Please configure VPC and security group correctly to ensure that EC2
instances can access the database normally.

The sample content of hive-site.xml is as follows:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
Licensed to the Apache Software Foundation (ASF) under one or
more
contributor license agreements. See the NOTICE file distributed
with
this work for additional information regarding copyright
ownership.
The ASF licenses this file to You under the Apache License,
Version 2.0
(the "License"); you may not use this file except in compliance
with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,

software
distributed under the License is distributed on an "AS IS"
BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
See the License for the specific language governing permissions
and
limitations under the License.
--><configuration>




<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
<description>password to use against metastore
database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://host-name:3306/hive?
createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC
metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC
metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>admin</value>
<description>Username to use against metastore
database</description>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>
Enforce metastore schema version consistency.
True: Verify that version information stored in metastore
matches with one from Hive jars. Also disable automatic
schema migration attempt. Users are required to
manually migrate schema after Hive upgrade which ensures
proper metastore schema migration. (Default)
False: Warn if the version information stored in metastore
doesn't match with one from in Hive jars.
</description>
</property>
</configuration>

 Hive metadata initialization

 # Download the jar package of MySQL JDBC and place it in
$HIVE_HOME/lib directory
 cp mysql-connector-java-5.1.47.jar $HIVE_HOME/lib
 bin/schematool -dbType mysql -initSchema
 mkdir $HIVE_HOME/logs
 nohup $HIVE_HOME/bin/hive --service metastore >>
$HIVE_HOME/logs/hivemetastorelog.log 2>&1 &

Note：If the following error is reported in this step:

java.lang.NoSuchMethodError:
com.google.common.base.Preconditions.checkArgument(ZLjava/lang/Stri
ng;Ljava/lang/Object;)V

This is caused by the inconsistency between the guava version in hive2 and the
guava version in Hadoop3. Please replace the guava jar in directory
$HIVE_HOME/lib with the guava jar in directory
$HADOOP_HOME/share/hadoop/common/lib/.

 To prevent jar package conflicts in the subsequent process, you need to remove
some spark and scala related jar packages from hive’s class loading path:
 mkdir $HIVE_HOME/spark_jar
 mv $HIVE_HOME/lib/spark-* $HIVE_HOME/spark_jar
 mv $HIVE_HOME/lib/jackson-module-scala_2.11-2.6.5.jar
$HIVE_HOME/spark_jar

Note: Here just lists the conflicting jar packages encountered during the test. If users
encounter problems similar to jar package conflicts, you can judge which jar
packages have conflicts according to the class loading path and remove the relevant
jar packages. It is recommended to keep the jar package version under the spark
class loading path when the same jar package has version conflicts.

5 Deploy Spark Standalone

 Download Spark 3.1.1 and unzip it

 wget http://archive.apache.org/dist/spark/spark-3.1.1/spark-3.1.1-
bin-hadoop3.2.tgz
 tar -xvf spark-3.1.1-bin-hadoop3.2.tgz -C /etc/hadoop
 mv /etc/hadoop/spark-3.1.1-bin-hadoop3.2 /etc/hadoop/spark
 export SPARK_HOME=/etc/hadoop/spark

 Copy jar package required by S3:

 cp $HADOOP_HOME/share/hadoop/tools/lib/hadoop-aws-3.2.0.jar
$SPARK_HOME/jars
 cp $HADOOP_HOME/share/hadoop/tools/lib/aws-java-sdk-bundle-
1.11.375.jar $SPARK_HOME/jars
 cp mysql-connector-java-5.1.47.jar $SPARK_HOME/jars

 Copy hive-site.xml and mysql-jdbc

 cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf

 Setup Spark master and worker

 $SPARK_HOME/bin/start-master.sh
 $SPARK_HOME/bin/start-worker.sh spark://hostname:7077
6 Deploy Zookeeper

 Download zookeeper and unzip it

 wget http://archive.apache.org/dist/zookeeper/zookeeper-
3.4.13/zookeeper-3.4.13.tar.gz
 tar -xvf zookeeper-3.4.13.tar.gz -C /etc/hadoop
 mv /etc/hadoop/zookeeper-3.4.13 /etc/hadoop/zookeeper

 Preparing the zookeeper configuration file. Since only one EC2 node is used in the
test, the zookeeper pseudo cluster is deployed here.
 cp /etc/hadoop/zookeeper/conf/zoo_sample.cfg
/etc/hadoop/zookeeper/conf/zoo1.cfg
 cp /etc/hadoop/zookeeper/conf/zoo_sample.cfg
/etc/hadoop/zookeeper/conf/zoo2.cfg
 cp /etc/hadoop/zookeeper/conf/zoo_sample.cfg
/etc/hadoop/zookeeper/conf/zoo3.cfg
 Modify the above three configuration files in sequence and add the following
contents, note that change the directory name to a different directory:
 server.1=localhost:2287:3387
 server.2=localhost:2288:3388
 server.3=localhost:2289:3389
 dataDir=/tmp/zookeeper/zk1/data
 dataLogDir=/tmp/zookeeper/zk1/log
 clientPort=2181

 Create the required folders and files:

 mkdir /tmp/zookeeper/zk1/data
 mkdir /tmp/zookeeper/zk1/log
 mkdir /tmp/zookeeper/zk2/data
 mkdir /tmp/zookeeper/zk2/log
 mkdir /tmp/zookeeper/zk3/data
 mkdir /tmp/zookeeper/zk3/log
 vim /tmp/zookeeper/zk1/data/myid
 vim /tmp/zookeeper/zk2/data/myid
 vim /tmp/zookeeper/zk3/data/myid

 Setup zookeeper cluster

 /etc/hadoop/zookeeper/bin/zkServer.sh start
/etc/hadoop/zookeeper/conf/zoo1.cfg
 /etc/hadoop/zookeeper/bin/zkServer.sh start
/etc/hadoop/zookeeper/conf/zoo2.cfg
 /etc/hadoop/zookeeper/bin/zkServer.sh start
/etc/hadoop/zookeeper/conf/zoo3.cfg
7 Setup kylin

 Download kylin 4.0 binary package and unzip it

 wget https://mirror-hk.koddos.net/apache/kylin/apache-kylin-
4.0.0/apache-kylin-4.0.0-bin.tar.gz
 tar -xvf apache-kylin-4.0.0-bin.tar.gz /etc/hadoop
 export KYLIN_HOME=/etc/hadoop/apache-kylin-4.0.0-bin
 mkdir $KYLIN_HOME/ext
 cp mysql-connector-java-5.1.47.jar $KYLIN_HOME/ext

 Modify kylin.properties vim $KYLIN_HOME/conf/kylin.properties

 kylin.metadata.url=kylin_metadata@jdbc,url=jdbc:mysql://hostname:33
06/kylin,username=root,password=password,maxActive=10,maxIdle=10
 kylin.env.zookeeper-connect-string=hostname
 kylin.engine.spark-conf.spark.master=spark://hostname:7077
 kylin.engine.spark-conf.spark.submit.deployMode=client
 kylin.env.hdfs-working-dir=s3://bucket/kylin
 kylin.engine.spark-conf.spark.eventLog.dir=s3://bucket/kylin/spark-
history
 kylin.engine.spark-
conf.spark.history.fs.logDirectory=s3://bucket/kylin/spark-history
 kylin.engine.spark-conf.spark.yarn.jars=s3://bucket/spark2_jars/*
 kylin.query.spark-conf.spark.master=spark://hostname:7077
 kylin.query.spark-conf.spark.yarn.jars=s3://bucket/spark2_jars/*

 Execute bin/kylin.sh start

 Kylin may encounter ClassNotFound type errors during startUp. Please refer to the
following methods to restart kylin:
 # Download commons-collections-3.2.2.jar
 cp commons-collections-3.2.2.jar
$KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/lib/
 # Download commons-configuration-1.3.jar
 cp commons-configuration-1.3.jar
$KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/lib/
 cp $HADOOP_HOME/share/hadoop/common/lib/aws-java-sdk-bundle-
1.11.563.jar $KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/lib/
 cp $HADOOP_HOME/share/hadoop/common/lib/hadoop-aws-3.2.2.jar
$HADOOP_HOME/tomcat/webapps/kylin/WEB-INF/lib/

Deploy in Cluster Mode

Kylin instances are stateless services, and runtime state information is stored in the Mysql
metastore. For load balancing purposes, you can enable multiple Kylin instances that share
a metastore, so that each node shares query pressure and backs up each other, improving
service availability. The following figure depicts a typical scenario for Kylin cluster mode
deployment:

Kylin Node Configuration

If you need to cluster multiple Kylin nodes, make sure they use the same Hadoop cluster.
Then do the following steps in each node’s configuration file
$KYLIN_HOME/conf/kylin.properties:

1. Configure the same kylin.metadata.url value to configure all Kylin nodes to use
the same Mysql metastore.
2. Configure the Kylin node list kylin.server.cluster-servers, including all
nodes (the current node is also included). When the event changes, the node
receiving the change needs to notify all other nodes (the current node is also
included).
3. Configure the running mode kylin.server.mode of the Kylin node. Optional
values include all, job, query. The default value is all.
The job mode means that the service is only used for job scheduling, not for queries;
the query pattern means that the service is only used for queries, not for scheduling
jobs; the all pattern represents the service for both job scheduling and queries.
Note: By default, only one instance can be used for the job scheduling (ie.,
kylin.server.mode is set to all or job).

Enable Job Engine HA

Since v2.0, Kylin supports multiple job engines running together, which is more extensible,
available and reliable than the default job scheduler.

To enable the distributed job scheduler, you need to set or update the configs in the
kylin.properties, there are two configuration options:

kylin.job.scheduler.default=2
kylin.job.lock=org.apache.kylin.job.lock.zookeeper.ZookeeperJobLock

Then please add all job servers and query servers to the kylin.server.cluster-servers.

Use CuratorScheculer

Since v3.0.0-alpha, kylin introduces the Leader/Follower mode multiple job engines
scheduler based on Curator. Users can modify the following configuration to enable
CuratorScheduler:

kylin.job.scheduler.default=100
kylin.server.self-discovery-enabled=true

For more details about the kylin job scheduler, please refer to Apache Kylin Wiki.

Installing a load balancer

To send query requests to a cluster instead of a single node, you can deploy a load balancer
such as Nginx, F5 or cloudlb, etc., so that the client and load balancer communication
instead communicate with a specific Kylin instance.

Read and write separation deployment

There are some differences between read and write separation deployment of kylin 4 and
kylin 3, Please refer to : Read Write Separation Deployment for Kylin 4

Run Kylin with Docker

In order to allow users to easily try Kylin, and to facilitate developers to verify and debug
after modifying the source code. We provide Kylin’s docker image. In this image, each
service that Kylin relies on is properly installed and deployed, including:

 JDK 1.8
 Hadoop 2.8.5
 Hive 1.2.1
 Spark 2.4.7
 Kafka 1.1.1
 MySQL 5.1.73
 Zookeeper 3.4.6

Quickly try Kylin

We have pushed the Kylin image for the user to the docker hub. Users do not need to build
the image locally, just execute the following command to pull the image from the docker
hub:

docker pull apachekylin/apache-kylin-standalone:4.0.0

After the pull is successful, execute the following command to start the container:

docker run -d \
-m 8G \
-p 7070:7070 \
-p 8088:8088 \
-p 50070:50070 \
-p 8032:8032 \
-p 8042:8042 \
-p 2181:2181 \
apachekylin/apache-kylin-standalone:4.0.0

The following services are automatically started when the container starts:

 NameNode, DataNode
 ResourceManager, NodeManager
 Kylin

and run automatically $KYLIN_HOME/bin/sample.sh .

After the container is started, we can enter the container through the docker exec -it
<container_id> bash command. Of course, since we have mapped the specified port in
the container to the local port, we can open the pages of each service directly in the native
browser, such as:

 Kylin Web UI: http://127.0.0.1:7070/kylin/login

 Hdfs NameNode Web UI: http://127.0.0.1:50070
 Yarn ResourceManager Web UI: http://127.0.0.1:8088

Container resource recommendation

In order to allow Kylin to build the cube smoothly, the memory resource we configured for
Yarn NodeManager is 6G, plus the memory occupied by each service, please ensure that
the memory of the container is not less than 8G, so as to avoid errors due to insufficient
memory.

For the resource setting method for the container, please refer to:

 Mac user: https://docs.docker.com/docker-for-mac/#advanced

 Linux user: https://docs.docker.com/config/containers/resource_constraints/#memory

For how to customize the image, please check the github page kylin/docker.

Advanced Settings
Overwrite default kylin.properties at Cube level

In conf/kylin.properties there are many parameters, which control/impact on Kylin’s

behaviors; Most parameters are global configs like security or job related; while some are
Cube related; These Cube related parameters can be customized at each Cube level, so you
can control the behaviors more flexibly. The GUI to do this is in the “Configuration
Overwrites” step of the Cube wizard, as the screenshot below.

Overwrite default Spark conf at Cube level

The configurations for Spark are managed in conf/kylin.properties with prefix

kylin.engine.spark-conf.. For example, if you want to use job queue “myQueue” to
run Spark, setting “kylin.engine.spark-conf.spark.yarn.queue=myQueue” will let Spark get
“spark.yarn.queue=myQueue” feeded when submitting applications. The parameters can be
configured at Cube level, which will override the default values in
conf/kylin.properties.
Allocate more memory to Kylin instance

Open bin/setenv.sh, which has two sample settings for KYLIN_JVM_SETTINGS

environment variable; The default setting is small (4GB at max.), you can comment it and
then un-comment the next line to allocate 16GB:

export KYLIN_JVM_SETTINGS="-Xms1024M -Xmx4096M -Xss1024K

-XX:MaxPermSize=128M -verbose:gc -XX:+PrintGCDetails -XX:
+PrintGCDateStamps -Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ -XX:
+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M"
# export KYLIN_JVM_SETTINGS="-Xms16g -Xmx16g -XX:MaxPermSize=512m
-XX:NewSize=3g -XX:MaxNewSize=3g -XX:SurvivorRatio=4 -XX:
+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled -XX:
+UseConcMarkSweepGC -XX:+CMSIncrementalMode
-XX:CMSInitiatingOccupancyFraction=70 -XX:+DisableExplicitGC -XX:
+HeapDumpOnOutOfMemoryError"
Enable multiple job engines (HA)

Since Kylin 2.0, Kylin support multiple job engines running together, which is more
extensible, available and reliable than the default job scheduler.

To enable the distributed job scheduler, you need to set or update the configs in the
kylin.properties:

kylin.job.scheduler.default=2
kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperJobLock

Please add all job servers and query servers to the kylin.server.cluster-servers.

Enable LDAP or SSO authentication

Check How to Enable Security with LDAP and SSO

Enable email notification

Kylin can send email notification on job complete/fail; To enable this, edit
conf/kylin.properties, set the following parameters:

mail.enabled=true
mail.host=your-smtp-server
mail.username=your-smtp-account
mail.password=your-smtp-pwd
mail.sender=your-sender-address
kylin.job.admin.dls=adminstrator-address

Restart Kylin server to take effective. To disable, set mail.enabled back to false.
Administrator will get notifications for all jobs. Modeler and Analyst need enter email
address into the “Notification List” at the first page of cube wizard, and then will get
notified for that cube.

Enable MySQL as Kylin metadata storage

Kylin can use MySQL as the metadata storage, for the scenarios that HBase is not the best
option; To enable this, you can perform the following steps:

 Install a MySQL server, e.g, v5.1.17;

 Create a new MySQL database for Kylin metadata, for example “kylin_metadata”;
 Download and copy MySQL JDBC connector “mysql-connector-java-.jar" to
$KYLIN_HOME/ext (if the folder does not exist, create it yourself);
 Edit conf/kylin.properties, set the following parameters:

kylin.metadata.url={your_metadata_tablename}@jdbc,url=jdbc:mysql://localh
ost:3306/kylin,username={your_username},password={your_password},driverCl
assName=com.mysql.jdbc.Driver
kylin.metadata.jdbc.dialect=mysql
kylin.metadata.jdbc.json-always-small-cell=true
kylin.metadata.jdbc.small-cell-meta-size-warning-threshold=100mb
kylin.metadata.jdbc.small-cell-meta-size-error-threshold=1gb
kylin.metadata.jdbc.max-cell-size=1mb

In “kylin.metadata.url” more configuration items can be added; The url, username, and
password are required items. If not configured, the default configuration items will be
used:

url: the JDBC connection URL;

username: JDBC user name
password: JDBC password, if encryption is selected, please put the
encrypted password here;
driverClassName: JDBC driver class name, the default value is
com.mysql.jdbc.Driver
maxActive: the maximum number of database connections, the default value
is 5;
maxIdle: the maximum number of connections waiting, the default value is
5;
maxWait: The maximum number of milliseconds to wait for connection. The
default value is 1000.
removeAbandoned: Whether to automatically reclaim timeout connections,
the default value is true;
removeAbandonedTimeout: the number of seconds in the timeout period, the
default is 300;
passwordEncrypted: Whether the JDBC password is encrypted or not, the
default is false;

 You can encrypt your password:

cd $KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/lib
java -classpath kylin-server-base-\<version\>.jar:kylin-core-
common-\<version\>.jar:spring-beans-4.3.10.RELEASE.jar:spring-core-
4.3.10.RELEASE.jar:commons-codec-1.7.jar
org.apache.kylin.rest.security.PasswordPlaceholderConfigurer AES
<your_password>

 Start Kylin

Module1 - Satellite Communication VTU 7th Sem
No ratings yet
Module1 - Satellite Communication VTU 7th Sem
127 pages
Canrig-Top Drive 1275AC-681 750 Ton
0% (1)
Canrig-Top Drive 1275AC-681 750 Ton
17 pages
Spark Scala Protected
No ratings yet
Spark Scala Protected
211 pages
Vagabond, V01 (2002) (Viz)
100% (2)
Vagabond, V01 (2002) (Viz)
241 pages
Stoichiometric Calculations
100% (1)
Stoichiometric Calculations
33 pages
Coursera - Programming Mobile Apps Android
No ratings yet
Coursera - Programming Mobile Apps Android
6 pages
Lab 4 - Installation of Hadoop and MapReduce WordCount Example
100% (1)
Lab 4 - Installation of Hadoop and MapReduce WordCount Example
16 pages
Standard Operating Procedure To Learn How To Behave in Quality Control Laboratory in Pharmaceuticals
100% (1)
Standard Operating Procedure To Learn How To Behave in Quality Control Laboratory in Pharmaceuticals
38 pages
Assignment 2 (If Else If Ladder)
100% (1)
Assignment 2 (If Else If Ladder)
2 pages
Manual Desconectador Yaskawa
60% (5)
Manual Desconectador Yaskawa
32 pages
Owner'S Manual: Solar Water Heaters
No ratings yet
Owner'S Manual: Solar Water Heaters
56 pages
P7650A/B/U: Differential Pressure Sensors
No ratings yet
P7650A/B/U: Differential Pressure Sensors
4 pages
Service Manual (E) : Valid For Version 11 .Lo - 11.24
No ratings yet
Service Manual (E) : Valid For Version 11 .Lo - 11.24
41 pages
Catalogue
No ratings yet
Catalogue
36 pages
Mapr Snapshots
No ratings yet
Mapr Snapshots
31 pages
Mcca Study Guide 7.2017 Uvawomo
No ratings yet
Mcca Study Guide 7.2017 Uvawomo
30 pages
Top 10 Trenduri Big Data
No ratings yet
Top 10 Trenduri Big Data
13 pages
Use RESTful API
100% (1)
Use RESTful API
12 pages
MPS Multis Varios 2007 PDB
No ratings yet
MPS Multis Varios 2007 PDB
204 pages
Taking Leaving Message Kelas Xi #1 Meet
No ratings yet
Taking Leaving Message Kelas Xi #1 Meet
17 pages
Vagabond v02
80% (5)
Vagabond v02
246 pages
YARN Essentials - Sample Chapter
No ratings yet
YARN Essentials - Sample Chapter
12 pages
Spy X Family - Volume 1
94% (32)
Spy X Family - Volume 1
212 pages
Diagnostic Tools
No ratings yet
Diagnostic Tools
36 pages
Boiler Settings, Combustion Systems, and Auxiliary Equipment
No ratings yet
Boiler Settings, Combustion Systems, and Auxiliary Equipment
115 pages
Arnaboldi, 2021
No ratings yet
Arnaboldi, 2021
46 pages
Q1 Science Iis MPS Conso
No ratings yet
Q1 Science Iis MPS Conso
2 pages
Fixed+Appliances
No ratings yet
Fixed+Appliances
36 pages
Povestiri Lev Tolstoi
88% (8)
Povestiri Lev Tolstoi
46 pages
Eurolyzer STX en
No ratings yet
Eurolyzer STX en
8 pages
构建基于Apache Kylin的大数据分析平台讲话
No ratings yet
构建基于Apache Kylin的大数据分析平台讲话
37 pages
MJH Big Data
No ratings yet
MJH Big Data
28 pages
HobbyTronics - Texas Instruments H-Bridge Motor Driver 1A - SN754410 - COM-00315
100% (1)
HobbyTronics - Texas Instruments H-Bridge Motor Driver 1A - SN754410 - COM-00315
2 pages
Integration of Python With Hadoop and Spark
No ratings yet
Integration of Python With Hadoop and Spark
10 pages
Teqstories: Devops Professional Syllabus
No ratings yet
Teqstories: Devops Professional Syllabus
9 pages
Learning Spark - Chapter 2
No ratings yet
Learning Spark - Chapter 2
6 pages
Ken Kesey Zbor Deasupra Unui Cuib de Cuci
100% (6)
Ken Kesey Zbor Deasupra Unui Cuib de Cuci
304 pages
Radar Link Budget Analysis Using MATLAB Radar Designer
No ratings yet
Radar Link Budget Analysis Using MATLAB Radar Designer
14 pages
Percentage Points of The T-Distribution: This Table Was Generated Using Excel
No ratings yet
Percentage Points of The T-Distribution: This Table Was Generated Using Excel
1 page
Lois Lowry - Numara Stelele
100% (12)
Lois Lowry - Numara Stelele
150 pages
Nominal Ordinal Interval Ratio
No ratings yet
Nominal Ordinal Interval Ratio
7 pages
Print Si Cersetor - Mark Twain
100% (4)
Print Si Cersetor - Mark Twain
225 pages
Test Strategies For Data Processing Pipelines: Lars Albertsson, Independent Consultant (Mapflat)
No ratings yet
Test Strategies For Data Processing Pipelines: Lars Albertsson, Independent Consultant (Mapflat)
36 pages
Introduction To Teradata Data Mover Create Your First Job
No ratings yet
Introduction To Teradata Data Mover Create Your First Job
5 pages
Nexus Charge Amplifier
No ratings yet
Nexus Charge Amplifier
16 pages
Msbte JPR All Program Chapter-2
No ratings yet
Msbte JPR All Program Chapter-2
28 pages
OS Long
No ratings yet
OS Long
20 pages
Baiatul Cu Pijamale in Dungi - John Boyne PDF
97% (30)
Baiatul Cu Pijamale in Dungi - John Boyne PDF
220 pages
SJC Icse 2025 Computer Applications Prelims Paper
No ratings yet
SJC Icse 2025 Computer Applications Prelims Paper
5 pages
Question Bank - CN (Module 4,5,6)
No ratings yet
Question Bank - CN (Module 4,5,6)
3 pages
Synthesis of Copper Oxide
No ratings yet
Synthesis of Copper Oxide
2 pages
Hadoop实际解决方案手册: Chinese Edition
From Everand
Hadoop实际解决方案手册: Chinese Edition
Posts & Telecom Press
No ratings yet
Infrastructure as Code (IAC) Cookbook
From Everand
Infrastructure as Code (IAC) Cookbook
Stephane Jourdan
No ratings yet
Linux - a Secure Personal Computer for Beginners. Second Edition
From Everand
Linux - a Secure Personal Computer for Beginners. Second Edition
Mark Emerson
No ratings yet
Linux Commands By Example
From Everand
Linux Commands By Example
Khaled Jamal
4.5/5 (3)
Mastering SaltStack - Second Edition
From Everand
Mastering SaltStack - Second Edition
Joseph Hall
No ratings yet
WildFly Configuration, Deployment, and Administration - Second Edition
From Everand
WildFly Configuration, Deployment, and Administration - Second Edition
Christopher Ritchie
No ratings yet
Ansible by Examples: 200+ Automation Examples For Linux and Windows System Administrators and DevOps
From Everand
Ansible by Examples: 200+ Automation Examples For Linux and Windows System Administrators and DevOps
Luca Berton
No ratings yet
Perl and Apache: Your visual blueprint for developing dynamic Web content
From Everand
Perl and Apache: Your visual blueprint for developing dynamic Web content
Adam McDaniel
No ratings yet
NoSQL Injection for Elasticsearch
From Everand
NoSQL Injection for Elasticsearch
Gary Drocella
No ratings yet
Oracle Solaris 11: First Look
From Everand
Oracle Solaris 11: First Look
Philip P. Brown
No ratings yet
Kubernetes: Build and Deploy Modern Applications in a Scalable Infrastructure. The Complete Guide to the Most Modern Scalable Software Infrastructure.: Docker & Kubernetes, #2
From Everand
Kubernetes: Build and Deploy Modern Applications in a Scalable Infrastructure. The Complete Guide to the Most Modern Scalable Software Infrastructure.: Docker & Kubernetes, #2
Jordan Lioy
No ratings yet
Red Hat Enterprise Linux 6 Administration: Real World Skills for Red Hat Administrators
From Everand
Red Hat Enterprise Linux 6 Administration: Real World Skills for Red Hat Administrators
Sander van Vugt
No ratings yet
Living With Linux In the Industrial World
From Everand
Living With Linux In the Industrial World
Elaiya Iswera Lallan
No ratings yet
Docker Tutorial for Beginners: Learn Programming, Containers, Data Structures, Software Engineering, and Coding
From Everand
Docker Tutorial for Beginners: Learn Programming, Containers, Data Structures, Software Engineering, and Coding
Andrew Lee
3/5 (2)
Big Data Analytics
From Everand
Big Data Analytics
Nitin Kumar Yadav
No ratings yet
Learn Kubernetes - Container orchestration using Docker: Learn Collection
From Everand
Learn Kubernetes - Container orchestration using Docker: Learn Collection
Arnaud Weil
4/5 (1)
Learning Powershell DSC: Get started with the fundamentals of PowerShell DSC and utilize its power to automate deployment and configuration of your servers
From Everand
Learning Powershell DSC: Get started with the fundamentals of PowerShell DSC and utilize its power to automate deployment and configuration of your servers
James Pogran
No ratings yet
Azure For Starters
From Everand
Azure For Starters
Chinmoy Mukherjee
No ratings yet
P.H.P Simple C.R.U.D Design
From Everand
P.H.P Simple C.R.U.D Design
Rohaya Mohamad
4/5 (1)
Professional Heroku Programming
From Everand
Professional Heroku Programming
Chris Kemp
4/5 (2)
Backend Handbook: for Ruby on Rails Apps
From Everand
Backend Handbook: for Ruby on Rails Apps
Francisco Quintero
1/5 (1)
Ansible For Linux by Examples
From Everand
Ansible For Linux by Examples
Luca Berton
No ratings yet
Build Your First Home Server
From Everand
Build Your First Home Server
R.R. Arnob
No ratings yet
Arch Linux: Fast and Light!
From Everand
Arch Linux: Fast and Light!
Frank Cheung
3/5 (2)
Firebase Storage for Angular: A reliable file upload solution for your applications
From Everand
Firebase Storage for Angular: A reliable file upload solution for your applications
Abdelfattah Ragab
No ratings yet
Kubernetes Made Easy
From Everand
Kubernetes Made Easy
Pankaj Joshi
No ratings yet
Getting Started with the Lazarus IDE
From Everand
Getting Started with the Lazarus IDE
Roderick Person
No ratings yet
Evaluation of Some Cloud Based Virtual Private Server (VPS) Providers
From Everand
Evaluation of Some Cloud Based Virtual Private Server (VPS) Providers
Dr. Hidaia Mamood Alassouli
No ratings yet
Extending Docker
From Everand
Extending Docker
Russ McKendrick
5/5 (1)
HACKING WITH KALI LINUX PENETRATION TESTING: Mastering Ethical Hacking Techniques with Kali Linux (2024 Guide for Beginners)
From Everand
HACKING WITH KALI LINUX PENETRATION TESTING: Mastering Ethical Hacking Techniques with Kali Linux (2024 Guide for Beginners)
BARBARA HEATH
No ratings yet
Ansible For Security by Examples
From Everand
Ansible For Security by Examples
Berton
No ratings yet
Linux Services Deployment
From Everand
Linux Services Deployment
Fabian Mestre
No ratings yet
OpenCart Tips and Tricks
From Everand
OpenCart Tips and Tricks
iSenseLabs
No ratings yet
DRBD-Cookbook: How to create your own cluster solution, without SAN or NAS!
From Everand
DRBD-Cookbook: How to create your own cluster solution, without SAN or NAS!
Joerg Christian Seubert
No ratings yet
vSphere 5 AutoLab 1.1a Deployment Guide
From Everand
vSphere 5 AutoLab 1.1a Deployment Guide
Alastair Cooke
No ratings yet
Learn Kubernetes & Docker - .NET Core, Java, Node.JS, PHP or Python
From Everand
Learn Kubernetes & Docker - .NET Core, Java, Node.JS, PHP or Python
Arnaud Weil
No ratings yet
Configuration of a Simple Samba File Server, Quota and Schedule Backup
From Everand
Configuration of a Simple Samba File Server, Quota and Schedule Backup
Dr. Hedaya Alasooly
No ratings yet
Evaluation of Some Intrusion Detection and Vulnerability Assessment Tools
From Everand
Evaluation of Some Intrusion Detection and Vulnerability Assessment Tools
Dr. Hedaya Mahmood Alasooly
No ratings yet
DevOps. How to build pipelines with Jenkins, Docker container, AWS ECS, JDK 11, git and maven 3?
From Everand
DevOps. How to build pipelines with Jenkins, Docker container, AWS ECS, JDK 11, git and maven 3?
John Edward Cooper Berg
No ratings yet
Configuration of Apache Server To Support ASP
From Everand
Configuration of Apache Server To Support ASP
Dr. Hedaya Mahmood Alasooly
No ratings yet
Evaluation of Some Cloud Based Virtual Private Server (VPS) Providers
From Everand
Evaluation of Some Cloud Based Virtual Private Server (VPS) Providers
Dr. Hidaia Mahmood Alassouli
No ratings yet
Cracking: Red team Hacking: Kali Linux, Parrot OS, BackBox & BlackArch
From Everand
Cracking: Red team Hacking: Kali Linux, Parrot OS, BackBox & BlackArch
Rob Botwright
No ratings yet
A concise guide to PHP MySQL and Apache
From Everand
A concise guide to PHP MySQL and Apache
alasdair gilchrist
4/5 (2)
Configuration of Apache Server To Support ASP
From Everand
Configuration of Apache Server To Support ASP
Dr. Hidaia Mahmood Alassouli
No ratings yet
Evaluation of Some Windows and Linux Intrusion Detection Tools
From Everand
Evaluation of Some Windows and Linux Intrusion Detection Tools
Dr. Hedaya Alasooly
No ratings yet
Quick Configuration of Openldap and Kerberos In Linux and Authenicating Linux to Active Directory
From Everand
Quick Configuration of Openldap and Kerberos In Linux and Authenicating Linux to Active Directory
Dr. Hidaia Mahmood Alassouli
No ratings yet
Configuration of a Simple Samba File Server, Quota and Schedule Backup
From Everand
Configuration of a Simple Samba File Server, Quota and Schedule Backup
Dr. Hidaia Mahmood Alassouli
No ratings yet

Installation Guide Apache Kylin

Uploaded by

Installation Guide Apache Kylin

Uploaded by

Installation Guide

 Hadoop: cdh5.x, cdh6.x, hdp2.x, EMR5.x, EMR6.x, HDI4.x

tar -zxvf apache-kylin-4.0.0-bin.tar.gz

 Run the script to download spark:

Or configure SPARK_HOME points to the path of spark2.4.7 in the environment.

 Configure MySQL metastore

Kylin tarball structure

 bin: shell scripts to start/stop Kylin service, backup/restore metadata, as well as

Perform additional steps for some environments

Checking the operating environment

Retrieving hadoop conf dir...

Once Kylin is launched, you can access it via the browser

Retrieving hadoop conf dir...

HDFS folder structure

Generally, /kylin/kylin_metadata directory stores data according to different projects,

Deploy kylin on AWS EC2 without hadoop

 Apply for AWS EC2 Linux instances as required

Component version information

1 Configure environment variables

 Download JDK1.8 to the prepared EC2 instance and unzip it to the

 Download Hadoop and unzip it

 Modify core-site.xml，config AWS account information and endpoint. The

 Download Hive and unzip it

 Configure environment variables

The sample content of hive-site.xml is as follows:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

Unless required by applicable law or agreed to in writing,

 Hive metadata initialization

Note：If the following error is reported in this step:

5 Deploy Spark Standalone

 Download Spark 3.1.1 and unzip it

 Copy jar package required by S3:

 Copy hive-site.xml and mysql-jdbc

 Setup Spark master and worker

 Download zookeeper and unzip it

 Create the required folders and files:

 Setup zookeeper cluster

 Download kylin 4.0 binary package and unzip it

 Modify kylin.properties vim $KYLIN_HOME/conf/kylin.properties

 Execute bin/kylin.sh start

Deploy in Cluster Mode

Kylin Node Configuration

Enable Job Engine HA

Installing a load balancer

Read and write separation deployment

Run Kylin with Docker

Quickly try Kylin

docker pull apachekylin/apache-kylin-standalone:4.0.0

and run automatically $KYLIN_HOME/bin/sample.sh .

 Kylin Web UI: http://127.0.0.1:7070/kylin/login

Container resource recommendation

 Mac user: https://docs.docker.com/docker-for-mac/#advanced

In conf/kylin.properties there are many parameters, which control/impact on Kylin’s

Overwrite default Spark conf at Cube level

The configurations for Spark are managed in conf/kylin.properties with prefix

Open bin/setenv.sh, which has two sample settings for KYLIN_JVM_SETTINGS

export KYLIN_JVM_SETTINGS="-Xms1024M -Xmx4096M -Xss1024K

Enable LDAP or SSO authentication

Check How to Enable Security with LDAP and SSO

Enable email notification

Enable MySQL as Kylin metadata storage

 Install a MySQL server, e.g, v5.1.17;

url: the JDBC connection URL;

 You can encrypt your password:

You might also like