Qlik Associative Big Data Index Setup Configuration and Deployment

The document describes deploying Qlik's Associative Big Data Index (QABDI) on Amazon Web Services Elastic Kubernetes Service (EKS) to index sample datasets and build on-demand applications in Qlik Sense. Key steps include deploying an EKS cluster, installing and configuring required tools, setting up QABDI using Helm charts, and indexing sample data located on Amazon Elastic File System (EFS).

Uploaded by

Manuel Sosa

An Example Qlik Associative Big Data Index

Deployment in AWS EKS


Introduction
The Qlik Associative Big Data Index (QABDI) delivers Qlik's associative experience to Big Data,
allowing users to freely explore and search big data while leaving the data where it resides.

This new capability provides a governed, performant associative layer which can be deployed within
sources such as Hadoop-based data lakes, without the need to load the data into memory. QABDI
enables fast and engaging data discovery on massive data volumes with full access to all the
details of the underlying data.

This paper provides a technical overview of the steps involved in deploying a QABDI
infrastructure in AWS EKS, indexing a sample dataset and building an on-demand solution in Qlik
Sense.

Scenario

The deployment supports a sample customer use case: a requirement to derive value from a large
volume of data analyzed with Qlik Sense, allowing a set of users to access all the data in a
governed environment without impacting the source with SQL queries. A combination of approaches
is utilized, deploying QABDI in conjunction with on-demand app generation (ODAG). The source data
used in this paper is a combination of open source travel data from the Flights, New York Taxi,
Chicago Taxi and New York City Bike websites.

All data is in Parquet file format; the expectation is that the data has been “prepared” prior to
the indexing procedure, potentially by Qlik Data Catalyst. The following environment is used to
analyze this data:

• Deployment of the QABDI machinery within AWS EKS.

o 3 tables, 66 columns, 2.7bn rows

o 2 x i3.8xlarge Ubuntu EC2 instances (32 CPU, 244GB RAM)

• A “live” application with an ODAG link

• ODAG detail app to replace the source for QABDI instead of the database

• An in-memory application using the QABDI selection GUI


The following architecture, in a two-node AWS EKS Kubernetes cluster, supports the scenario:

A user logs into the Selection App in live mode, which contains a series of dimensional
selections in filter boxes and a KPI object with a count.

A user selects dimensional criteria in the selection app from filter boxes/charts and the map.
After the governed limit is reached, the navigation button on the toolbar becomes active with a
green indicator. The user can then choose to generate a new application by dynamically reloading
data from the QABDI, and not the source database, to generate the detail in-memory app.
Prerequisites

The following high-level process enables the deployment of QABDI in an AWS EKS cluster:

• Deployment of an EKS cluster using eksctl

• Docker login to Bintray with user and API Key

• Adding the Helm repository to install the software and charts

• Preparing and indexing the data

Workstation

The cluster is deployed from a workstation with the following prerequisites in place and the
following software installed, on an Ubuntu 18.04 instance. (Note: kubectl currently has an issue
with the Windows deployment terminal, so Linux is recommended.)

• The AWS command line interface configured to access the instance - AWS CLI

• Kubernetes command line tool - Kubectl

• Package manager for Kubernetes - Helm

• Command line JSON processor - jq
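A quick way to confirm these tools are on the PATH before continuing is a small shell check (a sketch; the tool list simply mirrors the bullets above):

```shell
# Report which of the required command line tools are missing from PATH.
check_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  # Print "ok" when everything is present, otherwise the missing names.
  echo "${missing:-ok}"
}

check_tools aws kubectl helm jq
```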

Qlik Sense Enterprise

February 2019 release or above with the following settings is also required:

In the settings file in C:\ProgramData\Qlik\Sense\Engine:

[Settings 7]
EnableBDIClient=1
BDIAsyncRequests=1
BDIStrictSynchronisation=0

The chart recommendation feature leads to complex expressions generated from drag-and-drop that
are not yet supported by QABDI; it can be disabled in the C:\Program
Files\Qlik\Sense\CapabilityService\capabilities.json file with:
{
"flag": "DISABLE_AUTO_CHART",
"enabled": true
}

A per-application SET statement is also required to disable the insight advisor:

SET DISABLE_INSIGHTS = 1;
AWS EKS Installation

Instantiation of the EKS cluster can be achieved using an open source utility eksctl :

$ curl -sL "https://github.com/weaveworks/eksctl/releases/download/latest_release/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
$ sudo mv /tmp/eksctl /usr/local/bin

aws-iam-authenticator

A tool to use AWS IAM credentials to authenticate to a Kubernetes cluster is required. To enable
this aws-iam-authenticator can be used.

https://docs.aws.amazon.com/eks/latest/userguide/install-aws-iam-authenticator.html

$ curl -o aws-iam-authenticator https://amazon-eks.s3-us-west-2.amazonaws.com/1.10.3/2018-07-26/bin/linux/amd64/aws-iam-authenticator
$ chmod +x ./aws-iam-authenticator
$ mkdir bin
$ cp ./aws-iam-authenticator $HOME/bin/aws-iam-authenticator && export PATH=$HOME/bin:$PATH
$ echo 'export PATH=$HOME/bin:$PATH' >> ~/.bashrc

Creating the Cluster Bootstrap with eksctl

The cluster is initialized with a name, region, tags, nodes and node type. For additional
configuration options see the eksctl documentation:

$ eksctl create cluster \
  --name=icd-eks \
  --region=eu-west-1 \
  --tags owner=ICD \
  --nodes=2 \
  --node-type=i3.8xlarge \
  --ssh-access \
  --ssh-public-key=bdi_icd_key

If eksctl fails, check CloudFormation for error messages. Confirm that aws-iam-authenticator has
worked by checking your current Kubernetes context:

$ kubectl config current-context

Initializing Helm into the EKS cluster


Assuming Helm is installed locally, the following commands initialize it in the EKS cluster:

$ kubectl create -f tiller-rbac.yaml

$ helm init --service-account tiller
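The tiller-rbac.yaml file is referenced but not reproduced in the paper; a typical manifest for it (the usual Tiller service account bound to cluster-admin — an assumption about its contents) looks like:

```yaml
# Sketch of a typical tiller-rbac.yaml for Helm 2 on EKS.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system
```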

Configuring EFS

The index is created within the EFS file system created as follows:

$ aws efs --region $YOUR_REGION create-file-system \
  --creation-token $YOUR_CLUSTER_NAME-efs \
  --performance-mode maxIO

This also returns a FileSystemId; additionally, the VpcId and SubnetIds are required.

Mount points are created for the EFS storage and connected to the VPC that the EKS cluster is in.
The VpcId, SubnetIds and SecurityGroupIds are required; the VpcId can be found as follows:

$ aws ec2 --region <region> describe-vpcs \
  --filters "Name=tag-key,Values=kubernetes.io/cluster/<cluster-name>" \
  | jq -r '.Vpcs[0].VpcId'

Sample output:
vpc-0ad865ba8700c957d

Retrieve the SubnetIds within the VPC:

$ aws ec2 --region <region> describe-subnets \
  --filters "Name=vpc-id,Values=<vpc-id>" \
  | jq -r '.Subnets | unique_by(.AvailabilityZone) | .[].SubnetId'

Sample output:

subnet-019b8cc9e2a5c7ef0
subnet-0aba1710d318a9db4
subnet-039dfe928f362c575
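The unique_by(.AvailabilityZone) filter keeps one subnet per availability zone, which matters because EFS allows only one mount target per AZ. Its behavior can be seen on a small self-contained sample (subnet IDs and zones are made up):

```shell
# Sample describe-subnets output: two subnets share eu-west-1a.
json='{"Subnets":[
  {"SubnetId":"subnet-aaa","AvailabilityZone":"eu-west-1a"},
  {"SubnetId":"subnet-bbb","AvailabilityZone":"eu-west-1a"},
  {"SubnetId":"subnet-ccc","AvailabilityZone":"eu-west-1b"}]}'

# unique_by keeps one element per distinct zone, so two SubnetIds print.
printf '%s' "$json" | jq -r '.Subnets | unique_by(.AvailabilityZone) | .[].SubnetId'
```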

Retrieve the SecurityGroupIds:

$ aws ec2 --region $YOUR_REGION describe-security-groups \
  --filters "Name=vpc-id,Values=$YOUR_VPC_ID,Name=group-name,Values=*nodegroup*" \
  | jq -r .SecurityGroups[].GroupId

Create a mount point for each SubnetId with the VpcId and SecurityGroupIds:

$ aws efs --region $YOUR_REGION create-mount-target \
  --file-system-id $YOUR_FILE_SYSTEM_ID \
  --security-group $YOUR_SECURITY_GROUP \
  --subnet-id $YOUR_SUBNET_ID_1
$ aws efs --region $YOUR_REGION create-mount-target \
  --file-system-id $YOUR_FILE_SYSTEM_ID \
  --security-group $YOUR_SECURITY_GROUP \
  --subnet-id $YOUR_SUBNET_ID_2
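With more than two subnets, the repeated create-mount-target calls can be scripted. The sketch below (all IDs are placeholders) builds one call per subnet; with DRY_RUN=1 it only prints the commands rather than running them against AWS:

```shell
REGION="eu-west-1"                      # assumption: same region as the cluster
FILE_SYSTEM_ID="fs-0123456789abcdef0"   # placeholder FileSystemId
SECURITY_GROUP="sg-0123456789abcdef0"   # placeholder nodegroup security group
SUBNET_IDS="subnet-aaa subnet-bbb subnet-ccc"
DRY_RUN=1

for subnet in $SUBNET_IDS; do
  # Backslash-newline inside the quotes folds this into one command string.
  cmd="aws efs --region $REGION create-mount-target \
    --file-system-id $FILE_SYSTEM_ID \
    --security-group $SECURITY_GROUP \
    --subnet-id $subnet"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"
  else
    eval "$cmd"
  fi
done
```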

The EFS provisioner allows you to mount EFS storage as PersistentVolumes in Kubernetes.

Install the EFS provisioner (name = aws-efs with storage class = efs) with Helm by setting
the FileSystemId and region. Be sure path is set to / or the pod will fail to create.

$ helm install --name efs-provisioner stable/efs-provisioner \
  --set efsProvisioner.efsFileSystemId=<file-system-id> \
  --set efsProvisioner.awsRegion=<region> \
  --set efsProvisioner.path=/ \
  --set efsProvisioner.provisionerName=aws-efs \
  --set efsProvisioner.storageClass.name=efs

For more configuration options consult the official EFS provisioner chart.
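Workloads can then claim the EFS-backed storage through the efs storage class. A sketch of such a claim (the claim name and size are illustrative; EFS does not enforce the requested capacity, but the field is still required):

```yaml
# Hypothetical PersistentVolumeClaim against the "efs" storage class above.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: qabdi-efs-claim
spec:
  storageClassName: efs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
```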
Deploying QABDI in the AWS EKS Cluster
Key and Repository setup

Verify the Kubernetes cluster is configured as follows:


$ kubectl config current-context

Helm packages Kubernetes applications together as a "chart". These charts are tarballs that are
stored externally.

Bintray is used as the chart repository; the Helm repository containing the QABDI charts is added
as follows:

$ sudo helm repo add bt_qlik https://qlik.bintray.com/qabdicharts

Synchronizing with the Bintray Helm chart repository:


$ sudo helm repo update

The latest charts and app are listed as follows:

$ sudo helm search bdi

NAME         CHART VERSION  APP VERSION  DESCRIPTION
bt_qlik/bdi  1.0.0          1.4.0        Big Data Index

Deployment of QABDI

Installation of the QABDI chart with the default values is achieved by providing a release name,
repository (bt_qlik), license acceptance and license key as follows:

$ helm install --name <release-name> bt_qlik/bdi \
  --set acceptLicense=true \
  --set 'license.key=enterkeyguid'

The release-name is a string that can be used to differentiate Helm deployments (as you could
theoretically deploy QABDI more than once on a cluster). If one is not provided, Helm will generate
one.

By setting acceptLicense=true you agree to the Qlik User License Agreement (QULA), which is
required to start the qsl_processor_tool and the indexer_tool. You don't need to do it while running
helm install but it is the easiest way. Another way to accept the license agreement is to log in to the
bastion and type export ACCEPT_QULA=true. When running helm install the QULA text will be
printed to the console.
The default values.yaml can be overridden with additional yaml files and, optionally, --set flags:

$ helm install --name bdi bt_qlik/bdi \
  -f my_values.yaml \
  -f my_other_values.yaml \
  --set 'image.tag=0.265.0'

The number of QABDI services is configured in additional yaml files. To install the required
configuration (three indexers, one indexingmanager, one qslexecutor, one qslmanager, two
qslworkers and three symbolservers), change the replicaCount values:

## Configuration values for BDI QSL Worker components.
##
qslworker:
  ## Override the components name (defaults to qslworker).
  ##
  # nameOverride:
  ## Number of replicas.
  ##
  replicaCount: 2
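A values file expressing the full replica layout described above might look like the following sketch. The key names mirror the qslworker example; that the other services use the same replicaCount convention is an assumption here:

```yaml
## Hypothetical values override for the scenario's service counts.
indexer:
  replicaCount: 3
indexingmanager:
  replicaCount: 1
qslexecutor:
  replicaCount: 1
qslmanager:
  replicaCount: 1
qslworker:
  replicaCount: 2
symbolserver:
  replicaCount: 3
```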

The repository is also configured in a yaml file:

## Image configuration.
image:
  repository: qlik-docker-qabdi.bintray.io/bdiproduct

## List of secrets that can pull private Docker images.
##
imagePullSecret:
  - artifactory-docker-secret

To access the index from Sense via the QSL manager, a LoadBalancer can be used with the default
port 55000 (the provided qsl-manager-loadbalancer.yaml):
## BDI values
qslmanager:
  service:
    type: LoadBalancer

In addition, a series of yaml files are provided with this paper for varying data volumes.

Checking the pods running per node and their status will show output similar to:

$ kubectl get pods

NAME READY STATUS RESTARTS AGE


efs-provisioner-75f9f8fd74-tvzdn 1/1 Running 0 39d
icd-mn-bdi-bastion-556858d676-pfmng 1/1 Running 0 5d
icd-mn-bdi-diskcache-mnw4k 1/1 Running 0 5d
icd-mn-bdi-diskcache-nhd8g 1/1 Running 0 5d
icd-mn-bdi-indexer-0 1/1 Running 0 5d
icd-mn-bdi-indexer-1 1/1 Running 0 5d
icd-mn-bdi-indexer-2 1/1 Running 0 5d
icd-mn-bdi-indexingmanager-6467ccf8b7-dvvwq 1/1 Running 0 5d
icd-mn-bdi-qslexecutor-55c77f5984-h9rsn 1/1 Running 0 5d
icd-mn-bdi-qslmanager-7455b84b66-m58bn 1/1 Running 0 5d
icd-mn-bdi-qslworker-0 1/1 Running 0 5d
icd-mn-bdi-qslworker-1 1/1 Running 0 5d
icd-mn-bdi-symbolserver-0 1/1 Running 0 5d
icd-mn-bdi-symbolserver-1 1/1 Running 0 5d
icd-mn-bdi-symbolserver-2 1/1 Running 0 5d

Retrieving the external IP of the qsl manager which will be used as the Host when creating a new
QABDI connection in Qlik Sense:

$ kubectl get svc

NAME TYPE CLUSTER-IP EXTERNAL-IP


icd-mn-bdi-indexer ClusterIP None <none>
icd-mn-bdi-indexingmanager ClusterIP 10.100.75.34 <none>
icd-mn-bdi-qslexecutor ClusterIP 10.100.84.104 <none>
icd-mn-bdi-qslmanager LoadBalancer 10.100.115.189
a3b0757b1decb11e88b8f0a1f2ce7daa-397101174.eu-west-1.elb.amazonaws.com
icd-mn-bdi-qslworker ClusterIP None <none>
icd-mn-bdi-symbolserver ClusterIP None <none>
kubernetes ClusterIP 10.100.0.1 <none>

Note the host for the QABDI connection in the above example will be:

a3b0757b1decb11e88b8f0a1f2ce7daa-397101174.eu-west-1.elb.amazonaws.com
Mount the EFS Drive into the Pods

The deployed pods require a shared mount to access the source parquet files and to act as a
repository for the index output.

Note: the mount command requires root access. To enable this, “privileged” is required to be set
to “true”, allowing the Docker container root access in the pods:

$ helm upgrade <release-name> <repo/chart> -f <yaml> --set image.privileged=true

Mounting the drives is a two-stage process. In this case a shell script (exec_in_all_pods.sh,
shared with this paper) is used to execute the commands in all pods rather than individually.
First, the shared folder is created:

$ ./exec_in_all_pods.sh <cluster name> 'mkdir /home/efs'
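The exec_in_all_pods.sh helper itself is shared alongside this paper rather than printed here; a minimal sketch of what such a script can look like (selecting pods by name prefix is an assumption about its mechanism):

```shell
# Run one shell command in every pod whose name starts with the given prefix.
# Usage: exec_in_all_pods <prefix> '<command>'
# Assumes kubectl is configured for the target cluster.
exec_in_all_pods() {
  prefix="$1"; shift
  cmd="$*"
  kubectl get pods -o name | sed 's|^pod/||' | grep "^$prefix" |
  while read -r pod; do
    echo "=== $pod ==="
    kubectl exec "$pod" -- sh -c "$cmd"
  done
}
```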

The EFS drive is mounted with reference to the Filesystem id and region:

$ ./exec_in_all_pods.sh <cluster name> 'mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport <FileSystemID>.efs.<AWS_REGION>.amazonaws.com:/ /home/efs'
Indexing the Data

Configuration for Indexing

The source parquet files are organized in a folder on EFS and the required configuration files
are updated to specify the source/output locations and associations. All configuration files are
stored in /home/ubuntu/dist/runtime/config/ in the pods.

Data source (Parquet):

field_mappings_file.json

This file is required to specify the associations between the files (called A2A in the indexing
process).

{
"field_mappings": [
{
"column1": "Flights.Flight_Year",
"column2": "Link.link_flight_year"
},
{
"column1": "Taxi_Bike_Trips.pickup_year",
"column2": "Link.link_pickup_year"
}
]
}

indexing_setting.json

This file indicates where the source data, output index and association mapping file are stored,
along with the model name (alltrips).

{
  "output_root_folder": "/home/output",
  "symbol_output_folder": "",
  "index_output_folder": "",
  "symbol_positions_output_folder": "",
  "symbol_server_async_threads": 1,
  "create_column_index_threads": 1,
  "dataset_name": "alltrips",
  "source_data_path": "/home/data/alltrip_source",
  "field_mappings_file": "/home/ubuntu/dist/runtime/config/field_mappings_file.json",
  "logging_settings_file": ""
}

"output_root_folder" - output folder for the index files, log and config files
"dataset_name" - used as the model name in Qlik Sense
"source_data_path" - the location of the NYC sample parquet files
"field_mappings_file" - name of the associations json file

Starting the QABDI machinery and Indexing

All the indexing startup scripts and subsequent tasks can be executed from any one of the
available pods; for this deployment the bastion is used, logging on to it as follows:

$ kubectl exec -it icd-mn-bdi-bastion-556858d676-pfmng bash

Indexing Startup: start_indexing_env.sh

Within the indexing folder /home/ubuntu/dist/runtime/scripts/indexer:

$ ./dist/runtime/scripts/indexer/start_indexing_env.sh

Resulting in a similar output as below:


[19-02-01T08:07:31:821]-[idx_srv-info]-[000016] RegistryServiceClient
connects to icd-mn-indexingmanager:55057
[19-02-01T08:07:31:831]-[idx_srv-info]-[000016] RegistryServiceClient
connects to icd-mn-indexingmanager:55057
[19-02-01T08:07:31:836]-[idx_srv-info]-[000016] PersistenceManagerServ
iceClient connects to icd-mn-indexingmanager:55010
[19-02-01T08:07:31:874]-[idx_srv-info]-[000016] IndexerServiceServer l
istens on icd-mn-indexer:55040
[19-02-01T08:07:31:877]-[idx_srv-info]-[000016] RegistryServiceClient
connects to icd-mn-indexingmanager:55057
[19-02-01T08:07:31:882]-[idx_srv-info]-[000016] RegistryServiceClient
connects to icd-mn-indexingmanager:55057
[19-02-01T08:07:31:883]-[reg_serv-info]-[000018] Registered indexer at
icd-mn-indexer:55040 - id: 403090185676891958616864329674924927981
[19-02-01T08:07:31:884]-[idx_srv-info]-[000016] Registered indexer at
icd-mn-indexer:55040 - id: 403090185676891958616864329674924927981

Indexing Service Management: service_manager.sh

Once the indexing services have started the service_manager.sh script can be used to query, start,
or stop indexing:

$ /home/ubuntu/dist/runtime/scripts/indexer/service_manager.sh

source_data_path: /home/data/alltrips_source
Register ip found in cluster configuration: qlik-bdi-indexingmanager
Running in interactive Mode:
Valid Options:
h) help
1) list
2) stop
3) start
a) stop all services
q) quit

Listing the running processes shows:

Service Type : IP Address : Port


************************ : ******************************************* : **********
Registry : icd-mn-bdi-indexingmanager : 55057
PersistenceManager : icd-mn-bdi-indexingmanager : 55010
IndexingManager : icd-mn-bdi-indexingmanager : 55020
Symbol : icd-mn-bdi-symbolserver-0.icd-mn-bdi-symbolserver : 55030
Symbol : icd-mn-bdi-symbolserver-1.icd-mn-bdi-symbolserver : 55030
Symbol : icd-mn-bdi-symbolserver-2.icd-mn-bdi-symbolserver : 55030
Indexer : icd-mn-bdi-indexer-0.icd-mn-bdi-indexer : 55040
Indexer : icd-mn-bdi-indexer-1.icd-mn-bdi-indexer : 55040
Indexer : icd-mn-bdi-indexer-2.icd-mn-bdi-indexer : 55040
IndexMaintenance : icd-mn-bdi-indexingmanager : 55050
*********************** : ******************************************* : **********
Index the data with task_manager.sh

The task_manager.sh script is used to invoke the indexing process via a series of steps:

$ ./dist/runtime/scripts/indexer/task_manager.sh

Load commonly used functions


Gathering environment variables
Using user specified IP address 127.0.0.1
BDI_LOCAL_IP=127.0.0.1
BDI_ROOT_FOLDER=/home/ubuntu/dist
BDI_FOLD_STRUCTURE=DEPLOYMENT
BDI_CONFIG_FOLDER=/home/ubuntu/dist/runtime/config
BDI_CONFIG_TEMPLATE_FOLDER=/home/ubuntu/dist/runtime/config/template
BDI_INDEXER_TOOL_LOC=/home/ubuntu/dist/indexer/indexer_tool
BDI_QSL_PROCESSOR_TOOL_LOC=/home/ubuntu/dist/qsl_processor/qsl_processor_tool
BDI_GW_LOC=/home/ubuntu/dist/gateway/gw
Setting default cluster configuration file:
/home/ubuntu/dist/runtime/config/cluster.json
Using security key file /root/.ssh/inter_cluster_key.pem
Loading indexing settings from : /home/ubuntu/dist/runtime/config/indexing_setting.json
source_data_path: /home/efs/alltrips
Running in interactive Mode:
Valid Options: (Options marked with **** are to be triggered to complete index creation.
h) help
1) **** scan data for schema generation ****
2) view generated schema config
3) **** register schema (including table map for index maintenance) ****
a) **** start indexing creation full flow (option 4, 5 and 6) ****
4) add indexing task
5) create column index
6) create a2a
l) list task progress
t) list indexing tasks
r) resume indexing task
s) scan data to generate statistics (DO NOT USE IN NORMAL FLOW! Run Option 1 to
generate schema)
q) quit
Choose Action:

Step 1 scans the data and creates a series of json files in the output folder
/home/data/alltrips_output/indexing/output/config/indexer containing the data types and source
parquet file references:
1) **** scan data for schema generation ****

Once complete the following message appears in the console:


[19-02-01 10:59:55:701]-[console-info]-[345] DataScan finished .

Step 2 displays the result of step 1:


2) view generated schema config
****************************
/home/efs/All_Trips_output4/config/indexer/alltrips_schema.json
****************************
{
"associations" :
[
{
"data" :
[
{
"field" : "Flight_Year",
"table_name" : "Flights"
},
{
"field" : "link_flight_year",
"table_name" : "Link"
}
],
"op" : "add"
},
{
"data" :
[
{
"field" : "link_pickup_year",
"table_name" : "Link"
},
{
"field" : "pickup_year",
"table_name" : "Taxi_Bike_Trips"
}
],
"op" : "add"
}
],
"data_set_name" : "alltrips",
"tables" :
[
{
"fields" :
[
{
"col_no" : 0,
"name" : "ItinID",
"type" : "StringType"
},

Step 3 registers the schema in preparation to index:


3) **** register schema (including table map for index maintenance) ****

Once complete the following message appears in the console:

[19-02-01T08:13:16:316]-[idx_mgr_srv-info]-[000076] Registered schema for data source


"alltrips"
[19-02-01 08:13:16:318]-[console-info]-[000370] Response code: 0, schema_id: 0
/home/ubuntu/dist/indexer/indexer_tool -s indexmaintenance_client -r MaintainIndex -a
qlik-bdi-indexingmanager -p 55050 -l true -i
/home/output/config/indexer/alltrips_tablemap.json
[19-02-01 08:13:16:332]-[console-info]-[000376] Connect to qlik-bdi-
indexingmanager:55050
[19-02-01 08:13:16:339]-[console-info]-[000376] IndexMaintenanceServiceClient connects
to qlik-bdi-indexingmanager:55050
[19-02-01T08:13:16:341]-[idx_maintenance_srv-info]-[000120]
IndexMaintenanceServiceServer: MaintainIndex
[19-02-01 08:13:16:344]-[console-info]-[000376] Sent rpc MaintainIndex - Status: SUCCESS

Step 4 creates the entries in the RocksDB key value store:


4) add indexing task

Once complete the following message appears in the console:

[19-02-01T08:30:45:208]-[ss_srv-info]-[000504] /home/data/alltrips
_source/flights.table/ fares.parquet/1_0_9.parquet is a single parquet file
[19-02-01T08:31:33:681]-[ss_srv-info]-[000504] /home/data/alltrips_source/flights.table/
fares.parquet/1_1_4.parquet is a single parquet file
[19-02-01T08:32:21:881]-[ss_srv-info]-[000504] Create symbol table for table 'flights'
takes 8890 seconds
[19-02-01T08:32:21:881]-[ss_srv-info]-[000504] Apply compaction...
[19-02-01T08:33:27:193]-[ss_srv-info]-[000504] Apply compaction for table ' flights' in
dataset 'alltrips' takes 650 seconds
[19-02-01T08:33:27:193]-[ss_srv-info]-[000504] Create symbol table for table 'flights'
indataset 'alltrips'... DONE

Step 5 creates the index files per column in /alltrips_output/Index_output/alltrips:


5) create column index

Once complete the following message appears in the console:

[19-02-01T08:43:49:061]-[idx_srv-info]-[000038] Create indexlet for alltrips, flights,


ItinID, idx_0... DONE
[19-02-01T08:43:49:067]-[idx_srv-info]-[000038] Create indexlet for alltrips, flights,
Name, idx_0... DONE
[19-02-01T08:43:49:116]-[idx_mgr_srv-info]-[000076] Calculate SymbolPositions for task 0
[19-02-01T08:43:49:143]-[idx_mgr_srv-info]-[000076] Calculate global symbol to position
id... done in 0
[19-02-01T08:43:49:185]-[idx_srv-info]-[000038] No more indexlet in alltrips, table
flights for indexing

Step 6 creates the associations:


6) create a2a

Once complete the following message appears in the console:


[19-02-01T08:48:31:383]-[ss_srv-info]-[000065] Created A2A indexlet 0 for
Flights:Flight_Year – link_table:link_flight_year
[19-02-01T08:48:31:385]-[ss_srv-info]-[000065] Create A2A for link_table:flight_year
- link_flight_year: Vendor_ID... DONE

Checking on completion of the index shows:

Choose Action: l
[19-02-01 08:36:19:457]-[console-info]-[000400] Connect to icd-mn-indexingmanager:55020
[19-02-01 08:36:19:461]-[console-info]-[000400] Symbol Table creation progress at: 100%
[19-02-01 08:36:19:461]-[console-info]-[000400] UnmappedColumn Index creation progress
at: 100%
[19-02-01 08:36:19:462]-[console-info]-[000400] Column Index creation progress at: 100%
[19-02-01 08:36:19:463]-[console-info]-[000400] A2A creation progress at: 100%

Starting the Qlik Selection Language (QSL) Worker Services

QSL services are required to process the selections made from the Sense client and to process any
extractions made in the load script from the index into memory. The QSL services depend on
several services, including the Indexing Registry, Persistence Manager and all Symbol services.
All executables to start QSL are in the runtime/scripts/qsl_processor folder.

The QSL services start with:

$ /home/ubuntu/dist/runtime/scripts/qsl_processor/start_qsl_env.sh

Once the following message appears in the console, the process has started:

[19-02-01T08:52:26:585]-[Wrk:qlik-bdi-qslworker-info]-[000046] Registry : done with a2a


indexlet ( alltrips 0-3, 0 )
[19-02-01T08:52:26:585]-[Wrk:qlik-bdi-qslworker-info]-[000046] Total indexlets: 56
QABDI and Qlik Sense Applications
The Qlik Sense portion of the deployment uses the February 2019 QSE release and utilizes an
on-demand application generation solution using the QABDI as the source for the detail app, plus
a basic “live” application with the selection GUI.

• All Trips Live.qvf – Live mode selections app

• Flight Details Handle.qvf – Detail ODAG application using Handles

• Flight Details Cache.qvf - Detail application with modified ODAG script

• Taxi and Bike Details.qvf - Detail application with modified ODAG script

Qlik Sense settings for QABDI

Qlik Sense February 2019 release and above are the only versions that can use the QABDI
functionality, and the following flags need to be set:

In the settings file in C:\ProgramData\Qlik\Sense\Engine:

[Settings 7]
EnableBDIClient=1
BDIAsyncRequests=1
BDIStrictSynchronisation=0
The chart recommendation feature leads to complex expressions generated from drag-and-drop that
are not yet supported by QABDI; it can be disabled in the capabilities.json file with:

{
"flag": "DISABLE_AUTO_CHART",
"enabled": true
}

A per-application SET statement is also required to disable the insight advisor:

SET DISABLE_INSIGHTS = 1;

Connectivity to QABDI and Live mode

One of the modes of interacting with the index is “live” mode which effectively allows Qlik Sense to
have a minimal memory footprint by only loading the metadata into memory.

A connection is configured with the following parameters: create a new connection, select BDI
from the list and enter the criteria:

Host - external IP of the QSL manager
QSL Port - 55000
Data model - the "dataset_name" specified in indexing_setting.json
Name - name of the connection to be stored in Qlik Sense

The live model import syntax:

IMPORT LIVE 'alltrips';

Currently QABDI does not support search and the insight advisor, so the search index and insights
are disabled:

SET CreateSearchIndexOnReload=0;
SET DISABLE_INSIGHTS = 1;

The QABDI Connector and Selection GUI

As part of the product, a QABDI connector has been developed which allows users to extract data
from the index into an in-memory application using autogenerated script through the Data Load
Editor (DLE).

Opening the GUI displays all the available entities in the alltrips model for selection. The
script inserted from the connector GUI contains the following:

The set handle reference containing the filter as a GUID:

SET $bdiHandle = '[alltrips].[hdfa71c23_3796_4732_8b2a_bfcfcf7c0b28]';

A default limit for the number of rows of data (MaxRowsPerTable) is set to 10,000, which can be
raised for higher volumes. This is enforced by an initial count before extraction, and the script
exits if the limit is breached:

LET MaxRowsPerTable = 10000;

BDI CONNECT TO 'alltrips aa0785e5a14af11e9917dcb283-185840822.us-east-1.elb.amazonaws.com:55000';

[rowcount@Flights]:
QSL Select count(*) as nRows0 from [alltrips].[Flights] at STATE ${bdiHandle};
let nFilteredRows = Peek('nRows0', 0, 'rowcount@Flights');

TRACE 'nFilteredRows' = ${nFilteredRows};

if( nFilteredRows > MaxRowsPerTable ) then
  TRACE 'Too many rows to import for Flights';
  drop table [rowcount@Flights];
  Exit Script;
end if

drop table [rowcount@Flights];

Script for ODAG with Handles

The load scripts for on-demand template apps contain connection and data load commands whose
parameters are set by a special variable, odb_setHandle, which the on-demand app service uses for
linkage. The odb_setHandle variable is used specifically for QABDI linkage and captures all the
selection states from the selection app:

TRACE Generated SELECTION STATE GUID: ;
TRACE $(odb_setHandle);

Flights:
QSL SELECT
[ItinID],
[SeqNum],
[Coupons],
[Flight_Year],
[Flight_Quarter],
[Origin],
[OriginCountry],
[OriginState],
[Dest],
[DestCountry],
[DestState],
[TkCarrier],
[Passengers],
[FareClass],
[Distance]
FROM [alltrips].[Flights]
AT STATE $(odb_setHandle);

The odb_setHandle gets replaced with the GUID of the handle:

AT STATE [alltrips].[h6790741d_5ed5_496c_9a5b_fd47b4c165e2]

Which are stored in the output/qsl_handles/alltrips folder on EFS.

Script Modifications converting existing ODAG apps to use QABDI

Converting existing SQL-generating ODAG apps to use the index as the source replaces the
WHERE_CLAUSE variable creation with a selection state clause using QSL syntax. For example, we
can create a specific QSL SET statement which creates the “set handle” containing a reference to
the columns selected and the data to filter on.

The QSL syntax will apply the set handle to the underlying model via an AT STATE statement with
the following syntax:

LOAD <column1,column2>;
QSL SELECT <column1,column2>
FROM <modelname.table>
AT STATE hPassedSelection;

The hPassedSelection set handle is dynamically populated by the ODAG process. In the example
below a set handle is created which consists of:

QSL SET hPassedSelection = {[alltrips].1 <rate_code={2}, payment_type={Dis}>};

hPassedSelection - name of the set handle
alltrips - model name specified in the indexing process
<rate_code={2}, payment_type={Dis}> - selection filters

The following changes are applied to an on-demand detail app which executes SQL, compared with
the syntax required for QABDI.

First subroutine – ExtendSelectionState

The first subroutine (ExtendWhere) is modified to generate a dynamic SELECTION_STATE statement:
the subroutine is renamed to ExtendSelectionState and all WHERE_PART instances are replaced with
SELECTION_STATE. The SELECTION_STATE generation is also modified to produce the selection state
format and to cater for multiple selection state criteria.

The WHERE and IN clauses are replaced with QSL syntax; the main change is substituting the model
name [alltrips] into the statement with a “.1” suffix to indicate current selections.

LET SELECTION_STATE = '{ [alltrips].1 <$(ColName)={$(Values)}>} ';

To cater for multiple selection states and the format required some string replacement is required:

LET SELECTION_STATE = replace('$(SELECTION_STATE)','>}', ',') & ' $(ColName)={ $(Values) }>}';
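The effect of that replace() call can be illustrated outside Qlik script. The shell below (an illustration only, with made-up filter values) performs the same transformation: the closing >} of the existing state becomes a comma, and a new filter plus closing >} is appended:

```shell
# Existing selection state with one filter:
STATE='{ [alltrips].1 <rate_code={2}>}'
ColName='payment_type'
Values='Dis'

# Mirrors: replace('$(SELECTION_STATE)', '>}', ',') & ' $(ColName)={ $(Values) }>}'
STATE="$(printf '%s' "$STATE" | sed 's/>}/,/') $ColName={ $Values }>}"
echo "$STATE"
# -> { [alltrips].1 <rate_code={2}, payment_type={ Dis }>}
```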

The subroutine looks as follows:

SUB ExtendSelectionState(Name, ValVarName)
    LET T = Name & '_COLNAME';
    LET ColName = $(T);
    LET Values = $(ValVarName);
    IF len(Values) > 0 THEN
        IF len(SELECTION_STATE) > 0 THEN
            LET SELECTION_STATE = replace('$(SELECTION_STATE)','>}', ',') & ' $(ColName)={ $(Values) }>}';
        ELSE
            LET SELECTION_STATE = '{[alltrips].1 <$(ColName)={$(Values)}>}';
        ENDIF
    ENDIF
END SUB;
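The string splicing performed by ExtendSelectionState can be traced with a small Python sketch of the same logic (an illustrative helper, not part of the product; the model name alltrips is hard-coded as in the example):

```python
def extend_selection_state(state, col_name, values):
    """Mirror of the ExtendSelectionState Qlik subroutine (illustrative sketch).

    The first call creates the set handle; each subsequent call turns the
    closing '>}' into ',' so another <field={values}> filter can be appended
    inside the same angle brackets.
    """
    if not values:
        return state
    if state:
        return state.replace('>}', ',') + ' %s={%s}>}' % (col_name, values)
    return '{[alltrips].1 <%s={%s}>}' % (col_name, values)

state = ''
state = extend_selection_state(state, 'rate_code', '2')
state = extend_selection_state(state, 'payment_type', 'Dis')
print(state)  # {[alltrips].1 <rate_code={2}, payment_type={Dis}>}
```

Running the two calls reproduces exactly the set handle shown earlier for rate_code and payment_type.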
Changing quoting char for SELECTION_STATE in CALL BuildValueList

For each of the bound fields, the quoting option in the CALL BuildValueList statements must be
changed to ASCII character code 0, so that values are emitted without quotes, as the QSL syntax
requires:

CALL BuildValueList('<name of the bound field>', 'OdagBinding', 'VAL', 0);

The script loops through all field bindings and calls the modified subroutine
(ExtendSelectionState).

The set handle statement, which references the SELECTION_STATE variable, is then used to filter
the fact table:

QSL SET hPassedSelection = $(SELECTION_STATE);

Finally, the set handle is applied in the QSL SELECT statement, which contains the fields, model
name and required table:

FROM [alltrips].[Flights]
AT STATE hPassedSelection;
Troubleshooting and Cheat Sheet

Quick reference for scripts to run

The release can be shut down, destroying the pods, by typing:

$ sudo helm del --purge qlik

Checking the status of the pods:

$ sudo kubectl get pods

Starting the indexing services:

$ cd /home/ubuntu/dist/runtime/scripts/indexer
$ ./start_indexing_env.sh

Checking the services:


$ ./service_manager.sh

Creating the index or just creating the schema:


$ ./task_manager.sh

Starting/stopping the QSL services:

$ cd /home/ubuntu/dist/runtime/scripts/qsl_processor
$ ./start_qsl_env.sh
$ ./stop_qsl_env.sh

Checking the version of QABDI:


$ kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec.containers[*].image}"

Error Checking

If the source data is not in the location specified in the indexing_setting.json file, or the format is
not as described, the following error is thrown:

[18-08-25 08:31:08:846]-[console-critical]-[342] ********************** CRITICAL! *********************
[18-08-25 08:31:08:846]-[console-critical]-[342] Missing or unsupported type for file '/home/data/alltrips' Ref:1018
[18-08-25 08:31:08:846]-[console-critical]-[342] ***************
To see which services are running:

$ ps aux | grep qsl
$ ps -ef | grep -E '[i]ndexer_tool|[q]sl_processor_tool'

(The square brackets in the second pattern stop the grep process itself from appearing in the
results.)

Note that if the Indexing/QSL processors stop responding, crash, or simply disappear, the best
approach is to start by killing all running processes. Then, to bring the QSL processors back,
start the indexing cluster, skip the indexing tasks, and start the QSL processor tool.

Currently you can kill the QSL and Indexer processes in two ways:

Recommended approach:

$ cd /home/ubuntu/dist/runtime/scripts/qsl_processor
$ ./stop_qsl_env.sh
$ cd /home/ubuntu/dist/runtime/scripts/indexer
$ ./service_manager.sh
Enter option: a (to stop all services)

Kill command: connect to each QSL and Indexing instance and type:

$ ps aux | grep qsl

If any qsl processes are found, force-kill them:

$ pkill --signal 9 -f qsl

Then restart the services:

$ cd /home/ubuntu/dist/runtime/scripts/indexer
$ ./start_indexing_env.sh
$ cd /home/ubuntu/dist/runtime/scripts/qsl_processor
$ ./start_qsl_env.sh

Configuration and logs in the QABDI environment

QABDI logs are stored in your indexing output folder. Based on the configuration defined for your
QABDI environment, it should be in the following location: /home/efs/alltrips_output/logs

Indexing_manager_service.log – Check for index service registration and schema registration on ports 55020, 55010, 55050

registry_service.log – Check for status of the indexing and symbol servers

persistence_manager_service.log – Check for status of the persistence service on port 55010

symbol_service.log – Check for status of the symbol service on port 55030 and the symbol output location

indexer_service.log – Check for status of the index service on port 55040

/worker/Wrk_xxx.qlog – Check for activity of workers involved in the startup process on ports 44001, 54001

/manager/Mgr_xxx.qlog – Check for activity of workers involved in QSL generation, line by line; to follow it in real time:

tail -f Mgr_xxx-qslmanager_55000.qlog

/reg-exec/ReX_xxx.qlog – Check for registry processes

The configuration files updated during the indexing process are stored in
output/config/indexer/:

symbol_service_xx.json – multiple files for multiple symbol servers

indexer_service_xx.json – multiple files for multiple indexing servers

registry_service.json

persistence_manager.json

indexing_manager_service.json

xxx_data_source_fullflow.json – parquet source storage locations (xxx = model name)

xxx_data_source.json – parquet source storage locations

xxx_schema.json – data type mappings in the parquet files
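As a convenience, the generated files can be listed from the shell (a sketch; the output folder /home/efs/alltrips_output follows the configuration used throughout this guide, and CONFIG_DIR can be overridden if yours differs):

```shell
# List the config files generated by the indexing run, newest first.
CONFIG_DIR=${CONFIG_DIR:-/home/efs/alltrips_output/config/indexer}
ls -t "$CONFIG_DIR"/*.json 2>/dev/null || echo "no config found in $CONFIG_DIR"
```

Any of the files can then be pretty-printed, for example with python3 -m json.tool, to inspect the ports and storage locations the services registered.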


Other commands

To upgrade an existing deployment after making changes to the configuration (using
my_values.yaml):

$ helm upgrade qlik-mn qlik/bdi -f my_values.yaml

To destroy the release and remove all pods, volumes and associated data:

$ helm del --purge qlik-mn

To increase the number of replicas for a service, in this case the QSL workers:

$ kubectl scale deployment/qlik-mn-bdi-qslworker --replicas=4

About Qlik
Qlik is on a mission to create a data-literate world, where everyone can use data to solve their most
challenging problems. Only Qlik’s end-to-end data management and analytics platform brings together
all of an organization’s data from any source, enabling people at any skill level to use their curiosity to
uncover new insights. Companies use Qlik products to see more deeply into customer behavior,
reinvent business processes, discover new revenue streams, and balance risk and reward. Qlik does
business in more than 100 countries and serves over 48,000 customers around the world.
qlik.com

© 2018 QlikTech International AB. All rights reserved. Qlik®, Qlik Sense®, QlikView®, QlikTech®, Qlik Cloud®, Qlik DataMarket®, Qlik Analytics Platform®, Qlik NPrinting®, Qlik
Connectors®, Qlik GeoAnalytics®, Qlik Core®, Associative Difference®, Lead with Data™, Qlik Data Catalyst™, Qlik Associative Big Data Index™ and the QlikTech logos are trademarks of
QlikTech International AB that have been registered in one or more countries. Other marks and logos mentioned herein are trademarks or registered trademarks of their respective owners.
BIGDATAWP092618_MD
