KNIME Google Cloud Integration

User Guide
KNIME AG, Zurich, Switzerland
Version 4.3 (last updated on 2020-12-08)
Table of Contents
Overview
Google Dataproc
    Cluster Setup with Livy
    Connect to Dataproc cluster
    Apache Hive in Google Dataproc
Google Cloud Storage
    Google Authentication (API Key)
    Google Cloud Storage Connector
Google BigQuery
    Connect to BigQuery
    Create a BigQuery table

KNIME Google Cloud Integration User Guide

Overview
KNIME Analytics Platform includes a set of nodes that support several Google Cloud services. This guide covers Google Dataproc, Google Cloud Storage, and Google BigQuery.

KNIME Analytics Platform also provides integration for Google Drive and Google Sheets.

Google Dataproc

Cluster Setup with Livy

To create a Dataproc cluster using the Google Cloud Platform web console, follow the step-by-step guide in the Google documentation.

To set up Apache Livy on the cluster, the following additional steps are necessary:

1. Copy the file livy.sh from the Git repository into your Cloud Storage bucket. This file is used as the initialization action that installs Livy on the master node of the Dataproc cluster.

Please check the best practices for using initialization actions.

© 2020 KNIME AG. All rights reserved. 1


2. During cluster creation, open the Advanced options at the bottom of the page.

Figure 1. Advanced options in the cluster creation page


3. Select the network and subnet. Note them down, as you will need them in the Access to Livy section.

Figure 2. Network and subnet

4. In the Initialization actions section, select the livy.sh file from your Cloud Storage bucket.

Figure 3. Set livy.sh as initialization action

5. Configure the rest of the cluster settings according to your needs and create the
cluster.

Apache Livy is a service that interacts with a Spark cluster over a REST interface. It is the recommended service for creating a Spark context in KNIME Analytics Platform.
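If you prefer the command line to the web console, the cluster setup above can be sketched as a gcloud dataproc clusters create call. The Python snippet below only assembles and prints the command; the cluster name, region, bucket and subnet are hypothetical placeholders, and the flags should be checked against the gcloud reference of your SDK version before running it.

```python
import shlex


def dataproc_create_command(cluster, region, bucket, subnet):
    """Assemble (but do not run) a gcloud command that creates a Dataproc
    cluster with livy.sh from a Cloud Storage bucket as initialization action.

    All argument values are hypothetical placeholders.
    """
    return [
        "gcloud", "dataproc", "clusters", "create", cluster,
        f"--region={region}",
        # The subnet chosen here is the one you look up again under Access to Livy.
        f"--subnet={subnet}",
        # livy.sh must already have been copied to the bucket (step 1).
        f"--initialization-actions=gs://{bucket}/livy.sh",
    ]


cmd = dataproc_create_command("knime-cluster", "europe-west1", "my-bucket", "default")
# Print a shell-ready command line; you could also run it via subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each argument intact, and shlex.join adds quoting only where the shell would need it.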


Access to Livy

To find the external IP address of the master node where Livy is running:

1. Click the cluster name on the cluster list page.

2. Go to VM Instances and click the master node.

Figure 4. Select the master node in the VM instances list

3. On the VM instance details page, scroll down to the Network interfaces section. Find the network and subnet that you selected in the Cluster Setup with Livy section; the external IP address of the master node is listed there.

Figure 5. Find the external IP address of the master node
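The same lookup can be done with gcloud compute instances describe. The sketch below assembles such a command; it assumes a standard single-master cluster, whose master VM is named after the cluster with a -m suffix, and that the external IP belongs to the first network interface. The cluster name and zone are hypothetical.

```python
import shlex


def master_ip_command(cluster, zone):
    """Assemble a gcloud command that prints the external IP of the master node.

    Assumes a standard (single-master) cluster whose master VM is named
    '<cluster>-m'; high-availability clusters use '<cluster>-m-0' and so on.
    """
    master = f"{cluster}-m"
    return [
        "gcloud", "compute", "instances", "describe", master,
        f"--zone={zone}",
        # natIP is the external address of the first network interface.
        "--format=get(networkInterfaces[0].accessConfigs[0].natIP)",
    ]


cmd = master_ip_command("knime-cluster", "europe-west1-b")
print(shlex.join(cmd))  # the --format argument gets quoted for the shell
```

If the master node has several network interfaces, pick the index of the interface attached to the network and subnet you chose during cluster creation.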


Livy Firewall Setup

To allow access to Livy from outside the cluster, configure the firewall as follows:

1. Click the cluster name on the cluster list page.

2. Go to VM Instances and click the master node.

3. On the VM instance details page, scroll down to the Firewalls section and make sure the Allow HTTP traffic checkbox is enabled.

Figure 6. Check Allow HTTP traffic in the Firewalls section

4. Next, go to the VPC network page.

5. In the Firewall section of the VPC network page, select the default-allow-http rule.

Figure 7. Open the default-allow-http firewall rule


6. Make sure that tcp:8998 is included in the allowed protocol and ports list, and that your
IP address is included in the allowed IP addresses list.

Figure 8. Make sure to allow access to certain ports and IP addresses
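Step 6 boils down to two membership checks: does an allowed entry cover TCP port 8998, and does a source range contain your public IP address? The sketch below models a rule as a list of entries such as tcp:8998 plus a list of CIDR source ranges; this representation and the helper names are assumptions for illustration, since the rule itself is edited in the console.

```python
import ipaddress


def ip_allowed(source_ranges, client_ip):
    """True if client_ip falls into one of the rule's source IP ranges (CIDR)."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(r, strict=False) for r in source_ranges)


def port_allowed(allowed, port):
    """True if an entry such as 'tcp', 'tcp:8998' or 'tcp:8000-9000' admits port."""
    for entry in allowed:
        proto, _, ports = entry.partition(":")
        if proto.lower() != "tcp":
            continue
        if not ports:  # a bare 'tcp' entry allows every TCP port
            return True
        low, _, high = ports.partition("-")
        if int(low) <= port <= int(high or low):
            return True
    return False


# Hypothetical rule contents, using documentation IP ranges.
rule = {"allowed": ["tcp:80", "tcp:8998"], "source_ranges": ["203.0.113.0/24"]}
print(port_allowed(rule["allowed"], 8998) and ip_allowed(rule["source_ranges"], "203.0.113.7"))
```

Keeping the source ranges as narrow as possible (ideally your own address as a /32) limits who can reach Livy once the port is open.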

Once you have followed these steps, you will be able to access the Dataproc cluster via
KNIME Analytics Platform using Apache Livy.


Connect to Dataproc cluster

Figure 9. Connecting to Dataproc cluster

Figure 9 shows how to establish a connection to a running Dataproc cluster from KNIME Analytics Platform. The Google Authentication (API Key) node and the Google Cloud Storage Connector node create a connection to the Google APIs and to Google Cloud Storage, respectively. For more information on both nodes, please check out the Google Cloud Storage section of this guide.

The Create Spark Context (Livy) node creates a Spark context via Apache Livy. The most important settings in its node configuration dialog are:

• The Livy URL. It has the format http://<IP-ADDRESS>:8998, where <IP-ADDRESS> is the external IP address of the master node of the Dataproc cluster. To find this address, see the Access to Livy section.
• The staging area for Spark jobs, which must be set under the Advanced tab. The staging area is located in the connected Google Cloud Storage system and is used to exchange temporary files between KNIME and the Spark context.
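Before configuring the node, it can help to check the Livy URL by hand. The sketch below builds the URL in the format described above and prepares a GET request to Livy's /sessions REST endpoint, which lists existing sessions; a JSON answer suggests that Livy is reachable through the firewall. The IP address is hypothetical, and the request is only sent if you uncomment the last line.

```python
import urllib.request

LIVY_PORT = 8998  # Livy port used in the Livy URL format above


def livy_url(master_ip):
    """Livy URL in the format expected by the Create Spark Context (Livy) node."""
    return f"http://{master_ip}:{LIVY_PORT}"


def list_sessions_request(master_ip):
    """Build (but do not send) a GET /sessions request against Livy's REST API."""
    return urllib.request.Request(livy_url(master_ip) + "/sessions", method="GET")


req = list_sessions_request("203.0.113.10")  # hypothetical external IP of the master node
print(req.full_url)
# To send it: print(urllib.request.urlopen(req, timeout=10).read())
```

A timeout here usually points at the firewall rule; a connection refused error points at Livy not running on the master node.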

The remaining settings can be configured according to your needs. For more information on the Create Spark Context (Livy) node, please check out the KNIME Amazon Web Services documentation.

Once the Spark context is created, you can use any number of the KNIME Spark nodes from
the KNIME Extension for Apache Spark to visually assemble your Spark analysis flow to be
executed on the cluster.


Apache Hive in Google Dataproc

This section describes how to establish a connection to Apache Hive™ on Dataproc in KNIME
Analytics Platform.

Figure 10. Connect to Hive and create a Hive table

Figure 10 shows how to connect to Hive running on a Dataproc cluster and how to create a
Hive table.

The Hive Connector node is bundled by default with the open-source Apache Hive JDBC driver. Proprietary drivers are also supported, but need to be registered first. Follow the guide on how to register a Hive JDBC driver in the KNIME documentation.

Once the Hive JDBC driver is registered, you can configure the Hive Connector node. For more information on the settings in the node configuration dialog, please refer to the KNIME documentation. Executing the node creates a connection to Apache Hive, after which you can use any of the KNIME database nodes to visually assemble your SQL statements.

To enable access to Hive from KNIME Analytics Platform, make sure that the Hive port (10000 by default) is open in the firewall rules. To configure this, follow the Livy Firewall Setup section and adjust the firewall rule accordingly.
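To verify the firewall changes from your own machine, a plain TCP connection attempt is enough. The sketch below probes a port and builds a JDBC URL in the jdbc:hive2:// format used by the open-source Apache Hive driver. The helper names, the default database and the IP address are assumptions; the probe only shows that something answers on the port, not that Hive accepts your credentials.

```python
import socket


def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port can be established within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure, ...
        return False


def hive_jdbc_url(host, port=10000, database="default"):
    """JDBC URL in the format used by the open-source Apache Hive driver."""
    return f"jdbc:hive2://{host}:{port}/{database}"


# Example with a hypothetical master IP; uncomment to probe Livy and Hive:
# for p in (8998, 10000):
#     print(p, port_open("203.0.113.10", p))
print(hive_jdbc_url("203.0.113.10"))
```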


Google Cloud Storage

The KNIME Google Cloud Storage Connection extension provides nodes to connect to Google Cloud Storage.

The new Google Cloud Storage Connector node uses the new file handling framework, available starting with version 4.3. For more information on the file handling framework, please check out the KNIME File Handling Guide.

Figure 11. Connecting to and working with Google Cloud Storage

Figure 11 shows an example of how to connect to Google Cloud Storage and work with the remote files.

Google Authentication (API Key)

The Google Authentication (API Key) node allows you to authenticate with the various Google
APIs using a P12 key file. To be able to use this node, you have to create a project at the
Google Cloud Console. For more information on how to create a project on Google Cloud
Console, please follow the Google documentation.


Figure 12. Node configuration dialog of Google Authentication (API Key) node

Figure 12 shows the node configuration dialog of the Google Authentication (API Key) node. In the node dialog, configure the following settings:

• Service account email. If you don't have a service account yet, please follow the Google documentation on how to create one. After creating the service account, it is essential to select P12 as the service account key (see Figure 13). The service account email has the format sa-name@project-id.iam.gserviceaccount.com, where sa-name is a unique identifier and project-id is the ID of the project.


Figure 13. Select P12 file as the service account key

• P12 key file location. When you select P12 as the service account key after creating the service account (see Figure 13), the P12 file is downloaded automatically to your local machine. Store the P12 file in a secure place on your local system.
• OAuth 2.0 scopes. These are the scopes granted for this connection; select them according to the level of access that you need.
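Because the service account email already contains the project ID, splitting it is a quick way to catch a mistyped address and to recover the project ID that the Google Cloud Storage Connector node also asks for. The helper below is a hypothetical sketch: it checks only the overall sa-name@project-id.iam.gserviceaccount.com shape described above, not Google's full naming rules.

```python
SA_DOMAIN_SUFFIX = ".iam.gserviceaccount.com"


def parse_service_account_email(email):
    """Split 'sa-name@project-id.iam.gserviceaccount.com' into (sa_name, project_id).

    Only the overall shape is checked; Google's exact naming rules are not enforced.
    """
    sa_name, at, domain = email.partition("@")
    if not at or not sa_name or not domain.endswith(SA_DOMAIN_SUFFIX):
        raise ValueError(f"unexpected service account email: {email!r}")
    project_id = domain[: -len(SA_DOMAIN_SUFFIX)]
    if not project_id:
        raise ValueError(f"missing project ID in: {email!r}")
    return sa_name, project_id


print(parse_service_account_email("knime-sa@my-project-123456.iam.gserviceaccount.com"))
```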

Google Cloud Storage Connector

The Google Cloud Storage Connector node connects to Google Cloud Storage and allows
downstream nodes to access Google Cloud Storage inside a certain project using the new
KNIME file handling nodes.

The node configuration dialog of the Google Cloud Storage Connector node contains:

• Project ID. This is the Google Cloud project ID. For more information on how to find
your project ID, please check out the Google documentation.
• Working directory. The working directory must be specified as an absolute path and it
allows downstream nodes to access files/folders using relative paths, i.e. paths that do
not have a leading slash. If not specified, the default working directory is /.

Path syntax: Paths for Google Cloud Storage are specified with a UNIX-like syntax, e.g. /mybucket/myfolder/myfile. The path usually consists of:

◦ A leading slash (/)

◦ Followed by the name of a bucket (mybucket in the above example), followed by a slash

◦ Followed by the name of an object within the bucket (myfolder/myfile in the above example).
• Normalize paths. Path normalization eliminates redundant components of a path, e.g. /a/../b/./c can be normalized to /b/c. If such components (like ../ or .) are part of the name of an existing object, normalization must be deactivated in order to access that object.
• Under the Advanced tab, you can set the connection and read timeouts.
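The interplay of working directory, relative paths, normalization and the bucket/object split described above can be sketched with Python's posixpath module. This mimics the documented semantics under our own assumptions; it is not KNIME's implementation, and the helper names are hypothetical.

```python
import posixpath


def resolve_path(path, working_dir="/", normalize=True):
    """Resolve a path against an absolute working directory, optionally normalizing it."""
    if not working_dir.startswith("/"):
        raise ValueError("the working directory must be an absolute path")
    # Relative paths (no leading slash) are interpreted against the working directory.
    full = path if path.startswith("/") else posixpath.join(working_dir, path)
    return posixpath.normpath(full) if normalize else full


def split_bucket_and_object(path):
    """Split an absolute path '/bucket/object/name' into ('bucket', 'object/name')."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    bucket, _, object_name = path[1:].partition("/")
    return bucket, object_name


print(resolve_path("myfile", working_dir="/mybucket/myfolder"))
print(resolve_path("/a/../b/./c"))
print(split_bucket_and_object("/mybucket/myfolder/myfile"))
```

Note that posixpath.normpath also collapses repeated slashes and drops trailing slashes, which is another reason an object whose name contains such components is only reachable with normalization turned off.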

This node currently only supports the Google Authentication (API Key) node for authentication.


Google BigQuery
KNIME Analytics Platform includes a set of nodes to support Google BigQuery. The KNIME BigQuery extension is available starting with KNIME Analytics Platform version 4.1.

Setting up KNIME Analytics Platform for Google BigQuery has the following prerequisites:

1. Create a project in the Google Cloud Console. For more information on how to create a
project on Google Cloud Console, please follow the Google documentation.
2. Create a service account. If you don’t have one already, please follow the Google
documentation on how to create a service account. It is essential to select P12 as the
service account key.
3. Download the JDBC driver for Google BigQuery, unzip it, and store it on your local machine. Register the JDBC driver in KNIME Analytics Platform by following the tutorial in the KNIME documentation.

Connect to BigQuery

Figure 14. Connecting to and working with Google BigQuery

Figure 14 shows how to authenticate with the Google Authentication (API Key) node and how to use the Google BigQuery Connector node to establish a connection to BigQuery via the JDBC driver. To configure the Google Authentication (API Key) node, please refer to the Google Authentication (API Key) section.

To configure the Google BigQuery Connector node, please check out how to connect to a predefined database in the KNIME documentation. For the hostname in BigQuery, you can specify https://www.googleapis.com/bigquery/v2 or bigquery.cloud.google.com. As the database name, use the project name you created on the Google Cloud Console.
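The connector node assembles the JDBC URL itself from these fields. If you want to test the registered driver outside KNIME, the sketch below shows how hostname, project and service account details could map onto a Simba-style BigQuery JDBC URL. The property names (ProjectId, OAuthType=0 for service account authentication, OAuthServiceAcctEmail, OAuthPvtKeyPath) reflect our reading of the Simba driver documentation and are an assumption here; verify them against the guide shipped with your driver version. All values are hypothetical.

```python
def bigquery_jdbc_url(project_id, sa_email, p12_path,
                      host="https://www.googleapis.com/bigquery/v2", port=443):
    """Assemble a Simba-style BigQuery JDBC URL for service account authentication.

    Property names are assumptions based on the Simba driver documentation;
    OAuthType=0 is taken to select service account authentication.
    """
    properties = {
        "ProjectId": project_id,
        "OAuthType": "0",
        "OAuthServiceAcctEmail": sa_email,
        "OAuthPvtKeyPath": p12_path,
    }
    return f"jdbc:bigquery://{host}:{port};" + ";".join(
        f"{key}={value}" for key, value in properties.items()
    )


print(bigquery_jdbc_url("my-project-123456",
                        "knime-sa@my-project-123456.iam.gserviceaccount.com",
                        "/secure/keys/knime-sa.p12"))
```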

For more information on the JDBC Parameters tab or the Advanced tab in the node configuration dialog of the Google BigQuery Connector node, please check out the KNIME documentation.

Executing this node creates a connection to the BigQuery database, after which you can use any of the KNIME database nodes to visually assemble your SQL statements.

For more information on KNIME database nodes, please check out the KNIME Database documentation.

Create a BigQuery table

To export data from KNIME Analytics Platform to Google BigQuery (shown in Figure 14):

1. Create the database schema/dataset where you want to store the table, if it doesn’t
exist already. To create a dataset, please check the Google documentation.
2. Create an empty table with the right specification. To do this, use the DB Table Creator
node. Inside the node configuration dialog, specify the schema as the name of the
dataset that you created in the previous step. For more information on the DB Table
Creator node, please check the KNIME documentation.

If any column names contain space characters (e.g. column 1), remove the spaces before creating the table. Otherwise, the spaces are automatically replaced with _ during table creation (e.g. column_1), and the resulting conflict arises because the column names in KNIME no longer match those of the BigQuery table.

3. Once the empty table is created, use the DB Loader node to load the table content into
the newly created table. For more information on the DB Loader node, please check the
KNIME documentation.
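One way to follow the note on column names is to apply the same space-to-underscore renaming in KNIME (for example with a Column Rename node) before the DB Table Creator node, so that KNIME and BigQuery agree on every name. The hypothetical helper below computes the new names and flags collisions, such as column 1 and column_1 both mapping to column_1, which must be resolved by hand.

```python
from collections import Counter


def renamed_columns(columns):
    """Apply the space-to-underscore renaming described above and report collisions.

    Returns (new_names, collisions); collisions lists names that would occur
    more than once after renaming.
    """
    new_names = [name.replace(" ", "_") for name in columns]
    counts = Counter(new_names)
    collisions = sorted(name for name, n in counts.items() if n > 1)
    return new_names, collisions


print(renamed_columns(["column 1", "column 2", "id"]))
print(renamed_columns(["column 1", "column_1"]))
```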

KNIME AG
Hardturmstrasse 66
8005 Zurich, Switzerland
www.knime.com

The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME AG under license
from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany.
