
Connect Hadoop Database by Using Hive in Python


posted Oct 11, 2014, 4:43 AM by Ting Yu   [ updated Oct 22, 2014, 2:47 AM ]

On the Hadoop platform, there are two scripting languages that simplify the code: Pig is a data-flow scripting language, while Hive's query language looks like SQL. Using Hive is quite easy. It ships with a set of extension functions (called user-defined functions, or UDFs) for transforming data, such as regular-expression tools. A developer can add new user-defined functions by writing them in Java. Another way to add procedural logic that complements Hive's set-based, SQL-like language is to use a language like Python.

In this example, we use a Python module to access a database table. Hive retrieves the data, partitions it, and sends the rows to the Python processes created on the different cluster nodes.
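
When Hive streams rows to Python in this way (for example through a TRANSFORM clause in the query), the Python side is simply a script that reads tab-separated rows from standard input and writes transformed rows to standard output. The script below is only a hypothetical sketch of that pattern; the column layout and the transformation are illustrative, not taken from the original example.

#!/usr/bin/env python
# Hypothetical Hive streaming script: Hive pipes each row to stdin as a
# tab-separated line, and every line printed to stdout becomes an output row.
import sys

for line in sys.stdin:
    fields = line.rstrip('\n').split('\t')
    # Illustrative transformation: upper-case the first column, keep the rest.
    fields[0] = fields[0].upper()
    print '\t'.join(fields)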

In addition to the standard Python installation, a few libraries need to be installed to allow Python to build the connection to the Hadoop database.

1. pyhs2, Python Hive Server 2 client driver: https://pypi.python.org/pypi/pyhs2/0.5.0

2. sasl, Cyrus-SASL bindings for Python: https://pypi.python.org/pypi/sasl/0.1.3

3. thrift, Python bindings for the Apache Thrift RPC system: https://pypi.python.org/pypi/thrift/0.9.1

4. PyHive, Python interface to Hive: https://pypi.python.org/pypi/PyHive/0.1.0

All of the libraries are installed into the per-user site-packages directory (the effect of the --user flag below). The installation commands are:

unzip pyhs2-master.zip
cd pyhs2-master
python setup.py install --user

tar zxvf sasl-0.1.3.tar.gz
cd sasl-0.1.3
python setup.py install --user

tar zxvf thrift-0.9.1.tar.gz
cd thrift-0.9.1
python setup.py install --user

tar zxvf PyHive-0.1.0.tar.gz
cd PyHive-0.1.0
python setup.py install --user
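
A quick way to confirm that the per-user installs are visible to Python is to import each module. This is only a sanity-check sketch; the module names correspond to the packages installed above.

#!/usr/bin/env python
# Sanity check (sketch only): confirm the client libraries installed with
# --user can be imported before running the main connection script.
import pyhs2
import sasl
import thrift
import pyhive

print 'All Hive client libraries imported successfully'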

The main Python code to connect to the database:

#!/usr/bin/env python
import pyhs2 as hive
import getpass

DEFAULT_DB = 'default'
DEFAULT_SERVER = '10.37.40.1'
DEFAULT_PORT = 10000
DEFAULT_DOMAIN = 'PAM01-PRD01.IBM.COM'

# Prompt for the username and password (the password is not echoed)
u = raw_input('Enter PAM username: ')
s = getpass.getpass()

# Build the Hive connection to HiveServer2 using LDAP authentication
connection = hive.connect(host=DEFAULT_SERVER,
                          port=DEFAULT_PORT,
                          authMechanism='LDAP',
                          user=u + '@' + DEFAULT_DOMAIN,
                          password=s,
                          database=DEFAULT_DB)

# Hive query statement
statement = "select * from user_yuti.Temp_CredCard where pir_post_dt = '2014-05-01' limit 100"
cur = connection.cursor()

# Run the Hive query and fetch the result as a list of lists
cur.execute(statement)
df = cur.fetchall()
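
The fetchall() call returns the result as a list of lists, one inner list per row. A short sketch of inspecting the result and releasing the HiveServer2 session:

# df is a list of lists: one inner list per returned row.
print len(df), 'rows returned'
for row in df[:5]:
    print row

# Release the session when finished.
cur.close()
connection.close()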

Remember to make the script executable before running it:

chmod +x test_hive2.py
./test_hive2.py
