Connect Hadoop Database by Using Hive in Python - Ting Yu
On the Hadoop platform, two scripting languages simplify the code: Pig, which has its own scripting syntax, and Hive, which looks like SQL. Hive is quite easy to use and ships with a set of extension functions (called user-defined functions, or UDFs) for transforming data, such as regular-expression tools. A developer can add user-defined functions by writing them in Java. Another way to add procedural logic that complements SQL's set-based language is to use a language like Python.
In this example, we use a Python module to access a database table. Hive is used to get the data, partition it, and send the rows to Python processes created on the different cluster nodes.
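The hand-off from Hive to Python described above is typically done with Hive's TRANSFORM clause, which streams each row to a script as a tab-separated line on stdin and reads the transformed row back from stdout. A minimal sketch of such a script follows; the column layout (card number, posting date, amount) and the masking rule are assumptions for illustration, not part of the original example:

```python
#!/usr/bin/env python
# Streaming script for Hive's TRANSFORM clause: Hive pipes each row to stdin
# as tab-separated fields and reads the transformed row back from stdout.
import sys

def transform(line):
    # Hypothetical column layout: card_number, post_date, amount.
    card, post_dt, amount = line.rstrip('\n').split('\t')
    # Mask the card number, keeping only the last four digits.
    masked = '*' * (len(card) - 4) + card[-4:]
    return '\t'.join([masked, post_dt, amount])

if __name__ == '__main__':
    for line in sys.stdin:
        sys.stdout.write(transform(line) + '\n')
```

In HiveQL this would be invoked roughly as `ADD FILE mask.py;` followed by `SELECT TRANSFORM(card, post_dt, amount) USING 'mask.py' AS (card, post_dt, amount) FROM ...`.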
In addition to the standard Python installation, a few libraries need to be installed to allow Python to build the connection to the Hadoop database.
3. Thrift, Python bindings for the Apache Thrift RPC system: https://pypi.python.org/pypi/thrift/0.9.1
All the libraries are installed in the folder ~/site-packages. The installation commands are below:
unzip pyhs2-master.zip
cd pyhs2-master
python setup.py install --user
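After installation, a quick check (not part of the original steps) confirms that the modules are importable from the user site-packages directory; the same check works for sasl and thrift:

```python
def check_module(name):
    # Return a one-line status for an importable (or missing) module.
    try:
        module = __import__(name)
        return name + " found at " + getattr(module, "__file__", "(builtin)")
    except ImportError:
        return name + " is not importable; check that the user site-packages dir is on sys.path"

# Check the three libraries installed above.
for name in ("pyhs2", "sasl", "thrift"):
    print(check_module(name))
```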
#!/usr/bin/env python
import pyhs2 as hive
import getpass

DEFAULT_DB = 'default'
DEFAULT_SERVER = '10.37.40.1'
DEFAULT_PORT = 10000
DEFAULT_DOMAIN = 'PAM01-PRD01.IBM.COM'

# Get the username and password
u = raw_input('Enter PAM username: ')
s = getpass.getpass()

# Build the Hive connection (HiveServer2 with LDAP authentication)
connection = hive.connect(host=DEFAULT_SERVER,
                          port=DEFAULT_PORT,
                          authMechanism='LDAP',
                          user=u + '@' + DEFAULT_DOMAIN,
                          password=s,
                          database=DEFAULT_DB)

# Hive query statement
statement = "select * from user_yuti.Temp_CredCard where pir_post_dt = '2014-05-01' limit 100"

# Execute the query and print the returned rows
cur = connection.cursor()
cur.execute(statement)
for row in cur.fetch():
    print(row)
cur.close()
connection.close()
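The rows that pyhs2 returns from a fetch are plain Python lists, so they can be handed straight to the standard csv module. A sketch of turning query results into CSV text; the header names and sample rows below are stand-ins for real results, not data from the example table:

```python
import csv
import io

def rows_to_csv(header, rows):
    # Serialize a header plus a list-of-lists of row values into CSV text.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

# Sample data standing in for a cursor's schema and fetched rows.
header = ['card_id', 'pir_post_dt', 'amount']
rows = [['A1001', '2014-05-01', '25.40'],
        ['A1002', '2014-05-01', '113.00']]
print(rows_to_csv(header, rows))
```

With a live connection, the header could be derived from the cursor's schema (in pyhs2, cur.getSchema() returns one dict per column) and the rows from cur.fetch().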
Make the script executable and run it:
chmod +x test_hive2.py
./test_hive2.py