Installation of Hive On Ubuntu
To configure Apache Hive, first you need to download and unzip Hive. Then you need to customize
the following files and settings:
• Edit .bashrc file
• Edit hive-config.sh file
• Create Hive directories in HDFS
• Configure hive-site.xml file
• Initiate Derby database
Step 1: Download and Untar Hive
Visit the official Apache Hive download page and select a mirror. The mirror link on the subsequent page leads to the directories containing the available Hive tar packages. This page also provides useful instructions on how to validate the integrity of files retrieved from mirror sites.
The Ubuntu system presented in this guide already has Hadoop 3.2.1 installed. This Hadoop
version is compatible with the Hive 3.1.2 release.
Once the download process is complete, untar the compressed Hive package:
tar xzf apache-hive-3.1.2-bin.tar.gz
The Hive binary files are now located in the apache-hive-3.1.2-bin directory.
Step 2: Configure Hive Environment Variables (bashrc)
The $HIVE_HOME environment variable needs to point the client shell to the apache-hive-3.1.2-bin directory. Edit the .bashrc shell configuration file using a text editor of your choice (we will be using nano):
sudo nano .bashrc
The Hadoop environment variables are located within the same file.
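The exact lines depend on where you unpacked the archive; assuming Hive sits in the hdoop user's home directory (as with the Hadoop install in this guide), the Hive variables to append might look like this:

```shell
# Hive environment variables (path assumes the archive was
# unpacked in the hdoop user's home directory)
export HIVE_HOME=/home/hdoop/apache-hive-3.1.2-bin
export PATH=$PATH:$HIVE_HOME/bin
```

Adding $HIVE_HOME/bin to PATH lets you run the hive and schematool commands from any directory.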
Save and exit the .bashrc file once you add the Hive variables. Apply the changes to the current
environment with the following command:
source ~/.bashrc
Step 3: Edit hive-config.sh file
Apache Hive needs to be able to interact with the Hadoop Distributed File System. Access the hive-config.sh file using the previously created $HIVE_HOME variable:
sudo nano $HIVE_HOME/bin/hive-config.sh
Note: The hive-config.sh file is in the bin directory within your Hive installation directory.
Add the HADOOP_HOME variable and the full path to your Hadoop directory:
export HADOOP_HOME=/home/hdoop/hadoop-3.2.1
Step 4: Create Hive Directories in HDFS
Create two separate directories to store data in the HDFS layer: the temporary tmp directory and the warehouse directory.
Create the tmp directory, which stores the intermediate data Hive sends to HDFS:
hdfs dfs -mkdir /tmp
Add write and execute permissions for tmp group members:
hdfs dfs -chmod g+w /tmp
Check that the permissions were added correctly:
hdfs dfs -ls /
The output confirms that users now have write and execute permissions.
Create the warehouse directory within the /user/hive/ parent directory:
hdfs dfs -mkdir -p /user/hive/warehouse
Add write and execute permissions for warehouse group members:
hdfs dfs -chmod g+w /user/hive/warehouse
Check the permissions once more:
hdfs dfs -ls /user/hive
The output again confirms that users now have write and execute permissions.
Step 5: Configure hive-site.xml File
Note: The hive-site.xml file controls every aspect of Hive operations. The number of available advanced settings can be overwhelming and highly specific. Consult the official Hive Configuration Documentation regularly when customizing Hive and Hive Metastore settings.
Using Hive in a stand-alone mode rather than in a real-life Apache Hadoop cluster is a safe option
for newcomers. You can configure the system to use your local storage rather than the HDFS layer
by setting the hive.metastore.warehouse.dir parameter value to the location of your Hive warehouse
directory.
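As a sketch, a local warehouse location could be set in hive-site.xml like this (the /home/hdoop/hive-warehouse path is only an example; substitute your own directory):

```xml
<property>
  <name>hive.metastore.warehouse.dir</name>
  <!-- example local path; replace with your own warehouse directory -->
  <value>/home/hdoop/hive-warehouse</value>
  <description>Location of the Hive warehouse directory</description>
</property>
```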
Step 6: Initiate Derby Database
Apache Hive uses the Derby database to store metadata. Initiate the Derby database from the Hive bin directory using the schematool command:
$HIVE_HOME/bin/schematool -dbType derby -initSchema
Derby is the default metadata store for Hive. If you plan to use a different database solution, such as
MySQL or PostgreSQL, you can specify a database type in the hive-site.xml file.
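For example, pointing the metastore at a MySQL database involves setting the JDBC connection properties in hive-site.xml; the host, database name, and credentials below are placeholders, not values from this guide:

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <!-- placeholder host and database name -->
  <value>jdbc:mysql://localhost:3306/metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>
```

With these set, you would run schematool with -dbType mysql instead of derby.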
How to Fix guava Incompatibility Error in Hive
If the Derby database does not successfully initiate, you might receive an error with the following
content:
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
This error indicates that there is most likely an incompatibility issue between Hadoop and Hive
guava versions.
Locate the guava jar file in the Hive lib directory:
ls $HIVE_HOME/lib
Locate the guava jar file in the Hadoop lib directory as well:
ls $HADOOP_HOME/share/hadoop/hdfs/lib
The two listed versions are not compatible and are causing the error. Remove the existing guava
file from the Hive lib directory:
rm $HIVE_HOME/lib/guava-19.0.jar
Copy the guava file from the Hadoop lib directory to the Hive lib directory:
cp $HADOOP_HOME/share/hadoop/hdfs/lib/guava-27.0-jre.jar $HIVE_HOME/lib/
Use the schematool command once again to initiate the Derby database:
$HIVE_HOME/bin/schematool -dbType derby -initSchema
Step 7: Launch the Hive Client Shell
Start the Hive command-line interface from the Hive bin directory:
cd $HIVE_HOME/bin
hive
You are now able to issue SQL-like commands and directly interact with HDFS.
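As a quick smoke test inside the Hive shell, a first HiveQL session might look like the following (the database and table names are just examples):

```sql
-- create an example database and table, then query it
CREATE DATABASE IF NOT EXISTS demo;
USE demo;
CREATE TABLE IF NOT EXISTS employees (id INT, name STRING);
INSERT INTO employees VALUES (1, 'Ada'), (2, 'Linus');
SELECT * FROM employees;
```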
Conclusion
You have successfully installed and configured Hive on your Ubuntu system. Use HiveQL to query and manage your Hadoop distributed storage and perform SQL-like tasks. Your Hadoop cluster now has an easy-to-use gateway for running SQL-like queries over the data stored in HDFS.