CCS334 BDA Lab Manual Final

Big Data Analytics

MISRIMAL NAVAJEE MUNOTH JAIN ENGINEERING COLLEGE
OWNED AND MANAGED BY TAMILNADU EDUCATIONAL AND MEDICAL FOUNDATION
A Jain Minority Institution
Approved by AICTE & Programmes Accredited by NBA, New Delhi (UG Programmes - MECH, AI&DS, ECE, CSE & IT)
All Programmes Recognized by the Government of Tamil Nadu and Affiliated to Anna University, Chennai
Guru MarudharKesari Building, Jyothi Nagar, Rajiv Gandhi Salai, OMR Thoraipakkam, Chennai - 600 097

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

CCS334 BIG DATA ANALYTICS LABORATORY
REGULATION - 2021

NAME            :
REGISTER NUMBER :
YEAR/SEMESTER   : III / V

VISION
To produce high quality, creative and ethical engineers and technologists contributing effectively to the ever-advancing Artificial Intelligence and Data Science field.

MISSION
To educate future software engineers with strong fundamentals by continuously improving the teaching-learning methodologies using contemporary aids.
To produce ethical engineers/researchers by instilling the values of humility, humaneness, honesty and courage to serve the society.
To create a knowledge hub of Artificial Intelligence and Data Science with an everlasting urge to learn by developing, maintaining and continuously improving the resources.

Register No:

BONAFIDE CERTIFICATE
This is to certify that this is a bonafide record of the work done by Mr./Ms. __________ of III YEAR / V SEM B.Tech - ARTIFICIAL INTELLIGENCE AND DATA SCIENCE in CCS334 - BIG DATA ANALYTICS LABORATORY during the academic year 2023-2024.

Faculty-in-charge                                Head of the Department

Submitted for the University Practical Examination held on __/__/____

Internal Examiner                                External Examiner
DATE:                                            DATE:

CCS334 BIG DATA ANALYTICS LABORATORY

COURSE OUTCOMES
Describe big data and use cases from selected business domains.
Explain NoSQL big data management.
Install, configure, and run Hadoop and HDFS.
Perform map-reduce analytics using Hadoop.
Use Hadoop-related tools such as HBase, Cassandra, Pig, and Hive for big data analytics.

CONTENT

EX.NO   EXPERIMENT                                                              PAGE NO   SIGNATURE
1       Downloading and installing Hadoop; understanding different Hadoop modes; startup scripts; configuration files.
2       Hadoop implementation of file management tasks, such as adding files and directories, retrieving files and deleting files.
3       Implementation of matrix multiplication with Hadoop MapReduce.
4       Run a basic word count MapReduce program to understand the MapReduce paradigm.
5       Installation of Hive along with practice examples.
6       Installation of HBase along with practice examples; installing Thrift.
7       Practice importing and exporting data from various databases.

SYLLABUS
CCS334 BIG DATA ANALYTICS LABORATORY

COURSE OBJECTIVES:
To understand big data.
To learn and use NoSQL big data management.
To learn MapReduce analytics using Hadoop and related tools.
To work with MapReduce applications.
To understand the usage of Hadoop-related tools for big data analytics.

Tools: Cassandra, Hadoop, Java, Pig, Hive and HBase.

Suggested Exercises:
1. Downloading and installing Hadoop; understanding different Hadoop modes; startup scripts; configuration files.
2. Hadoop implementation of file management tasks, such as adding files and directories, retrieving files and deleting files.
3. Implementation of matrix multiplication with Hadoop MapReduce.
4. Run a basic word count MapReduce program to understand the MapReduce paradigm.
5. Installation of Hive along with practice examples.
6. Installation of HBase and installation of Thrift, along with practice examples.
7. Practice importing and exporting data from various databases.

EXP.NO:1    DOWNLOADING AND INSTALLING HADOOP; UNDERSTANDING DIFFERENT HADOOP MODES; STARTUP SCRIPTS; CONFIGURATION FILES.
DATE:

AIM:
To download and install Hadoop, understand the different Hadoop modes, and examine the startup scripts and configuration files.

PREREQUISITES TO INSTALL HADOOP ON WINDOWS:
VIRTUAL BOX (for Linux): used for installing the operating system on it.
OPERATING SYSTEM: Hadoop can be installed on Windows or Linux based operating systems; Ubuntu and CentOS are very commonly used.
JAVA: You need to install the Java 8 package on your system.
HADOOP: You require the latest Hadoop version.

1. Install Java
Java JDK download link: https://www.oracle.com/java/technologies/javase-jdk8-downloads.html
Extract and install Java in C:\Java.
Open cmd and type javac -version to confirm that the JDK is installed and on the path.

2. Download Hadoop
Download https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz and extract it to C:\hadoop-3.3.0.

3. Set the JAVA_HOME environment variable.
4. Set the HADOOP_HOME environment variable.
Open This PC -> Properties -> Advanced system settings -> Environment Variables. (The screenshots in the source show this dialog.) Add JAVA_HOME pointing to the JDK bin directory (e.g., C:\Java\jdk1.8.0_241\bin), add HADOOP_HOME pointing to C:\hadoop-3.3.0\bin, and append both directories to the Path variable.
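The same variables can also be set from a Command Prompt instead of the dialog. A minimal sketch (the values follow the manual's screenshots; adjust the JDK folder name to the version actually installed):

   REM Values follow the manual's screenshots; adjust the JDK folder to your install.
   setx JAVA_HOME "C:\Java\jdk1.8.0_241\bin"
   setx HADOOP_HOME "C:\hadoop-3.3.0\bin"
   REM PATH is usually safer to edit in the GUI dialog, since setx truncates long values.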
5. Configurations
Edit the file C:\hadoop-3.3.0\etc\hadoop\core-site.xml, paste the following inside the <configuration> element and save the file:
   <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
   </property>
Rename "mapred-site.xml.template" to "mapred-site.xml", edit C:\hadoop-3.3.0\etc\hadoop\mapred-site.xml, paste the following and save the file:
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
Create the folder "data" under C:\hadoop-3.3.0.
Create the folder "datanode" under C:\hadoop-3.3.0\data.
Create the folder "namenode" under C:\hadoop-3.3.0\data.
Edit the file C:\hadoop-3.3.0\etc\hadoop\hdfs-site.xml, paste the following and save the file:
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.namenode.name.dir</name>
      <value>/hadoop-3.3.0/data/namenode</value>
   </property>
   <property>
      <name>dfs.datanode.data.dir</name>
      <value>/hadoop-3.3.0/data/datanode</value>
   </property>
Edit the file C:\hadoop-3.3.0\etc\hadoop\yarn-site.xml, paste the following and save the file:
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
   <property>
      <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>
Edit the file C:\hadoop-3.3.0\etc\hadoop\hadoop-env.cmd and replace the line set JAVA_HOME=%JAVA_HOME% with the actual JDK path, e.g., set JAVA_HOME=C:\Java.

6. Hadoop Configurations
Download https://github.com/brainmentorspvtltd/BigData_RDE/blob/master/Hadoop%20Configuration.zip or (for Hadoop 3) https://github.com/s911415/apache-hadoop-3.1.0-winutils, copy its bin folder and replace the existing bin folder in C:\hadoop-3.3.0\bin.

Format the NameNode:
Open cmd and type the command "hdfs namenode -format".
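Before starting the daemons, a quick sanity check from a new Command Prompt confirms that the environment variables and the Hadoop binaries resolve (a minimal sketch; both commands are standard Hadoop CLI entry points):

   echo %JAVA_HOME%
   echo %HADOOP_HOME%
   hadoop version
   hdfs version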
7. Testing
Open cmd, change directory to C:\hadoop-3.3.0\sbin and type start-all.cmd.
Alternatively, start the daemons separately:
   Start the NameNode and DataNode: type start-dfs.cmd
   Start YARN: type start-yarn.cmd
Make sure these processes are running:
   Hadoop NameNode
   Hadoop DataNode
   YARN Resource Manager
   YARN Node Manager
Open http://localhost:8088 to see the YARN "All Applications" page.
Open http://localhost:9870 to see the HDFS NameNode Overview page (it reports the namespace at hdfs://localhost:9000, the build branch 3.3.0, the cluster ID, the block pool ID and a summary).
Hadoop is installed successfully.

RESULT:
Thus Hadoop was downloaded and installed, and the different Hadoop modes, startup scripts and configuration files were studied successfully.

EXP.NO:2    HADOOP IMPLEMENTATION OF FILE MANAGEMENT TASKS, SUCH AS ADDING FILES AND DIRECTORIES, RETRIEVING FILES AND DELETING FILES.
DATE:

AIM:
To implement the following file management tasks in Hadoop:
1. Adding files and directories
2. Retrieving files
3. Deleting files

1. Create a directory in HDFS at a given path.
Usage: hadoop fs -mkdir <path>
Example: hadoop fs -mkdir /user/saurzcode/dir1 /user/saurzcode/dir2

2. List the contents of a directory.
Usage: hadoop fs -ls <path>
Example: hadoop fs -ls /user/saurzcode

3. Upload and download a file in HDFS.
Upload: hadoop fs -put copies a single src file, or multiple src files, from the local file system to the Hadoop file system.
Usage: hadoop fs -put <localsrc> ... <dst>
Example: hadoop fs -put /home/saurzcode/Samplefile.txt /user/saurzcode/dir3/
Download: hadoop fs -get copies/downloads files to the local file system.
Usage: hadoop fs -get <src> <localdst>
Example: hadoop fs -get /user/saurzcode/dir3/Samplefile.txt /home/

4. See the contents of a file (same as the Unix cat command).
Usage: hadoop fs -cat <path[filename]>
Example: hadoop fs -cat /user/saurzcode/dir1/abc.txt

5. Copy a file from source to destination. This command allows multiple sources as well, in which case the destination must be a directory.
Usage: hadoop fs -cp <source> <dest>
Example: hadoop fs -cp /user/saurzcode/dir1/abc.txt /user/saurzcode/dir2

6. Copy a file from/to the local file system and HDFS.
copyFromLocal
Usage: hadoop fs -copyFromLocal <localsrc> URI
Example: hadoop fs -copyFromLocal /home/saurzcode/abc.txt /user/saurzcode/abc.txt
Similar to the put command, except that the source is restricted to a local file reference.
copyToLocal
Usage: hadoop fs -copyToLocal [-ignorecrc] [-crc] URI <localdst>
Similar to the get command, except that the destination is restricted to a local file reference.

7. Move a file from source to destination. Note: moving files across file systems is not permitted.
Usage: hadoop fs -mv <src> <dest>
Example: hadoop fs -mv /user/saurzcode/dir1/abc.txt /user/saurzcode/dir2

8. Remove a file or directory in HDFS. Removes the files specified as arguments; deletes a directory only when it is empty.
Usage: hadoop fs -rm <arg>
Example: hadoop fs -rm /user/saurzcode/dir1/abc.txt
Recursive version of delete:
Usage: hadoop fs -rmr <arg>
Example: hadoop fs -rmr /user/saurzcode/

9. Display the last few lines of a file (similar to the Unix tail command).
Usage: hadoop fs -tail <path[filename]>
Example: hadoop fs -tail /user/saurzcode/dir1/abc.txt

10. Display the aggregate length of a file.
Usage: hadoop fs -du <path>
Example: hadoop fs -du /user/saurzcode/dir1/abc.txt
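The commands above can be chained into a short end-to-end session; a minimal sketch (the directory and file names are illustrative, not part of the original manual):

   hadoop fs -mkdir /user/saurzcode/demo
   hadoop fs -put /home/saurzcode/Samplefile.txt /user/saurzcode/demo/
   hadoop fs -ls /user/saurzcode/demo
   hadoop fs -cat /user/saurzcode/demo/Samplefile.txt
   hadoop fs -get /user/saurzcode/demo/Samplefile.txt /home/saurzcode/copy.txt
   hadoop fs -rm /user/saurzcode/demo/Samplefile.txt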
RESULT:
Thus the Hadoop implementation of file management tasks, such as adding files and directories, retrieving files and deleting files, was executed successfully.

EXP.NO:3    IMPLEMENTATION OF MATRIX MULTIPLICATION WITH HADOOP MAPREDUCE
DATE:

AIM:
To write a MapReduce program that implements matrix multiplication.

ALGORITHM:
We assume that the input matrices are already stored in the Hadoop Distributed File System (HDFS) in a suitable format (e.g., CSV or TSV) where each row represents a matrix element, and that the matrices are compatible for multiplication (the number of columns in the first matrix equals the number of rows in the second matrix).

STEP 1: MAPPER
The mapper reads the input matrices and emits key-value pairs for each element of the result matrix. The key is the (row, column) index of the result element, and the value is a partial product contributing to that element.

STEP 2: REDUCER
The reducer takes the key-value pairs emitted by the mapper and calculates the sum of the partial products for each element of the result matrix.

STEP 3: MAIN DRIVER
The main driver class sets up the Hadoop job configuration and specifies the input and output paths for the matrices.

STEP 4: RUNNING THE JOB
To run the MapReduce job, package the classes into a JAR file and submit it to Hadoop using the hadoop jar command. Replace input_path and output_path with the actual HDFS paths to the input matrices and the desired output directory.

PROGRAM:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.fs.Path;

public class MatrixMultiplicationReducer extends Reducer<Text, Text, Text, IntWritable> {
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int result = 0;
        for (Text value : values) {
            // Accumulate the partial sum for the result element
            result += Integer.parseInt(value.toString());
        }
        // Emit the final result for the result element
        context.write(key, new IntWritable(result));
    }
}

public class MatrixMultiplicationDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Matrix Multiplication");
        job.setJarByClass(MatrixMultiplicationDriver.class);
        job.setMapperClass(MatrixMultiplicationMapper.class);
        job.setReducerClass(MatrixMultiplicationReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
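The mapper class referenced by the driver (MatrixMultiplicationMapper) does not survive legibly in the source. Below is a minimal sketch of one way to write it that stays consistent with the reducer above: it assumes each line of matrix A has the form i,k,value, that all of matrix B (lines of the form k,j,value) fits in memory and can be read from a local side file whose path is stored under the hypothetical configuration key "matrixB.path", and it emits the partial products A[i][k]*B[k][j] keyed by the result cell "i,j". This is an illustrative formulation, not the original manual's code.

// Sketch only: assumes A lines look like "i,k,value" and B is side-loaded from
// the (hypothetical) configuration entry "matrixB.path" pointing at a local file.
public class MatrixMultiplicationMapper extends Mapper<LongWritable, Text, Text, Text> {

    // B[k][j] held in memory, keyed by "k,j" (only workable when B is small).
    private final java.util.Map<String, Integer> matrixB = new java.util.HashMap<>();

    @Override
    protected void setup(Context context) throws IOException {
        String path = context.getConfiguration().get("matrixB.path");
        try (java.io.BufferedReader reader =
                 new java.io.BufferedReader(new java.io.FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",");          // k, j, value
                matrixB.put(parts[0] + "," + parts[1], Integer.parseInt(parts[2]));
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] parts = value.toString().split(",");      // i, k, value of A[i][k]
        String i = parts[0];
        String k = parts[1];
        int a = Integer.parseInt(parts[2]);
        // Emit one partial product A[i][k] * B[k][j] for every stored element B[k][j].
        for (java.util.Map.Entry<String, Integer> e : matrixB.entrySet()) {
            String[] kj = e.getKey().split(",");
            if (kj[0].equals(k)) {
                context.write(new Text(i + "," + kj[1]),
                              new Text(String.valueOf(a * e.getValue())));
            }
        }
    }
}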
Run the program:
hadoop jar matrixmultiplication.jar MatrixMultiplicationDriver input_path output_path

OUTPUT:
(The output directory contains a part-00000 file listing the result-matrix elements, one "row,column,value" entry per line.)

RESULT:
Thus the MapReduce program that implements matrix multiplication was executed and verified successfully.

EXP.NO:4    RUN A BASIC WORD COUNT MAPREDUCE PROGRAM TO UNDERSTAND THE MAPREDUCE PARADIGM
DATE:

AIM:
To write a basic word count program to understand the MapReduce paradigm.

ALGORITHM:
The entire MapReduce program can be fundamentally divided into three parts:
   Mapper Phase Code
   Reducer Phase Code
   Driver Code

STEP 1: MAPPER CODE:
We create a class Map that extends the class Mapper, which is already defined in the MapReduce framework. We define the data types of the input and output key/value pairs after the class declaration using angle brackets. Both the input and the output of the mapper are key/value pairs.
Input:
   The key is the offset of each line in the text file: LongWritable.
   The value is each individual line: Text.
Output:
   The key is each tokenized word: Text.
   The value is the hardcoded count 1: IntWritable.
   Example: Dear 1, Bear 1, ...
We write Java code in which each word is tokenized and assigned the hardcoded value 1.

STEP 2: REDUCER CODE:
We create a class Reduce which extends the class Reducer, like the Mapper. We define the data types of the input and output key/value pairs after the class declaration using angle brackets, as done for the mapper. Both the input and the output of the reducer are key/value pairs.
Input:
   The key is one of the unique words generated after the sorting and shuffling phase: Text.
   The value is a list of integers corresponding to that key: IntWritable.
   Example: Bear, [1, 1], ...
Output:
   The key is each of the unique words present in the input text file: Text.
   The value is the number of occurrences of that word: IntWritable.
   Example: Bear, 2; Car, 3; ...
We aggregate the values in the list corresponding to each key and produce the final answer. In general, a single reduce call handles each unique word, and the number of reducers can be specified in mapred-site.xml.

STEP 3: DRIVER CODE:
In the driver class, we set the configuration of our MapReduce job to run in Hadoop. We specify the name of the job and the data types of the input/output of the mapper and reducer. We also specify the names of the mapper and reducer classes, and the paths of the input and output folders. The method setInputFormatClass() specifies how the mapper reads the input data, i.e., what the unit of work is; here we choose TextInputFormat so that the mapper reads a single line at a time from the input text file. The main() method is the entry point for the driver: in it we instantiate a new Configuration object for the job.
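To make the three phases concrete, here is a small worked trace (the input line is invented for illustration):

   Input line   : "Bear Car Bear River"
   Map output   : (Bear,1) (Car,1) (Bear,1) (River,1)
   Shuffle/sort : Bear -> [1,1]   Car -> [1]   River -> [1]
   Reduce output: (Bear,2) (Car,1) (River,1)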
PROGRAM:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.fs.Path;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                value.set(tokenizer.nextToken());
                context.write(value, new IntWritable(1));
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable x : values) {
                sum += x.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "My Word Count Program");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path outputPath = new Path(args[1]);
        // Configuring the input/output path from the filesystem into the job
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Deleting the output path automatically from HDFS so that we don't have to delete it explicitly
        outputPath.getFileSystem(conf).delete(outputPath);
        // Exiting the job only if the flag value becomes false
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Run the MapReduce code. The command for running a MapReduce job is:
hadoop jar hadoop-mapreduce-example.jar WordCount /sample/input /sample/output

OUTPUT:
(The console log shows the job progressing from map 0% reduce 0% to map 100% reduce 100% and completing successfully; the part-r-00000 output file lists each word with its count, e.g., ADRIAN, 2.)

RESULT:
Thus the MapReduce program that implements word count was executed and verified successfully.

EXP.NO:5    INSTALLATION OF HIVE ALONG WITH PRACTICE EXAMPLES
DATE:

AIM:
To install Hive along with practice examples.

PREREQUISITES:
   Java Development Kit (JDK) installed and the JAVA_HOME environment variable set.
   Hadoop installed and configured on your Windows system.

STEP-BY-STEP INSTALLATION:
1. Download Hive: Visit the Apache Hive website and download the latest stable version of Hive. Official Apache Hive website: https://hive.apache.org/
2. Extract the downloaded Hive archive to a directory on your Windows machine, e.g., C:\hive.
3. Configure Hive:
   Open the Hive configuration file (hive-site.xml) located in the conf folder of the extracted Hive directory. Set the necessary configurations, such as the Hive Metastore connection settings and the Hadoop configurations, and make sure to adjust paths accordingly for Windows. Here is an example of one such configuration:
   <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:derby:;databaseName=/path/to/metastore_db;create=true</value>
      <description>JDBC connect string for a JDBC metastore.</description>
   </property>
4. Environment variables setup:
   Add the Hive binary directory (C:\hive\bin in this example) to your PATH environment variable.
   Set the HIVE_HOME environment variable to point to the Hive installation directory (C:\hive in this example).
5. Start the Hive Metastore service: to initialise and start the Hive Metastore, you can use the schematool script.
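The source shows the schematool invocation only as a screenshot. A typical invocation for the embedded Derby metastore configured in hive-site.xml above is (a sketch, assuming Derby as the metastore database):

   schematool -dbType derby -initSchema
   hive --service metastore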
6. Start Hive: open a command prompt or terminal, navigate to the Hive installation directory, and execute the hive command to start the Hive shell.

EXAMPLES:
1. Create a database. To create a new database in Hive, use the following syntax:
   CREATE DATABASE database_name;
   Example: CREATE DATABASE mydatabase;
2. Use a database. To use a specific database in Hive, use the following syntax:
   USE database_name;
   Example: USE mydatabase;
3. Show databases. To display a list of available databases in Hive, use the following syntax:
   SHOW DATABASES;
4. Create a table. To create a table in Hive, use the following syntax:
   CREATE TABLE table_name (
      column1 datatype,
      column2 datatype,
      ...
   );
   Example: CREATE TABLE mytable (id INT, name STRING, age INT);
5. Show tables. To display a list of tables in the current database, use the following syntax:
   SHOW TABLES;
6. Describe a table. To view the schema and details of a specific table, use the following syntax:
   DESCRIBE table_name;
   Example: DESCRIBE mytable;
7. Insert data into a table. To insert data into a table in Hive, use the following syntax:
   INSERT INTO table_name (column1, column2, ...) VALUES (value1, value2, ...);
   Example: INSERT INTO mytable (id, name, age) VALUES (1, 'John Doe', 25);
8. Select data from a table.
   SELECT column1, column2, ... FROM table_name WHERE condition;
   Example: SELECT * FROM mytable WHERE age > 20;

RESULT:
Thus the installation of Hive was done successfully.

EXP.NO:6    INSTALLATION OF HBASE ALONG WITH PRACTICE EXAMPLES
DATE:

AIM:
To install HBase using a virtual machine and perform some operations in HBase.

ALGORITHM:
Step 1: Install a virtual machine
   Download and install virtual machine software such as VirtualBox (https://www.virtualbox.org/) or VMware (https://www.vmware.com/).
   Create a new virtual machine and install a Unix-based operating system like Ubuntu or CentOS. You can download the ISO image of your desired Linux distribution from its official website.
Step 2: Set up the virtual machine
   Launch the virtual machine and install the Unix-based operating system following the installation wizard.
   Make sure the virtual machine has network connectivity to download software packages.
Step 3: Install Java
   Open the terminal or command line in the virtual machine.
   Update the package list: sudo apt update
   Install OpenJDK (Java Development Kit): sudo apt install default-jdk
   Verify the Java installation: java -version
Step 4: Download and install HBase
   In the virtual machine, navigate to the directory where you want to install HBase.
   Download the HBase binary distribution from the Apache HBase website (https://hbase.apache.org/). Look for the latest stable version.
   Extract the downloaded archive: tar -xvf <hbase-archive>.tar.gz (replace <hbase-archive> with the actual name of the downloaded HBase archive file).
   Move the extracted HBase directory to the desired location: sudo mv <extracted-hbase-directory> /opt/hbase (replace <extracted-hbase-directory> with the actual name of the extracted HBase directory).
Step 5: Configure HBase
   Open the HBase configuration file for editing: sudo nano /opt/hbase/conf/hbase-site.xml
   Add the following properties to the configuration file:
   <property>
      <name>hbase.rootdir</name>
      <value>file:///var/lib/hbase</value>
   </property>
   <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/var/lib/zookeeper</value>
   </property>
   Save the file and exit the text editor.
Step 6: Start HBase
   Start the HBase server: sudo /opt/hbase/bin/start-hbase.sh
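A quick way to confirm the server came up (a minimal check, not part of the original manual; assumes a standalone install under /opt/hbase):

   jps                          # should list an HMaster process for standalone HBase
   /opt/hbase/bin/hbase shell   # then, inside the shell:
   status                       # reports the number of servers and the cluster state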
HBASE PRACTICE EXAMPLES:
Step 1: Start HBase
   Make sure HBase is installed and running on your system.
Step 2: Open the HBase shell
   Open a command prompt or terminal window, navigate to the directory where HBase is installed, and run the following command to start the HBase shell:
   >> hbase shell
Step 3: Create a table
   In the HBase shell, you can create a table with column families. For example, to create a table named "my_table" with a column family called "cf":
   >> create 'my_table', 'cf'
Step 4: Insert data
   To insert data into the table, use the put command. Here is an example of inserting a row with a specific row key and values:
   >> put 'my_table', 'row1', 'cf:column1', 'value1'
   >> put 'my_table', 'row1', 'cf:column2', 'value2'
Step 5: Get data
   You can retrieve data from the table using the get command. For example, to get the values of a specific row:
   >> get 'my_table', 'row1'
   This displays all the column family values for the specified row.
Step 6: Scan data
   To scan and retrieve multiple rows or the entire table, use the scan command. For instance, to scan all rows in the table:
   >> scan 'my_table'
   This displays all rows and their corresponding column family values.
Step 7: Delete data
   To delete a particular cell value, use the delete command with the column name; to delete an entire row, use deleteall. For example, to delete a specific row:
   >> deleteall 'my_table', 'row1'
Step 8: Disable and drop the table
   If you want to remove the table entirely, you need to disable and then drop it. Use the following commands:
   >> disable 'my_table'
   >> drop 'my_table'

RESULT:
Thus the installation of HBase using a virtual machine was done successfully.

EXP.NO:7    INSTALLATION OF THRIFT
DATE:

AIM:
To install Apache Thrift on Windows OS.

ALGORITHM:
Step 1: Download Apache Thrift:
   Visit the Apache Thrift website: https://thrift.apache.org/
   Go to the "Downloads" section and find the latest version of Thrift.
   Download the Windows binary distribution (ZIP file) for the desired version.
Step 2: Extract the ZIP file:
   Locate the downloaded ZIP file and extract its contents to a directory of your choice.
   This directory will be referred to as <thrift-install-dir> in the following steps.
Step 3: Set up environment variables:
   Open the Start menu, search for "Environment Variables" and select "Edit the system environment variables."
   Click the "Environment Variables" button at the bottom right of the "System Properties" window.
   Under the "System variables" section, find the "Path" variable and click "Edit."
   Add the following entries to the "Variable value" field (replace <thrift-install-dir> with the actual directory path):
      <thrift-install-dir>\bin
      <thrift-install-dir>\lib
   Click "OK" to save the changes.
Step 4: Verify the installation:
   Open a new Command Prompt window.
   Run the following command to verify that Thrift is installed and accessible: thrift -version
   If everything is set up correctly, you should see the version number of Thrift printed on the screen.

RESULT:
Thus the installation of Thrift on Windows OS was done successfully.

EXP.NO:8    PRACTICE IMPORTING AND EXPORTING DATA FROM VARIOUS DATABASES
DATE:

AIM:
To import and export data from various databases using Sqoop.

ALGORITHM:
Step 1: Install Sqoop.
   First, you need to install Sqoop on your Hadoop cluster or machine.
   Download the latest version of Sqoop from the Apache Sqoop website (http://sqoop.apache.org/) and follow the installation instructions provided in the documentation.
Step 2: Importing data from a database:
   To import data from a database into Hadoop, use the following Sqoop command:
   sqoop import --connect jdbc:<db-type>://<host>:<port>/<database> \
      --username <username> --password <password> \
      --table <table-name> \
      --target-dir <hdfs-target-directory> \
      -m <number-of-mappers>
   Replace the placeholders with the appropriate values for your database and Hadoop environment.
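For illustration, a concrete sketch of the import command with hypothetical values (a local MySQL database named salesdb and a table named customers; none of these names come from the original manual):

   sqoop import --connect jdbc:mysql://localhost:3306/salesdb \
      --username root --password secret \
      --table customers \
      --target-dir /user/hadoop/customers \
      -m 1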
Step 3: Exporting data to a database:
   To export data from Hadoop to a database, use the following Sqoop command:
   sqoop export --connect jdbc:<db-type>://<host>:<port>/<database> \
      --username <username> --password <password> \
      --table <table-name> \
      --export-dir <hdfs-export-directory> \
      --input-fields-terminated-by '<delimiter>'
   Replace the placeholders with the appropriate values for your database and Hadoop environment.

RESULT:
Thus importing and exporting data from various databases using Sqoop was done successfully.
