DEPARTMENT OF
COMPUTER SCIENCE AND BUSINESS SYSTEMS
LABORATORY RECORD
NAME : …………………………………………………….
REG NO : …………………………………………………….
YEAR/SEM : …………………………………………………….
BONAFIDE CERTIFICATE
This is to certify that this is a bonafide record of the work done in the Big Data Analytics Laboratory.
Reg. No …………………………………………………
______________________________________________________________________________________
VISION:
To emerge as a Premier Institute for developing industry-ready engineers with the competency, initiative and character to meet the challenges of a global environment.
MISSION:
DEPARTMENT OF
COMPUTER SCIENCE AND BUSINESS SYSTEMS
VISION:
To produce industry-ready technologists with computer science and business systems knowledge and the human values to contribute globally to society at large.
MISSION:
PROGRAMME EDUCATIONAL OBJECTIVES (PEOs):
PEO1: To ensure graduates are proficient in applying the fundamental knowledge of basic sciences, mathematics, computer science and business systems to applications across various streams of engineering and technology.
PEO2: To enrich graduates with the core competencies needed to apply computer science and data analytics tools to store, retrieve, implement and analyse data in the context of a business enterprise.
PEO3: To enable graduates to gain employment in organizations and establish themselves as professionals by applying their technical skills and leadership qualities to solve real-world problems and meet the diversified needs of industry, academia and research.
PEO4: To equip graduates with the entrepreneurial skills and qualities that help them understand the functioning of a business, diagnose business problems, explore entrepreneurial opportunities and manage a business efficiently.
PROGRAMME SPECIFIC OUTCOMES (PSOs):
PSO1: To create, select, and apply appropriate techniques, resources, and modern engineering and business tools, including prediction and data analytics, to complex engineering activities and business solutions.
PSO2: To evolve computer science domain-specific methodologies for effective decision making in several critical problem domains of the real world.
PSO3: To be able to apply entrepreneurial skills and management tools for identifying, analyzing and creating business opportunities with smart business ideas.
CCS334 BIG DATA ANALYTICS LABORATORY L T P C
0 0 4 2
COURSE OBJECTIVES:
To understand big data and its use cases from selected business domains
To learn NoSQL big data management
To install, configure, and run Hadoop and HDFS
To perform MapReduce analytics using Hadoop
To use Hadoop-related tools such as HBase, Cassandra, Pig and Hive for big data analytics
LIST OF EXPERIMENTS:
TOTAL: 60 PERIODS
COURSE OUTCOMES:
After the completion of this course, students will be able to:
CO1: Describe big data and use cases from selected business domains.
CO2: Explain NoSQL big data management.
CO3: Install, configure, and run Hadoop and HDFS.
CO4: Perform MapReduce analytics using Hadoop.
CO5: Use Hadoop-related tools such as HBase, Cassandra, Pig, and Hive for big data analytics.
Branch and Year : CSBS/III Year Semester : V
Subject Code & Name : CCS334 & Big Data Analytics Laboratory
LIST OF EXPERIMENTS
Ex. No. | Date | List of Experiments | Page No. | Marks | Sign
S/W: Cassandra, Hadoop, Java, Pig, Hive and HBase. H/W: The machine must have at least 4 GB RAM and a minimum of 60 GB of hard disk space for better performance.
Exp.no: 1
Downloading and installing of Hadoop
Date: / / 20
Aim:
To download and install Hadoop and understand its three modes of operation (stand-alone, pseudo-distributed and fully distributed).
Procedure:
1. Installation of Hadoop:
Hadoop can be installed in three modes of operation:
• Stand-Alone Mode: Hadoop is distributed software designed to run on a cluster of commodity machines, but it can also be installed on a single node in stand-alone mode. In this mode, Hadoop runs as a single monolithic Java process. Stand-alone mode is extremely useful for debugging: you can first test-run a MapReduce application on small data in this mode before executing it on a cluster with big data.
• Pseudo-Distributed Mode: In this mode, Hadoop is again installed on a single node, but the various Hadoop daemons run on that machine as separate Java processes. Hence all the daemons, namely NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker, run on a single machine.
• Fully Distributed Mode: In fully distributed mode, the daemons NameNode, JobTracker and SecondaryNameNode (optional, and can be run on a separate node) run on the master node, while the DataNode and TaskTracker daemons run on the slave nodes.
Hadoop installation on the Ubuntu operating system in stand-alone mode:
#--insert JAVA_HOME
JAVA_HOME=/opt/jdk1.8.0_05
#--in PATH variable just append at the end of the line
PATH=$PATH:$JAVA_HOME/bin
#--Append JAVA_HOME at end of the export statement
export PATH JAVA_HOME
Generate the SSH keys, save the .bashrc file, and reload it:
> ssh-keygen
> source ~/.bashrc
> echo $HADOOP_PREFIX (to check the path)
> cd $HADOOP_PREFIX
> bin/hadoop version
Configure HDFS by placing the following property in etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
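As an optional check (a sketch, not part of the prescribed procedure), the following Python snippet reads core-site.xml and prints the configured fs.defaultFS value; it assumes the file sits at $HADOOP_PREFIX/etc/hadoop/core-site.xml.
#!/usr/bin/env python3
# Sketch: print the fs.defaultFS value configured in core-site.xml.
# Assumes HADOOP_PREFIX is set and the file lives under etc/hadoop/.
import os
import xml.etree.ElementTree as ET

conf_path = os.path.join(os.environ["HADOOP_PREFIX"], "etc", "hadoop", "core-site.xml")
root = ET.parse(conf_path).getroot()
for prop in root.findall("property"):
    if prop.findtext("name") == "fs.defaultFS":
        print("fs.defaultFS =", prop.findtext("value"))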
7. Starting Services
> sbin/start-dfs.sh
> sbin/start-yarn.sh
(or)
> sbin/start-all.sh
Verify that the daemons are running with
> jps
8. Stopping Services
> sbin/stop-dfs.sh
> sbin/stop-yarn.sh
(or)
> sbin/stop-all.sh
Result:
Thus, the downloading and installation of Hadoop in its three modes was completed successfully.
Exp.no: 2
File Management tasks in Hadoop
Date: / / 20
Aim:
To implement file management tasks in Hadoop, such as adding files and directories, retrieving files, and deleting files.
Procedure:
Pre-requisites:
o Java Installation - Check whether Java is installed using the following command:
java -version
o Hadoop Installation - Check whether Hadoop is installed using the following command:
hadoop version
1. Listing Files/Directories
The `-ls` command lists the files and directories in HDFS. The syntax is similar to the UNIX ls command.
hdfs dfs -ls /path
2. Creating Directories
You can create directories in HDFS using the `-mkdir` command.
hdfs dfs -mkdir /path/to/directory
3. Deleting Files/Directories
To remove a file or directory in HDFS, use the `-rm` command. To remove a directory, add the `-r` (recursive) option.
hdfs dfs -rm /path/to/file
hdfs dfs -rm -r /path/to/directory
4. Moving Files/Directories
The `-mv` command moves files or directories from one location to another within HDFS.
hdfs dfs -mv /source/path /destination/path
5. Copying Files/Directories
To copy files or directories within HDFS, use the `-cp` command.
hdfs dfs -cp /source/path /destination/path
9. File/Directory Permissions
The HDFS commands for file or directory permissions mirror the chmod, chown, and chgrp commands in UNIX.
hdfs dfs -chmod 755 /path/to/file
hdfs dfs -chown user:group /path/to/file
hdfs dfs -chgrp group /path/to/file
10. Checking Disk Usage
The `-du` command displays the size of a directory or file, and `-dus` displays a summary of the disk usage.
hdfs dfs -du /path/to/directory
hdfs dfs -dus /path/to/directory
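The same file management tasks can also be driven from a script. The following Python sketch (an illustration, not part of the prescribed procedure) shells out to the hdfs command-line client; the directory name /user/demo and the local file sample.txt are assumptions.
#!/usr/bin/env python3
# Sketch: basic HDFS file management via the hdfs CLI.
# Assumes the Hadoop daemons are running and `hdfs` is on the PATH.
import subprocess

def hdfs(*args):
    # Run an `hdfs dfs` sub-command and raise an error if it fails.
    subprocess.run(["hdfs", "dfs"] + list(args), check=True)

hdfs("-mkdir", "-p", "/user/demo")                        # create a directory
hdfs("-put", "sample.txt", "/user/demo/")                 # add a local file (assumed to exist)
hdfs("-ls", "/user/demo")                                 # list the directory
hdfs("-get", "/user/demo/sample.txt", "retrieved.txt")    # retrieve the file
hdfs("-rm", "/user/demo/sample.txt")                      # delete the file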
Contents Marks
Aim and Algorithm 30
Program and Execution 30
Output and Result 30
Viva 10
Total 100
Result:
Thus, the file management tasks, such as adding files and directories, retrieving files and deleting files, were executed successfully.
Exp.no: 3
Installation of Hive
Date: / / 20
Aim:
To install Hive along with practice examples.
Procedure:
1. Pre-requisites:
o Java Installation - Check whether Java is installed using the following command:
java -version
o Hadoop Installation - Check whether Hadoop is installed using the following command:
hadoop version
2. Starting all the services
> sbin/start-dfs.sh
> sbin/start-yarn.sh
(or)
> sbin/start-all.sh
Verify the running daemons with
> jps
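As an optional verification (a sketch, not part of the prescribed procedure), the Python snippet below parses the jps output and reports which of the expected Hadoop daemons are running; it assumes the JDK's jps tool is on the PATH and that the daemon names correspond to a YARN-based (Hadoop 2/3) setup.
#!/usr/bin/env python3
# Sketch: report which Hadoop daemons appear in the `jps` output.
import subprocess

expected = {"NameNode", "DataNode", "SecondaryNameNode",
            "ResourceManager", "NodeManager"}
output = subprocess.run(["jps"], capture_output=True, text=True).stdout
running = {line.split()[1] for line in output.splitlines() if len(line.split()) > 1}
for daemon in sorted(expected):
    print(daemon, "is running" if daemon in running else "is NOT running")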
Change the directory to hive/conf and open hive-site.xml in an editor:
$ gedit hive-site.xml
Go to line 3215, remove the stray special character shown there, then save the file and exit.
~/Downloads/apache-hive-3.1.2-bin$ cd ~/Downloads/apache-hive-3.1.2-bin
~/Downloads/apache-hive-3.1.2-bin$ rm -r metastore_db
Special step:
#Fix the guava incompatibility error in Hive. The guava version has to be the same as in Hadoop.
$ rm $HIVE_PREFIX/lib/guava-19.0.jar
$ cp $HADOOP_PREFIX/share/hadoop/hdfs/lib/guava-11.0.2.jar $HIVE_PREFIX/lib/
#Remember to use the schematool command once again to initialise the Derby database:
$ $HIVE_PREFIX/bin/schematool -dbType derby -initSchema
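The aim mentions practice examples, but the record does not list any. As a hedged illustration, the Python sketch below drives the hive CLI with hive -e to create a small table and query it; the database, table and row values are hypothetical, and it assumes the hive binary is on the PATH and the metastore has been initialised as above.
#!/usr/bin/env python3
# Sketch: a small Hive practice example run through `hive -e`.
import subprocess

def hive(query):
    # Execute a single HiveQL statement with the hive CLI.
    subprocess.run(["hive", "-e", query], check=True)

hive("CREATE DATABASE IF NOT EXISTS practice;")
hive("CREATE TABLE IF NOT EXISTS practice.students (id INT, name STRING) "
     "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';")
hive("INSERT INTO practice.students VALUES (1, 'Asha'), (2, 'Ravi');")
hive("SELECT * FROM practice.students;")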
Result:
Thus, the installation of Hive, along with the practice examples, was completed and executed successfully.
Exp.no: 4
Installation of HBase and Thrift along with practice examples
Date: / / 20
Aim:
To install HBase and Thrift along with practice examples.
Procedure:
1. Pre-requisites:
o Java Installation - Check whether Java is installed using the following command:
java -version
o Hadoop Installation - Check whether Hadoop is installed using the following command:
hadoop version
#--insert Hbase_PREFIX
Hbase_PREFIX=/home/vrsoopslab/Downloads/hb/hbase
#--in PATH variable just append at the end of the line
PATH=$PATH:$Hbase_PREFIX/bin
#--Append Hbase_PREFIX at end of the export statement
export PATH JAVA_HOME HADOOP_PREFIX HIVE_PREFIX Hbase_PREFIX
Open hbase-site.xml with gedit and place the following properties inside the file:
vrsoopslab@ubuntu:~/Downloads/hb/hbase/conf$ gedit hbase-site.xml (code as below)
<property>
<name>hbase.rootdir</name>
<value>file:///home/vrsoopslab/Downloads/hb/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/vrsoopslab/Downloads/hb/hbase/zookeeper</value>
</property>
Open the hosts file present in the /etc location and add the IP entries as shown below.
Step 7) Start Hadoop with the run command: sbin/start-all.sh
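The record does not reproduce the Thrift practice example itself. As a hedged sketch, the Python code below uses the happybase client (one common Thrift-based client, installable with pip install happybase) to create a table, insert a row and scan it; it assumes HBase is running with its Thrift server started (for example with: hbase thrift start) on the default port 9090, and the table and column family names are hypothetical.
#!/usr/bin/env python3
# Sketch: talk to HBase over Thrift using the happybase library.
import happybase

connection = happybase.Connection("localhost", port=9090)
if b"student" not in connection.tables():                   # hypothetical table name
    connection.create_table("student", {"info": dict()})    # one column family "info"

table = connection.table("student")
table.put(b"row1", {b"info:name": b"Asha", b"info:dept": b"CSBS"})
print(table.row(b"row1"))                                    # read the row back
for key, data in table.scan():                               # scan the whole table
    print(key, data)
connection.close()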
Contents Marks
Aim and Algorithm 30
Program and Execution 30
Output and Result 30
Viva 10
Total 100
Result:
Thus, the installation of HBase and Thrift, along with the examples, was completed and executed successfully.
Exp.no: 5
Run a basic word count MapReduce program
Date: / / 20
Aim:
To run a basic word count MapReduce program to understand the MapReduce paradigm.
Procedure:
Pre-requisite steps:
o Java Installation - Check whether Java is installed using the following command:
java -version
o Hadoop Installation - Check whether Hadoop is installed using the following command:
hadoop version
o To install Python on Ubuntu, follow these steps:
1. Update Package Lists: $ sudo apt update
2. Install Python: Ubuntu usually comes with Python pre-installed, but you can install
the latest version using the following command
$ sudo apt install python3
3. Check Installation: After installation, you can verify if Python is installed by typing
the following in the terminal:
$ python3 --version
4. Install pip (Python Package Manager): Pip is a package manager for Python that
allows you to easily install and manage Python libraries.
$ sudo apt install python3-pip
5. Check pip Installation: After installing pip, you can check if it's installed by running:
$ pip3 --version
Hadoop Streaming is a feature that comes with Hadoop and allows users or developers to use different languages, such as Python, C++ and Ruby, for writing MapReduce programs. It supports any language that can read from standard input and write to standard output. We will be using Python with Hadoop Streaming and will observe how it works. We will implement the word count problem in Python to understand Hadoop Streaming, creating mapper.py and reducer.py to perform the map and reduce tasks.
Let's create one file which contains multiple words that we can count.
Step 1: Create an input file with the name input-wc.txt and add some data to it.
ubuntu@ubuntu:~$ cd wordcount
ubuntu@ubuntu:~/wordcount$ ls -l
total 12
-rw-rw-r-- 1 ubuntu ubuntu 125 Aug 16 22:14 input-wc.txt
Step 2: Create a mapper.py file that implements the mapper logic. It will read the data from STDIN, split the lines into words, and generate an output of each word with its individual count.
ubuntu@ubuntu:~/wordcount$ nano mapper.py
ubuntu@ubuntu:~/wordcount$ cat mapper.py
Mapper.py code:
#!/usr/bin/env python
import sys
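# The record truncates the mapper listing here. The loop below is a minimal
# sketch of the usual streaming word-count mapper (an assumption, not the
# record's original code): it emits "word<TAB>1" for every word read from STDIN.
for line in sys.stdin:
    for word in line.strip().split():
        print('%s\t%s' % (word, 1))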
Step 3: Create a reducer.py file that implements the reducer logic. It will read the output of mapper.py from STDIN (standard input), aggregate the occurrences of each word, and write the final output to STDOUT.
ubuntu@ubuntu:~/wordcount$ nano reducer.py
ubuntu@ubuntu:~/wordcount$ cat reducer.py
Reducer.py code:
#!/usr/bin/env python
import sys
current_word = None
current_count = 0
word = None
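# The record truncates the reducer listing here. The loop below is a minimal
# sketch of the usual streaming word-count reducer (an assumption, not the
# record's original code); it expects the mapper output sorted by word, which
# Hadoop Streaming guarantees between the map and reduce phases.
for line in sys.stdin:
    word, count = line.strip().split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word is not None:
            print('%s\t%s' % (current_word, current_count))
        current_word = word
        current_count = count
if current_word is not None:
    print('%s\t%s' % (current_word, current_count))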
Now let's check whether reducer.py works correctly together with mapper.py, for example by piping the input file through mapper.py, sort and reducer.py locally.
We can see that our reducer is also working fine in our local system.
Step 4: Now let's start all our Hadoop daemons with the below command.
Create an input directory in HDFS and upload your input data file.
Prepare your Python scripts and package them into a JAR file for running with Hadoop Streaming:
ubuntu@ubuntu:~/wordcount$ cat mapper.py | gzip > mapper.py.gz
ubuntu@ubuntu:~/wordcount$ cat reducer.py | gzip > reducer.py.gz
ubuntu@ubuntu:~/wordcount$ jar cvf pythonWordCount.jar mapper.py.gz reducer.py.gz
added manifest
adding: mapper.py.gz(in = 136) (out= 139)(deflated -2%)
adding: reducer.py.gz(in = 531) (out= 536)(deflated 0%)
ubuntu@ubuntu:~/wordcount$
Step 5: Now download the latest hadoop-streaming jar file from this Link. Then place the hadoop-streaming jar file somewhere you can easily access it. In my case, I am placing it in the /wordcount folder where the mapper.py and reducer.py files are present.
Now let's run our Python files with the help of the Hadoop Streaming utility, as shown below.
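The record does not reproduce the exact streaming invocation. The Python sketch below builds and runs one plausible command via subprocess; the jar name hadoop-streaming-2.7.3.jar and the HDFS paths /wordcount/input-wc.txt and /wordcount/output are assumptions for illustration.
#!/usr/bin/env python3
# Sketch: launch the streaming word-count job from Python.
import subprocess

cmd = [
    "hadoop", "jar", "hadoop-streaming-2.7.3.jar",   # assumed jar name/location
    "-input", "/wordcount/input-wc.txt",             # assumed HDFS input path
    "-output", "/wordcount/output",                  # assumed HDFS output path
    "-mapper", "mapper.py",
    "-reducer", "reducer.py",
    "-file", "mapper.py",
    "-file", "reducer.py",
]
subprocess.run(cmd, check=True)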
In the streaming command, the -output option specifies the location in HDFS where we want the output to be stored, so let's check the output files at /wordcount/output/* in my case. We can check the results by manually visiting the location in HDFS or with the help of the cat command, as shown below.
To view the output:
ubuntu@ubuntu:~/wordcount$ hdfs dfs -cat /output/part-00000
To view in the browser:
localhost:50070
localhost:8088
Contents Marks
Aim and Algorithm 30
Program and Execution 30
Output and Result 30
Viva 10
Total 100
Result:
Thus, the basic word count MapReduce program demonstrating the MapReduce paradigm was executed successfully.
Exp.no: 6
Implementation of matrix multiplication with MapReduce
Date: / / 20
Aim:
To implement matrix multiplication with a Hadoop MapReduce program.
Procedure:
Step 1: Pre-requisites:
o Java Installation - Check whether Java is installed using the following command:
java -version
o Hadoop Installation - Check whether Hadoop is installed using the following command:
hadoop version
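The record does not reproduce the mapper.py and reducer.py used for the matrix-multiplication job, so the sketch below shows only one possible implementation. It assumes each input line has the form matrix,row,col,value (for example A,0,1,3.0) and that the dimensions of A (m x n) and B (n x p) are hard-coded; here m = n = p = 2 purely for illustration.
#!/usr/bin/env python
# mapper.py (sketch): emit partial products keyed by the output cell (i,k).
import sys

m, n, p = 2, 2, 2   # assumed dimensions, for illustration only

for line in sys.stdin:
    name, r, c, v = line.strip().split(',')
    if name == 'A':                 # A[i][j] contributes to every cell (i, k)
        i, j = int(r), int(c)
        for k in range(p):
            print('%d,%d\tA,%d,%s' % (i, k, j, v))
    else:                           # B[j][k] contributes to every cell (i, k)
        j, k = int(r), int(c)
        for i in range(m):
            print('%d,%d\tB,%d,%s' % (i, k, j, v))

#!/usr/bin/env python
# reducer.py (sketch): for each output cell, multiply the A and B values that
# share the same inner index j and sum the products.
import sys

def emit(cell, a_vals, b_vals):
    total = sum(a_vals[j] * b_vals[j] for j in a_vals if j in b_vals)
    print('%s\t%s' % (cell, total))

current, a_vals, b_vals = None, {}, {}
for line in sys.stdin:
    cell, payload = line.strip().split('\t')
    name, j, v = payload.split(',')
    if cell != current:
        if current is not None:
            emit(current, a_vals, b_vals)
        current, a_vals, b_vals = cell, {}, {}
    if name == 'A':
        a_vals[int(j)] = float(v)
    else:
        b_vals[int(j)] = float(v)
if current is not None:
    emit(current, a_vals, b_vals)
Piping a small sample file through mapper.py, sort and reducer.py locally is a quick way to check the logic before submitting the streaming job.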
After piping a small sample through mapper.py and reducer.py locally, we can see that our reducer is working fine on the local system.
Use the following command to change the execution permissions of mapper.py and reducer.py:
ubuntu@ubuntu:~/matrix$ chmod 777 mapper.py reducer.py
ubuntu@ubuntu:~/matrix$ ls -l
You can run the MapReduce job and view the result by the following steps (assuming you have already put the input files in HDFS).
Now download the latest hadoop-streaming jar file from this Link. Then place the hadoop-streaming jar file somewhere you can easily access it. In my case, I am placing it in the /matrix folder where the mapper.py and reducer.py files are present.
Now let's run our Python files with the help of the Hadoop Streaming utility, in the same way as in the previous experiment.
To view in the browser:
localhost:50070
localhost:8088
Contents Marks
Aim and Algorithm 30
Program and Execution 30
Output and Result 30
Viva 10
Total 100
Result:
Thus, the implementation of matrix multiplication with Hadoop MapReduce was executed successfully.
Exp.no: 7
Practice importing and exporting data from various databases
Date: / / 20
Aim:
To practice importing and exporting data from various databases.
Procedure:
1. Pre-requisites:
o Java Installation - Check whether Java is installed using the following command:
java -version
o Hadoop Installation - Check whether Hadoop is installed using the following command:
hadoop version
In order to store data into HDFS, we make use of Apache Hive, which provides an SQL-like interface between the user and the Hadoop Distributed File System (HDFS). We perform the following steps:
Step 1: Log in to MySQL.
mysql -u root -pcloudera
Step 2: Create a database and table and insert data.
create database geeksforgeeeks;
create table geeksforgeeeks.geeksforgeeks(author_name varchar(65), total_no_of_articles int, phone_no int, address varchar(65));
insert into geeksforgeeks values("Rohan",10,123456789,"Lucknow");
Database Name: geeksforgeeeks and Table Name: geeksforgeeks
Step 3: Create a database and table in Hive into which the data should be imported.
create table geeks_hive_table(name string, total_articles int, phone_no int, address string) row format delimited fields terminated by ',';
Hive Database: geeks_hive and Hive Table: geeks_hive_table
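The record stops short of showing the actual transfer command. A common tool for this exercise is Apache Sqoop (not named in the record, so treating it as the intended tool is an assumption); the Python sketch below shells out to one plausible sqoop import invocation that copies the MySQL table into the Hive table created above. The credentials, hostname and single-mapper setting are illustrative.
#!/usr/bin/env python3
# Sketch: import the MySQL table into Hive with Apache Sqoop (assumed tool).
# Assumes sqoop is installed, on the PATH, and can reach the local MySQL server.
import subprocess

cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://localhost/geeksforgeeeks",
    "--username", "root", "--password", "cloudera",    # illustrative credentials
    "--table", "geeksforgeeks",
    "--hive-import", "--hive-table", "geeks_hive_table",
    "--fields-terminated-by", ",",
    "-m", "1",                                          # one mapper for a tiny table
]
subprocess.run(cmd, check=True)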
To export data into MySQL from HDFS, perform the following steps:
Step 1: Create a database and table in Hive.
create table hive_table_export(name string, company string, phone int, age int) row format delimited fields terminated by ',';
Step 2: Insert data into the Hive table.
insert into hive_table_export values("Ritik","Amazon",234567891,35);
Contents Marks
Aim and Algorithm 30
Program and Execution 30
Output and Result 30
Viva 10
Total 100
Result:
Thus, the practice of importing and exporting data from various databases was completed successfully.
CONTENT BEYOND SYLLABUS
Exp.no: 8
Installation of Pig
Date: / / 20
Aim:
To install Apache Pig on top of an existing Hadoop setup.
Procedure:
1. Pre-requisites:
o Java Installation - Check whether Java is installed using the following command:
java -version
o Hadoop Installation - Check whether Hadoop is installed using the following command:
hadoop version
In order to install Apache Pig, you must have Hadoop and Java installed on your system.
Step 1: Download the new release of Apache Pig from this Link. In my case, I have downloaded the pig-0.17.0.tar.gz version of Pig, which is the latest and about 220 MB in size.
Step 2: Now move the downloaded Pig tar file to your desired location. In my case I am Moving it to
my /Documents folder.
Step 3: Now we extract this tar file with the help of below command (make sure to check your tar
filename):
tar -xvf pig-0.17.0.tar.gz
Step 4: Once it is extracted, it's time to switch to our Hadoop user; in my case it is hadoopusr. If you have not created a separate dedicated user for Hadoop, there is no need to move the folder; just set the Pig path accordingly in the .bashrc file. To switch user you can use the command below, or you can switch manually through the user settings.
su - hadoopusr
Step 5: Now move the extracted folder to /usr/local. For that, use the below command (make sure the name of your extracted folder is pig-0.17.0; otherwise change it accordingly).
sudo mv pig-0.17.0 /usr/local/
Step 6: Once it is moved, we need to set the environment variable for Pig's location. For that, open the .bashrc file with the below command.
sudo gedit ~/.bashrc
Once the file is open, add the below path to the .bashrc file and save it.
export PATH=$PATH:/usr/local/pig-0.17.0/bin
Step 7: Then check whether you have configured it correctly by reloading the file with the below command:
source ~/.bashrc
Step 8: With that, Pig has been successfully installed on our Hadoop single-node setup; now start Pig with the below command.
pig
Step 9: You can check your Pig version with the below command.
pig --version
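The record does not include a practice example after the installation. As a hedged illustration, the Python sketch below writes a tiny Pig Latin script and runs it with Pig in local mode; the file names and the relation name are hypothetical, and it assumes pig is on the PATH.
#!/usr/bin/env python3
# Sketch: run a minimal Pig Latin script in local mode.
import subprocess

with open("sample.txt", "w") as f:     # tiny input file for the demo
    f.write("hadoop\npig\nhive\n")

script = (
    "lines = LOAD 'sample.txt' AS (line:chararray);\n"
    "DUMP lines;\n"
)
with open("demo.pig", "w") as f:
    f.write(script)

subprocess.run(["pig", "-x", "local", "demo.pig"], check=True)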
Contents Marks
Aim and Algorithm 30
Program and Execution 30
Output and Result 30
Viva 10
Total 100
Result:
Thus, the installation of Apache Pig was completed successfully.