HADOOP and PYTHON For BEGINNERS - 2 BOOKS in 1 - Learn Coding Fast! HADOOP and PYTHON Crash Course, A QuickStart Guide, Tutorial Book by Program Examples, in Easy Steps!

The document provides an overview of Hadoop and Python for beginners. It covers topics such as what is Hadoop, its modules including HDFS, YARN, MapReduce, HBase and Hive. It also discusses Python basics like installing Python, strings, tuples, dictionaries, operators, functions, if-else statements and loops. The document is intended to teach coding to beginners using Hadoop and Python.

HADOOP AND PYTHON

FOR
BEGINNERS

LEARN TO CODE FAST


BY
TAM SEL
HADOOP

What is Big Data


What is Hadoop
Hadoop Installation
HADOOP MODULES
What is HDFS
HDFS Features and Goals
What is YARN
HADOOP MapReduce
Data Flow In MapReduce
MapReduce API
MapReduce Word Count Example
MapReduce Char Count Example
HBase
What is HBase
HBase Read
HBase Write
HBase MemStore
HBase Installation
RDBMS vs HBase
HBase Commands
HBase Example
Hive Tutorial
What is HIVE
Hive Architecture
Apache Hive Installation
HIVE Data Types
Hive - Create Database
Hive - Drop Database
Hive - Create Table
Hive - Load Data
Partitioning in Hive
Dynamic Partitioning
Hadoop Interview Questions
Python for Beginners
CHAPTER 1
Installing Python
CHAPTER 2
Python String
CHAPTER 3
Python Tuple
CHAPTER 4
Python Dictionary
CHAPTER 5
Python Operators
CHAPTER 6
Python Functions
CHAPTER 7
Python If-else statements
CHAPTER 8
Python Loops
Interview Questions and Answers
HADOOP
FOR
BEGINNERS

LEARN TO CODE FAST


BY
TAM SEL
HADOOP
Hadoop tutorial provides basic and advanced concepts of Hadoop. Our Hadoop tutorial is designed for beginners and professionals.
Hadoop is an open source framework. It is provided by Apache to process and analyze very large volumes of data. It is written in Java
and is used by companies such as Facebook, LinkedIn, Yahoo and Twitter.
Our Hadoop tutorial includes all topics of Big Data Hadoop with HDFS, MapReduce, YARN, Hive, HBase, Pig, Sqoop etc.

What is Big Data


Data that is very large in size is called Big Data. Normally we work on data of size MB (Word documents, Excel files) or at most GB (movies,
code), but data in petabytes, i.e. 10^15 bytes, is called Big Data. It is stated that almost 90% of today's data has been generated in
the past 3 years.

Sources of Big Data


This data comes from many sources:
Social networking sites: Facebook, Google and LinkedIn generate huge amounts of data on a day-to-day
basis, as they have billions of users worldwide.
E-commerce sites: Sites like Amazon, Flipkart and Alibaba generate huge amounts of logs from which users' buying trends
can be traced.
Weather stations: Weather stations and satellites produce very large volumes of data, which are stored and processed to
forecast the weather.
Telecom companies: Telecom giants like Airtel and Vodafone study user trends and publish their plans accordingly, and
for this they store the data of millions of users.
Share market: Stock exchanges across the world generate huge amounts of data through their daily transactions.

3V's of Big Data


1. Velocity: The data is increasing at a very fast rate. It is estimated that the volume of data will double every 2 years.
2. Variety: Nowadays data is not stored only in rows and columns. Data is structured as well as unstructured. Log files and CCTV
footage are unstructured data; data that can be saved in tables is structured data, like the transaction data of a bank.
3. Volume: The amount of data we deal with is of very large size, on the order of petabytes.

Use case
An e-commerce site XYZ (having 100 million users) wants to offer a gift voucher of $100 to its top 10 customers who have spent the
most in the previous year. Moreover, it wants to find the buying trends of these customers so that the company can suggest more
items related to them.

Issues
A huge amount of unstructured data needs to be stored, processed and analyzed.

Solution
Storage: To store this huge amount of data, Hadoop uses HDFS (Hadoop Distributed File System), which uses commodity hardware to form
clusters and store data in a distributed fashion. It works on the write once, read many times principle.
Processing: The MapReduce paradigm is applied to the data distributed over the network to find the required output.
Analyze: Pig and Hive can be used to analyze the data.
Cost: Hadoop is open source, so cost is no longer an issue.
What is Hadoop
Hadoop is an open source framework from Apache and is used to store, process and analyze data which is very huge in volume.
Hadoop is written in Java and is not OLAP (online analytical processing). It is used for batch/offline processing. It is used by
Facebook, Yahoo, Twitter, LinkedIn and many more. Moreover, it can be scaled up just by adding nodes to the cluster.

Modules of Hadoop
1. HDFS: Hadoop Distributed File System. Google published its GFS paper, and HDFS was developed on the basis of it.
It states that files will be broken into blocks and stored in nodes over the distributed architecture.
2. YARN: Yet Another Resource Negotiator is used for job scheduling and managing the cluster.
3. MapReduce: This is a framework which helps Java programs perform parallel computation on data using key-value
pairs. The Map task takes input data and converts it into a data set which can be computed as key-value pairs. The output
of the Map task is consumed by the Reduce task, and the output of the reducer gives the desired result.
4. Hadoop Common: These Java libraries are used to start Hadoop and are used by the other Hadoop modules.

Hadoop Architecture
The Hadoop architecture is a package of the file system, the MapReduce engine and HDFS (Hadoop Distributed File System). The
MapReduce engine can be MapReduce/MR1 or YARN/MR2.
A Hadoop cluster consists of a single master and multiple slave nodes. The master node runs the Job Tracker and
NameNode, whereas the slave nodes run the Task Tracker and DataNode.

Hadoop Distributed File System


The Hadoop Distributed File System (HDFS) is a distributed file system for Hadoop. It has a master/slave architecture. This
architecture consists of a single NameNode that performs the role of master, and multiple DataNodes that perform the role of slaves.
Both the NameNode and DataNodes are capable of running on commodity machines. The Java language is used to develop HDFS, so
any machine that supports the Java language can easily run the NameNode and DataNode software.

NameNode
It is the single master server that exists in the HDFS cluster.
As it is a single node, it may become a single point of failure.
It manages the file system namespace by executing operations such as opening, renaming and closing files.
It simplifies the architecture of the system.

DataNode
The HDFS cluster contains multiple DataNodes.
Each DataNode contains multiple data blocks.
These data blocks are used to store data.
It is the responsibility of the DataNode to serve read and write requests from the file system's clients.
It performs block creation, deletion, and replication upon instruction from the NameNode.

Job Tracker
The role of the Job Tracker is to accept MapReduce jobs from clients and process the data by using the NameNode.
In response, the NameNode provides metadata to the Job Tracker.

Task Tracker
It works as a slave node for the Job Tracker.
It receives tasks and code from the Job Tracker and applies that code to the file. This process can also be called a Mapper.

MapReduce Layer
MapReduce processing begins when the client application submits the MapReduce job to the Job Tracker. In response, the Job
Tracker sends the request to the appropriate Task Trackers. Sometimes, a TaskTracker fails or times out. In such a case, that part of the
job is rescheduled.

Advantages of Hadoop
Fast: In HDFS the data is distributed over the cluster and mapped, which helps in faster retrieval. Even the tools that
process the data are often on the same servers, thus reducing processing time. Hadoop is able to process terabytes of data in
minutes and petabytes in hours.
Scalable: A Hadoop cluster can be extended just by adding nodes to the cluster.
Cost effective: Hadoop is open source and uses commodity hardware to store data, so it is really cost effective compared
to a traditional relational database management system.
Resilient to failure: HDFS can replicate data over the network, so if one node goes down or
some other network failure happens, Hadoop takes another copy of the data and uses it. Normally, data is replicated
three times, but the replication factor is configurable.

History of Hadoop
Hadoop was started by Doug Cutting and Mike Cafarella in 2002. Its origin was the Google File System paper published by
Google.

Let's focus on the history of Hadoop in the following steps:


In 2002, Doug Cutting and Mike Cafarella started to work on a project, Apache Nutch. It is an open source web crawler
software project.
While working on Apache Nutch, they were dealing with big data. Storing that data was becoming very costly for the
project, and this problem became one of the important reasons for the emergence of Hadoop.
In 2003, Google introduced a file system known as GFS (Google File System). It is a proprietary distributed file system
developed to provide efficient access to data.
In 2004, Google released a white paper on MapReduce. This technique simplifies data processing on large clusters.
In 2005, Doug Cutting and Mike Cafarella introduced a new file system known as NDFS (Nutch Distributed File
System). This file system also included MapReduce.
In 2006, Doug Cutting joined Yahoo. On the basis of the Nutch project, Doug Cutting introduced a
new project, Hadoop, with a file system known as HDFS (Hadoop Distributed File System). Hadoop's first version, 0.1.0,
was released that year.
Doug Cutting named his project Hadoop after his son's toy elephant.
In 2007, Yahoo ran two clusters of 1000 machines.
In 2008, Hadoop became the fastest system to sort 1 terabyte of data, doing so on a 900-node cluster in 209 seconds.
In 2013, Hadoop 2.2 was released.
In 2017, Hadoop 3.0 was released.
Hadoop Installation
Environment required for Hadoop: the production environment for Hadoop is UNIX, but it can also be used on Windows using
Cygwin. Java 1.6 or above is needed to run MapReduce programs. For a Hadoop installation from a tarball on a UNIX environment you
need:
1. Java Installation
2. SSH installation
3. Hadoop Installation and File Configuration

1) Java Installation
Step 1. Type "java -version" at the prompt to check whether Java is installed. If not, download Java from
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html. The tar file jdk-7u71-linux-x64.tar.gz will
be downloaded to your system.
Step 2. Extract the file using the below command

1. #tar zxf jdk-7u71-linux-x64.tar.gz

Step 3. To make Java available to all users of UNIX, move the file to /usr/lib and set the path. At the prompt, switch to the root user
and then type the command below to move the JDK to /usr/lib.

1. # mv jdk1.7.0_71 /usr/lib/

Now in ~/.bashrc file add the following commands to set up the path.

1. # export JAVA_HOME=/usr/lib/jdk1.7.0_71
2. # export PATH=$PATH:$JAVA_HOME/bin

Now, you can check the installation by typing "java -version" in the prompt.

2) SSH Installation
SSH is used to interact with the master and slave computers without a password prompt. First of all, create a hadoop user on the
master and slave systems.

1. # useradd hadoop
2. # passwd hadoop

To map the nodes, open the hosts file present in the /etc/ folder on all the machines and put the IP addresses along with their host names.

1. # vi /etc/hosts

Enter the lines below

1. 190.12.1.114 hadoop-master
2. 190.12.1.121 hadoop-slave-one
3. 190.12.1.143 hadoop-slave-two

Set up an SSH key on every node so that they can communicate among themselves without a password. The commands for this are:

1. # su hadoop
2. $ ssh-keygen -t rsa
3. $ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-master
4. $ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-one
5. $ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop-slave-two
6. $ chmod 0600 ~/.ssh/authorized_keys
7. $ exit

3) Hadoop Installation
Hadoop can be downloaded from http://developer.yahoo.com/hadoop/tutorial/module3.html
Now extract Hadoop and copy it to a location.

1. $ mkdir /usr/hadoop
2. $ sudo tar -xzvf hadoop-2.2.0.tar.gz -C /usr/hadoop

Change the ownership of Hadoop folder

1. $ sudo chown -R hadoop /usr/hadoop

Change the Hadoop configuration files:


All the files are present in /usr/hadoop/etc/hadoop
1) In the hadoop-env.sh file add

1. export JAVA_HOME=/usr/lib/jvm/jdk/jdk1.7.0_71

2) In core-site.xml add the following between the configuration tags:

1. <configuration>
2. <property>
3. <name>fs.default.name</name>
4. <value>hdfs://hadoop-master:9000</value>
5. </property>
6. <property>
7. <name>dfs.permissions</name>
8. <value>false</value>
9. </property>
10. </configuration>

3) In hdfs-site.xml add the following between the configuration tags:

1. <configuration>
2. <property>
3. <name>dfs.data.dir</name>
4. <value>/usr/hadoop/dfs/name/data</value>
5. <final>true</final>
6. </property>
7. <property>
8. <name>dfs.name.dir</name>
9. <value>/usr/hadoop/dfs/name</value>
10. <final>true</final>
11. </property>
12. <property>
13. <name>dfs.replication</name>
14. <value>1</value>
15. </property>
16. </configuration>

4) Open mapred-site.xml and make the changes as shown below

1. <configuration>
2. <property>
3. <name>mapred.job.tracker</name>
4. <value>hadoop-master:9001</value>
5. </property>
6. </configuration>

5) Finally, update your $HOME/.bashrc

1. cd $HOME
2. vi .bashrc
3. # Append the following lines at the end, then save and exit
4. #Hadoop variables
5. export JAVA_HOME=/usr/lib/jvm/jdk/jdk1.7.0_71
6. export HADOOP_INSTALL=/usr/hadoop
7. export PATH=$PATH:$HADOOP_INSTALL/bin
8. export PATH=$PATH:$HADOOP_INSTALL/sbin
9. export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
10. export HADOOP_COMMON_HOME=$HADOOP_INSTALL
11. export HADOOP_HDFS_HOME=$HADOOP_INSTALL
12. export YARN_HOME=$HADOOP_INSTALL

Copy Hadoop to the slave machines using the commands below

1. # su hadoop
2. $ cd /usr
3. $ scp -r hadoop hadoop-slave-one:/usr/hadoop
4. $ scp -r hadoop hadoop-slave-two:/usr/hadoop

Configure master node and slave node

1. $ vi etc/hadoop/masters
2. hadoop-master
3.
4. $ vi etc/hadoop/slaves
5. hadoop-slave-one
6. hadoop-slave-two

After this, format the NameNode and start all the daemons

1. # su hadoop
2. $ cd /usr/hadoop
3. $ bin/hadoop namenode -format
4.
5. $ cd $HADOOP_INSTALL/sbin
6. $ start-all.sh

The easiest option is to use the Cloudera VM, as it comes with all of this pre-installed; it can be downloaded from
http://content.udacity-data.com/courses/ud617/Cloudera-Udacity-Training-VM-4.1.1.c.zip
HADOOP MODULES
What is HDFS
Hadoop comes with a distributed file system called HDFS. In HDFS, data is distributed over several machines and replicated to ensure
durability against failure and high availability to parallel applications.
It is cost effective as it uses commodity hardware. It involves the concepts of blocks, data nodes and the name node.

Where to use HDFS


Very Large Files: Files should be of hundreds of megabytes, gigabytes or more.
Streaming Data Access: The time to read the whole data set is more important than the latency in reading the first record. HDFS is
built on a write-once, read-many-times pattern.
Commodity Hardware: It works on low-cost hardware.

Where not to use HDFS


Low Latency Data Access: Applications that require very low latency to access the first record should not use HDFS, as it
gives importance to the whole data set rather than the time to fetch the first record.
Lots of Small Files: The name node holds the metadata of files in memory, so if the files are small, the metadata takes up a
lot of the name node's memory, which is not feasible.
Multiple Writes: It should not be used when we have to write multiple times.

HDFS Concepts
1. Blocks: A block is the minimum amount of data that HDFS can read or write. HDFS blocks are 128 MB by default and this is
configurable. Files in HDFS are broken into block-sized chunks, which are stored as independent units. Unlike in an ordinary file
system, if a file in HDFS is smaller than the block size, it does not occupy the full block's size; i.e. a 5 MB file stored
in HDFS with a block size of 128 MB takes only 5 MB of space. The HDFS block size is large simply to minimize the cost of seeks.
2. Name Node: HDFS works in a master-worker pattern where the name node acts as the master. The name node is the controller and
manager of HDFS, as it knows the status and the metadata of all the files in HDFS; the metadata being file
permissions, names and the location of each block. The metadata is small, so it is stored in the memory of the name
node, allowing faster access to data. Moreover, the HDFS cluster is accessed by multiple clients concurrently, so all this
information is handled by a single machine. The file system operations like opening, closing, renaming etc. are executed
by it.
3. Data Node: Data nodes store and retrieve blocks when they are told to, by a client or the name node. They report back to the name
node periodically with a list of the blocks they are storing. The data node, being commodity hardware, also does the
work of block creation, deletion and replication as instructed by the name node.


Since all the metadata is stored in the name node, it is very important. If it fails, the file system cannot be used, as there would be no way of
knowing how to reconstruct the files from the blocks present in the data nodes. To overcome this, the concept of the secondary name node arises.
Secondary Name Node: It is a separate physical machine which acts as a helper of the name node. It performs periodic checkpoints: it
communicates with the name node and takes snapshots of the metadata, which helps minimize downtime and loss of data.

Starting HDFS
The HDFS should be formatted initially and then started in the distributed mode. Commands are given below.
To Format $ hadoop namenode -format
To Start $ start-dfs.sh

HDFS Basic File Operations


1. Putting data into HDFS from the local file system
First create a folder in HDFS where data can be put from the local file system.

$ hadoop fs -mkdir /user/test


Copy the file "data.txt" from the local folder /usr/home/Desktop to the HDFS folder /user/test

$ hadoop fs -copyFromLocal /usr/home/Desktop/data.txt /user/test


Display the contents of the HDFS folder

$ hadoop fs -ls /user/test


2. Copying data from HDFS to local file system
$ hadoop fs -copyToLocal /user/test/data.txt /usr/bin/data_copy.txt
3. Compare the files and see that both are the same
$ md5 /usr/bin/data_copy.txt /usr/home/Desktop/data.txt

Recursive deleting
hadoop fs -rmr <arg>

Example:
hadoop fs -rmr /user/sonoo/
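
The same basic file operations can also be performed programmatically through the HDFS Java API. The sketch below is a minimal example, assuming the Hadoop client libraries are on the classpath and that the configuration on the classpath points at the running NameNode; the paths simply mirror the shell commands above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCopyExample {
    public static void main(String[] args) throws Exception {
        // Loads core-site.xml and hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Equivalent of: hadoop fs -mkdir /user/test
        fs.mkdirs(new Path("/user/test"));

        // Equivalent of: hadoop fs -copyFromLocal /usr/home/Desktop/data.txt /user/test
        fs.copyFromLocalFile(new Path("/usr/home/Desktop/data.txt"), new Path("/user/test/data.txt"));

        // Equivalent of: hadoop fs -copyToLocal /user/test/data.txt /usr/bin/data_copy.txt
        fs.copyToLocalFile(new Path("/user/test/data.txt"), new Path("/usr/bin/data_copy.txt"));

        fs.close();
    }
}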

HDFS Other commands


The following notation is used in the commands:
"<path>" means any file or directory name.
"<path>..." means one or more file or directory names.
"<file>" means any filename.
"<src>" and "<dest>" are path names in a directed operation.
"<localSrc>" and "<localDest>" are paths as above, but on the local file system.
put <localSrc> <dest>

Copies the file or directory from the local file system identified by localSrc to dest within the DFS.
copyFromLocal <localSrc> <dest>

Identical to -put
moveFromLocal <localSrc> <dest>

Copies the file or directory from the local file system identified by localSrc to dest within HDFS, and then deletes the local copy on success.
get [-crc] <src> <localDest>

Copies the file or directory in HDFS identified by src to the local file system path identified by localDest.
cat <filename>

Displays the contents of filename on stdout.


moveToLocal <src> <localDest>

Works like -get, but deletes the HDFS copy on success.


setrep [-R] [-w] rep <path>

Sets the target replication factor for files identified by path to rep. (The actual replication factor will move toward the target over time.)
touchz <path>
Creates a file at path containing the current time as a timestamp. Fails if a file already exists at path, unless the file is already size 0.
test -[ezd] <path>

Returns 1 if the path exists, has zero length, or is a directory; otherwise returns 0.


stat [format] <path>

Prints information about path. Format is a string which accepts file size in blocks (%b), filename (%n), block size (%o), replication (%r), and modification date (%y, %Y).
HDFS Features and Goals
The Hadoop Distributed File System (HDFS) is a distributed file system. It is a core part of Hadoop which is used for data storage. It is
designed to run on commodity hardware.
Unlike other distributed file system, HDFS is highly fault-tolerant and can be deployed on low-cost hardware. It can easily handle the
application that contains large data sets.
Let's see some of the important features and goals of HDFS.

Features of HDFS
Highly Scalable - HDFS is highly scalable as it can scale hundreds of nodes in a single cluster.
Replication - Due to some unfavorable conditions, the node containing the data may be loss. So, to overcome such
problems, HDFS always maintains the copy of data on a different machine.
Fault tolerance - In HDFS, the fault tolerance signifies the robustness of the system in the event of failure. The HDFS
is highly fault-tolerant that if any machine fails, the other machine containing the copy of that data automatically
become active.
Distributed data storage - This is one of the most important features of HDFS that makes Hadoop very powerful. Here,
data is divided into multiple blocks and stored into nodes.
Portable - HDFS is designed in such a way that it can easily portable from platform to another.

Goals of HDFS
Handling hardware failure - The HDFS contains multiple server machines. If any machine fails, the
goal of HDFS is to recover from it quickly.
Streaming data access - HDFS applications require streaming access to their data sets; the emphasis is on high
throughput rather than the low latency of a general-purpose file system.
Coherence model - Applications that run on HDFS are required to follow the write-once-read-many approach. So, a
file, once created, need not be changed. However, it can be appended and truncated.
What is YARN
YARN (Yet Another Resource Negotiator) takes Hadoop beyond a single programming model and lets other applications such as
HBase and Spark work on it. Different YARN applications can co-exist on the same cluster, so MapReduce, HBase and Spark can all run at
the same time, bringing great benefits for manageability and cluster utilization.

Components Of YARN
Client: For submitting MapReduce jobs.
Resource Manager: Manages the use of resources across the cluster.
Node Manager: Launches and monitors the compute containers on machines in the cluster.
MapReduce Application Master: Coordinates the tasks running the MapReduce job. The application master and the
MapReduce tasks run in containers that are scheduled by the resource manager and managed by the node managers.

The JobTracker and TaskTracker were used in previous versions of Hadoop and were responsible for handling resources and tracking
progress. Hadoop 2.0 has the ResourceManager and NodeManager to overcome the shortfalls of the JobTracker and
TaskTracker.

Benefits of YARN
Scalability: MapReduce 1 hits a scalability bottleneck at 4,000 nodes and 40,000 tasks, but YARN is designed for 10,000
nodes and 100,000 tasks.
Utilization: The Node Manager manages a pool of resources, rather than a fixed number of designated slots, thus
increasing utilization.
Multitenancy: Different versions of MapReduce can run on YARN, which makes the process of upgrading MapReduce
more manageable.
HADOOP MapReduce
MapReduce tutorial provides basic and advanced concepts of MapReduce. Our MapReduce tutorial is designed for beginners and
professionals.
Our MapReduce tutorial includes all topics of MapReduce such as Data Flow in MapReduce, Map Reduce API, Word Count Example,
Character Count Example, etc.

What is MapReduce?
MapReduce is a data processing tool used to process data in parallel in a distributed form. It was developed in 2004, on
the basis of the paper titled "MapReduce: Simplified Data Processing on Large Clusters," published by Google.
MapReduce is a paradigm which has two phases: the mapper phase and the reducer phase. In the Mapper, the input is given in the
form of key-value pairs. The output of the Mapper is fed to the reducer as input. The reducer runs only after the Mapper is over. The
reducer too takes input in key-value format, and the output of the reducer is the final output.

Steps in Map Reduce


The map takes data in the form of pairs and returns a list of <key, value> pairs. The keys will not be unique in this case.
Using the output of Map, sort and shuffle are applied by the Hadoop architecture. This sort and shuffle acts on the list
of <key, value> pairs and sends out each unique key together with the list of values associated with it, <key, list(values)>.
The output of sort and shuffle is sent to the reducer phase. The reducer performs a defined function on the list of values for
each unique key, and the final output <key, value> is stored/displayed.
For example, for the input line "car bus car", the map phase emits <car, 1>, <bus, 1>, <car, 1>; sort and shuffle groups these into <bus, [1]> and <car, [1, 1]>; and the reduce phase emits <bus, 1> and <car, 2>.

Sort and Shuffle


The sort and shuffle occur on the output of Mapper and before the reducer. When the Mapper task is complete, the results are sorted by
key, partitioned if there are multiple reducers, and then written to disk. Using the input from each Mapper <k2,v2>, we collect all the
values for each unique key k2. This output from the shuffle phase in the form of <k2, list(v2)> is sent as input to reducer phase.

Usage of MapReduce
It can be used in various applications like document clustering, distributed sorting, and web link-graph reversal.
It can be used for distributed pattern-based searching.
We can also use MapReduce in machine learning.
It was used by Google to regenerate Google's index of the World Wide Web.
It can be used in multiple computing environments such as multi-cluster, multi-core, and mobile environment.

Prerequisite
Before learning MapReduce, you must have the basic knowledge of Big Data.

Audience
Our MapReduce tutorial is designed to help beginners and professionals.

Problem
We assure that you will not find any problem in this MapReduce tutorial. But if there is any mistake, please post the problem in contact
form.
Data Flow In MapReduce
MapReduce is used to process huge amounts of data. To handle the incoming data in a parallel and distributed form, the data
flows through several phases.

Phases of MapReduce data flow


Input reader
The input reader reads the incoming data and splits it into data blocks of the appropriate size (64 MB to 128 MB). Each data block
is associated with a Map function.
Once the input reader has read the data, it generates the corresponding key-value pairs. The input files reside in HDFS.

Map function
The map function processes the incoming key-value pairs and generates the corresponding output key-value pairs. The map input and
output types may differ from each other.

Partition function
The partition function assigns the output of each Map function to the appropriate reducer. It is given the key and value, and
returns the index of the reducer.
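
As a hedged illustration of the partition function, the sketch below shows a custom partitioner written against the newer org.apache.hadoop.mapreduce API; the class name and the routing rule (keys starting with a-m go to the first reducer, everything else is hashed over the rest) are hypothetical, but the contract of getPartition returning a reducer index matches the description above.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class AlphabetPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (numPartitions == 1 || key.getLength() == 0) {
            return 0;                                  // only one reducer, or nothing to inspect
        }
        char first = Character.toLowerCase(key.toString().charAt(0));
        if (first >= 'a' && first <= 'm') {
            return 0;                                  // index of the first reducer
        }
        // Hash the remaining keys over reducer indexes 1 .. numPartitions-1
        return 1 + (key.hashCode() & Integer.MAX_VALUE) % (numPartitions - 1);
    }
}

Such a partitioner would be registered on the job (for example with job.setPartitionerClass(AlphabetPartitioner.class)); by default Hadoop uses a hash partitioner.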

Shuffling and Sorting


The data is shuffled between and within nodes so that it moves out of the map phase and gets ready to be processed by the reduce function. Sometimes,
shuffling the data can take much computation time.
The sorting operation is performed on the input data for the Reduce function. Here, the data is compared using a comparison function and
arranged in sorted order.

Reduce function
The Reduce function is applied to each unique key. These keys are already arranged in sorted order. The Reduce function iterates over
the values associated with each key and generates the corresponding output.

Output writer
Once the data has flowed through all the above phases, the output writer executes. The role of the output writer is to write the Reduce output to
stable storage.
MapReduce API
In this section, we focus on MapReduce APIs. Here, we learn about the classes and methods used in MapReduce programming.

MapReduce Mapper Class


In MapReduce, the role of the Mapper class is to map the input key-value pairs to a set of intermediate key-value pairs. It transforms the
input records into intermediate records.
These intermediate records are associated with a given output key and passed to the Reducer for the final output.

Methods of Mapper Class


void cleanup(Context context) - This method is called only once at the end of the task.
void map(KEYIN key, VALUEIN value, Context context) - This method is called once for each key-value pair in the input
split.
void run(Context context) - This method can be overridden to control the execution of the Mapper.
void setup(Context context) - This method is called only once at the beginning of the task.
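
As a hedged sketch of how these methods are typically overridden (using the newer org.apache.hadoop.mapreduce API; the book's own word count example later uses the older mapred API), a tokenizing mapper might look like this. The class name is illustrative.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void setup(Context context) {
        // Called once at the beginning of the task, e.g. to read configuration values.
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Called once for each key-value pair in the input split.
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, ONE);   // emit an intermediate <word, 1> pair
        }
    }

    @Override
    protected void cleanup(Context context) {
        // Called once at the end of the task, e.g. to release resources.
    }
}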

MapReduce Reducer Class


In MapReduce, the role of the Reducer class is to reduce the set of intermediate values. Its implementations can access the
Configuration for the job via the JobContext.getConfiguration() method.

Methods of Reducer Class


void cleanup(Context context) - This method is called only once at the end of the task.
void reduce(KEYIN key, Iterable<VALUEIN> values, Context context) - This method is called once for each key.
void run(Context context) - This method can be used to control the tasks of the Reducer.
void setup(Context context) - This method is called only once at the beginning of the task.
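
A matching reducer sketch for the same newer API is shown below; it sums the values grouped under each key. Again, the class name is illustrative, and the book's own examples later use the older mapred API.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Called once for each key, with all the values grouped for that key.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));   // emit the aggregated <key, sum> pair
    }
}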

MapReduce Job Class


The Job class is used to configure the job and submit it. It also controls the execution and queries the state. Once the job is submitted,
the set methods throw an IllegalStateException.

Methods of Job Class


Method - Description
Counters getCounters()-This method is used to get the counters for the job.
long getFinishTime()-This method is used to get the finish time for the job.
Job getInstance()-This method is used to generate a new Job without any cluster.
Job getInstance(Configuration conf)-This method is used to generate a new Job without any cluster and provided configuration.
Job getInstance(Configuration conf, String jobName)-This method is used to generate a new Job without any cluster and provided
configuration and job name.
String getJobFile()-This method is used to get the path of the submitted job configuration.
String getJobName()-This method is used to get the user-specified job name.
JobPriority getPriority()-This method is used to get the scheduling priority of the job.
void setJarByClass(Class<?> c)-This method is used to set the jar by providing the class name with .class extension.
void setJobName(String name)-This method is used to set the user-specified job name.
void setMapOutputKeyClass(Class<?> class)-This method is used to set the key class for the map output data.
void setMapOutputValueClass(Class<?> class)-This method is used to set the value class for the map output data.
void setMapperClass(Class<? extends Mapper> class)-This method is used to set the Mapper for the job.
void setNumReduceTasks(int tasks)-This method is used to set the number of reduce tasks for the job
void setReducerClass(Class<? extends Reducer> class)-This method is used to set the Reducer for the job.
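
To illustrate how these methods fit together, the hedged sketch below configures and submits a pass-through job with the org.apache.hadoop.mapreduce.Job class, using the base Mapper and Reducer classes as identity functions so that the example stays self-contained; the job name and the use of command-line arguments for the input and output paths are assumptions, and the word count example that follows uses the older mapred API instead.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IdentityJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "identity job");   // getInstance(conf, jobName)
        job.setJarByClass(IdentityJobDriver.class);        // setJarByClass(Class<?>)
        job.setMapperClass(Mapper.class);                  // base Mapper acts as an identity map
        job.setReducerClass(Reducer.class);                // base Reducer acts as an identity reduce
        job.setOutputKeyClass(LongWritable.class);         // the default TextInputFormat emits offsets as keys
        job.setOutputValueClass(Text.class);               // and lines of text as values
        job.setNumReduceTasks(1);                          // setNumReduceTasks(int)
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);  // submit the job and wait for completion
    }
}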
MapReduce Word Count Example
In the MapReduce word count example, we find the frequency of each word. Here, the role of the Mapper is to emit each word with a
count of 1, and the role of the Reducer is to aggregate the counts for each word. So, everything is represented in the form of key-value pairs.

Pre-requisite
Java Installation - Check whether Java is installed or not using the following command.
java -version
Hadoop Installation - Check whether Hadoop is installed or not using the following command.
hadoop version

Steps to execute MapReduce word count example


Create a text file in your local machine and write some text into it.
$ nano data.txt

Check the text written in the data.txt file.


$ cat data.txt
In this example, we find the frequency of each word that exists in this text file.
Create a directory in HDFS where the text file will be kept.
$ hdfs dfs -mkdir /test
Upload the data.txt file to HDFS in that directory.
$ hdfs dfs -put /home/codegyani/data.txt /test

Write the MapReduce program using eclipse.

File: WC_Mapper.java
1. package com.javatpoint;
2.
3. import java.io.IOException;
4. import java.util.StringTokenizer;
5. import org.apache.hadoop.io.IntWritable;
6. import org.apache.hadoop.io.LongWritable;
7. import org.apache.hadoop.io.Text;
8. import org.apache.hadoop.mapred.MapReduceBase;
9. import org.apache.hadoop.mapred.Mapper;
10. import org.apache.hadoop.mapred.OutputCollector;
11. import org.apache.hadoop.mapred.Reporter;
12. public class WC_Mapper extends MapReduceBase implements Mapper<LongWritable,Text,Text,IntWritable>{
13. private final static IntWritable one = new IntWritable(1);
14. private Text word = new Text();
15. public void map(LongWritable key, Text value,OutputCollector<Text,IntWritable> output,
16. Reporter reporter) throws IOException{
17. String line = value.toString();
18. StringTokenizer tokenizer = new StringTokenizer(line);
19. while (tokenizer.hasMoreTokens()){
20. word.set(tokenizer.nextToken());
21. output.collect(word, one);
22. }
23. }
24.
25. }

File: WC_Reducer.java
1. package com.javatpoint;
2. import java.io.IOException;
3. import java.util.Iterator;
4. import org.apache.hadoop.io.IntWritable;
5. import org.apache.hadoop.io.Text;
6. import org.apache.hadoop.mapred.MapReduceBase;
7. import org.apache.hadoop.mapred.OutputCollector;
8. import org.apache.hadoop.mapred.Reducer;
9. import org.apache.hadoop.mapred.Reporter;
10.
11. public class WC_Reducer extends MapReduceBase implements Reducer<Text,IntWritable,Text,IntWritable> {
12. public void reduce(Text key, Iterator<IntWritable> values,OutputCollector<Text,IntWritable> output,
13. Reporter reporter) throws IOException {
14. int sum=0;
15. while (values.hasNext()) {
16. sum+=values.next().get();
17. }
18. output.collect(key,new IntWritable(sum));
19. }
20. }

File: WC_Runner.java
1. package com.javatpoint;
2.
3. import java.io.IOException;
4. import org.apache.hadoop.fs.Path;
5. import org.apache.hadoop.io.IntWritable;
6. import org.apache.hadoop.io.Text;
7. import org.apache.hadoop.mapred.FileInputFormat;
8. import org.apache.hadoop.mapred.FileOutputFormat;
9. import org.apache.hadoop.mapred.JobClient;
10. import org.apache.hadoop.mapred.JobConf;
11. import org.apache.hadoop.mapred.TextInputFormat;
12. import org.apache.hadoop.mapred.TextOutputFormat;
13. public class WC_Runner {
14. public static void main(String[] args) throws IOException{
15. JobConf conf = new JobConf(WC_Runner.class);
16. conf.setJobName("WordCount");
17. conf.setOutputKeyClass(Text.class);
18. conf.setOutputValueClass(IntWritable.class);
19. conf.setMapperClass(WC_Mapper.class);
20. conf.setCombinerClass(WC_Reducer.class);
21. conf.setReducerClass(WC_Reducer.class);
22. conf.setInputFormat(TextInputFormat.class);
23. conf.setOutputFormat(TextOutputFormat.class);
24. FileInputFormat.setInputPaths(conf,new Path(args[0]));
25. FileOutputFormat.setOutputPath(conf,new Path(args[1]));
26. JobClient.runJob(conf);
27. }
28. }



Create the jar file of this program and name it wordcountdemo.jar.
Run the jar file
hadoop jar /home/codegyani/wordcountdemo.jar com.javatpoint.WC_Runner /test/data.txt /r_output
The output is stored in /r_output/part-00000
Now execute the command to see the output.
hdfs dfs -cat /r_output/part-00000
MapReduce Char Count Example
In the MapReduce char count example, we find the frequency of each character. Here, the role of the Mapper is to emit each
character with a count of 1, and the role of the Reducer is to aggregate the counts for each character. So, everything is represented in the
form of key-value pairs.

Pre-requisite
Java Installation - Check whether Java is installed or not using the following command.
java -version
Hadoop Installation - Check whether Hadoop is installed or not using the following command.
hadoop version

Steps to execute MapReduce char count example


Create a text file in your local machine and write some text into it.
$ nano info.txt
Check the text written in the info.txt file.
$ cat info.txt

In this example, we find the frequency of each character that exists in this text file.
Create a directory in HDFS where the text file will be kept.
$ hdfs dfs -mkdir /count
Upload the info.txt file to HDFS in that directory.
$ hdfs dfs -put /home/codegyani/info.txt /count

Write the MapReduce program using eclipse.

File: WC_Mapper.java
1. package com.javatpoint;
2.
3. import java.io.IOException;
4. import org.apache.hadoop.io.IntWritable;
5. import org.apache.hadoop.io.LongWritable;
6. import org.apache.hadoop.io.Text;
7. import org.apache.hadoop.mapred.MapReduceBase;
8. import org.apache.hadoop.mapred.Mapper;
9. import org.apache.hadoop.mapred.OutputCollector;
10. import org.apache.hadoop.mapred.Reporter;
11. public class WC_Mapper extends MapReduceBase implements Mapper<LongWritable,Text,Text,IntWritable>{
12. public void map(LongWritable key, Text value,OutputCollector<Text,IntWritable> output,
13. Reporter reporter) throws IOException{
14. String line = value.toString();
15. String tokenizer[] = line.split("");
16. for(String SingleChar : tokenizer)
17. {
18. Text charKey = new Text(SingleChar);
19. IntWritable One = new IntWritable(1);
20. output.collect(charKey, One);
21. }
22. }
23.
24. }

File: WC_Reducer.java
1. package com.javatpoint;
2. import java.io.IOException;
3. import java.util.Iterator;
4. import org.apache.hadoop.io.IntWritable;
5. import org.apache.hadoop.io.Text;
6. import org.apache.hadoop.mapred.MapReduceBase;
7. import org.apache.hadoop.mapred.OutputCollector;
8. import org.apache.hadoop.mapred.Reducer;
9. import org.apache.hadoop.mapred.Reporter;
10.
11. public class WC_Reducer extends MapReduceBase implements Reducer<Text,IntWritable,Text,IntWritable> {
12. public void reduce(Text key, Iterator<IntWritable> values,OutputCollector<Text,IntWritable> output,
13. Reporter reporter) throws IOException {
14. int sum=0;
15. while (values.hasNext()) {
16. sum+=values.next().get();
17. }
18. output.collect(key,new IntWritable(sum));
19. }
20. }

File: WC_Runner.java
1. package com.javatpoint;
2.
3. import java.io.IOException;
4. import org.apache.hadoop.fs.Path;
5. import org.apache.hadoop.io.IntWritable;
6. import org.apache.hadoop.io.Text;
7. import org.apache.hadoop.mapred.FileInputFormat;
8. import org.apache.hadoop.mapred.FileOutputFormat;
9. import org.apache.hadoop.mapred.JobClient;
10. import org.apache.hadoop.mapred.JobConf;
11. import org.apache.hadoop.mapred.TextInputFormat;
12. import org.apache.hadoop.mapred.TextOutputFormat;
13. public class WC_Runner {
14. public static void main(String[] args) throws IOException{
15. JobConf conf = new JobConf(WC_Runner.class);
16. conf.setJobName("CharCount");
17. conf.setOutputKeyClass(Text.class);
18. conf.setOutputValueClass(IntWritable.class);
19. conf.setMapperClass(WC_Mapper.class);
20. conf.setCombinerClass(WC_Reducer.class);
21. conf.setReducerClass(WC_Reducer.class);
22. conf.setInputFormat(TextInputFormat.class);
23. conf.setOutputFormat(TextOutputFormat.class);
24. FileInputFormat.setInputPaths(conf,new Path(args[0]));
25. FileOutputFormat.setOutputPath(conf,new Path(args[1]));
26. JobClient.runJob(conf);
27. }
28. }



Create the jar file of this program and name it charcountdemo.jar.
Run the jar file
hadoop jar /home/codegyani/charcountdemo.jar com.javatpoint.WC_Runner /count/info.txt /char_output
The output is stored in /char_output/part-00000
Now execute the command to see the output.
hdfs dfs -cat /char_output/part-00000
HBase
HBase tutorial provides basic and advanced concepts of HBase. Our HBase tutorial is designed for beginners and professionals.
HBase is an open source framework provided by Apache. It is a sorted map data store built on Hadoop. It is column oriented and horizontally
scalable.
Our HBase tutorial includes all topics of Apache HBase such as the HBase data model, HBase Read, HBase Write, HBase MemStore, HBase
Installation, RDBMS vs HBase, HBase Commands, HBase Example etc.

Prerequisite
Before learning HBase, you must have the knowledge of Hadoop and Java.

Audience
Our HBase tutorial is designed to help beginners and professionals.

Problem
We assure that you will not find any problem in this HBase tutorial. But if there is any mistake, please post the problem in contact form.
What is HBase
HBase is an open source, sorted map data store built on Hadoop. It is column oriented and horizontally scalable.
It is based on Google's Bigtable. It has a set of tables which keep data in key-value format. HBase is well suited for sparse data sets, which
are very common in big data use cases. HBase provides APIs enabling development in practically any programming language. It is a
part of the Hadoop ecosystem that provides random real-time read/write access to data in the Hadoop File System.

Why HBase
RDBMSs get exponentially slower as the data becomes large.
They expect data to be highly structured, i.e. able to fit in a well-defined schema.
Any change in schema might require downtime.
For sparse datasets, there is too much overhead in maintaining NULL values.

Features of HBase
Horizontally scalable: You can add any number of nodes (and columns) at any time.
Automatic failover: Automatic failover is a facility that automatically switches data handling to a standby system in the
event of a system failure.
Integration with the MapReduce framework: All the commands and Java code internally use MapReduce to do
the work, and HBase is built over the Hadoop Distributed File System.
HBase is a sparse, distributed, persistent, multidimensional sorted map, which is indexed by row key, column key, and timestamp.
It is often referred to as a key-value store or column-family-oriented database, or as storing versioned maps of maps.
Fundamentally, it is a platform for storing and retrieving data with random access.
It doesn't care about datatypes (storing an integer in one row and a string in another for the same column).
It doesn't enforce relationships within your data.
It is designed to run on a cluster of computers, built using commodity hardware.
HBase Read
A read against HBase must be reconciled between the HFiles, the MemStore and the BlockCache. The BlockCache is designed to keep
frequently accessed data from the HFiles in memory so as to avoid disk reads. Each column family has its own BlockCache. The BlockCache
contains data in the form of 'blocks', the unit of data that HBase reads from disk in a single pass. The HFile is physically laid out as a sequence
of blocks plus an index over those blocks. This means reading a block from HBase requires only looking up that block's location in the
index and retrieving it from disk.
Block: It is the smallest indexed unit of data and the smallest unit of data that can be read from disk. The default size is 64 KB.
Scenario where a smaller block size is preferred: to perform random lookups. Having smaller blocks creates a larger index and
thereby consumes more memory.
Scenario where a larger block size is preferred: to perform frequent sequential scans. This allows you to save on memory because
larger blocks mean fewer index entries and thus a smaller index.
Reading a row from HBase requires first checking the MemStore, then the BlockCache, and finally the HFiles on disk.
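
As a hedged illustration of this read path, the sketch below looks up a single row with the older HBase client API that the book's own example uses; the table name, column family and row key ('table1', 'colf', 'row1') are the ones used later in the HBase Commands section and are assumptions here.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();          // reads hbase-site.xml
        HTable table = new HTable(conf, "table1");                  // assumed table name
        Get get = new Get(Bytes.toBytes("row1"));                   // row key to look up
        get.addColumn(Bytes.toBytes("colf"), Bytes.toBytes("a"));   // restrict to one column
        Result result = table.get(get);  // served from MemStore, BlockCache or HFiles as needed
        byte[] value = result.getValue(Bytes.toBytes("colf"), Bytes.toBytes("a"));
        System.out.println("colf:a = " + Bytes.toString(value));
        table.close();
    }
}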
HBase Write
When a write is made, by default, it goes into two places:
write-ahead log (WAL), HLog, and
in-memory write buffer, MemStore.

Clients don't interact directly with the underlying HFiles during writes; rather, writes go to the WAL and the MemStore in parallel. Every write
to HBase requires confirmation from both the WAL and the MemStore.
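
A corresponding write sketch with the same older client API is shown below; the row key and value are assumptions. The table.put() call returns once the write has gone to both the WAL and the MemStore, as described above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml
        HTable table = new HTable(conf, "table1");           // assumed table name
        Put put = new Put(Bytes.toBytes("row3"));            // assumed row key
        put.add(Bytes.toBytes("colf"), Bytes.toBytes("a"), Bytes.toBytes("value4"));
        table.put(put);   // the write goes to the WAL and the MemStore
        table.close();
    }
}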

HBase MemStore
The MemStore is a write buffer where HBase accumulates data in memory before a permanent write.
Its contents are flushed to disk to form an HFile when the MemStore fills up.
It doesn't write to an existing HFile but instead forms a new file on every flush.
The HFile is the underlying storage format for HBase.
HFiles belong to a column family (one MemStore per column family). A column family can have multiple HFiles, but
the reverse isn't true.
The size of the MemStore is defined by the hbase-site.xml property hbase.hregion.memstore.flush.size.

What happens, when the server hosting a MemStore that has not yet been flushed crashes?
Every server in an HBase cluster keeps a WAL to record changes as they happen. The WAL is a file on the underlying file system. A write
isn't considered successful until the new WAL entry is successfully written; this guarantees durability.
If HBase goes down, the data that was not yet flushed from the MemStore to an HFile can be recovered by replaying the WAL, which is
taken care of by the HBase framework.
HBase Installation
The prerequisites for HBase installation are Java and Hadoop installed on your Linux machine.
HBase can be installed in three modes: standalone, pseudo-distributed mode and fully distributed mode.
Download the HBase package from http://www.interior-dsgn.com/apache/hbase/stable/ and unzip it with the commands below.

1. $ cd /usr/local/ && wget http://www.interior-dsgn.com/apache/hbase/stable/hbase-0.98.8-hadoop2-bin.tar.gz


2. $tar -zxvf hbase-0.98.8-hadoop2-bin.tar.gz

Log in as the super user and move the extracted files into an Hbase folder, as shown below

1. $ su
2. $ password: enter your password here
3. # mv hbase-0.98.8-hadoop2/* Hbase/

Configuring HBase in Standalone Mode


Set the java Home for HBase and open hbase-env.sh file from the conf folder. Edit JAVA_HOME environment variable and change the
existing path to your current JAVA_HOME variable as shown below.

1. cd /usr/local/Hbase/conf
2. gedit hbase-env.sh

Replace the existing JAVA_HOME value with your current value as shown below.

1. export JAVA_HOME=/usr/lib/jvm/java-1.7.0

Inside /usr/local/Hbase/conf you will find hbase-site.xml. Open it and add the code below within the configuration tags.

1. <configuration>
2. <!-- Here you have to set the path where you want HBase to store its files. -->
3. <property>
4. <name>hbase.rootdir</name>
5. <value>file:/home/hadoop/HBase/HFiles</value>
6. </property>
7.
8. <!-- Here you have to set the path where you want HBase to store its built-in ZooKeeper files. -->
9. <property>
10. <name>hbase.zookeeper.property.dataDir</name>
11. <value>/home/hadoop/zookeeper</value>
12. </property>
13. </configuration>

Now start HBase by running the start-hbase.sh script present in the bin folder of HBase.

1. $ cd /usr/local/HBase/bin
2. $ ./start-hbase.sh

The Cloudera VM is recommended, as it has HBase preinstalled on it.


Starting the HBase shell: type "hbase shell" in the terminal to start the HBase shell.
RDBMS vs HBase
The differences between an RDBMS and HBase are given below.
A schema/database in an RDBMS can be compared to a namespace in HBase.
A table in an RDBMS can be compared to a column family in HBase.
A record (after table joins) in an RDBMS can be compared to a record in HBase.
A collection of tables in an RDBMS can be compared to a table in HBase.
HBase Commands
A list of HBase commands is given below.
create: Creates a new table identified by 'table1' with a column family identified by 'colf'.
list: Lists the tables present in HBase.
put: Inserts a new record into the table, with the row identified by 'row..'.
scan: Returns the data stored in the table.
get: Returns the records matching the row identifier provided in the table.
help: Gets a list of commands.

1. create 'table1', 'colf'


2. list 'table1'
3. put 'table1', 'row1', 'colf:a', 'value1'
4. put 'table1', 'row1', 'colf:b', 'value2'
5. put 'table1', 'row2', 'colf:a', 'value3'
6. scan 'table1'
7. get 'table1', 'row1'
HBase Example
Let's see an HBase example that imports the data of a file into an HBase table.

Use Case
We have to import the data present in the file into an HBase table, creating the table through the Java API.
Data_file.txt contains the data below

1. 1,India,Bihar,Champaran,2009,April,P1,1,5
2. 2,India, Bihar,Patna,2009,May,P1,2,10
3. 3,India, Bihar,Bhagalpur,2010,June,P2,3,15
4. 4,United States,California,Fresno,2009,April,P2,2,5
5. 5,United States,California,Long Beach,2010,July,P2,4,10
6. 6,United States,California,San Francisco,2011,August,P1,6,20

The Java code is shown below


This data has to be inserted into a new HBase table created through the Java API. The following column families have to be created:

1. "sample,region,time,product,sale,profit"

The column family region has three column qualifiers: country, state, city.
The column family time has two column qualifiers: year, month.

Jar Files
Make sure that the following jars are present while writing the code, as they are required by HBase.
a. commons-logging-1.0.4
b. commons-logging-api-1.0.4
c. hadoop-core-0.20.2-cdh3u2
d. hbase-0.90.4-cdh3u2
e. log4j-1.2.15
f. zookeeper-3.3.3-cdh3u0

Program Code
1. import java.io.BufferedReader;
2. import java.io.File;
3. import java.io.FileReader;
4. import java.io.IOException;
5. import java.util.StringTokenizer;
6.
7. import org.apache.hadoop.conf.Configuration;
8. import org.apache.hadoop.hbase.HBaseConfiguration;
9. import org.apache.hadoop.hbase.HColumnDescriptor;
10. import org.apache.hadoop.hbase.HTableDescriptor;
11. import org.apache.hadoop.hbase.client.HBaseAdmin;
12. import org.apache.hadoop.hbase.client.HTable;
13. import org.apache.hadoop.hbase.client.Put;
14. import org.apache.hadoop.hbase.util.Bytes;
15.
16.
17. public class readFromFile {
18. public static void main(String[] args) throws IOException{
19. if(args.length==1)
20. {
21. Configuration conf = HBaseConfiguration.create(new Configuration());
22. HBaseAdmin hba = new HBaseAdmin(conf);
23. if(!hba.tableExists(args[0])){
24. HTableDescriptor ht = new HTableDescriptor(args[0]);
25. ht.addFamily(new HColumnDescriptor("sample"));
26. ht.addFamily(new HColumnDescriptor("region"));
27. ht.addFamily(new HColumnDescriptor("time"));
28. ht.addFamily(new HColumnDescriptor("product"));
29. ht.addFamily(new HColumnDescriptor("sale"));
30. ht.addFamily(new HColumnDescriptor("profit"));
31. hba.createTable(ht);
32. System.out.println("New Table Created");
33.
34. HTable table = new HTable(conf,args[0]);
35.
36. File f = new File("/home/training/Desktop/data");
37. BufferedReader br = new BufferedReader(new FileReader(f));
38. String line = br.readLine();
39. int i =1;
40. String rowname="row";
41. while(line!=null && line.length()!=0){
42. System.out.println("Ok till here");
43. StringTokenizer tokens = new StringTokenizer(line,",");
44. rowname = "row"+i;
45. Put p = new Put(Bytes.toBytes(rowname));
46. p.add(Bytes.toBytes("sample"),Bytes.toBytes("sampleNo."),
47. Bytes.toBytes(Integer.parseInt(tokens.nextToken())));
48. p.add(Bytes.toBytes("region"),Bytes.toBytes("country"),Bytes.toBytes(tokens.nextToken()));
49. p.add(Bytes.toBytes("region"),Bytes.toBytes("state"),Bytes.toBytes(tokens.nextToken()));
50. p.add(Bytes.toBytes("region"),Bytes.toBytes("city"),Bytes.toBytes(tokens.nextToken()));
51. p.add(Bytes.toBytes("time"),Bytes.toBytes("year"),Bytes.toBytes(Integer.parseInt(tokens.nextToken())));
52. p.add(Bytes.toBytes("time"),Bytes.toBytes("month"),Bytes.toBytes(tokens.nextToken()));
53. p.add(Bytes.toBytes("product"),Bytes.toBytes("productNo."),Bytes.toBytes(tokens.nextToken()));
54. p.add(Bytes.toBytes("sale"),Bytes.toBytes("quantity"),Bytes.toBytes(Integer.parseInt(tokens.nextToken())));
55. p.add(Bytes.toBytes("profit"),Bytes.toBytes("earnings"),Bytes.toBytes(tokens.nextToken()));
56. i++;
57. table.put(p);
58. line = br.readLine();
59. }
60. br.close();
61. table.close();
62. }
63. else
64. System.out.println("Table Already exists.Please enter another table name");
65. }
66. else
67. System.out.println("Please Enter the table name through command line");
68. }
69. }
Hive Tutorial
Hive tutorial provides basic and advanced concepts of Hive. Our Hive tutorial is designed for beginners and professionals.
Apache Hive is a data warehouse system for Hadoop that runs SQL-like queries called HQL (Hive Query Language), which get
internally converted to MapReduce jobs. Hive was developed by Facebook. It supports Data Definition Language, Data Manipulation
Language and user-defined functions.
Our Hive tutorial includes all topics of Apache Hive such as Hive Installation, Hive Data Types, Hive table partitioning, Hive DDL
commands, Hive DML commands, Hive sort by vs order by, Hive joining tables etc.

Prerequisite
Before learning Hive, you must have the knowledge of Hadoop and Java.

Audience
Our Hive tutorial is designed to help beginners and professionals.

Problem
We assure that you will not find any problem in this Hive tutorial. But if there is any mistake, please post the problem in contact form.
What is HIVE
Hive is a data warehouse system which is used to analyze structured data. It is built on the top of Hadoop. It was developed by
Facebook.
Hive provides the functionality of reading, writing, and managing large datasets residing in distributed storage. It runs SQL-like queries
called HQL (Hive Query Language), which get internally converted to MapReduce jobs.
Using Hive, we can skip the requirement of the traditional approach of writing complex MapReduce programs. Hive supports Data
Definition Language (DDL), Data Manipulation Language (DML), and User Defined Functions (UDF).

Features of Hive
These are the following features of Hive:
Hive is fast and scalable.
It provides SQL-like queries (i.e., HQL) that are implicitly transformed to MapReduce or Spark jobs.
It is capable of analyzing large datasets stored in HDFS.
It allows different storage types such as plain text, RCFile, and HBase.
It uses indexing to accelerate queries.
It can operate on compressed data stored in the Hadoop ecosystem.
It supports user-defined functions (UDFs), through which users can provide their own functionality.

Limitations of Hive
Hive is not capable of handling real-time data.
It is not designed for online transaction processing.
Hive queries have high latency.

Differences between Hive and Pig


Hive-Pig
Hive is commonly used by Data Analysts.-Pig is commonly used by programmers.
It follows SQL-like queries.-It follows the data-flow language.
It can handle structured data.-It can handle semi-structured data.
It works on server-side of HDFS cluster.-It works on client-side of HDFS cluster.
Hive is slower than Pig.-Pig is comparatively faster than Hive.
Hive Architecture
The following architecture explains the flow of submission of query into Hive.

Hive Client
Hive allows writing applications in various languages, including Java, Python, and C++. It supports different types of clients such as:
Thrift Server - It is a cross-language service provider platform that serves requests from all programming
languages that support Thrift.
JDBC Driver - It is used to establish a connection between Hive and Java applications. The JDBC driver is present in the
class org.apache.hadoop.hive.jdbc.HiveDriver (a minimal connection sketch follows this list).
ODBC Driver - It allows applications that support the ODBC protocol to connect to Hive.
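
As a minimal connection sketch for the JDBC driver mentioned above, the code below assumes a Hive server listening on localhost port 10000, a database named default, and a hypothetical table named employee; adjust these to your own environment.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        // Register the driver class mentioned above.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");

        // Assumed connection details: Hive server on localhost:10000, database "default".
        Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
        Statement stmt = con.createStatement();

        // Hypothetical query against a table named "employee".
        ResultSet rs = stmt.executeQuery("SELECT * FROM employee");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }

        rs.close();
        stmt.close();
        con.close();
    }
}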

Hive Services
The following are the services provided by Hive:-
Hive CLI - The Hive CLI (Command Line Interface) is a shell where we can execute Hive queries and commands.
Hive Web User Interface - The Hive Web UI is just an alternative of Hive CLI. It provides a web-based GUI for
executing Hive queries and commands.
Hive MetaStore - It is a central repository that stores all the structure information of the various tables and partitions in the
warehouse. It also includes metadata of columns and their type information, the serializers and deserializers used
to read and write data, and the corresponding HDFS files where the data is stored.
Hive Server - It is referred to as Apache Thrift Server. It accepts the request from different clients and provides it to
Hive Driver.
Hive Driver - It receives queries from different sources like web UI, CLI, Thrift, and JDBC/ODBC driver. It transfers
the queries to the compiler.
Hive Compiler - The purpose of the compiler is to parse the query and perform semantic analysis on the different query
blocks and expressions. It converts HiveQL statements into MapReduce jobs.
Hive Execution Engine - Optimizer generates the logical plan in the form of DAG of map-reduce tasks and HDFS tasks.
In the end, the execution engine executes the incoming tasks in the order of their dependencies.
Apache Hive Installation
In this section, we will perform the Hive installation.

Pre-requisite
Java Installation - Check whether the Java is installed or not using the following command.

1. $ java -version

Hadoop Installation - Check whether the Hadoop is installed or not using the following command.

1. $hadoop version

If either of them is not installed on your system, install it before proceeding with the Hive installation.

Steps to install Apache Hive


Download the Apache Hive tar file.
http://mirrors.estointernet.in/apache/hive/hive-1.2.2/
Unzip the downloaded tar file.

1. tar -xvf apache-hive-1.2.2-bin.tar.gz

Open the bashrc file.

1. $ sudo nano ~/.bashrc

Now, provide the following HIVE_HOME path.

1. export HIVE_HOME=/home/codegyani/apache-hive-1.2.2-bin
2. export PATH=$PATH:/home/codegyani/apache-hive-1.2.2-bin/bin

Update the environment variable.

1. $ source ~/.bashrc

Let's start Hive by providing the following command.

1. $ hive
HIVE Data Types
Hive data types are categorized in numeric types, string types, misc types, and complex types. A list of Hive data types is given below.

Integer Types
Type | Size | Range
TINYINT | 1-byte signed integer | -128 to 127
SMALLINT | 2-byte signed integer | -32,768 to 32,767
INT | 4-byte signed integer | -2,147,483,648 to 2,147,483,647
BIGINT | 8-byte signed integer | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Decimal Type
Type | Size | Description
FLOAT | 4-byte | Single precision floating point number
DOUBLE | 8-byte | Double precision floating point number

Date/Time Types
TIMESTAMP
It supports traditional UNIX timestamp with optional nanosecond precision.
As Integer numeric type, it is interpreted as UNIX timestamp in seconds.
As Floating point numeric type, it is interpreted as UNIX timestamp in seconds with decimal precision.
As string, it follows java.sql.Timestamp format "YYYY-MM-DD HH:MM:SS.fffffffff" (9 decimal place precision)

DATES
The Date value is used to specify a particular year, month and day, in the form YYYY-MM-DD. However, it does not provide the time
of day. The range of the Date type lies between 0000-01-01 and 9999-12-31.

String Types
STRING
A string is a sequence of characters. Its values can be enclosed within single quotes (') or double quotes (").
Varchar
The varchar is a variable-length type whose length lies between 1 and 65535, which specifies the maximum number of characters
allowed in the character string.
CHAR
The char is a fixed-length type whose maximum length is fixed at 255.

Complex Type
Type | Description | Example
Struct | It is similar to a C struct or an object where fields are accessed using the "dot" notation. | struct('James','Roy')
Map | It contains the key-value tuples where the fields are accessed using array notation. | map('first','James','last','Roy')
Array | It is a collection of values of a similar type that are indexable using zero-based integers. | array('James','Roy')
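As a rough illustration (the table and column names below are hypothetical, not used in any later example), a table combining these complex types can be declared as follows:

1. hive> create table employee_complex (
2. name struct<first:string, last:string>,
3. phones map<string, string>,
4. skills array<string>
5. )
6. row format delimited fields terminated by ',';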
Hive - Create Database
In Hive, the database is considered as a catalog or namespace of tables. So, we can maintain multiple tables within a database where a
unique name is assigned to each table. Hive also provides a default database with a name default.
Initially, we check the default database provided by Hive. So, to check the list of existing databases, follow the below
command: -

1. hive> show databases;

Here, we can see the existence of a default database provided by Hive.


Let's create a new database by using the following command: -

1. hive> create database demo;

So, a new database is created.


Let's check the existence of a newly created database.

1. hive> show databases;

Each database must have a unique name. If we create two databases with the same name, the following error is generated:

If we want to suppress the error generated by Hive when creating a database with a name that already exists, follow the below
command: -

1. hive> create database if not exists demo;


Hive also allows assigning properties with the database in the form of key-value pair.

1. hive> create database demo


2. >WITH DBPROPERTIES ('creator' = 'Gaurav Chawla', 'date' = '2019-06-03');

Let's retrieve the information associated with the database.

1. hive> describe database extended demo;


Hive - Drop Database
In this section, we will see various ways to drop the existing database.
Let's check the list of existing databases by using the following command: -

1. hive> show databases;

Now, drop the database by using the following command.

1. hive> drop database demo;

Let's check whether the database is dropped or not.

1. hive> show databases;

As we can see, the database demo is not present in the list. Hence, the database is dropped successfully.
If we try to drop the database that doesn't exist, the following error generates:

However, if we want to suppress the error generated by Hive when dropping a database that doesn't exist, follow the
below command:-

1. hive> drop database if exists demo;

In Hive, it is not allowed to directly drop a database that contains tables. In such a case, we can drop the database
either by dropping its tables first or by using the CASCADE keyword with the command.
Let's see the cascade command used to drop the database:-

1. hive> drop database if exists demo cascade;

This command automatically drops the tables present in the database first.
Hive - Create Table
In Hive, we can create a table by using conventions similar to SQL. It offers a wide range of flexibility in where the data files for
tables are stored. It provides two types of table: -
Internal table
External table

Internal Table
The internal tables are also called managed tables as the lifecycle of their data is controlled by the Hive. By default, these tables are
stored in a subdirectory under the directory defined by hive.metastore.warehouse.dir (i.e. /user/hive/warehouse). The internal tables are
not flexible enough to share with other tools like Pig. If we try to drop the internal table, Hive deletes both table schema and data.
Let's create an internal table by using the following command:-

1. hive> create table demo.employee (Id int, Name string , Salary float)
2. row format delimited
3. fields terminated by ',' ;

Here, the command also includes the information that the data is separated by ','.
Let's see the metadata of the created table by using the following command:-

1. hive> describe demo.employee;

External Table
The external table allows us to create a table and access data stored externally. The external keyword is used to specify the external
table, whereas the location keyword is used to determine the location of the loaded data.
As the table is external, the data is not present in the Hive directory. Therefore, if we try to drop the table, the metadata of the table will
be deleted, but the data still exists.
To create an external table, follow the below steps: -
Let's create a directory on HDFS by using the following command: -

1. hdfs dfs -mkdir /HiveDirectory

Now, store the file on the created directory.

1. hdfs dfs -put hive/emp_details /HiveDirectory

Let's create an external table using the following command: -

1. hive> create external table emplist (Id int, Name string , Salary float)
2. row format delimited
3. fields terminated by ','
4. location '/HiveDirectory';
Hive - Load Data
Once the internal table has been created, the next step is to load the data into it. So, in Hive, we can easily load data from any file to the
database.
Let's load the data of the file into the database by using the following command: -

1. load data local inpath '/home/codegyani/hive/emp_details' into table demo.employee;

Here, emp_details is the file name that contains the data.


Now, we can use the following command to retrieve the data from the database.

1. select * from demo.employee;

If we want to add more data into the current database, execute the same query again by just updating the new file name.

1. load data local inpath '/home/codegyani/hive/emp_details1' into table demo.employee;

In Hive, if we try to load unmatched data (i.e., one or more column values don't match the data type of the specified table
columns), it will not throw any exception. However, it stores a NULL value at the position of the unmatched data.
Let's add one more file to the current table. This file contains the unmatched data.

Here, the third column contains the data of string type, and the table allows the float type data. So, this condition arises in an unmatched
data situation.
Now, load the data into the table.

1. load data local inpath '/home/codegyani/hive/emp_details2' into table demo.employee;

Here, data loaded successfully.


Let's fetch the records of the table.

1. select * from demo.employee;

Here, we can see the Null values at the position of unmatched data.
Partitioning in Hive
The partitioning in Hive means dividing the table into some parts based on the values of a particular column like date, course, city or
country. The advantage of partitioning is that since the data is stored in slices, the query response time becomes faster.
As we know that Hadoop is used to handle the huge amount of data, it is always required to use the best approach to deal with it. The
partitioning in Hive is the best example of it.
Let's assume we have a data of 10 million students studying in an institute. Now, we have to fetch the students of a particular course. If
we use a traditional approach, we have to go through the entire data. This leads to performance degradation. In such a case, we can
adopt the better approach i.e., partitioning in Hive and divide the data among the different datasets based on particular columns.
The partitioning in Hive can be executed in two ways -
Static partitioning
Dynamic partitioning

Static Partitioning
In static or manual partitioning, it is required to pass the values of partitioned columns manually while loading the data into the table.
Hence, the data file doesn't contain the partitioned columns.
Example of Static Partitioning
First, select the database in which we want to create a table.

1. hive> use test;

Create the table and provide the partitioned columns by using the following command: -

1. hive> create table student (id int, name string, age int, institute string)
2. partitioned by (course string)
3. row format delimited
4. fields terminated by ',';

Let's retrieve the information associated with the table.

1. hive> describe student;

Load the data into the table and pass the values of partition columns with it by using the following command: -

1. hive> load data local inpath '/home/codegyani/hive/student_details1' into table student


2. partition(course= "java");

Here, we are partitioning the students of an institute based on courses.


Load the data of another file into the same table and pass the values of partition columns with it by using the following
command: -

1. hive> load data local inpath '/home/codegyani/hive/student_details2' into table student


2. partition(course= "hadoop");

Let's retrieve the entire data of the table by using the following command: -

1. hive> select * from student;

Now, try to retrieve the data based on partitioned columns by using the following command: -

1. hive> select * from student where course="java";

In this case, we are not examining the entire data. Hence, this approach improves query response time.
Let's also retrieve the data of another partitioned dataset by using the following command: -

1. hive> select * from student where course= "hadoop";


Dynamic Partitioning
In dynamic partitioning, the values of partitioned columns exist within the table. So, it is not required to pass the values of partitioned
columns manually.
First, select the database in which we want to create a table.

1. hive> use show;

Enable the dynamic partition by using the following commands: -

1. hive> set hive.exec.dynamic.partition=true;


2. hive> set hive.exec.dynamic.partition.mode=nonstrict;

Create a dummy table to store the data.

1. hive> create table stud_demo(id int, name string, age int, institute string, course string)
2. row format delimited
3. fields terminated by ',';

Now, load the data into the table.

1. hive> load data local inpath '/home/codegyani/hive/student_details' into table stud_demo;

Create a partition table by using the following command: -

1. hive> create table student_part (id int, name string, age int, institute string)
2. partitioned by (course string)
3. row format delimited
4. fields terminated by ',';

Now, insert the data of dummy table into the partition table.

1. hive> insert into student_part


2. partition(course)
3. select id, name, age, institute, course
4. from stud_demo;
Hadoop Interview Questions
Here is a set of Hadoop interview questions and answers that have been asked in many companies. Let's see the list of top Hadoop
interview questions.

1) What is Hadoop?
Hadoop is a distributed computing platform. It is written in Java. It consists of the features like Google File System and MapReduce.

2) What platform and Java version are required to run Hadoop?


Java 1.6.x or higher versions are good for Hadoop, preferably from Sun. Linux and Windows are the supported operating systems for
Hadoop, but BSD, Mac OS/X, and Solaris are also known to work.

3) What kind of Hardware is best for Hadoop?


Hadoop can run on dual-processor/dual-core machines with 4-8 GB RAM using ECC memory. The exact requirements depend on the workflow needs.

4) What are the most common input formats defined in Hadoop?


These are the most common input formats defined in Hadoop:
1. TextInputFormat
2. KeyValueInputFormat
3. SequenceFileInputFormat

TextInputFormat is a by default input format.

5) How do you categorize a big data?


The big data can be categorized using the following features:
Volume
Velocity
Variety

6) Explain the use of the .media class?


For the floating of media objects from one side to another, we use this class.

7) Give the use of the bootstrap panel.


We use panels in Bootstrap for the boxing of DOM components.

8) What is the purpose of button groups?


Button groups are used for the placement of more than one button on the same line.

9) Name the various types of lists supported by Bootstrap.


Ordered list
Unordered list
Definition list

10) Which command is used for the retrieval of the status of daemons running the
Hadoop cluster?
The 'jps' command is used for the retrieval of the status of daemons running the Hadoop cluster.

11) What is InputSplit in Hadoop? Explain.


When a Hadoop job runs, it splits input files into chunks and assigns each split to a mapper for processing. It is called the InputSplit.

12) What is TextInputFormat?


In TextInputFormat, each line in the text file is a record. Value is the content of the line while Key is the byte offset of the line. For
instance, Key: longWritable, Value: text
13) What is the SequenceFileInputFormat in Hadoop?
In Hadoop, SequenceFileInputFormat is used to read files in sequence. It is a specific compressed binary file format which passes data
between the output of one MapReduce job to the input of some other MapReduce job.

14) How many InputSplits is made by a Hadoop Framework?


Assuming a 64 MB block size, Hadoop makes 5 splits as follows:
One split for a 64 KB file
Two splits for a 65 MB file, and
Two splits for a 127 MB file

15) What is the use of RecordReader in Hadoop?


An InputSplit is assigned a piece of work but doesn't know how to access it. The RecordReader class is responsible for loading the data
from its source and converting it into (key, value) pairs suitable for reading by the Mapper. The RecordReader's instance is defined by the
InputFormat.

16) What is JobTracker in Hadoop?


JobTracker is a service within Hadoop which runs MapReduce jobs on the cluster.

17) What is WebDAV in Hadoop?


WebDAV is a set of extensions to HTTP which is used to support editing and uploading files. On most operating systems, WebDAV
shares can be mounted as filesystems, so it is possible to access HDFS as a standard filesystem by exposing HDFS over WebDAV.

18) What is Sqoop in Hadoop?


Sqoop is a tool used to transfer data between the Relational Database Management System (RDBMS) and Hadoop HDFS. By using
Sqoop, you can transfer data from RDBMS like MySQL or Oracle into HDFS as well as exporting data from HDFS file to RDBMS.

19) What are the functionalities of JobTracker?


These are the main tasks of JobTracker:
To accept jobs from the client.
To communicate with the NameNode to determine the location of the data.
To locate TaskTracker Nodes with available slots.
To submit the work to the chosen TaskTracker node and monitors the progress of each task.

20) Define TaskTracker.


TaskTracker is a node in the cluster that accepts tasks like MapReduce and Shuffle operations from a JobTracker.

21) What is Map/Reduce job in Hadoop?


Map/Reduce job is a programming paradigm which is used to allow massive scalability across the thousands of server.
MapReduce refers to two different and distinct tasks that Hadoop performs. The first step is the map job, which takes a set of data and
converts it into another set of data. The second step is the reduce job, which takes the output from the map as input and combines those
data tuples into a smaller set of tuples.

22) What is "map" and what is "reducer" in Hadoop?


Map: In Hadoop, a map is a processing phase of a MapReduce job. A map reads data from an input location and outputs a key-value pair
according to the input type.
Reducer: In Hadoop, a reducer collects the output generated by the mapper, processes it, and creates a final output of its own.

23) What is shuffling in MapReduce?


Shuffling is a process which is used to perform the sorting and transfer the map outputs to the reducer as input.

24) What is NameNode in Hadoop?


NameNode is a node, where Hadoop stores all the file location information in HDFS (Hadoop Distributed File System). We can say that
NameNode is the centerpiece of an HDFS file system which is responsible for keeping the record of all the files in the file system, and
tracks the file data across the cluster or multiple machines.

25) What is heartbeat in HDFS?


Heartbeat is a signal which is used between a data node and name node, and between task tracker and job tracker. If the name node or
job tracker doesn't respond to the signal then it is considered that there is some issue with data node or task tracker.

26) How is indexing done in HDFS?


There is a very unique way of indexing in Hadoop. Once the data is stored as per the block size, the HDFS will keep on storing the last
part of the data which specifies the location of the next part of the data.

27) What happens when a data node fails?


If a data node fails, the job tracker and name node will detect the failure. After that, the tasks that were running on the failed node are
re-scheduled on other nodes, and the name node will replicate the user data to another node.

28) What is Hadoop Streaming?


Hadoop Streaming is a utility which allows you to create and run map/reduce jobs. It is a generic API that allows programs written in any
language to be used as Hadoop mappers and reducers, as sketched below.
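As a minimal sketch (the file name mapper.py and the word-count logic are illustrative assumptions, not from this book), a Hadoop Streaming mapper written in Python simply reads lines from standard input and writes tab-separated key/value pairs to standard output:

#!/usr/bin/env python
# mapper.py - illustrative word-count mapper for Hadoop Streaming
import sys

# Hadoop Streaming feeds the input split to this script through stdin
for line in sys.stdin:
    for word in line.strip().split():
        # emit one tab-separated key/value pair per word, the format Hadoop Streaming expects
        print("%s\t%s" % (word, 1))

Such a script is typically passed to the hadoop-streaming jar with the -mapper option (along with -input, -output, and optionally a -reducer script); the exact jar location depends on the Hadoop installation.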

29) What is a combiner in Hadoop?


A Combiner is a mini-reduce process which operates only on data generated by a Mapper. When Mapper emits the data, combiner
receives it as input and sends the output to a reducer.

30) What are the Hadoop's three configuration files?


Following are the three configuration files in Hadoop:
core-site.xml
mapred-site.xml
hdfs-site.xml

31) What are the network requirements for using Hadoop?


Following are the network requirement for using Hadoop:
Password-less SSH connection.
Secure Shell (SSH) for launching server processes.

32) What do you know by storage and compute node?


Storage node: Storage Node is the machine or computer where your file system resides to store the processing data.
Compute Node: Compute Node is a machine or computer where your actual business logic will be executed.

33) Is it necessary to know Java to learn Hadoop?


If you have a background in any programming language like C, C++, PHP, Python, or Java, it will be really helpful. If you have no Java
knowledge at all, it is necessary to learn Java and also get basic knowledge of SQL.

34) How to debug Hadoop code?


There are many ways to debug Hadoop codes but the most popular methods are:
By using Counters.
By web interface provided by the Hadoop framework.

35) Is it possible to provide multiple inputs to Hadoop? If yes, explain.


Yes, It is possible. The input format class provides methods to insert multiple directories as input to a Hadoop job.

36) What is the relation between job and task in Hadoop?


In Hadoop, a job is divided into multiple small parts known as tasks.
37) What is the difference between Input Split and HDFS Block?
The Logical division of data is called Input Split and physical division of data is called HDFS Block.

38) What is the difference between RDBMS and Hadoop?


RDBMS | Hadoop
RDBMS is a relational database management system. | Hadoop is a node based flat structure.
RDBMS is used for OLTP processing. | Hadoop is used for analytical and for big data processing.
In RDBMS, the database cluster uses the same data files stored in shared storage. | In Hadoop, the storage data can be stored independently in each processing node.
In RDBMS, preprocessing of data is required before storing it. | In Hadoop, you don't need to preprocess data before storing it.

39) What is the difference between HDFS and NAS?


HDFS data blocks are distributed across local drives of all machines in a cluster whereas, NAS data is stored on dedicated hardware.

40) What is the difference between Hadoop and other data processing tools?
Hadoop allows you to increase or decrease the number of mappers without worrying about the volume of data to be processed.
PYTHON
FOR
BEGINNERS

LEARN TO CODE FAST


BY
TAM SEL
Python for Beginners
Python is a simple, easy to learn, powerful, high level and object-oriented programming language.
Python is an interpreted scripting language also. Guido Van Rossum is known as the founder of python programming.
Python is a general purpose, dynamic, high level and interpreted programming language. It supports Object Oriented programming
approach to develop applications. It is simple and easy to learn and provides lots of high-level data structures.
Python is easy to learn yet powerful and versatile scripting language which makes it attractive for Application Development.
Python's syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application
development.
Python supports multiple programming paradigms, including object-oriented, imperative, and functional or procedural programming styles.
Python is not tied to one special area such as web programming. That is why it is known as a multipurpose language: it can be
used for web, enterprise, 3D CAD, and other applications.

CHAPTER 1
Installing Python
Visit the link https://www.python.org/downloads/ to download the latest release of Python.

Double-click the executable file which is downloaded; the following window will open. Select Customize installation and proceed.
Now click Install Now.
When it finishes, you see a screen that says the Setup was successful.

Get Started with PyCharm


In our first program, we have used gedit on our CentOS as an editor. On Windows, we have an alternative like notepad or notepad++ to
edit the code. However, these editors are not used as IDE for python since they are unable to show the syntax related suggestions.
JetBrains provides the most popular and a widely used cross-platform IDE PyCharm to run the python programs.
PyCharm installation
As we have already stated, PyCharm is a cross-platform IDE, and hence it can be installed on a variety of the operating systems. In this
section of the tutorial, we will cover the installation process of PyCharm on Windows, MacOS, CentOS, and Ubuntu.
Windows
Installing PyCharm on Windows is very simple. To install PyCharm on the Windows operating system, visit the
link https://www.jetbrains.com/pycharm/download/download-thanks.html?platform=windows to download the executable installer. Double click the installer
(.exe) file and install PyCharm by clicking next at each step.

Hello World: Create your First Python Program


In the last tutorial, we completed Python installation and setup. It's time to create first program.
Creating First Program
Step 1) Open PyCharm Editor. You see the introductory screen for PyCharm. To create a new project, click “Create New Project”.
Step 2) You will need to select a location.
1. You can select the location where you want the project to be created. If you don't want to change the location, keep it as it is, but at
least change the name from “untitled” to something more meaningful, like “FirstProject”.
2. PyCharm should have found Python interpreter you installed earlier.
3. Next Click “Create” Button.
Step 3) Now Go up to “File” menu and select “New”. Next , select “Python File”.

Step 4) A new pop-up appears. Next, type the name of the file you want (here we use “HelloWorld”) and hit “OK”.
Step 5) Next type a simple program - print (‘Hello World!’).

Step 6) Now Go up to “Run” menu and select “Run” to run your program.

Step 7) You can see output of your program at the bottom of the screen.

Step 8) Don't worry if you don't have the PyCharm editor installed; you can still run the code from the command prompt. Enter the correct path of the
file in the command prompt to run the program.

Python Main Function with Examples


Before we jump into more Python coding, we should familiarize ourselves with the Python main function and its importance.
See the following code
def main():
    print("hello world!")

print("world2020")
Here we have two print statements: one is defined within the main function ("hello world!") and the other one is independent ("world2020"). When you run this file:
● only "world2020" prints out,
● and not "hello world!".
This is because we did not call the main function; the call is normally guarded by if __name__ == "__main__":.
● When the Python interpreter reads a source file, it executes all the code found in it.
● When Python runs the source file as the main program, it sets the special variable __name__ to the value "__main__".
● When you execute the file, the "if" statement then checks whether __name__ is equal to "__main__" and, if so, calls main().
● In Python, if __name__ == "__main__": allows you to run a Python file either as a reusable module or as a standalone program.
Like C, Python uses == for comparison and = for assignment. The Python interpreter can therefore run a file in two ways: as a standalone program or as an imported module.
It is important that after defining the main function, we call it under if __name__ == "__main__": and then run the file; only then will we
get the output "hello world!" in the programming console, as shown in the sketch below.
Note: Make sure that after defining the main function you indent its body and do not write the code at the same level as the def main(): line,
otherwise it will give an indentation error.
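A minimal sketch of the complete program (written with Python 3 style print() calls) looks like this:

def main():
    print("hello world!")

print("world2020")

# call main() only when this file is run as the main program
if __name__ == "__main__":
    main()

Running the file now prints world2020 followed by hello world!. If the file were imported from another module instead, only world2020 would print, because __name__ would then hold the module's name rather than "__main__".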
Python Variables
A variable is a name which is used to refer to a memory location. A variable is also known as an identifier and is used to hold a value.
In Python, we don't need to specify the type of a variable because Python is a type-inferred language and is smart enough to determine the variable type.
Variable names can be a group of both letters and digits, but they have to begin with a letter or an underscore.
It is recommended to use lowercase letters for variable names. Rahul and rahul are two different variables.
Identifier Naming
Variables are the example of identifiers. An Identifier is used to identify the literals used in the program. The rules to name an identifier
are given below.
The first character of the variable must be an alphabet or underscore ( _ ).
All the characters except the first character may be an alphabet of lower-case(a-z), upper-case (A-Z), underscore or digit (0-9).
Identifier name must not contain any white-space, or special character (!, @, #, %, ^, &, *).
Identifier names must not be the same as any keyword defined in the language.
Identifier names are case sensitive; for example, myname and MyName are not the same.
Examples of valid identifiers : a123, _n, n_9, etc.
Examples of invalid identifiers: 1a, n%4, n 9, etc.

Concatenate Variables
Let's see whether we can concatenate different data types, like a string and a number, together. For example, we will concatenate "world" with the
number 2020.
Unlike Java, which concatenates a number with a string without declaring the number as a string, Python requires the number to be converted to a string;
otherwise it will show a TypeError.
For the following code, you will get a TypeError:
a = "world"
b = 2020
print(a + b)
Once the integer is converted to a string, the two can be concatenated: "world" + str(2020) = "world2020" in the output.
a="world"
b = 2020
print(a+str(b))

Local & Global Variables


In Python, when we want to use the same variable for the rest of the program or module, we declare it as a global variable, while if we want
to use the variable only in a specific function or method, we use a local variable.
Let's understand this difference between local and global variable with below program.

1. Variable "f" is global in scope and it is assigned value 101 which is printed in output
2. Variable f is again declared in function and it assumes local scope. That is assigned value "I am learning Python." which
is printed out as output. This variable is different from the global variable "f" define earlier in this chapter
3. Once function call is over, the local variable f is destroyed. At line 12, when u again, print the value of "f" is it displays
the value of global variable f=101

Python 2 Example
# Declare a variable and initialize it
f = 101
print f
# Global vs. local variables in functions
def someFunction():
    # global f
    f = 'I am learning Python'
    print f

someFunction()
print f

Python 3 Example
# Declare a variable and initialize it
f = 101
print(f)
# Global vs. local variables in functions
def someFunction():
    # global f
    f = 'I am learning Python'
    print(f)

someFunction()
print(f)

Using keyword global, we can reference the global variable inside a function.

1. Variable "f" is global in scope and is assigned value 101 which printed in output
2. Variable f declared using the keyword global. This NOT a local variable, but the same global variable declared earlier.
Hence we print its value, the output is 101
3. We changed value of "f" inside the function. Once the function call over, the changed value of variable "f" persists. At
line 12, when we again, print value of "f" is it displays the value "changing global variable"

Python 2 Example

f = 101
print f
# Global vs. local variables in functions
def someFunction():
    global f
    print f
    f = "changing global variable"

someFunction()
print f

Python 3 Example
f = 101
print(f)
# Global vs. local variables in functions
def someFunction():
    global f
    print(f)
    f = "changing global variable"

someFunction()
print(f)

Delete a variable
We can also delete a variable using the command del "variable name".
In the example below, we delete the variable f, and when we then try to print it, we get the error "name 'f' is not defined", which means
the variable has been deleted.

f = 11;
print(f)
del f
print(f)
Summary:
● Variables are referred to as "envelopes" or "buckets" where information can be maintained and referenced. Like any other programming
language, Python also uses variables to store information.
● Variables can be declared with any name, even single letters like a, aa, abc, etc.
● Variables can be re-declared even after they have been declared once.
● In Python we cannot concatenate a string with a number directly; you need to convert the number to a string, and after that you
can concatenate it with the string.
● Declare a local variable when you want to use it only in the current function.
● Declare a global variable when you want to use the same variable for the rest of the program.
● To delete a variable, use the keyword "del".
CHAPTER 2
Python String
Till now, we have discussed numbers as the standard data types in python. In this section of the tutorial, we will discuss the most
popular data type in python i.e., string.
In python, strings can be created by enclosing the character or the sequence of characters in the quotes. Python allows us to use single
quotes, double quotes, or triple quotes to create the string.
Consider the following example in python to create a string.
str = "Hi Python !"

Here, if we check the type of the variable str using a python script
print(type(str)), then it will print string (str).
In Python, strings are treated as sequences of characters, which means that Python doesn't support a separate character data type; instead, a single
character written as 'p' is treated as a string of length 1.
Strings indexing and splitting
Like other languages, the indexing of Python strings starts from 0. For example, the string "HELLO" is indexed as shown in the
example below.
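A small sketch of indexing on the string "HELLO" (the printed values are shown as comments):

str = "HELLO"
print(str[0])    # H   - indexing starts at 0
print(str[4])    # O   - the last character is at index len(str) - 1
print(str[-1])   # O   - negative indexing counts from the end
print(str[1:4])  # ELL - a slice from index 1 up to, but not including, index 4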

Reassigning strings
Updating the content of a string is as easy as assigning it to a new string. The string object doesn't support item assignment, i.e., a
string can only be replaced with a new string since its content cannot be partially replaced. Strings are immutable in Python.
Consider the following example.
Example 1
str = "HELLO"
str[0] = "h"
print(str)
Output:
Traceback (most recent call last):
File "12.py", line 2, in <module>
str[0] = "h";
TypeError: 'str' object does not support item assignment

However, in example 1, the string str can be completely assigned to a new content as specified in the following example.
Example 2
str = "HELLO"
print(str)
str = "hello"
print(str)
Output:
HELLO
hello

Example
Consider the following example to understand the real use of Python operators.
str = "Hello"
str1 = " world"
print (str* 3 ) # prints HelloHelloHello
print (str+str1) # prints Hello world
print (str[ 4 ]) # prints o
print (str[ 2 : 4 ]); # prints ll
print ( 'w' in str) # prints false as w is not present in str
print ( 'wo' not in str1) # prints false as wo is present in str1.
print (r 'C://python37' ) # prints C://python37 as it is written
print ( "The string str : %s" %(str)) # prints The string str : Hello
Output:
HelloHelloHello
Hello world
o
ll
False
False
C://python37
The string str : Hello

Python Formatting operator


Python allows us to use the format specifiers used in C's printf statement. The format specifiers in python are treated in the same way as
they are treated in C. However, Python provides an additional operator % which is used as an interface between the format specifiers
and their values. In other words, we can say that it binds the format specifiers to the values.
Consider the following example.
Integer = 10;
Float = 1.290
String = "Ayush"
print("Hi I am Integer ... My value is %d\nHi I am float ... My value is %f\nHi I am string ... My value is %s"%
(Integer,Float,String));
Output:
Hi I am Integer ... My value is 10
Hi I am float ... My value is 1.290000
Hi I am string ... My value is Ayush
CHAPTER 3
Python Tuple
A Python tuple is used to store a sequence of immutable Python objects. A tuple is similar to a list except that a list is mutable, so the value of the
items stored in a list can be changed, whereas a tuple is immutable and the value of the items stored in a tuple cannot be changed.
A tuple can be written as a collection of comma-separated values enclosed in parentheses. A tuple can be defined as follows.
T1 = (101, "Ayush", 22)
T2 = ("Apple", "Banana", "Orange")

Example
tuple1 = (10, 20, 30, 40, 50, 60)
print(tuple1)
count = 0
for i in tuple1:
    print("tuple1[%d] = %d" % (count, i))
    count = count + 1
Output:
(10, 20, 30, 40, 50, 60)
tuple1[0] = 10
tuple1[1] = 20
tuple1[2] = 30
tuple1[3] = 40
tuple1[4] = 50
tuple1[5] = 60

Example 2
tuple1 = tuple(input("Enter the tuple elements ..."))
print(tuple1)
count = 0
for i in tuple1:
    print("tuple1[%d] = %s" % (count, i))
    count = count + 1

Output:
Enter the tuple elements ...12345
('1', '2', '3', '4', '5')
tuple1[0] = 1
tuple1[1] = 2
tuple1[2] = 3
tuple1[3] = 4
tuple1[4] = 5

However, if we try to reassign the items of a tuple, we would get an error as the tuple object doesn't support the item assignment.
An empty tuple can be written as follows.
T3 = ()

The tuple having a single value must include a comma as given below.
T4 = (90,)

A tuple is indexed in the same way as the lists. The items in the tuple can be accessed by using their specific index value.
We will see all these aspects of tuple in this section of the tutorial.
Tuple indexing and splitting
The indexing and slicing in tuple are similar to lists. The indexing in the tuple starts from 0 and goes to length(tuple) - 1.
The items in the tuple can be accessed by using the slice operator. Python also allows us to use the colon operator to access multiple
items in the tuple.
Consider the following example to understand indexing and slicing in detail.
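A short sketch of tuple indexing and slicing (the expected values are shown as comments):

tuple1 = (1, 2, 3, 4, 5)
print(tuple1[0])    # 1        - indexing starts at 0
print(tuple1[1:3])  # (2, 3)   - slice from index 1 up to, but not including, 3
print(tuple1[:3])   # (1, 2, 3)
print(tuple1[2:])   # (3, 4, 5)
print(tuple1[-1])   # 5        - a negative index counts from the end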
Unlike lists, the tuple items can not be deleted by using the del keyword as tuples are immutable. To delete an entire tuple, we can use
the del keyword with the tuple name.
Consider the following example.
tuple1 = (1, 2, 3, 4, 5, 6)
print(tuple1)
del tuple1[0]   # raises TypeError: tuples do not support item deletion
print(tuple1)
del tuple1      # deleting the entire tuple is allowed
print(tuple1)   # raises NameError because tuple1 no longer exists

Output:
(1, 2, 3, 4, 5, 6)
Traceback (most recent call last):
File "tuple.py", line 3, in <module>
del tuple1[0]
TypeError: 'tuple' object doesn't support item deletion

If the del tuple1[0] line is removed, the remaining statements run: del tuple1 deletes the entire tuple, and the final print(tuple1) then raises NameError: name 'tuple1' is not defined.

Like lists, the tuple elements can be accessed in both directions. The rightmost (last) element of the tuple can be accessed by using
the index -1. With negative indexing, the elements are counted from the right end of the tuple.
Consider the following example.
tuple1 = (1, 2, 3, 4, 5)
print(tuple1[-1])
print(tuple1[-4])

Output:
5
2

Where to use a tuple


A tuple is used instead of a list in the following scenarios.
1. Using a tuple instead of a list makes it clear that the data is constant and must not be changed.
2. A tuple can simulate a record (a dictionary without keys). Consider the following nested structure, which can be used like a table of records.
[(101, "John", 22), (102, "Mike", 28), (103, "Dustin", 30)]

3. A tuple can be used as a key inside a dictionary due to its immutable nature, as shown in the sketch below.
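For example (the coordinates and city names below are made up for illustration), a tuple works as a dictionary key because it is immutable and therefore hashable, while a list cannot be used this way:

# tuples of (latitude, longitude) used as dictionary keys
locations = {(28.61, 77.20): "Delhi", (19.07, 72.87): "Mumbai"}
print(locations[(28.61, 77.20)])  # Delhi
# a list key such as {[28.61, 77.20]: "Delhi"} would raise TypeError: unhashable type: 'list'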
CHAPTER 4
Python Dictionary
Dictionary is used to implement the key-value pair in python. The dictionary is the data type in python which can simulate the real-life
data arrangement where some specific value exists for some particular key.
In other words, we can say that a dictionary is the collection of key-value pairs where the value can be any python object whereas the
keys are the immutable python object, i.e., Numbers, string or tuple.
Dictionary simulates Java hash-map in python.
Creating the dictionary
The dictionary can be created by using multiple key-value pairs, where each key is separated from its value by a colon (:) and the pairs are
separated by commas. The collection of key-value pairs is enclosed within curly braces {}.
The syntax to define the dictionary is given below.
Dict = {"Name": "Ayush","Age": 22}
In the above dictionary Dict, The keys Name, and Age are the string that is an immutable object.
Let's see an example to create a dictionary and printing its content.
Employee = {"Name": "John", "Age": 29, "salary":25000,"Company":"GOOGLE"}
print(type(Employee))
print("printing Employee data .... ")
print(Employee)
Output
<class 'dict'>
printing Employee data ....
{'Age': 29, 'salary': 25000, 'Name': 'John', 'Company': 'GOOGLE'}
Accessing the dictionary values
We have discussed how the data can be accessed in the list and tuple by using the indexing.
However, the values can be accessed in the dictionary by using the keys as keys are unique in the dictionary.
The dictionary values can be accessed in the following way.
Employee = { "Name" : "John" , "Age" : 29 , "salary" : 25000 , "Company" : "GOOGLE" }
print (type(Employee))
print ( "printing Employee data .... " )
print ( "Name : %s" %Employee[ "Name" ])
print ( "Age : %d" %Employee[ "Age" ])
print ( "Salary : %d" %Employee[ "salary" ])
print ( "Company : %s" %Employee[ "Company" ])
Output:
<class 'dict'>
printing Employee data ....
Name : John
Age : 29
Salary : 25000
Company : GOOGLE
Python also provides the get() method as an alternative way to access dictionary values. It gives the same result as indexing when the key exists,
as shown in the sketch below.
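A small sketch of the get() method (the Dept key below is deliberately missing from the dictionary to show the default behaviour):

Employee = {"Name": "John", "Age": 29, "salary": 25000, "Company": "GOOGLE"}
print(Employee.get("Name"))        # John, same result as Employee["Name"]
print(Employee.get("Dept"))        # None, instead of raising a KeyError
print(Employee.get("Dept", "NA"))  # NA, the supplied default when the key is missing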
Updating dictionary values
The dictionary is a mutable data type, and its values can be updated by using the specific keys.
Let's see an example to update the dictionary values.
Employee = { "Name" : "John" , "Age" : 29 , "salary" : 25000 , "Company" : "GOOGLE" }
print (type(Employee))
print ( "printing Employee data .... " )
print (Employee)
print ( "Enter the details of the new employee...." );
Employee[ "Name" ] = input( "Name: " );
Employee[ "Age" ] = int(input( "Age: " ));
Employee[ "salary" ] = int(input( "Salary: " ));
Employee[ "Company" ] = input( "Company:" );
print ( "printing the new data" );
print (Employee)
Output:

<class 'dict'>
printing Employee data ....
{'Name': 'John', 'salary': 25000, 'Company': 'GOOGLE', 'Age': 29}
Enter the details of the new employee....
Name: David
Age: 19
Salary: 8900
Company:JTP
printing the new data
{'Name': 'David', 'salary': 8900, 'Company': 'JTP', 'Age': 19}
Deleting elements using del keyword
The items of the dictionary can be deleted by using the del keyword as given below.
Employee = { "Name" : "John" , "Age" : 29 , "salary" : 25000 , "Company" : "GOOGLE" }
print (type(Employee))
print ( "printing Employee data .... " )
print (Employee)
print ( "Deleting some of the employee data" )
del Employee[ "Name" ]
del Employee[ "Company" ]
print ( "printing the modified information " )
print (Employee)
print ( "Deleting the dictionary: Employee" );
del Employee
print ( "Lets try to print it again " );
print (Employee)
Output:

<class 'dict'>
printing Employee data ....
{'Age': 29, 'Company': 'GOOGLE', 'Name': 'John', 'salary': 25000}
Deleting some of the employee data
printing the modified information
{'Age': 29, 'salary': 25000}
Deleting the dictionary: Employee
Lets try to print it again
Traceback (most recent call last):
File "list.py", line 13, in <module>
print(Employee)
NameError: name 'Employee' is not defined
Iterating Dictionary
A dictionary can be iterated using the for loop as given below.
Example 1
# for loop to print all the keys of a dictionary
Employee = { "Name" : "John" , "Age" : 29 , "salary" : 25000 , "Company" : "GOOGLE" }
for x in Employee:
print (x);
Output:

Name
Company
salary
Age
Example 2
#for loop to print all the values of the dictionary

Employee = { "Name" : "John" , "Age" : 29 , "salary" : 25000 , "Company" : "GOOGLE" }


for x in Employee:
print (Employee[x]);
Output:
29
GOOGLE
John
25000
Example 3
#for loop to print the values of the dictionary by using values() method.
Employee = { "Name" : "John" , "Age" : 29 , "salary" : 25000 , "Company" : "GOOGLE" }
for x in Employee.values():
print (x);
Output:

GOOGLE
25000
John
29
Example 4
#for loop to print the items of the dictionary by using items() method.
Employee = { "Name" : "John" , "Age" : 29 , "salary" : 25000 , "Company" : "GOOGLE" }
for x in Employee.items():
print (x);
Output:
('Name', 'John')
('Age', 29)
('salary', 25000)
('Company', 'GOOGLE')
Properties of Dictionary keys
1. In a dictionary, we cannot store multiple values for the same key. If we pass more than one value for a single key, then the value
which is assigned last is considered as the value of the key.
Consider the following example.
Employee = { "Name" : "John" , "Age" : 29 , "Salary" : 25000 , "Company" : "GOOGLE" , "Name" : "Johnn" }
for x,y in Employee.items():
print (x,y)
Output:

Salary 25000
Company GOOGLE
Name Johnn
Age 29
2. In python, the key cannot be any mutable object. We can use numbers, strings, or tuple as the key but we can not use any mutable
object like the list as the key in the dictionary.
Consider the following example.
Employee = { "Name" : "John" , "Age" : 29 , "salary" : 25000 , "Company" : "GOOGLE" ,[ 100 , 201 , 301 ]: "Department ID" }
for x,y in Employee.items():
print (x,y)
Output:

Traceback (most recent call last):


File "list.py", line 1, in
Employee = {"Name": "John", "Age": 29, "salary":25000,"Company":"GOOGLE",[100,201,301]:"Department ID"}
TypeError: unhashable type: 'list'
CHAPTER 5
Python Operators
The operator can be defined as a symbol which is responsible for a particular operation between two operands. Operators are the pillars
of a program on which the logic is built in a particular programming language. Python provides a variety of operators described as
follows.
Arithmetic operators
Comparison operators
Assignment Operators
Logical Operators
Bitwise Operators
Membership Operators
Identity Operators

Arithmetic operators
Arithmetic operators are used to perform arithmetic operations between two operands. They include + (addition), - (subtraction), *
(multiplication), / (division), % (remainder), // (floor division), and ** (exponent).
Consider the following table for a detailed explanation of arithmetic operators.

Operator and Description


+ (Addition) - It is used to add two operands. For example, if a = 20, b = 10 => a + b = 30

- (Subtraction) - It is used to subtract the second operand from the first operand. If the first operand is less than the second
operand, the result is negative. For example, if a = 20, b = 10 => a - b = 10

/ (Division) - It returns the quotient after dividing the first operand by the second operand. For example, if a = 20, b = 10 => a / b = 2

* (Multiplication) - It is used to multiply one operand with the other. For example, if a = 20, b = 10 => a * b = 200

% (Remainder) - It returns the remainder after dividing the first operand by the second operand. For example, if a = 20, b = 10 => a % b = 0

** (Exponent) - It calculates the first operand raised to the power of the second operand.

// (Floor division) - It gives the floor value of the quotient produced by dividing the two operands.
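The following sketch tries each arithmetic operator with a = 20 and b = 10 (results shown as comments; note that / produces 2.0 in Python 3 because it always performs true division):

a = 20
b = 10
print(a + b)   # 30
print(a - b)   # 10
print(a * b)   # 200
print(a / b)   # 2.0 (true division in Python 3)
print(a % b)   # 0
print(a ** 2)  # 400
print(a // 3)  # 6 (floor of 20 / 3)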

Comparison operator
Comparison operators are used to compare the values of two operands and return a boolean True or False accordingly. The
comparison operators are described in the following table.

Operator and Description


== If the value of two operands is equal, then the condition becomes true.

!= If the value of two operands is not equal then the condition becomes true.

<= If the first operand is less than or equal to the second operand, then the condition becomes true.

>= If the first operand is greater than or equal to the second operand, then the condition becomes true.

<> If the value of two operands is not equal, then the condition becomes true (this operator is available only in Python 2; use != in Python 3).

> If the first operand is greater than the second operand, then the condition becomes true.

< If the first operand is less than the second operand, then the condition becomes true.
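A short sketch of the comparison operators with a = 20 and b = 10 (remember that <> works only in Python 2, so it is not shown here):

a = 20
b = 10
print(a == b)  # False
print(a != b)  # True
print(a > b)   # True
print(a < b)   # False
print(a >= b)  # True
print(a <= b)  # False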

Python assignment operators


The assignment operators are used to assign the value of the right expression to the left operand. The assignment operators are described
in the following table.

Operator and Description

= It assigns the value of the right expression to the left operand.

+= It increases the value of the left operand by the value of the right operand and assign the modified value back to left operand. For
example, if a = 10, b = 20 => a+ = b will be equal to a = a+ b and therefore, a = 30.

-= It decreases the value of the left operand by the value of the right operand and assign the modified value back to left operand.
For example, if a = 20, b = 10 => a- = b will be equal to a = a- b and therefore, a = 10.

*= It multiplies the value of the left operand by the value of the right operand and assign the modified value back to left operand.
For example, if a = 10, b = 20 => a* = b will be equal to a = a* b and therefore, a = 200.

%= It divides the value of the left operand by the value of the right operand and assigns the remainder back to the left operand. For
example, if a = 20, b = 10 => a %= b will be equal to a = a % b and therefore, a = 0.

**= a**=b will be equal to a=a**b, for example, if a = 4, b =2, a**=b will assign 4**2 = 16 to a.

//= A//=b will be equal to a = a// b, for example, if a = 4, b = 3, a//=b will assign 4//3 = 1 to a.

Bitwise operator
The bitwise operators perform bit by bit operation on the values of the two operands.
For example,
if a = 7,
b = 3
then, binary (a) = 0111
binary (b) = 0011

hence, a & b = 0011


a | b = 0111
a ^ b = 0100
~ a = 1000

Operator and Description


& (binary and) - If both the bits at the same place in two operands are 1, then 1 is copied to the result. Otherwise, 0 is copied.

| (binary or) - The resulting bit will be 0 if both the bits are zero otherwise the resulting bit will be 1.

^ (binary xor) - The resulting bit will be 1 if both the bits are different otherwise the resulting bit will be 0.

~ (negation) - It calculates the negation of each bit of the operand, i.e., if the bit is 0, the resulting bit will be 1 and vice versa.

<< (left shift) - The left operand value is moved left by the number of bits present in the right operand.

>> (right shift) - The left operand is moved right by the number of bits present in the right operand.
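A small sketch of the bitwise operators using the values a = 7 and b = 3 from the example above (Python prints ~7 as -8 because it uses two's complement; the 4-bit pattern is the 1000 shown earlier):

a = 7   # 0111
b = 3   # 0011
print(a & b)   # 3  (0011)
print(a | b)   # 7  (0111)
print(a ^ b)   # 4  (0100)
print(~a)      # -8 (two's complement view of the bit pattern 1000)
print(a << 1)  # 14 (0111 shifted left by one bit gives 1110)
print(a >> 1)  # 3  (0111 shifted right by one bit gives 0011)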

Logical Operators
The logical operators are used primarily in the expression evaluation to make a decision. Python supports the following logical
operators.

Operator and Description


and - If both the expressions are true, then the condition will be true. If a and b are the two expressions, a → true, b → true => a and b →
true.

or - If one of the expressions is true, then the condition will be true. If a and b are the two expressions, a → true, b → false => a
or b → true.

not - If an expression a is true, then not(a) will be false and vice versa.

Membership Operators
Python membership operators are used to check whether a value is present inside a data structure such as a list, tuple, string, or dictionary. If the value is present in the data
structure, then the result is True; otherwise it returns False.
Operator and Description

in - It is evaluated to be true if the first operand is found in the second operand (list, tuple, or dictionary).

not in - It is evaluated to be true if the first operand is not found in the second operand (list, tuple, or dictionary).

Identity Operators
Operator and Description

is - It is evaluated to be true if the references on both sides point to the same object.
is not - It is evaluated to be true if the references on both sides do not point to the same object.
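A short sketch of the membership and identity operators (the fruit list and the numbers are only illustrative):

fruits = ["apple", "banana", "mango"]
print("apple" in fruits)       # True
print("grape" not in fruits)   # True

x = [1, 2, 3]
y = x              # y refers to the same list object as x
z = [1, 2, 3]      # z is an equal but separate list object
print(x is y)      # True  - same object
print(x is z)      # False - equal values, different objects
print(x is not z)  # True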

Operator Precedence
It is important to know the precedence of the operators since it tells us which operator is evaluated first. The
precedence table of the operators in Python is given below.
Operator and Description
** The exponent operator is given priority over all the others used in the expression.

~+- The negation, unary plus and minus.

* / % // Multiplication, division, modulus (remainder), and floor division.

+- Binary plus and minus

>> << Left shift and right shift

& Binary and.

^| Binary xor and or

<= < > >= Comparison operators (less than, less than or equal to, greater than, greater than or equal to).

<> == != Equality operators.

= %= /= //= -= +=
*= **= Assignment operators

is is not Identity operators

in not in Membership operators

not or and Logical operators
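A small sketch showing how precedence affects evaluation (results shown as comments):

print(2 + 3 * 4)         # 14, because * is evaluated before +
print((2 + 3) * 4)       # 20, parentheses override the default precedence
print(2 ** 3 ** 2)       # 512, ** groups right to left: 2 ** (3 ** 2)
print(not True or True)  # True, because not is applied before or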


CHAPTER 6
Python Functions
Functions are the most important aspect of an application. A function can be defined as the organized block of reusable code which can
be called whenever required.
Python allows us to divide a large program into basic building blocks known as functions. In Python, the body of a function is a block of
indented statements; Python does not use curly braces {} to delimit blocks. A function can be called multiple times to provide reusability and modularity to the Python
program.
In other words, we can say that the collection of functions creates a program. The function is also known as procedure or subroutine in
other programming languages.
Python provides various inbuilt functions like range() or print(). In addition, the user can create their own functions, which are called user-
defined functions.
Advantage of functions in python
There are the following advantages of Python functions.

By using functions, we can avoid rewriting same logic/code again and again in a program.
We can call python functions any number of times in a program and from any place in a program.
We can track a large python program easily when it is divided into multiple functions.
Reusability is the main achievement of python functions.
However, function calls always add some overhead in a Python program.

Creating a function
In python, we can use def keyword to define the function. The syntax to define a function in python is given below.
def my_function():
    function-suite
    return <expression>
The function block is started with the colon (:) and all the same level block statements remain at the same indentation.
A function can accept any number of parameters that must be the same in the definition and function calling.
Function calling
In python, a function must be defined before the function calling otherwise the python interpreter gives an error. Once the function is
defined, we can call it from another function or the python prompt. To call the function, use the function name followed by the
parentheses.
A simple function that prints the message "Hello Word" is given below.
def hello_world():
    print("hello world")

hello_world()
Output:
hello world
Parameters in function
The information into the functions can be passed as the parameters. The parameters are specified in the parentheses. We can give any
number of parameters, but we have to separate them with a comma.
Consider the following example which contains a function that accepts a string as the parameter and prints it.
Example 1
#defining the function
def func(name):
    print("Hi ", name)

#calling the function
func("Ayush")
Example 2
#python function to calculate the sum of two variables
#defining the function
def sum(a, b):
    return a + b

#taking values from the user
a = int(input("Enter a: "))
b = int(input("Enter b: "))

#printing the sum of a and b
print("Sum = ", sum(a, b))
Output:
Enter a: 10
Enter b: 20
Sum = 30
Call by reference in Python
In Python, arguments are passed by reference (more precisely, by object reference), i.e., changes made through the reference inside the function are
reflected in the original object referred to by the reference.
However, there is an exception in the case of immutable objects: changes made to an immutable object such as a string do not affect
the original string; rather, a new string object is made, and therefore two different objects are printed.
Example 1 Passing a Mutable Object (List)
#defining the function
def change_list(list1):
    list1.append(20)
    list1.append(30)
    print("list inside function = ", list1)

#defining the list
list1 = [10, 30, 40, 50]

#calling the function
change_list(list1)
print("list outside function = ", list1)
Output:
list inside function = [10, 30, 40, 50, 20, 30]
list outside function = [10, 30, 40, 50, 20, 30]
Example 2 Passing an Immutable Object (String)
#defining the function
def change_string(str):
    str = str + " Hows you"
    print("printing the string inside function :", str)

string1 = "Hi I am there"

#calling the function
change_string(string1)

print("printing the string outside function :", string1)


Output:
printing the string inside function : Hi I am there Hows you
printing the string outside function : Hi I am there
Types of arguments
There may be several types of arguments which can be passed at the time of function calling.

1. Required arguments
2. Keyword arguments
3. Default arguments
4. Variable-length arguments

Required Arguments
Till now, we have learned about function calling in python. However, we can provide the arguments at the time of function calling. As
far as the required arguments are concerned, these are the arguments which are required to be passed at the time of function calling with
the exact match of their positions in the function call and function definition. If either of the arguments is not provided in the function
call, or the position of the arguments is changed, then the python interpreter will show the error.
Consider the following example.
Example 1
#the argument name is the required argument to the function func
def func(name):
    message = "Hi " + name
    return message

name = input("Enter the name?")
print(func(name))
Output:
Enter the name?John
Hi John
Example 2
#the function simple_interest accepts three arguments and returns the simple interest accordingly
def simple_interest(p, t, r):
    return (p * t * r) / 100

p = float(input("Enter the principle amount? "))
r = float(input("Enter the rate of interest? "))
t = float(input("Enter the time in years? "))
print("Simple Interest: ", simple_interest(p, t, r))
Output:
Enter the principle amount? 10000
Enter the rate of interest? 5
Enter the time in years? 2
Simple Interest: 1000.0
Example 3
#the function calculate returns the sum of two arguments a and b
def calculate(a, b):
    return a + b

calculate(10)  # this causes an error as we are missing the required argument b
Output:
TypeError: calculate() missing 1 required positional argument: 'b'
Keyword arguments
Python allows us to call a function with keyword arguments. This kind of function call lets us pass the arguments in any
order.
The argument names are treated as keywords and matched against the parameter names in the function definition. When a match is found, the
value of the argument is assigned to the corresponding parameter.
Consider the following example.
Example 1
#function func is called with the name and message as the keyword arguments
def func(name, message):
    print("printing the message with", name, "and ", message)

func(name="John", message="hello")  #name and message are assigned the values John and hello respectively
Output:
printing the message with John and hello
Example 2 providing the values in different order at the calling
#The function simple_interest(p, t, r) is called with the keyword arguments the order of arguments doesn't matter in this case
def simple_interest(p, t, r):
    return (p * t * r) / 100

print("Simple Interest: ", simple_interest(t=10, r=10, p=1900))
Output:
Simple Interest: 1900.0
If we provide the different name of arguments at the time of function call, an error will be thrown.
Consider the following example.
Example 3
#The function simple_interest(p, t, r) is called with the keyword arguments.
def simple_interest(p, t, r):
    return (p * t * r) / 100

print("Simple Interest: ", simple_interest(time=10, rate=10, principle=1900))  # doesn't find an exact match for the argument names (keywords)
Output:
TypeError: simple_interest() got an unexpected keyword argument 'time'
Python allows us to mix required (positional) arguments and keyword arguments in a function call. However, a positional argument must not be given after a
keyword argument: once a keyword argument appears in the call, all the following arguments must also be keyword arguments.
Consider the following example.

Example 4
def func(name1, message, name2):
    print("printing the message with", name1, ",", message, ",and", name2)

func("John", message="hello", name2="David")  #the first argument is not a keyword argument
Output:
printing the message with John , hello ,and David

The following example will cause an error due to an improper mix of keyword and positional arguments in the function
call.
Example 5
def func(name1, message, name2):
    print("printing the message with", name1, ",", message, ",and", name2)

func("John", message="hello", "David")
Output:
SyntaxError: positional argument follows keyword argument
Default Arguments
Python allows us to give parameters default values in the function definition. If a value for such an argument is not provided at the time of the
function call, the argument takes the default value given in the definition.
Example 1
def printme(name, age=22):
    print("My name is", name, "and age is", age)

printme(name="john")  #the variable age is not passed into the function; the default value of age is used
Output:
My name is john and age is 22
Example 2
def printme(name, age=22):
    print("My name is", name, "and age is", age)

printme(name="john")  #the variable age is not passed into the function; the default value of age is used
printme(age=10, name="David")  #the default value of age is overridden here, 10 will be printed as age
Output:
My name is john and age is 22
My name is David and age is 10
Variable length Arguments
In large projects, we sometimes do not know in advance how many arguments will be passed. For such cases, Python gives us
the flexibility to pass a comma-separated list of values, which is collected into a tuple inside the function.
In the function definition, the parameter is prefixed with a * (star), as in *<variable_name>.
Consider the following example.
Example
def printme(*names):
    print("type of passed argument is ", type(names))
    print("printing the passed arguments...")
    for name in names:
        print(name)

printme("john", "David", "smith", "nick")
Output:
type of passed argument is  <class 'tuple'>
printing the passed arguments...
john
David
smith
nick
Scope of variables
The scope of a variable depends on the location where the variable is declared. A variable declared in one part of the
program may not be accessible in other parts.
In Python, variables have two types of scope.

1. Global variables
2. Local variables

The variable defined outside any function is known to have a global scope whereas the variable defined inside a function is known to
have a local scope.
Consider the following example.
Example 1
def print_message():
    message = "hello !! I am going to print a message."  # the variable message is local to the function itself
    print(message)

print_message()
print(message)  # this will cause an error since a local variable is not accessible here

Output:
hello !! I am going to print a message.
File "/root/PycharmProjects/PythonTest/Test1.py", line 5, in <module>
    print(message)
NameError: name 'message' is not defined
Example 2
def calculate(*args):
    sum = 0
    for arg in args:
        sum = sum + arg
    print("The sum is", sum)

sum = 0
calculate(10, 20, 30)  #60 will be printed as the sum
print("Value of sum outside the function:", sum)  # 0 will be printed
Output:
The sum is 60
Value of sum outside the function: 0
CHAPTER 7
Python If-else statements
Decision making is the most important aspect of almost all the programming languages. As the name implies, decision making allows
us to run a particular block of code for a particular decision. Here, the decisions are made on the validity of the particular conditions.
Condition checking is the backbone of decision making.
In python, decision making is performed by the following statements.

Statement and Description


If Statement - The if statement is used to test a specific condition. If the condition is true, a block of code (if-block) will be executed.

If - else Statement - The if-else statement is similar to if statement except the fact that, it also provides the block of the code for the false
case of the condition to be checked. If the condition provided in the if statement is false, then the else statement will be executed.

Nested if Statement - Nested if statements enable us to use an if-else statement inside an outer if statement (see the sketch below).
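A minimal sketch of a nested if statement is given below; the variable names and messages are chosen only for illustration.
num = int(input("Enter a number? "))
if num >= 0:
    if num == 0:
        print("The number is zero")
    else:
        print("The number is positive")
else:
    print("The number is negative")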
Indentation in Python
For the ease of programming and to achieve simplicity, Python does not use curly braces or other delimiters for block-level code. In Python,
indentation is used to declare a block. If two statements are at the same indentation level, then they are part of the same block.
Generally, four spaces are used to indent statements, which is the typical amount of indentation in Python.
Indentation is one of the most important parts of the Python language since it declares a block of code. All the statements of one block are indented
at the same level. We will see how indentation is used in decision making and other constructs in Python.
The if statement
The if statement is used to test a particular condition and if the condition is true, it executes a block of code known as if-block. The
condition of if statement can be any valid logical expression which can be either evaluated to true or false.
The syntax of the if-statement is given below.
if expression:
    statement

Example 1
num = int(input("enter the number?"))
if num % 2 == 0:
    print("Number is even")
Output:
enter the number?10
Number is even
Example 2 : Program to print the largest of the three numbers.
a = int(input("Enter a? "))
b = int(input("Enter b? "))
c = int(input("Enter c? "))
if a > b and a > c:
    print("a is largest")
if b > a and b > c:
    print("b is largest")
if c > a and c > b:
    print("c is largest")
Output:

Enter a? 100
Enter b? 120
Enter c? 130
c is largest

The if-else statement


The if-else statement provides an else block combined with the if statement which is executed in the false case of the condition.
If the condition is true, then the if-block is executed. Otherwise, the else-block is executed.
The syntax of the if-else statement is given below.
if condition:
    #block of statements
else:
    #another block of statements (else-block)

Example 1 : Program to check whether a person is eligible to vote or not.


age = int(input("Enter your age? "))
if age >= 18:
    print("You are eligible to vote !!")
else:
    print("Sorry! you have to wait !!")
Output:

Enter your age? 90


You are eligible to vote !!
Example 2: Program to check whether a number is even or not.
num = int(input("enter the number?"))
if num % 2 == 0:
    print("Number is even...")
else:
    print("Number is odd...")
Output:

enter the number?10


Number is even...

The elif statement


The elif statement enables us to check multiple conditions and execute the specific block of statements depending upon the true
condition among them. We can have any number of elif statements in our program depending upon our need. However, using elif is
optional.
The elif statement works like the if-else-if ladder in C. It must be preceded by an if statement.
The syntax of the elif statement is given below.

if expression1:
    # block of statements
elif expression2:
    # block of statements
elif expression3:
    # block of statements
else:
    # block of statements
Example 1
number = int(input("Enter the number?"))
if number == 10:
    print("number is equals to 10")
elif number == 50:
    print("number is equal to 50")
elif number == 100:
    print("number is equal to 100")
else:
    print("number is not equal to 10, 50 or 100")
Output:
Enter the number?15
number is not equal to 10, 50 or 100
CHAPTER 8
Python Loops
The flow of a program written in any programming language is sequential by default. Sometimes we may need to alter this flow;
for example, the execution of a specific piece of code may need to be repeated several times.
For this purpose, programming languages provide various types of loops, which are capable of repeating some specific code a
number of times.

Why we use loops in python?


Loops simplify complex problems. They enable us to alter the flow of the program so that, instead of writing
the same code again and again, we can repeat it a finite number of times. For example, if we need to print the first 10
natural numbers, then instead of using the print statement 10 times, we can print inside a loop which runs up to 10 iterations.
Advantages of loops
There are the following advantages of loops in Python.

1. It provides code re-usability.


2. Using loops, we do not need to write the same code again and again.
3. Using loops, we can traverse over the elements of data structures (array or linked lists).

There are the following loop statements in Python.

Loop Statement and Description


for loop - The for loop is used when we need to execute a part of the code repeatedly, typically over a sequence. The for
loop is also called a pre-tested loop. It is better to use a for loop when the number of iterations is known in advance.
while loop - The while loop is used in the scenario where we don't know the number of iterations in advance. The block of
statements is executed as long as the condition specified in the while loop is satisfied. It is also called a pre-tested loop.

do-while loop - A do-while loop is a post-tested loop: it runs its body at least once and continues until a given condition is
satisfied (typical for menu-driven programs). Python has no built-in do-while statement, but the behavior can be emulated with a while loop, as shown in the sketch below.

Python for loop


The for loop in Python is used to iterate the statements or a part of the program several times. It is frequently used to traverse the data
structures like list, tuple, or dictionary.
The syntax of for loop in python is given below.

for iterating_var in sequence:
    statement(s)

Example
n = int(input("Enter the number up to which you want to print the natural numbers?"))
for i in range(0, n):
    print(i, end=' ')
Output:
Enter the number up to which you want to print the natural numbers?10
0 1 2 3 4 5 6 7 8 9
Python for loop example : printing the table of the given number
num = int(input("Enter a number:"))
for i in range(1, 11):
    print("%d X %d = %d" % (num, i, num * i))
Output:
Enter a number:10
10 X 1 = 10
10 X 2 = 20
10 X 3 = 30
10 X 4 = 40
10 X 5 = 50
10 X 6 = 60
10 X 7 = 70
10 X 8 = 80
10 X 9 = 90
10 X 10 = 100
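Besides range(), the for loop can iterate directly over containers such as lists and dictionaries. A minimal sketch with made-up sample data is given below.
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(fruit)

marks = {"john": 80, "David": 75}
for name in marks:
    print(name, marks[name])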

Nested for loop in python


Python allows us to nest any number of for loops inside a for loop. The inner loop is executed n number of times for every iteration of
the outer loop. The syntax of the nested for loop in python is given below.
for iterating_var1 in sequence:
    for iterating_var2 in sequence:
        #block of statements
#Other statements

Example 1
n = int(input("Enter the number of rows you want to print?"))
for i in range(0, n):
    print()
    for j in range(0, i + 1):
        print("*", end="")
Output:
Enter the number of rows you want to print?5
*
**
***
****
*****

Using else statement with for loop


Unlike other languages such as C, C++, or Java, Python allows us to use an else statement with the for loop, which is executed only
when all the iterations are exhausted. Note that if the loop is terminated by a break statement, the else block
will not be executed.
Example 1
for i in range(0, 5):
    print(i)
else:
    print("for loop completely exhausted, since there is no break.")

In the above example, for loop is executed completely since there is no break statement in the loop. The control comes out of the loop
and hence the else block is executed.
Output:
0
1
2
3
4

for loop completely exhausted, since there is no break.


Example 2
for i in range(0, 5):
    print(i)
    break
else:
    print("for loop is exhausted")
print("The loop is broken due to break statement...came out of loop")

In the above example, the loop is broken by the break statement; therefore, the else block is not executed. The statement
immediately after the else block is executed instead.
Output:
0
The loop is broken due to break statement...came out of loop
Python while loop
The while loop is also known as a pre-tested loop. In general, a while loop allows a part of the code to be executed as long as the given
condition is true.
It can be viewed as a repeating if statement. The while loop is mostly used in the case where the number of iterations is not known in
advance.
The syntax is given below.
while expression:
    statements

Here, the statements can be a single statement or a group of statements. The expression should be any valid Python expression
resulting in true or false; any non-zero value is treated as true.

Example 1
i = 1
while i <= 10:
    print(i)
    i = i + 1
Output:
1
2
3
4
5
6
7
8
9
10

Infinite while loop


If the condition given in the while loop never becomes false then the while loop will never terminate and result into the infinite while
loop.
Any non-zero value in the while loop indicates an always-true condition whereas 0 indicates the always-false condition. This type of
approach is useful if we want our program to run continuously in the loop without any disturbance.
Example 1
while (1):
    print("Hi! we are inside the infinite while loop")
Output:
Hi! we are inside the infinite while loop
(infinite times)
Example 2
var = 1
while var != 2:
    i = int(input("Enter the number?"))
    print("Entered value is %d" % (i))
Output:
Enter the number?102
Entered value is 102
Enter the number?102
Entered value is 102
Enter the number?103
Entered value is 103
Enter the number?103
(infinite loop)

Using else with Python while loop


Python enables us to use the else statement with the while loop as well. The else block is executed when the condition given in the while
statement becomes false. As with the for loop, if the while loop is terminated by a break statement, the else block will not be executed and
the statement present after the else block will be executed instead.
Consider the following example.
i = 1
while i <= 5:
    print(i)
    i = i + 1
else:
    print("The while loop exhausted")
Output:
1
2
3
4
5
The while loop exhausted

Python break statement


break is a keyword in Python which is used to bring the program control out of the loop. The break statement terminates only the
innermost loop in which it appears; in the case of nested loops, the outer loops continue to run. In other words, break
aborts the current execution of the loop, and control moves to the first line after the loop.
The break is commonly used in the cases where we need to break the loop for a given condition.
The syntax of the break is given below.
#loop statements
break ;
Example 1
list1 = [1, 2, 3, 4]
count = 1
for item in list1:
    if item == 4:
        print("item matched")
        break
    count = count + 1
print("found at", count, "location")
Output:
item matched
found at 4 location

Python continue Statement


The continue statement in python is used to bring the program control to the beginning of the loop. The continue statement skips the
remaining lines of code inside the loop and start with the next iteration. It is mainly used for a particular condition inside the loop so
that we can skip some specific code for a particular condition.
The syntax of Python continue statement is given below.

#loop statements
continue ;
#the code to be skipped

Example 1
i = 0
while i != 10:
    print("%d" % i)
    continue
    i = i + 1
Output:
0 (printed infinitely, because continue moves control back to the condition before the increment runs)
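A more typical use of continue increments the counter before the check, so the loop still terminates. The minimal sketch below skips the value 5.
i = 0
while i < 10:
    i = i + 1
    if i == 5:
        continue   # skip printing 5 and move on to the next iteration
    print(i)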

Python OOPs Concepts


Like other general-purpose languages, Python has been an object-oriented language since its beginning. It allows us to develop applications
using an object-oriented approach: in Python, we can easily create and use classes and objects.
Major principles of object-oriented programming system are given below.

Object
Class
Method
Inheritance
Polymorphism
Data Abstraction
Encapsulation

Object
The object is an entity that has state and behavior. It may be any real-world object like the mouse, keyboard, chair, table, pen, etc.
Everything in Python is an object, and almost everything has attributes and methods. All functions have a built-in attribute __doc__,
which returns the doc string defined in the function source code.
Class
The class can be defined as a collection of objects. It is a logical entity that has some specific attributes and methods. For example: if
you have an employee class then it should contain an attribute and method, i.e. an email id, name, age, salary, etc.
Syntax
class ClassName:
    <statement-1>
    .
    .
    <statement-N>
Method
The method is a function that is associated with an object. In Python, a method is not unique to class instances. Any object type can
have methods.
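A minimal sketch of a class with attributes and a method is given below; the Employee class and its fields are only illustrative.
class Employee:
    def __init__(self, name, salary):
        self.name = name      # attribute
        self.salary = salary  # attribute

    def display(self):        # method associated with the object
        print(self.name, "earns", self.salary)

emp = Employee("John", 25000)  # creating an object of the class
emp.display()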
Inheritance
Inheritance is the most important aspect of object-oriented programming which simulates the real world concept of inheritance. It
specifies that the child object acquires all the properties and behaviors of the parent object.
By using inheritance, we can create a class which uses all the properties and behavior of another class. The new class is known as a
derived class or child class, and the one whose properties are acquired is known as a base class or parent class.
It provides re-usability of the code.
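A minimal inheritance sketch is given below; the Animal and Dog classes are only illustrative.
class Animal:                 # base (parent) class
    def eat(self):
        print("This animal eats food")

class Dog(Animal):            # derived (child) class acquires eat()
    def bark(self):
        print("The dog barks")

d = Dog()
d.eat()    # inherited from Animal
d.bark()   # defined in Dog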
Polymorphism
Polymorphism contains two words: "poly" and "morphs". Poly means many, and morphs means form or shape. By polymorphism, we
understand that one task can be performed in different ways. For example, you have a class Animal, and all animals speak, but they
speak differently. Here, the "speak" behavior is polymorphic and depends on the animal. So, the abstract "animal" concept
does not actually "speak", but specific animals (like dogs and cats) have a concrete implementation of the action "speak".
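The animal example above can be sketched as follows; the class names and messages are only illustrative.
class Dog:
    def speak(self):
        print("Woof")

class Cat:
    def speak(self):
        print("Meow")

# the same call behaves differently depending on the object
for animal in [Dog(), Cat()]:
    animal.speak()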
Encapsulation
Encapsulation is also an important aspect of object-oriented programming. It is used to restrict access to methods and variables. In
encapsulation, code and data are wrapped together within a single unit from being modified by accident.
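A minimal encapsulation sketch is given below; Python restricts access to an attribute when its name is prefixed with two underscores, and the Account class here is only illustrative.
class Account:
    def __init__(self, balance):
        self.__balance = balance          # private attribute (name-mangled)

    def deposit(self, amount):            # access goes through the methods
        self.__balance = self.__balance + amount

    def get_balance(self):
        return self.__balance

acc = Account(100)
acc.deposit(50)
print(acc.get_balance())   # 150
# print(acc.__balance)     # would raise AttributeError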
Data Abstraction
Data abstraction and encapsulation are often used as synonyms. They are nearly synonymous because data abstraction is achieved
through encapsulation.
Abstraction is used to hide internal details and show only functionalities. Abstracting something means to give names to things so that
the name captures the core of what a function or a whole program does.
Interview Questions and Answers
A list of frequently asked Python interview questions with answers for freshers and experienced are given below
1) What is Python?
Python is a general-purpose computer programming language. It is a high-level, object-oriented language which can run equally on
different platforms such as Windows, Linux, UNIX, and Macintosh. It is widely used in data science, machine learning and artificial
intelligence domain.
It is easy to learn and requires less code to develop applications.

2. What are the applications of Python?


Python is used in various software domains some application areas are given below.

Web and Internet Development


Games
Scientific and computational applications
Language development
Image processing and graphic design applications
Enterprise and business applications development
Operating systems
GUI based desktop applications

Python provides various web frameworks to develop web applications. The popular python web frameworks are Django, Pyramid,
Flask.
Python's standard library provides support for e-mail processing, FTP, IMAP, and other Internet protocols.
Python's SciPy and NumPy help in scientific and computational application development.
Python's Tkinter library supports creating desktop-based GUI applications.

3. What are the advantages of Python?

Interpreted
Free and open source
Extensible
Object-oriented
Built-in data structure
Readability
High-Level Language
Cross-platform
Interpreted: Python is an interpreted language. It does not require prior compilation of code and executes instructions
directly.
Free and open source: It is an open source project which is publicly available to reuse. It can be downloaded free of cost.
Portable: Python programs can run across platforms without affecting their performance.
Extensible: It is very flexible and extensible with any module.
Object-oriented: Python allows us to implement object-oriented concepts to build application solutions.
Built-in data structures: Tuple, list, and dictionary are useful integrated data structures provided by the language.

4. What is PEP 8?
PEP 8 is a coding convention which specifies a set of guidelines, about how to write Python code more readable.
It's a set of rules to guide how to format your Python code to maximize its readability. Writing code to a specification helps to make
significant code bases, with lots of writers, more uniform and predictable, too.

5. What do you mean by Python literals?


Literals can be defined as data given in a variable or constant. Python supports the following literals:
String Literals
String literals are formed by enclosing text in single or double quotes.

E.g.:
"Aman", '12345'.

Numeric Literals
Python supports three types of numeric literals integer, float and complex. See the examples.

# Integer literal
a = 10
# Float literal
b = 12.3
# Complex literal
x = 3.14j

Boolean Literals
Boolean literals are used to denote boolean values. It contains either True or False.
# Boolean literal
isboolean = True

6. Explain Python Functions?


A function is a section of the program, a block of code that is written once and can be executed whenever required in the program. A
function is a block of self-contained statements which has a valid name, a parameter list, and a body. Functions make programs more
modular, allowing tasks to be broken into reusable pieces. Python provides several built-in functions to complete tasks and also allows a user to
create new functions as well.

There are two types of functions:

Built-In Functions: copy(), len(), count() are some of the built-in functions.
User-defined Functions: Functions defined by a user are known as user-defined functions.

Example: A general syntax of user defined function is given below.

def function_name(parameter_list):
    #--- statements ---
    return a_value

7. Which are the file related libraries/modules in Python?


Python provides libraries/modules with functions that enable us to manipulate text files and binary files on the file system. Using
these libraries, we can create files, update their contents, and copy and delete files.
These libraries are os, os.path, and the shutil.
os and os.path: os and os.path libraries include functions for accessing the filesystem.
shutil: This library is used to copy and delete the files.
8. What are the different file processing modes supported by Python?
Python provides several modes to open files: read-only, write-only, read-write, and append mode. 'r' opens a file in read-only
mode, 'w' opens a file in write-only mode, 'r+' opens a file in read-write mode, and 'a' opens a file in append
mode. If the mode is not specified, the file is opened in read-only mode by default.

Read-only mode: Open a file for reading. It is the default mode.
Write-only mode: Open a file for writing. If the file contains data, that data would be lost; otherwise, a new file is created.
Read-Write mode: Open a file for both reading and writing (updating mode).
Append mode: Open a file for writing, appending to the end of the file if the file exists. A small usage sketch is given below.
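A minimal sketch of these modes is given below; the file name sample.txt is only an illustration.
# write-only mode: creates the file or overwrites existing data
f = open("sample.txt", "w")
f.write("hello world\n")
f.close()

# append mode: adds to the end of the file
f = open("sample.txt", "a")
f.write("one more line\n")
f.close()

# read-only mode (the default)
f = open("sample.txt", "r")
print(f.read())
f.close()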

9. What are the different types of operators in Python?


Python uses a rich set of operators to perform a variety of operations. Some individual operators, like the membership and identity operators,
are less familiar but allow us to perform specific checks.

Arithmetic Operators
Relational Operators
Assignment Operators
Logical Operators
Membership Operators
Identity Operators
Bitwise Operators

10. How to create a Unicode string in Python?


In Python 3, the old unicode type has been replaced by the str type, and strings are treated as Unicode by default. A Unicode string can be
converted into a UTF-8 encoded byte string by calling its encode("utf-8") method.
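A minimal sketch, assuming Python 3, is given below.
text = "Hello"                 # str objects are Unicode by default in Python 3
data = text.encode("utf-8")    # encode to bytes
print(type(text), type(data))  # <class 'str'> <class 'bytes'>
print(data.decode("utf-8"))    # decode back to str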

11. Is Python interpreted language?


Python is an interpreted language. A Python program runs directly from the source code: the interpreter converts the source code into an
intermediate bytecode, which is then executed by the Python virtual machine.
Unlike Java or C, Python does not require an explicit compilation step before execution.

12. What are the rules for a local and global variable in Python?
In Python, variables that are only referenced inside a function are implicitly global. If a variable is assigned a value
anywhere within the function's body, it is assumed to be local unless it is explicitly declared as global with the
global keyword. Local variables are accessible only within the function body; global variables are accessible anywhere in the program,
and any function can read them, while modifying them requires the global declaration.
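A minimal sketch of this rule is given below.
count = 0                 # global variable

def increment():
    global count          # required because count is assigned inside the function
    count = count + 1

def show():
    print(count)          # only referenced, so the global value is read

increment()
show()                    # prints 1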

13. What is the namespace in Python?


In Python, every name has a place where it lives. It is known as a namespace. It is like a box where a variable name maps to the object
placed. Whenever the variable is searched out, this box will be searched, to get the corresponding object.

14. What are iterators in Python?


In Python, iterators are used to step through a group of elements in a container such as a list, tuple, or dictionary. A Python
iterator implements the __iter__() and __next__() methods to return the stored elements one by one. In practice, we generally use
loops to iterate over these collections (list, tuple).
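A minimal sketch using the built-in iter() and next() functions is given below.
numbers = [10, 20, 30]
it = iter(numbers)        # calls numbers.__iter__()
print(next(it))           # 10, calls it.__next__()
print(next(it))           # 20
print(next(it))           # 30
# one more next(it) would raise StopIteration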

15. What is a generator in Python?


In Python, a generator is a convenient way to implement an iterator. It is a normal function except that it yields values using the yield
expression, so it does not have to implement the __iter__() and __next__() methods itself, which reduces the overhead.
If a function contains at least one yield statement, it becomes a generator. The yield keyword pauses the current execution by saving its
state and then resumes from the same point when the next value is requested.
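A minimal generator sketch is given below; the function name is only illustrative.
def count_up_to(n):
    i = 1
    while i <= n:
        yield i           # pauses here and resumes on the next iteration
        i = i + 1

for value in count_up_to(3):
    print(value)          # prints 1, 2, 3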

16. What is slicing in Python?


Slicing is a mechanism used to select a range of items from a sequence type such as a list, tuple, or string. It is an easy way to get
elements from a range and uses a : (colon) to separate the start and end index of the field. All the sequence
collection types, such as list and tuple, allow us to use slicing to fetch elements. Although indexing returns
a single element, slicing returns a group of elements.
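A minimal slicing sketch is given below.
letters = ['a', 'b', 'c', 'd', 'e']
print(letters[1:4])   # ['b', 'c', 'd'] - from index 1 up to (not including) 4
print(letters[:3])    # ['a', 'b', 'c'] - start defaults to 0
print(letters[2:])    # ['c', 'd', 'e'] - end defaults to the length
print("python"[0:3])  # 'pyt' - slicing works on strings too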

17. What is a negative index in Python?


Python sequences can be indexed with positive and negative numbers. For positive indexes, 0 is the first index, 1 is the
second, and so on. For negative indexes, -1 is the last element, -2 is the second last, and so on.
Positive indexes traverse the sequence from left to right, increasing by one until the end of the sequence.
Negative indexes traverse the sequence from right to left and are used to access elements
in reverse order.
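A minimal sketch of negative indexing is given below.
items = [10, 20, 30, 40, 50]
print(items[-1])      # 50, the last element
print(items[-2])      # 40, the second last element
print(items[::-1])    # [50, 40, 30, 20, 10], the list in reverse order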

18. What is pickling and unpickling in Python?


Pickling is a process in which the pickle module takes any Python object, converts it into a byte-stream representation, and dumps it into a file
using the dump() function.
Unpickling is the reverse process of retrieving the original Python object from the stored representation, using the load() function.
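A minimal sketch using the pickle module is given below; the file name data.pkl is only an illustration.
import pickle

person = {"name": "John", "age": 22}

# pickling: dump the object into a file
with open("data.pkl", "wb") as f:
    pickle.dump(person, f)

# unpickling: load the original object back
with open("data.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored)   # {'name': 'John', 'age': 22}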

19. What are the differences between Python 2.x and Python 3.x?
Python 2.x is the older version of Python; Python 3.x is the newer, current version. Python 2.x is legacy now, while Python 3.x is the present and
future of the language.
The most visible difference between Python 2 and Python 3 is the print statement: in Python 2 it looks like print "Hello", and in
Python 3 it is print("Hello").
Strings in Python 2 are ASCII by default, whereas in Python 3 they are Unicode.
The xrange() method has been removed in Python 3, and the as keyword is used in exception handling (except SomeError as e).

20. How to send an email in Python Language?


To send an email, Python provides smtplib and email modules. Import these modules into the created mail script and send mail by
authenticating a user.
It has a method SMTP(smtp-server, port). It requires two parameters to establish SMTP connection.
A simple example to send an email is given below.

import smtplib
# Calling SMTP
s = smtplib.SMTP('smtp.gmail.com', 587)
# TLS for network security
s.starttls()
# User email Authentication
s.login("sender_email_id", "sender_email_id_password")
# message to be sent
message = "Message_you_need_to_send"
# sending the mail
s.sendmail("sender_email_id", "receiver_email_id", message)
And finally, if you liked the book, I would like to ask you to do leave a review for the book on Amazon. Just go
to your account on Amazon and write a review for this book.

Thank you and good luck!
