HADOOP and PYTHON For BEGINNERS - 2 BOOKS in 1 - Learn Coding Fast! HADOOP and PYTHON Crash Course, A QuickStart Guide, Tutorial Book by Program Examples, in Easy Steps!
FOR
BEGINNERS
Use case
An e-commerce site XYZ (having 100 million users) wants to offer a gift voucher of $100 to its top 10 customers who have spent the
most in the previous year. Moreover, they want to find the buying trends of these customers so that the company can suggest more items
related to them.
Issues
A huge amount of unstructured data needs to be stored, processed, and analyzed.
Solution
Storage: To store this huge amount of data, Hadoop uses HDFS (Hadoop Distributed File System), which uses commodity hardware to form
clusters and store data in a distributed fashion. It works on the "write once, read many times" principle.
Processing: The MapReduce paradigm is applied to the data distributed over the network to compute the required output.
Analysis: Pig and Hive can be used to analyze the data.
Cost: Hadoop is open source, so cost is no longer an issue.
What is Hadoop
Hadoop is an open-source framework from Apache that is used to store, process, and analyze data that is very huge in volume.
Hadoop is written in Java and is not OLAP (online analytical processing); it is used for batch/offline processing. It is used by
Facebook, Yahoo, Google, Twitter, LinkedIn, and many more. Moreover, it can be scaled up just by adding nodes to the cluster.
Modules of Hadoop
1. HDFS: Hadoop Distributed File System. Google published its GFS paper, and HDFS was developed on the basis of it. It states that files are broken into blocks and stored on nodes across the distributed architecture.
2. YARN: Yet Another Resource Negotiator is used for job scheduling and for managing the cluster.
3. MapReduce: This is a framework which helps Java programs perform parallel computation on data using key-value pairs. The Map task takes input data and converts it into a data set that can be computed as key-value pairs. The output of the Map task is consumed by the Reduce task, and the output of the reducer gives the desired result.
4. Hadoop Common: These Java libraries are used to start Hadoop and are used by other Hadoop modules.
Hadoop Architecture
The Hadoop architecture is a package of the file system, MapReduce engine and the HDFS (Hadoop Distributed File System). The
MapReduce engine can be MapReduce/MR1 or YARN/MR2.
A Hadoop cluster consists of a single master and multiple slave nodes. The master node includes the NameNode and JobTracker,
whereas the slave nodes include the DataNode and TaskTracker.
NameNode
It is a single master server that exists in the HDFS cluster.
As it is a single node, it may become a single point of failure.
It manages the file system namespace by executing operations such as opening, renaming, and closing files.
It simplifies the architecture of the system.
DataNode
The HDFS cluster contains multiple DataNodes.
Each DataNode contains multiple data blocks.
These data blocks are used to store data.
It is the responsibility of the DataNode to serve read and write requests from the file system's clients.
It performs block creation, deletion, and replication upon instruction from the NameNode.
Job Tracker
The role of the Job Tracker is to accept MapReduce jobs from clients and process the data with the help of the NameNode.
In response, the NameNode provides metadata to the Job Tracker.
Task Tracker
It works as a slave node for Job Tracker.
It receives tasks and code from the Job Tracker and applies that code to the file. This process can also be called a Mapper.
MapReduce Layer
MapReduce comes into play when the client application submits a MapReduce job to the Job Tracker. In response, the Job
Tracker sends the request to the appropriate Task Trackers. Sometimes a TaskTracker fails or times out. In such a case, that part of the
job is rescheduled.
Advantages of Hadoop
Fast: In HDFS the data is distributed over the cluster and mapped, which helps in faster retrieval. Even the tools that
process the data are often on the same servers, thus reducing the processing time. Hadoop is able to process terabytes of data in
minutes and petabytes in hours.
Scalable: A Hadoop cluster can be extended by just adding nodes to the cluster.
Cost Effective: Hadoop is open source and uses commodity hardware to store data, so it is really cost-effective
compared to a traditional relational database management system.
Resilient to failure: HDFS has the property of replicating data over the network, so if one node goes down or
some other network failure happens, Hadoop takes another copy of the data and uses it. Normally, data is replicated
three times, but the replication factor is configurable.
History of Hadoop
Hadoop was started by Doug Cutting and Mike Cafarella in 2002 as part of the Apache Nutch project. Its origin was the Google File System paper, published by
Google.
1) Java Installation
Step 1. Type "java -version" at the prompt to find out whether Java is installed or not. If not, download Java from
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html . The tar file jdk-7u71-linux-x64.tar.gz will
be downloaded to your system.
Step 2. Extract the file using the below command
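(The extraction command is not shown in the original; assuming the archive name given above, it would typically be:)
1. $ tar -xvzf jdk-7u71-linux-x64.tar.gz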
Step 3. To make Java available to all users of the system, move the extracted JDK to /usr/lib and set the path. At the prompt, switch to the root user
and then type the command below to move the JDK to /usr/lib.
1. # mv jdk1.7.0_71 /usr/lib/
Now in ~/.bashrc file add the following commands to set up the path.
1. # export JAVA_HOME=/usr/lib/jdk1.7.0_71
2. # export PATH=$PATH:$JAVA_HOME/bin
Now, you can check the installation by typing "java -version" at the prompt.
2) SSH Installation
SSH is used to interact with the master and slave computers without any password prompt. First of all, create a Hadoop user on the
master and slave systems.
1. # useradd hadoop
2. # passwd hadoop
To map the nodes, open the hosts file present in the /etc/ folder on all machines and put the IP addresses along with their host names.
1. # vi /etc/hosts
1. 190.12.1.114 hadoop-master
2. 190.12.1.121 hadoop-slave-one
3. 190.12.1.143 hadoop-slave-two
Set up an SSH key on every node so that they can communicate among themselves without a password. The commands for this are:
1. # su hadoop
2. $ ssh-keygen -t rsa
3. $ ssh-copy-id -i ~/.ssh/id_rsa.pub tutorialspoint@hadoop-master
4. $ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop_tp1@hadoop-slave-1
5. $ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop_tp2@hadoop-slave-2
6. $ chmod 0600 ~/.ssh/authorized_keys
7. $ exit
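You can verify the passwordless login with a quick check, for example:
1. $ ssh hadoop-slave-one
2. $ exit
If the key exchange worked, no password prompt should appear.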
3) Hadoop Installation
Hadoop can be downloaded from http://developer.yahoo.com/hadoop/tutorial/module3.html
Now extract Hadoop and copy it to a location.
1. $ mkdir /usr/hadoop
2. $ sudo tar vxzf hadoop-2.2.0.tar.gz -C /usr/hadoop
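The JAVA_HOME line below is normally added to the Hadoop environment script (etc/hadoop/hadoop-env.sh inside the Hadoop directory):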
1. export JAVA_HOME=/usr/lib/jvm/jdk/jdk1.7.0_71
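The following properties typically go into etc/hadoop/core-site.xml, which configures the default file system URI and related settings: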
1. <configuration>
2. <property>
3. <name>fs.default.name</name>
4. <value>hdfs://hadoop-master:9000</value>
5. </property>
6. <property>
7. <name>dfs.permissions</name>
8. <value>false</value>
9. </property>
10. </configuration>
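The next set of properties typically belongs in etc/hadoop/hdfs-site.xml, which configures the NameNode and DataNode storage directories and the replication factor: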
1. <configuration>
2. <property>
3. <name>dfs.data.dir</name>
4. <value>usr/hadoop/dfs/name/data</value>
5. <final>true</final>
6. </property>
7. <property>
8. <name>dfs.name.dir</name>
9. <value>usr/hadoop/dfs/name</value>
10. <final>true</final>
11. </property>
12. <property>
13. <name>dfs.replication</name>
14. <value>1</value>
15. </property>
16. </configuration>
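The JobTracker address below typically goes into etc/hadoop/mapred-site.xml: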
1. <configuration>
2. <property>
3. <name>mapred.job.tracker</name>
4. <value>hadoop-master:9001</value>
5. </property>
6. </configuration>
1. cd $HOME
2. vi .bashrc
3. Append following lines in the end and save and exit
4. #Hadoop variables
5. export JAVA_HOME=/usr/lib/jvm/jdk/jdk1.7.0_71
6. export HADOOP_INSTALL=/usr/hadoop
7. export PATH=$PATH:$HADOOP_INSTALL/bin
8. export PATH=$PATH:$HADOOP_INSTALL/sbin
9. export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
10. export HADOOP_COMMON_HOME=$HADOOP_INSTALL
11. export HADOOP_HDFS_HOME=$HADOOP_INSTALL
12. export YARN_HOME=$HADOOP_INSTALL
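Next, copy the configured Hadoop directory to the slave machines: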
1. # su hadoop
2. $ cd /usr
3. $ scp -r hadoop hadoop-slave-one:/usr
4. $ scp -r hadoop hadoop-slave-two:/usr
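On the master node, list the master and slave host names in the masters and slaves files inside the Hadoop configuration directory: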
1. $ vi etc/hadoop/masters
2. hadoop-master
3.
4. $ vi etc/hadoop/slaves
5. hadoop-slave-one
6. hadoop-slave-two
After this, format the NameNode and start all the daemons.
1. # su hadoop
2. $ cd /usr/hadoop
3. $ bin/hadoop namenode -format
4.
5. $ cd $HADOOP_HOME/sbin
6. $ start-all.sh
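You can check that the daemons came up with the jps command (the exact process list depends on your setup; on the master you would normally expect to see processes such as the NameNode and ResourceManager):
1. $ jps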
The easiest option is to use the Cloudera VM, as it comes with everything pre-installed; it can be downloaded from
http://content.udacity-data.com/courses/ud617/Cloudera-Udacity-Training-VM-4.1.1.c.zip
HADOOP MODULES
What is HDFS
Hadoop comes with a distributed file system called HDFS. In HDFS, data is distributed over several machines and replicated to ensure
durability against failure and high availability for parallel applications.
It is cost effective as it uses commodity hardware. It involves the concepts of blocks, data nodes, and the name node.
HDFS Concepts
1. Blocks: A block is the minimum amount of data that HDFS can read or write. HDFS blocks are 128 MB by default, and this is configurable. Files in HDFS are broken into block-sized chunks, which are stored as independent units. Unlike in a regular file system, if a file in HDFS is smaller than the block size, it does not occupy the full block's size; i.e., a 5 MB file stored in HDFS with a block size of 128 MB takes only 5 MB of space. The HDFS block size is large simply to minimize the cost of seeks.
2. Name Node: HDFS works in a master-worker pattern where the name node acts as the master. The name node is the controller and manager of HDFS, as it knows the status and the metadata of all the files in HDFS; the metadata information being file permissions, names, and the location of each block. The metadata is small, so it is stored in the memory of the name node, allowing faster access to data. Moreover, the HDFS cluster is accessed by multiple clients concurrently, so all this information is handled by a single machine. The file system operations like opening, closing, renaming, etc. are executed by it.
3. Data Node: Data nodes store and retrieve blocks when they are told to, by the client or the name node. They report back to the name node periodically with the list of blocks that they are storing. The data node, being commodity hardware, also does the work of block creation, deletion, and replication as instructed by the name node.
Since all the metadata is stored in the name node, it is very important. If it fails, the file system cannot be used, as there would be no way of
knowing how to reconstruct the files from the blocks present in the data nodes. To overcome this, the concept of the secondary name node arises.
Secondary Name Node: It is a separate physical machine which acts as a helper to the name node. It performs periodic checkpoints. It
communicates with the name node and takes snapshots of the metadata, which helps minimize downtime and loss of data.
Starting HDFS
The HDFS should be formatted initially and then started in the distributed mode. Commands are given below.
To Format $ hadoop namenode -format
To Start $ start-dfs.sh
Recursive deleting
hadoop fs -rmr <arg>
Example:
hadoop fs -rmr /user/sonoo/
put <localSrc> <dest>
Copies the file or directory from the local file system identified by localSrc to dest within the DFS.
copyFromLocal <localSrc> <dest>
Identical to -put.
moveFromLocal <localSrc> <dest>
Copies the file or directory from the local file system identified by localSrc to dest within HDFS, and then deletes the local
copy on success.
get [-crc] <src> <localDest>
Copies the file or directory in HDFS identified by src to the local file system path identified by localDest.
cat <filename>
Displays the contents of filename on stdout.
setrep [-R] [-w] rep <path>
Sets the target replication factor for files identified by path to rep. (The actual replication factor will move toward the target
over time.)
touchz <path>
Creates a file at path containing the current time as a timestamp. Fails if a file already exists at path, unless the file is already
size 0.
test -[ezd] <path>
Returns 1 if path exists, has zero length, or is a directory, or 0 otherwise.
stat [format] <path>
Prints information about path. Format is a string which accepts file size in blocks (%b), filename (%n), block size (%o),
replication (%r), and modification date (%y, %Y).
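A few usage examples of these commands (the file name and paths are only illustrative):
1. $ hadoop fs -copyFromLocal sample.txt /user/sonoo/
2. $ hadoop fs -cat /user/sonoo/sample.txt
3. $ hadoop fs -setrep -w 3 /user/sonoo/sample.txt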
HDFS Features and Goals
The Hadoop Distributed File System (HDFS) is a distributed file system. It is a core part of Hadoop and is used for data storage. It is
designed to run on commodity hardware.
Unlike other distributed file systems, HDFS is highly fault-tolerant and can be deployed on low-cost hardware. It can easily handle
applications that contain large data sets.
Let's see some of the important features and goals of HDFS.
Features of HDFS
Highly Scalable - HDFS is highly scalable, as it can scale to hundreds of nodes in a single cluster.
Replication - Due to some unfavorable condition, the node containing the data may be lost. So, to overcome such
problems, HDFS always maintains a copy of the data on a different machine.
Fault tolerance - In HDFS, fault tolerance signifies the robustness of the system in the event of failure. HDFS
is highly fault-tolerant: if any machine fails, another machine containing a copy of that data automatically
becomes active.
Distributed data storage - This is one of the most important features of HDFS and what makes Hadoop very powerful. Here,
data is divided into multiple blocks and stored across nodes.
Portable - HDFS is designed in such a way that it can easily be ported from one platform to another.
Goals of HDFS
Handling hardware failure - An HDFS cluster contains multiple server machines. If any machine fails, the
goal of HDFS is to recover from it quickly.
Streaming data access - HDFS applications require streaming access to their data sets rather than the
general-purpose access of a regular file system.
Coherence Model - Applications that run on HDFS follow the write-once-read-many approach. So a
file, once created, need not be changed; however, it can be appended and truncated.
What is YARN
Yet Another Resource Negotiator (YARN) takes programming beyond Java and makes Hadoop interactive, letting other applications
such as HBase and Spark work on it. Different YARN applications can co-exist on the same cluster, so MapReduce, HBase, and Spark can all run at
the same time, bringing great benefits for manageability and cluster utilization.
Components Of YARN
Client: for submitting MapReduce jobs.
Resource Manager: to manage the use of resources across the cluster.
Node Manager: for launching and monitoring the compute containers on machines in the cluster.
MapReduce Application Master: checks the tasks running the MapReduce job. The application master and the
MapReduce tasks run in containers that are scheduled by the resource manager and managed by the node managers.
JobTracker and TaskTracker were used in previous versions of Hadoop and were responsible for handling resources and tracking
progress. Hadoop 2.0, however, has the ResourceManager and NodeManager to overcome the shortcomings of JobTracker and
TaskTracker.
Benefits of YARN
Scalability: MapReduce 1 hits a scalability bottleneck at 4,000 nodes and 40,000 tasks, but YARN is designed for 10,000
nodes and 100,000 tasks.
Utilization: the Node Manager manages a pool of resources, rather than a fixed number of designated slots, thus
increasing utilization.
Multitenancy: different versions of MapReduce can run on YARN, which makes the process of upgrading MapReduce
more manageable.
HADOOP MapReduce
MapReduce tutorial provides basic and advanced concepts of MapReduce. Our MapReduce tutorial is designed for beginners and
professionals.
Our MapReduce tutorial includes all topics of MapReduce such as Data Flow in MapReduce, Map Reduce API, Word Count Example,
Character Count Example, etc.
What is MapReduce?
MapReduce is a data processing tool used to process data in parallel in a distributed manner. It was developed in 2004, on
the basis of the paper titled "MapReduce: Simplified Data Processing on Large Clusters," published by Google.
MapReduce is a paradigm which has two phases: the mapper phase and the reducer phase. In the Mapper, the input is given in the
form of key-value pairs. The output of the Mapper is fed to the reducer as input. The reducer runs only after the Mapper is over. The
reducer also takes input in key-value format, and the output of the reducer is the final output.
Usage of MapReduce
It can be used in various applications like document clustering, distributed sorting, and web link-graph reversal.
It can be used for distributed pattern-based searching.
We can also use MapReduce in machine learning.
It was used by Google to regenerate Google's index of the World Wide Web.
It can be used in multiple computing environments such as multi-cluster, multi-core, and mobile environment.
Prerequisite
Before learning MapReduce, you must have the basic knowledge of Big Data.
Audience
Our MapReduce tutorial is designed to help beginners and professionals.
Data Flow In MapReduce
MapReduce is used to compute huge amounts of data. To handle the incoming data in a parallel and distributed form, the data has to
flow through various phases.
Map function
The map function processes the incoming key-value pairs and generates the corresponding output key-value pairs. The map input and
output types may be different from each other.
Partition function
The partition function assigns the output of each Map function to the appropriate reducer. It is given the key and the value, and it
returns the index of the reducer.
Reduce function
The Reduce function is applied per unique key, and the keys are already arranged in sorted order. The Reduce function iterates over the values associated with each
key and generates the corresponding output.
Output writer
Once the data has flowed through all the above phases, the Output writer executes. The role of the Output writer is to write the Reduce output to
stable storage.
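To make the four phases concrete, here is a minimal, framework-free sketch in plain Python (not the Hadoop API); the input lines, function names, and the number of reducers are illustrative assumptions:

from collections import defaultdict

def map_fn(line):
    # Map phase: emit (word, 1) for every word in the input line.
    for word in line.split():
        yield word, 1

def partition_fn(key, num_reducers):
    # Partition phase: decide which reducer receives this key.
    return hash(key) % num_reducers

def reduce_fn(key, values):
    # Reduce phase: sum all counts for a single key.
    return key, sum(values)

lines = ["hello world", "hello hadoop"]
num_reducers = 2
partitions = [defaultdict(list) for _ in range(num_reducers)]

# Map + partition: route each intermediate key-value pair to a reducer.
for line in lines:
    for key, value in map_fn(line):
        partitions[partition_fn(key, num_reducers)][key].append(value)

# Reduce + output: each reducer processes its keys in sorted order and writes the result.
for part in partitions:
    for key in sorted(part):
        print(reduce_fn(key, part[key]))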
MapReduce API
In this section, we focus on MapReduce APIs. Here, we learn about the classes and methods used in MapReduce programming.
Pre-requisite
Java Installation - Check whether the Java is installed or not using the following command.
java -version
Hadoop Installation - Check whether the Hadoop is installed or not using the following command.
hadoop version
File: WC_Mapper.java
1. package com.javatpoint;
2.
3. import java.io.IOException;
4. import java.util.StringTokenizer;
5. import org.apache.hadoop.io.IntWritable;
6. import org.apache.hadoop.io.LongWritable;
7. import org.apache.hadoop.io.Text;
8. import org.apache.hadoop.mapred.MapReduceBase;
9. import org.apache.hadoop.mapred.Mapper;
10. import org.apache.hadoop.mapred.OutputCollector;
11. import org.apache.hadoop.mapred.Reporter;
12. public class WC_Mapper extends MapReduceBase implements Mapper<LongWritable,Text,Text,IntWritable>{
13. private final static IntWritable one = new IntWritable(1);
14. private Text word = new Text();
15. public void map(LongWritable key, Text value,OutputCollector<Text,IntWritable> output,
16. Reporter reporter) throws IOException{
17. String line = value.toString();
18. StringTokenizer tokenizer = new StringTokenizer(line);
19. while (tokenizer.hasMoreTokens()){
20. word.set(tokenizer.nextToken());
21. output.collect(word, one);
22. }
23. }
24.
25. }
File: WC_Reducer.java
1. package com.javatpoint;
2. import java.io.IOException;
3. import java.util.Iterator;
4. import org.apache.hadoop.io.IntWritable;
5. import org.apache.hadoop.io.Text;
6. import org.apache.hadoop.mapred.MapReduceBase;
7. import org.apache.hadoop.mapred.OutputCollector;
8. import org.apache.hadoop.mapred.Reducer;
9. import org.apache.hadoop.mapred.Reporter;
10.
11. public class WC_Reducer extends MapReduceBase implements Reducer<Text,IntWritable,Text,IntWritable> {
12. public void reduce(Text key, Iterator<IntWritable> values,OutputCollector<Text,IntWritable> output,
13. Reporter reporter) throws IOException {
14. int sum=0;
15. while (values.hasNext()) {
16. sum+=values.next().get();
17. }
18. output.collect(key,new IntWritable(sum));
19. }
20. }
File: WC_Runner.java
1. package com.javatpoint;
2.
3. import java.io.IOException;
4. import org.apache.hadoop.fs.Path;
5. import org.apache.hadoop.io.IntWritable;
6. import org.apache.hadoop.io.Text;
7. import org.apache.hadoop.mapred.FileInputFormat;
8. import org.apache.hadoop.mapred.FileOutputFormat;
9. import org.apache.hadoop.mapred.JobClient;
10. import org.apache.hadoop.mapred.JobConf;
11. import org.apache.hadoop.mapred.TextInputFormat;
12. import org.apache.hadoop.mapred.TextOutputFormat;
13. public class WC_Runner {
14. public static void main(String[] args) throws IOException{
15. JobConf conf = new JobConf(WC_Runner.class);
16. conf.setJobName("WordCount");
17. conf.setOutputKeyClass(Text.class);
18. conf.setOutputValueClass(IntWritable.class);
19. conf.setMapperClass(WC_Mapper.class);
20. conf.setCombinerClass(WC_Reducer.class);
21. conf.setReducerClass(WC_Reducer.class);
22. conf.setInputFormat(TextInputFormat.class);
23. conf.setOutputFormat(TextOutputFormat.class);
24. FileInputFormat.setInputPaths(conf,new Path(args[0]));
25. FileOutputFormat.setOutputPath(conf,new Path(args[1]));
26. JobClient.runJob(conf);
27. }
28. }
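To try the job, the three classes are typically compiled against the Hadoop libraries, packaged into a jar, and submitted with the hadoop jar command; the jar name and HDFS paths below are only placeholders:
1. $ hadoop jar wordcount.jar com.javatpoint.WC_Runner /input /output
2. $ hadoop fs -cat /output/part-00000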
Pre-requisite
Java Installation - Check whether the Java is installed or not using the following command.
java -version
Hadoop Installation - Check whether the Hadoop is installed or not using the following command.
hadoop version
In this example, we find the frequency of each character in the text file.
Create a directory in HDFS where the text file will be kept.
$ hdfs dfs -mkdir /count
Upload the info.txt file on HDFS in the specific directory.
$ hdfs dfs -put /home/codegyani/info.txt /count
File: WC_Mapper.java
1. package com.javatpoint;
2.
3. import java.io.IOException;
4. import org.apache.hadoop.io.IntWritable;
5. import org.apache.hadoop.io.LongWritable;
6. import org.apache.hadoop.io.Text;
7. import org.apache.hadoop.mapred.MapReduceBase;
8. import org.apache.hadoop.mapred.Mapper;
9. import org.apache.hadoop.mapred.OutputCollector;
10. import org.apache.hadoop.mapred.Reporter;
11. public class WC_Mapper extends MapReduceBase implements Mapper<LongWritable,Text,Text,IntWritable>{
12. public void map(LongWritable key, Text value,OutputCollector<Text,IntWritable> output,
13. Reporter reporter) throws IOException{
14. String line = value.toString();
15. String tokenizer[] = line.split("");
16. for(String SingleChar : tokenizer)
17. {
18. Text charKey = new Text(SingleChar);
19. IntWritable One = new IntWritable(1);
20. output.collect(charKey, One);
21. }
22. }
23.
24. }
File: WC_Reducer.java
1. package com.javatpoint;
2. import java.io.IOException;
3. import java.util.Iterator;
4. import org.apache.hadoop.io.IntWritable;
5. import org.apache.hadoop.io.Text;
6. import org.apache.hadoop.mapred.MapReduceBase;
7. import org.apache.hadoop.mapred.OutputCollector;
8. import org.apache.hadoop.mapred.Reducer;
9. import org.apache.hadoop.mapred.Reporter;
10.
11. public class WC_Reducer extends MapReduceBase implements Reducer<Text,IntWritable,Text,IntWritable> {
12. public void reduce(Text key, Iterator<IntWritable> values,OutputCollector<Text,IntWritable> output,
13. Reporter reporter) throws IOException {
14. int sum=0;
15. while (values.hasNext()) {
16. sum+=values.next().get();
17. }
18. output.collect(key,new IntWritable(sum));
19. }
20. }
File: WC_Runner.java
1. package com.javatpoint;
2.
3. import java.io.IOException;
4. import org.apache.hadoop.fs.Path;
5. import org.apache.hadoop.io.IntWritable;
6. import org.apache.hadoop.io.Text;
7. import org.apache.hadoop.mapred.FileInputFormat;
8. import org.apache.hadoop.mapred.FileOutputFormat;
9. import org.apache.hadoop.mapred.JobClient;
10. import org.apache.hadoop.mapred.JobConf;
11. import org.apache.hadoop.mapred.TextInputFormat;
12. import org.apache.hadoop.mapred.TextOutputFormat;
13. public class WC_Runner {
14. public static void main(String[] args) throws IOException{
15. JobConf conf = new JobConf(WC_Runner.class);
16. conf.setJobName("CharCount");
17. conf.setOutputKeyClass(Text.class);
18. conf.setOutputValueClass(IntWritable.class);
19. conf.setMapperClass(WC_Mapper.class);
20. conf.setCombinerClass(WC_Reducer.class);
21. conf.setReducerClass(WC_Reducer.class);
22. conf.setInputFormat(TextInputFormat.class);
23. conf.setOutputFormat(TextOutputFormat.class);
24. FileInputFormat.setInputPaths(conf,new Path(args[0]));
25. FileOutputFormat.setOutputPath(conf,new Path(args[1]));
26. JobClient.runJob(conf);
27. }
28. }
Prerequisite
Before learning HBase, you must have the knowledge of Hadoop and Java.
Audience
Our HBase tutorial is designed to help beginners and professionals.
What is HBase
HBase is an open-source, sorted-map data store built on top of Hadoop. It is column-oriented and horizontally scalable.
It is based on Google's BigTable. It has a set of tables which keep data in key-value format. HBase is well suited for sparse data sets, which
are very common in big data use cases. HBase provides APIs enabling development in practically any programming language. It is a
part of the Hadoop ecosystem that provides random real-time read/write access to data in the Hadoop File System.
Why HBase
An RDBMS gets exponentially slower as the data becomes large.
An RDBMS expects data to be highly structured, i.e., able to fit in a well-defined schema.
Any change in schema might require downtime.
For sparse datasets, there is too much overhead in maintaining NULL values.
Features of Hbase
Horizontally scalable: you can add any number of nodes to the cluster anytime, and any number of columns can be added anytime.
Automatic Failover: automatic failover is a resource that allows a system administrator to automatically switch data
handling to a standby system in the event of a system compromise.
Integration with the Map/Reduce framework: all the commands and Java code internally implement Map/Reduce to do
the task, and it is built over the Hadoop Distributed File System.
It is a sparse, distributed, persistent, multidimensional sorted map, which is indexed by row key, column key, and timestamp.
It is often referred to as a key-value store, a column-family-oriented database, or as storing versioned maps of maps.
Fundamentally, it is a platform for storing and retrieving data with random access.
It doesn't care about datatypes (you can store an integer in one row and a string in another for the same column).
It doesn't enforce relationships within your data.
It is designed to run on a cluster of computers, built using commodity hardware.
HBase Read
A read against HBase must be reconciled between the HFiles, the MemStore, and the BlockCache. The BlockCache is designed to keep
frequently accessed data from the HFiles in memory so as to avoid disk reads. Each column family has its own BlockCache. The BlockCache
contains data in the form of 'blocks', the unit of data that HBase reads from disk in a single pass. The HFile is physically laid out as a sequence
of blocks plus an index over those blocks. This means reading a block from HBase requires only looking up that block's location in the
index and retrieving it from disk.
Block: It is the smallest indexed unit of data and the smallest unit of data that can be read from disk. The default size is 64 KB.
Scenario, when smaller block size is preferred: To perform random lookups. Having smaller blocks creates a larger index and
thereby consumes more memory.
Scenario, when larger block size is preferred: To perform sequential scans frequently. This allows you to save on memory because
larger blocks mean fewer index entries and thus a smaller index.
Reading a row from HBase requires first checking the MemStore, then the BlockCache, and finally the HFiles on disk.
HBase Write
When a write is made, by default, it goes into two places:
write-ahead log (WAL), HLog, and
in-memory write buffer, MemStore.
Clients don't interact directly with the underlying HFiles during writes; rather, writes go to the WAL and the MemStore in parallel. Every write
to HBase requires confirmation from both the WAL and the MemStore.
HBase MemStore
The MemStore is a write buffer where HBase accumulates data in memory before a permanent write.
Its contents are flushed to disk to form an HFile when the MemStore fills up.
It doesn't write to an existing HFile but instead forms a new file on every flush.
The HFile is the underlying storage format for HBase.
HFiles belong to a column family (there is one MemStore per column family). A column family can have multiple HFiles, but
the reverse isn't true.
The size of the MemStore is defined by the hbase-site.xml property hbase.hregion.memstore.flush.size.
What happens, when the server hosting a MemStore that has not yet been flushed crashes?
Every server in an HBase cluster keeps a WAL to record changes as they happen. The WAL is a file on the underlying file system. A write
isn't considered successful until the new WAL entry is successfully written; this guarantees durability.
If HBase goes down, the data that was not yet flushed from the MemStore to the HFile can be recovered by replaying the WAL, which is taken
care of by the HBase framework.
HBase Installation
The prerequisites for HBase installation are Java and Hadoop installed on your Linux machine.
HBase can be installed in three modes: standalone, pseudo-distributed, and fully distributed.
Download the HBase package from http://www.interior-dsgn.com/apache/hbase/stable/ and unzip it with the commands below.
1. $su
2. $password: enter your password here
3. mv hbase-0.99.1/* Hbase/
1. cd /usr/local/Hbase/conf
2. gedit hbase-env.sh
Replace the existing JAVA_HOME value with your current value as shown below.
1. export JAVA_HOME=/usr/lib/jvm/java-1.7.0
Inside /usr/local/Hbase/conf you will find hbase-site.xml. Open it and add the code below within the configuration element.
1. <configuration>
2. <!-- Here you have to set the path where you want HBase to store its files. -->
3. <property>
4. <name>hbase.rootdir</name>
5. <value>file:/home/hadoop/HBase/HFiles</value>
6. </property>
7.
8. <!-- Here you have to set the path where you want HBase to store its built-in ZooKeeper files. -->
9. <property>
10. <name>hbase.zookeeper.property.dataDir</name>
11. <value>/home/hadoop/zookeeper</value>
12. </property>
13. </configuration>
Now start HBase by running the start-hbase.sh script present in the bin folder of HBase.
1. $cd /usr/local/HBase/bin
2. $./start-hbase.sh
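To confirm that HBase is running, you can open the HBase shell from the same bin folder and check the cluster status:
1. $ ./hbase shell
2. hbase> status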
Use Case
We have to import the data present in the file below into an HBase table, creating the table through the Java API.
Data_file.txt contains the data below:
1. 1,India,Bihar,Champaran,2009,April,P1,1,5
2. 2,India, Bihar,Patna,2009,May,P1,2,10
3. 3,India, Bihar,Bhagalpur,2010,June,P2,3,15
4. 4,United States,California,Fresno,2009,April,P2,2,5
5. 5,United States,California,Long Beach,2010,July,P2,4,10
6. 6,United States,California,San Francisco,2011,August,P1,6,20
1. "sample,region,time.product,sale,profit".
Column family region has three column qualifiers: country, state, city
Column family Time has two column qualifiers: year, month
Jar Files
Make sure that the following jars are present while writing the code, as they are required by HBase.
a. commons-logging-1.0.4
b. commons-logging-api-1.0.4
c. hadoop-core-0.20.2-cdh3u2
d. hbase-0.90.4-cdh3u2
e. log4j-1.2.15
f. zookeeper-3.3.3-cdh3u0
Program Code
1. import java.io.BufferedReader;
2. import java.io.File;
3. import java.io.FileReader;
4. import java.io.IOException;
5. import java.util.StringTokenizer;
6.
7. import org.apache.hadoop.conf.Configuration;
8. import org.apache.hadoop.hbase.HBaseConfiguration;
9. import org.apache.hadoop.hbase.HColumnDescriptor;
10. import org.apache.hadoop.hbase.HTableDescriptor;
11. import org.apache.hadoop.hbase.client.HBaseAdmin;
12. import org.apache.hadoop.hbase.client.HTable;
13. import org.apache.hadoop.hbase.client.Put;
14. import org.apache.hadoop.hbase.util.Bytes;
15.
16.
17. public class readFromFile {
18. public static void main(String[] args) throws IOException{
19. if(args.length==1)
20. {
21. Configuration conf = HBaseConfiguration.create(new Configuration());
22. HBaseAdmin hba = new HBaseAdmin(conf);
23. if(!hba.tableExists(args[0])){
24. HTableDescriptor ht = new HTableDescriptor(args[0]);
25. ht.addFamily(new HColumnDescriptor("sample"));
26. ht.addFamily(new HColumnDescriptor("region"));
27. ht.addFamily(new HColumnDescriptor("time"));
28. ht.addFamily(new HColumnDescriptor("product"));
29. ht.addFamily(new HColumnDescriptor("sale"));
30. ht.addFamily(new HColumnDescriptor("profit"));
31. hba.createTable(ht);
32. System.out.println("New Table Created");
33.
34. HTable table = new HTable(conf,args[0]);
35.
36. File f = new File("/home/training/Desktop/data");
37. BufferedReader br = new BufferedReader(new FileReader(f));
38. String line = br.readLine();
39. int i =1;
40. String rowname="row";
41. while(line!=null && line.length()!=0){
42. System.out.println("Ok till here");
43. StringTokenizer tokens = new StringTokenizer(line,",");
44. rowname = "row"+i;
45. Put p = new Put(Bytes.toBytes(rowname));
46. p.add(Bytes.toBytes("sample"),Bytes.toBytes("sampleNo."),
47. Bytes.toBytes(Integer.parseInt(tokens.nextToken())));
48. p.add(Bytes.toBytes("region"),Bytes.toBytes("country"),Bytes.toBytes(tokens.nextToken()));
49. p.add(Bytes.toBytes("region"),Bytes.toBytes("state"),Bytes.toBytes(tokens.nextToken()));
50. p.add(Bytes.toBytes("region"),Bytes.toBytes("city"),Bytes.toBytes(tokens.nextToken()));
51. p.add(Bytes.toBytes("time"),Bytes.toBytes("year"),Bytes.toBytes(Integer.parseInt(tokens.nextToken())));
52. p.add(Bytes.toBytes("time"),Bytes.toBytes("month"),Bytes.toBytes(tokens.nextToken()));
53. p.add(Bytes.toBytes("product"),Bytes.toBytes("productNo."),Bytes.toBytes(tokens.nextToken()));
54. p.add(Bytes.toBytes("sale"),Bytes.toBytes("quantity"),Bytes.toBytes(Integer.parseInt(tokens.nextToken())));
55. p.add(Bytes.toBytes("profit"),Bytes.toBytes("earnings"),Bytes.toBytes(tokens.nextToken()));
56. i++;
57. table.put(p);
58. line = br.readLine();
59. }
60. br.close();
61. table.close();
62. }
63. else
64. System.out.println("Table Already exists.Please enter another table name");
65. }
66. else
67. System.out.println("Please Enter the table name through command line");
68. }
69. }
Hive Tutorial
Hive tutorial provides basic and advanced concepts of Hive. Our Hive tutorial is designed for beginners and professionals.
Apache Hive is a data warehouse system for Hadoop that runs SQL-like queries called HQL (Hive Query Language), which get
internally converted to MapReduce jobs. Hive was developed by Facebook. It supports Data Definition Language, Data Manipulation
Language, and user-defined functions.
Our Hive tutorial includes all topics of Apache Hive, such as Hive installation, Hive data types, Hive table partitioning, Hive DDL
commands, Hive DML commands, Hive sort by vs order by, Hive joining tables, etc.
Prerequisite
Before learning Hive, you must have the knowledge of Hadoop and Java.
Audience
Our Hive tutorial is designed to help beginners and professionals.
What is HIVE
Hive is a data warehouse system which is used to analyze structured data. It is built on the top of Hadoop. It was developed by
Facebook.
Hive provides the functionality of reading, writing, and managing large datasets residing in distributed storage. It runs SQL-like queries
called HQL (Hive Query Language), which get internally converted to MapReduce jobs.
Using Hive, we can skip the requirement of the traditional approach of writing complex MapReduce programs. Hive supports Data
Definition Language (DDL), Data Manipulation Language (DML), and User Defined Functions (UDF).
Features of Hive
These are the following features of Hive:
Hive is fast and scalable.
It provides SQL-like queries (i.e., HQL) that are implicitly transformed to MapReduce or Spark jobs.
It is capable of analyzing large datasets stored in HDFS.
It allows different storage types such as plain text, RCFile, and HBase.
It uses indexing to accelerate queries.
It can operate on compressed data stored in the Hadoop ecosystem.
It supports user-defined functions (UDFs), through which users can provide their own functionality.
Limitations of Hive
Hive is not capable of handling real-time data.
It is not designed for online transaction processing.
Hive queries have high latency.
Hive Client
Hive allows writing applications in various languages, including Java, Python, and C++. It supports different types of clients, such as:
Thrift Server - It is a cross-language service provider platform that serves requests from all those programming
languages that support Thrift.
JDBC Driver - It is used to establish a connection between hive and Java applications. The JDBC Driver is present in the
class org.apache.hadoop.hive.jdbc.HiveDriver.
ODBC Driver - It allows the applications that support the ODBC protocol to connect to Hive.
Hive Services
The following are the services provided by Hive:-
Hive CLI - The Hive CLI (Command Line Interface) is a shell where we can execute Hive queries and commands.
Hive Web User Interface - The Hive Web UI is just an alternative of Hive CLI. It provides a web-based GUI for
executing Hive queries and commands.
Hive MetaStore - It is a central repository that stores all the structural information of the various tables and partitions in the
warehouse. It also includes metadata of columns and their type information, the serializers and deserializers used
to read and write data, and the corresponding HDFS files where the data is stored.
Hive Server - It is referred to as Apache Thrift Server. It accepts the request from different clients and provides it to
Hive Driver.
Hive Driver - It receives queries from different sources like web UI, CLI, Thrift, and JDBC/ODBC driver. It transfers
the queries to the compiler.
Hive Compiler - The purpose of the compiler is to parse the query and perform semantic analysis on the different query
blocks and expressions. It converts HiveQL statements into MapReduce jobs.
Hive Execution Engine - The optimizer generates the logical plan in the form of a DAG of map-reduce tasks and HDFS tasks.
In the end, the execution engine executes the incoming tasks in the order of their dependencies.
Apache Hive Installation
In this section, we will perform the Hive installation.
Pre-requisite
Java Installation - Check whether the Java is installed or not using the following command.
1. $ java -version
Hadoop Installation - Check whether the Hadoop is installed or not using the following command.
1. $hadoop version
If either of them is not installed on your system, install it before proceeding with the Hive installation.
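Assuming Hive has been downloaded and extracted to /home/codegyani/apache-hive-1.2.2-bin (the path used below), add the following lines to the ~/.bashrc file to set up the Hive environment, then reload the file and start the Hive shell: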
1. export HIVE_HOME=/home/codegyani/apache-hive-1.2.2-bin
2. export PATH=$PATH:/home/codegyani/apache-hive-1.2.2-bin/bin
1. $ source ~/.bashrc
1. $ hive
HIVE Data Types
Hive data types are categorized into numeric types, string types, misc types, and complex types. A list of Hive data types is given below.
Integer Types
Type - Size - Range
TINYINT - 1-byte signed integer - -128 to 127
SMALLINT - 2-byte signed integer - -32,768 to 32,767
INT - 4-byte signed integer - -2,147,483,648 to 2,147,483,647
BIGINT - 8-byte signed integer - -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
Decimal Type
Type - Size - Description
FLOAT - 4-byte - Single precision floating point number
DOUBLE - 8-byte - Double precision floating point number
Date/Time Types
TIMESTAMP
It supports traditional UNIX timestamp with optional nanosecond precision.
As Integer numeric type, it is interpreted as UNIX timestamp in seconds.
As Floating point numeric type, it is interpreted as UNIX timestamp in seconds with decimal precision.
As string, it follows java.sql.Timestamp format "YYYY-MM-DD HH:MM:SS.fffffffff" (9 decimal place precision)
DATES
The DATE value is used to specify a particular year, month, and day, in the form YYYY-MM-DD. However, it does not provide the time
of day. The range of the DATE type lies between 0000-01-01 and 9999-12-31.
String Types
STRING
A string is a sequence of characters. Its values can be enclosed within single quotes (') or double quotes (").
Varchar
The varchar is a variable-length type whose length lies between 1 and 65535; the length specifies the maximum number of characters
allowed in the character string.
CHAR
The char is a fixed-length type whose maximum length is fixed at 255.
Complex Type
Type - Description - Example
Struct - It is similar to a C struct or an object where fields are accessed using the "dot" notation. - struct('James','Roy')
Map - It contains key-value tuples where the fields are accessed using array notation. - map('first','James','last','Roy')
Array - It is a collection of values of a similar type that are indexable using zero-based integers. - array('James','Roy')
Hive - Create Database
In Hive, a database is considered a catalog or namespace of tables. So, we can maintain multiple tables within a database, where a
unique name is assigned to each table. Hive also provides a default database named default.
Initially, we check the default database provided by Hive. So, to check the list of existing databases, follow the below
command: -
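(A typical command sequence; the database name demo matches the surrounding text:)
1. hive> show databases;
2. hive> create database demo;
3. hive> show databases;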
Each database must have a unique name. If we create two databases with the same name, Hive generates an error.
If we want to suppress the warning generated by Hive on creating the database with the same name, follow the below
command: -
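(A typical form of the command:)
1. hive> create database if not exists demo;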
After dropping the database demo and listing the databases again, demo is no longer present in the list. Hence, the database is dropped successfully.
If we try to drop the database that doesn't exist, the following error generates:
However, if we want to suppress the error generated by Hive on dropping a database that doesn't exist, follow the
below command: -
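(A typical form of the command:)
1. hive> drop database if exists demo;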
In Hive, it is not allowed to directly drop a database that contains tables. In such a case, we can drop the database
either by dropping its tables first or by using the CASCADE keyword with the command.
Let's see the cascade command used to drop the database:-
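(A typical form of the command:)
1. hive> drop database if exists demo cascade;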
This command automatically drops the tables present in the database first.
Hive - Create Table
In Hive, we can create a table by using conventions similar to SQL. It supports a wide range of flexibility regarding where the data files for
tables are stored. It provides two types of tables: -
Internal table
External table
Internal Table
The internal tables are also called managed tables, as the lifecycle of their data is controlled by Hive. By default, these tables are
stored in a subdirectory under the directory defined by hive.metastore.warehouse.dir (i.e., /user/hive/warehouse). The internal tables are
not flexible enough to share with other tools like Pig. If we drop an internal table, Hive deletes both the table schema and the data.
Let's create an internal table by using the following command:-
1. hive> create table demo.employee (Id int, Name string , Salary float)
2. row format delimited
3. fields terminated by ',' ;
Here, the command also includes the information that the data is separated by ','.
Let's see the metadata of the created table by using the following command:-
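(A typical command for this is:)
1. hive> describe demo.employee;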
External Table
The external table allows us to create and access a table and its data externally. The external keyword is used to specify the external
table, whereas the location keyword is used to determine the location of the loaded data.
As the table is external, the data is not present in the Hive warehouse directory. Therefore, if we try to drop the table, the metadata of the table will
be deleted, but the data still exists.
To create an external table, follow the below steps: -
Let's create a directory on HDFS by using the following command: -
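1. $ hdfs dfs -mkdir /HiveDirectory
(The directory name matches the location clause used in the create statement below.)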
1. hive> create external table emplist (Id int, Name string , Salary float)
2. row format delimited
3. fields terminated by ','
4. location '/HiveDirectory';
Hive - Load Data
Once the internal table has been created, the next step is to load data into it. In Hive, we can easily load data from any file into the
table.
Let's load the data of the file into the database by using the following command: -
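(A typical form of the command; the file path is only a placeholder and must point to your actual data file:)
1. hive> load data local inpath '<path-to-data-file>' into table demo.employee;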
If we want to add more data to the current table, execute the same query again, just updating the file name.
In Hive, if we try to load unmatched data (i.e., one or more columns' data doesn't match the data type of the specified table
columns), it will not throw any exception. However, it stores a NULL value at the position of the unmatched field.
Let's add one more file to the current table. This file contains the unmatched data.
Here, the third column contains data of string type, while the table expects float type data for that column. So, this creates an unmatched
data situation.
Now, load the data into the table.
Here, we can see the Null values at the position of unmatched data.
Partitioning in Hive
The partitioning in Hive means dividing the table into some parts based on the values of a particular column like date, course, city or
country. The advantage of partitioning is that since the data is stored in slices, the query response time becomes faster.
As we know that Hadoop is used to handle the huge amount of data, it is always required to use the best approach to deal with it. The
partitioning in Hive is the best example of it.
Let's assume we have data for 10 million students studying at an institute. Now, we have to fetch the students of a particular course. If
we use the traditional approach, we have to scan the entire data set, which leads to performance degradation. In such a case, we can
adopt a better approach, i.e., partitioning in Hive, and divide the data among different datasets based on particular columns.
The partitioning in Hive can be executed in two ways -
Static partitioning
Dynamic partitioning
Static Partitioning
In static or manual partitioning, it is required to pass the values of partitioned columns manually while loading the data into the table.
Hence, the data file doesn't contain the partitioned columns.
Example of Static Partitioning
First, select the database in which we want to create a table.
Create the table and provide the partitioned columns by using the following command: -
1. hive> create table student (id int, name string, age int, institute string)
2. partitioned by (course string)
3. row format delimited
4. fields terminated by ',';
Load the data into the table and pass the values of partition columns with it by using the following command: -
Let's retrieve the entire data of the table by using the following command: -
Now, try to retrieve the data based on partitioned columns by using the following command: -
In this case, we are not examining the entire data. Hence, this approach improves query response time.
In the same way, we can retrieve the data of any other partition by filtering on the partitioned column.
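Example of Dynamic Partitioning
In dynamic partitioning, the values of the partitioned column are taken from the data itself, so it is not required to pass them manually. First, create a dummy (non-partitioned) table to hold the raw data, then create the partitioned table, and finally insert the data from the dummy table into the partitioned table, as shown below.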
1. hive> create table stud_demo(id int, name string, age int, institute string, course string)
2. row format delimited
3. fields terminated by ',';
1. hive> create table student_part (id int, name string, age int, institute string)
2. partitioned by (course string)
3. row format delimited
4. fields terminated by ',';
Now, insert the data of dummy table into the partition table.
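(A typical way to do this uses Hive's dynamic partitioning settings; the column list follows the tables defined above:)
1. hive> set hive.exec.dynamic.partition=true;
2. hive> set hive.exec.dynamic.partition.mode=nonstrict;
3. hive> insert into table student_part partition(course) select id, name, age, institute, course from stud_demo;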
1) What is Hadoop?
Hadoop is a distributed computing platform. It is written in Java. It consists of the features like Google File System and MapReduce.
10) Which command is used to retrieve the status of the daemons running in the Hadoop cluster?
The 'jps' command is used to retrieve the status of the daemons running in the Hadoop cluster.
40) What is the difference between Hadoop and other data processing tools?
Hadoop facilitates you to increase or decrease the number of mappers without worrying about the volume of data to be processed.
PYTHON
FOR
BEGINNERS
CHAPTER 1
Installing Python
Visit the link https://www.python.org/downloads/ to download the latest release of Python.
Double-click the executable file which is downloaded; the following window will open. Select Customize installation and proceed.
Now click Install Now.
When it finishes, you see a screen that says the Setup was successful.
Step 4) A new pop-up will appear. Type the name you want for the file (here we use "HelloWorld") and hit "OK".
Step 5) Now type a simple program - print('Hello World!').
Step 6) Now go up to the "Run" menu and select "Run" to run your program.
Step 7) You can see the output of your program at the bottom of the screen.
Step 8) Don't worry if you don't have the PyCharm editor installed; you can still run the code from the command prompt. Enter the correct path of the
file at the command prompt to run the program.
Concatenate Variables
Let's see whether we can concatenate different data types, like a string and a number, together. For example, we will concatenate "world" with the
number 2020.
Unlike Java, which concatenates a number with a string without the number being declared as a string, Python requires the number to be declared as a string;
otherwise it will show a TypeError.
For the following code, you will get a TypeError:
a="world"
b = 2020
print(a+b)
Once the integer is declared as a string, it can be concatenated: "world" + str(2020) = "world2020" in the output.
a="world"
b = 2020
print(a+str(b))
1. Variable "f" is global in scope and it is assigned value 101 which is printed in output
2. Variable f is again declared in function and it assumes local scope. That is assigned value "I am learning Python." which
is printed out as output. This variable is different from the global variable "f" define earlier in this chapter
3. Once function call is over, the local variable f is destroyed. At line 12, when u again, print the value of "f" is it displays
the value of global variable f=101
Python 2 Example
# Declare a variable and initialize it
f = 101
print f
# Global vs. local variables in functions
def someFunction():
    # global f
    f = 'I am learning Python'
    print f
someFunction()
print f
Python 3 Example
# Declare a variable and initialize it
f = 101
print(f)
# Global vs. local variables in functions
def someFunction():
    # global f
    f = 'I am learning Python'
    print(f)
someFunction()
print(f)
Using the keyword global, we can reference the global variable inside a function.
1. Variable "f" is global in scope and is assigned the value 101, which is printed in the output.
2. Variable f is declared using the keyword global. This is NOT a local variable but the same global variable declared earlier.
Hence, when we print its value, the output is 101.
3. We changed the value of "f" inside the function. Once the function call is over, the changed value of the variable "f" persists. At
line 12, when we print the value of "f" again, it displays the value "changing global variable".
Python 2 Example
f = 101;
print f
# Global vs.local variables in functions
def someFunction():
    global f
    print f
    f = "changing global variable"
someFunction()
print f
Python 3 Example
f = 101;
print(f)
# Global vs.local variables in functions
def someFunction():
    global f
    print(f)
    f = "changing global variable"
someFunction()
print(f)
Delete a variable
We can also delete a variable using the command del "variable name".
In the example below, we delete the variable f; when we then try to print it, we get the error "name 'f' is not defined", which means
the variable has been deleted.
f = 11;
print(f)
del f
print(f)
Summary:
● Variables are referred to as "envelopes" or "buckets" where information can be maintained and referenced. Like any other programming
language, Python also uses variables to store information.
● Variables can be declared with any name, or even letters like a, aa, abc, etc.
● Variables can be re-declared even after we have declared them once.
● In Python we cannot concatenate a string with a number directly; you need to convert the number to a string, and after that you
can concatenate the number with the string.
● Declare a local variable when you want to use it only in the current function.
● Declare a global variable when you want to use the same variable for the rest of the program.
● To delete a variable, use the keyword "del".
CHAPTER 2
Python String
So far, we have discussed numbers as a standard data type in Python. In this section of the tutorial, we will discuss the most
popular data type in Python, i.e., the string.
In python, strings can be created by enclosing the character or the sequence of characters in the quotes. Python allows us to use single
quotes, double quotes, or triple quotes to create the string.
Consider the following example in python to create a string.
str = "Hi Python !"
Here, if we check the type of the variable str using a Python script,
print(type(str)), it will print string (str).
In Python, strings are treated as sequences of characters, which means that Python doesn't support a character data type; instead, a single
character written as 'p' is treated as a string of length 1.
Strings indexing and splitting
Like in other languages, the indexing of Python strings starts from 0. For example, the string "HELLO" is indexed with H at index 0 and O at
index 4, as illustrated in the example below.
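A small example illustrates the same indexing:
str = "HELLO"
print(str[0])   # H - indexing starts at 0
print(str[4])   # O - the last character is at index length-1
print(str[1:4]) # ELL - a slice from index 1 up to, but not including, index 4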
Reassigning strings
Replacing the content of a string is as easy as assigning the variable to a new string. However, the string object doesn't support item assignment, i.e., a
string can only be replaced with a new string; its content cannot be partially replaced, since strings are immutable in Python.
Consider the following example.
Example 1
str = "HELLO"
str[0] = "h"
print(str)
Output:
Traceback (most recent call last):
File "12.py", line 2, in <module>
str[0] = "h";
TypeError: 'str' object does not support item assignment
However, as in example 1, the variable str can be assigned completely new content, as shown in the following example.
Example 2
str = "HELLO"
print(str)
str = "hello"
print(str)
Output:
HELLO
hello
Example
Consider the following example to understand the real use of Python operators.
str = "Hello"
str1 = " world"
print (str* 3 ) # prints HelloHelloHello
print (str+str1) # prints Hello world
print (str[ 4 ]) # prints o
print (str[ 2 : 4 ]); # prints ll
print ( 'w' in str) # prints false as w is not present in str
print ( 'wo' not in str1) # prints false as wo is present in str1.
print (r'C://python37' ) # prints C://python37 as it is written
print ( "The string str : %s" %(str)) # prints The string str : Hello
Output:
HelloHelloHello
Hello world
o
ll
False
False
C://python37
The string str : Hello
Example
tuple1 = (10, 20, 30, 40, 50, 60)
print(tuple1)
count = 0
for i in tuple1:
    print("tuple1[%d] = %d" % (count, i))
    count = count + 1
Output:
(10, 20, 30, 40, 50, 60)
tuple1[0] = 10
tuple1[1] = 20
tuple1[2] = 30
tuple1[3] = 40
tuple1[4] = 50
tuple1[5] = 60
Example 2
tuple1 = tuple(input("Enter the tuple elements ..."))
print(tuple1)
count = 0
for i in tuple1:
    print("tuple1[%d] = %s" % (count, i))
    count = count + 1
Output:
Enter the tuple elements ...12345
('1', '2', '3', '4', '5')
tuple1[0] = 1
tuple1[1] = 2
tuple1[2] = 3
tuple1[3] = 4
tuple1[4] = 5
However, if we try to reassign the items of a tuple, we would get an error as the tuple object doesn't support the item assignment.
An empty tuple can be written as follows.
T3 = ()
The tuple having a single value must include a comma as given below.
T4 = (90,)
A tuple is indexed in the same way as the lists. The items in the tuple can be accessed by using their specific index value.
We will see all these aspects of tuple in this section of the tutorial.
Tuple indexing and splitting
The indexing and slicing in tuple are similar to lists. The indexing in the tuple starts from 0 and goes to length(tuple) - 1.
The items in the tuple can be accessed by using the slice operator. Python also allows us to use the colon operator to access multiple
items in the tuple.
Consider the following example to understand the indexing and slicing in detail.
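tuple1 = (1, 2, 3, 4, 5, 6, 7)
print(tuple1[0])    # 1 - first element
print(tuple1[1:4])  # (2, 3, 4) - slice from index 1 up to, but not including, index 4
print(tuple1[:3])   # (1, 2, 3) - from the start of the tuple
print(tuple1[4:])   # (5, 6, 7) - up to the end of the tuple
print(tuple1[-1])   # 7 - a negative index counts from the right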
Unlike lists, the tuple items can not be deleted by using the del keyword as tuples are immutable. To delete an entire tuple, we can use
the del keyword with the tuple name.
Consider the following example.
tuple1 = (1, 2, 3, 4, 5, 6)
print(tuple1)
del tuple1[0]
print(tuple1)
del tuple1
print(tuple1)
Output:
(1, 2, 3, 4, 5, 6)
Traceback (most recent call last):
  File "tuple.py", line 3, in <module>
    del tuple1[0]
TypeError: 'tuple' object doesn't support item deletion
Like lists, tuple elements can be accessed in both directions. The rightmost (last) element of the tuple can be accessed by using
the index -1. The elements are traversed from right to left when using negative indexing.
Consider the following example.
tuple1 = (1, 2, 3, 4, 5)
print(tuple1[-1])
print(tuple1[-4])
Output:
5
2
A tuple can be used as a key inside a dictionary due to its immutable nature.
CHAPTER 4
Python Dictionary
Dictionary is used to implement the key-value pair in python. The dictionary is the data type in python which can simulate the real-life
data arrangement where some specific value exists for some particular key.
In other words, we can say that a dictionary is the collection of key-value pairs where the value can be any python object whereas the
keys are the immutable python object, i.e., Numbers, string or tuple.
A Python dictionary is similar to a hash map in Java.
Creating the dictionary
The dictionary can be created by using multiple key-value pairs. Each key is separated from its value by a colon (:), the pairs are
separated by commas, and the whole collection of key-value pairs is enclosed within the curly braces {}.
The syntax to define the dictionary is given below.
Dict = {"Name": "Ayush","Age": 22}
In the above dictionary Dict, the keys Name and Age are strings, which are immutable objects.
Let's see an example to create a dictionary and printing its content.
Employee = {"Name": "John", "Age": 29, "salary":25000,"Company":"GOOGLE"}
print(type(Employee))
print("printing Employee data .... ")
print(Employee)
Output
<class 'dict'>
printing Employee data ....
{'Age': 29, 'salary': 25000, 'Name': 'John', 'Company': 'GOOGLE'}
Accessing the dictionary values
We have discussed how the data can be accessed in the list and tuple by using the indexing.
However, the values can be accessed in the dictionary by using the keys as keys are unique in the dictionary.
The dictionary values can be accessed in the following way.
Employee = {"Name": "John", "Age": 29, "salary": 25000, "Company": "GOOGLE"}
print(type(Employee))
print("printing Employee data .... ")
print("Name : %s" % Employee["Name"])
print("Age : %d" % Employee["Age"])
print("Salary : %d" % Employee["salary"])
print("Company : %s" % Employee["Company"])
Output:
<class 'dict'>
printing Employee data ....
Name : John
Age : 29
Salary : 25000
Company : GOOGLE
Python also provides the get() method as an alternative way to access dictionary values. It gives the same result as indexing with a key,
but it returns None (or a supplied default) instead of raising an error when the key is missing, as the sketch below shows.
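A minimal sketch of get(), reusing the Employee dictionary from the examples above (the "Address" key is deliberately absent):
Employee = {"Name": "John", "Age": 29, "salary": 25000, "Company": "GOOGLE"}
print(Employee.get("Name"))            # prints John
print(Employee.get("Age"))             # prints 29
print(Employee.get("Address"))         # prints None, no KeyError is raised
print(Employee.get("Address", "N/A"))  # prints N/A, the supplied default value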
Updating dictionary values
The dictionary is a mutable data type, and its values can be updated by using the specific keys.
Let's see an example to update the dictionary values.
Employee = {"Name": "John", "Age": 29, "salary": 25000, "Company": "GOOGLE"}
print(type(Employee))
print("printing Employee data .... ")
print(Employee)
print("Enter the details of the new employee....")
Employee["Name"] = input("Name: ")
Employee["Age"] = int(input("Age: "))
Employee["salary"] = int(input("Salary: "))
Employee["Company"] = input("Company:")
print("printing the new data")
print(Employee)
Output:
<class 'dict'>
printing Employee data ....
{'Name': 'John', 'salary': 25000, 'Company': 'GOOGLE', 'Age': 29}
Enter the details of the new employee....
Name: David
Age: 19
Salary: 8900
Company:JTP
printing the new data
{'Name': 'David', 'salary': 8900, 'Company': 'JTP', 'Age': 19}
Deleting elements using del keyword
The items of the dictionary can be deleted by using the del keyword as given below.
Employee = {"Name": "John", "Age": 29, "salary": 25000, "Company": "GOOGLE"}
print(type(Employee))
print("printing Employee data .... ")
print(Employee)
print("Deleting some of the employee data")
del Employee["Name"]
del Employee["Company"]
print("printing the modified information ")
print(Employee)
print("Deleting the dictionary: Employee")
del Employee
print("Lets try to print it again ")
print(Employee)
Output:
<class 'dict'>
printing Employee data ....
{'Age': 29, 'Company': 'GOOGLE', 'Name': 'John', 'salary': 25000}
Deleting some of the employee data
printing the modified information
{'Age': 29, 'salary': 25000}
Deleting the dictionary: Employee
Lets try to print it again
Traceback (most recent call last):
File "list.py", line 13, in <module>
print(Employee)
NameError: name 'Employee' is not defined
Iterating Dictionary
A dictionary can be iterated using the for loop as given below.
Example 1
# for loop to print all the keys of a dictionary
Employee = {"Name": "John", "Age": 29, "salary": 25000, "Company": "GOOGLE"}
for x in Employee:
    print(x)
Output:
Name
Company
salary
Age
Example 2
#for loop to print all the values of the dictionary
Employee = {"Name": "John", "Age": 29, "salary": 25000, "Company": "GOOGLE"}
for x in Employee:
    print(Employee[x])
Output:
GOOGLE
25000
John
29
Example 4
#for loop to print the items of the dictionary by using items() method.
Employee = {"Name": "John", "Age": 29, "salary": 25000, "Company": "GOOGLE"}
for x in Employee.items():
    print(x)
Output:
('Name', 'John')
('Age', 29)
('salary', 25000)
('Company', 'GOOGLE')
Properties of Dictionary keys
1. In a dictionary, we cannot store multiple values for the same key. If we pass more than one value for a single key, then the value
which is assigned last is considered as the value of the key.
Consider the following example.
Employee = {"Name": "John", "Age": 29, "Salary": 25000, "Company": "GOOGLE", "Name": "Johnn"}
for x, y in Employee.items():
    print(x, y)
Output:
Salary 25000
Company GOOGLE
Name Johnn
Age 29
2. In python, the key cannot be any mutable object. We can use numbers, strings, or tuples as keys, but we cannot use a mutable
object such as a list as the key in the dictionary.
Consider the following example.
Employee = {"Name": "John", "Age": 29, "salary": 25000, "Company": "GOOGLE", [100, 201, 301]: "Department ID"}
for x, y in Employee.items():
    print(x, y)
Output:
TypeError: unhashable type: 'list'
Arithmetic operators
Arithmetic operators are used to perform arithmetic operations between two operands. They include + (addition), - (subtraction),
* (multiplication), / (division), % (remainder), // (floor division), and ** (exponent).
Consider the following table for a detailed explanation of arithmetic operators.
+ (Addition) - It is used to add two operands. For example, if a = 20, b = 10 => a + b = 30
- (Subtraction) - It is used to subtract the second operand from the first operand. For example, if a = 20, b = 10 => a - b = 10
/ (Divide) - It returns the quotient after dividing the first operand by the second operand. For example, if a = 20, b = 10 => a / b = 2.0
* (Multiplication) - It is used to multiply one operand with the other. For example, if a = 20, b = 10 => a * b = 200
% (Remainder) - It returns the remainder after dividing the first operand by the second operand. For example, if a = 20, b = 10 => a % b = 0
** (Exponent) - It raises the first operand to the power of the second operand.
// (Floor division) - It gives the floor value of the quotient produced by dividing the two operands.
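A minimal sketch that exercises these operators; the values a = 20 and b = 10 follow the examples above, and the extra operands 2, 3, and 7 are assumptions chosen only to show non-trivial results:
a, b = 20, 10
print(a + b)    # 30
print(a - b)    # 10
print(a * b)    # 200
print(a / b)    # 2.0  (true division always returns a float)
print(a % b)    # 0
print(a ** 2)   # 400
print(a // 3)   # 6   (floor division discards the fractional part)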
Comparison operator
Comparison operators are used to compare the values of the two operands and return a boolean True or False accordingly. The
comparison operators are described in the following table.
== If the values of the two operands are equal, then the condition becomes true.
!= If the values of the two operands are not equal, then the condition becomes true.
<= If the first operand is less than or equal to the second operand, then the condition becomes true.
>= If the first operand is greater than or equal to the second operand, then the condition becomes true.
<> If the values of the two operands are not equal, then the condition becomes true. (This operator exists only in Python 2; in Python 3 it has been removed and != must be used.)
> If the first operand is greater than the second operand, then the condition becomes true.
< If the first operand is less than the second operand, then the condition becomes true.
Assignment Operators
The assignment operators are used to assign the value of the right expression to the left operand. They are described below.
= It assigns the value of the right expression to the left operand.
+= It increases the value of the left operand by the value of the right operand and assigns the modified value back to the left operand. For
example, if a = 10, b = 20 => a += b will be equal to a = a + b and therefore, a = 30.
-= It decreases the value of the left operand by the value of the right operand and assigns the modified value back to the left operand.
For example, if a = 20, b = 10 => a -= b will be equal to a = a - b and therefore, a = 10.
*= It multiplies the value of the left operand by the value of the right operand and assigns the modified value back to the left operand.
For example, if a = 10, b = 20 => a *= b will be equal to a = a * b and therefore, a = 200.
%= It divides the value of the left operand by the value of the right operand and assigns the remainder back to the left operand. For
example, if a = 20, b = 10 => a %= b will be equal to a = a % b and therefore, a = 0.
**= a **= b will be equal to a = a ** b; for example, if a = 4, b = 2, a **= b will assign 4 ** 2 = 16 to a.
//= a //= b will be equal to a = a // b; for example, if a = 4, b = 3, a //= b will assign 4 // 3 = 1 to a.
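A short sketch chaining the compound assignment operators; the starting value and operands are assumptions chosen only so that each step produces a small, easy-to-check result:
a = 10
a += 20    # a is now 30
a -= 10    # a is now 20
a *= 2     # a is now 40
a %= 7     # a is now 5   (40 % 7)
a **= 2    # a is now 25
a //= 4    # a is now 6   (25 // 4)
print(a)   # prints 6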
Bitwise operator
The bitwise operators perform bit by bit operation on the values of the two operands.
For example,
if a = 7;
b = 6;
then, binary (a) = 0111
binary (b) = 0110
& (binary and) - The resulting bit will be 1 only if both the corresponding bits are 1, otherwise the resulting bit will be 0.
| (binary or) - The resulting bit will be 0 if both the bits are zero, otherwise the resulting bit will be 1.
^ (binary xor) - The resulting bit will be 1 if the two bits are different, otherwise the resulting bit will be 0.
~ (negation) - It calculates the negation of each bit of the operand, i.e., if the bit is 0, the resulting bit will be 1 and vice versa.
<< (left shift) - The left operand value is moved left by the number of bits specified by the right operand.
>> (right shift) - The left operand value is moved right by the number of bits specified by the right operand.
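A minimal sketch using the values a = 7 and b = 6 from above:
a, b = 7, 6     # binary 0111 and 0110
print(a & b)    # 6   (0110)
print(a | b)    # 7   (0111)
print(a ^ b)    # 1   (0001)
print(~a)       # -8  (bitwise negation of 7 in two's complement)
print(a << 1)   # 14  (01110)
print(a >> 1)   # 3   (0011)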
Logical Operators
The logical operators are used primarily in the expression evaluation to make a decision. Python supports the following logical
operators.
and - If both the expressions are true, then the condition will be true. If a and b are the two expressions, a → true, b → false => a
and b → false.
or - If one of the expressions is true, then the condition will be true. If a and b are the two expressions, a → true, b → false => a
or b → true.
not - If an expression a is true, then not (a) will be false and vice versa.
Membership Operators
Python membership operators are used to check the membership of a value inside a data structure. If the value is present in the data
structure, the result is true; otherwise it is false.
Operator and Description
in - It is evaluated to be true if the first operand is found in the second operand (list, tuple, or dictionary).
not in - It is evaluated to be true if the first operand is not found in the second operand (list, tuple, or dictionary).
Identity Operators
Operator and Description
is - It is evaluated to be true if the references on both sides point to the same object.
is not - It is evaluated to be true if the references on both sides do not point to the same object.
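A small sketch of the membership and identity operators; the list contents are purely illustrative:
fruits = ["apple", "banana", "cherry"]
print("apple" in fruits)        # True
print("mango" not in fruits)    # True
x = fruits
y = ["apple", "banana", "cherry"]
print(x is fruits)              # True, x and fruits refer to the same object
print(y is fruits)              # False, equal contents but a different object
print(y == fruits)              # True, == compares values, not identity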
Operator Precedence
The precedence of the operators is important to know since it determines which operator in an expression is evaluated first. The
precedence table of the operators in python is given below.
Operator and Description
** The exponent operator is given priority over all the others used in the expression.
<= < > >= Comparison operators (less than, less than or equal to, greater than, greater than or equal to).
= %= /= //= -= += *= **= Assignment operators.
By using functions, we can avoid rewriting same logic/code again and again in a program.
We can call python functions any number of times in a program and from any place in a program.
We can track a large python program easily when it is divided into multiple functions.
Reusability is the main achievement of python functions.
However, calling a function always adds some overhead in a python program.
Creating a function
In python, we can use def keyword to define the function. The syntax to define a function in python is given below.
def my_function():
    function-suite
    return <expression>
The function block starts after the colon (:), and all the statements of the block remain at the same indentation level.
A function can accept any number of parameters, but the arguments passed in the function call must match the parameters in the definition.
Function calling
In python, a function must be defined before the function calling otherwise the python interpreter gives an error. Once the function is
defined, we can call it from another function or the python prompt. To call the function, use the function name followed by the
parentheses.
A simple function that prints the message "Hello World" is given below.
def hello_world():
    print("hello world")
hello_world()
Output:
hello world
Parameters in function
The information into the functions can be passed as the parameters. The parameters are specified in the parentheses. We can give any
number of parameters, but we have to separate them with a comma.
Consider the following example which contains a function that accepts a string as the parameter and prints it.
Example 1
#defining the function
def func(name):
    print("Hi ", name)
func("John")    # calling the function, prints Hi  John
A function can be called by passing the following four types of arguments.
1. Required arguments
2. Keyword arguments
3. Default arguments
4. Variable-length arguments
Required Arguments
Till now, we have learned about function calling in python. However, we can provide the arguments at the time of function calling. As
far as the required arguments are concerned, these are the arguments which are required to be passed at the time of function calling with
the exact match of their positions in the function call and function definition. If either of the arguments is not provided in the function
call, or the position of the arguments is changed, then the python interpreter will show the error.
Consider the following example.
Example 1
#the argument name is the required argument to the function func
def func(name):
    message = "Hi " + name
    return message
name = input("Enter the name?")
print(func(name))
Output:
Enter the name?John
Hi John
Example 2
#the function simple_interest accepts three arguments and returns the simple interest accordingly
def simple_interest(p, t, r):
    return (p*t*r)/100
p = float(input("Enter the principle amount? "))
r = float(input("Enter the rate of interest? "))
t = float(input("Enter the time in years? "))
print("Simple Interest: ", simple_interest(p, t, r))
Output:
Enter the principle amount? 10000
Enter the rate of interest? 5
Enter the time in years? 2
Simple Interest: 1000.0
Example 3
#the function calculate returns the sum of two arguments a and b
def calculate(a, b):
    return a + b
calculate(10)  # this causes an error as we are missing the required argument b
Output:
TypeError: calculate() missing 1 required positional argument: 'b'
Keyword arguments
Python allows us to call the function with the keyword arguments. This kind of function call will enable us to pass the arguments in the
random order.
The name of the arguments is treated as the keywords and matched in the function calling and definition. If the same match is found, the
values of the arguments are copied in the function definition.
Consider the following example.
Example 1
#function func is called with the name and message as the keyword arguments
def func(name, message):
    print("printing the message with", name, "and ", message)
func(name="John", message="hello")  # name and message are copied with the values John and hello respectively
Output:
printing the message with John and hello
Example 2 providing the values in different order at the calling
#The function simple_interest(p, t, r) is called with keyword arguments; the order of the arguments doesn't matter in this case
def simple_interest(p, t, r):
    return (p*t*r)/100
print("Simple Interest: ", simple_interest(t=10, r=10, p=1900))
Output:
Simple Interest: 1900.0
If we provide the different name of arguments at the time of function call, an error will be thrown.
Consider the following example.
Example 3
#The function simple_interest(p, t, r) is called with the keyword arguments.
def simple_interest(p, t, r):
    return (p*t*r)/100
print("Simple Interest: ", simple_interest(time=10, rate=10, principle=1900))  # doesn't find the exact match of the argument names (keywords)
Output:
TypeError: simple_interest() got an unexpected keyword argument 'time'
Python allows us to provide a mix of required (positional) arguments and keyword arguments at the time of the function call. However, a positional argument must not follow a
keyword argument, i.e., once a keyword argument is encountered in the function call, all the following arguments must also be keyword arguments.
Consider the following example.
Example 4
def func(name1, message, name2):
    print("printing the message with", name1, ",", message, ",and", name2)
func("John", message="hello", name2="David")  # the first argument is a positional (not keyword) argument
Output:
printing the message with John , hello ,and David
The following example will cause an error due to an improper mix of keyword and positional arguments being passed in the function
call.
Example 5
def func(name1, message, name2):
    print("printing the message with", name1, ",", message, ",and", name2)
func("John", message="hello", "David")
Output:
SyntaxError: positional argument follows keyword argument
Default Arguments
Python allows us to initialize the arguments in the function definition. If the value of an argument is not provided at the time of the
function call, then that argument takes the default value given in the definition.
Example 1
def printme(name, age=22):
    print("My name is", name, "and age is", age)
printme(name="john")  # the variable age is not passed into the function, however the default value of age is used
Output:
My name is john and age is 22
Example 2
def printme(name, age=22):
    print("My name is", name, "and age is", age)
printme(name="john")  # the variable age is not passed into the function, however the default value of age is used
printme(age=10, name="David")  # the default value of age is overridden here, 10 will be printed as age
Output:
My name is john and age is 22
My name is David and age is 10
Variable length Arguments
In large projects, sometimes we may not know the number of arguments to be passed in advance. In such cases, Python gives us
the flexibility to pass a comma-separated list of values which are internally packed into a tuple at the function call.
However, in the function definition, we have to define the parameter with * (star), as *<variable-name>.
Consider the following example.
Example
def printme(*names):
    print("type of passed argument is ", type(names))
    print("printing the passed arguments...")
    for name in names:
        print(name)
printme("john", "David", "smith", "nick")
Output:
type of passed argument is  <class 'tuple'>
printing the passed arguments...
john
David
smith
nick
Scope of variables
The scopes of the variables depend upon the location where the variable is being declared. The variable declared in one part of the
program may not be accessible to the other parts.
In python, the variables are defined with the two types of scopes.
1. Global variables
2. Local variables
The variable defined outside any function is known to have a global scope whereas the variable defined inside a function is known to
have a local scope.
Consider the following example.
Example 1
def print_message():
message = "hello !! I am going to print a message." # the variable message is local to the function itself
print (message)
print_message()
print (message) # this will cause an error since a local variable cannot be accessible here.
Output:
hello !! I am going to print a message.
Traceback (most recent call last):
  File "/root/PycharmProjects/PythonTest/Test1.py", line 5, in <module>
    print(message)
NameError: name 'message' is not defined
Example 2
def calculate(*args):
    sum = 0
    for arg in args:
        sum = sum + arg
    print("The sum is", sum)
sum = 0
calculate(10, 20, 30)  # 60 will be printed as the sum
print("Value of sum outside the function:", sum)  # 0 will be printed
Output:
The sum is 60
Value of sum outside the function: 0
CHAPTER 7
Python If-else statements
Decision making is the most important aspect of almost all the programming languages. As the name implies, decision making allows
us to run a particular block of code for a particular decision. Here, the decisions are made on the validity of the particular conditions.
Condition checking is the backbone of decision making.
In python, decision making is performed by the following statements.
If Statement - The if statement tests a specific condition; if the condition is true, a block of code (the if-block) is executed.
If-else Statement - The if-else statement is similar to the if statement except that it also provides a block of code for the false
case of the condition to be checked. If the condition provided in the if statement is false, then the else block will be executed.
Nested if Statement - Nested if statements enable us to use an if-else statement inside an outer if statement.
Indentation in Python
For the ease of programming and to achieve simplicity, python doesn't allow the use of parentheses for the block level code. In Python,
indentation is used to declare a block. If two statements are at the same indentation level, then they are the part of the same block.
Generally, four spaces are used to indent the statements, which is the typical amount of indentation in python.
Indentation is the most used part of the python language since it declares the block of code. All the statements of one block are indented
at the same indentation level. We will see how the actual indentation takes place in decision making and other stuff in python.
The if statement
The if statement is used to test a particular condition and if the condition is true, it executes a block of code known as if-block. The
condition of if statement can be any valid logical expression which can be either evaluated to true or false.
The syntax of the if-statement is given below.
if expression:
    statement
Example 1
num = int(input("enter the number?"))
if num % 2 == 0:
    print("Number is even")
Output:
enter the number?10
Number is even
Example 2 : Program to print the largest of the three numbers.
a = int(input("Enter a? "))
b = int(input("Enter b? "))
c = int(input("Enter c? "))
if a > b and a > c:
    print("a is largest")
if b > a and b > c:
    print("b is largest")
if c > a and c > b:
    print("c is largest")
Output:
Enter a? 100
Enter b? 120
Enter c? 130
c is largest
The elif statement
The elif statement enables us to check multiple conditions and execute the block of the first condition that evaluates to true. The syntax is given below.
if expression 1:
    # block of statements
elif expression 2:
    # block of statements
elif expression 3:
    # block of statements
else:
    # block of statements
Example 1
number = int(input("Enter the number?"))
if number == 10:
    print("number is equals to 10")
elif number == 50:
    print("number is equal to 50")
elif number == 100:
    print("number is equal to 100")
else:
    print("number is not equal to 10, 50 or 100")
Output:
Enter the number?15
number is not equal to 10, 50 or 100
CHAPTER 8
Python Loops
The flow of the programs written in any programming language is sequential by default. Sometimes we may need to alter the flow of
the program, because the execution of a specific block of code may need to be repeated several times.
For this purpose, programming languages provide various types of loops which are capable of repeating a specific block of code several
times.
do-while loop - The do-while loop executes its body first and checks the condition afterwards, so it is also called a post-tested loop. It is used when it is
necessary to execute the loop at least once (mostly menu driven programs). Python does not provide a do-while loop directly, but the same behaviour can be emulated, as the sketch below shows.
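A minimal sketch of emulating a do-while loop in Python; the prompt text and the quit condition are assumptions chosen only for illustration:
while True:
    choice = input("Enter q to quit, anything else to continue: ")
    # the body above always runs at least once
    if choice == "q":   # the condition is checked after the body, like a do-while
        break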
Example
n = int(input("Enter the number up to which you want to print the natural numbers?"))
for i in range(0, n):
    print(i, end=' ')
Output:
Enter the number up to which you want to print the natural numbers?10
0 1 2 3 4 5 6 7 8 9
Python for loop example : printing the table of the given number
num = int(input("Enter a number:"))
for i in range(1, 11):
    print("%d X %d = %d" % (num, i, num*i))
Output:
Enter a number:10
10 X 1 = 10
10 X 2 = 20
10 X 3 = 30
10 X 4 = 40
10 X 5 = 50
10 X 6 = 60
10 X 7 = 70
10 X 8 = 80
10 X 9 = 90
10 X 10 = 100
Example 1
n = int(input("Enter the number of rows you want to print?"))
for i in range(0, n):
    print()
    for j in range(0, i+1):
        print("*", end="")
Output:
Enter the number of rows you want to print?5
*
**
***
****
*****
When a for loop with an else block is executed completely, i.e., there is no break statement in the loop, the control comes out of the loop
and the else block is executed.
Output:
0
1
2
3
4
When the loop is terminated by a break statement, the else block is not executed; the statement present
immediately after the else block is executed instead.
Output:
0
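A minimal sketch of both cases of for...else; the range bound of 5, the break condition, and the else message are assumptions chosen to match the outputs shown above:
# else runs because the loop completes without break
for i in range(0, 5):
    print(i)
else:
    print("loop finished without break")    # this line is executed

# else is skipped because the loop is broken
for i in range(0, 5):
    print(i)
    if i == 0:
        break
else:
    print("loop finished without break")    # this line is NOT executed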
Python while loop
The while loop is also known as a pre-tested loop. In general, a while loop allows a part of the code to be executed as long as the given
condition is true.
It can be viewed as a repeating if statement. The while loop is mostly used in the case where the number of iterations is not known in
advance.
The syntax is given below.
while expression:
    statements
Here, the statements can be a single statement or a group of statements. The expression should be any valid python expression
resulting in true or false, where any non-zero value is treated as true.
Example 1
i = 1
while i <= 10:
    print(i)
    i = i + 1
Output:
1
2
3
4
5
6
7
8
9
10
The continue statement skips the remaining statements of the current iteration of the loop and moves control back to the condition check. Its general form inside a loop is given below.
#loop statements
continue
#the code to be skipped in the current iteration
Example 1
i = 0
while i != 10:
    print("%d" % i)
    continue
    i = i + 1   # never reached, because continue jumps back to the loop condition
Output:
infinite loop (0 is printed repeatedly, since the increment statement is never executed)
Python supports object-oriented programming. The major object-oriented concepts in Python are given below.
Object
Class
Method
Inheritance
Polymorphism
Data Abstraction
Encapsulation
Object
The object is an entity that has state and behavior. It may be any real-world object like the mouse, keyboard, chair, table, pen, etc.
Everything in Python is an object, and almost everything has attributes and methods. All functions have a built-in attribute __doc__,
which returns the doc string defined in the function source code.
Class
The class can be defined as a collection of objects. It is a logical entity that has some specific attributes and methods. For example: if
you have an employee class then it should contain an attribute and method, i.e. an email id, name, age, salary, etc.
Syntax
class ClassName:
    <statement-1>
    .
    .
    <statement-N>
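A minimal sketch of a class with attributes and a method; the Employee class, its fields, and the values used here are illustrative assumptions, not a fixed API:
class Employee:
    def __init__(self, name, salary):
        self.name = name        # attribute
        self.salary = salary    # attribute

    def display(self):          # method associated with the object
        print("Name:", self.name, "Salary:", self.salary)

emp = Employee("John", 25000)   # creating an object of the class
emp.display()                   # prints Name: John Salary: 25000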
Method
The method is a function that is associated with an object. In Python, a method is not unique to class instances. Any object type can
have methods.
Inheritance
Inheritance is the most important aspect of object-oriented programming which simulates the real world concept of inheritance. It
specifies that the child object acquires all the properties and behaviors of the parent object.
By using inheritance, we can create a class which uses all the properties and behavior of another class. The new class is known as a
derived class or child class, and the one whose properties are acquired is known as a base class or parent class.
It provides re-usability of the code.
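A minimal sketch of inheritance; the Animal and Dog class names and their methods are assumptions chosen only for illustration:
class Animal:                      # parent (base) class
    def eat(self):
        print("eating...")

class Dog(Animal):                 # child (derived) class inherits from Animal
    def bark(self):
        print("barking...")

d = Dog()
d.eat()    # inherited from Animal, prints eating...
d.bark()   # defined in Dog, prints barking...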
Polymorphism
Polymorphism is made of two words, "poly" and "morphs". Poly means many, and morphs means forms or shapes. By polymorphism, we
understand that one task can be performed in different ways. For example, you have a class Animal, and all animals speak, but they
speak differently. Here, the "speak" behavior is polymorphic and depends on the animal. So, the abstract "animal" concept
does not actually "speak", but specific animals (like dogs and cats) have a concrete implementation of the action "speak".
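A minimal sketch of the animal example above; the class names, method name, and printed sounds are illustrative assumptions:
class Dog:
    def speak(self):
        print("Woof")

class Cat:
    def speak(self):
        print("Meow")

# the same call behaves differently depending on the object's type
for animal in (Dog(), Cat()):
    animal.speak()    # prints Woof, then Meow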
Encapsulation
Encapsulation is also an important aspect of object-oriented programming. It is used to restrict access to methods and variables. In
encapsulation, code and data are wrapped together within a single unit so that they cannot be modified by accident.
Data Abstraction
Data abstraction and encapsulation both are often used as synonyms. Both are nearly synonym because data abstraction is achieved
through encapsulation.
Abstraction is used to hide internal details and show only functionalities. Abstracting something means to give names to things so that
the name captures the core of what a function or a whole program does.
Interview Questions and Answers
A list of frequently asked Python interview questions with answers for freshers and experienced candidates is given below.
1) What is Python?
Python is a general-purpose computer programming language. It is a high-level, object-oriented language which can run equally on
different platforms such as Windows, Linux, UNIX, and Macintosh. It is widely used in data science, machine learning and artificial
intelligence domain.
It is easy to learn and requires less code to develop applications.
Python provides various web frameworks to develop web applications. The popular python web frameworks are Django, Pyramid,
Flask.
Python's standard library supports e-mail processing, FTP, IMAP, and other Internet protocols.
Python's SciPy and NumPy help in scientific and computational application development.
Python's Tkinter library supports creating desktop-based GUI applications.
Interpreted
Free and open source
Extensible
Object-oriented
Built-in data structure
Readability
High-Level Language
Cross-platform
Interpreted: Python is an interpreted language. It does not require prior compilation of code and executes instructions
directly.
Free and open source: It is an open source project which is publicly available to reuse. It can be downloaded free of cost.
Portable: Python programs can run across platforms without affecting their performance.
Extensible: It is very flexible and extensible with any module.
Object-oriented: Python allows us to implement object-oriented concepts to build application solutions.
Built-in data structure: Tuple, List, and Dictionary are useful integrated data structures provided by the language.
4. What is PEP 8?
PEP 8 is a coding convention which specifies a set of guidelines about how to write Python code in a more readable way.
It is a set of rules describing how to format your Python code to maximize its readability. Writing code to a common specification also helps to make
large code bases, with many contributors, more uniform and predictable.
String Literals
String literals are formed by enclosing text in single or double quotes. E.g.:
"Aman", '12345'.
Numeric Literals
Python supports three types of numeric literals integer, float and complex. See the examples.
# Integer literal
a = 10
# Float literal
b = 12.3
# Complex literal
x = 3.14j
Boolean Literals
Boolean literals are used to denote boolean values. It contains either True or False.
# Boolean literal
isboolean = True
Built-in Functions: Functions that come predefined with Python, e.g., len(), print(), and type().
User-defined Functions: Functions which are defined by the user are known as user-defined functions.
Arithmetic Operators
Relational Operators
Assignment Operators
Logical Operators
Membership Operators
Identity Operators
Bitwise Operators
12. What are the rules for a local and global variable in Python?
In Python, variables that are only referenced inside a function are implicitly treated as global. If a variable is assigned a new value
anywhere within the function's body, it is assumed to be local. To assign to a global variable from inside a function, the
variable must be declared explicitly with the global keyword. Local variables are accessible within the local body only. Global
variables are accessible anywhere in the program, and any function can access and modify their value, as the sketch below shows.
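A minimal sketch of these rules; the counter variable and function names are assumptions chosen only for illustration:
counter = 0          # global variable

def read_only():
    print(counter)   # only referenced, so the global value is used

def increment():
    global counter   # needed because we assign to the variable
    counter = counter + 1

read_only()          # prints 0
increment()
print(counter)       # prints 1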
19. What are the differences between Python 2.x and Python 3.x?
Python 2.x is an older version of Python, and Python 3.x is the newer, current version. Python 2.x is legacy now; Python 3.x is the present and
future of the language.
The most visible difference between Python 2 and Python 3 is in the print statement (function). In Python 2, it looks like print "Hello", and in
Python 3, it is print("Hello").
Strings in Python 2 are ASCII by default, whereas in Python 3 they are Unicode.
The xrange() function has been removed in Python 3. In error handling, the as keyword is required, e.g., except Exception as e.
Python's smtplib module can be used to send an e-mail through an SMTP server, as shown below.
import smtplib
# Calling SMTP
s = smtplib.SMTP('smtp.gmail.com', 587)
# TLS for network security
s.starttls()
# User email Authentication
s.login("sender_email_id", "sender_email_id_password")
# message to be sent
message = "Message_you_need_to_send"
# sending the mail
s.sendmail("sender_email_id", "receiver_email_id", message)
And finally, if you liked the book, I would like to ask you to do leave a review for the book on Amazon. Just go
to your account on Amazon and write a review for this book.