CS 441 Handouts
Problem?
Data is growing rapidly, and so are expectations
for how quickly it can be analyzed.
Unstructured
Semi-Structured
Structured Data
Information stored in databases is known as structured
data because it is represented in a strict format.
Table 2: Sales for 2012 by salesperson. Table of fictional sales data. Compare the sales figures
of the two salespeople.
Copyright 2015 Keith Andrews
Sales for 2012
Figure 2: Sales for 2012 by salesperson. Line chart of the same sales data. It is
much easier to see the trends and compare the data, when it is presented visually.
Figure 3.8
Snow’s Cholera
Map, 1855
Introduction to Big Data
What is Big Data?
Big data is data that exceeds the processing capacity of
conventional database systems.
The data is too big, moves too fast, or doesn’t fit the
structures of your database architectures.
To gain value from this data, you must choose an
alternative way to process it.
Who generates Big Data?
Web data, e-commerce
purchases at department/
grocery stores
Bank/Credit Card
transactions
Mobile Devices
Scientific instruments
Sensor Technology and network
Social Media and Network etc.
Data Measurement Chart
How much data?
Google processes 20 PB a day (2008)
Examples:
Hbase
Cassandra
MongoDB
Neo4j
Python
Created in 1991 by Guido van Rossum (now at Google)
Named for Monty Python
Useful as a scripting language
script: A small program meant for one-time use
Targeted towards small to medium sized projects
Used by:
Google, Yahoo!, Youtube
Many Linux distributions
Games and apps (e.g. Eve Online)
Why Python?
Easy-to-learn
Easy-to-read
Easy-to-maintain
A broad standard library
Interactive Mode
Portable
Extendable
Databases
GUI Programming
Scalable
Installing Python
Windows:
Download Python from http://www.python.org
Install Python.
Run Idle from the Start Menu.
Mac OS X:
Python is already installed.
Open a terminal and run python or run Idle from Finder.
Linux:
Chances are you already have Python installed. To check, run python from the terminal.
If not, install from your distribution's package system.
Interpreted Language
Interpreted
Not compiled like Java
Code is written and then directly executed by an interpreter
Type commands into interpreter and see immediate results
Java: Code -> Compiler -> Runtime Environment -> Computer
Python Interpreter
Allows you to type commands one-at-a-time and see results
A great way to explore Python's syntax
Repeat previous command: Alt+P
Basic Syntax
Console output: System.out.println
Methods: public static void name() { ...
Hello.java
public class Hello {
    public static void main(String[] args) {
        hello();
    }

    public static void hello() {
        System.out.println("Hello, world!");
    }
}
Our First Python Program
Python does not have a main method like Java
The program's main code is just written directly in the
file
Python statements do not end with semicolons
hello.py
print("Hello, world!")
The Print Statement
print("text")
print() (a blank line)
swallows.py
print("Hello, world!")
print()
print("Suppose two swallows \"carry\" it together.")
print('African or "European" swallows?')
Comments
Syntax:
# comment text (one line)
swallows2.py
# Suzy Student, CSE 142, Fall 2097
# This program prints important messages.
print("Hello, world!")
print() # blank line
print("Suppose two swallows \"carry\" it together.")
print('African or "European" swallows?')
Functions
Function: Equivalent to a static method in Java.
Syntax:
def name():
statement
statement
...
statement
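As a rough illustration of the def syntax above, the sketch below defines a function and calls it twice. The name hello and the printed text are placeholders chosen for this example; they mirror the earlier Hello.java program.

```python
# Define a function: the body is the indented block under the def line.
def hello():
    print("Hello, world!")
    print("How are you?")

# Main code: call the function twice.
hello()
hello()
```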
Variables
Declaring
no type is written; same syntax as assignment
Operators
no ++ or -- operators (must manually adjust by 1)
Java                      Python
int x = 2;                x = 2
x++;                      x = x + 1
System.out.println(x);    print(x)
x = x * 8;                x = x * 8
System.out.println(x);    print(x)
double d = 3.2;           d = 3.2
d = d / 2;                d = d / 2
System.out.println(d);    print(d)
Constants
Python doesn't really have constants.
Instead, declare a variable at the top of your code.
All methods will be able to use this "constant" value.
Types
Python is looser about types than Java
Variables' types do not need to be declared
Variables can change types as a program is running
Standard Data Types used in Python are Numbers, String,
List, Tuple and Dictionary
Value    Java type    Python type
42       int          int
3.14     double       float
"ni!"    String       str
String
Python strings can be multiplied by an integer.
The result is many copies of the string concatenated together.
The plus sign (+) is the string concatenation operator and the asterisk
(*) is the repetition operator
>>> "hello" * 3
'hellohellohello'
String Concatenation
Integers and strings cannot be concatenated directly in Python.
Workarounds:
str(value) - converts a value into a string
print(value, value) - prints both values, separated by a space
>>> x = 4
>>> print("Thou shalt not count to " + x + ".")
TypeError: can only concatenate str (not "int") to str
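The two workarounds listed above might look like this in practice. This is only a sketch; the variable name message and the second sentence are made up for the example.

```python
x = 4

# Workaround 1: convert the integer to a string before concatenating.
message = "Thou shalt not count to " + str(x) + "."
print(message)

# Workaround 2: pass several values to print(); it separates them with a space.
print("Thou shalt count to", x - 1)
```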
Operators - Arithmetic
Arithmetic is very similar to Java
Operators: + - * / % (plus ** for exponentiation)
Precedence: () before ** before * / % before + -
You may use // for integer (floor) division; in Python 3, / always returns a float
Examples, assuming a = 10 and b = 20:
Operator   Name              Example
+          Addition          a + b = 30
-          Subtraction       a - b = -10
*          Multiplication    a * b = 200
/          Division          b / a = 2.0
%          Modulus           b % a = 0
**         Exponent          a ** b = 10 to the power 20
//         Floor Division    9 // 2 = 4 and 9.0 // 2.0 = 4.0
Operators - Comparison
Compare the values on either side of the operator and decide the
relation between them
Also called relational operators
Operator   Meaning   Example       Result
==         equals    1 + 1 == 2    True
Operators - Assignment
List of assignment operators are
Operator Example
Operators - Bitwise
Works on bits and performs bit by bit operation
Operator Example
Operators - Logical
Logical operators supported by python
Decision Making
Decision making means anticipating the conditions that can occur while the
program executes and specifying the actions to take for each
condition
Decision structures evaluate one or more expressions that produce True or
False as the outcome.
If Condition
if condition:
    statements
Example:
gpa = float(input("What is your GPA? "))
if gpa > 2.0:
    print("Your application is accepted.")
If/Else
Syntax:
if condition:
    statements
elif condition:
    statements
else:
    statements
Example:
gpa = float(input("What is your GPA? "))
if gpa > 3.5:
    print("You have qualified for the honor roll.")
elif gpa > 2.0:
    print("Welcome to Mars University!")
else:
    print("Your application is denied.")
If..in
if value in sequence:
    statements
The sequence can be a range, string, tuple, or list
Example:
x = 3
if x in range(0, 10):
    print("x is between 0 and 9")
name = input("What is your name? ")
name = name.lower()
if name[0] in "aeiou":
    print("Your name starts with a vowel!")
Loops
A loop statement allows us to execute a statement or group of
statements multiple times.
For Loop
for name in range(max):
statements
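A minimal sketch of the for loop form above. The names i and squares are chosen only for this example.

```python
squares = []
for i in range(5):          # i takes the values 0, 1, 2, 3, 4
    squares.append(i * i)
    print(i, "squared is", i * i)
```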
For Loop Variations
for name in range(min, max):
    statements
for name in range(min, max, step):
    statements
Can specify a minimum other than 0, and a step other than 1
>>> for i in range(2, 6):
...     print(i)
...
2
3
4
5
>>> for i in range(15, 0, -5):
...     print(i)
...
15
10
5
Nested Loop
Nested loops are often replaced by string * and +
Output:
....1
...2
..3
.4
5
Java
for (int line = 1; line <= 5; line++) {
    for (int j = 1; j <= (5 - line); j++) {
        System.out.print(".");
    }
    System.out.println(line);
}
Python
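The Python version is not preserved in this handout. One plausible equivalent of the Java loop, using string repetition and concatenation in place of the inner loop, is sketched below; the list named lines is an addition made here so the result can be inspected.

```python
# Build each line from repeated dots plus the line number,
# replacing the inner loop with string * and +.
lines = []
for line in range(1, 6):
    lines.append("." * (5 - line) + str(line))

print("\n".join(lines))
```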
While Loop
while test:
    statements
>>> n = 91
>>> factor = 2 # find first factor of n
>>> while n % factor != 0:
...     factor += 1
...
>>> factor
7
Tuple
An immutable sequence of Python objects
Tuples are sequences, just like lists
Unlike lists, tuples cannot be changed, and tuples use parentheses
whereas lists use square brackets
tuple_name = (value, value, ..., value)
A way of "packing" multiple values into one variable
>>> x = 3
>>> y = -5
>>> p = (x, y, 42)
>>> p
(3, -5, 42)
Tuple … Continued
name, name, ..., name = tuple_name
"unpacking" a tuple's contents into multiple variables
>>> a, b, c = p
>>> a
3
>>> b
-5
>>> c
42
Tuple … Continued
Useful for returning more than one value
>>> from random import *
>>> def roll2():
...     die1 = randint(1, 6)
...     die2 = randint(1, 6)
...     return (die1, die2)
...
>>> d1, d2 = roll2()
>>> d1
6
>>> d2
4
Tuple as Parameter
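def name( (name, name, ..., name), ... ):
    statements
Declares a tuple as a parameter by naming each of its
pieces
Note: this syntax works only in Python 2. Python 3 removed tuple parameter
unpacking, so accept one tuple parameter and unpack it inside the function body.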
Tuple as Return
def name(parameters):
    statements
    return (name, name, ..., name)
>>> from random import *
>>> def roll2():
...     die1 = randint(1, 6)
...     die2 = randint(1, 6)
...     return (die1, die2)
...
>>> d1, d2 = roll2()
>>> d1
6
>>> d2
4
Dictionaries
Similar to a Map in Java
Dictionaries store a mapping between a set of keys and a set of values.
Keys can be any immutable type.
Values can be any type
Values and keys can be of different types in a single dictionary
You can define, modify, view, look up, and delete the key-value pairs in the dictionary
Creating and Accessing Values in Dictionaries
To access dictionary elements, you can use the familiar square brackets along with the key to
obtain its value
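A short sketch of creating a dictionary and reading values with square brackets. The dictionary name student and its contents are made-up sample data, not taken from the original slides.

```python
# Keys here are strings; values mix str and int (made-up sample data).
student = {"name": "Zara", "age": 7, "class": "First"}

print(student["name"])        # look up a value by its key
print(student["age"])

# get() returns None for a missing key instead of raising KeyError.
print(student.get("grade"))
```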
Updating Dictionaries
Update a dictionary by adding a new entry or a key-value pair, modifying an existing
entry, or deleting an existing entry
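The three kinds of update described above could look like this. The names and values are illustrative only.

```python
student = {"name": "Zara", "age": 7}

student["age"] = 8            # modify an existing entry
student["school"] = "DPS"     # add a new key-value pair
del student["name"]           # delete an existing entry

print(student)
```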
Properties of Dictionary Keys
Dictionary values have no restrictions. They can be any arbitrary Python object,
either standard objects or user-defined objects. However, the same is not true for
the keys.
Two important points to remember about dictionary keys are:
More than one entry per key is not allowed, which means no duplicate key is allowed.
When duplicate keys are encountered during assignment, the last assignment wins.
Properties of Dictionary Keys .. Continued
Keys must be immutable. Which means you can use strings, numbers or tuples as
dictionary keys but something like ['key'] is not allowed.
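Both key properties can be checked with a small experiment. The sketch below uses made-up data; the variable error_message exists only so the outcome can be inspected.

```python
# Duplicate key in one literal: the last assignment wins.
d = {"name": "Zara", "age": 7, "name": "Manni"}
print(d["name"])

# A tuple is immutable, so it is a valid key.
points = {(0, 0): "origin"}
print(points[(0, 0)])

# A list is mutable, so using it as a key raises TypeError.
try:
    bad = {["key"]: "value"}
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
```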
Functions
Calling a Function
Defining a function only gives it a name, specifies the parameters it
takes, and structures its block of code
Once the function is defined, you can execute it by calling it
from another function or directly from the Python prompt. Following is an example
that calls a printme() function:
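The definition of printme() is not preserved in this handout, so the body below is an assumption: a function that simply prints the string it receives.

```python
def printme(text):
    """Print the string passed in (hypothetical body for illustration)."""
    print(text)
    return

# Call the function directly from the main code.
printme("This is the first call to printme().")
printme("This is the second call to the same function.")
```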
Functions without returns
All functions in Python have a return value,
even if there is no return statement in the code.
Functions without a “return” return the special value None.
None is a special constant in the language.
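A quick way to see this behavior; the function greet is a made-up example.

```python
def greet(name):
    print("Hello,", name)    # no return statement in this function

result = greet("world")
print(result)                # the call still produced a value: None
print(result is None)
```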
Function Overloading
There is no function overloading in Python.
Unlike Java, a Python function is specified by its name alone
The number, order, names, or types of its arguments cannot be used to distinguish
between two functions with the same name.
Two different functions can't have the same name, even if they have different
numbers of arguments.
But operator overloading (overloading +, ==, -, etc.) is possible using special
methods on various classes
Functions behave like Object
Functions can be used just like any other data
They can be
Arguments to function
Lambda Notation
Functions can be defined without giving them names, like anonymous
inner classes in Java
This is most useful when passing a short function as an argument to
another function.
The first argument to apply() is an unnamed function that takes one input and
returns the input multiplied by four.
Note: only single-expression functions can be defined using this lambda
notation.
Lambda notation has a rich history in CS research and the design of many
current programming languages.
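The example that the text about apply() refers to is missing from this handout. A sketch of what it may have looked like follows; apply here is a hypothetical helper defined in the example, not a Python 3 built-in.

```python
# Hypothetical helper: call func on value and return the result.
def apply(func, value):
    return func(value)

# The first argument is an unnamed function that takes one input
# and returns the input multiplied by four.
quadrupled = apply(lambda z: z * 4, 7)
print(quadrupled)
```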
Default Value for Arguments
You can provide default values for a function’s arguments
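A minimal sketch of a default argument value; the function power and its parameter names are invented for this example.

```python
def power(base, exponent=2):
    # exponent falls back to 2 when the caller omits it
    return base ** exponent

squared = power(5)       # uses the default exponent
cubed = power(5, 3)      # overrides the default
print(squared, cubed)
```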
Keyword Arguments
Functions can be called with arguments out of order
These arguments are specified in the call
Keyword arguments can be used for a final subset of the arguments.
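The calls below sketch these rules with a made-up function describe. Positional arguments come first, and keyword arguments may then cover the remaining parameters in any order.

```python
def describe(name, age, city="unknown"):
    return name + " is " + str(age) + " and lives in " + city

a = describe("Ann", 30, "Oslo")                 # all positional
b = describe(city="Oslo", age=30, name="Ann")   # all keyword, out of order
c = describe("Ann", city="Oslo", age=30)        # keywords for the final subset
print(a)
```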
Import and Modules
Programs will often use classes & functions defined in another file
A Python module is a single file with the same name (plus the .py extension)
Modules can contain many classes and functions
Access using import (like Java)
Commonly Used Modules
Some useful modules, included with Python:
More Commonly Used Modules
Module: math- Mathematical functions
Exponents
Sqrt
To see what's in the standard library of modules, check out the Python
Library Reference: http://docs.python.org/lib/lib.html
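A few calls from the math module, as a sketch of the exponent and square-root functions mentioned above. The variable names are chosen for this example.

```python
import math

root = math.sqrt(16)        # square root, returned as a float
power = math.pow(2, 10)     # 2 raised to the 10th power, as a float
e_squared = math.exp(2)     # e raised to the 2nd power

print(root, power, e_squared)
```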
Files I/O: Create a File Object
Python interacts with files with file objects
Instantiate a file object with open (the file built-in of Python 2 does not exist in Python 3)
f = open(filename, option)
At the end, use f.close()
filename is a string containing the location of a file
option can be:
r to read a file (default option)
w to write a file
a to append to a file
r+ to read and write
rb for binary reading
wb for binary writing
Files I/O: Reading Files
The file object keeps track of its current location
Example in first few lines of reader.py
Some useful commands:
read() Reads the entire rest of the file by default
Reads the next n characters if an argument n is given in 'r' mode
Reads the next n bytes in 'rb' mode
readline() Reads a single line per call
readlines() Returns a list of lines (splits at newline)
Files I/O: with open() as fileObject
The with statement opens a file and closes it automatically when the block exits,
even if an exception is raised, so no explicit close() call is needed.
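A small sketch of the with open(...) as form named in the heading. The file name example.txt and its contents are made up, and a temporary folder is used so the example does not touch existing files.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")

# The file is closed automatically when the with block ends.
with open(path, "w") as f:
    f.write("first line\nsecond line\n")

with open(path) as f:        # "r" is the default option
    lines = f.readlines()

print(lines)
print(f.closed)              # True: the file was closed on leaving the block
```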
Files I/O: The read() method
The read() method reads a string from an open file. In Python 3, a file
opened in text mode returns str, while a file opened in binary mode ('rb')
returns bytes
Syntax: fileObject.read([count])
Here, the passed parameter is the number of characters (or bytes, in binary
mode) to be read from the opened file. This method starts reading from the
current position, which is the beginning of a newly opened file, and if count
is missing, it reads as much as possible, up to the end of the file
Let us take a file foo.txt, which we created
above
Files I/O: File Positions
The tell() method tells you the current position within the file; in other words,
the next read or write will occur at that many bytes from the beginning of the file
The seek(offset[, from]) method changes the current file position. The offset
argument indicates the number of bytes to be moved. The from argument specifies
the reference position from where the bytes are to be moved
Let us take a file foo.txt, which we created
above
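The original foo.txt example is not preserved here, so the sketch below creates its own foo.txt in a temporary folder with made-up contents. The file is opened in binary mode so that positions are plain byte offsets. Python's documentation calls the second seek() argument whence rather than from.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "foo.txt")
with open(path, "wb") as f:
    f.write(b"Python is a great language.")

with open(path, "rb") as f:
    first = f.read(6)            # read the first 6 bytes
    position = f.tell()          # current position after the read
    f.seek(0)                    # move back to the beginning of the file
    again = f.read(6)
    f.seek(-9, os.SEEK_END)      # move to 9 bytes before the end of the file
    tail = f.read()

print(first, position, again, tail)
```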
Files I/O: Writing Files
Use write() to write to a file
The write() method writes a string to an open file. In Python 3, text-mode
files accept str and binary-mode files ('wb') accept bytes.
The write() method does not add a newline character ('\n') to the end of the
string:
Syntax: fileObject.write(string)
See example in writer.py
Files I/O: Renaming Files
Use rename() method to rename the file
The rename() method takes two arguments, the current
filename and the new filename.
Syntax: os.rename(current_file_name, new_file_name)
Following is the example to rename an existing file test1.txt
Files I/O: Deleting Files
You can use the remove() method to delete files by supplying the
name of the file to be deleted as the argument.
Syntax: os.remove(file_name)
Following is the example to delete an existing file test2.txt:
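The rename and delete examples are missing from this handout. The sketch below shows both os.rename() and os.remove() in one run; it creates test1.txt in a temporary folder first, which is an assumption made so the example is self-contained.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
old_name = os.path.join(folder, "test1.txt")
new_name = os.path.join(folder, "test2.txt")

# Create a file so there is something to rename.
with open(old_name, "w") as f:
    f.write("sample")

os.rename(old_name, new_name)        # rename test1.txt to test2.txt
renamed = os.path.exists(new_name) and not os.path.exists(old_name)

os.remove(new_name)                  # delete test2.txt
removed = not os.path.exists(new_name)

print(renamed, removed)
```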
Files I/O: Useful Tools & Tips for I/O with Binary Files
When you read and write, use the struct module's pack and unpack functions
https://docs.python.org/2/library/struct.html
When you unpack a single value, you must take the [0] element of the returned tuple
Use the seek function to move to a particular location for reading/writing
If you get nonsense, try swapping byte-order (little/big endian denoted by >, <, @, !, =)
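A sketch of these tips using struct.pack and struct.unpack. The value 1025 and the format "i" (a 4-byte integer) are arbitrary choices for illustration.

```python
import struct

# Pack the integer 1025 as a 4-byte little-endian int ("<i").
packed = struct.pack("<i", 1025)

# unpack always returns a tuple, so take element [0] for a single value.
value = struct.unpack("<i", packed)[0]

# Reading the same bytes with the wrong byte order (">" is big-endian)
# produces a very different number.
swapped = struct.unpack(">i", packed)[0]

print(packed, value, swapped)
```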
Exceptions
Exceptions are events that can modify the flow of control
through a program.
They are automatically triggered on errors.
try/except: catch and recover from exceptions raised by you or Python
try/finally: perform cleanup actions whether exceptions occur or
not
raise: trigger an exception manually in your code
assert: conditionally trigger an exception in your code
Exception Roles
Error handling
Wherever Python detects an error it raises an exception
Default behavior: stops the program.
Otherwise, code tries to catch and recover from the exception (try handler)
Event notification
Can signal a valid condition (for example, in search)
Special-case handling
Handles unusual situations
Termination actions
Guarantees the required closing-time operations (try/finally)
Unusual control-flows
A sort of high-level “goto”
try/except/else
try:
    You do your operations here;
    ......................
except ExceptionI:
    If there is ExceptionI, then execute this block.
except ExceptionII:
    If there is ExceptionII, then execute this block.
    ......................
else:
    If there is no exception then execute this block.
try/except/else - Example
This example tries to open a file where you do not have write permission, so
it raises an exception:
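The original example is not preserved here. The sketch below swaps in a different failure that is easier to reproduce: it tries to write to a file inside a folder that does not exist, which raises an OSError subclass in the same way a permission problem would. All paths and messages are made up.

```python
import os
import tempfile

# A path inside a folder that was never created, so open() must fail.
missing_dir = os.path.join(tempfile.mkdtemp(), "no_such_folder")
target = os.path.join(missing_dir, "testfile.txt")

try:
    fh = open(target, "w")
    fh.write("This is my test file for exception handling!")
except OSError:
    outcome = "Error: cannot open the file or write data"
else:
    # Runs only when the try block raised no exception.
    outcome = "Written content to the file successfully"
    fh.close()

print(outcome)
```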
try/else
else runs only if no exception occurred in the try block.
You could eliminate else by moving its logic to the end of the try block.
However, exceptions raised by that logic would then be caught by the except
clauses as if they came from the try block itself.
try/finally
In try/finally, the finally block always runs, whether an exception occurs or not
try:
    <block of statements>
finally:
    <block of statements>
try/finally - Example
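The example for this slide is missing, so the following is only a sketch. The function divide and the list log are invented to show that the finally block runs both on success and on failure.

```python
log = []

def divide(a, b):
    try:
        return a / b
    finally:
        # Runs whether or not the division raised an exception.
        log.append("cleanup for " + str(a) + "/" + str(b))

result = divide(6, 3)

try:
    divide(1, 0)
except ZeroDivisionError:
    log.append("caught ZeroDivisionError")

print(result)
print(log)
```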
raise
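raise triggers exceptions explicitly
raise <name>           # raise an exception class or instance
raise <name>(<data>)   # provide data to handler
raise                  # re-raise the current exception
>>> try:
...     raise ZeroDivisionError("zero argument", (3, 0))
... except ZeroDivisionError as err:
...     print(err.args)
...
('zero argument', (3, 0))
The last form may be useful if you want to propagate a caught exception to another
handler.
Exception name: a built-in exception class or a user-defined class derived from
Exception (the string exceptions of early Python 2 are no longer supported)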
raise - Example
In Python 3 an exception must be a class derived from BaseException, or an instance
of such a class; string exceptions existed only in older Python 2 releases.
Defining new exceptions is quite easy and can be done as follows
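The definition that followed is not preserved in this handout. A minimal sketch of a user-defined exception is shown below; the class name NetworkError and its message are hypothetical.

```python
class NetworkError(Exception):
    """Hypothetical user-defined exception carrying a message."""

    def __init__(self, message):
        super().__init__(message)
        self.message = message

try:
    raise NetworkError("Bad hostname")
except NetworkError as err:
    caught = err.message
    print("Caught:", caught)
```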
Assert & Exception Objects
Exception Objects
Class-based exceptions are identified with classes. They also identify categories
of exceptions.
Class exceptions are matched by superclass identity: except catches instances of the
mentioned class and instances of all its subclasses lower in the class tree.
String-based exceptions, which were matched by object identity (is), existed only
in older Python 2 releases and are not supported in Python 3.
Built-in Exception Classes
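Exception – the superclass of all built-in, non-system-exiting exceptions (the root of the hierarchy is BaseException).
StandardError – the superclass of all built-in error exceptions in Python 2 only; it was removed in Python 3, where Exception takes its place.
ArithmeticError – the superclass of all numeric errors.
OverflowError – a subclass of ArithmeticError that identifies a specific numeric error.
>>> import builtins
>>> help(builtins)
In Python 2 the same listing was available with import exceptions and help(exceptions).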
Classes & Objects – Defining a Class
Python was built as a procedural language
OOP exists and works fine, but feels a bit more "tacked on"
Java probably does classes better than Python (gasp)
Declaring a class:
class name:
    statements
Classes & Objects – Fields of a Class
name = value
Example:
class Point:
    x = 0
    y = 0

# main
p1 = Point()
p1.x = 2
p1.y = -5
Fields can be declared directly inside the class (as shown here)
or in constructors (more common)
Python does not really have encapsulation or private fields
Relies on caller to "be nice" and not mess with objects' contents
Classes & Objects – Class Example
Following is the example of a simple Python class:
Classes & Objects – Using a Class
import class
client programs must import the classes they use
Classes & Objects – Object Methods
def name(self, parameter, ..., parameter):
    statements
class Point:
    def translate(self, dx, dy):
        self.x += dx
        self.y += dy
Classes & Objects – "Implicit" Parameter (self)
Java: this, implicit
public void translate(int dx, int dy) {
    x += dx; // this.x += dx;
    y += dy; // this.y += dy;
}
Python: self, explicit
def translate(self, dx, dy):
    self.x += dx
    self.y += dy
Classes & Objects – "Implicit" Parameter (self) - Example
Declaration and definition of distance, set_location, and distance_from_origin methods
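The code for these three methods is not preserved in this handout. One plausible version is sketched below; the method bodies are assumptions based only on the method names and on the Point class used elsewhere in these slides.

```python
import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def set_location(self, x, y):
        # Every field access goes through the explicit self parameter.
        self.x = x
        self.y = y

    def distance_from_origin(self):
        return math.sqrt(self.x * self.x + self.y * self.y)

    def distance(self, other):
        dx = self.x - other.x
        dy = self.y - other.y
        return math.sqrt(dx * dx + dy * dy)

p = Point(3, -4)
q = Point(0, 0)
print(p.distance_from_origin(), p.distance(q))
```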
Classes & Objects – Calling Methods
A client can call the methods of an object in two ways:
(the value of self can be an implicit or explicit parameter)
object.method(parameters) or
Class.method(object, parameters)
Example:
p = Point(3, -4)
p.translate(1, 5)
Point.translate(p, 1, 5)
Classes & Objects – Constructors
def __init__(self, parameter, ..., parameter):
    statements
A constructor is a special method with the name __init__
Example:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
Classes & Objects – toString and __str__
def __str__(self):
    return string
Example:
def __str__(self):
    return "(" + str(self.x) + ", " + str(self.y) + ")"
Classes & Objects – Complete Point Class
Classes & Objects – Operator Overloading
operator overloading: You can define functions so that Python's built-in operators
can be used with your class.
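As a sketch of operator overloading, the Point class below defines the special methods __add__ and __eq__ so that + and == work on points. Which operators the original slide overloaded is not known; these two are chosen for illustration.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Called for: self + other
        return Point(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        # Called for: self == other
        return isinstance(other, Point) and self.x == other.x and self.y == other.y

    def __str__(self):
        return "(" + str(self.x) + ", " + str(self.y) + ")"

total = Point(1, 2) + Point(3, -5)
print(total)
print(total == Point(4, -3))
```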
See also: http://docs.python.org/ref/customization.html
Regular Expression Python Syntax
Most characters match themselves
The regular expression “test” matches the string ‘test’, and only
that string
[x] matches any one of a list of characters
“[abc]” matches ‘a’, ‘b’, or ‘c’
[^x] matches any one character that is not included in x
“[^abc]” matches any single character except ‘a’, ‘b’, or ‘c’
“.” matches any single character (except a newline, by default)
Parentheses can be used for grouping
“(abc)+” matches ‘abc’, ‘abcabc’, ‘abcabcabc’, etc.
Regular Expression Python Syntax .. Cont
x|y matches x or y
“this|that” matches ‘this’ or ‘that’, but not ‘thisthat’.
x* matches zero or more x’s
“a*” matches ‘’, ‘a’, ‘aa’, etc.
x+ matches one or more x’s
“a+” matches ‘a’, ‘aa’, ‘aaa’, etc.
x? matches zero or one x’s
“a?” matches ‘’ or ‘a’.
x{m,n} matches i x’s, where m ≤ i ≤ n
“a{2,3}” matches ‘aa’ or ‘aaa’
Regular Expression Python Syntax .. Cont
“\d” matches any digit; “\D” matches any non-digit
“\s” matches any whitespace character; “\S” matches any non-whitespace character
“\w” matches any alphanumeric character or underscore; “\W” matches any other character
“^” matches the beginning of the string; “$” matches the end of the string
“\b” matches a word boundary; “\B” matches a position that is not a word boundary
Search and Match
The two basic functions are re.search and re.match
Search looks for a pattern anywhere in a string
Match looks for a match starting at the beginning
Both return None if the pattern is not found (logical false) and a
“match object” if it is
>>> pat = "a*b"
>>> import re
>>> re.search(pat, "fooaaabcde")
<re.Match object; span=(3, 7), match='aaab'>
>>> re.match(pat, "fooaaabcde")
Match Object
A match object is an instance of the match class with the details of the match
result
>>> pat = "a*b"
>>> r1 = re.search(pat, "fooaaabcde")
>>> r1.group() # group returns string matched
'aaab'
>>> r1.start() # index of the match start
3
>>> r1.end() # index of the match end
7
>>> r1.span() # tuple of (start, end)
(3, 7)
Example: Email Address Matching
Here’s a pattern to match simple email addresses
\w+@(\w+\.)+(com|org|net|edu)
We might want to extract the pattern parts, like the email name
and host
Example: Email Address Matching .. Cont
We can put parentheses around groups we want to be able to reference
>>> pat2 = r"(\w+)@((\w+\.)+(com|org|net|edu))"
>>> r2 = re.match(pat2, "finin@cs.umbc.edu")
>>> r2.group(1)
'finin'
>>> r2.group(2)
'cs.umbc.edu'
>>> r2.groups()
('finin', 'cs.umbc.edu', 'umbc.', 'edu')
Note that the ‘groups’ are numbered in a preorder traversal of the
forest
Example: Email Address Matching .. Cont
We can ‘label’ the groups as well…
>>> pat3 = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"
>>> r3 = re.match(pat3, "finin@cs.umbc.edu")
>>> r3.group('name')
'finin'
>>> r3.group('host')
'cs.umbc.edu'
And reference the matching parts by the labels
More re functions
re.split() is like split but can use patterns
>>> re.split(r"\W+", "This... is a test, short and sweet, of split().")
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
re.sub() substitutes one string for a pattern
>>> re.sub('(blue|white|red)', 'black', 'blue socks and red shoes')
'black socks and black shoes'
re.findall() finds all matches
>>> re.findall(r"\d+", "12 dogs,11 cats, 1 egg")
['12', '11', '1']
Methods of Pattern Objects
There are methods defined for a pattern object that parallel the
regular expression functions, e.g.,
match
search
split
findall
sub
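A sketch of the pattern-object methods listed above. re.compile() builds the pattern object; the pattern and the sample strings are chosen only for illustration.

```python
import re

# Compile the pattern once, then reuse the pattern object's methods.
number = re.compile(r"\d+")

text = "12 dogs,11 cats, 1 egg"
found = number.findall(text)            # all matches
first = number.search(text).group()     # first match anywhere in the string
at_start = number.match("dogs 12")      # None: no digits at the beginning
replaced = number.sub("N", text)        # substitute every match
parts = number.split("a1b22c")          # split on the matches

print(found, first, at_start, replaced, parts)
```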
Introduction
Big data technologies are important in providing more
accurate analysis.
This may lead to more concrete decision-making, which
results in:
Greater operational efficiencies
Cost reductions
Reduced risks for the business
To harness the power of big data, an infrastructure is
required to:
Manage and process huge volumes of structured and
unstructured data in real-time
Protect data privacy and security
Classes of Technologies
Two classes
Operational
Systems that provide operational capabilities for real-time,
interactive workloads where data is primarily captured and
stored.
MongoDB
NoSQL
Analytical
Systems that provide analytical capabilities for retrospective,
complex analysis that may touch most or all of the data
Massively Parallel Processing (MPP) database systems
MapReduce
Cont’d
Both classes present opposing requirements
Systems have evolved to address their particular demands
separately and in very different ways
Each has driven the creation of new technology architectures
Operational systems focus on servicing highly concurrent
requests while exhibiting low latency for responses operating on
highly selective access criteria
Analytical systems tend to focus on high throughput;
queries can be very complex and touch most if not all of the data in
the system at any time
Both kinds of system tend to run across many servers in a
cluster, managing tens or hundreds of terabytes of data across
billions of records
Operational vs. Analytical Systems
Reference
http://www.tutorialspoint.com/hadoop/index.htm
https://www.mongodb.com/big-data-explained
Hadoop
Doug Cutting and his team developed an open source
project called Hadoop, written in Java.
It is used to develop applications that can perform
complete statistical analysis on huge amounts of data.
It uses the MapReduce programming model.
It allows distributed processing of large datasets across
clusters of computers using simple programming models.
It provides distributed storage and
computation across clusters of computers.
It is designed to scale up from a single server to thousands of
machines, each offering local computation and storage.
Benefits of Hadoop
Scalable
Economical
Efficient
Reliable
Computing Power
When to use Hadoop
When “standard tools” don’t work anymore because of
sheer size of data.
Hadoop Architecture
Two major Layers
Processing/Computation layer (MapReduce)
Storage layer (Hadoop Distributed File System)
Cont’d
Apart from the mentioned two core components,
Hadoop framework also includes the following two
modules:
Hadoop Common: These are Java libraries and utilities
required by other Hadoop modules.
Step 3
Verify the file using ls command.
Retrieving Data from
HDFS
Step 1
Initially, view the data from HDFS using cat command.
Step 2
Get the file from HDFS to the local file system using get
command.
Shutting Down the HDFS
Stop Hadoop HDFS
Shut down the HDFS by using the following command.
2. tcp_wrappers,
4. zlib
status:
Provides the status of HBase, for example, the number of
servers.
version:
Provides the version of HBase being used.
table_help:
Provides help for table-reference commands.
whoami:
Provides information about the user.
HBase Security
We can grant and revoke permissions to users in HBase.
There are three commands for security purpose:
grant,
revoke,
user_permission.
grant:
The grant command grants specific rights such as read,
write, execute, and admin on a table to a certain user. The
syntax of grant command is as follows:
hbase> grant <user> <permissions> [<table> [<column family> [<column qualifier>]]]
Cont’d
We can grant zero or more privileges to a user from the
set of RWXCA, where
user_permission:
This command is used to list all the permissions for a
particular table. The syntax of user_permission is as
follows:
hbase> user_permission 'tablename'
The following code lists all the user permissions of the 'emp'
table.
hbase(main):013:0> user_permission 'emp'
HBase DDL
create: Creates a table.
list: Lists all the tables in HBase.
disable: Disables a table.
is_disabled: Verifies whether a table is disabled.
enable: Enables a table.
is_enabled: Verifies whether a table is enabled.
describe: Provides the description of a table.
alter: Alters a table.
exists: Verifies whether a table exists.
drop: Drops a table from HBase.
drop_all: Drops the tables matching the ‘regex’ given in
the command.
Cont’d
Java Admin API: In addition to the shell commands above,
Java provides an Admin API to achieve DDL
functionalities through programming.
HBaseAdmin and HTableDescriptor, under the
org.apache.hadoop.hbase.client package, are the two
important classes that provide DDL
functionalities.
DDL Python Example Code
HBase DML
put: Puts a cell value at a specified column in a specified row in a
particular table.
get: Fetches the contents of row or a cell.
delete: Deletes a cell value in a table.
deleteall: Deletes all the cells in a given row.
scan: Scans and returns the table data.
count: Counts and returns the number of rows in a table.
truncate: Disables, drops, and recreates a specified table.
Java client API: In addition to the shell commands above, Java provides a client
API to achieve DML functionalities, including CRUD (Create, Retrieve, Update,
Delete) operations and more, through programming, under the
org.apache.hadoop.hbase.client package.
HTable, Put, and Get are the important classes in this package.
HBase Scan & Python Example Code
The scan command is used to view the data in an
HTable. Its syntax is as follows:
scan '<table name>'
The following example shows how to read data
from a table using the scan command. Here we are
reading the emp table.
Cont’d
Scanning Using Java API:
The complete program to scan the entire table data using java
API is as follows:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
}
}
Compile and execute the above program as shown below.
$javac ScanTable.java
$java ScanTable
Cont’d
The above compilation works only if you have set the
classpath in “ .bashrc ”. If you haven't, follow the procedure
given below to compile your .java file.