Algorithms and Data Structures
1. Fundamental concepts
There are two fundamental concerns in the study of data structures.
First, how will the data be stored?
Second, what operations will be performed on it?
a. Data Definition:
A data definition describes a particular piece of data with the following characteristics.
Atomic − Definition should define a single concept.
Traceable − Definition should be able to be mapped to some data element.
Accurate − Definition should be unambiguous.
Clear and Concise − Definition should be understandable.
b. Data Object
A data object represents an object that holds data.
c. Data Type
i. Built-in Datatypes (Int, Boolean, Float, Char, String)
ii. Derived Datatypes (List, Array, Stack, Queue)
d. Basic Operations
Traversing, Searching, Insertion, Deletion, Sorting, Merging
e. Classification of Data Structures
i. Linear structures (arrays, linked lists, stacks, and queues)
ii. Non-linear structures (trees, graphs, heaps, etc.)
2. Properties of algorithms
Non-Ambiguity: Each step in an algorithm should be unambiguous; that is, each instruction should be clear and precise, and no instruction should admit conflicting interpretations. This property also reflects the effectiveness of the algorithm.
Range of Input: The range of input should be specified. Algorithms are normally input-driven, and if the range of input is not specified, the algorithm may never terminate.
Multiplicity: The same algorithm can be represented in several different ways. We can write the sequence of instructions in plain English or in the form of pseudo code. Similarly, several different algorithms can be written to solve the same problem.
Speed: An algorithm is written using some specific approach, but it should be efficient and should produce its output quickly.
Finiteness: The algorithm should be finite. That means after performing the required operations it should terminate.
3. Criteria for an Algorithm
Input: The algorithm must have input values from a specific set.
Output: The algorithm must produce output from the set of input values; this output is the solution to the problem.
Definiteness: Each instruction of the algorithm should be clear and unambiguous.
Finiteness: The process should terminate after a finite number of steps.
Effectiveness: Every instruction must be basic enough to be carried out, in principle, by a person using paper and pencil.
4. Characteristics of a Data Structure
Correctness − Data structure implementation should implement its interface correctly.
Time Complexity − Running time or the execution time of operations of data structure
must be as small as possible.
Space Complexity − Memory usage of a data structure operation should be as little as
possible.
5. Algorithm Representations
a. Pseudo Code
Pseudo code is simply a description of an algorithm written as annotations and informative text in plain English. It has none of the syntax of a programming language and thus cannot be compiled or interpreted by a computer.
Pseudocode primitives:
Sequence - Is implied by the ordering of the steps in the algorithm.
Selection - allows for the selective execution of steps based on some
conditional evaluation. (If-else, switch)
Repetition - allows the same steps to be executed repeatedly, so we do not have to write them out over and over. (while, for, do-while)
Also needed is some mechanism to assign a value to an identifier.
Use appropriate naming conventions.
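All three primitives can be seen in one place in the short Python sketch below (the function name and data are illustrative, not from the notes):

    def find_max(values):
        """Return the largest element of a non-empty list."""
        largest = values[0]           # sequence: steps run in order
        for v in values[1:]:          # repetition: visit each element
            if v > largest:           # selection: update only when needed
                largest = v
        return largest

    print(find_max([3, 9, 4, 1]))     # prints 9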
b. Flow Charts
A flowchart is the graphical or pictorial representation of an algorithm with the help
of different symbols, shapes and arrows in order to demonstrate a process or a
program.
Several standard graphics are applied in a flowchart:
- Terminal Box - Start / End
- Input / Output
- Process / Instruction
- Decision
- Connector / Arrow
6. Designing Algorithms
Algorithm design refers to a method or a mathematical process for problem-solving and engineering algorithms. The design of algorithms draws on many solution techniques from operations research, such as dynamic programming and divide-and-conquer.
7. Algorithm Analysis and Asymptotic Notations
Execution time of an algorithm depends on the instruction set, processor speed, disk I/O
speed, etc. Hence, we estimate the efficiency of an algorithm asymptotically.
The running time of an algorithm is represented by T(n), where n is the input size.
Different types of asymptotic notations are used to represent the complexity of an
algorithm.
O − Big Oh (expresses an upper bound on an algorithm's running time.)
Ω − Big Omega (expresses a lower bound on an algorithm's running time.)
θ − Big Theta (expresses both a lower bound and an upper bound, i.e. a tight bound, on an algorithm's running time.)
o − Little oh (a strict, non-tight upper bound on the running time.)
ω − Little omega (a strict, non-tight lower bound on the running time.)
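As a quick worked example (illustrative, not from the notes): if an algorithm performs T(n) = 3n² + 5n + 2 basic operations, then T(n) = Θ(n²), since the n² term dominates for large n; the same T(n) is also O(n²), Ω(n²), o(n³) and ω(n).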
8. Classification of Lists
b. Binary Search
Binary search is an efficient algorithm for finding an item from a sorted list of
items. It works by repeatedly dividing in half the portion of the list that could
contain the item, until you've narrowed down the possible locations to just one.
The worst-case time complexity of binary search is O(log n).
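A minimal iterative sketch in Python (the names are illustrative):

    def binary_search(sorted_items, target):
        """Return the index of target in sorted_items, or -1 if absent."""
        low, high = 0, len(sorted_items) - 1
        while low <= high:
            mid = (low + high) // 2           # middle of the remaining range
            if sorted_items[mid] == target:
                return mid
            elif sorted_items[mid] < target:
                low = mid + 1                 # discard the left half
            else:
                high = mid - 1                # discard the right half
        return -1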
c. Bubble Sort
Bubble Sort is the simplest sorting algorithm. It works by repeatedly swapping adjacent elements if they are in the wrong order.
Worst and Average Case Time Complexity: O(n²).
The worst case occurs when the array is reverse sorted.
Best Case Time Complexity: O(n).
The best case occurs when the array is already sorted; achieving it requires the optimized version that stops early when a pass makes no swaps, as in the sketch below.
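A sketch of the optimized version in Python; the swapped flag is what yields the O(n) best case noted above:

    def bubble_sort(arr):
        """Sort arr in place by swapping adjacent out-of-order pairs."""
        n = len(arr)
        for i in range(n - 1):
            swapped = False
            for j in range(n - 1 - i):        # last i elements are in place
                if arr[j] > arr[j + 1]:
                    arr[j], arr[j + 1] = arr[j + 1], arr[j]
                    swapped = True
            if not swapped:                   # no swaps: already sorted
                break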
d. Merge Sort
Merge Sort is a divide-and-conquer algorithm. It divides the input array into two halves, calls itself for the two halves, and then merges the two sorted halves. Once the size becomes 1, the merge process comes into action and starts merging arrays back until the complete array is merged.
The time complexity of merge sort is Θ(n log n) in all cases.
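A minimal Python sketch that returns a new sorted list (an in-place variant is equally common):

    def merge_sort(arr):
        """Sort a list using divide and conquer."""
        if len(arr) <= 1:                     # size 0 or 1 is already sorted
            return arr
        mid = len(arr) // 2
        left = merge_sort(arr[:mid])          # sort each half recursively
        right = merge_sort(arr[mid:])
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:           # take the smaller front element
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])               # append whatever remains
        merged.extend(right[j:])
        return merged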
e. Quick Sort
Quicksort is an efficient sorting algorithm, serving as a systematic method for placing the elements of a random-access file or an array in order. It is often faster in practice than other sorting algorithms. It uses a divide-and-conquer strategy: a pivot element is chosen and the array is partitioned into two smaller subarrays, which are then sorted recursively.
Worst Case Time Complexity: O(n²).
The worst case occurs when the partition process always picks the greatest or smallest element as the pivot. With the common strategy of always picking the last element as the pivot, the worst case occurs when the array is already sorted in increasing or decreasing order.
Best Case Time Complexity: Θ(n log n).
The best case occurs when the partition process always picks the median as the pivot.
Average Case Time Complexity: O(n log n).
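A sketch in Python using the last-element (Lomuto) partition strategy described above:

    def quick_sort(arr, low=0, high=None):
        """Sort arr in place, picking the last element as the pivot."""
        if high is None:
            high = len(arr) - 1
        if low < high:
            pivot = arr[high]
            i = low - 1                       # boundary of the "<= pivot" region
            for j in range(low, high):
                if arr[j] <= pivot:
                    i += 1
                    arr[i], arr[j] = arr[j], arr[i]
            arr[i + 1], arr[high] = arr[high], arr[i + 1]  # place the pivot
            quick_sort(arr, low, i)           # sort elements left of the pivot
            quick_sort(arr, i + 2, high)      # sort elements right of the pivot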
f. Heap Sort
Heap sort is a popular and efficient comparison-based sorting technique built on the binary heap data structure. It works by visualizing the elements of the array as a special kind of complete binary tree called a heap. It is similar to selection sort in that we repeatedly find the maximum element and place it at the end, then repeat the process for the remaining elements.
The time complexity of heapify is O(log n), the time complexity of createAndBuildHeap() is O(n), and the overall time complexity of heap sort is O(n log n).
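A minimal in-place Python sketch; heapify sifts one node down, and the first loop plays the role of createAndBuildHeap():

    def heapify(arr, n, i):
        """Sift arr[i] down so the subtree rooted at i is a max-heap."""
        largest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and arr[left] > arr[largest]:
            largest = left
        if right < n and arr[right] > arr[largest]:
            largest = right
        if largest != i:
            arr[i], arr[largest] = arr[largest], arr[i]
            heapify(arr, n, largest)          # keep sifting down

    def heap_sort(arr):
        """Build a max-heap, then repeatedly move the max to the end."""
        n = len(arr)
        for i in range(n // 2 - 1, -1, -1):   # build phase: O(n) overall
            heapify(arr, n, i)
        for end in range(n - 1, 0, -1):
            arr[0], arr[end] = arr[end], arr[0]   # current max goes to the end
            heapify(arr, end, 0)              # restore the heap on the prefix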
Queue: A queue is a linear data structure in which elements can be inserted only at one end of the list, called the rear, and deleted only at the other end, called the front. The queue follows the FIFO (First In, First Out) principle: the element inserted first is the first to be removed. Inserting an element into a queue is called an enqueue operation, and deleting an element is called a dequeue operation. A queue always maintains two pointers: the front pointer refers to the earliest-inserted element still in the list, and the rear pointer refers to the most recently inserted element.
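A quick illustration using Python's collections.deque, which gives O(1) operations at both ends:

    from collections import deque

    queue = deque()
    queue.append("a")          # enqueue at the rear
    queue.append("b")
    queue.append("c")
    print(queue.popleft())     # dequeue from the front: prints "a" (FIFO)
    print(queue.popleft())     # prints "b"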
Hashing is a technique to convert a range of key values into a range of indexes of an array.
a. Linear Probing
Linear probing is a scheme for resolving collisions in hash tables, which are data structures for maintaining a collection of key–value pairs and looking up the value associated with a given key. Along with quadratic probing and double hashing, linear probing is a form of open addressing: on a collision, it scans forward for the next free slot.
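A minimal open-addressing sketch in Python; for simplicity it assumes the table never fills and omits deletion and resizing:

    class LinearProbingTable:
        def __init__(self, capacity=8):
            self.slots = [None] * capacity    # each slot holds (key, value)

        def put(self, key, value):
            i = hash(key) % len(self.slots)
            while self.slots[i] is not None and self.slots[i][0] != key:
                i = (i + 1) % len(self.slots) # collision: probe the next slot
            self.slots[i] = (key, value)

        def get(self, key):
            i = hash(key) % len(self.slots)
            while self.slots[i] is not None:
                if self.slots[i][0] == key:
                    return self.slots[i][1]
                i = (i + 1) % len(self.slots) # keep probing along the run
            raise KeyError(key)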
b. Bucketing
A bucket is simply a fast-access location (like an array index) that is the result of
the hash function. The idea with hashing is to turn a complex input value into a
different value which can be used to rapidly extract or store data.
c. Chaining
Chaining (also called closed addressing) is another possible way to resolve collisions. Each slot of the array contains a link to a singly linked list containing the key-value pairs that share the same hash. The idea is to make each cell of the hash table point to a linked list of records that have the same hash value. Chaining is simple but requires additional memory outside the table.
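A minimal chaining sketch in Python; ordinary Python lists stand in for the singly linked lists described above:

    class ChainedTable:
        def __init__(self, capacity=8):
            self.buckets = [[] for _ in range(capacity)]

        def put(self, key, value):
            chain = self.buckets[hash(key) % len(self.buckets)]
            for pair in chain:
                if pair[0] == key:            # key already present: update
                    pair[1] = value
                    return
            chain.append([key, value])        # otherwise extend the chain

        def get(self, key):
            chain = self.buckets[hash(key) % len(self.buckets)]
            for k, v in chain:
                if k == key:
                    return v
            raise KeyError(key)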
14. Recursion
Some computer programming languages allow a module or function to call itself. This
technique is known as recursion.
Properties
A recursive function can run forever, like an infinite loop. To avoid this, there are two properties that a recursive function must have:
Base criteria − There must be at least one base criteria or condition, such that,
when this condition is met the function stops calling itself recursively.
Progressive approach − The recursive calls should progress in such a way that
each time a recursive call is made it comes closer to the base criteria.
A call to a function costs Ο(1), so a function that makes n recursive calls runs in Ο(n) time for the calls themselves, in addition to the work done inside each call.
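The classic factorial function illustrates both properties:

    def factorial(n):
        if n <= 1:                  # base criteria: stops the recursion
            return 1
        return n * factorial(n - 1) # progressive approach: n shrinks toward 1

    print(factorial(5))             # prints 120 via 5 calls, i.e. O(n) calls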
15. Trees
A tree consists of nodes connected by edges.
a. Binary Trees
Binary Tree is a special data structure used for data storage purposes. A binary
tree has a special condition that each node can have a maximum of two
children. A binary tree has the benefits of both an ordered array and a linked list
as search is as quick as in a sorted array and insertion or deletion operations are as fast as in a linked list.
b. Binary Search trees
A binary search tree exhibits a special property: a node's left child must have a value less than its parent's value, and the node's right child must have a value greater than its parent's value.
Basic Operations
Insert − Inserts an element in a tree/create a tree.
Search − Searches for an element in a tree.
Pre-order Traversal − Traverses a tree in a pre-order manner.
In-order Traversal − Traverses a tree in an in-order manner.
Post-order Traversal − Traverses a tree in a post-order manner.
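A minimal Python sketch of a binary search tree with insert, search and in-order traversal (sending duplicates to the right is a design choice for this sketch, not something the notes specify):

    class Node:
        def __init__(self, value):
            self.value = value
            self.left = None
            self.right = None

    def insert(root, value):
        """Insert value and return the (possibly new) root."""
        if root is None:
            return Node(value)
        if value < root.value:
            root.left = insert(root.left, value)    # smaller keys go left
        else:
            root.right = insert(root.right, value)  # larger keys go right
        return root

    def search(root, value):
        """Return True if value occurs in the tree."""
        if root is None:
            return False
        if value == root.value:
            return True
        return search(root.left if value < root.value else root.right, value)

    def inorder(root):
        """Yield values in sorted order (left, node, right)."""
        if root is not None:
            yield from inorder(root.left)
            yield root.value
            yield from inorder(root.right)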
c. AVL Trees
Named after their inventors Adelson-Velsky and Landis, AVL trees are height-balanced binary search trees. An AVL tree checks the heights of the left and right sub-trees and ensures that their difference is not more than 1. This difference is called the balance factor.
AVL Rotations
To balance itself, an AVL tree may perform the following four kinds of
rotations
- Left rotation
- Right rotation
- Left-Right rotation
- Right-Left rotation
The first two rotations are single rotations and the last two are double rotations. The smallest unbalanced tree has height 2.
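As an illustration, a left rotation can be sketched as follows in Python, reusing the Node class from the binary search tree sketch above; a complete AVL implementation would also store and update subtree heights and pick the rotation based on the balance factor:

    def rotate_left(z):
        """Make z's right child the new root of this subtree."""
        y = z.right
        z.right = y.left   # y's left subtree moves under z
        y.left = z         # z becomes y's left child
        return y           # caller re-attaches y in z's old position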
d. Two-Three Trees
A 2-3 tree is a tree data structure in which every internal (non-leaf) node has either one data element and two children or two data elements and three children. If a node contains one data element leftVal, it has two subtrees (children), namely left and middle; if a node contains two data elements leftVal and rightVal, it has three subtrees, namely left, middle and right.
The main advantage of 2-3 trees is that they are balanced by construction, unlike a binary search tree, whose height can be O(n) in the worst case. Because the height of a 2-3 tree is O(log n), the worst-case time complexity of operations such as search, insertion and deletion is O(log n).
16. Graphs
A graph is a non-linear data structure consisting of a finite set of nodes (also called vertices) and a set of edges, the lines or arcs that connect pairs of nodes.
Graphs are used to solve many real-life problems. Graphs are used to represent
networks. The networks may include paths in a city or telephone network or circuit
network. Graphs are also used in social networks like LinkedIn, Facebook.
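A common concrete representation is an adjacency list; a minimal Python sketch with hypothetical vertices:

    # Undirected graph as a dict mapping each vertex to its neighbours.
    graph = {
        "A": ["B", "C"],
        "B": ["A", "C"],
        "C": ["A", "B", "D"],
        "D": ["C"],
    }

    def add_edge(g, u, v):
        """Connect u and v in both directions."""
        g.setdefault(u, []).append(v)
        g.setdefault(v, []).append(u)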
17. Cryptographic Algorithms
a. DES
DES is a block cipher that encrypts data in blocks of 64 bits each, meaning 64 bits of plaintext go in as input to DES and 64 bits of ciphertext come out. The same algorithm and key are used for encryption and decryption, with minor differences. The key length is 56 bits.
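For illustration only, a sketch using the third-party pycryptodome package (assumed to be installed); ECB mode is shown because it is the simplest, but neither ECB mode nor DES itself should be used where real security matters:

    from Crypto.Cipher import DES   # pycryptodome

    key = b"8bytekey"               # DES keys are 8 bytes (56 effective bits)
    cipher = DES.new(key, DES.MODE_ECB)
    plaintext = b"64bitblk"         # exactly one 64-bit (8-byte) block
    ciphertext = cipher.encrypt(plaintext)
    print(DES.new(key, DES.MODE_ECB).decrypt(ciphertext))  # b'64bitblk'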
b. RSA
RSA is an asymmetric-key algorithm named after its creators Rivest, Shamir and Adleman. The algorithm is based on the fact that finding the prime factors of a large composite number is difficult. It generates a pair of keys: a public key and a private key. The public key is used to convert plaintext to ciphertext, and the private key is used to convert ciphertext back to plaintext. The public key is accessible to everyone, whereas the private key is kept secret. Keeping the two keys distinct is what makes RSA a secure algorithm for data protection.
The idea of RSA is based on the fact that it is difficult to factorize a large integer. The public key consists of two numbers, one of which is the product of two large prime numbers, and the private key is derived from the same two primes. So if somebody can factorize that large number, the private key is compromised. The encryption strength therefore lies entirely in the key size; doubling or tripling the key size increases the strength of the encryption enormously. RSA keys are typically 1024 or 2048 bits long. Experts believe that 1024-bit keys could be broken in the near future, but so far factoring them has remained an infeasible task.
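A toy demonstration in Python (3.8+ for the modular-inverse pow) with the classic textbook primes, far too small for real use and shown purely to illustrate the mechanics:

    p, q = 61, 53
    n = p * q                     # 3233, shared by both keys
    phi = (p - 1) * (q - 1)       # 3120
    e = 17                        # public exponent, coprime with phi
    d = pow(e, -1, phi)           # private exponent: e*d = 1 (mod phi) -> 2753

    message = 65
    ciphertext = pow(message, e, n)    # encrypt with the public key (e, n)
    recovered = pow(ciphertext, d, n)  # decrypt with the private key (d, n)
    print(ciphertext, recovered)       # 2790 65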