Data Structures Short Notes All Units
INTRODUCTION
DATA STRUCTURE: Structural representation of data items in primary memory, organised to perform storage &
retrieval operations efficiently.
Data structures are needed to solve real-world problems, but while choosing an implementation
it is necessary to consider efficiency in terms of TIME and SPACE.
TYPES:
i. Simple: built from primitive data types like int, char & boolean.
e.g.: Array & Structure
ii. Compound: simple structures combined in various ways to form complex structures.
1: Linear: Elements share an adjacency relationship & form a sequence.
e.g.: Stack, Queue, Linked List
LINKED LIST:
i. Singly: Each node carries a single link to the next node.
ii. Doubly: There are two links, a forward and a backward link.
iii. Circular: The last node is linked back to the first node. These can be singly circular
& doubly circular lists.
ADVANTAGES:
Linked lists use dynamic memory allocation, allocating memory while the program
runs, so the list can grow and shrink as needed. Arrays follow static memory
allocation, hence there is wastage of space when fewer elements are stored, and there
is the possibility of overflow too because of the fixed amount of storage.
Nodes are stored non-contiguously, so insertion and deletion operations are easily
implemented.
Linear data structures like stacks and queues are easily implemented using linked
lists.
DISADVANTAGES:
Reverse traversing is difficult especially in singly linked list. Memory is wasted for
allocating space for back pointers in doubly linked list.
struct node {
int info;
struct node *next;
} *ptr;
-ptr is a pointer of type node. To access info and next the syntax is: ptr->info; ptr->next;
i. Searching
ii. Insertion
iii. Deletion
iv. Traversal
v. Reversal
vi. Splitting
vii. Concatenation
Some operations:
a: Insertion:
void push(struct node** headref, int data)
{
struct node* newnode = malloc(sizeof(struct node));
newnode->data = data;
newnode->next = *headref;
*headref = newnode;
}
(1): headref is a pointer to a pointer of type struct node. Such passing of a pointer to a pointer
is called a reference pointer. Such declarations are similar to declarations for call by
reference: when pointers are passed to functions, the function works with the original
variable rather than a copy.
i. Insertion at head:
struct node* buildlist()
{
struct node* head = NULL;
for(int i = 1; i < 6; i++)
{
push(&head, i); // push is called here; the first data pushed is 1. '&' is used because a reference to head is passed
}
return(head);
}
# o/p when built by insertion at head: 5 4 3 2 1
# o/p when built by insertion at tail: 1 2 3 4 5
b. Traversal (counting the nodes):
int length(struct node* head)
{
int count = 0;
struct node* q = head;
while(q != NULL)
{
q = q->next;
count++;
}
return(count);
}
c. Searching:
struct node* search(struct node* head, int x)
{
struct node* p;
for(p = head; p != NULL; p = p->next)
{
if(p->data == x)
return(p);
}
return(NULL);
}
IMPLEMENTATION OF LISTS:
i : Array implementation:
struct nodetype {
int info;
int next;
};
struct nodetype node[NUMNODES];
# : NUMNODES (say 100) nodes are declared as the array node. A pointer to a node is represented by an array
index; thus a pointer is an integer between 0 & NUMNODES-1. The NULL pointer is represented by -1.
node[p] is used to reference node(p); info(p) is referenced by node[p].info & next by
node[p].next.
ii : Dynamic Implementation :
This is the same as the code written when defining linked lists. Using malloc() and
free() there is the capability of dynamically allocating & freeing nodes. It is identical to
the array implementation except that the next field is a pointer rather than an integer.
NOTE : The major demerit of dynamic implementation is that it may be more time consuming to call
upon the system to allocate & free storage than to manipulate a programmer-managed list.
The major advantage is that a set of nodes is not reserved in advance for use.
SORTING
Introduction
· The main purpose of sorting information is to optimize its usefulness for specific
tasks.
· Sorting is one of the most extensively researched subjects because of the need to speed up
operations on thousands or millions of records during a search.
Types of Sorting :
· Internal Sorting
An internal sort is any data sorting process that takes place entirely within the main
memory of a computer.
This is possible whenever the data to be sorted is small enough to all be held in the main
memory.
For sorting larger datasets, it may be necessary to hold only a chunk of data in memory at
a time, since it won’t all fit.
The rest of the data is normally held on some larger but slower medium, like a hard disk.
Any reading or writing of data to and from this slower medium can slow the sorting process
considerably.
· External Sorting
Many important sorting applications involve processing very large files, much too large
to fit into the primary memory of any computer.
Methods appropriate for such applications are called external methods, since they involve
a large amount of processing external to the central processing unit.
There are two major factors which make external algorithms quite different:
First, the cost of accessing an item is orders of magnitude greater than any bookkeeping
or calculating costs.
Second, over and above this higher cost, there are severe restrictions on access,
depending on the external storage medium used: for example, items on a magnetic tape
can be accessed only in a sequential manner.
->Insertion sort, Merge sort, Bubble sort, Selection sort, Heap sort, Quick sort
INSERTION SORT
Insertion sort is a simple sorting algorithm which sorts the array by shifting elements one by
one.
->OFFLINE sorting - the type of sorting in which the whole input sequence is known. The
number of inputs is fixed in offline sorting.
->ONLINE sorting - the type of sorting in which the current input sequence is known and
the future input sequence is unknown, i.e. in an online sort the number of inputs may increase.
int a[6]={5,1,6,2,4,3};
int i,j,key;
for(i=1;i<6;i++)
{
key=a[i];
j=i-1;
while(j>=0 && a[j]>key)
{
a[j+1]=a[j];
j--;
}
a[j+1]=key;
}
5 1 6 2 4 3 (initial array; always start with the 2nd element as the key)
1 5 6 2 4 3 after pass 1
1 5 6 2 4 3 after pass 2 (6 already in place)
1 2 5 6 4 3 after pass 3
1 2 4 5 6 3 after pass 4
1 2 3 4 5 6 after pass 5 (complete sorted list)
In insertion sort there is a single pass over the items, but each item may need many shifting steps.
Input: n items.
->Scan item 2 and compare it with item 1; shift item 1 right if it is larger, else no shifting is needed.
->Then scan the next item, compare it with item 1 and item 2, and then continue.
* Simple implementation.
* Stable, i.e. does not change the relative order of elements with equal values.
* The insertion sort repeatedly scans the list of items, so it takes more time.
* With n² steps required for every n elements to be sorted, insertion sort does not
deal well with a huge list. Therefore insertion sort is particularly useful when sorting a list of
few items.
* Insertion sort can be used to sort the phone numbers of the customers of a particular company.
* It can be used to sort the bank account numbers of the people visiting a particular bank.
MERGE SORT
Merge sort is a recursive algorithm that continually splits a list in half. In merge sort, pairwise
comparisons between the elements of the two halves are done while merging. It is based on the "divide and conquer" paradigm.
void merge(int a[], int lower1, int upper1, int lower2, int upper2);
void mergesort(int a[], int lower, int upper)
{
int mid;
if(upper > lower)
{
mid = (lower + upper) / 2;
mergesort(a, lower, mid);
mergesort(a, mid + 1, upper);
merge(a, lower, mid, mid + 1, upper);
}
}
void merge(int a[], int lower1, int upper1, int lower2, int upper2)
{
int p = lower1, q = lower2, n = 0;
int d[100];
while((p <= upper1) && (q <= upper2))
d[n++] = (a[p] < a[q]) ? a[p++] : a[q++];
while(p <= upper1)
d[n++] = a[p++];
while(q <= upper2)
d[n++] = a[q++];
for(q = lower1, n = 0; q <= upper2; q++, n++)
a[q] = d[n];
}
Illustration
8 2 9 4 5 3 1 6
Divide
8 2 9 4 | 5 3 1 6
Divide
8 2 | 9 4 | 5 3 | 1 6
Divide
8 | 2 | 9 | 4 | 5 | 3 | 1 | 6
Merge
2 8 | 4 9 | 3 5 | 1 6
Merge
2 4 8 9 | 1 3 5 6
Merge
1 2 3 4 5 6 8 9
TIME COMPLEXITY:
There are in total log2(n) passes in merge sort, and in each pass there are at most n comparisons.
* k-way merge is possible in merge sort, where k is at most n/2, i.e. k<=n/2.
* The merge operation is useful in online sorting, where the list to be sorted is received a piece at a
time instead of all at the beginning, for example sorting of bank account numbers.
AVL TREE
An AVL tree is a height-balanced binary search tree.
Height Invariant: for every node, the heights of its left and right subtrees differ by at most one.
Increasing the speed of operations by minimizing the height difference is the main aim of the AVL tree.
Operations like insertion, deletion etc. can be done in O(log n) time, even in the worst case.
e.g. In a completely balanced tree, the left and right subtrees of any node would have the same height:
the height difference is 0.
Suppose Hl is the height of the left subtree and Hr is the height of the right subtree; then the following
property must be satisfied for the tree to be an AVL tree:
|Hl-Hr| <= 1
i.e. Hl-Hr is one of {-1, 0, 1}
e.g.
      A (2)
     /
    B (1)
   /
  C (0)
Balancing factor = Hl - Hr.
In the above diagram, the balancing factor for the root node is 2, so it is not an AVL tree. In such cases, the tree
can be balanced by rotations.
1.LL Rotation(left-left)
2.RR Rotation(right-right)
3.LR Rotation(left-right)
4.RL Rotation(right-left)
LL Rotation:
      A (+2)                 B (0)
     /                      / \
    B (+1)       ==>       C   A
   /                     (0)   (0)
  C (0)

RR Rotation:
  A (-2)                   B (0)
   \                      / \
    B (-1)       ==>     A   C
     \                 (0)   (0)
      C (0)

LR Rotation (rotate left about B, then right about A):
      A (+2)               C (0)
     /                    / \
    B (-1)       ==>     B   A
     \                 (0)   (0)
      C (0)

RL Rotation (rotate right about B, then left about A):
  A (-2)                   C (0)
   \                      / \
    B (+1)       ==>     A   B
   /                   (0)   (0)
  C (0)
Example: construct an AVL tree of the days of the week.
1st step:- We have to arrange these days alphabetically, and the constructed tree should satisfy the
conditions of an AVL Tree. Starting with Sunday (sun):
Sun
(M < S < T)
Mon Tue
Here, the balance factor for all three nodes is 0. Also, it is a BST. So it satisfies all the conditions for
an AVL tree.
2nd step:- Now Wednesday is to be inserted. As (W) > (T), it will be placed to the right of Tuesday to
satisfy the BST conditions:
Sun
Mon Tue
Wed
3rd step:- Thursday is inserted; (thu) > (sun) and (thu) < (tue), so it becomes the left child of tue:
sun
mon tue
thu wed
Here, balance factor of sun = -1;
balance factor of tue, wed, thu = 0
4th step:- Friday and Saturday are inserted; (fri) < (mon), so fri becomes the left child of mon, and
mon < sat < sun, so sat becomes the right child of mon. The final AVL tree:
sun
mon tue
fri sat thu wed
Use :
* AVL trees guarantee O(log n) search, insertion and deletion even in the worst case, so they
suit lookup-intensive applications.
Limitations
* Rebalancing by rotations makes insertion and deletion slower and more complex to implement,
and each node needs extra space for its balance information.
Threaded BST
"A binary tree is threaded by making all right child pointers that would normally be null point to
the inorder successor of the node (if it exists) , and all left child pointers that would normally be
null point to the inorder predecessor of the node."
A threaded binary tree makes it possible to traverse the values in the binary tree via a linear
traversal that is more rapid than a recursive in-order traversal. It is also possible to discover the
parent of a node from a threaded binary tree, without explicit use of parent pointers or a stack,
albeit slowly. This can be useful where stack space is limited, or where a stack of parent pointers
is unavailable (for finding the parent pointer via DFS).
SPANNING TREE:-
->A tree is a connected undirected graph with no cycles. It is a spanning tree of a graph G if it
spans G (that is, it includes every vertex of G) and is a subgraph of G (every edge in the tree belongs to G).
->A spanning tree of a connected graph G can also be defined as a maximal set of edges of G
that contains no cycle.
->So the key to a spanning tree is that the number of edges will be 1 less than the number of nodes.
Weighted Graphs:- A weighted graph is a graph, in which each edge has a weight (some real
number).
A minimum spanning tree is the spanning tree in which the sum of weights on the edges is minimum.
NOTE:- The minimum spanning tree may not be unique. However, if the weights of all the
edges are pairwise distinct, it is indeed unique.
There are a number of ways to find the minimum spanning tree, but the most
popular methods are Prim's algorithm and Kruskal's algorithm.
PRIM'S ALGORITHM:-
Prim's algorithm is a greedy algorithm that finds a minimum spanning tree for a connected
weighted graph. This means it finds a subset of the edges that forms a tree that includes every
vertex, where the total weight of all the edges in the tree is minimized.
steps:-
1. Initialize a tree with a single vertex, chosen arbitrarily from the graph.
2. Grow the tree by one edge: of the edges that connect the tree to vertices not yet in the
tree, find the minimum-weight edge, and transfer it to the tree.
3. Repeat step 2 (until all vertices are in the tree).
KRUSKAL'S ALGORITHM:-
Kruskal's algorithm is also greedy: sort all the edges by weight, then repeatedly add the smallest
edge that does not form a cycle with the edges already chosen, until n-1 edges have been selected.
HASHING
Introduction:
Hashing involves fewer key comparisons, and searching can be performed in constant time.
Suppose we have keys which are in the range 0 to n-1 and all of them are unique.
We can take an array of size n and store the records in that array based on the condition that key
and array index are same.
The searching time required is directly proportional to the number of records in the file. We
assume a function f such that when it is applied on a key K it returns i, an index, so that
i=f(K); then the entry in the access table gives the location of the record with key value K.
HASH FUNCTIONS
The main idea behind any hash function is to find a one to one correspondence between a key
value and index in the hash table where the key value can be placed. There are two principal
criteria deciding a hash function H:K->I, as follows:
i. The function H should be easy and quick to compute.
ii. The function H should achieve an even distribution of keys that actually occur across the
range of indices.
Some of the commonly used hash functions applied in various applications are:
DIVISION:
It is obtained by using the modulo operator. First convert the key to an integer then divide it
by the size of the index range and take the remainder as the result.
MID –SQUARE:
The key is squared and the middle digits of the squared value are taken as the index.
FOLDING:
The key is partitioned into a number of parts and then the parts are added together. There are
many variations of this method: one is the fold shifting method, where the even-numbered parts are
each reversed before the addition. Another is the fold boundary method, where the two
boundary parts are reversed and then added with all the other parts.
If for a given set of key values the hash function does not distribute them uniformly over the
hash table, some entries are empty while in some entries more than one key
value is to be stored. Allotment of more than one key value to one location in the hash
table is called a collision.
HASH TABLE
Collision in hashing cannot be ignored, whatever the size of the hash table. There are several
techniques to resolve collisions. Two important methods are:
i. Closed hashing (linear probing)
ii. Open hashing (chaining)
CLOSED HASHING:
The simplest method to resolve a collision is closed hashing. Here the hash table is considered as
circular so that when the last location is reached the search proceeds to the first location of the
table. That is why this is called closed hashing.
As half of the hash table fills, there is a tendency towards clustering: the key values
cluster in large groups, and as a result the sequential search becomes slower and slower.
Techniques to avoid this clustering are:
a. Random probing
b. Double hashing or rehashing
c. Quadratic probing
RANDOM PROBING:
This method uses a pseudo random number generator to generate a random sequence of
locations, rather than an ordered sequence as was the case in linear probing method. The
random sequence generated by the pseudo random number generator contains all
positions between 1 and h, the highest location of the hash table.
I = (i+m)%h + 1
where i is the number in the sequence, and
m and h are integers that are relatively prime to each other.
DOUBLE HASHING:
When two hash functions are used to avoid secondary clustering then it is called double
hashing. The second function should be selected in such a way that hash address
generated by two hash functions are distinct and the second function generates a value m
for the key k so that m and h are relatively prime.
H1(k) = (k%h)+1
H2(k) = (k%(h-4))+1
QUADRATIC PROBING:
It is a collision resolution method that eliminates the primary clustering problem of linear
probing. For quadratic probing, the locations tried after i are i+1², i+2², i+3², ... etc., i.e. the probe
sequence is (H(k)+i²)%h for i=1,2,3,...
OPEN HASHING
In closed hashing two situations occur: 1. there is a table overflow situation; 2. the
key values are haphazardly intermixed. To solve these problems another form of hashing is used:
open hashing, or chaining.
1. Open hashing is best suited to applications where the number of key values varies
drastically, as it uses a dynamic storage management policy.
2. Chaining has the disadvantage of maintaining linked lists and extra storage space
for the link fields.
FILE STRUCTURE
File Organisation:-
Record:- A collection of related fields treated as a single unit (e.g. one student's details).
Field:- A single elementary item of data within a record (e.g. a roll number).
Seek time:-
It is the time taken by the R/W head to reach the desired track.
Latency time:-
It is the time taken for the desired sector to rotate under the R/W head; in general "L < S".
Data access time:- It is the total time taken for the file data movement.
Student file:-
KEY:- A field used to identify a record, e.g. the roll number in a student file.
File:- A collection of related records.
APPLICATION:-
Types of files –
1. Serial Files
2. Sequential Files
Serial Files –
· When the size of the file increases, the time required to access data becomes greater. This is
because only linear search can be applied.
· Example - a book without page numbers: searching is difficult due to the lack of page
numbers.
Sequential File –
· Records are kept in order of a key; gaps are left so that new records can be added while maintaining the ordering.
· When the size of the file increases, the time required to access data becomes greater if no index over the key is kept.
· Access is sequential.
· Searching is fast when the index is searched first and then the key is searched.
· Example - the contents of a book. The topics are the keys; they have indices like page numbers, and
a topic can be found on its page. When new information needs to be added, a pointer is
taken to point to a new location, like the appendix of a book. This saves the time and errors that
occur due to shifting the later data after an insertion.
Key-
A key is a vital, crucial element: a notched and grooved, usually metal, implement that is turned to open or close a
lock. Keys have the characteristic of being unique.
In database management systems, a key is a field that you use to sort data.
It can also be called a key field, sort key, index, or key word.
For example, if you sort records by age, then the age field is a key.
Most database management systems allow you to have more than one key so that you can sort records
in different ways.
One of the keys is designated the primary key, and must hold a unique value for each record.
A key field that identifies records in a different table is called a foreign key.
LOCK: In general a device operated by a key, combination, or keycard and used for holding, closing or
securing the data.
The key lock principle states that once lock can be opened or fastened by specific one type of keys only.
In digital electronics latch is used for temporary security while lock is used for permanent security.
The structure of a file (especially a data file), defined in terms of its components and how they are
mapped onto backing store.
Any given file organization supports one or more file access methods.
Organization is thus closely related to but conceptually distinct from access methods.
The distinction is similar to that between data structure and the procedures and functions that operate
on them (indeed a file organization is a large-scale data structure), or to that between a logical schema
of a database and the facilities in a data manipulation language.
There is no very useful or commonly accepted taxonomy of methods of file organization: most attempts
confuse organization with access methods.
Choosing a file organization is a design decision, hence it must be done having in mind the achievement
of good performance with respect to the most likely usage of the file.
3. Storage efficiency.
To read a specific record from an indexed sequential file, you would include the KEY= parameter in the
READ (or associated input) statement.
The "key" in this case would be a specific record number (e.g., the number 35 would represent the 35th
record in the file).
The direct access to a record moves the record pointer, so that subsequent sequential access would take
place from the new record pointer location, rather than the beginning of the file.
Now the question arises how to access these files. We need KEYS to access the files.
TYPES OF KEYS:
PRIMARY KEYS: The primary key of a relational table uniquely identifies each record in the table.
It can either be a normal attribute that is guaranteed to be unique (such as Social Security Number in a
table with no more than one record per person) or it can be generated by the DBMS (such as a globally
unique identifier, or GUID, in Microsoft SQL Server).
Examples: Imagine we have a STUDENTS table that contains a record for each student at a university.
The student's unique student ID number would be a good choice for a primary key in the STUDENTS
table.
The student's first and last name would not be a good choice, as there is always the chance that more
than one student might have the same name.
ALTERNATE KEY: The keys other than the primary keys are known as alternate key.
CANDIDATE KEY: The Candidate Keys are super keys for which no proper subset is a super key. In other
words candidate keys are minimal super keys.
SUPER KEY: Super key stands for superset of a key. A Super Key is a set of one or more attributes that
are taken collectively and can identify all other attributes uniquely.
SECONDARY KEY: A secondary key is a non-unique field used to locate records. The problem with
secondary keys is that they are not unique and are therefore likely to return more
than one record for a particular value of the key.
Some fields have a large enough range of values that a search for a specific value will produce only a few
records; other fields have a very limited range of values and a search for a specific value will return a
large proportion of the file.
An example of the latter would be a search in student records for students classified as freshmen.
FOREIGN KEY : Foreign Key (Referential integrity) is a property of data which, when satisfied, requires
every value of one attribute of a relation to exist as a value of another attribute in a different relation.
For referential integrity to hold in a relational database, any field in a table that is declared a foreign key
can contain either a null value, or only values from a parent table's primary key or a candidate key.
In other words, when a foreign key value is used it must reference a valid, existing primary key in the
parent table.
For instance, deleting a record that contains a value referred to by a foreign key in another table would
break referential integrity.
Some relational database management systems can enforce referential integrity, normally either by
deleting the foreign key rows as well to maintain integrity, or by returning an error and not performing
the delete.
Which method is used may be determined by a referential integrity constraint defined in a data
dictionary. The adjective "referential" describes the action that a foreign key performs: 'referring' to a
link field in another table.
In simple terms, 'referential integrity' is a guarantee that the target it 'refers' to will be found.
A lack of referential integrity in a database can lead relational databases to return incomplete data,
usually with no indication of an error.
A common problem occurs with relational database tables linked with an 'inner join' which requires non-
NULL values in both tables, a requirement that can only be met through careful design and referential
integrity.
ENTITY INTEGRITY:
In the relational data model, entity integrity is one of the three inherent integrity rules.
Entity integrity is an integrity rule which states that every table must have a primary key and that the
column or columns chosen to be the primary key should be unique and not NULL.
Within relational databases using SQL, entity integrity is enforced by adding a primary key clause to a
schema definition.
The system enforces Entity Integrity by not allowing operation (INSERT, UPDATE) to produce an invalid
primary key.
Any operation that is likely to create a duplicate primary key or one containing nulls is rejected.
The Entity Integrity ensures that the data that you store remains in the proper format as well as
comprehensible.