Data Structure Notes Update-1
As the name indicates, a data structure is a way of organizing data in memory. There are many
ways of organizing data in memory; we have already seen one of them, the array in C.
An array is a collection of memory elements in which data is stored sequentially, i.e., one
after another. In other words, an array stores its elements in a contiguous manner. This
organization of data is done with the help of the array data structure. There are also other
ways to organize data in memory. Let's see the different types of data structures.
A data structure is not a programming language like C, C++, or Java. It is a set of
algorithms that we can use in any programming language to structure data in memory.
To structure data in memory, many ('n' number of) algorithms have been proposed, and these
are known as abstract data types (ADTs). These abstract data types are sets of rules.
Primitive data structures are the primitive data types. int, char, float, double, and pointer
are primitive data structures, each of which can hold a single value.
The arrangement of data in a sequential manner is known as a linear data structure. The data
structures used for this purpose are arrays, linked lists, stacks, and queues. In these data
structures, each element is connected to the next element in a linear form.
When one element can be connected to 'n' other elements, the structure is known as a
non-linear data structure. The best examples are trees and graphs. In this case, the elements
are not arranged in a sequential order.
We will discuss the above data structures briefly in the coming topics. Data structures can
also be classified by when their size is allocated:
o Static data structure: a data structure whose size is allocated at compile time. Therefore,
its maximum size is fixed.
o Dynamic data structure: a data structure whose size is allocated at run time. Therefore,
its maximum size is flexible.
Major Operations
The major (common) operations that can be performed on data structures are:
o Sorting: We can sort the elements of a data structure in either ascending or descending
order.
o Updating: We can also update an element, i.e., replace it with another
element.
o Deletion: We can also perform the delete operation to remove an element from the data
structure.
An ADT tells what is to be done, and a data structure tells how it is to be done. In other words,
an ADT gives us the blueprint, while a data structure provides the implementation. Now the
question arises: how can one know which data structure to use for a particular ADT?
Since a particular ADT can be implemented by different data structures, the different
implementations are compared for time and space. For example, the Stack ADT can be
implemented by both arrays and linked lists. Suppose the array provides time efficiency
while the linked list provides space efficiency; then the one best suited to the
current user's requirements will be selected.
o Efficiency: If the data structure chosen to implement a particular ADT is appropriate, it
makes the program very efficient in terms of time and space.
o Reusability: Data structures provide reusability, meaning that multiple client programs can
use the same data structure.
o Abstraction: A data structure specified by an ADT also provides a level of abstraction.
The client cannot see the internal working of the data structure, so it does not have to worry
about the implementation. The client can only see the interface.
A stack is a linear data structure that follows a particular order in which operations are performed.
The order is LIFO (Last In First Out), which can also be described as FILO (First In Last Out).
There are many real-life examples of a stack. Consider plates stacked over one another
in a canteen. The plate at the top is the first one to be removed, i.e., the plate that has been
placed at the bottommost position remains in the stack for the longest period of time. So, it can be
seen to follow LIFO (Last In First Out)/FILO (First In Last Out) order.
In simple words, a linked list consists of nodes where each node contains a data field and a
reference (link) to the next node in the list.
Drawbacks:
1) Random access is not allowed. We have to access elements sequentially starting from the
first node, so we cannot do binary search efficiently on a linked list with its default
implementation.
2) Extra memory space for a pointer is required with each element of the list.
3) Not cache friendly. Since array elements are stored in contiguous locations, there is locality of
reference, which is not there in the case of linked lists.
Representation:
A linked list is represented by a pointer to the first node of the linked list. The first node is
called the head. If the linked list is empty, then the value of head is NULL.
Each node in a list consists of at least two parts:
1) data
2) Pointer (Or Reference) to the next node
In C, we can represent a node using a structure. Below is an example of a linked list node
holding integer data.
In Java, LinkedList can be represented as a class and Node as a separate class. The
LinkedList class contains a reference of type Node.
Both arrays and linked lists can be used to store linear data of similar types, but each
has some advantages and disadvantages over the other.
(1) The size of an array is fixed, so we must know the upper limit on the number of
elements in advance. Also, the allocated memory is generally equal to the upper limit
irrespective of usage, and in practice the upper limit is rarely reached.
(2) Inserting a new element into an array is expensive because room has to be
created for the new element, and to create room, existing elements have to be shifted.
For example, if IDs are kept in sorted order in an array id[] and we want to insert a new
ID 1005, then to maintain the sorted order we have to move all the elements after 1000
(excluding 1000).
Deletion is also expensive with arrays unless some special techniques are used. For
example, to delete 1010 in id[], everything after 1010 has to be moved.
Formally, a graph is a pair of sets (V, E), where V is the set of vertices and E is the set of
edges connecting pairs of vertices. Take a look at the following graph −
V = {a, b, c, d, e}
Basic terms
Mathematical graphs can be represented in a data structure. We can represent a graph using an
array of vertices and a two-dimensional array of edges. Before we proceed further, let's
familiarize ourselves with some important terms:
Vertex − Each node of the graph is represented as a vertex. In the example given below, the
labeled circles represent vertices, so A to G are vertices. We can represent them using
an array where A is identified by index 0, B by index 1, and
so on.
Edge − An edge represents a path between two vertices, or a line between two vertices. In the
example given below, the lines from A to B, B to C, and so on represent edges. We can
use a two-dimensional array to represent edges, where AB is represented as 1 at
row 0, column 1, BC as 1 at row 1, column 2, and so on, keeping other combinations as
0.
Adjacency − Two nodes or vertices are adjacent if they are connected to each other
through an edge. In the example given below, B is adjacent to A, C is adjacent to B, and so
on.
Path − A path represents a sequence of edges between two vertices. In the example given
below, ABCD represents a path from A to D.
Kinds of Graphs
Undirected Graphs.
In an undirected graph, the order of the vertices in the pairs in the Edge set doesn't matter.
Thus, for the sample graph above, we could have written the Edge set as
{(4,6),(4,5),(3,4),(3,2),(2,5),(1,2),(1,5)}. Undirected graphs are usually drawn with straight
lines between the vertices.
In a directed graph, the order of the vertices in the pairs in the Edge set matters. Thus u is
adjacent to v only if the pair (u,v) is in the Edge set. For directed graphs, we usually use arrows
for the arcs between vertices. An arrow from u to v is drawn only if (u,v) is in the Edge set.
In the directed graph below, note that both (B,D) and (D,B) are in the Edge set, so the arc
between B and D is an arrow in both directions.
In a labeled graph, each vertex is labeled with some data in addition to the data that identifies
the vertex. Only the identifying data is present in the pairs in the Edge set. This is similar to the
(key, satellite) data distinction for sorting.
Here we have the following parts.
o The underlying set for the keys of the Vertices set is the integers.
o The underlying set for the satellite data is Color.
o The Vertices set = {(2,Blue),(4,Blue),(5,Red),(7,Green),(6,Red),(3,Yellow)}
o The Edge set = {(2,4),(4,5),(5,7),(7,6),(6,2),(4,3),(3,7)}
Cyclic Graphs.
A cyclic graph is a directed graph with at least one cycle. A cycle is a path along the directed
edges from a vertex to itself. The vertex-labeled graph above has several cycles. One of them is
2 → 4 → 5 → 7 → 6 → 2
An edge-labeled graph is a graph where the edges are associated with labels. One can indicate
this by making the Edge set a set of triples. Thus if (u,v,X) is in the Edge set, then there is
an edge from u to v with label X.
Edge-labeled graphs are usually drawn with the labels adjacent to the arcs
specifying the edges.
A weighted graph is an edge-labeled graph where the labels can be operated on by the usual
arithmetic operators, including comparisons such as less than and greater than. In Haskell
we'd say the edge labels are in the Num class (and, for comparisons, the Ord class). Usually
they are integers or floats. The idea is that some edges may be more (or less) expensive, and
this cost is represented by the edge labels, or weights. In the graph below, which is an
undirected graph, the weights are drawn adjacent to the edges and appear in dark purple.
A DAG (directed acyclic graph) is a directed graph without cycles. DAGs appear as special cases
in CS applications all the time.
Vertices in a graph do not need to be connected to other vertices. It is legal for a graph to have
disconnected components, and even lone vertices without a single connection.
Vertices (like 5, 7, and 8) with only in-arrows are called sinks. Vertices with only
out-arrows (like 3 and 4) are called sources.
Connecting with friends on social media, where each user is a vertex, and when users
connect they create an edge.
Using GPS/Google Maps/Yahoo Maps to find the shortest route.
Google, to search for webpages, where pages on the internet are linked to each other
by hyperlinks; each page is a vertex and the link between two pages is an edge.
On eCommerce websites, relationship graphs are used to show recommendations.
Graphs are used to solve many real-life problems. Graphs are used to represent networks. The
networks may include paths in a city, a telephone network, or a circuit network. Graphs are also
used in social networks like LinkedIn and Facebook. For example, in Facebook, each person is
represented with a vertex (or node). Each node is a structure and contains information like
person id, name, gender, locale, etc.
This is probably the most often used algorithm. It may be applied in situations where the
shortest path between 2 points is needed.
Examples of such applications would be:
Computer games - finding the best/shortest route from one point to another.
Maps - finding the shortest/cheapest path for a car from one city to another, by using
given roads.
May be used to find the fastest way for a car to get from one point to another inside a
certain city, e.g., a satellite navigation system that shows drivers which way they
should go.
The same problem, but instead of connecting communication stations, villages are to
be connected with roads.
Eulerian Path/Circuit:
A postman has to visit a set of streets in order to deliver mail and packages. We need to
find a path that starts and ends at the post office and passes through each street (edge)
exactly once. This way the postman will deliver mail and packages to all the streets he has to,
and at the same time will spend minimum effort/time on the road.
Note that not all graphs have an Eulerian circuit. If needed, the algorithm for the Chinese
Postman Problem can be used.
The same problem with the postman as above, but instead of visiting each street
(edge) exactly once, the postman can visit them more than once if needed. Thus the
path should pass through each street at least once and should have the minimum
cost.
Drawing a circuit with a plotter in a fastest possible way, or with a minimum cost.
It may be used to determine the cheapest path for garbage collection, street cleaning,
or snow removal.
Also applied in routing robots, analysing DNA, and others.
Hamiltonian Path/Circuit:
The same problem with the postman as above, but instead of visiting a set of streets
(edges), he has to visit each point (house) exactly once.
Network Flows:
With a maximum flow algorithm, it is possible to find the most loaded roads or rails in a
certain transportation network, and also to determine its maximum throughput. This
information may then be used to improve the traffic situation in those places.
This algorithm may be used to color a map with a minimum number of colors.
Graph Median:
A warehouse should be placed in a city (a region) so that the sum of shortest distances to all
other points (regions) is minimal. This is useful for lowering the cost of transporting goods
from a warehouse to clients.
The same idea can be applied when selecting the place for a shop, market, office, or other
buildings.
Graph Center:
Suppose that a hospital, a fire department, or a police department should be placed in a city
so that the farthest point is as close as possible. For example, a hospital should be placed in
such a way that an ambulance can get as fast as possible to the farthest situated house
(point).
Graph representation
The following two are the most commonly used representations of a graph.
Adjacency Matrix:
An adjacency matrix is a 2D array of size V x V, where V is the number of vertices in the graph.
Let the 2D array be adj[][]; a slot adj[i][j] = 1 indicates that there is an edge from vertex i to
vertex j. The adjacency matrix of an undirected graph is always symmetric. An adjacency matrix is
also used to represent weighted graphs: if adj[i][j] = w, then there is an edge from vertex i to
vertex j with weight w.
Pros: The representation is easier to implement and follow. Removing an edge takes O(1) time.
Queries like whether there is an edge from vertex ‘u’ to vertex ‘v’ are efficient and can be
done in O(1).
Cons: Consumes more space, O(V^2). Even if the graph is sparse (contains few edges), it
consumes the same space. Adding a vertex takes O(V^2) time.
Adjacency List:
An array of linked lists is used. The size of the array is equal to the number of vertices. Let the
array be array[]. An entry array[i] represents the linked list of vertices adjacent to the ith vertex.
This representation can also be used to represent a weighted graph: the weights of edges can
be stored in the nodes of the linked lists. The following is the adjacency list representation of
the above graph.
Adjacency List Representation of the above Graph