Module I-Part A
MODULE-1
Introduction: Data Structures, Classifications (Primitive & Non Primitive), Data structure
Operations, Review of Arrays, Structures, Self-Referential Structures, and Unions. Pointers and
Dynamic Memory Allocation Functions. Representation of Linear Arrays in Memory,
Dynamically allocated arrays, Array Operations: Traversing, inserting, deleting, searching, and
sorting. Multidimensional Arrays, Polynomials and Sparse Matrices. Strings: Basic
Terminology, Storing, Operations and Pattern Matching algorithms. Programming Examples.
Introduction
Data is a value or a set of values. Data as such may not convey any meaning.
Example: 90, Bob
When data is interpreted so that it conveys a meaning, it is called information.
Example: Bob scored 90 marks.
A data item refers to a single unit of values. Data items that are divided into sub items are
called group items.
Example: The name of an employee can be divided into three sub items: first name, middle
name and last name.
Data items that are not divided into sub items are called elementary items.
Entity: An entity is something that has certain attributes or properties which may
be assigned values. The values may be either numeric or non-numeric.
Ex: Attributes: Name, Age, Sex, SSN
Values: Rohland Gail, 34, F, 134-34-5533
Entities with similar attributes form an entity set. Each attribute of an entity set has a range of
values, the set of all possible values that could be assigned to the particular attribute.
Field is a single elementary unit of information representing an attribute of an entity.
Record is the collection of field values of a given entity.
File is the collection of records of the entities in a given entity set.
Each record in a file may contain many field items, but the value in a certain
field may uniquely determine the record in the file. Such a field K is called a
primary key, and the values k1, k2, ... in such a field are called keys or key
values.
Example: Student records have variable lengths, since different students take different numbers
of courses. Variable-length records have a minimum and a maximum length.
The above organization of data into fields, records and files may not be complex enough to maintain
and efficiently process certain collections of data. For this reason, data are also organized into
more complex types of structures.
Dept. of CSE, SVIT Page 1
DSA/18CS32 Module 1
Data Structures:
Data may be organized in many different ways. The logical or mathematical model of a particular
organization of data is called a data structure.
The choice of a particular data model depends on two considerations:
1. It must be rich enough in structure to mirror the actual relationships of the data in the real
world.
2. The structure should be simple enough that one can effectively process the data whenever
necessary.
A data structure is a particular method of storing and organizing data in computer memory so
that it can be used efficiently. Data structures are classified into
Primitive data structures: These can be manipulated directly by machine instructions.
Examples: integer, character, float, etc.
Non-primitive data structures: These cannot be manipulated directly by machine
instructions. Non-primitive data structures are further classified into linear and
non-linear data structures.
Linear data structures: These show the relationship of adjacency between the elements of
the data structure. Examples: arrays, stacks, queues, lists, etc.
Non-linear data structures: These do not show the relationship of adjacency between
the elements. Examples: trees and graphs.
Arrays:
The simplest type of data structure is a linear (or one-dimensional) array. It is a list of a finite number
n of similar data elements referenced respectively by a set of n consecutive numbers, usually 1, 2, 3, ..., n.
If A is chosen as the name of the array, then the elements of A are denoted by the subscript
notation a1, a2, a3, ..., an
or
by the parenthesis notation A(1), A(2), A(3), ..., A(n)
or
by the bracket notation A[1], A[2], A[3], ..., A[n]
Example 1: A linear array STUDENT consisting of the names of six students is pictured in the figure
below. Here STUDENT[1] denotes John Brown, STUDENT[2] denotes Sandra Gold, and so on.
Linear arrays are called one-dimensional arrays because each element in such an array is
referenced by one subscript. A two-dimensional array is a collection of similar data elements
where each element is referenced by two subscripts.
Example 2: A chain of 28 stores, each store having 4 departments, may list its weekly sales as in
the figure below. Such data can be stored in the computer using a two-dimensional array in which the
first subscript denotes the store and the second subscript the department. If SALES is the name
given to the array, then SALES[1, 1] = 2872, SALES[1, 2] = 805, SALES[1, 3] = 3211, ...,
SALES[28, 4] = 982
Trees
Data frequently contain a hierarchical relationship between various elements. The data structure
which reflects this relationship is called a rooted tree graph or a tree.
A record that contains both group items and elementary items can best be described by means of a tree structure.
For example, an employee personnel record may contain the following data items:
Social Security Number, Name, Address, Age, Salary, Dependents
However, Name may be a group item with the sub-items Last, First and MI (middle initial). Also
Address may be a group item with the subitems Street address and Area address, where Area itself
may be a group item having subitems City, State and ZIP code number.
This hierarchical structure is pictured below
Stack:
A stack, also called a last-in first-out (LIFO) system, is a linear list in which insertions and
deletions can take place only at one end, called the top. This structure is similar in its operation to
a stack of dishes on a spring system, as shown in the figure.
Note that new dishes are inserted only at the top of the stack and dishes can be deleted only from
the top of the stack.
Queue: A queue, also called a first-in first-out (FIFO) system, is a linear list in which deletions
can take place only at one end of the list, the "front" of the list, and insertions can take place
only at the other end of the list, the "rear" of the list.
This structure operates in much the same way as a line of people waiting at a bus stop, as pictured
in the figure: the first person in line is the first person to board the bus. Another analogy is with
automobiles waiting to pass through an intersection: the first car in line is the first car through.
Graph: Data sometimes contain a relationship between pairs of elements which is not necessarily
hierarchical in nature. For example, suppose an airline flies only between the cities connected by
lines in the figure. The data structure which reflects this type of relationship is called a graph.
The data appearing in data structures are processed by means of certain operations. The following
four operations play a major role:
1. Traversing: accessing each record/node exactly once so that certain items in the record
may be processed. (This accessing and processing is sometimes called “visiting” the
record.)
2. Searching: Finding the location of the desired node with a given key value, or finding
the locations of all such nodes which satisfy one or more conditions.
3. Inserting: Adding a new node/record to the structure.
4. Deleting: Removing a node/record from the structure.
ARRAYS
An array is defined as an ordered set of similar data items.
All the data items of an array are stored in consecutive memory locations.
The data items of an array are of the same type, and each data item can be accessed using the
same name but a different index value. The array index starts at 0.
An array is a set of pairs, <index, value>, such that each index has a value associated with
it. This is called a correspondence or a mapping.
Ex: <index, value>
< 0 , 25 > list[0]=25
< 1 , 15 > list[1]=15
< 2 , 20 > list[2]=20
< 3 , 17 > list[3]=17
< 4 , 35 > list[4]=35
Here, list is the name of the array. The data items in list are accessed as list[0] to list[4].
Array in C
Declaration: A one-dimensional array in C is declared by adding brackets to the name of a
variable.
Syntax: <datatype> <array_name>[size];
The declaration int list[5]; defines 5 integers. In C, arrays start at index 0, so list[0], list[1], list[2],
list[3], list[4] are the names of the five array elements, each of which contains an integer value.
Implementation:
When the compiler encounters an array declaration such as int list[5];, it allocates five consecutive
memory locations. Each memory location is large enough to hold a single integer.
The address of the first element of an array is called the base address. Ex: For list[5], the
address of list[0] is called the base address.
To compute the memory address of list[i], the compiler obtains the size of an
int with sizeof(int); the memory address of list[i] is then:
address(list[i]) = α + i * sizeof(int)
where α is the base address.
For example, the address of list[3], i.e., the 4th element of the array list, is α + 3 * sizeof(int).
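The address formula above can be checked with a short sketch. The helper names element_offset and element_address are made up for this illustration; the code only assumes that list points at the first element of an int array.

```c
#include <stddef.h>

/* Byte offset of element i from the base address: i * sizeof(int). */
size_t element_offset(size_t i)
{
    return i * sizeof(int);
}

/* Address of list[i], computed by hand as base address + offset.
   The cast to char * makes the addition count in bytes. */
int *element_address(int *list, size_t i)
{
    return (int *)((char *)list + element_offset(i));
}
```

For an array declared as int list[5], element_address(list, 3) gives the same address as &list[3], which is what the compiler computes for the subscript expression.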
To pass an entire array to a function, only the name of the array is passed as an argument.
However, we need to make use of [] in the function definition. This informs the compiler
that you are passing a one-dimensional array to the function.
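#include <stdio.h>
#define MAX_SIZE 100
float sum(float [], int);
int main(void)
{
    int i;
    float input[MAX_SIZE], answer;
    for (i = 0; i < MAX_SIZE; i++)
        input[i] = i;
    answer = sum(input, MAX_SIZE);
    printf("\n The sum is: %f \n", answer);
    return 0;
}
/* Adds the first n elements of list and returns the total */
float sum(float list[], int n)
{
    int i;
    float tempsum = 0;
    for (i = 0; i < n; i++)
        tempsum += list[i];
    return tempsum;
}
Here, when sum is invoked, input = &input[0] (i.e., the base address) is copied into a temporary
location and associated with the formal parameter list.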
STRUCTURES
C provides a way to group data that permits the data to vary in type. This mechanism is called the
structure, or struct for short.
A structure (a record) is a collection of data items, where each item/member is identified by its
type and name.
A structure is basically a user-defined data type that can store related information, which may be of
the same or different data types, together.
The major difference between a structure and an array is that an array can store only
information of the same data type. A structure is therefore a collection of variables under a single
name. The variables within a structure may be of different data types, and each has a name that is
used to select it from the structure.
struct <struct_name>
{
data_type member 1;
data_type member 2;
………………………
………………………
data_type member n;
};
For example,
struct student
{
    char sname[10];
    int age;
    float average_marks;
};
To assign values to these fields, the dot operator (.) is used as the structure member operator.
Type-Defined Structure
The structure definition associated with keyword typedef is called Type-Defined Structure.
Syntax 1: typedef struct
{
data_type member 1;
data_type member 2;
………………………
………………………
data_type member n;
}Type_name;
Where,
typedef is the keyword used at the beginning of the definition; by using typedef, a
user-defined data type can be obtained.
struct is the keyword which tells the compiler that a structure is being defined.
The members are declared with their data_type.
Type_name is not a variable; it is a user-defined data type.
Syntax 2:
struct struct_name
{
data_type member 1;
data_type member 2;
………………………
………………………
data_type member n;
};
typedef struct struct_name Type_name;
Example 1:
typedef struct
{
char name[10];
int age;
float salary;
}EMPLOYEE;
Example2:
struct Employee
{
char name[10];
int age;
float salary;
};
typedef struct Employee EMPLOYEE;
In the above examples, EMPLOYEE is the name of the type and it is a user-defined data type.
Variables of this type are declared as follows:
EMPLOYEE person1, person2;
This statement declares the variables person1 and person2 to be of type EMPLOYEE.
Structure Operation
Various operations can be performed on structures and structure members.
typedef struct {
    char name[10];
    int age;
    float salary;
} humanBeing;
humanBeing person1, person2;
Comparing structures: Structures cannot be tested for equality with the == operator, so a function
compares them member by member. It returns TRUE if person 1 and person 2 are the same;
otherwise it returns FALSE.
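A minimal sketch of such a comparison function is given below. The function name humansEqual and the TRUE/FALSE macros are assumptions made for this illustration; the humanBeing type is repeated so that the fragment compiles on its own.

```c
#include <string.h>

#define FALSE 0
#define TRUE  1

typedef struct {
    char name[10];
    int age;
    float salary;
} humanBeing;

/* Returns TRUE if every member of person1 matches the corresponding
   member of person2, otherwise FALSE. */
int humansEqual(humanBeing person1, humanBeing person2)
{
    if (strcmp(person1.name, person2.name) != 0)   /* names differ */
        return FALSE;
    if (person1.age != person2.age)
        return FALSE;
    if (person1.salary != person2.salary)
        return FALSE;
    return TRUE;
}
```

Note that structure assignment (person1 = person2) is allowed in C and copies every member, whereas the comparison has to be written out as above.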
Nested structures: A structure can be embedded within another structure. The structures are defined
separately, and a variable of one structure type is declared inside the definition of another structure.
A member of the nested structure is accessed in the same way as any other member of the enclosing structure.
For example, we may wish to include the date of birth of
an employee in our employee structure by using a nested structure.
typedef struct
{
int day,month,year;
} date;
typedef struct
{
    char name[10];
    int age;
    float salary;
    date dob;
}employee;
If p1 is a variable of type employee, a person born on September 10, 1974, would have the values for the date struct set as:
p1.dob.month = 9;
p1.dob.day = 10;
p1.dob.year = 1974;
SELF-REFERENTIAL STRUCTURES
A self-referential structure is one in which one or more of its data members is a pointer to a structure of the same type.
Self-referential structures usually require dynamic storage management routines (malloc and
free) to explicitly obtain and release memory.
Example:
struct list {
    char data;
    struct list *link;
};
typedef struct list LIST;
Each instance/variable of the structure list has two components, data and link. data
is a single character, while link is a pointer to a list structure variable.
The value of link is either the address in memory of an instance of list or the null pointer.
Consider these statements, which create three structure variables and assign values to their
respective fields:
LIST item1, item2, item3;
item1.data = 'a';
item2.data = 'b';
item3.data = 'c';
item1.link = item2.link = item3.link = NULL;
We can attach these structure variables together by replacing the null link field in item2 with one
that points to item3 and by replacing the null link field in item1 with one that points to item2.
item1.link = &item2;
item2.link = &item3;
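Once the items are linked, they can be visited by following the link fields until the null pointer is reached. The sketch below assumes the struct list definition from the example above; the helper name collect and its parameters are hypothetical.

```c
#include <stddef.h>

struct list {
    char data;
    struct list *link;
};
typedef struct list LIST;

/* Follows the link fields starting at start and copies each data
   character into out (at most max characters).
   Returns the number of items visited. */
int collect(const LIST *start, char *out, int max)
{
    int count = 0;
    while (start != NULL && count < max) {
        out[count++] = start->data;   /* visit the current item */
        start = start->link;          /* move to the next item  */
    }
    return count;
}
```

Calling collect(&item1, buf, 3) on the three linked items fills buf with the characters a, b, c in that order.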
Array of Structures: For students or employees, we rarely store the details of
only one student or one employee. When we have to store the details of a group of students, we can
declare an array of structures.
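As an illustration, the sketch below scans an array of struct student (the structure defined earlier in these notes, repeated here so the fragment compiles on its own). The function best_student is a hypothetical helper, not part of the original notes.

```c
struct student {
    char sname[10];
    int age;
    float average_marks;
};

/* Returns the index of the student with the highest average_marks,
   or -1 when the array is empty. */
int best_student(const struct student s[], int n)
{
    int i, best;
    if (n <= 0)
        return -1;
    best = 0;
    for (i = 1; i < n; i++)
        if (s[i].average_marks > s[best].average_marks)
            best = i;               /* s[i] is the i-th structure in the array */
    return best;
}
```

An array such as struct student class_list[3]; holds three complete student records, and class_list[1].age selects the age member of the second record.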
Unions: A union is a user-defined data type that can store related information that may be
of different data types or the same data type, but the fields of a union share their memory
space. This means that only one field of the union is "active" at any given time.
Example 1:
Suppose a program uses a number that is either an int or a float. We can define a union as
union num
{
    int a;
    float b;
};
With a variable declared as union num n;, we can store a value as n.a = 5 or as n.b = 3.14, but only one member is active at any point in time.
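The shared storage can be seen in a small sketch. The function overwrite_with_float is a made-up name for this illustration; it stores an int in the union and then a float, after which only the float member is meaningful.

```c
union num {
    int a;
    float b;
};

/* Stores first in member a, then second in member b.
   Both members occupy the same memory, so after the second
   assignment only b holds a meaningful value. */
float overwrite_with_float(int first, float second)
{
    union num n;
    n.a = first;    /* a is the active member */
    n.b = second;   /* b is now the active member; a has been overwritten */
    return n.b;
}
```

Because the members overlap, sizeof(union num) is at least the size of its largest member rather than the sum of the member sizes.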
For example
char *p; //char pointer variable
int *m; //integer pointer variable
The variable p is declared as a pointer variable of type character.
The variable m is declared as a pointer variable of type integer.
The two most important operators used with pointers are
& - The unary operator & gives the address of a variable.
* - The indirection or dereference operator * gives the content/value of the
object/variable pointed to by a pointer.
Initialization of pointer variables: Uninitialized variables hold unknown garbage values. Similarly,
an uninitialized pointer variable holds an unknown address, which will be interpreted as a memory
location and may lead to a runtime error.
These errors are difficult to debug and correct; therefore a pointer should always be initialized with a
valid memory address.
Suppose the variable a and the pointer variable p are declared with the same data type, for example int a; and int *p;.
To make p point at a, we write the statement
p = &a; // now the address of a is stored in the pointer variable p, and p is said to be pointing at a
If we do not want the pointer variable to point at anything, we can initialize it to NULL.
Dereferencing a null pointer is an error: the behavior is undefined in C. On most systems the program
terminates, although on some machines address zero is a valid address and execution continues.
NOTE:
A pointer variable can only point at a variable of the same type.
We can have more than one pointer variable pointing at the same variable.
For example
int a;
int *p,*q;
p=&a;
q=&a;
Now both the pointer variables p and q are pointing at the same variable a.
There is no limit to the number of pointer variables that can point to a variable.
#include <stdio.h>
int main(void)
{
    int a = 5;
    int *p;
    p = &a;          // p is now pointing at a
    *p = *p + 1;     // now the value of a is modified through the pointer variable p
    printf(" %d %d %p", a, *p, (void *)p);
    return 0;
}
Output: 6 6 XXXXX(address of variable a)
Note:
We need parentheses for expressions like (*p)++ because the precedence of the postfix increment operator is higher
than that of the indirection operator (*). If the parentheses are omitted, the pointer (the address) is
incremented instead of the value it points to.
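The difference between the two forms can be sketched as follows. The function names increment_value and read_then_advance are made up for this illustration.

```c
/* (*p)++ : increments the value that p points at and
   returns the value it had before the increment. */
int increment_value(int *p)
{
    return (*p)++;
}

/* *p++ : parsed as *(p++). Reads the value p points at, then advances
   the pointer itself. The caller's pointer is updated through pp. */
int read_then_advance(int **pp)
{
    int *p = *pp;
    int value = *p++;   /* value read first, then p moves to the next int */
    *pp = p;
    return value;
}
```

With an array int arr[2] and p pointing at arr[0], increment_value(p) changes arr[0] and leaves p alone, while read_then_advance(&p) leaves the array alone and moves p to arr[1].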
When we call a function by passing the address of a variable, we call it pass by reference. By
passing the addresses of variables defined in main, the called function can store data directly in the
calling function's variables rather than returning a value. Pointers can also be used when we have to
return more than one value from a function.
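Both uses can be sketched briefly. The functions swap and divide below are illustrative names chosen for this example; divide assumes a non-zero divisor.

```c
/* Exchanges the values of the two variables whose addresses are passed. */
void swap(int *x, int *y)
{
    int temp = *x;
    *x = *y;
    *y = temp;
}

/* Returns two results through pointer parameters:
   the quotient and the remainder of dividend / divisor.
   Assumes divisor is not zero. */
void divide(int dividend, int divisor, int *quotient, int *remainder)
{
    *quotient = dividend / divisor;
    *remainder = dividend % divisor;
}
```

A call such as swap(&a, &b) changes a and b in the caller, and divide(17, 5, &q, &r) stores 3 in q and 2 in r.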
Pointers can be very dangerous if they are misused. Pointers are dangerous in the following
situations:
1. A pointer is dangerous when an attempt is made to access an area of memory that is
either out of range of the program or that does not contain a legitimate
object.
Ex: int main(void)
{
    int *p;
    int pa = 10;
    p = &pa;
    printf("%d", *p);       // output = 10
    printf("%d", *(p+1));   // accessing memory which is out of range
    return 0;
}
2. It is dangerous to dereference a NULL pointer: on some computers it
may return 0 and permit execution to continue, or it may return the value stored in
location zero, so it may produce a serious error.
3. A pointer is dangerous when explicit type casts are used to convert between pointer types.
Ex: int *pi;
float *pf;
pi = malloc(sizeof(int));
pf = (float *) pi;
4. On some systems, pointers have the same size as type int. Since int is the default type
specifier, some programmers omit the return type when defining a function. The return
type then defaults to int, which can later be interpreted as a pointer. This has proven to be a
dangerous practice on some computers, so the programmer should define explicit
return types for functions.
Memory allocation functions: In many high-level languages, data structures are fully defined at compile
time. Languages like C can also allocate memory at execution time; this feature is known as dynamic
memory allocation.
There are two ways in which we can reserve memory locations for a variable:
Static memory allocation: The declaration and definition of memory are specified in
the source program. The number of bytes reserved cannot be changed at run time.
Dynamic memory allocation: Data definition is done at run time. It uses predefined
functions to allocate and release memory for data while the program is running. To use
dynamic memory allocation, the programmer must use either standard data types or must
declare derived data types.
Memory usage: Four memory management functions are used with dynamic memory. malloc, calloc
and realloc are used for memory allocation. The function free is used to return memory when it is no
longer needed.
Heap: It is the unused memory allocated to the program. When requests are made by memory
allocating functions, memory is allocated from the heap at run time.
The pointer returned by the malloc function can be type cast to a pointer of the required
type by making use of a type cast expression.
When allocated memory locations are no longer needed, they should be freed by using the predefined
function free.
Syntax: free(ptr);
This statement causes the space in memory pointed to by ptr to be deallocated.
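#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int *pi;
    float *pf;
    pi = (int *) malloc(sizeof(int));
    pf = (float *) malloc(sizeof(float));
    if (pi == NULL || pf == NULL)   /* malloc returns NULL when no memory is available */
    {
        printf("Insufficient memory\n");
        free(pi);                   /* free(NULL) is allowed and does nothing */
        free(pf);
        return 1;
    }
    *pi = 1344;
    *pf = 3.14f;
    printf("integer value = %d float value = %f", *pi, *pf);
    free(pi);
    free(pf);
    return 0;
}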
Example: To allocate a one-dimensional array of integers whose capacity is n, the following code can be
written.
int *ptr;
ptr = (int *) calloc(n, sizeof(int));
Reallocation of memory (realloc): The function realloc resizes memory previously allocated by
either malloc or calloc.
Example
int *p;
p = (int *) calloc(n, sizeof(int));
p = realloc(p, s); /* where s is the new size in bytes */
The statement realloc(p, s) changes the size of the memory block pointed to by p to s bytes. The existing
contents of the block remain unchanged up to the smaller of the old and new sizes.
When s > oldsize (block size increases), the additional (s - oldsize) bytes have unspecified values.
When s < oldsize (block size reduces), the rightmost (oldsize - s) bytes of the old block are freed.
When realloc is able to do the resizing, it returns a pointer to the start of the new block.
When it is not able to do the resizing, the old block is unchanged and the function returns the
value NULL.
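Because realloc returns NULL on failure while leaving the old block allocated, assigning its result straight back to the same pointer would lose the only reference to the old block. A common way to avoid this is sketched below; the helper name resize_int_array is an assumption for this illustration, and new_count is assumed to be greater than zero.

```c
#include <stdlib.h>

/* Resizes the block *p so that it can hold new_count ints.
   Returns 1 on success (and updates *p), or 0 on failure, in which
   case *p still points at the old, unchanged block.
   Assumes new_count > 0. */
int resize_int_array(int **p, size_t new_count)
{
    int *tmp = realloc(*p, new_count * sizeof(int));
    if (tmp == NULL)
        return 0;       /* old block is still valid and still owned by *p */
    *p = tmp;           /* the block may have moved; use the new address  */
    return 1;
}
```

The result of realloc is held in a temporary pointer and copied to the caller's pointer only after it has been checked.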
Dangling Reference: Once a block of memory is freed using the free function, there is no way to retrieve this
storage, and any reference to this location is called a dangling reference.
Example:
int *p, *f;
p = (int *) malloc(sizeof(int));
*p = 2;
f = p;
free(p);
*f = *f + 2; /* Invalid: dangling reference */
The location that holds the value 2 is freed, but a reference to this location still exists through f. The pointer
f tries to access a location that has been freed, so f is a dangling reference.
One-dimensional array: When we cannot determine the exact size of the array in advance, the space for the array
can be allocated at run time.
For example, consider the code given below:
int i, n, *list;
printf("enter the size of the array");
scanf("%d", &n);
if (n < 1)
{
    printf("Improper value of n\n");
    exit(EXIT_FAILURE);
}
list = (int *) malloc(n * sizeof(int)); /* or list = (int *) calloc(n, sizeof(int)); */
In C, we find the element x[i][j] by first accessing the pointer in x[i]. This pointer gives the address of
the zeroth element of row i of the array. Then, by adding j*sizeof(int) to this pointer, the address of the
jth element of the ith row is determined.
Example: To find x[1][3], we first access the pointer in x[1]; this pointer gives the address of x[1][0].
By adding 3*sizeof(int), the address of the element x[1][3] is determined.
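This row-pointer representation can be built at run time. The sketch below is one possible way to do it, assuming positive rows and cols; the names make2dArray and free2dArray are chosen for this illustration.

```c
#include <stdlib.h>

/* Creates a rows x cols array of int, represented as an array of
   row pointers. All elements start at 0. Returns NULL on failure.
   Assumes rows > 0 and cols > 0. */
int **make2dArray(int rows, int cols)
{
    int i;
    int **x = malloc((size_t)rows * sizeof(*x));    /* the row pointers */
    if (x == NULL)
        return NULL;
    for (i = 0; i < rows; i++) {
        x[i] = calloc((size_t)cols, sizeof(**x));   /* one row of ints */
        if (x[i] == NULL) {
            while (i-- > 0)                         /* undo earlier rows */
                free(x[i]);
            free(x);
            return NULL;
        }
    }
    return x;
}

/* Releases every row and then the array of row pointers. */
void free2dArray(int **x, int rows)
{
    int i;
    for (i = 0; i < rows; i++)
        free(x[i]);
    free(x);
}
```

With int **x = make2dArray(2, 4), the expression x[1][3] is evaluated as *(x[1] + 3): the pointer stored in x[1] plus an offset of three ints.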
Linear Arrays: A linear array is a list of a finite number (n) of homogeneous data elements.
a. The elements of the array are referenced by an index set consisting of n consecutive
numbers (0 ... (n-1)).
b. The elements of the array are stored in successive memory locations.
c. The number n of elements is called the length or size of the array. The length of the array can be
obtained from the index set using the formula
Length = Upperbound - Lowerbound + 1
d. The elements of an array may be denoted by a[0], a[1], ..., a[n-1]. The number k in a[k] is
called a subscript or index, and a[k] is called the subscripted value.
e. An array is usually implemented as a consecutive set of memory locations.
Declaration: Linear arrays are declared by adding brackets to the name of a variable. The size of the
array is mentioned within the brackets.
In C all arrays start at index 0. Therefore, for an array list of five integers, list[0], list[1], list[2], list[3], and list[4] are the names of the
five array elements, each of which contains an integer value.
If α is the base address of list and w is the number of bytes per element (sizeof(int)), then
address(list[0]) = α + w*0 = α
address(list[1]) = α + w*1
address(list[2]) = α + w*2
address(list[3]) = α + w*3
address(list[4]) = α + w*4
Array Operations: Operations that can be performed on any linear structure whether it is an array
or a linked list include the following
a. Traversal- processing each element in the list
b. Search- Finding the location of the element with a given key.
c. Insertion- Adding a new element to the list
d. Deletion- Removing an element from the list.
e. Sorting- Arranging the elements in some type of order.
f. Merging- Combining two lists into a single list.
Traversing Linear Arrays: Traversing an array means accessing and processing each element exactly
once. Taking the display of elements as the processing applied during traversal, the array can be
traversed as follows:
void displayarray(int a[], int n)   /* n is the number of elements in a */
{
    int i;
    printf("The Array Elements are:\n");
    for (i = 0; i < n; i++)
        printf("%d\t", a[i]);
}
Insertion
Inserting an element at the end of the array can be done provided the memory space allocated
for the array is large enough to accommodate the additional element.
If an element needs to be inserted in the middle, then all the elements from the specified position
to the end of the array should be moved downwards to accommodate the new element and to
keep the order of the other elements.
The following function inserts an element at the specified position
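The insertion function itself does not appear in these notes. A minimal sketch is given below; the name insertElement, the parameter order and the capacity parameter are assumptions, with positions counted from 0.

```c
/* Inserts elem at index pos (0 <= pos <= *n) in array a, which
   currently holds *n elements and has room for capacity elements.
   Returns 1 on success, 0 if the array is full or pos is invalid. */
int insertElement(int a[], int *n, int capacity, int pos, int elem)
{
    int i;
    if (*n >= capacity || pos < 0 || pos > *n)
        return 0;
    for (i = *n - 1; i >= pos; i--)   /* start from the last element */
        a[i + 1] = a[i];              /* move each element one place down */
    a[pos] = elem;
    (*n)++;                           /* the array now has one more element */
    return 1;
}
```

The shifting loop runs from the end of the array back to pos so that no element is overwritten before it has been moved.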
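Deletion
If an element needs to be deleted from the middle, then all the elements from the position after it
to the end of the array should be moved upwards to fill the gap.
The following function deletes an element at the specified position
/* The original function header is missing from these notes;
   the name and parameter list below are assumed. */
void deleteElement(int a[], int *n, int pos)
{
    int i;
    if (pos < 0 || pos >= *n)
        printf("Invalid Position\n");
    else
    {
        printf("The Deleted Element is %d\n", a[pos]);
        for (i = pos; i < *n - 1; i++)
            a[i] = a[i + 1];   // Delete by pushing up other elements
        (*n)--;
    }
}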
Sorting: Sorting refers to the operation of rearranging the elements of an array in increasing or
decreasing order.
Example: Write a program to sort the elements of the array in ascending order using bubble
sort.
#include <stdio.h>
int main(void)
{
    int a[10], i, j, temp, n;
    printf("enter the size of the array : ");
    scanf("%d", &n);
    printf("enter the elements of the array\n");
    for (i = 0; i < n; i++)
        scanf("%d", &a[i]);
    for (i = 1; i <= n - 1; i++)        /* pass number */
        for (j = 0; j < n - i; j++)     /* compare adjacent pairs */
            if (a[j] > a[j + 1])
            {
                temp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = temp;
            }
    printf("the sorted array is \n");
    for (i = 0; i < n; i++)
        printf("%d \t", a[i]);
    return 0;
}
Searching:
Let DATA be a collection of data elements in memory and suppose a specific ITEM of
information is given.
Searching refers to the operation of finding the location LOC of ITEM in DATA, or
printing a message that the item does not appear there.
The search is successful if ITEM appears in DATA and unsuccessful otherwise.
The algorithm chosen for searching depends on the way the data is organised. The two algorithms
considered here are linear search and binary search.
LINEAR SEARCH: This program traverses the array sequentially to locate key
#include<stdio.h>
#include<stdlib.h>
int main(void)
{
    int a[10], i, key, n;
    printf("enter the size of the array : ");
    scanf("%d", &n);
    printf("enter the elements of the array\n");
    for (i = 0; i < n; i++)
        scanf("%d", &a[i]);
    printf("enter the key \n");
    scanf("%d", &key);
    for (i = 0; i <= n - 1; i++)
        if (a[i] == key)
        {
            printf("key %d found at %d", key, i + 1);   /* position counted from 1 */
            exit(0);
        }
    printf("key not found");
    return 0;
}
Complexity of linear search: The complexity is based on the number of comparisons C(n) required to
find the key in the array.
The best case occurs when the key is found at the first position. The number of comparisons is 1.
The worst case occurs when the key is not found in the array or when it is in the last
position. The number of comparisons is n.
Thus in the worst case the running time is proportional to n.
The running time of the average case uses the probabilistic notion of expectation.
Assuming the key is in the array, the number of comparisons can be any number from 1 to n, and each occurs with probability p = 1/n.
Then C(n) = 1*(1/n) + 2*(1/n) + ... + n*(1/n)
= (1 + 2 + 3 + ... + n)*(1/n)
= [n(n+1)/2]*(1/n)
= (n+1)/2
BINARY SEARCH: This program repeatedly halves the search range of a sorted array to locate key
#include<stdio.h>
#include<stdlib.h>
int main()
{
int a[10],i,key,mid,low,high,n;
printf("enter the size of the array : ");
scanf("%d",&n);
printf("enter the elements of the array in ascending order\n");
for(i=0;i<n;i++)
scanf("%d",&a[i]);
printf("enter the key \n");
scanf("%d",&key);
low=0;
high=n-1;
while(low<=high)
{
mid=(low+high)/2;
if (key==a[mid])
{
printf("element %d found at %d",key,mid+1);
exit(0);
}
else
{
if (key<a[mid])
high = mid-1;
else
low=mid+1;
}
}
printf("key not found");
return(0);
}