Lecture 1 - Introduction - Data Structures and Algorithms
Lecture 1 - Introduction - Data Structures and Algorithms
Data Structures are the programmatic way of storing data so that data can be used
efficiently. Almost every enterprise application uses various types of data structures in
one or the other way. This tutorial will give you a great understanding on Data Structures
needed to understand the complexity of enterprise level applications and need of
algorithms, and data structures.
Prerequisites
Before proceeding with this tutorial, you should have a basic understanding of C
programming language, text editor, and execution of programs, etc.
Basic Terminology
Data − Data are values or set of values.
Data Item − Data item refers to single unit of values.
Group Items − Data items that are divided into sub items are called as Group
Items.
Elementary Items − Data items that cannot be divided are called as Elementary
Items.
Attribute and Entity − An entity is that which contains certain attributes or
properties, which may be assigned values.
Entity Set − Entities of similar attributes form an entity set.
Field − Field is a single elementary unit of information representing an attribute of
an entity.
Record − Record is a collection of field values of a given entity.
File − File is a collection of records of the entities in a given entity set.
Installation on UNIX/Linux
If you are using Linux or UNIX, then check whether GCC is installed on your system by
entering the following command from the command line −
$ gcc -v
If you have GNU compiler installed on your machine, then it should print a message such
as the following −
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix = /usr .......
Thread model: posix
gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
If GCC is not installed, then you will have to install it yourself using the detailed
instructions available at https://gcc.gnu.org/install/
This tutorial has been written based on Linux and all the given examples have been
compiled on Cent OS flavor of Linux system.
Installation on Mac OS
If you use Mac OS X, the easiest way to obtain GCC is to download the Xcode
development environment from Apple's website and follow the simple installation
instructions. Once you have Xcode setup, you will be able to use GNU compiler for
C/C++.
Xcode is currently available at developer.apple.com/technologies/tools/
Installation on Windows
To install GCC on Windows, you need to install MinGW. To install MinGW, go to the
MinGW homepage, www.mingw.org, and follow the link to the MinGW download page.
Download the latest version of the MinGW installation program, which should be named
MinGW-<version>.exe.
While installing MinWG, at a minimum, you must install gcc-core, gcc-g++, binutils, and
the MinGW runtime, but you may wish to install more.
Add the bin subdirectory of your MinGW installation to your PATH environment variable,
so that you can specify these tools on the command line by their simple names.
When the installation is complete, you will be able to run gcc, g++, ar, ranlib, dlltool, and
several other GNU tools from the Windows command line.
Algorithms – Overview
Algorithm is a step-by-step procedure, which defines a set of instructions to be executed
in a certain order to get the desired output. Algorithms are generally created independent
of underlying languages, i.e. an algorithm can be implemented in more than one
programming language.
From the data structure point of view, following are some important categories of
algorithms −
Search − Algorithm to search an item in a data structure.
Sort − Algorithm to sort items in a certain order.
Insert − Algorithm to insert item in a data structure.
Update − Algorithm to update an existing item in a data structure.
Delete − Algorithm to delete an existing item from a data structure.
Characteristics of an Algorithm
Not all procedures can be called an algorithm. An algorithm should have the following
characteristics −
Unambiguous − Algorithm should be clear and unambiguous. Each of its steps
(or phases), and their inputs/outputs should be clear and must lead to only one
meaning.
Input − An algorithm should have 0 or more well-defined inputs.
Output − An algorithm should have 1 or more well-defined outputs, and should
match the desired output.
Finiteness − Algorithms must terminate after a finite number of steps.
Feasibility − Should be feasible with the available resources.
Independent − An algorithm should have step-by-step directions, which should
be independent of any programming code.
Algorithm Analysis
Efficiency of an algorithm can be analyzed at two different stages, before implementation
and after implementation. They are the following −
A Priori Analysis − This is a theoretical analysis of an algorithm. Efficiency of an
algorithm is measured by assuming that all other factors, for example, processor
speed, are constant and have no effect on the implementation.
A Posterior Analysis − This is an empirical analysis of an algorithm. The selected
algorithm is implemented using programming language. This is then executed on
target computer machine. In this analysis, actual statistics like running time and
space required, are collected.
We shall learn about a priori algorithm analysis. Algorithm analysis deals with the
execution or running time of various operations involved. The running time of an
operation can be defined as the number of computer instructions executed per operation.
Algorithm Complexity
Suppose X is an algorithm and n is the size of input data, the time and space used by
the algorithm X are the two main factors, which decide the efficiency of X.
Time Factor − Time is measured by counting the number of key operations such
as comparisons in the sorting algorithm.
Space Factor − Space is measured by counting the maximum memory space
required by the algorithm.
The complexity of an algorithm f(n) gives the running time and/or the storage space
required by the algorithm in terms of n as the size of input data.
Space Complexity
Space complexity of an algorithm represents the amount of memory space required by
the algorithm in its life cycle. The space required by an algorithm is equal to the sum of
the following two components −
A fixed part that is a space required to store certain data and variables, that are
independent of the size of the problem. For example, simple variables and
constants used, program size, etc.
A variable part is a space required by variables, whose size depends on the size
of the problem. For example, dynamic memory allocation, recursion stack space,
etc.
Space complexity S(P) of any algorithm P is S(P) = C + SP(I), where C is the fixed part
and S(I) is the variable part of the algorithm, which depends on instance characteristic I.
Following is a simple example that tries to explain the concept −
Algorithm: SUM(A, B)
Step 1 - START
Step 2 - C ← A + B + 10
Step 3 - Stop
Here we have three variables A, B, and C and one constant. Hence S(P) = 1 + 3. Now,
space depends on data types of given variables and constant types and it will be
multiplied accordingly.
Time Complexity
Time complexity of an algorithm represents the amount of time required by the algorithm
to run to completion. Time requirements can be defined as a numerical function T(n),
where T(n) can be measured as the number of steps, provided each step consumes
constant time.
For example, addition of two n-bit integers takes n steps. Consequently, the total
computational time is T(n) = c ∗ n, where c is the time taken for the addition of two bits.
Here, we observe that T(n) grows linearly as the input size increases.