Unit 1 Data Structures and Algorithms

Elementary Data Organization

Elementary data organization in data structures refers to the fundamental ways in which data can be stored, arranged, and accessed within a computer's memory. These basic structures form the building blocks for more complex data structures and algorithms. Here are some common elementary data organizations:

1. Arrays:
An array is a collection of elements of the same data type stored in
contiguous memory locations. Each element can be accessed using its
index. Arrays offer constant-time access to elements, but insertion and
deletion operations might require shifting elements, making them less
efficient.
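
A minimal C sketch of both properties, using names and sizes of our own choosing: reading by index is constant time, while inserting in the middle forces the later elements to shift.

#include <stdio.h>

int main(void) {
    int arr[10] = {10, 20, 30, 40, 50};
    int count = 5;

    /* Constant-time access by index */
    printf("arr[2] = %d\n", arr[2]);   /* prints 30 */

    /* Inserting 25 at index 2 requires shifting later elements right */
    for (int i = count; i > 2; i--)
        arr[i] = arr[i - 1];
    arr[2] = 25;
    count++;

    for (int i = 0; i < count; i++)
        printf("%d ", arr[i]);         /* 10 20 25 30 40 50 */
    printf("\n");
    return 0;
}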

2. Linked Lists:
A linked list is a linear data structure where elements, called nodes,
are connected using pointers. Each node contains both the data and a
pointer/reference to the next node in the sequence. Linked lists
provide efficient insertion and deletion at any position but have
slower random access compared to arrays.
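
A minimal sketch of a singly linked list in C (the struct and function names are illustrative, not prescribed by the text): insertion at the head is constant time, but reaching an arbitrary element means walking the pointers.

#include <stdio.h>
#include <stdlib.h>

struct Node {
    int data;
    struct Node *next;
};

/* O(1) insertion at the head: allocate a node and repoint the head. */
struct Node *push_front(struct Node *head, int value) {
    struct Node *node = malloc(sizeof *node);
    node->data = value;
    node->next = head;
    return node;
}

int main(void) {
    struct Node *head = NULL;
    head = push_front(head, 3);
    head = push_front(head, 2);
    head = push_front(head, 1);

    /* Random access is O(n): we must walk the pointers. */
    for (struct Node *p = head; p != NULL; p = p->next)
        printf("%d ", p->data);        /* 1 2 3 */
    printf("\n");

    while (head) {                     /* release the nodes */
        struct Node *next = head->next;
        free(head);
        head = next;
    }
    return 0;
}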

3. Stacks:
A stack is a linear data structure that follows the Last-In-First-Out
(LIFO) principle. Elements are added and removed from the top,
similar to a stack of plates. Stacks are commonly used for tasks like
managing function calls in programming.
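
A small array-backed stack sketch in C, with an illustrative fixed capacity, showing the LIFO behavior:

#include <stdio.h>

#define MAX 100              /* illustrative fixed capacity */

int stack[MAX];
int top = -1;                /* -1 means the stack is empty */

void push(int x) { if (top < MAX - 1) stack[++top] = x; }
int  pop(void)   { return (top >= 0) ? stack[top--] : -1; }

int main(void) {
    push(1); push(2); push(3);
    printf("%d ", pop());    /* 3 */
    printf("%d ", pop());    /* 2 */
    printf("%d\n", pop());   /* 1: last in, first out */
    return 0;
}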

4. Queues:
A queue is a linear data structure that follows the First-In-First-Out
(FIFO) principle. Elements are added to the back (enqueue) and
removed from the front (dequeue). Queues are used in scenarios
where order preservation matters, like scheduling tasks.
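
A small circular-buffer queue sketch in C (capacity and names are illustrative), showing that elements leave in arrival order:

#include <stdio.h>

#define CAP 8                          /* illustrative capacity */

int queue[CAP];
int front = 0, rear = 0, size = 0;

/* Enqueue at the back; the indices wrap around the array. */
void enqueue(int x) {
    if (size == CAP) return;           /* full: dropped in this sketch */
    queue[rear] = x;
    rear = (rear + 1) % CAP;
    size++;
}

/* Dequeue from the front, preserving arrival order. */
int dequeue(void) {
    if (size == 0) return -1;          /* empty sentinel */
    int x = queue[front];
    front = (front + 1) % CAP;
    size--;
    return x;
}

int main(void) {
    enqueue(1); enqueue(2); enqueue(3);
    printf("%d ", dequeue());          /* 1 */
    printf("%d ", dequeue());          /* 2 */
    printf("%d\n", dequeue());         /* 3: first in, first out */
    return 0;
}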

5. Strings:
Strings are sequences of characters. They can be implemented as
arrays of characters or as linked lists of characters. Various string
manipulation operations, such as concatenation and substring
extraction, are performed using different algorithms.

6. Trees:
Trees are hierarchical data structures that consist of nodes connected
by edges. They have a root node, and each node can have zero or
more child nodes. Trees are widely used for hierarchical
representations and efficient searching (e.g., binary search trees).

7. Graphs:
Graphs are collections of nodes (vertices) and edges that connect
pairs of nodes. Graphs can be directed or undirected and are used to
represent relationships between entities. They have applications in
social networks, routing algorithms, and more.

8. Hash Tables:
Hash tables provide fast data retrieval using a hash function to map
keys to indexes in an array. They offer constant-time average case
access but may degrade to linear time in the worst case due to
collisions (when two keys hash to the same index).
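
A toy illustration in C of how a hash function maps keys to bucket indexes, and how two distinct keys can collide (the character-sum hash here is deliberately weak; real tables use stronger functions):

#include <stdio.h>

#define BUCKETS 8   /* illustrative table size */

/* A toy hash: sum the characters and take the remainder. */
unsigned hash(const char *key) {
    unsigned h = 0;
    while (*key) h += (unsigned char)*key++;
    return h % BUCKETS;
}

int main(void) {
    /* Two different keys can land in the same bucket (a collision),
       which is what degrades lookups toward linear time. */
    printf("\"cat\" -> bucket %u\n", hash("cat"));
    printf("\"act\" -> bucket %u\n", hash("act"));  /* same sum -> collision */
    return 0;
}
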
These elementary data organizations serve as the basis for more
complex data structures and algorithms. Choosing the appropriate
data organization for a given problem is crucial for achieving efficient
and effective solutions. Different data organizations have different
strengths and weaknesses, and the choice often depends on the
specific requirements of the task at hand.

Data Structure Definition


A data structure is a way of organizing, storing, and managing data in
a computer's memory to enable efficient access, manipulation, and
retrieval of information. It provides a systematic and organized
approach to representing and storing different types of data, such as
numbers, characters, records, and more complex entities. Data
structures define the relationship between data elements, the
operations that can be performed on them, and the constraints that
govern these operations.

Data structures serve as the foundation for designing and implementing algorithms, as the choice of an appropriate data
structure can significantly impact the efficiency and performance of
algorithms that operate on that data. Different data structures are
designed to optimize specific types of operations, such as searching,
insertion, deletion, and sorting, based on the requirements of various
applications.

There are various types of data structures, ranging from simple ones
like arrays and linked lists to more complex structures like trees,
graphs, and hash tables. Each data structure has its own advantages,
trade-offs, and best-use scenarios, making it important to select the
right data structure for a particular problem to achieve efficient and
effective solutions.

Data Structures vs. Data Types
Data structures and data types are related concepts in computer
science that both deal with the representation and organization of
data, but they have different focuses and purposes.

Data Types:
A data type defines the characteristics of a particular type of data,
such as integers, floating-point numbers, characters, strings, boolean
values, etc. It specifies the range of values that a variable of that data
type can hold and the operations that can be performed on those
values. Data types are used to ensure that data is stored and
manipulated correctly, preventing errors and unexpected behavior in
programs. They are essential for specifying the kind of information
that a variable or a function parameter can hold.

Common data types include:


- Integer: Represents whole numbers.
- Floating-point: Represents numbers with decimal points.
- Character: Represents individual characters (e.g., 'a', '1').
- String: Represents sequences of characters (e.g., "hello").
- Boolean: Represents true or false values.

Data Structures:
Data structures, on the other hand, are mechanisms for organizing and
storing data in memory in a way that facilitates efficient operations.
They define the layout of data and the relationships between different
data elements. Data structures provide a higher-level abstraction than
raw data types, allowing for more complex storage and manipulation
strategies. The choice of data structure depends on the specific
requirements of the problem being solved and the types of operations
that need to be performed on the data.

Common data structures include:


- Array: A collection of elements of the same data type, accessed by
indices.
- Linked List: A sequence of nodes, where each node contains data
and a reference to the next node.
- Stack: A linear collection of elements with LIFO (Last-In-First-Out)
behavior.
- Queue: A linear collection of elements with FIFO (First-In-First-
Out) behavior.
- Tree: A hierarchical structure with nodes connected by edges, often
used for efficient searching and sorting.
- Graph: A collection of nodes and edges that represent relationships
between entities.
- Hash Table: A data structure that allows efficient key-value pair
storage and retrieval.

In summary, data types define the characteristics and operations of basic types of data, while data structures provide more complex ways
to organize and store data to achieve efficient algorithms and data
manipulation. Data types are the building blocks of data structures, as
data structures often consist of collections of data types.

Categories of Data Structures


Data structures can be categorized into several main types based on
their organization, behavior, and characteristics. Here are some
common categories of data structures:

1. Primitive Data Types:


These are the most basic data types provided by a programming
language, such as integers, floating-point numbers, characters, and
boolean values.

2. Linear Data Structures:


These data structures organize data elements sequentially, where
each element has a unique predecessor and successor.

- Arrays: A collection of elements of the same data type stored in contiguous memory locations.
- Linked Lists: Elements (nodes) are connected using pointers,
forming a linear sequence.
- Stacks: A collection of elements with Last-In-First-Out (LIFO)
behavior.
- Queues: A collection of elements with First-In-First-Out (FIFO)
behavior.

3. Non-Linear Data Structures:


These data structures organize data elements in a hierarchical or
interconnected manner, allowing more complex relationships.

- Trees: Hierarchical structures with nodes connected by edges. Examples include binary trees, AVL trees, and B-trees.
- Graphs: A collection of nodes and edges representing relationships
between entities.
- Hash Tables: Associative arrays that use a hash function to map
keys to values for efficient retrieval.

4. Composite Data Structures:


These are combinations of primitive or other composite data types.

- Records/Structures: Store different data types together under a single name.
- Classes/Objects: Object-oriented programming constructs that
encapsulate data and behavior.

5. Homogeneous and Heterogeneous Data Structures:


- Homogeneous: All elements are of the same data type (e.g., arrays,
linked lists).
- Heterogeneous: Elements can be of different data types (e.g.,
records, structures).

6. Static and Dynamic Data Structures:


- Static: Size is fixed and determined at compile-time (e.g., arrays).
- Dynamic: Size can change during program execution (e.g., linked
lists).

7. Linear and Non-Linear Access:


- Linear: Elements are accessed sequentially (e.g., arrays, linked
lists).
- Non-Linear: Elements are accessed based on specific relationships
(e.g., trees, graphs).

8. Indexed and Sequential Access:


- Indexed: Elements are accessed using an index (e.g., arrays).
- Sequential: Elements are accessed one after another (e.g., linked
lists).

These categories provide a framework for understanding the characteristics and behaviors of various data structures. The choice of
data structure depends on the problem's requirements, the operations
that need to be performed, and the efficiency constraints of the
application.

Data Structure Operations

Common operations on data structures include traversal, insertion, deletion, searching, and sorting. The following examples illustrate these operations for several common data structures:

Traversal:
Traversal involves visiting all elements of a data structure in a
systematic manner.

1. Array:
- Linear Traversal: Loop through each element using a loop.

2. Linked List:
- Sequential Traversal: Traverse through each node, following the
pointers.

3. Tree:
- In-order Traversal: Visit nodes in left-root-right order (see the sketch after this list).
- Pre-order Traversal: Visit nodes in root-left-right order.
- Post-order Traversal: Visit nodes in left-right-root order.

4. Graph:
- Depth-First Traversal: Explore as far as possible along each branch
before backtracking.
- Breadth-First Traversal: Explore all neighbors at the present depth
before moving to the next level.
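
As referenced above, a minimal recursive in-order traversal sketch in C (node layout and helper names are ours): on a binary search tree it visits the keys in sorted order.

#include <stdio.h>
#include <stdlib.h>

struct TreeNode {
    int data;
    struct TreeNode *left, *right;
};

/* In-order traversal: left subtree, root, right subtree. */
void inorder(const struct TreeNode *node) {
    if (node == NULL) return;
    inorder(node->left);
    printf("%d ", node->data);
    inorder(node->right);
}

struct TreeNode *make(int v) {
    struct TreeNode *n = malloc(sizeof *n);
    n->data = v;
    n->left = n->right = NULL;
    return n;
}

int main(void) {
    /*       2
            / \
           1   3   */
    struct TreeNode *root = make(2);
    root->left = make(1);
    root->right = make(3);
    inorder(root);           /* prints: 1 2 3 */
    printf("\n");
    free(root->left); free(root->right); free(root);
    return 0;
}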

Insertion:

1. Array:
- Insert at Specific Index: Shift elements to accommodate the new
element.

2. Linked List:
- Insert at Beginning: Create a new node and adjust pointers.
- Insert at End: Traverse to the end and append a new node.
- Insert at Specific Position: Adjust pointers to insert a new node.

3. Tree:
- Binary Search Tree: Insert a new node while maintaining the
binary search tree property.
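
A minimal recursive BST insertion sketch in C (names are illustrative): smaller keys descend left and larger keys descend right, preserving the binary-search-tree property.

#include <stdio.h>
#include <stdlib.h>

struct Node {
    int key;
    struct Node *left, *right;
};

/* Recursive insert: the new node lands at the empty spot that
   keeps the ordering invariant intact. */
struct Node *bst_insert(struct Node *root, int key) {
    if (root == NULL) {
        struct Node *n = malloc(sizeof *n);
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else
        root->right = bst_insert(root->right, key);
    return root;
}

void inorder(const struct Node *n) {
    if (!n) return;
    inorder(n->left);
    printf("%d ", n->key);
    inorder(n->right);
}

int main(void) {
    struct Node *root = NULL;
    int keys[] = {50, 30, 70, 20, 40};
    for (int i = 0; i < 5; i++)
        root = bst_insert(root, keys[i]);
    inorder(root);           /* 20 30 40 50 70 */
    printf("\n");
    return 0;
}
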
Deletion:

1. Array:
- Delete at Specific Index: Shift elements to fill the gap.

2. Linked List:
- Delete at Beginning: Adjust pointers to skip the first node.
- Delete at End: Traverse and update pointers to remove the last
node.
- Delete at Specific Position: Adjust pointers to remove a specific
node.

3. Tree:
- Binary Search Tree: Delete a node while maintaining the binary
search tree property.

Searching:

1. Array:
- Linear Search: Iterate through each element until the target is
found.
- Binary Search (sorted array): Divide and conquer approach to find the target in a sorted array (see the sketch after this list).

2. Linked List:
- Sequential Search: Traverse through nodes until the target is
found.

3. Tree:
- Binary Search Tree: Traverse the tree based on the comparison of
values to find the target.

4. Hash Table:
- Search using Hashing: Use the hash function to locate the bucket
and search for the target.
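
As referenced above, a minimal iterative binary search sketch in C, assuming a sorted array: the search range is halved on each comparison, giving O(log n) time.

#include <stdio.h>

int binary_search(const int a[], int n, int target) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   /* avoids overflow of (lo+hi)/2 */
        if (a[mid] == target) return mid;
        if (a[mid] < target) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1;                          /* not found */
}

int main(void) {
    int a[] = {2, 5, 8, 12, 23, 38, 56};
    printf("%d\n", binary_search(a, 7, 23));  /* prints 4 */
    return 0;
}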

Sorting:

1. Array:
- Bubble Sort: Compare adjacent elements and swap them if needed.
- Selection Sort: Select the smallest element and place it at the
beginning.
- Insertion Sort: Build the sorted array in a step-by-step manner (see the sketch after this list).
- Merge Sort: Divide the array into halves, sort them, and merge
them.
- Quick Sort: Choose a pivot element, partition the array, and sort
recursively.

2. Linked List:
- Merge Sort: Divide the list into halves, sort them, and merge them.
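
As referenced above, a minimal insertion sort sketch in C (the array contents are illustrative): a sorted prefix grows one element at a time, with larger elements shifted right to make room. Worst case is O(n^2).

#include <stdio.h>

void insertion_sort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int key = a[i], j = i - 1;
        /* Shift elements of the sorted prefix that exceed key */
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}

int main(void) {
    int a[] = {5, 2, 4, 6, 1, 3};
    insertion_sort(a, 6);
    for (int i = 0; i < 6; i++) printf("%d ", a[i]);  /* 1 2 3 4 5 6 */
    printf("\n");
    return 0;
}
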
Applications of Data Structures
Data structures play a crucial role in computer science and
programming by providing efficient ways to store, organize, and
manipulate data. They have a wide range of applications in various
domains. Here are some common applications of data structures:

1. Databases and File Systems:


Data structures like B-trees and hash indexes are used in databases
to efficiently store and retrieve large amounts of data. File systems
use data structures to manage file organization and storage on disk.

2. Algorithms and Computational Geometry:


Many algorithms rely on specific data structures to optimize their
performance. For example, graphs are used for pathfinding
algorithms, and spatial data structures like quad-trees and octrees are
used in computational geometry for spatial partitioning and nearest
neighbor searches.

3. Networking and Routing:


Data structures like graphs are essential in networking for
representing network topologies, routing paths, and flow analysis.

4. Compiler Design:
Symbol tables, abstract syntax trees, and other data structures are
used in compiler design for parsing and optimizing code.

5. Operating Systems:
Data structures are used in memory management, process
scheduling, file management, and various system-level operations.

6. Artificial Intelligence and Machine Learning:


Data structures are used to represent various types of data, such as
matrices, graphs, and trees, which are commonly encountered in AI
and machine learning algorithms.

7. Computer Graphics:
Spatial data structures like kd-trees and BVH (Bounding Volume
Hierarchy) are used in computer graphics for efficient ray-tracing and
collision detection.

8. Cryptography:
Data structures play a role in cryptographic algorithms such as hash
functions and digital signatures.

9. Web Development:
Data structures are used in web development to manage user
sessions, store and retrieve data from databases, and optimize page
load times.

10. Game Development:


Game engines use data structures like spatial partitioning trees for
rendering and collision detection, as well as graphs for managing
game world interactions.

11. Embedded Systems:


Efficient data structures are crucial in resource-constrained
environments like embedded systems for managing data and
optimizing memory usage.

12. Real-time Systems:


Data structures are used in real-time systems to manage tasks and
prioritize execution based on timing constraints.

13. Simulations and Scientific Computing:


Data structures play a role in simulations and scientific computing
for storing and processing large datasets, such as climate data,
physical simulations, and numerical analysis.

Algorithm Complexity and Time-Space Trade-off

Algorithm Complexity:

Algorithm complexity refers to the measure of how the time and space requirements of an algorithm grow as the input size increases.
There are two main types of algorithm complexity:

1. Time Complexity:
Time complexity measures the amount of time an algorithm takes to
complete as a function of the input size. It helps us understand how
the execution time increases with larger inputs. Time complexity is
often expressed using Big O notation, which describes the upper
bound of the growth rate.
For example, an algorithm with a time complexity of O(n) indicates
that the execution time increases linearly with the input size (n).
Common time complexities include O(1) (constant time), O(log n)
(logarithmic time), O(n) (linear time), O(n log n) (linearithmic time),
and more.

2. Space Complexity:
Space complexity measures the amount of memory space an
algorithm uses as a function of the input size. It helps us understand
how much memory an algorithm requires for its execution. Similar to
time complexity, space complexity is also often expressed using Big
O notation.

Space complexity considers both the extra space used by the algorithm (data structures, variables) and the input space. Common
space complexities include O(1) (constant space), O(n) (linear space),
O(n^2) (quadratic space), and so on.

Time-Space Trade-off:

In some cases, there is a trade-off between time complexity and space complexity. This means that by using more memory (space), you can
achieve faster execution (time), or vice versa. This trade-off arises due
to different ways of optimizing algorithms for speed or memory
efficiency.

For example, consider the use of caching in dynamic programming. Storing precomputed results in memory (increasing space complexity)
can lead to significantly faster execution of certain algorithms
(reducing time complexity). This trade-off can be advantageous when
memory is available and time is a critical factor.
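
A minimal sketch of this trade-off in C, using memoized Fibonacci as the example (the cache size is an assumption of ours): an O(n) array of stored results turns an exponential-time recursion into a linear-time one.

#include <stdio.h>

/* The cache spends O(n) extra space to avoid recomputing subproblems. */
long long cache[91];   /* fib(90) still fits in long long; zero-initialized */

long long fib(int n) {
    if (n <= 1) return n;
    if (cache[n] != 0) return cache[n];       /* reuse a stored result */
    return cache[n] = fib(n - 1) + fib(n - 2);
}

int main(void) {
    printf("fib(50) = %lld\n", fib(50));      /* fast; the naive version is not */
    return 0;
}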

Similarly, in data structures like hash tables, increasing the size of the
hash table (more memory) can reduce the chances of collisions,
resulting in faster access times (reducing time complexity).

In contrast, some algorithms may use less memory by sacrificing execution speed. For example, certain sorting algorithms with lower
space complexity may require more time to complete.

Big-O Notations
Big O notation is a mathematical notation used in computer science to
describe the upper bound of an algorithm's time or space complexity
in terms of the input size. It provides a way to characterize how the
performance of an algorithm scales as the input size grows. Big O
notation is widely used to analyze and compare the efficiency of
algorithms without getting into the specifics of hardware or constant
factors.

Here are some common Big O notations and their meanings:

1. O(1) - Constant Time:


An algorithm is considered to have constant time complexity if the
time taken by the algorithm remains constant, regardless of the input
size. This is often the most efficient scenario.

Example: Accessing an element in an array by its index.


2. O(log n) - Logarithmic Time:
Algorithms with logarithmic time complexity often halve the input
size with each step. These algorithms are very efficient for large input
sizes.

Example: Binary search on a sorted array.

3. O(n) - Linear Time:


Linear time complexity indicates that the algorithm's execution time
grows linearly with the input size. It scales directly with the number
of input elements.

Example: Simple linear search through an array.

4. O(n log n) - Linearithmic Time:


Algorithms with linearithmic time complexity often divide the input
and perform work on each division. They are more efficient than
quadratic time complexity but slower than linear time complexity.

Example: Merge sort, quicksort (on average).

5. O(n^2) - Quadratic Time:


Quadratic time complexity indicates that the algorithm's execution
time grows quadratically with the input size. It involves nested
iterations over the input data.
Example: Selection sort, bubble sort.

6. O(n^k) - Polynomial Time:


Polynomial time complexity indicates that the algorithm's execution
time grows as a polynomial function of the input size. Higher values
of k indicate higher degrees of the polynomial.

Example: Matrix multiplication using the naive method.

7. O(2^n) - Exponential Time:


Algorithms with exponential time complexity grow at a rate
proportional to 2 raised to the power of the input size. These
algorithms can become very slow for even moderately sized inputs.

Example: Recursive algorithms that solve the Tower of Hanoi puzzle.

8. O(n!) - Factorial Time:


Factorial time complexity indicates that the algorithm's execution
time grows at a rate proportional to the factorial of the input size.
These algorithms are extremely slow and often impractical for larger
inputs.

Example: Generating all permutations of a set.

Big O notation provides a simplified way to express the upper bound of an algorithm's complexity. It helps programmers and computer
scientists quickly understand how an algorithm's performance scales
as input size increases, allowing for informed decisions when
selecting algorithms for specific tasks.

Strings
Storing Strings
In C, you can store strings using different data structures and
approaches. Here are a few common ways to store strings using C:

1. Character Arrays (C-Style Strings):


C-style strings are arrays of characters terminated by a null
character ('\0'). You can create a character array to store a string and
manipulate it using various string functions.
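
A minimal sketch of a C-style string: the string-literal form and the spelled-out character form below are equivalent, and both end in the null terminator.

#include <stdio.h>

int main(void) {
    /* "hello" occupies six bytes: five characters plus '\0'. */
    char str[] = "hello";
    char copy[6] = {'h', 'e', 'l', 'l', 'o', '\0'};  /* same thing, spelled out */

    printf("%s %s\n", str, copy);
    printf("str[0] = %c\n", str[0]);   /* characters are indexable like any array */
    return 0;
}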

2. Dynamic Memory Allocation (malloc):


You can use dynamic memory allocation to store strings of variable
lengths. Remember to allocate extra space for the null terminator.
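
A minimal sketch of heap-allocating a string in C: strlen does not count the terminator, so one extra byte is allocated for it.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    const char *src = "dynamically stored";

    /* +1 makes room for the '\0' that strlen does not count. */
    char *s = malloc(strlen(src) + 1);
    if (s == NULL) return 1;

    strcpy(s, src);
    printf("%s\n", s);

    free(s);          /* heap strings must be released explicitly */
    return 0;
}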

3. String Library Functions:


The C standard library provides functions for string manipulation.
You can use these functions with character arrays to perform various
operations.

4. Structures:
You can use structures to store strings along with additional
information. This is useful when you need to associate metadata with
the string.

5. Linked Lists:
You can create a linked list where each node stores a character or a
small substring. This approach provides flexibility for manipulating
individual characters.

String Operations
In the context of data structures and algorithms, the operations you
perform on strings in the C programming language can be crucial for
solving various problems efficiently. Let's define some common string
operations and discuss their relevance to data structures and
algorithms:

1. String Length (strlen):


- Definition: Returns the length (number of characters) of a string.
- Importance: String length is often used in loops and memory
allocation for efficient string processing. It's important for
understanding the complexity of algorithms that involve string
manipulation.

2. String Copy (strcpy):


- Definition: Copies one string to another.
- Importance: Copying strings is relevant when working with
algorithms that require modifying a string without affecting the
original. It's crucial to consider the time complexity of copying
strings, especially in algorithms with significant copying operations.

3. String Concatenation (strcat):
- Definition: Concatenates (appends) one string to the end of
another.
- Importance: Concatenation operations are used when building
strings, but they can lead to inefficiencies due to reallocation and
copying. Algorithms involving concatenation should be analyzed for
potential performance bottlenecks.

4. Substring Extraction (strncpy, strndup):


- Definition: Copies a portion of a string to another string, or creates
a new string from a substring.
- Importance: Substring extraction is relevant when working with
specific parts of a string. In algorithms, it's used for dividing a
problem into subproblems or for extracting relevant information.

5. String Comparison (strcmp):


- Definition: Compares two strings lexicographically.
- Importance: String comparison is a fundamental operation in
sorting algorithms and searching algorithms that require comparing
strings to determine order or equality.

6. String Searching (strstr):


- Definition: Searches for a substring within a larger string.
- Importance: Searching algorithms like the Knuth-Morris-Pratt
algorithm or the Boyer-Moore algorithm use string searching to
efficiently locate patterns within strings.

7. Character Manipulation (toupper, tolower):
- Definition: Converts characters to uppercase or lowercase, respectively.
- Importance: Character manipulation is relevant when case-
insensitive comparisons or conversions are required, which can affect
sorting and searching algorithms.

8. String Tokenization (strtok):
- Definition: Breaks a string into tokens based on a delimiter.
- Importance: Tokenization is useful for breaking down strings into
meaningful parts, which can be beneficial in parsing and analyzing
text-based data.
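
A short demo program (with illustrative buffer sizes and strings of our own) exercising several of the operations above:

#include <stdio.h>
#include <string.h>

int main(void) {
    char buf[64];

    strcpy(buf, "data");                       /* copy */
    strcat(buf, " structures");                /* concatenate */
    printf("%s (length %zu)\n", buf, strlen(buf));

    printf("compare: %d\n", strcmp("abc", "abd"));   /* negative: "abc" < "abd" */

    char *found = strstr(buf, "struct");       /* substring search */
    if (found) printf("found at offset %ld\n", (long)(found - buf));

    /* strtok modifies its argument, so tokenize a writable copy */
    char line[] = "one,two,three";
    for (char *tok = strtok(line, ","); tok; tok = strtok(NULL, ","))
        printf("token: %s\n", tok);
    return 0;
}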

Pattern Matching Algorithms

1. Brute-Force (Naive) Algorithm:


The simplest approach involves comparing the pattern with every
possible substring in the text. It checks for a match character by
character and shifts the pattern by one character if no match is found.
This algorithm has a straightforward implementation but can be
inefficient for large texts and patterns due to its time complexity of
O(m * n), where 'm' is the length of the text and 'n' is the length of the
pattern.
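
A minimal brute-force matcher sketch in C, following the document's convention that 'm' is the text length and 'n' the pattern length (the sample strings are ours):

#include <stdio.h>
#include <string.h>

/* Try every alignment of the pattern in the text, comparing
   character by character: O(m * n) in the worst case. */
void naive_search(const char *text, const char *pat) {
    size_t m = strlen(text), n = strlen(pat);
    for (size_t i = 0; i + n <= m; i++) {
        size_t j = 0;
        while (j < n && text[i + j] == pat[j])
            j++;
        if (j == n)
            printf("match at index %zu\n", i);
    }
}

int main(void) {
    naive_search("abracadabra", "abra");   /* matches at 0 and 7 */
    return 0;
}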

2. Knuth-Morris-Pratt (KMP) Algorithm:


The KMP algorithm improves efficiency by utilizing a preprocessed
"failure function" that helps determine the number of characters to
skip when a mismatch occurs. This avoids redundant comparisons and
achieves a linear time complexity of O(m + n), where 'm' is the length
of the text and 'n' is the length of the pattern. The KMP algorithm is
particularly useful for patterns with repeated substrings.
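
A minimal sketch in C of computing the KMP failure function (the sample pattern and array bound are illustrative); the search phase then uses fail[] to decide how far to fall back on a mismatch instead of restarting from scratch:

#include <stdio.h>
#include <string.h>

/* fail[i] is the length of the longest proper prefix of pat[0..i]
   that is also a suffix of it. */
void build_failure(const char *pat, int fail[]) {
    int n = (int)strlen(pat);
    fail[0] = 0;
    int k = 0;
    for (int i = 1; i < n; i++) {
        while (k > 0 && pat[i] != pat[k])
            k = fail[k - 1];           /* fall back to a shorter border */
        if (pat[i] == pat[k])
            k++;
        fail[i] = k;
    }
}

int main(void) {
    const char *pat = "ababaca";
    int fail[16];
    build_failure(pat, fail);
    for (size_t i = 0; i < strlen(pat); i++)
        printf("%d ", fail[i]);        /* 0 0 1 2 3 0 1 */
    printf("\n");
    return 0;
}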

3. Boyer-Moore Algorithm:
The Boyer-Moore algorithm is efficient for large patterns. It uses
two main heuristics—the bad character rule (skipping based on the
mismatched character) and the good suffix rule (skipping based on
previously matched substring)—to skip unnecessary comparisons. Its
average-case and best-case time complexity is O(m + n), but in some
cases, it could have a worst-case time complexity of O(m * n), where
'm' is the length of the text and 'n' is the length of the pattern.

4. Rabin-Karp Algorithm:
The Rabin-Karp algorithm employs hashing to compare the pattern
with substrings of the text. It uses a rolling hash function to quickly
calculate the hash values of the current substring and the pattern. It
has an average-case time complexity of O(m + n) and worst-case time
complexity of O(m * n), where 'm' is the length of the text and 'n' is
the length of the pattern. The Rabin-Karp algorithm is efficient for
multiple pattern matching and can handle patterns of varying lengths.
5. Finite Automaton Algorithm:
The finite automaton algorithm builds a state machine, often
represented as a transition table, based on the pattern. It uses this state
machine to efficiently search for matches in the text. While it has a
linear time complexity in the average and best cases (O(m + n)), its
worst-case time complexity can be O(m * n), where 'm' is the length
of the text and 'n' is the length of the pattern.

6. Aho-Corasick Algorithm:
The Aho-Corasick algorithm is designed for searching multiple
patterns in a single pass. It constructs a trie (keyword tree) from the
patterns and augments it with failure links to allow efficient
backtracking. It has a linear time complexity and is especially useful
for applications like string matching in dictionary searches and lexical
analysis.

Each algorithm has its strengths and weaknesses, and the choice of
algorithm depends on factors like the length of the text, the length of
the pattern, the expected number of matches, and the specific
requirements of the application.
