DS UNIT 2 Notes
DEFINITION
Searching is the process of finding an element in a given list. In this process we check whether the item is available in the given list or not.
TYPES OF SEARCHING
1. Internal search: the entire list fits in main memory and the search is performed there.
2. External search: the list is too large for main memory, so the search works on data held in external storage.
COMPLEXITY ANALYSIS
Complexity analysis is used to determine the amount of resources (such as time and space) an algorithm needs to run to completion.
There are two types of complexities: 1) Time Complexity and 2) Space Complexity.
1) Time Complexity
The time complexity of an algorithm is the amount of time it needs to run to completion. In computer programming, the time complexity of a program quantifies the amount of time taken by that program to run. Time complexity is defined using notations such as Big O notation, which excludes coefficients and lower-order terms. When we describe time complexity this way, i.e., as the input size goes to infinity, it is said to be described asymptotically. For example, if the time required by an algorithm on all inputs of size n is at most 5n^3 + 3n for any n (bigger than some n0), the asymptotic time complexity is O(n^3).
2) Space Complexity
The space complexity of an algorithm, i.e., a program, is the amount of memory it needs to run to completion. Some of the reasons for studying space complexity are:
If the program is to run on a multi-user system, it may be required to specify the amount of memory to be allocated to the program.
We may be interested to know in advance whether sufficient memory is available to run the program.
There may be several possible solutions with different space requirements.
The space needed by a program has the following components:
Instruction space: Space needed to store the executable version of the program; it is fixed.
Data space: Space needed to store all constants and variable values, and it has the following components:
o Space needed by constants and simple variables. This space is fixed.
o Space needed by fixed-size structured variables, such as arrays and structures.
o Dynamically allocated space. This space usually varies.
The amount of memory consumed by recursion is called the recursion stack space. For each recursive function, this space depends on the space needed by the local variables and the formal parameters. In addition, it depends on the maximum depth of recursion, i.e., the maximum number of nested recursive calls.
In general, the total space needed by a program can be divided into two parts:
1. A fixed part that is independent of the particular problem, and includes instruction space, space for constants, simple variables, and fixed-size structured variables.
2. A variable part that includes structured variables whose size depends on the particular problem being solved, dynamically allocated space, and the recursion stack space.
SEARCHING TECHNIQUES
1) LINEAR/SEQUENTIAL SEARCH
Linear/sequential search is a technique that examines the list item by item until the particular item is found or the end of the list is reached. We search from index 0 to index N-1 in a sequential manner; if the particular item is found, the position of that item is returned, otherwise a failure status, or -1, is returned.
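To make the procedure concrete, here is a minimal linear search sketch in Java (the method and variable names are illustrative assumptions, not part of the original notes):

static int linearSearch(int[] list, int item) {
    // scan from index 0 to N-1 in a sequential manner
    for (int loc = 0; loc < list.length; loc++) {
        if (list[loc] == item) {
            return loc;    // item found: return its position
        }
    }
    return -1;             // end of list reached: failure status
}

For example, linearSearch(new int[]{7, 3, 9}, 9) returns 2, while searching for 5 in the same array returns -1.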
The complexity of a search algorithm is based on the number of comparisons C between ITEM and LIST[LOC]. We seek C(n) for the worst and average case, where n is the size of the list.
Worst case: The worst case occurs when ITEM is present at the last location of the list, or it is not there at all. In either situation, we have
C(n) = n
Thus, C(n) = n is the worst-case complexity of the linear or sequential search algorithm.
Average case: In this case, we assume that ITEM is in the list and can be present at any position in the list. Accordingly, the number of comparisons can be any of the numbers 1, 2, 3, ..., n, and each number occurs with probability p = 1/n. Then,
C(n) = (1 + 2 + 3 + ... + n) / n = (n + 1)/2 ≈ n/2
2) BINARY SEARCH
Binary search is a special type of search that works on a sorted list only. During each stage of the procedure, the search for ITEM is reduced to a restricted segment of elements in the LIST array. The segment starts at index LOW and spans to HIGH:
LIST[LOW], LIST[LOW+1], LIST[LOW+2], LIST[LOW+3] ... LIST[HIGH]
The ITEM to be searched is compared with the middle element LIST[MID] of the segment, where MID is obtained as:
MID = (LOW + HIGH) / 2
We assume that LIST is sorted in ascending order. There may be three results of this comparison:
1. If ITEM = LIST[MID], the search is successful. The location of the ITEM is LOC := MID.
2. If ITEM < LIST[MID], the ITEM can appear only in the first half of the list. So, the segment is restricted by modifying HIGH = MID - 1.
3. If ITEM > LIST[MID], the ITEM can appear only in the second half of the list. So, the segment is restricted by modifying LOW = MID + 1.
Initially, LOW is the start index of the array, and HIGH is the last index of the array.
The comparison goes on; with each comparison, LOW and HIGH approach each other. The loop continues as long as LOW <= HIGH.
Example: Let LIST = {10, 15, 20, 25, 30} and ITEM = 15.
Step 1: LOW = 0, HIGH = 4, MID = (0 + 4)/2 = 2. Since ITEM (15) < LIST[MID] (20), set HIGH = MID - 1 = 1.
Step 2: LOW = 0, HIGH = 1, MID = (0 + 1)/2 = 0. Since ITEM (15) > LIST[MID] (10), set LOW = MID + 1 = 1.
Step 3: LOW = 1, HIGH = 1, MID = 1. Since ITEM (15) = LIST[MID] (15), the search is a success and the location 1 is returned.
BINARY SEARCH (LIST, N, ITEM, LOC, LOW, MID, HIGH)
Here LIST is a sorted array of size N, and ITEM is the given information. The variables LOW, HIGH and MID denote, respectively, the beginning, end and middle of a segment of LIST. The algorithm finds the location LOC of ITEM in the LIST array. If ITEM is not there, LOC is NULL.
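As a sketch, the procedure can be written in Java as follows (returning -1 in place of NULL when ITEM is absent; the names are illustrative):

static int binarySearch(int[] list, int item) {
    int low = 0, high = list.length - 1;   // initially the whole array
    while (low <= high) {                  // loop continues as long as LOW <= HIGH
        int mid = (low + high) / 2;        // middle of the current segment
        if (list[mid] == item) {
            return mid;                    // success: LOC = MID
        } else if (item < list[mid]) {
            high = mid - 1;                // item can appear only in the first half
        } else {
            low = mid + 1;                 // item can appear only in the second half
        }
    }
    return -1;                             // item not present
}

The expression (low + high) / 2 follows the notes; low + (high - low) / 2 computes the same midpoint while avoiding integer overflow on very large arrays.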
The complexity is measured by the number f(n) of comparisons required to locate ITEM in LIST, where LIST contains n elements. Each comparison reduces the segment size by half. Hence, we require at most c comparisons to locate ITEM, where:
2^c >= n, i.e., f(n) = c ≈ log2 n
Approximately, the time complexity is equal to log2 n, which is much less than the time complexity of linear search.
TOPIC 2: MEMORY MANAGEMENT
INTRODUCTION:
Memory management is the functionality of an operating system which handles or manages primary memory and moves processes back and forth between main memory and disk during execution. Memory management keeps track of each and every memory location, regardless of whether it is allocated to some process or free. It checks how much memory is to be allocated to processes and decides which process will get memory at what time. It tracks whenever some memory gets freed or unallocated and updates the status correspondingly.
1) Fixed Partitioning
• Partition main memory into a set of non-overlapping memory regions called partitions.
• Fixed partitions can be of equal or unequal sizes.
• Leftover space in a partition, after a program is assigned to it, is called internal fragmentation.
• Equal-size partitions:
– If there is an available partition, a process can be loaded into that partition; because all partitions are of equal size, it does not matter which partition is used.
– If all partitions are occupied by blocked processes, choose one process to swap out to make room for the new process.
• Unequal-size partitions, use of a single queue:
– When it is time to load a process into memory, the smallest available partition that will hold the process is selected (this policy is sketched in code below).
– This increases the level of multiprogramming at the expense of internal fragmentation.
Drawback: Main memory use is inefficient. Any program, no matter how small, occupies
an entire partition. This can cause internal fragmentation.
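The placement policy for unequal-size partitions can be sketched in Java as follows (a hypothetical illustration; the class and field names are assumptions, not any real OS interface):

import java.util.*;

class FixedPartitioning {
    static class Partition {
        final int size;
        int used = -1;    // -1 means the partition is free
        Partition(int size) { this.size = size; }
    }

    // Place a process; returns the internal fragmentation, or -1 if nothing fits.
    static int place(List<Partition> parts, int processSize) {
        Partition best = null;
        for (Partition p : parts) {
            if (p.used == -1 && p.size >= processSize
                    && (best == null || p.size < best.size)) {
                best = p;    // smallest free partition that will hold the process
            }
        }
        if (best == null) return -1;    // no fit: a process would have to be swapped out
        best.used = processSize;
        return best.size - processSize; // leftover space = internal fragmentation
    }
}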
2) Buddy System
(Figure omitted: Example of Buddy System)
• We start with the entire block of size 2^U.
• When a request of size S is made:
– If 2^(U-1) < S <= 2^U, then allocate the entire block of size 2^U.
– Else, split this block into two buddies, each of size 2^(U-1).
– If 2^(U-2) < S <= 2^(U-1), then allocate one of the two buddies.
– Otherwise, one of the two buddies is split again.
• This process is repeated until the smallest block greater than or equal to S is generated.
• Two buddies are coalesced whenever both of them become unallocated.
• The OS maintains several lists of holes:
– the i-list is the list of holes of size 2^i.
– whenever a pair of buddies appears in the i-list, they are removed from that list and coalesced into a single hole in the (i+1)-list.
• Presented with a request for an allocation of size k such that 2^(i-1) < k <= 2^i:
– the i-list is first examined;
– if the i-list is empty, the (i+1)-list is then examined, and so on (a code sketch of this procedure follows the list below).
• On average, internal fragmentation is 25%:
– each memory block is at least 50% occupied.
• Programs are not moved in memory:
– this simplifies memory management.
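A minimal buddy-allocator sketch in Java, under the assumption that holes are identified by their offsets and one free list is kept per order i (all names here are hypothetical, for illustration only):

import java.util.*;

class BuddyAllocator {
    private final int maxOrder;                     // U: the entire block has size 2^U
    private final List<TreeSet<Integer>> freeLists; // freeLists.get(i) = offsets of free 2^i holes

    BuddyAllocator(int maxOrder) {
        this.maxOrder = maxOrder;
        freeLists = new ArrayList<>();
        for (int i = 0; i <= maxOrder; i++) freeLists.add(new TreeSet<>());
        freeLists.get(maxOrder).add(0);             // start with one hole of size 2^U
    }

    // Allocate a block for a request of size s; returns its offset, or -1.
    int allocate(int s) {
        int order = 0;
        while ((1 << order) < s) order++;           // smallest order with 2^order >= s
        int i = order;                              // examine the i-list first,
        while (i <= maxOrder && freeLists.get(i).isEmpty()) i++; // then the (i+1)-list ...
        if (i > maxOrder) return -1;                // no hole is large enough
        int offset = freeLists.get(i).pollFirst();
        while (i > order) {                         // split into buddies until the size fits
            i--;
            freeLists.get(i).add(offset + (1 << i)); // the upper buddy becomes a hole
        }
        return offset;
    }

    // Free a block of the given order, coalescing buddies whenever both are unallocated.
    void free(int offset, int order) {
        while (order < maxOrder) {
            int buddy = offset ^ (1 << order);      // buddy address differs only in bit 'order'
            if (!freeLists.get(order).remove(buddy)) break; // buddy still allocated: stop
            offset = Math.min(offset, buddy);       // coalesce into a single larger hole
            order++;
        }
        freeLists.get(order).add(offset);
    }
}

For example, new BuddyAllocator(4) manages 16 units; allocate(3) splits 16 into 8 + 8 and then 8 into 4 + 4, returns offset 0, and leaves holes of size 4 and 8.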
TOPIC 3: GARBAGE COLLECTION ALGORITHMS
MEMORY ALLOCATION
In programming languages, there are two parts of memory, defined as the Heap and the Stack. The stack memory is used for the execution of a thread. When a function is called, a block of memory is allocated on the stack to store the local variables of the function. The allocated memory gets freed when the function returns. In contrast to the stack, heap memory is used for dynamic allocation (usually when creating objects with the “new” operator or “malloc” function) and memory deallocation needs to be handled separately.
Object myObject = new Object();
.............
myObject = null;
If, at some point in the program, another reference to an object or “null” is assigned to the “myObject” variable, the reference that existed to the already created “Object” is removed. However, the memory allocated for this “Object” will not be freed, even though the Object is no longer being used. In older languages such as C or C++, the programmer needs to be concerned about these kinds of objects allocated on the Heap and delete them when they are no longer in use, to free up the memory. Failing to do that ends up in a memory leak. On the other hand, mistakenly deleting an object that still has a live reference from a variable leaves a dangling reference, causing errors in later parts of the code when we try to access the deleted object through the old reference.
However, in languages like Java and C#, this memory management is handled by a separate entity
known as the Garbage Collector.
With a Garbage Collector in place, we can allocate an object in memory and use it; when there is no longer any reference to that object, the object is marked for the Garbage Collector to pick up, freeing the allocated memory. A Garbage Collector also guarantees that any live object with an active reference will not get removed from memory.
GARBAGE COLLECTION
Introduction:
Garbage collection (GC) is a form of automatic memory management. The garbage collector, or just collector, attempts to reclaim garbage, i.e., memory occupied by objects that are no longer in use by the program. Garbage collection is often portrayed as the opposite of manual memory management, which requires the programmer to specify which objects to deallocate and return to the memory system. Garbage collection, like other memory management techniques, may take a significant proportion of the total processing time in a program and can thus have a significant influence on performance.
1) Reference Counting method
Reference counting garbage collection keeps track of the number of references to each particular object in memory. Let’s look at the following code segment.
Object a = new Object(); // a new object OB1 is created; Reference Count(OB1) starts at 1
Object b = a;            // Reference Count(OB1) incremented to 2
Object c = new Object(); // a second object, OB2, with its own reference count
b.someMethod();
b = null;                // Reference Count(OB1) decremented to 1 as reference goes away
a = c;                   // Reference Count(OB1) decremented to 0
When executing the line Object a = new Object(), a new object (let’s say OB1) is created in memory and the reference count (for OB1) starts at 1.
When the reference to OB1 held in the variable “a” is copied to “b”, the reference counter increases by one, as now two variables hold a reference to OB1.
When “b” is assigned null, the reference count for OB1 decreases, leaving only the variable “a” holding a reference to OB1.
When the value of “a” is updated with the value of “c” (which holds a reference to a whole new object), the reference counter for OB1 becomes zero, leaving OB1 available for garbage collection.
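The bookkeeping itself can be sketched as a tiny Java class (a hypothetical illustration of the counter’s behavior; a real reference-counting collector updates these counts automatically on every assignment):

class RefCounted {
    private int count = 1;    // one reference exists at creation

    void retain() {           // a new reference to the object is taken
        count++;
    }

    void release() {          // an existing reference goes away
        count--;
        if (count == 0) {
            free();           // no references remain: reclaim the memory
        }
    }

    private void free() {
        System.out.println("object reclaimed");
    }
}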
The main disadvantage of reference counting garbage collection is its inability to identify circular references. To understand circular references, let’s have a look at the code segment below.
class A { B b; }    // A can hold a reference to a B
class B { A a; }    // B can hold a reference to an A (fields are package-private so Main can assign them)
Now in the main method, we can create new objects for both of these classes and assign the
references.
public class Main {
    public static void main(String[] args) {
        A one = new A();
        B two = new B();
        one.b = two;    // A's object now refers to B's object
        two.a = one;    // B's object now refers to A's object
        // Throw away the references from the main method; the two objects are
        // still referring to each other
        one = null;
        two = null;
    }
}
When we assign null to the two variables one and two, the external references that existed to the two objects created at the beginning are removed. Still, they won’t be eligible for garbage collection, because the reference counters of those two objects never become zero: the object of class “A” holds a reference inside the object of class “B”, and the object of class “B” holds a reference inside the object of class “A”.
2) Mark and Sweep method
As the name suggests, Mark and Sweep garbage collectors have two phases:
1. Mark Phase
2. Sweep Phase
Mark Phase
During the Mark phase, the GC identifies the objects that are still in use and sets their “mark bit” to true. The search starts with a root set of references kept in local variables on the stack or in global variables. Starting from the root references, the GC conducts a depth-first search for the objects that are reachable from the root. Any object that keeps a reference to another object keeps that object alive.
It is important to note that during the Mark phase, the application threads are stopped, to avoid changes to the object state while marking is in progress.
Cyclic references are not an issue for a Mark and Sweep GC: a cycle of objects that is unreachable from the root is never marked as live, allowing the GC to collect it as garbage.
Sweep Phase
In the sweep phase, all the objects left unmarked by the Mark phase are removed from memory, freeing up space.
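Both phases can be sketched compactly in Java (a hypothetical model in which the heap is a list of nodes and the roots stand in for references in stack and global variables; this is not how a production collector is implemented):

import java.util.*;

class MarkSweep {
    static class Obj {
        boolean marked;                        // the "mark bit"
        List<Obj> refs = new ArrayList<>();    // outgoing references
    }

    // Mark phase: depth-first search from the root set.
    static void mark(Obj o) {
        if (o == null || o.marked) return;     // already visited (handles cycles)
        o.marked = true;
        for (Obj r : o.refs) mark(r);
    }

    // Sweep phase: drop unmarked objects, reset marks on survivors.
    static void sweep(List<Obj> heap) {
        heap.removeIf(o -> !o.marked);
        for (Obj o : heap) o.marked = false;
    }

    static void collect(List<Obj> heap, List<Obj> roots) {
        for (Obj root : roots) mark(root);
        sweep(heap);
    }
}

A cycle of Obj nodes that is not reachable from any root is never marked, so the sweep removes it, matching the discussion of cyclic references above.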
There may exist plenty of free regions after the sweep phase; due to this fragmentation, the next memory allocation may fail if it is bigger than every existing free region.
To deal with this, a mark-sweep-compact collector rearranges all the memory locations after the sweep phase to provide more compact memory allocation. The downside of this approach is an increased GC pause duration, as it needs to copy all objects to a new place and update all references to those objects.
3) Copy Collection method
This is similar to the Mark and Sweep collector, but the memory space is divided into two. Initially, objects are allocated in one space (fromspace), and the live objects are marked. During the copy phase, the marked objects are copied into the other space (tospace) and compacted at the same time. Then, the fromspace is cleared out.
After that, the two spaces are swapped, so any new memory allocation is made in the “new fromspace” (the old tospace now becomes the new fromspace). Finally, when the “new fromspace” becomes full, the whole process happens again.
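A semispace copying collector can be sketched in Java as follows (a hypothetical model: a forwarding pointer marks an object that has already been copied, and compaction falls out of copying the live objects contiguously into the tospace):

import java.util.*;

class CopyCollector {
    static class Obj {
        List<Obj> refs = new ArrayList<>();    // outgoing references
        Obj forward;                           // forwarding pointer once copied
    }

    List<Obj> fromspace = new ArrayList<>();   // objects are allocated here
    List<Obj> tospace = new ArrayList<>();

    Obj allocate() {
        Obj o = new Obj();
        fromspace.add(o);
        return o;
    }

    // Copy an object into tospace exactly once; return the new copy.
    Obj copy(Obj o) {
        if (o.forward != null) return o.forward;    // already copied
        Obj dup = new Obj();
        o.forward = dup;                            // set before recursing (handles cycles)
        tospace.add(dup);                           // live objects end up contiguous
        for (Obj r : o.refs) dup.refs.add(copy(r));
        return dup;
    }

    // Collect: copy everything reachable from the roots, clear fromspace, swap spaces.
    void collect(List<Obj> roots) {
        tospace.clear();
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i)));       // update roots to point at the copies
        }
        fromspace.clear();                          // everything left behind is garbage
        List<Obj> tmp = fromspace;                  // swap: the old tospace becomes
        fromspace = tospace;                        // the new fromspace
        tospace = tmp;
    }
}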