Heaps: Analysis of Algorithms
Heaps
A heap is a specialized tree-based data structure that satisfies the heap property:
if B is a child node of A, then key(A) ≥ key(B). This implies that an element with the greatest key is always in the root node, so such a heap is sometimes called a max-heap. If the comparison is reversed, the smallest element is always in the root node, which results in a min-heap.
The heap is one maximally efficient implementation of an abstract data type called a priority queue.
Applications: heaps are used in several efficient graph algorithms, such as Dijkstra's shortest-path algorithm and Prim's minimum spanning tree algorithm, and in the sorting algorithm heapsort.
Implementation of Heaps
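Heaps are usually implemented in an array and do not require pointers between elements: in a binary heap stored in a 0-based array, the children of the node at index i are at indices 2i + 1 and 2i + 2, and its parent is at index ⌊(i − 1)/2⌋. The operations commonly performed with a heap are: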
- create-heap: create an empty heap
- find-max or find-min: find the maximum item of a max-heap or the minimum item of a min-heap, respectively
- delete-max or delete-min: remove the root node of a max- or min-heap, respectively
- increase-key or decrease-key: update a key within a max- or min-heap, respectively
- insert: add a new key to the heap
- merge: join two heaps to form a valid new heap containing all the elements of both
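A minimal sketch of an array-backed binary max-heap covering most of the operations listed above, written in Python (the slides themselves give no code). The class and method names are hypothetical; `merge` here simply reinserts the other heap's elements one by one, which is correct but not the efficient merge offered by mergeable heap variants.

```python
class MaxHeap:
    """Binary max-heap stored in a Python list (no pointers between elements).

    For the node at index i (0-based), the children are at 2*i + 1 and
    2*i + 2, and the parent is at (i - 1) // 2.
    """

    def __init__(self):              # create-heap
        self.a = []

    def find_max(self):
        return self.a[0]             # the root holds the largest key

    def insert(self, key):
        self.a.append(key)           # add as the last leaf, then move it up
        self._sift_up(len(self.a) - 1)

    def delete_max(self):
        top = self.a[0]
        last = self.a.pop()          # remove the last leaf...
        if self.a:                   # ...move it to the root and restore order
            self.a[0] = last
            self._sift_down(0)
        return top

    def increase_key(self, i, new_key):
        if new_key < self.a[i]:
            raise ValueError("new key is smaller than the current key")
        self.a[i] = new_key
        self._sift_up(i)

    def merge(self, other):
        # Simple merge: reinsert every element of `other` into this heap.
        for key in other.a:
            self.insert(key)

    def _sift_up(self, i):
        # Swap with the parent while the node is larger than its parent.
        while i > 0:
            parent = (i - 1) // 2
            if self.a[i] <= self.a[parent]:
                break
            self.a[i], self.a[parent] = self.a[parent], self.a[i]
            i = parent

    def _sift_down(self, i):
        # Swap with the larger child while a child is larger than the node.
        n = len(self.a)
        while True:
            largest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.a[child] > self.a[largest]:
                    largest = child
            if largest == i:
                break
            self.a[i], self.a[largest] = self.a[largest], self.a[i]
            i = largest
```

Repeated `delete_max` calls return the keys in descending order, which is exactly the behaviour heapsort relies on.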
Heapsort
Heapsort is a comparison-based sorting algorithm that produces a sorted array (or list) and belongs to the selection sort family. Heapsort is an in-place algorithm, but it is not a stable sort.
In-place: an algorithm that transforms its input using only a small, constant amount of extra storage space. Stable: a stable sorting algorithm maintains the relative order of records with equal keys. If all keys are different, this distinction is not necessary. But if there are equal keys, a sorting algorithm is stable if, whenever there are two records R and S with the same key and R appears before S in the original list, R also appears before S in the sorted list.
Heap sort
Explanation
1. Heapsort begins by building a heap out of the data set.
2. It then removes the largest item and places it at the end of the partially sorted array.
3. After removing the largest item, it reconstructs the heap, removes the largest remaining item, and places it in the next open position from the end of the partially sorted array.
4. This is repeated until there are no items left in the heap and the sorted array is full.
Elementary implementations require two arrays: one to hold the heap and the other to hold the sorted elements.
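The four steps above can be sketched with Python's standard `heapq` module, using the elementary two-array layout. This is an illustration under assumptions: `heapq` is a min-heap, so keys are negated to make the largest item come out first, which limits this sketch to numeric data. The function name is hypothetical.

```python
import heapq


def heapsort_two_arrays(data):
    """Elementary heapsort: one array holds the heap, another the output.

    heapq implements a min-heap, so values are negated to get max-heap
    behaviour (numeric data only).
    """
    heap = [-x for x in data]          # step 1: build a heap out of the data
    heapq.heapify(heap)
    out = [None] * len(data)           # the second array, filled from the end
    for pos in range(len(data) - 1, -1, -1):
        # steps 2-4: remove the largest item (heappop restores the heap)
        # and place it in the next open position from the end
        out[pos] = -heapq.heappop(heap)
    return out
```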
Heapsort inserts the input list elements into a binary heap data structure. The largest value (in a max-heap) or the smallest value (in a min-heap) is extracted repeatedly until none remain; the values come out in sorted order. The heap's invariant is restored after each extraction, so the only cost is that of extraction. During extraction, the only space required is that needed to store the heap. To achieve constant space overhead, the heap is stored in the part of the input array not yet sorted.
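An in-place sketch of the same idea, keeping the heap in the unsorted prefix of the input list so that only O(1) auxiliary storage is used. The function name and the `key` parameter are our own; the heap is built bottom-up with repeated sift-down, which is one common construction (building by successive insertion also works).

```python
def heapsort(a, key=lambda x: x):
    """Sort list `a` in place, ascending, using a max-heap.

    The heap occupies the prefix a[0:end]; the sorted suffix a[end:] grows
    from the right, so only O(1) extra storage is needed.
    """

    def sift_down(start, end):
        # Move a[start] down until neither child (within a[0:end]) is larger.
        root = start
        while True:
            child = 2 * root + 1
            if child >= end:
                return
            if child + 1 < end and key(a[child + 1]) > key(a[child]):
                child += 1                     # pick the larger child
            if key(a[root]) >= key(a[child]):
                return
            a[root], a[child] = a[child], a[root]
            root = child

    n = len(a)
    # Build the max-heap bottom-up, starting from the last internal node.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(start, n)
    # Repeatedly move the root (largest element) to the end of the heap region.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
```

Sorting `[(1, 'a'), (1, 'b')]` with `key=lambda r: r[0]` leaves `(1, 'b')` ahead of `(1, 'a')`: the two equal-keyed records swap places, a small illustration that heapsort is not stable.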
Heapsort Operations
Heapsort uses two heap operations: insertion and root deletion. Each extraction places an element in the last empty location of the array, and the remaining prefix of the array stores the unsorted elements.
Worst-case performance: O(n log n)
Best-case performance: Ω(n), O(n log n)
Average-case performance: O(n log n)
Worst-case space complexity: O(n) total, O(1) auxiliary
Quicksort is typically somewhat faster, due to better cache behavior and other factors, but its worst-case running time is O(n²), which is unacceptable for large data sets and can be deliberately triggered by someone with enough knowledge of the implementation, creating a security risk. Because of the O(n log n) upper bound on heapsort's running time and the constant upper bound on its auxiliary storage, embedded systems with real-time constraints and systems concerned with security often use heapsort.
Merge sort has the same time bounds as heapsort but requires Ω(n) auxiliary space, whereas heapsort requires only a constant amount. Heapsort also typically runs more quickly in practice on machines with small or slow data caches. On the other hand, merge sort has several advantages over heapsort:
- Like quicksort, merge sort on arrays has considerably better data cache performance, often outperforming heapsort on a modern desktop PC, because it accesses the elements in order.
- Merge sort is a stable sort.
- Merge sort parallelizes better: the most trivial way of parallelizing merge sort achieves close to linear speedup.
Example
Let { 6, 5, 3, 1, 8, 7, 2, 4 } be the list that we want to sort from the smallest to the largest.
1. Building the heap.
(Note for the 'Building the heap' step: larger nodes do not stay below smaller parent nodes. Each one is swapped with its parent and then checked again, recursively, to see whether another swap is needed, so that larger numbers stay above smaller numbers in the binary heap tree.)
2. Sorting.
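The example list can be traced with the short sketch below, which builds the heap by inserting elements one at a time and swapping each new node upward, as the building-the-heap note describes, and then runs the sorting step. Function names are hypothetical.

```python
def build_heap_by_insertion(values):
    """Build a max-heap by inserting values one at a time (sift-up)."""
    heap = []
    for v in values:
        heap.append(v)
        i = len(heap) - 1
        # Swap with the parent while the new node is larger, then check again.
        while i > 0 and heap[i] > heap[(i - 1) // 2]:
            p = (i - 1) // 2
            heap[i], heap[p] = heap[p], heap[i]
            i = p
    return heap


def sort_heap(heap):
    """Sorting step: swap the root with the last heap element, shrink the
    heap by one, and sift the new root down; repeat."""
    a = list(heap)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        root = 0
        while 2 * root + 1 < end:
            child = 2 * root + 1
            if child + 1 < end and a[child + 1] > a[child]:
                child += 1                     # the larger child
            if a[root] >= a[child]:
                break
            a[root], a[child] = a[child], a[root]
            root = child
    return a


heap = build_heap_by_insertion([6, 5, 3, 1, 8, 7, 2, 4])
print(heap)             # [8, 6, 7, 4, 5, 3, 2, 1]
print(sort_heap(heap))  # [1, 2, 3, 4, 5, 6, 7, 8]
```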
Priority Queues
A priority queue is like a regular queue or stack data structure, except that each element is additionally associated with a "priority":
- stack: elements are pulled in last-in, first-out order (e.g. a stack of papers)
- queue: elements are pulled in first-in, first-out order (e.g. a line in a cafeteria)
- priority queue: elements are pulled highest-priority first (e.g. cutting in line, or VIP service)
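A small sketch contrasting the three pull orders on the same input, using a Python list as a stack, `collections.deque` as a queue, and `heapq` as a priority queue. The function name, sample names and priorities are made up, and a larger number is assumed to mean a higher priority.

```python
import heapq
from collections import deque


def pull_orders(arrivals):
    """Return the order in which a stack, a queue and a priority queue give
    back the same items. `arrivals` is a list of (priority, name) pairs."""
    stack, queue, pq = [], deque(), []
    for prio, name in arrivals:
        stack.append(name)
        queue.append(name)
        heapq.heappush(pq, (-prio, name))   # heapq is a min-heap, so negate
    from_stack = [stack.pop() for _ in range(len(stack))]       # last in, first out
    from_queue = [queue.popleft() for _ in range(len(queue))]   # first in, first out
    from_pq = [heapq.heappop(pq)[1] for _ in range(len(pq))]    # highest priority first
    return from_stack, from_queue, from_pq


print(pull_orders([(2, "alice"), (5, "bob"), (1, "carol"), (4, "dave")]))
# (['dave', 'carol', 'bob', 'alice'], ['alice', 'bob', 'carol', 'dave'], ['bob', 'dave', 'alice', 'carol'])
```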
Operations
A priority queue must support at least the following operations:
- insert_with_priority: add an element to the queue with an associated priority
- pull_highest_priority_element: remove the element with the highest priority from the queue, and return it. This is also known as "pop_element(Off)", "get_maximum_element", or "get_front(most)_element"; some conventions consider lower priorities to be higher, so it may also be known as "get_minimum_element", and it is often referred to as "get-min" in the literature. The literature also sometimes specifies separate "peek_at_highest_priority_element" and "delete_element" functions, which can be combined to produce "pull_highest_priority_element".
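One way to provide these operations is a thin wrapper around Python's `heapq`, sketched below. The method names follow the operation names above; the insertion counter, which breaks ties in first-in, first-out order and keeps elements themselves from being compared, is a design choice of this sketch, not a requirement of a priority queue.

```python
import heapq
import itertools


class PriorityQueue:
    """Max-priority queue built on heapq (a binary min-heap).

    Priorities are negated so the largest priority is pulled first; a running
    counter breaks ties in insertion order.
    """

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def insert_with_priority(self, element, priority):
        heapq.heappush(self._heap, (-priority, next(self._counter), element))

    def peek_at_highest_priority_element(self):
        return self._heap[0][2]          # look without removing

    def pull_highest_priority_element(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```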
Applications
Bandwidth management
Priority queuing can be used to manage limited resources such as bandwidth on a transmission line from a network router. When outgoing traffic has to be queued because of insufficient bandwidth, all other queues can be halted to send the traffic from the highest-priority queue upon arrival. This ensures that prioritized traffic (such as real-time traffic, e.g. an RTP stream of a VoIP connection) is forwarded with the least delay and the least likelihood of being rejected because a queue has reached its maximum capacity.
Discrete event simulation
Another use of a priority queue is to manage the events in a discrete event simulation. The events are added to the queue with their simulation time used as the priority. The simulation proceeds by repeatedly pulling the top of the queue and executing the event there.
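The discrete event simulation loop can be sketched in a few lines: events are `(time, description)` tuples in a `heapq` min-heap, so the earliest simulation time is pulled first. The rule that an arrival schedules a departure three time units later is purely hypothetical, included only to show events being added during the run.

```python
import heapq


def run_simulation(initial_events, until):
    """Minimal discrete event simulation loop.

    Each event is (time, description); the queue is ordered by simulation
    time. Handling an 'arrival' schedules a hypothetical 'departure'
    3 time units later.
    """
    queue = list(initial_events)
    heapq.heapify(queue)
    log = []
    while queue:
        time, what = heapq.heappop(queue)      # earliest event first
        if time > until:
            break
        log.append((time, what))               # "execute" the event
        if what.startswith("arrival"):
            heapq.heappush(queue, (time + 3, what.replace("arrival", "departure")))
    return log
```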
The A* search algorithm finds the shortest path between two vertices or nodes of a weighted graph, trying out the most promising routes first. The priority queue (also known as the fringe) is used to keep track of unexplored routes; the one for which a lower bound on the total path length is smallest is given highest priority.
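A compact sketch of A* with a `heapq` fringe keyed by f = g + h, where g is the length of the route so far and h is a heuristic lower bound on the remaining distance. The adjacency-list graph format, the heuristic function and the skipping of stale fringe entries are assumptions of this sketch.

```python
import heapq


def a_star(graph, start, goal, h):
    """A* search over `graph`, which maps a node to a list of
    (neighbour, edge_weight) pairs. `h(node)` is assumed to be an admissible,
    consistent lower bound on the distance to `goal`.
    Returns (cost, path), or (inf, []) if `goal` is unreachable."""
    fringe = [(h(start), 0, start, [start])]     # (f = g + h, g, node, path)
    best_g = {start: 0}
    while fringe:
        _f, g, node, path = heapq.heappop(fringe)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue                             # stale entry: a shorter route exists
        for nbr, w in graph.get(node, []):
            ng = g + w
            if ng < best_g.get(nbr, float("inf")):
                best_g[nbr] = ng
                heapq.heappush(fringe, (ng + h(nbr), ng, nbr, path + [nbr]))
    return float("inf"), []
```

With h(node) = 0 for every node, the search degenerates to Dijkstra's algorithm.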
ROAM triangulation algorithm
The Real-time Optimally Adapting Meshes (ROAM) algorithm computes a dynamically changing triangulation of a terrain. It works by splitting triangles where more detail is needed and merging them where less detail is needed. The algorithm assigns each triangle in the terrain a priority, usually related to the error decrease if that triangle would be split. The algorithm uses two priority queues, one for triangles that can be split and another for triangles that can be merged. In each step the triangle from the split queue with the highest priority is split, or the triangle from the merge queue with the lowest priority is merged with its neighbours.
Hashing
A hash function is any algorithm that maps large data sets of variable length, called keys, to smaller data sets of a fixed length. For example, a variable-length string could be hashed to a single integer, which can serve as an index to an array (cf. associative array).
The values returned by a hash function are called hash values, hash codes, hash sums, checksums or simply hashes.
Hash functions are mostly used to accelerate table lookup or data comparison tasks such as finding items in a database, detecting duplicated or similar records in a large file, finding similar stretches in DNA sequences, and so on. A hash function should be referentially transparent, i.e. if called twice on input that is "equal" (e.g. strings that consist of the same sequence of characters), it should give the same result.
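As an illustration of a deterministic string hash, here is 32-bit FNV-1a, one well-known non-cryptographic hash (chosen by us; the slides do not name a specific function), followed by a reduction of the hash value to an array index. Both function names are hypothetical.

```python
def fnv1a_32(data: str) -> int:
    """32-bit FNV-1a hash of a string's UTF-8 bytes.

    Deterministic: equal strings always give the same hash value, unlike
    Python's built-in hash() for str, which is randomised per process by default.
    """
    h = 0x811C9DC5                        # FNV offset basis
    for byte in data.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) % 2**32      # multiply by the FNV prime, keep 32 bits
    return h


def bucket_index(key: str, num_buckets: int) -> int:
    """Reduce the hash value to an array index in range(num_buckets)."""
    return fnv1a_32(key) % num_buckets
```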
Applications
Hash tables
Hash functions are primarily used in hash tables, to quickly locate a data record (for example, a dictionary definition) given its search key (the headword). Specifically, the hash function is used to map the search key to an index; the index gives the place in the hash table where the corresponding record should be stored. Hash tables, in turn, are used to implement associative arrays and dynamic sets. In general, a hashing function may map several different keys to the same index. Therefore, each slot of a hash table is associated with (implicitly or explicitly) a set of records, rather than a single record. For this reason, each slot of a hash table is often called a bucket, and hash values are also called bucket indices.
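A minimal separate-chaining hash table, sketched in Python: each bucket is a list of (key, record) pairs, so keys that map to the same index can coexist. The class name and the use of Python's built-in `hash()` are our choices, and the sketch never resizes its bucket array.

```python
class ChainedHashTable:
    """Hash table with separate chaining: each slot (bucket) holds a list of
    (key, record) pairs, because different keys may hash to the same index."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        return hash(key) % len(self.buckets)     # the bucket index

    def put(self, key, record):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, record)        # replace the existing record
                return
        bucket.append((key, record))

    def get(self, key):
        # Only the records in one bucket have to be searched.
        for k, record in self.buckets[self._index(key)]:
            if k == key:
                return record
        raise KeyError(key)
```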
Caches
Hash functions are also used to build caches for large data sets stored in slow media. A cache is generally simpler than a hashed search table, since any collision can be resolved by discarding or writing back the older of the two colliding items.
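A sketch of a direct-mapped cache in front of a slow lookup: each key hashes to one slot, and on a collision the older item is simply discarded. The `slow_lookup` callable is a hypothetical stand-in for the slow medium, and a write-back policy is not modelled.

```python
class DirectMappedCache:
    """Cache in front of a slow lookup function. Each key hashes to exactly
    one slot; on a collision the older item is discarded (no chaining)."""

    def __init__(self, slow_lookup, num_slots=64):
        self.slow_lookup = slow_lookup       # hypothetical slow data source
        self.slots = [None] * num_slots      # each slot: (key, value) or None
        self.misses = 0

    def get(self, key):
        i = hash(key) % len(self.slots)
        entry = self.slots[i]
        if entry is not None and entry[0] == key:
            return entry[1]                  # cache hit
        self.misses += 1
        value = self.slow_lookup(key)
        self.slots[i] = (key, value)         # overwrite (discard) the older item
        return value
```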
Bloom filters
Hash functions are an essential ingredient of the Bloom filter, a compact data structure that provides an enclosing approximation to a set of keys: a membership test may report false positives, but never false negatives.
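A minimal Bloom filter sketch. Deriving the k bit positions from SHA-256 with a per-function seed prefix is an assumption made for simplicity; it is not the only (or fastest) way to obtain k hash functions. The parameters m and k are illustrative defaults.

```python
import hashlib


class BloomFilter:
    """Bloom filter with m bits and k hash functions derived from SHA-256.

    might_contain() returns False only if the item was never added (no false
    negatives), but may return True for an item that was never added
    (a false positive).
    """

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)             # one byte per bit, for simplicity

    def _positions(self, item: str):
        # Hash function number `seed` = SHA-256 of "seed:item", reduced mod m.
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p] for p in self._positions(item))
```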
Finding duplicate records
When storing records in a large unsorted file, one may use a hash function to map each record to an index into a table T, and collect in each bucket T[i] a list of the numbers of all records with the same hash value i. Once the table is complete, any two duplicate records will end up in the same bucket. The duplicates can then be found by scanning every bucket T[i] that contains two or more members, fetching those records, and comparing them. With a table of appropriate size, this method is likely to be much faster than any alternative approach (such as sorting the file and comparing all consecutive pairs).
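The duplicate-finding procedure can be sketched as follows. The function name, the table size and the use of Python's `hash()` are assumptions; records in the same bucket are compared directly because a shared bucket does not guarantee equality.

```python
def find_duplicates(records, table_size=64):
    """Return groups of record numbers whose records are identical.

    Record number n goes into bucket T[i] with i = hash(record) % table_size.
    Only buckets with two or more members are examined, and their records are
    compared directly, because different records can share a bucket.
    """
    T = [[] for _ in range(table_size)]
    for n, rec in enumerate(records):
        T[hash(rec) % table_size].append(n)
    groups = []
    for bucket in T:
        remaining = bucket
        while len(remaining) >= 2:
            first = remaining[0]
            same = [n for n in remaining if records[n] == records[first]]
            if len(same) > 1:
                groups.append(same)
            remaining = [n for n in remaining if records[n] != records[first]]
    return groups
```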
Geometric hashing
This principle is widely used in computer graphics, computational geometry and many other disciplines, to solve many proximity problems in the plane or in three-dimensional space, such as finding closest pairs in a set of points, similar shapes in a list of shapes, similar images in an image database, and so on. In these applications, the set of all inputs is some sort of metric space, and the hashing function can be interpreted as a partition of that space into a grid of cells. The table is often an array with two or more indices (called a grid file, grid index, bucket grid, and similar names), and the hash function returns an index tuple. This special case of hashing is known as geometric hashing or the grid method.
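A sketch of the grid method for one proximity problem, finding all pairs of points in the plane closer than r: the hash function maps a point to the index tuple of its grid cell (cell side r), and only a cell and its eight neighbours need to be compared. The function name and the choice of cell size are ours.

```python
import math
from collections import defaultdict
from itertools import product


def close_pairs(points, r):
    """Return all index pairs (i, j), i < j, of 2-D points closer than r.

    Two points closer than r must lie in the same cell or in adjacent cells
    of a grid with cell side r, so only those 9 cells are checked.
    """
    grid = defaultdict(list)                 # index tuple -> point indices
    for idx, (x, y) in enumerate(points):
        grid[(math.floor(x / r), math.floor(y / r))].append(idx)
    pairs = []
    for (cx, cy), members in grid.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            for i in members:
                for j in grid.get((cx + dx, cy + dy), []):
                    # i < j ensures each pair is reported exactly once
                    if i < j and math.dist(points[i], points[j]) < r:
                        pairs.append((i, j))
    return sorted(pairs)
```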
Types of Hashing
Perfect Hashing
A hash function that is injective (that is, maps each valid input to a different hash value) is said to be perfect. With such a function one can directly locate the desired entry in a hash table, without any additional searching.
A perfect hash function for n keys is said to be minimal if its range consists of n consecutive integers, usually from 0 to n − 1. Besides providing single-step lookup, a minimal perfect hash function also yields a compact hash table, without any vacant slots. Minimal perfect hash functions are much harder to find than perfect ones with a wider range.
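For a small, fixed key set, a perfect or minimal perfect hash can be found by brute force: try seeds of a seeded hash family until no two keys collide. The SHA-256-based family and the function names below are hypothetical choices for illustration; brute-force search becomes impractical as n grows, and practical constructions use more sophisticated algorithms.

```python
import hashlib


def seeded_hash(key: str, seed: int, m: int) -> int:
    """Hypothetical seeded hash family: SHA-256 of 'seed:key', reduced mod m."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m


def find_perfect_seed(keys, m, max_tries=100_000):
    """Try seeds until seeded_hash(., seed, m) is injective on `keys`,
    i.e. a perfect hash into range(m). With m == len(keys) the result is a
    minimal perfect hash. Returns the seed, or None if none was found."""
    for seed in range(max_tries):
        if len({seeded_hash(k, seed, m) for k in keys}) == len(keys):
            return seed
    return None
```

With m = len(keys) the search yields a minimal perfect hash; with a larger m (say 2n) a suitable seed is usually found after fewer tries, since collisions are less likely. This mirrors the point above that minimal perfect hash functions are harder to find.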