Clustering

Hashing refers to generating a fixed-size output from a variable-sized input using hash functions. It determines an index for storing items in a data structure. A good hash function uniformly distributes keys, minimizes collisions, and has a low load factor. Collisions occur when keys hash to the same slot. Separate chaining handles collisions by storing items in linked lists at each slot, while open addressing probes for empty slots.

Uploaded by

Deneshraja Nedu

Hashing

Hashing is the process of generating a fixed-size output from a variable-size input using
mathematical formulas known as hash functions. The output determines the index, or location, at
which an item is stored in a data structure.

Components of Hashing

Hashing has three main components:

1. Key: A key can be any string or integer. It is the input to the hash function, which
determines the index or location at which the item is stored in a data structure.
2. Hash Function: The hash function receives the input key and returns the index of an
element in an array called a hash table. This index is known as the hash index.
3. Hash Table: A hash table is a data structure that maps keys to values using a hash
function. It stores the data associatively in an array, where each value is located at the
index computed from its key.
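
The three components can be sketched in a few lines of Python. This is only an illustration: the function name hash_index, the multiplier 31 and the table size 7 are assumptions chosen for the example, not something the notes prescribe.

```python
def hash_index(key, table_size):
    """Hash function: map a string or integer key to an index in [0, table_size)."""
    if isinstance(key, int):
        return key % table_size
    # For strings, fold the character codes into one number
    # (a simple polynomial hash; the multiplier 31 is an arbitrary choice).
    h = 0
    for ch in str(key):
        h = (h * 31 + ord(ch)) % table_size
    return h


table_size = 7
table = [None] * table_size        # hash table: an array of slots

key = 50                           # key: the input to the hash function
index = hash_index(key, table_size)  # hash index: 50 % 7 == 1
table[index] = (key, "value stored for 50")
```

Looking the value up again repeats the same computation: hash the key, then read table[index].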

Properties of a Good hash function

A hash function that maps every item into its own unique slot is known as a perfect hash
function. We can construct a perfect hash function if we know the items in advance and the
collection will never change, but there is no systematic way to construct one for an arbitrary
collection of items. Fortunately, we still gain performance even if the hash function isn’t
perfect. One way to guarantee a unique slot for each item is to make the hash table large enough
to accommodate every possible value. Although this approach is feasible when the number of
possible values is small, it is not practical when the number of possibilities is large.

So, in practice we construct a hash function that comes close to this ideal, and there are a few
things we must be careful about when doing so.

A good hash function should have the following properties:

1. Efficiently computable.
2. Should uniformly distribute the keys (each table position is equally likely for each key).
3. Should minimize collisions.
4. Should have a low load factor (the number of items in the table divided by the size of the
table).
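
The uniformity and load factor properties can be checked with a small experiment. The sketch below assumes integer keys and the simple hash function key % table_size; the two key sets are made-up illustrations.

```python
def load_factor(num_items, table_size):
    """Number of items in the table divided by the size of the table."""
    return num_items / table_size


def slot_counts(keys, table_size):
    """Count how many integer keys land in each slot under key % table_size."""
    counts = [0] * table_size
    for key in keys:
        counts[key % table_size] += 1
    return counts


even_keys = list(range(10, 20))      # 10, 11, ..., 19
skewed_keys = list(range(0, 50, 5))  # 0, 5, ..., 45 (all multiples of 5)

print(slot_counts(even_keys, 5))       # [2, 2, 2, 2, 2]   keys spread uniformly
print(slot_counts(skewed_keys, 5))     # [10, 0, 0, 0, 0]  every key lands in slot 0
print(load_factor(len(even_keys), 5))  # 2.0
```

Both key sets give the same load factor, yet the second one piles every key into a single slot, which is why uniform distribution is listed as a separate property.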

What is Collision?
Since a hash function maps a key, which may be a large integer or a string, to a small number,
two keys can produce the same value. The situation where a newly inserted key maps to an already
occupied slot in the hash table is called a collision, and it must be handled using some
collision handling technique.
How to handle Collisions?
There are mainly two methods to handle collisions:

• Separate Chaining
• Open Addressing
Separate Chaining:
The idea behind separate chaining is to make each slot of the array point to a linked list
called a chain. Separate chaining is one of the most popular and commonly used techniques for
handling collisions.

This technique is implemented with the linked list data structure. When multiple elements hash
into the same slot index, they are inserted into a singly linked list known as a chain.

To search for a key K, we hash K to find its slot and traverse that slot's linked list linearly.
If the key of any entry is equal to K, we have found our entry. If we reach the end of the linked
list without finding it, the entry does not exist. In short, in separate chaining, if two
different elements have the same hash value, we store both elements in the same linked list, one
after the other.
Advantages:
• Simple to implement.
• The hash table never fills up; we can always add more elements to a chain.
• Less sensitive to the hash function and load factor.
• Mostly used when it is unknown how many keys will be inserted or deleted, and how
frequently.
Disadvantages:
• The cache performance of chaining is poor because keys are stored in linked lists.
Open addressing provides better cache performance because everything is stored in the same
table.
• Wastage of space (some parts of the hash table are never used).
• If a chain becomes long, search time can become O(n) in the worst case.
• Uses extra space for links.

Example: Let us consider a simple hash function, “key mod 7”, and a sequence of keys: 50,
700, 76, 85, 92, 73, 101.
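
The example can be worked through with a minimal separate chaining sketch. The class name ChainedHashTable is hypothetical, and each chain is held in a Python list standing in for the singly linked list described above.

```python
class ChainedHashTable:
    """Separate chaining sketch: every slot holds a chain of keys."""

    def __init__(self, size):
        self.size = size
        # Assumption: a Python list plays the role of each linked-list chain.
        self.chains = [[] for _ in range(size)]

    def insert(self, key):
        # Keys that hash to the same slot are appended to the same chain.
        self.chains[key % self.size].append(key)

    def search(self, key):
        # Traverse only the chain of the slot the key hashes to.
        return key in self.chains[key % self.size]


table = ChainedHashTable(7)
for key in [50, 700, 76, 85, 92, 73, 101]:
    table.insert(key)

for slot, chain in enumerate(table.chains):
    print(slot, chain)
# 0 [700]
# 1 [50, 85, 92]
# 2 []
# 3 [73, 101]
# 4 []
# 5 []
# 6 [76]
```

Keys 50, 85 and 92 all give remainder 1, so they share one chain, and 73 and 101 share the chain at slot 3. Slots 2, 4 and 5 stay empty, which illustrates the wasted space listed under the disadvantages.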
Open Addressing:

Like separate chaining, open addressing is a method for handling collisions. In open addressing,
all elements are stored in the hash table itself, so at any point the size of the table must be
greater than or equal to the total number of keys (note that we can increase the table size by
copying the old data if needed). This approach is also known as closed hashing. The entire
procedure is based on probing, and the operations work as follows:

• Insert(k): Keep probing until an empty slot is found. Once an empty slot is
found, insert k.
• Search(k): Keep probing until the slot’s key equals k or an empty slot is
reached.
• Delete(k): The delete operation is interesting. If we simply delete a key, a later
search may fail. So slots of deleted keys are marked specially as “deleted”.
An insert can place an item in a deleted slot, but a search doesn’t stop at a
deleted slot.
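
The three operations, including the special “deleted” marker, might look like the sketch below. It assumes distinct integer keys, the hash function key % size, and probing one slot at a time; the names OpenAddressingTable, EMPTY and DELETED are hypothetical.

```python
EMPTY = object()    # marker for a slot that has never held a key
DELETED = object()  # marker for a slot whose key was deleted


class OpenAddressingTable:
    def __init__(self, size):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, key):
        """Yield slot indices starting at key % size, visiting each slot once."""
        start = key % self.size
        for step in range(self.size):
            yield (start + step) % self.size

    def insert(self, key):
        # Assumption: keys are distinct, so the first free slot can be reused.
        for index in self._probe(key):
            if self.slots[index] is EMPTY or self.slots[index] is DELETED:
                self.slots[index] = key
                return index
        raise OverflowError("hash table is full")

    def search(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:
                return None            # an empty slot ends the search
            if slot is not DELETED and slot == key:
                return index           # found; deleted slots are skipped
        return None

    def delete(self, key):
        index = self.search(key)
        if index is None:
            return False
        self.slots[index] = DELETED    # mark the slot instead of emptying it
        return True


table = OpenAddressingTable(7)
for key in [50, 85, 92]:               # all three hash to slot 1
    table.insert(key)                  # stored in slots 1, 2 and 3
table.delete(85)                       # slot 2 is now marked as deleted
found_at = table.search(92)            # 3: the search passes over slot 2
reused = table.insert(15)              # 2: 15 % 7 == 1, slot 1 is taken, slot 2 is reused
```

If delete had reset slot 2 to EMPTY, the search for 92 would have stopped there and reported the key as missing, which is the failure the marker prevents.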

Algorithm:

1. Calculate the hash key, i.e. key = data % size.
2. Check whether hashTable[key] is empty. If it is, store the value directly:
hashTable[key] = data.
3. If the hash index already holds a value, check the next index using
key = (key + 1) % size.
4. If hashTable[key] at the next index is available, store the value there. Otherwise, try
the next index.
5. Repeat the above process until a free slot is found.
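
The steps above can be sketched as a short function. The name linear_probe_insert and the use of None for an empty slot are assumptions. The loop is bounded by the table size so that a full table raises an error instead of probing forever.

```python
def linear_probe_insert(hash_table, data):
    """Insert data by probing key, key + 1, key + 2, ... (mod size) for a free slot."""
    size = len(hash_table)
    key = data % size                  # step 1: calculate the hash key
    for _ in range(size):              # step 5: repeat, at most once per slot
        if hash_table[key] is None:    # steps 2 and 4: is this slot free?
            hash_table[key] = data
            return key
        key = (key + 1) % size         # step 3: move to the next index
    raise OverflowError("hash table is full")


hash_table = [None] * 7
for data in [50, 700, 76, 85, 92, 73, 101]:
    linear_probe_insert(hash_table, data)

print(hash_table)  # [700, 50, 85, 92, 73, 101, 76]
```

With these keys, 85 finds slot 1 occupied by 50 and moves to slot 2, 92 moves on to slot 3, 73 is pushed from slot 3 to slot 4, and 101 ends up in slot 5.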
Clustering

Clustering is the process of grouping a set of physical or abstract objects into classes of
similar objects. A cluster is a set of data objects that are similar to one another within the
same cluster and dissimilar to the objects in other clusters. In many applications, a cluster of
data objects can be treated collectively as one group. Cluster analysis is an essential human
activity.

Cluster analysis is used to form groups, or clusters, of similar records based on various
measurements made on these records. The key idea is to define the clusters in ways that are
useful for the objective of the analysis. Cluster analysis has been used in several areas, such
as astronomy, archaeology, medicine, chemistry, education, psychology, linguistics, and sociology.
One famous use of cluster analysis in marketing is market segmentation: users are segmented
based on demographic and transaction history data, and marketing techniques are tailored for
each segment.

Another use is market structure analysis: identifying groups of similar products according to
competitive measures of similarity. In marketing and political forecasting, clustering of
neighborhoods using U.S. postal zip codes has been used successfully to group neighborhoods by
lifestyles.

In finance, cluster analysis can be used to build balanced portfolios: given data on several
investment opportunities (e.g., stocks), one can find clusters based on financial performance
variables such as return (daily, weekly, or monthly), volatility, and beta, and on other
characteristics such as industry and market capitalization. Selecting securities from different
clusters can help create a balanced portfolio.

Another application of cluster analysis in finance is market analysis. For a given industry, the
goal is to find groups of similar firms based on measures such as growth rate, profitability,
industry size, product range, and presence in several international markets. These groups can
then be analyzed to understand the market structure and to decide, for example, who is a
competitor.

Cluster analysis can be applied to large amounts of data. For example, Internet search engines
use clustering methods to cluster the queries that users submit. These clusters can then be used
for developing search algorithms.

Generally, the basic data used for clustering is a table of measurements on several variables,
where each column represents a variable and each row represents a record. The aim is to form
groups of records so that similar records are in the same group. The number of clusters can be
pre-specified or determined from the data.
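
As an illustration of grouping records with a pre-specified number of clusters, here is a minimal sketch of k-means, one common clustering technique. The notes above do not name a specific method, so the choice of k-means, the function name kmeans, the seeding rule and the six sample records are all assumptions made for the example.

```python
def kmeans(records, k, iterations=10):
    """Group records (tuples of numbers) into k clusters; k is pre-specified."""
    # Assumption: the first k records are used as the starting cluster centers.
    centroids = [records[i] for i in range(k)]
    assignments = [0] * len(records)
    for _ in range(iterations):
        # Assign each record to its nearest center (squared Euclidean distance).
        for i, record in enumerate(records):
            distances = [sum((a - b) ** 2 for a, b in zip(record, center))
                         for center in centroids]
            assignments[i] = distances.index(min(distances))
        # Move each center to the mean of the records assigned to it.
        for j in range(k):
            members = [r for r, a in zip(records, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids


# Each row is a record and each column a variable (made-up measurements).
records = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
assignments, centroids = kmeans(records, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

Records with small values in both columns end up in one group and records with large values in the other, so similar records share a group.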
