
HeavyKeeper: An Accurate Algorithm for Finding Top-k Elephant Flows

HeavyKeeper is an algorithm that accurately identifies the top-k elephant flows using a small amount of memory. It uses a technique called "count-with-exponential-decay" to actively remove small flows through decaying, while minimizing the impact on large flows. This allows it to achieve high precision in finding top-k flows using only a small, constant amount of processing per packet. Experimental results show HeavyKeeper achieves over 99.99% precision and reduces errors by three orders of magnitude compared to state-of-the-art algorithms.

Uploaded by

siva sai
HeavyKeeper: An Accurate Algorithm for Finding Top-k Elephant Flows


Abstract: Finding top-k elephant flows is a critical task in network traffic measurement, with
many applications in congestion control, anomaly detection and traffic engineering. As line
rates keep increasing in today's networks, designing accurate and fast algorithms for online
identification of elephant flows becomes more and more challenging. Prior algorithms are
seriously limited in the accuracy they can achieve under the constraints of heavy traffic and
small on-chip memory. We observe that the basic strategies adopted by these algorithms either
require significant space overhead to measure the sizes of all flows or incur significant
inaccuracy when deciding which flows to keep track of. In this paper, we adopt a new strategy,
called count-with-exponential-decay, to achieve space-accuracy balance by actively removing
small flows through decaying, while minimizing the impact on large flows, so as to achieve high
precision in finding top-k elephant flows. Moreover, the proposed algorithm, called HeavyKeeper,
incurs small, constant processing overhead per packet and thus supports high line rates.
Experimental results show that HeavyKeeper achieves 99.99% precision with a small memory size
and reduces the error by around 3 orders of magnitude on average compared to the state of the art.

Index Terms: HeavyKeeper, top-k, sketch, network measurements, elephant flow.

Existing system:

Traditional solutions to finding the top-k flows follow two basic strategies: count-all and
admit-all-count-some. The count-all strategy relies on a sketch to measure the sizes of all
flows, while using a min-heap to keep track of the top-k flows. For each incoming packet, it
records the packet in the sketch and retrieves from the sketch an estimate n̂_i for the size of
the flow f_i that the packet belongs to. If n̂_i is larger than the smallest flow size in the
min-heap, it replaces the smallest flow in the heap with flow f_i. As a large sketch is needed to
count all flows, these solutions are not memory efficient. The admit-all-count-some strategy is
adopted by Frequent, Lossy Counting, Space-Saving and CSS. These algorithms are similar to each
other. To save memory, Space-Saving maintains a data structure called Stream-Summary that counts
only some flows (e.g., m flows). Each new flow is inserted into the summary, replacing the
smallest existing flow. The initial size of the new flow is set to n̂_min + 1, where n̂_min is
the size of the smallest flow in the summary. By keeping m flows in the summary, the algorithm
reports the largest k flows among them, where m > k. It assumes every new incoming flow is an
elephant flow, and expels the smallest one in the summary to make room for the new one. But
most flows are mouse flows.
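
The count-all strategy described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the text says "a sketch" without naming one, so a Count-Min sketch is assumed here; the names CountMinSketch and count_all_top_k are hypothetical; and a plain dictionary scanned for its minimum stands in for the min-heap to keep the example short.

```python
import hashlib


class CountMinSketch:
    """Assumed sketch type: d rows of w counters, one counter per row per flow."""

    def __init__(self, d=3, w=256):
        self.d, self.w = d, w
        self.rows = [[0] * w for _ in range(d)]

    def _index(self, row, flow_id):
        # A salted hash gives each row its own hash function.
        digest = hashlib.blake2b(flow_id.encode(), digest_size=8,
                                 salt=f"row{row}".encode()).digest()
        return int.from_bytes(digest, "big") % self.w

    def add(self, flow_id):
        """Record one packet and return the updated size estimate."""
        counts = []
        for row in range(self.d):
            i = self._index(row, flow_id)
            self.rows[row][i] += 1
            counts.append(self.rows[row][i])
        return min(counts)  # the smallest counter is the least inflated


def count_all_top_k(packets, k, sketch):
    """Count every packet in the sketch, track the k largest estimates."""
    top = {}  # flow_id -> estimate; stands in for the min-heap
    for flow_id in packets:
        estimate = sketch.add(flow_id)
        if flow_id in top or len(top) < k:
            top[flow_id] = estimate
        else:
            smallest = min(top, key=top.get)
            if estimate > top[smallest]:
                del top[smallest]
                top[flow_id] = estimate
    return sorted(top.items(), key=lambda item: -item[1])
```

Every flow, elephant or mouse, consumes counters in the sketch, which is why this strategy needs a large sketch to stay accurate.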
Disadvantages:
The assumption that every new flow is an elephant flow causes significant error, especially
under tight memory (that is, for a small value of m).
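
The error can be seen in a minimal Space-Saving sketch. The assumptions here are that a plain dictionary replaces the Stream-Summary structure (so finding the smallest flow is a linear scan rather than a constant-time step) and that the function name space_saving is hypothetical.

```python
def space_saving(packets, m):
    """Admit-all-count-some: keep at most m counters, admit every new flow."""
    counts = {}
    for flow_id in packets:
        if flow_id in counts:
            counts[flow_id] += 1
        elif len(counts) < m:
            counts[flow_id] = 1
        else:
            # Expel the smallest flow; the newcomer inherits its size plus one.
            victim = min(counts, key=counts.get)
            n_min = counts.pop(victim)
            counts[flow_id] = n_min + 1
    return counts


stream = ["e"] * 5 + ["m1", "m2", "m3", "m4"]
print(space_saving(stream, m=2))
```

With m = 2, the flow m4 appears only once in the stream yet ends with a recorded size of 4, because each mouse flow inherits the inflated count of the one it expelled. One more mouse flow would tie with the elephant "e" and could push it out.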
Proposed system:
We propose a new algorithm, HeavyKeeper, which uses a strategy similar to the one introduced in
HeavyGuardian, called count-with-exponential-decay. It keeps all elephant flows while
drastically reducing the space wasted on mouse flows. HeavyGuardian can handle five different
tasks, but these do not include top-k elephant flow detection, while the proposed algorithm
focuses solely on finding top-k elephant flows. HeavyKeeper uses multiple arrays and thus scales
well, while HeavyGuardian does not. Unlike count-all, our strategy keeps track of only a small
number of flows. Unlike admit-all-count-some, we do not automatically admit new flows into our
data structure, and the vast majority of mouse flows are bypassed. The small number of mouse
flows that do enter our data structure decay away to make room for true elephants. The decay is
not uniform across the flows in our data structure: exponential decay is biased against small
flows and has a smaller impact on larger flows.
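
To make the bias against small flows concrete, the sketch below assumes the decay rule from the published HeavyKeeper description, which this summary does not state: when a packet from a different flow hits a bucket whose counter is C, the counter is decremented with probability b^(-C), with a base b slightly above 1 (1.08 is assumed here). The function names are hypothetical.

```python
def decay_probability(counter, b=1.08):
    """Chance that one colliding packet decrements a counter of this size."""
    return b ** (-counter)


def expected_hits_per_decrement(counter, b=1.08):
    """Average number of colliding packets needed to remove one count."""
    return b ** counter


for size in (1, 10, 100):
    print(size, decay_probability(size), expected_hits_per_decrement(size))
```

Under these assumptions a counter of 1 is decremented by almost every colliding packet, while a counter of 100 is decremented less than once per thousand colliding packets, so a mouse flow is displaced quickly and an established elephant is barely affected.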
Advantages:
This design works extremely well with real traffic traces under small memory.
Modules:
The HeavyKeeper: HeavyKeeper consists of d arrays, and each array consists of w buckets. Each
bucket has two fields: a fingerprint field and a counter field.
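
A minimal sketch of this layout is given below. The text does not specify hash functions or field widths, so a salted BLAKE2b hash and a 16-bit fingerprint are assumptions made for illustration, and the names Bucket, make_heavykeeper, fingerprint_of and buckets_for are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Bucket:
    fingerprint: int = 0  # short hash identifying the flow held in the bucket
    counter: int = 0      # estimated size of that flow; 0 means empty


def make_heavykeeper(d, w):
    """Build d arrays of w empty buckets each."""
    return [[Bucket() for _ in range(w)] for _ in range(d)]


def _hash(flow_id, salt):
    digest = hashlib.blake2b(flow_id.encode(), digest_size=8,
                             salt=salt.encode()).digest()
    return int.from_bytes(digest, "big")


def fingerprint_of(flow_id, bits=16):
    """Assumed fingerprint: the low bits of a separately salted hash."""
    return _hash(flow_id, "fp") & ((1 << bits) - 1)


def buckets_for(arrays, flow_id):
    """Return the one bucket in each array that the flow maps to."""
    w = len(arrays[0])
    return [row[_hash(flow_id, f"row{j}") % w] for j, row in enumerate(arrays)]
```

Memory use is fixed at d times w buckets regardless of how many flows appear, and each packet touches exactly d buckets.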
Basic Version for Finding Top-k Elephant Flows
To find top-k elephant flows, the basic version uses just a HeavyKeeper and a min-heap. The
min-heap stores the IDs and sizes of the top-k flows. For each incoming packet P_l
belonging to flow f_i, we first insert it into HeavyKeeper. Suppose that HeavyKeeper reports the
size of f_i as n̂_i. If f_i is already in the min-heap, we update its estimated flow size to
max(n̂_i, min_heap[f_i]), where min_heap[f_i] is the size of f_i recorded in the min-heap.
Otherwise, if n̂_i is larger than the smallest flow size, which is held in the root node of the
min-heap, we expel the root node from the min-heap and insert f_i with n̂_i into the min-heap.
To query top-k flows, we simply report the k flows in the min-heap with their estimated flow sizes.
Query top-k flows: It reports the k flows recorded in the min-heap and their estimated flow
sizes.
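
The basic version might be sketched as follows. Several details are assumptions not given in this summary: the insertion rule (claim an empty bucket, increment on a fingerprint match, otherwise decay with probability b^(-C)) follows the published HeavyKeeper description, the hash functions and 16-bit fingerprint are illustrative, flows are admitted freely while the heap holds fewer than k entries, and the heap is re-heapified after an in-place update for brevity. The class names are hypothetical.

```python
import hashlib
import heapq
import random


class HeavyKeeper:
    """Compact sketch: d arrays of w [fingerprint, counter] buckets."""

    def __init__(self, d=2, w=256, b=1.08, seed=0):
        self.w, self.b = w, b
        self.arrays = [[[0, 0] for _ in range(w)] for _ in range(d)]
        self.rng = random.Random(seed)  # seeded so runs are repeatable

    @staticmethod
    def _hash(flow_id, salt):
        digest = hashlib.blake2b(flow_id.encode(), digest_size=8,
                                 salt=salt.encode()).digest()
        return int.from_bytes(digest, "big")

    def insert(self, flow_id):
        """Record one packet and return the estimated size of its flow."""
        fp = self._hash(flow_id, "fp") & 0xFFFF
        estimate = 0
        for j, row in enumerate(self.arrays):
            bucket = row[self._hash(flow_id, f"row{j}") % self.w]
            if bucket[1] == 0:                  # empty bucket: claim it
                bucket[0], bucket[1] = fp, 1
            elif bucket[0] == fp:               # same flow: count the packet
                bucket[1] += 1
            elif self.rng.random() < self.b ** -bucket[1]:
                bucket[1] -= 1                  # other flow: decay its count
                if bucket[1] == 0:              # decayed away: take over
                    bucket[0], bucket[1] = fp, 1
            if bucket[0] == fp:
                estimate = max(estimate, bucket[1])
        return estimate


class TopK:
    """HeavyKeeper plus a min-heap holding the k largest flows seen."""

    def __init__(self, k, sketch):
        self.k, self.sketch = k, sketch
        self.heap = []    # min-heap of [size, flow_id]; the root is smallest
        self.entry = {}   # flow_id -> its [size, flow_id] entry in the heap

    def _push(self, flow_id, size):
        item = [size, flow_id]
        self.entry[flow_id] = item
        heapq.heappush(self.heap, item)

    def process(self, flow_id):
        estimate = self.sketch.insert(flow_id)
        if flow_id in self.entry:
            item = self.entry[flow_id]
            item[0] = max(estimate, item[0])
            heapq.heapify(self.heap)            # restore heap order, O(k)
        elif len(self.heap) < self.k:
            if estimate > 0:
                self._push(flow_id, estimate)
        elif estimate > self.heap[0][0]:
            _, evicted = heapq.heappop(self.heap)
            del self.entry[evicted]
            self._push(flow_id, estimate)

    def query(self):
        """Report the recorded flows, largest first, as (flow_id, size)."""
        return sorted(((f, s) for s, f in self.heap), key=lambda x: -x[1])
```

A caller would create TopK(k, HeavyKeeper()), call process once per packet with the packet's flow ID, and call query at any time. A production version would update the heap entry in place in O(log k) rather than re-heapifying.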
Software Requirements
Operating System : Windows XP/2003 or Linux (Any OS)
User Interface : HTML, CSS
Client-side Scripting : JavaScript
Programming Language : Java
Web Applications : JDBC, Servlets, JSP
IDE/Workbench : MyEclipse 8.6
Database : Oracle 11g
Server Deployment : Tomcat 7.0

Hardware Requirements (Minimum)

Processor : Intel Core i3 or above
Hard Disk : 500GB or more
RAM : 8GB or more

Conclusion:
Finding the top-k elephant flows is a critical task for network traffic measurement. Existing
algorithms for finding top-k flows cannot achieve high precision when traffic speed is high and
memory usage is small. In this paper, we propose a novel data structure, called HeavyKeeper,
which achieves a much higher precision on top-k queries and a much lower error rate on flow
size estimation, compared to previous algorithms. The key idea of HeavyKeeper is that it
intelligently omits mouse flows, and focuses on recording the information of elephant flows by
using the exponential-weakening decay strategy.
