HeavyKeeper: An Accurate Algorithm for Finding Top-K Elephant Flows
Existing system:
Traditional solutions for finding the top-k flows follow two basic strategies: count-all and
admit-all-count-some. The count-all strategy relies on a sketch to measure the sizes of all flows
and uses a min-heap to keep track of the top-k flows. For each incoming packet, it records the
packet in the sketch and retrieves from the sketch an estimate n̂_i of the size of the flow f_i
that the packet belongs to. If n̂_i is larger than the smallest flow size in the min-heap, it
replaces the smallest flow in the heap with flow f_i. Because a large sketch is needed to count
all flows, these solutions are not memory efficient. The admit-all-count-some strategy is adopted
by Frequent, Lossy Counting, Space-Saving, and CSS. These algorithms are similar to each other. To
save memory, Space-Saving maintains a data structure called Stream-Summary that counts only some
flows (e.g., m flows). Each new flow is inserted into the summary, replacing the smallest
existing flow. The initial size of the new flow is set to n̂_min + 1, where n̂_min is the size of
the smallest flow in the summary. By keeping m flows in the summary, the algorithm reports the
largest k flows among them, where m > k. It assumes that every new incoming flow is an elephant
flow and expels the smallest flow in the summary to make room for the new one. In practice,
however, most flows are mouse flows.
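The admit-all-count-some behaviour described above can be illustrated with a minimal Java sketch. It is not the reference Space-Saving implementation: the class name SpaceSavingSketch is hypothetical, and a plain HashMap with a linear scan stands in for the Stream-Summary structure to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the admit-all-count-some idea (Space-Saving style).
// Assumption: a HashMap plus a linear scan replaces the Stream-Summary.
class SpaceSavingSketch {
    private final int m;                                   // flows tracked (m > k)
    private final Map<String, Long> counts = new HashMap<>();

    SpaceSavingSketch(int m) {
        this.m = m;
    }

    // Process one packet belonging to the given flow.
    void insert(String flowId) {
        Long current = counts.get(flowId);
        if (current != null) {                             // flow already tracked
            counts.put(flowId, current + 1);
            return;
        }
        if (counts.size() < m) {                           // free slot available
            counts.put(flowId, 1L);
            return;
        }
        // Summary is full: expel the smallest flow and admit the new one
        // with an initial size of n_min + 1.
        String minFlow = null;
        long minCount = Long.MAX_VALUE;
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (e.getValue() < minCount) {
                minCount = e.getValue();
                minFlow = e.getKey();
            }
        }
        counts.remove(minFlow);
        counts.put(flowId, minCount + 1);
    }

    // Estimated size of a flow, or 0 if the flow is not tracked.
    long estimate(String flowId) {
        return counts.getOrDefault(flowId, 0L);
    }
}
```

With m = 2, a flow that sends a single packet after the summary is full is still admitted and inherits the evicted flow's count plus one, which is the overestimation the next section refers to.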
Disadvantages:
The assumption that every new flow is an elephant flow causes significant error, especially under
tight memory (that is, when m is small).
Proposed system:
We propose a new algorithm, HeavyKeeper, which uses a strategy called count-with-exponential-decay,
similar to one introduced in earlier work. It keeps all elephant flows while drastically reducing
the space wasted on mouse flows. HeavyGuardian can handle five different tasks, but these do not
include top-k elephant flow detection, whereas the proposed algorithm focuses solely on finding
top-k elephant flows. HeavyKeeper uses multiple arrays and thus scales well, while HeavyGuardian
does not. Unlike count-all, our strategy keeps track of only a small number of flows. Unlike
admit-all-count-some, we do not automatically admit new flows into our data structure, so the
vast majority of mouse flows are bypassed. The small number of mouse flows that do enter our data
structure decay away to make room for true elephants. The decay is not uniform across the flows
in our data structure: the exponential decay is designed to be biased against small flows and has
a smaller impact on larger flows.
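The bias against small flows can be made concrete with a short Java sketch. The text above does not give the decay rule, so the form used here is an assumption: a colliding packet decrements a counter of value C with probability b^(-C), for some base b slightly larger than 1. The class and method names are hypothetical.

```java
// Illustrative sketch of a size-biased decay rule.
// Assumption: the decay probability has the form b^(-C); this form and any
// concrete value of b are not stated in the surrounding text.
class DecayRule {
    // Probability that one colliding packet decrements a counter of value c.
    static double decayProbability(double b, long c) {
        return Math.pow(b, -c);
    }

    // Expected number of colliding packets needed to drive a counter from c
    // down to 0: the step away from value v takes b^v packets on average.
    static double expectedPacketsToEvict(double b, long c) {
        double total = 0.0;
        for (long v = 1; v <= c; v++) {
            total += Math.pow(b, v);
        }
        return total;
    }
}
```

Under this assumed rule with b = 1.08, a counter of 10 is expected to be worn down by roughly 16 colliding packets, while a counter of 100 needs on the order of 30,000, so small flows decay away quickly and large flows are barely affected.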
Advantages:
This design works extremely well with real traffic traces under small memory.
Modules:
The HeavyKeeper: HeavyKeeper consists of d arrays, and each array consists of w buckets. Each
bucket has two fields: a fingerprint field and a counter field.
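The layout above can be sketched in Java as follows. Only the d-by-w bucket structure with fingerprint and counter fields comes from the text; the insertion rules, the hash mixing, the use of the flow ID's hash code as fingerprint, and the decay probability b^(-C) are assumptions made for illustration, and the class name HeavyKeeperSketch is hypothetical.

```java
import java.util.Random;

// Minimal sketch of d arrays of w buckets, each holding a fingerprint and a
// counter. Assumption: insertion follows a count-with-exponential-decay rule
// with decay probability b^(-C); these details are not given in the text.
class HeavyKeeperSketch {
    private final int d;
    private final int w;
    private final double b;
    private final int[][] fingerprints;
    private final long[][] counters;
    private final int[] seeds;                 // one hash seed per array
    private final Random rng;

    HeavyKeeperSketch(int d, int w, double b, long seed) {
        this.d = d;
        this.w = w;
        this.b = b;
        this.fingerprints = new int[d][w];
        this.counters = new long[d][w];
        this.rng = new Random(seed);
        this.seeds = new int[d];
        for (int j = 0; j < d; j++) {
            seeds[j] = rng.nextInt();
        }
    }

    // Map a flow to one bucket of array j (simple illustrative hash mixing).
    private int bucketIndex(String flowId, int j) {
        int h = flowId.hashCode() ^ seeds[j];
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return Math.floorMod(h, w);
    }

    // Record one packet of the flow and return its estimated size.
    long insert(String flowId) {
        int fp = flowId.hashCode();
        long estimate = 0;
        for (int j = 0; j < d; j++) {
            int i = bucketIndex(flowId, j);
            if (counters[j][i] == 0) {                     // empty bucket: take it
                fingerprints[j][i] = fp;
                counters[j][i] = 1;
            } else if (fingerprints[j][i] == fp) {         // same flow: count up
                counters[j][i]++;
            } else if (rng.nextDouble() < Math.pow(b, -counters[j][i])) {
                counters[j][i]--;                          // other flow: decay
                if (counters[j][i] == 0) {                 // decayed away: replace
                    fingerprints[j][i] = fp;
                    counters[j][i] = 1;
                }
            }
            if (fingerprints[j][i] == fp) {
                estimate = Math.max(estimate, counters[j][i]);
            }
        }
        return estimate;
    }

    // Estimated size of a flow without modifying the structure.
    long estimate(String flowId) {
        int fp = flowId.hashCode();
        long best = 0;
        for (int j = 0; j < d; j++) {
            int i = bucketIndex(flowId, j);
            if (fingerprints[j][i] == fp) {
                best = Math.max(best, counters[j][i]);
            }
        }
        return best;
    }
}
```

Taking the maximum over the d arrays is a design choice of this sketch: a flow that lost its bucket in one array can still be reported from another array where it holds the bucket.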
Basic Version for Finding Top-k Elephant Flows
To find the top-k elephant flows, our basic version uses just a HeavyKeeper and a min-heap. The
min-heap stores the IDs and sizes of the top-k flows. For each incoming packet P_l belonging to
flow f_i, we first insert it into HeavyKeeper. Suppose that HeavyKeeper reports the size of f_i
as n̂_i. If f_i is already in the min-heap, we update its estimated flow size to
max(n̂_i, min_heap[f_i]), where min_heap[f_i] is the size of f_i recorded in the min-heap.
Otherwise, if n̂_i is larger than the smallest flow size, which is held in the root node of the
min-heap, we expel the root node from the min-heap and insert f_i with n̂_i into the min-heap. To
query the top-k flows, we simply report the k flows in the min-heap with their estimated flow
sizes.
Query top-k flows: It reports the k flows recorded in the min-heap and their estimated flow
sizes.
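The min-heap update and query steps can be sketched in Java as below. To keep the example self-contained, the size estimator is passed in as a function that records a packet and returns the flow's estimated size; any sketch with that behaviour could be plugged in. The class name TopKTracker is hypothetical, and admitting flows directly while the heap holds fewer than k entries is an assumption not spelled out in the text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.function.ToLongFunction;

// Minimal sketch of the top-k bookkeeping with a min-heap.
class TopKTracker {
    private static final class Entry {
        final String flowId;
        long size;
        Entry(String flowId, long size) {
            this.flowId = flowId;
            this.size = size;
        }
    }

    private final int k;
    // Records one packet in the underlying sketch and returns the estimate.
    private final ToLongFunction<String> insertAndEstimate;
    private final Map<String, Entry> byId = new HashMap<>();
    private final PriorityQueue<Entry> minHeap =
            new PriorityQueue<Entry>((x, y) -> Long.compare(x.size, y.size));

    TopKTracker(int k, ToLongFunction<String> insertAndEstimate) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
        this.insertAndEstimate = insertAndEstimate;
    }

    // Process one packet of the given flow.
    void onPacket(String flowId) {
        long estimate = insertAndEstimate.applyAsLong(flowId);
        Entry existing = byId.get(flowId);
        if (existing != null) {
            // Already in the heap: keep the larger of the two sizes.
            minHeap.remove(existing);
            existing.size = Math.max(estimate, existing.size);
            minHeap.add(existing);
        } else if (minHeap.size() < k) {
            admit(flowId, estimate);           // assumption: fill the heap first
        } else if (estimate > minHeap.peek().size) {
            Entry root = minHeap.poll();       // expel the smallest flow
            byId.remove(root.flowId);
            admit(flowId, estimate);
        }
    }

    private void admit(String flowId, long size) {
        Entry e = new Entry(flowId, size);
        byId.put(flowId, e);
        minHeap.add(e);
    }

    // Flow IDs currently in the heap, largest estimated size first.
    List<String> topK() {
        List<Entry> entries = new ArrayList<>(minHeap);
        entries.sort((x, y) -> Long.compare(y.size, x.size));
        List<String> ids = new ArrayList<>();
        for (Entry e : entries) {
            ids.add(e.flowId);
        }
        return ids;
    }

    // Recorded size of a flow, or 0 if it is not in the heap.
    long sizeOf(String flowId) {
        Entry e = byId.get(flowId);
        return e == null ? 0L : e.size;
    }
}
```

Removing and re-adding an entry on update costs time linear in k with java.util.PriorityQueue; an indexed heap would avoid that, at the price of a longer example.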
Software Requirements
Operating System : Windows XP/2003 or Linux (Any OS)
User Interface : HTML, CSS
Client-side Scripting : JavaScript
Programming Language : Java
Web Applications : JDBC, Servlets, JSP
IDE/Workbench : MyEclipse 8.6
Database : Oracle 11g
Server Deployment : Tomcat 7.0
Conclusion:
Finding the top-k elephant flows is a critical task for network traffic measurement. Existing
algorithms for finding top-k flows cannot achieve high precision when traffic speed is high and
memory usage is small. In this paper, we propose a novel data structure, called HeavyKeeper,
which achieves a much higher precision on top-k queries and a much lower error rate on flow
size estimation, compared to previous algorithms. The key idea of HeavyKeeper is that it
intelligently omits mouse flows, and focuses on recording the information of elephant flows by
using the exponential-weakening decay strategy.