Clickstream Analysis Using Hadoop
Hadoop
ClickStream Analysis
Data
Data is obtained from the site in the form of click-stream
records. Each record describes a click made by a visitor
and contains the following details:
Server IP
Client IP
Timestamp with date
URL visited
Number of bytes transferred
Custom record(s)
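As an illustration of the record layout listed above, here is a minimal Python sketch that parses one record into named fields. The slides do not specify the on-disk format, so the tab-separated layout, the ISO 8601 timestamp, the field order, and the names `ClickRecord` and `parse_record` are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class ClickRecord:
    """One click-stream record with the fields listed in the Data section."""
    server_ip: str
    client_ip: str
    timestamp: datetime
    url: str
    bytes_transferred: int
    custom: List[str] = field(default_factory=list)


def parse_record(line: str) -> ClickRecord:
    """Parse one record.

    Assumed layout (not stated in the source): tab-separated fields in the
    order server IP, client IP, ISO 8601 timestamp, URL, byte count,
    followed by zero or more custom fields.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(parts)}")
    server_ip, client_ip, raw_time, url, raw_bytes = parts[:5]
    return ClickRecord(
        server_ip=server_ip,
        client_ip=client_ip,
        timestamp=datetime.fromisoformat(raw_time),
        url=url,
        bytes_transferred=int(raw_bytes),
        custom=parts[5:],
    )


# Hypothetical sample line, invented for illustration only.
SAMPLE = "10.0.0.1\t203.0.113.7\t2014-03-15T10:22:31\thttp://example.com/a\t512\tref=home"
record = parse_record(SAMPLE)
```

A real log would need the delimiter and timestamp format adjusted to match the actual export.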
Methodology
To run our algorithm on a larger data set, we first need to
understand big data and Hadoop.
What is Big Data?
A data set that has the following three characteristics can be
called big data:
Volume - on the order of terabytes to petabytes
Variety - data in different forms (structured, semi-structured, unstructured)
Velocity - how fast the data arrives (e.g. more than 5,000 tweets
per second)
Hadoop Architecture
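Hadoop follows a master-slave architecture: in HDFS, a master NameNode
keeps the file system metadata, and slave DataNodes store the data blocks.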
Simulation Results
Website Visited/Country
Website Visited/Month
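A count of visits per month, like the result titled above, can be expressed as a map step followed by a reduce step. The sketch below imitates that pattern in plain Python on a single machine; it is not the Hadoop API and not necessarily how the original results were produced. The tab-separated record layout, the timestamp in the third field, and the function names `map_month`, `reduce_counts`, and `visits_per_month` are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Iterator, Tuple


def map_month(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (YYYY-MM, 1) for one click-stream record.

    Assumes tab-separated fields with an ISO 8601 timestamp in the
    third position (an assumption, not stated in the source).
    """
    fields = line.rstrip("\n").split("\t")
    timestamp = datetime.fromisoformat(fields[2])
    yield timestamp.strftime("%Y-%m"), 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the emitted values for each key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def visits_per_month(lines: Iterable[str]) -> Dict[str, int]:
    """Run the map step over every record, then reduce the results."""
    pairs = (pair for line in lines for pair in map_month(line))
    return reduce_counts(pairs)


# Hypothetical sample records, invented for illustration only.
SAMPLE_LINES = [
    "10.0.0.1\t203.0.113.7\t2014-03-15T10:22:31\thttp://example.com/a\t512",
    "10.0.0.1\t198.51.100.4\t2014-03-20T08:01:02\thttp://example.com/b\t2048",
    "10.0.0.1\t203.0.113.7\t2014-04-02T17:45:00\thttp://example.com/a\t512",
]
monthly = visits_per_month(SAMPLE_LINES)
```

On a cluster, the map calls would run in parallel on the nodes holding the data and the framework would group the emitted pairs by key before the reduce step; here both steps run in one process for clarity.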
Conclusion
Future Work
References
http://hadoop.apache.org/
http://www.usa.gov/About/developerresources/1usagovt.shtml
http://www.cloudera.com/content/cloudera/en/about/hadoop-and-big-data.html
Thank You