Real Time Analysis of Log Data Using Data Streaming: Colloquium Presentation

This project is product-oriented work on real-time collection and analysis of online traffic through log data. It has the potential to capture information such as current server traffic, network busy time, or malicious IP addresses trying to consume resources through DDoS attacks. With real-time analysis, a security operations center (SOC) could detect an attack in a matter of minutes. Using RTA, the corporation can have a 360 degree view of their customer’s inte

Uploaded by

Vikas Parmar

Real Time Analysis of Log Data Using Data Streaming
Colloquium Presentation

ABV-Indian Institute of Information Technology and Management, Gwalior

Prakhar Dev Gupta | 2014-IPG-062 | Summer Internship | Toppr


Company’s Profile
• A Product of Toppr Technologies Private Limited.
• Learning app for students studying in classes 5th to 12th and students
appearing for entrance exams and scholarship exams.
• As of December 2017, Toppr has a user base of 2.5 million.
• Motto: Make Learning Personalized.
• Awarded the best educational website by Indian Digital Awards [IAMAI] and
also by AWS Mobility.
• Recognized as one of the Top 10 Hottest Start-ups by CB Insights.
• Helped thousands of students crack JEE.

Footprint of Toppr
Toppr has taken personalised learning to students in Assam, Odisha, Arunachal, J&K, Chhattisgarh, Himachal and many others.

JEE Surveys Favouring Toppr

Problem Statement
• Given the enormous amount of log data carrying different kinds of information, the task is to capture it in real time and analyse it.

• Use cases include detecting malicious IPs flooding the server, busy servers, loyal customers, error rates and error types.

• Suitable follow-up actions should also be automated.

Design Constraints
• Near real time requirement.
• Handle high ingestion rate.
• Cost effective and feasible.
• Quick automated action.
• Streaming data should be secure.

Literature Review
STREAMING DATA

• Data that is generated continuously by thousands of data sources, which typically send their records simultaneously and in small sizes (on the order of kilobytes).
• Needs to be processed sequentially and incrementally, on a record-by-record basis or over sliding time windows.

AMAZON WEB SERVICES

• Amazon Web Services (AWS) is a secure cloud services platform.
• Offers compute power, database storage, content delivery and other functionality to help businesses scale and grow.

AWS Services Used

• AWS Kinesis
• EC2
• Lambda
• CloudWatch


AWS Kinesis
• A massively scalable and durable real-time data streaming service from AWS.
• Consists of shards, which hold the data and are the basic scaling unit of the stream.
• Data read by the Kinesis Agent is fed into this stream in near real time.
• The Analytics stream consists of:
    • Input
    • Application code
    • Output
• Can also trigger an AWS Lambda function for further steps.

Solution Approach
TASK 1: Creating EC2 instance

• Amazon Linux AMI 2018.03.0 (HVM), SSD volume type, EBS-backed.
• Update Java and Python on this instance and install the Kinesis Agent.

TASK 2: Configure the Kinesis Agent

• Provide suitable access keys, region, and endpoint ARN.
• The agent reads the log data and puts it into the stream with the given ARN.
• The configuration is kept in a JSON file.
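
As a rough illustration of Task 2, the sketch below builds a Kinesis Agent configuration in Python and prints it as JSON. The log path, stream name and region are hypothetical placeholders, not values from the project. Note that, to my understanding, the agent's flow entry identifies the target stream through its `kinesisStream` name; credentials are left out here on the assumption that they come from the instance's IAM role or from separate key fields.

```python
import json


def build_agent_config(log_path, stream_name, region="us-east-1"):
    """Return a Kinesis Agent configuration as a plain dict.

    All argument values are illustrative assumptions; adapt them to the
    actual log location, stream and region.
    """
    return {
        # Let the agent publish its own health metrics to CloudWatch.
        "cloudwatch.emitMetrics": True,
        # Regional Kinesis Data Streams endpoint.
        "kinesis.endpoint": f"https://kinesis.{region}.amazonaws.com",
        # One flow: tail files matching the pattern and push to the stream.
        "flows": [
            {"filePattern": log_path, "kinesisStream": stream_name}
        ],
    }


# Hypothetical values for illustration only.
config = build_agent_config("/var/log/httpd/access_log*", "log-input-stream")
print(json.dumps(config, indent=2))
```

The printed JSON would typically be saved as the agent's configuration file (by default /etc/aws-kinesis/agent.json) before the agent is started.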

Solution Approach

TASK 3: Configure the Kinesis Data Stream

• Using the AWS CLI, a new stream is created with multiple shards.
• The ARN of this stream is used by the agent to discover it.
• The data from the log file is read by the agent and put into this stream.
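
The slides do not say how the number of shards was chosen. A minimal sketch of one way to estimate it is shown below, assuming the commonly documented per-shard write limits of about 1 MB per second and 1,000 records per second; the function name and the sample figures are hypothetical.

```python
import math

# Assumed per-shard write limits (check the current AWS quotas before use).
MAX_RECORDS_PER_SEC = 1000
MAX_BYTES_PER_SEC = 1_000_000


def shards_needed(records_per_sec, avg_record_bytes):
    """Estimate the shard count needed to absorb a given ingestion rate."""
    by_count = records_per_sec / MAX_RECORDS_PER_SEC
    by_bytes = records_per_sec * avg_record_bytes / MAX_BYTES_PER_SEC
    # The tighter of the two limits decides; a stream needs at least 1 shard.
    return max(1, math.ceil(max(by_count, by_bytes)))


# Hypothetical load: 2,500 log lines per second, 200 bytes each.
print(shards_needed(2500, 200))
```

The resulting number could then be passed to the CLI, for example `aws kinesis create-stream --stream-name <name> --shard-count <n>`.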

TASK 4: Configure the Kinesis Analytics Stream

• The fake Apache log going into the KDS is analyzed over a sliding time window.
• A new KAS application is created with its source and destination set.
• KAS allows writing SQL-like queries over the streaming data.
• Provides an output stream as well as an error stream.
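
To make the input concrete, here is a small sketch of how one Apache access-log line can be split into the fields that the analysis relies on (client IP, timestamp, status code). It assumes the Common Log Format described in the Apache documentation; the regular expression and field names are my own and not taken from the project.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Parse one access-log line into a dict, or return None if malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    status = int(match.group("status"))
    return {
        "ip": match.group("ip"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "request": match.group("request"),
        "status": status,
        # Treat 4xx and 5xx responses as errors for error-rate counting.
        "is_error": status >= 400,
    }


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(parse_line(sample))
```

Counting records by `ip` or by `is_error` within a time window then gives the per-IP request counts and error counts that the analytics queries compute.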

Solution Approach

TASK 5: Catching the Output Stream

• Here we chose another KDS to hold the output data.
• The new KDS has the Analytics Stream as its source.

TASK 6: Configure a Lambda Function

• The Lambda function is configured to be triggered when the batch of data in the KDS exceeds 10 records.
• Uses the Python Boto3 library to trigger automated services such as CloudWatch metric emission.
• The Lambda function sends a notification email to the network administrator when an IP is detected to be sending malicious requests.
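
A minimal sketch of what such a Lambda function might look like is given below. It is not the project's code: the threshold, metric namespace, metric name and email addresses are hypothetical, and it assumes the analytics output arrives as JSON records with `IP` and `COUNT_VAL` fields. Kinesis delivers each record to Lambda base64-encoded under `Records[].kinesis.data`.

```python
import base64
import json

REQUEST_THRESHOLD = 100  # hypothetical per-window request limit


def find_offenders(event, threshold=REQUEST_THRESHOLD):
    """Return (ip, count) pairs whose request count exceeds the threshold."""
    offenders = []
    for record in event.get("Records", []):
        # Kinesis payloads reach Lambda base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        row = json.loads(payload)  # assumes JSON output from the analytics app
        if row["COUNT_VAL"] > threshold:
            offenders.append((row["IP"], row["COUNT_VAL"]))
    return offenders


def lambda_handler(event, context):
    import boto3  # imported lazily so find_offenders can be tested offline

    offenders = find_offenders(event)
    if not offenders:
        return {"offenders": 0}

    # Emit a custom CloudWatch metric (namespace and name are assumptions).
    boto3.client("cloudwatch").put_metric_data(
        Namespace="LogAnalysis",
        MetricData=[
            {"MetricName": "SuspiciousIPs", "Value": len(offenders), "Unit": "Count"}
        ],
    )

    # Notify the administrator through SES (addresses are placeholders).
    body = "\n".join(f"{ip}: {count} requests" for ip, count in offenders)
    boto3.client("ses").send_email(
        Source="alerts@example.com",
        Destination={"ToAddresses": ["admin@example.com"]},
        Message={
            "Subject": {"Data": "Suspicious request volume detected"},
            "Body": {"Text": {"Data": body}},
        },
    )
    return {"offenders": len(offenders)}
```

Keeping the detection logic in `find_offenders` separate from the AWS calls makes it possible to check the thresholding locally with a hand-built event before deploying.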

Flow Diagram of Project
EC2 with Kinesis Agent -> Kinesis Data Stream 1 -> Kinesis Analytics Stream -> Kinesis Data Stream 2 -> Lambda Function -> CloudWatch Metrics, Simple Email Service

Application code of the Kinesis Analytics Stream:

CREATE OR REPLACE STREAM "REQ_COUNT" (IP varchar(20), COUNT_VAL integer);
CREATE OR REPLACE PUMP "VAL_PUMP" AS
INSERT INTO "REQ_COUNT"
SELECT STREAM IP, COUNT(IP) AS COUNT_VAL
FROM "SOURCE_SQL_STREAM_001"
GROUP BY "IP",
STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '5' MINUTE);
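
The STEP ... BY INTERVAL clause in the query above rounds each row's ROWTIME down to a 5-minute boundary, so requests are counted per IP within fixed 5-minute buckets. The Python sketch below mimics that grouping on an in-memory list purely for illustration; the function name and the sample events are hypothetical.

```python
from collections import Counter

WINDOW_SECONDS = 5 * 60  # matches INTERVAL '5' MINUTE in the query


def count_requests_per_window(events, window_seconds=WINDOW_SECONDS):
    """Group (epoch_seconds, ip) events into fixed windows and count per IP.

    Returns a dict mapping each window's start time to a Counter of IPs.
    """
    windows = {}
    for timestamp, ip in events:
        # Round the timestamp down to the start of its window, as STEP does.
        window_start = int(timestamp) - int(timestamp) % window_seconds
        windows.setdefault(window_start, Counter())[ip] += 1
    return windows


# Hypothetical events: three requests in the first window, one in the next.
events = [(0, "10.0.0.1"), (10, "10.0.0.1"), (299, "10.0.0.2"), (300, "10.0.0.1")]
for start, counts in sorted(count_requests_per_window(events).items()):
    print(start, dict(counts))
```

Unlike this batch version, the streaming query emits each window's counts as soon as the window closes, which is what allows the downstream Lambda function to react in near real time.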

Results
• The model was first tested on fake Apache log data, where it produced accurate results in finding:
    • The number of errors in every 5-minute window.
    • The number of requests made by each IP per 5-minute window.
• Suitable Lambda functions were triggered as per the threshold conditions.
• The model later went through rigorous QA, where it was tested on real data for correctness.
• The delays per link were within the tolerance limits.

Conclusion
• The main objectives of this internship were met and all the given requirements were fulfilled.
• The knowledge acquired through continuous learning at the institute helped in implementing the required tasks.
• The internship helped me learn to apply IT solutions to real-world problems.
• The project passed DevTesting, Quality Analysis and Sanity tests with satisfactory results.
• The project is deployed and working.

References
• AWS Documentation: https://docs.aws.amazon.com/index.html#lang/en_us
• Perform sliding window queries on streaming data: https://sqlstream.com/platform/kinesis/
• Fake Apache log document: https://httpd.apache.org/docs/2.4/logs.html
• Code and method references: https://github.com/awslabs/
• HTTP status codes: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status
• About streaming data: https://aws.amazon.com/streaming-data/
• Producer and Consumer scripts: https://www.arundhaj.com/blog/getting-started-kinesis-python.html
• Understanding DoS attacks: https://www.digitalattackmap.com/understanding-ddos/

THANK YOU
