COMP 593 - Lab 4 - Gateway Log Investigation
TABLE OF CONTENTS
Learning Objectives
Introduction
Instructor Notes
Instructions
Step 1: Download the Gateway Firewall Log
Step 2: Get the Script Templates from D2L
Step 3: Create Function that Gets the Log File Path
Step 4: Inspect the Function that Filters Log Message Records
Step 5: Investigate the Gateway Firewall Log
Step 6: Extract Data from Records in the Log File
Step 7: Create Function that Tallies Traffic by Port
Step 8: Create Function that Generates Destination Port Reports
Step 9: Generate Destination Port Reports
Step 10: Create Function that Generates Invalid User Report
Step 11: Create Function that Extracts and Saves Source IP Records
Dropbox Submission
Assessment
COMP 593
Lab 4: Gateway Log Investigation
LEARNING OBJECTIVES
Upon completion of this lab assignment, students should be able to:
• Write a Python script that processes a text file line by line
• Incorporate regex matching operations into a Python script using the functions and
classes defined in the re module, including:
o Matching text in a string using a regex pattern,
o Extracting text from a string using regex capture groups,
o Replacing text in a string using a regex pattern to locate the text to be replaced.
• Generate a CSV file using DataFrame class methods from the pandas module.
• Generate a plain text file using DataFrame class methods from the pandas module.
• Discuss the purpose and format of a function docstring.
INTRODUCTION
This lab assignment simulates a computer security investigation that involves developing a
Python script to locate and extract information from a log file produced by a network gateway
firewall.
A gateway is a node that connects two networks together, serving as an entry and exit point
that all data passes through before being routed to its destination. An example network
architecture that incorporates a gateway is depicted in the figure below.
A firewall is a network security device that monitors and filters incoming and outgoing network
traffic based on established security policies. Some network gateways have a built-in firewall.
Firewalls can typically be configured to record a log of all network traffic that passes through them,
or only traffic that meets some specific criteria, e.g., record all network traffic that is blocked by
the firewall. Such a log may be saved as a plain text file, where key information about network
traffic is recorded, such as source and destination IP addresses, which can be useful for
detecting and tracking malicious attacks on the network from external sources.
Firewall logs may contain a large amount of data, making them impractical to search through
manually. For this reason, some type of software tool is typically needed to facilitate a timely
and accurate investigation. In some cases, e.g., when the log uses a CSV format, it may be
sufficient to open the log file in a spreadsheet application and use its built-in search and
filtering capabilities to locate records of interest; however, the log file may not be formatted in
a way that allows the spreadsheet application to automatically separate values into individual
columns, which severely limits its filtering capabilities. Another approach could involve using a
log analysis tool, but this would involve learning how to use the tool and may be overkill for a
simple log investigation.
Another approach is to write a Python script that processes the log file line by line and uses
regular expressions to locate and extract information of interest. This approach provides
endless possibilities for text searching and report generation and can be tailored to be as simple
or complex as needed for the task at hand. This is the log analysis approach that students will
perform for this lab assignment.
INSTRUCTOR NOTES
The instructor will guide students through implementing steps 1-9.
Students will complete steps 10 and 11 on their own, using the code implemented in the
previous steps for reference.
INSTRUCTIONS
STEP 3: CREATE FUNCTION THAT GETS THE LOG FILE PATH
Working as a group led by the lab instructor, define a function that gets a log file path from any
command line parameter in the file log_analysis_lib.py. The function must:
• Accept one integer parameter: The number of the command line parameter from which
to get the log file path (1 = argv[1], 2 = argv[2], etc.)
• Return the full path of the log file
• Output a descriptive error message and abort script execution if no command line
parameter is provided
• Output a descriptive error message and abort script execution if the command line
parameter is not the path of an existing file
Call this function from main() and assign its return value to a local variable.
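The requirements above could be sketched as follows. The function name get_file_path_from_cmd_line and the wording of the error messages are assumptions made for illustration; the version built during the lab may differ.

```python
import os
import sys


def get_file_path_from_cmd_line(param_num):
    """Return the full path of the file named by a command line parameter.

    Args:
        param_num (int): Number of the command line parameter that holds
            the file path (1 = argv[1], 2 = argv[2], etc.).

    Returns:
        str: Absolute path of the file.
    """
    # Abort if the requested command line parameter was not provided
    if len(sys.argv) <= param_num:
        print(f'Error: Missing file path (command line parameter {param_num}).')
        sys.exit(1)

    # Abort if the parameter is not the path of an existing file
    file_path = os.path.abspath(sys.argv[param_num])
    if not os.path.isfile(file_path):
        print(f'Error: "{file_path}" is not the path of an existing file.')
        sys.exit(1)

    return file_path
```

Inside main(), a call such as log_path = get_file_path_from_cmd_line(1) would then assign the path to a local variable.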
STEP 5: INVESTIGATE THE GATEWAY FIREWALL LOG
Charley, the network administrator, believes somebody has been attempting to break into the
secure shell (SSH) server, so let's use the filter_log_by_regex() function to print all records
that case-insensitively match the regex 'sshd'. The script output should be as
shown below.
PowerShell
It looks like someone is running through a list of common usernames to try to sign in, so let's
use the function to print all records that case-insensitively match the regex 'invalid user'. The
script output should be as shown below.
PowerShell
It looks like all those invalid login attempts are coming from the same IP address, but it would
be a good idea to confirm that all 318 of them are in fact from the same IP address. Let's check
that by printing all records that case-insensitively match the regex 'invalid user.*220.195.35.40'.
The script output should be as shown below.
PowerShell
That confirms that all invalid login attempts are coming from the same IP address. We should
let Charley know about it so the gateway firewall can be configured to block all network traffic
from that IP address.
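The searches above all follow one pattern: read the log line by line and keep the records that match a regex, ignoring case. A minimal stand-in for that idea is sketched below. Note that filter_lines_by_regex is a hypothetical helper, not the filter_log_by_regex() function from the script template, whose signature is not shown here, and the sample records are invented. Because an unescaped dot in a regex matches any character, the sketch uses re.escape() so that the IP address is matched literally.

```python
import re


def filter_lines_by_regex(lines, regex, ignore_case=True):
    """Return the lines that contain a match for the regex."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(regex, flags)
    return [line for line in lines if pattern.search(line)]


# Invented sample records, loosely modelled on sshd log messages
sample_lines = [
    'Jan 29 07:01:12 gateway sshd[1001]: Invalid user admin from 220.195.35.40',
    'Jan 29 07:01:15 gateway sshd[1002]: Invalid user guest from 220.195.35.40',
    'Jan 29 07:02:00 gateway sshd[1003]: Accepted password for charley from 192.168.1.5',
]

# Build the regex with the dots in the IP address escaped
ip_regex = 'invalid user.*' + re.escape('220.195.35.40')
matches = filter_lines_by_regex(sample_lines, ip_regex)
print(f'{len(matches)} of {len(sample_lines)} sample records matched.')
```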
Now let's check if the log contains any error messages by printing all records that
case-insensitively match the regex 'error'. The script output should be as shown below.
PowerShell
We should let Charley know about those SFW2-OUT-ERROR logs, but they don't seem to be of
any security concern.
Those authentication failures look interesting though, so let's print all records that
case-insensitively match the regex 'pam'. The script output should be as shown below.
PowerShell
Based on those timestamps showing successful logins shortly after the failures, it looks like
Charley has a habit of mistyping his password when trying to log in from his Windows 95
machine. No security concerns here, just run-of-the-mill typos!
STEP 6: EXTRACT DATA FROM RECORDS IN THE LOG FILE
List of tuples:
[('24.64.208.134', '216.58.112.55', '512'), ← Data extracted from first record
('24.64.208.134', '216.58.112.55', '512'), ← Data extracted from second record
('24.64.208.134', '216.58.112.55', '512'), ← Data extracted from third record
... ← Data omitted for brevity
('192.168.17.24', '192.168.10.60', '235'), ← Data extracted from third last record
('192.168.17.24', '192.168.9.51', '204'), ← Data extracted from second last record
('192.168.17.24', '192.168.10.60', '204')] ← Data extracted from last record
A list of tuples is a commonly used data structure that can easily be converted into a DataFrame
object, which opens up a lot of data analysis possibilities using the pandas package.
Use the filter_log_by_regex() function and the above regex with capturing groups to extract
the source IP address, destination IP address, and length from each matching record in the
gateway log. Convert the extracted information into a DataFrame object and save it as a CSV
file as shown below.
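One way this step might look is sketched below. The regex used in the lab is not reproduced in this section, so the pattern here assumes that records carry SRC=, DST=, and LEN= fields in that order. The sample records, the column names, and the filename extracted_data.csv are likewise placeholders chosen for illustration.

```python
import re

import pandas as pd

# Assumed record layout: ... SRC=<ip> DST=<ip> LEN=<number> ...
RECORD_REGEX = r'SRC=(\S+) DST=(\S+) LEN=(\d+)'

# Invented sample records for illustration
sample_lines = [
    'kernel: SFW2-INext-DROP IN=eth0 SRC=24.64.208.134 DST=216.58.112.55 LEN=512 TTL=113',
    'kernel: SFW2-INext-DROP IN=eth0 SRC=192.168.17.24 DST=192.168.10.60 LEN=204 TTL=64',
    'sshd[1001]: Invalid user admin from 220.195.35.40',
]

# Each matching record contributes one tuple of captured groups
extracted = []
for line in sample_lines:
    match = re.search(RECORD_REGEX, line, re.IGNORECASE)
    if match:
        extracted.append(match.groups())

# Convert the list of tuples into a DataFrame and save it as a CSV file
df = pd.DataFrame(extracted, columns=['Source IP', 'Destination IP', 'Length'])
df.to_csv('extracted_data.csv', index=False)
```

Passing index=False keeps the DataFrame row numbers out of the CSV file, so the file contains only the header row and the extracted values.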
STEP 8: CREATE FUNCTION THAT GENERATES DESTINATION PORT REPORTS
Define a function that generates a CSV file containing the information described above for a
specified destination port number that is extracted from a specified log file. The function must:
• Accept the following parameters:
o Log file path
o Destination port number
• Generate a CSV file containing the information described above
• Save the CSV file in the same directory in which the script resides under the filename
destination_port_{number}_report.csv, where {number} is the destination port
number.
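The filename and location requirements above can be handled separately from the report contents. A small sketch, assuming a hypothetical helper named get_port_report_path that is given the path of the script file:

```python
from pathlib import Path


def get_port_report_path(port_number, script_file):
    """Return the report path for a port, in the same directory as the script."""
    script_dir = Path(script_file).resolve().parent
    return script_dir / f'destination_port_{port_number}_report.csv'
```

Within the script itself, a call such as get_port_report_path(22, __file__) would yield a path ending in destination_port_22_report.csv in the script's own directory, regardless of the current working directory.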
STEP 10: CREATE FUNCTION THAT GENERATES INVALID USER REPORT
Define a function that generates the report described above using information extracted from a
specified log file. The function must:
• Accept the log file path as a parameter
• Generate a CSV file containing the information described above
• Save the CSV file in the same directory in which the script resides under the filename
invalid_users.csv.
Call the function from main() to generate the file.
Hint: This function will be very similar to the function created in step 9.
STEP 11: CREATE FUNCTION THAT EXTRACTS AND SAVES SOURCE IP RECORDS
To further investigate the invalid user logins, Charley wants a plain text .txt file that contains all
records from the log file that contain the source (SRC) IP address 220.195.35.40. The first few
rows of the plain text report should look like this.
Define a function that generates the plain text .txt file described above using information
extracted from a specified log file. Just in case Charley asks for a similar file for another source
IP address in the future, let's make the source IP address a parameter.
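As a rough starting point, the sketch below copies matching records using plain file I/O rather than DataFrame methods, so it illustrates the filtering logic only. The function name save_source_ip_records and the output_path parameter are assumptions; the required report filename is not reproduced in this section. The sketch also assumes that the source address appears in each record as SRC=<ip>.

```python
import re


def save_source_ip_records(log_path, ip_address, output_path):
    """Copy every record whose SRC field equals ip_address to a text file.

    Returns the number of records written.
    """
    # re.escape() makes the dots literal; (?!\d) prevents 1.2.3.4 from
    # also matching a longer address such as 1.2.3.40
    pattern = re.compile(r'SRC=' + re.escape(ip_address) + r'(?!\d)')
    count = 0
    with open(log_path, 'r') as log_file, open(output_path, 'w') as out_file:
        for record in log_file:
            if pattern.search(record):
                out_file.write(record)
                count += 1
    return count
```

Making the IP address a parameter means the same function can serve a later request for a different source address without modification.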
DROPBOX SUBMISSION
Submit the URL of the GitHub repository that contains your script files, e.g.,
https://github.com/BobLoblaw/COMP593-Lab4
ASSESSMENT
Item                   Out Of   Assessment Criteria
GitHub                 2        • GitHub repository is private, instructor added as a collaborator
                                • Repository contains both .py files
                                • Repository uses a .gitignore file to exclude .csv files, .txt files, and the __pycache__ folder
Collaborative Portion  2        • Script implemented as explained by instructor and described in steps 2-9
Invalid User Report    3        • Function defined and called as described in step 10
                                • Report contains required information and format
                                • Report filename as required
Source IP Log          3        • Function defined and called as described in step 11
                                • Report contains required information and format
                                • Report filename as specified
Total                  10