COMP 593 - Lab 4 - Gateway Log Investigation

The document provides instructions for a lab assignment to analyze a gateway firewall log file using Python. Students will download a sample log file, get script templates, define functions to get the log file path and filter log records, investigate the log by extracting data using regex, generate reports tallying traffic by port and listing invalid users, and save extracted source IP records. The goal is to practice processing text files line by line in Python using regex for tasks like security investigations.

Lab 4: Gateway Log Investigation

Photo by FLY:D on Unsplash


Free to use under the Unsplash License

COMP 593 - Scripting Applications


TABLE OF CONTENTS
Learning Objectives
Introduction
Instructor Notes
Instructions
    Step 1: Download the Gateway Firewall Log
    Step 2: Get the Script Templates from D2L
    Step 3: Create Function that Gets the Log File Path
    Step 4: Inspect the Function that Filters Log Message Records
    Step 5: Investigate the Gateway Firewall Log
    Step 6: Extract Data from Records in the Log File
    Step 7: Create Function that Tallies Traffic by Port
    Step 8: Create Function that Generates Destination Port Reports
    Step 9: Generate Destination Port Reports
    Step 10: Create Function that Generates Invalid User Report
    Step 11: Create Function that Extracts and Saves Source IP Records
Dropbox Submission
Assessment

LEARNING OBJECTIVES
Upon completion of this lab assignment, students should be able to:
• Write a Python script that processes a text file line by line
• Incorporate regex matching operations into a Python script using the functions and
classes defined in the re module, including:
o Matching text in a string using a regex pattern,
o Extracting text from a string using regex capture groups,
o Replacing text in a string using a regex pattern to locate the text to be replaced.
• Generate a CSV file using DataFrame class methods from the pandas module.
• Generate a plain text file using DataFrame class methods from the pandas module.
• Discuss the purpose and format of a function docstring.

INTRODUCTION
This lab assignment simulates a computer security investigation that involves developing a
Python script to locate and extract information from a log file produced by a network gateway
firewall.
A gateway is a node that connects two networks together, serving as an entry and exit point
that all data passes through before being routed to its destination. An example network
architecture that incorporates a gateway is depicted in the figure below.

A firewall is a network security device that monitors and filters incoming and outgoing network
traffic based on established security policies. Some network gateways have a built-in firewall.
Firewalls can typically be configured to record a log of all network traffic that passes through
them, or only traffic that meets some specific criteria, e.g., all network traffic that is blocked by
the firewall. Such a log may be saved as a plain text file, where key information about network
traffic is recorded, such as source and destination IP addresses, which can be useful for
detecting and tracking malicious attacks on the network from external sources.
Firewall logs may contain a large amount of data, making them impractical to search through
manually. For this reason, some type of software tool is typically needed to facilitate a timely
and accurate investigation. In some cases, e.g., when the log uses a CSV format, it may be
sufficient to open the log file in a spreadsheet application and use its built-in search and
filtering capabilities to locate records of interest; however, the log file may not be formatted in
a way that allows the spreadsheet application to automatically separate values into individual
columns, which severely limits its filtering capabilities. Another approach could involve using a
log analysis tool, but this would involve learning how to use the tool and may be overkill for a
simple log investigation.
Another approach is to write a Python script that processes the log file line by line and uses
regular expressions to locate and extract information of interest. This approach provides
endless possibilities for text searching and report generation and can be tailored to be as simple
or complex as needed for the task at hand. This is the log analysis approach that students will
perform for this lab assignment.

INSTRUCTOR NOTES
The instructor will guide students through implementing steps 1-9.
Students will complete steps 10 and 11 on their own, using the code implemented in the
previous steps for reference.

INSTRUCTIONS

STEP 1: DOWNLOAD THE GATEWAY FIREWALL LOG


Download the gateway.log file from the Module 4 folder in D2L.
This is a log file recorded by a network gateway firewall. Most of the log records describe
dropped packets, but there are also some other records of security interest. Dealing with log
files like this is a common task for security administrators and investigators.
Open the log file in a text editor (e.g., VS Code) and inspect the information that it contains.
Notice that:
• Records are separated by a new line,
• Records do not all have the same format,
• Records do not all contain the same information,
• Some data items are consistently labeled using an uppercase word followed by an equal
sign, e.g., SRC=89.49.78.88, and are separated from the next data item by a space,
• The data is not formatted in a way that would allow a spreadsheet application to
automatically separate data items into columns (like a CSV file), or for the data to be
imported into a pandas DataFrame object, and
• There is too much information for a manual search to be feasible.
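The labeled data items noted above can be picked out of a record with a single regex. The following is a minimal sketch, not part of the script templates: the function name parse_labeled_items() is hypothetical, and the record is the visible portion of one of the kernel records in gateway.log.

```python
import re

def parse_labeled_items(record):
    """Return a dict of the LABEL=value data items found in one log record."""
    # An uppercase label, an equal sign, then everything up to the next space.
    # The value may be empty, as in "IN= ".
    return dict(re.findall(r'\b([A-Z]+)=(\S*)', record))

# Visible portion of a kernel record from the gateway log
record = ('Jan 29 08:17:05 myth kernel: SFW2-OUT-ERROR IN= OUT=ppp0 '
          'SRC=216.58.112.55 DST=77.42.129.124')

items = parse_labeled_items(record)
print(items['SRC'])
```

Records that do not use the LABEL=value convention (e.g., the sshd records) simply produce an empty dictionary, which is one reason the records cannot be loaded directly into a spreadsheet or DataFrame.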

STEP 2: GET THE SCRIPT TEMPLATES FROM D2L


Create a new repository, save the script templates into the repo folder, and open the folder in
VS Code. Include a .gitignore file to exclude .csv files, .txt files, and the __pycache__ folder.

STEP 3: CREATE FUNCTION THAT GETS THE LOG FILE PATH


The log file to be analyzed could have any file name and could be located on any drive and/or
directory. Therefore, rather than hardcoding the path of the log file as a string literal inside the
Python script, it would be better for the script to accept the log file path as a command line
parameter, so the same script can be used to analyze any log file without requiring
modifications.
Since this sort of functionality would be desirable for any log analysis script, let's create a
general-purpose function that can be used to get a file path from any command line parameter.
The function should also check whether the command line parameter was provided and
whether it is the path of an existing file. If not, the function should abort script execution, since
there is no point in the script trying to analyze a log file that does not exist.

Working as a group led by the lab instructor, define a function that gets a log file path from any
command line parameter in the file log_analysis_lib.py. The function must:
• Accept one integer parameter: The number of the command line parameter from which
to get the log file path (1 = argv[1], 2 = argv[2], etc.)
• Return the full path of the log file
• Output a descriptive error message and abort script execution if no command line
parameter is provided
• Output a descriptive error message and abort script execution if the command line
parameter is not the path of an existing file
Call this function from main() and assign its return value to a local variable.
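One possible shape for such a function is sketched below. The function name, the default parameter value, and the wording of the error messages are assumptions made for illustration; the version developed with the lab instructor may differ.

```python
import os
import sys

def get_file_path_from_cmd_line(param_num=1):
    """Get the full path of a file from a command line parameter.

    Args:
        param_num (int): Number of the command line parameter that holds
            the file path (1 = argv[1], 2 = argv[2], etc.).

    Returns:
        str: Full path of the file
    """
    # sys.argv[0] is the script itself, so parameter N needs at least N+1 items
    if len(sys.argv) <= param_num:
        print(f'Error: Missing log file path (command line parameter {param_num}).')
        sys.exit(1)

    file_path = os.path.abspath(sys.argv[param_num])
    if not os.path.isfile(file_path):
        print(f'Error: "{file_path}" is not the path of an existing file.')
        sys.exit(1)

    return file_path
```

Inside main(), the call could then look like log_path = get_file_path_from_cmd_line(1). The docstring follows one common layout (summary line, Args, Returns), which is only one of several accepted docstring formats.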

STEP 4: INSPECT THE FUNCTION THAT FILTERS LOG MESSAGE RECORDS


The script template file log_analysis_lib.py contains a function named filter_log_by_regex()
that can be used to find all records in a plain-text log file that match a specified regex. The lab
instructor will explain how this general-purpose function works and the next two steps will
demonstrate how it could be used to analyze a log file.
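For orientation, a function of this kind could be written roughly as follows. This is a sketch under stated assumptions, not the code from the script template: the parameter names (ignore_case, print_summary, print_records) are hypothetical, and the return value (a list of matching records followed by a list of tuples of captured data) is inferred from the descriptions in steps 6 and 11.

```python
import re

def filter_log_by_regex(log_file, regex, ignore_case=True,
                        print_summary=False, print_records=False):
    """Get the records in a log file that match a regex.

    Returns:
        (list, list): Matching records, data extracted by capture groups
    """
    filtered_records = []
    captured_data = []
    search_flags = re.IGNORECASE if ignore_case else 0

    # Process the log file one line (record) at a time
    with open(log_file, 'r') as file:
        for record in file:
            match = re.search(regex, record, search_flags)
            if match:
                filtered_records.append(record.rstrip('\n'))
                # lastindex is None when the regex has no capture groups
                if match.lastindex:
                    captured_data.append(match.groups())

    if print_records:
        print(*filtered_records, sep='\n')

    if print_summary:
        case = 'case-insensitive ' if ignore_case else ''
        print(f'The log file contains {len(filtered_records)} records '
              f'that {case}match the regex "{regex}".')

    return filtered_records, captured_data
```

With this sketch, a call such as filter_log_by_regex(log_path, 'sshd', print_summary=True, print_records=True) would print the matching records followed by a one-line summary.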

STEP 5: INVESTIGATE THE GATEWAY FIREWALL LOG


This step will demonstrate how the filter_log_by_regex() function can be used in a simulated
computer security investigation to locate and extract/print a list of all records in a log file that
match a regex.

Charley, the network administrator, believes somebody has been attempting to crack the
server by breaking into the secure shell server, so let's use the filter_log_by_regex() function to
print all records that case-insensitive match the regex 'sshd'. The script output should be as
shown below.

PowerShell

PS C:\> python log_analysis_lib.py gateway.log


Jan 29 04:00:23 myth sshd[10124]: Accepted publickey for root from 192.168.17.8 port 40555 ssh2
Jan 29 04:01:00 myth sshd[10136]: Accepted publickey for root from 192.168.17.8 port 40556 ssh2
...
Jan 29 13:26:58 myth sshd[12502]: Invalid user patriciar from 220.195.35.40
Jan 29 13:27:02 myth sshd[12504]: Invalid user porteria from 220.195.35.40
The log file contains 338 records that case-insensitive match the regex "sshd".

It looks like someone is running through a list of common usernames to try to sign in, so let's
use the function to print all records that case-insensitive match the regex 'invalid user'. The
script output should be as shown below.

PowerShell

PS C:\> python log_analysis_lib.py gateway.log


Jan 29 13:05:04 myth sshd[11825]: Invalid user anonymous from 220.195.35.40
Jan 29 13:05:10 myth sshd[11827]: Invalid user passwd from 220.195.35.40
...
Jan 29 13:26:58 myth sshd[12502]: Invalid user patriciar from 220.195.35.40
Jan 29 13:27:02 myth sshd[12504]: Invalid user porteria from 220.195.35.40
The log file contains 318 records that case-insensitive match the regex "invalid user".

It looks like all those invalid login attempts are coming from the same IP address, but it would
be a good idea to confirm that all 318 of them are in fact from the same IP address. Let's check
that by printing all records that case-insensitive match the regex 'invalid user.*220.195.35.40'.
The script output should be as shown below.

PowerShell

PS C:\> python log_analysis_lib.py gateway.log


Jan 29 13:05:04 myth sshd[11825]: Invalid user anonymous from 220.195.35.40
Jan 29 13:05:10 myth sshd[11827]: Invalid user passwd from 220.195.35.40
...
Jan 29 13:26:58 myth sshd[12502]: Invalid user patriciar from 220.195.35.40
Jan 29 13:27:02 myth sshd[12504]: Invalid user porteria from 220.195.35.40
The log file contains 318 records that match the regex "Invalid user.*220.195.35.40".

That confirms that all invalid login attempts are coming from the same IP address. We should
let Charley know about it so the gateway firewall can be configured to block all network traffic
from that IP address.
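One detail about the regex used for this check: an unescaped period in a regex matches any character, not just a literal period, so 'invalid user.*220.195.35.40' is slightly looser than it appears. This may make no difference for the gateway log, but when an IP address is built into a regex, the re.escape() function can be used to make the periods literal. A small illustration follows, in which the look-alike record is made up for the example.

```python
import re

ip_address = '220.195.35.40'

# Periods left unescaped: each one matches any single character
loose_regex = 'invalid user.*' + ip_address
# Periods escaped: each one matches only a literal period
strict_regex = 'invalid user.*' + re.escape(ip_address)

real_record = 'Jan 29 13:05:04 myth sshd[11825]: Invalid user anonymous from 220.195.35.40'
# Hypothetical record that contains a host name instead of the IP address
look_alike = 'Jan 29 13:05:04 myth sshd[11825]: Invalid user anonymous from host220-195-35-40.example.net'

for record in (real_record, look_alike):
    loose = bool(re.search(loose_regex, record, re.IGNORECASE))
    strict = bool(re.search(strict_regex, record, re.IGNORECASE))
    print(f'loose={loose} strict={strict}')
```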

Now let's check if the log contains any error messages by printing all records that case-
insensitive match the regex 'error'. The script output should be as shown below.

PowerShell

PS C:\> python log_analysis_lib.py gateway.log


Jan 29 08:17:05 myth kernel: SFW2-OUT-ERROR IN= OUT=ppp0 SRC=216.58.112.55 DST=77.42.129.124 …
Jan 29 08:17:36 myth kernel: SFW2-OUT-ERROR IN= OUT=ppp0 SRC=216.58.112.55 DST=77.42.129.124 …
...
Jan 29 10:29:11 myth sshd[11035]: error: PAM: Authentication failure for root from cmbw95.fivefortyfour.com
Jan 29 10:29:22 myth sshd[11035]: error: PAM: Authentication failure for root from cmbw95.fivefortyfour.com
Jan 29 10:56:05 myth sshd[11136]: error: PAM: Authentication failure for root from cmbw95.fivefortyfour.com
Jan 29 10:58:59 myth sshd[11161]: error: PAM: Authentication failure for root from cmbw95.fivefortyfour.com
...
Jan 29 14:18:14 myth kernel: SFW2-OUT-ERROR IN= OUT=ppp0 SRC=216.58.112.55 DST=193.227.243.149 …
Jan 29 14:19:53 myth kernel: SFW2-OUT-ERROR IN= OUT=ppp0 SRC=216.58.112.55 DST=193.227.243.149 …
The log file contains 14 records that case-insensitive match the regex "error".

We should let Charley know about those SFW2-OUT-ERROR logs, but they don't seem to be of
any security concern.

Those authentication failures look interesting though, so let's print all records that case-
insensitive match the regex 'pam'. The script output should be as shown below.

PowerShell

PS C:\> python log_analysis_lib.py gateway.log


Jan 29 10:26:59 myth sshd[11004]: Accepted keyboard-interactive/pam for root from 192.168.17.11 port 3151 ssh2
Jan 29 10:29:11 myth sshd[11035]: error: PAM: Authentication failure for root from cmbw95.fivefortyfour.com
Jan 29 10:29:22 myth sshd[11035]: error: PAM: Authentication failure for root from cmbw95.fivefortyfour.com
Jan 29 10:29:28 myth sshd[11035]: Accepted keyboard-interactive/pam for root from 192.168.17.11 port 3152 ssh2
Jan 29 10:50:01 myth sshd[11119]: Accepted keyboard-interactive/pam for root from 192.168.17.11 port 3153 ssh2
Jan 29 10:56:05 myth sshd[11136]: error: PAM: Authentication failure for root from cmbw95.fivefortyfour.com
Jan 29 10:56:12 myth sshd[11136]: Accepted keyboard-interactive/pam for root from 192.168.17.11 port 3154 ssh2
Jan 29 10:58:59 myth sshd[11161]: error: PAM: Authentication failure for root from cmbw95.fivefortyfour.com
Jan 29 10:59:04 myth sshd[11161]: Accepted keyboard-interactive/pam for root from 192.168.17.11 port 3156 ssh2
Jan 29 11:02:19 myth sshd[11208]: Accepted keyboard-interactive/pam for root from 192.168.17.11 port 3157 ssh2
The log file contains 10 records that case-insensitive match the regex "pam".

Based on those timestamps showing successful logins shortly after the failures, it looks like
Charley has a habit of mistyping his password when trying to log in from his Windows 95
machine. No security concerns here, just run-of-the-mill typos!

STEP 6: EXTRACT DATA FROM RECORDS IN THE LOG FILE


The filter_log_by_regex() function can also be used to extract specific information from each
record in a log file that matches a specified regex. To use this functionality, the regex must
include capture groups that indicate which parts of each matching record to extract. The
extracted information is then returned by the function as a list of tuples (see example below).

Extracted Data Example


Regex with capture groups:
'SRC=(.*?) DST=(.*?) LEN=(.*?) '

List of tuples:
[('24.64.208.134', '216.58.112.55', '512'), ← Data extracted from first record
('24.64.208.134', '216.58.112.55', '512'), ← Data extracted from second record
('24.64.208.134', '216.58.112.55', '512'), ← Data extracted from third record
... ← Data omitted for brevity
('192.168.17.24', '192.168.10.60', '235'), ← Data extracted from third last record
('192.168.17.24', '192.168.9.51', '204'), ← Data extracted from second last record
('192.168.17.24', '192.168.10.60', '204')] ← Data extracted from last record
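To see where one of those tuples comes from, the regex can be applied to a single record with re.search(). In the sketch below, the record is a made-up example in the style of the gateway log (the fields after LEN are assumed); the regex is the one shown above.

```python
import re

# Hypothetical record in the style of the gateway log
record = ('Jan 29 08:17:05 myth kernel: SFW2-IN IN=ppp0 OUT= '
          'SRC=24.64.208.134 DST=216.58.112.55 LEN=512 TOS=0x00')

# Each (.*?) is a lazy capture group that stops at the first place where
# the rest of the regex can match, i.e., at the space that ends the data item
match = re.search('SRC=(.*?) DST=(.*?) LEN=(.*?) ', record)

# groups() returns the text captured by all groups as one tuple
extracted = match.groups() if match else None
print(extracted)
```

Note the space at the end of the regex: without it, the final lazy group would match as little as possible and capture an empty string.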

A list of tuples is a commonly used data structure that can easily be converted into a DataFrame
object, which opens up a lot of data analysis possibilities using the pandas package.
Use the filter_log_by_regex() function and the above regex with capture groups to extract
the source IP address, destination IP address, and length from each matching record in the
gateway log. Convert the extracted information into a DataFrame object and save it as a CSV
file, as shown below.
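The conversion from a list of tuples to a CSV file takes only a couple of pandas calls. The sketch below assumes the pandas package is installed; the function name, the column headings, and the output file name are illustrative choices, not requirements of the lab.

```python
import pandas as pd

def save_extracted_data(extracted_data, csv_path):
    """Save a list of (source IP, destination IP, length) tuples as a CSV file."""
    # Each tuple becomes one row; the column names become the CSV header row
    df = pd.DataFrame(extracted_data,
                      columns=('Source IP', 'Destination IP', 'Length'))
    # index=False leaves out the DataFrame's row numbers
    df.to_csv(csv_path, index=False)
    return df

# A few of the tuples from the extracted data example above
sample_data = [
    ('24.64.208.134', '216.58.112.55', '512'),
    ('192.168.17.24', '192.168.10.60', '235'),
    ('192.168.17.24', '192.168.9.51', '204'),
]
```

A call such as save_extracted_data(sample_data, 'extracted_data.csv') would then write a header row followed by one row per tuple.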

STEP 7: CREATE FUNCTION THAT TALLIES TRAFFIC BY PORT


Charley wants you to provide him with a report (content described in next step) for each of the
destination ports that are experiencing high volumes of network traffic. To do this, you will first
have to determine how many records there are in the log file for each destination port. One
simple way to do this is to create a dictionary that uses each destination port number as a key,
where its respective value is the number of records in the log file that contain that destination
port number.
Define a function that processes a log file to create a dictionary of record tallies for each
destination port as described above. The function must:
• Accept the log file path as a parameter
• Process the log file line by line tallying the number of records that contain each
destination port number (DPT)
• Return a dictionary of destination port number record counts
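The tallying technique described above can be sketched as follows. The function name is an assumption, and the regex assumes that the destination port appears in a record as DPT= followed by digits.

```python
import re

def tally_port_traffic(log_file):
    """Count the records in a log file for each destination port number.

    Returns:
        dict: Destination port number (str) -> number of records
    """
    port_tally = {}
    with open(log_file, 'r') as file:
        for record in file:
            match = re.search(r'DPT=(\d+)', record)
            if match:
                port = match.group(1)
                # get() returns 0 the first time a port number is seen
                port_tally[port] = port_tally.get(port, 0) + 1
    return port_tally
```

Because the port numbers are captured from text, the dictionary keys are strings (e.g., '40686'), not integers.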

STEP 8: CREATE FUNCTION THAT GENERATES DESTINATION PORT REPORTS


Charley wants each report to be a CSV file that contains the following information extracted
from each record in the log file that contains a specified destination port number:
• Date
• Time
• Source IP address
• Destination IP address
• Source port number
• Destination port number
For example, the first few rows of the CSV report for destination port number 40686 should
look like this (when opened in Excel).

Define a function that generates a CSV file containing the information described above for a
specified destination port number that is extracted from a specified log file. The function must:
• Accept the following parameters:
o Log file path
o Destination port number
• Generate a CSV file containing the information described above
• Save the CSV file in the same directory in which the script resides under the filename
destination_port_{number}_report.csv, where {number} is the destination port
number.
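A rough sketch of such a function is given below, assuming the pandas package is installed. Several details are assumptions rather than facts taken from the lab: the function name, the column headings, the optional report_dir parameter (added here only so the output location can be overridden), and the record layout, in particular that the source port is labeled SPT= and appears immediately before DPT=.

```python
import os
import re
import pandas as pd

def generate_port_traffic_report(log_file, port_number, report_dir=None):
    """Generate a CSV report of all records for one destination port."""
    # Capture groups: date, time, source IP, destination IP,
    # source port, destination port (layout of the record is assumed)
    regex = (rf'^(\w{{3}}\s+\d+) (\d{{2}}:\d{{2}}:\d{{2}}).*?'
             rf'SRC=(\S+) DST=(\S+).*?SPT=(\S+) DPT=({port_number})\b')

    rows = []
    with open(log_file, 'r') as file:
        for record in file:
            match = re.search(regex, record)
            if match:
                rows.append(match.groups())

    # Default to the directory in which this script resides
    if report_dir is None:
        report_dir = os.path.dirname(os.path.abspath(__file__))
    report_path = os.path.join(
        report_dir, f'destination_port_{port_number}_report.csv')

    columns = ('Date', 'Time', 'Source IP Address', 'Destination IP Address',
               'Source Port', 'Destination Port')
    pd.DataFrame(rows, columns=columns).to_csv(report_path, index=False)
    return report_path
```

The \b after the port number keeps a report for port 40686 from also picking up records for longer port numbers that merely start with the same digits.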

STEP 9: GENERATE DESTINATION PORT REPORTS


Charley wants you to provide a separate report for each destination port for which there are
100 or more records in the log file.
Within the main() function, use a for loop to iterate through the dictionary of destination port
number record counts, and for each port having a record count of 100 or more, call the
function created in step 8 to generate a report.
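The loop itself can be sketched as shown below. The tallies are made-up numbers, and generate_port_traffic_report() is a stand-in that only returns the report file name, so the example runs on its own; in the real script, the dictionary would come from the step 7 function and the call would go to the step 8 function.

```python
# Made-up tallies standing in for the dictionary returned in step 7
port_traffic = {'40686': 1523, '22': 97, '443': 100}

def generate_port_traffic_report(log_file, port_number):
    """Stand-in for the step 8 function; only returns the report file name."""
    return f'destination_port_{port_number}_report.csv'

reports = []
# items() yields each (key, value) pair, i.e., (port number, record count)
for port, count in port_traffic.items():
    if count >= 100:
        reports.append(generate_port_traffic_report('gateway.log', port))

print(reports)
```

Note that the comparison is count >= 100, so a port with exactly 100 records also gets a report.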

STEP 10: CREATE FUNCTION THAT GENERATES INVALID USER REPORT


Charley also wants a CSV report that contains the following information extracted from each
record in the log file that indicates an attempt to log in as an invalid user:
• Date
• Time
• Username
• IP address
The first few rows of the CSV report should look like this (when opened in Excel).

Define a function that generates the report described above using information extracted from a
specified log file. The function must:
• Accept the log file path as a parameter
• Generate a CSV file containing the information described above
• Save the CSV file in the same directory in which the script resides under the filename
invalid_users.csv.
Call the function from main() to generate the file.
Hint: This function will be very similar to the function created in step 8.

STEP 11: CREATE FUNCTION THAT EXTRACTS AND SAVES SOURCE IP RECORDS
To further investigate the invalid user logins, Charley wants a plain text .txt file that contains all
records from the log file that contain the source (SRC) IP address 220.195.35.40. The first few
rows of the plain text report should look like this.

Define a function that generates the plain text .txt file described above using information
extracted from a specified log file. Just in case Charley asks for a similar file for another source
IP address in the future, let's make the source IP address a parameter.

The function must:


• Accept the following parameters:
o Log file path
o Source IP address
• Generate a plain text file containing the information described above
• Save the file in the same directory in which the script resides under the filename
source_ip_{address}.txt, where {address} is the source IP address with all periods
replaced by underscores, e.g., source_ip_220_195_35_40.txt.
Call the function from main() to generate the file.
Hints:
• This function will be similar to the function created in step 8.
• The first value returned by the filter_log_by_regex() function is a list containing
each record that matches the specified regex, which is exactly the information that
needs to be saved in the .txt file.
• Since CSV files are plain text files, the to_csv() method of the DataFrame class can be
used to save the file. This StackOverflow thread describes one way it can be done.
• The re.sub() function from the re module could be used to replace all periods in the
source IP address with underscores.
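The last hint can be illustrated with a short example that covers only the file name, not the rest of the function. The helper name get_source_ip_filename() is hypothetical.

```python
import re

def get_source_ip_filename(ip_address):
    """Build the report file name for a source IP address."""
    # The period is escaped so it matches a literal period only;
    # an unescaped period would match (and replace) every character
    return 'source_ip_' + re.sub(r'\.', '_', ip_address) + '.txt'

print(get_source_ip_filename('220.195.35.40'))
```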

DROPBOX SUBMISSION
Submit the URL of the GitHub repository that contains your script files, e.g.,
https://github.com/BobLoblaw/COMP593-Lab4

ASSESSMENT
Item                   Out Of   Assessment Criteria
GitHub                 2        • GitHub repository is private, instructor added as a collaborator
                                • Repository contains both .py files
                                • Repository uses a .gitignore file to exclude .csv files, .txt files, and the __pycache__ folder
Collaborative Portion  2        • Script implemented as explained by instructor and described in steps 2-9
Invalid User Report    3        • Function defined and called as described in step 10
                                • Report contains required information and format
                                • Report filename as required
Source IP Log          3        • Function defined and called as described in step 11
                                • Report contains required information and format
                                • Report filename as specified
Total                  10
