Find Difference Between 2 Files in Python



Many applications, especially in data processing, software development, and testing, need to compare two files to detect changes, validate outputs, or find discrepancies. Python offers several ways to compare files, ranging from basic line-by-line comparison to more advanced diff utilities. The key methods for comparing two files in Python are -

  • Line-by-line comparison: A straightforward approach to finding textual differences.
  • difflib module: Produces human-readable diffs similar to the output of Unix's diff command.
  • filecmp module: Performs quick checks of whether two files are identical, either shallow (metadata-based) or byte-by-byte.

These methods can be applied to various file types, such as .txt, .csv, .json, or source code files.

Line-by-Line Comparison (Basic Method)

The line-by-line comparison method is the most basic approach: it reads both files and compares the lines at each position in order. It is useful for detecting small changes in text files such as source code, logs, or configuration files, and it can also handle files of different lengths.

Example

The following example compares two text files, file1.txt and file2.txt, line by line by index and reports each line number at which they differ -

def compare_files(file1, file2):
    with open(file1, 'r') as f1, open(file2, 'r') as f2:
        f1_lines = f1.readlines()
        f2_lines = f2.readlines()

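    # Iterate up to the length of the longer file so extra lines are reported too
    max_lines = max(len(f1_lines), len(f2_lines))
    for i in range(max_lines):
        # Remove only the line ending so that indentation changes are still detected
        line1 = f1_lines[i].rstrip('\n') if i < len(f1_lines) else "<no line>"
        line2 = f2_lines[i].rstrip('\n') if i < len(f2_lines) else "<no line>"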

        if line1 != line2:
            print(f"Difference at line {i + 1}:")
            print(f"File1: {line1}")
            print(f"File2: {line2}")
            print("-" * 40)

# Example usage
compare_files('file1.txt', 'file2.txt')

Here is the output after comparing the two files -

Difference at line 1:
File1: Hello, welcome to Tutorialspoint.
File2: Hello welcome to Tutorialspoint.
----------------------------------------
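For large files, reading every line into memory may be unnecessary. The following is a minimal sketch of the same line-by-line idea that reads both files lazily. The function name iter_line_differences is a hypothetical helper introduced here for illustration, not part of any library; it uses itertools.zip_longest from the standard library to pair lines and reports a missing line as None.

```python
from itertools import zip_longest


def iter_line_differences(path1, path2):
    """Yield (line_number, line1, line2) for each pair of lines that differ.

    A line that is missing from the shorter file is reported as None.
    """
    with open(path1, 'r') as f1, open(path2, 'r') as f2:
        # zip_longest keeps going until the longer file is exhausted,
        # filling the shorter side with None.
        for number, (line1, line2) in enumerate(zip_longest(f1, f2), start=1):
            # Drop only the line ending so indentation changes are still detected.
            if line1 is not None:
                line1 = line1.rstrip('\n')
            if line2 is not None:
                line2 = line2.rstrip('\n')
            if line1 != line2:
                yield number, line1, line2
```

Because the function is a generator, a caller can stop at the first difference, for example with next(iter_line_differences('file1.txt', 'file2.txt'), None), without reading the rest of either file.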

Using difflib for Detailed Diffs

The difflib module provides tools for computing and working with differences between sequences. It is a more advanced way to compare files and produces detailed, human-readable output similar to that of Unix's diff command. This method is useful when we want to see exactly which lines were added, removed, or changed.

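The symbols below are the two-character line prefixes used in the output of difflib.ndiff() and difflib.Differ. The difflib.unified_diff() function marks lines with a single -, + or space character instead, and it adds @@ hunk headers rather than ? lines -

  • '- ' line unique to sequence 1
  • '+ ' line unique to sequence 2
  • '  ' line common to both sequences
  • '? ' line not present in either input sequence; it is a hint that points to the characters that changed within a line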
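The prefixes listed above are the ones emitted by difflib.ndiff(). As a small sketch, the snippet below runs ndiff() on two in-memory lists of lines; the sample lines and the helper name ndiff_prefixes are assumptions made for illustration only.

```python
import difflib

# Sample input: the two versions differ only by a comma in the first line.
old_lines = ["Hello, welcome to Tutorialspoint.\n", "Have a happy learning.\n"]
new_lines = ["Hello welcome to Tutorialspoint.\n", "Have a happy learning.\n"]


def ndiff_prefixes(a, b):
    """Return the two-character prefix of every line produced by difflib.ndiff()."""
    return [line[:2] for line in difflib.ndiff(a, b)]


# Print the full ndiff output, then just the prefixes of each output line.
print(''.join(difflib.ndiff(old_lines, new_lines)))
print(ndiff_prefixes(old_lines, new_lines))
```

In this run the changed first line should appear as a '- ' and '+ ' pair accompanied by a '? ' hint line, while the unchanged second line carries the two-space prefix.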

Example

The following example compares two text files, file1.txt and file2.txt, using the difflib.unified_diff() function and prints a unified diff showing the changes between them -

import difflib

def diff_files(file1, file2):
    # Read both files into lists of lines (line endings are kept)
    with open(file1, 'r') as f1, open(file2, 'r') as f2:
        f1_lines = f1.readlines()
        f2_lines = f2.readlines()

    # Label the diff with the actual paths instead of hard-coded names
    diff = difflib.unified_diff(f1_lines, f2_lines, fromfile=file1, tofile=file2)
    print(''.join(diff))

# Example usage
diff_files('file1.txt', 'file2.txt')

Here is the output after comparing both files using difflib.unified_diff() function -

--- file1.txt
+++ file2.txt
@@ -1,2 +1,2 @@
-Hello, welcome to Tutorialspoint.
-Have a happy learning.
+Hello welcome to Tutorialspoint.
+have a happy learning.

Using filecmp for File Comparison

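The filecmp module provides a fast way to check whether two files are identical. The filecmp.cmp() function can perform a shallow comparison, which treats the files as equal when their os.stat() signatures (file type, size, and modification time) match, or a full comparison with shallow=False, which compares the file contents. This is useful for binary files or when we simply need to know whether two files are exactly the same without caring about the specific differences.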

Example

The following example uses filecmp.cmp() to compare two files, file1.txt and file2.txt, and prints whether they are identical or different -

import filecmp

def compare_binary(file1, file2):
    result = filecmp.cmp(file1, file2, shallow=False)
    if result:
        print("Files are the same.")
    else:
        print("Files differ.")

# Example usage
compare_binary('file1.txt', 'file2.txt')

Here is the output after comparing the two files -

Files differ.
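The difference between shallow and full comparison can be seen in a small experiment. The sketch below is illustrative only: the helper names shallow_vs_deep and demo, the temporary file names, and the timestamp value are all assumptions. It creates two files with the same size and the same modification time but different contents, then compares them both ways.

```python
import filecmp
import os
import tempfile


def shallow_vs_deep(path1, path2):
    """Return (shallow_result, deep_result) for two files."""
    filecmp.clear_cache()  # avoid reusing results cached by earlier calls
    shallow = filecmp.cmp(path1, path2, shallow=True)
    deep = filecmp.cmp(path1, path2, shallow=False)
    return shallow, deep


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, 'a.txt')
        b = os.path.join(tmp, 'b.txt')
        # Same size (3 bytes each) but different contents.
        with open(a, 'w') as f:
            f.write('abc')
        with open(b, 'w') as f:
            f.write('xyz')
        # Force identical access/modification times so the stat signatures match.
        same_time = (1_000_000_000, 1_000_000_000)
        os.utime(a, same_time)
        os.utime(b, same_time)
        return shallow_vs_deep(a, b)


print(demo())  # (True, False)
```

The shallow check reports the files as equal because only their metadata is examined, which is why shallow=False is the safer choice when the contents themselves matter.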

We can choose the appropriate method for comparing two files based on our use case.

Updated on: 2025-05-15T18:32:09+05:30
