
Find Difference Between 2 Files in Python
Many applications, especially in data processing, software development, and testing, need to compare two files to detect changes, validate outputs, or find discrepancies. Python offers several ways to do this, ranging from a basic line-by-line comparison to more advanced diff utilities. The key methods for comparing two files in Python are -
- Line-by-line comparison: a straightforward way to find textual differences.
- difflib module: produces human-readable diffs similar to Unix's diff command.
- filecmp module: performs quick shallow or byte-by-byte comparisons to check whether two files are identical.
The line-by-line and difflib methods work on any text file, such as .txt, .csv, .json, or source code files, while filecmp can also compare binary files.
Line-by-Line Comparison (Basic Method)
Line-by-line comparison is the most basic method: it reads both files and compares the lines at each position in order. It is useful for detecting small changes in text files such as source code, logs, or configuration files, and it also handles files of different lengths.
Example
The following example compares two text files, file1.txt and file2.txt, line by line by index and reports each line number at which they differ. Because each line is passed through strip(), differences in leading or trailing whitespace are ignored -
def compare_files(file1, file2):
    with open(file1, 'r') as f1, open(file2, 'r') as f2:
        f1_lines = f1.readlines()
        f2_lines = f2.readlines()

    # Iterate up to the length of the longer file so extra lines are reported
    max_lines = max(len(f1_lines), len(f2_lines))

    for i in range(max_lines):
        # Use a placeholder when one file has run out of lines
        line1 = f1_lines[i].strip() if i < len(f1_lines) else "<no line>"
        line2 = f2_lines[i].strip() if i < len(f2_lines) else "<no line>"

        if line1 != line2:
            print(f"Difference at line {i + 1}:")
            print(f"File1: {line1}")
            print(f"File2: {line2}")
            print("-" * 40)

# Example usage
compare_files('file1.txt', 'file2.txt')
Here is the output after comparing both the files -
Difference at line 1:
File1: Hello, welcome to Tutorialspoint.
File2: Hello welcome to Tutorialspoint.
----------------------------------------
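For large files, reading everything with readlines() may be wasteful. The sketch below is one possible streaming variant of the same idea, using itertools.zip_longest to walk both files in step. The function name iter_differences is hypothetical, and unlike the example above it strips only the trailing newline, so other whitespace differences are reported.

```python
from itertools import zip_longest

def iter_differences(path1, path2):
    """Yield (line_number, line1, line2) for each position where the files differ."""
    with open(path1, "r") as f1, open(path2, "r") as f2:
        # zip_longest pads the shorter file with None, so extra lines are reported too
        for number, (line1, line2) in enumerate(zip_longest(f1, f2), start=1):
            if line1 is not None:
                line1 = line1.rstrip("\n")
            if line2 is not None:
                line2 = line2.rstrip("\n")
            if line1 != line2:
                yield number, line1, line2
```

A caller could loop over iter_differences('file1.txt', 'file2.txt') and print each tuple. A value of None marks a line that exists in only one of the files.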
Using difflib for Detailed Diffs
The difflib module provides tools for computing and working with differences between sequences. It is a more advanced approach to file comparison that produces detailed, human-readable output similar to that of Unix's diff command. It is useful when we want to see exactly what changed in each line.
When two sequences of lines are compared with difflib.Differ or difflib.ndiff(), each output line begins with a two-character code. Here are the codes and their meanings -
- '- ' line unique to sequence 1
- '+ ' line unique to sequence 2
- '  ' line common to both sequences
- '? ' line not present in either input sequence (a hint marking where two similar lines differ)
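To see these prefixes in practice, here is a small sketch that runs difflib.ndiff() on two short in-memory lists of lines. The sample strings and the helper name ndiff_lines are made up for illustration.

```python
import difflib

# Illustrative sample data: each element is one line, newline included
old = ["Hello, welcome.\n", "Same line.\n"]
new = ["Hello welcome.\n", "Same line.\n"]

def ndiff_lines(a, b):
    """Return the ndiff output as a list of prefixed lines."""
    return list(difflib.ndiff(a, b))

for line in ndiff_lines(old, new):
    # Each line already ends with a newline, so suppress print's own
    print(line, end="")
```

Changed lines appear with the '- ' and '+ ' prefixes, the unchanged line with two leading spaces, and any '? ' lines point at the characters that differ within a changed line.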
Example
The following example compares two text files, file1.txt and file2.txt, using the difflib.unified_diff() function and prints a unified diff of the changes. In this format, removed and added lines are prefixed with a single - or +, and no ? lines are produced -
import difflib

def diff_files(file1, file2):
    with open(file1, 'r') as f1, open(file2, 'r') as f2:
        f1_lines = f1.readlines()
        f2_lines = f2.readlines()

    # Label the diff headers with the paths that were actually compared
    diff = difflib.unified_diff(f1_lines, f2_lines, fromfile=file1, tofile=file2)
    print(''.join(diff))

# Example usage
diff_files('file1.txt', 'file2.txt')
Here is the output after comparing both files using difflib.unified_diff() function -
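--- file1.txt
+++ file2.txt
@@ -1,2 +1,2 @@
-Hello, welcome to Tutorialspoint.
-Have a happy learning.+Hello welcome to Tutorialspoint.
+have a happy learning.
The second removed line and the first added line run together because the last line of file1.txt has no trailing newline, and unified_diff() keeps each input line's ending as it is.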
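When the last line of a file has no trailing newline, joining the diff lines directly can fuse two lines of output together. One way to avoid this, sketched here under the assumption that the distinction between "ends with a newline" and "does not" is unimportant, is to split the text with splitlines() and pass lineterm="" so that every diff line is newline-free. The helper name unified_diff_text is hypothetical.

```python
import difflib

def unified_diff_text(text1, text2, name1="file1.txt", name2="file2.txt"):
    """Return a unified diff of two strings with one diff line per output line."""
    # splitlines() drops line endings, so a missing final newline no longer matters
    lines1 = text1.splitlines()
    lines2 = text2.splitlines()
    diff = difflib.unified_diff(
        lines1, lines2, fromfile=name1, tofile=name2, lineterm=""
    )
    return "\n".join(diff)

# Illustrative input: the first text has no trailing newline
print(unified_diff_text(
    "Hello, welcome.\nHave a happy learning.",
    "Hello welcome.\nhave a happy learning.\n",
))
```

The trade-off is that two files differing only in their final newline are treated as identical by this helper.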
Using filecmp for File Comparison
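The filecmp module provides a fast way to check whether two files are identical. Its cmp() function supports two modes. With shallow=True (the default), files whose os.stat() signatures (file type, size, and modification time) match are taken to be equal without reading their contents. With shallow=False, the contents are always compared. This is useful for binary files or when we simply need to know whether two files are exactly the same without caring about the specific differences.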
Example
The following example uses filecmp.cmp() to compare two files, file1.txt and file2.txt, and prints whether they are identical or different -
import filecmp

def compare_binary(file1, file2):
    # shallow=False forces a comparison of the file contents
    result = filecmp.cmp(file1, file2, shallow=False)
    if result:
        print("Files are the same.")
    else:
        print("Files differ.")

# Example usage
compare_binary('file1.txt', 'file2.txt')
Here is the output after comparing both the files -
Files differ.
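The practical difference between the shallow and deep modes of filecmp.cmp() can be illustrated with a contrived case: two temporary files of the same size and modification time but different contents. This is only a sketch. The file names, contents, and the fixed timestamp are arbitrary choices made for the demonstration.

```python
import filecmp
import os
import tempfile

def shallow_vs_deep():
    """Compare two same-size, same-mtime files in both modes."""
    with tempfile.TemporaryDirectory() as tmp:
        path1 = os.path.join(tmp, "a.txt")
        path2 = os.path.join(tmp, "b.txt")
        with open(path1, "w") as f:
            f.write("abc")
        with open(path2, "w") as f:
            f.write("abd")  # same length, different content

        # Force identical access/modification times so the stat signatures match
        os.utime(path1, (1_000_000_000, 1_000_000_000))
        os.utime(path2, (1_000_000_000, 1_000_000_000))

        shallow = filecmp.cmp(path1, path2, shallow=True)
        deep = filecmp.cmp(path1, path2, shallow=False)
    return shallow, deep

print(shallow_vs_deep())
```

Here the shallow check reports the files as equal because their signatures match, while the deep check reads the contents and reports them as different, which is why shallow=False is the safer choice when correctness matters more than speed.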
We can choose the method that fits our use case: a line-by-line comparison for simple checks, difflib for readable diffs, and filecmp when we only need to know whether two files are identical.