
Find Unique Lines from Two Text Files in Python
Two files often look similar but contain a few differences. If the files are large, finding those differences by hand is tedious and error-prone. A short Python program can find the unique lines in two text files easily. This article shows three ways of doing so, each with its own example. The input text files are a.txt and b.txt, and each example writes its result to a separate text file.
For these examples, the lines contained in each text file are listed here:
Line in the text files | In a.txt | In b.txt |
---|---|---|
Introduction to Computers | Yes | Yes |
Introduction to Programming Concepts | Yes | Yes |
Introduction to Windows, its Features, Application | Yes | Yes |
C++ Programming | No | Yes |
Computer Organization Principles | Yes | Yes |
Database Management Systems | Yes | Yes |
Introduction to Embedded Systems | Yes | Yes |
Fundamentals of PHP | Yes | Yes |
Mathematical Foundation For Computer Science | Yes | No |
Java Programming | Yes | Yes |
Functions | Yes | Yes |
Arrays | Yes | Yes |
Disk Operating System | Yes | Yes |
Introduction to Number system and codes | Yes | Yes |
Data Mining | Yes | Yes |
Software Engineering | Yes | No |
Computer Networks | Yes | Yes |
Control Structures | Yes | Yes |
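
To follow along, the two input files can be created with a short script. This is only a sketch: the helper name `write_sample_files` is hypothetical, and the line order and the three differing lines are assumed from the result files shown in the examples.

```python
from pathlib import Path

# Course titles in the order they appear in the article's result listings.
ALL_LINES = [
    "Introduction to Computers",
    "Introduction to Programming Concepts",
    "Introduction to Windows, its Features, Application",
    "C++ Programming",
    "Computer Organization Principles",
    "Database Management Systems",
    "Introduction to Embedded Systems",
    "Fundamentals of PHP",
    "Mathematical Foundation For Computer Science",
    "Java Programming",
    "Functions",
    "Arrays",
    "Disk Operating System",
    "Introduction to Number system and codes",
    "Data Mining",
    "Software Engineering",
    "Computer Networks",
    "Control Structures",
]

# Assumption: these are the only lines that differ between the two files.
MISSING_FROM_A = {"C++ Programming"}
MISSING_FROM_B = {
    "Mathematical Foundation For Computer Science",
    "Software Engineering",
}


def write_sample_files(directory="."):
    """Write a.txt and b.txt into `directory` (hypothetical helper)."""
    target = Path(directory)
    a_lines = [ln for ln in ALL_LINES if ln not in MISSING_FROM_A]
    b_lines = [ln for ln in ALL_LINES if ln not in MISSING_FROM_B]
    # End every line, including the last, with a newline so that
    # line-by-line comparisons are not thrown off by a missing terminator.
    (target / "a.txt").write_text("".join(ln + "\n" for ln in a_lines))
    (target / "b.txt").write_text("".join(ln + "\n" for ln in b_lines))
    return a_lines, b_lines
```

Calling `write_sample_files()` with no argument creates both files in the current directory.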
Example 1: Find Unique Lines From Two Text Files by iterating and comparing the individual lines in both files.
Algorithm
Step 1: Open both text files in read mode.
Step 2: Read the lines of a.txt into afile and the lines of b.txt into bfile.
Step 3: Make an empty list called cfile. Go through bfile line by line. If a line is not present in afile, append it to cfile.
Step 4: Now go through afile line by line. If a line is not present in bfile, append it to cfile. Write cfile to finalRes.txt.
Step 5: Run the program and then check the result.
The Python File Contains this
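    # Read every line of both files into lists.
    with open('a.txt', 'r') as af:
        afile = af.readlines()
    with open('b.txt', 'r') as bf:
        bfile = bf.readlines()

    cfile = []

    # Collect the lines of b.txt that do not appear in a.txt.
    for ln in bfile:
        if ln not in afile:
            cfile.append(ln)

    # Collect the lines of a.txt that do not appear in b.txt.
    for ln in afile:
        if ln not in bfile:
            cfile.append(ln)

    # Write the unique lines to the result file; "with" closes it afterwards.
    with open('finalRes.txt', 'w') as resultFile:
        for lin in cfile:
            resultFile.write(lin)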
Viewing The Result - Example 1
To see the unique lines from both text files, run the Python file in the cmd window and open the result file.
    C++ Programming
    Mathematical Foundation For Computer Science
    Software Engineering
Fig 1: Content of the result file called finalRes.txt.
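
Testing `ln not in afile` against a list scans the whole list on every lookup, which can become slow for large files. A possible variation, sketched here with the hypothetical function name `unique_lines`, uses sets for the membership tests while keeping the same output order as Example 1. It also strips line endings, on the assumption that a final line without a trailing newline should still match its counterpart in the other file.

```python
def unique_lines(a_lines, b_lines):
    """Return lines found in exactly one of the two lists, b-only lines first."""
    # Strip line endings so a missing final newline does not cause a mismatch.
    a_clean = [ln.rstrip("\n") for ln in a_lines]
    b_clean = [ln.rstrip("\n") for ln in b_lines]
    # Sets make each membership test fast, independent of file length.
    a_set, b_set = set(a_clean), set(b_clean)
    only_b = [ln for ln in b_clean if ln not in a_set]
    only_a = [ln for ln in a_clean if ln not in b_set]
    return only_b + only_a


# Small in-memory sample; the last line of sample_a has no newline on purpose.
sample_a = ["Arrays\n", "Functions\n", "Software Engineering"]
sample_b = ["Arrays\n", "C++ Programming\n", "Functions\n"]
print(unique_lines(sample_a, sample_b))
# ['C++ Programming', 'Software Engineering']
```

Because the returned lines no longer carry their line endings, a newline would need to be added back when writing them to a result file.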
Example 2: Find Unique Lines From Two Text Files by using the difflib library module.
Algorithm
Step 1: Import the Differ class from the difflib module.
Step 2: Open both text files in read mode.
Step 3: Read the lines of a.txt into afile and the lines of b.txt into bfile.
Step 4: Compare the two lists of lines using Differ. Write the result to finalRes1.txt.
Step 5: Run the program and then check the result.
The Python File Contains this
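    from difflib import Differ

    # Read every line of both files into lists.
    with open('a.txt', 'r') as af:
        afile = af.readlines()
    with open('b.txt', 'r') as bf:
        bfile = bf.readlines()

    # Compare the two line lists; Differ prefixes each output line with a code.
    result = list(Differ().compare(afile, bfile))

    # Write the annotated comparison to the result file.
    with open('finalRes1.txt', 'w') as resultFile:
        for lin in result:
            resultFile.write(lin)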
Viewing The Result - Example 2
Open the cmd window and run the Python file to see the result. The result file shows a - or + sign in front of the lines that are unique to one of the files. A + sign means that the line is not present in the first text file (a.txt), while a - sign means that the line is not present in the second text file (b.txt). Lines common to both files carry no sign.
      Introduction to Computers
      Introduction to Programming Concepts
      Introduction to Windows, its Features, Application
    + C++ Programming
      Computer Organization Principles
      Database Management Systems
      Introduction to Embedded Systems
      Fundamentals of PHP
    - Mathematical Foundation For Computer Science
      Java Programming
      Functions
      Arrays
      Disk Operating System
      Introduction to Number system and codes
      Data Mining
    - Software Engineering
      Computer Networks
      Control Structures
Fig 2: Content of the result file called finalRes1.txt
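
The Differ output lists every line, including the common ones. If only the unique lines are wanted, the output can be filtered by its two-character prefix. The sketch below is one way to do that; the function name `only_differences` is hypothetical. Note that Differ can also emit lines starting with `? ` as hints when two lines are similar but not identical, and these are skipped here as well.

```python
from difflib import Differ


def only_differences(a_lines, b_lines):
    """Return (removed, added): lines only in a_lines and lines only in b_lines."""
    removed, added = [], []
    for entry in Differ().compare(a_lines, b_lines):
        # Each entry starts with a two-character code followed by the line.
        if entry.startswith("- "):
            removed.append(entry[2:])
        elif entry.startswith("+ "):
            added.append(entry[2:])
        # Entries starting with "  " (common) or "? " (hint) are ignored.
    return removed, added


# Small in-memory sample standing in for the two files.
sample_a = ["Arrays\n", "Functions\n", "Software Engineering\n"]
sample_b = ["Arrays\n", "C++ Programming\n", "Functions\n"]
removed, added = only_differences(sample_a, sample_b)
print(removed)  # ['Software Engineering\n']
print(added)    # ['C++ Programming\n']
```

Differ compares the files as ordered sequences, so a line that appears in both files but at a different position may be reported as both removed and added.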
Example 3: Find Unique Lines From Two Text Files by removing similar lines and retaining unique lines.
Algorithm
Step 1: Open both text files in read mode.
Step 2: Read the lines of a.txt into a set called afile, and open b.txt as bf.
Step 3: For every line in bf, if that line is in afile, remove it from afile. If it is not in afile, append it to another list called uniqueB.
Step 4: Append the lines left in afile and those in uniqueB to cfile. Write cfile to finalRes2.txt.
Step 5: Run the program and then check the result.
The Python File Contains this
    # Store the lines of a.txt in a set for fast lookup and removal.
    with open('a.txt', 'r') as af:
        afile = set(af)

    uniqueB = []
    cfile = []

    with open('b.txt', 'r') as bf:
        for ln in bf:
            if ln in afile:
                # Shared line: drop it so only a.txt's unique lines remain.
                afile.remove(ln)
            else:
                uniqueB.append(ln)

    print("\nPrinting all unique lines in both a.txt and b.txt : ")
    print('\nAll the lines in a.txt file that are not in b.txt: \n')
    for ln in sorted(afile):
        print(ln.rstrip())
        cfile.append(ln)
    print()
    print('\nAll the lines in b.txt file that are not in a.txt: \n')
    for lin in uniqueB:
        print(lin.rstrip())
        cfile.append(lin)
    print()

    # Write the unique lines to the result file; "with" closes it afterwards.
    with open('finalRes2.txt', 'w') as resultFile:
        for lin in cfile:
            resultFile.write(lin)
Viewing The Result - Example 3
To see the unique lines from both text files, run the Python file in the cmd window and open the result file.
    Mathematical Foundation For Computer Science
    Software Engineering
    C++ Programming
Fig 3: Content of the result file called finalRes2.txt.
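
One caveat of Example 3 is that a set stores each line only once. Repeated lines in a.txt collapse into one entry, and a line that occurs twice in b.txt is reported as unique on its second occurrence, because the first occurrence already removed it from the set. If repeated lines matter, a variation based on `collections.Counter` can compare occurrence counts instead. This is a different technique from the one in the example, sketched here with the hypothetical function name `unique_with_counts`.

```python
from collections import Counter


def unique_with_counts(a_lines, b_lines):
    """Return (only_a, only_b) as Counters of surplus line occurrences."""
    a_count = Counter(ln.rstrip("\n") for ln in a_lines)
    b_count = Counter(ln.rstrip("\n") for ln in b_lines)
    # Counter subtraction keeps only the lines whose count stays positive.
    return a_count - b_count, b_count - a_count


# "Arrays" occurs twice in sample_a but only once in sample_b.
sample_a = ["Arrays\n", "Arrays\n", "Functions\n"]
sample_b = ["Arrays\n", "Functions\n", "Data Mining\n"]
only_a, only_b = unique_with_counts(sample_a, sample_b)
print(dict(only_a))  # {'Arrays': 1}
print(dict(only_b))  # {'Data Mining': 1}
```

Here the extra copy of "Arrays" in the first sample is reported as a difference, which the set-based approach would not detect.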
Conclusion
This article showed three ways of finding the unique lines in two text files with Python. Example 1 uses simple iteration and comparison, going line by line through both text files. Example 2 uses the Differ class from the difflib library module. Example 3 removes the lines common to both files and retains the unique ones, using a Python set for the lines of a.txt and a list for the lines unique to b.txt.