Final_Code_for_Markup

This document contains a Python script that uses PyMuPDF (imported as fitz) to search PDF files for specific text and highlight every match. It reads the search terms from a CSV file, processes each PDF named in the first column of an Excel spreadsheet, and saves a highlighted copy of each file to an output folder. The script defines one function that reads the search terms, one that searches and highlights a single PDF, and a main block that builds the input and output paths and loops over the files.

Uploaded by

Suresh Kadam
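The script expects search_texts.csv to hold one search term per row, with the term in the first column. The sketch below illustrates that input layout and is not part of the original script. The function name parse_search_terms and the sample terms are hypothetical. It skips blank rows and repeated terms so each term is searched once per page, and it strips surrounding whitespace from each term.

```python
import csv
import io


def parse_search_terms(csvfile):
    """Return unique, non-blank terms from the first CSV column, in file order.

    Hypothetical helper: it mirrors the one-term-per-row layout the script assumes.
    """
    terms = []
    seen = set()
    for row in csv.reader(csvfile):
        if not row:
            continue  # csv.reader yields an empty list for a blank line
        term = row[0].strip()  # surrounding whitespace is dropped
        if term and term not in seen:
            seen.add(term)
            terms.append(term)
    return terms


if __name__ == "__main__":
    # Hypothetical sample content for search_texts.csv
    sample = io.StringIO("net profit\n\nrevenue,ignored second column\nnet profit\n  \n")
    print(parse_search_terms(sample))  # ['net profit', 'revenue']
```

To use it with a real file, open search_texts.csv with newline='' and pass the file object, as the csv module documentation recommends.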


pip install PyMuPDF

pip install pandas openpyxl

###FINAL
import csv

import fitz  # PyMuPDF
import pandas as pd


def read_search_texts_from_csv(csv_path):
    # Return the first column of each non-empty row as a search term
    with open(csv_path, 'r', newline='') as csvfile:
        reader = csv.reader(csvfile)
        return [row[0] for row in reader if row and row[0].strip()]


def my_search_and_highlight(pdf_path, search_texts, output_path):

    # Open the PDF with PyMuPDF
    doc = fitz.open(pdf_path)

    for page in doc:

        # Search for each text on the page (search_for is case-insensitive)
        for search_text in search_texts:
            text_instances = page.search_for(search_text)

            # Highlight every matching rectangle on the page
            for text_rect in text_instances:
                highlight = page.add_highlight_annot(text_rect)
                highlight.update()

    # Save the modified PDF, then release the file handle
    doc.save(output_path, garbage=4, deflate=True, clean=True)
    doc.close()


if __name__ == "__main__":
    # Replace these variables with your actual file paths and search texts
    input_pdf_loc = "/Users/sureshkadam/Documents/MY DATA/PYHTON-WINDOWS/PYTHON/HTMLPARSER/Htmlparser/ETPDF/"
    output_pdf_loc = "/Users/sureshkadam/Documents/MY DATA/PHD/DATA/MARKUP_PDF/"
    csv_path = "search_texts.csv"

    excel_file_path = "/Users/sureshkadam/Documents/MY DATA/PHD/DATA/MARKUP_PDF/filestoupdate.xlsx"

    # The first column lists the PDF file names to process.
    # read_excel treats the first row as a header; pass header=None if the sheet has none.
    df = pd.read_excel(excel_file_path)

    # Read search texts from CSV
    search_texts = read_search_texts_from_csv(csv_path)

    # Skip empty cells, which pandas reads as NaN
    filenames = df.iloc[:, 0].dropna()
    for i, filename in enumerate(filenames, start=1):
        input_pdf_path = input_pdf_loc + str(filename)
        output_pdf_path = output_pdf_loc + str(filename)
        my_search_and_highlight(input_pdf_path, search_texts, output_pdf_path)
        print(f"{i} - {filename}")
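The main block builds each path by concatenating a folder string with a file name, which relies on the folder ending in a slash and on every spreadsheet cell holding a valid name. Below is a sketch of a more defensive path-building step, assuming pathlib is acceptable in place of string concatenation. build_jobs is a hypothetical helper, not part of the original script. It skips empty cells and sets aside names whose input file does not exist.

```python
import math
import tempfile
from pathlib import Path


def build_jobs(names, input_dir, output_dir):
    """Pair each listed file name with its input and output path.

    Hypothetical helper: blank cells (None, NaN, empty strings) are skipped,
    and names whose input file does not exist are returned separately.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    jobs, missing = [], []
    for name in names:
        if name is None or (isinstance(name, float) and math.isnan(name)):
            continue  # pandas reads empty spreadsheet cells as NaN
        name = str(name).strip()
        if not name:
            continue
        src = input_dir / name
        if src.is_file():
            jobs.append((src, output_dir / name))
        else:
            missing.append(name)
    return jobs, missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        in_dir = Path(tmp, "in")
        in_dir.mkdir()
        (in_dir / "report.pdf").write_bytes(b"")  # empty placeholder, not a real PDF
        jobs, missing = build_jobs(["report.pdf", float("nan"), "absent.pdf"], in_dir, Path(tmp, "out"))
        print(len(jobs), missing)  # 1 ['absent.pdf']
```

In the main block, the loop could then create the output folder with Path(output_pdf_loc).mkdir(parents=True, exist_ok=True), call my_search_and_highlight(str(src), search_texts, str(dst)) for each pair, and report the missing names instead of raising an error partway through the batch.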
