Text Processor for OCR, File Extraction, and Summarization

The Python script defines a TextProcessor class that extracts and summarizes text from sources such as images, PDFs, and text files. It uses OCR, NLP, and text-processing libraries to extract keywords, summarize the text, and format the result as bullet points or paragraphs. The class provides methods for text extraction, summarization, conversion to PDF, and text analysis, and it has since been extended with keyword-based summarization and flexible formatting options.

Uploaded by Prasu Muthyalapati


DATE:09/10/10

Documenting the Code: Text Processor for OCR, File Extraction, and Summarization

Purpose:

• The Python script defines a TextProcessor class responsible for extracting and processing text from various sources, such as images, PDFs, plain-text files, and Word documents. The class can also summarize the extracted text and convert it into a PDF file.

Libraries Used:
• PyPDF2: For PDF file handling.
• cv2 (OpenCV): For loading and preprocessing images before OCR.
• pytesseract: A Python wrapper for the Tesseract OCR engine, used to extract text from images.
• FPDF: For creating PDF documents.
• Sumy: Text summarization library.
• nltk: Natural Language Toolkit, for text processing.
• spaCy: Natural language processing library.
• scikit-learn: For TF-IDF vectorization.

Date-specific Setup:
• The script includes dated comments for future reference, recording when each part was set up and when implementation changes were made.

DATE:10/01/2024

Class Methods:
Extract Text Methods:

• extract_text_from_image(scanned_image): Extracts text from a scanned image using OCR.
• extract_text_from_pdf(pdf_path): Extracts text from a PDF file.
• extract_text_from_text_file(text_file_path): Reads text from a plain-text file.
• extract_text_from_word_document(docx_path): Extracts text from a Word document.
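A minimal sketch of the four extraction methods, written as standalone functions rather than TextProcessor methods. It assumes scanned_image is an image file path and uses python-docx for Word files (the library list above does not name one); the grayscale plus Otsu thresholding step is an illustrative preprocessing choice, not the script's documented behavior.

```python
from pathlib import Path


def extract_text_from_text_file(text_file_path):
    """Read a plain-text file, replacing any undecodable bytes."""
    return Path(text_file_path).read_text(encoding="utf-8", errors="replace")


def extract_text_from_image(scanned_image):
    """OCR a scanned image; scanned_image is assumed to be a file path."""
    import cv2  # OpenCV: load and preprocess the image
    import pytesseract  # wrapper around the Tesseract OCR engine

    image = cv2.imread(str(scanned_image))
    if image is None:
        raise FileNotFoundError(f"Could not read image: {scanned_image}")
    # Grayscale + Otsu binarization often helps Tesseract on scanned pages.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary)


def extract_text_from_pdf(pdf_path):
    """Concatenate the text layer of every page in a PDF."""
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    # Scanned pages without a text layer yield no text, so fall back to "".
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_text_from_word_document(docx_path):
    """Join the paragraphs of a .docx file (assumes python-docx is installed)."""
    import docx  # provided by the python-docx package

    document = docx.Document(docx_path)
    return "\n".join(paragraph.text for paragraph in document.paragraphs)
```

The third-party imports sit inside each function, so plain-text extraction still works on a machine that lacks Tesseract or python-docx.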

Summarization Methods:
• get_sentences_count(text, summary_length): Determines how many sentences the summary should contain, based on the desired summary length.
• summarize_text_sumy(text, sentences_count): Uses the Sumy library to generate a summary of the text containing the given number of sentences.
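A sketch of the two summarization helpers. The mapping from summary_length to a sentence count is hypothetical ("short", "medium", and "long" as 20%, 40%, and 60% of the sentences), and LSA is only one of several summarizers Sumy offers; which one the script actually uses is not stated.

```python
import re

# Hypothetical length choices mapped to a fraction of the original sentences.
LENGTH_RATIOS = {"short": 0.2, "medium": 0.4, "long": 0.6}


def get_sentences_count(text, summary_length):
    """Return how many sentences the summary should contain (at least 1)."""
    # Rough sentence split: terminal punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Unknown choices fall back to "medium".
    ratio = LENGTH_RATIOS.get(summary_length.lower(), LENGTH_RATIOS["medium"])
    return max(1, round(len(sentences) * ratio))


def summarize_text_sumy(text, sentences_count):
    """Extractive summary using Sumy's LSA summarizer."""
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.summarizers.lsa import LsaSummarizer

    # Sumy's English tokenizer relies on NLTK's "punkt" data being installed.
    parser = PlaintextParser.from_string(text, Tokenizer("english"))
    summary = LsaSummarizer()(parser.document, sentences_count)
    return " ".join(str(sentence) for sentence in summary)
```

Typical use chains the two: summarize_text_sumy(text, get_sentences_count(text, "short")).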

Text Conversion Methods:

• text_to_pdf(text, filename="summarized_text.pdf"): Converts the summarized text into a PDF document.
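A sketch of text_to_pdf using the FPDF class, assuming either the PyFPDF or fpdf2 package. FPDF's built-in core fonts such as Helvetica only cover Latin-1, so the hypothetical helper to_latin1 replaces other characters (including the U+2022 bullet) with "?"; keeping them would require registering a Unicode TrueType font with add_font.

```python
def to_latin1(text):
    """Replace characters that FPDF's core fonts cannot encode with '?'."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


def text_to_pdf(text, filename="summarized_text.pdf"):
    """Write text to a single-page-width PDF, wrapping long lines."""
    from fpdf import FPDF  # PyFPDF or fpdf2

    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Helvetica", size=12)
    # Width 0 spans the full printable width; multi_cell wraps the text.
    pdf.multi_cell(0, 8, to_latin1(text))
    pdf.output(filename)
    return filename
```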
Text Analysis Methods:

• text_in_words_and_sentences(text): Counts the number of words and sentences in the provided text.
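A dependency-free sketch of text_in_words_and_sentences. It swaps in regular-expression heuristics for the NLTK tokenizers the script presumably uses, so abbreviations such as "e.g." will be miscounted as sentence ends.

```python
import re


def text_in_words_and_sentences(text):
    """Return (word_count, sentence_count) using simple regex heuristics."""
    # Words: runs of word characters, keeping contractions like "It's" whole.
    words = re.findall(r"\b\w+(?:'\w+)?\b", text)
    # Sentences: split after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return len(words), len(sentences)
```

For example, text_in_words_and_sentences("Hello world. It's a test! Done?") returns (6, 3).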

DATE:11/01/2024

Documenting the Updated Code: Enhanced Text Processor

Purpose:
The code has been expanded with additional functionality: keyword extraction, keyword-based content summarization, and flexible text formatting.

Date-Specific Updates:

• Added methods for extracting keywords from the extracted text.
• Implemented content summarization based on selected keywords.
• Introduced text formatting options (bullet points or paragraphs).
• Enhanced the main script for user interaction.

Class Methods (Additions):

Keyword Extraction:

• extracted_text_words(text, num_keywords=5): Uses TF-IDF to extract the top keywords from the preprocessed text.
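A sketch of TF-IDF keyword ranking in plain Python, standing in for the scikit-learn TfidfVectorizer the script uses. Because TF-IDF needs a collection of documents, each sentence is treated as one document here (an assumption), and the stop-word list is a small illustrative stand-in for the script's undocumented preprocessing.

```python
import math
import re
from collections import Counter

# Illustrative stop words only; the script's real preprocessing is not documented.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
              "it", "for", "on", "with", "as", "by", "this", "that", "from"}


def extracted_text_words(text, num_keywords=5):
    """Rank terms by summed TF-IDF, treating each sentence as a document."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    docs = [[w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOP_WORDS]
            for s in sentences]
    n_docs = len(docs)
    # Document frequency: how many sentences contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            # Smoothed IDF, the same formula scikit-learn uses by default.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            scores[term] += (count / len(doc)) * idf
    return [term for term, _ in scores.most_common(num_keywords)]
```

With scikit-learn, a comparable starting point is TfidfVectorizer(stop_words="english").fit_transform(sentences), summing each column and reading term names from get_feature_names_out(); scikit-learn also L2-normalizes each row, so its scores will differ from this sketch.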
Summarization Based on Keywords:

• summarize_content(text, selected_keywords): Generates a summary by selecting the sentences that contain the specified keywords.
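A minimal sketch of keyword-based summarization: keep, in their original order, the sentences that contain any of the selected keywords. Case-insensitive whole-word matching is an assumption; multi-word keywords would need a phrase search instead.

```python
import re


def summarize_content(text, selected_keywords):
    """Keep, in order, the sentences that mention any selected keyword."""
    keywords = [k.lower() for k in selected_keywords]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    selected = []
    for sentence in sentences:
        words = set(re.findall(r"\w+", sentence.lower()))
        # Whole-word match, so "cat" does not match "category" or "cats".
        if any(k in words for k in keywords):
            selected.append(sentence)
    return " ".join(selected)
```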

Text Formatting:

• format_text(text, format_choice): Formats the text as either bullet points or paragraphs, based on the user's choice.
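A sketch of format_text, assuming format_choice is a string beginning with "bullet" or "paragraph"; the exact values the console accepts are not documented. Bullet mode here puts one sentence on each bullet.

```python
import re


def format_text(text, format_choice):
    """Render text as one bullet per sentence, or as a single paragraph."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    choice = format_choice.lower()
    if choice.startswith("bullet"):
        # U+2022 bullet character, one sentence per line.
        return "\n".join(f"\u2022 {s}" for s in sentences)
    if choice.startswith("paragraph"):
        # Rejoin sentences with single spaces, collapsing line breaks.
        return " ".join(sentences)
    raise ValueError("format_choice must start with 'bullet' or 'paragraph'")
```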
Note:
• The script is designed for flexible interaction: users can choose the summary length, the output format, and whether to use default or keyword-based summarization.
• Keyword extraction and keyword-based summarization extend the usefulness of the TextProcessor class.
• Users enter their preferences through the console for a customized experience.

DATE:12/01/2024

Text to Bullet Points Converter

Description:

• This Python script converts a given text into bullet points. The text is processed with the spaCy natural language processing library, and its sentences are combined in groups of three to form bullet points. Each bullet point is prefixed with the Unicode bullet character (U+2022).

Note:

• The script combines sentences in groups of three to create each bullet point.
• The Unicode bullet character (U+2022) is used as the prefix for each bullet point.
• The code can be customized for specific requirements, such as changing the number of sentences per bullet point or using a different bullet character.
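A sketch of the converter, assuming spaCy's small English model en_core_web_sm is installed. The grouping step is split into a pure function, group_into_bullets (a hypothetical name), so the group size and bullet character mentioned in the note can be changed without touching the NLP code.

```python
def group_into_bullets(sentences, per_bullet=3, bullet="\u2022"):
    """Join consecutive sentences in groups of per_bullet, one bullet each."""
    return [
        f"{bullet} " + " ".join(sentences[i:i + per_bullet])
        for i in range(0, len(sentences), per_bullet)
    ]


def text_to_bullets(text, per_bullet=3):
    """Split text into sentences with spaCy, then group them into bullets."""
    import spacy

    # Assumes the model was installed with: python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")
    sentences = [s.text.strip() for s in nlp(text).sents if s.text.strip()]
    return "\n".join(group_into_bullets(sentences, per_bullet))
```

For example, group_into_bullets(["A.", "B.", "C.", "D."]) yields two bullets: "• A. B. C." and "• D.".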
