

Remove Text from Images using CV2 and Keras-OCR
How to automatically modify images to make them text-free using Python

Carlo Borella


Published in Towards Data Science
7 min read · Nov 5, 2021


An example of before and after removing text using Cv2 and Keras. Source: image by the author processing an
image by morningbirdphoto from Pixabay.

Introduction
In this article I will discuss how to quickly remove text from images as a pre-processing step for an image classifier, or for a multi-modal text-and-image classifier involving images with text such as memes (for instance, the Hateful Memes Challenge by Facebook).

Removing text can be useful for a variety of reasons; for example, we can use the text-free images for data augmentation, as we can now pair each text-free image with a new text.

For this tutorial we will use OCR (Optical Character Recognition) to detect text inside images, and inpainting (the process of filling in missing parts of a photo to produce a complete image) to remove the text we detected.

The Process
In order to erase text from images we will go through three steps:

1. Identify text in the image and obtain the bounding box coordinates of each word, using Keras-ocr.

2. For each bounding box, apply a mask to tell the algorithm which part of the image
we should inpaint.

3. Finally, apply an inpainting algorithm to inpaint the masked areas of the image,
resulting in a text-free image, using cv2.

A representation of the process from an image with text to a text-free image. Source: Image by Author.

The Implementation
Brief overview of Keras-ocr
Keras-ocr provides out-of-the-box OCR models and an end-to-end training pipeline to build new OCR models (see: https://keras-ocr.readthedocs.io/en/latest/).

In this case we will use the pre-trained model, which works fairly well for our task.

Keras-ocr will automatically download the pre-trained weights for the detector and recognizer.

When we pass an image through the Keras-ocr pipeline, it returns a list of (word, box) tuples, where each box contains the (x, y) coordinates of the four corners of the word's bounding box.

Here is a quick example:

import matplotlib.pyplot as plt
import keras_ocr

pipeline = keras_ocr.pipeline.Pipeline()

# read an image from an image path (a jpg/png file or an image URL)
img = keras_ocr.tools.read(image_path)
# prediction_groups is a list of (word, box) tuples
prediction_groups = pipeline.recognize([img])
# plot the image with its annotations and boxes
keras_ocr.tools.drawAnnotations(image=img, predictions=prediction_groups[0])
Source: Image by Paul Steuber from Pixabay.

If we take a look at prediction_groups we will see that each element corresponds to a pair of word and box coordinates.

For example, prediction_groups[0][10] would look like:

('tuesday',
array([[ 986.2778 , 625.07764],
[1192.3856 , 622.7086 ],
[1192.8888 , 666.4836 ],
[ 986.78094, 668.8526 ]], dtype=float32))

The first element of the array corresponds to the coordinates of the top-left corner, the second element to the top-right corner, the third element to the bottom-right corner, and the fourth element to the bottom-left corner (you can verify this against the x and y values above).
A representation of a text bounding box, and its coordinates. Source: Image by Author.
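To sanity-check this corner ordering on your own image, you can print each detected word together with its four corners. This is a minimal sketch; it assumes the prediction_groups variable from the example above:

# inspect the detected words and their corner coordinates
for word, box in prediction_groups[0]:
    top_left, top_right, bottom_right, bottom_left = box
    print(word, top_left, top_right, bottom_right, bottom_left)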

Brief overview of cv2 inpaint functions


When applying an inpainting algorithm using OpenCV we need to provide two images:

1. The input image with the text we want to remove.

2. The mask image, which indicates where in the image the text we want to remove is located. The mask must have the same dimensions as the input image. Its non-zero pixels mark the areas of the input that contain text and will be inpainted, while areas with zero pixels won't be modified.

Cv2 features two possible inpainting algorithms and lets us apply rectangular, circular or line masks (see: https://opencv24-python-tutorials.readthedocs.io/en/latest/py_tutorials/py_photo/py_inpainting/py_inpainting.html).

In this case I decided to use line masks, as they are more flexible for covering text with different orientations (rectangular masks would only work well for words parallel or perpendicular to the x-axis, and circular masks would cover an area larger than necessary).
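For illustration, here is a small sketch of how the three mask shapes can be drawn with OpenCV; the image size and coordinates are made up purely for the example:

import cv2
import numpy as np

# blank single-channel mask for a hypothetical 400x600 image
mask = np.zeros((400, 600), dtype="uint8")

# rectangular mask: fills the axis-aligned box between two opposite corners
cv2.rectangle(mask, (50, 50), (200, 100), 255, -1)

# circular mask: fills a disc around a centre point
cv2.circle(mask, (300, 200), 40, 255, -1)

# line mask: a thick line can follow text at any orientation
cv2.line(mask, (350, 300), (550, 350), 255, 30)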

In order to apply the mask we need to provide the coordinates of the two endpoints of the line, and the thickness of the line:

One endpoint will be the mid-point between the top-left corner and the bottom-left corner of the box, while the other will be the mid-point between the top-right corner and the bottom-right corner.

For the thickness we will calculate the height of the box, i.e. the length of the edge between a top corner and the bottom corner below it.

import math
import numpy as np
import cv2

def midpoint(x1, y1, x2, y2):
    x_mid = int((x1 + x2) / 2)
    y_mid = int((y1 + y2) / 2)
    return (x_mid, y_mid)

# example of a line mask for the word "tuesday"
box = prediction_groups[0][10]
x0, y0 = box[1][0]
x1, y1 = box[1][1]
x2, y2 = box[1][2]
x3, y3 = box[1][3]

x_mid0, y_mid0 = midpoint(x1, y1, x2, y2)
x_mid1, y_mid1 = midpoint(x0, y0, x3, y3)
thickness = int(math.sqrt((x2 - x1)**2 + (y2 - y1)**2))

Now we can create our mask:

mask = np.zeros(img.shape[:2], dtype="uint8")
cv2.line(mask, (x_mid0, y_mid0), (x_mid1, y_mid1), 255, thickness)

We can also check the masked area to make sure it is working properly.
masked = cv2.bitwise_and(img, img, mask=mask)
plt.imshow(masked)

The masked area corresponding to the word “Tuesday”.

Finally, we can inpaint the image. In this case we will be using cv2.INPAINT_NS which
refers to the inpainting algorithm described in the paper “Navier-Stokes, Fluid
Dynamics, and Image and Video Inpainting”.

img_inpainted = cv2.inpaint(img, mask, 7, cv2.INPAINT_NS)
plt.imshow(img_inpainted)

As you can see, the word “Tuesday” was removed from the image.
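If you are curious about the other inpainting algorithm cv2 offers, cv2.INPAINT_TELEA (Telea's fast-marching method), you can run it on the same mask and compare the two results side by side. This is just a quick sketch reusing the img, mask and img_inpainted variables from above:

# inpaint the same masked area with Telea's fast-marching method
img_inpainted_telea = cv2.inpaint(img, mask, 7, cv2.INPAINT_TELEA)

# show the two algorithms side by side
fig, axes = plt.subplots(1, 2, figsize=(12, 6))
axes[0].imshow(img_inpainted)
axes[0].set_title("cv2.INPAINT_NS")
axes[1].imshow(img_inpainted_telea)
axes[1].set_title("cv2.INPAINT_TELEA")
plt.show()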

The Full Implementation
Now let's wrap it all up and create a function to inpaint text from any image.

We will just need to generate the list of boxes and iterate over them, masking and inpainting each text box.

import matplotlib.pyplot as plt
import keras_ocr
import cv2
import math
import numpy as np

def midpoint(x1, y1, x2, y2):
    x_mid = int((x1 + x2) / 2)
    y_mid = int((y1 + y2) / 2)
    return (x_mid, y_mid)

pipeline = keras_ocr.pipeline.Pipeline()

def inpaint_text(img_path, pipeline):
    # read the image
    img = keras_ocr.tools.read(img_path)
    # generate (word, box) tuples
    prediction_groups = pipeline.recognize([img])
    mask = np.zeros(img.shape[:2], dtype="uint8")
    for box in prediction_groups[0]:
        x0, y0 = box[1][0]
        x1, y1 = box[1][1]
        x2, y2 = box[1][2]
        x3, y3 = box[1][3]

        x_mid0, y_mid0 = midpoint(x1, y1, x2, y2)
        x_mid1, y_mid1 = midpoint(x0, y0, x3, y3)
        thickness = int(math.sqrt((x2 - x1)**2 + (y2 - y1)**2))

        # draw a thick line over the word onto the mask
        cv2.line(mask, (x_mid0, y_mid0), (x_mid1, y_mid1), 255, thickness)

    # inpaint the masked areas to remove the text
    img = cv2.inpaint(img, mask, 7, cv2.INPAINT_NS)
    return img
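We can now call the function on an image path and plot the output; the file name below is just a placeholder for your own image:

# 'my_image.jpg' is a placeholder path; replace it with your own file or URL
img_text_removed = inpaint_text('my_image.jpg', pipeline)
plt.imshow(img_text_removed)
plt.show()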

and here is the final result (before vs after):

I also included another couple of examples:


Source: image by the author generated by processing a meme from HackerNoon.

Source: image by the author generated by processing an image by Alfred Derks from Pixabay.

Note that if you want to save the image with cv2.imwrite you will need to swap the colour channels first (keras-ocr works with RGB images, while OpenCV assumes BGR), otherwise the colours will be inverted!

img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
cv2.imwrite('text_free_image.jpg', img_rgb)
In case you were interested in removing certain words only, an if-condition can be
included as follows:

Given a list of words to remove:

remove_list = ['tuesday', 'monday']

We can include the if-condition in the for-loop:

def inpaint_text(img_path, remove_list, pipeline):
    # read the image
    img = keras_ocr.tools.read(img_path)
    # generate (word, box) tuples
    prediction_groups = pipeline.recognize([img])
    mask = np.zeros(img.shape[:2], dtype="uint8")
    for box in prediction_groups[0]:
        # only mask words that appear in the remove list
        if box[0] in remove_list:
            x0, y0 = box[1][0]
            x1, y1 = box[1][1]
            x2, y2 = box[1][2]
            x3, y3 = box[1][3]

            x_mid0, y_mid0 = midpoint(x1, y1, x2, y2)
            x_mid1, y_mid1 = midpoint(x0, y0, x3, y3)
            thickness = int(math.sqrt((x2 - x1)**2 + (y2 - y1)**2))

            cv2.line(mask, (x_mid0, y_mid0), (x_mid1, y_mid1), 255, thickness)

    img = cv2.inpaint(img, mask, 7, cv2.INPAINT_NS)
    return img

This of course is just a quick, case-sensitive example of how to apply the inpainting to a specific list of words.
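If you want the match to be case-insensitive instead, one option is to lower-case both sides of the comparison (keras-ocr's default recognizer already returns lowercase words, so in practice it is mostly the remove list that needs normalising). A small sketch showing only the changed lines:

# normalise the remove list once...
remove_set = {word.lower() for word in remove_list}

# ...and compare against the lower-cased detected word inside the loop
for box in prediction_groups[0]:
    if box[0].lower() in remove_set:
        ...  # mask and inpaint as before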

End Notes
In this article, we discussed how to implement an algorithm to automatically remove text from images, using a pre-trained OCR model from Keras and an inpainting algorithm from cv2. The approach works fairly well for quickly removing text from images without the need to train a model for this specific task. It performs less well when a text box is close to other objects, as the inpainting may distort the surroundings.

Edit: the implementation was executed using Python 3.7. I have received some feedback about problems with OpenCV that occur when using other versions, such as Python 3.9; I would suggest trying 3.7 instead to fix the issue.

I appreciate any feedback and constructive criticism! My email is


[email protected]

Computer Vision Ocr Python Machine Learning Image Processing


Written by Carlo Borella


23 Followers · Writer for Towards Data Science

Lead Data Scientist at Huge Inc, Passionate about Social Media Data and Miniature art — Msc in Economics and
Msc in Research Methods.
