Remove Text From Images Using CV2 and Keras-OCR - by Carlo Borella - Towards Data Science
An example image before and after removing text using cv2 and Keras-ocr. Source: image by the author processing an
image by morningbirdphoto from Pixabay.
Introduction
In this article, I will discuss how to quickly remove text from images as a pre-processing step for an image classifier, or for a multi-modal text and image classifier that works with images containing text, such as memes (for instance, the Hateful Memes Challenge by Facebook).
Removing text can be useful for a variety of reasons. For example, we can use the text-free images for data augmentation, since each text-free image can now be paired with a new text.
For this tutorial we will use OCR (Optical Character Recognition) to detect text inside
images, and inpainting - the process where missing parts of a photo are filled in to
produce a complete image - to remove the text we detected.
The Process
In order to erase text from images we will go through three steps:
1. Identify the text in the image and obtain the bounding box coordinates of each word, using Keras-ocr.
2. For each bounding box, apply a mask to tell the algorithm which part of the image
we should inpaint.
3. Finally, use cv2 to apply an inpainting algorithm to the masked areas of the image, which yields a text-free image.
A representation of the process from an image with text to a text-free image. Source: Image by Author.
The Implementation
Brief overview of Keras-ocr
Keras-ocr provides out-of-the-box OCR models and an end-to-end training pipeline to build new OCR models (see: https://keras-ocr.readthedocs.io/en/latest/).
In this case we will use the pre-trained model, which works fairly well for our task.
Keras-ocr automatically downloads the pre-trained weights for the detector and the recognizer.
When we pass an image through the Keras-ocr pipeline, it returns a list of (word, box) tuples, where each box contains the (x, y) coordinates of the four corners of the word.
```python
import keras_ocr

# Downloads the pre-trained detector and recognizer weights on first use
pipeline = keras_ocr.pipeline.Pipeline()
# Read an image from an image path (a jpg/png file) or an image URL
img = keras_ocr.tools.read(image_path)
# prediction_groups holds one list of (word, box) tuples per input image
prediction_groups = pipeline.recognize([img])
# Plot the image with the detected boxes and recognized words
keras_ocr.tools.drawAnnotations(image=img, predictions=prediction_groups[0])
```
Source: Image by Paul Steuber from Pixabay.
```
('tuesday',
 array([[ 986.2778 ,  625.07764],
        [1192.3856 ,  622.7086 ],
        [1192.8888 ,  666.4836 ],
        [ 986.78094,  668.8526 ]], dtype=float32))
```
The first element of the array corresponds to the coordinates of the top-left corner, the second to the top-right corner, the third to the bottom-right corner, and the fourth to the bottom-left corner. In other words, the corners are listed clockwise starting from the top-left, as the sample coordinates above show.
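To make the corner order concrete, the sketch below unpacks the "tuesday" box printed above as top-left, top-right, bottom-right, bottom-left. It then derives the word's width and height from it. It only uses NumPy and the values from the sample output. The variable names are illustrative.

```python
import numpy as np

# Box for "tuesday" from the sample output (a 4x2 float32 array)
box = np.array([[ 986.2778 , 625.07764],
                [1192.3856 , 622.7086 ],
                [1192.8888 , 666.4836 ],
                [ 986.78094, 668.8526 ]], dtype=np.float32)

# Corners are listed clockwise, starting from the top-left
top_left, top_right, bottom_right, bottom_left = box

# Width: length of the top edge; height: length of the left edge
width = float(np.linalg.norm(top_right - top_left))
height = float(np.linalg.norm(bottom_left - top_left))
print(f"width: {width:.1f}px, height: {height:.1f}px")
```

The top-right corner has a larger x than the top-left one, and the bottom-left corner has a larger y. Image coordinates grow downwards, so this is what the clockwise order predicts.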
A representation of a text bounding box, and its coordinates. Source: Image by Author.
The cv2 inpainting function, cv2.inpaint, takes two images as input:
1. The input image, which contains the text we want to remove.
2. The mask image, which shows where the text we want to remove is located in the image. This second image must have the same height and width as the input, and must be a single-channel, 8-bit image. The mask has non-zero pixels over the areas of the input image that contain text and should be inpainted. Areas where the mask is zero won't be modified.
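The requirements on the two inputs can be checked before calling cv2.inpaint. Below is a minimal sketch using a hypothetical helper, check_inpaint_inputs, which is not part of OpenCV. It builds an empty mask matching a 100x200 RGB image and marks a made-up text region with 255.

```python
import numpy as np


def check_inpaint_inputs(img, mask):
    """Hypothetical helper: verify that img and mask suit cv2.inpaint."""
    if mask.dtype != np.uint8 or mask.ndim != 2:
        raise ValueError("mask must be a single-channel 8-bit image")
    if mask.shape != img.shape[:2]:
        raise ValueError("mask must have the same height and width as img")


# A 100x200 RGB image and a matching, empty mask (0 = leave the pixel alone)
img = np.zeros((100, 200, 3), dtype=np.uint8)
mask = np.zeros(img.shape[:2], dtype=np.uint8)
# Non-zero pixels mark the (made-up) text region to inpaint
mask[40:60, 50:150] = 255

check_inpaint_inputs(img, mask)
print(np.count_nonzero(mask), "pixels will be inpainted")
```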
cv2 offers two inpainting algorithms, cv2.INPAINT_TELEA and cv2.INPAINT_NS. The mask itself can be drawn as a rectangle, a circle or a line with cv2's drawing functions (cv2.rectangle, cv2.circle, cv2.line). See: https://opencv24-python-tutorials.readthedocs.io/en/latest/py_tutorials/py_photo/py_inpainting/py_inpainting.html
In this case, I decided to use line masks because they cover text in different orientations more flexibly. Rectangular masks only work well for words parallel or perpendicular to the x-axis, and circular masks cover a larger area than necessary.
In order to apply the mask we need to provide the coordinates of the starting and the
ending points of the line, and the thickness of the line:
The start point will be the midpoint between the top-left and bottom-left corners of the box, while the end point will be the midpoint between the top-right and bottom-right corners.
For the thickness, we will use the length of the line between the top-left and bottom-left corners, i.e. the height of the box.
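```python
import math

import cv2
import numpy as np

def midpoint(x1, y1, x2, y2):
    # Integer pixel coordinates of the midpoint between (x1, y1) and (x2, y2)
    x_mid = int((x1 + x2) / 2)
    y_mid = int((y1 + y2) / 2)
    return (x_mid, y_mid)

# Example of a line mask for the word "Tuesday"
box = prediction_groups[0][10]
x0, y0 = box[1][0]  # top-left
x1, y1 = box[1][1]  # top-right
x2, y2 = box[1][2]  # bottom-right
x3, y3 = box[1][3]  # bottom-left

# Start and end points: midpoints of the left and right edges of the box
start = midpoint(x0, y0, x3, y3)
end = midpoint(x1, y1, x2, y2)
# Thickness: length of the left edge (top-left to bottom-left)
thickness = int(math.sqrt((x3 - x0) ** 2 + (y3 - y0) ** 2))

# Single-channel mask with the line drawn in white (255) over the word
mask = np.zeros(img.shape[:2], dtype="uint8")
cv2.line(mask, start, end, 255, thickness)
```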
We can also check the masked area to make sure it is working properly.
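```python
import matplotlib.pyplot as plt

# Keep only the pixels under the mask, i.e. the area that will be inpainted
masked = cv2.bitwise_and(img, img, mask=mask)
plt.imshow(masked)
```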
Finally, we can inpaint the image. In this case we will be using cv2.INPAINT_NS which
refers to the inpainting algorithm described in the paper “Navier-Stokes, Fluid
Dynamics, and Image and Video Inpainting”.
Putting It All Together
Now let's put it all together and create a function that removes text from any image. We only need to generate the list of boxes, then iterate over them, masking and inpainting each text box.
Source: image by the author generated by processing an image by Alfred Derks from Pixabay.
Note that keras-ocr loads images in RGB order, while cv2.imwrite expects BGR. If you want to save the image with cv2, you will need to convert it first, otherwise the red and blue channels will be swapped!
This, of course, is just a quick, case-sensitive example of how to apply inpainting to a specific list of words only.
End Notes
In this article, we discussed how to implement an algorithm that automatically removes text from images, using a pre-trained OCR model from Keras-ocr and an inpainting algorithm from cv2. The algorithm works fairly well for quickly removing text from images without training a model for this specific task. It generally performs worse when a text box is close to other objects, because inpainting may distort the surroundings.
Edit: the implementation was run with Python 3.7. A few readers reported problems with OpenCV when using other Python versions, such as Python 3.9. If you run into them, I would suggest trying Python 3.7 instead.
Lead Data Scientist at Huge Inc. Passionate about social media data and miniature art. MSc in Economics and MSc in Research Methods.