title | description | type | page_title | slug | tags | res_type | category | ticketid |
---|---|---|---|---|---|---|---|---|
Extracting Text Within a Specific Rectangle in PDF Documents | Learn how to extract text from a specified rectangle area within PDF pages using RadPdfProcessing. | how-to | How to Extract Text from a Specified Area in PDF Pages | extract-text-specific-rectangle-pdf-radpdfprocessing | radpdfprocessing, pdf, textfragment, cropbox, extract, text | kb | knowledge-base | 1653594 |
Version | Product | Author |
---|---|---|
2024.2.426 | RadPdfProcessing | Desislava Yordanova |
Learn how to extract the text from specific rectangular areas within PDF pages.
To extract text from a specific rectangle or crop box within a PDF page, use the [TextFragment]({%slug radpdfprocessing-model-textfragment%}) class together with its `Position` property, which holds a [MatrixPosition]({%slug radpdfprocessing-concepts-position%}). The following code snippet loads a PDF document, defines a rectangle for the area from which text should be extracted, and iterates through the text fragments on each page. If the origin of a fragment's position (the `OffsetX` and `OffsetY` values of its matrix) lies inside the rectangle, the fragment's text is written to the output. Fragments are selected by their starting point only, so a fragment that begins outside the rectangle is skipped even if part of it extends into the area.
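
    using System.Diagnostics;
    using System.IO;
    using System.Linq;
    using System.Windows; // Rect; with the .NET Standard assemblies use Telerik.Documents.Primitives instead
    using Telerik.Windows.Documents.Fixed.FormatProviders.Pdf;
    using Telerik.Windows.Documents.Fixed.Model;
    using Telerik.Windows.Documents.Fixed.Model.Data;
    using Telerik.Windows.Documents.Fixed.Model.Text;

    static void Main(string[] args)
    {
        string originalFilePath = @"WinForms PdfViewer.pdf";
        PdfFormatProvider provider = new PdfFormatProvider();
        RadFixedDocument croppedDocument = provider.Import(File.ReadAllBytes(originalFilePath));

        // Rect(x, y, width, height), derived from the size of the first page.
        var pageSize = croppedDocument.Pages.First().Size;
        Rect middleRectangle = new Rect(pageSize.Width / 2, pageSize.Height / 3, pageSize.Width, pageSize.Height / 3);

        foreach (RadFixedPage currentPage in croppedDocument.Pages)
        {
            foreach (var contentElement in currentPage.Content)
            {
                // Only text fragments carry text; skip paths, images, and other content.
                TextFragment textFragment = contentElement as TextFragment;
                if (textFragment != null)
                {
                    string currentText = textFragment.Text;
                    if (currentText == " ")
                    {
                        continue;
                    }

                    // Position is not guaranteed to be a MatrixPosition, so guard against null.
                    MatrixPosition position = textFragment.Position as MatrixPosition;
                    if (position != null && middleRectangle.Contains(position.Matrix.OffsetX, position.Matrix.OffsetY))
                    {
                        Debug.Write(currentText);
                    }
                }
            }
        }
    }
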
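
The selection rule used above can also be tried in isolation, without a PDF file or the library. The following is a minimal sketch under stated assumptions: `Fragment`, `Area`, and `RegionFilter` are hypothetical stand-in types written for this illustration and are not part of RadPdfProcessing, and the page size and fragment coordinates are made-up sample values. It assumes C# 10 or later for `record struct`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a text fragment: its text plus the origin of its position.
public readonly record struct Fragment(string Text, double X, double Y);

// Hypothetical stand-in for a rectangle defined by x, y, width, and height.
public readonly record struct Area(double X, double Y, double Width, double Height)
{
    // True when the point lies inside the rectangle or on its edge.
    public bool Contains(double px, double py) =>
        px >= X && px <= X + Width && py >= Y && py <= Y + Height;
}

public static class RegionFilter
{
    // Concatenates the text of all fragments whose origin lies inside the area,
    // skipping fragments that consist of a single space.
    public static string TextInside(IEnumerable<Fragment> fragments, Area area) =>
        string.Concat(fragments
            .Where(f => f.Text != " ")
            .Where(f => area.Contains(f.X, f.Y))
            .Select(f => f.Text));
}

public static class Program
{
    public static void Main()
    {
        // Made-up 600 x 900 page; the rectangle uses the same formula as the snippet above.
        var area = new Area(600 / 2.0, 900 / 3.0, 600, 900 / 3.0);

        var fragments = new[]
        {
            new Fragment("Title", 50, 40),    // above the area
            new Fragment("left ", 100, 400),  // left of the area
            new Fragment("right ", 350, 400), // inside
            new Fragment(" ", 360, 400),      // single space, skipped
            new Fragment("text", 400, 450),   // inside
            new Fragment("footer", 350, 850), // below the area
        };

        Console.WriteLine(RegionFilter.TextInside(fragments, area)); // prints "right text"
    }
}
```

Because only the origin of each fragment is tested, the `left ` fragment is dropped even if its glyphs would visually reach into the area. If that matters for a given document, the check would need to compare the fragment's full bounds instead of a single point.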
The cropped middle part of the page is shown in the screenshot below:
The detected text is printed in the Output console:
- [RadPdfProcessing Documentation]({%slug radpdfprocessing-overview%})
- [TextFragment]({%slug radpdfprocessing-model-textfragment%})
- [MatrixPosition]({%slug radpdfprocessing-concepts-position%})