Math El
EXPERIENTIAL LEARNING
1. TEAM INTRODUCTION
2. LITERATURE SURVEY
3. METHODOLOGY
4. IMPLEMENTATION
5. CONCLUSION
INTRODUCTION
● Image Capture: The system captures images using a Raspberry Pi camera connected to the CSI
port. The camera is moved over printed text to capture clear and high-quality images.
● Pre-processing: The captured images undergo pre-processing to enhance clarity and remove
noise. This includes thresholding, where grayscale images are converted to binary images based
on a specified threshold, and blurring to eliminate noise using techniques like median filtering.
● Optical Character Recognition (OCR): The pre-processed images are then passed through an
OCR engine, in this case, Tesseract OCR. Tesseract analyzes the images, detects individual
characters, and segments them into words. It employs x-height normalization to distinguish
capital letters from lowercase text.
● Text-to-Speech Conversion: The recognized text is then converted into speech using eSpeak, an
open-source speech synthesizer. This process involves text analysis, phonetic
analysis, prosodic analysis, and speech production. The synthesized speech output is then ready
for playback.
● Speech Output: Finally, the synthesized speech output is played through the audio jack of the
Raspberry Pi board, allowing users to hear the converted text through headphones.
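The five steps above can be wired together as a short script. This is a minimal sketch, assuming OpenCV (`cv2`), `pytesseract`, and the `espeak` command-line tool are installed on the Pi; the file name, threshold value, and speaking rate are illustrative.

```python
import subprocess

def preprocess(image_path):
    """Grayscale, median-blur, and threshold the captured image (requires OpenCV)."""
    import cv2
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    img = cv2.medianBlur(img, 3)                      # remove salt-and-pepper noise
    _, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
    return binary

def recognize(binary_image):
    """Run Tesseract OCR on the pre-processed image (requires pytesseract)."""
    import pytesseract
    return pytesseract.image_to_string(binary_image)

def espeak_command(text, wpm=150):
    """Build the eSpeak invocation; audio plays through the Pi's audio jack."""
    return ["espeak", "-s", str(wpm), text]

def speak(text):
    subprocess.run(espeak_command(text), check=True)
```

On the device, the stages would chain as `speak(recognize(preprocess("capture.jpg")))`, with `capture.jpg` coming from the CSI camera.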
IMPLEMENTATION
OCR Module:
Image Processing: OCR often begins with image preprocessing, which includes
techniques like thresholding, blurring, and edge detection. These techniques involve
mathematical operations such as convolution, averaging, and gradient calculation.
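As a concrete illustration of these operations, the sketch below applies a 3×3 averaging (box-blur) convolution followed by a global threshold to a tiny grayscale grid; the grid values, kernel, and threshold are illustrative.

```python
def convolve3x3(image, kernel):
    """Apply a 3x3 convolution to the interior pixels of a 2D grid."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]                   # border pixels left unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    acc += kernel[dy + 1][dx + 1] * image[y + dy][x + dx]
            out[y][x] = acc
    return out

def threshold(image, t):
    """Binarize: pixel -> 255 if above threshold t, else 0."""
    return [[255 if px > t else 0 for px in row] for row in image]

BOX_BLUR = [[1 / 9] * 3 for _ in range(3)]            # averaging kernel

noisy = [
    [0,   0,   0,   0,   0],
    [0, 200, 210, 190,   0],
    [0, 205,   0, 200,   0],   # single dropped pixel (noise)
    [0, 195, 205, 210,   0],
    [0,   0,   0,   0,   0],
]
blurred = convolve3x3(noisy, BOX_BLUR)
binary = threshold(blurred, 100)   # binary[2][2] == 255: the dropped pixel is recovered
```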
Feature Extraction: In OCR, characters need to be distinguished from the background and other
objects in the image. Mathematical algorithms, such as histogram analysis, connected component
analysis, and contour detection, are used to extract relevant features of characters, such as shape,
size, and orientation.
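A minimal sketch of connected-component analysis: flood-filling a binary grid to label each foreground blob and extract its bounding box (a proxy for character position and size). Pure Python, using an illustrative 4-connected neighborhood.

```python
def connected_components(binary):
    """Label 4-connected foreground blobs; return bounding boxes (min_x, min_y, max_x, max_y)."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not labels[y][x]:
                label = len(boxes) + 1
                stack = [(y, x)]
                box = [x, y, x, y]
                while stack:                          # iterative flood fill
                    cy, cx = stack.pop()
                    if not (0 <= cy < h and 0 <= cx < w):
                        continue
                    if not binary[cy][cx] or labels[cy][cx]:
                        continue
                    labels[cy][cx] = label
                    box[0] = min(box[0], cx); box[1] = min(box[1], cy)
                    box[2] = max(box[2], cx); box[3] = max(box[3], cy)
                    stack += [(cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)]
                boxes.append(tuple(box))
    return boxes

# Two separate "characters" in a 3x7 binary image
img = [
    [1, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
]
boxes = connected_components(img)   # -> [(0, 0, 1, 1), (5, 0, 5, 2)]
```

In a real OCR pipeline the boxes would then be filtered by size and aspect ratio to discard noise blobs before character classification.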
Error Correction:
OCR systems often incorporate mathematical algorithms for error correction, which may
involve techniques like error detection and correction codes, probabilistic models, or
contextual analysis to improve accuracy.
Hamming Codes: These codes add parity bits to the data to detect and correct single-bit
errors. The parity bits are calculated based on specific bit positions in the data.
Reed-Solomon Codes: These codes are used for correcting multiple errors in data,
commonly employed in barcode and QR code scanning applications. They work by adding
redundancy to the data, enabling the correction of errors even in the presence of a significant
number of corrupted bits.
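As a worked example of the Hamming scheme described above, the sketch below encodes a 4-bit nibble into a Hamming(7,4) codeword, flips one bit, and corrects it. The bit layout (parity bits at positions 1, 2, and 4) follows the classic construction; the sample data bits are illustrative.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1,d2,d3,d4] as [p1,p2,d1,p3,d2,d3,d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4        # covers codeword positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4        # covers codeword positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4        # covers codeword positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Recompute the parity checks; the syndrome gives the 1-based error position."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 0 means no single-bit error
    c = c[:]
    if syndrome:
        c[syndrome - 1] ^= 1          # flip the erroneous bit
    return c

word = hamming74_encode([1, 0, 1, 1])
corrupted = word[:]
corrupted[4] ^= 1                     # inject a single-bit error at position 5
assert hamming74_correct(corrupted) == word
```

Reed-Solomon coding follows the same redundancy idea but operates on multi-bit symbols over a finite field, which is why it can repair bursts of corrupted bits rather than a single flipped bit.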
[Figure: camera and captured image]
[Figure: recognized text and audio output]
CONCLUSION
In conclusion, the project presents an innovative solution leveraging
Raspberry Pi technology for real-time text-to-speech conversion, aimed at
enhancing accessibility for visually impaired individuals. Through the
integration of computer vision, deep learning, and text synthesis techniques,
the system offers a compact and portable means to process text input and
generate clear and accurate speech output. Its efficient processing
capabilities and real-time performance ensure immediate access to
information from printed text, thus empowering users with greater
independence and inclusion in daily activities. Furthermore, the project's
potential for future development and expansion, such as knowledge
distillation and mobile deployment, signifies its contribution to advancing
assistive technology and improving the quality of life for individuals with
visual impairments.
THANK YOU!