
PHASE 3: COMMUNICATION AND FUTURE EXPLORATION

1. ABSTRACT:
The need for seamless communication across different languages is more
pressing than ever. The Multilanguage Muse project aims to harness the power of
artificial intelligence to provide a comprehensive solution for translating both
audio and text files from one language to another. This innovative system leverages
state-of-the-art natural language processing (NLP) and speech recognition
technologies to deliver accurate and contextually appropriate translations.

The core of the Multilanguage Muse is an AI-driven translation engine that integrates advanced machine learning models trained on vast multilingual datasets. These models are capable of understanding and processing various languages, ensuring high-quality translations that preserve the meaning and nuances of the original content. The system supports a wide range of languages, making it a versatile tool for diverse user needs.

Key features of the Multilanguage Muse include:

● Text Translation: The system can handle various text formats, including
documents, emails, and web content. It ensures that translations are not only
accurate but also contextually appropriate, taking into account idiomatic
expressions and cultural nuances.

● Audio Translation: Utilizing sophisticated speech recognition algorithms, the system converts spoken language into text, which is then translated into the target language. The translated text can be synthesized back into speech, providing a seamless audio translation experience (a minimal synthesis sketch follows this list).

● Image Text Translation: By employing advanced OCR technology, the system extracts text from images, such as scanned documents, photographs, or screenshots. This text is then translated into the desired language, enabling users to access the content of images regardless of the language it was originally created in.
● User-Friendly Interface: The Multilanguage Muse offers an intuitive user
interface that allows users to easily upload files and select source and target
languages. Real-time translation feedback and customization options
enhance user control and satisfaction.

● Scalability and Integration: Designed with scalability in mind, the system can handle large volumes of data and integrate with other applications and platforms, making it suitable for both personal and enterprise use.
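
A minimal sketch of the synthesis step referenced in the Audio Translation feature, assuming the gTTS library for text-to-speech (the report itself does not name a synthesis engine; the recognition and translation steps appear in the sample code later in this report):

# Hypothetical synthesis step: turn translated text back into speech.
# Assumes gTTS (pip install gTTS); the project does not specify a TTS engine.
from gtts import gTTS

translated_text = "Bonjour tout le monde"  # output of the translation step
gTTS(translated_text, lang="fr").save("translated_audio.mp3")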

This project demonstrates the potential of AI in breaking down language barriers and fostering global communication. By providing reliable and efficient translation services, the Multilanguage Muse contributes to greater accessibility and understanding in a multilingual world.

2. SYSTEM REQUIREMENTS:

2.1 HARDWARE REQUIREMENTS


1. High-performance computing hardware (e.g., multi-core CPU, GPU, or
specialized AI accelerators like TPUs) for training and inference tasks.
2. RAM: 4 GB or higher

2.2 SOFTWARE REQUIREMENTS


1. Operating System: Windows, Linux, or macOS.
2. Development Environment: Transformers, MBart, OpenAI-whisper, Flask, React, OCR, Tesseract

3. TOOLS AND VERSIONS:

1. Google Colaboratory
2. Transformers - 3.5.0
3. OpenAI - 0.26.5
4. EasyOCR - 1.7.1
5. Pytesseract - 0.3.10
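
Assuming a pip-based environment such as Google Colaboratory, the versions above could be pinned roughly as follows (a sketch; note that the facebook/nllb-200 checkpoints used in the sample code generally require a newer transformers release than 3.5.0, around 4.21 or later):

pip install transformers openai==0.26.5 easyocr==1.7.1 pytesseract==0.3.10
pip install openai-whisper pdf2image googletrans==4.0.0-rc1
# System binaries for OCR and PDF rendering (Debian/Ubuntu and Colab):
apt-get install -y tesseract-ocr poppler-utils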
4. FLOWCHART:

5. CODE IMPLEMENTATION (SAMPLE CODE):

text2text.py:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# Human-readable language names mapped to NLLB-200 language codes.
languages = {
"Acehnese (Arabic script)": "ace_Arab",
"Acehnese (Latin script)": "ace_Latn",
"Mesopotamian Arabic": "acm_Arab",
"Ta’izzi-Adeni Arabic": "acq_Arab",
"Tunisian Arabic": "aeb_Arab",
"Afrikaans": "afr_Latn",
"South Levantine Arabic": "ajp_Arab",
"Akan": "aka_Latn",
"Amharic": "amh_Ethi",
"North Levantine Arabic": "apc_Arab",
"Modern Standard Arabic": "arb_Arab",
"Modern Standard Arabic (Romanized)": "arb_Latn",
"Najdi Arabic": "ars_Arab",
"Moroccan Arabic": "ary_Arab",
"Egyptian Arabic": "arz_Arab",
"Assamese": "asm_Beng",
"Asturian": "ast_Latn",
"Awadhi": "awa_Deva",
"Central Aymara": "ayr_Latn",
"South Azerbaijani": "azb_Arab",
"North Azerbaijani": "azj_Latn",
"Bashkir": "bak_Cyrl",
"Bambara": "bam_Latn",
"Balinese": "ban_Latn",
"Belarusian": "bel_Cyrl",
"Bemba": "bem_Latn",
"Bengali": "ben_Beng",
"Bhojpuri": "bho_Deva",
"Banjar (Arabic script)": "bjn_Arab",
"Banjar (Latin script)": "bjn_Latn",
"Standard Tibetan": "bod_Tibt",
"Bosnian": "bos_Latn",
"Buginese": "bug_Latn",
"Bulgarian": "bul_Cyrl",
"Catalan": "cat_Latn",
"Cebuano": "ceb_Latn",
"Czech": "ces_Latn",
"Chokwe": "cjk_Latn",
"Central Kurdish": "ckb_Arab",
"Crimean Tatar": "crh_Latn",
"Welsh": "cym_Latn",
"Danish": "dan_Latn",
"German": "deu_Latn",
"Southwestern Dinka": "dik_Latn",
"Dyula": "dyu_Latn",
"Dzongkha": "dzo_Tibt",
"Greek": "ell_Grek",
"English": "eng_Latn",
"Esperanto": "epo_Latn",
"Estonian": "est_Latn",
"Basque": "eus_Latn",
"Ewe": "ewe_Latn",
"Faroese": "fao_Latn",
"Fijian": "fij_Latn",
"Finnish": "fin_Latn",
"Fon": "fon_Latn",
"French": "fra_Latn",
"Friulian": "fur_Latn",
"Nigerian Fulfulde": "fuv_Latn",
"Scottish Gaelic": "gla_Latn",
"Irish": "gle_Latn",
"Galician": "glg_Latn",
"Guarani": "grn_Latn",
"Gujarati": "guj_Gujr",
"Haitian Creole": "hat_Latn",
"Hausa": "hau_Latn",
"Hebrew": "heb_Hebr",
"Hindi": "hin_Deva",
"Chhattisgarhi": "hne_Deva",
"Croatian": "hrv_Latn",
"Hungarian": "hun_Latn",
"Armenian": "hye_Armn",
"Igbo": "ibo_Latn",
"Ilocano": "ilo_Latn",
"Indonesian": "ind_Latn",
"Icelandic": "isl_Latn",
"Italian": "ita_Latn",
"Javanese": "jav_Latn",
"Japanese": "jpn_Jpan",
"Kabyle": "kab_Latn",
"Jingpho": "kac_Latn",
"Kamba": "kam_Latn",
"Kannada": "kan_Knda",
"Kashmiri (Arabic script)": "kas_Arab",
"Kashmiri (Devanagari script)": "kas_Deva",
"Georgian": "kat_Geor",
"Central Kanuri (Arabic script)": "knc_Arab",
"Central Kanuri (Latin script)": "knc_Latn",
"Kazakh": "kaz_Cyrl",
"Kabiyè": "kbp_Latn",
"Kabuverdianu": "kea_Latn",
"Khmer": "khm_Khmr",
"Kikuyu": "kik_Latn",
"Kinyarwanda": "kin_Latn",
"Kyrgyz": "kir_Cyrl",
"Kimbundu": "kmb_Latn",
"Northern Kurdish": "kmr_Latn",
"Kikongo": "kon_Latn",
"Korean": "kor_Hang",
"Lao": "lao_Laoo",
"Ligurian": "lij_Latn",
"Limburgish": "lim_Latn",
"Lingala": "lin_Latn",
"Lithuanian": "lit_Latn",
"Lombard": "lmo_Latn",
"Latgalian": "ltg_Latn",
"Luxembourgish": "ltz_Latn",
"Luba-Kasai": "lua_Latn",
"Ganda": "lug_Latn",
"Luo": "luo_Latn",
"Mizo": "lus_Latn",
"Standard Latvian": "lvs_Latn",
"Magahi": "mag_Deva",
"Maithili": "mai_Deva",
"Malayalam": "mal_Mlym",
"Marathi": "mar_Deva",
"Minangkabau (Arabic script)": "min_Arab",
"Minangkabau (Latin script)": "min_Latn",
"Macedonian": "mkd_Cyrl",
"Plateau Malagasy": "plt_Latn",
"Maltese": "mlt_Latn",
"Meitei (Bengali script)": "mni_Beng",
"Halh Mongolian": "khk_Cyrl",
"Mossi": "mos_Latn",
"Maori": "mri_Latn",
"Burmese": "mya_Mymr",
"Dutch": "nld_Latn",
"Norwegian Nynorsk": "nno_Latn",
"Norwegian Bokmål": "nob_Latn",
"Nepali": "npi_Deva",
"Northern Sotho": "nso_Latn",
"Nuer": "nus_Latn",
"Nyanja": "nya_Latn",
"Occitan": "oci_Latn",
"West Central Oromo": "gaz_Latn",
"Odia": "ory_Orya",
"Pangasinan": "pag_Latn",
"Eastern Panjabi": "pan_Guru",
"Papiamento": "pap_Latn",
"Western Persian": "pes_Arab",
"Polish": "pol_Latn",
"Portuguese": "por_Latn",
"Dari": "prs_Arab",
"Southern Pashto": "pbt_Arab",
"Ayacucho Quechua": "quy_Latn",
"Romanian": "ron_Latn",
"Rundi": "run_Latn",
"Russian": "rus_Cyrl",
"Sango": "sag_Latn",
"Sanskrit": "san_Deva",
"Santali": "sat_Olck",
"Sicilian": "scn_Latn",
"Shan": "shn_Mymr",
"Sinhala": "sin_Sinh",
"Slovak": "slk_Latn",
"Slovenian": "slv_Latn",
"Samoan": "smo_Latn",
"Shona": "sna_Latn",
"Sindhi": "snd_Arab",
"Somali": "som_Latn",
"Southern Sotho": "sot_Latn",
"Spanish": "spa_Latn",
"Tosk Albanian": "als_Latn",
"Sardinian": "srd_Latn",
"Serbian": "srp_Cyrl",
"Swati": "ssw_Latn",
"Sundanese": "sun_Latn",
"Swedish": "swe_Latn",
"Swahili": "swh_Latn",
"Silesian": "szl_Latn",
"Tamil": "tam_Taml",
"Tatar": "tat_Cyrl",
"Telugu": "tel_Telu",
"Tajik": "tgk_Cyrl",
"Tagalog": "tgl_Latn",
"Thai": "tha_Thai",
"Tigrinya": "tir_Ethi",
"Tamasheq (Latin script)": "taq_Latn",
"Tamasheq (Tifinagh script)": "taq_Tfng",
"Tok Pisin": "tpi_Latn",
"Tswana": "tsn_Latn",
"Tsonga": "tso_Latn",
"Turkmen": "tuk_Latn",
"Tumbuka": "tum_Latn",
"Turkish": "tur_Latn",
"Twi": "twi_Latn",
"Central Atlas Tamazight": "tzm_Tfng",
"Uyghur": "uig_Arab",
"Ukrainian": "ukr_Cyrl",
"Umbundu": "umb_Latn",
"Urdu": "urd_Arab",
"Northern Uzbek": "uzn_Latn",
"Venetian": "vec_Latn",
"Vietnamese": "vie_Latn",
"Waray": "war_Latn",
"Wolof": "wol_Latn",
"Xhosa": "xho_Latn",
"Eastern Yiddish": "ydd_Hebr",
"Yoruba": "yor_Latn",
"Yue Chinese": "yue_Hant",
"Chinese (Simplified)": "zho_Hans",
"Chinese (Traditional)": "zho_Hant",
"Standard Malay": "zsm_Latn",
"Zulu": "zul_Latn"
}

def translate(src, tar, given_text):
    # Load the pretrained NLLB-200 checkpoint and its tokenizer.
    model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")
    tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")

    # Build a translation pipeline for the requested language pair.
    translator = pipeline("translation", model=model, tokenizer=tokenizer,
                          src_lang=languages[src], tgt_lang=languages[tar],
                          max_length=400)
    output_text = translator(given_text, max_length=700)

    print(output_text[0]["translation_text"])
    return output_text[0]["translation_text"]

translate("Tunisian Arabic" , "English" ,"‫سنتناء ولن كل زيارة بمزيد من الطفصيل زيارة‬


‫)"األولى‬

File_to_text.py:

# Alternative for PDFs that already carry a text layer (pypdf):
# from pypdf import PdfReader
# reader = PdfReader("tamil.pdf")
# for page in reader.pages:
#     text = page.extract_text()
#     print(text)

from PIL import Image
from pytesseract import pytesseract
import enum
import text2text

class OS(enum.Enum):
    Mac = 0
    Windows = 1

class Language(enum.Enum):
    ENG = 'eng'
    RUS = 'rus'
    ITA = 'ita'
    TAM = 'tam'

class ImageReader:

    def __init__(self, os: OS):
        if os == OS.Mac:
            # On macOS, the tesseract binary is expected to be on the PATH.
            pass
        if os == OS.Windows:
            # On Windows, point pytesseract at the installed binary.
            windows_path = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
            pytesseract.tesseract_cmd = windows_path
            print("running on: windows\n")

    def extract_text(self, image: str, lang: str) -> str:
        # Open the image and run Tesseract OCR in the given language.
        img = Image.open(image)
        extracted_text = pytesseract.image_to_string(img, lang=lang)
        return extracted_text

if __name__ == "__main__":
    ir = ImageReader(OS.Windows)
    path = "russian.png"
    if path.endswith(".pdf"):
        # Scanned PDF: render each page to an image, then OCR and translate it.
        from pdf2image import convert_from_path

        images = convert_from_path(path)
        for i in range(len(images)):
            page_path = 'page' + str(i) + '.jpg'
            images[i].save(page_path, 'JPEG')
            text = ir.extract_text(page_path, lang=Language.TAM.value)
            text2text.translate("Tamil", "English", text)
    else:
        text = ir.extract_text(path, lang=Language.RUS.value)
        print(text)
        text2text.translate("Russian", "English", text)

Speech_To_Text.py:

import whisper
import text2text

model = whisper.load_model("base")

# Preview the audio in the notebook (IPython only).
from IPython.display import Audio
Audio("Ar_F114_Rania (mp3cut.net).mp3")

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("Ar_F114_Rania (mp3cut.net).mp3")
audio = whisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)

# print the recognized text, then translate it
print(result.text)
text2text.translate("Tunisian Arabic", "English", result.text)

Image to text (Colab notebook):

pip install easyocr googletrans==4.0.0-rc1

import matplotlib.pyplot as plt
import cv2
import easyocr
from pylab import rcParams
from IPython.display import Image
from googletrans import Translator

rcParams['figure.figsize'] = 8, 16

# OCR reader for English text.
reader = easyocr.Reader(['en'])

# Display the input image in the notebook.
Image("/content/img1.jpg")

# Each result is a (bounding-box, text, confidence) triple.
output = reader.readtext("/content/img1.jpg")
output

texts = [item[1] for item in output]
print(texts)

# Translate each detected string into Tamil ('ta').
translator = Translator()
translated_texts = [translator.translate(text, dest='ta').text for text in texts]

for original, translated in zip(texts, translated_texts):
    print(f"Original: {original} => Translated: {translated}")

6. PROJECT HURDLES:
● While loading Facebook's NLLB model, I ran into many errors; after installing the correct modules and dependencies, the errors were finally resolved.
● Since the model is so large, my laptop's processor could not handle it, so each text translation took a very long time to execute (see the sketch after this list).
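
One mitigation for the slow CPU inference noted above is to run the pipeline on a GPU (for example the free Colab GPU; Google Colaboratory already appears in the tools list). A hedged sketch; the device argument selects CUDA device 0 when one is available:

import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # -1 means CPU
translator = pipeline("translation",
                      model="facebook/nllb-200-distilled-600M",
                      src_lang="aeb_Arab", tgt_lang="eng_Latn",
                      device=device, max_length=400)
print(translator("مرحبا بالعالم")[0]["translation_text"])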

7. OUTPUT:

8. CONCLUSION AND FUTURE SCOPE:

The Multilanguage Muse project has successfully created a powerful tool that uses artificial intelligence to help people communicate across different languages. It can translate text, speech, and images accurately, making it a valuable resource for both individuals and businesses. The system is designed to be easy to use and can handle large amounts of data, ensuring that translations are not only precise but also contextually appropriate.
This project has shown how AI can break down language barriers, making it easier
for people to understand each other regardless of the language they speak. By
providing reliable translation services, Multilanguage Muse helps promote global
communication and understanding. The tool's ability to translate different types of
content, from documents to spoken words, demonstrates its versatility and
effectiveness.

Looking ahead, there are several ways to improve and expand the Multilanguage
Muse project. Adding support for more languages, especially those that are less
common, will make the system more inclusive. Enhancing the system to provide
real-time translations during live conversations and video calls will make it even
more useful in everyday situations.

Another area for improvement is the AI's understanding of context and cultural
nuances. By making the models better at recognizing idiomatic expressions and
cultural differences, the translations can become even more accurate and
meaningful. Integrating the translation tool into popular virtual assistants can make
it more accessible for everyday use.

Optimizing the system for mobile devices will ensure that people can use it
wherever they are, adding convenience and flexibility. Allowing users to customize
translation models for specific fields, like medicine or law, can provide more
accurate translations for specialized content.

Finally, enhancing security measures to protect user data and ensuring privacy will
build trust in the system. Encouraging user feedback to continually improve the
translation models can create a community-driven approach to making the tool
better. Exploring new areas, such as translating video content, can further expand
the system's capabilities, making it a comprehensive solution for all translation
needs.
