
Farasapy-Tests - Ipynb - Colaboratory

The document demonstrates the use of the Farasapy library for Arabic natural language processing tasks. It shows how to install Farasapy, import the necessary modules, and apply segmentation, stemming, part-of-speech tagging, named entity recognition, and diacritization to a sample Arabic text. Both standalone and interactive modes are presented.

Uploaded by Fahd Ahmed
Copyright © All Rights Reserved

2021/6/26 farasapy-tests.ipynb - Colaboratory

Farasapy (Interactive Session)

This is a live preview of the farasapy library.

install

!pip install -U farasapy

Requirement already up-to-date: farasapy in /usr/local/lib/python3.7/dist-packages (0.0.13)
Requirement already satisfied, skipping upgrade: tqdm in /usr/local/lib/python3.7/dist-packages (from farasapy) (4.41.1)
Requirement already satisfied, skipping upgrade: requests in /usr/local/lib/python3.7/dist-packages (from farasapy) (2.23.0)
Requirement already satisfied, skipping upgrade: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packa
Requirement already satisfied, skipping upgrade: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->farasapy
Requirement already satisfied, skipping upgrade: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->f
Requirement already satisfied, skipping upgrade: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->fa
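To confirm which farasapy release the runtime is actually using (the log above reports 0.0.13), the installed version can be read from package metadata. This is a minimal sketch that uses the standard library's `importlib.metadata`. That module requires Python 3.8 or later; the Python 3.7 runtime shown above would need the `importlib_metadata` backport instead. The helper name `installed_version` is hypothetical and is not part of farasapy.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "farasapy") -> Optional[str]:
    """Return the installed version of `dist_name`, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


print("farasapy version:", installed_version())
```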

import

from farasa.pos import FarasaPOSTagger
from farasa.ner import FarasaNamedEntityRecognizer
from farasa.diacratizer import FarasaDiacritizer
from farasa.segmenter import FarasaSegmenter
from farasa.stemmer import FarasaStemmer

Sample

https://colab.research.google.com/drive/1xjzYwmfAszNzfR6Z2lSQi3nKYcjarXAW?usp=sharing#scrollTo=nUtKdhhMnJtB&printMode=true 1/7

You can put any sample text here and inspect the results.

# source: https://r12a.github.io/scripts/tutorial/summaries/arabic
sample =\
'''
يشار إلى أن اللغة العربية يتحدثها أكثر من 422 مليون نسمة ويتوزع متحدثوها في المنطقة المعروفة باسم الوطن العربي بالإضافة إلى العديد من المناطق الأخرى المجاورة مثل الأهواز وتركيا وتشاد والسنغال وإريتريا وغيرها. وهي اللغة الرابعة من لغات منظمة الأمم المتحدة الرسمية الست
'''

print("original sample:",sample)

original sample:
يشار إلى أن اللغة العربية يتحدثها أكثر من 422 مليون نسمة ويتوزع متحدثوها في المنطقة المعروفة باسم الوطن العربي بالإضافة إلى العديد من المناطق الأخرى المجاورة مثل الأهواز وتركيا وتشاد والسنغال وإريتريا وغيرها. وهي اللغة الرابعة من لغات منظمة الأمم المتحدة الرسمية الست

Standalone Mode

Segmenter

segmenter = FarasaSegmenter()

/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being
  InsecureRequestWarning)

100%|██████████| 241M/241M [00:21<00:00, 12.6MiB/s]

segmented = segmenter.segment(sample)
print("sample segmented:",segmented)

sample segmented: …ل+مجاور+ة مثل ال+أهواز و+تركيا و+تشاد و+ال+سنغال و+إريتريا و+غير+ها . و+هي ال+لغ+ة ال+رابع+ة من لغ+ات منظم+ة ال+أمم ال+متحد+ة ال+رسمي+ة ال+ست
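In the segmenter output above, spaces separate words and `+` separates each word's clitics and affixes (for example, `ال+لغ+ة`). Assuming this format holds in general, plain string operations are enough to split the output into a list of segments for each word. `split_clitics` is a hypothetical helper and is not part of farasapy.

```python
from typing import List


def split_clitics(segmented: str) -> List[List[str]]:
    """Turn '+'-joined segmenter output into one list of segments per word."""
    # Whitespace separates words; '+' separates a word's prefixes, stem and suffixes.
    return [word.split("+") for word in segmented.split()]


print(split_clitics("و+هي ال+لغ+ة ال+رابع+ة"))
```

Punctuation tokens such as `.` come back as single-element lists, so no output is dropped.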


Stemmer

stemmer = FarasaStemmer()

stemmed = stemmer.stem(sample)
print("sample stemmed:",stemmed)

sample stemmed: …ي منطقة معروف اسم وطن عربي إضافة إلى عديد من منطقة آخر مجاور مثل أهواز تركيا تشاد سنغال أريتريا غير . هي لغة رابع من لغة منظمة أمة متحد رسمي ست
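Stemming maps different inflected forms onto a shared stem: in the output above, both `اللغة` and `لغات` become `لغة`. The sketch below counts stems in output of this kind. It assumes the tokens are separated by whitespace, as shown above, and skips bare punctuation. The helper name is hypothetical.

```python
from collections import Counter


def stem_frequencies(stemmed: str) -> Counter:
    """Count how often each stem occurs in whitespace-separated stemmer output."""
    # Skip bare punctuation tokens such as '.' that the stemmer leaves in place.
    return Counter(tok for tok in stemmed.split() if any(ch.isalnum() for ch in tok))


freqs = stem_frequencies("هي لغة رابع من لغة منظمة أمة متحد رسمي ست")
print(freqs.most_common(1))
```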

POS Tagger

pos_tagger = FarasaPOSTagger()

pos_tagged = pos_tagger.tag(sample)
print("sample POS Tagged",pos_tagged)

100%|██████████| 241M/241M [00:40<00:00, 12.6MiB/s]
sample POS Tagged S/S يشار/V إلى/PREP أن/PART ال+لغ+ة/DET+NOUN+NSUFF-FS …+عربي+ة …
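Each token in the tagger output above has the form `segments/tags`, and `+` joins both the segments and their matching tags (for example, `ال+لغ+ة/DET+NOUN+NSUFF-FS`). The leading `S/S` appears to be a sentence-start marker. Assuming this layout, the hypothetical parser below splits each token on its last `/`.

```python
from typing import List, Tuple


def parse_pos(tagged: str) -> List[Tuple[str, List[str]]]:
    """Split 'word/TAG1+TAG2' tokens into (word, [TAG1, TAG2]) pairs."""
    pairs = []
    for token in tagged.split():
        # rpartition splits on the last '/', so the word part is kept whole.
        word, _, tags = token.rpartition("/")
        pairs.append((word, tags.split("+")))
    return pairs


for word, tags in parse_pos("S/S يشار/V ال+لغ+ة/DET+NOUN+NSUFF-FS"):
    print(word, tags)
```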

Farasa Named Entity Recognition

named_entity_recognizer = FarasaNamedEntityRecognizer()

named_entity_recognized = named_entity_recognizer.recognize(sample)
print("sample named entity recognized:",named_entity_recognized)

sample named entity recognized: يشار/O إلى/O أن/O اللغة/O العربية/O يتحدثها/O أكثر/O من/O 422/O مليون/O نسمة/O ويتوزع/O متحدثوها/O في/O المنطقة/O المعروفة/…
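Every token in the recognizer output above has the form `word/label`. All of the labels visible here are `O`, meaning the token is outside any entity. This sketch assumes the recognizer marks entities with BIO-style labels such as `B-LOC` and `I-LOC`. That is an assumption, because no entity labels appear in the truncated output. Under that assumption, consecutive tokens can be grouped into entity spans. The helper and the labelled input in the example are hypothetical.

```python
from typing import List, Optional, Tuple


def group_entities(ner_output: str) -> List[Tuple[str, str]]:
    """Group BIO-labelled 'word/LABEL' tokens into (entity text, type) spans."""
    spans: List[Tuple[str, str]] = []
    words: List[str] = []
    etype: Optional[str] = None

    def flush() -> None:
        # Close the currently open entity, if there is one.
        if words and etype is not None:
            spans.append((" ".join(words), etype))

    for token in ner_output.split():
        word, _, label = token.rpartition("/")
        if label.startswith("I-") and etype == label[2:]:
            words.append(word)  # continue the current entity
            continue
        flush()  # any other label ends the open entity
        if label.startswith(("B-", "I-")):
            words, etype = [word], label[2:]  # start a new entity
        else:
            words, etype = [], None  # 'O' or an unrecognized label
    flush()
    return spans


print(group_entities("منظمة/B-ORG الأمم/I-ORG المتحدة/I-ORG الرسمية/O"))
```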


Diacritizer

diacritizer = FarasaDiacritizer()

diacritized = diacritizer.diacritize(sample)
print("sample diacritized:",diacritized)

sample diacritized: ‫ َو هَي الُّلَغُة الّر اِبَعُة ِم ْن ُلغاِت ُم َنَّظَم ِة اُألَم ِم الُم َّتِح َدِة الَّر ْس مَّيِة الِّسِّت‬. ‫ضاَفِة ِإَلى الَعديِد ِم ْن الَم ناِط ِق اُألْخ َر ى الُم جاِوَر ِة ِم ْثَل اَألْه واِز َو ُتْر كيا َو ِتشاَد والِّس ْنغاِل َو ِإريْتريا َو َغْيُر ها‬
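One way to sanity-check diacritized output is to remove the marks again and compare the result with the input. The minimal sketch below assumes the diacritizer only adds the Arabic harakat from U+064B (fathatan) to U+0652 (sukun), plus U+0670 (superscript alef), and does not otherwise change letters. The helper name is hypothetical.

```python
import re

# Arabic marks from FATHATAN (U+064B) through SUKUN (U+0652), plus the
# superscript alef (U+0670). Other characters are left untouched.
_HARAKAT = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Remove short-vowel and related marks from Arabic text."""
    return _HARAKAT.sub("", text)


print(strip_diacritics("اللُّغَةُ"))
```

Under that assumption, `strip_diacritics(diacritized).split() == sample.split()` would be a quick consistency check.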

Interactive Mode

Segmenter

segmenter_interactive = FarasaSegmenter(interactive=True)

[2021-04-16 01:10:49,144 - farasapy_logger - WARNING]: Be careful with large lines as they may break on interactive mode. You m
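The warning above cautions that large lines may break in interactive mode. One way to keep each request small is to split the text at sentence-final punctuation and send the pieces one at a time, assuming each sentence is short enough. The hypothetical helper below is deliberately naive: it does not handle abbreviations or decimal numbers.

```python
import re
from typing import List

# Split on whitespace that follows '.', '!', '?' or the Arabic question mark (U+061F).
_SENTENCE_END = re.compile(r"(?<=[.!?\u061F])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text after sentence-final punctuation, dropping empty pieces."""
    return [piece for piece in _SENTENCE_END.split(text.strip()) if piece]


print(split_sentences("وغيرها. وهي"))
```

For example, `[segmenter_interactive.segment(s) for s in split_sentences(sample)]` would segment the sample one sentence at a time.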

segmented_interactive = segmenter_interactive.segment(sample)
print("sample segmented (interactive):",segmented_interactive)
# terminate the object to save resources:
segmenter_interactive.terminate()

sample segmented (interactive): …هواز و+تركيا و+تشاد و+ال+سنغال و+إريتريا و+غير+ها . و+هي ال+لغ+ة ال+رابع+ة من لغ+ات منظم+ة ال+أمم ال+متحد+ة ال+رسمي+ة ال+ست


Stemmer

stemmer_interactive = FarasaStemmer(interactive=True)

[2021-04-16 01:10:54,103 - farasapy_logger - WARNING]: Be careful with large lines as they may break on interactive mode. You m

stemmed_interactive = stemmer_interactive.stem(sample)
print("sample stemmed (interactive):",stemmed_interactive)
# terminate the object to save resources:
stemmer_interactive.terminate()

sample stemmed (interactive): …وطن عربي إضافة إلى عديد من منطقة آخر مجاور مثل أهواز تركيا تشاد سنغال أريتريا غير . هي لغة رابع من لغة منظمة أمة متحد رسمي ست
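Interactive mode appears to be designed for repeated calls on a single object: the tool is started once, and later calls reuse it. This is an assumption based on the warning and the explicit `terminate()` step, not on anything measured here. Under that assumption, several lines can be processed before terminating. The hypothetical helper below accepts any bound method, such as `stemmer_interactive.stem`.

```python
from typing import Callable, Iterable, List


def process_lines(process: Callable[[str], str], lines: Iterable[str]) -> List[str]:
    """Apply one already-started tool method to each non-empty line in turn."""
    return [process(line) for line in lines if line.strip()]


print(process_lines(str.upper, ["a", "", "b"]))
```

In the notebook, this would be called before `terminate()`, for example as `process_lines(stemmer_interactive.stem, sample.splitlines())`.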

POS Tagger

pos_tagger_interactive = FarasaPOSTagger(interactive=True)

[2021-04-16 01:10:59,199 - farasapy_logger - WARNING]: Be careful with large lines as they may break on interactive mode. You m

pos_tagged_interactive = pos_tagger_interactive.tag(sample)
print("sample POS Tagged (interactive)",pos_tagged_interactive)
# terminate the object to save resources:
pos_tagger_interactive.terminate()

sample POS Tagged (interactive) S/S يشار/V إلى/PREP أن/PART ال+لغ+ة/DET+NOUN+NSUFF-FS ال+عربي+ة/DET+ADJ+NSUFF-FS يتحدث/V +ها/PRON …كثر …
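Because each tagged token lists one tag per segment, joined by `+`, the tag distribution of a text can be read directly from the output. This is a sketch that assumes the `segments/tags` layout shown above. Tokens without a `/`, such as a truncated fragment, are skipped. The helper name is hypothetical.

```python
from collections import Counter


def tag_counts(tagged: str) -> Counter:
    """Count individual POS tags in 'segments/TAG1+TAG2' tokens."""
    counts: Counter = Counter()
    for token in tagged.split():
        _, sep, tags = token.rpartition("/")
        if sep:  # skip tokens without a '/' separator
            counts.update(tags.split("+"))
    return counts


print(tag_counts("S/S يشار/V ال+لغ+ة/DET+NOUN+NSUFF-FS ال+عربي+ة/DET+ADJ+NSUFF-FS"))
```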

Named Entity Recognition


named_entity_recognizer_interactive = FarasaNamedEntityRecognizer(interactive=True)

[2021-04-16 01:11:15,644 - farasapy_logger - WARNING]: Be careful with large lines as they may break on interactive mode. You m

named_entity_recognized_interactive = named_entity_recognizer_interactive.recognize(sample)
print("sample named entity recognized (interactive):",named_entity_recognized_interactive)
# terminate the object to save resources:
named_entity_recognizer_interactive.terminate()

sample named entity recognized (interactive): يشار/O إلى/O أن/O اللغة/O العربية/O يتحدثها/O أكثر/O من/O 422/O مليون/O نسمة/O ويتوزع/O متحدثوها/O في/O …
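For each tool, the standalone and interactive runs in this notebook print what looks like the same output. To check this programmatically rather than by eye, the two strings can be compared token by token. The helper below is hypothetical; it pads the shorter output with empty strings so that length differences are reported too.

```python
from itertools import zip_longest
from typing import List, Tuple


def diff_tokens(a: str, b: str) -> List[Tuple[int, str, str]]:
    """List (position, token_a, token_b) wherever two outputs disagree."""
    pairs = zip_longest(a.split(), b.split(), fillvalue="")
    return [(i, x, y) for i, (x, y) in enumerate(pairs) if x != y]


print(diff_tokens("a/O b/O", "a/O b/B-LOC"))
```

An empty result from `diff_tokens(named_entity_recognized, named_entity_recognized_interactive)` would mean the two modes agree on this sample.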

Diacritizer

diacritizer_interactive = FarasaDiacritizer(interactive=True)

[2021-04-16 01:11:41,439 - farasapy_logger - WARNING]: Be careful with large lines as they may break on interactive mode. You m

diacritized_interactive = diacritizer_interactive.diacritize(sample)
print("sample diacritized (interactive):",diacritized_interactive)
# terminate the object to save resources:
diacritizer_interactive.terminate()

sample diacritized (interactive): ‫ َو هَي الُّلَغُة الّر اِبَعُة ِم ْن ُلغاِت ُم َنَّظَم ِة اُألَم ِم الُم َّتِح َدِة الَّر ْس مَّيِة الِّسِّت‬. ‫لَم ناِط ِق اُألْخ َر ى الُم جاِوَر ِة ِم ْثَل اَألْه واِز َو ُتْر كيا َو ِتشاَد والِّس ْنغاِل َو ِإريْتريا َو َغْيُر ها‬
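This notebook does not time the two modes. Interactive mode may be faster for repeated calls on short texts, but that is an assumption here, and it can be measured directly. The hypothetical helper below reports the mean wall-clock time of a call, using `time.perf_counter`.

```python
import time
from typing import Callable


def mean_seconds(fn: Callable[[], object], repeats: int = 3) -> float:
    """Average wall-clock time of `repeats` calls to `fn`, in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


print(mean_seconds(lambda: sum(range(1000))) >= 0.0)
```

For example, `mean_seconds(lambda: diacritizer.diacritize(sample))` could be compared with `mean_seconds(lambda: diacritizer_interactive.diacritize(sample))`. The interactive measurement must run before `diacritizer_interactive.terminate()` is called.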
