Hesham Haroon

Hesham Haroon

Egypt
31K followers · More than 500 connections

About

At WideBot, we're at the forefront of NLP, where my linguistics background and passion…

Hesham's articles

Contributions

Activity

Experience

  • WideBot

    Qesm El Maadi, Cairo, Egypt

  • -

  • -

    Qesm El Zamalek, Cairo, Egypt

  • -

    Remote

  • -

  • -

  • -

    Riyadh, Saudi Arabia

  • -

Education

  • Association for Computational Linguistics

    -

    As a mentee researcher in the Association for Computational Linguistics, I had the opportunity to learn the fundamentals of research and gain valuable insights into the field of computational linguistics. Through this program, I was able to work closely with experienced researchers and mentors who provided guidance and support as I developed my skills and knowledge.

    Participating in this program allowed me to gain hands-on experience in conducting research, as well as to learn about the latest developments and trends in the field. It also helped me to build my network of professional contacts and to connect with others who share my interests and passions.

    Overall, my experience as a mentee researcher in the Association for Computational Linguistics was invaluable in helping me to grow as a researcher and to gain a deeper understanding of the field. I am grateful for the opportunity and look forward to continuing to apply my skills and knowledge in my future research endeavors.

  • -

    I am proud to have completed the Data Science Summer School, a series of theoretical and practical workshops focused on the methods and technologies used by industry, government, and civil society to solve complex problems.

    The program began with an introduction to programming and the mathematical foundations essential for success in data science, and then delved into advanced machine learning and deep learning techniques for tackling Computer Vision and Natural Language Processing problems. Throughout the course, I gained valuable insights into the latest technologies and methods used in data science, and I had the opportunity to apply what I learned through hands-on exercises and projects.

    Overall, the Data Science Summer School was a truly enriching experience that has greatly expanded my knowledge and skills in this exciting field. I am grateful for the opportunity to have participated in this program and look forward to applying what I have learned in my future endeavors.

Licenses & certifications

Volunteer experience

  • Student tutor

    Academic

    - Present · 3 yrs 2 mos

    Education

    As a volunteer tutor for Teach for Arabic students, I work with underprivileged students to provide academic support and help them achieve their academic goals. I design lesson plans and teaching materials and work one-on-one with students to provide tailored support and guidance.

    I am committed to promoting education and helping students achieve their full potential, and I am passionate about making a positive difference in the lives of others.

Publications

  • Leveraging Corpus Metadata to Detect Template-based Translation: An Exploratory Case Study of the Egyptian Arabic Wikipedia Edition

    LREC-COLING 2024: The 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT6)

    Wikipedia articles (content pages) are commonly used corpora in Natural Language Processing (NLP) research, especially in low-resource languages other than English. Yet, few research studies have examined the three Arabic Wikipedia editions, Arabic Wikipedia (AR), Egyptian Arabic Wikipedia (ARZ), and Moroccan Arabic Wikipedia (ARY), and documented issues in the Egyptian Arabic Wikipedia edition regarding the massive automatic creation of its articles using template-based translation from English to Arabic without human involvement, overwhelming the Egyptian Arabic Wikipedia with articles that not only have low-quality content but also do not represent the Egyptian people, their culture, and their dialect. In this paper, we aim to mitigate the problem of template translation that occurred in the Egyptian Arabic Wikipedia by identifying these template-translated articles and their characteristics through exploratory analysis and building automatic detection systems. We first explore the content of the three Arabic Wikipedia editions in terms of density, quality, and human contributions and utilize the resulting insights to build multivariate machine learning classifiers leveraging articles' metadata to detect the template-translated articles automatically. We then publicly deploy and host the best-performing classifier, XGBoost, as an online application called EGYPTIAN WIKIPEDIA SCANNER and release the extracted, filtered, and labeled datasets to the research community to benefit from our datasets and the online, web-based detection system.
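The metadata-based detection approach described in the abstract can be sketched roughly as follows. Everything here is a synthetic stand-in: the features and data are made up for illustration, and scikit-learn's GradientBoostingClassifier substitutes for the XGBoost model the paper actually deploys.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 200

# Synthetic article metadata: assume (illustratively) that template-translated
# articles tend to have few human edits and few distinct contributors.
human_edits  = np.concatenate([rng.poisson(1, n), rng.poisson(20, n)])
contributors = np.concatenate([rng.poisson(1, n), rng.poisson(8, n)])
X = np.column_stack([human_edits, contributors])
y = np.concatenate([np.ones(n), np.zeros(n)])   # 1 = template-translated

clf = GradientBoostingClassifier(random_state=0).fit(X, y)
print(clf.predict([[0, 1], [30, 10]]))   # low-activity article vs. well-edited one
```

A gradient-boosted tree ensemble is a natural fit here because the metadata features are tabular and the decision boundaries are nonlinear.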

    Other authors
  • Error Analysis of Pretrained Language Models (PLMs) in English-to-Arabic Machine Translation

    Human-Centric Intelligent Systems

    Other authors

Projects

  • OpenDevin

    -

    OpenDevin is an open-source project built after Devin, the first AI software engineering tool. I worked on building the agents in the project's first stage.

  • Bert-fine-tuning-sentence-classfication

    -

    Bidirectional Encoder Representations from Transformers, better known as BERT, is a revolutionary paper by Google that raised the state-of-the-art performance on various NLP tasks and was the stepping stone for many other revolutionary architectures.
    It's not an exaggeration to say that BERT set a new direction for the entire domain. It shows clear benefits of using pre-trained models (trained on huge datasets) and transfer learning independent of the downstream tasks.
    This is a guided project on fine-tuning a Bidirectional Encoder Representations from Transformers (BERT) model for text classification with TensorFlow. In this 2.5-hour project, you learn to preprocess and tokenize data for BERT classification, build TensorFlow input pipelines for text data with the tf.data API, and train and evaluate a fine-tuned BERT model for text classification with TensorFlow 2 and TensorFlow Hub.
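The "preprocess and tokenize data for BERT classification" step can be illustrated with a toy WordPiece-style tokenizer: a simplified sketch of the greedy longest-match-first subword segmentation that BERT's real tokenizer performs. The vocabulary below is made up for the example.

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first subword segmentation, WordPiece-style.
    Continuation pieces carry the '##' prefix, as in BERT's vocabulary."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:           # no subword matches: unknown token
            return ["[UNK]"]
        tokens.append(piece)
        start = end
    return tokens

vocab = {"play", "##ing", "##ed", "un", "##affable", "##aff", "##able"}
print(wordpiece_tokenize("playing", vocab))    # ['play', '##ing']
print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##affable']
```

The real tokenizer also lowercases, splits on punctuation, and adds special tokens like [CLS] and [SEP]; this sketch covers only the subword step.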

  • Building a language detection model with fastText

    -

    fastText has published a fast and accurate tool for text-based language identification capable of recognizing more than 170 languages. The tool has been open-sourced for anyone to use for free. Two versions of the language identification model are available, each optimized for a different memory/accuracy trade-off:

    lid.176.bin, which is faster and slightly more accurate, but has a file size of 126 MB.
    lid.176.ftz, the compressed version of the model, with a file size of 917 kB. The smaller file size is achieved by a small compromise on accuracy.
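In practice the model above is loaded with `fasttext.load_model(...)` and queried with `model.predict(...)`. As a self-contained illustration of the character n-gram idea that underlies fastText's language identification, here is a deliberately simplistic toy detector in plain Python (the training sentences and scoring are made up for the example):

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-grams, the core feature type behind fastText's language ID."""
    text = f" {text.lower()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def train(samples):
    """samples: (text, language) pairs -> per-language n-gram count profiles."""
    profiles = {}
    for text, lang in samples:
        profiles.setdefault(lang, Counter()).update(char_ngrams(text))
    return profiles

def detect(text, profiles):
    """Score each language by overlap between the text's n-grams and its profile."""
    grams = char_ngrams(text)
    scores = {lang: sum(counts[g] for g in grams) for lang, counts in profiles.items()}
    return max(scores, key=scores.get)

profiles = train([
    ("the quick brown fox jumps over the lazy dog", "en"),
    ("this is a sentence written in english", "en"),
    ("el rapido zorro marron salta sobre el perro", "es"),
    ("esta es una frase escrita en espanol", "es"),
])
print(detect("the dog is lazy", profiles))   # -> "en"
```

The real fastText model learns weighted n-gram embeddings over millions of sentences rather than raw counts, which is what makes it both compact and accurate.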

  • Convolutional Neural Network (CNN) From Scratch

    -

    A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm which can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and differentiate one from the other. The pre-processing required in a ConvNet is much lower compared to other classification algorithms. While in primitive methods filters are hand-engineered, with enough training, ConvNets have the ability to learn these filters/characteristics.
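The core operation such a from-scratch implementation builds is the 2D convolution itself. A minimal NumPy sketch (single channel, "valid" padding, with an illustrative hand-made kernel):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation: slide the kernel over the image and take
    the elementwise product-sum at each position."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
edge_kernel = np.array([[1.0, -1.0],
                        [1.0, -1.0]])   # responds to horizontal intensity changes
result = conv2d(image, edge_kernel)
print(result.shape)   # (3, 3): a 2x2 kernel over a 4x4 image, valid padding
```

In a full CNN these kernel values are not hand-written but learned by backpropagation, which is exactly the "learn these filters" point above.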

  • Detecting-Fake-News-with-Python

    -

    TF (Term Frequency): The number of times a word appears in a document is its Term Frequency. A higher value means a term appears more often than others, and so, the document is a good match when the term is part of the search terms.

    IDF (Inverse Document Frequency): Words that occur many times in a document but also appear in many other documents may be irrelevant. IDF is a measure of how significant a term is in the entire corpus.

    The TfidfVectorizer converts a collection of raw documents into a matrix of TF-IDF features.

    Passive Aggressive algorithms are online learning algorithms. Such an algorithm remains passive on a correct classification outcome and turns aggressive on a misclassification, updating and adjusting. Unlike most other algorithms, it does not converge. Its purpose is to make updates that correct the loss while causing very little change in the norm of the weight vector.
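Putting the two pieces together, here is a minimal scikit-learn sketch of the pipeline, TfidfVectorizer feeding a PassiveAggressiveClassifier, on a tiny made-up corpus (the real project trains on a labeled news dataset):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Tiny illustrative corpus; labels are invented for the example.
texts = [
    "scientists confirm the study results in a peer reviewed journal",
    "official report released by the government agency today",
    "shocking miracle cure doctors do not want you to know",
    "you will not believe this one secret trick exposed",
]
labels = ["REAL", "REAL", "FAKE", "FAKE"]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(texts)          # rows = documents, cols = TF-IDF weights

clf = PassiveAggressiveClassifier(max_iter=50, random_state=0)
clf.fit(tfidf, labels)

sample = vectorizer.transform(["peer reviewed study released today"])
print(clf.predict(sample))
```

Because the classifier is online, `partial_fit` can be used instead of `fit` to keep updating the model as new labeled articles arrive.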

  • Investigating Netflix Movies and Guest Stars in The Office

    -

    The Netflix Movies and Guest Stars in The Office project is a data analysis project that explores two different datasets. The first dataset is a collection of Netflix movies that includes information such as movie titles, countries of production, release years, and movie durations. The second dataset is a collection of episodes from the popular TV show "The Office" that includes information such as episode titles, air dates, and guest stars.

    The main objective of this project is to practice data wrangling, visualization, and exploratory data analysis using Python and its powerful data science libraries such as Pandas, Matplotlib, and Seaborn. Throughout the project, we clean and manipulate the datasets to prepare them for analysis, explore relationships between variables using scatter plots, and use statistical methods to draw conclusions about trends and patterns in the data.

    The project involves a series of guided steps that lead to the completion of a final analysis, which in this case is answering the question of whether Netflix movies are getting shorter over time. In addition to this analysis, we also explore the relationship between guest stars and ratings in "The Office" dataset.
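The final "are Netflix movies getting shorter" question can be sketched with pandas on synthetic stand-in data (the real project loads the Netflix CSV; the numbers below are invented):

```python
import pandas as pd

# Synthetic stand-in for the Netflix dataset used in the project.
movies = pd.DataFrame({
    "release_year": [2011, 2013, 2015, 2017, 2019, 2021],
    "duration":     [103,  101,   99,   95,   92,   90],   # minutes
})

# Average duration per release year, then the mean year-to-year change.
by_year = movies.groupby("release_year")["duration"].mean()
trend = by_year.diff().mean()     # negative => durations are shrinking
print(round(trend, 2))
```

In the actual analysis, a scatter plot of release year against duration (Matplotlib/Seaborn) makes the same trend visible before any aggregation.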

  • Table Detection and OCR with Transformers

    -

    This project is a Streamlit app for detecting tables in images, cropping them, detecting cells within the cropped tables, and applying OCR (Optical Character Recognition) to extract the table data into a CSV file.

  • Toxic-Comments-Classification

    -

    This project builds a multi-headed model capable of detecting different types of toxicity, such as threats, obscenity, insults, and identity-based hate, better than Perspective's current models, using a dataset of comments from Wikipedia's talk-page edits. Improvements to the current model will hopefully help online discussion become more productive and respectful.
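A minimal sketch of such a multi-headed setup, using scikit-learn's MultiOutputClassifier to attach one binary head per toxicity type; the comments and labels below are made up, and the project's actual model may differ:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Toy comments with two toxicity heads: [insult, threat].
comments = [
    "you are a complete idiot",
    "what a stupid thing to say",
    "i will hurt you if you come here",
    "i am going to find you and hurt you",
    "thanks for the helpful explanation",
    "great point and a very clear answer",
]
labels = np.array([
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
    [0, 0],
    [0, 0],
])

vec = CountVectorizer()
X = vec.fit_transform(comments)

# One independent binary classifier ("head") per toxicity type.
model = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, labels)

pred = model.predict(vec.transform(["you stupid idiot, i will hurt you"]))
print(pred)   # one row, one 0/1 column per toxicity type
```

Because the heads are independent, a comment can be flagged for several toxicity types at once, which plain multi-class classification cannot express.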


Organizations

  • SIGARAB

    Researcher

    SIGARAB is the Special Interest Group of the Association for Computational Linguistics for researchers concerned with all aspects of Arabic NLP. Its activities include: exchange and propagation of research results, publications, and resources relevant to all subareas of Arabic NLP and to all Arabic variants (Standard, Classical, and Dialectal); coordination between organizations in academia and industry throughout the world performing research in Arabic NLP or using Arabic NLP; and sponsoring Arabic NLP workshops, including the Arabic Natural Language Processing Workshop (WANLP), the Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT), and the International Workshop on Arabic Big Data & AI (IWABigDAI).
