Introduction to Natural Language Processing
INSTRUCTOR: DR. GULSHAN SALEEM
COURSE CODE: CS AL4253

Introduction to POS Tagging
POS tagging is the process of marking each word in a sentence with its corresponding part of speech (noun, verb, adjective, etc.). It helps machines understand the grammatical structure of a sentence.
•Purpose: Helps in understanding the syntactic structure of a sentence.
•Applications:
• Syntactic parsing
• Machine translation
• Text analysis and sentiment analysis
•Common POS Tags:
• Noun (NN), Verb (VB), Adjective (JJ), Adverb (RB), etc.

Introduction to Named Entity Recognition (NER)
Named Entity Recognition (NER) is the task of identifying entities in a text and classifying them into predefined categories such as Person, Organization, Location, Date, etc.
•Significance: Helps extract important information from large texts, which is useful in:
1. Information retrieval
2. Text summarization
3. Chatbots and virtual assistants
•Common Entity Types:
• PERSON: Names of people.
• ORG: Organizations (e.g., companies, universities).
• GPE: Geopolitical entities (countries, cities).
• DATE: Dates, years.

Task (POS)
"The quick brown fox jumps over the lazy dog."
"She sells sea shells by the sea shore."

Task (NER)
"Bill Gates is the founder of Microsoft, which is headquartered in Redmond, Washington."
"The Eiffel Tower is located in Paris, France."
"Google maps are helpful" vs. "I need to google that"

Using Visualization for NER
Use spaCy’s built-in visualization tool, displacy, to visually display entities and POS tags for a sentence.

NLTK vs spaCy
•Stemming: Only available in NLTK (e.g., PorterStemmer, SnowballStemmer). spaCy does not include stemming natively.
•Lemmatization: Both libraries support it, but spaCy is generally more accurate.
•NER: spaCy has a more comprehensive NER implementation, while NLTK has limited NER capabilities.
spaCy is better suited for complete, efficient NLP pipelines, especially when accurate lemmatization and NER are needed. However, it lacks native stemming.
NLTK is more flexible in customization and includes stemming, but it has limited NER capabilities and less robust lemmatization.
•Combining NLTK and spaCy: we can use spaCy for tokenization, lemmatization, POS tagging, and NER, and NLTK for stemming if needed.