@jalajthanaki
Owner

Fix wordsteam.py: Replace broken polyglot with NLTK and update to Python 3

Summary

This PR fixes the "HTTP Error 403: Forbidden" failure in ch3/3_1_wordsteam.py that readers reported when running the code from the Python Natural Language Processing book. The root cause is that the polyglot library's data repository (http://bit.ly/polyglot-data) is defunct and no longer accessible.

Changes made:

  • Removed polyglot dependency and replaced polyglot_stem() with nltk_lemmatizer() using NLTK's WordNetLemmatizer
  • Converted all Python 2 print statements to Python 3 syntax (added parentheses)
  • Fixed syntax error: added missing comma between "canonical" and "historical" in the words list (line 21)
  • Script now runs successfully without HTTP errors
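
One caveat on the missing-comma fix: assuming the two entries were ordinary adjacent string literals (the original file is not shown here), Python does not raise a SyntaxError in that case. It silently joins them into a single string, so the list ends up one element short. A minimal sketch with hypothetical list contents:

```python
# Hypothetical word lists; the actual contents of 3_1_wordsteam.py are not shown here.
words_missing_comma = ["canonical" "historical", "running"]
words_fixed = ["canonical", "historical", "running"]

# Adjacent string literals are concatenated by the compiler, so no error is raised.
print(words_missing_comma)
print(len(words_missing_comma), len(words_fixed))
```

Comparing the list length against the expected word count is a quick way to catch this kind of slip.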

Pedagogical note: The replacement uses lemmatization instead of morpheme segmentation. While both demonstrate morphological analysis, they're different techniques. Lemmatization reduces words to their base dictionary form, while morpheme segmentation breaks words into meaningful units. The new implementation shows noun/adjective/verb forms for each word.
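
As a rough illustration of the noun/adjective/verb output described above, the following is a minimal sketch and not the code from this PR. The signature of nltk_lemmatizer(), the helper build_wordnet_lemmatizer(), and the sample words are assumptions. The only NLTK call relied on is WordNetLemmatizer.lemmatize(word, pos=...), which needs the wordnet data to be downloaded first.

```python
# Sketch only: the real nltk_lemmatizer() in the PR may have a different signature.
POS_TAGS = (("noun", "n"), ("adjective", "a"), ("verb", "v"))  # WordNet POS codes


def nltk_lemmatizer(words, lemmatizer):
    """Return {word: {pos_name: lemma}} using any object with lemmatize(word, pos=...)."""
    results = {}
    for word in words:
        results[word] = {
            name: lemmatizer.lemmatize(word, pos=code) for name, code in POS_TAGS
        }
    return results


def build_wordnet_lemmatizer():
    """Create NLTK's WordNetLemmatizer (requires nltk and the 'wordnet' data)."""
    # Imported lazily so the sketch can be loaded even where nltk is not installed.
    from nltk.stem import WordNetLemmatizer
    return WordNetLemmatizer()


def main():
    # Hypothetical sample words, not taken from the book's script.
    lemmas = nltk_lemmatizer(["running", "better"], build_wordnet_lemmatizer())
    for word, forms in lemmas.items():
        print(word, forms)
```

Calling main() after nltk.download('wordnet') prints one dictionary of lemmas per word. Passing the lemmatizer in as an argument keeps the function easy to exercise with a stand-in object when the WordNet data is unavailable.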

Review & Testing Checklist for Human

  • Verify educational alignment: Check that lemmatization still teaches the intended concepts from Chapter 3 of the book. If the book specifically teaches morpheme segmentation, this change might need adjustment or additional explanation.
  • Test in target environment: Run the script in a fresh Python environment (especially Windows/Anaconda if that's what readers use) to ensure NLTK installation and wordnet data downloads work smoothly.
  • Update installation instructions: Check if ch3/Chapter_3_Installation_Commands.txt needs updating to replace polyglot instructions with NLTK + wordnet data download instructions.
  • Check for other polyglot usage: Search the repository for other files that might use polyglot and need similar fixes.
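
The last checklist item can be partly automated. The sketch below is an assumption about how one might do it, not part of this PR: find_polyglot_usage() is a hypothetical helper that scans .py files for import lines that start with polyglot. It will miss less common forms such as "import os, polyglot".

```python
import re
from pathlib import Path

# Matches "import polyglot", "import polyglot.text", "from polyglot.text import Text", etc.
POLYGLOT_IMPORT = re.compile(r"^\s*(?:import|from)\s+polyglot\b")


def find_polyglot_usage(repo_root):
    """Return sorted (relative_path, line_number, line) tuples for polyglot imports."""
    root = Path(repo_root)
    hits = []
    for path in root.rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable entry; skip it rather than abort the scan
        for number, line in enumerate(text.splitlines(), start=1):
            if POLYGLOT_IMPORT.match(line):
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return sorted(hits)
```

Running find_polyglot_usage(".") from the repository root returns an empty list once no polyglot imports remain.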

Recommended test plan:

# In a fresh environment:
pip install nltk
python -c "import nltk; nltk.download('wordnet'); nltk.download('omw-1.4')"
python ch3/3_1_wordsteam.py
# Verify output shows stemming results and lemmatization results without errors
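
The final verification step can also be scripted. This is a minimal sketch under stated assumptions: run_script() and looks_healthy() are hypothetical helpers, and the "healthy" criteria (exit code 0, no traceback, no HTTP 403 message) are a guess at what "without errors" means here.

```python
import subprocess
import sys


def run_script(path, timeout=120):
    """Run a script with the current interpreter; return (exit_code, combined_output)."""
    proc = subprocess.run(
        [sys.executable, path],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout + proc.stderr


def looks_healthy(exit_code, output):
    """True if the run exited cleanly and shows no traceback or HTTP 403 message."""
    return exit_code == 0 and "Traceback" not in output and "HTTP Error 403" not in output
```

For example, looks_healthy(*run_script("ch3/3_1_wordsteam.py")) should be True in a fresh environment once the NLTK data is installed.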

Notes

…hon 3

- Remove polyglot dependency which has defunct data repository
- Replace polyglot_stem() with nltk_lemmatizer() using WordNetLemmatizer
- Convert all Python 2 print statements to Python 3 syntax
- Fix syntax error: add missing comma in words list (line 21)
- Maintain same educational purpose: demonstrating morphological analysis
- Script now runs successfully without HTTP 403 errors

Co-Authored-By: [email protected] <[email protected]>
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.
