Detect an Unknown Language using Python
Last Updated: 22 Oct, 2025
Language detection is an essential task in Natural Language Processing (NLP). It involves identifying the language of a given text by analyzing its characters, words, and structure. Several third-party Python libraries make this process simple and accurate.
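Before any statistical model is involved, the characters alone narrow things down, because many languages use a distinctive writing system. The following is a rough, stdlib-only sketch of that first step. It is not how langdetect, TextBlob, or langid work internally. It counts letters per Unicode script using unicodedata.name() and reports the dominant non-Latin script, because brand names such as "Geeksforgeeks" add Latin letters to every sample sentence. The SCRIPT_PREFIXES table, the script labels, and the rule that any kana indicates Japanese are assumptions chosen for illustration.

```python
import unicodedata
from collections import Counter

# Assumed mapping from Unicode character-name prefixes to script labels.
# Hiragana and Katakana are merged into one "Kana" label.
SCRIPT_PREFIXES = {
    "LATIN": "Latin",
    "CYRILLIC": "Cyrillic",
    "DEVANAGARI": "Devanagari",
    "CJK UNIFIED IDEOGRAPH": "Han",
    "HIRAGANA": "Kana",
    "KATAKANA": "Kana",
}


def script_counts(text: str) -> Counter:
    """Count the letters of `text` per writing script."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():  # skip spaces, digits, punctuation, combining marks
            continue
        name = unicodedata.name(ch, "")
        for prefix, script in SCRIPT_PREFIXES.items():
            if name.startswith(prefix):
                counts[script] += 1
                break
    return counts


def guess_script(text: str) -> str:
    """Return the dominant script, preferring non-Latin scripts over Latin."""
    counts = script_counts(text)
    if counts["Kana"]:
        return "Kana"  # kana is used by Japanese, even alongside Han characters
    non_latin = {script: n for script, n in counts.items() if script != "Latin"}
    if non_latin:
        return max(non_latin, key=non_latin.get)
    return "Latin" if counts["Latin"] else "Unknown"


for sentence in [
    "Geeksforgeeks - это компьютерный портал для гиков",
    "Geeksforgeeks是面向极客的计算机科学门户",
    "Geeksforgeeksは、ギーク向けのコンピューターサイエンスポータルです。",
]:
    print(guess_script(sentence))
```

For the article's Russian, Chinese, and Japanese sentences this prints Cyrillic, Han, and Kana. The heuristic cannot separate English from Spanish (both Latin), and it cannot tell kanji-only Japanese from Chinese. The statistical models in the libraries covered in this article handle those cases.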
In this article, we’ll explore three popular libraries for language detection: langdetect, TextBlob, and langid.
Using langdetect Library
The langdetect module is a port of Google’s language-detection library and supports 55+ languages. It’s not included in Python’s standard library, so you need to install it first.
Install the library using:
pip install langdetect
Python
from langdetect import detect
print(detect("Geeksforgeeks is a computer science portal for geeks"))
print(detect("Geeksforgeeks - это компьютерный портал для гиков"))
print(detect("Geeksforgeeks es un portal informático para geeks"))
print(detect("Geeksforgeeks是面向极客的计算机科学门户"))
print(detect("Geeksforgeeks geeks के लिए एक कंप्यूटर विज्ञान पोर्टल है"))
print(detect("Geeksforgeeksは、ギーク向けのコンピューターサイエンスポータルです。"))
Output
en
ru
es
zh-cn
hi
ja
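Explanation: detect() returns the code of the most probable language for the given text, using a pre-trained statistical model. The algorithm is non-deterministic, so very short or ambiguous texts can give different results from run to run. To make the results reproducible, set DetectorFactory.seed = 0 (from langdetect import DetectorFactory) before the first detection.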
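langdetect's model combines character n-gram statistics with a naive Bayes classifier. The toy below sketches that idea using only the standard library. It is not langdetect's code or model. The two-language TRAINING corpus, the trigram size, and the add-one smoothing are assumptions for illustration. A real detector learns its language profiles from far larger text collections.

```python
import math
from collections import Counter

# Hypothetical toy training data; real models are trained on large corpora.
TRAINING = {
    "en": "the quick brown fox jumps over the lazy dog. "
          "this is a computer science portal for students who want to learn.",
    "es": "el rápido zorro marrón salta sobre el perro perezoso. "
          "este es un portal de informática para estudiantes que quieren aprender.",
}


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Lowercase and space-pad the text, then split it into overlapping n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesDetector:
    def __init__(self, samples: dict[str, str], n: int = 3):
        self.n = n
        self.counts = {lang: Counter(char_ngrams(text, n)) for lang, text in samples.items()}
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}
        # Shared vocabulary size, plus one slot for n-grams never seen in training.
        self.vocab_size = len(set().union(*self.counts.values())) + 1

    def log_score(self, text: str, lang: str) -> float:
        """Sum of add-one-smoothed log-probabilities of the text's n-grams."""
        counts, total = self.counts[lang], self.totals[lang]
        return sum(
            math.log((counts[gram] + 1) / (total + self.vocab_size))
            for gram in char_ngrams(text, self.n)
        )

    def detect(self, text: str) -> str:
        """Pick the language with the highest score (uniform prior assumed)."""
        return max(self.counts, key=lambda lang: self.log_score(text, lang))


detector = NaiveBayesDetector(TRAINING)
print(detector.detect("el perro salta sobre el zorro"))
```

With only two short training paragraphs, the toy is easy to fool. The real library relies on precomputed profiles for each of its supported languages.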
Using textblob Library
TextBlob is a powerful library for various NLP tasks such as sentiment analysis, translation, and language detection.
Install the library using:
pip install textblob
Example:
Python
from textblob import TextBlob
texts = [
"Geeksforgeeks is a computer science portal for geeks",
"Geeksforgeeks - это компьютерный портал для гиков",
"Geeksforgeeks es un portal informático para geeks",
"Geeksforgeeks是面向极客的计算机科学门户",
"Geeksforgeeks geeks के लिए एक कंप्यूटर विज्ञान पोर्टल है",
"Geeksforgeeksは、ギーク向けのコンピューターサイエンスポータルです。"
]
for t in texts:
    print(TextBlob(t).detect_language())
Output
en
ru
es
zh-CN
hi
ja
Explanation:
- TextBlob(): Creates a text processing object for each sentence.
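- .detect_language(): Detects the language of the text by sending it to the Google Translate API, so it needs an internet connection. This method has been deprecated and removed from recent TextBlob releases, so the example only runs on an older version of the library.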
- The loop prints the detected language code for each sentence, for example: en (English), ru (Russian), es (Spanish), zh-CN (Chinese), hi (Hindi), ja (Japanese).
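The libraries do not agree on the exact code format. For the Chinese sentence, langdetect prints zh-cn, TextBlob prints zh-CN, and langid prints zh. Before you compare or store results from different libraries, it can help to reduce each code to its primary language subtag. The helper names below are hypothetical. Dropping the region part is a design choice, and it discards information such as the zh-cn/zh-tw distinction that langdetect makes.

```python
def primary_subtag(code: str) -> str:
    """Reduce a tag such as 'zh-cn', 'zh-CN', or 'zh_CN' to lowercase 'zh'."""
    return code.replace("_", "-").split("-")[0].lower()


def codes_agree(*codes: str) -> bool:
    """True if all codes share the same primary language subtag."""
    return len({primary_subtag(code) for code in codes}) == 1


print(codes_agree("zh-cn", "zh-CN", "zh"))  # langdetect, TextBlob, langid
```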
Using langid Library
langid is a standalone language identification tool pre-trained on 97 languages. It’s lightweight and doesn’t require an internet connection.
Install it using:
pip install langid
Example:
Python
import langid
texts = [
"Geeksforgeeks is a computer science portal for geeks",
"Geeksforgeeks - это компьютерный портал для гиков",
"Geeksforgeeks es un portal informático para geeks",
"Geeksforgeeks是面向极客的计算机科学门户",
"Geeksforgeeks geeks के लिए एक कंप्यूटर विज्ञान पोर्टल है",
"Geeksforgeeksは、ギーク向けのコンピューターサイエンスポータルです。"
]
for t in texts:
    print(langid.classify(t))
Output
('en', -119.93)
('ru', -641.34)
('es', -191.01)
('zh', -199.18)
('hi', -286.99)
('ja', -875.66)
Explanation:
- langid.classify(text) returns a tuple of two values: the detected language code and a score.
- The first element (e.g., 'en') is the language code.
- The second element is an unnormalized log-probability, not a confidence between 0 and 1. Its magnitude tends to grow with the length of the text. A more negative value therefore does not mean a less reliable prediction, and scores from different texts should not be compared directly.
- For example, ('en', -119.93) means the model chose English, with a log-probability score of -119.93 for that text.
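To read a score as a confidence, you have to normalize the log-probabilities across the candidate languages. langid can do this itself. The langid.langid module provides LanguageIdentifier.from_modelstring(model, norm_probs=True), and its classify() method returns a probability between 0 and 1. The stdlib sketch below shows the underlying arithmetic: a softmax over per-language log scores, such as the per-language list that langid.rank() returns. The candidate languages and scores in the usage line are hypothetical.

```python
import math


def normalize_log_scores(scores: dict[str, float]) -> dict[str, float]:
    """Convert per-language log-probabilities into probabilities that sum to 1."""
    top = max(scores.values())
    # Subtract the largest score before exponentiating: math.exp(-875.66)
    # underflows to 0.0, but math.exp(-875.66 - (-875.66)) is exactly 1.0.
    weights = {lang: math.exp(score - top) for lang, score in scores.items()}
    total = sum(weights.values())
    return {lang: weight / total for lang, weight in weights.items()}


# Hypothetical log scores for one text across three candidate languages.
probs = normalize_log_scores({"ja": -875.66, "zh": -880.0, "ko": -900.0})
print(max(probs, key=probs.get), round(probs["ja"], 3))
```

Because only differences between scores matter after subtracting the maximum, this works even when every raw score is far too negative to exponentiate directly.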