
Detect an Unknown Language using Python

Last Updated : 22 Oct, 2025

Language detection is an essential task in Natural Language Processing (NLP). It involves identifying the language of a given text by analyzing its characters, words, and structure. Several third-party Python libraries make this process simple and, for reasonably long text, accurate.

In this article, we’ll explore three popular libraries for language detection:

  • langdetect
  • textblob
  • langid
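Before turning to the libraries, here is a minimal standard-library sketch of what "analyzing its characters" can mean. It only guesses the dominant writing system from Unicode character names; the helper name dominant_script is hypothetical and is not part of any of the libraries below. A script is not a language (Latin letters cover both English and Spanish), which is why the libraries rely on trained statistical models instead.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return a rough label for the most common script among the letters in text.

    Illustrative helper only: the label is simply the first word of each
    letter's Unicode name, e.g. "CYRILLIC" from "CYRILLIC SMALL LETTER E".
    """
    scripts = Counter()
    for ch in text:
        if ch.isalpha():  # ignore digits, punctuation, spaces, combining marks
            name = unicodedata.name(ch, "")
            if name:
                scripts[name.split()[0]] += 1
    # None when the text contains no letters at all
    return scripts.most_common(1)[0][0] if scripts else None


print(dominant_script("это компьютерный портал"))  # CYRILLIC
print(dominant_script("computer science portal"))  # LATIN
```

This heuristic can separate Russian from Hindi, but it cannot tell English from Spanish, so it is a starting intuition rather than a detector.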

Using langdetect Library

The langdetect library is a Python port of Google’s language-detection library and supports 55 languages. It’s not included in Python’s standard library, so you need to install it first.

Install the library using:

pip install langdetect

Python
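from langdetect import DetectorFactory, detect

# langdetect is non-deterministic by default; a fixed seed makes results repeatable
DetectorFactory.seed = 0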

print(detect("Geeksforgeeks is a computer science portal for geeks"))
print(detect("Geeksforgeeks - это компьютерный портал для гиков"))
print(detect("Geeksforgeeks es un portal informático para geeks"))
print(detect("Geeksforgeeks是面向极客的计算机科学门户"))
print(detect("Geeksforgeeks geeks के लिए एक कंप्यूटर विज्ञान पोर्टल है"))
print(detect("Geeksforgeeksは、ギーク向けのコンピューターサイエンスポータルです。"))

Output

en
ru
es
zh-cn
hi
ja

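Explanation: detect() returns the code of the most probable language for the given text, based on pre-trained character n-gram profiles. The algorithm is non-deterministic, so very short or ambiguous text can yield different results between runs unless DetectorFactory.seed is set before the first detection.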

Using textblob Library

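TextBlob is a library for common NLP tasks such as part-of-speech tagging, noun phrase extraction, and sentiment analysis. Older releases also offered language detection through detect_language(), which sent the text to the Google Translate web service. That method was deprecated in TextBlob 0.16.0 and removed in 0.18.0, so the example below only runs on an older release and needs an internet connection; even then, it may fail if the web service rejects the request.

Install an older release of the library using:

pip install "textblob<0.18"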

Example: 

Python
from textblob import TextBlob

texts = [
    "Geeksforgeeks is a computer science portal for geeks",
    "Geeksforgeeks - это компьютерный портал для гиков",
    "Geeksforgeeks es un portal informático para geeks",
    "Geeksforgeeks是面向极客的计算机科学门户",
    "Geeksforgeeks geeks के लिए एक कंप्यूटर विज्ञान पोर्टल है",
    "Geeksforgeeksは、ギーク向けのコンピューターサイエンスポータルです。"
]

for t in texts:
    print(TextBlob(t).detect_language())

Output

en
ru
es
zh-CN
hi
ja

Explanation:

  • TextBlob(): Creates a text processing object for each sentence.
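  • .detect_language(): Sends the text to the Google Translate web service and returns the detected language code. It requires an internet connection and is not available in TextBlob 0.18.0 and later.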
  • The loop prints the detected language code for each sentence, for example: en (English), ru (Russian), es (Spanish), zh-CN (Chinese), hi (Hindi), ja (Japanese).

Using langid Library

langid is a standalone language identification tool pre-trained on 97 languages. It’s lightweight and doesn’t require an internet connection.

Install it using:

pip install langid

Example: 

Python
import langid

texts = [
    "Geeksforgeeks is a computer science portal for geeks",
    "Geeksforgeeks - это компьютерный портал для гиков",
    "Geeksforgeeks es un portal informático para geeks",
    "Geeksforgeeks是面向极客的计算机科学门户",
    "Geeksforgeeks geeks के लिए एक कंप्यूटर विज्ञान पोर्टल है",
    "Geeksforgeeksは、ギーク向けのコンピューターサイエンスポータルです。"
]

for t in texts:
    print(langid.classify(t))

Output

('en', -119.93)
('ru', -641.34)
('es', -191.01)
('zh', -199.18)
('hi', -286.99)
('ja', -875.66)

Explanation:

  • langid.classify(text) returns a tuple containing two values: the detected language code and a score.
  • The first element (e.g., 'en') is the language code.
  • The second element is an unnormalized log-probability score. Its magnitude depends strongly on the length of the text, so scores should not be compared across different inputs, and a more negative value does not by itself mean lower confidence.
  • For example, ('en', -119.93) means the model detected English with a log-probability score of -119.93.
