Detect an Unknown Language using Python

Last Updated : 22 Oct, 2025

Language detection is an essential task in Natural Language Processing (NLP). It involves identifying the language of a given text by analyzing its characters, words, and structure. Python provides several libraries to make this process simple and accurate.
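As a rough illustration of what "analyzing its characters" can mean, the toy sketch below guesses the dominant writing script of a text from Unicode character names. It uses only the standard library, and the function name dominant_script is hypothetical. A script is not a language (Latin script alone covers English, Spanish, and many others), which is why real detectors rely on trained statistical models instead.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script prefix among the letters of text, or None."""
    scripts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces, punctuation, and combining marks
        # Unicode names start with the script, e.g. "CYRILLIC SMALL LETTER E".
        name = unicodedata.name(ch, "")
        if name:
            scripts[name.split()[0]] += 1
    if not scripts:
        return None
    return scripts.most_common(1)[0][0]


for sample in ["computer science portal", "компьютерный портал", "计算机科学门户"]:
    print(dominant_script(sample))
```

The three samples are reported as LATIN, CYRILLIC, and CJK respectively, which narrows the candidates but does not yet name a language.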

In this article, we’ll explore three popular libraries for language detection:

langdetect
textblob
langid

Using langdetect Library

The langdetect module is a Python port of Google's language-detection library and supports 55 languages out of the box. It is a third-party package, not part of Python's standard library, so you need to install it first.

Install the library using:

pip install langdetect

Python

from langdetect import detect

print(detect("Geeksforgeeks is a computer science portal for geeks"))
print(detect("Geeksforgeeks - это компьютерный портал для гиков"))
print(detect("Geeksforgeeks es un portal informático para geeks"))
print(detect("Geeksforgeeks是面向极客的计算机科学门户"))
print(detect("Geeksforgeeks geeks के लिए एक कंप्यूटर विज्ञान पोर्टल है"))
print(detect("Geeksforgeeksは、ギーク向けのコンピューターサイエンスポータルです。"))

Output

en
ru
es
zh-cn
hi
ja

Explanation: detect() returns the code of the most probable language for the given text, using a pre-trained statistical model.
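One caveat is that langdetect's algorithm is non-deterministic, so short or ambiguous text can give different results between runs. Its README suggests setting DetectorFactory.seed to make results reproducible. The sketch below does that and uses detect_langs(), which returns candidate languages with probabilities, to reject low-confidence guesses. The helper names pick_language and detect_or_none and the 0.9 threshold are illustrative assumptions, not part of langdetect.

```python
def pick_language(candidates, min_prob=0.9):
    """Return the most probable code, or None if it is below min_prob.

    candidates is an iterable of (language code, probability) pairs.
    """
    best = max(candidates, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < min_prob:
        return None
    return best[0]


def detect_or_none(text, min_prob=0.9, seed=0):
    """Detect the language of text with langdetect, or return None."""
    # Imported here so pick_language() stays usable without langdetect.
    from langdetect import DetectorFactory, detect_langs
    from langdetect.lang_detect_exception import LangDetectException

    DetectorFactory.seed = seed  # fixed seed for reproducible results
    try:
        # Each result object exposes .lang (code) and .prob (probability).
        pairs = [(c.lang, c.prob) for c in detect_langs(text)]
    except LangDetectException:
        # Raised when the input has no usable features, e.g. an empty string.
        return None
    return pick_language(pairs, min_prob)
```

Calling detect_or_none() on a full sentence should return a code such as en, while empty input or a low-confidence guess returns None instead of raising an exception or returning a doubtful label.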

Using textblob Library

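TextBlob is a general-purpose NLP library for tasks such as sentiment analysis, part-of-speech tagging, and noun-phrase extraction. Older releases also offered language detection through detect_language(), but that method was deprecated in TextBlob 0.16.0 and removed in 0.18.0, so the example below only works with an older release.

Install an older release using:

pip install "textblob<0.18"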

Example:

Python

from textblob import TextBlob

texts = [
    "Geeksforgeeks is a computer science portal for geeks",
    "Geeksforgeeks - это компьютерный портал для гиков",
    "Geeksforgeeks es un portal informático para geeks",
    "Geeksforgeeks是面向极客的计算机科学门户",
    "Geeksforgeeks geeks के लिए एक कंप्यूटर विज्ञान पोर्टल है",
    "Geeksforgeeksは、ギーク向けのコンピューターサイエンスポータルです。"
]

for t in texts:
    print(TextBlob(t).detect_language())

Output

en
ru
es
zh-CN
hi
ja

Explanation:

TextBlob(): creates a text-processing object for each sentence.
detect_language(): sends the text to the Google Translate web service and returns the detected language code, so it needs an internet connection and can fail if the service rejects the request.
The loop prints the detected language code for each sentence: en (English), ru (Russian), es (Spanish), zh-CN (Chinese), hi (Hindi), ja (Japanese).
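The outputs so far show that libraries spell the same language differently: langdetect printed zh-cn while TextBlob printed zh-CN, and other tools may report just zh. When results from several detectors are compared, a small normalization step helps. The sketch below is one way to do it with the standard library. The function names and the short name table are illustrative assumptions, not part of any of these libraries.

```python
# Illustrative subset only; extend it with the codes your data needs.
LANGUAGE_NAMES = {
    "en": "English",
    "ru": "Russian",
    "es": "Spanish",
    "zh": "Chinese",
    "hi": "Hindi",
    "ja": "Japanese",
}


def base_code(code):
    """Reduce a tag such as 'zh-CN' or 'zh_cn' to its base language, 'zh'."""
    return code.strip().lower().replace("_", "-").split("-")[0]


def language_name(code):
    """Return a readable name for a code, or the base code if it is unknown."""
    base = base_code(code)
    return LANGUAGE_NAMES.get(base, base)


for code in ["en", "zh-cn", "zh-CN", "pt"]:
    print(code, "->", language_name(code))
```

Dropping the region suffix also discards distinctions such as zh-cn versus zh-tw, so keep the full tag when that difference matters.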

Using langid Library

langid is a standalone language identification tool pre-trained on 97 languages. It’s lightweight and doesn’t require an internet connection.

Install it using:

pip install langid

Example:

Python

import langid

texts = [
    "Geeksforgeeks is a computer science portal for geeks",
    "Geeksforgeeks - это компьютерный портал для гиков",
    "Geeksforgeeks es un portal informático para geeks",
    "Geeksforgeeks是面向极客的计算机科学门户",
    "Geeksforgeeks geeks के लिए एक कंप्यूटर विज्ञान पोर्टल है",
    "Geeksforgeeksは、ギーク向けのコンピューターサイエンスポータルです。"
]

for t in texts:
    print(langid.classify(t))

Output

('en', -119.93)
('ru', -641.34)
('es', -191.01)
('zh', -199.18)
('hi', -286.99)
('ja', -875.66)

Explanation:

langid.classify(text) returns a tuple of two values: the detected language code and a score.
The first element (e.g., 'en') is the language code.
The second element is an unnormalized log-probability, not a confidence between 0 and 1. Its magnitude tends to grow with the length of the text, so a more negative score does not by itself mean a less reliable prediction, and scores for different texts should not be compared directly.
For example, ('en', -119.93) means the model chose English with a log-probability score of -119.93.
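Because the raw score is hard to read as a confidence, langid can also return a normalized probability between 0 and 1 through its LanguageIdentifier class with norm_probs=True, as described in the langid README. The sketch below shows that call, plus a pure-Python softmax that illustrates the general idea of turning log-probabilities into probabilities. The helper names classify_normalized and softmax_from_log_scores are assumptions for illustration, and the softmax is not claimed to be langid's exact internal code.

```python
import math


def softmax_from_log_scores(log_scores):
    """Convert {language: log-probability} into probabilities that sum to 1."""
    # Subtract the maximum first so exp() does not underflow to zero
    # for large negative scores such as -875.66.
    top = max(log_scores.values())
    weights = {lang: math.exp(score - top) for lang, score in log_scores.items()}
    total = sum(weights.values())
    return {lang: weight / total for lang, weight in weights.items()}


def classify_normalized(text):
    """Return (language code, probability in [0, 1]) using langid."""
    # Imported here so the softmax helper runs without langid installed.
    from langid.langid import LanguageIdentifier, model

    # Building the identifier loads the model; reuse it in real code.
    identifier = LanguageIdentifier.from_modelstring(model, norm_probs=True)
    return identifier.classify(text)


# Made-up log scores for two candidate languages of the same text.
print(softmax_from_log_scores({"en": -119.93, "de": -125.0}))
```

A gap of about five log units between the two made-up scores already puts almost all of the probability on the first language, which is why the raw scores are better read relative to each other than as absolute values.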

argha_c14
