350 NLP Projects With Code
The Most Powerful NLP-Weapon Arsenal
Himanshu Ramchandani
M.Tech | Data Science
NLP Workers' Paradise: Nearly the Most Complete Chinese NLP Resource Library
While getting started with NLP and becoming familiar with it, I used many packages
from GitHub, so I organized them and am sharing them here.
⭐
Many of the packages are very interesting and worth collecting, enough to satisfy anyone's
urge to collect! If you find this useful, please share and star. Thanks!
❤️❤️❤️
Updated irregularly over the long term; you are welcome to watch and fork!
🍆🍒🍐🍊 🌻🍓🍈🍅🍍
* Corpus
* Document Processing
* Others
corpus
Resource name (Name) | Description | Link
42GB of JD Customer Service Dialogue Data (CSDD) | | github
Multi-Document Summarization Dataset | | github
Cantonese/English Conversational Bilingual Corpus | | github
Named entity recognition dataset of person-like names/place names/organization names | | github
Polyphone dictionary data and code | | github
char_featurizer | Chinese character feature extraction tool | github
token2index | A powerful, lightweight term index library compatible with PyTorch/TensorFlow | github
BERT & ERNIE | Language/knowledge representation tools | github
XLM | Facebook's cross-lingual pre-trained language model | github
extract
Resource name (Name) | Description | Link
text generation
Resource name (Name) | Description | Link
Texar | Toolkit for Text Generation and Beyond | github
text summary
Resource name (Name) | Description | Link
Smart Q&A
Resource name (Name) | Description | Link
CommonsenseQA | Commonsense-Oriented English QA Challenge | link
Winner of the Text Smart Proofreading Contest | Already applied in practice; from the team of Soochow University and DAMO Academy | link
multimodal
Resource name (Name) | Description | Link
speech processing
Resource name (Name) | Description | Link
data_thchs30.tgz | |
test-noise.tgz (OpenSLR domestic mirror) | |
test-noise.tgz | |
resource.tgz (OpenSLR domestic mirror) | |
resource.tgz | |
Free ST Chinese Mandarin Corpus | |
AIShell-1 open-source dataset (OpenSLR domestic mirror) | |
AIShell-1 open-source dataset | |
Primewords Chinese Corpus Set 1 (OpenSLR domestic mirror) | |
Primewords Chinese Corpus Set 1 | |
Chinese/English Pronunciation Dictionary for Speech Recognition | | github
CoVoST, a multilingual speech-to-text translation corpus released by Facebook | Includes audio, text transcriptions, and English translations for 11 languages (French, German, Dutch, Russian, Spanish, Italian, Turkish, Persian, Swedish, Mongolian, and Chinese) | github
document processing
Resource name (Name) | Description | Link
LayoutLM-v3 | Document understanding model | github
Single-document unsupervised keyword extraction | | github
DocSearch | Free documentation search engine | github
form processing
Resource name (Name) | Description | Link
model
text match
Resource name (Name) | Description | Link
Regular expression for extracting ID numbers | IDCards_pattern = r'^([1-9]\d{5}[12]\d{3}(0[1-9]|1[012])(0[1-9]|[12][0-9]|3[01])\d{3}[0-9xX])'; IDs = re.findall(IDCards_pattern, text, flags=0) |
Tencent QQ number regular expression | [1-9]([0-9]{5,11}) |
Domestic fixed-line number regular expression | [0-9-()（）]{7,18} |
Regular Expression Tutorial | | github
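The three patterns listed in the text match table can be tried out with Python's standard `re` module. The following is a minimal sketch, not the original author's code: the names `ID_CARD`, `QQ`, `LANDLINE`, and `extract_ids` are illustrative, the lookarounds and non-capturing groups are additions so that `findall` returns whole matches from running text (with capturing groups it returns tuples of groups instead), and the fullwidth parentheses in the landline pattern are an assumption about the intended character class.

```python
import re

# 18-character resident ID: 6-digit region code, 8-digit birth date
# (year starting with 1 or 2), 3-digit sequence, check character (digit or X).
ID_CARD = re.compile(
    r"(?<!\d)[1-9]\d{5}[12]\d{3}"
    r"(?:0[1-9]|1[012])"            # month 01-12
    r"(?:0[1-9]|[12][0-9]|3[01])"   # day 01-31 (no per-month validation)
    r"\d{3}[0-9xX](?![0-9xX])"
)

# QQ number: no leading zero, 6 to 12 digits in total.
QQ = re.compile(r"(?<!\d)[1-9][0-9]{5,11}(?!\d)")

# Fixed-line number: 7 to 18 characters drawn from digits, hyphens,
# and ASCII or fullwidth parentheses (assumed character class).
LANDLINE = re.compile(r"[0-9\-()（）]{7,18}")


def extract_ids(text):
    """Return every ID-shaped substring found in text, in order."""
    return ID_CARD.findall(text)
```

For example, `extract_ids("ID: 11010519900307123X.")` returns `['11010519900307123X']`. The landline pattern is permissive: it accepts any run of 7 to 18 digits, hyphens, and parentheses, so it also matches other long digit strings and is best applied to fields already known to hold phone numbers.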
text search
Resource name (Name) | Description | Link
RapidFuzz, an efficient string matching tool | A fast string matching library for Python and C++ that uses the string similarity calculations from FuzzyWuzzy | github
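To give a feel for what similarity-based string matching does, here is a small sketch that uses only the standard library. It is a stand-in built on `difflib.SequenceMatcher`, not RapidFuzz's API or its algorithms, and the function names `similarity` and `best_match` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a case-insensitive similarity score between 0 and 100."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(query, choices):
    """Return (choice, score) for the highest-scoring candidate.

    Returns None when choices is empty.
    """
    scored = [(choice, similarity(query, choice)) for choice in choices]
    return max(scored, key=lambda pair: pair[1]) if scored else None
```

Calling `best_match("new york", ["New York City", "Newark", "Boston"])` picks `"New York City"`. A dedicated library is the better choice when matching against large candidate lists, since this sketch scores every candidate in pure Python.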
reading comprehension
Resource name (Name) | Description | Link
sentiment analysis
Resource name (Name) | Description | Link
Aspect sentiment analysis package | | github
event extraction
Resource name (Name) | Description | Link
machine translation
Resource name (Name) | Description | Link
Wudao Dictionary (无道词典) | A command-line version of Youdao Dictionary that supports English-Chinese two-way lookup and online queries | github
digital conversion
Resource name (Name) | Description | Link
anaphora resolution
Resource name (Name) | Description | Link
text clustering
Resource name (Name) | Description | Link
Text Categorization
Resource name (Name) | Description | Link
knowledge reasoning
Resource name (Name) | Description | Link
text attack
Resource name (Name) | Description | Link
text visualization
Resource name (Name) | Description | Link
language detection
Resource name (Name) | Description | Link
langid | Detects 97 languages | https://github.com/saffsd/langid.py
comprehensive tool
Resource name (Name) | Description | Link
jieba | | jieba
hanlp | | hanlp
nlp4han | Chinese natural language processing toolkit (sentence segmentation/word segmentation/part-of-speech tagging/chunking/syntactic analysis/semantic analysis/NER/N-gram/HMM/pronoun resolution/sentiment analysis/spell checking) | github
jieba_fast | Accelerated version of jieba | github
Some papers and code related to NLP | Includes topic models, word vectors (word embeddings), named entity recognition (NER), text classification, text generation, text similarity calculation, etc., covering various NLP algorithms, based on Keras and TensorFlow | github
Reproduction of vectorized recall pipelines commonly used in industry, based on DSSM | | github
LineFlow | An efficient NLP data loader for all deep learning frameworks | github
Analysis of girlfriend's emotional fluctuations | | github
Contest
Resource name (Name) | Description | Link
text to image
Resource name (Name) | Description | Link
other
Resource name (Name) | Description | Link
CHAMELEON | Meta-architecture of a deep learning news recommendation system | github