LinXueyuanStdio / Latex_ocr_pro

Licence: gpl-3.0

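🎨 Enhanced math formula recognition: handwritten and printed formulas in Chinese and English, with support for basic symbolic derivation (data structures based on a LaTeX abstract syntax tree)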

LaTeX_OCR_PRO

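Math formula recognition, enhanced with support for Chinese formulas and handwritten formulas.

Seq2Seq + Attention + Beam Search. This README is organized as follows:

1. Set up the environment
2. Start training
3. Visualization
4. Deployment
5. Evaluation
6. More details
- Model implementation details
- Solutions to common problems
7. Acknowledgements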
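
As background for the beam search named above, here is a minimal sketch in plain Python. It is not this project's TensorFlow implementation: `step_fn`, the toy probability table `TOY`, and the `<s>` / `<eos>` markers are all assumptions made for illustration.

```python
import math

def beam_search(step_fn, start, eos, beam_size=3, max_len=10):
    """Return the highest-scoring token sequence found by beam search.

    step_fn(prefix) returns a dict mapping each possible next token to its
    probability (a hypothetical interface, not the project's API).
    """
    beams = [([start], 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, p in step_fn(tokens).items():
                candidates.append((tokens + [tok], score + math.log(p)))
        if not candidates:
            break
        # Keep only the beam_size best extensions
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses if needed
    return max(finished, key=lambda c: c[1])[0]

# Toy next-token distribution that depends only on the last token (assumption)
TOY = {
    "<s>": {"x": 0.6, "y": 0.4},
    "x": {"^": 0.55, "<eos>": 0.45},
    "y": {"<eos>": 1.0},
    "^": {"2": 1.0},
    "2": {"<eos>": 1.0},
}

def toy_step(prefix):
    return TOY[prefix[-1]]

wide = beam_search(toy_step, "<s>", "<eos>", beam_size=3)
greedy = beam_search(toy_step, "<s>", "<eos>", beam_size=1)
```

With `beam_size=1` the search is greedy and commits to `x` early, ending with a sequence of probability 0.33. With `beam_size=3` it keeps `y` alive and finds the sequence of probability 0.4.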

1. Set up the environment

python3.5 + tensorflow1.12.2
[Optional] latex (LaTeX to PDF)
[Optional] ghostscript (image processing)
[Optional] magick (PDF to PNG)
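
To see which of the optional tools are already installed, a short check like the following can help. It is only a sketch: the executable names (`pdflatex`, `gs`, `magick`, `convert`) are assumptions that vary by platform and ImageMagick version, and `find_tools` is a hypothetical helper, not part of this project.

```python
import shutil

# Executable names are assumptions; adjust them for your platform
OPTIONAL_TOOLS = {
    "latex -> pdf": ["pdflatex"],
    "image processing": ["gs"],
    "pdf -> png": ["magick", "convert"],
}

def find_tools(tools=OPTIONAL_TOOLS):
    """Map each purpose to the first matching executable on PATH, or None."""
    found = {}
    for purpose, names in tools.items():
        paths = (shutil.which(name) for name in names)
        found[purpose] = next((p for p in paths if p), None)
    return found

if __name__ == "__main__":
    for purpose, path in find_tools().items():
        print("{:18s} {}".format(purpose, path or "not found"))
```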

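If you want to train directly without building the dataset yourself:

[Optional] Create a new virtual environment

virtualenv env35 --python=python3.5
source env35/bin/activate

Install dependencies (pick one)

pip install -r requirements.txt      # CPU version
pip install -r requirements-gpu.txt  # GPU version

Download the dataset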

git submodule init
git submodule update

If you want to build the dataset yourself and then train:

Linux

One-step install

make install-linux

or

Install this project's dependencies

virtualenv env35 --python=python3.5
source env35/bin/activate
pip install -r requirements.txt

Install latex (LaTeX to PDF)

sudo apt-get install texlive-latex-base
sudo apt-get install texlive-latex-extra

Install ghostscript

sudo apt-get update
sudo apt-get install ghostscript
sudo apt-get install libgs-dev

Install magick (PDF to PNG)

wget http://www.imagemagick.org/download/ImageMagick.tar.gz
tar -xvf ImageMagick.tar.gz
cd ImageMagick-7.*; \
./configure --with-gslib=yes; \
make; \
sudo make install; \
sudo ldconfig /usr/local/lib
# return to the parent directory before cleaning up
cd ..
rm ImageMagick.tar.gz
rm -r ImageMagick-7.*

Mac

One-step install

make install-mac

or

Install this project's dependencies

sudo pip install -r requirements.txt

LaTeX

We need pdflatex, which the MacTeX installer sets up in one step: http://www.tug.org/mactex/mactex-download.html

Install magick (PDF to PNG)

wget http://www.imagemagick.org/download/ImageMagick.tar.gz
tar -xvf ImageMagick.tar.gz
cd ImageMagick-7.*; \
./configure --with-gslib=yes; \
make; \
sudo make install
# return to the parent directory before cleaning up
cd ..
rm ImageMagick.tar.gz
rm -r ImageMagick-7.*

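2. Start training

Generate the small dataset, train, and evaluate

A small dataset of 100 samples is provided for quick testing. Generating the training images from the formulas under ./data/small.formulas/ takes only about 2 minutes.

One-step training

make small

or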

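Generate the dataset

Render images from the LaTeX formulas, save the formula-to-image mapping file, and generate the vocabulary. This only needs to be run once.

# default
python build.py
# or
python build.py --data=configs/data_small.json --vocab=configs/vocab_small.json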
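
The two bookkeeping outputs of this step, the vocabulary and the formula-to-image mapping, can be sketched as follows. This is not the code in build.py: the special tokens, the whitespace tokenization, the `N.png` naming, and both function names are assumptions for illustration, and the rendering step itself is left out.

```python
from collections import Counter

def build_vocab(formulas, min_count=1, specials=("_PAD", "_END", "_UNK")):
    """Assign an integer id to every token seen at least min_count times.

    Formulas are assumed to be pre-tokenized and separated by spaces.
    """
    counts = Counter(tok for formula in formulas for tok in formula.split())
    tokens = sorted(tok for tok, c in counts.items() if c >= min_count)
    return {tok: i for i, tok in enumerate(list(specials) + tokens)}

def build_matching(formulas):
    """Pair an image name such as '0.png' (assumed naming) with a formula index."""
    return [("{}.png".format(i), i) for i in range(len(formulas))]

formulas = [r"x ^ { 2 }", r"\frac { x } { 2 }"]
vocab = build_vocab(formulas)
matching = build_matching(formulas)
```

Special tokens come first so that their ids stay stable when the formula set changes.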

Train

# default
python train.py
# or
python train.py --data=configs/data_small.json --vocab=configs/vocab_small.json --training=configs/training_small.json --model=configs/model.json --output=results/small/

Evaluate the predicted formulas (text)

# default
python evaluate_txt.py
# or
python evaluate_txt.py --results=results/small/

Evaluate the rendered formula images

# default
python evaluate_img.py
# or
python evaluate_img.py --results=results/small/
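
The four steps above can also be chained from a small Python driver. The sketch below only repeats the commands already listed; whether `make small` runs exactly this sequence is an assumption, and `small_pipeline` and `run_all` are hypothetical names.

```python
import subprocess
import sys

def small_pipeline(results="results/small/"):
    """Return the four small-dataset commands as argument lists."""
    py = sys.executable
    return [
        [py, "build.py", "--data=configs/data_small.json",
         "--vocab=configs/vocab_small.json"],
        [py, "train.py", "--data=configs/data_small.json",
         "--vocab=configs/vocab_small.json",
         "--training=configs/training_small.json",
         "--model=configs/model.json", "--output=" + results],
        [py, "evaluate_txt.py", "--results=" + results],
        [py, "evaluate_img.py", "--results=" + results],
    ]

def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step

if __name__ == "__main__":
    run_all(small_pipeline(), dry_run=True)
```

Run it from the repository root with `dry_run=False` to execute the steps; `check=True` aborts the chain as soon as one step fails.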

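Generate the full dataset, train, and evaluate

Generating the 70,000+ formula images from the formulas takes 2-3 hours.

One-step training

make full

or

Generate the dataset

Render images from the LaTeX formulas, save the formula-to-image mapping file, and generate the vocabulary. This only needs to be run once.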
```
python build.py --data=configs/data.json --vocab=configs/vocab.json
```

Train

python train.py --data=configs/data.json --vocab=configs/vocab.json --training=configs/training.json --model=configs/model.json --output=results/full/

Evaluate the predicted formulas (text)

python evaluate_txt.py --results=results/full/

Evaluate the rendered formula images

python evaluate_img.py --results=results/full/

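3. Visualization

Visualize the training process

Use TensorBoard to visualize the training process.

Small dataset

cd results/small
tensorboard --logdir ./

Full dataset

cd results/full
tensorboard --logdir ./

Visualize the prediction process

Open visualize_attention.ipynb to watch, step by step, how the model predicts a LaTeX formula.

Or run

# default
python visualize_attention.py
# or
python visualize_attention.py --image=data/images_test/6.png --vocab=configs/vocab.json --model=configs/model.json --output=results/full/

Attention maps of the prediction process are written to the --output directory.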
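
One way to read an attention map is to ask which part of the image the decoder weighted most when it emitted a token. The sketch below maps the strongest cell of a weight grid back to a pixel box. It is not taken from visualize_attention.py: the 2-D grid layout and the name `peak_region` are assumptions.

```python
def peak_region(attention, image_size):
    """Return the pixel box (left, top, right, bottom) of the most-attended cell.

    attention: 2-D list of weights over the encoder feature grid (assumed layout).
    image_size: (width, height) of the input image in pixels.
    """
    rows, cols = len(attention), len(attention[0])
    r, c = max(
        ((i, j) for i in range(rows) for j in range(cols)),
        key=lambda rc: attention[rc[0]][rc[1]],
    )
    cell_w = image_size[0] / cols
    cell_h = image_size[1] / rows
    return (int(c * cell_w), int(r * cell_h),
            int((c + 1) * cell_w), int((r + 1) * cell_h))

# Toy weights for one decoding step over a 2 x 3 grid
weights = [
    [0.05, 0.10, 0.05],
    [0.10, 0.60, 0.10],
]
box = peak_region(weights, image_size=(300, 100))
```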

4. Deployment

Deploy as a Django application

Install the packages needed for deployment
```
pip install django
```

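Start the service

python manage.py runserver 0.0.0.0:8010

Start the image server

cd data/images_train
python -m http.server 8020  # Python 3; on Python 2 use: python -m SimpleHTTPServer 8020

Usage: enter 0.png, 1.png, and so on in the input box to see the results.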

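5. Evaluation

| Metric | Training score | Test score |
| --- | --- | --- |
| perplexity | 1.12 | 1.13 |
| EditDistance | 94.16 | 93.36 |
| BLEU-4 | 91.03 | 90.47 |
| ExactMatchScore | 49.30 | 46.22 |

For perplexity, the closer to 1 the better; for the other three metrics, higher is better.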
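
To make the direction of each metric concrete, here is a sketch of three of them in plain Python. The exact definitions used by evaluate_txt.py are not stated here, so the normalisation in `edit_score` in particular is an assumption. BLEU-4 is omitted.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood per token; 1.0 is a perfect model."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def levenshtein(a, b):
    """Edit distance between two token sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]

def edit_score(refs, hyps):
    """100 * (1 - total distance / total max length); assumed normalisation."""
    dist = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    length = sum(max(len(r), len(h)) for r, h in zip(refs, hyps))
    return 100.0 * (1.0 - dist / length)

def exact_match(refs, hyps):
    """Percentage of predictions identical to their reference."""
    return 100.0 * sum(r == h for r, h in zip(refs, hyps)) / len(refs)

refs = [["x", "^", "2"], ["a", "+", "b"]]
hyps = [["x", "^", "2"], ["a", "-", "b"]]
em = exact_match(refs, hyps)
ed = edit_score(refs, hyps)
```

A single wrong token leaves the edit-distance score high while halving the exact-match score, which is consistent with the gap between those two rows in the table.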

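Of these, EditDistance and BLEU-4 are already competitive with the state of the art in the field.

If perplexity is trained down to about 1.03, ExactMatchScore can still improve and should reach 70 or higher.

My machine is not very powerful, so training takes too long.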

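6. More details

Model implementation details

Covers data acquisition, data processing, model architecture, and training details.

Solutions to common problems

Covers topics such as "how to visualize the attention layer" and "GPU-accelerated training on Windows 10".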

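7. Acknowledgements

Many thanks to Harvard, Guillaume Genthial, Kelvin Xu, and others, whose work provided the shoulders of giants for this project to stand on.

Papers: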
