This repository contains the source code of an AI-based structured web data extractor.
- 👨💻 Author: Jan Joneš
- 📜 Thesis: PDF, assignment, submission, slides
- 🚀 Demo: live, Docker Hub, examples below
- 🗃️ Data: SWDE with visuals
- 📂 awe/: Python module (data manipulation and machine learning). See awe/README.md.
- 📂 js/: Node.js app (visual attribute extractor and inference demo). See js/README.md.
- 📂 docs/
  - 📂 dev/
    - 📄 data.md: dataset preparation.
    - 📄 extractor.md: running the visual extractor.
    - 📄 train.md: training instructions.
    - 📄 release.md: release instructions.
  - 📂 demo/
docker pull janjones/awe-demo
docker run --rm -it -p 3000:3000 janjones/awe-demo

Open a web browser and navigate to http://localhost:3000/.
For more details, see docs/demo/run.md.
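
The container may need a moment before it starts answering requests, so a readiness check can be handy when scripting the steps above. The following is a minimal sketch and not part of this repository: `wait_for_demo` is a hypothetical helper that polls the demo URL using only the Python standard library, and the default timeout and polling interval are arbitrary assumptions.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_demo(url="http://localhost:3000/", timeout=30.0, interval=0.5):
    """Poll `url` until it answers over HTTP or `timeout` seconds elapse.

    Hypothetical helper (not part of this repository). Returns True as soon
    as any HTTP response arrives, False if nothing responds in time.
    """
    # The demo runs locally, so bypass any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=interval + 1.0):
                return True
        except urllib.error.HTTPError:
            # The server answered, just not with a success status: it is up.
            return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Connection refused, reset, or timed out: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, calling `wait_for_demo(timeout=60)` after `docker run` returns `True` once the page can be opened in a browser, and `False` if nothing answers within a minute.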
docker pull janjones/awe-gradient
docker run --rm -it -v awe:/storage -p 3000:3000 janjones/awe-gradient bash

Then, run inside the Docker container:

git clone https://github.com/jjonescz/awe .
git clone https://github.com/jjonescz/swde-visual data/swde
python -m awe.training.params
python -m awe.training.train
# Model is trained, now you can run the demo.
cd js
pnpm install
DEBUG=1 pnpm run server

For more details, see
Generated by the live demo.

