This repository contains benchmarking tools of various tokenizers.
To perform benchmarking, you have to run the following two steps.
The following commands prepare resources (e.g. model data) and compile source codes.
% git submodule update --init
% ./download_resources.sh
% ./compile_all.sh
To measure the speed of each tokenizer, run the following commands.
If you stop ./run_all.sh
in the middle, ./stats.py
will calculate statistics from avaiable results.
% ./run_all.sh | tee ./results
% ./stats.py < ./results
This software is developed by LegalForce, Inc., but not an officially supported LegalForce product.
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
For softwares under thirdparty
, follow the license terms of each software.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.