base repository: html5lib/html5lib-tests
base: a9f4496
head repository: html5lib/html5lib-tests
compare: b6c4e3f
  • 2 commits
  • 2 files changed
  • 1 contributor

Commits on Aug 24, 2020

  1. Test the (meta) prescan algorithm

    This change adds a `preparsed` subdirectory in the `encoding` directory,
    with tests for which the result of the *encoding sniffing algorithm* at
    https://html.spec.whatwg.org/#encoding-sniffing-algorithm is the
    expected result — that is, tests for which the expected result is the
    output of running *only* the encoding sniffing algorithm (of which the
    main sub-algorithm is the so-called “meta prescan”) — without
    also running the tokenization state machine and tree-construction stage.
    
    This change also adds a README file that explicitly documents what the
    expected results for the encoding tests are, based on whether or not
    they’re in the `preparsed` subdirectory.
    
    Without those changes, it’s unclear whether the expected results shown
    in the existing tests are for the output of fully parsing the test data —
    through the tokenization state machine and tree-construction stage — or
    instead just the output of the encoding sniffing algorithm only. And
    without those changes, we also don’t have any tests a system can use for
    testing only the output from the encoding sniffing algorithm.
    
    Fixes #28
    sideshowbarker committed Aug 24, 2020

    Verified

    This commit was signed with the committer’s verified signature.
    1e10bdb
  2. Verified

    This commit was signed with the committer’s verified signature.
    b6c4e3f
Showing with 93 additions and 0 deletions.
  1. +42 −0 encoding/README.md
  2. +51 −0 encoding/preparsed/tests1.dat
42 changes: 42 additions & 0 deletions encoding/README.md
@@ -0,0 +1,42 @@
Encoding Tests
==============

Each file of encoding tests contains any number of tests, separated from
one another by two newlines (LF), with a single newline (LF) before the end
of the file:

[TEST]LF
LF
[TEST]LF
LF
[TEST]LF

...where [TEST] is the format documented below.
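
As a rough illustration of the layout above, the following Python sketch splits the raw bytes of a test file into individual [TEST] chunks. The function name `split_tests` is hypothetical and not part of this repository. The sketch assumes that test data may itself contain blank lines, so it splits only where a blank line is followed by the "#data" marker described in the next section.

```python
from __future__ import annotations


def split_tests(raw: bytes) -> list[bytes]:
    """Split the contents of an encoding test file into [TEST] chunks.

    Hypothetical helper: a sketch of one way to read the layout, not the
    reference implementation.
    """
    # Drop the single newline that precedes the end of the file.
    body = raw[:-1] if raw.endswith(b"\n") else raw
    if not body:
        return []
    # Assumption: a test boundary is a blank line followed by "#data",
    # because the test data itself may legitimately contain blank lines.
    parts = body.split(b"\n\n#data\n")
    # Restore the "#data" marker that the split consumed.
    return [parts[0]] + [b"#data\n" + part for part in parts[1:]]
```

Each returned chunk starts with "#data" and has no trailing newline, so it can be handed directly to whatever code parses a single test.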

Encoding test format
====================

Each test must begin with a string "\#data", followed by a newline (LF).
All subsequent lines until a line that says "\#encoding" are the test data
and must be passed to the system being tested unchanged, except with the
final newline (on the last line) removed.

Then there must be a line that says "\#encoding", followed by a newline
(LF), followed by a string indicating an encoding name, followed by a newline
(LF). The encoding name indicated is the expected character encoding for
the output with the given test data as input.
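
The format above could be read with something like the following Python sketch, which turns one test into a pair of test data and expected encoding name. The function name `parse_test` is hypothetical. The sketch assumes the test is supplied as bytes and that the encoding name is plain ASCII.

```python
from __future__ import annotations


def parse_test(chunk: bytes) -> tuple[bytes, str]:
    """Return (test data, expected encoding name) for a single test.

    Hypothetical helper: a sketch under the assumptions stated above.
    """
    marker = b"#data\n"
    if not chunk.startswith(marker):
        raise ValueError('a test must begin with "#data"')
    rest = chunk[len(marker):]
    if rest.startswith(b"#encoding\n"):
        # No data lines at all: the test data is empty.
        data, tail = b"", rest[len(b"#encoding\n"):]
    else:
        # Splitting on the newline before "#encoding" also removes the
        # final newline of the test data, as the format requires.
        data, sep, tail = rest.partition(b"\n#encoding\n")
        if not sep:
            raise ValueError('a test must contain an "#encoding" line')
    # The encoding name is the first line after "#encoding".
    encoding = tail.split(b"\n", 1)[0].decode("ascii").strip()
    return data, encoding
```

The data is returned as bytes rather than text because the system being tested has to determine the encoding itself; decoding it up front would defeat the purpose of the test.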

For the tests in the `preparsed` subdirectory, the encoding name indicated
is the expected result of running the *encoding sniffing algorithm* at
https://html.spec.whatwg.org/#encoding-sniffing-algorithm with the given
test data as input; that is, it's the expected result of running *only* the
*encoding sniffing algorithm* — without also running the tokenization state
machine and tree-construction stage defined in the spec — and specifically,
for running the *prescan a byte stream to determine its encoding*
https://html.spec.whatwg.org/#prescan-a-byte-stream-to-determine-its-encoding
algorithm on only the first 1024 bytes of the test data.

For all tests outside the subdirectory named `preparsed`, the encoding name
indicated is instead the expected character encoding for the output after
fully parsing the given test data; that is, it's the expected character
encoding for the output after running the tokenization state machine and
tree-construction stage.
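
A test harness might choose between the two kinds of expected result along the lines of the Python sketch below. The names `is_preparsed`, `detect_encoding`, `sniff`, and `parse` are all hypothetical: `sniff` and `parse` stand in for whatever the system being tested provides for running only the encoding sniffing algorithm and for fully parsing the input, respectively. The sketch also assumes test file paths use forward slashes.

```python
from pathlib import PurePosixPath
from typing import Callable

# The prescan is run on only the first 1024 bytes of the test data.
PRESCAN_LIMIT = 1024


def is_preparsed(test_file: str) -> bool:
    """Report whether a test file lives in a `preparsed` subdirectory."""
    return "preparsed" in PurePosixPath(test_file).parts


def detect_encoding(
    test_file: str,
    data: bytes,
    sniff: Callable[[bytes], str],  # hypothetical: sniffing algorithm only
    parse: Callable[[bytes], str],  # hypothetical: full parse
) -> str:
    """Produce the encoding name to compare against the expected one."""
    if is_preparsed(test_file):
        # Only the encoding sniffing algorithm, on the first 1024 bytes.
        return sniff(data[:PRESCAN_LIMIT])
    # Tokenization and tree construction included.
    return parse(data)
```

Passing the two callables in as parameters keeps the sketch independent of any particular parser, so the same harness could be pointed at different implementations.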