I propose that a checksum field be added to the text table that reflects the contents of the "old_text" column.
Reasoning:
- The checksum could be checked at each insertion into the text table to avoid saving the same text more than once. This could reduce the disk space needed to store a given number of revisions.
- Exposing this checksum field via the API would let developers know whether they need to request the text of a revision at all. They may already have a copy of it from a previous request and can simply reuse that.
- The checksum field would provide a quick mechanism for detecting identity reverts. By identity revert, I mean a reversion of a page to an *exact* previous state such that both revisions would contain text of the same checksum.
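
To make the three points above concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions, not a proposed implementation: SHA-1 is an arbitrary choice of hash, and the names text_checksum, TextStore, get_text and find_identity_reverts are hypothetical and do not exist in MediaWiki. The in-memory store stands in for the text table.

```python
import hashlib


def text_checksum(text: str) -> str:
    # Assumption: SHA-1 over the UTF-8 bytes; any stable hash would do.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


class TextStore:
    """Hypothetical in-memory stand-in for the text table."""

    def __init__(self):
        self._rows = {}          # text id -> (checksum, text)
        self._by_checksum = {}   # checksum -> list of text ids
        self._next_id = 1

    def insert(self, text: str) -> int:
        """Store text, reusing an existing row when the same text is already saved."""
        checksum = text_checksum(text)
        for text_id in self._by_checksum.get(checksum, []):
            # Compare the full text as well, so a hash collision
            # can never cause two different texts to share a row.
            if self._rows[text_id][1] == text:
                return text_id
        text_id = self._next_id
        self._next_id += 1
        self._rows[text_id] = (checksum, text)
        self._by_checksum.setdefault(checksum, []).append(text_id)
        return text_id


def get_text(checksum, cache, fetch):
    """Client side: call fetch() only if no text with this checksum is cached."""
    if checksum not in cache:
        cache[checksum] = fetch()
    return cache[checksum]


def find_identity_reverts(checksums):
    """Given revision checksums in chronological order, return
    (index, earlier_index) pairs where a revision's checksum matches
    the first earlier revision with the same checksum."""
    first_seen = {}
    reverts = []
    for index, checksum in enumerate(checksums):
        if checksum in first_seen:
            reverts.append((index, first_seen[checksum]))
        else:
            first_seen[checksum] = index
    return reverts
```

A matching checksum only strongly suggests identical text. The insert path therefore compares the full text before reusing a row, while the revert detection is a cheap first pass that a caller could confirm by comparing text if exactness matters.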
Disclaimer:
I develop user scripts for the English Wikipedia and would find a wide range of uses for this feature. On Wikipedia, identity reverts are very common. They are one of the basic operations that Wikipedians become familiar with. I am not sure whether this applies to MediaWiki installations in general, but my intuition says that it is likely.
Version: unspecified
Severity: enhancement