Wikidata:Property proposal/changeset
changeset
Originally proposed at Wikidata:Property proposal/Computing
Description | should be used for references to version control changesets (where the source code repository that contains them is stated with P1324 for the subject of the statement) |
---|---|
Represents | changeset (Q5071951) |
Data type | String |
Example 1 | Leaflet (Q13685322) → inception (P571) → 1 September 2010, with reference changeset → eb5b7d706b9f439863172c522add77a0193dbda8 |
Example 2 | Rust (Q575650) → software quality assurance (P2992) → continuous integration (Q965769), with reference changeset → 9beb8f54774ca0d41dd2eb7622809f4073676757 |
Example 3 | Bootstrap (Q893195) → build system (P11197) → Rollup (Q114900810), with reference changeset → 9936bf59444c402b653f28449529eab83794e911 |
Example 4 | Linux kernel (Q14579) → mascot (P822) → Tuz (Q38806), with reference changeset → 8032b526d1a3bd91ad633dd3a3b5fdbc47ad54f1 |
Example 5 | SPARQL query for more examples |
See also | reference URL (P854), checksum (P4092) |
Motivation
Statements about software often cite a particular changeset (Q5071951) via its identifying hash. Currently this is done using reference URL (P854),
e.g. reference URL (P854) → https://github.com/Leaflet/Leaflet/commit/eb5b7d706b9f439863172c522add77a0193dbda8
I suggest the introduction of a new "changeset" property so that we could instead state:
- changeset → eb5b7d706b9f439863172c522add77a0193dbda8
Which has the following advantages:
- the repository hosting service (Q115203600) can change without the references becoming out of date
- it allows commit references to be queried efficiently via SPARQL (instead of having to FILTER through all reference URL (P854) references, which is slow; see the query in Example 5, which searches for commit references this way)
- it allows commit references to be reliably recognized as such even when a self-hosted Git web interface is used, e.g. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8032b526d1a3bd91ad633dd3a3b5fdbc47ad54f1
So that these commit hashes can still be resolved to URLs, this proposal also introduces a new "commit formatter URL suffix" property. For example:
- changeset → eb5b7d706b9f439863172c522add77a0193dbda8 for some statement about Leaflet (Q13685322)
- Leaflet (Q13685322) → source code repository URL (P1324) → https://github.com/Leaflet/Leaflet, qualified with web interface software (P10627) → GitHub (Q364)
- GitHub (Q364) → commit formatter URL suffix → /commit/$1
With these three statements, a data consumer could resolve "eb5b7d706b9f439863172c522add77a0193dbda8" by concatenating "https://github.com/Leaflet/Leaflet" and "/commit/$1" and substituting the hash for $1 → https://github.com/Leaflet/Leaflet/commit/eb5b7d706b9f439863172c522add77a0193dbda8.
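The resolution step just described can be sketched in a few lines of JavaScript (the language of the user script discussed further down). This is a minimal illustration, not the user script's code: the function name resolveChangesetUrl is hypothetical, and stripping a trailing slash from the repository URL is an assumption about how P1324 values may be written.

```javascript
// Hypothetical helper: build a changeset URL from a repository URL (P1324)
// and the "commit formatter URL suffix" of its web interface software.
function resolveChangesetUrl(repoUrl, suffix, changeset) {
  // Assumption: strip trailing slashes so that ".../Leaflet/" + "/commit/$1"
  // does not produce a double slash.
  const base = repoUrl.replace(/\/+$/, "");
  // Replace every "$1" placeholder with the changeset identifier.
  // split/join avoids the special "$" patterns of String.prototype.replace.
  return base + suffix.split("$1").join(changeset);
}

const url = resolveChangesetUrl(
  "https://github.com/Leaflet/Leaflet",
  "/commit/$1",
  "eb5b7d706b9f439863172c522add77a0193dbda8"
);
console.log(url);
// https://github.com/Leaflet/Leaflet/commit/eb5b7d706b9f439863172c522add77a0193dbda8
```

The same function works unchanged for suffixes that add text after the identifier, such as stagit's "/commit/$1.html".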
This works well for the majority of software, which has one primary source code repository. For software that has several source code repository URL (P1324) statements (e.g. the source code of CodeMirror (Q114901858) is split across eight repositories), data consumers can instead link the commit via the search formatter URL (P4354) of the web interface software (P10627), e.g. https://github.com/search?q=eb5b7d706b9f439863172c522add77a0193dbda8 (clicking the "commits" menu item there yields the same commit).
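The fallback for software with several repositories can be sketched the same way. This assumes a simplified, hypothetical input shape ({ url, rank, commitSuffix } per P1324 statement) rather than the real Wikibase JSON, and uses Wikidata's "best rank" rule (preferred statements if any exist, otherwise normal ones) to decide whether there is a single repository. "https://github.com/search?q=$1" is assumed to stand in for GitHub's search formatter URL (P4354), and https://example.org/mirror/Leaflet is a made-up second repository.

```javascript
// Hypothetical input: one object per source code repository URL (P1324)
// statement, with its rank and the commit suffix of its web interface.
function linkForChangeset(repos, searchFormatter, changeset) {
  // Best rank: preferred statements win; otherwise use the normal ones.
  // Deprecated statements are never considered.
  const preferred = repos.filter((r) => r.rank === "preferred");
  const best = preferred.length > 0
    ? preferred
    : repos.filter((r) => r.rank === "normal");

  // Exactly one candidate repository with a known suffix: link the commit.
  if (best.length === 1 && best[0].commitSuffix) {
    const base = best[0].url.replace(/\/+$/, "");
    return base + best[0].commitSuffix.split("$1").join(changeset);
  }
  // Several repositories (or no suffix known): fall back to the search URL.
  return searchFormatter.split("$1").join(encodeURIComponent(changeset));
}

const hash = "eb5b7d706b9f439863172c522add77a0193dbda8";
const search = "https://github.com/search?q=$1"; // assumed P4354 value
const repos = [
  { url: "https://github.com/Leaflet/Leaflet", rank: "preferred", commitSuffix: "/commit/$1" },
  { url: "https://example.org/mirror/Leaflet", rank: "normal", commitSuffix: "/commit/$1" },
];
console.log(linkForChangeset(repos, search, hash)); // direct commit link

// Demote the preferred repository to normal rank: two candidates remain.
const demoted = repos.map((r) => ({ ...r, rank: "normal" }));
console.log(linkForChangeset(demoted, search, hash)); // search link
```

This matches the behavior later described for the test.wikidata.org demo (changing the rank switches the link to the GitHub search), but it is an independent sketch, not that script's code.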
--Push-f (talk) 12:09, 15 November 2022 (UTC)
Discussion
- Support, mostly for the URL suffix one. -wd-Ryan (Talk/Edits) 17:09, 15 November 2022 (UTC)
- Support. I've found myself using a commit link as a reference URL pretty often; this would indeed be a more appropriate way to model such statements. --Waldyrious (talk) 17:42, 15 November 2022 (UTC)
- WikiProject Informatics has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead. Notified participants of WikiProject Websites. --Push-f (talk) 19:34, 15 November 2022 (UTC)
- Weak oppose. Using reference URL (P854) works well enough. The proposed way to resolve commit hashes to URLs is too complicated and a burden on the implementation and performance of data consumers. Ensuring that references to commits are still clickable in the Wikidata web UI requires support from Wikibase developers. Dexxor (talk) 08:29, 16 November 2022 (UTC)
- I'd say most data consumers don't care about commit references, and those who do are better served by a dedicated property: if you actually want to use the commit hash (e.g. to retrieve the commit diff from the repository), you certainly do not want to be forced to 1) filter through all reference URL (P854) references (which takes so long that WDQS times out) and 2) deal with the smorgasbord of git-web-interface-specific URL formats that are out there.
- But you do make a good point that references no longer being clickable would be a step back. This could, however, quite easily be addressed with a user script; we already have many of those, and they can be enabled with the click of a checkbox in Special:Preferences#mw-prefsection-gadgets. To prove this claim, I will implement such a user script on https://test.wikidata.org, so you can try it out :) --Push-f (talk) 12:29, 16 November 2022 (UTC)
- @Dexxor: I implemented such a userscript; you can try it out on test.wikidata.org in three steps:
- Log in at https://test.wikidata.org/ with your usual credentials.
- Go to https://test.wikidata.org/wiki/Special:MyPage/common.js and add the following line:
mw.loader.load( '//test.wikidata.org/w/index.php?title=User:Push-f/gadgets/linkifyCommitHash.js&action=raw&ctype=text/javascript' ); // [[User:Push-f/gadgets/linkifyCommitHash.js]]
- Go to https://test.wikidata.org/wiki/Q33832 and check out the "commit hash" reference for the inception claim. Editing the reference and adding new references should work as expected. Note that if you change the rank of the second repository to normal, the commit is linked to the GitHub search instead.
- Cheers, --Push-f (talk) 02:00, 17 November 2022 (UTC)
- Changing my !vote to Conditional support because of the user script. Since the proposed "commit formatter URL suffix" property will only be used on a handful of items, this information should be hardcoded in the user script. Dexxor (talk) 08:40, 17 November 2022 (UTC)
- I strongly disagree with your suggestion that this information should be hardcoded in the userscript. For several reasons:
- it would force other data consumers that want to turn the hashes into links to duplicate that information ... which just leads to duplicate effort
- the userscript would have to be updated every time somebody wants to add support for a new git web interface ... which would be problematic: I am planning to get my userscript installed as a MediaWiki gadget, but only wiki admins can edit the MediaWiki:* namespace, and we certainly don't want to bother the admins each time somebody wants to add a new suffix
- the suffix information is also of interest for other tooling e.g. it could be used to write a bot that automatically converts reference URL (P854) properties into "commit hash" properties ... so it only makes sense to have that information in Wikidata
- --Push-f (talk) 10:45, 17 November 2022 (UTC)
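The bot mentioned above would need the inverse of the suffix: matching an existing reference URL (P854) against a repository URL and its suffix to recover the changeset. A rough sketch with a hypothetical function name; the kernel repository URL is assumed to be the cgit URL from the motivation section, and the check that the identifier is 7 to 64 hexadecimal digits assumes Git object IDs (abbreviated, SHA-1 or SHA-256), so it would need adjusting for other version control systems.

```javascript
// Hypothetical helper: recover a changeset identifier from a reference URL,
// given the repository URL and its web interface's suffix template.
// Returns null when the URL does not match the template.
function extractChangeset(referenceUrl, repoUrl, suffix) {
  const base = repoUrl.replace(/\/+$/, "");
  // Text before and after the "$1" placeholder in the suffix.
  const [before, after = ""] = suffix.split("$1");
  const prefix = base + before;
  if (!referenceUrl.startsWith(prefix) || !referenceUrl.endsWith(after)) {
    return null;
  }
  const id = referenceUrl.slice(prefix.length, referenceUrl.length - after.length);
  // Assumption: Git-style hexadecimal IDs (abbreviated, SHA-1 or SHA-256).
  return /^[0-9a-f]{7,64}$/i.test(id) ? id : null;
}

// cgit example (suffix /commit/?id=$1).
console.log(
  extractChangeset(
    "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8032b526d1a3bd91ad633dd3a3b5fdbc47ad54f1",
    "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git",
    "/commit/?id=$1"
  )
); // 8032b526d1a3bd91ad633dd3a3b5fdbc47ad54f1
```

Because the templates come from data rather than code, such a bot would pick up a new web interface as soon as its suffix is added.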
- But Help:Properties states: "When proposing properties, keep in mind that each property should be expected to be used by at least 100 items; if a proposed property cannot be used this many times, it likely should not be added to Wikidata (of course, there are exceptions to this rule)." Besides, storing the formatter URLs in Wikidata would allow anyone to change the web interface software (P10627) to a data item that links to a spam/phishing site. Dexxor (talk) 07:59, 18 November 2022 (UTC)
- Yes, the query pattern ?item wdt:P31/wdt:P279* wd:Q115217949 (instances of repository web interface (Q115217949) or its subclasses) currently yields only 33 results, but that number keeps growing. New repository web interfaces are being developed, as are new distributed revision control systems (Q1186723). Furthermore, every time an instance of repository web interface (Q115217949) is forked, the fork (if notable) is eligible for its own "commit formatter URL suffix" statement as well.
- However, even if it takes a decade for that number to grow to 100, I don't think that should discourage us from creating the property now, because, as you quoted, "there are exceptions to this rule". The proposed "commit hash" property can be used thousands of times: we currently have 14,718 items that have a source code repository URL (P1324) claim, and there are many statements about pieces of software that can be sourced with a commit reference (as showcased by my examples). So if one property with fewer than 50 uses enables us to effectively use another property more than 10,000 times, I think that's a very good tradeoff.
- I agree that vandalism is a problem on Wikidata (though I have not seen any formatter URL vandalism yet); however, introducing a new property really does not change anything about this. You do realize that formatter URL (P1630) and search formatter URL (P4354) are currently used 7,361 and 1,960 times respectively? Introducing a new property does not change our attack surface for vandalism at all ... the attack surface is already nearly all of Wikidata, since nearly all of Wikidata can be edited without even having to create an account.
- --Push-f (talk) 03:39, 19 November 2022 (UTC)
- If you edit a formatter URL (P1630), it takes at least 24 hours for the change to take effect, which is enough time for people to revert it. In addition, formatter URL (P1630) is exclusively used on properties (which are always semi-protected). The proposed "commit formatter URL suffix" would be much more prone to vandalism. If you really don't want to hardcode the suffixes in the script, you could try storing them as Tabular Data on Commons. Dexxor (talk) 09:45, 19 November 2022 (UTC)
- Thanks for elaborating, you make some good points. However the "commit formatter URL suffix" is appended to the value of source code repository URL (P1324), so there is really no way to abuse it to link to another website.
- "https://example.com/someuser/somerepo/" + $X will always link to example.com, no matter what $X is.
- I strongly believe that these suffixes should be part of Wikidata, not only because everything else is inconvenient for data consumers but also because Wikidata should be self-contained: e.g. if you download the database dump from https://dumps.wikimedia.org/ you should also get these suffixes, as opposed to them being stored on some random Commons page.
- --Push-f (talk) 18:34, 19 November 2022 (UTC)
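The host argument above can be checked with the WHATWG URL parser (the global URL class in browsers and Node.js). A sketch with a hypothetical function name; it holds because the repository URL already contains a path, so the authority (host) part has ended before the suffix starts. With a bare host such as "https://example.com" and no trailing slash, a suffix beginning with "@" would turn "example.com" into user info and move the host.

```javascript
// Hypothetical check: does appending `suffix` to `repoUrl` keep the host?
function keepsHost(repoUrl, suffix) {
  try {
    return new URL(repoUrl + suffix).host === new URL(repoUrl).host;
  } catch {
    return false; // unparsable result: treat as unsafe
  }
}

// Repository URL with a path: the suffix can only extend the path.
console.log(keepsHost("https://example.com/someuser/somerepo/", "@evil.example/x")); // true
// Bare host: "example.com" becomes user info and evil.example becomes the host.
console.log(keepsHost("https://example.com", "@evil.example/x")); // false
```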
- Thanks for clarifying, vandalism really isn't that much of a concern then. I still believe that resolving the commit hashes requires too much effort with little benefit over reference URL (P854), but I won't stand in the way of this proposal getting accepted. Dexxor (talk) 09:30, 20 November 2022 (UTC)
- Support Laftp0 (talk) 15:07, 16 November 2022 (UTC)
- I have relabeled the proposed property from "commit hash" to "changeset" because "commit hash" is Git-specific lingo, but the proposed property is also intended for other version control systems, such as Darcs (Q204377) and Pijul (Q63313646), which use different terms for changesets. --Push-f (talk) 00:28, 21 November 2022 (UTC)
- I have withdrawn my proposal of the "changeset formatter URL suffix" property in favor of my new proposal, "serves resource", which is much more powerful and useful. --Push-f (talk) 00:32, 21 November 2022 (UTC)
- @Push-f, Wd-Ryan, Waldyrious, Dexxor, Laftp0: Done --Tinker Bell ★ ♥ 17:26, 6 December 2022 (UTC)
changeset formatter URL suffix
Originally proposed at Wikidata:Property proposal/Generic
Description | string that has to be appended to the URL of a repository to link a specific changeset within the repository ("$1" can be automatically replaced with the changeset identifier) |
---|---|
Data type | String |
Domain | items that are instances of repository web interface (Q115217949) |
Example 1 | GitHub (Q364) → /commit/$1 |
Example 2 | GitLab (Q16639197) → /-/commit/$1 |
Example 3 | cgit (Q28974765) → /commit/?id=$1 |
Example 4 | Gitea (Q28714270) → /commit/$1 |
Example 5 | Gitiles (Q111038621) → /+/$1 |
Example 6 | SourceHut (Q78514485) → /commit/$1 |
Example 7 | Bitbucket (Q2493781) → /commits/$1 |
Example 8 | Gitweb (Q97460957) → /commit/$1 |
Example 9 | hgweb (Q112183167) → /rev/$1 |
Example 10 | Nest (Q115217392) → /changes/$1 |
Example 11 | Pagure (Q111750799) → /c/$1 |
Example 12 | Gogs (Q21091728) → /commit/$1 |
Example 13 | Codeberg (Q106102182) → /commit/$1 |
Example 14 | stagit (Q97456355) → /commit/$1.html |
Example 15 | Heptapod (Q111410232) → /-/commit/$1 |
Example 16 | Fossil (Q1439431) → /info/$1 |
Example 17 | darcsden (Q115267340) → /patch/$1 |
See also | search formatter URL (P4354) |
See the "changeset" proposal above for the motivation and discussion.
- Withdrawn in favor of my new proposal, "serves resource". --Push-f (talk) 00:32, 21 November 2022 (UTC)