Identifiers are currently mixed with other statements. This makes it harder to scan the statement section and find relevant information. We should treat them better by giving them their own section in the UI and moving linking of the identifiers into Wikibase proper instead of a gadget.
Description
Status | Assigned | Task
---|---|---
Invalid | Lydia_Pintscher | T93766 Provide Authority Control within an extension functionality instead of a gadget
Open | None | T87316 [Story] Redesign statement section
Duplicate | None | T107407 [Story] Parser function should link string values when formatter is available
Resolved | None | T95287 [Epic] better handling of identifiers in the UI
Resolved | Bene | T95684 [Story] Format identifiers as Links
Resolved | hoo | T95682 [Task] Add a new datatype for identifiers
Resolved | hoo | T95686 [Task] write a maintenance script to migrate properties from string to new identifier datatype
Resolved | Lydia_Pintscher | T117421 [Story] Put identifiers into separate section in the UI
Resolved | daniel | T125647 [Task] Summarize consequences for users related to identifier datatype deployment
Event Timeline
We discussed this some more today and decided to go with a new datatype. The UI will separate identifiers out based on this datatype. This is the cleanest solution.
Some things we also discussed and need to keep in mind: We want to use the formatter URL statement on the property to construct a link for the identifier value in the UI. This might or might not be the same one we use for the expanded export in JSON. We will likely need a second formatter URI property for this use case.
(Picture of discussion notes attached for archiving purpose)
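To illustrate the link construction discussed above, here is a minimal sketch. It assumes the formatter URL contains a `$1` placeholder that is replaced by the stored identifier value; the function name `identifier_link` and the `example.org` URL are hypothetical, and percent-encoding the value is an assumption that may not suit every identifier scheme.

```python
from urllib.parse import quote


def identifier_link(formatter_url: str, value: str) -> str:
    """Build a link by substituting the identifier value for the $1 placeholder.

    Assumption: the formatter URL marks the insertion point with "$1".
    """
    # Percent-encode the value so spaces or slashes cannot break the URL.
    # (Assumption: some identifier schemes may need slashes left unencoded.)
    return formatter_url.replace("$1", quote(value, safe=""))


# Hypothetical formatter URL, for illustration only.
print(identifier_link("https://example.org/id/$1", "114095"))
```

The stored value has to be exactly the string the target site expects, which is why the formatter and the stored value need to be agreed on together.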
You must be able to migrate the existing properties without changing their IDs.
Otherwise the whole thing will BRRREEEEAAAAK seriously.
Yes. In some rare cases we can change the datatype. This is one of them. That's what the maintenance script is for.
At least pywikibot assumes that the data type will never change for a property (https://github.com/wikimedia/pywikibot-core/blob/669ea31a4c7849c576e5fb5b67b6bb5d6c6bc4e9/pywikibot/site.py#L5581); I'm not sure if any other client libraries do as well. Is that a bad assumption to make?
For the value type it is a relatively safe assumption to make; in this case the value type is string and stays string. For the data type it is a safe assumption to make most of the time; in this case the data type is changing from string to identifier. There will only be very few cases like this one where we'll do this. Another one I could imagine is later adding a formula datatype, in case Wikidata is using string for that before we introduce it and wants to keep the property IDs again. Other than that I can't think of any right now.
Another potential case of changing away from string is later adding a query datatype.
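The distinction between value type and data type can be sketched from a client's point of view. The snippet below is only an illustration: it assumes a snak serialized as a dictionary with a `datatype` field and a `datavalue` holding `type` and `value`, and it uses `"identifier"` as a placeholder name for the new data type. The property ID `P1` and the function `parse_value` are hypothetical.

```python
def parse_value(snak: dict) -> str:
    """Decode a snak by its value type, ignoring the property's data type."""
    datavalue = snak["datavalue"]
    if datavalue["type"] == "string":
        return datavalue["value"]
    raise ValueError(f"unsupported value type: {datavalue['type']}")


# The same statement before and after the migration: only "datatype" differs.
before = {
    "property": "P1",  # hypothetical property ID
    "datatype": "string",
    "datavalue": {"type": "string", "value": "114095"},
}
after = dict(before, datatype="identifier")  # placeholder data type name

# A client that dispatches on the value type is unaffected by the change.
assert parse_value(before) == parse_value(after) == "114095"
```

A client that caches or dispatches on `datatype` instead would need to treat the new type like string for parsing purposes.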
Personally I have always thought that identifiers like this should probably be sitelinks.
I also once had some discussions about calling sitelinks identifiers.
This makes the data model less wikipedia specific while doing what is discussed above.
The alternative could be to get rid of sitelinks, represent both sitelinks and identifiers as claims, and implement either option 2 or 3 for them!
Are references actually used on identifiers? I think the identifier itself is its reference.
The same question applies to qualifiers. Are they actually used/useful for identifiers?
@Bene that is what I was initially thinking, but I guess there could also be other references.
Next steps, as discussed in today's story time:
- Create the new data type. It's possible to create new entities with the new data type.
- Write maintenance script.
- Create list of candidate properties that should probably be migrated (e.g. select all that either have a relevant instanceOf and/or one of the URL/URI statements).
- Ask the community if this list is ok. Point to the list of all string properties, we may have missed some.
- Do the data type conversion. Announce! This is a breaking change.
Deploy.
- Adapt JSON export.
- Make the formatter(s) create clickable links for identifier values.
Deploy.
- JS UI will break in interesting ways (does it already support multiple statements sections?).
- Make PHP view aware of the new type, show identifiers in a new section below the other statements (but reuse most, if not all of the formatting code).
- Do we need to change the suggester(s)? What happens if you add an identifier property in the wrong section? It will show up there and "jump" to the other after reload. Decision: Acceptable for now.
- Make the Wikibase-Quality-Constraints extension(s) aware of the new type. Can possibly be skipped for now because the regex checks are disabled anyway.
- Adapt RDF export. This is a breaking change in the simple mapping, but probably not in the full mapping.
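The candidate selection and conversion steps in the list above can be sketched as follows. This is not the actual maintenance script: `PropertyInfo`, `migration_candidates`, `migrate`, the property IDs, and the data type name `"identifier"` are all hypothetical, and `URL_STATEMENT_PIDS` is an assumed stand-in for "one of the URL/URI statements".

```python
from dataclasses import dataclass, field


@dataclass
class PropertyInfo:
    """Simplified, hypothetical view of a property for migration planning."""
    pid: str
    datatype: str
    statement_pids: set = field(default_factory=set)  # properties used on it


# Assumption: IDs of the formatter URL/URI properties; illustrative only.
URL_STATEMENT_PIDS = {"P1630"}


def migration_candidates(props):
    """Select string properties that carry at least one URL/URI statement."""
    return [
        p.pid
        for p in props
        if p.datatype == "string" and p.statement_pids & URL_STATEMENT_PIDS
    ]


def migrate(props, approved, new_type="identifier"):
    """Change the data type in place; property IDs are never touched."""
    for p in props:
        if p.pid in approved and p.datatype == "string":
            p.datatype = new_type


props = [
    PropertyInfo("P10", "string", {"P1630"}),
    PropertyInfo("P11", "string"),
    PropertyInfo("P12", "url", {"P1630"}),
]
candidates = migration_candidates(props)  # list to show the community
migrate(props, approved=set(candidates))
print([(p.pid, p.datatype) for p in props])
```

Keeping selection and conversion separate leaves room for the community review step between them.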
Are catalog codes (P528) also identifiers in this scheme? Currently, it’s unclear whether catalog code statements (with the catalog itself as qualifier) should include the catalog prefix or not (e. g. “BWV 565” vs just “565”), so I’d be very happy if this new datatype could solve that problem as well.
In my opinion catalog codes are identifiers as well. I don't think the datatype will solve the prefix discussion for you though. You have to decide one way or another and stick to it.
Well, if we also want to convert catalog codes to URLs in the UI, the stored value needs to be exactly the string that can be inserted to construct the URL.
Sorry if that is just late trolling, but I don't see any reason why we should need references or qualifiers on URL-generating identifiers, and so I do not see any reason why we should not use sitelinks. Sitelinks were made precisely for those one-to-one relations. The only reason I see against sitelinks is that sometimes the linked website has dupes in its identifiers, but I don't think it is a really convincing argument.
> The only reason I see against sitelinks is that sometimes the linked website has dupes in its identifiers, but I don't think it is a really convincing argument.
In light of a recent discussion on the mailing lists I think it is actually very important to be able to keep duplicate identifiers here.
There might be ways to customize the number of sitelinks we accept for each site? But actually I just realized that, though adding references to statements about identifiers seems rather pointless to me, using identifier properties as references in other statements would be useful.
Another thing I meant to mention: it would be nice to have an option for making the URL language-dependent, like:
English: http://www.emporis.com/buildings/114095 but German: http://www.emporis.de/buildings/114095
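One way to picture the language-dependent URL request is a per-language table of formatter URLs with a fallback. This is only a sketch of the idea, not a proposed data model: the `$1` placeholder, the mapping `EMPORIS_FORMATTERS`, the function `localized_link`, and the choice of English as the fallback are all assumptions.

```python
# Hypothetical per-language formatter URLs, based on the Emporis example.
EMPORIS_FORMATTERS = {
    "en": "http://www.emporis.com/buildings/$1",
    "de": "http://www.emporis.de/buildings/$1",
}


def localized_link(formatters: dict, language: str, value: str,
                   fallback: str = "en") -> str:
    """Pick the formatter for the user's language, else the fallback one."""
    formatter_url = formatters.get(language, formatters[fallback])
    return formatter_url.replace("$1", value)


print(localized_link(EMPORIS_FORMATTERS, "de", "114095"))
print(localized_link(EMPORIS_FORMATTERS, "fr", "114095"))  # uses the fallback
```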