Wikidata:Property proposal/document keywords

‎document keywords

Originally proposed at Wikidata:Property proposal/Creative work

   Not done
Description: Keywords associated with the text, found in <meta name="keywords"> of an HTML document.
Represents: subject heading (Q1128340)
Data type: String
Domain: scholarly article (Q13442814), news article (Q5707594), website (Q35127)
Allowed units: Comma-separated values
Example 1: Trust and nuanced profile similarity in online social networks (Q61963594) → Social networks, recommender systems, trust
Example 2: Why The Future Doesn't Need Us (Q2074626) → future of work, future, opinion, wired classic, longreads, magazine-8.04
Example 3: Wadoku (Q20850709) → official website (P856): https://www.wadoku.de/ [document keywords]: Japanisch-Deutsches Wörterbuch, Japanese-German dictionary, japanisch, deutsch, japanisch-deutsch, wadoku.de, wadoku, WaDokuJT, 和独, 和独辞典, 翻訳, 辞典, Wörterbuch, Übersetzung, 日本語, ドイツ語
Expected completeness: always incomplete (Q21873886)
Robot and gadget jobs: A bot could add this as a qualifier to URL statements
See also: author name string (P2093)
Single-value constraint: no
Distinct-values constraint: no

Motivation

Various websites, articles, and scientific papers come with comma-separated tags that represent the content of a page, often marked up with the HTML tag <meta name="keywords" content="keywords, go, here, …"/>. These can actually be helpful for finding things with the good old search function (of this very website), as it finds strings in every entity. They might also be helpful in data mining. Not unlike author name string (P2093), values should ideally be converted to main subject (P921), an alias, or something similar.
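To illustrate the kind of extraction meant here, the following is a minimal sketch in Python using only the standard library. The names `KeywordMetaParser` and `extract_keywords` are hypothetical and not part of any existing tool; the sketch assumes the keywords are comma-separated and splits them into individual strings, dropping empty entries and case-insensitive duplicates.

```python
from html.parser import HTMLParser


class KeywordMetaParser(HTMLParser):
    """Collects the content attribute of every <meta name="keywords"> tag."""

    def __init__(self):
        super().__init__()
        self.raw_values = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "keywords":
            self.raw_values.append(attr_map.get("content", ""))


def extract_keywords(html_text):
    """Return the individual keywords found in html_text, in document order."""
    parser = KeywordMetaParser()
    parser.feed(html_text)
    parser.close()

    keywords = []
    seen = set()
    for raw in parser.raw_values:
        for part in raw.split(","):
            keyword = part.strip()
            # Skip empty entries and case-insensitive duplicates.
            if keyword and keyword.casefold() not in seen:
                seen.add(keyword.casefold())
                keywords.append(keyword)
    return keywords


if __name__ == "__main__":
    page = '<head><meta name="keywords" content="Social networks, recommender systems, trust"/></head>'
    print(extract_keywords(page))
```

Whether the result is stored as one comma-separated string or as separate values is a modelling question; the sketch only shows that splitting is cheap on the extraction side.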

I'll make sure Wikidata for Web (Q99894727) will be able to semi-automatically extract keywords from websites. –Shisma (talk) 19:26, 4 March 2024 (UTC)

Discussion

Comment: A couple of thoughts on this:
* A comma-separated list is rather unstructured (and there's a length limit on values in Wikidata) - I think it would be better to have each string (separated by commas) entered as a separate value.
* You could use main subject (P921) with <unknown value> and put the string value in an object named as (P1932) qualifier.
ArthurPSmith (talk) 18:54, 5 March 2024 (UTC)
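As a rough illustration of the second suggestion, the sketch below builds one statement per keyword, with main subject (P921) set to unknown value and the keyword string carried by a P1932 qualifier. The function name `keyword_statements` is hypothetical, and the dictionary layout is an assumption based on the general shape of Wikibase statement JSON (`snaktype` of `somevalue` for unknown value); it should be checked against the Wikibase documentation before use.

```python
def keyword_statements(keywords):
    """Build one statement-like dict per keyword (assumed Wikibase JSON shape)."""
    statements = []
    for keyword in keywords:
        statements.append({
            "type": "statement",
            "rank": "normal",
            # "somevalue" is how unknown value is expressed; no datavalue is given.
            "mainsnak": {"snaktype": "somevalue", "property": "P921"},
            "qualifiers": {
                "P1932": [{
                    "snaktype": "value",
                    "property": "P1932",
                    "datavalue": {"value": keyword, "type": "string"},
                }]
            },
        })
    return statements


if __name__ == "__main__":
    for statement in keyword_statements(["Social networks", "recommender systems", "trust"]):
        print(statement["qualifiers"]["P1932"][0]["datavalue"]["value"])
```

Each keyword becomes its own statement, which sidesteps the length limit on a single string value and leaves room to replace the unknown value with an item later.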
Conditional support, if they can be separate values. I like it as either a main statement or a qualifier, no preference. If it is a main statement, it would be nice to qualify each value with the item for its concept. -wd-Ryan (Talk/Edits) 03:33, 6 March 2024 (UTC)
I see your points. author name string (P2093) has the same issues (and solutions), doesn't it? How about I change the title to document keyword(s) and say it can be used as a claim or as a qualifier for URL properties? – Shisma (talk) 08:10, 6 March 2024 (UTC)
Tend to oppose, as this is not structured and not clearly resolvable to structured information; author name string (P2093) has the clear implication that it should in the future be resolvable to author (P50), and (it is stated that) even something like scope and content (P7535) has a particular structure to its values. Arthur's suggestion to use 'unknown' main subjects with P1932 qualifiers is much better. Mahir256 (talk) 16:38, 6 March 2024 (UTC)