Wikidata:Property proposal/document keywords

document keywords

Originally proposed at Wikidata:Property proposal/Creative work

Not done

Description	Keywords associated with the text found in `<meta name="keywords">` of an HTML document.
Represents	subject heading (Q1128340)
Data type	String
Domain	scholarly article (Q13442814), news article (Q5707594), website (Q35127)
Allowed units	Comma-separated values
Example 1	Trust and nuanced profile similarity in online social networks (Q61963594)→`Social networks, recommender systems, trust`
Example 2	Why The Future Doesn't Need Us (Q2074626)→`future of work, future, opinion, wired classic, longreads, magazine-8.04`
Example 3	Wadoku (Q20850709)→official website (P856)→`https://www.wadoku.de/`→[document keywords]→`Japanisch-Deutsches Wörterbuch, Japanese-German dictionary, japanisch, deutsch, japanisch-deutsch, wadoku.de, wadoku, WaDokuJT, 和独, 和独辞典, 翻訳, 辞典, Wörterbuch, Übersetzung, 日本語, ドイツ語`
Expected completeness	always incomplete (Q21873886)
Robot and gadget jobs	A bot could add this as a qualifier to URL statements
See also	author name string (P2093)
Single-value constraint	no
Distinct-values constraint	no

Motivation

Various websites, articles and scientific papers come with comma-separated tags that represent the content of a page, often marked up with the HTML tag `<meta name="keywords" content="keywords, go, here, …"/>`. These can actually be helpful for finding things with the good old search function (of this very website), as it finds strings in every entity. They might also be helpful in data mining. Not unlike author name string (P2093), values should ideally be converted to main subject (P921), an alias or something similar.

I'll make sure Wikidata for Web (Q99894727) will be able to semi-automatically extract keywords from websites. –Shisma (talk) 19:26, 4 March 2024 (UTC)
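For illustration, here is a minimal Python sketch of how the raw `content` string could be read from a page's `<meta name="keywords">` tag using only the standard library. The names `MetaKeywordsParser` and `extract_keyword_strings` are hypothetical and are not part of Wikidata for Web (Q99894727) or any existing tool; the sample HTML reuses the keywords from Example 1.

```python
from html.parser import HTMLParser


class MetaKeywordsParser(HTMLParser):
    """Collects the content attribute of every <meta name="keywords"> tag."""

    def __init__(self):
        super().__init__()
        self.raw_values = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # tags such as <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name == "keywords" and content:
            self.raw_values.append(content)


def extract_keyword_strings(html):
    """Return the raw, unsplit keyword strings found in the given HTML."""
    parser = MetaKeywordsParser()
    parser.feed(html)
    parser.close()
    return parser.raw_values


if __name__ == "__main__":
    # Hypothetical sample page, using the keywords from Example 1.
    sample = (
        '<html><head><meta name="keywords" '
        'content="Social networks, recommender systems, trust"/>'
        "</head><body></body></html>"
    )
    print(extract_keyword_strings(sample))
    # prints ['Social networks, recommender systems, trust']
```

The sketch returns the strings unchanged, so any splitting or clean-up is left to the caller.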

Discussion

Comment a couple of thoughts on this:

* a comma-separated list is rather unstructured (and there's a length limit on values in Wikidata) - I think it would be better to have each string (separated by commas) entered as separate values.

* You could use main subject (P921) with <unknown value> and the string as the value of an object named as (P1932) qualifier.

ArthurPSmith (talk) 18:54, 5 March 2024 (UTC)
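The two suggestions above can be sketched together: split the comma-separated string into individual keywords, then wrap each one in a main subject (P921) statement with an unknown value and an object named as (P1932) qualifier. The following Python sketch assumes the general shape of Wikibase statement JSON, where `somevalue` is the snak type for an unknown value. The function names are hypothetical, and the output has not been checked against the live API.

```python
def split_keywords(raw):
    """Split a comma-separated string into trimmed, de-duplicated keywords."""
    keywords = []
    for part in raw.split(","):
        keyword = part.strip()
        if keyword and keyword not in keywords:
            keywords.append(keyword)
    return keywords


def keyword_statement(keyword):
    """Build one statement: main subject (P921) = unknown value,
    qualified with object named as (P1932) = keyword.

    Assumption: the dictionary layout mirrors Wikibase statement JSON.
    """
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {"snaktype": "somevalue", "property": "P921"},
        "qualifiers": {
            "P1932": [
                {
                    "snaktype": "value",
                    "property": "P1932",
                    "datavalue": {"value": keyword, "type": "string"},
                }
            ]
        },
    }


def statements_from_keywords(raw):
    """Return one statement per keyword in the comma-separated string."""
    return [keyword_statement(keyword) for keyword in split_keywords(raw)]


if __name__ == "__main__":
    # Keywords taken from Example 1 of the proposal.
    for statement in statements_from_keywords(
        "Social networks, recommender systems, trust"
    ):
        print(statement["qualifiers"]["P1932"][0]["datavalue"]["value"])
```

Keeping one keyword per statement sidesteps the length limit mentioned above and lets each unknown value be replaced later by a proper item.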

Conditional support if they can be separate values. I like it as either main statement or qualifier, no preference. If it is a main statement, it would be nice to qualify them with the item for their concept. -wd-Ryan (Talk/Edits) 03:33, 6 March 2024 (UTC)

I see your points. author name string (P2093) has the same issues (and solutions), doesn't it? How about I change the title to document keyword(s) and say it can be used as a claim or qualifier for URL properties? – Shisma (talk) 08:10, 6 March 2024 (UTC)

Tend to Oppose as not structured and not clearly resolvable to structured information; author name string (P2093) has the clear implication that it should in the future be resolvable to author (P50), and (it is stated that even something like) scope and content (P7535) has a particular structure to its values. Arthur's suggestion to use 'unknown' main subjects with P1932 qualifiers is much better. Mahir256 (talk) 16:38, 6 March 2024 (UTC)

Oppose - unstructured. Lacks a language attribute. Use "main topic". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:08, 12 March 2024 (UTC)
Oppose per Andy Mabbett. Mfchris84 (talk) 20:36, 18 March 2024 (UTC)
Oppose for the reasons given above, plus: often contains misguided SEO attempts Jneubert (talk) 20:12, 27 March 2024 (UTC)

Not done. Lack of support; opposing comments were not addressed. Regards Kirilloparma (talk) 04:00, 28 March 2024 (UTC)