Wikidata:Lexicographical data/Ideas of tools
This page lists ideas for tools and features that Wikidata and Wiktionary editors may need on top of the lexicographical data on Wikidata.
If you want to express a specific need, if you have an idea, or if you think that something is essential for editing or reusing lexicographical data, please add a section and follow the template!
This list is not a commitment from the Wikidata development team. The needs expressed below will be analyzed, prioritized, and possibly developed during the next steps of the project by the Wikidata team, other developers, or volunteers.
German articles game
- Need
- I'm learning German and I'm struggling to remember the articles of nouns (der, die, das). I wish I had a game to practice and learn. The game would present me with a noun, and I'd have to guess the article. If I'm wrong, I'd see the correct answer appear. There could be several levels, from easy daily-life words to more unusual ones. There could be thematic levels, like one only about food.
- Who would benefit
- Anyone who struggles with German, so a lot of people :D
- Proposition of solution
- This would be an external tool, for example a website or an app, based on Wikidata's lexicographical data. It would select random Lexemes that are German nouns (example of data structure), display the lemma, and check my answer against the grammatical gender. It could also display a gloss or other information about the noun.
- Further comments
- Accessing the lemma and grammatical gender will be easy with Wikidata's API. Selecting the words by theme would be feasible using the connection between the lexemes (words) and the items (concepts). I'm wondering how the software could sort the words by level and decide which are "easy", "daily life-related" or not.
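As a minimal sketch of what such a game needs from Wikidata, assuming the usual lexeme modelling (Q188 = German, Q1084 = noun, P5185 = grammatical gender); the article mapping and query limit are illustrative choices, not part of the proposal:

```python
# Minimal command-line sketch of a "der/die/das" quiz built on Wikidata
# lexemes. Assumes Q188 (German), Q1084 (noun), P5185 (grammatical gender).
import random
import requests

QUERY = """
SELECT ?lemma ?genderLabel WHERE {
  ?lexeme dct:language wd:Q188 ;
          wikibase:lexicalCategory wd:Q1084 ;
          wikibase:lemma ?lemma ;
          wdt:P5185 ?gender .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 500
"""

ARTICLES = {"masculine": "der", "feminine": "die", "neuter": "das"}

def fetch_nouns():
    r = requests.get("https://query.wikidata.org/sparql",
                     params={"query": QUERY, "format": "json"},
                     headers={"User-Agent": "der-die-das-sketch/0.1"})
    r.raise_for_status()
    rows = r.json()["results"]["bindings"]
    return [(b["lemma"]["value"], ARTICLES[b["genderLabel"]["value"]])
            for b in rows if b["genderLabel"]["value"] in ARTICLES]

def quiz():
    lemma, article = random.choice(fetch_nouns())
    guess = input(f"der, die or das ... {lemma}? ").strip().lower()
    print("Correct!" if guess == article else f"No, it is '{article} {lemma}'.")

if __name__ == "__main__":
    quiz()
```

Thematic levels could restrict the query via the item for this sense (P5137) link mentioned above; sorting by difficulty would need an extra signal (e.g. frequency data) that is not covered by this sketch.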
- Proposer
- Lea Lacroix (WMDE) (talk) 15:21, 28 November 2017 (UTC)
- Discussions
Lacroix's game is now running at http://auregann.fr/derdiedas/ and cloned versions of it are now available for French and Danish. The Danish one is here: https://tools.wmflabs.org/enet/
Spell-checker
- Need
- A tool to help find errors in Wikisource texts by spotting words that don't have an L item
- Who would benefit
- the Wikisources and the creation of L items
- Proposition of solution
- Ideally included in the editing interface of Wikisource, the tool would highlight words that don't exist (and possibly suggest a replacement as a correction?)
- Further comments
- it's a two-way relation: at the beginning, it will be more useful for helping create new L items, but later, once all "common" words have an L item, it will be more useful for Wikisource (still, new books with new words appear every day on the Wikisources)
- it could probably be used in other Wikimedia projects, but it would be more helpful for the Wikisources, where we make faithful transcriptions of old books with a lot of unusual spellings and variations that usual spell-checkers wrongly consider errors.
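A rough sketch of the lookup half of such a checker (the page title, the "fr" language tag and the word regex are illustrative assumptions; a real gadget would run inside the Wikisource editing interface rather than as a script):

```python
# Rough sketch: list words on a Wikisource page for which no Lexeme form with
# that exact spelling exists on Wikidata.
import re
import requests

HEADERS = {"User-Agent": "wikisource-spellcheck-sketch/0.1"}

def page_words(title, site="fr.wikisource.org"):
    r = requests.get(f"https://{site}/w/api.php",
                     params={"action": "parse", "page": title,
                             "prop": "wikitext", "format": "json"},
                     headers=HEADERS)
    text = r.json()["parse"]["wikitext"]["*"]
    return sorted({w.lower() for w in re.findall(r"[^\W\d_]+", text)})

def known_forms(words, lang="fr"):
    values = " ".join(f'"{w}"@{lang}' for w in words if '"' not in w)
    query = ("SELECT ?w WHERE { VALUES ?w { " + values + " } "
             "?form ontolex:representation ?w . }")
    r = requests.get("https://query.wikidata.org/sparql",
                     params={"query": query, "format": "json"}, headers=HEADERS)
    return {b["w"]["value"] for b in r.json()["results"]["bindings"]}

def unknown_words(title, lang="fr", batch=200):
    words = page_words(title)
    missing = []
    for i in range(0, len(words), batch):      # keep each SPARQL query small
        chunk = words[i:i + batch]
        known = known_forms(chunk, lang)
        missing += [w for w in chunk if w not in known]
    return missing
```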
- Discussions
- Support 150% as a wikisource editor/admin Hsarrazin (talk) 12:25, 1 April 2020 (UTC)
Check pop-up
- Need
- A popup in the style of a navigation popup in the edit windows.
- Who would benefit
- Every project
- Proposition of solution
- Check for information like syllabification, translation, correct spelling, etc.
- Further comments
- Discussions
SVG translation
- Need
- Use L item labels to translate SVG files. Take the <switch> element idea used in multilingual SVG files (like File:Wikidata nodes in white.svg) to the next level and make it easier (right now, it's done by hand in a text editor).
- Who would benefit
- From the readers' side: anyone not speaking English (so most people on Earth)
- From the contributors' side: easier and faster translation of files, translations into more languages, etc.
- Proposition of solution
- not sure about the technical part, but SVG is extensible and foreign code can easily be embedded in it (directly in the <switch> element or maybe with the <script> element; someone more tech-savvy should look at the specifications, in particular this section), I guess the problem will be more about how MediaWiki understands this code.
- a downgraded solution would be not to do it dynamically but to export the L item labels inside the current structure of the multilingual SVG files
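As a very rough illustration of the "downgraded" (static) variant, a script could fetch an item's labels and emit a <switch> block with systemLanguage attributes, ready to paste into the SVG. The item ID, languages and coordinates below are placeholders, and XML escaping is skipped for brevity:

```python
# Rough sketch: build an SVG <switch> block whose <text> children carry the
# labels of one Wikidata item in several languages (systemLanguage fallback).
import requests

def item_labels(qid, langs):
    r = requests.get("https://www.wikidata.org/w/api.php",
                     params={"action": "wbgetentities", "ids": qid,
                             "props": "labels", "languages": "|".join(langs),
                             "format": "json"},
                     headers={"User-Agent": "svg-switch-sketch/0.1"})
    labels = r.json()["entities"][qid]["labels"]
    return {lang: data["value"] for lang, data in labels.items()}

def switch_block(qid, langs, x=0, y=0):
    labels = item_labels(qid, langs)
    lines = [f'  <text systemLanguage="{lang}" x="{x}" y="{y}">{label}</text>'
             for lang, label in labels.items() if lang != "en"]
    lines.append(f'  <text x="{x}" y="{y}">{labels.get("en", qid)}</text>')  # fallback
    return "<switch>\n" + "\n".join(lines) + "\n</switch>"

print(switch_block("Q2013", ["en", "fr", "de"]))  # Q2013 = Wikidata
```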
- Further comments
- Proposer
- VIGNERON (talk) 13:54, 18 February 2018 (UTC) (idea suggested on Facebook by Pigsonthewing)
- Discussions
Is it poetry?
- Need
- a tool to tell whether a text is a poem or not (it can help with categorizing pages on Wikisource and adding statements to the Wikidata item)
- Who would benefit
- Proposition of solution
- Take the last word of each line and check whether the pronunciations rhyme. An upgraded version could probably do more analysis (what type of rhyme, what type of poetry, etc.).
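A crude, purely spelling-based sketch of the heuristic; a real tool would compare IPA or pronunciation data from the Lexeme forms rather than the last letters, and the threshold below is arbitrary:

```python
# Crude sketch: guess whether a text is a poem by checking how often the last
# words of consecutive (or alternating) lines end in the same letters.
import re

def last_words(text):
    words = []
    for line in text.splitlines():
        tokens = re.findall(r"[\w']+", line)
        if tokens:
            words.append(tokens[-1].lower())
    return words

def rhymes(a, b, n=3):
    return a[-n:] == b[-n:]                  # naive: identical last n letters

def looks_like_a_poem(text, threshold=0.5):
    words = last_words(text)
    if len(words) < 4:
        return False
    hits = 0
    for i, w in enumerate(words[:-1]):
        if rhymes(w, words[i + 1]) or (i + 2 < len(words) and rhymes(w, words[i + 2])):
            hits += 1
    return hits / (len(words) - 1) >= threshold
```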
- Further comments
- Proposer
- VIGNERON (talk) 12:03, 15 April 2018 (UTC)
- Discussions
Text modernisation
- Need
- modernisation of texts
- Who would benefit
- old books on Wikisources
- Proposition of solution
On the French Wikisource, there is a tool for the modernisation of old texts: a JS gadget s:fr:MediaWiki:Gadget-modernisation.js and a dictionary s:fr:Wikisource:Dictionnaire (this is the general one; a template s:fr:Modèle:Modernisation is used for words to modernize specifically in one text, i.e. rare words or words that would generate false positives in other texts).
- Further comments
I'm not sure how lexicographical data can help. The gadget works well but (AFAIK) exists only on a few projects, and we have trouble from time to time with the dictionary (false positives which need to be moved from the general dictionary to the specific one), as it is only a plain-text "old word: new word" mapping. The lexicographical data could be more precise and at least give a point in time (which would avoid a lot of false positives).
- Proposer
- VIGNERON (talk) 09:34, 23 April 2018 (UTC)
- Discussions
Wikidata Cognate
- Need
For Wiktionary there is mw:Extension:Cognate, which links pages with the same title; however, in Wikidata there will be lexemes with the same title but in different languages (thus on different pages), so it would be good to link them somehow.
- Who would benefit
Anyone who wants to navigate between the 12 languages that have fest as a lexeme.
- Proposition of solution
A drop-down menu on the lexeme page with the different languages that have the same lexeme.
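A hedged sketch of the query such a drop-down would need, reusing the "fest" example above (scanning all lemmas with a FILTER may be slow on the query service, so a production tool would probably use the search API instead):

```python
# Sketch: list all Lexemes, in any language, whose lemma is "fest" -
# roughly what a language drop-down on the Lexeme page would display.
import requests

QUERY = """
SELECT ?lexeme ?lemma ?languageLabel WHERE {
  ?lexeme wikibase:lemma ?lemma ;
          dct:language ?language .
  FILTER(STR(?lemma) = "fest")
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

r = requests.get("https://query.wikidata.org/sparql",
                 params={"query": QUERY, "format": "json"},
                 headers={"User-Agent": "lexeme-cognate-sketch/0.1"})
for b in r.json()["results"]["bindings"]:
    print(b["languageLabel"]["value"], b["lexeme"]["value"])
```

The drop-down would simply list the returned languages and link each one to its Lexeme page.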
- Further comments
- Proposer
--Micru (talk) 08:54, 2 May 2018 (UTC)
- Discussions
Practice a new language
- Need
- A game to learn the basics of a new language could be created.
- Learn new words: five words and their respective translations are shown, and then you have to guess the correct translation of some of the words shown before among 4 choices, or write down the correct translation of the words. Letters may be given. This could be repeated many times;
- Learn some grammar basics: as another proposal is related to practicing a new language, I think everyone should add language-specific games. Even the Italian language has many articles (il, lo/l', la for singular; i, gli, le for plural), so even Italian should have a similar game. Other games could teach how to conjugate verbs and decline words.
You can choose whether to have 3 tries to guess correctly, a time limit, or just earn points for each correct answer.
- Who would benefit
- Everyone who wants to learn a new language and doesn't want to use commercial apps (Duolingo, Memrise, Babel, ...)
- Proposition of solution
- a new website should be created. A very simple version could just start by asking which language you want to learn and what you want to learn (new words, articles, declensions, ...). A more complete version could also implement a login interface where you can see personal progress, personal errors, ...
- Further comments
- This proposal contains "German articles game"
- Proposer
- ★ → Airon 90 12:41, 17 July 2018 (UTC)
- Discussions
Wikify with lexemes
- Need
Take a text, e.g. from Wikisource, and wikify it with lexemes from Wikidata. Maybe start with identifying verbs in each sentence, then other parts. Offer to create missing lexemes.
- Who would benefit
- Proposition of solution
- Try to identify verbs, locutions first
- GUI allows selection of various possible lexemes (based on forms/lemmas)
- basic mode: just highlight strings missing from forms (or just lemmas). Could be a first milestone for use (a sketch of this mode follows the list).
- offer to add samples to Wikidata.
- offer to create missing Lexemes
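A sketch of the basic mode mentioned above: tokenize a sentence, look the tokens up among Lexeme forms, and report tokens with no match. The language tag and example sentence are assumptions:

```python
# Sketch: annotate tokens of a sentence with matching Lexeme IDs; tokens with
# no match are the candidates for "offer to create a Lexeme".
import re
import requests

SPARQL = "https://query.wikidata.org/sparql"

def annotate(sentence, lang="en"):
    tokens = re.findall(r"[\w'-]+", sentence)
    values = " ".join(f'"{t.lower()}"@{lang}' for t in set(tokens) if '"' not in t)
    query = f"""
    SELECT ?w ?lexeme WHERE {{
      VALUES ?w {{ {values} }}
      ?lexeme ontolex:lexicalForm ?form .
      ?form ontolex:representation ?w .
    }}"""
    r = requests.get(SPARQL, params={"query": query, "format": "json"},
                     headers={"User-Agent": "wikify-lexemes-sketch/0.1"})
    matches = {}
    for b in r.json()["results"]["bindings"]:
        matches.setdefault(b["w"]["value"], []).append(b["lexeme"]["value"])
    return [(t, matches.get(t.lower(), [])) for t in tokens]

for token, lexemes in annotate("The cat sat on the mat"):
    print(token, lexemes or "-> missing, offer to create a Lexeme")
```

Identifying verbs or locutions first, and choosing the right form and sense per token, would need additional disambiguation on top of this lookup.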
- Further comments
- it seems to me that text-to-lexemes does it quite efficiently :) --Hsarrazin (talk) 12:29, 1 April 2020 (UTC)
- Proposer
--- Jura 12:17, 5 August 2018 (UTC)
- Discussions
- see Wikidata_talk:Lexicographical_data#Wikifying_with_lexemes?
- somewhat similar to #Spell-checker above, but a more permanent tool.
- every word in the text should be annotated with the form and the sense, not just the lexeme – The preceding unsigned comment was added by Denny (talk • contribs).
- see also https://annotation.wmcloud.org/ for a similar idea --DVrandecic (WMF) (talk) 22:09, 10 February 2021 (UTC)
- More info on this at m:Abstract Wikipedia/Updates/2021-02-10#Annotation interface. Quiddity (WMF) (talk) 20:53, 11 February 2021 (UTC)
- @Quiddity (WMF): @DVrandecic (WMF): Where can this tool be found today? Asaf Bartov (talk) 16:46, 26 September 2022 (UTC)
- Unfortunately, currently dead (it had to be switched off due to spam). I would love to set it up somewhere again, if someone helps (with spam, and keeping the wiki up). I would be happy to maintain the software. -- DVrandecic (WMF) (talk) 18:31, 26 September 2022 (UTC)
Print version
- Need
Generate a printable set of dictionaries solely from Wikidata entities. This could include:
- a monolingual dictionary
- a bilingual dictionary
- a specialized dictionary or word list.
- Who would benefit
- Wikidata contributors (helps visualize possible output, needed data, comparison with old dictionaries)
- users learning languages or terminology
- Proposition of solution
The set should draw from structured data in Wikidata to output all parts generally found in dictionaries: introduction, methodology, primary entries, indexes.
Elements that can be included beyond primary entries are:
- []
To avoid wasting paper, the proof of concept version could be limited to a defined number of entries, e.g. 100 words. It should also work with a higher number.
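A minimal proof of concept along those lines, assuming Danish (Q9035) and English glosses; it only prints a flat word list of 100 entries, not the surrounding parts of a dictionary:

```python
# Sketch: dump 100 Danish lexemes with lemma, lexical category and glosses
# as a plain-text word list.
import requests

QUERY = """
SELECT ?lemma ?categoryLabel (GROUP_CONCAT(?gloss; SEPARATOR="; ") AS ?glosses) WHERE {
  ?lexeme dct:language wd:Q9035 ;
          wikibase:lemma ?lemma ;
          wikibase:lexicalCategory ?category ;
          ontolex:sense ?sense .
  ?sense skos:definition ?gloss .
  FILTER(LANG(?gloss) = "en")
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?lemma ?categoryLabel
ORDER BY ?lemma
LIMIT 100
"""

r = requests.get("https://query.wikidata.org/sparql",
                 params={"query": QUERY, "format": "json"},
                 headers={"User-Agent": "print-dictionary-sketch/0.1"})
for b in r.json()["results"]["bindings"]:
    print(f'{b["lemma"]["value"]} ({b["categoryLabel"]["value"]}): {b["glosses"]["value"]}')
```

Typesetting (LaTeX, PDF) and the introduction/index sections would be layered on top of a dump like this.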
- Further comments
- Proposer
- --- Jura 17:05, 6 October 2018 (UTC)
- Discussions
Wordmap
- Need
- display a map, and overlay it with words from languages spoken in the given region, where all of them have the same meaning (so, the result of this query but displayed on a map)
- Who would benefit
- would look nice, but also everyone interested in languages
- Proposition of solution
- ideally that would be just a SPARQL query result, but I am not sure that is possible out of the box
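It nearly is possible out of the box: pasting a SELECT like the one below into the query service with #defaultView:Map draws the points directly. This hedged sketch follows the suggestion in the discussion below to use countries' coordinates: for one concept (Q283, water) it pairs each lemma with the coordinate of a country whose official language (P37) is the lexeme's language.

```python
# Sketch: lemma + coordinate pairs for one concept, ready to plot on a map.
import requests

QUERY = """
SELECT ?lemma ?coord WHERE {
  ?lexeme wikibase:lemma ?lemma ;
          dct:language ?language ;
          ontolex:sense/wdt:P5137 wd:Q283 .
  ?country wdt:P37 ?language ;
           wdt:P625 ?coord .
}
LIMIT 200
"""

r = requests.get("https://query.wikidata.org/sparql",
                 params={"query": QUERY, "format": "json"},
                 headers={"User-Agent": "wordmap-sketch/0.1"})
for b in r.json()["results"]["bindings"]:
    print(b["lemma"]["value"], b["coord"]["value"])
```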
- Further comments
- Proposer
- Denny (talk) 18:09, 19 October 2018 (UTC)
- Discussions
@Denny: I just pushed a first version of a wordmap based on Wikidata. Any feedback would be really welcome! It is not exactly the same as what you proposed (as I am using labels). Here is the link: Wikidata Wor(l)dmap
Epantaleo (talk) 11:35, 19 October 2019 (UTC)
- @Epantaleo Interesting. There only seem to be two languages with coordinates. Maybe you would get more entries by using official languages of countries, and using the country's coordinates (see query). Your visualization will have to account for several languages at one coordinate, and several coordinates for some countries - e.g. Denmark. Just an idea. Robertsilen (talk) 09:19, 27 October 2022 (UTC)
Etymology graph
- Need
- create a graph of the etymology of a word, and all other words with the same meaning
- Who would benefit
- would look nice
- Proposition of solution
- hopefully just a SPARQL query?
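It mostly is: assuming "derived from lexeme" (P5191) statements, a sketch that walks the derivation chain from one Lexeme and returns the edges of the graph (the L-id below is only a placeholder):

```python
# Sketch: collect the "derived from lexeme" (P5191) edges reachable from one
# Lexeme, as (child lemma, parent lemma) pairs for a graph visualization.
import requests

def etymology_edges(lexeme_id):
    query = f"""
    SELECT ?childLemma ?parentLemma WHERE {{
      wd:{lexeme_id} wdt:P5191* ?child .
      ?child wdt:P5191 ?parent ;
             wikibase:lemma ?childLemma .
      ?parent wikibase:lemma ?parentLemma .
    }}"""
    r = requests.get("https://query.wikidata.org/sparql",
                     params={"query": query, "format": "json"},
                     headers={"User-Agent": "etymology-graph-sketch/0.1"})
    return [(b["childLemma"]["value"], b["parentLemma"]["value"])
            for b in r.json()["results"]["bindings"]]

for child, parent in etymology_edges("L2087"):   # placeholder L-id
    print(f"{child} <- {parent}")
```

Extending the graph to all words with the same meaning would add a second hop through item for this sense (P5137).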
- Further comments
- Proposer
- Denny (talk) 18:12, 19 October 2018 (UTC)
- Discussions
@Denny: I had trouble using graphs/trees to visualize the complex and big directed graph of etymologically related words (also because of incorrect etymology links). I tried an alternative visualization (see below). In the future I plan to add word definitions to it: it should be pretty straightforward.
See Visualization of words etymologically related to English word door and to word pistachio.
Btw, do you think it would be a good idea to export the RDF database generated by etytree into Wikidata? (with supervision of course).
Epantaleo (talk) 11:38, 19 October 2019 (UTC)
@Epantaleo: Impressive graph! Some comments: I would not bother about the wrong etymology links while building the tool: the graph helps identify etymologies to be corrected.
I had a similar idea before finding that page, but I was thinking of a simpler graph, in the form of a tree, only including the strict translations, inspired by the graphs of Jakub Marian like that one on the pronoun "I" or that one on "hundred". Possibly that could be a different idea (etymology tree vs. cognates graph).
If both ideas are developed and your graphs get too complex because of the number of words, it could be considered to restrict:
- Etymology tree: only strict translation => several languages, only one word or so by language
- Cognates graph: maybe only one language?
For both types of graph, good features could be:
- Links to the Wiktionary and/or Wikidata article (ideally with a preview as a pop-up when hovering over it with the mouse)
- Possibility to shortlist/highlight some languages => shortlisting only one language would make your graphs clearer
- Function to show/hide etymology links with lower likelihood.
For the last point, a "weight" (e.g. in the form of a percentage and/or adverbs like "possibly"/"probably") describing the likelihood of an etymology would be needed in Wikidata and Wiktionary. I don't know if the topic has already been considered. Gfombell (talk) 04:38, 22 August 2021 (UTC)
@ Gfombell: Thanks for your interest and your comments! I hope to have time soon to contribute to the project. Epantaleo (talk) 12:15, 24 August 2021 (UTC)
Noting that it is (now?) possible to use SPARQL. E.g. water or tea. Quiddity (talk) 19:56, 14 September 2023 (UTC)
List new words found in Wikipedia
- Need
- Who would benefit
- Editors and users of Wikidata
- Wikipedia indirectly
- Proposition of solution
- Scan Wikipedia articles for words currently not in Wikidata and propose them, with a sample, once a given number of occurrences is found.
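A rough sketch of the scanning half; matching via wbsearchentities is approximate and per word, so this only works for small samples, and the titles and threshold are illustrative assumptions:

```python
# Sketch: count words in a few Wikipedia article extracts and report those
# that occur at least N times but match no Lexeme.
import re
from collections import Counter
import requests

HEADERS = {"User-Agent": "new-words-sketch/0.1"}

def extract(title, lang="en"):
    r = requests.get(f"https://{lang}.wikipedia.org/w/api.php",
                     params={"action": "query", "titles": title, "prop": "extracts",
                             "explaintext": 1, "format": "json"}, headers=HEADERS)
    page = next(iter(r.json()["query"]["pages"].values()))
    return page.get("extract", "")

def has_lexeme(word, lang="en"):
    r = requests.get("https://www.wikidata.org/w/api.php",
                     params={"action": "wbsearchentities", "search": word,
                             "language": lang, "type": "lexeme", "format": "json"},
                     headers=HEADERS)
    return any(hit.get("label", "").lower() == word for hit in r.json().get("search", []))

def candidates(titles, min_count=3):
    counts = Counter(w.lower() for t in titles
                     for w in re.findall(r"[a-zA-Z]+", extract(t)))
    return [(w, n) for w, n in counts.most_common()
            if n >= min_count and not has_lexeme(w)]
```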
- Further comments
- Proposer
- --- Jura 18:20, 19 October 2018 (UTC)
- Discussions
- Might take some time since we filled the initial backlog ;) --- Jura 18:20, 19 October 2018 (UTC)
- See Wikidata:Lexicographical coverage for a related tool --DVrandecic (WMF) (talk) 22:10, 10 February 2021 (UTC)
Auto-specifying the language when binding senses
- Need
- Currently, when linking senses through translation, you have to manually specify the language of the linked sense. This is not entirely rational, since the language is already specified in the lexeme.
- Who would benefit
- Proposition of solution
- When linking a sense, the language would be filled in automatically.
- Further comments
- Proposer
- Iniquity (talk) 12:23, 30 July 2019 (UTC)
- Discussions
Bilingual dictionary app for wikipedias/other wikimedia sites
- Need
- a script that takes a highlighted word on a page of some wiki and searches for a Wikidata lexeme for that word, then shows the meaning of the word in a popup in the user's native language. If the sense and/or lexeme is missing, it shows a form to add that information.
- Who would benefit
- any users who read wikis in languages they are not yet fluent in. Wikidata will benefit from an easier way to add new words.
- Proposition of solution
A user script/gadget that, for example on de.wikipedia.org, allows me to configure the language(s) I understand; then, when I press Alt+Shift+t (or whatever shortcut is free to use), it searches Wikidata for a German lexeme or one of its forms with that spelling, and shows the senses in languages I understand and the grammatical information available. When the sense in my native language is missing, it shows a field to add it. When I submit the form, it publishes that change on Wikidata.
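A stand-alone sketch of the lookup half (the gadget itself would be JavaScript in the browser; the example word, the understood languages and the exact lexeme search behaviour are assumptions):

```python
# Sketch: find a German lexeme for a highlighted word and print its glosses
# in the languages the user understands.
import requests

API = "https://www.wikidata.org/w/api.php"
HEADERS = {"User-Agent": "lexeme-popup-sketch/0.1"}

def find_lexeme(word, lang="de"):
    r = requests.get(API, params={"action": "wbsearchentities", "search": word,
                                  "language": lang, "type": "lexeme",
                                  "format": "json"}, headers=HEADERS)
    hits = r.json().get("search", [])
    return hits[0]["id"] if hits else None

def glosses(lexeme_id, understood=("en", "uk")):
    r = requests.get(API, params={"action": "wbgetentities", "ids": lexeme_id,
                                  "format": "json"}, headers=HEADERS)
    entity = r.json()["entities"][lexeme_id]
    out = []
    for sense in entity.get("senses", []):
        for lang in understood:
            if lang in sense["glosses"]:
                out.append((sense["id"], lang, sense["glosses"][lang]["value"]))
    return out                    # empty list -> show the "add a gloss" form

lex = find_lexeme("Haus")
print(glosses(lex) if lex else "No lexeme found -> offer to create one")
```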
- Further comments
- When I read a Wikipedia page in a foreign language, I often look up unknown words, and would love a faster way to look them up and a more convenient place to store them. I like the https://www.dict.cc/ project and use it a lot while reading German texts, but I could not contribute because support for my native language is missing there.
- Proposer
- Benderovec (talk) 20:54, 16 November 2020 (UTC)
- Discussions
Translations
- Need
- This is something that has caused me some discomfort since I started editing lexemes. Currently, adding (and removing) translation (P5972) statements on senses is quite a hard and tiresome process. All the work is manual: it is necessary to search for senses and edit many lexemes several times if one wants something complete. See, for example, how many translations of അമ്മ (L480) could also be added to äiti (L7335), 母/bó/bó (L222599) and mamma (L32675) (all having the same meaning, "mother").
- Who would benefit
- All editors, translators and readers of all languages.
- Proposition of solution
- This reminds me a little of the time when Wikipedia pages had their interlanguage links managed by robots (before Wikidata existed). I believe that, as was the case in the past, a robot could solve the current problem, adding and removing translations in the lexemes to unify and standardise their contents.
- Further comments
- Going further, I would say that the same could be done with image (P18), item for this sense (P5137) and glosses (the latter would not necessarily need to have the existing content exchanged among pages, but only added when it is missing). In the case of glosses, see the number of senses in ama/𒂼 (L1) (more than 60) that could also be in أُمّ (L226769) and ina (L416267) (with only 1 and 3, respectively). Imagine an editor who only edits in his language. Creating a new lexeme and adding a meaning to it, the editor, in a hurry to create more lexemes in his language, leaves the page as it is, small, short and kind of empty. Currently, a page like this will stay that way until someone goes there and adds more content to it. With what I propose above, the editor would only need to create his lexeme and add/search for a translation in another language for the robot (based on that translation) to come and add the rest to it (images, senses and translations in other languages, etc.), saving time and effort that can be spent on other tasks.
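As a starting point for such a robot, a hedged sketch of the query that would produce its work list: senses pointing to the same item for this sense (P5137) target but not yet linked by translation (P5972). Any actual writing would of course need bot approval and human review, since nuances differ between languages.

```python
# Sketch: sense pairs sharing one P5137 target but missing a P5972 link.
import requests

QUERY_TEMPLATE = """
SELECT ?sense1 ?sense2 WHERE {{
  ?sense1 wdt:P5137 wd:{qid} .
  ?sense2 wdt:P5137 wd:{qid} .
  ?l1 ontolex:sense ?sense1 ; dct:language ?lang1 .
  ?l2 ontolex:sense ?sense2 ; dct:language ?lang2 .
  FILTER(?lang1 != ?lang2)
  FILTER NOT EXISTS {{ ?sense1 wdt:P5972 ?sense2 . }}
}}
"""

def missing_translations(qid):
    r = requests.get("https://query.wikidata.org/sparql",
                     params={"query": QUERY_TEMPLATE.format(qid=qid), "format": "json"},
                     headers={"User-Agent": "translation-bot-sketch/0.1"})
    return [(b["sense1"]["value"], b["sense2"]["value"])
            for b in r.json()["results"]["bindings"]]

# Each returned pair is a candidate for a new P5972 statement.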
- Discussions
- There are probably nuance differences between languages, so I'm not sure if a robot can blindly copy everything, I guess? But of course it would be nice if it was user friendly to reuse across items. Robertsilen (talk) 09:30, 27 October 2022 (UTC)
Show interlanguage sitelinks to Wiktionaries on sidebars of WD:Lexeme namespace
- Need
- Show which Wiktionaries have an entry with the same title as the Lexeme, and allow navigating to the Wiktionary entry quickly
- Who would benefit
- All lexeme editors. It helps both to show which Wiktionaries have entries with the same title and to navigate to them quickly
- Proposition of solution
- Use mw.util.addPortletLink() to display links
- or, use the same mw:Extension:Cognate feature that current Wiktionaries are using to display links automatically, if it exists
- Further comments
- Proposer
- Vis M (talk) 15:29, 22 June 2021 (UTC)
- Discussions
Support Nice Idea. This is very helpful --Sriveenkat (talk) 10:02, 26 September 2023 (UTC)
- @Vis M I found a tool for this: User:Nikki/LexemeInterwikiLinks.js. @Nikki Thank you for creating this tool :)) Sriveenkat|talk/{PING ME} 09:07, 31 December 2023 (UTC)
Add sense from item
- Need
- adding a sense with an item for this sense (P5137) statement requires several clicks and formulating a sentence (the gloss) that likely already exists elsewhere (as the description of the item). This extension would reduce the number of steps needed to do the same.
- Who would benefit
- lazy people who would like to contribute senses for things that have a wikidata item.
- Proposition of solution
-
- The add sense link is augmented with an additional add sense from item link.
- Clicking this link will open a search that will autocomplete items (that have a description in the user's language).
- selecting an item will result in a sense being created
- The gloss is automatically copied from the selected item's description
- a item for this sense (P5137) statement is automatically created linking to the selected item
- Mockup:
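A hedged sketch of the two API writes behind these steps, assuming the WikibaseLexeme wbladdsense module and the standard wbcreateclaim call; login, CSRF-token handling and error checking are omitted, and all IDs are placeholders:

```python
# Sketch: create a sense whose gloss is copied from an item's description,
# then add an "item for this sense" (P5137) statement pointing at that item.
# "session" is assumed to be a logged-in requests.Session.
import json

API = "https://www.wikidata.org/w/api.php"

def add_sense_from_item(session, lexeme_id, item_id, description, lang, token):
    # 1. create the sense, reusing the item's description as the gloss
    sense_data = {"glosses": {lang: {"language": lang, "value": description}}}
    r = session.post(API, data={"action": "wbladdsense", "lexemeId": lexeme_id,
                                "data": json.dumps(sense_data),
                                "token": token, "format": "json"})
    sense_id = r.json()["sense"]["id"]   # assumed response shape

    # 2. add the P5137 statement linking the new sense to the item
    value = {"entity-type": "item", "numeric-id": int(item_id.lstrip("Q"))}
    session.post(API, data={"action": "wbcreateclaim", "entity": sense_id,
                            "property": "P5137", "snaktype": "value",
                            "value": json.dumps(value),
                            "token": token, "format": "json"})
    return sense_id
```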
- Further comments
- Proposer
- Shisma (talk) 10:32, 9 October 2021 (UTC)
- Discussions
- this looks neat! Has anyone been working on it? Asaf Bartov (talk) 17:48, 26 September 2022 (UTC)
- Support After my first couple of days entering lexemes, I can see this as being a very handy tool. Wikidata has in many cases a suitable short gloss that can be used. Robertsilen (talk) 09:34, 27 October 2022 (UTC)
- Support Please ping if done! -wd-Ryan (Talk/Edits) 20:13, 2 November 2022 (UTC)
- This is really good Support Sriveenkat (talk) 15:33, 2 October 2023 (UTC)
Adding synonyms, antonyms and translations
- Need
Easy way to add synonym, antonym and translation statements between senses.
- Who would benefit
Editors of all languages.
- Proposition of solution
A gadget (or script, I don't know) similar to Merge.js for sense-level edits. On a lexeme page, the user would click on a button at the top of the screen: "More ∨" → "select sense to relate". Then three buttons would appear next to the senses of that lexeme: S, A or T (or maybe three icons, in order to be more easily understood), standing for synonym, antonym and translation. After clicking one of those letters on one of the senses, other lexemes opened in the user's browser would show these three buttons next to their senses as well. Then, by clicking one of those letters again (in another lexeme), the gadget would detect the two clicks and add the desired statement to both lexemes, linking the senses together. A tool like this could also support other properties for senses: pertainym of, hyperonym, false friend, specified by sense, etc.
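Whatever the interface looks like, the edit the gadget would finally make is a single sense-to-sense statement. A hedged sketch of that write, assuming wbcreateclaim accepts a sense ID as the entity and a sense-type datavalue (token handling omitted, IDs are placeholders):

```python
# Sketch of the write behind a synonym/antonym/translation gadget: one
# statement from one sense to another. "session" is a logged-in Session.
import json

API = "https://www.wikidata.org/w/api.php"

def link_senses(session, source_sense, target_sense, prop, token):
    value = {"entity-type": "sense", "id": target_sense}
    return session.post(API, data={
        "action": "wbcreateclaim",
        "entity": source_sense,          # e.g. "L7335-S1"
        "property": prop,                # e.g. "P5972" (translation)
        "snaktype": "value",
        "value": json.dumps(value),
        "token": token,
        "format": "json",
    }).json()

# The gadget would call this twice (once per direction) so both lexemes
# show the link, as described above.
```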
- Further comments
- Proposer
Enaldodiscussão 18:42, 9 February 2022 (UTC)
- Discussions
Support Makes sense, sounds good. Robertsilen (talk) 09:36, 27 October 2022 (UTC)
Pronunciation audios
- Need
Easy way to add pronunciation audio (P443) statements to the forms of lexemes.
- Who would benefit
Editors of all languages.
- Proposition of solution
A new tool. In it, the user would select a Commons category with audio files and specify regular expressions (e.g.: in Lingua Libre pronunciation-cat, use the text [A-Za-z]+ between the last "-" and ".wav") so the tool could search for lexemes with forms matching the titles of these files. In the process, the tool would ignore capitalization and punctuation ("·"). Finally, the tool would display to the user the lexeme, its form, and the audio to be played. If it's a correct match, the user could click "Add" to insert a new pronunciation audio (P443) statement, and maybe also specify the pronunciation variety (P5237).
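A hedged sketch of the matching step, reusing the "Lingua Libre pronunciation-cat" example from above (the category name, language tag and regex are taken from the proposal and assumed to be valid; the actual "Add" step would be a write like the ones sketched earlier):

```python
# Sketch: list audio files in a Commons category, pull the word out of each
# file name, and look for Lexeme forms with that exact spelling.
import re
import requests

HEADERS = {"User-Agent": "pronunciation-match-sketch/0.1"}

def category_files(category):
    r = requests.get("https://commons.wikimedia.org/w/api.php",
                     params={"action": "query", "list": "categorymembers",
                             "cmtitle": f"Category:{category}", "cmtype": "file",
                             "cmlimit": 100, "format": "json"}, headers=HEADERS)
    return [m["title"] for m in r.json()["query"]["categorymembers"]]

def matching_forms(word, lang):
    query = f'SELECT ?form WHERE {{ ?form ontolex:representation "{word}"@{lang} . }}'
    r = requests.get("https://query.wikidata.org/sparql",
                     params={"query": query, "format": "json"}, headers=HEADERS)
    return [b["form"]["value"] for b in r.json()["results"]["bindings"]]

for title in category_files("Lingua Libre pronunciation-cat"):
    m = re.search(r"-([^-]+)\.wav$", title)        # text between last "-" and ".wav"
    if m:
        word = m.group(1).lower()
        for form in matching_forms(word, "ca"):
            print(f"{form}  <-  {title}  (candidate P443 statement)")
```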
- Further comments
- Discussions
- Support This looks like something I would love to experiment with. You may ping me for collaboration on this. Eugene233 (talk) 12:51, 17 January 2023 (UTC)
- Support Sriveenkat (talk) 15:31, 2 October 2023 (UTC)
Example template
- Need
- Who would benefit
- Proposition of solution
- Further comments
- Proposer
- Discussions