Wiktionary:Todo

This page is for cleanup jobs. Request jobs are at Wiktionary:Task lists.

Shortcuts:
WT:TODO
WT:CDPR

This page lists cleanup requests affecting multiple entries. These may include updating templates, categories or generic entry structure, but not specific terms, which should be tagged with {{rfc}} and put on WT:RFC. This way, tasks that have previously been divided across discussion and user pages are grouped together in one place where they are easier to find.

Frequently updated todo lists

Todo Lists project: Please see WT:Todo/Lists for a set of regularly updated cleanup lists.
JeffDoozan's cleanup lists: See User:JeffDoozan for lists of entries with formatting or layout errors.

Regular tasks

In this section, you will find relatively easy cleanup tasks.

Special pages

Updated a couple of times a week.

Todo lists

Updated weekly.

Wiktionary:Todo/Lists/Blank or extremely short pages .. 25 results .. last updated 01:42, 26 December 2024 (UTC)

Semi-regular tasks

Usually dump-analyzed:

Unhelpful abbreviations — These should use the full term.
Occasionally, soft hyphens or other invisible/zero-width characters (||‌|‍) sneak into the content of entries or even the pagenames; the soft hyphens should be removed; the other characters should be discussed.
People sometimes type {[, }] etc. when they mean {{ / }}. It is useful to periodically scan dumps for instances of this. Here is a regex for the purpose: ([^\[\{]\[\{[^\[\{]|[^\[\{]\{\[[^\[\{]|[^\]\}]\]\}[^\]\}]|[^\]\}]\}\][^\]\}]). Simply searching for ]} will not work, because there are many valid instances of it, e.g. {{m|en|a [[link]]}}.
Every few months, check for instances of the common but nonstandard headers "Alternative form", "Alternative spelling" and "Alternative spellings" (which should be "Alternative forms") and "Usage note" (which should be "Usage notes"). Many other nonstandard headers exist, but none are as common as those. Also, no L1 headers should exist in the main namespace (language headers should always be L2, and all other headers should always be L3 or more). See User:Erutuon/mainspace headers for a full list of non-language headers and User:Erutuon/mainspace headers/possibly incorrect for a list of possibly incorrect headers.
Check for entries using modifier letters or deprecated IPA characters.
Search for (using the site search function) and fix "Etymology 2" -"Etymology 1" and other cases of higher-number etymologies without the full complement of lower-number etymologies.
Check for misindented quotations (pages with a line containing {{quote- but not starting with #* or ##*)
Check for entries that use Template:sense, Template:a, manual formatting, etc. instead of {{lb}}
People, and some other online dictionaries, write /e/ where the actual IPA symbol is /ɛ/, e.g. [1], [2], [3]
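
Two of the checks above (mistyped braces and misindented quotations) could be run over dump text roughly as follows. This is only a sketch: the regex is the one quoted in the list, while the function names and the idea of passing in one page's wikitext as a string are assumptions made for illustration.

```python
import re

# The regex quoted in the list above, unchanged. It matches a mismatched pair
# such as "{[" or "]}" together with one character of context on each side,
# and skips pairs that sit inside a longer run of opening (or closing)
# brackets and braces.
MISMATCHED_BRACES = re.compile(
    r"([^\[\{]\[\{[^\[\{]|[^\[\{]\{\[[^\[\{]|[^\]\}]\]\}[^\]\}]|[^\]\}]\}\][^\]\}])"
)


def find_mismatched_braces(wikitext):
    """Return each match, including the context character on each side."""
    return [m.group(0) for m in MISMATCHED_BRACES.finditer(wikitext)]


def misindented_quote_lines(wikitext):
    """Return lines that contain {{quote- but do not start with #* or ##*."""
    return [
        line
        for line in wikitext.splitlines()
        if "{{quote-" in line and not line.startswith(("#*", "##*"))
    ]
```

Because the regex needs a character on each side of the pair, a mistyped brace at the very start or end of the text is not reported.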

To be monitored manually:

Check periodically for misspellings.
Check periodically that things in Category:English countable proper nouns aren't mislabelled common nouns.
Periodically fix entries in Category:Terms borrowed back into the same language that are not twice-borrowed, like this.
Periodically fix entries in "Category:Terms borrowed from Proto-" categories (see [4]) like Category:Terms borrowed from Proto-Slavic that were not actually "borrowed" by the L2 language (e.g. here). (list)
Check for incorrect characters (for instance, ي vs ی and ك vs ک in Arabic, Persian, Urdu, Azeri, and other languages) that look identical in certain positions and are often mixed up. Lists of instances in common linking templates are found at User:Erutuon/wrong script.
Check for transliterations containing incorrect characters. These usually indicate errors of some kind. User:Erutuon/bad transliteration lists transliterations with non-Latin characters, except for those with CJK script and Latin script, such as {{t+|ja|言葉|tr=ことば, kotoba}}.
Periodically fix entries in Category:English terms derived from Greek that are in fact from Ancient Greek (grc) rather than (Modern) Greek (el).
Periodically check /Citations without citations
Periodically check for translations that aren't templatized (but use bare links, like * French: [[foo]])
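
As an illustration of the look-alike character check above, the sketch below flags Arabic-script letters that are unexpected for a given language code. The table covers only Arabic (ar) and Persian (fa) and is an assumption made for this example; it is not how User:Erutuon/wrong script is generated.

```python
# Code points that look identical in some positions and are often mixed up.
ARABIC_YEH = "\u064A"  # ي
FARSI_YEH = "\u06CC"   # ی
ARABIC_KAF = "\u0643"  # ك
KEHEH = "\u06A9"       # ک

# Illustrative assumption: which of these letters is unexpected per language.
UNEXPECTED = {
    "ar": {FARSI_YEH, KEHEH},
    "fa": {ARABIC_YEH, ARABIC_KAF},
}


def unexpected_chars(lang, text):
    """Return the sorted unexpected characters found in text for this language."""
    bad = UNEXPECTED.get(lang, set())
    return sorted({ch for ch in text if ch in bad})
```

Language codes missing from the table produce an empty result, so other languages would have to be added before they can be checked.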

Also:

Uses of the language code aaa (Ghotuo) in translations tables are often vandalism.
pdc (Pennsylvania German) and pdt (Plautdietsch), aja (Aja/Adja of Sudan) and ajg (Adja/Aja of Benin) need to be kept separate.

Useful search queries

insource:/\# \(\[\[/ -insource:/Chinese/
In most cases, the matched text should use the {{label}} template. It is also possible to search for a specific label, e.g. # ([[botany]]).
insource:/=.\{\{sense/
In most cases, {{sense}} should be preceded by an asterisk.
insource:/\=\=Etymology 2/ -insource:"Etymology 1"
Entries with Etymology 2 but not Etymology 1 (there are a few false positives where "Etymology 2" is inside a comment)
Instances of "the the" in an entry. These errors keep on occurring; it's pretty crazy. GreyishWorm (talk) 01:53, 12 November 2022 (UTC)
Has "trans-top" but not in English or Translingual lemmas/non-lemma forms
Special:Search/insource:"Wikipedia.org", Special:Search/insource:"Wiktionary.org"
Unwanted language code in {{w}}: insource:/\{w\|en\|/. The language code (for Wikipedias other than the English one) belongs in the |lang= parameter, so {{w|en|...}} links to the page "En" on English Wikipedia. The "en" can be replaced in the search box to search for other language codes, in which case the matches would be fixed by either deleting the language code in the first parameter or adding "lang=" in front of it.
Pages with templates missing parameter |1=: Special:WhatLinksHere/Unsupported_titles/`lcub``lcub``lcub`1`rcub``rcub``rcub` (you can use namespace filtering to weed out userspace, and change the "1" to view other positional parameters)
Fodder for {{etymid}} (hard-coded links to a numbered etymology section): Special:Search/insource:/\#Etymology+[0-9]/

If the search gives a warning (and even if it doesn't!), see Help:CirrusSearch for ways of making the search much less demanding on the servers and much more likely to provide a complete list of problem entries.
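
The "Etymology 2" query above can also be approximated offline against dump text, which avoids the false positives and server load of a site search. The following is a minimal sketch, assuming the text of a single language section is passed in as a string; the function name is hypothetical.

```python
import re

# Matches numbered etymology headers at any level, e.g. ===Etymology 2===.
ETYM_HEADER = re.compile(r"^=+\s*Etymology (\d+)\s*=+\s*$", re.MULTILINE)


def missing_etymology_numbers(section_text):
    """Return the lower etymology numbers that are absent from the section."""
    found = {int(n) for n in ETYM_HEADER.findall(section_text)}
    if not found:
        return []
    return [n for n in range(1, max(found) + 1) if n not in found]
```

An empty result means the section either has no numbered etymologies or has the full run from 1 up to the highest number present.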

All subpages

Subpages of Wiktionary:Todo :

2013

Wiktionary:Todo/Slovene masculine translations

> This is the list of entries, as of the last database dump, that contain Slovene translations with the gender m ("masculine"). They should most likely be changed to use either m-an (+ "animate") or m-in (+ "inanimate"), since that distinction has grammatical consequences in Slovene. (?)

—Ruakh_TALK 14:34, 11 September 2013 (UTC)

2015

/Pages containing LTR marks and /RTL marks

In many cases, these are unnecessary and cause problems. - -sche (discuss) 18:16, 21 January 2015 (UTC)

What are LTR marks and how should one improve the entry? --A230rjfowe (talk) 21:00, 15 July 2015 (UTC)

What are RTL marks and how should one improve the entry? --A230rjfowe (talk) 21:00, 15 July 2015 (UTC)

They are invisible characters that otherwise behave like strongly left-to-right characters (such as Latin letters) or strongly right-to-left characters (such as Arabic letters), in that they influence the direction of surrounding characters that do not have a defined text direction. So they are sometimes used to change the direction of characters in text. For instance, on Wiktionary, where text direction is generally left-to-right, punctuation characters can be forced to render right-to-left by sandwiching them between Arabic letters and a right-to-left mark.

But CSS should be used to change text direction instead, whenever possible. On Wiktionary, we do this by adding classes that have the correct CSS properties: for instance, enclosing Arabic text in class="Arab", which has the CSS direction: rtl; unicode-bidi: embed; applied to it in MediaWiki:Common.css. This is done automatically by most linking templates.

Regenerated. - -sche (discuss) 14:48, 19 February 2018 (UTC)
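
For anyone regenerating such lists from a dump, the marks discussed in this thread are U+200E (left-to-right mark) and U+200F (right-to-left mark). A minimal sketch of counting them in a page's text follows; the function name is hypothetical.

```python
LRM = "\u200E"  # left-to-right mark
RLM = "\u200F"  # right-to-left mark


def count_direction_marks(text):
    """Count the invisible directional marks in a string."""
    return {"LRM": text.count(LRM), "RLM": text.count(RLM)}
```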

/Page with untemplatized etymologies

A partial list of pages where at least one language section simply states, in plain text, without using {{etyl}}, that it derives from German, French, Latin, Greek, Ancient Greek, Chinese or Spanish. - -sche (discuss) 17:43, 25 January 2015 (UTC)

Regenerated (1469 entries). - -sche (discuss) 14:44, 19 February 2018 (UTC)

/North American

A list of entries which are labelled as being Canadian, or American, but not both. It is likely that many should in fact have both labels. See Wiktionary:Beer_parlour/2015/March#North_American_English_vs_Canadian_and_American_English for a bit of background. - -sche (discuss) 05:00, 7 March 2015 (UTC)

Erroneous Greek characters

Any place that the character ϕ is used in place of φ or ϑ in place of θ in a string that is marked as being grc or el should be listed so that an editor can look them over and fix mistakes. I just found one lying around in a {{term}}, which made me think that these shouldn't be overly hard to find. —Μετάknowledge (discuss/deeds) 21:01, 12 May 2015 (UTC)

@Metaknowledge: Never knew this page existed. Ironically I came across this while searching for incorrect uses of ϕ. For future reference, here is the search for ϕ and here is the search for ϑ (other incorrect characters are ϖ ϛ ϰ ϱ ϐ ϵ ϲ ϗ ȣ; there may be more). --Wiki Tiki 89 13:20, 21 April 2017 (UTC)

If nothing has been done about this, I can make Module:script utilities search for these characters when it tags text, and add a tracking template or a category. — Eru·tuon 23:50, 20 May 2017 (UTC)

@Metaknowledge, Wikitiki89: Done. — Eru·tuon 00:02, 21 May 2017 (UTC)

@Erutuon: It's never done, people will keep adding them. --Wiki Tiki 89 15:03, 22 May 2017 (UTC)

Oh sorry, you were referring to having Module:script utilities search for them. It's not that nothing has been done, I went through and removed over a hundred of these. But again, people will keep adding them. --Wiki Tiki 89 15:05, 22 May 2017 (UTC)

Right. I just found one in polypharmacy... 🙄 — Eru·tuon 18:14, 22 May 2017 (UTC)

User:Erutuon, should we add an actual cleanup category to entries using these? I just cleaned up hypophora, which had no indication on the page itself (that I noticed) of the problem (though someone who knew where the tracking template was could find the page). I'm going to ask in WT:GP if we could catch these with an edit filter. - -sche (discuss) 11:13, 12 November 2022 (UTC)

@-sche: Unfortunately that isn't a good idea because adding a category will cause changes in parsing in certain cases. We don't do that at all when language-tagging text at the moment and so language-tagged text can be used in cases where a link wouldn't be allowed. For instance, if language-tagged text is inside the text of a page link ([[some page|{{lang|grc|ϑ}}]]), adding a category link (the equivalent of [[some page|{{lang|grc|ϑ}}[[Category:Bad Ancient Greek text]]]]) next to it would break the page link. — Eru·tuon 21:28, 13 November 2022 (UTC)

As of a while ago, I implemented (with an IP's help) one filter which warns against a few of the most-wrong of the characters above, which has already helped some users to replace them before saving their edit, and another filter which silently tracks all of the characters. - -sche (discuss) 02:38, 2 August 2023 (UTC)
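
For offline checks of dump text, the characters named in this thread can be located with a few lines of code. This is a minimal sketch, not the logic used by Module:script utilities or the edit filters; the function name is hypothetical, and the caller is assumed to pass in only text already known to be tagged grc or el.

```python
# Characters listed in the thread as incorrect in Greek text.
SUSPECT_GREEK = "ϕϑϖϛϰϱϐϵϲϗȣ"


def suspect_greek_chars(text):
    """Return (index, character) pairs for every suspect character in text."""
    return [(i, ch) for i, ch in enumerate(text) if ch in SUSPECT_GREEK]
```

Only the two replacements stated in the thread (ϕ to φ, ϑ to θ) are certain, so the sketch reports positions rather than rewriting anything.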

Not click characters

All over the dictionary, e.g. in the name and content of !nawas and in this translation, ! turns up for ǃ, and I wouldn't be surprised to find other substitutions for click consonants. The best way I can think of to find such uses is: create a list of all languages that use clicks (or, as a presumably easier-to-make approximation of that, a list of all Khoisan languages), then search a database dump for all translations, language sections, and {{m}}/{{l}}s of those languages that contain !. I've just cleaned up the few pages which misused ! in their pagenames (only 31 pages on Wiktionary used ! in their pagenames at all). - -sche (discuss) 18:42, 25 August 2015 (UTC)
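
The dump search proposed above might look something like the following for {{m}} and {{l}} links. It is only a sketch: the language set is a placeholder containing a single example code (naq, Khoekhoe), the function name is hypothetical, and translations and language sections would need separate handling.

```python
import re

# Placeholder set of language codes to check; extend with the real list.
CLICK_LANGS = {"naq"}

# Captures the language code and the first term of {{m|...}} and {{l|...}}.
LINK_TEMPLATE = re.compile(r"\{\{(?:m|l)\|([^|{}]+)\|([^|{}]*)")

RETROFLEX_CLICK = "\u01C3"  # ǃ (LATIN LETTER RETROFLEX CLICK)


def suspicious_click_terms(wikitext, langs=CLICK_LANGS):
    """Return (lang, term, suggested term) for linked terms containing '!'."""
    hits = []
    for lang, term in LINK_TEMPLATE.findall(wikitext):
        if lang in langs and "!" in term:
            hits.append((lang, term, term.replace("!", RETROFLEX_CLICK)))
    return hits
```

The suggestion is a blind substitution, so each hit still needs a human check before the entry is changed.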

2017

Check IDs

As discussed at Wiktionary:Grease pit/2017/May § Adding ids to enable linking to headwords, we need to check for sense ids in {{senseid}} and the |id= parameter of headword templates that are on the same page and have the same language and have the same id string: that is, those that would create exactly the same link when input into an entry linking template. Each sense id for a given language on a given page should be unique. — Eru·tuon 16:57, 19 May 2017 (UTC)
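
A rough way to run this check on one page's wikitext is sketched below. It is an illustration under stated assumptions: only {{senseid|lang|id}} and the generic {{head|lang|...|id=...}} are recognised, nested templates are not handled, and language-specific headword templates would need their own patterns. The function name is hypothetical.

```python
import re
from collections import Counter

# {{senseid|LANG|ID}}
SENSEID = re.compile(r"\{\{senseid\|([^|{}]+)\|([^|{}]+)\}\}")
# {{head|LANG|...|id=ID}} with any number of parameters before id=
HEAD_ID = re.compile(r"\{\{head\|([^|{}]+)(?:\|[^|{}]*)*?\|id=([^|{}]+)")


def duplicate_ids(wikitext):
    """Return sorted (lang, id) pairs that occur more than once on the page."""
    pairs = SENSEID.findall(wikitext) + HEAD_ID.findall(wikitext)
    counts = Counter(pairs)
    return sorted(pair for pair, n in counts.items() if n > 1)
```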

Usage note template naming

User:-sche/Usage note templates lists some usage-note templates which could be moved to fit our usual naming scheme, as described on the page and [5]. - -sche (discuss) 22:01, 26 May 2017 (UTC)

Possibly mislabeled affixes

Wiktionary:Todo/interfixes: These look like interfixes, but are labelled "prefixes" or "suffixes". - -sche (discuss) 19:57, 8 June 2017 (UTC)

Regenerated (per request on my talk page). Note that some, e.g. for Navajo, may be fine as they are. - -sche (discuss) 03:34, 15 February 2020 (UTC)

Pronunciation audio files

User:DerbethBot/Add manually: DerbethBot adds pronunciation files to entries, but some audio files need to be added manually. (See also User:DerbethBot for more info.) -- Curious (talk) 12:00, 11 June 2017 (UTC)

2018

Terms not restricted to legal jargon

Quite a few entries with usage notes like this are labelled {{lb|en|law}}, but are in fact in general use and not at all restricted to legal jargon (so the label should be removed). - -sche (discuss) 00:10, 23 December 2018 (UTC)

2022

Broken interwiki links

You can help repair the broken links to Wikipedia, Wikispecies, Wikimedia Commons and Wikisource at the subpages of User:This, that and the other/broken interwiki links. For each page listed, one of the following three things should be done: (1) correct the spelling, pluralisation, lowercase/uppercase of the link, add a |lang= parameter etc., (2) remove the link template altogether if not appropriate, or (3) create a redirect on the other wiki (many redirects on other projects were valid but have since been deleted). This, that and the other (talk) 03:14, 2 February 2022 (UTC)

/Missing Hebrew roots

See above. 70.172.194.25 00:59, 1 April 2022 (UTC)

Wiktionary:Todo/compounds not linked to from components

To find compound terms not linked to from their components. Dunderdool (talk) 21:39, 24 July 2022 (UTC)

Terms from Webster's 1913 dictionary

Thousands of them are at Category:Webster 1913 (and have been around since almost the beginning of Wiktionary!). Often only one or two terms in Webster's dictionary have not been assimilated and modernized into Wiktionary, sometimes more. GreyishWorm (talk) 17:51, 22 October 2022 (UTC)

At one time there were 29,000 entries, according to archive.org. P. Sovjunk (talk) 14:10, 27 April 2024 (UTC)

>21,000 as of today. GreyishWorm (talk) 15:11, 12 November 2022 (UTC)

<17,000 Ñobody Elz (talk) 08:25, 5 June 2023 (UTC)

<16,000 Creeps like you (talk) 12:29, 2 July 2023 (UTC)

<15,000 Worm spail (talk) 17:09, 28 August 2023 (UTC)

<14,000 Denazz (talk) 08:07, 20 December 2023 (UTC)

<13,000 Denazz (talk) 21:11, 23 February 2024 (UTC)

<12,000 P. Sovjunk (talk) 07:11, 29 April 2024 (UTC)

<11,000 Denazz (talk) 13:32, 21 July 2024 (UTC)

<10,000 Denazz (talk) 13:05, 10 September 2024 (UTC)

<9,500 P. Sovjunk (talk) 21:51, 30 September 2024 (UTC)

<9000 P. Sovjunk (talk) 13:21, 15 October 2024 (UTC)

<8000 P. Sovjunk (talk) 16:55, 8 December 2024 (UTC)

some easy adjectival forms and easy noun forms

User:This, that and the other/Websterpedia lists the subset of entries in the category that have corresponding Wikipedia pages.

2023

/long usage examples

Shorten them, or convert to quotations. This, that and the other (talk) 01:24, 19 June 2023 (UTC)

Category:Requests for date by source

Many undated quotes. Chioshio (talk) 03:10, 19 June 2023 (UTC)

"Raw" inflection tables in entries

Numerous entries contain hard-coded, non-templated inflection tables. Languages especially affected include Hunsrik, Pennsylvania German, Albanian, Old Marathi, and Sanskrit. Some of them have probably been subst'ed by accident, but in other cases, no inflection template exists. The development of a new one will be necessary.

See the search, which currently returns 198 pages. This, that and the other (talk) 05:45, 14 August 2023 (UTC)

In the English Wikipedia I instituted a system whereby certain accidentally substituted templates (cleanup tags) were easily de-substituted. I think someone else improved it so that they de-substituted themselves, though this required some magic somewhere. Rich Farmbrough, 15:43, 13 December 2023 (UTC).

Non-standard superscript Wikipedia links

[6] This, that and the other (talk) 11:23, 3 October 2023 (UTC)