Wiktionary:Beer parlour/2022/September

The Spanish Inquisition vs. CFI

When discussing whether use of a name is about the person/entity or figurative, I like to think of the example of Monty Python's Spanish Inquisition sketches. These start out with a character innocently saying "I wasn't expecting some kind of Spanish Inquisition", at which point characters in historical costumes burst through the door and say, dramatically, "Nobody expects the Spanish Inquisition!" Then they launch into a self-descriptive monolog starting with "Our chief weapon is surprise".

The whole logic of the sketches hinges on whether "Spanish Inquisition" refers to a heavy-handed interrogation (figurative) or a historical/fictional entity (non-figurative). So, when you see an entry for a named entity, ask yourself: is this making the same mistake that the Spanish Inquisition makes in the Monty Python sketches? Chuck Entz (talk) 15:24, 1 September 2022 (UTC)

The interesting question here is whether the figurative uses are figurative uses of the literal sense or rather literal uses of the figurative sense. And that is subject to some academic debate, from what I remember. As for Spanish Inquisition, what is interesting is that the figurative uses use the definite article the, as in "I agreed to answer a few questions, but I didn't expect the Spanish Inquisition." And a further question is to what extent the literal sense is kind of embedded in the figurative uses, expecting the reader/listener to know the literal sense. I am inclined to think that figurative uses of literal senses for persons are figurative uses of the literal sense and that we would be best served by having a single definition of the form "Literal sense definition, noted for characteristics X, Y, Z". And if we assume that a separate figurative sense is warranted, then the question is whether the literal sense should be relegated only to etymology despite often being the main sense of the defined term. That there are figurative uses of names of persons and groups, of that there is no question. --Dan Polansky (talk) 16:01, 1 September 2022 (UTC)

CEFR levels

I would like to know whether it is possible to add CEFR levels to entries, as the Cambridge Dictionary does. Backinstadiums (talk) 19:49, 1 September 2022 (UTC)

Wouldn't that have to be done by individual definition/sense?

Is there a source for such information or a set of criteria for determining a level for a definition? DCDuring (talk) 19:54, 1 September 2022 (UTC)

The Cambridge Dictionary lexicographic resources include them. There is also this one: https://www.englishprofile.org/wordlists/evp Backinstadiums (talk) 20:26, 1 September 2022 (UTC)

We'd probably have to revise at least one definition per entry to make it conform to the CEFR level. E.g., their first definition of iron ("a dark grey metal used to make steel and found in very small amounts in blood and food") corresponds to our def. 1: "A common, inexpensive metal, silvery grey when untarnished, that rusts, is attracted by magnets, and is used in making steel." Their second definition, "a piece of electrical equipment that you use for making clothes flat and smooth", corresponds to our def. 4: "A tool or appliance made of metal, which is heated and then used to transfer heat to something else; most often a thick piece of metal fitted with a handle and having a flat, roughly triangular bottom, which is heated and used to press wrinkles from clothing, and now usually containing an electrical heating apparatus."

The problem with our def. 4 for CEFR purposes (two verbose NPs, each with a complex compound clause as modifier) should be evident. In 2012 our def. 1 was "A metallic chemical element having atomic number 26, and symbol Fe.", brief but using terms not covered in elementary education and forgotten by many adults and failing to connect with everyday experience outside of classrooms. Even our current def. 1 fails as a definition in CEFR terms because some of the terms it uses (untarnished, magnet) are not themselves listed in CEFR. (We don't even use a defining vocabulary to attempt to simplify any of our definitions.) DCDuring (talk) 23:10, 1 September 2022 (UTC)

Imagine the fun of keeping these levels updated with every edit by an IP (and not just edits to the base article, but to the ones it derives its own level from). The phrase moon on a stick comes to mind. Equinox ◑ 23:14, 1 September 2022 (UTC)

Let's do it tho' Backinstadiums (talk) 10:28, 7 September 2022 (UTC)

Sounds like something that should be added to Wikidata instead. – Jberkel 08:09, 8 September 2022 (UTC)

Who will though? Backinstadiums (talk) 09:23, 9 September 2022 (UTC)

@Brett? DCDuring (talk) 16:54, 9 September 2022 (UTC)

We have HSK levels for Mandarin, cf. Category:Mandarin by difficulty level. 98.170.164.88 17:02, 9 September 2022 (UTC)

The CEFR levels in the EnglishProfile data are indeed assigned by sense, not by headword. What we could perhaps do is add the level of the most common sense to the wikidata. If that's a useful thing to do, I could take it on.--Brett (talk) 21:59, 9 September 2022 (UTC)

Looking forward to it. Thanks. Backinstadiums (talk) 08:17, 10 September 2022 (UTC)

To be clear, I said I'd do this if folks think it's useful to add the level of the most frequent sense to the wikidata.--Brett (talk) 12:49, 12 September 2022 (UTC)

Are our most-frequent-current-sense definitions simple enough to warrant a CEFR level for users, or would this just be useful for contributors who would seek to simplify the relevant definition? DCDuring (talk) 13:37, 12 September 2022 (UTC)

Could you elaborate a bit on your connotations here? Thanks Backinstadiums (talk) 10:07, 13 September 2022 (UTC)

Our definitions of 'A' terms often require 'C' level capability in a user to be understood. So, either we intend for CEFR categorization to be only for educators (and/or definition simplifiers) or we need 'A'-level definitions for 'A'-level users. DCDuring (talk) 11:35, 13 September 2022 (UTC)

Let's do it then! Backinstadiums (talk) 12:52, 15 September 2022 (UTC)

Would adding the CEFR level to the wikidata be something like the grade of a kanji?--Brett (talk) 12:51, 17 September 2022 (UTC)

CEFR is about the semantic depth of vocabulary; I'm not sure how official hanzi/kanji character lists are ranked. Backinstadiums (talk) 09:40, 19 September 2022 (UTC)

@Backinstadiums Specifically with kanji, it refers to the school grade in which the Japanese Ministry of Education prescribes that a given kanji is taught. It’s useful for learners. Hong Kong has something similar. Theknightwho (talk) 15:58, 26 September 2022 (UTC)

Category:English terms spelled with 0 etc.: yes or no?

We currently have 397 terms in Category:English terms spelled with 0, all of which have the category added manually. Module:headword has support for adding categories like this automatically, depending on the value of the standardChars field in Module:languages/data2: if a character isn't listed, a "terms spelled with" category is added. However, digits 0-9 are included in this field for English (as for many other languages, but not all), so the category isn't added automatically. IMO either it should be added automatically or not at all. Which one is correct? Benwing2 (talk) 04:00, 2 September 2022 (UTC)
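To make the standardChars rule concrete, here is a minimal Python sketch of it. This is an illustration only: the real logic is Lua in Module:headword and handles more cases (for example, naming punctuation such as "less-than sign"), and the function name and character sets below are hypothetical simplifications. The point it shows is that any title character missing from standardChars yields a "terms spelled with" category, so keeping 0-9 in English standardChars suppresses the digit categories, and removing them would autopopulate those categories.

```python
from typing import List


def spelled_with_categories(title: str, lang_name: str, standard_chars: str) -> List[str]:
    """Return hypothetical 'terms spelled with X' category names for a page title.

    Simplified model: every character of the title that is not in
    standard_chars (and is not a space) produces one category, once.
    """
    allowed = set(standard_chars) | {" "}  # assumption: spaces never trigger a category
    found: List[str] = []
    for ch in title:
        if ch not in allowed and ch not in found:
            found.append(ch)
    return [f"{lang_name} terms spelled with {ch}" for ch in found]


LETTERS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS = "0123456789"

# Digits listed in standardChars: no digit categories are added.
print(spelled_with_categories("Y2K", "English", LETTERS + DIGITS))  # []
# Digits removed from standardChars: the category is added automatically.
print(spelled_with_categories("Y2K", "English", LETTERS))  # ['English terms spelled with 2']
```

Either way the categorization is exhaustive and computed from the title, which is why manually added categories of this kind are redundant.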

Er, I would say that people should never be adding "terms spelled with X" manually because a bot can do it, and a person can make typos. (I also cry to think about human users wasting their time doing shit that a machine can do.) Equinox ◑ 04:04, 2 September 2022 (UTC)

@Equinox I completely agree; in any case we should remove the manually-added categories. What I'm asking (sorry for not being clear) is whether we should remove digits 0-9 from English standardChars, so that categories like this get autopopulated, or keep them, so that the categories get emptied. Benwing2 (talk) 04:13, 2 September 2022 (UTC)

We need a user survey to find out whether anybody has ever used "English terms spelled with X". I never saw the point of them at all. I know people always complain about anagrams, but I bet loads of people use that, to check Scrabble words and stuff. Who actually goes on the Internet to say "oh, today I want a list of words that contain the digit 7?" Nobody. That's who. John Q Nonexistent does that. Equinox ◑ 04:21, 2 September 2022 (UTC)

I find them interesting if they're weird. Digits are not weird. Theknightwho (talk) 10:10, 2 September 2022 (UTC)

I used to use these when doing things like checking terms spelled with æ or œ that were not categorized as archaic / obsolete, to find ones that needed to be, and I had also wanted to look over "terms spelled with '" for some reason I no longer remember, and was frustrated that category didn't exist because people thought it was too common / boring. But that was me doing cleanup work, I haven't used them in a while, and I know there are other (albeit time-consuming) ways of generating lists of "all German words spelled with x" with AWB if I need to; I don't know if a reader would find a category for "x" or "0" useful. Probably they're only interested in seeing what's spelled with weird characters, as TKW says.
Is it expensive to add these, Lua-wise? Once the module is already checking for ligatures, is it any more expensive to have it also check for 0? If it's cheap, maybe just add 0-9 and just make them hidden categories if we're concerned they'll clutter up the bottom of the page, but if it's expensive, it's probably not worth the bother. - -sche (discuss) 15:56, 2 September 2022 (UTC)

-sche's comment makes me realise that actually this whole picture isn't a category issue (really), it's a search issue. There might not be a way right now to say "find me pages whose titles include the é like in café", but there might as well be, because that is a search problem. It's not something we should waste our time or categories on. Equinox ◑ 16:02, 2 September 2022 (UTC)

MediaWiki has a reasonably good set of search functions, all things told, but it's not the most intuitive. Theknightwho (talk) 16:33, 2 September 2022 (UTC)

intitle:0 insource:"\=English" works. You can't use categories like Category:English lemmas and Category:English non-lemma forms in searches because they're just too big; apparently the search engine loads the entire category before applying the filters. Chuck Entz (talk) 17:05, 2 September 2022 (UTC)

Never mind. It seems to only find "0" in isolation. The intitle: would require a regex, but these tend to time out if not carefully designed. Chuck Entz (talk) 17:19, 2 September 2022 (UTC)

I mean at this point I'm gonna be seen as a troll: but can we find one single human user who wants to find "English terms with a zero in them"? No. Just drop this bollocks. It's obviously autistic nonsense, of no value to anybody. Equinox ◑

I went ahead and deleted all manually specified categories for terms spelled with 0 through 9 and am in the process of doing the same with various ASCII punctuation characters, all of which are in standardChars (especially useless categories like Category:English terms spelled with - and others). Benwing2 (talk) 05:47, 3 September 2022 (UTC)

@Benwing2: Category:Translingual terms spelled with less-than sign is useful, though. I do not agree with this removal. J3133 (talk) 06:17, 3 September 2022 (UTC)

@J3133 Fine, I can put it back. However, < and > are in standardChars; if you think it's useful to have those categories, you should lobby for removal of them from standardChars. I'm not about to manually add that category to all pages with a < or > sign in them. Benwing2 (talk) 06:20, 3 September 2022 (UTC)

I did add the category manually to all pages earlier, as Translingual does not have standardChars. J3133 (talk) 06:24, 3 September 2022 (UTC)

@Benwing2 I see this as another thing that was implemented way too quickly with little discussion. I have found the categories interesting; it's not "autistic nonsense", and Wiktionary as a whole doesn't even really know what most readers do to begin with, nor do they really use Beer Parlour to comment on issues like these. We were barely given more than 24 hours to respond before the change went in. I'd really appreciate it, as I've mentioned before, if changes like these were given more time and discussion before they go in, especially if they affect a bunch of entries. It's weird that things can move extremely fast on one end and then painstakingly slow on another. AG202 (talk) 17:32, 3 September 2022 (UTC)

@AG202 I'm sorry, I just removed the manual categories. I agree maybe I acted too quickly, but I still maintain it's not useful to have manually added categories like this, and they are necessarily incomplete; most of these categories were added long ago, and most terms added in the past few years containing digits were never in the categories. If you think we should have 'terms spelled with 0-9' categories, the correct way is to use the standardChars mechanism and remove 0-9 from the list. I haven't touched any existing standardChars, and I agree it should require consensus over a week or so to do so. Benwing2 (talk) 17:38, 3 September 2022 (UTC)

But, in general, what's the rush to implement this kind of change? Give it a month at least. DCDuring (talk) 13:42, 12 September 2022 (UTC)

To follow up on my earlier comment ("if this is not expensive, then mehhh, why not add it and just hide it?"), I notice for comparison that we have Category:English terms with quotations which contains so many entries of such widely varying quality as to be uninteresting to any normal user, and given that it doesn't catch terms that have quotations not formatted into a template, I'm not even sure how useful it'd be to a technically-minded contributor. (At least "terms spelled with..." can be added exhaustively automatically.) - -sche (discuss) 00:17, 30 September 2022 (UTC)

Some categories may be useful to speed up "insource" (regex) searches in Cirrus search as well as dump processing. DCDuring (talk) 13:48, 30 September 2022 (UTC)

Mongolian terms spelled with ъ and щ

Discussion moved to WT:GP.

Closing RFD discussions using the strength of the arguments

Some hold that RFD discussions should be closed based on the strength of the arguments presented rather than on the "keep" and "delete" post counts. This isn't workable, as per the following.

Imagine a RFD nomination with the rationale "sum of parts". Three additional editors post "Delete as SOP". A keeper posts a keep with an elaborate explanation of why the term is not a sum of parts. A month passes. The keeper closes the discussion with "RFD kept: the arguments for keeping are stronger". The keeper is honestly using their judgment to assess the strength of the arguments; it is the same judgment they used to post the keep in the first place. The number of discussion participants is 5 (1 + 3 + 1), but the number of arguments is 2 since the 3 additional pro-deletion editors did not post any additional arguments. In fact, since only the strength of the arguments should matter and not vote counts, the deleters should not have posted anything since they did not add anything to the arguments or their strength. This is an absurd way to administer a RFD process; it reduces the discussion participants to a mere advisory role, removes all decision-making authority from them and places all the authority in the single closer.

The above also shows that our RFD process has so far been based predominantly on vote counting: 1) participants usually post boldface keeps and deletes; 2) participants add their posts even when that adds nothing to the arguments already presented, serving to increase the number of votes and not number of arguments; 3) the RFD closers sometimes present explicit vote counts as part of closure statements.

It is one thing to allow an occasional vote-count override to handle abuses of the process, it is another thing to vest the closer with argument-assessment powers.

What is Wikipedia doing? Wikipedia's notion of the consensus process is Orwellian and confusing. They pretend to be closing discussions based on the strength of argument, per W:Wikipedia:Consensus: "Consensus is ascertained by the quality of the arguments given on the various sides of an issue, as viewed through the lens of Wikipedia policy." This is unworkable as I have shown above. What happens in practice is some kind of indeterminate mix of vote counting and strength of the arguments. It usually does not happen that a lone dissenter is allowed to close a discussion based on their assessment of the strength of the argument and override the near-unanimity, but such an override is a direct logical consequence of the quoted policy. Their process is Orwellian insofar as they redefine the common word "consensus" to mean something which it does not mean outside of Wikipedia. The actual consensus-based processes in business and other organizations involve discussion where the discussion participants are trying to discuss and exchange arguments until they reach a general agreement (not unanimity), that is, a state in which a supermajority agrees with the outcome. It is good to emphasize the need for arguments rather than bare thoughtless voting but at the end of the day, vote counting is what defines whether there is consensus. Our processes cannot approach the business consensus processes since there is no real-time interaction.

What can be done to improve use of good arguments? The following policy comes to mind:

A RFD post that contains no rationale or only an obviously nonsensical rationale should be stricken out and discounted. "Delete per nom" and "Delete as SOP" count as having rationales, and so does "Keep per Joe Hoe".

This would improve things just a little since many poor rationales are not obviously nonsensical but still poor, but it would allow bare keeps and bare deletes to be discounted. A similar policy could be adopted for formal votes, although from what I remember there would be quite some opposition to that; too many people seem to like bare votes too much. Dan Polansky (talk) 07:13, 3 September 2022 (UTC)

Stop trying to start the same discussion again because you didn’t get your own way last time. Theknightwho (talk) 11:39, 3 September 2022 (UTC)

Last time, the discussion was initiated with a question of numerical threshold. Opposers required flexibility and no firm threshold but most did not say the closer should only consider the strength of the arguments being made. Here I am arguing against a specific principle in a way that was not covered in the previous discussion. I wonder what kind of objections can be raised against the points that I made. I don't think anyone can seriously maintain that the strength of the argument should be the sole factor per the above. --Dan Polansky (talk) 12:59, 3 September 2022 (UTC)

You gave an example of someone with a conflict of interest closing a discussion, as an argument against the principle of closing based on the strength of argument at all. You then made a leap of logic to say that that means we must, therefore, be generally closing based on numbers. This does not follow, because closers can take into account multiple aspects of the discussion.

The point that “strength of argument should be the sole factor” does not entail that we must therefore have a numerical threshold, which would necessarily entail that strength of argument cannot influence the closer at all.

As a side point - please stop endlessly suggesting policy changes that have zero chance of being implemented. There needs to be genuine momentum before a vote happens - and you don’t have it at the moment. Theknightwho (talk) 13:31, 3 September 2022 (UTC)

The modification to remove the alleged conflict of interest is trivial: an editor who supports keeping does not post keep and then closes discussions as they see fit in favor of keeping, exercising undue authority. And the conflict of interest should not really be the problem anyway since having posted keep does not disqualify the editor as a competent judge of the strength of the arguments. It is not anything like conflict of interest in a legal sense.

No one has proposed how to combine the strength of the argument with vote counts; if we had such a proposal, we could send it to a vote, but there is none. I have no idea how to do it. I am all ears. Ideally I would like to see an example closure of some real-world example where both vote counts and argument strength were taken into account. --Dan Polansky (talk) 13:52, 3 September 2022 (UTC)

How about you WT:AGF in the closer? Theknightwho (talk) 14:14, 3 September 2022 (UTC)

I am assuming good faith in the closer. The closer uses their best judgment to assess the strength of the arguments, with the intention of making the dictionary better. Under the strength of argument principle, the closer is under no obligation to consider what others think best. The problem is it gives the closer the sole authority. And again, no one has proposed how to combine the strength of the argument with vote counts, and if someone has a formulation and application example, I am all ears. --Dan Polansky (talk) 14:19, 3 September 2022 (UTC)

But nobody said we should only use the strength of argument criterion. Theknightwho (talk) 14:20, 3 September 2022 (UTC)

The quoted Wikipedia passage suggests so. And if we should use a combination, how? What is an example of a RFD discussion evaluation using a combination of the two principles? Can it be seen somewhere? --Dan Polansky (talk) 14:27, 3 September 2022 (UTC)

This isn’t Wikipedia. An example is the closer considering the numbers and the strength of the arguments made, and explaining if the result isn’t immediately intuitive. Theknightwho (talk) 14:41, 3 September 2022 (UTC)

Where can I find an actual example? Which RFD closure is an example? Is someone else willing to draft a description of the combination of the two principles or work out an example? --Dan Polansky (talk) 15:14, 3 September 2022 (UTC)

Frankly, having now closed a batch of RFD nominations, the idea that I as a closer should read through the discussion and properly think about the strength of the arguments made and have a look at possible evidence, possibly also checking other dictionaries, seems remarkably impractical. Counting votes is sometimes tedious enough. We should be glad that closers want to do the intellectually trivial closure work in RFD instead of requiring them to assess the arguments. --Dan Polansky (talk) 18:38, 3 September 2022 (UTC)

The following combination of vote tallying and strength of argument does not work either: Let the method be that the closer considers the arguments made and then disregards those votes that make or refer to arguments that are weak, only tallying those that are strong. This is not solely a strength of argument consideration since the number of votes still makes a difference for the votes that make a strong argument. The problematic scenario is the same as above: 4 voters post "Delete as SOP", 1 voter posts "Keep" with an elaborate non-SOP argument, and the closer closes the discussion as "RFD-kept: the pro-deletion votes were discounted as having a weak argument". It seems that "weak argument" is too subjective a filter, maybe not really subjective but practically subjective by differing too much between editors. The differences in votes cast are themselves evidence of the differences in the strength of argument assessments between editors. A filter that is more realistic is to remove votes with "no argument" or "obviously nonsensical argument", although one can argue that bare "delete" votes are equivalent to "delete per nom" or "delete as SOP" (SOP is the most common rationale in RFD) so even bare deletes are probably not worth discounting. Bare "keep" votes can be understood as "keep as non-SOP"; the question is whether the keepers have more of a burden of proof than the deleters. No other method of combining strength of argument with vote tallying comes to mind. If someone has an idea, I am very eager to hear it. --Dan Polansky (talk) 10:36, 5 September 2022 (UTC)
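The "more realistic filter" described above (strike posts with no rationale or an obviously nonsensical one, then tally the rest) is mechanical enough to sketch. The following Python is a hypothetical illustration, not an adopted procedure: the Vote structure, the tally function, and the bare_delete_as_per_nom switch are invented names. Whether a rationale is "obviously nonsensical" remains a human judgment, so it is modeled as a flag the closer sets. The switch covers the point that bare deletes may be read as "delete per nom". No pass threshold is assumed; the sketch only produces the counts a closer would then weigh.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Vote:
    stance: str                # "keep" or "delete"
    rationale: str = ""        # empty means a bare vote
    nonsensical: bool = False  # closer's judgment; not something code can decide


def tally(votes: List[Vote], bare_delete_as_per_nom: bool = False) -> Counter:
    """Count keeps and deletes after striking bare or obviously nonsensical votes."""
    counts: Counter = Counter()
    for vote in votes:
        bare = not vote.rationale.strip()
        # Optionally treat a bare delete as "delete per nom" instead of striking it.
        excused = bare_delete_as_per_nom and vote.stance == "delete"
        if vote.nonsensical or (bare and not excused):
            continue  # stricken and discounted
        counts[vote.stance] += 1
    return counts


# The scenario above: four "Delete as SOP" posts and one argued keep,
# plus a bare keep and a bare delete.
votes = [Vote("delete", "SOP")] * 4 + [
    Vote("keep", "not SOP: meaning is not predictable from the parts"),
    Vote("keep"),
    Vote("delete"),
]
strict = tally(votes)
lenient = tally(votes, bare_delete_as_per_nom=True)
print(strict["delete"], strict["keep"])    # 4 1
print(lenient["delete"], lenient["keep"])  # 5 1
```

Note that under this filter the argued keep still counts as one vote against four; nothing in the tally lets a single closer override the count on the basis of argument strength.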

@Dan Polansky, Theknightwho: this is too important an issue to be left unclarified, because it throws all discussions into doubt. I have created a vote at "Wiktionary:Votes/pl-2022-09/Meaning of consensus for discussions other than formal votes created at Wiktionary:Votes". — Sgconlaw (talk) 16:20, 21 September 2022 (UTC)

Making adding sources default

Pretty bold proposal but here goes nothing: What about making it obligatory to add one reference, quote, mention or similar to any entry when creating it?

I see a lot of upsides to this: We will have less clutter on RFV (which would only be needed for rfv-sense, validity verification and WDLs), the reader would always know where we got the entries from, and it's just generally a good idea to double-check any entry you create with the sources we have.

If, for some reason, there are languages out there with so little documentation that we could potentially get a native speaker on the wiki who cannot find any references for words that are in widespread use (so, this word could pass CFI without any references), then we could make a list of such languages and exclude them from this policy, but from my experience such languages are either extremely rare or nonexistent.

Obviously, this proposal doesn't cover reconstructions, which are often justifiably OR, but basic attested languages need to be able to be attested anyway, so why not do it when creating the entry? I feel like I'm overlooking some huge flaw in this (because why wouldn't we have that policy already?), but I can't find it. I'm eager to know your opinions on this. Thadh (talk) 14:36, 3 September 2022 (UTC)

This is something I already practice. If I am adding a word not from a source, I immediately quote it so as to avoid the HEADACHE that is opening Non-English RFV, and even when I am adding a word from a source I check if it shows up in corpora anyway. I think one of the biggest ways to increase the reliability of Wiktionary is to show our work - either from a reference or with quotes. It's more work but there are SOME tools to help with that, like QuietQuinton and reference templates (yes you have to make it, but it's usually not that hard). Vininn126 (talk) 14:45, 3 September 2022 (UTC)

I’m okay with this so long as it includes references. I’d be less okay with it if cites were necessary, due to the LDL issue. Theknightwho (talk) 14:53, 3 September 2022 (UTC)

This isn't an "and" argument, it's an "or" argument. That is, the given language should have ONE of the listed items. Vininn126 (talk) 14:57, 3 September 2022 (UTC)

I’m aware - I was just making my position clear. Theknightwho (talk) 15:03, 3 September 2022 (UTC)

Obviously, a great deal of the words I add I would probably not be able to quote, but I definitely add a reference in that case. Thadh (talk) 14:57, 3 September 2022 (UTC)

In short, this thread comes down to whether you believe in quantity or quality. Vininn126 (talk) 22:59, 3 September 2022 (UTC)

I like the proposal but I also see the requirement as too much of an additional burden, so I don't really know at this point. I try to indicate sources in the edit summary but do not bother to format the quotations since it is such a hassle. If new entries (after some cutoff date) failing to meet the requirement were speedy deleted, we might lose some good contributions. And the reader is not obliged to believe an entry that has no substantiation. --Dan Polansky (talk) 15:18, 3 September 2022 (UTC)

First of all, you really should learn to format sources and start doing it - it's fairly easy with a reference template, and nobody looks at the page's history for references.

Second, I wasn't talking about deleting anything just yet: If we agree to only add referenced entries from now on, that would already amount to quite a lot of good, and we could talk cleanup later.

The third point, however, quite frankly baffles me: The reader doesn't have to trust any of our entries, but we certainly want them to, don't we? Thadh (talk) 15:25, 3 September 2022 (UTC)

I know how but it is laborious. And I have my priorities. We should not trust anything unsubstantiated either. If we require substantiation for all entries, we can have no entries to serve as hypotheses yet to be verified. Sometimes you are better off finding an unsubstantiated hypothesis than finding nothing. And the practicalities are not obviously surmountable: the amount of work that would need to be done to substantiate, say, 20% of our entries would be enormous. We could start by running a bot to add OneLook to all entries for which OneLook has some of the classical dictionaries (not all of OneLook does that); that alone would increase the volume of substantiation hugely, without the need for a policy. But someone has to run the bot and design it: the bot has to parse the OneLook page to see which dictionaries are there. This could be further done for {{R:GNV}}. For other languages, bots could be adding references for the reference templates that we have collected; that again would hugely increase the level of substantiation without any policy change, but again requires a bot designer and operator. This would be a start. To do all this manually is just an unrealistic effort. And it does not require any policy change. Once that would be done, we could determine how many entries remain without substantiation and that would tell us how much more manual effort is required and whether a policy like the one proposed is worth it. --Dan Polansky (talk) 15:35, 3 September 2022 (UTC)

I'm inclined to disagree. Sometimes I have this phase where I churn out a ton of German compound entries in a short amount of time for words that obviously exist and that nobody would seriously doubt the existence of (because proof is one Google search away, although they're not always found in Duden and the like). I can't help but see this proposal as incurring a lot of unnecessary extra work which would only lead to me being able to create fewer entries in the same amount of time (which clearly outweighs the positives IMO). To give an example, I can create entries such as ganzstündig in less than a minute, but finding a reference or a quotation can itself already take a minute. I assume that Surjection, whom I regularly see creating similar articles for Finnish compounds, would feel the same way.

Further, when (or rather, IF) I finally get around to documenting Alemannic, it would be annoying having to find sources for the most basic of words (seeing that we don't even have Alemannic German chaufe or hebe). The only really good reference (Schweizerisches Idiotikon) is rather unwieldy to even just read. This could however be avoided if this proposal doesn't apply to terms that pass WT:CFI by the "clearly widespread use" clause. — Fytcha〈 T | L | C 〉 15:42, 3 September 2022 (UTC)

For German, GNV works as well, so GNV could work as a minimal substantiation of existence of a form, and could be added by a bot. But GNV only covers a couple of languages, so the substance of the above stays valid. --Dan Polansky (talk) 15:43, 3 September 2022 (UTC)

@Fytcha: See the third point of my original post regarding Alemannic. On ganzstündig, you seem to have added a quote, so I don't really see an issue; do we really want to prioritise speed over quality? I personally think a user would rather see one translation he can trust than three he cannot. Moreover, I'm not saying you need to add some kind of perfect quote to illustrate everything, just one simple (even untranslated) quote that fits the CFI would be fine by me. Thadh (talk) 15:54, 3 September 2022 (UTC)

I think it's important to separate the issue from CFI, too. While one route to meeting CFI is that a term must be in clear, widespread use, that could still apply to terms for which a source has been added but not a citation. It's relatively trivial to add dictionary source templates: at most you might need to specify a page number, or some code for the URL to work. It's pretty unlikely that a common German term won't be in Duden. Theknightwho (talk) 16:04, 3 September 2022 (UTC)

"On ganzstündig, you seem to have added a quote, so I don't really see an issue": Right, that was not the ideal example to provide (Islamologe is a better one; there is no Duden or DWDS entry for it either), but I think the point I was making is still clear: ganzstündig (minus the quote) took under a minute to create, but adding a quote can take that same amount of time by itself (or even more). So in essence, the proposal drastically reduces my number of entries per unit of time without really reducing my number of mistakes per unit of time (though it does lend more credence to the contents of the entries, I'll give you that).

"I personally think a user would rather see one translation he can trust than three he cannot": This is likely the point where our opinions diverge. I'd personally take "Fytcha's 30k-word dictionary without quotes" over "Fytcha's 10k-word dictionary with a first-page quotation from Google Books", but this opinion is entirely informed by my personal use case for dictionaries (i.e. using them to understand text, where the text's context already serves as a sanity check, and using multiple dictionaries in conjunction). I can totally understand, though, why somebody would prefer the latter. — Fytcha〈 T | L | C 〉 16:18, 3 September 2022 (UTC)

Would you also accept something I've been doing with first attestations on Polish terms, à la akcesoryjny? It's as good as a quote. (Ignore the fact that it has other citations; I mean, imagine an entry with ONLY that.) Vininn126 (talk) 16:21, 3 September 2022 (UTC)

We don't prioritize speed over quality. The references section at the end of a page does not, as a rule, contribute to its quality; especially if you make it mandatory, it is only a fig leaf. It may even distract from creating quality, which is why one sometimes doesn't even bother with references, which might add too little or even cause more confusion. And as Fytcha hinted, in the area of compounds there are more common German terms than other dictionaries are willing to include. One could just parse a legal commentary and get a list of thousands of such words that are familiar to jurists, at least in context, but are not included by Duden and its competitors, since those dictionaries are too terse. Even in linguistics a lot of terms have slipped past the online references, as I noticed when, in 2022, I had to create words like potamonym, ichthyonym and dendronym, which you will have no difficulty searching for. So it is with every science, and the bar to casually adding such terms should not be raised, for example while sitting in the library actually doing something else, when you shouldn't be browsing Wiktionary. Fay Freak (talk) 17:02, 3 September 2022 (UTC)

I don't understand your issue: if you found this term in a book, you can just quote the book, can't you? You'd have to verify the term if it were sent to RFV anyway, wouldn't you? Thadh (talk) 19:25, 3 September 2022 (UTC)

One of the issues is that quoting is a faff; there's a reason why we automate references when possible. I also use plenty of jargon at work that is difficult to find citations for, particularly when it's a niche use of an otherwise common word. Theknightwho (talk) 20:56, 3 September 2022 (UTC)

But how often does that actually happen? Is there really NO way of finding at least one citation/quotation? Vininn126 (talk) 21:01, 3 September 2022 (UTC)

I didn't say it's not possible - I said it's a faff. Theknightwho (talk) 17:37, 4 September 2022 (UTC)

I agree with Fytcha that this might take up too much time when creating new entries. There is more fuss involved for Chinese, beyond copy-pasting and formatting quotations: the words have to be manually spaced and formatted for {{zh-x}}, and the auto-romanization has to be checked for correctness (which it often lacks, so I have to go through even more steps); this part alone takes at least double, perhaps triple, the time spent writing the entry itself. Some entries are even more tedious, such as hurt#Chinese: without the quote it might have taken less than two minutes, but with the quote it took me at least 15 minutes. I had to manually transcribe what was said in the film (which differs from the subtitles, since subtitles are always in standard written Chinese rather than Cantonese, and the audio quality isn't that good) and repeat the above process, not to mention actually finding the source, which also takes a lot of time.

Nevertheless, I do see the benefits of requiring sources when creating new entries, which might take less time than an RFV, but I don't think it is worth it for every single entry, perhaps only for those likely to be nominated for RFV. Instead, I would suggest that this be voluntary and recommended, but not mandatory. – Wpi31 (talk) 17:47, 3 September 2022 (UTC)

This isn't directly an impediment to the proposal, and I'm wary of saying it at all because of w:WP:BEANS, but you may recall that a few months ago a user added a word with fake citations, which took a while to uncover. (And fake cites can be hard to uncover, since not all books are digitized, so being unable to find a cite in Google Books doesn't necessarily mean it's not a real cite; someone found a real cite of Thing that wasn't findable online by happening to be reading the book at the time¹.) Right now, because people can add words without needing to add cites, the pressure to fake cites is low(er), and most cites are added by contributors who've been around a while, who are presumably less likely to add fakes. If we require every new user who wants to add a word to also include a cite, not only are we likely to get a lot of crappy/unusable cites of random webpages (though what is so crappy as to be unusable is less clear these days), but the pressure/incentive to add fake cites goes up.
I'm also not sure the benefit is worth the extra burden it puts on contributors. I generally add cites whenever I'm adding an obscure word or one likely to be challenged, and I often add them in other cases too, but having to always add one, even for common and obvious words, would be more tedious. Meh. - -sche (discuss) 20:39, 3 September 2022 (UTC)

Sure, I get it. I'll repeat, though: this feels like a different issue, one that I would love to hash out. It's just not part of this discussion. Vininn126 (talk) 20:42, 3 September 2022 (UTC)

Based on all the other feedback, my opinion is solidifying into a firmer "oppose". This adds a burden on good-faith contributors; it doesn't impede inept or bad-faith contributors, who we already see simply pasting whatever reference templates were on the page they copied as a model into their new entry, without checking whether the reference contains the word they're creating; and it makes it harder for both editors and readers to spot entries that are suspect or need improvement, because it gives everything the veneer of being referenced whether it is or not. "Add a cite or reference when adding an entry" as an ideal for regular editors to aspire to? Sure. But as a rule required in all cases? No. - -sche (discuss) 19:30, 4 September 2022 (UTC)

I'll probably be repeating what most people have already said, but still, I disagree. I enjoy references and quotes very much, and I add them wherever I can. Taking time to make an entry, prioritizing quality over quantity, is a good idea, I believe, though I read that some prefer being fast. What I most certainly believe is a bad idea, though, is forcing people to put refs at the end of a page, because, since reference templates exist, you can just slap some of them onto your page and pretend you did your research. Compare uni#Italian, which contains literally all of the Italian ref templates, and not a single one of them actually links to the intended word. That's an extreme example, but it is very common to see ref templates linking to nowhere, and even more common to see ref templates linking somewhere that gives far more information than we are displaying, clearly showing that the presumed source hasn't actually been used as a reference. I only add ref templates if I actually got the information from there; if not, they belong under Further reading. When I see a refless page, I immediately understand 'Oh, this needs work', while if every page has refs, I would need to check them every time to see whether they're serious or not. This doesn't seem like it will improve RFV practices, and (maybe an exaggeration) it might even slow them down. Anyone could still make up any (e.g.) Italian word they want; nothing is stopping them. There's just the additional step of having to type {{R:it:Trec}} at the end. Catonif (talk) 22:09, 3 September 2022 (UTC)

I think this is a deeper issue with a LOT of nuance that should be discussed. Vininn126 (talk) 22:34, 3 September 2022 (UTC)

It's an issue that will grow tenfold if this goes through. Also I wouldn't want it to be hard for newcomers to make a page. Catonif (talk) 10:49, 4 September 2022 (UTC)

I agree with your original point, but I definitely think that it's a good idea to teach newcomers to use references from the start rather than have them make hundreds of stub entries that nobody can use. Thadh (talk) 12:28, 4 September 2022 (UTC)

Just on your point about sources that contain a lot more information than we're showing: I'm guilty of this on occasion, but it doesn't necessarily mean the source wasn't used. When adding Mongolian terms, sometimes I really don't want to spend the time adding 12 senses (10 of which are very niche), when the important thing for the language at the moment is to get the main senses down. Theknightwho (talk) 17:47, 4 September 2022 (UTC)

So the priority is to churn out as much as possible while ignoring the corner cases? I'm not sure I agree with that. Vininn126 (talk) 17:56, 4 September 2022 (UTC)

No. The priority is to, well, prioritise. I would much rather a smaller language have wider coverage that made it somewhat usable than deep coverage of a much smaller number of lemmas. Theknightwho (talk) 03:42, 5 September 2022 (UTC)

@Theknightwho: You're right, it doesn't clearly show that the source wasn't used, it just hints towards it. And you're also right that it is not always the priority to list every possible sense. Catonif (talk) 12:20, 5 September 2022 (UTC)

I would support this move with a few caveats: we'd need a list for each (major) language where folks can easily find quotes and references (I feel some languages already have this), and we'd also need more active editors in general. I try to do this as much as I can with words like ᄒᆞ다 (hawda), but as seen with that entry, it takes time. (And there's also the culture issue, which is a separate but related topic.) AG202 (talk) 23:20, 3 September 2022 (UTC)

This is something I've been thinking about. Many print dictionaries have bibliographies out of necessity. We do at times as well. While I don't think we should create templates for every journal/book, we should at least make them for the more prevalent ones. We could have a page where we save them, perhaps the language considerations page or a separate one. Vininn126 (talk) 23:25, 3 September 2022 (UTC)

Such templates are often saved in categories such as "<lang> quotation templates" and, especially for dictionaries, "<lang> reference templates". In principle there's also the system of Module:Quotations, but I'm not having much joy in making it work for me with collective works such as translations of the Bible. (The Bible should have the advantage of there being a public domain English translation, without relying on the USA's legalisation of piracy of the Authorised Version.) --RichardW57m (talk) 13:49, 5 September 2022 (UTC)

I oppose this proposal because I think it will be an undue burden on new users, who often have a hard enough time using the site as it is, and also on highly active users who would not find it difficult, but might find it tiresome. —Soap— 17:54, 4 September 2022 (UTC)

The problem is that most people are not lexicographers, so the premise of the project is already very exclusionary. I think trying to invite people to edit who don't even know the difference between prescriptivism and descriptivism is a very bad idea. Vininn126 (talk) 17:59, 4 September 2022 (UTC)

"I think it will be an undue burden on new users": I agree with this. You've got to give them a taste of the action, even if they can only do limited work. For instance, I don't want to discourage someone from making a slightly malformed page like Xiahuayuan, which I can then fix (see the edit history). Also, there are words I'm only vaguely familiar with that are indeed words, such as Chinyang. I'm not in a position to do full cites on it or assess the one cite I did put on there. --Geographyinitiative (talk) 12:53, 5 September 2022 (UTC)

I oppose making the inclusion of a reference or citation mandatory based on the points made above about it providing additional burden for careful good-faith editors without posing much of an obstacle for bad-faith or careless editors who can simply misuse reference templates to create fake reference links.--Urszag (talk) 22:26, 4 September 2022 (UTC)

It would run counter to my practice when adding synchronically derivable Pali words. (Pali is an LDL.) I will furnish a quotation for the word I am intending to add, but for the words from which it is derived I do not go to the trouble of finding a quotation. Usually, in principle, I can find references to the word in texts from the Pali Text Society dictionary, but different citable editions have different numbering systems, and then providing a translation without breaching copyright is another major effort. If furnishing a quotation becomes necessary, I will be strongly tempted simply to omit the translation. Another solution would be to simply leave the immediate source(s) of the word as red links, or even as default blue (optionally orange for logged-in users) misdirections. I don't think these labour-saving tricks will improve my contributions. As I recall, dictionaries are not admissible evidence for meanings. --RichardW57m (talk) 11:07, 5 September 2022 (UTC)

By the way, unrelated to this, but dictionaries are admissible for LDLs, as long as the community of editors in that language agrees that they are: "the community of editors for that language should maintain a list of materials deemed appropriate as the only sources for entries based on a single mention," (WT:CFI#Number of citations). In practice, this list is often not written down and just agreed upon. Thadh (talk) 12:53, 5 September 2022 (UTC)

What's a 'community of editors' in a language? I see no active mechanism for contacting or joining such a community. (Although {{wgping}} seems set up for such a function, it's not actively used.) --RichardW57m (talk) 13:25, 5 September 2022 (UTC)

Community of editors is just a fancy name for "people that edit the language". There's no way to officially define it yet. Thadh (talk) 13:25, 5 September 2022 (UTC)

It would make sense to require some evidence for the existence of a word. Perhaps we could accept a dictionary entry, even though it might not be valid for an RfV challenge. --RichardW57m (talk) 11:07, 5 September 2022 (UTC)

That's what my original proposal was (hence "reference, quote or mention"). But it seems even that might be a problem for many editors. Thadh (talk) 12:47, 5 September 2022 (UTC)

OK, that's tolerable. --RichardW57m (talk) 13:26, 5 September 2022 (UTC)

For inflected forms etc., would a link back to the lemma suffice? I have three cases in mind that are particularly worthy of inclusion:

Inflected forms in headwords of lemmas. For example, in Welsh wyth ar ddeugain, ddeugain is the soft mutation of deugain, and I would rather say that the lemma derives from wyth, ar and deugain. It seems superfluous to supply a quotation etc. for ddeugain, which is already linked to from deugain and links back to it in its definition.
Alternative citation forms. For Pali nouns and adjectives, we make the stem the lemma, but alternative traditions use the nominative singular (masculine) or, in the case of nouns in -tar, the genitive/dative singular in -tu.
Homographs of other terms.

For languages with multiple writing systems, there are also the subsidiary lemmas that are homographs of other terms. For these, it should suffice, for creating though perhaps not for retaining, to have a link to (and preferably back) from the main lemma. (There may also be language-specific restrictions.)

I would describe all these terms as subsidiary forms. --RichardW57 (talk) 03:12, 7 September 2022 (UTC)

I have always understood that the existence of inflected forms is to be taken as evidence for a lemma. Vininn126 (talk) 08:19, 7 September 2022 (UTC)

This discussion was just for lemmas, since inflected forms don't need to be verifiable if they are regular. Thadh (talk) 08:44, 7 September 2022 (UTC)

I can't see the restriction to lemmas. Do we even have a mechanism for challenging irregular forms until they have their own entries? Even systems of alleged regular inflections may need challenging - Pali grammars conflict on the rarer parts of the system. Welsh plurals, Arabic masculine plurals and Latin 3rd conjugation perfects notoriously lack regular forms. --RichardW57m (talk) 11:35, 8 September 2022 (UTC)

I would be happy to give this proposal an unconditional support in a modified form: 1) If an authoritative reference exists, it shall be provided. This is easy to do, not laborious at all. 2) If the form is a fairly transparent closed compound that is in Google Ngram Viewer, GNV can be provided and this is sufficient. 3) If only attesting quotations in use exist to support the entry, they can be represented in an abbreviated form using some template, which only provides the author and the year or the title and the year. This would not be too laborious to enter, and would show the reader what kind of evidence we have used. It would be additional work, but not too much. The abbreviated forms would then later be expanded by whoever finds it worthwhile. --Dan Polansky (talk) 08:37, 7 September 2022 (UTC)

I would be fine with that. Again, the whole point of this proposal was to give our readers some - however small - proof that we didn't pull the entry out of our arses and to give them anything to go by if they want to verify the entry. Thadh (talk) 08:45, 7 September 2022 (UTC)

I see your purpose and support it, as long as it does not become too laborious. If the above or something similar gets approved, this will have been a very good initiative. People objected on the grounds that editors will find it easy to fake evidence, and I do not have a good response to that; when one only gives the author and the year, it is much harder to search than with an example sentence. One thing is for sure: if there were a community-approved template via which I could provide authors and years without quoting the passage, I would be happy to use the template without being forced to do so. If these concerns prevail, the policy could use the language "Editors are encouraged to do X", and that would still be an improvement over what we have, since I would then be able to "encourage" editors on their talk page without being accused of impropriety. There would be a template for me to substitute on the editor's talk page, containing a refined, polite request encouraging them to provide sources. As weak as it may seem, it would be progress. --Dan Polansky (talk) 09:08, 7 September 2022 (UTC)

There is {{rfquotek}} serving a similar purpose, but it does not allow giving the year, and it expects the author to be a key into a dictionary of Webster 1913 authors, which we do not want. I don't know what the "k" at the end stands for. The template could be called {{quote-abbr}} and be used like {{quote-abbr|en|Jeremy Bentham|1890}}, {{quote-abbr|en|Jeremy Bentham|1890|author2=J. S. Mill}} and {{quote-abbr|en|1890|title=Treatise Concerning Things of Great Utility}}. Other names that come to mind are {{quote-stub}} and {{quote-incompl}}. And of course, we could follow the recent trend of short template names and call it {{qa}}, {{qs}} or {{qi}}. --Dan Polansky (talk) 10:02, 7 September 2022 (UTC)

Ease of faking: the proposed evidence is very easy to fake, but Wikipedia's references to printed works do not fare much better: most readers will not have access to the referenced printed works and will not be able to verify that the statement traced to an inline reference is really supported by the source. And even if they have the source, there is often no quoted passage or even a page number, so it is laborious to conclusively show that a given statement is not supported by the given reference. We should require {{qa}} to point to something that is online, or else it will be hard to remove when an online search finds nothing promising. --Dan Polansky (talk) 10:14, 7 September 2022 (UTC)

I think restricting us to online sources is a bad idea. Vininn126 (talk) 10:19, 7 September 2022 (UTC)

Restricting {{qa}}, not us in general, since it provides so little identification. {{quote-book}} can use offline sources. But if people want to allow offline sources for {{qa}}, I will not oppose because of that, I just think it unwise. --Dan Polansky (talk) 10:23, 7 September 2022 (UTC)

Is this intended to apply only to entries (lemma L2 headers, actually), rather than to each etymology and each definition? In English, increasingly, marginal and spurious definitions for well-attested words are being added. Nothing short of attestation really addresses this problem in a readily monitorable way. DCDuring (talk) 13:50, 7 September 2022 (UTC)

Yes, this was intended for when one is creating an L2 lemma entry. It does seem like a good idea, however, to add a quote to strange definitions that most people wouldn't know, regardless. Thadh (talk) 14:19, 7 September 2022 (UTC)

Is the proposal that an attesting quote be mandatory for each added definition? Would a footnote to a reference be sufficient? DCDuring (talk) 15:09, 7 September 2022 (UTC)

Again, the original proposal was adding either a quote to one definition or a reference to the entire entry. Is the "good idea" part of the current version of the proposal? Thadh (talk) 15:14, 7 September 2022 (UTC)

The original proposal was for an entry. Whether it was subsequently modified, I could not tell since the discussion is TLDR. The word definition only came up in these last few comments. DCDuring (talk) 16:14, 7 September 2022 (UTC)

Yeah, the word definition thing was just me saying it's generally a good idea to add sources to definitions as well, if it might be difficult for readers to find any verification. Thadh (talk) 16:20, 7 September 2022 (UTC)

Belated two cents --

I am not happy with the idea that an entry would require inclusion of a reference right from the get-go, as a necessary component of entry creation. As others have noted above, gathering and formatting references can be laborious. This process is somewhat similar in my mind to the process of gathering and formatting quotations. Some of our editors are very adept at that process and seem to really enjoy it, as evidenced by their participation in RFV threads.

I am very happy with the idea that an entry should have references as a general matter of style and entry structure.

‑‑ Eiríkr Útlendi │ Tala við mig 16:56, 7 September 2022 (UTC)

Though I sometimes enjoy the hunt for quotes, I usually do not. I view it more as a duty, which I sometimes neglect. If more contributors viewed it as a duty, this mandate would probably not have been proposed. DCDuring (talk) 17:26, 7 September 2022 (UTC)

Personally it would be a lot easier if I didn't have to provide a translation, but in general I can't really get behind that policy in the long run. Vininn126 (talk) 18:02, 7 September 2022 (UTC)

What do you mean by "that policy"?

It would be easier for you, but it does defeat the purpose of helping those who know more English than the language of the passage to be translated. Helping such people is, after all, the justification for what we do here, isn't it? DCDuring (talk) 23:35, 7 September 2022 (UTC)

"That policy" of not adding translations. And the reasoning you provided is exactly why I can't get behind it. Vininn126 (talk) 07:47, 8 September 2022 (UTC)

@Vininn126: Are you saying that you don't think our purpose is to help "those who know more English than the language of the passage to be translated"? DCDuring (talk) 14:15, 8 September 2022 (UTC)

What? I was saying that "while it would be much easier to just not translate, I believe we should". Vininn126 (talk) 14:16, 8 September 2022 (UTC)

Just view a quotation without a translation as a lot better than nothing. During the course of the next thirty years or so, it is likely that someone will add the translation, or replace the quotation with a better or more translatable one. --RichardW57m (talk) 10:50, 8 September 2022 (UTC)

I've been simulating this by providing the first attestation as a citation without a translation. Vininn126 (talk) 11:04, 8 September 2022 (UTC)

Main space vs. other namespace contributions

I would like us to introduce a rule by which editors would be required to keep the ratio between their contributions to the main space and their total contributions at a certain level in order to be able to post in other namespaces. (What do I mean by "post"? I'd say asking questions about words is fine, but presuming to give one's opinion and have an impact on our policies without doing much useful work oneself isn't right.)

This would drive away "policy makers" and other prattlers who don't actively engage in expanding our coverage and actually improving the dictionary.

I already see an obvious way of gaming the system: make a few cosmetic edits to main space entries in a row, and there you are, your ratio is maintained and you can keep blathering away on our various talk pages. That's why the main space contributions would have to be substantial: for example, I would suggest taking into account the number of entries created for words that unobjectionably belong here. (By "unobjectionable", I mean "core vocabulary that no person in their right mind would ever think of excluding".)

Disclaimer: I readily acknowledge creating new entries about basic words is not the only way of doing useful work here, but I think it's a pretty good metric/indicator. Maybe not a necessary condition, but certainly a sufficient one.

P U C – 21:18, 3 September 2022 (UTC)

@PUC So, if a certain Thai user creates enough bogus verlan entries, we should give them preference over someone who works in difficult languages that require extensive research for every edit? Chuck Entz (talk) 22:09, 3 September 2022 (UTC)

@Chuck Entz: I spoke of unobjectionable entries above, and a bogus entry is hardly that, so no. I'll admit that "unobjectionable" is left undefined, but everybody will agree that unobjectionable entries do exist (monomorphemic words such as dog would be a good start).

I'm aware such a system would require a good deal of discussion and flexibility to ensure that it's fair and doesn't exclude contributors who do deserve to have their seat at the table from discussion pages. I still think it could be an improvement on the current state of affairs. P U C – 22:22, 3 September 2022 (UTC)

@Chuck Entz I think this is a(nother) situation where good faith is called for. I'd subject a rule like this to a common-sense test. Theknightwho (talk) 22:23, 3 September 2022 (UTC)

I was exaggerating to make my point. The truth is that there's a certain type of user who systematically creates entries for everything that doesn't get out of the way fast enough, and does it in huge volumes. Some of them manage to avoid obvious mistakes, in spite of knowing nothing on the subject matter. There's definitely a place for such editors, but the volume of their edits doesn't make them any more worthy of participating in discussions. I would contend that some people are too focused on edit counts alone, and I don't want to encourage it.

I also have trouble coming up with examples of more than a couple of main-space-slacking forum hogs- some of the most annoying recent discussions have involved people with substantial mainspace contributions, and there are lots of subject-matter experts we call on whose contributions are mostly elsewhere.

The main problem, though, is that a test like you're proposing requires someone to look through lots and lots of edits, which strikes me as a waste of time. Chuck Entz (talk) 22:52, 3 September 2022 (UTC)

In short, we need to be able to look at an editor's 1) reasoning and 2) knowledgeability, and from THAT, as a community, be able to assign weight to arguments. It's very... unscientific/imprecise, unfortunately. Vininn126 (talk) 23:03, 3 September 2022 (UTC)

You know, I agree, even as someone who does exactly that (i.e. 1) "systematically creat[ing] entries for everything that doesn't get out of the way fast enough" (me creating dozens of Armenian entries even though I don't speak a word of it) and 2) "[being] too focused on edit counts alone" (me obsessing over the number of entries I've created).) P U C – 23:12, 3 September 2022 (UTC)

@PUC: Hi, I'm new and unfamiliar with WMF site practices (although not with editing in general), so I don't know if this counts as a coldpost or necropost, but I suppose you and Chuck Entz could sorta have harmony if you were to create a (Draft) namespace here, as I believe English Wikipedia already has. That way, looking through lots of edits wouldn't feel like such a waste of time. I also suppose you would have fewer cases of non-experts making dozens of mistaken edits, since now you could just move any potentially mistaken entries (and maybe some others, like dubious "new-L2" section additions?) into this Draft namespace so that they could get reviewed by actual experts.
(Personally, I would also like English Wiktionary to have a Long-Term-Abuse list like Japanese Wiktionary does, along with an ArbCom section, but that's a matter for a different time.) Roland Ellis (talk) 21:32, 28 September 2022 (UTC)

We are often answering questions outside mainspace, so in general much of what is outside mainspace is preparatory for work in the mainspace. Some editors have also been scolded for not asking before implementing changes, and their useful edits to modules would not count towards the mainspace ratio either. "Forced" is a strong word; I think PUC is writing satire here in response to something above, or making a witty remark about something desirable that is not possible, drawing attention to an ideal or to its distinctness from the feasible. Fay Freak (talk) 22:39, 3 September 2022 (UTC)

I appreciate that the gripe here is that some editors put in more useful work than others and that we should recognize them. (I want to die after my work at ackja.) I am unsure what the best method for this is. Currently we recognize certain users in a very unofficial way. This also ties in with the above BP discussion about FRD and the weight of arguments. I think it will take a feat of genius to think up a way to "weigh/weight" certain editors' opinions over others'. Vininn126 (talk) 22:45, 3 September 2022 (UTC)

You probably mean akcja :-p P U C – 23:00, 3 September 2022 (UTC)

I spent more than 4 hours on that; I've earned my typo. Vininn126 (talk) 23:02, 3 September 2022 (UTC)

Ideally, this would be a cultural norm rather than a hard-and-fast rule, since it requires judgement to enforce the spirit of it, whereas hard-and-fast rules can be gamed. But cultural norms are harder to enforce than rules, and Wikipedia does in practice get observable value out of a hard-and-fast protection setting that keeps certain pages, and even talk pages, from being edited by people with fewer than 500 edits / 30 days of activity, so I'm not saying a rule would be useless. So far, though, we seem more often to have specific problematic editors (who could be blocked) than the sorts of organized harassment campaigns Wikipedia has needed to protect pages against. - -sche (discuss) 00:14, 4 September 2022 (UTC)

I remember once seeing a user blocked on Wikipedia for using the site as a social network. He was quite young and perhaps simply didn't have a lot to offer in basic content editing. Also from Wikipedia comes the phrase "voting-only account", for people with little interest in mainspace but so many strong opinions that they were comparable to vandals. I would support having this type of block as an option here too, but I hope it would be very rarely used and that we would not need to measure a person's behavior in terms of numbers. —Soap— 21:51, 4 September 2022 (UTC)

Any attempt to quantify this seems doomed to failure without a great deal of effort, and it does not seem worth that effort. I hope we can manage to find good reason, acceptable to a supermajority here and defensible to outsiders, to block someone who seems to violate vague behavioral norms without having to legislate. DCDuring (talk) 01:11, 5 September 2022 (UTC)

I am probably guilty of recently discussing a lot without having made 500 mainspace edits during the last 30 days, but I would have thought all my previous contributions to the mainspace, the thesaurus and elsewhere count for something. If the unspoken cultural norm is that previous contributions do not count and that a ratio has to be maintained on a floating 30-day basis or something, please let me know, and an informal guideline can be adopted to that effect, without being mathematically enforceable. It also seems to me that the OP is discounting the value of policy work; many are able and willing to do mainspace work, but not all that many are able and willing to make passable policy proposals and back them with sound reasoning and evidence. I am one of the few people who have had some success at designing votes and policy changes that passed, including WT:THUB; more are at User:Dan Polansky/Votes created. I also think that my participation in RFD is valuable: unlike many others, I always try to provide specific reasoning, maybe too much for the taste of some; RFD could be administered more speedily if more people participated in it, and I think people should be encouraged to participate more in RFD discussions. --Dan Polansky (talk) 06:46, 7 September 2022 (UTC)

`{{surf}}` shouldn't categorize

It's misleading for e.g. subjugation to be included in Category:English terms suffixed with -ion. It's borrowed from Latin and does not come from *subjugate + -ion. I'm not opposed to including surface analyses, but I thought we wanted to distinguish affixed words from words that start or end with a certain string of letters; otherwise the categories are just wrong. Ultimateria (talk) 21:09, 4 September 2022 (UTC)

@Ultimateria: It is an English word containing the suffix -ion (unless you want to separate out words with the suffix -ation). You seem to be suggesting a separate category for words containing said morphemes. --RichardW57m (talk) 11:22, 5 September 2022 (UTC)

@RichardW57m: I don't think subjugation should be in any category for ending in -ion or -ation, because it was not suffixed in English. It doesn't "contain the suffix -ion", it just ends with those letters. Ultimateria (talk) 20:41, 5 September 2022 (UTC)

@Rua recategorised {{suffix}} from the etymology templates to the morphology templates on 24 July 2014. Any sane morphemic analysis of the word will find one of those suffixes in subjugation, whereas neither should be found in cation. Additionally, note that subjugation can be, and I'm sure often is, regenerated from subjugate in English. --RichardW57m (talk) 09:40, 6 September 2022 (UTC)

One place where the categorization might be useful is with internationalisms like genetyka. Vininn126 (talk) 12:59, 5 September 2022 (UTC)

@Vininn126: Hmm, that seems fine to me despite the derivation being unclear. But I don't think the majority of uses come from that situation. Also, more broadly speaking, I acknowledge there's a gray area with terms derived from modern languages. One could argue that Spanish campeón, which is borrowed from Italian campione, is suffixed with -ón because the endings are analogous but distinct. In these cases I'm willing to leave the categories untouched. Ultimateria (talk) 20:41, 5 September 2022 (UTC)

Thadh, Surjection, and I had a big discussion about similar cases where the borrowing is somehow adapted. Sadly, if we want that nuance, it might be more difficult to make sweeping changes. Vininn126 (talk) 20:44, 5 September 2022 (UTC)

I agree with Ultimateria. I also want to draw attention to the fact that many entries are manually categorized into "terms suffixed with -X" when they merely end in -X without being derived using -X. I think this is wrong, but judging by how widespread this practice is, many editors seem to disagree. — Fytcha〈 T | L | C 〉 16:03, 6 September 2022 (UTC)

I agree that not every word beginning with a sequence should be categorized; however, there are cases where it's useful. Another example (along with my one above) would be rocznik. The categorization is useful for uncertain situations. Vininn126 (talk) 16:07, 6 September 2022 (UTC)

I think it should categorize, especially for terms inherited rather than borrowed from an ancestor term, which is not the case for subjugation. Morphologically, the suffix is there even if it was not the method of production. One may claim the term has inherited the suffix, and there will be a corresponding ancestor suffix in the ancestor term. Whether borrowed terms are a different case is not so clear. English -ion is descended from Latin -io, so here it fits. The -ion entry contains two definition lines marked "non-productive", and this is exactly what we are talking about. If we believe the -ion entry that the suffix is never productive, it would follow that CAT:English terms suffixed with -ion would be empty or near-empty. -ion is defined in Merriam-Webster[1]. The surface analysis is not based merely on the presence of a substring; it is a morphological analysis. --Dan Polansky (talk) 15:54, 7 September 2022 (UTC)

I was wondering, @Ultimateria, Dan Polansky, Mahagaja, whether the usual template {{af}} could give both categories, e.g.:

Cat:Greek words (terms) prefixed with xxx- & Cat:Xxx terms formed/created? with prefix xxx-

Cat:Xxx terms combined with xxx & Cat:Xxx terms formed with xxx (do we have this for combining forms?)

Cat:Xxx terms suffixed with -xxx (instead of Cat:Xxx terms ending in -xxx) & Cat:Xxx terms formed with suffix -xxx

and whether the template {{surf}} might give only the first kind of category. I was concerned about this distinction between word formation and surface analysis in Greek, where there are so many affixes that are strictly ancient but also appear in all descendants. Thank you ‑‑Sarri.greek ♫ I 16:53, 10 September 2022 (UTC)

RFD header - abandoning text implying plain majority

Please comment on the following change of Wiktionary:Requests for deletion/Header:

Old:

If there is sufficient discussion, but a decision cannot be reached because editors are evenly split between two options, the request can be closed as “no consensus”, in which case the status quo is maintained.

New:

If there is sufficient discussion, but a decision cannot be reached because there is no consensus, the request should be closed as “no consensus”, in which case the status quo is maintained.

I highlighted the changed parts in boldface. The problem with the old text is that it implies that a plain majority suffices as consensus, which is not what the header said for over a decade, which was not our practice, and which I hope still isn't. Furthermore, the old text implies that editor votes should be tallied, which multiple editors opposed in previous discussions, requiring instead that strength of argument play a role. The new text approximately matches what was in the header for over a decade, and it does not prejudge what "consensus" means in any way, merely stating that it is required. That is only fair and takes no sides in the unresolved debate about how to close RFD nominations. The change from "can" to "should" is minor and should be obvious. Thank you. Dan Polansky (talk) 10:20, 5 September 2022 (UTC)

Largish SoP Numerals

If a multi-word phrase is only used to denote a number larger than 100, and cannot appeal to WT:COALMINE, is it necessarily living on borrowed time? WT:CFI#Numbers,_numerals,_and_ordinals seems to imply so. If it is a synonym of a word that does meet CFI, and the phrase is the commoner way of expressing the number, how does one avoid creating a permanent red link to the phrase when listing synonyms? Not listing it seems wrong.

I've come across a Welsh number chwegain whose primary meaning is '120', whose spelt out synonyms are cant ac ugain and cant dau ddeg, which appear to be prohibited as lemmas if they only have the meaning '120'. The cardinal chwegain seems to be obsolescent - its main meaning seems to have become '50 pence', i.e. '120 pre-decimalisation pence' (shades of Welsh grôt)! --RichardW57m (talk) 12:13, 5 September 2022 (UTC)

It might be worth putting the individual words in [[ ]] (or however you think it is best subdivided), so it's listed as a synonym but doesn't go to a redlink. Theknightwho (talk) 15:02, 5 September 2022 (UTC)

What I met is a by-form of chweugain, which already has those synonym problems. --RichardW57 (talk) 21:53, 5 September 2022 (UTC)

Should words with circumfixes also have the categories for the corresponding prefix and suffix?

There are many circumfixes that can be broken down into a prefix and a suffix, for example ver- -en, ont- -en, be- -en, ge- -t; and the {{head|circumfix}} template used on the articles for these circumfixes does automatically link to the prefix and suffix they're made of. However, words that use the {{circumfix}} template only get the category for the circumfix. I think these words could just as well be analysed as having the prefix and suffix.

I recently added the German section for be- -en and the corresponding category Category:German_terms_circumfixed_with_be-_-en (such an article and category already existed for the same circumfix in Dutch) and wanted to add this category to the appropriate words, which include many of the words currently in Category:German_terms_prefixed_with_be-, so I was looking through that category and changing them to use {{circumfix}}. They usually used a template like {{affix}} or {{confix}} before, and I noticed that when I replaced it with {{circumfix}}, they would no longer be in the categories for the prefix and suffix. Whether {{affix}}, {{confix}} or {{circumfix}} is used, it looks the same to the user, and it seems pretty arbitrary to me to choose between analysing these words as having a circumfix or as having a combination of prefix and suffix; I think both analyses are valid. As a user I would expect be- -en words to show up when I look through the German_terms_prefixed_with_be- or German_terms_suffixed_with_-en categories. Maybe the {{circumfix}} template should also add the corresponding prefix and suffix categories? Tajoshu (talk) 17:23, 5 September 2022 (UTC)

@Tajoshu: How are any of the above examples circumfixes? The ending part is inflectional, part of the infinitives that we happen to use as the citation forms of German verbs. When I clicked on Category:German terms circumfixed with be- -en I hoped that it would contain beschissen, but it doesn’t, though even that is questionable on second thought.

So the greater problem is not that they can be analysed as prefix plus suffix but that they can be seen as one part derivational and one part inflectional. While inflectional circumfixes exist, should the concept of a circumfix that is both inflectional and derivational be ascribed reality? And then, we don’t categorize inflections by the affixes they employ anyhow. @Fytcha, Mahagaja, I suggest you exercise your discretion to take the appropriate measures.

Until you provide actual examples that make the relevance discernible to my intellect, I am not awake enough to deliberate on a solution to your abstract question, and this may be the effective reason others have not answered. Fay Freak (talk) 19:29, 8 September 2022 (UTC)

@Fay Freak: Yes, -en is the infinitive ending, but the point is that it makes the word a verb. -en is seen as a derivational suffix in its role of deriving verbs from nouns. This derivation is portrayed as a suffix -en on Wiktionary because this suffix is contained in the dictionary form, which is the form that the derived verbs will be listed under (compare this description at -en: "[the suffix -en] may be understood as the suffix for denominal verbs in general (actually -∅ derivation or conversion plus an inflectional suffix that happens to be part of the citation form of a German verb)"). The circumfix be- -en derives a verb from a noun, with a specific kind of meaning, as can be seen in the description "Forms transitive verbs from nouns, meaning 'to apply/provide (noun) to'" in the section on the Dutch version of the circumfix; this is different from applying the prefix be- to something that is already a verb and thus happens to end in -en. I am not the person who decided to consider -en a derivational suffix or to consider ver- -en, be- -en etc. to be circumfixes; others did that before me (as I said, the corresponding Dutch articles already existed), and that was not the issue I wanted to discuss. beschissen is a case of a different sense of be- -en, where the -en part is a past participle suffix rather than a verb (infinitive) suffix. This sense could also be added to the section at be- -en, though currently that section doesn't have any translation or description yet anyway. I probably would have added beschissen to the category later. As I described in my post, I was in the process of adding words to the category, having just created it, but then I stopped because I noticed that doing so made them disappear from the categories for the be- prefix and -en suffix that others had added them to. I wanted to resolve this issue before continuing, so I made this post to clarify the situation and come to a consensus before I add more words to the category.
Tajoshu (talk) 21:41, 10 September 2022 (UTC)

Chinese etymology sections should not use zh

Chinese etymology sections should not use zh, especially for {{psm}}, where the pronunciation in the particular Chinese lect plays a large part in choosing the characters. The majority of current uses of zh are in fact Mandarin-only and should be cmn instead, while Category:Mandarin terms derived from other languages is extremely underpopulated. Note that this usage results in nonsense such as Category:Chinese phono-semantic matchings from Cantonese. For orthographical loans such as {{wasei kango}}, zh could remain, or we could use the inclusive zho instead.

In addition, in cases such as 麥當勞, Mandarin, Hakka, and Min Nan did not borrow the term directly from en; instead it came via Cantonese, which then lent its orthography to Chinese, so it should be denoted by {{bor|yue|en|McDonald's}} and {{der|zh|yue|-}}. (Ideally there should be a separate category/template for this type of borrowing, where a word is first borrowed from another language and then becomes pan-Chinese, but I think this solution is better than what we are currently doing.)

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, ND381): – Wpi31 (talk) 17:49, 6 September 2022 (UTC)

@Wpi31 What I currently do is that if a term is only used in one Chinese lect, say Cantonese, then it is {{bor|yue}}; otherwise I would use {{bor|zh}}. Are you suggesting that it should instead be the first language where the borrowing (would have) occurred? Sometimes it might be difficult to determine, like 紐約 for example. — justin(r)leung { (t...) | c=› } 19:01, 6 September 2022 (UTC)

Side note: I do not think PSM is particularly more dependent on the pronunciation than other borrowings. In fact, it is probably the other way around, since pronunciation is essentially the only thing that other borrowings depend on. — justin(r)leung { (t...) | c=› } 19:03, 6 September 2022 (UTC)

@justinrleung: Yes, that's exactly what I'm suggesting: it should use {{bor|yue}} (if it was first borrowed into Cantonese), and then {{der|yue|zh}} (or maybe {{obor|yue|zh}} if it makes sense). For terms where the original lect is unknown, such as 紐約 or 倫敦, the current treatment can stay as is.

Regarding the part about PSM, I meant all the different types of phonetic borrowings. Since the word is phonetically borrowed, there should be a reference to the pronunciation (if it's identifiable), not just a nonspecific zh.

(Please also read the next BP section, because I don't want to bother people by mass-pinging twice in a short period of time.)

– Wpi31 (talk) 04:25, 7 September 2022 (UTC)

I'm a bit cautious about this when it comes to terms where I don't know how much lectal spread there is, or when it's uncertain at what period it entered Chinese at all. For example, I can be sure of the origin of 騰格里 (Ténggélǐ) without knowing which lect it entered first (probably Middle Chinese, but I don't know). Theknightwho (talk) 15:51, 7 September 2022 (UTC)

A presage of the end of the digital tyranny of Template:zh-pron! --Geographyinitiative (talk) 15:33, 9 September 2022 (UTC)

Categorisation (topics and labels) in Chinese

Currently we have Category:zh:All topics but no Category:cmn:All topics or Category:yue:All topics (or Category:hak:All topics, which exists but contains a category which in turn contains itself lol). Also, there are no separate categories for Category:Mandarin vulgarities or Category:Cantonese vulgarities, only Category:Chinese vulgarities, which lumps everything into one category (likewise for other categories). This means a user who is only interested in, say, Mandarin derogatory terms is overwhelmed by words in other lects in Category:Chinese derogatory terms, which contains 1000+ entries, many of which aren't relevant. These issues make these categories of little to no use. Besides, the Chinese categories are sorted by radical, which most learners and online users aren't familiar with, whereas splitting them by lect also allows words to be sorted in an order based on the lect's phonology.

Therefore, I am proposing that we split the above categories by lect, while keeping the current Chinese categories alongside the new ones. The {{cln|zh}} and {{topics|zh}} ones should be fairly easy to handle: simply adding {{cln|cmn}}, {{cln|yue}}, {{cln|hak}}, {{cln|nan}}, {{cln|wuu}}, etc. should do the job.

For the categories generated by {{lb}}, I am thinking of a {{zh-lb}} which would sort pages with some trickery? For example, the second sense on 屌 could be something like {{zh-lb|c,h,p,Zhongshan Min, Guangxi Mandarin|vulgar}}. (Also, while we are at it, the not-so-ideal {{lb|zh|dialectal}} could also be cleaned up in this process.)

(I don't want to mass-ping again, so I'll just hope everyone gets the ping from the previous section and reads this as well.) Wpi31 (talk) 18:30, 6 September 2022 (UTC)

@Wpi31: This is definitely something that should be dealt with. While having a new template is probably easier, I wonder if it's possible to do with the current {{lb}} template. — justin(r)leung { (t...) | c=› } 04:44, 7 September 2022 (UTC)

I would rather we do this. The long-term goal is probably to integrate the other Chinese-specific templates into the main ones if at all possible. Fundamentally, any special features can be achieved by simply checking what the language-code is and implementing them - not so easy for the stuff that's already separate, but with new things we should definitely take that approach. Theknightwho (talk) 16:34, 7 September 2022 (UTC)

OED treatment of proper nouns

Let me drop the following investigation, which I made elsewhere, here in the BP for later reference. OED, a classic and very impressive dictionary, lacks the surname Darwin, the first names Martin and Paula, and the cities of London and New York; in its New York entry, it defines the term as some kind of attributive adjective, mentioning New York only in the etymology. From the New York entry, one would get the impression that OED avoids proper nouns and specific entities like the plague: it would rather include New York as an attributive adjective than be forced to admit that New York is a city. But OED has Sirius (star), Mars (planet) and Milky Way; it has Homo sapiens; it has river Thames but not river Nile; it has an entry Nile, but the river appears only in the etymology. The OED Canada entry has the country only in the etymology. The OED Europe entry has the continent only in the etymology and has European Union as the only sense. OED has no entries for Asia, Ontario or Germany. But OED has China as a country. The OED Star Wars entry has the military defense strategy as the sole sense, with the franchise only in the etymology. The selection criteria of OED for proper names remain elusive; they seem pretty chaotic and inconsistent. From our voted-on coverage of proper names (esp. geographic ones) it follows that we do not plan to follow the OED in its exclusion of proper names and specific entities. Ontario is instructive: we have 14 specific entities covered. If someone wants to have a look at more examples of what OED is doing and post the results here, that would be cool. --Dan Polansky (talk) 06:35, 7 September 2022 (UTC)

I found this summary/report interesting. Thanks for digging into it. I view this aspect as one of the opportunities for Wiktionary to exceed the performance of other dictionaries. Which is not to denigrate those others; it is merely to say (if one allows a metaphor) that even excellent athletes are not superhumanly perfect and that there is always another athlete who can also be excellent. Quercus solaris (talk) 21:00, 9 September 2022 (UTC)

OED has United Nations, League of Nations, U.S., United States, United Kingdom, W.H.O. and Interpol. --Dan Polansky (talk) 17:26, 13 September 2022 (UTC)

My impression is that there has been some change over time: the OED initially was fairly strict about not including proper nouns, but then this came to be reconsidered somewhat after it caused awkward handling of cases of words derived from proper nouns such as adjectives, etc. "The Case of the Missing O.E.D. Words" (Jesse Sheidlower, November 28, 2012, The New Yorker) says "James Murray, the dictionary’s first editor, made an early editorial decision that the O.E.D. would not include any proper nouns—this was regarded as the province of the encyclopedia, not the dictionary—and that words formed from proper nouns would likewise be excluded. This was a poor policy, which was quickly rescinded: “American” was duly entered when editorial work progressed deeper into the letter A, but by then it was too late to make changes in the “af-” section." The Guide to the Third Edition of the OED says "Proper names are not systematically covered by the dictionary, though many are entered because the terms themselves are used in extended or allusive meanings, or because they are in some way culturally significant."--Urszag (talk) 01:43, 17 September 2022 (UTC)

I always got the distinct impression that the study of proper nouns and their origins is shunned. Everyone goes wild about finding the origin of the verb sense of shanghai, but no one lifts a finger to find the origin of the English-language loanword proper noun Shanghai. Proper nouns are presumed to be a great banality. --Geographyinitiative (talk) 01:59, 17 September 2022 (UTC)

I put the above in a rewritten form to Wiktionary:Oxford English Dictionary, and added more content, including treatment of prefixed words and OED thesaurus. Thanks to Urszag for the links. I created shortcut WT:OED. --Dan Polansky (talk) 18:25, 6 October 2022 (UTC)

I created Wiktionary:Merriam-Webster with MW's treatment of proper nouns. --Dan Polansky (talk) 13:15, 7 October 2022 (UTC)

@Dan Polansky Great resource you are making. Something analogous to 'hot words' is also discussed; see [2]: "In such a situation, the editors determine that the word has become firmly established in a relatively short time and should be entered in the dictionary, even though its citations may not span the wide range of years exhibited by other words." --Geographyinitiative (talk) 13:44, 7 October 2022 (UTC)

interface-editor group proposal

Such a group (Επεξεργαστές της διεπαφής, "interface editors") exists on the Greek Wiktionary.

I need the editinterface right to edit a page in the MediaWiki namespace, and I don't need the sysop group. —Игорь Тълкачь (talk) 16:54, 7 September 2022 (UTC)

Is it really necessary to create a new role? We could imagine repurposing/renaming the interface administrator role to what you're suggesting, and creating a nomination process to that role as well.

@Chuck Entz, Surjection, are the roles of administrator and interface administrator bound in some way? Does an interface administrator have all the prerogatives of an administrator? P U C – 17:20, 8 September 2022 (UTC)

They are, historically. It used to be that only admins could edit interface pages, until that right was split off into a separate group. What is the problem here that makes it impractical to request edits to the appropriate pages, or that requires adding new user groups? — SURJECTION / T / C / L / 19:18, 8 September 2022 (UTC)

1) I don't see what the problem is with pressing Ctrl+C and Ctrl+V to fulfill the simple request from 2022.08.31. At this moment, it has still received no response and has not been implemented. It may take months, so it is much more efficient to do such tasks myself. 2) The same problem applies to the proposed group: it takes only a few lines of code to add it to enwiktionary, and then I could get it from a bureaucrat via Special:UserRights. I would not have proposed a new group if I had gotten a response telling me whether I am suitable for interface administrator and why. —Игорь Тълкачь (talk) 10:55, 10 September 2022 (UTC)

Probably the problem is that nobody saw it; just ask here, or at one of the other discussion pages (the Grease Pit is the other obvious choice), or ping someone. My guess is there are not very many followers of niche interface pages. - TheDaveRoss 13:31, 12 September 2022 (UTC)

Yeah, I must have been away from my watchlist and missed the edit request. I made the change to MediaWiki:Edittools proposed by User:Useigor. That's not to exclude the possibility of nominating more interface administrators, though. — Eru·tuon 04:40, 13 September 2022 (UTC)

Changes to WT:LEMMING

I made some changes to WT:LEMMING and was reverted. My changes are in diff. Thus, I propose:

1) Add "OED, AHD, Cambridge, Collins, Macmillan, Longman, German Duden and Spanish DRAE" as example dictionaries. These are the dictionaries that were listed in the lemming vote, and these are the kinds of dictionaries being mentioned in support of LEMMING. They match the definition of a "general monolingual" dictionary already present in LEMMING, which I entered there some years ago based on the 2014 discussion.
2) Add Talk:George VI and Talk:Joan of Arc as examples where LEMMING was not followed. Thus, the reader will see at least part of the extent to which this is not actually applied. Seems very useful. More examples could be added as objective evidence of actual acceptance and rejection of the principle.
3) Add "Further discussions can be found from Special:WhatLinksHere/Wiktionary:LEMMING." This is useful for the reader who wants to know how far the principle has been invoked in discussions. Nothing wrong with that, from what I can see.
4) Add 'History: The principle arrived on this page via diff on 7 September 2007 in a different form: "Terms that have entries in other dictionaries, especially specialized ones." The principle proposed in 2014 was about general dictionaries, not specialized ones. The term "lemming test" occurred in a 2007 discussion at Talk:genuine issue of material fact.'
This is to honestly report that a) some form of the principle is as old as 2007, and b) the principle was originally specified in terms of "specialized" dictionaries. It is accurate and of historical interest. Pre-2014 discussions referencing the principle invoked it in that form. I see nothing wrong with that: again, accurate and interesting.

Please comment, and sorry for the bother. Dan Polansky (talk) 11:40, 8 September 2022 (UTC)

All of this should be discussed somewhere. It may well be that Wiktionary:Lemming principle needs to be created as a separate page, not a redirect, and linked to from WT:CFI and other places. At least initially it would be less than a policy page. DCDuring (talk) 17:45, 9 September 2022 (UTC)

I think this is the place for that discussion, in fairness. Theknightwho (talk) 19:14, 9 September 2022 (UTC)

@Theknightwho Can you please indicate which parts of the proposal you disagree with or if there are any parts that you are ok with? --Dan Polansky (talk) 16:06, 12 September 2022 (UTC)

Revised Enforcement Draft Guidelines for the Universal Code of Conduct

You can find this message translated into additional languages on Meta-wiki.


Hello everyone,

The Universal Code of Conduct Enforcement Guidelines Revisions committee is requesting comments regarding the Revised Enforcement Draft Guidelines for the Universal Code of Conduct (UCoC). This review period will be open from 8 September 2022 until 8 October 2022.

The Committee collaborated to revise these draft guidelines based on input gathered from the community discussion period from May through July, as well as the community vote that concluded in March 2022. The revisions are focused on the following four areas:

To identify the type, purpose, and applicability of the UCoC training;
To simplify the language for more accessible translation and comprehension by non-experts;
To explore the concept of affirmation, including its pros and cons;
To review the balancing of the privacy of the accuser and the accused.

The Committee requests comments and suggestions about these revisions by 8 October 2022. From there, the Revisions Committee anticipates further revising the guidelines based on community input.

Find the Revised Guidelines on Meta, and a comparison page in some languages.

Everyone may share comments in a number of places. Facilitators welcome comments in any language on the Revisions Guideline Talk Page. Comments can also be shared on talk pages of translations, at local discussions, or during conversation hours. There are planned live discussions about the UCoC enforcement draft guidelines; please see Conversation hours on Meta for times and details.

The facilitation team supporting this review period hopes to reach a large number of communities. If you do not see a conversation happening in your community, please organize a discussion. Facilitators can assist you in setting up the conversations. Discussions will be summarized and presented to the drafting committee every two weeks. The summaries will be published here.

On behalf of the T&S Policy Team

Mervat (WMF) (discuss • contribs) 11:10, 9 September 2022 (UTC)[reply]

Harm done by User:Quercus solaris

I've been watching this guy's edits with increasing incredulity. Basically he seems to feel it's necessary to write a little self-aggrandizing essay on every entry, where we should really be thinking about readers and learners, and not about "look at me, I'm clever because I know a scrap of Greek". This ultimately came to a head at Talk:Church-of-Englandism. You guys all know that I'm not very tactful. But err yeah has anyone else looked at his edits and thought "hey, it's not your homework"... I feel he is a net harm. Just raising it. And it's not "revenge" for today's disagreement, rather this is the culmination of seeing him expand a lot of everyday simple, helpful etys into baffling pretension. Equinox ◑ 04:54, 10 September 2022 (UTC)[reply]

See my latter replies at the thread linked above. Equinox seems to seize upon the outliers of my mainspace edits that could stand improvement on concision and act like I'm a horrible monster in my net effect on Wiktionary. I'm quite willing to rework for brevity in any specific spot where constructive criticism applies. But the edits I've been making to ISV combining forms lately, for example, are simply in line with the same kinds of information that MWU and Dorland's report in their homologous entries. Which anyone with subscriptions to those works can easily verify for themselves. No one has to read my userspace reflections if they don't want to (TL;DR). The number of times when anyone at Wiktionary has been subjected to my too-long talkspace analyses/expositions (TL;DR) is literally only a dozen outliers when a nuanced subject was being steamrolled. My nine thousand mainspace edits are 99.8% sound. Quercus solaris (talk) 05:11, 10 September 2022 (UTC)[reply]

Also, I need to acknowledge something that is not meant to be personal but is definitely (objectively, nonpersonally) very relevant. Reading comprehension matters. There is nothing "baffling" among the 99.8% of my mainspace edits, to anyone who uses unabridged dictionaries. Anyone who uses MWU or MW Collegiate or MW Medical or WNW or AHD or Dorland's or Stedman's or Taber's or the Oxford series (e.g., Dictionary of Science, Dictionary of Chemistry) reads mere facts, all day long, that are no more baffling than my mainspace contributions to Wiktionary. Anyone who uses AHD reads usage notes that are longer than any that I've ever written at Wiktionary except a handful of outliers that have already been trimmed (entirely accurate but TL;DR). Reading comprehension matters. Learning from dictionaries matters. I and my work colleagues have all learned from the dictionaries that we use. We read them, and we learn from them. They are not obligated to bend over backward to avoid teaching fairly simple facts by prompting a user, for example, to click through on a bluelink and learn something thereby. If a Big Word is encountered, it is duly bluelinked for the user's pedagogic convenience. That's what dictionaries, and hyperlinks, are for. If you, dear reader, are someone with enough reading comprehension to have read this whole comment, good on you. Quercus solaris (talk) 05:26, 10 September 2022 (UTC)[reply]

@Equinox Can you cite some examples? I see a ton of contribs by this user and I haven't encountered what you're referring to in a casual look-through. Benwing2 (talk) 05:49, 10 September 2022 (UTC)[reply]

@Benwing2: This is the kind of edit by this user that I find counterproductive: diff. If I were a reader who saw enough of these, I would start to assume that Wiktionary's usage notes are pointless word salad. Ultimateria (talk) 19:24, 14 September 2022 (UTC)[reply]

This is useless and should not be kept. Vininn126 (talk) 19:31, 14 September 2022 (UTC)[reply]

Wrong tone, wrong style and wrong section. Get rid. Theknightwho (talk) 00:03, 15 September 2022 (UTC)[reply]

I think the problem is that they have trouble providing structure: as long as they're editing in a format with its own inherent structure, they're fine. It's only when they have to explain things in a free format like in usage notes that they get lost and ramble around. They definitely need to work on that for their own sake: unreadable emails, essays, papers, proposals, etc. are going to damage their prospects in real life. Chuck Entz (talk) 03:28, 15 September 2022 (UTC)[reply]

@Ultimateria Thanks for the links. Yeah I've seen overly wordy and confusing etymologies and usage notes before. I tend to delete them and believe others should feel free to do the same. Benwing2 (talk) 05:06, 15 September 2022 (UTC)[reply]

I do not feel his edits are net harm, even if they are sometimes too wordy. I also really don't understand why you want to escalate things. Honestly, read that talk page and imagine that it's two strangers: would you think that the way you act there is the way an admin should? Or even just a nice person? I really don't get it. I try to not escalate and try to only be rude when someone else is rude first (God knows I don't always meet that standard), but now you're broadcasting this conversation while admitting that you lack some social grace. Don't you both have some things that could be done better here? I really don't even know why you responded to his post: had you written nothing at all, it would have been a silly talk page message that was maybe a little pointless, but altogether harmless and now we have this. Moreover, his actual edits in the main namespace for this entry were useful, correct? I sincerely don't even understand the complaint. —Justin (koavf)❤T☮C☺M☯ 07:54, 10 September 2022 (UTC)[reply]

I don't think Koavf has much understanding of this project. He's a Wikipedia guy. But hey. No matter. I mostly try to create missing words. Equinox ◑ 01:34, 21 September 2022 (UTC)[reply]

(edit conflict) I'm not familiar enough with this situation to comment on the substance of Equinox's complaint. But I will say that linking "reading comprehension" three times in a single comment does come across as unnecessarily passive-aggressive and condescending to me. Of course one comment is an extremely limited sample size. If you're a heavy, long-term contributor, it's statistically inevitable that, one day, someone is going to catch you in your least flattering aspect. I could easily single out a couple of instances, outside the 99.8% of his contributions that are constructive, in which Equinox soapboxed on talk pages. Some editors might also be thought of as specialist doctors who have yet to hone their bedside manner: really good at the information-and-technical side of wikis, less proficient at the interacting-with-other-people part. Stick two editors of this type in a room and horns may very well lock. No one likes having their less favourable tendencies reflected back at them. WordyAndNerdy (talk) 08:24, 10 September 2022 (UTC)[reply]

(Why do people write "edit conflict"? You just re-post your comment... what is the point of saying "I had a temporary issue while posting...?") @WordyAndNerdy: Yes, I guess "locking horns" has been a thing. I think I could have done more convincing, if I had recorded everything that bothered me, but hey. If I'm wrong, this guy may continue to edit the project long after I am dead; if not, vice versa. I assume that people who are good at "interacting with other people" actually go out and do so, and don't edit on a Saturday night. Shocking eh. Equinox ◑ 01:33, 21 September 2022 (UTC)[reply]

The usage note in developed country seems overdone. There is some substance there, but too many words. One defect is the very long, complex sentences. I also find the note very opinionated and non-neutral, which is probably the fault of the referenced style guide. The Wikipedia article for developed country does not make any such point. Does anyone want to fix it or remove it? (And I find all the wikilinking of words in the note annoying, treating the reader like some kind of incompetent who needs to look up half of the words used.) --Dan Polansky (talk) 08:20, 2 October 2022 (UTC)[reply]

Category:English words ending in "-gry" and Category:English words ending in "-yre"

These are super garbagey categories despite having been around for years. These are not suffixes and we don't create categories for every random combination of letters. Anyone object to me nuking the categories and removing them from the pages they've been manually added to? Benwing2 (talk) 05:46, 10 September 2022 (UTC)[reply]

You make a good point, and the existence of such categories at Wiktionary seems superfluous given the search engines that can search for terminal strings and are widely available to laypersons, such as the Free Dictionary's 'ends with' search function; plenty of those can be found among the top Google results for a search for word endings. Maybe in future, Wiktionary can offer the 'ends with' function under "Advanced search" at Special:Search. (Or maybe it already does and I just didn't dig it up while glancing there.) Quercus solaris (talk) 06:43, 10 September 2022 (UTC)[reply]

The "words ending in -gry" thing is a common trivia trope, so I'd like to see it preserved in the appendix namespace, but I don't think it's really that important to keep. Agreed that these aren't proper suffixes and should be deleted, tho. —Justin (koavf)❤T☮C☺M☯ 07:56, 10 September 2022 (UTC)[reply]

How does one do a CirrusSearch (Elasticsearch) search for a pagename ending in "gry"? There are no end-of-line markers in our regexes. The best I can come up with is intitle:/[a-z]+gry/ -intitle:/[a-z]+gry / -intitle:/[a-z]+gry[a-z]+/ incategory:"English lemmas". (I probably shouldn't need two 'intitle's.) Having such a regex documented would eliminate the need for such categories or for indexes based on the last characters of pagenames. DCDuring (talk) 14:38, 10 September 2022 (UTC)[reply]

@DCDuring: It isn't possible to do a proper search for something at the end of a title with intitle. I'm guessing (or maybe I read somewhere) that they wanted to disable $ for wikitext searches (insource) and by mistake extended the restriction to intitle. I can't think of any reason why we shouldn't be able to do insource:/gry$/ for words ending in -gry. I posted a Phabricator task for this because I couldn't find one. — Eru·tuon 05:01, 13 September 2022 (UTC)[reply]

@Benwing2 Wouldn't this be better suited to WT:RFDO? - excarnateSojourner (talk | contrib) 07:29, 19 September 2022 (UTC)[reply]

Note that there's an open, old RFM about the -gry category, and an RFDO of the similar German -nf category. - -sche (discuss) 18:13, 11 November 2024 (UTC)[reply]
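The anchored title search this thread is after is easy to approximate offline. A minimal sketch in Python, assuming you already have a plain list of page titles (for example extracted from a dump; the sample titles and the titles_ending_in helper are hypothetical, not part of any Wiktionary tool). It mirrors DCDuring's [a-z]+gry pattern but uses re.fullmatch, which anchors at both ends and so stands in for the $ that intitle regexes reportedly don't honour.

```python
import re


def titles_ending_in(titles, suffix):
    """Return the titles made only of lowercase ASCII letters that end in `suffix`.

    Mirrors the intent of intitle:/[a-z]+gry/ with an end anchor: at least one
    letter must precede the suffix, and nothing may follow it.
    """
    # re.escape guards against suffixes that contain regex metacharacters.
    pattern = re.compile(r"[a-z]+" + re.escape(suffix))
    # fullmatch anchors at both ends, playing the role of the unsupported "$".
    return [title for title in titles if pattern.fullmatch(title)]


# Hypothetical sample titles; a real run would read them from a dump.
sample = ["angry", "hungry", "aggry", "gryphon", "angry mob", "Hungry"]
print(titles_ending_in(sample, "gry"))  # ['angry', 'hungry', 'aggry']
```

The same helper covers the -yre case, e.g. titles_ending_in(titles, "yre"). Plain str.endswith would also work, but it would not reproduce the [a-z]+ restriction that excludes multiword titles and capitalized forms.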

CJKV Character list by Ideographic Description Characters

We lack an appendix of characters ordered by the Unicode ideographic description character used in their description. Backinstadiums (talk) 08:24, 10 September 2022 (UTC)[reply]

I recommend that if such an index were created, the phraseology "ideographic" should be deprecated. The vast majority of Chinese characters are not ideographic but are instead phono-semantic compounds (see Category:Han phono-semantic compounds), that is, an element in the character is related to the pronunciation of the character and an element is related to the meaning. The Orientalist tradition is to ignore the sound component in favor of the "mysterious Oriental" meaning component. --Geographyinitiative (talk) 13:09, 11 September 2022 (UTC)[reply]

That is the name of their Unicode block, so it should stay as such. Unicode itself should change it first. Backinstadiums (talk) 09:00, 12 September 2022 (UTC)[reply]

Let's do it tho' Backinstadiums (talk) 06:46, 18 September 2022 (UTC)[reply]

Don't we already have something that does this? Appendix:Unicode/CJK_Unified_Ideographs ‑‑ Eiríkr Útlendi │Tala við mig 05:13, 20 September 2022 (UTC)[reply]

@Eirikr: There need to be different lists for each, and within these there are variants too. Backinstadiums (talk) 08:09, 20 September 2022 (UTC)[reply]

Inappropriate synonymy

I'm new to the Wiktionary aspect of the Wikiverse, having spent most of my time learning and starting to work in Wikidata. I'm trying to determine the appropriate interconnection between entities and claims in Wikidata and other resources like Wiktionary, Wikisource, etc. I'm interested in learning the cultural norms for addressing issues like one I came across today in an aspect of my own profession. Rather than simply undo the changes from some time ago, I'd like to learn the best way to go about providing needed corrections.

In 2021, @Vivaelcelta added a lot of information to the entry ecologist, based on an erroneous synonymous relationship between ecologist and environmentalist. The synonymy is erroneous at least in most uses of the terms in English, though there could be variations in Spanish or other languages and cultures. Although I am not a fluent Spanish speaker, my relationships with ecologists from around the world lead me to believe this is also an invalid synonym in other cultures.

The most appropriate higher level concepts in English for an ecologist are scientist and researcher. Ecologists study and strive to understand the relationships between species (habitat creators and habitat users) in their environments, including organic and inorganic aspects. Apart from what they may do in personal life, ecologists in their professional capacity do not advocate for one thing or another. All research scientists seek knowledge through scientific study and experimentation and then share that knowledge in various formal and informal ways. The additions to this definition and declared synonyms with environmentalism as a political movement do need to be corrected. What's the best way to go about that and help to course correct this kind of thing in general? Skybristol (talk) 13:35, 10 September 2022 (UTC)[reply]

The synonyms have been specifically added for the second sense, which is defined as "one who advocates for the protection of the environment". The "scientist/researcher" concept you mentioned is listed as the main sense. If you think that ecologist doesn't have the additional meaning "environmentalist", you should take it to WT:RFVE. –Jberkel 14:20, 10 September 2022 (UTC)[reply]

@Jberkel - Thank you very much for the pointer to the request for verification route on this kind of thing. I followed your direction and posted an rfv-sense comment. I appreciate your guidance. Skybristol (talk) 12:59, 11 September 2022 (UTC)[reply]

“Old Ruthenian” language

This is an incorrect name. Ruthenian is correct. The relevant categories and pages should be moved. Also, Old Belarusian and Old Ukrainian should probably be renamed Middle Belarusian and Middle Ukrainian (the “Old” varieties are aspects of Old East Slavic).

Old refers to the Old East Slavic language (ancestor of Ruthenian) in the literate period, aspects of which can in some contexts be called Old Belarusian or Old Ukrainian. Regional aspects of the Ruthenian language are Middle Belarusian and Middle Ukrainian.

Refer to w:Ruthenian language. I scanned through every accessible source cited in that Wikipedia article, and only found the term Old Ruthenian in one,[3] but unfortunately, with only a snippet view of three book pages, I can’t determine how it is defined. One paper uses Middle Ruthenian.[4]

For the periodization, see w:Ukrainian language#Chronology.

Also, the family tree at Category:Old Ruthenian language is inaccurate. The modern languages should be placed as descendants of the “Old” versions, and perhaps Old Rusyn could be added and renamed Middle Rusyn along with the others. (I also notice that at Category:Old East Slavic language, the relationship of Middle Russian and (Modern) Russian is backwards, with the former as descendant of the latter.) I have no idea how to edit this.

Previous discussions:

Pinging editors from related discussions and page histories: @Thadh, @ZomBear, @Wikitiki89, @CodeCat, @Atitarev, @-sche, @Benwing2, @Crom daba, @PulauKakatua19, @Brittletheories.

—Michael Z. 2022-09-11 19:40 z 19:40, 11 September 2022 (UTC)[reply]

@Mzajac: The categorisation is fine: etymology-only codes cannot be placed as direct predecessors of any other language.

"Old Ruthenian" is a perfectly valid name, cf. [5]. "Ruthenian", on the other hand, is ambiguous, since it also refers to Rusyn (cf. [6]).

Does "Old Rusyn" have any literature? I doubt that, so we won't need it as an etymology-only code.

I don't really care about Old/Middle Belarusian and Ukrainian's naming, so I'll let others decide on that. Thadh (talk) 20:20, 11 September 2022 (UTC)[reply]

The 1975 w:Paul Robert Magocsi paper with the enlightening footnotes uses Ruthenian “at the suggestion of the editor,” but the author uses Rusyn in his more recent works over three decades, including the authoritative Encyclopedia of Rusyn History and Culture. —Michael Z. 2022-09-11 22:50 z 22:50, 11 September 2022 (UTC)[reply]

The fact that the term is used is not the same as its being used for this subject. In how many of those sources does it mean Old Ruthenian as distinct from Middle Ruthenian? The first result[7] in your Scholar search appears to use it as a synonym for Old East Slavic and not for this language at all, as it appears to be about a loanword inherited by Russian from an “Old Ruthenian” language that existed in the eleventh century. —Michael Z. 2022-09-11 22:59 z 22:59, 11 September 2022 (UTC)[reply]

Old Ruthenian is used frequently in literature. Vininn126 (talk) 21:55, 11 September 2022 (UTC)[reply]

Can we find any examples where it’s defined? I suppose it might be a calque of davn’orus’ka or starorus’ka mova. Keep in mind that in different period contexts Ruthenian means “of or pertaining to Rus,” “East Slavic,” “Ukrainian,” “Rusyn,” and even “Russian.” Nevertheless, Old Ruthenian and Middle Ruthenian are varieties of Ruthenian, and even though they’re sometimes synonyms Rusyn language and Ruthenian language have distinct primary meanings, so I see no sense in using the qualifier “Old” where no one can tell me what meaning it adds. —Michael Z. 2022-09-11 22:39 z 22:39, 11 September 2022 (UTC)[reply]

Missing French entries

While generating the latest "wanted pages", I noticed that for French there are many verbs which appear to have a full set of bot-generated inflected forms without the lemma: ébossa, entôla, entraccusa, entraccorda, ensoutana, ensoufra and more, all created by Dawnraybot (talk • contribs). Does anyone know what is up with these? Buggy bot run? – Jberkel 15:57, 12 September 2022 (UTC)[reply]

I've noticed this too (for years); in my spot-checking, the forms are usually real and fr.Wiktionary has the lemmas, e.g. fr:ébosser. I think WF (or someone) created only the inflected forms, as that could be done en masse by bot, but not the lemma forms, because that would require a human with knowledge of French to write one or more definitions. A combination of laziness and industriousness. (One solution would be to create the lemma forms with {{rfdef}}. Another would be to delete the forms en masse and let another bot recreate them later once someone adds the lemmas and inflection tables.) - -sche (discuss) 16:35, 12 September 2022 (UTC)[reply]

Yeah, it was Wonderfool. The idea was to bot-create the forms and someone, within 10 long years, would surely have made the lemma by then. Almostonurmind (talk) 20:44, 15 September 2022 (UTC)[reply]

RFD header - the consensus is determined primarily based on tallies

Please comment on the following change in Wiktionary:Requests for deletion/Header

Old:

Currently, there is no fixed supermajority requirement and consensus for closing any request is judged at the discretion of the closer (usually an administrator or another experienced editor).

New:

The consensus is determined primarily by tallying the posts in support for various outcomes, but there is no fixed supermajority requirement and consensus for closing any request is judged at the discretion of the closer (usually an administrator or another experienced editor).

I believe it is a fair description of what we do while it still leaves room for discounting of poorly reasoned votes and the like by using the word "primarily". Having such a fair description in the header seems preferable. The accuracy is supported by the following observations: 1) people often post a bolded keep or delete; 2) people often post without adding anything to the examination of arguments, e.g. "delete", "delete per nom", "delete per the arguments already made" and similar. This would make no sense unless editors believed their votes counted as votes, not merely as arguments; 3) some RFD closures contain explicit tallies; 4) people often say "vote keep" and the like, using the language of "vote", so they believe they are actually voting. This all still leaves a lot of room because of "primarily", because of no fixed threshold (which is a pity, but it is so), and because of the discretion of the closing editor. Thank you.

A further enhancement would be adding "The closer may discount some votes" and setting a fixed threshold, since it cannot be desirable for each closer to arbitrarily choose a different threshold, but that goes beyond the scope of this proposal. Dan Polansky (talk) 16:05, 12 September 2022 (UTC)[reply]

Please, Dan, stop. Previous protracted discussions have shown you have no support for these changes. I have rolled back all your edits and would strongly suggest you make no more unilateral changes before getting consensus. Benwing2 (talk) 05:29, 15 September 2022 (UTC)[reply]

Why did you undo all the changes? Each of my changes had an edit summary explaining it. The normal consensus process is that people make changes and if there are no specific objections to these changes, they can stay. This is the process I used in this page that is not vote controlled. I tried hard to make each change as uncontroversial and corresponding to observable facts as possible. About the change proposed in this thread, what are the specific objections? The normal Wikipedia consensus process requires people who want to undo changes to provide some rationale. Were my changes inaccurate? How so? --Dan Polansky (talk) 09:48, 15 September 2022 (UTC)[reply]

I undid all the changes because the change 'The consensus is determined primarily by tallying the posts in support for various outcomes' seems tendentious and like it is trying to gradually change policy through seemingly unobjectionable edits, and I didn't have time to sort through all the changes to figure out which ones were fine and which ones weren't. Benwing2 (talk) 04:53, 26 September 2022 (UTC)[reply]

Whatever is meant by "tendentious", the statement is true, as argued above. My effort seems futile. The good thing is the current header implies tallying anyway, by saying "If there is sufficient discussion, but a decision cannot be reached because editors are evenly split between two options ...". So I don't see what the fuss is about. The bad thing is it implies the tallying is against a low threshold, by the language used. I was trying to move away from that language so that the header only says that no supermajority threshold is agreed on. --Dan Polansky (talk) 17:15, 26 September 2022 (UTC)[reply]

I don't think there is ever much consensus for relatively minor wording changes that require the community to pay attention to details that don't seem to pose an actual problem. We seem to have more of a common law approach than a civil law approach. DCDuring (talk) 14:34, 26 September 2022 (UTC)[reply]

Our CFI takes the civil law approach. We do not take the common law approach in that we do not recognize the force of precedent as binding: we have no stare decisis. Further reading: Stanford Encyclopedia of Philosophy. The discussed header itself is in the civil law spirit. The problem with the implied common law approach is that it is not really common law: what actually happens is that practice follows changing whims as new admins arrive and start deviating from previous practice. It is this arbitrary, whimsical deviation from previous practice that is the problem here, and I don't see any solution to it any time soon since this is what multiple admins are happy with. --Dan Polansky (talk) 17:15, 26 September 2022 (UTC)[reply]

Reconstruction:Proto-Uralic/pukta-

This entry appears to mix two different things: Proto-Uralic *pukta- (to caper, jump, run; and not "to shoo" as is currently indicated) and Finno-Volgaic *pukta- (to wake up). The two Uralonet links at the end of the entry clarify this. Would someone more knowledgeable about proto-languages please correct this? Thanks. Panda10 (talk) 17:21, 12 September 2022 (UTC)[reply]

Unicode 15

Not a discussion as such, but Unicode 15 has been released, which will therefore impact {{character info}}. Theknightwho (talk) 15:32, 13 September 2022 (UTC)[reply]

Entries on simple.wiktionary, but not on en.wiktionary

I have generated a list of titles that are found on the Simple English Wiktionary, but not on the English Wiktionary. In some cases this may indicate an error on the part of simple: or an omission on the part of en:. In other cases, especially when it comes to multiword terms, it could just be due to differences in editorial judgement/inclusion policy.

=> Wiktionary:Todo/On simple.wiktionary, not on en.wiktionary. 98.170.164.88 02:38, 14 September 2022 (UTC)[reply]

Thanks for this. I think your point about different editorial judgment is a useful one to bear in mind, as I note quite a few are verbs with prepositions that learners may find useful to see as set terms. Theknightwho (talk) 13:47, 14 September 2022 (UTC)[reply]

I've just created most of the ones that looked worthwhile to me. Some are obvious errors on simple.wikt, and I have marked those for speedy deletion there. Equinox ◑ 15:03, 14 September 2022 (UTC)[reply]

Yeesh, I haven't looked at Simple in a long time, but there sure are a lot of errors. - TheDaveRoss 15:22, 14 September 2022 (UTC)[reply]

I added the organism names on the list. The missing ones were names for south Asian species which were in our queue. DCDuring (talk) 15:58, 14 September 2022 (UTC)[reply]

West Polesian and Surzhik

@Atitarev, -sche We don't seem to have any policy on classifying these lects yet, and the former seems to already cause a problem. I won't pretend I'm at all knowledgeable about these, but maybe you have an idea how to handle them? Thadh (talk) 18:12, 15 September 2022 (UTC)[reply]

@Thadh: WT:RFM is probably the place to request language codes, if they are missing and if they are true languages or dialects. I am not sure if Surzhyk, Trasyanka or Balachka can be classified as languages, though. Anatoli T. (обсудить/вклад) 23:26, 15 September 2022 (UTC)[reply]

I think this is where we should decide if they are. Vininn126 (talk) 10:01, 16 September 2022 (UTC)[reply]

Are there grounds to consider Surzhyk a Russian-based pidgin? Vininn126 (talk) 17:26, 16 September 2022 (UTC)[reply]

@Vininn126: Surzhyk is not a pidgin, but a mixed language. It is primarily based on Ukrainian. --ZomBear (talk) 17:42, 23 September 2022 (UTC)[reply]

"West Polesian" is the West Polissian dialect of Ukrainian, not a different language. --ZomBear (talk) 02:02, 23 September 2022 (UTC)[reply]

Silesian is a dialect of Polish, Slovincian is a dialect of Kashubian, and Kashubian is usually considered to be a dialect of Polish, yet we treat them as languages. We have a separate space for Slavomolisano too. Sławobóg (talk) 21:39, 25 September 2022 (UTC)[reply]

Silesian and Kashubian are considered dialects by SOME (cough Miodek), but that doesn't make them dialects. Vininn126 (talk) 22:11, 25 September 2022 (UTC)[reply]

Expand "Descendants" part of WT:EL

The "Descendants" portion of WT:EL is literally just the following:

List terms in other languages that have borrowed or inherited the word. The etymology of these terms should then link back to the page.

There is no definition of what terms to include and what not to include, no explanation of the meaning of "borrowed" and "inherited", no examples or guidance on how to list the terms in the first place, and no instruction to use the {{desc}} template. It's basically "you're on your own" when creating the Descendants section of entries.

I am new to Wiktionary, so more experienced Wiktionarians should discuss how to expand/expound this portion of the entry layout.

TY. — 🍕 Yivan000 view talk 00:52, 17 September 2022 (UTC)[reply]

I think this is probably a good idea. Most of the rules are already written out in the documentation for {{desc}} and the like, but we should at least direct editors to that documentation. Vininn126 (talk) 12:00, 17 September 2022 (UTC)[reply]

We do have WT:ETY, but it's incomplete. Thadh (talk) 13:14, 17 September 2022 (UTC)[reply]

GODlessness

Why was GOD deleted? We have LORD. I'm trying to work out how we should be handling the Welsh equivalents using small capitals. The equivalent of 'LORD' corresponds to the use of the divine tetragrammaton, while for Duw the use of small caps indicates that it is the use of Hebrew אֱלֹהִים to mean 'God', as opposed to the use of some word just meaning 'god' for which grammar permits (or requires) capitalisation in the context. (I think I've seen the latter usage in English, but I haven't seen it recently, despite looking). --RichardW57 (talk) 16:47, 17 September 2022 (UTC)[reply]

GOD was deleted because its only content was "God Neo also known as the One who consists of the Trinity which is the divine three entities which are as follows the Father The Son & the Holy Spirit all infused...", entered by IP 208.54.39.242. If GOD vs God is used contrastively the way LORD vs Lord is (described in LORD#Usage_notes and Talk:Lord), I would think it'd be just as includable, if entered with better content. You'll notice on that talk page and the Tea Room discussion of the related GOtt vs Gott that not everyone thinks such things are worth including, so maybe this is a good opportunity for people to weigh in. My opinion is that different capitalizations are includable to the extent they're contrastive, which these kinda are: they're not as distinctive as god (just any deity) vs God (the Christian one), but they still indicate a distinction in at least some works (the use of the tetragrammaton vs another term). - -sche (discuss) 17:32, 17 September 2022 (UTC)[reply]

Thanks. God is indeed synonymous with Lord. --RichardW57 (talk) 02:36, 18 September 2022 (UTC)[reply]

To create an entry for GOD, I may have to be creative over copyright issues. Am I required to ensure that I would not be responsible for it being illegal to sell some attribute-preserving edit of a dump of Wiktionary? The risk I have in mind is the aggregation of material available under restrictive licences that would permit individual Wiktionary pages but would prohibit selective aggregation. --RichardW57 (talk) 02:36, 18 September 2022 (UTC)[reply]

I'm not sure that a single word that is presumably used by millions of people can be copyrighted, even if it originated from a copyrighted text. If you were to copy the definition or other text from a copyrighted work, that might be different, but it's possible to have a perfectly fine definition without that. Chuck Entz (talk) 03:06, 18 September 2022 (UTC)[reply]

It's even fine for a dictionary to copy (with attribution) passages that exemplify usage. Otherwise the OED would be a massive copyright violation. I don't see the issue here. 70.172.194.25 04:08, 18 September 2022 (UTC)[reply]

Most dictionaries don't give permission to copy and modify their text and sell the result. Wiktionary and its editors do.

It's not the single word or its definition that bothers me. It's the three 'independent' durably archived quotations that do. --RichardW57 (talk) 07:51, 18 September 2022 (UTC)[reply]

The difference isn't generally relevant, as the dictionary also doesn't own the copyright to the quoted text, so has no discretion to impose any restrictions on its use beyond those of the actual copyright holder (that is to say, if they can do it, so can anyone else). What dictionaries do hold copyright over are things like the definitions (if non-trivial) and any original commentary, as well as the layout, typesetting and overall structure of the work (taken in aggregate). These kinds of usage examples are undoubtedly fair use, though. Theknightwho (talk) 08:27, 18 September 2022 (UTC)

A licence might permit the use of a text A provided that it does not constitute more than 25% of the derived text B. A text C derived from B might exclude the 99.9% of B that is not in text A; it would then be in breach of the licence that allowed text A to be used in text B.

Is "fair use" compatible with CC BY-SA? --RichardW57 (talk) 10:13, 18 September 2022 (UTC)

Restrictive licences tend to work the other way around, permitting no more than X% of the original text to be used in derivative works (which would therefore pose no issue). I can't think of any commercial situation where it would be advantageous to allow use in full, but only on the condition that the derivative work also contain other works.

And yes, fair use is compatible (it's used on Wikipedia all the time, and it just means ensuring things are no longer, more detailed, etc. than necessary for the point being made). The use of snippets to demonstrate use is also well understood to be a necessary and practical part of writing a dictionary. Any profit that it generates for the dictionary publisher is only generated in the context of publishing that dictionary. Theknightwho (talk) 10:47, 18 September 2022 (UTC)

Can you point to an item that says it is relying on fair use? Our terms of use say that such a reliance must be indicated in the talk page or a change edit, though related material claims it can be put in the item itself.

The licence for the KJV says extracts from the KJV must not exceed 25% of the derived work. Wiktionary may be non-commercial educational use, but its non-WM clones aren't non-commercial. Moreover, there's a 500 verse limit. Who's doing the counting? Editors in the UK are at legal risk if they use the KJV for quotations. --RichardW57 (talk) 15:03, 18 September 2022 (UTC)

So the specific issue is that: (a) The KJV is under Crown copyright, and a license is issued by CUP on its behalf which restricts use to 500 verses for certain uses; it also stipulates that the text used must comprise no more than 25% of the work in which it's quoted; (b) Wiktionary publishes text under the CC-BY-SA 3.0 licence in all situations, which contains a stipulation that no legal restrictions may be applied other than attribution and share-alike; (c) the CUP licence is therefore incompatible with CC-BY-SA 3.0.

I don't see how this situation is any different to us giving any other quotation from a copyrighted work. Theknightwho (talk) 15:53, 18 September 2022 (UTC)

Yes, anything dated after 1950 should be treated with deep suspicion. We probably need quotation templates modified to include a defence of the inclusion of quotations, and warn writers of derivative works that they are not protected by CC BY-SA. I'm not sure if attributions and defences in change histories are intended to be difficult to use. --RichardW57m (talk) 13:05, 20 September 2022 (UTC)

The KJV is a particular problem for UK editors. RichardW57m (talk) 13:05, 20 September 2022 (UTC)

I don't see how it's a particular problem: it's copyright of the same kind as any other - it just so happens that it won't expire. Your suggestion has pretty major implications for the project as a whole, and I quite honestly don't think it applies due to the nature of fair use. I appreciate that is open to interpretation, though I have no doubt this issue has been considered by WMF's legal team, as it would be them (and not individual editors) who would be liable; irrespective of where they're based in the world. However, what I don't really understand is why you're singling this work out. It's only special in terms of duration, which is immaterial to your concern. Theknightwho (talk) 14:31, 22 September 2022 (UTC)

I'm now seeing some problems with the Welsh word, where the contrastive usage may be harder to demonstrate. --RichardW57 (talk) 02:36, 18 September 2022 (UTC)

Request: template-editor permissions

Hi - I'm finding that I'm increasingly needing to request permission to edit templates and modules, or asking others to implement changes, as I've been doing quite a lot of work in building the module infrastructure for Mongolian in preparation for generating inflection tables etc. See the frequent requests at the Grease Pit that I've been making, and there are several more that will need to be made. Outside of the really large-scale stuff, I'm generally capable of implementing this stuff myself (e.g. MOD:mn and MOD:txg (for Tangut) are largely my own work, or heavy modifications of other pre-existing modules), so it would be nice to be able to do things myself without having to bother others about it. Theknightwho (talk) 17:10, 17 September 2022 (UTC)

Granted. Vininn126 (talk) 11:13, 18 September 2022 (UTC)

Thanks! Theknightwho (talk) 11:35, 18 September 2022 (UTC)

Contributions by User:Rajkiandris

So, this has been a long time coming for this user, but today, after hearing that they have been outright lying about their sources, not to mention a history of poor source interpretation, using sources in and about languages they don't know, using sources of very questionable or even certain invalidity as authoritative, etc., I have indef-blocked Rajkiandris.

As this editor has been quite active over the years and has edited numerous languages and, believe it or not, even more entries, not to mention the hundreds upon thousands of translations, all of questionable validity as well, I believe it is a good idea to collectively go through all of their contributions and check, fix and/or delete them. This might be a job on the scale of the etyl cleanup (although hopefully not quite as big).

@Surjection has offered to assist with the technical part of this - that is, if possible, designing a bot job to go through Rajkiandris' contributions and tagging them either for general cleanup (perhaps we'd need a special category for this) or, specifically, with {{t-check}}. I've also discussed this with @Vininn126 and @Theknightwho who both appear to be in favour of such an operation.

I would love to hear everyone's opinion on this and any ideas, additions or concerns you may have. Thadh (talk) 19:32, 19 September 2022 (UTC)

It took you 9 years to block this guy? That's sloppy adminning... Almostonurmind (talk) 19:37, 19 September 2022 (UTC)

Minecraft sense of creeper

Why was this closed as delete? It clearly passed. I extensively attested it, with over a decade's worth of cites, but certain people seem inclined to ignore evidence and policy changes as it suits them. I have a huge to-do list – a half-decade-old backlog of fandom and online slang – but clearly every contribution is building on sand. There isn't as much time for making headway in yet-to-be-completed tasks when one has to circle back to things one thought were done. I am tired of having to continually re-litigate matters that should be settled. I'm tired of jumping through prohibitive and seemingly unnecessary hoops only to have those labours deemed insufficient.

I learned to hunt for citations in obscure theses and dissertations in order to attest slang that's already in wide use in certain online spaces to the satisfaction of CFI hardliners. That didn't always work, though, for various reasons. (Academic interest does not always seem to mirror popular interest. I had better luck attesting incel slang under the old CFI framework than I did Star Wars fandom slang.) The point is that there's something like a three-to-five-year lag before new coinages filter into print, if they ever filter into print. For the past decade, the sole online resource for attestation was Usenet, a platform that's been defunct for much longer. It's still an invaluable resource for the documentation of language, but very few people are having conversations on Usenet in 2022. It was a much-needed breath of fresh air when the door finally opened to using Twitter for attestation earlier this year.

I fought for the amendment to CFI for years and now it is being blithely ignored. I went above and beyond in attesting creeper as a kind of preemptive fire-proofing (it only needed three cites spanning one year – I found sixteen spanning eleven years) and it was still deleted because some deletionist busybody had it in their crosshairs. I'm done. Nothing will ever be good enough and nothing will ever change. WordyAndNerdy (talk) 05:00, 20 September 2022 (UTC)

The problem does not seem to be with accepting Twitter as a source but rather with interpretation of WT:FICTION, per Talk:creeper. In that talk page's RFV archive, those disputing the quotations required "figurative" use, not mentioned in WT:FICTION. Furthermore, they seemed to fail to notice that simile ("like X") is in fact figurative use, per definitions in simile and also in figurative. Thus, 1) figurative use is not required; 2) figurative use was demonstrated. At least two people in the RFV in Talk:creeper opined that the quotations were no good, but that seems baseless policy-wise.

I would undo the RFV closure as incorrect given the above, but I don't want to get into a fight. It is essential that RFVs are closed in keeping with voted-on policy; this one wasn't, IMHO.

A vote explicitly adding Twitter as an accepted source to CFI would settle its status for future RFV discussions. --Dan Polansky (talk) 06:37, 20 September 2022 (UTC)

The definition "A mottled black, white and green enemy in the video game Minecraft, which attacks the player by chasing them and exploding" was fine, and helped the dictionary users understand the similes. By contrast, Wiktionary:Votes/2022-05/creeper validation mentioned "something that explodes violently and unexpectedly" as a definition, but that one was not supported by similes. It is the literal definition that is supported by similes, not the metaphorical one. People may complain that similes are not out-of-universe references, but Wiktionary:Criteria for inclusion/Fictional universes has the following as an acceptable example: "Wielding his flashlight like a lightsaber, Kyle sent golden shafts slicing through the swirling vapors". It would seem similes are out of universe, as voted on. --Dan Polansky (talk) 06:55, 20 September 2022 (UTC)

I've explained multiple times that the prevalence of simile citations is an artefact of tailoring searches to try to winnow out results for other senses. If creeper weren't a word with multiple and varied pre-existing senses, it probably would've been easier to find idiomatic Minecraft-inspired usages. As it stands, searching for "like a creeper" instead of "creeper" substantially improved the signal-to-noise ratio. It went from a vast unfiltered mess to a narrower pile of idiomatic references to stalkers, vines, and insects, in which there was at least the hope of catching an idiomatic reference to the Minecraft beastie. "Like a [something]" may be a blunt instrument for finding idiomatic usages, but it worked. It was a stone pickaxe that let me dig through dirt blocks to the big lava cave with diamonds in it.

I've also explained that I prefer idiomatic definitions of terms taken from works of fiction to unidiomatic ones (see, for example, Jabba the Hutt). This is how I've understood the CFI criteria for the inclusion of fiction-derived terms since I've been here. However, I'm not strongly attached to "my" way of doing things in this case, and I was fine with the definition remaining an unidiomatic description of the Minecraft monster. In fact, the definition that was removed was an unidiomatic description of the Minecraft monster. At any time, someone could've checked the definition that was actually in the entry, which isn't always the same as the definition on the citations page. At any time, if someone wasn't satisfied with the gathered citations, they could've dedicated an hour of their life to searching for cites as I did.

This whole experience has underscored my conclusion that some contributors want to dismantle areas of this wiki that offend their subjective sensibility of what a dictionary is and what it ought to contain. The RfV nomination for creeper went untouched for over a month after I announced that I'd gathered citations. When someone took notice of the discussion again, it was only to unilaterally throw up a roadblock. The point is that no citations, definition tweaks, or votes are going to satisfy someone who's got a word (or sometimes a whole category of words) in their proverbial crosshairs.

I'm tired of having to litigate things over and over. I'm tired of having to navigate seemingly arbitrary obstacle courses. I want to be able to get things done. WordyAndNerdy (talk) 11:25, 20 September 2022 (UTC)

Kiwima is an inclusionist (see her user page); she took significant initiative to get text added to CFI to allow terms like this AND to implement a process to get the term kept, although the latter regrettably didn't succeed. It might have felt like a roadblock, but I really doubt it was meant as one. This, that and the other (talk) 11:32, 20 September 2022 (UTC)

This was a most regrettable situation. The vote was set up in a confusing and arguably incorrect way, and this was partly my fault, so I apologise for that.

What has been said is very reasonable and I agree with much of it. I would only add this: cites from Twitter are somewhat problematic when it comes to WT:FICTION. The context of a tweet is informed by the tweets to which the author is responding, or other tweets in a thread. If some of these tweets were posted in the context of a discussion or thread about Minecraft, WT:FICTION is not satisfied. Without seeing the broader context of each of the tweets, it isn't possible to know this one way or the other. This is part of a broader issue with Twitter cites, where a citation consisting of a single tweet is not always enough to establish the sense being used - we need some kind of citation template allowing us to capture two or three tweets belonging to the same conversation.

(And then, yes, you have a few naysayers who refuse to accept that Twitter could possibly be a valid source of CFI-compliant citations, even when presented with 16 cites over 11 years. Even the OED cites Twitter; I have no idea what these people think a 21st-century dictionary should look like.)

More pertinently, we still haven't figured out a way to allow senses supported by non-classic non-durably-archived online cites to pass RFV - there are a couple of entries languishing at WT:RFVE right now that cannot be dealt with unless this issue is resolved one way or another. This, that and the other (talk) 11:32, 20 September 2022 (UTC)

I've always taken the "independent of reference to that universe" portion of WT:FICTION as stipulating that citations A) not be from the source works (e.g., the Star Wars films, novels, etc.) and B) feature idiomatic or genericized usages of the word. Lightsaber and warp speed are examples of genericized terms. Both can be traced back to specific works of fiction, but have since become generic words for futuristic tech, used in unrelated sci-fi works.

Things are a bit more complex when it comes to identifying idiomatic usages. There aren't extensive rules clearly delineating what does and does not constitute idiomaticity. Sometimes it seems to come down to an individual determination. I would consider the following idiomatic: "My brother is a real Han Solo. You know, like, from Star Wars? Always goes in with guns blazing." The key for me is that Han Solo is being used to convey meaning independent of the character's name. It's serving as a synonym for "go-getter" or "man of action." The reference to Star Wars explains the reference to Han Solo but doesn't strip the usage of its idiomaticity. CFI explicitly states that citations still count as uses even when they contain definitions of the terms they are attesting. "They raised the jib (a small sail forward of the mainsail)..." would be acceptable as a citation. Why selectively hold fiction-derived terms to a higher standard of attestation when it isn't stipulated by policy?

The issues you've raised regarding textual continuity aren't limited to Twitter. We typically only quote one or two sentences from books, magazines, newspapers, etc. That's not always a wide enough picture to provide all the context necessary to accurately determine intended meaning. But attestation can be a balancing act between multiple concerns. We cannot reproduce large portions of copyrighted texts for obvious reasons. I often find myself limiting quoted text to the minimum required to convey meaning. Reproducing entire Twitter threads could open up an additional can of worms. Beyond copyright issues, there are privacy concerns. We can justify quoting the tweeter using a term, but the two or three tweeters up- and downthread of them, probably less so. WordyAndNerdy (talk) 13:53, 20 September 2022 (UTC)

Should Scots be an LDL?

I've noticed that Scots is currently on our list of well-documented languages, which seems really out of place to me. Although it's a recognised minority language in Scotland and Northern Ireland, it is not used as a language of governance, speakers often code-switch with English (which leads to it being conflated with slang or colloquialisms), and only a very small number of materials are published in Scots in any given year. I really don't see why we should be holding it to a higher standard than, say, Welsh, which has a comparable number of speakers, but is actually used as a language of governance and has a far larger output in terms of durably archived material. Theknightwho (talk) 11:22, 20 September 2022 (UTC)

I certainly think so: from my understanding, Scots is a very oral language and is not generally recorded in writing or used for composition (Burns aside, of course). I'm happy to be corrected, but I think that the Scottish people have generally preferred to keep communication in Scots and even Scottish English as oral communication and save written communication for standard British English. —Justin (koavf)❤T☮C☺M☯ 11:30, 20 September 2022 (UTC)

Having grown up 30 miles from the border (on the English side), that is exactly my experience. A lot of Scots words are used in Northumberland, too, but you won't see them written down other than on social media or occasional quotes in newspapers. Theknightwho (talk) 11:34, 20 September 2022 (UTC)

Wiktionary:Beer_parlour/2020/November#Attestation_of_Scots For reference. Vininn126 (talk) 12:33, 20 September 2022 (UTC)

@Theknightwho, Vininn126: Indeed, in any case, as I became aware at some point recently but have not yet notified the community, we will have to remove it from the WDL list if Wiktionary:Votes/2022-08/Regional and Obsolete variations as LDL's passes, since the main reason for its presence on it (according to the discussions I have taken notice of but not partaken in, as I personally find no interest in Scots, so you should not suspect me of being biased in line with previously uttered stances) appears to be to prevent English terms from sneaking in, i.e. circumventing the attestation requirements by being passed off as Scots (however unbelievable a concern this sounds in 2022), but now we would be more supportive of regiolectal English anyway. Fay Freak (talk) 18:17, 22 September 2022 (UTC)

I did have a suspicion it would be that. In other words, not actually treating it as a separate language. Theknightwho (talk) 18:20, 22 September 2022 (UTC)

IMO, based on the prior discussion, and this one, it looks to me like we have broad support to stop classifying it as a WDL. (We previously removed Irish and Welsh by BP discussion.) Does anyone actually object, and think it is a WDL? - -sche (discuss) 19:07, 22 September 2022 (UTC)

A game/puzzle for you

It's my birthday today, so my gift to you is a lexicographic game/puzzle: come up with a list of loanwords into English, one for each letter, where each one comes from a different language and a different country. Full details here. :-) -Stelio (talk) 20:46, 20 September 2022 (UTC)

Happy birthday @Stelio!!! ‑‑Sarri.greek ^♫ I 23:37, 20 September 2022 (UTC)

Still trying to solve the 26 August Telegraph crossword, a very nasty one. But: I wish you 🎂. (Actually I've got a couple more accounting/finance terms to bother you with, but they haven't been a priority, lol.) Equinox ◑ 01:39, 21 September 2022 (UTC)

Subtractive Welsh Numbers

This should be of particular interest to Mahagaja, RichardW57, Llusiduonbach and Benwing2. Ignoring mutations, what lemmas and sublemmas should there be for the subtractive Welsh 'word' for '38', deugain namyn dau? When combined with a masculine noun with soft mutation xxx, softly mutated plural xxxau, one gets deugain namyn dau o xxxau or namyn dau xxx deugain. Should there be a lemma or non-lemma for the latter form, and if so, what? If the noun is feminine, dau is replaced by dwy. Should the feminine be a separate lemma?

If the feminine forms are non-lemmas, should they be listed in the number box generated by {{number box}}? Should invocations of {{number box}} be permitted on their pages? At present the principal feminine forms are generated and the number box is displayed on their pages. --RichardW57m (talk) 09:43, 21 September 2022 (UTC)

@RichardW57m I would follow whatever is currently being done. I think feminine forms are given as non-lemma forms ('numeral form') and have a number box displayed. Mutations would be non-lemma forms using 'mutated numeral' as the POS (see bymtheg for an example) and don't have a number box displayed. Usually feminine forms don't have number boxes displayed for other languages but Welsh feminine numerals seem to be rather irregular so it makes sense to me to have a number box. The mutations are largely predictable (once you know that there should be a particular type of mutation) so no need to have a number box for them. Benwing2 (talk) 20:02, 25 September 2022 (UTC)

@Benwing2: OK, so deugain namyn dau and deugain namyn dwy are one lemma, but both are ultimately to be able to invoke {{number box}}. What about namyn dau deugain? --RichardW57 (talk) 20:44, 25 September 2022 (UTC)

@RichardW57m IMO namyn dau deugain is a separate lemma with the same meaning; maybe it should be defined using {{alt form}} = {{alternative form of}}. Its feminine 'namyn dwy deugain' would be a non-lemma form of it. Benwing2 (talk) 21:23, 25 September 2022 (UTC)

Whether Reddit and Twitter are to be regarded as durably archived sources

According to WT:CFI, "Other online-only sources may also contribute towards attestation requirements if editors come to a consensus through a discussion lasting at least two weeks." As the issue of whether Reddit and Twitter should be regarded as durably archived sources has come up at "Wiktionary:Requests for verification/English#jogger" I am raising the matter for discussion, together with two other common online-only sources. I propose that the results of the discussion be recorded at "Wiktionary:Criteria for inclusion/Accepted online-only sources". — Sgconlaw (talk) 11:13, 21 September 2022 (UTC)

I feel that this should be an official vote for more visibility. AG202 (talk) 13:12, 21 September 2022 (UTC)

Let it run in BP for some time, and then make a vote to change CFI, putting the list directly to CFI instead of a subpage; no subpage is required for a short list of pre-approved sources. The CFI update can then be a mere formality, and RFVs can reference the RFV discussion until the mere formality finishes. Thank you for this effort, looks very useful and productive! --Dan Polansky (talk) 14:10, 21 September 2022 (UTC)

Do we really need a formal vote supported by at least a two-thirds majority for matters like this? WT:CFI only requires a consensus reached in a discussion lasting at least two weeks. I read this to mean the usual requirement of a simple majority. Deciding on which online-only sources are acceptable is not really a policy matter, and it may be cumbersome if we need to keep having formal two-thirds votes to approve sources. — Sgconlaw (talk) 14:33, 21 September 2022 (UTC)

"Consensus" is not a simple majority. I made a writeup in User talk:Dan Polansky § What is consensus?, citing sources about what "consensus" is; if you don't want to read my writeup, you can start at the OneLook Dictionary Search entry for “consensus”. In fact, 2/3 is not consensus either; a 2/3 threshold is already fairly low for "consensus", but very useful for practical purposes, since otherwise we would get very little approved. Deciding which online sources are acceptable is a policy matter; why shouldn't it be? --Dan Polansky (talk) 14:47, 21 September 2022 (UTC)

@Dan Polansky: I have not seen any general agreement that "consensus" without qualification (that is, other than in a formal vote at Wiktionary:Votes) means anything other than whatever 51% of the editors participating in the discussion agree on. — Sgconlaw (talk) 14:56, 21 September 2022 (UTC)

If you say "consensus" means "plain majority", you should prove that with reliable sources. The word "consensus" is defined in OneLook dictionaries; you can read them and tell us what you found. Or if Wiktionary community decided somewhere that "deciding by consensus" means "deciding by plain majority" here contrary to the general meaning of "consensus", you should probably point me to the place where this was decided. --Dan Polansky (talk) 15:01, 21 September 2022 (UTC)

I mean, the project can decide to decide certain things by plain majority, or perhaps 55% or 60% to have some stability buffer and prevent alternation, but that should not be called "consensus" any more since that is too far away from what the word means. One should say, such and such is decided by a discussion combined with a 60%-supermajority vote, or the like. --Dan Polansky (talk) 15:12, 21 September 2022 (UTC)

This is too important an issue to be left unclarified, because it throws all discussions into doubt. I have created a vote at "Wiktionary:Votes/pl-2022-09/Meaning of consensus for discussions other than formal votes created at Wiktionary:Votes". — Sgconlaw (talk) 16:22, 21 September 2022 (UTC)

This vote seems likely to fail since the RFD threshold vote opposers objected to having fixed thresholds. Some imply that we already use the strength of arguments in our RFD processes to some extent, whereas the above vote is about pure vote tallying. In any case, I recommend the OneLook Dictionary Search entry for “consensus” to find out what the word means. People cannot just make up meanings of words with no supporting evidence or references. --Dan Polansky (talk) 17:23, 21 September 2022 (UTC)

As far as I can tell, there is no consensus here for your view on consensus. Theknightwho (talk) 14:24, 22 September 2022 (UTC)

I say consensus is determined primarily from position tallying, and the bare supports below seem to agree with me. Sgconlaw above agrees with tallying but is for 50% threshold, and if he is right, the vote he created above will pass and will settle this unresolved problem. The bare supports seem to disagree with your views on consensus. --Dan Polansky (talk) 09:56, 23 September 2022 (UTC)

Wikimedia projects, which are mostly durably archived for copyright purposes, are banned as sources. Why should Twitter and Reddit, which appear to have all their problems and more, be allowed? RichardW57m (talk) 09:44, 23 September 2022 (UTC)

This need to perpetually re-litigate certain issues at the expense of more constructive focuses (we've already functionally weighed and voted on this) is why I'm hanging up my hat. Would be nice if the community decides not to trash my last six months of work but I have low hopes. 🤷‍♀️ WordyAndNerdy (talk) 15:24, 21 September 2022 (UTC)

This looks like a hopeful development, to prevent future discussions about acceptability of Twitter. It seems to be an implementation of the rules as newly specified in WT:CFI: "Other online-only sources may also contribute towards attestation requirements if editors come to a consensus through a discussion lasting at least two weeks." And this is the two-week discussion called for by the approved vote. It would have been more productive to have approved some sources straight away as part of the vote that approved the rule, but unfortunately it was not done, so there seems to be no better or more productive option. Arguing about Twitter in each RFV is not particularly productive. --Dan Polansky (talk) 15:33, 21 September 2022 (UTC)

I think it is better to record the results of such votes at "Wiktionary:Criteria for inclusion/Accepted online-only sources" or some other suitably named page rather than on WT:CFI itself, because there is every possibility that the whitelist will become quite long as more discussions take place. — Sgconlaw (talk) 17:27, 21 September 2022 (UTC)

I hate CFI subpages. When I arrive at a section of CFI, I want to read all that is to it, not click to learn more. Even if the list is long, so what? Quite a long comma-separated list can neatly fit a single paragraph. And it does not need to be very long: once you whitelist multiple online magazines, others can follow by extrapolation. Once you whitelist Twitter and Reddit, all sources are good anyway, I suppose. --Dan Polansky (talk) 17:31, 21 September 2022 (UTC)

@Dan Polansky: updating the main CFI page will require a formal vote each time; I'm not sure whether that's what people want to do but I'll go along with whatever editors are happy with. Apart from Reddit and Twitter (which I personally currently express no view on), I'd probably exclude most purely personal blogs and websites unless there are strong reasons to include them (for example, a particular website is owned by a notable person, or has become culturally significant as the first place a well-established term (like on fleek) was used). — Sgconlaw (talk) 17:39, 21 September 2022 (UTC)

Sure, but what is the difference between a 2-week BP discussion and a 2-week vote, other than a vote being more prominently on the radar? Not much, unless you pull it off by redefining "consensus" to mean "plain majority", which it does not.

Anyone can Tweet; creating a Twitter account is easy. Tweets are much more conversational than blogs usually are. Personal blogs are often better than Twitter in that they show the sentence as part of a whole paragraph, so the context can be analyzed. I see no reason to consider personal blogs worse than Twitter: the absence of copy editing is identical. --Dan Polansky (talk) 17:48, 21 September 2022 (UTC)

@Dan Polansky: I've created a straw poll below on what should be done after the conclusion of the discussion. — Sgconlaw (talk) 18:36, 21 September 2022 (UTC)

Won't bother voting on the individual sites, since it's inane, but "there are hurtful words there!" is not a good reason to ban a source of words and we shouldn't take it seriously. Equinox ◑ 19:07, 21 September 2022 (UTC)

After my last irate sabbatical, when I started creating entries attested with tweets, you left me a message saying it was "as if the stopper came out of the jar." This vote raises the very real prospect that all of the work I've done in the last few months will get binned. I'd like to continue contributing. But I won't be returning if this fails. I mean it this time. I've struggled (off and on) with the barely workable status quo for over a decade.

Every "oppose" vote is a millstone dragging this project further into irrelevance. (Well, except for Reddit. I understand the concerns being raised, even if, in this case, I voted in support.) Much-needed policy changes finally arrived, but now some want to turn back the clock. To what end, exactly? Why is a digital dictionary steadfastly clinging to paper a full 20 years into the decline of print newspapers? Why is the one exception to this print purism a platform that's been functionally defunct since the mid-2000s? Many of the issues raised about Twitter and Reddit apply equally to Usenet. No single source for citations is without drawbacks. Google Books limits page views, doesn't index a lot of characters, and older digitized texts are often scanno-ridden. But one learns to work with the available tools. Having more tools in one's toolbox certainly makes it easier to get the job done.

I don't like tooting my own horn. But I think I'm probably one of the more experienced editors when it comes to English attestation. I've learned to work with the quirks of various platforms (Google Books, Google Scholar, Issuu, Internet Archive) and with the odd strictures of the old CFI framework. But language is constantly outrunning our ability to document it. @BinaryStep's observation that the old CFI framework meant we don't have "any fandom slang newer than Buffy, The X-Files, and Star Trek" is particularly salient. I cut my teeth on those fandoms. I know it's been 20 years since their heyday. I've managed to attest newer fandom slang when it's been the focus of academic study (BBC Sherlock being a prominent example). But I haven't had much luck attesting Marvel fandom slang through the traditional routes despite the huge popularity of the MCU. This is just one example of the weird artificial gaps in linguistic coverage created by the old CFI framework.

I feel like I've written this post a dozen times. I don't know how many more times I can keep patiently (or not) making these arguments. I'm tired. WordyAndNerdy (talk) 18:38, 30 September 2022 (UTC)

Most of the above is dealt with in my lengthy support and oppose votes. The dirty bathwater is unknown and too much of a risk; the gate should be opened a little, not full with unknown consequences. Twitter can be allowed if the requirements are raised. Usenet counting as much as Google Books is a legacy burden, not a feature. Our CFI is no worse than M-W's or the OED's criteria; in fact, it is much more lenient. Voting in individual RFVs is still an option: if you collect 15 quotations spanning multiple years as you or someone else did recently, there is a chance people will support keeping, but I don't know. --Dan Polansky (talk) 18:55, 30 September 2022 (UTC)

You opposed Wiktionary:Votes/2021-09/New standard for archived quotations, which would have allowed any Internet content archived by the Wayback Machine. What makes you think Twitter is better than a random Internet website or discussion? --Dan Polansky (talk) 19:12, 30 September 2022 (UTC)

I opposed that proposal because I judged it would've opened the door to editors adding quotes from fringe websites like the Daily Stormer to entries (a thing I've actually seen happen). My concern was, and remains, more about platforming such websites. I've attested a ton of incel and GC slang from academic sources and Twitter. I'm not entirely uncomfortable with attesting objectionable language.

This proposal, in contrast, is more conservative in scope, and thus seems less likely to have unintended consequences. Reddit has systemic issues that have been discussed downthread. However, like Twitter, it's a social-media platform. It doesn't exist solely to propagate misinformation or hate, and it's a nominally moderated space, in the sense that it has terms of service (TOS) that users are supposed to abide by. One can thus more readily separate less-favourable citations from favourable ones. With Twitter, you can also search for tweets by minimum likes (add "min_faves:NUMBER" to the search). That's a quick way of filtering out spammy posts.

I honestly don't see why Twitter is being selectively held to a higher standard. We don't set these kinds of barriers to entry with print media. It would be entirely possible for me to attest something using two self-published books and a letter to the editor in a small-town circular. The one-year-span and independent-author provisions of CFI already provide a relatively good filter against attempts to game protologisms into mainspace. It really seems to me as though some editors are simply opposed to any and all change. WordyAndNerdy (talk) 21:15, 30 September 2022 (UTC)

Right: Twitter is nominally moderated and is a single site, and is therefore less of a risk than the whole archived Internet. But still, Twitter is not moderated for creative neologisms coined for fun, and that is what we are lexicographically concerned with, not hate speech per se. Citations:Turkroach shows example Twitter attestations; how is the use of Turkroach, likening an ethnic group to insects, not spreading hate, anyway?

Some sources indicate there were about 500 million tweets per day in 2022, a huge volume. Against that daily volume, the threshold of three quotations has a whole different meaning than it has for print publications. Hardly anyone can really know what sort of low-grade material could be attested three times across a year in that astronomical volume of tweets; random human sampling will achieve nothing. The OED requires evidence of "sufficiently sustained and widespread use", and whatever that means exactly, three independent tweets do not meet it.

Surely tweeting a would-be word or phrase into year-spanning, three-citation existence is astronomically easier and cheaper than trying to do the same via self-publishing in print. Even if you self-publish, someone has to scan the work to make it accessible to attesters online; we would not accept a quote traced to a print source without being able to verify its accuracy. Tweeting is an order of magnitude or two easier than self-publishing in print. --Dan Polansky (talk) 07:03, 1 October 2022 (UTC)

@Dan Polansky:

The dirty bathwater is unknown and too much of a risk; the gate should be opened a little, not full with unknown consequences.

So what? Taking risks is necessary sometimes. An unknown result is better than a system we already know is broken. What's actually the worst-case scenario here? We end up with a bunch of vulgar/offensive terms from social media sites? Even if your worst fears are correct, it's not like Wiktionary would somehow be permanently ruined by the result. Pages can always be deleted if necessary. The fact of the matter is, our current CFI does not work. It's time we stopped letting fear of change drive us into irrelevance.

Voting in individual RFVs is still an option: if you collect 15 quotations spanning multiple years as you or someone else did recently, there is a chance people will support keeping, but I don't know.

If this vote concludes with the decision to ban Reddit and/or Twitter, those sites will become permanently unusable. Under our current system, we're at least able to cite them on a case-by-case basis. Unfortunately, this proposal is all-or-nothing. Binarystep (talk) 14:16, 3 October 2022 (UTC)

@WordyAndNerdy I just feel like I've tried hard to find a middle ground, but I've unfortunately grown tired and frustrated. I've been in multiple conversations, pushed for the recent change to CFI to allow online sources in the first place, talked about how Usenet has been an issue in recent times, voted for entries like creeper, and so much more. But votes like these continue to make issues black-or-white, with no nuance in between, and I can't support them, knowing how I've been targeted and harassed and how this website has been made to harbor abhorrent content. I really wish that folks would learn from prior issues and conversations, try to find a middle ground, and make nuanced changes after discussion instead of holding votes like these, but unfortunately it doesn't seem like that'll ever change on this website. I'm really sorry that your work may be subject to future RFVs and deletion, and I really do support fandom slang and the like on the website. I really hope that one day we'll be able to work towards actually finding a middle ground. If this Twitter vote fails, I will likely start a vote to allow Twitter specifically for fandom slang and LDLs/dialects, to try to alleviate this issue. AG202 (talk) 20:09, 30 September 2022 (UTC)

Consensus is sometimes an imperfect tool for finding workable solutions. I feel like some of the voices weighing in here don't have a lot of practical experience with attestation. They thus seem to be setting unrealistic expectations and arbitrarily high standards. Every possible source has its own unique advantages and pitfalls. I suppose I'm the closest thing to a centrist on this site. I support disallowing links to websites that exist solely to propagate fringe and hate content. I think WT:DEROGATORY is a step in the right direction (although it could certainly be made a little more robust). But I don't count Usenet, Twitter, or Reddit as problem sites. They don't exist solely to propagate objectionable material. Treating them as if they do would be shooting ourselves in the foot. (Although I would be open to blacklisting certain problem subreddits.) I know this site is constitutionally resistant to change. But at the end of the day, it's a Wikimedia project. The laissez-faire spirit that informs Wikimedia will likely prevail in the end. It may be years down the road, but when the dust has settled, the pendulum will probably swing further than either of us wants, toward a framework where linking to the Daily Stormer is acceptable. This proposal is the middle-ground solution you are seeking. WordyAndNerdy (talk) 21:43, 30 September 2022 (UTC)

@AG202: as the person who created this discussion, I was following WT:CFI: "Other online-only sources may also contribute towards attestation requirements if editors come to a consensus through a discussion lasting at least two weeks." I took it to mean reaching a consensus on whether a particular source, like Usenet, should be whitelisted. If you are saying that such a discussion is not what this sentence means, then it seems to me that this part of WT:CFI needs to be clarified. Also, if you mean that postings on websites like Reddit and Twitter need to be considered individually, rather than each website as a whole being judged suitable or not, may I ask what criteria should be applied to decide this? — Sgconlaw (talk) 11:52, 1 October 2022 (UTC)

HuffPost

https://www.huffpost.com

HuffPost should be allowed

Support — Sgconlaw (talk) 11:13, 21 September 2022 (UTC)
Support - It appears to be archived at the Library of Congress News on the Internet Web Archive --RichardW57m (talk) 11:42, 21 September 2022 (UTC)
Support Binarystep (talk) 11:53, 21 September 2022 (UTC)
Support WordyAndNerdy (talk) 12:43, 21 September 2022 (UTC)
Support Acolyte of Ice (talk) 12:44, 21 September 2022 (UTC)
Support AG202 (talk) 13:17, 21 September 2022 (UTC)
~~Support~~ Looks good, and it is not user-edited. It may be a bit tabloid-like, but that's not too bad. Whether it is "durably archived" is immaterial; the criterion of "durably archived" is a surrogate for "copyedited". --Dan Polansky (talk) 14:14, 21 September 2022 (UTC)

Disagree. Being durably archived is a captured requirement; it was fortunate that the creation of durably archived material tended to imply being copy-edited. (I do agree that in general we have a potential problem with the lack of copy editing; it is fortunate that Wiktionary is not stored on paper.) --RichardW57m (talk) 09:32, 23 September 2022 (UTC)
The matter is debatable. The captured requirement in WT:CFI is "permanently recorded media", an unclear phrase. WT:CFI does mention "durably archived" in bullet items, though, implying that "permanently recorded media" is to be interpreted as "durably archived". That part of CFI is unfortunate.

This poll implements "Other online-only sources may also contribute towards attestation requirements if editors come to a consensus through a discussion lasting at least two weeks", which says nothing about "durably archived": was this the intent or an omission?

One may object that the above CFI text requires a discussion, not a poll with bare supports. One could decide to declare this poll invalid on that basis, but that would be very controversial, I suppose.

For better or worse, a lot of the discussion in this poll revolves around whether allowing the source loosens the criteria too much, not around "durably archived".

One may object that this poll is not properly conducted because the votes address concerns other than "durably archived". Maybe there is some merit to that. Maybe more people will agree with that concern; I don't know.

The situation is not ideal. Bare supports are not good. Overriding "durably archived" is no good if the intent was for BP discussions to interpret it. What can one do? --Dan Polansky (talk) 11:38, 23 September 2022 (UTC)
To me, the implication is that materials appearing online are not automatically 'permanently recorded', whatever that means. A sensible interpretation would be that they survive at least as long as Wiktionary and its adaptations. I've been told that my worries that old inscriptions might not survive were misplaced.

It does appear that editors may disregard the usual requirement for material to be permanently recorded. RichardW57m (talk) 12:29, 23 September 2022 (UTC)
Support, and while I don't think conditional voting is a thing, I would like to make clear that I am voting in support on the assumption that we are reasonably restricting what we mean when we say that a certain site is allowed: namely, that what is allowed from those sites is content they have "published", generally by selected contributors (paid or otherwise), and put through editorial review and approval. Not public blogs, not comment sections, not banner ads, not site source code, not tweets by the publication hosted elsewhere, etc. - TheDaveRoss 15:14, 29 September 2022 (UTC)
Support any website, including this one. We need more ability to add words, not more restriction. PseudoSkull (talk) 18:46, 21 October 2022 (UTC)

HuffPost should be rejected

Oppose. I oppose accepting individual sites and listing them anywhere on a per-site basis. That is bizarre and achieves nothing. The added value of HuffPost over print material in Google Books is minuscule: you won't be able to attest slang from it, and it is a single, English-only site. Make a general policy that online publications subject to editing are as acceptable as Google Books, and list some canonical examples, e.g. The New York Times and at least one tabloid, to show that being a tabloid is not an exclusion criterion. --Dan Polansky (talk) 07:05, 30 September 2022 (UTC)
I do not see why this is "bizarre". As mentioned at the top of this discussion, I created the discussion because the issue of whether Reddit and Twitter should be regarded as durably archived sources came up at "Wiktionary:Requests for verification/English#jogger", and according to WT:CFI, "Other online-only sources may also contribute towards attestation requirements if editors come to a consensus through a discussion lasting at least two weeks." If that sentence in WT:CFI does not mean discussing whether individual sites should be permitted as sources or not, then kindly start a formal vote at "Wiktionary:Votes" for that sentence to be amended. — Sgconlaw (talk) 12:45, 3 October 2022 (UTC)

Result

HuffPost should be regarded as a durably archived source (passed 8–1). — Sgconlaw (talk) 15:47, 28 October 2022 (UTC)

Reddit

https://www.reddit.com

Reddit should be allowed

Support Binarystep (talk) 11:53, 21 September 2022 (UTC)
Support WordyAndNerdy (talk) 12:43, 21 September 2022 (UTC)
Support Acolyte of Ice (talk) 12:44, 21 September 2022 (UTC)
Support Allahverdi Verdizade (talk) 15:28, 21 September 2022 (UTC)
Support Overlordnat1 (talk) 16:07, 21 September 2022 (UTC)
Support PseudoSkull (talk) 16:09, 21 September 2022 (UTC)
Support - excarnateSojourner (talk | contrib) 16:54, 23 September 2022 (UTC)
Support Ioaxxere (talk) 16:31, 7 October 2022 (UTC)
Support any website, including this one. We need more ability to add words, not more restriction. PseudoSkull (talk) 18:46, 21 October 2022 (UTC)

Reddit should be rejected

Oppose unless we refactor the CFI so that terms which are extremely rare (nonces, typos, etc.) are not kept without a compelling reason. - TheDaveRoss 12:53, 21 September 2022 (UTC)
We recently banned typos, and we already have hundreds of nonce words from legacy media, which haven't harmed our dictionary so far. Binarystep (talk) 13:05, 21 September 2022 (UTC)
I don't think that we should ban nonces; however, there is a huge difference between, say, dwimmer-crafty, which is used in an extremely widely read, edited book, and the sorts of nonces we have been seeing from internet-only sources, especially all of the crap generated by racists in their dark internet holes. The "3 cites and you are in" policy, applied without any actual editorial discretion, sees no difference, so I oppose removing the only mechanism that keeps editors from having to treat them as equivalent to an even greater degree. - TheDaveRoss 13:48, 21 September 2022 (UTC)
Oppose. I feel that the current issues we have with Usenet would be amplified if Reddit were allowed by default. There'd be nothing stopping IPs from going to the depths of Reddit and adding any word they see. Also, we'd need to address the issue of sockpuppets and overall spam/abuse on the website. AG202 (talk) 13:17, 21 September 2022 (UTC)
You're acting like Reddit's only good for citing slurs. There are plenty of extremely useful terms (especially LGBT, fandom, and subculture slang) that are easier to cite from there, since the subreddit format makes it easier to search specific communities than on other social media sites like Twitter. Reddit is essentially the modern version of Usenet, when you really get down to it. Binarystep (talk) 13:23, 21 September 2022 (UTC)
When did I say that it's only good for citing slurs. Please don't put words in my mouth. Yes, I'm aware that there are plenty of useful terms on Reddit, but I'd rather be safe than sorry, especially considering the issue of openly racist IPs and accounts creating terms here. If there were a bit more restriction on, maybe, for example, certain subs that terms can't be pulled from or heavily downvoted offensive comments can't be used, then I'd gladly support Reddit being used. But for now, the negatives are stronger than the positives for me. Also, as a side note, being the "modern version of Usenet" does not ring well with me at all. AG202 (talk) 13:46, 21 September 2022 (UTC)
When did I say that it's only good for citing slurs. Please don't put words in my mouth.

It's pretty obvious that the "issues with Usenet" you referenced were the number of Usenet-only slurs that got added by various racist IPs.

Yes, I'm aware that there are plenty of useful terms on Reddit, but I'd rather be safe than sorry, especially considering the issue of openly racist IPs and accounts creating terms here.

You're proposing that we throw the baby out with the bathwater because of entries created primarily, if not entirely, by one persistent racist troll. Even if we assume that bigoted terms will be added, the vast majority of entries would be beneficial. As it stands, slurs make up only 0.07% of English lemmas, even with our current policies allowing any term that gets used on Usenet three times. I don't see why we should ignore potentially hundreds of useful terms just to prevent a handful of offensive entries.

If there were a bit more restriction on, maybe, for example, certain subs that terms can't be pulled from or heavily downvoted offensive comments can't be used, then I'd gladly support Reddit being used.

Why shouldn't we document current racist terminology? I think it's beneficial if, for instance, non-native speakers (or people who aren't terminally online) know that jogger is used as an N-word substitute and thus don't end up taking BasedGod88's "I fucking hate joggers" post at face value.

I also don't recommend using downvotes as a way to determine whether a word is too offensive to include. There are plenty of reasons for comments to get downvoted, and posting the wrong content in an echo chamber (e.g. pro-trans content in /r/Conservative) essentially guarantees a negative response.

Also, as a side note, being the "modern version of Usenet" does not ring well with me at all.

There's more to Usenet than slurs. Binarystep (talk) 14:13, 21 September 2022 (UTC)
This is not a 1/0 issue. There can be balance between banning Reddit entirely and including Reddit completely. It hasn’t been one IP, there’ve been multiple issues with multiple people, as I’ve told you before. For reasons I’ve also explained before, I don’t want Wiktionary to be a bastion of nonce slurs that wouldn’t otherwise gain traction. The downvote suggestion was just a suggestion. There are multiple options that could be pursued such as increasing the citation count. You don’t agree with that, and that’s fine; you’ve already voted in support of this. At this point, I cannot support it without some kind of alteration. I truly do not want to get into this with you again, because it will only take up unneeded space. Please just let me vote the way I want to without us having repeated discussions over the same topic that don’t lead anywhere useful. AG202 (talk) 16:56, 21 September 2022 (UTC)
To be fair here, AG202 does have a point. Reddit is notorious in some circles for being a 'wretched hive of scum and villainy' (although it also houses quite inoffensive communities). See Controversial Reddit communities for a rundown. I think the admission of any site based around user-generated content will run into similar issues, however. 70.172.194.25 18:27, 21 September 2022 (UTC)
Thank you. There used to be (and probably still are) whole communities who want people like me dead. I don't want to give them any more space or credence. AG202 (talk) 18:29, 21 September 2022 (UTC)

@AG202:

This is not a 1/0 issue. There can be balance between banning Reddit entirely and including Reddit completely.

This proposal is, unfortunately, all-or-nothing. You're voting to completely ban Reddit (thus preventing countless good entries) just so we can avoid citing a handful of offensive terms.

It hasn’t been one IP, there’ve been multiple issues with multiple people, as I’ve told you before.

How many racist trolls do we have? Everything I've seen was clearly from the same person, since their terms came from the same obscure racist forum (Chimpmania) and they have a very distinct editing style.

For reasons I’ve also explained before, I don’t want Wiktionary to be a bastion of nonce slurs that wouldn’t otherwise gain traction.

It's not our job as a descriptivist dictionary to police the English language.

The downvote suggestion was just a suggestion. There are multiple options that could be pursued such as increasing the citation count.

Increasing the citation count is something I'd be more inclined to support, though I still think websites shouldn't be treated differently from legacy sources. Binarystep (talk) 02:13, 22 September 2022 (UTC)

I feel like you are absolutely missing the point of the argument. Dave mentioned earlier that the policy of "three times and you're in" isn't that useful in some contexts. We face the same problems with Reddit as we do with Usenet. Vininn126 (talk) 19:28, 21 September 2022 (UTC)

Just for more clarity, I personally do not want Reddit completely banned. I voted for the initial change to CFI to allow web-only sources. I just feel that there's no option here that gives me a middle ground where I can comfortably support Reddit's inclusion without having to worry about more racists and racist attacks on editors here. I still maintain that our community suffers because of issues like this, and we've lost editors because of it. There has to be a better middle ground. AG202 (talk) 19:17, 21 September 2022 (UTC)
@AG202: The problem is that you're voting to completely ban Reddit. This proposal doesn't allow for a middle ground, unfortunately. Binarystep (talk) 01:52, 22 September 2022 (UTC)
To me, this current proposal doesn't mean that Reddit would be entirely banned, and I feel that it'd be much, much easier to argue for more restrictions on Reddit from a position where it's not already completely allowed than to try to go "backwards", which, as we're currently seeing with Usenet, is pretty much impossible for various reasons. (CC: @Dan Polansky) AG202 (talk) 03:48, 22 September 2022 (UTC)
@AG202: I already asked the proposer about this (see Wiktionary:Requests for verification/English#jogger), and this is indeed a vote to completely ban Reddit. Binarystep (talk) 01:28, 24 September 2022 (UTC)
Oppose. I think the fact that there is little to no moderation makes the site vulnerable to being used to game the system – a number of users intent on getting a term into Wiktionary could cook up a number of posts using the term, and then use that as the "evidence" for inclusion of the term. — Sgconlaw (talk) 18:59, 21 September 2022 (UTC)
I do tend to agree with this as well. Reddit has significantly more use than Usenet, and while that's a good thing in some respects, it could lead to cases where specific subreddits try to game the system to make sure that their terms are added to the website. Something similar happened recently with Petersonian. And while we could perhaps combat that through RFD, I don't see that ending positively at all. AG202 (talk) 19:20, 21 September 2022 (UTC)
@AG202: No one "gamed the system" to get Petersonian added. The term was attested in print media. The worst thing that happened was a bunch of JP fans celebrating the word's inclusion on their subreddit. Binarystep (talk) 02:04, 22 September 2022 (UTC)
Which is why I said something ~similar~. Though it was attested in print media, its inclusion did lend it legitimacy, and I can see that getting much, much worse, knowing how certain sects of Reddit are. I'm sure that there are people reading this very discussion for that reason as well. AG202 (talk) 03:43, 22 September 2022 (UTC)

This is a good argument. However: 1) Usenet seems not much better as far as gameability goes. 2) Is Reddit worse than Twitter? Does Twitter have stronger moderation rules than Reddit? (If so, I might be inclined to oppose Reddit but not Twitter, striking a middle ground there.) --Dan Polansky (talk) 19:27, 21 September 2022 (UTC)
@Dan Polansky: The problem with Usenet is that it's already grandfathered in. There's opposition to removing it, and for better or worse we're stuck with it. On that note, though, just because we have it doesn't mean that we should add another, more notorious version of it with little moderation. I do find Reddit a bit more problematic than Twitter, for the reasons that 70 mentioned, plus the fact that moderation is left almost entirely to each subreddit's own moderators, and that it's much, much easier to create echo chambers and communities that hyperfocus on things like racism. In the same vein, a sub like that can have openly racist moderators who delete anything that doesn't align with that view (or limit posting to users who are verified or have a flair or whatever), whereas on Twitter generally anyone can reply, or at least quote-tweet if the original tweeter turns off replies. Reddit is also anonymous by default unless a user chooses otherwise, which leads to more... choice comments, from what I've seen. Overall, while Twitter isn't perfect, Reddit's issues show up much, much more often, and that's why I believe it should be taken more seriously. AG202 (talk) 19:38, 21 September 2022 (UTC)
Thank you for that; it makes a lot of sense. Yes, Usenet is there and can't be removed without consensus. I'll watch the discussion develop and quite possibly switch to opposing Reddit. --Dan Polansky (talk) 19:41, 21 September 2022 (UTC)
Yep, no problem! AG202 (talk) 19:43, 21 September 2022 (UTC)

To play devil's advocate, isn't Reddit then better at documenting real spoken linguistic phenomena, because it is de facto not censored? Like, imagine an almighty omnicorpus of all utterances ever made, which we would be mining, maybe post mortem of a civilization past. We would not limit ourselves to three quotations, but whatever the count, we would run into all sorts of objectionable language that people use when no one can hear and reproach them. Isn't Reddit better at capturing that, and isn't that in part why it is more objectionable, by showing more of the things we do not want to know about? As far as pure, disinterested knowledge from a Martian perspective goes, isn't that what makes Reddit valuable? --Dan Polansky (talk) 19:49, 21 September 2022 (UTC)
@Dan Polansky I guess it depends on what the goal of Wiktionary is for you. (Also, Reddit is ~censored~ in some respects, though not in the sense in which people typically [mis]use that term.) Is our ultimate goal to allow any and every term on the website? Why do we have CFI in the first place? Are we comfortable giving credence and legitimacy to groups that otherwise wouldn't have it? How does this affect groups of editors who'd otherwise participate for languages that aren't covered currently? Overall, it comes down to where the line is drawn. AG202 (talk) 20:09, 21 September 2022 (UTC)
We need some evidence-based attestation criteria, or else it would all come down to opinions and votes on protologisms or putative words, which would be woefully hard to administer. So CFI is required regardless. As for the rest, that's a larger discussion that already took place elsewhere, so I won't dig into it much. I just brought forth the Martian perspective: the search for knowledge of things lexical, even obnoxious things lexical. The other perspective is more about social needs. One may note that things have changed in lexicography quite a bit: the f-word used not to be in respectable dictionaries, and now it is. Things have shifted away from prescription and offense avoidance toward description, even of things proscribed, nonstandard, offensive, derogatory, etc. --Dan Polansky (talk) 20:19, 21 September 2022 (UTC)

@AG202:

Is our ultimate goal to allow any and every term on the website?

According to the front page, it is, though many have chosen to ignore it.

Why do we have CFI in the first place?

To prevent nonexistent terms, SOP entries, and encyclopedic material from being added? There are plenty of reasons to use CFI that have nothing to do with gatekeeping, prescriptivism, or censorship.

Are we comfortable giving credence and legitimacy to groups that otherwise wouldn't have it?

Does our inclusion of the N-word legitimize racism, or does this only apply to terms coined in the 21st century? I really don't understand why you feel that acknowledging a word's existence is the same thing as praising it, especially when we already have countless entries for terms used by reprehensible ideologies, all of which are cited in "durably archived" media. Reddit isn't more racist than Usenet, and it certainly isn't more racist than 19th- and 20th-century books. Binarystep (talk) 02:33, 22 September 2022 (UTC)

@Sgconlaw: The English language doesn't have moderation. Our job as a descriptivist dictionary is to document language as it is used, not how it "should be" used. The scenario you've described is extremely unlikely (not a lot of people on Reddit really care about Wiktionary all that much), and could already be done with Usenet. This is pure fearmongering. Binarystep (talk) 01:58, 22 September 2022 (UTC)
Oppose Reddit and Twitter for reasons I've discussed previously at length, which have to do with how gameable such sites are. News sites with editors and copyeditors and usually-real writers, great; social media where it takes ten minutes to make three accounts and tweet/comment your new made-up attack term into existence, no. (I look forward to the day when trolls realize that with a couple tweets and a phone alarm to remind 'em to send the third tweet a year later, they can get targeted harassment coinages in, e.g. using the name of a classmate they dislike as an adjective meaning ugly or a verb meaning to fuck up.) If we could resolve that issue, Twitter (at least, whether or not Reddit) could be very useful, as it's full of dialectal and colloquial speech, but some of the very people who want these sites, and are bludgeoning this discussion accusing those who don't want to add them (because of these issues) of blocking good content / throwing the baby out with the bathwater, themselves seem to want to use the existence of good content on the sites to camel's nose the gamed/fake/bad content in, inasmuch as they've opposed efforts to raise standards against it. (The fact that people could do this with Usenet is a problem, a bug, not a feature, not a door we should throw open even wider.) - -sche (discuss) 23:50, 22 September 2022 (UTC)
Thank you for this. I've been struggling to express how I've felt about some of these discussions, but you've expressed it much better than I could. I've tried in the past to search for a compromise on issues like this and push for a helpful solution, but it's always 1/0 and I don't feel like I'm truly listened to sometimes. Regardless of the result, I hope that folks can work towards some kind of middle ground that works for as many people as possible without alienating helpful groups of people. AG202 (talk) 04:14, 23 September 2022 (UTC)

@-sche:

Reddit and Twitter for reasons I've discussed previously at length, which have to do with how gameable such sites are.

That's not a real issue. It's a hypothetical worst-case scenario, and one that we could easily prevent.

News sites with editors and copyeditors and usually-real writers, great; social media where it takes ten minutes to make three accounts and tweet/comment your new made-up attack term into existence, no.

This prevents us from accurately documenting the overwhelming majority of internet slang, especially fandom/subculture slang that has zero chance of being used in a professional publication.

(I look forward to the day when trolls realize that with a couple tweets and a phone alarm to remind 'em to send the third tweet a year later, they can get targeted harassment coinages in, e.g. using the name of a classmate they dislike as an adjective meaning ugly or a verb meaning to fuck up.)

Or we could use common sense and not allow terms referring to people who aren't public figures. We already deleted Big Red for that exact reason despite it being attested in "durably archived" sources.

If we could resolve that issue, Twitter (at least, whether or not Reddit) could be very useful, as it's full of dialectal and colloquial speech, but some of the very people who want these sites, and are bludgeoning this discussion accusing those who don't want to add them (because of these issues) of blocking good content / throwing the baby out with the bathwater, themselves seem to want to use the existence of good content on the sites to camel's nose the gamed/fake/bad content in, inasmuch as they've opposed efforts to raise standards against it.

Nothing was incorrect about what I said. Throwing out potentially hundreds (if not thousands, given that internet slang exists in all languages) of useful terms to preemptively block a small handful of trolls is, by definition, throwing the baby out with the bathwater. As I said above, there is a way to resolve this issue, which we've already done in the past.

I also don't appreciate your baseless accusation that your opposition is trying to force "gamed/fake/bad content" into the dictionary, especially since you quoted one of my comments in the process. My only motivation in this vote is documenting internet slang and fandom/subculture terminology, not whatever you're trying to imply.

(The fact that people could do this with Usenet is a problem, a bug, not a feature, not a door we should throw open even wider.)

And would your solution be to ban Usenet as well and prevent us from documenting any internet/fandom slang that hasn't made it into books? I would hope not. Binarystep (talk) 01:27, 24 September 2022 (UTC)
Oppose I've been using Reddit for 17 years and I routinely see content deleted there and it's fairly common for users to make accounts and delete them after x months. I do not think the content here is durable and actually possibly less so than most websites. —Justin (koavf)❤T☮C☺M☯ 04:41, 23 September 2022 (UTC)
Coincidentally, I happened across someone in the Firefox subreddit just a few minutes ago who wrote "[I've] been on Reddit since close to the beginning... I change my account every year or two." (Although, unfortunately, he seems painfully ignorant of how the site is supposed to work, but that's an aside.) This is pretty common and I imagine it will only become more so. I'm inclined against regarding any self-published outlet that also allows you to delete or edit your content as a durably archived source, by its very nature. —Justin (koavf)❤T☮C☺M☯ 23:59, 23 September 2022 (UTC)

@Koavf: There are unofficial Reddit archives now, like unddit.com; I don't know for how long anything is stored there or whether it's going to continue existing, though. Thadh (talk) 00:57, 24 September 2022 (UTC)

@Koavf: Internet Archive exists. I don't see why you're focused on whether the live version of the page stays the same forever, when that's not true of any website. Binarystep (talk) 01:12, 24 September 2022 (UTC)
"Durable" doesn't mean "infinite": I'm concerned with things that are more or less likely to be around for a bit and I don't think Reddit comments are liable to be accessible. —Justin (koavf)❤T☮C☺M☯ 01:27, 24 September 2022 (UTC)
@Koavf: They're accessible through the Wayback Machine. Is that not good enough? Is it absolutely necessary that the live version of a Reddit comment stay unmodified for 10+ years? Binarystep (talk) 01:31, 24 September 2022 (UTC)
No. Yes. —Justin (koavf)❤T☮C☺M☯ 01:36, 24 September 2022 (UTC)
@Koavf: Why? The Internet Archive's backup of a Reddit comment isn't somehow less "real" than the live version. Binarystep (talk) 01:37, 24 September 2022 (UTC)
It may not capture everything, it can be taken down with a Robots.txt file, Internet Archive could itself go down, etc. If anything that could be on the Internet Archive or similar archiving sites could be considered durably archived, then the entire web could be durably archived and the discussion of a single site is redundant. —Justin (koavf)❤T☮C☺M☯ 01:51, 24 September 2022 (UTC)
@Koavf:

It may not capture everything,

This is demonstrably false.

it can be taken down with a Robots.txt file,

Google takes down archives of entire newsgroups if their content is deemed objectionable, yet we're still allowed to link to them. Reddit also doesn't use robots.txt, there's no reason to assume they will, and IA no longer respects robots.txt in the first place.

Internet Archive could itself go down, etc.

The Internet Archive has been around longer than we have. What makes you think we'll outlive them? This is purely baseless fearmongering.

You're also the one who said this earlier:

"Durable" doesn't mean "infinite": I'm concerned with things that are more or less likely to be around for a bit

Do you really think IA won't even be around in a few years?

If anything that could be on the Internet Archive or similar archiving sites could be considered durably archived, then the entire web could be durably archived and the discussion of a single site is redundant.

I fail to see the problem with this. It's time we stop trying to gatekeep language. Our "durably archived source" policy is a euphemism for only allowing "professional" sources (just look at all the discussions about whether self-published books should be allowed, even though they're hardly less "durable"), and primarily exists as a way to keep our coverage from being "tainted" by Those Damn Kids And Their Dumb Kid Words™. It's elitism at its finest, and it directly contradicts our "all words in all languages" motto. Binarystep (talk) 02:15, 24 September 2022 (UTC)
"This is demonstrably false." Please demonstrate how the Internet Archive captures the private comments at /r/Lounge. Please also demonstrate how the Internet Archive captures all comments from the Reddit API that are deleted by /u/Automoderator. I know for a fact that you won't do this, but I would love to see how you would.

"Google takes down archives of entire newsgroups if their content is deemed objectionable, yet we're still allowed to link to them." Okay. w:en:WP:OTHERSTUFFEXISTS.

"Reddit also doesn't use robots.txt" Yes it does: https://www.reddit.com/robots.txt Did you even check this URI before you made this comment?

"IA no longer respects robots.txt in the first place." They changed their policy, it can change again.

"What makes you think we'll outlive them?" Wiktionary is mirrored and archived several places and is orders of magnitude smaller and in no way in danger of the litany of copyright and illegal content issues that Internet Archive is.

"This is purely baseless fearmongering." No, it's not.

"Do you really think IA won't even be around in a few years?" I do, but I also think it's very vulnerable to legal challenges, outages, and loss of funding.

"I fail to see the problem with this." The problem is discussing every website individually and wasting time instead of discussing all of the archived Web at once. It's fine if you want to propose "all sites on Internet Archive are durable" and we can have that discussion. It's pointless to discuss every single one.

"It's time we stop trying to gatekeep language." What? I didn't.

"Our "durably archived source" policy is a euphemism for only allowing "professional" sources" No, it's not: Usenet is not a professional source.

"just look at all the discussions about whether self-published books should be allowed, even though they're hardly less 'durable'." There are books that have lasted for millennia, but no website has. When a website lasts for 5,000 years, let me know.

"It's elitism at its finest, and it directly contradicts our "all words in all languages" motto." This is purely baseless fearmongering. —Justin (koavf)❤T☮C☺M☯ 02:34, 24 September 2022 (UTC)

@Koavf:

Please demonstrate how the Internet Archive captures the private comments at /r/Lounge. Please also demonstrate how the Internet Archive captures all comments from the Reddit API that are deleted by /u/Automoderator. I know for a fact that you won't do this, but I would love to see how you would.

So because IA doesn't have literally every Reddit comment ever typed, none of its archives can be trusted? My point is that every comment that can be archived should be considered "durably archived" by our standards. Otherwise, you might as well use lost media as a justification to not cite books.

Okay. w:en:WP:OTHERSTUFFEXISTS.

There's a difference between whataboutism and pointing out a relevant inconsistency in how our policies are applied.

Yes it does: https://www.reddit.com/robots.txt Did you even check this URI before you made this comment?

So I was wrong about that. It still doesn't matter, given that it hasn't stopped me from archiving Reddit comments in IA with zero issues.

Wiktionary is mirrored and archived several places and is orders of magnitude smaller and in no way in danger of the litany of copyright and illegal content issues that Internet Archive is.

Reddit comments can also be (and often are) archived in multiple places.

They changed their policy, it can change again.

That's speculative.

No, it's not.

How isn't it? There's absolutely no reason to assume that IA is going to shut down anytime soon.

I do, but I also think it's very vulnerable to legal challenges, outages, and loss of funding.

There isn't enough evidence to immediately jump to the worst possible conclusion. IA's been around for over 25 years, and I'm confident they'll be around for 25 more.

The problem is discussing every website individually and wasting time instead of discussing all of the archived Web at once. It's fine if you want to propose "all sites on Internet Archive are durable" and we can have that discussion. It's pointless to discuss every single one.

Fair point, but I'm not the one who started the proposal, so all I can really do here is support the option closest to what I want, even if it's not entirely right.

What? I didn't.

Apologies for the assumption, then.

No, it's not: Usenet is not a professional source.

Usenet is the exception, and there are plenty of people who want it to be less accepted as a source despite its "durability".

There are books that have lasted for millennia, but no website has. When a website lasts for 5,000 years, let me know.

The internet itself hasn't been around for 5,000 years, so this is a pointless argument, and one that again contradicts your assertion that "durable" doesn't mean "forever". I'm sure many websites will remain archived in the future, though, since they're arguably comparable to historical texts.

It's not like books haven't been lost before, either. I personally own a copy of an extremely rare book from the 1940s (only 3 hits for the title on Google, and no scans exist), yet it would presumably be considered "durably archived" under our current CFI.

This is purely baseless fearmongering.

How is it fearmongering? There have been multiple failed attempts to ban fandom slang for being too niche (even to the point of trying to reinterpret WT:FICTION), and plenty of entries have been deleted despite being CFI-compliant, often citing nonexistent policies as a justification ("no nicknames of individuals", "non-notable subject"). I've seen users argue against web citations on the basis of not wanting to document terms from "lower registers", and there was a recent proposal to ban certain terms on the basis of their offensiveness. Binarystep (talk) 03:19, 24 September 2022 (UTC)
"So because IA doesn't have literally every Reddit comment ever typed, none of its archives can be trusted?" I didn't write that. I'm just refuting the misinformation that you wrote. I never wrote that Internet Archive is untrustworthy.

"My point is that every comment that can be archived should be considered "durably archived" by our standards." I think you should make this case for all sites indexed by Internet Archive, then. I think that would be a very fine conversation to have about our policy.

"So I was wrong about that. It still doesn't matter, given that it hasn't stopped me from archiving Reddit comments in IA with zero issues." So far. Until/unless the policy changes again.

"Reddit comments can also be (and often are) archived in multiple places." As much as Wiktionary?

"'They changed their policy, it can change again.' That's speculative." Neither of those claims is speculative and both are true.

"How isn't it? There's absolutely no reason to assume that IA is going to shut down anytime soon." I told you: greater amounts of data, funding issues, outages (which I have personally experienced), copyright issues, legal issues.

"There isn't enough evidence to immediately jump to the worst possible conclusion." And I didn't.

"Apologies for the assumption, then." Thanks so much: always good to talk collaboratively.

"The internet itself hasn't been around for 5,000 years, so this is a pointless argument, and one that again contradicts your assertion that "durable" doesn't mean "forever". I'm sure many websites will remain archived in the future, though, since they're arguably comparable to historical texts." Probably, but maybe not. A magnetic flare from the Sun could instantly destroy most digital media. You can't do that with most books. Durability is a flexible and subjective standard. I'm giving you my personal standard. I appreciate that yours are different than mine and you have valid reasons for believing what you believe. I just don't think that social media sites are durable ones: they frequently disappear (e.g. all of Friendster, virtually all of MySpace's music), are subject to the same users deleting the content, and don't represent what is most likely to be around in a decade or two or 10. Tweets archived by the Library of Congress are a different thing, hence I didn't comment on the Twitter proposal.

"It's not like books haven't been lost before, either. I personally own a copy of an extremely rare book from the 1940s (only 3 hits for the title on Google, and no scans exist), yet it would presumably be considered "durably archived" under our current CFI." Exactly: it's still here and probably will be for several years: do you plan to burn your copy? Someone could flip a switch and destroy all of Reddit.

"How is it fearmongering?" You are making much grander and at best, tangential arguments that are not what we're discussing here. I'm talking about Reddit, which I don't feel confident will be durably archived in the future.

"There have been multiple failed attempts to ban fandom slang for being too niche (even to the point of trying to reinterpret WT:FICTION), and plenty of entries have been deleted despite being CFI-compliant, often citing nonexistent policies as a justification ("no nicknames of individuals", "non-notable subject")." I imagine that we're actually in much more accord here than you would suspect. I am very much in favor of using Urban Dictionary as a source for slang, for instance, and I agree that there should not be gatekeeping of "lo" versus "hi" culture, particularly for language: Wiktionary would be much better for having more colloquial language. If you see similar attempts to get rid of vulgar/popular speech, let me know. —Justin (koavf)❤T☮C☺M☯ 03:37, 24 September 2022 (UTC)

@Koavf:

I didn't write that. I'm just refuting the misinformation that you wrote. I never wrote that Internet Archive is untrustworthy.

You were saying IA isn't a good/reliable archive because it doesn't have everything. I apologize if I misunderstood you, but I legitimately don't see how I could have in this case.

I think you should make this case for all sites indexed by Internet Archive, then. I think that would be a very fine conversation to have about our policy.

I agree, but I know a lot of people wouldn't be in favor of that. It's much easier to accept a handful of well-known websites for the time being.

So far. Until/unless the policy changes again.

Why should we assume the worst in this case?

As much as Wiktionary?

I don't know, I've never done a comparison. I don't think it matters which site is more thoroughly archived, though, what matters is that both sites are backed up to a significant degree.

Neither of those claims is speculative and both are true.

How isn't it speculative? IA has no reason to start respecting robots.txt again (it's not illegal for them to ignore it, after all), and doing so would only piss people off.

I told you: greater amounts of data, funding issues, outages (which I have personally experienced), copyright issues, legal issues.

I'm aware of the issues IA's faced, but they've managed to survive for over 25 years despite them, and without sufficient proof, I have no reason to assume that's going to change in the near future.

And I didn't.

You're acting on the assumption that IA will vanish from the internet in a few years.

A magnetic flare from the Sun could instantly destroy most digital media.

This is exactly why durability doesn't matter, in my opinion. Wiktionary itself is no more durable than Reddit or the Internet Archive, so I fail to see why we should hold those sites to a higher standard.

You can't do that with most books.

Most books, sure, but the existence of lost media proves that books aren't exactly durable either.

I just don't think that social media sites are durable ones: they frequently disappear (e.g. all of Friendster, virtually all of MySpace's music), are subject to the same users deleting the content, and don't represent what is most likely to be around in a decade or two or 10.

The problem is that social media sites are a key method of communication, and thus are often the only way to document modern slang terms, especially ones only used within certain fandoms or subcultures. By excluding them on the basis of durability, we're preventing ourselves from accurately documenting language in the 21st century.

Exactly: it's still here and probably will be for several years: do you plan to burn your copy?

I don't, but what if I did? What if it was stolen? What if I lost it? What if I died and my family sold it or gave it away? Ironically, the most durable way I could preserve this book would be to scan it and post it online. A single copy of a book could actually be less likely to exist in a few years than an ebook archived on numerous websites.

Someone could flip a switch and destroy all of Reddit.

The same goes for Wiktionary.

I imagine that we're actually in much more accord here than you would suspect. I am very much in favor of using Urban Dictionary as a source for slang, for instance and I agree that there should not be gatekeeping of "lo" versus "hi" culture, particularly for language: Wiktionary would be much better for having more colloquial language.

I'm glad we're in agreement. My question then is how do you plan to cover such slang if you think we shouldn't be allowed to cite the only places where it's used?

If you see similar attempts to get rid of vulgar/popular speech, let me know.

Will do. Binarystep (talk) 00:44, 26 September 2022 (UTC)
"It's much easier to accept a handful of well-known websites for the time being." It's odd to me that you would write this, as it appears pretty similar to the elitism that you disliked. Why does being well-known matter?

"How isn't it speculative?" I wrote that they changed their policy (not speculation) and it can change again (not speculation). I never wrote that they will.

"Wiktionary itself is no more durable than Reddit or the Internet Archive, so I fail to see why we should hold those sites to a higher standard." Some sites are stored on paper or nickel plates or with redundant mirrors. Those are more durable than other sites.

"The problem is that social media sites are a key method of communication, and thus are often the only way to document modern slang terms, especially ones only used within certain fandoms or subcultures. By excluding them on the basis of durability, we're preventing ourselves from accurately documenting language in the 21st century... My question then is how do you plan to cover such slang if you think we shouldn't be allowed to cite the only places where it's used?" Yes, but we could have a standard similar to de.wq, where you need something quoted by an additional source and not just the primary source to be notable, so if/when another outlet quotes the language on a message board, then it can be included. I agree that it's a problem to miss out on all slang and vernacular language. I don't know what the solution is.
—Justin (koavf)❤T☮C☺M☯ 01:44, 26 September 2022 (UTC)
@Koavf:

It's odd to me that you would write this, as it appears pretty similar to the elitism that you disliked. Why does being well-known matter?

To be clear, I don't think it matters whether a website is well-known or not. The problem is that other users may be reluctant to cite obscure websites and blog posts, so I'm being pragmatic here and trying to come up with a good compromise, since a vote to allow all websites as sources would be far more likely to fail. Allowing cites from Reddit and Twitter specifically, while not perfect, is better than not allowing internet cites at all.

Yes, but we could have a standard similar to de.wq, where you need something quoted by an additional source and not just the primary source to be notable, so if/when another outlet quotes the language on a message board, then it can be included.

That would leave us with only the most popular/well-known terms. It would also gut our coverage of fandom and subculture slang, which is inherently less likely to become mainstream.

I agree that it's a problem to miss out on all slang and vernacular language. I don't know what the solution is.

The only viable solution is to allow citations from social media sites. We've already tried the "durable sources only" approach, and the result was a dictionary that didn't include any fandom slang newer than Buffy, The X-Files, and Star Trek. The more sources we exclude, the worse our coverage is, and with Usenet no longer being popular, we're at a dead end as far as internet slang goes. If nothing changes, the gaps in our coverage will only widen over the coming decades.

If nothing else, I don't think we should immediately ban Reddit so soon after we finally started allowing internet citations on a case-by-case basis. The least we could do is let the previous vote play out before trying to reverse it. Binarystep (talk) 09:36, 26 September 2022 (UTC)
Oppose Per all of the above — SURJECTION / T / C / L / 05:10, 23 September 2022 (UTC)
Oppose Vininn126 (talk) 00:14, 24 September 2022 (UTC)
Oppose per my post about Twitter: first raise the standard for non-copy-edited content to open the gate just a little for a start (5 quotations to count as 1, spanning 3 years, or the like), and then allow all Wayback-Machine-archived content. We won't have to vote on individual sites: there are so many of them, anyway, why pick some, one at a time. --Dan Polansky (talk) 06:54, 30 September 2022 (UTC)
Oppose Reddit is in English except on the language- and country-related subs, making it only useful for English cites, which is less useful compared to Twitter, which is more diverse in terms of languages. There might be words that we could cite from Reddit, but they will most likely be also citable from Twitter or other places, so for now I don't see the need to allow Reddit. There should probably be another discussion if there were certain words that we could cite from Reddit but not Twitter; however, I strongly doubt these words would be inclusion-worthy if they are only citable from Reddit. – Wpi31 (talk) 09:57, 30 September 2022 (UTC) Change to procedural Oppose. It is premature to set up this vote; it should be closed as no-consensus. (Copied from below: I do however have to agree that this vote was never a good idea to begin with, and we already have an acceptable though not perfect process of citing Twitter and Reddit. It's really weird to vote for everything or nothing, and also the wording is unclear ("rejected" = "not allowed but we can vote for it next time if needed" or "banned and we should never mention it again"?) We also have a thing called WT:Votes for exactly the purpose of voting, rather than creating a lengthy BP thread that is not going anywhere. I think we should just close these votes as premature or no consensus.) – Wpi31 (talk) 05:37, 4 October 2022 (UTC)
@Wpi31:

Reddit is in English except on the language and country-related subs, making it only useful for English cites, which is less useful compared to Twitter which is more diverse in terms of languages.

Allowing Reddit citations doesn't prevent us from citing non-English websites (if anything, it creates precedent in favor of doing so), and banning it isn't going to improve our non-English coverage. Reddit is currently the 9th most-visited website in the world and the 6th most-visited website in the US, making it one of the best possible resources for internet slang (at least in English). Banning it would leave noticeable gaps in our coverage, at no benefit to us or our readers.

There might be words that we could cite from Reddit, but they will most likely be also citable from Twitter or other places, so for now I don't see the need to allow Reddit.

Websites often form unique cultures with their own distinct slang. Only the most popular terms would be used on the broader internet, and I don't see why we should ignore niche internet slang when we've consistently documented slang used exclusively by specific subcultures. I suppose this ultimately just boils down to whether you consider internet slang to be less "real" or less important than IRL slang.
Additionally, given that you voted to ban Twitter as well, I'm not sure why you brought it up here as an example of a better source. Binarystep (talk) 14:48, 3 October 2022 (UTC)
"Reddit is currently the 9th most-visited website in the world and the 6th most-visited website in the US, making it one of the best possible resources for internet slang (at least in English)."
Per List of most visited websites, Twitter ranks 4th while Reddit is only 18th, meaning that Twitter likely has a larger user base. Reddit might be one of the best resources for English, but Twitter is definitely better in terms of overall coverage (i.e. including both English and non-English).

"and banning it isn't going to improve our non-English coverage."
and allowing Reddit isn't going to improve our non-English coverage either.

"Banning it would leave noticeable gaps in our coverage, at no benefit to us or our readers."
There is considerable overlap between the user base of the two sites, so as I've said, I consider that the words that could be cited from Reddit will be also citable on Twitter, or somewhere else. If this gap still exists after we allow Twitter and other websites, then it might be beneficial to cite Reddit, but again, I doubt the existence of this gap.

"Websites often form unique cultures with their own distinct slang. Only the most popular terms would be used on the broader internet, and I don't see why we should ignore niche internet slang when we've consistently documented slang used exclusively by specific subcultures."
There are also many smaller groups based on Internet culture on Twitter, and I don't see why including Twitter but not Reddit would ignore this Internet slang. If we were to document words from the less broad part of the Internet, then why not also include sites such as 4chan, which also has lots of unique slang? Why specifically Reddit?

If you could give some examples that would only be citable from Reddit, then I might reconsider my vote.

"I suppose this ultimately just boils down to whether you consider internet slang to be less "real" or less important than IRL slang."
To be fair, a considerable part of my contribs are Internet slang, and I don't think Internet slang is less real than IRL slang.

To make myself clear, I do frequent these sites, and I'm well aware that there is useful stuff for us out there, but I don't think we should sacrifice the quality of our entries (i.e. whether these words are genuine or nonce) over quantity (i.e. including more words).

"Additionally, given that you voted to ban Twitter as well, I'm not sure why you brought it up here as an example of a better source."
I did not vote to ban Twitter, it was rather a weak oppose or a procedural oppose. I believe that Twitter will be a better source than Reddit and it deserves inclusion, but voting oppose was the only way to halt the process and avoid allowing Twitter wholesale, given the amount of support votes down there.
– Wpi31 (talk) 16:07, 3 October 2022 (UTC)[reply]
@Wpi31:

Reddit might be one of the best resources for English, but Twitter is definitely better in terms of overall coverage (i.e. including both English and non-English).

I'm aware of that. That's still not a reason to ban Reddit, since allowing it would ultimately improve our English coverage even if it doesn't help much with non-English entries.

and allowing Reddit isn't going to improve our non-English coverage either.

I don't understand why your solution is "ban English-only sources" rather than "allow both English-only and non-English sources".

There is considerable overlap between the user base of the two sites, so as I've said, I consider that the words that could be cited from Reddit will also be citable on Twitter, or somewhere else. If this gap still exists after we allow Twitter and other websites, then it might be beneficial to cite Reddit, but again, I doubt the existence of this gap.

There are plenty of Reddit slang terms that aren't as likely to be used on Twitter due to its different format and culture.

There are also many smaller groups based on Internet culture on Twitter, and I don't see why including Twitter but not Reddit would ignore this Internet slang.

Allowing Twitter but not Reddit would prevent us from documenting terms used on Reddit.

If we were to document words from the less broad part of the Internet, then why not also include sites such as 4chan, which also has lots of unique slang? Why specifically Reddit?

I mean, we have Category:English 4chan slang.

If you could give some examples that would only be citable from Reddit, then I might reconsider my vote.

Alright.
[0] through [10]; [0} through [9}; {1] through {10]: Used to indicate how high someone is on a scale of 0 to 10, with curly brackets used to indicate whether someone is getting higher or coming down from their high. Originated from r/trees (Reddit's largest marijuana subreddit), but has since spread all over the site.

ATBGE: Initialism of "awful taste but great execution". Used for creative projects that, while tasteless, tacky, offensive, or crude, are exceptionally well-made.

Beetlejuicing: When someone makes a post or comment, and a user who responds has a username relevant to the subject of whatever they're replying to. Here's an example.

CMV: Initialism of "change my view".

GTBAE: Initialism of "great taste but awful execution". Used for low-quality creative projects that could have been significantly better if the creator had more skill or experience.

ITAP: Initialism of "I took a picture".

karma farming: Making low-quality posts in an attempt to gain upvotes.

OOP: Initialism of "original original poster". Refers to the person who originally posted content which was later crossposted.

orangered: The reddish-orange color used for Reddit upvotes.

r/: Placed before words and phrases to form nonexistent subreddit names for the purposes of a joke. Similar to a hashtag in function.

Reddit hug of death: An accidental DDoS attack caused by linking a smaller website on Reddit, resulting in an unexpected flood of visitors.

r/ihadastroke: A response to incomprehensible posts or comments.

r/lostredditors: A response to someone posting in the wrong subreddit.

r/nottheonion: A response to news headlines that seem too insane to be true.

r/thatHappened: A response used to accuse a poster of lying.

r/theydidthemath, r/theydidthemonstermath, r/itwasagraveyardgraph: Responses to someone posting a trivial but interesting math calculation (e.g. the exact dimensions of the house from Home Alone). Always in the form of a comment chain, with the first user posting "r/theydidthemath", followed by "r/theydidthemonstermath", and finally "r/itwasagraveyardgraph".

r/untrustworthypoptarts: A response used to accuse a poster of lying.

r/woooosh: A response to someone missing a joke.

SRDine: A user on r/SubredditDrama. This may seem overly specific, but we've allowed terms for members of specific Usenet newsgroups, which aren't all that different from subreddits.

the old Reddit switch-a-roo: A chain of linked Reddit comments going back to 2011. I'm not really sure how to explain this one concisely, but it's definitely attestable, and confusing enough that it'd be beneficial to users if we had a definition.

title gore: An incomprehensible post title.

updoot: DoggoLingo version of "upvote". Sometimes used in earnest, often used to mock users perceived as caring too much about upvotes.

This isn't an exhaustive list, but it should give you an idea of what I'm talking about.

To make myself clear, I do frequent these sites, and I'm well aware that there is useful stuff for us out there, but I don't think we should sacrifice the quality of our entries (i.e. whether these words are genuine or nonce) for quantity (i.e. including more words).

If we preemptively reject entire websites out of fear that they'll contain more nonce words than print media and Usenet, we're sacrificing quality and quantity.

I did not vote to ban Twitter; it was rather a weak oppose or a procedural oppose. I believe that Twitter will be a better source than Reddit and it deserves inclusion, but voting oppose was the only way to halt the process and avoid allowing Twitter wholesale, given the number of support votes down there.

The thing is, this is a vote to either allow or ban Reddit and Twitter, with no outcome in between. We already had a system where online sources could be allowed on a case-by-case basis (hence the amount of internet slang added recently), and unfortunately, this proposal is an attempt to replace that with an all-or-nothing system (see the proposer's comments here).

This is exactly why I didn't think this vote was a good idea to begin with. Our existing system was an acceptable compromise, and an attempt to automatically allow internet sources is more likely to face opposition. Binarystep (talk) 00:21, 4 October 2022 (UTC)[reply]
Thank you for your comments. I do see a reason to include Reddit given the examples and other insights, so I would support inclusion. I do, however, have to agree that this vote was never a good idea to begin with, and we already have an acceptable though not perfect process of citing Twitter and Reddit. It's really weird to vote for everything or nothing, and the wording is also unclear ("rejected" = "not allowed, but we can vote on it next time if needed" or "banned, and we should never mention it again"?). We also have a thing called WT:Votes for exactly the purpose of voting, rather than creating a lengthy BP thread that is not going anywhere. I think we should just close these votes as premature or no consensus. – Wpi31 (talk) 05:37, 4 October 2022 (UTC)[reply]

Do we even want to have entries for subreddit names that are used as statements? I'm not sure we'd want to document the limitless number of hashtags on Twitter, or Facebook tag groups, and these feel like the same sort of phenomenon. Most of the other ones seem potentially worth including, although I'm also unsure about having separate entries like "[3]", "[7]", etc. Maybe it would make more sense if they were all documented under "[ ]" or "{ ]", "[ }".

(BTW, almost all of those can technically be cited from Twitter anyway, even if they aren't primarily or originally used there. But of course citing Reddit for Reddit slang would be much more natural.) 98.170.164.88 05:56, 4 October 2022 (UTC)[reply]

Abstain

~~Abstain~~ for now, per Twitter: if we don't ban Usenet, Reddit is probably not much worse, but the lack of editorial control is a concern. Increasing the required number of quotations or the year span could alleviate things a bit. --Dan Polansky (talk) 14:26, 21 September 2022 (UTC)[reply]

Abstain, on the basis that Reddit is somewhat useful for attestation, but the concerns among the opposing votes are equally valid. IMO allowing Reddit will be more constructive and beneficial to Wiktionary, but extensive care and attention would be required on how cites are included and used. The current format of case-by-case voting on Reddit and Twitter is an acceptable implementation of this principle, but is too unwieldy and ineffective in the long run. Leaning towards Support if we impose certain limitations to address these issues, e.g. size of subreddit, age of account, amount of karma of the account, amount of upvotes on the post/comment, whether the post/comment was made before the term was created/RFVed (or the time between) (the details of these restrictions can be discussed later if needed) – Wpi31 (talk) 15:02, 28 September 2022 (UTC)[reply]

Changed to oppose after some thoughts, see my comments above. – Wpi31 (talk) 09:57, 30 September 2022 (UTC)[reply]

Result

No consensus on whether Reddit should be regarded as a durably archived source (9–9). — Sgconlaw (talk) 15:47, 28 October 2022 (UTC)[reply]

Twitter

https://twitter.com

Twitter should be allowed

Support Binarystep (talk) 11:53, 21 September 2022 (UTC)[reply]
Support. Most promising replacement to Usenet. Has a large active global userbase. Content is accessible for free without registration. Advanced and easy-to-use search functionality. Indexes emojis. Every pre-2018 public tweet has been archived by the Library of Congress. WordyAndNerdy (talk) 12:43, 21 September 2022 (UTC)[reply]
But not tweets after 2018? Can tweets be archived at the Internet Archive? — Sgconlaw (talk) 14:48, 21 September 2022 (UTC)[reply]
Yes, tweets can be archived via the Wayback Machine, provided they are public. I was just pointing out the existence of the pre-2018 LoC archive because even though it isn't publicly accessible (to my knowledge) it strikes me as similar to Google's preservation of historical Usenet posts. WordyAndNerdy (talk) 15:43, 21 September 2022 (UTC)[reply]
@WordyAndNerdy: OK, thanks for clarifying. — Sgconlaw (talk) 16:58, 21 September 2022 (UTC)[reply]
@Sgconlaw, @WordyAndNerdy, for the record, pretty much every public tweet is automatically archived onto Internet Archive at this point. I’ve seen even my own tweets post-2018 be archived. AG202 (talk) 17:04, 21 September 2022 (UTC)[reply]
— Sgconlaw (talk) 17:10, 21 September 2022 (UTC)[reply]
Support Acolyte of Ice (talk) 12:44, 21 September 2022 (UTC)[reply]
Support Allahverdi Verdizade (talk) 15:29, 21 September 2022 (UTC)[reply]
Support Overlordnat1 (talk) 16:08, 21 September 2022 (UTC) Update: I suggest we recreate the recently deleted sense at jogger when this passes. It was probably correct to delete it according to current rules about derogatory terms but if and when the proposal to allow Twitter passes there are clearly sufficient citations at the associated citations page to recreate it. --Overlordnat1 (talk) 21:46, 10 October 2022 (UTC)[reply]
Support PseudoSkull (talk) 16:09, 21 September 2022 (UTC)[reply]
Support - but it needs to be done well. No one-word cites on their own, but if we can group a thread together as a single citation to provide context (as someone suggested), that would be good. Theknightwho (talk) 13:03, 22 September 2022 (UTC)[reply]
Support I don't know how else I can cite Armenian slang. --Vahag (talk) 13:11, 22 September 2022 (UTC)[reply]
Support. Andrew Sheedy (talk) 01:47, 23 September 2022 (UTC)[reply]
Support - excarnateSojourner (talk | contrib) 16:54, 23 September 2022 (UTC)[reply]
~~Support~~. This is the baby-and-the-bathwater problem. The baby is significant, as per "Most promising replacement to Usenet." We do depend on a non-edited resource for slang, Usenet, but this works poorly for languages other than English. Twitter can be a great equalizer in that regard, bringing in many more languages, as per "I don't know how else I can cite Armenian slang." Gaming the system is a problem, but it can be solved by ad hoc policy-overriding RFDs for individual terms; not ideal, but possible. Furthermore, if we accept Twitter, we may figure out a policy that will make these policy-overriding RFDs unnecessary. The bathwater is real and will need to be dealt with later. If we reject Twitter, I don't know how else we are going to get the baby in. Discussing each set of Twitter quotations on a per-RFV basis seems unworkable: some RFVs have sections for support and oppose set up, yet no one has posted anything. Online news does not bring any significant benefit over Google Books, by my estimate, and the sites proposed are English-only; if the passed vote that we are implementing here was to do anything, it was to allow Twitter, at least on a per-term basis. It was mentioned that the OED uses Twitter quotations; their advantage is that they have the editorial discretion to disregard some. The opposition has not figured out any policies so far to filter out the bathwater. For Czech and other languages with heavy use of diacritics, Twitter may attest diacritic-free forms via lazy writing, a bad outcome; this is to be addressed later. At worst, these low-value entries would be created as "diacritic-free form of X"; far from ideal, but tolerable, and avoidable unless blocked by a superminority. The more risk-averse procedure would be to reject Twitter and only allow it after an editorial discretion policy to reject attested terms has been adopted, but this does not seem realistic. 
Many years have passed without anyone, including me, moving the project of including more attestation sources forward. Furthermore, our users are adults, and can check for themselves what kind of quotations support an entry, and not take it seriously unless supported by evidence. We should label Twitter-attested entries as "Internet slang", "Internet only" or the like, so the readers do not even need to look at the quotations. About "durably archived", I don't think this is what the poll is about, but as for that, "Every pre-2018 public tweet has been archived by [Library of Congress]", and newer tweets are claimed to be in the Wayback Machine. If a tweet gets removed from the archive, an entry previously supported can be RFV-relitigated and RFV-failed as applicable. Those who want to prevent relitigation would do well to provide, say, 6 attesting tweets, so that even if half of them get removed, the entry is still good. I would be on board with requiring 6 attesting quotations from Twitter as a minimum. To sum up, this is only a hesitant support because of the mixture of the baby and the dirty bathwater, but on balance it seems to be the most promising way to move the matter forward and kind of force the opposition (including previously myself) to do more policy work later. (Excuse the loquacity, but I think it is better than the bare supports; at least mentioning some rationale keywords in the posts would be very preferable. This was to be a discussion per CFI, not a rationale-free poll. No one is required to read a paragraph that is obviously fairly long.) --Dan Polansky (talk) 06:46, 24 September 2022 (UTC)[reply]
Since I am a fan of conditional supports where they make sense, I require Twitter uses to be by humans, not bots. This may be implied by the requirement of "use" to "convey meaning", but here I make it explicit. --Dan Polansky (talk) 19:15, 29 September 2022 (UTC)[reply]

Support. Andrew Sheedy (talk) 14:55, 1 October 2022 (UTC)[reply]
@Andrew Sheedy Assuming that you didn't mean to double vote? AG202 (talk) 15:12, 1 October 2022 (UTC)[reply]
Oops, thanks. I completely forgot I voted already. Andrew Sheedy (talk) 16:11, 1 October 2022 (UTC)[reply]
Support Very useful for attesting Internet slang that would never appear in print. Ioaxxere (talk) 16:29, 7 October 2022 (UTC)[reply]
Support any website including this one. We need more ability to add words, not more restriction. PseudoSkull (talk) 18:46, 21 October 2022 (UTC)[reply]

Twitter should be rejected

Oppose unless we refactor the CFI so that terms which are extremely rare (nonces, typos, etc.) are not kept without compelling reason given. - TheDaveRoss 12:54, 21 September 2022 (UTC)[reply]
I would like to add to my objection that this has not been thought out well at all. There are a multitude of bots (AI, reposting, etc.) which post on Twitter (e.g. https://twitter.com/tayandyou); are their posts citable? If not, where is the policy which says so? There are so many ways in which Twitter is a problem, in which all social media is a problem, when it comes to providing citations in the context of our current CFI. The amount of utter nonsense which will be, according to policy, acceptable based on Twitter alone is staggering. - TheDaveRoss 14:50, 29 September 2022 (UTC)[reply]

A helpful Tweet from Tay (AI), perhaps useful in citing monkey? - TheDaveRoss 14:54, 29 September 2022 (UTC)[reply]
Human use can be required via a follow-up vote. I find it hard to imagine that a superminority would like to block such a proposal. One may also argue that "use" to "convey meaning" is something bots cannot do at the current state of AI: one may philosophically interpret current CFI as already implicitly prohibiting citing bots. I don't see the problem with the monkey quotation: even if interpreted as made by a human, it would just be a derogatory use of the word "monkey" to refer to a human, something that sees widespread use anyway. Bots are the easy problem.

I do feel that the dirty bathwater is poorly mapped and documented. I do wonder whether my fix-it-later support is not perhaps too bold, given that any later fixes will need to overcome the 2/3 supermajority threshold. Do we have some other problematic examples, other than bots? --Dan Polansky (talk) 19:13, 29 September 2022 (UTC)[reply]
Oppose. I think the fact that there is little to no moderation makes the site vulnerable to being used to game the system – a number of users intent on trying to get a term into the Wiktionary could cook up a number of posts using the term, and then use that as the "evidence" for inclusion of the term. — Sgconlaw (talk) 19:00, 21 September 2022 (UTC)[reply]
Oppose per my comment above about Reddit. - -sche (discuss) 23:53, 22 September 2022 (UTC)[reply]
Oppose Per all of the above — SURJECTION ^{/ T / C / L /} 05:10, 23 September 2022 (UTC)[reply]
Oppose for now per Dave. I might change to support if we add some restrictions, things like excluding AI, echo chambers, etc. We need a way to measure HUMAN interaction, and have a certain threshold. Vininn126 (talk) 15:01, 29 September 2022 (UTC)[reply]
Weak oppose per Dave & Vininn. I was initially torn on whether to vote in support or to abstain, but after discussions in the Discord server and Dave's comments here, I'm actually more inclined to vote a weak oppose for now. Twitter is a solid source of information and language, especially for minority languages, and could be such a helpful resource for the project. However, at the same time, it can be a weapon to harm the work that's been done here and the project's legitimacy. There need to be guidelines set in place before allowing the floodgates to open; otherwise we'll run into the same issues that we have right now, only amplified. One might state that we can vote to allow it now and make regulations later, but knowing this website and the discussions that I've witnessed, that is exponentially harder to do, and once something is written "in stone", people are much less likely to change it. I do wish that there were a middle option that's not "abstain", but this is what we've been presented with at the moment. AG202 (talk) 15:08, 29 September 2022 (UTC)[reply]
Oppose per others, after more thinking. My own formulations are as follows. The fix-it-later approach I proposed in my previous support vote seems too cavalier. As a general principle, we have no idea what the dirty bathwater is. Why open the gate fully when we can first open it just a little and see what happens? Opening it just a little means, e.g., allowing Twitter but requiring 10 or 5 Twitter quotations to count as 1 Google Books quotation and requiring Twitter quotations to span 3 years. Our current attestation standards are lenient and generous enough: consider the wealth of English synonyms attested for anatomy entries in our thesaurus, very impressive. One can attest interesting rare words: I recently attested misargument from print alone, not Usenet. No one here has presented any supporting material, such as which terms we would be saving via Twitter, and the like. As another related general principle, I see no reason why a conversational Twitter quotation should carry as much weight as a quotation from edited printed material. Per Merriam-Webster's Kory Stamper[8]: "dictionary entries need to be based on a word’s accumulated and sustained use in print". Our 3-quotations-from-print requirement is more lenient than that: 3 is not "accumulated and sustained". Admittedly, we allow a single Usenet quotation to count as one Google Books quotation, but this is a legacy defect, not a feature, a remnant of the wild vote-free origination of CFI. From this perspective, the diminished use of Usenet as of late is beneficial. Twitter is very conversational, even more than Usenet, and if we allow Twitter, we might as well allow blog posts (often better edited than tweets), article discussion posts, and the like. The proper procedure is to first raise standards for non-copy-edited material or non-print material, and then allow all Wayback-Machine-archived Internet with the new more stringent standard. 
I am sorry that this does not directly move the Twitter case forward, but again, what prevents us from opening the gate just a little to start with, as a test run of sorts? This concern is entirely orthogonal to whether something is archived. It only so happened that allowing print + Usenet gave us a good editing filter. This conservatism seems appropriate: once something bad is in, it is hard to get it out: non-addition of a thing requires a 1/3-superminority while removal requires 2/3-supermajority. This very poll is evidence: so far, 10 people support that a single tweet has as much weight as a single print quotation; how is the 7-sized minority going to convince the 10-sized majority later that we need to raise the standard? Also concerning is that the supports are mostly bare, not even saying "per arguments located at X" or "per person Y". --Dan Polansky (talk) 06:50, 30 September 2022 (UTC)[reply]
Oppose for now, per Dave, Vininn, and AG202. Twitter will be useful for attestation of many languages. However I think more thoughts will be needed before implementing this change – we should not allow everything from Twitter wholesale without adding restrictions to filter out all the nonces and nonsenses. – Wpi31 (talk) 09:57, 30 September 2022 (UTC)[reply]

Abstain

~~Abstain for now.~~ It is probably not much worse than Usenet, that's a pro. But there is no editorial control so who knows what can be 3-attested. But then, are we serious about forbidding Usenet? If not, is Twitter about the same or much worse? Would a requirement of spanning more years or a higher number of quotations alleviate concerns? Someone said OED accepts quotations from Twitter. I'll wait for the discussion to develop. --Dan Polansky (talk) 14:24, 21 September 2022 (UTC)[reply]
"Someone said OED accepts quotations from Twitter." Yes, and Usenet too [9] Ioaxxere (talk) 13:00, 8 October 2022 (UTC)[reply]

Result

No consensus on whether Twitter should be regarded as a durably archived source (12–8). (Consensus is indicated by at least two-thirds support: see "Wiktionary:Voting policy".) — Sgconlaw (talk) 15:47, 28 October 2022 (UTC)[reply]

I agree with your assessment, however I think it is worth noting that many of the opposing votes were close to supporting if Twitter (and perhaps some other social media style sources) had some reasonable criteria established for their content. For everyone who wishes to use Twitter for primary, CFI citations, let's work on some framework that bridges the gap. It is also important to note that it is OK to cite Twitter, those cites just do not count for CFI compliance. If the origin of a word is on Twitter please do include the citation. If the best examples of usage come from Twitter, by all means use them. But there must also be evidence elsewhere to the degree required by CFI. - TheDaveRoss 15:57, 28 October 2022 (UTC)[reply]

Don't know if you've been following the news but Twitter's days are numbered. It could've served as a valuable lexical resource over the last decade but Wiktionary collectively allowed it to sit on the shelf for so long that it expired. Anyway, guess this marks my permanent exit from the site. So long. WordyAndNerdy (talk) 16:17, 28 October 2022 (UTC)[reply]

@TheDaveRoss: sure, go ahead and try formulating a policy for further discussion. — Sgconlaw (talk) 16:21, 28 October 2022 (UTC)[reply]

What I see above and what my closure of the above discussion cum poll would be is this: Twitter quotations can be used for attestation going forward but must meet a much more stringent standard than any 3 independent quotations spanning a year. Thus, if someone provides 30 quotations spanning 5 years, it will be hard to argue it is not a pass. The basis is this: 1) 60% = 12 / (12 + 8) is a significant supermajority, and if you wish, you can count my vote as a conditional switch to support, yielding 65% = 13 / (13 + 7), very close to 66.6% = 2/3; 2) several of the oppose comments indicated they do not oppose Twitter per se but rather Twitter combined with the 3-quotes-in-1-year standard with no further restrictions; 3) Wiktionary:Voting policy says that for consensus, 2/3 is a hint and not a firm threshold: the determination of consensus should not be based on blind tallying but rather should consider other concerns, which would include comments made by the pollers; 4) anything else would be a de facto annulment of the passed Wiktionary:Votes/pl-2022-01/Handling of citations that do not meet our current definition of permanently archived (77.7% = 21 / (21 + 6)), whose main purpose was not to enrich Google Books with edited online news publications (very little added value there) but rather to add the likes of Twitter. What we are trying to find through this poll cum discussion is where and for what the consensus is; it is not a mere formalistic political-vote-like exercise. In case of doubt, the bold text could be sent to a formal vote, but I don't think it's necessary since it follows from observing and aggregating the discussion that already took place. Nor do I think it necessary or advisable to wait until comprehensive formal algorithmic criteria are developed: that can be hard to do and take months or years. --Dan Polansky (talk) 07:08, 29 October 2022 (UTC)[reply]

Vice

https://www.vice.com

Vice should be allowed

Support — Sgconlaw (talk) 11:13, 21 September 2022 (UTC)[reply]
Support Binarystep (talk) 11:53, 21 September 2022 (UTC)[reply]
Support WordyAndNerdy (talk) 12:43, 21 September 2022 (UTC)[reply]
Support Acolyte of Ice (talk) 12:44, 21 September 2022 (UTC)[reply]
Support AG202 (talk) 13:17, 21 September 2022 (UTC)[reply]
~~Support~~ W:Vice (magazine), a copy edited online magazine. --Dan Polansky (talk) 14:19, 21 September 2022 (UTC)[reply]
If the intent is to interpret this as "durably archived", VICE News is listed as archived at https://www.loc.gov/collections/general-news-on-the-internet-web-archive/?st=list, an archive of Library of Congress. --Dan Polansky (talk) 12:13, 23 September 2022 (UTC)[reply]
Support, and while I don't think conditional voting is a thing I would like to make clear that I am voting to support assuming that we are reasonably restricting what we mean when we say that a certain site is allowed, namely that what is allowed from those sites is content they have "published", generally by selected contributors (paid or otherwise) and put through editorial review and approval. Not public blogs, not comment sections, not banner ads, not site source code, not Tweets by the publication hosted elsewhere, etc. - TheDaveRoss 15:14, 29 September 2022 (UTC)[reply]
Support what's going on here? –Jberkel 19:39, 30 September 2022 (UTC)[reply]
Support any website including this one. We need more ability to add words, not more restriction. PseudoSkull (talk) 18:46, 21 October 2022 (UTC)[reply]

Vice should be rejected

Oppose. I oppose accepting individual sites and listing them anywhere on a per-site basis. That is bizarre and helps nothing. The added value of Vice over material in print in Google Books is minuscule: you won't be able to attest slang and this is a single English-only site. Make a general policy that online publications subject to editing are as acceptable as Google Books and list some canonical examples, e.g. the New York Times and at least one example of a tabloid to show that being a tabloid is not an exclusion criterion. --Dan Polansky (talk) 07:05, 30 September 2022 (UTC)[reply]
I share this sentiment. While I am fine with Vice and HuffPost, I am equally fine with hundreds of other online media outlets of comparable editorial rigor and content moderation. I am not sure how or where to draw a clear line to make a policy, but if we can come up with one and not have to go through this sort of process hundreds of times for similar media, that would be great. - TheDaveRoss 12:34, 3 October 2022 (UTC)[reply]
@Dan Polansky, TheDaveRoss: I do not see why this is "bizarre". As mentioned at the top of this discussion, I created the discussion because the issue of whether Reddit and Twitter should be regarded as durably archived sources came up at "Wiktionary:Requests for verification/English#jogger", and according to WT:CFI "Other online-only sources may also contribute towards attestation requirements if editors come to a consensus through a discussion lasting at least two weeks." If that sentence in WT:CFI does not mean discussing whether individual sites should be permitted as sources or not, then kindly start a formal vote at "Wiktionary:Votes" for that sentence to be amended. — Sgconlaw (talk) 12:42, 3 October 2022 (UTC)[reply]
I don't think you were wrong to create this discussion, I agree that you were following the process set forth. I would hope we can come up with something better than the process to permit more sources of the HuffPost and Vice sort without having to have one of these polls for each of them. - TheDaveRoss 14:30, 3 October 2022 (UTC)[reply]

Result

Vice should be regarded as a durably archived source (passes 9–1). — Sgconlaw (talk) 15:47, 28 October 2022 (UTC)[reply]

General discussion

Should archival via the Wayback Machine or an equivalent service be mandatory for web citations? 70.172.194.25 12:41, 21 September 2022 (UTC)[reply]

Probably, yeah. Binarystep (talk) 12:46, 21 September 2022 (UTC)[reply]

I think it is definitely a good practice to archive web citations, though I'm not sure whether it should be mandatory. We seem to allow Usenet citations because it is assumed that Google will indefinitely keep Usenet content accessible. Thus, perhaps one of the "tests" of durability should be whether there's consensus that a website in question can be archived or is likely to remain accessible for a long time? (Can Reddit and Twitter be archived at the Internet Archive? I've never tried doing so.) — Sgconlaw (talk) 14:47, 21 September 2022 (UTC)[reply]

Yes, it is possible to archive both Reddit and Twitter using the Wayback Machine. Examples: Reddit, Twitter. (Note: when I tried to archive using www.reddit.com, it gave me an error message, but it worked with old.reddit.com.) 70.172.194.25 15:04, 21 September 2022 (UTC)[reply]

Yes, Twitter can be archived- see for instance: diff. I can (and do) archive stuff at archive.today and Internet Archive and then cite it on Wiktionary all the time. I think those sources are helpful and interesting to the reader. Under the new "two week discussion" regime, those sources can probably stand up to rfv if the evidence for a given entry is convincing enough. But I don't think that the meaning of "durably archived" really needs to change, or if it does, not by the means above. Above, we see a bare url is provided for each site without any hint that the website is actually archived durably somewhere. For each of the above websites, provide at least a loincloth of explanation of how, absent Internet Archive or whatever else, the site is bona fide durably archived. That's just my opinion. --Geographyinitiative (talk) 15:13, 21 September 2022 (UTC)[reply]

The "durably archived" requirement is a surrogate for "copyedited" anyway. We copy our quotations into our space. If we later find our quotations are no longer available in any archive, we may relitigate the entry in RFV. The concern should really be, do we allow anything not copy edited to count toward attestation? And if we allow Twitter, the answer is probably, yes. What the consequences of that may be I cannot estimate; 3 quotations from anything anywhere, de facto? --Dan Polansky (talk) 15:39, 21 September 2022 (UTC)[reply]

@Dan Polansky: I think "durably archived" and "copyedited" are two separate issues. The requirement for a source to be "durably archived" is so there is a reasonable chance that some time in the future it will still be possible to access the source to verify it. On the other hand, it would be good if sources which are "copyedited" (which I take to mean fact-checked and edited?) are preferred to unedited, user-generated content, but I don't think we have any consensus on this issue yet. — Sgconlaw (talk) 16:55, 21 September 2022 (UTC)[reply]

They are two separate notions; that's why I said "surrogate". The reason why it was hard to remove the durably archived requirement was that it served to filter out non-copy-edited content: not entirely, but quite a lot.

"Fact-checked" is not "copy-edited": copy editing is concerned with spelling, minor wording and style. See copy edit. --Dan Polansky (talk) 17:00, 21 September 2022 (UTC)[reply]

According to WT:CFI, “other online-only sources may also contribute towards attestation requirements”, but cannot be “considered durable”, which would be a blatant contradiction and irreconcilable with the meaning of the word “durable”. I mean, a legal fiction is possible, but I would be unlikely to discern one in this formulation, as legal fictions are generally worded more clearly. I maintain my position (sufficiently explained and evidenced in the past since the vote introducing the wording) that the objective meaning of the vote was to allow the presence of a term in other online-only sources (but not individual occurrences) to be considered sufficient on a case-by-case basis (basically making sure a word is not an imageboard troll or short-lived meme, but not counting). Logically, I will not consider any of the proffered websites “durable”. Fay Freak (talk) 17:05, 21 September 2022 (UTC)[reply]

Literally speaking, CFI does not require "durable"; it requires "permanently recorded media". google:"permanently recorded media" finds 48 results on the whole web if you click "next". The phrase was creatively "interpreted" for our practical purposes, but what it means in idiomatic English I have little idea. This poll is reasonably well set up and is likely to guide future acceptance of sources of attestation. --Dan Polansky (talk) 17:19, 21 September 2022 (UTC)[reply]

Query: when we say "HuffPost is allowed" or "Vice is allowed", is it reasonable to assume that what we mean is that "Professionally edited articles published by [outlet] on [outlet].com are allowed"? We are not allowing, say, comments in the comments section on an article, or a banner ad which is displayed on an article, or a non-edited blog post in a semi-public blogging section (as opposed to an edited and published op-ed, say), or a Tweet by [outlet], or a variable in the website source code of [outlet].com? I think there are thousands of similar sites of comparable editorial rigor that could all be allowed, but merely saying that a site is allowed can be interpreted in various ways, some of which are probably broader than intended by most voters. - TheDaveRoss 12:42, 23 September 2022 (UTC)[reply]
I would think we can assume that, but only because of the non-passed Wiktionary:Votes/2021-09/New standard for archived quotations, which rejected that archiving by Wayback Machine is enough for "durably archived". That notwithstanding, there is probably nothing much better than Wayback Machine for online news websites. The non-passed vote quasi-confirmed the confusing situation by which CFI only talks of "permanently recorded media" and "durably archived", yet many Wiktionary editors actually want "having sound editorial control" as a requirement, and are using current CFI to get that requirement half-way, bar Usenet. These are conflicting requirements: if something nasty is published in print, it meets CFI as specified, and that's it. (I tried Vice in the loc.gov archive, and I cannot navigate to individual articles there; I can do so in Wayback Machine. loc.gov seems no better than Wayback Machine.) --Dan Polansky (talk) 14:43, 23 September 2022 (UTC)[reply]
@TheDaveRoss: that would be my understanding. — Sgconlaw (talk) 08:38, 25 September 2022 (UTC)[reply]
@TheDaveRoss @Sgconlaw It may be a good idea to have a proper vote on these as a whole, including that stipulation. Admittedly, the exact line of where something counts as copy-edited is blurry, but print-media has a similar issue (even if you exclude self-published stuff). Theknightwho (talk) 12:04, 25 September 2022 (UTC)[reply]
@Theknightwho: perhaps you can formulate this into a separate Beer Parlour discussion first. — Sgconlaw (talk) 13:23, 25 September 2022 (UTC)[reply]

Administrative matters

Upon the conclusion of this discussion, what should be done? (See the discussion at the top of this section.) — Sgconlaw (talk) 18:35, 21 September 2022 (UTC)[reply]

Option 1

Record the result at "Wiktionary:Criteria for inclusion/Accepted online-only sources" (or some other suitably named non-policy page; please suggest); no further discussion required. In future, discussions about acceptable online-only sources can take place here at the Beer Parlour.

Support

Support — Sgconlaw (talk) 18:35, 21 September 2022 (UTC)[reply]
Support - TheDaveRoss 12:15, 22 September 2022 (UTC)[reply]
Support - no need for interminable bureaucracy when it would be pointless duplication. Theknightwho (talk) 14:17, 22 September 2022 (UTC)[reply]

Oppose

Abstain

Abstain I don't like this at all, but I feel it is not unacceptable, so I abstain. See my comment in option 2. --Dan Polansky (talk) 18:44, 21 September 2022 (UTC)[reply]

Option 2

Proceed to a formal vote at "Wiktionary:Votes" and record the result at "Wiktionary:Criteria for inclusion" itself; and in future, discuss such issues directly at "Wiktionary:Votes" rather than here at the Beer Parlour.

Support

Support 'Proceed to a formal vote at "Wiktionary:Votes" and record the result at "Wiktionary:Criteria for inclusion" itself'. This is more convenient for CFI readers than having to click through to a subpage. Having an editable subpage seems to be an underhanded trick to make CFI quasi-editable without a vote anyway. The list is unlikely to grow too long anyway. The good thing about BP is that one does not need to wait a week before it starts, and I think this BP poll is fine. I think this is a bit of a false dichotomy: one can have a fairly conclusive BP poll and then formalize it via a one-week vote if required. One disadvantage of votes is that they can only be started one week after creation; maybe we should abolish that requirement as too bureaucratic. One cannot have it both ways, complaining of votes being too bureaucratic but at the same time creating procedural hurdles to make them more bureaucratic. --Dan Polansky (talk) 18:43, 21 September 2022 (UTC)[reply]

Oppose

Oppose I don't think it's a good idea to add the whitelist of sources to WT:CFI itself; it is certainly possible that the list may become quite long and make that policy page difficult to read. Also, if we're already going to have to have a formal vote on sources, it doesn't seem to me helpful to require two separate rounds of discussions and voting, one at the Beer Parlour and one at WT:VOTES. We might as well just save time by proceeding straight to a formal vote. — Sgconlaw (talk) 18:52, 21 September 2022 (UTC)[reply]
What do you mean by "difficult to read"? Cannot one visually skip a comma-separated list one is not interested in?

What's wrong with proceeding to a formal vote straight ahead? I created multiple votes without much prior discussion, and some complained, but some of those votes were a success. People first create obstacles to well-functioning votes, then complain that votes are too bureaucratic and switch to Beer parlour to eliminate the selfsame obstacles they put into votes. Pretty weird. --Dan Polansky (talk) 19:00, 21 September 2022 (UTC)[reply]
I agree that if we are going to require formal voting, then there's no point in having a separate Beer Parlour discussion first. A Beer Parlour discussion prior to a formal vote is helpful for more complicated matters to refine proposals, but unnecessary for a discussion on sources, I feel. Overall, however, I don't think we need formal voting for agreeing on acceptable online-only sources. — Sgconlaw (talk) 19:06, 21 September 2022 (UTC)[reply]
There is no deep difference between "formal voting" and Beer parlour icon-supported and support-oppose-section-supported polling. Votes have more visibility. And votes have a defined fixed 2/3 threshold, which I opposed, but I was a tiny minority. Changing CFI would require a vote per rules, which BP cannot do; thus, the difference is more formal than material. Unless, of course, BP discussion passed at plain majority, that would be a deal breaker. --Dan Polansky (talk) 19:17, 21 September 2022 (UTC)[reply]
Oppose - TheDaveRoss 12:16, 22 September 2022 (UTC)[reply]

Abstain

thirty nine

Why is there no entry for this English spelling out of '39'? --RichardW57m (talk) 11:34, 21 September 2022 (UTC)[reply]

Have you seen thirty-nine? Flackofnubs (talk) 11:36, 21 September 2022 (UTC)[reply]

Yes, it makes no mention of 'thirty nine'. RichardW57m (talk) 11:44, 21 September 2022 (UTC)[reply]

You can add it as an alternative spelling if you like :)! And all the others too, like fifty five, twenty nine... Flackofnubs (talk) 12:12, 21 September 2022 (UTC)[reply]

@RichardW57: only if the non-hyphenated versions are attestable, please. (Check before creating.) — Sgconlaw (talk) 14:52, 21 September 2022 (UTC)[reply]

User:Kanjishowa21-4 for templateeditor group

This is for Kanjishowa21-4, for the Template editor group. — This unsigned comment was added by Kanjishowa21-4 (talk • contribs).

No. Thadh (talk) 09:18, 24 September 2022 (UTC)[reply]

Including hyphenated prefixed words as single words

I think we should include hyphenated prefixed words as single words. Take WT:CFI:

This in turn leads to the somewhat more formal guideline of including a term if it is attested and, when that is met, if it is a single word or it is idiomatic.

Take non-French. It is a single word consisting of multiple parts, not a multi-word term and not a compound. The above passage requires idiomaticity only for terms that are not single words. CFI then contradicts the above by giving ex-teacher as an example excluded, but as quoted, if something is a single word, idiomaticity is not invoked in the first place. And ex-teacher is protected by WT:THUB.

Admittedly, such prefixed words are not particularly exciting, but nor are -ness and -less words and hyphenless non- and anti- words. It was said that prefixing is very productive, but so is suffixing with -ness. They are single words, and that's it.

Many such words are protected by WT:COALMINE and WT:THUB. English is fairly unique among European languages in using hyphen for prefixation, so THUB is quite a powerful protector for them. But THUB is still only an incomplete protector.

Hyphenated words made with prefixes non- and anti- have hardly ever been questioned in RFD, but there are recent exceptions, namely non-French. Hyphenated words prefixed with ex- sometimes failed RFD, namely ex-Christian.

Prefixing with or without hyphen is fairly chaotic and unpredictable in English. For instance, prefix ex- is usually used with a hyphen, but prefixes non- and anti- much less so, and it is not clear why. The matter is not only decoding (which is trivial) but also encoding, documenting for the user which spellings of prefixed words are in actual use. People use dictionaries not only for definitions but also as a spelling guide. This should ideally be done systematically by the single-word principle rather than relying on somewhat random and incomplete protection via THUB and COALMINE. non-Canadian should not need to depend on non-standard nonCanadian to survive, and I would be happy to see nonCanadian deleted.

One special case is prefixing of capitalized words: for them, the hyphen is usually forced by custom. Thus, there is non-Canadian, anti-American, anti-Muslim, and anti-Christian. antichristian exists, but I don't know how common that is. antiRoman was rightly deleted and we should have anti-Roman; we should document the word in some form, in the most usual form.

As a consequence, non-French and ex-Christian should ideally be undeleted.

Since some seem impressed by OED, I'll note it has anti-American, anti-German, anti-Egyptian, anti-English, anti-Japanese and more anti-X, and pro-American, pro-Arab, pro-Asiatic, pro-British and more pro-X, and un-English. Many of these are definitionless, indicated as adjectives, and showing pronunciation and quotes. Thus, OED recognizes the ease of decoding but still covers these terms. It has no non-French and non-Canadian.

Merriam-Webster has many anti-X entries, including anti-Japanese and anti-communist.

Category:RFD result (failed) shows almost no deleted anti-X; it shows Talk:anti-Muslim mania, Talk:antibiotic-resistant, Talk:antijewish and Talk:antiRoman, most arguably misspellings. For non-, it only shows Talk:non-exclusive list and Talk:non-French. For pre-, it shows nothing. For pro-, it shows nothing. For ex-, it shows Talk:ex-Beatle and Talk:ex-pilot. Thus, we have been fairly sparing with deletions, with the largest impacted prefix probably being ex- with items not covered by the category: ex-Beatle, ex-Christian, ex-pilot, ex-Muslim, etc. Some ex- forms were deleted as a group via Talk:ex-pilot, per "Idiomaticity rules apply to hyphenated compounds in the same way as to spaced phrases", but prefixed words are not compounds, so this was a misapplication of the policy. Our practice so far has been mostly inclusionist for this class of words and it is the fairly limited and fairly recent deletions that are the exception to the rule. The set of hyphenated prefixed transparent words included is many times larger than the set of those deleted via RFD. Thus, our custom is predominantly to include them.

We can compare prefix category sizes. Category:English terms prefixed with multi- has 1,827 items, most of them hyphen-free, corresponding to how productive the prefix is. Category:English terms prefixed with non- has 9,984 items. Category:English terms prefixed with anti- has 3,521 items. By contrast, Category:English terms prefixed with ex- has a measly 86 entries, not because ex- is unproductive but probably because it is usually hyphenated and people feel less comfortable entering ex- derivations. The slippery slope of ex- would lead to a normal state of affairs as far as prefix productivity goes.

BinaryStep seems to agree with me, per User:Binarystep/Second Look.

To repeat the main point, the policy says idiomaticity is not required for single words, so the deletion of ex-Christian, ex-pilot and non-French seems to be contrary to policy. I find this conclusive and compelling and did not realize before it was so simple.

--Dan Polansky (talk) 10:29, 25 September 2022 (UTC)[reply]

Support. Well-said. As I mentioned during the RFD for anti-Putinism and anti-Putinist, our current policy is effectively a total ban on any word consisting of a prefix and a capitalized stem, due to the inherent awkwardness of spellings like antiJapanese or nonFrench. The fact that we barely have any entries for ex- words is rather dismaying as well, and only further proves how flawed our system is. Binarystep (talk) 09:53, 26 September 2022 (UTC)[reply]

Thank you. I only disagree in that I argue above that our current policy in CFI already is to allow hyphenated prefixed words. What happened to ex-X words was a misapplication of CFI. I will note Category:English terms prefixed with self- (456 items) as interesting: the items so far have not been attacked despite being nearly all hyphenated. For instance, self-censor is fully transparent (to censor oneself) and was created by Equinox, who would have guessed. What is flawed is not our current system but its previous interpretation, which ignored "including a term if it is attested and, when that is met, if it is a single word or it is idiomatic". This is the part to be quoted by those who defend these words. --Dan Polansky (talk) 15:39, 26 September 2022 (UTC)[reply]

Another category not attacked so far is Category:English terms prefixed with all-, including all-knowing and all-seeing.

There is a similar argument for hyphenated suffixed words, including apple-like, baklava-like, ball-like, Aesop-like, Arizona-like, California-like, calypso-like, candida-like, and chameleon-like. Further examples are alcohol-free, caffeine-free, carbon-free, and disease-free from Category:English terms suffixed with -free. --Dan Polansky (talk) 16:09, 26 September 2022 (UTC)[reply]

Oppose The additional entries that would be created would be largely worthless. Clearly we have some vague intuition about which of these kinds of terms are worth including and often add and keep them whether or not there is an explicit rule that seemingly should apply. It seems to me that our efforts to systematize such intuitions usually lead to unsatisfactory, lengthy changes in the text of our policy pages, often with unpredicted, undesirable consequences. DCDuring (talk) 14:43, 26 September 2022 (UTC)[reply]

I wonder what kind of vague intuition protects the fully transparent candida-like, created by DCDuring. --Dan Polansky (talk) 16:12, 26 September 2022 (UTC)[reply]

Some more: Special:Search/intitle:/County/ finds 1,841 county names like Washington County, purely encyclopedic content with nothing lexically interesting; and this is not yet complete. The 30 senses in that entry are a joke, lexicographically speaking. By contrast, the ex-X words are words, and a testimony to the productiveness of the prefix even if they are easy to break into components and easy for decoding. For the time being, to avoid conflict, I expanded ex- entry with a list of entry-less ex-X words as derivations, to allow us to document the productiveness of the prefix; more expansion is more than welcome, but only of attested words. Prefix ex- is the worst target of this deletionism: even if we deleted all hyphenated non-X words and anti-X words, we would still document the productiveness of non- and anti- prefixes in solid-written nonX and antiX words. THUB may in future help protect ex-X words against the deletionists who ignore our policy: Category:Spanish terms prefixed with ex- looks promising with its 302 entries and lets one guess about how productive the prefix may be in English. If there were a second language documented like that, THUB (requiring 2 languages) would do a decent job to protect the entries. Category:Czech terms prefixed with ex- has a measly 22 entries: the prefix is not so productive in Czech. --Dan Polansky (talk) 12:46, 28 September 2022 (UTC)[reply]

Another point is that the difference between a hyphenated prefixed word and a solid-written prefixed word is linguistically arbitrary and meaningless. They are an artifact of writing and typing and nothing more, and have nothing to do with wordhood outside of writing and typing. Both cases are equally easy to administer. Both nonstandard and non-standard are the same word, just different spellings or forms. For a human, both are equally sum of parts and nearly as easy to split (for a computer, that is different): humans know how to parse nonX words without having a look-up table in their heads for them. It may be that the non-X (dashed) forms are British: OED has a long list of non-X entries such as non-Aristotelian, non-Aryan, non-apparent, non-believer, non-existent, non-hazardous, non-Jew, non-Jewish, non-mathematical, non-Roman, non-toxic, non-violent, non-word, etc. For both human and computer decoding (going from word to meaning), these are worthless; they serve as evidence of existence of a word. The only thing the hyphenated vs. solid distinction does for prefixes is that it penalizes prefixes that are predominantly used with a hyphen. If one is obsessed with eliminating as many hyphenated prefixed words as possible, better delete all non-X forms or hard-redirect them to nonX forms and keep all attested ex-X words: that would give us the required coverage of all prefixed words in existence. --Dan Polansky (talk) 08:46, 30 September 2022 (UTC)[reply]

~~Support, though I was initially hesitant about -like and -free in particular, because they can feel like separate words. - excarnateSojourner (talk | contrib) 20:34, 8 October 2022 (UTC)~~[reply]

Years later, I've since changed my mind on this. First, I'm not even convinced that affixes are not words. If we're not going to base our definitions on punctuation ("words are separated by spaces / hyphens"), what prevents an affix from being a word? They convey meaning, but their parts do not, just like other words. "But they can't be used by themselves, they can only modify other words." You mean like the word not? You mean like all adjectives and adverbs?

Also, free and like are undisputedly words in at least certain contexts. I see no difference between sugar-free and attention-grabbing. Surely we don't want to include -grabbing. — excarnateSojourner (ta·co) 02:16, 9 September 2024 (UTC)[reply]

Avoiding pro-American (anti-British) bias: corpus data shows British publications tend to favor non-X forms with a hyphen. If we exclude hyphenated non-X forms as sum of parts (contrary to CFI as per above), we will exclude British usage (contrary to what OED does as per above). Indeed, non-standard is less common in total than nonstandard so is not protected by COALMINE or THUB. The massive bias is there in the mainspace as is and it will not change any time soon: Special:Search/intitle:/non-/ shows 806 hits, a fraction of the total of over 10,000 non- prefixed words.

Ease of human parsing for solid forms: The human mind stores the most common prefixes ready for morphological analysis. non- is one of the most productive prefixes. When the mind sees a word form, it checks whether the start matches a common prefix and performs other cognitive operations. Doubtless, the processing is cheaper with hyphenated forms, but even if it is 10 times more expensive with solid forms, it usually would not be worth it to stop reading and make a dictionary look-up. When presented with nonX, the reader will naturally look up X, with little cognitive effort. The cases where nonX presents a parsing ambiguity are fairly rare. The situation is very different with the long German and Finnish compounds, where the task of breaking them up is cognitively non-trivial; the more words are combined in the string, the harder the breaking up. --Dan Polansky (talk) 13:48, 14 October 2022 (UTC)[reply]

Counts of included hyphenated prefixed words: By using intitle: search for anti-, non-, post-, pre-, pro- and multi-, I counted 2308 items in total. Some are going to be spurious since this is a substring search, but not too many. I excluded ex-, re- and un- since they produced too many spurious hits. --Dan Polansky (talk) 14:59, 14 October 2022 (UTC)[reply]
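Because intitle: is a substring search, a title like canon-law would match non- even though it is not a non- word. A minimal Python sketch of a stricter tally, assuming only a plain list of page titles (for example from a title dump; obtaining it is not shown) and counting single-word titles that begin with one of the hyphenated prefixes. The function name and prefix tuple are illustrative, not an existing tool.

```python
# Prefixes from the count above; ex-, re- and un- are left out, as in the original tally.
PREFIXES = ("anti-", "non-", "post-", "pre-", "pro-", "multi-")


def count_prefixed(titles, prefixes=PREFIXES):
    """Count single-word titles that start with each hyphenated prefix.

    Unlike a substring search, "canon-law" is not counted under "non-",
    and multi-word titles such as "non-exclusive list" are skipped.
    """
    counts = {prefix: 0 for prefix in prefixes}
    for title in titles:
        if " " in title:
            continue  # multi-word terms are not prefixed single words
        for prefix in prefixes:
            # Require at least one character after the prefix itself.
            if title.startswith(prefix) and len(title) > len(prefix):
                counts[prefix] += 1
    return counts
```

Prefix matching at the start of the title removes the mid-word false positives, at the cost of missing titles where the prefix follows another prefix.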

Clarification needed in Wiktionary:Entry layout

Hi everyone. I'd like to call your attention to this ongoing discussion about an ambiguity in Wiktionary:Entry layout regarding the location of the "References" and "Further reading" sections. It seems to be a simple matter of an editorial oversight, rather than something that would imply a policy change to address, but please do share your thoughts on the matter. Thanks! Waldyrious (talk) 19:24, 25 September 2022 (UTC)[reply]

I'm okay with:

===Noun===

===Verb===

===References===

===Further reading===

or even potentially (when it makes sense):

===Noun===

====References====

====Further reading====

===Verb===

====References====

====Further reading====

I am not a fan of:

===Noun===

===References===

===Further reading===

===Verb===

===References===

===Further reading===

70.172.194.25 21:10, 25 September 2022 (UTC)[reply]

Completely agreed, you should never have ==References==, ==Further reading==, ==Related terms==, ==Descendants== or any other non-POS header interspersed between two POS headers at L3. Benwing2 (talk) 04:49, 26 September 2022 (UTC)[reply]

Support having “References” and “Further reading” once at the bottom of each entry just above “Anagrams” (the first option) in the majority of cases, or the second option exceptionally where there’s a good justification. — Sgconlaw (talk) 07:13, 26 September 2022 (UTC)[reply]

For clarity, please reference the definition of 'entry' you are using. In the parlance of WT:EL#Entry name, an entry means a whole page, which cannot be what you mean.--RichardW57m (talk) 11:19, 26 September 2022 (UTC)[reply]

I mean each language section. — Sgconlaw (talk) 18:46, 26 September 2022 (UTC)[reply]

The specification of entry layout is bedevilled by ambiguity, and fails to provide terms for what we want to talk about.

WT:References#Etymologies implies that each etymology section may have its own reference section, shared with the contents of the parallel or included part of speech headers. On the other hand, WT:Etymology#References appears to recommend that each etymology section have its own references section, which is a contradiction if the formal etymology section does not include part of speech headers. (Note the ad hoc terminology - 'formal etymology section' is what is headed by some type of etymology heading, as opposed to a section about etymology, which has no separate heading if the etymology section includes part of speech headings.) --RichardW57m (talk) 11:19, 26 September 2022 (UTC)[reply]

It would be nice to have at least approximate knowledge of how many L2 sections have each of the various orderings of headers. It should be possible for a regex whiz to do this in Cirrus search for the exceptional cases within, say, English, German, or Spanish lemmas. More accurate and comprehensive knowledge could be obtained by processing the XML dumps. It seems to me to be easier to make a decision to suppress more rarely used orderings of headers. DCDuring (talk) 16:18, 26 September 2022 (UTC)[reply]
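As a rough illustration of the dump-processing route, here is a minimal Python sketch that tallies heading orderings per language section. It assumes the wikitext of each page has already been extracted from an XML dump (that step is not shown), that headings sit on their own lines with balanced "=" signs, and that an "ordering" is the sequence of (level, heading name) pairs under one L2 heading. The function names are hypothetical.

```python
import re
from collections import Counter

# Matches a heading line such as "==English==", "===Noun===" or "====References====".
# Assumption: balanced "=" on both sides, heading alone on its line.
HEADING_RE = re.compile(r"^(={2,6})[ \t]*([^=\n].*?)[ \t]*\1[ \t]*$", re.MULTILINE)


def heading_orderings(wikitext):
    """Yield (language, ordering) for each L2 section of one page's wikitext.

    ordering is a tuple of (level, heading name) pairs for the headings
    below that L2 heading, in document order.
    """
    language = None
    ordering = []
    for match in HEADING_RE.finditer(wikitext):
        level = len(match.group(1))
        name = match.group(2)
        if level == 2:
            if language is not None:
                yield language, tuple(ordering)
            language, ordering = name, []
        elif language is not None:
            ordering.append((level, name))
    if language is not None:
        yield language, tuple(ordering)


def count_orderings(pages, language="English"):
    """Count how often each heading ordering occurs in one language's sections."""
    counts = Counter()
    for wikitext in pages:
        for lang, ordering in heading_orderings(wikitext):
            if lang == language:
                counts[ordering] += 1
    return counts
```

Listing `count_orderings(pages).most_common()` from the bottom up would then surface the rarely used orderings, such as an L3 "References" placed between two part-of-speech headings.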

The defect is in Wiktionary:Entry layout § List of headings. The change required is from

===References===
===Further reading===
===Verb===

to

===Verb===

The change is substantial and therefore requires a vote per "Any substantial or contested changes require a VOTE". I support the correction as obviously necessary. Luckily, people have not been foolish enough to apply what is in WT:EL to mainspace from what I have seen, and are unlikely to start any time soon. --Dan Polansky (talk) 16:34, 26 September 2022 (UTC)[reply]

Interesting — my first instinct would be to not consider such a change to be substantial, since it merely aligns the example with the rest of the information provided more explicitly elsewhere in that page (as I described in more detail in this edit), and with the community's general understanding of the policy (per the comments in the linked discussion and here so far). But sure, a vote might give additional legitimacy to this position.

That said, if we're to set up a vote, then I would suggest also taking the opportunity to resolve the ambiguity regarding these two sections (and the "Anagrams" section) in the §Headings after the definitions list, which I mentioned in the linked discussion — essentially, expanding the change you propose above to be something like this. --Waldyrious (talk) 17:25, 26 September 2022 (UTC)[reply]

I agree. Putting this to a vote is a massive waste of time, when the sensible solution is obvious and uncontroversial. It’s very obvious that “substantial” is being used in its common meaning of “major”. Theknightwho (talk) 20:14, 26 September 2022 (UTC)[reply]

I've gone ahead and made the change. - -sche (discuss) 22:26, 26 September 2022 (UTC)[reply]

Thanks for applying the change! I'd like to know your thoughts about the additional changes I proposed, i.e. splitting out of "References", "Further reading" and "Anagrams" to a new section, below §Headings after the definitions, since all the other ones in that list are meant to be level-4 headings, and the section intro does say "These headings generally derive from knowing the meaning of the word", which doesn't apply to those three. I think we should take the opportunity to iron out that ambiguity as well. Would that change warrant a vote? --Waldyrious (talk) 06:49, 27 September 2022 (UTC)[reply]

The change is substantial since it changes the way the policy impacts mainspace. It is also likely controversial for some editors: some prefer multiple L4 headings over single L3 headings. Search insource:/====References/ finds 33,199 occurrences of the L4 or L-more headings. This discussion and the change of EL should not be construed as a consensus to prohibit those L4 headings, especially given how few people participated and how short it was. From looking at EL history, these headings used to be on L4 level and the defect was introduced via diff without a vote. Thus, a more natural correction would be to bring them back to L4 level. Having them only on L3 level and at the end of the language entry is my preference, but some will differ. The good thing is that EL has Flexibility section; I hope no one will try to enforce the new EL inflexibly.

This whole matter shows people are not monitoring and reviewing changes to EL nor should they need to. This defect remained unnoticed for 6 years. Also, EL revision history is hard to browse because of all the incremental changes made without a vote. This whole idea of "minor" changes is detrimental. --Dan Polansky (talk) 08:08, 27 September 2022 (UTC)[reply]

That's a good point. Thanks for digging up the diff (I had tried, unsuccessfully, to find it, before my last comment). My understanding is that the possibility of References being a level-4 heading was not even present in the page (before -sche's edit above), since even the two removed lines were at level 3 anyway. I do agree that it should be made explicit that a level 4 References can be used if relevant. IMO this would require:

Removing (or qualifying) the note in entry #6 of §A very simple example, that says that "References" should be a level 3 heading;
Explicitly mentioning the possibility of a level-4 References in the §References section (it currently is silent on that)
Ideally, also adjusting Wiktionary:References#Implementation to also mention this.

Waldyrious (talk) 09:19, 27 September 2022 (UTC)[reply]

I'm not sure that "References" even belongs in §A very simple example. And in that very simple example, it should unambiguously be a level-3 heading. Making it level-4 would be unduly complicated.--RichardW57m (talk) 13:07, 27 September 2022 (UTC)[reply]

I've dug into the history of WT:EL, to way back before references appeared in §A very simple example or the page was tagged as a policy. At that stage, e.g. mid-December 2005, "References" was already implicitly a level-4 or deeper heading - they were shown as occurring under a noun PoS heading that preceded a verb PoS heading. "References" was added to "§A very simple example", with the formatting of a level-3 heading, at 9:16 on 28 December 2005. This was explicitly stated to be at level-3 at 18:43 on 6 January 2006. "§A very simple example" should therefore say that it may be a level-3 or a level-4 heading. This is covered by my proposal below. --RichardW57 (talk) 20:53, 27 September 2022 (UTC)[reply]

I've looked at some of the level-4 references. There are some that are at level 4 because of the use of multiple Etymology sections for a language. Most of the others I find unacceptable because they provide mentions rather than usages for the WDL language English, or merely invoke a search engine whose initial results do not demonstrate the alleged meaning, and in some instances seem to have nothing to do with it! Once they are brought up to standard, the reference section can be deleted. However, tens of thousands of pages is a lot to fix or raise {{rfv-sense}} challenges for. --RichardW57 (talk) 20:53, 27 September 2022 (UTC)

I think a better way to handle the matter is to split up §Headings after the definitions. There would be 3 groups:

Corresponding to each Part of Speech heading at one heading level deeper (but see notes on flexibility):
Usage notes .. See also
Corresponding to each Part of Speech heading, numbered Etymology heading, or language heading, at one heading level deeper, in a consistent style within each language section:
References .. Further reading
For each language heading, at one level deeper:
Anagrams

Note that this allows for a level-5 References when each header line has its own references, and header lines are grouped by etymology. We don't need to change the §References section.

We may need to revisit "Anagrams" et sim. when digraphs are treated differently to pairs of letters, as in Hungarian.

I suggest invoking flexibility, because, for example, mutations of Welsh words other than proper nouns depend on their form, not on their meaning. --RichardW57m (talk) 13:07, 27 September 2022 (UTC)

Ideally, §Headings before the definitions should be fixed to allow "References" within the etymology section. However, since there are various sequences of "Etymology", numbered "Etymology", "Pronunciation" and "Glyph Origin" headings, I'd rather invoke flexibility for now. I'm not convinced that homographs with different pronunciations should necessarily be under different numbered etymologies. --RichardW57 (talk) 20:53, 27 September 2022 (UTC)

How to cite Latin nouns and verbs in Romance etymologies

Romance nouns and verbs are both derived from non-lemma forms of Latin nouns and verbs: verbs from infinitives (whereas Latin verbs are -- IMO annoyingly -- lemmatized using the first person singular present indicative), and nouns usually from the accusative singular (whereas Latin nouns are lemmatized using the nominative singular). There is no consistency in how this situation is handled. You might variously see any of these:

From {{bor|pt|la|abōminor}}.
From {{bor|pt|la|abōminārī}}, present active infinitive of {{m|la|abōminor||to abominate, to abhor}}.
From {{bor|pt|la|abōminārī}}, present active infinitive of {{m|la|abōminor||I abominate, I abhor}}.
From {{bor|pt|la|abominor|abōminor, abōminārī}}.
From {{bor|pt|la|abōminor}} (infinitive {{m|la|abōminārī}}).

Similarly for nouns:

From {{bor|pt|la|abōminātiō}}.
From {{bor|pt|la|abōminātiōnem}}, accusative singular of {{m|la|abōminātiō||aversion, loathing}}.
From {{bor|pt|la|abominatio|abōminātiō, abōminātiōnem}}.
From {{bor|pt|la|abominatio|abōminātiō, abōminātiōnis}}.
From {{bor|pt|la|abōminātiō}} (accusative singular {{m|la|abōminātiōnem}}).
From {{bor|pt|la|abōminātiō}} (genitive singular {{m|la|abōminātiōnis}}).

There are other variations as well. Note also the inconsistency in how to gloss the verb lemma: either using "to do X" (IMO correctly) or "I do X" (IMO incorrectly). And the inconsistency in which non-lemma form to include in the last two noun variants: accusative singular (since that's what the Romance term is actually derived from) or genitive singular (since that's what Latin grammar books usually contain).

How should we handle this?

Benwing2 (talk) 05:25, 26 September 2022 (UTC)

There were two earlier threads on the matter (1, 2). In the latter thread with @Word dewd544, I supported the following solution (echoing what some had suggested in the first thread):

From {{bor|pt|la|abōminor|abōminārī}}.

and, by extension, for nouns:

From {{bor|pt|la|abōminātiō|abōminātiōnem}}.

In other words, linking to the Latin lemma form while displaying the form from which the Romance lemma derives. In general, that means the infinitive for verbs and accusative singular for nouns.

It may not be necessary to actually do this for all nouns, however. It's already common practice to derive nouns such as the Italian amico and Spanish amigo from the Latin lemma form (i.e. from amicus, without mentioning the accusative amicum). I agree with this, as there is nothing 'unexpected' about such derivations to anyone with at least a passing familiarity with Latin and the Romance languages.

Conversely, it appears to be common practice for nouns such as the Italian questione to be derived explicitly from the Latin accusative (quaestiónem) rather than directly from the Latin lemma form (quáestio). I agree with this practice as well, since the difference in stress and the presence or absence of the final /-nem/ is significant. Nicodene (talk) 07:02, 26 September 2022 (UTC)

I agree with Nicodene: just link to the lemma and pipe whatever display form is needed. (If there are exceptional cases where a particular word was unusually derived from the ablative plural or something, maybe then the long "foo, such-and-such-form of bar" would be useful.) - -sche (discuss) 22:41, 26 September 2022 (UTC)

I also like the |lemma|form}} approach. Ultimateria (talk) 19:25, 28 September 2022 (UTC)

Thanks for the comments, I will go with this approach. Benwing2 (talk) 04:07, 30 September 2022 (UTC)
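For anyone generating these etymology lines by bot, the convention agreed above (link to the Latin lemma, display the form the Romance term derives from) can be sketched as a small Python helper. This is a hypothetical illustration: the function name `etym_line` and its parameters are invented here, and the choice between `{{bor}}` and `{{inh}}` is left to the caller; only the shape of the emitted wikitext follows the thread.

```python
from typing import Optional


def etym_line(template: str, term_lang: str, lemma: str,
              form: Optional[str] = None) -> str:
    """Build a Romance etymology line from a Latin source term.

    The link target is always the Latin lemma; the optional display
    form is the non-lemma form (infinitive, accusative singular, ...)
    that the Romance term actually derives from.
    """
    # Hypothetical rule: skip the display parameter when it would just
    # repeat the lemma, e.g. Italian amico < Latin amīcus.
    if form is None or form == lemma:
        return f"From {{{{{template}|{term_lang}|la|{lemma}}}}}."
    return f"From {{{{{template}|{term_lang}|la|{lemma}|{form}}}}}."
```

For example, `etym_line("bor", "pt", "abōminor", "abōminārī")` yields `From {{bor|pt|la|abōminor|abōminārī}}.`, while `etym_line("inh", "it", "amīcus")` yields the plain `From {{inh|it|la|amīcus}}.` Deciding when the accusative is worth showing (questione vs. amico) remains an editorial judgment, not something the helper decides.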

User:Inqvisitor

This user strongly believes that plovebat is not Vulgar Latin. I have cited, on Talk:pluo, five different academic sources regarding the fact that such words are Vulgar Latin: Jozsef Herman, Rebecca Posner, James Noel Adams, Leonard Palmer, and Lewis & Short, all top scholars in Latin or Romance linguistics. Despite this, user Inqvisitor asserts that he does not care, that it's irrelevant, and that I supposedly don't understand what Vulgar Latin means. He even went so far as to state that "any scholar who claims they can tell you for sure what was in the head of Petronius or any 1st-century Latin writer or speaker is not brilliant but an arrogant fool. Nowhere did Petronius write 'I am using this fake word for satirical effect'." So his response to there being an overwhelming scholarly consensus about the Roman author Petronius satirizing lower-class 'Vulgar Latin' speech is, in essence, to call anyone who shares this view an idiot, including top scholars in the field. I have asked him more than once to cite any source that challenges this view, or at the very least doubts it, and he has not done so. (To be fair, one can't cite what doesn't exist.)

He has also insisted on inserting the Vulgar Latin form plovebat into the conjugation table on Latin pluo next to the standard Classical form pluebat. As far as I can tell, inserting single non-standard forms into conjugation tables like that is neither a normal nor accepted practice on Wiktionary, and I see no reason to start doing it here. I have suggested using the alternative forms section instead, and he has vehemently refused, claiming that a conjugation table is where a non-standard alt-form belongs. I have yet to understand where he got this impression. That'd be like providing both skills and skilz as the plural inflexions of skill, simply because there exists a book somewhere that quotes skilz while commenting on internet slang. Surely that's what the alternative forms section is for?

In sum, I find that the user is being unreasonable in the extreme and championing a personal conviction against the consensus of "academic sources" (his scare quotes). Nicodene (talk) 06:34, 26 September 2022 (UTC)

The prohibition of non-standard forms in inflection tables is news to me. How am I supposed to document substandard THAI CHARACTER SARA UE in place of <THAI CHARACTER SARA I, THAI CHARACTER NIKHAHIT> in Pali inflections? It occurs because Windows XP wouldn't allow the combination if complex script support was enabled, and the restriction has carried over into modern non-Windows input systems. I was planning to deploy it once I had updated the input format for inflection tables to support footnotes in the output. --RichardW57m (talk) 12:44, 26 September 2022 (UTC)

In this case for Latin, isn't it more a matter of whether plovebat is part of pluo as well as of plovo? The former seems hard to demonstrate. --RichardW57m (talk) 12:44, 26 September 2022 (UTC)

@Nicodene Without commenting on specifics, we do include nonstandard forms if attestable. Theknightwho (talk) 15:47, 26 September 2022 (UTC)

@Theknightwho Since when does anyone mix random non-standard forms into standard inflexions? Should our entry for foot list the plural form as "feet or feetses"? Nicodene (talk) 17:50, 26 September 2022 (UTC)

See drink. Vininn126 (talk) 18:01, 26 September 2022 (UTC)

So it is done. I can't say I see any point in doing it that way, much less doing it with Latin inflexion charts (which would overflow from the sheer number of non-standard spellings), but whatever. At least there is a clear label that it's non-standard. Nicodene (talk) 18:17, 26 September 2022 (UTC)

FWIW, although it's a bit subjective to decide exactly where the cutoff is, I don't think including significant nonstandard forms like this (which is somewhat useful) requires us to slide down a slippery slope to listing every single variant spelling ever (which would indeed be excessive and silly). drink doesn't list spellings like drinkin' or google books:"dreenking" or google books:"drahnk" or google books:"druhnk" on the headword line, for example, even though obviously any attested one can have its own entry. (Personally, I would also move obsolete-or-rare nonstandard forms like drinken out of the headword line and down into usage notes or conjugation tables, like at laugh—and like at pluit.) - -sche (discuss) 21:06, 28 September 2022 (UTC)

Interwiki refs from lang specific subentries: prefer English WP or language links?

Too rarely do entries link to relevant WP articles, so I appreciate it when I see such links. But then the entry here for 海 brought up a question. The subentry for 海#Chinese does include links to WP articles, including ones per Chinese dialect, and the link for 海姓 (hǎi as a surname) drew my attention.

But that 海姓 link is to zh.wikipedia.org, the Chinese WP. There is a certain logic to that -- Chinese linking to Chinese -- except that that ZH WP entry's sidebar shows a link to the English WP article Hai (surname). There is an English WP article for the topic, and this is en.wiktionary.org ... (In this case the two WP articles are nicely linked together in their sidebars, so further navigation by the reader is possible as needed.)

Note that the Japanese subentry here 海#Etymology_1 demonstrates a different take on how to do WP links.


Within an entry here at en.wiktionary.org, from within a language specific subentry, should links to WP preferentially reference:

articles in the same language as the host wiktionary site? (e.g. wikt:en: --> w:en:)
or articles in the language of the subentry? (e.g. #Chinese --> w:zh:)

Shenme (talk) 19:04, 26 September 2022 (UTC)

What are compounds and what are open compounds specifically

I have posted some analysis at Talk:open compound. This issue seems relevant to classification of our entries, so I post it here also for reference. I am not wholly clear which of our multi-word spaced entries are considered to be open compounds. Some questions:

Is black hole an open compound?
Is any adjective-noun spaced entry that we include an open compound?
Is any noun-noun spaced entry that we include an open compound?
Do we want to have a separate category for English open compounds? Our templates such as {{af}} and {{compound}} could automatically classify compounds into closed compounds, hyphenated compounds and open compounds categories. I for one would find it worthwhile.

This pertains mainly to English; other languages seem to have a different tradition for defining "compound", one that is syntactic rather than semantic. Multiple languages have no open compounds at all. I believe Czech černá díra (black hole) is not a compound and German schwarzes Loch (black hole) is not a compound; this is based on syntax, not semantics. Does anyone know more, for English or for other languages? --Dan Polansky (talk) 10:06, 28 September 2022 (UTC)

Wikipedia defines the term, by the orthography of the compound, as a compound whose constituents are separated by spaces. Thus coal mine is an open compound, whereas coal-mine and coalmine are closed compounds. The Cambridge Grammar of the English Language makes the same distinction but uses different terminology, distinguishing a syntactic construction consisting of an attributive dependent + head, forming a “composite nominal” (Wikipedia’s “open compound”), from a morphological “compound noun” (Wikipedia’s “closed compound”). They give a syntactic criterion (although all examples fit Wikipedia’s distinction), to wit that in composite nominals the components can enter separately into relations of coordination and modification. For example, we can shorten coal mines or lignite mines to coal and lignite mines. The dual of a black hole is a white hole; a text might be discussing the similarities between black and white holes. Note: I am merely reporting this. --Lambiam 15:19, 2 October 2022 (UTC)

Thank you. Using the above, I found the following:

How is 'compound noun' defined in CGEL?, english.stackexchange.com

It quotes CGEL and shows some screenshots. CGEL allows full stop as a compound. It uses the phrases "orthographic word" and "grammatical word", which matter for defining compounds.

And the following:

Why do grammars claim that adjective+adjective is always a morphological compound and never a syntactic construction?, english.stackexchange.com

It covers "Various non-syntactic criteria [...] as differentiating between composite nominals and compound nouns": stress, orthography, meaning, and productivity. It mentions "coordination and modification" as well and gives some examples.

From your text it would follow that coal mine is not a compound, despite the existence of coalmine. It also suggests that different linguists are going to test for compoundhood of space-separated phrases differently, by there being CGEL treatment and other treatments.

In any case, CGEL does not recognize all adj-noun and noun-noun phrases as compounds.

I also found the following interesting bachelor's thesis, written in decent English by a non-native speaker:

Compound Nouns and Noun Phrases by Michaela Bartušová, theses.cz

It heavily references English and other sources. For distinguishing compounds from noun phrases or "free phrases", it references Adams. wet day is not a compound while small talk is a compound: there is very wet day but not very small talk; and there is the day is wet. It mentions other English authors and how they test for compoundhood, including phonological and semantic criteria. (I mentioned phonology criteria at Talk:open compound.) It contains some non-trivially long lists of example compounds in "II.Practical Part". It is quite extensive. It mostly deals with English compounds and briefly mentions compounds in other languages.

--Dan Polansky (talk) 16:34, 2 October 2022 (UTC)

(Re-)formatting line breaks

Once in a while, some editors (mostly IPs) go around and replace forward slashes used as line break indicators in quotations with <br> tags. According to WT:QUOTE, either is acceptable:

HTML line break can be used by writing <br>, or the “¶” or “/” characters can be used.

It's fine to make this choice for quotations you add, but going around and reformatting quotes according to your own taste adds noise and is counterproductive. Is there any way this could be avoided? Maybe we could decide on a standard line break template (e.g. {{br}}), and then change the display according to a user setting? This would also help to avoid raw HTML in the source text.

There is also <poem>, which was designed for these cases, but it does not seem to work well when used in quote templates. Jberkel 12:08, 28 September 2022 (UTC)

A template invocation like that would break my word-selecting quotation templates, which use braces to tag words so as to embolden them. Their most efficient use case is to provide a quotation for every different word in a snippet of scriptio continua text. The code is in Module:RQ:pi:Sai_Kam_Mong. I think I can see a way to fix it.

As to raw HTML, how do we cite tables? I've found myself citing mentions in tables for rare words like unarbymthegfed. Wikitable syntax just doesn't work. --RichardW57m (talk) 16:34, 29 September 2022 (UTC)

The HTML break tags in citations lead to a substantial waste of vertical screen space. I think they ought to be removed whenever possible and replaced by forward slashes to indicate line breaks. Besides wasting space, the multi-line poetic passages demand visual attention, overweighting what are very often highly ambiguous uses of the headword being cited. DCDuring (talk) 17:46, 29 September 2022 (UTC)

I agree with @DCDuring. — Sgconlaw (talk) 19:19, 29 September 2022 (UTC)

@DCDuring, Sgconlaw: So, would we have to decide whether each line break was a clause boundary, a word boundary or mere hyphenation? Using the HTML line break does not force us to impose an interpretation on the quoted original. --RichardW57 (talk) 23:23, 29 September 2022 (UTC)

A slash with a hard space preceding is fairly clear. We could resort to HTML breaks in the rare cases of ambiguity of line breaks — or we could dispense entirely with the often ambiguous cites from poems and lyrics. DCDuring (talk) 23:34, 29 September 2022 (UTC)

A line break in typeset traditional modern Thai is (almost?) always ambiguous between a clause break and a mere word boundary, unless @Octahedron80 knows different. (Magazines can make life easier by using European punctuation.) I've found the ambiguity a major nuisance when transcribing typeset Thai. On the other hand, {{th-usex}} even forces one to identify word boundaries, which is a pain in a language abounding in invisible COALMINEs. Do you think it would not be too great a strain to remember that an unbroken visual <space, slash> is, for long lines, merely a representation of a line break and not an indicator of at least a word boundary? (Hyphenation is mostly restricted to multi-column text, and tends to use hyphens.) --RichardW57 (talk) 00:34, 30 September 2022 (UTC)

@RichardW57: I can’t speak for Thai. In English, major dictionaries like the OED silently remove line-breaking hyphens. It would be rather odd to try to replicate all the hyphens in the original text, as they depend on the column or page width, and doing so would make the quotations difficult to read. Where a hyphen is ambiguous and potentially significant (for example, where it is in the lemma), I’ve indicated it like this: “line[-]breaking”. — Sgconlaw (talk) 04:21, 30 September 2022 (UTC)

I suppose one could use '[ ]/[ ]' for line breaks in scriptio continua, where the spaces are hard spaces. RichardW57m (talk) 11:59, 30 September 2022 (UTC)
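The conversion DCDuring proposes (HTML breaks replaced by a slash preceded by a hard space) is mechanical enough to sketch. The following is a hypothetical Python illustration, not an existing Wiktionary tool or bot; the name `br_to_slash` is invented here, and it does nothing to resolve the clause-versus-word-boundary ambiguity raised above for scriptio continua.

```python
import re

NBSP = "\u00a0"  # hard (non-breaking) space

# Matches <br>, <br/> and <br /> in any letter case, together with any
# whitespace around the tag, so the result has exactly one separator.
BR_TAG = re.compile(r"\s*<br\s*/?>\s*", re.IGNORECASE)


def br_to_slash(quote: str) -> str:
    """Replace HTML line breaks in a quotation with 'hard space, slash, space'."""
    return BR_TAG.sub(NBSP + "/ ", quote)
```

For example, `br_to_slash("Line one<br>line two")` returns `"Line one\u00a0/ line two"`. A bot built on this would still be the kind of taste-driven mass reformatting the thread objects to, so it would only make sense once a standard has been agreed.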

Meaning of consensus for discussions other than formal votes created at Wiktionary:Votes

The vote at "Wiktionary:Votes/pl-2022-09/Meaning of consensus for discussions other than formal votes created at Wiktionary:Votes" has opened. Please express your views on the issue there. — Sgconlaw (talk) 12:14, 28 September 2022 (UTC)

Category:English terms suffixed with -ridden

There are several terms/words missing from this, even though it’s an autocategory, such as rat-ridden, louse-ridden and flea-ridden. Could someone with the ability, knowledge and perhaps requisite permissions to fix this please do so? Overlordnat1 (talk) 14:45, 30 September 2022 (UTC)

It just takes a Cirrus search and persistence: Search for 'intitle:/ridden/ -incategory:"English terms suffixed with -ridden"'. Not too many false positives, but more conditions could be added to eliminate annoying common ones if necessary. DCDuring (talk) 14:57, 30 September 2022 (UTC)

I haven’t tried searching quite like that before but this just produces a list of the 27 terms already in this category and the word ridden itself. How would I go about adding the three above terms manually? --Overlordnat1 (talk) 15:18, 30 September 2022 (UTC)

I edited etymologies of these three to add them to the category, e.g. {{af|en|rat|-ridden}}, where af stands for affix. --Dan Polansky (talk) 15:53, 30 September 2022 (UTC)

Special:Search/intitle:/ridden/ -incategory:"English terms suffixed with -ridden" finds more items to add, but the question is, are they really all formed with the suffix -ridden? And is -ridden really a suffix? It survived an RFD, but should it have survived it? If you want to edit the etymologies of the remaining items to add them to the category or change "ridden" to "-ridden", you can, I guess. --Dan Polansky (talk) 15:58, 30 September 2022 (UTC)

It's an old RfD, not well argued. It wouldn't be unreasonable to revisit it. DCDuring (talk) 02:02, 1 October 2022 (UTC)
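The CirrusSearch query from this thread can also be run through the standard MediaWiki Action API (`action=query&list=search`), which is handy for checking the category from a script. A minimal sketch in Python using only the standard library; the helper names and the User-Agent string are made up for illustration, continuation past `srlimit` results is not handled, and every hit still needs the manual review for false positives (and for whether -ridden is really a suffix) discussed above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wiktionary.org/w/api.php"


def search_url(query: str, limit: int = 50) -> str:
    """Build an Action API URL for a full-text (CirrusSearch) query."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def titles_from_response(data: dict) -> list:
    """Extract page titles from a parsed list=search JSON response."""
    return [hit["title"] for hit in data.get("query", {}).get("search", [])]


def fetch_titles(query: str) -> list:
    """Run the search over the network and return the matching titles."""
    # Wikimedia asks API clients to send a descriptive User-Agent;
    # this one is a placeholder.
    req = Request(search_url(query),
                  headers={"User-Agent": "ridden-category-check-sketch/0.1"})
    with urlopen(req) as resp:
        return titles_from_response(json.load(resp))
```

Usage would be `fetch_titles('intitle:/ridden/ -incategory:"English terms suffixed with -ridden"')`, which returns the same kind of list as the Special:Search link above.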

Odds and Ends

Today is a national holiday in the US and I have all kinds of things IRL that I'm procrastinating about. That means I've been spending too much time today looking through Special:RecentChanges. Before I get to work on the stuff I need to do, I thought I'd share a couple of oddities I stumbled on:

Special:Contributions/173.216.0.0/20. This IP editor reminds me in some ways of the Ann Arbor/St Louis editor that we blocked for cranking out mediocre English entries assembly-line fashion in order to camouflage the far-fetched racist stuff they were adding. It's definitely not the same person, though. The IP addresses used geolocate to Cabot, Arkansas, and seem to be free of any proxies or virtual anything. On the one hand, they seem to know enough about sourcing to create halfway plausible proto-language reconstruction entries (whether they're any good is another matter). On the other hand, they created an entry for the National Socialist German Workers' Party and added 26 translations in languages like Latin, Basque, Estonian, Kazakh, Korean, Thai and Indonesian, probably all harvested from interwikis on the Wikipedia page. The fact that all but the German one are redlinks is probably significant. They also added audio templates for three nonexistent pronunciation files, and added a lot of descendants to English hamburger. A particularly odd one is the Siraya entry they added at taywan. For context, Siraya is a small Austronesian language of Taiwan, for which we have exactly 10 entries, including this one. I'm not sure where they got it from.
Special:Contributions/RedaCEC: this account is globally locked as a sock for an account that was originally blocked for adding bogus information and images re: traffic signs. They also seem to be yet another of the Australian accounts who have recently been adding tons of entries and translations in obscure languages they don't know, based on low-quality online sources. Chuck Entz (talk) 00:51, 11 October 2022 (UTC)
