Wikipedia talk:Manual of Style

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by SandyGeorgia (talk | contribs) at 14:57, 24 July 2018 (The endless "fan-capping" problem: lead citations). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Proposed footnote to discourage mass changes

I'd like to float the idea of adding a footnote off the MoS lead's short sentence about editwarring over style. Something like this:

{{efn|1=Rapid, mass changes to articles to "enforce" an optional style point can be disruptive. Alterations to the MoS are often reverted; we should not change thousands of articles then change them right back. Implementing changes gradually into articles helps avoid mistakes. See also this Arbitration Committee decision: "Where editors have made a number of similar edits in a short time space and other editors have raised concerns about those edits, the editor is to stop making the edits and engage in discussion." The assisted editing guidelines advise to "first ensure that there is a clear consensus" before performing a large number of semi-automated edits. While AutoWikiBrowser performs various MoS-related general fixes, its rules of use preclude semi-automation of either controversial or pointless changes. Large-scale erroneous edits by any editor may result in a block, even if done manually. }}

This would be much cleaner than adding a whole ==Section of verbiage==, or dumping more text directly into the lead. The issue doesn't come up frequently enough for that. But two (three, counting the mass-revert streak) in one week, leading to two WP:ANI threads and a WP:VPPOL discussion calling for MoS to address it directly, are a strong indication that we should at least have a clear footnote on this, with the ArbCom enforcement authority cited.
 — SMcCandlish ¢ 😼  21:50, 3 July 2018 (UTC)[reply]

You say M-o-s in your head? (an MoS) Or was that a typo? I've always said it "moss". I think this would be a good idea broadly. A choice link to WP:POINT in there with "then change them right back" might be reasonable; maybe there's a better link somewhere. --Izno (talk) 22:23, 3 July 2018 (UTC)[reply]
WP:POINT link integrated now. Pronunciation: I keep vacillating on it. I think I mostly do m-o-s in running text, because moss is that green stuff. But when I think of a shortcut, it comes out in my mind as mosskaps. Clearly, it's a brain tumor.  — SMcCandlish ¢ 😼  21:30, 4 July 2018 (UTC)[reply]
I'd support both the idea and the example wording (except, change "an MoS" to "a MoS") --BushelCandle (talk) 23:23, 3 July 2018 (UTC)[reply]
Since this is a behavioral concern, I think it deserves a section in a behavioral guideline, which we can reference from here. -- Netoholic @ 23:44, 3 July 2018 (UTC)[reply]
There probably already is one. I just wanted to have MoS itself say something, since most "sprees" seem to be MoS-motivated, and people are getting angry about it.  — SMcCandlish ¢ 😼  21:35, 4 July 2018 (UTC)[reply]
Would that include mass changes to include serial commas? Thinker78 (talk) 04:56, 4 July 2018 (UTC)[reply]
I would think so. I favor them myself, for clarity, and virtually never get reverted on my own insertions of them (for consistency with the rest of an article, or because the case in point is ambiguous or confusing without one). But I would expect to be keel-hauled if I used AWB to stick them into 1,000 articles.  — SMcCandlish ¢ 😼  21:35, 4 July 2018 (UTC)[reply]
Seems like a reasonable idea to me. · · · Peter (Southwood) (talk): 14:22, 4 July 2018 (UTC)[reply]
The wording needs to be specified. The federal government said that a mass shooting occurs when there are four or more victims.[1] I'm guessing you are not saying four or more edits. Thinker78 (talk) 01:19, 5 July 2018 (UTC)[reply]
Four or more commas in one article would be a mass edit; moving from article to article is serial editing. EEng 01:59, 5 July 2018 (UTC)[reply]
I'm still unclear on whether it's psychopathic or sociopathic.  — SMcCandlish ¢ 😼  03:03, 5 July 2018 (UTC)[reply]
  • Jokes aside, I've integrated what I can find from WP:BOTPOL and WP:AWB. There was a suggestion that WP:Consensus and WP:DE might say something relevant, too, but I've yet to find it. Also looked in WP:EDITING.  — SMcCandlish ¢ 😼  03:03, 5 July 2018 (UTC)[reply]
  • This is a solution in search of a problem. The AWB rules combined with WP:MEATBOT are more than sufficient to deal with this. There is nothing wrong with using AWB to enforce the vast majority of MOS recommendations. The efforts should be on identifying where AWB is applying unwanted fixes, and booting those out of WP:GENFIXES (or demoting them to minor genfixes), not crafting yet another 'rule' that stifles improvements to the encyclopedia. There is a reason why experience is required to be allowed to use AWB. Headbomb {t · c · p · b} 05:21, 5 July 2018 (UTC)[reply]
    • @Headbomb: How does the current draft wording not address appropriate use of AWB (i.e., to conform text to the actually expected MoS guidelines, versus violating MOS:STYLEVAR trying to enforce one preference out of a choice on an optional style matter)? This is not a "new rule", it's a footnote pointing to the existing ones. There is no searching for a problem; the problem is right in our faces, with outcry at WP:VPPOL and two nearly back-to-back WP:ANI cases the same week.  — SMcCandlish ¢ 😼  11:53, 5 July 2018 (UTC)[reply]

Note: Optional styles should not be enforced in a bot-like fashion. Doing so will likely be seen as disruptive and may lead to blocks and/or revocation of semi-automated tools privileges.

That could be added directly into WP:STYLEVAR (perhaps without the bold note).Headbomb {t · c · p · b} 14:44, 5 July 2018 (UTC)[reply]
I didn't take this approach because there's an ongoing thread at another guideline criticizing this style as WP:EASTEREGG. I argued against that view, but was outnumbered 2:1 or so. A happy medium should be easily achievable.  — SMcCandlish ¢ 😼  23:38, 5 July 2018 (UTC)[reply]
  • See also: User talk:PC-XT/Advisor#Stop futzing with headings – there's a semi-automated tool going around changing ==Heading== to == Heading == (or vice versa, I forget).  — SMcCandlish ¢ 😼  11:48, 5 July 2018 (UTC)[reply]
  • I would strongly support adding something like this (I say “like” this, because I know that we will quibble over specific wording for a while). Blueboar (talk) 13:18, 5 July 2018 (UTC)[reply]
  • I'd object because (a) it still refers to MoS style points (although I see you've removed the actual word "MoS"), but mass changes can be due to other style guides or delusions about them, (b) "thousands" is a big number, (c) the words "and then change them right back" (i.e. the editor who makes the mass change then self-reverts to make a point) don't look like a reference to something that I thought is the real problem (i.e. another editor has to clean up), (d) the earlier observation that behaviour guidelines belong elsewhere appears correct. Peter Gulutzan (talk) 14:25, 5 July 2018 (UTC)[reply]
    • It can only be about MoS points; this is the MoS page, not a behavioral guideline. It is outside MoS's scope to try to address things like using scripts to re-categorize things disruptively, or abusing AWB to delete sources. All we're doing here is cross-referencing behavioral matter that's already pre-codified, and reminding people that (and how) it applies to MoS-related mass changes.  — SMcCandlish ¢ 😼  23:38, 5 July 2018 (UTC)[reply]
  • Looking at the arbitration cases, it seems like most (all?) involve automated or semi-automated edits, and most of them also involve users who continued to make changes after those edits had been challenged. It seems like the footnote should clarify then, that "mass changes" usually means automated or semi-automated edits, and/or add the clarification from WP:MEATBOT that simply editing quickly while still remaining responsive to discussion is not disruptive. I agree that large scale editing should be done with care, but I don't want to discourage editors from fixing problems, especially when WP:MOS intersects with WP:BLP (this), or accessibility issues. Nblund talk 19:33, 7 July 2018 (UTC)[reply]

  • I think this should be discussed more widely. I have followed some of the ArbCom cases and I've been following the problem for years. Any change in this direction, if not made carefully, may lead to editors defending their own custom style against styles that are used in 99% of the pages. The problem is not "mass changes" per se. The problem is "mass changes to enforce a certain style that does not have consensus". This is different from the "mass changes not worth doing because the change has consensus but let's not enforce it". We have to be really careful with this thing because we have editors that removed whitespaces inside header titles (useless) and editors that move punctuation points in the body text (useful). Take note that my examples are not about "optional styles" in the visual outcome but I bet the same problem holds there. There are people who like to have large paragraphs and those who like small paragraphs etc. People who like big images and people who like small images within the article etc. My experience shows that if we discuss this further we may find that even the Manual of Style contains things that do not have consensus or things that may be added to the manual of style. I am pretty convinced that what we need here is a wider discussion to collect multiple experiences from using and editing Wikipedia. -- Magioladitis (talk) 07:10, 11 July 2018 (UTC)[reply]
    Are you saying that there are not legitimate cases for 99% of articles to use one approach to some stylistic or formatting question, and 1% to use a different approach? EEng 12:47, 11 July 2018 (UTC)[reply]
EEng I say that <1% of the pages have a different style due to various reasons. I have seen pages using a lot of colors for instance and the creator of those pages to insist keeping the colors all over the place. I have seen pages that do not use the standard wikitables, sometimes for good reason sometimes the tables could easily be replaced to make life easier for others to use and for Visual Editor to work, etc. I know for instance that a certain Wikiproject denies/denied infoboxes to biographies that are under their scope. I say that thanks to bots and willing editors 99% of the pages follow a standard format and we have a small number of exceptions that forms a gray area. Sometimes there is a good reason for that, sometimes it is not. -- Magioladitis (talk) 14:35, 11 July 2018 (UTC)[reply]
Sometimes there is a good reason for that, sometimes it is not – Right, and it's the nature of mass changes that that distinction isn't taken into account. EEng 15:23, 11 July 2018 (UTC)[reply]
Favouring editors that use code hacks and custom styles is also a problem. There are editors who try to impose certain styles that they created. Discouraging mass changes en masse is a problem. We need a strategy on the mass change that affect the style of a page. -- Magioladitis (talk) 16:41, 11 July 2018 (UTC)[reply]
  • Favouring editors that use code hacks and custom styles – No one said anything about favoring them. What I've said is that each article needs individual evaluation.
  • editors who try to impose certain styles that they created – All the styling we see in articles today, such as the various referencing styles, table formatting conventions, infoboxes, overall article layout, and so much more, started out as something some one editor created. If someone is "imposing" something on one article, that's for the editors of that article to work out, not for some mass-change zealots to steamroll over.
  • We need a strategy on the mass change that affect the style of a page – I have no idea what that means.
EEng 16:51, 11 July 2018 (UTC)[reply]
EEng I do not want to get into a further discussion right now on that. I just said that there are parameters of the story that we may be missing so it's better that we give this some publicity. Btw, mass-change zealots are editors too. Some of them have even got in the news exactly for their devotion to the project. -- Magioladitis (talk) 10:47, 16 July 2018 (UTC)[reply]
I don't know what "parameters of a story" are. And zealotry is a measure of the intensity of effort, not its utility. EEng 22:39, 16 July 2018 (UTC)[reply]

HTML entities

Greetings all, I'm currently updating the style-checking code that reports to Wikipedia:Typo Team/moss, and I need some clarity on which HTML character entity references (things like &amp;) are allowed or preferred. Variations that are not allowed or which are disfavored would be brought to the attention of human editors, along with other suspected style and spelling errors. There are occasional mentions of such entities in the Manual of Style, but no general rules that I could find. I would propose the following:

HTML character entity references

(edited to reflect the below comments)

HTML character entity references are a way to tell a web browser to render a certain character without including that character in the web page directly. Characters may be referenced by name, decimal number, or hexadecimal number. For example, "&euro;" is the same as "&#x20AC;", "&#8364;", or including the character "€" directly. For a comprehensive list, see List of XML and HTML character entity references. Wikipedia editors are encouraged to follow these guidelines to make it easier for editors to read and understand wikitext, especially those not familiar with HTML notation.
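The equivalence of the three reference forms can be checked with a minimal sketch, assuming Python's standard `html` module as the decoder (the helper name `decode_all` is made up for illustration and is not part of the moss software):

```python
import html

def decode_all(forms):
    """Decode each HTML character reference (or literal character) to text."""
    return [html.unescape(form) for form in forms]

# Named, hexadecimal, and decimal references, plus the literal character.
euro_forms = ["&euro;", "&#x20AC;", "&#8364;", "€"]
decoded = decode_all(euro_forms)

# All four spellings decode to the same single character, U+20AC.
print(decoded)
print(len(set(decoded)) == 1)
```

A browser performs the same decoding when rendering a page, which is why the choice among these forms only affects how the wikitext reads in the edit window.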

  • In general, it is preferable to write characters directly instead of using an HTML entity reference. Wikipedia stores articles with Unicode, so any character that could possibly be referenced can also be input directly. The web site's editing pages have built-in special character support to make it easy to input characters not typically found on keyboards. Editors can also use the Unicode input method provided by their operating system.
  • Numeric references should not be used when there is a named reference available. For example, &minus; should be used instead of &#8722;
  • References must be used when the character itself cannot be used for technical reasons. For example, "]" cannot appear in wikilinks that use "[[" and "]]" to mark the start and end. The <nowiki> tag can also be used to prevent interpretation of special characters as wiki markup.
  • Named references are preferred when the characters themselves are easily confused. This includes:
    • Whitespace. The regular ASCII space " " should be typed directly, but entities should be used for others like "&nbsp;" and "&ensp;".
    • Dashes and similar characters. The regular ASCII hyphen-minus "-" should be entered directly, but other characters might be entered with entities. For example, &minus; is generally preferred because "−" looks very similar to "-" in some web browsers. See Wikipedia:Manual of Style § Dashes for more usage guidelines.
    • Prime (′) and related symbols that resemble quote marks
  • Other guidelines ask that the Unicode characters not be used at all (except when the character itself is being discussed):

Initial discussion

What do folks think? -- Beland (talk) 19:39, 14 July 2018 (UTC)[reply]

  • Another set of characters to avoid are the superscript-digits (at least when used with a mathematical meaning). See MOS:MATH#Superscripts and subscripts. —David Eppstein (talk) 19:46, 14 July 2018 (UTC)[reply]
  • I disagree that mdash isn't easily confused -- in some fonts it definitely is. I'd pretty much advocate that everything not on a standard English keyboard (whatever the "standard English keyboard" is) should be symbolically represented by either a & form or a template. And I'm a little worried that the typo team link at the start of the OP talks about flagging "violations of the Wikipedia:Manual of Style"; I fear this will slide all too easily into a project to blindly "fix violations". EEng 20:06, 14 July 2018 (UTC)[reply]
    @EEng: OK, I'll drop the emdash example. As for scope...well, this is already a project to fix violations of the Manual of Style and English spelling and grammar, though it's never done blindly. In some cases it would be safe to make a bot to make certain substitutions (like converting numerical to named references), but that would require approval by Wikipedia:Bot requests to make sure it didn't have any unwanted side effects. Not sure why that is something to be afraid of; if we think a certain form is better for editors, that seems useful. We don't do that for spelling mistakes because there could be a good reason to keep the misspelling. Could you explain a bit why you feel it's better for an editor to come across say, &trade; instead of ™ when opening an article for editing? -- Beland (talk) 21:00, 14 July 2018 (UTC)[reply]
    I'm fine with replacing numerical refs and &trade; and so on; in fact I welcome it because, as I mentioned, I generally think everything not on standard keyboards should be expressed symbolically in the wiki source. It's the vague statement at Wikipedia:Typo_Team/moss that you're gonna find "violations of the Wikipedia:Manual of Style" that worries me. I don't mind automatically identifying apparent "violations", but what worries me is that that might slide into automatic "fixes" – worried because MOS isn't rigid, it needs to be applied with common sense, exceptions apply, etc. EEng 21:24, 14 July 2018 (UTC)[reply]
    Re replacing characters with entities or the reverse: what I don't want to see is slow-motion edit wars where one group of editors or bots regularly replace characters by entities and a different group regularly replace entities by characters. That sort of thing just clutters watchlists for no good reason. So I'd rather either see a very clear specification of which things should be expanded and which should be left as unicode (probably difficult to attain consensus for) or (more likely) something like WP:RETAIN where edits of this type are discouraged. —David Eppstein (talk) 21:32, 14 July 2018 (UTC)[reply]
    Absolutely agree. A hard-won consensus in advance will consume 1/1000 the editor time and energy wasted on a zillion skirmishes and rage-reverts all over the project. And certainly some part of that consensus might be that some things come under RETAIN (though honestly the less RETAIN stuff we have the better). EEng 21:35, 14 July 2018 (UTC)[reply]
    An explicit list would be great for me, since I have to code that into software anyway. I'll whip up a table. FTR, as of April there were a grand total of 7 numerical references the moss software could find, and I changed all of them just now. -- Beland (talk) 01:15, 15 July 2018 (UTC)[reply]
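The numeric-to-named substitution mentioned above could look roughly like the following sketch. It is not the moss code: it assumes Python's `html.entities.codepoint2name` table (HTML 4 names only) as the source of names, and the function name `numeric_to_named` is invented for illustration. Any real bot run would still need the approval discussed earlier in this thread.

```python
import re
from html.entities import codepoint2name  # HTML 4 named entities only

# Matches decimal (&#8722;) and hexadecimal (&#x20AC;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def numeric_to_named(text):
    """Replace numeric references with named ones where a name exists."""
    def swap(match):
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        name = codepoint2name.get(codepoint)
        # Leave the reference untouched when no HTML 4 name is known.
        return f"&{name};" if name else match.group(0)
    return _NUMERIC_REF.sub(swap, text)

print(numeric_to_named("5 &#8722; 3 costs &#x20AC;2 &#91;sic&#93;"))
```

References without an HTML 4 name, such as &#91; and &#93;, pass through unchanged, so the sketch never invents a name.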

The proposal should be revised to make it clear how it relates to the advice already in the MOS at WP:MOS#Keep markup simple,

An HTML character entity is sometimes better than the equivalent Unicode character, which may be difficult to identify in edit mode; for example, &Alpha; is explicit whereas Α (the upper-case form of Greek α) may be misidentified as the Latin A.

Also the proposal should indicate where this addition would go into the MOS; context matters.

The proposal contains the statement "The web site's editing pages have built-in special character support to make it easy to input characters not typically found on keyboards." That's only partially true; in the version I use, there are a variety of special characters to choose from, but when I hover over them, there isn't any little hint that pops up telling me what the name of the character is. So it is hard to be sure if a character is an n dash or a minus. In another case, it's hard to tell a prime from an apostrophe. I've learned to tell an n dash from a hyphen, but I'll bet there's lots of editors who can't. Jc3s5h (talk) 22:18, 14 July 2018 (UTC)[reply]

Hmm, thumbnails for special characters would make a great feature improvement for the web UI. I agree it's a bit of a pain; I always have to paste characters into a search engine to figure out what they are. If we're making a big table of what should be which, maybe it would need to be on its own subpage? I'm agnostic as to where this goes, and I'm open to suggestions; I don't think it matters as long as it's easy to find. -- Beland (talk) 01:15, 15 July 2018 (UTC)[reply]
FTR, I have filed a feature request for the popup text to include the character name at [1] for anyone who wants to comment or follow along at home. Thanks for the suggestion! -- Beland (talk) 06:57, 16 July 2018 (UTC)[reply]
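Until the editing UI shows character names, one way to identify a lookalike character without pasting it into a search engine is to ask for its Unicode name. This is a small sketch using Python's standard `unicodedata` module; the helper name `describe` is hypothetical.

```python
import unicodedata

def describe(chars):
    """Return (code point, Unicode name) pairs for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>")) for c in chars]

# Hyphen-minus, en dash, em dash, minus sign, apostrophe, prime.
for code, name in describe("-\u2013\u2014\u2212'\u2032"):
    print(code, name)
```

The names make the distinction explicit even when the glyphs are indistinguishable in the edit window's font.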

Second draft

(Edited to reflect the below discussion)

HTML character entity references are a way to tell a web browser to render a certain character without including that character in the web page directly. Characters may be referenced by name, decimal number, or hexadecimal number. For example, &euro; is the same as &#x20AC;, &#8364;, or including the character € directly. For a comprehensive list, see List of XML and HTML character entity references [2].

In choosing between the numeric reference, named reference, and direct character methods, Wikipedia never uses the numeric reference when a named reference is available, and it usually prefers direct character input over named references (and edits in this direction are made by semi-automated systems like AutoWikiBrowser). For example, &minus; should be used instead of &#8722;, and é should be used instead of &eacute;. Wikipedia stores articles with Unicode, so any character that could possibly be referenced can also be input directly. The web site's editing pages have built-in special character support to make it easy to input characters not typically found on keyboards. Editors can also use the Unicode input method provided by their operating system. There are some exceptions where named references are preferred, to avoid confusion and to circumvent technical limitations. The <nowiki> tag can also be used instead of character escaping to prevent interpretation of special characters as wiki markup. These preferences are detailed in the table below, and some instances where a given character is preferably not used at all (except where that character is itself the topic of discussion) are noted. Wikipedia editors are encouraged to follow these guidelines to make it easier for editors to read and understand wikitext, especially those not familiar with HTML notation.

| Category | Preferred forms | Exceptions and notes |
| --- | --- | --- |
| ASCII characters | ! " % & ' + < = > [ ] | Sometimes proximity to other characters causes misinterpretation of &, <, >, [, ], or ' as part of HTML markup or wiki markup. In these cases, use &amp;, &lt;, &gt;, &#91;, &#93; or &apos;. |
| Latin and Germanic letters | À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ Œ œ Š š Ÿ | Instead of ligatures (Æ, æ, Œ, œ) write two separate letters, except in proper names and in text in languages in which they are standard – see Wikipedia:Manual of Style § Ligatures. |
| Greek letters | Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ ς σ τ υ φ χ ψ ω ϑ ϒ ϖ | When written standalone (not part of a Greek word with other Greek characters), the following can be used to reduce confusion with similar-looking Latin alphabet letters: &Alpha; &Beta; &Epsilon; &Zeta; &Eta; &Iota; &Kappa; &Mu; &Nu; &Omicron; &Rho; &Tau; &Upsilon; &Chi; &kappa; &omicron; &rho;. μ (mu) and Σ (sigma) are nearly identical to µ (micro) and ∑ (sum), but the other characters are not used in Wikipedia so there is no potential for confusion. |
| Quote marks | &lsquo; &rsquo; &sbquo; &ldquo; &rdquo; &bdquo; &acute; &prime; &Prime; | ASCII quote marks are generally preferred. Wikipedia:Manual of Style/Dates and numbers § Specific units says not to use &prime; and &Prime; for inches and feet. |
| Dashes | –/&ndash; —/&mdash; &horbar; &shy; | &horbar; is not used by Wikipedia. For more info on &shy; (optional hyphen) see MOS:SHY. |
| Whitespace and non-printing | &nbsp; &ensp; &emsp; &thinsp; &zwnj; &zwj; &lrm; &rlm; | &ensp;, &emsp;, &zwnj;, and &zwj; are generally unnecessary. For more info on text direction, see MOS:RTL. |
| Math | × ÷ √ ∝ ∝ ¬ ± ∂ ∇ ℵ ℜ ℑ ℘ ∀ ∃ ∈ ∉ ∋ ∅ ∏ ∑ ∠ &and; (∧ confused with ^) &or; (∨ confused with v) ∩ ∪ ∫ ∴ ∼ ≅ ≈ ≠ ≡ ≤ ≥ ⊂ ⊃ ⊄ ⊆ ⊇ ⊕ ⊗ ⊥ ⌈ ⌉ ⌊ ⌋ &lang; (⟨ confused with <) &rang; (⟩ confused with >) | In some cases TeX markup is preferred to Unicode characters; see Wikipedia:Manual of Style/Mathematics § Typesetting of mathematical formulae. × (&times;) is used in article titles and also for hybrid species. ∑ (sum) should not be used; Wikipedia uses the nearly identical Σ (sigma). |
| Currency | ¢ £ ¤ ¥ € $ | |
| Non-English punctuation | ¿ ¡ « » &lsaquo; &rsaquo; | &lsaquo; and &rsaquo; are not used by Wikipedia; < and > can be used instead. |
| Dots | &middot; &bull; &sdot; | "..." is preferred to "…" - see MOS:ELLIPSIS. Wiki markup should be used instead of these for lists; see Wikipedia:Manual of Style/Lists § List layout. |
| Diacritics | ¨ ¸ ‾ ˜ ˆ | |
| Arrows | ← ↑ → ↓ ↔ ↵ ⇐ ⇑ ⇒ ⇓ ⇔ | |
| Other symbols | ¦ § © ® ™ ° µ ¶ † ‡ ƒ ‰ ◊ ♠ ♣ ♥ ♦ | µ (micro) is not used by Wikipedia; use μ (lowercase Greek letter mu) instead - see Wikipedia:Manual of Style/Dates and numbers § Specific units |
| Superscript and subscript | ¹ ² ³ ª º | Do not use Unicode subscripts and superscripts like these for numbers, per Wikipedia:Manual of Style/Superscripts and subscripts; use <sup> and <sub> instead. |
| Fractions | ¼ ½ ¾ &frasl; | These are not used unless discussing the characters themselves; for alternatives, see Wikipedia:Manual of Style/Dates and numbers § Fractions and ratios |


Above is a draft of a definitive list of whether the HTML reference or the character itself should be used, as suggested by other editors above. I noticed a few things:

  • Both the characters and the references are widely used for endash and emdash; allow both for now?
  • mu and micro are rarely if ever used in the same context; the direct form seems preferable? Same for sum and sigma?
  • ∼ (&sim;) and ~ (ASCII tilde) seem to be used interchangeably but &sim; itself is used very rarely.

-- Beland (talk) 08:12, 15 July 2018 (UTC)[reply]

  • usually prefers direct character input over named references – That's too sweeping. I can see this is gonna take a lot of discussion. For starters, pinging David Eppstein for his thoughts on literal or symbolic for math symbols (not meaning to imply there's one simple answer to that). Not pinging SM because he'll find his way here without doubt and his user name is too hard to get right and it's late and I'm tired. EEng 08:32, 15 July 2018 (UTC)[reply]
    • I think it's very important to spell out &minus; as otherwise it's too difficult to distinguish from &ndash;. Otherwise I don't feel strongly but I know I have seen legions of random AWB users replace &times; (e.g.) by its unicode character. So we should not encourage replacements that go the other way. —David Eppstein (talk) 16:30, 15 July 2018 (UTC)[reply]
    • @EEng: Well, if I'm counting right, out of the 252 named references, in 28 instances (11.1%), the proposal is recommending to use the reference over the character itself, and in 27 instances (10.7%) it's either not making a recommendation or different options are used in different circumstances. That leaves 78.2% of the time where the character itself is being recommended over the named reference. That seems to qualify as "usually"; am I missing something? -- Beland (talk) 21:55, 15 July 2018 (UTC)[reply]
      You're counting entries in the table; I'm counting occurrences in the wild i.e. I'd wager that the population of ndash + mdash in articles is greater than that of all those other characters put together, and those two should always be coded by name or template, IMHO. EEng 02:37, 16 July 2018 (UTC)[reply]
      @EEng: Ah, would it make more sense to say "for most characters prefers" rather than "usually prefers"? -- Beland (talk) 02:42, 16 July 2018 (UTC)[reply]
      At this point I don't know if anything needs to be said at all. I'm a bit unclear about something. Right now much or most of this advice, to the extent it's somewhere in MOS, is distributed among the various relevant sections. You're not proposing to insert this giant table somewhere, are you? Because then it will be in two places which will need to be kept in sync. EEng 03:36, 16 July 2018 (UTC)[reply]
  • WP:MOSNUM always uses the Greek letter mu or the html entity &mu; as the metric prefix for micro. I know some Unicode characters were created for obscure reasons such that Wikipedia has no interest in using those characters; I infer from its low numerical code value that &micro; (U+00B5, µ) exists as a way of coding the micro symbol that was used in some pre-Unicode character codes that didn't provide for most Greek letters, to permit round-tripping between those older character codes and Unicode. According to the Unicode Consortium, the Greek letter character is preferred.[1] Maybe use the Greek letter mu directly, whether in a Greek word, the archaic stand-alone symbol for micrometer, or the metric prefix, and explicitly encourage editors to replace µ (U+00B5) with μ (U+03BC). Jc3s5h (talk) 10:31, 15 July 2018 (UTC)[reply]
  • As a comment, is convenient in templates when you want a whitespace. --Izno (talk) 21:58, 15 July 2018 (UTC)[reply]
    • Ah, this points out to me that the regular space (which is U+0020, decimal 32) actually doesn't have a named reference, so it probably doesn't belong on this chart.
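The suggested replacement of the micro sign with the Greek letter mu can be sketched as below. This is only an illustration using Python's standard `unicodedata` module; the constant names and the `micro_to_mu` helper are hypothetical, and nothing here is an existing bot or moss rule.

```python
import unicodedata

MICRO_SIGN = "\u00b5"  # µ, the legacy compatibility character
GREEK_MU = "\u03bc"    # μ, the Greek small letter mu

def micro_to_mu(text):
    """Replace every micro sign with the Greek letter mu."""
    return text.replace(MICRO_SIGN, GREEK_MU)

print(unicodedata.name(MICRO_SIGN))  # MICRO SIGN
print(unicodedata.name(GREEK_MU))    # GREEK SMALL LETTER MU

# Unicode compatibility normalization (NFKC) performs the same mapping.
print(unicodedata.normalize("NFKC", MICRO_SIGN) == GREEK_MU)
```

A targeted `replace` is used here, not whole-text NFKC normalization, because NFKC also rewrites many unrelated characters (superscripts, ligatures, and so on).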

References

  1. ^ Beeton, Barbara; Freytag, Asmus; Sargent, Murray III (30 May 2017). "Unicode® Technical Report #25". Unicode Technical Reports. Unicode Consortium. p. 11.

EEng made a good find, that &dollar; was missing. It turns out that this is because List of XML and HTML character entity references only goes up to HTML 4, and HTML 5 has a ton more, listed here. Given the length of the resulting table if we include all of them, maybe we should just say "use the character itself except for those listed below" and list the ones where named references should be used? (And maybe continue to list the characters that should not be used at all?) -- Beland (talk) 03:53, 16 July 2018 (UTC)[reply]

I still don't understand why, to a first approximation, we're not saying that everything other than a-zA-Z0-9`~!@#$%^&*()-_=+[]{};':",./<>? should be given via &foo; or {some template}. Also, the table mixes advice on how to express various characters with advice on whether and when to use various characters. Not saying that's bad, just worth noting. EEng 04:12, 16 July 2018 (UTC)[reply]
I think accented Roman letters should certainly be written as e.g. á not &aacute;. More generally I am in favor of using unicodes over html entities or templates in most cases, with exceptions for characters like &amp; (when written next to something that would cause it to expand to a different entity) or &minus; (because there is too much possibility for confusion with other dash-like characters). Also, as an aside, the text above about avoiding ligatures is too strong; when these characters occur in the standard spelling of a name (e.g.), we should write them that way even when we are writing in English. —David Eppstein (talk) 04:25, 16 July 2018 (UTC)[reply]
Re accented Romans, I did say "to a first approximation". Re ligatures, the text says "except proper names" -- is that not enough? EEng 05:05, 16 July 2018 (UTC)[reply]
I did a quick database check, and as of April 2018, – is more popular than &ndash; by a ratio of about 10.6:1.
My thought on combining "how" and "whether" is that it's entirely likely the answer to the question "how do I put this character into Wikipedia?" is "please don't, use this other one", so having it all in one place is handy. -- Beland (talk) 05:28, 16 July 2018 (UTC)[reply]
The fact that literal ndash is 10X as common as symbolic just shows how much work we have to do -- in my edit window it's very hard to tell ndash from hyphen or mdash unless they're next to each other. I'm fine with combining both kinds of advice, though (again) I'm not sure exactly where this big table is gonna go. EEng 06:38, 16 July 2018 (UTC)[reply]
Well, you were using your guess that the numbers were the other way around as an argument for a wording change. The current preponderance might be evidence that most editors prefer the raw characters, or maybe it's just what people do because the UI is designed to encourage that. The fact that the UI is the way that it is may be an indication that there is not great support for using &ndash; and friends. I can generally tell the difference between dashes of different lengths, though if some people can't, that may be an indication that it just doesn't matter that much. In any case, given the lack of consensus on this, the current proposal is to remain neutral on the choice for ndash and mdash, and let editors decide on a page-by-page basis. In contrast, for other characters like ∀ and °, which can be clearly distinguished by everyone, I haven't heard a good argument for why those shouldn't just be used directly. -- Beland (talk) 23:26, 17 July 2018 (UTC)[reply]
  • Well, you were using your guess that the numbers were the other way around – No, you're mixing up two different things. I conjectured that ndashes and mdashes, together, make up the bulk (counting each use separately) of all these not-on-the-keyboard characters; that was without regard to how those characters were expressed (literal vs. symbolic).
  • that the UI is the way that it is may be an indication that there is not great support for using &ndash; and friends – WP's facilities and interfaces are full of debris that's little used or even "impossible" to use (e.g. template parameters that want to present information that an RfC has determined should never be presented). Trying to infer how things are spozed to be based on things you see in the UI will get you way off track very, very fast.
  • I can generally tell the difference between dashes of different lengths – So can I, easily in the rendered page, but in the wikisource only with a bit of effort, if I make a point of looking. It's that last bit that's the rub: in the rendered page an ndash vs. mdash look like – vs. —, but in the wikisource they're much more similar i.e. – vs. —. (What you see in that sentence may depend on your skin, so your mileage may vary.) Thus it's easy in copyediting to not notice that the wrong one is present, and that's why symbolic names should be used instead. (If we really cared we'd suggest that hyphens be rendered as &hyp; as well. I actually tried that once in an article but got laughed off the stage, so we'll just have to live with using the literal -. What I usually do is when I see e.g. a date range like 1899-1920, I just change the literal hypheny-dashy thing that's there to &ndash;, so that I know it's the right thing.)
  • I haven't heard a good argument for why those shouldn't just be used directly – Clearly a quotation in a language using a non-Roman script should just present that text literally. For everything else, there are a lot of pros and cons relating to how many different special symbols are used (in a given article), the extent to which each one is used repeatedly, how potentially confuse-able they are for one another or for something else not even used on the page, the likely sophistication of editors who might work on the article, and a lot more. Here's a random example: WP:MOSNUM says arcminutes should be denoted by a prime and not an apostrophe or a single quote i.e. ′ but not ‘ or ' . Once again, you have to be looking to notice if the wrong one is there; thus MOSNUM suggests that the markup &prime; be used to save editors squinting. Unfortunately different considerations come into play for different symbols, so separate analyses are needed in each case. That's why I predicted this discussion would take a long time.
EEng 03:32, 18 July 2018 (UTC)[reply]
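
For readers who want to see the look-alike problem concretely, here is a minimal Python sketch (an illustration only, not a tool anyone in this thread has proposed). It labels each dash-like or quote-like character in a piece of wikitext with its Unicode name. The character set and the function name `label_lookalikes` are assumptions chosen for the example.

```python
import unicodedata

# Characters this discussion singles out as easy to mix up in the edit window.
# The selection is an assumption made for illustration; extend it as needed.
LOOKALIKES = {
    "-",        # hyphen-minus (the keyboard character)
    "\u2013",   # en dash
    "\u2014",   # em dash
    "\u2212",   # minus sign
    "'",        # apostrophe
    "\u2018",   # left single quotation mark
    "\u2019",   # right single quotation mark
    "\u2032",   # prime
}


def label_lookalikes(wikitext):
    """Return (offset, code point, Unicode name) for each look-alike character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(wikitext)
        if ch in LOOKALIKES
    ]


# A date range typed with a hyphen, then the same range with an en dash.
for hit in label_lookalikes("1899-1920 and 1899\u20131920"):
    print(hit)
```

The two ranges look nearly identical in a monospaced edit window, but the reported names differ, which is the distinction the symbolic forms are meant to make visible.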

As for the general direction of the advice, using characters directly seems to be the recommended best practice for web development generally. It's more WYSIWYG and easier for web editors to read and think about. It also fits the goal of not forcing editors to learn HTML in order to be able to use Wikipedia; they can just input and edit these characters in the same way they do elsewhere like Word or phone apps or other web sites. We also have a UI right below the text-being-edited box which encourages people to add the characters directly; it would be weird if the advice is to generally use the references because that's not what the system is designed to encourage. The escaping system was originally designed to allow input of special characters that were part of SGML or HTML itself (like angle brackets). Later it became a way to work around the limitations of ASCII. But modern web sites all use Unicode now, as does Wikipedia, so it's a bit of an obsolete workaround. I think any system where you have to learn a special language for telling a computer something is less user-friendly than a system where you can express your intention in the way you would express it to other humans. -- Beland (talk) 06:29, 16 July 2018 (UTC)[reply]

I think we should treat it like citations: citations are hard, both inside Wikipedia and outside. Just see what happens in any university freshman humanities class where citation expectations are rigorously enforced for the first time in most students' lives. So at Wikipedia we're satisfied if the first editor gives some way to find the source; gnomes can improve the citation format later. And the tools to do the improvement exist.
Similarly, editors who are not skilled with markup can do the best they can with the visual editor and other editors can improve it. The editors who make the improvements need the tools to do so, and bots must not overrule their contributions by converting html entities to characters.
The idea that you can write documents and web pages with purely WYSIWYG tools is only true if you're writing something simple, or you're a slob. That's why Microsoft Word has a little paragraph symbol so you can turn on the display of paragraph marks. That's why WordPress has two editing tabs, WYSIWYG view, and HTML view. The Wikipedia editors are quite primitive, hence the need for HTML entities continues. Jc3s5h (talk) 10:54, 16 July 2018 (UTC)[reply]
I agree contributions of new editors should be welcomed whether or not they follow this sort of guideline; I added language to that effect in the draft. -- Beland (talk) 23:26, 17 July 2018 (UTC)[reply]

General comment This discussion may affect WP:CHECKWIKI error 11. The error is currently deactivated. -- 11:10, 16 July 2018 (UTC)

  • A couple of quick responses:
    1. Wrap the table's characters-as-such, not just the HTML character entities, with <code>...</code> or perhaps with {{kbd}}, whatever looks better (semantically, it can be either – it's code when viewed in the wikitext but also input when you're entering it). If we don't like any of the faint-background effects, use bare <kbd>...</kbd>, which just uses monospace. I would go with <code> because the table already uses a light grey and it blends in well, while also not requiring any template calls.
    2. That for which we're providing entity codes should also be shown as characters.
    3. That for which we're showing characters but recommending/allowing entity codes should also be shown as those codes.
    4. "ASCII characters": Present the characters in the same order as the codes in the later column.
    5. "Greek letters": Change "but the other characters are not used" to "but these latter two characters are not used".
    6. "Dashes": This is a misuse of the slash character and results in confusing typographical gibberish: "–/&ndash; —/&mdash;". Try: "– (&ndash;), — (&mdash;),". Also, "For more info on ­ (optional hyphen) see MOS:SHY" is a misuse of parentheses (round brackets), seemingly for some kind of emphasis. Should just remove them.
    7. "Whitespace and non-printing": should also include &hairsp;; like &thinsp; it is generally only used for kerning in templates and such; there is usually not any reason to manually insert either into an article.
    8. "&lsaquo; and &rsaquo; are not used by Wikipedia; < and > can be used instead" is wrong; they are not the same character and should not be confused. If we need to illustrate French quotation style, etc., use the correct characters, not less-than and greater-than, which serve an entirely different purpose. This is pretty much exactly like hyphen vs. dash vs. minus.
 — SMcCandlish ¢ 😼  07:28, 17 July 2018 (UTC)[reply]
The weird "shy" line was due to a typo preventing &shy; from showing up at all. I fixed that. You're right about lsaquo; I must have messed up something when scanning the database for it. I'll change that and other points you mention in the next draft, as applicable. Thanks for reading! -- Beland (talk) 00:37, 18 July 2018 (UTC)[reply]

Third draft

Posted to Wikipedia:Manual_of_Style/Text_formatting#HTML_character_entity_references

Proposed as new subsection titled "HTML character entity references" under Wikipedia:Manual of Style § Miscellaneous, replacing the second paragraph of "Keep markup simple".

HTML character entity references are a way to tell a web browser to render a certain character without including that character in the web page directly. Characters may be referenced by name, decimal number, or hexadecimal number. For example, &euro; is the same as &#x20AC;, &#8364;, or including the character directly.

On Wikipedia, characters should be used directly unless doing so is confusing for editors or causes technical problems. Numerical references should not be used if a named reference is available. For example, &minus; should be used instead of &#8722;, and é should be used instead of &eacute;. Edits favoring these conventions are made by semi-automated systems like AutoWikiBrowser. For a comprehensive list of available named references, see [3].

Wikipedia stores articles with Unicode, so any character that could possibly be referenced can also be input directly. The web site's editing pages have built-in special character support to make it easy to input characters not typically found on keyboards. Editors can also use the Unicode input method provided by their operating system. There are some exceptions where named references are preferred, to avoid confusion and to circumvent technical limitations. The <nowiki> tag can also be used instead of character escaping to prevent interpretation of special characters as wiki markup.
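
As an illustration of the equivalence described in the draft text above, here is a short Python sketch using only the standard library. It is not part of the proposed guideline. The helper `preferred_reference` is a hypothetical name, and it consults Python's HTML 4 name table, which may not match the set of names MediaWiki accepts.

```python
import html
from html.entities import codepoint2name  # HTML 4 names only

# Named, hexadecimal, decimal, and direct forms of the same character.
forms = ["&euro;", "&#x20AC;", "&#8364;", "\u20ac"]
decoded = {html.unescape(form) for form in forms}
print(decoded)  # a one-element set: every form decodes to the euro sign


def preferred_reference(codepoint):
    """Return a named reference when one exists, otherwise a decimal one.

    Hypothetical helper mirroring the draft's rule that numerical
    references should not be used if a named reference is available.
    """
    name = codepoint2name.get(codepoint)
    return f"&{name};" if name else f"&#{codepoint};"


print(preferred_reference(8722))  # the minus sign has a named reference
```

The sketch only covers the choice between named and numeric references; whether to use a reference at all, rather than the character itself, is the separate question the tables below address.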

Characters to avoid |
Avoid Instead use Note
… (&hellip;) ... (i.e. 3 periods) See MOS:ELLIPSIS.
Unicode Roman numerals like Ⅰ Ⅱ ⅰ ⅱ Latin letters equivalent (I II i ii) MOS:ROMANNUM
Unicode fractions like ¼ ½ ¾ &frasl; {{frac}}, {{sfrac}} See MOS:FRAC.
Unicode subscripts and superscripts like ¹ <sup></sup> <sub></sub> See WP:SUPSCRIPT. In article titles, use {{DISPLAYTITLE:...}} combined with <sup></sup> or <sub></sub> as appropriate.
µ (&micro;) μ (&mu;) See MOS:NUM#Specific units
Ligatures like Æ æ Œ œ Separate letters (AE ae OE oe) Generally avoid except in proper names and text in languages in which they are standard. See MOS:LIGATURES.
∑ (&sum;) ∏ (&#8719;) ― (&horbar;) Σ (&Sigma;) Π (&Pi;) — (&mdash;) (Not to be confused with \sum and \prod, which are used within <math> blocks.)
‘ (&lsquo;) ’ (&rsquo;) ‚ (&sbquo;) “ (&ldquo;) ” (&rdquo;) „ (&bdquo;) ´ (&acute;) ′ (&prime;) ″ (&Prime;) ` (&#96;) Straight quotes (" and ') Use {{coord}}, {{prime}} and {{pprime}} for mathematical notation; elsewhere use straight quotes unless discussing the characters themselves. See MOS:QUOTEMARKS.
‹ (&lsaquo;) › (&rsaquo;) « (&laquo;) » (&raquo;) Use &lang; and &rang; for math notation. In non-English quotations normalize angle quote marks to straight, per MOS:CONFORM, except where internal to non-English text, per MOS:STRAIGHT.
&ensp; &emsp; &thinsp; &hairsp; Normal space These are sometimes used for precision positioning in templates but rarely in prose, where non-breaking (&nbsp;) and regular spaces are normally sufficient. Exceptions: MOS:ACRO, MOS:NBSP.
In vertical lists

• (&bull;) · (&middot;) ⋅ (&sdot;)

* Proper wiki markup should be used to create vertical lists. See HELP:LIST#List basics.
&zwj; &zwnj; see note Used in certain non-English language words, see zero-width joiner/zero-width non-joiner. Should be avoided elsewhere.
£ for GBP, keep ₤ for Italian Lira and other lira currencies that use ₤ (see the main article for that currency) MOS:CURRENCY; find broken instances
Potentially confusing or technically problematic characters |
Category coded form (direct form) Notes
Miscellany &amp; (&) &lt; (<) &gt; (>) &#91; ([) &#93; (]) &apos; (') &#124; (|) Use these characters directly in general, unless they interfere with HTML or wiki markup. Apostrophes and pipe symbols can alternatively be coded with {{'}} and {{!}} or {{pipe}}. See also character-substitution templates and WP:ENCODE.
Greek letters &Alpha; (Α) &Beta; (Β) &Epsilon; (Ε) &Zeta; (Ζ) &Eta; (Η) &Iota; (Ι) &Kappa; (Κ) &Mu; (Μ) &Nu; (Ν) &Omicron; (Ο) &Rho; (Ρ) &Tau; (Τ) &Upsilon; (Υ) &Chi; (Χ) &kappa; (κ) &omicron; (ο) &rho; (ρ) In isolation, use coded forms to avoid confusion with similar-looking Latin letters; in a Greek word or text, use the direct characters.
Quotes &lsquo; (‘) &rsquo; (’) &sbquo; (‚) &ldquo; (“) &rdquo; (”) &bdquo; („) &acute; (´) &prime; (′) &Prime; (″) &#96; (`) Can be confused with straight quotes (" and '), commas, and with one another. MOS:STRAIGHT generally requires conversion to straight quotes, except when discussing the characters themselves or sometimes with non-English languages. See next row for prime characters.
Apostrophe-like ' ` ´ ʻ ʼ ʽ ʾ ʼ ʽ ʻ ʼ
Dashes, minuses, hyphens &ndash; (–) &mdash; (—) &minus; (−) - (hyphen) &shy; (soft hyphen) Can be confused with one another. For dashes and minuses, both forms are used (as well as {{endash}} and {{emdash}}). Soft hyphens should always be coded with the HTML entity or template. Plain hyphens are usually direct, though at times {{hyphen}} may be preferable (e.g. Help:CS1#Pages). See MOS:DASH, MOS:SHY, and MOS:MINUS for guidelines.
Whitespace &nbsp; &emsp; &ensp; &thinsp; &hairsp; &zwj; &zwnj; In direct form these are nearly impossible to distinguish from a normal space. See also MOS:NBSP.
Non-printing &lrm; &rlm; In direct form these are nearly impossible to identify. See MOS:RTL.
Mathematics-related &and; () &or; () &lang; () &rang; () Can be confused with x ^ v < >. In some cases TeX markup is preferred to Unicode characters; see MOS:FORMULA. Use {{angbr}} instead of ) / ()
Dots &sdot; (⋅) &middot; (·) &bull; (•) Can be confused with one another. Interpuncts (&middot;) are common in horizontal lists and to indicate syllables in words. Multiplication dots (&sdot;) are used for math. In practice, the dots are used directly instead of the HTML entities.
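
To make the "Greek letters" row of the table concrete, here is a rough Python sketch of how an isolated look-alike letter could be mapped to its coded form. The list of names is copied from that row; the function `coded_form` and the decision to leave every other character untouched are assumptions for illustration, not something the draft specifies.

```python
from html.entities import name2codepoint

# Entity names taken from the "Greek letters" row of the draft table.
CONFUSABLE_GREEK_NAMES = [
    "Alpha", "Beta", "Epsilon", "Zeta", "Eta", "Iota", "Kappa", "Mu", "Nu",
    "Omicron", "Rho", "Tau", "Upsilon", "Chi", "kappa", "omicron", "rho",
]

# Map each Greek character to its named reference, e.g. U+0391 -> "&Alpha;".
CODED = {chr(name2codepoint[name]): f"&{name};" for name in CONFUSABLE_GREEK_NAMES}


def coded_form(char):
    """Return the named reference for a look-alike Greek letter, else the character."""
    return CODED.get(char, char)


print(coded_form("\u0391"))  # Greek capital alpha, which resembles Latin A
print(coded_form("A"))       # Latin A is returned unchanged
```

Deciding whether a letter is "in isolation" or part of a Greek word needs surrounding context, which this sketch deliberately does not attempt.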

FTR, as of the July 1, 2018 database dump, &lsqb; is used about 329 times and &lbracket; is used about 91 times, so I picked the more common one. -- Beland (talk) 15:04, 18 July 2018 (UTC)[reply]

  • While I still have my reservations about where this is going and the amount of effort it will take to iron all the bugs out, I'm warming up to this. EEng 15:35, 18 July 2018 (UTC)[reply]
  • The table asserts the &Prime; html entity resembles the ASCII backtick (`), and even has something displayed that looks like a backtick. But this is the real result of the &Prime; html entity: ″. The table is just a mass of stuff and I wouldn't be able to find anything in there to make corrections. Jc3s5h (talk) 16:46, 18 July 2018 (UTC)[reply]
    • @Jc3s5h: Sorry, the backtick was missing from the second table; I just fixed that. It was rather exhausting to catalog everything and try to format it properly, so I didn't get a chance to double-check things. You're right about it being hard to read, so I also put each character in the second table on its own line, to make matching up characters and references easier. Is that clear enough now? Is it making the table too long? -- Beland (talk) 23:21, 18 July 2018 (UTC)[reply]
      • In the table, as rendered, &Prime; appears twice. Each time the character next to it is `, which is U+0060 and is named GRAVE ACCENT. But this is wrong; it should look like a double prime and is U+2033. It is used to mark seconds of time or seconds of arc; a backtick is completely wrong for that. Jc3s5h (talk) 00:04, 19 July 2018 (UTC)[reply]
Mostly looking good. I would put this at the bottom of MOS:TEXT, probably. Maybe in a section called "Unicode characters". We could see about cross-referencing it in various places.  — SMcCandlish ¢ 😼  02:07, 19 July 2018 (UTC)[reply]
Gave the boxes a spinshine/reorganization. Headbomb {t · c · p · b} 18:29, 19 July 2018 (UTC)[reply]

I posted this to Wikipedia:Manual_of_Style/Text_formatting#HTML_character_entity_references (there's another section there that talks about Unicode PUA and RTL characters) and cross-referenced from Wikipedia:Manual of Style § Miscellaneous. Feel free to edit the live version as needed. -- Beland (talk) 05:56, 20 July 2018 (UTC)[reply]

And thanks to everyone for greatly improving this section from the initial draft! It will be a great help to me in writing the code that will flag less-than-clear usage. -- Beland (talk) 05:57, 20 July 2018 (UTC)[reply]

Might be worth adding a comment in the Greek notes that the same sort of thing applies to Cyrillic letters that look like Latin and Greek ones; use the entity codes for clarity when discussing particular characters, but use the Unicode in actual Russian, Ukrainian, etc. words. We probably needn't dwell on the details, since there's another proposal open for centralizing all the scattered Cyrillic-related material to one page. Then again, that's mostly to be about transliteration, so maybe the Greek section in the table should be Greek and Cyrillic?  — SMcCandlish ¢ 😼  04:11, 22 July 2018 (UTC)[reply]

Instances of character references for Cyrillic letters seem to be relatively rare. I don't see any on a casual skim through this report, though I'd have to go through the entire alphabet to definitively say they are never used. Unlike Greek letters, they aren't in common use for scientific and mathematical purposes. I think it would be simpler and probably more user-friendly just to say to use the Cyrillic characters directly, which is what the draft is currently proposing. -- Beland (talk) 07:58, 22 July 2018 (UTC)[reply]

Reversion of addition of third draft

So after I posted the tables proposed above, David Eppstein reverted, with the edit summary "what part of "I think you should be more patient"..."Try proposing something narrower and more specific" do you not understand?".

I think I did not see those remarks by David Eppstein and SMcCandlish because they were posted in the discussion ("Fraction slash" below) about the "Slashes" section of the main MOS page, which I did not check for comments before updating the "Text formatting" MOS subpage. SMcCandlish wanted a one-word change to the "Slashes" section, which he implemented. I think David Eppstein was commenting on the change he reverted, as he then wrote:

I'm not convinced that the html section is needed at all. It is more material for a guidebook on html than style guidance for Wikipedia editors. And you appear to have the purpose of using the new section as a bludgeon to begin a massive project of automatically reformatting characters in Wikipedia, which I think is a bad idea (watchlist clutter for no visible change to articles).

"Bludgeon" sounds pretty ugly and mean. I started a project to spell-check all Wikipedia, which is intended to improve its readability and credibility. Along the way I noticed that editors have also occasionally misspelled HTML character entity references. I thought as long as we're cleaning up the misspellings, we might as well clean up any undesirable forms, because right now we don't seem to be representing them consistently. I started this discussion because I couldn't find any guidance in the Manual of Style to help me write the code to correctly flag undesirable forms vs. ignore desirable forms.
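
A rough sketch of what flagging misspelled named references could look like follows. This is not Beland's actual code. It assumes that the HTML5 name list shipped with Python's standard library is an acceptable stand-in for whatever MediaWiki recognizes, and the function name `unknown_named_references` is made up for the example.

```python
import re
from html.entities import html5  # keys look like "ndash;" (with the semicolon)

# Matches references of the form &name; and captures the name.
NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def unknown_named_references(wikitext):
    """Return names used as &name; that are not in the HTML5 entity list."""
    return [
        match.group(1)
        for match in NAMED_REF.finditer(wikitext)
        if match.group(1) + ";" not in html5
    ]


# "ndahs" is a deliberate misspelling of "ndash".
print(unknown_named_references("S&atilde;o Paulo, 1899&ndahs;1920, &euro;5"))
```

A checker like this only reports names that are not recognized; telling desirable forms from undesirable ones still depends on the guidance under discussion here.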

Mediawiki markup uses this part of HTML syntax, and if we have a preferred form for these things we'd want to communicate that to editors, and the Manual of Style is the place to document choices of style rather than technical how-to for the benefit of editors, so I don't understand the criticism that this is not the right place for this sort of guideline. Especially since Wikipedia:Manual of Style#Keep markup simple already discusses exactly this point, and the other sections linked from the proposed tables also address which characters are preferred.

We already encourage editors to make edits that have no reader-visible changes but do have editor-visible changes intended to make wikitext easier to read and thus articles easier to edit. That's the whole point of Wikipedia:WikiProject Wikify and wikification. I do agree there are some edits that don't improve readability all that much that aren't that worthwhile on their own, like changing "==xx==" to "== xx ==". This seems less trivial than that. I'd also note we have Wikipedia:HTML5, a project which is doing nothing but replacing obsolete HTML tags with newer ones, with hopefully no user-visible changes.

There are less than 20,000 articles that even have HTML character entity references at all, less than 3.5% of all articles. Even if we changed all of them today, given the sheer volume of changes to the encyclopedia it would not be a big deal, and in reality it will probably take months or years to manually change all the instances, if that's what we want to do. At worst, editors who notice these changes happening will be educated about the desired way of doing things, and be more likely to input characters that way when adding new text.

Given that editors seem to use characters a lot more than references, and given that characters are built into the Wikipedia UI, it seems a lot less disruptive to move toward characters than away from them.

To illustrate the difference it makes to editors, consider an editor who comes across "S&atilde;o Paulo" in wikitext. To most people who are not web developers, that looks like a typographical error. Some English-speaking people might correct it to "Sao Paulo" which is often seen in English, or, getting the idea there might be an accent there, to "Sáo Paulo", which is incorrect. "São Paulo" is what Portuguese speakers are expecting to see - it's what they type with their keyboards, and it's what appears in Word docs and on the Portuguese Wikipedia and on Google Translate, and in the readable parts of other web sites. With "São Paulo", everyone knows exactly what's going on, and there's no need to waste time doing a search on the meaning of "atilde" or "&atilde" or whatnot.

If I were making the rules, I think I'd keep it simple and say to use characters directly except for otherwise invisible characters and those that cause technical problems when used directly. I'd actually be fine if we used ASCII hyphens for all of our dashes, but I'm not complaining if people who can see the difference on their monitors want to upgrade some of them to emdashes to make things look pretty as in the golden years of paper typography. That would make a much smaller table than the one proposed above, but given that other editors seem to feel more strongly about making it easy to tell the difference between certain lookalike characters, I think that table now represents a pretty good compromise. Leaving dashes and quotes as they are takes the biggest chunks of potential work off the table, anyway.

Given that this is proposing a simple general rule and then listing all the desirable exceptions to it, I'm not sure that a narrower proposal would make sense. The volume of comments has been relatively small, so having multiple discussions about the same topic would, it seems, just burn more editor time. I am, however, open to actionable suggestions. -- Beland (talk) 08:03, 22 July 2018 (UTC)[reply]

@David Eppstein: Did you have any thoughts in response? -- Beland (talk) 18:46, 23 July 2018 (UTC)[reply]
I don't think we should be setting up automatic processes that make neither a visible change to article content nor a semantic difference to the markup of the articles. And I don't think we should be prescribing such things in the MoS and by doing so encouraging such processes. —David Eppstein (talk) 18:54, 23 July 2018 (UTC)[reply]
@David Eppstein: OK, would you be happy if the guideline said that all such changes be made manually? -- Beland (talk) 20:27, 23 July 2018 (UTC)[reply]
Still not strong enough. I would prefer that such changes be made only as part of other substantive changes to articles (more or less what usually happens now with AWB users; see WP:AWBRULES #4). —David Eppstein (talk) 20:35, 23 July 2018 (UTC)[reply]
OK, I think that will lead to undesirable forms lingering around for a long time for no particularly good reason. -- Beland (talk) 20:56, 23 July 2018 (UTC)[reply]
(And I think leaving those forms around would generate higher cognitive load and more work for editors than the messages generated by removing them.) -- Beland (talk) 21:01, 23 July 2018 (UTC)[reply]
(ec with D.E.) Way TLDR. I warned you that this would take a LOT of work and patience before it would be ready to become part of MOS. Your table, without question, inadvertently treads on a lot of toes in the form of established ways various groups of editors do things in various topic areas. It would be wonderful to systematize and summarize and centralize all this but, like I said, it's gonna be a lot of work. And it's one thing to come up with a guide for future editing; it's quite a different one to use it for some mass-change project. To be blunt, if you think that "Even if we changed all of them today, given the sheer volume of changes to the encyclopedia it would not be a big deal" then there are some things you really don't understand; if you made changes like this to 3% of articles in one day, or one week, or even one month, you'd be strung up by your URLs.

I haven't been following that last week of discussion so I don't know where we are and what the open issues are, but if you want this to see the light of day you need to be prepared to keep plugging for quite some time to work through all the details with all interested parties (not that I even know how to find them). I've gone through an effort like this myself elsewhere in MOS and it can be an exhausting task, though you will be quite rightly congratulated by all in the end if you can pull it off, because it will be a very useful achievement for the project. EEng 19:05, 23 July 2018 (UTC)[reply]

What does "ec with D.E." mean? If you think I should consult more people, but don't know how to go about doing that, that's not really an actionable suggestion. -- Beland (talk) 20:25, 23 July 2018 (UTC)[reply]
It means "edit conflict"; EEng and I wrote our comments in parallel. —David Eppstein (talk) 20:36, 23 July 2018 (UTC)[reply]
@EEng: As far as I know, the only open issue is whether these improvements would justify their own systematic edits. To a large degree, this is just codifying current practice so we can clean up stragglers, so I don't expect very many objections. -- Beland (talk) 07:11, 24 July 2018 (UTC)[reply]
This will need much wider exposure before you can have that kind of confidence. EEng 07:17, 24 July 2018 (UTC)[reply]

How do other editors feel about David Eppstein's proposal for a rule that "such changes be made only as part of other substantive changes to articles"? Personally, I don't see the need for that, given the arguments I made above, but of course I'll implement whatever the consensus is. -- Beland (talk) 20:46, 23 July 2018 (UTC)[reply]

Double murders – "Murder of" or "Murders of"?

I originally posted this question on WT:TITLE, and was referred here. Should the title of an article about a double-murder be "Murder of [victims' names]" (singular) or "Murders of..." (plural)? There seems to be no consistent policy, e.g. Murder of Harry and Harriette Moore vs. Murders of Alison Parker and Adam Ward, even though in both cases the killings happened at the same time and were not a separate "event".--Muzilon (talk) 22:07, 14 July 2018 (UTC)[reply]

In my opinion, that's one of the many things that don't really need to be consistent across articles; they matter far more to some Wikipedia editors than to Wikipedia readers, even those few readers who notice the differences at all. That being the case, I'd suggest WP:RDL for such a question. Others may disagree, since there is little community agreement as to the role of MOS. ―Mandruss  01:32, 15 July 2018 (UTC)[reply]
Yeah, this isn't something MoS has a line-item about. I suspect the answer at RDL would be that it'll come down to the number of events, broadly defined. A mass murder is a single murder event. A serial-killing spree is a plural series of murders. You'll find this split pretty commonly in RS. E.g. "the murder of millions of Jews and others by the Nazis", not "murders", which kind of implies Nazis stalking around the countryside at night in hoodies to stab people from the shadows. The more usual phrase treats the entire Holocaust as a single extended mega-event.  — SMcCandlish ¢ 😼  07:11, 17 July 2018 (UTC)[reply]

Fraction slash

By my reading of Wikipedia:Manual of Style/Dates and numbers § Fractions and ratios, it looks like / (ASCII slash) is used in inline fractions instead of ⁄ (fraction slash &frasl;). This conflicts with Wikipedia:Manual of Style § Slashes which recommends &frasl; over /. The "Slashes" section also recommends {{frac}} but fails to mention all the other things that are recommended instead, like <math> and writing out in English. I think the simplest way to resolve this would be to change the "Slashes" section from:

  • in a fraction (7/8), though the "fraction slash" (7&frasl;8, producing 7⁄8) or {{frac}} template ({{frac|7|8}}, producing 7⁄8) are preferred

to:

Does that make sense?

Oh, and the two other instances of "fraction slash" in that section would just need to be changed: "(and fraction slashes)" -> "(and slashes in fractions)" and "to slash or fraction slash" -> "to slash".

-- Beland (talk) 07:11, 15 July 2018 (UTC)[reply]

  • Please don't misunderstand me because I don't want to discourage you – I think it's great that you're trying to clear this kind of markup and coding and typesetting stuff up. But you already have one thread open which – trust me – will take a LOT of editor time and attention to bring to fruition, and one such discussion at a time is enough. Let me suggest you withdraw this question for now and pick it up later. And by withdraw I mean delete the thread (including this comment of mine). Otherwise someone will inevitably post something beginning, "Well, let me just say..." and before you know it it will be a discussion after all. EEng 08:41, 15 July 2018 (UTC)[reply]
    • Well, this is a question about a conflict between the existing guides that needs to be resolved regardless of the outcome of the above discussion. I think it's also a bit orthogonal to ask how the character should be expressed vs. whether or not it should be expressed at all; some people may have feelings on this much narrower question. I'm sure I can walk and chew gum at the same time. -- Beland (talk) 21:24, 15 July 2018 (UTC)[reply]
  • I doubt you will achieve consensus on this. The usual slash is far far more convenient than anything else and there is too little benefit for the other slash. And really the <math> template should be used for anything serious; everything else like {{frac}} is a workaround for some rendering issues rather than a universal solution. —David Eppstein (talk) 16:27, 15 July 2018 (UTC)[reply]
To be more explicit: It is definitely not the case that {{frac}} is acceptable for use in mathematics articles. Maybe the standards are different for non-mathematical articles using fractional weights and measures. And your frasl example looks horrible also (the digits are too close to the slash); I think frasl is intended only for use with the raised/lowered fractions like those produced by {{frac}} that are disallowed in mathematics articles here. I think neither of these should be encouraged. —David Eppstein (talk) 23:32, 16 July 2018 (UTC)[reply]
Yes. If it were used in running text, it would need some kind of template to do kerning with CSS, and thus we have {{frac}} already.  — SMcCandlish ¢ 😼  07:06, 17 July 2018 (UTC)[reply]

Well, since I'm removing encouragement to use frasl and {{frac}}, and people don't seem to like those, it seems like there is consensus for this change. Please correct me if I'm wrong. The only remaining encouragement for either of those things is on Wikipedia:Manual of Style/Dates and numbers § Fractions and ratios which says {{frac}} is to be used in limited circumstances and discourages HTML character fractions that would use frasl. (So if you still have a problem with that guidance, the talk page for that policy page is probably the place to take it up.) -- Beland (talk) 14:57, 18 July 2018 (UTC)[reply]

You seem very aggressive in interpreting disagreement with parts of your proposal as consensus for everything else. I think you should be more patient. I have certainly not yet agreed that this is, in general, a good direction to go. —David Eppstein (talk) 01:52, 19 July 2018 (UTC)[reply]
Yes. Try proposing something narrower and more specific now.  — SMcCandlish ¢ 😼  02:02, 19 July 2018 (UTC)[reply]
@David Eppstein: @SMcCandlish: Are you objecting to something about fraction slashes or something about the HTML entities section which is being discussed above? What is it that you would want to see changed, or how would you want to see the scope narrowed? -- Beland (talk) 15:30, 20 July 2018 (UTC)[reply]
Re-reading what you originally posted, you claimed there's a conflict between two sections but this doesn't seem clear. You can write "2/3" or use {{frac|2|3}} (which uses &frasl;). If there's anything to resolve, it's probably to change Wikipedia:Manual_of_Style#Slashes to stop saying to use &frasl; inline in general; it should be / in cases like "2/3". We're only using fraction-slash in templates that super/sub script the numerals. We don't want people to use <math> markup for basic inline fractions; it's for more complicated usage. But that section is about slashes, including certain ways of doing fractions, not about fractions in general, so we don't need to go into writing out "two-thirds" or using math markup or [not] using Unicode fraction glyphs; just this: use / not ⁄ for fractions with digits, or use the frac template if you want super/sub-scripted fractions. A cross-ref to the MOS:NUM section on fractions should be sufficient otherwise. Your "though other techniques are usually preferred" isn't correct; sometimes another technique is preferred.  — SMcCandlish ¢ 😼  15:56, 20 July 2018 (UTC)[reply]
@SMcCandlish: So, the proposed change has already been made to this page. Am I reading correctly that the only modification you are requesting to that is changing "usually" to sometimes? -- Beland (talk) 01:35, 21 July 2018 (UTC)[reply]
That's the part that caught my eye; I already tweaked it on the live page.  — SMcCandlish ¢ 😼  02:24, 21 July 2018 (UTC)[reply]
I'm not convinced that the html section is needed at all. It is more material for a guidebook on html than style guidance for Wikipedia editors. And you appear to have the purpose of using the new section as a bludgeon to begin a massive project of automatically reformatting characters in Wikipedia, which I think is a bad idea (watchlist clutter for no visible change to articles). —David Eppstein (talk) 16:15, 20 July 2018 (UTC)[reply]
@David Eppstein: OK, this is not the right section for this discussion. Would you like to append this to the previous discussion, start a new section, or something else? -- Beland (talk) 01:35, 21 July 2018 (UTC)[reply]
Well, I had a long reply I didn't want to sit on any longer, so I copied the above and posted my reply in a new subsection of the "HTML entities" discussion above, "Reversion of addition of third draft". -- Beland (talk) 08:07, 22 July 2018 (UTC)[reply]
But what is "the html section"? I'm not seeing such a subsection at either of the MoS sections under discussion, nor in the wording stuff under discussion here. I strongly agree that "a massive project of automatically reformatting characters in Wikipedia ... is a bad idea"; much of the WT:MOS discussion over the last month has been about two (three, counting a revert spree) of mass "MoS enforcement" waves of robot-like edits, over which at least one person lost their WP:AWB permissions (should've been at least two, if you ask me).

The only thing I've seen in this discussion which should be implemented across a bunch of articles – and only as an add-on to more substantive changes – is conversion of things like "2⁄3" to "2/3". I would suggest asking at WP:GENFIXES if someone can think of a way to add that to AWB's "General Fixes" scripts in such a way that it a) does not clobber mentions of the &frasl; entity, &#8260; entity, &#x2044; entity, or Unicode character outside of fractions; b) does not replace it in constructions like <sup>2</sup>⁄<sub>3</sub> or in templates that [correctly] use it; and c) (preferably) does replace the entities with the Unicode glyph in cases that are of the form <sup>2</sup>⁄<sub>3</sub>. If someone wants to convert those to {{frac}} or to <math> markup, they need to do that on a case-by-case basis, since any given markup may have been used for a contextually legitimate reason.
 — SMcCandlish ¢ 😼  02:09, 21 July 2018 (UTC)[reply]
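
For illustration only, the three conditions above could be sketched as a pair of regular expressions. This is a hypothetical Python sketch, not an actual AWB general-fix rule; the function name `fix_fraction_slashes` and the patterns are assumptions made up for this example, and it does nothing special for text inside nowiki or code markup.

```python
import re

FRACTION_SLASH = "\u2044"  # the Unicode fraction slash character

# Entity spellings of the fraction slash mentioned in the comment above.
_ENTITY = r"(?:&frasl;|&#8260;|&#x2044;)"
# Either the raw character or any of its entity spellings.
_ANY_SLASH = r"(?:\u2044|&frasl;|&#8260;|&#x2044;)"

# (c) entity between <sup>digits</sup> and <sub>digits</sub>: switch to the glyph.
_SUP_SUB = re.compile(
    r"(<sup>[0-9]+</sup>)" + _ENTITY + r"(<sub>[0-9]+</sub>)", re.IGNORECASE
)

# Plain inline fraction: digits directly on both sides of the slash.
# A bare mention of the entity (a) and the sup/sub form (b) have no digit
# directly next to the slash, so this pattern leaves them alone.
_PLAIN = re.compile(r"([0-9]+)" + _ANY_SLASH + r"([0-9]+)", re.IGNORECASE)


def fix_fraction_slashes(wikitext: str) -> str:
    """Hypothetical helper: apply the narrow replacements described above."""
    wikitext = _SUP_SUB.sub(
        lambda m: m.group(1) + FRACTION_SLASH + m.group(2), wikitext
    )
    return _PLAIN.sub(r"\1/\2", wikitext)
```

Under these assumptions, "2&frasl;3" and "2⁄3" become "2/3", while "the &frasl; entity" and an existing <sup>2</sup>⁄<sub>3</sub> construction are left unchanged. Conversions to {{frac}} or <math> are deliberately out of scope, since those need case-by-case judgment.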

@SMcCandlish:: As of July 1, 2018, there were only 8 articles detected using &frasl;: Assouad dimension, City and South London Railway, Discrete valuation ring, Q-derivative, Winding number. I think I already manually fixed all of them except the first one. The moss project picked up only three instances of numerical references in April, and I already converted all of those. So, none of those will result in a "spree". As for the fraction slash character itself, it's more widely used, appearing in article titles and also English text, it looks like because some people have confused it with a slash. I can add a rule that will post any word containing that character to the moss complaint list and just point editors at Wikipedia:Manual_of_Style/Dates_and_numbers#Fractions_and_ratios to resolve such instances using their judgment. We do have literally about half a million possible typos to get through just involving the letters a-z, so I'm not sure how quickly that list would be cleared unless a particular editor or editors feel that the resulting typography is just an abomination that needs to be removed from the face of the Internet. For people using automated editing tools, maybe it needs to be a "detect the problem but don't try to automatically fix it" thing, since the preferred solution depends on context. -- Beland (talk) 02:54, 21 July 2018 (UTC)[reply]
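
A "detect the problem but don't try to automatically fix it" check of the kind described above might look roughly like the following. This is a minimal sketch and not the actual moss code; the function name is hypothetical, and splitting on whitespace is a simplifying assumption about what counts as a "word".

```python
from typing import List

FRACTION_SLASH = "\u2044"  # Unicode fraction slash, easily confused with "/"


def words_with_fraction_slash(text: str) -> List[str]:
    """Return whitespace-separated words containing the fraction slash.

    Nothing is rewritten; the words are only reported, so that an editor
    can choose the fix that suits the context.
    """
    return [word for word in text.split() if FRACTION_SLASH in word]
```

The returned words could then be posted to a report for human review, with a pointer to the guidance on fractions and ratios.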

Okay. But which "moss" thing are we talking about? WP:MOSS = MOS:SPELL (Wikipedia:Manual of Style/Spelling). That's not a project and isn't a list of pages with stuff to fix.  — SMcCandlish ¢ 😼  03:12, 21 July 2018 (UTC)[reply]
@SMcCandlish: Wikipedia:Typo Team/moss. -- Beland (talk) 14:22, 21 July 2018 (UTC)[reply]
Ah, so! I added a disambiguation hatnote atop WP:Manual of Style/Spelling.  — SMcCandlish ¢ 😼  14:29, 21 July 2018 (UTC)[reply]

Merge proposed: WP:NCCOMICS to MOS:COMICS (which is already ~50% NC material)

 – Pointer to relevant discussion elsewhere.

Please see Wikipedia talk:Manual of Style/Comics#Merge in WP:NCCOMICS

Gist: We have WP:Manual of Style/Comics, the top half of which is naming-conventions material. Then we have WP:Naming conventions (comics), a competing comics naming convention. This is a silly WP:POLICYFORK. Having a combined guideline is thus proposed, based on successfully combined MoS/NC pages in other topics.  — SMcCandlish ¢ 😼  08:42, 19 July 2018 (UTC)[reply]

Merge the Cyrillic advice to one guideline

We have a problem. All of these pages overlap, and none of them are actually guidelines:

The non-mainspace pages are redundant and hard to find, likely to conflict and diverge, and not authoritative. They're moribund and all but forgotten, yet listed at Wikipedia:Romanization as if they're guidelines (it also lists articles like Romanization of Kyrgyz as if they are). Mostly what they say is not really naming-convention material in particular, but general MoS material that also happens to apply to article titles. They have inconsistent names and organizational approaches.

I think these should just be merged into a single WP:Manual of Style/Cyrillic, with a general table, footnoted as needed for specific languages where there are variances (or perhaps use different table rows for this?). Have language-specific sections with detailed notes. If anything in it is truly a naming convention (i.e., applies only to titles), this can be put in a separate paragraph, with a shortcut, like WP:NCUKRAINIAN or whatever, as needed; the page will cross-categorize as both an MoS and an NC guideline. We're already doing this with various topical MoS/NC pages, and with WP:SAL, and it works fine (better, actually, than splitting this information across multiple pages). We should actually be doing more of this; see, e.g., the note above about erasing the pointless WP:POLICYFORK that we have between WP:NCCOMICS and MOS:COMICS (which has its own naming conventions section).  — SMcCandlish ¢ 😼  08:55, 19 July 2018 (UTC)[reply]

  • I can agree on this — as long as we remember how many languages (most of them are not even Slavic ones) are using Cyrillic alphabet with so different phonetics. A unified page can become quite bloated. However, because it's not supposed to be a very particular «Englification of Russian», it's better be «Latinization (Romanization) of Cyrillic». Tacit Murky (talk) 15:36, 20 July 2018 (UTC)[reply]
    Sure. We have little actual material to cover that isn't Russian or Ukrainian. Most subjects on en.WP that might have a name in any of the Siberian languages also have a name in English or in Russian that will be more familiar to our readers. I would think we should consolidate and arrange the existing Cyrillic latinisation material at Wikipedia:Romanization and not add to it unless/until we see a need to do so.  — SMcCandlish ¢ 😼  01:17, 21 July 2018 (UTC)[reply]
  • @Beland: You seem to have a good eye for the table tweaking. Care to give this one a go?  — SMcCandlish ¢ 😼  01:17, 21 July 2018 (UTC)[reply]
My only interest in Slavic language words is that they be tagged with <lang> to indicate to spell/grammar checkers that they are not English, and to hint to TTS systems what pronunciation system they should use. -- Beland (talk) 01:41, 21 July 2018 (UTC)[reply]
Sure. Now that {{lang}} has been reworked, a bunch of people are working on doing this consistently, though it's very gradual.  — SMcCandlish ¢ 😼  03:13, 21 July 2018 (UTC)[reply]

The vast majority of articles about U.S. colleges and universities begin with a sentence like this: "<Institution> is a <list of adjectives> college/university in <city>, <state>." In many cases, both the city and state are linked to their respective articles. In some cases, they both link only to the city. Is there a firm consensus that the MOS favors or discourages one of these two approaches? (Jweiss11 and I had a brief discussion about this on Jweiss11's Talk page if anyone would like a little bit more background.) ElKevbo (talk) 14:06, 20 July 2018 (UTC)[reply]

WP:MOSLINK discourages overlinking, and discourages bunched linking where possible. It might be useful to link the city, provided it's not a well-known city such as LA, NYC, Chicago, or a host of others that English-speakers are likely to be familiar with. But I'm struggling to see why the US state is worthy of a link as well. Is there something I'm missing? This is better raised at WT:MOSLINK. Tony (talk) 14:18, 20 July 2018 (UTC)[reply]
Am I correct in inferring that the concern here is that state names are familiar to most readers and thus don't need a link? I ask not only because of the current discussion about linking but because that also ties into another question I have (which isn't related to the MOS) which concerns the inconsistent inclusion of ", United States" in the lead sentence of these articles.
It's also worth noting that part of this discussion is related to the fact that many colleges and universities are public and therefore governed by their respective states so we're not just concerned with geography. ElKevbo (talk) 14:29, 20 July 2018 (UTC)[reply]
MOS:SEAOFBLUE discourages back-to-back links. If it is being mentioned, the city is the location of interest, even if its name is being qualified by the state. The state link is inevitably linked in the city article.—Bagumba (talk) 14:46, 20 July 2018 (UTC)[reply]
The issue here is the back-to-back bunching of a more specific wikilink with a less specific wikilink when just the more specific wikilink will do. We should also note here that Template:Infobox university has separate fields for city and state, which render back-to-back wikilinks. Perhaps this should be remedied? Jweiss11 (talk) 15:51, 20 July 2018 (UTC)[reply]
WP:USPLACE is also a factor here. Except for a few very notable exceptions, the articles for towns and cities located in the United States already include the name of the State in their titles (using the format "<City, State>"). For example, the city of Ann Arbor, Michigan (linked to in the article for the University of Michigan)... is formatted as: "[[Ann Arbor, Michigan]]", NOT "[[Ann Arbor]], [[Michigan]]".
However, there are those few exceptions... for example, our article on the city of Chicago doesn't include the name of the State (Illinois) in the title (Personally, I think it should, but consensus has deemed otherwise). Now... this will impact our article on DePaul University (which is in Chicago). The question is: do we want to include a link to Illinois, or is the link to Chicago enough? Blueboar (talk) 01:04, 21 July 2018 (UTC)[reply]
Something else we should consider, some of this is simply due to lazy writing... Let me give an example: While it is helpful for the University of Notre Dame article to specify what state Notre Dame is located in... there is absolutely no need to specify what state the University of Michigan or Ohio State University are in (the name of the institution kind of gives that fact away). So... we could avoid the entire "sea of blue" issue and use piped links (writing: "The University of Michigan has its main campus in the city of Ann Arbor" or "Ohio State University is located primarily in the city of Columbus"). Trying to make everything follow a consistent pattern can limit your options. 02:07, 21 July 2018 (UTC) — Preceding unsigned comment added by Blueboar (talk • contribs)
Yep.  — SMcCandlish ¢ 😼  03:15, 21 July 2018 (UTC)[reply]
Assuming of course that there are no American equivalents to the University of Warwick, which isn't based in Warwick but in nearby Coventry. Nigel Ish (talk) 10:51, 21 July 2018 (UTC)[reply]
When the First Unitarian Church of Berkeley moved to Kensington, they decided not to rename themselves First Unitarian Church of Kensington. (It'll come to you.) EEng 11:12, 21 July 2018 (UTC)[reply]
Sure, when something that was named for a location isn't actually in that location, the lead does need to more clearly specify where it actually is. However, that scenario is highly unlikely for universities named after US states. Blueboar (talk) 11:53, 21 July 2018 (UTC)[reply]
Highly unlikely?? I suppose you've never heard of Washington University. —David Eppstein (talk) 18:55, 21 July 2018 (UTC)[reply]
Named after the man, not the State... but since not everyone knows that, it could be confusing... so, sure, that would be one where we would include the state location as well as the city. Blueboar (talk) 22:42, 21 July 2018 (UTC)[reply]
(edit conflict) I've never even visited the United States, so I don't know, but I have to imagine the probability of a school with such a name existing, even apart from EEng's example above, is overwhelmingly high. That said, writing the lead sentence to say Fu University is a university located in Notfu, Wisconsin. should be discouraged anyway; if the fact that it is located somewhere in spite of its name is important enough to be noted in the lead, then it should be noted separately (Despite its name, Fu University is actually located in Notfu [for reason X].), not in the lead sentence where all it will do is confuse readers and potentially cause them to believe the page has been vandalized. As for linking, I'm inclined to say case-by-case: most Japanese university articles seem to include such links, and not doing so with American institutions because everyone knows what an Ohio is reeks of WP:SYSTEMIC. Hijiri 88 (やや) 12:00, 21 July 2018 (UTC)[reply]
That's the approach I take to such cases (not just universities but "names that don't make sense" in general). It also drives me nuts when I see a "Fu University is a university ..." construction or the like, anyway. It's terrible writing that treats our readers like they've had lobotomies.  — SMcCandlish ¢ 😼  17:52, 21 July 2018 (UTC)[reply]
Like Hijiri88, I'm wary of the assumption that most readers automatically recognize the names of most U.S. states. ElKevbo (talk) 13:32, 21 July 2018 (UTC)[reply]
I don't think the question is whether the state name should be included, but whether it should be linked. For example, should Yale University link to [[New Haven, Connecticut]] or [[New Haven, Connecticut|New Haven]], [[Connecticut]]? Natureium (talk) 13:52, 21 July 2018 (UTC)[reply]
I would say the first... no need to link to the state article separately. Send the reader to the article on the city ... as that will probably give more relevant information when coming from a university article (such as what neighborhood the university is in, or if there has been any “town vs gown” history, or if there are other universities in the same town, etc)... the reader can get to the article on the state from there. Blueboar (talk) 23:02, 21 July 2018 (UTC)[reply]
I know what the question is - I'm the one who originally asked it. :) But one editor has proposed omitting the location entirely in cases where the institution's name includes the location. ElKevbo (talk) 14:12, 21 July 2018 (UTC)[reply]
The answer is that there is no single “correct” way to do it... there are lots of “correct” ways; and wording that works at one article, may not work at another. That said... in general... a well written article phrases things to avoid unnecessary repetition and avoids over linking. Blueboar (talk) 23:02, 21 July 2018 (UTC)[reply]

Trypophobia article – using wording from quoted text

Opinions are needed at Talk:Trypophobia#Latest changes. The discussion concerns whether or not it is fine to quote this source as much as desired without the use of quotation marks, and whether or not we should always use a source's exact words. Regarding the latter, the question is whether it's WP:Original research to use our own wording as opposed to a source's exact words and whether wording like this needs to be tagged as WP:Weasel. The discussion additionally concerns stating things in Wikipedia's voice when sources disagree, the research is new, and/or there is no consensus in the literature on the matter.

On a side note: The Trypophobia article contains an image that some find distressing. So a heads up on that. Flyer22 Reborn (talk) 15:31, 20 July 2018 (UTC)[reply]

Already commented there, even before the ping. --Tryptofish (talk) 16:55, 21 July 2018 (UTC)[reply]

Ordering of gendered titles when a gender-neutral equivalent is unavailable?

The lead of our RWBY article includes the phrase "Huntsmen" and "Huntresses", following my adding of quotes, since it's in-universe terminology. That point is important since we can't just say "hunters", which to the best of my knowledge is gender-neutral, because what they actually do in-universe is more military service. However, virtually all the important characters in the show, including the four title "Huntresses", are female, which makes me wonder if it would be better to say "Huntresses" and "Huntsmen".

Setting aside the "don't use in-universe terminology" solution (which I personally like but would never fly on this kind of article), what's the policy here? Are we allowed to decide based on factors like the above point about the prominence of female characters to change the order? I suspect (can't seem to get hit-counts from GNews on a mobile device...) that the majority of third-party reliable sources (as well as at least one Wikia site) prioritize "Huntsmen", but that could be because of latent sexism, which is concerning...

Hijiri 88 (やや) 08:16, 21 July 2018 (UTC) (Stricken because a more careful check revealed no one puts "Huntresses" first. That might be deliberate satire on the fictional patriarchy being portrayed, but that gets into OR and SPECULATION territory. Sorry for jumping the gun. My bad. Hijiri 88 (やや) 08:48, 21 July 2018 (UTC) )[reply]

To be honest I think you might be overthinking it. It's probably just as simple as the various sources out there copying the press releases/blurbs from Rooster Teeth themselves rather than any latent sexism. Who (as far as I can tell) always use 'Huntsmen and Huntresses' in that order, unless specifically referring to a particular character. Only in death does duty end (talk) 08:27, 21 July 2018 (UTC)[reply]
Well, yeah, but wouldn't that just be latent sexism on RT's part? (Note that I say "latent" sexism because I don't for a second think there's any malice on their part, just habits ingrained in them from being raised in a patriarchal culture.)
That said, I did just Google RWBY "Huntresses and Huntsmen" and found that there was a total of one news result. (I initially didn't bother with this search since I assumed the above mobile device problem might make it pointless.) So I guess the question is kinda moot because, sexism aside, Wikipedia probably shouldn't be using in-universe terminology that even the producers don't use.
Hijiri 88 (やや) 08:48, 21 July 2018 (UTC)[reply]

Do we italicize the names of toy franchises?

I'm not really sure where to ask this, but I guess I'll ask it here. Should the name of a toy franchise be italicized? See Lego Friends. I've seen this a few places lately, including at Transformers, but not at Garbage Pail Kids. So, I'm confused. Thanks, Cyphoidbomb (talk) 01:26, 22 July 2018 (UTC)[reply]

No. If anything, the GPK case has a better claim to italics, since they're technically a published serial work (collectible cards) not toys. The GPK article is stylistically all over the place, mis-capitalizing in headings, putting "scare quotes" around Facebook, etc. Haven't looked at the Transformers stuff. I would guess that someone's been italicizing it because they've seen some other franchises italicizing and are just copy-catting the style.

Anyway, we used to have clear instructions about this, and I wonder if they've been lost or dis-clarified. They were to not italicize a franchise, trilogy or other book or film series, fictional universe, or other mass of works and products, or reference thereto, unless and except where it is named after the title (not partial title) of the original work in the series: Thus, these are okay: the Star Trek franchise, Asimov's Foundation series, the Star Wars Extended Universe novels; but these are not: Tolkien's Middle-earth fiction and the films and games developed from them, The Harry Potter series, the Marvel Cinematic Universe. Some serial works come with an over-arching series title in addition to the works' titles, e.g. The Chronicles of Thomas Covenant;[a] this shouldn't be italicized, as it just confuses as to what the franchise name versus the book titles are. This may be a matter of MOS:TITLES not being fully centralized yet. As with MOS:BIO until the recent merges, the work-title-related stuff has been scattered through various guidelines.

The off-WP styles vary, but overall seem pretty close to this rule, or even to italicizing less, i.e. not italicizing franchise names at all. Sometimes they're put in quotes, sometimes italicized, more often neither. This un-stylizing habit pre-dates common fiction franchises, and evolved from treatment of non-fiction series of works (which are distinguished from single works published in a number of volumes, as many reference works and major monographs, like The Golden Bough are).
 — SMcCandlish ¢ 😼  04:05, 22 July 2018 (UTC)[reply]

  1. ^ Off-topic public service message: I must warn everyone away from the Thomas Covenant series. It's enormous – each book in the trilogy of trilogies is about 500–900 pages – and almost as detailed as any other F&SF series there is, but once you wade through it, you find that much of the plot was futile, and you'll be disgusted by the protagonist and won't care much about the fools going along with him. If I could reclaim any fiction-reading time I've ever spent it would be that series, and I stopped at book 6 after fighting the urge to do so at book 4. So, if you just must investigate it, stop at book 3, before it really goes off the rails.

The endless "fan-capping" problem

We really need to do something to clarify the wording in the main guideline and (where applicable) at MOS:CAPS, MOS:TM, MOS:TITLES, WP:NCCAPS, WP:OFFICIALNAME, etc., to stem the tide of fandom-based over-capitalization: "WP has to use the weird capitalization in this company logo/movie poster/album cover just because it's official and because business/entertainment magazines do it".

On a nearly daily basis there are either a) new RMs to move articles to MoS-non-compliant names to mimic logos and other marketing, or b) fandom-based opposition to an MoS-compliance RM, because it doesn't match the over-capitalization on the album cover or the director's blog. A common variant of this is denial that the MOS:5LETTER rule (capitalize a preposition in the title of a work only if it is five letters or longer) exists or can be applied, simply because it's not the style used in news journalism.

One egregious case was the repeated fight to try to move the song article "Do It like a Dude" to "Do It Like A Dude" complete with capital A (yes, really) as well as capitalized preposition, to match the font styling on the cover of the single. An ongoing one is Talk:Spider-Man: Far From Home#Requested move 14 July 2018, swamped with WP:ILIKEIT votes that are ignoring all WP:P&G arguments.

This is sucking up way too much editorial time at RM, and the discussions are always circular rehash. It's a constant firehose of WP:LAME. Something in the guideline wording needs to be adjusted to curtail this stuff. I can take a stab at it at some point soon, but would rather have some additional eyes and brains on it. Where are the weak spots? Why is it not getting through? How can so many people, even when pointed directly at the guidelines, all saying the same thing in different wording, still somehow not understand?
 — SMcCandlish ¢ 😼  14:43, 23 July 2018 (UTC)[reply]
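
As a rough illustration of the MOS:5LETTER rule as summarized above (capitalize a preposition in a work's title only if it is five letters or longer), here is a minimal Python sketch. The word lists and the function name are hypothetical and far from complete; lower-casing articles and always capitalizing the first and last word are extra assumptions not stated in this thread; and the sketch cannot tell when a word such as "like" is being used as a verb rather than a preposition.

```python
# Hypothetical, incomplete word lists used only for this illustration.
PREPOSITIONS = {
    "of", "in", "on", "to", "at", "by", "for", "from", "with", "like",
    "into", "over", "about", "under", "throughout", "alongside",
}
ARTICLES = {"a", "an", "the"}  # assumption: articles stay lower-case mid-title


def title_case_5letter(title: str) -> str:
    """Capitalize a title, lower-casing mid-title prepositions under 5 letters."""
    words = title.split()
    result = []
    for index, word in enumerate(words):
        lower = word.lower()
        # Assumption: the first and last words are always capitalized.
        first_or_last = index == 0 or index == len(words) - 1
        short_preposition = lower in PREPOSITIONS and len(lower) < 5
        if not first_or_last and (lower in ARTICLES or short_preposition):
            result.append(lower)
        else:
            # Upper-case only the first character; keep the rest as typed.
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)
```

With these word lists, the input "Do It Like A Dude" comes out as "Do It like a Dude", matching the article title mentioned above, while a long preposition such as "throughout" stays capitalized.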

  • I know this is "locking the barn door after the horse has fled"... but this all stems from our decision to put article titles in sentence case instead of title case. That was a bad decision, looking back at it (90% of the arguments would have been avoided if we had decided to use title case)… unfortunately, fixing that decision is now unworkable. Blueboar (talk) 14:51, 23 July 2018 (UTC)[reply]
    I felt that way when I first arrived; I really hated the sentence casing. But if we'd picked title case it would have made disambiguation a lot messier, and would have made it a lot harder to tell whether something was about a proper noun without actually going to the article. I think that's why sentence case prevailed. A decision way before my time.  — SMcCandlish ¢ 😼  19:58, 23 July 2018 (UTC)[reply]
  • (edit conflict) I agree with the thrust of SMcCandlish's post. Indiscriminate capping is indeed a problem—and Blueboar, in my view rendering article titles in sentence case was the sanest formatting decision ever taken in the early days of en.WP. Sometimes we could be forgiven for feeling that fan-capping and vanity-capping is a violation of WP:POV, at least in spirit. Every self-respecting publishing house imposes its own rules (especially WRT capping, which has no influence on google searches). One reason is that inconsistency drags down the subtle sense of a publication's authority. Tony (talk) 14:59, 23 July 2018 (UTC)[reply]
    Of course you agree! You two are the main promoters of this absurd rule. WP has no business fiddling with the title of works where the actual title is clear, especially using its own home-cooked rules. I'd welcome discussing the question, and would hope that commonsense would prevail, and WP:COMMONNAME rule supreme, as it should. Johnbod (talk) 15:05, 23 July 2018 (UTC)[reply]
    Been over this already [4] (many, many times). And COMMONNAME is not a style policy; WP:AT and the naming conventions defer to the Manual of Style on style questions.  — SMcCandlish ¢ 😼  20:10, 23 July 2018 (UTC)[reply]
  • A consistent factor in the debate is the issue of "common name" versus "common style". Those who support the current position of the MoS argue that name and style can be separated, so that we decide on the name based on sources, but then apply our own styling based on the MoS. I fully accept this general approach, but long ago (during the debates about the capitalization of organism names) I asked those who espoused name/style separation to provide clear definitions and explanations of when orthography could be changed, as it is 'merely' style, and when it could not, as it conveys important components of meaning. I think it's important for those who support the status quo to try to step back and see things from a less committed perspective. For example, the MoS regards some capitalization in sources as 'mere' style, and so of no semantic importance, even though the capitalization is clearly noticed by many editors, who faithfully copy the source. On the other hand, the MoS imposes style choices, such as length of dash, that evidence shows many (if not most) editors don't notice and so don't naturally copy. This difference seems inconsistent to many editors, which, I think, is one reason for the endless re-opening of old debates.
So to answer User:SMcCandlish's question, I think that more rigorous and reasoned definitions and explanations of the differences between "name" and "style", as relevant to the MoS, might help.
What doesn't help or contribute to reaching consensus is giving positions you don't agree with pejorative labels, like "fan-capping", when many editors see it as just following the sources they use, which in other contexts is laudable. Peter coxhead (talk) 19:11, 23 July 2018 (UTC)[reply]
The whole problem with the idea (as has been explained to Johnbod, et al., many times, but they just refuse to hear or accept it), is that different kinds of publications have one rule or another about how to treat prepositions in titles of works. The same work will have its title rendered differently depending on who's writing about it, in the real world. News journalism usually uses a four-letter rule (sometimes even three, depending on publisher, with marketing style leaning toward one letter), at one extreme, while academic journals tend to go with never capitalizing any prepositions, even long ones like throughout and alongside. We and various others have a middle-ground approach, a compromise. But if all you read is newspapers and magazines, all you're exposed to, pretty much, is the four-letter rule, and you get the impression that it's The One True Way to write English. This is obviously an illusion. Pick any well-known book with "from" or "with" in the middle of its title. You'll find that journalism sources render it "From" or "With", academic ones virtually always lower-case them, and other kinds of publications vary widely.

There is no "official spelling" obeyed by everyone. It's a fantasy. And a weird one. I have never in my entire life encountered a book author, movie director, etc., throwing a public tantrum because a book review or a film journal used "from" but the artiste's marketing materials use "From". Seriously, no one cares, except a small number of Wikipedia editors. WP:RM routinely resolves to follow MOS:5LETTER, time after time. Yet people who focus on entertainment magazines and websites as their sources never, ever stop trying to forcibly capitalize "with" and "from" in works they like. They don't go around doing this to titles of obscure works of non-fiction, or songs people have probably not heard unless they're over 60; they do it with current pop-culture topics that they're big into. It appears to be another strain of the "I want to capitalize this because it makes it seem important" thing; the same emphasis-caps urge that we have to deal with a lot more broadly. But this fannish version of it is just really common, and really tendentious.  — SMcCandlish ¢ 😼  19:55, 23 July 2018 (UTC)[reply]

Perhaps it is time to accept that our MOS is out of sync with what our editors want. Continually telling people “but what you want is WRONG” is pointless if no one wants to listen. Blueboar (talk) 20:16, 23 July 2018 (UTC)[reply]
Except that if these voices of complaint are coming from fans (and reading between the lines, dedicated fans) there's a bit of COI - not actionable! - here to demand that MOS is wrong. The argument reminds me of the past situation with MMA and current with wrestling in general that "but for our area, we need our rules!" They're not seeing the bigger picture that a MOS is meant to provide, which is a general reading and editing consistency for WP. I would say the community is listening, but simply not accepting the argument that the one topic area needs special rules here, particularly one based on pop culture. --Masem (t) 20:52, 23 July 2018 (UTC)[reply]
Right. The same 10 or so people – out of around 30,000 monthly editors – pursuing pop-culture over-capitalization again and again no matter how many times consensus turns against them is not an indication that our guidelines are broken.  — SMcCandlish ¢ 😼  06:05, 24 July 2018 (UTC)[reply]
This problem can be summed up (kind of) with the Wikipedia Stephen King story collection Four past Midnight title, which seems wrong on many levels. It's not the name that King uses, nor his fans (and no, I am not a dedicated fan, read him long ago but he lost me in the decades he started writing 7,000 page books), nor the world at large. This one example stands out as defining "what's wrong" with the hard-and-fast rule on how many letters a word has to have to be capitalized in a title. The other major problem is MOS says that if something isn't capitalized "consistently" (which some editors define as always, no exceptions) then it must be lower-cased, even if the vast majority of sources and common sense itself deem that upper-case is the way to go. That MOS point often gives naming-rights to a few people, those who write the sources. They, probably out of ignorance or research-laziness, fail to upper-case something, and that error then flows into Wikipedia where it can be pointed to as non-consistency, and thus brings non-common use styling into this project. Solving it should not mean adding even stricter language to MOS, but loosening it up to allow common-sense and most familiar names in English to be considered as important and viable components in capitalization decisions. Randy Kryn (talk) 00:17, 24 July 2018 (UTC)[reply]
Here is the n-gram for Four past Midnight, published in 1990, which seems relevant to this discussion. Randy Kryn (talk) 03:55, 24 July 2018 (UTC)[reply]
I've suggested before that we could consider a change, to capitalize short prepositions that often are not prepositions (Past, Like, etc.). Just because various style guides treat all prepositions, by length, as a class doesn't mean we are forced to, especially given doubt among linguists that the "preposition" categorization is actually valid rather than an obsolete idea from early-20th-century approaches to language (for an easy-reading explanation of this, and a tremendous amount of good writing advice, see Steven Pinker's The Sense of Style, which covers it in detail without miring the reader in linguistics jargon; IIRC, it's covered in ch. 4, "The Web, the Tree, and the String", which should be required reading before anyone can edit this site. >;-) No one's bothered with an RfC suggesting such a change, and instead they just try to re-re-re-litigate their preferences at RM after RM. It's a productivity drain for everyone. And that is all it is. It's not editorial cluster A's preferences versus cluster B's, it's A's versus something like 5 consistent guidelines and thousands of previous RM closes. But such a change still would have no effect on "with" and "from". Over-capitalizers of these just need to let it go. It's a classic specialized-style fallacy, the silly notion that sources reliable about a topic (e.g. who has been cast in an upcoming Spider-Man movie) are reliable sources for how Wikipedia must write and style prose about the subject.  — SMcCandlish ¢ 😼  06:19, 24 July 2018 (UTC)[reply]

PS: An N-gram on a pop-culture topic is utterly meaningless for capitalization analysis of titles of works, because around 90% of the material written about such topics is entertainment journalism, which all follows the four-letter (or even shorter) rule. I.e., it's circular reasoning, begging the question, cherry picking (in the off-WP sense, a.k.a. fallacy of incomplete evidence), all at once. I think people have difficulty with this because they mistake COMMONNAME for a style policy and don't understand the reason we have the policy and why it's not a style policy. It exists so people looking for David Johansen don't end up at Buster Poindexter; it has nothing to do with forcing particular nitpicks of typography, and we have at least 5 guidelines against doing that to mimic "official" stylization.  — SMcCandlish ¢ 😼  06:38, 24 July 2018 (UTC)[reply]

@SMcCandlish: returning to your initial question, neither Wikipedia:Article titles#Article title format nor Wikipedia:Manual of Style#Article titles explicitly cover the situation where the title of an article is the title of a work. Perhaps it would help to add something here, or at least to add links to other places in the MoS. When What Is To Be Done? is used as an example in the MoS, you can understand why editors might choose "Four Past Midnight" or capitalize "with" in the title of a work. Peter coxhead (talk) 06:59, 24 July 2018 (UTC)[reply]
Hmm. That appears to be an error. I'm surprised it wasn't noticed sooner. Interestingly, even WikiProject_Russia's literature task force has it as What Is to Be Done?, so it's odd that the "To" version has arisen. Anyway, no such over-capitalization shows up in any example of titles at MOS:TITLES, MOS:CAPS, etc., as far as I can tell. Anyway, "the situation where the title of an article is the title of a work" might well be part of the issue. Maybe the assumption that covering it at MOS:TITLES is enough is a poor assumption. A cross-reference, at least, couldn't hurt.  — SMcCandlish ¢ 😼  07:27, 24 July 2018 (UTC)[reply]
Nope. I have not been involved in the discussions about any of these titles. Blueboar (talk) 10:38, 24 July 2018 (UTC)[reply]
  • I echo the objection to the use of pejorative phrases like "fan-capping" and to the implication that the motives of anyone involved have anything to do with their level of fan involvement. The highest principle in WP:Article titles, that topics are named based on reliable, secondary sources, stems from WP:Verifiability. These are core POLICIES compared to a MOS guideline which continues to expand beyond its initial purpose of addressing technical limitations of Wiki software and is now becoming a WP:OR bible of usage. The problems related above stem from MOS advocates pushing this set of guidelines too far to the front. A MOS is fine for describing how we should handle original, subjective prose in articles, but cannot be used to override hard facts (like titles of works and other proper names) which are directly presented in reliable sources. So no, we cannot continue to alter commonly-accepted titles of works based on a set of guidelines we've created ourselves - except where the common presentation of such is incompatible with the wiki software or other practical concerns. -- Netoholic @ 08:27, 24 July 2018 (UTC)[reply]
  • I agree with Netoholic. Not only in the pejorative way this topic is stated, but in the general "tail-wagging-the-dog" mentality here. I'm sorry, but who the hell do we think we are? We have zero moral right to tell an author their title is wrong just because it offends some rando on the internet's idea of proper grammar. Can unusual styling be promotional? Yes. Does that mean we should never use unusual stylings to maintain NPOV? Absolutely not. Maybe the author/director/producer chose such a style for promotional reasons, or maybe they had an artistic purpose. If you don't know the reason, then you have a moral obligation to the author's freedom of expression to respect their artistic choice and use their styling. Period. Anything else is bollocks. oknazevad (talk) 10:00, 24 July 2018 (UTC)[reply]
  • Hi everyone. Personally I think Masem sums it up best: MOS is supposed to provide a feeling of consistency across all topics and articles. As for the specifics, I'm also in favour of applying standardised capitalisation in nearly all cases (as stated above, it makes it much easier for readers of an encyclopedia, rather than following the whims of branding and advertisers). There are a few cases where reliable sources all tend toward using the owner's stylisation (e.g. iPod, eBay, etc.), which is absolutely the right thing to do, but for titles of individual comic books, cartoons, etc. there are often not enough serious reliable sources that use consistent style guides and have professional editorial oversight that cover them for us to follow their conventions (e.g. it's not unusual for pop-culture artifacts like this to be reviewed by one or two borderline reliable websites by semi-professional writers, and only edited cursorily – this isn't really a strong precedent to follow, and I would say that, unlike iPod etc. above, there is in practice no fully established consensus among reliable sources on how these specific stories should be capitalised, as they are not covered widely enough). ‑‑YodinT 13:49, 24 July 2018 (UTC)[reply]
    • Also, following the main thread of the discussion, I'd agree with pretty much everyone above that this whole process is a productivity sink. I can't see a way forward that would help with this in practice, but would just say that my impression is that it's essentially about what casual editors see as looking "right" rather than trying to make their topic more important (though no doubt they might also try to do this in other ways) – they look up Four past Midnight or whatnot, think to themselves "that just doesn't seem right", and then (and normally only if they really care about the topic...) they try to get it changed to be all initial capitals. We absolutely could change our style guidelines on the capitalisation of titles of works (following modern journalistic style guidelines for example), but then the exact same process as above would play out in reverse, but for fans of traditional grammar/capitalisation, who would invest the same amount of energy in trying to get it reverted back to what we have now. On the one hand, the grammar-fans might perhaps be more likely to understand that it's our convention, and just to accept our style guidelines even if they disagree... (maybe a bit optimistic...), but on the other my impression is that it might alienate them further from Wikipedia in a way that pop-culture fans wouldn't be put off. Just a few thoughts. ‑‑YodinT 14:30, 24 July 2018 (UTC)[reply]
  • I generally look at it as: If we wouldn't change the capitalisation on someone's name, we shouldn't on the name of other works. Where the capitalisation as part of the title is clearly evident (most often in books, film titles etc) and not a function of the logo/trademark (often seen in companies with allcaps/lowercase etc) then really WP:COMMONNAME does apply. Capitalisation has never been accepted as solely a style issue rather than a naming issue, which is why the RFC is getting the results it is. And really the point of the MOS is that it is meant to be a guide of best practice for the majority of situations with some exceptions. On this issue there are so many exceptions that it can easily be argued the MOS guidance is not useful. If it causes more problems than it solves, it's not useful guidance. What would eliminate most of the conflict on ENWP would be stating that where the article title matches a creative work, capitalisation is deferred to local consensus. Only in death does duty end (talk) 14:49, 24 July 2018 (UTC)[reply]

Citations in the lead

This text was added to WP:MEDMOS, based on this discussion.

Consensus was not gained that this change is in concurrence with project-wide MOS. It has not been determined that statements in the leads of medical articles are more likely than any other type of article to be challenged, and the main reason for this push for citations in the lead has been for the (external) translation project, which translates only the leads of medical articles (a separate problem in and of itself). Many examples have been given over the years of how this demand for citations in the lead compromised the summary aspect of article leads. SandyGeorgia (Talk) 14:57, 24 July 2018 (UTC)[reply]