Xover

Pellucidar


The publication year is 1962, but the copyright year is 1915. The license is correctly PD-US with the death year of the author, but the revised templates interpret the publication year as the year for copyright, and I have no idea whether there is any way to correct the issue without revising template code. --EncycloPetey (talk) 05:19, 24 May 2024 (UTC)

@EncycloPetey: In the context of the license template, the year argument is the year of first publication because that's what affects the copyright. The year of subsequent editions only becomes relevant if there is copyrightable new matter, in which case the year argument should be the year of first publication for the last published of any copyrightable matter.

This is not a very intuitive usage, but I think, with copyright law being what it is, there is little alternative. Xover (talk) 05:53, 24 May 2024 (UTC)
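A minimal Python sketch of the rule described above, assuming a simplified model in which an edition is a list of components, each carrying the year its matter was first published and a flag for whether it adds copyrightable material. The names `Component`, `license_year` and `pd_us_by_age` are hypothetical and not part of any Wikisource template or module; the age check also deliberately ignores notice and renewal formalities.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One piece of an edition: the original text, a translation, new notes, etc."""
    description: str
    first_published: int        # year this matter was first published anywhere
    copyrightable: bool = True  # False for a plain reprint that adds nothing new


def license_year(components: list[Component]) -> int:
    """Year for the license template: the first-publication year of the most
    recently published copyrightable matter, not the year of this edition."""
    years = [c.first_published for c in components if c.copyrightable]
    if not years:
        raise ValueError("no copyrightable matter listed")
    return max(years)


def pd_us_by_age(year: int, current_year: int) -> bool:
    """Works first published before 1978 get at most 95 years from publication,
    running to the end of the calendar year, so in 2024 anything first
    published before 1929 has expired. Notice and renewal rules are ignored."""
    return year < current_year - 95


# Pellucidar: serialized in 1915; the 1962 Ace Books edition is assumed,
# for this example only, to add no copyrightable matter of its own.
pellucidar = [
    Component("Burroughs's text, serialized in pulp magazines", 1915),
    Component("1962 Ace Books reprint", 1962, copyrightable=False),
]

print(license_year(pellucidar))                      # 1915
print(pd_us_by_age(license_year(pellucidar), 2024))  # True
print(pd_us_by_age(1962, 2024))                      # False
```

Feeding the edition's bibliographic year (1962) into the same check gives the wrong answer, which is exactly the mismatch between the header date and the license date being discussed here.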

That presents a problem if we have multiple editions. But in this case, it's an edition published by Ace Books. Claiming a publication year of 1915 would be nonsensical here because (a) 1915 was the year of first publication, when the story was serialized in pulp magazines, and (b) Ace Books did not exist in 1915. We need a means to distinguish year of publication from year relevant for copyright. --EncycloPetey (talk) 06:03, 24 May 2024 (UTC)

Maybe just rephrase "because it was published before January 1, 1929" to specify that we're talking about the (latest) year of first publication? —CalendulaAsteraceae (talk • contribs) 04:58, 25 May 2024 (UTC)

@EncycloPetey: I am (as usual) being slow and failing to see the problem. Sorry. Why does using year of first publication create a problem when we have multiple editions? Why is it a problem that the license tag, way down at the bottom of the page, references the date that's relevant for copyright purposes (the {{header}} contains the date relevant for bibliographic purposes)?

Would simply changing the wording that is displayed for these cases to be specifically about first publication (along the lines Calendula suggests) resolve the issue? Xover (talk) 08:22, 25 May 2024 (UTC)

If we use the date of first publication, then all editions have the same date, even if the specific copy was issued decades or centuries later. This will create a lot of confusion for readers looking at versions pages, where all the dates are now the same for all the versions. It will also create a problem for disambiguating versions by mainspace name where the key difference is the date the edition was issued, resulting in editions with one date in the page title (for disambiguation) that is different from the date in the header. How many years of future explanations and minor edit warring would this lead to? --EncycloPetey (talk) 17:24, 25 May 2024 (UTC)

@EncycloPetey: If I've suggested that then I am most definitely confused. Sorry!

What I mean is that in the {{header}}, in the wikipage name, and on versions pages, we use the actual year of publication for the edition of the text we are reproducing. But in the licensing template—because it expresses copyright status and copyright cares about a) the abstract work more than the concrete edition, and b) the year of first publication—we use the year relevant for copyright, which is the year of first publication.

This obviously isn't ideal because you have to use different dates in the header and in the license template, but it is sort of necessary since they express different things (copyright vs. bibliographic domains, essentially).

Did that make more sense, or am I still confused? Xover (talk) 18:44, 25 May 2024 (UTC)

Understood, but in this case, the License template compared itself automagically with the header date, and decided there was an error. And the header documentation offers no explanation of what's going on. Now, I have since determined that PD-US has options for coding that will solve the issue, but other licensing templates, such as PD-old, do not. Until the license and header templates are coordinated and co-documented, we're going to get errors generated by template magic. --EncycloPetey (talk) 19:18, 25 May 2024 (UTC)

Oh. Hmm, I think it's beginning to dawn on me. {{PD-US}} is probably not looking at what's been put in {{header}} but rather is pulling the year of publication from the attached Wikidata item when one isn't provided as an argument to the template. I'd need to check the code to be sure, but that seems a likely explanation. I wasn't aware it did that, hence my confusion.

That will, as you say, tend to cause some confusion. I am not entirely sure we can safely use publication dates from Wikidata for license templates, since at least most of what is there today are bibliographic dates and not copyright dates. We'd either have to walk up the tree to the work level item to get the date there, or look for a Wikidata property that specifically is for copyright. But then we get into the different licensing policies between projects and other weirdness. Maybe it would be best to just not try to automatically get the date, making it mandatory to specify one manually? It may be the data we need is reliably(ish) available at Wikidata, but I'd need to do some digging to be confident about that. @CalendulaAsteraceae: Have you done any research into that when you worked on {{PD-US}}? Xover (talk) 20:08, 25 May 2024 (UTC)

My recent experience is that a lot of our new copies are attached (incorrectly) to work data items instead of edition data items. And Wikidata currently does not distinguish publication date from print date from copyright date. --EncycloPetey (talk) 01:03, 26 May 2024 (UTC)

Indeed, {{PD-US}} is referring to Wikidata, not the header (and I'm not actually sure how it could refer to the header). The relevant code is in Module:License Wikidata.

Wikidata technically has a copyright property, but it's inconsistently applied, doesn't always include the reasoning/jurisdiction, and is IMO generally less useful than just getting the year. (I added an example at Last Poems in case you're curious.) Probably walking up the tree to the work level (if needed) is the way to go. —CalendulaAsteraceae (talk • contribs) 04:22, 26 May 2024 (UTC)

I added code to Module:License Wikidata to get the date from the parent work item if available. I think this should make the dates reliable enough for copyright purposes, given that we can override as needed. I will also say that I've found the automatic year determination and addition to Category:Possible copyright violations helpful for catching cases like The Crane is My Neighbour, which was inappropriately tagged as PD-US even though it was published in 1938. —CalendulaAsteraceae (talk • contribs) 04:45, 27 May 2024 (UTC)
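A rough Python sketch of that fallback order (explicit override, then the parent work item, then the edition itself), assuming a heavily simplified dict model of the data. This is not the code in Module:License Wikidata, which is Lua and works on real Wikidata entities; the item IDs below are made-up placeholders, and only the property IDs P577 (publication date) and P629 (edition or translation of) are real.

```python
from typing import Optional

# Hypothetical stand-in for Wikidata: item ID -> property ID -> list of values.
# P577 (publication date) is reduced to a bare year (415 BC written as -415);
# P629 ("edition or translation of") points from an edition to its work.
ITEMS = {
    "Q-pellucidar-work": {"P577": [1915]},
    "Q-pellucidar-1962": {"P577": [1962], "P629": ["Q-pellucidar-work"]},
    "Q-trojan-women-work": {"P577": [-415]},
    "Q-trojan-women-1905": {"P577": [1905], "P629": ["Q-trojan-women-work"]},
    "Q-unlinked-edition": {"P577": [1938]},
}


def copyright_year(items: dict, edition: str, override: Optional[int] = None) -> Optional[int]:
    """An explicit template argument wins; otherwise use the earliest
    publication year of the parent work item(s), then the edition's own."""
    if override is not None:
        return override
    claims = items.get(edition, {})
    work_years = [
        year
        for work in claims.get("P629", [])
        for year in items.get(work, {}).get("P577", [])
    ]
    if work_years:
        return min(work_years)
    own_years = claims.get("P577", [])
    return min(own_years) if own_years else None


print(copyright_year(ITEMS, "Q-pellucidar-1962"))                  # 1915
print(copyright_year(ITEMS, "Q-unlinked-edition"))                 # 1938
print(copyright_year(ITEMS, "Q-trojan-women-1905"))                # -415
print(copyright_year(ITEMS, "Q-trojan-women-1905", override=1905)) # 1905
```

As sketched, a translation whose edition item links to its ancient original inherits the original's date, so only the explicit override yields a year that is meaningful for the translation's copyright.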

@CalendulaAsteraceae: Awesome. Thanks!

@EncycloPetey: Could you have a look at various cases and see if the behaviour is now more in line with expectations? We'll need to watch this and see if further tweaks are needed. Xover (talk) 05:39, 27 May 2024 (UTC)

If it's pulling the date now from the work data item, then what happens for translations of classical Greek and Latin texts, where the relevant date is the modern date of publication and/or the date of death for the translator? --EncycloPetey (talk) 15:45, 27 May 2024 (UTC)

@EncycloPetey: I'll check. Could you point me to an example where the translator died less than 100 years ago? —CalendulaAsteraceae (talk • contribs) 02:24, 28 May 2024 (UTC)

Examples: The Antigone of Sophocles (1911); Trojan Women (Murray 1905); Sophocles' King Oedipus. --EncycloPetey (talk) 04:30, 28 May 2024 (UTC)

Hmm, interesting. Translations are somewhat unique in the context of copyright since translating something inherently creates a new copyright, unlike most new editions of an extant work. Wikidata's current model doesn't distinguish between a translation and any other form of edition, so we can't detect translations and apply special logic to those.

I am increasingly sceptical that we should be pulling years from Wikidata for the license templates. We want people to actively assess the copyright, not just slap something on there, and autofilling from Wikidata somewhat undermines that. When in addition most of our translations are going to get very incorrect years and we can't even detect that it's happening, I think we may be at a point when the problems outweigh the usefulness.

Or does anybody see any brilliant solutions to this that I'm missing?

What we could consider is pulling the "copyright status" from Wikidata if not explicitly given. "Copyright status" is a derived property, so it means someone has at some point made some kind of assessment (unlike calculating it from a raw datum like a year). It is also specifically about copyright rather than the year of first publication (bibliographic), so we wouldn't be repurposing data from one domain to another.

Given the poor state of this data on Wikidata I'm not at all sure it'd be worth it, though. Xover (talk) 05:10, 28 May 2024 (UTC)

It would mean pulling multiple copyright status values and combining them to get the right template. And it would still require applying the correct death year based on the latest date from among the author, translator, illustrator, possibly editor, and possibly someone else, and any of those could have multiple persons listed. --EncycloPetey (talk) 14:04, 28 May 2024 (UTC)

No, I don't think it would work to grab all years and try to figure out the right one. If we went down that path it'd have to be trusting the copyright-specific properties already set at Wikidata and then just applying that license directly. See d:Q19092354, where there are two statements for copyright status (P6216). One that says in the jurisdiction United States of America (Q30) the status is public domain (Q19652) as determined by published more than 95 years ago (Q47246828). And one that says that in the jurisdiction countries with 80 years pma or shorter (Q61830521) the status is public domain (Q19652) as determined by 80 years or more after author(s) death (Q29940641).

I am not at all sure this would be a good idea, mind, but that's the approach I see as having some chance of reasonable accuracy. For items where nobody has set the copyright-related properties, we should probably just emit an error message asking the user to supply them. Xover (talk) 18:49, 30 May 2024 (UTC)
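One way the "trust the existing copyright statements, otherwise ask" approach could look, as a Python sketch over a simplified dict model of the P6216 statements and their P1001 (applies to jurisdiction) and P459 (determination method) qualifiers. The item IDs are the ones quoted above; the mapping from status and method to a template name, and the function `license_for`, are hypothetical examples rather than an agreed enWS mapping.

```python
# Item IDs quoted in the comment above.
US = "Q30"
PMA_80_OR_SHORTER = "Q61830521"
PUBLIC_DOMAIN = "Q19652"
PUBLISHED_95_YEARS_AGO = "Q47246828"
PMA_80_YEARS = "Q29940641"

# Hypothetical mapping from (status, determination method) to an enWS template.
TEMPLATES = {(PUBLIC_DOMAIN, PUBLISHED_95_YEARS_AGO): "PD-US"}


def license_for(statements: list, jurisdiction: str = US) -> tuple:
    """Return (template, None) on success, or (None, error message)."""
    matches = [s for s in statements if s.get("P1001") == jurisdiction]
    if len(matches) != 1:
        return None, ("No unambiguous copyright status (P6216) for this "
                      "jurisdiction on Wikidata; please specify the license manually.")
    statement = matches[0]
    template = TEMPLATES.get((statement["value"], statement.get("P459")))
    if template is None:
        return None, ("Unrecognised copyright status or determination method; "
                      "please specify the license manually.")
    return template, None


# The two P6216 statements described for d:Q19092354.
q19092354 = [
    {"value": PUBLIC_DOMAIN, "P1001": US, "P459": PUBLISHED_95_YEARS_AGO},
    {"value": PUBLIC_DOMAIN, "P1001": PMA_80_OR_SHORTER, "P459": PMA_80_YEARS},
]

print(license_for(q19092354))  # ('PD-US', None)
print(license_for([]))         # (None, 'No unambiguous copyright status ...')
```

The design choice here is to never compute a status from raw years: the code only translates an assessment someone already recorded, and falls back to an error asking the user when there is none.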

two djvu to redo?


I have two recent IAUpload creations whose OCR is off by at least one. And I am having format envy because the fæbot PDFs are larger. At your earliest convenience, could you look after File:The Wireless Operator with the U.S. Coast Guard.djvu and File:Jack Heaton, Wireless Operator (Collins, 1919).djvu?

Also, I am used to you being around more!!--RaboKarbakian (talk) 17:51, 7 June 2024 (UTC)

@RaboKarbakian: Both files regenerated and reuploaded. No need for format envy because the file size reduction comes from excessive downsampling and compression (i.e. it reduces quality).

Yeah, I used to have more free time for wiki stuff, but real life is sorta eating up my every spare second lately. Sorry. Xover (talk) 07:25, 9 June 2024 (UTC)

No sorries! Less than 2 days' turnaround, 3 times the size, and files that tersely and tactically point to the previously filed bug report. Thank you so much! And may real life treat you as well as you treat the wiki.--RaboKarbakian (talk) 10:12, 9 June 2024 (UTC)

OCR again. At your convenience(?): File:Wireless Telegraphy and Telephony (1908, Massey and Underhill).djvu--RaboKarbakian (talk) 20:43, 21 June 2024 (UTC)

Hamlet


If you have the time, there is one page of Notes (p 155) and seven pages of Appendix (pp 177–183) that I cannot Validate, since I am the person who proofread them. If you can validate those eight pages, then all the end matter will be validated. --EncycloPetey (talk) 23:13, 7 June 2024 (UTC)

A beginner's questions about editing advice:


You reverted one of my formatting revisions last night, with the comment, "Whenever there is a newline involved (or even large amounts of text) you should prefer the block variants (/s + /e) of templates, and serif is (almost) never applied manually (dynamic layouts apply that)", which prompted a fair amount of headscratching, mainly about block variants, but also about typefaces, which is probably the easier subject to begin with.

Serif text

I have gone back and removed the serif template from everywhere that I had specified it in the contents of this book, save possibly the title page, since I foresee that this is likely to be an issue if I do not gain a fuller understanding of the subject first. As a preface, I note that 99% of source texts use exclusively Roman type, even though Gothic (sans-serif) has been widely used for over a century (at least, it appears as a secondary typeface in newspaper headlines from the early twentieth century, though usually not in the body of articles). In fact, Roman type is still preferred for nearly all books, magazines, newspapers, and a wide selection of other published items, probably because people prefer its appearance and how readable it is.

Even many online sources prefer Roman type. So it looks odd to render a book written entirely in Roman type—especially when it seems to have been the deliberate and artistic choice of the author and/or publisher, and would be expected in a work due to its age and/or genre—entirely in Gothic type. But the internet being what it is, and Wikisource relying primarily on Gothic type for convenience, nearly everything here does so anyway. Rather than trying to fight the tide, I had thought that perhaps using the serif template for chapter or section headings or poem titles, where the size of the text makes its typeface rather obvious, and accessibility is unlikely to be at issue (these days, even smartphones have a high-enough resolution to where font choice is unlikely to affect readability), would be acceptable, at the cost of nothing more than being perceived as a fussy editor who had not yet come round to the futility of the effort.

On a completely different topic, another editor, whose topic of interest seems to be primarily table of contents formatting, keeps trying to explain to me that things like maximum page width should not be set, because readers will be using different settings, rendering it officious to limit, say, the table of contents to something approximating the width of the widest pages in the rest of the book. I can only assume that the same reasoning lies behind the advice not to use the serif template: that typefaces are determined by the reader's device, and nothing we can do as editors will ordinarily affect what they see. This is a shame, if that is the case, since I doubt more than one in a thousand readers even knows both that this is an option and how to do it; but if so, then perhaps it is entirely futile to specify that any line of text should be in serif.

But then, would the template serve any useful purpose, if it will automatically be disregarded by reader preferences? If it's not normally disregarded, then isn't it fair to use it for large type, or titles where it's rather conspicuously used in the original text? And is there some archived community discussion I have yet to encounter that goes into the specifics in more detail than I have even considered?

Newlines and block variants

I'm familiar with wiki markup and simple template usage, but HTML/CSS coding and similar topics are a stretch for me. Usually I learn formatting through trial and error, and until recently I had little reason to worry about being reverted for any reason other than that another editor didn't like my changes. But "newline" is a term I'm not sure I understand, despite it looking transparent. Do you mean "any time that text is broken into lines", or only when it's typed on separate lines, but not when broken using <br> tags (and, BTW, does it matter whether it's written <br>, <br/> or <br />)?

The text on the page you reverted is using the "fine block" template, which I find puzzling because it's not small text; some of it is small, but the largest part is obviously large text, and the "fine block" template seems to have a standard size that is smaller than the default, though its documentation says that size can be specified using "the appropriate inline template ()", something that is not done on the page in question. So we are back to HTML, or something else like it, which wiki markup and templates have generally spared me having to learn, and I am a bit confused as to why the number of lines involved requires such complex formatting. Obviously I cannot use "(/s +/e)" with any of the font size templates other than "fine block", as they do not seem to have them.

The text on the dedication page gets around the different font sizes in the original, in part by using the "all small caps" template, the advantage of which over simply typing a line of smaller text in all caps eludes me (obviously the purpose in this instance is to vary the text size). But I am trying to figure out the best way to render text that varies in size and weight and alignment from line to line, so that I will not have to worry about messing up page coding, or being reverted due to having done it wrong. And if there is no way to do this using simple markup or templates alone, then is there some kind of tutorial that explains how to do it right? Trial and error are becoming quite laborious, and I am worried that it will all be for nought if someone goes through my attempts to replicate even the approximate appearance of text on pages with a fine-toothed comb. P Aculeius (talk) 13:10, 25 June 2024 (UTC)

@P Aculeius: That's a very well-considered set of questions that deserve a well-considered answer, which I sadly don't have the time for just now. I'll try to get back to it when real life is being less recalcitrant, but for right now it'll have to be the abbreviated version…

In general, typographic detail at the level you are discussing is an artefact of the printing shop, not the author, and our primary goal is to faithfully reproduce the author's intent. There are cases where we know the author had a hand in this kind of decision, but that's very exceptional. In addition, we do not aim for diplomatic transcriptions: approximate representations are sufficient. And, web technology being what it is, it is impossible to achieve anything resembling the pixel-perfect control this level of detail implies, and attempting it generally just makes things worse. "Big" and "small" text perceptually depend on the web browser's font size, screen resolution, and browser viewport (window) size. And, finally, we have a gadget that dynamically applies styles based on various sources (you can set it per-page with {{default layout}}, and each user can override this, turn use of serif fonts on and off, etc.), which does not work if you force serifs with {{serif}}.

All this should be much better documented in our help pages, but sadly we (as a community) are pretty bad at doing so. This stuff is the result of organic trial and error over ~20 years, so there's rarely One Tru Discussion™ that establishes all these various rules. Infuriating and frustrating for the newcomer (trust me, I've been there), but that's where we are.

On block templates… whenever you have non-trivial amounts of text (possibly including other templates and wikimarkup) you should strongly prefer /s+/e templates for the simple reason that they are much, much more robust and less prone to mysterious failures in those cases. They also happen to be a lot more readable and easier to understand than a mess of inline templates, especially when you start adding <br /> into the mix, but those are secondary to the main reason. The short version of why is that the /s+/e variants spit out the necessary start and end HTML tags independently of one another, whereas with the inline variants you get MediaWiki template argument parsing involved (sensitive to various non-obvious special characters, has magical rules for whitespace, etc.).

About <br />: it doesn't really matter whether you use <br>, <br/>, or <br />. Although some people care a lot about it, there's no real technical reason to do so that's worth fighting about. We may at some point use a bot to normalize these to one variant or another, but generally you can use whichever you prefer. But the use cases where using <br /> is appropriate are somewhat rare. A few title pages and colophons are the main uses, but for everything else you should think carefully before resorting to <br />. I know the impulse, but it's triggered by the default skin's use of increased line-height (1.6x iirc), which makes such text blocks seem unbalanced. It's not, really, and the skin changes these values from time to time, so it's not something to sweat overly much. It's much more important (relatively speaking) to have clear and easily understandable code, and <br /> is the antithesis of that. In particular, inside {{ppoem}} <br /> is a bad idea.

Anyways… this was a bit hasty, so my apologies for that, but I hope it helped some all the same. Do feel free to ask if there's anything I can help with (I complain about being busy irl to excuse my tardiness in replying, not to suggest questions are not welcomed). Xover (talk) 14:00, 25 June 2024 (UTC)
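To illustrate the argument-parsing pitfalls mentioned above, here is a deliberately simplified Python model of how MediaWiki splits the arguments of an inline call such as {{c|...}}: a | starts a new argument, an argument containing = becomes a named parameter, and named values are whitespace-trimmed while positional ones are not. Real parsing also protects pipes inside nested templates and links, which this sketch ignores, and `parse_template_call` is a hypothetical helper, not a MediaWiki API. With the /s and /e variants the text is never a template argument, so none of this applies.

```python
def parse_template_call(body: str) -> tuple:
    """Split the inside of {{...}} into (name, positional args, named args).
    Simplified: assumes no nested templates, links or nowiki."""
    name, *raw_args = body.split("|")
    positional, named = [], {}
    for arg in raw_args:
        if "=" in arg:
            key, value = arg.split("=", 1)
            named[key.strip()] = value.strip()  # named values are trimmed
        else:
            positional.append(arg)  # positional values keep their spaces
    return name.strip(), positional, named


# An "=" in the text turns it into a named parameter, so {{c}} gets no
# first positional argument at all.
print(parse_template_call("c|E = mc²"))
# ('c', [], {'E': 'mc²'})

# A stray "|" splits the text; everything after it lands in argument 2.
print(parse_template_call("c|Chapter I | The Beginning"))
# ('c', ['Chapter I ', ' The Beginning'], {})
```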

Thank you, that's actually fairly clear and detailed. Especially now that I've spent some time reworking my markup on the affected book from the dedication page forward up to index page 40, and am (I hope) getting the hang of the fine block template, as well as how to fix text size in ppoem for the couplets that precede some of Adams' own verses. My only bothers there are that it looks like there should be more space between the couplets and the poems they precede, but I could just insert a blank line if need be; and I'm not sure there's an ideal weight for the text, as plain looks a bit weaker than the original and boldface looks too heavy. But those are minor quibbles.

As for the serif text, I note that the example I was pointed to when reading about page layouts actually does specify serif for titles. But it also uses a layout template which, although easy enough to understand, says in its documentation that it's now disfavoured (the template, not serif text), and that CSS styles are now preferred. I'm not sure I'm up to learning CSS now, and I'm wary of employing a layout template that's considered outdated. Subsequent chapters in the same book just use the serif template for the titles. So I think perhaps this is acceptable, even though users can toggle between Roman and Gothic views: it looks as though text contained in the serif template remains in serif even if the body text is toggled to sans-serif. But I can understand why this would not be desirable for body text; and it's easy enough to go back and change the titles at some future time, so for now I'll just leave them sans-serif along with the rest of the text.

I hope that my work on this book is improving and generally acceptable so far. Thank you again for your reply, and I look forward to anything you have to add to it in future. P Aculeius (talk) 20:56, 25 June 2024 (UTC)

Running headers (and footers)


I've only just realized that you're the one who created the index and most, if not all of the pages for In Other Words, and proofread them, and now I'm muddying up your work with fussy edits... your patience must be truly formidable!

Earlier pages all placed the title of the book or poem in the header using the {{rh}} running header template; later ones use the {{c}} center template. I've been replacing the latter with running header, but since there's only one thing to include—and it varies in terms of whether it's used and whether it consists of the book title or current poem—I was already wondering why running header was needed, before I realized that the person who just gave me a better understanding of how to use templates is the one who placed the ones I was replacing.

So is there any reason to use running header in this instance, or would it be better converted back to plain center? And if there's no advantage, then what distinguishes it from the running header template consistently used in the footer, which contains only a centered page number? At least that occurs on every page, though I don't know whether it should make a difference.

Meanwhile, based on our previous discussion, I've been using {{c/s}} and {{c/e}} constantly, in some cases replacing {{c|{{foo|bar}}}} that you had made... I really don't want to make a mess of things. Is it more a judgment call based on the complexity of nested templates than a question of one being superior to the other? Sorry to have so many questions... P Aculeius (talk) 00:47, 27 June 2024 (UTC)

Replacing the Illustrator template


Hello. I have noticed you replaced the deprecated {{Illustrator}} template with the built-in parameter "illustrator", which is great. However, when the parameter was used for a specific subpage only and not for other subpages of the work, such as here, then the parameter "section_illustrator" should usually be used. Do you think it would be possible (and not too difficult) to find all the other cases and change the parameter? -- Jan Kameníček (talk) 15:46, 1 July 2024 (UTC)

@Jan.Kamenicek: I'm working on this, but it overlaps with a couple of other kinda knotty issues so it's taking some time. Xover (talk) 10:11, 12 July 2024 (UTC)

I'll follow up on this at WS:BR#Replace illustrator header parameter with section_illustrator in subpages of works. Xover (talk) 12:05, 21 July 2024 (UTC)

Portal:Federal_Government_of_the_United_States/Tab1


https://en.wikisource.org/w/index.php?title=Portal:Federal_Government_of_the_United_States/Tab1&action=edit&lintid=3236265

This was showing up as having fostered content, but in looking at it, the only thing I can think of is the <onlyinclude></onlyinclude>, which should be ignored by the linter?

I think this is a false positive detection, along with the <section></section> tag issue on many of the other pages detected. ShakespeareFan00 (talk) 07:08, 5 July 2024 (UTC)

It's probably just a false positive, yes. That whole portal should also be redesigned (it's copied over from enWP and doesn't work for enWS) without these pseudo-tabs, so I'm disinclined to spend much time on fixing what's there now. Xover (talk) 10:25, 12 July 2024 (UTC)

Template:Information


I thought this had a box border around it? (It does on Commons.) I'd been doing a lot of edits in preparation for "night mode" and wanted a second opinion as to whether those edits had caused the border to vanish. (I did check the underlying styles and did not find anything that could obviously have caused the border to vanish.) ShakespeareFan00 (talk) 17:33, 6 July 2024 (UTC)

I just synced the template with Commons, so hopefully this problem should be obviated. —CalendulaAsteraceae (talk • contribs) 15:20, 7 July 2024 (UTC)

Thanks. I have it on my todo list to reimplement {{book}} and {{information}} from scratch because the Commons versions of these are a pain in the neck and use lots of Commons-specific stuff. Xover (talk) 10:18, 12 July 2024 (UTC)

PPoem


Page:A Jewish Interpretation of the Book of Genesis (Morgenstern, 1919, jewishinterpreta00morg).pdf/338

Is this a bug, or am I asking the >>> feature to do too much? ShakespeareFan00 (talk) 19:16, 11 July 2024 (UTC)

@ShakespeareFan00: You're asking too much of default {{ppoem}}. >>> is intended mainly for line numbers and such, so whenever you have longer text there will no longer be room within the right margin and things start looking wonky. You can make it work for these cases, but it'll require fiddling with margins etc. in the per-work CSS. I've never been sufficiently hard up to have to do that (it's fiddly and a pain), so I don't have any existing examples to hand. Xover (talk) 10:23, 12 July 2024 (UTC)

Do we need a quotation version of ppoem? ShakespeareFan00 (talk) 18:31, 12 July 2024 (UTC)

Problems with automatic header


While we're on the subject, I may as well inform you of an actual bug in the automatic header (that I didn't bother with because I wasn't going to use the automatic header anymore, and don't have access to fix it myself anyway).

If the Index page has multiple illustrators (like Index:Lange - The Blue Fairy Book.djvu), the automatic header will display this as follows: illustrated by [[Author:H. J. Ford and G. P. Jacomb Hood|H. J. Ford and G. P. Jacomb Hood]]

If you're fixing bugs in the automatic header, this might be one to look into. —Beleg Tâl (talk) 15:46, 19 July 2024 (UTC)

Heh, yeah, that's the one I'm currently looking into, which is why I might as well try to fix other stuff while I'm at it. :) Xover (talk) 15:56, 19 July 2024 (UTC)

I'm having another issue, not sure whether it's a bug or PEBCAK. I'm trying to add a contributor field to Poetical works of Mathilde Blind/Preface for Arthur Symons, but it's not displaying in the header. Any ideas?

PS. I appreciate all your work on this :) —Beleg Tâl (talk) 15:22, 23 July 2024 (UTC)

If I understand it correctly, the <pages> tag can only pass its parameters to the ProofreadPage header template in order to override a field provided by the ProofreadPage Index template, and can't add additional parameters. So I'm thinking that we'd need to create a hidden field in the Index page for "contributor" (and any other such fields) that is always empty, which can then be overridden. Would that work? —Beleg Tâl (talk) 15:37, 23 July 2024 (UTC)

@Beleg Tâl: That is the way it has to be done, yes. I'm hoping it may be possible to get something more flexible long term, but for now that's the approach. Xover (talk) 12:14, 24 July 2024 (UTC)
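A toy Python model of the behaviour described in this thread and of the agreed workaround: parameters given on <pages> can only replace fields the Index page already defines, so an always-empty hidden field is what lets "contributor" through. The field names follow the example above, but `build_header` is hypothetical and is not ProofreadPage's actual code.

```python
def build_header(index_fields: dict, pages_params: dict) -> dict:
    """Simplified model: <pages> parameters can only replace fields the
    Index page already defines; anything else is silently dropped."""
    return {field: pages_params.get(field, value) for field, value in index_fields.items()}


params = {"contributor": "Arthur Symons"}

index_without = {"title": "Poetical works of Mathilde Blind", "author": "Mathilde Blind"}
index_with = dict(index_without, contributor="")  # hidden, always-empty field

print(build_header(index_without, params))              # no "contributor" key
print(build_header(index_with, params)["contributor"])  # Arthur Symons
```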

poem tag question


On Wikisource:Scriptorium/Help I have an open topic about the poem tag formatting carrying over into footnotes included inside the tag. I've had no responses yet. --EncycloPetey (talk) 22:49, 12 August 2024 (UTC)

Archive of Files missing machine-readable data?


Both categories have been cleaned up, and there are now only a few files that go in and out, so I thought maybe we should remove the {{DNAU}} and let the bots archive. Fine with that? — Alien333 (what I did & why I did it wrong) 18:10, 17 August 2024 (UTC)

@Alien333: Indeed. Thanks for the reminder. Xover (talk) 17:24, 18 August 2024 (UTC)

Daily News


Not really your fault, since Daily News is such a generic title, but you’ve got the wrong Daily News. Our Daily News was created for Daily News/1940/12/24/Cheated Death In Air Battles, Dies In Crash, which is for the New York Daily News, while G.K. Chesterton contributed to the Daily News of London (see The Daily News (UK) on Wikipedia). I’ll try to get scans of the relevant articles, but I can’t promise anything insofar as British newspapers (and library holdings) are concerned. TE(æ)A,ea. (talk) 17:54, 24 August 2024 (UTC)

@TE(æ)A,ea.: Thanks. I'm not sure I can absolve myself of sloppiness here, because I really should have caught that. I've disambiguated the two and updated links etc. Interestingly, almost all incoming links were intended for the London paper, so the New York title was somewhat of a squatter. Xover (talk) 18:56, 24 August 2024 (UTC)

IRC #wikisource


Is there still any discussion over there? The few times I poked my nose in, there wasn't any. — Alien333 (what I did & why I did it wrong) 14:18, 28 August 2024 (UTC)

@Alien333: Very rarely. But most IRC channels are pretty low-volume these days, so I don't know that #wikisource is any worse. Xover (talk) 08:07, 29 August 2024 (UTC)

Page:Hans Andersen's Fairy Tales (1888).djvu/472 and {{img float}}


I saw it mentioned on your page, but I'm pretty sure that there's no need for any further technical work, as {{overfloat image}} fits perfectly (see that page). — Alien333 (what I did & why I did it wrong) 12:16, 3 September 2024 (UTC)

@Alien333: The issue isn't with Andersen /472, it's with {{img float}}. Feel free to remove the hidden comment as it was just a reminder / todo for myself about the issue. Xover (talk) 12:25, 3 September 2024 (UTC)

Orlando Furioso v4


Could you please generate a DjVu file from File:Orlando Furioso (Rose) v4 1825.pdf? Seven of the eight volumes were available at IA, and have been uploaded to commons:Category:Orlando Furioso (Rose), but volume 4 does not exist there for some reason. TE(æ)A,ea. (talk • contribs) was kind enough to acquire and provide a PDF, but I would prefer a DjVu, so that the whole series is in the same format (and because of the numerous technical issues we're having with PDFs). The DjVu should be named File:Orlando Furioso (Rose) v4 1825.djvu to match the naming pattern for the rest of the series. --EncycloPetey (talk) 20:12, 3 September 2024 (UTC)

@EncycloPetey: File:Orlando Furioso (Rose) v4 1825.djvu. IA has a scan and HathiTrust has several, it's just that UCal seems to be missing vol. 4 from their physical collection so it's missing in that scan series. I grabbed one of the Harvard copies and uploaded that since it seemed to be decent quality and then I wouldn't have to deal with Google's terrible PDFs. Xover (talk) 15:40, 4 September 2024 (UTC)

I found my IA copies using a search, which did not turn up a copy of volume 4. And if you look at the pattern in the local IDs, you can infer what would be correct for volume 4 in the set I found, but it's a scan of an entirely different book. I am aware of the copies at Hathi, and I asked TE(æ)A,ea. if one could be provided, but there were complications, I gather from subsequent conversation.

Well, thanks, and I'll take a look today to see whether this copy is a complete scan or not. I have come across copies that were missing portions of the original. --EncycloPetey (talk) 16:43, 4 September 2024 (UTC)

There is no text layer. Could you please generate a text layer for the file? I am also getting zero file size errors, which I have never had before with DjVu files. --EncycloPetey (talk) 16:52, 4 September 2024 (UTC) This problem is sorted. --EncycloPetey (talk) 16:59, 4 September 2024 (UTC)

The text layer exists, but is garbled because it was generated by Google. Where there is text, whole lines can be placed at the bottom of the page instead of in their proper sequence, if they are not missing from the page altogether. I have found pages with randomized punctuation. I may be able to use the OCR tool, since this is a regular and very structured text with a relatively clean scan, but I foresee a higher error rate on this volume, and we have had recent days where the OCR tool failed or was unpredictable. --EncycloPetey (talk) 17:08, 4 September 2024 (UTC)

@EncycloPetey: The text layer in the DjVu was generated by my tools (tesseract is the OCR engine), not by Google. Spot checking pages in Index:Orlando Furioso (Rose) v4 1825.djvu I see no significant problems with the text layer. On what pages are you seeing problems? Xover (talk) 17:42, 4 September 2024 (UTC)

I did not keep track of which pages. I checked several dozen to be sure the scan had likely included all the relevant pages without duplicates, and noted bizarre issues like the ones I describe. But looking for a few examples now: scan page 130 has randomized start-of-line punctuation; 190 has text that does not appear on the page; 200 is one of the pages where the text was out of sequence. --EncycloPetey (talk) 17:50, 4 September 2024 (UTC)

/130 is just Tesseract being really bad at quotation marks. That's a general problem with no fix. /190 is Tesseract being over-eager and detecting the text on the opposite side of the sheet. It'll mostly just happen on empty pages (because it doesn't have real text to correct against), so it's usually not a big problem. The misplaced text on /200, though, is a weird bug. Tesseract detects the relevant line correctly, and with the correct coordinates (if you load it in DjView and turn on hidden text you'll see the text positioned exactly over the letters in the scan), but the line is stored out of order in the OCR output (Tesseract outputs an HTML-like structured format where each detected word is tagged with its coordinates on the page; normally each line appears in the output in the order it is on the page, but here that line comes at the end of the output, and hence also at the end of the plain text shown in the text box). I'm guessing this is because it is getting confused by the first-line indentation and thinking the page is a two-column layout. I'll try to see if there are any settings I can tweak or something, but I'm not hopeful and it probably won't happen soon in any case. IOW, unless the problems with this are more severe than currently apparent, this is as good as it's going to get for now. Xover (talk) 18:46, 4 September 2024 (UTC)
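To illustrate why a line can be positioned correctly yet still land at the end of the text: a minimal sketch, assuming Tesseract's hOCR output (e.g. from `tesseract page.png out hocr`) marks each line with class `ocr_line` and a `title` containing `bbox x0 y0 x1 y1`, and each word with class `ocrx_word`. The helper names `hocr_lines` and `reading_order` are hypothetical and are not part of Tesseract or of the tools used to build this DjVu; this is a post-processing illustration, not a fix inside Tesseract.

```python
import re
import xml.etree.ElementTree as ET

# Matches the "bbox x0 y0 x1 y1" part of an hOCR title attribute.
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


def has_class(el, name):
    """True if the element's class attribute contains the given class name."""
    return name in (el.get("class") or "").split()


def hocr_lines(hocr):
    """Return (bbox, text) for every ocr_line, in the order stored in the file."""
    root = ET.fromstring(hocr)
    lines = []
    for el in root.iter():
        if not has_class(el, "ocr_line"):
            continue
        m = BBOX_RE.search(el.get("title") or "")
        bbox = tuple(int(v) for v in m.groups()) if m else (0, 0, 0, 0)
        # Join the text of each word element belonging to this line.
        words = ["".join(w.itertext()).strip() for w in el.iter() if has_class(w, "ocrx_word")]
        lines.append((bbox, " ".join(w for w in words if w)))
    return lines


def reading_order(lines):
    """Sort lines top-to-bottom (y0), then left-to-right (x0), using their bboxes."""
    return sorted(lines, key=lambda line: (line[0][1], line[0][0]))
```

Sorting purely by the top edge of each line is only reasonable for single-column pages like this volume; on a genuine two-column page it would interleave the columns, which is exactly what Tesseract's layout analysis is trying to avoid when it (here, wrongly) decides a page has columns.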

Vector 2022


Hey, I'd like to revive the topic of making Vector 2022 the default here. Before I start a discussion in Scriptorium though, I wanted to check in with you. Do you see any issues that need to be addressed (fixed, explained, or otherwise) either before or reasonably shortly after deployment? Maybe we could do some things before involving more people, esp. the less technical editors. Thanks! SGrabarczuk (WMF) (talk) 19:20, 4 September 2024 (UTC)

@SGrabarczuk (WMF): This is just a quick braindump before morning coffee. Once the caffeine kicks in I may regret everything and take it all back. Or something like that… 😎

I think the biggest issue is going to be general pushback from the community of the kind enWP so emphatically provided, even if somewhat more muted and for different reasons. Partly that's going to be motivated by resistance to change (we have contributors still using Monobook for no articulable reason), but partly also because Vector 2022 reflects different priorities than their own. Its major focuses are things that make sense on a Wikipedia, but not so much on Wikisource; it moves around UI elements that are now harder to find and get at than before; and it reflects WMF priorities over community priorities (e.g. the language selector vs. the mw-indicators positioning). I'm afraid the community here will see little benefit in the changes Vector 2022 makes, and things like having to go to a submenu to find the link to your own user talk page will be viewed as significant drawbacks. I could be wrong, but that's my concern.

In more concrete and technical terms I'm not aware of any major things of the "breaks core workflow" variety. The new menus are breaking some Gadgets that modify them (bigChunkedUpload is the latest I've noticed). The interlanguage links we manually add to Special:RecentChanges by way of MediaWiki:Recentchangestext and {{Interwiki Wikisource}} no longer work in Vector 2022 (they work in all other skins). Vector 2022 also overlaps our Dynamic Layouts (essentially MediaWiki:Gadget-PageNumbers.js) while providing no community control, not integrating with our layout system, and not providing any facilities that make our implementation easier (the Gadget is somewhat fragile and prone to FOUC-type problems). Also, because Wikisource is so poorly supported by the WMF (so far as I can tell not a single developer has ever been allocated to Wikisource; we depend entirely on the good will of individual developers and teams with other responsibilities for everything we need) we are dependent on a large number of gadgets and user scripts that make repetitive editing tasks more efficient. Vector 2022 is designed with hiding these away as an apparent goal (stuff added to #p-toolbox is now hidden in the looong and cluttered Tools dropdown menu), leaving us with no clear way to surface editing helpers that need to be Fitts's law-compliant. The 2017 editing toolbar and Visual Editor don't support Wikisource (at all), and the 2010 editor is really primitive in terms of extension and integration points (see e.g. T370353 for the completely basic s...tuff that's not there). The paragraph spacing is still broken (compare the text inside the box on this page in Vector 2010 and Vector 2022), and this affects a lot of pages on enWS.

I haven't done a systematic assessment of Vector 2022 here (partly because it's a moving target, partly because I haven't had time, partly because y'all have been focussed on the Wikipedias), but I have had it set as default since the last time the issue was brought up back in March. My assessment is that the main issue with making Vector 2022 default is that the value proposition for English Wikisource—the "What's in it for me?"—is too poor when held up against both the concrete drawbacks and the need for change in general (all change has a cost; resistance to change is not inherently irrational). If the development of the skin had been more able to identify and incorporate this specific community's needs and priorities in its scope early on I think that calculus could have easily changed. But as it stands the value proposition is going to be perceived as marginal, at best, and the Wikisourcen in general have way too few technical contributors able to follow up with the Web Team to get issues fixed as they crop up (by the time the community has got its act together the team will be onto other tasks and greener pastures). Xover (talk) 06:14, 5 September 2024 (UTC)

Wow, thanks for the detailed and long response, I really appreciate it! If you'd like to add something or take something back, you can also reach out to me on Discord, Telegram, lots of places - there are very few people named Szymon Grabarczuk, I'm easy to find across platforms :D SGrabarczuk (WMF) (talk) 08:43, 5 September 2024 (UTC)
