Wikipedia:Bots/Requests for approval/DeadLinkBOT
<noinclude>[[Category:Approved Wikipedia bot requests for approval|DeadLinkBOT]]</noinclude>
<div class="boilerplate metadata" style="background-color: #A0FFA0; margin: 2em 0 0 0; padding: 0 10px 0 10px; border: 1px solid #AAAAAA;">
:''The following discussion is an archived debate. <span style="color:red">'''Please do not modify it.'''</span> Subsequent comments should be made in a new section.'' The result of the discussion was [[File:Symbol keep vote.svg|20px]] '''Approved'''.<!-- from Template:Bot Top--><noinclude>
 
== [[User:DeadLinkBOT|DeadLinkBOT]] ==
{{Newbot|DeadLinkBOT}}
 
 
 
=== Discussion ===
<!-- This is not a vote. It is a discussion -->
:What about websites that go through regular downtime? If the bot reads them as dead while they are temporarily down, it will remove a good link. [[User:X!|<span style="font-family:Verdana,Arial,Helvetica;color:steelblue;">'''X'''</span>]][[User talk:X!|<span style="color:steelblue;"><small>clamation point</small></span>]] 05:01, 2 December 2008 (UTC)
:::::Yes, I will add that feature. --[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 22:00, 7 December 2008 (UTC)
 
*Is the source code to your bot available? &mdash;&nbsp;Carl <small>([[User:CBM|CBM]]&nbsp;·&nbsp;[[User talk:CBM|talk]])</small> 14:15, 6 December 2008 (UTC)
::I wasn't planning on releasing it for public consumption. --[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 22:00, 7 December 2008 (UTC)
:::Why not? If it's actually going to be ''changing links'' in articles I'd really like to know that the code is sound. <span style="font-family:Broadway;">[[User:Mr.Z-man|Mr.]][[User talk:Mr.Z-man|'''''Z-'''man'']]</span> 21:57, 12 December 2008 (UTC)
::::Well, considering it's explicitly not required, I shouldn't have to justify my decision. But since you asked, my code is undocumented and "ugly" - it is not intended to be read by anyone but me. I really don't see what the issue is - all the program does as far as Wikipedia goes is substitute a pre-screened dead URL for a pre-screened good one, possibly applying pre-screened regexes to pick between two or more different options. Nonetheless, I put the code up anyway: [[User:DeadLinkBOT/source]] --[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 23:45, 12 December 2008 (UTC)
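For illustration, a minimal Perl sketch of this kind of pre-screened substitution (the rule table and URLs below are made-up placeholders; the released source at [[User:DeadLinkBOT/source]] is the authoritative version):

<syntaxhighlight lang="perl">
#!/usr/bin/perl
# Sketch only: apply pre-screened dead-URL -> live-URL rules to wikitext.
use strict;
use warnings;

# One made-up rule: dead URL => reviewed replacement.
my %rules = (
    'http://www.angeltowns.com/town/example.htm'
        => 'http://www.example.org/example.htm',
);

sub apply_rules {
    my ($wikitext) = @_;
    while ( my ($dead, $live) = each %rules ) {
        $wikitext =~ s{\Q$dead\E}{$live}g;    # literal, exact-URL match
    }
    return $wikitext;
}

print apply_rules('Source: [http://www.angeltowns.com/town/example.htm Peerage page]'), "\n";
</syntaxhighlight>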
 
::I also stated the list of links I'd be starting with above. This is from a specific request from [[Wikipedia:AutoWikiBrowser/Tasks#LeighRayment.com_.28continued.29]]. There are 2500+ of them. I have tested the first batch of them with local writes, but it's against WP policy to have a bot edit WP without test approval, so of course I haven't actually written them to WP. Isn't that the whole point of having a test period?
::Since it's correcting DEAD links, I don't see any reason to limit its scope (although it does avoid editing archives), but I could easily change that if desired. --[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 05:13, 15 December 2008 (UTC)
:::So will this bot only be working on angeltowns.com links or is this request for broader approval? If it's for the former, this can probably be speedily approved. For the latter, it's going to require more time / testing / whatever. As to why no one mentioned that [[Wikipedia:AWB|AWB]]'s general fixes are problematic, well, probably because most of [[Wikipedia:BAG|BAG]] is either inactive or incompetent. /me shrugs. Though I do think AWB's documentation is pretty explicit about the 'danger' of general fixes. --[[User:MZMcBride|MZMcBride]] ([[User talk:MZMcBride|talk]]) 06:28, 15 December 2008 (UTC)
::::I wrote the bot in order to correct the angeltowns links, but I don't see any reason to limit its scope. I have written dozens of text-parsing scripts in the past and am well aware of the potential issues involved with unexpected input and such. I do realize AWB-style general fixes are difficult to implement correctly, but I feel I am up to the challenge. Nonetheless, I will drop that part of the request at this time. (I have seen bots approved for gen fixes in the past and didn't think it would be an issue or I would never have added that part.) HTML links, however, do not present such issues. There just is not any realistic chance of a search for "http://www.somesite.com/directory/somedeadURL.htm" generating false positives outside of a few specific pages such as WP's list of dead links. If my bot works correctly on somewebsite.com, it will work on someotherwebsite.com assuming the input (supplied by me) is valid.
::::I am well aware that ultimately I am responsible for every edit the bot makes, and will utilize the utmost care in what I tell it to fix. If I maliciously told it to change every www.microsoft.com to www.myspamsite.com then obviously I'd be in trouble. But if that was my intention why would I even bother trying to get approval? --[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 12:37, 15 December 2008 (UTC)
:::I think this is wonderful. If it fixes angeltowns.com/town that is a great test in itself. Let's go guys. [[User:Kittybrewster|Kittybrewster ]] [[User talk:Kittybrewster|<span style="color:#0000FF;">&#9742;</span>]] 09:56, 15 December 2008 (UTC)
Can a member of BAG please explain exactly what they want me to do to prove this bot works correctly? I've tested it locally, answered every question here, released the source, and tried to be patient but no one seems to be willing to act. What do I need to do to get the ball rolling? --[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 02:34, 18 December 2008 (UTC)
 
*It can be used for "identification" situations to quickly identify all occurrences of a particular URL or URL fragment for other uses, such as looking for patterns of spamming, etc. I'm not familiar with semi-automated editing tools, but in principle the generated list can be used as input to a semi-automated tool, leaving it to a human being to confirm or cancel each edit. This would be practical on only relatively short lists, maybe a few hundred or so.
*The read-only portion by definition does not need approval of the BAG; it can run as soon as it's written.
Once the two bots are working nicely separately, they can be interleaved, so as an item is added to the list, it is immediately processed and the edit is made. [[User:davidwr|davidwr]]/<small><small>([[User talk:davidwr|talk]])/([[Special:Contributions/Davidwr|contribs]])/([[Special:Emailuser/davidwr|e-mail]])</small></small> 19:50, 18 December 2008 (UTC)
 
 
There is already a bot request for which this bot would be useful: [[Wikipedia:Bot_requests#Bulk-replace_URL_for_Handbook_of_Texas_Online]] [[User:davidwr|davidwr]]/<small><small>([[User talk:davidwr|talk]])/([[Special:Contributions/Davidwr|contribs]])/([[Special:Emailuser/davidwr|e-mail]])</small></small> 19:50, 18 December 2008 (UTC)
:Hello, the way the bot is currently structured is as follows:
<blockquote>
# Write recommended changes to file for review
'''Wait a week''' to ensure the URL is indeed dead
<br />'''Alternative'''ly, if a user (such as yourself) supplies a URL that needs to be changed, the URL can go directly into the "for review" stack
<br />'''URL reviewed by me''' to make sure the recommended change is accurate; then it's moved into a machine-readable file to be processed
<br />'''Processing'''
# Get URL + change(s) from file; a change can require a simple test, such as making sure "text" is in the page to be changed, and make decisions based on those tests; the new text can be anything - presumably a URL or a template.
# Use [[Special:LinkSearch]] to find all instances of the URL on Wikipedia - this list could be output to a file if you want
::*After a suitable number of changes are in the holding file, manually review each change and mark it approved. A "suitable number" could be 1 change or an entire batch. Doing it 1 change at a time simulates assisted-editing tools like AutoWikiBrowser.
::*For each approved change, verify there have been no intermediate edits and make the edit then move on to the next approved change. If possible, don't count intermediate edits that only affected other sections, i.e. make the change if at all possible, but abandon any change that looks like an edit-conflict and log it as a failure so it can be done over again.
::[[User:davidwr|davidwr]]/<small><small>([[User talk:davidwr|talk]])/([[Special:Contributions/Davidwr|contribs]])/([[Special:Emailuser/davidwr|e-mail]])</small></small> 21:04, 18 December 2008 (UTC)
:::Thanks for the comments.
:::*I can certainly review the first X changes manually before uploading. However, the bot is capable of doing, for example, 1000 edits in under 3 hours (with standard rate limits applied); I certainly don't want to review 1000+ edits, let alone an entire month's worth. (I have already written and reviewed a number locally, but can also review some more along the way.)
:::*I think you were actually talking about only having the edit conflict feature for the manual approval tests, which is definitely wise. However, once it goes live, it would be pointless. I could pull the history and check for intermediate updates, but this would most likely take longer than the text parsing (which happens in a tiny fraction of a second.) I could, however, pull the history after an edit to just make sure there was no intermediate edit and auto-revert if there was one. LMK what you think. --[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 21:26, 18 December 2008 (UTC)
 
==== Trial ====
{{BotTrial|days=8|edits=100}} <small>—<span style="font-family:Trebuchet MS;">'''[[User:Reedy|<span style="color:darkred;">Ree</span>]][[User talk:Reedy|<span style="color:darkred;">dy</span>]]'''</span></small> 22:10, 18 December 2008 (UTC)
:I will begin trial edits after I add the logging features requested below. --[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 22:22, 18 December 2008 (UTC)
 
::::*On the auto-revert, that's a great idea, but be sure to log if the auto-revert failed for any reason so you could do manual cleanup. As for pointless, you may not be willing to review 1000 changes, but the person requesting the tool might want to review the changes before they were committed or possibly shortly after. Given restrictions on multi-user bots, a report listing both a "click here for diffs" plus the actual diffs in-line would be a handy thing to give to the requester: It's a lot easier to wade through a several-hundred-KB text file with page after page of diffs than it is to click on a few hundred links. Such a report would of course have a "click here for diff" link for each change, so the requester would have easy-access to do a manual diff and if necessary, manual undo or cleanup. [[User:davidwr|davidwr]]/<small><small>([[User talk:davidwr|talk]])/([[Special:Contributions/Davidwr|contribs]])/([[Special:Emailuser/davidwr|e-mail]])</small></small> 22:12, 18 December 2008 (UTC)
:::::*Sure, I will add a feature to log all changes to a file and upload them to the DeadLinkBOT user space after every 50 or so edits. I'll send you a link for your project's logs after I add the feature. (I'll put 50 of my trial edits into your project and 50 into the angeltowns.com request I initially wrote this for.) --[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 22:22, 18 December 2008 (UTC)
::::::*Just a note: I'm going to play with AutoWikiBrowser to see if it's suitable for my project. I hear those tools can do 2000 or so edits an hour at full speed, which means it will take me less than 2 hours to go through the list. I'll leave 50 for you. Of course, I'll be slower than that until I become familiar with AWB, and I'll be rate-limited by the wiki software. It will be interesting to see which is faster per 50 edits: AWB or reviewing the edits after the fact. [[User:davidwr|davidwr]]/<small><small>([[User talk:davidwr|talk]])/([[Special:Contributions/Davidwr|contribs]])/([[Special:Emailuser/davidwr|e-mail]])</small></small> 22:26, 18 December 2008 (UTC)
 
Why not just use the appropriate options (<code>basetimestamp</code> and <code>starttimestamp</code>) to the API edit command to detect edit conflicts the normal way, instead of trying to do some odd "possibly overwrite others edits, and then try to self-revert" scheme? [[User:Anomie|Anomie]][[User talk:Anomie|⚔]] 23:59, 18 December 2008 (UTC)
:Whoops! I've been using perlwikipedia 1.0 since it is the "featured download" on the Google Code site linked to from here. It didn't support edit conflict detection (nor linksearch, which I wrote code for myself). Your comment didn't make much sense to me, so I went and looked, and it's actually on version 1.5 now! I'm assuming this new version detects edit conflicts... I guess I better install that and read up on it instead of making some silly workaround. :) --[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 04:54, 19 December 2008 (UTC)
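For reference, a minimal sketch of conflict-safe saving via the MediaWiki API as Anomie describes, written against today's API (token handling has changed since 2008); the target page, user agent, and replacement text are illustrative, and login and error handling are elided:

<syntaxhighlight lang="perl">
#!/usr/bin/perl
# Sketch only: detect edit conflicts with basetimestamp/starttimestamp.
use strict;
use warnings;
use LWP::UserAgent;
use JSON qw(decode_json);

my $api = 'https://en.wikipedia.org/w/api.php';
my $ua  = LWP::UserAgent->new(agent => 'DeadLinkBOT-sketch/0.1');
$ua->cookie_jar({});    # tokens are tied to the session

# 1. Read the page, recording the two timestamps the edit call needs.
my $q = decode_json($ua->post($api, {
    action => 'query', format => 'json', curtimestamp => 1,
    prop   => 'revisions', rvprop => 'timestamp',
    titles => 'Wikipedia:Sandbox',
})->decoded_content);
my ($page) = values %{ $q->{query}{pages} };
my $basetimestamp  = $page->{revisions}[0]{timestamp};  # revision we read
my $starttimestamp = $q->{curtimestamp};                # when we began

# 2. Fetch a CSRF token (modern API; 2008-era code used intoken=edit).
my $t = decode_json($ua->post($api, {
    action => 'query', format => 'json', meta => 'tokens',
})->decoded_content);

# 3. Save. MediaWiki answers with error code "editconflict" if anyone
#    saved an intervening revision after basetimestamp.
my $e = decode_json($ua->post($api, {
    action => 'edit', format => 'json', title => 'Wikipedia:Sandbox',
    text   => 'replacement wikitext goes here',
    basetimestamp  => $basetimestamp,
    starttimestamp => $starttimestamp,
    token  => $t->{query}{tokens}{csrftoken},
})->decoded_content);

warn "edit conflict; logging page for a retry\n"
    if $e->{error} && $e->{error}{code} eq 'editconflict';
</syntaxhighlight>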
 
==== Trial complete ====
{{BotTrialComplete}}
I rewrote the program to query the API directly (rather than using perlwikipedia.pm or an equivalent). This enabled more efficient resource usage and the ability to correctly detect edit conflicts. However, it did lead to some temporary bugs. Most embarrassingly, the bot's first 5 edits blanked pages due to a variable being mistyped. (Doh!) Of course, I promptly fixed any errors the bot made and corrected the code to avoid repeating them. :)
 
The bot can now detect edit conflicts and false positives (e.g. on talk pages), although neither arose in the trial period. It ignores Wikipedia: space articles (excluding WikiProject pages), archives, sandboxes, and pages in its own userspace.
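A hypothetical sketch of those skip rules as a title filter (the exact patterns DeadLinkBOT uses may differ):

<syntaxhighlight lang="perl">
#!/usr/bin/perl
# Sketch only: title filter mirroring the skip rules described above.
use strict;
use warnings;

sub should_skip {
    my ($title) = @_;
    return 1 if $title =~ /^Wikipedia:/
             && $title !~ /^Wikipedia:WikiProject/;       # keep WikiProject pages
    return 1 if $title =~ /archive/i;                     # archives
    return 1 if $title =~ /sandbox/i;                     # sandboxes
    return 1 if $title =~ /^User(?: talk)?:DeadLinkBOT/;  # own userspace
    return 0;
}

print should_skip('Wikipedia:Sandbox') ? "skip\n" : "edit\n";  # skip
print should_skip('Lady Bird Johnson') ? "skip\n" : "edit\n";  # edit
</syntaxhighlight>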
 
After everything was working, DeadLinkBOT made just under 50 edits correcting angeltowns.com links. A log of these edits can be found at [[User:DeadLinkBOT/Logs/AngelTowns.log]]. I have manually reviewed them and have also invited Kittybrewster to review and comment here.
 
Here is a representative sample of the kinds of corrections it can routinely make:
* Boring 1:1 URL replacement [http://en.wikipedia.org/w/index.php?title=Wilfred+Stamp%2C+2nd+Baron+Stamp&diff=259469518&oldid=239426055]
* URL replacement based on regex [http://en.wikipedia.org/w/index.php?title=Philip+Anstruther+(died+1760)&diff=259302881&oldid=239431957]
* multiple related URLs corrected on same page [http://en.wikipedia.org/w/index.php?title=Wikipedia%3AWikiProject+Baronetcies%2FRayment&diff=259302801&oldid=79466679]
* URL -> simple "permanent link" template (template chosen from small list based on article's title & contents) [http://en.wikipedia.org/w/index.php?title=1st+Troop+of+Horse+Guards&diff=259050046&oldid=217549675]
* URL -> two simple templates based on the subject being both a baron and a member of parliament [http://en.wikipedia.org/w/index.php?title=Baron+Leith+of+Fyvie&diff=259067083&oldid=101400238]
 
Collectively, these edits represent both the typical workload of the bot (straight URL replacement) and the most complicated case that will regularly arise (transition to simple template). I am confident that the bot will be 99%+ accurate with these edits.
 
During the trial, I used the other 50 approved edits to parse a much more difficult situation than the bot would typically face - transitioning a dead URL to a complicated template (Handbook of Texas). This change uses a custom function that no other changes will use, so its accuracy is independent of the normal functional accuracy. Since the parsing is fairly complex, I have had to make several changes to it, so the edit history ([[User:DeadLinkBOT/Logs/HandbookOfTexas.log]]) is not completely representative of the current functionality. In particular, the bot made several errors that it would no longer make. (All changes were manually verified and corrected when needed.) The bot should be much closer to fully accurate now, but all changes will be manually verified for the foreseeable future.
 
Here is a representative sample of the kinds of corrections it can make:
* &lt;ref&gt; transitioned to {{tl|Handbook of Texas}} template with missing information filled in by retrieving the handbook page [http://en.wikipedia.org/w/index.php?title=Lady+Bird+Johnson&diff=259866264&oldid=259595583]
* named reference transitioned with name left intact [http://en.wikipedia.org/w/index.php?title=The+Texas+Observer&diff=259866140&oldid=201592887]
* external link transitioned to simpler version of HT template (no author/dates) [http://en.wikipedia.org/w/index.php?title=Tandy+Corporation&diff=259866054&oldid=258400093]
* bare link transitioned to template with &lt;ref&gt; tags added [http://en.wikipedia.org/w/index.php?title=Buffalo+Hump&diff=259855524&oldid=259673671]
* bare link with title updated (because it's not a reference, but rather part of the text); also malformed template corrected [http://en.wikipedia.org/w/index.php?title=Dallas+Herald&diff=259669389&oldid=231681372]
* bare link on talk page simply updated (not appropriate to transition to template) [http://en.wikipedia.org/w/index.php?title=Talk%3ATexas+Transportation+Company&diff=259866175&oldid=196776238]
 
Again, these edits are not typical but rather representative of the most complicated edits the bot would ever do. If the need for this sort of change ever arises again, I would of course be manually verifying everything again. I have explained my methodology to davidwr and invited him to comment here.
--[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 07:14, 24 December 2008 (UTC)
:It works incredibly well. Congratulations and thank you very much. [[User:Kittybrewster|Kittybrewster ]] [[User talk:Kittybrewster|<span style="color:#0000FF;">☎</span>]] 08:52, 24 December 2008 (UTC)
: Something to consider for the future: add a "test" switch to the bot where it will save the proposed edits to its local hard drive instead of actually editing Wikipedia. You could then use diff, wdiff, and the like to make sure the edit is correct before running the bot for real. [[User:Anomie|Anomie]][[User talk:Anomie|⚔]] 14:25, 24 December 2008 (UTC)
:: I did write some edits to file, but I didn't think to use diff. I manually compared the before and after, which made it easy for me to miss errors. I certainly will use diff in the future - that will help a lot. --[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 15:57, 24 December 2008 (UTC)
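A sketch of what such a test switch might look like (hypothetical names; the real bot may structure this differently):

<syntaxhighlight lang="perl">
#!/usr/bin/perl
# Sketch only: a --test switch that writes before/after wikitext to disk
# and shells out to diff instead of saving the edit.
use strict;
use warnings;

my $TEST = grep { $_ eq '--test' } @ARGV;

sub save_or_preview {
    my ($title, $old, $new) = @_;
    if ($TEST) {
        (my $slug = $title) =~ s/\W+/_/g;   # crude filename from the title
        open my $a, '>', "$slug.before" or die "$slug.before: $!";
        print {$a} $old;
        close $a;
        open my $b, '>', "$slug.after" or die "$slug.after: $!";
        print {$b} $new;
        close $b;
        system 'diff', '-u', "$slug.before", "$slug.after";
        return;
    }
    # ...otherwise perform the real API edit here...
}

save_or_preview('Example page', "old text\n", "new text\n");
</syntaxhighlight>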
 
:Overall good work. I confess I haven't had time to go back behind you and audit everything; I'll take your word for it that you did a good job auditing the results. However, I did find a couple of issues:
:*There is a problem with non-ASCII character encoding: Pages with unusual characters do not log properly. The change to [[Texas–Indian Wars]] logged as [[TexasIndian Wars]]. The change to [[Alonso Álvarez de Pineda]] logged as [[Alonso lvarez de Pineda]]. While this particular error is of no great consequence, please check the code for similar errors that may be more consequential.
::Yeah, it's a problem specific to the log. It should be easy to clear up but I didn't bother since the diff links work right. I'll go ahead and fix it now. --[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 15:57, 24 December 2008 (UTC)
:*Some people don't like user page material modified. Consider immediately self-reverting any change made in User: space and putting a note on the user's talk page pointing to the changed diff, and leave it up to them whether or not to commit the change. Alternatively, don't self-revert but do drop the user a note. I know most bots treat user pages the same as article space, but it's a trend I'd like to see change. [[User:davidwr|davidwr]]/<small><small>([[User talk:davidwr|talk]])/([[Special:Contributions/Davidwr|contribs]])/([[Special:Emailuser/davidwr|e-mail]])</small></small> 14:49, 24 December 2008 (UTC)
::*I agree with the first point. But not, in the case of dead links, with the second. [[User:Kittybrewster|Kittybrewster ]] [[User talk:Kittybrewster|<span style="color:#0000FF;">☎</span>]] 15:24, 24 December 2008 (UTC)
::I'll add a feature to drop the user a courtesy note on their talk page. --[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 15:57, 24 December 2008 (UTC)
:::Your decision. I don't like the idea of leaving dead links lying around clogging up the internet and Wikipedia. Maybe the answer is to have the bot change it and add a note saying this has been done. [[User:Kittybrewster|Kittybrewster ]] [[User talk:Kittybrewster|<span style="color:#0000FF;">☎</span>]] 16:46, 24 December 2008 (UTC)
::::Yeah, that's what I meant. The bot will still change the link but will leave a courtesy note informing the user of the change. --[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 19:35, 24 December 2008 (UTC)
 
==== Comments from [[User:Dispenser|Dispenser]] ====
Statements like "99%+ accurate" are meaningless, since there'll always be a user who does the unexpected and sets examples for others to follow. Anyway, I'm the author and maintainer of the [[User:Dispenser/Checklinks|Checklinks tool]] and [[User:PDFbot|PDFbot]]. Checklinks detects, lists, and allows user repair of dead links on pages; it is mostly used as a link checker during article review. PDFbot has been approved for similar dead link repair; however, it actually checks every link it replaces to make sure it works.
 
So here are some of the caveats:
* when replacing example.com -> example.org, no replacement should happen inside archive URLs such as http://web.archive.org/web/*/http://example.com
* bracketed links are matched differently from free links, and free links match differently depending on whether there's a "(" in them. Are you going to replace free links?
* http://www2.jsonline.com/story/index.aspx?id=279432 will on occasion return the status code 404
* Many sites use [[soft 404]]s; matching these is hard, sometimes possible only by content analysis. See [[Wikipedia:Bots/Requests for approval/DumZiBoT]] for some details
* Does the bot remove {{tl|dead link}} when replacing the dead links?
 
That is all I can think of at the moment. If this bot is approved, can it simplify nytimes links from
 
[http://www10.nytimes.com/2007/07/01/magazine/01WIKIPEDIA-t.html?_r=5&pagewanted=print&oref=slogin&oref=slogin&oref=slogin&oref=slogin] to [http://www.nytimes.com/2007/07/01/magazine/01WIKIPEDIA-t.html] to remove the login requirements. — [[User:Dispenser|Dispenser]] 18:47, 31 December 2008 (UTC)
 
:First of all, thank you for your insight. I used the phrase 99% accurate because I couldn't think of any way it would fail, but there is always a possibility of something wacky happening. With your insights, I was able to eliminate some unlikely but possible situations...
:*<nowiki> As currently programmed, the bot will replace something like http://whatever.com/page.htm with http://newsite.com/page.htm (no brackets -> no brackets) when in "standard URL replacement" mode. This way something goofy like [http://oldsite.com http://oldsite.com] doesn't change into [http://newsite.com http://oldsite.com] (it normally leaves titles unchanged) but rather [http://newsite.com http://newsite.com].</nowiki>
::*In "URL -> template" mode, it will usually replace these types of bare with the desired template, but it can also leave them untouched or just change them to new bare URLs, depending on how the rule for the change is set up. (I believe [[Special:Linksearch]] does pick up these kind of "bare" URLs; if it doesn't my bot won't find the page though).
 
:*I have now added a clause that these bare URLs must be preceded by a punctuation mark (! ? . , ' "), space, }, |, or >. Thus if it is part of a larger URL it won't be picked up. Although it was unlikely that a page would have an archive.org link and a link to the old URL directly, it doesn't hurt to fix this problem and anything similar by being explicit. :)
:*All old_link->new_link rules are manually reviewed before being sent to the bot for processing, so I'll just ignore anything from jsonline.com.
:*There are plenty of normal 404s to work through, so I'll be ignoring "soft" ones for the time being.
 
:*I was previously unaware of the <nowiki>{{dead link}}</nowiki> template, but now that I am I've added code to remove the template (or its aliases) from a corrected link (as long as the link & template are separated only by whitespace).
:*Changing NYT links is, technically, quite simple. However, I am not quite sure what you wanted exactly:
 
::# <nowiki>http://www10.nytimes.com/(date)/(scope)/article.htm?(bunch of junk) -> http://www.nytimes.com/(date)/(scope)/article.htm</nowiki>
::# <nowiki>http://www(##).nytimes.com/(date)/(scope)/article.htm?(bunch of junk) -> http://www.nytimes.com/(date)/(scope)/article.htm [all numbers]</nowiki>
::# <nowiki>http://www*.nytimes.com/(date)/(scope)/article.htm?(bunch of junk) -> http://www.nytimes.com/(date)/(scope)/article.htm [include www.]</nowiki>
::# <nowiki>http://*nytimes.com/(date)/(scope)/article.htm?(bunch of junk) -> http://www.nytimes.com/(date)/(scope)/article.htm [include links without subdomain]</nowiki>
::# <nowiki>http://www(##).nytimes.com/(date)/(scope)/article.htm* -> http://www.nytimes.com/(date)/(scope)/article.htm [include articles without the paramaters]</nowiki>
::Basically, I need to know what causes the login screen to be generated.
 
:Any further questions, just ask. :) --[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 23:03, 31 December 2008 (UTC)
 
:: You should update the source code. Also, you can probably make good use of Checklinks source code.
::* The best way to ensure that the url isn't part of another is to use the look behind <code>(?<!\w://[^][<>\s"]*)</code>
::* By manual review, do you mean that you look at each replacement, or just look to make sure it makes sense for most of them? I would prefer automated review in addition to any manual review.
::* I think the login is caused by the oref=slogin, but I would like the URL to be simple; see http://no-www.org/ and [[URL normalization]].
::* Does the bot ignore nowiki, comment, includeonly, source tags?
:: — [[User:Dispenser|Dispenser]] 22:17, 7 January 2009 (UTC)
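A sketch of the cleanup Dispenser is asking for, assuming the fix is to collapse the numbered www subdomain and drop the query string (including oref=slogin); the example URL comes from above:

<syntaxhighlight lang="perl">
#!/usr/bin/perl
# Sketch only: collapse numbered www subdomains and drop the query
# string (including oref=slogin) from a nytimes.com link.
use strict;
use warnings;

my $url = 'http://www10.nytimes.com/2007/07/01/magazine/01WIKIPEDIA-t.html'
        . '?_r=5&pagewanted=print&oref=slogin&oref=slogin';

$url =~ s{^http://www\d*\.nytimes\.com}{http://www.nytimes.com};  # www10 -> www
$url =~ s{\?.*\z}{};                                              # drop parameters

print "$url\n";  # http://www.nytimes.com/2007/07/01/magazine/01WIKIPEDIA-t.html
</syntaxhighlight>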
:::*Yes, I use lookbehind, sorry if that wasn't clear. The snippet you provided won't actually work in Perl since it doesn't have variable-length lookbehind, but my positive lookbehind <code>(?<=[\s!?.,'"}|>*])</code> should be functionally equivalent (see the sketch after this thread).
:::*I mean that I make sure the change is valid in general (for domain moves). Obviously if the original link was mistyped or something the new one will still be wrong, but it will be less wrong. Why leave a link to a dead domain unchanged just because it was mistyped? There is no (reasonable) way for a bot to fix such links, but at least it will be easier for a human to fix if they at least know where to look. In the rare case when a page simply moved locations on the same domain, only the exact page match would be changed.
:::*LinkSearch will only find actual links, not comments and such, so no they wouldn't be corrected.
:::*I'll put the NYT links on my to-do list for when the bot is approved.
:::--[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 22:40, 7 January 2009 (UTC)
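To make the boundary rule concrete, here is a hypothetical sketch of the lookbehind-guarded replacement ThaddeusB describes, with made-up URLs; the copy inside the archive.org link is left untouched because the character before it is "/":

<syntaxhighlight lang="perl">
#!/usr/bin/perl
# Sketch only: boundary-guarded replacement using the positive
# lookbehind quoted above.
use strict;
use warnings;

my $old = quotemeta 'http://oldsite.com/page.htm';
my $new = 'http://newsite.com/page.htm';

my $text = <<'WIKI';
See http://oldsite.com/page.htm and the snapshot at
http://web.archive.org/web/*/http://oldsite.com/page.htm for history.
WIKI

# Only replace when preceded by whitespace, punctuation, }, |, >, or *.
$text =~ s/(?<=[\s!?.,'"}|>*])$old/$new/g;
print $text;   # first URL rewritten; archive.org copy untouched
</syntaxhighlight>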
 
==== Approval? ====
Any chance of getting this approved soon? The trial ended almost 2 weeks ago and I have addressed all the concerns raised. I'd like to get started on fixing up more dead links soon. Thanks. --[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 23:45, 4 January 2009 (UTC)
:I'm somewhat against the concept. I understand its need, but we really should not be expending human effort in cleaning up after other people's intentional messes. The method that we should be employing is to email the webmaster (possibly with a list of broken URLs) and point them to some guide on how to set up the server with proper redirection and some guidance on a good URL scheme (like DOI and such). Broken URLs affect everyone, not just us. — [[User:Dispenser|Dispenser]] 21:23, 7 January 2009 (UTC)
::It is Wikipedia policy to repair [[Wikipedia:Dead external links|dead links]] already. All my bot does is reduce the amount of human effort needed to do so. I'm not creating policy, just trying to automate some tedious work.
::Websites change their location for a variety of reasons. Not everyone wants to continue to pay for an old domain/hosting just for redirect purposes. I agree that websites should use redirects instead of just "disappearing" but the fact is that they don't always. I can't change the world, only make do with the way it is. Besides, even websites that do have redirects from their old locations usually don't keep the old links valid indefinitely and almost always ask visitors to update the link that brought them there. --[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 22:20, 7 January 2009 (UTC)
 
{{tl|BAGAssistanceNeeded}} --[[User:Tinucherian|Tinucherian]] 12:04, 8 January 2009 (UTC)
 
I've got a few more user-submitted link updates to work on now. There are now >5000 links waiting to be updated. Hoping to get started soon, [[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 13:26, 8 January 2009 (UTC)
:{{BotApproved}} --[[User talk:Chris G|<b><span style="color:Green;">Chris</span></b>]] 23:52, 8 January 2009 (UTC)
 
:''The above discussion is preserved as an archive of the debate. <span style="color:red">'''Please do not modify it.'''</span> Subsequent comments should be made in a new section.''<!-- from Template:Bot Bottom --></div>
</noinclude>
:::Wait, wait, wait. Sorry that I don't have the time (or writing inclination) to commit to WP 24/7, but ThaddeusB's response was flawed. [[Wikipedia:Dead external links]] isn't a policy, it's just an unmaintained project, and arbitrarily moving dead resources without checking is a bad idea since it eliminates Wayback history data and ''will'' introduce problems. — [[User:Dispenser|Dispenser]] 07:15, 9 January 2009 (UTC)
::::Strongly disagree. This is human overseen, is much needed and fully complies with wikipolicy. It is not arbitrary and leaves a trace in the history. [[User:Kittybrewster|Kittybrewster ]] [[User talk:Kittybrewster|<span style="color:#0000FF;">☎</span>]] 09:17, 9 January 2009 (UTC)
:::::The only part that is overseen is the approval of the link changes; that is no substitute when the replacement algorithm is flawed. He hasn't shown that it is harmless. It modifies comments and nowiki tags, and skips citations that embed links in < >.
:::::The approval was granted 26.5 hours after my last post; to avoid edit warring, I do not immediately respond to comments. I was out yesterday doing some work, so I couldn't respond until late evening. So is it wikipolicy that a bot can make harmful edits, which are unsearchable in wikiblame? Why was my request to review the source code not taken into account? If I dump a list of bad pages in 3-4 months, will Kittybrewster go through the history and find who edited them? — [[User:Dispenser|Dispenser]] 15:34, 9 January 2009 (UTC)
::::::The above comment is factually inaccurate and unnecessarily rude. The bot has *not* "modified comments, nowiki tags, [or skipped] citations that embed links in < >" and even if it did it wouldn't be harmful; the link is out-of-date whether it is clickable or not. Every change the bot makes is logged with easily clickable diffs at [[User:DeadLinkBOT/Logs]] and every change is being manually reviewed by me to iron out any bugs. (I've stated this particular fact several times now.) Please DO raise any actual errors the bot makes either on its talk page or mine, but this endless speculation about how it's going to harm Wikipedia is getting old.
::::::This request had been open for 2 months and Dispenser is the only person to object. No one user gets veto rights and the task is clearly desirable, despite Dispenser's insistence that it isn't Wikipedia policy to fix dead links. --[[User:ThaddeusB|ThaddeusB]] ([[User talk:ThaddeusB|talk]]) 16:34, 9 January 2009 (UTC)
::::::: I want the bot, but I want it to work right. My frustration is in BAG's strange and/or sudden approval in bot processes, often without much warning. The "speculation" was based on the last release of the source code. — [[User:Dispenser|Dispenser]] 17:25, 9 January 2009 (UTC)