Wikipedia:Bots/Requests for approval/DeadLinkBOT

This is an old revision of this page, as edited by MZMcBride (talk | contribs) at 06:28, 15 December 2008 (Discussion: +reply). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Operator: ThaddeusB (talk)

Automatic or Manually Assisted: (Mostly) Automatic, supervised

Programming Language(s): Perl

Function Summary: To correct dead links due to link rot

Edit period(s) (e.g. Continuous, daily, one time run): as needed

Already has a bot flag (Y/N):

Function Details: DeadLinkBOT's purpose is to update links that are invalid due to link rot. The first version of the program will simply replace all instances of a user-supplied link with an updated link. When needed, the program is capable of making simple determinations about the nature of the WP page in order to pick a new link from a list of alternatives (given user-supplied rules). In the future, the program will be expanded to actively seek out updated links after retrieving a list of dead links to be updated. These more advanced changes will require user confirmation. When a page is edited to update a link, the bot will also apply AWB-like general fixes.
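
The replacement step described in the function details could look roughly like the sketch below. It is written in Python for illustration only and is not the bot's actual Perl source. The function names, the rule format (a list of regex/URL pairs tried in order) and the example.org URLs are all assumptions made for this example.

```python
import re

# Characters that can end a URL in wikitext: whitespace, "]", "|", "}", "<".
# Requiring one of them (or end of text) after the match keeps a longer URL
# that merely starts with the dead URL from being rewritten by accident.
URL_END = r"(?=[\s\]|}<]|$)"


def choose_replacement(page_text, rules, default=None):
    """Return the URL of the first rule whose pattern matches the page text.

    `rules` is a hypothetical user-supplied list of (regex, new_url) pairs.
    """
    for pattern, new_url in rules:
        if re.search(pattern, page_text, flags=re.IGNORECASE):
            return new_url
    return default


def replace_dead_link(page_text, dead_url, rules, default=None):
    """Replace every exact occurrence of dead_url, or leave the text alone."""
    new_url = choose_replacement(page_text, rules, default)
    if new_url is None:
        # No alternative known: do not remove or alter the link.
        return page_text
    pattern = re.escape(dead_url) + URL_END
    # A function is used as the replacement so that backslashes in the
    # new URL are never interpreted as regex escapes.
    return re.sub(pattern, lambda match: new_url, page_text)


# Hypothetical usage: pick a target page based on what the article is about.
rules = [
    (r"\bbaronet", "http://new.example.org/baronets.htm"),
    (r"\bpeer", "http://new.example.org/peers.htm"),
]
wikitext = "He was created a baronet in 1801. [http://old.example.org/index.htm Source]"
updated = replace_dead_link(wikitext, "http://old.example.org/index.htm", rules)
```

Rules are tried in order, so more specific patterns should come first. A page that matches no rule and has no default is returned unchanged, which mirrors the stated intent of never removing a link without a known alternative.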


Discussion

What about websites that go through regular downtime? If the bot reads them as dead while they are temporarily down, it will remove a good link. Xclamation point 05:01, 2 December 2008 (UTC)
It will be attempting to fix 404 links found at Wikipedia:Linkrot. In theory, 404 errors are not due simply to downtime, but rather to a page being renamed or moved. Per WP policy, the bot won't remove any link for which it can't find an alternative, i.e., it will only address links that have moved to a new location. These precautions should prevent any removal of temporarily unavailable locations.
The first version of the program will only change links specified in advance, starting with the 2200+ links here [1]. When I add the automatic updated-link-finding feature, the bot will double-check proposed changes with me before making them. --ThaddeusB (talk) 15:46, 2 December 2008 (UTC)
Can you put in a double check of links, say, a week apart, to ensure that it's not a short-run 404 that caused the problem? -- Tawker (talk) 07:17, 6 December 2008 (UTC)
Yes, I will add that feature. --ThaddeusB (talk) 22:00, 7 December 2008 (UTC)
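
One way the week-apart recheck requested above might be expressed is sketched below, in Python for illustration. This is an assumption about how such a check could work, not the feature as it was implemented. The function name, the seven-day constant and the shape of the observation records are hypothetical, and the status codes are assumed to come from whatever HTTP client the bot uses.

```python
from datetime import datetime, timedelta

# Assumed minimum spacing between the earliest and latest check.
RECHECK_INTERVAL = timedelta(days=7)


def is_confirmed_dead(observations, min_gap=RECHECK_INTERVAL):
    """Decide whether a link has been seen dead for long enough.

    `observations` is a list of (checked_at, http_status) tuples.
    The link counts as confirmed dead only if there are at least two
    checks, every check returned 404, and the earliest and latest
    checks are at least `min_gap` apart.
    """
    if len(observations) < 2:
        return False
    if any(status != 404 for _, status in observations):
        # Any non-404 response suggests the earlier failure was temporary.
        return False
    times = [checked_at for checked_at, _ in observations]
    return max(times) - min(times) >= min_gap


# Hypothetical usage with two checks one week apart.
first_check = datetime(2008, 12, 7)
history = [(first_check, 404), (first_check + timedelta(days=7), 404)]
confirmed = is_confirmed_dead(history)
```

Keeping the decision separate from the network request makes the rule easy to test and lets the stored check history decide when a link is eligible for replacement.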
I wasn't planning on releasing it for public consumption. --ThaddeusB (talk) 22:00, 7 December 2008 (UTC)
Why not? If it's going to actually be changing links in articles, I'd really like to know that the code is sound. Mr.Z-man 21:57, 12 December 2008 (UTC)
Well, considering it's explicitly not required, I shouldn't have to justify my decision. But since you asked, my code is undocumented and "ugly" - it is not intended to be read by anyone but me. I really don't see what the issue is - all the program does as far as Wikipedia goes is replace a pre-screened dead URL with a pre-screened good one, possibly applying pre-screened regexes to pick between two or more different options. Nonetheless, I put the code up anyway: User:DeadLinkBOT/source --ThaddeusB (talk) 23:45, 12 December 2008 (UTC)

I have tested the bot with local writes and all works according to plan. I would appreciate it if a trial could be approved for actual wiki editing soon. Thanks. --ThaddeusB (talk) 02:34, 14 December 2008 (UTC)

This all seems rather sketchy to me. "I'm gonna go through all the articles and change a bunch of links. And... um... apply 'AWB-like' general fixes too." People applying AWB-like changes usually get banned pretty quickly because they rarely consider the large, large number of corner cases. And general fixes require someone to watch them and verify each edit to avoid things getting screwed up. As for the link changes, do you have any examples from articles? Edits you've done by hand (or even using this script)? And will you only be dealing with pages in namespace 0? --MZMcBride (talk) 04:41, 15 December 2008 (UTC)
First of all, I don't appreciate the attitude. I said nothing like "I'm going to go through all articles and change a bunch of links." What I actually said is that I was going to change all specific instances of a known bad link to a known good link (using Special:LinkSearch). I also said several times that every change would be pre-approved by me. If there was a problem doing general fixes, then why was that never mentioned before now? This request is now 2 weeks old and this is the first I'm hearing of it potentially being a problem. I am certainly willing to drop that part of the request and (potentially) resubmit it with a specific list of fixes as a separate request.
I also stated the list of links I'd be starting with above. This is from a specific request from Wikipedia:AutoWikiBrowser/Tasks#LeighRayment.com_.28continued.29. There are 2500+ of them. I have tested the first batch of them with local writes, but it's against WP policy to have a bot edit WP without test approval, so of course I haven't actually written them to WP. Isn't that the whole point of having a test period?
Since it's correcting DEAD links, I don't see any reason to limit its scope (although it does avoid editing archives), but I could easily change that if desired. --ThaddeusB (talk) 05:13, 15 December 2008 (UTC)
So will it only be working on angeltowns.com links, or is this for broader approval? If it's for the former, this can probably be speedily approved. For the latter, it's going to require more time / testing / whatever. As to why no one mentioned that AWB's general fixes are problematic, well, probably because most of BAG is either inactive or incompetent. /me shrugs. Though I do think AWB's documentation is pretty explicit about the 'danger' of general fixes. --MZMcBride (talk) 06:28, 15 December 2008 (UTC)