Wikipedia:Dead external links

Like almost all large websites, Wikipedia suffers from link rot, in which external links go stale over time. As of the September 13, 2005 database dump, Wikipedia contained 845,416 external links, many of which no longer function. Such dead links look unprofessional and should be fixed on a regular basis.

This page is intended to be a clearinghouse for all such external links. If you correct a source article to fix a broken link, please note it in the Status section below to prevent duplicated effort.

Although the sections below contain a short description of the status code in question, please see the list of HTTP status codes for a more complete description.

Status codes

200

The 200 status code indicates that the link is correctly formed and retrievable. Although such links do not need correction, they are included here for completeness. Wikipedia currently contains 704,147 of these links. Because so many links resolve correctly, this list is not available for download.

300

Indicates that the website offered more than one representation of the content and expected the bot to choose among them ("Multiple Choices"). Although such links are most likely correct, they should probably be double-checked. Wikipedia currently contains 36 of these links.

301

Indicates that the content has been moved permanently, and that the link inside Wikipedia should probably be updated to reflect the new location. Wikipedia currently contains 21,538 of these links.

302, 303, 307

Indicates that the content has been temporarily moved, and that the client should continue to use the original link. Although these links should be correct in theory, they are often used by link farms, and should probably be checked. Wikipedia currently contains 46,767 status 302 links, 198 status 303 links, and 6 status 307 links.

400

Indicates that the site in question could not understand the bot's request. Although these should diminish with future revisions of the bot, it may be useful to test them anyway (low priority). Wikipedia currently contains 1,205 of these links.

401

The page required authorization, which the bot does not support. The article in question may have included login information next to the link, but the bot has no way of knowing this. Such links should be fixed if the article does not contain login information. Wikipedia currently contains 210 of these links.

402

Although HTTP reserves this status code for future use, the server returned it anyway. In theory, it indicates that the server requested payment from the client. Such links should be fixed. Wikipedia currently contains 1 of these links.

404, 410

The 404 error is the most common symptom of link rot, and it indicates that the page was not found; the server does not say whether the situation is permanent. The 410 status code is similar, but indicates that the page has been removed permanently. Such links must be fixed, perhaps with a link to the Internet Archive. Wikipedia currently contains 24,012 status 404 links and 31 status 410 links.
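
As an illustration of the Internet Archive suggestion, the sketch below builds a Wayback Machine address for a dead URL. The function name wayback_url and the default timestamp (the date of the database dump) are assumptions made for this example, not part of the bot. The Wayback Machine resolves such an address to the capture closest to the timestamp, if one exists, so the result should still be checked by hand.

```python
def wayback_url(dead_url: str, timestamp: str = "20050913") -> str:
    """Return a Wayback Machine address for dead_url.

    timestamp is 4 to 14 ASCII digits, from YYYY up to YYYYMMDDhhmmss.
    The default is the dump date used on this page (an assumption).
    """
    if not (timestamp.isascii() and timestamp.isdigit()
            and 4 <= len(timestamp) <= 14):
        raise ValueError("timestamp must be 4 to 14 digits")
    # The original URL is appended unchanged after the timestamp.
    return f"https://web.archive.org/web/{timestamp}/{dead_url}"
```

For example, wayback_url("http://example.org/page") gives an address that can replace the dead link in the article once an archived copy has been confirmed.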

405

Indicates that the server did not allow the request method the bot used. Since regular Wikipedia links are of the HTTP variety (which the bot uses), these links are probably broken and should be fixed. Wikipedia currently contains 3 of these links.

406

Occurs for a number of reasons, and indicates that the server could not provide the content in a form the client's request would accept. Should probably be fixed. Wikipedia currently contains 94 of these links.

409

Indicates some sort of conflict that the client needs to resolve before the request can succeed. Should probably be fixed. Wikipedia currently contains 4 of these links.

412

Indicates that the request failed to meet some sort of precondition. Should probably be tested. Wikipedia currently contains 3 of these links.

423

Although not part of core HTTP (it comes from the WebDAV extension), servers use it to indicate some sort of "Locked" error. Should probably be fixed. Wikipedia currently contains 4 of these links.

425

Another non-active status code. The servers returned it to deny what they took to be a "mirroring" request, although the bot was not mirroring their content. Should probably be tested. Wikipedia currently contains 21 of these links.

5xx

Indicates there was some sort of internal server error. This could be the result of a malformed bot HTTP request, or of numerous other causes. Should be examined to determine whether the site is suffering from a permanent problem with the link in question. Wikipedia currently contains 4,269 status 500 links, 10 status 501 links, 135 status 502 links, and 193 status 503 links.

NA - Unsupported protocol

Indicates that the link used a protocol, such as IRC or Gopher, that the bot is not capable of resolving. Should be checked as to whether the protocol in the link is correct (e.g., htttp://www.wikipedia.org contains a typo). Wikipedia currently contains 171 of these links.
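
A check of this kind can be sketched as follows. The lists of supported and known protocols are assumptions for illustration (this page does not say exactly which protocols the bot handles), and check_scheme is a hypothetical helper, not part of the bot. It uses the standard library to flag protocol names that look like misspellings of a supported one.

```python
import difflib
from urllib.parse import urlsplit

# Assumed for this sketch: what the bot can and cannot resolve.
SUPPORTED = ("http", "https")
KNOWN_UNSUPPORTED = ("irc", "gopher", "news", "mailto")


def check_scheme(url):
    """Return (verdict, scheme), where scheme is a suggestion for typos."""
    scheme = urlsplit(url).scheme.lower()
    if scheme in SUPPORTED:
        return ("supported", scheme)
    if scheme in KNOWN_UNSUPPORTED:
        return ("unsupported", scheme)
    # Anything else may be a misspelling; look for a close supported name.
    guess = difflib.get_close_matches(scheme, SUPPORTED, n=1)
    if guess:
        return ("possible typo", guess[0])
    return ("unknown", scheme)
```

With these assumptions, the example link above is reported as a possible typo for http, while an irc:// link is reported as unsupported and left for a human to judge.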

NA - Unknown error

Indicates that the bot had some sort of difficulty resolving the link in question. Could be caused by a number of errors: DNS lookup failures, socket timeouts, etc. The default socket timeout was set to 30 seconds, which may be too low for some very slow sites. Should probably be tested. Wikipedia currently contains 39,749 of these links.
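
The categories above can be summarised in a short sketch. It is not the bot's actual code: fetch_status and recommended_action are hypothetical names, the action labels are paraphrased from the sections above, and the sketch lumps both NA categories together. It does use the 30 second timeout mentioned above and declines to follow redirects, so that 3xx codes are reported instead of being hidden.

```python
import http.client
import urllib.error
import urllib.request

# Actions paraphrased from the sections above; the labels are this sketch's own.
ACTIONS = {
    200: "ok",
    300: "check", 301: "update",
    302: "check", 303: "check", 307: "check",
    400: "test", 401: "fix unless login details are given",
    402: "fix", 404: "fix", 405: "fix", 406: "fix", 409: "fix", 410: "fix",
    412: "test", 423: "fix", 425: "test",
}


def recommended_action(status):
    """Map an integer status code, or the string "NA", to a suggested action."""
    if status == "NA":
        return "test"
    if 500 <= status <= 599:
        return "examine"
    return ACTIONS.get(status, "unlisted")


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Decline to follow redirects so the 3xx code itself is reported."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def fetch_status(url, timeout=30):
    """Return the HTTP status code for url, or "NA" if it cannot be resolved."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 3xx, 4xx and 5xx responses arrive here
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failures, socket timeouts, unknown protocols, malformed replies.
        return "NA"
```

A link would then be triaged with recommended_action(fetch_status(url)). Raising the timeout for a second pass over the NA results is one way to separate slow sites from dead ones.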

Downloads

Below are links to download tab-separated text files (gzip compressed) containing the links. Each line is in the form:

Article title, tab, URL, tab, further description (as in [http://www.wikipedia.org/ Wikipedia] links), tab, error code, tab, server response.

These files should probably be relocated to somewhere more permanent in the future.

200 (not currently available)

300 - 301 - 302 - 303 - 307

400 - 401 - 402 - 404 - 405 - 406 - 409 - 410 - 412 - 423 - 425

500 - 501 - 502 - 503

NA (Unsupported protocol) - NA (Unknown error)

The 404 errors have pages to themselves:

  • a, 1782 entries
  • b, 1246 entries
  • c, 1699 entries
  • d, 985 entries
  • e, 739 entries
  • f, 883 entries
  • g, 773 entries
  • h, 889 entries
  • i, 642 entries
  • j, 1366 entries
  • k, 513 entries
  • l, 1468 entries
  • misc, 539 entries
  • m, 2007 entries
  • n, 951 entries
  • o, 538 entries
  • p, 1163 entries
  • q, 80 entries
  • r, 894 entries
  • s, 1838 entries
  • t, 1299 entries
  • u, 460 entries
  • v, 303 entries
  • w, 681 entries
  • x, 41 entries
  • y, 119 entries
  • z, 95 entries
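
The downloadable files can be read with a few lines of code. The sketch below assumes the five tab-separated fields described in this section, UTF-8 encoding, and no quoting; the LinkRecord type and the function names are hypothetical, chosen only for this example.

```python
import csv
import gzip
from typing import Iterator, NamedTuple


class LinkRecord(NamedTuple):
    title: str        # article title
    url: str          # external link
    description: str  # text following the URL in the wiki link
    code: str         # status code, or "NA"
    response: str     # server response text


def read_records(stream) -> Iterator[LinkRecord]:
    """Yield one LinkRecord per well-formed line of an open text stream."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 5:
            continue  # skip lines that do not match the described form
        yield LinkRecord(*row)


def read_gzip_file(path) -> Iterator[LinkRecord]:
    """Open a gzip-compressed download and yield its records."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as stream:
        yield from read_records(stream)
```

For instance, sorting the records from the 404 file by title gives a worklist in the same order as the per-letter pages above.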

Status

Please indicate your correction status in the form "123: ABC - XYZ", e.g., "404: African Academy of Sciences - anonymous remailer".

300: None
301: Agesilaus II - Cough CPR, All Too Flat, Alvito, Boto, Eight Crazy Nights, Family First Party, Fine motor skill, Gabrielle Pizzi, Mighty Mohawk Man, Zane, ZZ Top
302: None
303: None
307: None
400: None
401: None
402: Kayo Hatta. All fixed!
404: Controlled Combustion Engine, P*U*L*S*E - People's Instinctive Travels and the Paths of Rhythm, Punjab Regiment (Pakistan) - Père Noël, Z M Dagar - Z39.50
405: All links work correctly, despite 405 error. Probably a web-server configuration issue.
406: A Special Sesame Street Christmas - Bay County, Florida. Links either worked anyway or have been fixed.
409: Court of Session - Self-hatred. All fixed!
410: 2004 U.S. election voting controversies, Ohio - Publieke Omroep
412: All links tested, work correctly.
423: Natib Qadish - The Leopard Man. All fixed!
425: None
500: None
501: None
502: None
503: None
NA (Unsupported protocol): Fixed numerous links with typos. Everything else "looks" correct.
NA (Unknown error): None
