Wiktionary:Searchable external archives
Wiktionary's attestation process for entries (WT:ATTEST) requires that citations used in attestation come from permanently recorded media, that is, media that are durably archived.
This is a list of durable archives that are searchable for free. It is intended as a resource for finding citations of words to show that they satisfy the Criteria for Inclusion (CFI). Not listed are resources that require paid membership, even after a trial period, or that otherwise charge a fee to perform the search or retrieve quotations. This excludes sites such as Amazon.com, which requires a previous purchase to preview material, and Jstor.org, which requires a subscription.
Printed media
The most commonly cited sources are printed books, magazine and journal articles, and newspapers.
- Books.google.com is Wiktionary's go-to engine for searching books and some magazines.
- scholar.google.com is a good engine for searching academic, scientific and medical journals. Google Scholar can also be used to find mathematical symbols that search engines otherwise ignore and that are hence unfindable: search for the symbol's LaTeX notation.
- Issuu.com is a large index of newspapers and magazines.
- Wikisource - large, searchable collections of printed works
- Project Gutenberg - large, searchable collections of printed works
- HathiTrust Digital Library - large, searchable collections of printed works
- The Internet Archive provides full-text search over its large collection of scanned books and magazines. Material still under copyright can be consulted for free with an account.
Note that Google Books, Google Scholar and Issuu sometimes index e-publications which do not exist in print, so mere inclusion in one of these indices is not a guarantee that a source is durably archived. Use the parameters doi, isbn, issn, jstor, lccn, oclc, ol, and/or id in Template:quote-book, Template:quote-journal, etc. to help confirm that printed materials are durably archived. To confirm that material found on Google Books is durably archived, on the page for the book or materials in question, click "Find in a library", which should direct you to a WorldCat page for the book in question.
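As a rough sketch of the identifier advice above, the following Python helper checks that a citation carries at least one of the listed identifier parameters and renders it as Template:quote-book wikitext. The function names, the dictionary layout and every value in the sample citation are hypothetical placeholders, not part of any Wiktionary tool, and the rendered template should still be checked against the template's own documentation.

```python
# Identifier parameters named in the paragraph above.
IDENTIFIER_PARAMS = ("doi", "isbn", "issn", "jstor", "lccn", "oclc", "ol", "id")


def has_identifier(citation: dict) -> bool:
    """Return True if the citation carries at least one non-empty identifier."""
    return any(str(citation.get(p, "")).strip() for p in IDENTIFIER_PARAMS)


def render_quote_book(lang: str, citation: dict) -> str:
    """Render a citation dict as {{quote-book}} wikitext, one parameter per line.

    Empty values are skipped; the language code is passed as the first
    positional parameter.
    """
    lines = ["{{quote-book", f"|{lang}"]
    lines += [f"|{key}={value}" for key, value in citation.items() if str(value).strip()]
    lines.append("}}")
    return "\n".join(lines)


# Hypothetical citation: every value is a placeholder, not a real book.
example = {
    "author": "A. N. Author",
    "title": "An Example Title",
    "year": "1999",
    "isbn": "PLACEHOLDER-ISBN",
    "passage": "A sentence using the term being attested.",
}

if has_identifier(example):
    print(render_quote_book("en", example))
```

A citation that fails the check is not necessarily invalid; it only means that no identifier has been recorded yet to help others confirm that the source exists in print.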
Laws are also durably archived, and several websites allow searching corpora of them:
- Ireland: Houses of the Oireachtas (debates.oireachtas.ie, historical-debates.oireachtas.ie)
- United Nations Educational, Scientific and Cultural Organization
- United States of America, Federal: FindLaw
- United States of America: Legal Information Institute
Resources with narrower scope
Early English Books Online (EEBO) is a helpful resource for words used in English before 1700. Spelling during this period varies wildly and some creative searching is often required (the "word index" function can be helpful for this). Note that texts written before 1500, even if published later, are treated as Middle English and are not valid for Modern English attestation on Wiktionary.
Several institutions maintain corpora of English language works; in alphabetical order, these include:
- Brigham Young University Corpus of Contemporary American English
- British National Corpus
- The Free Library thefreelibrary.com
When attempting to attest an obscure, obsolete, or dialectal term, it can be useful to consult the Century Dictionary ({{R:Century 1911}}) and Wright's English Dialect Dictionary ({{R:EDD}}), as these often provide pointers to books/manuscripts where the terms have been used.
Numerous websites maintain searchable copies of the Hebrew and Greek texts of the Bible, as well as numerous English, Latin, and other-language translations. These include BibleGateway.com, Biblehub.com, and Bible.cc.
Resources specific to languages other than English
- Austrian literature online (German)
- Biblio (Portuguese)
- Bibliothèque nationale de France (French)
- Germany: Klaus Graf's Zeitungsarchive search engine (German)
- custom search in several German newspaper archives at once, including:
- zeus.zeit.de, welt.de, netzeitung., taz.de, berlinonline.de, spiegel.de, stern.de, freitag.de, jungewelt.de, nd-online.de
- Germany: Internet-Links für Journalisten (German) recherchetipps.de
- Germany, Berlin: taz - die tageszeitung (German) www.taz.de
- Vietnam: Thư viện Quốc gia Việt Nam (Vietnamese; look for collections like [1])
- polona (Polish)
- National corpus of the Polish language (Polish)
- Netherlands: delpher (Dutch)
Audio and video media
Some audio and video media produced in some countries are durably archived by libraries; these include commercially-released songs, motion pictures, and television shows. imsdb.com, the Internet Movie Script Database, provides a searchable archive of movie scripts.
Usenet
Usenet is considered durably archived because its archives are decentralized. It has been accessible continuously since 1980, before the creation of the World Wide Web. It can be accessed through Google Groups.
Other online media: websites are not durable
Websites are not considered durably archived; do not add any web search engines here. Sites such as web.archive.org and WebCite[1] attempt to archive the Internet where possible, but at present cannot be considered durable because they are at the mercy of the original copyright holders. (Note: citations from the web may be useful if they are particularly good examples of the use of a word or sense, and may be retained for this reason even though they do not help the word meet CFI.)
Media resources such as YouTube, intended for online use only, are not considered durably archived. If the material is taken from another source, such as a movie or television show, cite the original source.
Other media
Monumental inscriptions such as runestones are also durable, particularly because they are often reproduced in printed literature. Various websites document runic and other inscriptions in a searchable way; these include CISP for Celtic inscriptions, Rundata for Norse Runic inscriptions (requires downloading a client), and the Epigraphic Database Heidelberg for Latin inscriptions (search page).
Gadget for finding and formatting citations
In the Gadgets tab of Special:Preferences, you can find and activate Quiet Quentin, a gadget that lets users search Google Books for a term and creates quotations formatted to Wiktionary’s standards.