|
Lots Of Copies Keep Stuff Safe™
The LOCKSS project, under the auspices of Stanford University, develops and supports an open source system that allows libraries to collect, preserve, and provide their readers with access to material published on the Web. The system attempts to replicate the way libraries do this for material published on paper. It was originally designed for scholarly journals but is now also used for a range of other materials. Examples include the Solinet project to preserve theses and dissertations at eight universities, and the MetaArchive project, which preserves at-risk digital content about the culture and history of the American South.
Traditionally, academic libraries have retained issues of scholarly journals, either individually or collaboratively, giving their readers access to the content received even after the publisher has ceased publication or the subscription has been canceled. In the digital age, libraries often subscribe to journals that are available only over the Internet. Although convenient for access, this presents a preservation problem: if the publisher ceases to publish, or the library cancels the subscription, the content that was previously paid for is no longer available.
The LOCKSS system allows a library, with permission from the publisher, to collect, preserve and disseminate to its own readers its own copy of material to which it has subscribed, as well as open access material (perhaps published under a Creative Commons license). Each library's system collects its copy using a specialized web crawler that verifies that the publisher has granted suitable permission. The system is format-agnostic, collecting whatever formats the publisher delivers via HTTP. Libraries that have collected the same material cooperate in a peer-to-peer network to ensure its preservation. Peers in the network vote on cryptographic hashes of the preserved content combined with a nonce; a peer that is outvoted regards its copy as damaged and repairs it from the publisher or from other peers. For details, see the papers published at SOSP 2003 and USENIX 2005.
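The voting idea described above can be illustrated with a heavily simplified sketch. It is not the LOCKSS protocol itself, whose details are in the cited papers; it only shows peers hashing a shared nonce together with their own copy and treating the minority as damaged. The function names, the peer names, the choice of SHA-256, and the strict-majority rule are all assumptions made for illustration.

```python
import hashlib
import os
from collections import Counter


def vote(nonce: bytes, content: bytes) -> str:
    """Hash the poll's nonce followed by this peer's copy of the content."""
    return hashlib.sha256(nonce + content).hexdigest()


def run_poll(copies: dict[str, bytes]) -> tuple[list[str], list[str]]:
    """Run one poll and return (agreeing_peers, outvoted_peers).

    `copies` maps a hypothetical peer name to the bytes that peer has stored.
    If no hash is held by a strict majority, the poll is treated as
    inconclusive and both lists are empty (an assumption of this sketch).
    """
    if not copies:
        return [], []
    # A fresh nonce per poll means a peer cannot reuse an old vote;
    # it has to hash the content it actually holds now.
    nonce = os.urandom(20)
    votes = {peer: vote(nonce, data) for peer, data in copies.items()}
    winning_hash, count = Counter(votes.values()).most_common(1)[0]
    if count <= len(votes) // 2:
        return [], []
    agreeing = sorted(p for p, v in votes.items() if v == winning_hash)
    outvoted = sorted(p for p, v in votes.items() if v != winning_hash)
    return agreeing, outvoted


# Example: library_c holds a corrupted copy and is outvoted.
copies = {
    "library_a": b"article text",
    "library_b": b"article text",
    "library_c": b"article tex\x00",
}
agreeing, outvoted = run_poll(copies)
print("agreeing:", agreeing)
print("should repair:", outvoted)
```

In this sketch an outvoted peer would then fetch a fresh copy from the publisher or from an agreeing peer and could be polled again.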
The LOCKSS license used by most publishers allows a library's readers access to the library's own copy, but does not extend that access to other libraries or to unaffiliated readers; the system does not support file sharing. On request, a library may supply another library with content to effect a repair, but only if the requesting library has previously proved that it had a good copy by voting with the majority. If the reader's browser no longer supports the format in which the copy was collected, a format migration process can convert it to a current format. These limits on the use that may be made of preserved copies of copyrighted material have been effective in persuading copyright owners to grant the necessary permission.
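The repair rule above amounts to a simple eligibility check: a peer supplies content only to a requester it has previously seen vote with the majority on that same material. The following is a minimal sketch of that bookkeeping under stated assumptions; the class name, method names, and the string identifiers for peers and preserved units are hypothetical and are not taken from the LOCKSS code base.

```python
class RepairPolicy:
    """Tracks which peers have shown a good copy of which preserved unit."""

    def __init__(self) -> None:
        # Set of (peer, unit) pairs for which the peer voted with the majority.
        self._proven: set[tuple[str, str]] = set()

    def record_poll(self, unit: str, agreeing_peers: list[str]) -> None:
        """Remember every peer that agreed with the majority on `unit`."""
        for peer in agreeing_peers:
            self._proven.add((peer, unit))

    def may_supply_repair(self, requester: str, unit: str) -> bool:
        """Allow a repair only if the requester once proved it held `unit`."""
        return (requester, unit) in self._proven


policy = RepairPolicy()
policy.record_poll("journal-vol-1", ["library_a", "library_b"])

# library_a proved it held the volume, so it may later ask for a repair.
print(policy.may_supply_repair("library_a", "journal-vol-1"))
# library_x never voted on this volume, so its request is refused.
print(policy.may_supply_repair("library_x", "journal-vol-1"))
```

Keeping the record per unit, rather than per peer, reflects the point of the rule: a repair restores a copy the requester was already entitled to hold and does not become a channel for acquiring new content.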
The LOCKSS approach of selective collection with permission from the publisher, distributed storage, and restricted dissemination contrasts with, for example, the Internet Archive's approach of omnivorous collection without permission from the publisher, centralized storage, and unrestricted dissemination. The LOCKSS system is far smaller, but it can preserve subscription materials to which the Internet Archive has no access.
Because each library administers its own LOCKSS peer and maintains its own copy of preserved material, and because libraries worldwide do so (see the list of participating libraries below), the system achieves a much higher degree of replication than is usual in a fault-tolerant system. The voting process makes use of this high degree of replication to eliminate the need for backups to off-line media, and to provide robust defenses against attacks aimed at corrupting preserved content.
In addition to their role in preserving access, libraries have traditionally made it difficult to rewrite or suppress printed material. The existence of an indeterminate but large number of identical copies on a somewhat tamper-resistant medium under many independent administrations meant that attempts to alter or remove all copies would likely both fail and be detected. Web publishing, based on a single copy under a single administration, provides none of these safeguards against subversion. Web publishing is, therefore, a suitable tool for Winston Smith's job of rewriting history in Nineteen Eighty-Four. By preserving many copies under diverse administration, by automatically auditing the copies at intervals against each other (and, in the future, against the publisher's copy), and by alerting libraries when changes are detected, the LOCKSS system attempts to restore many of these safeguards.
The source code for the entire LOCKSS system carries BSD-style open-source licenses and is available from SourceForge. LOCKSS is a trademark of Stanford University.
See also
- Digital library
External links
- LOCKSS site
- Participating libraries
|