Entry: Web ARChive
Definition

  1. Software

  2. References

  3. External links

{{distinguish|webarchive}}{{Infobox file format
| name = Web ARChive
| icon =
| iconcaption =
| icon_size =
| screenshot =
| screenshot_size =
| caption =
|_noextcode =
| extension = .warc
|_nomimecode =
| mime = application/warc[1]
| type code =
| uniform_type =
| conforms_to =
| magic =
| developer =
| released =
| latest_release_version =
| latest_release_date =
| genre =
| container_for =
| contained_by =
| extended_from = ARC[2]
| extended_to =
| standard = ISO 28500:2017[3]
| free = Yes
| url = {{Url|https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/}}
}}

The Web ARChive (WARC) format specifies a method for combining multiple digital resources into an aggregate archive file together with related information. WARC is a revision of the Internet Archive's ARC file format,[4] which has traditionally been used to store "web crawls" as sequences of content blocks harvested from the World Wide Web. WARC generalizes the older format to better support the harvesting, access, and exchange needs of archiving organizations. Besides the primary content currently recorded, the revision accommodates related secondary content, such as assigned metadata, abbreviated duplicate detection events, and later-date transformations.[5]

WARC is now recognised by most national library systems as the standard to follow for web archival.[6]
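
As an illustration of how several resources and their related information can be aggregated into one file, the following is a minimal Python sketch, not a conforming implementation. It assumes the basic record layout of the WARC 1.1 specification (a version line, named header fields, a blank line, the content block, and two trailing CRLF pairs) and ignores compression, header line folding, and most validation. The helper names `build_record` and `read_records` are hypothetical and chosen only for this example.

```python
import io
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def build_record(record_type, target_uri, payload,
                 content_type="application/octet-stream"):
    """Serialize one simplified WARC record as bytes (hypothetical helper)."""
    headers = [
        ("WARC-Type", record_type),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    head = b"WARC/1.1" + CRLF
    for name, value in headers:
        head += f"{name}: {value}".encode("utf-8") + CRLF
    # A blank line separates the headers from the block; two CRLFs end the record.
    return head + CRLF + payload + CRLF + CRLF


def read_records(stream):
    """Yield (headers, payload) pairs from an uncompressed byte stream."""
    while True:
        version = stream.readline()
        if not version:
            return  # end of file
        if not version.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line")
        headers = {}
        while True:
            line = stream.readline()
            if line in (CRLF, b""):
                break  # blank line: end of the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The block is read by length, so it may itself contain CRLFs.
        payload = stream.read(int(headers["Content-Length"]))
        stream.read(4)  # skip the two CRLFs that terminate the record
        yield headers, payload


# Primary content and secondary metadata stored side by side in one archive.
archive = (
    build_record("resource", "http://example.com/",
                 b"<html>hello</html>", "text/html")
    + build_record("metadata", "http://example.com/",
                   b"note: captured for demo\r\n", "application/warc-fields")
)
records = list(read_records(io.BytesIO(archive)))
for headers, payload in records:
    print(headers["WARC-Type"], headers["Content-Length"], payload)
```

Because every record declares its own `Content-Length`, a reader can move from record to record without interpreting the content blocks. Production tools such as those listed under Software handle the parts of the format that this sketch leaves out.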

Software

  • Heritrix web archiver in Java
  • wget (since version 1.14[7])
  • Webrecorder
  • StormCrawler
  • Apache Nutch

References

1. ^{{cite web|title=application/warc|url=https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/#warc-file-name-size-and-compression|accessdate=17 March 2018}}
2. ^{{cite web|title=Introduction|url=http://archive-access.sourceforge.net/warc/warc_file_format-0.16.html#anchor1|accessdate=5 March 2015}}
3. ^{{cite web|title=Information and documentation -- WARC file format|url=https://www.iso.org/standard/68004.html|accessdate=16 March 2018}}
4. ^{{Cite web|title = ARC_IA, Internet Archive ARC file format|url = http://www.digitalpreservation.gov/formats/fdd/fdd000235.shtml|website = www.digitalpreservation.gov|accessdate = 2015-05-09}}
5. ^{{Cite web|title = WARC, Web ARChive file format|url = http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml|website = www.digitalpreservation.gov|accessdate = 2015-05-09}}
6. ^http://digitalia.sbn.it/article/view/1473
7. ^{{Cite web|title = GNU wget 1.14 released|url = https://lists.gnu.org/archive/html/info-gnu/2012-08/msg00002.html|last = Scrivano|first = Giuseppe|date = August 6, 2012|publisher = Free Software Foundation, Inc.|accessdate = February 25, 2016}}

External links

  • http://archive-access.sourceforge.net/warc/
  • http://bibnum.bnf.fr/WARC/
  • http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml
  • https://netpreserve.org/resources/WARC_Guidelines_v1.pdf
  • https://iipc.github.io/warc-specifications/


