SPEC Kit 329: Managing Born-Digital Special Collections and Archival Materials · 167
University of Michigan
BHL Web Archives: Methodology for the Acquisition of Content
http://bentley.umich.edu/dchome/webarchives/BHL_WebArchives_Methodology.pdf
August 2, 2011
crawl time would result in unnecessary content (for instance, the archivist
only wants to capture a blog’s most recent posts and is not interested in the
entire site).
Capture frequency: designates how often a crawl will be repeated. The
archivist may elect to crawl a site once or configure the robot to perform
daily, weekly, monthly, or custom captures (see Figure 3).
Figure 3
Archivists generally choose the “Custom” option and select an annual capture date,
being mindful of important events/dates that might result in updates to the target
site. (For instance, University of Michigan sites are captured near the beginning or
end of the academic year.) This strategy is particularly effective with ‘aggregative’
websites in which new content is placed at the top/front of pages while older
information is moved further down the page or placed in an ‘archive’ section. For
high-priority targets (such as the University of Michigan Office of the President) or
sites with a large turnover of important content, captures may be scheduled more
frequently.
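The capture frequency options described above can be illustrated with a short scheduling sketch. This is only an illustration of the logic, not the crawler's own interface: the function name `next_capture`, the frequency labels, and the `annual_date` parameter are all hypothetical, and the monthly rule (clamping to day 28) is an assumption made to keep every computed date valid.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def next_capture(last: date, frequency: str,
                 annual_date: Optional[Tuple[int, int]] = None) -> Optional[date]:
    """Return the date of the next capture, or None if the crawl is not repeated.

    All names here are hypothetical; they mirror the options in the text
    (once, daily, weekly, monthly, custom) rather than any real crawler API.
    """
    if frequency == "once":
        return None  # a single capture is never repeated
    if frequency == "daily":
        return last + timedelta(days=1)
    if frequency == "weekly":
        return last + timedelta(weeks=1)
    if frequency == "monthly":
        # Assumption: same day next month, clamped to 28 so the date
        # exists in every month (including February).
        year = last.year + (1 if last.month == 12 else 0)
        month = 1 if last.month == 12 else last.month + 1
        return date(year, month, min(last.day, 28))
    if frequency == "custom":
        # "Custom" is modeled as one annual capture on a fixed (month, day).
        # Assumption: the chosen day is not February 29.
        if annual_date is None:
            raise ValueError("custom frequency needs a (month, day) pair")
        month, day = annual_date
        candidate = date(last.year, month, day)
        if candidate <= last:
            # This year's capture date has passed; schedule next year's.
            candidate = date(last.year + 1, month, day)
        return candidate
    raise ValueError(f"unknown frequency: {frequency}")
```

For example, a site last captured on August 2, 2011 with an annual capture date of September 1 (near the beginning of the academic year) would next be crawled on `next_capture(date(2011, 8, 2), "custom", (9, 1))`, that is, September 1, 2011, and thereafter each following September 1.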
As the foregoing discussion reveals, the accurate and effective configuration of crawl
settings must be based on the archivist’s appraisal of content and understanding of
the target site’s structure. Failure to consider these factors may lead to a capture
that is either narrowly circumscribed and incomplete or unnecessarily broad and
filled with superfluous information.