176 · Representative Documents: Workflows
University of Michigan
Quality Assurance for BHL Web Archives
ii. Viewing the source code of the original page will help to identify
web design features or resources that may not have been captured.
iii. Check live version of archived site (if available) to compare
appearance of archived version.
iv. Check reports/crawl logs to understand issues with the crawl.
1. Look up specific URLs to see if they were captured.
2. Trace progress of crawl, identify where issues arose.
f. If (for MHC or high-priority U of M sites) linked pages have been
captured, determine whether these contain significant information. This may
require consulting the “Hosts” report (or others).
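
The URL lookup in step iv.1 can be scripted when a crawl log is available as a text file. The following is a minimal sketch, not part of the BHL procedure. It assumes a plain-text log with one fetched URL per line and whitespace-separated fields; the function name, the log layout, and the sample lines are hypothetical and should be adapted to the actual report format.

```python
def find_url_in_log(log_lines, url):
    """Return the log lines in which `url` appears as a whole field."""
    matches = []
    for line in log_lines:
        # Assumption: fields are separated by whitespace and the URL is one field.
        fields = line.split()
        if url in fields:
            matches.append(line.rstrip("\n"))
    return matches


# Hypothetical log excerpt, invented for illustration only.
sample_log = [
    "200 http://example.edu/index.html text/html\n",
    "404 http://example.edu/missing.css text/css\n",
    "200 http://example.edu/about.html text/html\n",
]

hits = find_url_in_log(sample_log, "http://example.edu/missing.css")
for hit in hits:
    print(hit)
```

An empty result suggests the URL was never attempted by the crawler; a result with an error status suggests it was attempted but not captured.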
5. For sites with multiple captures:
a. If there are more than 3 captures, review only a sample (i.e., the first,
one in the middle, and the most recent).
b. Check to see if content/features change significantly between captures.
Are these frequent captures necessary? Does older content (such as
course schedules or news stories) tend to stay on the site as it is
updated? Will a less-frequent capture schedule allow us to preserve the
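
The sampling rule in step 5a can be expressed as a short function. This is an illustrative sketch only; it assumes captures are identified by ISO-format date strings (which sort chronologically), and the function name and dates are hypothetical.

```python
def sample_captures(captures):
    """Pick captures to review: all of them if there are 3 or fewer,
    otherwise the first, one in the middle, and the most recent."""
    ordered = sorted(captures)
    if len(ordered) <= 3:
        return ordered
    return [ordered[0], ordered[len(ordered) // 2], ordered[-1]]


# Hypothetical capture dates, invented for illustration only.
dates = ["2012-01-15", "2012-03-15", "2012-05-15", "2012-07-15", "2012-09-15"]
picked = sample_captures(dates)
print(picked)
```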
6. If there is a notable problem with the crawl, identify the underlying cause and
document the issue on the QA spreadsheet.
a. Robots.txt exclusions
b. Crawl limits (timed out)
c. Display errors:
d. Seed redirect
e. ‘Live links’—rendering error
f. Missing .css files
g. Resources not in archive (partial)
h. Seed issues: did not capture (at all)
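
If the QA spreadsheet is kept as a CSV file, the causes listed above could serve as a controlled vocabulary so that issues are recorded consistently. The sketch below is one possible approach, not the BHL's actual spreadsheet layout; the column names, the helper function, and the example seed URL are assumptions.

```python
import csv
import io
from enum import Enum


class CrawlIssue(Enum):
    """Underlying causes from step 6, used as fixed spreadsheet values."""
    ROBOTS_EXCLUSION = "Robots.txt exclusions"
    CRAWL_LIMIT = "Crawl limits (timed out)"
    DISPLAY_ERROR = "Display errors"
    SEED_REDIRECT = "Seed redirect"
    LIVE_LINKS = "'Live links' rendering error"
    MISSING_CSS = "Missing .css files"
    PARTIAL_RESOURCES = "Resources not in archive (partial)"
    SEED_NOT_CAPTURED = "Seed issues: did not capture (at all)"


def qa_row(seed_url, issue, note=""):
    """Build one spreadsheet row; the column order is hypothetical."""
    return [seed_url, issue.value, note]


# Write to an in-memory buffer here; a real file would be opened with newline="".
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["seed_url", "issue", "note"])
writer.writerow(qa_row("http://example.edu/", CrawlIssue.MISSING_CSS, "stylesheet not captured"))
csv_text = buffer.getvalue()
print(csv_text)
```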