
Harvesting websites with Adobe Acrobat Professional 7.0

The Paradigm pilot experience

Using Adobe Acrobat Professional 7.0 to archive websites is straightforward and intuitive. We were particularly impressed with the speed at which the software captured entire sites, though most of the sites captured in the pilot were neither large nor complex. The longest capture time recorded was 9 minutes, for Richard Allan's weblog, which comprised 1581 pages.

The Paradigm project used the default settings during its pilot, so we cannot report on the impact of changing any of the options in the Settings dialogue box. We did not test the 'Stay on same path' or 'Stay on same server' options because we were interested in capturing whole sites.
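The pilot did not exercise the two scoping options, but the idea behind them can be sketched in a few lines. The Python below is only an illustration of our reading of the option names: 'Stay on same server' is taken to mean "follow a link only if it points at the same host", and 'Stay on same path' to mean "follow a link only if it sits at or below the starting page's directory". The function names and the example.org URLs are hypothetical, and this is not a description of how Acrobat itself implements the options.

```python
from urllib.parse import urlsplit


def same_server(start_url, candidate_url):
    """True if both URLs point at the same host (assumed meaning of 'Stay on same server')."""
    return urlsplit(start_url).netloc.lower() == urlsplit(candidate_url).netloc.lower()


def same_path(start_url, candidate_url):
    """True if the candidate is on the same host and at or below the
    starting page's directory (assumed meaning of 'Stay on same path')."""
    if not same_server(start_url, candidate_url):
        return False
    start_path = urlsplit(start_url).path
    # Directory of the starting page, e.g. "/a/index.html" -> "/a/".
    base = start_path[: start_path.rfind("/") + 1] or "/"
    candidate_path = urlsplit(candidate_url).path or "/"
    return candidate_path.startswith(base)


# Hypothetical URLs, for illustration only.
start = "http://www.example.org/a/index.html"
print(same_server(start, "http://www.example.org/b/x.html"))    # same host, different directory
print(same_path(start, "http://www.example.org/b/x.html"))      # outside the starting directory
print(same_path(start, "http://www.example.org/a/sub/x.html"))  # below the starting directory
```

With neither restriction applied, as in the pilot, every link found is a candidate for capture.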

Initial captures were made using the default 'Get only 1 level(s)' option rather than 'Get entire site'. Using this option, we found that once we had captured the home page (i.e. the 1 level), clicking on a link in the newly created Adobe PDF file would instruct the software to create a PDF of the page selected. This feature can be used to capture entire websites manually by systematically clicking on each link from the site's home page menu bar(s).
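The difference between capturing one level and capturing an entire site can be pictured as a breadth-first walk over a site's links with an optional depth limit. The sketch below is a minimal illustration in Python that works on a small in-memory link map rather than a live website. The `capture` function, the `SITE` map and its page names are all hypothetical, and the code is not a description of Acrobat's internals.

```python
from collections import deque


def capture(links, start, levels=None):
    """Return pages in the order a breadth-first capture would reach them.

    links  -- dict mapping each page to the pages it links to
    start  -- the home page (level 1)
    levels -- number of levels to capture; None means the entire site
    """
    captured = [start]
    seen = {start}
    queue = deque([(start, 1)])
    while queue:
        page, level = queue.popleft()
        # Stop following links once the requested depth has been reached.
        if levels is not None and level >= levels:
            continue
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                captured.append(target)
                queue.append((target, level + 1))
    return captured


# Hypothetical four-page site.
SITE = {
    "home": ["about", "news"],
    "about": ["team"],
    "news": [],
    "team": [],
}

print(capture(SITE, "home", levels=1))  # home page only
print(capture(SITE, "home"))            # every page reachable from home
```

Clicking links one by one in a 1-level capture corresponds to extending this walk by hand, one page at a time, which is why the approach becomes laborious for larger sites.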

An example - the Paradigm project website

This website was captured on 16 March 2006. As you can see from the image below, Acrobat retains the structure of the website, but it is unable to render the institutions' logos in their proper place.

Adobe's capture of the 'about' section of the Paradigm website

This is the full snapshot of the website:

Conclusions

Upside

Downside

Adobe Acrobat Professional 7.0 is simple to use, but the manual intervention required means that it is most suitable for small-scale web preservation. The software costs around £250, unless your institution has a site licence.
