Some old news about digitisation
Last year I gave a couple of workshops about the digitisation of cultural materials by libraries, museums, archives and the like. I didn’t quite get around to sharing the presentation because it contained a lot of material not evident on the 40-odd slides that I used, so here it is in relation to the relevant slides as presented by SlideShare. Apologies in advance for the long post, but it was a half-day workshop.
I’ve not had the time to go through this and check all of the hyperlinks, so if you find some that are not functioning, please let me know.
Links used in this presentation:
Most can be found via http://del.icio.us/malbooth
AWM home page http://www.awm.gov.au/
Unit war diaries online http://www.awm.gov.au/diaries/
Private records currently online (eg) http://www.awm.gov.au/findingaids/process.asp?collection=private&item=100days
Official Histories online http://www.awm.gov.au/histories/index.asp
I managed the Memorial’s Research Centre 2001-2008 and curated the 2007-2008 exhibition Lawrence of Arabia & the Light Horse. In that exhibition, we borrowed extensively from UK cultural institutions and collections and we also highlighted little known aspects of our own collection including recently digitised war diaries of the Light Horse Brigades and Regiments. It wasn’t sufficient to just shovel that up on the web in a database: we used a blog to draw some material out, relate it to the stories we were telling in the exhibition itself and to expose recently digitised war diaries from the First World War.
Avoiding missed opportunities
- Learning to compromise – adjust targets and use good enough when appropriate (the 80:20 principle)
- Not everything can happen at once, so prioritise and be patient & generally your users will be patient too
- The rest of the world won’t wait and it isn’t appropriate for us to make them wait for our content (nobody is really that unique)
- Positioning yourself to be able to take advantage of advances in IT – getting over the initial sea-sickness and becoming used to the dynamism of info technology and the opportunities in economies of scale and ‘commoditisation’
- Searchability & findability
- Sharing & cooperating with others (nobody has everything) – and this means the public can help you too!
- Creative Commons?
- Free content – I don’t think pay-per-view digitised content helps anyone much at all
Innovate or die
- Experiment & make a few mistakes – learn from them – call it “active learning”
- Don’t just be restricted to what others have done with new technologies – vary your digitised content & the way it is presented: podcasts, blogs, wikis, search, SlideShare, Del.icio.us, Google, etc.
- Surprise your users! Anticipate their needs and give them something they don’t even know they want yet.
- Don’t ask users what they want – it is like driving in the rear view mirror. Keep in touch with them and understand their needs.
- People never do what they say they will do, so don’t ask them, prototype solutions as early as possible.
- Don’t be a slave to all the old rules – some are now irrelevant or obsolescent. In some cases they were always useless!
For Internet Archive, see this recent article: http://www.wired.com/entertainment/theweb/multimedia/2008/03/gallery_internet_archive
PORTICO – http://www.portico.org/
Dynamism: digitisation is a dynamic field and there are no set or concrete answers. While I was researching, new papers emerged on the use of JPEG 2000 and on the digital curation cycle, and I had to touch on both of those. I think the only way to stay up-to-date is to become an avid and regular reader of selected relevant blogs.
Preservation: There are some who don’t believe or understand the essential link between any digitisation program and preservation. But it is there and it is there in two forms. Firstly the BENEFIT: because we do digitise as a preservation strategy. In our institution we have preserved documents and images that could only have been saved using digital techniques. Thermal papers meant historic documents were disappearing before our eyes and acetate syndrome was destroying rare photos. There is also an often disregarded preservation benefit in giving access to digital surrogates which prevents or minimises the risk involved in allowing physical access to rare, fragile or unique collection items. Secondly, the OBLIGATION: whatever you create through digitisation programs or projects needs to be preserved: through a curatorial life cycle, just like other collections do, but with different requirements as applicable to digital objects and collections.
Learning by Playing: Adults learn best by doing (at least in my experience) – sorry, but I think it is true and this is my blog. I’ve also been involved in digitising archival and museum materials for a long time now and I reckon we’ve learned more through our projects than any courses any of us have ever undertaken. So my motto would be “start now and learn by doing”. The chances are the authorities will probably go for hardened criminals like mass murderers before they come after you, so you’ve got a bit of time up your sleeve.
Management & Planning: A lot of useful material of late about digitisation has been indicating the importance of abiding by sound project management principles and using appropriate planning methodologies in your initiatives. This greatly assists us when the authorities (decision makers and purse holders) come after us or don’t understand what it is we are doing and why we are doing it.
Compromise: This comes up continually in our projects and you won’t see it in any of the theories emanating from academia and various standards organisations. The fact is that hardly anyone I know in this field meets all the criteria and principles that are mapped out for them or even mapped out by them. All the practitioners have made compromises somewhere, whether it be in metadata, file formats, digital preservation, QA, storage, evaluation, reports or many other critical elements of digitisation.
Access: Providing access to the product or output of your digitisation programmes and providing it in the best possible way (so that it is easy for all to find and use) is essential for at least those of us in cultural institutions. It is part of the curatorial life cycle and must be considered a valid and critical element of any initiative.
Unit war diaries: http://www.awm.gov.au/diaries/
Images (example): http://cas.awm.gov.au/item/ART25701
Official histories: http://www.awm.gov.au/histories/index.asp
We didn’t really advertise it, but we gradually phased out the old photocopiers in our copying service and replaced them with ‘multi-function devices’ that can scan and copy, and we set out to provide digital copies where convenient to both the client and ourselves. So 90% of on-demand copying from our archives became digital and therefore required no re-scanning.
This operation required a compromise in the resolution that we digitised at – in order to meet the speed needed to keep up with user demand. In time, technology will draw the resolution that this program achieves together with that of our other preservation standard digitisation programs.
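To give a feel for why resolution is where the compromise falls, here is a rough back-of-envelope sketch of uncompressed file sizes. The A4 page size and 24-bit colour depth are my assumptions for illustration, not the settings we actually used.

```python
def uncompressed_size_mb(width_in, height_in, dpi, bits_per_pixel=24):
    """Approximate uncompressed image size in megabytes (10**6 bytes)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel / 8 / 1_000_000


# Assumption: an A4 page, roughly 8.27 x 11.69 inches, scanned in 24-bit colour.
a4_300 = uncompressed_size_mb(8.27, 11.69, 300)
a4_600 = uncompressed_size_mb(8.27, 11.69, 600)
print(f"300 dpi: {a4_300:.1f} MB, 600 dpi: {a4_600:.1f} MB")
```

Doubling the resolution quadruples the pixel count, so a page of roughly 26 MB at 300 dpi becomes roughly 104 MB at 600 dpi before any compression, with scanning time and storage growing accordingly.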
In 2009 the Memorial provided about 300,000 images to Picture Australia (I’m not sure how many are now on PA in total). It is all automatic: collection records go to the collection access system (or OPAC) each night, and PA harvests every night too, so images get to PA within 2 days.
We use OAI-PMH to do this – to be discussed later in the presentation.
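For anyone who hasn’t seen the protocol, here is a minimal sketch of what a harvester does: it asks a repository for records over plain HTTP and reads back XML. The `ListRecords` verb, the `oai_dc` metadata prefix and the `from` argument are part of the OAI-PMH standard; the endpoint URL, the sample record and the function names are hypothetical, and this is not a description of how Picture Australia’s harvester is actually built.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, from_date=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL; from_date limits it to newer records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", OAI_NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=OAI_NS)
        title = record.findtext(".//dc:title", namespaces=OAI_NS)
        records.append((identifier, title))
    return records


# Hypothetical response, trimmed to the parts the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:item-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example photograph</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://example.org/oai", from_date="2009-01-01"))
print(parse_records(SAMPLE))
```

A real harvester would also fetch the URL, follow resumption tokens for large result sets and handle deleted records; the sketch leaves those out to keep the request and response shape visible.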
I think Picture Australia is a great example of a collaborative approach, in this case regarding ACCESS to digitised collections across our nation.
Deciding to Digitise
Reasons for embarking on a digitisation project might include:
- Providing better access to unknown or little used collections
- Offering better search and retrieval facilities for an image collection
- Providing a better understanding of original works through improved indexing or some form of digital image enhancement
- Creating resources that are suitable for use in research, learning and teaching
- Enhancing the public knowledge, recognition and understanding of the collection
- Fragility of collections or objects within collections
- Uniqueness, so loss of originals would be catastrophic (creation of preserved digital surrogates)
- Relationship to other digitised collections
DCC’s Appraisal & Selection http://www.library.cornell.edu/preservation/tutorial/contents.html and TASI’s Selection Procedures http://www.tasi.ac.uk/advice/creating/selecpro.html
The NISO Framework of Guidance for the Building of Good Digital Collections may be of assistance in the development and achievement of a set of principles (covering collections, objects, metadata & projects) for your digitisation project(s). See http://www.niso.org/framework/ (it was being updated in 2008-09)
Collections (organised groups of objects)
1. An agreed and documented collection development policy
2. Sound collection description (metadata covering scope, format, access restrictions, ownership, authenticity, integrity and interpretation)
3. The collection is curated over its lifecycle (data management, archiving and digital preservation)
4. Broad access to all (usability)
5. Respect for intellectual property (see IPRIA Guidelines: Copyright and Cultural Institutions: Short Guidelines for Digitisation by Emily Hudson & Andrew T Kenyon)
6. Evaluation mechanisms for use and usefulness (metrics)
7. Interoperability (metadata is shared; fit within a broader context)
8. Integration of staff and user workflows (where possible/relevant – Web 2.0)
9. Sustainability of the collection and continued usability
Objects (digital assets)
1. Production ensures collection priorities & maintains interoperability and re-use (NISO Framework 3rd edition includes a useful table on the reformatting of non-digital materials on pages 28-36; see also http://www.digitalpreservation.gov/formats/index.shtml).
2. Preservability: persistence & accessibility over time; across evolving media, software & formats
3. Meaningful outside its context: portable, reusable, interoperable
4. Persistent identifiers: URLs or URIs
5. Authentication: veracity, accuracy & authenticity
6. Inclusion of associated metadata: descriptive, administrative & structural (if needed)
Very briefly, metadata may be grouped or described as:
Descriptive: Facilitates discovery and describes intellectual content;
Administrative: Facilitates management of digital and analog resources;
Technical: Describes the technical aspects of the digital object;
Structural: Describes the relationships within a digital object; and
Preservation: Supports long-term retention of the digital object and may overlap with technical, administrative, and structural metadata
1. Appropriate to materials, users and use
2. Support for interoperability: mappings & crosswalks between schemes
3. Use of authority control and content standards
4. Includes a clear statement on conditions of use for the objects (eg. fair use)
5. Support for long term management, eg. the PREMIS working group’s metadata set (see http://www.oclc.org/research/projects/pmwg/ )
6. Metadata records are treated as digital objects
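The crosswalks mentioned in point 2 are easier to picture with a small example. This is only a sketch: the handful of MARC-to-Dublin Core pairings follows the commonly published Library of Congress crosswalk, but the "260c" key is my own shorthand for subfield c of field 260, and the sample record is made up.

```python
# A tiny subset of a MARC -> unqualified Dublin Core mapping.
# Assumption: "260c" is shorthand here for field 260, subfield c.
MARC_TO_DC = {
    "100": "creator",
    "245": "title",
    "260c": "date",
    "520": "description",
    "650": "subject",
}


def crosswalk(marc_fields):
    """Map (tag, value) pairs to a dict of Dublin Core elements.

    Repeated tags (such as several subjects) are collected in a list;
    tags without a mapping are skipped rather than guessed at.
    """
    dc_record = {}
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue
        dc_record.setdefault(element, []).append(value)
    return dc_record


# Hypothetical record for illustration only.
example = [
    ("245", "Example unit diary"),
    ("100", "Example, Author"),
    ("650", "World War, 1914-1918"),
    ("650", "Light horse"),
    ("999", "local note with no Dublin Core equivalent"),
]
print(crosswalk(example))
```

Note what happens to the unmapped local field: it is silently dropped. Real crosswalks lose detail in the same way, which is one more place where compromise creeps in.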
Further reading: DCC’s Metadata; TASI’s Metadata Overview, Metadata Standards and Interoperability & Getting Practical with Metadata
There are several pretty reasonable and brief introductions to the following metadata schemes or models on Wikipedia.org: METS, DCMI, EAD, MARC, RDF as well as the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which is used to enable metadata harvesting for Picture Australia.
See also RUBRIC’s excellent metadata overview page here:
Digitisation initiatives (the creation & management of collections)
1. A substantial design and planning component (see TASI’s Generic Image Workflow)
2. Appropriate staffing and expertise (see particularly: http://www.tasi.ac.uk/advice/managing/managing.html )
3. Best practice project management (see TASI’s Project Management and Cornell’s Management 1 & 2)
4. An evaluation plan (see http://www.useit.com/papers/heuristic/ )
5. A project report that documents the process & outcomes
6. Consideration of the entire lifecycle (ongoing management)
See: Melbourne Law School Legal Studies Research Paper No. 141 February 2006: Copyright and Cultural Institutions: Short Guidelines for Digitisation by Emily Hudson and Andrew T Kenyon
& http://www.copyright.org.au/publications/books/b130.htm from the Australian Copyright Council
Much of the AWM collection was acquired before Copyright legislation was enacted and well before we had professional archivists and curators on staff who understood processes such as transfer of ownership.
In addition, where copyright owners were recorded, there is a difficulty associated with tracking their own movements and addresses.
Orphaned works are numerous in the collection – where there are no known copyright owners or creators of the works in question. It became a risk management issue.
Most of the Private Records (personal manuscripts) consist of unpublished works, for which copyright exists in perpetuity (although there are still some provisions relevant to libraries for research purposes).
Multiple owners of works, and those responsible for underlying works, are very difficult to deal with; this is particularly so with collections such as sound and film works. Many creators may also have been involved in the production of things such as prisoner-of-war camp concert programs, troopship serials, etc.
In some cases, further restrictions are imposed by owners when they donate collections – such as access only within a designated Reading Room, or for certain purposes or only after a certain date or death.
The libraries and archives provisions set out instances in which cultural institutions can reproduce collection items, and communicate them to the public. See the IPRIA Guidelines.
The digitisation of published material presents some interesting challenges concerning publisher’s rights, subsequent publishers, authors, new contributors, etc.
The AWM Act encourages staff to use every endeavour to make the most advantageous use of the collection and I firmly believe that means using digitisation to provide online access to as many collections as is feasible. It is no longer realistic or sufficient to expect people to view these extensive and historic collections only in a Reading Room.
We often talk about a risk management approach, but few organisations actually take real steps. Some examples can be seen with digital material provided online by SLSA, SLQ and the Library of Congress. We are also slowly and carefully moving in the same direction and have already used such a disclaimer for certain small projects.
Under the recent revisions to the Copyright Act, it appears that a new community wide approach is close to being agreed.
TIFF or JPEG; 300 or 600 dpi; metadata (nobody is perfect)
In-house or outsource – experiment & learn
The preservation principle (use-neutral scanning) – there are always exceptions/compromises (eg. captured Japanese documents)
Some advantages of outsourcing
- The contractor or bureau is responsible for capital equipment costs
- The contractor or bureau is responsible for hiring and the training of operators
- You don’t have to find large spaces for people and equipment to be housed to undertake the project
- You don’t have to develop an in-depth knowledge of digitisation
- It may be cheaper than digitising in-house if the bureau has labour saving technology or is able to achieve good economies of scale
- The bureau may be able to achieve a better quality result if they have high-end equipment
- You should be able to get a better fix on costs and timescales, since the bureau will have a good understanding of its workflow and will have made a contract with you
Some disadvantages of outsourcing
- You don’t develop institutional knowledge or capacity in digitisation and capture techniques
- There are risks associated in working with external parties to deliver digitised content (see below)
- You have to develop some understanding of digitisation projects in order to deal competently with an external digitisation bureau – understanding technical terminology, understanding and agreeing technical specification and quality control within a Service Level Agreement or contract
- The bureau may not be able to accommodate your timescales (or meet all of your needs)
- A bureau may charge you more than it would cost to digitise in-house (this is especially so with oversized or non-standard materials)
- You will still need to operate your own internal quality control checking to ensure the quality of the bureau’s work before you sign the invoice (you can’t take their word for it)
- You will also need to arrange transport and, for valuable items, if they are allowed off your premises, insurance
- There may be some risk to your material if it is going off-site
- Staff and expertise can change at a bureau suddenly and after contracts are signed
Mostly from TASI http://www.tasi.ac.uk/
- Choosing a bureau – In choosing a bureau, ask for samples of their work and reference sites for previous work. Make sure that you talk to previous clients and find out about their experiences. Remember that the cheapest is not necessarily the best; so don’t base your decision-making on cost/time factors alone.
- The range of material to be digitised – Will the digitisation only involve scanning a collection of similar sized originals or could it involve scanning and digitally photographing oversize and 3D originals? Some bureaux will not be able to offer both a scanning and photographic service. If the range of material is not discussed from the outset, the final cost could be much higher than the original estimate.
- Collaborating with others in outsourcing – Often the more you outsource the cheaper the rate. In some circumstances it may be worthwhile joining with other collections to increase the number of works being sent: but this needs to be carefully managed and all parties need to agree to common standards for capture and file formats.
- Quality Assurance – Make sure you see samples of the work before the whole job is done: build in a QA procedure and budget time to check work.
- Metadata – You need to give some thought to metadata, especially technical metadata. Are you going to get the bureau to record this too, or will you deal with all the metadata yourself? Will you outsource some of this too (e.g. OCR/text transcription)?
- Partial Outsourcing – It’s not necessarily a matter of either/or: you may only want to outsource part of your digital capture or metadata entry. This is commonly done where there is outsized or delicate material. We have found this to be the best solution for very large and complex projects and where we simply don’t have all the required expertise.
Investing in an Intangible Asset – http://www.dcc.ac.uk/resource/curation-manual/chapters/intangible-asset
- A major problem lies in the lack of a reliable and objective valuation of intangible assets, which gives rise to deficiencies in the information available to shareholders, business analysts and managers taking investment decisions.
- Long term digital preservation may be viewed as a form of intangible investment that shares the difficulty of setting values on the expected stream of benefits of preservation over time. This prevents a clear-cut investment case being made for digital preservation as a long term on-going programme in any organisation, yet the costs of not preserving in many cases could be high if action is not taken.
- More and better information on both costs (under different technical and organisational regimes) and benefits is needed to provide the incentive for the managers of investment to take robust decisions.
- Digital materials are not homogeneous and the economic and management properties of different types of materials need to be explored in more detail, including empirical measurement of value influences, time scales and potential ‘markets’ for the future usage.
- The present relatively immature stage of digital preservation leaves considerable scope for market creation and development, which in turn seems likely to become an important area for research and experimentation.
- The cultural heritage community has a clear sense of the importance of long term digital preservation programmes, but faces challenges in presenting attractions and incentives for the controllers of investment resources. Greater credibility for investment cases requires the development of business cases based on strong empirical evidence, clear cause and effect relationships and alignment with institutional or business strategic objectives. A promising approach to this is the modified balanced scorecard model currently being pioneered, bringing together the interests of the various stakeholders in a paradigm that bridges the gap between business decision takers and the information professionals.
- Digital preservation is, of course, dependent on technological developments but as argued here, the management and organisational conditions are equally vital. Both dimensions matter and will interact with each other. It is worth remembering, then, that the introduction of long term preservation programmes constitutes an organisational innovation which itself will require to be managed effectively, especially since there is no guarantee of private and organisation interests coinciding.
AWM Digitisation Project Process – Main Steps:
- Appraise & Scope Digital Project (includes rights and permissions)
- Determine Specifications and Purpose
- Estimate Resources Needed
- Create Databases
- Prepare Collection Items
- Scan or Photograph, Manipulate & Save Images
- Check Image Integrity (QA)
- Image Conservation and Derivative Creation
- Metadata Creation
- Image Back-up
- Create Export for Web & Create Web Pages
- Ongoing Digital Asset & Database Management
- Retrieval of Preservation Images for Publication & Exhibition
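The integrity check and back-up steps in the list above are often supported by recording a checksum for each master file and re-checking it later. The sketch below shows that general idea only; it is not a description of the AWM’s own QA procedure, and the function names and the `*.tif` pattern are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Checksum a file in chunks so large master images fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder, pattern="*.tif"):
    """Record a checksum for every matching file in a folder."""
    return {p.name: sha256_of(p) for p in sorted(Path(folder).glob(pattern))}


def changed_files(folder, manifest):
    """List files that are missing or no longer match their recorded checksum."""
    problems = []
    for name, expected in manifest.items():
        path = Path(folder) / name
        if not path.exists() or sha256_of(path) != expected:
            problems.append(name)
    return problems
```

You would build the manifest once when the images are saved, keep it alongside the back-up, and run `changed_files` after each copy or migration. A checksum only tells you a file has changed, not whether the scan was any good in the first place, so it complements visual QA rather than replacing it.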
Note: this was initially circulated as a draft by the UK’s DCC (JISC) in 2008-09; see www.dcc.ac.uk.
I am yet to fully understand how the model works, but believe it is a very useful draft concept to map out the digital curatorial process as a cyclical one. I do think we need to look at this business as a curatorial process and not simply as a “scanning project”. It looks as though some of their definitions and explanations need to be clearer, and they should note whether they include some of the processes that I think are missing in the draft (see below).
The Curation Lifecycle
The DCC Curation Lifecycle Model provides a graphical high level overview of the stages required for successful curation and preservation of data from initial conceptualisation or receipt. The model can be used to plan activities within an organisation or consortium to ensure that all necessary stages are undertaken, each in the correct sequence. The model enables granular functionality to be mapped against it; to define roles and responsibilities, and build a framework of standards and technologies to implement. It can help with the process of identifying additional steps which may be required, or actions which are not required by certain situations or disciplines, and ensuring that processes and policies are adequately documented.
Some other stages and processes that need to be incorporated somewhere into this model include:
- setting up the IT infrastructure
- procuring hardware, software, staff, partners/vendors (as needed)
- formulating policies, guidelines & setting up workflows
- obtaining or confirming copyright permissions
- preparing materials for digitisation (document/photo conservation & preparation)
- design, delivery & release of digital collections to users (web, optical devices, publication, etc.) – although that could be assumed to be part of “Access, Use and Re-use”.
JPEG 2000 http://www.jpeg.org/jpeg2000/index.html
DNG http://www.adobe.com/products/dng/ (as a growing solution for those who need to keep camera raw files in a non-proprietary format)
Of note is some recent (March 2008) research from the National Library of the Netherlands on the storage of master files. It indicates the clear leadership of JPEG 2000 (against JPG, TIFF and PNG) in a comparative matrix measuring each for: standardization; storage savings; image quality; long term sustainability; and functionality. I think there is a lot in this study as far as mass digitisation programs are concerned, but I know that JPEG 2000 is not very popular with professional photographers (purists) who favour keeping TIFF and RAW files.
See also JPEG 2000 as a Preservation and Access Format for the Wellcome Trust Digital Library by Robert Buckley & Simon Tanner http://library.wellcome.ac.uk/assets/wtx056572.pdf
TASI – The Digital Image, File Formats and Compression, New Digital File formats & Choosing a File Format
DCC – File Formats
Cornell – Common Image File Formats
NARA digitizing-archival-materials http://www.archives.gov/preservation/technical/guidelines.html
Harvard – Page Image Compression for Mass Digitization http://preserve.harvard.edu/massdig/hul_study/index.html
For further reading:
Cornell: How scanners work; Scanners comparison; Scanners types
TASI: Scanners; Digital Cameras
NINCH: Equipment http://www.nyu.edu/its/humanities/ninchguide/appendices/equipment.html
TASI’s Image Editing Software.
DCC’s Open Source for Digital Curation
Just doing it http://www.awm.gov.au/histories/volume.asp?conflict=1
Facebook ArtShare http://www.facebook.com/profile.php?id=688691662
Facebook page http://www.facebook.com/pages/Canberra-Australia/Australian-War-Memorial/7244252524
How to create a Google-friendly site http://www.google.com/support/webmasters/bin/answer.py?answer=40349&topic=8522
TASI’s Using Optical Media for Digital Preservation, An Introduction to Digital Preservation, & Establishing a Digital Preservation Strategy
DCC’s Preservation Strategies for Digital Libraries, & Preservation Metadata
OCLC’s (RLG) & CRL’s Trustworthy Repositories Audit & Certification (if you can find it)
Most of our needs will be familiar to everyone. So, the things we “discovered”:
- Some more mechanical discoveries included the simple mechanics of dismantling pages & scanning UN-NUMBERED pages of photos/illustrations & then how to index these online
- Innovative solutions were provided that we could not envisage before doing it for real – discovery by active learning
- Some form of collaboration seems possible with a keen private sector firm who also see mileage in this for their own skills
- Money purely for access projects seems hard to find (for us), but there is money for creation of new collections via digital preservation, and sponsorship of digitisation projects, even on a small scale, is possible
- We also discovered in-house enthusiasm & expertise that was keenly applied to the project and that it had some unexpected technological spin-offs (re metadata, etc.)
- Some suppliers know about pricing and have the right equipment, but do not know how to use it for our purposes
- Ensuring a quality product that we are happy with has not always been possible through automation
- I doubt we have had any digitisation projects where we have not had to make some compromises