Tyler Danstrom edited this page Aug 15, 2017 · 5 revisions

Introduction

Metadata storage must be able to answer the following questions about an intellectual unit.

  1. What collection(s) does this collection belong to? This is the primary intellectual access point.
  2. What is the title of this collection? This is a secondary intellectual access point.
  3. What is the publication date of this collection? This is a secondary intellectual access point.
  4. Who is the creator of this collection? This is a secondary intellectual access point.
  5. Where can I find a representation of this unit? This is the primary access point.

We have to assume that there will be AT LEAST two types of metadata in the ldr metadata storage system.

  1. Metadata about collections that belong to the University of Chicago Library and whose associated byte streams are all in asset storage, e.g. OwnCloud
  2. Metadata about collections that do not belong to the University of Chicago Library and whose byte streams are stored elsewhere, e.g. Luna

This means that, in order to provide REQUIRED functionality, the ldr metadata storage must be able to retrieve byte streams from either asset storage or any arbitrary outside storage accessible over the Web. It is therefore MANDATORY that all publicly available byte streams be available over the Web. The system must also be able to distinguish a remote asset from a library-controlled asset in order to know where to point clients for the location of byte streams, so each identifier must carry some marker that the ldr metadata storage can use to make this distinction. One marker will lead the ldr metadata storage to locate the byte streams through the digcoll retriever. The other will tell it to check whether the address is valid. If the address is not valid, the ldr metadata storage should notify its administrators as well as the person attempting to ingest the material. Any identifier that points to an asset controlled by the library MUST be a URI, because the ldr metadata storage must know from what host to pull the byte streams and we must assume that this host will change over time. Any identifier that points to a remote asset must be a URL. This is how the system will distinguish a library-controlled asset from a remote asset.
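The distinction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ldr implementation: it assumes that a remote asset is identified by an `http` or `https` URL with a host, and that any other identifier with a scheme (for example a `urn:`) is a library-controlled URI. The function name `classify_identifier`, the constant `WEB_SCHEMES`, and the example identifiers are all hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: only these schemes count as Web-resolvable URLs for remote assets.
WEB_SCHEMES = {"http", "https"}


def classify_identifier(identifier: str) -> str:
    """Return "remote" for a Web URL and "library" for any other URI.

    Raises ValueError when the identifier has no scheme, or when it uses a
    Web scheme but names no host, since neither form can be acted on.
    """
    parts = urlsplit(identifier)
    if not parts.scheme:
        raise ValueError(f"identifier has no scheme: {identifier!r}")
    if parts.scheme in WEB_SCHEMES:
        if not parts.netloc:
            raise ValueError(f"URL has no host: {identifier!r}")
        return "remote"
    # Hypothetical rule: every non-Web URI is treated as library-controlled,
    # so the current host is looked up at retrieval time, not stored here.
    return "library"
```

Under these assumptions, a `"library"` result would be handed to the digcoll retriever, while a `"remote"` result would go on to the address validity check and, on failure, to the notification step.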

We also have to assume that there will be a variety of descriptive metadata formats used. These are the metadata formats currently being used in library digital collections.

  • TEI
  • EAD
  • MODS
  • VRACore
  • OCR
  • MARC

This means that the metadata storage must be able to store a variety of metadata formats that are not actionable by the metadata storage. However, the metadata storage must have a single schema that it can interpret in order to provide answers to the five questions defined earlier in this document.
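One way to picture the split between the single interpretable schema and the opaque native formats is the sketch below. It is an assumption-laden illustration rather than the actual ldr schema: the class `CoreRecord`, its field names, and the helper `answer_questions` are hypothetical, and dates are kept as plain strings for simplicity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CoreRecord:
    """Hypothetical core schema: one field per question the storage must answer."""
    collections: List[str]            # 1. collection(s) the unit belongs to
    title: str                        # 2. title
    publication_date: Optional[str]   # 3. publication date, kept as a string
    creator: Optional[str]            # 4. creator
    representation: str               # 5. identifier locating a representation
    # Native descriptive metadata is stored as-is and never interpreted.
    native_format: Optional[str] = None   # e.g. "MODS", "EAD", "TEI"
    native_payload: bytes = b""


def answer_questions(record: CoreRecord) -> Dict[str, object]:
    """Answer the five questions using only the core fields."""
    return {
        "collections": record.collections,
        "title": record.title,
        "publication_date": record.publication_date,
        "creator": record.creator,
        "representation": record.representation,
    }
```

The design choice illustrated here is that the native payload travels with the record but plays no part in answering the five questions, so adding a new descriptive format would not require a change to the core schema.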

COROLLARY: the metadata storage should be able to store technical metadata about byte streams in digital collections. Technical metadata is currently being stored in asset storage, but there is a strong argument to be made that doing this "muddies the water" between asset and metadata. An asset ought to be strictly defined as a byte stream representing an intellectual unit or some portion of an intellectual unit. Technical metadata is, by definition, not a byte stream representing the whole or some part of an intellectual unit, but rather information about that byte stream. By storing it, the asset storage is being forced to perform a task that violates its primary function.

Table of Contents

  1. List of Endpoints
  2. Specifications for Each Endpoint
  3. Description of the Different Kinds of Responses to Expect
  4. Expectations for Core Metadata
  5. Expectations for Proxy Metadata
  6. How to Add Data to the System
  7. Glossary of Terms