DRAFT 04 - December 23, 2003

 

 

Observations on the relationship between container elements of an EAD container list

 

 

Abstract

 

A number of EAD container list examples show uses of the <container> element that seem to be at odds with one another, sometimes within the same list. This note provides some observations on the issues involved and raises a question about the lack of a definition of implied relationships between container elements.

 

 

Background

 

Most textual documents describing a container list rely on context to convey the containers, such as boxes and folders, that hold the items in the collection. For a human reader there is generally enough information within a page or two to determine the complete container description. Sometimes the description is given in full, as in box2:folder3. Sometimes there is a folder column in which repeated entries are omitted after the first, so the reader looks back up the page to find the last folder noted. Boxes are sometimes shown in a similar column, and sometimes the box is shown only when it changes, so again the reader looks back to find the last box noted.

 

In any case, what is needed is a way to find, unambiguously, the full set of information that identifies the complete container for an item. Sometimes this is just a folder, just a box, or a folder within a box; notations such as ':folder', 'box:' and 'box:folder' describe the container completely in these cases. This note deals only with boxes and folders, but the general problem can be considerably more complex (box:folder:page, carton:reel:frame, oversize folder 3, etc.).

 

Similar issues arise when converting a container list from a textual document to an EAD XML document. The best discussion I have found is in Section 3.5.2.4, "Physical Location and Container Information <container>", and Section 7.2.5, "The PARENT Attribute on the Container <container> and Physical Location <physloc> Elements", of the "EAD Application Guidelines for Version 1.0" at http://lcweb.loc.gov/ead/ag/agcreate.html. That document directly addresses several of the main issues that arise when a folder's container element needs to refer to the container element for its box. In particular, it addresses the case of a box holding parts of different logical components (the <c> and <cnn> elements) and suggests the use of the 'parent' attribute. Much of this note restates that information and discusses a few problem areas.

 

These observations come from developing a computer program to convert a textual container list into an EAD XML document. I am what object-oriented developers call a "Domain Idiot", with little background in library science.

 

 

The container structure and the logical structure

 

The EAD makes clear that the hierarchy of logical elements should be captured using components (the <c...> elements) rather than the physical structure of folders within boxes. So two hierarchies are intertwined here, the logical and the physical. Although the organizer of a collection tends to keep them closely related, they may be completely independent, for example when the physical arrangement cannot be changed.

 

Here is a simple example of a logical structure that highlights some of the issues.

 

Series I
          item a in box1:folder1
Series II
          item b in box1:folder2
          item c in box2:folder1
Series III
          item d in box3
          item e in box3

 

Here Series II is split between boxes 1 and 2, and box 1 contains items from both Series I and Series II. Series III corresponds exactly to box 3, which contains no folders (items in an oversize box, for example).

 

Examined as a physical hierarchy, this looks quite different:

 

box1
          folder1
                   item a
          folder2
                   item b
box2
          folder1
                   item c
box3
          item d
          item e

 

 

Representing containers within the logical structure

 

One unambiguous way of representing the containers is to use something like

 

         <container type="box-folder">box 1:folder 2</container>

   

or the plain types box or folder for items housed in just a box or just a folder, placed on each logical item of the collection. This allows the items of the logical collection to be scattered through the containers in an arbitrary manner, so the approach is general enough to handle any collection. It is sufficient to place containers only at the leaves of the logical hierarchy.
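A program reading this form has to pair each part of the type attribute with the matching part of the text. The Python sketch below is one way to do that. The convention it assumes ('-' separating the type parts and ':' separating the labels, outermost container first) is taken from the examples in this note rather than from anything the EAD standard defines, and the function name parse_composite is hypothetical.

```python
# Sketch only: parse_composite and its separator convention are hypothetical.
def parse_composite(type_attr, text):
    """Pair each part of a composite type with its label, outermost first.

    Assumed convention: '-' separates the type parts and ':' separates the
    labels, so type="box-folder" with text "box 1:folder 2" becomes
    [("box", "box 1"), ("folder", "folder 2")].
    """
    types = type_attr.split("-")
    labels = [part.strip() for part in text.split(":")]
    if len(types) != len(labels):
        raise ValueError(f"type {type_attr!r} does not match text {text!r}")
    return list(zip(types, labels))


print(parse_composite("box-folder", "box 1:folder 2"))
# [('box', 'box 1'), ('folder', 'folder 2')]
print(parse_composite("box", "box 3"))
# [('box', 'box 3')]
```

The same function handles a deeper composite such as 'carton-box-folder-page' unchanged, which is the main attraction of this form for a program.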

 

Here is a complete 'dsc' element of the example above using this approach:

 

 

    <dsc type="combined">
      <c01 level="series">
        <did><unittitle>Series I:</unittitle>
        </did>
        <c02 level="file">
          <did>
            <unittitle>item a</unittitle>
            <container type="box-folder">box 1:folder 1</container>
          </did>
        </c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Series II:</unittitle>
        </did>
        <c02 level="file">
          <did>
            <unittitle>item b</unittitle>
            <container type="box-folder">box 1:folder 2</container>
          </did>
        </c02>
        <c02 level="file">
          <did>
            <unittitle>item c</unittitle>
            <container type="box-folder">box 2:folder 1</container>
          </did>
        </c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Series III:</unittitle>
        </did>
        <c02 level="file">
          <did>
            <unittitle>item d</unittitle>
            <container type="box">box 3</container>
          </did>
        </c02>
        <c02 level="file">
          <did>
            <unittitle>item e</unittitle>
            <container type="box">box 3</container>
          </did>
        </c02>
      </c01>
    </dsc>
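Because every leaf carries its full container path, the physical hierarchy shown earlier can be recovered mechanically. A minimal Python sketch using the standard library's xml.etree.ElementTree, assuming that ':' separates outer from inner container labels; the function name physical_index is hypothetical, and the XML is an abbreviated copy of the dsc element above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abbreviated copy of the dsc element above.
DSC = """<dsc type="combined">
  <c01 level="series"><did><unittitle>Series I:</unittitle></did>
    <c02 level="file"><did><unittitle>item a</unittitle>
      <container type="box-folder">box 1:folder 1</container></did></c02>
  </c01>
  <c01 level="series"><did><unittitle>Series II:</unittitle></did>
    <c02 level="file"><did><unittitle>item b</unittitle>
      <container type="box-folder">box 1:folder 2</container></did></c02>
    <c02 level="file"><did><unittitle>item c</unittitle>
      <container type="box-folder">box 2:folder 1</container></did></c02>
  </c01>
  <c01 level="series"><did><unittitle>Series III:</unittitle></did>
    <c02 level="file"><did><unittitle>item d</unittitle>
      <container type="box">box 3</container></did></c02>
    <c02 level="file"><did><unittitle>item e</unittitle>
      <container type="box">box 3</container></did></c02>
  </c01>
</dsc>"""


def physical_index(dsc):
    """Hypothetical helper: map each physical path, e.g. ("box 1", "folder 2"),
    to the titles of the items housed there."""
    index = defaultdict(list)
    for did in dsc.iter("did"):
        title = did.findtext("unittitle")
        for container in did.findall("container"):
            # Assumed convention: ':' separates outer from inner containers.
            path = tuple(part.strip() for part in container.text.split(":"))
            index[path].append(title)
    return dict(index)


index = physical_index(ET.fromstring(DSC))
for path, items in sorted(index.items()):
    print(" > ".join(path), items)
```

Sorting the paths reproduces the box-and-folder outline given earlier for this example. Plain string sorting would misplace a 'box 10', so a real converter would need numeric-aware ordering.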

 

 

Reservations and an alternative

 

However, this approach requires a type attribute value for every combination of nested container types, with a matching set of identifiers in the text. One might need a 'carton-box-folder-page' type, for example. The only composite other than box-folder I have encountered is reel-frame, in the document "The Encoded Archival Description, Retrospective Conversion Guidelines. A Supplement to the EAD Tag Library and EAD Guidelines" at http://sunsite.berkeley.edu/amher/upguide.html.

 

The 'parent' attribute offers a way to build the necessary physical hierarchy from a smaller set of type values. A container with type="box-folder" might instead be coded with the attributes type="folder" parent="box1", where somewhere in the document there is a unique container with the attributes type="box" id="box1".
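The rewrite from the composite form into the parent/id form is mechanical. A Python sketch, assuming ids are formed by removing the space from the box label ('box 1' becomes 'box1'), which is an illustrative naming scheme rather than a rule from the guidelines; split_composite and box_ids are hypothetical names. The box_ids dictionary makes sure each box is emitted with an id only once, since id values must be unique within a document.

```python
import xml.etree.ElementTree as ET


def split_composite(container, box_ids):
    """Hypothetical helper: rewrite <container type="box-folder">box 1:folder 2
    </container> as a folder container with a parent attribute, preceded by a
    box container with an id the first time that box is seen.

    box_ids maps box labels to the ids already issued for them.
    """
    box_label, folder_label = (p.strip() for p in container.text.split(":"))
    emitted = []
    if box_label not in box_ids:
        # Illustrative id scheme only: "box 1" -> "box1".
        box_ids[box_label] = box_label.replace(" ", "")
        box = ET.Element("container", type="box", id=box_ids[box_label])
        box.text = box_label
        emitted.append(box)
    folder = ET.Element("container", type="folder", parent=box_ids[box_label])
    folder.text = folder_label
    emitted.append(folder)
    return emitted


box_ids = {}
converted = []
for text in ["box 1:folder 1", "box 1:folder 2", "box 2:folder 1"]:
    source = ET.Element("container", type="box-folder")
    source.text = text
    converted.extend(split_composite(source, box_ids))

for element in converted:
    print(ET.tostring(element, encoding="unicode"))
```

For items a, b and c this yields the same five container elements used in the parent/id version of the example that follows.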

 

Here is the example again, with the container for each box placed in the first item that needs it:

 

    <dsc type="combined">
      <c01 level="series">
        <did><unittitle>Series I:</unittitle>
        </did>
        <c02 level="file">
          <did>
            <unittitle>item a</unittitle>
            <container type="box"     id="box1">box 1</container>
            <container type="folder" parent="box1">folder 1</container>
          </did>
        </c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Series II:</unittitle>
        </did>
        <c02 level="file">
          <did>
            <unittitle>item b</unittitle>
            <container type="folder" parent="box1">folder 2</container>
          </did>
        </c02>
        <c02 level="file">
          <did>
            <unittitle>item c</unittitle>
            <container type="box"     id="box2">box 2</container>
            <container type="folder" parent="box2">folder 1</container>
          </did>
        </c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Series III:</unittitle>
        </did>
        <c02 level="file">
          <did>
            <unittitle>item d</unittitle>
            <container type="box" id="box3">box 3</container>
          </did>
        </c02>
        <c02 level="file">
          <did>
            <unittitle>item e</unittitle>
            <container type="box">box 3</container>   <!-- Note lack of an id attribute -->
          </did>
        </c02>
      </c01>
    </dsc>

 

Notice that every folder container refers to the box that holds it, even when that box container is in the same component. This builds the physical hierarchy explicitly, so the information does not depend on context or on any particular ordering of elements.
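The order independence can be demonstrated with a two-pass resolver: first collect every container that carries an id, then follow parent references outward from each item's innermost container. A Python sketch under those assumptions; resolve_paths is a hypothetical name, parent is treated as a single id, and because ElementTree does not validate against the EAD DTD, the sketch checks the references itself. In the sample, item b's folder refers to box1 before the box container appears, which the first pass makes harmless.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<dsc>
  <c01><did><unittitle>item b</unittitle>
    <container type="folder" parent="box1">folder 2</container></did></c01>
  <c01><did><unittitle>item a</unittitle>
    <container type="box" id="box1">box 1</container>
    <container type="folder" parent="box1">folder 1</container></did></c01>
</dsc>"""


def resolve_paths(root):
    """Hypothetical helper: return {item title: [label path, ...]} using only
    id/parent links, regardless of where the box container appears."""
    # Pass 1: every container with an id, wherever it occurs.
    by_id = {c.get("id"): c for c in root.iter("container") if c.get("id")}
    paths = {}
    for did in root.iter("did"):
        containers = did.findall("container")
        referenced = {c.get("parent") for c in containers if c.get("parent")}
        for container in containers:
            if container.get("id") in referenced:
                continue  # outer container of a sibling, not the innermost one
            # Pass 2: walk parent references outward (assumes no cycles).
            path, node = [], container
            while True:
                path.insert(0, node.text)
                parent = node.get("parent")
                if parent is None:
                    break
                if parent not in by_id:
                    raise KeyError(f"parent {parent!r} has no matching id")
                node = by_id[parent]
            paths.setdefault(did.findtext("unittitle"), []).append(path)
    return paths


resolved = resolve_paths(ET.fromstring(SAMPLE))
print(resolved)
```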

 

One nagging detail

 

The "EAD Application Guidelines" do not show a parent attribute on a folder container that is in the same component as its box container. In that style, item a in the example above would be coded with two containers like this:

 

    <container type="box" id="box1">box 1</container>
    <container type="folder">folder 1</container>   <!-- Note the lack of a parent attribute -->

   

This style of coding, without parent (and often without id) attributes, is common in the collections I have seen. It occurs, for example, in the templates of the "Retrospective Conversion Guidelines" at Berkeley:

 

          <c04>
             <did>
               <container type="box"></container>
               <container type="folder"></container>
               <unittitle>[Title], <unitdate>[Date or
                date range]</unitdate></unittitle>
             </did>
          </c04>

 

However, if the container types are ones whose hierarchical relationship is less obvious, the need for an explicit parent becomes apparent:

 

    <container type="folio"  id="folio1">folio 1</container>
    <container type="volume" id="vol1">vol 1</container>   <!-- volume in folio or folio in volume? -->

   

In other words, a program trying to understand the physical relationships between containers needs either an explicit parent attribute or domain-specific knowledge of container types. This argues against making an exception for the parent attribute when the parent container is in the same component.
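Without a parent attribute, the only recourse is a table of domain knowledge. A Python sketch of that fallback; the ranking NESTING_ORDER and the function infer_outer are hypothetical, and the interesting case is what happens for types the table does not cover.

```python
# Hypothetical domain knowledge: container types from outermost to innermost.
NESTING_ORDER = ["carton", "box", "folder", "page"]


def infer_outer(type_a, type_b):
    """Hypothetical helper: guess which of two container types holds the other.

    Returns the outer type, or None when the table cannot decide, as for
    "folio" and "volume", which could nest either way.
    """
    if type_a in NESTING_ORDER and type_b in NESTING_ORDER:
        rank_a = NESTING_ORDER.index(type_a)
        rank_b = NESTING_ORDER.index(type_b)
        if rank_a != rank_b:
            return type_a if rank_a < rank_b else type_b
    return None


print(infer_outer("folder", "box"))    # box
print(infer_outer("folio", "volume"))  # None
```

Any such table encodes assumptions about one repository's practice, whereas an explicit parent attribute carries the answer in the document itself.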

 

Note also that representing the containers for items d and e raises a dilemma, since they share a container, box 3, but neither is in a container that has box 3 as its parent, so the parent attribute cannot be used. We must generate two instances of the container element for box 3. The first can carry the id (only one may, since ids must be unique) and be referenced, if necessary, by any folders in the box.

 

Unclear areas

 

There are other tempting contextual techniques as well, such as placing container information higher in the logical structure so that lower-level components inherit it and need no container of their own. This suits Series III and box 3: a container can be factored out, moving it up the component hierarchy until it covers all the components in that container. This is similar to what is done in textual documents.

 

      <c01 level="series">
        <did><unittitle>Series III:</unittitle>
          <container type="box" id="box3">box 3</container>
        </did>
        <c02 level="file">
          <did>
            <unittitle>item d</unittitle>     <!-- note there is no longer any container element -->
          </did>
        </c02>
        <c02 level="file">
          <did>
            <unittitle>item e</unittitle>     <!-- note there is no longer any container element -->
          </did>
        </c02>
      </c01>

   

This interprets the scope of a container as the part of the component hierarchy below it. More on this below.
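Under this reading, a program resolves containers by walking down the component tree and handing each component's containers to its children. A Python sketch; is_component and effective_containers are hypothetical names, and the rule that a component's own containers replace (rather than nest within) inherited ones is an assumption, just one possible answer to the open questions about inheritance.

```python
import re
import xml.etree.ElementTree as ET

SERIES_III = """<dsc>
  <c01 level="series"><did><unittitle>Series III:</unittitle>
      <container type="box" id="box3">box 3</container></did>
    <c02 level="file"><did><unittitle>item d</unittitle></did></c02>
    <c02 level="file"><did><unittitle>item e</unittitle></did></c02>
  </c01>
</dsc>"""


def is_component(elem):
    """Hypothetical helper: true for <c> and <c01> ... <c12>."""
    return re.fullmatch(r"c(0[1-9]|1[0-2])?", elem.tag) is not None


def effective_containers(root):
    """Hypothetical helper: map each leaf component's title to the labels of
    the containers in force for it under the 'hierarchy below' reading."""
    result = {}

    def walk(component, inherited):
        did = component.find("did")
        own = did.findall("container") if did is not None else []
        # Assumption: own containers replace inherited ones entirely.
        in_force = own or inherited
        children = [child for child in component if is_component(child)]
        if not children and did is not None:
            result[did.findtext("unittitle")] = [c.text for c in in_force]
        for child in children:
            walk(child, in_force)

    for top in root:
        if is_component(top):
            walk(top, [])
    return result


effective = effective_containers(ET.fromstring(SERIES_III))
print(effective)  # {'item d': ['box 3'], 'item e': ['box 3']}
```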

 

Note that, in general, the parent relationship is still required for folders within boxes. Series II, which spans boxes 1 and 2, shows why: a box container factored up to the series level cannot name a single box, so a folder without a parent reference would be ambiguous.

 

As attractive as this contextual approach seems, it is at odds with the interpretation that a container's scope is the remainder of the document, as shown in section 3.5.2.4 of the "EAD Application Guidelines", where the items titled "Correspondence" and "Scripts and screenplays" inherit the container for Box 47 (not Box 46). A computer program parsing an XML document would find that very peculiar, since Box 47 is defined in a part of the logical hierarchy four levels up and then three levels down a different branch. If the reference to Box 47 were made through a parent/id relationship, there would be little doubt, since a validating parser checks that id values are unique within the document and that every parent reference points to a valid id.
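The competing "remainder of the document" reading is just as easy to program, which is part of the problem: the two readings give different answers for the same XML. A Python sketch with hypothetical titles and labels; carried_forward is a hypothetical name, and it tracks only the most recent container of each type in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: item z has no container of its own.
SAMPLE = """<dsc>
  <c01><did><unittitle>Series A</unittitle></did>
    <c02><did><unittitle>item x</unittitle>
      <container type="box">box 1</container></did></c02>
    <c02><did><unittitle>item y</unittitle>
      <container type="box">box 2</container></did></c02>
  </c01>
  <c01><did><unittitle>Series B</unittitle></did>
    <c02><did><unittitle>item z</unittitle></did></c02>
  </c01>
</dsc>"""


def carried_forward(root):
    """Hypothetical helper: the most recent container of each type, in
    document order, stays in force until another of that type appears."""
    current = {}  # container type -> most recent label
    result = {}
    for did in root.iter("did"):  # iter() visits elements in document order
        for container in did.findall("container"):
            current[container.get("type")] = container.text
        result[did.findtext("unittitle")] = dict(current)
    return result


forward = carried_forward(ET.fromstring(SAMPLE))
print(forward["item z"])  # {'box': 'box 2'}
```

Under the hierarchical reading, item z would have no container at all, since neither it nor Series B carries one. Here it silently picks up box 2 from a different branch, much like the Box 47 case.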

 

Moreover, whichever interpretation of container scope is used, a document that relies on context implied higher in the hierarchy or earlier in the document will not be robust to changes. In the example, if another box were added to Series III, there would be two box containers, and containers would have to be reintroduced into all the lower-level components. Similarly, if the container element for box 3 were inadvertently deleted, there would be no XML diagnostic to warn that items d and e no longer have containers, or that they have perhaps inherited a container from higher in the collection. Both the lack of a definition of implied relationships and the lack of robustness argue against using contextual relationships between containers.
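If contextual containers are avoided, these robustness problems become checkable. A Python sketch of a checker for the explicit-container policy argued for here: every leaf component carries its own container, id values are unique, and every parent names an existing id. check_containers and is_component are hypothetical names, and parent is treated as a single id. The sample is Series III after the box 3 container has been deleted, with a folder added to item e for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: Series III after the box 3 container was deleted.
SAMPLE = """<dsc>
  <c01 level="series"><did><unittitle>Series III:</unittitle></did>
    <c02 level="file"><did><unittitle>item d</unittitle></did></c02>
    <c02 level="file"><did><unittitle>item e</unittitle>
      <container type="folder" parent="box3">folder 1</container></did></c02>
  </c01>
</dsc>"""


def is_component(elem):
    """Hypothetical helper: true for <c> and <c01> ... <c12>."""
    return re.fullmatch(r"c(0[1-9]|1[0-2])?", elem.tag) is not None


def check_containers(root):
    """Hypothetical helper: list problems under an explicit-container policy."""
    problems = []
    containers = list(root.iter("container"))
    ids = Counter(c.get("id") for c in containers if c.get("id"))
    problems += [f"duplicate id {i!r}" for i, n in ids.items() if n > 1]
    for c in containers:
        parent = c.get("parent")
        if parent is not None and parent not in ids:
            problems.append(f"{c.text!r}: parent {parent!r} has no matching id")
    for comp in root.iter():
        # A leaf component has no component children of its own.
        if is_component(comp) and not any(is_component(ch) for ch in comp):
            did = comp.find("did")
            if did is None or did.find("container") is None:
                title = did.findtext("unittitle") if did is not None else None
                problems.append(f"{title!r}: leaf component has no container")
    return problems


problems = check_containers(ET.fromstring(SAMPLE))
for problem in problems:
    print(problem)
```

With the factored-out style, items d and e would be flagged even before the deletion, which is the point: the checker cannot tell an intended inheritance from an accidental loss.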

 

Multiple containers

 

Another technique that is frequently used is listing ranges of boxes or folders within one container:

 

        <c02 level="file">
          <did>
            <unittitle>item h</unittitle>
            <container type="folder">folder 1-3, 5</container>
          </did> . . .

 

The intent here seems clear: item h is housed in four folders. In other cases multiple containers are used instead:

 

        <c02 level="file">
          <did>
            <unittitle>item h</unittitle>
            <container type="folder">folder 1</container>
            <container type="folder">folder 2</container>
            <container type="folder">folder 3</container>
            <container type="folder">folder 5</container>
          </did> . . .

 

 

In this case, the relationship between the containers is intended to be a union. This clashes with the more common use of multiple containers of differing types (a box and a folder, for example), where the intent is to express a parent-child relationship.
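A tool that accepts both encodings of item h must map them to the same set of folders, and must also decide when several containers nest rather than form a union. A Python sketch of one such reading; expand_range and read_did are hypothetical names, and the rules are assumptions, not part of EAD: containers of the same type form a union, distinct types nest outermost first in order of appearance, and a range is a leading name followed by comma-separated numbers or number pairs.

```python
import re
import xml.etree.ElementTree as ET


def expand_range(text):
    """Hypothetical helper: "folder 1-3, 5" ->
    ["folder 1", "folder 2", "folder 3", "folder 5"]."""
    match = re.fullmatch(r"\s*(\D+?)\s*(\d.*)", text)
    if match is None:
        return [text.strip()]  # no numbers: keep the label as is
    name, numbers = match.groups()
    labels = []
    for part in numbers.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(n) for n in part.split("-"))
            labels += [f"{name} {n}" for n in range(first, last + 1)]
        else:
            labels.append(f"{name} {part}")
    return labels


def read_did(did):
    """Hypothetical helper: label paths described by one <did>'s containers.
    Same type = union; different types = nesting, in order of appearance."""
    by_type = {}
    for c in did.findall("container"):
        by_type.setdefault(c.get("type"), []).extend(expand_range(c.text))
    paths = [[]]
    for labels in by_type.values():  # dicts keep first-appearance order
        paths = [path + [label] for path in paths for label in labels]
    return [path for path in paths if path]


ranged = ET.fromstring(
    '<did><container type="folder">folder 1-3, 5</container></did>')
listed = ET.fromstring(
    '<did>' + ''.join(f'<container type="folder">folder {n}</container>'
                      for n in (1, 2, 3, 5)) + '</did>')
nested = ET.fromstring('<did><container type="box">box 1</container>'
                       '<container type="folder">folder 2</container></did>')
print(read_did(ranged) == read_did(listed))  # True
print(read_did(nested))  # [['box 1', 'folder 2']]
```

With two boxes and two folders in one did, this reading produces all four combinations, which is exactly the ambiguity described above; only parent attributes can say which folder is in which box.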

 

The same structure is sometimes seen at upper levels of the hierarchy. Series II from the example might have shown a complete list of the containers used by all the items below it:

 

      <c01 level="series">
        <did>
          <unittitle>Series II:</unittitle>
          <container type="box-folder">box 1:folder 2, box 2:folder 1</container>
        </did> . . .

 

Although a person reading these XML documents may have a clear idea of their meaning, EAD tools that take the XML documents as input need a clear definition of how to interpret multiple containers at the same level, and of how to process the ranges (extents) listed within them. This semantic interpretation is missing from the standard.
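One candidate definition for a container at an upper level is that it summarizes the union of the containers below it. A Python sketch that checks a series-level summary against that definition; to_paths, summary_paths and descendant_paths are hypothetical names, and the separators (',' between alternatives, ':' between nesting levels) are assumptions drawn from the Series II example.

```python
import xml.etree.ElementTree as ET

SERIES_II = """<c01 level="series">
  <did><unittitle>Series II:</unittitle>
    <container type="box-folder">box 1:folder 2, box 2:folder 1</container></did>
  <c02 level="file"><did><unittitle>item b</unittitle>
    <container type="box-folder">box 1:folder 2</container></did></c02>
  <c02 level="file"><did><unittitle>item c</unittitle>
    <container type="box-folder">box 2:folder 1</container></did></c02>
</c01>"""


def to_paths(text):
    """Hypothetical helper: 'box 1:folder 2, box 2:folder 1' ->
    {('box 1', 'folder 2'), ('box 2', 'folder 1')}."""
    return {tuple(p.strip() for p in item.split(":")) for item in text.split(",")}


def summary_paths(component):
    """Paths listed in the component's own <did> (hypothetical helper)."""
    paths = set()
    for c in component.findall("did/container"):
        paths |= to_paths(c.text)
    return paths


def descendant_paths(component):
    """Union of the paths in every <did> below the component's own."""
    own_did = component.find("did")
    paths = set()
    for did in component.iter("did"):
        if did is not own_did:
            for c in did.findall("container"):
                paths |= to_paths(c.text)
    return paths


series = ET.fromstring(SERIES_II)
print(summary_paths(series) == descendant_paths(series))  # True
```

Under a different definition, for instance one where an upper-level summary lists only the boxes, the comparison would need different code, and nothing in the standard says which definition applies.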

 

 

Statement of the need for a clarification of the standard

 

The issue might be stated as follows. Given a number of container elements within an EAD component hierarchy, what is the implied relationship between them?

 

If two containers exist within one component, is one contained within the other? Is the component contained in both containers?

 

If a component with a container has an ancestor with a container, does the lower level container override the higher level container? Is there an implied parent relationship?

 

If a component has no container, does it inherit the container(s) of an ancestor? Does it have an implied container from an unrelated part of the structure?

 

 

Summary

 

Intertwining a physical hierarchy of containers with a logical hierarchy of components can be done unambiguously with the parent and id attributes of the container element. For box and folder hierarchies it can also be done with the box-folder type. However, relying on context implicit in either the logical structure or earlier parts of the document raises questions about how the relationships between container elements are defined.

 

         Paul Jensen                    info@agileimage.com

         Agile Image Movers               http://agileimage.com/