Russian Doll Keywording - Is a Hierarchical Vocabulary Really Worth It?

Creating and amending a hierarchical vocabulary is par for the course in many keywording systems, but does storing keywords like a set of Russian dolls make keywording faster and more accurate?

Keywording in a hierarchy - Flora: Trees; Oak; Acorn, for instance - helps solve some important problems, because the relationships between the words make clear exactly what each one means.  This solves, for instance, the difficulty of words that are spelled the same but have different meanings, e.g. pool (the game) and pool (a body of water).  So a hierarchical, controlled vocabulary adds precision and reduces confusion.
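As a rough illustration of that disambiguation, here is a minimal Python sketch.  It assumes each term is stored as its full path through the hierarchy; the two "Pool" branches and the function name are made up for the example and are not taken from any real vocabulary.

```python
# Hypothetical controlled vocabulary: every term is stored as the full path
# from the top of the hierarchy down to the keyword itself.
VOCABULARY = [
    ("Flora", "Trees", "Oak", "Acorn"),
    ("Recreation", "Games", "Pool"),        # illustrative branch
    ("Water", "Bodies of Water", "Pool"),   # illustrative branch
]


def senses(word):
    """Return every hierarchy path whose final keyword matches the word."""
    return [path for path in VOCABULARY if path[-1].lower() == word.lower()]


# The bare word "pool" is ambiguous; its paths are not.
for path in senses("pool"):
    print("; ".join(path))
```

The keyworder picks a path rather than a bare word, and it is the path that carries the meaning.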

Controlled vocabularies have their roots in conventional library systems such as that of the Library of Congress.  They are powerful tools, particularly in the hands of people with at least a modicum of training.


Because of the precision of meaning a hierarchical vocabulary gives, it lends itself to translation into different languages, and to refining a search by displaying the terms higher or lower in the hierarchy.
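One way such search refinement might work is sketched below in Python.  The child-to-parent table, the extra "Beech" term and the function names are assumptions made for the example.  For brevity it also assumes every term name is unique, which a real vocabulary would not rely on.

```python
# Hypothetical hierarchy stored as child -> parent links.
# Assumes term names are unique; a real system would key on term IDs.
PARENT = {
    "Acorn": "Oak",
    "Oak": "Trees",
    "Beech": "Trees",
    "Trees": "Flora",
}


def broader(term):
    """Return the ancestors of a term, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def narrower(term):
    """Return the direct children of a term, in alphabetical order."""
    return sorted(child for child, parent in PARENT.items() if parent == term)


print(broader("Acorn"))   # terms a searcher could widen the search to
print(narrower("Trees"))  # terms a searcher could narrow the search to
```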

Despite these advantages, there are some important questions to address before using and maintaining such a system to keyword images or videos:

1. Portability - If keywords are attached to images in IPTC, and are carried around as the images are distributed to sub-agents, photo editors, advertising creatives and so on, there is little chance those keywords will plug into an identical system belonging to the organisation or company receiving them.  Indeed, once keywords are no longer linked to a database which knows that a particular instance of pool is linked to swimming and water rather than pool halls and cues, the keywords have to fend for themselves.  That's not to say that the keywords are pointless, but merely to suggest that the full power of a controlled vocabulary is lost.

2. Flexibility - Because each new word added has to fit into an intricate structure, adding new words and terms is not straightforward.  It is important that branches of the hierarchy do not overlap inappropriately, since duplication of meaning would destroy the point of the whole thing.  Avoiding overlap is relatively easy when the vocabulary is small, but as the vocabulary gets bigger it can become a mammoth undertaking.  Also, once a structure is in place,

3. Arbitrary Synonyms - As an intrinsic part of hierarchical vocabularies, the final word in the sequence (e.g. "Acorn", above) is normally given a relationship to words which are classed as synonyms.  So for Acorn, synonyms might include "Nut" and "Seed".  Of course, deciding which words are the keywords and which are synonyms is essentially an arbitrary process based on the design principles of the particular vocabulary.  When entering keywords it can be difficult to recall which keyword you must enter unless the synonyms are also automatically available for selection.

4. Time - Creating a controlled vocabulary in the first place is a huge task.  A structure has to be maintained, so decisions need to be made about what is appropriate to go where.  The process of maintaining the integrity of the system can therefore be extremely time-consuming.  We have heard of some clients with hierarchical vocabularies who complain that maintaining the vocabulary takes far longer than keywording itself.  Meanwhile, adding keywords using an inflexible structure can greatly add to the inputting time, particularly when a keyword needs to be added which is not in the structure.  Parallel systems for handling these new words add to the complexity of the process, and the time it takes.

5. Cost - Creating, maintaining and inputting keywords with a hierarchical structure takes extra time, and inevitably the labour costs go up.  To get the maximum return from the hierarchical system, libraries may also find themselves locked into using expensive keywording databases and systems.

6. Searchability - Ironically, hierarchical systems can produce inferior keywords because there is less freedom to add in words which don't fit into the system.

There are alternatives to hierarchical keywording, in particular word strings or word swarms, in which keywords are added and selected because of their usefulness and relevance rather than because they are part of a hierarchy.

So a string for "Swimming Pool" would include synonyms, plus related keywords such as "Exercise".  Should a new, useful keyword be needed, it is simply added into the string. This avoids adding numerous hierarchical keywords which are largely there as placeholders for the system, thus making it easier to get to the essence of the image or video in question.  Overlapping of keywords is also not a problem, thus making vocabulary maintenance quick and inexpensive.
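A keyword string can be pictured as nothing more than a named, flat list.  The Python sketch below is a hypothetical illustration: apart from "Swimming Pool" and "Exercise", the keywords, the lookup key and the function name are invented for the example.

```python
# Hypothetical keyword strings: a name maps to a flat list of keywords.
# Only "Swimming Pool" and "Exercise" come from the text; the rest are invented.
STRINGS = {
    "swimming pool": ["Swimming Pool", "Pool", "Swimming", "Water", "Exercise"],
}


def add_keyword(strings, name, keyword):
    """Append a keyword to the named string unless it is already there."""
    words = strings.setdefault(name, [])
    if keyword not in words:
        words.append(keyword)
    return words


# A new, useful keyword is simply appended; nothing has to be restructured.
add_keyword(STRINGS, "swimming pool", "Leisure")
print(STRINGS["swimming pool"])
```

Because there is no structure to protect, the same keyword can appear in as many strings as it is useful in.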

Selection of keywords is fast, as the various strings, including synonyms, can be exposed, and the most relevant selected from various options.  Because similar images and videos tend to get keyworded over and over, strings can be made better and more relevant by experience.

Even hierarchical vocabularies can be put into such strings, so conversion to this method can be a relatively easy process.
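As a sketch of how such a conversion might look, the Python below flattens each hierarchy path into a string, putting the keyword first, then its parent terms, then any synonyms.  The function name, the ordering and the data layout are all assumptions made for the example, not a description of any particular product.

```python
def paths_to_strings(paths, synonyms=None):
    """Flatten hierarchy paths into keyword strings.

    Each path is a tuple running from the top of the hierarchy to the keyword.
    Synonyms, if given, are keyed by the same path tuple (an assumption of
    this sketch, so that identically spelled keywords stay separate).
    """
    synonyms = synonyms or {}
    strings = {}
    for path in paths:
        leaf = path[-1]
        # Keyword first, then ancestors from nearest to farthest, then synonyms.
        strings[path] = [leaf, *reversed(path[:-1]), *synonyms.get(path, [])]
    return strings


acorn = ("Flora", "Trees", "Oak", "Acorn")
result = paths_to_strings([acorn], {acorn: ["Nut", "Seed"]})
print(result[acorn])
```

Once flattened, each string can be edited freely, for example by dropping parent terms that are only there as placeholders.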