It’s always been the case. The more we know about how a picture was taken (the camera settings, the distance from the subject, whether the flash fired), the easier it is to process the picture.
The more I know about my pictures—when and where they were taken, even who is in them—the easier it is for me to organize them and the more important they are to me.
That’s why metadata, the extra bits describing the bits, are key to the imaging infrastructure of the digital age. When Kodak announced the release of its Picture Metadata Toolkit as Open Source software at Comdex last November, the goal was to build this infrastructure. We wanted to make it easier to process digital pictures, and we wanted to make it easier for people to organize their pictures, using date and time information and their own names for their pictures.
Picture metadata conveys additional information about a picture—scene, camera type, names of people in the picture, or other related data. More powerful imaging systems can be created when the system interprets and uses such information, rather than simply passing around pixels. For example, picture metadata describing the scene content can be used to provide easier and more useful image-organizing tools—enabling people to capture more of their memories. Or picture metadata describing the image creation and processing history can be used to optimize display and printing.
By knowing whether the flash was used when capturing an image, the processing system can determine if it is appropriate to apply the red-eye correction algorithm. The Flash tag indicates the flash firing and return status.
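As a rough illustration of how a processing system might read this tag, the sketch below decodes a Flash value into its firing and return-status parts. The bit layout (bit 0 for firing, bits 1 and 2 for strobe return) reflects our understanding of the Exif Flash tag and should be checked against the Exif specification; the function names and the red-eye policy are hypothetical and are not part of the Toolkit.

```python
from dataclasses import dataclass

# Assumed bit layout of the Exif Flash tag (verify against the Exif spec):
# bit 0 = flash fired, bits 1-2 = strobe return status.
RETURN_STATUS = {
    0b00: "no strobe return detection",
    0b01: "reserved",
    0b10: "strobe return not detected",
    0b11: "strobe return detected",
}


@dataclass(frozen=True)
class FlashInfo:
    fired: bool
    return_status: str


def decode_flash(value: int) -> FlashInfo:
    """Split a Flash tag value into firing and return-status parts."""
    fired = bool(value & 0x1)
    status = RETURN_STATUS[(value >> 1) & 0x3]
    return FlashInfo(fired, status)


def should_try_red_eye_correction(value: int) -> bool:
    """Hypothetical policy: consider red-eye correction only if the flash fired."""
    return decode_flash(value).fired
```

A real pipeline would likely combine this with other cues, such as subject distance, before running the correction.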
But for the full benefits of picture metadata to be realized across the open system, imaging applications (for example, Adobe Photoshop 6.0) first need a universal means to define metadata, and a means to create, access, update, and save metadata associated with pictures. The Picture Metadata Toolkit addresses this need, and builds the foundation for future development of a complete Picture Metadata architecture to fully manage information about pictures.
The performance of a Scene Balance Algorithm (SBA) is improved through the use of illuminant information, such as the scene brightness and whether the flash fired.
Initially available for C++ developers on the Microsoft Windows and Unix platforms, this toolkit provides software developers uniform access to a variety of metadata storage formats, including Exif 2.1 and several variants of TIFF. By releasing this toolkit as Open Source software, Kodak reduces the barriers faced by application developers in using and persisting picture metadata in their applications. In addition, Kodak invites others in the imaging industry to join in the further development of the Picture Metadata Toolkit. Toolkit features include:
The initial release of the Picture Metadata Toolkit is the first step toward a broader Picture Metadata architecture, which will take metadata beyond simply storing additional information with an image. The architecture defines a structure with the intent of enabling the computer to automatically perform many operations on behalf of the user.
For example, imagine a web-based application that accepts digital images for printing. A customer submits "high quality" digital images for printing 8 x 10’s, and she compresses the images to shorten the upload time. The service, meanwhile, has established a compression ratio guideline for prints to ensure optimum print quality. Upon receiving the customer’s images, an automated application audits their state and order information for potential quality issues. The audit includes checking the appropriateness of the compression ratio for the ordered print size. In this context, the service’s guidelines provide the application with the semantic interpretation of the metadata structure that carries the compression ratio. In this case, the interpretation is that the compression ratio is not appropriate for the requested print size, and the customer is issued a quality warning recommending a more appropriate compression ratio.
In this example, the use of metadata by the application involves more than just the interpretation of the bytes that make up the metadata itself. A survey of several digital applications that intelligently use picture metadata reveals a similar structure. The Picture Metadata architecture concisely captures this structure by defining three levels of abstraction: structure, semantics, and context.
The structure level addresses the structural representation of the metadata. For the receiver of the metadata, it defines how to group the bytes or characters into meaningful components (e.g., the physical type of the compression ratio). Semantics is concerned with the conveyance of the meaning of the metadata, and answers such questions as, "What assertion is intended by this sequence of components?" (e.g., a print service’s compression ratio guidelines). Context addresses the contextual translation of the metadata. It determines if the assertions conveyed by the semantics are in an understandable form within the current context (e.g., the context within which to interpret the compression ratio is "high quality printing" rather than "fast image display over the Internet"). There is a clear dependency among these levels. The translations of the context level rely on the well-formed meanings in the semantic layer. The semantic layer, in turn, relies on the structure layer for the well-formed representations.
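The dependency among the three levels can be sketched in a few lines of Python, using the compression ratio from the printing example. This is only an illustration: the "N:1" encoding, the class and function names, and the threshold values are all invented for the example and do not come from the Toolkit or from any print service.

```python
from dataclasses import dataclass


# Structure level: group raw characters into a typed component.
def parse_compression_ratio(raw: str) -> float:
    """Parse a ratio written as 'N:1' (a hypothetical encoding) into a float."""
    numerator, denominator = raw.split(":")
    return float(numerator) / float(denominator)


# Semantics level: the assertion the component makes about the image.
@dataclass(frozen=True)
class CompressionAssertion:
    ratio: float  # "this image was compressed by `ratio` to 1"


# Context level: interpret the assertion within the current context.
# These limits are invented for illustration only.
MAX_RATIO_BY_CONTEXT = {
    "high quality printing": 10.0,
    "fast image display over the Internet": 40.0,
}


def audit(raw: str, context: str) -> str:
    """Run all three levels and return 'ok' or a quality warning."""
    assertion = CompressionAssertion(parse_compression_ratio(raw))
    limit = MAX_RATIO_BY_CONTEXT[context]
    if assertion.ratio > limit:
        return f"warning: {assertion.ratio:g}:1 exceeds {limit:g}:1 for {context}"
    return "ok"
```

The same raw value yields a warning in one context and passes in another, which is the point of separating context from semantics.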
This first release of the Picture Metadata Toolkit addresses the structural level of the Picture Metadata architecture. At the structural level, the Toolkit provides facilities for defining metadata and for creating, accessing, and manipulating metadata instances. The representation of metadata is divided into two distinct components: logical and procedural. The logical representation is a means, independent of any procedural programming language, to define and capture instances of metadata. The logical definition is expressed through a full-featured Metadata Definition Language. It directly drives the structure and allowable content of the logical instance, thereby allowing the logical definition to be used directly in the validation of logical instances.
In developing an implementation for the logical representation, we considered that metadata essentially is just data, and that there already exists a class of languages, called declarative languages, that are specifically designed to build descriptions of data structures. Because of the popularity of the World Wide Web, the eXtensible Markup Language (XML), a declarative meta-language, is emerging as the de facto standard for capturing data structures (its formal application on the Web is capturing document structure). The XML recommendation defines a language with obscure syntax and limited expressive power for defining document types: the Document Type Definition language. Recognizing the utility of XML for data structures, the World Wide Web Consortium (W3C) has defined XML Schema, a full-featured declarative language using XML syntax. The Picture Metadata Toolkit uses XML Schema as its Metadata Definition Language, and XML to capture logical instances. This decision makes available a wealth of standard tools for using metadata in its logical form. For example, XML parsers are readily available in most programming languages, and XML Style Sheets and the XML Transformation Language can be used to present metadata to users through a web browser.
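To make the relationship between a logical definition and a logical instance concrete, here is a minimal sketch using Python's standard XML parser. The schema fragment and element names (PictureMetadata, DateTime, Caption) are hypothetical and are not taken from the Toolkit's actual definitions. The check shown is deliberately naive: it only confirms that declared child elements are present, whereas full validation would need a schema-aware validator.

```python
import xml.etree.ElementTree as ET

# XML Schema namespace in ElementTree's {uri}tag notation.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical logical definition (not the Toolkit's real schema).
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="PictureMetadata">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="DateTime" type="xs:dateTime"/>
        <xs:element name="Caption" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# Hypothetical logical instance with sample values.
INSTANCE = """\
<PictureMetadata>
  <DateTime>2001-01-01T12:00:00</DateTime>
  <Caption>Family picnic</Caption>
</PictureMetadata>
"""


def declared_children(schema_text: str) -> list:
    """List the child element names declared for the top-level element."""
    root = ET.fromstring(schema_text)
    sequence = root.find(f"{XS}element/{XS}complexType/{XS}sequence")
    return [el.get("name") for el in sequence.findall(f"{XS}element")]


def missing_children(schema_text: str, instance_text: str) -> list:
    """Naive check: which declared children are absent from the instance?"""
    instance = ET.fromstring(instance_text)
    return [name for name in declared_children(schema_text)
            if instance.find(name) is None]
```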
The procedural representation is a programming language-specific means to expose the logical representation to manipulation by applications. Procedural metadata prototypes are created directly from the logical definitions. The logical definitions are parsed. The appropriate procedural prototypes are then created through the interpretation of the results of the parsing operation and placed into a metadata factory. An application wishing to create a procedural metadata instance simply sends a request to the metadata factory for the creation of the desired procedural metadata instance.
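The parse, prototype, and factory sequence described above might look something like the following Python sketch. It is not the Toolkit's C++ API: the simplified definition format stands in for XML Schema, and the names MetadataInstance, MetadataFactory, load_definitions, and create are all hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical, simplified definition format standing in for XML Schema.
DEFINITIONS = """\
<definitions>
  <metadata name="Flash" type="int" default="0"/>
  <metadata name="Caption" type="string" default=""/>
</definitions>
"""

TYPE_PARSERS = {"int": int, "string": str}


class MetadataInstance:
    """A procedural metadata instance with a fixed value type."""

    def __init__(self, name, value_type, value):
        self.name = name
        self.value_type = value_type
        self.value = value

    def set(self, value):
        if not isinstance(value, self.value_type):
            raise TypeError(f"{self.name} expects {self.value_type.__name__}")
        self.value = value


class MetadataFactory:
    """Holds one prototype per definition and hands out copies on request."""

    def __init__(self):
        self._prototypes = {}

    def load_definitions(self, xml_text):
        # Parse the logical definitions and build a prototype for each one.
        for node in ET.fromstring(xml_text).iter("metadata"):
            value_type = TYPE_PARSERS[node.get("type")]
            default = value_type(node.get("default"))
            self._prototypes[node.get("name")] = MetadataInstance(
                node.get("name"), value_type, default)

    def create(self, name):
        # Each request returns a fresh copy, so prototypes stay unchanged.
        return copy.copy(self._prototypes[name])
```

An application would call `load_definitions` once and then request instances by name, for example `factory.create("Flash")`.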
The logical representation also provides a common ground for the exchange of metadata between heterogeneous systems. For example, assume two systems based on different programming languages (C++ and Java) wish to exchange metadata. The C++ system can "serialize" its procedural metadata instances into logical metadata instances and send these to the Java-based system. The Java system can then parse and interpret the logical metadata instances into the appropriate procedural metadata instances.
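The round trip between procedural and logical forms can be sketched as below. For brevity both ends are written in Python, with the XML string playing the role of the language-neutral logical instance. The CaptureInfo class and its element names are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class CaptureInfo:
    """Hypothetical procedural metadata instance."""
    camera_model: str
    flash_fired: bool


def to_logical(info: CaptureInfo) -> str:
    """Serialize a procedural instance into a logical (XML) instance."""
    root = ET.Element("CaptureInfo")
    ET.SubElement(root, "CameraModel").text = info.camera_model
    ET.SubElement(root, "FlashFired").text = "true" if info.flash_fired else "false"
    return ET.tostring(root, encoding="unicode")


def from_logical(xml_text: str) -> CaptureInfo:
    """Parse a logical instance back into a procedural instance."""
    root = ET.fromstring(xml_text)
    return CaptureInfo(
        camera_model=root.findtext("CameraModel"),
        flash_fired=root.findtext("FlashFired") == "true",
    )
```

Because the receiving side depends only on the XML, it could equally be implemented in another language with its own parser.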
An important issue that needs to be addressed at the structural level is that of access to metadata stored in image files. If the metadata is stored in the image file in XML format, then access to it is a straightforward application of the Picture Metadata Toolkit facilities. Emerging standards, such as DIG35 and JPEG2000, are creating metadata storage specifications based on XML. The challenge is accessing the metadata stored in the file formats that are currently in wide use, such as TIFF and Exif. TIFF stores metadata as tag-value pairs in a loosely defined directory structure. Exif adopted the TIFF conventions, but tightened the specification of the directory structure. In either case, the desire is to provide an interface to non-XML based metadata storage that is in terms of the XML Schema-based representation.
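For readers unfamiliar with the TIFF directory structure, the sketch below builds and then reads a tiny little-endian image file directory in memory. It is a simplification under stated assumptions: it handles only single SHORT values stored inline, ignores big-endian files and values stored at offsets, and the helper names build_ifd and read_ifd are our own. Tags 259 (Compression) and 274 (Orientation) are standard TIFF tags.

```python
import struct


def build_ifd(entries):
    """Pack (tag, value) pairs of type SHORT into minimal little-endian TIFF bytes."""
    # Header: byte order "II", magic number 42, offset of the first directory.
    data = struct.pack("<2sHI", b"II", 42, 8)
    data += struct.pack("<H", len(entries))  # number of directory entries
    for tag, value in entries:
        # Entry: tag, type 3 (SHORT), count 1, value left-justified in 4 bytes.
        data += struct.pack("<HHIHH", tag, 3, 1, value, 0)
    data += struct.pack("<I", 0)  # offset of next directory (0 = none)
    return data


def read_ifd(data):
    """Return {tag: value} for single SHORT entries in the first directory."""
    byte_order, magic, offset = struct.unpack_from("<2sHI", data, 0)
    if byte_order != b"II" or magic != 42:
        raise ValueError("this sketch handles only little-endian TIFF data")
    (count,) = struct.unpack_from("<H", data, offset)
    tags = {}
    for i in range(count):
        tag, field_type, n, value, _pad = struct.unpack_from(
            "<HHIHH", data, offset + 2 + 12 * i)
        if field_type == 3 and n == 1:
            tags[tag] = value
    return tags
```

Each entry carries only a numeric tag and a typed value, which is why a mapping layer is needed to present such data in terms of a schema-based representation.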
The storage format input / output variations have been abstracted out to a common class interface, referred to as Accessors. The Accessors have an explicit understanding of the file format’s and Picture Metadata Toolkit’s metadata representation. Therefore, Accessors are able to perform the mapping between the two representations. The Accessors’ interface is defined in terms of the procedural metadata representation, completely hiding any file format details from the application. Presently, the Accessors support the Exif, TIFF, and APS MOF (Magnetics On Film) file formats.
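One way to picture the Accessor idea is a common interface with one implementation per storage format, as in the Python sketch below. It is not the Toolkit's C++ interface: the class names, the read and write methods, and the use of an in-memory dictionary in place of a real file are all assumptions made for illustration. The tag numbers are standard TIFF and Exif tags.

```python
from abc import ABC, abstractmethod


class Accessor(ABC):
    """Common interface; applications never see file-format details."""

    @abstractmethod
    def read(self) -> dict:
        """Return metadata keyed by format-independent names."""

    @abstractmethod
    def write(self, metadata: dict) -> None:
        """Store metadata given by format-independent names."""


class TagDirectoryAccessor(Accessor):
    """Maps numeric TIFF/Exif tags to names (a dict stands in for the file)."""

    TAG_TO_NAME = {
        270: "ImageDescription",
        272: "Model",
        306: "DateTime",
        37385: "Flash",
    }

    def __init__(self, directory: dict):
        self._directory = directory

    def read(self) -> dict:
        # Unknown tags are skipped rather than exposed to the application.
        return {name: self._directory[tag]
                for tag, name in self.TAG_TO_NAME.items()
                if tag in self._directory}

    def write(self, metadata: dict) -> None:
        name_to_tag = {name: tag for tag, name in self.TAG_TO_NAME.items()}
        for name, value in metadata.items():
            self._directory[name_to_tag[name]] = value
```

Supporting another storage format would then mean adding another Accessor subclass, with no change to application code.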
Storing picture metadata is nothing new: many file formats and capture devices already provide metadata capability. Popular file formats such as TIFF and Exif use different but related schemes to store metadata. Kodak’s current guidelines for writing, using, and maintaining metadata with existing image file formats are already available from the DRG web site. In addition, Kodak is participating in the Digital Imaging Group’s DIG35 initiative, which is trying to define a common set of metadata independent of a particular file format. Over time, our guidelines will evolve, and the KODAK Picture Metadata Toolkit will support this standard as it evolves.
The KODAK Picture Metadata Toolkit is available now for download through Kodak’s Developer Relations Group at www.kodak.com/go/drg. In addition, the Picture Metadata Toolkit Open Source Development Site is now up and running on SourceForge.
Become a Picture Metadata Toolkit developer today by visiting the SourceForge web site, http://sourceforge.net, and going to the ‘picturemetadata’ project page: http://sourceforge.net/projects/picturemetadata. The toolkit enables the development of flexible, powerful applications that can treat pictures as more than just pixels.