This article was written in 2005.
Coded Raw is an ambitious project to develop a means for open, universal, and future-proof digital image storage and retrieval.
Our objectives are:
The basic elements of the project are:
When you take a picture with your camera, how do you know that you’ll be able to read it in, say, ten years’ time?
This is just one example of a problem with proprietary image formats. There are many others, and they affect more than home users who want to be able to see pictures of their family for years to come: libraries, archives, and museums need to think even further into the future.
To prevent data loss caused by an image's file format fading into obscurity.
Imagine you are a software developer who wants to write a program, say an image gallery viewer, a painting application, a jigsaw game, or an image-processing suite.
One of the most significant obstacles faced by a developer starting afresh in this field is the handling of all the image formats that currently exist. Many of these formats are protected by patents and licences: you need to pay to write software that encodes GIF files (the same can be said of mp3 and AAC files in the audio field).
If you are lucky, some of the formats will be documented and free, or there is an API or library you can incorporate into your application. If you have a large budget, you can pay off all the patent and rights holders. If you’re unlucky, you’ll come across a file format that is undocumented, encrypted and understood only by proprietary software; your only option for supporting that file type would be to reverse-engineer the format and decrypt the data, which is sure to attract lawyers!
Next, we imagine what happens to these ‘difficult’ files when a manufacturer ceases trading, or when they move on to a new image format. Chances are, you have about ten years to translate all your files into a currently-understood format. After those ten years, the software that reads your files will probably not run on your current operating system, and your files might as well be noise, or deleted… or you could print them all out on archive paper, or wait in hope for patents to expire, when software developers are free to write new image-reading software.
To define a universal means of accessing images, by removing decoding responsibility from the host application.
We aim to solve the problems caused by the diversity of native file formats in the digital photographic industry.
To overcome the practical problems caused by proprietary or encoded formats, we emphasize:
All aspects of the project are intended to be ‘open’, both in terms of education and future development, and we aim to interest all parties who handle image files, including: camera manufacturers, software developers, government and educational libraries, medical imagers, museums, digital archive repositories, and home users.
All aspects of the project will be documented, because we feel that this will increase the likelihood of widespread uptake. This can be achieved while still protecting investments in proprietary intellectual property (see the example for camera developers).
The Coded Raw API and file format allow many different decoding methods to be included within an image file. An application can then query the image file to find out which methods it supports, and use them as appropriate. The readability of the file may be improved by including new methods in the file, and the API-using application may use its own, newer reading methods in preference to the methods supplied in the image file. Together with the proposed Coded Raw interface standards, an image file may be read in the way best suited to the situation, yet still have fall-back methods for reading the image data using the interface methods required by the base standard.
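The query-and-fallback behaviour described above might be sketched as follows. This is only an illustration in Python: the names `CodedRawFile`, `supported_methods`, `read_image`, and the method name `base_rgb` are assumptions made for the example and are not part of any published Coded Raw interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical type: a decoding method takes the stored payload and returns pixel values.
Method = Callable[[bytes], List[int]]


@dataclass
class CodedRawFile:
    """Stand-in for an image file that carries its own decoding methods."""
    payload: bytes
    embedded: Dict[str, Method] = field(default_factory=dict)

    def supported_methods(self) -> List[str]:
        """Let the reader ask which methods the file provides."""
        return sorted(self.embedded)


def read_image(image: CodedRawFile, wanted: str,
               host_methods: Optional[Dict[str, Method]] = None) -> List[int]:
    """Prefer a host-supplied method, then an embedded one, then the base fallback."""
    host_methods = host_methods or {}
    if wanted in host_methods:
        return host_methods[wanted](image.payload)
    if wanted in image.embedded:
        return image.embedded[wanted](image.payload)
    # 'base_rgb' is an assumed name for the method a base standard would require.
    return image.embedded["base_rgb"](image.payload)


image = CodedRawFile(
    payload=bytes([10, 20, 30]),
    embedded={
        "base_rgb": lambda data: list(data),
        "brightened": lambda data: [min(255, v + 5) for v in data],
    },
)

print(image.supported_methods())
print(read_image(image, "brightened"))
print(read_image(image, "not_in_this_file"))
```

A reader that asks for a method the file does not carry still receives pixel data through the base method, which is the fall-back behaviour the paragraph describes.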
An important part of the API is the ability of an image file to tell the reader: what standards are supported, what methods (procedures) are embedded, and what the parameters are for those methods. There are two types of standard:
API standard, which determines how the reader understands the capabilities of the Coded Raw file, and how the reader makes requests for image data;
Method Set standards, which comprise a set of required and optional embedded methods, and their parameters. These methods and parameters will be enumerated for easy identification, and be identified as ‘standard methods’. By way of an example, a ‘Base’ Method Set standard for Coded Raw will require methods to:
Further standards have the option to support more sophisticated methods, e.g.
Clearly we need to carefully define the framework for the standards, and design appropriate methods.
We expect that SIGs (special interest groups) will want to define Method Set standards appropriate to their specialism, e.g. 3D imaging, stochastic models, video playing, etc. The range of possible applications is staggering, but we will remain focused on plain 2D imaging while avoiding closing any doors to the more exotic uses for this technology.
A Coded Raw image is required to support the base standard, to allow image data to be extracted as a basic array of RGB pixels. This is the fallback method, and the advantage of the Coded Raw interface becomes clear when we consider the multitude of possible encoding methods that might be used within an image file. All these are hidden from the querying application. In other words, we have abstracted the encoding. For example, the embedded image might be a 7z/LZMA-encoded greyscale image, or JPEG-encoded, or described as a vector like SVG; in all cases, regardless of encoding method, the image file will be able to respond with a rectangular RGB pixel grid, using the embedded code that is appropriate to the content.
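To illustrate how the encoding can be hidden from the querying application, here is a minimal Python sketch. The class names and the `to_rgb` method are invented for this example; the only real library used is Python's standard `lzma` module. Two differently encoded images answer the same request with a rectangular grid of RGB pixels.

```python
import lzma
from abc import ABC, abstractmethod
from typing import List, Tuple

Pixel = Tuple[int, int, int]


class CodedImage(ABC):
    """Hypothetical base interface: every image can produce an RGB grid."""

    def __init__(self, width: int, height: int, payload: bytes) -> None:
        self.width = width
        self.height = height
        self.payload = payload

    @abstractmethod
    def to_rgb(self) -> List[List[Pixel]]:
        """Return the image as rows of (R, G, B) tuples."""


class LzmaGreyscaleImage(CodedImage):
    """Payload is LZMA-compressed, one greyscale byte per pixel."""

    def to_rgb(self) -> List[List[Pixel]]:
        grey = lzma.decompress(self.payload)
        w = self.width
        return [[(v, v, v) for v in grey[row * w:(row + 1) * w]]
                for row in range(self.height)]


class PackedRgbImage(CodedImage):
    """Payload is uncompressed, three bytes (R, G, B) per pixel."""

    def to_rgb(self) -> List[List[Pixel]]:
        stride = self.width * 3
        return [[tuple(self.payload[i:i + 3])
                 for i in range(row * stride, (row + 1) * stride, 3)]
                for row in range(self.height)]


grey = LzmaGreyscaleImage(2, 1, lzma.compress(bytes([0, 255])))
colour = PackedRgbImage(1, 2, bytes([255, 0, 0, 0, 0, 255]))

# The caller never needs to know which encoding is inside.
for image in (grey, colour):
    print(image.to_rgb())
```

The calling loop treats both images identically, which is the sense in which the encoding has been abstracted.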
The project applies principles of object orientation to image files. In traditional OO programming, we have an object that represents some abstract concept (say a person). That object will have data (like date of birth, sex, height, weight ...) and methods (like create(owner), delete(), calculateAge(), calculateBMI(), and so on). Each object that is of class person can then be queried for age and BMI. We have simply extended this OO concept to image files, along with a means of querying the object’s methods, and a way of defining standards that guide programmers to embed specific methods when creating those objects.
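The person analogy can be written out as a short Python sketch. The attribute names, units, and the `public_methods` helper are assumptions made for illustration; the helper simply shows one way of asking an object which methods it offers.

```python
from datetime import date
from typing import List


class Person:
    """The 'person' object from the analogy: data plus methods."""

    def __init__(self, date_of_birth: date, height_m: float, weight_kg: float) -> None:
        self.date_of_birth = date_of_birth
        self.height_m = height_m
        self.weight_kg = weight_kg

    def calculate_age(self, today: date) -> int:
        """Age in whole years on the given day."""
        dob = self.date_of_birth
        had_birthday = (today.month, today.day) >= (dob.month, dob.day)
        return today.year - dob.year - (0 if had_birthday else 1)

    def calculate_bmi(self) -> float:
        """Body mass index: weight in kg divided by height in metres squared."""
        return self.weight_kg / self.height_m ** 2


def public_methods(obj: object) -> List[str]:
    """List the callable, non-private members an object exposes."""
    return sorted(name for name in dir(obj)
                  if not name.startswith("_") and callable(getattr(obj, name)))


person = Person(date(1980, 6, 15), height_m=1.80, weight_kg=81.0)
print(public_methods(person))
print(person.calculate_age(date(2005, 6, 14)))
print(round(person.calculate_bmi(), 1))
```

An image object in this scheme would play the role of `person`, with decoding methods in place of `calculate_age` and `calculate_bmi`.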
At the core of this technology is an execution model that allows code to be imported from an image file and executed by the host: an advanced form of dynamic linking. There are many technologies that allow this, with JavaScript emerging as a viable candidate. [In 2005, we thought that Java might be viable, because it is suited to embedded systems, is well documented, and was likely to be reasonably future-proof.] We understand that we must design this very carefully, to ‘sandbox’ embedded code so that it does not affect the host system (i.e. protect against ‘malware’ exploitation).
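The idea of a host importing and running code carried by a file can be sketched in Python as follows. The embedded source text, the `decode` function name, and the allow-list of built-ins are all assumptions for this example. Note that limiting built-ins in this way is not a real sandbox: it only shows the shape of the mechanism, and a production design would need much stronger isolation (for example a separate process or a purpose-built virtual machine).

```python
from typing import Callable, Dict

# Hypothetical source text, as it might be stored inside an image file.
EMBEDDED_SOURCE = '''
def decode(payload):
    # Expand each greyscale byte into an (R, G, B) triple.
    return [(v, v, v) for v in payload]
'''

# Assumed allow-list: the only built-in names the embedded code may use.
ALLOWED_BUILTINS = {"len": len, "range": range, "min": min, "max": max}


def load_embedded_methods(source: str) -> Dict[str, Callable]:
    """Execute embedded source in its own namespace and collect its functions."""
    namespace = {"__builtins__": dict(ALLOWED_BUILTINS)}
    exec(source, namespace)
    return {name: obj for name, obj in namespace.items()
            if callable(obj) and not name.startswith("_")}


methods = load_embedded_methods(EMBEDDED_SOURCE)
print(sorted(methods))
print(methods["decode"](bytes([0, 128])))
```

Embedded code that reaches for a name outside the allow-list, such as `open`, fails with a `NameError` when it runs, because the name does not exist in its namespace.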
For the application programmer, it will offer the means to extract raw and processed information from a camera raw file, without requiring special knowledge about the camera or its proprietary file format.
For the camera developer, it offers the opportunity to embed proprietary code into raw images, and develop hardware without worrying about third parties catching up with new technology. When Coded Raw is sufficiently developed to the Base Standard, then as a first step for adoption, we would recommend the use of Coded Raw as a wrapper for proprietary raw files; gradual implementation of standard methods would then follow.
Assuming for a moment(!) that the industry adopts the Coded Raw standard, the photographer and graphic artist will be able to use any supported camera format, and choose any supported application, because the applications will not need to decode proprietary camera-raw data.
All the proposed technologies are openly archived and documented to a high standard, providing the Coded Raw workflow with better future-proofing and archive potential. The Coded Raw project itself will be Open Source, but software developers may publish processing methods in an implementation of the Coded Raw interface, without necessarily disclosing the source code.
We are aware that this project has potential to ‘fork’ into uncontrolled third-party projects, but in so doing, such authors risk re-creating the ‘proprietary problem’ mentioned above. Efforts to develop Coded Raw should be united and co-ordinated, such that project developers have clear targets, and consumers of the format have clear expectations.
If our authors are unable to dedicate sufficient resources to the project, and subsequent developers are so able, then we require attribution for the IP and ideas presented here. In the spirit of openness, we have provided this information on the web, to generate interest in the project.
We sincerely wish, and we require, that the Coded Raw designs are not taken over by any organisation with intentions to develop it into a proprietary or closed format. For purposes of potential patents, web archives can demonstrate Prior Art by this author.
If you are interested in supporting or developing this project, please use the feedback form to contact the author. Please take a look at the other pages, which describe some aspects of Coded Raw in more detail.