Internet imaging differs from other forms of electronic imaging in that it employs an internet (network of networks) as a transmission vehicle. However, the internet is only one component (albeit a major one) in the total imaging system. The total system comprises client applications internetworked with server applications, as well as off-line authoring tools.
The internet is an evolving communication system. Its functionality, reliability, scaling properties, and performance limits are largely unknown. The transmission of images over the internet pushes on its engineering envelope more than most applications. Consequently, the issues we are interested in exploring pertain to all aspects of the total system, not just images or imaging algorithms.
This emphasis on systems is what sets the Internet Imaging conference apart from other electronic imaging conferences. For a local imaging application, even when it is split between a client and a server linked by an ethernet, a system can be designed by stringing algorithms in a pipeline. If performance is an issue, it is easy to identify the weak link and replace it with a better performing component.
On the Internet, the servers are unknown, the clients are unknown, and the network is unknown. The system is not easily predictable and the result is that the most common problem today is scalability. To be successful one has to follow a top-down design strategy, where the first step is a detailed analysis of the problems to be solved. When a solution is invented, algorithms are selected to produce a balanced system, instead of choosing algorithms of best absolute performance as is done in bottom-up approaches.
An example of how wrong things can go when these fundamentals are not understood is content-based image retrieval (CBIR). Today CBIR systems are part of all the major search engines on the internet, and anyone who has tried to use them for real work has experienced how useless they are.
Although a number of CBIR algorithms have been proposed over the years, none has stood out as being particularly robust, despite the fact that each claims to perform best on some benchmark. Unfortunately there is no universally accepted benchmark for CBIR, and the lack of a metric is probably one of the main causes of the poor quality of today's algorithms: without a performance metric it is impossible to diagnose the shortcomings of a particular algorithm.
During 2001 a team of researchers from the University of Buffalo in the USA and the University of Geneva in Switzerland collaborated on basic research towards the specification of a universal benchmark for CBIR, known as Benchathlon. The researchers in Buffalo, whose work is described in paper 4672-24, "Testing a vocabulary for image indexing and ground truthing," elaborate on the issues in defining a vocabulary for indexing images, which makes it possible to establish when images are similar. The researchers in Geneva, whose work is described in paper 4672-25, "Dynamic multimedia annotation tool," present a tool that uses the above vocabulary to allow people to collectively categorize the images in a database. The authors then describe how the resulting ground truth is used to drive a CBIR benchmark.
Much research remains to be performed to achieve the goals set in the Benchathlon effort (see http://www.benchathlon.net/ ). This is an opportunity to collaborate with the best researchers in the field and make important contributions. For example, many of today's CBIR algorithms tend to rely on bottom-up models for the human visual system (HVS). During the 2001 work on the Benchathlon it quickly became evident that only top-down vision models can correlate with the user's expectations, because the HVS matches images in a top-down process, even at the most rudimentary levels (see A. Pascual-Leone and V. Walsh, "Fast Backprojections from the Motion to the Primary Visual Area Necessary for Visual Awareness," Science Vol. 292, 510-512, 20 April 2001).
The fallacy of popular metrics like color histograms is clear when one compares the images from a stock photo agency to those generally found on the Internet. While the former have been carefully rendered to a normalized intent, the latter are most often the raw output from digital cameras or scanners. A first step is to clearly specify for each image whether it is rendered for a particular output medium or unrendered. A family of large-gamut rendered/unrendered color encoding specifications has recently been proposed under the designation RIMM/ROMM RGB and the paper "Color encodings for image databases" in the session on Valorizing Images is a clear presentation of these issues.
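The sensitivity of histogram-based similarity to rendering can be illustrated with a minimal sketch. It is not taken from any of the papers in the conference; it assumes a single 8-bit intensity channel instead of full color, a hypothetical synthetic "scene," and a simple gain change standing in for the difference between a rendered and an unrendered version of the same image. The function names are illustrative only. The similarity measure is plain histogram intersection.

```python
def intensity_histogram(pixels, bins=8):
    """Normalized histogram of 8-bit intensity values (assumed range 0..255)."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; close to 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def apply_gain(pixels, gain):
    """Scale intensities, clipping at 255 (a crude stand-in for a rendering change)."""
    return [min(255, round(p * gain)) for p in pixels]


# Hypothetical scene: 1000 pixel values spread over 0..199.
scene = [(i * 37) % 200 for i in range(1000)]
rendered = scene
unrendered = apply_gain(scene, 0.5)  # same content, half the exposure

same = histogram_intersection(intensity_histogram(rendered),
                              intensity_histogram(rendered))
shifted = histogram_intersection(intensity_histogram(rendered),
                                 intensity_histogram(unrendered))

print("same image:       ", same)
print("same scene, dimmed:", shifted)
```

Although both inputs depict the same scene, the second score is markedly lower than the first, which is the kind of failure one would expect when comparing normalized stock photographs with raw camera or scanner output.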
A second step is to develop bottom-up algorithms that can perform a canonical rendering operation, a process variously referred to as automatic enhancement or intelligent enhancement. At last year's Internet Imaging II conference R. Eschbach and D. Pfeiffer each presented the state of the art in this field.
This year we give special consideration to the fundamental research necessary to invent algorithms for estimating the rendering intent of images. First, in his invited paper 4672-01, "Natural Image Database and illuminant classification research," Prof. Tominaga shows how to build a reference database.
Second, in a joint session with the Human Vision and Color Imaging conferences we review the progress in Retinex since its inception 40 years ago. In this session the top researchers in the field reveal the current understanding of Retinex and how it is applied to electronic imaging. Retinex is important because it allows pixels to be processed in their spatial context, thus providing a basis for developing more sophisticated bottom-up algorithms.
The other sessions in the conference are devoted to subjects in which more progress has been made and the results are less controversial, albeit still important. To underline the emphasis on systems, the first session is dedicated to systems and architecture. Ultimately, however, internet imaging is for real people, and so the second session is on human-computer interaction.
Today's network bandwidth and computer graphics accelerators are enabling a trend towards more use of animation and video in internet imaging. The latest results are presented in session 3.
As mentioned earlier, there is a need for top-down methods, and a first step is to find suitable image representations, the topic of session 4 on Monday afternoon, before the poster session. The next step is feature extraction, the topic of session 6.
Today, in evaluating research, the question is no longer how hard it was and how many other scientists have built on it. One of the first questions now is what new business models it enables. Consequently, valorizing images is an important topic for success in research, and session 5 presents papers with very interesting perspectives.
Session 7 is on performance analysis and benchmarking. The conference concludes with the mini-symposium on Retinex at 40.
We hope our conference and the EI receptions give you the opportunity to renew old friendships and network with new contacts, leading to new breakthroughs in your research. We look forward to learning about your progress in a paper submitted to Internet Imaging IV in 2003.