Multimedia content is expanding fast. There are now an estimated 800 million HTML pages on the Web, encompassing about 15 terabytes of data (40% actual text and 60% page layout overheads). There are also about 180 million images, amounting to 3 terabytes. In the professional field, multimedia is also on the increase. The Gale Directory of Databases has nearly 400 of its 9800 entries listed as containing video, and over 1300 with image content.
Web search engines cannot cope. A recent survey showed that no search engine covers more than 16% of the public parts of the Web, and many are biased towards popular US commercial sites. As the survey's authors note, "search engine indexing and ranking may have economic, social, political and scientific effects".
Search engines are not very good at recognising the context of data. Most Web users have experienced the exasperation of receiving thousands of irrelevant hits from a simple word search. The XML metadata standard will help: together with appropriate dictionaries or schemas, it can define classes and categories of terms. There are some standards for image and sound formats, but practically none for content-based retrieval of multimedia data. The need for information visualisation is also becoming obvious, especially in scientific fields such as genetics, high-energy physics and astronomy, where very large scale databases are being constructed.
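To make the point concrete, the sketch below shows how an XML metadata record, paired with an agreed vocabulary, lets a retrieval system filter by declared subject class rather than by bare keyword matching. The element names, the `scheme` attribute and the record itself are purely illustrative assumptions, not taken from any real standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record; tag and attribute names are invented
# for illustration only.
RECORD = """
<mediaItem id="img-042">
  <title>Crab Nebula, optical band</title>
  <category scheme="astronomy">nebula</category>
  <format>image/jpeg</format>
</mediaItem>
"""

def classify(xml_text):
    """Extract the declared class of an item so a search service can
    match on categories instead of raw words in the title."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title")
    category = root.find("category")
    return title, category.get("scheme"), category.text

print(classify(RECORD))
# -> ('Crab Nebula, optical band', 'astronomy', 'nebula')
```

The gain over plain word search is that a query for "nebula" in the `astronomy` scheme will never match, say, a cosmetics page that merely contains the word.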
In the next century, the volume of images and multimedia will completely swamp text and data. For example, one Earth Observation satellite can generate up to one terabyte of data per day. Multimedia content will come from all quarters: next-generation authoring systems, the graphic arts industry, interactive TV, digital radio, digital cameras and video cams, movie archives and clips, synthetic computer image banks, etc. When these sources go fully digital, with random access, users will require interactivity not just with titles and bibliographic indexes, but with the content itself.
Research is needed to develop new ways to master the multimedia overload, through standardised multimedia object management systems. Otherwise the content and creative industries will not be able to create, use and exchange this new wealth in an open and economical way. The EU intends to launch a Call for proposals in late 1999 for collaborative research amongst partners in the EU and around the world, in the areas of information access, filtering, analysis and handling.
This talk will draw on the ETAN Study which I am chairing on "Transforming European Science through ICT: Challenges and Opportunities of the Digital Age".
I will discuss the current status of astronomical computing and information services as seen from the point of view of an active research astronomer, and consider how funding issues drive which projects are undertaken. Significant amounts of time are still being spent, by both astronomers and programmers, on software for reducing data, despite many years of effort in this field. This is surprising, given the argument that the algorithms are now stable. The drive towards interoperability should help in this area, but is unlikely to be a cure. Computer-based bibliographic services are now a crucial component of the astronomical infrastructure, as are the repositories of processed data. Archives of raw data are becoming easier to access, and thus more used. There is still a gap in tying archives to the papers that resulted from those data.
SARA (http://www.cacr.caltech.edu/SDA/digital_puglia.html) is a prototype of a system providing "processing on demand" for Synthetic Aperture Radar (SAR) data stored in a distributed database. The sensor delivers SAR data in raw form, and the raw data must be processed to obtain an image. Further processing may be needed to extract information from that image: the system allows the client to perform tasks such as Principal Components Analysis, Singular Value Decomposition or Supervised Classification, transforming the original image into something better suited to the user's needs. The objective is to let a user choose, from a web browser, a geographic area of interest and specify the information to be extracted from the selected images. The system has to solve a variety of problems dynamically. First it has to locate the images, then it has to find a program that can perform the required processing, and then it has to select a strategy to bring the program and the image together (i.e. should the processing be moved to the machine holding the data, or is the opposite more convenient?). Other issues are the need for new forms of electronic trade, since both the data and the processing time must be paid for by the user, and the need for the user to control some forms of processing interactively (such as supervised classification). All these problems of managing and controlling distributed resources can be stated better using the concept of a "Computational Grid", and most of them can be solved using Globus (http://www.globus.org/). By integrating the Globus tools with web technologies, SARA now allows a user to search for data and to run an interactive application on the retrieved image.
User authentication, the search for data, the launch of the remote application and the remote, interactive control of the processing are now all possible from a standard Java-enabled web browser.
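The placement question raised above (move the image to the compute host, or ship the program to the data host?) can be sketched as a simple transfer-cost comparison. This is a minimal illustration under assumed numbers; the function name, the cost model and the figures are hypothetical, not SARA's actual scheduling logic.

```python
def placement(image_mb, program_mb, result_mb, link_mb_per_s):
    """Pick the cheaper strategy by estimated transfer time.

    Strategy A ("move data"): ship the raw image to the processing
    site; the result is produced where the user wants it.
    Strategy B ("move code"): ship the program to the data site,
    then return the processed result.
    """
    move_data = image_mb / link_mb_per_s
    move_code = (program_mb + result_mb) / link_mb_per_s
    return "move data" if move_data < move_code else "move code"

# A 900 MB SAR scene against a 5 MB program returning a 40 MB product:
print(placement(900, 5, 40, link_mb_per_s=10))  # -> move code
```

For large raw scenes and small programs, shipping the code to the data almost always wins; the balance tips the other way only when the program plus its output outweigh the image itself.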