The Virtual Observatory: Methodologies for Data Handling
The ESF scientific network on
Converging Computing Methodologies in
Astronomy (CCMA) was active from 1995 to 1997. A short report
is attached. The following proposal takes CCMA as its point of departure,
and also seeks to include closely related areas of remote sensing.
A picture, it is said, replaces a thousand words.
Image data, with its associated text and other data and information,
is particularly well suited to human information processing.
The volume of scientific imagery is dramatically increasing.
The early part of the life cycle of data in the fields of astronomy
and of Earth observation includes calibration and processing, to make the data
generally usable. Data centres, or Processing and Archiving Facilities (PAFs)
play a role here. Data handling, and even data access and understanding,
become very important from this point onwards. This is where much effort, and
much pioneering work, is needed. Spin-off influence and results in
other areas are clearly evident: image filtering and `object' detection
in mammography image analysis, image registration in molecular
biology, close links with research work in
archaeology and video databases, oceanography, and industrial vision
inspection (fault detection in jeans and other textiles), to name but a few.
- In astronomy,
ground-based and satellite-borne instruments collect, at considerable expense,
vast quantities of data which - it is widely felt - ought to be stored for
posterity. This is part of humanity's collective
memory. Astronomical data is the raw material on which professional
astronomers work, and such data is also hugely attractive to amateurs and
the general public (cf. the comet Shoemaker-Levy's collision with the planet
Jupiter).
- In Earth observation, the data flow (or "tsunami" of data) is
far greater. The ERS satellites provide in one day what the Hubble Space
Telescope provides in 100 days. The possibilities and promise of Earth
observation data and information are very great. One can think of the social
and human problems related to the current El Nino event, the future
applications of ocean colour missions for our understanding of life and
society, and so on.
Participants in the
ESF Exploratory Workshop to be held in September 1999;
Participants in the CCMA ESF scientific network, 1995-1997;
The ESF-associated European Space Science Committee (ESSC);
Methodologists associated with large-scale data repositories in Earth
observation (the European Commission's Joint Research Centre, Ispra; the
European Space Agency, ESRIN; DLR Oberpfaffenhofen; CNES; PAFs
specializing in Earth observation data, and,
in astronomy, Strasbourg Observatory, and ADS, Center for Astrophysics,
Publishers (Springer-Verlag, Elsevier, Kluwer, University of Chicago
Press, professional associations, Cambridge University Press, Nature and
Selected specialists in the handling of multimedia, multimodal data,
or what could be characterized as the vision/language interface;
specialists in statistical and data analysis methodology; and in data mining
and in knowledge discovery in databases.
Among issues relating to methodology are the following:
- The enormous size of data holdings means that the issues of
resource discovery and information retrieval
must be faced. (Some current
solutions include the Ingrid distributed data and information `publishing'
protocol; indexing and search technologies; summarization of text and other
data; data structures, based on SGML; Hyper-G, XML and other new Web-related
- Electronic publishing:
the literature is increasingly online and must be linked to the
data on which the scientific findings are based, and should also be linked
to public relations and pedagogical forms of the information and knowledge.
- New user interfaces are being designed for multimedia
data, in an environment of mobile and ubiquitous computing. The scientist
must be represented in these developments.
Closely associated with access and navigation on the one hand, and
multimedia data on the other, are collaborative and cooperative work
practices. There is convergence between the professional's work and
broad information dissemination, and between the scientific production process
and the educational arena.
- Multimedia data - still images, text and metadata, documentation,
the published literature, image sequences, video - require
homogenization and cross-correlation. Such data are often heterogeneous and
distributed, which raises special problems.
- Web-based technologies have become the essential infrastructure
for so much of scientific work. Such an infrastructure also beneficially links
the professional scientist with the non-professional. Among
methodological issues are compression and transfer of large images and other
data; new strategies for progressive transmission; ways of handling
sky maps and large data stores.
- Statistical and mathematical experts are needed to closely oversee
the new developments, and to provide the highest quality foundations for
our work (e.g. model selection through Bayes factors; neural networks and
- Exploitation of large data repositories requires data mining -
the search for novel information. Appropriate data structures, communication
protocols and analysis methodologies are also the basis for new developments
in knowledge discovery in databases (KDD).
Following the ESF Exploratory Workshop, a proposal
for an ESF Scientific Programme will be drafted.
Prof. Fionn Murtagh
The Queen's University of Belfast, Belfast, Northern Ireland
Strasbourg Astronomical Observatory, Strasbourg, France
Tel +44 1232 274620
Fax +44 1232 683890
Last update to this page: 16 June 1999.