Data information and knowledge for e-science and e-infrastructure abstract

From ERCIM Working Group Data and Information Spaces

Data information and knowledge for e-science and e-infrastructure

Speaker: Keith G Jeffery

Abstract:

Dataspaces are intended to overcome the cost of integrated distributed information systems by providing on-demand partial integration at lower cost. They are characterised by participants and relationships [FrHaMa2005]: participants are data systems, and relationships are semantic linkages between them. However, the linkages are invoked only as needed, are built incrementally, and are based on human actions, interpretation and annotation.
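The participant/relationship structure described above can be illustrated with a minimal sketch. It is not taken from [FrHaMa2005] or from any dataspace implementation; the class names `Participant`, `Relationship` and `Dataspace`, and the method names, are hypothetical and chosen only to show linkages being added incrementally, on demand, with a human-supplied annotation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Participant:
    """A data system taking part in the dataspace (hypothetical model)."""
    name: str


@dataclass
class Relationship:
    """A semantic linkage between two participants."""
    source: Participant
    target: Participant
    annotation: str  # human-supplied interpretation of the linkage


@dataclass
class Dataspace:
    participants: set = field(default_factory=set)
    relationships: list = field(default_factory=list)

    def add_participant(self, participant):
        # Participants join without any up-front integration work.
        self.participants.add(participant)

    def link(self, source, target, annotation):
        """Record a linkage only at the point where a user needs it."""
        for p in (source, target):
            if p not in self.participants:
                raise ValueError(f"{p.name} is not a participant")
        rel = Relationship(source, target, annotation)
        self.relationships.append(rel)
        return rel

    def related(self, participant):
        """Return the participants linked to the given one so far."""
        linked = []
        for rel in self.relationships:
            if rel.source == participant:
                linked.append(rel.target)
            elif rel.target == participant:
                linked.append(rel.source)
        return linked
```

In this sketch a dataspace starts with participants and no relationships; each call to `link` adds one linkage, so integration remains partial and grows only as users need it.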

The research challenges are what they have always been:

  1. homogeneous access over heterogeneous data sources;
  2. utilisation of learned experience in the system.

Dataspaces call on additional human effort to overcome the limitations of computer-based information integration and management. In fact, they are closely related to ‘mash-ups’. Current implementations are very simple and make simplifying assumptions. Dataspaces do not (yet?) tackle the problem of discovering and composing/orchestrating software service components.

Inspired by the e-Science concept, work on GRIDs and SOA (Service-Oriented Architectures) for e-infrastructure over recent years has been tackling the same problem and even extending it, by including the use of appropriate software to execute the processes required (even if not defined) by the user. However, the approach has been to rely on formal syntax, defined semantics and techniques for: a) in the information domain, automated schema matching and mapping; b) in the software domain, requirements and capabilities matching and automated composition and orchestration.
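As a rough illustration of what automated schema matching in the information domain involves, the sketch below proposes a mapping between two sets of field names using string similarity alone. This is a deliberately naive stand-in, not the technique used in any particular GRID or SOA project: the function name `match_schemas` and the `threshold` parameter are assumptions made for this example, and practical matchers would also draw on defined semantics, data types and instance data.

```python
from difflib import SequenceMatcher


def match_schemas(source_fields, target_fields, threshold=0.6):
    """Propose a source -> target field mapping by name similarity.

    Hypothetical helper: a field is mapped to its most similar target
    name, and only if the similarity reaches the threshold.
    """
    mapping = {}
    for s in source_fields:
        best, best_score = None, 0.0
        for t in target_fields:
            # Case-insensitive similarity ratio between 0.0 and 1.0.
            score = SequenceMatcher(None, s.lower(), t.lower()).ratio()
            if score > best_score:
                best, best_score = t, score
        if best is not None and best_score >= threshold:
            mapping[s] = best
    return mapping
```

Fields with no sufficiently similar counterpart are left unmapped; these are the cases where the dataspace approach would fall back on human interpretation and annotation.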

The two approaches (dataspaces and e-infrastructure) overlap in their use of metadata and catalogs, and each area has its own developing research and development directions. Related work on CRIS (Current Research Information Systems) has defined integration mechanisms and has led to CRIS being considered as the integrating ‘glue’ in a research information environment (e-infrastructure). It may also prove a suitable ‘glue’ for dataspaces.


[FrHaMa2005] Franklin, M., Halevy, A., Maier, D.: From databases to dataspaces: a new abstraction for information management. SIGMOD Record 34(4) (2005) 27–33