GIR

Minutes

GGF5 (BOF), July 2002, Edinburgh, Scotland

Grid Information Retrieval Proposed WG
Tue, Jul 23, 2002: 6:00pm - 7:00pm
Description led by Kevin Gamiel (kgamiel at cnidr.org)
Location Harris 2 (40)

The BOF hosted 28 attendees.

Chair presented several slides on information retrieval (IR) background and proposed an evolutionary step towards a grid-based IR system. In summary, GridIR proposes to explode the traditional, monolithic indexer/search engine into its constituent pieces, develop well-defined interfaces for each as OGSA grid services, and, using a technology such as Web Services Flow Language (WSFL), tie the modular services together into highly customized IR systems (slides to be posted to www.gridir.org). The services could be tied together via a dynamic routing scheme based on logical service name, such that at runtime a directory service could provide a real-time mapping from logical service name to an appropriate service instance. With each piece being well-defined and modular, the system promotes rapid experimentation and specialization: a researcher might focus on a single problem, such as merging results from several different searches, without worrying about the complexities of other supporting services.
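As an illustration only, the routing idea described above can be sketched in a few lines of Python. Nothing here comes from the slides: the `Directory` class, the `run_flow` helper, and the service names "tokenizer" and "indexer" are hypothetical, and plain function calls stand in for OGSA grid services and a WSFL flow.

```python
from typing import Callable, Dict, List

# A "service" is modeled as a plain callable (an assumption for illustration;
# a real GridIR service would be a remote OGSA grid service).
Service = Callable[[object], object]


class Directory:
    """Hypothetical directory mapping logical service names to instances."""

    def __init__(self) -> None:
        self._instances: Dict[str, Service] = {}

    def register(self, logical_name: str, instance: Service) -> None:
        # Re-registering a name swaps the instance used by later lookups.
        self._instances[logical_name] = instance

    def resolve(self, logical_name: str) -> Service:
        return self._instances[logical_name]


def run_flow(directory: Directory, flow: List[str], payload: object) -> object:
    """Pass the payload through each logical service, resolved at call time."""
    for logical_name in flow:
        payload = directory.resolve(logical_name)(payload)
    return payload


def tokenize(text):
    # Lowercase and split on whitespace.
    return text.lower().split()


def tokenize_without_stopwords(text):
    # Alternative tokenizer that also drops a few common words.
    return [t for t in text.lower().split() if t not in {"on", "the"}]


def build_index(tokens):
    # Map each token to the positions at which it occurs.
    index = {}
    for position, token in enumerate(tokens):
        index.setdefault(token, []).append(position)
    return index


directory = Directory()
directory.register("tokenizer", tokenize)
directory.register("indexer", build_index)

flow = ["tokenizer", "indexer"]
index = run_flow(directory, flow, "Grid IR on the Grid")

# Swap one service instance; the flow definition itself is unchanged.
directory.register("tokenizer", tokenize_without_stopwords)
index_without_stopwords = run_flow(directory, flow, "Grid IR on the Grid")
```

Because the flow names only logical services, replacing the tokenizer instance changes the result without touching the flow or the indexer, which is the kind of isolated experimentation the proposal aims for.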

Is there overlap with the Grid Information Services working group? The chair of that group was in attendance and said no, there is no overlap. Grid Information Services is a service metadata system, one that may indeed support GridIR as a vehicle for publishing the existence of GridIR services.

Any objections or problems with the charter? A proposal was raised to collapse the requirements and architecture docs into one, as the architecture doc would by definition include the requirements. A counterargument was offered that it may be useful to keep them separate as a logical sequence of events, even if there was ultimately overlap. No real consensus either way; further discussion should take place on the mailing list.

What systems can we look at and learn from? The JXTA Search system and other peer-to-peer systems. Internal Oracle code is to be released to the Chair (and community?) for evaluation.

Will GridIR solve the merging results problem? No, but it makes it possible to focus on such problems. Because GridIR consists of modular, connectable services, one can focus on a specific service, like merging, without having to bother with any of the other services, something that has been difficult to accomplish up to this point.

Will MCNC-CNIDR offer a prototype? Yes.

How will the average user benefit? In terms of use, a user wouldn't see much difference. That is, they would probably interact with the system through a web page with a query interface. However, GridIR should enable secure databases, more databases, and better-quality databases, because GridIR modularity promotes greater experimentation with new types of IR systems.

Chair requested as many IR systems as possible to help with interface development.

Issue of compression raised, both for indexes and for data transfer between nodes. Index compression was considered out of scope and dependent on the particular indexer implementation for a specific node instance, whereas compression of data transfer is an important issue for further consideration.

Caching issue raised. Consider caching nodes that cache data, e.g. from distributed crawlers.

Should this be a working group or a research group? Discussion ensued on the merits of both. This led to a consensus that we should definitely have a working group to perform the work as originally proposed, i.e. the OGSA-based IR system. There was also support for a separate research group to address the many valid, broad research questions related to IR on the grid, e.g. how to merge ranked results from various searches in a meaningful way. For example, one might devise a method of returning corpus information along with each result set to enable reasonable merging. Chair agreed to raise the idea of a separate research group with the Area Directors.
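To make the corpus-information example concrete, here is one possible sketch in Python. It is not a method proposed at the BOF. It assumes each search returns, along with its hits, the size of the corpus searched and the document frequency of each query term, so that the merger can rescore every hit with a simple TF-IDF weight computed from the combined statistics. The `ResultSet` fields and the `merge` function are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ResultSet:
    """Hypothetical result set that carries corpus statistics with its hits."""
    num_docs: int                    # size of the corpus that was searched
    doc_freq: Dict[str, int]         # query term -> number of docs containing it
    hits: Dict[str, Dict[str, int]]  # doc id -> {query term: term frequency}


def merge(result_sets: List[ResultSet]) -> List[Tuple[str, float]]:
    """Rescore all hits against combined corpus statistics, best first."""
    total_docs = sum(rs.num_docs for rs in result_sets)

    # Combine per-corpus document frequencies into global ones.
    global_df: Dict[str, int] = {}
    for rs in result_sets:
        for term, df in rs.doc_freq.items():
            global_df[term] = global_df.get(term, 0) + df

    scored = []
    for rs in result_sets:
        for doc_id, term_freqs in rs.hits.items():
            # Assumes every term in a hit also appears in doc_freq (df >= 1).
            score = sum(
                tf * math.log(1 + total_docs / global_df[term])
                for term, tf in term_freqs.items()
            )
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Two searches over corpora of very different sizes (made-up numbers).
small = ResultSet(
    num_docs=100,
    doc_freq={"grid": 50, "retrieval": 5},
    hits={"a1": {"grid": 3}, "a2": {"retrieval": 1}},
)
large = ResultSet(
    num_docs=900,
    doc_freq={"grid": 50, "retrieval": 5},
    hits={"b1": {"grid": 1, "retrieval": 1}},
)

merged = merge([small, large])
ranking = [doc_id for doc_id, _ in merged]
```

Without the shared statistics, scores computed locally by each search would not be comparable, since a term that is rare in one corpus may be common in another.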
