GGF5 (BOF),
July 2002, Edinburgh, Scotland
Grid Information Retrieval Proposed WG
Tue, Jul 23, 2002: 6:00pm - 7:00pm
Description led by Kevin Gamiel (kgamiel at cnidr.org)
Location Harris 2 (40)
The BOF hosted 28 attendees. The Chair presented several slides on
information retrieval (IR) background and proposed an evolutionary
step towards a grid-based IR system.
In summary, GridIR proposes to explode the traditional, monolithic
indexer/search engine into its constituent pieces, develop well-defined
interfaces for each as OGSA grid services, and, using a technology such
as Web Services Flow Language (WSFL), tie the modular services together
into highly customized IR systems (slides to be posted to
www.gridir.org). The services could be tied together via a dynamic
routing scheme based on logical service name, such that at runtime a
directory service could provide a real-time mapping from logical
service name to an appropriate service instance. With each piece being
well-defined and modular, the system promotes rapid experimentation and
specialization: a researcher might focus on a single problem, such as
merging results from several different searches, without worrying about
the complexities of the other supporting services.
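
The dynamic routing idea described above can be illustrated with a
minimal sketch. It is not taken from the slides and does not use any
OGSA or WSFL API: the Directory class, the run_flow function, and the
toy services are all hypothetical names chosen for illustration, and
plain Python callables stand in for grid service instances.

```python
from typing import Any, Callable, Dict, List


class Directory:
    """Hypothetical stand-in for a directory service.

    Maps a logical service name to whichever instance is currently
    registered under that name.
    """

    def __init__(self) -> None:
        self._instances: Dict[str, Callable[[Any], Any]] = {}

    def register(self, logical_name: str, instance: Callable[[Any], Any]) -> None:
        self._instances[logical_name] = instance

    def resolve(self, logical_name: str) -> Callable[[Any], Any]:
        return self._instances[logical_name]


def run_flow(directory: Directory, flow: List[str], payload: Any) -> Any:
    """Pass the payload through each logical service in order.

    Each name is resolved at call time, so the instance behind a name
    can change between runs without changing the flow itself.
    """
    for logical_name in flow:
        service = directory.resolve(logical_name)
        payload = service(payload)
    return payload


# Toy services (assumptions, for illustration only).
def tokenize(text: str) -> List[str]:
    return text.lower().split()


def drop_stopwords(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in {"the", "a", "of"}]


def keep_long_tokens(tokens: List[str]) -> List[str]:
    return [t for t in tokens if len(t) > 6]


directory = Directory()
directory.register("tokenizer", tokenize)
directory.register("filter", drop_stopwords)

flow = ["tokenizer", "filter"]
first = run_flow(directory, flow, "The merging of ranked results")

# Swap the instance behind the logical name "filter"; the flow is unchanged.
directory.register("filter", keep_long_tokens)
second = run_flow(directory, flow, "The merging of ranked results")
```

The point of the sketch is that the flow refers only to logical names,
so a researcher could replace one service (here, the filter) without
touching the others.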
Is there overlap with the Grid Information Services working group? The
chair of that group was in attendance and said no, there is no overlap.
Grid Information Services is a service metadata system, one that may
indeed support GridIR as a vehicle for publishing the existence of
GridIR services.
Any objections or problems with the charter? A proposal was raised to
collapse the requirements and architecture docs into one, as the
architecture doc would by definition include the requirements. A
counterargument was offered that it may be useful to keep them separate
as a logical sequence of events, even if there is ultimately overlap.
No real consensus was reached either way; further discussion should
take place on the mailing list.
What systems can we look at and learn from? The JXTA Search system and
other peer-to-peer systems. An internal Oracle code is to be released
to the Chair (and community?) for evaluation.
Will GridIR solve the merging results problem? No, but it makes it
possible to focus on such problems. Because GridIR consists of modular,
connectable services, one can focus on a specific service, like
merging, without having to bother with any of the other services,
something that to this point has been difficult to accomplish.
Will MCNC-CNIDR offer a prototype? Yes.
How will the average user benefit? In terms of use, a user wouldn't see
much difference. That is, they would probably interact with the system
through a web page with a query interface. However, GridIR should
enable secure databases, more databases, and better-quality databases,
because GridIR's modularity promotes greater experimentation with new
types of IR systems.
The Chair requested as many IR systems as possible to help with
interface development.
The issue of compression was raised, both for indexes and for data
transfer between nodes. Index compression was considered out of scope
and dependent on the particular indexer implementation for a specific
node instance, whereas compression of data transfer is an important
issue for further consideration.
The issue of caching was raised. Consider caching nodes that cache
data, e.g. from distributed crawlers.
Should this be a working group or a research group? Discussion ensued
on the merits of both. This led to a consensus that we should
definitely have a working group to perform the work as originally
proposed, i.e. the OGSA-based IR system. There was also support for a
separate research group to address the many valid, broad research
questions related to IR on the grid, e.g. how to merge ranked results
from various searches in a meaningful way. For example, one might
devise a method of returning corpus information along with each result
set to enable reasonable merging. The Chair agreed to approach the Area
Directors with the idea of a separate research group.
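
The idea of returning corpus information with each result set, raised
in the discussion above, might look something like the following
sketch. It is one possible approach, not a method proposed at the BOF:
the ResultSet fields, the merge function, and the TF-IDF-style scoring
formula are all assumptions made for illustration. Each node reports
its corpus size and per-term document frequencies, and the merger uses
the pooled statistics to rescore every hit on a common scale.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ResultSet:
    """Hypothetical result set carrying corpus information."""
    num_docs: int                    # size of the node's corpus
    doc_freq: Dict[str, int]         # term -> documents containing it
    hits: Dict[str, Dict[str, int]]  # doc id -> term -> frequency in doc


def merge(result_sets: List[ResultSet],
          query_terms: List[str]) -> List[Tuple[str, float]]:
    """Rescore all hits with pooled corpus statistics, best first."""
    total_docs = sum(rs.num_docs for rs in result_sets)

    # Global inverse document frequency computed over all corpora.
    idf: Dict[str, float] = {}
    for term in query_terms:
        df = sum(rs.doc_freq.get(term, 0) for rs in result_sets)
        idf[term] = math.log(1 + total_docs / df) if df else 0.0

    scored: List[Tuple[str, float]] = []
    for rs in result_sets:
        for doc_id, term_freqs in rs.hits.items():
            score = sum(term_freqs.get(t, 0) * idf[t] for t in query_terms)
            scored.append((doc_id, score))

    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up example data for two nodes with different corpora.
node_a = ResultSet(
    num_docs=1000,
    doc_freq={"grid": 500, "retrieval": 10},
    hits={"a1": {"grid": 3}, "a2": {"retrieval": 1}},
)
node_b = ResultSet(
    num_docs=1000,
    doc_freq={"grid": 5, "retrieval": 400},
    hits={"b1": {"grid": 1, "retrieval": 2}},
)

merged = merge([node_a, node_b], ["grid", "retrieval"])
ranking = [doc_id for doc_id, _ in merged]
```

Without the shared statistics, each node's local scores would reflect
only its own corpus and would not be directly comparable.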