GGF6
(BOF), October 14-17, 2002, Chicago, IL, USA
GRIDIR BOF2
GGF Chicago
October 16, 2002, 3:30 p.m. to 5:00 p.m.
Abstract: The meeting consisted of a presentation prepared by
Dr. Gregory Newby of UNC-Chapel Hill, Mr. Kevin Gamiel of MCNC/CNIDR,
and Mr. Nassib Nassar of UNC-Chapel Hill, followed by general
discussion on a variety of topics related to the GridIR proposal.
Discussants:
Mr. Gamiel
Dr. Newby
Mr. Gamiel began by discussing the meeting agenda
and BOF/WG status overview.
Dr. Newby discussed the requirements
document outline.
A question was raised about the definition of "collection." Dr.
Newby defined it as "a logical grouping of documents." The
questioner asked whether a collection could itself be a grid. Mr.
Gamiel pointed out that the results of a query could become a collection.
Dr. Newby discussed the meaning of information
retrieval and GridIR.
Mr. Gamiel asked how many of the participants
were familiar with OGSA. A majority indicated that they were. Mr.
Gamiel stated that
GridIR assumes OGSA functionality, i.e. that it will be specified
as an OGSA service. In response to this, Mr. Gamiel was asked what
he meant by "OGSA service." He replied with a brief description
of the GridIR system and explained that it would be composed of
multiple PortTypes.
A participant noted the existence of MDS (the Globus Monitoring and
Discovery Service). Mr. Gamiel responded that there may be overlap with
GridIR, but that GridIR would be a collection of tools that could be
utilized in building a resource discovery system. He added that the
proposed GridIR working group would learn from previous relevant
architectures.
Mr. Gamiel discussed the architecture document.
A questioner asked whether queries would be persistent.
Dr. Newby replied that the user client should be able to specify
persistence.
Mr. Gamiel discussed the architecture diagram.
In response to a question, Dr. Newby stated that there had been
a lot of debate about the core components, and that the co-chairs
want to see [further] scrutiny and debate on the architecture.
Mr. Gamiel discussed the
specification document outline and specification diagram. He stated
that the "explain" functionality, a
set of methods for requesting metadata, is located in the base class
of the inheritance tree. He also pointed out that the IR PortType
and QueryProcessing PortType should look identical to each other
from the point of view of a search client. [Ed.: In other words,
the searching functionality is located in the ancestor of the two
PortTypes.]
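As an illustration only, the inheritance relationship described above can be
sketched in Python. This is not taken from the GridIR specification: every
class name, method name, and signature below (GridIRPortType,
SearchablePortType, explain, search, run_search) is a hypothetical stand-in,
and the in-memory substring matching is a placeholder for real indexing.
The sketch shows "explain" living in the base class and a search client
that cannot tell the IR PortType from the QueryProcessing PortType.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class GridIRPortType(ABC):
    """Hypothetical root of the hierarchy; holds the "explain" methods."""

    def explain(self) -> dict:
        # Report basic metadata: the concrete type and its public operations.
        operations = sorted(
            name for name in dir(self)
            if not name.startswith("_") and callable(getattr(self, name))
        )
        return {"port_type": type(self).__name__, "operations": operations}


class SearchablePortType(GridIRPortType):
    """Hypothetical common ancestor that declares the search interface."""

    @abstractmethod
    def search(self, term: str) -> list[str]:
        ...


class IRPortType(SearchablePortType):
    """Searches its own documents (placeholder for indexing/searching)."""

    def __init__(self, documents: dict[str, str]) -> None:
        self._documents = documents

    def search(self, term: str) -> list[str]:
        needle = term.lower()
        return sorted(
            doc_id for doc_id, text in self._documents.items()
            if needle in text.lower()
        )


class QueryProcessingPortType(SearchablePortType):
    """Forwards the query to several searchable services and merges results."""

    def __init__(self, backends: list[SearchablePortType]) -> None:
        self._backends = backends

    def search(self, term: str) -> list[str]:
        merged: list[str] = []
        for backend in self._backends:
            for doc_id in backend.search(term):
                if doc_id not in merged:  # drop duplicates, keep first-seen order
                    merged.append(doc_id)
        return merged


def run_search(service: SearchablePortType, term: str) -> list[str]:
    # The client depends only on the shared ancestor, not the concrete type.
    return service.search(term)
```

A client would call run_search with either kind of object; because both
inherit from the same ancestor, no client-side branching is needed.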
A question was asked whether anyone at all would
be able to create a collection. Mr. Gamiel responded that OGSA takes
care of policy questions, such as those regarding access and
instantiation, and that it supports different levels of administrators.
Dr. Newby added that each web server could, for example, publish its
own data, as opposed to relying on a monolithic crawler. Therefore, much
of the administration can be distributed.
A participant stated that systems such as WAIS
failed because people could not agree on the indexing format.
Mr. Gamiel replied that GridIR aims to standardize query and content
structures (i.e. containers and methods), but not particular indexing
methods. He added that, using the event model, it will be possible to
generate "push" events from various sources that do not traditionally
support "push" indexing.
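One way to picture generating "push" events from a source that only
supports "pull" is a polling adapter. The Python sketch below is an
assumption made for illustration, not the GridIR event model: the class
PollingAdapter, its methods, and the callback signature are all
hypothetical. The adapter fetches a snapshot of the source, compares it
with the previous snapshot, and notifies subscribers about new or changed
documents.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical callback type: receives a document id and its current content.
Listener = Callable[[str, str], None]


class PollingAdapter:
    """Turns a pull-only source into a stream of push-style notifications."""

    def __init__(self, fetch: Callable[[], dict[str, str]]) -> None:
        self._fetch = fetch                  # returns {doc_id: content}
        self._seen: dict[str, str] = {}      # last snapshot observed
        self._listeners: list[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def poll(self) -> int:
        """Fetch once, notify listeners of new or changed documents,
        and return how many documents changed."""
        current = self._fetch()
        changed = [
            doc_id for doc_id, content in current.items()
            if self._seen.get(doc_id) != content
        ]
        for doc_id in changed:
            for listener in self._listeners:
                listener(doc_id, current[doc_id])
        self._seen = dict(current)
        return len(changed)
```

In this sketch an indexer would subscribe once and then receive only the
documents that changed between polls; deletions are not reported.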
Mr. Gamiel mentioned Z39.50's Query Type concept
as an example of one area where we need not "reinvent the wheel"; that
is, it is desirable to take the RPN (Reverse Polish Notation) query and
any other successful concepts and apply them in GridIR.
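For readers unfamiliar with RPN queries, the idea can be sketched in
Python as a stack evaluation of a postfix Boolean query. This is a
simplified illustration and not the Z39.50 wire encoding: the function
name evaluate_rpn, the token list format, and the toy inverted index are
assumptions. Only the operator names AND, OR, and AND-NOT follow Z39.50
usage.

```python
from __future__ import annotations


def evaluate_rpn(tokens: list[str], index: dict[str, set[int]]) -> set[int]:
    """Evaluate a postfix (RPN) Boolean query against an inverted index.

    Operands are search terms; operators combine the two most recent results.
    """
    operators = {
        "AND": lambda left, right: left & right,
        "OR": lambda left, right: left | right,
        "AND-NOT": lambda left, right: left - right,
    }
    stack: list[set[int]] = []
    for token in tokens:
        if token in operators:
            if len(stack) < 2:
                raise ValueError(f"operator {token!r} needs two operands")
            right = stack.pop()
            left = stack.pop()
            stack.append(operators[token](left, right))
        else:
            # A term: push the set of documents containing it (empty if unknown).
            stack.append(set(index.get(token, ())))
    if len(stack) != 1:
        raise ValueError("malformed query: operands left over")
    return stack[0]
```

With a toy index mapping terms to document numbers, the token list
["grid", "retrieval", "AND", "wais", "AND-NOT"] reads as
(grid AND retrieval) AND-NOT wais.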
A question was raised about query expansion
and transformation [in the context of the QueryProcessing PortType].
Dr. Newby responded
that all the "smarts" needed for distributed, asynchronous
searching will be in the QueryProcessing PortType.
A participant
asked where the API will connect to the architecture. Mr. Gamiel
replied that there would be three APIs, one for each
PortType. Dr. Newby added that the IR and QueryProcessing PortTypes
would have some methods in common. Asked whether
the QueryProcessing PortType would be a centralized service, Mr.
Gamiel said that it would not: the client can go directly to the
Indexing/Searching component (IR PortType).
A participant asked what the life cycle of the PortTypes
would be, and whether they would be short-lived. Mr. Gamiel responded
that the proposed architecture would not place limits on the life
cycle, and that it would depend on what one was trying to accomplish.
Following
more discussion, Dr. Newby mentioned that the CollectionManager
PortType might be useful on its own for resource discovery.
The
question was raised again as to why the CollectionManager PortType
is not simply "another GridIR." There were a few responses.
Dr. Newby mentioned that one might use one or more CollectionManager
PortType instances to harvest data.
Some participants asked about
usage scenarios. Dr. Newby responded by saying that most of the
work on GridIR has been done by IR system
developers (such as the presenters), which has resulted in our taking
a somewhat bottom-up approach. Mr. Nassar requested that participants
contribute a wide range of usage scenarios.
Asked whether
users would be able to create their own index, Dr. Newby replied
yes, but qualified this by saying that
indexing is the most heavyweight part of the process.
There was
an extended general discussion about CollectionManager transformations
and the relationship of the CollectionManager to
the Indexing/Searching component (and communication between the
two). This resulted in varied suggestions from the participants,
such as that the inheritance hierarchy diagram should perhaps be
eliminated and that the group should develop very simple, abstract
use cases. One participant proposed that the architecture should
be driven top-down from "first principles." Some
of these questions were not resolved by the official end of the
meeting, although ad hoc discussion continued afterward.
Mr. Gamiel requested that all interested parties join the mailing
list and contribute use cases.