GIR

Minutes

GGF6 (BOF), October 14-17, 2002, Chicago, IL, USA

GRIDIR BOF2
GGF Chicago
October 16, 2002, 3:30 p.m. to 5:00 p.m.

Abstract: The meeting consisted of a presentation prepared by Dr. Gregory Newby of UNC-Chapel Hill, Mr. Kevin Gamiel of MCNC/CNIDR, and Mr. Nassib Nassar of UNC-Chapel Hill, followed by general discussion of a variety of topics related to the GridIR proposal.

Discussants:
Mr. Gamiel
Dr. Newby

Mr. Gamiel began by discussing the meeting agenda and BOF/WG status overview.

Dr. Newby discussed the requirements document outline.

A question was raised about the definition of "collection." Dr. Newby defined it as "a logical grouping of documents." The questioner asked whether a collection could itself be a grid. Mr. Gamiel pointed out that the results of a query could become a collection.

Dr. Newby discussed the meaning of information retrieval and GridIR.

Mr. Gamiel asked how many of the participants were familiar with OGSA. A majority indicated that they were. Mr. Gamiel stated that GridIR assumes OGSA functionality, i.e. that it will be specified as an OGSA service. In response to this, Mr. Gamiel was asked what he meant by "OGSA service." He replied with a brief description of the GridIR system and explained that it would be composed of multiple PortTypes.

A participant noted the existence of MDS (the Globus Monitoring and Discovery Service). Mr. Gamiel responded that there may be overlap with GridIR, but that GridIR would be a collection of tools that could be used to build a resource discovery system. He added that the proposed GridIR working group would learn from previous relevant architectures.

Mr. Gamiel discussed the architecture document.

A questioner asked whether queries would be persistent. Dr. Newby replied that the user client should be able to specify persistence.

Mr. Gamiel discussed the architecture diagram. In response to a question, Dr. Newby stated that there had been a lot of debate about the core components, and that the co-chairs want to see [further] scrutiny and debate on the architecture.

Mr. Gamiel discussed the specification document outline and specification diagram. He stated that the "explain" functionality, a set of methods for requesting metadata, is located in the base class of the inheritance tree. He also pointed out that the IR PortType and QueryProcessing PortType should look identical to each other from the point of view of a search client. [Ed.: In other words, the searching functionality is located in the ancestor of the two PortTypes.]
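[Ed.: The inheritance arrangement described above can be sketched as follows. This is a minimal illustration only, not the GridIR specification: the class names BasePortType and SearchablePortType, the method signatures, and the substring matching are all assumptions made for the example. It shows "explain" living in the base class and a search client treating the IR and QueryProcessing PortTypes identically.]

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class BasePortType(ABC):
    """Hypothetical root of the tree: every PortType can describe itself."""

    def explain(self) -> dict:
        # Assumed shape of the metadata returned by "explain".
        return {"port_type": type(self).__name__}


class SearchablePortType(BasePortType):
    """Hypothetical common ancestor that holds the search interface."""

    @abstractmethod
    def search(self, query: str) -> list[str]:
        """Return the identifiers of matching documents."""


class IRPortType(SearchablePortType):
    """Indexing/Searching component over a single collection."""

    def __init__(self, documents: dict[str, str]) -> None:
        self.documents = documents

    def search(self, query: str) -> list[str]:
        # Toy matching: case-insensitive substring test.
        term = query.lower()
        return sorted(
            doc_id
            for doc_id, text in self.documents.items()
            if term in text.lower()
        )


class QueryProcessingPortType(SearchablePortType):
    """Forwards a query to several searchable services and merges results."""

    def __init__(self, backends: list[SearchablePortType]) -> None:
        self.backends = backends

    def search(self, query: str) -> list[str]:
        merged: set[str] = set()
        for backend in self.backends:
            merged.update(backend.search(query))
        return sorted(merged)


def run_client(service: SearchablePortType, query: str) -> list[str]:
    # The client depends only on the shared ancestor, so it cannot tell
    # whether it is talking to an IR or a QueryProcessing PortType.
    return service.search(query)
```

[Ed.: In this sketch a client may call run_client on an IRPortType directly or on a QueryProcessingPortType that wraps several of them; the calling code is the same in both cases.]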

A question was asked whether anyone at all would be able to create a collection. Mr. Gamiel responded that OGSA takes care of policy questions such as access and instantiation, and that it supports different levels of administrators. Dr. Newby added that each web server could, for example, publish its own data, as opposed to relying on a monolithic crawler. Therefore, much of the administration can be distributed.

A participant stated that systems such as WAIS failed because people could not agree on the indexing format. Mr. Gamiel replied that GridIR aims to standardize query and content structures (i.e. containers and methods), but not particular indexing methods. He added that, using the event model, it will be possible to generate "push" events from various sources that do not traditionally support "push" indexing.
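[Ed.: The minutes do not record the details of the event model. As a rough sketch of the general idea only, a source can publish a "document changed" event and an indexer can subscribe to it, rather than the indexer crawling the source. The names EventChannel and PushIndexer, and the whitespace tokenization, are hypothetical and are not taken from the GridIR documents.]

```python
from collections import defaultdict


class EventChannel:
    """Hypothetical channel on which a source announces new documents."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        # callback is called as callback(doc_id, text) on every event.
        self._subscribers.append(callback)

    def publish(self, doc_id, text):
        for callback in self._subscribers:
            callback(doc_id, text)


class PushIndexer:
    """Toy indexer that updates its postings when an event arrives."""

    def __init__(self):
        self.postings = defaultdict(set)

    def on_document(self, doc_id, text):
        # Assumed tokenization: lowercase, split on whitespace.
        for term in text.lower().split():
            self.postings[term].add(doc_id)


channel = EventChannel()
indexer = PushIndexer()
channel.subscribe(indexer.on_document)

# The source pushes a document; no crawler is involved.
channel.publish("doc-1", "Grid information retrieval")
```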

Mr. Gamiel mentioned Z39.50's Query Type concept as an example of one area where we need not "reinvent the wheel." That is, it is desirable to take the RPN query and any other successful concepts and apply them in GridIR.
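[Ed.: For readers unfamiliar with RPN (Reverse Polish Notation) queries, the sketch below evaluates a Boolean query written in postfix order against a toy inverted index. It illustrates only the general stack-based idea; it is not the Z39.50 encoding, and the function name evaluate_rpn, the token spelling, and the index layout are assumptions made for the example.]

```python
OPERATORS = ("AND", "OR", "AND-NOT")


def evaluate_rpn(tokens, index):
    """Evaluate a postfix Boolean query; return the set of matching doc ids.

    tokens: list of search terms and operators in postfix order.
    index:  dict mapping a term to the set of document ids containing it.
    """
    stack = []
    for token in tokens:
        if token in OPERATORS:
            if len(stack) < 2:
                raise ValueError("operator is missing an operand")
            right = stack.pop()
            left = stack.pop()
            if token == "AND":
                stack.append(left & right)
            elif token == "OR":
                stack.append(left | right)
            else:  # AND-NOT
                stack.append(left - right)
        else:
            # A term: push its posting set (empty if the term is unknown).
            stack.append(set(index.get(token, set())))
    if len(stack) != 1:
        raise ValueError("malformed RPN query")
    return stack.pop()


toy_index = {
    "grid": {1, 2, 3},
    "retrieval": {2, 3},
    "wais": {3},
}

# (grid AND retrieval) AND-NOT wais, written in postfix order.
result = evaluate_rpn(["grid", "retrieval", "AND", "wais", "AND-NOT"], toy_index)
```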

A question was raised about query expansion and transformation [in the context of the QueryProcessing PortType]. Dr. Newby responded that all the "smarts" needed for distributed, asynchronous searching will be in the QueryProcessing PortType.

A participant asked where the API will connect to the architecture. Mr. Gamiel replied that there would be three APIs, one for each PortType. Dr. Newby added that the IR and QueryProcessing PortTypes would have some methods in common. Asked whether the QueryProcessing PortType would be a centralized service, Mr. Gamiel said that it would not: the client can go directly to the Indexing/Searching component (IR PortType).

A participant asked what the life cycle of the PortTypes would be, and whether they would be short-lived. Mr. Gamiel responded that the proposed architecture would not place limits on the life cycle, and that it would depend on what one was trying to accomplish.

Following more discussion, Dr. Newby mentioned that the CollectionManager PortType might be useful on its own for resource discovery.

The question was raised again of why the CollectionManager PortType is not simply "another GridIR." There were a few responses. Dr. Newby mentioned that one might use one or more CollectionManager PortType instances to harvest data.

Some participants asked about usage scenarios. Dr. Newby responded by saying that most of the work on GridIR has been done by IR system developers (such as the presenters), which has resulted in our taking a somewhat bottom-up approach. Mr. Nassar requested that participants contribute a wide range of usage scenarios.

A question was asked about whether users would be able to create their own index. Dr. Newby replied that they would, but qualified this by saying that indexing is the most heavyweight part of the process.

There was an extended general discussion about CollectionManager transformations and the relationship of the CollectionManager to the Indexing/Searching component (and communication between the two). This resulted in varied suggestions from the participants, such as that the inheritance hierarchy diagram should perhaps be eliminated and that the group should develop very simple, abstract use cases. One participant proposed that the architecture should be driven top-down from "first principles." Some of these questions were not resolved by the official end of the meeting, although ad hoc discussion continued afterward. Mr. Gamiel requested that all interested parties join the mailing list and contribute use cases.
