GGF8, June 24-27, 2003, Seattle, WA, USA
GIR Working Group
June 26, 2003
GGF8, Seattle
Presenters:
Kevin Gamiel (MCNC/CNIDR)
Gregory Newby (Arctic Region Supercomputing Center)
Nassib Nassar (Etymon)
Minutes:
Sousan Karimi (MCNC/CNIDR)
Dr. Gregory Newby started the GIR-WG session
by introducing himself, Kevin Gamiel, Nassib Nassar and Sousan
Karimi. He talked about the GGF Intelectual Property policy, and
mentioned that it can be found on the GGF web site. He talked about
the meeting's agenda, the status of the requirements document, and
invited the attendees to the GridIR demo. Dr. Newby asked the participants
for their consensus to announce the last call for the requirements
document, and also their input for the GridIR architecture document.
Mr. Gamiel clarified that there are two more weeks for comments
during last call on email list.
Dr. Newby
next talked about background of IR and the importance of GridIR.
He explained that through setting up a Virtual Organization (VO)
those organizations that didn't want to make their information available
through the web, can now share them with those in the VO using GridIR.
Question:
Does GridIR work better with static or dynamic information?
Answer:
Dr. Newby: It works for both but because of Grid infrastructure
it is more suitable for dynamic information.
Question:
There were discussions in Chicago about the reason of having
3 services (Collection Manager, Indexer and Query Processor),
so clarification was requested.
Answer:
Mr. Gamiel: The requirements doesn't impose an architecture of
the three services per se, but the intent is to provide the functionality
that is required, therefore in the current architecture we have
chosen to use these three services.
At this time Dr. Newby suggested that the audience
read the posted requirements document to provide input. He also
mentioned that the document can be found on GridIR website and also
that it will be available using GridForge soon. There wasn't much
response for last call on requirements document by the attendees,
but no objections on announcing last call on email list.
Dr. Newby explained that since
GGF7 minor changes have been made to the document for language clarification
and enhancement has been made to the security description. He said
we mention security
since we
think in some area security is not addressed we need to identify
those areas. Basic concern is that the desirable function of GridIR
is document sharing with ability to control the level of sharing.
Service level and user level is provided by OGSA and we can have
it for free, but we want to have access level and in some cases
even document level control, which we don't think it will come from
GGF.
Question:
Do you have any thing below the document level in mind?
Answer:
In our requirement document we stopped at the document level.
Question:
Our concern is mostly dealing with patient data that have lots of
security and privacy issues, I am not sure if the document level
control can address this.
Answer:
It depends on how you define your document, plus we are trying to
use filters to solve some of these problems, however, the problem
of security and its level is a hard problem and that may be a task
for another working group.
At this time Mr. Nassar said "if you
look at XML and the way subdocument is defined, it is very general
and is a moving target." Ms. Ramakrishnan at this time explained
that these issues are being discussed in the Security Working group.
Mr. Newby continued his
presentation by giving a quick over view of the architecture,
he said there are four main components to GridIR architecture. First
is the Collection Manager, which is like a gatekeeper and it does
whatever transformation is necessary to be used in VOs. Then he
moved on explaining the Indexer and the query processor. He stated
that the query processing can be one by an Indexer since Indexer
can respond to a query, but what Query Processor can do is more
than just search and getting the result, it can send one query to
different indexers, merge the results, rank them and so on. He stated
that another important job of the Query processor is to deal with
persistent queries, which you will see in the demo this afternoon.
Then he talked about the last component, the Security Service, and
explained that it requires a fine level of control over data
and methods.
Question: Is there sub-document level acl?
Answer by Dr. Newby: not sure what level we should stop at.
Mr. Nassar offered: it is a moving target
Ms. Ramakrishnan: just a change in policy representation and enforcement
points
At this time, Mr. Nassar took over and went
over the current architecture of GridIR. He said the presentation
was put together by Mr. Jeremiah Morris and some may have already
seen it in Tokyo session. (You can find this presentation at http://www.gridir.org/publications.html.)
Mr. Nassar stated that Mr. Gamiel's group has been
implementing these components, and of course it is a work in progress.
He gave an overview of the architecture and talked about the different
components of this architecture. He said common to the three components
is the notification process. He said the VO could have one or more
of these components on a machine. Talked about Collection Manager
and said that when it is created, it is configured to retrieve data
from local or remote points, it notifies indexer if any of the data
has changed. Mr. Gamiel offered an explanation about notification
and stated that we want to support both push and pull methods, for
example crawls as well as email list. He said it could be tightly
coupled with databases that know when data has changed, then standing
queries need to be processed, Query processor re-executes queries
when it is notified. Query Processor combines all of the data from
indexer and send it to the client/user. Then he talked about the
importance of relevance for IR, and stated that of course it is
a fuzzy notion and databases don't usually have fuzzy notions. Ranking
based on relevance don't have a fixed way of representing the response
set. He continued with saying that queries on different search engines
give different results
since there are different documents and different ways to compute
relevance and overlap with DAIS when we formulate queries or specifications.
Question:
GridIR assumes these are dumb sources, databases are smart sources,
so architecture should allow for these?
Answere:
Dr. Newby: we are accounting for that to some extent, but maybe
we need to look into it more.
Mr. Nassar: relational database are kind of out of scope.
Mr. Gamiel: DAIS may be focussing on this issue. However, both query
processor and indexer have the same interface, so user could query
it, and chining of query processors is possible. We see the Query
Processor as a more sophisticated service as it is now, and for
it to handle the merge and other functions.
Question:
Is ranking and ordering of results part of the model? There may
be a situation where user may want to change the relevance.
Answer:
We are trying to give you a model where you can solve such problems.
Query Processor can be implemented in such a way.
Question:
Do you have an API?
Answer:
We need to come up with that in the Specification document. We
will say what all you can do, but not necessarily all of it needs
to be supported. We are trying to grid enable the information retrieval
system.
Dr. Newby: Effective merging of results is the
least solved problem right now. GridIR will lay foundation for more
research on this.
Mr. Nassar: The reason we separated Indexer from Query Prosessor
is to be able to solve these problems.
Dr. Newby: We are very excited about Query Processor and the
chining process for example creating a set from a result of
a query, and then be able to query that.
Mr. Gamiel gave an example
of how a user can construct a query and submit it and come back
after a week and get the updated results. Client can come and go
as they want, this is all secure, the user can also create Query
Processor and Indexer he/she is authorized based on OGSA model of
factories, instances, etc ... User can dynamically create informational
retrieval systems or part of it.
Next Mr. Gamiel requested
that everyone should subscribe to the mailing list and explained
that if they were on the old mailing list hosted by MCNC now they
should switch to the one offered through GGF. He asked the attendees
to send their comments on the Requirement Document and the Architecture
document even though it is a rough draft.
Then Dr. Newby
presented the conclusion slides and talked about the fact that
how security and logging should be handled has not yet been defined,
and that we don't want to overload the requirements. He said we
would like to have other reference retrieval or parts of it, and
asked the attendees to contact the Chairs or others to contribute
to the reference implementation of GridIR.
Next Dr. Newby talked about plans
for GGF9 he said by then requirements will be out of the door,
and we possibly will have final architecture document and the first
draft of specification document. He then request for volunteers
for Co-ordination with PA, DAIS and FTP. He invited every one to
attend the demo at 2 pm in Aspen room.
|