GIR-WG Charter
Administration
Name
Grid Information Retrieval (GIR or GridIR)
Chairs
Greg Newby, PhD
Arctic Region Supercomputing Center
gbnewby at arsc.edu
Nassib Nassar
Etymon Systems, Inc.
nassar at etymon.com
Yangwoo Paul Kim, PhD
Dongguk University
ywkim at dongguk.edu
Secretary
Sousan Karimi
MCNC
sousan at mcnc.org
Mailing List
Subscription details and user interface are available at http://www.gir-wg.org/wg_list.html.
Description and Objectives
Purpose
The GridIR WG will focus on establishing the requirements, specifications, reference implementations and best practices for supporting Information Retrieval (IR) services on the Grid. Grid IR services will be needed by users, applications and portals to provide documents, document extracts, answers or other data items to satisfy information needs.
Goals
The GridIR WG will focus on the following:
1. Establish the requirements for Grid IR services:
GridIR will be defined as a set of grid services which, together, constitute a complete IR system, including:
- Harvesters, to gather network-based documents
- Indexers, to build data- and file-structures for retrieval
- Index processors, to determine post-indexing term and document weights
- Query processors, to take user queries and gather results
- Integrators, for ranking results from different sources
- Renderers, to take results and organize or present them
- Many other sub-systems and control systems
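As a rough illustration of how the components listed above might fit together, the following is a minimal single-process sketch, not a GridIR specification. All function names, the in-memory `sources` dictionary, and the TF-IDF weighting are assumptions made for this example; integrators and control systems are not covered.

```python
import math
from collections import Counter, defaultdict

def harvest(sources):
    """Harvester: gather (doc_id, text) pairs; here the 'network' is a dict."""
    return list(sources.items())

def build_index(docs):
    """Indexer: build an inverted index mapping term -> {doc_id: term count}."""
    index = defaultdict(dict)
    for doc_id, text in docs:
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index

def weight_index(index, n_docs):
    """Index processor: turn raw counts into TF-IDF weights after indexing."""
    return {
        term: {d: tf * math.log(1 + n_docs / len(postings))
               for d, tf in postings.items()}
        for term, postings in index.items()
    }

def run_query(weighted, query):
    """Query processor: score documents by summing the weights of query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        for doc_id, weight in weighted.get(term, {}).items():
            scores[doc_id] += weight
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

def render(results):
    """Renderer: present ranked results as plain text lines."""
    return [f"{rank}. {doc_id} ({score:.3f})"
            for rank, (doc_id, score) in enumerate(results, 1)]

# Hypothetical documents standing in for network-based sources.
sources = {
    "doc-a": "grid services for information retrieval",
    "doc-b": "grid security infrastructure",
    "doc-c": "retrieval of multimedia documents",
}
docs = harvest(sources)
weighted = weight_index(build_index(docs), len(docs))
results = run_query(weighted, "grid retrieval")
lines = render(results)
```

In a Grid setting each of these functions would presumably be a separate service with its own interface; the sketch only shows the data handed from one stage to the next.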
GridIR will also need to impose requirements on the IR services that are specific to the Grid, including:
- Rapid update schedules for datasets
- Federation of datasets from multiple sources
- Enabling local policy for dataset content access, based on Grid security infrastructure
- Sophisticated localized indexing and query processing appropriate for each dataset
- Sophisticated post-hoc results ranking
- Efficient use of computational resources (e.g., multiple harvesters feeding one indexer)
- Multimedia capabilities (incorporation of special-purpose IR systems into one meta-system)
- Rapid rendering and context-switching, including data visualization of results and multiple 'views' of data based on different user profiles
- Consensus-based results generation from multiple retrieval algorithms to select best-of-breed algorithms
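The "multiple harvesters feeding one indexer" arrangement mentioned in the list above can be sketched with a shared queue. This is only an illustration under stated assumptions: the harvesters are local threads rather than grid services, and the names `harvester`, `indexer`, and `feeds` are hypothetical.

```python
import queue
import threading
from collections import defaultdict

def harvester(name, documents, out):
    """Hypothetical harvester: push (doc_id, text) pairs onto the shared queue."""
    for doc_id, text in documents.items():
        out.put((f"{name}/{doc_id}", text))
    out.put(None)  # sentinel: this harvester is finished

def indexer(inbox, n_harvesters):
    """Single indexer: consume until every harvester has sent its sentinel."""
    index = defaultdict(set)
    finished = 0
    while finished < n_harvesters:
        item = inbox.get()
        if item is None:
            finished += 1
            continue
        doc_id, text = item
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Two hypothetical sources, each handled by its own harvester thread.
feeds = {
    "site1": {"a": "grid services", "b": "grid security"},
    "site2": {"a": "retrieval services"},
}
inbox = queue.Queue()
threads = [threading.Thread(target=harvester, args=(name, docs, inbox))
           for name, docs in feeds.items()]
for t in threads:
    t.start()
index = indexer(inbox, len(threads))
for t in threads:
    t.join()
```

The resulting index does not depend on the order in which the harvesters deliver documents, which is one reason a queue is a convenient hand-off point between independent modules.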
2. Define a set of GridIR specifications:
The Open Grid Services Architecture (OGSA), along with technologies such as the Web Services Flow Language (WSFL), provides a framework for linking loosely coupled grid services together to form more advanced services. Though these technologies provide the infrastructure, each service description must be created by stakeholder communities to ensure the required functionality.
The GridIR WG will develop an overarching IR architecture, detail service-level requirements, establish independent service models, and develop interface specifications for the various independent IR-related services, all with an eye towards tying those services together into an integrated whole.
The WG will work to develop a plug-and-play architecture for GridIR in which the Grid infrastructure enables rapid integration of standards-compliant IR modules. In many cases, GridIR will allow communication between modules (e.g., for multiple harvesters feeding an indexer). Anticipated individual services include crawlers, indexers, and search and presentation engines.
3. Support and Evaluate GridIR Reference Implementations
There are numerous areas of investigation for reference implementations of the GridIR specifications. The reference implementations will address many of the following IR considerations:
- Extremely large collections (billions of documents)
- Documents in plain text, HTML, XML
- Multimedia documents (video, audio, other non-text formats)
- Documents in multiple languages; queries in multiple languages
- All varieties of harvesting methods
- Numerous fundamental IR algorithm components (Boolean, Vector Space Model, probabilistic IR, PageRank, Latent Semantic Indexing, ...)
- Flexible local policy for what documents are allowed
- Sub-document retrieval, linguistic approaches, question answering
- Long and short queries; document filtering
Solutions for most of the IR techniques are available, although some do not scale well or are less amenable to the distributed processing of the Grid.
GridIR will benefit from past experiences in networked IR. For example, Z39.50 offers the ability to send a query to multiple IR engines. GridIR will take Z39.50 further by layering IR on the Grid security and authentication infrastructure, and by providing sophisticated techniques for merging and ranking the results from the engines.
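To make the idea of merging and ranking results from several engines concrete, here is a minimal sketch. It is not drawn from Z39.50 or from any GridIR specification: it assumes each engine is a plain function returning raw scores, and it swaps in a common, simple fusion method (min-max normalization followed by summing scores, often called CombSUM). The engines `engine_a` and `engine_b` and their scores are hypothetical, and security and authentication are not modeled.

```python
def normalize(results):
    """Min-max normalize one engine's scores into [0, 1]."""
    if not results:
        return {}
    lo, hi = min(results.values()), max(results.values())
    if hi == lo:
        return {doc: 1.0 for doc in results}
    return {doc: (score - lo) / (hi - lo) for doc, score in results.items()}

def federated_search(query, engines):
    """Send one query to every engine and merge by summed normalized score."""
    merged = {}
    for engine in engines:
        for doc, score in normalize(engine(query)).items():
            merged[doc] = merged.get(doc, 0.0) + score
    # Highest combined score first; ties are broken by document id.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))

# Two hypothetical engines whose raw scores are on incomparable scales.
def engine_a(query):
    return {"doc1": 12.0, "doc2": 8.0, "doc3": 4.0}

def engine_b(query):
    return {"doc2": 0.9, "doc4": 0.5, "doc3": 0.1}

ranking = federated_search("grid retrieval", [engine_a, engine_b])
```

Normalizing before summing keeps an engine with large raw scores from dominating the merged list; documents returned by more than one engine are rewarded, which is a design choice rather than a requirement.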
To support the evaluation of the reference implementations, the GridIR WG will promote the development of test suites that can be used to validate an implementation and provide a basis for comparing implementations.
4. Establish Best Practices for GridIR
The GridIR WG will establish best practices for GridIR implementations and use by collecting and disseminating experiences. Furthermore, the GridIR WG will ensure that the best practices are consistent with the other Grid Services groups, which define services that will be needed to implement GridIR, and with the Portals, Users, and Applications groups that will use the GridIR services.
Milestones
- GridIR Requirements Document: Stakeholder-driven list of service-level requirements for building a grid-based IR system. Revised draft by GGF7, finalize by GGF8.
- GridIR Architecture Document: Describes the overall system composed of integrated grid services, scenarios, etc. First draft by GGF7, revisions by GGF8, finalize by GGF9.
- GridIR Services Document: Describes each service in detail, with an emphasis on WSDL interface specification. First draft by GGF7, revisions by GGF8, finalize by GGF9.
Website
http://www.gir-wg.org/