Summary of Recent Discussions about an Application Programming Interface for RDF

Author : Peter Hannappel, Mat.Nr. 205423
Institute : University of Essen, Germany
Chair : SWT
Chairman : Prof. Gustaf Neumann
Supervisor : Reinhold Klapsing (contact)

Table of Contents

1) Motivation
2) Related Work
    2.1) Mozilla
    2.2) GINF
    2.3) The RADIX proposal
3) W3C RDF Interest Group RDF API Proposal
    3.1) Points of interest
    3.2) The RDF API Core interfaces
    3.3) Using the RDF API Core interfaces - A short Example
4) An implementation of the RDF API Proposal
5) Future Work
6) References

1) Motivation

RDF (Resource Description Framework) is a standard for the notation of structured information, recommended by the W3 Consortium[1,2].
The main concept of RDF is the model, which contains information about a set of resources representing objects on the web or in the real world.
Constants (string values) may be included; they are called literals.
Relationships between resources and literals are expressed by statements, which connect a subject with an object via a predicate (e.g. the statement with this paper as subject, "author" as predicate and "Peter Hannappel" as object expresses that the author of this paper is Peter Hannappel).
An RDF schema is used to define the set of resources that may be used by a model, including constraints for resource and literal values.

RDF models are serialized and parsed to enable easy storage and transportation as a file or stream. The format most commonly used is XML, although others exist (e.g. a stream of statements). Utilities for parsing and serializing RDF have been developed; an example is the SiRPAC parser[3], available on the W3C web site.

However, no standard Application Programming Interface (API) for RDF is available. Applications that design or manipulate RDF models have to rely on self-made solutions.

Although it is possible to manipulate RDF models in XML format by using an API standard for XML (e.g. DOM[4]), this approach is not favorable, as XML is just one way to represent RDF (see above).

This lack of a standard API for RDF has led to isolated solutions (compare Chapter 2), which are often designed to fit a specific purpose rather than to cover all important aspects of RDF modelling. Cooperation between these APIs is difficult at best, as their differences make compatibility questionable.

This paper introduces a proposal for an RDF API standard made by the RDF interest community, which discussed the topic in a newsgroup hosted by the W3C.
Chapter 2 is dedicated to previous approaches to an RDF API. Chapter 3 describes the development of the API standard and presents its current status. In Chapter 4 an example implementation of the standard is introduced. The last chapter describes future work that may be done on the API.


2) Related Work

Chapter 2 is dedicated to previous approaches to an RDF API. Mozilla and GINF are applications which use RDF for data storage, while RADIX is an RDF API standard proposal made in an RDF development newsgroup.

2.1) Mozilla

Mozilla's browser software uses RDF to store browser-related data. For automated access to the RDF data, an implementation for RDF was developed[5].
RDF data is stored in the form of three sets (resources, literals and statements). Each resource can be mapped to a URI. This mapping is 1-to-1, so it can be performed both ways. To accomplish this mapping, noname resources are assigned a URI automatically.
Likewise, literals are mapped 1-to-1 to the corresponding string values.
The statements are organized in subsets, so-called datasources. A datasource contains the statements about a specific topic (e.g. the bookmark datasource). A datasource is therefore comparable to an RDF model.

The RDF implementation of Mozilla is defined by a set of interfaces, as follows :

nsIRDFService

This interface provides methods for

nsIRDFNode

The abstract interface is used to manage a resource or literal.
Derived from nsIRDFNode are the interfaces

nsIRDFDataSource

The interface is used to manage a single datasource (i.e. a model).

nsIRDFCompositeDataSource

With this interface, a set of datasources may be combined (union of the datasources). The composite datasource contains the information of all datasources combined in it.

nsIRDFObserver

The interface provides methods that make it possible to "observe" a datasource for changes and to trigger reactions to them ("active RDF").
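
The observer idea can be illustrated with a minimal sketch. It is not Mozilla's actual interface: the class ObservableDataSource, its method names and the callback signature are assumptions chosen for illustration only.

```python
class ObservableDataSource:
    """Hypothetical stand-in for a datasource; not Mozilla's actual interface."""

    def __init__(self):
        self.statements = set()   # (subject, predicate, object) tuples
        self.observers = []       # callbacks to notify on every change

    def add_observer(self, callback):
        self.observers.append(callback)

    def assert_statement(self, statement):
        # Only notify when the datasource really changes.
        if statement not in self.statements:
            self.statements.add(statement)
            self._notify("assert", statement)

    def unassert_statement(self, statement):
        if statement in self.statements:
            self.statements.remove(statement)
            self._notify("unassert", statement)

    def _notify(self, change, statement):
        for callback in self.observers:
            callback(change, statement)


# Usage: record every change made to a (made-up) bookmark datasource.
log = []
bookmarks = ObservableDataSource()
bookmarks.add_observer(lambda change, statement: log.append((change, statement)))
bookmarks.assert_statement(("bookmark:1", "name", "W3C"))
bookmarks.unassert_statement(("bookmark:1", "name", "W3C"))
```

In this sketch an observer is simply a callable; a reaction to a change ("active RDF") would be placed inside that callable.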

nsIRDFContainer
nsIRDFContainerUtils

These interfaces enable handling of RDF container objects (bag, sequence, alternative).


2.2) GINF

The "Generic Interoperability Framework" (GINF)[6] is a project of Sergej Melnik from Stanford University.
GINF is a tool for the integration of heterogeneous components.
It uses RDF for generic representation. GINF uses a modified SiRPAC[3] parser for parsing and serializing RDF.
To access and manipulate RDF data, an implementation was developed. The core interface of this implementation is the Model interface. GINF makes use of this Model interface to provide a basis for more sophisticated functionality.
For example, a "schema model" was developed, which makes use of RDF schema information (e.g. subclasses) to extend querying functionality (for example, a query for all statements with a predicate of a specific class may be made). For more information on this schema model and its functionality, see [7].

2.3) The RADIX proposal

RADIX is a proposal for an RDF API issued by Ron Daniel[8]. The name RADIX is a blend of "RDF", "API" and "XML".
Ron Daniel issued a collection of requirements that an API for RDF "must", "should" and "may" fulfill. These requirements are summarized below.

Must :

May :

Should (for implementations of the RADIX API) :

Following is a summary of the interfaces proposed in the message :

Resource

The interface supports a single resource. It contains methods for manipulating / querying for the URI. Of interest at this point is the lack of an interface for a literal, which is treated as a standard data type (e.g. string, integer). Data of literals is stored in the Statement using the literal.

Statement (subclass of Resource)

The Statement interface supports an RDF statement, with methods for changing / getting the subject, predicate and object of the statement. Because Statement is a subclass of Resource, each statement has a URI (enabling easy creation of reified statements), which may be set at the creation of the statement or at a later time (via the method inherited from Resource).

Model (subclass of Resource)

This interface is used for support of an RDF Model. The Model interface provides the following functionalities :

ModelStore (subclass of Model)

The ModelStore Interface is used for storage of models. It is a collection of several models. Methods exist for adding / removing models from the store.

Query (subclass of Statement)

The interface is used for the construction of queries on models. It has no methods other than those inherited from Statement. A query on a model is made by creating a statement (via the Query interface) and calling the appropriate method of the Model interface. The query returns the statements which "match" the Query statement; null values in the Query statement match anything (compare the find method in Section 3.2 of this paper).


3) W3C RDF Interest Group RDF API Proposal

Comment : This chapter refers to contributions to the W3C RDF interest group. References to single messages are noted {Author N#} for November postings and {Author D#} for December postings, where # is the number of the message. The base references to the November/December discussion groups are listed in the References section [9,10].

The development of a standardized RDF API has been discussed since the summer of 1999. Chapter 3.1 below is dedicated to topics that were discussed in November/December 1999, when the RDF API discussion became concrete.
In Chapter 3.2 the API's central interfaces are introduced. Chapter 3.3 contains an example of how to use these interfaces.
Chapter 4 is dedicated to an implementation of the API interfaces by Sergej Melnik (available as package[11]).

Serious discussion about the API arose around November 14th, when a structure for an RDF API based on the existing API used by GINF was proposed{Melnik N44}.
A second proposal for the RDF API{Melnik D5}, based on the first and the discussions thereafter, was made on December 3rd. Melnik's implementation is based on the December proposal.


3.1) Points of interest

Following is a sample of discussion topics that were of particular interest for the rdf-interest community :

Standalone resources

There was a proposal to provide add/remove methods not only for triples, but also for inserting / removing single nodes{Saarela N79} (as in the RADIX proposal, compare Section 2.3). One argument was the possible problems in calculating the intersection / difference of two models (what is triple {A,B,C} minus triple {A,D,E}: {A} or {A,B,C}?).
However, an argument against stand-alone resources is the question whether resources that have an identifier but provide no additional information make sense in the context of an RDF model.
The discussion about this topic was closed without changes to the API, i.e. the current API version does not support handling of stand-alone resources. Difference and intersection of models are computed on the basis of triples, not resources (i.e. {A,B,C} minus triple {A,D,E} is {A,B,C}).
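
The triple-based set operations can be sketched as follows. This is only an illustration: statements are modelled as plain (subject, predicate, object) tuples and a model as a set of them, and the function names are assumptions, not the API's method names.

```python
# A model is represented as a set of (subject, predicate, object) tuples.
model_1 = {("A", "B", "C")}
model_2 = {("A", "D", "E")}


def difference(left, right):
    """Statements of `left` that do not occur as whole triples in `right`."""
    return left - right


def intersection(left, right):
    """Statements that occur as whole triples in both models."""
    return left & right


def union(left, right):
    """Statements that occur in at least one of the models."""
    return left | right


remaining = difference(model_1, model_2)   # the triple {A,B,C} is kept as a whole
shared = intersection(model_1, model_2)    # empty: the shared node A alone does not count
```

Because whole triples are compared, the shared resource A does not influence the result.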

Models as resources

In the first API proposal it was not stated clearly whether Models are treated as Resources (i.e. have a URI). However, members of the discussion group made arguments that could hardly be ignored{Beged-Dov N53}{Daniel N119}:

A model should be a resource / have a URI because

The current API was designed so that the class Model inherits from the class Resource (compare Picture 1). Accordingly, a model is treated as a resource and is assigned a URI (for the creation of Model URIs see below).

Versioning of models

A message in early December{Liljegren D12} pointed out the need for versioning of RDF models. According to this message, versioning is essential

The version number could be part of the URI.
Additionally, the message pointed out a problem with versioning: when should a new version be created? If a new version were created on every change, even small changes to the model that normally should not trigger a version change would do so.

Related to the versioning of models is the concept of open / closed models proposed in {Beged-Dov D26}. An open model is in an "edit" phase of its lifecycle: it may be modified, but its information may not be accessed (e.g. by querying or by set operations with other models). When the model is closed, URIs are computed for the model itself, its triples and its no-name resources. Closed models may be accessed for information, but not edited.
In response to this proposal, it was pointed out{Melnik D46} that URIs for no-name resources should be calculated at the moment of modification, considering the possibility of cyclic dependencies in the model.

The current implementation of the API (Compare Chapter 4) does not support open/closed models or model versions, but the discussion on this issue may be continued. For versioning of models, compare Chapter 5.

API Layers

In a message from December{Melnik D44} the creation of an RDF schema interface was announced, following a proposal made in an earlier message{van Dort D29}. The RDF schema interface (SchemaModel) was meant to be an extension of VirtualModel and to be placed in org.w3c.rdf.schema.
The released API contains an interface for accessing schema information (compare 3.2). It currently lacks a way of manipulating RDF schema information.

In a later message{Liljegren D81} a concrete layer proposal for the RDF API was made. The message proposed four layers (see footnote 1) :

So far, this layer proposal has not been discussed further.

URI generation for triples

In the API interface the class Triple (i.e. Statement) is a subclass of the class Resource (compare Picture 1). Accordingly, the API assigns a URI to each triple. Discussion arose about what this URI should look like :
A proposal was made to set a triple's URI to the model URI plus a UUID for the triple{Beged-Dov D6}. During serialization, the corresponding property should get an rdf:ID entry (according to the basic serialization syntax). This approach would have the advantage that the process is independent of the XML serialization used.
Additionally, the importance of an algorithm which creates the same identifier for equal triples in different implementations was mentioned {Melnik D23}.
The API follows these approaches insofar as it creates an MD5 digest as the URI for each triple. This digest is calculated as digest(subject) XOR digest(predicate) XOR digest(object).
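
The formula digest(subject) XOR digest(predicate) XOR digest(object) can be sketched as below. This is a minimal illustration, not the API's implementation: the function names, the UTF-8 encoding of the inputs and the use of plain strings for resources and literals are assumptions.

```python
import hashlib


def md5_digest(value: str) -> bytes:
    """16-byte MD5 digest of a string (UTF-8 encoding is an assumption)."""
    return hashlib.md5(value.encode("utf-8")).digest()


def xor_bytes(left: bytes, right: bytes) -> bytes:
    """Byte-wise XOR of two digests of equal length."""
    return bytes(a ^ b for a, b in zip(left, right))


def triple_digest(subject: str, predicate: str, obj: str) -> bytes:
    """digest(subject) XOR digest(predicate) XOR digest(object)."""
    return xor_bytes(xor_bytes(md5_digest(subject), md5_digest(predicate)),
                     md5_digest(obj))


# Equal triples yield equal digests, independent of where they are computed.
first = triple_digest("http://paperServer/paperURL", "author", "Peter Hannappel")
second = triple_digest("http://paperServer/paperURL", "author", "Peter Hannappel")
```

How the digest is turned into the final URI string is not shown here, since the text above does not specify it.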
A later message {Liljegren D73} claimed that relative URIs do not make sense for triples, because

Liljegren proposed to use absolute URIs for triples, therefore avoiding the problems above.

URI generation for anonymous resources

Automatic generation of URIs for anonymous (no-name) resources was of specific interest for the rdf-interest discussion group.

The concept of assigning a URI to every resource in the model did not please everyone in the community.
The discussion was started by a message {Melnik D32} that demanded accessibility for every resource and statement in a model, which requires URI generation.
On the other hand, it was pointed out that the URI of a named resource (e.g. the URL of a person's homepage) usually holds more information than a generated URI, which refers to the RDF describing the resource{Brickley D33}.
An additional message stated that anonymous resources are 'mentions' of resources which (unlike the resources themselves) may be anonymous. It also pointed out that "assigning identifiers to people is a political loaded business" {Brickley D35}.

Nonetheless, algorithms for calculating URIs for no-name resources were brought up.
Like some explicitly named resources, an anonymous resource is accessible via a fragment identifier (i.e. a relative hyperlink). For noname resources the fragment identifier would be a digest based in some way on data related to the resource.

Sergej Melnik proposed algorithms for calculation of digests for noname resources and models{Melnik D23}. According to his proposal, the URI of a noname resource is computed out of the digests of other resources, as follows :

Additionally, the digests used for computation are first rotated by n bits, where n is the "level" in the RDF fragment used for calculation.
Example (from Melnik's proposal) : Given an RDF structure containing the resources X, Y and Z (structure diagram not reproduced), the digests for the resources X, Y, Z are computed as follows (c_x is the digest of c rotated by x bytes) :

digest(Z) = a_0 xor b_1 xor c_2 xor A_3 xor d_2 xor B_3
digest(Y) = a_0 xor b_1 xor Z_2 xor e_1 xor C_2
digest(X) = a_0 xor Y_1

The part printed in bold is the digest for the path; the other part is the digest for the attributes.
With this algorithm, digests for noname resources can be computed beginning at the resources at high levels in the RDF tree. If a noname resource later gets an explicit name, only the digests of the resources "one level above" need to be recomputed. Furthermore, switching the order of the attributes of a single resource has no effect on that resource's digest.
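
The combination of rotation and xor can be sketched as follows. This is only a sketch under assumptions: a 128-bit digest, rotation to the left by a number of bits equal to the level, and the helper names rotate_left and combine are all illustrative choices and are not taken from Melnik's proposal.

```python
DIGEST_BITS = 128  # size of an MD5 digest in bits


def rotate_left(digest: int, bits: int) -> int:
    """Rotate a 128-bit integer digest to the left by `bits` positions."""
    bits %= DIGEST_BITS
    mask = (1 << DIGEST_BITS) - 1
    return ((digest << bits) | (digest >> (DIGEST_BITS - bits))) & mask


def combine(parts):
    """XOR together (digest, level) pairs, rotating each digest by its level."""
    result = 0
    for digest, level in parts:
        result ^= rotate_left(digest, level)
    return result


# The order of the parts does not matter, but the level of each part does.
same_order_1 = combine([(1, 0), (2, 1)])   # 1 xor 4
same_order_2 = combine([(2, 1), (1, 0)])   # 4 xor 1
other_levels = combine([(1, 1), (2, 0)])   # 2 xor 2
```

Since xor is commutative, reordering the attributes leaves the result unchanged, while the rotation makes the result depend on the level at which a digest is used.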

For easy separation of named and anonymous resources, a proposal was made that anonymous resource fragments be named rdfpointer:anon:digest, as opposed to named resources, which would be referred to as rdfpointer:id:name{Beged-Dov D36}. In this proposal the resource fragments were used with the generated model id (see below) rather than the model's source URI.
In a later message{Melnik D43}, disadvantages of using the generated model id were mentioned. First, URIs for anonymous resources would change whenever the model itself changes. Second, the model's generated id depends on the digests of anonymous resources, creating cyclic dependencies in the digest calculation.

URI generation for models

Sergej Melnik proposed an algorithm for computing a model's URI in the same message as his algorithm for triple URIs (see above).
The model digest is computed by combining the digests of all triples with the xor operator. The algorithm allows fast processing of changes in the model: when a triple is added or removed, a single xor operation is sufficient to recalculate the digest.
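
The incremental update can be sketched as follows. The class DigestedModel and its method names are assumptions made for illustration; digests are held as integers, and the triple digest is taken as the xor of the MD5 digests of the three parts.

```python
import hashlib


def triple_digest(triple) -> int:
    """XOR of the MD5 digests of subject, predicate and object, as an integer."""
    result = 0
    for part in triple:
        result ^= int.from_bytes(hashlib.md5(part.encode("utf-8")).digest(), "big")
    return result


class DigestedModel:
    """Hypothetical model that keeps its digest up to date on every change."""

    def __init__(self):
        self.triples = set()
        self.digest = 0  # digest of the empty model in this sketch

    def add(self, triple):
        if triple not in self.triples:
            self.triples.add(triple)
            self.digest ^= triple_digest(triple)  # one xor per added triple

    def remove(self, triple):
        if triple in self.triples:
            self.triples.remove(triple)
            self.digest ^= triple_digest(triple)  # xor is its own inverse


t1 = ("paper", "author", "Peter Hannappel")
t2 = ("paper", "title", "RDF API Survey")
model = DigestedModel()
model.add(t1)
model.add(t2)
```

The membership checks matter in this sketch: adding the same triple twice without them would cancel its contribution to the digest.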

Equality of Resources

The RDF community agreed that an important requirement for URI generation algorithms is that they generate equal URIs for equal resources. The definition of equality, however, still had to be discussed.

The first message referring to the problem {Liljegren D33} listed three cases in which equality of resources could be assumed :

Liljegren commented that equality in Cases 1 and 2 would be questionable, as follows :

1. Identical XML from the same server (byte for byte)

A weather report RDF example {Liljegren D33} was meant to point out possible problems with this case :
The weather report resource in this example is an anonymous resource describing the "current" weather report (it refers to a cloud-cover photo which changes daily and has a non-changing description). Liljegren argued that creating the same URI for the weather report each day would be incorrect, because the photo (and thus the content) has changed.
A comment {Melnik D34} on this statement was that the content of the RDF would not change, because it refers to the "current" weather, not to a specific day. So, Case 1 would imply equal resources.
Melnik stated in the same message that handling of Case 1 should be a minimum requirement for an algorithm that determines equality of resources.

2. Identical XML from different servers (byte for byte)

An example of an anonymous RDF webmaster resource {Liljegren D33} showed possible inequality between resources with the same XML from different sources :
The webmaster resource in this example has an email address ("webmaster"), a homepage ("/~webmaster/") and a constant description. When received from different servers, it refers to different persons.
It was agreed that Case 2 is not sufficient to prove equality between resources{Melnik D34}.


3.2) The RDF API Core Interfaces

Following is the definition of the API's core interfaces as proposed by Sergej Melnik after discussion in the interest group. The interface classes are meant to be implemented by other classes (compare packages RDFModel and RDFModelImpl in Chapter 4).
Picture 1 shows the class inheritance between the interface classes :


Picture 1) org.w3c.rdf.model class hierarchy

Methods defined in the interfaces are as follows :

Interface RDFNode (Abstract, i.e. only subclasses of the interface (Resource/Literal) may be used) :

Interface Literal :

The interface contains no additional methods.

Interface Resource :

Interface Statement :

Comment : The inherited method getURI must be implemented to return an MD5 digest.

Interface RDFModel :

Comment : The inherited method getURI must be implemented to return an MD5 digest.

object creation

model manipulation

model querying

Interface SetModel :

Interface VirtualModel :

Comment : VirtualModel is intended to be used to add functionality to the normal model interface (e.g. active models).


3.3) Using the RDF API Core interfaces - A short Example

Following is a short script demonstrating the usage of the RDF API core interfaces. The script creates a small RDF model and performs a query on it.

#Creating the model
Model Example;
Example = Example.create;
Example.setSourceURI("http://example_server.de/rdf/models/example");

#Creating Resources and Literals
Resource paper = Example.createResource("http://paperServer/paperURL");
Resource author = Example.createResource(""); #noname resource
Resource author_predicate = Example.createResource("author");
Resource firstname_predicate = Example.createResource("firstname");
Resource lastname_predicate = Example.createResource("lastname");
Literal author_firstname = Example.createLiteral("Peter");
Literal author_lastname = Example.createLiteral("Hannappel");

#Creating Statements
Statement paper_attr_1 = Example.createStatement(paper, author_predicate, author);
Statement author_attr_1 = Example.createStatement(author, firstname_predicate, author_firstname);
Statement author_attr_2 = Example.createStatement(author, lastname_predicate, author_lastname);

#Adding the Statements to the model
Example.add(paper_attr_1);
Example.add(author_attr_1);
Example.add(author_attr_2);

#Use of the find method (a null value matches anything)
Model authorInfo = Example.find(author, null, null);


Picture 2) The created Model (left) and the Model returned by the "find" method (right).
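
The behaviour of the find method in the script can be imitated with a small, self-contained sketch. It does not use the RDF API itself: the classes Statement and Model below, the identifier "_:author" for the noname resource and the use of None as the wildcard are assumptions made for illustration.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str


class Model:
    """Hypothetical in-memory model holding a set of statements."""

    def __init__(self, statements=()):
        self.statements = set(statements)

    def add(self, statement):
        self.statements.add(statement)

    def find(self, subject=None, predicate=None, obj=None):
        """Return a new Model with all matching statements; None matches anything."""
        def matches(st):
            return ((subject is None or st.subject == subject)
                    and (predicate is None or st.predicate == predicate)
                    and (obj is None or st.obj == obj))
        return Model(st for st in self.statements if matches(st))


author = "_:author"  # placeholder identifier for the noname resource
example = Model()
example.add(Statement("http://paperServer/paperURL", "author", author))
example.add(Statement(author, "firstname", "Peter"))
example.add(Statement(author, "lastname", "Hannappel"))

# Corresponds to the find call in the script: all statements about the author.
author_info = example.find(author, None, None)
```

As in the script, the result of the query is itself a model, so it can be queried again or combined with other models.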


4) An implementation of the RDF API Proposal

Chapter 4 is dedicated to the RDF API 1.0 package, an implementation of the RDF API by Sergej Melnik[11]. Melnik's API package contains an implementation of the previously discussed interfaces along with additional modules for parsing/serializing (SiRPAC, Strawman and standard XML) and encryption. Also included are tools and utilities for convenient use of the API. Following is a summary of the packages and their contents :

org.w3c.rdf.examples

Two examples for usage of the RDF API.

org.w3c.rdf.model

The core interfaces for the RDF API (see Section 3.2 of this paper).

org.w3c.rdf.model.impl

An implementation of the core interfaces (except for SetModel and VirtualModel, see 3.2).

org.w3c.rdf.syntax

Interfaces for parsing and processing RDF Models.
(Input stream -> Model and Model -> output stream).

org.w3c.rdf.syntax.sirpac

A modified SiRPAC adapted to the RDF API used for both parsing and serializing.

org.w3c.rdf.syntax.xml

A parser for XML files.

org.w3c.rdf.syntax.strawman

A simplified parser for XML files; it serves as the base for org.w3c.rdf.syntax.xml.

org.w3c.rdf.util

Utilities for use with the RDF API :
RDFFactory : Convenient use of the API (e.g. : create Parser/Serializer).
RDFDigest : Creating Hash Codes used for URIs.
RDFReader : Converting an arbitrary character stream to RDF.
RDFUtil : A collection of utility methods on RDF Models (e.g. "print statements").
SetOperations : A collection of set operations between models (union, difference, intersection).

org.w3c.rdf.vocabulary.rdf_schema_19990303

Convenient access to RDF schema-related vocabulary.

org.w3c.rdf.vocabulary.rdf_syntax_19990222

Convenient access to RDF syntax-related vocabulary.

org.w3c.tools.crypt

Creating and manipulating MD5-digests.

org.w3c.tools.sorter

Tools for sorting/comparing objects.

org.xml.sax

SAX Parser (SAX : Simple API for XML).

5) Future Work

This chapter deals with further work that may be done on the RDF API. It focuses on the versioning of RDF models.

Comment : References to messages from the RDF interest group are noted {Author N#} for November postings and {Author D#} for December postings, where # is the number of the message (compare Chapter 3). The base references to the November/December discussion groups are listed in the References section [9,10].

RDF Schema / Namespace support

One of the next steps in designing the RDF API standard is an API for manipulating RDF schemata for model definition and validation. Methods for the usage of namespaces should also be included. The approaches of Sergej Melnik in his API implementation may be a good starting point for a discussion.

Model Versions

The concept of model versions was discussed by the interest group (Compare 3.1) but not integrated into the first API standard. In later versions of the API standard, classes / methods supporting model versioning may be published.
Versioning of models could be performed based on the way proposed in{Beged-Dov D26}. The lifecycle for a model would look like this (Picture 3).


Picture 3) Open / Closed Models and Model Versioning

An open model may be modified at will. When the model is suitable for usage, it is closed. At the time of closing, validation of the model is performed. A new model version may be created (at the author's option) and stored in a repository to enable access to older versions later.
A closed model is only accessible for reading. When changes are to be made, the model is opened again.
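
The lifecycle described above can be sketched as a small state machine. Everything in this sketch is hypothetical: the class LifecycleModel, its method names, the error type and the list used as a version repository are illustrative assumptions and not part of the API or of any proposal; validation is only indicated by a comment.

```python
class ModelStateError(Exception):
    """Raised when an operation is not allowed in the current state."""


class LifecycleModel:
    """Hypothetical model that is either open (editable) or closed (readable)."""

    def __init__(self):
        self.statements = set()
        self.is_open = True
        self.versions = []  # stands in for a repository of older versions

    def add(self, statement):
        if not self.is_open:
            raise ModelStateError("closed models may not be edited")
        self.statements.add(statement)

    def contains(self, statement):
        if self.is_open:
            raise ModelStateError("open models may not be accessed for information")
        return statement in self.statements

    def close(self, new_version=False):
        # Validation of the model would take place here.
        self.is_open = False
        if new_version:  # creating a version is left to the author
            self.versions.append(frozenset(self.statements))

    def open(self):
        self.is_open = True


model = LifecycleModel()
model.add(("paper", "author", "Peter Hannappel"))
model.close(new_version=True)    # first version is stored
model.open()
model.add(("paper", "year", "2000"))
model.close()                    # small change, no new version
```

Leaving the decision about a new version to the caller reflects the point that small changes should not necessarily trigger a version change.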

There are possible modifications to that process :

The API Layer proposal{Liljegren D81} includes a version layer which may be used to manage the versions of a single model and its current status. Possible methods for a class VersionedModel (which would be a subclass of Model) in this version layer would be :

6) References

[1] Lassila, O. ; Swick, R. R. : "Resource Description Framework (RDF) Model and Syntax Specification".
Online : http://www.w3.org/TR/REC-rdf-syntax 29.1.2000.

[2] Brickley, D. ; Guha, R. V. : "Resource Description Framework (RDF) Schema Specification".
Online : http://www.w3.org/TR/PR-rdf-schema 29.1.2000.

[3] W3C : "SiRPAC".
Online : http://www.w3.org/RDF/Implementations/SiRPAC 29.1.2000.

[4] Apparao, V. ; Byrne, S. ; Champion, M. ; Isaacs, S. ; Jacobs, I. ; Le Hors, A. ; Nicol, G. ; Robie, J. ; Sutor, R. ; Wilson, C. ; Wood, L. : "Document Object Model (DOM) Level 1 Specification".
Online : http://www.w3.org/TR/REC-DOM-Level-1 29.1.2000.

[5] Waterson, C. : "rdf : back-end-architecture".
Online : http://www.mozilla.org/rdf/back-end-architecture.html 29.1.2000.

[6] Melnik, S. : "Generic Interoperability Framework".
Online : http://www-diglib.stanford.edu/diglib/ginf 29.1.2000.

[7] Melnik, S. : "RDF Schema Support in GINF".
Online : http://www-diglib.stanford.edu/diglib/ginf/rdf-schema-in-ginf.html 29.1.2000.

[8] Daniel, R. : "A proposal for an RDF API".
Online : http://www.mailbase.ac.uk/lists/rdf-dev/1999-06/0002.html 2.1.2000.

[9] W3C : "www-rdf-interest@w3c.org from November 1999".
Online : http://lists.w3.org/Archives/Public/www-rdf-interest/1999Nov 2.1.2000.

[10] W3C : "www-rdf-interest@w3c.org from December 1999".
Online : http://lists.w3.org/Archives/Public/www-rdf-interest/1999Dec 2.1.2000.

[11] Melnik, S. : "RDF API 1.0 draft".
Online : http://www-db.stanford.edu/~melnik/rdf/api.html 2.1.2000.


1) The layer descriptions are assumptions made by the author and are not explicitly stated in the message.