TurtlePatch

From Semantic Web Standards

Status: Proposed by Sandro Hawke, December 2011. Discussed as one option by the Linked Data Platform Working Group, but the group decided not to standardize in this space yet. Largely rewritten January 2014.

Modified July 2014 to include wildcards and use Prefer: blank-nodes=use-genid for handling blank nodes.

Implementations: none known at this time. Report any to sandro@w3.org.

Motivation

Motivation for RDF Patching

A Web server is offering some RDF data, at http://example.org/g1.

1. G1 was very big, and the server wants to allow authorized clients to modify it without them needing to re-transmit the whole thing using an HTTP PUT.

2. Different clients want to update different parts of G1 at the same time. With PUT they would have an if-match etag conflict and have to re-try. If G1 changes faster than some client's round-trip time, it will never be able to update it. With PATCH, in some applications, the client can omit if-match, doing "blind" patching, and still have acceptable results. (Arguably G1 should be split into different resources to address this kind of problem.)

3. The server wants to offer a stream of patches to other systems (clients or slave servers), so they can efficiently maintain an up-to-date copy of g1.

Motivation for TurtlePatch Specifically

TurtlePatch is a subset of SPARQL 1.1 UPDATE which is easier to parse, easier to process, and guaranteed to be fast to process. As such, TurtlePatch is better than SPARQL 1.1 UPDATE in two situations:

1. If patch receivers need to be simple code, not a full SPARQL 1.1 UPDATE implementation. For example, if the receiver is running in a browser or mobile app, or is a small add-on to an existing application.

2. If receivers need to be able to apply all patches in linear or near-linear time with respect to the size of the patch. (Using full SPARQL UPDATE, a poorly-constructed patch can easily require exponential time to process. For certain unlikely graphs, it may not be possible to construct a patch that can be processed under practical resource constraints.

In these situations, it may be better to use text/turtlepatch instead of application/sparql-update for the PATCH language.

Example

 PREFIX foaf: <http://xmlns.com/foaf/0.1/> 
 PREFIX s: <http://www.w3.org/2000/01/rdf-schema#>
 DELETE WHERE {
   <http://www.w3.org/People/Berners-Lee/card#i> foaf:mbox <mailto:timbl@w3.org>.
 }
 INSERT DATA {
   <http://www.w3.org/People/Berners-Lee/card#i> foaf:mbox <mailto:timbl@hushmail.com>.
   <http://www.w3.org/People/Berners-Lee/card> s:comment "This is my general description of myself.\n\nI try to keep data here up to date and it should be considered authoritative.".
 }

Details

TurtlePatch is defined as the syntactic and semantic subset of the SPARQL 1.1 UPDATE language with the following characteristics:

  • The only allowed SPARQL syntactic constructs are BASE, PREFIX, INSERT DATA, and DELETE WHERE
  • The expression between the curly braces is only Turtle (no GRAPH Keyword), without any BASE or PREFIX directives
  • String literals MUST NOT contain unescaped newline characters
  • The document must be of the form:
    • zero or one BASE directives, followed by
    • zero or more PREFIX directives, followed by
    • zero or one DELETE WHERE directive, followed by
    • zero or one INSERT DATA directive
  • Each of the directives, and the closing braces, must be on a line by itself with no extra whitespace.
  • After parsing, any blank nodes in the DELETE clause must occur only once. That is, there are no joins.

This syntax is designed so that a TurtlePatch document can be correctly processed either by (1) treating it as a standard SPARQL 1.1 Update, or (2) easily turning it into two Turtle documents which simply enumerate the triples to be deleted and added.

As per SPARQL 1.1 Update:

  • blank nodes in the DELETE clause are treated as variables. Since they can only occur once, they are just wildcards.
  • blank nodes in the INSERT clause turn into fresh blank nodes in the resulting graph.
  • if any triples in the DELETE WHERE clause are not matched, they are ignored.

PROBLEM REPORTED with DELETE WHERE + Blank nodes: http://lists.w3.org/Archives/Public/public-ldp-wg/2014Jul/0111.html

Handling Blank Nodes

By itself, TurtlePatch cannot be used to remove triples containing specific blank nodes. There is, however, a technique for allowing graphs containing blank nodes to be arbitrarily patched.

The technique is to provide a Skolemized view of each graph, where the Skolem constants (genid IRIs) have a persistent mapping to each blank node, used during both GET and PATCH. This is relatively easy to do when the server has an internal identifier for blank nodes, as many databases do, by just mapping it to a genid IRI. If the server does not have internal identifiers — for instance if the server is just storing Turtle files — this can be difficult.

We propose the client ask for the genid version of a graph using a Prefer header, with return=representation and new parameter blank-nodes=use-genid. For example, without the header:

> GET /alice HTTP/1.1
> Host: example.org
> Accept: text/turtle
...
< HTTP/1.1 200 OK
< Vary: Prefer
< Content-Type: text/turtle
...   
PREFIX foaf: <http://xmlns.com/foaf/0.1/> 
PREFIX : <http://example.org/>
:me foaf:knows [ foaf:name "Bob" ].

And then with the header:

> GET /alice HTTP/1.1
> Host: example.org
> Accept: text/turtle
> Prefer: return=representation blank-nodes=use-genid
...
< HTTP/1.1 200 OK
< Vary: Prefer
< Content-Type: text/turtle
...
PREFIX foaf: <http://xmlns.com/foaf/0.1/> 
PREFIX : <http://example.org/>
PREFIX genid: <http://example.org/.well-known/genid/d7432/>
:me foaf:knows genid:517.
genid:517 foaf:name "Bob".

At some point after doing the second GET, the client can use genid:517 (that is, http://example.org/.well-known/genid/d7432/517) in a PATCH operation on http://example.org/alice

The Prefer parameter has not yet been discussed at IETF.

Advice on Parsing

Is there any easy way to get a Turtle serializer to use \n instead of newlines? Let's gather a list of flags for different systems.

You can parse the patch into your prefix declarations and two turtle strings with a regexp like this:

p = re.compile(r"""
(?P<prefix_declaration>(\s*(PREFIX.*|\#.*|)\n)*)  # prefix block, with comments
\s*DELETE\s+DATA\s+\{\s*\n                        # "delete" header
(?P<delete_pattern>(\s*[^}].*\n)*)                # "delete" pattern
\s*}\s*\n\s*INSERT\s+DATA\s+\{\s*\n               # "insert" header
(?P<insert_pattern>(\s*[^}].*\n)*)                # "insert" pattern
\s*\}(\s|\n|\s*\#.*)*                             # footer, with comments
""", re.MULTILINE+re.VERBOSE)

So you run that regexp above, then feed match.prefix_declaration+match.delete_pattern to a turtle parser to get the delete triples and match.prefix_declaration+match.insert_pattern to get the insert triples.