Wikidata:Content Schema

It is proposed that this informational page become a policy or guideline. Please see the discussion on its talk page or the project chat.
The following is a proposed Wikidata policy or guideline. The proposal may still be in development, under discussion, or in the process of gathering consensus for adoption.

The Wikidata Content Schema (WDCS) is a concept for representing different types of data models on Wikidata. It is based on an "if–then" concept: if a Wikidata item has certain claims (such as instance of (P31) scholarly article (Q13442814)), then it must, or must not, or should, or could have other claims.

It is currently under development.

Raw notes

A model for documenting the musts, shoulds, coulds, and shall-nots of a Wikidata entry, depending on what the entry is.

At its most fundamental, the skeleton:
{ "if": […],
  "then": {
    "required": […],
    "recommended": […],
    "optional": […],
    "must-not": […]
  }
}

The skeleton comprises the if part and the then superpart. Within the then superpart are the parts required, recommended, optional, and must-not; those terms carry their RFC 2119 meanings. Thus there are five parts in all. Each part is a key-value pair whose key is the part name and whose value is an array of objects. Collectively, those objects are the guts.
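As a rough illustration of the five-part shape described above, here is a minimal Python sketch that checks whether a schema has that shape. The function name `skeleton_problems`, the choice to treat a missing part as an empty array, and the property choices in `example_schema` are assumptions made for illustration; none of them are part of the proposal.

```python
# The four part names inside "then", as shown in the skeleton.
THEN_PARTS = ("required", "recommended", "optional", "must-not")


def skeleton_problems(schema):
    """Return a list of shape problems; an empty list means the shape is fine."""
    problems = []
    if not isinstance(schema.get("if"), list):
        problems.append('"if" must be an array')
    then = schema.get("then")
    if not isinstance(then, dict):
        problems.append('"then" must be an object')
        return problems
    # Flag part names that the skeleton does not define.
    for name in then:
        if name not in THEN_PARTS:
            problems.append(f'unknown part "{name}" in "then"')
    # Assumption: a part that is left out counts as an empty array.
    for name in THEN_PARTS:
        if not isinstance(then.get(name, []), list):
            problems.append(f'"{name}" must be an array')
    return problems


# Illustrative schema only; the properties chosen here are not a vetted model.
example_schema = {
    "if": [{"property": "P31", "value": "Q13442814"}],
    "then": {
        "required": [{"property": "P1476"}],
        "recommended": [{"property": "P356"}],
        "optional": [],
        "must-not": [],
    },
}

print(skeleton_problems(example_schema))
```

This only checks the outer shape. It says nothing about whether the guts inside each part are themselves well formed.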

Here are the types of guts:

{"property": "P###"} – the property is represented on the item
{"property": ["P1", "P2"]} – either one property or the other is represented on the item
{"property": "P###", "value": "abc"} – the property is represented on the item and has the stated value. (Note that the specification will not enforce content types; that's up to the person writing the schema. Thus {"property": "P31", "value": 12} is valid per the specification but not valid per Wikidata.)
{"property": "P###", "value": ["abc", "def"]} – the property is represented on the item and has one or more of these values. Whether the values combine as "X and Y" or as "X or Y" depends on whether the gut is located in the required, recommended, optional, or must-not section. If two things are required in pairs, then it looks like this: [ "a", ["b", "c"] ]
{"property": "P###", "expression": ".*"} – the property is represented on the item and its value is captured by the stated regular expression
{"property": "P###", "qualifier": {"property": "P###", "value": "…"}} – the property is represented on the item and can have any value, but also has the stated qualifier, expressed as an object or an array of such objects. To specify the value of the property itself, include a value key-value pair per above
{"property": "P###", "reference": {"property": "P###", "value": "…"}} – the property is represented on the item and can have any value, but also has the stated reference, expressed as an object, an array of such objects, or just true to state that a reference of some kind is required. (If you set reference: true in the must-not part, it technically means a reference is forbidden.)
{"property": "P###", "rank": "normal"} – the property is represented on the item and can have any value, but the value is of the stated rank, which can be one of "normal", "deprecated", or "preferred". You could also have an array of two in theory, I guess. An array of all three matches every rank and so constrains nothing; don't do it.

Key-value pairs can be combined arbitrarily, and to have more than one, you can arrange them in arrays.
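To make the combination of key-value pairs concrete, here is a minimal Python sketch of how a tester might decide whether a gut is present on an item. It rests on several assumptions not found in the notes: the item is a simplified dict mapping property IDs to lists of statements (not the real Wikibase JSON), the names `statement_matches` and `gut_present` are hypothetical, an array of values is read as "any of", the regular expression must match the whole value, and qualifier and reference keys are left out for brevity.

```python
import re


def as_list(x):
    """Wrap a single value in a list so scalars and arrays are handled alike."""
    return x if isinstance(x, list) else [x]


def statement_matches(gut, statement):
    """True if one simplified statement satisfies the gut's optional keys."""
    if "value" in gut and statement.get("value") not in as_list(gut["value"]):
        return False
    if "expression" in gut:
        # Assumption: the expression has to match the entire value.
        if re.fullmatch(gut["expression"], str(statement.get("value", ""))) is None:
            return False
    if "rank" in gut and statement.get("rank", "normal") not in as_list(gut["rank"]):
        return False
    return True


def gut_present(gut, item):
    """True if any statement of any listed property satisfies the gut."""
    return any(
        statement_matches(gut, statement)
        for prop in as_list(gut["property"])
        for statement in item.get(prop, [])
    )


# Hypothetical item in the simplified representation assumed above.
item = {
    "P31": [{"value": "Q13442814", "rank": "normal"}],
    "P356": [{"value": "10.1000/182", "rank": "preferred"}],
}

print(gut_present({"property": "P31", "value": "Q13442814"}, item))
print(gut_present({"property": ["P50", "P2093"]}, item))
```

Because every key in the gut is checked against the same statement, combining "value" and "rank" in one gut means a single statement must satisfy both.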

Guts mirror concepts in the Wikibase data model. A schema compliant with WDCS can be fed into a tester to generate a compliance report.
* If a gut is present in a given item as a claim and is in the “required” part then the test passes
* If a gut is present in a given item as a claim and is in the “must-not” part then the test fails
* If a gut is NOT present in a given item as a claim and is in the “required” part then the test fails
* If a gut is NOT present in a given item as a claim and is in the “must-not” part then the test passes
* In order for the test overall to pass, all the sub-tests must pass. One failure results in an overall failure.
* Recommended and optional are “non-voting.” While they show up on the report, they do not determine whether a test succeeds or fails.
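The rules above can be sketched as a small Python tester. This is a sketch under stated assumptions, not a reference implementation: the name `compliance_report`, the report layout, the `is_present` callback, and the toy `property_present` checker (which models an item as a bare set of property IDs) are all hypothetical. It also assumes that the schema applies only when every gut in the "if" part is present, which the notes do not spell out.

```python
VOTING_PARTS = ("required", "must-not")
ALL_PARTS = ("required", "recommended", "optional", "must-not")


def compliance_report(schema, item, is_present):
    """Run every sub-test and report whether the item passes overall."""
    # Assumption: the schema applies only if all "if" guts are present.
    applies = all(is_present(gut, item) for gut in schema["if"])
    report = {"applies": applies, "passed": None, "results": []}
    if not applies:
        return report
    report["passed"] = True
    for part in ALL_PARTS:
        for gut in schema["then"].get(part, []):
            present = is_present(gut, item)
            if part not in VOTING_PARTS:
                verdict = "info"  # recommended and optional are non-voting
            elif present == (part == "required"):
                verdict = "pass"  # required and present, or must-not and absent
            else:
                verdict = "fail"  # one failure fails the whole test
                report["passed"] = False
            report["results"].append(
                {"part": part, "gut": gut, "present": present, "verdict": verdict}
            )
    return report


def property_present(gut, item):
    """Toy checker: the item is just a set of property IDs."""
    return gut["property"] in item


# Illustrative schema; the property choices are not a vetted model.
schema = {
    "if": [{"property": "P31"}],
    "then": {
        "required": [{"property": "P1476"}],
        "recommended": [{"property": "P356"}],
        "optional": [],
        "must-not": [{"property": "P569"}],
    },
}

good = compliance_report(schema, {"P31", "P1476"}, property_present)
bad = compliance_report(schema, {"P31", "P569"}, property_present)
print(good["passed"], bad["passed"])
```

Passing the presence check in as a callback keeps the pass/fail logic separate from the question of how a gut is matched against real Wikibase data.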