Wikidata:Lexicographical data/Notability

This is only a draft; anyone can edit this page. If you disagree with any of the rules on this page, feel free to correct them, preferably giving your reason in the edit summary or on the talk page.

Because Wikidata for Wiktionary aims to support Wiktionary editors and content and to make lexicographical data available in a structured, machine-readable way, essentially every word that is acceptable in some Wiktionary is acceptable here.

Lexeme

A Lexeme is considered notable if:

  1. there's at least one notable Form in the Lexeme, and
  2. the Lexeme is idiomatic.

Note that terms originating in fictional universes are not excluded.

Idiomaticity

Whether a Lexeme is idiomatic is up to the community. However, the following types of Lexemes are allowed in principle:

  1. Any Lexeme in <unspecified languages> with a sense which is a single word.
  2. Any number between 0 and 100, inclusive; by contrast, numbers, numerals, and ordinals over 100 that are not single words or are sequences of digits are not considered idiomatic per se.
  3. Phrasebook entries (very common expressions that are considered useful to non-native speakers).

Sometimes, a phrase may be included by the consensus of the community, based on the determination of editors that inclusion of the term is likely to be useful to readers.

Proper nouns

Proper nouns should be included if they meet one of the following conditions:

  1. There are multiple senses (multiple Wikidata items with this label)
  2. They are etymologically important (a common noun or other part of speech, e.g. an adjective, is derived from this proper noun)
  3. They have been included in multiple authoritative dictionaries or similar sources
  4. They are a part of another proper noun (for example given or family names)
  5. They have inflections

Proper nouns already added that do not meet one of these criteria may be retained, but they should not be systematically added outside of these conditions. See this initial discussion.


Letters

Letters are accepted as lexemes. This makes it possible to add information such as gender, definiteness, pronunciation, etc., which may differ between languages.

Phonemes

Phonemes are not accepted as lexemes (see also here). They have to be stored as items.

Form

A Form is considered notable if:

  1. it is or can be attested, or
  2. it is an inflected form, or a term in an alternative script or orthography; attestation is needed, however, if the existence of the Form is likely to be disputed.
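
The Lexeme and Form criteria can be read as a small decision procedure. The following is a minimal Python sketch of that reading, not an official tool. The class and field names (`Form`, `Lexeme`, `attestable`, `is_inflection_or_variant`, `disputed`, `idiomatic`) are hypothetical, and both idiomaticity and "likely to be disputed" are modelled as flags set by community judgment rather than computed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Form:
    attestable: bool = False                # it is, or can be, attested
    is_inflection_or_variant: bool = False  # inflected form, or alternative script/orthography
    disputed: bool = False                  # existence is likely to be disputed


@dataclass
class Lexeme:
    idiomatic: bool                         # decided by the community
    forms: List[Form] = field(default_factory=list)


def form_is_notable(form: Form) -> bool:
    if form.attestable:
        return True
    # Inflections and script/orthography variants pass without attestation
    # unless their existence is likely to be disputed.
    return form.is_inflection_or_variant and not form.disputed


def lexeme_is_notable(lexeme: Lexeme) -> bool:
    # Both conditions must hold: idiomatic, and at least one notable Form.
    return lexeme.idiomatic and any(form_is_notable(f) for f in lexeme.forms)
```

Under this sketch, a Lexeme with no Forms is never notable, and a disputed inflection only counts once it is attestable.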

Attestation

"Attested" means verified through:

  1. use or mention in at least one serious source; or
  2. two other sources, provided that they can be archived somewhere (e.g. at archive.org), occur in different sentences by different people, and span at least six months.
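
The two attestation routes can be sketched as a check over a list of sources. This is only an illustration under stated assumptions: the `Source` fields are hypothetical, whether a source is "serious" is an editorial judgment passed in as a flag, and "six months" is approximated as 183 days, which the rule itself does not specify.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations
from typing import List

SIX_MONTHS_DAYS = 183  # assumption: "six months" approximated as 183 days


@dataclass
class Source:
    author: str
    sentence: str
    published: date
    serious: bool = False     # judged a "serious source" by editors
    archivable: bool = False  # can be archived somewhere, e.g. at archive.org


def is_attested(sources: List[Source]) -> bool:
    # Route 1: a single serious source suffices.
    if any(s.serious for s in sources):
        return True
    # Route 2: two other sources, archivable, in different sentences
    # by different people, spanning at least six months.
    candidates = [s for s in sources if s.archivable]
    for a, b in combinations(candidates, 2):
        if (a.author != b.author
                and a.sentence != b.sentence
                and abs((a.published - b.published).days) >= SIX_MONTHS_DAYS):
            return True
    return False
```

For example, two archivable uses by different authors dated 2020-01-01 and 2020-08-01 would satisfy the second route in this sketch, while the same pair dated one month apart would not.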

Sources should be provided through attested in (P5323), entries in dictionaries through described by source (P1343), described at URL (P973), or through dedicated properties with external identifiers. Quotations may be added with the qualifier quotation (P1683). See this archived discussion.
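
To make the property usage concrete, the sketch below assembles sourcing statements as plain Python dictionaries. It is deliberately simplified and is not the real Wikibase JSON format; the `source_statement` helper, the record layout, the placeholder value `Q_EXAMPLE_WORK`, and the example URL are all hypothetical. Only the property IDs are taken from the text above.

```python
from typing import Dict, List, Optional

# Property IDs as named in the text above.
ATTESTED_IN = "P5323"
DESCRIBED_BY_SOURCE = "P1343"
DESCRIBED_AT_URL = "P973"
QUOTATION = "P1683"


def source_statement(prop: str, value: str, quotation: Optional[str] = None) -> Dict:
    """Build a simplified statement record (hypothetical layout, not Wikibase JSON)."""
    statement: Dict = {"property": prop, "value": value, "qualifiers": {}}
    if quotation is not None:
        # A quote is attached as a qualifier on the sourcing statement.
        statement["qualifiers"][QUOTATION] = quotation
    return statement


statements: List[Dict] = [
    # Placeholder item ID; a real statement would point at the item for the work.
    source_statement(ATTESTED_IN, "Q_EXAMPLE_WORK",
                     quotation="An example sentence using the word."),
    source_statement(DESCRIBED_AT_URL, "https://example.org/entry"),
]
```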

Note that this does include nonce words and dictionary-only terms.


Sense

A Sense is considered notable if it is or can be attested.

Languages

Any language is included as long as it is notable as a Wikidata item. This broadly includes:

  1. All natural languages.
  2. Sign languages.
  3. Constructed languages like Esperanto and Toki Pona, and artistic languages like Klingon.
  4. Reconstructed languages such as Proto-Indo-European.

Note that the community may explicitly include or exclude a language based on consensus, such as when a proposed language is considered a dialect of, or an alternate name for, another language.