
Shape Expressions for Wikidata Lexemes using ShExStatements
Closed, Resolved, Public

Description

ShExStatements can be used to write simple shape expressions using CSV files.
The goal of this project is to create shape expressions for different lexical categories in different languages.
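As a rough illustration of the CSV-to-shape idea, here is a minimal Python sketch. It does not use ShExStatements itself or reproduce its full syntax. The row layout (two columns for a prefix declaration; node, property, value and an optional cardinality for a constraint) is a simplifying assumption, and `rows_to_shex` is a hypothetical helper name. Every value is wrapped in a value set, which is cruder than what a real tool would do.

```python
import csv
import io


def rows_to_shex(csv_text):
    """Turn simplified CSV rows into ShEx-like text (illustrative only)."""
    prefixes = []
    shapes = {}  # shape name -> list of constraint lines, in input order
    for row in csv.reader(io.StringIO(csv_text)):
        row = [cell.strip() for cell in row]
        if not row or not row[0]:
            continue  # skip blank lines
        if len(row) == 2:
            # Assumed layout for a prefix declaration: name,<IRI>
            prefixes.append(f"PREFIX {row[0]}: {row[1]}")
            continue
        # Assumed layout for a constraint: node,property,value[,cardinality]
        node, prop, value = row[0], row[1], row[2]
        card = row[3] if len(row) > 3 else ""
        shapes.setdefault(node.lstrip("@"), []).append(
            f"  {prop} [ {value} ]{card}"
        )
    lines = list(prefixes)
    for name, constraints in shapes.items():
        lines.append(f"<{name}> {{")
        lines.append(" ;\n".join(constraints))
        lines.append("}")
    return "\n".join(lines)


# Example input: an English noun lexeme (Q1860 = English, Q1084 = noun).
EXAMPLE = """\
wd,<http://www.wikidata.org/entity/>
dct,<http://purl.org/dc/terms/>
wikibase,<http://wikiba.se/ontology#>
@lexeme,dct:language,wd:Q1860
@lexeme,wikibase:lexicalCategory,wd:Q1084
"""

if __name__ == "__main__":
    print(rows_to_shex(EXAMPLE))
```

The sketch emits one shape per node name, with one constraint line per CSV row. The point is only that a flat, spreadsheet-friendly file carries enough information to build a shape.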

Languages:

  1. English
  2. French
  3. Malayalam

Lexical categories:

  1. Nouns
  2. Pronouns
  3. Verbs
  4. Adverbs
  5. Adjectives

Source Code: ShExStatements
Examples: CSV files

Event Timeline

  1. Malayalam Noun: Entity Schema: E309 built from ShExStatements: Malayalam Noun CSV file
  2. English adjective: Entity Schema: E310 built from ShExStatements: English adjective CSV file
  3. English adverb: Entity Schema: E311 built from ShExStatements: English adverb CSV file

@Jsamwrites Any plans to insert such Shape Expressions into the workflows for Lexeme Forms?

@Daniel_Mietchen This is a very interesting idea. Shape Expressions could be generated for the lexemes of the languages already supported on Lexeme Forms.

If my understanding of Lexeme-Forms is correct, it reads the data directly from the Lexeme-Forms pages on Wikidata 'on the fly' for generating the form page for every language. These pages could then be parsed to generate the shape expressions.

Adding @Lucas_Werkmeister_WMDE in the loop.

Some more shape expressions created:

  1. English preposition: Entity Schema: E312 built from ShExStatements: English preposition CSV file
  2. English interjection: Entity Schema: E313 built from ShExStatements: English interjection CSV file
  3. French interjection: Entity Schema: E314 built from ShExStatements: French interjection CSV file
  4. French adjective: Entity Schema: E316 built from ShExStatements: French adjective CSV file

> If my understanding of Lexeme-Forms is correct, it reads the data directly from the Lexeme-Forms pages on Wikidata 'on the fly' for generating the form page for every language.

It doesn't read them from the wiki pages. They're checked to make sure they make sense and then manually added to templates.py in the Lexeme Forms repository.

Thanks @Nikki for the pointer. In that case, shape expressions can be easily generated from the lexical categories of the languages in templates.py.
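A minimal sketch of that idea in Python, assuming each template entry carries a language item ID and a lexical category item ID. The key names `language_item_id` and `lexical_category_item_id`, the sample dictionary, and the helper `template_to_csv` are illustrative assumptions and have not been checked against templates.py. The output rows use an assumed node,property,value layout.

```python
import csv
import io

# Hypothetical stand-in for entries in templates.py; the key names are assumptions.
SAMPLE_TEMPLATES = {
    "english-noun": {
        "language_item_id": "Q1860",            # English
        "lexical_category_item_id": "Q1084",    # noun
    },
    "french-adjective": {
        "language_item_id": "Q150",             # French
        "lexical_category_item_id": "Q34698",   # adjective
    },
}


def template_to_csv(name, template):
    """Emit CSV rows constraining a lexeme's language and lexical category."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    node = "@" + name.replace("-", "_")
    writer.writerow([node, "dct:language", "wd:" + template["language_item_id"]])
    writer.writerow(
        [node, "wikibase:lexicalCategory", "wd:" + template["lexical_category_item_id"]]
    )
    return buf.getvalue()


if __name__ == "__main__":
    for name, template in SAMPLE_TEMPLATES.items():
        print(template_to_csv(name, template), end="")
```

Only the two lexeme-level constraints are covered here. Constraints on individual forms and their grammatical features would need more rows per template.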

Closing this task, as it was created during Wikimedia-Hackathon-2021.