Global Wordnet Formats

The Global WordNet Association provides three formats in which WordNets can be published and submitted to the ILI: XML, JSON and RDF.

All of these formats are considered equivalent, and a converter between them is available.

XML

The XML format is specified by a DTD, which is referenced in the DOCTYPE declaration below. An annotated example is given here:

The first three lines must always be as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE LexicalResource SYSTEM "http://globalwordnet.github.io/schemas/WN-LMF-1.0.dtd">
<LexicalResource xmlns:dc="http://purl.org/dc/elements/1.1/">

A file may contain multiple WordNets in different languages:

The following information is required:

Extra properties may be included from Dublin Core

Each word (lexical entry) must have a unique id:

        <LexicalEntry id="w1">

The part of speech values are as follows:

The set of relations between senses is limited to the following

Syntactic Behaviour is given as in Princeton WordNet

            <SyntacticBehaviour subcategorizationFrame="Sam cannot %s Sue"/>
            <SyntacticBehaviour subcategorizationFrame="Sam and Sue %s"/>
            <SyntacticBehaviour subcategorizationFrame="The banks %s the check"/>
        </LexicalEntry>

If a synset is already mapped to the ILI, give the ILI ID in the ili attribute. Every synset ID must start with the ID of its lexicon followed by a dash, i.e., it has the form example-en + - + local_synset_id.

        <Synset id="example-en-10161911-n" ili="i90287" partOfSpeech="n">
            <Definition>
                the father of your father or mother
            </Definition>

The set of relations between synsets is limited to the following:

Princeton WordNet Properties

Non-Princeton WordNet Relations

If you wish to define a new concept, set the ili attribute to "in" (ILI New). If there is no mapping to the ILI, leave the attribute empty; the attribute itself is required.

        <Synset id="example-en-1-n" ili="in" partOfSpeech="n">
            <Definition>A father's father; a paternal grandfather</Definition>

You can include metadata (such as source) at many points. The ILI definition must be at least 20 characters or five words long.

            <ILIDefinition dc:source="https://en.wiktionary.org/wiki/farfar">
                A father's father; a paternal grandfather
            </ILIDefinition>
        </Synset>

You must include all targets of relations

        <Synset id="example-en-10162692-n" ili="i90292" partOfSpeech="n"/>
    </Lexicon>
    <Lexicon id="example-sv"
             label="Example wordnet (Swedish)"
             language="sv" 
             email="john@mccr.ae"
             license="https://creativecommons.org/publicdomain/zero/1.0/"
             version="1.0"
             citation="CILI: the Collaborative Interlingual Index. Francis Bond, Piek Vossen, John P. McCrae and Christiane Fellbaum, Proceedings of the Global WordNet Conference 2016, (2016)."
             url="http://globalwordnet.github.io/schemas/"
             dc:publisher="Global Wordnet Association">

The list of lexical entries (words) in your wordnet

        <LexicalEntry id="w4">
            <Lemma writtenForm="farfar" partOfSpeech="n"/>

Synsets need not be language-specific but senses must be

            <Sense id="example-sv-2-n-1" synset="example-en-1-n">
                <SenseExample dc:source="Europarl Corpus">
                    Jag vill berätta för er att min farfar var svensk beredskapssoldat vid norska gränsen under andra världskriget, ett krig som Sverige stod utanför
                </SenseExample>
            </Sense>
        </LexicalEntry>
    </Lexicon>
</LexicalResource>
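
Two of the rules stated above (synset IDs start with the lexicon ID plus a dash, and an ILI definition has at least 20 characters or five words) are easy to check mechanically. The following is a minimal sketch using only the Python standard library, not an official validator. The sample document is a hypothetical, trimmed-down file, and the sketch assumes that every synset marked ili="in" carries an ILIDefinition.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down document used only for this illustration.
SAMPLE = """<LexicalResource xmlns:dc="http://purl.org/dc/elements/1.1/">
  <Lexicon id="example-en" language="en">
    <Synset id="example-en-1-n" ili="in" partOfSpeech="n">
      <Definition>A father's father; a paternal grandfather</Definition>
      <ILIDefinition>A father's father; a paternal grandfather</ILIDefinition>
    </Synset>
    <Synset id="bad-2-n" ili="in" partOfSpeech="n">
      <ILIDefinition>too short</ILIDefinition>
    </Synset>
  </Lexicon>
</LexicalResource>"""


def ili_definition_ok(text):
    """Apply the rule as stated above: at least 20 characters or five words."""
    text = " ".join(text.split())  # collapse layout whitespace
    return len(text) >= 20 or len(text.split()) >= 5


def check_synsets(xml_text):
    """Return a list of (synset id, problem) pairs found in the document."""
    problems = []
    root = ET.fromstring(xml_text)
    for lexicon in root.iter("Lexicon"):
        prefix = lexicon.get("id") + "-"
        for synset in lexicon.iter("Synset"):
            sid = synset.get("id")
            if not sid.startswith(prefix):
                problems.append((sid, "id does not start with " + prefix))
            # Assumption: a proposed new concept must carry an ILIDefinition.
            if synset.get("ili") == "in":
                ili_def = synset.find("ILIDefinition")
                if ili_def is None or not ili_definition_ok(ili_def.text or ""):
                    problems.append((sid, "missing or too short ILIDefinition"))
    return problems


for synset_id, problem in check_synsets(SAMPLE):
    print(synset_id, "-", problem)
```

In this sample only the second synset is reported, once for its ID and once for its ILI definition. A check like this complements DTD validation, which cannot express either rule.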

JSON

The JSON format follows that of the XML and is based on JSON-LD. An example of the JSON is as follows:

The top level of a JSON graph consists of an object with two properties: @context, which must be the fixed string referring to the JSON-LD context, and @graph, which gives the lexicons. This structure is required for submission to the Collaborative Interlingual Index, but web services may of course return shorter fragments of the structure.

{ 
  "@context": "http://globalwordnet.github.io/schemas/wn-json-context-1.0.json",
  "@graph": [{

The following are required properties of every WordNet. Note that the language must be given twice: once as @language inside @context and once as language. @id gives the identifier of this wordnet (it should be unique in this document) and @type must be ontolex:Lexicon.

      "@context": { "@language": "en" },
      "@id": "example-en",
      "@type": "ontolex:Lexicon", 
      "label": "Example wordnet (English)",
      "language": "en",
      "email": "john@mccr.ae",
      "rights": "https://creativecommons.org/publicdomain/zero/1.0/",
      "version": "1.0",

In addition, the properties citation, url, status, confidenceScore and any property from Dublin Core Elements 1.1 may be used.

      "citation": "CILI: the Collaborative Interlingual Index. Francis Bond, Piek Vossen, John P. McCrae and Christiane Fellbaum, Proceedings of the Global WordNet Conference 2016, (2016).",
      "url": "http://globalwordnet.github.io/schemas",
      "publisher": "Global Wordnet Association",

The entries are given as a list under the entry property. Each entry requires an @id, a partOfSpeech and a lemma, and may have sense, synBehavior, status, confidence and Dublin Core properties. The lemma has only a single value, writtenForm, and the partOfSpeech must be one of the following: [ noun, verb, adjective, adverb, adjective_satellite, phrase, conjunction, adposition, other, unknown ]. The @id must be unique in the document, so it must differ from the @id of the wordnet and of every other entry.

      "entry": [{
          "@id" : "w1",
          "lemma": { "writtenForm": "father" }, 
          "partOfSpeech": "noun",

The sense requires only an @id and a synset, and may take status, confidenceScore, Dublin Core properties and an example.

          "sense": [{
              "@id": "example-en-10161911-n-1",
              "synset": "example-en-10161911-n"
          }]
      }, {
          "@id" : "w2",
          "lemma": { "writtenForm": "paternal grandfather" }, 
          "partOfSpeech": "noun",
          "sense": [{
              "@id": "example-en-1-n-1",
              "synset": "example-en-1-n",

A sense may also have any number of relations, each of which has a relType from the list below and a target, and may have Dublin Core properties.

The syntactic behavior is given here as follows:

          "synBehavior": [
             {"label": "Sam cannot %s Sue"}, 
             {"label": "Sam and Sue %s"},
             {"label": "The banks %s the check"}
           ]
      }],

Synsets are listed under the synset property. A synset requires only an @id. It may take an ili which is a code from the CILI (starting with ili:i), a definition, an iliDefinition (which must be given in English), status, confidenceScore, relations and Dublin Core properties.

In contrast to the XML form, the ili is optional. If there is no match, omit this property; if you wish to propose a new synset, add only an iliDefinition.

      "synset": [{
          "@id": "example-en-10161911-n",
          "partOfSpeech": "noun",
          "ili": "ili:i90287",

Definitions must have a gloss and may have a language; in addition, status, confidenceScore and Dublin Core properties may be added. An iliDefinition is the same but may not have a language.

          "definition": [{
            "gloss": "that which is perceived or known or inferred to have its own distinct existence (living or nonliving)"
           }],

Synset relations are given as for sense relations, except that the target must be the @id of another synset, not a sense. The following properties can be used:

Any examples should be given on the sense as follows:

          "sense": [{
              "@id": "example-sv-2-n-1",
              "synset": "example-en-1-n",
              "example": [{
                  "value": "Jag vill berätta för er att min farfar var svensk beredskapssoldat vid norska gränsen under andra världskriget, ett krig som Sverige stod utanför",
                  "source": "Europarl Corpus"
              }]
          }]
      }]
  }]
}

The JSON format can be validated by the JSON Schema provided at https://github.com/globalwordnet/schemas/blob/master/wn-json-schema.json
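
Before running full schema validation, a few of the constraints described in this section can be checked with a short script. The following is a rough sketch using only the Python standard library. It is not a replacement for the JSON Schema, and the function name precheck and the sample document are illustrative assumptions. It checks the two top-level properties, the lexicon properties shown as required in the example, the uniqueness of entry @id values, and the partOfSpeech values listed above.

```python
import json

# Part-of-speech values listed in the text above.
POS_VALUES = {
    "noun", "verb", "adjective", "adverb", "adjective_satellite",
    "phrase", "conjunction", "adposition", "other", "unknown",
}
# Lexicon properties shown as required in the example above.
REQUIRED_LEXICON_KEYS = [
    "@context", "@id", "@type", "label", "language", "email", "rights", "version",
]


def precheck(document):
    """Return human-readable problems found in a parsed wordnet JSON object."""
    problems = []
    for key in ("@context", "@graph"):
        if key not in document:
            problems.append("top level: missing " + key)
    seen_ids = set()
    for lexicon in document.get("@graph", []):
        lex_id = lexicon.get("@id", "?")
        seen_ids.add(lex_id)
        for key in REQUIRED_LEXICON_KEYS:
            if key not in lexicon:
                problems.append(lex_id + ": missing " + key)
        for entry in lexicon.get("entry", []):
            entry_id = entry.get("@id", "?")
            for key in ("@id", "partOfSpeech", "lemma"):
                if key not in entry:
                    problems.append(entry_id + ": missing " + key)
            # Entry ids must differ from lexicon ids and from each other.
            if entry_id in seen_ids:
                problems.append(entry_id + ": duplicate @id")
            seen_ids.add(entry_id)
            pos = entry.get("partOfSpeech")
            if pos is not None and pos not in POS_VALUES:
                problems.append(entry_id + ": unknown partOfSpeech " + repr(pos))
            for sense in entry.get("sense", []):
                for key in ("@id", "synset"):
                    if key not in sense:
                        problems.append(entry_id + ": sense missing " + key)
    return problems


# Hypothetical sample with two deliberate mistakes in the second entry.
SAMPLE = json.loads("""
{
  "@context": "http://globalwordnet.github.io/schemas/wn-json-context-1.0.json",
  "@graph": [{
    "@context": {"@language": "en"},
    "@id": "example-en",
    "@type": "ontolex:Lexicon",
    "label": "Example wordnet (English)",
    "language": "en",
    "email": "john@mccr.ae",
    "rights": "https://creativecommons.org/publicdomain/zero/1.0/",
    "version": "1.0",
    "entry": [
      {"@id": "w1", "lemma": {"writtenForm": "father"}, "partOfSpeech": "noun",
       "sense": [{"@id": "example-en-10161911-n-1", "synset": "example-en-10161911-n"}]},
      {"@id": "w1", "lemma": {"writtenForm": "pay"}, "partOfSpeech": "v"}
    ]
  }]
}
""")

for problem in precheck(SAMPLE):
    print(problem)
```

The second entry in the sample reuses the @id w1 and uses the XML-style value v instead of verb, so both are reported.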

RDF

The RDF schema is significantly more flexible and builds principally on the W3C OntoLex Model. The details of the RDF serialization are principally built on those of the JSON-LD model. We include a separate tutorial here for the benefit of those who wish to create their resource natively in RDF.

The standard namespaces are

@prefix cc: <http://creativecommons.org/ns#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix ili: <http://ili.globalwordnet.org/ili/> .
@prefix lime: <http://www.w3.org/ns/lemon/lime#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix synsem: <http://www.w3.org/ns/lemon/synsem#> .
@prefix vartrans: <http://www.w3.org/ns/lemon/vartrans#> .
@prefix wn: <http://wordnet-rdf.princeton.edu/ontology#> .
@prefix wordnetlicense: <http://wordnet.princeton.edu/wordnet/license/> .

Each wordnet is an instance of the class lime:Lexicon and must have the following properties

The mapping to the Lemon-OntoLex model is as follows:

A more extended example is given here:

<#example-en> a lime:Lexicon ;
  rdfs:label "Example wordnet (English)"@en ;
  dc:language "en" ;
  schema:email "john@mccr.ae" ;
  cc:license <https://creativecommons.org/publicdomain/zero/1.0/> ;
  owl:versionInfo "1.0" ;
  schema:citation "CILI: the Collaborative Interlingual Index. Francis Bond, Piek Vossen, John P. McCrae and Christiane Fellbaum, Proceedings of the Global WordNet Conference 2016, (2016)." ;
  schema:url "http://globalwordnet.github.io/schemas/" ;
  dc:publisher "Global Wordnet Association" ;
  lime:entry <#w1>, <#w2>, <#w3> .

<#w1> a ontolex:LexicalEntry ;
  ontolex:canonicalForm [
    ontolex:writtenRep "grandfather"@en 
  ] ;
  wn:partOfSpeech wn:noun ;
  ontolex:sense <#example-en-10161911-n-1> .

<#example-en-10161911-n-1>  a ontolex:LexicalSense ;
  ontolex:reference <#example-en-10161911-n> .

<#w2> a ontolex:LexicalEntry ;
  ontolex:canonicalForm [
    ontolex:writtenRep "paternal grandfather"@en 
  ] ;
  wn:partOfSpeech wn:noun ;
  ontolex:sense <#example-en-1-n-1> .

<#example-en-1-n-1> a ontolex:LexicalSense ;
  ontolex:reference <#example-en-1-n> .

[] a vartrans:SenseRelation ;
  vartrans:source <#example-en-1-n-1> ;
  vartrans:category wn:derivation ;
  vartrans:target <#example-en-10161911-n-1> ;
  dc:creator "John McCrae"@en .
          
<#w3> a ontolex:LexicalEntry ;
  ontolex:canonicalForm [
    ontolex:writtenRep "pay"@en
  ] ;
  wn:partOfSpeech wn:verb ;
  synsem:synBehavior [
    rdfs:label "Sam cannot %s Sue"@en
  ], [
    rdfs:label "Sam and Sue %s"@en
  ], [
    rdfs:label "The banks %s the check"@en
  ] .

<#example-en-10161911-n> a ontolex:LexicalConcept ;
  wn:partOfSpeech wn:noun ;
  skos:inScheme <#example-en> ;
  wn:ili ili:i90287 ;
  wn:definition [
    rdf:value "the father of your father or mother"@en
  ] .

[] 
  vartrans:source <#example-en-10161911-n> ;
  vartrans:category wn:hypernym ; 
  vartrans:target <#example-en-10162692-n> .
          
<#example-en-1-n> a ontolex:LexicalConcept ;
  wn:partOfSpeech wn:noun ;
  skos:inScheme <#example-en> ;
  wn:definition [
    rdf:value "the father of your father or mother"@en 
  ] ;
  wn:iliDefinition [
    rdf:value "the father of your father or mother"@en ;
    dc:source "https://en.wiktionary.org/wiki/farfar"
  ] .

[]
  vartrans:source <#example-en-1-n> ;
  vartrans:category wn:hypernym ;
  vartrans:target <#example-en-10162692-n> .

<#example-sv> a lime:Lexicon ;
  rdfs:label "Example wordnet (Swedish)"@sv ;
  dc:language "sv" ;
  schema:email "john@mccr.ae" ;
  cc:license <https://creativecommons.org/publicdomain/zero/1.0/> ;
  owl:versionInfo "1.0" ;
  schema:citation "CILI: the Collaborative Interlingual Index. Francis Bond, Piek Vossen, John P. McCrae and Christiane Fellbaum, Proceedings of the Global WordNet Conference 2016, (2016)." ;
  schema:url "http://globalwordnet.github.io/schemas" ;
  dc:publisher "Global Wordnet Association" ;
  lime:entry <#w4> .

<#w4> a ontolex:LexicalEntry ;
  ontolex:canonicalForm [
    ontolex:writtenRep "farfar"@sv 
  ] ;
  ontolex:otherForm [
    ontolex:writtenRep "farfäder"@sv ;
    wn:tag [
        wn:category "penn" ;
        rdf:value "NNS" 
    ]
  ] ;
  wn:partOfSpeech wn:noun ;
  ontolex:sense <#example-sv-2-n-1> .

<#example-sv-2-n-1> a ontolex:LexicalSense ;
  ontolex:reference <#example-en-1-n> ;
  wn:example [
    rdf:value "Jag vill berätta för er att min farfar var svensk beredskapssoldat vid norska gränsen under andra världskriget, ett krig som Sverige stod utanför"@sv ;
    dc:source "Europarl Corpus"
  ] .
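
When writing Turtle by hand it is easy to use a prefix without declaring it. The following sketch lists prefixes that appear in prefixed names but have no @prefix declaration. It is a regex heuristic, not a Turtle parser: the function name undeclared_prefixes and the sample are illustrative assumptions, and unusual input (for example long strings containing quotes) may confuse it.

```python
import re


def undeclared_prefixes(turtle_text):
    """Return the sorted prefixes that are used but never declared."""
    declared = set(re.findall(r"@prefix\s+([A-Za-z][\w-]*):", turtle_text))
    # Drop string literals and <...> IRIs so colons inside them are ignored.
    stripped = re.sub(r'"(?:[^"\\]|\\.)*"', '""', turtle_text)
    stripped = re.sub(r"<[^<>\s]*>", "<>", stripped)
    # A prefixed name is a prefix, a colon, then the start of a local name.
    used = set(re.findall(r"(?<![\w@])([A-Za-z][\w-]*):[A-Za-z_]", stripped))
    return sorted(used - declared)


# Hypothetical fragment in which dc: is used without a declaration.
SAMPLE = """
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix wn: <http://wordnet-rdf.princeton.edu/ontology#> .

<#w1> a ontolex:LexicalEntry ;
  ontolex:canonicalForm [ ontolex:writtenRep "grandfather"@en ] ;
  wn:partOfSpeech wn:noun ;
  dc:creator "John McCrae: editor"@en .
"""

print(undeclared_prefixes(SAMPLE))
```

For a real submission, loading the file with an RDF library is the more reliable test, since a parser will reject undeclared prefixes outright.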