Global Wordnet Formats

The Global WordNet Association provides three formats in which WordNets can be published and submitted to the ILI: XML, JSON and RDF, each described in its own section below.

All of these formats are considered equivalent.

A converter and validator is available at http://server1.nlp.insight-centre.org/gwn-converter/

XML

The XML is specified by the following DTD. An example is given here:

The first three lines must always be as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE LexicalResource SYSTEM "http://globalwordnet.github.io/schemas/WN-LMF-1.3.dtd">
<LexicalResource xmlns:dc="http://purl.org/dc/elements/1.1/">
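The fixed header can be checked before a file is parsed any further. The following is only a minimal sketch and not part of any official tooling: the function name `has_required_header` is hypothetical, and it assumes that blank lines before or between the three header lines can be ignored.

```python
# The three lines that every WN-LMF XML file is expected to start with.
REQUIRED_HEADER = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<!DOCTYPE LexicalResource SYSTEM "http://globalwordnet.github.io/schemas/WN-LMF-1.3.dtd">',
    '<LexicalResource xmlns:dc="http://purl.org/dc/elements/1.1/">',
]


def has_required_header(text: str) -> bool:
    """Return True if the first three non-blank lines match the fixed header.

    Assumption: leading/trailing whitespace and blank lines are ignored.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[:3] == REQUIRED_HEADER
```

A caller would typically read the file as UTF-8 text and pass the whole string to `has_required_header` before handing it to an XML parser.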

A file may contain multiple WordNets in different languages:

The following information is required: id, label, language, email, license and version.

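Extra properties may be included from Dublin Core (such as dc:publisher) and, in addition, citation, url, logo, status and confidenceScore, as listed for the JSON format below.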

Each word (lexical entry) must have a unique id:

        <LexicalEntry id="w1">

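The part of speech values are as follows: n (noun), v (verb), a (adjective), r (adverb), s (adjective satellite), t (phrase), c (conjunction), p (adposition), x (other) and u (unknown).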

The set of relations between senses is limited to the following

Syntactic behaviour is given as in Princeton WordNet

            <SyntacticBehaviour subcategorizationFrame="Somebody ----s" id="intransitive"/>
            <SyntacticBehaviour subcategorizationFrame="Somebody ----s somebody" id="transitive"/>
        </LexicalEntry>

Syntactic behaviour can also be given as part of the lexicon and referred to with the subcat property.

If a synset is already mapped to the ILI, please give the ID here. All synset IDs must start with the ID of the lexicon followed by a dash, e.g., example-en + - + local_synset_id.

        <Synset id="example-en-10161911-n" ili="i90287" partOfSpeech="n"
            members="example-en-10161911-n-1 example-en-1-n-1">
            <Definition>
                the father of your father or mother
            </Definition>

The members property gives the list of senses in order.
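The ID convention and the members attribute described above can be sketched in a few lines of Python. This is an illustration only: the function names `synset_id_has_lexicon_prefix` and `parse_members` are hypothetical, and the sketch assumes that the members attribute is a whitespace-separated list, as in the example.

```python
from typing import List


def synset_id_has_lexicon_prefix(lexicon_id: str, synset_id: str) -> bool:
    """True if synset_id is lexicon_id + '-' + a non-empty local synset ID."""
    prefix = lexicon_id + "-"
    return synset_id.startswith(prefix) and len(synset_id) > len(prefix)


def parse_members(members: str) -> List[str]:
    """Split the members attribute into sense IDs, preserving their order.

    Assumption: IDs are separated by whitespace (spaces or line breaks).
    """
    return members.split()
```

Because `str.split()` with no argument splits on any run of whitespace, a members attribute that is wrapped across lines in the XML yields the same ordered list.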

The set of relations between synsets is limited to the following:

Princeton WordNet Properties

Non-Princeton WordNet Relations

If you wish to define a new concept, set the ili attribute to “in” (ILI New). If there is no mapping to the ILI, leave this field empty (the attribute itself is required).

        <Synset id="example-en-1-n" ili="in" partOfSpeech="n">
            <Definition>A father's father; a paternal grandfather</Definition>

You can include metadata (such as source) at many points. The ILI Definition must be at least 20 characters or five words.

            <ILIDefinition dc:source="https://en.wiktionary.org/wiki/farfar">
                A father's father; a paternal grandfather
            </ILIDefinition>
        </Synset>
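The length rule for the ILI definition can be checked mechanically. The sketch below is an assumption-laden illustration: the function name `ili_definition_long_enough` is hypothetical, and it reads "20 characters or five words" as "either condition is sufficient", counting characters after collapsing the whitespace introduced by XML indentation.

```python
def ili_definition_long_enough(text: str) -> bool:
    """Check the 'at least 20 characters or five words' rule.

    Assumptions: either condition suffices, and runs of whitespace
    (such as XML indentation and line breaks) count as a single space.
    """
    normalized = " ".join(text.split())
    return len(normalized) >= 20 or len(normalized.split()) >= 5
```

A stricter reading of the rule is possible, so a real validator may behave differently at the boundary.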

You must include all targets of relations

        <Synset id="example-en-10162692-n" ili="i90292" partOfSpeech="n"/>
    </Lexicon>
    <Lexicon id="example-sv"
             label="Example wordnet (Swedish)"
             language="sv" 
             email="john@mccr.ae"
             license="https://creativecommons.org/publicdomain/zero/1.0/"
             version="1.0"
             citation="CILI: the Collaborative Interlingual Index. Francis Bond, Piek Vossen, John P. McCrae and Christiane Fellbaum, Proceedings of the Global WordNet Conference 2016, (2016)."
             url="http://globalwordnet.github.io/schemas/"
             dc:publisher="Global Wordnet Association">

The list of lexical entries (words) in your wordnet

        <LexicalEntry id="w4">
            <Lemma writtenForm="farfar" partOfSpeech="n"/>

Synsets need not be language-specific but senses must be

            <Sense id="example-sv-2-n-1" synset="example-en-1-n">
                <SenseExample dc:source="Europarl Corpus">
                    Jag vill berätta för er att min farfar var svensk beredskapssoldat vid norska gränsen under andra världskriget, ett krig som Sverige stod utanför
                </SenseExample>
            </Sense>
        </LexicalEntry>
    </Lexicon>
</LexicalResource>
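The requirement that all targets of relations are included in the file lends itself to an automatic check. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name `missing_relation_targets` is hypothetical, and the SenseRelation element name is assumed by analogy with SynsetRelation. IDs are collected across every lexicon in the file, since senses in one lexicon may refer to synsets in another.

```python
import xml.etree.ElementTree as ET
from typing import List


def missing_relation_targets(xml_text: str) -> List[str]:
    """Return relation targets that are not defined as a Synset or Sense id."""
    root = ET.fromstring(xml_text)

    # Collect every synset and sense ID defined anywhere in the document.
    defined = set()
    for tag in ("Synset", "Sense"):
        for element in root.iter(tag):
            defined.add(element.get("id"))

    # Report each undefined target once, in document order per relation type.
    missing: List[str] = []
    for tag in ("SynsetRelation", "SenseRelation"):
        for relation in root.iter(tag):
            target = relation.get("target")
            if target not in defined and target not in missing:
                missing.append(target)
    return missing
```

An empty result means every relation target resolves to an element in the same file; it does not replace validation against the DTD.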

Pronunciation

Since 2021, the schema has the ability to represent the pronunciation of lemmas.

This is in the <Pronunciation> element, which gives the IPA text. It has the following attributes:

* variety: uses IETF language tags to indicate dialect, for example encoding British English in IPA as en-GB-fonipa
* notation: can encode further information such as indicating a particular dialect (this was notes in the paper)
* phonemic: indicates whether the transcription is phonemic (‘true’) or phonetic (‘false’), defaulting to ‘false’
* audio: gives the URL of an audio file of the pronunciation

An example of encoding is given below:

    <LexicalEntry id="ex-rabbit-n">
        <Lemma writtenForm="rabbit" partOfSpeech="n">
            <Pronunciation variety="en-GB-fonxsamp en-US-fonxsamp"
                audio="https://path/rabbit.flac">'r\{bIt</Pronunciation>
            <Pronunciation variety="en-AU-fonxsamp" notation="weak vowel merger"
                audio="https://path/rabbit1.flac">'r\{b@t</Pronunciation>
        </Lemma>
    </LexicalEntry>
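Reading pronunciations back out of an entry is straightforward with a standard XML parser. The sketch below is illustrative only: the function name `read_pronunciations` is hypothetical, it assumes the <Pronunciation> elements are nested inside a well-formed <Lemma> element, and it treats the variety attribute as a whitespace-separated list because the example lists two tags in one attribute. The phonemic attribute is returned as written (or None when absent) so that the caller applies the schema default.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def read_pronunciations(entry_xml: str) -> List[Dict[str, object]]:
    """Extract the pronunciations of a single LexicalEntry given as XML text."""
    entry = ET.fromstring(entry_xml)
    result: List[Dict[str, object]] = []
    for pron in entry.iter("Pronunciation"):
        result.append({
            "text": (pron.text or "").strip(),
            # Assumption: several language tags may share one attribute.
            "varieties": (pron.get("variety") or "").split(),
            "notation": pron.get("notation"),
            # Raw attribute value; None means "use the schema default".
            "phonemic": pron.get("phonemic"),
            "audio": pron.get("audio"),
        })
    return result
```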

Wordnet Extensions

A file may contain a lexicon extension, which serves to augment an existing lexicon with new lexical entries, synsets, senses, relations, etc. Extensions are defined much like regular lexicons, but the <Extends> element specifies the ID and version of the base lexicon:

    <LexiconExtension id="ewn-cs-example"
                      label="English WordNet Computer Science Terms (example)"
                      language="en"
                      email="goodmami@uw.edu"
                      license="https://creativecommons.org/publicdomain/zero/1.0/"
                      version="1.0">
        <Extends id="ewn" version="2020" />

The contents of the lexicon extension are the same as a regular lexicon with the addition of elements for external lexical entries, synsets, and senses. There are two uses of external elements. First, they allow one to add additional information to the corresponding element in the base lexicon, such as adding a new sense to an existing lexical entry:

        <ExternalLexicalEntry id="ewn-process-n">
            <Sense id="ewn-process-n-20000123" synset="ewn-20000123-n" />
        </ExternalLexicalEntry>

In the above example, the ewn-process-n ID is not used to create a new lexical entry, but rather it must already exist in the base lexicon. The external lexical entry (as well as other external senses or synsets) may only add information; therefore it may not specify metadata or elements required on lexical entries, such as for the lemma.

Second, they introduce an ID which may be referenced by new structures, such as the target of a synset relation:

        <ExternalSynset id="ewn-06581154-n" />
        <Synset id="ewn-20000123-n" ili="" partOfSpeech="n">
            <Definition>a running instance of a computer program</Definition>
            <SynsetRelation relType="hypernym" target="ewn-06581154-n" />
        </Synset>

Due to the way external IDs are used, a lexicon extension may not exist in the same file as the base lexicon.
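Since external elements must refer to IDs that already exist in the base lexicon, an application loading an extension may want to verify this. The following is a sketch under stated assumptions rather than a reference implementation: the function name `unresolved_external_ids` is hypothetical, the base lexicon's IDs are assumed to have been collected beforehand into a set, and the ExternalSense element name is assumed by analogy with ExternalLexicalEntry and ExternalSynset.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

# ExternalSense is assumed by analogy with the two element names shown above.
EXTERNAL_TAGS = ("ExternalLexicalEntry", "ExternalSense", "ExternalSynset")


def unresolved_external_ids(extension_xml: str, base_ids: Set[str]) -> List[str]:
    """Return IDs of external elements that do not exist in the base lexicon."""
    extension = ET.fromstring(extension_xml)
    unresolved: List[str] = []
    for tag in EXTERNAL_TAGS:
        for element in extension.iter(tag):
            identifier = element.get("id")
            if identifier not in base_ids:
                unresolved.append(identifier)
    return unresolved
```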

Wordnet Dependencies

Some wordnets depend upon others, such as those in the Open Multilingual Wordnet which depend upon the Princeton WordNet for synset structure. With the <Requires> element, it is possible to explicitly codify those dependencies:

    <Lexicon id="spawn"
             label="Multilingual Central Repository"
             language="es"
             email="bond@ieee.org"
             license="https://creativecommons.org/licenses/by/3.0/"
             version="1.3+omw"
             citation="Aitor Gonzalez-Agirre, Egoitz Laparra and German Rigau. 2012. `Multilingual Central Repository version 3.0: upgrading a very large lexical knowledge base &lt;http://adimen.si.ehu.es/web/sites/all/modules/pubdlcnt/pubdlcnt.php?file=http://adimen.si.ehu.es/~rigau/publications/gwc12-glr.pdf&amp;nid=18&gt;`_. In *Proceedings of the 6th Global WordNet Conference (GWC 2012)*. Matsue, Japan."
             url="http://adimen.si.ehu.es/web/MCR/"
             dc:publisher="Global Wordnet Association"
             dc:format="OMW-LMF"
             dc:description="Wordnet made from OMW 1.0 data"
             confidenceScore="1.0">
        <Requires id="pwn" version="3.0" />

This element signifies to an application processing the wordnet that the required wordnet should be loaded as well. The <Requires> element may also be used on a <LexiconExtension> for cases where the lexicon extends one wordnet but requires another.
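An application can read the declared dependencies before deciding which other wordnets to load. The sketch below is one possible approach, not a prescribed one: the function name `required_lexicons` is hypothetical, and it simply lists the (id, version) pairs of the <Requires> elements found directly under each <Lexicon> or <LexiconExtension>.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def required_lexicons(xml_text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each lexicon (or extension) ID to the (id, version) pairs it requires."""
    root = ET.fromstring(xml_text)
    result: Dict[str, List[Tuple[str, str]]] = {}
    for tag in ("Lexicon", "LexiconExtension"):
        for lexicon in root.iter(tag):
            result[lexicon.get("id")] = [
                (req.get("id"), req.get("version"))
                for req in lexicon.findall("Requires")
            ]
    return result
```

How the required wordnets are then located and loaded is left to the application.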

JSON

The JSON format follows that of the XML and is based on JSON-LD. An example of the JSON is as follows:

The top level of a JSON graph consists of an object with two properties: @context, which must be the fixed string referring to the JSON-LD context, and @graph, which holds the list of lexicons. This structure is required for submission to the Collaborative Interlingual Index, but web services may of course return shorter fragments of the structure.

{ 
  "@context": "http://globalwordnet.github.io/schemas/wn-json-context-1.3.json",
  "@graph": [{

The following are required properties of every WordNet (note the language must be given twice). @id gives the identifier of this wordnet (should be unique in this document) and @type must be lime:Lexicon.

      "@context": { "@language": "en" },
      "@id": "example-en",
      "@type": "lime:Lexicon", 
      "label": "Example wordnet (English)",
      "language": "en",
      "email": "john@mccr.ae",
      "rights": "https://creativecommons.org/publicdomain/zero/1.0/",
      "version": "1.0",

In addition, the properties citation, url, logo, status, confidenceScore and any property from Dublin Core Elements 1.1 may be used.

      "citation": "CILI: the Collaborative Interlingual Index. Francis Bond, Piek Vossen, John P. McCrae and Christiane Fellbaum, Proceedings of the Global WordNet Conference 2016, (2016).",
      "url": "http://globalwordnet.github.io/schemas",
      "publisher": "Global Wordnet Association",

The entries are given as a list under the entry property. Each entry requires an @id, a partOfSpeech and a lemma, and may have sense, synBehavior, status, confidence and Dublin Core properties. The lemma has only a single value, writtenForm, and the partOfSpeech must be one of the following: [ noun, verb, adjective, adverb, adjective_satellite, phrase, conjunction, adposition, other, unknown ]. The @id must be unique in the document: it may not be the same as the @id of the wordnet or of any other entry.

      "entry": [{
          "@id" : "w1",
          "lemma": { "writtenForm": "father" }, 
          "partOfSpeech": "noun",

The Sense requires only an @id and a synsetRef and may take status, confidenceScore, Dublin Core properties and an example.

          "sense": [{
              "@id": "example-en-10161911-n-1",
              "synsetRef": "example-en-10161911-n"
          }]
      }, {
          "@id" : "w2",
          "lemma": { "writtenForm": "paternal grandfather" }, 
          "partOfSpeech": "noun",
          "sense": [{
              "@id": "example-en-1-n-1",
              "synsetRef": "example-en-1-n",

A sense may also have any number of relations which have a relType from the list below and a target and may have Dublin Core properties

The syntactic behavior is given here as follows:

          "synBehavior": [
             {"label": "Somebody ----s", "@id": "intransitive"}, 
             {"label": "Somebody ----s somebody", "@id": "transitive"}
           ]
      }],

Synsets are listed under the synset property. A synset requires only an @id. It may take an ili which is a code from the CILI (starting with ili:i), a definition, an iliDefinition (which must be given in English), status, confidenceScore, relations and Dublin Core properties.

In contrast to the XML form, the ili is optional. If there is no match, omit this property; if you wish to propose a new synset, add only an iliDefinition.

      "synset": [{
          "@id": "example-en-10161911-n",
          "partOfSpeech": "noun",
          "ili": "ili:i90287",

Definitions must have a gloss and may have a language; in addition, status, confidenceScore and Dublin Core properties may be added. An iliDefinition is the same but may not have a language.

          "definition": [{
            "gloss": "that which is perceived or known or inferred to have its own distinct existence (living or nonliving)"
           }],

Synset relations are given as for sense relations except the target must be the @id of another synset not a sense. The following properties can be used:

Indicate the members and the order they occur in:

          "members": ["example-en-10161911-n-1", "example-en-1-n-1"]
      }, {
          "@id": "example-en-1-n",
          "partOfSpeech": "noun",
          "definition": [{
              "gloss": "the father of your father or mother"
          }],
          "iliDefinition": {
              "gloss": "the father of your father or mother",
              "source": "https://en.wiktionary.org/wiki/farfar"
          },
          "relations": [
            { "relType": "hypernym", "target": "example-en-10162692-n" }
          ]
      }]
    }, {
      "@context": { "@language": "sv" },
      "@id": "example-sv",
      "@type": "lime:Lexicon", 
      "label": "Example wordnet (Swedish)",
      "language": "sv",
      "email": "john@mccr.ae",
      "license": "https://creativecommons.org/publicdomain/zero/1.0/",
      "version": "1.0",
      "citation": "CILI: the Collaborative Interlingual Index. Francis Bond, Piek Vossen, John P. McCrae and Christiane Fellbaum, Proceedings of the Global WordNet Conference 2016, (2016).",
      "url": "http://globalwordnet.github.io/schemas",
      "publisher": "Global Wordnet Association",
      "entry": [{
          "@id" : "w4",
          "lemma": { "writtenForm": "farfar" }, 
          "form": [{ "writtenForm": "farfäder", "tag": [{ "category": "penn", "value": "NNS" }] }],
          "partOfSpeech": "noun",

Any examples should be given on the sense as follows:

          "sense": [{
              "@id": "example-sv-2-n-1",
              "synsetRef": "example-en-1-n",
              "example": [{
                  "value": "Jag vill berätta för er att min farfar var svensk beredskapssoldat vid norska gränsen under andra världskriget, ett krig som Sverige stod utanför",
                  "source": "Europarl Corpus"
              }]
          }]
      }]
  }]
}
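Several of the constraints stated for the JSON format (the fixed @context, the lime:Lexicon type, unique @id values, and the closed list of partOfSpeech values) can be checked with the standard json module. This is a minimal sketch, not a substitute for schema validation; the function name `check_wordnet_json` and the wording of the messages are hypothetical.

```python
import json
from typing import List

CONTEXT = "http://globalwordnet.github.io/schemas/wn-json-context-1.3.json"
PARTS_OF_SPEECH = {
    "noun", "verb", "adjective", "adverb", "adjective_satellite",
    "phrase", "conjunction", "adposition", "other", "unknown",
}


def check_wordnet_json(text: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    doc = json.loads(text)
    if doc.get("@context") != CONTEXT:
        problems.append("unexpected @context")
    if not isinstance(doc.get("@graph"), list):
        problems.append("@graph must be a list of lexicons")
        return problems

    seen_ids = set()

    def register(identifier, kind):
        # Lexicon and entry IDs share one namespace within the document.
        if identifier is None:
            problems.append("%s without an @id" % kind)
        elif identifier in seen_ids:
            problems.append("duplicate @id %r" % identifier)
        else:
            seen_ids.add(identifier)

    for lexicon in doc["@graph"]:
        register(lexicon.get("@id"), "lexicon")
        if lexicon.get("@type") != "lime:Lexicon":
            problems.append("lexicon %r is not a lime:Lexicon" % lexicon.get("@id"))
        for entry in lexicon.get("entry", []):
            entry_id = entry.get("@id")
            register(entry_id, "entry")
            if "writtenForm" not in entry.get("lemma", {}):
                problems.append("entry %r has no lemma writtenForm" % entry_id)
            if entry.get("partOfSpeech") not in PARTS_OF_SPEECH:
                problems.append("entry %r has an invalid partOfSpeech" % entry_id)
    return problems
```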

The JSON format can be validated by the JSON Schema provided at https://github.com/globalwordnet/schemas/blob/master/wn-json-schema.json

RDF

We acknowledge the existence of two vocabularies for wordnet encoding. The wn-simple.ttl is based on the W3C RDF/OWL Representation of WordNet. This vocabulary is a straightforward encoding in RDF of the original Princeton data model where synsets, word senses, and words are the main classes. In the current version, new relations are added and additional axioms are provided to reinforce consistency.

The second RDF schema is significantly more flexible and builds principally on the W3C OntoLex Model. The details of the RDF serialization are principally built on those of the JSON-LD model. We include a separate tutorial here for the benefit of those who wish to create their resource natively in RDF.

The standard namespaces are:

@prefix cc: <http://creativecommons.org/ns#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix ili: <http://ili.globalwordnet.org/ili/> .
@prefix lime: <http://www.w3.org/ns/lemon/lime#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix synsem: <http://www.w3.org/ns/lemon/synsem#> .
@prefix vartrans: <http://www.w3.org/ns/lemon/vartrans#> .
@prefix wn: <https://globalwordnet.github.io/schemas/wn#> .
@prefix wordnetlicense: <http://wordnet.princeton.edu/wordnet/license/> .

Each wordnet is an instance of the class lime:Lexicon and must have the following properties

The mapping to the Lemon-OntoLex model is as follows:

A more extended example is given here:

<#example-en> a lime:Lexicon ;
  rdfs:label "Example wordnet (English)"@en ;
  dc:language "en" ;
  schema:email "john@mccr.ae" ;
  cc:license <https://creativecommons.org/publicdomain/zero/1.0/> ;
  owl:versionInfo "1.0" ;
  schema:citation "CILI: the Collaborative Interlingual Index. Francis Bond, Piek Vossen, John P. McCrae and Christiane Fellbaum, Proceedings of the Global WordNet Conference 2016, (2016)." ;
  schema:url "http://globalwordnet.github.io/schemas/" ;
  dc:publisher "Global Wordnet Association" ;
  lime:entry <#w1>, <#w2>, <#w3> .

<#w1> a ontolex:LexicalEntry ;
  ontolex:canonicalForm [
    ontolex:writtenRep "grandfather"@en 
  ] ;
  wn:partOfSpeech wn:noun ;
  ontolex:sense <#example-en-10161911-n-1> .

<#example-en-10161911-n-1>  a ontolex:LexicalSense ;
  ontolex:reference <#example-en-10161911-n> .

<#w2> a ontolex:LexicalEntry ;
  ontolex:canonicalForm [
    ontolex:writtenRep "paternal grandfather"@en 
  ] ;
  wn:partOfSpeech wn:noun ;
  ontolex:sense <#example-en-1-n-1> .

<#example-en-1-n-1> a ontolex:LexicalSense ;
  ontolex:reference <#example-en-1-n> .

[] a vartrans:SenseRelation ;
  vartrans:source <#example-en-1-n-1> ;
  vartrans:category wn:derivation ;
  vartrans:target <#example-en-10161911-n-1> ;
  dc:creator "John McCrae"@en .
          
<#w3> a ontolex:LexicalEntry ;
  ontolex:canonicalForm [
    ontolex:writtenRep "pay"@en
  ] ;
  wn:partOfSpeech wn:verb ;
  synsem:synBehavior <#transitive>, <#intransitive> .

<#intransitive> rdfs:label "Somebody ----s"@en .
<#transitive> rdfs:label "Somebody ----s somebody"@en .

<#example-en-10161911-n> a ontolex:LexicalConcept ;
  wn:partOfSpeech wn:noun ;
  skos:inScheme <#example-en> ;
  wn:ili ili:i90287 ;
  wn:definition [
    rdf:value "the father of your father or mother"@en
  ] ;
  wn:memberList ( <#example-en-10161911-n-1> <#example-en-1-n-1> ) .

[] 
  vartrans:source <#example-en-10161911-n> ;
  vartrans:category wn:hypernym ; 
  vartrans:target <#example-en-10162692-n> .
          
<#example-en-1-n> a ontolex:LexicalConcept ;
  wn:partOfSpeech wn:noun ;
  skos:inScheme <#example-en> ;
  wn:definition [
    rdf:value "the father of your father or mother"@en 
  ] ;
  wn:iliDefinition [
    rdf:value "the father of your father or mother"@en ;
    dc:source "https://en.wiktionary.org/wiki/farfar"
  ] .

[]
  vartrans:source <#example-en-1-n> ;
  vartrans:category wn:hypernym ;
  vartrans:target <#example-en-10162692-n> .

<#example-sv> a lime:Lexicon ;
  rdfs:label "Example wordnet (Swedish)"@sv ;
  dc:language "sv" ;
  schema:email "john@mccr.ae" ;
  cc:license <https://creativecommons.org/publicdomain/zero/1.0/> ;
  owl:versionInfo "1.0" ;
  schema:citation "CILI: the Collaborative Interlingual Index. Francis Bond, Piek Vossen, John P. McCrae and Christiane Fellbaum, Proceedings of the Global WordNet Conference 2016, (2016)." ;
  schema:url "http://globalwordnet.github.io/schemas" ;
  dc:publisher "Global Wordnet Association" ;
  lime:entry <#w4> .

<#w4> a ontolex:LexicalEntry ;
  ontolex:canonicalForm [
    ontolex:writtenRep "farfar"@sv 
  ] ;
  ontolex:otherForm [
    ontolex:writtenRep "farfäder"@sv ;
    wn:tag [
        wn:category "penn" ;
        rdf:value "NNS" 
    ]
  ] ;
  wn:partOfSpeech wn:noun ;
  ontolex:sense <#example-sv-2-n-1> .

<#example-sv-2-n-1> a ontolex:LexicalSense ;
  ontolex:reference <#example-en-1-n> ;
  wn:example [
    rdf:value "Jag vill berätta för er att min farfar var svensk beredskapssoldat vid norska gränsen under andra världskriget, ett krig som Sverige stod utanför"@sv ;
    dc:source "Europarl Corpus"
  ] .
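Because the RDF serialization mirrors the JSON-LD model, a lexical entry in the JSON form can be turned into the Turtle shape used in the example with plain string formatting. The sketch below is only an illustration of that correspondence: the function names `turtle_literal` and `entry_to_turtle` are hypothetical, only the @id, lemma, partOfSpeech and sense properties are handled, the JSON part-of-speech name is assumed to map directly to a term in the wn: namespace (as wn:noun and wn:verb do in the example), and only backslashes and double quotes are escaped in literals.

```python
from typing import Dict


def turtle_literal(value: str, lang: str) -> str:
    """Format a language-tagged Turtle string literal (simple escaping only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"@%s' % (escaped, lang)


def entry_to_turtle(entry: Dict, lang: str) -> str:
    """Render one JSON entry as a Turtle ontolex:LexicalEntry description."""
    lines = [
        "<#%s> a ontolex:LexicalEntry ;" % entry["@id"],
        "  ontolex:canonicalForm [",
        "    ontolex:writtenRep %s" % turtle_literal(entry["lemma"]["writtenForm"], lang),
        "  ] ;",
        "  wn:partOfSpeech wn:%s" % entry["partOfSpeech"],
    ]
    senses = ["<#%s>" % sense["@id"] for sense in entry.get("sense", [])]
    if senses:
        lines[-1] += " ;"
        lines.append("  ontolex:sense %s" % ", ".join(senses))
    lines[-1] += " ."
    return "\n".join(lines)
```

For real data a dedicated RDF library would be a safer choice, since it handles escaping, prefixes and blank nodes in general.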