The Global WordNet Association (GWA) is a free, public and non-commercial organization that provides a platform for discussing, sharing and connecting semantic lexicons (wordnets) for all languages in the world. Our goal is to make compatible, linked, open lexical resources useful for both humans and computers. As an example, here is the entry for wordnet in the Open Multilingual Wordnet (OMW 1.0) and in the development branch of OMW 2.0.
The English Wordnet is extremely widely used, and so is the multilingual data from OMW 1.0, which is distributed with the Python Natural Language Toolkit and used by Google Translate, BabelNet and many more. The documentation we are building here targets the new version (OMW 2.0), which makes the semantic network less biased towards English. We have high-quality contributions for over 40 languages (not all on the website yet).
We are eager to welcome tech writing collaborators through the Google Season of Docs program to help us with our documentation needs! GWA is a collaboration of many projects. We have selected three for this proposal, and the application has been blessed by the GWA board (five members of which are involved in the proposal).
We have been selected to participate in 2020!
If you are interested in working on this project, email us!
These tasks should all start with a documentation audit, to see what we do and don't have already.
If you would like to know more about these project ideas, please email wngsodoc@gmail.com and the mentors of each project, and we'll reply as soon as we can.
The goal is to make sure all the data types (semantic relations, parts of speech, verb patterns, ...) are properly documented, so that both dictionary users and dictionary developers can access this documentation easily, and to have a procedure for adding more documentation as needed.
We have started to make a central repository of documentation for, e.g., semantic relations: gwadoc, but it needs a lot of work. We are trying to keep UX strings together with the documentation, to make it easier to keep them in sync.
The goal is to have the documentation dynamically produced with the lexicons, so that we know we always have the same set of relations, and we can pull in examples directly from the database.
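One way to check that the documentation and the lexicons share the same set of relations is to compare the relation names that are documented with those actually used in a lexicon. The following is a minimal sketch under stated assumptions: the dictionary layout, the names `RELATION_DOCS` and `audit_relations`, and the sample data are all hypothetical and are not taken from gwadoc.

```python
# Hypothetical documentation store: relation name -> documentation fields.
# This layout is an assumption for illustration, not gwadoc's actual format.
RELATION_DOCS = {
    "hypernym": {
        "label": "Hypernym",  # UX string kept next to the documentation
        "definition": "A concept with a broader meaning.",
        "example": "animal is a hypernym of dog",
    },
    "hyponym": {
        "label": "Hyponym",
        "definition": "A concept with a narrower meaning.",
        "example": "dog is a hyponym of animal",
    },
}


def audit_relations(docs, relations_in_lexicon):
    """Compare documented relations with those used in a lexicon.

    Returns a pair (undocumented, unused), each a sorted list of names.
    """
    documented = set(docs)
    used = set(relations_in_lexicon)
    undocumented = sorted(used - documented)
    unused = sorted(documented - used)
    return undocumented, unused


# Relation names as they might be collected from a lexicon (made-up sample).
sample_relations = ["hypernym", "hyponym", "mero_part", "hypernym"]
undocumented, unused = audit_relations(RELATION_DOCS, sample_relations)
print("Undocumented:", undocumented)
print("Documented but unused:", unused)
```

A check like this could run whenever a lexicon is loaded, so that a relation without documentation is noticed early rather than during a later audit.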
We have also started adding more documentation to the development branch of our new OMW interface, but it has not been deployed yet: here is a test server.
Link to the open source project that needs documentation: GWADOC, and some more documentation in the development branch of our new OMW interface.
We have started with documentation for the core semantic relations, such as hypernym. The documentation needs to be fleshed out more, with descriptions of the meanings, linguistic tests, examples and more.
Currently our documentation is incomplete, inconsistent and, where it exists, overly technical. It would also be good to have more links to the original documentation.
Documentation on relations is spread over many sources, from the original man pages for Princeton Wordnet, to general guidelines (this is from EuroWordnet), books (this is from the Polish Wordnet) and papers.
We often add new information (like additional semantic relations), so it would be good to have a template for adding new relations.
Finally (or perhaps initially), it would be good to have a documentation audit to see what information is missing.
To help get an idea of what the task involves, please take a look at the definitions of semantic relations in GWADOC. Then, based on the documentation linked above, try to add some documentation for the Meronym/Holonym relations. When you have done that, either make a pull request or email your sample to wngsodoc@gmail.com. Please feel free to improve on both the content and the appearance.
Mentors: Ewa Rudnicka, German Rigau, Francis Bond
The OMW is relatively complicated, and it would be good to have a couple of user guides.
The Open Multilingual Wordnet provides access to open wordnets in a variety of languages, all linked to a Collaborative Interlingual Index. The goal is to make it easy to use wordnets in multiple languages. The individual wordnets have been made by many different projects and vary greatly in size and accuracy. We have defined a common interchange format (GWA-LMF: Lexical Markup Framework) and a website where people can upload new wordnets and make them accessible. The Open Multilingual Wordnet and its components are open: they can be freely used, modified, and shared by anyone for any purpose.
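To give a feel for what a common interchange format makes possible, here is a minimal sketch that reads a small XML fragment and lists the relations between synsets together with their words. The fragment is made up, and its element and attribute names are assumptions for illustration only; the GWA-LMF schema is the authority on the real structure. The helper names `lemmas_by_synset` and `synset_relations` are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# A simplified, made-up fragment in the spirit of a wordnet LMF file.
# Element and attribute names are assumptions for illustration; consult the
# GWA-LMF schema for the real structure.
SAMPLE = """
<LexicalResource>
  <Lexicon id="example-en" language="en">
    <LexicalEntry id="w1">
      <Lemma writtenForm="dog" partOfSpeech="n"/>
      <Sense id="s1" synset="example-en-1"/>
    </LexicalEntry>
    <LexicalEntry id="w2">
      <Lemma writtenForm="animal" partOfSpeech="n"/>
      <Sense id="s2" synset="example-en-2"/>
    </LexicalEntry>
    <Synset id="example-en-1">
      <SynsetRelation relType="hypernym" target="example-en-2"/>
    </Synset>
    <Synset id="example-en-2"/>
  </Lexicon>
</LexicalResource>
"""


def lemmas_by_synset(xml_text):
    """Map each synset id to the written forms of the words linked to it."""
    root = ET.fromstring(xml_text)
    result = {}
    for entry in root.iter("LexicalEntry"):
        form = entry.find("Lemma").get("writtenForm")
        for sense in entry.iter("Sense"):
            result.setdefault(sense.get("synset"), []).append(form)
    return result


def synset_relations(xml_text):
    """List (source, relation type, target) triples between synsets."""
    root = ET.fromstring(xml_text)
    return [
        (synset.get("id"), rel.get("relType"), rel.get("target"))
        for synset in root.iter("Synset")
        for rel in synset.iter("SynsetRelation")
    ]


lemmas = lemmas_by_synset(SAMPLE)
for source, rel_type, target in synset_relations(SAMPLE):
    print(lemmas[source], rel_type, lemmas[target])
```

Because every wordnet would use the same structure, a reader like this would not need to change from one language to the next; only the input file differs.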
The wordnets are all developed independently, although all are based on the original Princeton Wordnet of English. They are also typically made in projects with finite funding windows and little money for maintenance. Because of this, documentation is spread all over the world: as technical reports, academic papers, theses and more.
Link to the open source project that needs documentation: Open Multilingual Wordnet Version 2.0
These are some of the areas that we think need improvement.
Existing Documentation
There is no existing documentation (we hope it is sort of intuitive, but suspect it may not be to outsiders). Here are some examples.
Some documentation on CILI in the developer branch on GitHub.
To help get an idea of what the task involves, please try to write some documentation for the concept page (e.g. the page for the concept software documentation). Maybe something similar to the explanation for the Japanese WordNet (sorry it is in Japanese)? You could also suggest ways that the search result page could be improved. When you have done that, either make a pull request or email your sample to wngsodoc@gmail.com. Please feel free to improve on both the content and the appearance.
A guide for someone searching the wordnet (or family of wordnets)
Synset, Sense, Summary, ...
A guide for interacting with the Collaborative InterLingual Index (CILI)
Sample
Skills
Nice to have:
Mentors: Francis Bond, Alexandre Rademaker
The goal is to let contributors know how to add new information (words, senses and synsets). It would be good to have (i) a general guide and (ii) a guide specific to the English wordnet.
We have a very rough guide at NTU: http://compling.hss.ntu.edu.sg/ntumc/tagdoc.html
The Polish wordnet project and EuroWordnet also have extensive documentation: general guidelines from EuroWordnet, The Polish wordnet book.
Illustrated step-by-step guides for:
There will be a need for language-specific extensions, but we want to start by targeting English, specifically the English Wordnet.
To help get an idea of what the task involves, could you try to make a rough guide for someone trying to
You could maybe reference the Wiktionary documentation: Criteria for inclusion and Entry layout. Their structure is different from ours, but it gives a good idea of the kinds of information needed. When you have done that, please email your sample to wngsodoc@gmail.com. Please feel free to improve on both the content and the appearance. This is probably the hardest task, and doing it properly will require discussion with the wordnet developers. For the sample, maybe just sketch out general guidelines.
The proposal format is very much inspired by Kolibri's from 2019. Thanks to the GWA board for discussion, and to the GSoDoc organisers for the chats.