Abstract Wikipedia/Updates/2022-06-21

From Meta, a Wikimedia project coordination wiki

Manually-written articles

Communities will create (at least) two different types of articles using Abstract Wikipedia: on the one hand, we will have highly standardised articles based entirely on Wikidata, called model articles; on the other hand, we will have bespoke, hand-crafted content, assembled sentence by sentence. Today we discuss the second type; we covered the first type, model articles, in a previous newsletter. Both types, incidentally, can be implemented using the "templatic renderers" concept that is part of Ariel Gutman’s proposal. We will also dedicate a future newsletter to a comparison of the two types.

For manually-assembled articles, we have to make many more assumptions about what will eventually be available in Wikifunctions than we do for model-based articles. The following description is not meant to prescribe to the community how things should work; it merely sketches one possibility. It is based on a "Wizard of Oz experiment" we did during our recent Abstract Wikipedia team offsite.

We took the first sentence from a semi-randomly chosen article, with the aim of handcrafting a representation of that sentence in Abstract Wikipedia. It is often harder to see how to translate articles about ideas than articles about more concrete things like people, places, and objects. The sentence came from the English Wikipedia article Profit (economics), which we picked as a common example of a concept:

An economic profit is the difference between the revenue a commercial entity has received from its outputs and the opportunity costs of its inputs.

Note that we do not expect that English Wikipedia will be the source for all articles for Abstract Wikipedia, but it is certainly a convenient source of inspiration for the team, given that all of us speak English. As a baseline, we each manually translated that text into the languages we speak.

One of the most powerful tools in our arsenal, if not the most powerful, for turning this sentence into abstract content is that we can rewrite and simplify it. In Abstract Wikipedia the goal is not to translate the wording of existing Wikipedia articles as faithfully as possible, but to capture as much of their meaning as possible. So we took the liberty of rewriting the sentence as follows:

In economics, the profit of a commercial entity is defined as the difference between its outputs’ revenue and its inputs’ opportunity cost.

Due to time constraints, we further reduced the sentence to simply:

In economics, profit is defined as the difference between revenue and cost.

From this, we then assembled the following abstract content.

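Context
  • context: economics
  • content: Definition
      • subject: economic profit
      • definition: Difference
          • first: income
          • second: operating cost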

Here, the bold text is the label of a constructor, the italic text is the label of a key of the given constructor, and the link points to a Wikidata item. This follows the notation used in previous examples. Just as with previous examples, we assume the availability of the constructors used. To be explicit, in this case we assume the constructors listed below with their respective keys. How the keys or constructors would be named, and indeed which constructors and keys would even exist, might well turn out to be very different.

Context returns a full clause representing a subordinate clause being put in a context

  • context takes a noun phrase, describing the context in which the content is placed
  • content takes a clause that is being put in the context

Definition returns a full clause that defines a subject by means of a definition

  • subject takes a noun phrase that is being defined
  • definition takes a noun phrase that represents the definition

Difference returns a noun phrase that means the quantitative difference between two given noun phrases

  • first takes a noun phrase that represents the first part
  • second takes a noun phrase that represents the second part

Where we have mentioned "noun phrase" above, we actually mean "concept that can be realized as a noun phrase by a renderer". Also, we have glossed over the considerable challenge of having a mechanism through which a renderer could just take in a Wikidata item and turn it into a noun phrase. That is a challenge that Mahir has tackled admirably with Ninai and Udiron.

Another challenge was to find the right Wikidata items for each of the involved noun phrases. For example, for the second key of the Difference constructor, we chose operating cost. Other candidates could have been cost or opportunity cost. Again, this is not necessarily the best choice, but just the one we came up with, given our time constraints and the way we approached the task.
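As a rough illustration of how the three assumed constructors nest, the abstract content could be modelled with a few Python dataclasses. This is only a sketch: the class names mirror the constructor labels above, `Item` is a hypothetical stand-in for a Wikidata item identified by its English label, and none of this reflects an actual Wikifunctions type. Only "operating cost" is named explicitly in the text; the other three labels are inferred from the English rendering given in the results.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a Wikidata item, identified by English label only."""
    label: str


@dataclass(frozen=True)
class Difference:
    """Noun phrase: the quantitative difference between two noun phrases."""
    first: "NounPhrase"
    second: "NounPhrase"


# "Noun phrase" here means anything a renderer could realize as a noun phrase.
NounPhrase = Union[Item, Difference]


@dataclass(frozen=True)
class Definition:
    """Full clause: defines `subject` by means of `definition`."""
    subject: NounPhrase
    definition: NounPhrase


@dataclass(frozen=True)
class Context:
    """Full clause: places the `content` clause in a `context`."""
    context: NounPhrase
    content: Definition


# The simplified sentence as abstract content (labels partly inferred, see above).
abstract_content = Context(
    context=Item("economics"),
    content=Definition(
        subject=Item("economic profit"),
        definition=Difference(
            first=Item("income"),
            second=Item("operating cost"),
        ),
    ),
)
```

Nothing in this structure is language-specific: swapping `Item("operating cost")` for a different item changes the meaning once, for every language.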

The final step of the exercise was to take that abstract content and render it (by hand) as natural language text in the languages that we speak, as mechanically as possible, using the labels of the selected Wikidata items (ideally we would have used the lexemes connected to the items, but that data was too sparse). This step is why we called the whole exercise a “Wizard of Oz” exercise: we simulated what renderers in Wikifunctions would do.
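The mechanical rendering step can be mimicked with a toy template renderer. The sketch below is an assumption-laden illustration, not how Wikifunctions renderers work: the nested-dict encoding, the `TEMPLATES` and `LABELS` tables, and the `render` function are all hypothetical, and the labels are typed in by hand rather than fetched from Wikidata. The Spanish label for "operating cost" is the manual translation reported in the results.

```python
# Abstract content as nested dicts; "constructor" names the constructor and
# plain strings are hypothetical stand-ins for Wikidata items.
ABSTRACT_CONTENT = {
    "constructor": "Context",
    "context": "economics",
    "content": {
        "constructor": "Definition",
        "subject": "economic profit",
        "definition": {
            "constructor": "Difference",
            "first": "income",
            "second": "operating cost",
        },
    },
}

# One fixed string template per constructor and language (illustrative only).
TEMPLATES = {
    "en": {
        "Context": "In {context}, {content}.",
        "Definition": "{subject} is defined as {definition}",
        "Difference": "the difference between {first} and {second}",
    },
    "es": {
        "Context": "En {context}, {content}.",
        "Definition": "{subject} se define como {definition}",
        "Difference": "la diferencia entre {first} y {second}",
    },
}

# Hand-entered labels standing in for a Wikidata label lookup.
LABELS = {
    "en": {
        "economics": "economics",
        "economic profit": "economic profit",
        "income": "income",
        "operating cost": "operating cost",
    },
    "es": {
        "economics": "economía",
        "economic profit": "ganancia económica",
        "income": "ingresos",
        "operating cost": "costes",  # manual translation, not a Wikidata label
    },
}


def render(node, lang):
    """Recursively fill each constructor's template with its rendered keys."""
    if isinstance(node, str):
        return LABELS[lang][node]
    template = TEMPLATES[lang][node["constructor"]]
    slots = {key: render(value, lang)
             for key, value in node.items() if key != "constructor"}
    return template.format(**slots)


print(render(ABSTRACT_CONTENT, "en"))
print(render(ABSTRACT_CONTENT, "es"))
```

Fixed string templates suffice here only because the English and Spanish sentences share the same clause order and need no agreement or articles. A rendering with different word order, such as the German one, would need more than slot filling.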

Here are some results (unfortunately, we didn’t record the results we came up with during the offsite, so we re-created them for this newsletter):

English: In economics, economic profit is defined as the difference between income and operating cost.

German: In Wirtschaftswissenschaft ist Gewinn definiert als der Unterschied zwischen Einkommen und Betriebskosten.

Croatian: U ekonomiji, dobit je definiran kao razlika između dohodka i troška*.

Russian: В экономике, экономическая прибыль определяется как разница между доходом и операционными затратами.

French: En économie, le profit est défini comme la différence entre les revenus et les dépenses d'exploitation.

Spanish: En economía, ganancia económica se define como la diferencia entre ingresos y costes*.

Kannada: ಅರ್ಥಶಾಸ್ತ್ರದಲ್ಲಿ, ಆರ್ಥಿಕ ಲಾಭವನ್ನು ಆದಾಯ ಮತ್ತು ನಿರ್ವಹಣಾ ವೆಚ್ಚದ ನಡುವಿನ ಅಂತರವೆಂದು ವ್ಯಾಖ್ಯಾನಿಸಲಾಗಿದೆ.

Chinese: 在经济学中,经济利润被定义为收入与经营成本之间的差额。

Hebrew: בכלכלה, רווח מוגדר כהפרש בין הכנסה להוצאות תפעוליות.

Swedish: I nationalekonomi definieras vinst som skillnaden mellan inkomst och Opex.

Italian: In economia, il profitto è definito come la differenza fra il reddito e i costi operativi*.

Arabic: في الاقتصاد*، يتم تعريف الربح على أنه الفرق بين الدخل المالي والمصروفات الجارية.

Words marked with an asterisk were translated manually by us, as at the time they either had no label in Wikidata or the label did not fit.

During the offsite, we evaluated the results and found them not only readable (although not perfect), but also easier to understand than our initial translations. This is likely an effect of the simplification the text underwent. The whole exercise left us filled with optimism about the approach.

This newsletter was late due to the amount of discussion it generated internally. Don’t expect everyone on the team to agree on everything said here. We think these discussions should be in the open, for everyone to join in. Expect more to follow.

Further updates:

We are getting additional support from ThisDot technical writers: two ThisDot technical writers will be joining the team for the remainder of June to figure out how to onboard users to the concept of functions, and how to communicate to users what functions are and how they work, in an easily translatable manner.

Below is a brief weekly summary highlighting the status of each workstream:

Performance:

  • Drafted the Performance Metrics document
  • Started research on reported slowness in function evaluation
  • Added logging and dashboarding to Beta Cluster and wrote documentation for Beta Cluster

NLG:

  • Wrote a Proof of Concept of support for new Wikifunctions features to support proposed NLG pipelines

Meta-data:

  • Altered MediaWiki PHP and Vue layers to handle either format
  • Ensured that no function-orchestrator test code/cases employ the old format

Experience:

  • WikiLambda PHP and Function-schemata finished and merged
  • Design: continue working on typed list view
  • Front-end: made ISO codes mobile friendly and started table component implementation