Wikifunctions:Status updates/2024-09-20

Wikifunctions Status updates

Introducing focus topic areas

As we are moving closer to making it possible to generate natural language text, we are starting to think about introducing topical foci. We already have focus languages, but we are also considering introducing focus topic areas.

As we have discussed before, we expect communities to create (at least) two different types of articles using Abstract Wikipedia: on the one hand, we will have highly-standardised articles based mostly on Wikidata, called model articles; and on the other hand, we will have bespoke, hand-crafted articles, assembled sentence by sentence.

We suggest introducing (at least) two different focus topic areas: one focus area for the model articles, and a different focus area for the manually-written articles. The two types of articles benefit from different topic areas: model articles are better suited to areas with many individual items that can be described uniformly in Wikidata, whereas manually-written articles are better suited to topic areas where each article is quite different from the others, and where the items in Wikidata are often rather empty.

We would prefer to choose topic areas that are not highly contentious, either within a single language or across language barriers. This is particularly true for the model articles approach: both subtle and blatant differences may easily miss out on the careful consideration they need, especially if we create thousands or millions of articles!

We would prefer a topic area that can invite contributions from all over the world. For example, articles about kabuki theater would make a great contribution to the knowledge of the world, but it is expected that most contributors to them would come from one single country, speaking mostly one language.

We would prefer a topic area that is of interest to the wider population. Whereas Wikidata is known for its coverage of scientific papers and astronomical objects, both topic areas seem to have a limited readership, which would also limit the value they would bring.

Having said that, we propose food as the topic area for hand-crafted articles. We will write a more detailed weekly about why food makes such a great topic area, and if there are no vetoes or better suggestions, we will pick that up as one of our two focus areas.

For model articles, we are looking for a discussion to see what we should select: the two most obvious topic areas would be human settlements and people, but both have a lot of potential for being contentious. Biological species are another interesting topic area, but are often much more complicated than expected. There are many other interesting topic areas, and we would love to hear your suggestions, thoughts, considerations, and see the discussion.

Note that we most definitely won’t stop anyone from creating the content they care about. You will be absolutely free to create articles on the topic areas you care about, and they can be model articles or they can be manually-written articles. The focus area is merely to help the development team focus and to help set expectations when working together with you as communities. If you want to write an abstract article about a specific 1980s fashion fad or create model articles for crochet patterns, you are more than welcome to do so. We just want to help you understand our prioritisation.

Please chime in on which topic areas you think would make particular sense, so that we can come to a preliminary decision in the following weeks. Thank you!

Site instability update

We had an ongoing incident that we already reported in last week’s update. Together with our colleagues in SRE, we spent quite a while trying to figure out what was going on, and couldn’t. Most frustratingly, we could not reproduce the production issues in any other environment, which made debugging really difficult.

The issue surfaced as about 10%-20% of function pages timing out, numerous test and implementation pages failing, and other problems. All tests failed consistently. SRE got paged so often that they had to switch off the monitoring on Wikifunctions.

We kicked off our incident procedure, and continuously increased the resources dedicated to the issue. We found several possible culprits, but since we were unable to replicate the issue locally, trying to fix it was often a frustrating cycle of deploying to production and checking whether that helped. As of now, we hope that the site has recovered. While casting our net wider, we found an edit to Wikifunctions that seemed to have caused an infinite loop on certain validations. We were able to roll back that edit.

We are still investigating how that edit led to such a big impact, why we didn’t catch this issue sooner, and what to do in order to ensure that we don’t run into a similar situation again. If you still continue to see pages time out, please let us know.

Thanks everyone for your patience. We apologize for still being a bit vague, but we first want to understand the root cause of the issue a bit better before making it easily visible how one may break the site again. We understand that some of you could easily look into the site to figure out details right now, but we would like to ask you not to share that knowledge too widely just yet. We plan to say a bit more about this in the coming weeks, once we have had a bit more time to understand the root cause. Thank you for your patience!

Function of the Week: Caesar cipher for Bengali alphabets

We had talked about the Caesar cipher before in the Function of the Week. It is an old form of cryptography, where each letter is shifted by a certain number of positions. Traditionally, it is applied to texts written in the Latin alphabet.

One large advantage of Wikifunctions is that we can easily create and deploy new functions, even functions that haven’t been implemented before, and make them available to everyone in the world (or, at least, everyone with access to the Web). This way our contributors can create functions which have probably been thought about, but haven’t seen a widely available implementation: for example, one could take the idea of the Caesar cipher and apply it to other alphabets.

And this is exactly what we have in our current Function of the Week: it applies the idea of the Caesar cipher - of shifting letters along a given alphabet - to the Bengali alphabet. The Bengali alphabet is a script of the Indian subcontinent, used for a good thousand years, and used in a number of languages, including classical languages such as Sanskrit and living languages such as Bangla with more than a quarter of a billion speakers.

The function Caesar cipher (Bengali alphabets) (Z17530) takes two arguments: the Bengali string to be encoded, and the shift value, a number deciding by how many letters to shift it. It returns a string, the encoded form of the Bengali input.

There is currently one implementation in Python, which is based on an array holding the Bengali alphabet, and which goes through the input string, replacing each letter with the shifted letter.

The function has six tests:

Shifting অআকখ by 2 results in ইঈগঘ
Shifting ক by 38 results in অ
Shifting হ by 1 results in ড়
Shifting অ by 49 results in অ (i.e. that’s the identity shift)
Shifting য় by 1 results in ৎ
Shifting অ by 11 results in ক (which is the reverse of the second test)

The existing implementation passes all six tests.
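
To make the description above concrete, here is a minimal sketch of how such an array-based implementation might look. It is not the code behind Z17530. The names caesar_bengali, BENGALI_ALPHABET, INDEX_OF and _letters are hypothetical, and the alphabet ordering is an assumption: it was chosen so that the array has 49 entries and reproduces the six tests listed above, and the last two entries in particular are a guess. Leaving characters outside the array unchanged is also an assumption.

```python
def _letters(first: int, last: int) -> list[str]:
    """Return the characters for the inclusive code point range first..last."""
    return [chr(cp) for cp in range(first, last + 1)]


# ASSUMPTION: illustrative 49-letter ordering, not taken from Z17530.
# Letters are built from code points so that each entry is exactly one
# Unicode character.
BENGALI_ALPHABET: list[str] = (
    _letters(0x0985, 0x098B)                   # অ আ ই ঈ উ ঊ ঋ
    + [chr(0x098F), chr(0x0990)]               # এ ঐ
    + [chr(0x0993), chr(0x0994)]               # ও ঔ
    + _letters(0x0995, 0x09A8)                 # ক .. ন
    + _letters(0x09AA, 0x09B0)                 # প .. র
    + [chr(0x09B2)]                            # ল
    + _letters(0x09B6, 0x09B9)                 # শ ষ স হ
    + [chr(0x09DC), chr(0x09DD), chr(0x09DF)]  # ড় ঢ় য় (single code points)
    + [chr(0x09CE)]                            # ৎ
    + [chr(0x0982), chr(0x0983)]               # ং ঃ (guessed, to reach 49)
)

# Reverse lookup: letter -> position in the alphabet.
INDEX_OF: dict[str, int] = {
    letter: i for i, letter in enumerate(BENGALI_ALPHABET)
}


def caesar_bengali(text: str, shift: int) -> str:
    """Shift each letter found in BENGALI_ALPHABET by `shift` positions.

    The shift wraps around the end of the alphabet. Characters that are
    not in the alphabet are copied unchanged (an assumption of this sketch).
    """
    size = len(BENGALI_ALPHABET)
    result = []
    for ch in text:
        i = INDEX_OF.get(ch)
        if i is None:
            result.append(ch)
        else:
            result.append(BENGALI_ALPHABET[(i + shift) % size])
    return "".join(result)
```

With this ordering, caesar_bengali("অআকখ", 2) returns "ইঈগঘ", matching the first test. Because the sketch works on single code points, the letters ড়, ঢ় and য় are only shifted when they are written as the single characters U+09DC, U+09DD and U+09DF; a base letter followed by a separate nukta mark would not be recognised as one of those letters.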

The tests look great, and cover a few interesting cases (such as the identity shift, or a reverse). I would add tests for a zero shift, for a shift beyond 49 (e.g. 98 or 100), and for whole words rather than single letters, including input with letters and characters that are not in the Bengali alphabet. It would also be good to have more than just one implementation.

But the interesting part really is that this gives a widely available implementation of the Caesar cipher for an alphabet where that wasn’t widely available before. I am looking forward to seeing what other functions we can make available in novel contexts, and whether they gain any traction.