Okuna-api: Multi-language posts

Created on 2 Apr 2019  ·  10 Comments  ·  Source: OkunaOrg/okuna-api

From Ronald on Slack

Perhaps it’s a good idea to be able to set preferred languages before we go public. If the trending timeline is full of posts written in Chinese, that is going to be a problem.

A possible solution: during onboarding, let the person select their preferred languages, with the current device language preselected.

When a person is posting, we can try to detect the language and show this somewhere at all times.

The person can then tap this to override it if wrong. We can show the preferred languages list first.

After these two things are set, we can filter the timelines by language(s).
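To make the idea concrete, here is a minimal sketch of the onboarding default and the timeline filter. Everything in it (`Post`, `language_from_locale`, `initial_preferred_languages`, `filter_timeline`, the `include_unknown` flag) is a hypothetical name made up for illustration, not existing okuna-api code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Post:
    # Hypothetical minimal post shape, not the real okuna-api model.
    text: str
    language: Optional[str]  # None when detection failed or was skipped


def language_from_locale(locale: str) -> str:
    """Reduce a device locale such as 'nl-NL' or 'pt_BR' to a bare language code."""
    return locale.replace("_", "-").split("-")[0].lower()


def initial_preferred_languages(device_locale: str) -> Set[str]:
    """Preselect the device language; the person can add or remove languages later."""
    return {language_from_locale(device_locale)}


def filter_timeline(
    posts: Iterable[Post],
    preferred: Set[str],
    include_unknown: bool = True,
) -> List[Post]:
    """Keep posts in a preferred language, optionally keeping posts of unknown language."""
    return [
        post
        for post in posts
        if post.language in preferred
        or (post.language is None and include_unknown)
    ]
```

Keeping posts with an unknown language by default is only one possible choice here; hiding them would be just as easy to argue for.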

EDIT: See bottom for latest suggestion.

medium feature

All 10 comments

Another option is having a translate button.

We can look into open-source, pretrained translation models and perhaps start from there?

http://opennmt.net/Models/

Someone in the comments on OB mentioned: https://www.deepl.com/pro.html#pricing

We can detect the language of a post's content locally, at posting time, with https://github.com/Mimino666/langdetect
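A rough sketch of what that detection step could look like. `detect` and `DetectorFactory.seed` are part of langdetect; the wrapper `detect_post_language` and its `detector` parameter are assumptions added here so the logic can be tried without the package installed.

```python
from typing import Callable, Optional


def detect_post_language(
    text: str,
    detector: Optional[Callable[[str], str]] = None,
) -> Optional[str]:
    """Return a language code for the text, or None if it cannot be detected."""
    if not text or not text.strip():
        return None
    if detector is None:
        # Imported lazily so a stub detector can be injected in tests.
        from langdetect import DetectorFactory, detect

        # langdetect is non-deterministic unless a seed is set.
        DetectorFactory.seed = 0
        detector = detect
    try:
        return detector(text)
    except Exception:
        # langdetect raises when it finds no usable features (e.g. emoji only);
        # treat any detector failure as "language unknown".
        return None
```

Returning `None` instead of raising leaves it to the caller to decide what an undetected language means for storage and filtering.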

So... We're bumping this up in prio and we'll pick it up right after reporting flows are done.

Here's how it looks so far:

  1. Detect language locally on server with the langdetect library and store it as a post attribute.
  2. When someone retrieves the post, check if the post language matches the device language. *1
  3. If it does, do nothing; if it does not, show a Translate button.
  4. When translate is pressed, call a /postUuid/translate/ API with the desired language.
  5. The server calls an external translation API and returns the result *2

*1 Although device language might work for first iterations, this should become something like preferred language that can be bootstrapped to the device language.

*2 There are two options so far: deepl.com and the AWS translation API.

DeepL looks like a great option, being based in Germany and claiming to have strong privacy principles, but it is another third party. Using Amazon's Translate would keep it all within the AWS ecosystem, but they do say they "may" use the contents to improve their translation models.

Personally, I'd rather go with DeepL.

Thoughts welcomed as usual.

With regards to point 3, there should also be an option to never show a translate link for a certain language. My device is set to Dutch, but I don't want the translate button to appear for English posts. Google added a similar option after their translate function in Chrome generated a lot of backlash from multilingual people.
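Something like the following could cover point 3 together with a per-language opt-out. It is only a sketch: `should_show_translate_button` and the `never_translate` collection are hypothetical names, and how that preference would be stored is left open.

```python
from typing import Iterable, Optional


def should_show_translate_button(
    post_language: Optional[str],
    device_language: str,
    never_translate: Iterable[str] = (),
) -> bool:
    """Decide whether a post should get a Translate button for this person."""
    if post_language is None:
        # Unknown language: nothing sensible to offer (an assumption of this sketch).
        return False
    if post_language == device_language:
        return False
    # Respect the "never translate this language" preference.
    return post_language not in set(never_translate)
```

With the device set to Dutch and English in the opt-out list, an English post gets no button while a German one still does.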

Language detection is not flawless: it will sometimes get the language wrong, or not support a language at all. How should those cases be handled? Should the poster be able to override it if needed?

The downside with DeepL (and maybe AWS) is that they only support a limited selection of languages (so far). Of course, a majority of the userbase will be covered with just English, German, French and Spanish, but the remaining few percent will have a lesser experience.

Bing and Google aren't really options, though, given privacy concerns.

The quality of DeepL results is great but I agree that the limited range of available languages might become a problem.
Another thing is cost. I don't know about AWS, but DeepL charges €4.99 / month for developers plus €0.01 per 500 characters.

Thanks for the info @oliverzet !

At this time, Amazon Translate supports translation between the following 21 languages: Arabic, Chinese (Simplified), Chinese (Traditional), Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Indonesian, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, and Turkish. Between these languages, the service supports 417 translation combinations

And for pricing:

[image: pricing screenshot, not preserved]

Not sure how expensive it might turn out to be but definitely supports more languages.

@schmitzel76 Definitely, we'll add an option for "Never translate LANGUAGE posts".

Not sure how we should deal with wrong translations 🤔.

As for DeepL vs AWS, we can design it to be replaceable, so the question is just which one to try first.
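One way "replaceable" could be sketched is a small interface that each provider adapter implements. All names below (`Translator`, `DictionaryTranslator`, `translate_post`) are hypothetical, and no real DeepL or AWS client calls are shown; the dictionary backend is just a stand-in for local testing.

```python
from typing import Dict, Protocol, Tuple


class Translator(Protocol):
    """Anything with this method can act as a translation backend."""

    def translate(self, text: str, target_language: str) -> str:
        ...


class DictionaryTranslator:
    """Stand-in backend; a DeepL or Amazon Translate adapter would expose the same method."""

    def __init__(self, table: Dict[Tuple[str, str], str]) -> None:
        self._table = table

    def translate(self, text: str, target_language: str) -> str:
        # Look up a canned translation keyed by (text, target language).
        return self._table[(text, target_language)]


def translate_post(post_text: str, target_language: str, backend: Translator) -> str:
    """What the translate endpoint would do after loading the post text."""
    return backend.translate(post_text, target_language)
```

Swapping providers then means writing another class with a `translate` method and passing it in, without touching the endpoint logic.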

Also, this will most likely only be available for public posts.

I'm not sure that applies directly to this issue, but it should be possible to change the language attribute. Especially with very mixed posts containing several foreign words, the wrong language can end up being stored. In my experience, even MS Word regularly gets this wrong.
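A small sketch of how an editable language attribute might sit next to the detected one. `PostLanguage` and its field names are made up for illustration; the only idea shown is that a manual choice wins over detection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PostLanguage:
    # Hypothetical fields: what the detector guessed and what the author picked.
    detected: Optional[str] = None
    author_override: Optional[str] = None

    @property
    def effective(self) -> Optional[str]:
        """The author's choice, if any, takes precedence over the detected language."""
        return self.author_override or self.detected
```

Storing both values, rather than overwriting the detected one, would also make it possible to see later how often detection was corrected.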

@lifenautjoe Well, AWS seems to be less expensive and supports way more languages. Translation itself will probably be better with DeepL. On the other hand it's usually enough to get the gist. So it looks like Amazon is the better choice. I don't know how this might affect privacy though.
