Openlibrary: Keep language standardized to English

Created on 7 Nov 2019  ·  39 Comments  ·  Source: internetarchive/openlibrary

Too many work titles are in other languages, but should really be English if one edition is in English (it's ok for a foreign language if all of them are in that, but it would help to have the English translation alongside it)

Will Not Fix Feature Request

All 39 comments

Hi,

The work title should be in the original language of the work according to this.

An English translation alongside the original title would only help users who search in English, but not the rest of the users.

There is a field for other titles of the work in the edition records, for example, for titles in other languages. I usually use this field to include the title in English if it is an edition translated into another language. Rather than artificially modifying the data, it would be helpful to improve the search method in any language, either using this field or in any other way.

I imagine if it hasn't already been done it's because it must be difficult to implement.

P. S.: An example: Fantasmas. It is an edition in Spanish of Phantoms by Dean Koontz. The "Other titles" field can be used to include the title in another language.

I _strongly_ disagree with that guideline. The title of the work should (in an ideal world) be internationalized; so an English user would see the English title, and a Spanish user would see the Spanish title. Similar to how Wikidata handles work titles: https://www.wikidata.org/wiki/Q43361

Setting the work title to be the original language makes almost every user unhappy; what are the chances the user speaks the _original_ language of a book? It's so frustrating to see a book by _Tolstoy_ and not be able to read its title.

Can I get some feedback from @seabelis @jessamynwest ; I'd like to change that guideline. In the meantime, we should have work titles be in English if an English edition exists (as @BrittanyBunk proposes). In the future, we should expand our data model to be like Wikidata's to support having multiple titles (one for each language); but that doesn't have to be decided right now. There could/should also be a field for "original title".
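
As a rough illustration of the Wikidata-style model proposed above (one title per language, plus a notion of "original title"), here is a minimal sketch. The `Work` class, its field names, the work ID and the three-letter language codes are all hypothetical and are not Open Library's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Work:
    """Hypothetical work record; not Open Library's real data model."""
    work_id: str
    original_language: str
    # Maps a language code to the title published in that language.
    titles: Dict[str, str] = field(default_factory=dict)

    def original_title(self) -> Optional[str]:
        """Title in the language the work was first written in, if known."""
        return self.titles.get(self.original_language)

    def display_title(self, preferred: List[str]) -> Optional[str]:
        """First title matching the reader's languages, else the original."""
        for lang in preferred:
            if lang in self.titles:
                return self.titles[lang]
        return self.original_title()


# Illustrative record only; the ID is made up.
war_and_peace = Work(
    work_id="example-work-1",
    original_language="rus",
    titles={"rus": "ВОЙНА и МИРЪ", "eng": "War and Peace"},
)
```

With this shape, a reader who prefers French and then English would get _War and Peace_, and a reader with no matching language would get the original title instead of nothing.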

@hornc @dcapillae
It would, though, be a real UI improvement if the work page could group the languages somehow (perhaps facets or just list sort ordering) so that a reader trying to find a German, Turkish, or Spanish edition of _Phantoms_ does not have to page through all the English ones to find it.
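
The grouping idea above could look something like the following sketch, which assumes each edition is a plain dictionary with hypothetical `title` and `language` keys. It groups editions by language and lists the reader's language first.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def group_by_language(
    editions: List[Dict[str, str]], preferred: Optional[str] = None
) -> List[Tuple[str, List[str]]]:
    """Group edition titles by language, with the preferred language first."""
    groups = defaultdict(list)
    for edition in editions:
        groups[edition["language"]].append(edition["title"])

    # False sorts before True, so the preferred language leads the list;
    # the remaining languages follow in alphabetical order of their codes.
    ordered = sorted(groups, key=lambda lang: (lang != preferred, lang))
    return [(lang, sorted(groups[lang])) for lang in ordered]


editions = [
    {"title": "Phantoms", "language": "eng"},
    {"title": "Fantasmas", "language": "spa"},
    {"title": "Phantoms", "language": "eng"},
]
grouped = group_by_language(editions, preferred="spa")
```

Here a Spanish reader would see _Fantasmas_ ahead of the English editions rather than paging past them.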

@cdrini Please correct me if I'm wrong, but it seems there is still no way for an OL user to state their preferred language. Until that is fixed, preferring results in that language will remain impossible. The next-best option would be to test browser settings, but that is often not configurable e.g. in public library kiosks.

You are correct; it's not configurable at the moment.
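
For reference, the "test browser settings" fallback mentioned above usually means reading the HTTP `Accept-Language` header. The sketch below is a simplified parser, not a full implementation of the HTTP specification, and the `saved_preference` argument stands in for a user setting that, per this thread, does not exist yet.

```python
from typing import List, Optional


def parse_accept_language(header: str) -> List[str]:
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        tag = tag.strip().lower()
        params = params.strip()
        quality = 1.0  # a tag without an explicit q-value counts as 1.0
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        if tag and tag != "*" and quality > 0:
            # Sort by descending quality, then by original header order.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def preferred_languages(
    saved_preference: Optional[str], accept_language: Optional[str]
) -> List[str]:
    """An explicit (hypothetical) user setting wins over the browser header."""
    if saved_preference:
        return [saved_preference]
    return parse_accept_language(accept_language or "")
```

As noted above, the header reflects the browser rather than the reader, so on a shared machine such as a library kiosk it can only ever be a fallback.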

Not all works have been translated into English.

And there's been some discussion about using the work to populate the 'translated from' fields in the edition, this would require the work to represent the original language.

A literal translation of a title may not be the actual title in English, if there are multiple versions of an English title, which would be used?

Not even the _titles_ of all works have been translated to English. For that matter, not every work even _has_ a title.

Hi,

Editions in other languages usually include a reference to the title of the original work. This value should be added in the "Translation of" field. This value can be only one.

Editions in other languages are sometimes published with different titles, e.g., Ramsey Campbell's The Nameless has been published in Spanish with two different titles (La secta sin nombre and Los sin nombre). Sometimes original works are also published with different titles, e.g., Jack Finney's The Body Snatchers has also been published under another title (The Invasion of the Body Snatchers). These titles, all of them, can be included in the "Other titles" field. This value can be more than one.

If there are multiple versions of an English title, we can include all of them in the "Other titles" field. However, the "Translation of" value can only be one: there is only one title and one work from which it is translated.
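
To make the cardinality described above concrete, here is a minimal sketch of an edition record with a single "Translation of" value and any number of "Other titles". The `Edition` class and its field names are assumptions for illustration, not the real Open Library schema; the example values come from the titles mentioned in this comment.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Edition:
    """Illustrative edition record; field names are assumptions."""
    title: str
    language: str
    translation_of: Optional[str] = None  # at most one original title
    other_titles: List[str] = field(default_factory=list)  # zero or more

    def all_titles(self) -> List[str]:
        """Every title a reader might search by, in order, without repeats."""
        candidates = [self.title, self.translation_of, *self.other_titles]
        seen: List[str] = []
        for candidate in candidates:
            if candidate and candidate not in seen:
                seen.append(candidate)
        return seen


spanish_edition = Edition(
    title="Los sin nombre",
    language="spa",
    translation_of="The Nameless",
    other_titles=["La secta sin nombre"],
)
```

Keeping `translation_of` as a single optional value and `other_titles` as a list mirrors the "only one" versus "more than one" distinction drawn above.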

@seabelis To be more succinct, here's my proposed change (only works which have an English edition would have English titles in the work).

| Before | After |
| --- | --- |
| The "work" data should reflect the original language, if that is known. | The "work" title should be in English, if an edition exists in English. |

Here are the pros and cons:

Pros:

  • Makes it easier for the majority of our users to find books
  • More in line with how we want work titles to be in the long term (i.e. internationalized)
  • Makes our data dumps more consistent/easier to use
  • Makes our site more usable via accessibility/screen readers

Cons:

  • Non-English users have more difficulty finding books

    • I would argue this is false; a French user would have just as much trouble identifying _ВОЙНА и МИРЪ_ as _War and Peace_ as an English user. That's what I mean by using original language makes almost everyone unhappy; it basically randomly makes the nationality of the author happy for their books (?)

> And there's been some discussion about using the work to populate the 'translated from' fields in the edition, this would require the work to represent the original language.

I'm not crazy over that proposal; I think I would rather have an original_edition field on the work, which stores the first publication. Then all editions not in that language can be assumed translations of this edition.*
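
A minimal sketch of how that inference might work, assuming a hypothetical `original_edition` key on the work and plain dictionaries for editions (none of this is the current schema):

```python
from typing import Dict, List


def assumed_translations(
    work: Dict[str, str], editions: Dict[str, Dict[str, str]]
) -> List[str]:
    """Keys of editions whose language differs from the original edition's."""
    original = editions[work["original_edition"]]
    return [
        key
        for key, edition in editions.items()
        if edition["language"] != original["language"]
    ]


# Made-up edition keys; titles are the examples used earlier in this thread.
editions = {
    "edition-a": {"title": "Phantoms", "language": "eng"},
    "edition-b": {"title": "Fantasmas", "language": "spa"},
}
work = {"original_edition": "edition-a"}
translations = assumed_translations(work, editions)
```

This only says that an edition is a translation, not which edition it was translated from, so it does not cover the edge case in the footnote.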

> A literal translation of a title may not be the actual title in English, if there are multiple versions of an English title, which would be used?

I would argue use the most well known (if one is clearly more well known), otherwise use the original English title. Any other titles can be added to other titles (as @dcapillae suggests).

  • e.g. _And Then There Were None_ should be used over the original title, because it is more well known and significantly more editions were published under that title
  • e.g. _Harry Potter and the Philosopher's Stone_ should be used over the US title, because both are very popular

* There is an edge case; _ideally_ we'd be able to specify the specific edition one edition was translated from (e.g. Ismail Kadare's books are originally in Albanian, but most non-Albanian versions are translations of the French version, not the original Albanian). But we don't currently handle that anyways, so ¯\_(ツ)_/¯

This is so wrong-headed that I don't even know where to begin. Do people not see the vicious cycle that they are perpetuating?

"The majority of our users are English speakers, so let's do everything as English only."

Why don't we have more non-English speakers? Because at every opportunity, we actively make them feel unwelcome and drive them away!

Translated work titles is definitely the way to go long term, but until we can get there let's not backslide into English-only xenophobia.

> Translated work titles is definitely the way to go long term, but until we can get there let's not backslide into English-only xenophobia.

_Strong_ disagree. Why should we make the experience bad for almost everyone until that happens? Again, this isn't anti-i18n or xenophobic, it's making our data more organized.

Can you give some pros/cons of having work titles be in their original language?

@dcapillae The particular cases of translations _from_ English are relatively straightforward. It gets much more complex when discussing classics, where in some cases the original text does not survive.
Case in point: we can be quite sure that Aesop never wrote in English, yet WorldCat shows this is by far the most common translation language of the 99 listed, leaving Latin a distant second.

https://www.worldcat.org/search?q=Aesop&fq=&dblist=638&fc=ln:_100&qt=show_more_ln%3A&cookie

The oldest surviving text is the 1478 collected works, where the translation into Latin is attributed to Rinuccio d'Arezzo with the Latin title _Vita et Fabulae_. We certainly can't be sure what the first Greek titles were, nor can we be sure that the stories originated with Aesop. What we _can_ do is identify the earliest known text, from which others are derived. I still contend that this should simply be a work ID, not a textual title.

Book titles should be in the language the book is in. Agree entirely with @tfmorris on this.

The Internet Archive is not particularly good at dealing with an international audience, let's not make it worse.

If we're only looking for a good user experience for our English speaking users (see also, only having support during the hours the IA is open) we're failing at our mission. If we "standardize" to English language we are prioritizing one set of users over another.

The best way to do this would be to prioritize a feature addition whereby a user can set a preferred language.

@jessamynwest I agree _edition_ titles should be in the language of the book; we're talking about works. Can you provide some concrete cons of having work titles be in English? And some pros _specifically_ of having them be in the original language?

To clarify, we're not talking about i18n specifically here, except to keep it in mind as our long term goal. The main question is between whether the work title should be in English or in the original language of the work. To decide that, we need to come up with some sort of pros/cons of one vs the other.

Why would we want to list works in one language _or_ the other? Certainly today's world is dominated by the anglosphere, but there's no real reason to assume every work has even one English edition, or that one translation into English is more correct than another. We should have a mechanism that can show all the distinct titles of a work, with preference to those in whatever language the user asks for.

> @jessamynwest I agree _edition_ titles should be in the language of the book; we're talking about works. Can you provide some concrete cons of having work titles be in English? And some pros _specifically_ of having them be in the original language?

Cons of Work titles being in English (as opposed to whatever language they are in)

  • It sets up a bad precedent of mixing language on title/book. I do NOT want to read an English title and click on the book and have it be in Russian. That is a bad user experience for me. It will confound unsophisticated users.
  • It points to American/English exceptionalism and colonialist mindsets. It basically says "This is a site for English speakers" which, while true in usage, should not be what Open Library aspires to.
  • Making this change, on a site that has existed for over ten years, is saying something specific about the direction the site is heading, away from internationalization and towards English language hegemony
  • It's alienating to non-English speakers who may be trying to find works in their own language
  • It's lazier than doing it right which would be to let people choose their language with a user preference which should be the prioritized option if we're supposedly heading in an i18n direction
  • It's changing the current behavior of the site which we need to do carefully if we have little to no user support to let people know things have changed
  • it's not the way libraries do it (and Open Library was created in order to mirror the way libraries do things which is why that guideline is the way it is)

Pros of Work titles being in English

  • It's simpler for programmers to build
  • It's easier for English-speaking users

Want to make a case for the change? Show me data that supports it. Show me bounce statistics from users landing on non-English works. Show me emails from users who are upset about having to deal with non-English titles of non-English books. It's a data driven site, so where is the data that supports this?

@LeadSongDog

> We should have a mechanism that can show all the distinct titles of a work, with preference to those in whatever language the user asks for.

Strong agree; in the long term I think the work should list all titles (in some fashion). This is mostly deciding what to do in the interim until we have the time to work on that.

@jessamynwest _Thank you_ for the feedback; I honestly want to have a discussion about this, but felt like I was getting auto-shut down with "What is wrong with you?" style comments.

@LeadSongDog wrote:

> The oldest surviving text is the 1478 collected works, where the translation into Latin is attributed to Rinuccio d'Arezzo with the Latin title _Vita et Fabulae_. We certainly can't be sure what the first Greek titles were, nor can we be sure that the stories originated with Aesop. What we _can_ do is identify the earliest known text, from which others are derived.

I'm not a specialist in the field. Librarians and archivists surely know what to do in these cases.

Consulting some cataloguing manuals, they prescribe that the transcription of the title must respect the spelling of the original work (the oldest book preserved, in this case). Some manuals also indicate how to transcribe certain letters of the original title which are not directly equivalent to modern letters in the case of old documents.

  • @cdrini What you wrote is what I would feel works best. All work titles in English with a box underneath with the original work's title in the original language.
  • @LeadSongDog Yes. I would propose that would be another filter in the table with all the editions. I was thinking of proposing a much better table too, but one thing at a time.
  • @tfmorris you're right. It's kind of like cultural appropriation and may perpetuate xenophobia, but for the cultural appropriation - that's why the original work's title will be there. We could switch it and have the original title on top and the English Translation underneath too. The thing is, English is the most used language that the world uses it as a default for communicating across multiple languages: https://www.lingualearnenglish.com/blog/tips-to-learn-english/10-reasons-english-important-language/ (and it even helps others learn too).
    The issue with the xenophobia argument is that this is going to be temporary. This website is an English website written in English. So naturally most people who visit will speak English. Until all parts of the website are translated, it's just how it is.
    However, if we use English as the basis, it'll be easier for the website to translate the works into another language, as it's one language to translate all from, no need to keep saying what the language of the site is. For instance, on Google Translate - it only allows you to translate from one language to another. It won't take 5 languages on the site and translate it into 1 language. @jessamynwest so it will allow for greater access to everyone instead of the opposite.
  • @jessamynwest You're right in that there shouldn't be much issue for people searching for a book in their language, provided that there's a translation in their language (I'm assuming). Still, how are we going to show every book in their language if it's not able to be translated? What would that format look like? If the entire site's translated to French by someone, will they still be looking at a Russian title? I just think there's a little bit of a misunderstanding. The English translation will be for the English version of the website. So the box for "English translation" would be "French Translation" when the French translation of the entire website appears. It's not going to be possible without starting with the English first, so while it may start veering towards hegemony, it's temporary (like I said) to create the way for other languages. I also think you're unjustly thinking this only happens with English, where it happens with other languages too (A French work was translated to Spanish for the main work: https://openlibrary.org/works/OL1086528W/Essays - which I translated to English, but will put it to the French original).
  • all: I feel having a translated title box will help with understanding works from long ago. Translating could help out historians in that way. The original text would still be there - as computer translators aren't perfect. Without translating, it'd be more difficult for everyone to cross cultural barriers and read/understand works outside of their language. It's not meant to turn this site into an English one only. Saying everything, shouldn't we have a 'translated from' box for each edition to show what it's translated from, like @cdrini says? Why don't we extend it to the cover image too while we're at it?

PS - Why don't we just have a big yellow star next to the original work and float it to the top?
So from what I see, the format would be:

  • Top: Original Title
  • Underneath: Translation (in the user's language) of the title, if possible. It should be the one that's most likely to be searched, like @cdrini says.
  • There may be an extension to this where the work's book cover should be the original too and optionally a translated cover would go underneath (if there is one)
  • I don't believe we need an 'other titles' bar underneath the main title, as that's what the editions are for

Editions

  • float the original to the top
  • have a filter based on language the work is written in (that way, users can read in their language when they can find it)

How does this all sound? Thanks everyone for coming together and really hashing it out. It's really awesome in working it all out.

@hornc Not sure why this has a close label, as some of this is not hard to add in. (some is though)

There's a difference between translating a title and the title of a translated work. Literal translations are highly problematic.

The literal translation of _Mördare utan ansikte_ is 'Killer without a face'; the English translation is called _Faceless Killers_.

_Demons_, _The Devils_, and _The Possessed_ are titles of various English translations of the same work.

On the flip side, the German translation of _Jurassic Park_ is called _Dino Park_; the Spanish title of _Timeline_ is _Rescate en el tiempo: (1999-1357)_ which translates back to English as 'Rescue in Time.'

So please be clear about what you are requesting. Showing English titles (which?) of non-English works or translating the titles of non-English works. It seems you are suggesting both.

Agree with @seabelis. Literal machine translation is not accurate translation and won't correspond to the work's title in enough cases that it is worth trying. This is one of those things in librarianship that we describe, accurately, as "a hard problem" and it can't be solved by putting more computers on it.

Strong disagree on "this will just be temporary" especially when the thing you are referring to is temporary xenophobic implications of the website. Temporary xenophobic actions are still xenophobic actions.

Open Library is a joy, but change comes slowly to it and I would not be in favor of making a large-scale change that will "some day" be replaced with the right way to do the thing. The thing should be done the right way from the beginning (let users select their language, preference that language in OL presentation of content).

As built Open Library was a multi-lingual site and that multilingualism has been phased out over the last decade+ that it's been around. We should not continue to do that.

@seabelis I'm suggesting both, with a preference to the non-literal title when there is one. What the English title is probably won't be translated into the French title. I would say that when the site can be translated for everyone, it'll be manually entered in by the native speakers. Still, the machine should enter in a literal translation first, so it can be there first and corrected later by the better title.
@jessamynwest yes. It doesn't have to be on the front end though. It could be on the back end of the website so that when the website can be translated in every language, the feature could be added alongside it. I just can't write everything out, as my responses are long enough and if they're 20 pages+ long, no one'll read it.
Do you want to have a feature that has a drop down menu to show the translation of the title in a user's language? I thought about that and thought it wouldn't be worthwhile if it's through Google Translate of the English title (as Google translate doesn't translate more than 1 language). I'm not the developer for this website, so idk what would be used. These are just my thoughts. I'm not trying to create xenophobia, but find a solution to make the website more accessible to everyone (so the opposite). It just seems that the avoidance towards xenophobia (not just you) is creating the conditions for it (since you say that the website's heading in that direction, and if we do nothing, then it'd just stay that way or get worse).

I think since this issue may be bigger than I expected, maybe we should separate translations from editions, as it creates the assumption (at least with me) that everything is translated from one work (it kind of is, but kind of not). It's that if I can understand the website in my language, then I'd know how to do so for just about any language.

Those are my ideas, but @jessamynwest what would you suggest? I just use the website and have my thoughts on how it could be better. But I would like to hear others, so this can be resolved.

I have made my suggestion which is that we not implement this feature as written and instead move towards implementing a feature whereby a user can choose their preferred language which will determine how they see the "work" and possibly what edition will be shown to them.

An English-first approach is xenophobic and not okay. Maybe open another issue for a "user can choose their language" feature possibility?

Ethnocentric perhaps, better than xenophobic. It's not okay anyway.

@jessamynwest idk how the website will be translated in the end. These are my approaches though.

  • If it's through Google Translate, then it would be in English, as the site's in English, so when a person needs to translate it to their language, it'll appear in their language. Then the person doesn't need to have a scroll down to their language if it's done automatically. The issue with a scroll down to the preferred language is that not everyone's language may appear and then that would be limiting too (although Google Translate doesn't do every language, and that's limiting as well).

If you'd like to write that as a new issue, that's fine (I'm not, because I listed my solutions here and this is where we can keep everything in one place to be fixed). I just think if it's the Google Translate method, it's inefficient.

  • Another method is where the OL has one website for each language. It'll look like Wikipedia. https://www.wikipedia.org/. The language appears in the front of the url (like ru.wikipedia.org for Russian). So if it's in English, then it'll show the English translation. For Wikipedia, they use en.wikipedia.org. So if we're going that route, then the website openlibrary.org would not open in English, but in a landing page where someone chooses their language preference.

Like I said, I'm not trying to incite xenophobia, nor be ethnocentric, I'm just trying to say that there should be a format with consistency with languages, so everyone can have equal access. With the format right now, then it would be in English (as the site's in English and so that's the consistency). I'm not saying "English first" because it's awesome and that's the way to go - it's just that the website's in that default language and someone in another country will have difficulties translating the site if it's in multiple languages. If the website's in Spanish, I'd say "Spanish first", etc. for consistency. Doing nothing though will make it worse, not better. I'm opting for the work to show the original (and original picture) and then the translation and translated picture (with the language of the user) underneath. It would first be a transliteration and then manually add the actual translation if it's different. Is this a good approach? Is there an issue with it that I'm not seeing?

@BrittanyBunk wrote:

> I'm just trying to say that there should be a format with consistency with languages, so everyone can have equal access. With the format right now, then it would be in English (as the site's in English and so that's the consistency).

Not really. With the format right now, the "work" data should reflect the original language. It is a consistent criterion.

@LeadSongDog wrote:

> I still contend that this should simply be a work ID, not a textual title.

I agree. In fact, all works have their Open Library ID. We don't need a title to identify them, so we don't need to choose a preferred language for the title. The work title in the original language is a consistent criterion. It's the same one librarians use.

If all the works have their ID, unique to each of them, I think that the issue of offering the title in one or another language should be resolved in another way, not by artificially modifying the titles or having to choose a preferred language for the work titles or for the Open Library site. It should be resolved "internally" by the software using those IDs and user preferences, I mean.

In any case, I agree with Brittany that efforts should be made to offer a better experience to the end user in terms of offering titles of works in their language (if they exist, because a transliteration is a bad idea in this case), especially to improve search functionality.

P. S.: I would like to clarify the following: an initial ethnocentric point of view can be modified, it is not necessarily a bad thing. Sometimes inevitable and unconscious, but not necessarily problematic. A xenophobic perspective is definitely a bad thing. That's why I suggested changing "xenophobic" for "ethnocentric".

@dcapillae what about my 2 box idea about the original work as one box and the translation as the 2nd one (and also about separating translations from editions*)? What's the idea you're saying about internally? (I'm confused) Thank you for understanding - it's about a better format for readability for all users, not something that's supposed to be offensive. I just can't write out everything all at once, otherwise no one will read it.

I mean a stacked approach - like if someone clicks on the 2nd edition, the other translations pop up for a user to choose one. It could be another format, just trying to create ideas.

How come the transliteration's a bad idea (especially if it's marked as such)?

Machine translations are often not a good idea for material with nuance like book titles which @seabelis mentioned above with examples. Unless there's a compelling reason to introduce bad data into the equation, I don't see that it solves a user problem.

@jessamynwest those would be manually changed. It's for translating the title so it's more readable, and it would have a description saying it's just a translation, not the most accurate title. If not, what do you propose for readers to see if there's no book translated into their language?

The only other idea I could think of is to not have the machine translation and just manual input. That would mean it would start out with nothing. Then multiple titles could be added in with their corresponding language (like what @cdrini's picture shows: https://github.com/internetarchive/openlibrary/issues/2601#issuecomment-551230169 - but it'd be belarusian title 1, belarusian title 2, etc. if there's more in one language).

@BrittanyBunk wrote:

> @dcapillae what about my 2 box idea about the original work as one box and the translation as the 2nd one (and also about separating translations from editions*)?

With regard to titles in other languages, I think the most important thing is that you can search for works (and editions) from any title in any language (if those titles really exist because they have been published).
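
One way to picture "search from any title in any language" is a lookup table from every published title to its work. The sketch below is only an illustration under stated assumptions: the work IDs are made up, the normalization (case folding, stripping diacritics, collapsing whitespace) is a design choice rather than a recommendation, and the titles are the examples cited earlier in this thread.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, List, Set


def normalize(title: str) -> str:
    """Case-fold, strip diacritics and collapse whitespace for matching."""
    decomposed = unicodedata.normalize("NFKD", title.casefold())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.split())


def build_title_index(works: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each normalized published title to the IDs of works that carry it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for work_id, titles in works.items():
        for title in titles:
            index[normalize(title)].add(work_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return matching work IDs, sorted for stable output."""
    return sorted(index.get(normalize(query), set()))


# Hypothetical work IDs; only titles that were actually published are indexed.
works = {
    "work-1": ["Mördare utan ansikte", "Faceless Killers"],
    "work-2": ["Jurassic Park", "Dino Park"],
}
index = build_title_index(works)
```

Because only published titles are indexed, a search for the literal translation "Killer without a face" finds nothing, while "Faceless Killers" and "Mördare utan ansikte" both lead to the same work.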

Two boxes is not a bad idea, a main box with the original title and another with the translation (if it exists) in the user's language (perhaps the title with the largest number of editions), but always keeping the original title as main title. It is valuable information.

An option (filter) to group editions published in a particular language can also be of great help. @LeadSongDog suggested it above.

@BrittanyBunk wrote:

> What's the idea you're saying about internally? (I'm confused)

Me too! :D I mean that the data should not be modified artificially (e.g. by adding a translation alongside, or changing the original title for a translation). The software should be able to show what corresponds in each case with the existing data: title of editions, title of works, titles of works in other languages, etc. I can't explain how it could be resolved by software, but it would be ideal.

@BrittanyBunk wrote:

> How come the transliteration's a bad idea (especially if it's marked as such)?

Only real books with real titles. Open Library is a catalog for every book ever published. A transliterated title is not a real title. Nobody will look for it in Open Library because that book (with that title) does not exist (yet). I think it should not be in any book catalog. When an edition with that title is published, it will be time to add it to the catalog (with that title), but not before.

Only real books and real titles. I think that's the only right choice.

> @jessamynwest those would be manually changed.

Open Library has no staffing for that. Those would not be manually changed.

> If not, what do you propose for readers to see if there's no book translated into their language?

The book in the language it was published in. As I see it, you are trying to improve the OL user experience for English-speaking OL users. My proposal remains: keep this the way it is, add a different feature for letting a user indicate their primary language, build features off of that.

> Only real books and real titles. I think that's the only right choice.

Strongly concur.

I think we're spiraling into a premature discussion which is prescribing a solution which we're not ready to prioritize. I don't think the solution as described is sufficient. It will need to be much more nuanced in terms of leveraging the infrastructure that we have to provide the right solution to our patrons. I feel inclined to close this issue until such a time as we have made the right decisions to enable moving forward responsibly. This conversation will remain here for us to refer to in the future.

@mekarpeles sounds good. I just wanted to add that I was thinking about the transliteration to allow for searching in every language. It would just need a note that says it's a direct translation and not the work's name itself (so people know, as otherwise it'll be hard to search for the book if it's not in their language at all). If there's a translation that's for an actual book, it'd be manually added in - in a way that it's possible to see the two differences (like an actual book name is highlighted). I guess this was premature (sorry I brought it up too early), but I hope I helped.
@jessamynwest by manually, they do: the OL user editors, like me and others. For the rest, I won't stop you from posting new issues, but this one got closed, so idk if it's a good idea anymore.
