Openlibrary: Attempting to merge some authors fails

Created on 31 Jul 2019  ·  50 Comments  ·  Source: internetarchive/openlibrary

Description

Specifically Ludwig van Beethoven (duplicate author records identified via Wikidata) https://openlibrary.org/authors/merge?key=OL127077A&key=OL4357202A&key=OL7272005A&key=OL7480477A

fails

I suspect it may have something to do with one item in the list being or referencing a redirect -- needs investigation.

Relevant url?

Examples:

| Done | Person | Merge Link | Error |
| --- | --- | --- | -- |
| X | Ludwig van Beethoven | https://openlibrary.org/authors/merge?key=OL127077A&key=OL4357202A&key=OL7272005A&key=OL7480477A | ?? |
| X | Apollonius Rhodius | https://openlibrary.org/authors/merge?key=OL325079A&key=OL6050345A | {'message': 'expected /type/author, found /type/delete', 'at': {'property': 'authors', 'key': '/books/OL20525473M'}, 'value': '/authors/OL6050346A', 'error': 'bad_data'} |
| X | D. S. Margoliouth | https://openlibrary.org/authors/merge?key=OL1751871A&key=OL4335758A&key=OL3277479A&key=OL2832645A&key=OL3126854A&key=OL6010579A | {'message': 'expected /type/author, found /type/redirect', 'at': {'property': 'authors', 'key': '/books/OL20457133M'}, 'value': '/authors/OL5989450A', 'error': 'bad_data'} |
| X | Gaius | https://openlibrary.org/authors/merge?key=OL134502A&key=OL4675154A&key=OL6002146A | {'message': 'expected /type/author, found /type/delete', 'at': {'property': 'authors', 'key': '/books/OL20496191M'}, 'value': '/authors/OL6036269A', 'error': 'bad_data'} |
| X | Carl Gustav Jung | https://openlibrary.org/authors/merge?key=OL17370A&key=OL2677210A | {'message': 'expected /type/author, found /type/redirect', 'at': {'property': 'authors', 'key': '/books/OL12811553M'}, 'value': '/authors/OL2660553A', 'error': 'bad_data'} |
| X | Michel-Jean Sedaine | https://openlibrary.org/authors/merge?key=OL735423A&key=OL6011794A | ?? |
| X | Friedrich August Wolf | https://openlibrary.org/authors/merge?key=OL4789371A&key=OL6011897A | ?? |
| X | Gottfried Hermann | https://openlibrary.org/authors/merge?key=OL357738A&key=OL5999368A | ?? |
| X | Friedrich Wimmer | https://openlibrary.org/authors/merge?key=OL4277168A&key=OL6039003A | ?? |
| X | Philipp Karl Buttmann | https://openlibrary.org/authors/merge?key=OL2557977A&key=OL5998002A | ?? |
| X | Hermann Diels | https://openlibrary.org/authors/merge?key=OL133119A&key=OL6011208A | ?? |
| X | Jean-François de La Harpe | https://openlibrary.org/authors/merge?key=OL1271659A&key=OL5996409A | ?? |
| X | Lope de Vega | https://openlibrary.org/authors/merge?key=OL80534A&key=OL2693344A | ?? |
| X | Carl Gustav Jung | https://openlibrary.org/authors/merge?key=OL17370A&key=OL2677210A | ?? |
| X | Gilbert Murray | https://openlibrary.org/authors/merge?key=OL125439A&key= | ?? |

    Expectation

    Merge should happen

    Proposal & Constraints

    Stakeholders

    Data @hornc Detail 3 Work In Progress Bug merging

    All 50 comments

    There are many 2008 AMZ-sourced books of music notation for which the isbn seems to be a dead end at OCLC, or even the authorship is misattributed to the publisher at amz. For some of these BWB can find a cover by isbn, but it seems to have the same crap metadata. We either need to cast a wider net in other databases, or just quarantine them somehow and trust that real books will resurface.
    See author Isagani Intano for a few examples.

    The problem author is
    https://openlibrary.org/authors/OL4357202A/Ludwig_Van_Beethoven
    which will not merge into the master OL127077A

    Tracking down the likely problem item:
    OL11122403M
    https://openlibrary.org/books/OL11122403M/Piano_Literature_of_the_17th_18th_and_19th_Centuries_Books_6B

    Through the UI, this doesn't even look like a LVB item since the author UI data is coming from the work https://openlibrary.org/works/OL15097322W/Piano_Literature_of_the_17th_18th_and_19th_Centuries_Books_6B

    However, if you look at the Edition's blank cover tile, it shows an expanded list of authors, which comes from the Edition's metadata: https://openlibrary.org/books/OL11122403M.json which shows a list of authors...

    authors: [ { key: "/authors/OL47923A" }, { key: "/authors/OL4357202A" }, { key: "/authors/OL2779314A" }, { key: "/authors/OL126336A" }, { key: "/authors/OL3338683A" }, { key: "/authors/OL2779506A" }, { key: "/authors/OL38111A" }, { key: "/authors/OL3551619A" } ],

    OL47923A is a redirect... to Mozart https://openlibrary.org/authors/OL5017833A/Wolfgang_Amadeus_Mozart
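A minimal sketch of the check described above: given an edition's JSON and the records its author keys point to, list every reference that is not a `/type/author`. The function name `non_author_refs` and the in-memory `records` lookup are hypothetical; only the keys and the redirect target come from this thread, and the `location` field on the redirect record is an assumption about how redirects are stored.

```python
def non_author_refs(edition, records):
    """Return (author_key, type_key) pairs for edition authors that are not /type/author."""
    problems = []
    for ref in edition.get("authors", []):
        record = records.get(ref["key"])
        # A missing record is reported with a type of None.
        type_key = record["type"]["key"] if record else None
        if type_key != "/type/author":
            problems.append((ref["key"], type_key))
    return problems


# First two authors of /books/OL11122403M, as listed above.
edition = {
    "key": "/books/OL11122403M",
    "authors": [{"key": "/authors/OL47923A"}, {"key": "/authors/OL4357202A"}],
}

# Hypothetical local copies of the author records (assumed shape).
records = {
    "/authors/OL47923A": {
        "type": {"key": "/type/redirect"},
        "location": "/authors/OL5017833A",
    },
    "/authors/OL4357202A": {"type": {"key": "/type/author"}},
}

print(non_author_refs(edition, records))
```

Run against every edition of the author being merged, a check like this would point at the record that blocks the merge without relying on the UI.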

    So there's a couple of issues here:

    1. Merge authors is swallowing errors without any indication of what or where the problem is.
    2. Edition page UI is hiding authors listed specifically in the edition metadata, which can sometimes differ from the work.

    and maybe 3., a contributing factor making this even harder to debug: #183

    and 4. Why is merge authors even breaking on this? Why can't it simply update the affected items' authors and move on?

    ANS: I think it relates to #1445 where some items' data can be in a state where their authors are redirects, but re-saving throws an error. <<< this seems to be the root cause of a number of these redirect problems.

    A past PR tried to deal with a similar problem: #2186. I need to investigate whether that fix needs to be applied in another location, or whether there is a gap in the fix. Either way, something is missing.

    The author view page is swallowing author merge progress and errors, and I think this problem is occurring on other pages that used to have an error flash message.

    From debugging this I see there is a message div
    https://github.com/internetarchive/openlibrary/blob/17cd1728e21a8dafd3dffcebc93dee9a534c37ec/openlibrary/templates/type/author/view.html#L92-L118

    that is styled via the .hidden class as display: none !important; in page-user.css

    There are scripts that attempt to .fadeIn() those hidden sub-divs. I _think_ the !important is preventing the fadein, but when I remove it, they become permanently visible.

    @jdlrobson, any ideas or tips? I'm interested in getting this working to tidy up the author merging feature, as it is blocking me and affecting librarians, but I have a feeling this hidden-message problem may be the cause of other missing error messages too.

    @hornc @jdlrobson The !important is very likely related; see the thread starting from https://github.com/internetarchive/openlibrary/pull/2223#issuecomment-513393435

    Sorry for the pain (again). The !important was added in 0f9030c1047d5a337fc292a09085d7c353c85424.

    The problem with not using !important, is if you have

    <div class="hidden button">foo</div>
    

    and a rule of equal specificity:

    .button { display: inline-block; }
    

    the button is not actually hidden against expectations.
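To make the trade-off concrete, here is a deliberately simplified model of how a browser picks the winning `display` declaration. It is a sketch, not a CSS engine: it ignores origins, layers and inheritance, and the `Declaration` class and `winning_display` function are invented for illustration. It assumes `.button` appears after `.hidden` in the stylesheet, and that a script reveals the element by writing an inline `display` value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Declaration:
    source: str          # label only, e.g. the selector text
    value: str           # value of the `display` property
    specificity: tuple   # (inline, ids, classes, elements)
    order: int           # position in source order; later wins ties
    important: bool = False


def winning_display(declarations):
    """Pick the winner: !important first, then specificity, then source order."""
    return max(declarations, key=lambda d: (d.important, d.specificity, d.order))


# Without !important: equal specificity, so the later rule (.button) wins.
hidden = Declaration(".hidden", "none", (0, 0, 1, 0), order=1)
button = Declaration(".button", "inline-block", (0, 0, 1, 0), order=2)
print(winning_display([hidden, button]).value)

# With !important: even an inline style written by a script cannot override it.
hidden_important = Declaration(".hidden", "none", (0, 0, 1, 0), order=1, important=True)
inline = Declaration("style attribute", "block", (1, 0, 0, 0), order=3)
print(winning_display([hidden_important, button, inline]).value)
```

The first case is the button that stays visible; the second is consistent with the message div that never fades in.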

    I've been trying to move us more in a BEM direction so these specificity rules become more of a pain.

    The following grep yields 6 results:

    removeClass('hidden');
    

    and 4 for:

    addClass('hidden');
    

    In this case replacing:

    class="hidden"
    

    with

    style="display: none;"
    

    would work.

    Other things we could try:

    .button[style] { display: block;}
    

    (assumes the style attribute is removed on a hide, which may not be the case).

    @cdrini I know you are opinionated on this one so what do you think?

    @jdlrobson I don't disagree with the logic, I disagree with the execution :P display: none seems like a good solution to me (not the style thing). I don't like how we're playing whack-a-mole with bugs on production. We should either 1) make sure all the hidden classes are changed to display: none (since that was the implied meaning before the commit 6mo ago; this would need to be done manually), or 2) remove the !important and do (1) later. I don't like that we're in this in-between state where we've changed the meaning of the hidden class without checking what depended on it.

    Yeh I messed up on the execution 6 months ago :( 321d120 looks like the fix here then, provided it can be tested and works.

    Hopefully the whack-a-mole will die down. I'd love to not do that, but without reliably knowing which templates are abandonware and which are still active, and given that the JS is littered across templates as well as JS files, the task is a little overwhelming and demoralising (I've spent 30 minutes trying to check workflows without making any progress and now just feel sad), so I think this is the best approach for the time being. It's easy and quick to fix once the problem is identified, and as the breaker of these things, please tag me when you see them.

    Two more examples have been added from Wikidata suggested merges. I can confirm that the cosmetic issue of the hidden error message is fixed and the merge failure message is correctly displayed to the user, but the underlying data and/or merging issues still remain.

    Although the "Arg. That didn't work" error is displayed, the (important) error details are missing. In the D. S. Margoliouth case, they pinpoint the exact record that it's unhappy about:

    {'message': 'expected /type/author, found /type/redirect', 'at': {'property': 'authors', 'key': '/books/OL20457133M'}, 'value': '/authors/OL5989450A', 'error': 'bad_data'}

    Since we basically ignore edition authors (and probably don't care if it's a conflicting/wrong author as long as it's not a redirect), having this cause an author merge to fail seems a little silly to me.

    We should either:

    • silently fix the error and update the record with the redirect target, or
    • ignore edition authors altogether
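The first option could look roughly like the sketch below. It is not the merge code in the repository: `resolve_author_key` and `repair_edition_authors` are hypothetical helpers, records are plain dicts held in memory, and the `location` field on redirect records is an assumed shape. Dropping references to deleted authors is an extra assumption, not something proposed above.

```python
def resolve_author_key(key, records, max_hops=10):
    """Follow redirects until a /type/author record is found.

    Returns the final key, or None if the chain ends in a deleted or
    missing record, loops, or exceeds max_hops.
    """
    seen = set()
    for _ in range(max_hops):
        record = records.get(key)
        if record is None or key in seen:
            return None
        seen.add(key)
        type_key = record["type"]["key"]
        if type_key == "/type/author":
            return key
        if type_key != "/type/redirect":
            return None  # e.g. /type/delete
        key = record["location"]
    return None


def repair_edition_authors(edition, records):
    """Return a copy of the edition with redirects resolved and dead refs dropped."""
    repaired = []
    for ref in edition.get("authors", []):
        target = resolve_author_key(ref["key"], records)
        if target is not None and {"key": target} not in repaired:
            repaired.append({"key": target})
    return {**edition, "authors": repaired}


# Hypothetical records; the redirect and the deleted key are taken from this thread.
records = {
    "/authors/OL47923A": {"type": {"key": "/type/redirect"}, "location": "/authors/OL5017833A"},
    "/authors/OL5017833A": {"type": {"key": "/type/author"}},
    "/authors/OL4357202A": {"type": {"key": "/type/author"}},
    "/authors/OL6036269A": {"type": {"key": "/type/delete"}},
}

# Made-up edition combining those references for illustration.
edition = {
    "key": "/books/EXAMPLE",
    "authors": [
        {"key": "/authors/OL47923A"},
        {"key": "/authors/OL4357202A"},
        {"key": "/authors/OL6036269A"},
    ],
}

print(repair_edition_authors(edition, records)["authors"])
```

Returning a copy rather than editing in place keeps the original record available for a diff before anything is saved.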

    As a side note, when the error message says "We've made a note of it" that makes it sound like it's logged someplace where someone will notice and fix it. Does it get logged? Does anyone review the log?

    The error for @Camillo-Pellizzari's merge was:

    {'message': 'expected /type/author, found /type/delete', 'at': {'property': 'authors', 'key': '/books/OL20496191M'}, 'value': '/authors/OL6036269A', 'error': 'bad_data'}

    The author record was deleted by @hornc's CleanupBot back in 2017 because it wasn't used on any works, but it was still used on this edition record. Now, because there's no way to edit edition authors, this can't be cleaned up without a programmer's assistance.

    That example has a single work incorrectly attributed to the OL2677210A Carl Jung: "The Workbook" is a 3 volume commercial art directory, of which "Portfolio" is volume 2. It's a good thing the author merge errored, though how that happened is (too) obscure.

    @seabelis
    Ouch! That's 59 work records and two author records for one multivolume work with various editions, commentaries, and translations. We really need a wiki on how best to structure such things, but that's a separate discussion. In the meantime, I've manually changed all the work records from the latter to link the former author record instead.

    Thanks for doing that. A user submitted this, so I had not even noticed that about the works.

    I've merged the two Gaius author records together, but there's a third that should be merged as well, I think, but is erroring on the merge: https://openlibrary.org/authors/OL6002146A/Gaius

    Even after moving all the works from OL6002146A to OL134502A, https://openlibrary.org/authors/OL134502A/Gaius?merge=true&duplicates=OL6002146A still errors, and the redirect is not created. Bizarre....

    Hmm, the problem author records seem to all have been created by Import Bot on 27 Oct 2008. Other oddities that might be hints: They include an obsolete "id=" field that is removed by any direct edit to that author record, but still can't be merged, so that isn't the problem. The trailing space after the author name might be a factor, or the "personal name=" field seen in some cases.

    Sigh, that list is getting long :( Thanks @Camillo-Pellizzari ; add to the list.

    Added :+1:

    Note this will likely be fixed by https://github.com/internetarchive/openlibrary/issues/2553

    @Camillo-Pellizzari
    This smells like just another legacy of our mangled diacritics. I've managed to merge most of the redundant author records to Émile Egger at https://openlibrary.org/authors/OL4557532A/ but that last record at https://openlibrary.org/authors/OL6003522A is stubborn.

    @Camillo-Pellizzari
    A clue!!!!
    I moved the 16 Mayhew works to the main author record manually, but one orphan edition record persists, cached perhaps. The authors still won't merge. That one edition has the malformed pseudowork path https://openlibrary.org/works/OL20459197M with the old author identified in the edition record, conflicting with the correct author shown in the work record https://openlibrary.org/works/OL2788965W.
    No way to know which of these oddities is the cause of the merge fail, but if an admin can tweak it, it could be instructive:

    {"publishers": ["Chatto & Windus"], "classifications": {}, "subtitle": "illustrations of the humour, pathos, and peculiarities of London life", "title": "London characters", "notes": "1e uitg. (1874) met de aanduiding \"By Henry Mayhew and other writers\" (Vgl. Toole-Stott, no. 491.).", "identifiers": {}, "ocaid": "londoncharacter00gilbgoog", "covers": [9182853], "created": {"type": "/type/datetime", "value": "2008-10-27T03:19:48.641147"}, "languages": [{"key": "/languages/eng"}], "last_modified": {"type": "/type/datetime", "value": "2019-12-11T23:49:48.914594"}, "latest_revision": 8, "key": "/books/OL20459197M", "authors": [{"key": "/authors/OL5239874A"}, {"key": "/authors/OL1331553A"}], "publish_date": "1881", "publish_places": ["London"], "works": [{"key": "/works/OL2788965W"}], "type": {"key": "/type/edition"}, "oclc_numbers": ["67342886"], "revision": 8}
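One way to surface this kind of mismatch is to compare the author keys on the edition with those on its work. This is only a sketch: `edition_only_authors` is a hypothetical helper, the work's author list below is made up for illustration (the thread does not show it), and the `{"author": {"key": ...}}` shape for work authors is an assumption.

```python
def edition_only_authors(edition, work):
    """Return author keys that appear on the edition but not on its work."""
    work_keys = {entry["author"]["key"] for entry in work.get("authors", [])}
    return [ref["key"] for ref in edition.get("authors", []) if ref["key"] not in work_keys]


# Author keys copied from the edition record above.
edition = {
    "key": "/books/OL20459197M",
    "authors": [{"key": "/authors/OL5239874A"}, {"key": "/authors/OL1331553A"}],
    "works": [{"key": "/works/OL2788965W"}],
}

# Hypothetical work record: the author list here is invented for the example.
work = {
    "key": "/works/OL2788965W",
    "authors": [{"author": {"key": "/authors/OL1331553A"}}],
}

print(edition_only_authors(edition, work))
```

Any key this returns is invisible on the edition page yet still takes part in an author merge.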

    I'll investigate this one when I have time to write some code to do it automatically: https://openlibrary.org/authors/OL4280920A/Federico_Garc%C3%ADa_Lorca?merge=true&duplicates=OL6887222A,OL4122786A,OL3973784A,OL6250916A,OL6404110A,OL3210186A,OL7313848A,OL7306164A,OL7327570A,OL7386673A,OL7392312A,OL7416035A,OL7687411A

    @seabelis Found another https://openlibrary.org/authors/merge?key=OL4586796A&key=OL3206959A

    All of the editions list two authors, OL2629754A and OL3206959A, the first of which is a redirect.

    Of course, edition authors aren't editable, so this can't be fixed. I thought I could hack it by editing the YAML https://openlibrary.org/books/OL13263866M.yml?m=edit but no such luck - Permission Denied.

    I was able to remove the authors from the linked edition. https://openlibrary.org/books/OL13263866M/Relato_de_un_n%C3%A1ufrago?_compare=Compare&b=6&a=5&m=diff

    I think I recall from a different conversation that removing authors from editions is not preferred. I thought I could just clear the authors from the edition and then reapply the valid author but this throws an error,
    AttributeError: 'str' object has no attribute 'olid'

    > I think I recall from a different conversation that removing authors from editions is not preferred.

    That's not my opinion. Since they can't be edited and aren't automatically kept in sync, I think they're more trouble than they're worth.

    > I was able to remove the authors from the linked edition. https://openlibrary.org/books/OL13263866M/Relato_de_un_n%C3%A1ufrago?_compare=Compare&b=6&a=5&m=diff

    Were you able to do that through the web UI or did you use one of the APIs?

    @tfmorris openlibrary-client via the colaboratory notebook @cdrini helped me set up. I replaced the edition authors with an empty object; it's the same way I removed contributors previously when the UI wouldn't cooperate. I'm not sure this is the best way, but it allowed me to edit the work without the previous error.

    I have gone through and resolved all the data issues noted above, and performed the merges (some worked without any further changes, they must have been resolved elsewhere).

    The exact errors for each merge are visible in the HTTP 400 result of merge.json which can be seen in a browser dev tools console e.g. :

    {'message': 'expected /type/author, found /type/redirect', 'at': {'property': 'authors', 'key': '/books/OL13263870M'}, 'value': '/authors/OL2629754A', 'error': 'bad_data'}
    

    These messages used to appear on the merge results page to at least point to the problem edition. Now they don't.
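Until the details are shown on the page again, the error text can be picked apart by hand. The sketch below assumes the text copied from dev tools is exactly the Python-style dict shown above (single quotes, so not valid JSON); `problem_record` is a hypothetical helper name.

```python
import ast


def problem_record(error_text):
    """Parse a Python-literal error dict and pull out the fields needed to find the bad record."""
    err = ast.literal_eval(error_text)  # safe for literals; does not execute code
    return {
        "edition": err["at"]["key"],
        "property": err["at"]["property"],
        "bad_value": err["value"],
        "message": err["message"],
    }


error_text = (
    "{'message': 'expected /type/author, found /type/redirect', "
    "'at': {'property': 'authors', 'key': '/books/OL13263870M'}, "
    "'value': '/authors/OL2629754A', 'error': 'bad_data'}"
)

info = problem_record(error_text)
print(info["edition"], info["bad_value"])
```

Here `edition` is the record to fix and `bad_value` is the stale author reference inside it.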

    Thank you, @hornc .
