React-pdf: Text selection sometimes jumps during selecting

Created on 15 Nov 2017  ·  15 Comments  ·  Source: wojtekmaj/react-pdf

The divs containing each piece of selectable text render out of order. This causes the text selection to jump around while selecting. This can be seen in the demo.

bug

All 15 comments

Hey @nbwoodward,
sadly, PDFs hardly ever contain information in a logical order. PDFs were originally meant to ensure that documents looked the same when printed, regardless of the computer used to print. So if you didn't do your PDF professionally, in Adobe Acrobat, the text selection and copying might not give you the best possible results. This is entirely out of my control.

The PDF used for the demo is actually done right, so selection shouldn't jump a lot between paragraphs. If you copy text, you will copy just the text, and there won't be any floating content in between paragraphs, for example.

Yeah I thought that might be the case. Somehow PDF renderers (for example the one in Chrome) seem to be able to sort that out.

I was thinking that since we have knowledge about the position of each text element on a page, we could use that to sort the array of text before rendering it. I may have a project coming up that requires good text selection of arbitrary PDFs in the browser. If it goes through I will see about forking and working it out.
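The sorting idea above could look roughly like the sketch below. It is only an illustration, not code from react-pdf: the `TextItem` shape is modeled on the items PDF.js returns from `getTextContent()` (where the last two entries of `transform` are the x and y position), while `sortByReadingOrder` and `lineTolerance` are hypothetical names. It assumes a single-column page; a multi-column PDF would need column detection first, otherwise lines from different columns get interleaved.

```typescript
// Assumed shape, modeled on PDF.js text items: transform is a 6-element
// matrix [a, b, c, d, x, y]; only str and transform are used here.
interface TextItem {
  str: string;
  transform: number[];
}

// Hypothetical helper: returns a sorted copy, top-to-bottom, then left-to-right.
// lineTolerance (in PDF units) groups items with nearly equal y into one line.
function sortByReadingOrder(items: TextItem[], lineTolerance = 2): TextItem[] {
  // Bucketing y into a line index keeps the comparator consistent.
  const lineOf = (item: TextItem): number =>
    Math.round(item.transform[5] / lineTolerance);

  return [...items].sort((a, b) => {
    const lineDiff = lineOf(b) - lineOf(a); // PDF origin is bottom-left, so larger y comes first
    if (lineDiff !== 0) return lineDiff;
    return a.transform[4] - b.transform[4]; // same line: smaller x comes first
  });
}
```

The sorted array would then be rendered into the text layer in place of the original order.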

How about the one in Firefox? React-PDF is built on PDF.js, which is exactly the same engine that Firefox uses internally.

Also, please keep in mind that Mozilla is still heavily investing in PDF.js and new releases are coming insanely often, so one day it might be fixed on their side. In that case, no intervention would be needed from my side - I just render text content in whatever order they give it to me.

The built-in Firefox PDF viewer works well too. So it sounds like PDF viewers are doing some sort of processing of the text between parsing the document and rendering it.

I thought you were maybe asynchronously pushing DOM nodes on the text layer but after looking through the code it seems more like the text is just showing up in a weird order out of the PDF processor. But I could be wrong.

Hey @nbwoodward - thanks for the info. It would help if you could attach a minimal example of a PDF in which selection works well in Firefox but doesn't in React-PDF. I'll have a look into that. But not today, it's 3 AM 😆

In the meantime - I'm super excited to announce that in v2.3.0 experimental support for SVG rendering was added. This gets rid of text layers altogether, so the problems with them should be gone too ;)

You can check out this feature on my online test suite too:

http://projekty.wojtekmaj.pl/react-pdf/test/

Definitely, I'll try to get a demo up in the next couple days.

I just tried the SVG rendering. It looks great, and totally bypasses the highlighting issue, awesome!

Here's a demo of a PDF that acts one way in Firefox and another way in react-pdf.

http://users.neptuneinc.org/nwoodward/react_pdf/

As a side note, it looks like Firefox has not gotten PDF selection perfect (or this is a particularly weird PDF) because the selection in the left column doesn't work very well on either react-pdf or Firefox. But the right column does work in Firefox but is broken in react-pdf.

So, perhaps sorting the text layer before render could make react-pdf work better than Firefox.

Let me know if you have any questions.

Again, maybe the SVG rendering obviates all of this.

God damn, that's a good demo! I expected something much smaller :D Thank you, I'll have a look at this next weekend (or sooner if I still manage to do something after my full-time job)!

Okay, I investigated and the reason for this behavior is that Mozilla hacks the selection system. The DOM is actually the same, but the moment you start selecting, it covers up the entire page with an additional <div>, preventing you from hovering over empty space between the text, but somehow still allowing you to continue selecting text!

...and I have a better idea than Mozilla...

Ha! Good find. That's a pretty funny solution. Perhaps it will be reasonably easy to replicate?

I think Mozilla's approach makes no sense. Set pointer-events: none on the textContent container and pointer-events: all on all its children, and we're done. I'm testing the fix at the moment.
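As a rough sketch of the pointer-events idea in the comment above: the container ignores the pointer, so the empty space between text items cannot be hovered, while each child still receives pointer events. The helper name `buildTextLayerCss` and the `.text-content` selector are hypothetical and not taken from react-pdf; the property values are the ones quoted in the thread. As the thread goes on to report, this only helped in Firefox.

```typescript
// Hypothetical helper: builds the two CSS rules described in the comment.
// containerSelector is whatever selector matches the text layer container.
function buildTextLayerCss(containerSelector: string): string {
  return [
    // The container itself no longer reacts to the pointer.
    `${containerSelector} { pointer-events: none; }`,
    // Its direct children (the text items) still do.
    `${containerSelector} > * { pointer-events: all; }`,
  ].join("\n");
}

// Example with an assumed class name:
const textLayerCss = buildTextLayerCss(".text-content");
```

The resulting string could be placed in a stylesheet or a `<style>` element.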

Bad news: both Mozilla's approach and mine fail on browsers other than Firefox (if you don't believe me, you can test Mozilla's viewer on Chrome here). So I'm going to push a minor Firefox-only fix and close this issue, since if Mozilla hasn't found a good solution in a couple of years of development, I highly doubt we could.

In case I'm horribly wrong, please let me know in this thread, and I'll make a fix both here and in PDF.js.

The fix will be applied in v2.4.0.

Aaaah bummer. You're right, that link works terribly in Chrome.

If/when I need good selection functionality (I may have a project that depends on it) I'll either try out the SVG rendering or fork and see what I can do.

Thanks for looking into it!

Sorry if I disappointed you :( I'll let you know if I come up with something, and promise me you'll do the same! ;)

Haha not at all. I appreciate you taking the time to check it out. I'll let you know if I come up with any bright ideas.
