Pdf.js: Remember view position after refreshing the page

Created on 8 Jan 2016  ·  34 Comments  ·  Source: mozilla/pdf.js

Currently the view position is saved by a hash based on the file contents. When the page reloads, we should also take the last position into account, because it matches the normal behavior of browsers. Usually, when you reload a web page, the scroll offset is restored (even if the content of the page changed).

The motivation for this change comes from experiencing the following broken workflow:

  1. Generate a local PDF file (file://....pdf).
  2. Open PDF with PDF.js and scroll to some chapter in the PDF file.
  3. Edit PDF file.
  4. Refresh the PDF.js viewer (e.g. with F5).
  5. Expected result: Retain the scroll position.
    Actual result: Page 1 is shown in the viewport.

Technical notes:

  • performance.navigation.type can be used to detect page reload versus navigations.
  • history.state is preserved when a page reloads.
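
The two technical notes above could be combined roughly as follows. This is a minimal sketch, not PDF.js code: `isReload`, `pickInitialView`, and the `{ page, scrollTop }` state shape are hypothetical names chosen for illustration. It assumes the viewer has previously stored its position in `history.state`, and it checks the newer Navigation Timing entry before falling back to `performance.navigation.type`.

```javascript
// Hypothetical helpers; the performance and history objects are passed in
// so the logic can be exercised outside a browser.

// Returns true when the current page load was triggered by a reload.
function isReload(perf) {
  // Newer browsers expose a PerformanceNavigationTiming entry.
  if (perf && typeof perf.getEntriesByType === "function") {
    const [entry] = perf.getEntriesByType("navigation");
    if (entry) {
      return entry.type === "reload";
    }
  }
  // Legacy API: performance.navigation.type === 1 is TYPE_RELOAD.
  return Boolean(perf && perf.navigation && perf.navigation.type === 1);
}

// Chooses the view to show first: on reload, prefer the position kept in
// history.state; otherwise use whatever the view history found (or null).
function pickInitialView(perf, hist, storedView) {
  const state = hist && hist.state;
  if (isReload(perf) && state && typeof state.page === "number") {
    return { page: state.page, scrollTop: state.scrollTop || 0 };
  }
  return storedView || null;
}
```

In a browser this might be called as `pickInitialView(window.performance, window.history, viewFromHistory)`, with the position written on scroll via `history.replaceState({ page, scrollTop }, "")`.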

All 34 comments

That would be awesome.

I'm a student at Seneca college learning open source, and I was hoping to work on this bug for my course. If no one else is currently working on it, I'd like to give it a try.

Nobody indicated that they're working on it, so it's all yours! Feel free to contact us on IRC or leave a message here in case you have questions.

Hey, thanks a lot for the quick reply. I really want to contribute to an open source project. I will start working on it right away. Since this is my first time doing something like this, is there anything that I should know about?
Thanks a lot!

I think all necessary information for this patch is listed on https://github.com/mozilla/pdf.js/wiki/Contributing. Unless you're touching files in the src/ folder (which I do not expect; I expect you'll only need to touch files in the web/ folder), you only need to run gulp lint and gulp unittest to verify your changes. You can run gulp server to start a local server to test your changes in the browser. If you have more questions, check the wiki, contact us on IRC or ask here. Good luck!

Thanks, I will start reading through the files.

I'm looking into this but I don't know if I understood the problem very well.

1 - Generate a local PDF file (file://....pdf).
3 - Edit PDF file.

So the issue is only related to building/generating my own PDF? E.g. building it with some pdf generator like latex/jspdf?

I did the following and couldn't reproduce:

  1. Built myself a PDF and opened it with http://localhost:8888/web/viewer.html?file=/andrei_test/a4.pdf
  2. navigated to page 3.
  3. Then edited the pdf (added more text in page3)
  4. refreshed and saw the new content on page3 appear but I was still at page 3, pdf.js didn't move me to page1.

Before this, I just tried refreshing the default PDF from viewer.html a few times and I was under the impression that the page wasn't remembered at all. But now I think I understand: if I refresh too fast (before the internal hashing is done to remember where to scroll back to after a refresh), it just takes me to the place I was at before my last scroll, not to my last position. But if I wait half a second more and then refresh, it's fine and I get scrolled back to where I last scrolled.

So I'm not really sure what I'm after here. Could you give more details on how to reproduce? Thanks!

I can't test again at the moment but in step 4 you used to get that you
jump to page 1 after refresh (if the document changed). That said I was not
working locally but over a server connection. Not sure if that could make a
difference.

I just retested and was definitely moved back to page one on reload. This
is with chrome browser if that makes a difference. And still working
remotely with http-server.
By the way sharelatex, rstudio and others are using pdf.js backends and
have solved this issue already apparently. Could we not just ask them to
contribute a patch?

I can confirm this problem is reproducible. Not only did the page move back, the zoom also got reset. I suspect this is because the hash changed when we modified the file.

@timvandermeij Is this up for grabs? I'd like to take a crack at it!

I can't figure it out

BrianNgo: I can confirm this problem is reproducible. Not only did the page move back, the zoom also got reset. I suspect this is because the hash changed when we modified the file.

@BrianNgo Did you work on local, with the code, or how did you test this out? Could you give some step-by-step reproducing info?

yashamon: And still working remotely with http-server

@yashamon could you explain your setup in more detail? It might depend on that, since when I tried running a local server and accessing it at localhost (e.g. http://localhost:8888/web/viewer.html?file=/andrei_test/a4.pdf), I couldn't reproduce this. I was also using Chrome.

Jolo510: @timvandermeij Is this up for grabs? I'd like to take a crack at it!

@Jolo510 It's up for grabs, go for it. I'm not working on it, I couldn't reproduce it last time I tried.

The issue here is that the file changed only slightly, but the hash changed completely. For the purpose of testing, the actual PDF content does not matter, you only need to ensure that the PDFs that you are testing have different hashes.

To reproduce more reliably, you could take a set of completely unrelated PDF files (e.g. the PDFs in test/pdfs/), and overwrite a PDF file before reloading PDF.js (with the view set to page 2, so that you will see the difference between page 1 and page 2). In this way, the same file path will point to a different file and you can see the bug in action.

@andreip Yes, I am testing it on local with Chrome. What I did is open the pdf similar to what you had: http://localhost:8888/web/viewer.html?file=/andrei_test/a4.pdf. Then I used libreoffice to modify the file and exported it. Refreshed the page and the bug happened.

In my opinion, this is not really a bug. By modifying the file, the app perceives the current file as a new file (which is the safest thing to assume). Thus the app should reset its history to view it as a new file.

The real issue would be refreshing the file too quickly, before the internal hashing is done.

@andreip Awesome! I'll see if I can repro it locally.

I plan on getting the app running locally tonight. Then make some time in the next day or two to reproduce the bug and dig into the code.

@BrianNgo If the issue is refreshing the file too quickly, what would be a potential fix?

Any progress on this?

@yashamon Nope, I haven't made any progress on this.

@Rob--W
Hey,
I see many people taking a shot at this. I'm giving it a try. Please let me know if we need to write tests for this as well.

@ankitverma2211 If possible, tests would be great.
However, we don't have automated tests for this kind of feature, so if the patch looks reasonable and passes a manual test, then we would accept it too.

I would like to start on this. Is anyone else currently working on this?

Not that I know of. Feel free to work on this!

Sure, I am starting on this. I will ping you guys on IRC if I need any help.

@timvandermeij I have gone through all the code that is involved in rendering a PDF file.
It uses local storage to store the pdfjs view history, with the files kept in an array. Each element stores the fingerprint of the file and other metadata about the last view. When we modify a file, its fingerprint changes, and for the new fingerprint we don't have any view history.

my old file fingerprint => 14ecd8cdbbf6f76f04030d59025b5937

fingerprint after file change => 619c4c4f872e96e6514b25c6a1ae03f2

As far as I can tell from the fingerprint calculation, a document's fingerprint depends on the content and the PDF trailer.
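
To illustrate why a changed fingerprint loses the position, here is a simplified model of a fingerprint-keyed view history. It is not the actual PDF.js implementation: `ViewHistoryModel`, the `files` array layout, the `viewHistory` storage key, and the in-memory `Map` standing in for local storage are all assumptions made for illustration. Only the two fingerprints come from the comment above.

```javascript
// Simplified, hypothetical model of a view history stored as an array of
// entries, each identified by the document fingerprint.
class ViewHistoryModel {
  constructor(storage, key = "viewHistory") {
    this.storage = storage; // anything with get(key) / set(key, value)
    this.key = key;
  }

  // Reads the serialized database, or starts with an empty one.
  _load() {
    const raw = this.storage.get(this.key);
    return raw ? JSON.parse(raw) : { files: [] };
  }

  // Stores (or updates) the last view for a fingerprint.
  save(fingerprint, view) {
    const db = this._load();
    const existing = db.files.find((f) => f.fingerprint === fingerprint);
    if (existing) {
      Object.assign(existing, view);
    } else {
      db.files.push({ fingerprint, ...view });
    }
    this.storage.set(this.key, JSON.stringify(db));
  }

  // Returns the stored entry, or null when the fingerprint is unknown.
  lookup(fingerprint) {
    const entry = this._load().files.find((f) => f.fingerprint === fingerprint);
    return entry || null;
  }
}

const viewHistory = new ViewHistoryModel(new Map());
viewHistory.save("14ecd8cdbbf6f76f04030d59025b5937", { page: 3, zoom: "auto" });

// Same file path, but the edited file reports a different fingerprint,
// so the lookup misses and there is nothing to restore.
const before = viewHistory.lookup("14ecd8cdbbf6f76f04030d59025b5937");
const after = viewHistory.lookup("619c4c4f872e96e6514b25c6a1ae03f2");
```

In this model `before` holds the saved page and zoom while `after` is `null`, which matches the observed jump back to page 1 with the zoom reset.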

Here are some references:

Fingerprint calculation:
https://github.com/mozilla/pdf.js/blob/58c3ea08202becf007c304512c44726719acb508/src/core/core.js#L513

Stack Overflow reference:
https://stackoverflow.com/questions/33309378/using-fingerprint-generated-by-pdfjs-as-unique-id-for-a-pdf

Let me know what you think about this. Should we close this issue?

Hi Rahul,

It might help if you check out Sharelatex in action, which uses pdf.js as a
backend and already has a workaround: it rerenders PDFs after a latex source
code change, which will certainly change any hash, while keeping the view
position. I believe their extensions are open source on github, but I don't
have a link ready.

It would be a great help if you could share the link to the repo that is responsible for this. It would then be a feature for this repo rather than a bug.

I used to have a greasemonkey script that remapped the "C-r" key to a click on "viewBookmark", which basically solved this issue for me. It stopped working after some version of Firefox. It looks like greasemonkey is not loaded in pdf.js. Is that intended?

EDIT: after a bit of search I think this is intentional - https://discourse.mozilla.org/t/extensions-on-pdfjs-pages/28441

@timvandermeij @yashamon

I had a look at the Sharelatex repo. They do it by keying pdfjs.history on a projectId rather than on the fingerprint. The fingerprint changes whenever the document changes, but the projectId for a particular document stays the same in Sharelatex.

I have a few questions in mind. I tried to reach you guys on IRC but didn't get a response.

Questions:

  1. Do we also need to maintain the page number when the PDF changes and the user opens the new file in a new tab,
    as is done with the current fingerprint method?
  2. If it only needs to work in the current tab, we can make use of sessions; otherwise we will have to append some more keys to view_history.
    Please guide me.
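
The idea of tracking the position under an identifier that survives edits (a projectId in Sharelatex's case) could look like the sketch below. The assumptions are mine, not from PDF.js or Sharelatex: the document URL is used as the stable key, `rememberView` and `restoreView` are hypothetical helpers, and the `storage` argument is any object with `getItem`/`setItem`. Passing `sessionStorage` would scope the fallback to the current tab, while `localStorage` would share it across tabs.

```javascript
// Hypothetical helpers: the view is stored under both the fingerprint
// and a stable key (here, the document URL).
function rememberView(storage, { fingerprint, url }, view) {
  const payload = JSON.stringify(view);
  storage.setItem(`view:fp:${fingerprint}`, payload);
  storage.setItem(`view:url:${url}`, payload);
}

// Prefers an exact fingerprint match, then falls back to the stable key so
// that a regenerated file at the same URL keeps its position.
function restoreView(storage, { fingerprint, url }) {
  const raw =
    storage.getItem(`view:fp:${fingerprint}`) ||
    storage.getItem(`view:url:${url}`);
  return raw ? JSON.parse(raw) : null;
}

// Minimal in-memory stand-in with the getItem/setItem shape of Web Storage.
function createMemoryStorage() {
  const items = new Map();
  return {
    getItem: (key) => (items.has(key) ? items.get(key) : null),
    setItem: (key, value) => items.set(key, String(value)),
  };
}
```

One trade-off of this design: a completely unrelated file placed at the same URL would also inherit the old position, which is the same thing a browser does when it restores the scroll offset of a reloaded page whose content changed.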

Fixed in #10424.

Just tested this, still the same behavior. Refreshing the page keeps the view position only if the PDF file is unchanged; otherwise the view jumps to the first page. This is very easy to test with latex: pick a document, compile it and preview the PDF, add a random word in the latex source, then recompile and preview the PDF again. The pdfjs preview jumps to the first page. I am on release 2.2.191 in Chrome. I will check in Firefox when I get a chance.

I tested with firefox, it looks like on the latest release the issue is fixed, so is it just that Chrome version is lagging behind?

The Chrome extension version includes this patch. Its behavior may differ because of a difference in how the browser behaves. I once posted a detailed description of the problem at https://github.com/mozilla/pdf.js/commit/cdea75dc397f4eb4d6fd1f7d8a388c7d11df3452 (which was part of https://github.com/mozilla/pdf.js/pull/6200).

I submitted a similar issue #11359 with respect to *latex-generated pdf. It is actually not correct that this uses a "hash based on the file contents" @Rob--W. Rather, it is an ID embedded in the PDF upon creation, and how that ID is generated depends on the generating application, for *latex it is a hash based on the combination of the current time and the pathname of the tex-file. See my last comment there for a solution.
