Pdf.js: Formulas are painted too big.

Created on 23 Jan 2013  ·  62 comments  ·  Source: mozilla/pdf.js

Have a look at
http://arxiv.org/pdf/0707.3195v1.pdf

From page 2 onwards all formulas appear too big and are partially painted over the rest of the text. Other pdf viewers show this correctly.

This is how pdf.js shows page 2:
[Screenshot: wrong]

This is how evince shows page 2:
[Screenshot: right]

3-pdf-broken 4-font-conversion 4-os-linux

All 62 comments

I forgot to mention: I'm using PDF Viewer 0.7.1 with Firefox 18.0.1 on Ubuntu Linux.

Looks like it affects only linux

However Windows displays an error in the log as well:
[16:59:53.199] Error in parsing value for 'font-size'. Declaration dropped. @ http://arxiv.org/pdf/0707.3195v1.pdf
[16:59:53.569] Error in parsing value for 'font'. Declaration dropped. @ http://arxiv.org/pdf/0707.3195v1.pdf

@dabbeljuh, do you think it's related?

@yurydelendik : Seems not, I've used the stepper and both warnings appear on the first page (the first when setting the vertical arXiV-nr on the left, the second when finishing the page).

@yurydelendik I think we can close this. I cannot replicate the issue on Arch Linux x64, Firefox 22 and pdf.js development. Also no problems on Windows 7 x64.

I just double checked. The problem still persists on my machines.
(both Ubuntu Linux 12.04, Firefox 22, Pdf.js 0.8.1)

Maybe it is fixed in development pdf.js though, I don't know.

Let me know if you want me to test anything.

> Maybe it is fixed in development pdf.js though, I don't know.
> Let me know if you want me to test anything.

@kaymes Please try downloading the file and opening it (by clicking on the open file button, placed at the right hand side of the PDF.js toolbar) in the web viewer: http://mozilla.github.io/pdf.js/web/viewer.html.
This always uses the latest version of PDF.js, so please try that and see if the file is displayed correctly.

I just tried the online viewer at
http://mozilla.github.io/pdf.js/web/viewer.html. It still renders wrong
with all the braces rendered way too big. It was yet another computer
but also one running Ubuntu 12.04 with Firefox 22. So the issue might be
Linux specific (or even Ubuntu) but it shows on at least three
different machines.

Hmm, strange. In that case, I guess it would be Ubuntu specific, because there are no problems here on Arch Linux. Learning something new every day :-)

I just tested whether an add-on is at fault. I started Firefox
in safe mode and opened the document using the online viewer. The
problem still persists, so add-ons can be ruled out.

And I did one more test: I downloaded Firefox directly from Mozilla. So
all Ubuntu patches/modifications are gone. And then I started this one
in safe mode. The problem is still there.

I see this as well on Ubuntu 13.04

[18:42:43.639] "PDF 8d10792f8d2028a66825b6ce6ab5b3c1 [1.4 GPL Ghostscript GIT PRERELEASE 9.05 / dvips(k) 5.95a Copyright 2005 Radical Eye Software] (PDF.js: 0.8.510)

Is this still an issue with the latest PDF.js development version on http://mozilla.github.io/pdf.js/web/viewer.html? I cannot reproduce this on Arch Linux.

The problem still prevails (Ubuntu 12.04, FF26).

[Screenshot: selection_012]

Under Ubuntu-based Linux Mint this bug is also reproducible with Google Chrome 34 (and Firefox 32.0a1), so it's not exclusively a Firefox issue. Opera 12.16 renders it correctly, though.

I'm just going to use the words TeX and LaTeX and math in this comment so that people may find this bug.

This seems to be related to antialiasing: I use GNOME 3.14 on a Debian Jessie machine with Firefox 33.0.2. With both RGB and Grayscale antialiasing, when the Slight hinting option is selected (in GNOME Tweak Tool), I have the same problem. When I change to any of the other hinting options (Full, Medium or None), it looks as it is meant to look.

Note that in Firefox you at least have to refresh the tab to see the change.

For me (Arch Linux) this bug appears if I use an infinality patched fontconfig/freetype. Using the vanilla packages does not show this bug.

I don't know if it is related to the patches or to the configuration shipped with the patched packages.

Can reproduce in Ubuntu 14.04 in Chromium and Firefox. Note how the artifacts change when scaling the document. I've seen this bug in dozens of pdfTeX documents in pdf.js, e.g. sum indices are affected as well.

This really appears to be an upstream issue, though I am not sure where to file it.

I rebuilt Ubuntu 14.04 freetype/fontconfig without most of the distribution-specific patches, but the problem persists.

I also installed the latest freetype/fontconfig from Ubuntu 15.10, yet the problem persists.

Perhaps this needs to be filed as an upstream Firefox (Linux) bug? I'm just not sure if it is caused by Firefox or a particular Linux font library.

Here's a minimal testcase:

\documentclass{article}
\begin{document}
$$ \sqrt{\frac{1}{2}} $$
\end{document}

This renders differently on Ubuntu and Windows:

<style type="text/css">
* { margin:0; padding: 0 }
@font-face { font-family:"g_font_1";src:url(data:font/opentype;base64,T1RUTwAJAIAAAwAQQ0ZGIEG/4oQAAACcAAACWU9TLzJL4jE4AAAC+AAAAGBjbWFwAA0ApQAAA1gAAAAsaGVhZKsnRLAAAAOEAAAANmhoZWEDBvRyAAADvAAAACRobXR4BdwAAAAAA+AAAAAMbWF4cAADUAAAAAPsAAAABm5hbWUiztZPAAAD9AAAAgRwb3N0AAMAAAAABfgAAAAgAQAEBAABAQEOTU5QRUhJK0NNRVgxMAABAQFF+BsA+BwB+B0C+B4D+B8EHgoAH4uLHgoAH4uLDAdzHPRwHAWu+ZgFHQAAAKoPHQAAAAAQHQAAAK8RHQAAABwdAAABYhIABQEBDSAtOkBWZXJzaW9uIDAuMTFTZWUgb3JpZ2luYWwgbm90aWNlTU5QRUhJK0NNRVgxME1OUEVISStDTUVYMTBNZWRpdW0AAAAAAAAAAAADAQEDC62LDhwB9BwAABYOHAPoHABvFhwBYxz3jhUc//8GHP8oHAPqBRz/fRz/MgUc//kc//ccAAAc//4cAAAc//8IHAAAHP/8HAANHP/1HAABHP//CBwARBwAawUcAOcc+88FHAAhHAAAHAADHAAAHAAGHAAaCBwCJhwJHQUcAAIcAAccAAIcAAkcAAAcAAUIHAALHP/4HAAJHP/0Hhz/8BwAABz//Rz/8xz//Rz/8ggOHgoEeW8MCboKuguzkgwMs5IMDYsMDh0AAAAcEwBrAQECAwQFBgcICQoLDA0ODxAREhMUFRYXGBkaGxwdHh8gISIjJCUmJygpKissLS4vMDEyMzQ1Njc4OTo7PD0+P0BBQkNERUZHSElKS0xNTk9QUVJTVFVWV1hZWltcXV5fYGFiY2RlZmdoaWprbAsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLCwsLAAAAAAADAiQB9AAFAAACigK7AAAAjAKKArsAAAHfADEBAgAAAAAGAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAqMjEqAAAAcgByAwT0cABkAwQLkAAAAAAAAAAAAa8AAAAAAHIAAwAAAAEAAwABAAAADAAEACAAAAAEAAQAAQAAAHL//wAAAHL///+QAAEAAAAAAAEAAAAAEAAAAAAAXw889QAAA+gAAAAAngt+JwAAAACeC34nAAD0cA//AwQAAAARAAAAAAAAAAAAAQAAAwT0cAAA//8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMAAAAAAfQAAAPoAAAAAFAAAAMAAAAAABQA9gABAAAAAAAAABAAAAABAAAAAAABAA0AEAABAAAAAAACAAcAHQABAAAAAAADAAgAJAABAAAAAAAEAA0ALAABAAAAAAAFAAwAOQABAAAAAAAGAAAARQABAAAAAAAHAAcARQABAAAAAAAIAAcATAABAAAAAAAJAAcAUwADAAEECQAAACAAWgADAAEECQABABoAegADAAEECQACAA4AlAADAAEECQADABAAogADAAEECQAEABoAsgADAAEECQAFABgAzAADAAEECQAGAAAA5AADAAEECQAHAA4A5AADAAEECQAIAA4A8gADAAEECQAJAA4BAE9yaWdpbmFsIGxpY2VuY2VNTlBFSEkrQ01FWDEwVW5rbm93bnVuaXF1ZUlETU5QRUhJK0NNRVgxMFZlcnNpb24gMC4xMVVua25vd25Vbmtub3duVW5rbm93bgBPAHIAaQBnAGkAbgBhAGwAIABsAGkAYwBlAG4AYwBlAE0ATgBQAEUASABJACsAQwBNAEUAWAAxADAAVQBuAGsAbgBvAHcAbgB1AG4AaQBxAHUAZQBJAEQATQBOAFAARQBIAE
kAKwBDAE0ARQBYADEAMABWAGUAcgBzAGkAbwBuACAAMAAuADEAMQBVAG4AawBuAG8AdwBuAFUAbgBrAG4AbwB3AG4AVQBuAGsAbgBvAHcAbgADAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA);}
td { border: 1px solid #ccc; width: 9px; height: 10px; }
</style>
<table cellspacing="0" cellpadding="0" style="border-collapse:collapse;empty-cells:show">
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
<tr><td></td><td></td></tr>
</table>
<div style='font:16px "g_font_1";position:absolute;top:0;left:0'>r</div>

Don't forget that Chrome is also affected, so it's likely not a Firefox bug.
Screenshot: Chrome 44 under (Ubuntu-based) Linux Mint.

@jethrogb Thank you for filing this upstream with so many details! Hopefully this will get the issue fixed soon.

It seems that the freedesktop folks conclude that this is still in part a pdf.js bug. So there is probably still some work to do here.

Also, in the meantime, are there any easy workarounds other than zooming in until the bug disappears? I use sharelatex frequently, which seems to have a pdfjs viewer built-in.

It is a pdf.js fault. In a nutshell, pdf.js creates invalid fonts for those math symbols. Subsequently, the font auto-hinter comes to wrong conclusions, which triggers the wrong display.

Thus, pdf.js should fix the way PDF fonts are converted.

I've run into this problem with the Atom PDF view extension, just as further proof it's not a Firefox problem. Screenshots of how it looks in Atom, in PDF.js, and how it should look.

I can confirm this problem exists in Chrome in Ubuntu 15.10.

This bug should now be fixed with an up to date Freetype library installed (>= 2.6.2)

No, the real bug is in the Type 1 to OpenType conversion of pdf.js which creates invalid font glyphs, and AFAIK this is not yet fixed.

Oh, I misread the upstream bug. Sorry for the bad information.
By the way, I do not see this bug on either of my Debian machines (Jessie and Stretch).

the same here: no problems at all!
Arch Linux x86_64 and font stuff patched with infinality[*], firefox 45.0.2
(the same happens with chromium 50.0.2661.75-1)

[*]cairo-infinality-ultimate 1.14.6-1
fontconfig-infinality-ultimate 2.11.95-4
freetype2-infinality-ultimate 2.6.3-2

Apologies if this is inappropriate, but I will offer a $500 bounty for this bug being fixed.

At ShareLaTeX we use PDF.js to render the PDF produced from users' LaTeX documents, and so our users probably encounter this bug more often than the average user.

I couldn't find any precedent or comments regarding offering a bug bounty for this project, so I hope it's ok. Feel free to get in touch with me at [email protected]. If you'd prefer, we can set up the bounty in escrow with something like https://www.bountysource.com/, or if anyone else would like to add to the bounty.

BTW: This has been fixed upstream in freetype https://bugs.freedesktop.org/show_bug.cgi?id=91829

This has nothing to do with FreeType. As people keep saying, and as is thoroughly discussed in the FreeType issue you link, there is a bug in pdf.js's Type1 to OpenType converter.

> This has nothing to do with FreeType... there is a bug in pdf.js's Type1 to OpenType converter.

Actually, it is an issue related to FreeType, because only this engine experiences the issue. There might be an issue with pdf.js's converter, and it would be helpful to understand why it is happening. Unfortunately, the link above does not provide detailed explanations. More input from the FreeType experts would speed up the resolution of this bug.

A font file converted by pdf.js
The original font file embedded in the PDF

Choice quotes from the FreeType bug report:

The font in question has a root-sign at letter 's'.

The font contains a cmap that maps position `r' (not `s', BTW) to a glyph called `.notdef'. Since the auto-hinter accesses a font not by glyph names but by a Unicode mapping (either a real one or a synthesized), it believes that glyph `r' is present

Font cmaps must not lie to the auto-hinter!

Oh it's probably pdf.js's fault partially as well, perhaps the bogus cmap is coming from pdf.js, not from the original font. Someone needs to verify.

The original file `cmex10.pfb' (version 003.002) as delivered with TeXLive has a correct encoding vector, using glyph name `radicalbigg' at position 0x72. The subsetted `CMEX10.pfb' file in the FireFox bug report also has the correct glyph name.
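
The point of these quotes can be illustrated with a small sketch. This is not FreeType or pdf.js code: the `cmap` and `glyphNames` values are hypothetical stand-ins for the converted font's tables, and the sketch assumes the glyph reached through U+0072 is the radical that the original font names `radicalbigg`. It flags cmap entries that claim to be a Latin letter but point at a glyph that is not that letter, which is the kind of entry a Unicode-driven auto-hinter would be misled by.

```javascript
// Hypothetical, simplified view of a converted font: the cmap maps Unicode
// code points to glyph ids, and glyphNames gives the name of each glyph id.
const cmap = new Map([[0x72, 1]]); // U+0072 is the Latin letter 'r'
const glyphNames = ['.notdef', 'radicalbigg'];

// Report cmap entries in the basic Latin letter range whose glyph is not
// actually named after that letter.
function findMisleadingLatinEntries(cmap, glyphNames) {
  const misleading = [];
  for (const [codePoint, glyphId] of cmap) {
    const isLatinLetter =
      (codePoint >= 0x41 && codePoint <= 0x5a) ||
      (codePoint >= 0x61 && codePoint <= 0x7a);
    if (!isLatinLetter) continue;
    // For basic Latin letters the standard glyph name is the letter itself.
    const expectedName = String.fromCodePoint(codePoint);
    const actualName = glyphNames[glyphId];
    if (actualName !== expectedName) {
      misleading.push({ codePoint, expectedName, actualName });
    }
  }
  return misleading;
}

const misleadingEntries = findMisleadingLatinEntries(cmap, glyphNames);
console.log(misleadingEntries);
```

Here the single entry for U+0072 is reported, because the glyph behind it is a radical sign rather than the letter `r`.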

Tentative fix in #7482. I don't have resources to look into this more if testing fails, but it could be simple. The font in the pdf is a bit strange since it has a symbolic font, but also has unicode mappings for some of the symbols. Usually for symbolic fonts we move all glyphs to the private use area if there is only an identity unicode mapping.
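
A minimal sketch of the idea described above, moving glyphs to the private use area when the Unicode mapping is only an identity mapping. The function names and the sample `toUnicode` map are made up for illustration and are not pdf.js's actual code; the only fixed fact is the Unicode private use area range U+E000 to U+F8FF.

```javascript
// Basic Multilingual Plane private use area.
const PUA_START = 0xe000;
const PUA_END = 0xf8ff;

// True when every char code maps to the code point with the same value,
// i.e. the mapping carries no real Unicode information.
function isIdentityMapping(toUnicode) {
  for (const [charCode, codePoint] of toUnicode) {
    if (charCode !== codePoint) return false;
  }
  return true;
}

// Assign each char code the next free code point in the private use area.
function moveToPrivateUseArea(charCodes) {
  const remapped = new Map();
  let next = PUA_START;
  for (const charCode of charCodes) {
    if (next > PUA_END) throw new Error('Private use area exhausted');
    remapped.set(charCode, next++);
  }
  return remapped;
}

// Hypothetical symbolic font whose mapping is identity-only.
const toUnicode = new Map([[0x6e, 0x6e], [0x72, 0x72]]);
const fontCodePoints = isIdentityMapping(toUnicode)
  ? moveToPrivateUseArea(toUnicode.keys())
  : toUnicode;
console.log(fontCodePoints);
```

After the move, no glyph in the font pretends to be a Latin letter, so a renderer has no reason to treat the brace or radical as `n` or `r`.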

I can confirm that #7482 fixes the issue for me, at least on the second page of the PDF linked to in the first post.

@brendandahl Awesome, thanks. I'll check to see if your patch fixes it ASAP. Is it able to be merged? (It looks like some tests are failing?)

> Is it able to be merged? (It looks like some tests are failing?)

That's expected with this kind of patch. However we need to inspect differences before we will make a call.

Good progress! I did some more extensive testing and found a complication - the fix works for output produced by dvips+ghostscript but not for output produced by pdftex -- if I take the source for the test case above and compile it with pdflatex the output is rendered incorrectly.

Attached is a more exhaustive test case including one of the broken equations from the original 0707.3195v1.pdf file.

The first pdf file is produced directly by pdflatex and then the same output is produced by latex->dvips->ps2pdf. The screenshots are the rendering from pdf.js with the pull request applied - it doesn't solve the problem with pdftex output, but does fix it for the ghostscript conversion.

Presumably there's something different about the way the font is embedded in the output by pdftex that causes the bug to still occur?

test-pdftex.pdf - still broken
test-pdftex pdf - google chrome - with fix

test-ps2pdf.pdf - fixed
test dvi - test-ps2pdf pdf - google chrome - with fix

Original latex source test.tex.txt

Hi @jpallen, just want to let you know that it seems to fix the problem on Sharelatex when I switch the compiler to XeLatex.

We've done a little more digging into this at ShareLaTeX, and it looks like the patch above (https://github.com/mozilla/pdf.js/pull/7482 by @brendandahl) is on the right lines in terms of moving characters into the private use area, but doesn't cover all the necessary cases. The PDFs generated by ps2pdf work, but those generated directly by pdflatex still have this rendering problem.

If we naively move _everything_ to the private use area (e.g. https://github.com/brendandahl/pdf.js/compare/move-non-unicode-glyphs...briangough:put-all-symbolic-chars-in-private-use-area), then it renders correctly. However, this is just a debugging example since I assume this isn't a good idea.

At this point our knowledge reaches the limit and we don't know how to identify the correct symbols to put in the private use area. So is there anything we can do to help move this forwards?

> If we naively move everything to the private use area (e.g. https://github.com/brendandahl/pdf.js/compare/move-non-unicode-glyphs...briangough:put-all-symbolic-chars-in-private-use-area), then it renders correctly. However, this is just a debugging example since I assume this isn't a good idea.

Without having tested the above, I'd suspect that doing so could actually lead to worse rendering in _many_ PDF files, since it might (basically) cause hinting to be disabled for _all_ Symbolic fonts in certain font renderers. (Note that a lot of fonts claim to be Symbolic, even when they are in fact not.)

> At this point our knowledge reaches the limit and we don't know how to identify the correct symbols to put in the private use area. So is there anything we can do to help move this forwards?

Since the problematic cases use normal Type1 fonts, I still think that the correct solution _may_ be to ensure that we provide proper Charset/Encoding information when converting Type1 fonts in Type1Font_wrap.

@jpallen I think we need to recognize fonts that are generated by LaTeX and whose glyphs are symbols (e.g. by name or by the way they are created), but fonts shall not be recognized as such in the rest of the cases.

Not moving _all_ characters into the private use area gives us some advantages, e.g. the possibility of using fonts in input controls, better instrumentation, and the ability for the engine to discard invalid glyphs or fonts and still get somewhat readable text. So knowing whether a glyph is a symbol is key here.

A tentative idea, based on PR #7482, could perhaps be to move characters to the PUA when we cannot trust that the toUnicode data is correct; e.g. something like this: https://github.com/mozilla/pdf.js/compare/master...Snuffleupagus:issue-2594.

Great! I've tried the new patch in Snuffleupagus:issue-2594 and it seems to work nicely for my test case and various pdflatex documents I tried. :+1:

As a test I have deployed it in production in the pdf viewer on www.ShareLaTeX.com, to see if any unexpected issues show up today.

We've tested this patch (https://github.com/mozilla/pdf.js/compare/master...Snuffleupagus:issue-2594) in production over the past 3 weeks and it has fixed the LaTeX font rendering problems for us, with no other issues showing up. It would be great if it could be included, thanks. :+1:

I started reviewing #7705 and started to wonder why my original patch didn't also fix test-pdftex.pdf. Just looking at the font data, it looks like pdf.js should move the majority of the glyphs from DVFZZA+CMEX10 to the private use area, since most of them do not have valid glyph-name-to-Unicode values. For example, one of the problematic glyphs (charcode=110, name='braceleftBig') does not have a Unicode value, but it was being mapped to 'n'. The issue seems to come from how we build the Unicode map: it correctly contains 68 values for glyphs that have matching Unicode values, but after building it we add back all the original encoding values, hence 110 gets filled in with 'n'.
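
The two-step behaviour described in this comment can be sketched as follows. This is an illustration only, not the actual pdf.js implementation: `buildUnicodeMap`, its `addBackEncoding` option, and the tiny glyph list are hypothetical, and the `radical` entry at char code 0x70 is a made-up example. Only `braceleftBig` at char code 110 comes from the comment above.

```javascript
// Tiny stand-in for a glyph-name-to-Unicode list (a real one is far larger).
const GLYPH_NAME_TO_UNICODE = new Map([['radical', 0x221a]]);

// Hypothetical font encoding: char code -> glyph name.
const encoding = new Map([
  [0x70, 'radical'], // made-up entry with a known Unicode value
  [110, 'braceleftBig'], // no entry in the glyph list
]);

function buildUnicodeMap(encoding, { addBackEncoding }) {
  const map = new Map();
  // Step 1: keep only glyphs whose names have a known Unicode value.
  for (const [charCode, name] of encoding) {
    const codePoint = GLYPH_NAME_TO_UNICODE.get(name);
    if (codePoint !== undefined) map.set(charCode, codePoint);
  }
  // Step 2 (the problematic one): fill every gap with the char code itself.
  if (addBackEncoding) {
    for (const charCode of encoding.keys()) {
      if (!map.has(charCode)) map.set(charCode, charCode);
    }
  }
  return map;
}

const withFallback = buildUnicodeMap(encoding, { addBackEncoding: true });
const withoutFallback = buildUnicodeMap(encoding, { addBackEncoding: false });

console.log(String.fromCharCode(withFallback.get(110))); // prints "n"
console.log(withoutFallback.has(110)); // prints false
```

With the fallback enabled, the brace glyph ends up claiming to be the letter `n`. Without it, char code 110 has no Unicode value at all, which makes it a candidate for the private use area but, as the comment notes, may affect text selection.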

I'm not quite sure what the right fix is here because if we remove the code to add back encoding values then our text selection will regress from https://github.com/mozilla/pdf.js/commit/325f7afcca30c891ec7be06a5178c012a052bd55

Maybe @Snuffleupagus has some thoughts...

_As https://github.com/mozilla/pdf.js/issues/2594#issuecomment-254289210 suggests, the previous version of PR #7705 contained a solution that was too simplistic._

I've thus put together a new (and for me most likely final) attempt at fixing this, which can be tested with: http://107.21.233.14:8877/768d76e3834ac61/web/viewer.html.
It would be most helpful if people that are currently affected by this issue could test the latest version of PR #7705, and report if it's enough to fix this issue!

Works well on the test-pdftex.pdf, we will try deploying it in production on www.ShareLaTeX.com this week and see if there are any issues reported.

> Works well on the test-pdftex.pdf, we will try deploying it in production on www.ShareLaTeX.com this week and see if there are any issues reported.

As discussed on IRC, please refer to http://logs.glob.uno/?c=mozilla%23pdfjs&s=21+Nov+2016&e=21+Nov+2016#c54315, we'd like to move forward with PR #7705.
@briangough Do you have any results yet from testing the patch in production on ShareLaTeX?

As previously mentioned in https://github.com/mozilla/pdf.js/issues/2594#issuecomment-259930252 it would be most helpful if those currently affected by this issue could help with testing the proposed solution, which can be done using e.g. the preview in http://107.21.233.14:8877/768d76e3834ac61/web/viewer.html, and report if it fixes these issues!

We'd like to land PR #7705, but we really need confirmation of the fix before doing so.

Sorry for the delay. The patch is working fine - no complaints from our users, thank you.

Closing as fixed by #7705, thanks to @Snuffleupagus and @brendandahl!
