IPython: "Download as PDF" feature shows unhelpful error message and requires big dependency

Created on 8 Feb 2015  ·  40 Comments  ·  Source: ipython/ipython

This is more of a feature request, because I tested the "download as PDF" feature in Ubuntu 14.10 using the new notebook (IPython 3) and it works great. The problem is that when I tried it for the first time, I didn't have the required dependencies and received an error saying ! LaTeX Error: File 'adjustbox.sty' not found. I tried to work around this by installing only adjustbox.sty, but ultimately I had to install texlive-latex-extra.

The issue is that this requires an installation of, at least, 584 MB. That is a big dependency for a very specific feature. So there are a couple of suggestions:

  1. Show a more straightforward error message in the notebook's 500 error page when a dependency such as adjustbox is not found. Currently, the error says: nbconvert failed: PDF creating failed
  2. In the long term, it would be great to convert to PDF using a web service, the same way Google Drive does when downloading a PDF of a document.
nbconvert

All 40 comments

Did the latex error show in the HTML error page, or only in the terminal?
So long as the specific error message gets to the error page, I think
that's all we can do. We're not about to try parsing why latex failed - I'm
sure there are many possible ways.

It would be interesting to investigate alternative ways of producing a PDF,
like using reportlab or wkhtmltopdf. We definitely shouldn't be calling out
to a web service by default.

The latex error showed in the terminal. In the HTML error page, the only message I received was nbconvert failed: PDF creating failed. What about adding the traceback to the HTML error page?

I forgot to mention that the idea of using a web service was meant to be a fallback when an error occurs in the local installation. Shouldn't it be fairly straightforward to create a server receiving requests to convert ipynb files similar to what nbviewer does?

Web services like nbviewer are opt-in: people explicitly make their work public, and on the internet.
Having a web service for PDF by default would be really bad for privacy, and would make the notebook unusable offline.

The only service we call out to for now is MathJax, because it was too big to bundle. And for 4.0 we will ship it as part of IPython.

@Carreau I'm not saying that the PDF would be published online publicly, but converted on a server managed by IPython and delivered as a download for the user, as a fallback in case there is not a proper installation on the local machine.

It's very much frowned on in the open source world to have software that automatically sends your data to a server when you haven't explicitly asked it to do that. We could conceivably do something like that as an explicit option 'convert to PDF on IPython's servers', but we wouldn't do it as a fallback when local conversion failed.

Yes, you're right, that would be better. Maybe a message saying that something went wrong with the conversion and an extra button to download the file using an external web service.

I think that could be doable, but in this case why stop at PDF?
Having a (full, but still restricted) conversion service in the cloud could make sense and has already been mentioned a few times. That would be an "nbviewer API". It does, though, raise problematic legal questions of liability in case of a user data leak, a hack, or other things we are not (yet) ready to tackle.

Writing the code so that companies could deploy it on their local network would be fine, though.

Well, PDF conversion requires a big download (although most of its size is documentation) and it is not likely that every IPython user will have that package installed. Therefore, I thought that reducing the effort to start using that feature would be a nice addition. What other conversions do you have in mind?

Actually, I wouldn't be too happy to get a full-blown conversion service because such a project requires a lot of effort that would detract from improving the notebook and the law of diminishing returns applies quickly in that case.

Regarding user data leaks, I thought the only thing required for this to work is the ipynb file. What type of user information is needed for the conversion? In any case, a server providing PDFs is not supposed to save user information.

User data includes the notebook content itself. And while the server isn't supposed to save that information, you don't know when you send it an HTTP request what it's going to do with it.

For the time being, I have no problem with saying that you need Latex installed to convert notebooks to PDFs. Dependencies are not something we need to avoid.

> For the time being, I have no problem with saying that you need Latex installed to convert notebooks to PDFs. Dependencies are not something we need to avoid.

I agree with @takluyver. Maybe something more elaborate can be developed later, but for now I am fine with just asking people to install LaTeX if they want to use this feature...

> The latex error showed in the terminal. In the HTML error page, the only message I received was nbconvert failed: PDF creating failed. What about adding the traceback to the HTML error page?

This seems like a good idea to me, what does everyone else think about this?

> This seems like a good idea to me, what does everyone else think about this?

Sounds good to me...

> What about adding the traceback to the HTML error page?

This may make sense, though LaTeX errors are some of the longest and least informative errors that exist. We would need to make sure we properly handle 1000s of lines of error output, when usually at most one of those lines contains any meaningful information.
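To illustrate the filtering problem described above, here is a minimal Python sketch, not nbconvert's actual behaviour. It relies on the fact that LaTeX marks error lines with a leading `!` and keeps only a few lines after each one. The function name `summarize_latex_log`, its parameters, and the sample log are hypothetical and made up for this example.

```python
def summarize_latex_log(log_text, context=2, max_errors=5):
    """Return short excerpts around lines that start with '!'.

    LaTeX prefixes error messages with '!'. Everything else in the log
    is dropped, so thousands of lines shrink to a handful.
    """
    lines = log_text.splitlines()
    excerpts = []
    for i, line in enumerate(lines):
        if line.startswith("!"):
            # Keep the error line plus `context` following lines.
            excerpts.append("\n".join(lines[i:i + 1 + context]))
            if len(excerpts) >= max_errors:
                break
    return excerpts


# Hypothetical log excerpt, shortened for illustration.
sample_log = "\n".join([
    "This is pdfTeX",
    "(./notebook.tex",
    "! LaTeX Error: File `adjustbox.sty' not found.",
    "",
    "Type X to quit or <RETURN> to proceed,",
])

for excerpt in summarize_latex_log(sample_log, context=0):
    print(excerpt)
```

An error page could show these excerpts first and keep the full log behind a collapsed section.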

> What other conversions do you have in mind?

Everything. If we can do PDF, why not HTML, RST, Markdown, and even just LaTeX?
Getting Pandoc installed is not easy...

> Regarding user data leaks, I thought the only thing required for this to work is the ipynb file. What type of user information is needed for the conversion? In any case, a server providing PDFs is not supposed to save user information.

If you have side files, like linked images, they might need to be uploaded as well.
Even if the server is not supposed to keep things, if we get hacked, it might,
or it might even give you an infected PDF.

It is hard to imagine what can be done with data (or the lack of it); see [this example](http://mashable.com/2015/01/28/redditor-muslim-cab-drivers/), where a pattern of missing data made it possible to guess the religion of NY taxi drivers.

While we agree that the service would be nice, we will probably not enable it by default.
Also, we would need developer and devops time to keep the service online, plus legal protection,
so unless someone comes along and does it, or we get funding for that, there is only a small chance it will happen.

Improving the error message would make sense.

I agree that, for the time being, it is better to describe the dependencies and improve the error message.

Is it possible to detect if LaTeX is installed before trying to convert the ipynb file? If that is possible, IPython would be able to give a helpful error message instead of throwing an error about a very specific file. In my case, I thought that another package had installed a LaTeX distribution because pdflatex seemed to work. I suppose most of the errors IPython users encounter in this context are due to not having installed a LaTeX distribution. Therefore, early detection of this issue would benefit most users.

Maybe, a reasonable compromise is to add the traceback to the HTML error page but not show it by default (because it is too long and uninformative.) After clicking on some text such as "Want to see the traceback?", the traceback could be shown. However, if early detection of a missing dependency is possible, the main message in the error page should convey that.

@takluyver

> User data includes the notebook content itself. And while the server isn't supposed to save that information, you don't know when you send it an HTTP request what it's going to do with it.

But do you mean snooping as @Carreau suggested? Otherwise, we know what the server will do with that request.

@Carreau

> Everything. If we can do PDF, why not HTML, RST, Markdown, and even just LaTeX?
> Getting Pandoc installed is not easy...

I don't think HTML or RST are needed in a conversion service because they are already available in the notebook itself. Markdown and LaTeX conversions can be useful, though.

If the server gets hacked, everything is possible. However, the likelihood of getting hacked is low enough that it shouldn't be something that prevents people from launching a service. Instead, security measures should be taken (promptly installing security updates, opening only the ports that are required, enabling a firewall, installing only the needed applications, SSL, etc)

I think we already check if the pdflatex command is available. I doubt there's a good way of checking whether all the files that will need to process the Latex are present, other than running it.
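As a rough sketch of what a partial pre-flight check could look like, the Python below looks for `pdflatex` on PATH and then asks `kpsewhich` (the file lookup tool that ships with TeX distributions) about each file in a caller-supplied list. The function name `find_missing_latex_files` and the list of files are assumptions for illustration. This only covers files you already know to ask about, so it does not remove the need to actually run LaTeX.

```python
import shutil
import subprocess


def find_missing_latex_files(filenames, which=shutil.which, run=subprocess.run):
    """Return the entries of `filenames` that kpsewhich cannot locate.

    Raises RuntimeError if pdflatex or kpsewhich is not on PATH.
    `which` and `run` are injectable so the logic can be tested
    without a TeX installation.
    """
    for tool in ("pdflatex", "kpsewhich"):
        if which(tool) is None:
            raise RuntimeError(
                f"{tool} not found on PATH; is a LaTeX distribution installed?"
            )
    missing = []
    for name in filenames:
        # kpsewhich prints the resolved path and exits 0 when the file
        # exists, and exits non-zero with empty output when it does not.
        result = run(["kpsewhich", name], capture_output=True, text=True)
        if result.returncode != 0 or not result.stdout.strip():
            missing.append(name)
    return missing
```

Called as `find_missing_latex_files(["adjustbox.sty"])`, it would return `["adjustbox.sty"]` on a machine like the one in the original report, which is enough to show a targeted message before conversion starts.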

> I doubt there's a good way of checking whether all the files that will need to process the Latex are present, other than running it.

I second this doubt.

> I don't think HTML or RST are needed in a conversion service because they are already available in the notebook itself. Markdown and LaTeX conversions can be useful, though.

Don't underestimate the things nbconvert/nbviewer does :-) If you have SVG, it can even shell out to inkscape to convert SVG to PNG. You just haven't hit that yet.

> If the server gets hacked, everything is possible. However, the likelihood of getting hacked is low enough that it shouldn't be something that prevents people from launching a service. Instead, security measures should be taken (promptly installing security updates, opening only the ports that are required, enabling a firewall, installing only the needed applications, SSL, etc)

That's a lot of work, and don't underestimate the Great Firewall and the NSA.
In any case, there are many labs where remote connections to internet services are out of the question.
I also think you highly underestimate the amount of work in "etc" and the "likelihood of getting hacked".

If people like, let's say, Peter Norvig are known to use notebooks and then potentially use the service, it will with 100% certainty be the target of attacks.

In Debian distributions this works:

 ➜  ~  dpkg -s sudo | grep "install ok installed"
 Status: install ok installed
 ➜  ~  dpkg -s texlive | grep "install ok installed"
 Status: install ok installed
 ➜  ~  dpkg -s texlive-latex-extra | grep "install ok installed"

I uninstalled texlive-latex-extra before running the last command. I don't have the slightest idea about Windows, though.
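The same `dpkg -s` check can be wrapped in Python if it needs to run from the notebook server. This is a minimal sketch that only works on Debian-style systems. The function name `is_deb_package_installed` is hypothetical, and a missing `dpkg` binary surfaces as a `FileNotFoundError` rather than being handled.

```python
import subprocess


def is_deb_package_installed(package, run=subprocess.run):
    """Return True if `dpkg -s` reports the package as fully installed.

    `run` is injectable for testing. On systems without dpkg, the
    default subprocess.run raises FileNotFoundError.
    """
    result = run(["dpkg", "-s", package], capture_output=True, text=True)
    # Same string the grep in the shell session above looks for.
    return result.returncode == 0 and "Status: install ok installed" in result.stdout
```

A package manager check like this says nothing about LaTeX installed by other means, so it suits an error hint better than a hard gate.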

@Carreau No, I meant to say that nbconvert already does many useful conversions, so we don't need a conversion service implementing the same again.

Regarding security concerns, that is a lot of work but it is almost mechanical work. Against intelligence agencies, there is nothing you can do. The NSA breaks SSL-encrypted connections routinely now. The amount of work in "etc" depends on the particular requirements of the site, but in this case, due to the simplicity of the service, the additional needs are not too demanding.

Google is target of industrial espionage on a regular basis, and if some foreign intelligence agency wanted to steal information, there is almost 0% chance that their chosen attack vector is going to be an ipynb->pdf service. They know that technologically savvy people won't upload sensitive information in such a careless manner. Furthermore, if they wanted to obtain Norvig's data, they would infect his laptop and then every additional security measure would be pointless.

I have both latex and pdflatex installed, but on RHEL 6 and 7 it's always adjustbox.sty that the conversion stumbles over. Does it HAVE to be this style file that seems to be non-default?

PDF export is hopelessly difficult on Windows 7 (or any Windows for that matter).

This is the process:

  1. Missing mistune - the pip install is broken but luckily the conda install became available today.
  2. Missing pandoc. Not available at all in pip or conda so search the web for an 18 MB Windows installer.
  3. Missing pdflatex. Where does one get this for Windows? There's something called MiKTeX, which is a 163 MB install, and it isn't clear that it will do this PDF export.

So far, that's 1 broken pip install and uninstall, 1 successful conda install, and 2 massive downloads from random third parties before we get close.

I just don't know how anyone could maintain this across a load of users' machines, and this is just to output a file as PDF. Many other software packages manage PDF export without any dependencies.
Isn't there a better way?

@blokeley Hopefully you had to deal with it only once. We have to deal with it almost every week.
If you have a better way that works, that would be great.

I think we could fix this one of two ways:

  1. Attempt to verify that latex and its dependencies are installed.
  2. Write a PDF template & exporter, which outputs to PDF directly.

I think (2) is actually easier than (1). I don't know if this is the kind of thing that the team is interested in allocating manpower for, though. I've had a direct PDF template working in the past, but it was very basic.

I think there are only two reasonable ways for us to do PDF export:

  • HTML via browser print preview -> print-to-pdf
  • LaTeX

If someone wants to write another PDF exporter using different tools, it would be welcome as a custom third-party exporter, but probably not as a new exporter shipped with nbconvert itself.

Maybe something as simple as a rename from "PDF" to "Latex PDF" and a more verbose error message would suffice. Right now the HTML message is "500: Internal server error" with nothing indicating that it's a dependency problem (except in the server console, which the user may not have access to).

> Maybe something as simple as a rename from "PDF" to "Latex PDF"

already done in #7951

The error message should still be improved, though.

I'd love to write the PDF exporter but my time is somewhat limited (I won't bore you with the excuses). If nothing else happens, I'll try writing a prototype some time this summer.

Or more realistically could this be a Google Summer of Code project? I'd be happy to donate a bit of cash.

IPython doesn't do GSoC - Fernando feels quite strongly that it's not worth our time to mentor people.

I am having similar issues to @blokeley on Windows 7.
I am behind a firewall at work and need special permission to install software. A number of my colleagues and I are trying out the Anaconda distribution. We were hoping it would be the only thing we would have to install.
I guess we need a total of three add-ons: MathJax and pandoc (these two are documented in the detailed install instructions), and MiKTeX, which is not in the install instructions. You don't find that out until you go to the pandoc page. Getting the notebook and nbconvert to work flawlessly on Windows behind a firewall is not easy!

Someone could write an nbconvert PDF exporter that either goes directly to PDF using reportlab (which is now pip installable on Windows), or produces HTML and then converts it to PDF using a tool like wkhtmltopdf or weasyprint. If it went well enough, we may even include it in nbconvert. But I don't think that's high enough priority that we're going to work on it ourselves any time soon.
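The HTML-then-PDF route mentioned above can be tried by hand before anyone writes an exporter. The Python sketch below chains two existing command-line tools, `jupyter nbconvert --to html` and `wkhtmltopdf`. It is not an nbconvert exporter, the helper names are hypothetical, and it assumes both tools are installed and on PATH.

```python
import subprocess
from pathlib import Path


def html_to_pdf_commands(notebook_path):
    """Build the two commands: notebook -> HTML, then HTML -> PDF."""
    nb = Path(notebook_path)
    html = nb.with_suffix(".html")
    pdf = nb.with_suffix(".pdf")
    return [
        # Write the HTML next to the notebook so the second step can find it.
        ["jupyter", "nbconvert", "--to", "html",
         "--output-dir", str(nb.parent), str(nb)],
        # wkhtmltopdf takes an input file (or URL) and an output file.
        ["wkhtmltopdf", str(html), str(pdf)],
    ]


def convert_via_html(notebook_path):
    """Run both steps, raising CalledProcessError if either fails."""
    for cmd in html_to_pdf_commands(notebook_path):
        subprocess.run(cmd, check=True)
```

This avoids LaTeX entirely, at the cost of browser-style page breaks and whatever math rendering the HTML-to-PDF tool manages.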

@drafter250 you might also need Inkscape if you want to use vectorized matplotlib plots (svg) converted to pdf.
@takluyver I also had a quick look at reportlab, but I fear that arranging the layout, especially in the presence of equations, will be quite tricky. That's a big plus of LaTeX.

There are certainly a lot of dependencies here for nbconvert: LaTeX, Sphinx, Jinja, MiKTeX, Inkscape, etc. I'm having a hard time keeping all this straight. Are there any resources out there describing how all these interrelate to give nbconvert full functionality? Also, maybe it would be best to disable the commands whose dependencies could not be resolved.

> For the time being, I have no problem with saying that you need Latex installed to convert notebooks to PDFs. Dependencies are not something we need to avoid.

The problem is that currently all (most?) of the RHEL-ish distributions are locked out of converting. The generated LaTeX code relies on adjustbox.sty and there's no package available that provides that file (affects: RHEL6, RHEL7, EPEL6, EPEL7, Scientific and supposedly CentOS too).

This seems to be the offending pull request regarding the OP's original error:

  • #3578 Use adjustbox to specify figure size in nbconvert -> latex

It might be interesting to experiment with jsPDF.

@serverhorror there already exists a bug report for adjustbox. So hopefully there will be a solution soon.
Alternatively, it shouldn't be too hard to locally install adjustbox.sty, see comments here

@jakobgager I've seen that. Unfortunately RHEL7 (EPEL7) isn't even on the roadmap yet (waaaaayyyy beyond my control).

For the time being I'll just use a locally patched version that basically reverts #3578

+1

If you use (in Ubuntu 15.04):

apt-get install --no-install-recommends texlive-latex-extra texlive-fonts-recommended

You decrease the download size to 24 MB (instead of 606 MB). It does not download some documentation (about 300 MB)

Thanks for chiming in on this, everyone, I am going through and cleaning up old issues that have been addressed, and this one is ripe for closure.

There was a lot of good discussion going on in this issue, but to summarize:

1. A more detailed message is now displayed when conversion fails. In notebook version 4.2.2, removing the .sty file in question on my machine produces an error message that looks like this:

screen shot 2016-10-27 at 11 57 41 am

2. We do not intend to provide conversion of notebooks to PDF as a web service.
3. if a third-party makes such a webservice available, the built-in notebook download has been explicit about using LaTeX since #7951 - and looks like this:

screen shot 2016-10-27 at 12 01 12 pm

4. As for the size of the required dependencies, a workaround has been proposed by @iuridiniz just above this comment.

Any future discussion around these and related issues should probably take place over at https://github.com/jupyter/nbconvert .

Happy hacking! :bowtie:
