Pdf.js: Interactive form (AcroForm) support

Created on 7 Sep 2016  ·  28 Comments  ·  Source: mozilla/pdf.js

_This is a tracking issue only, so this is not the place for any other questions or discussions. Open a new issue for that._

This is a meta issue for interactive form (AcroForm) support according to Chapter 12.7 of the PDF reference (http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G11.2110737). This includes all form elements except for signature fields, which are tracked in #1076. The objective is to get https://github.com/mozilla/pdf.js/blob/master/test/pdfs/f1040.pdf.link to render completely, but also to resolve other open issues and PRs.

General

  • [x] Prepare core and display layer for implementing form elements (#7596)
  • [x] Reference testing (#7602)
  • [x] Preference (#7602)
  • [x] Remove global PDFJS.renderInteractiveForms usage (#7640)
  • [x] Refactor field name construction code in WidgetAnnotation (#7775)
  • [x] Refactor or clarify where annotations are rendered

    • Mostly in the display layer, but text widget annotations with appearance streams are rendered in the core layer, which causes confusion...

  • [x] Appearances
  • [x] Storing entered values for when the page is destroyed when it is not visible
  • [x] Printing entered values

    • Either print the HTML elements or render the contents onto the canvas (use appendToOperatorList)

  • [x] Enable by default
  • [x] Update the example (#8030)
  • [x] Add Firefox pref to enable/disable forms (https://bugzilla.mozilla.org/show_bug.cgi?id=1652145)

Text widgets

  • [x] Rendering of single-line fields (#7602)
  • [x] Handle maximum length (#7622)
  • [x] Handle flags: multiline and read-only (#7633)
  • [x] Handle flags: comb (#7649)
  • [x] Handle justification (#7622)
  • [x] Sanitize maxLen and textAlignment in the core layer and unit tests for this (#7629)
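
As an illustration of the sanitization item above, here is a minimal sketch of what validating these two values could look like. The helper name `sanitizeTextWidget` and the plain-object input are assumptions made for this example and are not taken from the pdf.js source; the accepted range for `Q` (0 = left, 1 = centered, 2 = right) follows the PDF specification.

```javascript
// Sketch only: `sanitizeTextWidget` is a hypothetical helper, not a pdf.js function.
// `raw` stands in for values read from the annotation dictionary.
function sanitizeTextWidget(raw) {
  // Q (quadding): 0 = left, 1 = centered, 2 = right. Anything else is dropped.
  let textAlignment = raw.Q;
  if (!Number.isInteger(textAlignment) || textAlignment < 0 || textAlignment > 2) {
    textAlignment = null;
  }

  // MaxLen must be a non-negative integer to be usable as a length limit.
  let maxLen = raw.MaxLen;
  if (!Number.isInteger(maxLen) || maxLen < 0) {
    maxLen = null;
  }

  return { textAlignment, maxLen };
}

console.log(sanitizeTextWidget({ Q: 1, MaxLen: 9 }));
console.log(sanitizeTextWidget({ Q: "right", MaxLen: -4 }));
```

Returning `null` for invalid input lets the display layer fall back to its defaults instead of trusting values from a malformed file.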

Choice widgets

  • [x] Rendering of combo boxes (#7671)
  • [x] Rendering of list boxes (#7671)

Button widgets

  • [x] Rendering of pushbuttons (#9191)
  • [x] Rendering of checkboxes (#7898)
  • [x] Rendering of radio buttons (#7898)
4-annotations 4-form-acroform

All 28 comments

This is a meta issue for tracking interactive form (AcroForm) support according to Chapter 8.6 of the PDF reference (https://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf#page=671&zoom=auto,-246,244).

It might be a good idea to instead base the work on the latest version of the PDF specification, just in case there are any differences: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G11.2110737.

Also, perhaps a good idea to add a "General" TODO item about ensuring proper test-coverage?

Both items have been addressed. Thank you!

I think that we're also going to have to actually parse the contents of the AcroForm dictionary, since otherwise we're not able to e.g. load all the necessary font resources.
Obviously, we cannot use custom fonts in the display layer, but we should be able to at least infer the correct font-family (and things like e.g. bold/italic) that should be used and pass that info on to the display layer.

Also, for printing forms, we might be able to utilize (or build upon) the already existing appendToOperatorList functionality, but that will definitely require that the font resources present in the AcroForm dictionary have been loaded.

Another thing that we probably should attempt to support, is using the correct text colour in the display layer (note how in Adobe Reader the text in the form fields of f1040.pdf is blue). This probably ties in to better and more complete Appearance stream support.

Finally, a general question: Will we actually be able to support forms in a meaningful way, without partial (and well sanitized) script support?

Good points. I just added them to the item list above. I don't think we really need script support as the AcroForms generally just require filling and printing. AFAIK scripts are only used for interaction between elements, but we can implement the most used functionality ourselves (such as resetting the form or button actions for printing it). We'll have to see how widely used such script functionality is.

Handle flags: multiline and read-only

There are other flags that we might need to try and support as well. One example is comb, which controls the spacing between the characters in an input field. That one is actually used on the second page of f1040.pdf, see the "Personal identification number (PIN)" field.

Sounds like a good idea. I have added it to the list.
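
For reference, a minimal sketch of how the flags mentioned above could be decoded from the `Ff` entry. The bit positions are the ones given in the PDF specification; the helper names (`hasFlag`, `describeTextField`) and the idea of deriving a per-character cell width as field width divided by `MaxLen` are assumptions for illustration, not a description of the actual implementation.

```javascript
// Bit positions of the Ff entry per the PDF specification (bit 1 is the lowest bit).
const FieldFlag = {
  READONLY: 1 << 0,    // bit 1
  MULTILINE: 1 << 12,  // bit 13
  PASSWORD: 1 << 13,   // bit 14
  FILESELECT: 1 << 20, // bit 21
  COMB: 1 << 24,       // bit 25
};

function hasFlag(fieldFlags, flag) {
  return (fieldFlags & flag) !== 0;
}

// Hypothetical helper: summarises what a display layer would need to know.
function describeTextField(fieldFlags, maxLen, fieldWidth) {
  const multiLine = hasFlag(fieldFlags, FieldFlag.MULTILINE);
  // Comb is only meaningful with a MaxLen and with Multiline, Password
  // and FileSelect all clear.
  const comb =
    hasFlag(fieldFlags, FieldFlag.COMB) &&
    !multiLine &&
    !hasFlag(fieldFlags, FieldFlag.PASSWORD) &&
    !hasFlag(fieldFlags, FieldFlag.FILESELECT) &&
    Number.isInteger(maxLen) &&
    maxLen > 0;

  return {
    readOnly: hasFlag(fieldFlags, FieldFlag.READONLY),
    multiLine,
    comb,
    // Each character gets an equally wide cell in a comb field.
    combCellWidth: comb ? fieldWidth / maxLen : null,
  };
}

console.log(describeTextField(FieldFlag.COMB, 5, 100));
```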

It would probably also be a good idea to see if the WidgetAnnotation code that builds the fullName property can be cleaned up or improved upon, see https://github.com/mozilla/pdf.js/blob/6c263c19946af23b723f148d9f05118971e18b36/src/core/annotation.js#L640-L670.

Also, regarding WidgetAnnotations it seems that different types can have different requirements for the V entry in the annotation dictionary, so it might be better to fetch and validate data.fieldValue in _each_ specific WidgetAnnotation subclass.

The first point is now in the list, for which I've got some ideas. I found out about the second point in a patch I'm currently finalizing for choice widget annotations, so that will be addressed there.
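
To make the field name construction concrete: per the PDF specification, a field's fully qualified name is built from the partial names (`T` entries) of the field and its ancestors, joined with periods. The sketch below models field dictionaries as plain objects with optional `T` and `Parent` properties; `buildFullName` is a hypothetical helper and not the code linked above.

```javascript
// Sketch only: field dictionaries are modelled as plain objects here.
function buildFullName(field) {
  const parts = [];
  const seen = new Set(); // guards against circular Parent chains in broken files
  let node = field;

  while (node && !seen.has(node)) {
    seen.add(node);
    // Nodes without a partial name (T) contribute nothing to the full name.
    if (typeof node.T === "string") {
      parts.unshift(node.T);
    }
    node = node.Parent;
  }
  return parts.join(".");
}

const root = { T: "form1" };
const section = { T: "address", Parent: root };
const widget = { T: "street", Parent: section };
console.log(buildFullName(widget));
```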

Hey @timvandermeij
When will this functionality be available? How can I help?

We're currently in the process of implementing this, but it's a large piece of functionality that will take time before it's complete. The ticked boxes above show which elements are already implemented and for other boxes there are already work-in-progress pull requests, so we're on track with this functionality. Feel free to test it by using the master branch and setting the renderInteractiveForms parameter to true. It's disabled by default as it's not ready yet.

Thank you tim, what can you tell me about digital signatures? There is progress according to this discussion thread https://github.com/mozilla/pdf.js/issues/1076

This was reported by the user: soa-x opened this issue on 13 Jan 2012

Almost 5 years have passed since it was reported.

Someone has even already done much of the implementation:

viveksjain commented on February 22
@complience Hi, I have a proof-of-concept working at https://github.com/viveksjain/pdf.js/tree/sig-verify-support. You can try it by using git clone --recursive https://github.com/viveksjain/pdf.js.git. With a little bit more work it should be ready for a pull request into this repo, but I just have not had the time yet.

Do you know if this work was added to recent versions of pdf.js?

Re: https://github.com/mozilla/pdf.js/issues/7613#issuecomment-251692825

Signatures in PDF files are a big and complex topic, one which is somewhat orthogonal to implementation of basic AcroForm support (which is what _this_ particular issue is tracking).

The current issue is just a tracking issue for implementation of basic AcroForm features, signatures are already tracked elsewhere (in #1076, which is where that feature should be discussed).

@lexcorp Please refrain from posting unrelated information and/or asking questions here, since it detracts from the purpose of this issue (which is to track support for basic AcroForm features).
Also, you've now posted basically the same information in _three_ different issues, please do not spam the issue tracker in this way!

Hello @timvandermeij @Snuffleupagus,
We really like your solution for adding support for AcroForm fields. We're planning to use these features in an app we're currently developing. We'd really appreciate it if you could give us a tentative date by which you'd be able to add support for all types of form fields, such as checkboxes, and for exporting the filled-in data to an XFDF file or any other format. Thanks.

@anujgeek As I've already mentioned in https://github.com/mozilla/pdf.js/issues/7613#issuecomment-251699579, this is a _tracking_ issue and not really a good place for this kind of general discussion and/or asking questions!

There are a number of fairly difficult TODOs left to implement, see the possibly incomplete list above, hence it's _not_ possible to give any sort of estimate of when, or even if, this feature will be completely implemented.

Also, note that so far all work has been done by contributors, and given that Mozilla is replacing PDF.js in Firefox (see https://wiki.mozilla.org/Mortar_Project) forms support will most likely take a while to complete.

This is a tracking issue (refer to https://github.com/mozilla/pdf.js/issues/7613#issuecomment-251895091), so this is not the place for discussion or questions. Contact us on IRC in case of questions or file a separate issue if you found a bug. Thanks.

_(I'm unlocking the conversation to be able to let users use the reaction button to measure the interest for this feature, but irrelevant comments will be removed.)_

Hello everyone!

What is the progress on AcroForm filling?
The example https://www.irs.gov/pub/irs-pdf/f1040.pdf (and others) still does not work. Or is it not enabled by default?
Is support for some basic JavaScript, such as setting field(s), clearing field(s) and a send button, mentioned anywhere?

Thanks.

@Alex-DE-74 Please read through the above comments carefully, in particular https://github.com/mozilla/pdf.js/issues/7613#issuecomment-251895091 and https://github.com/mozilla/pdf.js/issues/7613#issuecomment-287907674 are relevant.
Furthermore, you've already asked these questions in #9261 (where answers were provided); please let's try and keep this tracking issue free from that kind of general discussion.

@Snuffleupagus

Excuse me, but it is hard for me to trace across so many topics which item is at which stage, and the circular references are not helpful at all. From https://github.com/mozilla/pdf.js/projects/1 it is clear to me which pieces of AcroForms are (completely) supported now and which are planned. Moreover, many topics address rendering/viewing, but say nothing about interactive features such as filling, checking, selecting and submitting. For example, the "Text widgets" part above has nothing about typing text. And if the AcroForm dictionary is currently not parsed at all, how can this really work well?
Maybe it would be helpful for "users" to see a simple table that lists the AcroForm features and their properties together with the state of support (complete, partial or planned). (Why is this shown in bold?!)

P.S. This is painful for me: I'm not a JS/HTML5 expert, but I have done a lot of work on the other side (creating PDFs with C#) and am familiar with other programming languages too. Is it worth my while to try to understand the current code in order to provide more interactive support and help develop this project? Or would it take a huge amount of time just to understand the current architecture?

I have removed the bold style for you. I would like to emphasize again that this is not the place for such a discussion; a channel like IRC would be more appropriate so we can give some background information. Filling in/submitting/printing forms is in fact in the checkbox list above, it just hasn't been implemented yet. The "text widgets" part is about rendering text widgets, which means the input fields you can type in. That's done; the part that remains is storing the entered values. Anyone is welcome to help out with implementing this.

BTW: Chrome is also not able to save PDFs with forms, but there's a workaround. Forms are rendered by default and one is able to print them and one can even print them as PDF by default, including the form input.

Maybe this is applicable to pdf.js too, and we can just use the existing Firefox save-as-PDF API ( https://developer.mozilla.org/en-US/Add-ons/WebExtensions/API/tabs/saveAsPDF )?

I am playing around with pdf.js, trying to print entered form text field values. I have a rudimentary working proof of concept where I can render entered values to the printed PDF. I now want to discuss my approach and see if someone comes up with a better or simpler one.

In my approach I pass the entered values to the worker task by adding a map to the task. This map is currently filled on the 'beforeprint' event.
In the 'getOperatorList' method of the 'TextWidgetAnnotation' I read the object stream and replace the old text value of the 'Tj' operator with the new one. This works, but it comes with a lot of problems. The first is that it fails if the stream has no 'Tj' operator because the field had no value. The second is that the placement for alignments other than 'left' will be wrong.
So the next idea is to create a completely new stream, calculating all values myself. This will be a lot of work, so I wanted to discuss this approach first.
I can already create a new stream and display the values, but again there is the problem with the offset values of the 'Td' operation. I dug into the code a bit and I think I need to calculate the X and Y offsets by taking into account the width and height of the string in the given font. I found the FontDescriptor for one embedded font, but not for a system font. With the font descriptor I have the ascent and descent values of the font, with which I think I can calculate the y offset. The x offset will be fixed for left-aligned text, but needs to be calculated for centered or right-aligned text. I think I am able to do this with the widths array of the Font xRef, but again, there is none for system fonts. So I think I would have to use a canvas and the measureText method.

So as you see there is a lot of 'thinking'. But before I try to implement and test my approach, I'd like to know what others are thinking of it.
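
The offset calculation described in this comment can be sketched roughly as follows. This is only an illustration under several assumptions: `buildTextAppearance` and `escapePdfString` are hypothetical helpers, the text width is assumed to have been measured already (from a widths array or a canvas, as discussed), ascent and descent are assumed to be in 1/1000 text-space units with a negative descent as in a FontDescriptor, and the 2-unit padding is an arbitrary choice.

```javascript
// Escape the characters that are special inside a PDF literal string.
function escapePdfString(text) {
  return text.replace(/[\\()]/g, (ch) => "\\" + ch);
}

// Hypothetical helper: builds a marked-content text block for one field.
// alignment: 0 = left, 1 = centered, 2 = right.
function buildTextAppearance(opts) {
  const { text, fontName, fontSize, fieldWidth, fieldHeight,
          textWidth, ascent, descent, alignment, padding = 2 } = opts;

  // Horizontal offset depends on the alignment and the measured text width.
  let x = padding;
  if (alignment === 1) {
    x = (fieldWidth - textWidth) / 2;
  } else if (alignment === 2) {
    x = fieldWidth - textWidth - padding;
  }

  // Vertically centre the glyph box (ascent minus descent), then lift the
  // baseline above the bottom of that box by the descent.
  const scale = fontSize / 1000;
  const glyphHeight = (ascent - descent) * scale;
  const y = (fieldHeight - glyphHeight) / 2 - descent * scale;

  return `/Tx BMC q BT /${fontName} ${fontSize} Tf ` +
         `${x.toFixed(2)} ${y.toFixed(2)} Td (${escapePdfString(text)}) Tj ET Q EMC`;
}

console.log(buildTextAppearance({
  text: "a(b)", fontName: "Helv", fontSize: 10, fieldWidth: 100, fieldHeight: 20,
  textWidth: 30, ascent: 800, descent: -200, alignment: 2,
}));
```

Generating a fresh stream this way also covers the empty-field case, since it does not depend on an existing 'Tj' operator being present.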

Some time ago we had a discussion about how we could approach this. Refer to https://mozilla.logbot.info/pdfjs/20161219. The idea is to have two different operator lists: one for the UI and one for printing. In the one for printing, we would replace operations based on the entered/selected value in the widget.

I think this is somewhat easier than what you're describing since we let the remaining logic do the heavy lifting for us; we just have to provide the correct operator list.

This is a problem that we have to solve in multiple small steps. The first step is to make the annotation code asynchronous, which is done by @dmitryskey in #9822. The next step would be to parse the AcroForm dictionary for e.g., fonts and to parse the default appearance entry in the annotation dictionary for all appearance information. For this we can probably use the evaluator to get the information as an operator list, which requires the annotation code to be asynchronous. Then, we can create the printing operator lists for each annotation type.
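
To illustrate what parsing the default appearance entry involves: a DA string is a short content-stream fragment such as `/Helv 12 Tf 0 0 1 rg`, from which the font name, font size and fill colour can be read. The sketch below is a deliberately naive whitespace tokenizer, not the evaluator-based approach suggested above; `parseDefaultAppearance` is a hypothetical name, and real DA strings may need a proper PDF lexer (for example when tokens are not separated by whitespace).

```javascript
// Sketch only: naive parser for a default appearance (DA) string.
function parseDefaultAppearance(da) {
  const result = { fontName: null, fontSize: null, color: null };
  const operands = [];

  for (const token of da.trim().split(/\s+/)) {
    switch (token) {
      case "Tf": // operands: /FontName size
        if (operands.length >= 2) {
          result.fontName = operands[operands.length - 2].replace(/^\//, "");
          result.fontSize = Number(operands[operands.length - 1]);
        }
        operands.length = 0;
        break;
      case "g": // gray fill colour, one operand
        result.color = { space: "gray", values: operands.slice(-1).map(Number) };
        operands.length = 0;
        break;
      case "rg": // RGB fill colour, three operands
        result.color = { space: "rgb", values: operands.slice(-3).map(Number) };
        operands.length = 0;
        break;
      case "k": // CMYK fill colour, four operands
        result.color = { space: "cmyk", values: operands.slice(-4).map(Number) };
        operands.length = 0;
        break;
      default: // anything else is treated as an operand
        operands.push(token);
    }
  }
  return result;
}

console.log(parseDefaultAppearance("/Helv 12 Tf 0 0 1 rg"));
```

The resulting colour and font information is what the display layer would need in order to show, for instance, blue text in the fields of f1040.pdf.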

I also thought of creating the operator list myself, but that would be more complicated for me than my approach. I just create the PDF object stream with 'BMC ... EMC' and pass the stream to the evaluator, which generates the operator list.
If I create the operator list array myself, I will have the same problems as with generating a new object stream. But IMHO it is more complicated to create the operator list than to create a string and convert it to an object stream. This already works in my proof of concept.

I thought Opera/Chrome were using pdf.js as well, but Opera is able to print and use form data. Maybe there's something we can reuse?

They use PDFium, which is mainly C++ code.

Hey all, the company I work for is starting to leverage PDFJS and I have been told I need to get "Storing entered values for when the page is destroyed when it is not visible" working. I am not sure if this thread is the right place to discuss it. @timvandermeij, it looks like you are a major driver of this project. Is there any way we can get in contact with you or someone from the community who might be able to assist? I have a strategy for implementing this feature, but I want to make sure that what I do can also be mainlined back into this repo. We are also willing to sponsor or create some feature bounty as well, if that would help knock things off faster.

If you have ideas on how this should be done, it's best to open a separate issue to discuss it. The main question is what to do with the entered data. Render it onto the canvas when printing? Provide an option to download the values in FDF format? Render a new PDF file with the filled values? Et cetera. It depends on what the user would expect and what other PDF readers do.
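
Whichever of these outputs is chosen, the entered values first have to outlive the page's DOM elements. One possible shape for that, shown purely as a sketch: a small store keyed by annotation id that the display layer writes to on input and reads from when a page is rendered again. The class name `FormValueStore` and its methods are assumptions for this example and do not describe an existing pdf.js API.

```javascript
// Sketch only: keeps entered form values independent of page lifetime.
class FormValueStore {
  constructor() {
    this._values = new Map();
  }

  // Called when the user changes a field, keyed by annotation id.
  setValue(id, value) {
    this._values.set(id, value);
  }

  // Called when a page is rendered again; falls back to the value from the PDF.
  getValue(id, defaultValue) {
    return this._values.has(id) ? this._values.get(id) : defaultValue;
  }

  // Plain-object snapshot, e.g. to pass along before printing or to serialise.
  getAll() {
    return Object.fromEntries(this._values);
  }
}

const store = new FormValueStore();
store.setValue("25R", "Jane Doe");
console.log(store.getValue("25R", ""), store.getValue("26R", "(empty)"));
```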

Closing since AcroForm support is now done and enabled. The remaining issues are now filed in individual issues and collected with the 4-form-acroform tag; see https://github.com/mozilla/pdf.js/labels/4-form-acroform.
