Pdf.js: Generate (test) coverage statistics

Created on 10 Jul 2017  ·  43 Comments  ·  Source: mozilla/pdf.js

To identify parts of our code that are not covered by tests, or dead code in practice, generating coverage reports would be helpful. Another use case is to quickly identify which parts of PDF.js are relevant when analysing an issue in a specific PDF file (this can especially be useful to new contributors).

There are multiple tools, but I have had a good experience with https://github.com/gotwarlost/istanbul.
It can be integrated with Travis, and the results can be published to an external service such as coveralls, see e.g. https://coveralls.io/github/Rob--W/cors-anywhere?branch=master (integrated with https://github.com/Rob--W/cors-anywhere/commit/f9af03e76249b4dd38722b459293773ccddb6c7d).

PDF.js can be executed in several different ways that I know of (see gulpfile.js for more details):

  • unittestcli - Runs some unit tests of PDF.js in Node.js (with minimal source changes, only with transpilation with babel, configured in gulpfile.js).
  • unittest - Runs some unit tests of PDF.js in the browser (with minimal source changes, only with transpilation by babel, configured in systemjs.config.js)
  • browsertest - Runs tests in browsers (we test Chrome and Firefox). This relies on the binary created by the generic build target, which uses code transpiled with babel and then bundled with webpack (configured in gulpfile.js).
  • examples/node/pdf2svg.js - Can be used to trigger the SVG rendering backend on Node.js (depends on the generic build target, just like browsertest)
  • as a browser extension (Firefox / Chromium), using the firefox / chromium build targets (uses a similar build process as the generic target, just with different DEFINES)

Ideally we would obtain coverage statistics for the original sources, but to start with we could also settle for coverage statistics on the generated JS files that directly run in the browser / Node.js (if it is easier).

Labels: 1-test, 5-good-beginner-bug

All 43 comments

unittestcli - Runs some unit tests of PDF.js in Node.js (with minimal source changes, only with transpilation with babel).
browsertest - Runs tests in browsers (we test Chrome and Firefox). This relies on the binary created by the generic build target, which uses code transpiled with babel and then bundled with webpack.

Note that while browsertest will run the reference tests, there's also the unittest command which runs the complete set of unit tests in browsers (as opposed to unittestcli which only runs a subset of the existing unit-tests).

Furthermore, note that the "transpilation with Babel" step can be skipped, if the PDFJS_NEXT build flag is set (either like other build flags in gulpfile.js, or as a command-line argument). While the code is still bundled with Webpack, at least it's possible to avoid the transpilation step.

@Rob--W I would like to work on this

It's yours! Let us know (preferably on IRC) if you have questions.

@timvandermeij I am thinking of using the Karma tool. It works with the istanbul code coverage engine. Statistics can be checked for the test execution, and HTML reports can be made from it. Is it a good way to start?
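
Roughly, a minimal karma.conf.js along those lines could look like this (just a sketch, assuming the karma-jasmine and karma-coverage plugins plus the browser launchers are installed; the file paths are placeholders, not the actual PDF.js test layout):

    // karma.conf.js (sketch; paths are placeholders)
    module.exports = function (config) {
      config.set({
        frameworks: ['jasmine'],
        files: [
          'build/generic/build/pdf.js', // library under test
          'test/unit/**/*_spec.js',     // unit tests
        ],
        preprocessors: {
          // Instrument the library code with istanbul via karma-coverage.
          'build/generic/build/pdf.js': ['coverage'],
        },
        reporters: ['progress', 'coverage'],
        coverageReporter: {
          type: 'html',
          dir: 'coverage/',
        },
        browsers: ['Firefox', 'Chrome'],
        singleRun: true,
      });
    };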

@Divya063 Could you share your code, e.g. by pushing your current code to a branch on your pdf.js fork on Github? I wonder whether webpack or babel is being run when necessary.

And which version of Node.js are you using?

Thank you for the response, the Node version is 6.11.1. Here is the link to the branch: https://github.com/Divya063/pdf.js/tree/dev

The errors are:

Firefox 43.0.0
SyntaxError: import declarations may only appear at top level
Chrome 60.0.3112
Uncaught SyntaxError: Unexpected token import

This indicates that the code is not transpiled before use. Currently, built-in support for ES modules is not enabled by default in any browser (more info), so the code needs to be transpiled first.

I have edited my initial post to point out where the transpilation is configured in PDF.js's build system. Perhaps you can try to use an existing plugin to integrate istanbul and babel. A quick search shows https://github.com/istanbuljs/babel-plugin-istanbul, but there may be other options too.
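
As a rough, untested sketch of how such a plugin could be hooked into a Babel step (the task name, preset and paths below are only for illustration, not our actual build configuration):

    // Untested sketch: a gulp task that transpiles with Babel and adds
    // istanbul instrumentation via babel-plugin-istanbul.
    var gulp = require('gulp');
    var babel = require('gulp-babel');

    gulp.task('instrumented-lib', function () {
      return gulp.src('src/**/*.js')
        .pipe(babel({
          presets: ['env'],      // transpile the ES module syntax
          plugins: ['istanbul'], // babel-plugin-istanbul inserts coverage counters
        }))
        .pipe(gulp.dest('build/instrumented/'));
    });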

(and Firefox's current stable version is 55. You were testing with Firefox 43, which is ancient and not supported. I suggest that you upgrade to a recent version of Firefox before testing again)

@Rob--W Thank you for indicating the errors. I will update soon with the results.

@Rob--W I transpiled the code using karma-browserify and upgraded the Firefox version, but lots of errors are still coming up. Here is the link to the branch: https://github.com/Divya063/pdf.js/tree/dev

Can you share the error messages?

And if possible, try to use webpack instead of browserify, because webpack is what we use already. Doing so allows us to instrument the code that is actually used in the browser.

And I also see that you are checking in .idea and other user-specific project/IDE files. When you contribute to an existing project, it's better to not add unrelated files to the project, because it clutters the repository and causes merge conflicts. In the final pull request, these files should not be included.

Is this issue still up? If yes, I would like to work on it.

Yes, feel free to work on this.

@timvandermeij
I have been working on this. I've used istanbul to cover my tests and coveralls to display the reports, and I've made the necessary changes wherever needed. However, whenever I run coveralls using

npm run coveralls

I get the following error

npm run coveralls

> [email protected] coveralls /home/shikhar/Desktop/mozillaPdfJs/pdf.js
> npm run cover -- --report lcovonly && cat ./coverage/lcov.info | coveralls


> [email protected] cover /home/shikhar/Desktop/mozillaPdfJs/pdf.js
> istanbul cover test/**/*.js "--report" "lcovonly"

Running test 1/16: test_first_run
Running test 2/16: test_first_run_incognito
Running test 3/16: test_storage_managed_unavailable
Running test 4/16: test_managed_pref
Running test 5/16: test_local_pref
Running test 6/16: test_managed_pref_is_overridden
Running test 7/16: test_run_extension_again
Running test 8/16: test_running_for_a_while
Running test 9/16: test_browser_update
Running test 10/16: test_browser_update_between_pref_toggle
Running test 11/16: test_extension_update
Running test 12/16: test_unofficial_build
Running test 13/16: test_fetch_is_supported
Running test 14/16: test_fetch_not_supported
Running test 15/16: test_fetch_mode_not_supported
Running test 16/16: test_network_offline
All tests completed.
No coverage information was collected, exit without writing coverage information
[error] "2017-12-17T11:00:06.112Z"  'error from lcovParse: ' 'Failed to parse string'
[error] "2017-12-17T11:00:06.116Z"  'input: ' ''
[error] "2017-12-17T11:00:06.116Z"  'error from convertLcovToCoveralls'

/home/shikhar/Desktop/mozillaPdfJs/pdf.js/node_modules/coveralls/bin/coveralls.js:18
        throw err;
        ^
Failed to parse string
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] coveralls: `npm run cover -- --report lcovonly && cat ./coverage/lcov.info | coveralls`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the [email protected] coveralls script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/shikhar/.npm/_logs/2017-12-17T11_00_06_136Z-debug.log

I tried looking this up here and here, but to no avail. Any help on what the problem might be?

It's hard to tell without seeing the code. Could you push the code to a branch so contributors here can look along with you?

@timvandermeij here it is. Please ignore the gem file; I've already removed it.

Any comments, @timvandermeij?

From a bit of Googling it looks like this error means that coveralls is not getting data in the lcov format. You could check if the individual commands in npm run cover -- --report lcovonly && cat ./coverage/lcov.info | coveralls in fact return the results you expect.

@timvandermeij
The main problem currently is this statement:
No coverage information was collected, exit without writing coverage information
Because of this, the lcov file always remains blank, and thus this happens:

[error] "2017-12-17T11:00:06.112Z"  'error from lcovParse: ' 'Failed to parse string'
[error] "2017-12-17T11:00:06.116Z"  'input: ' ''
[error] "2017-12-17T11:00:06.116Z"  'error from convertLcovToCoveralls'

On googling, it seems that these are very common errors, and the problem basically lies with istanbul. I tried switching between different versions of it, but the error kept occurring. However, in all those places the testing was done with Mocha and not with lint or unittest etc., and thus most (almost all) of the resolutions are for Mocha only. These were some of the sources I looked up:

https://github.com/gotwarlost/istanbul/issues/262
https://github.com/coryhouse/react-slingshot/issues/245
https://github.com/gotwarlost/istanbul/issues/496
and several others too but none of them actually helped :(

The build passes on Travis (https://travis-ci.org/shikhar-scs/pdf.js/jobs/318422621), but again the coverage is not generated.

I'm not really sure why that's happening, but I also find a lot of people that did it successfully with Jasmine, so it must be possible. Could you try whether https://bryce.fisher-fleig.org/blog/setting-up-istanbul-with-jasmine/index.html works for you? First just try those exact steps to see if it works stand-alone, and then try to integrate it into PDF.js.

@timvandermeij on it
Coverage reports are finally being generated now. However, I need to transpile and then test, because it's showing problems with the import and export statements:
Transformation error for /home/shikhar/Desktop/mozillaPdfJs/pdf.js/web/ui_utils.js ; return original code 'import' and 'export' may appear only with 'sourceType: "module"' (16:0)
This error comes up for each and every JS file. I'll work on it and file a PR soon.

@timvandermeij

Here they are: the build passing and calling for coverage.

The coverage reports

However, as the import and export statements exist, even after reaching those files they aren't fully tested, and thus we are getting 0% coverage reports. As far as I know I need to run these files through Babel before the Jasmine testing, and that is proving to be a problem. How do I provide such code to Jasmine?
Can I make changes in the gulpfile as mentioned here (http://jpsierens.com/use-es6-right-now/)?

You're indeed getting closer to the solution. From the coverage reports it looks like you run the coverage on the lib files, which should already be transpiled (see https://github.com/mozilla/pdf.js/blob/6ac9e1c5ed0d5f067872b8482724c171c79566b2/gulpfile.js#L965 and https://github.com/mozilla/pdf.js/blob/6ac9e1c5ed0d5f067872b8482724c171c79566b2/gulpfile.js#L985). Or is the problem that the unit tests themselves are not transpiled? I'm not really familiar with how that works exactly, but if that is the case then some changes to the Gulpfile for the unit tests may be necessary.

@timvandermeij

Or is the problem that the unit tests themselves are not transpiled? I'm not really familiar with how that works exactly, but if that is the case then some changes to the Gulpfile for the unit tests may be necessary.

The problem was that I was running the tests in the wrong folder. The build/lib folder already contains the entire project in transpiled form, and I have now corrected the whole setup, i.e. the paths for Jasmine and coveralls.
Another problem was the --report lcovonly statement. Magically (I really don't know why), when I removed this part from the coveralls line, reports started getting generated. Maybe I should've paid more attention to

You could check if the individual commands in npm run cover -- --report lcovonly && cat ./coverage/lcov.info | coveralls in fact return the results you expect.

Thank you for pointing this out.

Finally, we are able to generate reports :tada: and though they look a bit sad, reading the exact files will show you the reason why:

  1. All unexecuted conditional statements are being counted as 'not covered'.
  2. All unexecuted assignment statements too are being counted as 'not covered'.

I obviously haven't uploaded the generated reports myself, but have hosted them here: http://pdfjscoveragereport.bitballoon.com. Please visit these links and you will see the exact report expected.

However, these results are not reflected on coveralls.io :cry: and I don't know why. Also, I've noticed that even after committing multiple times, coveralls is still building my project based on a very old commit and not a recent one, because of which the coverage there, though it gets generated, always remains 0 (even though it isn't 0 now). Please help me figure out how to solve that.

But npm run coveralls will still generate the entire coverage reports in this format in the build/lib/coverage/lcov-report folder.

I hope all of this finally helps; our last problem, though, is to show these reports on coveralls somehow.

This is the link to my latest build: https://travis-ci.org/shikhar-scs/pdf.js
This is the link to my latest commit.

Apart from the reports not being generated on coveralls.io, everything is fine I guess. So shall I generate a PR, as it will attract attention from many more people and might solve this issue earlier instead?

Nice work! It's really good to have an idea of the coverage and the reports finally give us that. Indeed it clearly shows that we need a lot more unit testing, but all the methods we recently added unit tests for are indeed shown as covered in the report, so that looks perfectly fine.

I do wonder if it would be possible to run the coverage over the source files instead of the built files. That makes it easier to understand, because now at http://pdfjscoveragereport.bitballoon.com/lib/display/metadata.js.html I see line 28 not being covered while that's not our code, but automatically generated code instead. If it turns out to be hard, we can just do the current approach as a first version and do this in a follow-up issue.

So shall I generate a PR, as it will attract attention from many more people and might solve this issue earlier instead?

Yes, that's a good idea. We can then start the review process and see which issues are left to address.

It's really good to have an idea of the coverage and the reports finally give us that. Indeed it clearly shows that we need a lot more unit testing, but all the methods we recently added unit tests for are indeed shown as covered in the report, so that looks perfectly fine.

While it's certainly true that we could do with a lot more unit-tests, there are sadly large portions of the code-base that probably won't ever get anywhere near "good enough" test-coverage from unit-tests alone.

As mentioned in https://github.com/mozilla/pdf.js/issues/8632#issue-241690851 there's a few different test-suites, and unless I'm mistaken getting coverage results from gulp browsertest as well would be almost essential to really know what our actual test-coverage looks like.

@Snuffleupagus @timvandermeij

This morning I've extensively tried generating coverage reports in all of the folders individually, using the statement cd build && cd lib && istanbul cover --include-all-sources jasmine-node test, changing different directories using cd <directory name>, and testing using jasmine-node <directory names and js files>, but in vain.

Though test reports are getting generated at times (not always), this only happens because of the one or two JS files in those directories that don't use module syntax (which only brings in <2~3% coverage). Sadly, any JS file which contains import or export statements returns an error in this format:

Transformation error for /home/shikhar/Desktop/mozillaPdfJs/pdf.js/src/core/arithmetic_decoder.js ; return original code 'import' and 'export' may appear only with 'sourceType: "module"' (183:0) Unable to post-instrument: /home/shikhar/Desktop/mozillaPdfJs/pdf.js/src/core/arithmetic_decoder.js

And with this error, the file is not checked for coverage at all and thus returns a 0% report.

I do wonder if it would be possible to run the coverage over the source files instead of the built files.

Again, the source folder contains files which have import and export statements, and therefore the above errors occur; because of this, the related files are not checked at all, leading to 0% coverage.

Thus, it is imperative that we test in the build folder itself.

getting coverage results from gulp browsertest as well would be almost essential to really know what our actual test-coverage looks like.

@Snuffleupagus Where do these specific tests live? We can try testing them directly using the jasmine-node statement mentioned above.

If it turns out to be hard, we can just do the current approach as a first version and do this in a follow-up issue.

Yep, we could rather do that.

Yes, that's a good idea. We can then start the review process and see which issues are left to address.

Sure, I'll start with that.

The PR at #9308 has shown an example of test coverage for the unit tests only. The generated reports provide little value because our set of unit tests is very small. For more details, see https://github.com/mozilla/pdf.js/pull/9308#issuecomment-353588039

So, to get browser tests, we need:

  1. A way to generate coverage data.
  2. A way to retrieve the coverage data (and upload it to coveralls).

To address 1), gulpfile.js should be edited to optionally add code instrumentation, which is exported to the window.__coverage__ object in a browser. gulp-istanbul might be useful. The documentation seems sparse, but I have found an example at https://stackoverflow.com/questions/38208735/no-window-coverage-object-is-created-by-istanbul-phantomjs. We do NOT use PhantomJS, but you can read the question, answer and linked blog post to deepen your understanding of how everything works.
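
As a sketch of what such an instrumentation step could look like (the task name and output directory are made up; how it ties into the existing build targets still needs to be worked out):

    // Sketch: instrument the generic build output with gulp-istanbul.
    var gulp = require('gulp');
    var istanbul = require('gulp-istanbul');

    gulp.task('instrument-generic', function () {
      return gulp.src('build/generic/build/*.js')
        .pipe(istanbul({
          // The counters end up on window.__coverage__ when run in a browser.
          coverageVariable: '__coverage__',
        }))
        .pipe(gulp.dest('coverage/build/generic/build/'));
    });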

After finishing step 1, the browser tests will have a window.__coverage__ variable (or whatever you had put in the coverageVariable configuration parameter). To get coverage reports:

  1. Modify the test runner (https://github.com/mozilla/pdf.js/blob/e081a708c36cb2aacff7889048863723fcf23671/test/driver.js) to post the coverage result with XMLHttpRequest to the test server.
  2. In the test server (https://github.com/mozilla/pdf.js/blob/e081a708c36cb2aacff7889048863723fcf23671/test/test.js), register a new hook to receive test results, and write that to a file using the fs Node.js API (maybe after some post-processing, such as converting it to the lcov format if needed).
  3. Upload the report to coveralls (e.g. with the "coveralls" command as shown in #9308).

@Rob--W Thanks for such a detailed review. I'll follow up and get back to you as soon as possible.

This comment offers tips for the implementation, and addresses the questions from https://github.com/mozilla/pdf.js/pull/9308#issuecomment-353710595

istanbul only adds instrumentation to code. This input is supposed to be executable code, because "instrumentation" means adding extra JavaScript code that detects when execution passes through that line, statement, etc. If the code is significantly altered again after adding the instrumentation, the resulting coverage report becomes meaningless.

This instrumentation can be done on the fly while the program is being run (e.g. when you run istanbul cover from the command-line, istanbul will intercept calls to Node.js's require and transform the code with instrumentation before the module is loaded), or separately from execution (e.g. as the blog post demonstrates: the instrumented code is generated at the command line, the execution is done in the browser).
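
To make that concrete: very roughly, istanbul rewrites a statement such as the first line below into something like the pair of lines after it (the counter object and key are simplified here; the real generated names are longer):

    // Original statement:
    var total = width * height;

    // After instrumentation (simplified):
    __coverage__['src/foo.js'].s['12']++;
    var total = width * height;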

In your current PR at #9308, you are invoking istanbul cover with jasmine as the program to run. As I mentioned before, the effect is similar to running gulp unittestcli - i.e. the tests are run against a pre-built library in the build/lib directory (this is configured in test/unit/clitests.json). This explains why your coverage report is showing 0 coverage for everything except for build/lib/ (because the only require-d (Node.js) modules are in build/lib and build/streams/ - see the end of the gulp unittestcli task definition).

To be useful, the coverage reports should preferably be at the module level. This is a step ahead: you need to integrate istanbul in the build pipeline, so that when the code is transpiled from ES6, instrumentation is added. After that, it becomes more difficult to generate per-module reports (but not impossible; in theory, source maps provide enough information to map the data back to the original files).
This is a challenge, and requires a good understanding of how to use Babel, gulp, istanbul and source maps / modules (I already referred to the relevant places in the source code where PDF.js brings together all modules to generate the PDF.js library - see #8632). This knowledge is very useful, so if you are not afraid of challenges, you can explore this under my guidance.

But before diving in so deeply, let's start with something simpler: getting coverage reports from the browser. We have two ways of running tests in the browser, unittest and browsertest. Since we already have a rather easy way of running unit tests in Node.js, let's focus on browsertest. The browser tests do not use individual modules, but the PDF.js library created by the generic gulp target, in GENERIC_DIR, aka build/generic/. So it suffices to add code instrumentation to build/generic. I suggest taking build/generic/ as the input directory and writing the result to coverage/build/generic.

After doing that, you need to change test/test_slave.html to not unconditionally load ../build/generic/build/pdf.js in a <script> tag, but conditionally load either ../build/generic/build/pdf.js or ../coverage/build/generic/build/pdf.js depending on some configuration parameter (for testing you can just hard-code the latter URL, it is very easy to change this hard-coded parameter later after you have completed the more difficult task of sending the coverage report back to the test server).
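
For example (sketch only; the "coverage" query parameter is made up, and simply hard-coding the instrumented path is fine as a first step), a small inline script in test_slave.html could pick the URL before loading the library:

    // Sketch for test/test_slave.html: choose which pdf.js build to load.
    var useCoverage = /[?&]coverage=1/.test(location.search);
    var pdfjsUrl = useCoverage
      ? '../coverage/build/generic/build/pdf.js'
      : '../build/generic/build/pdf.js';
    document.write('<script src="' + pdfjsUrl + '"><\/script>');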

Once you have replaced the normal pdf.js library with the instrumented pdf.js library, the coverage statistics will be generated as the tests run. The result is stored in the global window.__coverage__ variable. When the in-browser test driver finishes (the _quit method in test/driver.js), you can serialize this report (e.g. using JSON.stringify(window.__coverage__)) and send it to the server with XMLHttpRequest (see the other locations in the driver.js file for examples - make sure that you send the report to the server BEFORE sending the /tellMeToQuit message, or else the coverage report might not be transmitted correctly).
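
Roughly like this (sketch only; the /submit_coverage endpoint name is made up and must match whatever handler gets added in test.js):

    // Sketch for test/driver.js (_quit): send the coverage data to the test
    // server before the /tellMeToQuit message is sent.
    function sendCoverage(callback) {
      if (typeof window.__coverage__ === 'undefined') {
        callback(); // instrumentation was not enabled for this run
        return;
      }
      var r = new XMLHttpRequest();
      r.open('POST', '/submit_coverage', true);
      r.setRequestHeader('Content-Type', 'application/json');
      r.onreadystatechange = function () {
        if (r.readyState === 4) {
          callback(); // only continue with /tellMeToQuit after this completes
        }
      };
      r.send(JSON.stringify(window.__coverage__));
    }
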
You can add a new handler for your new custom API call in https://github.com/mozilla/pdf.js/blob/ba5dbc96326518ad716158ef040f61794cc72202/test/test.js. For code examples, look at the XMLHttpRequest calls in driver.js (such as the /tellMeToQuit message), and find the corresponding handler in test.js. Once you have received the serialized JSON on the server's side, use the fs.writeFileSync API to write the coverage report to a file (again, there are other examples in test.js that show how you can write files).
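
The server-side counterpart then boils down to something like this (again a sketch; the output path is a placeholder, and the handler registration should follow the existing examples in test.js):

    // Sketch for test/test.js: write the received coverage JSON to disk.
    var fs = require('fs');

    function handleCoverageReport(jsonBody) {
      // jsonBody is the serialized window.__coverage__ object from driver.js.
      fs.writeFileSync('coverage/coverage-browser.json', jsonBody);
    }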

@Rob--W I'm currently not available till the new year ... I'll catch up for sure after that.
