Libelektra: tests spawn unlimited gpg-agents

Created on 19 Apr 2018  ·  36 comments  ·  Source: ElektraInitiative/libelektra

Steps to Reproduce the Problem

  • build Elektra, for example in a Docker container, or check the v2 server
  • run the tests: make run_nokdbtests
  • ps -ef
  • run the tests: make run_nokdbtests
  • ps -ef
  • ????
  • wonder where all your PIDs went

Expected Result

tests should stop gpg-agents after they are finished

Actual Result

each test run spawns more gpg-agents

System Information

  • Elektra Version: master

Further Log Files and Output

+ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 05:57 pts/0    00:00:00 bash
root     11296     1  0 07:01 pts/0    00:00:00 sh -c /usr/bin/python2 /root/cppcms-1.2.0/tests/http_timeouts_test.py 
root     11297 11296  0 07:01 pts/0    00:00:00 /usr/bin/python2 /root/cppcms-1.2.0/tests/http_timeouts_test.py write 
root     28509     1  0 07:55 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.NmmZ2I/.gnupg --use-standard-soc
root     28519     1  0 07:55 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.6mb1t2/.gnupg --use-standard-soc
root     28539     1  0 07:55 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.5XdxDR/.gnupg --use-standard-soc
root     30656     1  0 08:00 pts/0    00:00:00 ps -ef
+ make run_nokdbtests
+ ps -ef
+ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 05:57 pts/0    00:00:00 bash
root     11296     1  0 07:01 pts/0    00:00:00 sh -c /usr/bin/python2 /root/cppcms-1.2.0/tests/http_timeouts_test.py 
root     11297 11296  0 07:01 pts/0    00:00:00 /usr/bin/python2 /root/cppcms-1.2.0/tests/http_timeouts_test.py write 
root     28509     1  0 07:55 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.NmmZ2I/.gnupg --use-standard-soc
root     28519     1  0 07:55 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.6mb1t2/.gnupg --use-standard-soc
root     28539     1  0 07:55 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.5XdxDR/.gnupg --use-standard-soc
root     30778     1  0 08:02 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.GZbzqb/.gnupg --use-standard-soc
root     30788     1  0 08:02 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.PEjcKs/.gnupg --use-standard-soc
root     30808     1  0 08:02 ?        00:00:00 gpg-agent --homedir /tmp/elektra-test.d6yL2g/.gnupg --use-standard-soc
root     30923     1  0 08:02 pts/0    00:00:00 ps -ef
bug work in progress


Thank you for reporting the problem!

@petermax2 Is it possible that the gpg commands during the tests spawn up gpg-agents?

Oops, I thought that gpg would always connect to the same agent. I will investigate.

@markus2330 this is also the reason why there are so many gpg agents on v2 reported with your userid, as the docker container runs with 1000:1000.

but the problem is not restricted to Docker: debian-stretch-minimal has more than 250 of them as well

some nodes are not affected because they are set up to spawn a gpg-agent for Jenkins, which then gets used by the tests (probably; I have to confirm this)

Thank you both for looking into this!

some nodes are not affected because they are set up to spawn a gpg-agent for Jenkins, which then gets used by the tests (probably; I have to confirm this)

If we cannot find a way to kill the agents we start, we can simply require that the environment already has a gpg-agent (#1888).

Maybe the gpg agent is not required to start at all and we can suppress it during the tests. But I have to have a look at it in the evening.

Hm, usually GPG_AGENT_INFO should be set when an agent is started. In the past we cleaned out environment variables, which might have explained the multiple starts back then. No idea why it is still happening now, though...

@petermax2 the tests that require gpg-agent (found by renaming gpg-agent to gpg-agent.bak ;)):

  • testmod_fcrypt
  • testmod_crypto_openssl
  • testmod_crypto_gcrypt

testmod_crypto_botan should run exactly like testmod_crypto_gcrypt and testmod_crypto_openssl. Is the Botan test running on the server?

@petermax2 Probably yes. In the environment where I tested, Botan was not installed. It is running here, however, and probably also spawning agents.

It's not that simple. I tried to invoke gpg with the --no-autostart argument during the unit tests, however gpg still starts the agent. --no-use-agent is a funny one. The man page reads:

--no-use-agent 
              This is dummy option. gpg2 always requires the agent.

If we cannot find a way to kill the agents we start, we can simply require that the environment already has a gpg-agent (#1888).

Could we give this a shot?

Or have a cron job like

pgrep gpg-agent | xargs -d "\n" kill

or something similar on the build servers/containers?
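The one-liner above signals every gpg-agent on the machine, including agents that belong to logged-in users. A narrower variant is sketched below, assuming (as in the ps output in the issue description) that the test agents carry a `--homedir` below `/tmp/elektra-test.`. The function name `test_agent_pids` is hypothetical. It reads `PID ARGS...` lines, the format printed by `ps -eo pid=,args=`, and prints only the PIDs of matching agents:

```shell
#!/bin/bash
# Hypothetical helper: print the PIDs of gpg-agent processes whose --homedir
# starts with the given prefix. Input: lines of "PID ARGS...", as produced
# by `ps -eo pid=,args=`.
test_agent_pids() {
	awk -v prefix="$1" '
	{
		# $2 is the executable; accept "gpg-agent" or any path ending in it
		n = split($2, parts, "/")
		if (parts[n] != "gpg-agent") next
		# look for "--homedir DIR" where DIR starts with the prefix (literal match)
		for (i = 3; i < NF; i++) {
			if ($i == "--homedir" && index($(i + 1), prefix) == 1) {
				print $1
				next
			}
		}
	}'
}

# Usage on a build server or container (GNU xargs -r: do nothing if no match):
# ps -eo pid=,args= | test_agent_pids /tmp/elektra-test. | xargs -r kill -TERM
```

Separating the selection from the `kill` keeps the matching logic easy to check against a saved ps listing before it is pointed at live processes.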

I would have the test check if an agent is available; if not, start it and retain its PID. In the test cleanup, stop the agent. Everything else is a hack.

You are right, the only question is where the start and stop should happen. Doing this within our agents/dockers seems to be easier than in our unit tests written in C.

Here is what I learned so far:

It is possible to suppress the auto-start of the gpg-agent with the --no-autostart option, if it is consistently used with all gpg calls. However, without a gpg-agent, gpg2 cannot perform any operations that require the private key (i.e. decryption, signatures).

It is also possible to fork gpg-agent --server, but then gpg2 cannot connect to the agent. The environment variable GPG_AGENT_INFO is deprecated and is no longer considered by gpg2.

I will try to fork and execv gpg-agent --daemon. I just need a way to find out the PID of the started gpg-agent so that I can SIGTERM when the tests are done.

Doing this within our agents/dockers seems to be easier than in our unit tests written in C.

Much easier, I guess :-)

I think your decision was right to simply use the default-way of gpg to connect to agents.

As an alternative to starting/stopping gpg-agent, we can also disable "use-agent" in .gnupg/gpg.conf.

I have no problem with one agent autostarting (I even have one running). I have a problem with subsequent tests starting a new one.

I think your decision was right to simply use the default-way of gpg to connect to agents.

In a production environment it is the better option. On my machine crypto and fcrypt always connect to the same agent and the integration with my Yubikey works very well.

In our test environments we must keep a single instance of the agent up and running before starting the tests. I think the problem is that we clear the environment, as @ingwinlu mentioned before.

I think the problem is that we clear the environment

we shouldn't anymore. but the issue persists

If gpg-agent tries to communicate via the environment, this obviously cannot work: the next test run never sees the environment set by a previous test run.

I like the following two options best:

  1. we properly start/stop a gpg agent within the containers and document in TESTING.md that gpg agent needs to be running (see #1888).
  2. we disable startup of gpg agents (disable the "use-agent" in .gnupg/gpg.conf should work, did not test it though) and document this in TESTING.md (see #1888).

A setup, where daemons get started on-demand without a global way to know if the daemon has been started already (and env vars are not global but process-specific), seems to be broken. We should not try to fix this within the tests.

https://stackoverflow.com/questions/27459869/how-to-stop-gpg-2-1-spawning-many-agents-for-unit-testing

The reason you're spawning lots of agents is the different home directory using the --homedir option, otherwise a single one would have been used. From GnuPG 2.1, all communication with the agent is performed through a socket in the GnuPG homedirectory.

We do not use the homedir option. And https://dev.gnupg.org/T3218 describes the workaround of stackoverflow as "a (very awkward) workaround".

Maybe simply starting the gpg-agent (in a controlled way within our environment) is the most future-proof variant. It seems that in recent versions the startup of gpg-agent is no longer optional (which makes my option 2 above nonsensical).

We do not use the homedir option.

Yeah, I have not found where it comes from, but it matches the problem (see the OP), as all the agents were spawned with a different one.

It was a good hint, I learned that startup of gpg-agent is not optional anymore.

Which makes it very clear that we need to start and stop it. And not try to avoid the starting.

We do not use the homedir option.

Yeah, I have not found where it comes from, but it matches the problem (see the OP)

We don't use the --homedir option explicitly, but ps -ef reveals that gpg somehow sets it anyway.

https://wiki.archlinux.org/index.php/GnuPG

$GNUPGHOME is used by GnuPG to point to the directory where its configuration files are stored. By default $GNUPGHOME is not set and your $HOME is used instead; thus, you will find a ~/.gnupg directory right after installation.
To change the default location, either run gpg this way $ gpg --homedir path/to/file or set the GNUPGHOME environment variable.
@petermax2 can you check if HOME is available in your testsuite?

also interesting https://www.gnupg.org/documentation/manuals/gnupg/Ephemeral-home-directories.html:

Create a temporary directory, create (or copy) a configuration that meets your needs, make gpg use this directory either using the environment variable GNUPGHOME, or the option --homedir. GPGME supports this too on a per-context basis, by modifying the engine info of contexts. Now execute whatever operation you like, import and export key material as necessary. Once finished, you can delete the directory. All GnuPG backend services that were started will detect this and shut down.

Tested this in my container and it cleaned up the process automatically as promised.
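A minimal bash sketch of that ephemeral-home pattern as a test wrapper; the function name `run_with_ephemeral_gnupghome` is hypothetical. The command runs in a subshell with a fresh `GNUPGHOME` from `mktemp -d`, and an EXIT trap deletes the directory afterwards, which, according to the GnuPG documentation quoted above, makes any agent started for that home shut down:

```shell
#!/bin/bash
# Hypothetical wrapper: run "$@" with a throw-away GNUPGHOME that is deleted
# afterwards. Per the GnuPG "Ephemeral home directories" page, backend
# services such as gpg-agent notice the deletion and shut down.
run_with_ephemeral_gnupghome() {
	(
		# mktemp -d creates a private (mode 0700) directory
		home=$(mktemp -d "${TMPDIR:-/tmp}/gnupghome.XXXXXX") || exit 1
		# remove the directory when the subshell exits, even if "$@" fails;
		# the subshell still exits with the status of "$@"
		trap 'rm -rf -- "$home"' EXIT
		export GNUPGHOME="$home"
		"$@"
	)
}

# Example: run_with_ephemeral_gnupghome ctest -R crypto_openssl
```

Because everything happens in a subshell, the caller's own GNUPGHOME is left untouched, and each invocation gets a distinct directory, so concurrent runs do not share an agent.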

@petermax2 can you check if HOME is available in your testsuite?

Yes, HOME is available:

HOME = /tmp/elektra-test.3vLR4L

OK so something in the test suite is overriding HOME into a tmp directory (which is good). If that is still available during cleanup it should just be removed to stop the agent. That would be an ideal fix.

If we simply set GNUPGHOME, only one instance of gpg-agent is spawned. GNUPGHOME is not overwritten before the test starts.

With GNUPGHOME set, only one single gpg-agent is running after multiple test runs.

I think this is the simplest solution.

Keep in mind that if you share your home directory, you might not be able to run tests in parallel.
And you would still need to delete GNUPGHOME afterwards (you don't want a lingering gpg-agent answering calls for the logged-in user, right?).

And what would happen if the target system relies on GNUPGHOME? You would need to save the existing environment and restore it manually after the tests.

I would appreciate if we could take a step back and look at how those tests might influence user machines, not just the test server environment.
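If a test harness changes GNUPGHOME in the caller's own shell rather than in a child process, it has to restore the previous state exactly, including whether the variable was set at all (unset means gpg falls back to its default home). A minimal bash sketch; the helper and variable names are hypothetical:

```shell
#!/bin/bash
# Hypothetical helpers: remember the caller's GNUPGHOME (including whether it
# was set at all) and put it back after the tests.
save_gnupghome() {
	if [ -n "${GNUPGHOME+set}" ]; then
		saved_gnupghome_set=1
		saved_gnupghome=$GNUPGHOME
	else
		saved_gnupghome_set=0
	fi
}

restore_gnupghome() {
	if [ "$saved_gnupghome_set" = 1 ]; then
		export GNUPGHOME="$saved_gnupghome"
	else
		# it was unset before, so keep relying on gpg's default home
		unset GNUPGHOME
	fi
}

# Example:
# save_gnupghome
# export GNUPGHOME="$(mktemp -d)"
# ... run the tests, then: rm -rf -- "$GNUPGHOME"
# restore_gnupghome
```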

you might not be able to run tests in parallel.

I ran the script:

#!/bin/bash
mkdir /tmp/x
export GNUPGHOME=/tmp/x
for run in {1..1000000}
do
    ctest -R crypto_openssl &
done

without any problems. GPG should handle locking, etc.

you don't want a lingering gpg-agent answering calls for the logged-in user, right?

This is the way gpg-agent was designed: it runs until the user session ends. It does not write its PID anywhere, and there are no commands to quit it. It only reacts to SIGTERM.

I tried to fork the gpg-agent from within the unit test with the --server option, so we would have a PID to kill afterwards. But then gpg-agent does not open the required sockets in $GNUPGHOME, and the unit tests start another instance of the agent (running in --daemon mode). Also, there is no way of making gpg-agent open any sockets in --server mode (I checked this in the source code of gpg-agent).

gpg-agent is hard to control and hardly documented. I was even reading the source code of gpg-agent. Our use case is not covered. The only option is SIGTERM.

parallelism

I was thinking more of the case where you want separate gpg-agents that do not influence each other, i.e. you want agent a to hold only the key of test a, and agent b only the key of test b. If that is not needed, then a hardcoded tmp home is OK.
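Where each test does need its own agent, giving every parallel job its own temporary GNUPGHOME keeps their keys apart, and each agent can still be shut down by deleting its directory. A sketch, assuming bash; `run_isolated_jobs` is hypothetical and each argument is a shell command string:

```shell
#!/bin/bash
# Hypothetical: run each command string in the background with its own
# temporary GNUPGHOME, wait for all of them, and fail if any job failed.
run_isolated_jobs() {
	local pids=() cmd pid status=0
	for cmd in "$@"; do
		(
			home=$(mktemp -d) || exit 1
			# delete this job's home (and thereby release its agent) on exit
			trap 'rm -rf -- "$home"' EXIT
			export GNUPGHOME="$home"
			bash -c "$cmd"
		) &
		pids+=("$!")
	done
	# wait for every job, remembering whether any of them failed
	for pid in "${pids[@]}"; do
		wait "$pid" || status=1
	done
	return "$status"
}

# Example: run_isolated_jobs "ctest -R crypto_openssl" "ctest -R crypto_gcrypt"
```

Waiting on each PID individually, rather than a bare `wait`, is what lets the function report a failing job.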

killing gpg-agent

When first investigating the issue I came across a website (linked above) that stated that the expected way to shut down a temp gpg-agent is to delete its gpg home directory.

So if you set GNUPGHOME to /tmp/elektra_tests/gpg and during test cleanup delete this tmp directory it should be fine.

So if you set GNUPGHOME to /tmp/elektra_tests/gpg and during test cleanup delete this tmp directory it should be fine.

It works! I will integrate this fix into the crypto and fcrypt test cases. Thank you for the tip!

I have a working prototype. PR is coming tomorrow.

Should be fixed with #2056. Please re-open if the problem still occurs.
