```
make run_nokdbtests
ps -ef
make run_nokdbtests
ps -ef
```
The tests should stop their gpg-agents after they are finished, but each test run spawns more gpg-agents.
```
+ ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 05:57 pts/0 00:00:00 bash
root 11296 1 0 07:01 pts/0 00:00:00 sh -c /usr/bin/python2 /root/cppcms-1.2.0/tests/http_timeouts_test.py
root 11297 11296 0 07:01 pts/0 00:00:00 /usr/bin/python2 /root/cppcms-1.2.0/tests/http_timeouts_test.py write
root 28509 1 0 07:55 ? 00:00:00 gpg-agent --homedir /tmp/elektra-test.NmmZ2I/.gnupg --use-standard-soc
root 28519 1 0 07:55 ? 00:00:00 gpg-agent --homedir /tmp/elektra-test.6mb1t2/.gnupg --use-standard-soc
root 28539 1 0 07:55 ? 00:00:00 gpg-agent --homedir /tmp/elektra-test.5XdxDR/.gnupg --use-standard-soc
root 30656 1 0 08:00 pts/0 00:00:00 ps -ef
+ make run_nokdbtests
+ ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 05:57 pts/0 00:00:00 bash
root 11296 1 0 07:01 pts/0 00:00:00 sh -c /usr/bin/python2 /root/cppcms-1.2.0/tests/http_timeouts_test.py
root 11297 11296 0 07:01 pts/0 00:00:00 /usr/bin/python2 /root/cppcms-1.2.0/tests/http_timeouts_test.py write
root 28509 1 0 07:55 ? 00:00:00 gpg-agent --homedir /tmp/elektra-test.NmmZ2I/.gnupg --use-standard-soc
root 28519 1 0 07:55 ? 00:00:00 gpg-agent --homedir /tmp/elektra-test.6mb1t2/.gnupg --use-standard-soc
root 28539 1 0 07:55 ? 00:00:00 gpg-agent --homedir /tmp/elektra-test.5XdxDR/.gnupg --use-standard-soc
root 30778 1 0 08:02 ? 00:00:00 gpg-agent --homedir /tmp/elektra-test.GZbzqb/.gnupg --use-standard-soc
root 30788 1 0 08:02 ? 00:00:00 gpg-agent --homedir /tmp/elektra-test.PEjcKs/.gnupg --use-standard-soc
root 30808 1 0 08:02 ? 00:00:00 gpg-agent --homedir /tmp/elektra-test.d6yL2g/.gnupg --use-standard-soc
root 30923 1 0 08:02 pts/0 00:00:00 ps -ef
```
Thank you for reporting the problem!
@petermax2 Is it possible that the gpg commands during the tests spawn gpg-agents?
Oops, I thought that gpg would always connect to the same agent. I will investigate.
@markus2330 this is also the reason why there are so many gpg agents on v2 reported with your userid, as the docker container runs with 1000:1000.
But the problem is not restricted to Docker: debian-stretch-minimal has > 250 of them as well.
Some nodes are not affected because they are set up to spawn a gpg-agent for Jenkins, which gets used by the tests (probably; I have to confirm).
Thank you both for looking into this!
Some nodes are not affected because they are set up to spawn a gpg-agent for Jenkins, which gets used by the tests (probably; I have to confirm).
If we cannot find a way to kill the agents we start, we can simply require that the environment already has a gpg-agent (#1888).
Maybe the gpg-agent is not required to start at all and we can suppress it during the tests. But I will have to take a look at it in the evening.
Mh, usually GPG_AGENT_INFO should be set when one is started. In the past we cleaned out environment variables, which might have explained the multiple starts back then. No idea why it is still happening right now, though...
@petermax2 the tests that require gpg-agent (found by renaming gpg-agent to gpg-agent.bak ;)):

- testmod_fcrypt
- testmod_crypto_openssl
- testmod_crypto_gcrypt

testmod_crypto_botan should run exactly like testmod_crypto_gcrypt and testmod_crypto_openssl. Is the Botan test running on the server?
@petermax2 Probably yes. In the environment where I tested, no Botan was installed. It is running here, however, and probably also spawning agents.
It's not that simple. I tried to invoke gpg with the --no-autostart argument during the unit tests, however gpg still starts the agent. --no-use-agent is a funny one. The man page reads:

--no-use-agent
    This is dummy option. gpg2 always requires the agent.
If we cannot find a way to kill the agents we start, we can simply require that the environment already has a gpg-agent (#1888).
Could we give this a shot?
Or have a cron job like
```
pgrep gpg-agent | xargs -d "\n" kill
```
or something similar on the build servers/containers?
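A hedged sketch of what such a cleanup job could look like. The /tmp/elektra-test homedir prefix is taken from the ps output above; everything else is illustrative. Filtering on the homedir keeps a developer's regular agent alive, unlike the blanket pgrep/kill one-liner:

```shell
#!/bin/sh
# Illustrative cleanup job for build servers (not from the Elektra repo):
# kill only gpg-agents whose --homedir points at an elektra-test tmp dir,
# so any regular user agent is left alone.
pgrep -a gpg-agent 2>/dev/null \
  | grep -F -- '--homedir /tmp/elektra-test' \
  | awk '{ print $1 }' \
  | while read -r pid; do
      kill "$pid" 2>/dev/null || true
    done
```

The pipeline is harmless when no matching agents (or no pgrep) are present; the while loop simply runs zero times.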
I would have the test check if an agent is available; if not, start it and retain its PID. In the test cleanup, stop the agent. Everything else is a hack.
You are right, the only question is where the start and stop should happen. Doing this within our agents/dockers seems to be easier than in our unit tests written in C.
Here is what I learned so far:

It is possible to suppress the auto-start of the gpg-agent with the --no-autostart option, if it is consistently used with all gpg calls. However, without a gpg-agent, gpg2 can not perform any operations that require the private key (i.e. decryption, signatures).

It is also possible to fork gpg-agent --server, but then gpg2 can not connect to the agent. The environment variable GPG_AGENT_INFO is deprecated and is no longer considered by gpg2.

I will try to fork and execv gpg-agent --daemon. I just need a way to find out the PID of the started gpg-agent so that I can SIGTERM it when the tests are done.
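That plan could be outlined in shell as follows. This is a hypothetical sketch, not the actual C test code; the pgrep pattern used to recover the PID is an assumption, and the script skips gracefully when gpg-agent is not installed:

```shell
#!/bin/sh
# Hypothetical outline of the plan above: start a dedicated gpg-agent for
# a throwaway home directory, recover its PID, and SIGTERM it afterwards.
GNUPGHOME="$(mktemp -d)"
export GNUPGHOME
chmod 700 "$GNUPGHOME"   # gpg insists on restrictive permissions

if command -v gpg-agent >/dev/null 2>&1; then
    gpg-agent --daemon --homedir "$GNUPGHOME" >/dev/null 2>&1 || true
    # --daemon detaches, so the PID has to be found after the fact:
    agent_pid="$(pgrep -f "gpg-agent.*$GNUPGHOME" | head -n 1)" || true
    # ... run the crypto/fcrypt tests here ...
    if [ -n "$agent_pid" ]; then
        kill "$agent_pid" 2>/dev/null || true
    fi
fi
rm -rf "$GNUPGHOME"
```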
Doing this within our agents/dockers seems to be easier than in our unit tests written in C.
Much easier, I guess :-)
I think your decision was right to simply use the default way of gpg to connect to agents.
As an alternative to starting/stopping the gpg-agent, we could also disable "use-agent" in .gnupg/gpg.conf.
I have no problem with one agent autostarting (and even have it running). I have a problem with subsequent tests starting a new one.
I think your decision was right to simply use the default way of gpg to connect to agents.
In a production environment it is the better option. On my machine, crypto and fcrypt always connect to the same agent, and the integration with my Yubikey works very well.
In our test environments we must keep a single instance of the agent up and running before starting the tests. I think the problem is that we clear the environment, as @ingwinlu mentioned before.
I think the problem is that we clear the environment
We shouldn't anymore, but the issue persists.
If gpg-agent tries to communicate via environment variables it obviously cannot work: the next test run would never get the environment set by a test run before.
I like the following two options best:
A setup, where daemons get started on-demand without a global way to know if the daemon has been started already (and env vars are not global but process-specific), seems to be broken. We should not try to fix this within the tests.
The reason you're spawning lots of agents is the different home directory set via the --homedir option; otherwise a single one would have been used. Since GnuPG 2.1, all communication with the agent is performed through a socket in the GnuPG home directory.
We do not use the homedir option. And https://dev.gnupg.org/T3218 describes the workaround of stackoverflow as "a (very awkward) workaround".
Maybe simply starting the gpg-agent is the most future-proof variant (in a controlled way within our environment). It seems that in recent versions the startup of gpg-agent is not optional anymore (which makes my option 2 above nonsensical).
We do not use the homedir option.
Yeah, I have not found where it comes from, but it matches the problem (see OP), as all the agents were spawned with a different one.
It was a good hint: I learned that the startup of gpg-agent is not optional anymore. Which makes it very clear that we need to start and stop it, and not try to avoid the starting.
We do not use the homedir option.
Yeah, I have not found where it comes from, but it matches the problem (see OP)
We don't use the --homedir option explicitly, but ps -ef reveals that gpg somehow sets it anyway.
https://wiki.archlinux.org/index.php/GnuPG
$GNUPGHOME is used by GnuPG to point to the directory where its configuration files are stored. By default $GNUPGHOME is not set and your $HOME is used instead; thus, you will find a ~/.gnupg directory right after installation.
To change the default location, either run gpg this way $ gpg --homedir path/to/file or set the GNUPGHOME environment variable.
@petermax2 can you check if HOME is available in your testsuite?
Also interesting, https://www.gnupg.org/documentation/manuals/gnupg/Ephemeral-home-directories.html:

Create a temporary directory, create (or copy) a configuration that meets your needs, make gpg use this directory either using the environment variable GNUPGHOME, or the option --homedir. GPGME supports this too on a per-context basis, by modifying the engine info of contexts. Now execute whatever operation you like, import and export key material as necessary. Once finished, you can delete the directory. All GnuPG backend services that were started will detect this and shut down.
Tested this in my container and it cleaned up the process automatically as promised.
@petermax2 can you check if HOME is available in your testsuite?
Yes, HOME is available:

HOME = /tmp/elektra-test.3vLR4L
OK so something in the test suite is overriding HOME into a tmp directory (which is good). If that is still available during cleanup it should just be removed to stop the agent. That would be an ideal fix.
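The ephemeral-home-directory pattern from the GnuPG manual quoted above can be sketched as a small shell fragment. The directory prefix mirrors the one seen in the ps output; the test invocation is a placeholder:

```shell
#!/bin/sh
# Sketch of the ephemeral-home pattern from the GnuPG manual: point
# GNUPGHOME at a temp directory for the whole test run, then delete it;
# GnuPG backend services started for it detect this and shut down.
set -eu

GNUPGHOME="$(mktemp -d /tmp/elektra-test.XXXXXX)"
export GNUPGHOME
chmod 700 "$GNUPGHOME"   # gpg requires restrictive permissions

# ... import test keys and run the gpg-based tests here ...

# Cleanup: removing the home directory is what stops the agent.
rm -rf "$GNUPGHOME"
```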
If we simply set GNUPGHOME, only one instance of gpg-agent is spawned. GNUPGHOME is not overwritten before the test starts. With GNUPGHOME set, only one single gpg-agent is running after multiple test runs. I think this is the simplest solution.
Keep in mind that if you share your home directory you might not be able to run tests in parallel.
And you would still need to delete GNUPGHOME afterwards (you don't want a lingering gpg-agent answering calls for the logged-in user, right?).
And what would happen if the target system relies on GNUPGHOME? You would need to save the existing environment and restore it manually after the tests.
I would appreciate if we could take a step back and look at how those tests might influence user machines, not just the test server environment.
you might not be able to run tests in parallel
I ran this script:
```shell
#!/bin/bash
mkdir /tmp/x
export GNUPGHOME=/tmp/x
for run in {1..1000000}
do
    ctest -R crypto_openssl &
done
```
without any problems. GPG should handle locking, etc.
you don't want a lingering gpg-agent answering calls for the logged in user right?

This is the way gpg-agent was designed: it runs forever until the user session ends. It does not write its PID anywhere, and there are no commands to quit it. It only reacts to SIGTERM.
I tried to fork the gpg-agent from within the unit test with the --server option, so we would have a PID to kill afterwards. But then gpg-agent does not open the required sockets at $GNUPGHOME, and the unit tests re-open another instance of the agent (which runs in --daemon mode). Also, there is no way of making gpg-agent open any sockets when in --server mode (I checked this in the source code of gpg-agent).

gpg-agent is hard to control and hardly documented. I was even reading the source code of gpg-agent. Our use case is not covered. The only option is SIGTERM.
parallelism

I was more thinking about wanting to separate gpg-agents that should not influence each other, i.e. you only want agent A to have the key of test A, and agent B to have the key for test B. If that is not needed, then a hardcoded tmp home is OK.

killing gpg-agent

When first investigating the issue I came across a website (linked above) that stated that the expected way to shut down a temporary gpg-agent is to delete its GnuPG home directory. So if you set GNUPGHOME to /tmp/elektra_tests/gpg and during test cleanup delete this tmp directory it should be fine.
So if you set GNUPGHOME to /tmp/elektra_tests/gpg and during test cleanup delete this tmp directory it should be fine.
It works! I will integrate this fix into the crypto and fcrypt test cases. Thank you for the tip!
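The pattern that was confirmed here can be sketched as a small test wrapper. The path follows the suggestion above, but the trap-based cleanup is illustrative and not necessarily how the actual fix implements it:

```shell
#!/bin/sh
# Illustrative wrapper around the confirmed approach: pin GNUPGHOME to a
# dedicated test directory and remove it in a cleanup trap, which also
# shuts down any gpg-agent that was started for it.
set -eu

export GNUPGHOME=/tmp/elektra_tests/gpg
mkdir -p "$GNUPGHOME"
chmod 700 "$GNUPGHOME"

cleanup() {
    rm -rf /tmp/elektra_tests
}
trap cleanup EXIT

# ... run the crypto / fcrypt test cases here ...
```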
I have a working prototype. PR is coming tomorrow.
Should be fixed with #2056 . Please re-open if the problem still occurs.