Libelektra: multiresolver: crash when used with cache

Created on 13 May 2019  ·  24 comments  ·  Source: ElektraInitiative/libelektra

src/plugins/resolver/resolver.c:1175 seems to cause a crash in some situations.

Tried to reproduce it, but the following does not trigger the crash:

# create a new folder to not mess up with existing data
mkdir x
cd x

# create two mountpoints
kdb mount `pwd`/csv system/tests/csv/lists/cur csvstorage header=colname,columns/index=student/id
kdb mount -R multifile -c storage="ini",pattern="*/*",resolver="resolver" `pwd`/multi system/tests/multi

# create a csv file
echo "student/id,ue/5/kreuzerl" >> csv
echo "01234567,X" >> csv

# create a multiresolver directory
mkdir -p multi/pool
cd multi/pool
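# write the same line to several files (plain `>> a >> b` only writes to all of them in zsh)
echo "[]" | tee -a 01234567 01234568 01234569 > /dev/null
echo "[student]" | tee -a 01234567 01234568 01234569 > /dev/null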
echo "id = 01234567" >> 01234567
echo "[ue/5]" >> 01234567

# create caches
kdb ls system/tests > /dev/null
kdb ls system/tests/multi/pool > /dev/null
kdb ls system/tests/csv/lists/cur > /dev/null

# now do something directly on the files
rm 01234569
touch 01234566
echo "kreuzerl = O" >> 01234567
echo "[something]" | tee -a 01234567 01234568 > /dev/null
echo "" | tee -a 01234567 01234568 > /dev/null
echo ""  >> 01234568

# trigger
kdb cp -rf system/tests/csv/lists/cur system/tests/multi/pool

# debug
kdb export system/tests mini
tail *

kdb umount system/tests/csv/lists/cur
kdb umount system/tests/multi
cd ../../..
rm -r x

Did not work:

# create a new folder to not mess up with existing data
mkdir x
cd x

# create two mountpoints
kdb mount `pwd`/csv system/tests/csv csvstorage header=colname,columns/index=sec/somekey 
kdb mount -R multifile -c storage="ini",pattern="*",resolver="resolver" `pwd`/multi system/tests/multi

# create a csv file
echo "sec/somekey,othersec/deep/otherkey" >> csv
echo "a,data2a" >> csv
# echo "b,data2b" >> csv # but do not write in the other

# create a multiresolver directory
mkdir multi
cd multi
echo "[sec]" | tee -a a b > /dev/null
echo "somekey = a" >> a
echo "somekey = b" >> b
echo "" | tee -a a b > /dev/null
echo "" | tee -a a b > /dev/null

kdb cp -rf system/tests/csv system/tests/multi

kdb umount system/tests/csv
kdb umount system/tests/multi

The problem seems to be filename=0x2 <error: Cannot access memory at address 0x2>, i.e. an invalid pointer, most likely set wrongly by the multiresolver?

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f902d39542a in __GI_abort () at abort.c:89
#2  0x0000564e84a716fc in catchSignal (signum=<optimized out>) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/tools/kdb/main.cpp:110
#3  <signal handler called>
#4  strlen () at ../sysdeps/x86_64/strlen.S:106
#5  0x00007f902d3a9da8 in _IO_vfprintf_internal (s=s@entry=0x7fffc1371d80, format=<optimized out>, format@entry=0x7f902c448ddd "the file \"%s\" because of \"%s\"", ap=ap@entry=0x7fffc1371f48) at vfprintf.c:1637
#6  0x00007f902d457cf6 in ___vsnprintf_chk (s=0x564e85b9e3b0 "the file \"o-\220\177", maxlen=<optimized out>, maxlen@entry=512, flags=flags@entry=1, slen=slen@entry=18446744073709551615, 
    format=format@entry=0x7f902c448ddd "the file \"%s\" because of \"%s\"", args=args@entry=0x7fffc1371f48) at vsnprintf_chk.c:63
#7  0x00007f902dc9cd34 in vsnprintf (__ap=0x7fffc1371f48, __fmt=0x7f902c448ddd "the file \"%s\" because of \"%s\"", __n=512, __s=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/stdio2.h:77
#8  elektraVFormat (format=format@entry=0x7f902c448ddd "the file \"%s\" because of \"%s\"", arg_list=arg_list@entry=0x7fffc1371f48)
    at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/libs/elektra/internal.c:430
#9  0x00007f902c43f3bb in elektraAddWarningf36 (warningKey=warningKey@entry=0x564e85b09330, reason=0x7f902c448ddd "the file \"%s\" because of \"%s\"", 
    file=0x7f902c448578 "/home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/plugins/resolver/resolver.c", line=0x7f902c448dd8 "1175", line=0x7f902c448dd8 "1175", 
    file=0x7f902c448578 "/home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/plugins/resolver/resolver.c", reason=0x7f902c448ddd "the file \"%s\" because of \"%s\"")
    at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/obj-x86_64-linux-gnu/src/include/kdberrors.h:3458
#10 0x00007f902c43f4a4 in elektraUnlinkFile (filename=0x2 <error: Cannot access memory at address 0x2>, parentKey=parentKey@entry=0x564e85b09330)
    at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/plugins/resolver/resolver.c:1175
#11 0x00007f902c440cc5 in libelektra_resolver_fm_hpu_b_fm_hpu_b_LTX_elektraPluginerror (handle=<optimized out>, r=<optimized out>, parentKey=0x564e85b09330)
    at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/plugins/resolver/resolver.c:1191
#12 0x00007f9029b7cb2d in elektraMultifileError (handle=0x564e85a68a10, returned=0x564e85b80490, parentKey=0x564e85b09330)
    at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/plugins/multifile/multifile.c:900
#13 0x00007f902deb2bf6 in elektraSetRollback (parentKey=0x564e85b09330, split=0x564e85b7b520) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/libs/elektra/kdb.c:1331
#14 kdbSet (handle=0x564e85a3dce0, ks=0x564e85b0c0c0, parentKey=0x564e85b09330) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/libs/elektra/kdb.c:1564
#15 0x0000564e84a3fc59 in kdb::KDB::set (parentKey=..., returned=..., this=0x564e85a3dc78)
    at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/bindings/cpp/include/kdb.hpp:229
#16 CpCommand::execute (this=0x564e85a3dc70, cl=...) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/tools/kdb/cp.cpp:111
#17 0x0000564e84a23659 in main (argc=<optimized out>, argv=0x7fffc1372bd8) at /home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA@2/libelektra/src/tools/kdb/main.cpp:198
Labels: bug, urgent


Further info:

kdb ls works without any problems (and is very fast). The crash happens when doing a kdb cp inside the multiresolver, i.e. in kdbSet.

The problem might be that the multiresolver has some internal state (filename) which is not restored when its kdbGet is skipped because of a cache hit.

I added "urgent" because I had to disable the cache due to this problem:

cd /usr/lib/x86_64-linux-gnu/elektra4 && sudo mv libelektra-cache.so libelektra-cache.so-backup

Thank you for reporting! I'll prioritize this and hopefully fix it today.

I could not reproduce this, but I found a different bug (#2702).

Your valgrind log suggests this was an error in kdbSet, causing a kdbError call to the multifile resolver. Can you be more specific about how you triggered this, so I can reproduce it?

I added steps to reproduce above.

The new backtrace from these steps (hopefully the same; I simplified some steps from the script that originally caused the problem):

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f1ca7f2642a in __GI_abort () at abort.c:89
#2  0x0000564d997ef1cc in catchSignal (signum=<optimized out>) at ./src/tools/kdb/main.cpp:110
#3  <signal handler called>
#4  strlen () at ../sysdeps/x86_64/strlen.S:106
#5  0x00007f1ca7f3ada8 in _IO_vfprintf_internal (s=s@entry=0x7fffca6e3580, format=<optimized out>, 
    format@entry=0x7f1ca71dcddd "the file \"%s\" because of \"%s\"", ap=ap@entry=0x7fffca6e3748) at vfprintf.c:1637
#6  0x00007f1ca7fe8cf6 in ___vsnprintf_chk (s=0x564d9a52cd50 "the file \"(\250\034\177", maxlen=<optimized out>, maxlen@entry=512, flags=flags@entry=1, 
    slen=slen@entry=18446744073709551615, format=format@entry=0x7f1ca71dcddd "the file \"%s\" because of \"%s\"", args=args@entry=0x7fffca6e3748)
    at vsnprintf_chk.c:63
#7  0x00007f1ca882dd34 in vsnprintf (__ap=0x7fffca6e3748, __fmt=0x7f1ca71dcddd "the file \"%s\" because of \"%s\"", __n=512, __s=<optimized out>)
    at /usr/include/x86_64-linux-gnu/bits/stdio2.h:77
#8  elektraVFormat (format=format@entry=0x7f1ca71dcddd "the file \"%s\" because of \"%s\"", arg_list=arg_list@entry=0x7fffca6e3748)
    at ./src/libs/elektra/internal.c:430
#9  0x00007f1ca71d33bb in elektraAddWarningf36 (warningKey=warningKey@entry=0x564d9a527a00, reason=0x7f1ca71dcddd "the file \"%s\" because of \"%s\"", 
    file=0x7f1ca71dc578 "/home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA/libelektra/src/plugins/resolver/resolver.c", line=0x7f1ca71dcdd8 "1175", line=0x7f1ca71dcdd8 "1175", 
    file=0x7f1ca71dc578 "/home/jenkins/workspace/libelektra_master-Q2SIBK3KE2NBEMJ4WVGJXAXCSCB77DUBUULVLZDKHQEV3WNDXBMA/libelektra/src/plugins/resolver/resolver.c", reason=0x7f1ca71dcddd "the file \"%s\" because of \"%s\"") at ./obj-x86_64-linux-gnu/src/include/kdberrors.h:3458
#10 0x00007f1ca71d34a4 in elektraUnlinkFile (filename=0x21 <error: Cannot access memory at address 0x21>, parentKey=parentKey@entry=0x564d9a527a00)
    at ./src/plugins/resolver/resolver.c:1175
#11 0x00007f1ca71d4cc5 in libelektra_resolver_fm_hpu_b_fm_hpu_b_LTX_elektraPluginerror (handle=<optimized out>, r=<optimized out>, parentKey=0x564d9a527a00)
    at ./src/plugins/resolver/resolver.c:1191
#12 0x00007f1ca4d1bb2d in elektraMultifileError (handle=0x564d9a499120, returned=0x564d9a5424c0, parentKey=0x564d9a527a00)
    at ./src/plugins/multifile/multifile.c:900
#13 0x00007f1ca8c46bf6 in elektraSetRollback (parentKey=0x564d9a527a00, split=0x564d9a52c2d0) at ./src/libs/elektra/kdb.c:1331
#14 kdbSet (handle=0x564d9a44ff70, ks=0x564d9a544180, parentKey=0x564d9a527a00) at ./src/libs/elektra/kdb.c:1564
#15 0x0000564d997bb5d9 in kdb::KDB::set (parentKey=..., returned=..., this=0x564d9a44ff08) at ./src/bindings/cpp/include/kdb.hpp:229
#16 CpCommand::execute (this=0x564d9a44ff00, cl=...) at ./src/tools/kdb/cp.cpp:111
#17 0x0000564d9979ed69 in main (argc=<optimized out>, argv=0x7fffca6e43f8) at ./src/tools/kdb/main.cpp:198

I just noticed that the steps above also crash without the cache plugin. The original script, however, only crashed with the cache plugin enabled, so maybe it is not the same bug.

Thank you for the details!

> I just see that the steps above also crash without the cache plugin.

That is what I thought too, but I was not sure. I'll look into it nevertheless. As I mentioned, there is at least one other critical bug with the cache.

I checked out version 0.8.25 / 6978802a6 with no cache at all (neither in the core nor in multifile). I get the same segfault there, so it seems like it has nothing to do with the cache.

Yes, the multiresolver is very buggy and it does not support creating new files. Nevertheless, there was a case where it crashed only with the cache and worked after disabling the cache.

I have now investigated this problem in detail, and it is very tricky. I could not reproduce it without local cache files. With my local cache files, it is easy to trigger, and I found a fairly minimal set of files to trigger it. The CSV only contains:

student/id,ue/5/kreuzerl
01234567,X

and I needed only two INI files:

tail *
==> 01234567 <==
[]
[student]
id = 01234567
[ue/5]
kreuzerl = X

==> 01234568 <==
[]
[student]

Then the cp crashed only if the cache is enabled and a .cache file of the INI files exists.

I updated the top post, but as said, the reproduction does not work without the problematic cache file. Unfortunately, the cache file is big (2.2MB) and contains private data.

Do you have some idea what it could be?

Unfortunately, I have no idea what it could be. The example you gave is quite elaborate, but it works fine in our Debian stretch Docker image.

We already have a few separate issues here:

  1. The trace of the segfault is easy to reproduce by kdb cp-ing some key from another backend into the multifile backend (which was your first example).

  2. Copying stuff between two files inside one multifile backend causes a corrupt cache. (#2702)

  3. The complex problem that we can't easily reproduce.

I can only suggest that we start with the first two which are easy to reproduce, make regression tests, fix and then move on to the third problem.

Of course the obvious problems should be fixed first. I thought you had already fixed problems 1+2 in some branch and were waiting for a test to reproduce this issue.

@markus2330 the segfault should be resolved by #2739. Can you verify this?

Looks good, it does not crash. But it also does not speed things up. Maybe the cache is disabled because of INI?

But I think we'll leave it for now. Some things have a really amazing speedup, e.g. parsing a huge JSON file is reduced from 2.5sec to 0.17sec. What a pity that multifile and INI caused so much trouble.

Oops, I was too fast. It still crashes. Interestingly only in 2 of 3 very similar copy operations. (I first tried the one that works.)

And it is still the case that it does not crash if I rename the file libelektra-cache.so.

Can you maybe improve your INI check so that it also disables the cache if INI is used within multifile?

Yes, I noticed that INI was not disabled in multifile. I do have to say that the cache worked in my tests. I'll have to investigate the slowdown (#2696) separately. Let's focus on the crash here.

Do you have a new trace for the crash/segfault or is it exactly the same?

I think the only way to reproduce this is that I send you the data or prepare some image for you to reproduce. With the data it is relatively straightforward to reproduce (the second copy crashes). Do you really want to go into this? I am afraid it will be very time-consuming for you to even find out what the problem is.

Ok. I won't have an excessive amount of time to waste on that (right now, maybe later). Let's talk / take a look at it at our next meeting.

EDIT: I'll "blacklist" ini inside multifile today. I had another fix waiting too.

Yes, maybe we see the problem when looking at it together. Last time we found the problem quite quickly :smile:

Good news: when I remount the whole multiresolver-ini as a single dump file and remove the cache file, it does not crash anymore.

So the deactivation of multiresolver+ini should fix the problem.

Did you already think about how you could detect that plugins behave in wrong ways (like INI)? Would be a nice question for the next thesis.

Btw. the cache is 165MB for a 55MB dump file.

Very nice! The last time this bug was also not caused by the cache, but by a wrong internal state of multifile.

> Did you already think about how you could detect that plugins behave in wrong ways (like INI)? Would be a nice question for the next thesis.

No. To be honest, I didn't think about such problems at all when I started working on the cache. In general, it sounds like you're asking about the halting problem, but one can always "do some testing" and see what happens.

> Btw. the cache is 165MB for a 55MB dump file.

I know that the size is larger. The cache is the same as the in-memory data structure. Unused parts of the keyset array etc. are written to disk as-is.

> So the deactivation of multiresolver+ini should fix the problem.

It's done in #2750, just waiting for the builds to succeed. It would still be nice to know why/where it crashes.

> No. To be honest, I didn't think about such problems at all when I started working on the cache. In general, it sounds like you're asking about the halting problem, but one can always "do some testing" and see what happens.

If the problems are about internal state then there is not much you can do. With @vLesk's changes we will hopefully get all plugins, including resolvers, stateless.

> I know that the size is larger. The cache is the same as the in-memory data structure. Unused parts of the keyset array etc. are written to disk as-is.

I think the size is quite good. Obviously there are many KeySets with only a few elements (as INI adds metadata to every key).

> It would still be nice to know why/where it crashes.

I also tested with multiresolver+ni, and the crashes appear there too. So the bug seems to be in the multiresolver. I get output like:

 Sorry, 98 warnings were issued ;(
        Sorry, module resolver issued the warning 36:
        could not unlink file: the file "L�����H�����H�=�1" because of "No such file or directory"

This looks like memory corruption. Running everything with valgrind, I get the following output:

==10949== Memcheck, a memory error detector
==10949== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==10949== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==10949== Command: kdb cp -rf system/lehre/ep2/lists/adhoc/di11a system/lehre/ep2/students/pool
==10949== 
==10949== Warning: invalid file descriptor 1031 in syscall open()
==10949== Invalid read of size 4
==10949==    at 0x6F31CB1: libelektra_resolver_fm_hpu_b_fm_hpu_b_LTX_elektraPluginerror (resolver.c:1184)
==10949==    by 0x97F8B2C: elektraMultifileError (multifile.c:906)
==10949==    by 0x50C1BF5: elektraSetRollback (kdb.c:1331)
==10949==    by 0x50C1BF5: kdbSet (kdb.c:1564)
==10949==    by 0x16A838: set (kdb.hpp:229)
==10949==    by 0x16A838: CpCommand::execute(Cmdline const&) (cp.cpp:111)
==10949==    by 0x14DD63: main (main.cpp:198)
==10949==  Address 0x1140ade0 is 32 bytes inside a block of size 82 free'd
==10949==    at 0x4C2CDDB: free (vg_replace_malloc.c:530)
==10949==    by 0x54D7634: keyClear (key.c:515)
==10949==    by 0x54D77E0: keyDel (key.c:463)
==10949==    by 0x97F8D7D: flagUpdateBackends (multifile.c:842)
==10949==    by 0x97F8D7D: elektraMultifileSet (multifile.c:854)
==10949==    by 0x50C0E20: elektraSetPrepare (kdb.c:1210)
==10949==    by 0x50C0E20: kdbSet (kdb.c:1513)
==10949==    by 0x16A838: set (kdb.hpp:229)
==10949==    by 0x16A838: CpCommand::execute(Cmdline const&) (cp.cpp:111)
==10949==    by 0x14DD63: main (main.cpp:198)
==10949==  Block was alloc'd at
==10949==    at 0x4C2DDCF: realloc (vg_replace_malloc.c:785)
==10949==    by 0x54D6C13: elektraRealloc (internal.c:238)
==10949==    by 0x54D8F80: keyAddName (keyname.c:987)
==10949==    by 0x54D9190: elektraKeySetName (keyname.c:572)
==10949==    by 0x54D7DC3: keyVInit (keyhelpers.c:344)
==10949==    by 0x54D74E4: keyVNew (key.c:215)
==10949==    by 0x54D758D: keyNew (key.c:197)
==10949==    by 0x97F8DAF: flagUpdateBackends (multifile.c:824)
==10949==    by 0x97F8DAF: elektraMultifileSet (multifile.c:854)
==10949==    by 0x50C0E20: elektraSetPrepare (kdb.c:1210)
==10949==    by 0x50C0E20: kdbSet (kdb.c:1513)
==10949==    by 0x16A838: set (kdb.hpp:229)
==10949==    by 0x16A838: CpCommand::execute(Cmdline const&) (cp.cpp:111)
==10949==    by 0x14DD63: main (main.cpp:198)
... and much much more (1.8MB)

I tried to increase the number of file descriptors, but this does not seem to have any influence. So maybe the multiresolver remembered (via the cache) some file descriptor wrongly?

Thank you for investigating again. At first glance it seems like this is the same error as before, triggered from a different part of the code.
