Calling numpy.load on malicious data causes command execution. If an attacker shares such a file on the internet,
the command runs on the machine of any user who loads it.
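import os
import pickle

import numpy
from numpy import __version__

print(__version__)


class Test(object):
    def __init__(self):
        self.a = 1

    def __reduce__(self):
        # On unpickling, pickle calls os.system('ls') to "rebuild" the object.
        return (os.system, ('ls',))


tmpdaa = Test()
with open("a-file.pickle", 'wb') as f:
    pickle.dump(tmpdaa, f)

# The file is not in .npy/.npz format, so numpy.load falls back to unpickling it.
numpy.load('a-file.pickle')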
1.14.6
This works on versions <= 1.16.0.
Yes, which is why `np.load(allow_pickle=True)` was added. Now I guess we could make a move to switch over to defaulting to `False` and give a readable message such as "use `allow_pickle=True` if you trust this file".
I agree that this would be the better default, so I am open to pushing that deprecation, even if it is unfortunately a bit noisy e.g. for all those scientists just sharing some data in the lab (or just saving/reloading themselves).
So `allow_pickle` was added in April 2015, so it would seem that it should have existed since numpy 1.10. So I think that move does get more realistic now, since I doubt many using/supporting 1.17 will also still support 1.10 (removing the pain of supporting the kwarg or not supporting it). Although for the moment it seems scipy at least still supports 1.8 in version 1.
It seems that will last for a long time.
I would suggest logging a deprecation warning and giving a date if you want a smooth transition.
@Plazmaz of course, I would go with a VisibleDeprecationWarning, if we want casual users to stop doing it. Then deprecate after one or two releases. The thing is that it is annoying to work around if you have to and the kwarg does not exist in some older versions, because then you have to do `if np.__version__ > ...: use kwarg else: do not use kwarg` to avoid the warning and support both.
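One way to sidestep the version check is to inspect the loader's signature instead of comparing version strings (plain string comparison is fragile: `'1.9' > '1.10'` is true for strings). This is a minimal sketch, not something proposed in the thread; `supports_kwarg` and `load_compat` are hypothetical helper names, and the loader is passed in as an argument so the sketch runs without NumPy installed.

```python
import inspect


def supports_kwarg(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        # Signature is not introspectable (e.g. some C-implemented callables).
        return False
    if name in params:
        return params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs catch-all also accepts the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def load_compat(load_func, path, allow_pickle=False):
    """Call `load_func`, passing allow_pickle only where it is supported."""
    if supports_kwarg(load_func, "allow_pickle"):
        return load_func(path, allow_pickle=allow_pickle)
    # Older loader without the keyword: pickles cannot be disabled here.
    return load_func(path)
```

With NumPy available, the intended call would look like `load_compat(np.load, "data.npy")`. Note that on a version without the keyword the fallback still unpickles, so this only avoids the warning; it does not add safety.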
Anyway, I think there is a good chance you can get it into 1.17. So if you feel like it, open a PR, but we may want to ping the mailing list to see if someone complains.
Hi, Fedora numpy RPM maintainer. What's a good way to mitigate this in distro packaging?
I do not know of a nice way. Depending on the concern level, I would be up to adding a warning very soon, so that it is definitely there in 1.17. If someone is extremely concerned, we could discuss backporting it or moving quicker, but that would depend a lot on whether or not downstream depends on it.
I am working on this.
cc @jeanqasaur re: security / vulnerability expertise
> Hi, Fedora numpy RPM maintainer. What's a good way to mitigate this in distro packaging?
@limburgher: what does fedora do about the exact same functionality built into Python? It's not clear that this is something that needs mitigating.
While I'm not opposed to changing the default, it seems wrong to declare this a vulnerability. It's working as documented and designed.
Unfortunately the rule is that once a CVE number is assigned, it no longer matters whether there is any bug or not, the distros have to try to do something to prove to their customers that they are Providing Value. Not sure what that would be here, but companies and ops people are always struggling to manage the ongoing flood of vulnerabilities, and the tools they use to do this don't have a lot of room for communicating nuance, so that's the way the pressure goes. We don't have customers though, so we shouldn't necessarily take that into account ourselves.
We can tell during `save` and `load` whether a particular file uses pickle or not, right? It probably is good to migrate to `allow_pickle=False` in both cases, with an intermediate period where we issue some kind of deprecation warning exactly in the cases where `save` or `load` actually does need to use pickle and `allow_pickle` wasn't specified.
@eric-wieser The difference from the stdlib pickle is that `load`/`save` actually can avoid using pickle in most cases (e.g. simple arrays of primitive types); pickle only gets used in more exotic cases like object arrays or IIRC certain complicated dtypes. This makes it possible for folks who are mostly using the safe case to miss that the unsafe case exists, if they don't read the docs closely enough. And anyway, given that we have both a "safe mode" and an "unsafe mode", it's better for the "safe mode" to be the default. For stdlib pickle OTOH, it's always 100% unsafe 100% of the time so there's no point in worrying about defaults.
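To illustrate how a file can be checked for pickle use before any unpickling happens, here is a minimal sketch that reads only the leading bytes. It assumes the documented `.npy` layout (magic string, version bytes, header length, then a dict literal with a `descr` entry); `classify` and `descr_has_object` are hypothetical names, not NumPy functions, and this is not how NumPy itself implements the check.

```python
import ast
import struct

NPY_MAGIC = b"\x93NUMPY"
ZIP_MAGICS = (b"PK\x03\x04", b"PK\x05\x06")


def descr_has_object(descr):
    """Return True if a .npy 'descr' entry mentions an object ('O') dtype."""
    if isinstance(descr, str):
        return descr.lstrip("|<>=").startswith("O")
    if isinstance(descr, list):
        # Structured dtype: entries look like (name, type) or (name, type, shape).
        return any(descr_has_object(field[1]) for field in descr)
    return False


def classify(data):
    """Classify the leading bytes of a file that might be handed to a loader."""
    if data.startswith(ZIP_MAGICS):
        return "npz"  # each member is a .npy file and needs its own check
    if not data.startswith(NPY_MAGIC):
        return "not-npy"  # a pickle-enabled loader would try to unpickle this
    major = data[6]
    # Format 1.0 stores the header length in 2 bytes, later versions in 4.
    size_fmt, start = ("<H", 10) if major == 1 else ("<I", 12)
    (header_len,) = struct.unpack(size_fmt, data[8:start])
    raw = data[start:start + header_len]
    text = raw.decode("utf-8" if major >= 3 else "latin1")
    header = ast.literal_eval(text.strip())
    return "npy-object" if descr_has_object(header["descr"]) else "npy-plain"
```

Only `"npy-plain"` files can be read without pickle; `"npy-object"` and `"not-npy"` inputs are the cases where a warning or an explicit `allow_pickle` choice would matter. The sketch does no error handling for truncated or malformed headers.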
Honestly, if it's documented, intentional functionality, I can close the BZ in good conscience, especially if safe is the default. I don't know how we handle Python's functionality. I'll look.
From my examination of the spec, I don't think we alter anything from upstream in that regard.
Has the CVE been disputed? That might make the scenario clearer to maintainers.
The CVE appears largely bogus. That `numpy.load` can execute arbitrary code is well known and documented, and it is necessary for loading serialized Python object arrays. The user can forbid loading object arrays by passing `allow_pickle=False` to this library function.
It would have been better if the default had been to load object arrays only when explicitly asked, but it is as it is for historical reasons. The transition has been suggested also before, and the discussion above is about how to make it in a way that does not uncontrollably break backward compatibility.
Careless use of `numpy.load`, like careless use of Python pickling, can however lead to vulnerabilities in downstream applications.
> That `numpy.load` can execute arbitrary code is well known and documented, and it is necessary for loading serialized Python object arrays.
I would rather only say that it is documented. I've been using numpy for a few years, and while I'm not a frequent user of `numpy.save`/`numpy.load`, it wasn't obvious to me at all that `numpy.load` suffers from the same vulnerability as `pickle` does. Of course I didn't know that `numpy.load` might use `pickle` under the hood (I only use numpy-native arrays and never gave it a thought, exactly the scenario that @njsmith mentioned).
The fact that `pickle` is vulnerable is well-known, and its documentation has a big red warning on top saying
> Warning: The `pickle` module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
In comparison, the docs of `numpy.load` seem to mention the whole security aspect as an aside in the description of its `allow_pickle` keyword:
> allow_pickle: _bool, optional_
> Allow loading pickled object arrays stored in npy files. Reasons for disallowing pickles include security, as loading pickled data can execute arbitrary code. If pickles are disallowed, loading object arrays will fail. Default: True
I wouldn't hate it if we could put a Big Red Warning into the documentation of `numpy.load`, at least until `allow_pickle=False` becomes the default. Until that change is made, seeing `numpy.load` should raise the same red flags in one's mind that `pickle.load` raises.
Documentation PR welcome for numpy.load
Documentation now has a warning about pickle
> Unfortunately the rule is that once a CVE number is assigned, it no longer matters whether there is any bug or not, the distros have to try to do _something_ to prove to their customers that they are Providing Value. Not sure what that would be here, but companies and ops people are always struggling to manage the ongoing flood of vulnerabilities, and the tools they use to do this don't have a lot of room for communicating nuance, so that's the way the pressure goes.
@njsmith it is not so bad: we will make `numpy.load` default to `allow_pickle=False`, which is actually not a completely stupid idea.
The only risk I see with that is any project that doesn't explicitly set allow_pickle will break.
It's not just end-user projects we have to worry about - I'm worried about downstream libraries providing `mylib.load` that wraps `np.load`. These will start failing to load object arrays. One of three things will happen with them:
* `allow_pickle=True` to resume the old behavior - which is the downstream libraries indicating that they think this isn't an attack vector they care about. This still costs them an incompatible release
* `allow_pickle=False` in their own API, pushing the problem downstream.
My preference would be:
* `np.save`. Having a long-running script crash at the end while saving an object array is an awful experience.
* Change the default in `np.load` to `None`. Detect the user not passing in `True` or `False` explicitly, and emit a `UserWarning` explaining the dangers, asking them to choose between security (`False`) and object array support (`True`). Default to the status quo after emitting this warning. It's my understanding that the problem here is lack of awareness. Neither choice is correct in all cases, so I don't think we should suddenly change our minds about the default without warning.
@eric-wieser good point about the pain of a script crashing. I would be up for giving a `UserWarning` by default.
The question is what we want to do in the long run in `load`. I am not sure I like forcing everyone to use the kwarg (to silence the warning), when the array is safe. Although it does have the merit of no danger of locking someone out of their data... OTOH, if the warning only shows up on "unsafe" load, it may be too late. Right now I think I have a slight preference for making the transition period rather a bit longer.
> OTOH, if the warning only shows up on "unsafe" load, it may be too late.
Either:
`-Werror` set).
Yes, I definitely agree for libraries, but I think it may be a bit annoying for the vast number of shorter scripts.
> Change the default in `np.load` to `None`. Detect the user not passing in `True` or `False` explicitly, and emit a `UserWarning` explaining the dangers, asking them to choose between security (`False`) and object array support (`True`). Default to the status quo after emitting this warning. It's my understanding that the problem here is lack of awareness. Neither choice is correct in all cases, so I don't think we should suddenly change our minds about the default without warning.
This sounds super annoying though. Most people (I believe) don't save/load object arrays. And the worst case if someone misses the warning is (eventually) their script crashes when loading, the data is still safe on disk, and they retry with the `allow_pickle` flag.
Is it beyond the responsibility of numpy to try loading safely first and only shouting in case that fails due to object arrays? That would remove extra work for most (non-objecty) use cases, but I guess that would also reduce visibility of the whole security issue. Then again I think "users should be made very aware" and "users should not be inconvenienced" are a bit contradictory efforts here.
> * Change the default in `np.load` to `None`. Detect the user not passing in `True` or `False` explicitly, and emit a `UserWarning` explaining the dangers, asking them to choose between security (`False`) and object array support (`True`). Default to the status quo after emitting this warning. It's my understanding that the problem here is lack of awareness. Neither choice is correct in all cases, so I don't think we should suddenly change our minds about the default without warning.
What about this patch:
--- a/numpy/lib/npyio.py
+++ b/numpy/lib/npyio.py
@@ -265,7 +265,7 @@ class NpzFile(object):
         return self.files.__contains__(key)
-def load(file, mmap_mode=None, allow_pickle=True, fix_imports=True,
+def load(file, mmap_mode=None, allow_pickle=None, fix_imports=True,
          encoding='ASCII'):
     """
     Load arrays or pickled objects from ``.npy``, ``.npz`` or pickled files.
@@ -367,6 +367,16 @@ def load(file, mmap_mode=None, allow_pic
     memmap([4, 5, 6])
     """
+
+    if allow_pickle is None:
+        UserWarning("""
+        numpy.load() run without explicit setting allow_pickle option.
+        If you are not completely certain about security of the pickled
+        data, you are strongly encouraged to set allow_pickle to False,
+        otherwise you can set it to True.
+        """)
+        allow_pickle = False
+
     own_fid = False
     if isinstance(file, basestring):
         fid = open(file, "rb")
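As written, the patch constructs a `UserWarning` instance but never passes it to `warnings.warn`, so no warning would be shown. It also falls back to `False`, whereas the quoted proposal keeps the status quo after warning. The following is a minimal sketch of the `None`-sentinel idea under those assumptions; `resolve_allow_pickle` and this stand-in `load` are hypothetical and are not NumPy's implementation.

```python
import warnings


def resolve_allow_pickle(allow_pickle=None, fallback=True):
    """Return the effective allow_pickle value, warning if it was not set."""
    if allow_pickle is None:
        warnings.warn(
            "load() called without an explicit allow_pickle; pass "
            "allow_pickle=False for untrusted files, or allow_pickle=True "
            "if object arrays are needed and the file is trusted.",
            UserWarning,
            stacklevel=3,  # attribute the warning to the caller of load()
        )
        return fallback  # status quo, as in the quoted proposal
    return bool(allow_pickle)


def load(file, allow_pickle=None):
    """Hypothetical stand-in loader; only the keyword handling is shown."""
    allow_pickle = resolve_allow_pickle(allow_pickle)
    return file, allow_pickle
```

Passing `fallback=False` would reproduce the behavior the patch chose instead. Callers who pass `True` or `False` explicitly never see the warning.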
I am still in favor of a warning when loading object data. It can be a bit "too late", but makes for a much less noisy transition. We could add a warning when saving (just a permanent warning). There is an open PR which I hope transforms into something more like that. If you want to spend time on it, we are generally happy about PRs.
It seems to me that it is converging towards starting a deprecation cycle soon in any case, and I think that will happen (but it will be sooner if someone picks it up ;)). There may be a small chance of a request to delay, but I doubt it, and it is hard to know without trying.
Could you please close this issue? It is referenced in https://nvd.nist.gov/vuln/detail/CVE-2019-6446, because of which Nexus IQ still considers it vulnerable.
thanks @Manjunath07