Scikit-learn: GridSearchCV.fit() with n_jobs=-1 may contain a bug in parallelism

Created on 11 Sep 2020 · 3 Comments · Source: scikit-learn/scikit-learn

Describe the bug

Calling GridSearchCV.fit() with n_jobs=-1 raises the exception 'OSError: [Errno 22] Invalid argument'.
Calling GridSearchCV.fit() with n_jobs=None works perfectly.

With n_jobs=None, the output shows that n_jobs resolves to 1. (In my other programs, every successful GridSearchCV run on the same machine resolves to n_jobs=6. This may not be a big issue, but I mention it for completeness.)
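scikit-learn follows joblib's convention here: n_jobs=None means 1 unless the call runs inside a joblib parallel_backend context, and n_jobs=-1 means all CPUs that joblib can see. The short sketch below uses joblib's effective_n_jobs() and cpu_count() to show how each setting resolves on the current machine. A value such as 6 would correspond to n_jobs=-1 on a machine where joblib counts 6 CPUs; that is an assumption, since the other programs were not shared.

```python
from joblib import cpu_count, effective_n_jobs

# n_jobs=None falls back to a single worker outside a parallel_backend context.
print("n_jobs=None ->", effective_n_jobs(None))

# n_jobs=-1 asks for every CPU joblib detects on this machine.
print("n_jobs=-1   ->", effective_n_jobs(-1))
print("cpu_count() ->", cpu_count())
```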

The problem may lie in (1) lib\site-packages\joblib\parallel.py or _parallel_backends.py, or (2) lib\concurrent\futures\_base.py, as the traceback below suggests.

Steps/Code to Reproduce

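from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
search = GridSearchCV(RandomForestClassifier(), param_grid, cv=3, n_jobs=-1, return_train_score=True, verbose=2)
search.fit(feature_matrix, labels)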
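Because param_grid, feature_matrix and labels were not shared, here is a self-contained sketch of the same call. The synthetic dataset from make_classification and the small grid are hypothetical stand-ins, not the reporter's data. Every other argument matches the report. On platforms unaffected by this issue it is expected to complete without error; whether it reproduces the OSError on the reporter's Windows setup is untested.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical stand-ins for the unshared feature_matrix, labels and param_grid.
feature_matrix, labels = make_classification(n_samples=200, n_features=10, random_state=0)
param_grid = {"n_estimators": [10, 20], "max_depth": [None, 5]}

# Same arguments as in the report: all CPUs, train scores, verbose progress output.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    n_jobs=-1,
    return_train_score=True,
    verbose=2,
)
search.fit(feature_matrix, labels)
print(search.best_params_)
```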

Expected Results

No error is thrown.

Actual Results

OSError: [Errno 22] Invalid argument

The detailed traceback is attached below:

_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\tlbh9\Anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
r = call_item()
File "C:\Users\tlbh9\Anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
return self.fn(*self.args, **self.kwargs)
File "C:\Users\tlbh9\Anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
return self.func(*args, **kwargs)
File "C:\Users\tlbh9\Anaconda3\lib\site-packages\joblib\parallel.py", line 252, in __call__
return [func(*args, **kwargs)
File "C:\Users\tlbh9\Anaconda3\lib\site-packages\joblib\parallel.py", line 252, in <listcomp>
return [func(*args, **kwargs)
File "C:\Users\tlbh9\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 505, in _fit_and_score
print("[CV] %s %s" % (msg, (64 - len(msg)) * '.'))
OSError: [Errno 22] Invalid argument
"""

The above exception was the direct cause of the following exception:

OSError Traceback (most recent call last)
in
.......
212 search = GridSearchCV(RandomForestClassifier(), param_grid, cv=3, n_jobs=-1, return_train_score=True, verbose=2)
--> 213 search.fit(feature_matrix, labels)
.......

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
71 FutureWarning)
72 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73 return f(**kwargs)
74 return inner_f
75

~\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
734 return results
735
--> 736 self._run_search(evaluate_candidates)
737
738 # For multi-metric evaluation, store the best_index_, best_params_ and

~\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py in _run_search(self, evaluate_candidates)
1186 def _run_search(self, evaluate_candidates):
1187 """Search all candidates in param_grid"""
-> 1188 evaluate_candidates(ParameterGrid(self.param_grid))
1189
1190

~\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py in evaluate_candidates(candidate_params)
706 n_splits, n_candidates, n_candidates * n_splits))
707
--> 708 out = parallel(delayed(_fit_and_score)(clone(base_estimator),
709 X, y,
710 train=train, test=test,

~\Anaconda3\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
1040
1041 with self._backend.retrieval_context():
-> 1042 self.retrieve()
1043 # Make sure that we get a last message telling us we are done
1044 elapsed_time = time.time() - self._start_time

~\Anaconda3\lib\site-packages\joblib\parallel.py in retrieve(self)
919 try:
920 if getattr(self._backend, 'supports_timeout', False):
--> 921 self._output.extend(job.get(timeout=self.timeout))
922 else:
923 self._output.extend(job.get())

~\Anaconda3\lib\site-packages\joblib\_parallel_backends.py in wrap_future_result(future, timeout)
540 AsyncResults.get from multiprocessing."""
541 try:
--> 542 return future.result(timeout=timeout)
543 except CfTimeoutError as e:
544 raise TimeoutError from e

~\Anaconda3\lib\concurrent\futures\_base.py in result(self, timeout)
430 raise CancelledError()
431 elif self._state == FINISHED:
--> 432 return self.__get_result()
433
434 self._condition.wait(timeout)

~\Anaconda3\lib\concurrent\futures\_base.py in __get_result(self)
386 def __get_result(self):
387 if self._exception:
--> 388 raise self._exception
389 else:
390 return self._result

OSError: [Errno 22] Invalid argument
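The innermost frame of the remote traceback is a print() call inside sklearn's _fit_and_score. As far as can be told, that call is only reached in this release because verbose is above 1 (verbose=2 was passed), which is an assumption worth checking against the 0.23.1 source. One untested workaround, not confirmed anywhere in this thread, is to leave verbose at 0 so the worker processes do not print per-fit progress lines. The data and grid below are hypothetical stand-ins.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical data and grid; the originals were not shared in the report.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
param_grid = {"n_estimators": [10, 20]}

# verbose=0 (the default) suppresses the per-fit progress messages, so the
# worker processes never reach the print() seen in the innermost frame.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    n_jobs=-1,
    return_train_score=True,
    verbose=0,
)
search.fit(X, y)
print(search.best_params_)
```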

Versions

System:
python: 3.8.3 (default, Jul 2 2020, 17:30:36) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\tlbh9\Anaconda3\python.exe
machine: Windows-10-10.0.18362-SP0

Python dependencies:
pip: 20.1.1
setuptools: 49.2.0.post20200714
sklearn: 0.23.1
numpy: 1.18.5
scipy: 1.5.0
Cython: 0.29.21
pandas: 1.0.5
matplotlib: 3.2.2
joblib: 0.16.0
threadpoolctl: 2.1.0

Built with OpenMP: True
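The Versions block above follows the layout printed by sklearn.show_versions(). Since the thread below ends with an upgrade resolving the problem, a quick way to regenerate this report and confirm which scikit-learn and joblib versions are active in the current interpreter is sketched here. The comparison against 0.23.1 and 0.16.0 is left to the reader.

```python
import joblib
import sklearn

# Prints the System / Python dependencies / OpenMP sections shown above.
sklearn.show_versions()

# The two versions most relevant to this report (0.23.1 and 0.16.0 here).
print("sklearn:", sklearn.__version__)
print("joblib:", joblib.__version__)
```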

triage

Source

tluocs


All 3 comments

@tluocs Can you share the complete code? When I tried to reproduce the issue, it reported that param_grid is not defined.
I'd like to know how you assigned param_grid and feature_matrix.
It would help if you could share all the variables used in the lines above so the issue can be reproduced.

AnshuTrivedi on 26 Sep 2020

Thank you, Anshu. The code is quite long, so the problem is not easy to reproduce; that's why I tried to include all the necessary information when I opened the issue. I have now solved it by upgrading Python and scikit-learn to the latest versions (although my previous version was already fairly recent, dated July 2020). So I guess this "bug" (if it is one) may have been fixed in the latest release.

tluocs on 26 Sep 2020


Thanks @tluocs for reaching out, and thanks @AnshuTrivedi for having a look at this.
If I understand correctly, the issue has been fixed by an update, so I'm closing it. Feel free to reopen if something still needs to be fixed. Thanks.