We are currently seeing unit tests hit the 6-hour GH Actions limit. This is not good, for obvious reasons.
- 3.8 core deps: hit the 6-hour time limit (in progress)
- build_conda_pkg, 3.8 core deps, 3.7 non-core deps: hit the 6-hour time limit (in progress)
- 3.7 non-core deps: hit the 6-hour time limit
- 3.8 non-core deps: hit the 6-hour time limit
- 3.7 non-core deps: 1.5 hours
- build_conda_pkg
- 3.7 non-core deps
- 3.8
Just adding some data: the list above is a series of checks from this 3.8 core deps run. One thing I noticed is that they all stall at 91-93% completion. I'm not sure it's worth figuring out exactly which test that is, but it could be a path worth pursuing.
Here is another one, for the 3.9 non-core deps.
Thanks for filing, @chukarsten.
Thankfully, we can rule out conda as the cause, since this happens not only in build_conda_pkg but also in the regular unit test builds.
Is there any other information we should collect that might help narrow this down? A few ideas below.
(Adding this here because it's connected to #2298 and #1815, @freddyaboulton.)
Changing the Makefile to run pytest with verbose logging surfaces the following in the logs: the last test to run was "evalml/tuners/random_search_tuner.py::evalml.tuners.random_search_tuner.RandomSearchTuner".
After adding a timeout, I've seen the same timeout in test_dask_sends_woodwork_schema at least three times.
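A per-test timeout like the one mentioned above could be wired up with the pytest-timeout plugin; this is a hypothetical sketch (the plugin and the 300-second figure are assumptions, not necessarily evalml's actual setup), where a hanging test gets killed at the limit instead of running until the 6-hour GH Actions ceiling:

```python
import pytest


# Hypothetical sketch: cap this test at 300 seconds via the pytest-timeout
# plugin. The mark is inert if the plugin is not installed; with it installed,
# the test is failed once the limit elapses instead of hanging the job.
@pytest.mark.timeout(300)
def test_dask_sends_woodwork_schema():
    ...  # test body elided; the timeout mark is the point of this sketch
```

The same limit can also be applied suite-wide with `--timeout=300` on the pytest command line, which avoids touching individual tests.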
I think @freddyaboulton is definitely onto something here, and we're pointing the finger at Dask. I created this PR to split out the dask unit tests. I think we have the option of not letting them block merges when they fail. That PR still failed on test_automl_immediate_quit, which is in the dask test batch.
Investigating the root cause of the dask unit test failures is its own piece of work. The logs produce a lot of the following:
```
distributed.worker - WARNING - Could not find data: {'Series-32a3ef2ca4739b46a6acc2ac58638b32': ['tcp://127.0.0.1:45587']} on workers: [] (who_has: {'Series-32a3ef2ca4739b46a6acc2ac58638b32': ['tcp://127.0.0.1:45587']})
distributed.scheduler - WARNING - Communication failed during replication: {'status': 'missing-data', 'keys'
```
Why does this happen? Well, it looks like references to the data are getting lost wherever the data is processed. The `workers: []` also suggests the parent process is killing its workers. It seems like a problem with how the data is distributed, but I do wonder what's going on with these four jobs running together in parallel/serial.
This dask distributed issue suggests disabling adaptive scaling of the cluster. Unfortunately, we don't use an adaptive cluster, only a regular local client cluster, so that isn't our problem. This issue ties the problem to how data is distributed, so in #2376, which I tried in order to split out the dask jobs, I set broadcast=False for the DaskEngine's client, after which I got essentially the infamous test_automl_immediate_quit test failure. Documented here.
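For context, `broadcast` is a flag on dask's `Client.scatter`: with `broadcast=False` the data is placed on a worker rather than replicated to every worker. A minimal local sketch of that knob (illustrative only, not evalml's actual DaskEngine code; the dict payload is made up):

```python
from dask.distributed import Client, LocalCluster

# Illustrative sketch: scatter data to a local cluster without broadcasting
# a replica to every worker (broadcast=False), the knob experimented with
# in #2376. processes=False keeps everything in-process for a quick demo.
cluster = LocalCluster(n_workers=1, processes=False, dashboard_address=None)
with Client(cluster) as client:
    # scatter() on a list returns one future per element
    [future] = client.scatter([{"X": [1, 2, 3]}], broadcast=False)
    data = client.gather(future)
cluster.close()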
We're now seeing the following stack trace in build_conda_pkg:
```
[gw3] linux -- Python 3.7.10 $PREFIX/bin/python

X_y_binary_cls = (          0         1         2  ...        17        18        19
0  -0.039268  0.131912 -0.211206  ...  1.976989  ...ns], 0     0
1     0
2     1
3     1
4     1
     ..
95    1
96    1
97    1
98    1
99    0
Length: 100, dtype: int64)
cluster = LocalCluster(15c4b3ad, 'tcp://127.0.0.1:45201', workers=0, threads=0, memory=0 B)

    def test_submit_training_jobs_multiple(X_y_binary_cls, cluster):
        """Test that training multiple pipelines using the parallel engine produces the
        same results as the sequential engine."""
        X, y = X_y_binary_cls
        with Client(cluster) as client:
            pipelines = [
                BinaryClassificationPipeline(
                    component_graph=["Logistic Regression Classifier"],
                    parameters={"Logistic Regression Classifier": {"n_jobs": 1}},
                ),
                BinaryClassificationPipeline(component_graph=["Baseline Classifier"]),
                BinaryClassificationPipeline(component_graph=["SVM Classifier"]),
            ]

            def fit_pipelines(pipelines, engine):
                futures = []
                for pipeline in pipelines:
                    futures.append(
                        engine.submit_training_job(
                            X=X, y=y, automl_config=automl_data, pipeline=pipeline
                        )
                    )
                results = [f.get_result() for f in futures]
                return results

            # Verify all pipelines are trained and fitted.
            seq_pipelines = fit_pipelines(pipelines, SequentialEngine())
            for pipeline in seq_pipelines:
                assert pipeline._is_fitted

            # Verify all pipelines are trained and fitted.
>           par_pipelines = fit_pipelines(pipelines, DaskEngine(client=client))

evalml/tests/automl_tests/dask_tests/test_dask_engine.py:103:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
evalml/tests/automl_tests/dask_tests/test_dask_engine.py:94: in fit_pipelines
    results = [f.get_result() for f in futures]
evalml/tests/automl_tests/dask_tests/test_dask_engine.py:94: in <listcomp>
    results = [f.get_result() for f in futures]
evalml/automl/engine/dask_engine.py:30: in get_result
    return self.work.result()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Future: cancelled, key: train_pipeline-4bd4a99325cd3cc91144f86b64d6503c>
timeout = None

    def result(self, timeout=None):
        """Wait until computation completes, gather result to local process.

        If *timeout* seconds are elapsed before returning, a
        ``dask.distributed.TimeoutError`` is raised.
        """
        if self.client.asynchronous:
            return self.client.sync(self._result, callback_timeout=timeout)

        # shorten error traceback
        result = self.client.sync(self._result, callback_timeout=timeout, raiseit=False)
        if self.status == "error":
            typ, exc, tb = result
            raise exc.with_traceback(tb)
        elif self.status == "cancelled":
>           raise result
E           concurrent.futures._base.CancelledError: train_pipeline-4bd4a99325cd3cc91144f86b64d6503c
```
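The `CancelledError` at the bottom of that trace is the standard `concurrent.futures` contract, which dask's `Future.result` mirrors: calling `result()` on a future that was cancelled before completing raises instead of returning. A minimal stdlib-only illustration (no dask involved):

```python
from concurrent.futures import CancelledError, Future

# A pending future that gets cancelled before it ever runs, analogous to
# what the dask scheduler did to the train_pipeline future in the trace.
future = Future()
assert future.cancel()  # cancel() succeeds while the future is still pending

try:
    future.result()
    outcome = "returned"
except CancelledError:
    outcome = "cancelled"

print(outcome)  # → cancelled
```

So a cancelled dask training job surfaces to the caller as this exception rather than as a training error, which is why the traceback points at `get_result()` instead of at pipeline code.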
This looks like a known dask issue: https://github.com/dask/distributed/issues/4612
Sorry for my delayed post, but here is a red build: https://github.com/alteryx/evalml/actions/runs/939673304 , which appears to have the same stack trace as @freddyaboulton's above.
I don't think this issue blocks things anymore, given [this PR](https://github.com/alteryx/evalml/pull/2376) to separate out the non-dask jobs, this PR to refactor the dask jobs to reduce flakiness, and this PR to keep the separate dask jobs from blocking merges to main.
Moving this to closed, since the dask-related timeouts are no longer an issue and haven't been for a while now. The root cause is still unknown, though.