Scikit-learn: Make sure all attributes are documented

Created on Jul 12, 2019 · 79 comments · Source: scikit-learn/scikit-learn

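As discussed in #13385, we need to make sure all attributes are documented.

If you want to work on this, pick a specific submodule and fix all the attribute documentation mismatches in that submodule.

Here is a script to find the remaining ones (there might be false positives):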

import numpy as np
from sklearn.base import clone
# Import paths from the scikit-learn development version of mid-2019; later
# releases moved all_estimators to sklearn.utils and made these helpers private.
from sklearn.utils.testing import all_estimators
from sklearn.utils.estimator_checks import pairwise_estimator_convert_X, enforce_estimator_tags_y
from numpydoc import docscrape

ests = all_estimators()

for name, Est in ests:
    # Skip estimators that cannot be built with default parameters.
    try:
        estimator_orig = Est()
    except Exception:
        continue
    rng = np.random.RandomState(0)
    X = pairwise_estimator_convert_X(rng.rand(40, 10), estimator_orig)
    X = X.astype(object)
    y = (X[:, 0] * 4).astype(int)  # builtin int: np.int was removed in NumPy 1.24
    est = clone(estimator_orig)
    y = enforce_estimator_tags_y(est, y)
    # Skip estimators that cannot be fitted on this toy data.
    try:
        est.fit(X, y)
    except Exception:
        continue
    # Public fitted attributes: trailing underscore, no leading underscore.
    fitted_attrs = [(x, getattr(est, x, None))
                    for x in est.__dict__.keys() if x.endswith("_")
                    and not x.startswith("_")]
    doc = docscrape.ClassDoc(type(est))
    doc_attributes = []
    incorrect = []
    for att_name, type_definition, param_doc in doc['Attributes']:
        if not type_definition.strip():
            if ':' in att_name and att_name[:att_name.index(':')][-1:].strip():
                incorrect += [name +
                              ' There was no space between the param name and '
                              'colon (%r)' % att_name]
            elif att_name.rstrip().endswith(':'):
                incorrect += [name +
                              ' Parameter %r has an empty type spec. '
                              'Remove the colon' % (att_name.lstrip())]

        if '*' not in att_name:
            doc_attributes.append(att_name.split(':')[0].strip('` '))
    assert incorrect == []
    fitted_attrs_names = [x[0] for x in fitted_attrs]

    # Symmetric difference: fitted but undocumented, or documented but never set.
    bad = sorted(set(fitted_attrs_names) ^ set(doc_attributes))
    if len(bad) > 0:
        msg = '{}\n'.format(name) + '\n'.join(bad)
        print("Docstring Error: Attribute mismatch in " + msg)
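The heart of the script above is a symmetric difference between the public fitted attributes (instance attributes ending in an underscore that do not start with one) and the names listed in the docstring's Attributes section. The following is a minimal, dependency-free sketch of just that comparison. It assumes a numpydoc-style docstring with an underlined Attributes header and "name : type" entries, replaces numpydoc's parser with a small hand-written one, and uses a hypothetical ToyEstimator; it does not instantiate or fit real scikit-learn estimators.

```python
def documented_attributes(docstring):
    """Return the entry names listed under a numpydoc 'Attributes' header.

    Assumes the header line is followed by a dashed underline and that each
    entry starts with "name : type" at the header's indentation level;
    description lines are indented further and are skipped.
    """
    lines = (docstring or "").expandtabs().splitlines()
    names = []
    in_section = False
    indent = 0
    i = 0
    while i < len(lines):
        line = lines[i]
        stripped = line.strip()
        next_line = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if stripped and next_line and set(next_line) == {"-"}:
            # A section header: only the 'Attributes' section is collected.
            in_section = stripped == "Attributes"
            indent = len(line) - len(line.lstrip())
            i += 2
            continue
        if in_section and stripped and len(line) - len(line.lstrip()) == indent:
            names.append(stripped.split(":")[0].strip("` "))
        i += 1
    return names


def public_fitted_attributes(obj):
    """Instance attributes ending in '_' that do not start with '_'."""
    return [k for k in vars(obj) if k.endswith("_") and not k.startswith("_")]


def attribute_mismatch(obj):
    """Names set on obj but undocumented, or documented but never set."""
    fitted = set(public_fitted_attributes(obj))
    documented = set(documented_attributes(type(obj).__doc__))
    return sorted(fitted ^ documented)


class ToyEstimator:
    """Hypothetical estimator used only to exercise the check.

    Attributes
    ----------
    coef_ : list of float
        Fitted coefficients.

    n_iter_ : int
        Number of iterations run.
    """

    def fit(self, X, y):
        self.coef_ = [0.0] * len(X[0])
        self.classes_ = sorted(set(y))  # set in fit but not documented
        self._cache_ = None             # ignored: starts with an underscore
        return self


est = ToyEstimator().fit([[1.0, 2.0], [3.0, 4.0]], [0, 1])
print(attribute_mismatch(est))  # ['classes_', 'n_iter_']
```

Here classes_ is reported because it is set in fit but not documented, and n_iter_ because it is documented but never set; both directions show up in the script's "Attribute mismatch" output in the same way.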

Labels: Documentation, Easy, good first issue, help wanted

amueller

All 79 comments

I've already found at least one mismatch in the attribute documentation of the NMF class. I think I can take some of this work; I'm almost ready to propose some changes in the decomposition and random_projection submodules.

alexitkes Jul 12, 2019

Missing attribute docstrings for each estimator

Please reference this issue in your PR

[x] ARDRegression, [intercept_]
[x] AdaBoostClassifier, [base_estimator_]
[x] AdaBoostRegressor, [base_estimator_]
[x] AdditiveChi2Sampler, [sample_interval_]
[x] AgglomerativeClustering, [n_components_] (deprecated)
[x] BaggingClassifier, [n_features_]
[x] BaggingRegressor, [base_estimator_, n_features_]
[x] BayesianGaussianMixture, [mean_precision_prior, mean_precision_prior_]
[x] BayesianRidge, [X_offset_, X_scale_]
[x] BernoulliNB, [coef_, intercept_]
[x] BernoulliRBM, [h_samples_]
[ ] Birch, [fit_, partial_fit_]
[ ] CCA, [coef_, x_mean_, x_std_, y_mean_, y_std_]
[x] CheckingClassifier, [classes_]
[x] ComplementNB, [coef_, intercept_]
[x] CountVectorizer, [stop_words_, vocabulary_]
[ ] DecisionTreeRegressor, [classes_, n_classes_]
[x] DictVectorizer, [feature_names_, vocabulary_]
[ ] DummyClassifier, [output_2d_]
[ ] DummyRegressor, [output_2d_]
[ ] ElasticNet, [dual_gap_]
[ ] ElasticNetCV, [dual_gap_]
[ ] EllipticEnvelope, [dist_, raw_covariance_, raw_location_, raw_support_]
[x] ExtraTreeClassifier, [feature_importances_]
[ ] ExtraTreeRegressor, [classes_, feature_importances_, n_classes_]
[x] ExtraTreesClassifier, [base_estimator_]
[x] ExtraTreesRegressor, [base_estimator_]
[x] FactorAnalysis, [mean_]
[ ] FeatureAgglomeration, [n_components_]
[x] GaussianProcessClassifier, [base_estimator_]
[x] GaussianRandomProjection, [components_]
[x] GradientBoostingClassifier, [max_features_, n_classes_, n_features_, oob_improvement_]
[x] GradientBoostingRegressor, [max_features_, n_classes_, n_estimators_, n_features_, oob_improvement_]
[x] HistGradientBoostingClassifier, [bin_mapper_, classes_, do_early_stopping_, loss_, n_features_, scorer_]
[x] HistGradientBoostingRegressor, [bin_mapper_, do_early_stopping_, loss_, n_features_, scorer_]
[x] IncrementalPCA, [batch_size_]
[x] IsolationForest, [base_estimator_, estimators_features_, n_features_]
[x] IsotonicRegression, [X_max_, X_min_, f_]
[x] IterativeImputer, [random_state_]
[x] KNeighborsClassifier, [classes_, effective_metric_, effective_metric_params_, outputs_2d_]
[x] KNeighborsRegressor, [effective_metric_, effective_metric_params_]
[x] KernelCenterer, [K_fit_all_, K_fit_rows_]
[x] KernelDensity, [tree_]
[x] KernelPCA, [X_transformed_fit_, dual_coef_]
[x] LabelBinarizer, [classes_, sparse_input_, y_type_]
[x] LabelEncoder, [classes_]
[x] LarsCV, [active_]
[x] Lasso, [dual_gap_]
[x] LassoLarsCV, [active_]
[x] LassoLarsIC, [alphas_]
[x] LatentDirichletAllocation, [bound_, doc_topic_prior_, exp_dirichlet_component_, random_state_, topic_word_prior_]
[x] LinearDiscriminantAnalysis, [covariance_]
[x] LinearRegression, [rank_, singular_]
[x] LinearSVC, [classes_]
[x] LocalOutlierFactor, [effective_metric_, effective_metric_params_]
[x] MDS, [dissimilarity_matrix_, n_iter_]
[x] MLPClassifier, [best_loss_, loss_curve_, t_]
[x] MLPRegressor, [best_loss_, loss_curve_, t_]
[x] MinMaxScaler, [n_samples_seen_]
[x] MiniBatchDictionaryLearning, [iter_offset_]
[x] MiniBatchKMeans, [counts_, init_size_, n_iter_]
[x] MultiLabelBinarizer, [classes_]
[x] MultiTaskElasticNet, [dual_gap_, eps_, sparse_coef_]
[x] MultiTaskElasticNetCV, [dual_gap_]
[x] MultiTaskLasso, [dual_gap_, eps_, sparse_coef_]
[x] MultiTaskLassoCV, [dual_gap_]
[x] NearestCentroid, [classes_]
[x] NearestNeighbors, [effective_metric_, effective_metric_params_]
[x] NeighborhoodComponentsAnalysis, [random_state_]
[x] NuSVC, [class_weight_, fit_status_, probA_, probB_, shape_fit_]
[ ] NuSVR, [class_weight_, fit_status_, n_support_, probA_, probB_, shape_fit_]
[x] OAS, [location_]
[ ] OneClassSVM, [class_weight_, fit_status_, n_support_, probA_, probB_, shape_fit_]
[x] OneVsOneClassifier, [n_classes_]
[x] OneVsRestClassifier, [coef_, intercept_, n_classes_]
[x] OrthogonalMatchingPursuit, [n_nonzero_coefs_]
[ ] PLSCanonical, [coef_, x_mean_, x_std_, y_mean_, y_std_]
[x] PLSRegression, [x_mean_, x_std_, y_mean_, y_std_]
[ ] PLSSVD, [x_mean_, x_std_, y_mean_, y_std_]
[x] PassiveAggressiveClassifier, [loss_function_, t_]
[x] PassiveAggressiveRegressor, [t_]
[x] Perceptron, [loss_function_]
[x] QuadraticDiscriminantAnalysis, [classes_, covariance_]
[x] RBFSampler, [random_offset_, random_weights_]
[ ] RFE, [classes_]
[ ] RFECV, [classes_]
[x] RadiusNeighborsClassifier, [classes_, effective_metric_, effective_metric_params_, outputs_2d_]
[x] RadiusNeighborsRegressor, [effective_metric_, effective_metric_params_]
[x] RandomForestClassifier, [oob_decision_function_, oob_score_]
[x] RandomForestRegressor, [oob_prediction_, oob_score_]
[x] RandomTreesEmbedding, [base_estimator_, feature_importances_, n_features_, n_outputs_, one_hot_encoder_]
[x] RidgeCV, [cv_values_]
[x] RidgeClassifier, [classes_]
[x] RidgeClassifierCV, [cv_values_]
[x] SGDClassifier, [classes_, t_]
[x] SGDRegressor, [average_coef_, average_intercept_]
[x] SVC, [class_weight_, shape_fit_]
[ ] SVR, [class_weight_, fit_status_, n_support_, probA_, probB_, shape_fit_]
[x] SelectKBest, [pvalues_, scores_]
[x] ShrunkCovariance, [shrinkage]
[x] SkewedChi2Sampler, [random_offset_, random_weights_]
[x] SparseRandomProjection, [components_, density_]
[x] SpectralEmbedding, [n_neighbors_]
[x] TfidfVectorizer, [stop_words_, vocabulary_]

thomasjpfan Jul 13, 2019

I can take the attribute documentation mismatches in the tree submodule. These include:

DecisionTreeRegressor, [classes_, n_classes_]
ExtraTreeClassifier, [classes_, max_features_, n_classes_, n_features_, n_outputs_, tree_]
ExtraTreeRegressor, [classes_, max_features_, n_classes_, n_features_, n_outputs_, tree_]

mepa Jul 13, 2019

I'm working on LinearRegression, [rank_, singular_].

wendyhhu Jul 13, 2019

I'm working on LinearSVC [n_iter_] and LinearSVR [n_iter_].

wendyhhu Jul 13, 2019

I'll take gradient boosting:

GradientBoostingClassifier [base_estimator_, max_features_, n_classes_, n_features_]
GradientBoostingRegressor [base_estimator_, classes_, max_features_, n_estimators_, n_features_]

matsmaiwald Jul 14, 2019

Never mind, I misread which attributes were missing and which were not.

matsmaiwald Jul 14, 2019

It seems the classifiers in the naive_bayes submodule also have an undocumented classes_ attribute. I've started fixing that.

alexitkes Jul 14, 2019

I'll work on TfidfVectorizer, [fixed_vocabulary_]

mandalbiswadip Jul 14, 2019

I'll work on:

RandomForestClassifier, [base_estimator_]
RandomForestRegressor, [base_estimator_, n_classes_]
ExtraTreesClassifier, [base_estimator_]
ExtraTreesRegressor, [base_estimator_, n_classes_]

rcwoolston Jul 14, 2019

I'm working on:

SGDClassifier, [average_coef_, average_intercept_, standard_coef_, standard_intercept_]
SGDRegressor, [standard_coef_, standard_intercept_]

Edit: I opened an issue about changing these attributes from public to private (see #14364).

wendyhhu Jul 14, 2019

I'm working on:
KernelCenterer, [K_fit_all_, K_fit_rows_]
MinMaxScaler, [n_samples_seen_]

SwordKnight6216 Jul 14, 2019

I'll work on:

RandomTreesEmbedding, [base_estimator_, classes_, feature_importances_, n_classes_, n_features_, n_outputs_, one_hot_encoder_]

rcwoolston Jul 14, 2019

I also found that KNeighborsClassifier, KNeighborsRegressor, and probably other classes in the neighbors module have no attribute documentation at all. I'm currently working on KNeighborsRegressor, which has two attributes:

effective_metric_
effective_metric_params_

The KNeighborsClassifier class has the following four attributes:

classes_
effective_metric_
effective_metric_params_
outputs_2d_

alexitkes Jul 14, 2019

@alexitkes Good catch. Thanks!

amueller Jul 14, 2019

I'm working on QuadraticDiscriminantAnalysis, [classes_, covariance_]

abhishek-jana Jul 15, 2019

Working on KNeighborsClassifier, [classes_, effective_metric_, effective_metric_params_, outputs_2d_]
RadiusNeighborsClassifier, [classes_, effective_metric_, effective_metric_params_, outputs_2d_]

abhishek-jana Jul 15, 2019

Working on:
LinearSVC, [classes_]
NuSVC, [class_weight_, classes_, fit_status_, probA_, probB_, shape_fit_]
SVC, [class_weight_, classes_, shape_fit_]

kwinata Jul 20, 2019

Working on:

[ ] BaggingClassifier, [n_features_, oob_decision_function_, oob_score_]
[ ] BaggingRegressor, [base_estimator_, n_features_, oob_prediction_, oob_score_]
[ ] AdaBoostClassifier, [base_estimator_]
[ ] AdaBoostRegressor, [base_estimator_]

mdomarsaleem Jul 25, 2019

Working on:

CountVectorizer, [stop_words_, vocabulary_]
DictVectorizer, [feature_names_, vocabulary_]

pvsagar Jul 25, 2019

Hi! I'd like to help with this. Could someone please tell me where to start?

ManishAradwad Aug 5, 2019

Working on the functions in dict_learning.py with @spbail

hannahbrucemacdonald Aug 24, 2019

Working on LinearDiscriminantAnalysis with @olgadk7

m-clare Aug 24, 2019

Working on the attribute mismatches in RidgeClassifierCV with @npatta01

ingrid88 Aug 24, 2019

@ingrid88 + @npatta01

meiguan Aug 24, 2019

> Working on LinearDiscriminantAnalysis with @olgadk7

False positive from the attribute script above; this one is documented.

m-clare Aug 24, 2019

Working on AdditiveChi2Sampler with @olgadk7

m-clare Aug 24, 2019

Working on LabelEncoder with @eugeniaft

FranciDona Aug 24, 2019

I'll try to work on randomtreeclassifier!

joyharjanto Aug 24, 2019

Working on Perceptron

joanaz Aug 24, 2019

Working on BernoulliRBM

npatta01 Aug 24, 2019

Working on ExtraTreeClassifier

she-dares Aug 24, 2019

> Working on LabelEncoder with @eugeniaft

LabelEncoder doesn't seem to have any mismatch. I'm now working on OneClassSVM.

FranciDona Aug 24, 2019

I think the tree regressors should deprecate classes_ instead.

amueller Aug 24, 2019

Working on SVR

ingrid88 Aug 24, 2019

Working on:

OneVsOneClassifier, [n_classes_]
OneVsRestClassifier, [coef_, intercept_, n_classes_]

YuliaZamriy Aug 24, 2019

Working on LinearRegression, [rank_, singular_]

arzoobh Aug 24, 2019

Working on LatentDirichletAllocation, [bound_, doc_topic_prior_, exp_dirichlet_component_, random_state_, topic_word_prior_]

YuliaZamriy Aug 24, 2019

Working on
BaggingClassifier, [n_features_, oob_decision_function_, oob_score_]
BaggingRegressor, [base_estimator_, n_features_, oob_prediction_, oob_score_]

joanaz Aug 24, 2019

> BaggingClassifier, [n_features_, oob_decision_function_, oob_score_]
> BaggingRegressor, [base_estimator_, n_features_, oob_prediction_, oob_score_]

The oob_ attributes are addressed in PR #14779, and n_features_ and base_estimator_ are false positives.

joanaz Aug 24, 2019

Working on
AdaBoostClassifier, [base_estimator_]

Update: already fixed in https://github.com/scikit-learn/scikit-learn/pull/14477

nitya Aug 24, 2019

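I don't think we should recommend this issue at the next sprints, or else we should use a more curated version of it.

Based on my experience at previous sprints, there are still many false positives, and we end up asking contributors to deprecate public attributes and make them private. That is definitely much harder (and contributors can get frustrated because they feel they're working for nothing).

Ping @amueller @thomasjpfan WDYT?

NicolasHug Sep 3, 2019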
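Deprecating a public attribute and making it private, as described above, usually means storing the value under a private name and keeping the old public name as a read-only property that warns on access for one deprecation cycle. A minimal stdlib-only sketch of that pattern, using a hypothetical HypotheticalEstimator and a plain FutureWarning rather than scikit-learn's own deprecation helpers:

```python
import warnings


class HypotheticalEstimator:
    """Hypothetical estimator whose public output_2d_ is being made private."""

    def fit(self, X, y):
        # New code keeps the value under a private name ...
        self._output_2d = bool(y) and isinstance(y[0], (list, tuple))
        return self

    @property
    def output_2d_(self):
        # ... while the old public name survives for a deprecation cycle
        # as a read-only alias that emits a FutureWarning when accessed.
        warnings.warn(
            "The output_2d_ attribute is deprecated and will be removed "
            "in a future release.",
            FutureWarning,
        )
        return self._output_2d


est = HypotheticalEstimator().fit([[0.0], [1.0]], [0, 1])
print("output_2d_" in vars(est))  # False: the property is not stored on the instance
```

Because the property lives on the class rather than in the instance __dict__, a check that scans est.__dict__, like the script at the top of this issue, no longer reports output_2d_ once the value has moved to a private name.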

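> I don't think we should recommend this issue at the next sprints, or else we should use a more curated version of it.

It would be a bit easier for contributors if we had a general validation tool for docstrings, as proposed in https://github.com/numpy/numpydoc/issues/213. I agree that it doesn't fully address the fact that some attributes are public when they shouldn't be.

rth Oct 7, 2019

TfidfVectorizer, SpectralEmbedding, and SparseRandomProjection will be updated.

zioalex Oct 12, 2019

I was going to take this as my first issue, but after randomly picking submodules listed by the script, the only incorrectly documented classes I found are the PLS* classes. However, they live in the _pls_.py file and seem to be private. Should I work on them, or find another good first issue?

panpiort8 Oct 24, 2019

They qualify as long as the actual classes are public. Public classes are listed in doc/modules/classes.rst. The PLS* classes are there, so feel free to document them.

NicolasHug Oct 24, 2019

Would it also make sense to sort all attributes alphabetically? I think it would give the section some structure and make it easier to read.

pwalchessen Nov 2, 2019

@pwalchessen Agreed, sounds like a good idea. As mentioned in person, I'd also add that to the test.

amueller Nov 2, 2019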
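An alphabetical-order check along the lines suggested above could be added to a docstring test. A minimal sketch, assuming numpydoc-style docstrings whose Attributes entries start with "name : type" at the section's indentation; the function names attribute_names and unsorted_attributes are hypothetical and not part of scikit-learn's test suite:

```python
import inspect
import re


def attribute_names(docstring):
    """Entry names from a numpydoc 'Attributes' section, in docstring order."""
    text = inspect.cleandoc(docstring or "")
    # The section runs from its dashed underline to the next
    # "Header" + dashed-underline pair, or to the end of the docstring.
    match = re.search(
        r"^Attributes\n-+\n(.*?)(?=^\S[^\n]*\n-+\n|\Z)",
        text,
        flags=re.MULTILINE | re.DOTALL,
    )
    if match is None:
        return []
    # After cleandoc, entries ("name : type") start at column 0 and
    # their descriptions are indented, so keep only unindented lines.
    return [
        line.split(":")[0].strip("` ")
        for line in match.group(1).splitlines()
        if line and not line[0].isspace()
    ]


def unsorted_attributes(docstring):
    """Return the documented names if they are out of order, else []."""
    names = attribute_names(docstring)
    return [] if names == sorted(names) else names


doc = """Toy estimator.

    Attributes
    ----------
    n_iter_ : int
        Number of iterations.
    coef_ : list of float
        Coefficients.
    """
print(unsorted_attributes(doc))  # ['n_iter_', 'coef_'] (not alphabetical)
```

sorted() compares code points, so uppercase names such as X_offset_ sort before lowercase ones; sorting with key=str.lower would give a case-insensitive order instead, which is a style decision for the project rather than something the check can settle.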

These still seem to be open and fairly obvious:

Docstring Error: Attribute mismatch in RidgeCV
cv_values_
Docstring Error: Attribute mismatch in RidgeClassifier
classes_
Docstring Error: Attribute mismatch in RidgeClassifierCV
classes_
cv_values_
Docstring Error: Attribute mismatch in SkewedChi2Sampler
random_offset_
random_weights_
Docstring Error: Attribute mismatch in PLSCanonical
coef_
x_mean_
x_std_
y_mean_
y_std_
Docstring Error: Attribute mismatch in PLSRegression
x_mean_
x_std_
y_mean_
y_std_
Docstring Error: Attribute mismatch in PLSSVD
x_mean_
x_std_
y_mean_
y_std_
Docstring Error: Attribute mismatch in PassiveAggressiveClassifier
loss_function_
Docstring Error: Attribute mismatch in Perceptron
loss_function_
Docstring Error: Attribute mismatch in PolynomialFeatures
powers_
Docstring Error: Attribute mismatch in QuadraticDiscriminantAnalysis
covariance_
Docstring Error: Attribute mismatch in RBFSampler
random_offset_
random_weights_
Docstring Error: Attribute mismatch in RadiusNeighborsClassifier
n_samples_fit_
outlier_label_
Docstring Error: Attribute mismatch in RadiusNeighborsRegressor
n_samples_fit_
Docstring Error: Attribute mismatch in RadiusNeighborsTransformer
effective_metric_
effective_metric_params_
n_samples_fit_
Docstring Error: Attribute mismatch in ElasticNet
dual_gap_
sparse_coef_
Docstring Error: Attribute mismatch in ElasticNetCV
dual_gap_
Docstring Error: Attribute mismatch in EllipticEnvelope
dist_
raw_covariance_
raw_location_
raw_support_

And many more...

amueller Nov 2, 2019

I updated the list of open attributes that still need to be added.

[ ] BayesianGaussianMixture
- [x] mean_precision_prior
- [ ] mean_precision_prior_
[ ] BayesianRidge
- [ ] X_offset_
- [ ] X_scale_
[ ] BernoulliNB
- [ ] coef_ array
- [ ] intercept_
[ ] Birch
- [ ] fit_
- [ ] partial_fit_
[ ] CCA
- [ ] coef_: array, shape (1, n_features) or (n_classes, n_features); coefficients of the features in the decision function.
- [ ] x_mean_: array, shape (n_features,); mean of the features.
- [ ] x_std_
- [ ] y_mean_
- [ ] y_std_
[x] CategoricalNB
- [x] classes_: array, shape (n_classes,); list of the class labels known to the classifier.
[ ] ComplementNB
- [ ] coef_: array, shape (1, n_features) or (n_classes, n_features); coefficients of the features in the decision function.
- [ ] intercept_
[x] CountVectorizer
- [x] stop_words_
- [x] vocabulary_
[x] DecisionTreeClassifier
- [x] feature_importances_
[ ] DecisionTreeRegressor
- [ ] classes_: array-like of shape (n_classes,); unique class labels
- [ ] n_classes_: int; number of unique class labels
- [x] feature_importances_
[ ] DictVectorizer
- [ ] feature_names_
- [ ] vocabulary_
[ ] DummyClassifier
- [ ] output_2d_
[ ] DummyRegressor
- [ ] output_2d_
[ ] ElasticNet
- [ ] dual_gap_
- [ ] sparse_coef_
[ ] ElasticNetCV
- [ ] dual_gap_
[ ] EllipticEnvelope
- [ ] dist_
- [ ] raw_covariance_
- [ ] raw_location_
- [ ] raw_support_
[ ] ExtraTreeClassifier
- [ ] feature_importances_
[ ] ExtraTreeRegressor
- [ ] classes_: array-like of shape (n_classes,); unique class labels
- [ ] feature_importances_
- [ ] n_classes_: int; number of unique class labels
[ ] FeatureAgglomeration
- [ ] n_components_
- [x] distances_
[ ] GaussianProcessClassifier
- [ ] base_estimator_
- [x] kernel_
[x] GaussianRandomProjection
- [x] components_
[ ] GradientBoostingClassifier
- [ ] max_features_
- [ ] n_classes_: int; number of unique classes.
- [ ] n_features_: int; number of features used.
- [x] oob_improvement_
- [x] feature_importances_
[ ] GradientBoostingRegressor
- [ ] max_features_
- [ ] n_classes_: int; number of unique classes.
- [ ] n_estimators_
- [ ] n_features_: int; number of features used.
- [x] oob_improvement_
- [x] feature_importances_
[ ] HistGradientBoostingClassifier
- [ ] bin_mapper_
- [ ] classes_
- [ ] do_early_stopping_
- [ ] loss_
- [ ] n_features_: int; number of selected features.
- [x] n_iter_
- [ ] scorer_
[ ] HistGradientBoostingRegressor
- [ ] bin_mapper_
- [ ] do_early_stopping_
- [ ] loss_
- [ ] n_features_: int; number of selected features.
- [ ]
- [ ] scorer_
[ ] IncrementalPCA
- [ ] batch_size_
[ ] IsolationForest
- [ ] base_estimator_
- [ ] estimators_features_
- [x] estimators_samples_
- [ ] n_features_: int; number of selected features.
[ ] KernelCenterer
- [ ] K_fit_all_
- [ ] K_fit_rows_
[ ] KernelDensity
- [ ] tree_
[ ] LarsCV
- [ ] active_
[ ] Lasso
- [ ] dual_gap_
- [x] sparse_coef_
[ ] LassoLarsCV
- [ ] active_
[ ] LassoLarsIC
- [ ] alphas_
[ ] LatentDirichletAllocation
- [x] bound_
- [x] doc_topic_prior_
- [ ] exp_dirichlet_component_
- [ ] random_state_
[ ] LocalOutlierFactor
- [ ] effective_metric_
- [ ] effective_metric_params_
- [ ] n_samples_fit_: int; number of samples in the fitted data.
[ ] MDS
- [ ] dissimilarity_matrix_
- [ ] n_iter_: int; number of iterations.
[ ] MLPClassifier
- [ ] best_loss_
- [ ] loss_curve_
- [ ] t_
[ ] MLPRegressor
- [ ] best_loss_
- [ ] loss_curve_
- [ ] t_
[ ] MiniBatchKMeans
- [ ] counts_
- [ ] init_size_
- [ ] n_iter_: int; number of iterations.
[ ] MultiTaskElasticNet
- [ ] dual_gap_
- [ ] eps_
- [ ] sparse_coef_
[ ] MultiTaskElasticNetCV
- [ ] dual_gap_
[ ] MultiTaskLasso
- [ ] dual_gap_
- [ ] eps_
- [ ] sparse_coef_
[ ] MultiTaskLassoCV
- [ ] dual_gap_
[ ] OAS
- [ ] location_
[ ] OneVsRestClassifier
- [ ] coef_: array, shape (1, n_features) or (n_classes, n_features); coefficients of the features in the decision function.
- [ ] intercept_
- [ ] n_classes_: int; number of unique classes.
[ ] OrthogonalMatchingPursuit
- [ ] n_nonzero_coefs_
[ ] PLSCanonical
- [ ] coef_: array, shape (1, n_features) or (n_classes, n_features); coefficients of the features in the decision function.
- [ ] x_mean_: float ???; mean of
- [ ] x_std_
- [ ] y_mean_
- [ ] y_std_
[ ] PLSRegression
- [ ] x_mean_
- [ ] x_std_
- [ ] y_mean_
- [ ] y_std_
[ ] PLSSVD
- [ ] x_mean_
- [ ] x_std_
- [ ] y_mean_
- [ ] y_std_
[ ] PassiveAggressiveClassifier
- [ ] loss_function_
[ ] RBFSampler
- [ ] random_offset_
- [ ] random_weights_
[ ] ShrunkCovariance
- [ ] shrinkage
[ ] SkewedChi2Sampler
- [ ] random_offset_
- [ ] random_weights_
[ ] _BaseRidgeCV
- [ ] alpha_
- [ ] coef_
- [ ] intercept_
[ ] _ConstantPredictor
- [ ] y_
[ ] _RidgeGCV
- [ ] alpha_
- [ ] coef_
- [ ] dual_coef_
- [ ] intercept_

jigna-panchal Nov 3, 2019

I'll add feature_importances_ to the ExtraTreeRegressor docs.

marielledado Jan 25, 2020

A group of data science majors and I will start writing the documentation for the BayesianRidge, [X_offset_, X_scale_] attributes.

juliapiscioniere Jan 28, 2020

Hi, our group of contributors is working on the following:

PLSSVD
CCA
IncrementalPCA
MiniBatchKMeans
Lasso

phork37 Feb 20, 2020

Potential fix in #16826

BenjaminLiuPenrose Apr 2, 2020

A test was added in #16286.
Currently, some classes are skipped:
https://github.com/scikit-learn/scikit-learn/blob/753da1de06a764f264c3f5f4817c9190dbe5e021/sklearn/tests/test_docstring_parameters.py#L180

Some of these already have PRs, so make sure to check for those before you start working on one.

amueller May 29, 2020

> Some of these already have PRs, so make sure to check for those before you start working on one.

Looking at open PRs that haven't been merged and trying to finish them is also a good option.

As a rule of thumb, if a PR hasn't had any activity for more than 2-3 weeks, it's fine to take it over and try to finish it.

lesteve Jun 23, 2020

If you're interested in that kind of solution, one option is a Sphinx extension that checks whether all parameters are documented and not misspelled (there is an example at https://github.com/sdpython/pyquickhelper/blob/master/src/pyquickhelper/sphinxext/sphinx_docassert_extension.py). It might be useful to add a custom one to the scikit-learn documentation.

sdpython Jun 23, 2020

@sdpython, that would be great! If no one else is working on it, could you propose a draft PR? Thanks!

cmarmo Jun 23, 2020

Interesting!

IIRC there is a common test that checks that all attributes are documented. It was added in https://github.com/scikit-learn/scikit-learn/pull/16286

I don't have an informed opinion on which approach is preferable, but I think documenting the missing parameters probably has a higher priority than deciding how to do the check.

lesteve Jun 23, 2020

The problem with doing this in Sphinx is that building the docs takes a long time here (because all the examples are generated), so a unit test or a standalone tool would be easier to use. Note that we previously used numpydoc validation in https://github.com/scikit-learn/scikit-learn/issues/15440, and that docstring validation against type annotations can be done with https://github.com/terrencepreilly/darglint. So we should also avoid ending up with 5 different validation tools for docstrings :)

rth Jun 23, 2020

For example, I like being able to check the results with pytest:

pytest -v --runxfail -k IsolationForest sklearn/tests/test_docstring_parameters.py

So maybe we don't need to change the Sphinx build for this.

ogrisel Jun 23, 2020

I checked which attribute docstrings are still missing (the list above is outdated). These are the ones I found:

BayesianGaussianMixture, [mean_precision_prior]
BayesianRidge, [X_offset_, X_scale_]
BernoulliNB, [coef_, intercept_]
Birch, [fit_, partial_fit_]
CCA, [x_mean_, x_std_, y_mean_, y_std_]
DecisionTreeRegressor, [classes_, n_classes_]
DummyClassifier, [output_2d_]
DummyRegressor, [output_2d_]
ElasticNet, [dual_gap_]
ElasticNetCV, [dual_gap_]
ExtraTreeRegressor, [classes_, n_classes_]
FeatureAgglomeration, [n_components_]
LarsCV, [active_]
Lasso, [dual_gap_]
LassoLarsCV, [active_]
LassoLarsIC, [alphas_]
MiniBatchKMeans, [counts_, init_size_, n_iter_]
MultiTaskElasticNet, [dual_gap_, eps_, sparse_coef_]
MultiTaskElasticNetCV, [dual_gap_]
MultiTaskLasso, [dual_gap_, eps_, sparse_coef_]
MultiTaskLassoCV, [dual_gap_]
NuSVR, [probA_, probB_]
OneClassSVM, [probA_, probB_]
OneVsRestClassifier, [coef_, intercept_]
OrthogonalMatchingPursuit, [n_nonzero_coefs_]
PLSCanonical, [x_mean_, x_std_, y_mean_, y_std_]
PLSSVD, [x_mean_, x_std_, y_mean_, y_std_]
SVR, [probA_, probB_]

marenwestermann Jun 26, 2020

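Thanks @marenwestermann!

cmarmo Jun 26, 2020

I'm working on MiniBatchKMeans.

jeremiedbb Jul 8, 2020

I'm working on Lasso.

marenwestermann Jul 10, 2020

I'm currently working on adding the sparse_coef_ attribute to MultiTaskElasticNet and MultiTaskLasso.

marenwestermann Jul 10, 2020

I'm working on LarsCV.

marenwestermann Jul 10, 2020

@thomasjpfan The classes SVR and OneClassSVM say:
"The probA_ attribute is deprecated in version 0.23 and will be removed in version 0.25." and
"The probB_ attribute is deprecated in version 0.23 and will be removed in version 0.25."

So these attributes probably don't need documentation anymore, right?
Going forward, will these two attributes also be deprecated in the NuSVR class?

marenwestermann Aug 7, 2020

The attributes classes_ and n_classes_ of ExtraTreeRegressor are false positives.

marenwestermann Sep 11, 2020

> So these attributes probably don't need documentation anymore, right?
> Going forward, will these two attributes also be deprecated in the NuSVR class?

Since they are deprecated, I don't think they need to be documented.

> The attributes classes_ and n_classes_ of ExtraTreeRegressor are false positives.

Yup, they were deprecated and should be removed if they haven't been already.

thomasjpfan Sep 11, 2020

The DecisionTreeRegressor class says:
"The n_classes_ attribute is deprecated since version 0.22 and will be removed in 0.24."
"The classes_ attribute is deprecated since version 0.22 and will be removed in 0.24."

So these attributes don't need documentation either?

Abilityguy Sep 16, 2020

> So these attributes don't need documentation either?

Right @Abilityguy, thanks for pointing that out.

cmarmo Sep 16, 2020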

I can see the following mismatches in _RidgeGCV:
Docstring Error: Attribute mismatch in _RidgeGCV
alpha_
best_score_
coef_
dual_coef_
intercept_
n_features_in_

and in _BaseRidgeCV:
Docstring Error: Attribute mismatch in _BaseRidgeCV
alpha_
best_score_
coef_
intercept_
n_features_in_

Can I take this up? I'm a first-timer and would like to contribute.

mynkdsi1011 Sep 24, 2020

@marenwestermann The FeatureAgglomeration class says that n_connected_components_ was added in version 0.21 to replace n_components_, so n_components_ would be a false positive, right...?

srivathsa729 Sep 26, 2020

@srivathsa729 From my understanding, yes. But it would be good if one of the core developers could double-check.

marenwestermann Sep 29, 2020

I'll take ElasticNet.

disha4u Oct 5, 2020

Documentation for the BayesianRidge attributes X_offset_ and X_scale_ was added in #18607.

marenwestermann Nov 4, 2020

The attribute output_2d_ has been deprecated in DummyClassifier and DummyRegressor (see #14933).

marenwestermann Nov 5, 2020

I ran the script provided by @amueller at the top of this issue and found no more attributes that need documenting, except n_features_in_. I don't think this attribute is documented in all the classes where it was introduced. Should it be documented?
ping @NicolasHug

marenwestermann Nov 9, 2020
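If n_features_in_ does get documented everywhere, its entry can stay short, since the attribute only records how many columns fit() saw. A minimal sketch with a hypothetical MeanRegressor (plain Python lists, no scikit-learn import) showing the attribute set in fit, listed under Attributes, and used to validate the input width in predict; the wording of the docstring and error message is illustrative, not scikit-learn's:

```python
class MeanRegressor:
    """Hypothetical regressor that predicts the mean of the training targets.

    Attributes
    ----------
    mean_ : float
        Mean of the targets seen during fit.

    n_features_in_ : int
        Number of features seen during fit.
    """

    def fit(self, X, y):
        # Record the input width so that later calls can be validated.
        self.n_features_in_ = len(X[0])
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        n_features = len(X[0])
        if n_features != self.n_features_in_:
            raise ValueError(
                f"X has {n_features} features, but this estimator was "
                f"fitted with {self.n_features_in_} features."
            )
        return [self.mean_] * len(X)


reg = MeanRegressor().fit([[1, 2, 3], [4, 5, 6]], [1.0, 3.0])
print(reg.n_features_in_, reg.predict([[0, 0, 0]]))  # 3 [2.0]
```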
