Pandas: [Good first issue] TST: Disallow bare pytest.raises

Created on 14 Jan 2020  ·  51 Comments  ·  Source: pandas-dev/pandas

End users rely on error messages when debugging. Thus, it is important that we make sure the correct error message is surfaced for the error triggered.

The core idea is to convert this:

with pytest.raises(klass):
    # Some code that raises an error

To this:

with pytest.raises(klass, match=msg):
    # Some code that raises an error
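
As a minimal illustration of the difference (the halve function and its message are invented for this example; they are not from pandas):

import pytest

def halve(x):
    if x % 2:
        raise ValueError(f"{x} is not divisible by 2")
    return x // 2

def test_halve_error_bare():
    # Bare form: passes for *any* ValueError, even one raised for the
    # wrong reason.
    with pytest.raises(ValueError):
        halve(3)

def test_halve_error_message():
    # Preferred form: also asserts the message the end user will see.
    with pytest.raises(ValueError, match="3 is not divisible by 2"):
        halve(3)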

You can read more about pytest.raises in the pytest documentation.


Side note:

If the raised error is an external error (meaning it is not pandas-specific), you should use external_error_raised instead of pytest.raises.

external_error_raised is used exactly like pytest.raises; the only difference is that you don't pass in the match argument.

For example:

import pandas._testing as tm

def test_foo():
    with tm.external_error_raised(ValueError):
        raise ValueError("foo")

Key notes:

  • Don't forget to link this issue in your PR by pasting https://github.com/pandas-dev/pandas/issues/30999 in the PR description.

  • Please comment with what you are planning to work on, so we won't do double work (no need to mention me; just state which files you plan to work on, and remember to check whether they are already taken).

  • If a file that should be marked as "done" (i.e., there is no more work to do in it) isn't marked as "done", please comment to let me know, mentioning me with @MomIsBestFriend in the comment body so I'll know where to look.


To generate the full list yourself, you can run:

python scripts/validate_unwanted_patterns.py -vt="bare_pytest_raises" pandas/tests/

You can also run it against a single file, like so:

python scripts/validate_unwanted_patterns.py -vt="bare_pytest_raises" pandas/tests/PATH/TO/SPECIFIC/FILE.py

If a file contains a bare pytest.raises, the script will output the following:

pandas/tests/arithmetic/test_numeric.py:553:Bare pytests raise have been found. Please pass in the argument 'match' as well the exception

This means that there is a bare pytest.raises at line 553 of pandas/tests/arithmetic/test_numeric.py.
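
One detail worth knowing when adding match (this is general pytest behavior; the message below is invented): the pattern is applied as a regular expression via re.search, so messages containing regex metacharacters such as parentheses need escaping:

import re

import pytest

def test_match_with_metacharacters():
    # "(" and ")" are regex metacharacters, so escape the literal
    # message before passing it to match.
    msg = re.escape("Length mismatch (expected 4, got 3)")
    with pytest.raises(ValueError, match=msg):
        raise ValueError("Length mismatch (expected 4, got 3)")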


The current list is:

  • [x] pandas/tests/io/pytables/test_timezones.py
  • [ ] pandas/tests/generic/methods/test_pipe.py
  • [ ] pandas/tests/reshape/merge/test_merge_asof.py
  • [ ] pandas/tests/extension/base/reduce.py
  • [x] pandas/tests/arrays/test_datetimelike.py
  • [ ] pandas/tests/extension/test_boolean.py
  • [ ] pandas/tests/extension/base/getitem.py
  • [ ] pandas/tests/arrays/boolean/test_arithmetic.py
  • [ ] pandas/tests/extension/base/setitem.py
  • [ ] pandas/tests/indexes/interval/test_astype.py
  • [ ] pandas/tests/io/parser/test_network.py
  • [ ] pandas/tests/extension/test_integer.py
  • [ ] pandas/tests/indexing/multiindex/test_partial.py
  • [ ] pandas/tests/io/parser/test_python_parser_only.py
  • [ ] pandas/tests/io/test_html.py
  • [ ] pandas/tests/reductions/test_stat_reductions.py
  • [ ] pandas/tests/dtypes/test_inference.py
  • [ ] pandas/tests/plotting/test_hist_method.py
  • [ ] pandas/tests/series/apply/test_series_apply.py
  • [ ] pandas/tests/io/excel/test_xlrd.py
  • [ ] pandas/tests/indexes/test_common.py
  • [ ] pandas/tests/util/test_assert_series_equal.py
  • [ ] pandas/tests/extension/base/ops.py
  • [ ] pandas/tests/io/test_clipboard.py
  • [ ] pandas/tests/plotting/frame/test_frame_color.py
  • [ ] pandas/tests/window/moments/test_moments_ewm.py
  • [ ] pandas/tests/io/test_gbq.py
  • [ ] pandas/tests/reductions/test_reductions.py
  • [ ] pandas/tests/io/test_feather.py
  • [ ] pandas/tests/resample/test_resampler_grouper.py
  • [ ] pandas/tests/indexes/multi/test_indexing.py
  • [ ] pandas/tests/io/test_common.py
  • [ ] pandas/tests/io/test_sql.py
  • [ ] pandas/tests/plotting/test_series.py
  • [ ] pandas/tests/io/test_fsspec.py
  • [ ] pandas/tests/extension/test_floating.py
  • [ ] pandas/tests/indexes/multi/test_setops.py
  • [ ] pandas/tests/reshape/test_get_dummies.py
  • [ ] pandas/tests/plotting/frame/test_frame_subplots.py
  • [ ] pandas/tests/plotting/test_backend.py
  • [ ] pandas/tests/generic/methods/test_sample.py
  • [ ] pandas/tests/plotting/test_boxplot_method.py
  • [ ] pandas/tests/io/test_parquet.py
  • [ ] pandas/tests/extension/test_string.py
  • [ ] pandas/tests/io/pytables/test_complex.py
  • [ ] pandas/tests/indexes/test_numpy_compat.py
  • [ ] pandas/tests/io/test_gcs.py
  • [ ] pandas/tests/io/sas/test_sas7bdat.py
  • [ ] pandas/tests/window/test_apply.py
  • [ ] pandas/tests/series/test_ufunc.py
  • [ ] pandas/tests/plotting/frame/test_frame.py
  • [ ] pandas/tests/reshape/test_union_categoricals.py
  • [ ] pandas/tests/io/json/test_ujson.py
  • [ ] pandas/tests/indexing/test_coercion.py
  • [ ] pandas/tests/io/pytables/test_store.py
  • [ ] pandas/tests/computation/test_compat.py
  • [ ] pandas/tests/io/json/test_pandas.py
  • [ ] pandas/tests/io/json/test_json_table_schema.py
  • [ ] pandas/tests/scalar/test_nat.py

NOTE:

The list may change, as files are constantly being moved/renamed.


Took pretty much everything from #23922, which was originally opened by @gfyoung.

Labels: Style, Testing, good first issue

All 51 comments

I'll take:

  • [x] pandas/tests/test_common.py
  • [x] pandas/tests/test_downstream.py
  • [x] pandas/tests/test_errors.py
  • [x] pandas/tests/test_lib.py
  • [x] pandas/tests/test_take.py
  • [x] pandas/tests/internals/test_internals.py
  • [x] pandas/tests/window/test_rolling.py

I will begin working on:

pandas/tests/arithmetic/test_numeric.py
pandas/tests/arithmetic/test_object.py
pandas/tests/arithmetic/test_period.py
pandas/tests/arithmetic/test_timedelta64.py
pandas/tests/arrays/interval/test_interval.py

@gdex1 I hope this will help you :)

(The numbers represent line numbers)

pandas/tests/arithmetic/test_numeric.py:138
pandas/tests/arithmetic/test_numeric.py:141
pandas/tests/arithmetic/test_numeric.py:190
pandas/tests/arithmetic/test_numeric.py:208
pandas/tests/arithmetic/test_numeric.py:210
pandas/tests/arithmetic/test_numeric.py:212
pandas/tests/arithmetic/test_numeric.py:214
pandas/tests/arithmetic/test_numeric.py:232
pandas/tests/arithmetic/test_numeric.py:234
pandas/tests/arithmetic/test_numeric.py:236
pandas/tests/arithmetic/test_numeric.py:238
pandas/tests/arithmetic/test_numeric.py:519
pandas/tests/arithmetic/test_numeric.py:610
pandas/tests/arithmetic/test_numeric.py:615
pandas/tests/arithmetic/test_numeric.py:617
pandas/tests/arithmetic/test_numeric.py:795
pandas/tests/arithmetic/test_numeric.py:798
pandas/tests/arithmetic/test_numeric.py:819
pandas/tests/arithmetic/test_object.py:140
pandas/tests/arithmetic/test_object.py:152
pandas/tests/arithmetic/test_object.py:154
pandas/tests/arithmetic/test_object.py:278
pandas/tests/arithmetic/test_object.py:280
pandas/tests/arithmetic/test_object.py:282
pandas/tests/arithmetic/test_object.py:284
pandas/tests/arithmetic/test_object.py:298
pandas/tests/arithmetic/test_object.py:301
pandas/tests/arithmetic/test_object.py:315
pandas/tests/arithmetic/test_object.py:318






pandas/tests/arithmetic/test_timedelta64.py:51
pandas/tests/arithmetic/test_timedelta64.py:445
pandas/tests/arithmetic/test_timedelta64.py:607
pandas/tests/arithmetic/test_timedelta64.py:609
pandas/tests/arithmetic/test_timedelta64.py:703
pandas/tests/arithmetic/test_timedelta64.py:705
pandas/tests/arithmetic/test_timedelta64.py:707
pandas/tests/arithmetic/test_timedelta64.py:709
pandas/tests/arithmetic/test_timedelta64.py:741
pandas/tests/arithmetic/test_timedelta64.py:743
pandas/tests/arithmetic/test_timedelta64.py:960
pandas/tests/arithmetic/test_timedelta64.py:972
pandas/tests/arithmetic/test_timedelta64.py:1028
pandas/tests/arithmetic/test_timedelta64.py:1037
pandas/tests/arithmetic/test_timedelta64.py:1039
pandas/tests/arithmetic/test_timedelta64.py:1502
pandas/tests/arithmetic/test_timedelta64.py:1505
pandas/tests/arithmetic/test_timedelta64.py:1508
pandas/tests/arithmetic/test_timedelta64.py:1511
pandas/tests/arithmetic/test_timedelta64.py:1536
pandas/tests/arithmetic/test_timedelta64.py:1591
pandas/tests/arithmetic/test_timedelta64.py:1783
pandas/tests/arithmetic/test_timedelta64.py:1785
pandas/tests/arithmetic/test_timedelta64.py:1911
pandas/tests/arithmetic/test_timedelta64.py:1960
pandas/tests/arithmetic/test_timedelta64.py:1962
pandas/tests/arithmetic/test_timedelta64.py:1968






pandas/tests/arrays/interval/test_interval.py:155

@gfyoung the list wasn't generated by grep -r -e "pytest.raises([a-zA-Z]*)" pandas/tests -l; it was generated by the script in #30755 (a validation type called bare_pytest_raises). I will put instructions in the issue body once it gets merged :smile:

@MomIsBestFriend I will help with :
pandas/tests/base/test_constructors.py
pandas/tests/base/test_ops.py

I can take care of these:
@MomIsBestFriend

pandas/tests/io/test_html.py
pandas/tests/io/test_parquet.py
pandas/tests/io/test_sql.py
pandas/tests/io/test_stata.py
pandas/tests/plotting/test_backend.py
pandas/tests/plotting/test_boxplot_method.py
pandas/tests/plotting/test_frame.py
pandas/tests/plotting/test_hist_method.py
pandas/tests/plotting/test_series.py
pandas/tests/reductions/test_reductions.py

@MomIsBestFriend there was quite some discussion in https://github.com/pandas-dev/pandas/issues/23922 about how to go about this. To repeat what I said there: I don't think we should "blindly" assert all error messages.

Some things that were said in that thread: limit it to internal error messages, limit the match to a few key words of the message, avoid complicated patterns.

Also, I think asserting error messages should go hand in hand with actually checking if it is a good, clear error message, and potentially improving this.

It might be good to distill a list of attention points from the discussion in the other issue to put here.

@jorisvandenbossche

@MomIsBestFriend there was quite some discussion in #23922 about how to go about this. To repeat what I said there: I don't think we should "blindly" assert all error messages.

I completely agree, but the problem is that newcomers don't know which error messages to assert and which not to. If we could somehow define rules on which error messages to assert, while at the same time keeping this issue "beginner friendly", that would be great (IMO).

Also, if we plan to enforce this in the CI, we need some way to mark which bare pytest raises are "bare" on purpose (IMO a comment in the style of isort: skip is enough), so that other people will know that a particular bare pytest.raises is intentional.
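
A purely hypothetical sketch of what such an opt-out marker could look like (no such marker existed at the time; the comment text is made up for illustration):

import pytest

def test_external_failure():
    # Hypothetical opt-out comment, analogous to "isort: skip", that a
    # linter could recognize before flagging the bare pytest.raises.
    with pytest.raises(ValueError):  # bare-pytest-raises: skip
        raise ValueError("message owned by an external library")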

Some things that were said in that thread: limit it to internal error messages, limit the match to a few key words of the message, avoid complicated patterns.

I don't see why we wouldn't want to test internal error messages; can you please elaborate further?

I see the point you made in https://github.com/pandas-dev/pandas/issues/23922#issuecomment-458551763, and I'm +1 on that, but I'm +2 (if that makes any sense) on https://github.com/pandas-dev/pandas/issues/23922#issuecomment-458733117 and https://github.com/pandas-dev/pandas/issues/23922#issuecomment-458735169 because IMO the benefit outweighs the cost.

Also, I think asserting error messages should go hand in hand with actually checking if it is a good, clear error message, and potentially improving this.

Absolutely agree.

It might be good to distill a list of attention points from the discussion in the other issue to put here.

I have read the conversation in #23922, but I didn't see anything that IMO is worth putting as a "note" in the issue's body; can you please point out what I missed?

I have read the conversation in #23922, but I didn't see anything that IMO is worth putting as a "note" in the issue's body; can you please point out what I missed?

I don't see much else to add from that issue either.

I completely agree, but the problem is that newcomers don't know which error messages to assert and which not to. If we could somehow define rules on which error messages to assert, while at the same time keeping this issue "beginner friendly", that would be great (IMO).

Also, if we plan to enforce this in the CI, we need some way to mark which bare pytest raises are "bare" on purpose (IMO a comment in the style of isort: skip is enough), so that other people will know that a particular bare pytest.raises is intentional.

These are in part the reasons why picking and choosing which messages to test and which not to test is not the direction I would prefer. I would also add that we sometimes check error message strings in except blocks, so good error messages also benefit us during development.

Also, if these "internal" messages aren't that important, why do we have an error message in the first place? I would then just create a helper that asserts the message is empty.

I don't see why we wouldn't want to test internal error messages, can you please elaborate even more?

So I said "limit it to internal error messages", but "internal" can be a bit ambiguous. I meant error messages that originate from pandas itself, and of course we want to test those. What I meant is that we (IMO) shouldn't test external error messages too much, meaning messages coming from e.g. numpy or other libraries. Numpy can change those, and then our tests start failing due to a cosmetic change in numpy (and this is not hypothetical; it happened just last week, I think).

Now, I used "internal" in a different context in https://github.com/pandas-dev/pandas/pull/30998#discussion_r366726966. There, I meant an internal, developer-oriented error message that should never be surfaced to the user. IMO, it is not important to test those against the exact error message.

I see the point you made in #23922 (comment), and I'm +1 on that, but I'm +2 (if that makes any sense) on #23922 (comment) and #23922 (comment) because IMO the benefit outweighs the cost.

Let's put @simonjayhawkins's comment that you link to here:

I am working on the assumption, maybe incorrectly, that it will be beneficial to

  1. identify tests that can be parametrized
  2. identify tests that should be split
  3. better understanding of the failure mode tested
  4. indirectly add more documentation to the tests
  5. identify where error messages could be more consistent
  6. identify tests that are redundant
  7. help improve error messages
  8. identify tests that are currently passing for the wrong reason.

Those are all useful things, I fully agree. But that is not simple, and if we want to get those things out of this issue, then this issue is not for beginners. Of course, beginners don't need to do all of those things at once, but I still have the feeling that the PRs adding asserts often come rather close to "blindly adding the current error message to the pytest.raises call" without going any further (the above points).

Also, if the above points are what make this exercise useful, then it is more concrete instructions about them that would be useful to put at the top of this issue, I think.


To be clear, I am all for better error messages and better tests asserting that we have and keep those good error messages. But we also have limited time, each PR requires time and effort to write and to review, and the question is where that effort is best spent.
IMO, it would be more useful to focus on "improve error messages" (and, while doing that, better test them) rather than on "fix bare pytest raises".

Also, if the above points are what make this exercise useful, then it is more concrete instructions about them that would be useful to put at the top of this issue, I think.

It might make sense to create a larger issue to track these (other issues worth including in such an issue are https://github.com/pandas-dev/pandas/issues/19159 and https://github.com/pandas-dev/pandas/issues/21575).

This part in itself is self-contained and is very approachable for beginners.

@gfyoung how are those issues you link related to this discussion?

They relate to the comment from @simonjayhawkins that you quoted.

#31072 contains the one missing match in stata.py

As I said here https://github.com/pandas-dev/pandas/pull/31091#issuecomment-575422207, I'm with @jorisvandenbossche's idea that we won't test error messages from external packages. Any ideas on how to mark those?

If we really don't want to test certain error messages (I could go either way on external ones to be fair), I think we should just create a helper function like this:

def external_error_raised(expected_exception):
    return pytest.raises(expected_exception, match=None)

This will make it clear to our future selves that this is a non-pandas error, and the match=None serves to appease any linting check we develop for bare pytest raises.

If we really don't want to test certain error messages (I could go either way on external ones to be fair), I think we should just create a helper function like this:

def external_error_raised(expected_exception):
    return pytest.raises(expected_exception, match=None)

This will make it clear to our future selves that this is a non-pandas error, and the match=None serves to appease any linting check we develop for bare pytest raises.

+1 on that.

I really like that idea; can we make it a convention for our tests?

That is, if a test checks that a function/method raises an error, and the error is an external error, we simply put match=None in the pytest.raises call.

can we make it a convention for our tests?

By that I mean putting a section about it in the Contributing guide.

That is, if a test checks that a function/method raises an error, and the error is an external error, we simply put match=None in the pytest.raises call.

I would prefer the helper function since you then wouldn't have to think about adding that. Also, the helper name is much clearer as to why we're doing it.

If we really don't want to test certain error messages (I could go either way on external ones to be fair), I think we should just create a helper function like this:

def external_error_raised(expected_exception):
    return pytest.raises(expected_exception, match=None)

This will make it clear to our future selves that this is a non-pandas error, and the match=None serves to appease any linting check we develop for bare pytest raises.

@gfyoung where do you recommend putting this helper function? (as if in what file?)

pandas._testing

Hello,

I would like to work on :

pandas/tests/arrays/interval/test_ops.py
pandas/tests/arrays/test_array.py
pandas/tests/arrays/test_boolean.py

Hello - I would like to work on:

pandas/tests/arithmetic/test_period.py
pandas/tests/arithmetic/test_timedelta64.py

Hello all, I'll take the following:

pandas/tests/computation/test_compat.py
pandas/tests/computation/test_eval.py
pandas/tests/dtypes/cast/test_upcast.py
pandas/tests/dtypes/test_dtypes.py

@MomIsBestFriend this one is done already but isn't marked as done:
pandas/tests/arithmetic/test_numeric.py

UPDATE

@MomIsBestFriend these too:
pandas/tests/arithmetic/test_period.py
pandas/tests/arrays/test_integer.py
pandas/tests/arrays/test_period.py

These are included in #31852

pandas/tests/extension/decimal/test_decimal.py
pandas/tests/extension/json/test_json.py
pandas/tests/extension/test_boolean.py
pandas/tests/extension/test_categorical.py
pandas/tests/frame/indexing/test_categorical.py
pandas/tests/frame/indexing/test_indexing.py
pandas/tests/frame/indexing/test_where.py
pandas/tests/frame/methods/test_explode.py
pandas/tests/frame/methods/test_isin.py
pandas/tests/frame/methods/test_quantile.py
pandas/tests/frame/methods/test_round.py
pandas/tests/frame/methods/test_sort_values.py
pandas/tests/frame/methods/test_to_dict.py

I'll take

pandas/tests/io/excel/test_readers.py
pandas/tests/io/excel/test_writers.py
pandas/tests/io/excel/test_xlwt.py
pandas/tests/io/formats/test_format.py
pandas/tests/io/formats/test_style.py
pandas/tests/io/formats/test_to_latex.py

@MomIsBestFriend
These are done without the mark:

  • pandas/tests/indexes/datetimes/test_astype.py

pandas/tests/indexes/datetimes/test_tools.py does not exist

I'll do:

  • pandas/tests/indexes/datetimes/test_constructors.py
  • pandas/tests/indexes/datetimes/test_date_range.py
  • pandas/tests/indexes/datetimes/test_indexing.py
  • pandas/tests/indexes/datetimes/test_shift.py
  • pandas/tests/indexes/datetimes/test_timezones.py

I have updated the original post. Now that there's a script to detect bare pytest raises, I have included instructions on how to use it; if anyone still has questions, you are more than welcome to ask :)

I'll take:

pandas/tests/arithmetic/test_timedelta64.py

pandas/tests/scalar/timestamp/test_arithmetic.py
pandas/tests/scalar/timestamp/test_comparisons.py
pandas/tests/scalar/timestamp/test_constructors.py
pandas/tests/scalar/timestamp/test_timezones.py
pandas/tests/scalar/timestamp/test_unary_ops.py

Seems all tests in pandas/tests/scalar/timestamp/ are already fixed:

$ git checkout master
Already on 'master'
$ python scripts/validate_unwanted_patterns.py -vt="bare_pytest_raises"  pandas/tests/scalar/timestamp/
$ 

pandas/tests/arrays/test_boolean.py => is missing.

I'm taking
pandas/tests/arrays/interval/test_ops.py
pandas/tests/arrays/test_datetimelike.py

pandas/tests/groupby/test_categorical.py
pandas/tests/groupby/test_groupby.py
pandas/tests/groupby/test_timegrouper.py

pandas/tests/arithmetic/test_timedelta64.py => #33010

pandas/tests/scalar/timestamp/test_arithmetic.py => no issue
pandas/tests/scalar/timestamp/test_comparisons.py => no issue
pandas/tests/scalar/timestamp/test_constructors.py => no issue
pandas/tests/scalar/timestamp/test_timezones.py => no issue
pandas/tests/scalar/timestamp/test_unary_ops.py => no issue

pandas/tests/arrays/test_boolean.py => is missing.

pandas/tests/arrays/interval/test_ops.py => #33010
pandas/tests/arrays/test_datetimelike.py => #33010

pandas/tests/groupby/test_categorical.py => #33144
pandas/tests/groupby/test_groupby.py => no issue
pandas/tests/groupby/test_timegrouper.py => no issue

pandas/tests/indexes/categorical/test_category.py => no issue
pandas/tests/indexes/common.py #33144
pandas/tests/indexes/datetimelike.py #33144

pandas/tests/indexes/interval/test_astype.py => all of the affected tests are marked as xfail; do we still need to fix them, and if so, how?

pandas/tests/indexes/multi/test_compat.py #33144
pandas/tests/indexes/multi/test_duplicates.py => no issue
pandas/tests/indexes/multi/test_format.py => file not found.
pandas/tests/indexes/multi/test_reshape.py #33144
pandas/tests/indexes/multi/test_setops.py => no issue
pandas/tests/indexes/multi/test_sorting.py #33144

@sumanau7 did you list the files that you are taking up? I'm working on some of the files that I see you have merged.

Done with

pandas/tests/indexes/categorical/test_category.py
pandas/tests/indexes/period/test_constructors.py
pandas/tests/indexes/period/test_join.py
pandas/tests/indexes/period/test_partial_slicing.py
pandas/tests/indexes/period/test_setops.py
pandas/tests/indexes/timedeltas/test_delete.py

I'm working with

pandas/tests/indexes/ranges/test_constructors.py
pandas/tests/indexes/ranges/test_range.py
pandas/tests/indexing/multiindex/test_chaining_and_caching.py
pandas/tests/indexing/multiindex/test_partial.py
pandas/tests/series/indexing/test_alter_index.py
pandas/tests/arrays/boolean/test_function.py

Working on:

pandas/tests/reshape/merge/test_multi.py

I'll take:
pandas/tests/window/moments/test_moments_ewm.py
pandas/tests/window/moments/test_moments_rolling.py
pandas/tests/window/test_dtypes.py
pandas/tests/window/test_ewm.py
pandas/tests/window/test_expanding.py
pandas/tests/window/test_timeseries_window.py

I'll also take:

  • pandas/tests/frame/methods/test_assign.py
  • pandas/tests/frame/methods/test_at_time.py
  • pandas/tests/frame/methods/test_between_time.py
  • pandas/tests/frame/methods/test_first_and_last.py
  • pandas/tests/frame/methods/test_interpolate.py
  • pandas/tests/frame/methods/test_replace.py
  • pandas/tests/frame/test_query_eval.py

Hi,
I'm a new developer to the project and I'd like to help with this. Which of the remaining tests are best for a beginner?

Thanks,
Kevin

Hi,
I'm a new developer to the project and I'd like to help with this. Which of the remaining tests are best for a beginner?

Thanks,
Kevin

Welcome! I don't think any of these are easier or harder than the others; any would be a good place to start.

I'm getting an error running the validate_unwanted_patterns.py script:

Traceback (most recent call last):
  File "C:\Users\Kevom\git\pandas\scripts\validate_unwanted_patterns.py", line 397, in <module>
    main(
  File "C:\Users\Kevom\git\pandas\scripts\validate_unwanted_patterns.py", line 352, in main
    for line_number, msg in function(file_obj):
  File "C:\Users\Kevom\git\pandas\scripts\validate_unwanted_patterns.py", line 88, in bare_pytest_raises
    contents = file_obj.read()
  File "C:\Program Files (x86)\Python\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 76843: character maps to <undefined>

It's occurring when reading pandas/tests/test_strings.py. The files are being opened with the default cp1252 encoding on Windows.

I wanted to run the script to double check which tests are still unfinished to avoid duplicating work.

Most of the NotImplementedErrors don't have a specific message to match. I know it kind of defeats the purpose, but is it a good idea to change them to pytest.raises(NotImplementedError, match=None) just to silence the linter?
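
For reference, a minimal sketch of the two ways discussed in this thread to express an intentionally message-less check (whether doing this for NotImplementedError is acceptable was left open; not_implemented is a stand-in function):

import pytest

import pandas._testing as tm

def not_implemented():
    raise NotImplementedError

def test_inline_match_none():
    # Explicit match=None signals that the bare raise is intentional.
    with pytest.raises(NotImplementedError, match=None):
        not_implemented()

def test_with_helper():
    # The helper from pandas._testing conveys the same intent by name.
    with tm.external_error_raised(NotImplementedError):
        not_implemented()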

I'll take:

  • pandas/tests/tools/test_to_datetime.py
  • pandas/tests/tseries/offsets/test_offsets.py

to start.

-Kevin

I am new to this, so I will start with:

  • pandas/tests/tseries/offsets/test_ticks.py

Hello, I am new to contributing. Thanks for the clear write-up in the issue. I'll start with taking pandas/tests/generic/test_duplicate_labels.py, and will tackle some more if it works out.

I will take pandas/tests/arrays/test_datetimelike.py as a start.
Also, if you cannot run python scripts/validate_unwanted_patterns.py -vt="bare_pytest_raises" pandas/tests/ successfully, try
python scripts/validate_unwanted_patterns.py -vt="bare_pytest_raises" pandas/tests/**/*.py instead.

The easiest way to run it now would be to add

    -   id: unwanted-patterns-bare-pytest-raises
        name: Check for use of bare pytest raises
        language: python
        entry: python scripts/validate_unwanted_patterns.py --validation-type="bare_pytest_raises"
        types: [python]
        files: ^pandas/tests/

to .pre-commit-config.yaml in the - repo: local section, and then run

pre-commit run unwanted-patterns-bare-pytest-raises --all-files.

I've updated the issue with the remaining outstanding files

I can take these:

  • [x] pandas/tests/io/pytables/test_timezones.py
  • [ ] pandas/tests/generic/methods/test_pipe.py
  • [ ] pandas/tests/reshape/merge/test_merge_asof.py
  • [ ] pandas/tests/extension/base/reduce.py
  • [ ] pandas/tests/extension/base/getitem.py
  • pandas/tests/arrays/test_datetimelike.py

These are the top 5 in the list as of today.

@marktgraham If you haven't done test_datetime.py yet, please leave it alone, as I am about to make a PR.

@liaoaoyuan97 no worries, I haven't touched test_datetimelike.py yet.

I will take pandas/tests/extension/base/getitem.py instead.

validate_unwanted_patterns.py raises an error on my side:

$ python scripts/validate_unwanted_patterns.py -vt="bare_pytest_raises" pandas/tests/
Traceback (most recent call last):
  File "scripts/validate_unwanted_patterns.py", line 479, in <module>
    output_format=args.format,
  File "scripts/validate_unwanted_patterns.py", line 435, in main
    with open(file_path, encoding="utf-8") as file_obj:
IsADirectoryError: [Errno 21] Is a directory: 'pandas/tests/'

Seems to be related to #37419 perhaps?

I tried the approach proposed by @MarcoGorelli and it worked perfectly.

The easiest way to run it now would be to add

    -   id: unwanted-patterns-bare-pytest-raises
        name: Check for use of bare pytest raises
        language: python
        entry: python scripts/validate_unwanted_patterns.py --validation-type="bare_pytest_raises"
        types: [python]
        files: ^pandas/tests/

to .pre-commit-config.yaml in the - repo: local section, and then run

pre-commit run unwanted-patterns-bare-pytest-raises --all-files.

Does it make sense to add this to the .pre-commit-config.yaml and then update the instructions on this thread?

Does it make sense to add this to the .pre-commit-config.yaml and then update the instructions on this thread?

We will add it to .pre-commit-config.yaml once all the errors it raises are fixed, yes

Seems to be related to #37419 perhaps?

No, it's related to #37379 (which is when we moved this script over to pre-commit, hence it was no longer necessary for it to run on directories)
