Pandas: RLS: 0.24.0

Created on 3 Dec 2018  ·  61 Comments  ·  Source: pandas-dev/pandas

Tracking issue

Open PRs
Open Issues

Let's whittle down. I just looked at the latest whatsnew and it's huge. Let's get this out sooner rather than later. I know there are some blocking issues: DatetimeArray & what to do with CalendarDay.

Release

All 61 comments

The idea is for 0.24.0 to be the last py2-supporting release, right?

#24021 fixes a corner-case behavior in Timestamp comparisons, but also introduces an inconsistency between py2/py3 behavior. #21394 made the analogous change to Timedelta comparisons.

The _least_ consistent thing we could do is to keep the status quo, changing Timedelta behavior but not Timestamp behavior. The question is whether to a) merge #24021 and have mismatched py2/py3 behavior in 0.24.0, or b) revert #21394 until after 0.24.0 and wait to change both in the first py3-only release.

I lean slightly towards option b.

The idea is for 0.24.0 to be the last py2-supporting release, right?

Basically, though I assume we'll do some backporting and do a 0.24.1 or 2 that supports py2 as well.

I'm not super-familiar with this section, but your option b sounds reasonable.

Although... I could live with inconsistent py2 / py3 behavior. That wouldn't be the only one.

Question: with https://github.com/pandas-dev/pandas/pull/24021 would we be consistent with the builtin timedelta of each version?

This release is indeed big, but v.0.24 is also quite special because it will effectively define the 1.0 API (in the sense of the no-deprecation policy between 0.24 and 1.0), and also because of the magnitude of the whole EA effort obviously.

But - despite a lot of hard work - the current state still feels a bit half-baked:

Realistically, a release before the end of the year would mean a cut-off in ~10 days max, which seems unrealistic from my POV, even if cutting corners.

Given that the following statement from @TomAugspurger above

Basically, though I assume we'll do some backporting and do a 0.24.1 or 2 that supports py2 as well.

effectively means more PY2 support at the beginning of 2019 anyway, I think one should consider not trying to force the release before year end.

If there is a release before the end of the year (resp. before the most important issues are sorted out), then I especially agree with Tom that there will need to be a 0.24.1 for PY2, as 0.24.0 will have too many issues (that hopefully surface in the RC, but well...) to be the last release, IMO.

Alternatively (which is simultaneously more in line with https://python3statement.org/, but also more controversial), one could consider having a last PY2-supporting 0.23.5 this year, and then doing 0.24.0 as PY3-only next year...?

@h-vetinari pandas is a volunteer project almost completely. Thus project priorities are set by community consensus and worked towards. We have regular releases in time; 0.24.0 is actually overdue by a few months. Trying to add additional things which themselves need discussion is not going to happen.

Whatever exists in the 0.24.x series is the last release for Python 2, this has long been announced. This is just how it is.

I don't follow the point of https://github.com/pandas-dev/pandas/issues/24060#issuecomment-444777018. Could you try restating / summarizing it?

I think Py2 vs. Py3 is irrelevant for 0.24.0.

The EA-related issues you linked are I think all going in 0.24 (I didn't check them all). That's basically the blocker at this point, but I haven't reviewed the backlog recently.

I haven't had time to look into unique.

@jreback

@h-vetinari pandas is a volunteer project almost completely. Thus project priorities are set by community consensus and worked towards. We have regular releases in time; 0.24.0 is actually overdue by a few months. Trying to add additional things which themselves need discussion is not going to happen.

I know all that. I'm just saying that being overdue is a bad reason to rush a release, if some of the core changes have not been fully developed yet (talking mostly about EA + regressions).

Whatever exists in the 0.24.x series is the last release for Python 2, this has long been announced. This is just how it is.

I don't know what's been discussed in other channels, but what I saw on GH was that the main decision was 0.24->0.25->1.0. Re: PY2, it has equally been said (and there's a warning to that effect in the whatsnew files) that there won't be PY2 releases after Dec. 31st 2018. Supporting the 0.24.x series for PY2 means another ~6-8 months of supporting PY2 (as backporting to the 0.24 branch would otherwise be very cumbersome). Of course that's a valid choice, but I just wanted to suggest the possibility of leaving PY2 at the very stable 0.23.5 instead.

@TomAugspurger

I don't follow the point of #24060 (comment). Could you try restating / summarizing it?

I think Py2 vs. Py3 is irrelevant for 0.24.0.

Sorry this was not clear. The two main (interrelated) points are:

  • one should not rush 0.24 due to the stated deadline (for supporting PY2) of Dec 31st.

    • listed some issues for that - some of these (and many more related ones I didn't tag) have been pushed to "Contributions Welcome" instead of v.0.24

    • brought up that 0.24 is special due to the API-lock policy until 1.0, and that I'd think this would be a pity for .unique (but I get that I'm a lone voice in the wilderness about that one)

  • point out the discrepancy between the warning box ("Starting January 1, 2019, pandas feature releases will support Python 3 only") and supporting the 0.24-series for PY2, along with the suggestion to have a 0.23.5 for PY2

The EA-related issues you linked are I think all going in 0.24 (I didn't check them all). That's basically the blocker at this point, but I haven't reviewed the backlog recently.

The backlog has been thinned substantially recently, not least by issues being pushed to "Contributions Welcome" (also for EA issues). This goes towards my point of cutting corners to get this out soon.

That being said, I'm pretty new at this game, and I don't doubt that the core devs got the big picture in view and under control - but pointing out an observation can't hurt, I hope.

I haven't had time to look into unique.

Fair enough, dev time is a very limited resource. I guess I'll have to try to convince everyone of this in post-1.0 land. ;-)

@h-vetinari

I know all that. I'm just saying that being overdue is a bad reason to rush a release, if some of the core changes have not been fully developed yet (talking mostly about EA + regressions).

If we keep adding and adding, then the release just keeps getting delayed forever. I have drawn a line in the sand. This is how you get product out the door. Once the DTA lands fully, we will be in a position to release. So this is not that far away. Sure we could do extra work and just say 0.23.5 is the last PY2 release (and of course release it). But it's going to be easier to backport to a stable branch, which means the 0.24.x series.

There are always things to add in a release, but this one is already the biggest we have ever had. There are inevitable bugs and so better to do sooner rather than later. Thanks for your contributions. It is not possible to get every major API change here. You are exactly right in that dev time is a truly limited resource.

@jreback
Thanks for the response. I understand you want to get this out the door ASAP, which is fair enough, of course.

But it's going to be easier to backport to a stable branch, which means the 0.24.x series.

Seems I misunderstood that pandas would stop supporting PY2 as of Jan 1st 2019... Maybe the warning box in the whatsnew should be adapted then ("v.0.24.x will be the last series that supports PY2; starting with v.0.25.0, pandas will be python3-only"...?)

RE: CalendarDay progress https://github.com/pandas-dev/pandas/pull/22867#issuecomment-445433463

TODO
Add all the deprecation warnings (I anticipate there being quite a few of these). This needs to be included in the following:

  • Day tick arithmetic with other Ticks, Timedeltas, and DatetimeTZ
  • DatetimeIndex.shift (tz-aware only)

Migration
The plan is for _Day (formerly CalendarDay) to just replace Day once the prior Day behavior is replaced.

Concern
During the last dev chat, there was interest in 'D' being a compatible freq argument/offset with both Timedelta and Datetime. I don't see a clear way to make this possible without adding a lot of monkeypatching.

Example: timedelta_range(..., freq='D'); to_offset('D') will return _Day in the future and this offset will need to increment a Timedelta, but _Day + Timedelta is an invalid operation.

Anyone have opinions on the timestamp/timedelta py2/py3 consistency issue?

A bunch of deprecations are listed as To Be Removed in 1.0; should 0.25.0 take the place of 1.0 for some of those?

Anyone have opinions on the timestamp/timedelta py2/py3 consistency issue?

Can you summarize that issue? Ideally we would just follow python (whatever version is running) here I think. But I don't think I fully understand the issue.

A bunch of deprecations are listed as To Be Removed in 1.0; should 0.25.0 take the place of 1.0 for some of those?

I think all of them... Need to discuss that though, some may need to be pushed.

https://github.com/pandas-dev/pandas/issues/24060#issuecomment-444180736

Timedelta was recently changed to return NotImplemented in cases where it previously raised. As a result its py2 behavior matches python but differs from pandas py3 behavior.

Timestamp has an open PR to make the analogous change.

Once py2 is dropped, the change is definitely right. Until then, there are conflicting consistency arguments.

We should either get the Timestamp PR in for 0.24.0 or revert the Timedelta PR until after 0.24.0
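For readers unfamiliar with the mechanism under discussion, here is a minimal sketch of what returning NotImplemented from a comparison method does on Python 3. The `Duration` and `Wrapper` classes are hypothetical stand-ins written for this illustration; they are not pandas code and do not reproduce the actual Timedelta or Timestamp implementation.

```python
class Duration:
    """Hypothetical timedelta-like scalar (illustration only, not pandas)."""

    def __init__(self, seconds):
        self.seconds = seconds

    def __eq__(self, other):
        if not isinstance(other, Duration):
            # Defer to the other operand instead of raising.
            return NotImplemented
        return self.seconds == other.seconds

    def __lt__(self, other):
        if not isinstance(other, Duration):
            return NotImplemented
        return self.seconds < other.seconds


class Wrapper:
    """Hypothetical third-party type that knows how to compare with Duration."""

    def __init__(self, duration):
        self.duration = duration

    def __eq__(self, other):
        if isinstance(other, Duration):
            return self.duration == other
        return NotImplemented

    def __gt__(self, other):
        # Reflected counterpart of Duration.__lt__.
        if isinstance(other, Duration):
            return other < self.duration
        return NotImplemented


# Duration defers, so Python tries the reflected method on Wrapper.
equal = Duration(1) == Wrapper(Duration(1))
smaller = Duration(1) < Wrapper(Duration(2))

# With an unrelated type, == falls back to identity, while < raises
# TypeError on Python 3 because neither side implements the comparison.
unrelated_equal = Duration(1) == "x"
try:
    Duration(1) < "x"
    ordering_raises = False
except TypeError:
    ordering_raises = True
```

Returning NotImplemented gives the other operand a chance to handle the comparison, which is why the choice between raising and deferring is visible to third-party types.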

(Typing with thumbs; LMK if unclear)

i think let’s revert the Timedelta one, then push them both in for 0.25/1.0 (py3 only)

Moving this comment https://github.com/pandas-dev/pandas/pull/24227#issuecomment-446680041 here:

(for which IMO we will also need at least a couple of weeks in master)

[Tom] Just to verify, we should do a release candidate with DatetimeArray ASAP, right? And then 1-2 weeks on master while the RC is out?

Personally, no, I wouldn't do that (if you mean with ASAP like a few days after). I would also keep it at least 2 weeks in master before doing an RC. Now in practice it will maybe be like that anyway ..

Personally, I'll have to get back to dask / other things after this push on datetimearray. I was hoping we could have an RC out while I do that.

Are there other major issues I could pick up while we're going through this round of reviews? My plate currently has

yeah i think we should merge things (that tom mentioned) then sit in master for a week or 2 at the least

To be clear, I also want to see this released as soon as possible, but we also need to be realistic (eg I don't think we will have a final release before the end of the year as you mentioned on fastparquet? even if all the blocking PRs are merged in a week that seems too quick IMO)

If we have a longer RC period, and still do some further clean-up after doing the RC (and possibly do a second RC), I am fine with doing a quick RC after merging.
But if we see an RC as "ready to be released from our part, if no major issue is reported by people trying out the RC, we can do a final release from that", then we should have those major changes in master for a bit IMO.

I think I have the permissions to do a fastparquet release. There's a backwards / forwards compatible change that could be released today.

But if we see an RC as "ready to be released from our part, if no major issue is reported by people trying out the RC, we can do a final release from that", then we should have those major changes in master for a bit IMO.

If we have a longer RC period, and still do some further clean-up after doing the RC (and possibly do a second RC), I am fine with doing a quick RC after merging.

That's basically where I'm at. Assuming that outstanding big PRs are merged this week (just assuming, not an actual deadline), then we will turn up issues with them in the next couple weeks. My hope is that by doing an RC (or two) we'll be more likely to turn up more issues, so that we can do a higher-quality final release sooner.

The main cost of doing the RC sooner is that we don't get any more scope creep, which may be a good thing :)

That said, I don't think doing an RC any time in, say, the 20th - 27th is a good idea because of the holidays. So I'm also fine with doing one shortly after the New Years.

that all sounds good ; RC first of the year;

@jreback @TomAugspurger @jorisvandenbossche
Would you accept a PR for #22724 before the cutoff? I know you want to avoid scope creep and get this out of the door soon, but I'm coming from the consistency side of things, where this is a change I think could be beneficial to have sooner rather than later. Thought I'd ask before I invest the time.

Speaking of which - do you already have an idea what will be the policy for breaking changes between v0.24 and v0.25? Will they be blocked completely, or will master move to 1.0.0.dev immediately, with v0.25 using backports?

@jreback @TomAugspurger @jorisvandenbossche

Speaking of which - do you already have an idea what will be the policy for breaking changes between v0.24 and v0.25? Will they be blocked completely, or will master move to 1.0.0.dev immediately, with v0.25 using backports?

Re-asking this, since - in case breaking PRs would be blocked until v.0.25 is released - I would suspend all work on breaking PRs.

clearing the decks, pls don't tag for 0.24.0 unless an immediate merge. excluding cleanups still being done by @jbrockmendel and @TomAugspurger

ideally could do rc1 say next week, @TomAugspurger ?

I was thinking next week as well. Going through the backlog right now.

yep me too

Re-asking this, since - in case breaking PRs would be blocked until v.0.25 is released - I would suspend all work on breaking PRs.

My preference is that the only API-breaking changes from 0.25 -> 1.0 are the removal of previously deprecated features. Then users can

  1. Ensure things run smoothly on 0.25.x
  2. Fix any FutureWarnings from pandas
  3. Confidently upgrade to 1.0
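The second step above can be checked mechanically with the standard-library `warnings` module. The following is a minimal sketch; `old_api`, `new_api`, and `find_future_warnings` are hypothetical names invented for this illustration, with `old_api` standing in for any deprecated pandas feature.

```python
import warnings


def new_api(x):
    """Hypothetical replacement function."""
    return x * 2


def old_api(x):
    """Hypothetical deprecated function that warns before delegating."""
    warnings.warn(
        "old_api is deprecated; use new_api instead",
        FutureWarning,
        stacklevel=2,
    )
    return new_api(x)


def find_future_warnings(func, *args):
    """Call func and return the messages of any FutureWarning it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones already shown once.
        warnings.simplefilter("always")
        func(*args)
    return [
        str(w.message) for w in caught if issubclass(w.category, FutureWarning)
    ]


deprecated_hits = find_future_warnings(old_api, 3)
clean_hits = find_future_warnings(new_api, 3)
```

In a test suite, a similar effect can be had by running Python with `-W error::FutureWarning`, so that any remaining deprecated usage fails loudly before the upgrade.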

IIRC there was loose agreement to this at the last dev meeting.

@TomAugspurger meaning we still do breaking changes / deprecations in the 0.25 development cycle? (as that was the actual question of @h-vetinari I think, apart from 0.25 -> 1.0)

I don't really remember what we said about that, only a vague recollection from the sprint in summer that we already wanted all deprecations in 0.24 and not add more in 0.25 (although the summary in https://github.com/pandas-dev/pandas/wiki/Pandas-Sprint-(July,-2018) only speaks about having all deprecations in 0.25 and removing them in 1.0).

Sorry if I misread. Your vague recollection matches my vague recollection on deprecations for 0.25.0 :)

Do we want to revisit that policy? IOW do we want to allow new deprecations in 0.25.0 that are either

  • Removed in 1.0 (without much time for the community to adapt)
  • Maintained for 1.0, and removed in 2.0 (if we're doing semver)

we should have a call on the plan after 0.24 is out the door

I'd like to cut RC1 in ~4 hours, after https://github.com/pandas-dev/pandas/pull/24708 is merged. Any objections?

We'll need to discuss how conservative we are with merging PRs during the RC period. I don't recall what we did last time (only PRs that are fixing bugs with the RC specifically? Or "small" things are OK?).

ok the RC, yeah small things are ok. If we have big then maybe need RC2

Tagging now, and going through the local tests before pushing the tag. Ping me if you find a last-minute blocker.

@TomAugspurger you mentioned you were going to write a blog post for the release, but I suppose only for the final release?
In that case, it might be good to already have some highlights (I started drafting some); the whatsnew file can use some clean-up in general.

Any idea how long it'll take? I was just about to push the tag :)

Though, I can rebuild the docs that'll be pushed to the web server from a different commit.

You've got a bit more time, since I think I need to do some work on our conda-forge recipe to ensure numpy >= 1.12 :)

Yeah, don't wait for me. I have some other work to finish first.

OK. tagged.

I started going through the whatsnew file to extract highlights, and already thought of:

  • Refactor of the internal handling of custom data types:

    • Better integration of the ExtensionArray interface

    • Period and Interval can now be stored in Series / DataFrame columns (before only in Index)

  • New .array attribute on Series and Index to access the underlying values, and to_numpy method to convert to numpy arrays.
  • Optional integer NA
  • sparse changes
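A short sketch of a few of the highlights listed above, assuming pandas 0.24 or later and numpy are installed. The variable names are illustrative only, and the exact class returned by `.array` differs between pandas versions, so the sketch does not rely on it.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# .array returns the pandas array backing the Series;
# to_numpy() returns a plain numpy.ndarray.
backing = s.array
values = s.to_numpy()

# Optional integer NA: the nullable "Int64" extension dtype
# (capital I) keeps integers alongside missing values.
nullable = pd.Series([1, None, 3], dtype="Int64")

# Period data stored directly in a Series rather than only in an Index.
periods = pd.Series(pd.period_range("2019-01", periods=3, freq="M"))
```

With a default numpy-backed integer Series, the missing value would instead force a cast to float64, which is the limitation the nullable dtype addresses.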

Is there any highlight to mention for all the datetime-like refactoring? (apart from "refactor of the internal handling of custom data types")

Any other new features or changes worth mentioning?

https://github.com/pandas-dev/pandas/releases/tag/v0.24.0rc1 has a few.

Binaries are building and HTML docs are up at http://pandas.pydata.org.

Will send out an announcement later today, once the binaries are done.

Mac and Linux wheels are on PyPI. Conda packages are trickling into conda-forge, and I sent out the announce email.

https://github.com/pandas-dev/pandas-release has a few things that need fixing up. Some are RC specific, so I'm not too worried about them. I'll try to iron out all the final issues while doing the final release, and then hopefully someone else can try things out, to see what machine-specific stuff I've accidentally encoded in there.

no windows wheel for 0.24.0rc1 ?

I’m not sure if @cgohlke typically builds and uploads release candidates.

I'm on it. The pypy build was failing...

Thanks @cgohlke, windows wheels are up on PyPI now.

We should probably do 0.24.0 this week. Any objections? Any blockers?

I don't know if I'll get https://github.com/pandas-dev/pandas/pull/24674 done. Won't have too much time this week.

no objections - like to get what is marked currently for 0.24.0 in but if in a couple of days they r not then ok to defer

@TomAugspurger all issues & PR's are clean for 0.24.0

I'll do a quick doc PR adding an experimental label to DatetimeArray and TimedeltaArray, with a warning that .dtype is expected to change in the future.

Planning to merge

and tag shortly afterwards. Anything else from the RC?

Down to

If people have quick thoughts on #24926 (removing IntervalArray from the top-level, just using pd.arrays.IntervalArray) then a quick +/-1 over there would be useful.
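For context, a minimal sketch of reaching the class through the `pd.arrays` namespace mentioned above, assuming pandas 0.24 or later. The break values are arbitrary.

```python
import pandas as pd

# Construct an IntervalArray via the pandas.arrays namespace.
intervals = pd.arrays.IntervalArray.from_breaks([0, 1, 2])

# Each element is a pd.Interval; intervals are closed on the right by default.
first = intervals[0]
```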

@TomAugspurger Before tagging, can you do a last setting of the date in the whatsnew docs? (now still January XX) Or in the release commit

All merged I think!

Thanks, tagging.

nice @TomAugspurger

Wooo! Congratulations. Thank you for all the hard work. Really looking forward to the release.

sdist and binaries are up on PyPI and conda-forge. Anaconda is building for defaults now.

Thanks everyone.

And thank you!
