Pandas: Interpolate (upsample) non-equispaced timeseries into equispaced 18.0rc1

Created on 7 Mar 2016  ·  3 Comments  ·  Source: pandas-dev/pandas

I want to interpolate (upsample) a non-equispaced time series to obtain an equispaced time series.

Currently I am doing it in the following way:

  1. take the original timeseries.
  2. create a new timeseries with NaN values at 30-second intervals (using resample('30S').asfreq())
  3. concat the original timeseries and the new timeseries
  4. sort the result to restore the time order (I do not like this step: sorting has O(n log n) complexity)
  5. interpolate
  6. remove the original points from the timeseries

Is there a simpler way? In MATLAB, you keep the original timeseries and pass the new times as a parameter to the interpolate() function to receive values at the desired times. Ideally I would like to have a function such as

origTimeSeries.interpolate(newIndex=newTimeIndex, method='spline')

Note that the times of the original timeseries might not be a subset of the times of the desired timeseries.
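
One way to approximate the requested call with existing pandas operations is sketched below. The helper name interpolate_at and the sample data are made up for illustration and are not part of pandas, and the sketch uses linear time-based interpolation rather than a spline.

```python
import pandas as pd

def interpolate_at(ts, new_index, method='time'):
    """Return `ts` interpolated at `new_index` (hypothetical helper, not a pandas API)."""
    # Combine the original and the desired timestamps into one sorted index.
    combined = ts.index.union(new_index)
    # Reindexing leaves NaN at the new timestamps; interpolate() then fills them.
    filled = ts.reindex(combined).interpolate(method=method)
    # Keep only the requested timestamps, dropping original points not among them.
    return filled.reindex(new_index)

# Illustrative data: the value equals the number of seconds since 08:00:00.
raw = pd.Series(
    [0.0, 90.0, 240.0],
    index=pd.to_datetime(['2015-01-04 08:00:00',
                          '2015-01-04 08:01:30',
                          '2015-01-04 08:04:00']),
)
grid = pd.date_range('2015-01-04 08:00:00', '2015-01-04 08:04:00', freq='60s')
equispaced = interpolate_at(raw, grid)
print(equispaced)
```

The original timestamps do not need to be a subset of the new ones, which matches the note above. Requested timestamps that fall before the first original sample should stay NaN, since interpolate() does not fill leading gaps by default.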

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

values = [271238, 329285, 50, 260260, 263711]
timestamps = pd.to_datetime(['2015-01-04 08:29:4',
                             '2015-01-04 08:37:05',
                             '2015-01-04 08:41:07',
                             '2015-01-04 08:43:05',
                             '2015-01-04 08:49:05'])

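ts = pd.Series(values, index=timestamps)
ts
# Replace any -1 entries with NaN
ts[ts == -1] = np.nan
# Equispaced grid at the new frequency (NaN wherever no original sample falls exactly on it)
newFreq = ts.resample('60S').asfreq()

# Combine the original points with the grid, restore time order, then interpolate
new = pd.concat([ts, newFreq]).sort_index()
new = new.interpolate(method='time')

ts.plot(marker='o')
new.plot(marker='+', markersize=15)

# Plot only the values that lie on the equispaced grid
new[newFreq.index].plot(marker='.')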

lines, labels = plt.gca().get_legend_handles_labels()
labels = ['original values (nonequispaced)', 'original + interpolated at new frequency (nonequispaced)', 'interpolated values without original values (equispaced!)']
plt.legend(lines, labels, loc='best')
plt.show()



Enhancement Resample Timeseries

All 3 comments

use ordered_merge rather than concat and sort
http://pandas.pydata.org/pandas-docs/stable/merging.html#merging-ordered-data
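
A minimal sketch of that suggestion follows. It assumes a pandas release in which the function is exposed as pd.merge_ordered (the name that later replaced ordered_merge); the column names time and value and the sample data are illustrative only.

```python
import pandas as pd

# Illustrative data: the value equals the number of seconds since 08:00:00.
raw = pd.Series(
    [0.0, 90.0, 240.0],
    index=pd.to_datetime(['2015-01-04 08:00:00',
                          '2015-01-04 08:01:30',
                          '2015-01-04 08:04:00']),
)
grid = pd.date_range(raw.index[0], raw.index[-1], freq='60s')

# merge_ordered works on DataFrames, so move the timestamps into a column.
left = raw.rename('value').rename_axis('time').reset_index()
right = pd.DataFrame({'time': grid})

# Outer merge of two frames that are already sorted by 'time'; rows whose
# timestamp exists only in the grid get NaN in 'value'.
merged = pd.merge_ordered(left, right, on='time', how='outer')

# Interpolate over the merged timeline, then keep only the grid points.
interpolated = merged.set_index('time')['value'].interpolate(method='time')
equispaced = interpolated.reindex(grid)
print(equispaced)
```

This still builds the merged series as an intermediate result; it only replaces the concat-and-sort step.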

It would be nice to do it without a merge altogether, since I do not really need the merged time series; I only need the resulting equispaced time series. Is the way I described (enhanced with ordered_merge) the most efficient way to do this? Maybe using scipy directly would be better then:

http://docs.scipy.org/doc/scipy-0.14.0/reference/tutorial/interpolate.html#d-interpolation-interp1d
scipy allows doing it MATLAB-style: keep the original timeseries and pass a new index to obtain the new timeseries.
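
As a rough illustration of the scipy route, the sketch below builds an interpolant from the original samples and evaluates it on a new index. Converting timestamps to elapsed seconds is an assumption made here so that interp1d receives plain numbers; the sample data is invented.

```python
import pandas as pd
from scipy.interpolate import interp1d

# Illustrative data: the value equals the number of seconds since 08:00:00.
raw = pd.Series(
    [0.0, 90.0, 150.0, 240.0],
    index=pd.to_datetime(['2015-01-04 08:00:00',
                          '2015-01-04 08:01:30',
                          '2015-01-04 08:02:30',
                          '2015-01-04 08:04:00']),
)
grid = pd.date_range(raw.index[0], raw.index[-1], freq='60s')

# interp1d expects numeric x values, so express both indexes as seconds
# elapsed since the first original sample.
origin = raw.index[0]
x_old = (raw.index - origin).total_seconds().to_numpy()
x_new = (grid - origin).total_seconds().to_numpy()

# Build the interpolant once from the original samples...
f = interp1d(x_old, raw.to_numpy(), kind='cubic')
# ...then evaluate it at the new times, which must lie inside the original range.
equispaced = pd.Series(f(x_new), index=grid)
print(equispaced)
```

kind='cubic' needs at least four samples; kind='linear' is the default. With default settings interp1d raises a ValueError for times outside the original range.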

Also, I will be working with online data, so the original time series will grow and I will need to interpolate the new data and append it to the interpolated (equispaced) time series.
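
For the growing-data case, one possible shape is to interpolate only the grid points that newly arrived samples make computable and append them. The helper extend_equispaced below is hypothetical; it assumes the raw series is sorted by time and that the existing equispaced series already sits on the same grid.

```python
import pandas as pd

def extend_equispaced(equispaced, raw, freq='60s'):
    """Append grid points that the grown `raw` series can now bracket.

    Hypothetical helper: assumes `raw` is sorted by time and `equispaced`
    already sits on the `freq` grid.
    """
    start = equispaced.index[-1] + pd.Timedelta(freq)  # first grid point still missing
    end = raw.index[-1].floor(freq)                    # last grid point covered by raw data
    if start > end:
        return equispaced                              # nothing new to interpolate yet
    new_grid = pd.date_range(start, end, freq=freq)
    # Only the raw samples from the last one at or before `start` are needed.
    first = max(raw.index.searchsorted(start, side='right') - 1, 0)
    tail = raw.iloc[first:]
    filled = (tail.reindex(tail.index.union(new_grid))
                  .interpolate(method='time')
                  .reindex(new_grid))
    return pd.concat([equispaced, filled])

# Illustrative data: the value equals the number of seconds since 08:00:00.
raw = pd.Series(
    [0.0, 150.0],
    index=pd.to_datetime(['2015-01-04 08:00:00', '2015-01-04 08:02:30']),
)
equispaced = pd.Series(
    [0.0, 60.0, 120.0],
    index=pd.date_range('2015-01-04 08:00:00', periods=3, freq='60s'),
)

# A new raw sample arrives at 08:05:00.
new_sample = pd.Series([300.0], index=pd.to_datetime(['2015-01-04 08:05:00']))
raw = pd.concat([raw, new_sample])

equispaced = extend_equispaced(equispaced, raw)
print(equispaced)
```

Only the tail of the raw series is touched on each update, so the cost of an update depends on the amount of new data rather than on the full history.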

this gets you pretty close

In [42]: ts.reindex(ts.resample('60s').asfreq().index, method='nearest', tolerance=pd.Timedelta('60s')).interpolate('time')
Out[42]: 
2015-01-04 08:29:00    271238.000000
2015-01-04 08:30:00    271238.000000
2015-01-04 08:31:00    279530.428571
2015-01-04 08:32:00    287822.857143
2015-01-04 08:33:00    296115.285714
2015-01-04 08:34:00    304407.714286
2015-01-04 08:35:00    312700.142857
2015-01-04 08:36:00    320992.571429
2015-01-04 08:37:00    329285.000000
2015-01-04 08:38:00    329285.000000
2015-01-04 08:39:00    219540.000000
2015-01-04 08:40:00    109795.000000
2015-01-04 08:41:00        50.000000
2015-01-04 08:42:00        50.000000
2015-01-04 08:43:00    260260.000000
2015-01-04 08:44:00    260260.000000
2015-01-04 08:45:00    260950.200000
2015-01-04 08:46:00    261640.400000
2015-01-04 08:47:00    262330.600000
2015-01-04 08:48:00    263020.800000
2015-01-04 08:49:00    263711.000000
Freq: 60S, dtype: float64
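
The result above is close rather than exact apparently because method='nearest' copies the nearest original value onto each grid point within the tolerance (note that 08:29:00 and 08:30:00 carry the same value), and interpolate() then only fills the remaining gaps. The small sketch below, with invented data, compares this with interpolating on the union of both indexes.

```python
import pandas as pd

# Illustrative data: the value equals the number of seconds since 08:00:00.
raw = pd.Series(
    [20.0, 160.0],
    index=pd.to_datetime(['2015-01-04 08:00:20', '2015-01-04 08:02:40']),
)
grid = raw.resample('60s').asfreq().index  # 08:00, 08:01, 08:02

# Approach from the comment: each grid point takes the value of the nearest
# original sample within the tolerance; interpolate() fills whatever is left.
snapped = raw.reindex(grid, method='nearest',
                      tolerance=pd.Timedelta('60s')).interpolate(method='time')

# For comparison: interpolate on the union of both indexes, then select the grid.
exact = (raw.reindex(raw.index.union(grid))
            .interpolate(method='time')
            .reindex(grid))

print(pd.DataFrame({'snapped': snapped, 'exact': exact}))
```

Here every grid point is within 60 seconds of an original sample, so the snapped series contains only copied values, while the comparison series follows the line through the original points and leaves the grid point before the first sample as NaN.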