Pandas: Interpolate (upsample) a non-equispaced time series into an equispaced one 18.0rc1

Created on 2016-03-07 · 3 comments · Source: pandas-dev/pandas

I want to interpolate (upsample) a non-equispaced time series to obtain an equispaced time series.

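Currently I am doing it the following way:

Take the original time series.
Create a new time series with NaN values at 30-second intervals (using resample('30S').asfreq()).
Concatenate the original time series and the new time series.
Sort the time series to restore chronological order (this is the part I don't like: sorting has O(n log n) complexity).
Interpolate.
Remove the original points from the time series.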

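Is there a simpler way? In MATLAB, for example, you keep the original time series and pass the new times as an argument to the interpolation function to get the values at the desired times. Ideally I would like a function such as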

origTimeSeries.interpolate(newIndex=newTimeIndex, method='spline')

Note that the times of the original time series are not necessarily a subset of the times of the desired time series.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

values = [271238, 329285, 50, 260260, 263711]
timestamps = pd.to_datetime(['2015-01-04 08:29:4',
                             '2015-01-04 08:37:05',
                             '2015-01-04 08:41:07',
                             '2015-01-04 08:43:05',
                             '2015-01-04 08:49:05'])

# Float dtype so NaN can be stored without an implicit upcast
ts = pd.Series(values, index=timestamps, dtype=float)
print(ts)
# Treat -1 readings as missing values
ts[ts == -1] = np.nan
# Regular 60-second grid; NaN except where an original sample falls exactly on a grid time
newFreq = ts.resample('60s').asfreq()

# Combine original and grid points, restore time order, interpolate by elapsed time
new = pd.concat([ts, newFreq]).sort_index()
new = new.interpolate(method='time')

ts.plot(marker='o')
new.plot(marker='+', markersize=15)

# Keep only the grid points: this is the equispaced result
new[newFreq.index].plot(marker='.')

lines, labels = plt.gca().get_legend_handles_labels()
labels = ['original values (nonequispaced)',
          'original + interpolated at new frequency (nonequispaced)',
          'interpolated values without original values (equispaced!)']
plt.legend(lines, labels, loc='best')
plt.show()
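pandas has no interpolate(newIndex=...) argument like the one wished for above. The following is a minimal sketch of a hypothetical helper, interpolate_at (not a pandas API), that gets the same effect with existing calls: Index.union returns the sorted union of the original and requested times, which replaces the separate concat and sort_index steps, and the requested times do not have to be a subset of the original ones. It assumes a DatetimeIndex without duplicate timestamps and defaults to method='time'.

```python
import pandas as pd


def interpolate_at(series: pd.Series, new_index: pd.DatetimeIndex,
                   method: str = "time") -> pd.Series:
    """Hypothetical helper (not a pandas API): values of `series` at `new_index`.

    `new_index` does not have to be a subset of `series.index`.
    """
    # Index.union returns the sorted union, replacing the concat + sort_index steps.
    combined = series.index.union(new_index)
    # Times that exist only in new_index become NaN; original samples keep their values.
    expanded = series.reindex(combined).astype(float)
    # Fill the NaNs from the surrounding original samples.
    filled = expanded.interpolate(method=method)
    # Keep only the requested times, which drops the original points.
    return filled.reindex(new_index)


# Toy data: two samples 10 s apart, evaluated on a 5-second grid.
original = pd.Series(
    [0.0, 10.0],
    index=pd.to_datetime(["2015-01-04 08:00:00", "2015-01-04 08:00:10"]),
)
grid = pd.date_range("2015-01-04 08:00:00", periods=3, freq="5s")
print(interpolate_at(original, grid))  # values 0.0, 5.0, 10.0
```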

Labels: Enhancement, Resample, Timeseries

Opened by marcelnem

All 3 comments

Use ordered_merge instead of concat and sort:
http://pandas.pydata.org/pandas-docs/stable/merging.html#merging-ordered-data

jreback on 2016-03-07
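The ordered-merge suggestion can be sketched with pd.merge_ordered, which is the name ordered_merge has had since pandas 0.19, when the old name was deprecated. The toy timestamps and the column names time and value are assumptions for illustration. Note that this still builds the full merged series before the grid points are selected.

```python
import pandas as pd

# Toy data (not from the issue): two irregular samples and a 5-second grid.
samples = pd.DataFrame({
    "time": pd.to_datetime(["2015-01-04 08:00:03", "2015-01-04 08:00:07"]),
    "value": [3.0, 7.0],
})
grid = pd.DataFrame({"time": pd.date_range("2015-01-04 08:00:00", periods=3, freq="5s")})

# Ordered outer join on "time": grid rows get NaN in "value", and the result
# comes back ordered by "time", so no separate sort is needed.
merged = pd.merge_ordered(samples, grid, on="time")

# Time-weighted interpolation over the merged (still non-equispaced) series.
interpolated = merged.set_index("time")["value"].interpolate(method="time")

# Keep only the equispaced grid times; grid times before the first sample stay NaN.
result = interpolated.reindex(pd.DatetimeIndex(grid["time"]))
print(result)
```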

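It would be nice to do it without a full merge, since I don't really need the merged time series; I only need the resulting equispaced time series. Is the approach I described (improved with ordered_merge) the most efficient one? Maybe it would be better to use scipy directly: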

http://docs.scipy.org/doc/scipy-0.14.0/reference/tutorial/interpolate.html#d-interpolation-interp1d
scipy lets you do it MATLAB-style: you keep the original time series and pass the new index to get the new time series.
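A minimal sketch of that MATLAB-style approach, assuming linear interpolation is enough: numpy.interp takes the original (x, y) pairs and the new x values directly, so nothing is merged or sorted. Timestamps are converted to float seconds relative to the first sample; the helper name interp_at_times is hypothetical. Unlike pandas' interpolate, numpy.interp repeats the end values outside the sampled range instead of returning NaN.

```python
import numpy as np
import pandas as pd


def interp_at_times(series: pd.Series, new_index: pd.DatetimeIndex) -> pd.Series:
    """Hypothetical helper: linear interpolation of `series` at `new_index`."""
    origin = series.index[0]
    # Express both time axes as float seconds relative to a common origin.
    x_old = (series.index - origin) / pd.Timedelta(seconds=1)
    x_new = (new_index - origin) / pd.Timedelta(seconds=1)
    # np.interp needs increasing x_old; outside its range it repeats the end values.
    y_new = np.interp(x_new, x_old, series.to_numpy(dtype=float))
    return pd.Series(y_new, index=new_index)


samples = pd.Series(
    [3.0, 7.0],
    index=pd.to_datetime(["2015-01-04 08:00:03", "2015-01-04 08:00:07"]),
)
grid = pd.date_range("2015-01-04 08:00:00", periods=3, freq="5s")
print(interp_at_times(samples, grid))  # values 3.0, 5.0, 7.0
```

For spline-type interpolation, scipy.interpolate.CubicSpline(x_old, y_old)(x_new) could take the place of the np.interp call under the same float-seconds conversion.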

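I will also be working with online data, so the original time series keeps growing, and I need to interpolate the new data and append it to the interpolated (equispaced) time series.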

marcelnem on 2016-03-07
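For the online case, one possible sketch (an assumption, not something proposed in the thread): keep the equispaced values computed so far and, whenever new samples arrive, interpolate only the grid times between the last computed grid time and the newest sample. The helper name extend_grid and its arguments are hypothetical; it uses numpy.interp, so it is linear only, and it never extrapolates past the newest sample.

```python
import numpy as np
import pandas as pd


def extend_grid(samples: pd.Series, done: pd.Series, freq: str = "60s") -> pd.Series:
    """Hypothetical helper: append the newly computable grid points to `done`.

    `samples` is the growing, non-equispaced series (sorted DatetimeIndex);
    `done` holds the equispaced values computed so far (may be empty).
    """
    step = pd.Timedelta(freq)
    if done.empty:
        start = samples.index[0].ceil(freq)
    else:
        start = done.index[-1] + step
    # Grid times after the newest sample would need extrapolation, so stop there.
    new_times = pd.date_range(start, samples.index[-1], freq=freq)
    if new_times.empty:
        return done

    # Linear interpolation on float seconds relative to the first sample.
    origin = samples.index[0]
    x_old = (samples.index - origin) / pd.Timedelta(seconds=1)
    x_new = (new_times - origin) / pd.Timedelta(seconds=1)
    new_values = pd.Series(
        np.interp(x_new, x_old, samples.to_numpy(dtype=float)), index=new_times
    )
    return new_values if done.empty else pd.concat([done, new_values])


# First batch of samples, then one more sample arrives later.
samples = pd.Series(
    [0.0, 60.0],
    index=pd.to_datetime(["2015-01-04 08:00:00", "2015-01-04 08:01:00"]),
)
done = extend_grid(samples, pd.Series(dtype=float))
samples = pd.concat(
    [samples, pd.Series([180.0], index=pd.to_datetime(["2015-01-04 08:03:00"]))]
)
done = extend_grid(samples, done)
print(done)  # 08:00 -> 0.0, 08:01 -> 60.0, 08:02 -> 120.0, 08:03 -> 180.0
```

For a long-running stream, one would pass only the tail of the samples (from the last sample before the next grid time onward) rather than the whole history; the sketch keeps the full history for brevity.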

This gets you pretty close:

In [42]: ts.reindex(ts.resample('60s').asfreq().index, method='nearest', tolerance=pd.Timedelta('60s')).interpolate('time')
Out[42]: 
2015-01-04 08:29:00    271238.000000
2015-01-04 08:30:00    271238.000000
2015-01-04 08:31:00    279530.428571
2015-01-04 08:32:00    287822.857143
2015-01-04 08:33:00    296115.285714
2015-01-04 08:34:00    304407.714286
2015-01-04 08:35:00    312700.142857
2015-01-04 08:36:00    320992.571429
2015-01-04 08:37:00    329285.000000
2015-01-04 08:38:00    329285.000000
2015-01-04 08:39:00    219540.000000
2015-01-04 08:40:00    109795.000000
2015-01-04 08:41:00        50.000000
2015-01-04 08:42:00        50.000000
2015-01-04 08:43:00    260260.000000
2015-01-04 08:44:00    260260.000000
2015-01-04 08:45:00    260950.200000
2015-01-04 08:46:00    261640.400000
2015-01-04 08:47:00    262330.600000
2015-01-04 08:48:00    263020.800000
2015-01-04 08:49:00    263711.000000
Freq: 60S, dtype: float64

jreback on 2016-03-07
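How close is "pretty close"? reindex(method='nearest') copies the value of the nearest original sample onto every grid time within the tolerance, and interpolate('time') then fills only the grid times that are still NaN. That is why 08:30:00 and 08:38:00 in the output above repeat 271238 and 329285, and why the interpolated stretches run between snapped grid points rather than between the original sample times. The toy comparison below (timestamps are assumptions, not the issue's data) contrasts that with time-weighted interpolation evaluated at the grid times themselves.

```python
import pandas as pd

# Toy data (not from the issue): the grid minute 08:01:00 lies 40 s after the
# first sample and 20 s before the second one.
ts = pd.Series(
    [0.0, 60.0],
    index=pd.to_datetime(["2015-01-04 08:00:20", "2015-01-04 08:01:20"]),
)
grid = pd.date_range("2015-01-04 08:00:00", periods=2, freq="60s")

# Approach from the comment: snap each grid time to the nearest sample
# (within 60 s), then interpolate whatever is still NaN.
snapped = ts.reindex(
    grid, method="nearest", tolerance=pd.Timedelta("60s")
).interpolate(method="time")

# Time-weighted interpolation at the grid times themselves.
exact = ts.reindex(ts.index.union(grid)).interpolate(method="time").reindex(grid)

print(snapped)  # 08:01:00 -> 60.0 (copied from the 08:01:20 sample)
print(exact)    # 08:01:00 -> 40.0, 08:00:00 -> NaN (before the first sample)
```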
