Pandas: Interpolate (upsample) non-equispaced time series into equispaced 18.0rc1

Created on 7 Mar 2016  ·  3 comments  ·  Source: pandas-dev/pandas

I would like to interpolate (upsample) a non-equispaced time series to obtain an equispaced time series.

Currently I am doing it the following way:

  1. Take the original time series.
  2. Create a new time series with NaN values at 30-second intervals (using resample('30S').asfreq()).
  3. concat the original time series and the new time series.
  4. Sort the time series to restore time order (I don't like this: sorting has O(n log n) complexity).
  5. Interpolate.
  6. Remove the original points from the time series.

Is there a simpler way? In Matlab, you keep the original time series and pass the new times as a parameter to the interpolate() function to receive values at the desired times. Ideally I would like to have a function like this:

origTimeSeries.interpolate(newIndex=newTimeIndex, method='spline')

μ›λž˜ μ‹œκ³„μ—΄μ˜ μ‹œκ°„μ€ μ›ν•˜λŠ” μ‹œκ³„μ—΄ μ‹œκ°„μ˜ ν•˜μœ„ 집합이 아닐 수 μžˆμŠ΅λ‹ˆλ‹€.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample data at irregular (non-equispaced) timestamps
values = [271238, 329285, 50, 260260, 263711]
timestamps = pd.to_datetime(['2015-01-04 08:29:04',
                             '2015-01-04 08:37:05',
                             '2015-01-04 08:41:07',
                             '2015-01-04 08:43:05',
                             '2015-01-04 08:49:05'])

ts = pd.Series(values, index=timestamps)
ts
# Treat -1 as a missing-value marker
ts[ts==-1] = np.nan
# Equispaced 60-second grid; NaN wherever no sample falls exactly on the grid
# (lowercase 's' is the alias accepted by current pandas releases)
newFreq=ts.resample('60s').asfreq()

# Merge original and grid points, restore time order, then interpolate
new=pd.concat([ts,newFreq]).sort_index()
new=new.interpolate(method='time')

ts.plot(marker='o')
new.plot(marker='+',markersize=15)

# Keep only the grid points, dropping the original timestamps
new.loc[newFreq.index].plot(marker='.')

lines, labels = plt.gca().get_legend_handles_labels()
labels = ['original values (nonequispaced)', 'original + interpolated at new frequency (nonequispaced)', 'interpolated values without original values (equispaced!)']
plt.legend(lines, labels, loc='best')
plt.show()


[image: plot produced by the code above]

Enhancement Resample Timeseries

All 3 comments

Use ordered_merge instead of concat and sort.
http://pandas.pydata.org/pandas-docs/stable/merging.html#merging-ordered-data
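As a rough illustration of this suggestion: in current pandas releases the function is called `merge_ordered` (`ordered_merge` is the older name), and it operates on DataFrames rather than Series. The column names `time` and `value` and the shortened sample data below are assumptions made for this sketch.

```python
import pandas as pd

# Original samples as a DataFrame with an explicit time column.
orig = pd.DataFrame({
    "time": pd.to_datetime(["2015-01-04 08:29:04", "2015-01-04 08:37:05"]),
    "value": [271238.0, 329285.0],
})
# Equispaced target grid (no values yet).
grid = pd.DataFrame({
    "time": pd.date_range("2015-01-04 08:30:00", "2015-01-04 08:37:00", freq="60s"),
})

# merge_ordered performs an outer join by default and returns the rows
# ordered by the key, so no separate sort_index() call is needed.
merged = pd.merge_ordered(orig, grid, on="time")

# Interpolate on the merged series, then keep only the grid timestamps.
series = merged.set_index("time")["value"].interpolate(method="time")
equispaced = series.reindex(pd.DatetimeIndex(grid["time"]))
print(equispaced)
```

Both inputs are expected to be sorted by the key already, which is the case for a time series and a `date_range` grid.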

Since I don't actually need the merged time series, only the resulting equispaced one, it would be better to do this without merging at all. Is the way I described (improved with ordered_merge) the most efficient one? Maybe it would be better to use scipy directly.

http://docs.scipy.org/doc/scipy-0.14.0/reference/tutorial/interpolate.html#d-interpolation-interp1d
With scipy you can do it Matlab-style: keep the original time series, pass in the new index, and get the new time series back.
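A minimal sketch of that scipy route, assuming linear interpolation is acceptable. `interp1d` works on plain numbers, so the timestamps are converted to seconds first; the `to_seconds` helper and the choice of the first sample as reference point are assumptions of this example.

```python
import pandas as pd
from scipy.interpolate import interp1d

values = [271238, 329285, 50, 260260, 263711]
timestamps = pd.to_datetime(["2015-01-04 08:29:04",
                             "2015-01-04 08:37:05",
                             "2015-01-04 08:41:07",
                             "2015-01-04 08:43:05",
                             "2015-01-04 08:49:05"])
ts = pd.Series(values, index=timestamps, dtype="float64")

# Desired equispaced timestamps, all inside the range of the original data.
new_index = pd.date_range("2015-01-04 08:30:00", "2015-01-04 08:49:00", freq="60s")


def to_seconds(index):
    # Seconds elapsed since the first original sample (hypothetical helper).
    return (index - timestamps[0]).total_seconds()


# Build the interpolant once from the original samples ...
f = interp1d(to_seconds(ts.index), ts.to_numpy(), kind="linear")
# ... and evaluate it at the new times, without merging or sorting.
equispaced = pd.Series(f(to_seconds(new_index)), index=new_index)
print(equispaced)
```

By default `interp1d` raises an error for points outside the original time range, which is why the grid above starts at 08:30:00 rather than 08:29:00.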

λ˜ν•œ μ›λž˜ μ‹œκ³„μ—΄μ΄ 컀질 수 μžˆλ„λ‘ 온라인 데이터λ₯Ό μž‘μ—…ν•  것이며 μƒˆ 데이터λ₯Ό λ³΄κ°„ν•˜κ³  λ³΄κ°„λœ(등간격) μ‹œκ³„μ—΄μ— μΆ”κ°€ν•΄μ•Ό ν•©λ‹ˆλ‹€.

This gets you pretty close:

In [42]: ts.reindex(ts.resample('60s').asfreq().index, method='nearest', tolerance=pd.Timedelta('60s')).interpolate('time')
Out[42]: 
2015-01-04 08:29:00    271238.000000
2015-01-04 08:30:00    271238.000000
2015-01-04 08:31:00    279530.428571
2015-01-04 08:32:00    287822.857143
2015-01-04 08:33:00    296115.285714
2015-01-04 08:34:00    304407.714286
2015-01-04 08:35:00    312700.142857
2015-01-04 08:36:00    320992.571429
2015-01-04 08:37:00    329285.000000
2015-01-04 08:38:00    329285.000000
2015-01-04 08:39:00    219540.000000
2015-01-04 08:40:00    109795.000000
2015-01-04 08:41:00        50.000000
2015-01-04 08:42:00        50.000000
2015-01-04 08:43:00    260260.000000
2015-01-04 08:44:00    260260.000000
2015-01-04 08:45:00    260950.200000
2015-01-04 08:46:00    261640.400000
2015-01-04 08:47:00    262330.600000
2015-01-04 08:48:00    263020.800000
2015-01-04 08:49:00    263711.000000
Freq: 60S, dtype: float64
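To see why this is close rather than exact: `reindex(..., method='nearest', tolerance=...)` copies the nearest original value onto each grid point within 60 seconds, and only the remaining gaps are interpolated. The comparison below is a sketch; the variable names `snapped` and `exact` are chosen for this example.

```python
import pandas as pd

values = [271238, 329285, 50, 260260, 263711]
timestamps = pd.to_datetime(["2015-01-04 08:29:04",
                             "2015-01-04 08:37:05",
                             "2015-01-04 08:41:07",
                             "2015-01-04 08:43:05",
                             "2015-01-04 08:49:05"])
ts = pd.Series(values, index=timestamps, dtype="float64")
grid = pd.date_range("2015-01-04 08:29:00", "2015-01-04 08:49:00", freq="60s")

# Approach from the comment: snap nearby samples onto the grid, then fill gaps.
snapped = ts.reindex(grid, method="nearest",
                     tolerance=pd.Timedelta("60s")).interpolate("time")

# Interpolating at the true sample times, then keeping only the grid points.
exact = ts.reindex(ts.index.union(grid)).interpolate("time").reindex(grid)

# Absolute difference per grid point (NaN where `exact` has no value,
# i.e. at 08:29:00, which lies before the first original sample).
diff = (snapped - exact).abs()
print(diff)
```

For example, at 08:30:00 the snapped series repeats the 08:29:04 sample unchanged, while the series interpolated at the true sample times has already moved towards the 08:37:05 value.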
이 νŽ˜μ΄μ§€κ°€ 도움이 λ˜μ—ˆλ‚˜μš”?
0 / 5 - 0 λ“±κΈ‰