Just a suggestion: extend rolling to support a rolling window with a step size, such as R's rollapply(by=X).
Pandas, inefficient solution (apply the function to every window, then slice to get every second result):
import pandas
ts = pandas.Series(range(0, 40, 2))
ts.rolling(5).apply(max).dropna()[::2]
Suggestion:
ts = pandas.Series(range(0, 40, 2))
ts.rolling(window=5, step=2).apply(max).dropna()
Inspired by R (see the rollapply docs):
require(zoo)
TS <- zoo(seq(0, 40, 2))
rollapply(TS, 5, FUN=max, by=2)
8 12 16 20 24 28 32 36 40
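For reference, the requested behaviour can be sketched in plain Python. This is only an illustration: rolling_apply_step is a hypothetical helper, not a pandas function. It evaluates func on every step-th full window and nothing else. Note that R's seq(0, 40, 2) includes the endpoint 40, so the matching Python input is range(0, 42, 2) rather than range(0, 40, 2).

```python
def rolling_apply_step(values, window, step, func):
    """Apply func to every step-th full window of values (hypothetical helper)."""
    values = list(values)
    # Window start positions: 0, step, 2*step, ... while a full window still fits.
    starts = range(0, len(values) - window + 1, step)
    return [func(values[i:i + window]) for i in starts]


# seq(0, 40, 2) in R includes 40, hence range(0, 42, 2) here.
result = rolling_apply_step(range(0, 42, 2), window=5, step=2, func=max)
print(result)  # [8, 12, 16, 20, 24, 28, 32, 36, 40]
```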
If you're using 'standard' functions, these are vectorized, and so very fast (ts.rolling(5).max().dropna()[::2]).
IIUC, the saving here would come from only applying the function a fraction of the time (e.g. every nth value). But is there a case where that makes a practical difference?
This could be done, but I would like to see a use case where this matters. It would also break the 'return the same size as the input' API. I don't think it is actually hard to implement (though it would involve a number of changes in the implementation). We use marginal windows (IOW, compute the window and, as you advance, drop off the points that are leaving and add the points that you are gaining). So we would still have to compute everything; you just wouldn't output it.
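The marginal-window idea can be illustrated with a rolling sum. The sketch below is a simplified illustration under my own assumptions, not pandas' actual implementation; rolling_sum_marginal is a hypothetical name. Each result is derived from the previous one, which is why skipping outputs does not by itself skip any work for this kind of function.

```python
def rolling_sum_marginal(values, window):
    """Rolling sum updated incrementally: add the entering point, drop the leaving one."""
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v                       # point entering the window
        if i >= window:
            total -= values[i - window]  # point leaving the window
        # Only full windows produce a value; earlier positions are left undefined.
        out.append(total if i >= window - 1 else None)
    return out


values = list(range(0, 40, 2))
every_window = rolling_sum_marginal(values, window=5)
# A step of 2 merely selects from results that were all computed anyway.
stepped = every_window[4::2]
print(stepped)
```

A function passed to apply is different: it is re-evaluated from scratch for each window, so evaluating fewer windows does reduce the work.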
Thanks for your replies!
"IIUC, the saving here would come from only applying the function a fraction of the time (e.g. every nth value). But is there a case where that makes a practical difference?"
My use case is running aggregation functions (not just max) over some large time-series dataframes: 400 columns, hours of data at 5-25 Hz. I've also done a similar thing (feature engineering on sensor data) in the past with data up to 20 kHz. Running 30-second windows with a 5-second step saves a big chunk of processing. For example, at 25 Hz with a 5 s step it's 1/125th of the work, which makes the difference between it running in 1 minute or 2 hours.
I can obviously fall back to numpy, but it would be nice if there was a higher-level API for doing this. I just thought it was worth suggesting in case others would find it useful too. I don't expect you to build a feature just for me!
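The 1/125 figure follows from counting window evaluations. A small sketch of that arithmetic, assuming one hour of data at 25 Hz (the length is my assumption; window_evaluations is a hypothetical helper):

```python
def window_evaluations(n_samples, window, step=1):
    """Number of full windows evaluated for a given step (in samples)."""
    if n_samples < window:
        return 0
    return (n_samples - window) // step + 1


rate_hz = 25
n_samples = rate_hz * 3600           # one hour of data (assumed length)
window = 30 * rate_hz                # 30 s window -> 750 samples
step = 5 * rate_hz                   # 5 s step    -> 125 samples

every_sample = window_evaluations(n_samples, window)   # step of one sample
stepped = window_evaluations(n_samples, window, step)
ratio = every_sample / stepped
print(every_sample, stepped, ratio)
```

The ratio approaches the step size in samples (125) as the series gets longer.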
You can try resampling to a higher frequency interval first and then rolling,
something like
df = df.resample('30s')
df.rolling(..).max() (or whatever function)
Hey @jreback, thanks for the suggestion. This would work if I was just running max on my data (resample needs a reduction function, otherwise it defaults to mean, right?):
df.resample('1s').max().rolling(30).max()
However, I'd like to run my reduction function on 30 seconds of data, then move forward 1 second and run it on the next 30 seconds of data, and so on. The method above applies a function to 1 second of data, and then another function to 30 results of the first function.
Here's a quick example. Running a peak-to-peak calculation twice doesn't work (obviously):
import numpy as np
import pandas

# 10 minutes of data at 5Hz
n = 5 * 60 * 10
rng = pandas.date_range('1/1/2017', periods=n, freq='200ms')
np.random.seed(0)
d = np.cumsum(np.random.randn(n), axis=0)
s = pandas.Series(d, index=rng)

# Peak to peak of a single window
def p2p(d):
    return d.max() - d.min()

# Peak to peak of every row of a 2-D array (one window per row)
def p2p_arr(d):
    return d.max(axis=1) - d.min(axis=1)

def rolling_with_step(s, window, step, func):
    # See https://ga7g08.github.io/2015/01/30/Applying-python-functions-in-moving-windows/
    # Start position of every window
    vert_idx_list = np.arange(0, s.size - window, step)
    # Offsets within one window
    hori_idx_list = np.arange(window)
    A, B = np.meshgrid(hori_idx_list, vert_idx_list)
    idx_array = A + B  # one row of positions per window
    x_array = s.values[idx_array]
    # Label each result with the timestamp at the centre of its window
    idx = s.index[vert_idx_list + int(window/2.)]
    d = func(x_array)
    return pandas.Series(d, index=idx)

# Plot data
ax = s.plot(figsize=(12, 8), legend=True, label='Data')

# Plot resample then rolling (obviously does not work)
s.resample('1s').apply(p2p).rolling(window=30, center=True).apply(p2p).plot(ax=ax, label='1s p2p, roll 30 p2p', legend=True)

# Plot rolling window with step
rolling_with_step(s, window=30 * 5, step=5, func=p2p_arr).plot(ax=ax, label='Roll 30, step 1s', legend=True)
@alexlouden from your original description, I think something like
df.resample('5s').max().rolling('30s').mean()
(or whatever reductions) is more in line with what you want.
IOW, take whatever is in a 5 s bin, reduce it to a single point, then roll over those bins. The general idea is that you have lots of data that can be summarized at a short timescale, but you actually want the rolling of this at a higher level.
Hey @jreback, I actually want to run a function over 30 seconds of data, every 5 seconds. See the rolling_with_step function in my previous example. The additional step of max/mean doesn't work for my use case.
@jreback, there is a real need for the step function that hasn't been brought out in this discussion yet. I second everything that @alexlouden has described, but I would like to add more use cases.
Suppose that we are doing time-series analysis with input data sampled roughly every 3 to 10 milliseconds. We are interested in frequency-domain features. The first step in constructing them is to find out the Nyquist frequency. Suppose that by domain knowledge we know it is 10 Hz (once every 100 ms). That means the data needs a frequency of at least 20 Hz (once every 50 ms) if the features are to capture the input signal well. We cannot resample to a lower frequency than that. Ultimately, here are the computations we do:
df.resample('50ms').mean().rolling(window=32).aggregate(power_spectrum_coeff())
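Here we chose a window size in multiples of 8; choosing 32 makes the window 1.6 seconds long. The aggregate function returns the single-sided frequency-domain coefficients without the first (mean) component (the fft function is symmetric, with the mean value at the 0th element). The following is the sample aggregate function: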
def power_spectrum_coeff():
    def power_spectrum_coeff_(x):
        return np.fft.fft(x)[1 : int(len(x) / 2 + 1)]

    power_spectrum_coeff_.__name__ = 'power_spectrum_coeff'
    return power_spectrum_coeff_
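Now, we would like to repeat this in a sliding window of, say, every 0.4 seconds or every 0.8 seconds. There is no point in wasting computation by calculating the FFT every 50 ms and then slicing later. Further, resampling down to 400 ms is not an option, because 400 ms is just 2.5 Hz, which is much lower than the Nyquist frequency, and doing so would lose all the information in the features.
These were frequency-domain features, which have applications in many time-series-related scientific experiments. However, even simpler time-domain aggregate functions such as standard deviation cannot be supported effectively by resampling.
"I don't think it is actually hard to implement (though it would involve a number of changes in the implementation). We use marginal windows (IOW, compute the window and, as you advance, drop off the points that are leaving and add the points that you are gaining). So we would still have to compute everything; you just wouldn't output it."
Having the 'step' parameter and being able to reduce the actual computation by using it has to be the future goal of Pandas. If the step parameter only returns fewer points, then it's not worth doing, because we can slice the output anyhow. Perhaps, given the work involved in doing this, we might just recommend that all projects with these needs use Numpy.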
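The stepped FFT computation described in this comment can be sketched directly in NumPy. This is a minimal illustration under stated assumptions: stepped_spectrum is a hypothetical helper, the test signal is invented, and sliding_window_view requires NumPy 1.20 or later. A 32-sample window at 50 ms spacing is 1.6 s, and a step of 8 samples is 0.4 s.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def stepped_spectrum(x, window=32, step=8):
    """One-sided FFT coefficients (mean component dropped) for every step-th window."""
    # sliding_window_view returns a view; slicing it keeps only the wanted windows,
    # so the FFT below runs on those windows alone.
    windows = sliding_window_view(np.asarray(x, dtype=float), window)[::step]
    # rfft returns the non-negative frequencies; drop index 0 (the mean component).
    return np.fft.rfft(windows, axis=1)[:, 1:]


# Invented test signal: 10 s sampled at 20 Hz (every 50 ms), containing a 5 Hz sine.
t = np.arange(200) / 20.0
signal = np.sin(2 * np.pi * 5.0 * t)
coeffs = stepped_spectrum(signal)
print(coeffs.shape)
```

For a real-valued window, np.fft.rfft(x)[1:] holds the same coefficients as np.fft.fft(x)[1 : int(len(x) / 2 + 1)] in the aggregate function above.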
@Murmuria you are welcome to submit a pull request to do this. It's actually not that difficult.
While I second the request for a step parameter in rolling(), I'd like to point out that it is possible to get the desired result with the base parameter in resample(), if the step size is an integer fraction of the window size. Using @alexlouden's example:
pandas.concat([
    s.resample('30s', label='left', loffset=pandas.Timedelta(15, unit='s'), base=i).agg(p2p)
    for i in range(30)
]).sort_index().plot(ax=ax, label='Solution with resample()', legend=True, style='k:')
We get the same result (note that the line extends by 30 seconds on both sides).
This is still somewhat wasteful, depending on the type of aggregation. For the particular case of the peak-to-peak calculation in @alexlouden's example, p2p_arr() is almost 200x faster, because it rearranges the series into a 2-D matrix and then uses a single call to max() and min().
A step parameter in rolling would also allow using this feature without a datetime index. Is anyone already working on it?
@alexlouden above said this:
"I can obviously fall back to numpy, but it would be nice if there was a higher-level API for doing this."
@alexlouden, or anyone else who knows, could you share some insight into how to do this with numpy? From my research so far, it seems it is not trivial to do in numpy either. In fact, there is an open issue about it here: https://github.com/numpy/numpy/issues/7753
Thanks
Hi @tsando, did the rolling_with_step function I used above not work for you?
@alexlouden thanks, I just checked that function and it still seems to depend on pandas (it takes a series as input and also uses the series index). I was wondering if there is a purely numpy approach to this. In the thread I mentioned, https://github.com/numpy/numpy/issues/7753, they propose a function that uses numpy strides, but it is hard to understand and to translate into window and step inputs.
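One pandas-free way to express this with window and step inputs is sketched below. It is not the function from the linked numpy thread; windows_with_step and p2p_with_step are hypothetical names, and sliding_window_view assumes NumPy 1.20 or later. The axis argument lets it work on 2-D input as well.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def windows_with_step(a, window, step, axis=-1):
    """Return every step-th full window along `axis`; window values sit on the last axis."""
    view = sliding_window_view(np.asarray(a), window, axis=axis)
    # Keep every step-th window start along the (normalised) original axis.
    index = [slice(None)] * view.ndim
    index[axis % np.ndim(a)] = slice(None, None, step)
    return view[tuple(index)]


def p2p_with_step(a, window, step, axis=-1):
    """Peak-to-peak of every step-th window, using two vectorised reductions."""
    w = windows_with_step(a, window, step, axis=axis)
    return w.max(axis=-1) - w.min(axis=-1)


x = np.arange(20).reshape(2, 10)
print(p2p_with_step(x, window=4, step=3))
```

Because the windows are a strided view, no data is copied until a reduction runs, and the reduction only touches the selected windows.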
@tsando Here's a PDF of the blog post I linked to above. It looks like the author has changed his Github username and hasn't put his site back up. (I just ran it locally to convert it to PDF.)
My function above was just me converting his last example to work with Pandas. If you want to use numpy directly, you could do something like this: https:
Hope this helps!
@alexlouden thanks! I just tried it on an array of shape (13, 1313), but it gave me this error:
"This could be done, but I would like to see a use case where this matters."
Whatever the project I have worked on using pandas, I have almost always missed this feature. It is useful every time you need to compute the apply only once in a while but still need good resolution inside each window.
I agree and support this feature too.
I need it almost every time I deal with time series. The feature could give much better control when generating time-series features for both visualization and analysis. Strongly support this idea!
Agree and support this feature too.
This would be very helpful for reducing computing time while still keeping a good window resolution.
Here is some solution code, which can be adjusted further to your particular target.
import numpy as np

def average_smoothing(signal, kernel_size, stride):
    sample = []
    start = 0
    end = kernel_size
    while end <= len(signal):
        # Average the current window first, then advance by the stride
        sample.append(np.mean(signal[start:end]))
        start = start + stride
        end = end + stride
    return np.array(sample)
I agree and support this feature too. It seems to have stalled for now.
Calculating and then downsampling is not an option when you have TBs of data.
It would be very helpful in what I do as well. I have TBs of data where I need various statistics over non-overlapping windows to understand local conditions. My current "fix" is to create a generator that slices the data frames and yields statistics. It would be very helpful to have this feature.
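A generator of the kind described might look like the following sketch. The name window_stats, the choice of statistics, and the example frame are all assumptions for illustration; setting step equal to window gives non-overlapping windows.

```python
import numpy as np
import pandas as pd


def window_stats(df, window, step):
    """Yield (label of last row, per-column statistics) for every step-th window of rows."""
    for start in range(0, len(df) - window + 1, step):
        chunk = df.iloc[start:start + window]
        # Any reduction could go here; mean and standard deviation are examples.
        yield chunk.index[-1], chunk.agg(["mean", "std"])


df = pd.DataFrame({"a": np.arange(12.0), "b": np.arange(12.0) ** 2})
# step == window gives non-overlapping windows.
results = list(window_stats(df, window=4, step=4))
for label, stats in results:
    print(label, stats.loc["mean", "a"])
```

Only one window's worth of rows is materialised at a time, which matters when the full frame is large.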
This feature is indeed a must-have when time series are involved!
Agree, this feature definitely needs to be added. I'm trying to do running-window correlations between stock prices and have to create my own function for it.
I cannot believe such a basic feature is not there yet!
When will this issue be resolved?
Thanks
To contribute to the "further discussion":
My use case is to compute one min/max/median value per hour for a month of data with a resolution of 1 second. It's energy usage data, and there are peaks lasting 1-2 seconds that I would lose with resampling. Other than that, resampling to e.g. 5 seconds or 1 minute wouldn't change the fact that I still have to compute 4k/1k windows per day that need to be thrown away, rather than just being able to compute the 24 windows per day that are needed.
It would be possible to work around this using groupby and so on, but that seems to be neither intuitive nor as fast as the rolling implementation (2 seconds for 2.5 million hour-long windows, with sorting). It's impressively fast and useful, but we really need a stride argument to fully utilize its power.
I took a look at the problem. It is relatively trivial; however, given the way the code is implemented, from a cursory look I think it will require someone to slog through manually editing all the rolling routines. None of them respect the window boundaries given by the indexer classes. If they did, both this request and #11704 would be very easily solvable. In any case, I think it is manageable for anyone who wants to spend some time sprucing things up. I initiated a half-baked PR (expected to be rejected, just for an MVP) to demonstrate how I would tackle the problem.
Running:
import numpy as np
import pandas as pd
data = pd.Series(
np.arange(100),
index=pd.date_range('2020/05/12 12:00:00', '2020/05/12 12:00:10', periods=100))
print('1s rolling window every 2s')
print(data.rolling('1s', step='2s').apply(np.mean))
data.sort_index(ascending=False, inplace=True)
print('1s rolling window every 500ms (and reversed)')
print(data.rolling('1s', step='500ms').apply(np.mean))
Yields:
1s rolling window every 2s
2020-05-12 12:00:00.000000000 4.5
2020-05-12 12:00:02.020202020 24.5
2020-05-12 12:00:04.040404040 44.5
2020-05-12 12:00:06.060606060 64.5
2020-05-12 12:00:08.080808080 84.5
dtype: float64
1s rolling window every 500ms (and reversed)
2020-05-12 12:00:10.000000000 94.5
2020-05-12 12:00:09.494949494 89.5
2020-05-12 12:00:08.989898989 84.5
2020-05-12 12:00:08.484848484 79.5
2020-05-12 12:00:07.979797979 74.5
2020-05-12 12:00:07.474747474 69.5
2020-05-12 12:00:06.969696969 64.5
2020-05-12 12:00:06.464646464 59.5
2020-05-12 12:00:05.959595959 54.5
2020-05-12 12:00:05.454545454 49.5
2020-05-12 12:00:04.949494949 44.5
2020-05-12 12:00:04.444444444 39.5
2020-05-12 12:00:03.939393939 34.5
2020-05-12 12:00:03.434343434 29.5
2020-05-12 12:00:02.929292929 24.5
2020-05-12 12:00:02.424242424 19.5
2020-05-12 12:00:01.919191919 14.5
2020-05-12 12:00:01.414141414 9.5
2020-05-12 12:00:00.909090909 4.5
dtype: float64
For implementation details, take a look at the PR (or here: https://github.com/anthonytw/pandas/tree/rolling-window-step)
While I would have liked to spend more time finishing it up, I unfortunately have none left to tackle the grunt work of reworking all the rolling functions. My recommendation for anyone who wants to tackle this would be to enforce the window boundaries generated by the indexer classes and unify the rolling_*_fixed/variable functions. With start and end boundaries, I don't see any reason they should be different, unless you have a function that does something special with non-uniformly sampled data (in which case that specific function would be better able to handle the nuance, so maybe set a flag or something).
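The recommendation can be illustrated outside pandas: once every aggregation consumes only start and end bounds, a step is just a different set of bounds. The sketch below is a plain NumPy illustration of that idea; it does not reproduce pandas internals or the linked branch, and both function names are hypothetical.

```python
import numpy as np


def stepped_window_bounds(num_values, window, step):
    """Start/end positions of trailing windows evaluated at 0, step, 2*step, ..."""
    ends = np.arange(0, num_values, step) + 1   # exclusive end of each window
    starts = np.clip(ends - window, 0, None)    # windows are truncated at the start
    return starts, ends


def apply_over_bounds(values, starts, ends, func, min_periods=1):
    """Apply func to values[start:end] for each pair of bounds."""
    values = np.asarray(values)
    return np.array([
        func(values[s:e]) if e - s >= min_periods else np.nan
        for s, e in zip(starts, ends)
    ])


starts, ends = stepped_window_bounds(num_values=10, window=3, step=4)
result = apply_over_bounds(np.arange(10.0), starts, ends, np.mean)
print(starts, ends, result)
```

The same apply_over_bounds would serve fixed and variable windows alike, since it never inspects how the bounds were produced.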
Will this also work for a custom window using the get_window_bounds() approach?
Hi, I also second the suggestion. This would be a really useful feature.
"If you're using 'standard' functions, these are vectorized, and so very fast (ts.rolling(5).max().dropna()[::2]). IIUC, the saving here would come from only applying the function a fraction of the time (e.g. every nth value). But is there a case where that makes a practical difference?"
I have just such an example here: https:
Every Nth would be every 365th. The window size is variable over the lifetime of the program, and the step is not guaranteed to be an integer fraction of the window size.
I basically need a set window size that steps by "the number of days in the year it's looking at", which is impossible with every solution I've found for this issue so far.
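I also have a similar need, in the following context (adapted from a real, professional need):
As far as I understand, the dataframe.rolling() API lets me specify the 365-day duration, but not the need to skip 30 days of values (which is a non-constant number of rows) before computing the next mean over another selection of 365 days of values.
Obviously, the resulting dataframe I expect will have a (much) smaller number of rows than the initial "dog events" dataframe.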
Just to gain more clarity about this request, here is a simple example.
If we have this Series:
In [1]: s = pd.Series(range(5))
In [2]: s
Out[2]:
0 0
1 1
2 2
3 3
4 4
dtype: int64
and we have a window size of 2 and a step size of 1. The first window at index 0 is evaluated, we step over the window at index 1, evaluate the window at index 2, etc.?
In [3]: s.rolling(2, step=1, min_periods=0).max()
Out[3]:
0 0.0
1 NaN # step over this observation
2 2.0
3 NaN # step over this observation
4 4.0
dtype: float64
Likewise, if we have this time-based Series
In [1]: s = pd.Series(range(5), index=pd.DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-06', '2020-01-09']))
In [2]: s
Out[2]:
2020-01-01 0
2020-01-02 1
2020-01-03 2
2020-01-06 3
2020-01-09 4
dtype: int64
and we have a window size of '3D' and a step size of '3D'. Would this be the correct result?
In [3]: s.rolling('3D', step='3D', min_periods=0).max()
Out[3]:
2020-01-01 0.0 # evaluate this window
2020-01-02 NaN # step over this observation (2020-01-01 + 3 days > 2020-01-02)
2020-01-03 NaN # step over this observation (2020-01-01 + 3 days > 2020-01-03)
2020-01-06 3.0 # evaluate this window ("snap back" to this observation)
2020-01-09 4.0 # evaluate this window (2020-01-06 + 3 days = 2020-01-09)
dtype: float64
@mroeschke wrt the first example ([3]), the results are not what I would expect. I assume this is a trailing window (e.g., at index=0 it would be the max of the elements at -1 and 0, so just max([0])). It should then step forward "1" index, to index=0+step=1, and the next computation would be max([0,1]), then max([1,2]), etc. What it looks like you meant was a step size of two, so you would move from index=0 to index=0+2=2 (skipping index 1), and continue like that. In that case it's almost correct, but there should be no NaNs. While the output may be "only" double the size in this case, in other cases the difference is substantial. For example, I have about an hour's worth of 500 Hz ECG data for a patient; that's 1.8 million samples. If I wanted a 5-minute moving average every two minutes, that would be an array of 1.8 million elements with 30 valid computations and slightly fewer than 1.8 million NaNs. :-)
For indexing, step size = 1 is the current behavior, i.e., compute the feature of interest using the data in the window, shift the window by one, then repeat. In this example, I want to compute the feature of interest using the data in the window, then shift by 60,000 indices, then repeat.
Similar remarks apply for time. In this case there might be some disagreement as to the correct way to implement this type of window, but in my opinion the "best" (TM) way is to start from time t0, find all elements in the range (t0-window, t0], compute the feature, then move by the step size. Throw away any windows that have fewer than the minimum number of elements (can be configurable, default to 1). That example is for a trailing window, but you can modify it to fit any window configuration. This has the disadvantage of wasting time in large gaps, but gaps can be handled intelligently, and even if you compute it the naive way (because you're lazy like me), I've yet to see this matter in practice, since the gaps are usually not large enough to matter in real data. YMMV.
Maybe that's clearer? Take a look at my example and code above; that might explain it better.
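The time-based procedure described in this comment can be sketched as follows. This is one possible reading, not the code from the PR: time_rolling_with_step is a hypothetical name, times are plain numbers in a single unit, and continuing until the last timestamp is covered is my own choice.

```python
import numpy as np


def time_rolling_with_step(times, values, window, step, func, min_periods=1):
    """Evaluate func on (t - window, t] for t = times[0], times[0] + step, ...

    `times` must be sorted ascending; all time arguments share one numeric unit.
    """
    times = np.asarray(times)
    values = np.asarray(values)
    out_times, out_values = [], []
    t = times[0]
    # Assumption: keep stepping until the last timestamp falls inside a window.
    while t < times[-1] + step:
        lo = np.searchsorted(times, t - window, side="right")  # first index with time > t - window
        hi = np.searchsorted(times, t, side="right")           # one past last index with time <= t
        if hi - lo >= min_periods:   # discard windows with too few elements
            out_times.append(t)
            out_values.append(func(values[lo:hi]))
        t = t + step
    return np.array(out_times), np.array(out_values)


# Days since 2020-01-01 for the dates 01, 02, 03, 06 and 09 January.
days = np.array([0, 1, 2, 5, 8])
out_t, out_v = time_rolling_with_step(days, np.arange(5), window=3, step=3, func=np.max)
print(out_t, out_v)
```

Under this reading the evaluation times are t0 + k*step and need not coincide with an observation, so the output is indexed by those times rather than by the input timestamps.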
Thanks for the clarification @anthonytw. Indeed, it looks like I needed to interpret step as "step to point".
As for the NaNs, I understand the sentiment to drop the NaNs from the output automatically, but as mentioned in https://github.com/pandas-dev/pandas/issues/15354#issuecomment-278676420 by @jreback, there is an API consistency consideration in having the output be the same length as the input. There may be users who would like to keep the NaNs as well (maybe?), and dropna would still be available after the rolling(..., step=...).func() operation.
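The two output conventions under discussion can be emulated with calls that already exist. The sketch below computes every window and only illustrates the output shapes, so it shows none of the computational saving a real step would bring; the variable names are my own.

```python
import numpy as np
import pandas as pd

s = pd.Series(range(5))
step = 2

# Compute every window, then blank out the positions a step of 2 would skip.
full = s.rolling(2, min_periods=1).max()
padded = full.where(np.arange(len(s)) % step == 0)   # same length as the input
compact = padded.dropna()                            # only the evaluated windows

print(padded.tolist())
print(compact.tolist())
```

Here padded keeps the input length with NaN at skipped positions, while compact holds just the evaluated windows with their original index labels.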
@mroeschke I think exceptions should be made. As long as you put an explicit note in the documentation and the behavior is not the default, no one will be adversely affected by not getting back a vector full of junk. Keeping the NaNs defeats half the purpose. One objective is to limit the number of times we perform an expensive computation. The other objective is to reduce the feature set to something manageable. The example I gave you is a real one, and not nearly as much data as one really has to process in a patient monitoring application. Is it really necessary to allocate 60000x the necessary space, then search through the array to delete the NaNs? For each feature we want to compute?
Note that one computation might produce an array of values. What do I want to do with an ECG waveform? Well, compute the power spectrum, of course! So I would need to allocate enough space for one full PSD vector (150,000 elements) 1.8 million times (2 TB of data), then filter through it to get the pieces I care about (34 MB). For all the series. For all the patients. I guess I need to buy more RAM!
It's also worth mentioning that NaN, for some features, might be a meaningful output. In that case, I can no longer tell the difference between a meaningful NaN and the junk NaNs padding the data.
While I understand the desire to maintain the API, this is not a feature that will break any existing code (because it's a new feature that didn't exist before), and given the functionality there is no reason anyone would expect it to produce an output of the same size. And even if they did, a note in the documentation for the step size would be sufficient. The disadvantages far outweigh any benefit of having a "consistent" API (for a feature that didn't previously exist, mind you). Not proceeding this way will cripple the feature; it's almost not worth implementing in that case (in my experience the space cost is almost always the bigger factor).
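The figures quoted in this comment can be checked with a few lines of arithmetic. The sketch assumes 8-byte float64 values, which the comment does not state; everything else is taken from the numbers given.

```python
rate_hz = 500
n_samples = rate_hz * 60 * 60        # one hour of ECG at 500 Hz
step = 2 * 60 * rate_hz              # 2-minute step, in samples
evaluations = len(range(0, n_samples, step))   # one evaluation every `step` samples

psd_length = 150_000                 # elements per PSD vector, as stated above
bytes_per_value = 8                  # assumption: float64 values

padded_bytes = n_samples * psd_length * bytes_per_value      # one vector per input row
compact_bytes = evaluations * psd_length * bytes_per_value   # one vector per evaluation

print(n_samples, step, evaluations)
print(padded_bytes / 1e12, "TB padded")
print(compact_bytes / 2**20, "MiB compact")
```

With these assumptions the padded layout needs about 2.16 TB against roughly 34 MiB for the compact one, matching the 2 TB and 34 MB quoted, and the ratio between them is the 60,000-sample step.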