Numpy: unique and NaN entries (Trac #1514)

Created on 19 Oct 2012  ·  14 Comments  ·  Source: numpy/numpy

_Original ticket http://projects.scipy.org/numpy/ticket/1514 on 2010-06-18 by trac user rspringuel, assigned to unknown._

When unique operates on an array with multiple NaN entries, its return value includes one NaN for each entry that was NaN in the original array.

Examples:
from numpy import nan, random, unique  # names used unqualified below
a = random.randint(5, size=100).astype(float)

a[12] = nan  # add a single nan entry
unique(a)
array([0., 1., 2., 3., 4., NaN])
a[20] = nan  # add a second
unique(a)
array([0., 1., 2., 3., 4., NaN, NaN])
a[13] = nan
unique(a)  # and a third
array([0., 1., 2., 3., 4., NaN, NaN, NaN])

์ด๊ฒƒ์€ ์•„๋งˆ๋„ x์™€ y๊ฐ€ ๋ชจ๋‘ NaN์ด๋ฉด x == y๊ฐ€ False๋กœ ํ‰๊ฐ€๋˜๊ธฐ ๋•Œ๋ฌธ์ผ ๊ฒƒ์ž…๋‹ˆ๋‹ค. Unique๋Š” ์ด๋ฏธ ์‹๋ณ„ ๋œ ๊ฐ’์— ๊ฐ’์ด ์žˆ๋Š”์ง€ ํ™•์ธํ•˜๋Š” ์กฐ๊ฑด๋ฌธ์— "or (isnan (x) ๋ฐ isnan (y))"๋ฅผ ์ถ”๊ฐ€ํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค. ๋‚˜๋Š” numpy์—์„œ ๋…ํŠนํ•œ ์‚ถ์ด ์žˆ๋Š”์ง€ ๋ชฐ๋ž๊ณ  ๋‚ด๊ฐ€ ์ฐพ์•„๋ดค์„ ๋•Œ ๊ทธ๊ฒƒ์„ ์ฐพ์„ ์ˆ˜ ์—†์—ˆ๊ธฐ ๋•Œ๋ฌธ์— ์Šค์Šค๋กœ ๋ณ€๊ฒฝํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค (๋˜๋Š” ์กฐ๊ฑด ๋ฌธ์˜ ์ •ํ™•ํ•œ ๊ตฌ๋ฌธ์ด ๋ฌด์—‡์ธ์ง€ ํ™•์‹  ํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค).

๋˜ํ•œ ๋‹ค์Œ ๊ธฐ๋Šฅ์„ ์‚ฌ์šฉํ•˜์—ฌ ๋™์ž‘์„ ํŒจ์น˜ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์ •์˜ nanunique (x) :
a = numpy.unique (x)
r = []
๋‚˜๋ฅผ ์œ„ํ•ด :
i๊ฐ€ r ๋˜๋Š” (numpy.isnan (i) ๋ฐ numpy.any (numpy.isnan (r))) ์ธ ๊ฒฝ์šฐ :
๊ณ„์†ํ•˜๋‹ค
๊ทธ๋ฐ–์—:
r.append (i)
return numpy.array (r)

00 - Bug Other

All 14 comments

_trac user rspringuel wrote on 2010-06-18_

์œ„์˜ ์ฝ”๋“œ ๋ธ”๋ก์„ ์‚ฌ์šฉํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค. ์ด๊ฒƒ์€ ํŒจ์น˜ ์˜ค๋ฒ„ ์ฝ”๋“œ์—๋งŒ ์‹ค์ œ๋กœ ์˜ํ–ฅ์„ ๋ฏธ์น˜๋ฏ€๋กœ ๋‹ค์‹œ ๊ฒŒ์‹œํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

def nanunique(x):
    a = numpy.unique(x)
    r = []
    for i in a:
        if i in r or (numpy.isnan(i) and numpy.any(numpy.isnan(r))):
            continue
        else:
            r.append(i)
    return numpy.array(r)
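For larger inputs, the same patch-over idea can be written without the Python loop. The following is a minimal sketch, not part of the original ticket; `nanunique_vectorized` is a hypothetical name, and it assumes a real-valued (non-complex, non-structured) input.

```python
import numpy as np


def nanunique_vectorized(x):
    """Hypothetical loop-free variant of nanunique for real-valued input."""
    u = np.unique(x)
    nan_mask = np.isnan(u)
    if nan_mask.any():
        # Keep the non-NaN values and append exactly one NaN at the end.
        return np.concatenate((u[~nan_mask], [np.nan]))
    return u


print(nanunique_vectorized(np.array([0.0, np.nan, 1.0, np.nan, 0.0])))
```

Because the NaN entries are stripped and re-added explicitly, the result holds a single NaN whether or not np.unique itself already collapses them.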

๊ฒฐ์ •๋œ.

์ตœ์‹  ๋งˆ์Šคํ„ฐ์—์„œ ์—ฌ์ „ํžˆ์ด ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค. ์–ด๋–ค ์ปค๋ฐ‹์„ ์ˆ˜์ • ํ–ˆ์–ด์•ผํ•˜๋‚˜์š”? ๋‚ด๊ฐ€ ๋ญ”๊ฐ€๋ฅผ ๋†“์นœ ๊ฒƒ์ด ์•„๋‹ˆ๋ผ๋ฉด์ด ๋ฌธ์ œ๋ฅผ ๋‹ค์‹œ ์—ฌ๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค.

์ด๊ฒƒ์€ ์ˆ˜๋ ˆ์— ๋Œ€ํ•ด ์‰ฝ๊ฒŒ ๊ณ ์น  ์ˆ˜ ์žˆ์ง€๋งŒ ๋ณต์žกํ•˜๊ฑฐ๋‚˜ ๊ตฌ์กฐํ™” ๋œ dtype์— ๋Œ€ํ•œ ์‰ฌ์šด ๋ฐฉ๋ฒ•์€ ๋ณด์ด์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ๊ฐ„๋‹จํ•œ PR์„ ํ•จ๊ป˜ํ•˜๊ณ  ๊ฑฐ๊ธฐ์—์„œ ์˜ต์…˜์— ๋Œ€ํ•ด ๋…ผ์˜ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

@jaimefrio I've got it fixed for unique using

    if issubclass(aux.dtype.type, np.inexact):
        # nans always compare unequal, so encode as integers
        tmp = aux.searchsorted(aux)
    else:
        tmp = aux
    flag = np.concatenate(([True], tmp[1:] != tmp[:-1]))

but it looks like all the other operations have problems too. We may need nan_equal and nan_not_equal ufuncs, or something in nanfunctions.
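To see why the searchsorted encoding above collapses NaNs, the fragment can be wrapped in a small standalone function. This is a sketch under assumptions: `unique_via_searchsorted` is a hypothetical name, the sorting and empty-input guard around the fragment are added here, and it relies on numpy's documented behavior of sorting NaN to the end and handling NaN in searchsorted.

```python
import numpy as np


def unique_via_searchsorted(ar):
    """Hypothetical wrapper around the searchsorted encoding from the comment."""
    aux = np.sort(np.asarray(ar).ravel())
    if aux.size == 0:
        return aux
    if issubclass(aux.dtype.type, np.inexact):
        # Each value maps to the index of its first occurrence in aux,
        # so all NaNs (sorted to the end) share one integer code.
        tmp = aux.searchsorted(aux)
    else:
        tmp = aux
    flag = np.concatenate(([True], tmp[1:] != tmp[:-1]))
    return aux[flag]


print(unique_via_searchsorted([2.0, np.nan, 1.0, 2.0, np.nan]))
```

The integer codes compare with ordinary `!=`, which sidesteps the NaN comparison problem at the cost of an extra searchsorted pass over the whole array.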

aux ์ž์ฒด๋ฅผ ์ •๋ ฌํ•˜๋Š” ๊ฒƒ์€ ํ˜„๋ช…ํ•œ ํŠธ๋ฆญ์ž…๋‹ˆ๋‹ค! ๋ชจ๋“  ํ•ญ๋ชฉ์„ ์ •๋ ฌํ•˜๋Š” ๊ฒƒ์€ ์•ฝ๊ฐ„ ๋‚ญ๋น„์ด์ง€๋งŒ ์ด์ƒ์ ์œผ๋กœ๋Š” aux ๋ฐ flag ๋ฅผ ์ง€๊ธˆ ๋‹น์žฅ ์ƒ์„ฑ ํ•œ ํ›„ nan์ด์žˆ๋Š” ์ฒซ ๋ฒˆ์งธ ํ•ญ๋ชฉ์„ ์ฐพ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. :

if not aux[-1] == aux[-1]:
    nanidx = np.argmin(aux == aux)
    nanaux = aux[nanidx:].searchsorted(aux[nanidx:])
    flag[nanidx+1:] = nanaux[1:] != nanaux[:-1]

Or something similar, after correcting all of the off-by-one errors that I have likely introduced there.
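One way to check the index arithmetic in the fragment above is to embed it in a complete function and run it on small inputs. This is a sketch only; `unique_nan_tail` is a hypothetical name, the input is coerced to float, and the empty-input guard is an addition.

```python
import numpy as np


def unique_nan_tail(ar):
    """Hypothetical wrapper: only the NaN tail of the sorted array is re-encoded."""
    aux = np.sort(np.asarray(ar, dtype=float).ravel())
    if aux.size == 0:
        return aux
    flag = np.concatenate(([True], aux[1:] != aux[:-1]))
    if aux[-1] != aux[-1]:  # NaNs sort to the end, so check the last entry
        # aux == aux is False exactly at NaNs; argmin finds the first False.
        nanidx = int(np.argmin(aux == aux))
        nanaux = aux[nanidx:].searchsorted(aux[nanidx:])
        flag[nanidx + 1:] = nanaux[1:] != nanaux[:-1]
    return aux[flag]


print(unique_nan_tail([np.nan, 2.0, np.nan, 1.0, 2.0]))
```

For real floats the slice lengths on both sides of the final assignment are equal (the tail length minus one), so the fragment appears to be consistent as posted.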

This last approach would work for float and complex types, but would fail for structured dtypes with floating point fields. I still think the searchsorting trick is too wasteful, even though it would work for all types. Some timings:

In [10]: a = np.random.randn(1000)

In [11]: %timeit np.unique(a)
10000 loops, best of 3: 69.5 us per loop

In [12]: b = np.sort(a)

In [13]: %timeit b.searchsorted(b)
10000 loops, best of 3: 28.1 us per loop

That would be a 40% performance hit. It may be fine for a nanunique function, but probably not for the general case.
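The comparison can be reproduced outside IPython with the standard timeit module. This is a rough sketch: the repetition count and the seeded generator are arbitrary choices, and the absolute numbers will differ by machine and numpy version.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(1000)
b = np.sort(a)

n_runs = 2000  # arbitrary repetition count for a quick measurement
t_unique = timeit.timeit(lambda: np.unique(a), number=n_runs) / n_runs
t_search = timeit.timeit(lambda: b.searchsorted(b), number=n_runs) / n_runs

print(f"np.unique(a):      {t_unique * 1e6:.1f} us per call")
print(f"b.searchsorted(b): {t_search * 1e6:.1f} us per call")
print(f"extra cost relative to np.unique: {t_search / t_unique:.0%}")
```

With the figures quoted above, 28.1 us on top of 69.5 us corresponds to the roughly 40% overhead mentioned.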

In 2019 the OP's issue is still valid and the code still reproduces it.

@jaimefrio why can't we have an option that is False by default?

๋‚ด ๋ง์€,์ด ํ–‰๋™์€ ๊ธฐ๊ปํ•ด์•ผ ํ˜ผ๋ž€์Šค๋Ÿฝ๊ณ  ์„ฑ๋Šฅ์€ ๋ณ€๋ช…์ด ์•„๋‹™๋‹ˆ๋‹ค.

@Demetrio92 while I appreciate your attempt to get this issue moving, irony and sarcasm on the internet can be read differently by different people, so please keep it kind. For some of us performance is very important, and we don't casually add code that slows things down.

PR #5487 may be a better place to comment or make suggestions on how to move forward.

Edit: fixed PR number

This issue seems to have been open for 8 years, but I want to add a +1 for making the default behavior of numpy.unique correct rather than fast. This broke my code, and I'm sure others have suffered or will suffer from it. We could have an optional "fast=False" and document the nan behavior for the fast path. I'd be surprised if np.unique were often the performance bottleneck in time-critical applications.

์˜ค๋Š˜๋„ ๊ฐ™์€ ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ–ˆ์Šต๋‹ˆ๋‹ค. np.unique ๋ฃจํ‹ด์˜ ํ•ต์‹ฌ์€ numpy / lib / arraysetops.py์˜ unraveled ์ •๋ ฌ ๋œ ๋ฐฐ์—ด์—์„œ ๋งˆ์Šคํฌ๋ฅผ ๊ณ„์‚ฐํ•˜์—ฌ ํ•ด๋‹น ์ •๋ ฌ ๋œ ๋ฐฐ์—ด์—์„œ ๊ฐ’์ด ๋ณ€๊ฒฝ๋˜๋Š”์‹œ๊ธฐ๋ฅผ ์ฐพ๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

mask = np.empty(aux.shape, dtype=np.bool_)
mask[:1] = True
mask[1:] = aux[1:] != aux[:-1]

์ด๊ฒƒ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ฒƒ์œผ๋กœ ๋Œ€์ฒด ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๊ฒƒ์€ ์•ฝ 5 ๋…„ ์ „์˜ jaimefrio์˜ ์˜๊ฒฌ๊ณผ ๊ฑฐ์˜ ๋น„์Šทํ•˜์ง€๋งŒ argmin ํ˜ธ์ถœ์„ ํ”ผํ•ฉ๋‹ˆ๋‹ค.

mask = np.empty(aux.shape, dtype=np.bool_)
mask[:1] = True
if (aux.shape[0] > 0 and isinstance(aux[-1], (float, np.float16,
                                              np.float32, np.float64))
    and np.isnan(aux[-1])):
    aux_firstnan = np.searchsorted(aux, np.nan, side='left')
    mask[1:aux_firstnan] = (aux[1:aux_firstnan] != aux[:aux_firstnan-1])
    mask[aux_firstnan] = True
    mask[aux_firstnan+1:] = False
else:
    mask[1:] = aux[1:] != aux[:-1]
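For experimenting outside numpy's source tree, the patched mask logic can be wrapped in a self-contained function. This is a sketch under assumptions, not the code that lives in arraysetops.py: `unique_collapse_nan` is a hypothetical name, the float check uses `dtype.kind` instead of isinstance, and the non-NaN head is sliced first so that an all-NaN input does not compare slices of different lengths.

```python
import numpy as np


def unique_collapse_nan(ar):
    """Hypothetical standalone version of the mask patch for 1-D real data."""
    aux = np.sort(np.asarray(ar).ravel())
    mask = np.empty(aux.shape, dtype=bool)
    mask[:1] = True
    if aux.size > 0 and aux.dtype.kind == "f" and np.isnan(aux[-1]):
        # NaNs sort to the end; locate the first one with a binary search.
        first_nan = int(np.searchsorted(aux, np.nan, side="left"))
        head = aux[:first_nan]
        mask[1:first_nan] = head[1:] != head[:-1]
        mask[first_nan] = True        # keep exactly one NaN
        mask[first_nan + 1:] = False  # drop the remaining NaNs
    else:
        mask[1:] = aux[1:] != aux[:-1]
    return aux[mask]


print(unique_collapse_nan([3.0, np.nan, 1.0, np.nan, 3.0]))
```

The binary search costs O(log n), so the extra work is dominated by the dtype and NaN check rather than by locating the NaN block.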

Running a few %timeit experiments, I observed a runtime penalty of at most about 10% when the array is large and contains very few NaNs (say 10 NaNs out of 1 million). For such large arrays it actually runs faster when there are many NaNs.

๋ฐ˜๋ฉด์— ๋ฐฐ์—ด์ด ์ž‘์€ ๊ฒฝ์šฐ (์˜ˆ : 10 ๊ฐœ ํ•ญ๋ชฉ) float ๋ฐ NaN์— ๋Œ€ํ•œ ๊ฒ€์‚ฌ๊ฐ€ ์ƒ๋Œ€์ ์œผ๋กœ ๋น„์‹ธ๊ณ  ๋Ÿฐํƒ€์ž„์ด ๋ฐฐ์ˆ˜๋กœ ์˜ฌ๋ผ๊ฐˆ ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์ƒ๋‹นํ•œ ์„ฑ๋Šฅ ์ €ํ•˜๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ๋Š๋ฆฐ ์ˆ˜ํ‘œ์ด๊ธฐ ๋•Œ๋ฌธ์— NaN์ด ์—†์–ด๋„ ์ ์šฉ๋ฉ๋‹ˆ๋‹ค.

๋ฐฐ์—ด์— NaN์ด์žˆ๋Š” ๊ฒฝ์šฐ NaN์„ ๊ฒฐํ•ฉํ•˜์—ฌ ๋‹ค๋ฅธ ๊ฒฐ๊ณผ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ์ด ๊ฒฝ์šฐ์—๋Š” ์›ํ•˜๋Š” ๊ฒฐ๊ณผ (๋ชจ๋“  NaN์ด ๋‹จ์ผ ๊ฐ’ ๊ทธ๋ฃน์œผ๋กœ ๊ฒฐํ•ฉ ๋จ)๋ฅผ ์•ฝ๊ฐ„ ๋” ๋Š๋ฆฌ๊ฒŒ ์–ป๋Š” ๊ฒƒ๊ณผ ์›ํ•˜์ง€ ์•Š๋Š” ๊ฒฐ๊ณผ (์ž์ฒด ๊ฐ’ ๊ทธ๋ฃน์˜ ๊ฐ NaN)๋ฅผ ์•ฝ๊ฐ„ ๋” ๋น ๋ฅด๊ฒŒ ์–ป๋Š” ๊ฒƒ์ด ์‹ค์ œ๋กœ ๋ฌธ์ œ์ž…๋‹ˆ๋‹ค.

Finally, note that this patch would not fix finding unique values among compound objects that contain NaNs, as in this example:

a = np.array([[0,1],[np.nan, 1], [np.nan, 1]])
np.unique(a, axis=0)

์—ฌ์ „ํžˆ ๋Œ์•„์˜ฌ ๊ฒƒ์ž…๋‹ˆ๋‹ค

array([[ 0.,  1.],
       [nan,  1.],
       [nan,  1.]])

"If the arrays do have NaNs, then it produces a different result, combining the NaNs, which is the point of it all."

+1

๋ฐ˜๋ณต๋˜๋Š” ์š”์†Œ๋ฅผ ํฌํ•จํ•˜๋Š” ๋ชฉ๋ก์„ ๋ฐ˜ํ™˜ํ•˜๋Š” ํ•จ์ˆ˜ (์˜ˆ : NaN์ด 1 ๊ฐœ ์ด์ƒ์ธ ๋ชฉ๋ก)๋Š” "๊ณ ์œ "๋ผ๊ณ  ๋ถ€๋ฅด์ง€ ์•Š์•„์•ผํ•ฉ๋‹ˆ๋‹ค. NaN์˜ ๊ฒฝ์šฐ ๋ฐ˜๋ณต๋˜๋Š” ์š”์†Œ๊ฐ€ ํ•„์š”ํ•œ ๊ฒฝ์šฐ ๊ธฐ๋ณธ์ ์œผ๋กœ ๋น„ํ™œ์„ฑํ™”๋˜๋Š” ํŠน์ˆ˜ํ•œ ๊ฒฝ์šฐ (์˜ˆ numpy.unique(..., keep_NaN=False) ์—ฌ์•ผํ•ฉ๋‹ˆ๋‹ค.

@ufmayer submit a PR!

+1
NaN ๋ฐ˜ํ™˜๋„ ํ•œ ๋ฒˆ๋งŒ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค.

์ด ํŽ˜์ด์ง€๊ฐ€ ๋„์›€์ด ๋˜์—ˆ๋‚˜์š”?
0 / 5 - 0 ๋“ฑ๊ธ‰