Hdbscan: allow_single_cluster crashes when used together with cluster_selection_epsilon

Created on April 21, 2020 · 13 comments · Source: scikit-learn-contrib/hdbscan

HDBSCAN crashes when cluster_selection_epsilon=x with x > 0 is used together with allow_single_cluster=True.

I am trying to use these two options together to properly cluster the no_structure toy dataset (bottom row, square). I want the square to be entirely blue, as it is with DBSCAN. With only cluster_selection_epsilon, several clusters appear in that square. With only allow_single_cluster=True, part of the square is grey. I think I can only get the result I want by using both arguments, but HDBSCAN crashes when I do.

Code

import numpy as np
import hdbscan

if __name__ == '__main__':
    no_structure = np.random.rand(1500, 2)
    clusterer = hdbscan.HDBSCAN(min_cluster_size=15, cluster_selection_epsilon=3, allow_single_cluster=True)
    clusterer.fit(no_structure)

Expected behavior

HDBSCAN clusters the data without crashing, ideally painting every point in the square blue as described above.

Actual behavior

HDBSCAN crashes with the following traceback:

Traceback (most recent call last):
  File "/home/home/PycharmProjects/sandbox/crash_example.py", line 7, in <module>
    clusterer.fit(no_structure)
  File "/home/home/PycharmProjects/sandbox/venv/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 919, in fit
    self._min_spanning_tree) = hdbscan(X, **kwargs)
  File "/home/home/PycharmProjects/sandbox/venv/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 632, in hdbscan
    return _tree_to_labels(X,
  File "/home/home/PycharmProjects/sandbox/venv/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 59, in _tree_to_labels
    labels, probabilities, stabilities = get_clusters(condensed_tree,
  File "hdbscan/_hdbscan_tree.pyx", line 645, in hdbscan._hdbscan_tree.get_clusters
  File "hdbscan/_hdbscan_tree.pyx", line 733, in hdbscan._hdbscan_tree.get_clusters
  File "hdbscan/_hdbscan_tree.pyx", line 631, in hdbscan._hdbscan_tree.epsilon_search
IndexError: index 0 is out of bounds for axis 0 with size 0

All 13 comments

์ €๋„ ์ด ๋ฌธ์ œ๋ฅผ ๊ฒช๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. cluster_selection_method='eom' ๋ฅผ ์‚ฌ์šฉํ•  ๋•Œ๋งŒ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.

In _hdbscan_tree.pyx, cluster_tree is empty because every child of the root node has child_size == 1. The 'leaf' method gets around this because its list of leaves is also empty in that case.

However, 'eom_clusters' contains the root node, while epsilon_search() relies only on cluster_tree, which it searches by child node. Is the root node supposed to be excluded from the 'eom_clusters' list?

@danielzgtg See my pull request for a possible fix for the issue. I'm not certain it doesn't affect the underlying algorithm, but I did my best to find a solution and I'm fairly confident in it.

๋”ฐ๋ผ์„œ Malzer์™€ Baum์˜ HDBSCAN* ๋…ผ๋ฌธ์„ ์ฝ์€ ํ›„ 370์— ๋Œ€ํ•œ ์ˆ˜์ • ์‚ฌํ•ญ์ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์ ์œผ๋กœ ์ •ํ™•ํ•˜์ง€ ์•Š์€ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค. ์ˆ˜์ • ์‚ฌํ•ญ์„ ํ†ตํ•ด ํ•ด๋‹น ๋งค๊ฐœ๋ณ€์ˆ˜๋กœ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์ง€๋งŒ ์‹ค์ œ๋กœ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ์ •ํ™•ํžˆ ํ•˜๋‚˜์˜ ๋‹จ์ผ ํด๋Ÿฌ์Šคํ„ฐ๋ฅผ ๋ฐ˜ํ™˜ํ•˜๋Š” ๊ฒƒ์„ ํ—ˆ์šฉํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

Section 4.3 describes the epsilon-based merging, which traverses upward and, "if it finds one before reaching the root, selects it as a cluster and unselects all of its descendants."

@lmcinnes or @cmalzer, can you confirm that, for the combination of "allow_single_cluster" and "cluster_selection_epsilon", the upward traversal of the hierarchy should be allowed to reach the root node? Or would that have unintended consequences? It looks like the right move to me.

I made a new commit that I think implements this correctly and fixes some incorrect behavior I was seeing. Apologies for not investigating more thoroughly before my first commit.

Hi, thanks for pointing this out. I forgot to implement cluster_selection_epsilon for the "allow_single_cluster" option (the algorithm described in the paper only considers the case where you want to split the dataset).

I looked at your latest commit, and returning the root from "traverse_upwards" when allow_single_cluster is set to True does indeed solve the problem. Thanks!
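The upward traversal from Section 4.3, with the root returned only when allow_single_cluster is set, can be sketched in pure Python. This is an illustrative sketch under stated assumptions, not the Cython code in _hdbscan_tree.pyx: the toy tree, the record layout (parent, child, lambda_val, child_size, borrowed from the fields indexed in this thread) and the helper descendants() are hypothetical, and a cluster's eps is taken as 1/lambda_val of the row where it appears as a child, as in the epsilon_search snippets in this thread.

```python
import numpy as np

# Assumed record layout, mirroring the field names indexed in _hdbscan_tree.pyx.
TREE_DTYPE = [("parent", np.intp), ("child", np.intp),
              ("lambda_val", np.float64), ("child_size", np.intp)]


def descendants(cluster_tree, node):
    """Hypothetical helper: every cluster below `node`, found breadth-first."""
    found, frontier = set(), [node]
    while frontier:
        kids = cluster_tree["child"][np.isin(cluster_tree["parent"], frontier)]
        found.update(int(k) for k in kids)
        frontier = list(kids)
    return found


def traverse_upwards(cluster_tree, epsilon, leaf, allow_single_cluster):
    """Climb from `leaf` until an ancestor's eps exceeds `epsilon`."""
    root = cluster_tree["parent"].min()
    parent = cluster_tree["parent"][cluster_tree["child"] == leaf][0]
    if parent == root:
        # Reached the root: keep it only if a single cluster is allowed,
        # otherwise fall back to the node just below the root.
        return int(parent) if allow_single_cluster else int(leaf)
    parent_eps = 1.0 / cluster_tree["lambda_val"][cluster_tree["child"] == parent][0]
    if parent_eps > epsilon:
        return int(parent)
    return traverse_upwards(cluster_tree, epsilon, parent, allow_single_cluster)


def epsilon_search(leaves, cluster_tree, epsilon, allow_single_cluster):
    """Sketch of epsilon-based merging over the EOM/leaf selection."""
    selected, processed = set(), set()
    for leaf in sorted(leaves):
        eps = 1.0 / cluster_tree["lambda_val"][cluster_tree["child"] == leaf][0]
        if eps >= epsilon:
            selected.add(int(leaf))  # leaf already persists past epsilon
        elif leaf not in processed:
            chosen = traverse_upwards(cluster_tree, epsilon, leaf, allow_single_cluster)
            selected.add(chosen)
            processed |= descendants(cluster_tree, chosen)  # unselect its descendants
    return selected


# Toy tree: root 10 splits into 11 and 12 (eps = 2.0); 11 splits into 13 and 14 (eps = 0.5).
tree = np.array([(10, 11, 0.5, 50), (10, 12, 0.5, 40),
                 (11, 13, 2.0, 25), (11, 14, 2.0, 25)], dtype=TREE_DTYPE)
print(epsilon_search({12, 13, 14}, tree, 1.0, False))  # {11, 12}
print(epsilon_search({12, 13, 14}, tree, 3.0, True))   # {10}
print(epsilon_search({12, 13, 14}, tree, 3.0, False))  # {11, 12}
```

With epsilon = 3.0 the merge climbs all the way up, and only allow_single_cluster=True lets the root itself be selected. Passing the root in `leaves` still raises IndexError in this sketch, because the root never appears in the child column; that is the case the rest of this thread deals with.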

Thanks for the quick reply and for your work designing this algorithm!

@lmcinnes Unfortunately, this issue is not resolved.

The second pull request I made passed the unit tests I was using and allows a single cluster in most cases, but the code provided by @danielzgtg hits an edge case.

I tried debugging it, and the problem seems to occur when cluster_tree['lambda_val'] comes back empty at _hdbscan_tree.pyx:638. I'm having trouble debugging Cython, though (I'm still not very familiar with it). If anyone has time to help investigate, I'd really appreciate it.

I just took a look. The problem is indeed that cluster_tree is empty at this point. So cluster_tree['lambda_val'][cluster_tree['child'] == leaf] is an empty array, and trying to access its first element ([0]) raises an IndexError.

cluster_tree is empty because of the line "cluster_tree = tree[tree['child_size'] > 1]": here the tree contains only the parent node with children of size 1, so the filter removes every row.

This code works for cluster_selection_method="leaf" because the list of leaves is empty. For "eom", the parent cluster is passed in as a "leaf".

์ด๊ฒƒ์€ epsilon_search๋ฅผ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์ „์— cluster_tree๊ฐ€ ๋น„์–ด ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์—ฌ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, "๋งŒ์•ฝ cluster_selection_epsilon != 0.0 and cluster_tree.shape[0] > 0: ... "

I tested with the suggested conditional cluster_tree.shape[0] > 0, but it still failed. After some printf debugging, cluster_tree had ~26 elements and eom_clusters had exactly one element: the root node. So when cluster_tree['child'] == leaf is evaluated at line 637, leaf is the root node, which does not match any child.
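The situation reported here, where cluster_tree is non-empty but EOM hands over the root, can be reproduced the same way. The toy tree and the eom_clusters list are hypothetical; the point is that the shape check passes while the lookup by child still comes back empty, because the root is never anyone's child.

```python
import numpy as np

# Assumed record layout for the cluster rows of the condensed tree.
TREE_DTYPE = [("parent", np.intp), ("child", np.intp),
              ("lambda_val", np.float64), ("child_size", np.intp)]

# Hypothetical non-empty cluster tree whose root (10) has real child clusters.
cluster_tree = np.array([(10, 11, 0.5, 50), (10, 12, 0.5, 40),
                         (11, 13, 2.0, 25), (11, 14, 2.0, 25)], dtype=TREE_DTYPE)
eom_clusters = [10]                       # EOM selected only the root
root_node = cluster_tree["parent"].min()

guard = cluster_tree.shape[0] > 0         # suggested check: passes here
leaf = eom_clusters[0]
matches = cluster_tree["child"] == leaf   # all False: the root has no parent row

try:
    eps = 1 / cluster_tree["lambda_val"][matches][0]
    error_message = None
except IndexError as err:
    error_message = str(err)

print(guard, bool(matches.any()), bool(leaf == root_node))  # True False True
print(error_message)  # index 0 is out of bounds for axis 0 with size 0
```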

How about this?

cpdef set epsilon_search(set leaves, np.ndarray cluster_tree, np.double_t cluster_selection_epsilon, np.intp_t allow_single_cluster):

    selected_clusters = list()
    processed = list()
    root_node = cluster_tree['parent'].min()

    for leaf in leaves:
        # The root never appears as a child, so handle it before the lookup below
        if leaf == root_node:
            if allow_single_cluster:
                selected_clusters.append(leaf)
                break
            continue
        eps = 1/cluster_tree['lambda_val'][cluster_tree['child'] == leaf][0]
        if eps < cluster_selection_epsilon:
            if leaf not in processed:
                ...  # rest of epsilon_search unchanged

์†”์งํžˆ ์ข€ ํ—ท๊ฐˆ๋ฆฝ๋‹ˆ๋‹ค. ๋‹จ์ผ ํด๋Ÿฌ์Šคํ„ฐ๋ฅผ ํ—ˆ์šฉํ•˜๋ ค๋Š” ๊ฒฝ์šฐ ๋…ธ์ด์ฆˆ ํด๋Ÿฌ์Šคํ„ฐ๋งŒ ํ—ˆ์šฉํ•ด์•ผ ํ•ฉ๋‹ˆ๊นŒ? ์ด ๊ฒฝ์šฐ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

cpdef set epsilon_search(set leaves, np.ndarray cluster_tree, np.double_t cluster_selection_epsilon, np.intp_t allow_single_cluster):

    selected_clusters = list()
    processed = list()
    root_node = cluster_tree['parent'].min()

    for leaf in leaves:
        # Same root check as above, but the root is kept only when single clusters are NOT allowed
        if leaf == root_node:
            if not allow_single_cluster:
                selected_clusters.append(leaf)
            break
        eps = 1/cluster_tree['lambda_val'][cluster_tree['child'] == leaf][0]
        if eps < cluster_selection_epsilon:
            ...  # rest of epsilon_search unchanged

์ž˜๋ชป๋œ ์ˆ˜์˜ ์ž„์˜ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ๋กœ ํ…Œ์ŠคํŠธํ•œ ๋‹ค์Œ ๋™์ผํ•œ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ–ˆ์ง€๋งŒ ์›์ธ์€ ๋‹ค๋ฅธ ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค.
์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋Š” ๋ฐฉ๋ฒ•์—๋Š” ์—ฌ๋Ÿฌ ๊ฐ€์ง€๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด ๋‹ค์Œ๊ณผ ๊ฐ™์ด epsilon_search ์™ธ๋ถ€์—์„œ ๊ฒ€์‚ฌ๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒƒ์ด ์‹ค์ œ๋กœ ์กฐ๊ธˆ ๋” ๋น ๋ฅผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

if cluster_selection_epsilon != 0.0 and cluster_tree.shape[0] > 0:
    eom_clusters = [c for c in is_cluster if is_cluster[c]]
    # Skip epsilon_search when EOM selected the root node alone
    if not (len(eom_clusters) == 1 and eom_clusters[0] == cluster_tree['parent'].min()):
        selected_clusters = epsilon_search(set(eom_clusters), cluster_tree, cluster_selection_epsilon, allow_single_cluster)

(Sorry for all the edits.)
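The outer guard proposed in this comment can be written as a small predicate, which makes its cases easy to test. needs_epsilon_search() is a hypothetical helper, not part of hdbscan; it assumes is_cluster maps cluster ids to booleans, as the list comprehension in the snippet suggests, and the toy tree is made up.

```python
import numpy as np

# Assumed record layout for the cluster rows of the condensed tree.
TREE_DTYPE = [("parent", np.intp), ("child", np.intp),
              ("lambda_val", np.float64), ("child_size", np.intp)]


def needs_epsilon_search(is_cluster, cluster_tree, cluster_selection_epsilon):
    """Hypothetical predicate mirroring the outer guard: True if epsilon_search should run."""
    if cluster_selection_epsilon == 0.0 or cluster_tree.shape[0] == 0:
        return False  # no epsilon requested, or no cluster rows to search
    eom_clusters = [c for c in is_cluster if is_cluster[c]]
    root = cluster_tree["parent"].min()
    # EOM picked the root alone: nothing below it to merge, so skip the search.
    return not (len(eom_clusters) == 1 and eom_clusters[0] == root)


tree = np.array([(10, 11, 0.5, 50), (10, 12, 0.5, 40),
                 (11, 13, 2.0, 25), (11, 14, 2.0, 25)], dtype=TREE_DTYPE)
empty_tree = tree[:0]

print(needs_epsilon_search({10: True, 11: False, 12: False}, tree, 3.0))  # False
print(needs_epsilon_search({11: True, 12: True}, tree, 3.0))              # True
print(needs_epsilon_search({11: True, 12: True}, tree, 0.0))              # False
print(needs_epsilon_search({5: True}, empty_tree, 3.0))                   # False
```

When the predicate returns False, the existing EOM selection (for example the lone root under allow_single_cluster=True) would presumably be kept as is.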

Thanks for the quick reply! That's a good idea. I'll open a pull request tonight.

@cmalzer

I've opened a pull request with your suggestion.

The reason I did the check inside the epsilon_search() method is that I wasn't sure we could guarantee that, whenever the root node is in eom_clusters, it is the only item in that list. If we only check for that single case and the root node is mixed in with other leaves in eom_clusters, epsilon_search() will still raise an error.

If we can't make that assumption, I would change the conditional to:

if (cluster_tree['parent'].min() in eom_clusters)

That is because if the root is in the list at all and we allow it as an output, it would essentially be the only epsilon child returned from the search.

When HDBSCAN selects a cluster, it also unselects all of its descendants, so the root cluster and another cluster should never actually appear in the list together. Still, I prefer your suggested "if ... in eom_clusters" line, because it is much more readable than my version. The catch is that, although this case is rare, the whole cluster list is scanned for the root node every time the epsilon parameter is used. I'm not sure how big the performance difference is, though... (does checking the length take much time anyway?) I'm fine either way.
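The trade-off between the two guards can be checked directly. In the sketch below, root_only() is the length-based check and root_present() is the suggested membership check; both names and the sample lists are hypothetical. They agree whenever the root is absent or alone, and differ only when the root is mixed with other clusters, which EOM selection should not produce. The length check does constant work, whereas `in` scans the list, so the gap grows with list length; for short selection lists it is likely negligible.

```python
import timeit


def root_only(eom_clusters, root):
    """Length-based guard: constant work, the length test short-circuits."""
    return len(eom_clusters) == 1 and eom_clusters[0] == root


def root_present(eom_clusters, root):
    """Membership guard: scans the list until the root is found."""
    return root in eom_clusters


root = 10
cases = {
    "root alone": [10],
    "no root": [11, 12, 13],
    "root mixed in": [10, 12],  # should not occur after EOM selection
}
results = {name: (root_only(c, root), root_present(c, root)) for name, c in cases.items()}
for name, pair in results.items():
    print(name, pair)

# Rough timing on a long list without the root (the worst case for `in`).
big = list(range(11, 10_011))
t_len = timeit.timeit(lambda: root_only(big, root), number=200)
t_in = timeit.timeit(lambda: root_present(big, root), number=200)
print(f"length check: {t_len:.6f}s, membership check: {t_in:.6f}s")
```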

Thanks for the explanation! As a middle ground, I think I'll leave it as is for performance and add a comment above it.
