Using GMM.fit() to fit a one-dimensional Gaussian distribution raises a runtime error with scikit-learn version 0.16.1, but produces sensible parameters with 0.15.2.
A short script that demonstrates the problem:
import sklearn
from sklearn import mixture
import numpy as np
from scipy import stats
import sys
# the version info
print("Python version: %s.%s" %(sys.version_info.major, sys.version_info.minor))
print("scikit-learn version: %s" %(sklearn.__version__))
# some pretend data
np.random.seed(seed=0)
data = stats.norm.rvs(loc=100, scale=1, size=1000)
print("Data mean = %s, Data std dev = %s" %(np.mean(data), np.std(data)))
# Fitting using a GMM with a single component
clf = mixture.GMM(n_components=1)
clf.fit(data)
print(clf.means_, clf.weights_, clf.covars_)
Running this example code with scikit-learn 0.15.2 produces the expected output:
Python version: 3.4
scikit-learn version: 0.15.2
Data mean = 99.9547432925, Data std dev = 0.987033158669
[[ 99.95474329]] [ 1.] [[ 0.97523446]]
However, exactly the same code with scikit-learn 0.16.1 produces the following output:
Python version: 3.4
scikit-learn version: 0.16.1
Data mean = 99.9547432925, Data std dev = 0.987033158669
/home/rebecca/anaconda/envs/new_sklearn/lib/python3.4/site-packages/numpy/lib/function_base.py:1890: RuntimeWarning: Degrees of freedom <= 0 for slice
warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
/home/rebecca/anaconda/envs/new_sklearn/lib/python3.4/site-packages/numpy/lib/function_base.py:1901: RuntimeWarning: invalid value encountered in true_divide
return (dot(X, X.T.conj()) / fact).squeeze()
Traceback (most recent call last):
File "test_sklearn.py", line 18, in <module>
clf.fit(data)
File "/home/rebecca/anaconda/envs/new_sklearn/lib/python3.4/site-packages/sklearn/mixture/gmm.py", line 498, in fit
"(or increasing n_init) or check for degenerate data.")
RuntimeError: EM algorithm was never able to compute a valid likelihood given initial parameters. Try different init parameters (or increasing n_init) or check for degenerate data.
I have tried various values for the n_init, n_iter and covariance_type parameters, and I have also tried different data sets. All of these either produce this error or a similar one with 0.16.1, but work without any problem with 0.15.2. The problem appears to be related to the initial parameters used for expectation maximization, so it may be related to #4429.
In case this is useful: I set up an anaconda environment with a fresh install of scikit-learn as follows (for version 0.16.1):
conda create -n new_sklearn python=3.4
source activate new_sklearn
conda install scikit-learn
This could be an issue with the data shape. Does X have ndim == 1 or ndim == 2? There may be an unintended breaking change between 0.15 and 0.16, but since we decided to no longer support 1-dim input going forward, the input shape should be X.shape = (n_samples, 1).
Let me know whether
X = X.reshape(-1, 1)
works for you; otherwise a 1-dim input is ambiguous as to whether it means a single sample or a single feature.
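A minimal sketch of the reshape fix, using only NumPy (the array name `data` mirrors the script above; everything else is illustrative):

```python
import numpy as np

# A 1-D array of 1000 samples, like the one produced by stats.norm.rvs(...)
data = np.random.normal(loc=100, scale=1, size=1000)
print(data.shape)   # (1000,) -- ambiguous: 1000 samples, or 1 sample with 1000 features?

# Reshape to an explicit (n_samples, n_features) column vector.
# The -1 tells NumPy to infer that dimension from the array's length.
X = data.reshape(-1, 1)
print(X.shape)      # (1000, 1) -- unambiguous: 1000 samples, 1 feature each

# Indexing with np.newaxis is an equivalent way to add the feature axis.
assert np.array_equal(X, data[:, np.newaxis])
```

With X in this shape, clf.fit(X) should be accepted under 0.16.1 as well.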
Ah, changing the shape of the input data makes it work correctly. Thanks!
Hi, the fix above removes the error, but as far as I can tell the behaviour is still different. Is there something (a parameter, perhaps) that can be changed so that running this tutorial code gives the same results? Thanks!
@imadie
In the tutorial, change the corresponding lines to the following:
clf = GMM(4, n_iter=500, random_state=3)
x.shape = (x.shape[0],1)
clf = clf.fit(x)
xpdf = np.linspace(-10, 20, 1000)
xpdf.shape = (xpdf.shape[0],1)
density = np.exp(clf.score(xpdf))