Pytorch: μ˜΅ν‹°λ§ˆμ΄μ € load_state_dict() 문제?

Created on 22 Sep 2017  ·  23 comments  ·  Source: pytorch/pytorch

Hi, I ran into this bug:

    optimizer.step()
    exp_avg.mul_(beta1).add_(1 - beta1, grad)

TypeError: add_ received an invalid combination of arguments - got (float, torch.cuda.FloatTensor), but expected one of:
 * (float value)
 * (torch.FloatTensor other)
 * (torch.SparseFloatTensor other)
 * (float value, torch.FloatTensor other)
      didn't match because some of the arguments have invalid types: (float, torch.cuda.FloatTensor)
 * (float value, torch.SparseFloatTensor other)
      didn't match because some of the arguments have invalid types: (float, torch.cuda.FloatTensor)

μ½”λ“œ μŠ€μΌˆλ ˆν†€μ€ λ‹€μŒκ³Ό κ°™μŠ΅λ‹ˆλ‹€.

model = Model()
model.load_state_dict(checkpoint['model'])
model.cuda()

optimizer = optim.Adam()
optimizer.load_state_dict(checkpoint['optimizer'])

...
#  In train loop
for epoch in range(...):
  ...
  optimizer.step()
     -> BUG <-

λ‘œλ“œλœ param_groups κ°€ torch.cuda.FloatTensor 인 것 κ°™μœΌλ©° ν•΄κ²° 방법을 μ‹œλ„ν–ˆμŠ΅λ‹ˆλ‹€.
optmizer.param_groups λ₯Ό cpu 둜 μ΄λ™ν•˜μ§€λ§Œ μ—¬μ „νžˆ λ™μΌν•œ 버그가 μžˆμŠ΅λ‹ˆλ‹€.
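For context, a minimal CPU-runnable sketch of the resume pattern above (a toy nn.Linear stands in for the real model, which is not in the original report). It shows where the optimizer's Adam state tensors end up after load_state_dict():

```python
import torch
from torch import nn, optim

# Toy stand-in for the real model; illustrative only.
model = nn.Linear(4, 2)
optimizer = optim.Adam(model.parameters())

# One step so Adam creates its exp_avg / exp_avg_sq state tensors.
model(torch.randn(1, 4)).sum().backward()
optimizer.step()

checkpoint = {'model': model.state_dict(), 'optimizer': optimizer.state_dict()}

# Resume: fresh model and optimizer, then load the saved states.
model2 = nn.Linear(4, 2)
optimizer2 = optim.Adam(model2.parameters())
optimizer2.load_state_dict(checkpoint['optimizer'])

# Collect the devices the restored state tensors live on.
devices = {v.device.type for s in optimizer2.state.values()
           for v in s.values() if torch.is_tensor(v)}
print(devices)  # on a CPU-only run this is {'cpu'}
```

On a checkpoint produced on GPU, the set above can disagree with the model's device, which is exactly the mismatch step() trips over.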

awaiting response (this tag is deprecated) needs reproduction

Most helpful comment

@apaszke μ•„, λ‚΄ λ‚˜μœ. μ˜΅ν‹°λ§ˆμ΄μ €κ°€ λ‹€μ‹œ μƒμ„±λ˜λŠ” 쀄을 μ—…λ°μ΄νŠΈν•˜λŠ” 것을 μžŠμ—ˆμŠ΅λ‹ˆλ‹€. κ·ΈλŸ¬λ‚˜ 그렇지 μ•ŠμœΌλ©΄ λ‹€μŒ μž‘μ—…μ„ μˆ˜ν–‰ν•΄μ•Ό ν•©λ‹ˆλ‹€. λ§žμŠ΅λ‹ˆκΉŒ?

model = Model()
model.load_state_dict(checkpoint['model'])
model.cuda()
optimizer = optim.Adam(model.parameters())
optimizer.load_state_dict(checkpoint['optimizer'])
for state in optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.cuda()

All 23 comments

Can you provide a full script that reproduces the issue?

Maybe you can try this:

optimizer.step()
exp_avg.mul_(beta1).add_(1 - beta1, grad.cpu())

Sorry, I missed the reply email.

μ§€κΈˆ μž¬μƒκΈ°λ₯Ό μ œκ³΅ν•  수 μ—†μŠ΅λ‹ˆλ‹€. OpenNMT-py ν”„λ‘œμ νŠΈ : https://github.com/OpenNMT/OpenNMT-pyμ—μ„œ lr μ—…λ°μ΄νŠΈλ₯Ό μœ„ν•΄ lr_scheduler λ₯Ό μ‚¬μš©ν•˜λ €κ³  ν•˜λŠ” μž‘μ—…μž…λ‹ˆλ‹€. 그리고 resume a suspended training μΌ€μ΄μŠ€λ₯Ό ν…ŒμŠ€νŠΈν•  λ•Œ 이 λ¬Έμ œκ°€ λ°œμƒν–ˆμŠ΅λ‹ˆλ‹€. κ·Έλž˜μ„œ μœ„μ—μ„œ 이 λ¬Έμ œμ— λŒ€ν•œ μ½”λ“œ 골격을 μ œμ™Έν–ˆμŠ΅λ‹ˆλ‹€.

I tried several approaches, including the trick @hefeicyp suggests, but it still happens.

My analysis: because the previous training ran on GPU, the state (tensors) saved in optimizer.state_dict() are the cuda versions. When resuming, load_state_dict() loads those cuda versions onto the CPU (a model (nn.Module) can easily be moved to the GPU, but torch's optimizer seems to lack that ability?), and that's where this problem appears.

μ²΄ν¬ν¬μΈνŠΈμ—μ„œ λ‘œλ“œν•œ ν›„ μ˜΅ν‹°λ§ˆμ΄μ € μƒνƒœλ₯Ό GPU λ©”λͺ¨λ¦¬λ‘œ μˆ˜λ™μœΌλ‘œ 이동해 λ³΄μ‹­μ‹œμ˜€.

optimizer = optim.Adam()
optimizer.load_state_dict(checkpoint['optimizer'])
for state in optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.cuda()

Agreed that it would be nice to have an optimizer.cuda() method for this.
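The loop above generalizes into a small helper in the spirit of the suggested optimizer.cuda(). Note `optimizer_to` is a hypothetical name for this sketch, not part of the torch API:

```python
import torch

def optimizer_to(optimizer, device):
    """Move every tensor in an optimizer's state to `device`.

    Hypothetical helper sketching the optimizer.cuda() idea; works for
    any device (e.g. torch.device('cuda'), torch.device('cpu')).
    """
    for state in optimizer.state.values():
        for k, v in state.items():
            if torch.is_tensor(v):
                state[k] = v.to(device)
    return optimizer
```

Usage would be e.g. `optimizer_to(optimizer, torch.device('cuda'))` right after `optimizer.load_state_dict(...)`.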

@dogancan , κ°μ‚¬ν•©λ‹ˆλ‹€. λ‹€λ₯Έ 문제둜 인해 μž‘μ—…μ΄ μ€‘λ‹¨λ˜μ—ˆμŠ΅λ‹ˆλ‹€. 재개되면 κ·€ν•˜μ˜ 방법을 μ‹œλ„ν•˜κ² μŠ΅λ‹ˆλ‹€.

@dogancan 의 μ†”λ£¨μ…˜μ΄ μž‘λ™ν•˜μ§€ μ•Šμ„ 것 κ°™μŠ΅λ‹ˆλ‹€. 그러면 였λ₯˜κ°€ μ‚¬λΌμ§€μ§€λ§Œ μ˜΅ν‹°λ§ˆμ΄μ €λŠ” 더 이상 λͺ¨λΈμ„ ν›ˆλ ¨ν•˜μ§€ μ•ŠμŠ΅λ‹ˆλ‹€. λͺ¨λ“ˆμ„ λ‹€λ₯Έ μœ ν˜•μ΄λ‚˜ μž₯치둜 μΊμŠ€νŒ…ν•œ ν›„ μ˜΅ν‹°λ§ˆμ΄μ €λ₯Ό λ‹€μ‹œ 생성해야 ν•˜λ©° load_state_dict λ₯Ό μ‚¬μš©ν•˜μ—¬ 이전 λ³΅μ‚¬λ³Έμ—μ„œ μƒνƒœλ₯Ό 볡원할 수 μžˆμŠ΅λ‹ˆλ‹€. 이것은 ν˜„μž¬ μž‘λ™ν•˜μ§€ μ•Šμ§€λ§Œ μˆ˜μ •ν•΄μ•Ό ν•©λ‹ˆλ‹€(ν…μ„œλ₯Ό 직접 μ‚¬μš©ν•˜λŠ” λŒ€μ‹  μƒνƒœ λ”•μ…”λ„ˆλ¦¬μ—μ„œ 데이터λ₯Ό λ³΅μ‚¬ν•˜μ—¬ ꡐ차 μž₯치 λ˜λŠ” ꡐ차 μœ ν˜• μ—…λ°μ΄νŠΈλ₯Ό ν—ˆμš©ν•©λ‹ˆλ‹€).

@apaszke, yes, your method is what I'm using now, and it works. But I'll wait until this is fixed upstream. Thanks for your great work!

@apaszke μ•„, λ‚΄ λ‚˜μœ. μ˜΅ν‹°λ§ˆμ΄μ €κ°€ λ‹€μ‹œ μƒμ„±λ˜λŠ” 쀄을 μ—…λ°μ΄νŠΈν•˜λŠ” 것을 μžŠμ—ˆμŠ΅λ‹ˆλ‹€. κ·ΈλŸ¬λ‚˜ 그렇지 μ•ŠμœΌλ©΄ λ‹€μŒ μž‘μ—…μ„ μˆ˜ν–‰ν•΄μ•Ό ν•©λ‹ˆλ‹€. λ§žμŠ΅λ‹ˆκΉŒ?

model = Model()
model.load_state_dict(checkpoint['model'])
model.cuda()
optimizer = optim.Adam(model.parameters())
optimizer.load_state_dict(checkpoint['optimizer'])
for state in optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.cuda()

Ah, right. That should work 😊

Except that you should use torch.is_tensor(v) instead of isinstance(v, torch.Tensor).

I had a similar problem. If you save the optimizer state on a GPU other than GPU 0 and then load it, everything still gets loaded onto GPU 0. Specifying map_location in torch.load() didn't work either. @dogancan's solution fixes this.

Hi folks, I have a problem very similar to the one in this thread. My code is:

model = inceptionresnetv2(num_classes=config['tr_classes'])
model = torch.nn.DataParallel(model).cuda()
model.load_state_dict(checkpoint['md_state_dict'])
optimizer = torch.optim.Adam(model.parameters(), lr=config['tr_lr'], weight_decay=config['tr_weightdecay'])
optimizer.load_state_dict(checkpoint['md_optimizer'])
for state in optimizer.state.values():
    for k, v in state.items():
        if torch.is_tensor(v):
            state[k] = v.cuda()

And when I restart, I get KeyErrors from the optimizer:

---> 40         optimizer.step()
     41 
     42         config['am_batch_time'].update(time.time() - end)
~/.conda/envs/env_pytorch/lib/python3.5/site-packages/torch/optim/adam.py in step(self, closure)
     44                     continue
     45                 grad = p.grad.data
---> 46                 state = self.state[p]
     47 
     48                 # State initialization
KeyError: Parameter containing:
(0 ,0 ,.,.) = 
 -1.6336e-01 -5.6482e-01 -4.2228e-02
...
[torch.cuda.FloatTensor of size 32x3x3x3 (GPU 0)]

Do you know how to fix this? BTW, I'm using 8 GPUs, and I suspect that's the cause.

@CodArs-van were you able to solve the multi-GPU problem?

@rafaelvalle question

Just a comment: this problem comes from

    def load_state_dict(self, state_dict):
        ...
        # deepcopy, to be consistent with module API
        state_dict = deepcopy(state_dict)
       ...  

deepcopy moves all the state tensors to GPU 0.
So moving the optimizer's state to the specific GPU solves this problem.

Hi @lzcn, how can I know the specific GPU location of the other tensors in advance?

Would it be feasible for every torch.save() call to always use an automatically created CPU version?
And then on restart, torch.load() would use the "current" device in use (or some better strategy).
It seems a lot of boilerplate code is needed to keep save and load consistent across the devices of the current model/optimizer/scheduler/etc.

λΉ„μŠ·ν•œ λ¬Έμ œκ°€ λ°œμƒν•˜μ—¬ @dogancan 의 μ†”λ£¨μ…˜μ— 따라 model, model.cuda() 및 DataParallel(model)을 λ‹€μ‹œ λ‘œλ“œν•œ ν›„ optimizer.cuda() 없이 Adam μ΅œμ ν™” ν”„λ‘œκ·Έλž¨μ„ λ‹€μ‹œ

κ°μ‚¬ν•©λ‹ˆλ‹€. μž‘λ™ν•©λ‹ˆλ‹€!

@apaszke μ•„, λ‚΄ λ‚˜μœ. μ˜΅ν‹°λ§ˆμ΄μ €κ°€ λ‹€μ‹œ μƒμ„±λ˜λŠ” 쀄을 μ—…λ°μ΄νŠΈν•˜λŠ” 것을 μžŠμ—ˆμŠ΅λ‹ˆλ‹€. κ·ΈλŸ¬λ‚˜ 그렇지 μ•ŠμœΌλ©΄ λ‹€μŒ μž‘μ—…μ„ μˆ˜ν–‰ν•΄μ•Ό ν•©λ‹ˆλ‹€. λ§žμŠ΅λ‹ˆκΉŒ?

model = Model()
model.load_state_dict(checkpoint['model'])
model.cuda()
optimizer = optim.Adam(model.parameters())
optimizer.load_state_dict(checkpoint['optimizer'])
for state in optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.cuda()

@apaszke
Hi, as you said we have to rebuild the optimizer whenever the model is moved to a different device. But if I move the model to another device and then move it back, do I have to rebuild the optimizer?
Here is some example code:

model = Model()
model.cuda()
optimizer = optim.Adam(model.parameters())

for d, gt in trn_dataloader:
    # train
    ... 
    optimizer.step()
    model.cpu() # move to cpu
    # eval or do other things
    ...
    model.cuda()  # but finally, move back

μ΅œμ ν™” ν”„λ‘œκ·Έλž¨μ΄ μ˜ˆμƒλŒ€λ‘œ μ‹€ν–‰λ©λ‹ˆκΉŒ?

λ˜ν•œ model.to(model.device) ν•˜λ©΄ μ˜΅ν‹°λ§ˆμ΄μ €λ₯Ό λ‹€μ‹œ λΉŒλ“œν•΄μ•Ό ν•©λ‹ˆκΉŒ?

@apaszke μ•„, λ‚΄ λ‚˜μœ. μ˜΅ν‹°λ§ˆμ΄μ €κ°€ λ‹€μ‹œ μƒμ„±λ˜λŠ” 쀄을 μ—…λ°μ΄νŠΈν•˜λŠ” 것을 μžŠμ—ˆμŠ΅λ‹ˆλ‹€. κ·ΈλŸ¬λ‚˜ 그렇지 μ•ŠμœΌλ©΄ λ‹€μŒ μž‘μ—…μ„ μˆ˜ν–‰ν•΄μ•Ό ν•©λ‹ˆλ‹€. λ§žμŠ΅λ‹ˆκΉŒ?

model = Model()
model.load_state_dict(checkpoint['model'])
model.cuda()
optimizer = optim.Adam(model.parameters())
optimizer.load_state_dict(checkpoint['optimizer'])
for state in optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.cuda()

@apaszke μ΄λ ‡κ²Œ μˆœμ„œλ₯Ό

```python
model = Model()
model.to('cuda')
optimizer = optim.Adam(model.parameters())
optimizer.load_state_dict(checkpoint['optimizer'])
for state in optimizer.state.values():
    for k, v in state.items():
        if isinstance(v, torch.Tensor):
            state[k] = v.cuda()
model.load_state_dict(checkpoint['model'])
```

λͺ¨λΈμ„ 'cuda'둜 μ΄λ™ν•˜μ§€λ§Œ μ˜΅ν‹°λ§ˆμ΄μ €μ˜ μƒνƒœ λ”•μ…”λ„ˆλ¦¬λ₯Ό λ¨Όμ € λ‘œλ“œν•œ ν›„ μ²΄ν¬ν¬μΈνŠΈμ—μ„œ μƒνƒœ λ”•μ…”λ„ˆλ¦¬λ§Œ λ‘œλ“œν•œλ‹€λŠ” μ˜λ―Έμž…λ‹ˆκΉŒ?

λ¬Έμ œλŠ” μ˜΅ν‹°λ§ˆμ΄μ €μ˜ μƒνƒœκ°€ λͺ¨λΈκ³Ό λ™μΌν•˜κ²Œ μž₯μΉ˜μ— λ‘œλ“œλœλ‹€λŠ” 결둠을 내릴 수 μžˆμŠ΅λ‹ˆλ‹€. λ¨Όμ € λͺ¨λΈμ„ GPU에 λ‘œλ“œν•œ λ‹€μŒ μ˜΅ν‹°λ§ˆμ΄μ €μ˜ μƒνƒœλ₯Ό λ‘œλ“œν•΄μ•Ό ν•©λ‹ˆλ‹€. λͺ¨λΈκ³Ό μ΅œμ ν™” ν”„λ‘œκ·Έλž¨μ˜ μƒνƒœκ°€ λͺ¨λ‘ GPU에 λ‘œλ“œλ˜λ„λ‘ ν•©λ‹ˆλ‹€.

Instead of loading on CPU and then moving the optimizer to CUDA, you can load the checkpoint directly onto CUDA:

model.to(device)

ckpt = torch.load(<model_path>, map_location=device)

model.load_state_dict(ckpt['state_dict'])
optimizer.load_state_dict(ckpt['optimizer'])
scheduler.load_state_dict(ckpt['scheduler'])

del ckpt
이 νŽ˜μ΄μ§€κ°€ 도움이 λ˜μ—ˆλ‚˜μš”?
0 / 5 - 0 λ“±κΈ‰