Pytorch: [PyTorch][Feature Request] Label Smoothing for CrossEntropyLoss

Created on May 10, 2018 · 22 comments · Source: pytorch/pytorch

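Hi, guys. Because target has to be a torch.LongTensor, label smoothing as implemented in some methods in the reference is hard to do. Would it be possible to add a label_smoothing argument to torch.nn.CrossEntropyLoss(), or at least add documentation showing how to convert target into a one-hot vector that works with torch.nn.CrossEntropyLoss(), or some other simple way? Thanks.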

cc @ezyang @gchanan @zou3519 @bdhirsh @albanD @mruberry

enhancement high priority loss nn triage review triaged

Source

KaiyuYue

👍51

All 22 comments

@KaiyuYue
For label_smoothing, you can look at the NJUNMT-pytorch implementation, in the class NMTCritierion:

https://github.com/whr94621/NJUNMT-pytorch/blob/aff968c0da9273dc42eabbb8ac4e459f9195f6e4/src/modules/criterions.py#L131

whr94621 on May 10, 2018

👍4

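See https://discuss.pytorch.org/t/cross-entropy-with-one-hot-targets/13580/5. The cross_entropy() function shown there should work with smoothed labels that have the same dimension as the network outputs.

I don't think CrossEntropyLoss() should directly support a label_smoothing option, since label smoothing can be done in many different ways and the smoothing itself is easy for users to do manually. But I agree that the docs should at least mention how to deal with targets that can't be represented by scalar values, or that support for passing (k-hot/smoothed) targets to CrossEntropyLoss should be added.

mdraw on May 13, 2018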
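A minimal sketch of the kind of soft-target cross entropy described above, assuming logits and targets of shape (batch, classes). The name soft_cross_entropy is hypothetical and this is not the code from the linked forum post; it only illustrates that the loss can be written directly against a probability vector per sample.

```python
import torch
import torch.nn.functional as F


def soft_cross_entropy(logits: torch.Tensor, soft_targets: torch.Tensor) -> torch.Tensor:
    """Mean cross entropy between softmax(logits) and per-class target probabilities.

    Assumes both tensors have shape (batch, classes) and that every row of
    soft_targets sums to 1.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()


torch.manual_seed(0)
logits = torch.randn(4, 5)
hard = torch.tensor([0, 2, 1, 4])
one_hot = F.one_hot(hard, num_classes=5).float()

# With one-hot rows the result should agree with the built-in hard-label loss.
print(soft_cross_entropy(logits, one_hot), F.cross_entropy(logits, hard))
```

Any smoothed or k-hot target tensor of the same shape can be passed in place of one_hot.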

Maybe something like NonSparseCrossEntropy is needed? (Well, it's hard to name.)

Jiaming-Liu on May 16, 2018

👍2

Here is my implementation:

import torch
import torch.nn as nn

class LabelSmoothingLoss(nn.Module):
    def __init__(self, classes, smoothing=0.0, dim=-1):
        super().__init__()
        self.confidence = 1.0 - smoothing  # probability kept on the target class
        self.smoothing = smoothing
        self.cls = classes
        self.dim = dim

    def forward(self, pred, target):
        pred = pred.log_softmax(dim=self.dim)
        with torch.no_grad():
            # Spread `smoothing` evenly over the (classes - 1) non-target classes.
            true_dist = torch.full_like(pred, self.smoothing / (self.cls - 1))
            # Place the confidence at each target index along the class dimension.
            true_dist.scatter_(self.dim, target.unsqueeze(self.dim), self.confidence)
        return torch.mean(torch.sum(-true_dist * pred, dim=self.dim))

PistonY on Jul 19, 2019

👍78 🎉15 ❤4

I agree with @mdraw.
A good choice is to do it in two steps:

1. Use a function to get the smoothed labels:

import torch

def smooth_one_hot(true_labels: torch.Tensor, classes: int, smoothing=0.0):
    """
    if smoothing == 0, it's the one-hot method
    if 0 < smoothing < 1, it's the smooth method
    """
    assert 0 <= smoothing < 1
    confidence = 1.0 - smoothing
    label_shape = torch.Size((true_labels.size(0), classes))
    with torch.no_grad():
        # Every class starts at smoothing / (classes - 1) ...
        true_dist = torch.empty(size=label_shape, device=true_labels.device)
        true_dist.fill_(smoothing / (classes - 1))
        # ... then the target class is overwritten with the confidence.
        true_dist.scatter_(1, true_labels.unsqueeze(1), confidence)
    return true_dist

2. Make CrossEntropyLoss support k-hot/smoothed targets.

Then we can use it like this:

# NonSparse=True is the option proposed here, not an existing argument.
Loss = CrossEntropyLoss(NonSparse=True, ...)
. . .
data = ...
labels = ...

outputs = model(data)

smooth_label = smooth_one_hot(labels, ...)
loss = Loss(outputs, smooth_label)
...

By the way, I tested my implementation on ImageNet and it looks good.

| Model | Epochs | dtype | Batch size* | GPUs | lr | Tricks | top1/top5 | Improvement |
|:-----:|:------:|:-----:|:-----------:|:----:|:--:|:------:|:---------:|:-----------:|
| resnet50 | 120 | FP16 | 128 | 8 | 0.4 | - | 77.35/- | baseline |
| resnet50 | 120 | FP16 | 128 | 8 | 0.4 | label smoothing | 77.78/93.80 | +0.43 |

PistonY on Jul 22, 2019

👍25 🚀8 ❤3 🎉3

I think @zhangguanheng66 said this is something he might be able to look at in the future.

ezyang on Jul 22, 2019

Just use torch.nn.KLDivLoss. It is the same.

Update: it is not the same.

suanrong on Jul 29, 2019

👍21 👎15

I believe this is similar to what the new Snorkel lib implemented:
https://snorkel.readthedocs.io/en/master/packages/_autosummary/classification/snorkel.classification.cross_entropy_with_probs.html

Just some extra info on how people are working around the issue.

sadikneipp on Sep 5, 2019

See https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/RN50v1.5 for how Nvidia does it; it might be of help.

Data-drone on Sep 24, 2019

👍4

@suanrong Thanks a lot.

====
And maybe this is helpful for others who read this issue.

Cross entropy for labels that are not 0/1 is not symmetric, which could explain the poor performance.
https://discuss.pytorch.org/t/cross-entropy-for-soft-label/16093/2

steermomo on Oct 18, 2019

❤1

Suggested implementation:

import torch.nn as nn
import torch.nn.functional as F

class LabelSmoothLoss(nn.Module):

    def __init__(self, smoothing=0.0):
        super().__init__()
        self.smoothing = smoothing

    def forward(self, input, target):
        log_prob = F.log_softmax(input, dim=-1)
        # Non-target classes share `smoothing` equally.
        weight = input.new_ones(input.size()) * \
            self.smoothing / (input.size(-1) - 1.)
        # The target class gets the remaining 1 - smoothing.
        weight.scatter_(-1, target.unsqueeze(-1), (1. - self.smoothing))
        loss = (-weight * log_prob).sum(dim=-1).mean()
        return loss

I have checked that:
(1) when smoothing=0.0, the output matches nn.CrossEntropyLoss to within 1e-5;
(2) when smoothing>0.0, the weights summed over the classes, weight.sum(dim=-1), are always 1.

huanglianghua on Dec 21, 2019

👍7

The implementations here lack a class-weights feature.
((

hadaev8 on Mar 9, 2020

👍6

> Just use torch.nn.KLDivLoss. It is the same.

Could you please elaborate?

alshahrani2030 on May 4, 2020

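> Just use torch.nn.KLDivLoss. It is the same.
> Could you please elaborate?

Suppose you already have the smoothed labels. Then you can just use torch.nn.KLDivLoss, because the difference between the two losses is the entropy of the labels, which is constant.

suanrong on May 12, 2020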

👍1

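@PistonY Why not use this simpler way?

import torch

def smooth_labels(labels, classNum, smoothing_factor):
    # `labels` is assumed to already be a one-hot float tensor.
    with torch.no_grad():
        confidence = 1.0 - smoothing_factor
        true_dist = torch.mul(labels, confidence)
        # The offset is added to every class, the target included, so each
        # row sums to 1 + smoothing_factor / (classNum - 1).
        true_dist = torch.add(true_dist, smoothing_factor / (classNum - 1))
    return true_dist

jasstionzyf on May 21, 2020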

👍1

> The implementations here lack a class-weights feature.

Can we multiply the smoothed label tensor by the class weights?

skull3r7 on Aug 25, 2020
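One possible reading of that suggestion, as a sketch rather than an established recipe: scale each class's term by its weight before summing. The function name weighted_soft_cross_entropy and the choice to return unreduced per-sample losses are assumptions made for illustration; how to average (by batch size or by total weight) is left open.

```python
import torch
import torch.nn.functional as F


def weighted_soft_cross_entropy(logits, soft_targets, class_weights):
    """Per-sample loss: -sum_k w[k] * y[k] * log_softmax(logits)[k].

    logits and soft_targets have shape (batch, classes);
    class_weights has shape (classes,).
    """
    log_probs = F.log_softmax(logits, dim=-1)
    return -(class_weights * soft_targets * log_probs).sum(dim=-1)


torch.manual_seed(0)
logits = torch.randn(3, 4)
target = torch.tensor([1, 0, 3])
weights = torch.tensor([1.0, 2.0, 0.5, 1.5])

# Sanity check with one-hot labels: this should reduce to the built-in
# weighted loss when no reduction is applied.
one_hot = F.one_hot(target, num_classes=4).float()
per_sample = weighted_soft_cross_entropy(logits, one_hot, weights)
reference = F.cross_entropy(logits, target, weight=weights, reduction="none")
print(per_sample, reference)
```

With smoothed labels, the non-target classes contribute their own weighted terms, so the result no longer depends on the target's weight alone.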

def smooth_one_hot(true_labels: torch.Tensor, classes: int, smoothing=0.0):
    """
    if smoothing == 0, it's the one-hot method
    if 0 < smoothing < 1, it's the smooth method
    """
    assert 0 <= smoothing < 1
    confidence = 1.0 - smoothing
    label_shape = torch.Size((true_labels.size(0), classes))
    with torch.no_grad():
        true_dist = torch.empty(size=label_shape, device=true_labels.device)
        true_dist.fill_(smoothing / (classes - 1))
        true_dist.scatter_(1, true_labels.unsqueeze(1), confidence)
    return true_dist

The problem with this implementation is that it is very sensitive to the number of classes.

With n_classes = 2, any smoothing above 0.5 flips the labels, which I'm sure is not what the user wants. With n_classes = 3 the limit is 2/3, and for 4 classes it is 0.75. So maybe:

assert 0 <= smoothing < (classes-1)/classes

would catch this, but I feel the smoothing perhaps needs to take the number of classes into account?

jphdotam on Sep 15, 2020

👀1
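The thresholds quoted above follow from setting the target probability 1 - s equal to the non-target probability s / (K - 1), which gives s = (K - 1) / K. A small standard-library sketch of that arithmetic, with hypothetical helper names:

```python
def smoothed_probs(num_classes: int, smoothing: float):
    """Return (target_prob, other_prob) under the 1 - s and s / (K - 1) scheme."""
    target_prob = 1.0 - smoothing
    other_prob = smoothing / (num_classes - 1)
    return target_prob, other_prob


def max_safe_smoothing(num_classes: int) -> float:
    """Smoothing value at which target and non-target probabilities meet."""
    return (num_classes - 1) / num_classes


for k in (2, 3, 4):
    print(k, max_safe_smoothing(k))

# Two classes with smoothing 0.6: the target ends up below the other class.
t, o = smoothed_probs(2, 0.6)
print(t < o)
```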

> assert 0 <= smoothing < (classes-1)/classes would catch this, but I feel the smoothing perhaps needs to take the number of classes into account?

That's a sensible idea.

PistonY on Sep 15, 2020

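Thanks for the discussion. A few points are unclear to me and look like mistakes:

1. The weights tensor in @PistonY's implementation
2. The equivalence between KL divergence and label smoothing (@suanrong)

About the weights:

The label smoothing paper gives y_k = smoothing / n_classes + (1 - smoothing) * y_{one hot}. So the value of the weights is smoothing / n_classes for the indices other than the target, and smoothing / n_classes + (1 - smoothing) for the target class. But in @PistonY's implementation, the torch.scatter_ function overwrites the target value with (1 - smoothing) (and the constant term vanishes).
Moreover, I don't understand why n_classes - 1 is used in the computation (?)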
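The two conventions can be compared side by side. The sketch below uses plain Python lists and hypothetical function names; it suggests that both produce a valid distribution, and that the K - 1 variant with smoothing eps * (K - 1) / K gives the same vector as the paper's formula with smoothing eps.

```python
def smooth_paper(target: int, num_classes: int, eps: float):
    """eps / K on every class, plus (1 - eps) on the target."""
    return [eps / num_classes + (1.0 - eps) * (k == target) for k in range(num_classes)]


def smooth_k_minus_1(target: int, num_classes: int, eps: float):
    """(1 - eps) on the target, eps / (K - 1) on every other class."""
    return [1.0 - eps if k == target else eps / (num_classes - 1) for k in range(num_classes)]


K, eps = 5, 0.1
paper = smooth_paper(2, K, eps)
variant = smooth_k_minus_1(2, K, eps)
# Rescaling the smoothing value maps one convention onto the other.
rescaled = smooth_k_minus_1(2, K, eps * (K - 1) / K)

print(paper)
print(variant)
print(rescaled)
```

So the difference between the two is a reparameterization of the smoothing strength rather than a different family of targets.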

About the equivalence between KL divergence and label smoothing:

With the weights y mentioned above, the label smoothing cross entropy loss reads

LS(x, y) = - sum_k {y[k] * log-prob(x)[k]}
         = - sum_k {y[k] * log(exp(x[k]) / (sum_j exp(x[j])))}
         = - sum_k {y[k] * (x[k] - log-sum-exp(x))}
         = - sum_k {y[k] * x[k]} + log-sum-exp(x)

where going from the third to the fourth line uses the fact that sum_k y[k] = smoothing / n_classes * n_classes + (1 - smoothing) = 1.

The KL divergence loss reads

KL(x, y) = - sum_k {y[k] * x[k] - y[k] * log(y[k])}
         = - sum_k {y[k] * x[k]} + sum_k {y[k] * log(y[k])}
         = - sum_k {y[k] * x[k]} - Const.

So in the end LS(x, y) = KL(x, y) + log-sum-exp(x) + Const., where Const. is the constant term corresponding to the entropy of y, which is indeed constant in the multi-class setting. But what about the log-sum-exp term?

This shows that label smoothing is equivalent to KLDiv loss + log-sum-exp, which accepts soft targets y. Is there an assumption on the logits that makes it reasonable to drop this term?

Thanks a lot for the clarifications.
Cheers!

antrec on Oct 28, 2020

👍1

@antrec Thanks.

You are right. I made a mistake by ignoring the logsoftmax function.

suanrong on Oct 29, 2020

👍1
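A numeric sketch of the relationship discussed in the last two comments, using only the standard library. It assumes the pointwise KL form y[k] * (log y[k] - input[k]), which is the form torch.nn.KLDivLoss documents for an input given as log-probabilities; all function names here are hypothetical. When the input is log_softmax(x), the gap between the two losses is just the entropy of y; when raw logits are passed instead, the extra log-sum-exp(x) term from the derivation appears.

```python
import math


def log_sum_exp(x):
    m = max(x)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in x))


def log_softmax(x):
    lse = log_sum_exp(x)
    return [v - lse for v in x]


def cross_entropy(log_p, y):
    return -sum(yk * lp for yk, lp in zip(y, log_p))


def kl_div(inp, y):
    """sum_k y[k] * (log y[k] - inp[k]); inp is treated as log-probabilities."""
    return sum(yk * (math.log(yk) - v) for yk, v in zip(y, inp) if yk > 0)


def entropy(y):
    return -sum(yk * math.log(yk) for yk in y if yk > 0)


x = [2.0, -1.0, 0.5, 0.3]          # logits
y = [0.025, 0.925, 0.025, 0.025]   # smoothed label, sums to 1

ls = cross_entropy(log_softmax(x), y)
gap_log_probs = ls - kl_div(log_softmax(x), y)  # input is log-probabilities
gap_raw_logits = ls - kl_div(x, y)              # input is raw logits

print(gap_log_probs, entropy(y))
print(gap_raw_logits, entropy(y) + log_sum_exp(x))
```

Under these assumptions the log-sum-exp term is not dropped; it is absorbed by applying log_softmax before the KL loss.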

An implementation of the label smoothing cross entropy loss function:

import torch.nn as nn
import torch.nn.functional as F

def linear_combination(x, y, epsilon):
    # epsilon weights x (the uniform term); 1 - epsilon weights y (the NLL term).
    return epsilon*x + (1-epsilon)*y

def reduce_loss(loss, reduction='mean'):
    return loss.mean() if reduction=='mean' else loss.sum() if reduction=='sum' else loss

class LabelSmoothingCrossEntropy(nn.Module):
    def __init__(self, epsilon:float=0.1, reduction='mean'):
        super().__init__()
        self.epsilon = epsilon
        self.reduction = reduction

    def forward(self, preds, target):
        n = preds.size()[-1]
        log_preds = F.log_softmax(preds, dim=-1)
        # Cross entropy against the uniform distribution (before dividing by n).
        loss = reduce_loss(-log_preds.sum(dim=-1), self.reduction)
        # Ordinary negative log-likelihood of the hard target.
        nll = F.nll_loss(log_preds, target, reduction=self.reduction)
        return linear_combination(loss/n, nll, self.epsilon)

wangleiofficial on Nov 1, 2020

👍5
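For readers on a recent PyTorch: as far as the official documentation indicates, releases from 1.10 onward accept a label_smoothing argument on cross_entropy and CrossEntropyLoss, which mixes the hard target with a uniform distribution over all classes. The sketch below assumes such a version is installed and restates the loss above in a hypothetical functional form (label_smoothing_ce, mean reduction only) to compare the two.

```python
import torch
import torch.nn.functional as F


def label_smoothing_ce(logits, target, epsilon=0.1):
    """Functional restatement of the loss above, with reduction='mean'."""
    n = logits.size(-1)
    log_preds = F.log_softmax(logits, dim=-1)
    # Cross entropy against the uniform distribution over n classes.
    uniform_term = -log_preds.sum(dim=-1).mean() / n
    # Ordinary negative log-likelihood of the hard target.
    nll = F.nll_loss(log_preds, target)
    return epsilon * uniform_term + (1 - epsilon) * nll


torch.manual_seed(0)
logits = torch.randn(8, 10)
target = torch.randint(0, 10, (8,))

manual = label_smoothing_ce(logits, target, epsilon=0.1)
# Assumes PyTorch >= 1.10, where label_smoothing is available.
builtin = F.cross_entropy(logits, target, label_smoothing=0.1)
print(manual, builtin)
```

The implementations earlier in the thread that divide by classes - 1 use a different parameterization, so they would not match the built-in value for the same smoothing number.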

Moving to hi-pri based on activity.

zou3519 on Nov 17, 2020
