The loss functions in PyTorch


*My post explains optimizers in PyTorch.

A loss function is a function which computes the mean (average) of the losses (differences) between a model’s predictions and true values (train or test data), either to optimize a model during training or to evaluate how good a model is during testing. *A loss function is also called a Cost Function or Error Function.
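
A minimal sketch of how any loss function is used in PyTorch (the tensors below are made-up example values):

```python
import torch
import torch.nn as nn

# Made-up example data: a model's predictions and the true values.
preds = torch.tensor([2.5, 0.0, 2.0], requires_grad=True)
targets = torch.tensor([3.0, -0.5, 2.0])

loss_fn = nn.L1Loss()           # every loss below is used the same way
loss = loss_fn(preds, targets)  # one scalar (mean reduction by default)
loss.backward()                 # gradients which an optimizer can use
print(loss.item())              # 0.3333...
```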

Popular loss functions are shown below:

(1) L1 Loss:

can compute the mean (average) of the absolute losses (differences) between a model’s predictions and true values (train and test data).
‘s formula:

MAE = (1/n) Σᵢ |ŷᵢ − yᵢ|, where n is the number of samples, yᵢ is a true value and ŷᵢ is a prediction.

is used for a regression model.
is also called Mean Absolute Error (MAE).
is L1Loss() in PyTorch.

‘s pros:

It’s less sensitive to outliers and anomalies.
The losses can be easily compared because only the absolute value is taken, so their range stays small.

‘s cons:

The absolute value function is not differentiable at 0.
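
A minimal sketch of L1Loss() (made-up example values):

```python
import torch
import torch.nn as nn

preds = torch.tensor([2.0, 4.0, 6.0])
targets = torch.tensor([3.0, 4.0, 8.0])

mae = nn.L1Loss()           # 'mean' reduction by default
print(mae(preds, targets))  # tensor(1.) -> (1 + 0 + 2) / 3

# reduction='sum' or reduction='none' are also available:
print(nn.L1Loss(reduction='sum')(preds, targets))   # tensor(3.)
print(nn.L1Loss(reduction='none')(preds, targets))  # tensor([1., 0., 2.])
```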

(2) L2 Loss:

can compute the mean (average) of the squared losses (differences) between a model’s predictions and true values (train and test data).
‘s formula:

MSE = (1/n) Σᵢ (ŷᵢ − yᵢ)²

is used for a regression model.
is also called Mean Squared Error (MSE).
is MSELoss() in PyTorch.

‘s pros:

The squared loss is differentiable everywhere, including at 0.

‘s cons:

It’s sensitive to outliers and anomalies.
The losses cannot be easily compared because squaring makes large differences much larger, so their range is big.
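
A minimal sketch of MSELoss(), also showing how one outlier inflates the loss (made-up example values):

```python
import torch
import torch.nn as nn

preds = torch.tensor([2.0, 4.0, 6.0])
targets = torch.tensor([3.0, 4.0, 8.0])

mse = nn.MSELoss()
print(mse(preds, targets))  # tensor(1.6667) -> (1 + 0 + 4) / 3

# One outlier grows the loss much faster than with L1 Loss:
preds_outlier = torch.tensor([2.0, 4.0, 16.0])
print(mse(preds_outlier, targets))  # tensor(21.6667) -> (1 + 0 + 64) / 3
```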

(3) Huber Loss:

can do a computation similar to either L1 Loss or L2 Loss, depending on how the absolute losses (differences) between a model’s predictions and true values (train and test data) compare with the delta which you set.
*Memos:

delta is 1.0 by default.
Be careful: the computation is not exactly the same as L1 Loss or L2 Loss, as the formulas below show.

‘s formula. *The 1st case is the L2 Loss-like one and the 2nd case is the L1 Loss-like one:

lᵢ = 0.5 × (ŷᵢ − yᵢ)²            if |ŷᵢ − yᵢ| < δ (delta)
lᵢ = δ × (|ŷᵢ − yᵢ| − 0.5δ)      otherwise

Huber = (1/n) Σᵢ lᵢ

is used for a regression model.

is HuberLoss() in PyTorch.

with delta of 1.0 is the same as Smooth L1 Loss with beta of 1.0, which is SmoothL1Loss() in PyTorch.

‘s pros:

It’s less sensitive to outliers and anomalies.
The loss is differentiable everywhere.
The losses can be more easily compared than with L2 Loss because only small losses are squared, so their range is smaller.

‘s cons:

It needs more computation than L1 Loss and L2 Loss because the formula is more complex.
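
A minimal sketch of HuberLoss() and its relation to SmoothL1Loss() (made-up example values):

```python
import torch
import torch.nn as nn

preds = torch.tensor([2.0, 4.0, 6.0])
targets = torch.tensor([2.5, 4.0, 8.0])

huber = nn.HuberLoss(delta=1.0)  # delta=1.0 is the default
print(huber(preds, targets))     # tensor(0.5417)
# |2.0-2.5| = 0.5 < delta  -> 0.5 * 0.5^2            = 0.125
# |4.0-4.0| = 0.0 < delta  -> 0.5 * 0.0^2            = 0.0
# |6.0-8.0| = 2.0 >= delta -> 1.0 * (2.0 - 0.5*1.0)  = 1.5
# mean = (0.125 + 0.0 + 1.5) / 3 ≈ 0.5417

smooth_l1 = nn.SmoothL1Loss(beta=1.0)  # same result when delta == beta == 1.0
print(smooth_l1(preds, targets))       # tensor(0.5417)
```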

(4) BCE(Binary Cross Entropy) Loss:

can compute the mean (average) of the losses (differences) between a model’s binary predictions and true binary values (train and test data).
‘s formula:

BCE = −(1/n) Σᵢ [ yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ) ]

is used for Binary Classification. *Binary Classification is the technology to classify data into two classes.
is also called Binary Cross Entropy or Log (Logarithmic) Loss.
is BCELoss() in PyTorch.
*Memos:

There is also BCEWithLogitsLoss() in PyTorch, which combines Sigmoid Activation Function and BCE Loss in one function.
Applying the sigmoid inside the loss makes the computation more numerically stable than a separate Sigmoid followed by BCELoss().
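
A minimal sketch of BCELoss() and BCEWithLogitsLoss() (made-up example values):

```python
import torch
import torch.nn as nn

targets = torch.tensor([1.0, 0.0, 1.0])  # true binary labels

# BCELoss() expects probabilities in [0, 1], e.g. the output of a sigmoid:
probs = torch.tensor([0.9, 0.2, 0.6])
print(nn.BCELoss()(probs, targets))  # tensor(0.2798)

# BCEWithLogitsLoss() takes raw logits and applies the sigmoid internally,
# which is more numerically stable:
logits = torch.tensor([2.2, -1.4, 0.4])
print(nn.BCEWithLogitsLoss()(logits, targets))
```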

(5) Cross Entropy Loss:

can compute the mean (average) of the losses (differences) between a model’s predictions and true values (train and test data). *A loss is 0 or greater because it is −log of the predicted probability of the true class.
‘s formula:

CE = −(1/n) Σᵢ log(pᵢ), where pᵢ is the predicted probability of the true class of the i-th sample

is used for Multiclass Classification and Computer Vision.
*Memos:

Multiclass Classification is the technology to classify data into multiple classes.
Computer Vision is the technology which enables a computer to understand objects in images and videos.

is CrossEntropyLoss() in PyTorch.
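
A minimal sketch of CrossEntropyLoss() (made-up example values). It expects raw logits, not probabilities, because it applies LogSoftmax internally:

```python
import torch
import torch.nn as nn

# Made-up example: 2 samples, 3 classes.
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.3, 2.5, 0.2]])
targets = torch.tensor([0, 1])  # the true class index of each sample

ce = nn.CrossEntropyLoss()
print(ce(logits, targets))  # mean of -log(softmax(logits)[i, targets[i]])
```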
