2024-07-12
I had run into BatchNorm and LayerNorm before, but only ever had a vague sense that they both perform standardization:

$$\bm{x} = \frac{\bm{x} - \bm{\mu}}{\bm{\sigma}}$$

The formula itself is simple; it is the data that is more complicated. In CV tasks the data are images, and a batch usually has shape [B, C, H, W]; in NLP tasks the data are sentences, and a batch usually has shape [B, T, D]. What actually differs between the various XXNorm layers is how $\bm{\mu}$ and $\bm{\sigma}$ are computed.
So I went through quite a bit of material, and I really agree with what a teacher of mine said: write-ups you merely read are never as good as ones you work through yourself! See also the comment section of: Why do Transformers use layer norm instead of batch norm?
This section goes through how BatchNorm and LayerNorm are computed, in detail, from both the theoretical and the code perspective.
>>>>> Tongyi Qianwen >>>>>
Question: What is the difference between layer normalization and batch normalization? Explain with formulas.
Answer: Layer Normalization (LayerNorm) and Batch Normalization (BatchNorm) are both techniques for speeding up neural network training: by normalizing a layer's inputs or activations, they make learning more stable and more efficient. They differ in how they are implemented and in the scenarios where they are used.
Batch Normalization (BatchNorm)
BatchNorm is used mainly during training to normalize a mini-batch of data. Its main goal is to reduce internal covariate shift, i.e. the way the input distribution of intermediate layers keeps changing across training iterations as data propagates forward through the network. BatchNorm normalizes the input by computing the mean and variance of the current mini-batch. The formula is as follows:
Let $\bm{x}$ be an input vector within the mini-batch, and let $\bm{\mu}_B$ and $\bm{\sigma}^2_B$ be the mean and variance of the mini-batch, respectively. The BatchNorm output $\bm{y}$ can then be written as

$$\hat{\bm{x}} = \frac{\bm{x} - \bm{\mu}_B}{\sqrt{\bm{\sigma}^2_B + \epsilon}}, \qquad \bm{y} = \gamma\,\hat{\bm{x}} + \beta$$

where $\gamma$ and $\beta$ are learnable parameters and $\epsilon$ is a small constant added for numerical stability.
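To make the formula concrete, here is a minimal sketch (not PyTorch's implementation; the toy shapes and eps value are arbitrary) that applies it to an (N, C) mini-batch:

import torch

# toy mini-batch: N=4 samples, C=3 features
x = torch.randn(4, 3)
gamma, beta, eps = torch.ones(3), torch.zeros(3), 1e-5

mu_B = x.mean(dim=0)                  # per-feature mean over the batch
var_B = x.var(dim=0, unbiased=False)  # per-feature (biased) variance over the batch
x_hat = (x - mu_B) / torch.sqrt(var_B + eps)
y = gamma * x_hat + beta              # scale and shift with the learnable parameters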
Layer Normalization (LayerNorm)
LayerNorm aims to normalize at the level of each individual sample, without relying on mini-batch statistics. It is better suited to sequence models (such as LSTM or Transformer), where there may be dependencies between samples or where the mini-batch size cannot be kept stable. LayerNorm computes the mean and variance over the features of each sample. The formula is as follows:
Let $\bm{x}$ be the feature vector of a single sample, and let $\bm{\mu}_L$ and $\bm{\sigma}^2_L$ be the mean and variance of that sample's features, respectively. The LayerNorm output $\bm{y}$ can then be written as

$$\hat{\bm{x}} = \frac{\bm{x} - \bm{\mu}_L}{\sqrt{\bm{\sigma}^2_L + \epsilon}}, \qquad \bm{y} = \gamma\,\hat{\bm{x}} + \beta$$

where, as before, $\gamma$ and $\beta$ are learnable parameters and $\epsilon$ is a small constant.
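The corresponding sketch for LayerNorm, normalizing each sample over its own features (again a toy example, not library code):

import torch

# toy batch: N=4 samples, D=3 features per sample
x = torch.randn(4, 3)
gamma, beta, eps = torch.ones(3), torch.zeros(3), 1e-5

mu_L = x.mean(dim=-1, keepdim=True)                  # per-sample mean over its features
var_L = x.var(dim=-1, unbiased=False, keepdim=True)  # per-sample (biased) variance
x_hat = (x - mu_L) / torch.sqrt(var_L + eps)
y = gamma * x_hat + beta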
Differences
BatchNorm relies on mini-batch statistics, while LayerNorm relies on the statistics of a single sample.
BatchNorm tends to suit computer vision tasks, while LayerNorm tends to suit natural language processing tasks, especially when working with sequence data.
BatchNorm can be more efficient to parallelize, since it can exploit parallelism within the mini-batch, whereas LayerNorm is computed independently for each sample and handles sequences of varying length more naturally.
Both methods have their advantages and drawbacks; which one to choose usually depends on the specific application scenario and the characteristics of the data.
<<<<< Tongyi Qianwen <<<<<
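To make the first point of that answer concrete: a sample's LayerNorm output does not depend on its batch mates, while its BatchNorm output does. A small sketch (module defaults, training mode assumed):

import torch
from torch import nn

torch.manual_seed(0)
bn, ln = nn.BatchNorm1d(3), nn.LayerNorm(3)

a = torch.randn(1, 3)                       # the sample we track
batch1 = torch.cat([a, torch.randn(4, 3)])
batch2 = torch.cat([a, torch.randn(4, 3)])  # same sample, different batch mates

print(torch.allclose(ln(batch1)[0], ln(batch2)[0]))  # True: LN uses only the sample itself
print(torch.allclose(bn(batch1)[0], bn(batch2)[0]))  # almost surely False: BN uses batch statistics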
Even so, is that really clear? What exactly is "the mean and variance of the current mini-batch"? What is "the mean and variance of each sample's features"? Someone else's answer explains it better:

Let's first pin down the key concepts. In terms of the data:

For [B, C, H, W]: the whole tensor is a batch; [C, H, W] is an instance; [C] is a pixel, containing C features.
For [B, T, L]: the whole tensor is a batch; [T, L] is an instance; [L] is a word, containing L features.

As shown in the figure below:
Looking along the batch dimension, each small square that extends toward the back represents one element, such as the long purple bar in the left picture: the RGB values of a pixel, or a word vector. LayerNorm computes a mean and a variance for each element, giving B×T means and variances (or B×H×W for images). Each element is normalized independently. The purple slice in the right picture is one feature, the first feature of every word in the batch. BatchNorm computes a mean and a variance for each feature, giving L means and variances (or C for images). Each feature is normalized independently.
Note that the Transformer mentioned above does not follow this LayerNorm scheme; there, a mean and a variance are computed for each instance, giving B means and variances, and each instance is normalized independently.
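A quick sketch of the three granularities on an NLP-shaped tensor [B, T, L] (shapes chosen arbitrarily for illustration), counting how many means each scheme produces:

import torch

B, T, L = 2, 5, 8
x = torch.randn(B, T, L)

elem_mean = x.mean(dim=-1)      # per element (each word vector): B*T statistics
feat_mean = x.mean(dim=(0, 1))  # per feature (BatchNorm style): L statistics
inst_mean = x.mean(dim=(1, 2))  # per instance (each whole sentence): B statistics

print(elem_mean.shape, feat_mean.shape, inst_mean.shape)
# torch.Size([2, 5]) torch.Size([8]) torch.Size([2])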
BatchNorm and LayerNorm in code
BatchNorm
In PyTorch, BatchNorm comes as nn.BatchNorm1d, nn.BatchNorm2d, and nn.BatchNorm3d, for inputs of different shapes:
nn.BatchNorm1d: (N, C) or (N, C, L)
nn.BatchNorm2d: (N, C, H, W)
nn.BatchNorm3d: (N, C, D, H, W)
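Whichever variant is used, the statistics are per channel: every dimension except dim=1 (the C dimension) is reduced. A small illustrative sketch (toy shapes assumed):

import torch

x1 = torch.randn(20, 100)            # (N, C)          -> stats over dim 0
x2 = torch.randn(20, 100, 35, 45)    # (N, C, H, W)    -> stats over dims 0, 2, 3
x3 = torch.randn(4, 100, 8, 16, 16)  # (N, C, D, H, W) -> stats over dims 0, 2, 3, 4

for x in (x1, x2, x3):
    dims = [d for d in range(x.dim()) if d != 1]  # every dim except the channel dim
    print(x.mean(dim=dims).shape)                 # torch.Size([100]) each time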
A look at the source code:
class BatchNorm1d(_BatchNorm):
r"""
Args:
num_features: number of features or channels `C` of the input
Shape:
- Input: `(N, C)` or `(N, C, L)`, where `N` is the batch size,
`C` is the number of features or channels, and `L` is the sequence length
- Output: `(N, C)` or `(N, C, L)` (same shape as input)
"""
def _check_input_dim(self, input):
if input.dim() != 2 and input.dim() != 3:
raise ValueError(f"expected 2D or 3D input (got {input.dim()}D input)")
Examples:
>>> m = nn.BatchNorm1d(100) # C=100 # With Learnable Parameters
>>> m = nn.BatchNorm1d(100, affine=False) # Without Learnable Parameters
>>> input = torch.randn(20, 100) # (N, C)
>>> output = m(input)
>>> # or
>>> input = torch.randn(20, 100, 30) # (N, C, L)
>>> output = m(input)
$\gamma$ and $\beta$ are learnable parameters with shape=(C,); on the module they are called .weight and .bias:
>>> m = nn.BatchNorm1d(100)
>>> m.weight
Parameter containing:
tensor([1., 1., ..., 1.], requires_grad=True)
>>> m.weight.shape
torch.Size([100])
>>> m.bias
Parameter containing:
tensor([0., 0., ..., 0.], requires_grad=True)
BatchNorm2d and BatchNorm3d differ only in _check_input_dim(input):
class BatchNorm2d(_BatchNorm):
r"""
Args:
num_features: `C` from an expected input of size `(N, C, H, W)`
Shape:
- Input: :math:`(N, C, H, W)`
- Output: :math:`(N, C, H, W)` (same shape as input)
"""
def _check_input_dim(self, input):
if input.dim() != 4:
raise ValueError(f"expected 4D input (got {input.dim()}D input)")
Examples:
>>> m = nn.BatchNorm2d(100)
>>> input = torch.randn(20, 100, 35, 45)
>>> output = m(input)
class BatchNorm3d(_BatchNorm):
r"""
Args:
num_features: `C` from an expected input of size `(N, C, D, H, W)`
Shape:
- Input: :math:`(N, C, D, H, W)`
- Output: :math:`(N, C, D, H, W)` (same shape as input)
"""
def _check_input_dim(self, input):
if input.dim() != 5:
raise ValueError(f"expected 5D input (got {input.dim()}D input)")
Examples:
>>> m = nn.BatchNorm3d(100)
>>> input = torch.randn(20, 100, 35, 45, 10)
>>> output = m(input)
LayerNorm
Unlike BatchNorm(num_features), the argument of LayerNorm(normalized_shape) is made up of the last x dimensions of input.shape. For an input of shape [B, T, L]: passing the last two dimensions, [T, L], means each sentence is normalized independently; passing L or [L] means each word vector is normalized independently.
NLP example
>>> batch, sentence_length, embedding_dim = 20, 5, 10
>>> embedding = torch.randn(batch, sentence_length, embedding_dim)
>>> layer_norm = nn.LayerNorm(embedding_dim)
>>> layer_norm(embedding) # Activate module
Image example
>>> N, C, H, W = 20, 5, 10, 10
>>> input = torch.randn(N, C, H, W)
>>> # Normalize over the last three dimensions (i.e. the channel and spatial dimensions)
>>> layer_norm = nn.LayerNorm([C, H, W])
>>> output = layer_norm(input)
In other words, it is not simply a matter of "each element is normalized independently" versus "each instance is normalized independently"; what matters is which last x dimensions the normalization is computed over.
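To make that explicit, here is a sketch (default eps and affine parameters assumed, so the affine step does not change the result) checking nn.LayerNorm against a manual computation over the same trailing dimensions:

import torch
from torch import nn

B, T, L = 4, 6, 10
x = torch.randn(B, T, L)
eps = 1e-5

# normalize over the last dim only: one (mean, var) per word vector, B*T in total
ln_word = nn.LayerNorm(L)
manual_word = (x - x.mean(-1, keepdim=True)) / torch.sqrt(x.var(-1, unbiased=False, keepdim=True) + eps)
print(torch.allclose(ln_word(x), manual_word, atol=1e-5))  # expected: True

# normalize over the last two dims: one (mean, var) per sentence, B in total
ln_sent = nn.LayerNorm([T, L])
mu = x.mean(dim=(-2, -1), keepdim=True)
var = x.var(dim=(-2, -1), unbiased=False, keepdim=True)
manual_sent = (x - mu) / torch.sqrt(var + eps)
print(torch.allclose(ln_sent(x), manual_sent, atol=1e-5))  # expected: True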
import torch
from torch import nn
# >>> Manual BatchNorm2d computation >>>
weight = torch.ones([1, 3, 1, 1])  # gamma, broadcastable over (N, C, H, W)
bias = torch.zeros([1, 3, 1, 1])   # beta
x = 10 * torch.randn(2, 3, 4, 4) + 100
mean = x.mean(dim=[0, 2, 3], keepdim=True)                # per-channel mean over N, H, W
std = x.std(dim=[0, 2, 3], keepdim=True, unbiased=False)  # per-channel (biased) std
print(x)
print(mean)
print(std)
y = (x - mean) / std  # note: eps is omitted here
y = y * weight + bias
print(y)
# <<< Manual BatchNorm2d computation <<<
# >>> nn.BatchNorm2d >>>
bnm2 = nn.BatchNorm2d(3)
z = bnm2(x)
print(z)
# <<< nn.BatchNorm2d <<<
print(torch.norm(z - y, p=1))
You will find that the manual calculation and nn.BatchNorm give essentially the same result; the small remaining gap comes from $\epsilon$. The unbiased=False setting is worth noting, and the official documentation explains it:
"""
At train time in the forward pass, the standard-deviation is calculated via the biased estimator,
equivalent to `torch.var(input, unbiased=False)`.
However, the value stored in the moving average of the standard-deviation is calculated via
the unbiased estimator, equivalent to `torch.var(input, unbiased=True)`.
Also by default, during training this layer keeps running estimates of its computed mean and
variance, which are then used for normalization during evaluation.
"""
Here I only want to understand how the values are computed, so I won't dwell on unbiased.
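Still, for completeness, here is a short sketch (same setup as above) showing that once eps and the biased variance are included, the manual result matches the module output up to float precision:

import torch
from torch import nn

x = 10 * torch.randn(2, 3, 4, 4) + 100
bn = nn.BatchNorm2d(3)  # weight initialized to ones, bias to zeros
z = bn(x)               # training mode: the forward pass uses the biased batch statistics

mean = x.mean(dim=[0, 2, 3], keepdim=True)
var = x.var(dim=[0, 2, 3], unbiased=False, keepdim=True)  # biased estimator, as in the forward pass
y = (x - mean) / torch.sqrt(var + bn.eps)                 # this time include eps
print(torch.allclose(y, z, atol=1e-5))                    # expected: True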