
Understanding Weight Decay

January 3, 2019 / 289 reads


Weight decay is a regularization method for neural networks. Its job is to keep the weights from growing too large, and in practice this turns out to be an effective way to prevent overfitting. Exactly why it works is something even some textbook authors struggle to explain clearly.

The formulas for weight decay are given below, where C is the regularized cost function, C_0 is the original (unregularized) cost, λ is the regularization coefficient, and n is the size of the training set.

$$ C = C_0 + \frac{\lambda}{2n} \sum_w w^2$$

$$ \frac{\partial C}{\partial w} = \frac{\partial C_0}{\partial w} + \frac{\lambda}{n} w $$

Weight decay is also known as L2 regularization, or the L2 parameter norm penalty.
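Plugging this gradient into plain gradient descent makes the name explicit: each step first multiplies w by (1 - ηλ/n), i.e. "decays" it, and only then applies the usual gradient of C_0. Below is a minimal NumPy sketch of one such update step; the function name and the numbers (eta, lam, n, the example arrays) are illustrative choices, not from the original post.

```python
import numpy as np

def sgd_step_with_weight_decay(w, grad_C0, eta=0.5, lam=0.1, n=1000):
    """One gradient-descent step on C = C_0 + (lam / (2 * n)) * sum(w ** 2).

    Since dC/dw = dC_0/dw + (lam / n) * w, the update w <- w - eta * dC/dw
    is the same as first shrinking w by the factor (1 - eta * lam / n)
    (the "decay") and then taking the usual step on C_0.
    """
    return (1 - eta * lam / n) * w - eta * grad_C0

# made-up numbers, just to show the call
w = np.array([0.5, -2.0, 3.0])
grad_C0 = np.array([0.1, -0.2, 0.05])   # pretend gradient of the unregularized cost
print(sgd_step_with_weight_decay(w, grad_C0))
```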

We can understand how weight decay works from three angles:

(1) Making the weights a bit smaller makes the whole network less sensitive to noise (or small changes) in the input; if a weight is too large, a tiny change in its corresponding input gets amplified, dominates, and can significantly change the output.

(2) From the formula, weight decay shrinks large weights more and small weights less; in other words, the larger the weight, the larger the penalty, so decaying large weights is the most effective way to reduce the cost function.

(3) It biases the network toward simpler models with smaller "slopes"; for example, a nonlinear relationship could be represented by a very complicated high-order polynomial, but by tolerating a little noise it can also be represented by a simple low-order polynomial, or even by a plain linear function.

Finally, weight decay is not bias decay: it is not applied to the bias terms. The reason can be understood from point (1) above: a bias is not multiplied by any input, so it does not create the kind of input-sensitivity problem described there.
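In frameworks this is typically implemented by simply excluding the bias parameters from the penalty. A rough PyTorch-style sketch of that idea (the toy model and all hyperparameters are made-up assumptions; only the standard param-group mechanism of torch.optim is relied on):

```python
import torch.nn as nn
import torch.optim as optim

# an arbitrary toy model, just for illustration
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

# split the parameters: weight matrices get the L2 penalty, biases do not
decay, no_decay = [], []
for name, p in model.named_parameters():
    (no_decay if name.endswith("bias") else decay).append(p)

optimizer = optim.SGD(
    [{"params": decay, "weight_decay": 1e-4},     # weight decay applied
     {"params": no_decay, "weight_decay": 0.0}],  # biases left alone
    lr=0.1,
)
```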

For example, in linear regression, this gives us solutions that have a smaller slope, or that put weight on fewer of the features. In other words, even though the model is capable of representing functions with much more complicated shapes, weight decay encourages it to use a simpler function described by smaller coefficients.
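To make the linear-regression remark concrete, here is a small NumPy sketch that fits the same noisy data with and without an L2 penalty using the closed-form ridge solution; the data, the polynomial degree, and the value of lam are arbitrary choices for illustration, but the penalized fit should end up with noticeably smaller coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
y = 0.5 * x + 0.1 * rng.standard_normal(20)       # roughly linear data plus noise

# degree-9 polynomial features: expressive enough to chase the noise
X = np.vander(x, N=10, increasing=True)

def fit(X, y, lam):
    # closed-form ridge solution: (X^T X + lam * I)^-1 X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_plain = fit(X, y, lam=0.0)   # no penalty
w_decay = fit(X, y, lam=1.0)   # with L2 penalty

print("largest |coefficient| without decay:", np.abs(w_plain).max())
print("largest |coefficient| with    decay:", np.abs(w_decay).max())
```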

Intuitively, in the feature space, only directions along which the parameters contribute significantly to reducing the objective function are preserved relatively intact. In directions that do not contribute to reducing the objective function, movement does not significantly increase the gradient, so the components of the weight vector corresponding to such unimportant directions are decayed away by the regularization term over the course of training.

Another simple explanation is that when your weights are large, the network is more sensitive to small noise in the input data. When a small amount of noise is propagated through a network with large weights, it produces a very different value at the output layer than the same noise would in a network with small weights.
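A tiny numerical illustration of this sensitivity argument (all numbers here are made up): the same small perturbation of the input shifts the output of a large-weight linear unit far more than that of a small-weight one.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
noise = 0.01 * rng.standard_normal(5)        # a small perturbation of the input

w_small = 0.1 * rng.standard_normal(5)       # small weights
w_large = 10.0 * rng.standard_normal(5)      # large weights

# the same tiny input change moves the large-weight output far more
print("output change, small weights:", abs(w_small @ (x + noise) - w_small @ x))
print("output change, large weights:", abs(w_large @ (x + noise) - w_large @ x))
```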

Note that weight decay is not the only regularization technique. In the past few years, other approaches such as dropout, bagging, early stopping, and parameter sharing have been introduced, and they work very well in neural networks.
