belief_flows [2023/11/19 17:21] – created – external edit 127.0.0.1 | belief_flows [2025/04/08 15:20] (current) – [Belief flows] pedroortega
- **Prior:** We place a Gaussian distribution $P(w)$ to represent our parameter uncertainty. To simplify our exposition, we assume that the covariance matrix is diagonal, so that $P(w) = \mathcal{N}(w; \mu, \Sigma) = \prod_n \mathcal{N}(w_n; \mu_n, \sigma_n^2)$, where $w_n$ and $\mu_n$ are the $n$-th components of the parameter and mean vectors respectively, and $\sigma_n^2$ is the $n$-th diagonal entry of the covariance matrix $\Sigma$.
- **Parameter choice:** The learning algorithm now has to choose model parameters to minimize the prediction error. It does so using Thompson sampling, that is, by sampling a parameter vector $\bar{w}$ from the prior distribution: $\bar{w} \sim P(w)$.
- **Evaluation of Loss and Local Update:** Once the parameter is chosen, the learning algorithm is given a supervised pair $(x, y)$ that it can use to evaluate the loss $\ell(y, \hat{y})$, where $\hat{y} = F_{\bar{w}}(x)$ is the predicted output. Based on this loss, the learning algorithm can calculate the update of the parameter $\bar{w}$ using SGD: $\bar{w}' = \bar{w} - \eta \cdot \frac{\partial}{\partial \bar{w}} \ell(y, \hat{y})$, where $\eta > 0$ is the learning rate.
- **Global Update:** Now, the algorithm has to change its prior beliefs $P(w)$ into posterior beliefs $P'(w)$. To do so, it must infer the SGD update over the whole parameter space based solely on the local observation $\bar{w} \to \bar{w}'$.
- If we assume a quadratic error function with uncorrelated coordinates, the SGD update acts as an affine transformation on each coordinate, so the Gaussian prior is mapped to another Gaussian whose mean and variance can be inferred from the single observed update $\bar{w} \to \bar{w}'$.