Belief flows

  - **Prior:** We place a Gaussian distribution $P(w)$ to represent our parameter uncertainty. To simplify our exposition, we assume that the covariance matrix is diagonal, and so $P(w) = \mathcal{N}(w; \mu, \Sigma) = \prod_n \mathcal{N}(w_n; \mu_n, \sigma_n^2)$, where $w_n$, $\mu_n$ are the $n$-th components of the parameter and mean vectors respectively, and $\sigma_n^2$ is the $n$-th diagonal element of the covariance matrix $\Sigma$.
  - **Parameter choice:** The learning algorithm now has to choose model parameters to minimize the prediction error. It does so using Thompson sampling, that is, by sampling a parameter vector $\bar{w}$ from the prior distribution: $\bar{w} \sim P(w)$.
  - **Evaluation of Loss and Local Update:** Once the parameter is chosen, the learning algorithm is given a supervised pair $(x, y)$ that it can use to evaluate the loss $\ell(y, \hat{y})$, where $\hat{y} = F_{\bar{w}}(x)$ is the predicted output. Based on this loss, the learning algorithm can calculate the update of the parameter $\bar{w}$ using SGD: $\bar{w}' = \bar{w} - \eta \nabla_w \ell(y, \hat{y})$, where $\eta > 0$ is the learning rate.
  - **Global Update:** Now, the algorithm has to change its prior beliefs $P(w)$ into posterior beliefs $P'(w)$. To do so, it must infer the SGD update over the whole parameter space based solely on the local observation $\bar{w} \to \bar{w}'$:
    - If we assume a quadratic error function with uncorrelated coordinates, then the class of possible SGD updates becomes the class of linear flow fields in parameter space that transform each component as $w_n' = a_n w_n + b_n$, preserving the Gaussian shape of the resulting posterior. However, there are many such transformations that are consistent with the observed SGD update $\bar{w} \to \bar{w}'$, so which one should the algorithm choose? (A code sketch of one full update step follows below.)
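
The question above is left open at this point; one natural answer, assumed in the following sketch, is to pick per component the flow that minimizes the KL divergence between posterior and prior subject to the constraint $a_n \bar{w}_n + b_n = \bar{w}'_n$. Here is a minimal NumPy sketch of a single update under that assumption, with a hypothetical `grad_loss` helper standing in for the model's loss gradient:

<code python>
import numpy as np

def belief_flow_step(mu, sigma2, x, y, grad_loss, eta=0.1, rng=None):
    """One belief-flows update for a diagonal Gaussian P(w) = N(mu, diag(sigma2)).

    grad_loss(w, x, y) is an assumed helper returning the gradient of the
    loss l(y, y_hat) with respect to the parameters w.
    """
    rng = rng if rng is not None else np.random.default_rng()

    # 1. Thompson sampling: draw a parameter vector from the prior.
    w_bar = mu + np.sqrt(sigma2) * rng.standard_normal(mu.shape)

    # 2. Local SGD update at the sampled point.
    w_bar_new = w_bar - eta * grad_loss(w_bar, x, y)

    # 3. Global update: per component, among all linear flows
    #    w' = a*w + b with a*w_bar + b = w_bar_new, pick the one
    #    minimizing KL(posterior || prior). Setting the derivative of
    #    the Gaussian KL to zero gives the quadratic
    #        a^2 * (1 + u^2/sigma2) - a * (u*v)/sigma2 - 1 = 0,
    #    where u and v are the sampled and updated points centered at mu.
    u = w_bar - mu
    v = w_bar_new - mu
    c2 = 1.0 + u**2 / sigma2
    c1 = -(u * v) / sigma2
    a = (-c1 + np.sqrt(c1**2 + 4.0 * c2)) / (2.0 * c2)  # positive root
    b = w_bar_new - a * w_bar

    # 4. A linear map sends N(mu, sigma2) to N(a*mu + b, a^2 * sigma2),
    #    so the posterior stays Gaussian with diagonal covariance.
    return a * mu + b, a**2 * sigma2
</code>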
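
For instance, with a linear model and squared loss (a hypothetical setup, shown only to illustrate the calling convention), the step can be iterated over a data stream; each update both translates the mean and rescales the per-component variance by $a_n^2$:

<code python>
# Hypothetical usage: linear regression with squared loss l(y, y_hat) = (y_hat - y)^2.
grad_loss = lambda w, x, y: 2.0 * (w @ x - y) * x
mu, sigma2 = np.zeros(3), np.ones(3)
for x, y in [(np.array([1.0, 2.0, 3.0]), 4.0),
             (np.array([0.5, -1.0, 2.0]), 1.0)]:
    mu, sigma2 = belief_flow_step(mu, sigma2, x, y, grad_loss)
</code>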