Multiplicative weight update method

<h2 id="name">Name</h2>
"Multiplicative weights" refers to the iterative rule used in algorithms derived from the multiplicative weight update method.<a class="footnote-ref" id="fnref:2" href="#fn:2">2</a> The rule goes by different names in the different fields where it was discovered or rediscovered.

<h2 id="history-and-background">History and background</h2>
The earliest known version of this technique was in an algorithm named "<a href="/facts/Fictitious_play/rTYwUEVn">fictitious play</a>", which was proposed in <a href="/facts/Game_theory/zkEIj1ya">game theory</a> in the early 1950s. Grigoriadis and Khachiyan<a class="footnote-ref" id="fnref:3" href="#fn:3">3</a> applied a randomized variant of "fictitious play" to solve two-player <a href="/facts/Zero-sum_game/ccouScgK">zero-sum games</a> efficiently using the multiplicative weights algorithm. In this case, the player allocates higher weight to the actions that had a better outcome and chooses a strategy based on these weights. In <a href="/facts/Machine_learning/e0w0XJTu">machine learning</a>, Littlestone applied the earliest form of the multiplicative weights update rule in his famous <a href="/facts/Winnow_(algorithm)/Shdmb7AD">winnow algorithm</a>, which is similar to Minsky and Papert's earlier <a href="/facts/Perceptron/ArxdkAC1">perceptron learning algorithm</a>. Later, he generalized the winnow algorithm to the weighted majority algorithm. Freund and Schapire followed in his steps and generalized the winnow algorithm in the form of the hedge algorithm.
The multiplicative weights algorithm is also widely applied in <a href="/facts/Computational_geometry/eeaotQtl">computational geometry</a> such as <a href="/facts/Kenneth_L._Clarkson/JbgWa82R">Kenneth Clarkson's</a> algorithm for <a href="/facts/Linear_programming/GduXFQxT">linear programming (LP)</a> with a bounded number of variables in linear time.<a class="footnote-ref" id="fnref:4" href="#fn:4">4</a><a class="footnote-ref" id="fnref:5" href="#fn:5">5</a> Later, Bronnimann and Goodrich employed analogous methods to find <a href="/facts/Set_cover_problem/MVzyBYs1">set covers</a> for <a href="/facts/Hypergraph/qIR2o2I3">hypergraphs</a> with small <a href="/facts/VC_dimension/ThkxKE12">VC dimension</a>.<a class="footnote-ref" id="fnref:6" href="#fn:6">6</a>
In <a href="/facts/Operations_research/H5yJkV5n">operations research</a> and online statistical decision making, the weighted majority algorithm and its more complicated versions have been found independently.
In computer science, some researchers have previously observed the close relationships between multiplicative update algorithms used in different contexts. Young discovered the similarities between fast LP algorithms and Raghavan's method of pessimistic estimators for derandomization of randomized rounding algorithms; Klivans and Servedio linked boosting algorithms in learning theory to proofs of Yao's XOR Lemma; Garg and Khandekar defined a common framework for convex optimization problems that contains Garg-Konemann and Plotkin-Shmoys-Tardos as subcases.<a class="footnote-ref" id="fnref:7" href="#fn:7">7</a>
The Hedge algorithm is a special case of <a href="/facts/Mirror_descent/nUv4Mh04">mirror descent</a>.

<h2 id="general-setup">General setup</h2>
A binary decision needs to be made based on n experts’ opinions to attain an associated payoff. In the first round, all experts’ opinions have the same weight. The decision maker makes the first decision based on the majority of the experts' predictions. Then, in each successive round, the decision maker updates the weight of each expert's opinion depending on the correctness of that expert's prior predictions. Real-life examples include predicting whether it will rain tomorrow or whether the stock market will go up or down.

<h2 id="algorithm-analysis">Algorithm analysis</h2>
<h3>Halving algorithm</h3>
Given a sequential game played between an adversary and an aggregator who is advised by N experts, the goal is for the aggregator to make as few mistakes as possible. Assume there is an expert among the N experts who always gives the correct prediction. In the halving algorithm, only the consistent experts are retained; experts who make mistakes are dismissed. For every decision, the aggregator decides by taking a majority vote among the remaining experts. Therefore, every time the aggregator makes a mistake, at least half of the remaining experts are dismissed, so the aggregator makes at most \(\log_2(N)\) mistakes.<a class="footnote-ref" id="fnref:8" href="#fn:8">8</a>
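The halving procedure described above can be sketched in a few lines. This is only an illustrative sketch: the function name `run_halving`, the data layout (one list of 0/1 predictions per round) and the demo data are assumptions, not part of the source.

```python
import math
import random


def run_halving(advice_rounds, outcomes):
    """Run the halving algorithm; return the number of aggregator mistakes.

    advice_rounds[t][i] is expert i's 0/1 prediction in round t and
    outcomes[t] is the true 0/1 outcome. Assumes some expert is never wrong.
    """
    alive = set(range(len(advice_rounds[0])))
    mistakes = 0
    for advice, truth in zip(advice_rounds, outcomes):
        # Majority vote among the experts that have not been dismissed.
        ones = sum(advice[i] for i in alive)
        prediction = 1 if 2 * ones >= len(alive) else 0
        if prediction != truth:
            mistakes += 1
        # Dismiss every expert that was wrong in this round.
        alive = {i for i in alive if advice[i] == truth}
    return mistakes


# Hypothetical demo: expert 0 is always right, the other 7 guess at random.
rng = random.Random(0)
outcomes = [rng.randint(0, 1) for _ in range(20)]
advice_rounds = [[y] + [rng.randint(0, 1) for _ in range(7)] for y in outcomes]
mistakes = run_halving(advice_rounds, outcomes)
print(mistakes, "mistakes; the bound is", math.log2(8))
```

Because a wrong majority vote removes at least half of the surviving experts, the mistake count in the demo cannot exceed \(\log_2(8) = 3\), whatever the random guesses are.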

<h3>Weighted majority algorithm</h3>
Source:<a class="footnote-ref" id="fnref:9" href="#fn:9">9</a><a class="footnote-ref" id="fnref:10" href="#fn:10">10</a>
Unlike the halving algorithm, which dismisses experts who have made mistakes, the weighted majority algorithm discounts their advice. Given the same "expert advice" setup, suppose we have n decisions (one per expert), and we need to select one decision in each round. In each round, every decision incurs a cost. All costs are revealed after the choice is made. The cost is 0 if the expert is correct, and 1 otherwise. This algorithm's goal is to limit its cumulative losses to roughly the same as those of the best expert.
The naive algorithm that follows the plain majority vote in every iteration does not work, since the majority of the experts can be wrong consistently every time. The weighted majority algorithm corrects this trivial algorithm by keeping a weight for each expert instead of fixing the cost at either 1 or 0.<a class="footnote-ref" id="fnref:11" href="#fn:11">11</a> This would make fewer mistakes compared to the halving algorithm.

 Initialization:
 Fix an \(\eta \leq 1/2\). For each expert, associate the weight \(w_i^1 := 1\).
 For \(t = 1, 2, \ldots, T\):
 1. Make the prediction given by the weighted majority of the experts' predictions based on their weights \(w_1^t, \ldots, w_n^t\). That is, choose 0 or 1 depending on which prediction has a higher total weight of experts advising it (breaking ties arbitrarily).
 2. For every expert i that predicted wrongly, decrease his weight for the next round by multiplying it by a factor of \((1-\eta)\): \(w_i^{t+1} = (1-\eta)\, w_i^t\) (update rule)

If \(\eta = 0\), the weight of the expert's advice will remain the same. As \(\eta\) increases, the weight of a mistaken expert's advice decreases more with each error. Note that some researchers fix \(\eta = 1/2\) in the weighted majority algorithm.
After \(T\) steps, let \(m_i^T\) be the number of mistakes of expert i and \(M^T\) be the number of mistakes our algorithm has made. Then we have the following bound for every \(i\):

\[ M^T \leq 2(1+\eta)\, m_i^T + \frac{2\ln(n)}{\eta}. \]

In particular, this holds for the i which is the best expert. Since the best expert has the smallest \(m_i^T\), it gives the best bound on the number of mistakes made by the algorithm as a whole.
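The weighted majority update and its mistake bound can be checked numerically with a minimal sketch. The function name `weighted_majority`, the argument layout and the synthetic experts are assumptions made for illustration only.

```python
import math
import random


def weighted_majority(advice_rounds, outcomes, eta=0.5):
    """Deterministic weighted majority with penalty factor (1 - eta).

    Returns (mistakes of the algorithm, list of mistakes per expert).
    """
    n = len(advice_rounds[0])
    weights = [1.0] * n
    algo_mistakes = 0
    expert_mistakes = [0] * n
    for advice, truth in zip(advice_rounds, outcomes):
        # Step 1: predict with the side that carries more total weight.
        weight_one = sum(w for w, a in zip(weights, advice) if a == 1)
        weight_zero = sum(w for w, a in zip(weights, advice) if a == 0)
        prediction = 1 if weight_one >= weight_zero else 0
        if prediction != truth:
            algo_mistakes += 1
        # Step 2: multiply the weight of every wrong expert by (1 - eta).
        for i, a in enumerate(advice):
            if a != truth:
                expert_mistakes[i] += 1
                weights[i] *= 1.0 - eta
    return algo_mistakes, expert_mistakes


# Hypothetical demo: expert 0 is right about 90% of the time, 5 others guess.
rng = random.Random(1)
outcomes = [rng.randint(0, 1) for _ in range(40)]
advice_rounds = [
    [y if rng.random() < 0.9 else 1 - y] + [rng.randint(0, 1) for _ in range(5)]
    for y in outcomes
]
M, m = weighted_majority(advice_rounds, outcomes, eta=0.5)
bound = 2 * (1 + 0.5) * min(m) + 2 * math.log(6) / 0.5
print("algorithm:", M, "best expert:", min(m), "bound:", bound)
```

The printed bound is the right-hand side of the inequality above evaluated for the best expert, so the algorithm's mistake count should never exceed it.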

<h3>Randomized weighted majority algorithm</h3>
This algorithm can be understood as follows:<a class="footnote-ref" id="fnref:12" href="#fn:12">12</a><a class="footnote-ref" id="fnref:13" href="#fn:13">13</a>
Consider the same setup with N experts, and the special situation where the proportions of experts predicting positive and negative, counting the weights, are both close to 50%. Then, there might be a tie. The randomized algorithm keeps the weight update rule of the weighted majority algorithm but randomizes its predictions: it calculates the weighted fractions of experts predicting positive or negative, and then makes a random decision based on the computed fraction:
predict

\[ f(x) = \begin{cases} 1 & \text{with probability } \frac{q_1}{W} \\ 0 & \text{otherwise} \end{cases} \]

where \(q_0\) and \(q_1\) are the total weights of the experts predicting 0 and 1, respectively, and

\[ W = \sum_i w_i = q_0 + q_1. \]

The number of mistakes made by the randomized weighted majority algorithm is bounded as:

\[ E\left[\#\text{mistakes of the learner}\right] \leq \alpha_\beta \left(\#\text{mistakes of the best expert}\right) + c_\beta \ln(N) \]

where \(\alpha_\beta = \frac{\ln(1/\beta)}{1-\beta}\) and \(c_\beta = \frac{1}{1-\beta}\). Here \(\beta\) is the factor by which the weight of a mistaken expert is multiplied, that is, \(\beta = 1 - \eta\) in the notation of the update rule.
Note that only the learning algorithm is randomized. The underlying assumption is that the examples and the experts' predictions are not random. The only randomness is in the learner's own prediction.
In this randomized algorithm, \(\alpha_\beta \rightarrow 1\) if \(\beta \rightarrow 1\). Compared to the weighted majority algorithm, this randomization roughly halves the bound on the number of mistakes the algorithm is going to make.<a class="footnote-ref" id="fnref:14" href="#fn:14">14</a> However, it is important to note that in some research, people define \(\eta = 1/2\) in the weighted majority algorithm and allow \(0 \leq \eta \leq 1\) in the <a href="/facts/Randomized_weighted_majority_algorithm/Khd3GA8j">randomized weighted majority algorithm</a>.<a class="footnote-ref" id="fnref:15" href="#fn:15">15</a>
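A minimal sketch of the randomized prediction rule follows. It assumes that a mistaken expert's weight is multiplied by a factor `beta` (the β of the bound); the function name and the synthetic data are hypothetical. Because the weights do not depend on the learner's coin flips, the expected number of mistakes can be accumulated exactly as the wrong fraction of the weight in each round.

```python
import math
import random


def randomized_weighted_majority(advice_rounds, outcomes, beta=0.5, rng=None):
    """Return (sampled mistakes, expected mistakes, mistakes per expert)."""
    rng = rng or random.Random(0)
    n = len(advice_rounds[0])
    weights = [1.0] * n
    sampled, expected = 0, 0.0
    expert_mistakes = [0] * n
    for advice, truth in zip(advice_rounds, outcomes):
        W = sum(weights)
        q1 = sum(w for w, a in zip(weights, advice) if a == 1)
        # Predict 1 with probability q1 / W, otherwise predict 0.
        prediction = 1 if rng.random() < q1 / W else 0
        sampled += int(prediction != truth)
        # Probability of a mistake = fraction of weight on the wrong answer.
        wrong_weight = (W - q1) if truth == 1 else q1
        expected += wrong_weight / W
        for i, a in enumerate(advice):
            if a != truth:
                expert_mistakes[i] += 1
                weights[i] *= beta  # assumed penalty factor
    return sampled, expected, expert_mistakes


# Hypothetical demo: expert 0 is right about 85% of the time, 7 others guess.
rng = random.Random(2)
outcomes = [rng.randint(0, 1) for _ in range(60)]
advice_rounds = [
    [y if rng.random() < 0.85 else 1 - y] + [rng.randint(0, 1) for _ in range(7)]
    for y in outcomes
]
beta = 0.5
sampled, expected, m = randomized_weighted_majority(advice_rounds, outcomes, beta)
alpha_beta = math.log(1 / beta) / (1 - beta)
c_beta = 1 / (1 - beta)
print("expected:", expected, "bound:", alpha_beta * min(m) + c_beta * math.log(8))
```

The sampled count varies with the random seed, while the expected count is deterministic and is the quantity that the bound controls.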

<h2 id="applications">Applications</h2>
The multiplicative weights method is usually used to solve a constrained optimization problem. Let each expert be the constraint in the problem, and the events represent the points in the area of interest. The punishment of the expert corresponds to how well its corresponding constraint is satisfied on the point represented by an event.<a class="footnote-ref" id="fnref:16" href="#fn:16">16</a>

<h3>Solving zero-sum games approximately (Oracle algorithm)</h3>
Source:<a class="footnote-ref" id="fnref:17" href="#fn:17">17</a><a class="footnote-ref" id="fnref:18" href="#fn:18">18</a>
Suppose we are given the distribution \(P\) on experts. Let \(A\) be the payoff matrix of a finite two-player zero-sum game, with \(n\) rows.
When the row player \(p_r\) uses plan \(i\) and the column player \(p_c\) uses plan \(j\), the payoff of player \(p_c\) is \(A(i,j) := A_{ij}\), assuming \(A(i,j) \in [0,1]\).
If player \(p_r\) chooses action \(i\) from a distribution \(P\) over the rows, then the expected result for player \(p_c\) selecting action \(j\) is \(A(P,j) = E_{i \in P}\left[A(i,j)\right]\).
To maximize \(A(P,j)\), player \(p_c\) should choose the plan \(j\) that attains the maximum. Similarly, if player \(p_c\) chooses action \(j\) from a distribution \(Q\) over the columns, the expected payoff when player \(p_r\) selects action \(i\) is \(A(i,Q) = E_{j \in Q}\left[A(i,j)\right]\). Player \(p_r\) should choose the plan \(i\) that minimizes this payoff. By John von Neumann's min-max theorem, we obtain:

\[ \min_P \max_j A(P,j) = \max_Q \min_i A(i,Q) \]

where P ranges over the distributions over rows and i over the rows, while Q ranges over the distributions over columns and j over the columns.
Then, let \(\lambda^*\) denote the common value of the above quantities, also called the "value of the game". Let \(\delta > 0\) be an error parameter. Solving the zero-sum game up to an additive error of \(\delta\) means finding mixed row and column strategies \(p\) and \(q\) such that

\[ \lambda^* - \delta \leq \min_i A(i,q) \]

\[ \max_j A(p,j) \leq \lambda^* + \delta \]

So there is an algorithm solving a zero-sum game up to an additive error of \(\delta\) using \(O(\log_2(n)/\delta^2)\) calls to ORACLE, a subroutine that returns a best-response column for a given distribution over the rows, with an additional processing time of O(n) per call.<a class="footnote-ref" id="fnref:19" href="#fn:19">19</a>
Bailey and Piliouras showed that although the time-average behavior of multiplicative weights update converges to Nash equilibria in zero-sum games, the day-to-day (last iterate) behavior diverges away from it.<a class="footnote-ref" id="fnref:20" href="#fn:20">20</a>
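A rough sketch of this approach is given here, under stated assumptions: the row player runs an exponential multiplicative update over the rows, ORACLE is implemented as a brute-force best response of the column player, and the learning rate \(\delta/2\) and the round count \(\lceil 4\ln(n)/\delta^2 \rceil\) are illustrative constants consistent with the \(O(\log(n)/\delta^2)\) bound. The function name `solve_zero_sum` is hypothetical.

```python
import math


def solve_zero_sum(A, delta):
    """Approximate optimal mixed strategies for a payoff matrix with entries in [0, 1].

    Rows belong to the minimizing player, columns to the maximizing player.
    Returns (p, q): the averaged row and column strategies.
    """
    n, k = len(A), len(A[0])
    eta = delta / 2.0                                   # assumed learning rate
    T = max(1, math.ceil(4.0 * math.log(n) / delta ** 2))
    weights = [1.0] * n
    p_avg, q_avg = [0.0] * n, [0.0] * k
    for _ in range(T):
        total = sum(weights)
        p = [w / total for w in weights]
        # ORACLE: the column that maximizes the expected payoff A(p, j).
        payoffs = [sum(p[i] * A[i][j] for i in range(n)) for j in range(k)]
        j_best = max(range(k), key=payoffs.__getitem__)
        q_avg[j_best] += 1.0 / T
        for i in range(n):
            p_avg[i] += p[i] / T
            # Row i pays A[i][j_best]; costly rows lose weight.
            # Starting from p keeps the weights normalized and avoids underflow.
            weights[i] = p[i] * math.exp(-eta * A[i][j_best])
    return p_avg, q_avg


# Rock-paper-scissors rescaled to [0, 1]; the value of this game is 0.5.
A = [[0.5, 1.0, 0.0],
     [0.0, 0.5, 1.0],
     [1.0, 0.0, 0.5]]
delta = 0.1
p, q = solve_zero_sum(A, delta)
upper = max(sum(p[i] * A[i][j] for i in range(3)) for j in range(3))
lower = min(sum(q[j] * A[i][j] for j in range(3)) for i in range(3))
print("max_j A(p, j) =", upper, " min_i A(i, q) =", lower)
```

The two printed quantities bracket the value of the game, and their difference is the certified approximation error of the returned pair of strategies.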

<h3>Machine learning</h3>
In machine learning, Littlestone and Warmuth generalized the winnow algorithm to the weighted majority algorithm.<a class="footnote-ref" id="fnref:21" href="#fn:21">21</a> Later, Freund and Schapire generalized it in the form of the hedge algorithm.<a class="footnote-ref" id="fnref:22" href="#fn:22">22</a> The AdaBoost algorithm formulated by Yoav Freund and Robert Schapire also employs the multiplicative weight update method.<a class="footnote-ref" id="fnref:23" href="#fn:23">23</a>

<h4>Winnow algorithm</h4>
Based on current knowledge in algorithms, the multiplicative weight update method was first used in Littlestone's winnow algorithm.<a class="footnote-ref" id="fnref:24" href="#fn:24">24</a> It is used in machine learning to solve a linear program.
Given \(m\) labeled examples \((a_1, l_1), \ldots, (a_m, l_m)\) where \(a_j \in \mathbb{R}^n\) are feature vectors, and \(l_j \in \{-1, 1\}\) are their labels.
The aim is to find non-negative weights such that for all examples, the sign of the weighted combination of the features matches its label. That is, require that \(l_j a_j x \geq 0\) for all \(j\). Without loss of generality, assume the total weight is 1 so that the weights form a distribution. Thus, for notational convenience, redefine \(a_j\) to be \(l_j a_j\); the problem then reduces to finding a solution to the following LP:

\[ \forall j = 1, 2, \ldots, m : a_j x \geq 0, \qquad \mathbf{1} \cdot x = 1, \qquad \forall i : x_i \geq 0. \]

This is a general form of LP.

<h4>Hedge algorithm</h4>
Source:<a class="footnote-ref" id="fnref:25" href="#fn:25">25</a>
The hedge algorithm is similar to the weighted majority algorithm. However, their exponential update rules are different.<a class="footnote-ref" id="fnref:26" href="#fn:26">26</a>
It is generally used to solve the problem of binary allocation in which we need to allocate different portions of resources among N different options. The loss of every option is available at the end of every iteration. The goal is to reduce the total loss suffered for a particular allocation. The allocation for the following iteration is then revised, based on the total loss suffered in the current iteration, using a multiplicative update.<a class="footnote-ref" id="fnref:27" href="#fn:27">27</a>

<h5>Analysis</h5>
Assume the learning rate \(\eta > 0\) and for \(t \in [T]\), \(p^t\) is picked by Hedge. Then for all experts \(i\),

\[ \sum_{t \leq T} p^t \cdot m^t \leq \sum_{t \leq T} m_i^t + \frac{\ln(N)}{\eta} + \eta T \]

Initialization: Fix an \(\eta > 0\). For each expert, associate the weight \(w_i^1 := 1\).
For t=1,2,...,T:

 1. Pick the distribution \(p_i^t = \frac{w_i^t}{\Phi_t}\) where \(\Phi_t = \sum_i w_i^t\).
 2. Observe the cost of the decision \(m^t\).
 3. Set \(w_i^{t+1} = w_i^t \exp(-\eta\, m_i^t)\).
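The three Hedge steps translate almost line by line into code. The sketch below assumes costs in \([-1, 1]\) and \(\eta \leq 1\), which is the usual setting in which the regret bound of the analysis holds; the function name `hedge`, the random cost data and the choice \(\eta = \sqrt{\ln(N)/T}\) are illustrative assumptions.

```python
import math
import random


def hedge(cost_rounds, eta):
    """Run Hedge on a sequence of cost vectors.

    cost_rounds[t][i] is the cost m_i^t of expert i in round t.
    Returns the total expected cost sum_t p^t . m^t and the final distribution.
    """
    n = len(cost_rounds[0])
    weights = [1.0] * n
    total_cost = 0.0
    for costs in cost_rounds:
        phi = sum(weights)                       # potential Phi_t
        p = [w / phi for w in weights]           # step 1: distribution p^t
        total_cost += sum(pi * mi for pi, mi in zip(p, costs))  # step 2
        # Step 3: exponential update; starting from p keeps the weights normalized.
        weights = [pi * math.exp(-eta * mi) for pi, mi in zip(p, costs)]
    phi = sum(weights)
    return total_cost, [w / phi for w in weights]


# Hypothetical demo: 5 experts, 200 rounds of random costs in [-1, 1].
rng = random.Random(3)
N, T = 5, 200
cost_rounds = [[rng.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(T)]
eta = math.sqrt(math.log(N) / T)
total_cost, final_p = hedge(cost_rounds, eta)
best = min(sum(costs[i] for costs in cost_rounds) for i in range(N))
print("Hedge:", total_cost, "best expert:", best,
      "regret bound:", math.log(N) / eta + eta * T)
```

With this choice of \(\eta\) the two regret terms balance, so the gap between Hedge and the best expert grows only like \(\sqrt{T \ln N}\).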

<h4>AdaBoost algorithm</h4>
<a href="/facts/AdaBoost/nwJ6znVj">This algorithm</a><a class="footnote-ref" id="fnref:28" href="#fn:28">28</a> maintains a set of weights \(w^t\) over the training examples. On every iteration \(t\), a distribution \(p^t\) is computed by normalizing these weights. This distribution is fed to the weak learner WeakLearn, which generates a hypothesis \(h_t\) that (hopefully) has small error with respect to the distribution. Using the new hypothesis \(h_t\), AdaBoost generates the next weight vector \(w^{t+1}\). The process repeats. After T such iterations, the final hypothesis \(h_f\) is the output. The hypothesis \(h_f\) combines the outputs of the T weak hypotheses using a weighted majority vote.<a class="footnote-ref" id="fnref:29" href="#fn:29">29</a>

Input:
 Sequence of \(N\) labeled examples \((x_1, y_1), \ldots, (x_N, y_N)\)
 Distribution \(D\) over the \(N\) examples
 Weak learning algorithm WeakLearn
 Integer \(T\) specifying number of iterations
Initialize the weight vector: \(w_i^1 = D(i)\) for \(i = 1, 2, \ldots, N\).
Do for \(t = 1, 2, \ldots, T\):
1. Set \(p^t = \frac{w^t}{\sum_{i=1}^{N} w_i^t}\).
2. Call WeakLearn, providing it with the distribution \(p^t\); get back a hypothesis \(h_t : X \rightarrow [0,1]\).
3. Calculate the error of \(h_t\): \(\epsilon_t = \sum_{i=1}^{N} p_i^t\, |h_t(x_i) - y_i|\).
4. Set \(\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}\).
5. Set the new weight vector to be \(w_i^{t+1} = w_i^t\, \beta_t^{\,1 - |h_t(x_i) - y_i|}\).

Output the hypothesis:

\[ f(x) = h_f(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \left(\log(1/\beta_t)\right) h_t(x) \geq \frac{1}{2} \sum_{t=1}^{T} \log(1/\beta_t) \\ 0 & \text{otherwise} \end{cases} \]
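The five steps and the output rule can be sketched as follows. Several pieces are assumptions made only for illustration: labels are in \(\{0, 1\}\), the weak learner is a hypothetical one-dimensional threshold stump (`stump_learner`), and \(\epsilon_t\) is clamped away from 0 and 1 so that \(\log(1/\beta_t)\) stays finite.

```python
import math


def stump_learner(xs, ys, p):
    """Hypothetical WeakLearn: the threshold stump with least error under p."""
    best_err, best_h = None, None
    for thr in sorted(set(xs)):
        for sign in (0, 1):
            # sign=1 predicts 1 when x >= thr; sign=0 predicts 1 when x < thr.
            h = lambda x, thr=thr, sign=sign: int((x >= thr) == bool(sign))
            err = sum(pi * abs(h(x) - y) for pi, x, y in zip(p, xs, ys))
            if best_err is None or err < best_err:
                best_err, best_h = err, h
    return best_h


def adaboost(xs, ys, T, weak_learn=stump_learner, D=None):
    """Return the final hypothesis h_f built from T weak hypotheses."""
    N = len(xs)
    w = list(D) if D is not None else [1.0 / N] * N
    hyps, betas = [], []
    for _ in range(T):
        total = sum(w)
        p = [wi / total for wi in w]                                   # step 1
        h = weak_learn(xs, ys, p)                                      # step 2
        eps = sum(pi * abs(h(x) - y) for pi, x, y in zip(p, xs, ys))   # step 3
        eps = min(max(eps, 1e-10), 1.0 - 1e-10)   # assumed clamp, avoids beta = 0
        beta = eps / (1.0 - eps)                                       # step 4
        # Step 5: correctly classified examples are scaled down by beta.
        w = [wi * beta ** (1 - abs(h(x) - y)) for wi, x, y in zip(w, xs, ys)]
        hyps.append(h)
        betas.append(beta)

    def h_final(x):
        votes = sum(math.log(1.0 / b) * h(x) for h, b in zip(hyps, betas))
        return 1 if votes >= 0.5 * sum(math.log(1.0 / b) for b in betas) else 0

    return h_final


# No single stump labels this set correctly, but three boosted stumps do.
xs, ys = [1, 2, 3], [0, 1, 0]
h_f = adaboost(xs, ys, T=3)
predictions = [h_f(x) for x in xs]
print(predictions)
```

In this tiny example every individual stump misclassifies at least one point, so the correct training predictions come only from the weighted vote.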

<h3>Solving linear programs approximately</h3>
Source:<a class="footnote-ref" id="fnref:30" href="#fn:30">30</a>

<h4>Problem</h4>
Given an \(m \times n\) matrix \(A\) and \(b \in \mathbb{R}^m\), is there an \(x\) such that \(Ax \geq b\)?

\[ \exists? x : Ax \geq b \qquad (1) \]

<h4>Assumption</h4>
Using the oracle algorithm for solving zero-sum games, with an error parameter \(\epsilon > 0\), the output would either be a point \(x\) such that \(Ax \geq b - \epsilon\) or a proof that \(x\) does not exist, i.e., there is no solution to this linear system of inequalities.

<h4>Solution</h4>
Given a vector \(p \in \Delta_m\), the oracle solves the following relaxed problem

\[ \exists? x : p^{\mathsf{T}} A x \geq p^{\mathsf{T}} b \qquad (2) \]

If there exists an x satisfying (1), then x satisfies (2) for all \(p \in \Delta_m\). The contrapositive of this statement is also true: if (2) has no solution for some \(p\), then (1) has no solution.
Suppose that whenever the oracle returns a feasible solution for a \(p\), the solution \(x\) it returns has bounded width \(\max_i |(Ax)_i - b_i| \leq 1\).
So if there is a solution to (1), then there is an algorithm whose output x satisfies the system (2) up to an additive error of \(2\epsilon\). The algorithm makes at most \(\frac{\ln(m)}{\epsilon^2}\) calls to a width-bounded oracle for the problem (2). The contrapositive stands true as well. The multiplicative update is applied in the algorithm in this case.
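To make the oracle-based scheme concrete, here is a small sketch under strong simplifying assumptions that are not in the source: \(x\) is restricted to the probability simplex, so the oracle for (2) only has to pick the best vertex, the width condition is assumed to hold on those vertices, and the learning rate \(\epsilon/2\) with \(\lceil 4\ln(m)/\epsilon^2 \rceil\) rounds is an illustrative choice of constants. The function name `mw_lp_feasibility` is hypothetical.

```python
import math


def mw_lp_feasibility(A, b, eps):
    """Look for x in the probability simplex with A x >= b - eps.

    Returns the averaged oracle answers, or None when some distribution p
    over the constraints certifies that A x >= b has no solution in the simplex.
    """
    m, n = len(A), len(A[0])
    eta = eps / 2.0                                    # assumed learning rate
    T = max(1, math.ceil(4.0 * math.log(m) / eps ** 2))
    weights = [1.0] * m
    x_avg = [0.0] * n
    for _ in range(T):
        total = sum(weights)
        p = [w / total for w in weights]
        # Oracle for (2): a linear objective over the simplex peaks at a vertex.
        scores = [sum(p[i] * A[i][k] for i in range(m)) for k in range(n)]
        k_best = max(range(n), key=scores.__getitem__)
        if scores[k_best] < sum(p[i] * b[i] for i in range(m)) - 1e-12:
            return None                                # (2) fails for this p
        x_avg[k_best] += 1.0 / T
        # Cost of constraint i is its slack (A x)_i - b_i, assumed in [-1, 1]:
        # satisfied constraints lose weight, violated ones gain weight.
        slack = [A[i][k_best] - b[i] for i in range(m)]
        weights = [pi * math.exp(-eta * s) for pi, s in zip(p, slack)]
    return x_avg


A = [[1.0, 0.0],
     [0.0, 1.0]]
x = mw_lp_feasibility(A, [0.5, 0.5], eps=0.1)       # feasible: x = (0.5, 0.5)
infeasible = mw_lp_feasibility(A, [0.6, 0.6], eps=0.1)
print(x, infeasible)
```

Averaging the vertices returned by the oracle is what turns a sequence of solutions to the relaxed problem (2) into a single point that approximately satisfies every constraint at once.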

<h3>Other applications</h3>
Evolutionary game theory
Multiplicative weights update is the discrete-time variant of the <a href="/facts/Replicator_equation/UNHyHy74">replicator equation</a> (replicator dynamics), which is a commonly used model in <a href="/facts/Evolutionary_game_theory/dPlL3umh">evolutionary game theory</a>. It converges to <a href="/facts/Nash_equilibrium/l74xzIHd">Nash equilibrium</a> when applied to a <a href="/facts/Congestion_game/Q3qGGp7u">congestion game</a>.<a class="footnote-ref" id="fnref:31" href="#fn:31">31</a>
Operations research and online statistical decision-making
In <a href="/facts/Operations_research/H5yJkV5n">operations research</a> and online statistical decision making, the weighted majority algorithm and its more complicated versions have been found independently.<a class="footnote-ref" id="fnref:32" href="#fn:32">32</a>
Computational geometry
The multiplicative weights algorithm is also widely applied in <a href="/facts/Computational_geometry/eeaotQtl">computational geometry</a>,<a class="footnote-ref" id="fnref:33" href="#fn:33">33</a> such as <a href="/facts/Kenneth_L._Clarkson/JbgWa82R">Clarkson</a>'s algorithm for <a href="/facts/Linear_programming/GduXFQxT">linear programming (LP)</a> with a bounded number of variables in linear time.<a class="footnote-ref" id="fnref:34" href="#fn:34">34</a><a class="footnote-ref" id="fnref:35" href="#fn:35">35</a> Later, Bronnimann and Goodrich employed analogous methods to find <a href="/facts/Set_cover_problem/MVzyBYs1">Set Covers</a> for <a href="/facts/Hypergraph/qIR2o2I3">hypergraphs</a> with small <a href="/facts/VC_dimension/ThkxKE12">VC dimension</a>.<a class="footnote-ref" id="fnref:36" href="#fn:36">36</a>
<ul><li><a href="/facts/Gradient_descent/pFFrek0F">Gradient descent method</a><a class="footnote-ref" id="fnref:37" href="#fn:37">37</a></li>
<li><a href="/facts/Matrix_(mathematics)/qa8FI4ko">Matrix</a> multiplicative weights update<a class="footnote-ref" id="fnref:38" href="#fn:38">38</a></li>
<li>Plotkin, Shmoys, Tardos framework for <a href="/facts/Packing_problems/nQtnMb6Y">packing</a>/<a href="/facts/Covering_problems/4BIHqE9K">covering LPs</a><a class="footnote-ref" id="fnref:39" href="#fn:39">39</a></li>
<li>Approximating <a href="/facts/Multi-commodity_flow_problem/xdWBWmqa">multi-commodity flow problems</a><a class="footnote-ref" id="fnref:40" href="#fn:40">40</a></li>
<li>O(log n)-approximation for many <a href="/facts/NP-hardness/mQeNPv4R">NP-hard problems</a><a class="footnote-ref" id="fnref:41" href="#fn:41">41</a></li>
<li><a href="/facts/Learning_theory_(education)/uuYQ4BHW">Learning theory</a> and <a href="/facts/Boosting_(machine_learning)/HgejTPPu">boosting</a><a class="footnote-ref" id="fnref:42" href="#fn:42">42</a></li>
<li>Hard-core sets and the XOR lemma<a class="footnote-ref" id="fnref:43" href="#fn:43">43</a></li>
<li>Hannan's algorithm and multiplicative weights<a class="footnote-ref" id="fnref:44" href="#fn:44">44</a></li>
<li>Online <a href="/facts/Convex_optimization/7D6U4MTN">convex optimization</a><a class="footnote-ref" id="fnref:45" href="#fn:45">45</a></li></ul>

<h2 id="external-links">External links</h2>
<ul><li><a href="https://www.quantamagazine.org/game-theory-makes-new-predictions-for-evolution-20140618">The Game Theory of Life</a>, a <a href="/facts/Quanta_Magazine/FkbYnGtU">Quanta Magazine</a> article describing the application of the method to evolutionary biology in a paper by Erick Chastain, Adi Livnat, <a href="/facts/Christos_Papadimitriou/rXflR4or">Christos Papadimitriou</a>, and <a href="/facts/Umesh_Vazirani/ej4NSg84">Umesh Vazirani</a></li></ul>

<h2 id="references">References</h2>

<ol>
<li id="fn:1">Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing. 8: 121–164. doi:10.4086/toc.2012.v008a006. <a href="/wiki/Sanjeev_Arora" target="_blank">/wiki/Sanjeev_Arora</a> <a href="#fnref:1" class="footnote-back-ref">↩</a></li>
<li id="fn:2">"The Multiplicative Weights Algorithm*" (PDF). Retrieved 9 November 2016. <a href="https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f11/www/notes/lecture16.pdf" target="_blank">https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f11/www/notes/lecture16.pdf</a> <a href="#fnref:2" class="footnote-back-ref">↩</a></li>
<li id="fn:3">Grigoriadis, Michael D.; Khachiyan, Leonid G. (1995). "A sublinear-time randomized approximation algorithm for matrix games". Operations Research Letters. 18 (2): 53–58. doi:10.1016/0167-6377(95)00032-0. <a href="/wiki/Leonid_Khachiyan" target="_blank">/wiki/Leonid_Khachiyan</a> <a href="#fnref:3" class="footnote-back-ref">↩</a></li>
<li id="fn:4">Clarkson, Kenneth L. (1988). "A Las Vegas algorithm for linear programming when the dimension is small". Proc. 29th FOCS. IEEE Comp. Soc. Press. pp. 452–456. doi:10.1109/SFCS.1988.21961. <a href="#fnref:4" class="footnote-back-ref">↩</a></li>
<li id="fn:5">Clarkson, Kenneth L. (1995). "A Las Vegas algorithm for linear and integer programming when the dimension is small". Journal of the ACM. 42: 488–499. doi:10.1145/201019.201036. <a href="#fnref:5" class="footnote-back-ref">↩</a></li>
<li id="fn:6">Bronnimann, H.; Goodrich, M. T. (1995). "Almost optimal set covers in finite VC-dimension". Discrete & Computational Geometry. 14 (4): 463–479. doi:10.1007/BF02570718. Preliminary version in 10th Ann. Symp. Comp. Geom. (SCG'94). <a href="https://doi.org/10.1007%2FBF02570718" target="_blank">https://doi.org/10.1007%2FBF02570718</a> <a href="#fnref:6" class="footnote-back-ref">↩</a></li>
<li id="fn:7">Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing. 8: 121–164. doi:10.4086/toc.2012.v008a006. <a href="/wiki/Sanjeev_Arora" target="_blank">/wiki/Sanjeev_Arora</a> <a href="#fnref:7" class="footnote-back-ref">↩</a></li>
<li id="fn:8">"The Multiplicative Weights Algorithm*" (PDF). Retrieved 9 November 2016. <a href="https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f11/www/notes/lecture16.pdf" target="_blank">https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f11/www/notes/lecture16.pdf</a> <a href="#fnref:8" class="footnote-back-ref">↩</a></li>
<li id="fn:9">Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing. 8: 121–164. doi:10.4086/toc.2012.v008a006. <a href="/wiki/Sanjeev_Arora" target="_blank">/wiki/Sanjeev_Arora</a> <a href="#fnref:9" class="footnote-back-ref">↩</a></li>
<li id="fn:10">"Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm" (PDF). 2013. <a href="https://www.cs.princeton.edu/courses/archive/fall13/cos521/lecnotes/lec8.pdf" target="_blank">https://www.cs.princeton.edu/courses/archive/fall13/cos521/lecnotes/lec8.pdf</a> <a href="#fnref:10" class="footnote-back-ref">↩</a></li>
<li id="fn:11">Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing. 8: 121–164. doi:10.4086/toc.2012.v008a006. <a href="/wiki/Sanjeev_Arora" target="_blank">/wiki/Sanjeev_Arora</a> <a href="#fnref:11" class="footnote-back-ref">↩</a></li>
<li id="fn:12">"The Multiplicative Weights Algorithm*" (PDF). Retrieved 9 November 2016. <a href="https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f11/www/notes/lecture16.pdf" target="_blank">https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f11/www/notes/lecture16.pdf</a> <a href="#fnref:12" class="footnote-back-ref">↩</a></li>
<li id="fn:13">"COS 511: Foundations of Machine Learning" (PDF). 20 March 2006. <a href="http://www.cs.princeton.edu/courses/archive/spr06/cos511/scribe_notes/0330.pdf" target="_blank">http://www.cs.princeton.edu/courses/archive/spr06/cos511/scribe_notes/0330.pdf</a> <a href="#fnref:13" class="footnote-back-ref">↩</a></li>
<li id="fn:14">"An Algorithmist's Toolkit". 8 December 2009. Retrieved 9 November 2016. <a href="https://ocw.mit.edu/courses/mathematics/18-409-topics-in-theoretical-computer-science-an-algorithmists-toolkit-fall-2009/lecture-notes/MIT18_409F09_scribe24.pdfformat=PDF" target="_blank">https://ocw.mit.edu/courses/mathematics/18-409-topics-in-theoretical-computer-science-an-algorithmists-toolkit-fall-2009/lecture-notes/MIT18_409F09_scribe24.pdfformat=PDF</a> <a href="#fnref:14" class="footnote-back-ref">↩</a></li>
<li id="fn:15">"The Multiplicative Weights Algorithm*" (PDF). Retrieved 9 November 2016. <a href="https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f11/www/notes/lecture16.pdf" target="_blank">https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f11/www/notes/lecture16.pdf</a> <a href="#fnref:15" class="footnote-back-ref">↩</a></li>
<li id="fn:16">Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing. 8: 121–164. doi:10.4086/toc.2012.v008a006. <a href="/wiki/Sanjeev_Arora" target="_blank">/wiki/Sanjeev_Arora</a> <a href="#fnref:16" class="footnote-back-ref">↩</a></li>
<li id="fn:17">Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing. 8: 121–164. doi:10.4086/toc.2012.v008a006. <a href="/wiki/Sanjeev_Arora" target="_blank">/wiki/Sanjeev_Arora</a> <a href="#fnref:17" class="footnote-back-ref">↩</a></li>
<li id="fn:18">"An Algorithmist's Toolkit". 8 December 2009. Retrieved 9 November 2016. <a href="https://ocw.mit.edu/courses/mathematics/18-409-topics-in-theoretical-computer-science-an-algorithmists-toolkit-fall-2009/lecture-notes/MIT18_409F09_scribe24.pdfformat=PDF" target="_blank">https://ocw.mit.edu/courses/mathematics/18-409-topics-in-theoretical-computer-science-an-algorithmists-toolkit-fall-2009/lecture-notes/MIT18_409F09_scribe24.pdfformat=PDF</a> <a href="#fnref:18" class="footnote-back-ref">↩</a></li>
<li id="fn:19">"An Algorithmist's Toolkit". 8 December 2009. Retrieved 9 November 2016. <a href="https://ocw.mit.edu/courses/mathematics/18-409-topics-in-theoretical-computer-science-an-algorithmists-toolkit-fall-2009/lecture-notes/MIT18_409F09_scribe24.pdfformat=PDF" target="_blank">https://ocw.mit.edu/courses/mathematics/18-409-topics-in-theoretical-computer-science-an-algorithmists-toolkit-fall-2009/lecture-notes/MIT18_409F09_scribe24.pdfformat=PDF</a> <a href="#fnref:19" class="footnote-back-ref">↩</a></li>
<li id="fn:20">Bailey, James P.; Piliouras, Georgios (2018). "Multiplicative weights update in zero-sum games". Proceedings of the 2018 ACM Conference on Economics and Computation. ACM. <a href="#fnref:20" class="footnote-back-ref">↩</a></li>
<li id="fn:21">Foster, Dean P.; Vohra, Rakesh (1999). "Regret in the on-line decision problem" (PDF). Games and Economic Behavior. 29 (1–2): 7–35. doi:10.1006/game.1999.0740. <a href="http://www.dklevine.com/archive/refs4569.pdf" target="_blank">http://www.dklevine.com/archive/refs4569.pdf</a> <a href="#fnref:21" class="footnote-back-ref">↩</a></li>
<li id="fn:22">Freund, Yoav; Schapire, Robert E. (1996). "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting". Journal of Computer and System Sciences. p. 55. <a href="#fnref:22" class="footnote-back-ref">↩</a></li>
<li id="fn:23">Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing. 8: 121–164. doi:10.4086/toc.2012.v008a006. <a href="/wiki/Sanjeev_Arora" target="_blank">/wiki/Sanjeev_Arora</a> <a href="#fnref:23" class="footnote-back-ref">↩</a></li>
<li id="fn:24">Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing. 8: 121–164. doi:10.4086/toc.2012.v008a006. <a href="/wiki/Sanjeev_Arora" target="_blank">/wiki/Sanjeev_Arora</a> <a href="#fnref:24" class="footnote-back-ref">↩</a></li>
<li id="fn:25">"The Multiplicative Weights Algorithm*" (PDF). Retrieved 9 November 2016. <a href="https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f11/www/notes/lecture16.pdf" target="_blank">https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f11/www/notes/lecture16.pdf</a> <a href="#fnref:25" class="footnote-back-ref">↩</a></li>
<li id="fn:26">"The Multiplicative Weights Algorithm*" (PDF). Retrieved 9 November 2016. <a href="https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f11/www/notes/lecture16.pdf" target="_blank">https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15859-f11/www/notes/lecture16.pdf</a> <a href="#fnref:26" class="footnote-back-ref">↩</a></li>
<li id="fn:27">"Online Learning from Experts: Weighed Majority and Hedge" (PDF). Archived from the original on 29 May 2016. Retrieved 7 December 2016. <a href="https://web.archive.org/web/20160529191219/http://shivani-agarwal.net/Teaching/E0370/Aug-2011/Lectures/20-scribe1.pdf" target="_blank">https://web.archive.org/web/20160529191219/http://shivani-agarwal.net/Teaching/E0370/Aug-2011/Lectures/20-scribe1.pdf</a> <a href="#fnref:27" class="footnote-back-ref">↩</a></li>
<li id="fn:28">Freund, Yoav; Schapire, Robert E. (1996). "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting". Journal of Computer and System Sciences. p. 55. <a href="#fnref:28" class="footnote-back-ref">↩</a></li>
<li id="fn:29">Freund, Yoav; Schapire, Robert E. (1996). "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting". Journal of Computer and System Sciences. p. 55. <a href="#fnref:29" class="footnote-back-ref">↩</a></li>
<li id="fn:30">"Fundamentals of Convex Optimization" (PDF). Retrieved 9 November 2016. <a href="http://tcs.epfl.ch/files/content/sites/tcs/files/Lec2-Fall14-Ver2.pdf" target="_blank">http://tcs.epfl.ch/files/content/sites/tcs/files/Lec2-Fall14-Ver2.pdf</a> <a href="#fnref:30" class="footnote-back-ref">↩</a></li>
<li id="fn:31">Kleinberg, Robert; Piliouras, Georgios; Tardos, Eva (2009). "Multiplicative updates outperform generic no-regret learning in congestion games". Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing. ACM. <a href="#fnref:31" class="footnote-back-ref">↩</a></li>
<li id="fn:32">Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing. 8: 121–164. doi:10.4086/toc.2012.v008a006. <a href="/wiki/Sanjeev_Arora" target="_blank">/wiki/Sanjeev_Arora</a> <a href="#fnref:32" class="footnote-back-ref">↩</a></li>
<li id="fn:33">Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing. 8: 121–164. doi:10.4086/toc.2012.v008a006. <a href="/wiki/Sanjeev_Arora" target="_blank">/wiki/Sanjeev_Arora</a> <a href="#fnref:33" class="footnote-back-ref">↩</a></li>
<li id="fn:34">Clarkson, Kenneth L. (1988). "A Las Vegas algorithm for linear programming when the dimension is small". Proc. 29th FOCS. IEEE Comp. Soc. Press. pp. 452–456. doi:10.1109/SFCS.1988.21961. <a href="#fnref:34" class="footnote-back-ref">↩</a></li>
<li id="fn:35">Clarkson, Kenneth L. (1995). "A Las Vegas algorithm for linear and integer programming when the dimension is small". Journal of the ACM. 42: 488–499. doi:10.1145/201019.201036. <a href="#fnref:35" class="footnote-back-ref">↩</a></li>
<li id="fn:36">Bronnimann, H.; Goodrich, M. T. (1995). "Almost optimal set covers in finite VC-dimension". Discrete & Computational Geometry. 14 (4): 463–479. doi:10.1007/BF02570718. Preliminary version in 10th Ann. Symp. Comp. Geom. (SCG'94). <a href="https://doi.org/10.1007%2FBF02570718" target="_blank">https://doi.org/10.1007%2FBF02570718</a> <a href="#fnref:36" class="footnote-back-ref">↩</a></li>
<li id="fn:37">Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing. 8: 121–164. doi:10.4086/toc.2012.v008a006. <a href="/wiki/Sanjeev_Arora" target="_blank">/wiki/Sanjeev_Arora</a> <a href="#fnref:37" class="footnote-back-ref">↩</a></li>
<li id="fn:38">Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing. 8: 121–164. doi:10.4086/toc.2012.v008a006. <a href="/wiki/Sanjeev_Arora" target="_blank">/wiki/Sanjeev_Arora</a> <a href="#fnref:38" class="footnote-back-ref">↩</a></li>
<li id="fn:39">Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing. 8: 121–164. doi:10.4086/toc.2012.v008a006. <a href="/wiki/Sanjeev_Arora" target="_blank">/wiki/Sanjeev_Arora</a> <a href="#fnref:39" class="footnote-back-ref">↩</a></li>
<li id="fn:40">Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing. 8: 121–164. doi:10.4086/toc.2012.v008a006. <a href="/wiki/Sanjeev_Arora" target="_blank">/wiki/Sanjeev_Arora</a> <a href="#fnref:40" class="footnote-back-ref">↩</a></li>
<li id="fn:41">Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing. 8: 121–164. doi:10.4086/toc.2012.v008a006. <a href="/wiki/Sanjeev_Arora" target="_blank">/wiki/Sanjeev_Arora</a> <a href="#fnref:41" class="footnote-back-ref">↩</a></li>
<li id="fn:42">Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing. 8: 121–164. doi:10.4086/toc.2012.v008a006. <a href="/wiki/Sanjeev_Arora" target="_blank">/wiki/Sanjeev_Arora</a> <a href="#fnref:42" class="footnote-back-ref">↩</a></li>
<li id="fn:43">Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing. 8: 121–164. doi:10.4086/toc.2012.v008a006. <a href="/wiki/Sanjeev_Arora" target="_blank">/wiki/Sanjeev_Arora</a> <a href="#fnref:43" class="footnote-back-ref">↩</a></li>
<li id="fn:44">Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing. 8: 121–164. doi:10.4086/toc.2012.v008a006. <a href="/wiki/Sanjeev_Arora" target="_blank">/wiki/Sanjeev_Arora</a> <a href="#fnref:44" class="footnote-back-ref">↩</a></li>
<li id="fn:45">Arora, Sanjeev; Hazan, Elad; Kale, Satyen (2012). "The Multiplicative Weights Update Method: A Meta-Algorithm and Applications". Theory of Computing. 8: 121–164. doi:10.4086/toc.2012.v008a006. <a href="/wiki/Sanjeev_Arora" target="_blank">/wiki/Sanjeev_Arora</a> <a href="#fnref:45" class="footnote-back-ref">↩</a></li>
</ol>
