In the following, let $X$ be our $n$-dimensional input space. Let $\mathcal{H}$ be a class of functions that we wish to use in order to learn a $\{0,1\}$-valued target function $f$ defined over $X$. Let $\mathcal{D}$ be the distribution of the inputs over $X$. The goal of a learning algorithm $\mathcal{A}$ is to choose a function $h \in \mathcal{H}$ that minimizes $\text{error}(h) = P_{x \sim \mathcal{D}}(h(x) \neq f(x))$. Let us suppose we have a function $\text{size}(f)$ that measures the complexity of $f$. Let $\text{Oracle}(x)$ be an oracle that, whenever called, returns an example $x$ and its correct label $f(x)$.
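To make the setup concrete, here is a minimal sketch in Python of the noiseless oracle and the error measure. It assumes, purely for illustration, that $X = \{0,1\}^n$, that $\mathcal{D}$ is uniform, and that the target is a simple conjunction; all names below are hypothetical and not part of the formal model.

```python
import random

n = 10  # illustrative dimension of the input space X = {0,1}^n

def f(x):
    # Hypothetical target function: the AND of the first two attributes.
    return x[0] & x[1]

def draw_example():
    # Sample x ~ D; for simplicity D is taken uniform over {0,1}^n.
    return tuple(random.randint(0, 1) for _ in range(n))

def oracle():
    # Noiseless oracle: returns an example x and its correct label f(x).
    x = draw_example()
    return x, f(x)

def error(h, trials=10_000):
    # Monte Carlo estimate of error(h) = P_{x~D}(h(x) != f(x)).
    return sum(h(x) != f(x) for x in (draw_example() for _ in range(trials))) / trials
```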
When no noise corrupts the data, we can define learning in the Valiant setting:[1][2]
Definition: We say that $f$ is efficiently learnable using $\mathcal{H}$ in the Valiant setting if there exists a learning algorithm $\mathcal{A}$ that has access to $\text{Oracle}(x)$ and a polynomial $p(\cdot,\cdot,\cdot,\cdot)$ such that for any $0 < \varepsilon \leq 1$ and $0 < \delta \leq 1$ it outputs, in a number of calls to the oracle bounded by $p\left(\frac{1}{\varepsilon}, \frac{1}{\delta}, n, \text{size}(f)\right)$, a function $h \in \mathcal{H}$ that satisfies with probability at least $1-\delta$ the condition $\text{error}(h) \leq \varepsilon$.
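As an illustration of the definition, the classical elimination algorithm for monotone conjunctions meets such a bound. The sketch below is a hedged example, not part of the text above: it assumes the standard consistent-learner sample size $m = \left\lceil \frac{1}{\varepsilon}\left(\ln|\mathcal{H}| + \ln\frac{1}{\delta}\right) \right\rceil$ with $|\mathcal{H}| = 2^n$ monotone conjunctions, and reuses the illustrative oracle() from the sketch above.

```python
import math

def pac_learn_monotone_conjunction(oracle, n, eps, delta):
    # Consistent-learner bound: with |H| = 2^n monotone conjunctions,
    # m = ceil((1/eps) * (ln|H| + ln(1/delta))) samples suffice for
    # error <= eps with probability >= 1 - delta.
    m = math.ceil((1 / eps) * (n * math.log(2) + math.log(1 / delta)))
    relevant = set(range(n))  # start from the conjunction of all n attributes
    for _ in range(m):
        x, label = oracle()
        if label == 1:
            # An attribute that is 0 in a positive example cannot be in the target.
            relevant = {i for i in relevant if x[i] == 1}
    return lambda x: int(all(x[i] == 1 for i in relevant))

# Usage with the sketch above: h = pac_learn_monotone_conjunction(oracle, n, 0.1, 0.05)
```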
In the following we define learnability of $f$ when the data have been corrupted in some way.[3][4][5]
In the classification noise model,[6] a noise rate $0 \leq \eta < \frac{1}{2}$ is introduced. Then, instead of $\text{Oracle}(x)$, which always returns the correct label of example $x$, algorithm $\mathcal{A}$ can only call a faulty oracle $\text{Oracle}(x, \eta)$ that flips the label of $x$ with probability $\eta$. As in the Valiant case, the goal of a learning algorithm $\mathcal{A}$ is to choose a function $h \in \mathcal{H}$ that minimizes $\text{error}(h) = P_{x \sim \mathcal{D}}(h(x) \neq f(x))$. In applications it is difficult to have access to the real value of $\eta$, but we assume we have access to an upper bound $\eta_B$ on it.[7] Note that if we allow the noise rate to be $1/2$, then learning becomes impossible in any amount of computation time, because every label conveys no information about the target function.
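A minimal sketch of such a faulty oracle, reusing the illustrative draw_example() and f from the sketch above:

```python
import random

def noisy_oracle(eta):
    # Classification-noise oracle: draws (x, f(x)) as usual, then flips
    # the label independently with probability eta.
    x = draw_example()
    label = f(x)
    if random.random() < eta:
        label = 1 - label
    return x, label
```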
Definition: We say that $f$ is efficiently learnable using $\mathcal{H}$ in the classification noise model if there exists a learning algorithm $\mathcal{A}$ that has access to $\text{Oracle}(x, \eta)$ and a polynomial $p(\cdot,\cdot,\cdot,\cdot,\cdot)$ such that for any $0 \leq \eta < \frac{1}{2}$, $0 < \varepsilon \leq 1$ and $0 < \delta \leq 1$ it outputs, in a number of calls to the oracle bounded by $p\left(\frac{1}{1-2\eta_B}, \frac{1}{\varepsilon}, \frac{1}{\delta}, n, \text{size}(f)\right)$, a function $h \in \mathcal{H}$ that satisfies with probability at least $1-\delta$ the condition $\text{error}(h) \leq \varepsilon$.
Statistical query learning[8] is a kind of active learning problem in which the learning algorithm $\mathcal{A}$ can decide whether to request information about the likelihood $P_{f(x)}$ that a function $f$ correctly labels example $x$, and receives an answer accurate within a tolerance $\alpha$. Formally, whenever the learning algorithm $\mathcal{A}$ calls the oracle $\text{Oracle}(x, \alpha)$, it receives as feedback a probability $Q_{f(x)}$ such that $Q_{f(x)} - \alpha \leq P_{f(x)} \leq Q_{f(x)} + \alpha$.
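A minimal simulation of such an oracle, assuming queries are $\{0,1\}$-valued predicates $\chi$ over labeled examples as in Kearns' STAT formulation, and approximating the tolerance guarantee by Monte Carlo sampling with a Hoeffding-style sample size; unlike the ideal oracle, this sketch meets the tolerance only with high probability. It reuses the illustrative oracle() from above.

```python
import math

def sq_oracle(chi, alpha):
    # STAT-style oracle: estimates P_{x~D}[chi(x, f(x)) = 1] to within
    # additive tolerance alpha. Hoeffding: 2*exp(-2*m*alpha^2) <= 1% when
    # m >= ln(200) / (2 * alpha^2), so the tolerance holds w.h.p. here.
    m = math.ceil(math.log(200) / (2 * alpha ** 2))
    return sum(chi(*oracle()) for _ in range(m)) / m

# Example query: how often does attribute 0 agree with the label?
# q = sq_oracle(lambda x, y: int(x[0] == y), alpha=0.05)
```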
Definition: We say that $f$ is efficiently learnable using $\mathcal{H}$ in the statistical query learning model if there exists a learning algorithm $\mathcal{A}$ that has access to $\text{Oracle}(x, \alpha)$ and polynomials $p(\cdot,\cdot,\cdot)$, $q(\cdot,\cdot,\cdot)$, and $r(\cdot,\cdot,\cdot)$ such that for any $0 < \varepsilon \leq 1$ the following hold:

- $\mathcal{A}$ can evaluate each query to $\text{Oracle}(x, \alpha)$ in time bounded by $q\left(\frac{1}{\varepsilon}, n, \text{size}(f)\right)$;
- the smallest tolerance $\alpha$ that $\mathcal{A}$ uses is lower-bounded by $\frac{1}{r\left(\frac{1}{\varepsilon},\, n,\, \text{size}(f)\right)}$;
- $\mathcal{A}$ outputs, in a number of calls to the oracle bounded by $p\left(\frac{1}{\varepsilon}, n, \text{size}(f)\right)$, a function $h \in \mathcal{H}$ that satisfies $\text{error}(h) \leq \varepsilon$.
Note that the confidence parameter $\delta$ does not appear in the definition of learning. This is because the main purpose of $\delta$ is to allow the learning algorithm a small probability of failure due to an unrepresentative sample. Since $\text{Oracle}(x, \alpha)$ now always guarantees to meet the approximation criterion $Q_{f(x)} - \alpha \leq P_{f(x)} \leq Q_{f(x)} + \alpha$, the failure probability is no longer needed.
The statistical query model is strictly weaker than the PAC model: any efficiently SQ-learnable class is efficiently PAC learnable in the presence of classification noise, but there exist efficiently PAC-learnable problems, such as parity, that are not efficiently SQ-learnable.[9]
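For intuition, a parity function $f(x) = \sum_{i \in S} x_i \bmod 2$ is PAC learnable from noiseless examples by the standard linear-algebra approach: each labeled example is a linear equation over GF(2), and Gaussian elimination recovers a consistent coefficient vector. The sketch below is an illustration of that approach under the assumption of a noiseless list of (x, label) pairs.

```python
def learn_parity(examples, n):
    # Solve <a, x> = y over GF(2) by Gaussian elimination; any consistent
    # coefficient vector a defines a hypothesis with zero training error.
    rows = [list(x) + [y] for x, y in examples]  # augmented matrix over GF(2)
    pivot = 0
    for col in range(n):
        r = next((i for i in range(pivot, len(rows)) if rows[i][col]), None)
        if r is None:
            continue
        rows[pivot], rows[r] = rows[r], rows[pivot]
        for i in range(len(rows)):
            if i != pivot and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[pivot])]
        pivot += 1
    a = [0] * n  # free variables set to 0
    for row in rows[:pivot]:
        lead = next(c for c in range(n) if row[c])
        a[lead] = row[-1]
    return lambda x: sum(ai & xi for ai, xi in zip(a, x)) % 2
```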
In the malicious classification model,[10] an adversary generates errors to foil the learning algorithm. This setting describes error bursts, which may occur when, for a limited time, transmission equipment malfunctions repeatedly. Formally, algorithm $\mathcal{A}$ calls an oracle $\text{Oracle}(x, \beta)$ that, with probability $1-\beta$, returns a correctly labeled example $x$ drawn, as usual, from distribution $\mathcal{D}$ over the input space, but with probability $\beta$ returns an example drawn from a distribution that is not related to $\mathcal{D}$. Moreover, this maliciously chosen example may be strategically selected by an adversary who has knowledge of $f$, $\beta$, $\mathcal{D}$, or the current progress of the learning algorithm.
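A minimal sketch of the malicious oracle, where adversary() stands in for an arbitrary adversarial strategy (a hypothetical callback, not part of the formal model), reusing the illustrative oracle() from above:

```python
import random

def malicious_oracle(beta, adversary):
    # With probability 1 - beta: an honest labeled example (x, f(x)), x ~ D.
    # With probability beta: a pair chosen by the adversary, which may depend
    # on f, D, beta, and the learner's progress.
    if random.random() < beta:
        return adversary()  # returns an arbitrary (x, label) pair
    return oracle()
```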
Definition: Given a bound $\beta_B < \frac{1}{2}$ for $0 \leq \beta < \frac{1}{2}$, we say that $f$ is efficiently learnable using $\mathcal{H}$ in the malicious classification model if there exists a learning algorithm $\mathcal{A}$ that has access to $\text{Oracle}(x, \beta)$ and a polynomial $p(\cdot,\cdot,\cdot,\cdot,\cdot)$ such that for any $0 < \varepsilon \leq 1$, $0 < \delta \leq 1$ it outputs, in a number of calls to the oracle bounded by $p\left(\frac{1}{1/2-\beta_B}, \frac{1}{\varepsilon}, \frac{1}{\delta}, n, \text{size}(f)\right)$, a function $h \in \mathcal{H}$ that satisfies with probability at least $1-\delta$ the condition $\text{error}(h) \leq \varepsilon$.
In the nonuniform random attribute noise model,[11][12] the algorithm learns a Boolean function; a malicious oracle $\text{Oracle}(x, \nu)$ may flip each $i$-th bit of example $x = (x_1, x_2, \ldots, x_n)$ independently with probability $\nu_i \leq \nu$.
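A minimal sketch of this oracle, reusing the illustrative draw_example() and f from above; note that the label is computed on the clean example before the attributes are corrupted:

```python
import random

def attribute_noise_oracle(nu):
    # nu[i] is the flip rate of the i-th attribute, with each nu[i] <= nu_max.
    # The label is computed on the clean x; only the attributes are corrupted.
    x = draw_example()
    label = f(x)
    noisy_x = tuple(xi ^ int(random.random() < nu[i]) for i, xi in enumerate(x))
    return noisy_x, label
```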
This type of error can irreparably foil the algorithm; in fact, the following theorem holds:
In the nonuniform random attribute noise setting, an algorithm $\mathcal{A}$ can output a function $h \in \mathcal{H}$ such that $\text{error}(h) < \varepsilon$ only if $\nu < 2\varepsilon$.
Valiant, L. G. (1985). "Learning disjunctions of conjunctions". In Proceedings of IJCAI-85, pp. 560–566. http://www.ijcai.org/Past%20Proceedings/IJCAI-85-VOL1/PDF/107.pdf
Valiant, L. G. (1984). "A theory of the learnable". Communications of the ACM, 27(11), 1134–1142.
Laird, P. D. (1988). Learning from Good and Bad Data. Kluwer Academic Publishers. https://link.springer.com/content/pdf/bfm%3A978-1-4613-1685-5%2F1.pdf
Kearns, M. (1998). "Efficient noise-tolerant learning from statistical queries". Journal of the ACM, 45(6), 983–1006. https://www.cs.iastate.edu/~honavar/noise-pac.pdf
Brunk, C. A., & Pazzani, M. J. (1991). "An investigation of noise-tolerant relational concept learning algorithms". In Proceedings of the 8th International Workshop on Machine Learning. http://www.ics.uci.edu/~pazzani/Publications/An-Investigation-MLW-91.pdf
Kearns, M. J., & Vazirani, U. V. (1994). An Introduction to Computational Learning Theory, chapter 5. MIT Press. https://books.google.com/books?id=vCA01wY6iywC
Angluin, D., & Laird, P. (1988). "Learning from noisy examples". Machine Learning, 2(4), 343–370. http://homepages.math.uic.edu/~lreyzin/papers/angluin88b.pdf
Kearns, M. (1998). "Efficient noise-tolerant learning from statistical queries". Journal of the ACM, 45(6), 983–1006. www.cis.upenn.edu/~mkearns/papers/sq-journal.pdf
Kearns, M., & Li, M. (1993). "Learning in the presence of malicious errors". SIAM Journal on Computing, 22(4), 807–837. www.cis.upenn.edu/~mkearns/papers/malicious.pdf
Goldman, S. A., & Sloan, R. H. (1991). "The difficulty of random attribute noise". Technical Report WUCS-91-29, Washington University, Department of Computer Science.
Sloan, R. H. (1989). Computational Learning Theory: New Models and Algorithms (doctoral dissertation, Massachusetts Institute of Technology). http://dspace.mit.edu/bitstream/handle/1721.1/38339/20770411.pdf?sequence=1