Conditional independence

<h2 id="conditional-independence-of-events">Conditional independence of events</h2>
<p>Let \(A\), \(B\), and \(C\) be <a href="/facts/Event_(probability_theory)/aJ0l4oQb">events</a>. \(A\) and \(B\) are said to be conditionally independent given \(C\) if and only if \(P(C)>0\) and:
</p>

\[ P(A\mid B,C)=P(A\mid C) \]

<p>This property is often written \((A\perp\!\!\!\perp B\mid C)\), which should be read \(((A\perp\!\!\!\perp B)\mid C)\).
</p><p>Equivalently, conditional independence may be stated as:
</p>

\[ P(A,B\mid C)=P(A\mid C)\,P(B\mid C) \]

<p>where \(P(A,B\mid C)\) is the <a href="/facts/Joint_probability/klX2ksGY">joint probability</a> of \(A\) and \(B\) given \(C\). This alternative formulation states that \(A\) and \(B\) are <a href="/facts/Independence_(probability_theory)/NUzQtnUL">independent events</a> given \(C\), that is, independent under the conditional probability \(P(\cdot\mid C)\).
</p><p>Because this product form is symmetric in \(A\) and \(B\), it also shows that \((A\perp\!\!\!\perp B\mid C)\) is equivalent to \((B\perp\!\!\!\perp A\mid C)\).
</p>
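<p>The two formulations can be checked mechanically on a finite sample space. The Python sketch below is illustrative only: the helper names (<code>prob</code>, <code>cond_indep_product_form</code>, <code>cond_indep_ratio_form</code>) and the small model (a fair coin \(C\), with \(A\) and \(B\) drawn independently once the coin's outcome is known) are hypothetical choices made for this example. Exact <code>Fraction</code> arithmetic is used so that equality tests are not disturbed by rounding.</p>

```python
from fractions import Fraction
from itertools import product


def prob(space, event):
    """Sum the probabilities of the outcomes w for which event(w) is true."""
    return sum((p for w, p in space.items() if event(w)), Fraction(0))


def cond_indep_product_form(space, A, B, C):
    """Check P(A,B | C) == P(A | C) * P(B | C); needs P(C) > 0."""
    p_c = prob(space, C)
    if p_c == 0:
        raise ValueError("P(C) must be positive")
    p_abc = prob(space, lambda w: A(w) and B(w) and C(w))
    p_ac = prob(space, lambda w: A(w) and C(w))
    p_bc = prob(space, lambda w: B(w) and C(w))
    return p_abc / p_c == (p_ac / p_c) * (p_bc / p_c)


def cond_indep_ratio_form(space, A, B, C):
    """Check P(A | B,C) == P(A | C); needs P(B,C) > 0 (hence P(C) > 0)."""
    p_bc = prob(space, lambda w: B(w) and C(w))
    if p_bc == 0:
        raise ValueError("P(B,C) must be positive")
    p_abc = prob(space, lambda w: A(w) and B(w) and C(w))
    p_ac = prob(space, lambda w: A(w) and C(w))
    return p_abc / p_bc == p_ac / prob(space, C)


# Hypothetical model: outcomes are triples (c, a, b). The coin c is fair and,
# given c, the indicators a and b are drawn independently.
P_A_GIVEN = {1: Fraction(3, 4), 0: Fraction(1, 4)}  # P(a = 1 | c)
P_B_GIVEN = {1: Fraction(2, 3), 0: Fraction(1, 3)}  # P(b = 1 | c)
space = {}
for c, a, b in product((0, 1), repeat=3):
    pa = P_A_GIVEN[c] if a else 1 - P_A_GIVEN[c]
    pb = P_B_GIVEN[c] if b else 1 - P_B_GIVEN[c]
    space[(c, a, b)] = Fraction(1, 2) * pa * pb


def A(w):
    return w[1] == 1  # event A: a = 1


def B(w):
    return w[2] == 1  # event B: b = 1


def C(w):
    return w[0] == 1  # event C: the coin shows 1


def everything(w):
    return True  # the whole sample space, i.e. no real conditioning


print(cond_indep_product_form(space, A, B, C))           # True
print(cond_indep_ratio_form(space, A, B, C))             # True
print(cond_indep_product_form(space, B, A, C))           # True (symmetry)
print(cond_indep_product_form(space, A, B, everything))  # False
```

<p>Conditioning on the whole sample space instead of \(C\) shows that the same \(A\) and \(B\) are not unconditionally independent in this model: \(P(A,B)=7/24\), while \(P(A)P(B)=1/4\).</p>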
<h3>Proof of the equivalent definition</h3>

\[ P(A,B\mid C)=P(A\mid C)\,P(B\mid C) \]
<p>iff \(\dfrac{P(A,B,C)}{P(C)}=\left(\dfrac{P(A,C)}{P(C)}\right)\left(\dfrac{P(B,C)}{P(C)}\right)\) (definition of <a href="/facts/Conditional_probability/QcN2UERV">conditional probability</a>)</p>
<p>iff \(P(A,B,C)=\dfrac{P(A,C)\,P(B,C)}{P(C)}\) (multiply both sides by \(P(C)\))</p>
<p>iff \(\dfrac{P(A,B,C)}{P(B,C)}=\dfrac{P(A,C)}{P(C)}\) (divide both sides by \(P(B,C)\), which requires \(P(B,C)>0\))</p>
<p>iff \(P(A\mid B,C)=P(A\mid C)\) (definition of conditional probability) \(\therefore\)</p>

<h3>Examples</h3>
<h4>Coloured boxes</h4>
<p>Each cell represents a possible outcome. The events \({\color{red}R}\), \({\color{blue}B}\) and \({\color{gold}Y}\) are represented by the areas shaded red, blue and yellow respectively. The overlap between the events \({\color{red}R}\) and \({\color{blue}B}\) is shaded purple.
</p><p>The probability of each event is its shaded area as a fraction of the total area. In both examples \({\color{red}R}\) and \({\color{blue}B}\) are conditionally independent given \({\color{gold}Y}\) because:
</p>

\[ \Pr({\color{red}R},{\color{blue}B}\mid{\color{gold}Y})=\Pr({\color{red}R}\mid{\color{gold}Y})\,\Pr({\color{blue}B}\mid{\color{gold}Y}) \]
<a class="footnote-ref" id="fnref:1" href="#fn:1"><sup>1</sup></a>
<p>but not conditionally independent given \(\left[\text{not }{\color{gold}Y}\right]\) because:
</p>

\[ \Pr({\color{red}R},{\color{blue}B}\mid\text{not }{\color{gold}Y})\neq\Pr({\color{red}R}\mid\text{not }{\color{gold}Y})\,\Pr({\color{blue}B}\mid\text{not }{\color{gold}Y}) \]

<h4>Proximity and delays</h4>
<p>Let A and B be the events that person A and person B, respectively, arrive home in time for dinner, where both people are sampled at random from the entire world. Events A and B can be assumed to be independent, i.e. knowing that A is late changes the probability that B is late little or not at all. However, if a third event is introduced, namely that person A and person B live in the same neighborhood, the two events are no longer conditionally independent given it: traffic conditions and weather that delay person A might delay person B as well. Given the third event and the knowledge that person A was late, the probability that person B will be late changes meaningfully.<a class="footnote-ref" id="fnref:2" href="#fn:2"><sup>2</sup></a>
</p>
<h4>Dice rolling</h4>
<p>Conditional independence depends on the nature of the third event. If you roll two dice, one may assume that the two dice behave independently of each other: looking at the result of one die tells you nothing about the result of the other. (That is, the two dice are independent.) If, however, the first die shows a 3 and someone tells you about a third event, that the sum of the two results is even, then this extra information restricts the second result to an odd number. In other words, two events can be independent but not conditionally independent.<a class="footnote-ref" id="fnref:3" href="#fn:3"><sup>3</sup></a>
</p>
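<p>The dice argument can be verified by exact enumeration of the 36 equally likely outcomes. In this hypothetical Python sketch the two events are taken to be "the first die shows 3" and "the second die is odd", and the third event is "the sum is even"; the helper names are invented for the example. Learning the first event leaves the probability of the second unchanged, but once the sum is known to be even it forces the second event to occur.</p>

```python
from fractions import Fraction
from itertools import product

# The 36 equally likely outcomes (first die, second die).
OUTCOMES = list(product(range(1, 7), repeat=2))


def pr(event, given=lambda o: True):
    """Exact Pr(event | given) by counting equally likely outcomes.
    Raises ZeroDivisionError if no outcome satisfies `given`."""
    kept = [o for o in OUTCOMES if given(o)]
    return Fraction(sum(1 for o in kept if event(o)), len(kept))


def first_is_3(o):
    return o[0] == 3


def second_is_odd(o):
    return o[1] % 2 == 1


def sum_is_even(o):
    return (o[0] + o[1]) % 2 == 0


def both(e1, e2):
    """Intersection of two events."""
    return lambda o: e1(o) and e2(o)


# Unconditionally, learning that the first die is 3 does not change the odds.
print(pr(second_is_odd), pr(second_is_odd, first_is_3))  # 1/2 1/2
# Given an even sum, it does: the second die must then be odd.
print(pr(second_is_odd, sum_is_even),
      pr(second_is_odd, both(first_is_3, sum_is_even)))  # 1/2 1
```

<p>Equivalently, given an even sum the product rule fails: \(\Pr(\text{first is 3},\ \text{second odd}\mid\text{even})=1/6\), whereas \(\Pr(\text{first is 3}\mid\text{even})\cdot\Pr(\text{second odd}\mid\text{even})=1/12\).</p>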
<h4>Height and vocabulary</h4>
<p>Height and vocabulary are dependent, since very short people tend to be children, who typically have more basic vocabularies. But given that two people are both 19 years old (i.e., conditional on age), there is no reason to think that one person's vocabulary is larger just because we are told that they are taller.
</p>
<h2 id="conditional-independence-of-random-variables">Conditional independence of random variables</h2>
<p>Two discrete <a href="/facts/Random_variable/TwTBXnLT">random variables</a> \(X\) and \(Y\) are conditionally independent given a third discrete random variable \(Z\) if and only if they are <a href="/facts/Independence_(probability_theory)/NUzQtnUL">independent</a> in their <a href="/facts/Conditional_probability_distribution/0eGm3P9W">conditional probability distribution</a> given \(Z\). That is, \(X\) and \(Y\) are conditionally independent given \(Z\) if and only if, given any value of \(Z\), the probability distribution of \(X\) is the same for all values of \(Y\) and the probability distribution of \(Y\) is the same for all values of \(X\). Formally:
</p>

<table><tbody><tr><td>\[ (X\perp\!\!\!\perp Y)\mid Z\quad\iff\quad F_{X,Y\,\mid\,Z\,=\,z}(x,y)=F_{X\,\mid\,Z\,=\,z}(x)\cdot F_{Y\,\mid\,Z\,=\,z}(y)\quad\text{for all }x,y,z \]</td> <td></td> <td>Eq.2</td></tr></tbody></table>

<p>where \(F_{X,Y\,\mid\,Z\,=\,z}(x,y)=\Pr(X\leq x,Y\leq y\mid Z=z)\) is the conditional <a href="/facts/Cumulative_distribution_function/WaKU8tp4">cumulative distribution function</a> of \(X\) and \(Y\) given \(Z\).
</p><p>Two events \(R\) and \(B\) are conditionally independent given a <a href="/facts/Sigma-algebra/8LJINLNc">σ-algebra</a> \(\Sigma\) if
</p>

\[ \Pr(R,B\mid\Sigma)=\Pr(R\mid\Sigma)\,\Pr(B\mid\Sigma)\ \text{ a.s.} \]

<p>where \(\Pr(A\mid\Sigma)\) denotes the <a href="/facts/Conditional_expectation/3GxG1Q3x">conditional expectation</a> of the <a href="/facts/Indicator_function/QEuh04NM">indicator function</a> of the event \(A\), \(\chi_A\), given the σ-algebra \(\Sigma\). That is,
</p>

\[ \Pr(A\mid\Sigma):=\operatorname{E}[\chi_A\mid\Sigma]. \]
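<p>On a finite sample space this abstract definition becomes concrete. The minimal Python sketch below assumes that \(\Sigma\) is generated by a finite partition of the sample space (for instance by the values of a discrete random variable). Under that assumption \(\Pr(A\mid\Sigma)\) is constant on each block of the partition, equal to the probability of \(A\) within that block, and "almost surely" reduces to "at every outcome of positive probability". The sample space (two fair dice, as in the dice example), the events and all function names are hypothetical choices for illustration.</p>

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite sample space: two fair dice, each outcome has probability 1/36.
OMEGA = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in OMEGA}


def pr_given_sigma(event, label):
    """Return Pr(event | Σ) as a function of the outcome, where Σ is generated by
    the partition into blocks {v : label(v) == k}. On the block containing w,
    E[χ_event | Σ] equals P(event ∩ block) / P(block)."""
    def value(w):
        block = [v for v in OMEGA if label(v) == label(w)]
        p_block = sum(P[v] for v in block)
        return sum((P[v] for v in block if event(v)), Fraction(0)) / p_block
    return value


def cond_indep_given_sigma(R, B, label):
    """Check Pr(R,B | Σ) = Pr(R | Σ) Pr(B | Σ) at every outcome with P(w) > 0."""
    pr_rb = pr_given_sigma(lambda w: R(w) and B(w), label)
    pr_r = pr_given_sigma(R, label)
    pr_b = pr_given_sigma(B, label)
    return all(pr_rb(w) == pr_r(w) * pr_b(w) for w in OMEGA if P[w] > 0)


def R(w):
    return w[0] == 3  # first die shows 3


def B(w):
    return w[1] % 2 == 1  # second die is odd


def trivial(w):
    return 0  # a single block, so Σ = {∅, Ω}


def sum_parity(w):
    return (w[0] + w[1]) % 2  # Σ generated by the parity of the sum


print(cond_indep_given_sigma(R, B, trivial))     # True
print(cond_indep_given_sigma(R, B, sum_parity))  # False
```

<p>With the trivial σ-algebra, \(\Pr(\cdot\mid\Sigma)\) is just the constant \(P(\cdot)\), so the check reduces to ordinary independence; with the σ-algebra generated by the parity of the sum, the factorization fails, in line with the dice example.</p>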

<p>Two random variables \(X\) and \(Y\) are conditionally independent given a σ-algebra \(\Sigma\) if the equation \(\Pr(R,B\mid\Sigma)=\Pr(R\mid\Sigma)\,\Pr(B\mid\Sigma)\) a.s. holds for all \(R\) in \(\sigma(X)\) and \(B\) in \(\sigma(Y)\).
</p><p>Two random variables \(X\) and \(Y\) are conditionally independent given a random variable \(W\) if they are independent given \(\sigma(W)\), the σ-algebra generated by \(W\). This is commonly written:
</p>

\[ X\perp\!\!\!\perp Y\mid W \]
or
\[ X\perp Y\mid W \]

<p>This is read "\(X\) is independent of \(Y\), given \(W\)"; the conditioning applies to the whole statement: "(\(X\) is independent of \(Y\)) given \(W\)".
</p>

\[ (X\perp\!\!\!\perp Y)\mid W \]

<p>This notation extends \(X\perp\!\!\!\perp Y\) for "\(X\) is <a href="/facts/Independence_(probability_theory)/NUzQtnUL">independent</a> of \(Y\)."
</p><p>If \(W\) takes values in a countable set, this is equivalent to the conditional independence of <i>X</i> and <i>Y</i> given each event of the form \([W=w]\) that has positive probability.
Conditional independence of more than two events, or of more than two random variables, is defined analogously.
</p><p>The following two examples show that \(X\perp\!\!\!\perp Y\) <i>neither implies nor is implied by</i> \((X\perp\!\!\!\perp Y)\mid W\).
</p><p>First, suppose \(W\) is 0 with probability 0.5 and 1 otherwise. When \(W=0\), take \(X\) and \(Y\) to be independent, each having the value 0 with probability 0.99 and the value 1 otherwise. When \(W=1\), \(X\) and \(Y\) are again independent, but this time they take the value 1 with probability 0.99. Then \((X\perp\!\!\!\perp Y)\mid W\). But \(X\) and \(Y\) are dependent, because Pr(<i>X</i> = 0) < Pr(<i>X</i> = 0 | <i>Y</i> = 0). This is because Pr(<i>X</i> = 0) = 0.5, but if <i>Y</i> = 0 then it is very likely that <i>W</i> = 0 and thus that <i>X</i> = 0 as well, so Pr(<i>X</i> = 0 | <i>Y</i> = 0) > 0.5. Indeed, Pr(<i>X</i> = 0, <i>Y</i> = 0) = 0.5 · 0.99² + 0.5 · 0.01² = 0.4901 and Pr(<i>Y</i> = 0) = 0.5, so Pr(<i>X</i> = 0 | <i>Y</i> = 0) = 0.9802.
</p><p>For the second example, suppose \(X\perp\!\!\!\perp Y\), each taking the values 0 and 1 with probability 0.5. Let \(W\) be the product \(X\cdot Y\). Then Pr(<i>X</i> = 0 | <i>W</i> = 0) = 2/3, because two of the three equally likely outcomes with <i>W</i> = 0 have <i>X</i> = 0; but Pr(<i>X</i> = 0 | <i>Y</i> = 0, <i>W</i> = 0) = 1/2, because once <i>Y</i> = 0 is known, <i>W</i> = 0 carries no further information about <i>X</i>. So \((X\perp\!\!\!\perp Y)\mid W\) is false.
This is also an example of explaining away. See Kevin Murphy's tutorial <a class="footnote-ref" id="fnref:4" href="#fn:4"><sup>4</sup></a>, where \(X\) and \(Y\) take the values "brainy" and "sporty".
</p>
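<p>Both examples can be reproduced with exact arithmetic. The Python sketch below is a hypothetical illustration, not taken from the cited sources: it stores each joint probability mass function as a dictionary keyed by \((x,y,w)\) and, because the variables are discrete, tests conditional independence by checking \(p(x,y\mid w)=p(x\mid w)\,p(y\mid w)\) for every \(w\) with \(p(w)>0\), which for discrete variables is equivalent to the CDF condition in Eq.2. All function names are invented for this example.</p>

```python
from fractions import Fraction as F
from itertools import product


def first_example():
    """Joint pmf of (X, Y, W): W is a fair coin; given W, X and Y are
    independent and each equals W with probability 0.99."""
    pmf = {}
    for w, x, y in product((0, 1), repeat=3):
        px = F(99, 100) if x == w else F(1, 100)
        py = F(99, 100) if y == w else F(1, 100)
        pmf[(x, y, w)] = F(1, 2) * px * py
    return pmf


def second_example():
    """Joint pmf of (X, Y, W): X and Y are independent fair bits and W = X * Y."""
    return {(x, y, x * y): F(1, 4) for x, y in product((0, 1), repeat=2)}


def prob(pmf, pred):
    """Total probability of the triples (x, y, w) satisfying pred."""
    return sum((p for (x, y, w), p in pmf.items() if pred(x, y, w)), F(0))


def cond(pmf, pred, given):
    """Pr(pred | given); assumes Pr(given) > 0."""
    return prob(pmf, lambda *t: pred(*t) and given(*t)) / prob(pmf, given)


def independent(pmf):
    """True iff p(x, y) = p(x) p(y) for all x, y."""
    xs = {x for x, _, _ in pmf}
    ys = {y for _, y, _ in pmf}
    return all(
        prob(pmf, lambda x, y, w: x == x0 and y == y0)
        == prob(pmf, lambda x, y, w: x == x0) * prob(pmf, lambda x, y, w: y == y0)
        for x0, y0 in product(xs, ys)
    )


def cond_independent(pmf):
    """True iff p(x, y | w) = p(x | w) p(y | w) for all x, y and all w with p(w) > 0."""
    xs = {x for x, _, _ in pmf}
    ys = {y for _, y, _ in pmf}
    ws = {w for _, _, w in pmf}
    for w0 in ws:
        def at_w0(x, y, w, w0=w0):
            return w == w0
        if prob(pmf, at_w0) == 0:
            continue  # conditioning on a null event is skipped ("almost surely")
        for x0, y0 in product(xs, ys):
            joint = cond(pmf, lambda x, y, w: x == x0 and y == y0, at_w0)
            px = cond(pmf, lambda x, y, w: x == x0, at_w0)
            py = cond(pmf, lambda x, y, w: y == y0, at_w0)
            if joint != px * py:
                return False
    return True


first, second = first_example(), second_example()
print(cond_independent(first), independent(first))    # True False
print(independent(second), cond_independent(second))  # True False
print(cond(first, lambda x, y, w: x == 0, lambda x, y, w: y == 0))  # 4901/5000
```

<p>The last line prints 4901/5000 = 0.9802, well above Pr(<i>X</i> = 0) = 0.5, as argued for the first example.</p>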
<h2 id="conditional-independence-of-random-vectors">Conditional independence of random vectors</h2>
<p>Two <a href="/facts/Random_vector/qMfooyVf">random vectors</a> \(\mathbf{X}=(X_1,\ldots,X_l)^{\mathrm{T}}\) and \(\mathbf{Y}=(Y_1,\ldots,Y_m)^{\mathrm{T}}\) are conditionally independent given a third random vector \(\mathbf{Z}=(Z_1,\ldots,Z_n)^{\mathrm{T}}\) if and only if they are independent in their conditional cumulative distribution given \(\mathbf{Z}\). Formally:
</p>

<table><tbody><tr><td>\[ (\mathbf{X}\perp\!\!\!\perp\mathbf{Y})\mid\mathbf{Z}\quad\iff\quad F_{\mathbf{X},\mathbf{Y}\,\mid\,\mathbf{Z}\,=\,\mathbf{z}}(\mathbf{x},\mathbf{y})=F_{\mathbf{X}\,\mid\,\mathbf{Z}\,=\,\mathbf{z}}(\mathbf{x})\cdot F_{\mathbf{Y}\,\mid\,\mathbf{Z}\,=\,\mathbf{z}}(\mathbf{y})\quad\text{for all }\mathbf{x},\mathbf{y},\mathbf{z} \]</td> <td></td> <td>Eq.3</td></tr></tbody></table>

<p>where \(\mathbf{x}=(x_1,\ldots,x_l)^{\mathrm{T}}\), \(\mathbf{y}=(y_1,\ldots,y_m)^{\mathrm{T}}\) and \(\mathbf{z}=(z_1,\ldots,z_n)^{\mathrm{T}}\), and the conditional cumulative distributions are defined as follows.
</p>

\[
\begin{aligned}
F_{\mathbf{X},\mathbf{Y}\,\mid\,\mathbf{Z}\,=\,\mathbf{z}}(\mathbf{x},\mathbf{y}) &= \Pr(X_1\leq x_1,\ldots,X_l\leq x_l,\,Y_1\leq y_1,\ldots,Y_m\leq y_m\mid Z_1=z_1,\ldots,Z_n=z_n)\\[6pt]
F_{\mathbf{X}\,\mid\,\mathbf{Z}\,=\,\mathbf{z}}(\mathbf{x}) &= \Pr(X_1\leq x_1,\ldots,X_l\leq x_l\mid Z_1=z_1,\ldots,Z_n=z_n)\\[6pt]
F_{\mathbf{Y}\,\mid\,\mathbf{Z}\,=\,\mathbf{z}}(\mathbf{y}) &= \Pr(Y_1\leq y_1,\ldots,Y_m\leq y_m\mid Z_1=z_1,\ldots,Z_n=z_n)
\end{aligned}
\]

<h2 id="uses-in-bayesian-inference">Uses in Bayesian inference</h2>
<p>Let <i>p</i> be the proportion of voters who will vote "yes" in an upcoming <a href="/facts/Referendum/DaF49jCL">referendum</a>. In taking an <a href="/facts/Opinion_poll/59PEtq3t">opinion poll</a>, one chooses <i>n</i> voters randomly from the population. For <i>i</i> = 1, ..., <i>n</i>, let <i>X</i><sub><i>i</i></sub> = 1 or 0 according to whether the <i>i</i>th chosen voter will or will not vote "yes".
</p><p>In a <a href="/facts/Frequency_probability/1f1TjSey">frequentist</a> approach to <a href="/facts/Statistical_inference/sJLjubxm">statistical inference</a>, one would not attribute any probability distribution to <i>p</i> (unless the probabilities could somehow be interpreted as relative frequencies of occurrence of some event or as proportions of some population), and one would say that <i>X</i><sub>1</sub>, ..., <i>X</i><sub><i>n</i></sub> are <a href="/facts/Statistical_independence/NUzQtnUL">independent</a> random variables.
</p><p>By contrast, in a <a href="/facts/Bayesian_inference/tEpK3zLx">Bayesian</a> approach to statistical inference, one would assign a <a href="/facts/Probability_distribution/EpsKKVRu">probability distribution</a> to <i>p</i> even in the absence of any such "frequency" interpretation, and one would construe the probabilities as degrees of belief that <i>p</i> lies in any interval to which a probability is assigned. In that model, the random variables <i>X</i><sub>1</sub>, ..., <i>X</i><sub><i>n</i></sub> are <i>not</i> independent, but they are conditionally independent given the value of <i>p</i>. In particular, if a large number of the <i>X</i>s are observed to be equal to 1, that would imply a high <a href="/facts/Conditional_probability/QcN2UERV">conditional probability</a>, given that observation, that <i>p</i> is near 1, and thus a high <a href="/facts/Conditional_probability/QcN2UERV">conditional probability</a>, given that observation, that the <i>next</i> <i>X</i> to be observed will be equal to 1.
</p>
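<p>A minimal numerical sketch of this point, assuming (purely for illustration; the text fixes no particular prior) a uniform prior on <i>p</i>, i.e. a Beta(1, 1) distribution. Given <i>p</i>, the <i>X</i><sub><i>i</i></sub> are independent Bernoulli(<i>p</i>) variables, so the probability that the first <i>k</i> answers are all "yes" is the prior moment E[<i>p</i><sup><i>k</i></sup>]. The function names are hypothetical.</p>

```python
from fractions import Fraction


def beta_moment(a, b, k):
    """E[p**k] for p ~ Beta(a, b) with positive integer a, b (exact arithmetic)."""
    result = Fraction(1)
    for j in range(k):
        result *= Fraction(a + j, a + b + j)
    return result


def prob_first_k_yes(k, a=1, b=1):
    # Conditionally on p the X_i are i.i.d. Bernoulli(p), so averaging p**k over
    # the prior gives P(X_1 = ... = X_k = 1).
    return beta_moment(a, b, k)


def prob_next_yes(yes, n, a=1, b=1):
    # After `yes` ones among n observations the posterior is Beta(a + yes, b + n - yes);
    # its mean is the conditional probability that the next X equals 1.
    return Fraction(a + yes, a + b + n)


p1 = prob_first_k_yes(1)     # P(X_1 = 1) = 1/2
p12 = prob_first_k_yes(2)    # P(X_1 = 1, X_2 = 1) = 1/3
print(p12 == p1 * p1)        # False: X_1 and X_2 are not (unconditionally) independent
print(prob_next_yes(9, 10))  # 5/6
```

<p>With this prior, the probability that a second voter says "yes" given that the first did is (1/3)/(1/2) = 2/3, which matches <code>prob_next_yes(1, 1)</code>; the more ones are observed, the closer the predictive probability moves toward 1.</p>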
<h2 id="rules-of-conditional-independence">Rules of conditional independence</h2>
<p>A set of rules governing statements of conditional independence has been derived from the basic definition.<a class="footnote-ref" id="fnref:5" href="#fn:5"><sup>5</sup></a><a class="footnote-ref" id="fnref:6" href="#fn:6"><sup>6</sup></a>
</p><p>These rules were termed "<a href="/facts/Graphoid/fdqjpV04">Graphoid</a> Axioms"
by Pearl and Paz,<a class="footnote-ref" id="fnref:7" href="#fn:7"><sup>7</sup></a> because they hold in graphs, where \(X \perp\!\!\!\perp A \mid B\) is interpreted to mean: "All paths from <i>X</i> to <i>A</i> are intercepted by the set <i>B</i>".<a class="footnote-ref" id="fnref:8" href="#fn:8"><sup>8</sup></a>
</p>
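<p>The graph reading of \(X \perp\!\!\!\perp A \mid B\) can be checked mechanically. The sketch below assumes an undirected graph stored as an adjacency dictionary (a hypothetical representation; directed graphs need the more involved d-separation criterion) and runs a breadth-first search from <i>X</i> that is not allowed to enter <i>B</i>. If the search never reaches <i>A</i>, every path from <i>X</i> to <i>A</i> is intercepted by <i>B</i>. The helper names are illustrative.</p>

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency dictionary from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def separated(adj, xs, targets, blockers):
    """True if every path from a node in xs to a node in targets passes through blockers."""
    xs, targets, blockers = set(xs), set(targets), set(blockers)
    frontier = deque(x for x in xs if x not in blockers)
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in targets:
            return False  # reached A without passing through B
        for nbr in adj.get(node, ()):
            if nbr not in seen and nbr not in blockers:
                seen.add(nbr)
                frontier.append(nbr)
    return True


# Hypothetical chain X - B - A - C
g = build_graph([("X", "B"), ("B", "A"), ("A", "C")])
print(separated(g, {"X"}, {"A"}, {"B"}))  # True: B intercepts the only path
print(separated(g, {"X"}, {"A"}, set()))  # False: nothing blocks X - B - A
print(separated(g, {"A"}, {"X"}, {"B"}))  # True, illustrating symmetry
```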
<h3>Symmetry</h3>

\[ X \perp\!\!\!\perp Y \quad\Rightarrow\quad Y \perp\!\!\!\perp X \]

<p>Proof:
</p><p>We must show that if \(P(X\mid Y)=P(X)\), then \(P(Y\mid X)=P(Y)\) (assuming \(P(X)>0\) and \(P(Y)>0\), so that both conditional probabilities are defined). If \(P(X\mid Y)=P(X)\), then \(P(X,Y)=P(X\mid Y)\,P(Y)=P(X)\,P(Y)\). Therefore \(P(Y\mid X)=P(X,Y)/P(X)=P(X)\,P(Y)/P(X)=P(Y)\), as required.
</p>
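<p>The argument can be checked numerically on a finite joint distribution. This sketch assumes discrete variables stored as a NumPy table <code>joint[x, y]</code> with strictly positive marginals (the tables are made up for illustration) and tests the hypothesis \(P(X\mid Y)=P(X)\) and the conclusion \(P(Y\mid X)=P(Y)\) separately. The two agree on both an independent and a dependent table.</p>

```python
import numpy as np


def symmetry_check(joint):
    """joint[x, y] = P(X = x, Y = y); both marginals are assumed strictly positive.

    Returns two booleans: whether P(X | Y) = P(X) and whether P(Y | X) = P(Y).
    """
    marg_x = joint.sum(axis=1)            # P(X = x)
    marg_y = joint.sum(axis=0)            # P(Y = y)
    x_given_y = joint / marg_y            # column y holds P(X = x | Y = y)
    y_given_x = joint / marg_x[:, None]   # row x holds P(Y = y | X = x)
    hypothesis = np.allclose(x_given_y, marg_x[:, None])
    conclusion = np.allclose(y_given_x, marg_y[None, :])
    return hypothesis, conclusion


independent = np.outer([0.2, 0.3, 0.5], [0.6, 0.4])   # P(X, Y) = P(X) P(Y)
dependent = np.array([[0.3, 0.1],
                      [0.1, 0.5]])
print(symmetry_check(independent))  # (True, True)
print(symmetry_check(dependent))    # (False, False)
```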
<h3>Decomposition</h3>

\[ X \perp\!\!\!\perp A,B \quad\Rightarrow\quad X \perp\!\!\!\perp A \ \text{ and }\ X \perp\!\!\!\perp B \]

<p>Proof
</p>
<ul><li>\(p_{X,A,B}(x,a,b)=p_X(x)\,p_{A,B}(a,b)\) (meaning of \(X \perp\!\!\!\perp A,B\))</li>
<li>\(\int_B p_{X,A,B}(x,a,b)\,db=\int_B p_X(x)\,p_{A,B}(a,b)\,db\) (ignore variable <i>B</i> by integrating it out)</li>
<li>\(p_{X,A}(x,a)=p_X(x)\,p_A(a)\) (since \(\int_B p_{A,B}(a,b)\,db=p_A(a)\))</li></ul>
<p>A similar proof shows the independence of <i>X</i> and <i>B</i>.
</p>
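<p>For discrete variables, the integral in the second step becomes a sum. The sketch below assumes small made-up probability tables and uses NumPy broadcasting to form \(p_X(x)\,p_{A,B}(a,b)\). It follows the three steps above and also checks the companion statement for <i>B</i>.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
p_x = rng.random(2)
p_x /= p_x.sum()                                # p_X(x), made-up values
p_ab = rng.random((3, 4))
p_ab /= p_ab.sum()                              # p_{A,B}(a, b), made-up values

# Step 1: X ⊥⊥ A,B means the joint factorizes as p_X(x) p_{A,B}(a, b).
joint = p_x[:, None, None] * p_ab[None, :, :]   # axes: x, a, b

# Step 2: sum B out (the discrete counterpart of integrating over b).
p_xa = joint.sum(axis=2)
p_a = p_ab.sum(axis=1)                          # summing p_{A,B}(a, b) over b gives p_A(a)

# Step 3: p_{X,A}(x, a) = p_X(x) p_A(a), i.e. X ⊥⊥ A.
decomposes_a = np.allclose(p_xa, np.outer(p_x, p_a))

# Summing A out instead gives X ⊥⊥ B.
decomposes_b = np.allclose(joint.sum(axis=1), np.outer(p_x, p_ab.sum(axis=0)))
print(decomposes_a, decomposes_b)  # True True
```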
<h3>Weak union</h3>

\[ X \perp\!\!\!\perp A,B \quad\Rightarrow\quad X \perp\!\!\!\perp A \mid B \ \text{ and }\ X \perp\!\!\!\perp B \mid A \]

<p>Proof
</p>
<ul><li>By assumption, \(\Pr(X)=\Pr(X\mid A,B)\).</li>
<li>By the decomposition property, \(X \perp\!\!\!\perp B\), so \(\Pr(X)=\Pr(X\mid B)\).</li>
<li>Combining the two equalities above gives \(\Pr(X\mid B)=\Pr(X\mid A,B)\), which establishes \(X \perp\!\!\!\perp A\mid B\).</li></ul>
<p>The second condition can be proved similarly.
</p>
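<p>Here is a numerical sketch of weak union under similar assumptions. The discrete table is made up, and \(X \perp\!\!\!\perp A,B\) holds by construction. Because \(P(X\mid A,B)\) does not depend on <i>A</i> or <i>B</i>, conditioning on either variable alone gives the same result.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
p_x = rng.random(2)
p_x /= p_x.sum()                                   # P(X = x), made-up values
p_ab = rng.random((3, 4))
p_ab /= p_ab.sum()                                 # P(A = a, B = b), made-up values
joint = p_x[:, None, None] * p_ab[None, :, :]      # axes: x, a, b; X ⊥⊥ A,B by construction

# P(X = x | A = a, B = b)
x_given_ab = joint / joint.sum(axis=0, keepdims=True)

# P(X = x | B = b): sum A out, then normalize over x
p_xb = joint.sum(axis=1)                           # axes: x, b
x_given_b = p_xb / p_xb.sum(axis=0, keepdims=True)

# P(X = x | A = a): sum B out, then normalize over x
p_xa = joint.sum(axis=2)                           # axes: x, a
x_given_a = p_xa / p_xa.sum(axis=0, keepdims=True)

holds_a_given_b = np.allclose(x_given_ab, x_given_b[:, None, :])   # X ⊥⊥ A | B
holds_b_given_a = np.allclose(x_given_ab, x_given_a[:, :, None])   # X ⊥⊥ B | A
print(holds_a_given_b, holds_b_given_a)  # True True
```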
<h3>Contraction</h3>

\[ X \perp\!\!\!\perp A \mid B \ \text{ and }\ X \perp\!\!\!\perp B \quad\Rightarrow\quad X \perp\!\!\!\perp A,B \]

<p>Proof
</p><p>This property follows from \(\Pr(X\mid A,B)=\Pr(X\mid B)=\Pr(X)\), where the first equality is given by \(X \perp\!\!\!\perp A\mid B\) and the second by \(X \perp\!\!\!\perp B\).
</p>
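<p>The two hypotheses of contraction can be built directly into a discrete distribution. The sketch below assumes a made-up factorization \(P(b)\,P(a\mid b)\,P(x\mid b)\) in which \(P(x\mid b)\) is the same for every <i>b</i>. The factorization gives \(X \perp\!\!\!\perp A\mid B\), the constant factor gives \(X \perp\!\!\!\perp B\), and the code tests both hypotheses and the conclusion \(X \perp\!\!\!\perp A,B\) from the table alone.</p>

```python
import numpy as np

rng = np.random.default_rng(2)
p_b = rng.random(4)
p_b /= p_b.sum()                                         # P(B = b)
p_a_given_b = rng.random((4, 3))
p_a_given_b /= p_a_given_b.sum(axis=1, keepdims=True)    # row b holds P(A = a | B = b)
p_x = rng.random(2)
p_x /= p_x.sum()                                         # P(X = x | B = b), identical for every b

# joint[x, a, b] = P(x | b) P(a | b) P(b)
joint = p_x[:, None, None] * p_a_given_b.T[None, :, :] * p_b[None, None, :]

# Hypothesis 1, X ⊥⊥ A | B: P(x | a, b) equals P(x | b)
p_xb = joint.sum(axis=1)
x_given_ab = joint / joint.sum(axis=0, keepdims=True)
x_given_b = p_xb / p_xb.sum(axis=0, keepdims=True)
hyp_a_given_b = np.allclose(x_given_ab, x_given_b[:, None, :])

# Hypothesis 2, X ⊥⊥ B: P(x, b) equals P(x) P(b)
hyp_b = np.allclose(p_xb, np.outer(p_xb.sum(axis=1), p_xb.sum(axis=0)))

# Conclusion, X ⊥⊥ A,B: P(x, a, b) equals P(x) P(a, b)
p_ab = joint.sum(axis=0)
conclusion = np.allclose(joint, joint.sum(axis=(1, 2))[:, None, None] * p_ab[None, :, :])
print(hyp_a_given_b, hyp_b, conclusion)  # True True True
```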
<h3>Intersection</h3>
<p>For strictly positive probability distributions,<a class="footnote-ref" id="fnref:9" href="#fn:9"><sup>9</sup></a> the following also holds:
</p>

\[ X \perp\!\!\!\perp Y \mid Z,W \ \text{ and }\ X \perp\!\!\!\perp W \mid Z,Y \quad\Rightarrow\quad X \perp\!\!\!\perp W,Y \mid Z \]

<p>Proof
</p><p>By assumption:
</p>

\[ P(X\mid Z,W,Y)=P(X\mid Z,W)\ \land\ P(X\mid Z,W,Y)=P(X\mid Z,Y) \implies P(X\mid Z,Y)=P(X\mid Z,W) \]

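<p>This equality holds for all values of <i>W</i> and <i>Y</i>, because strict positivity makes every conditional probability involved well defined. Using it, together with the <a href="/facts/Law_of_total_probability/rq6NTRn1">law of total probability</a> applied to \(P(X\mid Z)\):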
</p>

\[
\begin{aligned}
P(X\mid Z) &= \sum_{w\in W} P(X\mid Z,W=w)\,P(W=w\mid Z)\\[4pt]
&= \sum_{w\in W} P(X\mid Z,Y)\,P(W=w\mid Z)\\[4pt]
&= P(X\mid Z,Y)\sum_{w\in W} P(W=w\mid Z)\\[4pt]
&= P(X\mid Z,Y)
\end{aligned}
\]

<p>Since \(P(X\mid Z,W,Y)=P(X\mid Z,Y)\) and \(P(X\mid Z,Y)=P(X\mid Z)\), it follows that \(P(X\mid Z,W,Y)=P(X\mid Z)\), which is equivalent to \(X \perp\!\!\!\perp Y,W\mid Z\).
</p><p>Technical note: since these implications hold for any probability space, they still hold if one considers a sub-universe by conditioning everything on another variable, say <i>K</i>. For example, \(X \perp\!\!\!\perp Y \Rightarrow Y \perp\!\!\!\perp X\) would also mean that \(X \perp\!\!\!\perp Y\mid K \Rightarrow Y \perp\!\!\!\perp X\mid K\).
</p>
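<p>The positivity requirement in the intersection rule cannot simply be dropped. A simple counterexample takes <i>X</i> = <i>Y</i> = <i>W</i> to be copies of one fair coin, with <i>Z</i> constant so that it can be left out of the table. Either <i>Y</i> or <i>W</i> determines <i>X</i>, so both hypotheses hold on every conditioning event of positive probability. Even so, <i>X</i> is not independent of (<i>W</i>, <i>Y</i>). The sketch below, whose helper names are hypothetical, checks this on a NumPy table <code>joint[x, y, w]</code>.</p>

```python
import numpy as np


def indep_given_last(joint):
    """Check X ⊥⊥ Y | W for a table joint[x, y, w], using only w with P(W = w) > 0."""
    for w in range(joint.shape[2]):
        p_w = joint[:, :, w].sum()
        if p_w == 0:
            continue  # conditioning on a null event is undefined; skip it
        cond = joint[:, :, w] / p_w                    # P(X = x, Y = y | W = w)
        if not np.allclose(cond, np.outer(cond.sum(axis=1), cond.sum(axis=0))):
            return False
    return True


def indep_joint(joint):
    """Check X ⊥⊥ (Y, W) for a table joint[x, y, w]."""
    flat = joint.reshape(joint.shape[0], -1)            # columns index the pair (y, w)
    return np.allclose(flat, np.outer(flat.sum(axis=1), flat.sum(axis=0)))


# X = Y = W = one fair coin: not strictly positive (most cells are 0).
degenerate = np.zeros((2, 2, 2))
degenerate[0, 0, 0] = degenerate[1, 1, 1] = 0.5

print(indep_given_last(degenerate))                     # X ⊥⊥ Y | W  -> True
print(indep_given_last(degenerate.transpose(0, 2, 1)))  # X ⊥⊥ W | Y  -> True
print(indep_joint(degenerate))                          # X ⊥⊥ (Y, W) -> False

# Mixing in a little uniform noise makes every cell positive; the first
# hypothesis then fails, consistent with the rule for strictly positive distributions.
noisy = 0.9 * degenerate + 0.1 / 8
print(indep_given_last(noisy))                          # False
```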
<h2 id="see-also">See also</h2>
<ul><li><a href="/facts/Graphoid/fdqjpV04">Graphoid</a></li>
<li><a href="/facts/Conditional_dependence/NrFCMzeg">Conditional dependence</a></li>
<li><a href="/facts/De_Finetti%27s_theorem/dLpasxgn">de Finetti's theorem</a></li>
<li><a href="/facts/Conditional_expectation/3GxG1Q3x">Conditional expectation</a></li></ul>

<h2 id="references">References</h2>

<ol>
<li id="fn:1"><p>To see that this is the case, one needs to realise that Pr(R ∩ B | Y) is the probability of an overlap of R and B (the purple shaded area) in the Y area. Since, in the picture on the left, there are two squares where R and B overlap within the Y area, and the Y area has twelve squares, Pr(R ∩ B | Y) = 2/12 = 1/6. Similarly, Pr(R | Y) = 4/12 = 1/3 and Pr(B | Y) = 6/12 = 1/2. <a href="#fnref:1" class="footnote-back-ref">↩</a></p></li>
<li id="fn:2"><p>"Could someone explain conditional independence?". Mathematics Stack Exchange. <a href="https://math.stackexchange.com/q/23093" target="_blank">https://math.stackexchange.com/q/23093</a> <a href="#fnref:2" class="footnote-back-ref">↩</a></p></li>
<li id="fn:3"><p>"Could someone explain conditional independence?". Mathematics Stack Exchange. <a href="https://math.stackexchange.com/q/23093" target="_blank">https://math.stackexchange.com/q/23093</a> <a href="#fnref:3" class="footnote-back-ref">↩</a></p></li>
<li id="fn:4"><p>"Graphical Models". <a href="http://people.cs.ubc.ca/~murphyk/Bayes/bnintro.html" target="_blank">http://people.cs.ubc.ca/~murphyk/Bayes/bnintro.html</a> <a href="#fnref:4" class="footnote-back-ref">↩</a></p></li>
<li id="fn:5"><p>Dawid, A. P. (1979). "Conditional Independence in Statistical Theory". Journal of the Royal Statistical Society, Series B. 41 (1): 1–31. JSTOR 2984718. MR 0535541. <a href="#fnref:5" class="footnote-back-ref">↩</a></p></li>
<li id="fn:6"><p>Pearl, Judea (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press. <a href="#fnref:6" class="footnote-back-ref">↩</a></p></li>
<li id="fn:7"><p>Pearl, Judea; Paz, Azaria (1986). "Graphoids: Graph-Based Logic for Reasoning about Relevance Relations or When would x tell you more about y if you already know z?". In du Boulay, Benedict; Hogg, David C.; Steels, Luc (eds.). Advances in Artificial Intelligence II, Seventh European Conference on Artificial Intelligence, ECAI 1986, Brighton, UK, July 20–25, 1986, Proceedings. North-Holland. pp. 357–363. <a href="#fnref:7" class="footnote-back-ref">↩</a></p></li>
<li id="fn:8"><p>Pearl, Judea (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann. ISBN 9780934613736. <a href="#fnref:8" class="footnote-back-ref">↩</a></p></li>
<li id="fn:9"><p>Pearl, Judea (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press. <a href="#fnref:9" class="footnote-back-ref">↩</a></p></li>
</ol>
