This chapter is the real deal. So I decided to break the video
down into multiple parts, hopefully making your learning a little bit
easier.

However, if you prefer to watch it in one shot, here it is:

## Part 1: Motivation (read 3.3 up to 3.3.1)

Slides 1-2: Don't worry too much if you do not fully grasp what I am
saying.

The point I am trying to make is that the specification of
conditional probability often matches how we interpret various
prediction and classification problems.

That's why belief networks (BNs) are one of my (and
many others') favorites.

Also, in Remark 3.3 of the text, the author mentions the "Markov Blanket"
of a node X: the smallest set of evidence nodes that makes X
conditionally independent of the rest of the graph. It's a bit early
to introduce this, but you should be able to find the Markov Blanket of
any node by the end of the lecture.
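
If it helps to see the idea concretely, here is a minimal sketch of finding a Markov Blanket. It assumes the standard characterization for a BN (the node's parents, its children, and the other parents of its children). The function name `markov_blanket`, the dictionary layout, and the example graph are my own illustration, not something from the text.

```python
def markov_blanket(parents, node):
    """Return the Markov Blanket of `node`.

    `parents` maps every node to the set of its parents (assumed layout).
    """
    # Children are the nodes that list `node` among their parents.
    children = {c for c, ps in parents.items() if node in ps}
    # Co-parents are the other parents of those children.
    co_parents = {p for c in children for p in parents[c]}
    return (set(parents[node]) | children | co_parents) - {node}


# Hypothetical example graph: A -> C <- B, C -> D <- E
parents = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"C", "E"},
    "E": set(),
}

blanket_of_c = markov_blanket(parents, "C")  # parents A, B; child D; co-parent E
blanket_of_a = markov_blanket(parents, "A")  # child C; co-parent B
```

Notice that B lands in the blanket of A only because they share the child C, which is a preview of the collider behavior discussed in Part 2.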

## Part 2: Collider (read 3.3.1-3.3.2)

Slides 3-5: Our goal here is to deduce conditional independence for a
given BN. The key is the analogy between connectivity and conditional
independence. Except at a collider node, evidence blocks dependence.

At a collider node, evidence introduces dependence. In fact, evidence
on any descendant of a collider node will also allow dependence to
flow through (see the third example with the 4-node graph on page 42).

The mathematical
reasons are provided in the slides, but you should explain to yourself
why by coming up with some word examples.
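
If you prefer numbers to words, here is a small sketch of the collider effect. The setup is my own toy example, not one from the slides: A and B are independent fair coins, and the collider C is 1 whenever A or B is 1. Everything is computed by brute-force enumeration with exact fractions.

```python
from fractions import Fraction
from itertools import product


def joint():
    """Yield ((a, b, c), probability) for fair coins a, b and c = a OR b."""
    for a, b in product([0, 1], repeat=2):
        yield (a, b, int(a or b)), Fraction(1, 4)


def prob_a1(given):
    """Return p(A=1 | given); `given` maps 'b' and/or 'c' to observed values."""
    num = den = Fraction(0)
    for (a, b, c), p in joint():
        state = {"a": a, "b": b, "c": c}
        if all(state[k] == v for k, v in given.items()):
            den += p
            if a == 1:
                num += p
    return num / den


p_a = prob_a1({})                       # 1/2
p_a_given_b = prob_a1({"b": 1})         # 1/2: B alone tells us nothing about A
p_a_given_c = prob_a1({"c": 1})         # 2/3
p_a_given_bc = prob_a1({"b": 1, "c": 1})  # 1/2: B now changes our belief in A
```

Without evidence on C, learning B leaves the probability of A unchanged. Once C is observed, learning B moves it from 2/3 back to 1/2, so A and B have become dependent.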

## Part 3: d-separation (read 3.3.3-3.3.5)

Slides 6-7: For a more complicated BN, you can
still check conditional independence of two nodes by making sure that
all paths between them are "blocked" by evidence (the conditioning set).

Again, collider nodes are more problematic, so you need to watch out for
them. If you want a full algorithm for this, check out the
"Bayes Ball" algorithm in Remark 3.5.

An alternative term, d-separation, is introduced to describe two sets
of nodes being "blocked" by evidence.
Why do we need a new name? d-separation is a graph-theoretic term
specific to BNs, while conditional independence has a much broader
meaning in probability.
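
To make the path-blocking rule concrete, here is a rough sketch that checks d-separation between two single nodes by enumerating every path between them. This is not the Bayes Ball algorithm, and it is only practical for small graphs since the number of paths can grow very quickly. The function names and the edge-list format (a list of `(parent, child)` pairs) are assumptions of mine, and the sketch assumes both query nodes appear in some edge and are not themselves evidence.

```python
def descendants(children, node):
    """All nodes reachable from `node` by following arrows."""
    seen, stack = set(), [node]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def simple_paths(neighbors, start, goal, path=None):
    """Yield every simple path from start to goal, ignoring arrow direction."""
    path = [start] if path is None else path
    if start == goal:
        yield path
        return
    for nxt in neighbors[start]:
        if nxt not in path:
            yield from simple_paths(neighbors, nxt, goal, path + [nxt])


def d_separated(edges, x, y, evidence):
    """True if every path between x and y is blocked by `evidence`."""
    nodes = {n for edge in edges for n in edge}
    children = {n: set() for n in nodes}
    parents = {n: set() for n in nodes}
    for u, v in edges:
        children[u].add(v)
        parents[v].add(u)
    neighbors = {n: children[n] | parents[n] for n in nodes}
    evidence = set(evidence)

    def blocked(path):
        # Look at each interior node together with its two path neighbors.
        for prev, mid, nxt in zip(path, path[1:], path[2:]):
            is_collider = prev in parents[mid] and nxt in parents[mid]
            if is_collider:
                # A collider blocks unless it or a descendant is observed.
                if not ({mid} | descendants(children, mid)) & evidence:
                    return True
            elif mid in evidence:
                # A non-collider blocks exactly when it is observed.
                return True
        return False

    return all(blocked(p) for p in simple_paths(neighbors, x, y))


# Hypothetical 4-node graph: A -> C <- B, C -> D
edges = [("A", "C"), ("B", "C"), ("C", "D")]
```

With this graph, `d_separated(edges, "A", "B", [])` is `True`, while adding either `"C"` or its descendant `"D"` as evidence makes it `False`.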

## Part 4: Markov Equivalence and Limited Expressibility (read 3.3.6-3.3.7)

Slides 8-9: Given a set of conditional independence
statements, it turns out that more than one BN can
represent them. Such BNs are called Markov equivalent.
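
One way to test this on small graphs is the standard graphical criterion: two BNs over the same nodes are Markov equivalent when they have the same skeleton (the edges with directions dropped) and the same immoralities (colliders whose two parents are not directly connected). The sketch below is a minimal illustration under that criterion. The helper names and the edge-list format are my own, and it assumes every node appears in at least one edge.

```python
from itertools import combinations


def skeleton(edges):
    """Edges with direction dropped."""
    return {frozenset(edge) for edge in edges}


def immoralities(edges):
    """Pairs of unconnected parents that share a child, with that child."""
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    return {
        (frozenset((a, b)), child)
        for child, ps in parents.items()
        for a, b in combinations(sorted(ps), 2)
        if frozenset((a, b)) not in skel
    }


def markov_equivalent(edges1, edges2):
    """Same skeleton and same immoralities."""
    return (skeleton(edges1) == skeleton(edges2)
            and immoralities(edges1) == immoralities(edges2))


# Hypothetical 3-node examples, each given as (parent, child) pairs
chain = [("A", "B"), ("B", "C")]            # A -> B -> C
reversed_chain = [("C", "B"), ("B", "A")]   # A <- B <- C
fork = [("B", "A"), ("B", "C")]             # A <- B -> C
collider = [("A", "B"), ("C", "B")]         # A -> B <- C
```

The chain, the reversed chain, and the fork all come out equivalent to each other, while the collider stands alone, which matches what Part 2 says about colliders behaving differently.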

In practice, we
may prefer one over the other because it might be easier to get data
for it. For example, we might want to use a generative model over a
discriminative one if it is easier to describe the conditional
probability of generating the data given a particular class.

BNs have limitations in the
sense that not every set of conditional independence statements can be
represented. That's why we will study other types of graphical models
in the next chapter.

## Part 5: Causality (read 3.4)

Slides 10-12: This is a fascinating topic, though
I will not hold you responsible for it. The key is that arrows in a BN DO NOT MEAN causation.

Causality is a much deeper concept that must be handled with care;
otherwise paradoxes may arise.

The example discussed in the video should
be fairly obvious but the general theory of causality is beyond the
scope of this course.