Lecture 6: Introduction to Belief Networks

This chapter is the real deal. So I decided to break the video down into multiple parts, hopefully making your learning a little bit easier.

However, if you prefer to watch it in one shot, here it is:

Part 1: Motivation (read 3.3 up to 3.3.1)

Slide 1-2:  Don't worry too much if you do not fully grasp what I am saying.

The point I am trying to make is that the specification of conditional probabilities often matches how we interpret various prediction and classification problems.
That's why the belief network (BN) is one of my (and many others') favorites.

Also, in Remark 3.3 of the text, the author mentions the "Markov Blanket" of a node X: the smallest set of nodes that, once observed, makes X conditionally independent of the rest of the graph. It's a bit early to introduce this, but you should be able to find the Markov Blanket for any node by the end of the lecture.
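If you want to check your answers later, here is a minimal sketch. It relies on the standard result that, in a BN, the Markov Blanket of X consists of its parents, its children, and its children's other parents. The function name, the dictionary representation, and the 5-node graph are my own assumptions for illustration, not something taken from the text.

```python
def markov_blanket(parents, node):
    """Markov blanket of `node` in a DAG.

    `parents` maps every node to the set of its parents
    (a hypothetical representation chosen for this sketch).
    """
    children = {c for c, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]}
    return (set(parents[node]) | children | co_parents) - {node}


# Made-up example graph: A -> C <- B, C -> D <- E
dag = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"C", "E"},
    "E": set(),
}

print(sorted(markov_blanket(dag, "C")))  # ['A', 'B', 'D', 'E']
print(sorted(markov_blanket(dag, "A")))  # ['B', 'C']
```

Note that B shows up in the blanket of A only because the two share the child C, which is exactly the collider effect discussed in Part 2.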

Part 2: Collider (read 3.3.1-3.3.2)

Slide 3-5: Our goal here is to deduce the conditional independence statements implied by a given BN. The key is the analogy between connectivity in the graph and dependence between the variables. Except at a collider node, evidence blocks dependence.

At a collider node, evidence introduces dependence. In fact, evidence on any descendant of a collider node will also allow dependence to flow through (see the third example with the 4-node graph on page 42).

The mathematical reasons are provided in the slides, but you should also explain to yourself why this happens by coming up with some examples in words.
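A numeric example can sit alongside the word examples. The sketch below builds a tiny BN, A -> C <- B with C -> D, and computes conditional probabilities by brute-force enumeration. All the probability values and names are made up for illustration (think of A and B as two independent causes of an alarm C, and D as someone reporting the alarm).

```python
from itertools import product

# Hypothetical numbers, chosen only for illustration.
P_A = 0.1  # p(A=1)
P_B = 0.2  # p(B=1)


def joint(a, b, c, d):
    """p(a, b, c, d) = p(a) p(b) p(c | a, b) p(d | c)."""
    pa = P_A if a else 1 - P_A
    pb = P_B if b else 1 - P_B
    pc1 = 0.95 if (a or b) else 0.01  # p(C=1 | a, b)
    pc = pc1 if c else 1 - pc1
    pd1 = 0.9 if c else 0.05          # p(D=1 | c)
    pd = pd1 if d else 1 - pd1
    return pa * pb * pc * pd


def prob(query, evidence=None):
    """p(query | evidence) by summing the joint over all 16 states."""
    evidence = evidence or {}
    num = den = 0.0
    for a, b, c, d in product((0, 1), repeat=4):
        world = {"A": a, "B": b, "C": c, "D": d}
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(a, b, c, d)
        den += p
        if all(world[k] == v for k, v in query.items()):
            num += p
    return num / den


# No evidence: learning B tells us nothing about A.
print(prob({"A": 1}), prob({"A": 1}, {"B": 1}))
# Evidence on the collider C: now B "explains away" A.
print(prob({"A": 1}, {"C": 1}), prob({"A": 1}, {"C": 1, "B": 1}))
# Evidence on the descendant D has the same kind of effect.
print(prob({"A": 1}, {"D": 1}), prob({"A": 1}, {"D": 1, "B": 1}))
```

In the first line the two numbers agree, while in the second and third lines they differ: once C (or its descendant D) is observed, A and B become dependent.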

Part 3: d-separation (read 3.3.3-3.3.5)

Slide 6-7:  For a more complicated BN, you can still check conditional independence of two nodes by making sure that all paths between them are "blocked" by the evidence (the conditioning set).

Again, collider nodes are more problematic, so you need to watch out for them. If you want a full algorithm for this, check out the "Bayes Ball" algorithm in Remark 3.5.

An alternative term, d-separation, is introduced to describe two sets of nodes being "blocked" by evidence. Why do we need a new name? d-separation is a graph-theoretic term specific to BNs, while conditional independence has a much broader meaning in probability.
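For those who like to experiment, here is a rough sketch of an automated check. To be clear, this is not Bayes Ball: it uses an equivalent test based on the moralized ancestral graph (keep only the nodes in question and their ancestors, "marry" the parents of every node, drop the arrow directions, then see whether the evidence separates the two sets). The function names and the graph representation are my own assumptions.

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents[n])
    return seen


def d_separated(parents, xs, ys, zs):
    """True if every path between xs and ys is blocked given zs.

    `parents` maps each node to the set of its parents.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)

    # Build the undirected moral graph on the ancestral set.
    nbrs = {n: set() for n in keep}
    for child in keep:
        for p in parents[child]:
            nbrs[child].add(p)
            nbrs[p].add(child)
        for p, q in combinations(parents[child], 2):
            nbrs[p].add(q)  # marry the parents of a common child
            nbrs[q].add(p)

    # Search outward from xs without stepping onto evidence nodes.
    seen, stack = set(), [x for x in xs if x not in zs]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        stack.extend(m for m in nbrs[n] if m not in zs and m not in seen)
    return not (seen & ys)


# Made-up example: A -> C <- B, C -> D
dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}

print(d_separated(dag, {"A"}, {"B"}, set()))   # True
print(d_separated(dag, {"A"}, {"B"}, {"C"}))   # False
print(d_separated(dag, {"A"}, {"B"}, {"D"}))   # False
print(d_separated(dag, {"A"}, {"D"}, {"C"}))   # True
```

The second and third results are the collider rule from Part 2, and the last one is the usual case where evidence in the middle of a chain blocks the path.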

Part 4: Markov Equivalence and Limited Expressibility (read 3.3.6-3.3.7)

Slide 8-9:  Given a set of conditional independence statements, it turns out that there can be more than one BN that represents them. Such BNs are called Markov equivalent.

In practice, we may prefer one over the other because it might be easier to get data for it. For example, we might prefer a generative model over a discriminative one if it is easier to describe the conditional probability of generating the data given a particular class.

BNs have limitations in the sense that not every set of conditional independence statements can be represented. That's why we will study other types of graphical models in the next chapter.
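A standard way to test this is to compare two things: the skeleton (the graph with arrow directions removed) and the set of immoralities (colliders whose two parents are not directly linked). Two BNs are Markov equivalent exactly when both match. The sketch below is one possible way to code that up; the function names and the toy graphs are assumptions of mine.

```python
from itertools import combinations


def skeleton(parents):
    """Undirected edges of the DAG, as a set of frozensets."""
    return {frozenset((p, c)) for c, ps in parents.items() for p in ps}


def immoralities(parents):
    """Colliders p -> c <- q where p and q are not directly linked."""
    skel = skeleton(parents)
    return {
        (frozenset((p, q)), c)
        for c, ps in parents.items()
        for p, q in combinations(sorted(ps), 2)
        if frozenset((p, q)) not in skel
    }


def markov_equivalent(g1, g2):
    """Same skeleton and same immoralities."""
    return skeleton(g1) == skeleton(g2) and immoralities(g1) == immoralities(g2)


# Each graph maps a node to the set of its parents.
chain = {"A": set(), "B": {"A"}, "C": {"B"}}          # A -> B -> C
fork = {"B": set(), "A": {"B"}, "C": {"B"}}           # A <- B -> C
collider = {"A": set(), "C": set(), "B": {"A", "C"}}  # A -> B <- C

print(markov_equivalent(chain, fork))      # True
print(markov_equivalent(chain, collider))  # False
```

The chain and the fork encode the same single statement (A and C are independent given B), while the collider encodes a different one (A and C are marginally independent), even though all three share the same skeleton.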

Part 5: Causality (read 3.4)

Slide 10-12:  This is a fascinating topic, though I will not hold you responsible for it. The key is that arrows in a BN DO NOT MEAN causation.

Causality is a much deeper concept that must be handled with care; otherwise paradoxes may arise.

The example discussed in the video should be fairly obvious but the general theory of causality is beyond the scope of this course.