(Chapter 9.5, 12)
Straightforward ML does not work
Structure Learning is a Model Selection problem
Table learning is a Parameter Estimation problem
Model is defined by complexity:
More complex models have more parameters.
For structure learning, more complex models have more edges (or fewer conditional independence statements).
Many other model selection problems:
How many classes in a clustering/classification problem?
Independent versus dependent?
Simple Example
Ground Truth
Binary random variables
Only B and C are dependent
Data generated are slightly corrupted by measurement noise
Many measurements are collected
[Figure: ground-truth network over the nodes A, B, C, D, E]
First Model: Too simple
Models, by complexity (= # of parameters):

Complexity   Data Likelihood P(D|θ,M)   Model Likelihood P(D|M)
5            ~0
6            ~0.9
8            ~0.99
9            ~0.999
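As an illustration of "complexity = # of parameters", the sketch below counts the free parameters of a network over binary variables. The four structures are hypothetical (the slide graphs are not recoverable from the text); they are chosen only because they reproduce the complexities 5, 6, 8 and 9 listed above. The function name num_parameters is likewise an assumption.

```python
def num_parameters(parents, cardinality=2):
    """Count the free parameters of a discrete Bayesian network.

    parents maps each node to the list of its parents. Every variable is
    assumed to take `cardinality` values (binary by default).
    """
    total = 0
    for node, pa in parents.items():
        # One table row per joint parent configuration, and
        # (cardinality - 1) free probabilities per row.
        total += (cardinality - 1) * cardinality ** len(pa)
    return total


# Hypothetical structures over the five binary variables A..E.
no_edges = {"A": [], "B": [], "C": [], "D": [], "E": []}
one_edge = {"A": [], "B": [], "C": ["B"], "D": [], "E": []}
two_parents = {"A": [], "B": [], "C": ["A", "B"], "D": [], "E": []}
extra_edge = {"A": [], "B": [], "C": ["A", "B"], "D": [], "E": ["D"]}

for name, graph in [("no edges", no_edges), ("B -> C", one_edge),
                    ("A, B -> C", two_parents), ("plus D -> E", extra_edge)]:
    print(name, num_parameters(graph))
```

Each added edge doubles the number of table rows of the child node, which is why complexity grows with the number of edges.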
Second Model: Perfect!
Third Model: Fit the noise too!
Fourth Model: Too much overfitting!
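A minimal sketch of why the maximized data likelihood cannot be used to pick a structure: on simulated data, the maximized log-likelihood never decreases as edges are added, so it always favors the most complex model. The generating process (C copies B 80% of the time, 5% measurement noise), the sample size, and the four nested structures are all assumptions made for illustration, not the values behind the slides.

```python
import math
import random
from collections import Counter


def max_log_likelihood(data, parents):
    """Maximized log-likelihood of binary data under a given structure.

    data is a list of dicts {variable: 0 or 1}; parents maps each node to
    its parent list. The ML tables are the empirical conditional frequencies.
    """
    ll = 0.0
    for node, pa in parents.items():
        joint = Counter((tuple(row[p] for p in pa), row[node]) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        for (cfg, _), n in joint.items():
            ll += n * math.log(n / marg[cfg])
    return ll


def sample(rng, noise=0.05):
    """Assumed ground truth: only B and C are dependent; readings are noisy."""
    b = rng.random() < 0.5
    c = b if rng.random() < 0.8 else not b
    row = {"A": rng.random() < 0.5, "B": b, "C": c,
           "D": rng.random() < 0.5, "E": rng.random() < 0.5}
    # Flip each reading with probability `noise` (measurement noise).
    return {k: int(v) ^ int(rng.random() < noise) for k, v in row.items()}


rng = random.Random(0)
data = [sample(rng) for _ in range(2000)]

# Hypothetical nested structures, from too simple to overfitted.
structures = [
    {"A": [], "B": [], "C": [], "D": [], "E": []},
    {"A": [], "B": [], "C": ["B"], "D": [], "E": []},
    {"A": [], "B": [], "C": ["A", "B"], "D": [], "E": []},
    {"A": [], "B": [], "C": ["A", "B"], "D": [], "E": ["D"]},
]
lls = [max_log_likelihood(data, g) for g in structures]
for g, ll in zip(structures, lls):
    print(sum(len(pa) for pa in g.values()), "edges:", round(ll, 2))
```

The jump from zero edges to the B -> C edge is large because it captures a real dependency; the later, smaller gains come only from fitting noise, yet maximum likelihood still rewards them.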
(Jump to Chapter 12)
Use uniform prior P(θ|M)
Models, by complexity (= # of parameters):

Complexity   Data Likelihood P(D|θ,M)   Model Likelihood P(D|M)
5            ~0                         0
6            ~0.9                       ~0.9/N^6
8            ~0.99                      ~O(1)·0.99/N^8
9            ~0.999                     ~O(1)·0.999/N^9
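The model likelihood is the data likelihood averaged over the prior, P(D|M) = ∫ P(D|θ,M) P(θ|M) dθ. For a binary table entry with a uniform prior on θ this integral has the closed form n0! n1! / (n0 + n1 + 1)!. The sketch below uses that formula to compare an "X, Y independent" model with an "X -> Y" model; the function names and the two count tables are hypothetical, and the example illustrates the penalty on extra parameters rather than the exact figures in the table above.

```python
from math import lgamma


def log_marginal(n0, n1):
    """log of the integral of theta^n1 * (1 - theta)^n0 over a uniform prior,
    which equals n0! * n1! / (n0 + n1 + 1)!."""
    return lgamma(n0 + 1) + lgamma(n1 + 1) - lgamma(n0 + n1 + 2)


def score_independent(n):
    """log P(D|M) for the model with no edge; n[x][y] are joint counts."""
    x0, x1 = n[0][0] + n[0][1], n[1][0] + n[1][1]
    y0, y1 = n[0][0] + n[1][0], n[0][1] + n[1][1]
    return log_marginal(x0, x1) + log_marginal(y0, y1)


def score_dependent(n):
    """log P(D|M) for X -> Y: one table for X, one for Y per value of X."""
    x0, x1 = n[0][0] + n[0][1], n[1][0] + n[1][1]
    return (log_marginal(x0, x1)
            + log_marginal(n[0][0], n[0][1])
            + log_marginal(n[1][0], n[1][1]))


# Hypothetical count tables n[x][y], 100 observations each.
balanced = [[25, 25], [25, 25]]    # X and Y look independent
correlated = [[45, 5], [5, 45]]    # Y mostly copies X

for name, n in [("balanced", balanced), ("correlated", correlated)]:
    print(name, round(score_independent(n), 3), round(score_dependent(n), 3))
```

On the balanced counts the simpler model scores higher, even though the edge model fits at least as well at its best θ; on the correlated counts the edge model wins. Averaging over θ, rather than maximizing, is what supplies the complexity penalty.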
[Figure: network over V, X, Y versus network over X, Y]
Total Pseudo Counts = 10
U(Y=0) = 5;   U(Y=1) = 5
U(X=0) = 5;   U(X=1) = 5
U(Z=0;X=0) = 2.5
U(Z=0;X=1) = 2.5
[Figure: network over the nodes X, Y, Z]
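The pseudo counts above are consistent with spreading the total of 10 uniformly over the joint configurations of the variables involved. A minimal sketch under that assumption (the function name pseudo_count is hypothetical):

```python
def pseudo_count(total, cardinalities):
    """Pseudo count for one joint configuration of the listed variables,
    assuming the prior spreads `total` uniformly over all configurations."""
    cells = 1
    for k in cardinalities:
        cells *= k
    return total / cells


print(pseudo_count(10, [2]))     # U(Y=0), one binary variable
print(pseudo_count(10, [2, 2]))  # U(Z=0;X=0), two binary variables
```

Deriving every table's pseudo counts from one total keeps the prior consistent across structures: the counts for any variable sum to the same total no matter which parents it has.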
Should be p(x), but since the average involves only x_i and x_pa(i), it is sufficient to use just the marginal.
Even though θ is removed, the search space is now characterized by the unknown pa(*) relationship.