Ch.8: Statistics for Machine Learning
Sen-ching Samson Cheung
Representing Data
Categorical or Nominal
Classes with no ordering
e.g. dessert = {ice-cream, pudding, cake, fruit}
Represented by 1-of-m encoding
Ice-cream = (0,0,0,1), pudding = (0,0,1,0), …
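A minimal sketch of 1-of-m encoding for the dessert example. The helper name `one_of_m` is hypothetical, and the bit order is an assumption chosen so that the first class maps to (0,0,0,1), matching the codes listed above.

```python
def one_of_m(value, classes):
    """Return the 1-of-m code of `value` as a tuple of 0s and 1s."""
    index = classes.index(value)  # raises ValueError for an unknown class
    m = len(classes)
    # Assumption: the first class gets its 1 in the last position,
    # so that ice-cream -> (0,0,0,1) and pudding -> (0,0,1,0).
    return tuple(1 if pos == m - 1 - index else 0 for pos in range(m))


desserts = ["ice-cream", "pudding", "cake", "fruit"]
codes = {d: one_of_m(d, desserts) for d in desserts}
```

Every code contains exactly one 1, so no ordering between classes is implied.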
Ordinal
Classes with ordering
e.g. cold=-1, cool=0, warm=+1, hot=+2
Numerical
Real numbers
Continuous Distributions
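Probability Density Function p(x): $p(x) \ge 0$, $\int p(x)\,dx = 1$, and $p(a \le x \le b) = \int_a^b p(x)\,dx$
Expectation: $\langle f(x) \rangle_{p(x)} = \int f(x)\,p(x)\,dx$
Inference on continuous distributions relies on integration (versus summation for discrete distributions).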
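To make the integration-versus-summation point concrete, here is a small sketch that approximates a normalisation integral and an expectation with the midpoint rule. The density p(x) = 2x on [0,1] and the helper name `integrate` are illustrative assumptions, not taken from the slides.

```python
def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def pdf(x):
    # Assumed example density on [0, 1]; it integrates to 1.
    return 2.0 * x


total = integrate(pdf, 0.0, 1.0)                  # normalisation, close to 1
mean = integrate(lambda x: x * pdf(x), 0.0, 1.0)  # <x>, close to 2/3
```

For a discrete variable the same quantities would be plain sums over the states.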
Other important concepts
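Concept: Definition
K-th moment: $\langle x^k \rangle = \int x^k\,p(x)\,dx$
Cumulative Distribution Function: $\mathrm{cdf}(y) = p(x \le y) = \int_{-\infty}^{y} p(x)\,dx$
Moment Generating Function: $g(t) = \langle e^{tx} \rangle_{p(x)}$, with $\langle x^k \rangle = \frac{d^k g}{dt^k}\big|_{t=0}$
Mode: $x_* = \arg\max_x p(x)$
Covariance Matrix: $\Sigma = \langle (x-\mu)(x-\mu)^T \rangle$, where $\mu = \langle x \rangle$
Correlation Matrix: $\rho_{ij} = \Sigma_{ij} / (\sigma_i \sigma_j)$, where $\sigma_i^2 = \Sigma_{ii}$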
Change of variables
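Given p(x) and a bijective transformation y = f(x):
$p(y) = p\big(x = f^{-1}(y)\big)\,\left|\det J\right|^{-1}$
where $J$ is the Jacobian of $f$, $J_{ij} = \partial f_i(x) / \partial x_j$, evaluated at $x = f^{-1}(y)$.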
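A worked scalar example may help here. It assumes x is uniform on [0,1] and takes y = f(x) = x^2, which is bijective on that interval; both choices are illustrative and not from the slides. In one dimension the Jacobian factor reduces to the absolute derivative of the inverse map.

```latex
% Assumed example: p_x(x) = 1 on [0,1], y = f(x) = x^2, so f^{-1}(y) = \sqrt{y}
\begin{align*}
p(y) &= p_x\big(f^{-1}(y)\big)\,\left|\frac{d f^{-1}(y)}{dy}\right|
      = 1 \cdot \frac{1}{2\sqrt{y}}, \qquad 0 < y \le 1 \\
% Normalisation check
\int_0^1 \frac{1}{2\sqrt{y}}\,dy &= \big[\sqrt{y}\,\big]_0^1 = 1
\end{align*}
```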
Skewness and Kurtosis
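Skewness: $\gamma_1 = \langle (x-\mu)^3 \rangle / \sigma^3$ (asymmetry; zero for a symmetric distribution)
Kurtosis: $\gamma_2 = \langle (x-\mu)^4 \rangle / \sigma^4 - 3$ (tail weight relative to a Gaussian; zero for a Gaussian)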
[Figure: example distributions for which the measure is >0, =0, and <0]
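The following sketch computes both measures from data, assuming the standardised-moment definitions (third central moment over sigma cubed, fourth central moment over sigma to the fourth) and the excess-kurtosis convention that subtracts 3 so a Gaussian scores zero. The two toy data sets are made up for illustration.

```python
def central_moment(data, k):
    """k-th central moment with 1/N normalisation."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    # Assumption: subtract 3 so that a Gaussian has kurtosis 0.
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0


symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]     # skewness is 0
right_tailed = [0.0, 0.0, 0.0, 1.0, 4.0]    # long right tail, skewness > 0
```

A negative excess kurtosis, as for the flat `symmetric` sample, indicates lighter tails than a Gaussian.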
Empirical Distribution
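From data points $x^1, \dots, x^N$ to a distribution
Discrete: $p(x) = \frac{1}{N} \sum_{n=1}^{N} \mathbb{I}[x = x^n]$
Continuous: $p(x) = \frac{1}{N} \sum_{n=1}^{N} \delta(x - x^n)$
Sample mean and covariance: $\hat{\mu} = \frac{1}{N} \sum_{n} x^n$, $\hat{\Sigma} = \frac{1}{N} \sum_{n} (x^n - \hat{\mu})(x^n - \hat{\mu})^T$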
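A short sketch of the empirical quantities in plain Python. The function names are hypothetical, and the covariance uses the 1/N normalisation (an assumption; many libraries default to 1/(N-1)).

```python
from collections import Counter


def empirical_pmf(samples):
    """Discrete empirical distribution: the fraction of samples equal to each value."""
    n = len(samples)
    return {value: count / n for value, count in Counter(samples).items()}


def sample_mean(points):
    n, d = len(points), len(points[0])
    return [sum(p[i] for p in points) / n for i in range(d)]


def sample_covariance(points):
    """Sample covariance matrix with 1/N normalisation (assumed convention)."""
    n, d = len(points), len(points[0])
    mu = sample_mean(points)
    return [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / n
             for j in range(d)] for i in range(d)]


points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mu = sample_mean(points)
cov = sample_covariance(points)
pmf = empirical_pmf(["a", "b", "a", "a"])
```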
Entropy and Information
(Average) Entropy
$H(X) = -\langle \log p(x) \rangle_{p(x)} = -\sum_x p(x) \log p(x)$
With base-2 logarithms, it measures the average number of bits (uncertainty) needed to represent a symbol drawn from the distribution p(x).
Conditional Entropy
$H(X|Y) = -\langle \log p(x|y) \rangle_{p(x,y)} = -\sum_{x,y} p(x,y) \log p(x|y)$
Mutual Information
$MI(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$
$= H(X) + H(Y) - H(X,Y)$
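These quantities can be checked on small joint tables. The sketch assumes base-2 logarithms and uses the identities H(X|Y) = H(X,Y) - H(Y) and MI(X,Y) = H(X) + H(Y) - H(X,Y). The two joint tables are made-up examples.

```python
import math


def entropy(p):
    """Entropy in bits of a list of probabilities; zero entries contribute nothing."""
    return -sum(v * math.log2(v) for v in p if v > 0)


def marginals(joint):
    """joint[i][j] = p(x=i, y=j); returns (p(x), p(y))."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py


def conditional_entropy(joint):
    """H(X|Y) = H(X,Y) - H(Y)."""
    _, py = marginals(joint)
    return entropy([v for row in joint for v in row]) - entropy(py)


def mutual_information(joint):
    """MI(X,Y) = H(X) + H(Y) - H(X,Y)."""
    px, py = marginals(joint)
    return entropy(px) + entropy(py) - entropy([v for row in joint for v in row])


independent = [[0.25, 0.25], [0.25, 0.25]]  # X and Y independent fair bits
copy = [[0.5, 0.0], [0.0, 0.5]]             # Y is an exact copy of X
```

For independent variables the mutual information vanishes; when Y copies X, knowing Y removes all uncertainty about X.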
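Kullback-Leibler Divergence
“Difference” between two distributions
$KL(q|p) = \langle \log q(x) - \log p(x) \rangle_{q(x)} = \sum_x q(x) \log \frac{q(x)}{p(x)}$
It is always non-negative because it represents the extra bandwidth needed to compress a source q with a code designed for the wrong distribution p:
$KL(q|p) = -\langle \log p(x) \rangle_{q(x)} - H(q)$
It is zero if and only if p(x) = q(x).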
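A small numerical check of the decomposition of KL(q|p) into a cross-entropy term minus the entropy of q. The distributions `q` and `p` are made-up examples, and the code assumes p is non-zero wherever q is non-zero.

```python
import math


def entropy(q):
    return -sum(qi * math.log2(qi) for qi in q if qi > 0)


def cross_entropy(q, p):
    """-<log p(x)>_q: average code length when coding q with a code built for p."""
    return -sum(qi * math.log2(pi) for qi, pi in zip(q, p) if qi > 0)


def kl_divergence(q, p):
    """KL(q|p) in bits; assumes p[i] > 0 wherever q[i] > 0."""
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)


q = [0.5, 0.25, 0.25]          # true source distribution (assumed)
p = [1 / 3, 1 / 3, 1 / 3]      # wrong model used for coding (assumed)

extra_bits = kl_divergence(q, p)
```

Here `extra_bits` equals `cross_entropy(q, p) - entropy(q)`, the penalty per symbol for using the wrong model.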
Classical Discrete Distributions
Bernoulli Distribution
Discrete Binary Variable x
Dom(x) = {0,1}
Parameters: p(x=1) = θ
Properties
1. p(x=0) = 1-θ
2. <x> = θ
3. Var(x) = θ(1-θ)
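The mean and variance follow directly from the two-point distribution. Writing θ for p(x=1), a short derivation:

```latex
\begin{align*}
\langle x \rangle   &= 0 \cdot (1-\theta) + 1 \cdot \theta = \theta \\
\langle x^2 \rangle &= 0^2 \cdot (1-\theta) + 1^2 \cdot \theta = \theta \\
\mathrm{Var}(x)     &= \langle x^2 \rangle - \langle x \rangle^2
                     = \theta - \theta^2 = \theta(1-\theta)
\end{align*}
```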
Categorical Distribution
Discrete Variable x
Dom(x) = {1,…,C}
Parameters: p(x=c) = θ_c, with θ_c ≥ 0 and θ_1 + … + θ_C = 1
Binomial Distribution
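Discrete variable $y = \sum_{i=1}^{n} x_i$, where the $x_i$ are independent Bernoulli variables with parameter θ
dom(y) = {0,1,2,…,n}
Parameters: n, θ
Properties
$p(y = k \mid n, \theta) = \binom{n}{k}\,\theta^k (1-\theta)^{n-k}$
<y> = nθ
Var(y) = nθ(1-θ)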
[Figure: binomial distribution pmf]
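As a sanity check on the binomial properties, this sketch evaluates the pmf for every count and recovers the mean nθ and variance nθ(1-θ) by direct summation. The values n = 10 and θ = 0.3 are arbitrary illustrative choices.

```python
import math


def binomial_pmf(k, n, theta):
    """Probability of k successes in n independent Bernoulli(theta) trials."""
    return math.comb(n, k) * theta ** k * (1 - theta) ** (n - k)


n, theta = 10, 0.3  # assumed example values
pmf = [binomial_pmf(k, n, theta) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
```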
Multinomial Distribution
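Discrete vector $y = (y_1, \dots, y_K)$ with
$y_i = \sum_{j=1}^{n} \mathbb{I}\{x_j = i\}$, where $\{x_1, \dots, x_n\}$ are iid K-valued categorical data with parameters $\{\theta_1, \dots, \theta_K\}$
Properties
$p(y \mid n, \theta) = \frac{n!}{y_1! \cdots y_K!} \prod_{i=1}^{K} \theta_i^{y_i}$, with $\sum_{i} y_i = n$
<y_i> = nθ_i, Var(y_i) = nθ_i(1-θ_i), Cov(y_i, y_j) = -nθ_iθ_j for i ≠ j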
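The construction of the count vector can be sketched as follows: draw n categorical samples, then count how often each of the K values occurs. The pmf uses the standard multinomial formula n!/(y_1! … y_K!) times the product of θ_i^{y_i}. The function names, seed, and parameter values are illustrative assumptions.

```python
import math
import random


def count_vector(samples, K):
    """y_i = number of samples equal to i, for i = 1..K."""
    return [sum(1 for x in samples if x == i) for i in range(1, K + 1)]


def multinomial_pmf(counts, theta):
    """Probability of observing the count vector `counts` under parameters `theta`."""
    coeff = math.factorial(sum(counts))
    for c in counts:
        coeff //= math.factorial(c)  # exact at every step
    prob = float(coeff)
    for c, t in zip(counts, theta):
        prob *= t ** c
    return prob


theta = [0.5, 0.3, 0.2]            # assumed categorical parameters
rng = random.Random(0)             # fixed seed for repeatability
samples = rng.choices([1, 2, 3], weights=theta, k=6)
y = count_vector(samples, 3)
```

With K = 2 the formula reduces to the binomial pmf.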
Poisson
Model discrete events where the average scales with the length of the observation interval
Discrete variable x with dom(x)={0,1,2,…}
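Parameters: λ (the expected number of events in the observation interval)
$p(x = k \mid \lambda) = \frac{1}{k!}\,e^{-\lambda} \lambda^k$
<x> = λ and Var(x) = λ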
[Figure: Poisson distribution pmf]
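A brief sketch of the Poisson pmf, with λ formed as an event rate times the length of the observation interval to reflect the scaling described above. The names `rate` and `duration` and their values are hypothetical; the sum is truncated at k = 59, where the remaining mass is negligible for this λ.

```python
import math


def poisson_pmf(k, lam):
    """p(x = k | lam) = exp(-lam) * lam**k / k!"""
    return math.exp(-lam) * lam ** k / math.factorial(k)


rate = 2.0       # assumed: average events per unit time
duration = 2.0   # assumed: length of the observation interval
lam = rate * duration

pmf = [poisson_pmf(k, lam) for k in range(60)]  # truncated support
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
```

Doubling `duration` doubles λ, and with it both the mean and the variance.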
Classical Continuous Distribution
Uniform
Continuous variable x with dom(x) = [a,b]
$p(x) = \frac{1}{b-a}$, $\langle x \rangle = \frac{a+b}{2}$,
$\mathrm{var}(x) = \frac{(b-a)^2}{12}$
Exponential
Continuous variable x with dom(x) = [0,∞)
$p(x) = \lambda e^{-\lambda x}$, $\langle x \rangle = \frac{1}{\lambda}$,
$\mathrm{var}(x) = \frac{1}{\lambda^2}$
[Figures: uniform distribution pdf; exponential distribution pdf]
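The stated means and variances can be checked by simulation. This sketch draws uniform samples by rescaling a standard uniform, and exponential samples by the inverse-CDF method x = -ln(1-u)/λ. The parameter values, seed, and sample size are arbitrary assumptions.

```python
import math
import random


def mean_and_variance(samples):
    n = len(samples)
    mu = sum(samples) / n
    return mu, sum((x - mu) ** 2 for x in samples) / n


rng = random.Random(0)   # fixed seed for repeatability
N = 100_000
a, b = 2.0, 5.0          # assumed uniform interval
lam = 2.0                # assumed exponential rate

uniform_samples = [a + (b - a) * rng.random() for _ in range(N)]
# Inverse CDF: if u ~ U[0,1), then -ln(1 - u) / lam is exponential with rate lam.
exp_samples = [-math.log(1.0 - rng.random()) / lam for _ in range(N)]

u_mean, u_var = mean_and_variance(uniform_samples)  # near (a+b)/2 and (b-a)^2/12
e_mean, e_var = mean_and_variance(exp_samples)      # near 1/lam and 1/lam^2
```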
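Gamma Distribution (α = shape, β = scale)
$p(x) = \frac{1}{\beta\,\Gamma(\alpha)} \left(\frac{x}{\beta}\right)^{\alpha-1} e^{-x/\beta}$, $x \ge 0$
where $\Gamma(\alpha) = \int_0^\infty t^{\alpha-1} e^{-t}\,dt$
<x> = αβ and var(x) = αβ²
For integer α, x is the sum of α independent exponentially distributed random variables, each with rate parameter 1/β.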
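The sum-of-exponentials interpretation can be illustrated by simulation: adding α independent exponential variables with rate 1/β should give mean αβ and variance αβ². The pdf helper assumes the shape/scale parameterisation $x^{\alpha-1} e^{-x/\beta} / (\Gamma(\alpha)\,\beta^{\alpha})$. The parameter values, seed, and sample size are arbitrary assumptions.

```python
import math
import random


def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and scale beta (assumed parameterisation)."""
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)


def sum_of_exponentials(alpha, beta, rng):
    """Sum of `alpha` independent exponentials, each with rate 1/beta (mean beta)."""
    return sum(rng.expovariate(1.0 / beta) for _ in range(alpha))


rng = random.Random(0)   # fixed seed for repeatability
alpha, beta = 3, 2.0     # assumed shape and scale
N = 50_000

samples = [sum_of_exponentials(alpha, beta, rng) for _ in range(N)]
mean = sum(samples) / N                          # near alpha * beta
var = sum((s - mean) ** 2 for s in samples) / N  # near alpha * beta**2
```

With α = 1 the density reduces to an exponential with rate 1/β.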