EE 639 Advanced Topics in Signal Processing and Communication

Fall 2004: Multimedia Information System


Mon, 29 Nov 2004 15:02:57 -0800 (PST) "Sen-ching Samson Cheung" said:

Hi folks:
A couple of announcements: 
1. Final presentations will be held on 12/17 1-3pm at Anderson Hall
Room 453. The final reports will be due at the same time. Please let
me know of any conflict ASAP.   
2. Just a reminder: the presentation length is 10 minutes for a
1-person project and 20 minutes for a 2-person project. The minimum
report length is 4 pages for a 1-person project and 8 pages for a
2-person project.
3. I am going to send out an email to both the ECE department and the
Center for Visualization and Virtual Environments about the time and
location of the final presentation. Please let me know by the end of
tomorrow of any possible changes to your project title.
4. I have just uploaded the watermarking and distance-preserving
mapping lectures, plus a few watermarking papers. If you feel that you
need some more time to work on the homework, feel free to turn it in
late.

Tue, 28 Sep 2004 03:18:29 -0700 (PDT) "Samson Cheung" said:

Hi All:
I have some thoughts about the format of the HW2 writeup. Since it is
so close to the due date, I will not deduct any points if you do not
adhere to these guidelines.
For the recall and precision in HW2, I would appreciate it if you
could provide me with a plot (or table) for each query. Let's say you
MANUALLY identify five ground-truth images in the database that are
similar to query 1. Adjust your distance threshold to measure
precision at five recall levels: 0.2, 0.4, 0.6, 0.8, and 1.0. These
recall levels correspond to retrieving 1, 2, 3, 4, and 5 ground-truth
images. Remember that precision is the ratio of the number of
ground-truth images retrieved to the total number of images retrieved
at the selected threshold level.
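The computation above can be sketched in a few lines. This is a minimal illustration, not part of the assignment; the variable names (`distances`, `ground_truth`) and the toy 10-image database are hypothetical, and it assumes each database image has a single distance to the query.

```python
# Sketch of computing precision at the five recall levels for one query.
# `distances` maps every database image name to its distance from the
# query; `ground_truth` is the set of five manually identified images.

def precision_at_recall_levels(distances, ground_truth):
    """Return a list of (recall, precision) pairs, one per ground-truth hit."""
    # Rank the whole database by increasing distance to the query.
    ranked = sorted(distances, key=distances.get)
    results = []
    hits = 0
    for i, name in enumerate(ranked, start=1):
        if name in ground_truth:
            hits += 1
            recall = hits / len(ground_truth)   # 0.2, 0.4, ..., 1.0
            precision = hits / i                # ground truth retrieved / total retrieved
            results.append((recall, precision))
    return results

# Toy example: 5 ground-truth images in a 10-image database.
distances = {f"img{k}": k for k in range(10)}   # img0 closest, img9 farthest
ground_truth = {"img0", "img2", "img3", "img7", "img9"}
for recall, precision in precision_at_recall_levels(distances, ground_truth):
    print(f"recall={recall:.1f}  precision={precision:.2f}")
```

Sweeping the distance threshold is equivalent to walking down this ranked list, so each ground-truth hit gives you one (recall, precision) point for the plot.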
In the HW statement, I asked you not to include any images. Now that I
think about it, it will actually be easier for me if you include the
images in your writeup. Unless you have finished the HW already, I
would appreciate it if you could put your ground-truth images and the
top ten retrieved images at recall level 1.0 in your writeup. If you
have problems printing color images, you can send them to me
electronically.
By the way, I have put HW3 and a number of video-shot detection papers
up on the class website. 
Good luck with your homework.

Thu, 23 Sep 2004 11:27:17 -0700 (PDT) "Samson Cheung" said:

Hi All:

Lisa Carter, the audio-visual archivist of UK and
project manager of the Analog Video Digital Conversion
Project at KET, has kindly agreed to give us a guided
tour of her project at KET. It is scheduled for 9/29
(Wed) between 1 and 2pm.

For those who do not know, KET is the public
television network of Kentucky. KET is at 600 Cooper
Dr., which is across the street (to the east) from
Lexington Community College, (to the south) from the
football practice field, and diagonal from the Johnson
Center parking lot near the tennis courts. We will be
meeting around 12:50 in the lobby of the building
(there are two KET buildings next to each other;
only one has a big lobby with multiple TVs). Carpooling
is highly encouraged, and please let me know if any of
you want to carpool with me.

The project Lisa leads involves digitizing and
analyzing public broadcast television programs
archived over the last ten-plus years. I have included
her email, which has some links to the presentations
she gave to archivists. The part that especially
interests me is their use of Virage's VideoLogger,
which can segment video sequences, identify faces and
videotext, and extract transcripts from both closed
captions and a speech recognizer.

Lisa told me that they have a great deal of trouble
with VideoLogger's speech recognition, as it
consistently achieves a correct rate of less than 10%.
This is a real issue, as many of the older programs do
not carry closed captions, and the transcript is the
main (only) indexing tool KET uses for searching
video. A really neat project would be to train an
open-source speech recognizer on related programs with
closed captions and see how much improvement it
yields. This is an excellent opportunity to work with
expert archivists and state-of-the-art video analysis
and archiving infrastructure. I really hope someone
will consider this as their final project. I have some
experience with different speech recognizers, and you
are more than welcome to discuss it with me.


Wed, 8 Sep 2004 14:35:55 "Nathir Rawashdeh" said:

In the HW1, problem 2 statement, it says: "The encoder also generates
a file stat.dat. The contents of this file ..."
Il-Won and I found that the statistics from the encoder run are more
easily accessible in "matlab_file.txt" as opposed to "stat.dat".
stat.dat is also mentioned in problem 2 part (b). It should say
"matlab_file.txt" there as well.

Thu, 2 Sep 2004 14:28:39 "Sen-ching Samson Cheung" said:

Wei asked me a question that may be of interest to the entire
class. He asked, "30 frames/second progressive scan seems to transmit
exactly the same amount of data as 60 fields/second interlace. So,
what is the deal with interlace?"
Well, I had to look it up. It turns out that in a well-lit
environment like your living room, a display should have a refresh
rate of at least 50-60Hz; otherwise we will see flickering (for those
who still use the good old CRT for a computer display, change the
refresh rate to below 50Hz and you will see what I mean).
One trick is to flash the same frame twice, which is commonly done in
movie theaters (they also dim the lights, which further reduces the
amount of perceived flickering). But your old TV certainly does not
have the memory buffer to store an entire video frame. So to jack up
the refresh rate, the broadcast engineers decided to go up to 60Hz but
use interlace in order to keep the same bandwidth.
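Wei's observation is easy to verify with back-of-envelope arithmetic. The raster dimensions below (480 active lines by 640 pixels) are illustrative NTSC-like numbers, not values from the homework:

```python
# Back-of-envelope check: interlace doubles the refresh rate without
# increasing the pixel rate, because each field carries half the lines.
lines, width = 480, 640                      # hypothetical raster size

progressive = lines * width * 30             # 30 full frames per second
interlaced = (lines // 2) * width * 60       # 60 half-height fields per second

print(progressive, interlaced)               # identical pixel rates
```

Same bandwidth, twice the flashes per second; that is the whole deal with interlace.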

Wed, 1 Sep 2004 15:51:12 "Sen-ching Samson Cheung" said:

There are two mistakes in question 3 of HW1. The correct version is
posted on the web. 
1. The symbol V represents two different quantities -- the noise
vector for estimating the bad images, and the unitary matrix. I have
renamed the noise vector as N.
2. In the last equation, the capital sigma on the left-hand side
should be the covariance matrix of X sub G (the good images), not of X
as a whole.
One clarification for question 1: you can assume that we use an 8x8
DCT and that the image is much larger than 8x8.
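For concreteness, the assumed setup (an 8x8 DCT applied blockwise to a much larger image) can be sketched as follows. This is only an illustration of the blockwise transform, not the homework solution; the image size and contents are made up.

```python
import numpy as np

N = 8
k = np.arange(N)
# Orthonormal DCT-II basis matrix: C[u, n] = sqrt(2/N) cos(pi (2n+1) u / 2N),
# with the u = 0 row scaled by 1/sqrt(2).
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
C[0, :] /= np.sqrt(2.0)

def block_dct(image):
    """Apply the 8x8 DCT independently to each 8x8 block of the image."""
    h, w = image.shape
    out = np.empty((h, w))
    for r in range(0, h, N):
        for c in range(0, w, N):
            out[r:r+N, c:c+N] = C @ image[r:r+N, c:c+N] @ C.T
    return out

img = np.random.rand(64, 64)    # image much larger than 8x8
coeffs = block_dct(img)
```

Because C is orthonormal, each block transform preserves energy, and the inverse is simply C.T @ block @ C.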
The notes used for lecture 3 and 4 are posted online.

Sen-ching Samson Cheung
Last modified: Wed Sep 8 21:47:43 EDT 2004