When comparing the performance of different recognition and retrieval systems and ideas, it is important to separate the effects that the different components of each system have on the final outcome of the test.
Our goal with this page is to walk through our recognition pipeline step by step and provide as much data as possible at every step, so that authors with a different pipeline can replace as little of the pipeline as possible and thereby hurt the comparisons minimally. Our current pipeline architecture, which we share with several other systems, is:
| | Scoring Strategy | | | |
| Quantizer Training set | Flat | 10 | 100 | 1000 |
| cd | 2.895588 | 2.574118 | 3.139706 | 3.161275 |
| moving | 2.828529 | 2.161275 | 3.014216 | 3.083824 |
| moving+cd | 2.884412 | 2.551078 | 3.139902 | 3.157157 |
| flip | 3.014412 | 2.534902 | 3.135098 | 3.188333 |
| test | 3.166373 | 3.070098 | 3.294314 | 3.286863 |
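The table does not state which metric is reported. Assuming it is the usual UK-bench score (each image belongs to a group of 4 views of the same object, and the score is the average number of same-group images among the top 4 results, so the maximum is 4), a minimal sketch of the computation could look as follows. The function name and the convention that image `i` belongs to group `i // 4` are assumptions made for this illustration.

```python
def ukbench_score(rankings, group_size=4):
    """Average number of same-group images among the top `group_size` results.

    rankings[q] is the ranked list of database image indices for query q.
    Assumption: images are ordered so that image i is in group i // group_size.
    """
    total = 0
    for q, ranked in enumerate(rankings):
        query_group = q // group_size
        total += sum(1 for r in ranked[:group_size]
                     if r // group_size == query_group)
    return total / len(rankings)


# Toy example with 8 images (2 groups of 4); the rankings are made up.
rankings = [
    [0, 1, 2, 3],
    [1, 0, 5, 2],
    [2, 6, 7, 3],
    [3, 4, 5, 6],
    [4, 5, 6, 7],
    [5, 4, 0, 1],
    [6, 7, 4, 5],
    [7, 0, 1, 2],
]
score = ukbench_score(rankings)
print(score)
```

Under this reading, the query image itself counts as one of the four, so a system that only ever finds the query image scores 1.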
| The Training Sets | | Download SIFT features |
| cd | A large collection of CD-covers. | 31.486.448 features (4GB) |
| moving | Images from moving vehicles. | 21.750.941 features (2.7GB) |
| moving+cd | Concatenation of the two above sets. | 53.237.389 features |
| flip | Flipped versions of the test-set. | 5.327.083 features (0.68GB) |
| test | The test-set. | 5.328.755 features (0.68GB) |
| The Scoring Strategies | | |
| Flat | Using only the leaf-nodes. No scoring limit. | |
| 10 | Using hierarchical scoring with scoring limit 10. | |
| 100 | Using hierarchical scoring with scoring limit 100. | |
| 1000 | Using hierarchical scoring with scoring limit 1000. | |
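As a rough illustration of the flat strategy (leaf nodes only), the sketch below scores a query against a database through an inverted index with TF-IDF weights. This is not our code: the function names, the use of cosine similarity, and the exact normalization are assumptions chosen to keep the example short, and the hierarchical strategies with a scoring limit are not covered.

```python
import math
from collections import Counter, defaultdict


def build_index(db_words):
    """db_words[i] is the list of visual-word ids (leaf nodes) of image i."""
    n = len(db_words)
    # Document frequency: in how many images each word occurs.
    doc_freq = Counter(w for words in db_words for w in set(words))
    idf = {w: math.log(n / df) for w, df in doc_freq.items()}
    inverted = defaultdict(list)  # word -> [(image id, normalized weight)]
    for i, words in enumerate(db_words):
        vec = {w: tf * idf[w] for w, tf in Counter(words).items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        for w, v in vec.items():
            inverted[w].append((i, v / norm))
    return idf, inverted


def query(words, idf, inverted):
    """Return (image id, score) pairs, best match first."""
    vec = {w: tf * idf[w] for w, tf in Counter(words).items() if w in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    scores = defaultdict(float)
    for w, v in vec.items():
        # Only images that share word w with the query are touched.
        for image_id, weight in inverted[w]:
            scores[image_id] += (v / norm) * weight
    return sorted(scores.items(), key=lambda kv: -kv[1])


# Toy database of three images with made-up visual words.
db = [[1, 1, 2], [2, 3], [4, 4, 5]]
idf, inverted = build_index(db)
results = query([1, 2], idf, inverted)
print(results)
```

The inverted index is what makes the query cost depend on the number of shared words rather than on the database size.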
| The Test Set | Download extracted data | |
| Extracted features and geometry from UK-bench. WARNING: Only x and y are valid in the geometry data. | SIFT and geometry (1.1GB) | |
| Quantized using cd | Visual Words | |
| Quantized using moving | Visual Words | |
| Quantized using cd+moving | Visual Words | |
| Quantized using flip | Visual Words | |
| Quantized using normal | Visual Words | |
There are a total of 7034780 features, an average of 689 features per image. We permit multiple SIFT descriptors per MSER point, and I do not know the original number of MSER points (you can find this number by counting the number of distinct points in the geometric part of the data).
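The counting suggested above can be sketched as follows. The in-memory layout (a list of `(x, y)` tuples per image) and the function name are assumptions for illustration; the actual file format of the geometry data is not described here.

```python
def count_distinct_points(geometry):
    """Number of distinct (x, y) positions among the features of one image.

    geometry: list of (x, y) tuples, one per SIFT feature (assumed layout).
    Several SIFT descriptors may share the same point.
    """
    return len(set(geometry))


# Made-up example: three descriptors on two distinct points.
example = [(12.5, 40.0), (12.5, 40.0), (77.25, 3.5)]
print(count_distinct_points(example))

# Average number of features per image, from the figures quoted above.
total_features = 7034780
num_images = 10200
average = total_features / num_images
print(average)  # about 689.7
```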
For the training data all features with an ellipse partially outside the image have been removed.
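A filter of this kind can be sketched as below. The ellipse parametrization is an assumption: each feature is taken to be `(x, y, a, b, c)`, describing the ellipse `a*dx^2 + 2*b*dx*dy + c*dy^2 = 1` centred at `(x, y)`, and the image is taken to cover `[0, width] x [0, height]`. Our own data may use a different convention.

```python
import math


def ellipse_inside_image(x, y, a, b, c, width, height):
    """True if the ellipse lies completely inside the image rectangle."""
    det = a * c - b * b
    if a <= 0 or det <= 0:
        return False  # parameters do not describe an ellipse
    # Half-extents of the axis-aligned bounding box of the ellipse.
    half_w = math.sqrt(c / det)
    half_h = math.sqrt(a / det)
    return (x - half_w >= 0 and x + half_w <= width
            and y - half_h >= 0 and y + half_h <= height)


# Two circles of radius 5 (a = c = 1/25): one inside, one crossing the border.
features = [
    (10.0, 10.0, 0.04, 0.0, 0.04),
    (3.0, 10.0, 0.04, 0.0, 0.04),
]
kept = [f for f in features if ellipse_inside_image(*f, width=100, height=100)]
print(kept)
```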
Corresponding test data, along with results, can be found here.
Table with results and values for plotting a curve.
The code is not freely available and is the property of the University of Kentucky. Please contact the Center for Visualization and Virtual Environments for licensing. The code exists in both Windows and Linux versions.
Around 12 frames per second. For the 10200 frames we get:
real 14m30.650s
user 13m40.619s
Approximate distribution:
Depends on the scoring strategy. There is a trade-off between speed and quality.
Time for all 10200 queries:
real 16m7.531s
user 15m32.642s
For 100 times using "10"
real 16m59.350s
user 16m31.338s
For 10 times using "100"
real 4m46.278s
user 4m36.717s
For 1 time using "1000"
real 3m45.801s
user 3m35.857s
These times include reading all the text-based forward files, building the inverted index and computing the weights once.
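The throughput figures can be checked from the `real` times reported on this page. The sketch below parses the `XmY.YYYs` format printed by `time` and divides the 10200 images by the elapsed seconds. The helper name is made up for this example, and the query rate uses the first timing listed for all 10200 queries, which includes the loading and index-building overhead.

```python
import re


def parse_time(text):
    """Convert a string such as '14m30.650s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unexpected time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


num_images = 10200

# Feature extraction: 10200 frames in 14m30.650s of wall-clock time.
extraction_fps = num_images / parse_time("14m30.650s")

# Queries: first timing reported for all 10200 queries.
query_rate = num_images / parse_time("16m7.531s")

print(f"extraction: {extraction_fps:.1f} frames per second")
print(f"queries:    {query_rate:.1f} queries per second")
```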
The machine is an AMD bought in spring 2006, running 32-bit Linux (2.6.17-2-k7, Debian).
# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 39
model name : AMD Athlon(tm) 64 Processor 4000+
stepping : 1
cpu MHz : 2418.239
cache size : 1024 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce
cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse
sse2 syscall nx mmxext fxsr_opt
lm 3dnowext 3dnow up pni lahf_lm
ts fid vid ttp tm stc
bogomips : 4841.36
Please mention the homepage. The only paper we have published on this is: D. Nistér and H. Stewénius. Scalable recognition with a vocabulary tree. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 2161-2168, June 2006.
If you are using hierarchical scoring, there is never a problem with using too large a vocabulary.
For flat scoring the picture is different:
[Plot] (This plot is on a different dataset. The quantizers were trained on images from a video camera.)

Anyway, choosing vocabulary size for flat scoring is a choice where several factors have to be balanced:

The question is ill posed. Different use scenarios have different optimal settings. We have chosen to use settings which do not give optimal quality but give a good trade-off between quality and speed. Using a higher number of features per image does improve the quality but slows down queries.

For optimal performance on a toy-scale example such as this benchmark I would advise being very aggressive with the number of features; after all, the cost is only linear in memory and bilinear in query time.

Ask the other authors to try our set. I have made some trials but I am not sure enough of my implementation.