Accurate 3D Pose Estimation From a Single Depth Image

We present a novel system to estimate body pose configuration from a single depth map. It combines both pose detection and pose refinement. The input depth map is matched against a set of pre-captured motion exemplars to generate a body configuration estimate, as well as a semantic labeling of the input point cloud. The initial estimate is then refined by directly fitting the body configuration to the observation (i.e., the input depth). Beyond the new system architecture, our contributions include a point cloud smoothing technique modified to handle very noisy input depth maps, and a view-independent, efficient point cloud alignment and pose search algorithm. Experiments on a public dataset show that our approach achieves significantly higher accuracy than previous state-of-the-art methods.
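
The exemplar-matching stage can be pictured with a minimal sketch (in Python; the descriptor here is a hypothetical block-averaged, distance-normalized depth grid, and the brute-force search stands in for the paper's actual descriptor and search structure):

import numpy as np

def depth_to_feature(depth, grid=(8, 8)):
    """Downsample a depth map to a coarse grid and normalize it,
    giving a crude, translation-insensitive pose descriptor."""
    h, w = depth.shape
    gh, gw = grid
    bh, bw = h // gh, w // gw
    feat = depth[:gh * bh, :gw * bw].reshape(gh, bh, gw, bw).mean(axis=(1, 3))
    feat = feat - feat.mean()  # remove absolute distance to the sensor
    return feat.ravel() / (np.linalg.norm(feat) + 1e-8)

def match_exemplar(depth, exemplar_feats, exemplar_poses):
    """Return the pose of the nearest pre-captured motion exemplar."""
    f = depth_to_feature(depth)
    dists = np.linalg.norm(exemplar_feats - f, axis=1)
    best = int(np.argmin(dists))
    return exemplar_poses[best], dists[best]

The matched pose would then seed the refinement stage, which fits the body configuration directly to the observed depth.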


Automatic Real-Time Video Matting Using Time-of-Flight Camera and Multichannel Poisson Equations

We present an automatic real-time video matting system. The proposed system consists of two novel components. To automatically generate trimaps for live videos, we advocate a Time-of-Flight (TOF) camera-based approach to video bi-layer segmentation. Our algorithm combines color and depth cues in a probabilistic fusion framework. The scene depth information returned by the TOF camera is less sensitive to environment changes, which makes our method robust to illumination variation, dynamic backgrounds, and camera motion. In the second step, we perform alpha matting based on the segmentation result. Our matting algorithm is based on a set of novel Poisson equations derived to handle multichannel color vectors, as well as the captured depth information. Real-time processing speed is achieved by optimizing the algorithm for parallel processing on graphics hardware. We demonstrate the effectiveness of our matting system through an extensive set of experiments.
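
As a rough illustration of the color/depth fusion idea (not the paper's exact probabilistic model), a per-pixel Bayesian combination might look like the following, assuming precomputed color log-likelihoods (e.g., from Gaussian mixture models) and simple Gaussian depth models for each layer; all names are illustrative:

import numpy as np

def bilayer_segment(color_ll_fg, color_ll_bg, depth,
                    fg_mu, fg_sigma, bg_mu, bg_sigma, prior_fg=0.5):
    """Fuse color and depth cues per pixel: each layer's score is its color
    log-likelihood plus a Gaussian depth log-likelihood plus a log prior."""
    def gauss_ll(x, mu, sigma):
        return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma)
    log_fg = color_ll_fg + gauss_ll(depth, fg_mu, fg_sigma) + np.log(prior_fg)
    log_bg = color_ll_bg + gauss_ll(depth, bg_mu, bg_sigma) + np.log(1 - prior_fg)
    return log_fg > log_bg  # boolean foreground mask, i.e. the bi-layer labels

The resulting mask would be dilated/eroded into a trimap whose unknown band feeds the Poisson matting stage.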


Video Stereolization: Combining Motion Analysis with User Interaction

We present a semi-automatic system that converts conventional videos into stereoscopic videos by combining motion analysis with user interaction, aiming to transfer as much of the labeling work as possible from the user to the computer. In addition to the widely-used structure-from-motion (SFM) techniques, we develop two new methods that analyze the optical flow to provide additional qualitative depth constraints. They remove the camera movement restriction imposed by SFM, so that general motions can be used in scene depth estimation, the central problem in mono-to-stereo conversion. With these algorithms, the user's labeling task is significantly simplified. We further develop a quadratic programming approach to incorporate both quantitative depth and qualitative depth (such as that from user scribbles) to recover dense depth maps for all frames, from which stereoscopic views can be synthesized. In addition to visual results, we present user study results showing that our approach is more intuitive and less labor intensive, while producing 3D effects comparable to those from current state-of-the-art interactive algorithms.
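
A minimal sketch of the optimization idea: quantitative depths act as quadratic data terms, qualitative orderings become one-sided penalties, and neighboring pixels are encouraged to agree. The plain gradient-descent solver and every parameter name below are illustrative stand-ins for a proper QP solver:

import numpy as np

def solve_depths(n, neighbors, known, order, iters=2000, lam=1.0, mu=1.0, lr=0.05):
    """Recover depths d (length n) by minimizing
       sum_{(i,j) in neighbors} (d_i - d_j)^2            # smoothness
     + lam * sum_{(i,v) in known} (d_i - v)^2            # quantitative depth (e.g. SFM)
     + mu  * sum_{(a,b) in order} max(0, d_a - d_b)^2    # qualitative: a in front of b"""
    d = np.zeros(n)
    for _ in range(iters):
        g = np.zeros(n)
        for i, j in neighbors:
            diff = d[i] - d[j]
            g[i] += 2 * diff
            g[j] -= 2 * diff
        for i, v in known:
            g[i] += 2 * lam * (d[i] - v)
        for a, b in order:
            viol = max(0.0, d[a] - d[b])  # penalized only when the order is violated
            g[a] += 2 * mu * viol
            g[b] -= 2 * mu * viol
        d -= lr * g
    return d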


Interreflection Removal for Photometric Stereo by Using Spectrum-dependent Albedo

We present a novel method that can separate m-bounced light and remove the interreflections in a photometric stereo setup. Under the assumption of a uniformly colored Lambertian surface, the intensity of a point in the scene is the sum of the 1-bounced through m-bounced light rays. By the law of diffuse reflection, whenever a light ray is bounced by the surface, its intensity is attenuated by a factor of the albedo ρ. This implies that the measured intensity can be written as a polynomial function of ρ, in which the contribution of the m-bounced light rays is carried by the ρ^m term. Therefore, when we change the surface albedo, the intensity of the m-bounced light changes on the order of ρ^m. This non-linearity makes it possible to separate the m-bounced light. In practice, we illuminate the scene with different light colors to effectively simulate different surface albedos, since albedo is spectrum dependent. Once the m-bounced light rays are separated, we can run the photometric stereo algorithm on the 1-bounced (direct lighting) images to produce the 3D shape without the impact of interreflections. Experiments show that we obtain significantly improved scene reconstruction with a minimum of two color images.
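
Truncating the bounce series at m = 2, the separation reduces to a 2x2 linear solve per pixel. The following sketch (with hypothetical inputs: two intensity images of the same scene under two effective albedos ρ1, ρ2) illustrates the idea:

import numpy as np

def separate_bounces(I1, I2, rho1, rho2):
    """Model: I(rho) = c1*rho + c2*rho^2, measured at two effective albedos.
    I1, I2 may be scalars or whole images; the 2x2 system is applied per pixel."""
    A = np.array([[rho1, rho1 ** 2],
                  [rho2, rho2 ** 2]])
    Ainv = np.linalg.inv(A)
    c1 = Ainv[0, 0] * I1 + Ainv[0, 1] * I2  # direct (1-bounce) coefficient
    c2 = Ainv[1, 0] * I1 + Ainv[1, 1] * I2  # 2-bounce coefficient
    return c1 * rho1, c2 * rho1 ** 2        # direct and indirect parts under illuminant 1

Photometric stereo is then run on the direct component only, which is free of interreflection.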


Learning-based Face Modeling from a Single Image

The 3D reconstruction of a face from a single frontal image is an ill-posed problem. This is further accentuated when the face image is captured under different poses and/or complex illumination conditions. We aim to solve the shape recovery problem from a single facial image under these challenging conditions. Local image models for each patch of the facial image and local surface models for each patch of the 3D shape are learned using a non-linear dimensionality reduction technique, and the correspondences between these local models are then learned by a manifold alignment method. By combining the local shapes, the global shape of a face can be reconstructed directly by solving a single least-squares system of equations. We perform experiments on synthetic and real data, and validate the algorithm against the ground truth. Experimental results show that our method yields accurate shape recovery from out-of-training samples with a variety of pose and illumination variations.


Semantic Segmentation of Urban Scenes Using Dense Depth Maps

We present a framework for semantic scene parsing and object recognition based on dense depth maps. Five view-independent 3D features that vary with object class are extracted from dense depth maps at the super-pixel level and used to train a classifier based on randomized decision forests. Our formulation integrates multiple features in a Markov Random Field (MRF) framework to segment and recognize different object classes in query street-scene images. We evaluate our method both quantitatively and qualitatively on the challenging Cambridge-driving Labeled Video Database (CamVid). The results show that using dense depth information alone, we achieve overall more accurate segmentation and recognition than sparse 3D features or appearance, or even the combination of the two, advancing state-of-the-art performance.
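
A minimal sketch of the classifier stage, using scikit-learn's random forest on stand-in data (the five actual 3D features and the MRF inference are not reproduced here; dimensions and class count are placeholders):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# X: one row of depth-derived features per super-pixel; y: class label per super-pixel.
X_train = np.random.rand(1000, 5)          # stand-in for the real 3D features
y_train = np.random.randint(0, 11, 1000)   # stand-in labels, e.g. 11 CamVid classes

forest = RandomForestClassifier(n_estimators=100, max_depth=15)
forest.fit(X_train, y_train)

# Per-class probabilities for query super-pixels: these would become the unary
# terms of the MRF, with pairwise terms encouraging label smoothness.
unary = forest.predict_proba(np.random.rand(200, 5))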


A Volumetric Approach for Merging Range Images of Semi-Rigid Objects Captured at Different Time Instances

We present a framework for reconstructing complete 4D models of semi-rigid objects from a single stereoscopic sequence, extending the powerful structure-from-motion method to dynamic scenes. We develop a novel volumetric distance field warping function so that depth maps from different times, even in the presence of non-rigid deformations, can be mapped to a common time t and merged together.


Fusion of Passive Stereo and Time-of-Flight

Time-of-flight range sensors have error characteristics complementary to passive stereo. They provide real-time depth estimates in conditions where passive stereo does not work well, such as on white walls. However, these sensors are noisy and often perform poorly on the textured scenes where stereo excels. We introduce a method for combining the results from both sensors that performs better than either alone. A depth probability distribution function from each method is calculated and then merged. In addition, stereo methods have long used global techniques such as belief propagation and graph cuts to improve results, and we apply these techniques to this sensor as well. Since time-of-flight devices have primarily been used as individual sensors, they are typically poorly calibrated; we introduce a method that substantially improves upon the manufacturer's calibration. We show that these techniques lead to improved accuracy and robustness.
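
The merging step can be sketched as follows, assuming both sensors' uncertainties have already been converted to per-pixel distributions over a shared set of depth hypotheses (a simplification of the full formulation; array shapes are illustrative):

import numpy as np

def fuse_depth(stereo_pdf, tof_pdf, depth_values):
    """Fuse per-pixel depth distributions from stereo and ToF.
    Both PDFs are (H, W, D) arrays over the same D depth hypotheses;
    assuming independence, multiply, renormalize, and take the MAP depth."""
    fused = stereo_pdf * tof_pdf
    fused /= fused.sum(axis=2, keepdims=True) + 1e-12
    return depth_values[np.argmax(fused, axis=2)]

In the full system, the fused distribution would serve as the data term for a global optimizer such as belief propagation rather than being reduced to its per-pixel maximum.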


Modeling Deformable Objects from a Single Depth Camera

We propose a novel approach to reconstruct complete 3D deformable models over time using a single depth camera, provided that most parts of the model are observed by the camera at least once. The core of this algorithm is the assumption that the deformation is continuous and predictable over a short temporal interval. While the camera can only capture part of the whole surface at any time instant, partial surfaces reconstructed at different times are assembled to form a complete 3D surface for each time instant, even when the shape is under severe deformation. A mesh warping algorithm based on linear mesh deformation is used to align different partial surfaces. A volumetric method is then used to combine partial surfaces, fill missing holes, and smooth out alignment errors. Our experiments show that this approach is able to reconstruct visually plausible 3D surface deformation results with a single camera.
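
One common way to realize such a volumetric combination, shown here only as a generic sketch rather than this paper's exact method, is a weighted average of truncated signed distance fields (TSDFs):

import numpy as np

def merge_tsdf(tsdfs, weights):
    """Combine aligned partial scans stored as TSDF volumes: a weighted
    average smooths residual alignment error, and voxels observed by any
    scan fill holes left by the others. The surface is the zero level set."""
    w = np.sum(np.array(weights), axis=0)
    fused = np.sum(np.array(tsdfs) * np.array(weights), axis=0) / np.maximum(w, 1e-8)
    return fused, w  # extract the mesh from fused == 0, e.g. with marching cubes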


Multi-Projector Display Systems

The goal of this research is to create prototypes of rapidly assembled and calibrated multi-projector display systems capable of displaying any content in any situation. These displays provide ultra-high resolution in a large format with a short depth footprint. Due to their rapid assembly and calibration, they are ideally suited for portable operations such as mobile command centers for first responders, field-deployable troop training environments, and conference and trade show displays. The software driving these displays empowers the user with easy content management and control of multiple windows in multiple displays. Still images, video, live capture feeds, remote desktop connections, DVR servers, and other content can be easily managed and simultaneously integrated into one display.


Unsupervised Learning of High-order Structural Semantics from Images

We present a new unsupervised learning algorithm that finds high-order, frequently occurring visual patterns (semantics) in images, going beyond the spatial proximity assumption. We posit that semantics are composed of image features that appear with consistent geometric relationships sufficiently often. An efficient polynomial-time algorithm is developed to search for meaningful and strong associations between pairwise visual clusters over the entire image space. High-order composite visual structures are then extracted by frequent subgraph mining on an undirected labeled graph built upon all pairwise associations.


Physically Guided Liquid Surface Modeling from Videos

We present an image-based reconstruction framework to model real water scenes captured by stereoscopic video. The combination of image-based reconstruction with physically-based simulation allows us to model complex and dynamic objects such as fluids. Using a depth map sequence as initial conditions, we apply a physically-based approach that automatically fills in missing regions, removes outliers, and refines the geometric shape so that the final 3D model is consistent with both the input video data and the laws of physics.

 


 

Pixel Router

Even with the recent rapid advancement in hardware, the demand from high-end graphics applications (including video games) seems to always outpace the capability that a single GPU can offer. As we migrate from a single GPU to multiple GPUs or eventually GPU clusters, how to effectively assemble the final image from these distributed rendering nodes becomes an important issue. Here we propose to develop a flexible pixel compositor to solve this problem.

 

 


 

Spatial-Depth Super Resolution for Range Images

We present a new post-processing step to enhance the resolution of range images. Using one or two registered and potentially high-resolution color images as reference, we iteratively refine the input low-resolution range image, in terms of both its spatial resolution and its depth precision. Evaluation on the Middlebury benchmark shows across-the-board improvement in sub-pixel accuracy. We also demonstrate its effectiveness for spatial resolution enhancement of up to 100X with a single reference image.
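
The paper's iterative refinement is not reproduced here; as a sketch of the underlying idea of letting color affinity guide depth upsampling, here is a slow, illustrative joint bilateral upsampling pass (all parameters hypothetical; color_hr is float RGB in [0, 1]):

import numpy as np

def joint_bilateral_upsample(depth_lr, color_hr, scale, sigma_s=2.0, sigma_r=0.1):
    """Each high-res depth is a weighted average of nearby low-res samples,
    weighted by spatial distance and by color similarity in the guide image.
    O(H*W*r^2) loops: for illustration only, not a real-time implementation."""
    H, W = color_hr.shape[:2]
    h, w = depth_lr.shape
    out = np.zeros((H, W))
    r = int(2 * sigma_s)
    for y in range(H):
        for x in range(W):
            yl, xl = y / scale, x / scale         # position in low-res grid
            y0, x0 = int(yl), int(xl)
            wsum = vsum = 0.0
            for j in range(max(0, y0 - r), min(h, y0 + r + 1)):
                for i in range(max(0, x0 - r), min(w, x0 + r + 1)):
                    ds = ((j - yl) ** 2 + (i - xl) ** 2) / (2 * sigma_s ** 2)
                    cj, ci = int(j * scale), int(i * scale)
                    dc = np.sum((color_hr[y, x] - color_hr[cj, ci]) ** 2) / (2 * sigma_r ** 2)
                    wgt = np.exp(-ds - dc)
                    wsum += wgt
                    vsum += wgt * depth_lr[j, i]
            out[y, x] = vsum / (wsum + 1e-12)
    return out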


 

Light Fall-off Stereo

Light fall-off stereo (LFS) is a new method for computing depth from scenes beyond Lambertian reflectance and texture. Compared to previous reconstruction methods for non-Lambertian scenes, LFS needs as few as two images, and does not require calibrated cameras, calibrated light sources, or reference objects in the scene.
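
As a toy illustration of the inverse-square principle LFS builds on (the actual formulation handles non-Lambertian reflectance; the baseline setup below is an assumption for the sketch): suppose the light source is moved back by a known baseline b along the viewing direction. Reflectance cancels in the intensity ratio, leaving depth:

import numpy as np

def depth_from_falloff(I_near, I_far, baseline):
    """Inverse-square fall-off: I_near / I_far = ((r + b) / r)^2,
    so r = b / (sqrt(I_near / I_far) - 1). Reflectance cancels in the ratio."""
    ratio = np.sqrt(np.maximum(I_near / np.maximum(I_far, 1e-8), 1.0 + 1e-6))
    return baseline / (ratio - 1.0)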

 

 


3D Urban Reconstruction from Video

This project aims at developing a fully automated system for the accurate and rapid 3D reconstruction of urban environments from video streams. The system collects multiple video streams, as well as GPS and INS measurements in order to place the reconstructed models in geo-registered coordinates. Besides high quality in terms of both geometry and appearance, we aim at real-time performance on a combination of CPUs and GPUs.

 

 


 

 

BRDF Invariant Stereo using Light Transport Constancy

Nearly all existing methods for stereo reconstruction assume that scene reflectance is Lambertian and make use of brightness constancy as a matching invariant. We introduce a new invariant for stereo reconstruction called Light Transport Constancy, which allows completely arbitrary scene reflectance (BRDFs).

 

 

 


 

 

Towards Space-time Light Field Rendering

In this paper we propose a novel framework, space-time light field rendering, which allows continuous exploration of a dynamic scene in both the spatial and temporal domains with unsynchronized input video sequences.


Toward the Light Field Display: Autostereoscopic Rendering via a Cluster of Projectors

Ultimately, a display device should be capable of reproducing the visual effects that are produced by reality. In this paper we introduce an autostereoscopic display that uses a scalable array of digital light projectors and a projection screen augmented with microlenses to simulate a light field for a given three-dimensional scene.


Projector-Whiteboard-Camera System for Remote Collaboration

In a typical remote collaboration setup, two or more projector-camera pairs are "cross-wired" to form a full-duplex system for two-way communication. A whiteboard can be used as the projection screen, in which case the whiteboard serves as an output device as well as an input device: users can write on the whiteboard to comment on what is projected or to add new thoughts to the discussion.

 

 

 


 

 

Wide-area Rapid Iris Image Capture with Pan-tilt-zoom Cameras

In response to the DHS interest in fast biometric measurement, this project will develop a system to rapidly capture iris images of moving human subjects at long range. Working in concert with a currently available commercial iris identification software package, our system will provide fast, accurate, and automated biometric identification first for homeland security and also for other fields requiring identification or authentication.

 

 


 

 

High Quality and Real-time Stereo Algorithms

We have been working on designing algorithms for the dense two-frame stereo matching problem, aiming at both high reconstruction quality and real-time performance. Evaluation on the benchmark Middlebury stereo database shows that our algorithms are among the best in terms of both quality and speed.

 

 

Dr. Yang's Previous and Current Work:

3D Reconstruction and View Synthesis (from May 2001)

 


 

3D Physically-based 2D View Synthesis

As part of his thesis work, Dr. Yang is working on a new statistical approach for view synthesis. It is particularly effective for texture-less regions and specular highlights, two major problems that most existing reconstruction techniques have difficulty with. We are preparing to report this work at ICCV 2003. Some initial results are presented on the left: the top row shows several input images, while the bottom row shows the reconstructed point cloud.

 


 

Real-time Stereo

A multi-resolution stereo algorithm that can be implemented on commodity graphics hardware. A paper and a live demo appeared in CVPR 2003.

 


Real-time View Synthesis on Graphics Hardware

We present a novel use of commodity graphics hardware that effectively combines a plane-sweeping algorithm with view synthesis for real-time, online 3D scene acquisition and view synthesis. The heart of our method is to use programmable Pixel Shader technology to square intensity differences between reference image pixels, and then to choose final colors that correspond to the minimum difference, i.e., the most consistent color. We filed an invention disclosure with UNC.
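
A CPU sketch of the plane-sweep idea (the real system runs in a pixel shader; the homography warp below is a standard formulation under the assumed convention X_src = R·X_ref + t, not necessarily the exact implementation):

import numpy as np
import cv2

def plane_sweep(ref_img, src_img, K, R, t, depths):
    """Warp the source view onto each fronto-parallel plane z = d via the
    plane-induced homography H = K (R + t n^T / d) K^-1, score per-pixel
    squared color differences, and keep the color at the best depth."""
    h, w = ref_img.shape[:2]
    best_err = np.full((h, w), np.inf)
    best_col = np.zeros_like(ref_img)
    n = np.array([[0.0, 0.0, 1.0]])  # plane normal (row vector), n . X = d
    for d in depths:
        H = K @ (R + (t.reshape(3, 1) @ n) / d) @ np.linalg.inv(K)
        warped = cv2.warpPerspective(src_img, H, (w, h))
        err = np.sum((warped.astype(np.float32) - ref_img.astype(np.float32)) ** 2, axis=2)
        mask = err < best_err
        best_err[mask] = err[mask]
        best_col[mask] = warped[mask]  # the most consistent color wins
    return best_col, best_err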

Internship at Microsoft Research 
(Mentor: Zhengyou Zhang), Summer 2001


 

Eye-Gaze Correction

Dr. Yang's internship at Microsoft Research (MSR) during summer 2001 focused on maintaining eye contact in desktop video teleconferencing. The team took a model-based approach that incorporates a detailed, individualized three-dimensional head model with stereoscopic analysis. This approach is very effective, achieving arguably the most realistic eye-gaze correction results in the published literature. In the process, it also produces very accurate 3D tracking of the head pose. The images show the face model projected onto the tracked head. MSR has filed two patent applications for the algorithms and systems.

 

Large Format Display (2000-2001)


 

PixelFlex: A Reconfigurable Multi-Projector Display System

The PixelFlex system is composed of ceiling-mounted projectors, each with computer-controlled pan, tilt, zoom and focus; and a camera for closed-loop calibration. Working collectively, these controllable projectors function as a single logical display capable of being easily modified into a variety of spatial formats. The left image shows a stacked configuration that can be used for stereo display. 


Automatic Projector Display Surface Estimation Using Every-Day Imagery 

 

We introduce a new method for continuous display surface auto-calibration. Using a camera that observes the display surface, we match image features in whatever imagery is being projected, with the corresponding features that appear on the display surface, to continually refine an estimate for the display surface geometry. In effect we enjoy the high signal-to-noise ratio of "structured" light (without getting to choose the structure) and the unobtrusive nature of passive correlation-based methods.
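
For the special case of a planar display surface, the feature-matching step can be sketched with standard tools; this is an illustrative stand-in (function name and parameters are assumptions), not the system's actual pipeline:

import numpy as np
import cv2

def estimate_surface_mapping(projected_frame, camera_frame):
    """Match features between the content being projected and the camera's
    view of the screen, then fit a homography relating the two. For a planar
    surface this homography is the display-surface mapping; drift in it over
    time signals that the calibration estimate needs refinement."""
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(projected_frame, None)  # content we sent
    kp2, des2 = orb.detectAndCompute(camera_frame, None)     # what the camera sees
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H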

Tele-Immersion  (1998-current)

 


Group Teleconferencing

We want to design a system that facilitates many-to-many teleconferencing. Instead of providing a perceptively correct view for every single user, we strive to provide the best approximating view for the entire group as a whole. We demonstrate two real-time acquisition-through-rendering algorithms: one is based on view-dependent texture mapping with automatically acquired approximate geometry, and the other uses an array of cameras to perform Light Field-style rendering.

 


 

3D Tele-Immersion

The goal of Tele-Immersion is to enable users at geographically distributed sites to collaborate in real time in a shared, simulated environment as if they were in the same physical room. While the entire project was an interdisciplinary, multi-site collaboration, Dr. Yang was mainly involved in real-time data capture and distribution.

 


 

 

2D Immersive Teleconferencing

We worked on improving the field of view and resolution for 2D video teleconferencing. The result is a simple, yet effective technique for producing geometrically correct imagery for teleconferencing environments. The necessary image transformations are derived by finding a direct one-to-one mapping between a capture device and a display device for a fixed viewer location, thus completely avoiding the need for any intermediate, complex representations of screen geometry, capture and display distortions, and viewer location. Using this technique, we can easily build an immersive teleconferencing system using multiple projectors and cameras. 

 

 

Geometrically Correct Imagery for Teleconferencing

 

 

 

Multi-Projector Displays Using Camera-Based Registration

 

 

 

 

 

 

___________________________________________________________________________________________________________________________

GRAVITY at VIS.UKY.EDU

1 Quality Street, Suite 800

Lexington, KY 40507

859-257-1257

 
