Monocular 3D Human Body Reconstruction Towards Depth Augmentation of Television Sequences


This paper addresses the reconstruction of 3D human body models from 2D video sequences. Considering that the input frames are already segmented, the proposed technique consists of three stages. These stages are independently applied over each segmented fr

MONOCULAR 3D HUMAN BODY RECONSTRUCTION TOWARDS

DEPTH AUGMENTATION OF TELEVISION SEQUENCES

Angel Sappa

Niki Aifanti

Sotiris Malassiotis

Michael G. Strintzis

Informatics & Telematics Institute, 1st Km Thermi-Panorama Road, Thermi-Thessaloniki, Greece

{angel.sappa@iti.gr}

ABSTRACT

This paper addresses the reconstruction of 3D human body models from 2D video sequences. Considering that the input frames are already segmented, the proposed technique consists of three stages. These stages are independently applied over each segmented frame. Firstly, a skeleton of a human figure obtained from the segmented image is extracted by means of a fast algorithm based on a Voronoi diagram of the boundary points. Afterwards, the skeleton is labelled according to the human body parts (e.g. head, upper arm, lower arm, torso, etc.). Secondly, an initial 3D model posture is estimated from the labelled skeleton. Finally, an iterative closest point (ICP) implementation is used to refine the initial model posture by maximizing the similarity between the projected 3D model and the segmented image. Experimental results with video sequences are presented.

1. INTRODUCTION

The use of 3D Human Body Models (HBM) is experiencing a continuous and accelerated growth. This is partly due to the increasing demand for more realistic representations from the computer graphics and computer vision communities. Computer graphics pursues a realistic modelling of both the human body geometry and its associated motion. Applications such as games, virtual reality or animations demand highly realistic models. Computer vision, in contrast, seeks an efficient and accurate model for applications such as intelligent video surveillance, motion analysis, telepresence and human-machine interfaces.

The current work is focused on the generation of 3D HBMs within the computer vision field. In other words, the objective is to extract 3D HBMs from the information provided by a steady single camera. The target application is depth augmentation of common television sequences for future 3D displays [1].

Due to its widespread interest, there has been an abundance of work on vision-based human body model reconstruction in recent years; however, in spite of all the effort it is still an open research area with a lot of work to be done. Recovering the shape and the pose of the human body from only one point of view is an ill-posed problem due to self-occlusions and motion ambiguities. In spite of these difficulties, 3D human body reconstruction from 2D images has been addressed by many researchers. In the early eighties, [2] proposed a model-based technique to compute a synthetic 3D model by using monocular images. This technique extracts pairs of parallel lines in a segmented real image and matches them with the legs of a projected 3D model.

Other model-based approaches, using monocular perception systems, have been recently proposed by [3] and [4]. In [3] the problem of human arm modelling is addressed, while [4] tackles full body modelling. The latter is based on maximizing the joint probability density function of the position and velocity of the body parts. The drawback of this approach is the requirement of markers (light bulbs strapped to the body joints) for facilitating the image analysis. In [5] a probabilistic approach is introduced for modelling 3D human motion for synthesis and tracking. The goal of this technique is to predict the 3D pose by using the observed motion history. Although the obtained results are quite promising, the aforementioned techniques are computationally expensive or need some kind of learning/training process. In [5], for example, a large database with different body postures is required.

Unlike the previous approaches, in the current work 3D human body postures are estimated by explicitly using the information extracted from the 2D video sequence instead of relying on probabilistic methods. Assuming a segmented image is given as an input, the proposed technique consists of three stages. Firstly, a human body skeleton of the given segmented image is extracted. Secondly, the skeleton posture is used to initialize a 3D model of a human body. Finally, a 2D projection of the initial 3D model is registered with the original image by using the iterative closest point algorithm.

The outline of this paper is as follows. Section 2 addresses the skeleton extraction stage. Section 3 introduces the 3D model used and describes the initialization of the posture parameters. Section 4 presents the approach used for registering the computed model with the original image. Section 5 presents experimental results obtained with a video sequence. Finally, conclusions and further improvements are given in Section 6.

This work has been carried out as part of the ATTEST project (Advanced Three-dimensional TElevision System Technologies, IST-2001-34396). The first author has been supported by the EU under a Marie Curie Post-Doctoral Fellowship (HPMD-GH-01-00086-01).

Figure 1. (top-left) Boundary points extracted from the segmented input image. (top-right) Constrained Delaunay triangulation of a subset of boundary points; only one out of twenty points was considered. (bottom-left) Skeleton computed from the Voronoi diagram. (bottom-right) Labelled body parts.

2. SKELETON EXTRACTION

Given a segmented image as an input (segmented images were computed by using the techniques proposed in [6] and [7]), the skeleton of the contained human figure is extracted. After implementing and comparing different options, an algorithm based on a Voronoi diagram was chosen. The proposed technique consists of the following steps. Firstly, the boundary points of the segmented image are extracted (see Fig. 1(top-left)) and triangulated by means of a constrained Delaunay algorithm [8]. The constraint is used to enforce a triangulation inside the polyline defined by linking consecutive boundary points. In order to reduce the CPU time, only a subset of the boundary points is considered (the points are arranged in a list, and in the current implementation only one out of twenty points was used). Fig. 1(top-right) illustrates the triangular mesh obtained by using a subset of the points presented in Fig. 1(top-left). Afterwards, the corresponding Voronoi diagram is extracted from the obtained triangular mesh and used to define the skeleton [9] (see Fig. 1(bottom-left)). Finally, the computed skeleton is labelled according to the different parts of the human body (i.e. legs, torso, arms and head). The implemented heuristic consists in labelling the lowest point as a member of one of the legs and going up until a bifurcation is reached. Every time a bifurcation is reached, a new member of the body part is labelled. Fig. 1(bottom-right) presents the skeleton extracted by using the Voronoi diagram and the corresponding set of segments defining the body parts. The torso and head were represented by a single segment each, while the legs and arms were represented by two segments each in order to preserve the human body anatomy.

Figure 2. Illustration of a 22 DOF model built with superquadric
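
The bottom-up labelling heuristic of Section 2 can be illustrated with a minimal sketch. It assumes that the skeleton is available as a tree-like graph of 2D points in image coordinates (y growing downwards); the function name `split_into_branches` and the stick-figure data are hypothetical and are not taken from the implementation described here. The sketch only splits the skeleton into branches at bifurcations; mapping branches to body part names would need further rules that the text does not detail.

```python
from collections import defaultdict


def split_into_branches(points, edges):
    """Split a tree-like skeleton graph into branches, bottom-up.

    points: dict node_id -> (x, y) in image coordinates (y grows downwards).
    edges:  iterable of (node_id, node_id) skeleton segments.
    Returns a list of branches; each branch is a list of node ids running
    from one end point or bifurcation to the next one.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # The lowest image point (largest y) is taken as a member of a leg.
    start = max(points, key=lambda n: points[n][1])

    branches, visited, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        for nxt in sorted(neighbours[node]):
            if frozenset((node, nxt)) in visited:
                continue
            branch, prev, cur = [node], node, nxt
            while True:
                visited.add(frozenset((prev, cur)))
                branch.append(cur)
                # Stop at an end point or bifurcation (degree != 2).
                if len(neighbours[cur]) != 2 or cur == node:
                    break
                prev, cur = cur, next(n for n in neighbours[cur] if n != prev)
            branches.append(branch)
            stack.append(cur)  # a new branch starts at every bifurcation
    return branches


# Hypothetical stick figure: feet 0 and 2, hip 1, neck 3, hands 4 and 5, head 6.
points = {0: (2, 10), 1: (3, 6), 2: (4, 9.5), 3: (3, 2),
          4: (0, 3), 5: (6, 3), 6: (3, 0)}
edges = [(0, 1), (2, 1), (1, 3), (3, 4), (3, 5), (3, 6)]
branches = split_into_branches(points, edges)
```

For this stick figure the first branch runs from the lowest foot to the hip, and new branches are opened at the hip and at the neck.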

3. 3D BODY MODELLING

Superquadrics are a family of parametric shapes capable of modelling a large set of blob-like objects, such as spheres, cylinders, parallelepipeds and shapes in between [10]. A superquadric surface is given by the following parametric equation:

$$ x(\theta,\phi) = \begin{bmatrix} \alpha_1 \cos^{\varepsilon_1}(\theta)\cos^{\varepsilon_2}(\phi) \\ \alpha_2 \cos^{\varepsilon_1}(\theta)\sin^{\varepsilon_2}(\phi) \\ \alpha_3 \sin^{\varepsilon_1}(\theta) \end{bmatrix} \tag{1} $$

where $-\pi/2 \le \theta \le \pi/2$ and $-\pi \le \phi < \pi$. The parameters $\alpha_1$, $\alpha_2$ and $\alpha_3$ define the size of the superquadric along the x, y and z axis respectively, while $\varepsilon_1$ is the squareness parameter in the latitude plane and $\varepsilon_2$ is the squareness parameter in the longitudinal plane. Furthermore, superquadric shapes can be deformed with tapering, bending and cavities. In our model, the different body parts are represented with superquadrics tapered along the y-axis. The parametric equation is then written as:

$$ x'(\theta,\phi) = \begin{bmatrix} \left(\frac{t_1}{\alpha_2} x_2 + 1\right) x_1 \\ x_2 \\ \left(\frac{t_3}{\alpha_2} x_2 + 1\right) x_3 \end{bmatrix} \tag{2} $$

where $-1 \le t_1, t_3 \le 1$ are the tapering parameters and $x_1$, $x_2$ and $x_3$ are the elements of the vector $x$ in equation (1). The parameters $\alpha_1$, $\alpha_2$ and $\alpha_3$ were defined in each body part according to anthropometric measurements. An example can be seen in Fig. 2.
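
As an illustration of equations (1) and (2), the following minimal sketch evaluates one surface point of a superquadric and then tapers it along the y-axis. The function names are hypothetical. The signed power used for the trigonometric terms is a common convention for keeping fractional exponents real-valued and is an assumption here, as is the use of $\alpha_2$ as the normalizing size in the tapering terms.

```python
import math


def signed_pow(base, exponent):
    """sign(base) * |base| ** exponent (assumed convention, keeps values real)."""
    return math.copysign(abs(base) ** exponent, base)


def superquadric_point(theta, phi, alpha, eps):
    """Surface point of equation (1).

    alpha = (alpha1, alpha2, alpha3): sizes along x, y and z.
    eps   = (eps1, eps2): squareness in latitude and longitude.
    """
    a1, a2, a3 = alpha
    e1, e2 = eps
    cos_theta = signed_pow(math.cos(theta), e1)
    x1 = a1 * cos_theta * signed_pow(math.cos(phi), e2)
    x2 = a2 * cos_theta * signed_pow(math.sin(phi), e2)
    x3 = a3 * signed_pow(math.sin(theta), e1)
    return (x1, x2, x3)


def taper_along_y(point, alpha2, t1, t3):
    """Tapering of equation (2): x and z are scaled linearly with y."""
    x1, x2, x3 = point
    return ((t1 / alpha2 * x2 + 1.0) * x1,
            x2,
            (t3 / alpha2 * x2 + 1.0) * x3)


# With eps1 = eps2 = 1 the superquadric reduces to an ellipsoid.
p = superquadric_point(0.0, 0.0, (1.0, 2.0, 3.0), (1.0, 1.0))
tapered = taper_along_y((1.0, 2.0, 3.0), 2.0, 0.5, -0.5)
```

Sampling `theta` and `phi` over their ranges and tapering every point yields a mesh for one body part; values of `eps1` and `eps2` below 1 give progressively more box-like shapes.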

The model used throughout the current implementation has 22 DOFs: four for each arm and leg, three for the torso and three for the head. We did not assign any DOFs to the palms or the feet for simplicity. The movements of the limbs are based on a hierarchical approach (the torso is considered the root) using Euler angles. The body posture is synthesized by concatenating the transformation matrices associated with the joints, starting from the root. Kinematic constraints have also been introduced in order to generate a realistic 3D model.
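
The hierarchical concatenation of joint transformations can be sketched for a single arm as follows. This is only an illustration under stated assumptions: the Z-Y-X Euler order, the split of the four arm DOFs into three shoulder angles plus one elbow angle, the rest direction of the limb (along the negative y-axis) and all names are hypothetical choices, not details given in the text.

```python
import math


def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotate(R, v):
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]


def euler_zyx(rz, ry, rx):
    """Rotation from three Euler angles (assumed Z-Y-X order)."""
    return matmul(rot_z(rz), matmul(rot_y(ry), rot_x(rx)))


# DOF layout stated in the text: 4 per arm and leg, 3 for torso and head.
DOFS_PER_PART = {"torso": 3, "head": 3, "left_arm": 4, "right_arm": 4,
                 "left_leg": 4, "right_leg": 4}


def arm_joints(torso_angles, shoulder_offset, shoulder_angles, elbow_angle,
               upper_len, lower_len):
    """Shoulder, elbow and wrist positions relative to the torso origin."""
    R_torso = euler_zyx(*torso_angles)                      # root
    shoulder = rotate(R_torso, shoulder_offset)
    R_upper = matmul(R_torso, euler_zyx(*shoulder_angles))  # root * shoulder
    elbow = [s + d for s, d in
             zip(shoulder, rotate(R_upper, [0.0, -upper_len, 0.0]))]
    R_lower = matmul(R_upper, rot_x(elbow_angle))           # ... * elbow
    wrist = [e + d for e, d in
             zip(elbow, rotate(R_lower, [0.0, -lower_len, 0.0]))]
    return shoulder, elbow, wrist
```

Because each child rotation is multiplied onto its parent, changing a torso angle moves the whole arm, while changing the elbow angle moves only the lower arm.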

The unknown model parameters, which are the DOFs, are initially estimated using information extracted from the labelled skeleton. More precisely, the orientation and the length of the skeleton's labelled parts define the initial values of the DOFs, providing an initial human body posture. If some skeleton parts are missing, due to occlusions or segmentation failure, the algorithm uses temporal continuity: the initial posture is the one computed in the previous frame, updated with a displacement estimated from the three preceding frames.
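
A minimal sketch of this initialization is given below. The text does not specify how the displacement is estimated from the three preceding frames; averaging the two frame-to-frame differences is an assumption made here purely for illustration, and both function names are hypothetical.

```python
import math


def segment_orientation_and_length(p_start, p_end):
    """In-plane orientation (radians) and length of a labelled skeleton segment."""
    dx = p_end[0] - p_start[0]
    dy = p_end[1] - p_start[1]
    return math.atan2(dy, dx), math.hypot(dx, dy)


def predict_posture(previous_postures):
    """Fallback when skeleton parts are missing.

    previous_postures: the three most recent DOF vectors, oldest first.
    Returns the latest posture plus the mean frame-to-frame displacement
    (assumed estimator; the mean of the two differences equals
    (latest - oldest) / 2).
    """
    if len(previous_postures) != 3:
        raise ValueError("expected exactly three previous postures")
    oldest, _, latest = previous_postures
    return [l + (l - o) / 2.0 for o, l in zip(oldest, latest)]


angle, length = segment_orientation_and_length((0.0, 0.0), (3.0, 4.0))
predicted = predict_posture([[0.0, 1.0], [0.1, 1.0], [0.3, 1.0]])
```

In this example the first DOF has been increasing, so the prediction extrapolates it, while the second DOF is left unchanged.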

4. 3D POSTURE ESTIMATION

This section describes the technique used to compute a 3D human body posture from a 2D projection. This technique is based on the Iterative Closest Point (ICP) algorithm (originally proposed in [11]), which starts from the initial human body posture described in the previous section. The aim of the algorithm is to establish registration between the edge points (the boundary points of Section 2) extracted from the segmentation and the projected occluding boundary points of the 3D model.

In order to obtain the projected occluding boundary points, the normals of each point on the superquadric surfaces are calculated. The dot product between the normal vectors and the viewing direction is then obtained. The sign of the dot product indicates whether the point lies on the front or the back surface. After eliminating the back-facing polygons, the occluding boundary vertices can be specified. The projection of the occluding boundary is subsequently obtained by projecting the boundary points onto the image plane.
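
The front/back test based on the sign of the dot product can be sketched on a small polygonal mesh. The text does not say how the occluding boundary is derived once the back-facing polygons are removed; taking the edges shared by one front-facing and one back-facing polygon is a common choice and is assumed here. The function name and the cube data are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def occluding_edges(faces, normals, view_dir):
    """Edges lying between a front-facing and a back-facing polygon.

    faces:    list of tuples of vertex indices.
    normals:  outward normal of each face.
    view_dir: direction from the camera into the scene.
    """
    # A face is front-facing when its normal points against the view direction.
    front = [dot(n, view_dir) < 0 for n in normals]
    edge_faces = {}
    for f_idx, face in enumerate(faces):
        for k in range(len(face)):
            edge = tuple(sorted((face[k], face[(k + 1) % len(face)])))
            edge_faces.setdefault(edge, []).append(f_idx)
    return {edge for edge, adj in edge_faces.items()
            if len(adj) == 2 and front[adj[0]] != front[adj[1]]}


# Unit cube: vertices 0-3 at z = 0 and 4-7 at z = 1, camera looking down -z.
cube_faces = [(0, 1, 2, 3), (4, 5, 6, 7), (0, 1, 5, 4),
              (3, 2, 6, 7), (0, 3, 7, 4), (1, 2, 6, 5)]
cube_normals = [(0, 0, -1), (0, 0, 1), (0, -1, 0),
                (0, 1, 0), (-1, 0, 0), (1, 0, 0)]
silhouette = occluding_edges(cube_faces, cube_normals, (0, 0, -1))
```

Seen from above, only the top face of the cube is front-facing, so the occluding boundary consists of its four edges.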

The relative position and scale of the human figure are obtained from the segmented image. Consequently, the parameters computed by the ICP algorithm are the DOFs of the model, $\omega = [\omega_1, \omega_2, \ldots, \omega_{22}]$. The objective is to estimate the model parameters that align the projected occluding boundary points with the extracted edge points. This is achieved by means of an iterative energy minimization technique. The computation of the distance between the projected occluding boundary points $B_i(\omega)$ ($i = 1, \ldots, N$) and the edge points $E_j$ ($j = 1, \ldots, M$) is based on an approximate though efficient technique. According to this technique, the distance of an edge point $E_j$ from the occluding boundary may be approximated by $\|E_j - B_i(\omega)\|$, where $B_i(\omega)$ is the occluding boundary point closest to $E_j$, found by means of a tree search algorithm. Thus, the estimation of the model parameters is achieved by the minimization of the following function:

$$ D(\omega) = \sum_{j=1}^{M} w_j \, \| E_j - B_{E_j}(\omega) \|^2 \tag{3} $$

where $w_j$ is a weighting factor and $B_{E_j}(\omega)$ denotes the occluding boundary point which was found to be the closest to $E_j$. The minimization process is then performed by means of the Levenberg-Marquardt non-linear least squares technique.
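
For a fixed set of model parameters, the value of function (3) can be evaluated as in the following sketch. A brute-force nearest-neighbour search replaces the tree search mentioned in the text so that the example stays self-contained, and the function names are hypothetical. The minimization itself (Levenberg-Marquardt) is not shown.

```python
def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def closest_boundary_point(edge_point, boundary_points):
    """Brute-force stand-in for the tree search: O(N) per edge point."""
    return min(boundary_points, key=lambda b: squared_distance(edge_point, b))


def registration_energy(edge_points, boundary_points, weights):
    """Weighted sum of squared closest-point distances, as in function (3).

    edge_points:     E_j, extracted from the segmented image.
    boundary_points: B_i(omega), projected for the current parameters omega.
    weights:         w_j, one per edge point.
    """
    total = 0.0
    for e, w in zip(edge_points, weights):
        b = closest_boundary_point(e, boundary_points)
        total += w * squared_distance(e, b)
    return total


edge_points = [(0.0, 0.0), (1.0, 0.0)]
boundary_points = [(0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
energy = registration_energy(edge_points, boundary_points, [1.0, 1.0])
```

An optimizer would re-project the boundary points for each candidate parameter vector, recompute this value (or the individual residuals) and update the parameters until the energy stops decreasing.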

The use of the weighting factors $w_j$ aims at limiting the effect of outliers on the estimation process. The weighting factors are calculated according to the following:

$$ w_j = \begin{cases} 1 & d_j < \sigma \\ \sigma / d_j & \sigma < d_j < 3\sigma \\ 0 & d_j > 3\sigma \end{cases} \tag{4} $$

where $d_j$ is the residual fitting error for point $E_j$ and $\sigma$ the error variance.
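
The weighting scheme of equation (4) translates directly into a small function. Equation (4) leaves the boundary cases $d_j = \sigma$ and $d_j = 3\sigma$ unspecified; assigning them to the middle branch is an assumption of this sketch, and the function name is hypothetical.

```python
def outlier_weight(d, sigma):
    """Weight of one edge point given its residual d, following equation (4)."""
    if d < sigma:
        return 1.0           # inlier: full weight
    if d <= 3.0 * sigma:
        return sigma / d     # transition zone: weight decays with the residual
    return 0.0               # outlier: ignored


weights = [outlier_weight(d, 1.0) for d in (0.5, 2.0, 4.0)]
```

With `sigma = 1.0`, residuals of 0.5, 2.0 and 4.0 receive weights of 1.0, 0.5 and 0.0 respectively, so distant edge points no longer pull on the estimate.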

Since the number of DOFs is significant, if the algorithm is applied to the whole body at the same time, it may easily be trapped in a local minimum. For this reason, several groups of body parts are processed separately. At the beginning, the abduction parameters of the head and torso are estimated. In the next step, the edges close to the boundary projection of the head and torso are excluded. Thus, only the edges associated with the arms and legs are left. We use prior knowledge of the human body geometry to assign the remaining edge pixels to arms or legs. Then, the ICP algorithm is applied independently to each labelled group of points (e.g. left arm, right arm).
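
The grouping step can be illustrated as follows. The text does not state which geometric prior is used to assign edge pixels to limbs; the nearest-anchor rule below, the exclusion radius and all names are hypothetical stand-ins chosen only to make the idea concrete.

```python
import math


def split_edges_by_group(edge_points, trunk_boundary, limb_anchors,
                         exclusion_radius):
    """Drop edges explained by the head/torso, then group the rest by limb.

    trunk_boundary: projected boundary points of the fitted head and torso.
    limb_anchors:   dict limb name -> 2D reference point (assumed prior,
                    e.g. taken from the initial posture).
    """
    groups = {name: [] for name in limb_anchors}
    for e in edge_points:
        # Edges close to the head/torso projection are excluded.
        if any(math.dist(e, b) <= exclusion_radius for b in trunk_boundary):
            continue
        nearest = min(limb_anchors,
                      key=lambda name: math.dist(e, limb_anchors[name]))
        groups[nearest].append(e)
    return groups


groups = split_edges_by_group(
    edge_points=[(0.0, 0.0), (5.0, 0.0), (-5.0, 0.0)],
    trunk_boundary=[(0.0, 0.5)],
    limb_anchors={"left_arm": (-4.0, 0.0), "right_arm": (4.0, 0.0)},
    exclusion_radius=1.0,
)
```

Each resulting group can then be registered on its own, which keeps the number of parameters per minimization small.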


Figure 3. (top) Original segmented images used as an input for the algorithm; the projected occluding boundary points of the 3D model are represented in black. (bottom) The corresponding 3D models generated by using the proposed technique. Minor misalignments are due to posture ambiguities.

5. EXPERIMENTAL RESULTS

The proposed technique has been tested with several segmented video sequences generated using [6] and [7]. The average CPU time (Pentium III, 1 GHz processor) to compute and label the skeleton of the human body was 0.1 sec. per frame. This labelled skeleton is used to estimate a coarse 3D posture, which is improved by the registration of the projected occluding boundary points with the edge points extracted from the segmented image. The average CPU time to register the different projected body parts was 2 sec. per frame. Fig. 3(top) presents a set of four frames used as an input for the proposed technique. The final projected occluding boundary points, computed with ICP, are represented in black. The obtained 3D models are illustrated in Fig. 3(bottom).

6. CONCLUSIONS AND FURTHER WORK

A new technique to generate 3D models of human bodies from 2D video sequences has been presented. It is based on the processing of single frames, avoiding expensive probabilistic approaches and learning problems. The proposed technique consists of three stages. Firstly, a skeleton of a human body figure is extracted and labelled. Next, the posture of a 3D model is estimated by using the aforementioned skeleton, and finally a registration algorithm is used to tune the parameters of the model (DOFs).

Further work will include the prediction of human body posture using consecutive frames. The initial human body posture estimation could be improved, and the CPU time reduced, by using the posture of a previous frame and by studying the history of the articulations' movement.

7. REFERENCES

[1] A. Redert, et al., ATTEST: Advanced Three-dimensional TElevision System Technologies, 3DPVT'02, Padova, Italy, June 2002.

[2] D. Hogg, Model-Based Vision: A Program to See a Walking Person, Image and Vision Computing, 1(1), February 1983.

[3] L. Goncalves, E. Di Bernardo, E. Ursella and P. Perona, Monocular tracking of the human arm in 3D, IEEE Int. Conf. on Computer Vision, 1995.

[4] Y. Song, L. Goncalves, E. Di Bernardo and P. Perona, Monocular Perception of Biological Motion - Detection and Labeling, IEEE Int. Conf. on Computer Vision and Pattern Recognition, Fort Collins, USA, 1999.

[5] H. Sidenbladh, M. Black and L. Sigal, Implicit Probabilistic Models of Human Motion for Synthesis and Tracking, European Conf. on Computer Vision, Copenhagen, Denmark, 2002.

[6] F. Ernst, P. Wilinski and K. Van Overveld, Dense Structure-from-Motion: An Approach Based on Segment Matching, ECCV 2001.

[7] S. Jabri, Z. Duric, H. Wechsler and A. Rosenfeld, Detection and Location of People in Video Images Using Adaptive Fusion of Color and Edge Information, 15th Int. Conf. on Pattern Recognition, 2000.

[8] O. Faugeras, Three-Dimensional Computer Vision, The MIT Press, 1993.

[9] Sing-Tze Bow, Pattern Recognition and Image Preprocessing, Marcel Dekker, Inc., 1992.

[10] F. Solina and R. Bajcsy, Recovery of Parametric Models from Range Images: The Case for Superquadrics with Global Deformations, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 12, No. 2, February 1990.

[11] P. Besl and N. McKay, A Method for Registration of 3-D Shapes, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 14, No. 2, February 1992.
