Monocular 3D Human Body Reconstruction Towards Depth Augmentation of Television Sequences
Angel Sappa
Niki Aifanti
Sotiris Malassiotis
Michael G. Strintzis
Informatics & Telematics Institute
1st Km Thermi-Panorama Road
Thermi-Thessaloniki, Greece
{angel.sappa@iti.gr}
ABSTRACT
This paper addresses the reconstruction of 3D human body models from 2D video sequences. Assuming that the input frames are already segmented, the proposed technique consists of three stages, which are applied independently to each segmented frame. Firstly, a skeleton of the human figure is extracted from the segmented image by means of a fast algorithm based on a Voronoi diagram of the boundary points; the skeleton is then labelled according to the human body parts (e.g. head, upper arm, lower arm, torso, etc.). Secondly, an initial 3D model posture is estimated from the labelled skeleton. Finally, an iterative closest point (ICP) implementation is used to refine the initial model posture by maximizing the similarity between the projected 3D model and the segmented image. Experimental results with video sequences are presented.
1. INTRODUCTION
The use of 3D Human Body Models (HBM) is experiencing continuous and accelerated growth. This is partly due to the increasing demand for more realistic representations from the computer graphics and computer vision communities. Computer graphics pursues a realistic modelling of both the human body geometry and its associated motion. Applications such as games, virtual reality or animation demand highly realistic models. In contrast, computer vision seeks an efficient and accurate model for applications such as intelligent video surveillance, motion analysis, telepresence and human-machine interfaces.
The current work is focused on the generation of 3D HBMs within the computer vision field. In other words, the objective is to extract 3D HBMs from the information provided by a single stationary camera. The target application is depth augmentation of common television sequences for future 3D displays [1].
Due to its widespread interest, there has been an abundance of work on vision-based human body model reconstruction in recent years; however, in spite of all the effort, it is still an open research area with a lot of work to be done. Recovering the shape and the pose of the human body from only one point of view is an ill-posed problem due to self-occlusions and motion ambiguities. In spite of these difficulties, 3D human body reconstruction from 2D images has been addressed by many researchers. In the early eighties, [2] proposed a model-based technique to compute a synthetic 3D model by using monocular images. This technique extracts pairs of parallel lines in a segmented real image and matches them with the legs of a projected 3D model.
Other model-based approaches, using monocular perception systems, have been recently proposed by [3] and [4]. In [3] the problem of human arm modelling is addressed, while [4] tackles full body modelling. It is based on maximizing the joint probability density function of the position and velocity of the body parts. The drawback of this approach is the requirement of markers (light bulbs strapped to the body joints) to facilitate the image analysis. In [5] a probabilistic approach is introduced for modelling 3D human motion for synthesis and tracking. The goal of this technique is to predict the 3D pose by using the observed motion history. Although the obtained results are quite promising, the aforementioned techniques are computationally expensive or need some kind of learning/training process. In [5], for example, a large database with different body postures is required.
Unlike the previous approaches, in the current work 3D human body postures are estimated by explicitly using the information extracted from the 2D video sequence instead of relying on probabilistic methods. Assuming a segmented image is given as an input, the proposed technique consists of three stages. Firstly, a human body skeleton is extracted from the given segmented image. Secondly, the skeleton posture is used to initialize a 3D model of a human body. Finally, a 2D projection of the initial 3D model is registered with the original image by using the iterative closest point algorithm.
The outline of this paper is as follows. Section 2 addresses the skeleton extraction stage. Section 3 introduces the 3D model used and describes the initialization of the posture parameters. Section 4 presents the approach used for registering the computed model with the original image. Section 5 presents experimental results obtained with a video sequence. Finally, conclusions and further improvements are given in section 6.
This work has been carried out as part of the ATTEST project (Advanced Three-dimensional TElevision System Technologies, IST-2001-34396). The first author has been supported by the EU under a Marie Curie Post-Doctoral Fellowship (HPMD-GH-01-00086-01).
Figure 1. (top-left) Boundary points extracted from the segmented input image. (top-right) Constrained Delaunay triangulation of a subset of boundary points (only one out of twenty points was considered). (bottom-left) Skeleton computed from the Voronoi diagram. (bottom-right) Labelled body parts.
2. SKELETON EXTRACTION
Given a segmented image as an input (segmented images were computed by using the techniques proposed in [6] and [7]), the skeleton of the contained human figure is extracted. After implementing and comparing different options, an algorithm based on a Voronoi diagram was chosen. The proposed technique consists of the following steps. Firstly, the boundary points of the segmented image are extracted (see Fig. 1(top-left)) and triangulated by means of a constrained Delaunay algorithm [8]. The constraint is used to enforce a triangulation inside the polyline defined by linking consecutive boundary points. In order to reduce the CPU time, only a subset of the boundary points is considered (the points are arranged in a list, and in the current implementation only one out of twenty points was used). Fig. 1(top-right) illustrates the triangular mesh obtained by using a subset of the points presented in Fig. 1(top-left). Afterwards, the corresponding Voronoi diagram is extracted from the obtained triangular mesh and used to define the skeleton [9] (see Fig. 1(bottom-left)). Finally, the computed skeleton is labelled according to the different parts of the human body (i.e. legs, torso, arms and head). The implemented heuristic consists of labelling the lowest point as a member of one of the legs and moving upwards until a bifurcation is reached. Every time a bifurcation is reached, a new body part is labelled. Fig. 1(bottom-right) presents the skeleton extracted by using the Voronoi diagram and the corresponding set of segments defining the body parts. The torso and head were each represented by a single segment, while the legs and arms were each represented by two segments in order to preserve the human body anatomy.
Figure 2. Illustration of a 22 DOF model built with superquadric
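The boundary subsampling and the first step of the labelling heuristic described in Section 2 can be sketched as follows. This is only an illustration under stated assumptions: the skeleton is assumed to be already available as a graph (a list of edges plus node coordinates, a hypothetical input format), image coordinates are assumed to have y growing downwards, and the function names are not taken from the paper.

```python
from collections import defaultdict


def subsample(boundary, step=20):
    """Keep one boundary point out of every `step` (the paper uses 20)."""
    return boundary[::step]


def first_limb_segment(edges, coords):
    """Walk up from the lowest skeleton point until a bifurcation is reached.

    edges  : list of (node_a, node_b) skeleton edges (hypothetical format)
    coords : dict node -> (x, y), y growing downwards (image convention)
    Returns the ordered nodes of the first labelled leg segment,
    ending at the bifurcation (or at a dead end).
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # The lowest point in the image has the largest y value.
    current = max(coords, key=lambda n: coords[n][1])
    path, visited, previous = [current], {current}, None
    while True:
        candidates = [n for n in neighbours[current] if n != previous]
        # Stop at a dead end, at a bifurcation, or if a cycle is detected.
        if len(candidates) != 1 or candidates[0] in visited:
            break
        previous, current = current, candidates[0]
        path.append(current)
        visited.add(current)
    return path


# Toy skeleton: foot(0) - knee(1) - hip(2), with the hip branching
# towards the other leg (3) and the torso (4).
toy_edges = [(0, 1), (1, 2), (2, 3), (2, 4)]
toy_coords = {0: (0, 10), 1: (0, 7), 2: (1, 4), 3: (2, 9), 4: (1, 1)}
leg = first_limb_segment(toy_edges, toy_coords)
```

Repeating the same walk from each branch of a bifurcation would label the remaining parts; the Voronoi-based skeleton computation itself is not reproduced here.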
3. 3D BODY MODELLING
Superquadrics are a family of parametric shapes capable of modelling a large set of blob-like objects, such as spheres, cylinders, parallelepipeds and shapes in between [10]. A superquadric surface is given by the following parametric equation:

$$
\mathbf{x}(\theta,\varphi)=
\begin{bmatrix}
\alpha_1\cos^{\varepsilon_1}(\theta)\cos^{\varepsilon_2}(\varphi)\\
\alpha_2\cos^{\varepsilon_1}(\theta)\sin^{\varepsilon_2}(\varphi)\\
\alpha_3\sin^{\varepsilon_1}(\theta)
\end{bmatrix}
\tag{1}
$$

where $-\pi/2\le\theta\le\pi/2$ and $-\pi\le\varphi<\pi$. The parameters $\alpha_1$, $\alpha_2$ and $\alpha_3$ define the size of the superquadric along the x, y and z axis respectively, while $\varepsilon_1$ is the squareness parameter in the latitude plane and $\varepsilon_2$ is the squareness parameter in the longitudinal plane. Furthermore, superquadric shapes can be deformed with tapering, bending and cavities. In our model, the different body parts are represented with superquadrics tapered along the y-axis. The parametric equation is then written as:

$$
\mathbf{x}'(\theta,\varphi)=
\begin{bmatrix}
\left(\dfrac{t_1}{\alpha_2}x_2+1\right)x_1\\
x_2\\
\left(\dfrac{t_3}{\alpha_2}x_2+1\right)x_3
\end{bmatrix}
\tag{2}
$$

where $-1\le t_1,t_3\le 1$ are the tapering parameters and $x_1$, $x_2$ and $x_3$ are the elements of the vector $\mathbf{x}$ in equation (1). The parameters $\alpha_1$, $\alpha_2$ and $\alpha_3$ were defined for each body part according to anthropometric measurements. An example can be seen in Fig. 2.
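As an illustration of the superquadric parametrization and of tapering along the y-axis, the following sketch evaluates one surface point. It rests on assumptions that are not stated in the text: a signed power is used so that negative sines and cosines stay well defined (a common convention for superquadrics), and the tapering is taken to be the usual linear form in which the x and z coordinates are scaled by (t * x2 / alpha2 + 1). The function names are illustrative.

```python
import math


def spow(base, exponent):
    """Signed power sign(base) * |base| ** exponent (assumed convention)."""
    return math.copysign(abs(base) ** exponent, base)


def superquadric_point(theta, phi, a1, a2, a3, e1, e2):
    """Point on the superquadric surface for latitude theta, longitude phi."""
    x1 = a1 * spow(math.cos(theta), e1) * spow(math.cos(phi), e2)
    x2 = a2 * spow(math.cos(theta), e1) * spow(math.sin(phi), e2)
    x3 = a3 * spow(math.sin(theta), e1)
    return x1, x2, x3


def taper_along_y(point, a2, t1, t3):
    """Linear tapering along y: x and z shrink or grow with the y coordinate.

    t1 and t3 are the tapering parameters, both expected in [-1, 1].
    """
    x1, x2, x3 = point
    return ((t1 * x2 / a2 + 1.0) * x1, x2, (t3 * x2 / a2 + 1.0) * x3)


# With e1 = e2 = 1 the superquadric reduces to an ellipsoid.
p = superquadric_point(0.0, 0.0, a1=2.0, a2=3.0, a3=1.0, e1=1.0, e2=1.0)
tapered = taper_along_y((1.0, 3.0, 2.0), a2=3.0, t1=-0.5, t3=0.5)
```

With t1 = t3 = 0 the tapering leaves the point unchanged, which is a quick sanity check when tuning a body part.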
The model used throughout the current implementation has 22 DOFs: four for each arm and leg, three for the torso and three for the head. For simplicity, no DOFs were assigned to the palms or the feet. The movements of the limbs are based on a hierarchical approach (the torso is considered the root) using Euler angles. The body posture is synthesized by concatenating the transformation matrices associated with the joints, starting from the root. Kinematic constraints have also been introduced in order to generate a realistic 3D model.
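The concatenation of joint transformations from the root can be sketched with 4x4 homogeneous matrices. The sketch below is an assumption-laden illustration: the Euler angle order (Rz * Ry * Rx), the joint offsets and all function names are hypothetical, since the paper does not specify them.

```python
import math


def euler_rotation(rx, ry, rz):
    """3x3 rotation matrix for Euler angles, assumed order Rz * Ry * Rx."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def joint_transform(offset, rx=0.0, ry=0.0, rz=0.0):
    """Homogeneous transform of a joint: p_parent = R * p_child + offset."""
    r = euler_rotation(rx, ry, rz)
    return [r[0] + [offset[0]], r[1] + [offset[1]], r[2] + [offset[2]],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def chain(transforms):
    """Concatenate joint transforms, starting from the root."""
    world = joint_transform((0.0, 0.0, 0.0))  # identity
    for t in transforms:
        world = matmul(world, t)
    return world


def apply(transform, point):
    """Map a 3D point from the local joint frame to the root frame."""
    v = (point[0], point[1], point[2], 1.0)
    return tuple(sum(transform[i][k] * v[k] for k in range(4)) for i in range(3))


# Shoulder rotated 90 degrees about z, elbow one unit further along the arm.
shoulder = joint_transform((0.0, 0.0, 0.0), rz=math.pi / 2)
elbow = joint_transform((1.0, 0.0, 0.0))
wrist = apply(chain([shoulder, elbow]), (1.0, 0.0, 0.0))
```

Kinematic constraints could be added by clamping each angle to an allowed range before building the joint transform.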
The unknown model parameters, which are the DOFs, are initially estimated using information extracted from the labelled skeleton. More precisely, the orientation and the length of the skeleton's labelled parts define the initial values of the DOFs, providing an initial human body posture. If some skeleton parts are missing, due to occlusions or segmentation failure, the algorithm uses temporal continuity: the initial posture is based on the posture computed in the previous frame, updated with a displacement estimated by using the three preceding frames.
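The temporal-continuity fallback can be illustrated as follows. The paper does not say how the displacement is estimated from the three preceding frames; this sketch assumes the mean of the two successive frame-to-frame differences, and the function name and data layout are hypothetical.

```python
def initial_posture(history, measured):
    """Fill missing DOFs from the previous postures.

    history  : list of DOF vectors of earlier frames, most recent last
               (at least three entries are required)
    measured : DOF values derived from the labelled skeleton, with None
               wherever the corresponding skeleton part is missing
    """
    older, middle, latest = history[-3:]
    posture = []
    for k, value in enumerate(measured):
        if value is not None:
            posture.append(value)
            continue
        # Assumed estimate: average of the two successive differences.
        displacement = ((latest[k] - middle[k]) + (middle[k] - older[k])) / 2.0
        posture.append(latest[k] + displacement)
    return posture


# The first DOF is missing in the current frame and is extrapolated.
filled = initial_posture([[0.0, 10.0], [2.0, 10.0], [4.0, 10.0]], [None, 12.0])
```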
4. 3D POSTURE ESTIMATION
This section describes the technique used to compute a 3D human body posture from a 2D projection. This technique is based on the Iterative Closest Point (ICP) algorithm, originally proposed in [11], which starts from the initial human body posture described in the previous section. The aim of the algorithm is to establish registration between the edge points extracted from the segmentation (the boundary points of Section 2) and the projected occluding boundary points of the 3D model.
In order to obtain the projected occluding boundary points, the normals of each point on the superquadric surfaces are calculated. The dot product between the normal vectors and the viewing direction is then obtained. The sign of the dot product indicates whether this point lies on the front or the back surface. After eliminating the back-facing polygons, the occluding boundary vertices can be specified. The projection of the occluding boundary is subsequently obtained by projecting the boundary points onto the image plane.
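The back-face test and the selection of the occluding boundary can be sketched on a polygonal mesh. The sketch makes assumptions the text leaves open: the viewing direction points from the camera into the scene, so a polygon is treated as front-facing when the dot product is negative, and boundary edges are taken to be those belonging to exactly one front-facing polygon. The mesh format and function names are hypothetical.

```python
from collections import Counter


def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def occluding_boundary_edges(faces, normals, view_dir):
    """Edges on the outline of the front-facing part of a mesh.

    faces    : list of tuples of vertex indices
    normals  : one outward normal per face
    view_dir : viewing direction, pointing from the camera into the scene
    """
    # Eliminate back-facing polygons using the sign of the dot product.
    front = [f for f, n in zip(faces, normals) if dot(n, view_dir) < 0.0]

    # Count how many front-facing polygons share each undirected edge.
    counts = Counter()
    for face in front:
        for i in range(len(face)):
            a, b = face[i], face[(i + 1) % len(face)]
            counts[(min(a, b), max(a, b))] += 1

    # An edge used by a single front-facing polygon lies on the boundary.
    return sorted(edge for edge, c in counts.items() if c == 1)


# Synthetic example: two front-facing triangles and one back-facing one.
faces = [(0, 1, 2), (0, 2, 3), (0, 3, 1)]
normals = [(0.0, 0.0, -1.0), (0.0, 0.0, -1.0), (0.0, 0.0, 1.0)]
outline = occluding_boundary_edges(faces, normals, view_dir=(0.0, 0.0, 1.0))
```

The vertices of these edges would then be projected onto the image plane to obtain the projected occluding boundary points.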
The relative position and scale of the human figure are obtained from the segmented image. Consequently, the parameters computed by the ICP algorithm are the DOFs of the model, $\omega=[\omega_1,\omega_2,\ldots,\omega_{22}]$. The objective is to estimate the model parameters that align the projected occluding boundary points with the extracted edge points. This is achieved by means of an iterative energy minimization technique. The computation of the distance between the projected occluding boundary points $B_i(\omega)$ $(i=1,\ldots,N)$ and the edge points $E_j$ $(j=1,\ldots,M)$ is based on an approximate though efficient technique. According to this technique, the distance of an edge point $E_j$ from the occluding boundary may be approximated by $\lVert E_j-B_i(\omega)\rVert$, where $B_i(\omega)$ is the closest occluding boundary point, located by means of a tree search algorithm. Thus, the estimation of the model parameters is achieved by the minimization of the following function:

$$
D(\omega)=\sum_{j=1}^{M} w_j\,\lVert E_j-B_{E_j}(\omega)\rVert^{2}
\tag{3}
$$

where $w_j$ is a weighting factor and $B_{E_j}(\omega)$ denotes the occluding boundary point which was found to be the closest to $E_j$. The minimization process is then performed by means of the Levenberg-Marquardt non-linear least squares technique.
The use of the weighting factors $w_j$ aims at limiting the effect of outliers on the estimation process. The weighting factors are calculated according to the following:

$$
w_j=
\begin{cases}
0 & d_j>3\sigma\\
\sigma/d_j & \sigma<d_j<3\sigma\\
1 & d_j<\sigma
\end{cases}
\tag{4}
$$

where $d_j$ is the residual fitting error for point $E_j$ and $\sigma$ the error variance.
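The weighted distance function and the outlier weights can be sketched as below. Several simplifications are assumptions of this sketch rather than the paper's method: the closest boundary point is found by brute force instead of a tree search, the residual of a point is taken to be its distance to that closest point, the limit cases d = sigma and d = 3 * sigma (not covered by the weight definition) are assigned to the neighbouring branches, and the Levenberg-Marquardt minimization over the DOFs is not included.

```python
import math


def closest_boundary_point(edge_point, boundary_points):
    """Brute-force nearest neighbour (stand-in for the tree search)."""
    return min(boundary_points, key=lambda b: math.dist(edge_point, b))


def weight(d, sigma):
    """Outlier weight: 0 far away, sigma / d in between, 1 for small residuals."""
    if d > 3.0 * sigma:
        return 0.0
    if d > sigma:
        return sigma / d
    return 1.0


def objective(edge_points, boundary_points, sigma):
    """Weighted sum of squared closest-point distances for one posture."""
    total = 0.0
    for e in edge_points:
        b = closest_boundary_point(e, boundary_points)
        d = math.dist(e, b)
        total += weight(d, sigma) * d ** 2
    return total


# Two projected boundary points and three edge points, one of them an outlier.
boundary = [(0.0, 0.0), (10.0, 0.0)]
edges = [(0.0, 0.5), (10.0, 2.0), (0.0, 5.0)]
value = objective(edges, boundary, sigma=1.0)
```

In a full implementation the boundary points would be recomputed from the model parameters at every iteration, and this value would be the quantity handed to the non-linear least squares solver.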
Since the number of DOFs is significant, if the algorithm is applied to the whole body at the same time, it may easily be trapped in a local minimum. For this reason, several groups of body parts are processed separately. At the beginning, the abduction parameters of the head and torso are estimated. In the next step, the edges close to the boundary projection of the head and torso are excluded. Thus, only the edges associated with the arms and legs are left. We use prior knowledge of the human body geometry to assign the remaining edge pixels to arms or legs. Then, the ICP algorithm is applied independently to each labelled group of points (e.g. left arm, right arm).
Figure 3. (top) Original segmented images used as an input for the algorithm; the projected occluding boundary points of the 3D model are represented in black. (bottom) The corresponding 3D models generated by using the proposed technique. Minor misalignments are due to posture ambiguities.
5. EXPERIMENTAL RESULTS
The proposed technique has been tested with several segmented video sequences generated using [6] and [7]. The average CPU time (Pentium III, 1 GHz processor) to compute and label the skeleton of the human body was 0.1 sec. per frame. This labelled skeleton is used to estimate a coarse 3D posture, which is improved by the registration of the projected occluding boundary points with the edge points extracted from the segmented image. The average CPU time to register the different projected body parts was 2 sec. per frame. Fig. 3(top) presents a set of four frames used as an input for the proposed technique. The final projected occluding boundary points, computed with ICP, are represented in black. The obtained 3D models are illustrated in Fig. 3(bottom).
6. CONCLUSIONS AND FURTHER WORK
A new technique to generate 3D models of human bodies from 2D video sequences has been presented. It is based on the processing of single frames, avoiding expensive probabilistic approaches and learning problems. The proposed technique consists of three stages. Firstly, a skeleton of a human body figure is extracted and labelled. Next, the posture of a 3D model is estimated by using the aforementioned skeleton, and finally a registration algorithm is used to tune the parameters of the model (DOFs).
Further work will include the prediction of human body posture using consecutive frames. The initial human body posture estimation could be improved and the CPU time reduced by using the posture of a previous frame and by studying the history of the articulations' movement.
7. REFERENCES
[1] A. Redert, et al., ATTEST - Advanced Three-dimensional TElevision System Technologies, 3DPVT'02, Padova, Italy, June 2002.
[2] D. Hogg, Model-Based Vision: A Program to See a Walking Person, Image and Vision Computing, 1(1), February 1983.
[3] L. Goncalves, E. Di Bernardo, E. Ursella and P. Perona, Monocular tracking of the human arm in 3D, IEEE Int. Conf. on Computer Vision, 1995.
[4] Y. Song, L. Goncalves, E. Di Bernardo and P. Perona, Monocular Perception of Biological Motion - Detection and Labeling, IEEE Int. Conf. on Computer Vision and Pattern Recognition, Fort Collins, USA, 1999.
[5] H. Sidenbladh, M. Black and L. Sigal, Implicit Probabilistic Models of Human Motion for Synthesis and Tracking, European Conf. on Computer Vision, Copenhagen, Denmark, 2002.
[6] F. Ernst, P. Wilinski and K. Van Overveld, Dense Structure-from-Motion: An Approach Based on Segment Matching, ECCV 2001.
[7] S. Jabri, Z. Duric, H. Wechsler and A. Rosenfeld, Detection and Location of People in Video Images Using Adaptive Fusion of Color and Edge Information, 15th Int. Conf. on Pattern Recognition, 2000.
[8] O. Faugeras, Three-Dimensional Computer Vision, The MIT Press, 1993.
[9] Sing-Tze Bow, Pattern Recognition and Image Preprocessing, Marcel Dekker, Inc., 1992.
[10] F. Solina and R. Bajcsy, Recovery of Parametric Models from Range Images: The Case for Superquadrics with Global Deformations, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 12, No. 2, February 1990.
[11] P. Besl and N. McKay, A Method for Registration of 3-D Shapes, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 14, No. 2, February 1992.