Funding: Project (No. 2002CB312101) supported by the National Basic Research Program (973) of China.
Abstract: This paper presents techniques for synthesizing a novel view at a virtual viewpoint from two given views captured at different viewpoints, achieving both high quality and high efficiency. The process consists of three passes. The first pass recovers the depth map: we formulate depth recovery as a pixel-labelling problem and propose a bisection approach that solves it in log2(n) steps (where n is the number of depth levels), each involving a single graph cut computation. The second pass detects occluded pixels and reasons about their depth: it fits a foreground depth curve and a background depth curve using the depths of nearby foreground and background pixels, and then distinguishes foreground from background pixels by minimizing a global energy, which requires only one additional graph cut computation. The third pass finds, for each pixel in the novel view, the corresponding pixels in the input views and computes its color. Because the whole process involves only a small number of graph cut computations, it is efficient. Moreover, visual artifacts in the synthesized view are removed by correcting the depth of the occluded pixels. Experimental results demonstrate that the proposed techniques achieve both high quality and high efficiency.
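A minimal sketch of the bisection idea follows, assuming a well-behaved (unimodal) per-pixel matching cost: each of the roughly log2(n) rounds makes one binary decision per pixel between the lower and upper half of its remaining depth range. The paper solves each binary round with a single graph cut that also enforces smoothness; the per-pixel test and the cost function here are illustrative stand-ins.

```python
# Bisection over depth labels: one binary decision per pixel per round.
# The paper replaces the per-pixel test below with one graph cut per round.
import numpy as np

def bisection_depth(cost, n_levels, shape):
    lo = np.zeros(shape, dtype=int)
    hi = np.full(shape, n_levels - 1, dtype=int)
    while np.any(lo < hi):                      # about log2(n_levels) rounds
        mid = (lo + hi) // 2
        # Binary decision: is the best depth in [lo, mid] or [mid+1, hi]?
        # (Assumes a unimodal cost for this toy; the paper's criterion differs.)
        take_low = cost(mid) <= cost(np.minimum(mid + 1, hi))
        hi = np.where(take_low, mid, hi)
        lo = np.where(take_low, lo, mid + 1)
    return lo

# Toy usage: a synthetic cost favouring depth level 12 everywhere.
cost = lambda d: np.abs(d - 12).astype(float)
depth = bisection_depth(cost, n_levels=64, shape=(4, 4))
print(depth)  # all 12, found in ~6 binary rounds
```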
Abstract: A new method is proposed for synthesizing intermediate views from a pair of stereoscopic images. To synthesize high-quality intermediate views, block matching is combined with a simplified multi-window technique and dynamic programming during disparity estimation. Occlusion detection is then performed to locate occluded regions, and their disparities are compensated. After projecting the left-to-right and right-to-left disparities onto the intermediate image, the intermediate view is synthesized with the occluded regions taken into account. Experimental results show that the proposed method obtains intermediate views of higher quality.
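As a sketch of the first stage, the following is a plain SAD block-matching disparity estimator; the simplified multi-window technique and the dynamic-programming pass described above are omitted, and the image pair is synthetic.

```python
# Winner-take-all SAD block matching over a horizontal search range.
import numpy as np

def block_match(left, right, max_disp=16, block=5):
    h, w = left.shape
    r = block // 2
    disp = np.zeros((h, w), dtype=int)
    for y in range(r, h - r):
        for x in range(r + max_disp, w - r):
            patch = left[y-r:y+r+1, x-r:x+r+1]
            costs = [np.abs(patch - right[y-r:y+r+1, x-d-r:x-d+r+1]).sum()
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))
    return disp

left = np.random.rand(32, 64)
right = np.roll(left, -3, axis=1)   # right view shifted by a disparity of 3
print(np.bincount(block_match(left, right).ravel()).argmax())  # mode = 3
```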
Abstract: A new view synthesis method based on Delaunay triangulation is proposed. The first step is to compute the Delaunay triangulation of the two reference images; the second is to match image points using the epipolar geometry constraint; the third is to construct the third view by transferring pixels under the trilinear constraint. The method dispenses with the classic, time-consuming dense matching step and takes advantage of the Delaunay triangulation, so it both saves computation time and enhances the quality of the synthesized view. It can be applied directly in video coding, image compression, and virtual reality.
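A minimal illustration of the first step, using SciPy's Delaunay triangulation on hypothetical feature locations; matching under the epipolar constraint and trilinear pixel transfer are beyond this snippet.

```python
# Triangulate sparse feature points so view transfer can work per triangle
# rather than per pixel, avoiding dense matching.
import numpy as np
from scipy.spatial import Delaunay

points = np.random.rand(50, 2) * [640, 480]   # hypothetical feature locations
tri = Delaunay(points)
print(tri.simplices.shape)  # (n_triangles, 3): vertex indices per triangle

# Each triangle's pixels can then be mapped to the novel view with a single
# per-triangle transform instead of per-pixel dense correspondence.
```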
Abstract: For pre-acquired serial images captured under lengthwise camera motion, this paper proposes a view synthesis algorithm based on the epipolar geometry constraint. It exploits the global matching and order-preserving properties of epipolar lines together with Fourier transform and dynamic-programming matching theories, and thereby faithfully synthesizes the destination image of the current viewpoint. By combining the Fourier transform, the epipolar geometry constraint, and dynamic-programming matching, the circumference-distortion problem that arises in conventional view synthesis approaches is effectively avoided. Detailed implementation steps of the algorithm are given, and running examples illustrate the results.
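As an illustration of the Fourier-transform ingredient, the following phase-correlation sketch recovers a dominant translation between two images; the epipolar and dynamic-programming parts of the algorithm are not reproduced.

```python
# Phase correlation: the normalized cross-power spectrum peaks at the shift.
import numpy as np

def phase_correlation(a, b):
    F = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    corr = np.fft.ifft2(F / (np.abs(F) + 1e-9)).real
    return np.unravel_index(np.argmax(corr), corr.shape)

img = np.random.rand(64, 64)
shifted = np.roll(img, (5, 9), axis=(0, 1))
print(phase_correlation(shifted, img))  # (5, 9)
```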
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60832003); the Key Laboratory of Advanced Display and System Application (Shanghai University), Ministry of Education, China (Grant No. P200902); and the Key Project of the Science and Technology Commission of Shanghai Municipality (Grant No. 10510500500).
Abstract: Depth maps are used to synthesize virtual views in free-viewpoint television (FTV) systems. When depth maps are derived with existing depth estimation methods, depth distortions cause undesirable artifacts in the synthesized views. To address this problem, a depth-map-based 3D video quality model (D-3DV) for virtual view synthesis and depth map coding in FTV applications is proposed. First, the relationships between distortions in the coded depth map and in the rendered view are derived. Then, a precise 3DV quality model based on depth characteristics is developed for the synthesized virtual views. Finally, based on the D-3DV model, multilateral filtering is applied as a pre-processing filter to reduce rendering artifacts. Experimental results, evaluated both objectively and subjectively, indicate that the proposed D-3DV model reduces the bit-rate of depth coding and achieves better rendering quality.
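A simplified sketch of the pre-processing filter idea: a color-guided (joint bilateral) depth filter, which stands in for the paper's multilateral filter; the actual filter is additionally driven by the D-3DV quality model, and all parameters here are assumptions.

```python
# Joint bilateral depth filtering: spatial closeness times color similarity.
import numpy as np

def joint_bilateral(depth, color, radius=2, sigma_s=2.0, sigma_r=0.1):
    h, w = depth.shape
    out = depth.copy()
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            acc = wsum = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ws = np.exp(-(dy*dy + dx*dx) / (2*sigma_s**2))  # spatial
                    dc = color[y+dy, x+dx] - color[y, x]            # guidance
                    wr = np.exp(-(dc*dc) / (2*sigma_r**2))
                    wgt = ws * wr
                    acc += wgt * depth[y+dy, x+dx]
                    wsum += wgt
            out[y, x] = acc / wsum
    return out

depth = np.random.rand(16, 16)
color = np.random.rand(16, 16)
print(joint_bilateral(depth, color).shape)  # (16, 16)
```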
Funding: Supported by the National Natural Science Foundation of China (61075013).
Abstract: View synthesis is an important building block in three-dimensional (3D) video processing and communications. Based on one or several views, view synthesis creates other views for the purpose of view prediction (for compression) or view rendering (for multiview displays). The quality of view synthesis depends on how the occlusion area is filled as well as how the pixels are created; consequently, luminance adjustment and hole filling are two key issues. In this paper, two views are used to produce an arbitrary virtual synthesized view. One view is merged into the other using a local luminance adjustment method, in which the adjustment coefficient is calculated over a local neighborhood region. Moreover, a maximum neighborhood spreading strength hole-filling method is presented to handle micro texture structure during filling. For each pixel on the hole boundary, its neighborhood pixels along the direction of maximum spreading strength are selected as candidates, and among them the pixel with the maximum spreading strength is used to fill the hole from boundary to center. If disoccluded pixels remain after one scan, the filling process is repeated until all hole pixels are filled. Simulation results show that the proposed method is efficient and robust and achieves high performance both subjectively and objectively.
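A minimal sketch of the local luminance adjustment, assuming the adjustment coefficient is simply the ratio of local mean luminances over the overlapping region (the paper's exact coefficient may differ); hole filling is not reproduced.

```python
# Per-pixel gain from the ratio of local mean luminances of the two views.
import numpy as np
from scipy.ndimage import uniform_filter

def local_gain(target, source, size=9, eps=1e-6):
    # adjustment coefficient: local mean ratio between the overlapping views
    return uniform_filter(target, size) / (uniform_filter(source, size) + eps)

target = np.random.rand(32, 32)
source = target * 0.8            # source view is 20% darker in the overlap
adjusted = source * local_gain(target, source)
print(np.allclose(adjusted, target, atol=0.05))  # True
```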
Funding: Supported by the Zhengzhou Collaborative Innovation Major Project under Grant No. 20XTZX06013 and the Henan Provincial Key Scientific Research Project of China under Grant No. 22A520042.
Abstract: Traditional neural radiance fields for novel view rendering require dense input images and per-scene optimization, which limits their practical applications. We propose SG-NeRF (Sparse-Input Generalized Neural Radiance Fields), a generalization method that infers scenes from input images and performs high-quality rendering without per-scene optimization. First, we construct an improved multi-view stereo structure based on convolutional attention and a multi-level fusion mechanism to obtain the geometric and appearance features of the scene from the sparse input images; these features are then aggregated by multi-head attention as the input to the neural radiance fields. This strategy of using neural radiance fields to decode scene features, instead of mapping positions and orientations, enables our method to perform cross-scene training and inference, allowing the neural radiance fields to generalize to novel view synthesis on unseen scenes. We tested the generalization ability on the DTU dataset: our PSNR (peak signal-to-noise ratio) improved by 3.14 dB over the baseline method under the same input conditions. In addition, if dense input views are available for a scene, the average PSNR can be improved by a further 1.04 dB through brief refinement training, yielding higher-quality renderings.
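A minimal PyTorch sketch of the aggregation step: per-sample features from several source views fused by multi-head attention and decoded to radiance and density. All module sizes, the mean pooling, and the decoder are illustrative assumptions, not SG-NeRF's actual architecture.

```python
# Cross-view feature aggregation by multi-head attention, then MLP decoding.
import torch
import torch.nn as nn

n_rays, n_views, feat = 1024, 3, 64
view_feats = torch.randn(n_rays, n_views, feat)   # per-view features per sample

attn = nn.MultiheadAttention(embed_dim=feat, num_heads=4, batch_first=True)
fused, _ = attn(view_feats, view_feats, view_feats)   # cross-view aggregation
scene_feat = fused.mean(dim=1)                        # one feature per sample

decoder = nn.Sequential(nn.Linear(feat, 128), nn.ReLU(), nn.Linear(128, 4))
rgb_sigma = decoder(scene_feat)                       # radiance + density
print(rgb_sigma.shape)  # torch.Size([1024, 4])
```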
Funding: This work was supported in part by the National Natural Science Foundation of China (62171317 and 62122058).
Abstract: Novel viewpoint image synthesis is very challenging, especially from sparse views, due to large changes in viewpoint and occlusion. Existing image-based methods fail to generate reasonable results for invisible regions, while geometry-based methods have difficulties in synthesizing detailed textures. In this paper, we propose STATE, an end-to-end deep neural network, for sparse view synthesis by learning structure and texture representations. Structure is encoded as a hybrid feature field to predict reasonable structures for invisible regions while maintaining original structures for visible regions, and texture is encoded as a deformed feature map to preserve detailed textures. We propose a hierarchical fusion scheme with intra-branch and inter-branch aggregation, in which spatio-view attention allows multi-view fusion at the feature level to adaptively select important information by regressing pixel-wise or voxel-wise confidence maps. By decoding the aggregated features, STATE is able to generate realistic images with reasonable structures and detailed textures. Experimental results demonstrate that our method achieves qualitatively and quantitatively better results than state-of-the-art methods. Our method also enables texture and structure editing applications, benefiting from the implicit disentanglement of structure and texture. Our code is available at http://cic.tju.edu.cn/faculty/likun/projects/STATE.
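As a sketch of the confidence-map idea in the fusion scheme, the following blends per-view feature maps with softmax-normalized pixel-wise confidences; the confidence logits here are random stand-ins for the regressed maps.

```python
# Confidence-weighted multi-view feature fusion.
import torch

n_views, c, h, w = 3, 16, 32, 32
feats = torch.randn(n_views, c, h, w)
conf = torch.randn(n_views, 1, h, w)            # pixel-wise confidence logits
weights = torch.softmax(conf, dim=0)            # normalize across views
fused = (weights * feats).sum(dim=0)            # (c, h, w) fused feature map
print(fused.shape)  # torch.Size([16, 32, 32])
```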
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62072284 and 61772318 and the Special Project of the Science and Technology Innovation Base of the Key Laboratory of Shandong Province for Software Engineering under Grant No. 11480004042015.
Abstract: Indoor visual localization, i.e., 6-degree-of-freedom camera pose estimation for a query image with respect to a known scene, is gaining increased attention, driven by the rapid progress of applications such as robotics and augmented reality. However, drastic visual discrepancies between an onsite query image and prerecorded indoor images pose a significant challenge for visual localization. In this paper, based on the key observation that planar surfaces such as floors and walls are ubiquitous in indoor scenes, we propose a novel system that incorporates geometric information to address the issues of relying on pixel information alone. In the system implementation, we contribute a hierarchical structure consisting of pre-scanned images and a point cloud, as well as a distilled representation of the planar-element layout extracted from the original dataset. A view synthesis procedure is designed to generate synthetic images that complement the sparsely sampled dataset. Moreover, a global image descriptor based on image statistics, called block mean, variance, and color (BMVC), is employed to speed up candidate pose identification, in combination with a traditional convolutional neural network (CNN) descriptor. Experimental results on a popular benchmark demonstrate that the proposed method outperforms state-of-the-art approaches in terms of visual localization validity and accuracy.
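A sketch of a block-statistics descriptor in the spirit of BMVC: the image is split into a coarse grid and each block contributes its luminance mean and variance plus per-channel color means. The grid size and exact channel handling are assumptions.

```python
# Compact global descriptor from per-block image statistics.
import numpy as np

def bmvc_descriptor(img, grid=(4, 4)):
    h, w, _ = img.shape
    bh, bw = h // grid[0], w // grid[1]
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            block = img[i*bh:(i+1)*bh, j*bw:(j+1)*bw]
            lum = block.mean(axis=2)
            feats.extend([lum.mean(), lum.var(), *block.mean(axis=(0, 1))])
    return np.asarray(feats)

img = np.random.rand(480, 640, 3)
d = bmvc_descriptor(img)
print(d.shape)  # (80,) = 16 blocks x (mean + variance + 3 color means)
```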
Funding: Supported by the National Key Foundation for Exploring Scientific Instruments (2013YQ140517), the National Natural Science Foundation of China (Grant No. 61522111), and the Shenzhen Peacock Plan (KQTD20140630115140843).
Abstract: Typical stereo algorithms treat disparity estimation and view synthesis as two sequential procedures. In this paper, we instead consider stereo matching and view synthesis as two complementary components and present a novel iterative refinement model for joint view synthesis and disparity refinement. To achieve mutual promotion between view synthesis and disparity refinement, we apply two key strategies: disparity map fusion and disparity-assisted plane-sweep-based rendering (DAPSR). On the one hand, the disparity map fusion strategy generates a disparity map from the synthesized view and the input views; it is able to detect and counteract disparity errors caused by potential artifacts in the synthesized view. On the other hand, DAPSR is used for view synthesis and updating and is able to weaken the interpolation errors caused by outliers in the disparity maps. Experiments on the Middlebury benchmarks demonstrate that, by introducing the synthesized view, disparity errors due to large occluded regions and large baselines are eliminated effectively and the synthesis quality is greatly improved.
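A toy sketch of the disparity-map fusion idea: candidate disparity maps vote per pixel, and values that disagree with the consensus are discarded, so gross errors from rendering artifacts are suppressed. The paper's fusion and its DAPSR renderer are considerably more elaborate.

```python
# Per-pixel consensus fusion of several disparity maps.
import numpy as np

def fuse_disparities(maps, tol=1.0):
    stack = np.stack(maps)                 # (n_maps, h, w)
    med = np.median(stack, axis=0)
    # keep each map's vote only where it agrees with the consensus
    ok = np.abs(stack - med) <= tol
    return np.where(ok.any(axis=0), (stack * ok).sum(0) / ok.sum(0).clip(1), med)

gt = np.full((8, 8), 5.0)
noisy = gt.copy()
noisy[2, 2] = 40.0                         # an outlier from a rendering artifact
print(fuse_disparities([gt, noisy, gt + 0.2])[2, 2])  # ~5.1, outlier suppressed
```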
Abstract: Background: In this study, we propose view interpolation networks to reproduce changes in the brightness of an object's surface depending on the viewing direction, which is important for reproducing the material appearance of a real object. Method: We used an original and a modified version of U-Net for image transformation. The networks were trained to generate images from intermediate viewpoints of four cameras placed at the corners of a square. We conducted an experiment using three different combinations of methods and training-data formats. Result: We determined that inputting the coordinates of the viewpoints together with the four camera images, and using images from random viewpoints as training data, produces the best results.
Funding: Supported by the National Natural Science Foundation of China (Nos. 62172315, 62073262, and 61672429), the Fundamental Research Funds for the Central Universities, the Innovation Fund of Xidian University (No. 20109205456), the Key Research and Development Program of Shaanxi (No. S2021-YF-ZDCXL-ZDLGY-0127), and HUAWEI.
Abstract: Free-viewpoint video allows the user to view objects from any virtual perspective, creating an immersive visual experience. This technology enhances the interactivity and freedom of multimedia performances. However, many free-viewpoint video synthesis methods can hardly satisfy the requirement of working in real time with high precision, particularly for sports fields with large areas and numerous moving objects. To address these issues, we propose a free-viewpoint video synthesis method based on distance-field acceleration. The central idea is to fuse multi-view distance field information and use it to adjust the search step size adaptively. Adaptive step-size search is used in two ways: for fast estimation of multi-object three-dimensional surfaces, and for synthetic view rendering based on global occlusion judgement. We implemented our ideas with parallel computing for interactive display, using the CUDA and OpenGL frameworks, and evaluated them on real-world and simulated datasets. The results show that the proposed method can render free-viewpoint videos with multiple objects on large sports fields at 25 fps. Furthermore, the visual quality of our synthesized novel-viewpoint images exceeds that of state-of-the-art neural-rendering-based methods.
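The adaptive step-size search is closely related to sphere tracing of a distance field, sketched below on a toy sphere: each step advances by the queried distance, so rays skip empty space quickly and slow down near surfaces.

```python
# Sphere tracing: the signed distance bounds the safe step along the ray.
import numpy as np

def sdf_sphere(p, center=np.array([0., 0., 5.]), radius=1.0):
    return np.linalg.norm(p - center) - radius

def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-3):
    t = 0.0
    for _ in range(max_steps):
        d = sdf(origin + t * direction)   # distance bound = safe step size
        if d < eps:
            return t                      # hit
        t += d                            # adaptive step
    return None                           # miss

t = sphere_trace(np.zeros(3), np.array([0., 0., 1.]), sdf_sphere)
print(round(t, 3))  # 4.0 (sphere at z=5 with radius 1)
```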
Funding: Partially supported by Innoviris (3-DLicornea project) and FWO (project G.0256.15); also supported by the National Natural Science Foundation of China (Nos. 61272226 and 61373069), a Research Grant of the Beijing Higher Institution Engineering Research Center, the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology, and the Tsinghua University Initiative Scientific Research Program.
Abstract: Multiview video can provide more immersive perception than traditional single 2-D video. It enables both interactive free-navigation applications and high-end autostereoscopic displays on which multiple users can perceive genuine 3-D content without glasses. The multiview format also comprises much more visual information than classical 2-D or stereo 3-D content, which makes it possible to perform various interesting editing operations at both the pixel level and the object level. This survey provides a comprehensive review of existing multiview video synthesis and editing algorithms and applications. For each topic, the related technologies in classical 2-D image and video processing are reviewed first; we then discuss recent advanced techniques for multiview video virtual view synthesis and various interactive editing applications. Given the ongoing progress in multiview video synthesis and editing, we foresee that more and more immersive 3-D video applications will appear in the future.
Funding: Supported by the Theme-based Research Scheme, Research Grants Council of Hong Kong (No. T45-205/21-N).
Abstract: Novel view synthesis has recently attracted tremendous research attention for its applications in virtual reality and immersive telepresence. Rendering a locally immersive light field (LF) from arbitrary large-baseline RGB references is a challenging problem that lacks efficient solutions among existing novel view synthesis techniques. In this work, we aim to faithfully render local immersive novel views/LF images based on large-baseline LF captures and a single RGB image in the target view. To fully exploit the precious information in the source LF captures, we propose a novel occlusion-aware source sampler (OSS) module, which efficiently transfers the pixels of source views to the target view's frustum in an occlusion-aware manner. An attention-based deep visual fusion module is proposed to fuse the revealed occluded background content with a preliminary LF into a final refined LF. The proposed source sampling and fusion mechanism not only provides information for occluded regions from varying observation angles, but also effectively enhances the visual rendering quality. Experimental results show that our method renders high-quality LF images/novel views from sparse RGB references and outperforms state-of-the-art LF rendering and novel view synthesis methods.
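A 1-D toy sketch of occlusion-aware source sampling: source pixels are forward-warped to the target view and a disparity buffer keeps only the nearest surface, so occluded source content never overwrites foreground. The image and disparities are synthetic assumptions.

```python
# Forward warping with a disparity buffer (larger disparity = nearer surface).
import numpy as np

def warp_1d(colors, disparity):
    n = len(colors)
    out = np.zeros(n)
    dbuf = np.full(n, -np.inf)                 # nearest-surface record per column
    for x in range(n):
        xt = x + int(round(disparity[x]))      # target column after warping
        if 0 <= xt < n and disparity[x] > dbuf[xt]:
            dbuf[xt] = disparity[x]
            out[xt] = colors[x]
    return out

colors = np.arange(10, dtype=float)
disp = np.zeros(10)
disp[3] = 2                                    # pixel 3 is a near foreground point
print(warp_1d(colors, disp)[5])  # 3.0: foreground pixel 3 wins column 5
```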
Funding: Supported by the National Natural Science Foundation of China (62322210), the Beijing Municipal Natural Science Foundation for Distinguished Young Scholars (JQ21013), the Beijing Municipal Science and Technology Commission (Z231100005923031), and the 2023 Tencent AI Lab Rhino-Bird Focused Research Program.
Abstract: The emergence of 3D Gaussian splatting (3DGS) has greatly accelerated rendering for novel view synthesis. Unlike neural implicit representations such as neural radiance fields (NeRFs), which represent a 3D scene with position- and viewpoint-conditioned neural networks, 3D Gaussian splatting models the scene with a set of Gaussian ellipsoids, so that efficient rendering can be accomplished by rasterizing the ellipsoids into images. Apart from fast rendering, the explicit representation also facilitates downstream tasks such as dynamic reconstruction, geometry editing, and physical simulation. Considering the rapid changes and growing number of works in this field, we present a literature review of recent 3D Gaussian splatting methods, roughly classified by functionality into 3D reconstruction, 3D editing, and other downstream applications. Traditional point-based rendering methods and the rendering formulation of 3D Gaussian splatting are also covered to aid understanding of the technique. This survey aims to help beginners get started quickly in this field, to provide experienced researchers with a comprehensive overview, and to stimulate future development of the 3D Gaussian splatting representation.
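A minimal sketch of the 3DGS compositing rule for one pixel: overlapping Gaussians are depth-sorted and alpha-composited front to back, C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j). The alphas are given directly here; in the real rasterizer they come from the projected 2-D Gaussians.

```python
# Front-to-back alpha compositing of depth-sorted primitives for one pixel.
import numpy as np

def composite(colors, alphas, depths):
    order = np.argsort(depths)                  # front-to-back
    c_out, transmittance = np.zeros(3), 1.0
    for i in order:
        c_out += transmittance * alphas[i] * colors[i]
        transmittance *= 1.0 - alphas[i]
    return c_out

colors = np.array([[1., 0., 0.], [0., 1., 0.]])   # red behind, green in front
alphas = np.array([0.8, 0.5])
depths = np.array([2.0, 1.0])
print(composite(colors, alphas, depths))  # [0.4 0.5 0. ]: green dominates
```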
Funding: Supported by the National Key Technology Research and Development Program of China (No. 2017YFB1002601), the PKU-Baidu Fund (No. 2019BD007), and the National Natural Science Foundation of China (NSFC) (No. 61632003).
Abstract: Image interpolation has a wide range of applications, such as frame-rate up-conversion and free-viewpoint TV. Despite significant progress, it remains an open challenge, especially for image pairs with large displacements. In this paper, we first propose a novel optimization algorithm for motion estimation that combines the advantages of global optimization and a local parametric transformation model. We perform optimization over dynamic label sets, which are modified after each iteration using a piecewise-consistency prior to avoid local minima. We then apply the algorithm to an image interpolation framework that includes occlusion handling and intermediate image interpolation. We validate the performance of our algorithm experimentally and show that it achieves state-of-the-art performance.
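A toy sketch of optimization over dynamic label sets: after each sweep, a pixel's candidate set is rebuilt from the labels currently present in its neighborhood, a crude stand-in for the piecewise-consistency prior; the costs are synthetic and no parametric model is fitted.

```python
# Iterative labeling with candidate sets rebuilt from neighboring labels.
import numpy as np

def refine(labels, unary, iters=5):
    h, w, n = unary.shape
    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                y0, y1 = max(0, y - 1), min(h, y + 2)
                x0, x1 = max(0, x - 1), min(w, x + 2)
                candidates = np.unique(labels[y0:y1, x0:x1])   # dynamic set
                labels[y, x] = candidates[np.argmin(unary[y, x, candidates])]
    return labels

rng = np.random.default_rng(0)
unary = rng.random((8, 8, 16))          # synthetic per-pixel label costs
labels = rng.integers(0, 16, (8, 8))
print(refine(labels, unary).shape)  # (8, 8)
```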
Funding: Supported by the National Natural Science Foundation of China (61462048).
Abstract: Existing depth video coding algorithms are generally based on in-loop depth filters, whose performance is unstable and easily affected by outliers. In this paper, we design a joint weighted sparse-representation-based median filter as the in-loop filter in a depth video codec. It constructs a depth candidate set containing relevant neighboring depth pixels based on depth- and intensity-similarity weighted sparse coding, and then performs a median operation on this set to select one neighboring depth pixel as the filtering result. Experimental results indicate that the depth bitrate is reduced by about 9% compared with the anchor method, confirming that the proposed method is more effective at reducing the depth bitrate required for a given synthesis quality level.
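A toy version of the weighted median filtering step: neighbors are weighted by joint depth and intensity similarity and the weighted median neighbor replaces the center. Gaussian similarities stand in for the paper's sparse-coding weights.

```python
# Weighted median of a depth neighborhood, weighted by depth/intensity similarity.
import numpy as np

def weighted_median(values, weights):
    order = np.argsort(values)
    cum = np.cumsum(weights[order])
    return values[order][np.searchsorted(cum, cum[-1] / 2.0)]

def filter_pixel(depth, intensity, y, x, r=1, sd=2.0, si=10.0):
    d0, i0 = depth[y, x], intensity[y, x]
    patch_d = depth[y-r:y+r+1, x-r:x+r+1].ravel()
    patch_i = intensity[y-r:y+r+1, x-r:x+r+1].ravel()
    w = np.exp(-((patch_d - d0)**2) / (2*sd**2)
               - ((patch_i - i0)**2) / (2*si**2))
    return weighted_median(patch_d, w)

depth = np.random.randint(0, 100, (5, 5)).astype(float)
intensity = np.random.randint(0, 255, (5, 5)).astype(float)
print(filter_pixel(depth, intensity, 2, 2))  # filtered center depth
```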
Funding: Supported by the National Natural Science Foundation of China (61834005, 61772417, 61602377, 61634004, 61802304) and the Shaanxi Province Key R&D Plan (2021GY-029).
Abstract: To solve the hole-filling mismatch problem in virtual view synthesis, a three-step repairing (TSR) algorithm is proposed. First, the image with marked holes is decomposed by the non-subsampled shearlet transform (NSST), which generates high- and low-frequency sub-images at different resolutions. Then an improved Criminisi algorithm is used to repair the texture information in the high-frequency sub-images, while an improved curvature-driven diffusion (CDD) algorithm is used to repair the low-frequency sub-images carrying the image structure information. Finally, the repaired high-frequency and low-frequency sub-images are combined through the inverse NSST to obtain the final image. Experiments show that the peak signal-to-noise ratio (PSNR) of the TSR algorithm is improved by an average of 2-3 dB and 1-2 dB compared with the Criminisi algorithm and the nearest-neighbor interpolation (NNI) algorithm, respectively.
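A structural sketch of the three-step flow using stand-ins: a Gaussian blur split replaces NSST, and OpenCV's inpainting replaces the improved Criminisi (texture) and CDD (structure) repairs; only the decompose, repair-per-band, recompose shape matches the TSR algorithm.

```python
# Decompose into frequency bands, repair each band, then recompose.
import cv2
import numpy as np

img = np.random.randint(0, 255, (128, 128), dtype=np.uint8)
mask = np.zeros_like(img)
mask[40:60, 40:60] = 255                                    # marked hole

low = cv2.GaussianBlur(img, (9, 9), 3)                      # "low-frequency" band
high = cv2.subtract(img, low)                               # "high-frequency" band

high_fixed = cv2.inpaint(high, mask, 5, cv2.INPAINT_TELEA)  # texture stand-in
low_fixed = cv2.inpaint(low, mask, 5, cv2.INPAINT_NS)       # structure stand-in

result = cv2.add(low_fixed, high_fixed)                     # recompose
print(result.shape, result.dtype)  # (128, 128) uint8
```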