Background Generally, it is difficult to obtain accurate pose and depth for a non-rigid moving object from a single RGB camera to create augmented reality (AR). In this study, we build an augmented reality system from a single RGB camera for a non-rigid moving human by accurately computing pose and depth, for which the two key tasks are segmentation and monocular Simultaneous Localization and Mapping (SLAM). Most existing monocular SLAM systems are designed for static scenes, whereas in this AR system the human body is always moving and non-rigid. Methods To make the SLAM system suitable for a moving human, we first segment the rigid parts of the human in each frame. A segmented moving body part can be regarded as a static object, and the relative motion between each moving body part and the camera can be treated as camera motion. Typical SLAM systems designed for static scenes can then be applied. In the segmentation step of this AR system, we first employ the proposed BowtieNet, which adds the atrous spatial pyramid pooling (ASPP) of DeepLab between the encoder and decoder of SegNet, to segment the human in the original frame; we then use color information to extract the face from the segmented human area. Results Based on the human segmentation results and monocular SLAM, the system can change the video background and attach virtual objects to humans. Conclusions Experiments on human image segmentation datasets show that BowtieNet achieves state-of-the-art human segmentation performance at sufficient speed for real-time use. Experiments on videos show that the proposed AR system can robustly attach virtual objects to humans and accurately change the video background.
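The color-based face extraction step is not detailed in the abstract; a minimal sketch of one common approach, skin-color thresholding in RGB space applied to the segmented human region, might look like the following. The thresholds are rule-of-thumb values, not the paper's rule, which is unspecified.

```python
import numpy as np

def skin_mask(rgb: np.ndarray) -> np.ndarray:
    """Return a boolean mask of likely skin pixels.

    `rgb` is an (H, W, 3) uint8 image. The thresholds below are a
    common rule-of-thumb for skin detection in RGB space; the paper's
    actual color rule is not given in the abstract.
    """
    r = rgb[..., 0].astype(np.int32)
    g = rgb[..., 1].astype(np.int32)
    b = rgb[..., 2].astype(np.int32)
    return (
        (r > 95) & (g > 40) & (b > 20)
        & (r - np.minimum(g, b) > 15)   # enough spread between channels
        & (np.abs(r - g) > 15)
        & (r > g) & (r > b)             # red dominates for skin tones
    )

# Toy image: one skin-like pixel and one background (blue) pixel.
img = np.array([[[200, 120, 90], [30, 60, 200]]], dtype=np.uint8)
mask = skin_mask(img)
```

In the full pipeline, such a mask would only be evaluated inside the BowtieNet human segment, and the largest connected skin region taken as the face candidate.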
Several new models and formats for the digital transformation of the manufacturing industry have appeared because of the rapid integration of information technology with the real economy, as well as the increasingly obvious trend toward industrial digitalization, networking, and intelligence. Among them, digital twins have increasingly become a research hotspot across all sectors of industry and have broad prospects. A digital twin maps physical objects into virtual space in a digital way and simulates their behavioral characteristics in real environments. It closes the gap between virtuality and reality through their closed-loop interaction. Digital twins are undoubtedly an important and strategic technology for familiar products, production, and services. They can also infer indicators that cannot be directly measured by applying machine learning to the direct data of a limited set of physical sensor indicators. This enables an assessment of the current state, a diagnosis of past problems, and a prediction of future trends, and allows possibilities to be simulated to provide more comprehensive decision support.
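The idea of inferring an indicator that has no physical sensor from a limited set of measured channels can be sketched with a simple learned surrogate. The two-channel linear model below is purely illustrative (the channel names and the linear form are assumptions; real digital-twin models are typically far richer):

```python
import numpy as np

# Synthetic example: two directly measured sensor channels (say, a
# temperature and a vibration reading -- hypothetical names) are used
# to estimate an indicator with no dedicated physical sensor.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                  # measured sensor data
true_w = np.array([1.5, -0.7])                 # unknown true relation
y = X @ true_w + 0.01 * rng.normal(size=200)   # "unmeasurable" indicator

# Fit a linear surrogate ("virtual sensor") by least squares.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
estimate = X @ w                               # inferred indicator values
```

The fitted surrogate can then run alongside the physical asset, turning the limited sensor feed into the richer state estimate the twin needs.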
Image-based rendering is important in both computer graphics and computer vision, and it is also widely used in virtual reality technology. For more than two decades, a great deal of work has been done on image-based rendering, and these methods can be divided into two categories according to whether the geometric information of the scene is utilized. Following this classification, we introduce some classical methods and representative methods proposed in recent years. We also compare and analyze the basic principles, advantages, and disadvantages of the different methods. Finally, some suggestions are given for future research directions in image-based rendering.
Background Assembly guided by paper documents is the most widespread approach in aircraft cable assembly. The process is very complicated and requires assembly workers with high-level skills. The technologies of wearable Augmented Reality (AR) and portable visual inspection can be exploited to improve the efficiency and quality of cable assembly. Methods In this study, we propose a smart assistance system for cable assembly that combines wearable AR with portable visual inspection. Specifically, a portable visual device based on binocular vision and deep learning is developed to realize fast detection and recognition of the cable brackets installed on aircraft airframes. A Convolutional Neural Network (CNN) is then developed to read the text on cables after images are acquired from the camera of the wearable AR device. An authoring tool for creating and managing the assembly process is proposed to realize visual guidance of the cable assembly process on a wearable AR device. The system is applied to cable assembly on an aircraft bulkhead prototype. Results The results show that the system can recognize the number, types, and locations of brackets and correctly read the text on aircraft cables. The authoring tool can assist users who lack professional programming experience in establishing a process plan, i.e., an AR-based assembly outline for cable assembly. Conclusions The system can provide quick assembly guidance for aircraft cables with text, images, and a 3D model. It is beneficial for reducing dependency on paper documents, labor intensity, and the error rate.
Virtual reality (VR) health monitoring with flexible electronics provides a new avenue for remote and wearable medicine. The combination of flexible electronics and VR could facilitate smart remote disease diagnosis through real-time monitoring of physiological signals and remote interaction between patient and physician. The flexible healthcare sensor is the most crucial unit in a flexible, wearable health-monitoring system and has attracted much attention in recent years. This paper briefly reviews progress in flexible healthcare sensors and VR healthcare devices. Flexible healthcare sensors are introduced in terms of basic flexible materials, manufacturing techniques, and their applications in health monitoring (such as blood/sweat detection and heart-rate tracking). VR healthcare devices for telemedicine diagnosis are discussed, and a smart remote diagnosis system using flexible, wearable healthcare sensors and a VR device is addressed.
Background In this study, we propose a novel 3D scene graph prediction approach for scene understanding from point clouds. Methods The approach automatically organizes the entities of a scene in a graph, where objects are nodes and their relationships are modeled as edges. More specifically, we employ DGCNN to capture the features of objects and their relationships in the scene. A Graph Attention Network (GAT) is introduced to exploit latent features obtained from the initial estimation to further refine the object arrangement in the graph structure. A loss function modified from cross-entropy with a variable weight is proposed to address the multi-category problem in object and predicate prediction. Results Experiments reveal that the proposed approach performs favorably against state-of-the-art methods in terms of predicate classification and relationship prediction, and achieves comparable performance on object classification. Conclusions The 3D scene graph prediction approach can form an abstract description of the scene space from point clouds.
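The variable-weight cross-entropy idea can be illustrated with a per-class weight vector. The weighting scheme below (a fixed vector) is only a stand-in, since the abstract does not give the paper's actual weight schedule:

```python
import numpy as np

def weighted_cross_entropy(logits, labels, weights):
    """Multi-class cross-entropy with a per-class weight.

    `logits` is (N, C), `labels` is (N,), `weights` is (C,). Up-weighting
    rare object or predicate classes is one way to handle the
    multi-category imbalance the paper targets; the fixed weight vector
    here is illustrative only.
    """
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_sample = -weights[labels] * log_probs[np.arange(len(labels)), labels]
    return per_sample.mean()

logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]])
labels = np.array([0, 1])
uniform = weighted_cross_entropy(logits, labels, np.ones(3))
```

With all weights equal this reduces to ordinary cross-entropy; raising the weight of one class increases the loss contribution of samples from that class.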
Background Monocular depth estimation aims to predict a dense depth map from a single RGB image and has important applications in 3D reconstruction, autonomous driving, and augmented reality. However, existing methods feed the original RGB image directly into the model to extract depth features, without avoiding the interference of depth-irrelevant information with depth-estimation accuracy, which leads to inferior performance. Methods To remove the influence of depth-irrelevant information and improve depth-prediction accuracy, we propose RADepthNet, a novel reflectance-guided network that fuses boundary features. Specifically, our method predicts depth maps in the following three steps: (1) Intrinsic image decomposition. We propose a reflectance extraction module consisting of an encoder-decoder structure to extract the depth-related reflectance. Through an ablation study, we demonstrate that the module can reduce the influence of illumination on depth estimation. (2) Boundary detection. A boundary extraction module, consisting of an encoder, a refinement block, and an upsampling block, is proposed to better predict depth at object boundaries by utilizing gradient constraints. (3) Depth prediction. We use an encoder different from that in (2) to obtain depth features from the reflectance map and fuse boundary features to predict depth. In addition, we propose FIFADataset, a depth-estimation dataset for soccer scenarios. Results Extensive experiments on a public dataset and our proposed FIFADataset show that our method achieves state-of-the-art performance.
Background As a novel approach for people to communicate directly with an external device, research on brain-computer interfaces (BCIs) has become well developed. However, as in real-world scenarios, where individuals are expected to work in groups, BCI systems should be able to replicate group attributes. Methods We propose a 4th-order cumulants feature extraction method (CUM4-CSP) based on the common spatial patterns (CSP) algorithm. Simulation experiments conducted using motion visual evoked potential (mVEP) EEG data verified the robustness of the proposed algorithm. In addition, to allow paradigms to be chosen freely, we adopted the mVEP and steady-state visual evoked potential (SSVEP) paradigms and designed a multimodal collaborative BCI system based on the proposed CUM4-CSP algorithm. The feasibility of the proposed multimodal collaborative system framework was demonstrated using a multiplayer game-control system that simultaneously facilitates coordinated and competitive control of external devices by two users. To verify the robustness of the proposed scheme, we recruited 30 subjects to conduct online game-control experiments, and the results were statistically analyzed. Results The simulation results prove that the proposed CUM4-CSP algorithm has good noise immunity. The online experimental results indicate that the subjects could reliably perform the game confrontation operation with the selected BCI paradigm. Conclusions The proposed CUM4-CSP algorithm can effectively extract features from EEG data in a noisy environment. Additionally, the proposed scheme may provide a new solution for EEG-based group BCI research.
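CUM4-CSP builds on the classical CSP algorithm, which finds spatial filters that jointly diagonalize the two class covariance matrices. A minimal NumPy sketch of that classical step follows; the 4th-order-cumulant extension replaces the covariance estimates and is not reproduced here:

```python
import numpy as np

def avg_cov(X):
    """Average normalized spatial covariance over trials of shape
    (trials, channels, samples)."""
    covs = [x @ x.T / np.trace(x @ x.T) for x in X]
    return np.mean(covs, axis=0)

def csp_filters(X1, X2):
    """Classical CSP: spatial filters W such that W C1 W^T is diagonal
    and W (C1 + C2) W^T = I. The paper's CUM4-CSP variant swaps the
    covariance estimates for 4th-order cumulant statistics."""
    C1, C2 = avg_cov(X1), avg_cov(X2)
    # Whiten the composite covariance ...
    evals, U = np.linalg.eigh(C1 + C2)
    P = np.diag(evals ** -0.5) @ U.T
    # ... then diagonalize the whitened class-1 covariance.
    _, B = np.linalg.eigh(P @ C1 @ P.T)
    return B.T @ P

rng = np.random.default_rng(1)
X1 = rng.normal(size=(20, 4, 100))                               # class 1 trials
X2 = rng.normal(size=(20, 4, 100)) * np.array([2.0, 1, 1, 1])[:, None]  # class 2
W = csp_filters(X1, X2)
```

Projecting trials through the first and last rows of W yields the variance features most discriminative between the two classes.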
Background Within a virtual environment (VE), the control of locomotion (e.g., self-travel) is critical for creating a realistic and functional experience. Usually, the direction of locomotion while using a head-mounted display (HMD) is determined by the direction the head is pointing, and forward or backward motion is controlled with a hand-held controller. However, hand-held devices can be difficult to use while the eyes are covered by an HMD. Free-hand gestures, tracked with a camera or a hand data glove, have the advantage of eliminating the need to look at the hand controller, but the design of hand or finger gestures for this purpose has not been well developed. Methods This study used a depth-sensing camera to track fingertip location (curling and straightening the fingers), which was converted to forward or backward self-travel in the VE. Fingertip position was converted to self-travel velocity using a mapping function with three parameters: a region of zero velocity (dead zone) around the relaxed hand position, a linear relationship of fingertip position to velocity (slope, or β) beginning at the edge of the dead zone, and an exponent giving a nonlinear rather than linear mapping of fingertip position to velocity. Using an HMD, participants moved forward along a virtual road and stopped at a target on the road by controlling self-travel velocity with finger flexion and extension. Each of the three mapping-function parameters was tested at three levels. Outcomes measured included usability ratings, fatigue, nausea, and time to complete the tasks. Results Twenty subjects participated, but five did not complete the study due to nausea. The size of the dead zone had little effect on performance or usability. Subjects preferred lower β values, which were associated with better subjective ratings of control and reduced time to complete the task, especially for large targets. Exponent values of 1.0 or greater were preferred and reduced the time to complete the task, especially for small targets. Conclusions Small finger movements can be used to control the velocity of self-travel in a VE. The functions used for converting fingertip position to movement velocity influence usability and performance.
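The three-parameter mapping described above (dead zone, slope β, exponent) can be written compactly. The units and default values here are illustrative, not the study's settings:

```python
import numpy as np

def fingertip_to_velocity(x, dead_zone=0.01, beta=1.0, exponent=1.0):
    """Map fingertip displacement x (distance from the relaxed hand
    position; sign = flexion vs extension) to self-travel velocity.

    Inside the dead zone the velocity is zero; outside it, the excess
    displacement is scaled by beta and shaped by the exponent, matching
    the three parameters varied in the study. Defaults are assumptions.
    """
    excess = max(abs(x) - dead_zone, 0.0)      # zero inside the dead zone
    return np.sign(x) * beta * excess ** exponent

# Example: 3 cm of flexion with a 1 cm dead zone, beta = 2, linear mapping.
v = fingertip_to_velocity(0.03, dead_zone=0.01, beta=2.0, exponent=1.0)
```

An exponent above 1.0 flattens the response near the dead zone (fine stopping control at small targets) while preserving high speed at large displacements, consistent with the reported preference for exponents of 1.0 or greater.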
Background This work examines the current research status of urban Digital Twins to establish an intelligent spatiotemporal framework. A Geospatial Artificial Intelligence (GeoAI) system is developed based on Geographic Information Systems and Artificial Intelligence. It integrates multi-video technology and a Virtual City in urban Digital Twins. Methods In addition, an improved small-object detection model, YOLOv5-Pyramid, is proposed, and Siamese-network video tracking models, namely MPSiam and FSSiamese, are established. Finally, an experimental platform is built to verify the georeferencing correction scheme for video images. Results The Multiply-Accumulate count of MPSiam is 0.5 B, whereas that of ResNet50-Siam is 4.5 B; the model is thus compressed 4.8 times. The inference speed is increased 3.3 times, reaching 83 frames per second, while 3% of the Average Expectation Overlap is lost. Therefore, the urban Digital Twins-oriented GeoAI framework established here performs well on video georeferencing and target detection problems.
Background In virtual environments (VEs), users can explore a large virtual scene through the viewpoint operation of a head-mounted display (HMD) and movement gains combined with redirected walking technology. Existing redirection methods and viewpoint operations are effective in the horizontal direction; however, they cannot help participants experience immersion in the vertical direction. To improve the immersion of walking up a slope, this study presents a virtual climbing system based on passive haptics. Methods The virtual climbing system uses the tactile feedback provided by sponges, a commonly used flexible material, to simulate the tactile sensation at the user's soles. In addition, the visual stimulus of the HMD, the tactile feedback of the flexible material, and the user's walking in the VE combined with redirection technology are all adopted to enhance the user's perception of the VE. In the experiments, a physical space with a hard, flat floor and three types of sponges with thicknesses of 3, 5, and 8 cm were utilized. Results We recruited 40 volunteers for these experiments, and the results showed that a thicker flexible material increases the difficulty of roaming and walking within a certain range. Conclusion The virtual climbing system can enhance users' perception of upslope walking in a VE.
Background Intelligent garments, a burgeoning class of wearable devices, have extensive applications in domains such as sports training and medical rehabilitation. Nonetheless, existing research in the smart-wearables domain predominantly emphasizes sensor functionality and quantity, often overlooking crucial aspects of user experience and interaction. Methods To address this gap, this study introduces a novel real-time 3D interactive system based on intelligent garments. The system uses lightweight sensor modules to collect human motion data and introduces a dual-stream fusion network based on pulsed neural units to classify and recognize human movements, thereby achieving real-time interaction between users and sensors. Additionally, the system incorporates 3D human visualization functionality, which renders sensor data and recognized human actions as 3D models in real time, providing accurate and comprehensive visual feedback to help users better understand and analyze the details and features of human motion. The system has significant potential for applications in motion detection, medical monitoring, virtual reality, and other fields, and the accurate classification of human actions contributes to the development of personalized training plans and injury-prevention strategies. Conclusions This study has substantial implications for intelligent garments, human motion monitoring, and digital twin visualization. The advancement of this system is expected to propel the progress of wearable technology and foster a deeper comprehension of human motion.
Research on 3D scene viewpoints has been a frontier problem in computer graphics and virtual reality technology. Since pioneering studies, it has been extensively used in virtual scene understanding, image-based modeling, and visualization computing. With the development of computer graphics and human-computer interaction, viewpoint evaluation has become more significant for the comprehensive understanding of complex scenes. High-quality viewpoints can navigate observers to the region of interest, help subjects discover the hidden relations of a hierarchical structure, and improve the efficiency of virtual exploration. These studies later contributed to research such as robot vision, dynamic scene planning, virtual driving, and artificial intelligence navigation. The introduction of visual perception contributed to the inspiration of viewpoint research, and its combination with machine learning has brought significant progress in viewpoint selection. Viewpoint research has also been significant in the optimization of global lighting, visualization calculation, 3D supervising rendering, and the reconstruction of virtual scenes. Additionally, it has huge potential in novel fields such as 3D model retrieval, virtual tactile analysis, human visual perception research, salient point calculation, ray-tracing optimization, molecular visualization, and intelligent scene computing.
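One classical quantitative criterion behind viewpoint evaluation (not necessarily the one any single cited work uses) is viewpoint entropy, computed from the projected areas of the faces visible from a candidate viewpoint; a minimal sketch:

```python
import numpy as np

def viewpoint_entropy(projected_areas):
    """Viewpoint entropy of a view: -sum(p_i * log2(p_i)) where
    p_i = a_i / A, a_i is the projected area of visible face i, and
    A is the total projected area. Higher entropy means visible detail
    is more evenly distributed, a classical measure of view quality."""
    a = np.asarray(projected_areas, dtype=float)
    a = a[a > 0]                 # faces with zero projection contribute nothing
    p = a / a.sum()
    return float(-(p * np.log2(p)).sum())

balanced = viewpoint_entropy([1, 1, 1, 1])   # all faces equally visible
skewed = viewpoint_entropy([97, 1, 1, 1])    # one face dominates the view
```

A viewpoint selector would sample candidate camera positions on a sphere around the scene and keep the views with the highest entropy.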
Mobile laser scanning (MLS) systems mainly comprise laser scanners and mobile mapping platforms. Typical MLS systems can acquire three-dimensional point clouds with 1-10 cm point spacing at normal driving or walking speed in streets or indoor environments. The efficiency and stability of these systems make them extremely useful for three-dimensional urban modeling. This paper reviews the latest advances in LiDAR-based mobile mapping system (MMS) point clouds for 3D modeling, including LiDAR simultaneous localization and mapping, point cloud registration, feature extraction, object extraction, semantic segmentation, and processing using deep learning. Furthermore, typical urban modeling applications based on MMS are also discussed.
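Of the topics listed, point-cloud registration has a compact core: the least-squares rigid alignment (Kabsch/SVD) used inside ICP-style pipelines. A sketch under the simplifying assumption of known point correspondences (real MLS registration must also establish correspondences, typically by nearest-neighbor search):

```python
import numpy as np

def best_rigid_transform(P, Q):
    """Least-squares rotation R and translation t mapping point set P
    onto Q, where rows are corresponding 3D points (Kabsch algorithm).
    This is the inner step of ICP-style point-cloud registration."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                     # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.linalg.det(Vt.T @ U.T)                 # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cQ - R @ cP

# Synthetic check: rotate and translate a random cloud, then recover it.
rng = np.random.default_rng(2)
P = rng.normal(size=(50, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 1.0])
Q = P @ R_true.T + t_true
R, t = best_rigid_transform(P, Q)
```

Full ICP alternates this closed-form step with re-estimating correspondences until the alignment converges.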
Background To solve the problem of visualizing assembly process information in augmented reality (AR), we report here on our study of the composition of AR assembly process information. Methods Our work led us to classify the visual elements of assembly processes into six categories; after further examining the expression characteristics of visual elements used for assembly process information in the AR environment, we identified standard assembly process elements and studied visual element layout principles. Conclusion Typical visualization elements are presented using an AR-based assembly instruction system.
Although VSLAM/VISLAM has achieved great success, it is still difficult to quantitatively evaluate the localization results of different kinds of SLAM systems from the perspective of augmented reality, due to the lack of an appropriate benchmark. For AR applications in practice, a variety of challenging situations (e.g., fast motion, strong rotation, serious motion blur, dynamic interference) may easily be encountered, since a home user may not move the AR device carefully, and the real environment may be quite complex. In addition, the frequency of camera tracking loss should be minimized, and recovery from the failure status should be fast and accurate for a good AR experience. Existing SLAM datasets/benchmarks generally only provide evaluation of pose accuracy, and their camera motions are somewhat simple and do not fit the common cases in mobile AR applications well. With the above motivation, we build a new visual-inertial dataset as well as a series of evaluation criteria for AR. We also review existing monocular VSLAM/VISLAM approaches with detailed analyses and comparisons. In particular, we select 8 representative monocular VSLAM/VISLAM approaches/systems and quantitatively evaluate them on our benchmark. Our dataset, sample code, and corresponding evaluation tools are available at the benchmark website http://www.zjucvg.net/eval-vislam/.
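A common pose-accuracy criterion in such benchmarks is the absolute trajectory error (ATE). The simplified version below aligns only the mean positions; full ATE also aligns rotation (and scale, for monocular systems) before computing the RMSE:

```python
import numpy as np

def absolute_trajectory_error(est, gt):
    """Simplified ATE: RMSE of translational error after aligning the
    estimated trajectory to ground truth by mean position only.
    `est` and `gt` are (N, 3) arrays of time-synchronized positions.
    The standard metric additionally solves for the best rotation
    (and scale for monocular SLAM) before taking the RMSE."""
    est = est - est.mean(axis=0)
    gt = gt - gt.mean(axis=0)
    return float(np.sqrt(((est - gt) ** 2).sum(axis=1).mean()))

gt = np.array([[0.0, 0, 0], [1, 0, 0], [2, 0, 0]])
est = gt + np.array([0.0, 0.1, 0.0])     # constant lateral offset
ate = absolute_trajectory_error(est, gt)
```

Because the alignment removes any constant offset, this example yields zero error; only shape deviations of the trajectory are penalized, which is exactly why AR benchmarks pair ATE with additional criteria such as tracking-loss frequency and relocalization time.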
Hands play an important role in our daily life. We use our hands for manipulation when working, for emphasis when speaking, for communication in non-verbal settings, and more. Hand gestures are used not only for simple commands, as in traffic control, but also extend to a full language: sign language. In the areas of VR/AR and HCI, understanding the hand and its actions can greatly improve the user experience. This covers a broad range of topics related to hands, including hand detection, tracking, hand pose estimation, gesture recognition, and sign language translation. Four papers are collected in this issue. They cover different topics related to hands and gestures.
Background In recent decades, unmanned aerial vehicles (UAVs) have developed rapidly and been widely applied in many domains, including photography, reconstruction, monitoring, and search and rescue. In such applications, one key issue is path and view planning, which determines exactly where a UAV should fly and how it should search. Methods With specific consideration of three popular UAV applications (scene reconstruction, environment exploration, and aerial cinematography), we present a survey that should assist researchers in positioning and evaluating their work in the context of existing solutions. Results/Conclusions It should also help newcomers and practitioners in related fields quickly gain an overview of the vast literature. In addition to the current research status, we analyze and elaborate on the advantages, disadvantages, and potential explorative trends for each application domain.
文摘Background Generally, it is difficult to obtain accurate pose and depth for a non-rigid moving object from a single RGB camera to create augmented reality (AR). In this study, we build an augmented reality system from a single RGB camera for a non-rigid moving human by accurately computing pose and depth, for which two key tasks are segmentation and monocular Simultaneous Localization and Mapping (SLAM). Most existing monocular SLAM systems are designed for static scenes, while in this AR system, the human body is always moving and non-rigid. Methods In order to make the SLAM system suitable for a moving human, we first segment the rigid part of the human in each frame. A segmented moving body part can be regarded as a static object, and the relative motions between each moving body part and the camera can be considered the motion of the camera. Typical SLAM systems designed for static scenes can then be applied. In the segmentation step of this AR system, we first employ the proposed BowtieNet, which adds the atrous spatial pyramid pooling (ASPP) of DeepLab between the encoder and decoder of SegNet to segment the human in the original frame, and then we use color information to extract the face from the segmented human area. Results Based on the human segmentation results and a monocular SLAM, this system can change the video background and add a virtual object to humans. Conclusions The experiments on the human image segmentation datasets show that BowtieNet obtains state-of-the-art human image segmentation performance and enough speed for real-time segmentation. The experiments on videos show that the proposed AR system can robustly add a virtual object to humans and can accurately change the video background.
文摘Several new models and formats for the digital transformation of the manufacturing industry appear because of the rapid integration of information technology and the real economy,as well as the increasingly obvious evolution trend of industrial digitalization,networking,and intelligence.Among them,digital twins have increasingly become a research hotspot in all sectors of the industry and have broad prospects.It maps physical objects in virtual space in a digital way and simulates their behavioral characteristics in real environments.It makes the gap between virtuality and reality disappear based on their closed-loop interaction.Digital twins are undoubtedly an important and strategic technology in response to familiar products,production,and services.It can also speculate some indicators that cannot be directly measured by machine learning through collecting the direct data of limited physical sensor indicators.This can realize an assessment of the current state,a diagnosis of past problems,and a prediction of future trends,and simulate possibilities to provide more comprehensive decision support.
基金National Natural Science Foundation of China(61632003).
文摘Image-based rendering is important both in the field of computer graphics and computer vision,and it is also widely used in virtual reality technology.For more than two decades,people have done a lot of work on the research of image-based rendering,and these methods can be divided into two categories according to whether the geometric information of the scene is utilized.According to this classification,we introduce some classical methods and representative methods proposed in recent years.We also compare and analyze the basic principles,advantages and disadvantages of different methods.Finally,some suggestions are given for research directions on image-based rendering techniques in the future.
基金the Civil Airplane Technology Development Program(MJ-2017-G-70)Defense Industrial Technology Development Program(JCKY 2018601 C 011)the MIIT(Ministry of Industry and Information Technology)Key Laboratory of Smart Manufacturing for High-end Aerospace Products,and the Beijing Key Laboratory of Digital Design and Manufacturing.
文摘Background Assembly guided by paper documents is the most widespread type used in the process of aircraft cable assembly.This process is very complicated and requires assembly workers with high-level skills.The technologies of wearable Augmented Reality(AR)and portable visual inspection can be exploited to improve the efficiency and the quality of cable assembly.Methods In this study,we propose a smart assistance system for cable assembly that combines wearable AR with portable visual inspection.Specifically,a portable visual device based on binocular vision and deep learning is developed to realize fast detection and recognition of cable brackets that are installed on aircraft airframes.A Convolutional Neural Network(CNN)is then developed to read the texts on cables after images are acquired from the camera of the wearable AR device.An authoring tool that was developed to create and manage the assembly process is proposed to realize visual guidance of the cable assembly process based on a wearable AR device.The system is applied to cable assembly on an aircraft bulkhead prototype.Results The results show that this system can recognize the number,types,and locations of brackets,and can correctly read the text of aircraft cables.The authoring tool can assist users who lack professional programming experience in establishing a process plan,i.e.,assembly outline based on AR for cable assembly.Conclusions The system can provide quick assembly guidance for aircraft cable with texts,images,and a 3 D model.It is beneficial for reducing the dependency on paper documents,labor intensity,and the error rate.
基金the Fundamental Research Funds for the Central Universities(3102019PY004),a start-up funding from Northwestern Polytechnical University.
文摘Visual reality(VR)health-monitoring by flexible electronics provides a new avenue to remote and wearable medicine.The combination of flexible electronics and VR could facilitate smart remote disease diagnosis by real-time monitoring of the physiological signals and remote interaction between patient and physician.The flexible healthcare sensor is the most crucial unit in the flexible and wearable health-monitoring system,which has attracted much attention in recent years.This paper briefly reviews the progress in flexible healthcare sensors and VR healthcare devices.The flexible healthcare sensor is introduced with basic flexible materials,manufacturing techniques,and their applications in health-monitoring(such as blood/sweat detection and heart-rate tracking).VR healthcare devices for telemedicine diagnosis are discussed,and the smart remote diagnosis system using flexible and wearable healthcare sensors,and a VR device,is addressed.
Funding: Supported by the National Natural Science Foundation of China (61872024) and the National Key R&D Program of China under Grant 2018YFB2100603.
Abstract: Background In this study, we propose a novel 3D scene graph prediction approach for scene understanding from point clouds. Methods The approach automatically organizes the entities of a scene in a graph, where objects are nodes and their relationships are modeled as edges. More specifically, we employ DGCNN to capture the features of objects and their relationships in the scene. A Graph Attention Network (GAT) is introduced to exploit latent features obtained from the initial estimation to further refine the object arrangement in the graph structure. A loss function modified from cross-entropy with a variable weight is proposed to solve the multi-category problem in the prediction of objects and predicates. Results Experiments reveal that the proposed approach performs favorably against state-of-the-art methods in terms of predicate classification and relationship prediction, and achieves comparable performance on object classification prediction. Conclusions The 3D scene graph prediction approach can form an abstract description of the scene space from point clouds.
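The idea of a cross-entropy loss with a variable per-class weight, as used above for the imbalanced multi-category object and predicate prediction, can be illustrated with a minimal sketch (the weighting scheme and function shape here are assumptions for illustration, not the authors' code):

```python
import numpy as np

def weighted_cross_entropy(logits, target, class_weights):
    """Cross-entropy over class logits, with the negative log-likelihood of
    the target class scaled by a per-class weight (a common way to counter
    class imbalance; illustrative sketch only)."""
    # Softmax over the class logits (shift by the max for numerical stability).
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    # Scale the target class's negative log-likelihood by its weight.
    return class_weights[target] * -np.log(probs[target])
```

Rare classes (e.g., infrequent predicates) would be assigned larger weights so that their misclassification contributes more to the gradient.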
Funding: Supported by the National Natural Science Foundation of China under Grants 61872241 and 62077037, and the Shanghai Municipal Science and Technology Major Project under Grant 2021SHZDZX0102.
Abstract: Background Monocular depth estimation aims to predict a dense depth map from a single RGB image and has important applications in 3D reconstruction, autonomous driving, and augmented reality. However, existing methods feed the original RGB image directly into the model to extract depth features without avoiding the interference of depth-irrelevant information on depth-estimation accuracy, which leads to inferior performance. Methods To remove the influence of depth-irrelevant information and improve depth-prediction accuracy, we propose RADepthNet, a novel reflectance-guided network that fuses boundary features. Specifically, our method predicts depth maps in the following three steps: (1) Intrinsic image decomposition. We propose a reflectance extraction module consisting of an encoder-decoder structure to extract the depth-related reflectance. Through an ablation study, we demonstrate that the module can reduce the influence of illumination on depth estimation. (2) Boundary detection. A boundary extraction module, consisting of an encoder, a refinement block, and an upsampling block, is proposed to better predict the depth at object boundaries by utilizing gradient constraints. (3) Depth prediction. We use an encoder different from that in (2) to obtain depth features from the reflectance map and fuse boundary features to predict depth. In addition, we propose FIFADataset, a depth-estimation dataset for soccer scenarios. Results Extensive experiments on a public dataset and our proposed FIFADataset show that our method achieves state-of-the-art performance.
Funding: Supported by the National Natural Science Foundation of China (U19A2082, 61961160705, 61901077), the National Key Research and Development Plan of China (2017YFB1002501), and the Key R&D Program of Guangdong Province, China (2018B030339001).
Abstract: Background As a novel approach for people to communicate directly with an external device, the study of brain-computer interfaces (BCIs) has become well developed. However, similar to real-world scenarios, where individuals are expected to work in groups, BCI systems should be able to replicate group attributes. Methods We propose a fourth-order cumulants feature extraction method (CUM4-CSP) based on the common spatial patterns (CSP) algorithm. Simulation experiments conducted using motion visual evoked potential (mVEP) EEG data verified the robustness of the proposed algorithm. In addition, to allow paradigms to be chosen freely, we adopted the mVEP and steady-state visual evoked potential (SSVEP) paradigms and designed a multimodal collaborative BCI system based on the proposed CUM4-CSP algorithm. The feasibility of the proposed multimodal collaborative system framework was demonstrated using a multiplayer game-control system that simultaneously facilitates coordinated and competitive control of external devices by two users. To verify the robustness of the proposed scheme, we recruited 30 subjects to conduct online game-control experiments, and the results were statistically analyzed. Results The simulation results prove that the proposed CUM4-CSP algorithm has good noise immunity. The online experimental results indicate that the subjects could reliably perform the game confrontation operation with the selected BCI paradigm. Conclusions The proposed CUM4-CSP algorithm can effectively extract features from EEG data in a noisy environment. Additionally, the proposed scheme may provide a new solution for EEG-based group BCI research.
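The appeal of fourth-order statistics for noisy EEG, as exploited by CUM4-CSP above, can be sketched with the fourth-order cumulant of a zero-mean signal, c4 = E[x^4] - 3 E[x^2]^2, which vanishes for Gaussian noise (this is a generic illustration of the statistic, not the authors' spatial-filtering pipeline):

```python
import numpy as np

def cum4_feature(x):
    """Fourth-order cumulant of a 1D signal: c4 = E[x^4] - 3 E[x^2]^2
    after mean removal. For Gaussian noise c4 is approximately zero,
    which makes fourth-order statistics attractive in noisy settings."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()            # enforce zero mean
    m2 = np.mean(x ** 2)        # second moment (variance)
    m4 = np.mean(x ** 4)        # fourth moment
    return m4 - 3.0 * m2 ** 2
```

A non-Gaussian waveform (e.g., a binary ±1 signal, with c4 = 1 - 3 = -2) retains a strong cumulant, while additive Gaussian noise contributes almost nothing, so features built on c4 tend to be more noise-immune than variance-based ones.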
Abstract: Background Within a virtual environment (VE), the control of locomotion (e.g., self-travel) is critical for creating a realistic and functional experience. Usually, the direction of locomotion while using a head-mounted display (HMD) is determined by the direction the head is pointing, and forward or backward motion is controlled with a handheld controller. However, handheld devices can be difficult to use while the eyes are covered with an HMD. Free-hand gestures, tracked with a camera or a hand data glove, have the advantage of eliminating the need to look at the hand controller, but the design of hand or finger gestures for this purpose has not been well developed. Methods This study used a depth-sensing camera to track fingertip location (curling and straightening the fingers), which was converted to forward or backward self-travel in the VE. Fingertip position was converted to self-travel velocity using a mapping function with three parameters: a region of zero velocity (dead zone) around the relaxed hand position, a linear relationship of fingertip position to velocity (slope, or β) beginning at the edge of the dead zone, and an exponential rather than linear relationship mapping fingertip position to velocity (exponent). Using an HMD, participants moved forward along a virtual road and stopped at a target on the road by controlling self-travel velocity with finger flexion and extension. Each of the three mapping-function parameters was tested at three levels. Outcome measures included usability ratings, fatigue, nausea, and time to complete the tasks. Results Twenty subjects participated, but five did not complete the study due to nausea. The size of the dead zone had little effect on performance or usability. Subjects preferred lower β values, which were associated with better subjective ratings of control and reduced time to complete the task, especially for large targets. Exponent values of 1.0 or greater were preferred and reduced the time to complete the task, especially for small targets. Conclusions Small finger movements can be used to control the velocity of self-travel in a VE. The functions used for converting fingertip position to movement velocity influence usability and performance.
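The three-parameter mapping described above (dead zone, slope β, exponent) can be sketched as follows; the function name, units, and default parameter values are illustrative assumptions, not taken from the paper:

```python
def fingertip_to_velocity(position, dead_zone=0.02, beta=1.0, exponent=1.0):
    """Map fingertip displacement from the relaxed hand position to
    self-travel velocity: zero inside the dead zone, then a response
    proportional to beta times the excess displacement raised to the
    exponent. Sign encodes forward (extension) vs. backward (flexion)."""
    magnitude = abs(position)
    if magnitude <= dead_zone:
        return 0.0                              # inside dead zone: no travel
    # Distance beyond the dead-zone edge, shaped by the exponent, scaled by beta.
    velocity = beta * (magnitude - dead_zone) ** exponent
    return velocity if position > 0 else -velocity
```

With exponent = 1.0 the mapping is linear beyond the dead zone; exponents above 1.0 flatten the response near the dead zone (fine control for stopping on a target) while preserving high velocity at large displacements, consistent with the preference for exponents of 1.0 or greater reported above.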
Funding: Supported by the Key R&D Program of the Ministry of Science and Technology (2019YFC0810704), the Key R&D Program of Guangdong Province (2019B111102002), and the Shenzhen Science and Technology Program (KCXFZ202002011007040).
Abstract: Background This work emphasizes the current research status of urban Digital Twins to establish an intelligent spatiotemporal framework. A Geospatial Artificial Intelligence (GeoAI) system is developed based on a Geographic Information System and Artificial Intelligence. It integrates multi-video technology and the Virtual City in urban Digital Twins. Methods In addition, an improved small-object detection model, YOLOv5-Pyramid, is proposed, and Siamese-network video tracking models, namely MPSiam and FSSiamese, are established. Finally, an experimental platform is built to verify the georeferencing correction scheme for video images. Results The Multiply-Accumulate value of MPSiam is 0.5B, and that of ResNet50-Siam is 4.5B; thus, the model is compressed by 4.8 times. The inference speed has increased by 3.3 times, reaching 83 Frames Per Second, while only 3% of the Average Expectation Overlap is lost. Therefore, the urban Digital Twins-oriented GeoAI framework established here has excellent performance for video georeferencing and target detection problems.
Funding: Supported by the National Key R&D Program of China (2018YFB1404100), the National Natural Science Foundation of China (62072405), and the Zhejiang Provincial Natural Science Foundation of China (LGF20F020017).
Abstract: Background In virtual environments (VEs), users can explore a large virtual scene through the viewpoint operation of a head-mounted display (HMD) and movement gains combined with redirected walking technology. The existing redirection methods and viewpoint operations are effective in the horizontal direction; however, they cannot help participants experience immersion in the vertical direction. To improve the immersion of upslope walking, this study presents a virtual climbing system based on passive haptics. Methods This virtual climbing system uses the tactile feedback provided by sponges, a commonly used flexible material, to simulate the tactile sense of a user's soles. In addition, the visual stimulus of the HMD, the tactile feedback of the flexible material, and the operation of the user's walking in a VE combined with redirection technology are all adopted to enhance the user's perception in a VE. In the experiments, a physical space with a hard flat floor and three types of sponges with thicknesses of 3, 5, and 8 cm were utilized. Results We recruited 40 volunteers to conduct these experiments, and the results showed that a thicker flexible material increases the difficulty for users to roam and walk within a certain range. Conclusion The virtual climbing system can enhance users' perception of upslope walking in a VE.
Funding: Supported by the National Natural Science Foundation of China (62202346), the Hubei Key Research and Development Program (2021BAA042), the Open Project of the Engineering Research Center of Hubei Province for Clothing Information (2022HBCI01), the Wuhan Applied Basic Frontier Research Project (2022013988065212), MIIT's AI Industry Innovation Task Unveils Flagship Projects (Key Technologies, Equipment, and Systems for Flexible Customized and Intelligent Manufacturing in the Clothing Industry), and the Hubei Science and Technology Project of the Safe Production Special Fund (Scene Control Platform Based on Proprioception Information Computing of Artificial Intelligence).
Abstract: Background Intelligent garments, a burgeoning class of wearable devices, have extensive applications in domains such as sports training and medical rehabilitation. Nonetheless, existing research in the smart-wearables domain predominantly emphasizes sensor functionality and quantity, often overlooking crucial aspects of user experience and interaction. Methods To address this gap, this study introduces a novel real-time 3D interactive system based on intelligent garments. The system uses lightweight sensor modules to collect human motion data and introduces a dual-stream fusion network based on pulsed neural units to classify and recognize human movements, thereby achieving real-time interaction between users and sensors. Additionally, the system incorporates 3D human visualization functionality, which visualizes sensor data and recognized human actions as 3D models in real time, providing accurate and comprehensive visual feedback to help users better understand and analyze the details and features of human motion. The system has significant potential for applications in motion detection, medical monitoring, virtual reality, and other fields. The accurate classification of human actions contributes to the development of personalized training plans and injury-prevention strategies. Conclusions This study has substantial implications for the domains of intelligent garments, human motion monitoring, and digital twin visualization. The advancement of this system is expected to propel the progress of wearable technology and foster a deeper comprehension of human motion.
Funding: Supported by the Beijing Advanced Innovation Center for Imaging Technology (BAICIT-2016024).
Abstract: Research on 3D scene viewpoints has been a frontier problem in computer graphics and virtual reality technology. Pioneering studies applied it extensively in virtual scene understanding, image-based modeling, and visualization computing. With the development of computer graphics and human-computer interaction, viewpoint evaluation has become more significant for the comprehensive understanding of complex scenes. High-quality viewpoints can navigate observers to regions of interest, help subjects discover the hidden relations of hierarchical structures, and improve the efficiency of virtual exploration. These studies later contributed to research areas such as robot vision, dynamic scene planning, virtual driving, and artificial intelligence navigation. The introduction of visual perception contributed to the inspiration of viewpoint research, and its combination with machine learning led to significant progress in viewpoint selection. Viewpoint research has also been significant in the optimization of global lighting, visualization calculation, 3D supervised rendering, and the reconstruction of virtual scenes. Additionally, it has huge potential in novel fields such as 3D model retrieval, virtual tactile analysis, human visual perception research, salient point calculation, ray tracing optimization, molecular visualization, and intelligent scene computing.
Abstract: Mobile laser scanning (MLS) systems mainly comprise laser scanners and mobile mapping platforms. Typical MLS systems can acquire three-dimensional point clouds with 1-10 cm point spacing at a normal driving or walking speed in streets or indoor environments. The efficiency and stability of these systems make them extremely useful for three-dimensional urban modeling. This paper reviews the latest advances of the LiDAR-based mobile mapping system (MMS) point cloud in the field of 3D modeling, including LiDAR simultaneous localization and mapping, point cloud registration, feature extraction, object extraction, semantic segmentation, and processing using deep learning. Furthermore, typical urban modeling applications based on MMS are also discussed.
Funding: Supported by the Industrial Technology Development Program (JCKY2016204A502).
Abstract: Background To solve the problem of visualizing assembly process information in augmented reality (AR), we report here on our study of the composition of AR assembly process information. Methods Our work led us to classify the visual elements of assembly processes into six categories; after looking further into the expression characteristics of visual elements used in assembly process information in the AR environment, standard assembly process elements were identified and visual-element layout principles were studied. Conclusion Typical visualization elements are presented using an AR-based assembly instruction system.
Funding: Supported by the National Key Research and Development Program of China (2016YFB1001501), NSF of China (61672457), the Fundamental Research Funds for the Central Universities (2018FZA5011), and the Zhejiang University-SenseTime Joint Lab of 3D Vision.
Abstract: Although VSLAM/VISLAM has achieved great success, it is still difficult to quantitatively evaluate the localization results of different kinds of SLAM systems from the perspective of augmented reality, due to the lack of an appropriate benchmark. For AR applications in practice, a variety of challenging situations (e.g., fast motion, strong rotation, serious motion blur, dynamic interference) may easily be encountered, since a home user may not move the AR device carefully, and the real environment may be quite complex. In addition, the frequency of camera loss should be minimized, and recovery from the failure status should be fast and accurate for a good AR experience. Existing SLAM datasets/benchmarks generally only provide the evaluation of pose accuracy, and their camera motions are somewhat simple and do not fit the common cases in mobile AR applications well. With the above motivation, we build a new visual-inertial dataset, as well as a series of evaluation criteria, for AR. We also review the existing monocular VSLAM/VISLAM approaches with detailed analyses and comparisons. In particular, we select 8 representative monocular VSLAM/VISLAM approaches/systems and quantitatively evaluate them on our benchmark. Our dataset, sample code, and corresponding evaluation tools are available at the benchmark website http://www.zjucvg.net/eval-vislam/.
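One standard ingredient of the pose-accuracy evaluation mentioned above is the translational Absolute Trajectory Error (ATE). A simplified sketch follows, aligning trajectories by centroid offset only (full benchmarks, including this one, typically also solve for rotation and, for monocular systems, scale; this is not the benchmark's own evaluation code):

```python
import numpy as np

def absolute_trajectory_error(est, gt):
    """RMSE of translational trajectory error after aligning the estimated
    positions to the ground truth by the centroid offset (simplified
    alignment for illustration)."""
    est = np.asarray(est, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Translate the estimated positions onto the ground-truth centroid.
    aligned = est - est.mean(axis=0) + gt.mean(axis=0)
    # Per-frame Euclidean position errors, reduced to a single RMSE value.
    residuals = np.linalg.norm(aligned - gt, axis=1)
    return float(np.sqrt(np.mean(residuals ** 2)))
```

AR-oriented criteria such as those proposed in this benchmark go beyond a single RMSE number, also scoring robustness aspects like tracking loss and relocalization time.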
Abstract: Hands play an important role in our daily life. We use our hands for manipulation in working, emphasis in speaking, communication in non-verbal environments, and so on. Hand gestures are not only used for simple commands in traffic control but also extend to a kind of language: sign language. In the areas of VR/AR and HCI, understanding the hand and its actions can greatly improve the user experience. This covers a broad range of topics related to hands, including hand detection, tracking, hand pose estimation, gesture recognition, and sign language translation. Four papers are collected in this issue. They cover different topics related to hands and gestures.
Funding: Supported by LHTD (20170003) and the Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ).
Abstract: Background In recent decades, unmanned aerial vehicles (UAVs) have developed rapidly and been widely applied in many domains, including photography, reconstruction, monitoring, and search and rescue. In such applications, one key issue is path and view planning, which tells UAVs exactly where to fly and how to search. Methods With specific consideration of three popular UAV applications (scene reconstruction, environment exploration, and aerial cinematography), we present a survey that should assist researchers in positioning and evaluating their works in the context of existing solutions. Results/Conclusions It should also help newcomers and practitioners in related fields quickly gain an overview of the vast literature. In addition to the current research status, we analyze and elaborate on the advantages, disadvantages, and potential exploratory trends for each application domain.