Funding: National Natural Science Foundation of China, Grant/Award Number: 62106177; supported by the Central University Basic Research Fund of China (No. 2042020KF0016); supported by the supercomputing system in the Supercomputing Center of Wuhan University.
Abstract: The goal of street-to-aerial cross-view image geo-localization is to determine the location of a query street-view image by retrieving the aerial-view image of the same place. The drastic viewpoint and appearance gap between aerial-view and street-view images poses a major challenge for this task. In this paper, we propose a novel multiscale attention encoder to capture the multiscale contextual information of the aerial-view and street-view images. To bridge the domain gap between the two views, we first use an inverse polar transform to make the street-view images approximately aligned with the aerial-view images. Then, the proposed multiscale attention encoder converts each image into a feature representation under the guidance of the learnt multiscale information. Finally, we propose a novel global mining strategy that enables the network to pay more attention to hard negative exemplars. Experiments on standard benchmark datasets show that our approach obtains an 81.39% top-1 recall rate on the CVUSA dataset and 71.52% on the CVACT dataset, achieving state-of-the-art performance and significantly outperforming most existing methods.
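The top-1 recall rate reported above is a retrieval metric. The following is a minimal sketch of how such a metric can be computed, not the authors' evaluation code. It assumes a similarity matrix in which row i scores street-view query i against every aerial reference, and that the true match of query i is reference i; the function name `recall_at_k` and the toy values are illustrative only.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int = 1) -> float:
    """Fraction of queries whose true match is among the k most similar references.

    Assumption: similarity[i][j] scores street-view query i against aerial
    reference j, and the true match of query i is reference i.
    """
    if not similarity:
        raise ValueError("similarity matrix is empty")
    hits = 0
    for i, row in enumerate(similarity):
        true_score = row[i]
        # Count references that score strictly higher than the true match.
        better = sum(1 for score in row if score > true_score)
        if better < k:
            hits += 1
    return hits / len(similarity)


# Toy similarity matrix for three queries (illustrative values only).
toy_similarity = [
    [0.9, 0.1, 0.3],  # query 0: true match ranked first
    [0.8, 0.5, 0.2],  # query 1: reference 0 outranks the true match
    [0.1, 0.2, 0.7],  # query 2: true match ranked first
]
top1 = recall_at_k(toy_similarity, k=1)
top2 = recall_at_k(toy_similarity, k=2)
```

In this toy case two of three queries are retrieved correctly at rank 1, and all three within rank 2.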
Funding: Supported by the Knowledge Innovation Program of the Chinese Academy of Sciences.
Abstract: Associating the agricultural market names found on web sites with their locations is essential for geographical analysis of agricultural products. In this paper, an algorithm that employs an administrative ontology and statistics from search results is proposed. Experiments were conducted with 100 market names collected from web sites. The experimental results demonstrate that the proposed algorithm performs satisfactorily on this problem, which verifies the effectiveness of the method.
Abstract: Support Vector Machine (SVM) is a powerful methodology for solving problems in non-linear classification, function estimation and density estimation, and it has led to many other recent developments in kernel-based methods in general. This paper presents a high-accuracy and fault-tolerant SVM for the mobile geo-location problem, which is an important component of pervasive computing. Simulation results show its basic location performance and illustrate the impact of the number of training samples and the training area on the test location error.
Abstract: In a Multiple-Input Multiple-Output (MIMO) Orthogonal Frequency Division Multiplexing (OFDM) based Wireless Local Area Network (WLAN) system, both Access Points (APs) and Mobile Terminals (MTs) are configured with multiple antennas, which makes novel indoor geo-location methods possible. In this paper, we present a novel Least Square Support Vector Machine (LS-SVM) based data fusion algorithm that fuses signal strength measurements for indoor geo-location using only a single AP with MIMO arrays. We evaluate the proposed algorithm in indoor environments through MATLAB simulations. Simulation results show that our MIMO-based algorithm is superior to the conventional least square algorithm.
Abstract: Forecasting plays a vital role in modern economic and industrial fields, and tourism demand forecasting is an important part of intelligent tourism. This paper proposes a simple method for data modeling and a combined cross-view model, which is easy to implement but very effective. The method can be used with both BPNN and SVR algorithms. A real tourism data set from the Small Wild Goose Pagoda is used to verify the feasibility of the proposed method, together with an analysis of the impact of year, season, and week on tourism demand forecasting. Comparative experiments suggest that the proposed model is more accurate than the compared methods.
Abstract: Remarkable progress has been made in self-supervised monocular depth estimation (SS-MDE) by exploring cross-view consistency, e.g., photometric consistency and 3D point cloud consistency. However, these forms of consistency are very vulnerable to illumination variance, occlusions, texture-less regions, and moving objects, making them not robust enough to deal with various scenes. To address this challenge, we study two kinds of robust cross-view consistency in this paper. Firstly, the spatial offset field between adjacent frames is obtained by reconstructing the reference frame from its neighbors via deformable alignment, and it is used to align the temporal depth features via a depth feature alignment (DFA) loss. Secondly, the 3D point clouds of each reference frame and its nearby frames are calculated and transformed into voxel space, where the point density in each voxel is calculated and aligned via a voxel density alignment (VDA) loss. In this way, we exploit the temporal coherence in both depth feature space and 3D voxel space for SS-MDE, shifting the “point-to-point” alignment paradigm to the “region-to-region” one. Compared with the photometric consistency loss and the rigid point cloud alignment loss, the proposed DFA and VDA losses are more robust owing to the strong representation power of deep features and the high tolerance of voxel density to the aforementioned challenges. Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques. An extensive ablation study and analysis validate the effectiveness of the proposed losses, especially in challenging scenes. The code and models are available at https://github.com/sunnyHelen/RCVC-depth.
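The "region-to-region" idea behind the voxel density alignment can be illustrated with a small sketch. This is not the released RCVC-depth code: the helper names `voxel_density` and `density_alignment_loss`, the normalization by the total point count, and the L1 form of the loss are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxel_density(points: Iterable[Point], voxel_size: float) -> Dict[Voxel, float]:
    """Map a point cloud to normalized per-voxel point densities."""
    counts = Counter(
        (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        for x, y, z in points
    )
    total = sum(counts.values())
    return {voxel: n / total for voxel, n in counts.items()} if total else {}


def density_alignment_loss(a: Dict[Voxel, float], b: Dict[Voxel, float]) -> float:
    """L1 distance between two voxel density maps (hypothetical loss form)."""
    return sum(abs(a.get(v, 0.0) - b.get(v, 0.0)) for v in set(a) | set(b))


# Reference cloud and two neighbor clouds (toy values).
reference = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (1.5, 0.2, 0.1), (1.6, 0.1, 0.3)]
jittered = [(0.3, 0.2, 0.2), (0.4, 0.1, 0.6), (1.2, 0.4, 0.3), (1.9, 0.3, 0.2)]
shifted = [(0.3, 0.2, 0.2), (1.4, 0.1, 0.6), (1.2, 0.4, 0.3), (1.9, 0.3, 0.2)]

ref_density = voxel_density(reference, voxel_size=1.0)
loss_jittered = density_alignment_loss(ref_density, voxel_density(jittered, 1.0))
loss_shifted = density_alignment_loss(ref_density, voxel_density(shifted, 1.0))
```

Points that move only within their voxel leave the density map unchanged, so `loss_jittered` is zero, whereas moving a point into another voxel produces a nonzero `loss_shifted`. A point-to-point loss would penalize both cases.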
Abstract: Emergency ambulance services in the UK are tasked with providing pre-hospital patient care and clinical services within a target response time, measured from call connection to on-scene attendance. In 2017, NHS England introduced four new response time categories based on patient needs. The most challenging is to be on-scene for a life-threatening situation within seven minutes of the call being connected, when such calls are random in time and place throughout a large territory. Recent evidence indicates that emergency ambulance services regularly fall short of the target ambulance response times set by the National Health Service (NHS). To achieve these targets, they need to undertake transformational change and apply statistical, operations research and artificial intelligence techniques in the form of five separate modules covering demand forecasting, plus the location, allocation, dispatch, and monitoring and re-deployment of resources. These modules should be linked in real time through a data warehouse to minimise computational data and generate accurate, meaningful and timely decisions, ensuring that patients receive an appropriate and timely response. A simulation covering a limited geographical area, time period and set of operational data concluded that this form of integration of the five modules provides accurate and timely data upon which to make decisions that effectively improve ambulance response times.
Funding: This research was funded by the Science and Technology Support Plan Project of Hebei Province (grant numbers 17210803D and 19273703D), the Science and Technology Spark Project of the Hebei Seismological Bureau (grant number DZ20180402056), the Education Department of Hebei Province (grant number QN2018095), and the Polytechnic College of Hebei University of Science and Technology.
Abstract: The appearance of pedestrians can vary greatly from image to image, and different pedestrians may look similar in a given image. Such similarities and variabilities in the appearance and clothing of individuals make pedestrian re-identification very challenging. Here, a pedestrian re-identification method based on the fusion of local features and gait energy image (GEI) features is proposed. In this method, the human body is divided into four regions according to joint points. The color and texture of each region are extracted as local features, and GEI features of the pedestrian's gait are also obtained. The local and GEI features of each person are then fused. Independent distance metric learning with the cross-view quadratic discriminant analysis (XQDA) method is used to obtain the similarity of image pairs, and the final similarity is acquired by weighted matching. Evaluation of the experimental results by cumulative matching characteristic (CMC) curves reveals that, after fusion of local and GEI features, pedestrian re-identification improves compared with existing methods and is notably better than re-identification with a single feature.
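The cumulative matching characteristic (CMC) evaluation mentioned above can be sketched as follows. This is an illustrative computation under stated assumptions, not the authors' evaluation script: `distances[q][g]` is taken to be the final fused distance between query q and gallery image g, identities are plain integers, and the function name `cmc_curve` and the toy data are hypothetical.

```python
from typing import List, Sequence


def cmc_curve(distances: Sequence[Sequence[float]],
              query_ids: Sequence[int],
              gallery_ids: Sequence[int],
              max_rank: int) -> List[float]:
    """CMC[r-1] is the share of queries whose first correct gallery match
    appears within the top r results."""
    hits_at_rank = [0] * max_rank
    for q, row in enumerate(distances):
        # Sort gallery indices by ascending distance (most similar first).
        order = sorted(range(len(row)), key=lambda g: row[g])
        for rank, g in enumerate(order[:max_rank]):
            if gallery_ids[g] == query_ids[q]:
                hits_at_rank[rank] += 1
                break
    curve, running = [], 0
    for h in hits_at_rank:
        running += h
        curve.append(running / len(distances))
    return curve


# Two queries against a gallery of three identities (toy distances).
toy_distances = [
    [0.2, 0.5, 0.9],  # query identity 1: correct match is closest
    [0.3, 0.4, 0.1],  # query identity 2: correct match is third closest
]
curve = cmc_curve(toy_distances, query_ids=[1, 2], gallery_ids=[1, 2, 3], max_rank=3)
```

Here the curve starts at 0.5 for rank 1 and reaches 1.0 at rank 3, since the second query finds its match only at the third position.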
Abstract: Consider the geo-localization task of finding the pose of a camera in a large 3D scene from a single image. Most existing CNN-based methods use textured images as input. We aim to explore experimentally whether texture and correlation between nearby images are necessary in a CNN-based solution to the geo-localization task. To do so, we consider lean images: textureless projections of a simple 3D model of a city. They contain only information related to the geometry of the scene viewed (edges, faces, and relative depth). The main contributions of this paper are: (i) to demonstrate the ability of CNNs to recover camera pose using lean images; and (ii) to provide insight into the role of geometry in the CNN learning process.
Abstract: Numerous news and event pictures are taken and shared on the internet every day, and they hold abundant information worth mining, but only a small fraction of them are geotagged. The visual content of a news image offers clues to its geographical location because such images are usually taken at the site of the incident, which provides a prerequisite for geo-localization. This paper proposes an automated deep learning pipeline for the geo-localization of news pictures in a large-scale urban environment, using geotagged street view images as a reference dataset. The approach obtains location information by constructing an attention-based feature extraction network. Then, the image features are aggregated, and the candidate street view images are retrieved with the selective matching kernel function. Finally, the coordinates of the news images are estimated by the kernel density prediction method. The pipeline is tested on news pictures from Hong Kong. In the comparison experiments, the proposed pipeline shows stable performance and generalizability in the large-scale urban environment. In addition, the performance analysis of the pipeline's components shows the ability to recognize localization features in partial areas of pictures and the effectiveness of the proposed solution for news picture geo-localization.
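The final step, estimating coordinates from the retrieved candidates by kernel density, might look roughly like the sketch below. The abstract does not specify the kernel, bandwidth, or coordinate system, so everything here is an assumption: a Gaussian kernel, planar coordinates in metres in a local frame, a hypothetical `kde_estimate` helper, and an estimate taken as the candidate location with the highest density.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float]


def kde_estimate(candidates: List[Coord], bandwidth: float) -> Coord:
    """Return the candidate coordinate with the highest Gaussian kernel density."""
    def density(p: Coord) -> float:
        # Sum of Gaussian kernels centred on every candidate location.
        return sum(
            math.exp(-((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2) / (2 * bandwidth ** 2))
            for c in candidates
        )
    return max(candidates, key=density)


# Locations of retrieved street view candidates (toy planar coordinates, metres).
retrieved = [(100.0, 200.0), (102.0, 201.0), (98.0, 199.0), (400.0, 50.0)]
estimate = kde_estimate(retrieved, bandwidth=10.0)
```

The three clustered candidates reinforce each other, so the estimate falls inside the cluster and the isolated fourth candidate has little influence. This is the practical benefit of a density-based vote over trusting the single best retrieval.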
Funding: Co-supported by the National Natural Science Foundations of China (Nos. 62175111 and 62001234).
Abstract: This work addresses the matching of remote sensing images taken by an unmanned aerial vehicle (UAV) with satellite remote sensing images that carry geolocation information, so that the specific geographic location of the target object captured by the UAV can be determined. The main challenge is the considerable difference in the visual content of remote sensing images acquired by satellites and UAVs, such as dramatic changes in viewpoint and unknown orientations. Much of the previous work has focused on image matching of homologous data. To overcome the difficulties caused by the difference between these two data modes and to maintain robustness in visual positioning, a quality-aware template matching method based on scale-adaptive deep convolutional features is proposed, which deeply mines the features common to both. The template-size feature map and the reference image feature map are first obtained. The two feature maps are then used to measure similarity. Finally, a heat map representing the probability of matching is generated to determine the best match in the reference image. The method is applied to the latest UAV-based geolocation dataset (the University-1652 dataset) and to real-scene campus data that we collected with UAVs. The experimental results demonstrate the effectiveness and superiority of the method.
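The matching stage described above (compare a template feature map with a reference feature map, then build a heat map of match probabilities) can be sketched in a simplified form. This is a generic sliding-window illustration, not the paper's quality-aware method: small 2D grids stand in for deep convolutional feature maps, and the cosine similarity, the softmax normalization, and the names `match_heat_map` and `best_match` are assumptions.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def match_heat_map(template: Grid, reference: Grid) -> Grid:
    """Slide the template over the reference and return a softmax-normalized
    heat map of cosine similarities, one value per window position."""
    th, tw = len(template), len(template[0])
    rh, rw = len(reference), len(reference[0])
    t_flat = [v for row in template for v in row]
    t_norm = math.sqrt(sum(v * v for v in t_flat)) or 1.0
    scores: Grid = []
    for i in range(rh - th + 1):
        score_row = []
        for j in range(rw - tw + 1):
            window = [reference[i + di][j + dj] for di in range(th) for dj in range(tw)]
            w_norm = math.sqrt(sum(v * v for v in window)) or 1.0
            dot = sum(a * b for a, b in zip(t_flat, window))
            score_row.append(dot / (t_norm * w_norm))
        scores.append(score_row)
    # Softmax over all positions so the map sums to 1.
    exps = [[math.exp(s) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def best_match(heat_map: Grid) -> Tuple[int, int]:
    """Row and column of the highest-probability window."""
    return max(
        ((i, j) for i, row in enumerate(heat_map) for j, _ in enumerate(row)),
        key=lambda ij: heat_map[ij[0]][ij[1]],
    )


# Toy feature maps: the template appears in the reference at row 1, column 1.
template = [[1.0, 2.0], [3.0, 4.0]]
reference = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 2.0, 0.0],
    [0.0, 3.0, 4.0, 0.0],
]
heat = match_heat_map(template, reference)
peak = best_match(heat)
```

The peak of the heat map lands on the window that contains the template exactly. In practice the feature maps would come from a convolutional network and the window scan would be vectorized.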
Funding: Partially supported by the National Natural Science Foundation of China (No. 61872317) and Face Unity Technology.
Abstract: We present a multiview method for markerless motion capture of multiple people. The main challenge in this problem is to determine cross-view correspondences for the 2D joints in the presence of noise. We propose a 3D hypothesis clustering technique to solve this problem. The core idea is to transform joint matching in 2D space into a clustering problem in a 3D hypothesis space. In this way, evidence from photometric appearance, multiview geometry, and bone length can be integrated to solve the clustering problem efficiently and robustly. Each cluster encodes a set of matched 2D joints for the same person across different views, from which the 3D joints can be effectively inferred. We then assemble the inferred 3D joints to form full-body skeletons for all persons in a bottom-up way. Our experiments demonstrate the robustness of our approach even in challenging cases with heavy occlusion, closely interacting people, and few cameras. We have evaluated our method on many datasets, and our results show that it has significantly lower estimation errors than many state-of-the-art methods.