Funding: This research work is supported by the Deputyship of Research & Innovation, Ministry of Education in Saudi Arabia (Grant Number 758).
Abstract: Visual motion segmentation (VMS) is a key component of many intelligent crowd systems. It can be used to analyze flow behavior through a crowd and to spot unusual, life-threatening incidents such as crowd stampedes and crushes, which pose a serious risk to public safety and have caused numerous fatalities over the past few decades. Trajectory clustering has become one of the most popular methods in VMS. However, the complexity of the data, such as large numbers of samples and parameters, makes it difficult for trajectory clustering to produce accurate motion segmentation results. This study introduces a spatial-angular stacked sparse autoencoder model (SA-SSAE) with l2 regularization and softmax, a deep learning method for visual motion segmentation that groups similar motion patterns into the same cluster. The proposed model can extract meaningful high-level features using only spatial-angular features obtained from refined tracklets (a.k.a. 'trajectories'). We adopt l2 regularization and sparsity regularization to learn sparse feature representations and guarantee the sparsity of the autoencoders, and we employ a softmax layer to map data points to accurate cluster representations. A key advantage of the SA-SSAE framework is that it can handle VMS even when individuals move around randomly, clustering the motion patterns effectively and with higher accuracy. We also put forward a new dataset of 21 crowd videos with manually annotated ground truth. Experiments conducted on two crowd benchmarks demonstrate that the proposed model groups trajectories more accurately than the traditional clustering approaches used in previous studies. On the CUHK dataset, the proposed SA-SSAE framework achieved a 0.11 improvement in accuracy and a 0.13 improvement in F-measure over the best existing method.
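The abstract does not include an implementation, so the following is a minimal, hypothetical PyTorch sketch of the pipeline it describes: two stacked sparse autoencoder stages over spatial-angular tracklet features, l2 regularization applied via weight decay, a KL-divergence sparsity penalty on the codes, and a softmax layer for cluster assignment. The layer sizes, sparsity target, loss weights, and the four-dimensional feature vector are illustrative assumptions, not the authors' configuration.

```python
# Hypothetical sketch of a stacked sparse autoencoder with an L2-regularized
# softmax clustering head, loosely following the SA-SSAE description above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoderStage(nn.Module):
    """One autoencoder stage: encode to a code, decode back to the input."""
    def __init__(self, in_dim, code_dim):
        super().__init__()
        self.encoder = nn.Linear(in_dim, code_dim)
        self.decoder = nn.Linear(code_dim, in_dim)

    def forward(self, x):
        code = torch.sigmoid(self.encoder(x))
        recon = self.decoder(code)
        return code, recon

def sparsity_penalty(code, rho=0.05):
    """KL-divergence sparsity penalty on each unit's mean activation."""
    rho_hat = code.mean(dim=0).clamp(1e-6, 1 - 1e-6)
    return (rho * torch.log(rho / rho_hat)
            + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()

class SASSAE(nn.Module):
    """Two stacked sparse stages plus a softmax layer for cluster assignment."""
    def __init__(self, feat_dim=4, hidden=(64, 32), n_clusters=5):
        super().__init__()
        self.stage1 = SparseAutoencoderStage(feat_dim, hidden[0])
        self.stage2 = SparseAutoencoderStage(hidden[0], hidden[1])
        self.softmax_head = nn.Linear(hidden[1], n_clusters)

    def forward(self, x):
        code1, recon1 = self.stage1(x)
        code2, recon2 = self.stage2(code1)
        assignments = F.softmax(self.softmax_head(code2), dim=1)
        return assignments, (code1, recon1), (code2, recon2)

# One training step: reconstruction + sparsity; L2 enters via weight_decay.
model = SASSAE()
optim = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
x = torch.randn(128, 4)  # batch of spatial-angular tracklet features (assumed 4-D)
assignments, (c1, r1), (c2, r2) = model(x)
loss = F.mse_loss(r1, x) + F.mse_loss(r2, c1.detach()) \
       + 1e-3 * (sparsity_penalty(c1) + sparsity_penalty(c2))
loss.backward()
optim.step()
```

In a typical stacked-autoencoder setting, each stage would be pre-trained on the previous stage's codes before end-to-end fine-tuning; the joint loss above is a simplification for brevity.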
Funding: This research work is supported by the Deputyship of Research & Innovation, Ministry of Education in Saudi Arabia (Grant Number 758).
Abstract: Estimating the crowd count and density of the highly dense scenes witnessed in Muslim gatherings at religious sites in Makkah and Madinah is critical for developing control strategies and organizing such large gatherings. Moreover, since the crowd images in this setting can range from low to high density, detection-based approaches are hard to apply to crowd counting. Recently, deep learning-based regression has become the prominent approach to crowd counting, in which a density map is estimated and its integral is computed to obtain the final count. In this paper, we put forward a novel multi-scale network (named 2U-Net) for crowd counting in both sparse and dense scenarios. The proposed framework, which employs the U-Net architecture, is straightforward to implement, computationally efficient, and trained in a single step. Unpooling layers are used to recover the information discarded by the pooling layers and to learn a hierarchical, pixel-wise spatial representation. This helps preserve feature values and spatial locations, maximizing data integrity and avoiding information loss. In addition, a modified attention unit is introduced and integrated into the proposed 2U-Net model to focus on specific crowd areas. Unlike other works, which may optimize one criterion at the expense of the others, the proposed model balances the number of model parameters, model size, computational cost, and counting accuracy. Experiments on five challenging density-estimation and crowd-counting datasets show that the proposed model is very effective, outperforms comparable mainstream models, and counts well in both sparse and congested crowd scenes. The 2U-Net model achieves the lowest MAE on ShanghaiTech Part A, ShanghaiTech Part B, UCSD, and Mall, with 63.3, 7.4, 1.5, and 1.6, respectively. Furthermore, it obtains the lowest MSE on ShanghaiTech Part B, UCSD, and Mall, with 12.0, 1.9, and 2.1, respectively.
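As an illustration of the density-map-integral idea described above (not the actual 2U-Net), here is a minimal PyTorch sketch: pooling indices are kept so that unpooling can restore exact spatial locations, a simple sigmoid gate stands in for the modified attention unit, and the crowd count is the integral (sum) of the predicted density map. The channel widths and the attention design are assumptions for illustration only.

```python
# Hypothetical density-map counting sketch: pool with indices, unpool to
# restore spatial locations, gate with attention, sum the map for the count.
import torch
import torch.nn as nn

class TinyDensityNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2, return_indices=True)   # keep pooling indices
        self.mid = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.unpool = nn.MaxUnpool2d(2)                    # restore erased locations
        self.attn = nn.Sequential(nn.Conv2d(16, 1, 1), nn.Sigmoid())
        self.head = nn.Conv2d(16, 1, 1)                    # 1-channel density map

    def forward(self, x):
        f = self.enc(x)
        p, idx = self.pool(f)
        p = self.mid(p)
        u = self.unpool(p, idx, output_size=f.shape)       # pixel-wise recovery
        u = u * self.attn(u)                               # focus on crowd areas
        return self.head(u)

img = torch.randn(1, 3, 128, 128)     # dummy crowd image
density = TinyDensityNet()(img)
count = density.sum()                 # integral of the density map = crowd count
```

Training such a network would typically minimize the MSE between the predicted density map and a ground-truth map produced by Gaussian-smoothing the head annotations.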