Journal Articles
302,029 articles found
1. CAW-YOLO: Cross-Layer Fusion and Weighted Receptive Field-Based YOLO for Small Object Detection in Remote Sensing
Authors: Weiya Shi, Shaowen Zhang, Shiqiang Zhang. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 6, pp. 3209-3231 (23 pages).
In recent years, there has been extensive research on object detection methods applied to optical remote sensing images utilizing convolutional neural networks. Despite these efforts, the detection of small objects in remote sensing remains a formidable challenge. Deep network structures cause the loss of object features, nearly eliminating some subtle features associated with small objects in deep layers. Additionally, the features of small objects are susceptible to interference from background features contained within the image, leading to a decline in detection accuracy. Moreover, the sensitivity of small objects to bounding box perturbation further increases the detection difficulty. In this paper, we introduce a novel approach, Cross-Layer Fusion and Weighted Receptive Field-based YOLO (CAW-YOLO), specifically designed for small object detection in remote sensing. To address feature loss in deep layers, we have devised a cross-layer attention fusion module. Background noise is effectively filtered through the incorporation of Bi-Level Routing Attention (BRA). To enhance the model's capacity to perceive multi-scale objects, particularly small-scale objects, we introduce a weighted multi-receptive field atrous spatial pyramid pooling module. Furthermore, we mitigate the sensitivity arising from bounding box perturbation by incorporating the joint Normalized Wasserstein Distance (NWD) and Efficient Intersection over Union (EIoU) losses. The efficacy of the proposed model in detecting small objects in remote sensing has been validated through experiments conducted on three publicly available datasets. The experimental results unequivocally demonstrate the model's pronounced advantages in small object detection for remote sensing, surpassing the performance of current mainstream models.
Keywords: small object detection; attention mechanism; cross-layer fusion; discrete cosine transform
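The joint NWD and EIoU loss mentioned in the abstract above can be expressed as a weighted sum of the two terms. The sketch below is a hedged PyTorch illustration based on the commonly published formulations of NWD and EIoU; the Gaussian box modelling, the constant C, and the mixing weight alpha are assumptions, not values taken from the paper.

```python
import torch

def nwd_loss(pred, target, C=12.8):
    # Boxes in (cx, cy, w, h); each box is modelled as a 2D Gaussian
    # N([cx, cy], diag((w/2)^2, (h/2)^2)) and the closed-form 2-Wasserstein
    # distance between the two Gaussians is exponentially normalized.
    w2 = ((pred[:, 0] - target[:, 0]) ** 2 +
          (pred[:, 1] - target[:, 1]) ** 2 +
          ((pred[:, 2] - target[:, 2]) / 2) ** 2 +
          ((pred[:, 3] - target[:, 3]) / 2) ** 2)
    nwd = torch.exp(-torch.sqrt(w2) / C)          # similarity in (0, 1]
    return 1.0 - nwd

def eiou_loss(pred, target, eps=1e-7):
    # Convert (cx, cy, w, h) to corner coordinates.
    px1, py1 = pred[:, 0] - pred[:, 2] / 2, pred[:, 1] - pred[:, 3] / 2
    px2, py2 = pred[:, 0] + pred[:, 2] / 2, pred[:, 1] + pred[:, 3] / 2
    tx1, ty1 = target[:, 0] - target[:, 2] / 2, target[:, 1] - target[:, 3] / 2
    tx2, ty2 = target[:, 0] + target[:, 2] / 2, target[:, 1] + target[:, 3] / 2
    inter = (torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(0) * \
            (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(0)
    union = pred[:, 2] * pred[:, 3] + target[:, 2] * target[:, 3] - inter + eps
    iou = inter / union
    # Width/height of the smallest enclosing box.
    cw = torch.max(px2, tx2) - torch.min(px1, tx1)
    ch = torch.max(py2, ty2) - torch.min(py1, ty1)
    rho2 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    return (1 - iou
            + rho2 / (cw ** 2 + ch ** 2 + eps)
            + (pred[:, 2] - target[:, 2]) ** 2 / (cw ** 2 + eps)
            + (pred[:, 3] - target[:, 3]) ** 2 / (ch ** 2 + eps))

def joint_box_loss(pred, target, alpha=0.5):
    # Hypothetical mixing weight; the paper's actual weighting scheme is not given here.
    return alpha * nwd_loss(pred, target) + (1 - alpha) * eiou_loss(pred, target)
```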
2. Learning Discriminatory Information for Object Detection on Urine Sediment Image
Authors: Sixian Chan, Binghui Wu, Guodao Zhang, Yuan Yao, Hongqiang Wang. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 1, pp. 411-428 (18 pages).
In clinical practice, the microscopic examination of urine sediment is considered an important in vitro examination with many broad applications. Measuring the amount of each type of urine sediment allows for screening, diagnosis and evaluation of kidney and urinary tract disease, providing insight into the specific type and severity. However, manual urine sediment examination is labor-intensive, time-consuming, and subjective. Traditional machine learning based object detection methods require hand-crafted features for localization and classification, which have poor generalization capabilities and make it difficult to quickly and accurately detect the number of urine sediments. Deep learning based object detection methods have the potential to address the challenges mentioned above, but these methods require access to large urine sediment image datasets. Unfortunately, only a limited number of publicly available urine sediment datasets are currently available. To alleviate the lack of urine sediment datasets in medical image analysis, we propose a new dataset named UriSed2K, which contains 2465 high-quality images annotated with expert guidance. Two main challenges are associated with our dataset: a large number of small objects and the occlusion between these small objects. Our manuscript focuses on applying deep learning object detection methods to the urine sediment dataset and addressing the challenges presented by this dataset. Specifically, our goal is to improve the accuracy and efficiency of the detection algorithm and, in doing so, provide medical professionals with an automatic detector that saves time and effort. We propose an improved lightweight one-stage object detection algorithm called Discriminatory-YOLO. The proposed algorithm comprises a local context attention module and a global background suppression module, which aid the detector in distinguishing urine sediment features in the image. The local context attention module captures context information beyond the object region, while the global background suppression module emphasizes objects in uninformative backgrounds. We comprehensively evaluate our method on the UriSed2K dataset, which includes seven categories of urine sediments, such as erythrocytes (red blood cells), leukocytes (white blood cells), epithelial cells, crystals, mycetes, broken erythrocytes, and broken leukocytes, achieving the best average precision (AP) of 95.3% while taking only 10 ms per image. The source code and dataset are available at https://github.com/binghuiwu98/discriminatoryyolov5.
Keywords: object detection; attention mechanism; medical image; urine sediment
3. Multi-Stream Temporally Enhanced Network for Video Salient Object Detection
Authors: Dan Xu, Jiale Ru, Jinlong Shi. Computers, Materials & Continua (SCIE, EI), 2024, No. 1, pp. 85-104 (20 pages).
Video salient object detection (VSOD) aims at locating the most attractive objects in a video by exploring the spatial and temporal features. VSOD poses a challenging task in computer vision, as it involves processing complex spatial data that is also influenced by temporal dynamics. Despite the progress made in existing VSOD models, they still struggle in scenes of great background diversity within and between frames. Additionally, they encounter difficulties related to accumulated noise and high time consumption during the extraction of temporal features over a long-term duration. We propose a multi-stream temporal enhanced network (MSTENet) to address these problems. It investigates saliency cues collaboration in the spatial domain with a multi-stream structure to deal with the great background diversity challenge. A straightforward yet efficient approach for temporal feature extraction is developed to avoid the accumulative noises and reduce time consumption. The distinction between MSTENet and other VSOD methods stems from its incorporation of both foreground supervision and background supervision, facilitating enhanced extraction of collaborative saliency cues. Another notable differentiation is the innovative integration of spatial and temporal features, wherein the temporal module is integrated into the multi-stream structure, enabling comprehensive spatial-temporal interactions within an end-to-end framework. Extensive experimental results demonstrate that the proposed method achieves state-of-the-art performance on five benchmark datasets while maintaining a real-time speed of 27 fps (Titan XP). Our code and models are available at https://github.com/RuJiaLe/MSTENet.
Keywords: video salient object detection; deep learning; temporally enhanced; foreground-background collaboration
4. Depth-Guided Vision Transformer With Normalizing Flows for Monocular 3D Object Detection
Authors: Cong Pan, Junran Peng, Zhaoxiang Zhang. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, No. 3, pp. 673-689 (17 pages).
Monocular 3D object detection is challenging due to the lack of accurate depth information. Some methods estimate pixel-wise depth maps from off-the-shelf depth estimators and then use them as an additional input to augment the RGB images. Depth-based methods attempt to convert estimated depth maps to pseudo-LiDAR and then use LiDAR-based object detectors, or focus on the perspective of image and depth fusion learning. However, they demonstrate limited performance and efficiency as a result of depth inaccuracy and complex convolution-based fusion modes. Different from these approaches, our proposed depth-guided vision transformer with normalizing flows (NF-DVT) network uses normalizing flows to build priors in depth maps to achieve more accurate depth information. Then we develop a novel Swin-Transformer-based backbone with a fusion module to process RGB image patches and depth map patches in two separate branches and fuse them using cross-attention to exchange information with each other. Furthermore, with the help of pixel-wise relative depth values in depth maps, we develop new relative position embeddings in the cross-attention mechanism to capture more accurate sequence ordering of input tokens. Our method is the first Swin-Transformer-based backbone architecture for monocular 3D object detection. The experimental results on the KITTI and the challenging Waymo Open datasets show the effectiveness of our proposed method and superior performance over previous counterparts.
Keywords: monocular 3D object detection; normalizing flows; Swin Transformer
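The cross-attention fusion between RGB and depth patch tokens described above can be sketched with a standard multi-head attention layer in which one modality supplies the queries and the other the keys and values. This is a generic, assumption-based illustration (embedding size, head count, and layer normalization are arbitrary choices); the relative depth position embeddings used by NF-DVT are omitted.

```python
import torch.nn as nn

class RGBDepthCrossAttention(nn.Module):
    """Exchange information between RGB and depth patch tokens via cross-attention."""
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.rgb_from_depth = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.depth_from_rgb = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_rgb = nn.LayerNorm(dim)
        self.norm_depth = nn.LayerNorm(dim)

    def forward(self, rgb_tokens, depth_tokens):
        # rgb_tokens, depth_tokens: (batch, num_patches, dim)
        rgb_upd, _ = self.rgb_from_depth(query=rgb_tokens, key=depth_tokens,
                                         value=depth_tokens)
        depth_upd, _ = self.depth_from_rgb(query=depth_tokens, key=rgb_tokens,
                                           value=rgb_tokens)
        # Residual connection plus layer norm for each branch.
        return (self.norm_rgb(rgb_tokens + rgb_upd),
                self.norm_depth(depth_tokens + depth_upd))

# Usage: fuse = RGBDepthCrossAttention(); rgb, depth = fuse(rgb_tokens, depth_tokens)
```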
5. A Secure and Cost-Effective Training Framework Atop Serverless Computing for Object Detection in Blasting
Authors: Tianming Zhang, Zebin Chen, Haonan Guo, Bojun Ren, Quanmin Xie, Mengke Tian, Yong Wang. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 5, pp. 2139-2154 (16 pages).
The data analysis of blasting sites has always been a research goal for relevant researchers. The rise of mobile blasting robots has aroused many researchers' interest in machine learning methods for target detection in the field of blasting. Serverless computing can provide a variety of computing services for people without a hardware foundation or rich software development experience, which has aroused interest in how to use it in the field of machine learning. In this paper, we design a distributed machine learning training application based on the AWS Lambda platform. Based on data parallelism, data aggregation and training synchronization in Function as a Service (FaaS) are effectively realized. The application also encrypts the dataset, effectively reducing the risk of data leakage. We rent a cloud server and a Lambda function, and then conduct experiments to evaluate our application. Our results indicate the effectiveness, rapidity, and economy of distributed training on FaaS.
Keywords: serverless computing; object detection; blasting
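Data-parallel training on FaaS, as described above, boils down to workers computing gradients on encrypted data shards and a reducer averaging them. The sketch below is a hedged illustration of that pattern, not the paper's actual AWS Lambda implementation; the shard format, the use of Fernet for encryption, the mock gradient, and the synchronous averaging step are all assumptions.

```python
import numpy as np
from cryptography.fernet import Fernet

KEY = Fernet.generate_key()      # in a real deployment the key would be provisioned securely
cipher = Fernet(KEY)

def encrypt_shard(shard: np.ndarray) -> bytes:
    """Encrypt one data shard before handing it to a serverless worker."""
    return cipher.encrypt(shard.astype(np.float32).tobytes())

def worker_handler(event, context=None):
    """FaaS-style worker: decrypt its shard and return a (mock) local gradient."""
    shard = np.frombuffer(cipher.decrypt(event["shard"]), dtype=np.float32)
    grad = np.full(4, shard.mean(), dtype=np.float32)    # placeholder for real backprop
    return {"grad": grad.tolist()}

def aggregate(worker_results):
    """Reducer: average gradients from all workers (synchronous data parallelism)."""
    return np.mean([r["grad"] for r in worker_results], axis=0)

# Usage sketch:
#   shards = np.array_split(train_data, num_workers)
#   results = [worker_handler({"shard": encrypt_shard(s)}) for s in shards]
#   global_grad = aggregate(results)
```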
6. Local saliency consistency-based label inference for weakly supervised salient object detection using scribble annotations
Authors: Shuo Zhao, Peng Cui, Jing Shen, Haibo Liu. CAAI Transactions on Intelligence Technology (SCIE, EI), 2024, No. 1, pp. 239-249 (11 pages).
Recently, weak supervision has received growing attention in the field of salient object detection due to the convenience of labelling. However, there is a large performance gap between weakly supervised and fully supervised salient object detectors because the scribble annotation can only provide very limited foreground/background information. Therefore, an intuitive idea is to infer annotations that cover more complete object and background regions for training. To this end, a label inference strategy is proposed based on the assumption that pixels with similar colours and close positions should have consistent labels. Specifically, the k-means clustering algorithm is first performed on both the colours and the coordinates of the original annotations, and the same labels are then assigned to points whose colours are similar to the colour cluster centres and which lie near the coordinate cluster centres. Next, the same annotations are further assigned to pixels with similar colours within each kernel neighbourhood. Extensive experiments on six benchmarks demonstrate that our method can significantly improve performance and achieve state-of-the-art results.
Keywords: label inference; salient object detection; weak supervision
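The label-inference idea above propagates scribble labels to pixels that are close to cluster centres in both colour and position. A minimal sketch follows, assuming RGB images, scikit-learn's KMeans, and hand-picked distance thresholds; the cluster count and thresholds are illustrative choices, not values from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def infer_labels(image, scribble_mask, n_clusters=8,
                 color_thresh=20.0, coord_thresh=30.0):
    """Propagate scribble labels (1=foreground, 2=background, 0=unlabeled).

    image: (H, W, 3) uint8; scribble_mask: (H, W) int. Returns an expanded mask.
    """
    h, w, _ = image.shape
    expanded = scribble_mask.copy()
    pix_colors = image.reshape(-1, 3).astype(np.float32)
    pix_coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"),
                          axis=-1).reshape(-1, 2).astype(np.float32)
    for label in (1, 2):                                   # foreground, background scribbles
        ys, xs = np.nonzero(scribble_mask == label)
        if len(ys) < n_clusters:
            continue
        colors = image[ys, xs].astype(np.float32)          # (N, 3) scribble colours
        coords = np.stack([ys, xs], axis=1).astype(np.float32)
        color_centers = KMeans(n_clusters, n_init=10).fit(colors).cluster_centers_
        coord_centers = KMeans(n_clusters, n_init=10).fit(coords).cluster_centers_
        # Distance of every pixel to its nearest colour / coordinate centre.
        d_color = np.linalg.norm(pix_colors[:, None] - color_centers[None], axis=2).min(1)
        d_coord = np.linalg.norm(pix_coords[:, None] - coord_centers[None], axis=2).min(1)
        hit = (d_color < color_thresh) & (d_coord < coord_thresh)
        flat = expanded.reshape(-1)
        flat[hit & (flat == 0)] = label                    # only fill unlabeled pixels
    return expanded
```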
7. SwinVid: Enhancing Video Object Detection Using Swin Transformer
Authors: Abdelrahman Maharek, Amr Abozeid, Rasha Orban, Kamal ElDahshan. Computer Systems Science & Engineering, 2024, No. 2, pp. 305-320 (16 pages).
What causes object detection in video to be less accurate than it is in still images? Because some video frames have degraded in appearance from fast movement, out-of-focus camera shots, and changes in posture. These reasons have made video object detection (VID) a growing area of research in recent years. Video object detection can be used for various healthcare applications, such as detecting and tracking tumors in medical imaging, monitoring the movement of patients in hospitals and long-term care facilities, and analyzing videos of surgeries to improve technique and training. Additionally, it can be used in telemedicine to help diagnose and monitor patients remotely. Existing VID techniques are based on recurrent neural networks or optical flow for feature aggregation to produce reliable features which can be used for detection. Some of those methods aggregate features on the full-sequence level or from nearby frames. To create feature maps, existing VID techniques frequently use Convolutional Neural Networks (CNNs) as the backbone network. On the other hand, Vision Transformers have outperformed CNNs in various vision tasks, including object detection in still images and image classification. We propose in this research to use Swin-Transformer, a state-of-the-art Vision Transformer, as an alternative to CNN-based backbone networks for object detection in videos. The proposed architecture enhances the accuracy of existing VID methods. The ImageNet VID and EPIC KITCHENS datasets are used to evaluate the suggested methodology. We have demonstrated that our proposed method is efficient by achieving 84.3% mean average precision (mAP) on ImageNet VID using less memory in comparison to other leading VID techniques. The source code is available on the website https://github.com/amaharek/SwinVid.
Keywords: video object detection; vision transformers; convolutional neural networks; deep learning
8. Few-Shot Object Detection Based on the Transformer and High-Resolution Network (cited by 1)
Authors: Dengyong Zhang, Huaijian Pu, Feng Li, Xiangling Ding, Victor S. Sheng. Computers, Materials & Continua (SCIE, EI), 2023, No. 2, pp. 3439-3454 (16 pages).
Object detection based on deep learning now tries different strategies, using less training data to approach the effect of training on a large dataset. However, existing methods usually do not achieve a balance between network parameters and training data, so the information provided by a small amount of image data is insufficient to optimize model parameters, resulting in unsatisfactory detection results. To improve the accuracy of few-shot object detection, this paper proposes a network based on the transformer and high-resolution feature extraction (THR). High-resolution feature extraction maintains the resolution representation of the image. Channel and spatial attention are used to make the network focus on features that are more useful to the object. In addition, the recently popular transformer is used to fuse the features of the existing object. This compensates for the previous network's shortcomings by making full use of existing object features. Experiments on the Pascal VOC and MS-COCO datasets prove that the THR network achieves better results than previous mainstream few-shot object detection methods.
Keywords: object detection; few-shot object detection; transformer; high-resolution
9. UGC-YOLO: Underwater Environment Object Detection Based on YOLO with a Global Context Block (cited by 1)
Authors: YANG Yuyi, CHEN Liang, ZHANG Jian, LONG Lingchun, WANG Zhenfei. Journal of Ocean University of China (SCIE, CAS, CSCD), 2023, No. 3, pp. 665-674 (10 pages).
With the continuous development and utilization of marine resources, underwater target detection has gradually become a popular research topic in the field of underwater robot operations and target detection. However, it is difficult for detection algorithms to combine environmental semantic information with the semantic information of targets at different scales due to the complex underwater environment. In this paper, a cascade model based on the UGC-YOLO network structure with high detection accuracy is proposed. The YOLOv3 convolutional neural network is employed as the baseline structure. By fusing the global semantic information between two residual stages in the parallel structure of the feature extraction network, the perception of underwater targets is improved and the detection rate of hard-to-detect underwater objects is raised. Furthermore, deformable convolution is applied to capture long-range semantic dependencies, and PPM pooling is introduced in the highest layer of the network for aggregating semantic information. Finally, a multi-scale weighted fusion approach is presented for learning semantic information at different scales. Experiments are conducted on an underwater test dataset and the results demonstrate that our proposed algorithm can detect aquatic targets in complex degraded underwater images. Compared with the baseline network algorithm, the Common Objects in Context (COCO) evaluation metric has been improved by 4.34%.
Keywords: object detection; underwater environment; semantic information; semantic features; deep learning algorithm
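The multi-scale weighted fusion step described above can be realized as a learnable, normalized weighting of feature maps resized to a common resolution, similar in spirit to the weighted fusion used in BiFPN-style necks. The sketch below is an assumption-laden illustration, not the UGC-YOLO implementation; the channel count, resizing mode, and projection convolution are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedScaleFusion(nn.Module):
    """Fuse feature maps from several scales with learnable, normalized weights."""
    def __init__(self, num_inputs: int, channels: int):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))   # one scalar weight per scale
        self.proj = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, features):
        # Resize every input to the spatial size of the first one.
        target_size = features[0].shape[-2:]
        resized = [F.interpolate(f, size=target_size, mode="nearest") for f in features]
        w = F.relu(self.weights)
        w = w / (w.sum() + 1e-4)                               # normalize the fusion weights
        fused = sum(wi * fi for wi, fi in zip(w, resized))
        return self.proj(fused)

# Usage: fuse = WeightedScaleFusion(num_inputs=3, channels=256)
# out = fuse([p3, p4, p5])   # each input tensor is (N, 256, H_i, W_i)
```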
10. Dual Attribute Adversarial Camouflage toward camouflaged object detection (cited by 1)
Authors: Yang Wang, Zheng Fang, Yun-fei Zheng, Zhen Yang, Wen Tong, Tie-yong Cao. Defence Technology (SCIE, EI, CAS, CSCD), 2023, No. 4, pp. 166-175 (10 pages).
Object detectors can precisely detect camouflaged objects beyond human perception. Investigations reveal that CNN-based (Convolutional Neural Network) detectors are vulnerable to adversarial attacks. Some works can fool detectors by crafting adversarial camouflage attached to the object, leading to wrong predictions. It is hard for military operations to utilize existing adversarial camouflage due to its conspicuous appearance. Motivated by this, this paper proposes Dual Attribute Adversarial Camouflage (DAAC) for evading detection by both detectors and humans. Generating DAAC includes two steps: (1) extracting features from a specific type of scene to generate individual soldier digital camouflage; (2) attaching an adversarial patch, constrained by the scene features, to the individual soldier digital camouflage to generate the adversarial attribute of DAAC. The visual effects of the individual soldier digital camouflage and the adversarial patch are improved after integrating with the scene features. Experimental results show that objects camouflaged by DAAC are well integrated with the background and achieve visual concealment while remaining effective in fooling object detectors, thus evading detection by both detectors and humans in the digital domain. This work can serve as a reference for crafting adversarial camouflage in the physical world.
Keywords: adversarial camouflage; digital camouflage generation; visual concealment; object detection; adversarial patch
11. Realtime Object Detection Through M-ResNet in Video Surveillance System (cited by 1)
Authors: S. Prabu, J. M. Gnanasekar. Intelligent Automation & Soft Computing (SCIE), 2023, No. 2, pp. 2257-2271 (15 pages).
Object detection plays a vital role in video surveillance systems. To enhance security, surveillance cameras are now installed in public areas such as traffic signals, roadways, retail malls, train stations, and banks. However, monitoring the video continually at a quick pace is a challenging job; as a consequence, security cameras are of little use without human monitoring. The primary difficulty with video surveillance is identifying abnormalities such as thefts, accidents, crimes, or other unlawful actions. Anomalous actions do not occur at a higher rate than usual occurrences. To detect an object in a video, the images are first analyzed pixel by pixel. In digital image processing, segmentation is the process of segregating individual image parts into pixels. The performance of segmentation is affected by irregular and/or low illumination. These factors strongly affect the real-time object detection process in the video surveillance system. In this paper, a modified ResNet model (M-ResNet) is proposed to enhance images affected by insufficient light. Experimental results compare the output of the existing method with the modified ResNet architecture and show a considerable improvement in detecting objects in the video stream. The proposed model shows better results in metrics such as precision, recall, and pixel accuracy, and achieves a reasonable improvement in object detection.
Keywords: object detection; ResNet; video surveillance; image processing; object quality
12. Zero-DCE++ Inspired Object Detection in Less Illuminated Environment Using Improved YOLOv5
Authors: Ananthakrishnan Balasundaram, Anshuman Mohanty, Ayesha Shaik, Krishnadoss Pradeep, Kedalu Poornachary Vijayakumar, Muthu Subash Kavitha. Computers, Materials & Continua (SCIE, EI), 2023, No. 12, pp. 2751-2769 (19 pages).
Automated object detection has received the most attention over the years. Use cases ranging from autonomous driving applications to military surveillance systems require robust detection of objects in different illumination conditions. State-of-the-art object detectors tend to fare well in object detection during daytime conditions. However, their performance is severely hampered at night due to poor illumination. To address this challenge, the manuscript proposes an improved YOLOv5-based object detection framework for effective detection in unevenly illuminated nighttime conditions. Firstly, the preprocessing strategies involve using the Zero-DCE++ approach to enhance low-light images. This is followed by optimizing the existing YOLOv5 architecture by integrating the Convolutional Block Attention Module (CBAM) in the backbone network to boost model learning capability, and the Depthwise Convolution module (DWConv) in the neck network for efficient compression of network parameters. The Night Object Detection (NOD) and Exclusively Dark (ExDARK) datasets have been used for this work. The proposed framework detects classes such as humans, bicycles, and cars. Experiments demonstrate that the proposed architecture achieves a higher Mean Average Precision (mAP) along with a reduction in model size and total parameters. The proposed model is lighter by 11.24% in terms of model size and 12.38% in terms of parameters when compared to baseline YOLOv5.
Keywords: object detection; deep learning; nighttime road scenes; YOLOv5; DWConv; Zero-DCE++; CBAM
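The CBAM block mentioned above applies channel attention followed by spatial attention to a feature map. A minimal PyTorch sketch of the standard CBAM formulation follows; the reduction ratio and the 7x7 spatial kernel are the commonly used defaults, assumed here rather than taken from the paper.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention then spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(                        # shared MLP for channel attention
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):
        n, c, _, _ = x.shape
        # Channel attention: shared MLP over average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(n, c, 1, 1)
        # Spatial attention: convolution over channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```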
13. Accelerate Single Image Super-Resolution Using Object Detection Process
Authors: Xiaolin Xing, Shujie Yang, Bohan Li. Computers, Materials & Continua (SCIE, EI), 2023, No. 8, pp. 1585-1597 (13 pages).
Image Super-Resolution (SR) research has achieved great success with powerful neural networks. Deeper networks with more parameters improve restoration quality but add computational complexity, which means more inference time is required, hindering image SR from practical usage. Noting the spatial distribution of objects in images, a two-stage local-object SR system is proposed, which consists of two modules: the object detection module and the SR module. Firstly, You Only Look Once (YOLO), which is efficient in generic object detection tasks, is selected to detect the input images and obtain objects of interest, which are then passed to the SR module to output corresponding High-Resolution (HR) sub-images. The computational power consumption of image SR is optimized by reducing the resolution of the input images. In addition, we establish a dataset, TrafficSign500, for our experiment. Finally, the performance of the proposed system is evaluated under several State-Of-The-Art (SOTA) YOLOv5 and SISR models. Results show that our system can achieve a tremendous computation improvement in image SR.
Keywords: object detection; super-resolution; computation complexity; YOLOv5; inference time; objects of interest
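The two-stage system above runs a detector first and then super-resolves only the detected regions. A hedged sketch of that pipeline follows, assuming `detector(image)` returns (x1, y1, x2, y2) boxes and `sr_model(crop)` returns an upscaled crop; both interfaces, and the padding value, are stand-ins for whatever detector and SISR network are actually used.

```python
import numpy as np

def detect_then_super_resolve(image: np.ndarray, detector, sr_model, pad: int = 4):
    """Run SR only on detected object regions instead of the full frame.

    detector(image) -> iterable of (x1, y1, x2, y2) boxes   (hypothetical interface)
    sr_model(crop)  -> upscaled crop                        (hypothetical interface)
    Returns a list of (box, hr_crop) pairs.
    """
    h, w = image.shape[:2]
    results = []
    for (x1, y1, x2, y2) in detector(image):
        # Pad the box slightly so the SR network sees a little surrounding context.
        x1, y1 = max(0, int(x1) - pad), max(0, int(y1) - pad)
        x2, y2 = min(w, int(x2) + pad), min(h, int(y2) + pad)
        crop = image[y1:y2, x1:x2]
        results.append(((x1, y1, x2, y2), sr_model(crop)))
    return results
```

Because only small crops pass through the SR network, the compute cost scales with the number and size of detected objects rather than with the full image resolution, which is the source of the speedup the abstract reports.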
14. Few-shot object detection based on positive-sample improvement
Authors: Yan Ouyang, Xin-qing Wang, Rui-zhe Hu, Hong-hui Xu. Defence Technology (SCIE, EI, CAS, CSCD), 2023, No. 10, pp. 74-86 (13 pages).
Traditional object detectors based on deep learning rely on plenty of labeled samples, which are expensive to obtain. Few-shot object detection (FSOD) attempts to solve this problem by learning to detect objects from a few labeled samples, but the performance is often unsatisfactory due to the scarcity of samples. We believe that the main reasons that restrict the performance of few-shot detectors are: (1) positive samples are scarce, and (2) the quality of positive samples is low. Therefore, we put forward a novel few-shot object detector based on YOLOv4, starting from improving both the quantity and the quality of positive samples. First, we design a hybrid multivariate positive sample augmentation (HMPSA) module to amplify the quantity of positive samples and increase positive sample diversity while suppressing negative samples. Then, we design a selective non-local fusion attention (SNFA) module to help the detector better learn the target features and improve the feature quality of positive samples. Finally, we optimize the loss function to make it more suitable for the task of FSOD. Experimental results on PASCAL VOC and MS COCO demonstrate that our designed few-shot object detector has competitive performance with other state-of-the-art detectors.
Keywords: few-shot learning; object detection; sample augmentation; attention mechanism
15. A Progressive Approach to Generic Object Detection: A Two-Stage Framework for Image Recognition
Authors: Muhammad Aamir, Ziaur Rahman, Waheed Ahmed Abro, Uzair Aslam Bhatti, Zaheer Ahmed Dayo, Muhammad Ishfaq. Computers, Materials & Continua (SCIE, EI), 2023, No. 6, pp. 6351-6373 (23 pages).
Object detection in images has been identified as a critical area of research in computer vision and image processing. Research has developed several novel methods for determining an object's location and category from an image. However, there is still room for improvement in terms of detection efficiency. This study aims to develop a technique for detecting objects in images. To enhance overall detection performance, we considered object detection a two-fold problem, including localization and classification. The proposed method generates class-independent, high-quality, and precise proposals using an agglomerative clustering technique. We then combine these proposals with the relevant input image to train our network on convolutional features. Next, a network refinement module decreases the quantity of generated proposals to produce fewer high-quality candidate proposals. Finally, the revised candidate proposals are sent into the network's detection process to determine the object type. The algorithm's performance is evaluated using the publicly available PASCAL Visual Object Classes Challenge 2007 (VOC2007), VOC2012, and Microsoft Common Objects in Context (MS-COCO) datasets. Using only 100 proposals per image at intersection over union (IoU) = 0.5 and 0.7, the proposed method attains Detection Recall (DR) rates of 93.17% and 79.35%, and of 69.4% and 58.35%, and Mean Average Best Overlap (MABO) values of 79.25% and 62.65%, for the VOC2007 and MS-COCO datasets, respectively. Besides, it achieves a Mean Average Precision (mAP) of 84.7% and 81.5% on the two VOC datasets. The experimental findings reveal that our method exceeds previous approaches in terms of overall detection performance, proving its effectiveness.
Keywords: deep neural network; deep learning features; agglomerative clustering; localization; refinement; region of interest (ROI); object detection
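Proposal generation via agglomerative clustering, as described above, can be approximated by clustering low-level region descriptors and taking the bounding box of each resulting cluster. The sketch below uses scikit-learn's AgglomerativeClustering on seed boxes (e.g. from superpixels); the descriptor (mean colour plus normalized box geometry) and the cluster count are illustrative assumptions, not the paper's actual design.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_proposals(image: np.ndarray, seed_boxes: np.ndarray, n_clusters: int = 100):
    """Merge seed regions into class-independent proposals by agglomerative clustering.

    image: (H, W, 3) uint8; seed_boxes: (N, 4) boxes (x1, y1, x2, y2), N >= n_clusters.
    Returns (n_clusters, 4) proposal boxes.
    """
    h, w = image.shape[:2]
    feats = []
    for x1, y1, x2, y2 in seed_boxes.astype(int):
        patch = image[y1:y2, x1:x2].reshape(-1, 3)
        color = patch.mean(axis=0) / 255.0 if patch.size else np.zeros(3)   # appearance cue
        geom = np.array([x1 / w, y1 / h, x2 / w, y2 / h])                    # geometry cue
        feats.append(np.concatenate([color, geom]))
    labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(np.array(feats))
    # Each proposal is the tight bounding box of one cluster of seed regions.
    proposals = [[seed_boxes[labels == c, 0].min(), seed_boxes[labels == c, 1].min(),
                  seed_boxes[labels == c, 2].max(), seed_boxes[labels == c, 3].max()]
                 for c in range(n_clusters)]
    return np.array(proposals)
```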
16. DSAFF-Net: A Backbone Network Based on Mask R-CNN for Small Object Detection
Authors: Jian Peng, Yifang Zhao, Dengyong Zhang, Feng Li, Arun Kumar Sangaiah. Computers, Materials & Continua (SCIE, EI), 2023, No. 2, pp. 3405-3419 (15 pages).
Recently, object detection based on convolutional neural networks (CNNs) has developed rapidly. The backbone networks used for basic feature extraction are an important component of the whole detection task. Therefore, we present a new feature extraction strategy in this paper, named DSAFF-Net. In this strategy, we design: 1) a sandwich attention feature fusion module (SAFF module), whose purpose is to enhance the semantic information of shallow features and the resolution of deep features, which is beneficial to small object detection after feature fusion; and 2) a new stage called D-block to alleviate the loss of spatial resolution that occurs when the pooling layer enlarges the receptive field. The method proposed in the new stage replaces the original way of obtaining the P6 feature map and uses the result as the input of the region proposal network (RPN). In the experimental phase, we use the new strategy to extract features. The experiments use the public Microsoft Common Objects in Context (MS COCO) object detection dataset and a Corona Virus Disease 2019 (COVID-19) image classification dataset, respectively. The results show that the average recognition accuracy on the COVID-19 classification dataset is improved to 98.163%, and small object detection in the object detection task is improved by 4.0%.
Keywords: small object detection; classification; RPN; MS COCO; COVID-19
17. Visual SLAM Based on Object Detection Network: A Review
Authors: Jiansheng Peng, Dunhua Chen, Qing Yang, Chengjun Yang, Yong Xu, Yong Qin. Computers, Materials & Continua (SCIE, EI), 2023, No. 12, pp. 3209-3236 (28 pages).
Visual simultaneous localization and mapping (SLAM) is crucial in robotics and autonomous driving. However, traditional visual SLAM faces challenges in dynamic environments. To address this issue, researchers have proposed semantic SLAM, which combines object detection, semantic segmentation, instance segmentation, and visual SLAM. Despite the growing body of literature on semantic SLAM, there is currently a lack of comprehensive research on the integration of object detection and visual SLAM. Therefore, this study gathers information from multiple databases and reviews relevant literature using specific keywords. It focuses on visual SLAM based on object detection, covering different aspects. Firstly, it discusses the current research status and challenges in this field, highlighting methods for incorporating semantic information from object detection networks into visual odometry, loop closure detection, and map construction. It also compares the characteristics and performance of various visual SLAM object detection algorithms. Lastly, it provides an outlook on future research directions and emerging trends in visual SLAM. Research has shown that visual SLAM based on object detection offers significant improvements over traditional SLAM in dynamic point removal, data association, point cloud segmentation, and other technologies. It can improve the robustness and accuracy of the entire SLAM system and can run in real time. With the continuous optimization of algorithms and the improvement of hardware, object-based visual SLAM has great potential for development.
Keywords: object detection; visual SLAM; visual odometry; loop closure detection; semantic map
18. MFF-Net: Multimodal Feature Fusion Network for 3D Object Detection
Authors: Peicheng Shi, Zhiqiang Liu, Heng Qi, Aixi Yang. Computers, Materials & Continua (SCIE, EI), 2023, No. 6, pp. 5615-5637 (23 pages).
In complex traffic environment scenarios, it is very important for autonomous vehicles to accurately perceive, in advance, the dynamic information of other vehicles around them. The accuracy of 3D object detection is affected by problems such as illumination changes, object occlusion, and detection distance. To this end, we address these challenges by proposing a multimodal feature fusion network for 3D object detection (MFF-Net). This paper first uses a spatial transformation projection algorithm to map the image features into the feature space, so that the image features are in the same spatial dimension when fused with the point cloud features. Then, feature channel weighting is performed using an adaptive expression augmentation fusion network to enhance important network features, suppress useless features, and increase the directionality of the network toward features. Finally, this paper reduces false detections and missed detections in the non-maximum suppression algorithm by increasing the one-dimensional threshold. In this way, a complete 3D target detection network based on multimodal feature fusion is constructed. The experimental results show that the proposed method achieves an average accuracy of 82.60% on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset, outperforming previous state-of-the-art multimodal fusion networks. On the Easy, Moderate, and Hard evaluation settings, the accuracy reaches 90.96%, 81.46%, and 75.39%, respectively. This shows that the MFF-Net network has good performance in 3D object detection.
Keywords: 3D object detection; multimodal fusion; neural network; autonomous driving; attention mechanism
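Since the method above adjusts the threshold inside non-maximum suppression, a reference implementation of plain greedy NMS is useful context. The sketch below is standard NMS in NumPy, not MFF-Net's modified version; the IoU threshold value is just a common default.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5):
    """Greedy non-maximum suppression.

    boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,). Returns indices of kept boxes.
    """
    order = scores.argsort()[::-1]          # highest-scoring boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of the chosen box with all remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        # Keep only boxes that overlap the chosen box less than the threshold.
        order = rest[iou < iou_thresh]
    return np.array(keep)
```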
19. Interactive Transformer for Small Object Detection
Authors: Jian Wei, Qinzhao Wang, Zixu Zhao. Computers, Materials & Continua (SCIE, EI), 2023, No. 11, pp. 1699-1717 (19 pages).
The detection of large-scale objects has achieved high accuracy, but the detection of small objects does not enjoy similar success due to the low peak signal-to-noise ratio (PSNR), fewer distinguishing features, and the ease with which small objects are occluded by their surroundings. To solve this problem, this paper proposes an attention mechanism based on cross-Key values. Based on the traditional transformer, this paper first improves the feature processing with a convolution module, effectively maintaining the local semantic context in the middle layer and significantly reducing the number of parameters of the model. Then, to enhance the effectiveness of the attention mask, two Key values are calculated simultaneously along Query and Value using dual-branch parallel processing, which strengthens the attention acquisition mode and improves the coupling of key information. Finally, focusing on the feature maps of different channels, the multi-head attention mechanism is applied to the channel attention mask to improve the feature utilization of the middle layer. In comparisons on three small object datasets, the plug-and-play interactive transformer (IT-transformer) module we designed effectively improves the detection results of the baseline.
Keywords: small object detection; attention; transformer; plug-and-play
20. Performance releaser with smart anchor learning for arbitrary-oriented object detection
Authors: Tianwei W. Zhang, Xiaoyu Y. Dong, Xu Sun, Lianru R. Gao, Ying Qu, Bing Zhang, Ke Zheng. CAAI Transactions on Intelligence Technology (SCIE, EI), 2023, No. 4, pp. 1213-1225 (13 pages).
Arbitrary-oriented object detection is widely used in aerial image applications because of its efficient object representation. However, the use of oriented bounding boxes aggravates the imbalance between positive and negative samples when using one-stage object detectors, which seriously decreases the detection accuracy. We believe that it is the anchor learning strategy (ALS) used by such detectors that is responsible. In this study, three perspectives on ALS design are summarised and a new ALS, Performance Releaser with Smart Anchor Learning (PRSAL), is proposed. PRSAL is a dynamic ALS that utilises anchor classification ability as an equivalent indicator of anchor box regression ability; this allows anchors with high detection potential to be selected in a more reasonable way. At the same time, PRSAL focuses more on anchor potential and is able to automatically select a number of positive samples that far exceeds that of other methods by activating anchors that previously had a low spatial overlap, thereby releasing the detection performance. We validate PRSAL using three remote sensing datasets (HRSC2016, DOTA and UCAS-AOD) as well as one scene text dataset (ICDAR 2013). The experimental results show that the proposed method gives substantially better results than existing models.
Keywords: anchor learning strategy; deep learning; object detection; remote sensing