In clinical practice,the microscopic examination of urine sediment is considered an important in vitro examination with many broad applications.Measuring the amount of each type of urine sediment allows for screening,...In clinical practice,the microscopic examination of urine sediment is considered an important in vitro examination with many broad applications.Measuring the amount of each type of urine sediment allows for screening,diagnosis and evaluation of kidney and urinary tract disease,providing insight into the specific type and severity.However,manual urine sediment examination is labor-intensive,time-consuming,and subjective.Traditional machine learning based object detection methods require hand-crafted features for localization and classification,which have poor generalization capabilities and are difficult to quickly and accurately detect the number of urine sediments.Deep learning based object detection methods have the potential to address the challenges mentioned above,but these methods require access to large urine sediment image datasets.Unfortunately,only a limited number of publicly available urine sediment datasets are currently available.To alleviate the lack of urine sediment datasets in medical image analysis,we propose a new dataset named UriSed2K,which contains 2465 high-quality images annotated with expert guidance.Two main challenges are associated with our dataset:a large number of small objects and the occlusion between these small objects.Our manuscript focuses on applying deep learning object detection methods to the urine sediment dataset and addressing the challenges presented by this dataset.Specifically,our goal is to improve the accuracy and efficiency of the detection algorithm and,in doing so,provide medical professionals with an automatic detector that saves time and effort.We propose an improved lightweight one-stage object detection algorithm called Discriminatory-YOLO.The proposed algorithm comprises a local context attention module and a global background suppression module,which aid the detector in distinguishing urine sediment features in the image.The local context attention module captures context information beyond the object region,while the global background suppression module emphasizes objects in uninformative backgrounds.We comprehensively evaluate our method on the UriSed2K dataset,which includes seven categories of urine sediments,such as erythrocytes(red blood cells),leukocytes(white blood cells),epithelial cells,crystals,mycetes,broken erythrocytes,and broken leukocytes,achieving the best average precision(AP)of 95.3%while taking only 10 ms per image.The source code and dataset are available at https://github.com/binghuiwu98/discriminatoryyolov5.展开更多
In recent years,there has been extensive research on object detection methods applied to optical remote sensing images utilizing convolutional neural networks.Despite these efforts,the detection of small objects in re...In recent years,there has been extensive research on object detection methods applied to optical remote sensing images utilizing convolutional neural networks.Despite these efforts,the detection of small objects in remote sensing remains a formidable challenge.The deep network structure will bring about the loss of object features,resulting in the loss of object features and the near elimination of some subtle features associated with small objects in deep layers.Additionally,the features of small objects are susceptible to interference from background features contained within the image,leading to a decline in detection accuracy.Moreover,the sensitivity of small objects to the bounding box perturbation further increases the detection difficulty.In this paper,we introduce a novel approach,Cross-Layer Fusion and Weighted Receptive Field-based YOLO(CAW-YOLO),specifically designed for small object detection in remote sensing.To address feature loss in deep layers,we have devised a cross-layer attention fusion module.Background noise is effectively filtered through the incorporation of Bi-Level Routing Attention(BRA).To enhance the model’s capacity to perceive multi-scale objects,particularly small-scale objects,we introduce a weightedmulti-receptive field atrous spatial pyramid poolingmodule.Furthermore,wemitigate the sensitivity arising from bounding box perturbation by incorporating the joint Normalized Wasserstein Distance(NWD)and Efficient Intersection over Union(EIoU)losses.The efficacy of the proposedmodel in detecting small objects in remote sensing has been validated through experiments conducted on three publicly available datasets.The experimental results unequivocally demonstrate the model’s pronounced advantages in small object detection for remote sensing,surpassing the performance of current mainstream models.展开更多
The data analysis of blasting sites has always been the research goal of relevant researchers.The rise of mobile blasting robots has aroused many researchers’interest in machine learning methods for target detection ...The data analysis of blasting sites has always been the research goal of relevant researchers.The rise of mobile blasting robots has aroused many researchers’interest in machine learning methods for target detection in the field of blasting.Serverless Computing can provide a variety of computing services for people without hardware foundations and rich software development experience,which has aroused people’s interest in how to use it in the field ofmachine learning.In this paper,we design a distributedmachine learning training application based on the AWS Lambda platform.Based on data parallelism,the data aggregation and training synchronization in Function as a Service(FaaS)are effectively realized.It also encrypts the data set,effectively reducing the risk of data leakage.We rent a cloud server and a Lambda,and then we conduct experiments to evaluate our applications.Our results indicate the effectiveness,rapidity,and economy of distributed training on FaaS.展开更多
Monocular 3D object detection is challenging due to the lack of accurate depth information.Some methods estimate the pixel-wise depth maps from off-the-shelf depth estimators and then use them as an additional input t...Monocular 3D object detection is challenging due to the lack of accurate depth information.Some methods estimate the pixel-wise depth maps from off-the-shelf depth estimators and then use them as an additional input to augment the RGB images.Depth-based methods attempt to convert estimated depth maps to pseudo-LiDAR and then use LiDAR-based object detectors or focus on the perspective of image and depth fusion learning.However,they demonstrate limited performance and efficiency as a result of depth inaccuracy and complex fusion mode with convolutions.Different from these approaches,our proposed depth-guided vision transformer with a normalizing flows(NF-DVT)network uses normalizing flows to build priors in depth maps to achieve more accurate depth information.Then we develop a novel Swin-Transformer-based backbone with a fusion module to process RGB image patches and depth map patches with two separate branches and fuse them using cross-attention to exchange information with each other.Furthermore,with the help of pixel-wise relative depth values in depth maps,we develop new relative position embeddings in the cross-attention mechanism to capture more accurate sequence ordering of input tokens.Our method is the first Swin-Transformer-based backbone architecture for monocular 3D object detection.The experimental results on the KITTI and the challenging Waymo Open datasets show the effectiveness of our proposed method and superior performance over previous counterparts.展开更多
Video salient object detection(VSOD)aims at locating the most attractive objects in a video by exploring the spatial and temporal features.VSOD poses a challenging task in computer vision,as it involves processing com...Video salient object detection(VSOD)aims at locating the most attractive objects in a video by exploring the spatial and temporal features.VSOD poses a challenging task in computer vision,as it involves processing complex spatial data that is also influenced by temporal dynamics.Despite the progress made in existing VSOD models,they still struggle in scenes of great background diversity within and between frames.Additionally,they encounter difficulties related to accumulated noise and high time consumption during the extraction of temporal features over a long-term duration.We propose a multi-stream temporal enhanced network(MSTENet)to address these problems.It investigates saliency cues collaboration in the spatial domain with a multi-stream structure to deal with the great background diversity challenge.A straightforward,yet efficient approach for temporal feature extraction is developed to avoid the accumulative noises and reduce time consumption.The distinction between MSTENet and other VSOD methods stems from its incorporation of both foreground supervision and background supervision,facilitating enhanced extraction of collaborative saliency cues.Another notable differentiation is the innovative integration of spatial and temporal features,wherein the temporal module is integrated into the multi-stream structure,enabling comprehensive spatial-temporal interactions within an end-to-end framework.Extensive experimental results demonstrate that the proposed method achieves state-of-the-art performance on five benchmark datasets while maintaining a real-time speed of 27 fps(Titan XP).Our code and models are available at https://github.com/RuJiaLe/MSTENet.展开更多
Transformer-based models have facilitated significant advances in object detection.However,their extensive computational consumption and suboptimal detection of dense small objects curtail their applicability in unman...Transformer-based models have facilitated significant advances in object detection.However,their extensive computational consumption and suboptimal detection of dense small objects curtail their applicability in unmanned aerial vehicle(UAV)imagery.Addressing these limitations,we propose a hybrid transformer-based detector,H-DETR,and enhance it for dense small objects,leading to an accurate and efficient model.Firstly,we introduce a hybrid transformer encoder,which integrates a convolutional neural network-based cross-scale fusion module with the original encoder to handle multi-scale feature sequences more efficiently.Furthermore,we propose two novel strategies to enhance detection performance without incurring additional inference computation.Query filter is designed to cope with the dense clustering inherent in drone-captured images by counteracting similar queries with a training-aware non-maximum suppression.Adversarial denoising learning is a novel enhancement method inspired by adversarial learning,which improves the detection of numerous small targets by counteracting the effects of artificial spatial and semantic noise.Extensive experiments on the VisDrone and UAVDT datasets substantiate the effectiveness of our approach,achieving a significant improvement in accuracy with a reduction in computational complexity.Our method achieves 31.9%and 21.1%AP on the VisDrone and UAVDT datasets,respectively,and has a faster inference speed,making it a competitive model in UAV image object detection.展开更多
We are investigating the distributed optimization problem,where a network of nodes works together to minimize a global objective that is a finite sum of their stored local functions.Since nodes exchange optimization p...We are investigating the distributed optimization problem,where a network of nodes works together to minimize a global objective that is a finite sum of their stored local functions.Since nodes exchange optimization parameters through the wireless network,large-scale training models can create communication bottlenecks,resulting in slower training times.To address this issue,CHOCO-SGD was proposed,which allows compressing information with arbitrary precision without reducing the convergence rate for strongly convex objective functions.Nevertheless,most convex functions are not strongly convex(such as logistic regression or Lasso),which raises the question of whether this algorithm can be applied to non-strongly convex functions.In this paper,we provide the first theoretical analysis of the convergence rate of CHOCO-SGD on non-strongly convex objectives.We derive a sufficient condition,which limits the fidelity of compression,to guarantee convergence.Moreover,our analysis demonstrates that within the fidelity threshold,this algorithm can significantly reduce transmission burden while maintaining the same convergence rate order as its no-compression equivalent.Numerical experiments further validate the theoretical findings by demonstrating that CHOCO-SGD improves communication efficiency and keeps the same convergence rate order simultaneously.And experiments also show that the algorithm fails to converge with low compression fidelity and in time-varying topologies.Overall,our study offers valuable insights into the potential applicability of CHOCO-SGD for non-strongly convex objectives.Additionally,we provide practical guidelines for researchers seeking to utilize this algorithm in real-world scenarios.展开更多
Unmanned aerial vehicles(UAVs) have gained significant attention in practical applications, especially the low-altitude aerial(LAA) object detection imposes stringent requirements on recognition accuracy and computati...Unmanned aerial vehicles(UAVs) have gained significant attention in practical applications, especially the low-altitude aerial(LAA) object detection imposes stringent requirements on recognition accuracy and computational resources. In this paper, the LAA images-oriented tensor decomposition and knowledge distillation-based network(TDKD-Net) is proposed,where the TT-format TD(tensor decomposition) and equalweighted response-based KD(knowledge distillation) methods are designed to minimize redundant parameters while ensuring comparable performance. Moreover, some robust network structures are developed, including the small object detection head and the dual-domain attention mechanism, which enable the model to leverage the learned knowledge from small-scale targets and selectively focus on salient features. Considering the imbalance of bounding box regression samples and the inaccuracy of regression geometric factors, the focal and efficient IoU(intersection of union) loss with optimal transport assignment(F-EIoU-OTA)mechanism is proposed to improve the detection accuracy. The proposed TDKD-Net is comprehensively evaluated through extensive experiments, and the results have demonstrated the effectiveness and superiority of the developed methods in comparison to other advanced detection algorithms, which also present high generalization and strong robustness. As a resource-efficient precise network, the complex detection of small and occluded LAA objects is also well addressed by TDKD-Net, which provides useful insights on handling imbalanced issues and realizing domain adaptation.展开更多
Memory deficit,which is often associated with aging and many psychiatric,neurological,and neurodegenerative diseases,has been a challenging issue for treatment.Up till now,all potential drug candidates have failed to ...Memory deficit,which is often associated with aging and many psychiatric,neurological,and neurodegenerative diseases,has been a challenging issue for treatment.Up till now,all potential drug candidates have failed to produce satisfa ctory effects.Therefore,in the search for a solution,we found that a treatment with the gene corresponding to the RGS14414protein in visual area V2,a brain area connected with brain circuits of the ventral stream and the medial temporal lobe,which is crucial for object recognition memory(ORM),can induce enhancement of ORM.In this study,we demonstrated that the same treatment with RGS14414in visual area V2,which is relatively unaffected in neurodegenerative diseases such as Alzheimer s disease,produced longlasting enhancement of ORM in young animals and prevent ORM deficits in rodent models of aging and Alzheimer’s disease.Furthermore,we found that the prevention of memory deficits was mediated through the upregulation of neuronal arbo rization and spine density,as well as an increase in brain-derived neurotrophic factor(BDNF).A knockdown of BDNF gene in RGS14414-treated aging rats and Alzheimer s disease model mice caused complete loss in the upregulation of neuronal structural plasticity and in the prevention of ORM deficits.These findings suggest that BDNF-mediated neuronal structural plasticity in area V2 is crucial in the prevention of memory deficits in RGS14414-treated rodent models of aging and Alzheimer’s disease.Therefore,our findings of RGS14414gene-mediated activation of neuronal circuits in visual area V2 have therapeutic relevance in the treatment of memory deficits.展开更多
What causes object detection in video to be less accurate than it is in still images?Because some video frames have degraded in appearance from fast movement,out-of-focus camera shots,and changes in posture.These reas...What causes object detection in video to be less accurate than it is in still images?Because some video frames have degraded in appearance from fast movement,out-of-focus camera shots,and changes in posture.These reasons have made video object detection(VID)a growing area of research in recent years.Video object detection can be used for various healthcare applications,such as detecting and tracking tumors in medical imaging,monitoring the movement of patients in hospitals and long-term care facilities,and analyzing videos of surgeries to improve technique and training.Additionally,it can be used in telemedicine to help diagnose and monitor patients remotely.Existing VID techniques are based on recurrent neural networks or optical flow for feature aggregation to produce reliable features which can be used for detection.Some of those methods aggregate features on the full-sequence level or from nearby frames.To create feature maps,existing VID techniques frequently use Convolutional Neural Networks(CNNs)as the backbone network.On the other hand,Vision Transformers have outperformed CNNs in various vision tasks,including object detection in still images and image classification.We propose in this research to use Swin-Transformer,a state-of-the-art Vision Transformer,as an alternative to CNN-based backbone networks for object detection in videos.The proposed architecture enhances the accuracy of existing VID methods.The ImageNet VID and EPIC KITCHENS datasets are used to evaluate the suggested methodology.We have demonstrated that our proposed method is efficient by achieving 84.3%mean average precision(mAP)on ImageNet VID using less memory in comparison to other leading VID techniques.The source code is available on the website https://github.com/amaharek/SwinVid.展开更多
Object tracking,an important technology in the field of image processing and computer vision,is used to continuously track a specific object or person in an image.This technology may be effective in identifying the sa...Object tracking,an important technology in the field of image processing and computer vision,is used to continuously track a specific object or person in an image.This technology may be effective in identifying the same person within one image,but it has limitations in handling multiple images owing to the difficulty in identifying whether the object appearing in other images is the same.When tracking the same object using two or more images,there must be a way to determine that objects existing in different images are the same object.Therefore,this paper attempts to determine the same object present in different images using color information among the unique information of the object.Thus,this study proposes a multiple-object-tracking method using histogram stamp extraction in closed-circuit television applications.The proposed method determines the presence or absence of a target object in an image by comparing the similarity between the image containing the target object and other images.To this end,a unique color value of the target object is extracted based on its color distribution in the image using three methods:mean,mode,and interquartile range.The Top-N accuracy method is used to analyze the accuracy of each method,and the results show that the mean method had an accuracy of 93.5%(Top-2).Furthermore,the positive prediction value experimental results show that the accuracy of the mean method was 65.7%.As a result of the analysis,it is possible to detect and track the same object present in different images using the unique color of the object.Through the results,it is possible to track the same object that can minimize manpower without using personal information when detecting objects in different images.In the last response speed experiment,it was shown that when the mean was used,the color extraction of the object was possible in real time with 0.016954 s.Through this,it is possible to detect and track the same object in real time when using the proposed method.展开更多
Introduction: Cranioencephalic trauma caused by bladed weapons is rare, and that caused by sharp objects is exceptional. The aim of our study was to describe the clinical, therapeutic and evolutionary aspects. Materia...Introduction: Cranioencephalic trauma caused by bladed weapons is rare, and that caused by sharp objects is exceptional. The aim of our study was to describe the clinical, therapeutic and evolutionary aspects. Materials and method: This was a descriptive and analytical study over a 48-month period at CHU la Renaissance from January 1, 2018 to December 31, 2021, concerning patients admitted for penetrating cranioencephalic trauma by pointed object. Results: Twelve cases, all male, of penetrating cranioencephalic sharp-force trauma were identified. The mean age was 34 ± 7 years, with extremes of 11 and 60 years. Farmers and herders accounted for 31% and 25% of cases respectively. The average admission time was 47 hours. Brawls were the circumstances of occurrence in 81.2% of cases. Knives (33%), arrows (25%) and iron bars (16.6%) were the objects used. Altered consciousness was present in 43.8% of cases, and focal deficit in 50%. Scannographic lesions were fracture and/or embarrhment (12 cases), intra-parenchymal haematomas (6 cases) and presence of object in place (4 cases). Surgery was performed in 11 patients. Postoperative outcome was favorable in 9 patients. After 12 months, 2 patients were declared unfit. Conclusion: Penetrating head injuries caused by sharp objects are common in Chad. Urgent surgery can prevent disabling after-effects.展开更多
This article presents an analysis of the patterns of interactions resulting from the positive and negative emotional events that occur in cities,considering them as complex systems.It explores,from the imaginaries,how...This article presents an analysis of the patterns of interactions resulting from the positive and negative emotional events that occur in cities,considering them as complex systems.It explores,from the imaginaries,how certain urban objects can act as emotional agents and how these events affect the urban system as a whole.An adaptive complex systems perspective is used to analyze these patterns.The results show patterns in the processes and dynamics that occur in cities based on the objects that affect the emotions of the people who live there.These patterns depend on the characteristics of the emotional charge of urban objects,but they can be generalized in the following process:(1)immediate reaction by some individuals;(2)emotions are generated at the individual level which begins to generalize,permuting to a collective emotion;(3)a process of reflection is detonated in some individuals from the reading of collective emotions;(4)integration/significance in the community both at the individual and collective level,on the concepts,roles and/or functions that give rise to the process in the system.Therefore,it is clear that emotions play a significant role in the development of cities and these aspects should be considered in the design strategies of all kinds of projects for the city.Future extensions of this work could include a deeper analysis of specific emotional events in urban environments,as well as possible implications for urban policy and decision making.展开更多
Structural development defects essentially refer to code structure that violates object-oriented design principles. They make program maintenance challenging and deteriorate software quality over time. Various detecti...Structural development defects essentially refer to code structure that violates object-oriented design principles. They make program maintenance challenging and deteriorate software quality over time. Various detection approaches, ranging from traditional heuristic algorithms to machine learning methods, are used to identify these defects. Ensemble learning methods have strengthened the detection of these defects. However, existing approaches do not simultaneously exploit the capabilities of extracting relevant features from pre-trained models and the performance of neural networks for the classification task. Therefore, our goal has been to design a model that combines a pre-trained model to extract relevant features from code excerpts through transfer learning and a bagging method with a base estimator, a dense neural network, for defect classification. To achieve this, we composed multiple samples of the same size with replacements from the imbalanced dataset MLCQ1. For all the samples, we used the CodeT5-small variant to extract features and trained a bagging method with the neural network Roberta Classification Head to classify defects based on these features. We then compared this model to RandomForest, one of the ensemble methods that yields good results. Our experiments showed that the number of base estimators to use for bagging depends on the defect to be detected. Next, we observed that it was not necessary to use a data balancing technique with our model when the imbalance rate was 23%. Finally, for blob detection, RandomForest had a median MCC value of 0.36 compared to 0.12 for our method. However, our method was predominant in Long Method detection with a median MCC value of 0.53 compared to 0.42 for RandomForest. These results suggest that the performance of ensemble methods in detecting structural development defects is dependent on specific defects.展开更多
Now object detection based on deep learning tries different strategies.It uses fewer data training networks to achieve the effect of large dataset training.However,the existing methods usually do not achieve the balan...Now object detection based on deep learning tries different strategies.It uses fewer data training networks to achieve the effect of large dataset training.However,the existing methods usually do not achieve the balance between network parameters and training data.It makes the information provided by a small amount of picture data insufficient to optimize model parameters,resulting in unsatisfactory detection results.To improve the accuracy of few shot object detection,this paper proposes a network based on the transformer and high-resolution feature extraction(THR).High-resolution feature extractionmaintains the resolution representation of the image.Channels and spatial attention are used to make the network focus on features that are more useful to the object.In addition,the recently popular transformer is used to fuse the features of the existing object.This compensates for the previous network failure by making full use of existing object features.Experiments on the Pascal VOC and MS-COCO datasets prove that the THR network has achieved better results than previous mainstream few shot object detection.展开更多
Object detection plays a vital role in the video surveillance systems.To enhance security,surveillance cameras are now installed in public areas such as traffic signals,roadways,retail malls,train stations,and banks.Ho...Object detection plays a vital role in the video surveillance systems.To enhance security,surveillance cameras are now installed in public areas such as traffic signals,roadways,retail malls,train stations,and banks.However,monitor-ing the video continually at a quicker pace is a challenging job.As a consequence,security cameras are useless and need human monitoring.The primary difficulty with video surveillance is identifying abnormalities such as thefts,accidents,crimes,or other unlawful actions.The anomalous action does not occur at a high-er rate than usual occurrences.To detect the object in a video,first we analyze the images pixel by pixel.In digital image processing,segmentation is the process of segregating the individual image parts into pixels.The performance of segmenta-tion is affected by irregular illumination and/or low illumination.These factors highly affect the real-time object detection process in the video surveillance sys-tem.In this paper,a modified ResNet model(M-Resnet)is proposed to enhance the image which is affected by insufficient light.Experimental results provide the comparison of existing method output and modification architecture of the ResNet model shows the considerable amount improvement in detection objects in the video stream.The proposed model shows better results in the metrics like preci-sion,recall,pixel accuracy,etc.,andfinds a reasonable improvement in the object detection.展开更多
Image Super-Resolution(SR)research has achieved great success with powerful neural networks.The deeper networks with more parameters improve the restoration quality but add the computation complexity,which means more ...Image Super-Resolution(SR)research has achieved great success with powerful neural networks.The deeper networks with more parameters improve the restoration quality but add the computation complexity,which means more inference time would be cost,hindering image SR from practical usage.Noting the spatial distribution of the objects or things in images,a twostage local objects SR system is proposed,which consists of two modules,the object detection module and the SR module.Firstly,You Only Look Once(YOLO),which is efficient in generic object detection tasks,is selected to detect the input images for obtaining objects of interest,then put them into the SR module and output corresponding High-Resolution(HR)subimages.The computational power consumption of image SR is optimized by reducing the resolution of input images.In addition,we establish a dataset,TrafficSign500,for our experiment.Finally,the performance of the proposed system is evaluated under several State-Of-The-Art(SOTA)YOLOv5 and SISR models.Results show that our system can achieve a tremendous computation improvement in image SR.展开更多
Object Detection is the task of localization and classification of objects in a video or image.In recent times,because of its widespread applications,it has obtained more importance.In the modern world,waste pollution...Object Detection is the task of localization and classification of objects in a video or image.In recent times,because of its widespread applications,it has obtained more importance.In the modern world,waste pollution is one significant environmental problem.The prominence of recycling is known very well for both ecological and economic reasons,and the industry needs higher efficiency.Waste object detection utilizing deep learning(DL)involves training a machine-learning method to classify and detect various types of waste in videos or images.This technology is utilized for several purposes recycling and sorting waste,enhancing waste management and reducing environmental pollution.Recent studies of automatic waste detection are difficult to compare because of the need for benchmarks and broadly accepted standards concerning the employed data andmetrics.Therefore,this study designs an Entropy-based Feature Fusion using Deep Learning forWasteObject Detection and Classification(EFFDL-WODC)algorithm.The presented EFFDL-WODC system inherits the concepts of feature fusion and DL techniques for the effectual recognition and classification of various kinds of waste objects.In the presented EFFDL-WODC system,two major procedures can be contained,such as waste object detection and waste object classification.For object detection,the EFFDL-WODC technique uses a YOLOv7 object detector with a fusionbased backbone network.In addition,entropy feature fusion-based models such as VGG-16,SqueezeNet,and NASNetmodels are used.Finally,the EFFDL-WODC technique uses a graph convolutional network(GCN)model performed for the classification of detected waste objects.The performance validation of the EFFDL-WODC approach was validated on the benchmark database.The comprehensive comparative results demonstrated the improved performance of the EFFDL-WODC technique over recent approaches.展开更多
Object detection and classification are the trending research topics in thefield of computer vision because of their applications like visual surveillance.However,the vision-based objects detection and classification met...Object detection and classification are the trending research topics in thefield of computer vision because of their applications like visual surveillance.However,the vision-based objects detection and classification methods still suffer from detecting smaller objects and dense objects in the complex dynamic envir-onment with high accuracy and precision.The present paper proposes a novel enhanced method to detect and classify objects using Hyperbolic Tangent based You Only Look Once V4 with a Modified Manta-Ray Foraging Optimization-based Convolution Neural Network.Initially,in the pre-processing,the video data was converted into image sequences and Polynomial Adaptive Edge was applied to preserve the Algorithm method for image resizing and noise removal.The noiseless resized image sequences contrast was enhanced using Contrast Limited Adaptive Edge Preserving Algorithm.And,with the contrast-enhanced image sequences,the Hyperbolic Tangent based You Only Look Once V4 was trained for object detection.Additionally,to detect smaller objects with high accuracy,Grasp configuration was observed for every detected object.Finally,the Modified Manta-Ray Foraging Optimization-based Convolution Neural Network method was carried out for the detection and the classification of objects.Comparative experiments were conducted on various benchmark datasets and methods that showed improved accurate detection and classification results.展开更多
基金This work was partially supported by the National Natural Science Foundation of China(Grant Nos.61906168,U20A20171)Zhejiang Provincial Natural Science Foundation of China(Grant Nos.LY23F020023,LY21F020027)Construction of Hubei Provincial Key Laboratory for Intelligent Visual Monitoring of Hydropower Projects(Grant Nos.2022SDSJ01).
文摘In clinical practice,the microscopic examination of urine sediment is considered an important in vitro examination with many broad applications.Measuring the amount of each type of urine sediment allows for screening,diagnosis and evaluation of kidney and urinary tract disease,providing insight into the specific type and severity.However,manual urine sediment examination is labor-intensive,time-consuming,and subjective.Traditional machine learning based object detection methods require hand-crafted features for localization and classification,which have poor generalization capabilities and are difficult to quickly and accurately detect the number of urine sediments.Deep learning based object detection methods have the potential to address the challenges mentioned above,but these methods require access to large urine sediment image datasets.Unfortunately,only a limited number of publicly available urine sediment datasets are currently available.To alleviate the lack of urine sediment datasets in medical image analysis,we propose a new dataset named UriSed2K,which contains 2465 high-quality images annotated with expert guidance.Two main challenges are associated with our dataset:a large number of small objects and the occlusion between these small objects.Our manuscript focuses on applying deep learning object detection methods to the urine sediment dataset and addressing the challenges presented by this dataset.Specifically,our goal is to improve the accuracy and efficiency of the detection algorithm and,in doing so,provide medical professionals with an automatic detector that saves time and effort.We propose an improved lightweight one-stage object detection algorithm called Discriminatory-YOLO.The proposed algorithm comprises a local context attention module and a global background suppression module,which aid the detector in distinguishing urine sediment features in the image.The local context attention module captures context information beyond the object region,while the global background suppression module emphasizes objects in uninformative backgrounds.We comprehensively evaluate our method on the UriSed2K dataset,which includes seven categories of urine sediments,such as erythrocytes(red blood cells),leukocytes(white blood cells),epithelial cells,crystals,mycetes,broken erythrocytes,and broken leukocytes,achieving the best average precision(AP)of 95.3%while taking only 10 ms per image.The source code and dataset are available at https://github.com/binghuiwu98/discriminatoryyolov5.
基金supported in part by the National Natural Science Foundation of China under Grant 62006071part by the Science and Technology Research Project of Henan Province under Grant 232103810086.
文摘In recent years,there has been extensive research on object detection methods applied to optical remote sensing images utilizing convolutional neural networks.Despite these efforts,the detection of small objects in remote sensing remains a formidable challenge.The deep network structure will bring about the loss of object features,resulting in the loss of object features and the near elimination of some subtle features associated with small objects in deep layers.Additionally,the features of small objects are susceptible to interference from background features contained within the image,leading to a decline in detection accuracy.Moreover,the sensitivity of small objects to the bounding box perturbation further increases the detection difficulty.In this paper,we introduce a novel approach,Cross-Layer Fusion and Weighted Receptive Field-based YOLO(CAW-YOLO),specifically designed for small object detection in remote sensing.To address feature loss in deep layers,we have devised a cross-layer attention fusion module.Background noise is effectively filtered through the incorporation of Bi-Level Routing Attention(BRA).To enhance the model’s capacity to perceive multi-scale objects,particularly small-scale objects,we introduce a weightedmulti-receptive field atrous spatial pyramid poolingmodule.Furthermore,wemitigate the sensitivity arising from bounding box perturbation by incorporating the joint Normalized Wasserstein Distance(NWD)and Efficient Intersection over Union(EIoU)losses.The efficacy of the proposedmodel in detecting small objects in remote sensing has been validated through experiments conducted on three publicly available datasets.The experimental results unequivocally demonstrate the model’s pronounced advantages in small object detection for remote sensing,surpassing the performance of current mainstream models.
文摘The data analysis of blasting sites has always been the research goal of relevant researchers.The rise of mobile blasting robots has aroused many researchers’interest in machine learning methods for target detection in the field of blasting.Serverless Computing can provide a variety of computing services for people without hardware foundations and rich software development experience,which has aroused people’s interest in how to use it in the field ofmachine learning.In this paper,we design a distributedmachine learning training application based on the AWS Lambda platform.Based on data parallelism,the data aggregation and training synchronization in Function as a Service(FaaS)are effectively realized.It also encrypts the data set,effectively reducing the risk of data leakage.We rent a cloud server and a Lambda,and then we conduct experiments to evaluate our applications.Our results indicate the effectiveness,rapidity,and economy of distributed training on FaaS.
基金supported in part by the Major Project for New Generation of AI (2018AAA0100400)the National Natural Science Foundation of China (61836014,U21B2042,62072457,62006231)the InnoHK Program。
文摘Monocular 3D object detection is challenging due to the lack of accurate depth information.Some methods estimate the pixel-wise depth maps from off-the-shelf depth estimators and then use them as an additional input to augment the RGB images.Depth-based methods attempt to convert estimated depth maps to pseudo-LiDAR and then use LiDAR-based object detectors or focus on the perspective of image and depth fusion learning.However,they demonstrate limited performance and efficiency as a result of depth inaccuracy and complex fusion mode with convolutions.Different from these approaches,our proposed depth-guided vision transformer with a normalizing flows(NF-DVT)network uses normalizing flows to build priors in depth maps to achieve more accurate depth information.Then we develop a novel Swin-Transformer-based backbone with a fusion module to process RGB image patches and depth map patches with two separate branches and fuse them using cross-attention to exchange information with each other.Furthermore,with the help of pixel-wise relative depth values in depth maps,we develop new relative position embeddings in the cross-attention mechanism to capture more accurate sequence ordering of input tokens.Our method is the first Swin-Transformer-based backbone architecture for monocular 3D object detection.The experimental results on the KITTI and the challenging Waymo Open datasets show the effectiveness of our proposed method and superior performance over previous counterparts.
基金funded by the Natural Science Foundation China(NSFC)under Grant No.62203192.
文摘Video salient object detection(VSOD)aims at locating the most attractive objects in a video by exploring the spatial and temporal features.VSOD poses a challenging task in computer vision,as it involves processing complex spatial data that is also influenced by temporal dynamics.Despite the progress made in existing VSOD models,they still struggle in scenes of great background diversity within and between frames.Additionally,they encounter difficulties related to accumulated noise and high time consumption during the extraction of temporal features over a long-term duration.We propose a multi-stream temporal enhanced network(MSTENet)to address these problems.It investigates saliency cues collaboration in the spatial domain with a multi-stream structure to deal with the great background diversity challenge.A straightforward,yet efficient approach for temporal feature extraction is developed to avoid the accumulative noises and reduce time consumption.The distinction between MSTENet and other VSOD methods stems from its incorporation of both foreground supervision and background supervision,facilitating enhanced extraction of collaborative saliency cues.Another notable differentiation is the innovative integration of spatial and temporal features,wherein the temporal module is integrated into the multi-stream structure,enabling comprehensive spatial-temporal interactions within an end-to-end framework.Extensive experimental results demonstrate that the proposed method achieves state-of-the-art performance on five benchmark datasets while maintaining a real-time speed of 27 fps(Titan XP).Our code and models are available at https://github.com/RuJiaLe/MSTENet.
基金This research was funded by the Natural Science Foundation of Hebei Province(F2021506004).
文摘Transformer-based models have facilitated significant advances in object detection.However,their extensive computational consumption and suboptimal detection of dense small objects curtail their applicability in unmanned aerial vehicle(UAV)imagery.Addressing these limitations,we propose a hybrid transformer-based detector,H-DETR,and enhance it for dense small objects,leading to an accurate and efficient model.Firstly,we introduce a hybrid transformer encoder,which integrates a convolutional neural network-based cross-scale fusion module with the original encoder to handle multi-scale feature sequences more efficiently.Furthermore,we propose two novel strategies to enhance detection performance without incurring additional inference computation.Query filter is designed to cope with the dense clustering inherent in drone-captured images by counteracting similar queries with a training-aware non-maximum suppression.Adversarial denoising learning is a novel enhancement method inspired by adversarial learning,which improves the detection of numerous small targets by counteracting the effects of artificial spatial and semantic noise.Extensive experiments on the VisDrone and UAVDT datasets substantiate the effectiveness of our approach,achieving a significant improvement in accuracy with a reduction in computational complexity.Our method achieves 31.9%and 21.1%AP on the VisDrone and UAVDT datasets,respectively,and has a faster inference speed,making it a competitive model in UAV image object detection.
基金supported in part by the Shanghai Natural Science Foundation under the Grant 22ZR1407000.
文摘We are investigating the distributed optimization problem,where a network of nodes works together to minimize a global objective that is a finite sum of their stored local functions.Since nodes exchange optimization parameters through the wireless network,large-scale training models can create communication bottlenecks,resulting in slower training times.To address this issue,CHOCO-SGD was proposed,which allows compressing information with arbitrary precision without reducing the convergence rate for strongly convex objective functions.Nevertheless,most convex functions are not strongly convex(such as logistic regression or Lasso),which raises the question of whether this algorithm can be applied to non-strongly convex functions.In this paper,we provide the first theoretical analysis of the convergence rate of CHOCO-SGD on non-strongly convex objectives.We derive a sufficient condition,which limits the fidelity of compression,to guarantee convergence.Moreover,our analysis demonstrates that within the fidelity threshold,this algorithm can significantly reduce transmission burden while maintaining the same convergence rate order as its no-compression equivalent.Numerical experiments further validate the theoretical findings by demonstrating that CHOCO-SGD improves communication efficiency and keeps the same convergence rate order simultaneously.And experiments also show that the algorithm fails to converge with low compression fidelity and in time-varying topologies.Overall,our study offers valuable insights into the potential applicability of CHOCO-SGD for non-strongly convex objectives.Additionally,we provide practical guidelines for researchers seeking to utilize this algorithm in real-world scenarios.
基金supported in part by the National Natural Science Foundation of China (62073271)the Natural Science Foundation for Distinguished Young Scholars of the Fujian Province of China (2023J06010)the Fundamental Research Funds for the Central Universities of China(20720220076)。
文摘Unmanned aerial vehicles(UAVs) have gained significant attention in practical applications, especially the low-altitude aerial(LAA) object detection imposes stringent requirements on recognition accuracy and computational resources. In this paper, the LAA images-oriented tensor decomposition and knowledge distillation-based network(TDKD-Net) is proposed,where the TT-format TD(tensor decomposition) and equalweighted response-based KD(knowledge distillation) methods are designed to minimize redundant parameters while ensuring comparable performance. Moreover, some robust network structures are developed, including the small object detection head and the dual-domain attention mechanism, which enable the model to leverage the learned knowledge from small-scale targets and selectively focus on salient features. Considering the imbalance of bounding box regression samples and the inaccuracy of regression geometric factors, the focal and efficient IoU(intersection of union) loss with optimal transport assignment(F-EIoU-OTA)mechanism is proposed to improve the detection accuracy. The proposed TDKD-Net is comprehensively evaluated through extensive experiments, and the results have demonstrated the effectiveness and superiority of the developed methods in comparison to other advanced detection algorithms, which also present high generalization and strong robustness. As a resource-efficient precise network, the complex detection of small and occluded LAA objects is also well addressed by TDKD-Net, which provides useful insights on handling imbalanced issues and realizing domain adaptation.
基金supported by grants from the Ministerio de Economia y Competitividad(BFU2013-43458-R)Junta de Andalucia(P12-CTS-1694 and Proyexcel-00422)to ZUK。
文摘Memory deficit,which is often associated with aging and many psychiatric,neurological,and neurodegenerative diseases,has been a challenging issue for treatment.Up till now,all potential drug candidates have failed to produce satisfa ctory effects.Therefore,in the search for a solution,we found that a treatment with the gene corresponding to the RGS14414protein in visual area V2,a brain area connected with brain circuits of the ventral stream and the medial temporal lobe,which is crucial for object recognition memory(ORM),can induce enhancement of ORM.In this study,we demonstrated that the same treatment with RGS14414in visual area V2,which is relatively unaffected in neurodegenerative diseases such as Alzheimer s disease,produced longlasting enhancement of ORM in young animals and prevent ORM deficits in rodent models of aging and Alzheimer’s disease.Furthermore,we found that the prevention of memory deficits was mediated through the upregulation of neuronal arbo rization and spine density,as well as an increase in brain-derived neurotrophic factor(BDNF).A knockdown of BDNF gene in RGS14414-treated aging rats and Alzheimer s disease model mice caused complete loss in the upregulation of neuronal structural plasticity and in the prevention of ORM deficits.These findings suggest that BDNF-mediated neuronal structural plasticity in area V2 is crucial in the prevention of memory deficits in RGS14414-treated rodent models of aging and Alzheimer’s disease.Therefore,our findings of RGS14414gene-mediated activation of neuronal circuits in visual area V2 have therapeutic relevance in the treatment of memory deficits.
文摘What causes object detection in video to be less accurate than it is in still images?Because some video frames have degraded in appearance from fast movement,out-of-focus camera shots,and changes in posture.These reasons have made video object detection(VID)a growing area of research in recent years.Video object detection can be used for various healthcare applications,such as detecting and tracking tumors in medical imaging,monitoring the movement of patients in hospitals and long-term care facilities,and analyzing videos of surgeries to improve technique and training.Additionally,it can be used in telemedicine to help diagnose and monitor patients remotely.Existing VID techniques are based on recurrent neural networks or optical flow for feature aggregation to produce reliable features which can be used for detection.Some of those methods aggregate features on the full-sequence level or from nearby frames.To create feature maps,existing VID techniques frequently use Convolutional Neural Networks(CNNs)as the backbone network.On the other hand,Vision Transformers have outperformed CNNs in various vision tasks,including object detection in still images and image classification.We propose in this research to use Swin-Transformer,a state-of-the-art Vision Transformer,as an alternative to CNN-based backbone networks for object detection in videos.The proposed architecture enhances the accuracy of existing VID methods.The ImageNet VID and EPIC KITCHENS datasets are used to evaluate the suggested methodology.We have demonstrated that our proposed method is efficient by achieving 84.3%mean average precision(mAP)on ImageNet VID using less memory in comparison to other leading VID techniques.The source code is available on the website https://github.com/amaharek/SwinVid.
基金supported by the National Research Foundation of Korea(NRF)grant funded by the Korea government(MSIT)(No.2022R1F1A1068828).
文摘Object tracking,an important technology in the field of image processing and computer vision,is used to continuously track a specific object or person in an image.This technology may be effective in identifying the same person within one image,but it has limitations in handling multiple images owing to the difficulty in identifying whether the object appearing in other images is the same.When tracking the same object using two or more images,there must be a way to determine that objects existing in different images are the same object.Therefore,this paper attempts to determine the same object present in different images using color information among the unique information of the object.Thus,this study proposes a multiple-object-tracking method using histogram stamp extraction in closed-circuit television applications.The proposed method determines the presence or absence of a target object in an image by comparing the similarity between the image containing the target object and other images.To this end,a unique color value of the target object is extracted based on its color distribution in the image using three methods:mean,mode,and interquartile range.The Top-N accuracy method is used to analyze the accuracy of each method,and the results show that the mean method had an accuracy of 93.5%(Top-2).Furthermore,the positive prediction value experimental results show that the accuracy of the mean method was 65.7%.As a result of the analysis,it is possible to detect and track the same object present in different images using the unique color of the object.Through the results,it is possible to track the same object that can minimize manpower without using personal information when detecting objects in different images.In the last response speed experiment,it was shown that when the mean was used,the color extraction of the object was possible in real time with 0.016954 s.Through this,it is possible to detect and track the same object in real time when using the proposed method.
文摘Introduction: Cranioencephalic trauma caused by bladed weapons is rare, and that caused by sharp objects is exceptional. The aim of our study was to describe the clinical, therapeutic and evolutionary aspects. Materials and method: This was a descriptive and analytical study over a 48-month period at CHU la Renaissance from January 1, 2018 to December 31, 2021, concerning patients admitted for penetrating cranioencephalic trauma by pointed object. Results: Twelve cases, all male, of penetrating cranioencephalic sharp-force trauma were identified. The mean age was 34 ± 7 years, with extremes of 11 and 60 years. Farmers and herders accounted for 31% and 25% of cases respectively. The average admission time was 47 hours. Brawls were the circumstances of occurrence in 81.2% of cases. Knives (33%), arrows (25%) and iron bars (16.6%) were the objects used. Altered consciousness was present in 43.8% of cases, and focal deficit in 50%. Scannographic lesions were fracture and/or embarrhment (12 cases), intra-parenchymal haematomas (6 cases) and presence of object in place (4 cases). Surgery was performed in 11 patients. Postoperative outcome was favorable in 9 patients. After 12 months, 2 patients were declared unfit. Conclusion: Penetrating head injuries caused by sharp objects are common in Chad. Urgent surgery can prevent disabling after-effects.
文摘This article presents an analysis of the patterns of interactions resulting from the positive and negative emotional events that occur in cities,considering them as complex systems.It explores,from the imaginaries,how certain urban objects can act as emotional agents and how these events affect the urban system as a whole.An adaptive complex systems perspective is used to analyze these patterns.The results show patterns in the processes and dynamics that occur in cities based on the objects that affect the emotions of the people who live there.These patterns depend on the characteristics of the emotional charge of urban objects,but they can be generalized in the following process:(1)immediate reaction by some individuals;(2)emotions are generated at the individual level which begins to generalize,permuting to a collective emotion;(3)a process of reflection is detonated in some individuals from the reading of collective emotions;(4)integration/significance in the community both at the individual and collective level,on the concepts,roles and/or functions that give rise to the process in the system.Therefore,it is clear that emotions play a significant role in the development of cities and these aspects should be considered in the design strategies of all kinds of projects for the city.Future extensions of this work could include a deeper analysis of specific emotional events in urban environments,as well as possible implications for urban policy and decision making.
文摘Structural development defects essentially refer to code structure that violates object-oriented design principles. They make program maintenance challenging and deteriorate software quality over time. Various detection approaches, ranging from traditional heuristic algorithms to machine learning methods, are used to identify these defects. Ensemble learning methods have strengthened the detection of these defects. However, existing approaches do not simultaneously exploit the capabilities of extracting relevant features from pre-trained models and the performance of neural networks for the classification task. Therefore, our goal has been to design a model that combines a pre-trained model to extract relevant features from code excerpts through transfer learning and a bagging method with a base estimator, a dense neural network, for defect classification. To achieve this, we composed multiple samples of the same size with replacements from the imbalanced dataset MLCQ1. For all the samples, we used the CodeT5-small variant to extract features and trained a bagging method with the neural network Roberta Classification Head to classify defects based on these features. We then compared this model to RandomForest, one of the ensemble methods that yields good results. Our experiments showed that the number of base estimators to use for bagging depends on the defect to be detected. Next, we observed that it was not necessary to use a data balancing technique with our model when the imbalance rate was 23%. Finally, for blob detection, RandomForest had a median MCC value of 0.36 compared to 0.12 for our method. However, our method was predominant in Long Method detection with a median MCC value of 0.53 compared to 0.42 for RandomForest. These results suggest that the performance of ensemble methods in detecting structural development defects is dependent on specific defects.
基金the National Natural Science Foundation of China under grant 62172059 and 62072055Hunan Provincial Natural Science Foundations of China under Grant 2020JJ4626+2 种基金Scientific Research Fund of Hunan Provincial Education Department of China under Grant 19B004“Double First-class”International Cooperation and Development Scientific Research Project of Changsha University of Science and Technology under Grant 2018IC25the Young Teacher Growth Plan Project of Changsha University of Science and Technology under Grant 2019QJCZ076.
文摘Now object detection based on deep learning tries different strategies.It uses fewer data training networks to achieve the effect of large dataset training.However,the existing methods usually do not achieve the balance between network parameters and training data.It makes the information provided by a small amount of picture data insufficient to optimize model parameters,resulting in unsatisfactory detection results.To improve the accuracy of few shot object detection,this paper proposes a network based on the transformer and high-resolution feature extraction(THR).High-resolution feature extractionmaintains the resolution representation of the image.Channels and spatial attention are used to make the network focus on features that are more useful to the object.In addition,the recently popular transformer is used to fuse the features of the existing object.This compensates for the previous network failure by making full use of existing object features.Experiments on the Pascal VOC and MS-COCO datasets prove that the THR network has achieved better results than previous mainstream few shot object detection.
文摘Object detection plays a vital role in the video surveillance systems.To enhance security,surveillance cameras are now installed in public areas such as traffic signals,roadways,retail malls,train stations,and banks.However,monitor-ing the video continually at a quicker pace is a challenging job.As a consequence,security cameras are useless and need human monitoring.The primary difficulty with video surveillance is identifying abnormalities such as thefts,accidents,crimes,or other unlawful actions.The anomalous action does not occur at a high-er rate than usual occurrences.To detect the object in a video,first we analyze the images pixel by pixel.In digital image processing,segmentation is the process of segregating the individual image parts into pixels.The performance of segmenta-tion is affected by irregular illumination and/or low illumination.These factors highly affect the real-time object detection process in the video surveillance sys-tem.In this paper,a modified ResNet model(M-Resnet)is proposed to enhance the image which is affected by insufficient light.Experimental results provide the comparison of existing method output and modification architecture of the ResNet model shows the considerable amount improvement in detection objects in the video stream.The proposed model shows better results in the metrics like preci-sion,recall,pixel accuracy,etc.,andfinds a reasonable improvement in the object detection.
基金supported by the National Natural Science Foundation of China(NSFC)under Grant No.62001057by Beijing University of Posts and Telecommunications Basic Research Fund,2021RC26by the National Natural Science Foundation of China(NSFC)under Grant Nos.61871048 and 61872253.
文摘Image Super-Resolution(SR)research has achieved great success with powerful neural networks.The deeper networks with more parameters improve the restoration quality but add the computation complexity,which means more inference time would be cost,hindering image SR from practical usage.Noting the spatial distribution of the objects or things in images,a twostage local objects SR system is proposed,which consists of two modules,the object detection module and the SR module.Firstly,You Only Look Once(YOLO),which is efficient in generic object detection tasks,is selected to detect the input images for obtaining objects of interest,then put them into the SR module and output corresponding High-Resolution(HR)subimages.The computational power consumption of image SR is optimized by reducing the resolution of input images.In addition,we establish a dataset,TrafficSign500,for our experiment.Finally,the performance of the proposed system is evaluated under several State-Of-The-Art(SOTA)YOLOv5 and SISR models.Results show that our system can achieve a tremendous computation improvement in image SR.
基金funded by Institutional Fund Projects under Grant No. (IFPIP:557-135-1443).
文摘Object Detection is the task of localization and classification of objects in a video or image.In recent times,because of its widespread applications,it has obtained more importance.In the modern world,waste pollution is one significant environmental problem.The prominence of recycling is known very well for both ecological and economic reasons,and the industry needs higher efficiency.Waste object detection utilizing deep learning(DL)involves training a machine-learning method to classify and detect various types of waste in videos or images.This technology is utilized for several purposes recycling and sorting waste,enhancing waste management and reducing environmental pollution.Recent studies of automatic waste detection are difficult to compare because of the need for benchmarks and broadly accepted standards concerning the employed data andmetrics.Therefore,this study designs an Entropy-based Feature Fusion using Deep Learning forWasteObject Detection and Classification(EFFDL-WODC)algorithm.The presented EFFDL-WODC system inherits the concepts of feature fusion and DL techniques for the effectual recognition and classification of various kinds of waste objects.In the presented EFFDL-WODC system,two major procedures can be contained,such as waste object detection and waste object classification.For object detection,the EFFDL-WODC technique uses a YOLOv7 object detector with a fusionbased backbone network.In addition,entropy feature fusion-based models such as VGG-16,SqueezeNet,and NASNetmodels are used.Finally,the EFFDL-WODC technique uses a graph convolutional network(GCN)model performed for the classification of detected waste objects.The performance validation of the EFFDL-WODC approach was validated on the benchmark database.The comprehensive comparative results demonstrated the improved performance of the EFFDL-WODC technique over recent approaches.
文摘Object detection and classification are the trending research topics in thefield of computer vision because of their applications like visual surveillance.However,the vision-based objects detection and classification methods still suffer from detecting smaller objects and dense objects in the complex dynamic envir-onment with high accuracy and precision.The present paper proposes a novel enhanced method to detect and classify objects using Hyperbolic Tangent based You Only Look Once V4 with a Modified Manta-Ray Foraging Optimization-based Convolution Neural Network.Initially,in the pre-processing,the video data was converted into image sequences and Polynomial Adaptive Edge was applied to preserve the Algorithm method for image resizing and noise removal.The noiseless resized image sequences contrast was enhanced using Contrast Limited Adaptive Edge Preserving Algorithm.And,with the contrast-enhanced image sequences,the Hyperbolic Tangent based You Only Look Once V4 was trained for object detection.Additionally,to detect smaller objects with high accuracy,Grasp configuration was observed for every detected object.Finally,the Modified Manta-Ray Foraging Optimization-based Convolution Neural Network method was carried out for the detection and the classification of objects.Comparative experiments were conducted on various benchmark datasets and methods that showed improved accurate detection and classification results.