Dear Editor, This letter proposes integrating a dendritic learnable network architecture with the Vision Transformer to improve the accuracy of image recognition. In this study, based on the theory of dendritic neurons in neuroscience, we design a network that is more practical for engineering to classify visual features. Based on this, we propose a dendritic learning-incorporated vision Transformer (DVT), which outperforms other state-of-the-art methods on three image recognition benchmarks.
Expanding photovoltaic (PV) resources in rural-grid areas is an essential means to augment the share of solar energy in the energy landscape, aligning with the “carbon peaking and carbon neutrality” objectives. However, rural power grids often lack digitalization; thus, the load distribution within these areas is not fully known. This hinders the calculation of the available PV capacity and the deduction of node voltages. This study proposes a load-distribution modeling approach based on remote-sensing image recognition in pursuit of a scientific framework for developing distributed PV resources in rural grid areas. First, houses in remote-sensing images are accurately recognized using deep-learning techniques based on the YOLOv5 model. The distribution of the houses is then used to estimate the load distribution in the grid area. Next, equally spaced and clustered distribution models are used to adaptively determine the location of the nodes and load power in the distribution lines. Finally, by calculating the connectivity matrix of the nodes, a minimum spanning tree is extracted, the topology of the network is constructed, and the node parameters of the load-distribution model are calculated. The proposed scheme is implemented in a software package and its efficacy is demonstrated by analyzing typical remote-sensing images of rural grid areas. The results underscore the ability of the proposed approach to effectively discern the distribution-line structure and compute the node parameters, thereby offering vital support for determining PV access capability.
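The final step of the abstract above, extracting a minimum spanning tree from the connectivity of the load nodes, can be illustrated with a small sketch. This is not the authors' software: it assumes, purely for illustration, that every pair of nodes is connectable and that the edge cost is the Euclidean distance between 2-D node coordinates. The function name `minimum_spanning_tree` is hypothetical, and Prim's algorithm is just one standard way to obtain such a tree.

```python
import math


def minimum_spanning_tree(points):
    """Return MST edges (parent, child, length) over 2-D points.

    Assumption: the graph is complete and edge cost is Euclidean distance.
    Uses the O(n^2) array form of Prim's algorithm, starting from node 0.
    """
    n = len(points)
    if n == 0:
        return []

    def dist(a, b):
        return math.hypot(points[a][0] - points[b][0],
                          points[a][1] - points[b][1])

    in_tree = [False] * n
    best_cost = [math.inf] * n   # cheapest known link from the tree to each node
    best_parent = [-1] * n       # tree node that provides that cheapest link
    best_cost[0] = 0.0
    edges = []

    for _ in range(n):
        # Pick the cheapest node that is not yet part of the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_cost[i])
        in_tree[u] = True
        if best_parent[u] != -1:
            edges.append((best_parent[u], u, best_cost[u]))
        # Relax the links from the newly added node to the remaining nodes.
        for v in range(n):
            if not in_tree[v]:
                d = dist(u, v)
                if d < best_cost[v]:
                    best_cost[v] = d
                    best_parent[v] = u
    return edges


# Hypothetical node coordinates (for example, clustered house positions).
nodes = [(0, 0), (1, 0), (2, 0), (2, 1)]
tree = minimum_spanning_tree(nodes)
total_length = sum(length for _, _, length in tree)
```

A tree over n nodes always has n - 1 edges, so the result can serve directly as a radial line topology; a real distribution network would also need constraints such as roads or existing poles, which this sketch ignores.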
Asparagus stem blight, also known as “asparagus cancer”, is a serious plant disease with a regional distribution. The widespread occurrence of the disease has had a negative impact on the yield and quality of asparagus and has become one of the main problems threatening asparagus production. To improve the ability to accurately identify and localize phenotypic lesions of stem blight in asparagus and to enhance detection accuracy, a YOLOv8-CBAM detection algorithm for asparagus stem blight based on YOLOv8 was proposed. The algorithm aims to achieve rapid detection of phenotypic images of asparagus stem blight and to provide effective assistance in the control of the disease. To enhance the model’s capacity to capture subtle lesion features, the Convolutional Block Attention Module (CBAM) is added after C2f in the head. Simultaneously, the original CIoU loss function in YOLOv8 was replaced with the Focal-EIoU loss function, ensuring that the updated loss function emphasizes higher-quality bounding boxes. The YOLOv8-CBAM algorithm can effectively detect asparagus stem blight phenotypic images with a mean average precision (mAP) of 95.51%, which is 0.22%, 14.99%, 1.77%, and 5.71% higher than the YOLOv5, YOLOv7, YOLOv8, and Mask R-CNN models, respectively. This greatly enhances the efficiency of asparagus growers in identifying asparagus stem blight, aids in improving its prevention and control, and is crucial for the application of computer vision in agriculture.
Complex plasma widely exists in thin film deposition, material surface modification, and waste gas treatment in industrial plasma processes. During complex plasma discharge, the configuration, distribution, and size of particles, as well as the discharge glow, strongly depend on discharge parameters. However, traditional manual diagnosis methods for recognizing discharge parameters from discharge images are complicated to operate, inaccurate, time-consuming, and demanding of instruments. To solve these problems, the network of squeeze and excitation convolution with shortcut (SECS) is proposed for complex plasma image recognition. It combines two mechanisms: an attention mechanism, which strengthens the extraction of channel features, and a shortcut connection, which enables the input information to be transmitted directly to deep networks and avoids vanishing or exploding gradients. The results show that the accuracy, precision, recall and F1-score of our model are superior to those of other models in complex plasma image recognition, and the recognition accuracy reaches 97.38%. Moreover, the recognition accuracy for the publicly available Flowers and Chest X-ray data sets reaches 97.85% and 98.65%, respectively, which indicates that the model is robust. This study shows that the proposed model provides a new method for the diagnosis of complex plasma images and also provides technical support for the application of plasma in industrial production.
Objective: To build a dataset encompassing a large number of stained tongue coating images and process it using deep learning to automatically recognize stained tongue coating images. Methods: A total of 1001 images of stained tongue coating from healthy students at Hunan University of Chinese Medicine and 1007 images of pathological (non-stained) tongue coating from hospitalized patients at The First Hospital of Hunan University of Chinese Medicine with lung cancer, diabetes, and hypertension were collected. The tongue images were randomized into the training, validation, and testing datasets in a 7:2:1 ratio. A deep learning model was constructed using ResNet50 for recognizing stained tongue coating in the training and validation datasets. The training period was 90 epochs. The model’s performance was evaluated by its accuracy, loss curve, recall, F1 score, confusion matrix, receiver operating characteristic (ROC) curve, and precision-recall (PR) curve in the task of predicting stained tongue coating images in the testing dataset. The accuracy of the deep learning model was compared with that of attending physicians of traditional Chinese medicine (TCM). Results: The training results showed that after 90 epochs, the model presented an excellent classification performance. The loss curve and accuracy were stable, showing no signs of overfitting. The model achieved an accuracy, recall, and F1 score of 92%, 91%, and 92%, respectively. The confusion matrix revealed an accuracy of 92% for the model and 69% for TCM practitioners. The areas under the ROC and PR curves were 0.97 and 0.95, respectively. Conclusion: The deep learning model constructed using ResNet50 can effectively recognize stained coating images with greater accuracy than visual inspection by TCM practitioners. This model has the potential to assist doctors in identifying false tongue coating and preventing misdiagnosis.
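The 7:2:1 randomization into training, validation, and testing sets mentioned in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the study's procedure: the function name `split_7_2_1`, the fixed seed, and the use of integer image identifiers are all hypothetical, and rounding is done by integer division so that any remainder falls into the test set.

```python
import random


def split_7_2_1(items, seed=0):
    """Shuffle items and split them into train/validation/test at 7:2:1.

    Assumption: a fixed seed is used so the split is reproducible.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 7 // 10
    n_val = n * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # whatever remains (about 10%)
    return train, val, test


# 1001 stained + 1007 non-stained images, represented here by integer IDs.
image_ids = list(range(1001 + 1007))
train_ids, val_ids, test_ids = split_7_2_1(image_ids)
```

In practice one might stratify the split by class so that both image types keep the same proportion in each subset; the abstract does not say whether this was done.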
This study delves into the applications, challenges, and future directions of deep learning techniques in the field of image recognition. Deep learning, particularly Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Generative Adversarial Networks (GANs), has become key to enhancing the precision and efficiency of image recognition. These models are capable of processing complex visual data, facilitating efficient feature extraction and image classification. However, acquiring and annotating high-quality, diverse datasets, addressing imbalances in datasets, and model training and optimization remain significant challenges in this domain. The paper proposes strategies for improving data augmentation, optimizing model architectures, and employing automated model optimization tools to address these challenges, while also emphasizing the importance of considering ethical issues in technological advancements. As technology continues to evolve, the application of deep learning in image recognition will further demonstrate its potent capability to solve complex problems, driving society towards more inclusive and diverse development.
Fine-grained recognition of ships based on remote sensing images is crucial to safeguarding maritime rights and interests and maintaining national security. Currently, with the emergence of massive high-resolution multi-modality images, the use of multi-modality images for fine-grained recognition has become a promising technology. Fine-grained recognition of multi-modality images imposes higher requirements on the dataset samples. The key to the problem is how to extract and fuse the complementary features of multi-modality images to obtain more discriminative fusion features. The attention mechanism helps the model to pinpoint the key information in the image, resulting in a significant improvement in the model’s performance. In this paper, a dataset for fine-grained recognition of ships based on visible and near-infrared multi-modality remote sensing images is first proposed, named Dataset for Multimodal Fine-grained Recognition of Ships (DMFGRS). It includes 1,635 pairs of visible and near-infrared remote sensing images divided into 20 categories, collated from digital orthophoto models provided by commercial remote sensing satellites. DMFGRS provides two types of annotation format files, as well as segmentation mask images corresponding to the ship targets. Then, a Multimodal Information Cross-Enhancement Network (MICE-Net), which fuses features of visible and near-infrared remote sensing images, is proposed. In the network, a dual-branch feature extraction and fusion module is designed to obtain more expressive features. The Feature Cross Enhancement Module (FCEM) achieves the fusion enhancement of the two modal features by making the channel attention and spatial attention work cross-functionally on the feature map. A benchmark is established by evaluating state-of-the-art object recognition algorithms on DMFGRS. In experiments on DMFGRS, MICE-Net reached a precision, recall, mAP0.5 and mAP0.5:0.95 of 87%, 77.1%, 83.8% and 63.9%, respectively. Extensive experiments demonstrate that the proposed MICE-Net performs better on DMFGRS. Built on the lightweight YOLO network, the model has excellent generalizability and thus good potential for application in real-life scenarios.
This paper introduces a deep-learning-based intelligent image recognition system integrated into a wheelchair for use in cold environments, aiming to improve the convenience and safety of disabled individuals. The system adopts advanced image recognition technology to monitor road conditions in real time through the camera and to detect and measure the distance to foreign objects on the road. The system visualizes the detection results on the wheelchair screen to help the user avoid obstacles and to improve the safety of daily travel. In addition, the wheelchair includes crawler tracks, seat heating, snow and rain protection, and other functions. The wheelchair has a wide range of application prospects and development potential. It is expected to be widely used in the future, providing a strong guarantee for the safe travel of disabled individuals in China.
The fraudulent website image is a vital information carrier for telecom fraud. The efficient and precise recognition of fraudulent website images is critical to combating and dealing with fraudulent websites. Current research on image recognition of fraudulent websites is mainly carried out at the level of image feature extraction and similarity study, which has such disadvantages as difficulty in obtaining image data, insufficient image analysis, and single identification types. This study develops a model based on the entropy method for image leader decision and Inception-v3 transfer learning to address these disadvantages. The data processing part of the model uses a breadth search crawler to capture the image data. Then, the information in the images is evaluated with the entropy method, image weights are assigned, and the image leader is selected. In model training and prediction, the transfer learning of the Inception-v3 model is introduced into image recognition of fraudulent websites. Using selected image leaders to train the model, multiple types of fraudulent websites are identified with high accuracy. The experiment shows that this model has superior accuracy in recognizing images on fraudulent websites compared to other current models.
The rail surface status image is affected by the noise in the shooting environment and contains a large amount of interference information, which increases the difficulty of rail surface status identification. In order to solve this problem, a preprocessing method for the rail surface state image is proposed. The preprocessing process mainly includes image graying, image denoising, image geometric correction, image extraction, data amplification, and finally building the rail surface image database. The experimental results show that this method can efficiently complete image processing, facilitate feature extraction of rail surface status images, and improve rail surface status recognition accuracy.
Geological discontinuity (GD) plays a pivotal role in determining the catastrophic mechanical failure of jointed rock masses. Accurate and efficient acquisition of GD networks is essential for characterizing and understanding the progressive damage mechanisms of slopes based on monitoring image data. Inspired by recent advances in computer vision, deep learning (DL) models have been widely utilized for image-based fracture identification. The multi-scale characteristics, image resolution and annotation quality of images will cause a scale-space effect (SSE) that makes features indistinguishable from noise, directly affecting the accuracy. However, this effect has not received adequate attention. Herein, we try to address this gap by collecting slope images at various proportional scales and constructing multi-scale datasets using image processing techniques. Next, we quantify the intensity of feature signals using metrics such as peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). Combining these metrics with the scale-space theory, we investigate the influence of the SSE on the differentiation of multi-scale features and the accuracy of recognition. It is found that augmenting the image's detail capacity does not always yield benefits for vision-based recognition models. In light of these observations, we propose a scale hybridization approach based on the diffusion mechanism of scale-space representation. The results show that scale hybridization strengthens the tolerance of multi-scale feature recognition under complex environmental noise interference and significantly enhances the recognition accuracy of GD. It also facilitates the objective understanding, description and analysis of the rock behavior and stability of slopes from the perspective of image data.
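One of the metrics named in the abstract above, the peak signal-to-noise ratio, has a standard definition: PSNR = 10 · log10(MAX² / MSE), where MSE is the mean squared error between two images and MAX is the largest possible pixel value. The sketch below applies that definition to small grayscale images stored as nested lists. The function name `psnr` and the 8-bit default of 255 are assumptions for illustration; nothing here reflects how the authors computed their values.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized
    grayscale images given as lists of rows of pixel values."""
    if len(reference) != len(test) or any(
            len(r) != len(t) for r, t in zip(reference, test)):
        raise ValueError("images must have the same shape")
    n_pixels = sum(len(row) for row in reference)
    if n_pixels == 0:
        raise ValueError("images must not be empty")
    squared_error = sum((a - b) ** 2
                        for r, t in zip(reference, test)
                        for a, b in zip(r, t))
    mse = squared_error / n_pixels
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 images: one pixel differs by 10 gray levels.
clean = [[10, 10], [10, 10]]
noisy = [[10, 20], [10, 10]]
value_db = psnr(clean, noisy)
```

Higher values mean the test image is closer to the reference; identical images give an infinite PSNR because the mean squared error is zero.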
This document presents a framework for recognizing people by palm vein distribution analysis using cross-correlation-based signatures to obtain descriptors. Haar wavelets are useful in reducing the number of features while maintaining high recognition rates. In this experiment, 97.5% of individuals were classified correctly with two levels of Haar wavelets. This study used twelve versions of RGB and NIR (near-infrared) wavelength images per individual. One hundred people were studied; therefore, 4,800 instances compose the complete database. A Multilayer Perceptron (MLP) was trained to improve the recognition rate in a k-fold cross-validation test with k = 10. Classification results using the MLP neural network were obtained using Weka (open-source machine learning software).
The accumulation of snow and ice on PV modules can have a detrimental impact on power generation, leading to reduced efficiency for prolonged periods. Thus, it becomes imperative to develop an intelligent system capable of accurately assessing the extent of snow and ice coverage on PV modules. To address this issue, the article proposes an ice and snow recognition algorithm that effectively segments the ice and snow areas within the collected images. Furthermore, the algorithm incorporates an analysis of the morphological characteristics of ice and snow coverage on PV modules, allowing for the establishment of a residual ice and snow recognition process. This process utilizes both the external ellipse method and the pixel statistical method to refine the identification process. The effectiveness of the proposed algorithm is validated through extensive testing with isolated and continuous snow area pictures. The results demonstrate the algorithm’s accuracy and reliability in identifying and quantifying residual snow and ice on PV modules. In conclusion, this research presents a valuable method for accurately detecting and quantifying snow and ice coverage on PV modules. This is of great significance for PV power plants, as it enables predictions of power generation efficiency and facilitates efficient PV maintenance during the challenging winter conditions characterized by snow and ice. By proactively managing snow and ice coverage, PV power plants can optimize energy production and minimize downtime, ensuring a sustainable and reliable renewable energy supply.
With the arrival of new data acquisition platforms derived from the Internet of Things (IoT), this paper goes beyond the understanding of traditional remote sensing technologies. Deep fusion of remote sensing and computer vision has hit the industrial world and makes it possible to apply artificial intelligence to solve problems such as automatic extraction of information and image interpretation. However, due to the complex architecture of IoT and the lack of a unified security protection mechanism, devices in remote sensing are vulnerable to privacy leaks when sharing data. It is necessary to design a security scheme suitable for computation-limited devices in IoT, since traditional encryption methods are based on computational complexity. Visual Cryptography (VC) is a threshold scheme for images that can be decoded directly by the human visual system when superimposing encrypted images. The stacking-to-see feature and simple Boolean decryption operation make VC an ideal solution for privacy-preserving recognition for large-scale remote sensing images in IoT. In this study, the secure and efficient transmission of high-resolution remote sensing images by meaningful VC is achieved. By diffusing the error between the encryption block and the original block to adjacent blocks, the degradation of quality in recovery images is mitigated. By fine-tuning the pre-trained model from large-scale datasets, we improve the recognition performance of small encryption datasets for remote sensing images. The experimental results show that the proposed lightweight privacy-preserving recognition framework maintains high recognition performance while enhancing security.
Congenital heart defects, which account for about 30% of congenital defects, are the most common type. Data show that congenital heart defects have seriously affected the birth rate of healthy newborns. In fetal and neonatal cardiology, medical imaging technology (2D ultrasound, MRI) has been proved helpful in detecting congenital defects of the fetal heart and assists sonographers in prenatal diagnosis. Recognizing the 2D fetal heart ultrasonic standard plane (FHUSP) manually is a highly complex task. Compared with manual identification, automatic identification through artificial intelligence can save a lot of time, ensure the efficiency of diagnosis, and improve the accuracy of diagnosis. In this study, a feature extraction method based on texture features (Local Binary Pattern, LBP, and Histogram of Oriented Gradient, HOG) combined with the Bag of Words (BOW) model is carried out, and then feature fusion is performed. Finally, a Support Vector Machine (SVM) is adopted to realize automatic recognition and classification of FHUSP. The data include 788 standard plane data sets and 448 normal and abnormal plane data sets. Compared with some other methods and the single-method model, the classification accuracy of our model is clearly improved, with the highest accuracy reaching 87.35%. Similarly, we also verify the performance of the model on normal and abnormal planes, and the average accuracy in classifying abnormal and normal planes is 84.92%. The experimental results show that this method can effectively classify and predict different FHUSP and can provide certain assistance for sonographers in diagnosing fetal congenital heart disease.
As the COVID-19 epidemic spread across the globe, people around the world were advised or mandated to wear masks in public places to prevent it from spreading further. In some cases, not wearing a mask could result in a fine. To monitor mask wearing, and to prevent the spread of future epidemics, this study proposes an image recognition system consisting of a camera, an infrared thermal array sensor, and a convolutional neural network trained in mask recognition. The infrared sensor monitors body temperature and displays the results in real time on a liquid crystal display screen. The proposed system reduces the inefficiency of traditional object detection by providing training data according to the specific needs of the user and by applying You Only Look Once Version 4 (YOLOv4) object detection technology, which experiments show has more efficient training parameters and a higher level of accuracy in object recognition. All datasets are uploaded to the cloud for storage using Google Colaboratory, saving human resources and achieving a high level of efficiency at a low cost.
BACKGROUND Small intestinal vascular malformations (angiodysplasias) are common causes of small intestinal bleeding. While capsule endoscopy has become the primary diagnostic method for angiodysplasia, manual reading of the entire gastrointestinal tract is time-consuming and requires a heavy workload, which affects the accuracy of diagnosis. AIM To evaluate whether artificial intelligence can assist the diagnosis and increase the detection rate of angiodysplasias in the small intestine, achieve automatic disease detection, and shorten the capsule endoscopy (CE) reading time. METHODS A convolutional neural network semantic segmentation model with a feature fusion method was proposed; it automatically recognizes the category of vascular dysplasia under CE and draws the lesion contour, thus improving the efficiency and accuracy of identifying small intestinal vascular malformation lesions. ResNet-50 was used as the backbone network to design the fusion mechanism, fuse the shallow and deep features, and classify the images at the pixel level to achieve the segmentation and recognition of vascular dysplasia. The training set and test set were constructed, and the model was compared with PSPNet, DeepLabv3+, and UperNet. RESULTS The model achieved satisfactory results on the test set constructed in the study: pixel accuracy was 99%, mean intersection over union was 0.69, negative predictive value was 98.74%, and positive predictive value was 94.27%. The model had 46.38 M parameters and required 467.2 G floating-point operations, and the time needed to segment and recognize a picture was 0.6 s. CONCLUSION Constructing a segmentation network based on deep learning to segment and recognize angiodysplasia lesions is an effective and feasible method for diagnosing these lesions.
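Two of the figures reported in the abstract above, pixel accuracy and mean intersection over union (mIoU), follow standard definitions for semantic segmentation: pixel accuracy is the share of pixels whose predicted class equals the ground truth, and mIoU averages, over classes, the overlap of predicted and true regions divided by their union. The sketch below computes both for label maps stored as nested lists of integer class IDs. The function name `segmentation_metrics` and the choice to skip classes that appear in neither map are illustrative assumptions, not details taken from the study.

```python
def segmentation_metrics(pred, truth, num_classes):
    """Return (pixel_accuracy, mean_iou) for two label maps of equal shape.

    Classes absent from both maps are left out of the mIoU average
    (an assumption; other conventions exist).
    """
    correct = 0
    total = 0
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total += 1
            if p == t:
                correct += 1
                intersection[p] += 1
                union[p] += 1
            else:
                # A mismatched pixel enlarges the union of both classes.
                union[p] += 1
                union[t] += 1
    if total == 0:
        raise ValueError("label maps must not be empty")
    ious = [intersection[c] / union[c]
            for c in range(num_classes) if union[c] > 0]
    return correct / total, sum(ious) / len(ious)


# Hypothetical 2x2 maps: class 0 = background, class 1 = lesion.
predicted = [[0, 0], [1, 1]]
ground_truth = [[0, 1], [1, 1]]
pixel_acc, mean_iou = segmentation_metrics(predicted, ground_truth, 2)
```

The toy example also shows why the two numbers can diverge, as they do in the abstract (99% versus 0.69): when one class dominates the image, pixel accuracy stays high even if the small lesion class is segmented imperfectly.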
Seabed sediment recognition is vital for the exploitation of marine resources. Side-scan sonar (SSS) is an excellent tool for acquiring imagery of seafloor topography. Combined with ocean surface sampling, it provides detailed and accurate images of marine substrate features. Most of the processing of SSS imagery works around limited sampling stations and requires manual interpretation to complete the classification of seabed sediment imagery. In complex sea areas, with manual interpretation, small targets are often lost due to the large amount of information. To date, studies related to the automatic recognition of seabed sediments are still few. This paper proposes a seabed sediment recognition method based on You Only Look Once version 5 and SSS imagery to perform real-time sediment classification and localization with higher accuracy, particularly on small targets, and at faster speeds. We used methods such as changing the dataset size, epoch, and optimizer and adding multiscale training to overcome the challenges of having a small sample and a low accuracy. With these methods, we improved the mean average precision by 8.98% and the F1 score by 11.12% compared with the original method. In addition, the detection speed was approximately 100 frames per second, which is faster than that of previous methods. This speed enabled us to achieve real-time seabed sediment recognition from SSS imagery.
Introduction: The latest ultrafast developments in artificial intelligence (AI) have recently multiplied concerns regarding the future of robotic autonomy in surgery. However, the literature on the topic is still scarce. Aim: To test a novel commercially available AI tool for image analysis on a series of laparoscopic scenes. Methods: The research tools included OpenAI ChatGPT 4.0 with its corresponding image recognition plugin, which was fed a list of 100 selected laparoscopic snapshots from common surgical procedures. To score the reliability of the responses received from the image-recognition bot, two corresponding scales ranging from 0 to 5 were developed. The set of images was divided into two groups, unlabeled (Group A) and labeled (Group B), and according to the type of surgical procedure or image resolution. Results: The AI correctly recognized the context of surgery-related images in 97% of its reports. For the labeled surgical pictures, the image-processing bot scored 3.95/5 (79%), whilst for the unlabeled ones it scored 2.905/5 (58.1%). Phases of the procedure were commented on in detail after all successful interpretations. With ratings of 4-5/5, the chatbot was able to talk in detail about the indications, contraindications, stages, instrumentation, complications and outcome rates of the operation discussed. Conclusion: Interaction between surgeon and chatbot appears to be an interesting front end for further research by clinicians, in parallel with the evolution of its complex underlying infrastructure. In this early phase of using artificial intelligence for image recognition in surgery, no safe conclusions can be drawn from small cohorts with commercially available software. Further development of medically oriented AI software and awareness in the clinical world are expected to bring fruitful information on the topic in the years to come.
A challenge that visually impaired persons face in their day-to-day lives is interpreting text from documents. To help these people, the objective of this work is to develop an efficient text recognition system that isolates, extracts, and recognizes text in documents that have a textured background, degraded colors, or poor quality, and synthesizes it into speech. This system basically consists of three algorithms: a text localization and detection algorithm based on the mathematical morphology method (MMM); a text extraction algorithm based on the gamma correction method (GCM); and an optical character recognition (OCR) algorithm for text recognition. A detailed complexity study of the different blocks of this text recognition system has been carried out. Following this study, an acceleration of the GCM algorithm (AGCM) is proposed. The AGCM algorithm reduced the complexity of the text recognition system by 70% and kept the same quality of text recognition as the original method. To assist visually impaired persons, a graphical interface for the entire text recognition chain has been developed, allowing the capture of images from a camera, rapid and intuitive visualization of the recognized text from this image, and text-to-speech synthesis. Our text recognition system provides an improvement of 6.8% in the recognition rate and 7.6% in the F-measure relative to the GCM and AGCM algorithms.
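The abstract above relies on gamma correction for text extraction. As background only, plain gamma correction of an 8-bit image maps each pixel value v to 255 · (v / 255)^γ. The sketch below implements that mapping with a 256-entry lookup table so the power function is evaluated once per gray level rather than once per pixel. It is not the GCM or AGCM algorithm of the paper, whose details the abstract does not give; the function names `gamma_lut` and `apply_gamma` are hypothetical.

```python
def gamma_lut(gamma):
    """Build a 256-entry lookup table for 8-bit gamma correction."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [round(255 * (v / 255) ** gamma) for v in range(256)]


def apply_gamma(image, gamma):
    """Apply gamma correction to a grayscale image (list of rows, 0..255)."""
    lut = gamma_lut(gamma)
    return [[lut[pixel] for pixel in row] for row in image]


# Hypothetical 1x3 image; gamma > 1 darkens mid-tones, gamma < 1 brightens them.
sample = [[0, 128, 255]]
darker = apply_gamma(sample, 2.0)
brighter = apply_gamma(sample, 0.5)
```

Black and white are fixed points of the mapping for any positive γ, so only the intermediate gray levels move, which is what makes the operation useful for separating text from a textured or faded background.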
Funding: Partially supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI (JP22H03643), the Japan Science and Technology Agency (JST) Support for Pioneering Research Initiated by the Next Generation (SPRING) (JPMJSP2145), and JST through the Establishment of University Fellowships towards the Creation of Science Technology Innovation (JPMJFS2115).
Abstract: Dear Editor, This letter proposes to integrate a dendritic learnable network architecture with the Vision Transformer to improve the accuracy of image recognition. In this study, based on the theory of dendritic neurons in neuroscience, we design a network that is more practical for engineering to classify visual features. Based on this, we propose a dendritic learning-incorporated vision Transformer (DVT), which outperforms other state-of-the-art methods on three image recognition benchmarks.
Funding: Supported by the State Grid Science & Technology Project of China (5400-202224153A-1-1-ZN).
Abstract: Expanding photovoltaic (PV) resources in rural-grid areas is an essential means to augment the share of solar energy in the energy landscape, aligning with the "carbon peaking and carbon neutrality" objectives. However, rural power grids often lack digitalization; thus, the load distribution within these areas is not fully known. This hinders the calculation of the available PV capacity and the deduction of node voltages. This study proposes a load-distribution modeling approach based on remote-sensing image recognition in pursuit of a scientific framework for developing distributed PV resources in rural grid areas. First, houses in remote-sensing images are accurately recognized using deep-learning techniques based on the YOLOv5 model. The distribution of the houses is then used to estimate the load distribution in the grid area. Next, equally spaced and clustered distribution models are used to adaptively determine the location of the nodes and the load power in the distribution lines. Finally, by calculating the connectivity matrix of the nodes, a minimum spanning tree is extracted, the topology of the network is constructed, and the node parameters of the load-distribution model are calculated. The proposed scheme is implemented in a software package, and its efficacy is demonstrated by analyzing typical remote-sensing images of rural grid areas. The results underscore the ability of the proposed approach to effectively discern the distribution-line structure and compute the node parameters, thereby offering vital support for determining PV access capability.
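The minimum-spanning-tree step in the abstract above can be illustrated with a short sketch. This is not the authors' software: it assumes that nodes are given as hypothetical (x, y) coordinates and that the edge weight between two nodes is their Euclidean distance, and it uses Prim's algorithm on the resulting complete graph.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm over the complete graph of Euclidean distances.

    points: list of (x, y) node coordinates (hypothetical inputs).
    Returns a list of (parent_index, child_index, distance) edges.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known link from the tree to each node
    best_from = [-1] * n         # tree node that provides that cheapest link
    best_dist[0] = 0.0           # start growing the tree from node 0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))
        # Update the cheapest links of the remaining nodes.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


if __name__ == "__main__":
    houses = [(0, 0), (1, 0), (1, 1), (5, 5)]
    for parent, child, length in minimum_spanning_tree(houses):
        print(parent, child, round(length, 3))
```

A tree over n nodes always has n - 1 edges, which is why a spanning tree is a natural candidate for a radial distribution-line topology.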
Funding: Supported by the Feicheng Artificial Intelligence Robot and Smart Agriculture Service Platform (381387).
Abstract: Asparagus stem blight, also known as "asparagus cancer", is a serious plant disease with a regional distribution. The widespread occurrence of the disease has had a negative impact on the yield and quality of asparagus and has become one of the main problems threatening asparagus production. To improve the ability to accurately identify and localize phenotypic lesions of stem blight in asparagus and to enhance the accuracy of the test, a YOLOv8-CBAM detection algorithm for asparagus stem blight based on YOLOv8 was proposed. The algorithm aims to achieve rapid detection of phenotypic images of asparagus stem blight and to provide effective assistance in the control of asparagus stem blight. To enhance the model's capacity to capture subtle lesion features, the Convolutional Block Attention Module (CBAM) is added after C2f in the head. Simultaneously, the original CIoU loss function in YOLOv8 was replaced with the Focal-EIoU loss function, ensuring that the updated loss function emphasizes higher-quality bounding boxes. The YOLOv8-CBAM algorithm can effectively detect asparagus stem blight phenotypic images with a mean average precision (mAP) of 95.51%, which is 0.22%, 14.99%, 1.77%, and 5.71% higher than the YOLOv5, YOLOv7, YOLOv8, and Mask R-CNN models, respectively. This greatly enhances the efficiency of asparagus growers in identifying asparagus stem blight, aids in improving the prevention and control of asparagus stem blight, and is crucial for the application of computer vision in agriculture.
Funding: This study was supported by a grant from the National Natural Science Foundation of China (No. 12075315).
Abstract: Complex plasma widely exists in thin film deposition, material surface modification, and waste gas treatment in industrial plasma processes. During complex plasma discharge, the configuration, distribution, and size of particles, as well as the discharge glow, strongly depend on discharge parameters. However, traditional manual diagnosis methods for recognizing discharge parameters from discharge images are complicated to operate, inaccurate, time-consuming, and demanding of instrumentation. To solve these problems, a network of squeeze and excitation convolution with shortcut (SECS) for complex plasma image recognition is proposed. It combines two mechanisms: an attention mechanism, which strengthens the extraction of channel features, and a shortcut connection, which enables the input information to be transmitted directly to deep layers and avoids vanishing or exploding gradients. The results show that the accuracy, precision, recall, and F1-score of the model are superior to those of other models in complex plasma image recognition, and the recognition accuracy reaches 97.38%. Moreover, the recognition accuracy for the publicly available Flowers and Chest X-ray data sets reaches 97.85% and 98.65%, respectively, indicating that the model is robust. This study shows that the proposed model provides a new method for the diagnosis of complex plasma images and also provides technical support for the application of plasma in industrial production.
Funding: National Natural Science Foundation of China (82274411), Science and Technology Innovation Program of Hunan Province (2022RC1021), and Leading Research Project of Hunan University of Chinese Medicine (2022XJJB002).
Abstract: Objective: To build a dataset encompassing a large number of stained tongue coating images and process it using deep learning to automatically recognize stained tongue coating images. Methods: A total of 1001 images of stained tongue coating from healthy students at Hunan University of Chinese Medicine and 1007 images of pathological (non-stained) tongue coating from hospitalized patients at The First Hospital of Hunan University of Chinese Medicine with lung cancer, diabetes, and hypertension were collected. The tongue images were randomized into the training, validation, and testing datasets in a 7:2:1 ratio. A deep learning model was constructed using ResNet50 for recognizing stained tongue coating in the training and validation datasets. The training period was 90 epochs. The model's performance was evaluated by its accuracy, loss curve, recall, F1 score, confusion matrix, receiver operating characteristic (ROC) curve, and precision-recall (PR) curve in the task of predicting stained tongue coating images in the testing dataset. The accuracy of the deep learning model was compared with that of attending physicians of traditional Chinese medicine (TCM). Results: The training results showed that after 90 epochs, the model presented an excellent classification performance. The loss curve and accuracy were stable, showing no signs of overfitting. The model achieved an accuracy, recall, and F1 score of 92%, 91%, and 92%, respectively. The confusion matrix revealed an accuracy of 92% for the model and 69% for TCM practitioners. The areas under the ROC and PR curves were 0.97 and 0.95, respectively. Conclusion: The deep learning model constructed using ResNet50 can effectively recognize stained coating images with greater accuracy than visual inspection by TCM practitioners. This model has the potential to assist doctors in identifying false tongue coating and preventing misdiagnosis.
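The accuracy, recall, and F1 score reported above all follow from the counts in a binary confusion matrix. The sketch below shows the standard definitions; the function name and the label convention (1 = stained coating, 0 = non-stained) are assumptions made for illustration, and the toy labels are not data from the study.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for illustration only.
    print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```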
Abstract: This study delves into the applications, challenges, and future directions of deep learning techniques in the field of image recognition. Deep learning, particularly Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Generative Adversarial Networks (GANs), has become key to enhancing the precision and efficiency of image recognition. These models are capable of processing complex visual data, facilitating efficient feature extraction and image classification. However, acquiring and annotating high-quality, diverse datasets, addressing imbalances in datasets, and model training and optimization remain significant challenges in this domain. The paper proposes strategies for improving data augmentation, optimizing model architectures, and employing automated model optimization tools to address these challenges, while also emphasizing the importance of considering ethical issues in technological advancements. As technology continues to evolve, the application of deep learning in image recognition will further demonstrate its potent capability to solve complex problems, driving society towards more inclusive and diverse development.
Abstract: Fine-grained recognition of ships based on remote sensing images is crucial to safeguarding maritime rights and interests and maintaining national security. Currently, with the emergence of massive high-resolution multi-modality images, the use of multi-modality images for fine-grained recognition has become a promising technology. Fine-grained recognition of multi-modality images imposes higher requirements on the dataset samples. The key to the problem is how to extract and fuse the complementary features of multi-modality images to obtain more discriminative fusion features. The attention mechanism helps the model to pinpoint the key information in the image, resulting in a significant improvement in the model's performance. In this paper, a dataset for fine-grained recognition of ships based on visible and near-infrared multi-modality remote sensing images is first proposed, named Dataset for Multimodal Fine-grained Recognition of Ships (DMFGRS). It includes 1,635 pairs of visible and near-infrared remote sensing images divided into 20 categories, collated from digital orthophoto models provided by commercial remote sensing satellites. DMFGRS provides two types of annotation format files, as well as segmentation mask images corresponding to the ship targets. Then, a Multimodal Information Cross-Enhancement Network (MICE-Net), which fuses features of visible and near-infrared remote sensing images, is proposed. In the network, a dual-branch feature extraction and fusion module is designed to obtain more expressive features. The Feature Cross Enhancement Module (FCEM) achieves the fusion enhancement of the two modal features by making the channel attention and spatial attention work cross-functionally on the feature map. A benchmark is established by evaluating state-of-the-art object recognition algorithms on DMFGRS. In experiments on DMFGRS, MICE-Net reached a precision, recall, mAP0.5, and mAP0.5:0.95 of 87%, 77.1%, 83.8%, and 63.9%, respectively. Extensive experiments demonstrate that the proposed MICE-Net achieves superior performance on DMFGRS. Built on the lightweight YOLO network, the model has excellent generalizability and thus has good potential for application in real-life scenarios.
Abstract: This paper introduces a deep-learning-based intelligent image recognition system integrated into a wheelchair for use in cold environments, aiming to improve the convenience and safety of disabled individuals. The system adopts advanced image recognition technology to monitor road conditions in real time through the camera and to detect and measure the distance to foreign objects on the road. The system visualizes the detection results on the wheelchair screen to help the user avoid obstacles and to improve the safety of daily travel. In addition, the system includes crawler tracks, seat heating, snow and rain protection, and other functions. The wheelchair has a wide range of application prospects and development potential. It is expected to be widely used in the future, providing a strong guarantee for the safe travel of disabled individuals in China.
Funding: Supported by the National Social Science Fund of China (23BGL272).
Abstract: The fraudulent website image is a vital information carrier for telecom fraud. The efficient and precise recognition of fraudulent website images is critical to combating and dealing with fraudulent websites. Current research on image recognition of fraudulent websites is mainly carried out at the level of image feature extraction and similarity study, which has such disadvantages as difficulty in obtaining image data, insufficient image analysis, and a single identification type. This study develops a model based on the entropy method for image leader decision and Inception-v3 transfer learning to address these disadvantages. The data processing part of the model uses a breadth-first search crawler to capture the image data. Then, the information in the images is evaluated with the entropy method, image weights are assigned, and the image leader is selected. In model training and prediction, the transfer learning of the Inception-v3 model is introduced into image recognition of fraudulent websites. Using the selected image leaders to train the model, multiple types of fraudulent websites are identified with high accuracy. The experiment shows that this model recognizes images on fraudulent websites more accurately than other current models.
Abstract: The rail surface status image is affected by noise in the shooting environment and contains a large amount of interference information, which increases the difficulty of rail surface status identification. To solve this problem, a preprocessing method for the rail surface status image is proposed. The preprocessing mainly includes image graying, image denoising, image geometric correction, image extraction, and data amplification, and ends with building the rail surface image database. The experimental results show that this method can efficiently complete image processing, facilitate feature extraction of rail surface status images, and improve rail surface status recognition accuracy.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52090081) and the State Key Laboratory of Hydro-science and Hydraulic Engineering (Grant No. 2021-KY-04).
Abstract: Geological discontinuity (GD) plays a pivotal role in determining the catastrophic mechanical failure of jointed rock masses. Accurate and efficient acquisition of GD networks is essential for characterizing and understanding the progressive damage mechanisms of slopes based on monitoring image data. Inspired by recent advances in computer vision, deep learning (DL) models have been widely utilized for image-based fracture identification. The multi-scale characteristics, image resolution, and annotation quality of images cause a scale-space effect (SSE) that makes features indistinguishable from noise, directly affecting the accuracy. However, this effect has not received adequate attention. Herein, we try to address this gap by collecting slope images at various proportional scales and constructing multi-scale datasets using image processing techniques. Next, we quantify the intensity of feature signals using metrics such as peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). Combining these metrics with the scale-space theory, we investigate the influence of the SSE on the differentiation of multi-scale features and the accuracy of recognition. It is found that augmenting the image's detail capacity does not always yield benefits for vision-based recognition models. In light of these observations, we propose a scale hybridization approach based on the diffusion mechanism of scale-space representation. The results show that scale hybridization strengthens the tolerance of multi-scale feature recognition under complex environmental noise interference and significantly enhances the recognition accuracy of GD. It also facilitates the objective understanding, description, and analysis of the rock behavior and stability of slopes from the perspective of image data.
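Of the two metrics named above, PSNR is simple enough to sketch in a few lines. The example uses the standard definition, PSNR = 10 log10(MAX^2 / MSE), on grayscale images stored as nested lists; the function name and the 8-bit default for `max_value` are assumptions for illustration and are not taken from the paper.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio (in dB) between two equally sized images.

    Images are lists of rows of pixel intensities; max_value is the largest
    possible intensity (255 for 8-bit images, assumed here).
    """
    ref = [p for row in reference for p in row]
    dis = [p for row in distorted for p in row]
    if len(ref) != len(dis) or not ref:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, dis)) / len(ref)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    clean = [[10, 20], [30, 40]]
    noisy = [[11, 21], [31, 41]]  # every pixel is off by one
    print(round(psnr(clean, noisy), 2))
```

A higher PSNR means the distorted image is closer to the reference; SSIM, the other metric mentioned, compares local structure rather than per-pixel error and is not shown here.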
Abstract: This document presents a framework for recognizing people by palm vein distribution analysis, using cross-correlation-based signatures to obtain descriptors. Haar wavelets are useful in reducing the number of features while maintaining high recognition rates. In this experiment, 97.5% of individuals were classified correctly with two levels of Haar wavelets. This study used twelve versions of RGB and NIR (near-infrared) wavelength images per individual. One hundred people were studied; therefore, 4,800 instances compose the complete database. A Multilayer Perceptron (MLP) was trained to improve the recognition rate in a k-fold cross-validation test with k = 10. Classification results using the MLP neural network were obtained using Weka (open-source machine learning software).
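To show how Haar wavelets reduce the number of features, here is a minimal one-dimensional sketch. It uses the unnormalized averaging form of the Haar transform (pairwise means and half-differences) and keeps only the approximation coefficients; whether the study used this form, or a 2-D transform on the images, is not stated in the abstract, so treat the details as assumptions.

```python
def haar_step(signal):
    """One Haar level: pairwise averages (approximation) and half-differences (detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_approximation(signal, levels=2):
    """Apply `levels` Haar steps and keep only the approximation coefficients."""
    approx = list(signal)
    for _level in range(levels):
        approx, _detail = haar_step(approx)
    return approx


if __name__ == "__main__":
    descriptor = [4, 6, 10, 12, 8, 6, 5, 5]   # hypothetical signature values
    print(haar_approximation(descriptor, levels=2))
```

Each level halves the length of the descriptor, so two levels leave a quarter of the original features.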
Funding: Supported by the Key Research and Development Projects in Shaanxi Province (Program No. 2021GY-306), the Innovation Capability Support Program of Shaanxi (Program No. 2022KJXX-41), and the Key Scientific and Technological Projects of Xi'an (Program No. 2022JH-RGZN-0005).
Abstract: The accumulation of snow and ice on PV modules can have a detrimental impact on power generation, leading to reduced efficiency for prolonged periods. Thus, it becomes imperative to develop an intelligent system capable of accurately assessing the extent of snow and ice coverage on PV modules. To address this issue, the article proposes an ice and snow recognition algorithm that effectively segments the ice and snow areas within the collected images. Furthermore, the algorithm incorporates an analysis of the morphological characteristics of ice and snow coverage on PV modules, allowing for the establishment of a residual ice and snow recognition process. This process utilizes both the external ellipse method and the pixel statistical method to refine the identification. The effectiveness of the proposed algorithm is validated through extensive testing with pictures of isolated and continuous snow areas. The results demonstrate the algorithm's accuracy and reliability in identifying and quantifying residual snow and ice on PV modules. In conclusion, this research presents a valuable method for accurately detecting and quantifying snow and ice coverage on PV modules. This is significant for PV power plants, as it enables predictions of power generation efficiency and facilitates efficient PV maintenance during winter conditions characterized by snow and ice. By proactively managing snow and ice coverage, PV power plants can optimize energy production and minimize downtime, ensuring a sustainable and reliable renewable energy supply.
Funding: Supported in part by the National Natural Science Foundation of China under Grants (62250410365, 62071084), the Guangdong Basic and Applied Basic Research Foundation of China (2022A1515011542), and the Guangzhou Science and Technology Program of China (202201010606).
Abstract: With the arrival of new data acquisition platforms derived from the Internet of Things (IoT), this paper goes beyond the understanding of traditional remote sensing technologies. Deep fusion of remote sensing and computer vision has reached the industrial world and makes it possible to apply artificial intelligence to problems such as automatic extraction of information and image interpretation. However, due to the complex architecture of IoT and the lack of a unified security protection mechanism, devices in remote sensing are vulnerable to privacy leaks when sharing data. It is necessary to design a security scheme suitable for computation-limited devices in IoT, since traditional encryption methods are based on computational complexity. Visual Cryptography (VC) is a threshold scheme for images that can be decoded directly by the human visual system when superimposing encrypted images. The stacking-to-see feature and simple Boolean decryption operation make VC an ideal solution for privacy-preserving recognition for large-scale remote sensing images in IoT. In this study, the secure and efficient transmission of high-resolution remote sensing images by meaningful VC is achieved. By diffusing the error between the encryption block and the original block to adjacent blocks, the degradation of quality in recovered images is mitigated. By fine-tuning the pre-trained model from large-scale datasets, we improve the recognition performance on small encryption datasets for remote sensing images. The experimental results show that the proposed lightweight privacy-preserving recognition framework maintains high recognition performance while enhancing security.
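The stacking-to-see idea can be illustrated with the classic two-share visual cryptography scheme for binary images. This sketch is deliberately simpler than the scheme in the abstract: it produces random-looking (not meaningful) shares, uses no error diffusion, and expands each secret pixel into two subpixels. Function names and the bit convention (1 = black, 0 = white) are assumptions for illustration.

```python
import random


def make_shares(secret_row, rng):
    """Split one row of a binary image (1 = black, 0 = white) into two shares.

    Each secret pixel becomes two subpixels per share. White pixels get the
    same random pattern in both shares; black pixels get complementary ones.
    """
    share1, share2 = [], []
    for pixel in secret_row:
        pattern = rng.choice([(0, 1), (1, 0)])
        complement = (1 - pattern[0], 1 - pattern[1])
        share1.extend(pattern)
        share2.extend(complement if pixel == 1 else pattern)
    return share1, share2


def stack(share1, share2):
    """Simulate physically overlaying two transparencies: black wins (Boolean OR)."""
    return [a | b for a, b in zip(share1, share2)]


if __name__ == "__main__":
    rng = random.Random(42)
    secret = [1, 0, 1, 1, 0]
    s1, s2 = make_shares(secret, rng)
    print("share 1:", s1)
    print("share 2:", s2)
    print("stacked:", stack(s1, s2))
```

After stacking, a black secret pixel shows two black subpixels and a white one shows exactly one, so the secret appears as a contrast difference without any computation beyond the OR. Each share alone is a uniformly random pattern and reveals nothing about the secret row.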
Funding: Supported by the Fujian Provincial Science and Technology Major Project (No. 2020HZ02014), by grants from the National Natural Science Foundation of Fujian (2021J01133, 2021J011404), and by the Quanzhou Scientific and Technological Planning Projects (Nos. 2018C113R, 2019C028R, 2019C029R, 2019C076R and 2019C099R).
Abstract: Congenital heart defect, accounting for about 30% of congenital defects, is the most common one. Data show that congenital heart defects have seriously affected the birth rate of healthy newborns. In fetal and neonatal cardiology, medical imaging technology (2D ultrasound, MRI) has been proved helpful in detecting congenital defects of the fetal heart and assists sonographers in prenatal diagnosis. Recognizing the 2D fetal heart ultrasonic standard plane (FHUSP) manually is a highly complex task. Compared with manual identification, automatic identification through artificial intelligence can save a lot of time, ensure the efficiency of diagnosis, and improve the accuracy of diagnosis. In this study, a feature extraction method based on texture features (Local Binary Pattern, LBP, and Histogram of Oriented Gradient, HOG) combined with the Bag of Words (BOW) model is carried out, and then feature fusion is performed. Finally, a Support Vector Machine (SVM) is adopted to realize automatic recognition and classification of FHUSP. The data include 788 standard plane data sets and 448 normal and abnormal plane data sets. Compared with some other methods and the single-method model, the classification accuracy of our model is clearly improved, with the highest accuracy reaching 87.35%. Similarly, we also verify the performance of the model on normal and abnormal planes, and the average accuracy in classifying abnormal and normal planes is 84.92%. The experimental results show that this method can effectively classify and predict different FHUSP and can provide certain assistance for sonographers in diagnosing fetal congenital heart disease.
Abstract: As the COVID-19 epidemic spread across the globe, people around the world were advised or mandated to wear masks in public places to prevent its further spread. In some cases, not wearing a mask could result in a fine. To monitor mask wearing, and to prevent the spread of future epidemics, this study proposes an image recognition system consisting of a camera, an infrared thermal array sensor, and a convolutional neural network trained in mask recognition. The infrared sensor monitors body temperature and displays the results in real time on a liquid crystal display screen. The proposed system reduces the inefficiency of traditional object detection by providing training data according to the specific needs of the user and by applying You Only Look Once Version 4 (YOLOv4) object detection technology, which experiments show has more efficient training parameters and a higher level of accuracy in object recognition. All datasets are uploaded to the cloud for storage using Google Colaboratory, saving human resources and achieving a high level of efficiency at a low cost.
Funding: Chongqing Technological Innovation and Application Development Project, Key Technologies and Applications of Cross Media Analysis and Reasoning, No. cstc2019jscx-zdztzxX0037.
Abstract: BACKGROUND: Small intestinal vascular malformations (angiodysplasias) are common causes of small intestinal bleeding. While capsule endoscopy has become the primary diagnostic method for angiodysplasia, manual reading of the entire gastrointestinal tract is time-consuming and imposes a heavy workload, which affects the accuracy of diagnosis. AIM: To evaluate whether artificial intelligence can assist the diagnosis and increase the detection rate of angiodysplasias in the small intestine, achieve automatic disease detection, and shorten the capsule endoscopy (CE) reading time. METHODS: A convolutional neural network semantic segmentation model with a feature fusion method was proposed; it automatically recognizes the category of vascular dysplasia under CE and draws the lesion contour, thus improving the efficiency and accuracy of identifying small intestinal vascular malformation lesions. ResNet-50 was used as the backbone network to design the fusion mechanism, fuse the shallow and deep features, and classify the images at the pixel level to achieve the segmentation and recognition of vascular dysplasia. The training set and test set were constructed, and the model was compared with PSPNet, Deeplab3+, and UperNet. RESULTS: The model achieved satisfactory results on the test set constructed in the study, where pixel accuracy was 99%, mean intersection over union was 0.69, negative predictive value was 98.74%, and positive predictive value was 94.27%. The model parameter count was 46.38 M, the floating-point computation was 467.2 G, and the time needed to segment and recognize a picture was 0.6 s. CONCLUSION: Constructing a segmentation network based on deep learning to segment and recognize angiodysplasia lesions is an effective and feasible method for diagnosing angiodysplasia lesions.
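Pixel accuracy and mean intersection over union (mIoU), two of the figures reported above, can be computed from a predicted and a ground-truth label mask as sketched below. The function name and the nested-list mask format are assumptions for illustration; classes that appear in neither mask are skipped when averaging, which is one common convention among several.

```python
def pixel_accuracy_and_miou(pred, target, num_classes):
    """Return (pixel accuracy, mean IoU) for two label masks of equal shape.

    pred and target are lists of rows of integer class labels in
    range(num_classes).
    """
    correct = 0
    total = 0
    inter = [0] * num_classes   # pixels where both masks agree on class c
    union = [0] * num_classes   # pixels where either mask says class c
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += 1
            if p == t:
                correct += 1
                inter[p] += 1
                union[p] += 1
            else:
                union[p] += 1
                union[t] += 1
    if total == 0:
        raise ValueError("masks must not be empty")
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return correct / total, sum(ious) / len(ious)


if __name__ == "__main__":
    predicted = [[0, 0], [1, 1]]   # toy masks: 0 = background, 1 = lesion
    truth = [[0, 1], [1, 1]]
    print(pixel_accuracy_and_miou(predicted, truth, num_classes=2))
```

The toy example also shows why the two numbers can diverge: when background dominates, pixel accuracy stays high even if the lesion class is segmented poorly, whereas mIoU weights every class equally.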
Funding: Funded by the Natural Science Foundation of Fujian Province (No. 2018J01063), the Project of Deep Learning Based Underwater Cultural Relics Recognization (No. 38360041), and the Project of the State Administration of Cultural Relics (No. 2018300).
Abstract: Seabed sediment recognition is vital for the exploitation of marine resources. Side-scan sonar (SSS) is an excellent tool for acquiring imagery of seafloor topography. Combined with ocean surface sampling, it provides detailed and accurate images of marine substrate features. Most processing of SSS imagery works around limited sampling stations and requires manual interpretation to complete the classification of seabed sediment imagery. In complex sea areas, with manual interpretation, small targets are often lost due to the large amount of information. To date, studies related to the automatic recognition of seabed sediments are still few. This paper proposes a seabed sediment recognition method based on You Only Look Once version 5 and SSS imagery to perform real-time sediment classification and localization with improved accuracy, particularly on small targets, and at faster speeds. We used methods such as changing the dataset size, epoch, and optimizer and adding multiscale training to overcome the challenges of having a small sample and a low accuracy. With these methods, we improved the mean average precision by 8.98% and the F1 score by 11.12% compared with the original method. In addition, the detection speed was approximately 100 frames per second, which is faster than that of previous methods. This speed enabled us to achieve real-time seabed sediment recognition from SSS imagery.
Abstract: Introduction: Recent rapid developments in artificial intelligence (AI) have multiplied concerns regarding the future of robotic autonomy in surgery. However, the literature on the topic is still scarce. Aim: To test a novel, commercially available AI tool for image analysis on a series of laparoscopic scenes. Methods: The research tools included OpenAI ChatGPT 4.0 with its corresponding image recognition plugin, which was fed a list of 100 selected laparoscopic snapshots from common surgical procedures. To score the reliability of the responses received from the image-recognition bot, two corresponding scales were developed, ranging from 0 to 5. The set of images was divided into two groups, unlabeled (Group A) and labeled (Group B), and according to the type of surgical procedure or image resolution. Results: The AI correctly recognized the context of surgery-related images in 97% of its reports. For the labeled surgical pictures, the image-processing bot scored 3.95/5 (79%), whilst for the unlabeled ones it scored 2.905/5 (58.1%). Phases of the procedure were commented on in detail after all successful interpretations. With ratings of 4 to 5 out of 5, the chatbot was able to talk in detail about the indications, contraindications, stages, instrumentation, complications, and outcome rates of the operation discussed. Conclusion: Interaction between surgeon and chatbot appears to be an interesting front end for further research by clinicians, in parallel with the evolution of its complex underlying infrastructure. In this early phase of using artificial intelligence for image recognition in surgery, no safe conclusions can be drawn from small cohorts with commercially available software. Further development of medically oriented AI software and awareness in the clinical world are expected to bring fruitful information on the topic in the years to come.
Funding: This work was funded by the Deanship of Scientific Research at Jouf University under Grant Number (DSR2022-RG-0114).
Abstract: The challenge faced by visually impaired persons in their day-to-day lives is to interpret text from documents. To help these people, the objective of this work is to develop an efficient text recognition system that allows the isolation, extraction, and recognition of text in documents having a textured background, degraded colors, and poor quality, and synthesizes it into speech. This system basically consists of three algorithms: a text localization and detection algorithm based on the mathematical morphology method (MMM); a text extraction algorithm based on the gamma correction method (GCM); and an optical character recognition (OCR) algorithm for text recognition. A detailed complexity study of the different blocks of this text recognition system has been carried out. Following this study, an acceleration of the GCM algorithm (AGCM) is proposed. The AGCM algorithm has reduced the complexity of the text recognition system by 70% and kept the same quality of text recognition as that of the original method. To assist visually impaired persons, a graphical interface of the entire text recognition chain has been developed, allowing the capture of images from a camera, rapid and intuitive visualization of the recognized text from this image, and text-to-speech synthesis. Our text recognition system provides an improvement of 6.8% for the recognition rate and 7.6% for the F-measure relative to GCM and AGCM algorithms.
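The gamma correction on which the GCM step is built is a simple per-pixel power-law transform. The sketch below shows only that basic transform on an 8-bit grayscale image, using a lookup table; it is not the paper's GCM or AGCM text-extraction algorithm, and the function name and sample values are assumptions for illustration.

```python
def gamma_correct(image, gamma):
    """Apply out = 255 * (in / 255) ** gamma to an 8-bit grayscale image.

    image is a list of rows of integers in 0..255. gamma > 1 darkens
    mid-tones; gamma < 1 brightens them.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    # Precompute the mapping once for all 256 possible intensities.
    lut = [round(255 * (value / 255) ** gamma) for value in range(256)]
    return [[lut[pixel] for pixel in row] for row in image]


if __name__ == "__main__":
    page = [[0, 64, 128], [192, 224, 255]]   # hypothetical pixel values
    print(gamma_correct(page, 2.0))
    print(gamma_correct(page, 0.5))
```

Because the table has only 256 entries, the cost per pixel is a single lookup regardless of the gamma value, which is the usual way to keep this transform cheap on large images.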