Funding: Supported by the Outstanding Youth Team Project of Central Universities (QNTD202308) and the Ant Group through the CCF-Ant Research Fund (CCF-AFSG 769498 RF20220214).
Abstract: Named Entity Recognition (NER) is a fundamental task in biomedical text mining. It aims to extract specific types of entities, such as genes, proteins, and diseases, from complex biomedical texts and to categorize them into predefined entity types. This process can provide basic support for the automatic construction of knowledge bases. In contrast to general texts, biomedical texts frequently contain numerous nested entities and local dependencies among these entities, which presents significant challenges to prevailing NER models. To address these issues, we propose a novel Chinese nested biomedical NER model based on RoBERTa and Global Pointer (RoBGP). Our model first uses the RoBERTa-wwm-ext-large pretrained language model to dynamically generate word-level initial vectors. It then incorporates a Bidirectional Long Short-Term Memory network to capture bidirectional semantic information, effectively addressing the issue of long-distance dependencies. Furthermore, the Global Pointer model is employed to comprehensively recognize all nested entities in the text. We conduct extensive experiments on the Chinese medical dataset CMeEE, and the results demonstrate the superior performance of RoBGP over several baseline models. This research confirms the effectiveness of RoBGP in Chinese biomedical NER, providing reliable technical support for biomedical information extraction and knowledge base construction.
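The span-level decoding that lets a Global Pointer style head return nested entities can be illustrated with a minimal sketch. It assumes the model has already produced one score matrix per entity type, where entry (i, j) scores the span from token i to token j, and that spans scoring above a threshold of 0 are kept. The names `decode_spans` and `make_matrix` and the labels `symptom` and `body` are hypothetical and are not taken from the RoBGP paper.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start index, inclusive end index, entity type)


def decode_spans(scores: Dict[str, List[List[float]]],
                 threshold: float = 0.0) -> List[Span]:
    """Return every (start, end, type) whose span score exceeds the threshold.

    Only cells with start <= end are read, so each candidate span is scored
    independently and overlapping or nested spans can all be returned.
    """
    spans: List[Span] = []
    for entity_type, matrix in scores.items():
        for start, row in enumerate(matrix):
            for end in range(start, len(row)):  # upper triangle only
                if row[end] > threshold:
                    spans.append((start, end, entity_type))
    return sorted(spans)


def make_matrix(length: int, positives: List[Tuple[int, int]]) -> List[List[float]]:
    """Hypothetical helper: a score matrix that is -1 except at the given cells."""
    matrix = [[-1.0] * length for _ in range(length)]
    for start, end in positives:
        matrix[start][end] = 1.0
    return matrix


# Toy scores for a 4-token sentence: a "body" span (0, 1) nested inside
# a "symptom" span (0, 3). Both labels are illustrative assumptions.
example_scores = {
    "symptom": make_matrix(4, [(0, 3)]),
    "body": make_matrix(4, [(0, 1)]),
}
nested = decode_spans(example_scores)
```

Because each (start, end) cell is judged on its own, the inner and outer spans are both recovered without the one-label-per-token restriction of sequence labeling.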
Funding: Supported by the National Key Research and Development Project (2021YFF0901701).
Abstract: Few-shot named entity recognition (NER) aims to identify named entities in new domains using a limited amount of annotated data. Previous methods divided this task into entity span detection and entity classification, achieving good results. However, because they use sequence labeling for entity span detection, these methods are limited by the imbalance between the entity and non-entity categories. To this end, a point-proto network (PPN) combining pointer and prototypical networks was proposed. Specifically, in the entity span detection stage, the pointer network generates the positions of entities in sentences. In the entity classification stage, the prototypical network builds semantic prototypes of entity types and classifies entities based on their distance from these prototypes. Moreover, the low-rank adaptation (LoRA) fine-tuning method, which freezes the pre-trained weights and injects a trainable decomposition matrix, reduces the number of parameters that need to be trained and saved. Extensive experiments on the few-shot NER Dataset (Few-NERD) and Cross-Dataset demonstrate the superiority of PPN in this domain.
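The prototype-based classification step can be sketched as follows. This is a minimal illustration, assuming each entity span has already been encoded as a fixed-length vector, that a prototype is the mean of a type's support vectors, and that distance is Euclidean. The function names, the toy 2-dimensional embeddings, and the labels `PER` and `LOC` are hypothetical and are not taken from the PPN paper.

```python
import math
from typing import Dict, List, Sequence


def build_prototypes(support: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Average the support embeddings of each entity type into one prototype."""
    prototypes: Dict[str, List[float]] = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [
            sum(vec[d] for vec in vectors) / len(vectors) for d in range(dim)
        ]
    return prototypes


def classify(query: Sequence[float], prototypes: Dict[str, List[float]]) -> str:
    """Assign the query span to the type whose prototype is nearest."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Toy support set with two examples per type (illustrative values only).
support_set = {
    "PER": [[0.0, 0.0], [0.0, 2.0]],
    "LOC": [[4.0, 4.0], [6.0, 4.0]],
}
protos = build_prototypes(support_set)
predicted = classify([1.0, 1.0], protos)
```

With only a handful of labeled examples per type, no classifier weights have to be learned for a new domain; adding a type only requires computing one more mean vector.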
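Abstract: To obtain structured descriptions of the phenotype and genetics of wheat varieties, and to address the fuzzy entity boundaries and overlapping relations found in unstructured wheat germplasm data, a joint entity and relation extraction model for wheat germplasm information is proposed: WGIE-DCWF (wheat germplasm information extraction model based on deep character and word fusion). The encoding layer improves the recognition of dense entity features through deep character-word fusion and contextual semantic feature fusion. The triple extraction layer builds a cascaded pointer network to improve the extraction of overlapping relations. A series of comparative experiments on a wheat germplasm dataset and on public datasets show that WGIE-DCWF effectively improves joint entity and relation extraction on wheat germplasm data and generalizes well, so it can provide technical support for building a wheat germplasm knowledge base.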
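Abstract: [Objective/Significance] To address the low accuracy of named entity recognition (NER) for agricultural diseases caused by nested entities and confusion between entity types, this study takes PointerNet as the baseline model and proposes RoFormer-PointerNet, a pointer-network NER method for agricultural diseases built on the RoFormer pretrained model. [Methods] The RoFormer pretrained model vectorizes the input text. Its rotary position embedding captures positional information and enriches character and word features, which addresses the type confusion caused by polysemy. A pointer network performs decoding: its head and tail pointer tagging scheme extracts all entities in a sentence and resolves the nesting problem in entity extraction. [Results and Discussion] A self-built agricultural disease dataset was used, containing 2,867 annotated samples and 10,282 entities. To verify the advantage of the RoFormer pretrained model for entity extraction, comparative experiments were run against several vectorization models, including Word2Vec, BERT, and RoBERTa. RoFormer-PointerNet achieved the best precision, recall, and F1 score among the compared models, at 87.49%, 85.76%, and 86.62%, respectively. To verify the advantage of RoFormer-PointerNet in mitigating entity nesting, it was compared with the most widely used bidirectional long short-term memory (BiLSTM) and conditional random field (CRF) models. RoFormer-PointerNet exceeded the RoFormer-BiLSTM, RoFormer-CRF, and RoFormer-BiLSTM-CRF models by 4.8%, 5.67%, and 3.87%, respectively, which shows that the pointer network handles nested entities well. Finally, the recognition performance of RoFormer-PointerNet on the agricultural disease dataset was verified through recognition experiments on eight entity categories, including disease symptoms, disease names, and control methods. The method's precision, recall, and F1 score were 87.49%, 85.76%, and 86.62%, respectively, the best among comparable methods. [Conclusions] The proposed method effectively recognizes entities in Chinese agricultural disease texts and outperforms the other models. It has some advantage in handling nested entities and type confusion during entity extraction.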
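How head and tail pointer tagging can recover nested entities is illustrated by the sketch below. It assumes one binary head sequence and one binary tail sequence per entity type, and pairs each head with the nearest tail at or after it; this pairing rule is a common convention and an assumption here, not necessarily the one used in RoFormer-PointerNet. The function name, the example phrase, and the labels `crop` and `disease` are hypothetical.

```python
from typing import Dict, List, Tuple

Entity = Tuple[int, int, str]  # (start index, inclusive end index, entity type)


def decode_head_tail(heads: Dict[str, List[int]],
                     tails: Dict[str, List[int]]) -> List[Entity]:
    """Pair each head marker with the nearest tail marker of the same type."""
    entities: List[Entity] = []
    for entity_type, head_flags in heads.items():
        tail_positions = [i for i, flag in enumerate(tails[entity_type]) if flag]
        for start, flag in enumerate(head_flags):
            if not flag:
                continue
            # First tail position that is not before the head, if any.
            end = next((t for t in tail_positions if t >= start), None)
            if end is not None:
                entities.append((start, end, entity_type))
    return sorted(entities)


# Illustrative 5-character phrase "小麦条锈病" (wheat stripe rust):
# "小麦" (characters 0-1) tagged as a crop, nested inside the whole
# phrase (characters 0-4) tagged as a disease.
head_tags = {"disease": [1, 0, 0, 0, 0], "crop": [1, 0, 0, 0, 0]}
tail_tags = {"disease": [0, 0, 0, 0, 1], "crop": [0, 1, 0, 0, 0]}
found = decode_head_tail(head_tags, tail_tags)
```

Since each type keeps its own pair of tag sequences, a character can open or close entities of several types at once, which a single BIO label per character cannot express.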
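Abstract: To address missed and false detections when reading pointer-type water meters under uneven illumination and with fogged dials, a reading detection method for pointer-type water meters based on an improved YOLOv5s is proposed. First, data augmentation methods such as Mosaic and Mixup are used to improve the model's generalization ability. Second, a weighted bidirectional feature pyramid network (BiFPN) is introduced for higher-level feature fusion, so that the deep and shallow feature maps of the water meter image are fully fused and the network's representational capacity improves. Third, the convolutional block attention module (CBAM) is embedded to strengthen the reading features of the meter's sub-dials along both the channel and spatial dimensions. Finally, the complete intersection over union loss (CIoU-Loss) is replaced with SIoU_Loss (scylla intersection over union loss) to improve bounding-box regression accuracy. The improved algorithm reaches an mAP@0.5 of 97.8%, which is 3.2% higher than the original YOLOv5s network. The experimental results show that the algorithm effectively improves reading detection accuracy for pointer-type water meters.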
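For orientation, the CIoU loss that the method replaces can be sketched in a few lines. This is a minimal sketch of the commonly cited CIoU formulation (IoU, minus a normalized center-distance term, minus an aspect-ratio term), assuming boxes are given as (x1, y1, x2, y2) corners. The function name and the `eps` stabilizer are assumptions, and the SIoU loss adopted in the paper is not reproduced here.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corner format


def ciou_loss(pred: Box, target: Box, eps: float = 1e-9) -> float:
    """CIoU loss = 1 - (IoU - center_distance^2 / diagonal^2 - alpha * v)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Squared distance between box centers.
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    enclose_w = max(px2, tx2) - min(px1, tx1)
    enclose_h = max(py2, ty2) - min(py1, ty1)
    c2 = enclose_w ** 2 + enclose_h ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - (iou - rho2 / c2 - alpha * v)


loss_same = ciou_loss((0, 0, 2, 2), (0, 0, 2, 2))
loss_apart = ciou_loss((0, 0, 1, 1), (2, 2, 3, 3))
```

Identical boxes give a loss close to zero, while disjoint boxes are still penalized in proportion to their center distance, so the loss carries a gradient signal even when the IoU itself is zero.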
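Abstract: Existing meter reading methods are easily affected by factors such as uneven illumination, which leads to large reading errors. To address this problem, a fully automatic, deep-learning-based reading method for pointer-type meters is proposed. First, a YOLOv7 network is introduced to extract the dial region. Second, the proposed VCA-UNet (VGG16Net, improved skip connections and ASPP based U-Net) network segments the scale marks and the pointer. Finally, a PP-OCRv3 network is introduced to obtain the meter range automatically, and the angle method is used to determine the meter reading. The experimental results show that the MIoU and MPA of VCA-UNet are 18.48% and 9.36% higher than those of U-Net, respectively, and generally higher than those of other classic segmentation networks. The average relative error of the meter readings is 0.614%, and the absolute reading error in the generalization experiments is relatively small, which verifies the accuracy and generalization of the reading method.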
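The angle method in the last step can be sketched as a linear interpolation over the swept angle. This is a minimal sketch, assuming the dial center, the zero and full-scale marks, and the pointer tip are already known as image coordinates (x to the right, y downward), that the scale runs clockwise, and that it is linear. The function names and the 0 to 1.6 range in the example are hypothetical and are not taken from the paper.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # image coordinates: x to the right, y downward


def image_angle(center: Point, point: Point) -> float:
    """Angle of `point` around `center`; it grows clockwise because y points down."""
    return math.atan2(point[1] - center[1], point[0] - center[0])


def read_meter(center: Point, zero_mark: Point, full_mark: Point,
               pointer_tip: Point, min_value: float, max_value: float) -> float:
    """Interpolate the reading from the pointer's clockwise sweep past the zero mark."""
    two_pi = 2 * math.pi
    zero_angle = image_angle(center, zero_mark)
    total_sweep = (image_angle(center, full_mark) - zero_angle) % two_pi
    pointer_sweep = (image_angle(center, pointer_tip) - zero_angle) % two_pi
    if total_sweep == 0:
        raise ValueError("zero and full-scale marks must differ in angle")
    return min_value + pointer_sweep / total_sweep * (max_value - min_value)


# Dial with the zero mark at lower left, full scale at lower right,
# and the pointer straight up, i.e. halfway along a 270-degree scale.
reading = read_meter(center=(0.0, 0.0), zero_mark=(-1.0, 1.0),
                     full_mark=(1.0, 1.0), pointer_tip=(0.0, -1.0),
                     min_value=0.0, max_value=1.6)
```

Taking both sweeps modulo a full turn keeps the interpolation valid when the scale crosses the point where `atan2` wraps around, as it does for the dial in this example.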