Abstract: To improve the ability to anticipate future events and to cope effectively with uncertainty, a network architecture based on graph enhancement and attention mechanisms is proposed for uncertainty forecasting of multivariate time series. A latent graph structure is introduced and combined with graph neural network techniques to capture inter-series dependencies, thereby modeling the mutual influence between time series; an attention mechanism captures the temporal patterns within each series to model its dynamic evolution; and Monte Carlo dropout is used to approximate the model parameters, with the predicted series modeled as a random distribution, enabling accurate uncertainty prediction for time series. Experiments show that the method maintains high forecasting accuracy while providing reliable uncertainty estimates, supplying confidence information for decision-making tasks.
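The Monte Carlo dropout step can be illustrated with a minimal sketch, assuming a generic PyTorch forecaster; the model, input, and sample count here are hypothetical, not from the paper. Repeated stochastic forward passes approximate the predictive distribution, whose spread serves as the uncertainty estimate:

```python
import torch

def mc_dropout_predict(model, x, n_samples=100):
    """Approximate the predictive distribution by keeping dropout
    active at inference time and sampling repeated forward passes."""
    model.train()  # keep dropout layers stochastic during inference
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    # The sample mean is the point forecast; the sample std quantifies uncertainty.
    return samples.mean(dim=0), samples.std(dim=0)
```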
Abstract: AIS (Automatic Identification System) is an automatic identification system for ships that provides data such as timestamps, latitude and longitude, heading, and speed. To address the multi-dimensional nature of ship trajectories and the accuracy and real-time requirements of trajectory prediction, this paper proposes a trajectory-similarity computation method based on image detection and matching. The method first renders all fishing-vessel trajectory data as images, then computes the similarity between trajectory images using the ORB (Oriented FAST and Rotated BRIEF) algorithm with BF (Brute-Force) matching, and uses the resulting scores to classify fishing-vessel trajectory types. Experimental results show that this similarity method is accurate and easy to implement, and that it outperforms traditional methods in the efficiency and speed of processing trajectory data.
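A minimal sketch of the ORB-plus-brute-force pipeline with OpenCV, assuming the trajectories have already been rendered to image files; the ratio-test filtering and the normalization of the score are illustrative choices, not details taken from the paper:

```python
import cv2

def trajectory_similarity(path_a, path_b, ratio=0.75):
    """Score the similarity of two rendered trajectory images using
    ORB keypoints and brute-force descriptor matching."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create()
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0.0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)  # Hamming norm for binary descriptors
    matches = matcher.knnMatch(des_a, des_b, k=2)
    # Lowe's ratio test keeps only unambiguous matches.
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good) / max(len(kp_a), len(kp_b))
```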
Abstract: This study focuses on using machine learning to distinguish early age-related macular degeneration (AMD) from normal controls. Given the high rate of blindness caused by AMD and its rising prevalence with population aging, early detection is crucial. From a public dataset containing thousands of images, retinal OCT images of early AMD patients and normal controls were selected and segmented into 9 layers using a U-net based network. The first 13 Haralick texture features of each layer were computed with Python's Mahotas package, and features were screened with Kolmogorov-Smirnov tests followed by t-tests or Mann-Whitney U tests as appropriate. Statistical analysis showed significant between-group differences in the texture features of the ONL, MEZ, and RPE layers, with smaller differences in the OS layer. In classification, LightGBM and XGBoost outperformed logistic regression and SVM: the former two achieved high AUC values in the MEZ and ONL layers, while the latter two performed poorly in the OS layer. The study provides a reference for early AMD diagnosis, but the issues with the OS layer require further research and improvement.
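A minimal sketch of the feature-extraction and feature-screening steps, assuming each segmented layer is available as an 8-bit grayscale array; the normality check via a KS test against a standard normal is one plausible reading of the pipeline, not a detail confirmed by the abstract:

```python
import mahotas
import numpy as np
from scipy import stats

def haralick_features(layer_img):
    """The 13 classic Haralick texture features for one segmented
    retinal layer, averaged over the four GLCM directions."""
    return mahotas.features.haralick(layer_img.astype(np.uint8)).mean(axis=0)

def compare_groups(amd_vals, ctrl_vals, alpha=0.05):
    """Pick the test as in the pipeline above: t-test when both samples
    look normal under a KS test, Mann-Whitney U otherwise."""
    normal = all(stats.kstest(stats.zscore(v), "norm").pvalue > alpha
                 for v in (amd_vals, ctrl_vals))
    if normal:
        return stats.ttest_ind(amd_vals, ctrl_vals)
    return stats.mannwhitneyu(amd_vals, ctrl_vals)
```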
Abstract: Semantic edge detection strives to accurately delineate object boundaries and assign category labels to individual pixels, posing the dual challenge of accurate localization and classification. This study introduces language-driven semantic edge detection, a simple framework that enhances semantic contour detection models. It leverages the semantic information embedded in text representations to recalibrate the attention of edge detectors, thereby strengthening the discriminative ability of high-level image features. To achieve this, we introduce text feature information and use cross-modal fusion to improve the localization and classification of edge detectors. Experimental results on the SBD and Cityscapes datasets show significant performance gains; for example, adding text feature information to CASENet improves the mean ODS score on SBD from 70.4 to 72.6. Ultimately, language-driven semantic edge detection achieves a leading mean ODS of 77.0, surpassing competing methods. We also report the effects of additional fusion strategies and backbone networks.
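The abstract does not specify the fusion design, but one plausible form of text-driven recalibration is channel gating, sketched below in PyTorch; the module name, projection, and sigmoid gate are hypothetical illustrations, not the paper's architecture:

```python
import torch
import torch.nn as nn

class TextGuidedRecalibration(nn.Module):
    """Hypothetical cross-modal fusion: project a text embedding to
    per-channel gates that recalibrate image feature maps."""
    def __init__(self, text_dim, img_channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(text_dim, img_channels),
            nn.Sigmoid(),
        )

    def forward(self, img_feat, text_emb):
        # img_feat: (B, C, H, W); text_emb: (B, text_dim)
        g = self.gate(text_emb).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        return img_feat * g  # re-weight channels by text-derived relevance
```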
Abstract: In recent years, speech-driven 3D facial animation has been widely studied. Although previous work can generate coherent 3D facial animations from speech data, the generated animations lack realism and vividness due to the scarcity of audio-visual data, and the accuracy of lip movements is insufficient. To improve the accuracy and vividness of lip movements, this paper proposes HBF Talk, an end-to-end neural network model. It uses the pre-trained HuBERT (Hidden-Unit BERT) model to extract and encode features from speech data, introduces a Flash module to further encode the extracted speech features into richer contextual representations, and finally decodes them with a biased cross-modal Transformer decoder. Quantitative and qualitative experiments, together with comparisons against existing baseline models, show that HBF Talk performs better, improving the accuracy and vividness of speech-driven lip movements.
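The HuBERT encoding stage can be sketched with the Hugging Face transformers library; the abstract only states that a pre-trained HuBERT encodes the speech, so the public facebook/hubert-base-ls960 checkpoint is used here purely for illustration:

```python
import torch
from transformers import HubertModel, Wav2Vec2FeatureExtractor

# Public HuBERT checkpoint, chosen for illustration only.
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
hubert = HubertModel.from_pretrained("facebook/hubert-base-ls960")

def encode_speech(waveform, sample_rate=16000):
    """Turn a raw 1-D waveform into frame-level speech features
    suitable for a downstream cross-modal decoder."""
    inputs = extractor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        return hubert(inputs.input_values).last_hidden_state  # (1, T, 768)
```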
Abstract: Previous research on speech-driven facial animation has produced fairly realistic and accurate lip movements and facial expressions from audio signals. Traditional methods focused mainly on learning deterministic mappings from speech to animation; recent studies have begun to explore the diversity of speech-driven 3D facial animation, capturing the complex many-to-many relationships between audio and facial motion by exploiting the generative diversity of diffusion models. The Self-Diffuser method proposed here uses the pre-trained speech representation model wav2vec 2.0 to encode audio inputs and combines diffusion-based techniques with a Transformer to perform the generation task. This work not only overcomes the limitations of traditional regression models in producing lip movements that are both realistic and lip-reading comprehensible, but also examines the trade-off between precise lip synchronization and generating facial expressions independent of speech. Comparisons with current state-of-the-art methods show that Self-Diffuser achieves more accurate lip movements in speech-driven facial animation and, for the upper-face expressions that are only loosely correlated with speech, produces facial motions closer to real speaking expressions. In addition, the introduced diffusion mechanism significantly enhances the diversity of the generated 3D facial animation sequences.
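A generic DDPM-style training step conveys the diffusion mechanism described above; Self-Diffuser's actual noise schedule, conditioning, and denoiser signature are not given in the abstract, so everything below is a hedged sketch:

```python
import torch
import torch.nn.functional as F

def diffusion_training_step(denoiser, motion, audio_feat, n_steps=1000):
    """One DDPM-style step: noise the ground-truth facial motion and
    train the denoiser to predict that noise, conditioned on audio."""
    betas = torch.linspace(1e-4, 0.02, n_steps)       # linear noise schedule
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)
    t = torch.randint(0, n_steps, (motion.size(0),))  # random timestep per sample
    a = alpha_bar[t].view(-1, 1, 1)                   # motion: (B, T, D)
    noise = torch.randn_like(motion)
    noisy = a.sqrt() * motion + (1.0 - a).sqrt() * noise
    pred = denoiser(noisy, t, audio_feat)  # Transformer denoiser (hypothetical signature)
    return F.mse_loss(pred, noise)
```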