Journal Articles
1,815 articles found.
Integrating Audio-Visual Features and Text Information for Story Segmentation of News Video (Cited by: 1)
1
Authors: Liu Hua-yong, Zhou Dong-ru (School of Computer, Wuhan University, Wuhan 430072, Hubei, China). 《Wuhan University Journal of Natural Sciences》, CAS, 2003, No. 04A, pp. 1070-1074 (5 pages).
Video data are composed of multimodal information streams, including visual, auditory, and textual streams, so this paper describes an approach to story segmentation for news video using multimodal analysis. The proposed approach detects topic-caption frames and integrates them with silence-clip detection results and shot segmentation results to locate news story boundaries. The integration of audio-visual features and text information overcomes the weakness of approaches that use only image analysis techniques. On test data with 135,400 frames, the approach detects the boundaries between news stories with an accuracy rate of 85.8% and a recall rate of 97.5%. The experimental results show the approach is valid and robust.
Keywords: news video, story segmentation, audio-visual feature analysis, text detection
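The accuracy and recall figures quoted in the abstract are the standard boundary-detection ratios; a minimal sketch of the arithmetic (the raw counts below are hypothetical, chosen only to land near the paper's reported rates, and are not from the paper):

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision (the 'accuracy rate' above) and recall for detected story boundaries."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

# Hypothetical counts: 79 boundaries correctly detected, 13 spurious, 2 missed.
p, r = precision_recall(79, 13, 2)
print(f"precision={p:.1%}, recall={r:.1%}")
```
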
A Review on Audio-visual Translation Studies
2
Author: 李瑶. 《语言与文化研究》, 2008, No. 1, pp. 146-150 (5 pages).
This paper is dedicated to a thorough review of audio-visual translation studies at home and abroad. Reviewing foreign achievements in this specific field of translation studies can shed some light on China's audio-visual translation practice and research. The review of Chinese scholars' audio-visual translation studies offers potential directions and guidelines for the field, as well as noting neglected aspects. Based on this summary of relevant studies, possible topics for further research are proposed.
Keywords: audio-visual translation, subtitling, dubbing
Audio-visual emotion recognition with multilayer boosted HMM
3
Authors: 吕坤, 贾云得, 张欣. 《Journal of Beijing Institute of Technology》, EI CAS, 2013, No. 1, pp. 89-93 (5 pages).
Emotion recognition has become an important task of modern human-computer interaction. A multilayer boosted HMM (MBHMM) classifier for automatic audio-visual emotion recognition is presented in this paper. A modified Baum-Welch algorithm is proposed for component HMM learning, and adaptive boosting (AdaBoost) is used to train ensemble classifiers for different layers (cues). Except for the first layer, the initial weights of the training samples in the current layer are decided by the recognition results of the ensemble classifier in the upper layer; thus the training procedure using the current cue can focus more on the samples that were difficult for the previous cue. Our MBHMM classifier combines these ensemble classifiers and takes advantage of the complementary information from multiple cues and modalities. Experimental results on audio-visual emotion data collected in Wizard of Oz scenarios and labeled under two types of emotion category sets demonstrate that our approach is effective and promising.
Keywords: emotion recognition, audio-visual fusion, Baum-Welch algorithm, multilayer boosted HMM, Wizard of Oz scenario
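The layer-to-layer weight hand-off described in the abstract follows the standard AdaBoost reweighting idea: samples the previous cue's ensemble misclassified start the next layer with higher weight. A minimal sketch under that reading (function and variable names are illustrative, not from the paper):

```python
import math

def next_layer_weights(prev_weights, correct):
    """Boost the weight of samples the previous layer misclassified, AdaBoost-style."""
    err = sum(w for w, ok in zip(prev_weights, correct) if not ok)
    err = min(max(err, 1e-10), 1 - 1e-10)            # guard against degenerate error rates
    alpha = 0.5 * math.log((1 - err) / err)          # confidence of the previous ensemble
    raw = [w * math.exp(-alpha if ok else alpha)     # shrink hits, grow misses
           for w, ok in zip(prev_weights, correct)]
    z = sum(raw)
    return [w / z for w in raw]                      # renormalize to a distribution

w0 = [0.25, 0.25, 0.25, 0.25]                        # uniform weights for the first cue
w1 = next_layer_weights(w0, [True, True, False, True])
```

With one of four samples misclassified, the miss ends up carrying half of the next layer's total weight, which is what lets the next cue "focus more on the difficult samples."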
The Audio-Visual Performance Highlighted Craze in Chicago During Chinese New Year
4
《China & The World Cultural Exchange》, 2019, No. 2, pp. 38-39 (2 pages).
On February 10, 2019 (US Central Time), China National Peking Opera Company (CNPOC) and the Hubei Chime Bells National Chinese Orchestra presented a fantastic audio-visual performance of Chinese Peking Opera and Chinese chime bells for the American audience at the world's top-level Buntrock Hall at Symphony Center.
Keywords: audio-visual performance, Chicago, Chinese New Year
Research on National Identity Based on National Audio-Visual Works: Taking Inner Mongolia as an Example
5
Authors: LIU Haitao, ZHANG Pei. 《Cultural and Religious Studies》, 2021, No. 8, pp. 391-396 (6 pages).
Mongolian audio-visual works are an important carrier for exploring the true significance of this national culture. This paper argues that, through the influence of film, television, and music, the Mongolian people in Inner Mongolia constantly enhance their individual sense of identity with the overall ethnic group, and on this basis continually evolve a new culture in line with modern and contemporary life to further enhance their sense of belonging to the ethnic nation.
Keywords: Mongolian, audio-visual works, national identity
Application of Task-based Teaching Method to College Audio-visual English Teaching
6
Author: Liguo Shi. 《International Journal of Technology Management》, 2015, No. 9, pp. 65-67 (3 pages).
Based on the current situation of college audio-visual English teaching in China, this article points out that in-class avoidance is a serious phenomenon in college audio-visual English teaching. After further analysis, and in combination with the characteristics of college English audio-visual teaching in China, it puts forward the application of the task-based teaching method to college audio-visual English teaching and its steps, attempting to alleviate students' avoidance through the task-based teaching method.
Keywords: task-based teaching method, college English, audio-visual English teaching
Prioritized MPEG-4 Audio-Visual Objects Streaming over the DiffServ
7
Authors: 黄天云, 郑婵. 《Journal of Electronic Science and Technology of China》, 2005, No. 4, pp. 314-320 (7 pages).
The object-based scalable coding in MPEG-4 is investigated, and a prioritized transmission scheme for MPEG-4 audio-visual objects (AVOs) over the DiffServ network with QoS guarantees is proposed. MPEG-4 AVOs are extracted and classified into different groups according to their priority values and scalable layers (visual importance). These priority values are mapped to the IP DiffServ per-hop behaviors (PHBs). The scheme can selectively discard packets of low importance in order to avoid network congestion. Simulation results show that the quality of the received video gracefully adapts to the network state, as compared with the best-effort manner. Also, by allowing the content provider to define the prioritization of each audio-visual object, the adaptive transmission of object-based scalable video can be customized based on the content.
Keywords: video streaming, quality of service (QoS), MPEG-4 audio-visual objects (AVOs), DiffServ, prioritization
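The scheme's two moving parts — a priority-to-PHB mapping and congestion-driven selective discard — can be pictured as a small lookup plus a drop policy. A hedged sketch; the three-class EF/AF mapping and the congestion cutoff below are illustrative, not the paper's exact configuration:

```python
# Illustrative mapping from AVO scalable-layer priority to DiffServ per-hop
# behaviors: base layers get Expedited Forwarding; enhancement layers get
# Assured Forwarding classes with increasing drop precedence.
PRIORITY_TO_PHB = {
    0: "EF",     # base layer: highest protection
    1: "AF11",   # first enhancement layer: low drop precedence
    2: "AF12",   # second enhancement layer: medium drop precedence
    3: "AF13",   # least important layer: dropped first under congestion
}

def schedule_packets(packets, congestion_level):
    """Discard low-importance packets first as congestion rises.

    `packets` is a list of (name, priority) pairs; packets whose priority
    exceeds the congestion-dependent cutoff are dropped, the rest are
    tagged with their PHB.
    """
    cutoff = max(PRIORITY_TO_PHB) - congestion_level
    return [(name, PRIORITY_TO_PHB[prio]) for name, prio in packets if prio <= cutoff]

stream = [("video_base", 0), ("audio_base", 0), ("enh1", 1), ("enh2", 2), ("enh3", 3)]
sent = schedule_packets(stream, congestion_level=2)   # moderate congestion
```

Under moderate congestion only the base layers and the first enhancement layer survive, which is the "graceful adaptation" the abstract describes.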
Application of Conversational Implicatures in Teaching English Audio-visual Course
8
Author: 刘慧莹. 《商情》, 2014, No. 17, pp. 370-371 (2 pages).
Keywords: English learning, learning methods, reading knowledge, reading materials
The Research on Audio-Visual-Oral Instructional theory in Foreign Language
9
Author: FENG Xiaowei. 《Journal of Zhouyi Research》, 2014, No. 3, pp. 4-6 (3 pages).
Keywords: oral teaching, audio-visual, foreign language, teaching theory, language learning, language teaching, teaching equipment, psychological basis
Self-supervised Learning for Speech Emotion Recognition Task Using Audio-visual Features and Distil Hubert Model on BAVED and RAVDESS Databases
10
Authors: Karim Dabbabi, Abdelkarim Mars. 《Journal of Systems Science and Systems Engineering》, SCIE EI CSCD, 2024, No. 5, pp. 576-606 (31 pages).
Existing pre-trained models like Distil HuBERT excel at uncovering hidden patterns and facilitating accurate recognition across diverse data types, such as audio and visual information. We harnessed this capability to develop a deep learning model that uses Distil HuBERT to jointly learn these combined features for speech emotion recognition (SER). Our experiments highlight its distinct advantages: it significantly outperforms Wav2vec 2.0 in both offline and real-time accuracy on the RAVDESS and BAVED datasets. Although slightly trailing HuBERT's offline accuracy, Distil HuBERT delivers comparable performance at a fraction of the model size, making it an ideal choice for resource-constrained environments such as mobile devices. The smaller size does come with a slight trade-off: Distil HuBERT achieved 96.33% offline accuracy on the BAVED database and 87.01% on the RAVDESS database, while in real-time evaluation accuracy decreased to 79.3% on BAVED and 77.87% on RAVDESS. This decrease likely results from the challenges of real-time processing, including latency and noise, but still demonstrates strong performance in practical scenarios. Distil HuBERT therefore emerges as a compelling choice for SER, especially when prioritizing accuracy over real-time processing, and its compact size further enhances its potential in resource-limited settings, making it a versatile tool for a wide range of applications.
Keywords: Wav2vec 2.0, Distil HuBERT, HuBERT, SER, audio and audio-visual features
Cogeneration of Innovative Audio-visual Content: A New Challenge for Computing Art
11
Authors: Mengting Liu, Ying Zhou, Yuwei Wu, Feng Gao. 《Machine Intelligence Research》, EI CSCD, 2024, No. 1, pp. 4-28 (25 pages).
In recent years, computing art has developed rapidly with the in-depth cross-study of artificial intelligence generated content (AIGC) and the main features of artworks. Audio-visual content generation has gradually been applied to various practical tasks, including video and game scoring, assisting artists in creation, and art education, demonstrating broad application prospects. In this paper, we introduce innovative achievements in audio-visual content generation from the perspectives of visual art generation and auditory art generation based on artificial intelligence (AI). We outline the development of image and music datasets, visual and auditory content modelling, and related automatic generation systems. The objective and subjective evaluation of generated samples plays an important role in measuring algorithm performance. We provide a cogeneration mechanism for audio-visual content in multimodal tasks from image to music and present the construction of specific stylized datasets. Many new opportunities and challenges remain in the field of audio-visual synesthesia generation, and we provide a comprehensive discussion of them.
Keywords: artificial intelligence (AI) art, audio-visual, artificial intelligence generated content (AIGC), multimodal, artistic evaluation
AV-FDTI:Audio-visual fusion for drone threat identification
12
Authors: Yizhuo Yang, Shenghai Yuan, Jianfei Yang, Thien Hoang Nguyen, Muqing Cao, Thien-Minh Nguyen, Han Wang, Lihua Xie. 《Journal of Automation and Intelligence》, 2024, No. 3, pp. 144-151 (8 pages).
In response to the evolving challenges posed by small unmanned aerial vehicles (UAVs), which have the potential to transport harmful payloads or cause significant damage, we present AV-FDTI, an innovative Audio-Visual Fusion system designed for Drone Threat Identification. AV-FDTI leverages the fusion of audio and omnidirectional camera feature inputs, providing a comprehensive solution to enhance the precision and resilience of drone classification and 3D localization. Specifically, AV-FDTI employs a CRNN network to capture vital temporal dynamics in the audio domain and uses a pretrained ResNet50 model for image feature extraction. Furthermore, we adopt a visual-information-entropy and cross-attention-based mechanism to enhance the fusion of visual and audio data. Notably, our system is trained on automated Leica tracking annotations, offering accurate ground truth with millimeter-level accuracy. Comprehensive comparative evaluations demonstrate the superiority of our solution over existing systems. In our commitment to advancing this field, we will release this work as open-source code along with the wearable AV-FDTI design, contributing valuable resources to the research community.
Keywords: audio-visual fusion, anti-UAV, multi-modal fusion, classification, 3D localization, self-attention
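The "visual information entropy" idea in the abstract can be illustrated in a few lines: when the visual branch's class distribution is near-uniform (the drone is out of view or blurred), the fusion should defer to audio. A hedged sketch with made-up numbers, not the AV-FDTI implementation:

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def fuse(audio_score, visual_score, visual_probs, n_classes):
    """Down-weight the visual branch when its class distribution is near-uniform."""
    h = entropy(visual_probs)
    h_max = math.log(n_classes)           # entropy of a uniform distribution
    w_visual = 1.0 - h / h_max            # 1 = fully confident, 0 = uninformative
    return w_visual * visual_score + (1 - w_visual) * audio_score

# A peaked visual distribution (drone clearly visible) dominates the fusion;
# a uniform one (drone out of view) defers entirely to the audio score.
confident = fuse(0.2, 0.9, [0.97, 0.01, 0.01, 0.01], 4)
blind = fuse(0.2, 0.9, [0.25, 0.25, 0.25, 0.25], 4)
```
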
Deep Audio-visual Learning: A Survey (Cited by: 3)
13
Authors: Hao Zhu, Man-Di Luo, Rui Wang, Ai-Hua Zheng, Ran He. 《International Journal of Automation and Computing》, EI CSCD, 2021, No. 3, pp. 351-376 (26 pages).
Audio-visual learning, aimed at exploiting the relationship between audio and visual modalities, has drawn considerable attention since deep learning started to be used successfully. Researchers tend to leverage these two modalities to improve the performance of previously considered single-modality tasks or to address new challenging problems. In this paper, we provide a comprehensive survey of recent audio-visual learning development. We divide current audio-visual learning tasks into four subfields: audio-visual separation and localization, audio-visual correspondence learning, audio-visual generation, and audio-visual representation learning. State-of-the-art methods, as well as the remaining challenges of each subfield, are further discussed. Finally, we summarize the commonly used datasets and challenges.
Keywords: deep audio-visual learning, audio-visual separation and localization, correspondence learning, generative models, representation learning
Neural correlates of audio-visual modal interference inhibition investigated in children by ERP (Cited by: 2)
14
Authors: WANG YiWen, LIN ChongDe, LIANG Jing, WANG Yu, ZHANG WenXin. 《Science China (Life Sciences)》, SCIE CAS, 2011, No. 2, pp. 194-200 (7 pages).
In order to detect cross-sectional age characteristics of the cognitive neural mechanisms of audio-visual modal interference inhibition, event-related potentials (ERP) of 14 ten-year-old children were recorded while they performed a word interference task. In incongruent conditions, the participants were required to inhibit audio interference words of the same category. The present findings provide preliminary evidence of the brain mechanism underlying children's inhibition development in this specific childhood stage.
Keywords: audio-visual modal, interference inhibition, event-related potentials (ERP)
Stream Weight Training Based on MCE for Audio-Visual LVCSR (Cited by: 1)
15
Authors: 刘鹏, 王作英. 《Tsinghua Science and Technology》, SCIE EI CAS, 2005, No. 2, pp. 141-144 (4 pages).
In this paper we address the problem of audio-visual speech recognition in the framework of the multi-stream hidden Markov model. Stream weight training based on the minimum classification error criterion is discussed for use in large vocabulary continuous speech recognition (LVCSR). We present lattice rescoring and Viterbi approaches for calculating the loss function of continuous speech. The experimental results show that in the case of clean audio, system performance can be improved by 36.1% in relative word error rate reduction when using state-based stream weights trained by the Viterbi approach, compared to an audio-only speech recognition system. Further experimental results demonstrate that our audio-visual LVCSR system provides a significant enhancement of robustness in noisy environments.
Keywords: audio-visual speech recognition (AVSR), large vocabulary continuous speech recognition (LVCSR), discriminative training, minimum classification error (MCE)
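The "36.1% relative word error rate reduction" quoted above is a relative figure, not an absolute drop in WER. The arithmetic, with hypothetical baseline numbers (the paper does not state the absolute WERs in this abstract):

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative reduction in word error rate, as conventionally reported in ASR papers."""
    return (baseline_wer - new_wer) / baseline_wer

# Hypothetical example: an audio-only baseline at 10.0% WER would need to
# drop to about 6.39% WER to yield a 36.1% relative reduction.
r = relative_wer_reduction(0.100, 0.0639)
```
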
Dual Situations: Identity Construction and Identity Estrangement in Digital Audio-Visual Culture (Cited by: 2)
16
Authors: 张梓轩, 李政. 《编辑之友》, CSSCI, Peking University Core Journals, 2024, No. 2, pp. 21-28 (8 pages).
The development of digital audio-visual media and its culture has created new situations and prompted new media practices among users, but it has also brought new problems and phenomena: the logic of situation generation has shifted toward streaming-media users, the function of situations as tools of interaction has drifted, and the order of situations has tended toward a re-structuring of power in the process of losing publicness. Implicit in this is the duality of situations in users' identity construction: on the one hand, streaming users construct identities according to situations, and the process of identity construction further stimulates the creation of situations; on the other hand, these constructed identities show a de-situated, fluid tendency, producing an estrangement between identity and identification. On the basis of clarifying the characteristics of the new situations of digital audio-visual culture, this article explains the identity-based situational interaction of streaming users and the causes and crises of identity estrangement. Going further, the article attempts to extend the linear model of new media → new situations → new behaviors into a more explanatory user-led circular model, and proposes promoting the well-ordered development of digital audio-visual culture through the consolidation of consensus, the recovery of publicness, and the rebuilding of community.
Keywords: digital audio-visual culture, streaming media, media situation theory, identity construction, identity estrangement
Research on Building University Library Audio-Visual Databases Based on the IIIF A/V Specification and the Avalon System
17
Authors: 张毅, 熊泽泉, 胡晓明, 陈丹. 《图书馆杂志》, CSSCI, Peking University Core Journals, 2024, No. 1, pp. 50-58, 49 (10 pages).
With the continuous improvement of China's network infrastructure, audio-visual media are very popular among the younger generation, posing a challenge to libraries centered on text resources. This study investigates the current state of audio-visual resource database construction in university libraries at home and abroad and, drawing on the successful experience of the IIIF specifications in image resource management and the practices of various audio-visual preservation communities, proposes an approach to audio-visual resource management for Chinese university libraries based on the IIIF A/V specification and open-source software. An empirical study is conducted by analyzing the practice of the East China Normal University Library in audio-visual preservation, streaming publication, timeline bubble annotation, transcription, audio-visual structuring, and open sharing.
Keywords: audio-visual database, IIIF A/V, Avalon media system, audio-visual visualization
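What distinguishes the IIIF A/V (Presentation 3) model from the earlier image-only model is that a Canvas gains a `duration` so time-based media can be painted onto it. A minimal sketch of what such a manifest might look like; the identifiers and media URL are placeholders, and this is an illustration of the general shape, not the library's actual manifest:

```python
import json

def make_av_manifest(base_uri, label, duration_seconds):
    """Build a skeletal IIIF Presentation 3 manifest with one audio-visual canvas."""
    canvas_id = f"{base_uri}/canvas/1"
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{base_uri}/manifest",
        "type": "Manifest",
        "label": {"en": [label]},
        "items": [{
            "id": canvas_id,
            "type": "Canvas",
            "duration": duration_seconds,   # the A/V extension: a time dimension
            "items": [{
                "id": f"{canvas_id}/page/1",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/page/1/anno/1",
                    "type": "Annotation",
                    "motivation": "painting",
                    "body": {
                        "id": f"{base_uri}/media/lecture.mp4",  # placeholder URL
                        "type": "Video",
                        "format": "video/mp4",
                        "duration": duration_seconds,
                    },
                    "target": canvas_id,
                }],
            }],
        }],
    }

manifest = make_av_manifest("https://example.org/iiif/lecture", "Sample lecture", 3600.0)
print(json.dumps(manifest, indent=2)[:80])
```
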
The Realistic Representation and Construction Logic of the Picture of Chinese-Style Modernization from the Perspective of Online Audio-Visual Media
18
Authors: 王晓红, 张琦. 《中州学刊》, CSSCI, Peking University Core Journals, 2024, No. 10, pp. 161-169 (9 pages).
With the development of information technology, online audio-visual media have become an important medium of cultural communication and daily life, and an increasingly important field of cultural production. In terms of innovation subjects, core roots, and key goals, online audio-visual media have a deep and self-consistent relationship with Chinese-style modernization, with inherent advantages including people-centered empowerment through images, the "creative transformation and innovative development" of culture with Chinese characteristics, and industry drivers toward common prosperity. Around the five major features of Chinese-style modernization, the online audio-visual space reflects and shapes a vivid picture of Chinese-style modernization through rich cases and from multiple dimensions. The underlying generative logic is that online audio-visual media take on five roles: as a "leader" promoting mainstream values, a "recorder" building multimodal symbols of Chinese culture, a "narrator" evoking emotional resonance in the idiom of the times, a "participant" constructing diverse interactive social scenes, and a "driver" properly handling the relationship between technological empowerment and artistic practice. With their distinctive audio-visual artistry and the internet qualities of co-creation and sharing, online audio-visual media are becoming an important force in telling the story of Chinese-style modernization well.
Keywords: online audio-visual media, Chinese-style modernization, cultural production, daily life
A Privacy-Preserving Video Action Recognition Method Based on Audio-Visual Complementary Semantic Clarification
19
Authors: 李泽超, 付孝德, 潘礼勇, 严锐, 唐金辉. 《电子学报》, EI CAS CSCD, Peking University Core Journals, 2024, No. 7, pp. 2170-2182 (13 pages).
Video privacy protection is one of the important challenges facing society today, and blurring video is an important means of protecting people's privacy. Because blurred video naturally lacks visual-modality information, mainstream video action recognition algorithms cannot achieve satisfactory results on it. As a multimodal medium, however, blurred video carries not only visual information but also rich audio information; from the perspective of human cognition, audio is likewise an important source of information. This paper proposes a privacy-preserving video action recognition method based on multimodal fusion, which recognizes human actions without infringing on users' privacy. Specifically, an audio-visual feature fusion module integrates audio feature maps into the visual modality, fully fusing the deep semantic information of the two modalities. In addition, the model introduces clear video frames as labels to supervise the parameter updates of the action recognition network during training, providing clear semantic information for the privacy-video action recognition network. Extensive ablation and comparison experiments on several privacy-behavior datasets verify the effectiveness of the proposed method.
Keywords: audio-visual feature fusion, semantic clarification, privacy protection
An Audio-Visual Fusion Speech Separation Method Based on Dilated Convolution and Transformer
20
Authors: 刘宏清, 谢奇洲, 赵宇, 周翊. 《信号处理》, CSCD, Peking University Core Journals, 2024, No. 7, pp. 1208-1217 (10 pages).
To improve speech separation, visual signals can be used as auxiliary information in addition to the mixed speech signal. Multimodal modeling that fuses visual and audio signals has been shown to improve separation performance effectively, opening new possibilities for the speech separation task. To better capture long-term dependencies in visual and audio features and to strengthen the network's understanding of contextual input, this paper proposes a time-domain audio-visual fusion speech separation model based on one-dimensional dilated convolution and Transformer. Applying traditional frequency-domain audio-visual speech separation in the time domain avoids the information loss and phase reconstruction problems introduced by time-frequency transforms. The proposed architecture contains four modules: a visual feature extraction network that extracts lip embedding features from video frames; an audio encoder that converts the mixed speech into a feature representation; a multimodal separation network, composed mainly of an audio subnetwork, a video subnetwork, and a Transformer network, which separates speech using the visual and audio features; and an audio decoder that restores the separated features to clean speech. The experiments use a two-speaker mixture dataset generated from LRS2. The results show that the proposed network reaches 14.0 dB in scale-invariant signal-to-noise ratio improvement (SI-SNRi) and 14.3 dB in signal-to-distortion ratio improvement (SDRi), a clear performance gain over audio-only separation models and generic audio-visual fusion models.
Keywords: speech separation, audio-visual fusion, multi-head self-attention, dilated convolution
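The SI-SNRi metric cited above is the scale-invariant SNR of the separated output minus that of the unprocessed mixture. A minimal pure-Python sketch of the SI-SNR itself; the toy signals below are illustrative, not LRS2 data:

```python
import math

def si_snr(estimate, reference):
    """Scale-invariant signal-to-noise ratio in dB.

    The estimate's scale is removed by projecting it onto the reference,
    so multiplying the estimate by any nonzero constant leaves SI-SNR unchanged.
    """
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    target = [dot / ref_energy * r for r in reference]   # projection onto reference
    noise = [e - t for e, t in zip(estimate, target)]
    return 10 * math.log10(sum(t * t for t in target) / sum(n * n for n in noise))

reference = [math.sin(0.01 * n) for n in range(1000)]
estimate = [s + 0.01 * ((-1) ** n) for n, s in enumerate(reference)]  # small noise added

# Rescaling the estimate leaves SI-SNR unchanged -- hence "scale-invariant".
a = si_snr(estimate, reference)
b = si_snr([2 * e for e in estimate], reference)
```
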