Visual media have dominated sensory communications for decades,and the resulting“visual hegemony”leads to the call for the“auditory return”in order to achieve a holistic balance in cultural acceptance.Romance of t...Visual media have dominated sensory communications for decades,and the resulting“visual hegemony”leads to the call for the“auditory return”in order to achieve a holistic balance in cultural acceptance.Romance of the Three Kingdoms,a classic literary work in China,has received significant attention and promotion from leading audio platforms.However,the commercialization of digital audio publishing faces unprecedented challenges due to the mismatch between the dissemination of long-form content on digital audio platforms and the current trend of short and fast information reception.Drawing on the Business Model Canvas Theory and taking Romance of the Three Kingdoms as the main focus of analysis,this paper argues that the construction of a business model for the audio publishing of classical books should start from three aspects:the user evaluation of digital audio platforms,the establishment of value propositions based on the“creative transformation and innovative development”principle,and the improvement of the audio publishing infrastructure to ensure the healthy operation and development of the digital audio platforms and consequently improve their current state of development and expand the boundaries of cultural heritage.展开更多
Background Considerable research has been conducted in the areas of audio-driven virtual character gestures and facial animation with some degree of success.However,few methods exist for generating full-body animation...Background Considerable research has been conducted in the areas of audio-driven virtual character gestures and facial animation with some degree of success.However,few methods exist for generating full-body animations,and the portability of virtual character gestures and facial animations has not received sufficient attention.Methods Therefore,we propose a deep-learning-based audio-to-animation-and-blendshape(Audio2AB)network that generates gesture animations and ARK it's 52 facial expression parameter blendshape weights based on audio,audio-corresponding text,emotion labels,and semantic relevance labels to generate parametric data for full-body animations.This parameterization method can be used to drive full-body animations of virtual characters and improve their portability.In the experiment,we first downsampled the gesture and facial data to achieve the same temporal resolution for the input,output,and facial data.The Audio2AB network then encoded the audio,audio-corresponding text,emotion labels,and semantic relevance labels,and then fused the text,emotion labels,and semantic relevance labels into the audio to obtain better audio features.Finally,we established links between the body,gestures,and facial decoders and generated the corresponding animation sequences through our proposed GAN-GF loss function.Results By using audio,audio-corresponding text,and emotional and semantic relevance labels as input,the trained Audio2AB network could generate gesture animation data containing blendshape weights.Therefore,different 3D virtual character animations could be created through parameterization.Conclusions The experimental results showed that the proposed method could generate significant gestures and facial animations.展开更多
Depression is a common mental health disorder.With current depression detection methods,specialized physicians often engage in conversations and physiological examinations based on standardized scales as auxiliary mea...Depression is a common mental health disorder.With current depression detection methods,specialized physicians often engage in conversations and physiological examinations based on standardized scales as auxiliary measures for depression assessment.Non-biological markers-typically classified as verbal or non-verbal and deemed crucial evaluation criteria for depression-have not been effectively utilized.Specialized physicians usually require extensive training and experience to capture changes in these features.Advancements in deep learning technology have provided technical support for capturing non-biological markers.Several researchers have proposed automatic depression estimation(ADE)systems based on sounds and videos to assist physicians in capturing these features and conducting depression screening.This article summarizes commonly used public datasets and recent research on audio-and video-based ADE based on three perspectives:Datasets,deficiencies in existing research,and future development directions.展开更多
Steganography techniques,such as audio steganography,have been widely used in covert communication.However,the deep neural network,especially the convolutional neural network(CNN),has greatly threatened the security o...Steganography techniques,such as audio steganography,have been widely used in covert communication.However,the deep neural network,especially the convolutional neural network(CNN),has greatly threatened the security of audio steganography.Besides,existing adversarial attacks-based countermeasures cannot provide general perturbation,and the trans-ferability against unknown steganography detection methods is weak.This paper proposes a cover enhancement method for audio steganography based on universal adversarial perturbations with sample diversification to address these issues.Universal adversarial perturbation is constructed by iteratively optimizing adversarial perturbation,which applies adversarial attack tech-niques,such as Deepfool.Moreover,the sample diversification strategy is designed to improve the transferability of adversarial perturbations in black-box attack scenarios,where two types of common audio-processing operations are considered,including noise addition and moving picture experts group audio layer III(MP3)compression.Furthermore,the perturbation ensemble method is applied to further improve the attacks’transferability by integrating perturbations of different detection networks with heterogeneous architec-tures.Consequently,the single universal adversarial perturbation can enhance different cover audios against a CNN-based detection network.Extensive experiments have been conducted,and the results demonstrate that the average missed-detection probabilities of the proposed method are higher than those of the state-of-the-art methods by 7.3%and 16.6%for known and unknown detection networks,respectively.It verifies the efficiency and transferability of the proposed methods for the cover enhancement of audio steganography.展开更多
基金This study is a phased achievement of the“Research on Innovative Communication of Romance of the Three Kingdoms under Audio Empowerment”project(No.23ZGL16)funded by Zhuge Liang Research Center,a key research base of social sciences in Sichuan Province.
文摘Visual media have dominated sensory communications for decades,and the resulting“visual hegemony”leads to the call for the“auditory return”in order to achieve a holistic balance in cultural acceptance.Romance of the Three Kingdoms,a classic literary work in China,has received significant attention and promotion from leading audio platforms.However,the commercialization of digital audio publishing faces unprecedented challenges due to the mismatch between the dissemination of long-form content on digital audio platforms and the current trend of short and fast information reception.Drawing on the Business Model Canvas Theory and taking Romance of the Three Kingdoms as the main focus of analysis,this paper argues that the construction of a business model for the audio publishing of classical books should start from three aspects:the user evaluation of digital audio platforms,the establishment of value propositions based on the“creative transformation and innovative development”principle,and the improvement of the audio publishing infrastructure to ensure the healthy operation and development of the digital audio platforms and consequently improve their current state of development and expand the boundaries of cultural heritage.
基金Supported by the National Natural Science Foundation of China (62277014)the National Key Research and Development Program of China (2020YFC1523100)the Fundamental Research Funds for the Central Universities of China (PA2023GDSK0047)。
文摘Background Considerable research has been conducted in the areas of audio-driven virtual character gestures and facial animation with some degree of success.However,few methods exist for generating full-body animations,and the portability of virtual character gestures and facial animations has not received sufficient attention.Methods Therefore,we propose a deep-learning-based audio-to-animation-and-blendshape(Audio2AB)network that generates gesture animations and ARK it's 52 facial expression parameter blendshape weights based on audio,audio-corresponding text,emotion labels,and semantic relevance labels to generate parametric data for full-body animations.This parameterization method can be used to drive full-body animations of virtual characters and improve their portability.In the experiment,we first downsampled the gesture and facial data to achieve the same temporal resolution for the input,output,and facial data.The Audio2AB network then encoded the audio,audio-corresponding text,emotion labels,and semantic relevance labels,and then fused the text,emotion labels,and semantic relevance labels into the audio to obtain better audio features.Finally,we established links between the body,gestures,and facial decoders and generated the corresponding animation sequences through our proposed GAN-GF loss function.Results By using audio,audio-corresponding text,and emotional and semantic relevance labels as input,the trained Audio2AB network could generate gesture animation data containing blendshape weights.Therefore,different 3D virtual character animations could be created through parameterization.Conclusions The experimental results showed that the proposed method could generate significant gestures and facial animations.
基金Supported by Shandong Province Key R and D Program,No.2021SFGC0504Shandong Provincial Natural Science Foundation,No.ZR2021MF079Science and Technology Development Plan of Jinan(Clinical Medicine Science and Technology Innovation Plan),No.202225054.
文摘Depression is a common mental health disorder.With current depression detection methods,specialized physicians often engage in conversations and physiological examinations based on standardized scales as auxiliary measures for depression assessment.Non-biological markers-typically classified as verbal or non-verbal and deemed crucial evaluation criteria for depression-have not been effectively utilized.Specialized physicians usually require extensive training and experience to capture changes in these features.Advancements in deep learning technology have provided technical support for capturing non-biological markers.Several researchers have proposed automatic depression estimation(ADE)systems based on sounds and videos to assist physicians in capturing these features and conducting depression screening.This article summarizes commonly used public datasets and recent research on audio-and video-based ADE based on three perspectives:Datasets,deficiencies in existing research,and future development directions.
基金supported by the National Natural Science Foundation of China(61902263)the National Key Research and Development Program of China(2018YFB0804103).
文摘Steganography techniques,such as audio steganography,have been widely used in covert communication.However,the deep neural network,especially the convolutional neural network(CNN),has greatly threatened the security of audio steganography.Besides,existing adversarial attacks-based countermeasures cannot provide general perturbation,and the trans-ferability against unknown steganography detection methods is weak.This paper proposes a cover enhancement method for audio steganography based on universal adversarial perturbations with sample diversification to address these issues.Universal adversarial perturbation is constructed by iteratively optimizing adversarial perturbation,which applies adversarial attack tech-niques,such as Deepfool.Moreover,the sample diversification strategy is designed to improve the transferability of adversarial perturbations in black-box attack scenarios,where two types of common audio-processing operations are considered,including noise addition and moving picture experts group audio layer III(MP3)compression.Furthermore,the perturbation ensemble method is applied to further improve the attacks’transferability by integrating perturbations of different detection networks with heterogeneous architec-tures.Consequently,the single universal adversarial perturbation can enhance different cover audios against a CNN-based detection network.Extensive experiments have been conducted,and the results demonstrate that the average missed-detection probabilities of the proposed method are higher than those of the state-of-the-art methods by 7.3%and 16.6%for known and unknown detection networks,respectively.It verifies the efficiency and transferability of the proposed methods for the cover enhancement of audio steganography.