This paper proposes an artificial intelligence-based robust information hiding algorithm to address the susceptibility of confidential information to noise attacks during transmission. The algorithm aims to mitigate the impact of various noise attacks on the integrity of secret information during transmission. The method encodes secret images into stylized encrypted images and applies adversarial transfer to both the style and content features of the original and embedded data. This process enhances the concealment and imperceptibility of confidential information, thereby improving its security during transmission. Furthermore, we have designed a specialized attack layer to simulate real-world attacks and common noise scenarios encountered in practical environments. Through adversarial training, the algorithm is strengthened against attacks, improving its overall robustness. Experimental results demonstrate that the proposed algorithm enhances the concealment and unknowability of secret information while maintaining embedding capacity. Additionally, it preserves the quality and fidelity of the stego image. The method not only improves the security and robustness of information hiding technology but also has practical application value in protecting sensitive data and ensuring the invisibility of confidential information.
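The attack layer described above can be pictured as a function that randomly distorts the stego image before it reaches the decoder during training. The sketch below is only an illustration under assumptions: the abstract does not list the attacks used, so the three distortions, their parameters, and all function names are hypothetical, and the image is treated as a flat list of pixel values in [0, 1].

```python
import random


def gaussian_noise(pixels, rng, sigma=0.05):
    """Add zero-mean Gaussian noise and clamp back to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


def salt_and_pepper(pixels, rng, prob=0.05):
    """Set a fraction `prob` of pixels to 0 or 1 with equal chance."""
    out = []
    for p in pixels:
        u = rng.random()
        out.append(0.0 if u < prob / 2 else 1.0 if u < prob else p)
    return out


def pixel_dropout(pixels, rng, prob=0.1):
    """Zero out each pixel independently with probability `prob`."""
    return [0.0 if rng.random() < prob else p for p in pixels]


# Hypothetical attack pool; the paper's actual attack set is not given here.
ATTACKS = [gaussian_noise, salt_and_pepper, pixel_dropout]


def attack_layer(pixels, rng):
    """Apply one randomly chosen distortion to the stego image."""
    return rng.choice(ATTACKS)(pixels, rng)
```

In adversarial training, the decoder would be asked to recover the secret from `attack_layer(stego, rng)` rather than from the clean stego image, so that it learns to tolerate these distortions.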
Traditional information hiding techniques work by modifying carrier data, which can easily leave traces detectable by steganalysis tools. In image transmission especially, both geometric and non-geometric attacks can cause subtle changes in the image pixels. To overcome these challenges, we propose a constructive robust image steganography technique based on style transformation. Unlike traditional steganography, our algorithm does not directly modify the carrier data. In this study, we constructed a mapping dictionary by setting the correspondence between binary codes and image categories and then used the mapping dictionary to map secret information to secret images. Through image semantic segmentation and style transfer techniques, we combined the style of secret images with the content of public images to generate stego images. This type of stego image can resist interference during public channel transmission, ensuring the secure transmission of information. At the receiving end, we input the stego image into a trained secret image reconstruction network, which reconstructs the original secret image; the secret information is then recovered through the mapping dictionary, ensuring secure, accurate, and efficient decoding. The experimental results show that this constructive information hiding method based on style transfer improves the security of information hiding and enhances the robustness of the algorithm to various attacks.
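The mapping-dictionary step can be sketched in a few lines. This is a minimal illustration, assuming fixed-length binary codes and a category count that is a power of two; the category names and function names are hypothetical, and the style-transfer and reconstruction networks are left out entirely (a category here stands in for the secret image drawn from it).

```python
from itertools import product

# Hypothetical category names; the paper's actual image categories are not given here.
CATEGORIES = ["cat", "dog", "car", "tree"]


def build_mapping(categories):
    """Map each fixed-length binary code to one image category."""
    if len(categories) < 2:
        raise ValueError("need at least two categories")
    bits = (len(categories) - 1).bit_length()
    if 2 ** bits != len(categories):
        raise ValueError("number of categories must be a power of two")
    codes = ["".join(p) for p in product("01", repeat=bits)]
    return dict(zip(codes, categories)), bits


def encode(bitstring, mapping, bits):
    """Split the secret bitstring into codes and look up a category for each."""
    if len(bitstring) % bits:
        raise ValueError("bitstring length must be a multiple of the code length")
    return [mapping[bitstring[i:i + bits]] for i in range(0, len(bitstring), bits)]


def decode(categories, mapping):
    """Invert the dictionary to recover the bitstring from recognised categories."""
    inverse = {category: code for code, category in mapping.items()}
    return "".join(inverse[c] for c in categories)
```

With four categories each secret image carries two bits, so the bitstring "0111" maps to the category sequence dog, tree and decodes back to "0111".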
Arbitrary style transfer aims to perceptually reflect the style of a reference image in artistic creations with visual aesthetics. Traditional style transfer models, particularly those using adaptive instance normalization (AdaIN) layers, rely on global statistics, which often fail to capture the spatially local color distribution, leading to outputs that lack variation despite geometric transformations. To address this, we introduce Patchified AdaIN, a color-inspired style transfer method that applies AdaIN to localized patches, utilizing local statistics to capture the spatial color distribution of the reference image. This approach enables enhanced color awareness in style transfer, adapting dynamically to geometric transformations by leveraging local image statistics. Since Patchified AdaIN builds on AdaIN, it integrates seamlessly into existing frameworks without the need for additional training, allowing users to control the output quality through adjustable blending parameters. Our comprehensive experiments demonstrate that Patchified AdaIN can reflect geometric transformations (e.g., translation, rotation, flipping) of images for style transfer, thereby achieving superior results compared to state-of-the-art methods. Additional experiments show the compatibility of Patchified AdaIN for integration into existing networks to enable spatial color-aware arbitrary style transfer by replacing the conventional AdaIN layer with the Patchified AdaIN layer.
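The difference between global and patch-wise statistics can be shown on a toy feature map. The sketch below is not the authors' implementation: it assumes a single-channel 2-D map stored as nested lists, equal-size content and style maps, a regular grid of square patches, and it omits the blending parameters mentioned above. AdaIN itself is the standard formula, out = std_s * (x - mean_c) / std_c + mean_s.

```python
from statistics import fmean, pstdev

EPS = 1e-5  # avoids division by zero on flat content patches


def adain(content, style):
    """AdaIN on flat lists: give `content` the mean and std of `style`."""
    mu_c, sd_c = fmean(content), pstdev(content)
    mu_s, sd_s = fmean(style), pstdev(style)
    return [sd_s * (x - mu_c) / (sd_c + EPS) + mu_s for x in content]


def patchified_adain(content, style, patch):
    """Apply AdaIN independently on each patch x patch tile of two equal-size maps."""
    h, w = len(content), len(content[0])
    out = [row[:] for row in content]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            cells = [(r, c)
                     for r in range(top, min(top + patch, h))
                     for c in range(left, min(left + patch, w))]
            c_vals = [content[r][c] for r, c in cells]
            s_vals = [style[r][c] for r, c in cells]
            # Statistics come from the matching style tile, not the whole image.
            for (r, c), v in zip(cells, adain(c_vals, s_vals)):
                out[r][c] = v
    return out
```

When the patch covers the whole map this reduces to ordinary AdaIN, which ignores where in the style image a color occurs; with smaller patches a style image that is bright on the left and dark on the right produces an output with the same left-to-right layout.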
The objective of style transfer is to maintain the content of an image while transferring the style of another image. However, conventional methods face challenges in preserving facial features, especially in Korean portraits where elements like the “Gat” (a traditional Korean hat) are prevalent. This paper proposes a deep learning network designed to perform style transfer that includes the “Gat” while preserving the identity of the face. Unlike traditional style transfer techniques, the proposed method aims to preserve the texture, attire, and the “Gat” in the style image by employing image sharpening and facial landmarks together with a GAN. The color, texture, and intensity were extracted differently based on the characteristics of each block and layer of the pre-trained VGG-16, and only the necessary elements during training were preserved using a facial landmark mask. The head area was presented using the eyebrow area to transfer the “Gat”. Furthermore, the identity of the face was retained, and style correlation was considered based on the Gram matrix. To evaluate performance, we introduced a metric using PSNR and SSIM, with an emphasis on median values through new weightings for style transfer in Korean portraits. Additionally, we conducted a survey that evaluated the content, style, and naturalness of the transferred results; based on this assessment, we conclude that our method preserves the integrity of content better than previous research. Our approach, enriched by landmark preservation and diverse loss functions, including those related to the “Gat”, outperformed previous research in facial identity preservation.
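The Gram-matrix style correlation mentioned above can be sketched as follows. This is a generic illustration rather than the paper's loss: the feature map is assumed to be a list of channels, each flattened to a list of activations, and the Gram matrix is normalized by the number of spatial positions, which is one common convention among several.

```python
def gram_matrix(features):
    """Channel-by-channel correlation of a feature map.

    `features` is a list of channels, each a flat list of activations.
    Entry (i, j) is the dot product of channels i and j divided by the
    number of spatial positions (a normalization choice assumed here).
    """
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def style_loss(feat_a, feat_b):
    """Mean squared difference between the Gram matrices of two feature maps."""
    ga, gb = gram_matrix(feat_a), gram_matrix(feat_b)
    c = len(ga)
    return sum((ga[i][j] - gb[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)
```

Because the Gram matrix sums over positions, it records which channels fire together but not where, which is why it is used as a style descriptor while spatial detail is handled by separate terms such as the landmark mask.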
In recent years, deep generative models have been successfully applied to artistic painting style transfer (APST). The main difficulties lie in the loss of spatial detail during reconstruction and the inefficient model convergence caused by the irreversible encoder-decoder methodology of existing models. To address this, this paper proposes a Flow-based architecture in which the encoder and decoder share a reversible network configuration. The proposed APST-Flow can efficiently reduce model uncertainty via a compact analysis-synthesis methodology, thereby improving generalization performance and convergence stability. For the generator, a Flow-based network using Wavelet additive coupling (WAC) layers is implemented to extract multi-scale content features. In addition, a style checker is used to enhance global style consistency by minimizing the error between the reconstructed and the input images. To enhance the generated salient details, an adaptive stroke edge loss is applied in both global and local model training. The experimental results show that the proposed method improves PSNR by 5% and SSIM by 6.2%, and decreases Style Error by 29.4%, over the existing models on the ChipPhi set. These competitive results verify that APST-Flow achieves high-quality generation with less content deviation and enhanced generalization, and can therefore be applied to more APST scenes.
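The reversibility that lets the encoder and decoder share one network comes from the coupling layers. The sketch below shows a plain additive coupling layer only; the wavelet part of the paper's WAC layer is not reproduced, and the function names and the toy shift function are assumptions made for illustration.

```python
def coupling_forward(x, shift_fn):
    """Additive coupling: keep the first half, shift the second half by t(first half)."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    shift = shift_fn(x1)
    return x1 + [b + s for b, s in zip(x2, shift)]


def coupling_inverse(y, shift_fn):
    """Exact inverse: the first half is unchanged, so the same shift can be subtracted."""
    if len(y) % 2:
        raise ValueError("input length must be even")
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    shift = shift_fn(y1)
    return y1 + [b - s for b, s in zip(y2, shift)]


def toy_shift(x1):
    """Stand-in for a learned network; it never needs to be inverted itself."""
    return [3 * v + 1 for v in x1]
```

Running `coupling_forward` and then `coupling_inverse` with the same shift function returns the original input, so decoding is the encoder run backwards and no separate decoder has to be trained.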
With the advent of deep learning, self-driving schemes based on deep learning are becoming increasingly popular. Robust perception-action models should learn from data covering different scenarios and real behaviors, while current end-to-end model learning is generally limited to training on massive data, innovating on deep network architectures, and learning in-situ models in a simulation environment. Therefore, we introduce a new image style transfer method into data augmentation, improving the diversity of limited data by changing the texture, contrast ratio, and color of the image, and thereby extending it to scenarios that the model has not observed before. Inspired by rapid style transfer and artistic style neural algorithms, we propose an arbitrary style generation network architecture comprising a style transfer network, a style learning network, a style loss network, and a multivariate Gaussian distribution function. The style embedding vector is randomly sampled from the multivariate Gaussian distribution and linearly interpolated with the embedding vector that the style learning network predicts for the input image; the result provides a set of normalization constants for the style transfer network and thus diversifies the image style. To verify the effectiveness of the method, image classification and simulation experiments were performed separately. Finally, we built a small-sized smart car experiment platform and applied data augmentation based on image style transfer to an automatic driving experiment for the first time. The experimental results show that: (1) the proposed scheme can improve the prediction accuracy of the end-to-end model and reduce the model’s error accumulation; (2) the method based on image style transfer provides a new scheme for data augmentation and also offers a solution to the high cost incurred by deep models that rely heavily on large amounts of labeled data.
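The sampling-and-interpolation step for the style embedding can be sketched as below. This is an illustration under assumptions: the Gaussian is taken to have a diagonal covariance so that each dimension can be sampled independently, and the function names and the mixing weight `alpha` are hypothetical, since the abstract does not state how the interpolation is weighted.

```python
import random


def sample_style_embedding(mean, std, rng):
    """Draw one embedding from a Gaussian with diagonal covariance (assumed here)."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def interpolate(sampled, predicted, alpha):
    """Linear interpolation: alpha=0 keeps the predicted style, alpha=1 the random one."""
    return [alpha * s + (1.0 - alpha) * p for s, p in zip(sampled, predicted)]


def augmented_style(predicted, mean, std, alpha, rng):
    """Embedding that would parameterize the style transfer network's normalization."""
    return interpolate(sample_style_embedding(mean, std, rng), predicted, alpha)
```

Drawing a fresh sample for every training image gives each copy a different texture and color rendering while `alpha` controls how far it drifts from the image's own predicted style.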
The complex geometric features of subsurface fractures at different scales make mesh generation challenging and/or expensive. In this paper, we make use of neural style transfer (NST), a machine learning technique, to generate meshes from rock fracture images. In this new approach, we use digital rock fractures at multiple scales to represent 'content' and define uniformly shaped and sized triangles to represent 'style'. The 19-layer convolutional neural network (CNN) learns the content from the rock image, including lower-level features (such as edges and corners) and higher-level features (such as rock, fractures, or other mineral fillings), and learns the style from the triangular grids. By optimizing a cost function that approximates both the content and the style, numerical meshes can be generated and optimized. We utilize NST to generate meshes for rough fractures with asperities formed in rock, a network of fractures embedded in rock, and a sand aggregate with multiple grains. Based on these examples, we show that this new NST technique can make mesh generation and optimization much more efficient by achieving a good balance between the density of the mesh and the representation of the geometric features. Finally, we discuss future applications of this approach and perspectives on applying machine learning to bridge the gaps between numerical modeling and experiments.
The technology for image-to-image style transfer (a prevalent image processing task) has developed rapidly. The purpose of style transfer is to extract a texture from the source image domain and transfer it to the target image domain using a deep neural network. However, existing methods typically have a large computational cost. To achieve efficient style transfer, we introduce a novel Ghost module into the GANILLA architecture to produce more feature maps from cheap operations. We then utilize an attention mechanism to transform images with various styles. We optimize the original generative adversarial network (GAN) by using more efficient calculation methods for image-to-illustration translation. The experimental results show that the output of our proposed method is consistent with human visual perception and still maintains the quality of the image. Moreover, our proposed method reduces the high computational cost and resource consumption of style transfer. In both subjective and objective evaluation indicators, our proposed method shows superior performance over existing methods.
The performance and accuracy of computer vision systems are affected by noise in different forms. Although numerous solutions and algorithms have been presented for dealing with each type of noise, a comprehensive technique that can cover all the diverse noises and mitigate their damaging effects on the performance and precision of various systems is still missing. In this paper, we focus on the stability and robustness of one branch of computer vision, visual object tracking. We demonstrate that, without imposing a heavy computational load on a model or changing its algorithms, the drop in performance and accuracy that a system suffers when exposed to an unseen noise-laden test dataset can be prevented by simply applying style transfer to the training dataset and training the model on a combination of the stylized data and the original unmodified data. To verify the proposed approach, it is applied to a generic object tracker using regression networks. The method’s validity is confirmed by testing it on an exclusive benchmark comprising 50 image sequences, with each sequence containing 15 types of noise at five different intensity levels. The OPE curves obtained show a 40% increase in the robustness of the proposed object tracker against noise, compared to the other trackers considered.
In current artistic and animation creation, converting a sketch into a stylized image involves many repeated manual operations. This paper presents a solution based on a deep learning framework to realize image generation and style transfer. The method first uses a conditional generative adversarial network, optimizing the loss function used to train the mapping relationship, to generate the actual image from the input sketch. Then, by defining and optimizing the perceptual loss function of the style transfer model, style features are extracted from the image, enabling conversion between actual images and stylized art images. Experiments show that this method can greatly reduce the work of coloring and of converting between different artistic effects, and achieves the goal of transforming simple stick figures into actual object images.
Regional facial image synthesis conditioned on a semantic mask has attracted great attention in the field of computational visual media. However, the appearances of different regions may be inconsistent with each other after performing regional editing. In this paper, we focus on harmonized regional style transfer for facial images. A multi-scale encoder is proposed for accurate style code extraction. The key part of our work is a multi-region style attention module. It adapts multiple regional style embeddings from a reference image to a target image to generate a harmonious result. We also propose style mapping networks for multi-modal style synthesis. We further employ an invertible flow model, which can serve as a mapping network, to fine-tune the style code by inverting the code to latent space. Experiments on three widely used face datasets evaluate our model by transferring regional facial appearance between datasets. The results show that our model can reliably perform style transfer and multimodal manipulation, generating output comparable to the state of the art.
Most of the existing virtual scenarios built for the digital protection of Chinese classical private gardens are too modern in expression style to show the aesthetic significance of their historical period. Considering the aesthetic commonality between traditional Chinese landscape paintings and classical private gardens, and drawing on image style transfer, a deep neural network was proposed to transfer the aesthetic style from landscape paintings to the virtual scenario of classical private gardens. The network consisted of two parts: style prediction and style transfer. The style prediction network was used to obtain a style representation from style paintings, and the style transfer network was used to transfer the style representation to the content scenario. The pre-trained network was then embedded into the scenario rendering pipeline and combined with the screen post-processing method to realise the stylised expression of the virtual scenario. To verify the feasibility of this methodology, a virtual scenario of the Humble Administrator’s Garden was used as the content scenario, and five garden landscape paintings from different time periods and painting styles were selected for the case study. The results demonstrated that this methodology could effectively achieve the aesthetic style transfer of a virtual scenario.
In recent years, speech synthesis systems have allowed for the production of very high-quality voices. Therefore, research in this domain is now turning to the problem of integrating emotions into speech. However, constructing a separate speech synthesizer for each emotion has some limitations. First, this method often requires an emotional-speech data set with many sentences. Such data sets are very time-intensive and labor-intensive to complete. Second, training each of these models requires computers with large computational capabilities and a lot of effort and time for model tuning. In addition, the model for each emotion fails to take advantage of the data sets of other emotions. In this paper, we propose a new method to synthesize emotional speech in which the latent expressions of emotions are learned from a small data set of professional actors through a Flowtron model. In addition, we provide a new method to build a speech corpus that is scalable and whose quality is easy to control. Next, to produce a high-quality speech synthesis model, we used this data set to train the Tacotron 2 model, which we then used as a pre-trained model to train the Flowtron model. We applied this method to synthesize Vietnamese speech with sadness and happiness. Mean opinion score (MOS) assessment results show an MOS of 3.61 for sadness and 3.95 for happiness. In conclusion, the proposed method proves effective, offering a high degree of automation and fast emotional sentence generation while using a small emotional-speech data set.
Vision Transformer has shown impressive performance on image classification tasks. Given that most existing visual style transfer (VST) algorithms are based on the texture-biased convolutional neural network (CNN), this raises the question of whether the shape-biased Vision Transformer can perform style transfer as a CNN does. In this work, we focus on comparing and analyzing the shape bias of CNN- and transformer-based models from the viewpoint of VST tasks. For comprehensive comparisons, we propose three kinds of transformer-based visual style transfer (Tr-VST) methods (Tr-NST for optimization-based VST, Tr-WCT for reconstruction-based VST, and Tr-AdaIN for perceptual-based VST). By engaging three mainstream VST methods in the transformer pipeline, we show that transformer-based models pre-trained on ImageNet are not suitable for style transfer methods. Due to the strong shape bias of the transformer-based models, these Tr-VST methods cannot render style patterns. We further analyze the shape bias by considering the influence of the learned parameters and the structure design. Results show that with proper style supervision, the transformer can learn texture-biased features similar to those of a CNN. With the reduced shape bias in the transformer encoder, Tr-VST methods can generate higher-quality results compared with state-of-the-art VST methods.
Visual illustration transformation from real-world to cartoon images is a well-known and challenging task in computer vision. Image-to-image translation from real-world to cartoon domains poses issues such as a lack of paired training samples, poor image translation, weak feature extraction from the source domain images, and low-quality image translation from traditional generator algorithms. To solve these issues, a model independent of paired samples, a high-quality dataset, a Bayesian-based feature extractor, and an improved generator are needed. In this study, we propose a high-quality dataset to reduce the effect of paired training samples on the model’s performance. We use a Bayesian Very Deep Convolutional Network (VGG)-based feature extractor to improve on the performance of the standard feature extractor, because Bayesian inference regularizes weights well. The generator from the Cartoon Generative Adversarial Network (GAN) is modified by introducing a depthwise convolution layer and a channel attention mechanism to improve the performance of the original generator. We used the Fréchet inception distance (FID) score and a user preference score to evaluate the performance of the model. The FID scores obtained for the generated cartoon and real-world images are 107 and 76 for the TCC style, and 137 and 57 for the Hayao style, respectively. A user preference score was also calculated to evaluate the quality of the generated images, and our proposed model obtained a higher preference score than other models. The model produced high-quality cartoon images, demonstrating its effectiveness in transferring style between authentic images and cartoon images.
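To address the classification errors that arise when manual clothing-style classification is affected by subjectivity, region, and other factors, this work studies an artificial intelligence-based method for classifying clothing-style images. First, duplicate or invalid images are filtered out of the FashionStyle14 dataset to build a clothing-style image dataset. Then, transfer learning is used to fine-tune models such as EfficientNet V2, RegNet Y 16GF, and ViT Large 16, producing new models that each classify clothing-style images as a single deep learning model. Finally, to further improve the accuracy, reliability, and robustness of image classification, ensemble learning methods based on voting, weighted averaging, and stacking are each used to combine the predictions of these individual models. The transfer learning experiments show that the deep learning model based on ViT Large 16 performs best on the test set, with an average accuracy of 77.024%; the ensemble learning experiments show that the voting-based ensemble reaches an average accuracy of 78.833% on the same test set. The results offer a new approach to the clothing-style classification problem.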
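Two of the ensemble schemes named above, voting and weighted averaging, are simple enough to sketch directly. This is a generic illustration, not the study's code: the label names, the weights, and the tie-breaking rule (the label seen first, i.e. from the earliest-listed model, wins) are assumptions, and stacking is omitted because it requires training a second-level model.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine hard labels.

    `predictions` holds one list of labels per model, all in the same
    sample order. Ties go to the label that appears first among the models.
    """
    voted = []
    for sample_preds in zip(*predictions):
        voted.append(Counter(sample_preds).most_common(1)[0][0])
    return voted


def weighted_average(prob_lists, weights):
    """Combine class-probability vectors for one sample using per-model weights."""
    total = sum(weights)
    n_classes = len(prob_lists[0])
    return [sum(w * p[k] for w, p in zip(weights, prob_lists)) / total
            for k in range(n_classes)]
```

For example, if three fine-tuned models predict the hypothetical labels retro, retro, and rock for the same image, `majority_vote` returns retro; with `weighted_average`, a model given weight 3 pulls the combined probabilities three times as hard as a model given weight 1.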
Funding (AI-based robust information hiding algorithm): National Natural Science Foundation of China (Nos. 62272478, 61872384, 62172436); Natural Science Foundation of Shanxi Province (No. 2023-JC-YB-584); Engineering University of PAP’s Funding for Scientific Research Innovation Team; Engineering University of PAP’s Funding for Key Researcher (No. KYGG202011).
Funding (constructive robust image steganography based on style transformation): National Natural Science Foundation of China (Nos. 62272478, 61872384, 62172436, 62102451); Natural Science Foundation of Shanxi Province (No. 2023-JC-YB-584); Engineering University of PAP’s Funding for Key Researcher (No. KYGG202011).
Funding (Patchified AdaIN): Supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2022R1A2C1004657, Contribution Rate: 50%) and by the Culture, Sports and Tourism R&D Program through the Korea Creative Content Agency grant funded by the Ministry of Culture, Sports and Tourism in 2024 (Project Name: Developing Professionals for R&D in Contents Production Based on Generative AI and Cloud, Project Number: RS-2024-00352578, Contribution Rate: 50%).
Funding (style transfer for Korean portraits): Supported by the Metaverse Lab Program funded by the Ministry of Science and ICT (MSIT) and the Korea Radio Promotion Association (RAPA).
Funding: Support from the National Natural Science Foundation of China (62062048).
Abstract: In recent years, deep generative models have been successfully applied to artistic painting style transfer (APST). The remaining difficulties lie in the loss of spatial detail during reconstruction and the inefficient model convergence caused by the irreversible encoder-decoder methodology of existing models. To address this, this paper proposes a Flow-based architecture in which the encoder and decoder share a reversible network configuration. The proposed APST-Flow can efficiently reduce model uncertainty via a compact analysis-synthesis methodology, thereby improving generalization performance and convergence stability. For the generator, a Flow-based network using Wavelet additive coupling (WAC) layers is implemented to extract multi-scale content features. In addition, a style checker is used to enhance global style consistency by minimizing the error between the reconstructed and input images. To enhance the generated salient details, an adaptive stroke edge loss is applied in both global and local model training. The experimental results show that the proposed method improves PSNR by 5% and SSIM by 6.2%, and decreases Style Error by 29.4%, over the existing models on the ChipPhi set. The competitive results verify that APST-Flow achieves high-quality generation with less content deviation and enhanced generalization, and can therefore be applied to more APST scenes.
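The reversibility that this abstract relies on can be seen in a plain additive coupling layer, sketched below. This is a generic additive coupling, not the paper's Wavelet additive coupling (WAC) layer: the wavelet step is omitted, and `shift` is a hypothetical fixed function standing in for a learned sub-network.

```python
import numpy as np

def shift(x1):
    """Hypothetical stand-in for a learned network; any function works."""
    return 2.0 * np.tanh(x1) + 0.5

def coupling_forward(x):
    """Split the last axis in half; shift the second half using the first."""
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([x1, x2 + shift(x1)], axis=-1)

def coupling_inverse(y):
    """Exact inverse: the first half is unchanged, so the shift is recomputable."""
    y1, y2 = np.split(y, 2, axis=-1)
    return np.concatenate([y1, y2 - shift(y1)], axis=-1)
```

Because the inverse is exact regardless of what `shift` computes, the same parameters can serve as both encoder and decoder, which is the property a shared reversible configuration depends on.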
Funding: The National Natural Science Foundation of China (51965008); Science and Technology Projects of Guizhou ([2018]2168); Excellent Young Researcher Project of Guizhou ([2017]5630).
Abstract: With the advent of deep learning, self-driving schemes based on deep learning are becoming increasingly popular. Robust perception-action models should learn from data covering different scenarios and real behaviors, whereas current end-to-end model learning is generally limited to training on massive data, innovating deep network architectures, and learning in-situ models in a simulation environment. Therefore, we introduce a new image style transfer method into data augmentation, improving the diversity of limited data by changing the texture, contrast ratio, and color of the image, and thereby extending the data to scenarios the model has not observed before. Inspired by fast style transfer and neural algorithms of artistic style, we propose an arbitrary style generation network architecture comprising a style transfer network, a style learning network, a style loss network, and a multivariate Gaussian distribution function. The style embedding vector is randomly sampled from the multivariate Gaussian distribution and linearly interpolated with the embedding vector that the style learning network predicts for the input image; the result provides a set of normalization constants for the style transfer network, which yields diverse image styles. To verify the effectiveness of the method, image classification and simulation experiments were performed separately. Finally, we built a small-sized smart car experiment platform and, for the first time, applied data augmentation based on image style transfer to an autonomous driving experiment. The experimental results show that (1) the proposed scheme can improve the prediction accuracy of the end-to-end model and reduce the model’s error accumulation; and (2) the method based on image style transfer provides a new scheme for data augmentation and also offers a solution to the high cost incurred by deep models that rely heavily on large amounts of labeled data.
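The sampling-and-interpolation step described in this abstract can be sketched as follows. The interpolation weight `alpha`, the function name, and the use of a zero-mean, identity-covariance Gaussian in the usage lines are assumptions for illustration; the abstract does not give the distribution parameters or the weight.

```python
import numpy as np

def randomized_style_embedding(predicted, mean, cov, alpha, rng):
    """Blend a Gaussian-sampled embedding with the predicted embedding.

    alpha = 0 keeps the predicted embedding unchanged;
    alpha = 1 uses only the random sample.
    """
    sampled = rng.multivariate_normal(mean, cov)
    return alpha * sampled + (1.0 - alpha) * predicted

# Usage with hypothetical sizes: an 8-dimensional style embedding.
rng = np.random.default_rng(0)
predicted = np.ones(8)
embedding = randomized_style_embedding(predicted, np.zeros(8), np.eye(8), 0.5, rng)
```

In the architecture the abstract describes, a vector like `embedding` would then supply the normalization constants for the style transfer network.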
Funding: Supported by Laboratory Directed Research and Development (LDRD) funding from Berkeley Laboratory and by the US Department of Energy (DOE), including the Office of Basic Energy Sciences, Chemical Sciences, Geosciences, and Biosciences Division and the Office of Nuclear Energy, Spent Fuel and Waste Disposition Campaign, both under Contract No. DE-AC02-05CH11231 with Berkeley Laboratory.
Abstract: The complex geometric features of subsurface fractures at different scales make mesh generation challenging and/or expensive. In this paper, we use neural style transfer (NST), a machine learning technique, to generate meshes from rock fracture images. In this new approach, we use digital rock fractures at multiple scales to represent 'content' and define uniformly shaped and sized triangles to represent 'style'. The 19-layer convolutional neural network (CNN) learns the content from the rock image, including lower-level features (such as edges and corners) and higher-level features (such as rock, fractures, or other mineral fillings), and learns the style from the triangular grids. By optimizing the cost function so that the output approximates both the content and the style, numerical meshes can be generated and optimized. We use NST to generate meshes for rough fractures with asperities formed in rock, a network of fractures embedded in rock, and a sand aggregate with multiple grains. Based on these examples, we show that this new NST technique can make mesh generation and optimization much more efficient by achieving a good balance between mesh density and the representation of geometric features. Finally, we discuss future applications of this approach and perspectives on applying machine learning to bridge the gaps between numerical modeling and experiments.
Funding: This work was funded by the China Postdoctoral Science Foundation (No. 2019M661319); the Heilongjiang Postdoctoral Scientific Research Developmental Foundation (No. LBH-Q17042); the Fundamental Research Funds for the Central Universities (3072020CFQ0602, 3072020CF0604, 3072020CFP0601); and the 2019 Industrial Internet Innovation and Development Engineering (KY1060020002, KY10600200008).
Abstract: The technology for image-to-image style transfer (a prevalent image processing task) has developed rapidly. The purpose of style transfer is to extract a texture from the source image domain and transfer it to the target image domain using a deep neural network. However, existing methods typically have a large computational cost. To achieve efficient style transfer, we introduce a novel Ghost module into the GANILLA architecture to produce more feature maps from cheap operations. We then use an attention mechanism to transform images with various styles. We optimize the original generative adversarial network (GAN) by using more efficient calculation methods for image-to-illustration translation. The experimental results show that the output of our proposed method is consistent with human visual perception and maintains image quality. Moreover, our proposed method reduces the high computational cost and resource consumption of style transfer. A comparison of subjective and objective evaluation indicators shows that our proposed method outperforms existing methods.
Abstract: The performance and accuracy of computer vision systems are affected by noise in different forms. Although numerous solutions and algorithms have been presented for dealing with each type of noise, a comprehensive technique that covers all the diverse noises and mitigates their damaging effects on the performance and precision of various systems is still missing. In this paper, we focus on the stability and robustness of one branch of computer vision, visual object tracking. We demonstrate that, without imposing a heavy computational load on a model or changing its algorithms, the drop in performance and accuracy that occurs when a system is exposed to an unseen noise-laden test dataset can be prevented by simply applying the style transfer technique to the training dataset and training the model on a combination of the stylized data and the original unaltered data. To verify the proposed approach, we apply it to a generic object tracker that uses regression networks. The method’s validity is confirmed by testing it on an exclusive benchmark comprising 50 image sequences, each containing 15 types of noise at five different intensity levels. The OPE curves obtained show a 40% increase in the robustness of the proposed object tracker against noise compared to the other trackers considered.
Abstract: Current artistic and animation creation involves a great deal of repeated manual work in converting a sketch into a stylized image. This paper presents a solution based on a deep learning framework that performs image generation and style transfer. The method first uses a conditional generative adversarial network, optimizing the loss function that trains the mapping relationship, to generate an actual image from the input sketch. Then, by defining and optimizing the perceptual loss function of the style transfer model, style features are extracted from the image, thereby realizing the conversion between actual images and stylized art images. Experiments show that this method can greatly reduce the work of coloring and of converting between different artistic effects, and can transform simple stick figures into actual object images.
Funding: Partly supported by the National Key R&D Program of China (No. 2020YFA0714100) and the National Natural Science Foundation of China (Nos. 61872162, 62102162, 61832016, U20B2070).
Abstract: Regional facial image synthesis conditioned on a semantic mask has attracted great attention in the field of computational visual media. However, the appearances of different regions may be inconsistent with each other after regional editing. In this paper, we focus on harmonized regional style transfer for facial images. A multi-scale encoder is proposed for accurate style code extraction. The key part of our work is a multi-region style attention module, which adapts multiple regional style embeddings from a reference image to a target image to generate a harmonious result. We also propose style mapping networks for multi-modal style synthesis. We further employ an invertible flow model, which can serve as a mapping network, to fine-tune the style code by inverting the code to latent space. We evaluated our model on three widely used face datasets by transferring regional facial appearance between datasets. The results show that our model can reliably perform style transfer and multi-modal manipulation, generating output comparable to the state of the art.
Funding: This work was supported by the Key Project of the National Natural Science Foundation of China (NSFC) under Grant 41930104; the National Key R&D Program of China under Grant 2021YFE0112300; the Postgraduate Research & Practice Innovation Program of Jiangsu Province under Grant KYCX21_1336; and the China Scholarship Council under Grant 202206860019.
Abstract: Most existing virtual scenarios built for the digital protection of Chinese classical private gardens are too modern in expression style to show the aesthetic significance of their historical period. Considering the aesthetic commonality between traditional Chinese landscape paintings and classical private gardens, and drawing on image style transfer, we propose a deep neural network to transfer the aesthetic style from landscape paintings to the virtual scenario of classical private gardens. The network consists of two parts: style prediction and style transfer. The style prediction network obtains a style representation from style paintings, and the style transfer network transfers that style representation to the content scenario. The pre-trained network was then embedded into the scenario rendering pipeline and combined with the screen post-processing method to realise the stylised expression of the virtual scenario. To verify the feasibility of this methodology, a virtual scenario of the Humble Administrator’s Garden was used as the content scenario, and five garden landscape paintings from different time periods and painting styles were selected for the case study. The results demonstrated that this methodology could effectively achieve the aesthetic style transfer of a virtual scenario.
Funding: Funded by the Hanoi University of Science and Technology (HUST) under grant number T2018-PC-210.
Abstract: In recent years, speech synthesis systems have allowed for the production of very high-quality voices. Research in this domain is therefore turning to the problem of integrating emotions into speech. However, constructing a separate speech synthesizer for each emotion has some limitations. First, this method often requires an emotional-speech data set with many sentences, and such data sets are very time-intensive and labor-intensive to complete. Second, training each of these models requires computers with large computational capabilities and a lot of effort and time for model tuning. In addition, the model for each emotion fails to take advantage of the data sets of other emotions. In this paper, we propose a new method to synthesize emotional speech in which the latent expressions of emotions are learned from a small data set of professional actors through a Flowtron model. In addition, we provide a new method to build a speech corpus that is scalable and whose quality is easy to control. Next, to produce a high-quality speech synthesis model, we used this data set to train the Tacotron 2 model, which we then used as a pre-trained model to train the Flowtron model. We applied this method to synthesize Vietnamese speech with sadness and happiness. Mean opinion score (MOS) assessment results show that the MOS is 3.61 for sadness and 3.95 for happiness. In conclusion, the proposed method proves to be more effective, offering a high degree of automation and fast emotional sentence generation using a small emotional-speech data set.
Funding: The National Key Research and Development Program of China under Grant No. 2020AAA0106200; the National Natural Science Foundation of China under Grant Nos. 62102162, 61832016, U20B2070, and 6210070958; the CASIA-Tencent Youtu Joint Research Project; and the Open Projects Program of the National Laboratory of Pattern Recognition.
Abstract: Vision Transformer has shown impressive performance on image classification tasks. Observing that most existing visual style transfer (VST) algorithms are based on the texture-biased convolutional neural network (CNN), we raise the question of whether the shape-biased Vision Transformer can perform style transfer as CNN does. In this work, we focus on comparing and analyzing the shape bias of CNN- and transformer-based models from the view of VST tasks. For comprehensive comparisons, we propose three kinds of transformer-based visual style transfer (Tr-VST) methods (Tr-NST for optimization-based VST, Tr-WCT for reconstruction-based VST, and Tr-AdaIN for perceptual-based VST). By engaging three mainstream VST methods in the transformer pipeline, we show that transformer-based models pre-trained on ImageNet are not suitable for style transfer methods. Due to the strong shape bias of the transformer-based models, these Tr-VST methods cannot render style patterns. We further analyze the shape bias by considering the influence of the learned parameters and the structure design. Results show that, with proper style supervision, the transformer can learn texture-biased features similar to those of a CNN. With the reduced shape bias in the transformer encoder, Tr-VST methods can generate higher-quality results compared with state-of-the-art VST methods.
Abstract: Visual illustration transformation from real-world to cartoon images is a well-known and challenging task in computer vision. Image-to-image translation from real-world to cartoon domains poses issues such as a lack of paired training samples, a lack of good image translation, weak feature extraction from the source-domain images, and low-quality image translation by traditional generator algorithms. To solve these issues, a model that does not depend on paired samples, a high-quality dataset, a Bayesian-based feature extractor, and an improved generator are needed. In this study, we propose a high-quality dataset to reduce the effect of paired training samples on the model’s performance. We use a Bayesian Very Deep Convolutional Network (VGG)-based feature extractor to improve on the standard feature extractor, because Bayesian inference regularizes weights well. The generator from the Cartoon Generative Adversarial Network (GAN) is modified by introducing a depthwise convolution layer and a channel attention mechanism to improve the performance of the original generator. We used the Fréchet inception distance (FID) score and a user preference score to evaluate the model. The FID scores obtained for the generated cartoon and real-world images are 107 and 76 for the TCC style, and 137 and 57 for the Hayao style, respectively. A user preference score was also calculated to evaluate the quality of the generated images, and our proposed model achieved a higher preference score than the other models. The model produces high-quality cartoon images, demonstrating its effectiveness in transferring style between authentic images and cartoon images.
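Abstract: To address the classification errors that arise when clothing styles are classified manually, owing to subjectivity, regional differences, and other factors, an artificial intelligence-based method for classifying clothing style images was studied. First, duplicate or invalid images were filtered out of the FashionStyle14 dataset to build a clothing style image dataset. Then, using transfer learning, models including EfficientNet V2, RegNet Y 16GF, and ViT Large 16 were fine-tuned to produce new models, achieving clothing style image classification based on a single deep learning model. Finally, to further improve the accuracy, reliability, and robustness of image classification, ensemble learning methods based on voting, weighted averaging, and stacking were each used to combine the predictions of the individual models. The transfer learning experiments show that the ViT Large 16-based deep learning model performed best on the test set, with an average accuracy of 77.024%; the ensemble learning experiments show that the voting-based ensemble reached an average accuracy of 78.833% on the same test set. The results offer a new approach to the clothing style classification problem.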
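The voting-based ensemble named in this abstract can be sketched as a hard majority vote over the class labels predicted by each fine-tuned model. The abstract does not say whether hard or soft voting was used or how ties are resolved, so both choices here (hard voting, ties going to the lowest class index) are assumptions, and the labels in the usage lines are made up.

```python
import numpy as np

def majority_vote(predictions, num_classes):
    """Combine per-model class labels of shape (n_models, n_samples)."""
    predictions = np.asarray(predictions)
    n_samples = predictions.shape[1]
    votes = np.zeros((num_classes, n_samples), dtype=int)
    sample_idx = np.arange(n_samples)
    for model_preds in predictions:
        # Each model casts one vote per sample for its predicted class.
        votes[model_preds, sample_idx] += 1
    # argmax returns the lowest class index when votes are tied.
    return votes.argmax(axis=0)

# Usage: three hypothetical models labelling four images into three classes.
labels = majority_vote([[0, 1, 2, 1],
                        [0, 2, 2, 1],
                        [1, 1, 2, 0]], num_classes=3)
```

A weighted-average ensemble would instead combine the models' class probabilities with per-model weights before taking the argmax.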