Generating realistic synthetic video from text is highly challenging because of issues including digit deformation, noise interference between frames, blurred output, and the need for temporal coherence across frames. In this paper, we propose a novel approach for generating coherent videos of moving digits from textual input using a Deep Deconvolutional Generative Adversarial Network (DD-GAN). The DD-GAN comprises a Deep Deconvolutional Neural Network (DDNN) as the Generator (G) and a modified Deep Convolutional Neural Network (DCNN) as the Discriminator (D) to ensure temporal coherence between adjacent frames. The proposed approach involves several steps. First, the input text is fed into a Long Short-Term Memory (LSTM) based text encoder and then smoothed using Conditioning Augmentation (CA) to enhance the effectiveness of the Generator. Next, the DDNN generates video frames from the enhanced text and random noise, while the modified DCNN acts as the Discriminator, distinguishing between generated and real videos. The quality of the generated videos is evaluated using standard metrics, namely Inception Score (IS), Fréchet Inception Distance (FID), Fréchet Inception Distance for video (FID2vid), and Generative Adversarial Metric (GAM), along with a human study based on realism, coherence, and relevance. Experiments on Single-Digit Bouncing MNIST GIFs (SBMG), Two-Digit Bouncing MNIST GIFs (TBMG), and a custom dataset of essential mathematics videos with related text demonstrate significant improvements in both metrics and human study results, confirming the effectiveness of DD-GAN. This research also took on the challenge of generating preschool math videos from text, handling complex structures, digits, and symbols, and achieving successful results. The proposed research demonstrates promising results for generating coherent videos from textual input.
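The Conditioning Augmentation step named in the abstract above is commonly implemented by sampling the conditioning vector from a Gaussian whose mean and log-variance are predicted from the text embedding. The following is a minimal sketch under that assumption; the abstract does not describe the authors' exact layers, and the names `conditioning_augmentation`, `mu`, and `log_var` are hypothetical.

```python
import math
import random


def conditioning_augmentation(mu, log_var, rng):
    """Sample c = mu + sigma * eps element-wise, with eps ~ N(0, 1)."""
    sigma = [math.exp(0.5 * lv) for lv in log_var]
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)) summed over dimensions (a common regularizer)."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


# Hypothetical encoder outputs for one sentence (assumed values, not from the paper).
rng = random.Random(0)
mu = [0.2, -0.1, 0.4]
log_var = [-2.0, -2.0, -2.0]
c = conditioning_augmentation(mu, log_var, rng)
print(len(c), kl_to_standard_normal(mu, log_var))
```

Sampling rather than using the embedding directly gives the generator slightly different conditioning vectors for the same sentence, which is the smoothing effect the abstract refers to.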
Sarcasm detection in text data is an increasingly vital area of research due to the prevalence of sarcastic content in online communication. This study addresses the challenges of small datasets and class imbalance in sarcasm detection by employing comprehensive data pre-processing and Generative Adversarial Network (GAN) based augmentation on diverse datasets, including iSarcasm, SemEval-18, and Ghosh. It offers a novel pipeline for augmenting sarcasm data with a Reverse Generative Adversarial Network (RGAN). The proposed RGAN method works by inverting the labels of original and synthetic data during training. This inversion provides feedback that guides the generator toward high-quality data closely resembling the original distribution. Notably, the proposed RGAN model performs on par with the standard GAN, showing its efficacy in augmenting text data. Experiments across the datasets highlight the nuanced impact of augmentation on model performance and the need to balance synthetic and original data carefully. The GAN-based augmentation is compared against Natural Language Processing Augmentation (NLPAug) as an alternative augmentation technique. Overall, the F1-score of the proposed technique exceeds that of synonym-replacement augmentation using NLPAug. The increase in F1-score in experiments using RGAN ranged from 0.066% to 1.054%, and the use of the standard GAN resulted in a 2.88% increase in F1-score.
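The label inversion described above can be illustrated with the discriminator's training targets. This is only a sketch of one plausible reading (real samples labeled 0 and synthetic samples labeled 1 instead of the usual 1 and 0); the abstract does not give the exact training procedure, and both function names are hypothetical.

```python
import math


def discriminator_targets(n_real, n_fake, reverse=False):
    """Targets for a batch of real samples followed by synthetic samples."""
    real_label, fake_label = (0.0, 1.0) if reverse else (1.0, 0.0)
    return [real_label] * n_real + [fake_label] * n_fake


def binary_cross_entropy(predictions, targets, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and targets."""
    total = 0.0
    for p, t in zip(predictions, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(predictions)


standard = discriminator_targets(2, 1)
reversed_targets = discriminator_targets(2, 1, reverse=True)
predictions = [0.9, 0.8, 0.2]  # assumed discriminator outputs
print(binary_cross_entropy(predictions, standard))
print(binary_cross_entropy(predictions, reversed_targets))
```

Under this reading, a discriminator that scores well on the standard targets is penalized heavily on the reversed ones, so the two setups send opposite gradient signals through the same loss.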
In this paper, we propose a hybrid model that maps the input noise vector to the label of the image generated by a generative adversarial network (GAN). The model mainly consists of a pre-trained deep convolutional generative adversarial network (DCGAN) and a classifier. Using the model, we visualize the distribution of two-dimensional input noise that leads to a specific type of generated image after each training epoch of the GAN. The visualization reveals the distribution of the input noise vector and the performance of the generator. Based on this, we build a guided generator (GG) that can produce a fake image of a desired type. Two methods are proposed to build the GG: the most significant noise (MSN) method and a method that uses labeled noise. The MSN method generates images precisely but with fewer variations. In contrast, the labeled noise method yields more variations but is slightly less stable. Finally, we propose a criterion to measure the performance of the generator, which can also be used as a loss function to train the network effectively.
Sampling-based path planning is a popular methodology for robot path planning. With a uniform sampling strategy to explore the state space, a feasible path can be found without complex geometric modeling of the configuration space. However, the quality of the initial solution is not guaranteed, and convergence to the optimal solution is slow. In this paper, we present a novel image-based path planning algorithm to overcome these limitations. Specifically, a generative adversarial network (GAN) is designed to take the environment map (represented as an RGB image) as input without any other preprocessing. The output is also an RGB image in which the promising region (where a feasible path probably exists) is segmented. This promising region is used as a heuristic to achieve non-uniform sampling for the path planner. We conduct a number of simulation experiments to validate the effectiveness of the proposed method, and the results demonstrate that it performs much better in terms of the quality of the initial solution and the convergence speed to the optimal solution. Furthermore, the method works well not only on environments similar to the training set but also on environments that are very different from it.
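The non-uniform sampling idea above can be sketched as a biased sampler that draws most points from the segmented promising region and the rest uniformly from the whole map. This is an illustrative sketch, not the authors' planner: the cell-list representation, the `bias` parameter, and the function name `sample_point` are all assumptions.

```python
import random


def sample_point(promising_cells, width, height, rng, bias=0.8):
    """Draw one planner sample.

    With probability `bias`, sample inside a random promising grid cell;
    otherwise fall back to uniform sampling over the whole map.
    """
    if promising_cells and rng.random() < bias:
        cx, cy = rng.choice(promising_cells)
        return (cx + rng.random(), cy + rng.random())
    return (rng.random() * width, rng.random() * height)


# Hypothetical promising region: two unit cells of a 10 x 10 map.
rng = random.Random(1)
cells = [(2, 3), (5, 1)]
samples = [sample_point(cells, 10, 10, rng) for _ in range(5)]
print(samples)
```

Keeping a uniform fallback is a common design choice for samplers of this kind, since it lets the planner still explore the full map when the predicted region is wrong or incomplete.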
With the aperture synthesis (AS) technique, a number of small antennas can be assembled to form a large telescope whose spatial resolution is determined by the distance between the two farthest antennas instead of the diameter of a single-dish antenna. In contrast to a direct imaging system, an AS telescope captures the Fourier coefficients of a spatial object and then applies an inverse Fourier transform to reconstruct the spatial image. Because the number of antennas is limited, the Fourier coefficients are extremely sparse in practice, resulting in a very blurry image. To remove or reduce the blur, “CLEAN” deconvolution has been widely used in the literature. However, it was initially designed for point sources, and its efficiency is unsatisfactory for an extended source such as the Sun. In this study, a deep neural network, specifically a Generative Adversarial Network (GAN), is proposed for solar image deconvolution. The experimental results demonstrate that the proposed model is markedly better than traditional CLEAN on solar images. The main purpose of this work is visual inspection rather than quantitative scientific computation. We believe that the resulting high-quality images will also help scientists better understand solar phenomena.
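The blur caused by sparse Fourier sampling can be reproduced on a toy one-dimensional signal: transform a point source, keep only a few coefficients, and invert. This is a simplified illustration using a plain discrete Fourier transform, not a model of a real interferometer; the function names and the chosen frequency set are assumptions.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def inverse_dft(coeffs):
    """Naive inverse discrete Fourier transform."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def dirty_image(signal, sampled_frequencies):
    """Keep only the measured Fourier coefficients, zero the rest, and invert."""
    coeffs = dft(signal)
    kept = [c if k in sampled_frequencies else 0j for k, c in enumerate(coeffs)]
    return [value.real for value in inverse_dft(kept)]


point_source = [0.0] * 16
point_source[8] = 1.0
full = dirty_image(point_source, set(range(16)))
# Only five low frequencies measured (14 and 15 are the negatives of 2 and 1).
sparse = dirty_image(point_source, {0, 1, 2, 14, 15})
print([round(v, 3) for v in sparse])
```

With all coefficients the point source is recovered exactly, while the sparse set lowers the peak and spreads energy into neighboring samples, which is the blur that deconvolution methods such as CLEAN or the proposed GAN try to undo.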
The objective of style transfer is to maintain the content of an image while transferring the style of another image. However, conventional methods face challenges in preserving facial features, especially in Korean portraits, where elements like the “Gat” (a traditional Korean hat) are prevalent. This paper proposes a deep learning network that performs style transfer including the “Gat” while preserving the identity of the face. Unlike traditional style transfer techniques, the proposed method aims to preserve the texture, attire, and the “Gat” in the style image by combining image sharpening and facial landmarks with a GAN. The color, texture, and intensity were extracted differently based on the characteristics of each block and layer of the pre-trained VGG-16, and only the elements necessary during training were preserved using a facial landmark mask. The head area was located using the eyebrow area in order to transfer the “Gat”. Furthermore, the identity of the face was retained, and style correlation was considered based on the Gram matrix. To evaluate performance, we introduced a metric using PSNR and SSIM, with an emphasis on median values through new weightings for style transfer in Korean portraits. Additionally, we conducted a survey that evaluated the content, style, and naturalness of the transferred results; based on this assessment, we conclude that our method maintains the integrity of content better than previous research. Our approach, which combines landmark preservation with diverse loss functions, including those related to the “Gat”, outperformed previous research in facial identity preservation.
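The Gram-matrix style correlation mentioned above is typically computed as the inner products between channels of a feature map. The sketch below shows that computation on plain Python lists; the normalization by the number of spatial positions and the mean-squared style loss are assumptions, since the abstract does not specify them.

```python
def gram_matrix(features):
    """Channel-by-channel correlations of a feature map.

    `features` is a list of C channels, each a flat list of H*W activations.
    """
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def style_loss(features_a, features_b):
    """Mean squared difference between the two Gram matrices."""
    ga, gb = gram_matrix(features_a), gram_matrix(features_b)
    c = len(ga)
    return sum((ga[i][j] - gb[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)


# Two hypothetical 2-channel feature maps with 3 spatial positions each.
style = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
shuffled = [[3.0, 1.0, 2.0], [6.0, 4.0, 5.0]]  # same activations, moved in space
print(gram_matrix(style), style_loss(style, shuffled))
```

Because the Gram matrix sums over spatial positions, it ignores where a texture appears, which is why it captures style rather than layout.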
Concrete subjected to fire loads is susceptible to explosive spalling, which can expose reinforcing steel bars to the fire and substantially jeopardize structural safety and stability. The spalling of fire-loaded concrete is closely related to the evolution of pore pressure and temperature. Conventional analytical methods require solving complex, strongly coupled multifield equations, which demands significant computational effort. To obtain the distributions of pore pressure and temperature rapidly and accurately, this work adopts the Pix2Pix model, which is well known for its image generation capabilities. The open-source dataset used herein features RGB images generated with a sophisticated coupled model, while the grayscale images encapsulate the 15 principal variables influencing spalling. After a series of tests with different layer configurations, activation functions, and loss functions, a Pix2Pix model suitable for assessing the spalling risk of fire-loaded concrete was designed and trained. The applicability and reliability of the Pix2Pix model in concrete parameter prediction are verified by comparing its outcomes with those derived from the strongly coupled THC model. Notably, for practical engineering applications, our findings indicate that using monochrome images as the initial target for analysis yields more dependable results. This work offers insights for civil engineers specializing in concrete structures and establishes a methodological approach for researchers seeking to create similar predictive models.
Unsupervised multi-modal image translation is an emerging area of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain. However, advanced approaches typically employ a multi-generator mechanism to model the different domain mappings, which results in inefficient training and mode collapse and therefore limits the diversity of the generated images. To address this issue, this paper introduces a multi-modal unsupervised image translation framework that uses a single generator to perform multi-modal image translation. First, a domain code is introduced to explicitly control the different generation tasks. Second, the squeeze-and-excitation (SE) mechanism and a feature attention (FA) module are incorporated. Finally, the model integrates multiple optimization objectives to ensure efficient multi-modal translation. Qualitative and quantitative experiments on multiple unpaired benchmark image translation datasets demonstrate the benefits of the proposed method over existing approaches. Overall, the experimental results show that the proposed method is versatile and scalable.
The sonar systems that marine animals have evolved in nature are far superior to current artificial sonar. Therefore, the development of bionic underwater concealed detection is of great strategic significance to the military and the economy. In this paper, a generative adversarial network (GAN) is trained on a dolphin vocal sound dataset that we constructed, achieving unsupervised generation of dolphin vocal sounds with global consistency. Analysis of the generated and real audio samples in the time domain and the frequency domain shows that the generated samples are close to the real ones, which meets the requirements of bionic underwater concealed detection.
Deep learning has achieved many successes in video processing. Video has become an increasingly important part of our daily digital interactions. Higher-resolution content and large data volumes pose serious challenges to receiving, distributing, compressing, and displaying high-quality video content. In this paper, we propose an effective and efficient deep learning framework for video compression, based on Flask, which combines Convolutional Neural Networks (CNN) and Generative Adversarial Networks (GAN). In the proposed method, the layers are divided into different groups for data processing: a CNN removes duplicate frames, repeating a single image in place of the duplicates, while a GAN recognizes and detects minute changes, which are recorded with Long Short-Term Memory (LSTM). Instead of the complete image, the small changes generated by the GAN are substituted, which helps with frame-level compression. Pixel-wise comparison over the frame is performed using K-Nearest Neighbours (KNN) and clustered with K-means, and Singular Value Decomposition (SVD) is applied to every frame in the video for all three colour channels [Red, Green, Blue] to reduce the dimension of the utility matrix [R, G, B] by extracting its latent factors. Video frames are packed with parameters with the aid of a codec and converted to video format, and the results are compared with the original video. Repeated experiments on several videos with different sizes, durations, frames per second (FPS), and quality demonstrated a significant resampling rate. On average, the output showed around a 10% deviation in quality and a reduction of more than half in size compared with the original video.
Cancelable biometrics is the solution for the trade-off between two concepts: Biometrics for Security and Security for Biometrics. The cancelable template, rather than the original biometric data, is stored in the authentication system’s database. If the database is compromised, the template can easily be canceled and regenerated from the same biometric data. Recoverability of the cancelable template comes from the diversity of the cancelable transformation parameters (the cancelable key). Therefore, the cancelable key must be kept secret so that it can be used in the authentication process as a second authentication factor in conjunction with the biometric data. The main contribution of this paper is to tackle the risks of stolen, lost, or shared cancelable keys by using the biometric trait (in different feature domains) as the only authentication factor, while achieving good performance with high security. The standard Generative Adversarial Network (GAN) is proposed as an encryption tool that needs the cancelable key only during the training phase, while the testing phase depends only on the biometric trait. Additionally, a random projection transformation is employed to increase the security and performance of the proposed system. The proposed transformation system is tested using the standard ORL face database, and the experiments are conducted with different feature domains. Moreover, a security analysis of the proposed transformation system is presented.
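The random projection transformation mentioned above can be sketched as multiplying the feature vector by a key-seeded Gaussian matrix, so that a new key yields a new template from the same biometric data. This is a generic illustration of random projection, not the authors' full GAN-based system; the function names, the integer key, and the scaling factor are assumptions.

```python
import random


def projection_matrix(key, out_dim, in_dim):
    """Gaussian random matrix derived deterministically from `key`."""
    rng = random.Random(key)
    scale = 1.0 / out_dim ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def cancelable_template(features, key, out_dim):
    """Project the biometric feature vector with the key-specific matrix."""
    matrix = projection_matrix(key, out_dim, len(features))
    return [sum(w * x for w, x in zip(row, features)) for row in matrix]


# Hypothetical feature vector extracted from a face image.
features = [0.3, 0.7, 0.1, 0.9, 0.5, 0.2]
template = cancelable_template(features, key=1234, out_dim=3)
reissued = cancelable_template(features, key=5678, out_dim=3)
print(template, reissued)
```

Revoking a compromised template then amounts to discarding the old key and projecting the same features with a new one.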
Building Integrated Photovoltaics (BIPV) is a promising technology for decarbonizing urban energy systems by harnessing the solar energy available on building envelopes. While methods to assess solar irradiation, especially on rooftops, are well established, assessment on building facades usually involves greater effort due to more complex urban features and obstructions. The drawback of existing physics-based simulation programs is that they require significant manual modeling effort and computing time to generate time-resolved deterministic results. Yet solar irradiation is highly intermittent, and representing its inherent uncertainty may be required for designing robust BIPV energy systems. To address these drawbacks, this paper proposes a data-driven model based on Deep Generative Networks (DGN) that efficiently generates stochastic ensembles of annual hourly solar irradiance time series on building facades with uncompromised spatiotemporal resolution at the urban scale. The only inputs required are easily obtainable fisheye images, used as categorical shading masks captured from 3D models. In principle, even actual photographs of urban contexts can be used, provided they are semantically segmented. Our approach may be applied as a surrogate for time-consuming simulations, used when information is lacking (e.g., no 3D model exists), and the generated stochastic time-series ensembles can be used in robust energy systems planning. Our validations show good fidelity of the generated time series compared with the physics-based simulator. Due to the nature of the DGNs used, it remains an open challenge to reconstruct the ground truth precisely, one-to-one, for each hour of the year. However, we consider the benefits of the approach to outweigh the shortcomings. To demonstrate the model’s relevance for urban energy planning, we showcase its potential for generative design by parametrically altering characteristic features of the urban environment and producing corresponding time series on building facades under different climatic contexts in real time.
Super-resolution reconstruction in medical imaging has become more demanding due to the necessity of obtaining high-quality images with minimal radiation dose, such as in low-field magnetic resonance imaging (MRI). However, image super-resolution reconstruction remains a difficult task because of the complexity and high texture requirements of diagnostic imaging. In this paper, we present a deep learning based strategy for reconstructing medical images from low resolutions using Transformers and generative adversarial networks (T-GAN). By inserting a Transformer into the generative adversarial network for image reconstruction, the integrated system can extract more precise texture information and focus more on important locations through global image matching. Furthermore, we use a weighted combination of content loss, adversarial loss, and adversarial feature loss as the final multi-task loss function when training the proposed T-GAN. Evaluated with established measures such as peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM), the proposed T-GAN achieves the best performance and recovers more texture features in super-resolution reconstruction of MRI scans of the knee and abdomen.
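For reference, the PSNR measure cited above follows the standard definition 10 · log10(MAX² / MSE). A minimal sketch on flat pixel lists, assuming 8-bit images (`max_value=255`); the function name and sample values are illustrative only.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [100.0, 100.0, 100.0, 100.0]
reconstructed = [101.0, 99.0, 101.0, 99.0]  # mean squared error of 1
print(round(psnr(reference, reconstructed), 2))
```

Higher values indicate a reconstruction closer to the reference; PSNR is purely pixel-wise, which is why it is usually reported together with a structural measure such as SSIM.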
Ambient occlusion (AO) is a widely used real-time rendering technique which estimates light intensity on visible scene surfaces. Recently, a number of learning-based AO approaches have been proposed, which bring a new angle to solving screen space shading via a unified learning framework with competitive quality and speed. However, most such methods have high error for complex scenes or tend to ignore details. We propose an end-to-end generative adversarial network for the production of realistic AO, and explore the importance of perceptual loss in the generative model to AO accuracy. An attention mechanism is also described to improve the accuracy of details, whose effectiveness is demonstrated on a wide variety of scenes.
Alzheimer’s disease (AD) is a neurodegenerative disease that severely affects the activities of daily living in aged individuals and typically needs to be diagnosed at an early stage. Generative adversarial networks (GANs) provide a new deep learning method that shows good performance in image processing, but it remains to be verified whether a GAN brings benefit in AD diagnosis. The purpose of this research is to systematically review psychoradiological studies on the application of GANs in the diagnosis of AD, covering the classification of AD state and AD-related image processing in comparison with other methods. In addition, we evaluated the research methodology and provided suggestions from the perspective of clinical application. Compared with other methods, a GAN has higher accuracy in the classification of AD state and better performance in AD-related image processing (e.g., image denoising and segmentation). Most studies used data from public databases but lacked clinical validation, and clinicians did not participate in the quantitative assessment and comparison in these studies, which may limit improvements in the generation quality and generalization ability of the GAN model. The application value of GANs in the classification of AD state and AD-related image processing has been confirmed in the reviewed studies. Methods for improving GAN architectures are also discussed in this paper. In sum, the present study demonstrates the advancing diagnostic performance and clinical applicability of GANs for AD, and suggests that future researchers should consider recruiting clinicians to compare the algorithm with clinicians’ manual methods and to evaluate its clinical effect.
With the increasing demands of health care, the design of hospital buildings has become increasingly demanding and complicated. However, the traditional layout design method for hospitals is labor intensive, time consuming, and prone to errors. With the development of artificial intelligence (AI), intelligent design methods have become possible and are considered suitable for the layout design of hospital buildings. This paper proposes two intelligent design processes, based on healthcare systematic layout planning (HSLP) and a generative adversarial network (GAN), which aim to solve the problem of generating the plane functional layout of the operating departments (ODs) of general hospitals. The first design method, which resembles a mathematical model with a traditional optimization algorithm, comprises two steps: (1) developing the HSLP model based on conventional systematic layout planning (SLP) theory, identifying the relationships and flows among the various departments and units, and arriving at a preliminary plane layout design; (2) establishing a mathematical model that optimizes the building layout using a genetic algorithm (GA) to obtain the optimized scheme. The second intelligent design process, based on more than 100 sets of collected OD drawings, comprises: labelling the corresponding functional layouts of each OD plan, and building an image-to-image translation model with a conditional adversarial network (pix2pix), one of the most representative GAN models, for training on OD plane layouts. Finally, the functions and features of the results generated by the two methods are analyzed and compared from an architectural and an algorithmic perspective. The comparison shows that both the HSLP and GAN models can autonomously generate new OD plane functional layouts. The HSLP layouts have clear functional area adjacencies and optimization goals, but they are relatively rigid and not specific enough. The GAN outputs are the most innovative layouts with strong applicability, but the dataset has strict constraints. The goal of this paper is to help relieve the heavy workload of architects in the early design stage and to present the effectiveness of these intelligent design methods in the field of medical architecture.
The fifth-generation (5G) mobile communication system has been widely developed and deployed, reshaping our daily lives. However, anomalies such as cell outages and congestion in 5G critically affect the quality of experience and significantly increase operational expenditures. Although several big data and artificial intelligence-based anomaly detection methods have been proposed for wireless cellular systems, they change the distribution of the data and ignore the relevance among user activities, making anomaly detection ineffective for some cells. In this paper, we propose a highly effective and accurate anomaly detection framework that uses generative adversarial networks (GAN) and long short-term memory (LSTM) neural networks. The framework expands the original dataset while keeping the distribution of the data unchanged, and explores the relevance among user activities to further improve system performance. The results demonstrate that our framework achieves 97.16% accuracy and a 2.30% false positive rate by exploiting the correlation of user activities and data expansion.
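The two figures reported above follow the standard confusion-matrix definitions: accuracy is the share of correct predictions, and the false positive rate is the share of normal samples flagged as anomalies. A small sketch with made-up labels (1 = anomaly, 0 = normal); the function name and data are illustrative, not taken from the paper.

```python
def accuracy_and_fpr(y_true, y_pred):
    """Return (accuracy, false positive rate) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    accuracy = (tp + tn) / len(y_true)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
accuracy, fpr = accuracy_and_fpr(y_true, y_pred)
print(accuracy, fpr)
```

Reporting both matters for anomaly detection because normal samples usually dominate, so a detector can reach high accuracy while still raising many false alarms.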
Data stream classification currently plays a key role in big data analytics because of the enormous growth of streaming data. Most existing classification methods use ensemble learning, which is reliable, but these methods are not effective at learning from imbalanced big data, and they assume that all data are pre-classified. Another weakness of current methods is the long evaluation time required when the target data stream contains a large number of features. The main objective of this research is to develop a new method for incremental learning based on the proposed ant lion fuzzy-generative adversarial network model. The proposed model is implemented on the Spark architecture. For each data stream, the class output is computed at the slave nodes by training a generative adversarial network with the back-propagated error based on fuzzy bound computation. This method overcomes the limitations of existing methods, as it can classify data streams that are partially or completely unlabeled while providing high scalability and efficiency. The results show that the proposed model outperforms state-of-the-art methods in terms of accuracy (0.861), precision (0.9328), and minimal MSE (0.0416).
Learning-based approaches have made substantial progress in capturing spatially-varying bidirectional reflectance distribution functions (SVBRDFs) from a single image with unknown lighting and geometry. However, most existing networks only consider per-pixel losses, which limits their capability to recover local features such as smooth glossy regions. A few generative adversarial networks use multiple discriminators for different parameter maps, increasing network complexity. We present a novel end-to-end generative adversarial network (GAN) to recover appearance from a single picture of a nearly-flat surface lit by flash. We use a single unified adversarial framework for each parameter map. An attention module guides the network to focus on details of the maps. Furthermore, the SVBRDF map loss is combined to prevent paying excess attention to specular highlights. We demonstrate and evaluate our method on both public datasets and real data. Quantitative analysis and visual comparisons indicate that our method achieves better results than the state-of-the-art in most cases.
Deep-Fake is an emerging synthetic media technology that replaces individuals in existing images and videos with someone else’s likeness. This paper presents a comparative study of different deep neural networks employed for Deep-Fake video detection. In the model, features are extracted from the training data with a Convolutional Neural Network to form feature vectors, which are further analysed using a dense layer, a Long Short-Term Memory network, and a Gated Recurrent Unit, adopting transfer learning with fine-tuning to train the models. The model is evaluated on detecting artificial-intelligence-based Deep-Fake images and videos using benchmark datasets. The comparative analysis shows that detection is strongly biased towards the domain of the dataset, but that transfer learning brings a noteworthy improvement in model performance, while the Convolutional-Recurrent Neural Network has benefits in sequence detection.
Funding: Supported by the General Program of the National Natural Science Foundation of China (Grant No. 61977029).
Abstract: Generating realistic synthetic video from text is highly challenging because of the many issues involved, including digit deformation, noise interference between frames, blurred output, and the need for temporal coherence across frames. In this paper, we propose a novel approach for generating coherent videos of moving digits from textual input using a Deep Deconvolutional Generative Adversarial Network (DD-GAN). The DD-GAN comprises a Deep Deconvolutional Neural Network (DDNN) as the Generator (G) and a modified Deep Convolutional Neural Network (DCNN) as the Discriminator (D) to ensure temporal coherence between adjacent frames. The proposed research involves several steps. First, the input text is fed into a Long Short-Term Memory (LSTM) based text encoder and then smoothed using Conditioning Augmentation (CA) techniques to enhance the effectiveness of G. Next, a DDNN generates video frames from the enhanced text and random noise, and a modified DCNN acts as D, distinguishing between generated and real videos. This research evaluates the quality of the generated videos using standard metrics such as Inception Score (IS), Fréchet Inception Distance (FID), Fréchet Inception Distance for video (FID2vid), and Generative Adversarial Metric (GAM), along with a human study based on realism, coherence, and relevance. Experiments on Single-Digit Bouncing MNIST GIFs (SBMG), Two-Digit Bouncing MNIST GIFs (TBMG), and a custom dataset of essential mathematics videos with related text demonstrate significant improvements in both metrics and human study results, confirming the effectiveness of DD-GAN. This research also took on the challenge of generating preschool math videos from text, handling complex structures, digits, and symbols, and achieved successful results. The proposed research demonstrates promising results for generating coherent videos from textual input.
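The Conditioning Augmentation step mentioned above is commonly implemented as a reparameterized Gaussian sample around the text embedding. The following is a minimal sketch of that idea only, not the authors' implementation: in a real model the mean and log-variance would be produced by a learned layer on top of the LSTM embedding, whereas here `mu` and `log_var` are hypothetical inputs supplied by the caller.

```python
import math
import random
from typing import List, Optional


def conditioning_augmentation(
    mu: List[float],
    log_var: List[float],
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Sample c = mu + sigma * eps element-wise, with eps ~ N(0, 1).

    mu and log_var are assumed to come from the text encoder (hypothetical
    inputs in this sketch); sigma is recovered as exp(0.5 * log_var).
    """
    if len(mu) != len(log_var):
        raise ValueError("mu and log_var must have the same length")
    rng = rng or random.Random()
    sample = []
    for m, lv in zip(mu, log_var):
        sigma = math.exp(0.5 * lv)
        sample.append(m + sigma * rng.gauss(0.0, 1.0))
    return sample


# Two draws from the same embedding give nearby but distinct conditioning vectors.
mu = [0.2, -0.1, 0.5]
log_var = [-2.0, -2.0, -2.0]
c1 = conditioning_augmentation(mu, log_var, random.Random(0))
c2 = conditioning_augmentation(mu, log_var, random.Random(1))
```

Sampling around the embedding instead of using it directly is what gives the smoothing effect: the generator sees a small neighbourhood of each text condition during training.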
Abstract: Sarcasm detection in text data is an increasingly vital area of research due to the prevalence of sarcastic content in online communication. This study addresses challenges associated with small datasets and class imbalances in sarcasm detection by employing comprehensive data pre-processing and Generative Adversarial Network (GAN) based augmentation on diverse datasets, including iSarcasm, SemEval-18, and Ghosh. This research offers a novel pipeline for augmenting sarcasm data with a Reverse Generative Adversarial Network (RGAN). The proposed RGAN method works by inverting labels between original and synthetic data during the training process. This inversion of labels provides feedback to the generator for generating high-quality data closely resembling the original distribution. Notably, the proposed RGAN model exhibits performance on par with a standard GAN, showcasing its efficacy in augmenting text data. The exploration of various datasets highlights the nuanced impact of augmentation on model performance, with cautionary insights into maintaining a balance between synthetic and original data. The methodological framework encompasses comprehensive data pre-processing and GAN-based augmentation, with a comparison against Natural Language Processing Augmentation (NLPAug) as an alternative augmentation technique. Overall, the F1-score of our proposed technique outperforms that of the synonym replacement augmentation technique using NLPAug. The increase in F1-score in experiments using RGAN ranged from 0.066% to 1.054%, and the use of a standard GAN resulted in a 2.88% increase in F1-score. The proposed RGAN model outperformed the NLPAug method and demonstrated comparable performance to a standard GAN, emphasizing its efficacy in text data augmentation.
Funding: Supported by the Shenzhen Science and Technology Innovation Committee under Grants No. JCYJ20170306170559215 and No. JCYJ20180302153918689.
Abstract: In this paper, we propose a hybrid model that maps the input noise vector to the label of the image generated by a generative adversarial network (GAN). The model mainly consists of a pre-trained deep convolutional generative adversarial network (DCGAN) and a classifier. Using the model, we visualize the distribution of the two-dimensional input noise that leads to a specific type of generated image after each training epoch of the GAN. The visualization reveals the distribution features of the input noise vector and the performance of the generator. With these features, we try to build a guided generator (GG) able to produce the fake image we need. Two methods are proposed to build the GG. One is the most significant noise (MSN) method, and the other utilizes labeled noise. The MSN method can generate images precisely but with less variation. In contrast, the labeled noise method has more variation but is slightly less stable. Finally, we propose a criterion to measure the performance of the generator, which can be used as a loss function to train the network effectively.
Funding: This work was partially supported by the National Key R&D Program of China (2019YFB1312400), the Shenzhen Key Laboratory of Robotics Perception and Intelligence (ZDSYS20200810171800001), Hong Kong RGC GRF (14200618), and Hong Kong RGC CRF (C4063-18G).
Abstract: Sampling-based path planning is a popular methodology for robot path planning. With a uniform sampling strategy to explore the state space, a feasible path can be found without complex geometric modeling of the configuration space. However, the quality of the initial solution is not guaranteed, and convergence to the optimal solution is slow. In this paper, we present a novel image-based path planning algorithm to overcome these limitations. Specifically, a generative adversarial network (GAN) is designed to take the environment map (represented as an RGB image) as input without any other preprocessing. The output is also an RGB image in which the promising region (where a feasible path probably exists) is segmented. This promising region is used as a heuristic to achieve non-uniform sampling for the path planner. We conduct a number of simulation experiments to validate the effectiveness of the proposed method, and the results demonstrate that our method performs much better in terms of the quality of the initial solution and the speed of convergence to the optimal solution. Furthermore, beyond environments similar to the training set, our method also works well on environments that are very different from the training set.
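To illustrate how a segmented promising region can drive non-uniform sampling, here is a minimal sketch. It assumes the GAN output has already been thresholded into a boolean grid `mask`; the `bias` parameter and the function name `sample_cell` are hypothetical and not taken from the paper. Keeping some uniform draws preserves the planner's ability to explore outside the predicted region.

```python
import random
from typing import List, Tuple

Cell = Tuple[int, int]


def sample_cell(mask: List[List[bool]], bias: float, rng: random.Random) -> Cell:
    """Return a (row, col) grid cell.

    With probability `bias` the cell is drawn from the promising region
    (cells where mask is True); otherwise it is drawn uniformly from the
    whole grid, so the planner can still explore everywhere.
    """
    rows, cols = len(mask), len(mask[0])
    promising = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    if promising and rng.random() < bias:
        return rng.choice(promising)
    return (rng.randrange(rows), rng.randrange(cols))


# A 4x4 map whose promising region is the main diagonal.
mask = [[r == c for c in range(4)] for r in range(4)]
rng = random.Random(42)
samples = [sample_cell(mask, bias=0.8, rng=rng) for _ in range(1000)]
on_diagonal = sum(1 for r, c in samples if r == c)
```

With `bias=0.8`, most of the 1000 samples fall on the diagonal, while the remainder are spread over the full grid.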
Funding: Supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 61572461, 61811530282, 61872429, 11790301, and 11790305).
Abstract: With the aperture synthesis (AS) technique, a number of small antennas can be assembled to form a large telescope whose spatial resolution is determined by the distance between the two farthest antennas instead of the diameter of a single-dish antenna. In contrast to a direct imaging system, an AS telescope captures the Fourier coefficients of a spatial object and then applies an inverse Fourier transform to reconstruct the spatial image. Due to the limited number of antennas, the Fourier coefficients are extremely sparse in practice, resulting in a very blurry image. To remove or reduce blur, “CLEAN” deconvolution has been widely used in the literature. However, it was initially designed for a point source. For an extended source, such as the Sun, its efficiency is unsatisfactory. In this study, a deep neural network, namely a Generative Adversarial Network (GAN), is proposed for solar image deconvolution. The experimental results demonstrate that the proposed model is markedly better than traditional CLEAN on solar images. The main purpose of this work is visual inspection rather than quantitative scientific computation. We believe that this will also help scientists to better understand solar phenomena with high-quality images.
Funding: Supported by the Metaverse Lab Program funded by the Ministry of Science and ICT (MSIT) and the Korea Radio Promotion Association (RAPA).
Abstract: The objective of style transfer is to maintain the content of an image while transferring the style of another image. However, conventional methods face challenges in preserving facial features, especially in Korean portraits where elements like the “Gat” (a traditional Korean hat) are prevalent. This paper proposes a deep learning network designed to perform style transfer that includes the “Gat” while preserving the identity of the face. Unlike traditional style transfer techniques, the proposed method aims to preserve the texture, attire, and the “Gat” in the style image by employing image sharpening and facial landmarks with the GAN. The color, texture, and intensity were extracted differently based on the characteristics of each block and layer of the pre-trained VGG-16, and only the elements necessary during training were preserved using a facial landmark mask. The head area was represented using the eyebrow area to transfer the “Gat”. Furthermore, the identity of the face was retained, and style correlation was considered based on the Gram matrix. To evaluate performance, we introduced a metric using PSNR and SSIM, with an emphasis on median values through new weightings for style transfer in Korean portraits. Additionally, we conducted a survey that evaluated the content, style, and naturalness of the transferred results; based on the assessment, we conclude that our method surpasses previous research in maintaining the integrity of content. Our approach, enriched by landmark preservation and diverse loss functions, including those related to the “Gat”, outperformed previous research in facial identity preservation.
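The Gram matrix used above for style correlation can be sketched in a few lines. This is an illustrative, unnormalized version under the assumption that each feature map has been flattened to a list of activations; many implementations additionally divide by the number of elements, and the abstract does not say which variant the authors use.

```python
from typing import List


def gram_matrix(features: List[List[float]]) -> List[List[float]]:
    """Return the C x C matrix of inner products between channels.

    features[i] holds channel i of a feature map, flattened over all
    spatial positions. Entry (i, j) measures how strongly channels i and j
    co-activate, which is the usual notion of style correlation.
    """
    length = len(features[0])
    if any(len(channel) != length for channel in features):
        raise ValueError("all channels must have the same length")
    return [
        [sum(a * b for a, b in zip(fi, fj)) for fj in features]
        for fi in features
    ]


# Two channels with two spatial positions each.
g = gram_matrix([[1.0, 2.0], [3.0, 4.0]])
```

A style loss would then typically compare the Gram matrices of the generated image and the style image at selected VGG-16 layers.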
Funding: Supported by the National Natural Science Foundation of China (NSFC) (52178324).
Abstract: Concrete subjected to fire loads is susceptible to explosive spalling, which can expose reinforcing steel bars to the fire and substantially jeopardize structural safety and stability. The spalling of fire-loaded concrete is closely related to the evolution of pore pressure and temperature. Conventional analytical methods involve solving complex, strongly coupled multifield equations, which requires significant computational effort. To obtain the distributions of pore pressure and temperature rapidly and accurately, the Pix2Pix model, which is well known for its capabilities in image generation, is adopted in this work. The open-source dataset used herein features RGB images we generated using a sophisticated coupled model, while the grayscale images encapsulate the 15 principal variables influencing spalling. After a series of tests with different layer configurations, activation functions, and loss functions, a Pix2Pix model suitable for assessing the spalling risk of fire-loaded concrete was designed and trained. The applicability and reliability of the Pix2Pix model in concrete parameter prediction are verified by comparing its outcomes with those derived from the strongly coupled THC model. Notably, for practical engineering applications, our findings indicate that using monochrome images as the initial target for analysis yields more dependable results. This work offers insights for civil engineers specializing in concrete structures and establishes a methodological approach for researchers seeking to create similar predictive models.
Funding: Supported by the National Natural Science Foundation of China (No. 61976080), the Academic Degrees & Graduate Education Reform Project of Henan Province (No. 2021SJGLX195Y), the Teaching Reform Research and Practice Project of Henan Undergraduate Universities (No. 2022SYJXLX008), and the Key Project on Research and Practice of Henan University Graduate Education and Teaching Reform (No. YJSJG2023XJ006).
Abstract: Unsupervised multi-modal image translation is an emerging area of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain. However, the advanced approaches available employ a multi-generator mechanism to model different domain mappings, which results in inefficient training of neural networks and mode collapse, and in turn limits the diversity of the generated images. To address this issue, this paper introduces a multi-modal unsupervised image translation framework that uses a single generator to perform multi-modal image translation. First, a domain code is introduced to explicitly control the different generation tasks. Second, the squeeze-and-excitation (SE) mechanism and a feature attention (FA) module are incorporated. Finally, the model integrates multiple optimization objectives to ensure efficient multi-modal translation. Qualitative and quantitative experiments on multiple unpaired benchmark image translation datasets demonstrate the benefits of the proposed method over existing technologies. Overall, the experimental results show that the proposed method is versatile and scalable.
Funding: Supported by the National Natural Science Foundation of China under Grants No. 62027803, No. 61701095, No. 61601096, No. 61801089, and No. 61971111; the Science and Technology Program of Sichuan under Grants No. 2020YFG0044, No. 2020YFG0046, and No. 2021YFG0200; the Science and Technology Program under Grant No. 2021-JCJQ-JJ-0949; and the Defense Industrial Technology Development Program under Grant No. JCKY2020110C041.
Abstract: The marine biological sonar systems that have evolved through natural selection are far superior to current artificial sonar. Therefore, the development of bionic underwater concealed detection is of great strategic significance to the military and the economy. In this paper, a generative adversarial network (GAN) is trained on a dolphin vocal sound dataset that we constructed, and it achieves unsupervised generation of dolphin vocal sounds with global consistency. Analysis of the generated and real audio samples in the time domain and the frequency domain shows that the generated samples are close to the real ones, which meets the requirements of bionic underwater concealed detection.
Abstract: Deep learning has achieved many successes in video processing. Video has become an increasingly important part of our daily digital interactions. The growth of higher-resolution content and its large volume pose serious challenges to receiving, distributing, compressing, and displaying high-quality video content. In this paper, we propose a novel, effective, and efficient video compression method using a deep learning framework based on the flask, which combines deep learning techniques built on Convolutional Neural Networks (CNN) and Generative Adversarial Networks (GAN). In the video compression method, the layers are divided into different groups for data processing: a CNN removes duplicate frames, a single image is repeated in place of the duplicate images, and minute changes are recognized and detected using a GAN and recorded with Long Short-Term Memory (LSTM). Instead of the complete image, the small changes generated using the GAN are substituted, which helps with frame-level compression. Pixel-wise comparison is performed over the frame using K-Nearest Neighbours (KNN), the results are clustered with K-means, and Singular Value Decomposition (SVD) is applied to every frame in the video for all three colour channels [Red, Green, Blue] to decrease the dimension of the utility matrix [R, G, B] by extracting its latent factors. Video frames are packed with parameters with the aid of a codec and converted to video format, and the results are compared with the original video. Repeated experiments on several videos with different sizes, durations, frames per second (FPS), and quality demonstrated a significant resampling rate. On average, the output showed around a 10% deviation in quality and a reduction of over half in size compared with the original video.
Abstract: Cancelable biometrics is the solution for the trade-off between two concepts: Biometrics for Security and Security for Biometrics. The cancelable template is stored in the authentication system’s database rather than the original biometric data. If the database is compromised, the template can easily be canceled and regenerated from the same biometric data. Recoverability of the cancelable template comes from the diversity of the cancelable transformation parameters (the cancelable key). Therefore, the cancelable key must be kept secret so that it can be used in the system authentication process as a second authentication factor in conjunction with the biometric data. The main contribution of this paper is to tackle the risks of stolen, lost, or shared cancelable keys by using the biometric trait (in different feature domains) as the only authentication factor, while also achieving good performance with high security. The standard Generative Adversarial Network (GAN) is proposed as an encryption tool that needs the cancelable key during the training phase, while the testing phase depends only on the biometric trait. Additionally, a random projection transformation is employed to increase the proposed system’s security and performance. The proposed transformation system is tested using the standard ORL face database, and the experiments are done by applying different feature domains. Moreover, a security analysis of the proposed transformation system is presented.
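As background for the random projection transformation mentioned above, the sketch below shows a generic keyed random projection of a feature vector. It illustrates the classical cancelable-template idea only and is not the paper's GAN-based scheme; the function name, the integer `key`, and the Gaussian projection matrix are assumptions made for this example.

```python
import math
import random
from typing import List


def random_projection(features: List[float], key: int, out_dim: int) -> List[float]:
    """Project a feature vector with a Gaussian matrix derived from `key`.

    The same key always regenerates the same matrix, so a template can be
    reproduced; issuing a new key yields an unrelated template from the
    same biometric features, which is what makes the template cancelable.
    """
    rng = random.Random(key)  # the key seeds the projection matrix
    scale = 1.0 / math.sqrt(out_dim)
    template = []
    for _ in range(out_dim):
        row = [rng.gauss(0.0, 1.0) for _ in range(len(features))]
        template.append(scale * sum(w * x for w, x in zip(row, features)))
    return template


face_features = [0.12, 0.80, 0.33, 0.05, 0.61]
old_template = random_projection(face_features, key=1234, out_dim=3)
new_template = random_projection(face_features, key=9876, out_dim=3)
```

Revoking a compromised template then amounts to discarding the old key and enrolling again with a new one.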
Abstract: Building Integrated Photovoltaics (BIPV) is a promising technology to decarbonize urban energy systems by harnessing the solar energy available on building envelopes. While methods to assess solar irradiation, especially on rooftops, are well established, assessment on building facades usually involves greater effort due to more complex urban features and obstructions. The drawback of existing physics-based simulation programs is that they require significant manual modeling effort and computing time to generate time-resolved deterministic results. Yet solar irradiation is highly intermittent, and representing its inherent uncertainty may be required for designing robust BIPV energy systems. Targeting these drawbacks, this paper proposes a data-driven model based on Deep Generative Networks (DGN) to efficiently generate stochastic ensembles of annual hourly solar irradiance time series on building facades with uncompromised spatiotemporal resolution at the urban scale. The only inputs required are easily obtainable fisheye images, used as categorical shading masks, captured from 3D models. In principle, even actual photographs of urban contexts can be used, provided they are semantically segmented. The potential of our approach is that it may be applied as a surrogate for time-consuming simulations, used when information is lacking (e.g., no 3D model exists), and used to feed the generated stochastic time-series ensembles into robust energy systems planning. Our validations show good fidelity of the generated time series when compared to the physics-based simulator. Due to the nature of the DGNs used, it remains an open challenge to precisely reconstruct the ground truth one-to-one for each hour of the year. However, we consider the benefits of the approach to outweigh the shortcomings. To demonstrate the model’s relevance for urban energy planning, we showcase its potential for generative design by parametrically altering characteristic features of the urban environment and producing corresponding time series on building facades under different climatic contexts in real time.
Abstract: Super-resolution reconstruction in medical imaging has become more demanding due to the necessity of obtaining high-quality images with minimal radiation dose, such as in low-field magnetic resonance imaging (MRI). However, image super-resolution reconstruction remains a difficult task because of the complexity and high textural requirements of diagnosis. In this paper, we offer a deep learning based strategy for reconstructing medical images from low resolutions using a Transformer and generative adversarial networks (T-GAN). By inserting the Transformer into the generative adversarial network for image reconstruction, the integrated system can extract more precise texture information and focus more on important locations through global image matching. Furthermore, we use a weighted combination of content loss, adversarial loss, and adversarial feature loss as the final multi-task loss function during the training of our proposed T-GAN model. In terms of established measures such as peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM), our proposed T-GAN achieves optimal performance and recovers more texture features in super-resolution reconstruction of MRI scans of the knee and abdomen.
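For readers unfamiliar with the PSNR measure cited above, the standard definition is 10·log10(MAX² / MSE). The sketch below computes it for two images given as flat sequences of pixel values; the default `max_value` of 255 assumes 8-bit images, which may not match the MRI data used in the paper.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in decibels between two equally sized images.

    Both images are passed as flat sequences of pixel intensities.
    Identical images have zero error, for which PSNR is infinite.
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel of the reconstruction is off by one grey level, so MSE = 1.
score = psnr([10, 10, 10, 10], [11, 11, 11, 11])
```

Higher values indicate a reconstruction closer to the reference; SSIM complements PSNR by comparing local structure rather than per-pixel error.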
Funding: National Natural Science Foundation of China (No. 61602416); Shaoxing Science and Technology Bureau Key Project (No. 2020B41006); Opening Fund (No. 2020WLB10) of the Key Laboratory of Silk Culture Heritage and Product Design Digital Technology.
Abstract: Ambient occlusion (AO) is a widely used real-time rendering technique that estimates light intensity on visible scene surfaces. Recently, a number of learning-based AO approaches have been proposed, which bring a new angle to solving screen-space shading via a unified learning framework with competitive quality and speed. However, most such methods have high error for complex scenes or tend to ignore details. We propose an end-to-end generative adversarial network for the production of realistic AO, and explore the importance of perceptual loss in the generative model to AO accuracy. An attention mechanism is also described to improve the accuracy of details, and its effectiveness is demonstrated on a wide variety of scenes.
Funding: Supported by grants from the National Key Research and Development Project (2018YFC1704605), the National Natural Science Foundation of China (81401398), the Sichuan Science and Technology Program (2019YJ0049), the Sichuan Provincial Health and Family Planning Commission (19PJ080), the National College Students’ Innovation and Entrepreneurship Training Program (C2021116624), and the Chinese Postdoctoral Science Foundation (2013M530401). Dr Gong was also supported by the US-China joint grant (Grant No. NSFC81761128023) and NIH/NIMH R01MH112189-01.
Abstract: Alzheimer’s disease (AD) is a neurodegenerative disease that severely affects the activities of daily living in aged individuals and typically needs to be diagnosed at an early stage. Generative adversarial networks (GANs) provide a new deep learning method that shows good performance in image processing, while it remains to be verified whether a GAN brings benefit in AD diagnosis. The purpose of this research is to systematically review psychoradiological studies on the application of GANs in the diagnosis of AD, covering classification of AD state and AD-related image processing, in comparison with other methods. In addition, we evaluated the research methodology and provided suggestions from the perspective of clinical application. Compared with other methods, a GAN has higher accuracy in the classification of AD state and better performance in AD-related image processing (e.g., image denoising and segmentation). Most studies used data from public databases but lacked clinical validation, and the process of quantitative assessment and comparison in these studies lacked clinicians’ participation, which may affect improvements in the generation quality and generalization ability of the GAN model. The application value of GANs in the classification of AD state and AD-related image processing has been confirmed in the reviewed studies. Methods for improving GAN architectures are also discussed in this paper. In sum, the present study demonstrated the advancing diagnostic performance and clinical applicability of GANs for AD, and suggested that future researchers should consider recruiting clinicians to compare the algorithm with clinicians’ manual methods and evaluate the clinical effect of the algorithm.
Funding: Supported by the Scientific Research Project of Shanghai Science and Technology Commission (No. 18DZ1205603), the Science Research Plan of Shanghai Municipal Science and Technology Committee (No. 20DZ1201300), and the National Key Research and Development Program of China (No. 2017YFC0806100).
Abstract: With the increasing demands of health care, the design of hospital buildings has become increasingly demanding and complicated. However, the traditional layout design method for hospitals is labor intensive, time consuming, and prone to errors. With the development of artificial intelligence (AI), intelligent design methods have become possible and are considered suitable for the layout design of hospital buildings. Two intelligent design processes, based on healthcare systematic layout planning (HSLP) and a generative adversarial network (GAN), are proposed in this paper to solve the problem of generating the plane functional layout of the operating departments (ODs) of general hospitals. The first design method, which is closer to a mathematical model with a traditional optimization algorithm, involves two steps: (1) developing the HSLP model based on conventional systematic layout planning (SLP) theory, identifying the relationships and flows among the various departments and units, and arriving at a preliminary plane layout design; (2) establishing a mathematical model to optimize the building layout using the genetic algorithm (GA) to obtain the optimized scheme. The second intelligent design process, based on more than 100 sets of collected OD drawings, includes labelling the corresponding functional layouts of each OD plan and building an image-to-image translation model with a conditional adversarial network (pix2pix), one of the most representative GAN models, for training on OD plane layouts. Finally, the functions and features of the results generated by the two methods are analyzed and compared from architectural and algorithmic perspectives. Comparison of the two design methods shows that the HSLP and GAN models can autonomously generate new OD plane functional layouts. The HSLP layouts have clear functional area adjacencies and optimization goals, but the layouts are relatively rigid and not specific enough. The GAN outputs are the most innovative layouts with strong applicability, but the dataset has strict constraints. The goal of this paper is to help relieve the heavy workload of architects in the early design stage and to present the effectiveness of these intelligent design methods in the field of medical architecture.
Funding: Supported by the National Natural Science Foundation of China under Grant 61772406 and Grant 61941105, in part by the projects of the Fundamental Research Funds for the Central Universities, and by the Innovation Fund of Xidian University under Grant 500120109215456.
Abstract: The fifth-generation (5G) mobile communication system has seen rapid development and deployment, reshaping our daily lives. However, anomalies such as cell outages and congestion in 5G critically affect the quality of experience and significantly increase operational expenditures. Although several anomaly detection methods based on big data and artificial intelligence have been proposed for wireless cellular systems, they change the distributions of the data and ignore the relevance among user activities, making anomaly detection ineffective for some cells. In this paper, we propose a highly effective and accurate anomaly detection framework using generative adversarial networks (GAN) and long short-term memory (LSTM) neural networks. The framework expands the original dataset while keeping the distribution of the data unchanged, and explores the relevance among user activities to further improve system performance. The results demonstrate that our framework can achieve 97.16% accuracy and a 2.30% false positive rate by using the correlation of user activities and data expansion.
Funding: Taif University Researchers Supporting Project Number (TURSP-2020/126), Taif University, Taif, Saudi Arabia.
Abstract: Data stream classification currently plays a key role in big data analytics because of the enormous growth of data. Most existing classification methods use ensemble learning, which is trustworthy, but these methods are not effective at learning from imbalanced big data, and they also assume that all data are pre-classified. Another weakness of current methods is that evaluation takes a long time when the target data stream contains a large number of features. The main objective of this research is to develop a new method for incremental learning based on the proposed ant lion fuzzy-generative adversarial network model. The proposed model is implemented on the Spark architecture. For each data stream, the class output is computed at slave nodes by training a generative adversarial network with the back-propagation error based on fuzzy bound computation. This method overcomes the limitations of existing methods, as it can classify data streams that are partially or completely unlabeled while providing high scalability and efficiency. The results show that the proposed model outperforms state-of-the-art methods in terms of accuracy (0.861), precision (0.9328), and minimal MSE (0.0416).
Funding: Supported by the National Natural Science Foundation of China (No. 61602416) and the Shaoxing Science and Technology Plan Project (No. 2020B41006).
Abstract: Learning-based approaches have made substantial progress in capturing spatially-varying bidirectional reflectance distribution functions (SVBRDFs) from a single image with unknown lighting and geometry. However, most existing networks only consider per-pixel losses, which limits their capability to recover local features such as smooth glossy regions. A few generative adversarial networks use multiple discriminators for different parameter maps, increasing network complexity. We present a novel end-to-end generative adversarial network (GAN) to recover appearance from a single picture of a nearly flat surface lit by flash. We use a single unified adversarial framework for each parameter map. An attention module guides the network to focus on details of the maps. Furthermore, the SVBRDF map loss is combined to prevent paying excess attention to specular highlights. We demonstrate and evaluate our method on both public datasets and real data. Quantitative analysis and visual comparisons indicate that our method achieves better results than the state of the art in most cases.
Abstract: Deep-Fake is an emerging technology used in synthetic media that replaces individuals in existing images and videos with someone else’s likeness. This paper presents a comparative study of different deep neural networks employed for Deep-Fake video detection. In the model, features are extracted from the training data with the intended Convolutional Neural Network model to form feature vectors, which are further analysed using a dense layer, a Long Short-Term Memory (LSTM) network, and a Gated Recurrent Unit (GRU), adopting transfer learning with fine-tuning to train the models. The model is evaluated on detecting artificial-intelligence-based Deep-Fake images and videos using benchmark datasets. Comparative analysis shows that the detections are strongly biased towards the domain of the dataset, but there is a noteworthy improvement in the model performance parameters when using transfer learning, whereas the Convolutional-Recurrent Neural Network has benefits in sequence detection.