Funding: National Natural Science Foundation of China (Grant Nos. 61906060, 62076217, and 62120106008); National Key R&D Program of China (No. 2016YFC0801406); Natural Science Foundation of the Jiangsu Higher Education Institutions (No. 20KJB520007).
Abstract: Nowadays, personalized recommendation has become a research hotspot for addressing information overload. Despite this, generating effective recommendations from sparse data remains a challenge. Recently, auxiliary information has been widely used to address data sparsity, but most models that use auxiliary information are linear and have limited expressiveness. Owing to their advantages in feature extraction and the absence of a labeling requirement, autoencoder-based methods have become quite popular. However, most existing autoencoder-based methods discard the reconstruction of auxiliary information, which poses huge challenges for better representation learning and model scalability. To address these problems, we propose the Serial-Autoencoder for Personalized Recommendation (SAPR), which aims to reduce the loss of critical information and enhance the learning of feature representations. Specifically, we first combine the original rating matrix and item attribute features and feed them into the first autoencoder to generate a higher-level representation of the input. Second, we use a second autoencoder to enhance the reconstruction of the data representation and produce the predicted rating matrix. The output rating information is used for recommendation prediction. Extensive experiments on the MovieTweetings and MovieLens datasets verify the effectiveness of SAPR compared to state-of-the-art models.
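To make the serial structure concrete, here is a minimal PyTorch sketch of a two-autoencoder pipeline in the spirit of SAPR; the layer sizes, activations, and the exact way ratings and item attributes are combined are illustrative assumptions, not the paper's configuration.

```python
# A minimal sketch of a serial two-autoencoder recommender. Hyperparameters
# and the masked rating loss are assumptions for illustration only.
import torch
import torch.nn as nn

class SerialAutoencoder(nn.Module):
    def __init__(self, n_users: int, n_attrs: int, hidden: int = 256, code: int = 64):
        super().__init__()
        d_in = n_users + n_attrs  # per-item rating vector concatenated with attributes
        # First autoencoder: learns a higher-level representation of the input.
        self.enc1 = nn.Sequential(nn.Linear(d_in, hidden), nn.Sigmoid())
        self.dec1 = nn.Sequential(nn.Linear(hidden, d_in), nn.Sigmoid())
        # Second autoencoder: refines the representation and reconstructs ratings.
        self.enc2 = nn.Sequential(nn.Linear(hidden, code), nn.Sigmoid())
        self.dec2 = nn.Sequential(nn.Linear(code, n_users), nn.Sigmoid())

    def forward(self, ratings: torch.Tensor, attrs: torch.Tensor):
        x = torch.cat([ratings, attrs], dim=1)
        h1 = self.enc1(x)
        recon = self.dec1(h1)              # reconstructs ratings + auxiliary attributes
        pred = self.dec2(self.enc2(h1))    # predicted rating vector
        return x, recon, pred

def loss_fn(x, recon, pred, ratings, mask):
    # Reconstruct the full input (including auxiliary information), but score
    # rating predictions only on observed entries (mask marks rated positions).
    mse = nn.functional.mse_loss
    return mse(recon, x) + mse(pred * mask, ratings * mask)
```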
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61906060, 62076217, and 62120106008); the Yangzhou University Interdisciplinary Research Foundation for Animal Husbandry Discipline of Targeted Support (yzuxk202015); the Opening Foundation of Key Laboratory of Huizhou Architecture in Anhui Province (HPJZ-2020-02); and the Open Project Program of Joint International Research Laboratory of Agriculture and Agri-Product Safety (JILAR-KF202104).
Abstract: The purpose of unsupervised domain adaptation is to use knowledge from a source domain, whose data distribution differs from that of the target domain, to promote the learning task in the target domain. The key bottleneck in unsupervised domain adaptation is how to obtain higher-level, more abstract feature representations between the source and target domains that can bridge the chasm of domain discrepancy. Recently, deep learning methods based on autoencoders have achieved sound performance in representation learning, and many dual or serial autoencoder-based methods take different characteristics of the data into consideration to improve the effectiveness of unsupervised domain adaptation. However, most existing autoencoder-based methods just serially connect the features generated by different autoencoders, which poses challenges for discriminative representation learning and fails to find the real cross-domain features. To address this problem, we propose a novel representation learning method based on integrated autoencoders for unsupervised domain adaptation, called IAUDA. To capture the inter- and inner-domain features of the raw data, two different autoencoders, namely a marginalized autoencoder with maximum mean discrepancy (mAE) and a convolutional autoencoder (CAE), are proposed to learn different feature representations. After higher-level features are obtained by these two autoencoders, a sparse autoencoder is introduced to compact these inter- and inner-domain representations. In addition, a whitening layer is embedded to process features before the mAE, reducing redundant features inside a local area. Experimental results demonstrate the effectiveness of our proposed method compared with several state-of-the-art baseline methods.
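The maximum mean discrepancy term that aligns the two domains can be written down compactly. The following self-contained PyTorch sketch computes a biased estimate of squared MMD with an RBF kernel; the kernel choice and bandwidth are assumptions for illustration, not the paper's exact formulation.

```python
# Squared MMD between source and target feature batches with an RBF kernel.
import torch

def rbf_kernel(a: torch.Tensor, b: torch.Tensor, gamma: float) -> torch.Tensor:
    # k(x, y) = exp(-gamma * ||x - y||^2), computed for all pairs.
    sq_dists = torch.cdist(a, b) ** 2
    return torch.exp(-gamma * sq_dists)

def mmd2(source: torch.Tensor, target: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
    # Biased estimate of squared MMD:
    # MMD^2 = E[k(s, s')] - 2 E[k(s, t)] + E[k(t, t')]
    k_ss = rbf_kernel(source, source, gamma).mean()
    k_st = rbf_kernel(source, target, gamma).mean()
    k_tt = rbf_kernel(target, target, gamma).mean()
    return k_ss - 2 * k_st + k_tt

# Usage: add lambda * mmd2(f_src, f_tgt) to the autoencoder's reconstruction
# loss so that the learned features bridge the domain discrepancy.
```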
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62076217 and 61906060) and the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education, China (IRT17R32).
Abstract: Unsupervised text simplification has attracted much attention due to the scarcity of high-quality parallel text simplification corpora. Recently, an unsupervised statistical text simplification method based on a phrase-based machine translation system (UnsupPBMT) achieved good performance; it initializes the phrase tables using similar words obtained by word embedding modeling. Since word embedding modeling only considers the relevance between words, the phrase tables in UnsupPBMT contain many dissimilar words. In this paper, we propose an unsupervised statistical text simplification method that uses the pre-trained language model BERT for initialization. Specifically, we use BERT as a general linguistic knowledge base for predicting similar words. Experimental results show that our method outperforms state-of-the-art unsupervised text simplification methods on three benchmarks and even outperforms some supervised baselines.
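As a rough illustration of using BERT as a linguistic knowledge base for similar-word prediction, the sketch below masks a word in context and collects the top predicted tokens. It assumes the Hugging Face transformers fill-mask pipeline and bert-base-uncased; the model choice and candidate count are not the paper's exact setup.

```python
# Predicting in-context similar words with BERT's masked-word distribution.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def similar_words(sentence: str, word: str, top_k: int = 10):
    # Mask the word in context so BERT proposes in-context substitutes,
    # unlike word-embedding neighbours that are merely related words.
    masked = sentence.replace(word, fill_mask.tokenizer.mask_token, 1)
    candidates = fill_mask(masked, top_k=top_k)
    return [c["token_str"] for c in candidates if c["token_str"] != word]

print(similar_words("The committee will scrutinize the proposal.", "scrutinize"))
```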
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62076217 and 61906060) and the Blue Project of Yangzhou University.
Abstract: 1 Introduction. Lexical simplification (LS) aims to simplify a sentence by replacing complex words with simpler words without changing the meaning of the sentence, which can facilitate text comprehension for non-native speakers and children. Traditional LS methods utilize linguistic databases (e.g., WordNet) [1] or word embedding models [2] to extract synonyms or highly similar words for the complex word, and then rank them based on their appropriateness in context. Recently, BERT-based LS methods [3,4] entirely or partially mask the complex word in the original sentence and then feed the sentence into the pretrained language model BERT [5] to obtain the top-probability tokens corresponding to the masked word as substitute candidates. They have made remarkable progress in generating substitutes by making full use of the contextual information of complex words, which effectively alleviates the shortcomings of traditional methods.
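A minimal sketch of the masking step described above, assuming the sentence-pair variant in which the original sentence and its masked copy are fed to BERT together so the model sees both the context and the complex word; the model choice and candidate count are illustrative, not taken from [3,4].

```python
# Generating substitute candidates for a complex word with masked BERT.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def substitute_candidates(sentence: str, complex_word: str, top_k: int = 10):
    masked = sentence.replace(complex_word, tokenizer.mask_token, 1)
    # Encode as a sentence pair: original sentence, then the masked copy.
    inputs = tokenizer(sentence, masked, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0]
    top_ids = logits[0, mask_pos].topk(top_k).indices.squeeze(0)
    return tokenizer.convert_ids_to_tokens(top_ids.tolist())

print(substitute_candidates("The cat perched on the mat.", "perched"))
```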
Funding: This research was partially supported by the National Natural Science Foundation of China (Grant Nos. 62076217 and U22B2037) and the Blue Project of Yangzhou University.
Abstract: 1 Introduction. Recent advancements in encoder-decoder based text generation technology, such as ChatGPT by OpenAI and PaLM [1] by Google, have garnered attention in the AI community. Pay-per-use APIs offer access to these models, but research shows they are prone to imitation attacks, in which malicious users train their own models on responses obtained from lawful APIs through skillfully crafted queries. Such attacks violate intellectual property (IP) rights and deter further research [2]. Recent work introduced lexical watermarking (LW) methods to protect legal APIs' IP. LW modifies the original outputs and uses a null-hypothesis test for ownership verification on imitation models [2,3]. High-frequency words are selected and replaced with WordNet synonyms, but this one-size-fits-all approach neglects rational substitutes.
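The following toy sketch illustrates the general LW idea under stated assumptions: a hypothetical high-frequency word list, WordNet (via NLTK) for fixed synonym substitutes, and a one-sided binomial test standing in for the null-hypothesis test; the cited methods select words and verify ownership more carefully.

```python
# Toy lexical watermarking: substitute fixed WordNet synonyms for selected
# high-frequency words, then test whether a suspect model over-uses them.
# Requires: nltk.download("wordnet")
from nltk.corpus import wordnet as wn
from scipy.stats import binomtest

HIGH_FREQ = ["movie", "big", "buy"]  # hypothetical high-frequency words

def first_synonym(word: str):
    # Pick the first WordNet lemma that differs from the word itself.
    for syn in wn.synsets(word):
        for lemma in syn.lemma_names():
            if lemma.lower() != word.lower():
                return lemma.replace("_", " ")
    return None

WATERMARKED = {w: s for w in HIGH_FREQ if (s := first_synonym(w)) is not None}

def watermark(text: str) -> str:
    # Apply the fixed substitution map to the API's output text.
    return " ".join(WATERMARKED.get(tok, tok) for tok in text.split())

def verify(suspect_tokens, p_natural: float = 0.3, alpha: float = 0.01) -> bool:
    # Null hypothesis: the suspect uses the watermark synonyms at the natural
    # rate p_natural (an assumed constant); a significantly higher rate among
    # the relevant positions suggests the model imitated watermarked outputs.
    hits = sum(tok in WATERMARKED.values() for tok in suspect_tokens)
    relevant = hits + sum(tok in WATERMARKED for tok in suspect_tokens)
    if relevant == 0:
        return False
    return binomtest(hits, relevant, p_natural, alternative="greater").pvalue < alpha
```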