The Internet has become one of the significant sources for sharing information and expressing users’opinions about products and their interests with the associated aspects.It is essential to learn about product revie...The Internet has become one of the significant sources for sharing information and expressing users’opinions about products and their interests with the associated aspects.It is essential to learn about product reviews;however,to react to such reviews,extracting aspects of the entity to which these reviews belong is equally important.Aspect-based Sentiment Analysis(ABSA)refers to aspects extracted from an opinionated text.The literature proposes different approaches for ABSA;however,most research is focused on supervised approaches,which require labeled datasets with manual sentiment polarity labeling and aspect tagging.This study proposes a semisupervised approach with minimal human supervision to extract aspect terms by detecting the aspect categories.Hence,the study deals with two main sub-tasks in ABSA,named Aspect Category Detection(ACD)and Aspect Term Extraction(ATE).In the first sub-task,aspects categories are extracted using topic modeling and filtered by an oracle further,and it is fed to zero-shot learning as the prompts and the augmented text.The predicted categories are the input to find similar phrases curated with extracting meaningful phrases(e.g.,Nouns,Proper Nouns,NER(Named Entity Recognition)entities)to detect the aspect terms.The study sets a baseline accuracy for two main sub-tasks in ABSA on the Multi-Aspect Multi-Sentiment(MAMS)dataset along with SemEval-2014 Task 4 subtask 1 to show that the proposed approach helps detect aspect terms via aspect categories.展开更多
The optical character recognition for the right to left and cursive languages such as Arabic is challenging and received little attention from researchers in the past compared to the other Latin languages.Moreover,the...The optical character recognition for the right to left and cursive languages such as Arabic is challenging and received little attention from researchers in the past compared to the other Latin languages.Moreover,the absence of a standard publicly available dataset for several low-resource lan-guages,including the Pashto language remained a hurdle in the advancement of language processing.Realizing that,a clean dataset is the fundamental and core requirement of character recognition,this research begins with dataset generation and aims at a system capable of complete language understanding.Keeping in view the complete and full autonomous recognition of the cursive Pashto script.The first achievement of this research is a clean and standard dataset for the isolated characters of the Pashto script.In this paper,a database of isolated Pashto characters for forty four alphabets using various font styles has been introduced.In order to overcome the font style shortage,the graphical software Inkscape has been used to generate sufficient image data samples for each character.The dataset has been pre-processed and reduced in dimensions to 32×32 pixels,and further converted into the binary format with a black background and white text so that it resembles the Modified National Institute of Standards and Technology(MNIST)database.The benchmark database is publicly available for further research on the standard GitHub and Kaggle database servers both in pixel and Comma Separated Values(CSV)formats.展开更多
Live video streaming is one of the newly emerged services over the Internet that has attracted immense interest of the service providers.Since Internet was not designed for such services during its inception,such a se...Live video streaming is one of the newly emerged services over the Internet that has attracted immense interest of the service providers.Since Internet was not designed for such services during its inception,such a service poses some serious challenges including cost and scalability.Peer-to-Peer(P2P)Internet Protocol Television(IPTV)is an application-level distributed paradigm to offer live video contents.In terms of ease of deployment,it has emerged as a serious alternative to client server,Content Delivery Network(CDN)and IP multicast solutions.Nevertheless,P2P approach has struggled to provide the desired streaming quality due to a number of issues.Stability of peers in a network is one of themajor issues among these.Most of the existing approaches address this issue through older-stable principle.This paper first extensively investigates the older-stable principle to observe its validity in different scenarios.It is observed that the older-stable principle does not hold in several of them.Then,it utilizes machine learning approach to predict the stability of peers.This work evaluates the accuracy of severalmachine learning algorithms over the prediction of stability,where the Gradient Boosting Regressor(GBR)out-performs other algorithms.Finally,this work presents a proof-of-concept simulation to compare the effectiveness of older-stable rule and machine learning-based predictions for the stabilization of the overlay.The results indicate that machine learning-based stability estimation significantly improves the system.展开更多
Sentiment analysis task has widely been studied for various languages such as English and French.However,Roman Urdu sentiment analysis yet requires more attention from peer-researchers due to the lack of Off-the-Shelf...Sentiment analysis task has widely been studied for various languages such as English and French.However,Roman Urdu sentiment analysis yet requires more attention from peer-researchers due to the lack of Off-the-Shelf Natural Language Processing(NLP)solutions.The primary objective of this study is to investigate the diverse machine learning methods for the sentiment analysis of Roman Urdu data which is very informal in nature and needs to be lexically normalized.To mitigate this challenge,we propose a fine-tuned Support Vector Machine(SVM)powered by Roman Urdu Stemmer.In our proposed scheme,the corpus data is initially cleaned to remove the anomalies from the text.After initial pre-processing,each user review is being stemmed.The input text is transformed into a feature vector using the bag-of-word model.Subsequently,the SVM is used to classify and detect user sentiment.Our proposed scheme is based on a dictionary based Roman Urdu stemmer.The creation of the Roman Urdu stemmer is aimed at standardizing the text so as to minimize the level of complexity.The efficacy of our proposed model is also empirically evaluated with diverse experimental configurations,so as to fine-tune the hyper-parameters and achieve superior performance.Moreover,a series of experiments are conducted on diverse machine learning and deep learning models to compare the performance with our proposed model.We also introduced the largest dataset on Roman Urdu,i.e.,Roman Urdu e-commerce dataset(RUECD),which contains 26K+user reviews annotated by the group of experts.The RUECD is challenging and the largest dataset available of Roman Urdu.The experiments show that the newly generated dataset is quite challenging and requires more attention from the peer researchers for Roman Urdu sentiment analysis.展开更多
文摘The Internet has become one of the significant sources for sharing information and expressing users’opinions about products and their interests with the associated aspects.It is essential to learn about product reviews;however,to react to such reviews,extracting aspects of the entity to which these reviews belong is equally important.Aspect-based Sentiment Analysis(ABSA)refers to aspects extracted from an opinionated text.The literature proposes different approaches for ABSA;however,most research is focused on supervised approaches,which require labeled datasets with manual sentiment polarity labeling and aspect tagging.This study proposes a semisupervised approach with minimal human supervision to extract aspect terms by detecting the aspect categories.Hence,the study deals with two main sub-tasks in ABSA,named Aspect Category Detection(ACD)and Aspect Term Extraction(ATE).In the first sub-task,aspects categories are extracted using topic modeling and filtered by an oracle further,and it is fed to zero-shot learning as the prompts and the augmented text.The predicted categories are the input to find similar phrases curated with extracting meaningful phrases(e.g.,Nouns,Proper Nouns,NER(Named Entity Recognition)entities)to detect the aspect terms.The study sets a baseline accuracy for two main sub-tasks in ABSA on the Multi-Aspect Multi-Sentiment(MAMS)dataset along with SemEval-2014 Task 4 subtask 1 to show that the proposed approach helps detect aspect terms via aspect categories.
文摘The optical character recognition for the right to left and cursive languages such as Arabic is challenging and received little attention from researchers in the past compared to the other Latin languages.Moreover,the absence of a standard publicly available dataset for several low-resource lan-guages,including the Pashto language remained a hurdle in the advancement of language processing.Realizing that,a clean dataset is the fundamental and core requirement of character recognition,this research begins with dataset generation and aims at a system capable of complete language understanding.Keeping in view the complete and full autonomous recognition of the cursive Pashto script.The first achievement of this research is a clean and standard dataset for the isolated characters of the Pashto script.In this paper,a database of isolated Pashto characters for forty four alphabets using various font styles has been introduced.In order to overcome the font style shortage,the graphical software Inkscape has been used to generate sufficient image data samples for each character.The dataset has been pre-processed and reduced in dimensions to 32×32 pixels,and further converted into the binary format with a black background and white text so that it resembles the Modified National Institute of Standards and Technology(MNIST)database.The benchmark database is publicly available for further research on the standard GitHub and Kaggle database servers both in pixel and Comma Separated Values(CSV)formats.
文摘Live video streaming is one of the newly emerged services over the Internet that has attracted immense interest of the service providers.Since Internet was not designed for such services during its inception,such a service poses some serious challenges including cost and scalability.Peer-to-Peer(P2P)Internet Protocol Television(IPTV)is an application-level distributed paradigm to offer live video contents.In terms of ease of deployment,it has emerged as a serious alternative to client server,Content Delivery Network(CDN)and IP multicast solutions.Nevertheless,P2P approach has struggled to provide the desired streaming quality due to a number of issues.Stability of peers in a network is one of themajor issues among these.Most of the existing approaches address this issue through older-stable principle.This paper first extensively investigates the older-stable principle to observe its validity in different scenarios.It is observed that the older-stable principle does not hold in several of them.Then,it utilizes machine learning approach to predict the stability of peers.This work evaluates the accuracy of severalmachine learning algorithms over the prediction of stability,where the Gradient Boosting Regressor(GBR)out-performs other algorithms.Finally,this work presents a proof-of-concept simulation to compare the effectiveness of older-stable rule and machine learning-based predictions for the stabilization of the overlay.The results indicate that machine learning-based stability estimation significantly improves the system.
基金the Deputy for Study and Innovation,Ministry of Education,Kingdom of Saudi Arabia,for funding this research through a Grant(NU/IFC/INT/01/008)from the Najran University Institutional Funding Committee.
文摘Sentiment analysis task has widely been studied for various languages such as English and French.However,Roman Urdu sentiment analysis yet requires more attention from peer-researchers due to the lack of Off-the-Shelf Natural Language Processing(NLP)solutions.The primary objective of this study is to investigate the diverse machine learning methods for the sentiment analysis of Roman Urdu data which is very informal in nature and needs to be lexically normalized.To mitigate this challenge,we propose a fine-tuned Support Vector Machine(SVM)powered by Roman Urdu Stemmer.In our proposed scheme,the corpus data is initially cleaned to remove the anomalies from the text.After initial pre-processing,each user review is being stemmed.The input text is transformed into a feature vector using the bag-of-word model.Subsequently,the SVM is used to classify and detect user sentiment.Our proposed scheme is based on a dictionary based Roman Urdu stemmer.The creation of the Roman Urdu stemmer is aimed at standardizing the text so as to minimize the level of complexity.The efficacy of our proposed model is also empirically evaluated with diverse experimental configurations,so as to fine-tune the hyper-parameters and achieve superior performance.Moreover,a series of experiments are conducted on diverse machine learning and deep learning models to compare the performance with our proposed model.We also introduced the largest dataset on Roman Urdu,i.e.,Roman Urdu e-commerce dataset(RUECD),which contains 26K+user reviews annotated by the group of experts.The RUECD is challenging and the largest dataset available of Roman Urdu.The experiments show that the newly generated dataset is quite challenging and requires more attention from the peer researchers for Roman Urdu sentiment analysis.