CC (Cloud Computing) networks are distributed and dynamic, as signals appear, disappear, or lose significance. MLTs (Machine Learning Techniques) train on datasets that are sometimes inadequate in sample size for inferring information. A dynamic strategy, DevMLOps (Development Machine Learning Operations), used for automatic selection and tuning of MLTs, results in significant performance differences. However, the scheme has several disadvantages, including the need for continuous training, more samples and training time for feature selection, and increased classification execution times. RFE (Recursive Feature Elimination) is computationally very expensive, as it traverses every feature without considering the correlations between them. This problem can be overcome by using wrappers, which select better features by accounting for both the test and train datasets. The aim of this paper is to use DevQLMLOps for automated tuning and selection based on orchestration and messaging between containers. The proposed AKFA (Adaptive Kernel Firefly Algorithm) selects features for CNM (Cloud Network Monitoring) operations. The AKFA methodology is demonstrated on the CNSD (Cloud Network Security Dataset), with satisfactory results in the performance metrics used: precision, recall, F-measure, and accuracy.
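The contrast drawn above between RFE and wrapper methods can be sketched in code. The following is an illustrative greedy forward wrapper, not the paper's AKFA: it scores candidate feature *subsets* with the model itself, so correlated features are judged jointly rather than eliminated one at a time. The scoring function here is a hypothetical stand-in for training and evaluating a classifier.

```python
# Hypothetical sketch of wrapper-based forward feature selection.
# score_subset is assumed to train/evaluate a model on the given feature subset.

def forward_wrapper_select(features, score_subset, max_features):
    """Greedily grow the feature subset that maximizes the wrapper score."""
    selected = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in features if f not in selected]
        scored = [(score_subset(selected + [f]), f) for f in candidates]
        score, best_f = max(scored)
        if score <= best_score:  # stop when no candidate improves the score
            break
        best_score, selected = score, selected + [best_f]
    return selected

# Toy score: rewards the pair {0, 2} to mimic an interaction between features
# that per-feature elimination (as in RFE) would not see.
def toy_score(subset):
    return (0 in subset) + (2 in subset) + (0 in subset and 2 in subset) - 0.01 * len(subset)

print(forward_wrapper_select(range(5), toy_score, 3))  # → [2, 0]
```

The toy objective illustrates why wrappers can beat per-feature ranking: features 0 and 2 are only valuable together, which the subset-level score captures.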
Data mining plays a crucial role in extracting meaningful knowledge from large-scale data repositories, such as data warehouses and databases. Association rule mining, a fundamental process in data mining, involves discovering correlations, patterns, and causal structures within datasets. In the healthcare domain, association rules offer valuable opportunities for building knowledge bases, enabling intelligent diagnoses, and extracting invaluable information rapidly. This paper presents a novel approach called the Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System (MLARMC-HDMS). The MLARMC-HDMS technique integrates classification and association rule mining (ARM) processes. Initially, the chimp optimization algorithm-based feature selection (COAFS) technique is employed within MLARMC-HDMS to select relevant attributes. Inspired by the foraging behavior of chimpanzees, the COA algorithm mimics their search strategy for food. Subsequently, the classification process utilizes stochastic gradient descent with a multilayer perceptron (SGD-MLP) model, while the Apriori algorithm determines attribute relationships. We propose a COA-based feature selection approach for medical data classification using machine learning techniques. This approach involves selecting pertinent features from medical datasets through COA and training machine learning models using the reduced feature set. We evaluate the performance of our approach on various medical datasets employing diverse machine learning classifiers. Experimental results demonstrate that our proposed approach surpasses alternative feature selection methods, achieving higher accuracy and precision rates in medical data classification tasks. The study showcases the effectiveness and efficiency of the COA-based feature selection approach in identifying relevant features, thereby enhancing the diagnosis and treatment of various diseases. To provide further validation, we conduct detailed experiments on a benchmark medical dataset, revealing the superiority of the MLARMC-HDMS model over other methods, with a maximum accuracy of 99.75%. Therefore, this research contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis. The presented MLARMC-HDMS framework and COA-based feature selection approach offer valuable insights for researchers and practitioners working in the field of healthcare data mining and machine learning.
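The Apriori step mentioned above can be made concrete with a minimal frequent-itemset pass. This is a textbook sketch, not the paper's implementation: it finds attribute sets that co-occur above a minimum support, the raw material for healthcare rules such as "symptom A and symptom B imply diagnosis C". The transactions below are hypothetical.

```python
# Minimal illustrative Apriori frequent-itemset mining (assumed textbook form).
from itertools import combinations

def apriori_frequent(transactions, min_support):
    items = sorted({i for t in transactions for i in t})
    n = len(transactions)

    def support(itemset):
        # fraction of transactions containing every item of the itemset
        return sum(set(itemset) <= t for t in transactions) / n

    frequent, k, current = {}, 1, [(i,) for i in items]
    while current:
        current = [c for c in current if support(c) >= min_support]
        for c in current:
            frequent[c] = support(c)
        # candidate generation: join frequent k-itemsets into (k+1)-itemsets
        current = sorted({tuple(sorted(set(a) | set(b)))
                          for a, b in combinations(current, 2)
                          if len(set(a) | set(b)) == k + 1})
        k += 1
    return frequent

# Hypothetical symptom transactions.
txns = [{"fever", "cough"}, {"fever", "cough", "fatigue"}, {"cough"}, {"fever"}]
freq = apriori_frequent(txns, min_support=0.5)
print(freq[("cough", "fever")])  # → 0.5
```

From such frequent itemsets, rules and their confidences follow by dividing supports, e.g. conf(cough ⇒ fever) = supp({cough, fever}) / supp({cough}).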
Feature Subset Selection (FSS) is an NP-hard problem of removing redundant and irrelevant features, particularly from medical data, and it can be effectively addressed by metaheuristic algorithms. However, existing binary versions of metaheuristic algorithms have issues with convergence and lack an effective binarization method, resulting in suboptimal solutions that hinder diagnosis and prediction accuracy. This paper proposes an Improved Binary Quantum-based Avian Navigation Optimizer Algorithm (IBQANA) for FSS in medical data preprocessing to address the suboptimal solutions arising from binary versions of metaheuristic algorithms. The proposed IBQANA's contributions include the Hybrid Binary Operator (HBO) and the Distance-based Binary Search Strategy (DBSS). HBO is designed to convert continuous values into binary solutions, even for values outside the [0, 1] range, ensuring accurate binary mapping. DBSS, on the other hand, is a two-phase search strategy that enhances the performance of inferior search agents and accelerates convergence. By combining exploration and exploitation phases based on an adaptive probability function, DBSS effectively avoids local optima. The effectiveness of HBO is compared with five transfer function families and thresholding on 12 medical datasets, with feature counts ranging from 8 to 10,509. IBQANA's effectiveness is evaluated in terms of accuracy, fitness, and selected features, and is compared with seven binary metaheuristic algorithms. Furthermore, IBQANA is applied to COVID-19 detection. The results reveal that the proposed IBQANA outperforms all comparative algorithms on the COVID-19 dataset and 11 other medical datasets. The proposed method presents a promising solution to the FSS problem in medical data preprocessing.
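For readers unfamiliar with the transfer-function baselines that HBO is compared against, here is a hedged sketch of the common S-shaped family: a continuous position value is squashed to a probability and stochastically thresholded to decide whether a feature is selected. The paper's HBO operator is a different, more elaborate mapping; this only illustrates the baseline idea.

```python
# Assumed standard S-shaped (sigmoid) transfer-function binarization,
# as used by many binary metaheuristics; not the paper's HBO operator.
import math
import random

def s_shaped_binarize(position, rng):
    """Map a continuous position vector to a binary feature mask."""
    mask = []
    for x in position:
        prob = 1.0 / (1.0 + math.exp(-x))  # sigmoid handles values outside [0, 1]
        mask.append(1 if rng.random() < prob else 0)
    return mask

rng = random.Random(0)
print(s_shaped_binarize([-6.0, 0.0, 6.0], rng))  # → [0, 0, 1] with this seed
```

Large negative positions almost never select the feature and large positive ones almost always do, which is why convergence behavior depends so strongly on the chosen transfer family.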
Multi-label learning deals with objects associated with multiple class labels and aims to induce a predictive model that can assign a set of relevant class labels to an unseen instance. Since each class might possess its own characteristics, the strategy of extracting label-specific features has been widely employed to improve the discrimination process in multi-label learning, where the predictive model is induced from tailored features specific to each class label instead of identical instance representations. As a representative approach, LIFT generates label-specific features by conducting clustering analysis. However, its performance may be degraded by the inherent instability of a single clustering algorithm. To improve this, a novel multi-label learning approach named SENCE (stable label-Specific features gENeration for multi-label learning via mixture-based Clustering Ensemble) is proposed, which stabilizes the generation process of label-specific features via clustering ensemble techniques. Specifically, more stable clustering results are obtained by first augmenting the original instance representation with cluster assignments from base clusterings and then fitting a mixture model via the expectation-maximization (EM) algorithm. Extensive experiments on eighteen benchmark data sets show that SENCE performs better than LIFT and other well-established multi-label learning algorithms.
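The core idea of LIFT-style label-specific features can be shown in a few lines. This is a simplified, assumed sketch (real LIFT clusters positives and negatives per label, typically with k-means): an instance's raw representation is replaced by its distances to cluster centers obtained from that label's positive and negative instances.

```python
# Simplified sketch of label-specific feature construction (assumed form):
# represent an instance by its distances to per-label cluster centers.
import math

def label_specific_features(x, pos_centers, neg_centers):
    """New representation of x for one label: distances to all centers."""
    dist = lambda a, b: math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    return [dist(x, c) for c in pos_centers + neg_centers]

pos = [(0.0, 0.0)]  # hypothetical centers from clustering this label's positives
neg = [(3.0, 4.0)]  # ... and this label's negatives
print(label_specific_features((0.0, 0.0), pos, neg))  # → [0.0, 5.0]
```

SENCE's contribution sits upstream of this step: by ensembling base clusterings via an EM-fitted mixture, the centers (and hence these distance features) become more stable across runs.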
The whale optimization algorithm (WOA) tends to fall into local optima and fails to converge quickly when solving complex problems. To address these shortcomings, an improved WOA (QGBWOA) is proposed in this work. First, quasi-opposition-based learning is introduced to enhance the ability of WOA to search for optimal solutions. Second, a Gaussian barebone mechanism is embedded to promote diversity and expand the scope of the solution space in WOA. To verify the advantages of QGBWOA, comparison experiments between QGBWOA and peer algorithms were carried out on CEC 2014 with dimensions 10, 30, 50, and 100 and on the CEC 2020 test suite with dimension 30. Furthermore, the performance results were analyzed statistically using the Wilcoxon signed-rank (WS) test, the Friedman test, and post hoc tests. The experimental results show that convergence accuracy and speed are remarkably improved. Finally, feature selection and multi-threshold image segmentation applications are demonstrated to validate the ability of QGBWOA to solve complex real-world problems, where QGBWOA proves its superiority over the compared algorithms on several evaluation metrics.
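Quasi-opposition-based learning, the first ingredient above, has a compact standard form worth spelling out. As commonly defined in the opposition-based learning literature (this is an assumed generic form, not necessarily the paper's exact variant): for a candidate x in [lb, ub], the quasi-opposite point is sampled uniformly between the interval midpoint and the opposite point lb + ub − x, which widens exploration around the opposite region.

```python
# Assumed standard quasi-opposition-based learning step (one dimension).
import random

def quasi_opposite(x, lb, ub, rng):
    """Sample a quasi-opposite point between the midpoint and the opposite of x."""
    mid = (lb + ub) / 2.0
    opp = lb + ub - x          # classic opposite point
    lo, hi = sorted((mid, opp))
    return rng.uniform(lo, hi)

rng = random.Random(42)
q = quasi_opposite(2.0, 0.0, 10.0, rng)
print(5.0 <= q <= 8.0)  # → True (between midpoint 5 and opposite point 8)
```

In an optimizer, each new candidate and its quasi-opposite are both evaluated and the fitter one is kept, which is what improves early-stage search coverage.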
Gait is a biological characteristic that defines the way people walk. Walking is one of the most significant activities of daily life and an indicator of physical condition. Surface electromyography (sEMG) is a weak bioelectric signal that reflects the functional state between the human muscles and the nervous system. Gait classifiers based on sEMG signals are widely used in analysing muscle diseases and as guidance for rehabilitation treatment. Several approaches have been established in the literature for gait recognition using conventional and deep learning (DL) techniques. This study designs an Enhanced Artificial Algae Algorithm with Hybrid Deep Learning based Human Gait Classification (EAAA-HDLGR) technique on sEMG signals. The EAAA-HDLGR technique extracts time-domain (TD) and frequency-domain (FD) features from the sEMG signals and fuses them. In addition, the EAAA-HDLGR technique exploits a hybrid deep learning (HDL) model for gait recognition. Finally, an EAAA-based hyperparameter optimizer is applied to the HDL model; the EAAA is mainly derived from the quasi-oppositional based learning (QOBL) concept, showing the novelty of the work. The classifier outcomes of the EAAA-HDLGR technique are examined under diverse aspects, and the results indicate the improvement brought by the EAAA-HDLGR technique. The results imply that the EAAA-HDLGR technique accomplishes improved gait recognition results with the inclusion of EAAA.
Web application fingerprint recognition is an effective security technology designed to identify and classify web applications, thereby enhancing the detection of potential threats and attacks. Traditional fingerprint recognition methods, which rely on preannotated feature matching, face inherent limitations due to the ever-evolving nature and diverse landscape of web applications. In response to these challenges, this work proposes an innovative web application fingerprint recognition method founded on clustering techniques. The method involves extensive data collection from the Tranco List, employing adjusted feature selection built upon Wappalyzer and noise reduction through truncated SVD dimensionality reduction. The core of the methodology lies in the application of the unsupervised OPTICS clustering algorithm, eliminating the need for preannotated labels. By transforming web applications into feature vectors and leveraging clustering algorithms, our approach accurately categorizes diverse web applications, providing comprehensive and precise fingerprint recognition. The experimental results, obtained on a dataset featuring various web application types, affirm the efficacy of the method, demonstrating its ability to achieve high accuracy and broad coverage. This novel approach not only distinguishes between different web application types effectively but also demonstrates superiority in terms of classification accuracy and coverage, offering a robust solution to the challenges of web application fingerprint recognition.
As big data, its technologies, and its applications continue to advance, the Smart Grid (SG) has become one of the most successful pervasive and fixed computing platforms that efficiently uses a data-driven approach and employs efficient information and communication technology (ICT) and cloud computing. As a result of the complicated architecture of cloud computing, the distinctive working of advanced metering infrastructures (AMI), and the use of sensitive data, it has become challenging to make the SG secure. Faults of the SG are categorized into two main categories: Technical Losses (TLs) and Non-Technical Losses (NTLs). Hardware failure, communication issues, ohmic losses, and energy burnout during transmission and propagation of energy are TLs. NTLs are human-induced errors for malicious purposes, such as attacks on sensitive data and electricity theft, along with tampering with AMI for bill reduction by fraudulent customers. This research proposes a data-driven methodology based on principles of computational intelligence as well as big data analysis to identify fraudulent customers based on their load profiles. In our proposed methodology, a hybrid Genetic Algorithm and Support Vector Machine (GA-SVM) model has been used to extract the relevant subset of feature data from a large and unsupervised public smart grid project dataset in London, UK, for theft detection. A subset of 26 out of 71 features is obtained, with a classification accuracy of 96.6%, compared to studies conducted on small and limited datasets.
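The GA side of a GA-SVM feature selector like the one above can be sketched as follows. This is an assumed generic form, not the paper's code: each chromosome is a bitstring over the 71 features, and a fitness function trades classifier accuracy against subset size. The accuracy term here is a hypothetical placeholder for training and scoring an SVM on the masked data.

```python
# Assumed sketch of GA components for wrapper feature selection (not the
# paper's implementation): bitstring masks, a weighted fitness, and mutation.
import random

def fitness(mask, accuracy_of, alpha=0.9):
    """Weighted objective: favor accuracy, lightly penalize large subsets."""
    n_selected = sum(mask)
    if n_selected == 0:
        return 0.0  # an empty subset cannot classify anything
    return alpha * accuracy_of(mask) + (1 - alpha) * (1 - n_selected / len(mask))

def mutate(mask, rate, rng):
    """Flip each bit independently with the given probability."""
    return [b ^ (rng.random() < rate) for b in mask]

rng = random.Random(1)
mask = [rng.randint(0, 1) for _ in range(8)]
child = mutate(mask, 0.1, rng)
print(len(child) == 8)  # → True
```

A full GA would add selection and crossover over a population of such masks; the reported 26-of-71 subset corresponds to the best mask found under an objective of this general shape.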
Pavement crack detection plays a crucial role in ensuring road safety and reducing maintenance expenses. Recent advancements in deep learning (DL) techniques have shown promising results in detecting pavement cracks; however, the selection of relevant features for classification remains challenging. In this study, we propose a new approach for pavement crack detection that integrates deep learning for feature extraction, the whale optimization algorithm (WOA) for feature selection, and random forest (RF) for classification. The performance of the models was evaluated using accuracy, recall, precision, F1 score, and area under the receiver operating characteristic curve (AUC). Our findings reveal that Model 2, which incorporates RF into the ResNet-18 architecture, outperforms baseline Model 1 across all evaluation metrics. Nevertheless, our proposed model, which combines ResNet-18 with both WOA and RF, achieves significantly higher accuracy, recall, precision, and F1 score compared to the other two models. These results underscore the effectiveness of integrating RF and WOA into ResNet-18 for pavement crack detection applications. We applied the proposed approach to a dataset of pavement images, achieving an accuracy of 97.16% and an AUC of 0.984. Our results demonstrate that the proposed approach surpasses existing methods for pavement crack detection, offering a promising solution for the automatic identification of pavement cracks. By leveraging this approach, potential safety hazards can be identified more effectively, enabling timely repairs and maintenance measures. Lastly, the findings of this study also emphasize the potential of integrating RF and WOA with deep learning for pavement crack detection, providing road authorities with the necessary tools to make informed decisions regarding road infrastructure maintenance.
Cryptocurrency price prediction has garnered significant attention due to the growing importance of digital assets in the financial landscape. This paper presents a comprehensive study on predicting future cryptocurrency prices using machine learning algorithms. Open-source historical data from various cryptocurrency exchanges is utilized. Interpolation techniques are employed to handle missing data, ensuring the completeness and reliability of the dataset. Four technical indicators are selected as features for prediction. The study explores the application of five machine learning algorithms to capture the complex patterns in the highly volatile cryptocurrency market. The findings demonstrate the strengths and limitations of the different approaches, highlighting the significance of feature engineering and algorithm selection in achieving accurate cryptocurrency price predictions. The research contributes valuable insights into the dynamic and rapidly evolving field of cryptocurrency price prediction, assisting investors and traders in making informed decisions amidst the challenges posed by the cryptocurrency market.
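The abstract does not name its four technical indicators, so as a hedged illustration of this kind of feature engineering, here is one representative indicator, the simple moving average (SMA), computed from hypothetical closing prices. Indicators like this turn a raw price series into model-ready features.

```python
# Illustrative technical-indicator feature: simple moving average (SMA).
# The price series below is hypothetical, not from any exchange dataset.

def sma(prices, window):
    """Simple moving average; returns one value per full window."""
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]

closes = [100.0, 102.0, 101.0, 105.0, 107.0]
print(sma(closes, 3))  # → [101.0, 102.666..., 104.333...]
```

In a prediction pipeline, several such indicator series (computed over different windows or formulas) are aligned by date and stacked into the feature matrix fed to the learning algorithm.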
In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
Multi-class classification can be solved by decomposing it into a set of binary classification problems according to some encoding rule, e.g., one-vs-one, one-vs-rest, or error-correcting output codes. Existing works solve these binary classification problems in the original feature space, which might be suboptimal, as different binary classification problems correspond to different positive and negative examples. In this paper, we propose to learn label-specific features for each decomposed binary classification problem in order to consider the specific characteristics contained in its positive and negative examples. Specifically, to generate the label-specific features, clustering analysis is conducted separately on the positive and negative examples in each decomposed binary data set to discover their inherent information, and label-specific features for an example are then obtained by measuring the similarity between it and all cluster centers. Experiments clearly validate the effectiveness of learning label-specific features for decomposition-based multi-class classification.
In the present work, we have employed machine learning (ML) techniques to evaluate ductile-brittle (DB) behaviors in intermetallic compounds (IMCs) which can form in magnesium (Mg) alloys. This procedure was mainly conducted by a proxy-based method, where the ratio of shear (G) to bulk (B) moduli was used as a proxy to identify whether a compound is ductile or brittle. Starting from compound information (composition and crystal structure) and moduli, as found in open databases (AFLOW), ML-based models were built, and those models were used to predict the moduli of other compounds and, accordingly, to foresee the ductile-brittle behaviors of these new compounds. The results reached in the present work showed that the built models can effectively capture the elastic moduli of new compounds. This was confirmed through moduli calculations done by density functional theory (DFT) on some compounds, where the DFT calculations were consistent with the ML predictions. A further confirmation of the reliability of the built ML models was obtained by relating the DB behavior in MgBe_(13) and MgPd_(2), as evaluated by the ML-predicted moduli, to the nature of chemical bonding in these two compounds, which in turn was investigated via the charge density distribution (CDD) and electron localization function (ELF) obtained by DFT methodology. The ML-evaluated DB behaviors of the two compounds were also consistent with the DFT calculations of CDD and ELF. These findings and confirmations gave legitimacy to the built model to be employed in further prediction processes. Indeed, as examples, the DB characteristics were investigated in IMCs that might form in three Mg alloy series, involving AZ, ZX and WE.
Pulmonary Hypertension (PH) is a global health problem that affects about 1% of the global population. Animal models of PH play a vital role in unraveling the pathophysiological mechanisms of the disease. The present study proposes a Kernel Extreme Learning Machine (KELM) model based on an improved Whale Optimization Algorithm (WOA) for predicting PH mouse models. The experimental results showed that the selected blood indicators, including Haemoglobin (HGB), Hematocrit (HCT), Mean Platelet Volume (MPV), Platelet Distribution Width (PDW), and Platelet–Large Cell Ratio (P-LCR), were essential for identifying PH mouse models using the feature selection method proposed in this paper. Remarkably, the method achieved 100.0% accuracy and 100.0% specificity in classification, demonstrating that our method has great potential to be used for evaluating and identifying mouse PH models.
In-line surface defect detection on flat metal surfaces is notoriously difficult due to obstacles such as high surface reflectivity, pseudo-defect interference, and random elastic deformation. This study evaluates an approach for detecting scratches on a metal surface in order to address these problems in the detection process. This paper proposes an improved Gauss-Laplace (LoG) operator combined with a deep learning technique for metal surface scratch identification, in order to overcome the difficulties that existing edge detection algorithms struggle to reduce noise and produce unclear edges. In the process of scratch identification, it is challenging to differentiate between scratch edges and interference edges. Therefore, local texture screening is performed by deep learning techniques that evaluate and identify scratch edges and interference edges based on the local texture characteristics of scratches. Experiments have proven that by combining the improved LoG operator with a deep learning strategy, the method can effectively detect image edges, distinguish between scratch edges and interference edges, and extract clear scratch information. Experiments on the six categories of metal scratches indicate that the proposed method achieved detection rates of 100% (rolled-in crazing), 94.4% (inclusion), 100% (patches), 100% (pitted), 100% (rolled), and 100% (scratches), respectively.
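As background for the operator named above, here is an illustrative sampled Laplacian-of-Gaussian (LoG) kernel in its textbook, unnormalized form; the paper's improved operator differs, and this sketch only shows the baseline. Convolving an image with such a kernel smooths noise (Gaussian part) while responding to intensity changes (Laplacian part), so zero-crossings of the response mark edges.

```python
# Illustrative sampled 2-D Laplacian-of-Gaussian kernel (textbook form,
# unnormalized); not the paper's improved operator.
import math

def log_kernel(size, sigma):
    """Return a size x size LoG kernel sampled on an integer grid (size odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            r2 = x * x + y * y
            s2 = sigma * sigma
            # (r^2 - 2*sigma^2)/sigma^4 * exp(-r^2 / (2*sigma^2))
            row.append(((r2 - 2 * s2) / (s2 * s2)) * math.exp(-r2 / (2 * s2)))
        kernel.append(row)
    return kernel

k = log_kernel(5, 1.0)
print(k[2][2] < 0)  # → True: the center of a LoG kernel is negative
```

The negative center surrounded by positive lobes is what makes the response change sign across an edge, and it is this raw edge map that the deep-learning texture screening described above then filters into scratch versus interference edges.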
Autism Spectrum Disorder (ASD) refers to a neuro-disorder in which an individual has long-lasting effects on communication and interaction with others. Advanced information technology employing artificial intelligence (AI) models has assisted in the early identification of ASD through pattern detection. Recent advances in AI models assist in the automated identification and classification of ASD, which helps to reduce the severity of the disease. This study introduces an automated ASD classification using an owl search algorithm with machine learning (ASDC-OSAML) model. The proposed ASDC-OSAML model majorly focuses on the identification and classification of ASD. To attain this, the presented ASDC-OSAML model follows a min-max normalization approach as a pre-processing stage. Next, the owl search algorithm-based feature selection (OSA-FS) model is used to derive feature subsets. Then, the beetle swarm antenna search (BSAS) algorithm with the Iterative Dichotomiser 3 (ID3) classification method is applied for ASD detection and classification. The design of the BSAS algorithm helps to determine the parameter values of the ID3 classifier. The performance analysis of the ASDC-OSAML model is performed using a benchmark dataset. An extensive comparison study highlighted the supremacy of the ASDC-OSAML model over recent state-of-the-art approaches.
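The min-max normalization pre-processing step named above is simple enough to show exactly. This sketch applies the standard formula, rescaling each feature column into [0, 1]; the constant-column convention (mapping to zeros) is an assumption for illustration.

```python
# Standard min-max normalization of one feature column into [0, 1].

def min_max_normalize(column):
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant column: map to zeros (a convention)
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

print(min_max_normalize([2.0, 4.0, 6.0]))  # → [0.0, 0.5, 1.0]
```

Normalizing before feature selection matters here because distance- and swarm-based procedures like OSA-FS are sensitive to feature scale.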
With recent advancements in information and communication technology, a huge volume of corporate and sensitive user data is shared consistently across the network, making it vulnerable to attacks that may put several factors at risk: data availability, confidentiality, and integrity. Intrusion Detection Systems (IDS) are widely deployed in various networks to help promptly recognize intrusions. Nowadays, blockchain (BC) technology has received much interest as a means to share data without needing a trusted third party. Therefore, this study designs a new Blockchain Assisted Optimal Machine Learning based Cyberattack Detection and Classification (BAOML-CADC) technique. In the BAOML-CADC technique, the major focus lies in identifying cyberattacks. To do so, the presented BAOML-CADC technique applies a thermal equilibrium algorithm-based feature selection (TEA-FS) method for the optimal choice of features. The BAOML-CADC technique uses an extreme learning machine (ELM) model for cyberattack recognition. In addition, a BC-based integrity verification technique is developed to defend against misrouting attacks, showing the innovation of the work. The experimental validation of the BAOML-CADC algorithm is tested on a benchmark cyberattack dataset. The obtained values implied the improved performance of the BAOML-CADC algorithm over other techniques.
Nowadays, the Internet of Things (IoT) has penetrated all facets of human life, while on the other hand IoT devices are heavily prone to cyberattacks. It has become important to develop an accurate system that can detect malicious attacks in IoT environments in order to mitigate security risks. A botnet is one of the most dreadful malicious entities and has affected many users over the past few decades. It is challenging to recognize a botnet since it has excellent carrying and hiding capacities. Various approaches have been employed to identify the source of a botnet at earlier stages. Machine Learning (ML) and Deep Learning (DL) techniques have been developed with heavy influence from botnet detection methodology. In spite of this, it is still a challenging task to detect botnets at early stages due to the low number of features accessible from botnet datasets. The current study devises the IoT with Cloud Assisted Botnet Detection and Classification utilizing Rat Swarm Optimizer with Deep Learning (BDC-RSODL) model. The presented BDC-RSODL model includes a series of processes: pre-processing, feature subset selection, classification, and parameter tuning. Initially, the network data is pre-processed to make it compatible for further processing. Besides, the RSO algorithm is exploited for effective selection of a subset of features. Additionally, the Long Short-Term Memory (LSTM) algorithm is utilized for both identification and classification of botnets. Finally, the Sine Cosine Algorithm (SCA) is executed to fine-tune the hyperparameters of the LSTM model. In order to validate the promising performance of the BDC-RSODL system, a comprehensive comparison analysis was conducted. The obtained results confirmed the supremacy of the BDC-RSODL model over recent approaches.
Word Sense Disambiguation has been a trending research topic in Natural Language Processing and Machine Learning. Mining core features and performing text classification remain challenging tasks. Features of the context, such as neighboring words (for example, adjectives), provide the evidence for classification using a machine learning approach. This paper presents text document classification, which has wide applications in information retrieval, using movie review datasets. Related applications include document indexing based on a controlled vocabulary, adjective-based indexing, word sense disambiguation, generating hierarchical categorization of web pages, spam detection, topic labeling, web search, and document summarization. A kernel support vector machine learning algorithm classifies the text, and feature extraction is performed by cuckoo search optimization. Positive and negative reviews from the movie dataset are used to obtain better classification accuracy. Experimental results focus on context mining, feature analysis, and classification. Compared with previous work, the proposed design achieves more efficient results. The overall design is implemented with the MATLAB 2020a tool.
In a competitive digital age where data volumes are increasing with time, the ability to extract meaningful knowledge from high-dimensional data using machine learning (ML) and data mining (DM) techniques, and to make decisions based on the extracted knowledge, is becoming increasingly important in all business domains. Nevertheless, high-dimensional data remains a major challenge for classification algorithms due to its high computational cost and storage requirements. The 2016 Demographic and Health Survey of Ethiopia (EDHS 2016), the publicly available data source for this study, contains several features that may not be relevant to the prediction task. In this paper, we developed a hybrid multidimensional metrics framework for predictive modeling, covering both model performance evaluation and feature selection, to overcome the feature selection challenges and select the best model among those available in DM and ML. The proposed hybrid metrics were used to measure the efficiency of the predictive models. Experimental results show that the decision tree algorithm is the most efficient model. The higher score of HMM (m, r) = 0.47 indicates an overall significant model that encompasses almost all of the user's requirements, unlike classical metrics that use a single criterion to select the most appropriate model. On the other hand, the ANNs were found to be the most computationally intensive for our prediction task. Moreover, the type of data and the class size of the dataset (unbalanced data) have a significant impact on the efficiency of the model, especially on the computational cost, and the interpretability of the model's parameters can be hampered. The efficiency of the predictive model could be further improved with other feature selection algorithms (especially hybrid metrics) that incorporate knowledge-domain expertise, as understanding of the business domain has a significant impact.
Abstract: Cloud Computing (CC) networks are distributed and dynamic, as signals appear, disappear, or lose significance. Machine Learning Techniques (MLTs) are trained on datasets that are sometimes inadequate, in terms of samples, for inferring information. A dynamic strategy, DevMLOps (Development Machine Learning Operations), used for automatic selection and tuning of MLTs, yields significant performance differences. However, the scheme has many disadvantages, including the need for continuous training, more samples and training time for feature selection, and increased classification execution times. Recursive Feature Elimination (RFE) is computationally very expensive, as it traverses each feature without considering correlations between features. This problem can be overcome by the use of wrappers, which select better features by accounting for both test and training datasets. The aim of this paper is to use DevQLMLOps for automated tuning and selection based on orchestration and messaging between containers. The proposed Adaptive Kernel Firefly Algorithm (AKFA) selects features for Cloud Network Monitoring (CNM) operations. The AKFA methodology is demonstrated on the Cloud Network Security Dataset (CNSD), with satisfactory results in the performance metrics used: precision, recall, F-measure, and accuracy.
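The adaptive kernel variant (AKFA) is not specified in detail here; as a point of reference, the following is a minimal sketch of the standard firefly algorithm it builds on, in which dimmer fireflies move toward brighter (lower-cost) ones with an attractiveness that decays with squared distance. All parameter values (`beta0`, `gamma`, `alpha`, the search box) are illustrative assumptions, not the paper's settings.

```python
import math
import random

def firefly_minimize(f, dim=2, n_fireflies=8, iters=60,
                     beta0=1.0, gamma=0.01, alpha=0.2, seed=42):
    """Minimal standard firefly algorithm for continuous minimization:
    dimmer fireflies move toward brighter (lower-cost) ones with an
    attractiveness beta0 * exp(-gamma * r^2) that decays with distance."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)]
           for _ in range(n_fireflies)]
    for _ in range(iters):
        cost = [f(x) for x in pop]
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:          # j is brighter, so i moves
                    r2 = sum((a - b) ** 2 for a, b in zip(pop[i], pop[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    pop[i] = [a + beta * (b - a) + alpha * (rng.random() - 0.5)
                              for a, b in zip(pop[i], pop[j])]
                    cost[i] = f(pop[i])
        alpha *= 0.97                          # gradually damp the random walk
    best = min(pop, key=f)
    return best, f(best)

best, val = firefly_minimize(lambda x: sum(v * v for v in x))
```

For feature selection, the continuous positions would additionally be thresholded into keep/drop bits and `f` replaced by a classifier's validation error.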
Funding: Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, for funding this research work through Project Number RI-44-0444.
Abstract: Data mining plays a crucial role in extracting meaningful knowledge from large-scale data repositories, such as data warehouses and databases. Association rule mining, a fundamental process in data mining, involves discovering correlations, patterns, and causal structures within datasets. In the healthcare domain, association rules offer valuable opportunities for building knowledge bases, enabling intelligent diagnoses, and extracting invaluable information rapidly. This paper presents a novel approach called the Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System (MLARMC-HDMS). The MLARMC-HDMS technique integrates classification and association rule mining (ARM) processes. Initially, the chimp optimization algorithm-based feature selection (COAFS) technique is employed within MLARMC-HDMS to select relevant attributes. Inspired by the foraging behavior of chimpanzees, the COA algorithm mimics their search strategy for food. Subsequently, the classification process utilizes stochastic gradient descent with a multilayer perceptron (SGD-MLP) model, while the Apriori algorithm determines attribute relationships. We propose a COA-based feature selection approach for medical data classification using machine learning techniques. This approach involves selecting pertinent features from medical datasets through COA and training machine learning models using the reduced feature set. We evaluate the performance of our approach on various medical datasets employing diverse machine learning classifiers. Experimental results demonstrate that our proposed approach surpasses alternative feature selection methods, achieving higher accuracy and precision rates in medical data classification tasks. The study showcases the effectiveness and efficiency of the COA-based feature selection approach in identifying relevant features, thereby enhancing the diagnosis and treatment of various diseases. To provide further validation, we conduct detailed experiments on a benchmark medical dataset, revealing the superiority of the MLARMC-HDMS model over other methods, with a maximum accuracy of 99.75%. Therefore, this research contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis. The presented MLARMC-HDMS framework and COA-based feature selection approach offer valuable insights for researchers and practitioners working in the field of healthcare data mining and machine learning.
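The Apriori step referenced above is the classical level-wise frequent-itemset search; a self-contained sketch of it, with rule confidence computed from the mined supports, is shown below. The toy transactions are invented for illustration, not taken from the paper's dataset.

```python
def apriori(transactions, min_support):
    """Level-wise Apriori: returns {frozenset(itemset): support_count} for
    every itemset occurring in at least min_support transactions."""
    items = sorted({i for t in transactions for i in t})
    freq, current = {}, []
    for i in items:                              # frequent 1-itemsets
        count = sum(1 for t in transactions if i in t)
        if count >= min_support:
            s = frozenset([i])
            freq[s] = count
            current.append(s)
    k = 2
    while current:
        # candidate k-itemsets as unions of frequent (k-1)-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = []
        for c in candidates:                     # prune by support
            count = sum(1 for t in transactions if c <= t)
            if count >= min_support:
                freq[c] = count
                current.append(c)
        k += 1
    return freq

def confidence(freq, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    return freq[antecedent | consequent] / freq[antecedent]

tx = [{"fever", "cough"}, {"fever", "cough", "fatigue"},
      {"cough"}, {"fever", "fatigue"}]
freq = apriori(tx, min_support=2)
```

For example, `confidence(freq, frozenset({"fever"}), frozenset({"cough"}))` evaluates the rule "fever → cough" on these toy transactions.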
Abstract: Feature Subset Selection (FSS) is an NP-hard problem to remove redundant and irrelevant features, particularly from medical data, and it can be effectively addressed by metaheuristic algorithms. However, existing binary versions of metaheuristic algorithms have issues with convergence and lack an effective binarization method, resulting in suboptimal solutions that hinder diagnosis and prediction accuracy. This paper aims to propose an Improved Binary Quantum-based Avian Navigation Optimizer Algorithm (IBQANA) for FSS in medical data preprocessing to address the suboptimal solutions arising from binary versions of metaheuristic algorithms. The proposed IBQANA's contributions include the Hybrid Binary Operator (HBO) and the Distance-based Binary Search Strategy (DBSS). HBO is designed to convert continuous values into binary solutions, even for values outside the [0, 1] range, ensuring accurate binary mapping. On the other hand, DBSS is a two-phase search strategy that enhances the performance of inferior search agents and accelerates convergence. By combining exploration and exploitation phases based on an adaptive probability function, DBSS effectively avoids local optima. The effectiveness of applying HBO is compared with five transfer function families and thresholding on 12 medical datasets, with feature numbers ranging from 8 to 10,509. IBQANA's effectiveness is evaluated regarding the accuracy, fitness, and selected features and compared with seven binary metaheuristic algorithms. Furthermore, IBQANA is utilized to detect COVID-19. The results reveal that the proposed IBQANA outperforms all comparative algorithms on COVID-19 and 11 other medical datasets. The proposed method presents a promising solution to the FSS problem in medical data preprocessing.
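The paper's HBO operator is only described at a high level here; the transfer-function baseline it is compared against, however, is standard. A sketch of the common S-shaped (sigmoid) family member with stochastic thresholding follows; the input vector and seed are illustrative.

```python
import math
import random

def s_shaped(v):
    """S1 sigmoid transfer function: maps a continuous solution component
    to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def binarize(solution, rng):
    """Stochastic thresholding: bit i is set with probability T(x_i);
    a 1 means 'keep feature i' in feature subset selection."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in solution]

rng = random.Random(0)
continuous = [-6.0, -0.5, 0.0, 0.5, 6.0]
bits = binarize(continuous, rng)
```

Strongly negative components are almost always mapped to 0 and strongly positive ones to 1, which is the convergence-vs-exploration trade-off that binarization methods like HBO aim to improve.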
Funding: This work was supported by the National Science Foundation of China (62176055) and the China University S&T Innovation Plan Guided by the Ministry of Education.
Abstract: Multi-label learning deals with objects associated with multiple class labels, and aims to induce a predictive model which can assign a set of relevant class labels for an unseen instance. Since each class might possess its own characteristics, the strategy of extracting label-specific features has been widely employed to improve the discrimination process in multi-label learning, where the predictive model is induced based on tailored features specific to each class label instead of the identical instance representations. As a representative approach, LIFT generates label-specific features by conducting clustering analysis. However, its performance may be degraded due to the inherent instability of the single clustering algorithm. To improve this, a novel multi-label learning approach named SENCE (stable label-Specific features gENeration for multi-label learning via mixture-based Clustering Ensemble) is proposed, which stabilizes the generation process of label-specific features via clustering ensemble techniques. Specifically, more stable clustering results are obtained by firstly augmenting the original instance representation with cluster assignments from base clusters and then fitting a mixture model via the expectation-maximization (EM) algorithm. Extensive experiments on eighteen benchmark data sets show that SENCE performs better than LIFT and other well-established multi-label learning algorithms.
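LIFT's core construction, which SENCE stabilizes, re-represents an instance by its distances to cluster centers obtained separately from a label's positive and negative examples. A minimal sketch with a toy k-means (not the paper's ensemble variant; data and cluster counts are invented):

```python
import math
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Tiny Lloyd's k-means returning the k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[min(range(k), key=lambda c: dist2(p, centers[c]))].append(p)
        centers = [tuple(sum(xs) / len(b) for xs in zip(*b)) if b else centers[j]
                   for j, b in enumerate(buckets)]
    return centers

def label_specific_features(instance, pos_centers, neg_centers):
    """LIFT-style mapping: an instance is re-represented by its distances
    to cluster centers of a label's positive and negative examples."""
    return [math.sqrt(dist2(instance, c)) for c in pos_centers + neg_centers]

pos = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.2, 4.9)]   # positives for a label
neg = [(10.0, 10.0), (9.8, 10.2)]                        # negatives
feats = label_specific_features((0.1, 0.0), kmeans(pos, 2), kmeans(neg, 1))
```

A binary classifier for that label is then trained on these distance features rather than on the raw representation.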
Funding: The Zhejiang Provincial Natural Science Foundation of China (no. LZ21F020001) and the Basic Scientific Research Program of Wenzhou (no. S20220018).
Abstract: The whale optimization algorithm (WOA) tends to fall into local optima and fails to converge quickly when solving complex problems. To address these shortcomings, an improved WOA (QGBWOA) is proposed in this work. First, quasi-opposition-based learning is introduced to enhance the ability of WOA to search for optimal solutions. Second, a Gaussian barebone mechanism is embedded to promote diversity and expand the scope of the solution space in WOA. To verify the advantages of QGBWOA, comparison experiments between QGBWOA and its comparison peers were carried out on CEC 2014 with dimensions 10, 30, 50, and 100 and on the CEC 2020 test suite with dimension 30. Furthermore, the performance results were tested using Wilcoxon signed-rank (WS), Friedman, and post hoc statistical tests for statistical analysis. Convergence accuracy and speed are remarkably improved, as shown by the experimental results. Finally, feature selection and multi-threshold image segmentation applications are demonstrated to validate the ability of QGBWOA to solve complex real-world problems. QGBWOA proves its superiority over the compared algorithms in feature selection and multi-threshold image segmentation across several evaluation metrics.
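The quasi-opposition-based learning step mentioned above samples a candidate between the interval centre and the opposite point, rather than taking the exact opposite. A minimal one-dimensional sketch (bounds and inputs are illustrative):

```python
import random

def quasi_opposite(x, low, high, rng):
    """Quasi-opposition-based learning: sample a point between the interval
    centre c = (low + high) / 2 and the opposite point o = low + high - x."""
    c = (low + high) / 2.0
    o = low + high - x
    lo, hi = min(c, o), max(c, o)
    return lo + rng.random() * (hi - lo)

rng = random.Random(1)
q = quasi_opposite(8.0, 0.0, 10.0, rng)   # opposite point is 2, centre is 5
```

In an optimizer, each coordinate of a candidate solution is transformed this way and the better of the original and quasi-opposite candidates is kept.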
Funding: Supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI21C1831), and the Soonchunhyang University Research Fund.
Abstract: Gait is a biological characteristic that defines the way a person walks. Walking is a fundamental activity that sustains our daily life and physical condition. Surface electromyography (sEMG) is a weak bioelectric signal that reflects the functional state of the human muscles and nervous system. Gait classifiers based on sEMG signals are widely used in analysing muscle diseases and as a guide for rehabilitation treatment. Several approaches have been established in the literature for gait recognition utilizing conventional and deep learning (DL) approaches. This study designs an Enhanced Artificial Algae Algorithm with Hybrid Deep Learning based Human Gait Classification (EAAA-HDLGR) technique on sEMG signals. The EAAA-HDLGR technique extracts the time domain (TD) and frequency domain (FD) features from the sEMG signals and fuses them. In addition, the EAAA-HDLGR technique exploits a hybrid deep learning (HDL) model for gait recognition. Finally, an EAAA-based hyperparameter optimizer is applied to the HDL model; the optimizer is mainly derived from the quasi-oppositional based learning (QOBL) concept, showing the novelty of the work. The classifier outcomes of the EAAA-HDLGR technique are examined under diverse aspects, and the results indicate that the EAAA-HDLGR technique accomplishes improved gait recognition with the inclusion of EAAA.
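The specific time-domain features used by EAAA-HDLGR are not listed here; conventionally, sEMG TD descriptors include mean absolute value, root mean square, waveform length, and zero crossings, sketched below on an invented toy signal.

```python
def td_features(signal):
    """Common sEMG time-domain descriptors: mean absolute value (MAV),
    root mean square (RMS), waveform length (WL) and zero crossings (ZC)."""
    n = len(signal)
    mav = sum(abs(v) for v in signal) / n
    rms = (sum(v * v for v in signal) / n) ** 0.5
    wl = sum(abs(signal[i + 1] - signal[i]) for i in range(n - 1))
    zc = sum(1 for i in range(n - 1) if signal[i] * signal[i + 1] < 0)
    return {"MAV": mav, "RMS": rms, "WL": wl, "ZC": zc}

feats = td_features([1.0, -1.0, 2.0, -2.0])
```

Frequency-domain counterparts (mean/median frequency of the power spectrum) would be computed from an FFT of the same window and fused with these values.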
Funding: Supported in part by the National Science Foundation of China under Grants U22B2027, 62172297, 62102262, 61902276 and 62272311; the Tianjin Intelligent Manufacturing Special Fund Project under Grant 20211097; the China Guangxi Science and Technology Plan Project (Guangxi Science and Technology Base and Talent Special Project) under Grant AD23026096 (Application Number 2022AC20001); the Hainan Provincial Natural Science Foundation of China under Grant 622RC616; and the CCF-Nsfocus Kunpeng Fund Project under Grant CCF-NSFOCUS202207.
Abstract: Web application fingerprint recognition is an effective security technology designed to identify and classify web applications, thereby enhancing the detection of potential threats and attacks. Traditional fingerprint recognition methods, which rely on preannotated feature matching, face inherent limitations due to the ever-evolving nature and diverse landscape of web applications. In response to these challenges, this work proposes an innovative web application fingerprint recognition method founded on clustering techniques. The method involves extensive data collection from the Tranco List, employing adjusted feature selection built upon Wappalyzer and noise reduction through truncated SVD dimensionality reduction. The core of the methodology lies in the application of the unsupervised OPTICS clustering algorithm, eliminating the need for preannotated labels. By transforming web applications into feature vectors and leveraging clustering algorithms, our approach accurately categorizes diverse web applications, providing comprehensive and precise fingerprint recognition. The experimental results, which are obtained on a dataset featuring various web application types, affirm the efficacy of the method, demonstrating its ability to achieve high accuracy and broad coverage. This novel approach not only distinguishes between different web application types effectively but also demonstrates superiority in terms of classification accuracy and coverage, offering a robust solution to the challenges of web application fingerprint recognition.
Funding: This research is funded by Fayoum University, Egypt.
Abstract: As big data, its technologies, and its applications continue to advance, the Smart Grid (SG) has become one of the most successful pervasive and fixed computing platforms that efficiently uses a data-driven approach and employs efficient information and communication technology (ICT) and cloud computing. As a result of the complicated architecture of cloud computing, the distinctive working of advanced metering infrastructures (AMI), and the use of sensitive data, it has become challenging to make the SG secure. Faults of the SG are categorized into two main categories: Technical Losses (TLs) and Non-Technical Losses (NTLs). Hardware failure, communication issues, ohmic losses, and energy burnout during transmission and propagation of energy are TLs. NTLs are human-induced errors for malicious purposes, such as attacking sensitive data and electricity theft, along with tampering with AMI for bill reduction by fraudulent customers. This research proposes a data-driven methodology based on principles of computational intelligence, as well as big data analysis, to identify fraudulent customers based on their load profiles. In our proposed methodology, a hybrid Genetic Algorithm and Support Vector Machine (GA-SVM) model has been used to extract the relevant subset of feature data from a large and unsupervised public smart grid project dataset in London, UK, for theft detection. A subset of 26 out of 71 features is obtained with a classification accuracy of 96.6%, compared to studies conducted on small and limited datasets.
Abstract: Pavement crack detection plays a crucial role in ensuring road safety and reducing maintenance expenses. Recent advancements in deep learning (DL) techniques have shown promising results in detecting pavement cracks; however, the selection of relevant features for classification remains challenging. In this study, we propose a new approach for pavement crack detection that integrates deep learning for feature extraction, the whale optimization algorithm (WOA) for feature selection, and random forest (RF) for classification. The performance of the models was evaluated using accuracy, recall, precision, F1 score, and area under the receiver operating characteristic curve (AUC). Our findings reveal that Model 2, which incorporates RF into the ResNet-18 architecture, outperforms baseline Model 1 across all evaluation metrics. Nevertheless, our proposed model, which combines ResNet-18 with both WOA and RF, achieves significantly higher accuracy, recall, precision, and F1 score compared to the other two models. These results underscore the effectiveness of integrating RF and WOA into ResNet-18 for pavement crack detection applications. We applied the proposed approach to a dataset of pavement images, achieving an accuracy of 97.16% and an AUC of 0.984. Our results demonstrate that the proposed approach surpasses existing methods for pavement crack detection, offering a promising solution for the automatic identification of pavement cracks. By leveraging this approach, potential safety hazards can be identified more effectively, enabling timely repairs and maintenance measures. Lastly, the findings of this study also emphasize the potential of integrating RF and WOA with deep learning for pavement crack detection, providing road authorities with the necessary tools to make informed decisions regarding road infrastructure maintenance.
Abstract: Cryptocurrency price prediction has garnered significant attention due to the growing importance of digital assets in the financial landscape. This paper presents a comprehensive study on predicting future cryptocurrency prices using machine learning algorithms. Open-source historical data from various cryptocurrency exchanges is utilized. Interpolation techniques are employed to handle missing data, ensuring the completeness and reliability of the dataset. Four technical indicators are selected as features for prediction. The study explores the application of five machine learning algorithms to capture the complex patterns in the highly volatile cryptocurrency market. The findings demonstrate the strengths and limitations of the different approaches, highlighting the significance of feature engineering and algorithm selection in achieving accurate cryptocurrency price predictions. The research contributes valuable insights into the dynamic and rapidly evolving field of cryptocurrency price prediction, assisting investors and traders in making informed decisions amidst the challenges posed by the cryptocurrency market.
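The four technical indicators are not named in the abstract; two of the most common candidates, a trailing simple moving average and a simplified (non-recursive) RSI, can be computed as follows. The price series is invented for illustration.

```python
def sma(prices, window):
    """Trailing simple moving average."""
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]

def rsi(prices, period=14):
    """Simplified RSI: plain averages of the last `period` gains/losses
    (not Wilder's recursive smoothing)."""
    gains, losses = [], []
    for prev, cur in zip(prices, prices[1:]):
        gains.append(max(cur - prev, 0.0))
        losses.append(max(prev - cur, 0.0))
    avg_gain = sum(gains[-period:]) / period
    avg_loss = sum(losses[-period:]) / period
    if avg_loss == 0.0:
        return 100.0                 # all moves were gains
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

closes = [float(p) for p in range(1, 16)]   # 15 monotonically rising closes
```

Indicator values computed per time step like these would form the feature columns fed to the learning algorithms.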
Abstract: In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation, than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
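The idea of solving a smaller aggregate problem can be illustrated with the simplest case, hard aggregation: states sharing a feature value are lumped together and the aggregate problem is solved by value iteration. Uniform disaggregation weights are assumed, and the 4-state MDP is invented for illustration.

```python
def aggregate_value_iteration(P, R, phi, n_agg, gamma=0.9, iters=200):
    """Hard aggregation: original states sharing a feature value phi[s] are
    lumped into one aggregate state; aggregate rewards and dynamics are
    uniform averages over member states.  P[a][s] maps next state ->
    probability, R[a][s] is the one-step reward."""
    n_actions = len(P)
    members = [[s for s in range(len(phi)) if phi[s] == g] for g in range(n_agg)]
    V = [0.0] * n_agg
    for _ in range(iters):
        V = [max(sum(R[a][s] + gamma * sum(p * V[phi[s2]]
                                           for s2, p in P[a][s].items())
                     for s in members[g]) / len(members[g])
                 for a in range(n_actions))
             for g in range(n_agg)]
    return V

# Toy 4-state problem: action 0 stays put (reward 1 in states 2 and 3);
# action 1 jumps two states ahead.  phi lumps {0, 1} and {2, 3}.
P = [[{0: 1.0}, {1: 1.0}, {2: 1.0}, {3: 1.0}],
     [{2: 1.0}, {3: 1.0}, {0: 1.0}, {1: 1.0}]]
R = [[0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 0.0, 0.0]]
V = aggregate_value_iteration(P, R, phi=[0, 0, 1, 1], n_agg=2)
```

The aggregate values then score original states through their features, `V[phi[s]]`, which is the piecewise-constant approximation the paper generalizes.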
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62225602).
Abstract: Multi-class classification can be solved by decomposing it into a set of binary classification problems according to some encoding rule, e.g., one-vs-one, one-vs-rest, or error-correcting output codes. Existing works solve these binary classification problems in the original feature space, which may be suboptimal, as different binary classification problems correspond to different positive and negative examples. In this paper, we propose to learn label-specific features for each decomposed binary classification problem to account for the specific characteristics contained in its positive and negative examples. Specifically, to generate the label-specific features, clustering analysis is conducted separately on the positive and negative examples in each decomposed binary data set to discover their inherent information, and label-specific features for an example are then obtained by measuring the similarity between it and all cluster centers. Experiments clearly validate the effectiveness of learning label-specific features for decomposition-based multi-class classification.
Funding: Supported by the National Research Foundation (NRF) of South Korea (2020R1A2C1004720).
Abstract: In the present work, we have employed machine learning (ML) techniques to evaluate ductile-brittle (DB) behaviors in intermetallic compounds (IMCs) that can form in magnesium (Mg) alloys. This procedure was mainly conducted by a proxy-based method, where the ratio of shear (G) to bulk (B) moduli was used as a proxy to identify whether a compound is ductile or brittle. Starting from compound information (composition and crystal structure) and moduli found in open databases (AFLOW), ML-based models were built and used to predict the moduli of other compounds and, accordingly, to foresee the ductile-brittle behaviors of these new compounds. The results reached in the present work show that the built models can effectively capture the elastic moduli of new compounds. This was confirmed through moduli calculations done by density functional theory (DFT) on some compounds, where the DFT calculations were consistent with the ML predictions. A further confirmation of the reliability of the built ML models was obtained by relating the DB behavior in MgBe_(13) and MgPd_(2), as evaluated by the ML-predicted moduli, to the nature of chemical bonding in these two compounds, which in turn was investigated via the charge density distribution (CDD) and electron localization function (ELF) obtained by DFT methodology. The ML-evaluated DB behaviors of the two compounds were also consistent with the DFT calculations of CDD and ELF. These findings and confirmations lend legitimacy to the built model for use in further prediction processes. Indeed, as examples, the DB characteristics were investigated in IMCs that might form in three Mg alloy series: AZ, ZX, and WE.
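The G/B proxy is classically thresholded with Pugh's criterion (ductile if B/G > 1.75, i.e. G/B below about 0.57); the paper's exact cut-off is not stated, so the classical value is assumed in this sketch, and the moduli are illustrative numbers, not AFLOW data.

```python
def pugh_ductility(G, B, threshold=0.57):
    """Pugh proxy: a low shear-to-bulk modulus ratio suggests ductility.
    Classical cut-off: G/B ~ 0.57 (equivalently B/G ~ 1.75)."""
    ratio = G / B
    return ("ductile" if ratio < threshold else "brittle"), ratio

label, ratio = pugh_ductility(G=17.0, B=45.0)   # illustrative moduli in GPa
```

In the paper's pipeline, G and B would come from the ML regressors rather than being given, with DFT used to spot-check the resulting labels.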
Funding: The National Natural Science Foundation of China (82003831, 62076185 and U1809209); the Project of the Health Commission of Zhejiang Province (2020KY177); the Wenzhou Technology Foundation (Y2020002); the Natural Science Foundation of Zhejiang Province (LZ22F020005); and the First Affiliated Hospital of Wenzhou Medical University Youth Excellence Project (QNYC114).
Abstract: Pulmonary Hypertension (PH) is a global health problem that affects about 1% of the global population. Animal models of PH play a vital role in unraveling the pathophysiological mechanisms of the disease. The present study proposes a Kernel Extreme Learning Machine (KELM) model based on an improved Whale Optimization Algorithm (WOA) for predicting PH mouse models. The experimental results showed that the selected blood indicators, including Haemoglobin (HGB), Hematocrit (HCT), Mean Platelet Volume (MPV), Platelet Distribution Width (PDW), and Platelet-Large Cell Ratio (P-LCR), were essential for identifying PH mouse models using the feature selection method proposed in this paper. Remarkably, the method achieved 100.0% accuracy and 100.0% specificity in classification, demonstrating that our method has great potential to be used for evaluating and identifying mouse PH models.
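KELM training reduces to a single regularized linear solve, β = (K + I/C)⁻¹ y, with predictions f(x) = Σᵢ βᵢ k(xᵢ, x). A small self-contained sketch with an RBF kernel follows; the toy data, regularization constant C, and kernel width are illustrative, not the paper's tuned values.

```python
import math

def rbf(a, b, g=1.0):
    return math.exp(-g * sum((x - y) ** 2 for x, y in zip(a, b)))

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c] / M[c][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def kelm_fit(X, y, C=100.0, g=1.0):
    """KELM closed-form training: beta = (K + I/C)^-1 y."""
    n = len(X)
    K = [[rbf(X[i], X[j], g) + (1.0 / C if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(K, y)

def kelm_predict(X, beta, x, g=1.0):
    return sum(b * rbf(xi, x, g) for b, xi in zip(beta, X))

X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [0.0, 0.0, 1.0, 1.0]          # toy targets for a binary concept
beta = kelm_fit(X, y)
```

The WOA part of the paper's method would tune C, the kernel width g, and the feature subset around this closed-form core.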
Funding: Supported by the National Natural Science Foundation of China (No. 62001197), the Natural Sciences Research Grant for Colleges and Universities of Jiangsu Province (No. 22KJD470002), and the Jiangsu Provincial Postgraduate Research and Practice Innovation Program (No. XSJCX21_58).
Abstract: In-line surface defect detection on flat metal surfaces is notoriously difficult due to obstacles such as high surface reflectivity, pseudo-defect interference, and random elastic deformation. This study evaluates an approach for detecting scratches on a metal surface. This paper proposes an improved Gauss-Laplace (LoG) operator combined with a deep learning technique for metal surface scratch identification, addressing the difficulties that noise is hard to suppress and edges are unclear when utilizing existing edge detection algorithms. In the scratch identification process, it is challenging to differentiate between scratch edges and interference edges. Therefore, local texture screening is utilized by deep learning techniques that evaluate and identify scratch edges and interference edges based on the local texture characteristics of scratches. Experiments have proven that combining the improved LoG operator with a deep learning strategy effectively detects image edges, distinguishes between scratch edges and interference edges, and identifies clear scratch information. Experiments on six categories of metal surface defects show that the proposed method achieves rolled-in crazing (100%), inclusion (94.4%), patches (100%), pitted (100%), rolled (100%), and scratches (100%), respectively.
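The Gauss-Laplace operator at the heart of the method combines Gaussian smoothing with the Laplacian. A plain discrete version (not the paper's improved operator) can be sketched as follows; kernel size, sigma, and the test images are illustrative.

```python
import math

def log_kernel(size, sigma):
    """Unnormalized discrete Laplacian-of-Gaussian kernel:
    LoG(x, y) ~ ((x^2 + y^2 - 2*sigma^2) / sigma^4)
                * exp(-(x^2 + y^2) / (2*sigma^2)),
    shifted so the kernel sums to zero (no response on flat regions)."""
    half = size // 2
    k = [[(x * x + y * y - 2.0 * sigma ** 2) / sigma ** 4
          * math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
          for x in range(-half, half + 1)] for y in range(-half, half + 1)]
    mean = sum(map(sum, k)) / size ** 2
    return [[v - mean for v in row] for row in k]

def convolve(img, ker):
    """'Valid' 2-D convolution (no padding)."""
    kh, kw = len(ker), len(ker[0])
    return [[sum(ker[i][j] * img[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(len(img[0]) - kw + 1)]
            for r in range(len(img) - kh + 1)]

ker = log_kernel(5, 1.0)
flat = [[5.0] * 7 for _ in range(7)]                  # uniform region
step = [[0.0] * 3 + [10.0] * 4 for _ in range(7)]     # vertical step edge
flat_resp = convolve(flat, ker)
step_resp = convolve(step, ker)
```

Flat regions yield (near-)zero response while edges produce strong values whose zero crossings localize the edge; the paper's deep learning stage then screens those responses by local texture.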
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through the Large Groups Project under Grant Number (61/43); Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2022R114), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: (22UQU4310373DSR26).
Abstract: Autism Spectrum Disorder (ASD) refers to a neuro-disorder in which an individual experiences long-lasting effects on communication and interaction with others. Advanced information technology employing artificial intelligence (AI) models has assisted in identifying ASD early through pattern detection. Recent advances in AI models assist in the automated identification and classification of ASD, which helps to reduce the severity of the disease. This study introduces an automated ASD classification using owl search algorithm with machine learning (ASDC-OSAML) model. The proposed ASDC-OSAML model majorly focuses on the identification and classification of ASD. To attain this, the presented ASDC-OSAML model follows a min-max normalization approach as a pre-processing stage. Next, the owl search algorithm (OSA)-based feature selection (OSA-FS) model is used to derive feature subsets. Then, the beetle swarm antenna search (BSAS) algorithm with Iterative Dichotomiser 3 (ID3) classification is applied for ASD detection and classification. The design of the BSAS algorithm helps to determine the parameter values of the ID3 classifier. The performance analysis of the ASDC-OSAML model is performed using a benchmark dataset. An extensive comparison study highlighted the supremacy of the ASDC-OSAML model over recent state-of-the-art approaches.
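The min-max normalization used as the pre-processing stage rescales each attribute to [0, 1]; a minimal sketch (the sample column is invented):

```python
def min_max_normalize(column):
    """Min-max scaling to [0, 1]: x' = (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)    # a constant column carries no signal
    return [(v - lo) / (hi - lo) for v in column]

scores = min_max_normalize([10.0, 20.0, 15.0, 30.0])
```

Applying this per attribute keeps features on a common scale before the OSA-based feature selection step.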
Funding: This work was funded by the Deanship of Scientific Research at Princess Nourah bint Abdulrahman University through the Research Groups Program, Grant No. (RGP-1443-0051).
Abstract: With recent advancements in information and communication technology, a huge volume of corporate and sensitive user data is shared consistently across the network, making it vulnerable to attacks that may put several factors at risk: data availability, confidentiality, and integrity. Intrusion Detection Systems (IDS) are widely exploited in various networks to help promptly recognize intrusions. Nowadays, blockchain (BC) technology has received much interest as a means to share data without needing a trusted third party. Therefore, this study designs a new Blockchain Assisted Optimal Machine Learning based Cyberattack Detection and Classification (BAOML-CADC) technique. In the BAOML-CADC technique, the major focus lies in identifying cyberattacks. To do so, the presented BAOML-CADC technique applies a thermal equilibrium algorithm-based feature selection (TEA-FS) method for the optimal choice of features. The BAOML-CADC technique uses an extreme learning machine (ELM) model for cyberattack recognition. In addition, a BC-based integrity verification technique is developed to defend against the misrouting attack, showing the innovation of the work. The experimental validation of the BAOML-CADC algorithm is tested on a benchmark cyberattack dataset. The obtained values imply the improved performance of the BAOML-CADC algorithm over other techniques.
Funding: The Deanship of Scientific Research at King Khalid University for funding this work through the Large Groups Project under grant number (61/43); Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R319), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: (22UQU4340237DSR27).
Abstract: Nowadays, the Internet of Things (IoT) has penetrated all facets of human life, while on the other hand, IoT devices are heavily prone to cyberattacks. It has become important to develop an accurate system that can detect malicious attacks on IoT environments in order to mitigate security risks. Botnets are one of the dreadful malicious entities that have affected many users for the past few decades. It is challenging to recognize a botnet since it has excellent carrying and hidden capacities. Various approaches have been employed to identify the source of a botnet at earlier stages. Machine Learning (ML) and Deep Learning (DL) techniques are developed based on heavy influence from botnet detection methodology. In spite of this, it is still a challenging task to detect a botnet at early stages due to the low number of features accessible from botnet datasets. The current study devises an IoT with Cloud Assisted Botnet Detection and Classification utilizing Rat Swarm Optimizer with Deep Learning (BDC-RSODL) model. The presented BDC-RSODL model includes a series of processes like pre-processing, feature subset selection, classification, and parameter tuning. Initially, the network data is pre-processed to make it compatible for further processing. Besides, the RSO algorithm is exploited for effective selection of a subset of features. Additionally, the Long Short Term Memory (LSTM) algorithm is utilized for both identification and classification of botnets. Finally, the Sine Cosine Algorithm (SCA) is executed for fine-tuning the hyperparameters related to the LSTM model. In order to validate the promising performance of the BDC-RSODL system, a comprehensive comparison analysis was conducted. The obtained results confirmed the supremacy of the BDC-RSODL model over recent approaches.