Journal Literature
320,690 articles found
1. Machine Learning and Synthetic Minority Oversampling Techniques for Imbalanced Data: Improving Machine Failure Prediction
Authors: Yap Bee Wah, Azlan Ismail, Nur Niswah Naslina Azid, Jafreezal Jaafar, Izzatdin Abdul Aziz, Mohd Hilmi Hasan, Jasni Mohamad Zain. Computers, Materials & Continua (SCIE, EI), 2023, No. 6, pp. 4821-4841 (21 pages).
Prediction of machine failure is challenging because the dataset is often imbalanced, with a low failure rate. The common approach to handling classification with imbalanced data is to balance the data using a sampling approach such as random undersampling, random oversampling, or the Synthetic Minority Oversampling Technique (SMOTE). This paper compared the classification performance of three popular classifiers (Logistic Regression, Gaussian Naïve Bayes, and Support Vector Machine) in predicting machine failure in the oil and gas industry. The original machine failure dataset consists of 20,473 hourly records and is imbalanced, with 19,945 (97%) 'non-failure' and 528 (3%) 'failure' records. The three independent variables used to predict machine failure were a pressure indicator, a flow indicator, and a level indicator. The accuracy of the classifiers is very high and close to 100%, but the sensitivity of all classifiers on the original dataset was close to zero. The performance of the three classifiers was then evaluated on data with different imbalance rates (10% to 50%) generated from the original data using SMOTE, SMOTE-Support Vector Machine (SMOTE-SVM), and SMOTE-Edited Nearest Neighbour (SMOTE-ENN). The classifiers were evaluated based on the improvement in sensitivity and F-measure. Results showed that the sensitivity of all classifiers increases as the imbalance rate increases. SVM with a radial basis function (RBF) kernel has the highest sensitivity when the data is balanced (50:50) using SMOTE (test sensitivity = 0.5686, test F-measure = 0.6927), compared to Naïve Bayes (test sensitivity = 0.4033, test F-measure = 0.6218) and Logistic Regression (test sensitivity = 0.4194, test F-measure = 0.621). Overall, the Gaussian Naïve Bayes model consistently improves sensitivity and F-measure as the imbalance ratio increases, but its sensitivity remains below 50%. The classifiers performed better when the data was balanced using SMOTE-SVM compared to SMOTE and SMOTE-ENN.
Keywords: machine failure; machine learning; imbalanced data; SMOTE; classification
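The comparison pipeline described in this entry (SMOTE-family resampling followed by Logistic Regression, Gaussian Naïve Bayes, and an RBF-kernel SVM, scored on sensitivity and F-measure) can be sketched with scikit-learn and imbalanced-learn. The snippet below is a minimal illustration on a synthetic dataset with roughly a 3% failure rate standing in for the paper's sensor data; it is not the authors' code.

```python
# A minimal sketch of the evaluation pipeline described above, using
# scikit-learn and imbalanced-learn. The dataset here is synthetic; the
# paper's pressure/flow/level indicators are not publicly reproduced.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.metrics import recall_score, f1_score
from imblearn.over_sampling import SMOTE, SVMSMOTE
from imblearn.combine import SMOTEENN

# Imbalanced data with roughly a 3% minority class, as in the paper's dataset.
X, y = make_classification(n_samples=20000, n_features=3, n_informative=3,
                           n_redundant=0, weights=[0.97, 0.03], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

samplers = {"SMOTE": SMOTE(random_state=0),
            "SMOTE-SVM": SVMSMOTE(random_state=0),
            "SMOTE-ENN": SMOTEENN(random_state=0)}
classifiers = {"LogReg": LogisticRegression(max_iter=1000),
               "GaussianNB": GaussianNB(),
               "SVM-RBF": SVC(kernel="rbf")}

for s_name, sampler in samplers.items():
    X_bal, y_bal = sampler.fit_resample(X_train, y_train)   # balance to ~50:50
    for c_name, clf in classifiers.items():
        y_pred = clf.fit(X_bal, y_bal).predict(X_test)
        print(f"{s_name:>9} + {c_name:<10} "
              f"sensitivity={recall_score(y_test, y_pred):.4f} "
              f"F1={f1_score(y_test, y_pred):.4f}")
```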
2. Imbalanced Data Classification Using SVM Based on Improved Simulated Annealing Featuring Synthetic Data Generation and Reduction
Authors: Hussein Ibrahim Hussein, Said Amirul Anwar, Muhammad Imran Ahmad. Computers, Materials & Continua (SCIE, EI), 2023, No. 4, pp. 547-564 (18 pages).
Imbalanced data classification is one of the major problems in machine learning. An imbalanced dataset typically has significant differences in the number of data samples between its classes. In most cases, the performance of machine learning algorithms such as the Support Vector Machine (SVM) is affected when dealing with an imbalanced dataset: the classification accuracy is skewed toward the majority class, and poor results are obtained in predicting minority-class samples. In this paper, a hybrid approach combining a data pre-processing technique and the SVM algorithm based on improved Simulated Annealing (SA) is proposed. First, a data pre-processing technique is introduced that addresses the resampling strategy for handling imbalanced datasets: the data are synthetically generated to equalize the number of samples between classes, followed by a reduction step to remove redundant and duplicated data. Next, the balanced dataset is used to train the SVM. Since this algorithm requires an iterative search for the best penalty parameter during training, an improved SA algorithm is proposed for this task, in which a new acceptance criterion for candidate solutions is introduced to enhance the accuracy of the optimization process. Experiments on ten publicly available imbalanced datasets demonstrate higher classification accuracy with the proposed approach than with a conventional SVM implementation. An average accuracy of 89.65% on binary classification demonstrates the good performance of the proposed approach.
Keywords: imbalanced data; resampling technique; data reduction; support vector machine; simulated annealing
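The penalty-parameter search described in this entry can be illustrated with a plain simulated-annealing loop around cross-validated SVM accuracy. The sketch below uses the standard Metropolis acceptance rule rather than the paper's improved criterion, and a synthetic dataset.

```python
# A simplified sketch of using simulated annealing to tune the SVM penalty
# parameter C on a (pre-balanced) training set. The paper's improved
# acceptance criterion is not reproduced here; this uses the standard
# Metropolis rule on cross-validated accuracy.
import math
import random
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=10, random_state=1)

def score(log_c):
    return cross_val_score(SVC(C=10 ** log_c, kernel="rbf"), X, y, cv=3).mean()

random.seed(1)
log_c = 0.0                          # start at C = 1
best_log_c, best = log_c, score(log_c)
current = best
T = 1.0                              # initial temperature
for _ in range(50):
    cand = log_c + random.uniform(-0.5, 0.5)   # neighbour in log10(C) space
    cand_score = score(cand)
    # Metropolis acceptance: always accept improvements, sometimes accept worse.
    if cand_score > current or random.random() < math.exp((cand_score - current) / T):
        log_c, current = cand, cand_score
        if current > best:
            best_log_c, best = log_c, current
    T *= 0.9                         # geometric cooling schedule
print(f"best C = {10 ** best_log_c:.3f}, CV accuracy = {best:.3f}")
```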
3. Fault Diagnosis of Power Transformer Based on Improved ACGAN Under Imbalanced Data
Authors: Tusongjiang Kari, Lin Du, Aisikaer Rouzi, Xiaojing Ma, Zhichao Liu, Bo Li. Computers, Materials & Continua (SCIE, EI), 2023, No. 5, pp. 4573-4592 (20 pages).
Imbalanced dissolved gas analysis (DGA) data lead to over-fitting, weak generalization, and poor recognition performance in deep-learning-based fault diagnosis models. To handle this problem, a novel transformer fault diagnosis method based on an improved auxiliary classifier generative adversarial network (ACGAN) under imbalanced data is proposed, which meets both the requirement of balancing the DGA data and that of supplying accurate diagnosis results. The generator combines a one-dimensional convolutional neural network (1D-CNN) and a long short-term memory (LSTM) network, which deeply extracts features from the DGA samples and greatly benefits the ACGAN's data balancing and fault diagnosis. The discriminator adopts a multilayer perceptron (MLP), which prevents the discriminator from losing important features of the DGA data when the network becomes too complex and too deep. The experimental results suggest that the presented approach can effectively mitigate the adverse effects of DGA data imbalance on deep learning models, enhance fault diagnosis performance, and reach a diagnosis accuracy of up to 99.46%. Furthermore, the comparison results indicate that the fault diagnosis performance of the proposed approach is superior to that of other conventional methods. The method therefore offers reliable fault diagnosis performance on various unbalanced datasets, and it can also address the problems of insufficient and imbalanced fault data in other practical application fields.
Keywords: power transformer; dissolved gas analysis; imbalanced data; auxiliary classifier generative adversarial network
4. An Imbalanced Dataset and Class Overlapping Classification Model for Big Data
Authors: Mini Prince, P. M. Joe Prathap. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 2, pp. 1009-1024 (16 pages).
Most modern technologies, such as social media, smart cities, and the internet of things (IoT), rely on big data. When big data is used in real-world applications, two data challenges arise: class overlap and class imbalance. When dealing with large datasets, most traditional classifiers get stuck in local optima, so it is necessary to look into new methods for dealing with large data collections. Several solutions have been proposed for overcoming this issue, but the rapid growth of available data threatens to limit the usefulness of many traditional methods. Oversampling and undersampling have shown great promise in addressing class imbalance; among these techniques, the Synthetic Minority Oversampling Technique (SMOTE) has produced the best results by generating synthetic samples for the minority class to create a balanced dataset. The issue is that its practical applicability is restricted to problems involving tens of thousands of instances or fewer. In this paper, we propose a parallel method using SMOTE and a MapReduce strategy, which distributes the operation of the algorithm among a group of computational nodes to address this problem. The proposed solution is divided into three stages. The first stage splits the data into different blocks using a mapping function, followed by a pre-processing step for each map block that employs a hybrid SMOTE algorithm to solve the class imbalance problem. On each map block, a decision tree model is constructed. Finally, the decision tree blocks are combined to create a classification model. We used numerous datasets with up to 4 million instances in our experiments to test the proposed scheme's capabilities. The results show that the hybrid SMOTE has good scalability within the proposed framework and also cuts down the processing time.
Keywords: imbalanced dataset; class overlapping; SMOTE; MapReduce; parallel programming; oversampling
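The three-stage map/combine workflow described in this entry can be mimicked on a single machine: split the training data into blocks, rebalance each block, train a decision tree per block, and combine the trees by majority vote. In the sketch below, plain SMOTE stands in for the paper's hybrid SMOTE, and a simple loop stands in for MapReduce.

```python
# A single-machine sketch of the three-stage MapReduce-style workflow described
# above: split the data into blocks (map), apply SMOTE per block, train a
# decision tree per block, then combine the trees by majority vote (reduce).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=50000, n_features=10, weights=[0.95, 0.05],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

n_blocks = 4
trees = []
rng = np.random.default_rng(0)
order = rng.permutation(len(X_train))          # shuffle so each block sees both classes
for idx in np.array_split(order, n_blocks):    # "map": one block per worker
    X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train[idx], y_train[idx])
    trees.append(DecisionTreeClassifier(random_state=0).fit(X_bal, y_bal))

# "reduce": majority vote over the per-block trees
votes = np.stack([t.predict(X_test) for t in trees])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("test accuracy:", (y_pred == y_test).mean())
```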
5. Over-sampling algorithm for imbalanced data classification [Cited by 5]
Authors: Xu Xiaolong, Chen Wen, Sun Yanfei. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2019, No. 6, pp. 1182-1191 (10 pages).
For imbalanced datasets, the focus of classification is to identify samples of the minority class, and the performance of current data mining algorithms is not good enough for processing such datasets. The synthetic minority over-sampling technique (SMOTE) is specifically designed for learning from imbalanced datasets, generating synthetic minority class examples by interpolating between nearby minority class examples. However, SMOTE encounters an over-generalization problem, and density-based spatial clustering of applications with noise (DBSCAN) is not rigorous when dealing with samples near the borderline. We optimize the DBSCAN algorithm to make the clustering more reasonable. This paper integrates the optimized DBSCAN and SMOTE, and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN is used to divide the samples of the minority class into three groups: core samples, borderline samples, and noise samples; the noise samples of the minority class are then removed so that more effective samples can be synthesized. To make full use of the information in the core and borderline samples, different strategies are used to over-sample core samples and borderline samples. Experiments show that DSMOTE achieves better results than SMOTE and Borderline-SMOTE in terms of precision, recall, and F-value.
Keywords: imbalanced data; density-based spatial clustering of applications with noise (DBSCAN); synthetic minority over-sampling technique (SMOTE); over-sampling
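The core/borderline/noise decomposition that DSMOTE builds on can be sketched with scikit-learn's DBSCAN, whose fitted model exposes core-sample indices and a noise label of -1. The snippet below drops minority-class noise before oversampling; the paper's optimized DBSCAN and its separate over-sampling strategies for core versus borderline samples are not reproduced, and the eps/min_samples values are illustrative.

```python
# A sketch of the DSMOTE idea described above: run DBSCAN on the minority class
# to separate core, borderline, and noise samples, discard the noise, and
# oversample the remaining minority samples.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=3000, n_features=2, n_informative=2,
                           n_redundant=0, weights=[0.92, 0.08], random_state=0)
X_min = X[y == 1]

# eps and min_samples need tuning on real data; these values are illustrative.
db = DBSCAN(eps=0.5, min_samples=5).fit(X_min)
core_mask = np.zeros(len(X_min), dtype=bool)
core_mask[db.core_sample_indices_] = True
noise_mask = db.labels_ == -1
border_mask = ~core_mask & ~noise_mask
print(f"core={core_mask.sum()}, borderline={border_mask.sum()}, noise={noise_mask.sum()}")

# Rebuild the training set without the minority-class noise, then oversample.
keep_min = X_min[~noise_mask]
X_clean = np.vstack([X[y == 0], keep_min])
y_clean = np.concatenate([np.zeros(np.sum(y == 0), dtype=int),
                          np.ones(len(keep_min), dtype=int)])
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_clean, y_clean)
print("balanced class counts:", np.bincount(y_bal))
```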
6. An Embedded Feature Selection Method for Imbalanced Data Classification [Cited by 11]
Authors: Haoyue Liu, MengChu Zhou, Qing Liu. IEEE/CAA Journal of Automatica Sinica (EI, CSCD), 2019, No. 3, pp. 703-715 (13 pages).
Imbalanced data is one type of dataset frequently found in real-world applications, e.g., fraud detection and cancer diagnosis. For this type of dataset, improving the accuracy of identifying the minority class is a critically important issue. Feature selection is one method to address this issue: an effective feature selection method can choose a subset of features that favors accurate determination of the minority class. A decision tree is a classifier that can be built using different splitting criteria, and its advantage is the ease of detecting which feature is used as a splitting node; it is therefore possible to use a decision-tree splitting criterion as a feature selection method. In this paper, an embedded feature selection method using a proposed weighted Gini index (WGI) is presented. Comparison with the Chi2, F-statistic, and Gini index feature selection methods shows that F-statistic and Chi2 reach the best performance when only a few features are selected; as the number of selected features increases, the proposed method has the highest probability of achieving the best performance. The area under the receiver operating characteristic curve (ROC AUC) and F-measure are used as evaluation criteria. Experimental results with two datasets show that ROC AUC performance can be high even if only a few features are selected and used, and it changes only slightly as more and more features are selected; however, F-measure achieves excellent performance only if 20% or more of the features are chosen. The results help practitioners select a proper feature selection method when facing a practical problem.
Keywords: classification and regression tree; feature selection; imbalanced data; weighted Gini index (WGI)
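As an illustration of using an impurity criterion for feature selection, the sketch below ranks features by a class-weighted Gini impurity computed over candidate split thresholds. The weighting scheme here is a simple assumption for demonstration and is not the paper's exact WGI definition.

```python
# An illustrative class-weighted Gini impurity for ranking features by their
# best binary split; the minority class receives a larger weight in the
# impurity term. This is not the paper's exact WGI formulation.
import numpy as np

def weighted_gini(y, w):
    """Gini impurity with per-class weights w = {class: weight}."""
    total = sum(w[c] * np.sum(y == c) for c in w)
    if total == 0:
        return 0.0
    return 1.0 - sum((w[c] * np.sum(y == c) / total) ** 2 for c in w)

def best_split_score(x, y, w):
    """Lowest weighted-Gini score over candidate thresholds of one feature."""
    best = np.inf
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        score = (len(left) * weighted_gini(left, w)
                 + len(right) * weighted_gini(right, w)) / len(y)
        best = min(best, score)
    return best

rng = np.random.default_rng(0)
y = (rng.random(500) < 0.1).astype(int)            # ~10% minority class
X = rng.normal(size=(500, 5))
X[:, 2] += 2.0 * y                                 # make feature 2 informative
weights = {0: 1.0, 1: 5.0}                         # up-weight the minority class
ranking = sorted(range(X.shape[1]),
                 key=lambda j: best_split_score(X[:, j], y, weights))
print("features ranked by weighted Gini (best first):", ranking)
```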
7. Dealing with Imbalanced Dataset Leveraging Boundary Samples Discovered by Support Vector Data Description
Authors: Zhengbo Luo, Hamid Parvin, Harish Garg, Sultan Noman Qasem, Kim-Hung Pho, Zulkefli Mansor. Computers, Materials & Continua (SCIE, EI), 2021, No. 3, pp. 2691-2708 (18 pages).
Imbalanced datasets, denoted throughout the paper by ID (datasets containing some, usually two, classes where one has considerably fewer samples than the other(s)), emerge in many real-world problems, such as health care and disease diagnosis systems, anomaly detection, fraud detection, and stream-based malware detection. They cause problems in the classification process, including under-training of the minority class(es), over-training of the majority class(es), and bias towards the majority class(es), and have therefore attracted the attention of many researchers; several solutions exist for dealing with this problem. The main aim of this study is to resample the borderline samples discovered by Support Vector Data Description (SVDD). There are naturally two kinds of resampling: under-sampling (U-S) and over-sampling (O-S). The main drawback of O-S is that it may cause over-fitting, while the main drawback of U-S is that it may cause significant information loss. To avoid these drawbacks, this study focuses on the samples that may be misclassified: the borderline data points, which lie on the border(s) between the majority class(es) and minority class(es). First, the borderline examples are found by SVDD; then data resampling is applied to them. Next, the base classifier is trained on the newly created dataset. Finally, the method is compared with other state-of-the-art methods in terms of Area Under Curve (AUC), F-measure, and G-mean, and the experimental study shows that it obtains better results.
Keywords: imbalanced learning; classification; borderline examples
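The boundary-sample idea in this entry can be approximated with scikit-learn's OneClassSVM, which is closely related to SVDD when an RBF kernel is used: its support vectors on the minority class are taken as the borderline examples, and only those are resampled. This is a stand-in sketch, not the paper's exact procedure.

```python
# A sketch of boundary-focused resampling: fit a one-class model on the
# minority class, treat its support vectors as borderline samples, then
# oversample using only those minority samples.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import OneClassSVM
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=4000, n_features=2, n_informative=2,
                           n_redundant=0, weights=[0.9, 0.1], random_state=0)
X_min, X_maj = X[y == 1], X[y == 0]

ocsvm = OneClassSVM(kernel="rbf", nu=0.2, gamma="scale").fit(X_min)
borderline = X_min[ocsvm.support_]      # support vectors lie on/near the boundary
print(f"{len(borderline)} of {len(X_min)} minority samples flagged as borderline")

# Oversample using only the borderline minority samples plus the majority class.
X_sub = np.vstack([X_maj, borderline])
y_sub = np.concatenate([np.zeros(len(X_maj), dtype=int),
                        np.ones(len(borderline), dtype=int)])
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_sub, y_sub)
print("balanced class counts:", np.bincount(y_bal))
```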
8. Clustered Federated Learning with Weighted Model Aggregation for Imbalanced Data
Authors: Dong Wang, Naifu Zhang, Meixia Tao. China Communications (SCIE, CSCD), 2022, No. 8, pp. 41-56 (16 pages).
As a promising edge learning framework in future 6G networks, federated learning (FL) faces a number of technical challenges due to the heterogeneous network environment and diversified user behaviors. Data imbalance is one of these challenges and can significantly degrade learning efficiency. To deal with the data imbalance issue, this work proposes a new learning framework called clustered federated learning with weighted model aggregation (weighted CFL). Compared with traditional FL, weighted CFL adaptively clusters the participating edge devices based on the cosine similarity of their local gradients at each training iteration, and then performs weighted per-cluster model aggregation. The similarity threshold for clustering is adapted over iterations in response to the time-varying divergence of the local gradients, and the weights for per-cluster model aggregation are adjusted according to the data balance feature so as to speed up the convergence rate. Experimental results show that the proposed weighted CFL achieves a faster model convergence rate and greater learning accuracy than benchmark methods under the imbalanced data scenario.
Keywords: clustered federated learning; data imbalance; convergence rate analysis; model aggregation
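The gradient-similarity clustering and weighted aggregation steps described in this entry can be illustrated with a few lines of NumPy. The greedy fixed-threshold grouping and size-proportional weights below are simplifying assumptions; the paper's adaptive threshold and balance-aware weighting are not reproduced.

```python
# A numpy sketch of the clustering step described above: group clients whose
# local gradients have high cosine similarity, then aggregate per cluster with
# weights proportional to each client's sample count.
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 8, 100
grads = rng.normal(size=(n_clients, dim))
grads[4:] += 3.0 * rng.normal(size=dim)      # second group with a shared bias
n_samples = rng.integers(50, 500, size=n_clients)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

threshold = 0.5
clusters = []                                # greedy threshold-based grouping
for i in range(n_clients):
    for c in clusters:
        if cosine(grads[i], grads[c[0]]) >= threshold:
            c.append(i)
            break
    else:
        clusters.append([i])

for k, c in enumerate(clusters):
    w = n_samples[c] / n_samples[c].sum()    # weight by local data size
    agg = (w[:, None] * grads[c]).sum(axis=0)
    print(f"cluster {k}: clients {c}, aggregated gradient norm {np.linalg.norm(agg):.2f}")
```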
9. Fault diagnosis of HVAC system with imbalanced data using multi-scale convolution composite neural network
Authors: Rouhui Wu, Yizhu Ren, Mengying Tan, Lei Nie. Building Simulation (SCIE, EI, CSCD), 2024, No. 3, pp. 371-386 (16 pages).
Accurate fault diagnosis of heating, ventilation, and air conditioning (HVAC) systems is of significant importance for maintaining normal operation, reducing energy consumption, and minimizing maintenance costs. However, in practical applications it is challenging to obtain sufficient fault data for HVAC systems, leading to imbalanced data in which the number of fault samples is much smaller than that of normal samples. Moreover, most existing HVAC fault diagnosis methods rely heavily on balanced training sets to achieve high diagnosis accuracy. To address this issue, a composite neural network fault diagnosis model is proposed, which combines SMOTETomek, a multi-scale one-dimensional convolutional neural network (M1DCNN), and a support vector machine (SVM). The method first utilizes SMOTETomek to augment the minority-class samples in the imbalanced dataset, achieving a balanced number of faulty and normal data. Then, it employs the M1DCNN model to extract feature information from the augmented dataset. Finally, it replaces the original Softmax classifier with an SVM classifier, thus enhancing the fault diagnosis accuracy. Using the SMOTETomek-M1DCNN-SVM method, fault diagnosis validation was conducted on both the ASHRAE RP-1043 dataset and an experimental dataset with an imbalance ratio of 1:10. The results demonstrate the superiority of this approach, providing a novel and promising solution for intelligent building management, with accuracy and F1 scores of 98.45% and 100% for the RP-1043 dataset and the experimental dataset, respectively.
Keywords: fault diagnosis; chiller; imbalanced data; SMOTETomek; multi-scale neural networks
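The balancing-plus-classification part of the pipeline in this entry can be sketched with imbalanced-learn's SMOTETomek followed by an SVM. The M1DCNN feature extractor is omitted, so raw synthetic features are fed to the SVM directly; this is an illustration, not the paper's model.

```python
# A reduced sketch of the data-balancing and classification stages described
# above: SMOTETomek rebalances a roughly 1:10 dataset, then an SVM classifies
# the result.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report
from imblearn.combine import SMOTETomek

# Roughly a 1:10 fault-to-normal ratio, as in the paper's experiments.
X, y = make_classification(n_samples=5500, n_features=16, weights=[10/11, 1/11],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

X_bal, y_bal = SMOTETomek(random_state=0).fit_resample(X_train, y_train)
clf = SVC(kernel="rbf").fit(X_bal, y_bal)
print(classification_report(y_test, clf.predict(X_test), digits=4))
```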
10. Constraint Learning-based Optimal Power Dispatch for Active Distribution Networks with Extremely Imbalanced Data
Authors: Yonghua Song, Ge Chen, Hongcai Zhang. CSEE Journal of Power and Energy Systems (SCIE, EI, CSCD), 2024, No. 1, pp. 51-65 (15 pages).
The transition towards carbon-neutral power systems has necessitated the optimization of power dispatch in active distribution networks (ADNs) to facilitate the integration of distributed renewable generation. Because the network topology and line impedances are unavailable in many distribution networks, physical model-based methods may not be applicable to their operation. To tackle this challenge, some studies have proposed constraint learning, which replicates physical models by training a neural network to evaluate the feasibility of a decision (i.e., whether a decision satisfies all critical constraints or not). To ensure the accuracy of this trained neural network, the training set should contain sufficient feasible and infeasible samples. However, since ADNs mostly operate in a normal status, only very few historical samples are infeasible; the historical dataset is therefore highly imbalanced, which poses a significant obstacle to neural network training. To address this issue, an enhanced constraint learning method is proposed. First, it leverages constraint learning to train a neural network as a surrogate of the ADN's model. Then, it introduces the Synthetic Minority Oversampling Technique to generate infeasible samples and mitigate the imbalance of the historical dataset. By incorporating historical and synthetic samples into the training set, the accuracy of the neural network can be significantly improved. Furthermore, a trust region is established to constrain the solution and thereby enhance its reliability. Simulations confirm the benefits of the proposed method in achieving desirable optimality and feasibility while maintaining low computational complexity.
Keywords: deep learning; demand response; distribution networks; imbalanced data; optimal power flow
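The surrogate-training step described in this entry, oversampling the rare infeasible operating points before fitting a feasibility classifier, can be sketched as follows. The network architecture is an assumption, and the trust-region construction and dispatch optimization are not reproduced.

```python
# A sketch of the surrogate training step: historical operating points labeled
# feasible/infeasible are heavily imbalanced, SMOTE synthesizes extra
# infeasible samples, and a small neural network learns to predict feasibility.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import balanced_accuracy_score
from imblearn.over_sampling import SMOTE

# Label 1 = infeasible operating point (rare in historical data).
X, y = make_classification(n_samples=10000, n_features=20, weights=[0.98, 0.02],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

X_aug, y_aug = SMOTE(random_state=0).fit_resample(X_train, y_train)
surrogate = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500,
                          random_state=0).fit(X_aug, y_aug)
print("balanced accuracy on held-out data:",
      round(balanced_accuracy_score(y_test, surrogate.predict(X_test)), 4))
```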
11. Conditional self-attention generative adversarial network with differential evolution algorithm for imbalanced data classification
Authors: Jiawei Niu, Zhunga Liu, Quan Pan, Yanbo Yang, Yang Li. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD), 2023, No. 3, pp. 303-315 (13 pages).
Imbalanced data classification is an important research topic in real-world applications, such as fault diagnosis in an aircraft manufacturing system. The over-sampling method is often used to solve this problem; it generates samples according to the distances between minority data points. However, traditional over-sampling may change the original data distribution, which is harmful to classification performance. In this paper, we propose a new method called Conditional Self-Attention Generative Adversarial Network with Differential Evolution (CSAGAN-DE) for imbalanced data classification. The new method aims at improving the classification performance for minority data by enhancing the quality of the generated minority samples. In CSAGAN-DE, the minority data are fed into the self-attention generative adversarial network to approximate the data distribution and create new data for the minority class. Then, the differential evolution algorithm is employed to automatically determine the number of generated minority samples needed to achieve satisfactory classification performance. Several experiments are conducted to evaluate the performance of the new method, and the results show that it can efficiently improve classification performance compared with other related methods.
Keywords: classification; generative adversarial network; imbalanced data; optimization; over-sampling
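The role of differential evolution in this entry, choosing how many synthetic minority samples to generate, can be sketched with scipy.optimize.differential_evolution. SMOTE stands in for the CSAGAN generator, and for brevity the resampling is applied before cross-validation, which in practice should happen inside each training fold.

```python
# A sketch of the second stage described above: differential evolution searches
# for the number of synthetic minority samples that maximizes validation
# performance. SMOTE stands in for the CSAGAN generator.
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
n_min, n_maj = int(np.sum(y == 1)), int(np.sum(y == 0))

def negative_f1(params):
    n_target = int(round(params[0]))                 # minority size after generation
    sampler = SMOTE(sampling_strategy={1: n_target}, random_state=0)
    X_bal, y_bal = sampler.fit_resample(X, y)
    # Note: resampling before CV leaks synthetic points across folds; kept
    # simple here for illustration only.
    return -cross_val_score(SVC(), X_bal, y_bal, cv=3, scoring="f1").mean()

result = differential_evolution(negative_f1, bounds=[(n_min + 1, n_maj)],
                                maxiter=5, popsize=6, seed=0, polish=False)
print(f"chosen minority size: {int(round(result.x[0]))}, CV F1: {-result.fun:.4f}")
```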
12. Tackling imbalanced data in cybersecurity with transfer learning: a case with ROP payload detection
Authors: Haizhou Wang, Anoop Singhal, Peng Liu. Cybersecurity (EI, CSCD), 2023, No. 2, pp. 29-43 (15 pages).
In recent years, deep learning has gained proliferating popularity in the cybersecurity application domain, since, compared to traditional machine learning methods, it usually involves less human effort, produces better results, and provides better generalizability. However, the imbalanced data issue is very common in cybersecurity and can substantially deteriorate the performance of deep learning models. This paper introduces a transfer learning based method to tackle the imbalanced data issue in cybersecurity, using return-oriented programming (ROP) payload detection as a case study. We achieved a 0.0290 average false positive rate, 0.9705 average F1 score, and 0.9521 average detection rate on 3 different target-domain programs using 2 different source-domain programs, with 0 benign training data samples in the target domain. The performance improvement compared to the baseline is a trade-off between false positive rate and detection rate: using our approach, the total number of false positives is reduced by 23.16%, while the number of detected malicious samples decreases by 0.68%.
Keywords: domain adaptation; return-oriented programming; imbalanced dataset
13. GraphCWGAN-GP: A Novel Data Augmenting Approach for Imbalanced Encrypted Traffic Classification
Authors: Jiangtao Zhai, Peng Lin, Yongfu Cui, Lilong Xu, Ming Liu. Computer Modeling in Engineering & Sciences (SCIE, EI), 2023, No. 8, pp. 2069-2092 (24 pages).
Encrypted traffic classification has become a hot issue in network security research. The class imbalance of traffic samples often degrades the performance of machine-learning-based classifiers. Although Generative Adversarial Network (GAN) methods can generate new samples by learning the feature distribution of the original samples, they are confronted with the problems of unstable training and mode collapse. To this end, a novel data augmenting approach called Graph CWGAN-GP is proposed in this paper. The traffic data are first converted into grayscale images as the input to the proposed model. Then, the minority-class data are augmented with the proposed model, which is built by introducing conditional constraints and a new distance metric into a typical GAN. Finally, a classical deep learning model is adopted as a classifier to classify datasets augmented by the Conditional GAN (CGAN), Wasserstein GAN with Gradient Penalty (WGAN-GP), and Graph CWGAN-GP, respectively. Compared with state-of-the-art GAN methods, Graph CWGAN-GP can not only control the modes of the generated data, but also overcome the problem of unstable training and generate more realistic and diverse samples. The experimental results show that the classification precision, recall, and F1-score of the minority class in the balanced dataset augmented in this paper improve by more than 2.37%, 3.39%, and 4.57%, respectively.
Keywords: Generative Adversarial Network; imbalanced traffic data; data augmenting; encrypted traffic classification
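The WGAN-GP component that Graph CWGAN-GP extends centers on a gradient-penalty term that keeps the critic's gradient norm near 1 on points interpolated between real and generated samples. The PyTorch sketch below shows that term on stand-in grayscale "traffic images"; the graph construction, conditional constraints, and the paper's new distance metric are not reproduced.

```python
# A minimal PyTorch sketch of the WGAN-GP gradient-penalty term on grayscale
# image batches.
import torch
import torch.nn as nn

class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.Linear(32 * 7 * 7, 1))
    def forward(self, x):
        return self.net(x)

def gradient_penalty(critic, real, fake):
    """WGAN-GP penalty: push the critic's gradient norm toward 1 on points
    interpolated between real and generated samples."""
    alpha = torch.rand(real.size(0), 1, 1, 1)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(outputs=scores, inputs=interp,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    return ((grads.view(grads.size(0), -1).norm(2, dim=1) - 1) ** 2).mean()

critic = Critic()
real = torch.rand(8, 1, 28, 28)    # stand-in grayscale traffic images
fake = torch.rand(8, 1, 28, 28)
loss = (-critic(real).mean() + critic(fake).mean()
        + 10.0 * gradient_penalty(critic, real, fake))
print("critic loss with gradient penalty:", float(loss))
```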
14. A Comparative Study of Chinese and British Scientific Data Repository Platforms Based on re3data
Authors: Yuan Ye, Chen Yuanyuan. Digital Library Forum, 2024, No. 2, pp. 13-23 (11 pages).
Using re3data as the data source, 406 scientific data repositories from China and the United Kingdom were selected as research objects. The construction of the two countries' scientific data repositories is compared across five aspects and eleven indicators: distribution characteristics, responsibility types, repository licensing, technical standards, and quality standards. Based on the comparison, suggestions are offered for the sustainable development of China's data repositories: broadly connect heterogeneous institutions at home and abroad, promote exchange and cooperation across multiple disciplines, effectively expand repository licensing permissions and types, optimize the current application of technical standards, and improve the flexibility of metadata use.
Keywords: scientific data; data repository platform; re3data; China; United Kingdom
15. Redundant Data Detection and Deletion to Meet Privacy Protection Requirements in Blockchain-Based Edge Computing Environment
Authors: Zhang Lejun, Peng Minghui, Su Shen, Wang Weizheng, Jin Zilong, Su Yansen, Chen Huiling, Guo Ran, Sergey Gataullin. China Communications (SCIE, CSCD), 2024, No. 3, pp. 149-159 (11 pages).
With the rapid development of information technology, IoT devices play a huge role in physiological health data detection. The exponential growth of medical data requires us to reasonably allocate storage space between cloud servers and edge nodes. The storage capacity of edge nodes close to users is limited, so hotspot data should be stored in edge nodes as much as possible to ensure response timeliness and access hit rate. However, current schemes cannot guarantee that every sub-message in a complete data item stored by an edge node meets the requirements of hot data, and detecting and deleting redundant data in edge nodes while protecting user privacy and dynamic data integrity is a challenging problem. This paper proposes a redundant data detection method that meets privacy protection requirements: by scanning the ciphertext, it determines whether each sub-message of the data in the edge node meets the requirements of hot data. It has the same effect as a zero-knowledge proof and does not reveal user privacy. In addition, for redundant sub-data that does not meet the requirements of hot data, this paper proposes a redundant data deletion scheme that preserves the dynamic integrity of the data: a Content Extraction Signature (CES) is used to generate the signature of the remaining hot data after the redundant data is deleted. The feasibility of the scheme is proved through security analysis and efficiency analysis.
Keywords: blockchain; data integrity; edge computing; privacy protection; redundant data
16. Research on Interpolation Method for Missing Electricity Consumption Data
Authors: Junde Chen, Jiajia Yuan, Weirong Chen, Adnan Zeb, Md Suzauddola, Yaser A. Nanehkaran. Computers, Materials & Continua (SCIE, EI), 2024, No. 2, pp. 2575-2591 (17 pages).
Missing values are one of the main causes of dirty data. Without high-quality data there can be no reliable analysis results or precise decision-making, so the data warehouse needs to integrate high-quality data consistently. In the power system, the electricity consumption data of some large users cannot be collected normally, resulting in missing data; this affects the calculation of the power supply and eventually leads to a large error in the daily power line loss rate. For the problem of missing electricity consumption data, this study proposes a group method of data handling (GMDH) based data interpolation method for distribution networks and applies it to the analysis of actually collected electricity data. First, the dependent and independent variables are defined from the original data, and the upper and lower limits of the missing values are determined according to prior knowledge or existing data information. All missing values are randomly interpolated within these limits. Then, the GMDH network is established to obtain the optimal-complexity model, which is used to predict the missing data and replace the previously imputed electricity consumption values. This process is repeated iteratively until the missing values no longer change. Under a relatively small noise level (α = 0.25), the proposed approach achieves a maximum error of no more than 0.605%. Experimental findings demonstrate the efficacy and feasibility of the proposed approach, which realizes the transformation from incomplete to complete data. The proposed interpolation approach also provides a strong basis for electricity theft diagnosis and metering fault analysis in electricity enterprises.
Keywords: data interpolation; GMDH; electricity consumption data; distribution system
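The iterative imputation loop described in this entry can be sketched as follows; a gradient-boosting regressor stands in for the GMDH network, and the data are synthetic.

```python
# A sketch of the iterative imputation loop described above: missing values are
# first filled randomly within prior bounds, a regression model is fit on the
# current (partly imputed) dataset, the missing entries are re-predicted, and
# the process repeats until the imputed values stop changing.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 500
data = np.column_stack([rng.normal(10, 2, n), rng.normal(5, 1, n)])  # predictors
target = 3 * data[:, 0] + 2 * data[:, 1] + rng.normal(0, 0.5, n)     # consumption
missing = rng.random(n) < 0.1                                        # 10% missing

lower, upper = target[~missing].min(), target[~missing].max()        # prior bounds
filled = target.copy()
filled[missing] = rng.uniform(lower, upper, missing.sum())           # random init

for it in range(20):
    model = GradientBoostingRegressor(random_state=0).fit(data, filled)
    new_vals = np.clip(model.predict(data[missing]), lower, upper)
    if np.max(np.abs(new_vals - filled[missing])) < 1e-3:             # converged
        break
    filled[missing] = new_vals

print(f"stopped after {it + 1} iterations, mean abs error on imputed values: "
      f"{np.abs(filled[missing] - target[missing]).mean():.3f}")
```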
17. Defect Detection Model Using Time Series Data Augmentation and Transformation
Authors: Gyu-Il Kim, Hyun Yoo, Han-Jin Cho, Kyungyong Chung. Computers, Materials & Continua (SCIE, EI), 2024, No. 2, pp. 1713-1730 (18 pages).
Time-series data provide important information in many fields, and their processing and analysis have been the focus of much research. However, detecting anomalies is very difficult due to data imbalance, temporal dependence, and noise. Therefore, methodologies for data augmentation and for converting time series data into images for analysis have been studied. This paper proposes a fault detection model that uses time series data augmentation and transformation to address the problems of data imbalance, temporal dependence, and robustness to noise. The data augmentation method is the addition of noise: Gaussian noise with the noise level set to 0.002 is added to maximize the generalization performance of the model. In addition, the Markov Transition Field (MTF) method is used to effectively visualize the dynamic transitions of the data while converting the time series into images; it enables the identification of patterns in time series data and helps capture their sequential dependencies. For anomaly detection, the PatchCore model is applied and shows excellent performance, and the detected anomaly areas are represented as heat maps. By applying an anomaly map to the original image, it is possible to capture the areas where anomalies occur. The performance evaluation shows that both F1-score and accuracy are high when the time series data are converted to images. Additionally, when processed as images rather than as raw time series, both the size of the data and the training time were significantly reduced. The proposed method provides an important springboard for research on anomaly detection using time series data and helps address problems such as analyzing complex patterns in data in a lightweight manner.
Keywords: defect detection; time series; deep learning; data augmentation; data transformation
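The augmentation and transformation steps in this entry (Gaussian noise at level 0.002, then Markov Transition Field images) can be sketched with NumPy and, assuming the pyts package is available, its MarkovTransitionField transformer; the PatchCore detection stage is not reproduced.

```python
# A sketch of the augmentation and transformation steps described above:
# Gaussian noise (level 0.002) is added to each time series, and the series
# are converted to images with a Markov Transition Field (pyts assumed).
import numpy as np
from pyts.image import MarkovTransitionField

rng = np.random.default_rng(0)
X = np.sin(np.linspace(0, 8 * np.pi, 128)) + 0.05 * rng.normal(size=(32, 128))

# Data augmentation: add Gaussian noise at the noise level used in the paper.
noise_level = 0.002
X_aug = X + rng.normal(scale=noise_level, size=X.shape)

# Transformation: encode each (augmented) series as an MTF image.
mtf = MarkovTransitionField(image_size=32, n_bins=8)
images = mtf.fit_transform(np.vstack([X, X_aug]))
print("MTF image tensor shape:", images.shape)   # (n_series, 32, 32)
```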
18. Reliable Data Collection Model and Transmission Framework in Large-Scale Wireless Medical Sensor Networks
Authors: Haosong Gou, Gaoyi Zhang, Renê Ripardo Calixto, Senthil Kumar Jagatheesaperumal, Victor Hugo C. de Albuquerque. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 7, pp. 1077-1102 (26 pages).
Large-scale wireless sensor networks (WSNs) play a critical role in monitoring dangerous scenarios and responding to medical emergencies. However, the inherent instability and error-prone nature of wireless links present significant challenges, necessitating efficient data collection and reliable transmission services. This paper addresses the limitations of existing data transmission and recovery protocols by proposing a systematic end-to-end design tailored for medical event-driven, cluster-based, large-scale WSNs. The primary goal is to enhance the reliability of data collection and transmission services, ensuring a comprehensive and practical approach. The approach focuses on refining the hop-count-based routing scheme to achieve fairness in forwarding reliability, emphasizes reliable data collection within clusters, and establishes robust data transmission over multiple hops. These systematic improvements are designed to optimize the overall performance of the WSN in real-world scenarios. Simulation results of the proposed protocol validate its exceptional performance compared to other prominent data transmission schemes. The evaluation spans varying sensor densities, wireless channel conditions, and packet transmission rates, showcasing the protocol's superiority in ensuring reliable and efficient data transfer. The systematic end-to-end design successfully addresses the challenges posed by the instability of wireless links in large-scale WSNs; by prioritizing fairness, reliability, and efficiency, the proposed protocol enhances data collection and transmission services, offering a valuable contribution to the field of medical event-driven WSNs.
Keywords: wireless sensor networks; reliable data transmission; medical emergencies; cluster; data collection; routing scheme
19. Call for Papers: Special Section on Progress of Analysis Techniques for Domain-Specific Big Data
Journal of Electronic Science and Technology (EI, CAS, CSCD), 2024, No. 1, pp. I0001-I0002 (2 pages).
Guest Editors: Prof. Ling Tian, University of Electronic Science and Technology of China (lingtian@uestc.edu.cn); Prof. Jian-Hua Tao, Tsinghua University (jhtao@tsinghua.edu.cn); Dr. Bin Zhou, National University of Defense Technology (binzhou@nudt.edu.cn). Since the concept of "Big Data" was first introduced in Nature in 2008, it has been widely applied in fields such as business, healthcare, national defense, education, transportation, and security. With the maturity of artificial intelligence technology, big data analysis techniques tailored to various fields have made significant progress, but they still face many challenges in terms of data quality, algorithms, and computing power.
Keywords: education; maturity; data
20. Assimilation of GOES-R Geostationary Lightning Mapper Flash Extent Density Data in GSI 3DVar, EnKF, and Hybrid En3DVar for the Analysis and Short-Term Forecast of a Supercell Storm Case
Authors: Rong Kong, Ming Xue, Edward R. Mansell, Chengsi Liu, Alexandre O. Fierro. Advances in Atmospheric Sciences (SCIE, CAS, CSCD), 2024, No. 2, pp. 263-277 (15 pages).
Capabilities to assimilate Geostationary Operational Environmental Satellite "R-series" (GOES-R) Geostationary Lightning Mapper (GLM) flash extent density (FED) data within the operational Gridpoint Statistical Interpolation ensemble Kalman filter (GSI-EnKF) framework were previously developed and tested with a mesoscale convective system (MCS) case. In this study, such capabilities are further developed to assimilate GOES GLM FED data within the GSI ensemble-variational (EnVar) hybrid data assimilation (DA) framework. The results of assimilating the GLM FED data using 3DVar and pure En3DVar (PEn3DVar, using 100% ensemble covariance and no static covariance) are compared with those of EnKF/DfEnKF for a supercell storm case. The focus of this study is to validate the correctness and evaluate the performance of the new implementation rather than to compare the performance of FED DA among different DA schemes; only the results of 3DVar and PEn3DVar are examined and compared with EnKF/DfEnKF. Assimilation of a single FED observation shows that the magnitude and horizontal extent of the analysis increments from PEn3DVar are generally larger than from EnKF, which is mainly caused by the different localization strategies used in EnKF/DfEnKF and PEn3DVar as well as the integration limits of the graupel mass in the observation operator. Overall, the forecast performance of PEn3DVar is comparable to EnKF/DfEnKF, suggesting a correct implementation.
Keywords: GOES-R; lightning; data assimilation; EnKF; EnVar
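The single-observation analysis increment discussed in this entry can be illustrated with the textbook ensemble Kalman filter update below. This is the generic formulation in NumPy, with no covariance localization, and is not the GSI implementation or the FED observation operator.

```python
# A textbook sketch of the ensemble Kalman filter update for a single
# observation, showing how an analysis increment to the ensemble mean is
# computed from the sample cross covariance.
import numpy as np

rng = np.random.default_rng(0)
n_state, n_ens = 50, 40
X = rng.normal(size=(n_state, n_ens))        # ensemble of model states (columns)
H = np.zeros((1, n_state)); H[0, 10] = 1.0   # observe state variable 10
y_obs, obs_var = 2.0, 0.5                    # observation and its error variance

x_mean = X.mean(axis=1, keepdims=True)
A = X - x_mean                               # ensemble perturbations
Hx = H @ X                                   # simulated observations, shape (1, n_ens)
HA = Hx - Hx.mean()
P_HT = A @ HA.T / (n_ens - 1)                # cross covariance, shape (n_state, 1)
HPHT = float(HA @ HA.T) / (n_ens - 1)        # observation-space variance
K = P_HT / (HPHT + obs_var)                  # Kalman gain
increment = K * (y_obs - float(Hx.mean()))   # analysis increment to the mean
x_analysis_mean = x_mean + increment
print("largest increment at state index:", int(np.argmax(np.abs(increment))))
```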