Funding: Supported by the National Key R&D Program of China under Grant No. 2018YFB1800805; the National Natural Science Foundation of China under Grant Nos. 61772345, 61902257, and 61972261; and the Shenzhen Science and Technology Program under Grant Nos. RCYX20200714114645048, JCYJ20190808142207420, and GJHZ20190822095416463.
Abstract: Handling the massive amount of data generated by Smart Mobile Devices (SMDs) is a challenging computational problem. Edge Computing is an emerging computation paradigm employed to address this problem. It brings computation power closer to the end devices to reduce their computation latency and energy consumption, and thereby increases the computational ability of SMDs through collaboration with edge servers. This is achieved by offloading computation from the mobile devices to the edge nodes or servers. However, not all applications benefit from computation offloading, which is suitable only for certain types of tasks. Task properties, SMD capability, wireless channel state, and other factors must be taken into account when making computation offloading decisions. Hence, optimization methods are important tools for scheduling computation offloading tasks in Edge Computing networks. In this paper, we review six types of optimization methods: Lyapunov optimization, convex optimization, heuristic techniques, game theory, machine learning, and others. For each type, we focus on the objective functions, application areas, types of offloading methods, evaluation methods, and the time complexity of the proposed algorithms. We discuss a few research problems that are still open. Our purpose in this review is to provide a concise summary that can help new researchers get started with computation offloading research for Edge Computing networks.
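As an illustration of how task properties, device capability, and channel state can enter an offloading decision, the following is a minimal sketch and not a method taken from the reviewed paper. It assumes a commonly used textbook-style model: local energy proportional to cycles times frequency squared, a Shannon-capacity uplink rate, and a weighted sum of delay and energy as the cost. All class names, parameter names, and the weight `w_time` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    data_bits: float    # input data that must be uploaded, in bits
    cpu_cycles: float   # CPU cycles needed to complete the task


@dataclass
class Device:
    cpu_hz: float       # local CPU frequency, cycles per second
    kappa: float        # assumed energy coefficient of the local chip
    tx_power_w: float   # transmit power, watts


@dataclass
class Channel:
    bandwidth_hz: float
    gain: float         # channel power gain (unitless)
    noise_w: float      # noise power, watts


def uplink_rate(dev: Device, ch: Channel) -> float:
    """Shannon-capacity uplink rate in bits per second (modelling assumption)."""
    snr = dev.tx_power_w * ch.gain / ch.noise_w
    return ch.bandwidth_hz * math.log2(1.0 + snr)


def local_cost(task: Task, dev: Device, w_time: float = 0.5) -> float:
    """Weighted delay/energy cost of running the task on the device."""
    delay = task.cpu_cycles / dev.cpu_hz
    energy = dev.kappa * task.cpu_cycles * dev.cpu_hz ** 2
    return w_time * delay + (1.0 - w_time) * energy


def offload_cost(task: Task, dev: Device, ch: Channel,
                 edge_cpu_hz: float, w_time: float = 0.5) -> float:
    """Weighted cost of uploading the input and computing at the edge.

    The download of the result is ignored, which is an assumption of
    this sketch.
    """
    t_upload = task.data_bits / uplink_rate(dev, ch)
    delay = t_upload + task.cpu_cycles / edge_cpu_hz
    energy = dev.tx_power_w * t_upload  # device only pays for transmission
    return w_time * delay + (1.0 - w_time) * energy


def should_offload(task: Task, dev: Device, ch: Channel,
                   edge_cpu_hz: float, w_time: float = 0.5) -> bool:
    """Offload only when it is strictly cheaper than local execution."""
    return (offload_cost(task, dev, ch, edge_cpu_hz, w_time)
            < local_cost(task, dev, w_time))
```

Under this model, a compute-heavy task with a small input over a good channel tends to favor offloading, whereas a data-heavy task over a poor channel tends to stay local, which matches the observation that offloading suits only certain types of tasks.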
Funding: National Natural Science Foundation of China, Grant/Award Number: 61972261; Basic Research Foundations of Shenzhen, Grant/Award Numbers: JCYJ20210324093609026, JCYJ20200813091134001.
Abstract: In this paper, an Observation Points Classifier Ensemble (OPCE) algorithm is proposed to deal with High-Dimensional Imbalanced Classification (HDIC) problems based on data processed using the Multi-Dimensional Scaling (MDS) feature extraction technique. First, the dimensionality of the original imbalanced data is reduced using MDS so that distances between any two different samples are preserved as well as possible. Second, a novel OPCE algorithm is applied to classify imbalanced samples by placing optimised observation points in a low-dimensional data space. Third, optimization of the observation point mappings is carried out to obtain a reliable assessment of the unknown samples. Exhaustive experiments have been conducted to evaluate the feasibility, rationality, and effectiveness of the proposed OPCE algorithm using seven benchmark HDIC data sets. Experimental results show that (1) the OPCE algorithm can be trained faster on low-dimensional imbalanced data than on high-dimensional data; (2) the OPCE algorithm can correctly identify samples as the number of optimised observation points is increased; and (3) statistical analysis reveals that OPCE yields better HDIC performance on the selected data sets in comparison with eight other HDIC algorithms. This demonstrates that OPCE is a viable algorithm for HDIC problems.
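The MDS goal of preserving pairwise distances "as well as possible" can be made concrete with a stress measure. The sketch below is an assumption for illustration only: it computes one common normalized stress between a high-dimensional data set and a candidate low-dimensional embedding, and is not the objective or implementation used in the paper. The function name `normalized_stress` is hypothetical.

```python
import math
from itertools import combinations


def normalized_stress(high, low):
    """Normalized stress between original points and their embedding.

    `high` and `low` are equally long sequences of coordinate tuples;
    `low[i]` is the low-dimensional image of `high[i]`. A value of 0
    means every pairwise Euclidean distance is preserved exactly.
    """
    if len(high) != len(low):
        raise ValueError("high and low must contain the same number of points")
    squared_error = 0.0
    squared_total = 0.0
    for i, j in combinations(range(len(high)), 2):
        d_high = math.dist(high[i], high[j])  # distance in the original space
        d_low = math.dist(low[i], low[j])     # distance in the embedding
        squared_error += (d_high - d_low) ** 2
        squared_total += d_high ** 2
    if squared_total == 0.0:
        raise ValueError("all original points coincide; stress is undefined")
    return math.sqrt(squared_error / squared_total)
```

For points that already lie in a plane, dropping the constant coordinate gives a stress of zero, while collapsing them onto a line gives a positive value; an MDS procedure searches for the embedding that makes such a measure small.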
Funding: National Natural Science Foundation of China, Grant/Award Number: 61972261; Natural Science Foundation of Guangdong Province, Grant/Award Number: 2314050006683; Key Basic Research Foundation of Shenzhen, Grant/Award Number: JCYJ20220818100205012; Basic Research Foundations of Shenzhen, Grant/Award Number: JCYJ20210324093609026.
Abstract: In this study, an observation points-based positive-unlabeled learning algorithm (hereafter called OP-PUL) is proposed to deal with positive-unlabeled learning (PUL) tasks by judiciously assigning highly credible labels to unlabeled samples. The proposed OP-PUL algorithm has three components. First, an observation point classifier ensemble (OPCE) algorithm is constructed to divide unlabeled samples into two categories: temporary positive samples and permanent negative samples. Second, a temporary OPC (TOPC) is trained on the combination of original positive samples and permanent negative samples, and the temporary positive samples that TOPC classifies as positive are retained as permanent positive samples. Third, a permanent OPC (POPC) is finally trained on the combination of original positive samples, permanent positive samples, and permanent negative samples. An exhaustive experimental evaluation is conducted to validate the feasibility, rationality, and effectiveness of the OP-PUL algorithm using 30 benchmark PU data sets. Results show that (1) the OP-PUL algorithm is stable and robust as the numbers of unlabeled samples and positive samples in the unlabeled data sets are increased and (2) the permanent positive samples have a probability distribution consistent with that of the original positive samples. Moreover, a statistical analysis reveals that POPC in the OP-PUL algorithm can yield better PUL performance on the 30 data sets in comparison with four well-known PUL algorithms. This demonstrates that OP-PUL is a viable algorithm for PUL tasks.
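The three-stage flow described in this abstract can be sketched as follows. This is a hedged illustration of the pipeline shape only: the observation point classifiers are not specified in the abstract, so a nearest-centroid classifier and a simple distance rule are swapped in as stand-ins. The names `NearestCentroid` and `pu_pipeline` and the stage 1 rule are assumptions, not the authors' method.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of coordinate tuples."""
    if not points:
        raise ValueError("cannot take the centroid of an empty set")
    return tuple(sum(axis) / len(points) for axis in zip(*points))


class NearestCentroid:
    """Stand-in binary classifier; NOT the observation point classifier."""

    def fit(self, positives, negatives):
        self.pos_center = centroid(positives)
        self.neg_center = centroid(negatives)
        return self

    def is_positive(self, x):
        return math.dist(x, self.pos_center) <= math.dist(x, self.neg_center)


def pu_pipeline(positives, unlabeled):
    """Three-stage positive-unlabeled labelling with stand-in components."""
    # Stage 1 (stand-in for OPCE): unlabeled samples farther from the
    # positive centroid than any known positive become permanent negatives;
    # the remaining samples become temporary positives.
    center = centroid(positives)
    radius = max(math.dist(p, center) for p in positives)
    temp_pos = [u for u in unlabeled if math.dist(u, center) <= radius]
    perm_neg = [u for u in unlabeled if math.dist(u, center) > radius]

    # Stage 2 (stand-in for TOPC): train on original positives and permanent
    # negatives, then keep the temporary positives it classifies as positive.
    temp_clf = NearestCentroid().fit(positives, perm_neg)
    perm_pos = [u for u in temp_pos if temp_clf.is_positive(u)]

    # Stage 3 (stand-in for POPC): train the final classifier on the original
    # positives, the permanent positives, and the permanent negatives.
    final_clf = NearestCentroid().fit(positives + perm_pos, perm_neg)
    return final_clf, perm_pos, perm_neg
```

The sketch raises a `ValueError` when stage 1 yields no permanent negatives, since a binary classifier cannot be trained on positives alone; how the actual algorithm handles that case is not stated in the abstract.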
Funding: Supported in part by the National Natural Science Foundation of China (No. 61972261) and the National Key R&D Program of China (No. 2017YFC0822604-2).
Abstract: Computer clusters with the shared-nothing architecture are the major computing platforms for big data processing and analysis. In cluster computing, data partitioning and sampling are two fundamental strategies to speed up the computation of big data and increase scalability. In this paper, we present a comprehensive survey of the methods and techniques of data partitioning and sampling with respect to big data processing and analysis. We start with an overview of the mainstream big data frameworks on Hadoop clusters. The basic methods of data partitioning are then discussed, including three classical horizontal partitioning schemes: range, hash, and random partitioning. Data partitioning on Hadoop clusters is also discussed, with a summary of new strategies for big data partitioning, including the new Random Sample Partition (RSP) distributed model. The classical methods of data sampling are then investigated, including simple random sampling, stratified sampling, and reservoir sampling. Two common methods of big data sampling on computing clusters are also discussed: record-level sampling and block-level sampling. Record-level sampling is not as efficient as block-level sampling on big distributed data. On the other hand, block-level sampling on data blocks generated with the classical data partitioning methods does not necessarily produce good representative samples for approximate computing of big data. In this survey, we also summarize the prevailing strategies and related work on sampling-based approximation on Hadoop clusters. We believe that data partitioning and sampling should be considered together to build approximate cluster computing frameworks that are reliable in both the computational and statistical respects.
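Several of the techniques named in this abstract can be shown in a few lines. The sketch below is an in-memory illustration under simplifying assumptions, not code from the survey or from any Hadoop framework: hash partitioning with a stable CRC32 hash, random partitioning by shuffle-and-split, reservoir sampling (the classical Algorithm R), and block-level sampling that selects whole blocks. All function names are hypothetical.

```python
import random
import zlib


def hash_partition(records, num_blocks, key=lambda r: r):
    """Assign each record to a block by a stable hash of its key."""
    blocks = [[] for _ in range(num_blocks)]
    for record in records:
        # CRC32 is stable across runs, unlike Python's built-in str hash.
        h = zlib.crc32(repr(key(record)).encode("utf-8"))
        blocks[h % num_blocks].append(record)
    return blocks


def random_partition(records, num_blocks, rng):
    """Shuffle the records and deal them out into near-equal blocks.

    Each block is then a random sample of the whole data set, which is
    the property that makes block-level sampling statistically sound.
    """
    shuffled = list(records)
    rng.shuffle(shuffled)
    return [shuffled[b::num_blocks] for b in range(num_blocks)]


def reservoir_sample(stream, k, rng):
    """Uniform sample of k items from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)   # fill the reservoir first
        else:
            j = rng.randint(0, i)    # inclusive on both ends
            if j < k:                # happens with probability k / (i + 1)
                reservoir[j] = item
    return reservoir


def block_level_sample(blocks, num_selected, rng):
    """Pick whole blocks at random and return all of their records."""
    chosen = rng.sample(blocks, num_selected)
    return [record for block in chosen for record in block]
```

Record-level sampling such as `reservoir_sample` must scan every record, whereas `block_level_sample` reads only the selected blocks. The sample it returns is representative only if the blocks themselves are, which is why the choice of partitioning scheme and the choice of sampling method interact.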