The study aimed to develop a customized Data Governance Maturity Model (DGMM) for the Ministry of Defence (MoD) in Kenya to address data governance challenges in military settings. Current frameworks lack specific requirements for the defence industry. The model uses Key Performance Indicators (KPIs) to enhance data governance procedures. Design Science Research guided the study, using qualitative and quantitative methods to gather data from MoD personnel. Major deficiencies were found in data integration, quality control, and adherence to data security regulations. The DGMM helps the MoD improve personnel, procedures, technology, and organizational elements related to data management. The model was tested against ISO/IEC 38500 and recommended for use in other government sectors with similar data governance issues. The DGMM has the potential to enhance data management efficiency, security, and compliance in the MoD and guide further research in military data governance.
DNA microarray technology is an extremely effective technique for studying gene expression patterns in cells, and the main challenge it currently faces is analyzing the large volume of gene expression data generated. To address this challenge, this paper employs a mixed-effects model to analyze gene expression data. For data selection, 1176 genes from a mouse gene expression dataset were chosen under two experimental conditions, pneumococcal infection and no infection, and a mixed-effects model was constructed. After preprocessing the gene chip information, the data were imported into the model, preliminary results were computed, and permutation tests were performed, with GSEA used to biologically validate the preliminary results. The final dataset consists of 20 groups of gene expression data from pneumococcal infection; it groups functionally related genes by the similarity of their expression profiles, facilitating the study of genes with unknown functions.
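The permutation-test step mentioned above can be illustrated with a minimal two-sample sketch in plain Python. This is not the paper's actual pipeline; the expression values, group sizes, and permutation count below are made up for illustration.

```python
import random

def permutation_test(infected, control, n_perm=2000, seed=0):
    """Two-sample permutation test on the difference in mean expression.

    Returns an approximate two-sided p-value: how often a random relabeling
    of the pooled samples produces a mean difference at least as extreme
    as the observed one.
    """
    rng = random.Random(seed)
    observed = sum(infected) / len(infected) - sum(control) / len(control)
    pooled = list(infected) + list(control)
    n = len(infected)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n)
        if abs(stat) >= abs(observed):
            count += 1
    return (count + 1) / (n_perm + 1)  # add-one smoothing avoids p = 0

# Clearly separated (made-up) expression values should give a small p-value.
p = permutation_test([5.1, 5.3, 5.2, 5.4], [1.0, 1.2, 0.9, 1.1])
```

In a real microarray analysis this test would be run per gene, with the resulting statistics fed into an enrichment tool such as GSEA.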
With the widespread application of Internet of Things (IoT) technology, the processing of massive real-time streaming data poses significant challenges to the computational and data-processing capabilities of systems. Although distributed streaming data processing frameworks such as Apache Flink and Apache Spark Streaming provide solutions, meeting stringent response time requirements while ensuring high throughput and resource utilization remains an urgent problem. To address this, the study proposes a formal modeling approach based on Performance Evaluation Process Algebra (PEPA), which abstracts the core components and interactions of cloud-based distributed streaming data processing systems. Additionally, a generic service flow generation algorithm is introduced, enabling the automatic extraction of service flows from the PEPA model and the computation of key performance metrics, including response time, throughput, and resource utilization. The novelty of this work lies in the integration of PEPA-based formal modeling with the service flow generation algorithm, bridging the gap between formal modeling and practical performance evaluation for IoT systems. Simulation experiments demonstrate that optimizing the execution efficiency of components can significantly improve system performance. For instance, increasing the task execution rate from 10 to 100 improves system performance by 9.53%, while further increasing it to 200 results in a 21.58% improvement. However, diminishing returns are observed when the execution rate reaches 500, with only a 0.42% gain. Similarly, increasing the number of TaskManagers from 10 to 20 improves response time by 18.49%, but the improvement slows to 6.06% when increasing from 20 to 50, highlighting the importance of co-optimizing component efficiency and resource management to achieve substantial performance gains. This study provides a systematic framework for analyzing and optimizing the performance of IoT systems for large-scale real-time streaming data processing. The proposed approach not only identifies performance bottlenecks but also offers insights into improving system efficiency under different configurations and workloads.
In order to solve the problems of short network lifetime and high data transmission delay in data gathering for wireless sensor networks (WSNs) caused by uneven energy consumption among nodes, a hybrid energy-efficient clustering routing algorithm based on the firefly and pigeon-inspired algorithms (FF-PIA) is proposed to optimize the data transmission path. After the optimal number of cluster head (CH) nodes is obtained, the result is taken as the basis for producing the initial population of the FF-PIA algorithm. The Lévy flight mechanism and adaptive inertia weighting are employed in the algorithm iterations to balance the trade-off between global search and local search. Moreover, a Gaussian perturbation strategy is applied when updating the optimal solution, ensuring the algorithm can escape local optima. In addition, for WSN data gathering, a one-dimensional signal reconstruction model is developed using dilated convolution and residual neural networks (DCRNN). We conducted experiments on the National Oceanic and Atmospheric Administration (NOAA) dataset, which show that the DCRNN model-driven data reconstruction algorithm improves both reconstruction accuracy and reconstruction time. Co-simulation of FF-PIA clustering routing with DCRNN reveals that the proposed algorithm can effectively extend the network lifetime and reduce data transmission delay.
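The Lévy flight and Gaussian perturbation mechanisms mentioned above can be sketched as follows, assuming Mantegna's algorithm for generating Lévy-distributed steps and a linearly decaying inertia weight. Both choices and all parameter values are illustrative assumptions, not the paper's exact formulation.

```python
import math
import numpy as np

def levy_step(dim, beta=1.5, rng=None):
    """One Lévy-flight step via Mantegna's algorithm (heavy-tailed jumps)."""
    rng = rng or np.random.default_rng(0)
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)

def update_position(pos, best, iteration, max_iter, rng=None):
    """Move a candidate toward the current best solution.

    Combines an adaptive inertia weight (decaying over iterations),
    a Lévy-flight step toward `best`, and a small Gaussian perturbation
    to help escape local optima.
    """
    rng = rng or np.random.default_rng(1)
    w = 0.9 - 0.5 * iteration / max_iter      # adaptive inertia weight (assumed schedule)
    step = levy_step(pos.size, rng=rng)
    candidate = w * pos + step * (best - pos)
    return candidate + rng.normal(0.0, 0.01, pos.size)  # Gaussian perturbation

new_pos = update_position(np.zeros(3), np.ones(3), iteration=1, max_iter=100)
```

In the actual FF-PIA setting, `pos` would encode a candidate CH assignment or routing path, with a fitness function ranking candidates each iteration.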
This paper proposes a multivariate data fusion based quality evaluation model for software talent cultivation. The model constructs a comprehensive ability and quality evaluation index system for college students from an engineering-education perspective, especially that of software engineering. As for the evaluation method, relying on students' behavioral data during their school years, we aim to construct the evaluation model to be as objective as possible, effectively weakening the negative impact of personal subjective assumptions on the evaluation results.
This paper addresses urban sustainability challenges amid global urbanization, emphasizing the need for innovative approaches aligned with the Sustainable Development Goals. While traditional tools and linear models offer insights, they fall short in presenting a holistic view of complex urban challenges. System dynamics (SD) models, which are often utilized to provide a holistic, systematic understanding of a research subject such as the urban system, emerge as valuable tools, but data scarcity and theoretical inadequacy pose challenges. The research reviews relevant papers on recent SD model applications in urban sustainability since 2018, categorizing them based on nine key indicators. Among the reviewed papers, data limitations and model assumptions were identified as major challenges in applying SD models to urban sustainability. This led to exploring the transformative potential of big data analytics, a rare approach in this field as identified by this study, to enhance the empirical foundation of SD models. Integrating big data could provide data-driven calibration, potentially improving predictive accuracy and reducing reliance on simplified assumptions. The paper concludes by advocating for new approaches that reduce assumptions and promote real-time applicable models, contributing to a comprehensive understanding of urban sustainability through the synergy of big data and SD models.
To ensure agreement between theoretical calculations and experimental data, parameters of selected nuclear physics models are perturbed and fine-tuned in nuclear data evaluations. This approach assumes that the chosen set of models accurately represents the 'true' distribution of the observables considered. Furthermore, the models are chosen globally, implying their applicability across the entire energy range of interest. However, this approach overlooks uncertainties inherent in the models themselves. In this work, we propose that instead of globally selecting a winning model set and proceeding with it as if it were the 'true' model set, we instead take a weighted average over multiple models within a Bayesian model averaging (BMA) framework, each weighted by its posterior probability. The method involves executing a set of TALYS calculations, randomly varying multiple nuclear physics models and their parameters to yield a vector of calculated observables. Next, computed likelihood function values at each incident energy point were combined with the prior distributions to obtain updated posterior distributions for selected cross sections and elastic angular distributions. Because the cross sections and elastic angular distributions were updated locally on a per-energy-point basis, the approach typically produces discontinuities or "kinks" in the cross-section curves; these were addressed using spline interpolation. The proposed BMA method was applied to the evaluation of proton-induced reactions on ^(58)Ni between 1 and 100 MeV. The results compared favorably with experimental data as well as with the TENDL-2023 evaluation.
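The core BMA step, turning per-model log-likelihoods and log-priors into posterior weights and a weighted-average observable, can be sketched as follows. This is a generic illustration, not the TALYS-based evaluation itself; the log-sum-exp stabilization is a standard numerical device, and the inputs are made up.

```python
import numpy as np

def bma_combine(predictions, log_likelihoods, log_priors=None):
    """Bayesian model averaging at one energy point.

    predictions:     per-model predicted observables, shape (n_models, n_obs)
    log_likelihoods: per-model log-likelihood against experiment
    log_priors:      per-model log prior (uniform if None)

    Returns (posterior_weights, weighted_average_prediction).
    """
    ll = np.asarray(log_likelihoods, float)
    lp = np.zeros_like(ll) if log_priors is None else np.asarray(log_priors, float)
    logw = ll + lp
    logw -= logw.max()            # stabilize before exponentiating (log-sum-exp trick)
    w = np.exp(logw)
    w /= w.sum()                  # normalize to posterior probabilities
    avg = np.average(np.asarray(predictions, float), axis=0, weights=w)
    return w, avg

# Two hypothetical models with equal evidence get equal weight.
weights, mean_pred = bma_combine([[1.0, 1.0], [3.0, 3.0]], [0.0, 0.0])
```

Applying this independently at each incident energy is exactly what produces the per-energy-point "kinks" the abstract mentions, which the authors smooth with spline interpolation.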
The precise correction of atmospheric zenith tropospheric delay (ZTD) is significant for Global Navigation Satellite System (GNSS) performance in terms of positioning accuracy and convergence time. In the past decades, many empirical ZTD models based on either gridded or scattered ZTD products have been proposed and widely used in GNSS positioning applications. However, there has been no comprehensive evaluation of these models for the whole China region, which features complicated topography and climate. In this study, we comprehensively assess three typical empirical models, the IGGtropSH model (gridded, non-meteorology), the SHAtropE model (scattered, non-meteorology), and the GPT3 model (gridded, meteorology), using the Crustal Movement Observation Network of China (CMONOC). In general, the results show that the three models share consistent performance, with RMSE/bias of 37.45/1.63, 37.13/2.20, and 38.27/1.34 mm for the GPT3, SHAtropE, and IGGtropSH models, respectively. However, the models show distinct performance with respect to geographical distribution, elevation, seasonal variations, and daily variation. In the southeastern region of China, RMSE values are around 50 mm, much higher than in the western region, where they are approximately 20 mm. The SHAtropE model performs better in areas with large variations in elevation. The GPT3 and IGGtropSH models are more stable across different months, while the SHAtropE model, based on GNSS data, exhibits superior performance across various UTC epochs.
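The RMSE and bias statistics reported above are straightforward to compute from model-minus-reference differences; a minimal sketch (with made-up ZTD values in mm) might look like:

```python
import math

def rmse_and_bias(model_ztd, reference_ztd):
    """RMSE and mean bias of modeled ZTD against reference values.

    Both sequences are in the same units (e.g. mm); bias is the mean of
    model-minus-reference differences, RMSE the root of their mean square.
    """
    diffs = [m - r for m, r in zip(model_ztd, reference_ztd)]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return rmse, bias

# Hypothetical values: two epochs, each off by 1 mm in opposite directions.
rmse, bias = rmse_and_bias([2402.0, 2398.0], [2401.0, 2399.0])
```

In the study, such statistics would be aggregated per station and per epoch to expose the geographical, elevation, and seasonal patterns described.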
A patient co-infected with COVID-19 and viral hepatitis B can be at more risk of severe complications than a patient with a single infection. This study develops a comprehensive stochastic model to assess the epidemiological impact of vaccine booster doses on the co-dynamics of viral hepatitis B and COVID-19. The model is fitted to real COVID-19 data from Pakistan. The proposed model incorporates logistic growth and saturated incidence functions. Rigorous analyses using the tools of stochastic calculus are performed to establish appropriate conditions for the existence of unique global solutions, a stationary distribution in the sense of ergodicity, and disease extinction. The stochastic threshold estimated from the data fitting is R_0^S = 3.0651. Numerical assessments illustrate the impact of double-dose vaccination and saturated incidence functions on the dynamics of both diseases. The effects of stochastic white noise intensities are also highlighted.
Long-term navigation based on consumer-level wearable inertial sensors plays an essential role in various emerging fields, for instance, smart healthcare, emergency rescue, and soldier positioning. The performance of existing long-term navigation algorithms is limited by the cumulative error of inertial sensors, disturbed local magnetic fields, and the complex motion modes of pedestrians. This paper develops a robust data- and physical-model dual-driven trajectory estimation (DPDD-TE) framework, which can be applied to long-term navigation tasks. A Bi-directional Long Short-Term Memory (Bi-LSTM) based quasi-static magnetic field (QSMF) detection algorithm is developed to extract useful magnetic observations for heading calibration, and another Bi-LSTM is adopted for walking speed estimation by considering hybrid human motion information over a specific time period. In addition, a data and physical model dual-driven multi-source fusion model is proposed to integrate basic INS mechanization with multi-level constraints and observations to maintain accuracy in long-term navigation tasks, enhanced by a loop detection algorithm assisted by magnetic and trajectory features. Real-world experiments indicate that the proposed DPDD-TE outperforms existing algorithms, with final estimated heading and positioning accuracy reaching 5° and less than 2 m, respectively, over a 30-minute period.
Since the launch of the Google Earth Engine (GEE) cloud platform in 2010, it has been widely used, producing a wealth of valuable information. However, the potential of GEE for forest resource management has not been fully exploited. To extract dominant woody plant species, Sentinel-1 (S1) and Sentinel-2 (S2) data were combined in GEE with National Forest Resources Inventory (NFRI) and topographic data, resulting in a 10 m resolution multimodal geospatial dataset for subtropical forests in southeast China. Spectral and texture features, red-edge bands, and vegetation indices of the S1 and S2 data were computed. A hierarchical model obtained information on forest distribution and area and on the dominant woody plant species. The results suggest that combining data from the S1 winter and S2 yearly ranges enhances accuracy in forest distribution and area extraction compared to using either data source independently. Similarly, for dominant woody species recognition, using S1 winter data together with S2 data across all four seasons was accurate. Including terrain factors and removing spatial correlation from NFRI sample points further improved recognition accuracy. The optimal forest extraction achieved an overall accuracy (OA) of 97.4% and a map-level image classification efficacy (MICE) of 96.7%. OA and MICE were 83.6% and 80.7% for dominant species extraction, respectively. The high accuracy and efficacy values indicate that the hierarchical recognition model based on multimodal remote sensing data performed extremely well in extracting information about dominant woody plant species. Visualizing the results in the GEE application allows an intuitive display of forest and species distribution, offering significant convenience for forest resource monitoring.
This paper was motivated by existing problems of cloud data storage at Imo State University, Nigeria, such as outsourced data leading to data loss and the misuse of customer information by unauthorized users or hackers, leaving customer/client data visible and unprotected. This also exposed clients/customers to enormous risk from defective equipment, bugs, faulty servers, and suspicious actions. The aim of this paper, therefore, is to analyze a secure model using Unicode Transformation Format (UTF) base64 algorithms for storing data securely in the cloud. The Object-Oriented Hypermedia Analysis and Design Methodology (OOHADM) was adopted. Python was used to develop the security model; role-based access control (RBAC) and multi-factor authentication (MFA) were integrated into the information system, which was developed with HTML5, JavaScript, Cascading Style Sheets (CSS) version 3, and PHP 7. This paper also discusses concepts such as the development of cloud computing, characteristics of cloud computing, cloud deployment models, and cloud service models. The results showed that the proposed enhanced security model for corporate-platform information systems handled multiple authorization and authentication threats: a single login page directs all login requests from the different modules to one Single Sign-On Server (SSOS), which in turn redirects authenticated users to their requested resources/modules, leveraging geo-location integration for physical location validation. The newly developed system addresses the shortcomings of the existing systems and reduces the time and resources incurred in using them.
Smart metering has gained considerable attention as a research focus due to its reliability and energy-efficient nature compared to traditional electromechanical metering systems. Existing methods primarily focus on data management rather than efficiency. Accurate prediction of electricity consumption is crucial for enabling intelligent grid operations, including resource planning and demand-supply balancing. Smart metering solutions offer users the benefits of effectively interpreting their energy utilization and optimizing costs. Motivated by this, this paper presents an Intelligent Energy Utilization Analysis using Smart Metering Data (IUA-SMD) model to determine energy consumption patterns. The proposed IUA-SMD model comprises three major processes: data pre-processing, feature extraction, and classification, with parameter optimization. We employ an extreme learning machine (ELM) based classification approach within the IUA-SMD model to derive optimal energy utilization labels. Additionally, we apply the shell game optimization (SGO) algorithm to enhance the classification efficiency of the ELM by optimizing its parameters. The effectiveness of the IUA-SMD model is evaluated on an extensive smart metering dataset, and the results are analyzed in terms of accuracy and mean square error (MSE). The proposed model demonstrates superior performance, achieving a maximum accuracy of 65.917% and a minimum MSE of 0.096. These results highlight the potential of the IUA-SMD model for enabling efficient energy utilization through intelligent analysis of smart metering data.
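A minimal extreme learning machine can be sketched as follows: a random, untrained hidden layer followed by a least-squares solve for the output weights. This is a generic illustration of the ELM idea, not the paper's SGO-optimized IUA-SMD implementation; the hidden-layer size, activation, and toy data are arbitrary choices.

```python
import numpy as np

def elm_train(X, y, n_hidden=20, seed=0):
    """Train an extreme learning machine.

    Hidden weights W and biases b are drawn at random and never updated;
    only the output weights beta are fitted, via the pseudoinverse of the
    hidden-layer activations (a linear least-squares solve).
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)            # random nonlinear feature map
    beta = np.linalg.pinv(H) @ y      # closed-form output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy nonlinear problem (XOR) that a purely linear model cannot fit.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])
W, b, beta = elm_train(X, y)
pred = elm_predict(X, W, b, beta)
```

In the paper's setting, an optimizer such as SGO would tune hyperparameters like `n_hidden` rather than rely on fixed defaults.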
The Qilian Mountains, a national key ecological function zone in Western China, play a pivotal role in ecosystem services. However, the distribution of the dominant tree species, Picea crassifolia (Qinghai spruce), has decreased dramatically in the past decades due to climate change and human activity, which may have impaired its ecological functions. Reasonable reforestation is the key measure for restoring them. Many previous efforts have predicted the potential distribution of Picea crassifolia, providing guidance for regional reforestation policy. However, all were performed at low spatial resolution, ignoring the naturally patchy distribution of Picea crassifolia. Here, we modeled the distribution of Picea crassifolia with species distribution models at high spatial resolutions. For many models, the area under the receiver operating characteristic curve (AUC) exceeds 0.9, indicating excellent precision. The AUC of models at 30 m is higher than that of models at 90 m, and the current potential distribution of Picea crassifolia is more closely aligned with its actual distribution at 30 m, demonstrating that finer data resolution improves model performance. Moreover, for models at 90 m resolution, annual precipitation (Bio12) had the greatest influence on the distribution of Picea crassifolia, whereas aspect became the most important variable at 30 m, indicating the crucial role of finer topographic data in modeling species with patchy distributions. The current distribution of Picea crassifolia is concentrated in the northern and central parts of the study area, and this pattern will be maintained under future scenarios, although some habitat loss in the central parts and gains in the eastern regions are expected owing to increasing temperatures and precipitation. Our findings can guide protection and restoration strategies for the Qilian Mountains, benefiting the regional ecological balance.
We estimate tree heights using polarimetric interferometric synthetic aperture radar (PolInSAR) data constructed from dual-polarization (dual-pol) SAR data and the random volume over ground (RVoG) model. Considering the Sentinel-1 SAR dual-pol configuration (S_VV, vertically transmitted and vertically received, and S_VH, vertically transmitted and horizontally received), one notes that S_HH, the horizontally transmitted and horizontally received scattering element, is unavailable. The S_HH data were constructed from the S_VH data, yielding polarimetric SAR (PolSAR) data. The proposed approach was first verified in simulation with satisfactory results. It was then applied to construct PolInSAR data from a pair of dual-pol Sentinel-1A acquisitions over Duke Forest, North Carolina, USA. According to local observations and forest descriptions, the range of estimated tree heights was overall reasonable. Comparing the heights with ICESat-2 tree heights at 23 sampling locations, the relative errors of 5 points were within ±30%, errors of 8 points ranged from 30% to 40%, and errors of the remaining 10 points exceeded 40%. The results are encouraging, as error reduction is possible. For instance, the construction of PolSAR data need not be limited to using S_VH; a combination of S_VH and S_VV should be explored. Also, an ensemble of tree heights derived from multiple PolInSAR datasets can be considered, since tree heights do not vary much over a time frame of months or one season.
Machine learning (ML) and data mining are used in various fields such as data analysis, prediction, image processing, and especially healthcare. Researchers in the past decade have focused on applying ML and data mining to generate conclusions from historical data in order to improve healthcare systems by making predictions about outcomes. Using ML algorithms, researchers have developed applications for decision support, analyzed clinical aspects, extracted informative knowledge from historical data, predicted outcomes, and categorized diseases, all of which help physicians make better decisions. It is observed that maternal health outcomes differ greatly among women depending on their region and social circumstances. These differences have encouraged scholars to conduct studies at a local level in order to better understand the factors that affect maternal health and the expected child. In this study, an ensemble modeling technique is applied to classify birth outcomes as either cesarean section (C-section) or normal delivery. A voting ensemble model for the classification of a birth dataset was built using a Random Forest (RF), Gradient Boosting Classifier, Extra Trees Classifier, and Bagging Classifier as base learners. The voting ensemble of the proposed classifiers provides the best accuracy, i.e., 94.78%, compared to the individual classifiers. Ensemble models make ML algorithms more accurate by reducing variance and classification error. Once a suitable classification model has been developed for birth classification, decision support systems can be created to enable clinicians to gain in-depth insights into the patterns in the datasets. Developing such a system will not only allow health organizations to improve maternal health assessment processes, but also open doors for interdisciplinary research between the two fields in the region.
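The voting step of such an ensemble reduces to a per-sample majority vote over the base learners' predicted labels. The sketch below is a generic hard-voting illustration in plain Python (in practice one would typically use scikit-learn's VotingClassifier with the four base learners named above); the labels are made up.

```python
from collections import Counter

def hard_vote(base_predictions):
    """Majority vote across base learners.

    base_predictions[i][j] is learner i's predicted label for sample j;
    returns the per-sample majority label (ties broken by first-seen order).
    """
    n_samples = len(base_predictions[0])
    voted = []
    for j in range(n_samples):
        votes = Counter(learner[j] for learner in base_predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted

# Three hypothetical learners classifying two births:
# "C" = C-section, "N" = normal delivery.
labels = hard_vote([["C", "N"],
                    ["C", "C"],
                    ["N", "C"]])
```

Hard voting reduces variance because uncorrelated errors of individual learners tend to be outvoted by the majority.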
The purpose of this study is to investigate, through data analysis, the sleep habits, cervical health status, and pillow-product demands and preferences of different populations. A total of 780 valid responses were gathered via an online questionnaire exploring the sleep habits, cervical health conditions, and pillow preferences of modern individuals. The study found that going to bed late and staying up late are common, and that the use of electronic devices and caffeine consumption have a negative impact on sleep. Most respondents have cervical discomfort and varying levels of satisfaction with their pillows, reflecting a demand for personalized pillows. A machine learning model for predicting demand for latex pillows was constructed and optimized to provide personalized pillow recommendations, aiming to improve sleep quality and provide market data for sleep product developers.
Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data, i.e., data that sums to a constant such as 100%. The statistical linear model is the most widely used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data are present. The linear regression model is commonly used in various applications to find relationships between variables of interest. When estimating linear regression parameters, which are useful for tasks such as future prediction and partial-effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can make data recovery costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Using the current estimate as input, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize the expected log-likelihood determined in the E step. This study examined how well the EM algorithm performed on a simulated compositional dataset with missing observations, using both robust least-squares and ordinary least-squares regression techniques. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
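An EM-style treatment of missing responses in a simple linear regression can be sketched as follows: the E-step fills the missing values with the current fitted values, and the M-step re-estimates the coefficients by ordinary least squares on the completed data. This illustrates the general idea only; the paper's compositional-data setting and robust variant are not reproduced, and the loop uses a fixed iteration count rather than a convergence test for simplicity.

```python
import numpy as np

def em_impute_regression(x, y, n_iter=50):
    """EM-style imputation for y = a + b*x with missing responses (NaN).

    E-step: replace missing responses with the current fitted values.
    M-step: re-estimate (intercept, slope) by OLS on the completed data.
    Returns the final coefficients and the completed response vector.
    """
    y = np.array(y, float)
    miss = np.isnan(y)
    y[miss] = np.nanmean(y)                          # crude initial fill
    X = np.column_stack([np.ones(len(x)), x])        # design matrix [1, x]
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None) # M-step: OLS fit
        y[miss] = X[miss] @ beta                     # E-step: refresh imputations
    return beta, y

# Made-up data on the exact line y = 2 + 3x, with two responses missing.
x = np.arange(6.0)
y = [2 + 3 * i for i in range(6)]
y[2], y[4] = float("nan"), float("nan")
beta, completed = em_impute_regression(x, y)
```

For mean imputation or k-NN imputation (the comparison methods in the study), the loop would be replaced by a single fill step with no model refitting.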
With the advent of the Big Data era, the amount of data on supplier information is increasing geometrically. Buyers want to use this data to find high-quality suppliers before purchasing, so as to reduce transaction risks and guarantee transaction quality. Supplier portraits built on big data can not only help buyers select high-quality suppliers but also monitor abnormal supplier behavior in real time. In this paper, supplier data under big data are normalized, correlation analysis is performed, ratings are assigned, and suppliers are classified through fuzzy calculation, providing buyers with reference information and early-warning tips. In addition, based on data on active suppliers from the Jiangxi Open Data Innovation Application Competition, this paper performs data mining on two-dimensional labels and statistical types, thus forming a supplier portrait model. This paper aims to study supplier data analysis in the big data environment, hoping to provide guidance for the procurement work of related governments, enterprises, and individuals.
The structural modeling of open-high-low-close (OHLC) data contained within the candlestick chart is crucial to financial practice. However, the inherent constraints in OHLC data pose immense challenges to its structural modeling. Models that fail to handle these constraints may yield results deviating from the original OHLC data structure. To address this issue, a novel unconstrained transformation method, along with its explicit inverse transformation, is proposed to properly handle the inherent constraints of OHLC data. A flexible and effective framework for structurally modeling OHLC data is designed, and the detailed procedure for modeling OHLC data through the vector autoregression and vector error correction models is provided as an example of multivariate time-series analysis. Extensive simulations and three authentic financial datasets from the Kweichow Moutai, CSI 100 index, and 50 ETF of the Chinese stock market demonstrate the effectiveness and stability of the proposed modeling approach. The modeling results of support vector regression provide further evidence that the proposed unconstrained transformation not only ensures structural forecasting of OHLC data but is also an effective feature-extraction method that can effectively improve the forecasting accuracy of machine-learning models for close prices.
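One natural way to remove the OHLC constraints (prices positive, low ≤ open ≤ high, low ≤ close ≤ high) is a log/logit construction with an explicit inverse, sketched below. This is an illustrative transform consistent with the constraints described, not necessarily the paper's exact method, and it assumes strict inequalities (open and close strictly between low and high); the prices are made up.

```python
import math

def ohlc_to_unconstrained(o, h, l, c):
    """Map (open, high, low, close) to four unconstrained reals.

    Uses log(low), log(high - low), and logits of the relative positions
    of open and close inside the (low, high) range. Requires l > 0 and
    l < o, c < h strictly (the logit is undefined at the endpoints).
    """
    span = h - l
    logit = lambda p: math.log(p / (1 - p))
    return (math.log(l), math.log(span),
            logit((o - l) / span), logit((c - l) / span))

def unconstrained_to_ohlc(y1, y2, y3, y4):
    """Explicit inverse: any four reals map back to valid OHLC values."""
    sigmoid = lambda t: 1 / (1 + math.exp(-t))
    l = math.exp(y1)          # positivity restored by exp
    span = math.exp(y2)       # high - low > 0 by construction
    h = l + span
    return (l + sigmoid(y3) * span, h, l, l + sigmoid(y4) * span)

# Round-trip a hypothetical candlestick: open 100, high 110, low 95, close 105.
z = ohlc_to_unconstrained(100.0, 110.0, 95.0, 105.0)
o, h, l, c = unconstrained_to_ohlc(*z)
```

A time-series model (e.g. VAR) fitted in the transformed space can forecast freely, and the inverse map guarantees every forecast is a structurally valid candlestick.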
Abstract: The study aimed to develop a customized Data Governance Maturity Model (DGMM) for the Ministry of Defence (MoD) in Kenya to address data governance challenges in military settings. Current frameworks lack specific requirements for the defence industry. The model uses Key Performance Indicators (KPIs) to enhance data governance procedures. Design Science Research guided the study, using qualitative and quantitative methods to gather data from MoD personnel. Major deficiencies were found in data integration, quality control, and adherence to data security regulations. The DGMM helps the MoD improve personnel, procedures, technology, and organizational elements related to data management. The model was tested against ISO/IEC 38500 and recommended for use in other government sectors with similar data governance issues. The DGMM has the potential to enhance data management efficiency, security, and compliance in the MoD and to guide further research in military data governance.
Abstract: DNA microarray technology is an extremely effective technique for studying gene expression patterns in cells, and the main challenge currently faced by this technology is how to analyze the large amount of gene expression data generated. To address this, this paper employs a mixed-effects model to analyze gene expression data. For data selection, 1176 genes from a laboratory mouse gene expression dataset were chosen under two experimental conditions, pneumococcal infection and no infection, and a mixed-effects model was constructed. After preprocessing the gene chip information, the data were imported into the model, preliminary results were calculated, and permutation tests were performed; the preliminary results were then biologically validated using GSEA. The final dataset consists of 20 groups of gene expression data from pneumococcal infection, in which functionally related genes are categorized based on the similarity of their expression profiles, facilitating the study of genes with unknown functions.
Funding: funded by the Joint Project of Industry-University-Research of Jiangsu Province (Grant: BY20231146).
Abstract: With the widespread application of Internet of Things (IoT) technology, the processing of massive real-time streaming data poses significant challenges to the computational and data-processing capabilities of systems. Although distributed streaming data processing frameworks such as Apache Flink and Apache Spark Streaming provide solutions, meeting stringent response time requirements while ensuring high throughput and resource utilization remains an urgent problem. To address this, the study proposes a formal modeling approach based on Performance Evaluation Process Algebra (PEPA), which abstracts the core components and interactions of cloud-based distributed streaming data processing systems. Additionally, a generic service flow generation algorithm is introduced, enabling the automatic extraction of service flows from the PEPA model and the computation of key performance metrics, including response time, throughput, and resource utilization. The novelty of this work lies in the integration of PEPA-based formal modeling with the service flow generation algorithm, bridging the gap between formal modeling and practical performance evaluation for IoT systems. Simulation experiments demonstrate that optimizing the execution efficiency of components can significantly improve system performance. For instance, increasing the task execution rate from 10 to 100 improves system performance by 9.53%, while further increasing it to 200 yields a 21.58% improvement. However, diminishing returns are observed when the execution rate reaches 500, with only a 0.42% gain. Similarly, increasing the number of TaskManagers from 10 to 20 improves response time by 18.49%, but the improvement slows to 6.06% when increasing from 20 to 50, highlighting the importance of co-optimizing component efficiency and resource management to achieve substantial performance gains. This study provides a systematic framework for analyzing and optimizing the performance of IoT systems for large-scale real-time streaming data processing. The proposed approach not only identifies performance bottlenecks but also offers insights into improving system efficiency under different configurations and workloads.
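The diminishing returns the abstract reports have a classical queueing-theory intuition. As an illustrative sketch only (a plain M/M/1 queue, not the paper's PEPA model; the arrival rate is a made-up number), raising the service rate far above the arrival rate shrinks response time less and less:

```python
# Illustrative M/M/1 sketch of diminishing returns from raising the task
# execution (service) rate. NOT the paper's PEPA model; rates are invented.
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time T = 1 / (mu - lambda) for a stable M/M/1 queue."""
    if service_rate <= arrival_rate:
        raise ValueError("unstable queue: service rate must exceed arrival rate")
    return 1.0 / (service_rate - arrival_rate)

arrivals = 8.0  # hypothetical task arrival rate
for mu in (10, 100, 200, 500):
    # Each step up in mu buys a smaller absolute reduction in T.
    print(mu, round(mm1_response_time(arrivals, mu), 5))
```

The absolute gain from 100 to 200 is already small, and from 200 to 500 smaller still, mirroring the saturation effect the simulations observed.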
Funding: partially supported by the National Natural Science Foundation of China (62161016), the Key Research and Development Project of Lanzhou Jiaotong University (ZDYF2304), the Beijing Engineering Research Center of High-velocity Railway Broadband Mobile Communications (BHRC-2022-1), and Beijing Jiaotong University.
Abstract: To solve the problems of short network lifetime and high data transmission delay in data gathering for wireless sensor networks (WSNs) caused by uneven energy consumption among nodes, a hybrid energy-efficient clustering routing based on the firefly and pigeon-inspired algorithm (FF-PIA) is proposed to optimise the data transmission path. After the optimal number of cluster head nodes (CHs) has been obtained, the result is taken as the basis for producing the initial population of the FF-PIA algorithm. The Lévy flight mechanism and adaptive inertia weighting are employed in the algorithm iteration to balance the contradiction between global search and local search. Moreover, a Gaussian perturbation strategy is applied to update the optimal solution, ensuring the algorithm can jump out of local optima. In addition, for WSN data gathering, a one-dimensional signal reconstruction algorithm model is developed using dilated convolution and residual neural networks (DCRNN). We conducted experiments on the National Oceanic and Atmospheric Administration (NOAA) dataset, which show that the DCRNN model-driven data reconstruction algorithm improves both reconstruction accuracy and reconstruction time. Co-simulation of FF-PIA and DCRNN clustering routing reveals that the proposed algorithm can effectively extend the network lifetime and reduce data transmission delay.
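Two of the mechanisms named above, the Lévy flight step and the Gaussian perturbation of the best solution, can be sketched generically. This is a hedged illustration of the standard Mantegna-style Lévy step and a simple Gaussian jitter, not the authors' exact FF-PIA update rules:

```python
import math
import random

def levy_step(beta=1.5):
    """One Mantegna-style Lévy-flight step, a common choice in firefly and
    pigeon-inspired variants (the paper's exact formulation may differ)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = random.gauss(0, sigma)
    v = random.gauss(0, 1)
    return u / abs(v) ** (1 / beta)  # heavy-tailed step length

def gaussian_perturb(best_position, scale=0.1):
    """Jitter the current best solution so the search can escape local optima."""
    return [x + random.gauss(0, scale) for x in best_position]
```

The heavy tail of the Lévy step occasionally produces long jumps (global search), while the small Gaussian jitter refines the incumbent best (local search), which is the balance the abstract describes.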
Funding: supported in part by the Education Reform Key Projects of Heilongjiang Province (Grant Nos. SJGZ20220011, SJGZ20220012) and the Excellent Project of the Ministry of Education and China Higher Education Association on Digital Ideological and Political Education in Universities (Grant No. GXSZSZJPXM001).
Abstract: This paper proposes a multivariate data fusion based quality evaluation model for software talent cultivation. The model constructs a comprehensive ability and quality evaluation index system for college students from an engineering education perspective, especially software engineering. As for the evaluation method, relying on students' behavioral data during their school years, we aim to construct the evaluation model as objectively as possible, effectively weakening the negative impact of personal subjective assumptions on the evaluation results.
Funding: sponsored by the U.S. Department of Housing and Urban Development (Grant No. NJLTS0027-22). The opinions expressed in this study are the authors' alone and do not represent the opinions of the U.S. Department of HUD.
Abstract: This paper addresses urban sustainability challenges amid global urbanization, emphasizing the need for innovative approaches aligned with the Sustainable Development Goals. While traditional tools and linear models offer insights, they fall short of presenting a holistic view of complex urban challenges. System dynamics (SD) models, often utilized to provide a holistic, systematic understanding of a research subject such as the urban system, emerge as valuable tools, but data scarcity and theoretical inadequacy pose challenges. The research reviews relevant papers on recent SD model applications in urban sustainability since 2018, categorizing them based on nine key indicators. Among the reviewed papers, data limitations and model assumptions were identified as major challenges in applying SD models to urban sustainability. This led to exploring the transformative potential of big data analytics, a rare approach in this field as identified by this study, to enhance SD models' empirical foundation. Integrating big data could provide data-driven calibration, potentially improving predictive accuracy and reducing reliance on simplified assumptions. The paper concludes by advocating for new approaches that reduce assumptions and promote real-time applicable models, contributing to a comprehensive understanding of urban sustainability through the synergy of big data and SD models.
Funding: funding from the Paul Scherrer Institute, Switzerland, through the NES/GFA-ABE Cross Project.
Abstract: To ensure agreement between theoretical calculations and experimental data, parameters of selected nuclear physics models are perturbed and fine-tuned in nuclear data evaluations. This approach assumes that the chosen set of models accurately represents the 'true' distribution of the considered observables. Furthermore, the models are chosen globally, indicating their applicability across the entire energy range of interest. However, this approach overlooks uncertainties inherent in the models themselves. In this work, we propose that instead of globally selecting a winning model set and proceeding with it as if it were the 'true' model set, we instead take a weighted average over multiple models within a Bayesian model averaging (BMA) framework, each weighted by its posterior probability. The method involves executing a set of TALYS calculations by randomly varying multiple nuclear physics models and their parameters to yield a vector of calculated observables. Next, computed likelihood function values at each incident energy point were combined with the prior distributions to obtain updated posterior distributions for selected cross sections and elastic angular distributions. As the cross sections and elastic angular distributions were updated locally on a per-energy-point basis, the approach typically results in discontinuities or "kinks" in the cross-section curves, and these were addressed using spline interpolation. The proposed BMA method was applied to the evaluation of proton-induced reactions on ^(58)Ni between 1 and 100 MeV. The results compared favorably with experimental data as well as with the TENDL-2023 evaluation.
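The core BMA step at a single energy point reduces to normalizing prior-times-likelihood weights and averaging the model predictions. A minimal sketch with made-up numbers (not TALYS output) is:

```python
# Hedged sketch of the BMA averaging step at one incident-energy point:
# posterior weight of each model is prior x likelihood, normalized; the
# evaluated observable is the weighted average of model predictions.
# All numbers below are invented for illustration.
def bma_average(predictions, priors, likelihoods):
    unnorm = [p * l for p, l in zip(priors, likelihoods)]
    z = sum(unnorm)
    weights = [w / z for w in unnorm]          # posterior model probabilities
    mean = sum(w * x for w, x in zip(weights, predictions))
    return mean, weights

xs, w = bma_average(predictions=[100.0, 120.0, 90.0],   # cross sections (mb)
                    priors=[1/3, 1/3, 1/3],             # uniform model prior
                    likelihoods=[0.2, 0.5, 0.3])        # fit to experiment
```

Because averaging is done per energy point, adjacent points can get different weights, which is exactly why the abstract reports "kinks" that then need spline smoothing.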
Funding: supported by the National Natural Science Foundation of China (42204022, 52174160, 52274169), the Open Fund of Hubei Luojia Laboratory (230100031), the Open Fund of the State Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University (23P02), the Fundamental Research Funds for the Central Universities (2023ZKPYDC10), and the China University of Mining and Technology-Beijing Innovation Training Program for College Students (202302014, 202202023).
Abstract: Precise correction of the atmospheric zenith tropospheric delay (ZTD) is significant for Global Navigation Satellite System (GNSS) positioning accuracy and convergence time. In the past decades, many empirical ZTD models based on either gridded or scattered ZTD products have been proposed and widely used in GNSS positioning applications. However, there has been no comprehensive evaluation of these models for the whole China region, which features complicated topography and climate. In this study, we comprehensively assess three typical empirical models, the IGGtropSH model (gridded, non-meteorology), the SHAtropE model (scattered, non-meteorology), and the GPT3 model (gridded, meteorology), using the Crustal Movement Observation Network of China (CMONOC). In general, the results show that the three models share consistent performance, with RMSE/bias of 37.45/1.63, 37.13/2.20, and 38.27/1.34 mm for the GPT3, SHAtropE, and IGGtropSH models, respectively. However, the models differ in performance with respect to geographical distribution, elevation, seasonal variations, and daily variation. In the southeastern region of China, RMSE values are around 50 mm, much higher than the approximately 20 mm in the western region. The SHAtropE model performs better in areas with large variations in elevation. The GPT3 and IGGtropSH models are more stable across different months, while the SHAtropE model, based on GNSS data, exhibits superior performance across various UTC epochs.
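The RMSE/bias pairs quoted above follow from the standard definitions over model-minus-reference ZTD differences. A small self-contained helper (toy inputs, not CMONOC data) makes the two metrics explicit:

```python
import math

# Standard definitions used for ZTD model assessment:
# bias = mean(model - reference); RMSE = sqrt(mean((model - reference)^2)).
# The example values are toy numbers, not CMONOC measurements.
def bias_and_rmse(model_ztd, reference_ztd):
    diffs = [m - r for m, r in zip(model_ztd, reference_ztd)]
    n = len(diffs)
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    return bias, rmse

b, r = bias_and_rmse([2.0, 4.0], [1.0, 1.0])  # differences: 1 mm and 3 mm
```

Bias captures the systematic offset of a model, while RMSE also absorbs its scatter, which is why the three models can have near-identical RMSE yet different biases.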
Abstract: A patient co-infected with COVID-19 and viral hepatitis B can be at greater risk of severe complications than one with a single infection. This study develops a comprehensive stochastic model to assess the epidemiological impact of vaccine booster doses on the co-dynamics of viral hepatitis B and COVID-19. The model is fitted to real COVID-19 data from Pakistan. The proposed model incorporates logistic growth and saturated incidence functions. Rigorous analyses using the tools of stochastic calculus are performed to establish appropriate conditions for the existence of unique global solutions, a stationary distribution in the sense of ergodicity, and disease extinction. The stochastic threshold estimated from the data fitting is R_(0)^(S) = 3.0651. Numerical assessments illustrate the impact of double-dose vaccination and saturated incidence functions on the dynamics of both diseases. The effects of stochastic white-noise intensities are also highlighted.
Abstract: Long-term navigation based on consumer-level wearable inertial sensors plays an essential role in various emerging fields, for instance, smart healthcare, emergency rescue, and soldier positioning. The performance of existing long-term navigation algorithms is limited by the cumulative error of inertial sensors, disturbed local magnetic fields, and the complex motion modes of pedestrians. This paper develops a robust data and physical model dual-driven trajectory estimation (DPDD-TE) framework, which can be applied to long-term navigation tasks. A Bi-directional Long Short-Term Memory (Bi-LSTM) based quasi-static magnetic field (QSMF) detection algorithm is developed to extract useful magnetic observations for heading calibration, and another Bi-LSTM is adopted for walking-speed estimation by considering hybrid human motion information over a specific time period. In addition, a data and physical model dual-driven multi-source fusion model is proposed to integrate basic INS mechanization with multi-level constraints and observations to maintain accuracy under long-term navigation tasks, enhanced by a magnetic and trajectory feature assisted loop detection algorithm. Real-world experiments indicate that the proposed DPDD-TE outperforms existing algorithms, with final estimated heading and positioning accuracy reaching 5° and less than 2 m, respectively, over a 30-minute period.
Funding: supported by the National Technology Extension Fund of Forestry, Forest Vegetation Carbon Storage Monitoring Technology Based on Watershed Algorithm ([2019]06), and the Fundamental Research Funds for the Central Universities (No. PTYX202107).
Abstract: Since the launch of the Google Earth Engine (GEE) cloud platform in 2010, it has been widely used, producing a wealth of valuable information. However, the potential of GEE for forest resource management has not been fully exploited. To extract dominant woody plant species, GEE combined Sentinel-1 (S1) and Sentinel-2 (S2) data with National Forest Resources Inventory (NFRI) and topographic data, resulting in a 10 m resolution multimodal geospatial dataset for subtropical forests in southeast China. Spectral and texture features, red-edge bands, and vegetation indices of the S1 and S2 data were computed. A hierarchical model obtained information on forest distribution and area and on the dominant woody plant species. The results suggest that combining data from the S1 winter and S2 yearly ranges enhances accuracy in forest distribution and area extraction compared to using either data source independently. Similarly, for dominant woody species recognition, using S1 winter data and S2 data across all four seasons was accurate. Including terrain factors and removing spatial correlation from NFRI sample points further improved recognition accuracy. The optimal forest extraction achieved an overall accuracy (OA) of 97.4% and a map-level image classification efficacy (MICE) of 96.7%; OA and MICE were 83.6% and 80.7% for dominant species extraction, respectively. These high accuracy and efficacy values indicate that the hierarchical recognition model based on multimodal remote sensing data performed extremely well at extracting dominant woody plant species. Visualizing the results in a GEE application allows an intuitive display of forest and species distribution, offering significant convenience for forest resource monitoring.
Abstract: This paper was motivated by existing problems of cloud data storage at Imo State University, Nigeria, such as outsourced data being lost or customer information misused by unauthorized users or hackers, leaving customer/client data visible and unprotected, and exposing clients to enormous risk from defective equipment, bugs, faulty servers, and malicious actions. The aim of this paper, therefore, is to analyze a secure model using Unicode Transformation Format (UTF) Base64 algorithms for storing data securely in the cloud. Object-Oriented Hypermedia Analysis and Design Methodology (OOHADM) was adopted. Python was used to develop the security model; role-based access control (RBAC) and multi-factor authentication (MFA) were integrated to enhance security in the information system developed with HTML 5, JavaScript, Cascading Style Sheets (CSS) version 3, and PHP 7. This paper also discusses related concepts: the development of cloud computing, its characteristics, cloud deployment models, cloud service models, etc. The results showed that the proposed enhanced security model for corporate-platform information systems handled multiple authorization and authentication threats: a single login page directs all login requests from the different modules to one Single Sign-On Server (SSOS), which in turn redirects authenticated users to their requested resources/modules, leveraging geo-location integration for physical location validation. The newly developed system addresses the shortcomings of the existing systems and reduces the time and resources incurred in using them.
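The UTF/Base64 step the abstract names can be sketched with Python's standard library. Note the hedge in the comments: Base64 is an encoding, not encryption, so in a real deployment it would sit alongside the access controls (RBAC, MFA) the paper describes rather than replace them; function names here are illustrative, not the paper's API:

```python
import base64

# Hedged sketch of the UTF-8 + Base64 storage step (illustrative names,
# not the paper's actual API). Base64 is an encoding, NOT encryption:
# it makes data transport-safe but offers no confidentiality by itself.
def encode_record(plaintext: str) -> str:
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")

def decode_record(encoded: str) -> str:
    return base64.b64decode(encoded.encode("ascii")).decode("utf-8")
```

Round-tripping any Unicode string through `encode_record`/`decode_record` returns it unchanged, which is the property a storage layer needs from such a transform.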
Abstract: Smart metering has gained considerable attention as a research focus due to its reliability and energy-efficient nature compared to traditional electromechanical metering systems. Existing methods primarily focus on data management rather than efficiency. Accurate prediction of electricity consumption is crucial for enabling intelligent grid operations, including resource planning and demand-supply balancing. Smart metering solutions offer users the benefit of effectively interpreting their energy utilization and optimizing costs. Motivated by this, this paper presents an Intelligent Energy Utilization Analysis using Smart Metering Data (IUA-SMD) model to determine energy consumption patterns. The proposed IUA-SMD model comprises three major processes: data pre-processing, feature extraction, and classification, with parameter optimization. We employ an extreme learning machine (ELM) based classification approach within the IUA-SMD model to derive optimal energy utilization labels. Additionally, we apply the shell game optimization (SGO) algorithm to enhance the classification efficiency of the ELM by optimizing its parameters. The effectiveness of the IUA-SMD model is evaluated on an extensive smart metering dataset, and the results are analyzed in terms of accuracy and mean square error (MSE). The proposed model demonstrates superior performance, achieving a maximum accuracy of 65.917% and a minimum MSE of 0.096. These results highlight the potential of the IUA-SMD model for enabling efficient energy utilization through intelligent analysis of smart metering data.
Funding: supported by the National Natural Science Foundation of China (No. 42071057).
Abstract: The Qilian Mountains, a national key ecological function zone in Western China, play a pivotal role in ecosystem services. However, the distribution of its dominant tree species, Picea crassifolia (Qinghai spruce), has decreased dramatically in the past decades due to climate change and human activity, which may have influenced its ecological functions. To restore these ecological functions, reasonable reforestation is the key measure. Many previous efforts have predicted the potential distribution of Picea crassifolia, providing guidance on regional reforestation policy. However, all of them were performed at low spatial resolution, thus ignoring the naturally patchy distribution of Picea crassifolia. Here, we modeled the distribution of Picea crassifolia with species distribution models at high spatial resolutions. For many models, the area under the receiver operating characteristic curve (AUC) is larger than 0.9, indicating excellent precision. The AUC of models at 30 m is higher than that of models at 90 m, and the current potential distribution of Picea crassifolia is more closely aligned with its actual distribution at 30 m, demonstrating that finer data resolution improves model performance. Besides, for models at 90 m resolution, annual precipitation (Bio12) had the paramount influence on the distribution of Picea crassifolia, while aspect became the most important variable at 30 m, indicating the crucial role of finer topographic data in modeling species with patchy distributions. The current distribution of Picea crassifolia is concentrated in the northern and central parts of the study area, and this pattern will be maintained under future scenarios, although some habitat loss in the central parts and gain in the eastern regions is expected owing to increasing temperatures and precipitation.
Our findings can guide protective and restoration strategies for the Qilian Mountains, which would benefit regional ecological balance.
Abstract: We estimate tree heights using polarimetric interferometric synthetic aperture radar (PolInSAR) data constructed from dual-polarization (dual-pol) SAR data and the random volume over the ground (RVoG) model. Considering the Sentinel-1 dual-pol SAR configuration (SVV, vertically transmitted and vertically received, and SVH, vertically transmitted and horizontally received), one notes that S_(HH), the horizontally transmitted and horizontally received scattering element, is unavailable. The S_(HH) data were constructed using the SVH data, and polarimetric SAR (PolSAR) data were obtained. The proposed approach was first verified in simulation with satisfactory results. It was then applied to construct PolInSAR data from a pair of dual-pol Sentinel-1A acquisitions over Duke Forest, North Carolina, USA. According to local observations and forest descriptions, the range of estimated tree heights was overall reasonable. Comparing the heights with ICESat-2 tree heights at 23 sampling locations, the relative errors of 5 points were within ±30%, errors of 8 points ranged from 30% to 40%, and errors of the remaining 10 points were >40%. The results are encouraging, as error reduction is possible. For instance, the construction of PolSAR data need not be limited to using SVH; a combination of SVH and SVV should be explored. Also, an ensemble of tree heights derived from multiple PolInSAR datasets can be considered, since tree heights do not vary much over a time frame of months or one season.
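The 5/8/10 breakdown above is simply a binning of per-point relative errors against the reference heights. A minimal sketch of that evaluation step (toy heights, not the Duke Forest data) looks like:

```python
# Hedged sketch of the error-binning step: relative error of each estimated
# tree height against a reference (e.g., ICESat-2) height, binned at the
# abstract's 30% and 40% thresholds. Heights below are invented.
def relative_error_bins(estimates, references):
    bins = {"<=30%": 0, "30-40%": 0, ">40%": 0}
    for est, ref in zip(estimates, references):
        rel = abs(est - ref) / ref
        if rel <= 0.30:
            bins["<=30%"] += 1
        elif rel <= 0.40:
            bins["30-40%"] += 1
        else:
            bins[">40%"] += 1
    return bins

counts = relative_error_bins([10.0, 14.0, 20.0], [10.0, 10.0, 10.0])
```

Applied to the paper's 23 sampling locations, this kind of tally yields the reported 5 / 8 / 10 split.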
Funding: the Natural Sciences and Engineering Research Council of Canada (NSERC) and the New Brunswick Innovation Foundation (NBIF) provided financial support for the global project. These granting agencies did not contribute to the design of the study or the collection, analysis, and interpretation of data.
Abstract: Machine learning (ML) and data mining are used in various fields such as data analysis, prediction, image processing, and especially healthcare. Over the past decade, researchers have applied ML and data mining to draw conclusions from historical data in order to improve healthcare systems by predicting outcomes. Using ML algorithms, researchers have developed decision-support applications, analyzed clinical aspects, extracted informative information from historical data, predicted outcomes, and categorized diseases, helping physicians make better decisions. Large differences are observed among women depending on region and social circumstances. These differences have encouraged scholars to conduct studies at the local level in order to better understand the factors that affect maternal health and the expected child. In this study, an ensemble modeling technique is applied to classify birth outcomes as either cesarean section (C-Section) or normal delivery. A voting ensemble model for the classification of a birth dataset was built using a Random Forest (RF), Gradient Boosting Classifier, Extra Trees Classifier, and Bagging Classifier as base learners. The voting ensemble model of the proposed classifiers provides the best accuracy, 94.78%, compared to the individual classifiers. Ensemble models make ML algorithms more accurate by reducing variance and classification errors. Once a suitable classification model has been developed for birth classification, decision-support systems can be created to enable clinicians to gain in-depth insights into the patterns in the datasets. Developing such a system will not only allow health organizations to improve maternal health assessment processes, but also open doors for interdisciplinary research across two different fields in the region.
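The aggregation step of a hard-voting ensemble like the one above is just a per-sample majority vote over the base learners' predictions. A minimal sketch of that step alone (toy predictions, not the paper's trained RF/boosting/bagging models) is:

```python
from collections import Counter

# Hedged sketch of hard voting: each row holds one base learner's
# predictions for all samples; the ensemble label is the per-sample mode.
# Example predictions are invented, not the paper's classifier outputs.
def majority_vote(per_model_predictions):
    n_samples = len(per_model_predictions[0])
    return [Counter(model[i] for model in per_model_predictions).most_common(1)[0][0]
            for i in range(n_samples)]

# Three base learners voting on two deliveries
votes = majority_vote([["C-Section", "Normal"],
                       ["C-Section", "C-Section"],
                       ["Normal",    "Normal"]])
```

Because independent errors rarely coincide, the mode of several decent classifiers tends to beat each one alone, which is the variance-reduction effect the abstract credits for the 94.78% accuracy.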
Abstract: The purpose of this study is to investigate the sleep habits, cervical health status, and pillow-product demands and preferences of different populations through data analysis. A total of 780 valid responses were gathered via an online questionnaire exploring the sleep habits, cervical health conditions, and pillow preferences of modern individuals. The study found that sleeping late and staying up late are common, and that the use of electronic devices and caffeine consumption have a negative impact on sleep. Most respondents have cervical discomfort and varying levels of satisfaction with their pillows, indicating demand for personalized pillows. A machine learning model for predicting demand for latex pillows was constructed and optimized to provide personalized pillow recommendations, aiming to improve sleep quality and provide market data for sleep-product developers.
Abstract: Compositional data, such as relative information, is a crucial aspect of machine learning and related fields. It is typically recorded as closed data, summing to a constant such as 100%. The linear regression model is the most widely used statistical technique for identifying hidden relationships between underlying random variables of interest, with applications such as future prediction and partial-effects analysis of independent variables. When estimating linear regression parameters, maximum likelihood estimation (MLE) is the method of choice. However, data quality is a significant challenge in machine learning, and many datasets contain missing observations, which can make data recovery costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds maximum likelihood (or maximum a posteriori, MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Using the current estimate as input, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize this expected log-likelihood. This study examined how well the EM algorithm performed on a synthetic compositional dataset with missing observations, using both robust least squares and ordinary least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
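The E/M alternation described above can be illustrated on the simplest possible case: simple linear regression with some missing responses. This is a hedged toy sketch of the mechanics (impute from the current fit, refit, repeat), not the paper's compositional-data setting; note that with missing responses only, EM converges essentially immediately, since such observations carry no extra information about the line:

```python
# Toy EM-style sketch for simple linear regression y = a + b*x with
# missing y values (None). E-step: impute missing y from the current fit.
# M-step: refit OLS on the completed data. Data below are invented.
def ols(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b  # intercept a, slope b

def em_fit(x, y, iters=20):
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b = ols([p[0] for p in obs], [p[1] for p in obs])  # init from observed
    for _ in range(iters):
        y_full = [yi if yi is not None else a + b * xi
                  for xi, yi in zip(x, y)]                 # E-step: impute
        a, b = ols(x, y_full)                              # M-step: refit
    return a, b

a, b = em_fit([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, None, 8.0])
```

On this perfectly linear toy data the fit recovers a = 0, b = 2; in the missing-covariate or compositional settings the study actually treats, the E-step involves genuine conditional expectations and the iterations do real work.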
Funding: supported by the National Natural Science Foundation of China (52066006), Science and Technology Department Major R&D Projects of Jiangxi Province (20192BBHL80009, 20171BAB206031), Education Department of Jiangxi Province Projects (GJJ14637, GJJ150909), and the Project of Jingdezhen Science and Technology Bureau (2019GYZD008-13).
Abstract: With the advent of the Big Data era, the amount of supplier information data is increasing geometrically. Buyers want to use this data to find high-quality suppliers before purchasing, so as to reduce transaction risks and guarantee transaction quality. Supplier portraits under big data can not only help buyers select high-quality suppliers, but also monitor abnormal supplier behavior in real time. In this paper, supplier data under big data are normalized, correlation analysis is performed, ratings are assigned, and classification is carried out through fuzzy calculation to give buyers some reference and early-warning tips. In addition, this paper draws on the data of active suppliers in the Jiangxi Open Data Innovation Application Competition and realizes data mining of two-dimensional labels and statistical types, thus forming the supplier portrait model. This paper aims to study supplier data analysis in the big data environment, hoping to provide some suggestions and guidance for the procurement work of related governments, enterprises, and individuals.
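The normalization step that precedes the correlation analysis and fuzzy rating can be sketched generically. This is a hedged illustration of ordinary min-max scaling (the indicator name is hypothetical; the paper does not specify its exact normalization formula):

```python
# Hedged sketch of the normalization step before correlation analysis and
# fuzzy rating: min-max scale each supplier indicator into [0, 1].
# The indicator (on-time delivery rate) and its values are invented.
def min_max_normalize(values):
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant column carries no signal
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

on_time_rates = [0.82, 0.95, 0.60, 0.99]
scaled = min_max_normalize(on_time_rates)
```

Scaling every indicator to a common [0, 1] range keeps indicators with large raw magnitudes from dominating the subsequent ratings.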
Funding: financial support from the Beijing Natural Science Foundation (Grant No. 9244030) and the National Natural Science Foundation of China (Grant Nos. 72021001, 11701023).
Abstract: The structural modeling of open-high-low-close (OHLC) data contained within the candlestick chart is crucial to financial practice. However, the inherent constraints in OHLC data pose immense challenges to its structural modeling. Models that fail to account for these constraints may yield results deviating from the original OHLC data structure. To address this issue, a novel unconstrained transformation method, along with its explicit inverse transformation, is proposed to properly handle the inherent constraints of OHLC data. A flexible and effective framework for structurally modeling OHLC data is designed, and the detailed procedure for modeling OHLC data with the vector autoregression and vector error correction models is provided as an example of multivariate time-series analysis. Extensive simulations and three authentic financial datasets, from Kweichow Moutai, the CSI 100 index, and the 50 ETF of the Chinese stock market, demonstrate the effectiveness and stability of the proposed modeling approach. Support vector regression results provide further evidence that the proposed unconstrained transformation not only ensures structural forecasting of OHLC data but also serves as an effective feature-extraction method that can improve the forecasting accuracy of machine-learning models for close prices.
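To make the idea of an invertible unconstrained transformation concrete, here is one natural log/logit construction for the OHLC constraints (prices positive, low < open, close < high). This is a hedged sketch in the spirit of the abstract, not necessarily the paper's exact transformation, and boundary cases (open or close equal to high or low) would need separate handling:

```python
import math

# One natural unconstrained transform for OHLC (l < o, c < h; all positive):
#   y1 = ln(l), y2 = ln(h - l),
#   y3 = logit((o - l)/(h - l)), y4 = logit((c - l)/(h - l)).
# Its inverse is explicit, so any real 4-vector maps back to valid OHLC.
# This mirrors the abstract's idea; the paper's exact formulas may differ.
def logit(p):
    return math.log(p / (1 - p))

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def to_unconstrained(o, h, l, c):
    span = h - l
    return (math.log(l), math.log(span),
            logit((o - l) / span), logit((c - l) / span))

def from_unconstrained(y1, y2, y3, y4):
    l = math.exp(y1)          # positivity restored by exp
    span = math.exp(y2)       # guarantees h > l
    return (l + span * sigmoid(y3), l + span, l, l + span * sigmoid(y4))
```

A model (VAR, VECM, SVR, ...) can then work freely in the unconstrained y-space, and forecasts mapped back through `from_unconstrained` automatically satisfy the OHLC structure.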