Purpose: The purpose of this study is to develop and compare model choice strategies in the context of logistic regression. Model choice means the choice of the covariates to be included in the model. Design/methodology/approach: The study is based on Monte Carlo simulations. The methods are compared in terms of three measures of accuracy: specificity and two kinds of sensitivity. A loss function combining sensitivity and specificity is introduced and used for a final comparison. Findings: The choice of method depends on how much the user emphasizes sensitivity against specificity. It also depends on the sample size. For a typical logistic regression setting with a moderate sample size and a small to moderate effect size, either BIC, BICc or Lasso seems to be optimal. Research limitations: Numerical simulations cannot cover the whole range of data-generating processes occurring with real-world data; thus, more simulations are needed. Practical implications: Researchers can refer to these results if they believe that their data-generating process is similar to one of the scenarios presented in this paper. Alternatively, they could run their own simulations and calculate the loss function. Originality/value: This is a systematic comparison of model choice algorithms and heuristics in the context of logistic regression. The distinction between two types of sensitivity and a comparison based on a loss function are methodological novelties.
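The BIC-based model choice mentioned in the findings can be sketched as follows. This is a minimal illustration, not the study's actual setup: the synthetic data, the gradient-ascent optimizer, and the two candidate covariate sets are all assumptions for the example.

```python
# Sketch: comparing candidate covariate sets for logistic regression by
# BIC, one of the selection criteria the study compares. The synthetic
# data, learning rate and iteration count are illustrative assumptions.
import math, random

def fit_logistic(X, y, iters=3000, lr=0.5):
    """Gradient-ascent MLE for logistic regression; returns coefficients."""
    beta = [0.0] * len(X[0])
    n = len(X)
    for _ in range(iters):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            pi = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for j, v in enumerate(xi):
                grad[j] += (yi - pi) * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def bic(X, y, beta):
    """BIC = k*ln(n) - 2*logLik; smaller is better."""
    ll = 0.0
    for xi, yi in zip(X, y):
        pi = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
        ll += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return len(beta) * math.log(len(X)) - 2 * ll

random.seed(0)
x1 = [random.uniform(-2, 2) for _ in range(200)]
x2 = [random.uniform(-2, 2) for _ in range(200)]   # pure noise covariate
y = [1 if random.random() < 1 / (1 + math.exp(-2 * v)) else 0 for v in x1]

small = [[1.0, a] for a in x1]                 # intercept + informative covariate
full  = [[1.0, a, b] for a, b in zip(x1, x2)]  # adds the noise covariate
bic_small = bic(small, y, fit_logistic(small, y))
bic_full  = bic(full,  y, fit_logistic(full,  y))
# the extra covariate costs ln(200) in penalty and must earn it in likelihood
```

The same skeleton extends to BICc or to comparing against a Lasso path; only the penalty term changes.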
For the composition analysis and identification of ancient glass products, L1 regularization, K-means cluster analysis, the elbow rule and other methods were comprehensively used to build logistic regression, cluster analysis and hyper-parameter test models, and SPSS, Python and other tools were used to obtain the classification rules of glass products under different fluxes, sub-classification under different chemical compositions, a test of the hyper-parameter K value and a rationality analysis. This research can provide theoretical support for the protection and restoration of ancient glass relics.
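The elbow rule used above to choose K can be sketched in a few lines: run K-means for increasing K and look for the point where the drop in within-cluster inertia levels off. The one-dimensional data and the quantile-based initialization below are illustrative assumptions, not the paper's glass-composition data.

```python
# Sketch: the "elbow rule" for choosing K in K-means. Toy 1-D data with
# two obvious clusters; the deterministic quantile init is an assumption.
def kmeans_1d(xs, k, iters=50):
    """Simple Lloyd's algorithm in one dimension; returns (centers, inertia)."""
    xs = sorted(xs)
    # deterministic init: spread initial centers across the sorted data
    centers = [xs[(len(xs) - 1) * i // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            j = min(range(k), key=lambda c: (x - centers[c]) ** 2)
            groups[j].append(x)
        centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
    inertia = sum(min((x - c) ** 2 for c in centers) for x in xs)
    return centers, inertia

data = [0.0, 0.1, 0.2, 0.3, 9.8, 9.9, 10.0, 10.1]   # two clear clusters
drops, prev = [], None
for k in (1, 2, 3):
    _, inertia = kmeans_1d(data, k)
    if prev is not None:
        drops.append(prev - inertia)
    prev = inertia
# the drop from K=1 to K=2 dwarfs the drop from K=2 to K=3: elbow at K=2
```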
Autism spectrum disorder (ASD), classified as a developmental disability, is now more common in children than ever. A drastic increase in the rate of autism spectrum disorder in children worldwide demands early detection of autism in children. Parents can seek professional help for a better prognosis of the child's therapy when ASD is diagnosed before the age of five. This research study aims to develop an automated tool for diagnosing autism in children. The computer-aided diagnosis tool for ASD detection is designed and developed by a novel methodology that includes data acquisition, feature selection, and classification phases. The most deterministic features are selected from the self-acquired dataset by novel feature selection methods before classification. The Imperialistic Competitive Algorithm (ICA), based on empires conquering colonies, performs feature selection in this study. The performance of Logistic Regression (LR), Decision Tree, K-Nearest Neighbor (KNN), and Random Forest (RF) classifiers is experimentally studied in this research work. The experimental results show that the logistic regression classifier exhibits the highest accuracy on the self-acquired dataset. ASD detection is also evaluated experimentally with the Least Absolute Shrinkage and Selection Operator (LASSO) feature selection method and different classifiers. The Exploratory Data Analysis (EDA) phase uncovered crucial facts about the data, such as the correlation of the features in the dataset with the class variable.
The Internet of Things (IoT) is a popular social network in which devices are virtually connected for communicating and sharing information. It is applied widely in business enterprises and government sectors for delivering services to customers, clients and citizens. However, the interaction is successful only based on the trust that each device has in another; thus trust is essential for a social network. As the Internet of Things has access to sensitive information, it is exposed to many threats that put data management at risk. This issue is addressed by trust management, which helps to make decisions about the trustworthiness of a requestor and a provider before communication and sharing. Several trust-based systems exist for different domains, using the dynamic weight method, fuzzy classification, Bayes inference and, for IoT, very few regression analyses. The proposed algorithm is based on logistic regression, which provides a strong statistical foundation for trust prediction. To strengthen the case for regression-based trust, we compared its performance with an equivalent Bayes analysis using the Beta distribution. The performance is studied in a simulated IoT setup with Quality of Service (QoS) and social parameters for the nodes. The proposed model performs better in terms of various metrics. An IoT connects heterogeneous devices such as tags and sensor devices for sharing information and availing of different application services. The most salient features of an IoT system are scalability, extendibility, compatibility and resiliency against attack. The existing work finds a way to integrate direct and indirect trust to converge quickly and estimate the bias due to attacks, in addition to the above features.
This paper focuses on ozone prediction in the atmosphere using a machine learning approach. We utilize air pollutant and meteorological variable datasets from the El Paso area to classify ozone levels as high or low. The LR and ANN algorithms are employed to train on the datasets. The models demonstrate a remarkably high classification accuracy of 89.3% in predicting ozone levels on a given day. Evaluation metrics reveal that the ANN and LR models exhibit accuracies of 89.3% and 88.4%, respectively. Additionally, the AUC values for both models are comparable, with the ANN achieving 95.4% and the LR obtaining 95.2%. The lower the cross-entropy loss (log loss), the better the model's performance. Our ANN model yields a log loss of 3.74, while the LR model shows a log loss of 6.03. The prediction time for the ANN model is approximately 0.00 seconds, whereas the LR model takes 0.02 seconds. Our odds ratio analysis indicates that features such as “Solar radiation”, “Std. Dev. Wind Direction”, “outdoor temperature”, “dew point temperature”, and “PM10” contribute to high ozone levels in El Paso, Texas. Based on metrics such as accuracy, error rate, log loss, and prediction time, the ANN model proves to be faster and more suitable for ozone classification in the El Paso, Texas area.
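The log loss used above to compare the two ozone classifiers can be sketched directly; the toy labels and probabilities below are illustrative, not the El Paso data.

```python
# Sketch: binary cross-entropy (log loss), the metric the abstract uses
# to rank the ANN and LR ozone models. Toy values only.
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; lower means better-calibrated predictions."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

y = [1, 0, 1, 0]                 # 1 = high-ozone day, 0 = low-ozone day
confident = [0.9, 0.1, 0.8, 0.2]
hedged    = [0.6, 0.4, 0.6, 0.4]
# confident, correct probabilities give a lower loss than hedged ones
```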
In this paper, a weighted maximum likelihood technique (WMLT) for the logistic regression model is presented. The method depends on a weight function that is continuously adapted using Mahalanobis distances of the predictor variables. Under the model, the asymptotic consistency of the suggested estimator is demonstrated, and its finite-sample properties are investigated via simulation. In simulation studies and on real data sets, the newly proposed technique demonstrated the best performance among all the estimators compared.
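The down-weighting idea can be sketched as follows: observations whose predictors lie far from the bulk of the data receive weights below one in the likelihood. The univariate median/MAD distance and the specific weight function below are illustrative assumptions, not the authors' exact construction (which uses Mahalanobis distances).

```python
# Sketch of the WMLT weighting idea: outlying predictor values are
# down-weighted before maximum likelihood. The robust 1-D distance and
# the weight function are illustrative stand-ins for the paper's choice.
def robust_distance(x, xs):
    """Distance of x from the bulk of xs, using median/MAD so a gross
    outlier cannot mask itself by inflating the scale estimate."""
    srt = sorted(xs)
    n = len(srt)
    med = (srt[n // 2] + srt[(n - 1) // 2]) / 2
    devs = sorted(abs(v - med) for v in xs)
    mad = (devs[n // 2] + devs[(n - 1) // 2]) / 2
    return abs(x - med) / (1.4826 * mad)

def weight(d, c=2.0):
    """Full weight within c robust standard deviations, decaying beyond."""
    return 1.0 if d <= c else (c / d) ** 2

xs = [0.9, 1.1, 1.0, 0.8, 1.2, 10.0]   # one gross outlier in the predictors
ws = [weight(robust_distance(x, xs)) for x in xs]
# the outlier's weight is near zero while the clean points keep weight 1
```

These weights would then multiply each observation's contribution to the logistic log-likelihood.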
This paper presents a case study on the IPUMS NHIS database, which provides data from censuses and surveys on the health of the U.S. population, including data related to COVID-19. By addressing gaps in previous studies, we propose a machine learning approach to train predictive models for identifying and measuring factors that affect the severity of COVID-19 symptoms. Our experiments focus on four groups of factors: demographic, socio-economic, health condition, and COVID-19 vaccination related. By analysing the sensitivity of the variables used to train the models and performing variable effect characteristics (VEC) analysis on the variable values, we identify and measure the importance of various factors that influence the severity of COVID-19 symptoms.
Nowadays, the Wireless Sensor Network (WSN) is a modern technology with a wide range of applications and greatly attractive benefits, for example, self-governance, low expenditure on execution and data communication, long-term function, and unsupervised access to the network. The Internet of Things (IoT) is an attractive, exciting paradigm. By applying communication technologies in sensors and supervising features, WSNs have initiated communication between IoT devices. Though the IoT offers access to the large amount of information collected through WSNs, it leads to privacy management problems. Hence, this paper provides Logistic Regression machine learning with the Elliptic Curve Cryptography technique (LRECC) to establish a secure IoT structure for preventing, detecting, and mitigating threats. The approach uses the Elliptic Curve Cryptography (ECC) algorithm to generate and distribute security keys. ECC is a lightweight key scheme; thus, it minimizes the routing overhead. Furthermore, the logistic regression machine learning technique selects the transmitter based on intelligent results. The main application of this approach is smart cities. The approach provides continuing reliable routing paths with small overheads. In addition, route nodes cooperate with the IoT, handle resources proficiently, and reduce delay by 29.95%.
This study aimed to assess the potential of in-situ measured soil and vegetation characteristics in landslide susceptibility analyses. First, data for eight independent variables, i.e., soil moisture content, soil organic content, compaction of soil (soil toughness), plant root strength, crop biomass, tree diameter at knee height, and the Shannon Wiener Index (SWI) for trees and herbs, were assembled from field tests at two historic landslide locations: Aranayaka and Kurukudegama, Sri Lanka. An economical, finer-resolution database was obtained, as the field tests were not cost-prohibitive. The logistic regression (LR) analysis showed that soil moisture content, compaction of soil, and SWI for trees and herbs were statistically significant at P<0.05. Variance inflation factors (VIFs) were computed to test for multicollinearity; VIF values (<2) confirmed the absence of multicollinearity between the four independent variables in the LR model. Receiver operating characteristic (ROC) curve and confusion matrix (CM) methods were used to validate the model. In the ROC analysis, the areas under the success rate curve and the prediction rate curve were 84.5% and 96.6%, respectively, demonstrating the model's excellent compatibility and predictability. According to the CM, the model demonstrated 79.6% accuracy, 63.6% precision, 100% recall, and an F-measure of 77.8%. The model coefficients revealed that vegetation cover has a more significant contribution to landslide susceptibility than soil characteristics. Finally, the susceptibility map, classified into low, medium, and high susceptibility areas based on the natural breaks (Jenks) method, was generated using geographical information system (GIS) techniques. All the historic landslide locations fell into the high susceptibility areas. Thus, validation of the model and inspection of the susceptibility map indicated that the in-situ soil and vegetation characteristics used in the model could be employed to demarcate historical landslide patches and identify landslide-susceptible locations with high confidence.
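The confusion-matrix metrics reported above (accuracy, precision, recall, F-measure) follow directly from the counts of true/false positives and negatives; the toy labels below are illustrative, not the study's data.

```python
# Sketch: accuracy, precision, recall and F-measure from a binary
# confusion matrix, as used to validate the landslide LR model.
def cm_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy  = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall    = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return accuracy, precision, recall, f_measure

# toy case mirroring the paper's pattern of perfect recall with
# imperfect precision (one stable cell flagged as a landslide)
acc, prec, rec, f = cm_metrics([1, 1, 0, 0, 0], [1, 1, 1, 0, 0])
# → accuracy 0.8, precision 0.667, recall 1.0, F-measure 0.8
```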
Ecological land is an important guarantee for maintaining urban ecological security and sustainable development. Although increasing attention has been devoted to ecological land, there are few explorations of the relative importance of anthropogenic and natural factors and of how they interact to induce ecological land evolution. This research sought to fill this gap. In this study, 18 factors, including the risk of goaf collapse, faults, and prime croplands, were selected from six aspects: topography, geology, climate, accessibility, socio-economics, and land control policies. Logistic regression (LR) and random forest (RF) models were adopted to identify the anthropogenic and biophysical factors behind the dynamic change of ecological land in Mentougou, Beijing, from 1990 to 2018. The results show that there was a significant increase in ecological land from 1990 to 2018: the increased area reached 102.11 km2 with an increase rate of 0.78, and the gravity center of ecological land gradually moved to the northwest. The impact of anthropogenic factors on ecological land was greater than that of natural factors; ecological land change was mainly driven by the proportion of prime cropland, per capita GDP, land urbanization, temperature, per capita rural income, elevation, and aspect. Additionally, slope and precipitation were identified as important predictors of ecological land change. The model comparison suggested that RF can better identify the relationship between ecological land and the explanatory variables than the LR model. Based on our findings, the implementation of government policies, along with anthropogenic factors, is the most important influence on ecological land change, and rational planning and allocation of ecological land by the Mentougou government are still needed.
Landslide distribution and susceptibility mapping are the fundamental steps for landslide-related hazard and disaster risk management activities, especially in the Himalaya region, where landslides have caused a great deal of death and damage to property. To better understand the landslide condition in the Nepal Himalaya, we carried out an investigation of landslide distribution and susceptibility using landslide inventory data and 12 different contributing factors in the Dailekh district, Western Nepal. Based on the evaluation of the frequency distribution of the landslides, the relationship between the landslides and the various contributing factors was determined. Then, landslide susceptibility was calculated using logistic regression and statistical index methods along with different topographic factors (slope, aspect, relative relief, plan curvature, altitude, topographic wetness index) and non-topographic factors (distance from river, normalized difference vegetation index (NDVI), distance from road, precipitation, land use and land cover, and geology), and 470 (70%) of the total 658 landslides. The receiver operating characteristic (ROC) curve analysis using the remaining 198 (30%) landslides showed that the prediction rate curve (area under the curve, AUC) values for the two methods (logistic regression and statistical index) were 0.826 and 0.823, with success rates of 0.793 and 0.811, respectively. The R-Index values for the logistic regression and statistical index methods were 83.66 and 88.54, respectively, for the high-susceptibility hazard classes. In general, this research concluded that the cohesive and coherent natural interplay of topographic and non-topographic factors strongly affects landslide occurrence, distribution, and susceptibility in the Nepal Himalaya region. Furthermore, the reliability of these two methods is verified for landslide susceptibility mapping in Nepal's central mountain region.
For high-dimensional models with a focus on classification performance, ℓ1-penalized logistic regression is becoming important and popular. However, the Lasso estimates can be problematic when the penalties on different coefficients are all the same and not related to the data. We propose two types of weighted Lasso estimates, with weights depending on the covariates and determined by the McDiarmid inequality. Given sample size n and covariate dimension p, the finite-sample behavior of our proposed method with a diverging number of predictors is illustrated by non-asymptotic oracle inequalities for the ℓ1-estimation error and the squared prediction error of the unknown parameters. We compare the performance of our method with that of previous weighted estimates on simulated data, then apply it to real data analysis.
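The mechanical difference between a plain and a weighted Lasso shows up in the proximal (soft-thresholding) step: each coefficient is shrunk by its own threshold. The weights below are arbitrary illustrations, not the McDiarmid-derived weights of the paper.

```python
# Sketch: per-coefficient (weighted) soft-thresholding, the proximal
# step that distinguishes a weighted Lasso from the plain one.
def soft_threshold(z, t):
    """prox of t*|.|: shrink z toward zero by t, clipping at zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

grad_step = [0.9, -0.4, 0.05]   # coefficients after a gradient step
weights   = [0.1, 1.0, 1.0]     # smaller weight = lighter penalty
lam = 0.2
beta = [soft_threshold(b, lam * w) for b, w in zip(grad_step, weights)]
# → the lightly penalized 0.9 barely shrinks (0.88), -0.4 shrinks to
#   -0.2, and 0.05 is set exactly to zero
```

Iterating a gradient step on the logistic loss followed by this prox yields the full weighted-Lasso solver (proximal gradient / ISTA).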
Deforestation represents a wide concern, mainly in mountain environments, due to its role in global warming, biodiversity loss, land degradation and the occurrence of natural hazards. Thus, the present study is focused on the largest afforested landform unit of Romania and, consequently, the area most affected by forest loss: the Carpathian Mountains. The main goal of the paper is to examine and analyze the various explanatory variables associated with the deforestation process and to model the probability of deforestation using GIS spatial analysis and logistic regression. The forest cover for 1990 and 2012, derived from the CORINE Land Cover (CLC) database, was used to quantify the historical forest cover change included in the modelling. To explain the biophysical and anthropogenic effects, this study considered several explanatory factors related to local topography, forest cover pattern, accessibility, urban growth and population density. Using ROC (receiver operating characteristic) analysis and 500 control sampling points, statistical and spatial validations were performed to evaluate the performance of the resulting model. The analysis showed that the area experienced continuous forest cover change, leading to the loss of over 250,000 ha of forested area during the period 1990–2012. The most significant influences among the explanatory factors of deforestation were noticed for distance to forest edge (β=–4.215), forest fragmentation (β=2.231), slope declivity (β=–1.901), elevation (β=1.734) and distance to roads (β=–1.713). The statistical and spatial validation indicates a good accuracy of the model, with reasonable AUC (0.736) and Kappa (0.739) values. The model's results suggest an intensification of the deforestation process in the area, delineating numerous new clusters with high probability in the Apuseni Mountains, the northern and central part of the Eastern Carpathians, the western part of the Southern Carpathians and the northern part of the Banat Mountains. The study could represent a useful outcome for identifying the forests most vulnerable to logging and for adopting appropriate policies and decisions in forest management and conservation. In addition, the resulting probability map could be used in other studies to investigate potential environmental implications (e.g. geomorphological hazards or impacts on biodiversity and landscape diversity).
BACKGROUND: Acute kidney injury (AKI) has serious consequences for the prognosis of patients undergoing liver transplantation. Recently, the artificial neural network (ANN) was reported to have better predictive ability than classical logistic regression (LR) for this postoperative outcome. AIM: To identify the risk factors for AKI after deceased-donor liver transplantation (DDLT) and compare the prediction performance of ANN with that of LR for this complication. METHODS: Adult patients with no evidence of end-stage kidney dysfunction (KD) who underwent their first DDLT under the model for end-stage liver disease (MELD) score allocation system were evaluated. AKI was defined according to the International Club of Ascites criteria, and potential predictors of postoperative AKI were identified by LR. The prediction performance of both ANN and LR was tested. RESULTS: The incidence of AKI was 60.6% (n=88/145) and the following predictors were identified by LR: MELD score >25 (odds ratio [OR]=1.999), preoperative kidney dysfunction (OR=1.279), extended criteria donors (OR=1.191), intraoperative arterial hypotension (OR=1.935), intraoperative massive blood transfusion (MBT) (OR=1.830), and postoperative serum lactate (SL) (OR=2.001). The area under the receiver operating characteristic curve was better for ANN (0.81, 95% confidence interval [CI]: 0.75-0.83) than for LR (0.71, 95% CI: 0.67-0.76). The root-mean-square error and mean absolute error of the ANN model were 0.47 and 0.38, respectively. CONCLUSION: The severity of liver disease, pre-existing kidney dysfunction, marginal grafts, hemodynamic instability, MBT, and SL are predictors of postoperative AKI, and ANN has better prediction performance than LR in this scenario.
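The AUC statistic used above to compare ANN and LR has a simple rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch with toy scores:

```python
# Sketch: area under the ROC curve via the rank (Mann-Whitney)
# interpretation; the labels and scores are toy values, not the
# transplantation data.
def auc(y_true, scores):
    """Probability a random positive outranks a random negative
    (ties count half) - equivalent to the area under the ROC curve."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [1, 1, 0, 0]                     # 1 = developed AKI
perfect = [0.9, 0.8, 0.3, 0.1]       # perfectly ranked scores
uninformative = [0.5, 0.5, 0.5, 0.5] # constant scores
```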
Traditional collaborative filtering (CF) does not take into account contextual factors such as time, place, companion and environment, which are useful information about users or relevant to the recommender application. Recent context-aware CF therefore takes advantage of such information in order to improve the quality of recommendations. There are three main context-aware approaches: contextual pre-filtering, contextual post-filtering and contextual modeling. Each approach has individual strong points and drawbacks, but there is a need for a stable and fast inference model that supports the context-aware recommendation process. This paper proposes a new approach that discovers a multivariate logistic regression model by mining both traditional rating data and contextual data. The logistic model is an optimal inference model for answering the binary question of whether or not a user prefers a list of recommendations under a given contextual condition. Consequently, the regression model is used as a filter to remove irrelevant items from recommendations, and the final list is the best set of recommendations to be given to users under the contextual information. Moreover, the search space of the logistic model is reduced to a smaller set of items, the so-called general user pattern (GUP), which allows the logistic model to respond faster in real time.
Logistic regression is often used to solve linear binary classification problems in areas such as machine vision, speech recognition, and handwriting recognition. However, it usually fails on certain nonlinear multi-classification problems, such as problems with non-equilibrium samples. Many scholars have proposed methods such as neural networks, least squares support vector machines, and the AdaBoost meta-algorithm; these methods essentially belong to the machine learning category. In this work, based on probability theory and statistical principles, we propose an improved logistic regression algorithm based on kernel density estimation for solving nonlinear multi-classification problems. We have compared our approach with other methods on non-equilibrium samples; the results show that our approach guarantees sample integrity and achieves superior classification.
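The core idea of bringing kernel density estimation into classification can be sketched as follows: estimate a density per class and assign a point to the class with the larger (prior-weighted) density. The bandwidth, the equal-priors assumption, and the toy 1-D data are all illustrative; the paper's actual algorithm combines KDE with logistic regression rather than replacing it.

```python
# Sketch: a kernel-density-estimation classifier - class-conditional
# Gaussian KDEs compared at the query point. Bandwidth and data are
# illustrative assumptions.
import math

def gauss_kde(x, sample, h=0.5):
    """Gaussian kernel density estimate at x from a 1-D sample."""
    k = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)
    return k / (len(sample) * h * math.sqrt(2 * math.pi))

class0 = [-1.2, -0.8, -1.0, -0.9]
class1 = [1.1, 0.9, 1.0, 1.3]

def classify(x):
    # equal priors assumed, so compare class-conditional densities directly
    return 0 if gauss_kde(x, class0) > gauss_kde(x, class1) else 1
```

The same comparison extends to more than two classes, which is what makes the density-based view attractive for multi-classification.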
Logistic regression models have been widely used in many areas of research, namely in the health sciences, to study risk factors associated with diseases. Many population-based surveys, such as the Demographic and Health Survey (DHS), are constructed assuming complex sampling, i.e., probabilistic, stratified and multistage sampling with unequal weights on the observations; this complex design must be taken into account in order to obtain reliable results. However, this very relevant issue is usually not well analyzed in the literature. The aim of this study is to specify the logistic regression model with a complex sample design and to demonstrate how to estimate it using the R survey package. More specifically, we used Mozambique Demographic and Health Survey data from 2011 (MDHS 2011) to illustrate how to correct for the effect of the sample design in the particular case of estimating the risk factors associated with the probability of using mosquito bed nets. Our results show that, in the presence of complex sampling, appropriate methods must be used in both descriptive and inferential statistics.
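The essential adjustment the R survey package automates is that each observation's contribution to the logistic log-likelihood is multiplied by its sampling weight (pseudo-maximum likelihood). A minimal Python sketch of that mechanism, with toy data and weights as assumptions, shows how ignoring the weights distorts the estimate:

```python
# Sketch: survey weights entering the logistic log-likelihood. The
# two-stratum toy data and the plain gradient-ascent fit are
# illustrative assumptions (design-based variance estimation, which the
# survey package also handles, is omitted).
import math

def fit_weighted_logistic(X, y, w, iters=3000, lr=0.5):
    """Pseudo-MLE: gradient ascent on the survey-weighted log-likelihood."""
    beta = [0.0] * len(X[0])
    wsum = sum(w)
    for _ in range(iters):
        grad = [0.0] * len(beta)
        for xi, yi, wi in zip(X, y, w):
            pi = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for j, v in enumerate(xi):
                grad[j] += wi * (yi - pi) * v
        beta = [b + lr * g / wsum for b, g in zip(beta, grad)]
    return beta

# over-sampled stratum (weight 1) answers "no"; under-sampled stratum
# (weight 5) answers "yes" - e.g. bed-net use across regions
X = [[1.0]] * 8                       # intercept-only model
y = [0] * 4 + [1] * 4
w = [1.0] * 4 + [5.0] * 4
b_weighted = fit_weighted_logistic(X, y, w)[0]
b_unweighted = fit_weighted_logistic(X, y, [1.0] * 8)[0]
# weighted intercept is positive (population is mostly "yes");
# the unweighted fit stays at zero (an apparent 50/50 split)
```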
Funding: The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through Project Number IF2-PSAU-2022/01/22043.
文摘Autism spectrum disorder(ASD),classified as a developmental disability,is now more common in children than ever.A drastic increase in the rate of autism spectrum disorder in children worldwide demands early detection of autism in children.Parents can seek professional help for a better prognosis of the child’s therapy when ASD is diagnosed under five years.This research study aims to develop an automated tool for diagnosing autism in children.The computer-aided diagnosis tool for ASD detection is designed and developed by a novel methodology that includes data acquisition,feature selection,and classification phases.The most deterministic features are selected from the self-acquired dataset by novel feature selection methods before classification.The Imperialistic competitive algorithm(ICA)based on empires conquering colonies performs feature selection in this study.The performance of Logistic Regression(LR),Decision tree,K-Nearest Neighbor(KNN),and Random Forest(RF)classifiers are experimentally studied in this research work.The experimental results prove that the Logistic regression classifier exhibits the highest accuracy for the self-acquired dataset.The ASD detection is evaluated experimentally with the Least Absolute Shrinkage and Selection Operator(LASSO)feature selection method and different classifiers.The Exploratory Data Analysis(EDA)phase has uncovered crucial facts about the data,like the correlation of the features in the dataset with the class variable.
文摘Internet of Things(IoT)is a popular social network in which devices are virtually connected for communicating and sharing information.This is applied greatly in business enterprises and government sectors for delivering the services to their customers,clients and citizens.But,the interaction is success-ful only based on the trust that each device has on another.Thus trust is very much essential for a social network.As Internet of Things have access over sen-sitive information,it urges to many threats that lead data management to risk.This issue is addressed by trust management that help to take decision about trust-worthiness of requestor and provider before communication and sharing.Several trust-based systems are existing for different domain using Dynamic weight meth-od,Fuzzy classification,Bayes inference and very few Regression analysis for IoT.The proposed algorithm is based on Logistic Regression,which provide strong statistical background to trust prediction.To make our stand strong on regression support to trust,we have compared the performance with equivalent sound Bayes analysis using Beta distribution.The performance is studied in simu-lated IoT setup with Quality of Service(QoS)and Social parameters for the nodes.The proposed model performs better in terms of various metrics.An IoT connects heterogeneous devices such as tags and sensor devices for sharing of information and avail different application services.The most salient features of IoT system is to design it with scalability,extendibility,compatibility and resiliency against attack.The existing worksfinds a way to integrate direct and indirect trust to con-verge quickly and estimate the bias due to attacks in addition to the above features.
Abstract: This paper focuses on ozone prediction in the atmosphere using a machine learning approach. We utilize air pollutant and meteorological variable datasets from the El Paso area to classify ozone levels as high or low. The LR and ANN algorithms are employed to train on the datasets. The models demonstrate a remarkably high classification accuracy in predicting ozone levels on a given day. Evaluation metrics reveal that the ANN and LR models exhibit accuracies of 89.3% and 88.4%, respectively. Additionally, the AUC values for both models are comparable, with the ANN achieving 95.4% and the LR obtaining 95.2%. The lower the cross-entropy loss (log loss), the better the model's performance; our ANN model yields a log loss of 3.74, while the LR model shows a log loss of 6.03. The prediction time for the ANN model is approximately 0.00 seconds, whereas the LR model takes 0.02 seconds. Our odds ratio analysis indicates that features such as "Solar radiation", "Std. Dev. Wind Direction", "outdoor temperature", "dew point temperature", and "PM10" contribute to high ozone levels in El Paso, Texas. Based on metrics such as accuracy, error rate, log loss, and prediction time, the ANN model proves to be faster and more suitable for ozone classification in the El Paso, Texas area.
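The log-loss metric quoted above is plain binary cross-entropy and can be computed directly. This is a generic sketch with made-up predictions, not the authors' evaluation code:

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; lower means better-calibrated
    probabilistic predictions."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# A confident, correct model scores far lower than a hedging one.
print(log_loss([1, 0, 1], [0.9, 0.1, 0.8]))
print(log_loss([1, 0, 1], [0.5, 0.5, 0.5]))  # ln 2 ≈ 0.693
```

On held-out data, comparing log loss alongside accuracy and AUC, as the paper does, guards against a model that ranks well but is poorly calibrated.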
Abstract: In this paper, a weighted maximum likelihood technique (WMLT) for the logistic regression model is presented. The method depends on a weight function that adapts continuously using Mahalanobis distances of the predictor variables. Under the model, the asymptotic consistency of the suggested estimator is demonstrated, and its finite-sample properties are investigated via simulation. In simulation studies and on real data sets, the newly proposed technique demonstrated the best performance among all estimators compared.
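The abstract does not spell out the weight function, but a common robust choice in this setting is a Huber-type weight that downweights observations whose predictors lie far, in Mahalanobis distance, from the bulk of the data. A sketch under that assumption (two predictors, identity covariance for simplicity, so the distance reduces to Euclidean):

```python
import math

def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance of a 2-D point from the predictor centre,
    given the inverse covariance matrix."""
    d = [x[0] - mean[0], x[1] - mean[1]]
    q = (d[0] * (cov_inv[0][0] * d[0] + cov_inv[0][1] * d[1])
         + d[1] * (cov_inv[1][0] * d[0] + cov_inv[1][1] * d[1]))
    return math.sqrt(q)

def huber_weight(dist, c=2.0):
    """Full weight inside the cutoff c; outliers get weight c/dist."""
    return 1.0 if dist <= c else c / dist

# Identity covariance for illustration.
cov_inv = [[1.0, 0.0], [0.0, 1.0]]
mean = [0.0, 0.0]
print(huber_weight(mahalanobis([1.0, 1.0], mean, cov_inv)))  # inlier: 1.0
print(huber_weight(mahalanobis([3.0, 4.0], mean, cov_inv)))  # outlier: 0.4
```

These weights would then multiply each observation's contribution to the logistic log-likelihood, limiting the influence of leverage points.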
Abstract: This paper presents a case study on the IPUMS NHIS database, which provides data from censuses and surveys on the health of the U.S. population, including data related to COVID-19. By addressing gaps in previous studies, we propose a machine learning approach to train predictive models for identifying and measuring factors that affect the severity of COVID-19 symptoms. Our experiments focus on four groups of factors: demographic, socio-economic, health condition, and those related to COVID-19 vaccination. By analysing the sensitivity of the variables used to train the models and applying VEC (variable effect characteristics) analysis to the variable values, we identify and measure the importance of various factors that influence the severity of COVID-19 symptoms.
Abstract: Nowadays, the Wireless Sensor Network (WSN) is a modern technology with a wide range of applications and greatly attractive benefits, for example, self-governance, low expenditure on execution and data communication, long-term function, and unsupervised access to the network. The Internet of Things (IoT) is an attractive, exciting paradigm. By applying communication technologies in sensors and supervising features, WSNs have initiated communication between IoT devices. Though IoT offers access to the largest amount of information collected through WSNs, it leads to privacy management problems. Hence, this paper proposes Logistic Regression machine learning with the Elliptic Curve Cryptography technique (LRECC) to establish a secure IoT structure for preventing, detecting, and mitigating threats. The approach uses the Elliptic Curve Cryptography (ECC) algorithm to generate and distribute security keys. ECC is a lightweight key scheme and thus minimizes the routing overhead. Furthermore, the logistic regression machine learning technique selects the transmitter based on intelligent results. The main application of this approach is smart cities. The approach provides continuing reliable routing paths with small overheads. In addition, route nodes cooperate with the IoT, handle resources proficiently, and reduce delay by 29.95%.
Funding: Funded by the National Research Council, Sri Lanka [NRC 17-066].
Abstract: This study aimed to assess the potential of in-situ measured soil and vegetation characteristics in landslide susceptibility analyses. First, data for eight independent variables, i.e., soil moisture content, soil organic content, compaction of soil (soil toughness), plant root strength, crop biomass, tree diameter at knee height, and the Shannon-Wiener Index (SWI) for trees and herbs, were assembled from field tests at two historic landslide locations: Aranayaka and Kurukudegama, Sri Lanka. An economical, finer-resolution database was obtained, as the field tests were not cost-prohibitive. The logistic regression (LR) analysis showed that soil moisture content, compaction of soil, and SWI for trees and herbs were statistically significant at P<0.05. The variance inflation factors (VIFs) were computed to test for multicollinearity. VIF values (<2) confirmed the absence of multicollinearity between the four independent variables in the LR model. Receiver Operating Characteristic (ROC) curve and Confusion Matrix (CM) methods were used to validate the model. In the ROC analysis, the areas under the Success Rate Curve and the Prediction Rate Curve were 84.5% and 96.6%, respectively, demonstrating the model's excellent compatibility and predictability. According to the CM, the model demonstrated 79.6% accuracy, 63.6% precision, 100% recall, and an F-measure of 77.8%. The model coefficients revealed that vegetation cover makes a more significant contribution to landslide susceptibility than soil characteristics. Finally, the susceptibility map, classified into low, medium, and highly susceptible areas based on the natural breaks (Jenks) method, was generated using geographical information system (GIS) techniques. All the historic landslide locations fell into the high-susceptibility areas. Thus, validation of the model and inspection of the susceptibility map indicated that the in-situ soil and vegetation characteristics used in the model can be employed to demarcate historical landslide patches and identify landslide-susceptible locations with high confidence.
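The ROC validation described above rests on the AUC, which can be computed from susceptibility scores alone via the rank-sum (Mann-Whitney) identity: the probability that a randomly chosen positive case outscores a randomly chosen negative one. The scores below are illustrative, not the study's data:

```python
def roc_auc(labels, scores):
    """AUC via the rank-sum identity: the probability that a random
    positive case outscores a random negative case (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Susceptibility scores for 3 landslide and 3 non-landslide cells:
print(roc_auc([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]))
```

The quadratic pairwise loop is fine for illustration; production code would sort once and use ranks, or call a library routine such as scikit-learn's `roc_auc_score`.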
Funding: Funded by the National Natural Science Foundation of China (Grant No. 41877533).
Abstract: Ecological land is an important guarantee for maintaining urban ecological security and sustainable development. Although ecological land has attracted increasing research attention, few studies have explored the relative importance of anthropogenic and natural factors and how they interact to induce ecological land evolution. This research sought to fill that gap. In this study, 18 factors, including the risk of goaf collapse, faults, and prime croplands, were selected from six aspects: topography, geology, climate, accessibility, socio-economics, and land control policies. Logistic regression (LR) and random forest (RF) models were adopted to identify the anthropogenic and biophysical factors behind the dynamic change of ecological land in Mentougou, Beijing, from 1990 to 2018. The results show a significant increase in ecological land over that period: the area of ecological land grew by 102.11 km2, at a rate of 0.78, and its gravity center gradually moved to the northwest. The impact of anthropogenic factors on ecological land was greater than that of natural factors; ecological land change was mainly driven by the proportion of prime cropland, per capita GDP, land urbanization, temperature, per capita rural income, elevation, and aspect. Additionally, slope and precipitation were identified as important predictors of ecological land change. The model comparison suggested that RF can better identify the relationship between ecological land and the explanatory variables than the LR model. Based on our findings, the implementation of government policies, along with other anthropogenic factors, is the most important influence on ecological land change, and rational planning and allocation of ecological land by the Mentougou government are still needed.
Funding: Under the auspices of the CAS Overseas Institutions Platform Project (No. 131C11KYSB20200033), the National Natural Science Foundation of China (No. 42071349), and the Sichuan Science and Technology Program (No. 2020JDJQ0003).
Abstract: Landslide distribution and susceptibility mapping are the fundamental steps for landslide-related hazard and disaster risk management activities, especially in the Himalaya region, where landslides have caused a great deal of death and damage to property. To better understand the landslide condition in the Nepal Himalaya, we investigated landslide distribution and susceptibility using landslide inventory data and 12 different contributing factors in the Dailekh district, Western Nepal. Based on the evaluation of the frequency distribution of the landslides, the relationship between the landslides and the various contributing factors was determined. Then, landslide susceptibility was calculated using logistic regression and statistical index methods, along with different topographic factors (slope, aspect, relative relief, plan curvature, altitude, topographic wetness index) and non-topographic factors (distance from river, normalized difference vegetation index (NDVI), distance from road, precipitation, land use and land cover, and geology), and 470 (70%) of the total 658 landslides. The receiver operating characteristic (ROC) curve analysis using the remaining 198 (30%) landslides showed that the prediction rate curve (area under the curve, AUC) values for the two methods (logistic regression and statistical index) were 0.826 and 0.823, with success rates of 0.793 and 0.811, respectively. The R-Index values for the logistic regression and statistical index methods were 83.66 and 88.54, respectively, for the high-susceptibility hazard classes. In general, this research concluded that the cohesive and coherent natural interplay of topographic and non-topographic factors strongly affects landslide occurrence, distribution, and susceptibility in the Nepal Himalaya region. Furthermore, the reliability of these two methods is verified for landslide susceptibility mapping in Nepal's central mountain region.
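The statistical index method referred to above assigns each factor class the logarithm of the ratio between its landslide density and the overall landslide density; positive values flag classes that are over-represented among landslides. A minimal sketch with illustrative counts:

```python
import math

def statistical_index(landslide_in_class, landslide_total,
                      class_area, total_area):
    """Statistical index (Wi) of a factor class: log of the landslide
    density in the class relative to the overall landslide density."""
    return math.log((landslide_in_class / landslide_total)
                    / (class_area / total_area))

# A slope class covering 10% of the area but holding 30% of landslides:
print(round(statistical_index(30, 100, 10, 100), 3))  # positive => susceptible

# A class whose landslide share matches its area share is neutral:
print(statistical_index(10, 100, 10, 100))  # 0.0
```

The per-class indices are then summed across factor layers for each map cell to produce the susceptibility score that the ROC analysis evaluates.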
Funding: Supported by the National Natural Science Foundation of China (61877023) and the Fundamental Research Funds for the Central Universities (CCNU19TD009).
Abstract: For high-dimensional models with a focus on classification performance, ℓ1-penalized logistic regression is becoming important and popular. However, the Lasso estimates can be problematic when the penalties on the different coefficients are all the same and unrelated to the data. We propose two types of weighted Lasso estimates, with covariate-dependent weights determined by the McDiarmid inequality. Given sample size n and covariate dimension p, the finite-sample behavior of our proposed method with a diverging number of predictors is illustrated by non-asymptotic oracle inequalities, such as bounds on the ℓ1-estimation error and the squared prediction error of the unknown parameters. We compare the performance of our method with that of earlier weighted estimates on simulated data, then apply it to real data analysis.
Funding: Elaborated in the framework of the research project framed within the research plan of the Institute of Geography, Romanian Academy: "The National Geographic Atlas of Romania".
Abstract: The deforestation process is a wide concern, mainly in mountain environments, due to its role in global warming, biodiversity loss, land degradation, and the occurrence of natural hazards. The present study therefore focuses on the largest afforested landform unit of Romania and, consequently, the area most affected by forest loss: the Carpathian Mountains. The main goal of the paper is to examine and analyze the various explanatory variables associated with the deforestation process and to model the probability of deforestation using GIS spatial analysis and logistic regression. The forest cover for 1990 and 2012, derived from the CORINE Land Cover (CLC) database, was used to quantify the historical forest cover change included in the modelling. To explain the biophysical and anthropogenic effects, this study considered several explanatory factors related to local topography, forest cover pattern, accessibility, urban growth, and population density. Using ROC (Receiver Operating Characteristic) analysis and 500 control sampling points, statistical and spatial validations were carried out to evaluate the performance of the resulting data. The analysis showed that the area experienced continuous forest cover change, leading to the loss of over 250,000 ha of forested area during the period 1990–2012. The most significant influences among the explanatory factors of deforestation were distance to forest edge (β=–4.215), forest fragmentation (β=2.231), slope declivity (β=–1.901), elevation (β=1.734), and distance to roads (β=–1.713). The statistical and spatial validation indicates a good accuracy of the model, with reasonable AUC (0.736) and Kappa (0.739) values. The model's results suggest an intensification of the deforestation process in the area, delineating numerous new high-probability clusters in the Apuseni Mountains, the northern and central parts of the Eastern Carpathians, the western part of the Southern Carpathians, and the northern part of the Banat Mountains. The study could serve as a useful tool for identifying the forests most vulnerable to logging and for adopting appropriate policies and decisions in forest management and conservation. In addition, the resulting probability map could be used in other studies to investigate potential environmental implications (e.g., geomorphological hazards or impacts on biodiversity and landscape diversity).
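The reported β coefficients can be turned into a deforestation probability through the logistic link. The intercept and the standardised predictor values below are illustrative assumptions; the abstract does not report them:

```python
import math

def deforestation_prob(z_scores, betas, intercept=0.0):
    """Logistic-regression probability from standardised predictors.
    The intercept is a placeholder; the paper does not report it."""
    logit = intercept + sum(b * z for b, z in zip(betas, z_scores))
    return 1.0 / (1.0 + math.exp(-logit))

# Coefficients reported in the abstract (standardised predictors assumed):
betas = {
    "distance_to_forest_edge": -4.215,
    "forest_fragmentation":     2.231,
    "slope_declivity":         -1.901,
    "elevation":                1.734,
    "distance_to_roads":       -1.713,
}
# A fragmented, high-elevation cell close to the forest edge and a road
# (negative z for the distance variables means "nearer than average"):
z = [-1.0, 1.0, 0.0, 1.0, -1.0]
p = deforestation_prob(z, list(betas.values()))
print(p)  # close to 1: very likely to be deforested under this model
```

The signs behave as the abstract describes: proximity to the forest edge and to roads, together with fragmentation, push the probability of forest loss up.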
Abstract: BACKGROUND: Acute kidney injury (AKI) has serious consequences for the prognosis of patients undergoing liver transplantation. Recently, the artificial neural network (ANN) was reported to have better predictive ability than classical logistic regression (LR) for this postoperative outcome. AIM: To identify the risk factors for AKI after deceased-donor liver transplantation (DDLT) and compare the prediction performance of ANN with that of LR for this complication. METHODS: Adult patients with no evidence of end-stage kidney dysfunction (KD) who underwent their first DDLT under the model for end-stage liver disease (MELD) score allocation system were evaluated. AKI was defined according to the International Club of Ascites criteria, and potential predictors of postoperative AKI were identified by LR. The prediction performance of both ANN and LR was tested. RESULTS: The incidence of AKI was 60.6% (n=88/145), and the following predictors were identified by LR: MELD score >25 (odds ratio [OR]=1.999), preoperative kidney dysfunction (OR=1.279), extended-criteria donors (OR=1.191), intraoperative arterial hypotension (OR=1.935), intraoperative massive blood transfusion (MBT) (OR=1.830), and postoperative serum lactate (SL) (OR=2.001). The area under the receiver operating characteristic curve was better for ANN (0.81, 95% confidence interval [CI]: 0.75-0.83) than for LR (0.71, 95% CI: 0.67-0.76). The root-mean-square error and mean absolute error of the ANN model were 0.47 and 0.38, respectively. CONCLUSION: The severity of liver disease, pre-existing kidney dysfunction, marginal grafts, hemodynamic instability, MBT, and SL are predictors of postoperative AKI, and ANN has better prediction performance than LR in this scenario.
Abstract: Traditional collaborative filtering (CF) does not take into account contextual factors such as time, place, companion, and environment, which are useful information about users or relevant to the recommender application. Recent context-aware CF therefore takes advantage of such information to improve the quality of recommendation. There are three main context-aware approaches: contextual pre-filtering, contextual post-filtering, and contextual modeling. Each approach has individual strong points and drawbacks, but all require a stable and fast inference model to support the context-aware recommendation process. This paper proposes a new approach that learns a multivariate logistic regression model by mining both traditional rating data and contextual data. The logistic model is the optimal inference model for the binary question "does the user prefer a list of recommendations under the given contextual condition?" Consequently, the regression model is used as a filter to remove irrelevant items from the recommendations. The final list is the best set of recommendations to give to users under the contextual information. Moreover, the search space of items for the logistic model is reduced to a smaller set, the so-called general user pattern (GUP). GUP helps the logistic model respond faster in real time.
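Contextual post-filtering with a logistic model can be sketched as follows: each candidate item is scored by the probability that the user accepts it under the current context, and items falling below a threshold are dropped. The feature names, weights, and items below are hypothetical, not from the paper:

```python
import math

def accept_probability(features, weights, bias):
    """Logistic model answering: will the user accept this
    recommendation under the current contextual features?"""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def contextual_post_filter(candidates, weights, bias, threshold=0.5):
    """Keep only candidates whose contextual acceptance probability
    clears the threshold; this runs after the base CF ranking."""
    return [name for name, feats in candidates
            if accept_probability(feats, weights, bias) >= threshold]

# Hypothetical context features: [is_weekend, is_at_home, with_companion]
weights, bias = [1.2, 0.8, -1.5], -0.5
candidates = [("movie_night", [1, 1, 0]),   # weekend, at home, alone
              ("city_tour",   [0, 0, 1])]   # weekday, outside, companion
print(contextual_post_filter(candidates, weights, bias))
```

Restricting `candidates` to the GUP described above is what keeps this filtering step fast enough for real-time response.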
Funding: The authors would like to thank all anonymous reviewers for their suggestions and feedback. This work was supported by the National Natural Science Foundation of China (Grant No. 61379103).
Abstract: Logistic regression is often used to solve linear binary classification problems in areas such as machine vision, speech recognition, and handwriting recognition. However, it usually fails on certain nonlinear multi-classification problems, such as problems with imbalanced (non-equilibrium) samples. Many scholars have proposed methods such as neural networks, least squares support vector machines, and the AdaBoost meta-algorithm; these methods essentially belong to the machine learning category. In this work, based on probability theory and statistical principles, we propose an improved logistic regression algorithm based on kernel density estimation for solving nonlinear multi-classification. We compared our approach with other methods on imbalanced samples; the results show that our approach preserves sample integrity and achieves superior classification.
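One way to combine probabilistic classification with kernel density estimation, in the spirit of the approach described, is to fit a Gaussian KDE per class and form posterior class probabilities using empirical priors, so imbalanced class sizes are handled explicitly. This is a generic 1-D sketch under that interpretation, not the authors' algorithm:

```python
import math

def gaussian_kde(x, sample, bandwidth):
    """1-D Gaussian kernel density estimate evaluated at x."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                      for s in sample)

def kde_posterior(x, class_samples, bandwidth=0.5):
    """Posterior class probabilities from per-class KDEs weighted by
    empirical class priors (Bayes rule with nonparametric densities)."""
    n = sum(len(s) for s in class_samples.values())
    joint = {c: (len(s) / n) * gaussian_kde(x, s, bandwidth)
             for c, s in class_samples.items()}
    z = sum(joint.values())
    return {c: j / z for c, j in joint.items()}

# Two imbalanced classes centred near 0 and 2:
samples = {"A": [0.0, 0.2, -0.1, 0.1], "B": [2.0, 2.2, 1.9]}
post = kde_posterior(2.0, samples)
print(max(post, key=post.get))
```

Unlike plain logistic regression, the decision boundary here follows the estimated densities, so it can be nonlinear and extends naturally to any number of classes.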
Abstract: Logistic regression models have been widely used in many areas of research, notably in the health sciences, to study risk factors associated with diseases. Many population-based surveys, such as the Demographic and Health Survey (DHS), are constructed under complex sampling, i.e., probabilistic, stratified, and multistage sampling with unequal observation weights; this complex design must be taken into account in order to obtain reliable results. However, this very relevant issue is usually not well treated in the literature. The aim of this study is to specify the logistic regression model under a complex sample design and to demonstrate how to estimate it using the R survey package. More specifically, we used the Mozambique Demographic and Health Survey 2011 (MDHS 2011) data to illustrate how to correct for the effect of the sample design in the particular case of estimating the risk factors associated with the probability of using mosquito bed nets. Our results show that, in the presence of complex sampling, appropriate methods must be used in both descriptive and inferential statistics.
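Design weights enter the logistic likelihood by multiplying each observation's score contribution, which yields the pseudo-maximum-likelihood estimator implemented by packages such as R's survey. Here is that idea sketched in Python with plain gradient ascent; the data and weights are illustrative, and proper design-based standard errors would still need linearization or replication methods:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def weighted_logistic_fit(X, y, w, lr=0.5, epochs=2000):
    """Pseudo-maximum-likelihood fit: each observation's score
    contribution is multiplied by its sampling weight."""
    p = len(X[0])
    beta = [0.0] * (p + 1)          # intercept first
    total_w = sum(w)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi, wi in zip(X, y, w):
            err = yi - sigmoid(beta[0] + sum(b * x
                                             for b, x in zip(beta[1:], xi)))
            grad[0] += wi * err / total_w
            for j, x in enumerate(xi):
                grad[j + 1] += wi * err * x / total_w
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta

# Unequal design weights change the fitted prevalence: the single
# weighted-up positive case pulls the intercept toward it.
X = [[0.0], [0.0], [0.0], [0.0]]
y = [1, 0, 0, 0]
unweighted = weighted_logistic_fit(X, y, [1, 1, 1, 1])
upweighted = weighted_logistic_fit(X, y, [5, 1, 1, 1])
print(sigmoid(unweighted[0]), sigmoid(upweighted[0]))
```

With equal weights the fitted prevalence matches the sample proportion (0.25); upweighting the positive case shifts it toward the design-weighted proportion (5/8), which is exactly the correction the paper argues for.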