3,189 articles found
DCS-SOCP-SVM: A Novel Integrated Sampling and Classification Algorithm for Imbalanced Datasets
1
Authors: Xuewen Mu, Bingcong Zhao. Computers, Materials & Continua, 2025, No. 5, pp. 2143-2159, 17 pages
When dealing with imbalanced datasets, the traditional support vector machine (SVM) tends to produce a classification hyperplane that is biased towards the majority class and exhibits poor robustness. This paper proposes a high-performance classification algorithm specifically designed for imbalanced datasets. The proposed method first uses a biased second-order cone programming support vector machine (B-SOCP-SVM) to identify the support vectors (SVs) and non-support vectors (NSVs) in the imbalanced data. Then, it applies the synthetic minority over-sampling technique (SV-SMOTE) to oversample the support vectors of the minority class and uses the random under-sampling technique (NSV-RUS) multiple times to undersample the non-support vectors of the majority class. Combining the resulting minority class dataset with the multiple majority class datasets yields multiple new balanced datasets. Finally, SOCP-SVM is used to classify each dataset, and the final result is obtained through the integrated algorithm. Experimental results demonstrate that the proposed method performs excellently on imbalanced datasets.
Keywords: DCS-SOCP-SVM; imbalanced datasets; sampling method; ensemble method; integrated algorithm
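The sampling stage described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`smote`, `balanced_sets`, `majority_vote`) are hypothetical, the inputs stand in for the minority support vectors and majority non-support vectors that B-SOCP-SVM would identify (that step is not shown), and simple voting stands in for the unspecified integrated algorithm.

```python
import random

def smote(points, n_new, k, rng):
    """Create n_new synthetic points, each interpolated between a random
    base point and one of its k nearest neighbours (plain SMOTE)."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(points))
        base = points[i]
        others = points[:i] + points[i + 1:]
        # Sort the remaining points by squared Euclidean distance to base.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

def balanced_sets(minority, majority, n_sets, n_per_class, k=2, seed=0):
    """Return n_sets balanced (minority, majority) pairs.

    The minority class is oversampled once to n_per_class points; the
    majority class is randomly undersampled n_sets times (assumed scheme)."""
    rng = random.Random(seed)
    minority = list(minority)
    minority_plus = minority + smote(minority, n_per_class - len(minority), k, rng)
    return [(minority_plus, rng.sample(majority, n_per_class)) for _ in range(n_sets)]

def majority_vote(predictions):
    """Combine per-model labels (+1 / -1) by simple voting; ties go to +1."""
    return 1 if sum(predictions) >= 0 else -1

# Hypothetical toy data: 3 minority points, 10 majority points.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
majority = [(5.0 + i, 5.0) for i in range(10)]
sets = balanced_sets(minority, majority, n_sets=3, n_per_class=6)
```

Each balanced pair would then be passed to one classifier, and the per-classifier labels combined with `majority_vote`; an odd `n_sets` avoids ties.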
Impact of climate changes on Arizona State precipitation patterns using high-resolution climatic gridded datasets
2
Authors: Hayder H. Kareem, Shahla Abdulqader Nassrullah. Journal of Groundwater Science and Engineering, 2025, No. 1, pp. 34-46, 13 pages
Climate change significantly affects the environment, ecosystems, communities, and economies. These impacts often result in both quick and gradual changes in water resources, environmental conditions, and weather patterns. A geographical study was conducted in Arizona State, USA, to examine monthly precipitation concentration rates over time. This analysis used a high-resolution 0.50×0.50 grid of monthly precipitation data from 1961 to 2022, provided by the Climatic Research Unit. The study aimed to analyze how climatic changes affected the first and last five years of each decade, as well as the entire decade, during the specified period. GIS was used to meet the objectives of this study. Arizona experienced 51–568 mm, 67–560 mm, 63–622 mm, and 52–590 mm of rainfall in the sixth, seventh, eighth, and ninth decades of the second millennium, respectively. Both the first and second five-year periods of each decade showed acceptable rainfall amounts despite fluctuations. However, rainfall decreased in the first and second decades of the third millennium, and in the first two years of the third decade. Rainfall amounts dropped to 42–472 mm, 55–469 mm, and 74–498 mm, respectively, indicating a downward trend in precipitation. The central part of the state received the highest rainfall, while the eastern and western regions (spanning north to south) had significantly less. Over the decades of the third millennium, the average annual rainfall every five years was relatively low, showing a declining trend due to severe climate changes, generally ranging between 35 mm and 498 mm. The central regions consistently received more rainfall than the eastern and western outskirts. Arizona is currently experiencing a decrease in rainfall due to climate change, a situation that could deteriorate further. This highlights the need to optimize the use of existing rainfall and explore alternative water sources.
Keywords: Spatial analysis; Climate impact; Precipitation rates; CRU dataset; GIS; Arizona State; USA
A Chinese Text-to-SQL Model for Graduate Admissions Consultation
3
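Authors: Wang Qingfeng (王庆丰), Li Xu (李旭), Yao Chunlong (姚春龙), Cheng Tengteng (程腾腾). Computer Engineering (计算机工程) (indexed: PKU Core), 2025, No. 3, pp. 362-368, 7 pages
Graduate admissions consultation is a representative question-answering scenario with a high volume of requests over short periods. Existing admissions question-answering systems based on word vectors and similar methods return insufficiently precise answers and require the question bank to be updated every year. To address these problems, the RESDSQL model, based on text-to-Structured Query Language (Text-to-SQL) technology, is introduced; it converts natural language questions into SQL statements, which are then run against a structured database to retrieve and return answers. High-frequency consultation questions from the graduate admissions scenario were collected, and question and SQL statement templates were built from the real admissions data of three universities. Filling the templates produced a dataset with 1,501 training samples and 386 test samples. The RoBERTa model in RESDSQL was replaced with XLM-RoBERTa, which has stronger multilingual generation capability, and the T5 model was replaced with mT5; the models were then fine-tuned on the target-domain dataset. The approach achieved high accuracy on admissions questions: with mT5-large, execution accuracy was 0.95 and the exact match rate was 1. Compared with the C3SQL method, which uses zero-shot prompting with the ChatGPT3.5 model, the model is better in both performance and cost.
Keywords: Chinese text-to-SQL; natural language query; Chinese SQL statement generation; pre-trained model; text-to-SQL dataset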
A Method of Generating Semi-Experimental Biomedical Datasets
4
Authors: Jing Wang, Naike Du, Zi He, Xiuzhu Ye. Journal of Beijing Institute of Technology (indexed: EI, CAS), 2024, No. 3, pp. 219-226, 8 pages
This paper proposes a method to generate semi-experimental biomedical datasets based on full-wave simulation software. System noise such as antenna port coupling is fully considered in the proposed datasets, which makes them more realistic than synthetic datasets. In this paper, datasets containing different shapes are constructed based on the relative permittivities of human tissues. Then, a back-propagation scheme is used to obtain rough reconstructions, which are fed into a U-net convolutional neural network (CNN) to recover high-resolution images. Numerical results show that the network trained on the datasets generated by the proposed method obtains satisfactory reconstruction results and is promising for real-time biomedical imaging.
Keywords: electromagnetic imaging; dataset; biomedical imaging
Performance evaluation of seven multi-label classification methods on real-world patent and publication datasets
5
Authors: Shuo Xu, Yuefu Zhang, Xin An, Sainan Pi. Journal of Data and Information Science (indexed: CSCD), 2024, No. 2, pp. 81-103, 23 pages
Purpose: Many science, technology and innovation (STI) resources carry several different labels. To assign such labels automatically to an instance of interest, many approaches that perform well on benchmark datasets have been proposed in the literature for the multi-label classification task. Furthermore, several open-source tools implementing these approaches have also been developed. However, the characteristics of real-world multi-label patent and publication datasets are not completely in line with those of benchmark ones. Therefore, the main purpose of this paper is to evaluate comprehensively seven multi-label classification methods on real-world datasets. Research limitations: The three real-world datasets differ in the following aspects: statement, data quality, and purposes. Additionally, open-source tools designed for multi-label classification also have intrinsic differences in their approaches to data processing and feature selection, which in turn affect the performance of a multi-label classification approach. In the near future, we will enhance experimental precision and reinforce the validity of conclusions by employing more rigorous control over variables through expanded parameter settings. Practical implications: The observed Macro F1 and Micro F1 scores on real-world datasets typically fall short of those achieved on benchmark datasets, underscoring the complexity of real-world multi-label classification tasks. Approaches leveraging deep learning techniques offer promising solutions by accommodating the hierarchical relationships and interdependencies among labels. With ongoing enhancements in deep learning algorithms and large-scale models, the efficacy of multi-label classification is expected to improve significantly, reaching a level of practical utility in the foreseeable future. Originality/value: (1) Seven multi-label classification methods are comprehensively compared on three real-world datasets. (2) The TextCNN and TextRCNN models perform better on small-scale datasets with a more complex hierarchical structure of labels and a more balanced document-label distribution. (3) The MLkNN method works better on the larger-scale dataset with a more unbalanced document-label distribution.
Keywords: Multi-label classification; Real-world datasets; Hierarchical structure; Classification system; Label correlation; Machine learning
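For readers unfamiliar with the two scores named in this abstract, the following minimal sketch shows how Macro F1 and Micro F1 are conventionally computed for multi-label predictions. It is not taken from the paper; the function names and the toy data in the usage note are hypothetical.

```python
def f1(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_micro_f1(y_true, y_pred, labels):
    """y_true / y_pred: lists of label sets, one set per document."""
    counts = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
        counts[label] = (tp, fp, fn)
    # Macro F1: average the per-label F1 scores, so every label weighs the same.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    # Micro F1: pool the counts over all labels first, so frequent labels dominate.
    pooled = [sum(c[i] for c in counts.values()) for i in range(3)]
    micro = f1(*pooled)
    return macro, micro
```

With an unbalanced label distribution the two scores diverge: if three documents carry label "a", one carries "b", and a model predicts "a" for all four, Macro F1 is 3/7 while Micro F1 is 0.75, because the missed rare label counts as much as the frequent one only in the macro average.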
Performance Analysis of Support Vector Machine (SVM) on Challenging Datasets for Forest Fire Detection
6
Authors: Ankan Kar, Nirjhar Nath, Utpalraj Kemprai, Aman. International Journal of Communications, Network and System Sciences, 2024, No. 2, pp. 11-29, 19 pages
This article analyzes the performance and use of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to ecosystems and human settlements, rapid and accurate detection systems are of utmost importance. SVMs, known for their strong classification capabilities, are proficient at recognizing patterns associated with fire within images. By training on labeled data, SVMs learn to identify distinctive attributes associated with fire, such as flames, smoke, or alterations in the visual characteristics of the forest area. The article examines the use of SVMs, covering crucial elements such as data preprocessing, feature extraction, and model training, and evaluates accuracy, efficiency, and practical applicability. The knowledge gained from this study aids the development of efficient forest fire detection systems, enabling prompt responses and improving disaster management. Moreover, the relationship between SVM accuracy and the difficulties presented by high-dimensional datasets is investigated and demonstrated through a case study. The relationship between accuracy scores and the different resolutions used for resizing the training datasets is also discussed. Together, these studies give an overview of the difficulties faced and of the areas that require further improvement and focus.
Keywords: Support Vector Machine; challenging datasets; forest fire detection; classification
ML and DL-based Phishing Website Detection: The Effects of Varied Size Datasets and Informative Feature Selection Techniques
7
Authors: Kibreab Adane, Berhanu Beyene, Mohammed Abebe. Journal of Artificial Intelligence and Technology, 2024, No. 1, pp. 18-30, 13 pages
One must interact with a specific webpage or website in order to use the Internet for communication, teamwork, and other productive activities. However, because phishing websites look benign and not all website visitors have the same knowledge and skills to inspect the trustworthiness of visited websites, visitors are tricked into disclosing sensitive information, which makes them vulnerable to malicious software attacks such as ransomware. It is impossible to stop attackers from creating phishing websites, which is one of the core challenges in combating them. However, this threat can be alleviated by detecting a specific website as phishing and alerting online users to take the necessary precautions before handing over sensitive information. In this study, five machine learning (ML) and deep learning (DL) algorithms, namely CatBoost (CATB), gradient boost (GB), random forest (RF), multilayer perceptron (MLP), and deep neural network (DNN), were tested with three different reputable datasets and two useful feature selection techniques to assess the scalability and consistency of each classifier's performance on varied dataset sizes. The experimental findings reveal that the CATB classifier achieved the best accuracy across all datasets (DS-1, DS-2, and DS-3), with respective values of 97.9%, 95.73%, and 98.83%. The GB classifier achieved the second-best accuracy across all datasets (DS-1, DS-2, and DS-3), with respective values of 97.16%, 95.18%, and 98.58%. MLP achieved the best computational time across all datasets (DS-1, DS-2, and DS-3), with respective values of 2, 7, and 3 seconds, despite scoring the lowest accuracy across all datasets.
Keywords: ANOVA-F-test; deep learning; feature selection technique; machine learning; mutual information; phishing website datasets; phishing website detection
An Intrusion Detection System Based on HiTar-2024 Dataset Generation from LOG Files for Smart Industrial Internet-of-Things Environment
8
Authors: Tarak Dhaouadi, Hichem Mrabet, Adeeb Alhomoud, Abderrazak Jemai. Computers, Materials & Continua, 2025, No. 3, pp. 4535-4554, 20 pages
The increasing adoption of Industrial Internet of Things (IIoT) systems in smart manufacturing is leading to a rise in cyberattacks and a pressing need for effective intrusion detection systems (IDS). However, existing datasets for IDS training often lack relevance to modern IIoT environments, limiting their applicability for research and development. To address this gap, this paper introduces the HiTar-2024 dataset, specifically designed for IIoT systems, which an IDS can use to detect imminent threats. HiTar-2024 was generated using the AREZZO simulator, which replicates realistic smart manufacturing scenarios. The generated dataset includes five distinct classes: Normal, Probing, Remote to Local (R2L), User to Root (U2R), and Denial of Service (DoS). Comprehensive experiments with popular Machine Learning (ML) classifiers, including BayesNet, Logistic, IBK, Multiclass, PART, and J48, demonstrate high accuracy, precision, recall, and F1-scores, exceeding 0.99 across all ML metrics. This result is attributed to a rigorous process that includes data pre-processing, feature extraction, correction of the class imbalance problem, and a test option for model robustness. The approach emphasizes meticulous dataset construction through a complete dataset generation process, a careful labelling algorithm, and a sophisticated evaluation method, providing valuable insights to reinforce IIoT system security. Finally, the HiTar-2024 dataset is compared with similar datasets in the literature, considering factors such as data format, feature extraction tools, number of features, attack categories, number of instances, and ML metrics.
Keywords: intrusion detection system; industrial IoT; machine learning; security; cyber-attacks; dataset
High-resolution Simulation Dataset of Hourly PM2.5 Chemical Composition in China (CAQRA-aerosol) from 2013 to 2020
9
Authors: Lei KONG, Xiao TANG, Jiang ZHU, Zifa WANG, Bing LIU, Yuanyuan ZHU, Lili ZHU, Duohong CHEN, Ke HU, Huangjian WU, Qian WU, Jin SHEN, Yele SUN, Zirui LIU, Jinyuan XIN, Dongsheng JI, Mei ZHENG. Advances in Atmospheric Sciences, 2025, No. 4, pp. 697-712, 16 pages
Scientific knowledge of the chemical composition of fine particulate matter (PM2.5) is essential for properly assessing its health and climate effects, and for decision-makers to develop efficient mitigation strategies. A high-resolution PM2.5 chemical composition dataset (CAQRA-aerosol) is developed in this study, which provides hourly maps of organic carbon, black carbon, ammonium, nitrate, and sulfate in China from 2013 to 2020 with a horizontal resolution of 15 km. This paper describes the method, access, and validation results of this dataset. It shows that CAQRA-aerosol is consistent with observations and achieves accuracy higher than or comparable to that of previous PM2.5 composition datasets. Based on CAQRA-aerosol, spatiotemporal changes of the different PM2.5 components were investigated from a national viewpoint, which highlights that nitrate changed differently from the other components. The estimated annual rate of change of population-weighted nitrate concentration is 0.23 μg m⁻³ yr⁻¹ from 2015 to 2020, compared with −0.19 to −1.1 μg m⁻³ yr⁻¹ for the other components. The whole dataset is freely available from the China Air Pollution Data Center (https://doi.org/10.12423/capdb_PKU.2023.DA).
Keywords: PM2.5 composition dataset; black carbon; organic carbon; ammonium; nitrate; sulfate
A More Accurate Evaluation Metric for Text-to-SQL in the Era of Large Language Models
10
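Author: Jiang Peng (蒋鹏). Computer Knowledge and Technology (电脑知识与技术), 2025, No. 1, pp. 76-78, 88, 4 pages
Large language models (LLMs) have become a powerful tool for advancing the Text-to-SQL task. The study finds that, under different evaluation metrics, the performance of LLM-based models differs significantly from that of fine-tuned models. The article therefore analyzes the shortcomings of test-suite execution accuracy (EXE) and exact set matching accuracy (ESM) in evaluating LLM-based Text-to-SQL models and proposes an improved metric, EESM (Enhanced Exact Set Matching). Experimental results show that EXE and ESM have false positive and false negative rates of up to 13.2% and 10.8%, respectively, whereas the false positive and false negative rates of EESM are only 0.2% and 1.8%, respectively, indicating that EESM provides a more accurate evaluation.
Keywords: EESM; enhanced exact set matching accuracy; test-suite execution accuracy; exact set matching accuracy; text-to-SQL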
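To make the two kinds of evaluation error in this abstract concrete, here is a small hedged sketch using Python's standard `sqlite3` module. It does not implement EESM, and `string_match` is only a crude stand-in for exact set matching (which compares parsed query components rather than raw text); the table, the queries, and the helper names are all hypothetical.

```python
import sqlite3

def execution_match(conn, gold_sql, pred_sql):
    """True if both queries return the same rows (order ignored)."""
    gold = sorted(conn.execute(gold_sql).fetchall())
    pred = sorted(conn.execute(pred_sql).fetchall())
    return gold == pred

def string_match(gold_sql, pred_sql):
    """Case- and whitespace-insensitive text comparison (assumed stand-in)."""
    def normalise(sql):
        return " ".join(sql.lower().split())
    return normalise(gold_sql) == normalise(pred_sql)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [("Ann", 20), ("Bob", 25), ("Cy", 30)])

gold = "SELECT name FROM student WHERE age > 20"
# Logically equivalent to gold, but written differently.
equivalent = "SELECT name FROM student WHERE NOT age <= 20"
# Not equivalent in general (an age of 22 would differ), yet it returns
# the same rows on this particular database.
coincidental = "SELECT name FROM student WHERE age >= 25"
```

On this toy database, text matching rejects `equivalent` even though it is correct (a false negative), while execution matching accepts `coincidental` even though it is a different query (a false positive). Test suites with more varied databases reduce the second kind of error but cannot rule it out.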
Performances of Seven Datasets in Presenting the Upper Ocean Heat Content in the South China Sea (Cited by: 2)
11
Authors: Chen Xiao (陈晓), Yan Youfang (严幼芳), Cheng Xuhua (程旭华), Qi Yiquan (齐义泉). Advances in Atmospheric Sciences (indexed: SCIE, CAS, CSCD), 2013, No. 5, pp. 1331-1342, 12 pages
In this study, the upper ocean heat content (OHC) variations in the South China Sea (SCS) during 1993-2006 were investigated by examining ocean temperatures in seven datasets, including World Ocean Atlas 2009 (WOA09) (climatology), Ishii datasets, Ocean General Circulation Model for the Earth Simulator (OFES), Simple Ocean Data Assimilation system (SODA), Global Ocean Data Assimilation System (GODAS), China Oceanic ReAnalysis system (CORA), and an ocean reanalysis dataset for the joining area of Asia and Indian-Pacific Ocean (AIPO1.0). Among these datasets, two were independent of any numerical model, four relied on data assimilation, and one was generated without any data assimilation. The annual cycles revealed by the seven datasets were similar, but the interannual variations were different. Vertical structures of temperatures along the 18°N, 12.75°N, and 120°E sections were compared with data collected during open cruises in 1998 and 2005-08. The results indicated that Ishii, OFES, CORA, and AIPO1.0 were more consistent with the observations. Through systematic comparisons, we found that each dataset had its own shortcomings and advantages in presenting the upper OHC in the SCS.
Keywords: South China Sea; ocean heat content; multiple datasets; interannual variability
The Assessment of Global Surface Temperature Change from 1850s: The C-LSAT2.0 Ensemble and the CMST-Interim Datasets (Cited by: 10)
12
Authors: Wenbin SUN, Qingxiang LI, Boyin HUANG, Jiayi CHENG, Zhaoyang SONG, Haiyan LI, Wenjie DONG, Panmao ZHAI, Phil JONES. Advances in Atmospheric Sciences (indexed: SCIE, CAS, CSCD), 2021, No. 5, pp. 875-888, 14 pages
Based on C-LSAT2.0, using high- and low-frequency component reconstruction methods combined with observation constraint masking, a reconstructed C-LSAT2.0 with 756 ensemble members from the 1850s to 2018 has been developed. These ensemble versions have been merged with the ERSSTv5 ensemble dataset, and an upgraded version of the CMST-Interim dataset with 5°×5° resolution has been developed. The CMST-Interim dataset has significantly improved the coverage rate of global surface temperature data. After reconstruction, the data coverage before 1950 increased from 78%−81% in the original CMST to 81%−89%. The total coverage after 1955 reached about 93%, including more than 98% in the Northern Hemisphere and 81%−89% in the Southern Hemisphere. The reconstruction ensemble experiments with different parameters provide a good basis for a more systematic uncertainty assessment of C-LSAT2.0 and CMST-Interim. In comparison with the original CMST, the global mean surface temperatures are estimated to be cooler in the second half of the 19th century and warmer during the 21st century, which shows that the global warming trend is further amplified. The global warming trends are updated from 0.085±0.004°C (10 yr)⁻¹ and 0.128±0.006°C (10 yr)⁻¹ to 0.089±0.004°C (10 yr)⁻¹ and 0.137±0.007°C (10 yr)⁻¹, respectively, since the start and the second half of the 20th century.
Keywords: C-LSAT2.0; ensemble datasets; CMST-Interim; EOTs; high- and low-frequency components; reconstruction
Intercomparison of the Extended Reconstructed Sea Surface Temperature v4 and v3b Datasets (Cited by: 1)
13
Authors: WANG Jinping, CHEN Xianyao. Journal of Ocean University of China (indexed: SCIE, CAS, CSCD), 2018, No. 2, pp. 209-218, 10 pages
Version 4 (v4) of the Extended Reconstructed Sea Surface Temperature (ERSST) dataset is compared with its predecessor, the widely used version 3b (v3b). The essential upgrades applied to v4 lead to remarkable differences in the characteristics of the sea surface temperature (SST) anomaly (SSTa) in both the temporal and spatial domains. First, the largest discrepancy in the global mean SSTa values, around the 1940s, is due to ship-observation corrections made to reconcile observations from buckets and engine intake thermometers. Second, differences in global and regional mean SSTa values between v4 and v3b exhibit a downward trend (around −0.032°C per decade) before the 1940s, an upward trend (around 0.014°C per decade) during the period of 1950–2015, and an interdecadal oscillation with one peak around the 1980s and two troughs during the 1960s and 2000s. This does not derive from treatments of the polar or other data-void regions, since the difference of the SSTa does not share the common features. Third, the spatial pattern of the ENSO-related variability of v4 exhibits a wider but weaker cold tongue in the tropical Pacific Ocean compared with that of v3b, which could be attributed to differences in gap-filling assumptions, since the latter features satellite observations whereas the former features in situ ones. This intercomparison confirms that the structural uncertainty arising from underlying assumptions on the treatment of diverse SST observations, even within the same SST product family, is the main source of significant SST differences in the temporal domain. Why this uncertainty introduces artificial decadal oscillations remains unknown.
Keywords: ERSST datasets; sea surface temperature; global warming; Arctic; data intercomparison
Effectiveness of predicting tunneling-induced ground settlements using machine learning methods with small datasets (Cited by: 9)
14
Authors: Linan Liu, Wendy Zhou, Marte Gutierrez. Journal of Rock Mechanics and Geotechnical Engineering (indexed: SCIE, CSCD), 2022, No. 4, pp. 1028-1041, 14 pages
Prediction of tunneling-induced ground settlements is an essential task, particularly for tunneling in urban settings. Ground settlements should be limited within a tolerable threshold to avoid damage to aboveground structures. Machine learning (ML) methods are becoming popular in many fields, including tunneling and underground excavations, as a powerful learning and predicting technique. However, the available datasets collected from a tunneling project are usually small from the perspective of applying ML methods. Can ML algorithms effectively predict tunneling-induced ground settlements when the available datasets are small? In this study, seven ML methods are utilized to predict tunneling-induced ground settlement using 14 contributing factors measured before or during tunnel excavation. These methods include multiple linear regression (MLR), decision tree (DT), random forest (RF), gradient boosting (GB), support vector regression (SVR), back-propagation neural network (BPNN), and permutation importance-based BPNN (PI-BPNN) models. All methods except BPNN and PI-BPNN are shallow-structure ML methods. The effectiveness of these seven ML approaches on small datasets is evaluated using model accuracy and stability. The model accuracy is measured by the coefficient of determination (R²) of training and testing datasets, and the stability of a learning algorithm indicates robust predictive performance. Also, the quantile error (QE) criterion is introduced to assess model predictive performance considering underpredictions and overpredictions. Our study reveals that the RF algorithm outperforms all the other models with the highest model prediction accuracy (0.9) and stability (3.02×10⁻²⁷). Deep-structure ML models do not perform well for small datasets, with relatively low model accuracy (0.59) and stability (5.76). The PI-BPNN architecture is proposed and designed for small datasets, showing better performance than a typical BPNN. Six important contributing factors of ground settlements are identified, including tunnel depth, the distance between tunnel face and surface monitoring points (DTM), weighted average soil compressibility modulus (ACM), grouting pressure, penetrating rate and thrust force.
Keywords: ground settlements; tunneling; machine learning; small dataset; model accuracy; model stability; feature importance
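The accuracy measure named in this abstract, the coefficient of determination R², follows a standard definition that can be sketched in a few lines. This shows only the textbook formula, not the paper's evaluation code; the function name is hypothetical, and the stability and quantile error (QE) criteria are not reproduced because the abstract does not define them.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives R² = 1, and always predicting the mean of the observations gives R² = 0. Computing the score separately on training and testing data, as the abstract describes, helps reveal overfitting, which is a particular risk when the dataset is small.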
Evaluating data-driven algorithms for predicting mechanical properties with small datasets: A case study on gear steel hardenability (Cited by: 3)
15
Authors: Bogdan Nenchev, Qing Tao, Zihui Dong, Chinnapat Panwisawas, Haiyang Li, Biao Tao, Hongbiao Dong. International Journal of Minerals, Metallurgy and Materials (indexed: SCIE, EI, CAS, CSCD), 2022, No. 4, pp. 836-847, 12 pages
Data-driven algorithms for predicting mechanical properties with small datasets are evaluated in a case study on gear steel hardenability. The limitations of current data-driven algorithms and empirical models are identified. Challenges in analysing small datasets are discussed, and a solution is proposed to handle small datasets with multiple variables. Gaussian methods in combination with novel predictive algorithms are utilized to overcome the challenges in analysing gear steel hardenability data and to gain insight into alloying element interactions and structure homogeneity. The fundamental knowledge gained, integrated with machine learning, is shown to be superior to the empirical equations in predicting hardenability. Metallurgical-property relationships between chemistry, sample size, and hardness are predicted via two optimized machine learning algorithms: neural networks (NNs) and extreme gradient boosting (XGBoost). A comparison is drawn between all algorithms, evaluating their performance on small datasets. The results reveal that XGBoost has the highest potential for predicting hardenability using small datasets with class imbalance and large inhomogeneity issues.
Keywords: machine learning; small dataset; XGBoost; hardenability; gear steel
COMPARISONS OF THE WEST PACIFIC SUBTROPICAL HIGH AND THE SOUTH ASIA HIGH BETWEEN NCEP/NCAR AND ECMWF REANALYSIS DATASETS (Cited by: 4)
16
Authors: Chen Wen (陈雯), Zhi Xiefei (智协飞). Journal of Tropical Meteorology (indexed: SCIE), 2008, No. 2, pp. 121-124, 4 pages
Comparisons of the west Pacific subtropical high with the South Asia High are made using the NCEP/NCAR and ECMWF 500 hPa and 100 hPa monthly boreal geopotential height fields for the period 1961-2000. Discrepancies are found for the time prior to 1980. The west Pacific subtropical high in the NCEP/NCAR data is less intense than in the ECMWF data before 1980. The range and strength of the west Pacific subtropical high variation described by the NCEP/NCAR data are larger than those depicted by the ECMWF data. The same situation appears in the 100-hPa geopotential field. These findings suggest that the interdecadal variation of the two systems as shown by the NCEP/NCAR data may not be real. In addition, the South Asia High center in the NCEP/NCAR data is obviously stronger than in the ECMWF data during the periods 1969, 1979-1991 and 1992-1995. Furthermore, the range is larger from 1992 to 1995.
Keywords: reanalysis datasets; west Pacific subtropical high; South Asia High; comparisons
COMPARISON OF SOME TROPICAL CYCLONE DATASETS AND CORRECTION OF YEARBOOK DATA (Cited by: 2)
17
Authors: Zou Yan (邹燕), Zhao Ping (赵平). Journal of Tropical Meteorology (indexed: SCIE), 2010, No. 2, pp. 109-114, 6 pages
This study compares three selected tropical cyclone datasets separately compiled by the CMA Shanghai Typhoon Institute (CMA SHI), the Joint Typhoon Warning Center (JTWC), and the Japan Meteorological Agency (JMA). The annual frequencies, observation times, and destructive power index of tropical cyclones over the western North Pacific are investigated as the characteristic quantities. The comparative study has resulted in the following findings: 1) Statistical gaps between the compared datasets narrow as the intensity of tropical cyclones increases. 2) In terms of interdecadal distribution, there is a relatively large gap between the datasets for the 1950s, a narrowed gap for the period from the mid-1970s to the 1980s, and a widened gap again for the mid and late 1990s. Additionally, an approach is proposed to correct the wind speed data in the TC Yearbook.
Keywords: tropical cyclones; datasets; comparison; comparative study
A Text-to-SQL Model Based on Semantically Enhanced Schema Linking (Cited by: 1)
18
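Authors: Wu Xianglan (吴相岚), Xiao Yang (肖洋), Liu Mengying (刘梦莹), Liu Mingming (刘明铭). Journal of Computer Applications (计算机应用) (indexed: CSCD, PKU Core), 2024, No. 9, pp. 2689-2695, 7 pages
To improve Text-to-SQL generation based on heterogeneous graph encoders, the SELSQL model is proposed. First, the model adopts an end-to-end learning framework and replaces the Euclidean distance metric with the Poincaré distance metric in hyperbolic space, thereby optimizing the semantically enhanced schema linking graph constructed from a pre-trained language model using probing techniques. Second, K-head weighted cosine similarity and a graph regularization method are used to learn a similarity metric graph, so that the initial schema linking graph is iteratively optimized during training. Finally, an improved relational graph attention network (RGAT) graph encoder and a multi-head attention mechanism encode the joint semantic schema linking graph of the two modules, and a grammar-based neural semantic decoder with a predefined structured language decodes the Structured Query Language (SQL) statements. Experimental results on the Spider dataset show that, with the ELECTRA-large pre-trained model, SELSQL improves accuracy by 2.5 percentage points over the best baseline model, with a large improvement in generating complex SQL statements.
Keywords: schema linking; graph structure learning; pre-trained language model; text-to-SQL; heterogeneous graph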
Variability and Long-Term Trend of Total Cloud Cover in China Derived from ISCCP, ERA-40, CRU3, and Ground Station Datasets (Cited by: 1)
19
Authors: ZONG Xue-Mei, WANG Pu-Cai, XIA Xiang-Ao. Atmospheric and Oceanic Science Letters (indexed: CSCD), 2013, No. 3, pp. 133-137, 5 pages
Total Cloud Cover (TCC) over China determined from four climate datasets, including the International Satellite Cloud Climatology Project (ISCCP), the 40-year Re-Analysis Project of the European Centre for Medium-Range Weather Forecasts (ERA-40), Climate Research Unit Time Series 3.0 (CRU3), and ground station datasets, is used to show the spatial and temporal variation of TCC and the differences among the datasets. It is demonstrated that the four datasets show similar spatial patterns and seasonal variation. The maximum value is derived from ISCCP. The TCC value in North China derived from ERA-40 is 50% larger than that from the station dataset; however, the value is 50% less than that in South China. The annual TCC of the ISCCP, ERA-40, and ground station datasets shows a decreasing trend during 1984-2002; however, an increasing trend is derived from CRU3. The results of this study imply remarkable differences in TCC derived from surface and satellite observations as well as model simulations. The potential effects of these differences on cloud climatology and associated climatic issues should be carefully considered.
Keywords: total cloud cover; ISCCP; ERA-40; CRU3; ground station dataset
Service Life Design for Concrete Engineering in Marine Environments of Northern China Based on a Modified Theoretical Model of Chloride Diffusion and Large Datasets of Ocean Parameters (Cited by: 1)
20
Authors: Taotao Feng, Hongfa Yu, Yongshan Tan, Haiyan Ma, Mei Xu, Chengjun Yue. Engineering (indexed: SCIE, EI, CAS), 2022, No. 10, pp. 123-139, 17 pages
In this study, through experimental research and an investigation of large datasets of durability parameters in ocean engineering, the values, ranges, and distribution types of the durability parameters employed for durability design in ocean engineering in northern China were confirmed. Based on a modified theoretical model of chloride diffusion and reliability theory, the service lives of concrete structures exposed to the splash, tidal, and underwater zones were calculated. Concrete mix proportions meeting the requirement of a service life of 100 or 120 years were designed, and a cover thickness requirement was proposed. In addition, the effects of different time-varying relationships of the boundary condition (Cs) and diffusion coefficient (Df) on the service life were compared; the results showed that the time-varying relationships used in this study (i.e., Cs continuously increased and then remained stable, and Df continuously decreased and then remained stable) were beneficial for the durability design of concrete structures in marine environments.
Keywords: large datasets; modified theoretical model; reliability theory; service life; boundary condition; diffusion coefficient