Abstract: Large Language Models (LLMs) have revolutionized Generative Artificial Intelligence (GenAI) tasks, becoming an integral part of applications across society, including text generation, translation, summarization, and more. However, their widespread use underscores the critical need to strengthen their security posture so as to ensure the integrity and reliability of their outputs and minimize harmful effects. Prompt injection and training data poisoning are two of the most prominent vulnerabilities in LLMs; they can lead to unpredictable and undesirable behaviors such as biased outputs, misinformation propagation, and even malicious content generation. The Common Vulnerability Scoring System (CVSS) framework provides a standardized approach to capturing the principal characteristics of vulnerabilities, facilitating a deeper understanding of their severity within the security and AI communities. By extending the current CVSS framework, we generate scores for these vulnerabilities so that organizations can prioritize mitigation efforts, allocate resources effectively, and implement targeted security measures against potential risks.
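As a rough illustration of how a CVSS v3.1-style base score is assembled from metric weights, the sketch below scores a hypothetical prompt-injection vector. The arithmetic follows the published v3.1 specification, but the metric choices (network attack vector, low complexity, no privileges, user interaction required, high confidentiality/integrity impact) are our assumptions, not the scores derived in the paper.

```python
import math

# CVSS v3.1 base-score arithmetic (per the FIRST.org specification).
# Metric weights below encode an assumed prompt-injection vector:
# AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:N -- illustrative only.
AV, AC, PR, UI = 0.85, 0.77, 0.85, 0.62   # Network, Low, None, Required
C, I, A = 0.56, 0.56, 0.0                 # High, High, None

def base_score(av, ac, pr, ui, c, i, a, scope_changed=False):
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    exploitability = 8.22 * av * ac * pr * ui
    raw = impact + exploitability
    if scope_changed:
        raw *= 1.08
    return math.ceil(min(raw, 10) * 10) / 10  # approximates the spec's Roundup

print(base_score(AV, AC, PR, UI, C, I, A))  # 8.1 for these assumed metrics
```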
Abstract: Multidatabase systems are designed to achieve schema integration and data interoperation among distributed and heterogeneous database systems, but data model heterogeneity and schema heterogeneity make this a challenging task. A multidatabase common data model based on XML, named the XML-based Integration Data Model (XIDM), is first introduced; it is suitable for integrating different types of schemas. An approach to schema mapping based on XIDM in multidatabase systems is then presented. The mappings include global mappings, which deal with horizontal and vertical partitioning between global schemas and export schemas, and local mappings, which handle the transformation between export schemas and local schemas. Finally, the illustration and implementation of these schema mappings in a multidatabase prototype, the Panorama system, are discussed. The implementation results demonstrate that XIDM is an efficient model for managing multiple heterogeneous data sources and that the XIDM-based schema mapping approach behaves well when integrating relational and object-oriented database systems as well as file systems.
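To make the partitioning vocabulary concrete, here is a toy sketch of how a global relation could be assembled from export-schema fragments; the semantics are assumed for illustration and this is not the Panorama implementation.

```python
# Horizontal partitioning: a global relation is the union of row fragments
# exported by different sites. Vertical partitioning: attribute fragments
# are rejoined on a shared key. All names here are hypothetical.
site_a = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]       # export schema A
site_b = [{"id": 3, "name": "Cho"}]                                  # export schema B
detail = {1: {"dept": "CS"}, 2: {"dept": "EE"}, 3: {"dept": "CS"}}   # vertical fragment

global_rows = site_a + site_b              # horizontal merge into the global schema
for row in global_rows:
    row.update(detail.get(row["id"], {}))  # vertical rejoin on the key "id"

print(global_rows)
```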
Funding: Supported by the National Natural Science Foundation of China (71131008 (Key Project) and 71271179).
Abstract: In this review, we highlight some recent methodological and theoretical developments in the estimation and testing of large panel data models with cross-sectional dependence. The paper begins with a discussion of issues raised by cross-sectional dependence and introduces the concepts of weak and strong cross-sectional dependence. The main attention is then paid to spatial and factor approaches for modeling cross-sectional dependence in both linear and nonlinear (nonparametric and semiparametric) panel data models. Finally, we conclude with some speculations on future research directions.
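For reference, the two canonical specifications contrasted in such reviews can be written as follows; these are standard textbook forms, not equations quoted from the paper.

```latex
% Spatial panel: dependence enters through a known weight matrix W = (w_{ij})
y_{it} = \rho \sum_{j=1}^{N} w_{ij}\, y_{jt} + x_{it}'\beta + \varepsilon_{it}

% Factor (interactive effects) panel: dependence enters through latent factors f_t
y_{it} = x_{it}'\beta + \lambda_i' f_t + \varepsilon_{it}
```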
Funding: Supported by the Space Core Technology Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (NRF-2014M1A3A3A02034789); by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2013R1A1A2A10004743); and by the Korea Meteorological Administration Research and Development Program under the Weather Information Service Engine (WISE) project, KMA-2012-0001-A.
Abstract: Towards a better understanding of hydrological interactions between the land surface and atmosphere, land surface models are routinely used to simulate hydro-meteorological fluxes. However, there is a lack of observations available for model forcing with which to estimate hydro-meteorological fluxes in East Asia. In this study, the Common Land Model (CLM) was run in offline mode during the summer monsoon period of 2006 in East Asia, with separate forcings from Asiaflux, the Korea Land Data Assimilation System (KLDAS), and the Global Land Data Assimilation System (GLDAS), at point and regional scales. The CLM results were compared with observations from Asiaflux sites. The estimated net radiation showed good agreement, with r = 0.99 at the point scale and 0.85 at the regional scale. The estimated sensible and latent heat fluxes using Asiaflux and KLDAS data showed reasonable agreement, with r = 0.70. The estimated soil moisture and soil temperature showed patterns similar to observations, although the water fluxes estimated with KLDAS showed larger discrepancies than those with Asiaflux because of scale mismatch. The spatial distribution of hydro-meteorological fluxes over East Asia according to CLM with KLDAS was compared with CLM driven by GLDAS and with the GLDAS product provided online; the distributions were analogous. These results indicate that KLDAS is a good potential source of high-spatial-resolution forcing data and a promising alternative product, capable of compensating for the lack of observations and the low-resolution grid data available for East Asia.
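The agreement statistics quoted above are Pearson correlations between simulated and observed series; a minimal sketch of the computation (with placeholder arrays, not the study's data) is:

```python
import numpy as np

sim = np.array([120.5, 98.2, 143.7, 110.0, 87.3])   # e.g., CLM latent heat flux (W/m^2)
obs = np.array([118.0, 101.5, 139.9, 114.2, 90.1])  # e.g., Asiaflux observations

r = np.corrcoef(sim, obs)[0, 1]  # Pearson correlation coefficient
print(f"r = {r:.2f}")
```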
Funding: Supported in part by the National Natural Science Foundation of China (No. 61901328); the China Postdoctoral Science Foundation (No. 2019M653558); the Fundamental Research Funds for the Central Universities (No. CJT150101); and the Key Project of the National Natural Science Foundation of China (No. 61631015).
Abstract: Cooperative spectrum monitoring with multiple sensors has been deemed an efficient mechanism for improving monitoring accuracy and enlarging the monitoring area in wireless sensor networks. However, there exists redundancy among the spectrum data collected by a sensor node within a data collection period, which may reduce the data uploading efficiency. In this paper, we investigate inter-data commonality detection, which describes how much two data items have in common. We first define the common segment set and divide it into six categories, and then develop a method to measure a common segment set by extracting the commonality between two files. Moreover, since existing algorithms fail to find a good common segment set, we propose the Common Data Measurement (CDM) algorithm, which identifies a good common segment set based on inter-data commonality detection. Theoretical analysis proves that the CDM algorithm achieves a good measurement of the commonality between two strings. In addition, we construct a synthetic dataset generated randomly. Numerical results show that the CDM algorithm measures the commonality between two binary files better than the Greedy-String-Tiling (GST) algorithm and a simple greedy algorithm.
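As a baseline for what measuring the commonality between two binary files involves, the sketch below scores two byte strings by the total size of their non-crossing matching segments; it is a simplified stand-in along the lines of the greedy baselines, not the paper's CDM algorithm.

```python
from difflib import SequenceMatcher

def commonality(a: bytes, b: bytes) -> float:
    """Fraction of content shared via non-crossing matching segments."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    shared = sum(block.size for block in matcher.get_matching_blocks())
    return 2.0 * shared / (len(a) + len(b))  # 1.0 = identical, 0.0 = disjoint

# Two spectrum records from the same sensor typically overlap heavily.
print(commonality(b"spectrum-scan-0001", b"spectrum-scan-0002"))
```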
Abstract: Existing parsing tools for standard-format weather radar base data lack generality and abstraction in their design, which makes parsing and processing radar data inconvenient. To address this problem, this paper designs and builds a data model for Chinese weather radar base data on top of Unidata's CDM (Common Data Model), providing access to standard-format weather radar base data at the data-model level. Building on Unidata's open-source NetCDF Java library and the IDV (Integrated Data Viewer) visualization software, a set of CDM-based tools for extracting and visually analyzing standard-format radar base data was developed. Taking a comparison of base reflectivity data in the old and new formats from the Guangzhou radar as an example, this study demonstrates the application of these results to evaluating Doppler weather radar standard-format base data. The results show that this work makes standard-format radar base data easier to use and promotes its operational application. The results can also be applied in operational and research work involving radar base data processing and analysis, providing basic support for the application of radar data.
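The tools described above are built on Unidata's NetCDF Java library; the same CDM-style access pattern can be sketched in Python with the netCDF4 package. The file and variable names below are assumptions for illustration.

```python
from netCDF4 import Dataset  # Python analogue; the paper itself uses NetCDF Java

# Open a radar base-data file exposed through the CDM and read a
# reflectivity variable; "reflectivity" is a hypothetical variable name.
with Dataset("Z_RADR_GUANGZHOU_sample.nc") as nc:
    refl = nc.variables["reflectivity"]
    print(refl.dimensions, refl.shape)
    sweep0 = refl[0, :, :]  # first elevation sweep, assuming (sweep, ray, gate) dims
```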
Funding: Supported by the National Natural Science Foundation of China (Nos. 72204169 and 81825007); the Beijing Outstanding Young Scientist Program (No. BJJWZYJH01201910025030); the Capital's Funds for Health Improvement and Research (No. 2022-2-2045); the National Key R&D Program of China (Nos. 2022YFF1501500, 2022YFF1501501, 2022YFF1501502, 2022YFF1501503, 2022YFF1501504, and 2022YFF1501505); the Youth Beijing Scholar Program (No. 010); the Beijing Laboratory of Oral Health (No. PXM2021_014226_000041); the Beijing Talent Project-Class A: Innovation and Development (No. 2018A12); the National Ten-Thousand Talent Plan, Leadership of Scientific and Technological Innovation; and the National Key R&D Program of China (Nos. 2017YFC1307900 and 2017YFC1307905).
Abstract: Differences among the imaging subgroups of cerebral small vessel disease (CSVD) need to be further explored. First, we use propensity score matching to obtain balanced datasets. Random forest (RF) is then adopted to classify the subgroups, compared with support vector machine (SVM) and extreme gradient boosting (XGBoost), and to select features. The top 10 most important features are included in a stepwise logistic regression, and odds ratios (OR) with 95% confidence intervals (CI) are obtained. There are 41,290 adult inpatient records with a CSVD diagnosis. The accuracy and area under the curve (AUC) of RF are close to 0.7, the best classification performance among RF, SVM, and XGBoost. The OR and 95% CI of hematocrit for white matter lesions (WMLs), lacunes, microbleeds, atrophy, and enlarged perivascular space (EPVS) are 0.9875 (0.9857-0.9893), 0.9728 (0.9705-0.9752), 0.9782 (0.9740-0.9824), 1.0093 (1.0081-1.0106), and 0.9716 (0.9597-0.9832), respectively. The OR and 95% CI of red cell distribution width for WMLs, lacunes, atrophy, and EPVS are 0.9600 (0.9538-0.9662), 0.9630 (0.9559-0.9702), 1.0751 (1.0686-1.0817), and 0.9304 (0.8864-0.9755). The OR and 95% CI of platelet distribution width for WMLs, lacunes, and microbleeds are 1.1796 (1.1636-1.1958), 1.1663 (1.1476-1.1853), and 1.0416 (1.0152-1.0687). This study proposes a new analytical framework for selecting important clinical markers for CSVD with machine learning based on a common data model, which has low cost, fast speed, large sample size, and continuous data sources.
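A sketch of the pipeline described above: rank features with a random forest, then fit a logistic regression on the top-ranked markers and exponentiate the coefficients to obtain odds ratios with confidence intervals. File, column, and outcome names are assumptions, and the stepwise selection step is omitted.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("csvd_matched.csv")           # hypothetical matched dataset
X, y = df.drop(columns="wml"), df["wml"]       # outcome: white matter lesions (0/1)

# Rank features by random-forest importance and keep the top 10.
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
top10 = X.columns[np.argsort(rf.feature_importances_)[::-1][:10]]

# Logistic regression on the top markers; exp(coef) gives odds ratios.
logit = sm.Logit(y, sm.add_constant(X[top10])).fit(disp=0)
odds = np.exp(pd.concat([logit.params, logit.conf_int()], axis=1))
odds.columns = ["OR", "2.5%", "97.5%"]         # OR with 95% CI per marker
print(odds)
```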
Abstract: We compare the Hubble diagram calculated from the observed redshift (RS)/magnitude (μ) data of 280 supernovae in the RS range z = 0.0104 to 8.1 with Hubble diagrams inferred on the basis of the exponential tired light model and the Lambda Cold Dark Matter (ΛCDM) cosmological model. We show that the experimentally measured Hubble diagram clearly follows the exponential photon flight time (tS)/RS relation, whereas the data calculated on the basis of the ΛCDM model exhibit poor agreement with the observed data.
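For orientation, the exponential tired light relation referenced above is usually written by assuming photon energy decays exponentially with flight time, which ties redshift to flight time logarithmically; this is the standard formulation, not an equation quoted from the paper.

```latex
E(t_S) = E_0\, e^{-t_S/\tau}
\quad\Rightarrow\quad
1 + z = \frac{E_0}{E(t_S)} = e^{t_S/\tau}
\quad\Rightarrow\quad
t_S = \tau \ln(1 + z)
```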
Abstract: Based on an analysis of 280 Type Ia supernova (SNIa) and gamma-ray burst redshifts in the range z = 0.0104 - 8.1, the Hubble diagram is shown to follow a strictly exponential slope, predicting an exponentially expanding or static universe. At redshifts above 2 - 3, ΛCDM models show poor agreement with the observed data. Based on the results presented in this paper, the Hubble diagram test does not necessarily support the idea of expansion according to the big-bang concordance model.
Abstract: A large amount of data is present on the web and can be put to useful purposes such as product recommendation, price comparison, and demand forecasting for a particular product. Websites are designed for human understanding, not for machines; therefore, making data machine-readable requires techniques for extracting data from web pages. Researchers have addressed the problem using two approaches: knowledge engineering and machine learning. State-of-the-art knowledge engineering approaches use the structure of documents, visual cues, clustering of the attributes of data records, and text processing techniques to identify data records on a web page. Machine learning approaches use annotated pages to learn rules, which are then used to extract data from unseen web pages. The structure of web documents is continuously evolving, so new techniques are needed to handle the emerging requirements of web data extraction. In this paper, we present a novel, simple, and efficient technique to extract data from web pages using visual styles and the structure of documents. The proposed technique detects the Rich Data Region (RDR) using a query and the query's correlative words. The RDR is then divided into data records using style similarity, and noisy elements are removed using a Common Tag Sequence (CTS) and formatting entropy. The system is implemented in Java and evaluated on a dataset of real-world working websites. The effectiveness of the results is measured using precision, recall, and F-measure and compared with five existing systems; the comparison shows encouraging results.
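The formatting-entropy filter can be pictured as follows: a region wrapped in a uniform tag sequence has low entropy (repetitive record formatting), while a heterogeneous noisy block has high entropy. The exact definition used in the paper is not reproduced here; this sketch uses plain Shannon entropy over a region's tag distribution.

```python
from collections import Counter
from math import log2

def tag_entropy(tags):
    """Shannon entropy (bits) of a region's HTML tag distribution."""
    counts, total = Counter(tags), len(tags)
    return -sum((c / total) * log2(c / total) for c in counts.values())

records = ["div", "div", "div", "div"]           # uniform record wrappers
noise = ["span", "a", "script", "img", "b"]      # heterogeneous noisy block
print(tag_entropy(records), tag_entropy(noise))  # 0.0 vs. ~2.32
```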