Funding: We acknowledge funding from NSFC Grant 62306283.
Abstract: Since the 1950s, when the Turing Test was introduced, there has been notable progress in machine language intelligence. Language modeling, crucial for AI development, has evolved from statistical to neural models over the last two decades. Recently, transformer-based pre-trained language models (PLMs) have excelled in natural language processing (NLP) tasks by leveraging large-scale training corpora. Increasing the scale of these models enhances performance significantly and introduces abilities, such as in-context learning, that smaller models lack. The advancement of large language models, exemplified by the development of ChatGPT, has had a significant impact on both academia and industry and has captured widespread societal interest. This survey provides an overview of the development and prospects from large language models (LLMs) to large multimodal models (LMMs). It first discusses the contributions and technological advancements of LLMs in natural language processing, especially in text generation and language understanding. It then turns to LMMs, which integrate data modalities such as text, images, and sound, demonstrate advanced capabilities in understanding and generating cross-modal content, and open new pathways for the adaptability and flexibility of AI systems. Finally, the survey highlights the prospects of LMMs in terms of technological development and application potential, while also pointing out challenges in data integration and cross-modal understanding accuracy, providing a comprehensive perspective on the latest developments in this field.
Abstract: Students are considered one of the groups most affected by psychological problems. Given the highly dangerous nature of mental illnesses and the increasingly serious state of global mental health, it is imperative to explore new methods and approaches for the prevention and treatment of mental illnesses. Large multimodal models (LMMs), as the most advanced artificial intelligence models (e.g., ChatGPT-4), have brought new hope to the accurate prevention, diagnosis, and treatment of psychiatric disorders. The assistance of these models is critical because promoting mental health requires a strong foundation of medical knowledge and professional skills, emotional support, stigma mitigation, the encouragement of more honest patient self-disclosure, reduced health care costs, improved medical efficiency, and greater mental health service coverage. However, these models must simultaneously address challenges related to health, safety, hallucinations, and ethics. In the future, we should address these challenges by developing relevant usage manuals, accountability rules, and legal regulations; implementing a human-centered approach; and intelligently upgrading LMMs through the deep optimization of such models, their algorithms, and other means. This effort will contribute substantially not only to the maintenance of students' health but also to the achievement of global sustainable development goals.
Abstract: User identity linkage (UIL) refers to identifying user accounts that belong to the same identity across different social media platforms. Most current research is based on text analysis, which fails to fully exploit the rich image resources generated by users. Existing attempts touch on the multimodal domain but still face the challenge of semantic differences between text and images. Given this, we investigate the UIL task across different social media platforms based on multimodal user-generated content (UGC). We introduce the efficient user identity linkage via aligned multi-modal features and temporal correlation (EUIL) approach. The method first generates captions for user-posted images with the BLIP model, alleviating the problem of missing textual information. Subsequently, we extract aligned text and image features with the CLIP model, which closely aligns the two modalities and significantly reduces the semantic gap. We then construct a set of adapter modules to integrate the multimodal features. Furthermore, we design a temporal weight assignment mechanism to incorporate the temporal dimension of user behavior. We evaluate the proposed scheme on the real-world social dataset TWIN, and the results show that our method reaches 86.39% accuracy, which demonstrates its strength in handling multimodal data and provides strong algorithmic support for UIL.
Funding: Supported by the National Natural Science Foundation of China (72088101, 42372175) and the PetroChina Science and Technology Innovation Fund Program (2021DQ02-0904).
Abstract: This article elucidates the concept of large model technology, summarizes the research status of large model technology both domestically and internationally, provides an overview of the application status of large models in vertical industries, outlines the challenges and issues confronted in applying large models in the oil and gas sector, and offers prospects for the application of large models in the oil and gas industry. Existing large models can be broadly divided into three categories: large language models, visual large models, and multimodal large models. The application of large models in the oil and gas industry is still in its infancy. Based on open-source large language models, some oil and gas enterprises have released large language model products using methods such as fine-tuning and retrieval-augmented generation. Scholars have attempted to develop scenario-specific models for oil and gas operations by using visual/multimodal foundation models. A few researchers have constructed pre-trained foundation models for seismic data processing and interpretation, as well as core analysis. The application of large models in the oil and gas industry faces challenges: the quantity and quality of current data can hardly support the training of large models, research and development costs are high, and algorithm autonomy and control are poor. The application of large models should be guided by the needs of the oil and gas business and treated as an opportunity to improve data lifecycle management, enhance data governance capabilities, promote the construction of computing power, strengthen the construction of "artificial intelligence + energy" composite teams, and boost the autonomy and control of large model technology.
Funding: Financial support provided by the National Natural Science Foundation of China (No. 51274202); the Fundamental Research Funds for the Central Universities (No. 2013RC11); the Science and Technology Achievements Transformation Project of Jiangsu Province (No. BA2012068); the Natural Science Foundation of Jiangsu Province (Nos. BK20130199 and BK20131124); the Ceeusro Prospective Joint Research Project of Jiangsu Province (No. BY2014028-01); and the Great Cultivating Special Project at China University of Mining and Technology (No. 2014ZDPY16).
Abstract: To address the multipath fading of electromagnetic waves in wireless communication in confined areas, a cooperative communication system for rectangular tunnels was established based on the multimode channel model, and the channel capacity formula was derived. Taking channel capacity as the optimization criterion, power allocation methods were proposed for both amplify-and-forward (AF) and decode-and-forward (DF) cooperative communication systems to maximize the channel capacity under a total power constraint. Mode selection methods for the single-input single-output (SISO) and single-input multiple-output (SIMO) models in the rectangular tunnel, through which a higher channel capacity can be obtained, were put forward as well. The theoretical analysis and simulation comparison show that the channel capacity of the wireless communication system in the rectangular tunnel can be effectively enhanced through cooperative technology; that the channel capacity of the rectangular tunnel under complicated conditions is maximized through the proposed power allocation methods; and that the cooperative mode with the best channel capacity can be chosen according to the cooperative mode selection methods given in the paper.
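The idea of allocating a fixed total power between source and relay to maximize capacity can be illustrated with a minimal sketch. It assumes the textbook three-node half-duplex AF capacity expression and a plain grid search; it does not reproduce the tunnel-specific multimode channel formula or the allocation method of the paper, and the names `af_capacity`, `best_split`, and the gain parameters are hypothetical.

```python
import math


def af_capacity(p_source, p_relay, g_sd, g_sr, g_rd, noise=1.0):
    """Capacity (bit/s/Hz) of a half-duplex amplify-and-forward link.

    Assumption: generic three-node AF expression, not the paper's
    multimode tunnel model. g_sd, g_sr, g_rd are channel power gains of
    the source-destination, source-relay and relay-destination links.
    """
    snr_sd = p_source * g_sd / noise
    snr_sr = p_source * g_sr / noise
    snr_rd = p_relay * g_rd / noise
    # Equivalent SNR of the two-hop relayed path.
    relayed = snr_sr * snr_rd / (snr_sr + snr_rd + 1.0)
    # Factor 0.5: the transmission occupies two time slots.
    return 0.5 * math.log2(1.0 + snr_sd + relayed)


def best_split(total_power, g_sd, g_sr, g_rd, steps=1000):
    """Grid-search the source share of the total power that maximizes capacity."""
    best_fraction, best_capacity = 0.0, -1.0
    for i in range(1, steps):
        fraction = i / steps
        capacity = af_capacity(
            fraction * total_power,
            (1.0 - fraction) * total_power,
            g_sd, g_sr, g_rd,
        )
        if capacity > best_capacity:
            best_fraction, best_capacity = fraction, capacity
    return best_fraction, best_capacity


if __name__ == "__main__":
    fraction, capacity = best_split(total_power=10.0, g_sd=0.1, g_sr=1.0, g_rd=1.0)
    print(f"source share: {fraction:.3f}, capacity: {capacity:.3f} bit/s/Hz")
```

A grid search is used here only because it is easy to verify; a closed-form or convex-optimization solution would normally replace it once the actual capacity formula is known.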
Funding: Supported by the National Natural Science Foundation of China (61374140) and the Shanghai Pujiang Program (12PJ1402200).
Abstract: A novel approach named aligned mixture probabilistic principal component analysis (AMPPCA) is proposed in this study for fault detection in multimode chemical processes. To exploit within-mode correlations, the AMPPCA algorithm first estimates a statistical description for each operating mode by applying mixture probabilistic principal component analysis (MPPCA). For comparison, the combined MPPCA is employed, in which monitoring results are softly integrated according to the posterior probabilities of the test sample in each local model. To exploit the cross-mode correlations, which may be useful but are inadvertently neglected when the modes are monitored separately, a global monitoring model is constructed by aligning all local models together. In this way, both within-mode and cross-mode correlations are preserved in the integrated space. Finally, the utility and feasibility of AMPPCA are demonstrated through a non-isothermal continuous stirred tank reactor and the Tennessee Eastman (TE) benchmark process.
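The soft integration of local monitoring results by posterior probability, mentioned above for the combined MPPCA baseline, can be sketched as follows. This is a simplified illustration that assumes each operating mode is a one-dimensional Gaussian and uses the squared standardized distance as the local statistic; the actual MPPCA models and statistics are multivariate, and the function name `soft_combined_statistic` is hypothetical.

```python
import math


def soft_combined_statistic(x, modes):
    """Posterior-weighted sum of per-mode monitoring statistics.

    modes: list of (prior, mean, variance) tuples, one per operating mode.
    Assumption: scalar Gaussian modes stand in for the local PPCA models.
    """
    # Log of prior * Gaussian density for each mode.
    log_likelihoods = [
        math.log(prior)
        - 0.5 * math.log(2.0 * math.pi * var)
        - 0.5 * (x - mean) ** 2 / var
        for prior, mean, var in modes
    ]
    # Normalize in the log domain so samples far from every mode
    # do not underflow to a zero denominator.
    top = max(log_likelihoods)
    weights = [math.exp(value - top) for value in log_likelihoods]
    total = sum(weights)
    posteriors = [w / total for w in weights]

    # Local statistic: squared standardized distance to each mode.
    local_stats = [(x - mean) ** 2 / var for _, mean, var in modes]
    return sum(p * s for p, s in zip(posteriors, local_stats))


if __name__ == "__main__":
    modes = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]
    for sample in (0.0, 5.0, 100.0):
        print(sample, soft_combined_statistic(sample, modes))
```

A sample close to one mode is judged almost entirely by that mode's statistic, while a sample between modes receives a blended value; a control limit on the combined statistic would then flag faults.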
Funding: Supported by the National Natural Science Foundation of China (61273167).
Abstract: Complex processes often work in multiple operation regions, so it is critical to develop effective monitoring approaches to ensure the safety of chemical processes. In this work, a discriminant local consistency Gaussian mixture model (DLCGMM) is proposed for multimode process monitoring by integrating LCGMM with modified local Fisher discriminant analysis (MLFDA). Unlike Fisher discriminant analysis (FDA), which aims to discover the globally optimal discriminant directions, MLFDA is capable of uncovering the multimodality and local structure of the data by exploiting the posterior probabilities of observations within clusters calculated from the results of LCGMM. This may enable MLFDA to capture more meaningful discriminant information hidden in the high-dimensional multimode observations than FDA. In contrast to most existing multimode process monitoring approaches, DLCGMM performs LCGMM and MLFDA iteratively, and the optimal subspaces with multi-Gaussianity and the optimal discriminant projection vectors are obtained simultaneously in a framework of supervised and unsupervised learning. Furthermore, monitoring statistics are established on each cluster, which represents a specific operating condition, and two global Bayesian inference-based fault monitoring indexes are established by combining the monitoring results of all clusters. The efficiency and effectiveness of the proposed method are evaluated on UCI datasets, a simulated multimode model, and the Tennessee Eastman benchmark process.
Abstract: Cross-modal image-text retrieval is a fundamental task in bridging vision and language. It faces two main challenges that are typically not well addressed in previous works. 1) Generalizability: Existing methods often assume a strong semantic correlation between each text-image pair and are thus difficult to generalize to real-world scenarios where weak correlation dominates. 2) Efficiency: Many recent works adopt a single-tower architecture with heavy detectors, which is inefficient during inference because the costly computation needs to be repeated for each text-image pair. In this work, to overcome these two challenges, we propose a two-tower cross-modal contrastive learning (CMCL) framework. Specifically, we first devise a two-tower architecture, which provides a unified feature space in which the text and image modalities can be directly compared with each other, alleviating the heavy computation during inference. We further introduce a simple yet effective module named multi-grid split (MGS) to learn fine-grained image features without using detectors. Last but not least, we deploy a cross-modal contrastive loss on the global image/text features to learn their weak correlation and thus achieve high generalizability. To validate that our CMCL can be readily generalized to real-world scenarios, we construct a large multi-source image-text dataset called the weak semantic correlation dataset (WSCD). Extensive experiments show that our CMCL outperforms state-of-the-art methods while being much more efficient.
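A cross-modal contrastive loss over global image and text features, as described above, can be sketched in a few lines. The abstract does not give the exact loss, so this sketch assumes the common symmetric InfoNCE form with a temperature of 0.07 and nonzero embeddings; the function names are hypothetical and the embeddings would in practice come from the two towers.

```python
import math


def l2_normalize(vec):
    """Scale a nonzero vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def diagonal_cross_entropy(logits):
    """Mean negative log-softmax of the diagonal (matching) entry of each row."""
    loss = 0.0
    for i, row in enumerate(logits):
        top = max(row)  # subtract the row maximum for numerical stability
        log_norm = top + math.log(sum(math.exp(v - top) for v in row))
        loss += log_norm - row[i]
    return loss / len(logits)


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss; pair i of each list is the positive pair.

    Assumption: an illustrative stand-in for the paper's loss, not its
    exact formulation.
    """
    images = [l2_normalize(e) for e in image_embs]
    texts = [l2_normalize(e) for e in text_embs]
    # Two-tower similarity: a dot product of independently computed embeddings.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    print(contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]]))  # matched pairs
    print(contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]]))  # mismatched pairs
```

Because each tower encodes its input once, retrieval reduces to the dot products in `logits`, which is the efficiency argument made for the two-tower design.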
Funding: Part of the Deutsche Forschungsgemeinschaft Joint Research Project FOR-1066.
Abstract: Numerical investigations of the Darmstadt transonic single-stage compressor (DTC) in the Rotor1-Stator1 configuration are presented, aimed at advancing the understanding of the effect of different rotor tip gaps and of transition modelling on the blade surfaces. Steady three-dimensional Reynolds-averaged Navier-Stokes (RANS) simulations were performed with the RANS solver TRACE to obtain the flow fields for the different configurations at different operating conditions. The stage geometry and the multi-block structured grid were generated with G3DMESH, and a grid sensitivity analysis was conducted. For the clearance gap region, a fully gridded special H-grid was chosen. Comparisons were made between the flow characteristics at design speed, representative of a transonic flow regime, and at 65% speed, representative of a subsonic flow regime. The computations were used to analyse the flow phenomena through the tip clearance region for the different configurations and their impact on the performance of the compressor stage.