期刊文献+
共找到2,703篇文章
< 1 2 136 >
每页显示 20 50 100
Terrorism Attack Classification Using Machine Learning: The Effectiveness of Using Textual Features Extracted from GTD Dataset
1
作者 Mohammed Abdalsalam Chunlin Li +1 位作者 Abdelghani Dahou Natalia Kryvinska 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第2期1427-1467,共41页
One of the biggest dangers to society today is terrorism, where attacks have become one of the most significantrisks to international peace and national security. Big data, information analysis, and artificial intelli... One of the biggest dangers to society today is terrorism, where attacks have become one of the most significantrisks to international peace and national security. Big data, information analysis, and artificial intelligence (AI) havebecome the basis for making strategic decisions in many sensitive areas, such as fraud detection, risk management,medical diagnosis, and counter-terrorism. However, there is still a need to assess how terrorist attacks are related,initiated, and detected. For this purpose, we propose a novel framework for classifying and predicting terroristattacks. The proposed framework posits that neglected text attributes included in the Global Terrorism Database(GTD) can influence the accuracy of the model’s classification of terrorist attacks, where each part of the datacan provide vital information to enrich the ability of classifier learning. Each data point in a multiclass taxonomyhas one or more tags attached to it, referred as “related tags.” We applied machine learning classifiers to classifyterrorist attack incidents obtained from the GTD. A transformer-based technique called DistilBERT extracts andlearns contextual features from text attributes to acquiremore information from text data. The extracted contextualfeatures are combined with the “key features” of the dataset and used to perform the final classification. Thestudy explored different experimental setups with various classifiers to evaluate the model’s performance. Theexperimental results show that the proposed framework outperforms the latest techniques for classifying terroristattacks with an accuracy of 98.7% using a combined feature set and extreme gradient boosting classifier. 展开更多
关键词 Artificial intelligence machine learning natural language processing data analytic DistilBERT feature extraction terrorism classification GTD dataset
下载PDF
The accessible seismological dataset of a high-density 2D seismic array along Anninghe fault
2
作者 Weifan Lu Zeyan Zhao +3 位作者 Han Yue Shiyong Zhou Jianping Wu Xiaodong Song 《Earthquake Science》 2024年第1期67-77,共11页
The scientific goal of the Anninghe seismic array is to investigate the detailed geometry of the Anninghe fault and the velocity structure of the fault zone.This 2D seismic array is composed of 161 stations forming su... The scientific goal of the Anninghe seismic array is to investigate the detailed geometry of the Anninghe fault and the velocity structure of the fault zone.This 2D seismic array is composed of 161 stations forming sub-rectangular geometry along the Anninghe fault,which covers 50 km and 150 km in the fault normal and strike directions,respectively,with~5 km intervals.The data were collected between June 2020 and June 2021,with some level of temporal gaps.Two types of instruments,i.e.QS-05A and SmartSolo,are used in this array.Data quality and examples of seismograms are provided in this paper.After the data protection period ends(expected in June 2024),researchers can request a dataset from the National Earthquake Science Data Center. 展开更多
关键词 Anninghe fault seismological dataset data share
下载PDF
CREDIT-X1local:A reference dataset for machine learning seismology from ChinArray in Southwest China
3
作者 Lu Li Weitao Wang +1 位作者 Ziye Yu Yini Chen 《Earthquake Science》 2024年第2期139-157,共19页
High-quality datasets are critical for the development of advanced machine-learning algorithms in seismology.Here,we present an earthquake dataset based on the ChinArray Phase I records(X1).ChinArray Phase I was deplo... High-quality datasets are critical for the development of advanced machine-learning algorithms in seismology.Here,we present an earthquake dataset based on the ChinArray Phase I records(X1).ChinArray Phase I was deployed in the southern north-south seismic zone(20°N-32°N,95°E-110°E)in 2011-2013 using 355 portable broadband seismic stations.CREDIT-X1local,the first release of the ChinArray Reference Earthquake Dataset for Innovative Techniques(CREDIT),includes comprehensive information for the 105,455 local events that occurred in the southern north-south seismic zone during array observation,incorporating them into a single HDF5 file.Original 100-Hz sampled three-component waveforms are organized by event for stations within epicenter distances of 1,000 km,and records of≥200 s are included for each waveform.Two types of phase labels are provided.The first includes manually picked labels for 5,999 events with magnitudes≥2.0,providing 66,507 Pg,42,310 Sg,12,823 Pn,and 546 Sn phases.The second contains automatically labeled phases for 105,442 events with magnitudes of−1.6 to 7.6.These phases were picked using a recurrent neural network phase picker and screened using the corresponding travel time curves,resulting in 1,179,808 Pg,884,281 Sg,176,089 Pn,and 22,986 Sn phases.Additionally,first-motion polarities are included for 31,273 Pg phases.The event and station locations are provided,so that deep learning networks for both conventional phase picking and phase association can be trained and validated.The CREDIT-X1local dataset is the first million-scale dataset constructed from a dense seismic array,which is designed to support various multi-station deep-learning methods,high-precision focal mechanism inversion,and seismic tomography studies.Additionally,owing to the high seismicity in the southern north-south seismic zone in China,this dataset has great potential for future scientific discoveries. 展开更多
关键词 earthquake dataset machine learning Pg/Sg/Pn/Sn phase picking P-wave first-motion polarity
下载PDF
Optimizing Enterprise Conversational AI: Accelerating Response Accuracy with Custom Dataset Fine-Tuning
4
作者 Yash Kishore 《Intelligent Information Management》 2024年第2期65-76,共12页
As the realm of enterprise-level conversational AI continues to evolve, it becomes evident that while generalized Large Language Models (LLMs) like GPT-3.5 bring remarkable capabilities, they also bring forth formidab... As the realm of enterprise-level conversational AI continues to evolve, it becomes evident that while generalized Large Language Models (LLMs) like GPT-3.5 bring remarkable capabilities, they also bring forth formidable challenges. These models, honed on vast and diverse datasets, have undoubtedly pushed the boundaries of natural language understanding and generation. However, they often stumble when faced with the intricate demands of nuanced enterprise applications. This research advocates for a strategic paradigm shift, urging enterprises to embrace a fine-tuning approach as a means to optimize conversational AI. While generalized LLMs are linguistic marvels, their inability to cater to the specific needs of businesses across various industries poses a critical challenge. This strategic shift involves empowering enterprises to seamlessly integrate their own datasets into LLMs, a process that extends beyond linguistic enhancement. The core concept of this approach centers on customization, enabling businesses to fine-tune the AI’s functionality to fit precisely within their unique business landscapes. By immersing the LLM in industry-specific documents, customer interaction records, internal reports, and regulatory guidelines, the AI transcends its generic capabilities to become a sophisticated conversational partner aligned with the intricacies of the enterprise’s domain. The transformative potential of this fine-tuning approach cannot be overstated. It enables a transition from a universal AI solution to a highly customizable tool. The AI evolves from being a linguistic powerhouse to a contextually aware, industry-savvy assistant. As a result, it not only responds with linguistic accuracy but also with depth, relevance, and resonance, significantly elevating user experiences and operational efficiency. In the subsequent sections, this paper delves into the intricacies of fine-tuning, exploring the multifaceted challenges and abundant opportunities it presents. It addresses the technical intricacies of data integration, ethical considerations surrounding data usage, and the broader implications for the future of enterprise AI. The journey embarked upon in this research holds the potential to redefine the role of conversational AI in enterprises, ushering in an era where AI becomes a dynamic, deeply relevant, and highly effective tool, empowering businesses to excel in an ever-evolving digital landscape. 展开更多
关键词 Fine-Tuning dataset AI CONVERSATIONAL ENTERPRISE LLM
下载PDF
Performance Analysis of Support Vector Machine (SVM) on Challenging Datasets for Forest Fire Detection
5
作者 Ankan Kar Nirjhar Nath +1 位作者 Utpalraj Kemprai   Aman 《International Journal of Communications, Network and System Sciences》 2024年第2期11-29,共19页
This article delves into the analysis of performance and utilization of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to... This article delves into the analysis of performance and utilization of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to ecosystems and human settlements, the need for rapid and accurate detection systems is of utmost importance. SVMs, renowned for their strong classification capabilities, exhibit proficiency in recognizing patterns associated with fire within images. By training on labeled data, SVMs acquire the ability to identify distinctive attributes associated with fire, such as flames, smoke, or alterations in the visual characteristics of the forest area. The document thoroughly examines the use of SVMs, covering crucial elements like data preprocessing, feature extraction, and model training. It rigorously evaluates parameters such as accuracy, efficiency, and practical applicability. The knowledge gained from this study aids in the development of efficient forest fire detection systems, enabling prompt responses and improving disaster management. Moreover, the correlation between SVM accuracy and the difficulties presented by high-dimensional datasets is carefully investigated, demonstrated through a revealing case study. The relationship between accuracy scores and the different resolutions used for resizing the training datasets has also been discussed in this article. These comprehensive studies result in a definitive overview of the difficulties faced and the potential sectors requiring further improvement and focus. 展开更多
关键词 Support Vector Machine Challenging datasets Forest Fire Detection CLASSIFICATION
下载PDF
Deep Learning Recognition for Arabic Alphabet Sign Language RGB Dataset
6
作者 Rabie El Kharoua Xiaoming Jiang 《Journal of Computer and Communications》 2024年第3期32-51,共20页
This paper introduces a Convolutional Neural Network (CNN) model for Arabic Sign Language (AASL) recognition, using the AASL dataset. Recognizing the fundamental importance of communication for the hearing-impaired, e... This paper introduces a Convolutional Neural Network (CNN) model for Arabic Sign Language (AASL) recognition, using the AASL dataset. Recognizing the fundamental importance of communication for the hearing-impaired, especially within the Arabic-speaking deaf community, the study emphasizes the critical role of sign language recognition systems. The proposed methodology achieves outstanding accuracy, with the CNN model reaching 99.9% accuracy on the training set and a validation accuracy of 97.4%. This study not only establishes a high-accuracy AASL recognition model but also provides insights into effective dropout strategies. The achieved high accuracy rates position the proposed model as a significant advancement in the field, holding promise for improved communication accessibility for the Arabic-speaking deaf community. 展开更多
关键词 Convolutional Neural Network (CNN) AASL dataset DROPOUT Deep Learning Communication Technology
下载PDF
M3SC:A Generic Dataset for Mixed Multi-Modal(MMM)Sensing and Communication Integration 被引量:3
7
作者 Xiang Cheng Ziwei Huang +6 位作者 Lu Bai Haotian Zhang Mingran Sun Boxun Liu Sijiang Li Jianan Zhang Minson Lee 《China Communications》 SCIE CSCD 2023年第11期13-29,共17页
The sixth generation(6G)of mobile communication system is witnessing a new paradigm shift,i.e.,integrated sensing-communication system.A comprehensive dataset is a prerequisite for 6G integrated sensing-communication ... The sixth generation(6G)of mobile communication system is witnessing a new paradigm shift,i.e.,integrated sensing-communication system.A comprehensive dataset is a prerequisite for 6G integrated sensing-communication research.This paper develops a novel simulation dataset,named M3SC,for mixed multi-modal(MMM)sensing-communication integration,and the generation framework of the M3SC dataset is further given.To obtain multimodal sensory data in physical space and communication data in electromagnetic space,we utilize Air-Sim and WaveFarer to collect multi-modal sensory data and exploit Wireless InSite to collect communication data.Furthermore,the in-depth integration and precise alignment of AirSim,WaveFarer,andWireless InSite are achieved.The M3SC dataset covers various weather conditions,multiplex frequency bands,and different times of the day.Currently,the M3SC dataset contains 1500 snapshots,including 80 RGB images,160 depth maps,80 LiDAR point clouds,256 sets of mmWave waveforms with 8 radar point clouds,and 72 channel impulse response(CIR)matrices per snapshot,thus totaling 120,000 RGB images,240,000 depth maps,120,000 LiDAR point clouds,384,000 sets of mmWave waveforms with 12,000 radar point clouds,and 108,000 CIR matrices.The data processing result presents the multi-modal sensory information and communication channel statistical properties.Finally,the MMM sensing-communication application,which can be supported by the M3SC dataset,is discussed. 展开更多
关键词 multi-modal sensing RAY-TRACING sensing-communication integration simulation dataset
下载PDF
Pavement Cracks Coupled With Shadows:A New Shadow-Crack Dataset and A Shadow-Removal-Oriented Crack Detection Approach 被引量:1
8
作者 Lili Fan Shen Li +3 位作者 Ying Li Bai Li Dongpu Cao Fei-Yue Wang 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD 2023年第7期1593-1607,共15页
Automatic pavement crack detection is a critical task for maintaining the pavement stability and driving safety.The task is challenging because the shadows on the pavement may have similar intensity with the crack,whi... Automatic pavement crack detection is a critical task for maintaining the pavement stability and driving safety.The task is challenging because the shadows on the pavement may have similar intensity with the crack,which interfere with the crack detection performance.Till to the present,there still lacks efficient algorithm models and training datasets to deal with the interference brought by the shadows.To fill in the gap,we made several contributions as follows.First,we proposed a new pavement shadow and crack dataset,which contains a variety of shadow and pavement pixel size combinations.It also covers all common cracks(linear cracks and network cracks),placing higher demands on crack detection methods.Second,we designed a two-step shadow-removal-oriented crack detection approach:SROCD,which improves the performance of the algorithm by first removing the shadow and then detecting it.In addition to shadows,the method can cope with other noise disturbances.Third,we explored the mechanism of how shadows affect crack detection.Based on this mechanism,we propose a data augmentation method based on the difference in brightness values,which can adapt to brightness changes caused by seasonal and weather changes.Finally,we introduced a residual feature augmentation algorithm to detect small cracks that can predict sudden disasters,and the algorithm improves the performance of the model overall.We compare our method with the state-of-the-art methods on existing pavement crack datasets and the shadow-crack dataset,and the experimental results demonstrate the superiority of our method. 展开更多
关键词 Automatic pavement crack detection data augmentation compensation deep learning residual feature augmentation shadow removal shadow-crack dataset
下载PDF
DiTing:A large-scale Chinese seismic benchmark dataset for artificial intelligence in seismology 被引量:1
9
作者 Ming Zhao Zhuowei Xiao +1 位作者 Shi Chen Lihua Fang 《Earthquake Science》 2023年第2期84-94,共11页
In recent years,artificial intelligence technology has exhibited great potential in seismic signal recognition,setting off a new wave of research.Vast amounts of high-quality labeled data are required to develop and a... In recent years,artificial intelligence technology has exhibited great potential in seismic signal recognition,setting off a new wave of research.Vast amounts of high-quality labeled data are required to develop and apply artificial intelligence in seismology research.In this study,based on the 2013–2020 seismic cataloging reports of the China Earthquake Networks Center,we constructed an artificial intelligence seismological training dataset(“DiTing”)with the largest known total time length.Data were recorded using broadband and short-period seismometers.The obtained dataset included 2,734,748 threecomponent waveform traces from 787,010 regional seismic events,the corresponding P-and S-phase arrival time labels,and 641,025 P-wave first-motion polarity labels.All waveforms were sampled at 50 Hz and cut to a time length of 180 s starting from a random number of seconds before the occurrence of an earthquake.Each three-component waveform contained a considerable amount of descriptive information,such as the epicentral distance,back azimuth,and signal-to-noise ratios.The magnitudes of seismic events,epicentral distance,signal-to-noise ratio of P-wave data,and signal-to-noise ratio of S-wave data ranged from 0 to 7.7,0 to 330 km,–0.05 to 5.31 dB,and–0.05 to 4.73 dB,respectively.The dataset compiled in this study can serve as a high-quality benchmark for machine learning model development and data-driven seismological research on earthquake detection,seismic phase picking,first-motion polarity determination,earthquake magnitude prediction,early warning systems,and strong ground-motion prediction.Such research will further promote the development and application of artificial intelligence in seismology. 展开更多
关键词 artificial intelligence benchmark dataset earthquake detection seismic phase identification first-motion polarity
下载PDF
AWSD:An Aircraft Wing Dataset Created by an Automatic Workflow for Data Mining in Geometric Processing
10
作者 Xiang Su Nan Li +1 位作者 Yuedi Hu Haisheng Li 《Computer Modeling in Engineering & Sciences》 SCIE EI 2023年第9期2935-2952,共18页
This paper introduces an aircraft wing simulation data set(AWSD)created by an automatic workflow based on creating models,meshing.simulating the wing flight flow field solution,and parameterizing solution results.AWSD... This paper introduces an aircraft wing simulation data set(AWSD)created by an automatic workflow based on creating models,meshing.simulating the wing flight flow field solution,and parameterizing solution results.AWSD is a flexible,independent wing collection of simulations with specific engineering requirements.The data set is applicable to handle computer geometry processing tasks.In contrast to the existing 3D model data set,there are some advantages the scale of this data set is not limited by the collection source,the data files have high quality,no defects,redundancy,and other problems,and the models and simulation are all designed for the specific actual engineering demand.Moreover,AWSD has the characteristics of rich information and a similar model structure,which contributes to the construction of the surrogate model.On the other hand,this data set is suitable for advancing research of data mining in computational geometry graphics.To solve the problem that the CFD flows field results are not intuitive,this paper used the resampling method of surface data to sample the result to the model surface,then segmented the re sampled 3D mesh surface,and compared with the diferences among K-means algorithm,Mini_Batch K means algorithm,and Spectral Clustering algorithm.AWSD provides 300 sets of models,meshes,CFD simulation results,and parametric results based on ARAP(As-Rigid-As Posible)and Harmonic mapping for advancing the construction of engineering surrogate models,3D mesh segmentation,surface resampling,and related geometric processing tasks. 展开更多
关键词 dataset CFD mesh PARAMETERIZATION parametric modeling
下载PDF
MBB-IoT:Construction and Evaluation of IoT DDoS Traffic Dataset from a New Perspective
11
作者 Yi Qing Xiangyu Liu Yanhui Du 《Computers, Materials & Continua》 SCIE EI 2023年第8期2095-2119,共25页
Distributed Denial of Service(DDoS)attacks have always been a major concern in the security field.With the release of malware source codes such as BASHLITE and Mirai,Internet of Things(IoT)devices have become the new ... Distributed Denial of Service(DDoS)attacks have always been a major concern in the security field.With the release of malware source codes such as BASHLITE and Mirai,Internet of Things(IoT)devices have become the new source of DDoS attacks against many Internet applications.Although there are many datasets in the field of IoT intrusion detection,such as Bot-IoT,ConstrainedApplication Protocol–Denial of Service(CoAPDoS),and LATAM-DDoS-IoT(some of the names of DDoS datasets),which mainly focus on DDoS attacks,the datasets describing new IoT DDoS attack scenarios are extremely rare,and only N-BaIoT and IoT-23 datasets used IoT devices as DDoS attackers in the construction process,while they did not use Internet applications as victims either.To supplement the description of the new trend of DDoS attacks in the dataset,we built an IoT environment with mainstream DDoS attack tools such as Mirai and BASHLITE being used to infect IoT devices and implement DDoS attacks against WEB servers.Then,data aggregated into a dataset namedMBB-IoTwere captured atWEBservers and IoT nodes.After the MBB-IoT dataset was split into a training set and a test set,it was applied to the training and testing of the Random Forests classification algorithm.The multi-class classification metrics were good and all above 90%.Secondly,in a cross-evaluation experiment based on Support Vector Machine(SVM),Light Gradient Boosting Machine(LightGBM),and Long Short Term Memory networks(LSTM)classification algorithms,the training set and test set were derived from different datasets(MBB-IoT or IoT-23),and the test performance is better when MBB-IoT is used as the training set. 展开更多
关键词 Intrusion detection IOT MALWARE BOTNET DDOS dataset
下载PDF
COVID-19 Classification from X-Ray Images:An Approach to Implement Federated Learning on Decentralized Dataset
12
作者 Ali Akbar Siddique S.M.Umar Talha +3 位作者 M.Aamir Abeer D.Algarni Naglaa F.Soliman Walid El-Shafai 《Computers, Materials & Continua》 SCIE EI 2023年第5期3883-3901,共19页
The COVID-19 pandemic has devastated our daily lives,leaving horrific repercussions in its aftermath.Due to its rapid spread,it was quite difficult for medical personnel to diagnose it in such a big quantity.Patients ... The COVID-19 pandemic has devastated our daily lives,leaving horrific repercussions in its aftermath.Due to its rapid spread,it was quite difficult for medical personnel to diagnose it in such a big quantity.Patients who test positive for Covid-19 are diagnosed via a nasal PCR test.In comparison,polymerase chain reaction(PCR)findings take a few hours to a few days.The PCR test is expensive,although the government may bear expenses in certain places.Furthermore,subsets of the population resist invasive testing like swabs.Therefore,chest X-rays or Computerized Vomography(CT)scans are preferred in most cases,and more importantly,they are non-invasive,inexpensive,and provide a faster response time.Recent advances in Artificial Intelligence(AI),in combination with state-of-the-art methods,have allowed for the diagnosis of COVID-19 using chest x-rays.This article proposes a method for classifying COVID-19 as positive or negative on a decentralized dataset that is based on the Federated learning scheme.In order to build a progressive global COVID-19 classification model,two edge devices are employed to train the model on their respective localized dataset,and a 3-layered custom Convolutional Neural Network(CNN)model is used in the process of training the model,which can be deployed from the server.These two edge devices then communicate their learned parameter and weight to the server,where it aggregates and updates the globalmodel.The proposed model is trained using an image dataset that can be found on Kaggle.There are more than 13,000 X-ray images in Kaggle Database collection,from that collection 9000 images of Normal and COVID-19 positive images are used.Each edge node possesses a different number of images;edge node 1 has 3200 images,while edge node 2 has 5800.There is no association between the datasets of the various nodes that are included in the network.By doing it in this manner,each of the nodes will have access to a separate image collection that has no correlation with each other.The diagnosis of COVID-19 has become considerably more efficient with the installation of the suggested algorithm and dataset,and the findings that we have obtained are quite encouraging. 展开更多
关键词 Artificial intelligence deep learning federated learning COVID-19 decentralized image dataset
下载PDF
Intrusion Detection System with Customized Machine Learning Techniques for NSL-KDD Dataset
13
作者 Mohammed Zakariah Salman A.AlQahtani +1 位作者 Abdulaziz M.Alawwad Abdullilah A.Alotaibi 《Computers, Materials & Continua》 SCIE EI 2023年第12期4025-4054,共30页
Modern networks are at risk from a variety of threats as a result of the enormous growth in internet-based traffic.By consuming time and resources,intrusive traffic hampers the efficient operation of network infrastru... Modern networks are at risk from a variety of threats as a result of the enormous growth in internet-based traffic.By consuming time and resources,intrusive traffic hampers the efficient operation of network infrastructure.An effective strategy for preventing,detecting,and mitigating intrusion incidents will increase productivity.A crucial element of secure network traffic is Intrusion Detection System(IDS).An IDS system may be host-based or network-based to monitor intrusive network activity.Finding unusual internet traffic has become a severe security risk for intelligent devices.These systems are negatively impacted by several attacks,which are slowing computation.In addition,networked communication anomalies and breaches must be detected using Machine Learning(ML).This paper uses the NSL-KDD data set to propose a novel IDS based on Artificial Neural Networks(ANNs).As a result,the ML model generalizes sufficiently to perform well on untried data.The NSL-KDD dataset shall be utilized for both training and testing.In this paper,we present a custom ANN model architecture using the Keras open-source software package.The specific arrangement of nodes and layers,along with the activation functions,enhances the model’s ability to capture intricate patterns in network data.The performance of the ANN is carefully tested and evaluated,resulting in the identification of a maximum detection accuracy of 97.5%.We thoroughly compared our suggested model to industry-recognized benchmark methods,such as decision classifier combinations and ML classifiers like k-Nearest Neighbors(KNN),Deep Learning(DL),Support Vector Machine(SVM),Long Short-Term Memory(LSTM),Deep Neural Network(DNN),and ANN.It is encouraging to see that our model consistently outperformed each of these tried-and-true techniques in all evaluations.This result underlines the effectiveness of the suggested methodology by demonstrating the ANN’s capacity to accurately assess the effectiveness of the developed strategy in identifying and categorizing instances of network intrusion. 展开更多
关键词 Artificial neural networks intrusion detection system CLASSIFICATION NSL-KDD dataset machine and deep-learning neural network
下载PDF
An Imbalanced Dataset and Class Overlapping Classification Model for Big Data
14
作者 Mini Prince P.M.Joe Prathap 《Computer Systems Science & Engineering》 SCIE EI 2023年第2期1009-1024,共16页
Most modern technologies,such as social media,smart cities,and the internet of things(IoT),rely on big data.When big data is used in the real-world applications,two data challenges such as class overlap and class imba... Most modern technologies,such as social media,smart cities,and the internet of things(IoT),rely on big data.When big data is used in the real-world applications,two data challenges such as class overlap and class imbalance arises.When dealing with large datasets,most traditional classifiers are stuck in the local optimum problem.As a result,it’s necessary to look into new methods for dealing with large data collections.Several solutions have been proposed for overcoming this issue.The rapid growth of the available data threatens to limit the usefulness of many traditional methods.Methods such as oversampling and undersampling have shown great promises in addressing the issues of class imbalance.Among all of these techniques,Synthetic Minority Oversampling TechniquE(SMOTE)has produced the best results by generating synthetic samples for the minority class in creating a balanced dataset.The issue is that their practical applicability is restricted to problems involving tens of thousands or lower instances of each.In this paper,we have proposed a parallel mode method using SMOTE and MapReduce strategy,this distributes the operation of the algorithm among a group of computational nodes for addressing the aforementioned problem.Our proposed solution has been divided into three stages.Thefirst stage involves the process of splitting the data into different blocks using a mapping function,followed by a pre-processing step for each mapping block that employs a hybrid SMOTE algo-rithm for solving the class imbalanced problem.On each map block,a decision tree model would be constructed.Finally,the decision tree blocks would be com-bined for creating a classification model.We have used numerous datasets with up to 4 million instances in our experiments for testing the proposed scheme’s cap-abilities.As a result,the Hybrid SMOTE appears to have good scalability within the framework proposed,and it also cuts down the processing time. 展开更多
关键词 Imbalanced dataset class overlapping SMOTE MAPREDUCE parallel programming OVERSAMPLING
下载PDF
Active Learning Strategies for Textual Dataset-Automatic Labelling
15
作者 Sher Muhammad Daudpota Saif Hassan +2 位作者 Yazeed Alkhurayyif Abdullah Saleh Alqahtani Muhammad Haris Aziz 《Computers, Materials & Continua》 SCIE EI 2023年第8期1409-1422,共14页
The Internet revolution has resulted in abundant data from various sources,including social media,traditional media,etcetera.Although the availability of data is no longer an issue,data labelling for exploiting it in ... The Internet revolution has resulted in abundant data from various sources,including social media,traditional media,etcetera.Although the availability of data is no longer an issue,data labelling for exploiting it in supervised machine learning is still an expensive process and involves tedious human efforts.The overall purpose of this study is to propose a strategy to automatically label the unlabeled textual data with the support of active learning in combination with deep learning.More specifically,this study assesses the performance of different active learning strategies in automatic labelling of the textual dataset at sentence and document levels.To achieve this objective,different experiments have been performed on the publicly available dataset.In first set of experiments,we randomly choose a subset of instances from training dataset and train a deep neural network to assess performance on test set.In the second set of experiments,we replace the random selection with different active learning strategies to choose a subset of the training dataset to train the same model and reassess its performance on test set.The experimental results suggest that different active learning strategies yield performance improvement of 7% on document level datasets and 3%on sentence level datasets for auto labelling. 展开更多
关键词 Active learning automatic labelling textual datasets
下载PDF
Dataset of Comparative Observations for Land Surface Processes over the Semi-Arid Alpine Grassland against Alpine Lakes in the Source Region of the Yellow River
16
作者 Xianhong MENG Shihua LYU +13 位作者 Zhaoguo LI Yinhuan AO Lijuan WEN Lunyu SHANG Shaoying WANG Mingshan DENG Shaobo ZHANG Lin ZHAO Hao CHEN Di MA Suosuo LI Lele SHU Yingying AN Hanlin NIU 《Advances in Atmospheric Sciences》 SCIE CAS CSCD 2023年第6期1142-1157,共16页
Thousands of lakes on the Tibetan Plateau(TP) play a critical role in the regional water cycle, weather, and climate. In recent years, the areas of TP lakes underwent drastic changes and have become a research hotspot... Thousands of lakes on the Tibetan Plateau(TP) play a critical role in the regional water cycle, weather, and climate. In recent years, the areas of TP lakes underwent drastic changes and have become a research hotspot. However, the characteristics of the lake-atmosphere interaction over the high-altitude lakes are still unclear, which inhibits model development and the accurate simulation of lake climate effects. The source region of the Yellow River(SRYR) has the largest outflow lake and freshwater lake on the TP and is one of the most densely distributed lakes on the TP. Since 2011,three observation sites have been set up in the Ngoring Lake basin in the SRYR to monitor the lake-atmosphere interaction and the differences among water-heat exchanges over the land and lake surfaces. This study presents an eight-year(2012–19), half-hourly, observation-based dataset related to lake–atmosphere interactions composed of three sites. The three sites represent the lake surface, the lakeside, and the land. The observations contain the basic meteorological elements,surface radiation, eddy covariance system, soil temperature, and moisture(for land). Information related to the sites and instruments, the continuity and completeness of data, and the differences among the observational results at different sites are described in this study. These data have been used in the previous study to reveal a few energy and water exchange characteristics of TP lakes and to validate and improve the lake and land surface model. The dataset is available at National Cryosphere Desert Data Center and Science Data Bank. 展开更多
关键词 field observation dataset lake-atmosphere interaction energy and water exchanges the source region of the Yellow River Tibetan Plateau
下载PDF
A Comprehensive Analysis of Datasets for Automotive Intrusion Detection Systems
17
作者 Seyoung Lee Wonsuk Choi +2 位作者 InsupKim Ganggyu Lee Dong Hoon Lee 《Computers, Materials & Continua》 SCIE EI 2023年第9期3413-3442,共30页
Recently,automotive intrusion detection systems(IDSs)have emerged as promising defense approaches to counter attacks on in-vehicle networks(IVNs).However,the effectiveness of IDSs relies heavily on the quality of the ... Recently,automotive intrusion detection systems(IDSs)have emerged as promising defense approaches to counter attacks on in-vehicle networks(IVNs).However,the effectiveness of IDSs relies heavily on the quality of the datasets used for training and evaluation.Despite the availability of several datasets for automotive IDSs,there has been a lack of comprehensive analysis focusing on assessing these datasets.This paper aims to address the need for dataset assessment in the context of automotive IDSs.It proposes qualitative and quantitative metrics that are independent of specific automotive IDSs,to evaluate the quality of datasets.These metrics take into consideration various aspects such as dataset description,collection environment,and attack complexity.This paper evaluates eight commonly used datasets for automotive IDSs using the proposed metrics.The evaluation reveals biases in the datasets,particularly in terms of limited contexts and lack of diversity.Additionally,it highlights that the attacks in the datasets were mostly injected without considering normal behaviors,which poses challenges for training and evaluating machine learning-based IDSs.This paper emphasizes the importance of addressing the identified limitations in existing datasets to improve the performance and adaptability of automotive IDSs.The proposed metrics can serve as valuable guidelines for researchers and practitioners in selecting and constructing high-quality datasets for automotive security applications.Finally,this paper presents the requirements for high-quality datasets,including the need for representativeness,diversity,and balance. 展开更多
关键词 Controller area network(CAN) intrusion detection system(IDS) automotive security machine learning(ML) dataset
下载PDF
Dataset of Large Gathering Images for Person Identification and Tracking
18
作者 Adnan Nadeem Amir Mehmood +7 位作者 Kashif Rizwan Muhammad Ashraf Nauman Qadeer Ali Alzahrani Qammer H.Abbasi Fazal Noor Majed Alhaisoni Nadeem Mahmood 《Computers, Materials & Continua》 SCIE EI 2023年第3期6065-6080,共16页
This paper presents a large gathering dataset of images extracted from publicly filmed videos by 24 cameras installed on the premises of Masjid Al-Nabvi,Madinah,Saudi Arabia.This dataset consists of raw and processed ... This paper presents a large gathering dataset of images extracted from publicly filmed videos by 24 cameras installed on the premises of Masjid Al-Nabvi,Madinah,Saudi Arabia.This dataset consists of raw and processed images reflecting a highly challenging and unconstraint environment.The methodology for building the dataset consists of four core phases;that include acquisition of videos,extraction of frames,localization of face regions,and cropping and resizing of detected face regions.The raw images in the dataset consist of a total of 4613 frames obtained fromvideo sequences.The processed images in the dataset consist of the face regions of 250 persons extracted from raw data images to ensure the authenticity of the presented data.The dataset further consists of 8 images corresponding to each of the 250 subjects(persons)for a total of 2000 images.It portrays a highly unconstrained and challenging environment with human faces of varying sizes and pixel quality(resolution).Since the face regions in video sequences are severely degraded due to various unavoidable factors,it can be used as a benchmark to test and evaluate face detection and recognition algorithms for research purposes.We have also gathered and displayed records of the presence of subjects who appear in presented frames;in a temporal context.This can also be used as a temporal benchmark for tracking,finding persons,activity monitoring,and crowd counting in large crowd scenarios. 展开更多
关键词 Large crowd gatherings a dataset of large crowd images highly uncontrolled environment tracking missing persons face recognition activity monitoring
下载PDF
Empirical Analysis of Neural Networks-Based Models for Phishing Website Classification Using Diverse Datasets
19
作者 Shoaib Khan Bilal Khan +2 位作者 Saifullah Jan Subhan Ullah Aiman 《Journal of Cyber Security》 2023年第1期47-66,共20页
Phishing attacks pose a significant security threat by masquerading as trustworthy entities to steal sensitive information,a problem that persists despite user awareness.This study addresses the pressing issue of phis... Phishing attacks pose a significant security threat by masquerading as trustworthy entities to steal sensitive information,a problem that persists despite user awareness.This study addresses the pressing issue of phishing attacks on websites and assesses the performance of three prominent Machine Learning(ML)models—Artificial Neural Networks(ANN),Convolutional Neural Networks(CNN),and Long Short-Term Memory(LSTM)—utilizing authentic datasets sourced from Kaggle and Mendeley repositories.Extensive experimentation and analysis reveal that the CNN model achieves a better accuracy of 98%.On the other hand,LSTM shows the lowest accuracy of 96%.These findings underscore the potential of ML techniques in enhancing phishing detection systems and bolstering cybersecurity measures against evolving phishing tactics,offering a promising avenue for safeguarding sensitive information and online security. 展开更多
关键词 Artificial neural networks phishing websites network security machine learning phishing datasets CLASSIFICATION
下载PDF
Credit Card Fraud Detection on Original European Credit Card Holder Dataset Using Ensemble Machine Learning Technique
20
作者 Yih Bing Chu Zhi Min Lim +3 位作者 Bryan Keane Ping Hao Kong Ahmed Rafat Elkilany Osama Hisham Abusetta 《Journal of Cyber Security》 2023年第1期33-46,共14页
The proliferation of digital payment methods facilitated by various online platforms and applications has led to a surge in financial fraud,particularly in credit card transactions.Advanced technologies such as machin... The proliferation of digital payment methods facilitated by various online platforms and applications has led to a surge in financial fraud,particularly in credit card transactions.Advanced technologies such as machine learning have been widely employed to enhance the early detection and prevention of losses arising frompotentially fraudulent activities.However,a prevalent approach in existing literature involves the use of extensive data sampling and feature selection algorithms as a precursor to subsequent investigations.While sampling techniques can significantly reduce computational time,the resulting dataset relies on generated data and the accuracy of the pre-processing machine learning models employed.Such datasets often lack true representativeness of realworld data,potentially introducing secondary issues that affect the precision of the results.For instance,undersampling may result in the loss of critical information,while over-sampling can lead to overfitting machine learning models.In this paper,we proposed a classification study of credit card fraud using fundamental machine learning models without the application of any sampling techniques on all the features present in the original dataset.The results indicate that Support Vector Machine(SVM)consistently achieves classification performance exceeding 90%across various evaluation metrics.This discovery serves as a valuable reference for future research,encouraging comparative studies on original dataset without the reliance on sampling techniques.Furthermore,we explore hybrid machine learning techniques,such as ensemble learning constructed based on SVM,K-Nearest Neighbor(KNN)and decision tree,highlighting their potential advancements in the field.The study demonstrates that the proposed machine learning models yield promising results,suggesting that pre-processing the dataset with sampling algorithm or additional machine learning technique may not always be necessary.This research contributes to the field of credit card fraud detection by emphasizing the potential of employing machine learning models directly on original datasets,thereby simplifying the workflow and potentially improving the accuracy and efficiency of fraud detection systems. 展开更多
关键词 Machine learning credit card fraud ensemble learning non-sampled dataset hybrid AI models European credit card holder
下载PDF
上一页 1 2 136 下一页 到第
使用帮助 返回顶部