The proliferation of maliciously coded documents, driven by the growth in file transfers, has led to a rise in sophisticated attacks. Portable Document Format (PDF) files have emerged as a major attack vector for malware because of their adaptability and wide usage. Detecting malware in PDF files is challenging because the format can carry various harmful elements, such as embedded scripts, exploits, and malicious URLs. This paper presents a comparative analysis of machine learning (ML) techniques for PDF malware detection, including Naive Bayes (NB), K-Nearest Neighbor (KNN), Average One Dependency Estimator (A1DE), Random Forest (RF), and Support Vector Machine (SVM). The study uses a dataset obtained from the Canadian Institute for Cybersecurity and applies two testing criteria, namely percentage splitting and 10-fold cross-validation. The performance of the techniques is evaluated using F1-score, precision, recall, and accuracy. The results indicate that KNN outperforms the other models, achieving an accuracy of 99.8599% under 10-fold cross-validation. The findings highlight the effectiveness of ML models in accurately detecting PDF malware and provide insights for developing robust systems to protect against malicious activities.
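The evaluation protocol named in this abstract (10-fold cross-validation scored by accuracy, precision, recall, and F1) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the two-feature synthetic points merely stand in for PDF features, and the function names, the choice of k = 3 neighbors, and the pooling of predictions across folds are all hypothetical choices made for the example.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def k_fold_indices(n, folds=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into `folds` disjoint test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::folds] for i in range(folds)]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def cross_validate(X, y, folds=10, k=3, seed=0):
    """Hold out each fold once, train on the rest, pool all predictions."""
    y_true, y_pred = [], []
    for test_idx in k_fold_indices(len(X), folds, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in test_idx:
            y_true.append(y[i])
            y_pred.append(knn_predict(train_X, train_y, X[i], k))
    return binary_metrics(y_true, y_pred)


# Synthetic stand-in data (assumption): label 0 = benign, 1 = malicious.
rng = random.Random(42)
benign = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
malicious = [[rng.gauss(8, 1), rng.gauss(8, 1)] for _ in range(50)]
X = benign + malicious
y = [0] * 50 + [1] * 50

scores = cross_validate(X, y)
print(scores)
```

Every sample is used for testing exactly once, which is what distinguishes this protocol from a single percentage split. The scores on this toy data say nothing about the results reported in the abstract.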
Movies are a popular source of entertainment. Every year, a large number of movies are released, and people comment on them in the form of reviews after watching them. Since it is difficult to read all of the reviews for a movie, summarizing them helps viewers decide whether to watch it without spending time on every review. Opinion mining, also known as sentiment analysis, is the process of extracting subjective information from textual data. It involves identifying and extracting the opinions of individuals, which can be positive, neutral, or negative, and is performed here to understand people's emotions and attitudes in movie reviews. Movie reviews are an important source of opinion data because they provide insight into the general public's opinions about a particular movie, and a summary of all reviews can give a general idea about the movie. This study compares baseline techniques (Logistic Regression, Random Forest Classifier, Decision Tree, K-Nearest Neighbor, Gradient Boosting Classifier, and Passive Aggressive Classifier) with Linear Support Vector Machines and Multinomial Naïve Bayes on the IMDB Dataset of 50K reviews and the Sentiment Polarity Dataset Version 2.0. Before the classifiers are applied, both datasets are cleaned in pre-processing: duplicate data is dropped and chat words are treated for better results. On the IMDB Dataset of 50K reviews, Linear Support Vector Machines achieve the highest accuracy, 89.48%, before hyperparameter tuning, and the Passive Aggressive Classifier achieves the highest accuracy, 90.27%, after tuning. On the Sentiment Polarity Dataset Version 2.0, Multinomial Naïve Bayes achieves the highest accuracy, 70.69% before and 71.04% after hyperparameter tuning. This study highlights the importance of sentiment analysis as a tool for understanding the emotions and attitudes in movie reviews and predicts the performance of a movie based on the average sentiment of all the reviews.
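The steps this abstract describes (dropping duplicates, treating chat words, classifying reviews, and averaging the predicted sentiment per movie) can be sketched with a small Multinomial Naive Bayes classifier written in plain Python. This is a minimal sketch under assumptions, not the study's implementation: the `CHAT_WORDS` table, the toy reviews, the tokenizer, and the use of add-one (Laplace) smoothing are all hypothetical choices for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical chat-word table; the study's actual mapping is not given.
CHAT_WORDS = {"gr8": "great", "luv": "love"}


def tokenize(text):
    """Lowercase, split into word tokens, and expand chat words."""
    tokens = []
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        tokens.extend(CHAT_WORDS.get(word, word).split())
    return tokens


def preprocess(reviews):
    """Tokenize labeled reviews and drop exact duplicates."""
    seen, cleaned = set(), []
    for text, label in reviews:
        tokens = tokenize(text)
        key = " ".join(tokens)
        if key not in seen:
            seen.add(key)
            cleaned.append((tokens, label))
    return cleaned


class MultinomialNB:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs):
        self.class_counts = Counter(label for _, label in docs)
        self.word_counts = defaultdict(Counter)
        for tokens, label in docs:
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for w in tokens:
                if w in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[label][w] + 1
                    score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy labeled reviews (assumption): 1 = positive, 0 = negative.
train = [
    ("A gr8 film, I luv the cast", 1),
    ("A gr8 film, I luv the cast", 1),  # duplicate, dropped in preprocessing
    ("Wonderful story and great acting", 1),
    ("Boring plot and terrible acting", 0),
    ("I hate this dull, terrible film", 0),
]
docs = preprocess(train)
model = MultinomialNB().fit(docs)

# Unlabeled reviews for one hypothetical movie.
new_reviews = ["great cast, wonderful story",
               "terrible and boring",
               "luv it, gr8 acting"]
predictions = [model.predict(tokenize(r)) for r in new_reviews]
average_sentiment = sum(predictions) / len(predictions)
print(predictions, average_sentiment)
```

Here the average sentiment is simply the share of reviews predicted positive, one plausible reading of "average sentiment of all the reviews"; the abstract does not specify how the average is computed.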