Journal Articles
41,824 articles found
1. Recent Advances in Video Coding for Machines Standard and Technologies
Authors: ZHANG Qiang, MEI Junjun, GUAN Tao, SUN Zhewen, ZHANG Zixiang, YU Li. ZTE Communications, 2024, No. 1, pp. 62-76.
To improve the performance of video compression for machine vision analysis tasks, a video coding for machines (VCM) standard working group was established to promote standardization procedures. In this paper, recent advances in the VCM standard are presented, with comprehensive introductions to the use cases, requirements, evaluation frameworks, and corresponding metrics of the standard. The existing methods are then surveyed, covering the existing proposals by category and the research progress of the latest VCM conference. Finally, conclusions are given.
Keywords: video coding for machines; VCM; video compression
2. A Personalized Video Synopsis Framework for Spherical Surveillance Video
Authors: S. Priyadharshini, Ansuman Mahapatra. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 6, pp. 2603-2616.
Video synopsis is an effective way to summarize long surveillance recordings. An omnidirectional view lets the observer select a desired field of view (FoV) from the many FoVs available in a spherical surveillance video, but by watching one portion the observer misses events occurring elsewhere in the spherical scene, producing a fear of missing out (FOMO). A novel personalized video synopsis approach that generates non-spherical videos is therefore introduced to address this issue. It also includes an action recognition module that prioritizes and displays the necessary actions. The framework jointly optimizes several objectives: activity loss, collision, temporal consistency, and length costs are minimized, while show and important-action costs are maximized. Its performance is evaluated through extensive simulation and compared with state-of-the-art video synopsis optimization algorithms. Experimental results suggest that some constraints are better optimized by the latest metaheuristic optimization algorithms when generating compact personalized synopsis videos from spherical surveillance video.
Keywords: immersive video; non-spherical video synopsis; spherical video; panoramic surveillance video; 360° video
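The multi-objective formulation in entry 2 is usually handled by scalarizing the competing costs into a single energy that a metaheuristic then minimizes. A minimal sketch, assuming a weighted-sum scalarization; the term order, weights, and function name are illustrative, not the authors' exact formulation.

```python
def synopsis_energy(costs, weights):
    """Weighted-sum energy of one candidate synopsis arrangement.

    `costs` holds the six terms named in the abstract, in this
    illustrative order: activity loss, collision, temporal
    consistency, length, show, important action. Giving the last
    two negative weights makes minimizing the sum equivalent to
    maximizing those terms.
    """
    return sum(w * c for w, c in zip(weights, costs))

# Example: minimize the first four costs, maximize show/important action.
energy = synopsis_energy(
    costs=[0.30, 0.10, 0.25, 0.50, 0.80, 0.60],
    weights=[1.0, 1.0, 1.0, 1.0, -1.0, -1.0],
)
```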
3. A Video Captioning Method by Semantic Topic-Guided Generation
Authors: Ou Ye, Xinli Wei, Zhenhua Yu, Yan Fu, Ying Yang. Computers, Materials & Continua (SCIE, EI), 2024, No. 1, pp. 1071-1093.
In encoder-decoder video captioning methods, an encoder extracts limited visual features and a decoder generates a natural-language sentence describing the video content. However, such methods depend on a single video input source and few visual labels, and suffer from poor semantic alignment between video contents and the generated sentences, so they are not well suited to accurately comprehending and describing video contents. To address this issue, this paper proposes a video captioning method with semantic topic-guided generation. First, a 3D convolutional neural network extracts the spatiotemporal features of videos during encoding. Then, the semantic topics of the video data are extracted using visual labels retrieved from similar video data. In decoding, a decoder combines a novel Enhance-TopK sampling algorithm with a Generative Pre-trained Transformer-2 deep neural network, which reduces the "deviation" in the semantic mapping between videos and texts by jointly decoding a baseline and the semantic topics of the video contents. During this process, the Enhance-TopK sampling algorithm alleviates the long-tail problem by dynamically adjusting the probability distribution of the predicted words. Finally, experiments are conducted on the publicly available Microsoft Research Video Description and Microsoft Research-Video to Text datasets. The results demonstrate that the proposed method outperforms several state-of-the-art approaches: the Bilingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation with Explicit Ordering (METEOR), Recall-Oriented Understudy for Gisting Evaluation-longest common subsequence (ROUGE-L), and Consensus-based Image Description Evaluation (CIDEr) indicators improve by 1.2%, 0.1%, 0.3%, and 2.4% on Microsoft Research Video Description, and by 0.1%, 1.0%, 0.1%, and 2.8% on Microsoft Research-Video to Text, respectively, compared with existing video captioning methods. As a result, the proposed method generates captions more closely aligned with natural human language expression.
Keywords: video captioning; encoder-decoder; semantic topic; joint decoding; Enhance-TopK sampling
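Entry 3's Enhance-TopK sampler builds on standard top-k decoding; the abstract does not spell out its exact probability-reshaping rule, so the sketch below shows plain top-k sampling in NumPy, with a temperature parameter standing in for the dynamic adjustment. The function name and defaults are illustrative.

```python
import numpy as np

def top_k_sample(logits, k=10, temperature=1.0, rng=None):
    """Keep the k most likely tokens, renormalize, and draw one.
    Enhance-TopK additionally reshapes this distribution per step to
    counter the long-tail problem; that rule is not reproduced here."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    top = np.argsort(logits)[-k:]                # indices of the k largest logits
    p = np.exp(logits[top] - logits[top].max())  # stable softmax over the top-k
    p /= p.sum()
    return int(rng.choice(top, p=p))             # sampled token id
```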
4. Improving Video Watermarking through Galois Field GF(2^4) Multiplication Tables with Diverse Irreducible Polynomials and Adaptive Techniques
Authors: Yasmin Alaa Hassan, Abdul Monem S. Rahma. Computers, Materials & Continua (SCIE, EI), 2024, No. 1, pp. 1423-1442.
Video watermarking plays a crucial role in protecting intellectual property rights and ensuring content authenticity. This study examines the integration of Galois field (GF) multiplication tables, especially GF(2^4), and their interaction with distinct irreducible polynomials. The primary aim is to enhance watermarking techniques to achieve imperceptibility, robustness, and efficient execution time. The research employs scene selection and adaptive thresholding to streamline the watermarking process: scene selection embeds watermarks in the most vital frames of the video, while adaptive thresholding keeps the watermarking process within imperceptibility criteria, maintaining the video's visual quality. Careful consideration is also given to execution time, which is crucial in real-world scenarios, to balance efficiency and efficacy. The peak signal-to-noise ratio (PSNR) serves as the pivotal metric for gauging the watermark's imperceptibility and video quality. The study explores various irreducible polynomials, navigating the trade-offs between computational efficiency and watermark imperceptibility. This analysis provides insights into the interplay of GF multiplication tables, diverse irreducible polynomials, scene selection, adaptive thresholding, imperceptibility, and execution time. The robustness of the proposed algorithm was evaluated using PSNR and normalized correlation (NC) metrics under five distinct attack scenarios. The findings contribute to watermarking strategies that balance imperceptibility, robustness, and processing efficiency, enhancing the field's practicality and effectiveness.
Keywords: video watermarking; Galois field; irreducible polynomial; multiplication table; scene selection; adaptive thresholding
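The GF(2^4) multiplication tables that entry 4 studies are generated by carry-less multiplication reduced modulo an irreducible degree-4 polynomial. A minimal sketch: 0b10011 encodes x^4 + x + 1, and swapping in another degree-4 irreducible such as x^4 + x^3 + 1 (0b11001) yields a different table, which is exactly the design space the paper explores.

```python
def gf16_mul(a, b, poly=0b10011):
    """Multiply two GF(2^4) elements (4-bit ints), reducing modulo
    the chosen irreducible polynomial (default x^4 + x + 1)."""
    r = 0
    while b:
        if b & 1:
            r ^= a          # add (XOR) the current shift of a
        b >>= 1
        a <<= 1
        if a & 0x10:        # degree reached 4: reduce by the modulus
            a ^= poly
    return r & 0xF

# One full 16x16 multiplication table for this choice of polynomial.
table = [[gf16_mul(a, b) for b in range(16)] for a in range(16)]
```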
5. Generative Multi-Modal Mutual Enhancement Video Semantic Communications
Authors: Yuanle Chen, Haobo Wang, Chunyu Liu, Linyi Wang, Jiaxin Liu, Wei Wu. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 6, pp. 2985-3009.
Recently, there have been significant advancements in the study of semantic communication in single-modal scenarios, but the ability to process information in multi-modal environments remains limited. Inspired by research and applications of natural language processing across different modalities, our goal is to accurately extract frame-level semantic information from videos and ultimately transmit high-quality video. Specifically, we propose a deep learning-based Multi-Modal Mutual Enhancement Video Semantic Communication system, called M3E-VSC. Built upon a Vector-Quantized Generative Adversarial Network (VQGAN), the system leverages mutual enhancement among different modalities by using text as the main transmission carrier. Semantic information is extracted from key-frame images and the audio of the video, and differential values are computed so that the extracted text conveys accurate semantic information with fewer bits, improving system capacity. Furthermore, a multi-frame semantic detection module is designed to facilitate semantic transitions during video generation. Simulation results demonstrate that the proposed model maintains high robustness in complex noise environments, particularly at low signal-to-noise ratios, improving the accuracy and speed of semantic transmission in video communication by approximately 50 percent.
Keywords: generative adversarial networks; multi-modal mutual enhancement; video semantic transmission; deep learning
6. Multi-Stream Temporally Enhanced Network for Video Salient Object Detection
Authors: Dan Xu, Jiale Ru, Jinlong Shi. Computers, Materials & Continua (SCIE, EI), 2024, No. 1, pp. 85-104.
Video salient object detection (VSOD) aims to locate the most attractive objects in a video by exploring spatial and temporal features. VSOD is a challenging computer vision task, as it involves processing complex spatial data that is also influenced by temporal dynamics. Despite the progress of existing VSOD models, they still struggle in scenes with great background diversity within and between frames, and they suffer from accumulated noise and high time consumption when extracting temporal features over long durations. We propose a multi-stream temporally enhanced network (MSTENet) to address these problems. It investigates saliency-cue collaboration in the spatial domain with a multi-stream structure to handle the background-diversity challenge, and it develops a straightforward yet efficient approach to temporal feature extraction that avoids accumulated noise and reduces time consumption. MSTENet differs from other VSOD methods in incorporating both foreground supervision and background supervision, facilitating enhanced extraction of collaborative saliency cues, and in its integration of spatial and temporal features: the temporal module is embedded in the multi-stream structure, enabling comprehensive spatial-temporal interactions within an end-to-end framework. Extensive experiments demonstrate state-of-the-art performance on five benchmark datasets at a real-time speed of 27 fps (Titan XP). Our code and models are available at https://github.com/RuJiaLe/MSTENet.
Keywords: video salient object detection; deep learning; temporal enhancement; foreground-background collaboration
7. CVTD: A Robust Car-Mounted Video Text Detector
Authors: Di Zhou, Jianxun Zhang, Chao Li, Yifan Guo, Bowen Li. Computers, Materials & Continua (SCIE, EI), 2024, No. 2, pp. 1821-1842.
Text perception is crucial for understanding the semantics of outdoor scenes, making it a key requirement for intelligent driver-assistance and autonomous-driving systems, since text information in car-mounted video can assist drivers in making decisions. However, car-mounted video text images pose challenges such as complex backgrounds, small fonts, and the need for real-time detection. We propose a robust Car-mounted Video Text Detector (CVTD), a lightweight text detection model based on ResNet18 for feature extraction that is capable of detecting text of arbitrary shapes. The model efficiently extracts global text positions through Coordinate Attention Threshold Activation (CATA) and enhances representation capability by stacking two Feature Pyramid Enhancement Fusion Modules (FPEFM), integrating local text features with global position information. The enhanced feature maps, acted upon by Text Activation Maps (TAM), effectively distinguish text foreground from non-text regions. Additionally, we collected and annotated a dataset of 2,200 car-mounted video text (CVT) images under various road conditions for training and evaluating our model. We further tested the model on four challenging public natural-scene text detection benchmarks, demonstrating strong generalization ability and real-time detection speed. The model holds potential for practical applications in real-world scenarios.
Keywords: deep learning; text detection; car-mounted video text detector; intelligent driving assistance; arbitrary-shape text detector
8. Pulse rate estimation based on facial videos: an evaluation and optimization of the classical methods using both self-constructed and public datasets
Authors: Chao-Yong Wu, Jian-Xin Chen, Yu Chen, Ai-Ping Chen, Lu Zhou, Xu Wang. Traditional Medicine Research, 2024, No. 1, pp. 14-22.
Pulse rate is one of the important characteristics of traditional Chinese medicine pulse diagnosis, and it is of great significance for determining the cold or heat nature of diseases. Predicting pulse rate from facial video is an exciting research direction for obtaining palpation information through observation diagnosis. However, most studies focus on optimizing algorithms over small participant samples without systematically investigating multiple influencing factors. A total of 209 participants and 2,435 facial videos, drawn from our self-constructed Multi-Scene Sign Dataset and from public datasets, were used for a multi-level, multi-factor comparison. The effects of different datasets, blood volume pulse (BVP) signal extraction algorithms, regions of interest, time windows, color spaces, pulse rate calculation methods, and video recording scenes were analyzed. Furthermore, we propose a BVP signal quality optimization strategy based on the inverse Fourier transform and an improved pulse rate estimation strategy based on signal-to-noise-ratio threshold sliding. Video-based pulse rate estimation performed better on the Multi-Scene Sign Dataset and the Pulse Rate Detection Dataset than on the other datasets. Compared with fast independent component analysis and single-channel algorithms, the chrominance-based and plane-orthogonal-to-skin algorithms showed stronger anti-interference ability and higher robustness. The five-organs fusion area and the full-face area outperformed single sub-regions, and fewer motion artifacts and better lighting improved the precision of pulse rate estimation.
Keywords: pulse rate; heart rate; photoplethysmography; observation and pulse diagnosis; facial videos
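Entry 8's pipeline ends with a pulse-rate calculation step over an extracted blood-volume-pulse (BVP) trace. A minimal sketch of one classical choice, FFT peak-picking inside the physiological band; this is not the paper's chrominance-based or plane-orthogonal-to-skin extraction itself, and the function name and band limits are illustrative.

```python
import numpy as np

def pulse_rate_bpm(bvp, fs, lo=0.7, hi=4.0):
    """Estimate pulse rate (beats/min) as the dominant FFT peak of a
    BVP trace within 0.7-4 Hz (42-240 bpm). `bvp` could be, e.g., a
    detrended mean green-channel signal from a facial region of
    interest, sampled at fs Hz."""
    x = np.asarray(bvp, dtype=float)
    x = x - x.mean()
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2
    band = (freqs >= lo) & (freqs <= hi)
    return 60.0 * freqs[band][np.argmax(power[band])]
```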
9. TEAM: Transformer Encoder Attention Module for Video Classification
Authors: Hae Sung Park, Yong Suk Choi. Computer Systems Science & Engineering, 2024, No. 2, pp. 451-477.
Much as humans focus on object movement to understand actions, directing a deep learning model's attention to the core contexts within videos is crucial for improving video comprehension. In a recent study, the Video Masked Auto-Encoder (VideoMAE) employed a pre-training approach with a high ratio of tube masking and reconstruction, effectively mitigating the spatial bias caused by temporal redundancy in full video frames and steering the model's focus toward detailed temporal contexts. However, because VideoMAE still relies on full video frames during the action recognition stage, its attention may progressively shift back toward spatial contexts, weakening its ability to capture the main spatio-temporal contexts. To address this issue, we propose an attention-directing module named the Transformer Encoder Attention Module (TEAM), which directs the model's attention to the core characteristics of each video, inherently mitigating spatial bias. TEAM first identifies the core features among the features extracted from each video, then discerns the parts of the video where those features are located, encouraging the model to focus on these informative parts. Consequently, during action recognition, TEAM shifts VideoMAE's attention from spatial contexts toward the core spatio-temporal contexts, alleviating spatial bias while enhancing the model's ability to capture precise video contexts. We conduct extensive experiments to find the configuration that lets TEAM fulfill its design purpose and integrate seamlessly with the VideoMAE framework. The integrated model, VideoMAE+TEAM, outperforms the existing VideoMAE by a significant margin on Something-Something-V2 (71.3% vs. 70.3%). Qualitative comparisons further show that TEAM encourages the model to disregard insignificant features and focus on essential ones, capturing more detailed spatio-temporal contexts within the video.
Keywords: video classification; action recognition; vision transformer; masked auto-encoder
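The "high ratio of tube masking" in entry 9 refers to VideoMAE masking the same spatial patch positions in every frame, so a masked tube runs through the whole clip and cannot be trivially filled from neighboring frames. A minimal sketch of the mask alone (patch embedding and reconstruction omitted); the 0.9 default mirrors the high ratio the abstract mentions.

```python
import numpy as np

def tube_mask(n_frames, n_patches, ratio=0.9, rng=None):
    """Boolean mask of shape (n_frames, n_patches): a random subset of
    spatial patch positions is masked identically in every frame."""
    rng = rng or np.random.default_rng()
    masked = rng.choice(n_patches, int(n_patches * ratio), replace=False)
    mask = np.zeros((n_frames, n_patches), dtype=bool)
    mask[:, masked] = True
    return mask
```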
10. SwinVid: Enhancing Video Object Detection Using Swin Transformer
Authors: Abdelrahman Maharek, Amr Abozeid, Rasha Orban, Kamal ElDahshan. Computer Systems Science & Engineering, 2024, No. 2, pp. 305-320.
What causes object detection in video to be less accurate than in still images? Some video frames are degraded in appearance by fast movement, out-of-focus camera shots, and changes in posture, which has made video object detection (VID) a growing research area in recent years. Video object detection can serve various healthcare applications, such as detecting and tracking tumors in medical imaging, monitoring the movement of patients in hospitals and long-term care facilities, and analyzing surgery videos to improve technique and training; it can also support telemedicine by helping diagnose and monitor patients remotely. Existing VID techniques rely on recurrent neural networks or optical flow to aggregate reliable features for detection, aggregating features at the full-sequence level or from nearby frames, and they frequently use convolutional neural networks (CNNs) as the backbone network for producing feature maps. Vision transformers, on the other hand, have outperformed CNNs in various vision tasks, including object detection in still images and image classification. In this research we propose using the Swin Transformer, a state-of-the-art vision transformer, as an alternative to CNN-based backbone networks for object detection in videos. The proposed architecture enhances the accuracy of existing VID methods. Evaluated on the ImageNet VID and EPIC KITCHENS datasets, our method achieves 84.3% mean average precision (mAP) on ImageNet VID while using less memory than other leading VID techniques. The source code is available at https://github.com/amaharek/SwinVid.
Keywords: video object detection; vision transformers; convolutional neural networks; deep learning
11. Research on Information Architecture Design of Short-Form Video Social Platforms Based on Cognitive Psychology
Authors: Zhengyang Liu, Albert Young Choi. Psychology Research, 2024, No. 1, pp. 1-13.
This study investigates how cognitive psychology principles can be integrated into the information architecture design of short-form video platforms, like TikTok, to enhance user experience, engagement, and sharing. Using a questionnaire, it explores TikTok users' habits and preferences, highlighting how social media fatigue (SMF) impacts their interaction with the platform. The paper offers strategies to optimize TikTok's design: it suggests refining the organizational system using principles such as chunking, schema theory, and working memory capacity, and proposes incorporating shopping features within TikTok's interface to personalize product suggestions and enable monetization for influencers and content creators. Furthermore, the study underlines the need to consider gender differences and user preferences in improving TikTok's sharing features, recommending streamlined and customizable sharing options, collaborative sharing, and a system to acknowledge sharing milestones. Aiming to strengthen social connections and increase sharing likelihood, this research provides insights into enhancing information architecture for short-form video platforms, contributing to their growth and success.
Keywords: information architecture design; short-form video; social cognitive psychology; user experience
12. Problematic Use of Video Games in Schools in Northern Benin (2023)
Authors: Ireti Nethania Elie Ataigba, David Sinet Koivogui, Damega Wenkourama, Marcos Tohou, Eurydice Elvire Djossou, Anselme Djidonou, Francis Tognon Tchegnonsi, Prosper Gandaho, Josiane Ezin Houngbe. Open Journal of Psychiatry, 2024, No. 2, pp. 120-141.
Objective: To study the problematic use of video games among secondary school students in the city of Parakou in 2023. Methods: Descriptive cross-sectional study conducted in the commune of Parakou from December 2022 to July 2023. The study population consisted of students regularly enrolled in public and private secondary schools in the city of Parakou for the 2022-2023 academic year. A two-stage non-proportional stratified sampling technique combined with simple random sampling was adopted. The Problem Video Game Playing (PVP) scale was used to assess problematic gaming in the study population, while anxiety and depression were assessed using the Hospital Anxiety and Depression Scale (HADS). Results: A total of 1,030 students were included. The mean age of the students surveyed was 15.06 ± 2.68 years, with extremes of 10 and 28 years. The 13-18 age group was the most represented, at 59.6% (614) of the general population. Females predominated, at 52.8% (544), for a sex ratio of 0.89. The prevalence of problematic video game use, measured with the PVP scale, was 24.9%. Associated factors were male gender (p = 0.005), pocket money under 10,000 CFA (p = 0.001) and between 20,000-90,000 CFA (p = 0.030), addictive family behavior (p < 0.001), monogamous family (p = 0.023), good relationship with the father (p = 0.020), organization of video game competitions (p = 0.001), and definite anxiety (p …). Conclusion: Substance-free addiction struggles to attract the attention it deserves, as it did in its infancy everywhere else. This study complements existing data and serves as a reminder of the need to focus on this group of addictions, among which problematic video game use remains the most frequent due to its accessibility and social tolerance. Preventive action combined with curative measures remains the most effective means of combating the problem at the national level.
Keywords: gaming problem; video games; Benin; 2023
13. Real-Time Mosaic Method of Aerial Video Based on Two-Stage Key Frame Selection Method
Authors: Minwen Yuan, Yonghong Long, Xin Li. Open Journal of Applied Sciences, 2024, No. 4, pp. 1008-1021.
A two-stage automatic key frame selection method is proposed to enhance stitching speed and quality for UAV aerial videos. In the first stage, to reduce redundancy, the overlapping rate of the UAV aerial video sequence within the sampling period is calculated, Lagrange interpolation is used to fit the overlapping-rate curve of the sequence, and an empirical threshold on the overlapping rate filters candidate key frames from the sequence. In the second stage, the principle of minimizing remapping error is used to dynamically adjust and determine the final key frame near the candidates. Comparative experiments show that the proposed method improves stitching speed and accuracy by more than 40%.
Keywords: UAV aerial video; image stitching; key frame selection; overlapping rate; remapping error
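Stage one of entry 13's method fits the measured overlapping rates with Lagrange interpolation and thresholds the fitted curve. A rough sketch, assuming overlap is sampled at a handful of frames per sampling period (Lagrange fits are numerically unstable for many points); the threshold value and names are illustrative.

```python
import numpy as np
from scipy.interpolate import lagrange

def candidate_key_frames(sample_idx, overlap, n_frames, threshold=0.6):
    """Fit the overlap-rate curve at sampled frame indices, evaluate it
    on every frame, and keep frames whose predicted overlap falls at or
    below an empirical threshold as key-frame candidates (stage two
    then refines these by minimizing remapping error)."""
    poly = lagrange(sample_idx, overlap)   # numpy.poly1d over frame index
    frames = np.arange(n_frames)
    return frames[poly(frames) <= threshold]
```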
14. Optimization of Interactive Videos Empowered the Experience of Learning Management System
Authors: Muhammad Akram, Muhammad Waseem Iqbal, M. Usman Ashraf, Erssa Arif, Khalid Alsubhi, Hani Moaiteq Aljahdali. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 7, pp. 1021-1038.
The learning management system (LMS) is now used for uploading educational content in both distance and blended setups. An LMS platform has two types of users: educators, who upload the content, and students, who access it. Students usually rely on text notes or books and video tutorials, while their exams are conducted by formal methods. Formal assessments and examination criteria are ineffective in a restricted learning space, which leads students merely to read the educational contents and watch videos rather than engage interactively. The aim is to design an interactive LMS and examination video-based interface that addresses the needs of educators and students. It is designed according to human-computer interaction (HCI) principles to build an interactive user interface (UI) through user experience (UX). Interactive lectures in the form of annotated videos increase user engagement and improve the self-study context of LMS users. The interface design defines how the design interacts with users and how the interface exchanges information. The findings show that interactive videos for an LMS give users a more personalized learning experience by engaging them in the educational content; the results show a highly personalized learning experience due to the interactive video and in-video quizzes.
Keywords: user interface; user experience; learning management system; linear and nonlinear video; interactive video; visual design
15. An Efficient Attention-Based Strategy for Anomaly Detection in Surveillance Video
Authors: Sareer Ul Amin, Yongjun Kim, Irfan Sami, Sangoh Park, Sanghyun Seo. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 9, pp. 3939-3958.
In the present technological world, surveillance cameras generate an immense amount of video data from various sources, making its scrutiny tough for computer vision specialists. Searching for anomalous events manually in these massive video records is impractical, since they happen infrequently and with low probability in real-world monitoring systems. Intelligent surveillance is therefore a requirement of the modern day, enabling automatic identification of normal and aberrant behavior using artificial intelligence and computer vision technologies. In this article, we introduce an efficient attention-based deep learning approach for anomaly detection in surveillance video (ADSV). At the input of ADSV, a shot-boundary detection technique segments prominent frames. A Lightweight Convolutional Neural Network (LWCNN) then receives the segmented frames and extracts spatial and temporal information from an intermediate layer. Spatial and temporal features are subsequently learned by Long Short-Term Memory (LSTM) cells and an attention network from a series of frames for each anomalous activity in a sample; to detect motion and action, the LWCNN receives chronologically sorted frames. Finally, anomalous activity in the video is identified using the trained ADSV model. Extensive experiments are conducted on complex and challenging benchmark datasets, and the results, compared with state-of-the-art methodologies, show a significant improvement, demonstrating the efficiency of our ADSV method.
Keywords: attention-based anomaly detection; video shot segmentation; video surveillance; computer vision; deep learning; smart surveillance system; violence detection; attention model
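Entry 15 layers an attention network over LSTM outputs; a common realization is soft attention pooling over time, sketched below as a plain PyTorch module. This is an assumption about the general mechanism, not the authors' exact ADSV architecture.

```python
import torch
import torch.nn as nn

class TemporalAttentionPool(nn.Module):
    """Score each time step, softmax the scores over time, and return
    the attention-weighted sum of the per-frame features."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, h):                          # h: (batch, time, dim)
        w = torch.softmax(self.score(h), dim=1)    # (batch, time, 1)
        return (w * h).sum(dim=1)                  # (batch, dim)
```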
16. An Efficient Method for Underwater Video Summarization and Object Detection Using YOLOv3
Authors: Mubashir Javaid, Muazzam Maqsood, Farhan Aadil, Jibran Safdar, Yongsung Kim. Intelligent Automation & Soft Computing (SCIE), 2023, No. 2, pp. 1295-1310.
Currently, worldwide industries and communities are concerned with building, expanding, and exploring the assets and resources found in the oceans and seas. To analyze stocks, support archaeology, and perform surveillance, several cameras are installed undersea to collect videos. However, these large videos require a lot of time and memory to process and to extract relevant information, so an accurate and efficient automated system is needed to replace manual video assessment. From this perspective, we present a complete framework for video summarization and object detection in underwater videos. We employ a perceived motion energy (PME) method to first extract the keyframes, followed by an object detection model, YOLOv3, to detect objects in the underwater videos. The presented approach also accounts for blurriness and low contrast in underwater images by applying an image enhancement method. The suggested framework was evaluated on the publicly available Brackish dataset and shows good performance, ultimately assisting marine researchers and scientists working in underwater archaeology, stock assessment, and surveillance.
Keywords: computer vision; deep learning; digital image processing; underwater video analysis; video summarization; object detection; YOLOv3
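Entry 16 extracts keyframes with perceived motion energy (PME). The full PME model weighs motion-vector magnitudes and directions; the sketch below uses a much cruder proxy, mean absolute frame difference, just to illustrate ranking frames by motion activity, with keyframes then picked at extrema of the resulting curve.

```python
import cv2
import numpy as np

def frame_motion_energy(video_path):
    """Mean absolute gray-level difference between consecutive frames:
    a simplified stand-in for perceived motion energy."""
    cap = cv2.VideoCapture(video_path)
    energies, prev = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        if prev is not None:
            energies.append(float(np.abs(gray - prev).mean()))
        prev = gray
    cap.release()
    return np.array(energies)
```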
17. COVAD: Content-oriented video anomaly detection using a self-attention-based deep learning model
Authors: Wenhao Shao, Praboda Rajapaksha, Yanyan Wei, Dun Li, Noel Crespi, Zhigang Luo. Virtual Reality & Intelligent Hardware, 2023, No. 1, pp. 24-41.
Background: Video anomaly detection has always been a hot topic and has attracted increasing attention. Many existing methods for video anomaly detection process the entire video rather than considering only the significant context. Method: This paper proposes a novel video anomaly detection method called COVAD that focuses on the region of interest in the video instead of the entire video. The proposed COVAD method is based on an auto-encoded convolutional neural network and a coordinate attention mechanism, which can effectively capture meaningful objects in the video and the dependencies among different objects. Relying on an existing memory-guided video frame prediction network, our algorithm can more effectively predict the future motion and appearance of objects in a video. Result: The proposed algorithm obtained better experimental results on multiple datasets and outperformed the baseline models considered in our analysis. We additionally provide an improved visual test that can provide pixel-level anomaly explanations.
Keywords: video surveillance; video anomaly detection; machine learning; deep learning; neural network; coordinate attention
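The coordinate attention mechanism named in entry 17 (Hou et al., CVPR 2021) pools the feature map along height and width separately, so the resulting attention weights retain position information along each axis. A sketch of that module with batch norm and h-swish simplified to ReLU; not the authors' COVAD code.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):                     # x: (n, c, h, w)
        n, c, h, w = x.shape
        ph = x.mean(dim=3, keepdim=True)      # pool along width  -> (n, c, h, 1)
        pw = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (n, c, w, 1)
        y = self.act(self.conv1(torch.cat([ph, pw], dim=2)))   # shared transform
        ah, aw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(ah))                       # (n, c, h, 1)
        aw = torch.sigmoid(self.conv_w(aw.permute(0, 1, 3, 2)))   # (n, c, 1, w)
        return x * ah * aw                    # position-aware reweighting
```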
18. Multimodal feature fusion based on object relation for video captioning
Authors: Zhiwen Yan, Ying Chen, Jinlong Song, Jia Zhu. CAAI Transactions on Intelligence Technology (SCIE, EI), 2023, No. 1, pp. 247-259. Cited by 1.
Video captioning aims to automatically generate a natural language caption describing the content of a video. However, most existing methods in the video captioning task ignore the relationships between objects in the video and the correlation between multimodal features, as well as the effect of caption length on the task. This study proposes a novel video captioning framework (ORMF) based on an object relation graph and multimodal feature fusion. ORMF uses the similarity and spatio-temporal relationships of objects in the video to construct an object relation feature graph and introduces a graph convolutional network (GCN) to encode the object relations. At the same time, ORMF constructs a multimodal feature fusion network to learn the relationships between different modal features and fuse them. Furthermore, the proposed model calculates a caption-length loss, giving the caption richer information. Experimental results on two public datasets (Microsoft video captioning corpus [MSVD] and Microsoft Research-Video to Text [MSR-VTT]) demonstrate the effectiveness of our method.
Keywords: approaches; deep learning; multimodal; scene understanding; video analysis
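Entry 18 encodes its object relation graph with a graph convolutional network. A minimal sketch of one GCN step, X' = ReLU(Â X W), where Â is a row-normalized adjacency built from object similarity and spatio-temporal relations; dimensions and names are illustrative.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution step over per-object feature vectors."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, adj):
        # x: (num_objects, in_dim); adj: (num_objects, num_objects),
        # row-normalized so each node averages its neighbors' features.
        return torch.relu(adj @ self.lin(x))
```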
19. Saccades of video head impulse test in Meniere's disease and Vestibular Migraine: What can we learn from?
Authors: Yi Du, Xingjian Liu, Lili Ren, Yu Wang, Fei Ji, Weiwei Guo, Ziming Wu. Journal of Otology (CSCD), 2023, No. 2, pp. 79-84. Cited by 1.
Background: Saccades are often observed on video head impulse tests (vHIT) in patients with Meniere's disease (MD) and vestibular migraine (VM), but their saccadic features are not fully described. Objective: This study aims to identify the saccade characteristics of MD and VM. Methods: 75 VM patients and 103 definite unilateral MD patients were enrolled in this study, and raw saccades were exported and analyzed. The VM patients were divided by left and right ear, while the MD patients were separated into affected and unaffected subgroups based on their audiograms and symptoms. Results: The MD patients have more saccades on the affected side (85% vs. 69%), and saccade velocity there is more consistent than on the contralateral side (shown by the coefficient of variation). In VM, the saccade occurrence rates on both sides are similar (77% vs. 76%), as are the other saccadic parameters. The MD patients show greater inter-aural differences than the VM patients, manifested in higher velocity (p = 0.000), earlier arrival (p = 0.010), and tighter time-domain clustering (p = 0.003) on the affected side. Conclusions: Bilateral saccades are commonly observed in MD and VM. In contrast to MD, saccades in VM are subtle, scattered, and late-arriving. Furthermore, the MD patients showed an inconsistent saccadic distribution, with more velocity-uniform saccades on the affected side.
Keywords: Meniere's disease; vestibular migraine; saccades; video head impulse test; differential diagnosis
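The velocity-consistency claim in entry 19 rests on the coefficient of variation (CV = standard deviation / mean): a lower CV on the affected side means the saccade peak velocities cluster more tightly. A one-liner for reference:

```python
import numpy as np

def coefficient_of_variation(velocities):
    """CV of saccade peak velocities: sample std divided by the mean."""
    v = np.asarray(velocities, dtype=float)
    return v.std(ddof=1) / v.mean()

# A lower CV on the affected side indicates more uniform saccades.
```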
20. Coarse-to-Fine Video Instance Segmentation With Factorized Conditional Appearance Flows
Authors: Zheyun Qin, Xiankai Lu, Xiushan Nie, Dongfang Liu, Yilong Yin, Wenguan Wang. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2023, No. 5, pp. 1192-1208. Cited by 1.
We introduce a novel method using a new generative model that automatically learns effective representations of target and background appearance to detect, segment, and track each instance in a video sequence. Unlike current discriminative tracking-by-detection solutions, our hierarchical structural embedding learning predicts higher-quality masks with accurate boundary details over spatio-temporal space via normalizing flows. We formulate the instance inference procedure as hierarchical spatio-temporal embedding learning across time and space: given a video clip, the method first coarsely locates the pixels belonging to a particular instance with a Gaussian distribution and then builds a novel mixing distribution that promotes the instance boundary by fusing hierarchical appearance embedding information in a coarse-to-fine manner. For the mixing distribution, we estimate the distribution parameters with a factorized conditional normalizing flow to improve segmentation performance. Comprehensive qualitative, quantitative, and ablation experiments on three representative video instance segmentation benchmarks (YouTube-VIS19, YouTube-VIS21, and OVIS) demonstrate the effectiveness of the proposed method. More impressively, the superior performance of our model on an unsupervised video object segmentation dataset (DAVIS19) proves its generalizability. Our algorithm implementations are publicly available at https://github.com/zyqin19/HEVis.
Keywords: embedding learning; generative model; normalizing flows; video instance segmentation (VIS)
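The factorized conditional normalizing flow in entry 20 estimates the mixing-distribution parameters via the change-of-variables identity that underlies all flow models: an invertible map f sends an embedding x to a simple base variable z = f(x), giving an exact log-likelihood.

```latex
\log p_X(x) \;=\; \log p_Z\bigl(f(x)\bigr)
            \;+\; \log \left| \det \frac{\partial f(x)}{\partial x} \right|
```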