Pulse rate is one of the important characteristics of traditional Chinese medicine pulse diagnosis, and it is of great significance for determining the cold or heat nature of diseases. The prediction of pulse rate from facial video is an exciting research field for obtaining palpation information through observation diagnosis. However, most studies focus on optimizing algorithms on small samples of participants without systematically investigating multiple influencing factors. A total of 209 participants and 2,435 facial videos, drawn from our self-constructed Multi-Scene Sign Dataset and public datasets, were used to perform a multi-level, multi-factor comprehensive comparison. The effects of different datasets, blood volume pulse (BVP) signal extraction algorithms, regions of interest, time windows, color spaces, pulse rate calculation methods, and video recording scenes were analyzed. Furthermore, we propose a BVP signal quality optimization strategy based on the inverse Fourier transform and an improved pulse rate estimation strategy based on signal-to-noise ratio (SNR) threshold sliding. We found that video-based pulse rate estimation performed better on the Multi-Scene Sign Dataset and the Pulse Rate Detection Dataset than on other datasets. Compared with the Fast Independent Component Analysis and Single Channel algorithms, the chrominance-based and plane-orthogonal-to-skin algorithms show stronger anti-interference ability and higher robustness. The five-organ fusion area and the full-face area outperformed single sub-regions, and fewer motion artifacts and better lighting improve the precision of pulse rate estimation.
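The spectral-peak step behind SNR-based pulse rate estimation can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual pipeline: it assumes a uniformly sampled BVP trace, scans a coarse frequency grid with a naive DFT, and treats everything outside the peak as noise; the function names are our own.

```python
import math

def dft_power(x, fs, freqs):
    """Naive DFT power of signal x (sample rate fs) at the given frequencies (Hz)."""
    n = len(x)
    powers = []
    for f in freqs:
        re = sum(x[t] * math.cos(2 * math.pi * f * t / fs) for t in range(n))
        im = sum(x[t] * math.sin(2 * math.pi * f * t / fs) for t in range(n))
        powers.append((re * re + im * im) / n)
    return powers

def estimate_pulse_rate(bvp, fs, lo=0.7, hi=4.0, step=0.05):
    """Pick the spectral peak in the plausible pulse band (42-240 bpm)
    and report a crude in-band SNR (peak power over remaining power)."""
    freqs = [lo + i * step for i in range(int((hi - lo) / step) + 1)]
    powers = dft_power(bvp, fs, freqs)
    peak = max(range(len(freqs)), key=lambda i: powers[i])
    snr = powers[peak] / (sum(powers) - powers[peak] + 1e-12)
    return freqs[peak] * 60.0, snr

# Synthetic BVP: a clean 1.2 Hz oscillation (72 bpm) sampled at 30 fps
fs = 30.0
bvp = [math.sin(2 * math.pi * 1.2 * t / fs) for t in range(300)]
rate, snr = estimate_pulse_rate(bvp, fs)
```

A threshold-sliding strategy in the spirit of the abstract would then reject windows whose `snr` falls below a sliding threshold instead of emitting an unreliable rate.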
Audio description (AD), unlike interlingual translation and interpretation, is subject to unique constraints as a spoken text. Facilitated by AD, educational videos on COVID-19 anti-virus measures are made accessible to the visually disadvantaged. In this study, a corpus of AD of COVID-19 educational videos is developed, named the “Audio Description Corpus of COVID-19 Educational Videos” (ADCCEV). Drawing on the model of the Textual and Linguistic Audio Description Matrix (TLADM), this paper aims to identify the linguistic and textual idiosyncrasies of AD themed on the COVID-19 response released by the New Zealand Government. This study finds that, linguistically, the AD script uses a mix of complete sentences and phrases, the majority in the present simple tense. Present participles and the “with” structure are used for brevity. Vocabulary is diverse, with simpler words for animated explainers. Third-person pronouns are common in educational videos. Color words are a salient feature of AD, where “yellow” denotes urgency and “red” indicates importance, negativity, and hostility. On textual idiosyncrasies, coherence is achieved through intermodal components that align with the video's mood and style. AD style varies with the video's purpose, from informative to narrative or expressive.
The Learning Management System (LMS) is now widely used for uploading educational content in both distance and blended setups. An LMS platform has two types of users: educators, who upload the content, and students, who access it. Students usually rely on text notes, books, and video tutorials, while their exams are conducted with formal methods. Formal assessments and examination criteria are ineffective in a restricted learning space, which pushes students toward merely reading educational content and watching videos rather than interacting with them. The aim of this work is to design an interactive LMS and examination video-based interface that addresses the needs of both educators and students. It is designed according to Human-Computer Interaction (HCI) principles to build an interactive User Interface (UI) informed by User Experience (UX). Interactive lectures in the form of annotated videos increase user engagement and improve the self-study experience of LMS users. The interface design defines how the system interacts with users and how information is exchanged. The findings show that interactive videos in an LMS give users a more personalized learning experience by engaging them with the educational content, and the quizzes embedded within the videos further strengthen this personalization.
For intelligent surveillance videos, anomaly detection is extremely important. Deep learning algorithms have become popular for evaluating real-time surveillance recordings of events such as traffic accidents and criminal or unlawful incidents, including suicide attempts. Nevertheless, deep learning methods for classification, such as convolutional neural networks, require a lot of computing power. Quantum computing is a branch of technology that solves complex problems using quantum mechanics. As a result, the focus of this research is on developing a hybrid quantum computing model based on deep learning. This research develops a Quantum Computing-based Convolutional Neural Network (QC-CNN) to extract features and classify anomalies in surveillance footage. A quantum circuit, namely the real-amplitudes circuit, is utilized to improve the performance of the model. To the best of our knowledge, this is the first work to employ quantum deep learning techniques to classify anomalous events in video surveillance applications. Thirteen anomaly classes from the UCF-Crime dataset are considered. Experimental results show that the proposed model classifies data efficiently in terms of the confusion matrix, Receiver Operating Characteristic (ROC) curve, accuracy, Area Under the Curve (AUC), precision, recall, and F1-score. The proposed QC-CNN attains a best accuracy of 95.65%, which is 5.37% higher than that of existing models. To measure the efficiency of the proposed work, QC-CNN is also evaluated against both classical and quantum models.
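The defining property of a real-amplitudes circuit is that its parameterized RY rotations keep all state amplitudes real. A single-qubit toy version can be simulated in a few lines; this is an illustration of that property only, not the paper's QC-CNN, and the function name is our own.

```python
import math

def ry_amplitudes(theta, state=(1.0, 0.0)):
    """Apply a single-qubit RY(theta) rotation to a real state vector.
    Unlike general unitaries, RY keeps the amplitudes real, which is
    the defining feature of a 'real amplitudes' ansatz."""
    a, b = state
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return (c * a - s * b, s * a + c * b)

state = ry_amplitudes(math.pi / 3)        # rotate |0> by a 60-degree parameter
p0, p1 = state[0] ** 2, state[1] ** 2     # measurement probabilities
```

In a trainable circuit, `theta` would be a learned parameter and several such rotations would be stacked with entangling gates across qubits.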
Football is one of the most-watched sports, but analyzing players' performance is currently difficult and labor intensive. Performance analysis is done manually, which means that someone must watch video recordings and then log each player's performance. This includes the number of passes and shots taken by each player, the location of the action, and whether or not the play had a successful outcome. Due to the time-consuming nature of manual analysis, interest in automatic analysis tools is high despite the many interdependent phases involved, such as pitch segmentation, player and ball detection, assigning players to their teams, identifying individual players, and activity recognition. This paper proposes a system for developing an automatic video analysis tool for sports. The proposed system is the first to integrate multiple phases: segmenting the field, detecting the players and the ball, assigning players to their teams, and identifying players' jersey numbers. For team assignment, this research employed unsupervised learning based on convolutional autoencoders (CAEs) to learn discriminative latent representations, minimizing the latent embedding distance between players on the same team while simultaneously maximizing the distance between those on opposing teams. This paper also created a highly accurate approach for the real-time detection of the ball. Furthermore, it addressed the lack of jersey number datasets by creating a new dataset with more than 6,500 images for numbers ranging from 0 to 99. Since achieving high performance with deep learning requires a large training set, and the collected dataset was not enough, this research utilized transfer learning (TL) to first pretrain the jersey number detection model on another large dataset and then fine-tune it on the target dataset to increase accuracy. To test the proposed system, this paper presents a comprehensive evaluation of its individual stages as well as of the system as a whole.
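Once latent embeddings separate the two kits, team assignment reduces to two-way clustering. The sketch below is a deliberately simplified stand-in, assuming scalar (1-D) latent codes instead of the paper's CAE vectors, with made-up example values; it shows the clustering step only.

```python
def two_means(values, iters=20):
    """Cluster scalar latent codes into two teams with k-means (k=2)."""
    c0, c1 = min(values), max(values)
    for _ in range(iters):
        g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        if g0:
            c0 = sum(g0) / len(g0)   # update each centroid to its group mean
        if g1:
            c1 = sum(g1) / len(g1)
    return [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]

# Hypothetical 1-D latent codes for six players wearing two distinct kits
codes = [0.1, 0.15, 0.12, 0.9, 0.95, 0.88]
teams = two_means(codes)
```

With well-separated embeddings (the training objective described above), the two clusters correspond directly to the two teams.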
With the rapid development of immersive multimedia technologies, 360-degree video services have quickly gained popularity, and ensuring a sufficient sense of spatial presence for end users viewing 360-degree videos has become a new challenge. In this regard, accurately acquiring users' sense of spatial presence is of fundamental importance for video service providers seeking to improve their service quality. Unfortunately, no efficient evaluation model exists so far for measuring the sense of spatial presence for 360-degree videos. In this paper, we first design an assessment framework to clarify the influencing factors of spatial presence. Related parameters of both 360-degree videos and head-mounted display devices are considered in this framework. Well-designed subjective experiments are then conducted to investigate the impact of various influencing factors on the sense of presence. Based on the subjective ratings, we propose a spatial presence assessment model that can be easily deployed in 360-degree video applications. To the best of our knowledge, this is the first attempt in the literature to establish a quantitative spatial presence assessment model using technical parameters that are easily extracted. Experimental results demonstrate that the proposed model can reliably predict the sense of spatial presence.
In this paper, we describe our efforts to find a novel solution for motion deblurring in videos. In addition, our solution has the requirement of being camera-independent: it is fully implemented in software and is not aware of any of the characteristics of the camera. We found a solution by implementing a Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) hybrid model. Our CNN-LSTM is able to deblur video without any knowledge of the camera hardware. This allows it to be deployed on any system, with the camera swapped for any camera model with any physical characteristics.
Detecting feature points on the human body in video frames is a key step for tracking human movements. Methods have been developed that leverage models of human pose and classification of pixels of the body image. Yet occlusion and robustness are still open challenges. In this paper, we present an automatic, model-free feature point detection and action tracking method using a time-of-flight camera. Our method automatically detects feature points for movement abstraction. To overcome errors caused by missed detections and occlusion, a refinement method is devised that uses the trajectory of the feature points to correct erroneous detections. Experiments were conducted using videos acquired with a Microsoft Kinect camera and a publicly available video set, and comparisons were made with state-of-the-art methods. The results demonstrate that our proposed method delivers improved and reliable performance, with an average accuracy of around 90%. The trajectory-based refinement also demonstrates satisfactory effectiveness, recovering detections with a success rate of 93.7%. Our method processed a frame in an average time of 71.1 ms.
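Trajectory-based refinement of missed detections can be sketched as gap filling along a feature point's track. The version below is a minimal illustration, assuming missed detections appear as `None` and using plain linear interpolation between the nearest valid neighbors; the paper's actual refinement may weight the trajectory differently.

```python
def refine_trajectory(traj):
    """Fill missed detections (None) by linearly interpolating between
    the nearest valid detections before and after each gap."""
    out = list(traj)
    known = [i for i, p in enumerate(out) if p is not None]
    for i, p in enumerate(out):
        if p is None:
            prev = max((k for k in known if k < i), default=None)
            nxt = min((k for k in known if k > i), default=None)
            if prev is not None and nxt is not None:
                w = (i - prev) / (nxt - prev)   # fractional position in the gap
                out[i] = tuple(a + w * (b - a)
                               for a, b in zip(out[prev], out[nxt]))
    return out

# A 2-D feature point track with one missed detection at frame 2
track = [(0, 0), (1, 2), None, (3, 6), (4, 8)]
fixed = refine_trajectory(track)
```

Gaps at the very start or end of a track have only one valid neighbor and are left unfilled here; a practical system would extrapolate or drop those frames.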
Although compressive measurements save data storage and bandwidth, they are difficult to use directly for target tracking and classification without pixel reconstruction, because the Gaussian random matrix destroys the target location information in the original video frames. This paper summarizes our research on target tracking and classification directly in the compressive measurement domain. We focus on one particular type of compressive measurement using pixel subsampling: original pixels in video frames are randomly subsampled. Even in this special compressive sensing setting, conventional trackers do not perform satisfactorily. We propose a deep learning approach that integrates YOLO (You Only Look Once) and ResNet (residual network) for multiple target tracking and classification. YOLO is used for multiple target tracking, and ResNet is used for target classification. Extensive experiments using short-wave infrared (SWIR), mid-wave infrared (MWIR), and long-wave infrared (LWIR) videos demonstrate the efficacy of the proposed approach even though the training data are very scarce.
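The pixel-subsampling measurement itself is simple to state in code: keep a random fraction of pixels and discard the rest, without any mixing matrix. A minimal sketch, with our own function name and a toy 4x4 frame:

```python
import random

def subsample_frame(frame, keep_ratio, seed=0):
    """Randomly keep a fraction of pixels and drop the rest (None).
    Unlike a Gaussian measurement matrix, surviving pixels keep their
    original locations, which is what lets a tracker run on them."""
    rng = random.Random(seed)
    h, w = len(frame), len(frame[0])
    keep = set(rng.sample(range(h * w), int(keep_ratio * h * w)))
    return [[frame[r][c] if r * w + c in keep else None
             for c in range(w)] for r in range(h)]

frame = [[r * 4 + c for c in range(4)] for r in range(4)]
sparse = subsample_frame(frame, keep_ratio=0.25)
kept = sum(p is not None for row in sparse for p in row)
```

Because location information survives, detectors such as YOLO can be trained on these sparse frames directly, which is the premise of the approach above.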
Nowadays, people use online resources such as educational videos and courses. However, such videos and courses are mostly long, so summarizing them is valuable. The video contents (visual, audio, and subtitles) can be analyzed to generate textual summaries, i.e., notes. Video subtitles contain significant information, so summarizing subtitles is an effective way to concentrate on the necessary details. Most existing studies used Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Semantic Analysis (LSA) models to create lecture summaries. This study takes another approach and applies Latent Dirichlet Allocation (LDA), which has proved its effectiveness in document summarization. Specifically, the proposed LDA summarization model follows three phases. The first phase prepares the subtitle file for modelling by performing preprocessing steps such as removing stop words. In the second phase, the LDA model is trained on subtitles to generate the keyword list used to extract important sentences. In the third phase, a summary is generated based on the keyword list. The summaries generated by LDA were lengthy; thus, a length enhancement method is proposed. For the evaluation, the authors developed manual summaries of the existing “EDUVSUM” educational videos dataset. The authors compared the generated summaries with the manually generated ones using two methods: (i) Recall-Oriented Understudy for Gisting Evaluation (ROUGE) and (ii) human evaluation. The LDA-based summaries outperform those generated by TF-IDF and LSA. Besides reducing the summaries' length, the proposed length enhancement method improved the summaries' precision. Other domains, such as news videos, can apply the proposed method for video summarization.
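The third phase, extracting sentences that overlap a keyword list, can be sketched independently of the topic model. In this simplified stand-in, plain word frequency substitutes for the LDA-derived keywords (training an actual LDA model needs a topic-modelling library), and the stop-word list and function name are our own.

```python
import re
from collections import Counter

STOP = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on"}

def keyword_summary(subtitles, n_sentences=1):
    """Score each sentence by its overlap with the top keywords and
    return the highest-scoring ones as the extractive summary."""
    sentences = [s.strip() for s in re.split(r"[.!?]", subtitles) if s.strip()]
    words = [w for w in re.findall(r"[a-z]+", subtitles.lower()) if w not in STOP]
    keywords = {w for w, _ in Counter(words).most_common(5)}
    scored = sorted(
        sentences,
        key=lambda s: -len(keywords & set(re.findall(r"[a-z]+", s.lower()))))
    return scored[:n_sentences]

text = ("Neural networks learn features. Networks need training data. "
        "The weather is nice today.")
summary = keyword_summary(text, 1)
```

Swapping the frequency-based `keywords` set for the per-topic word lists produced by a trained LDA model recovers the pipeline described above.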
The trend of globalization has a profound impact on the development of the whole world, and many international events are held by different countries. Publicity videos have become a significant tool for disseminating the connotations of a city and attracting people from all over the world. At the same time, these videos also reveal different cultural values between China and foreign countries. By analysing these different cultural values, the importance of intercultural communication can be better understood, which will help promote China's cultural soft power.
The authors propose a novel method for transporting multi-view videos that aims to keep the bandwidth requirements on both end users and servers as low as possible. The method is based on application layer multicast, where each end point receives only the selected number of views required for rendering video from its current viewpoint at any given time. The set of selected videos changes in real time as the user's viewpoint changes because of head or eye movements. Techniques for reducing black-outs during fast viewpoint changes were investigated. The performance of the approach was studied through network experiments.
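The view-selection step can be sketched as picking the cameras nearest the user's current viewpoint; only those streams are then joined over the multicast tree. This is an illustrative sketch with made-up camera angles, not the authors' actual selection logic.

```python
def select_views(view_angles, viewpoint, k=2):
    """Return the indices of the k camera views whose angles are
    nearest the user's current viewpoint; only these streams are
    fetched, keeping per-user bandwidth low."""
    return sorted(range(len(view_angles)),
                  key=lambda i: abs(view_angles[i] - viewpoint))[:k]

cameras = [0, 30, 60, 90, 120]          # hypothetical view angles in degrees
needed = select_views(cameras, viewpoint=50, k=2)
```

As the viewpoint moves, `needed` changes and the client joins and leaves multicast groups accordingly; prefetching one extra view on each side is one way to mask the black-outs mentioned above.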
In the current era of multimedia information, it is increasingly urgent to realize intelligent video action recognition and content analysis. In the past few years, video action recognition, as an important direction in computer vision, has attracted many researchers and made much progress. First, this paper reviews the latest video action recognition methods based on Deep Neural Networks and Markov Logic Networks. Second, we analyze the characteristics of each method and its performance from the experimental results. Third, we compare the emphases of these methods and discuss their application scenarios. Finally, we consider and prospect the development trends and directions of this field.
With the advent of services such as telemedicine and telesurgery, providing continuous quality monitoring for these services has become a challenge for network operators. Quality standards for such services are application specific, as medical imagery is quite different from general-purpose images and videos. This paper presents a novel full-reference objective video quality metric that focuses on estimating the quality of wireless capsule endoscopy (WCE) videos containing bleeding regions. Bleeding regions in the gastrointestinal tract are the focus of this research, as bleeding is one of the major reasons behind several diseases within the tract. The method jointly estimates the diagnostic as well as the perceptual quality of WCE videos, and accurately predicts quality in high correlation with subjective differential mean opinion scores (DMOS). The proposed method combines motion quality estimates, bleeding region quality estimates based on a support vector machine (SVM), and perceptual quality estimates using the pristine and impaired WCE videos. Our method, the Quality Index for Bleeding Regions in Capsule Endoscopy (QI-BRiCE) videos, is the first of its kind, and the results show high correlation in terms of Pearson's linear correlation coefficient (PLCC) and Spearman's rank order correlation coefficient (SROCC). An F-test is also provided in the results section to prove the statistical significance of our proposed method.
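PLCC and SROCC, the two agreement measures cited above, are standard and easy to state: SROCC is simply the Pearson correlation computed on ranks. A minimal tie-free implementation with toy predicted-quality and DMOS values:

```python
def plcc(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def srocc(x, y):
    """Spearman's rank-order correlation: PLCC of the ranks.
    This simple ranking assumes no tied values."""
    rank = lambda v: [sorted(v).index(e) for e in v]
    return plcc(rank(x), rank(y))

pred = [1.0, 2.0, 3.0, 4.0]       # toy metric outputs
dmos = [10.0, 20.0, 29.0, 41.0]   # toy subjective scores
```

Here `srocc(pred, dmos)` is exactly 1 because the orderings agree, while `plcc` is slightly below 1 because the relationship is not perfectly linear; a nonlinear mapping is usually fitted before PLCC in quality-assessment studies.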
In recent years, with the rapid development of deep learning technologies, some neural network models have been applied to generate fake media. DeepFakes, a deep learning-based forgery technology, can easily tamper with faces and generate fake videos that are difficult for human eyes to distinguish. The spread of face-manipulation videos readily disseminates fake information. Therefore, it is important to develop effective detection methods to verify the authenticity of videos. Because it is still challenging for current forgery technologies to generate all facial details, and because blending operations are used in the forgery process, the texture details of a fake face are insufficient. Therefore, in this paper, a new method is proposed to detect DeepFake videos. First, texture features are constructed based on the gradient domain, standard deviation, gray level co-occurrence matrix, and wavelet transform of the face region. Then, the features are processed by a feature selection method to form a discriminant feature vector, which is finally fed to an SVM for classification at the frame level. Experimental results on mainstream DeepFake datasets demonstrate that the proposed method achieves ideal performance, proving its effectiveness for DeepFake video detection.
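One of the texture descriptors named above, the gray level co-occurrence matrix (GLCM), counts how often pairs of gray levels co-occur at a fixed pixel offset; statistics such as contrast are then read off it. A minimal sketch on a tiny 4-level image (the paper's feature set is richer than this single offset):

```python
def glcm(img, levels=4, dx=1, dy=0):
    """Gray level co-occurrence matrix for one (dx, dy) offset:
    m[i][j] counts pixel pairs where gray level i is followed by j."""
    h, w = len(img), len(img[0])
    m = [[0] * levels for _ in range(levels)]
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < h and 0 <= c2 < w:
                m[img[r][c]][img[r2][c2]] += 1
    return m

img = [[0, 0, 1],
       [2, 2, 3],
       [0, 1, 1]]
m = glcm(img)
# Contrast: heavily weights co-occurrences of dissimilar gray levels
contrast = sum((i - j) ** 2 * m[i][j] for i in range(4) for j in range(4))
```

A smooth real face yields a GLCM concentrated near the diagonal (low contrast), whereas blending seams and missing texture detail in forged faces shift this statistic, which is what makes GLCM features discriminative here.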
In-loop filtering significantly helps detect and remove blocking artifacts across block boundaries in low-bitrate coded High Efficiency Video Coding (HEVC) frames and improves their subjective visual quality in multimedia services over communication networks. However, with faster processing of complex videos at a low bitrate, some visible artifacts considerably degrade picture quality. In this paper, we propose a four-step fuzzy-based adaptive deblocking filter selection technique. The proposed method efficiently removes quantization noise, blocking artifacts, and corner outliers for HEVC coded videos even at low bitrates. We consider the Y (luma), U (chroma-blue), and V (chroma-red) components in parallel. Finally, we develop a fuzzy system to detect blocking artifacts and apply adaptive filters as required in all four quadrants, namely up 45°, down 45°, up 135°, and down 135°, across horizontal and vertical block boundaries. Experimentation is done on a wide variety of videos, with objective and subjective analysis carried out using MATLAB software and the Human Visual System (HVS). The proposed method substantially outperforms existing post-processing deblocking techniques in terms of YPSNR and BD-rate. The proposed method achieves YPSNR gains of 0.32-0.97 dB, and a BD-rate of +1.69% for the luma component and -0.18% (U) and -1.99% (V) for the chroma components, with respect to state-of-the-art methods. The proposed method has low computational complexity and better parallel processing, and hence is suitable for real-time systems in the near future.
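The core operation of any deblocking filter is smoothing the pixels straddling a block boundary so the coding edge becomes less visible. The sketch below is a deliberately minimal 1-D stand-in for one filter direction, with a fixed strength rather than the fuzzy-selected adaptive filters of the proposed method.

```python
def deblock_boundary(row, boundary, strength=0.25):
    """Pull the two pixels adjacent to a block boundary toward each
    other, softening the artificial step introduced by quantization."""
    out = list(row)
    p, q = boundary - 1, boundary          # pixels straddling the edge
    step = (row[q] - row[p]) * strength
    out[p] = row[p] + step
    out[q] = row[q] - step
    return out

row = [10, 10, 10, 10, 50, 50, 50, 50]     # artificial blocking edge at index 4
smoothed = deblock_boundary(row, boundary=4)
```

A real HEVC-style filter additionally decides per boundary whether to filter at all and how strongly, based on local gradients; the fuzzy system described above plays exactly that selection role.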
Nowadays, one of the most challenging and important problems in computer vision is to detect human activities and recognize them, together with their temporal information, from video data. Video datasets are generated using cameras, available in various devices, that can be in a static or dynamic position; such recordings are referred to as untrimmed videos. Smarter monitoring is a historical necessity in which commonly occurring, regular, and out-of-the-ordinary activities can be automatically identified using intelligent systems and computer vision technology. In a long video, human activity may be present anywhere, and a single or multiple human activities may appear in such videos. This paper presents a deep learning-based methodology to identify locally present human activities in video sequences captured by a single wide-view camera in a sports environment. The recognition process is split into four parts: first, the video is divided into sets of frames; second, the human body parts in a sequence of frames are identified; third, the human activity is identified using a convolutional neural network; and finally, the time information of the observed postures for each activity is determined with the help of a deep learning algorithm. The proposed approach has been tested on two different sports datasets, ActivityNet and THUMOS. Three sports activities, swimming, cricket bowling, and high jump, are considered in this paper and classified with temporal information, i.e., the start and end time of every activity present in the video. A convolutional neural network and long short-term memory are used for feature extraction in temporal action recognition from video data of sports activity. The outcomes show that the proposed method for activity recognition in the sports domain outperforms existing methods.
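The final step, turning per-frame activity predictions into start and end times, can be sketched as run-length grouping of consecutive identical labels. This is an illustrative post-processing sketch under the assumption of one label per frame, not the paper's learned temporal model.

```python
def localize(labels, fps=30.0, background="none"):
    """Group consecutive identical per-frame predictions into
    (activity, start_sec, end_sec) segments, skipping background."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != background:
                segments.append((labels[start], start / fps, i / fps))
            start = i
    return segments

# Hypothetical 30 fps predictions: 1 s background, 2 s bowling, 1 s background
frames = ["none"] * 30 + ["bowling"] * 60 + ["none"] * 30
events = localize(frames)
```

In practice, short spurious runs are usually merged or discarded before reporting, since per-frame classifiers flicker at activity boundaries.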
Background: Real-time use of procedure videos as educational tools has not been studied. We sought to determine whether viewing a video of a medical procedure prior to performing the procedure in the emergency department improves the quality of teaching of procedures, and whether videos are particularly beneficial during periods of emergency department crowding. Methods: In this single-centre, prospective, before-and-after study, standardized data collection forms were completed by both trainees and supervising emergency physicians (EPs) at the end of each emergency department shift in the before (August 2008-March 2009) and after (August 2009-March 2010) phases. Online procedure videos were introduced on emergency department computers in the after phase. The primary outcome measure was the EP rating of the quality of teaching provided (5-point Likert scale). The interaction between crowding and videos was also assessed, to determine whether videos provide a specific additional benefit during periods of emergency department crowding. Results: There were 1159 procedures performed by 192 trainees. The median number of procedures performed per shift was 1.0 (IQR 0-2.0). The mean EP rating of teaching provided was significantly higher in the group that viewed videos, at 4.2 versus 3.7 (p < 0.001). In the adjusted analysis, EP ratings increased by 0.5 with a video (p < 0.001), while the odds of a score of 5.0 were 2.2 times greater if a video was viewed (p = 0.03). The interaction of crowding and procedure videos was not significant (the use of videos increased the average score by 0.24 in times of crowding compared to times of non-crowding, p = 0.19). Conclusions: Use of procedure videos was associated with EP perception of improved quality of teaching provided around procedures. While EPs rated the quality of their teaching as improved overall, the effect of videos on teaching quality was the same in crowded settings as in non-crowded settings.
Existing broadcasting schemes provide services for stored videos. The basic approach in these schemes is to divide the video into segments and organize them over the channels for transmission. Some schemes use segments as the basic unit, whereas others require segments to be further divided into subsegments. In a given scheme, the number of segments or subsegments depends on the bandwidth allocated to the video by the video server. To construct the segments, the video length must be known; if it is unknown, the segments cannot be constructed and the scheme cannot provide the video service. This is an important issue especially in live broadcasting applications in which the ending time of the video is unknown, for example, a cricket match. In this paper, we propose a mechanism for the conservative staircase scheme so that it can support live video broadcasting.
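The segment-over-channels idea common to these schemes can be illustrated with a toy schedule in which earlier segments repeat more often than later ones, so a viewer who tunes in mid-stream never waits longer than one slot for segment 0. This is only an illustrative staircase-style layout, not the actual conservative staircase scheme analyzed in the paper.

```python
def schedule(n_segments, n_slots):
    """Toy staircase-style broadcast plan: channel i rebroadcasts
    segment i every i+1 slots, so early segments (which every viewer
    needs first) repeat most often. None means the channel is idle."""
    return [[seg if slot % (seg + 1) == 0 else None
             for slot in range(n_slots)]
            for seg in range(n_segments)]

plan = schedule(n_segments=3, n_slots=6)
for channel in plan:
    print(channel)
```

The live-broadcast difficulty described above is visible here: the mapping needs `n_segments` up front, which is exactly what an open-ended live event does not provide.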
Since the era of the tourist gaze, film and television works have shaped cultural tourism images, influencing the landscape formation and cultural transmission of a place through the lens and the attention of audiences. With the evolution of media and the development of technology, short videos on rural cultural tourism in China in the digital era go beyond gazing: they directly generate the landscape and culture of a place through real-time, multi-directional interaction. With a mobilization mode of dual sensory integration, these short videos become geographic media, attracting users to participate in emotional communication and tourism practice beyond a purely objective, rational orientation. They draw users to integrate re-localization experiences in the real world from localized media image presentations, and trigger embodied practice through virtual sensory presence. Through the ubiquitous interaction of digital media and reality, the short videos demonstrate the constructive and generative power of symbolic reality over reality itself. With the help of technology, they spread local culture all over the world by pre-storing physical tourism in callable space and topological time. Through the free transduction of relational logical networks, the short videos make the transition from Plato's cave, the mimetic environment, into the real world, awaken perception of the Chinese solar terms from a timeless sequence, and construct tourism and culture, online and offline.
Funding: Supported by the Key Research Program of the Chinese Academy of Sciences (grant number ZDRW-ZS-2021-1-2).
文摘Pulse rate is one of the important characteristics of traditional Chinese medicine pulse diagnosis,and it is of great significance for determining the nature of cold and heat in diseases.The prediction of pulse rate based on facial video is an exciting research field for getting palpation information by observation diagnosis.However,most studies focus on optimizing the algorithm based on a small sample of participants without systematically investigating multiple influencing factors.A total of 209 participants and 2,435 facial videos,based on our self-constructed Multi-Scene Sign Dataset and the public datasets,were used to perform a multi-level and multi-factor comprehensive comparison.The effects of different datasets,blood volume pulse signal extraction algorithms,region of interests,time windows,color spaces,pulse rate calculation methods,and video recording scenes were analyzed.Furthermore,we proposed a blood volume pulse signal quality optimization strategy based on the inverse Fourier transform and an improvement strategy for pulse rate estimation based on signal-to-noise ratio threshold sliding.We found that the effects of video estimation of pulse rate in the Multi-Scene Sign Dataset and Pulse Rate Detection Dataset were better than in other datasets.Compared with Fast independent component analysis and Single Channel algorithms,chrominance-based method and plane-orthogonal-to-skin algorithms have a more vital anti-interference ability and higher robustness.The performances of the five-organs fusion area and the full-face area were better than that of single sub-regions,and the fewer motion artifacts and better lighting can improve the precision of pulse rate estimation.
文摘Audio description (AD), unlike interlingual translation and interpretation, is subject to unique constraints as a spoken text. Facilitated by AD, educational videos on COVID-19 anti-virus measures are made accessible to the visually disadvantaged. In this study, a corpus of AD of COVID-19 educational videos is developed, named the "Audio Description Corpus of COVID-19 Educational Videos" (ADCCEV). Drawing on the model of the Textual and Linguistic Audio Description Matrix (TLADM), this paper aims to identify the linguistic and textual idiosyncrasies of AD themed on the COVID-19 response released by the New Zealand Government. This study finds that, linguistically, the AD scripts use a mix of complete sentences and phrases, the majority in the present simple tense. Present participles and the "with" structure are used for brevity. Vocabulary is diverse, with simpler words for animated explainers. Third-person pronouns are common in educational videos. Color words are a salient feature of AD, where "yellow" denotes urgency, and "red" indicates importance, negativity, and hostility. As for textual idiosyncrasies, coherence is achieved through intermodal components that align with the video's mood and style. AD style varies with the video's purpose, from informative to narrative or expressive.
文摘The learning management system (LMS) is now being used to deliver educational content in both distance and blended setups. An LMS platform has two types of users: educators, who upload the content, and students, who access it. Students usually rely on text notes, books, and video tutorials, while their exams are conducted with formal methods. Formal assessment and examination criteria are ineffective in a restricted learning space, which leads students merely to read educational content and watch videos rather than engage interactively. The aim is to design an interactive LMS and examination video-based interface that addresses the needs of educators and students. It is designed according to human-computer interaction (HCI) principles to build an interactive user interface (UI) informed by user experience (UX). Interactive lectures in the form of annotated videos increase user engagement and improve the self-study context of LMS users. The interface design defines how the design interacts with users and how the interface exchanges information. The findings show that interactive videos in an LMS give users a more personalized learning experience by engaging them in the educational content. The results show a highly personalized learning experience due to the interactive video and the quiz embedded within the video.
文摘For intelligent surveillance videos, anomaly detection is extremely important. Deep learning algorithms have become popular for evaluating real-time surveillance recordings of events such as traffic accidents and criminal or unlawful incidents, including suicide attempts. Nevertheless, deep learning methods for classification, such as convolutional neural networks, require a lot of computing power. Quantum computing is a branch of technology that solves unusual and complex problems using quantum mechanics. As a result, the focus of this research is on developing a hybrid, deep learning based quantum computing model. This research develops a Quantum Computing-based Convolutional Neural Network (QC-CNN) to extract features and classify anomalies from surveillance footage. A quantum circuit, such as the real-amplitude circuit, is utilized to improve the performance of the model. To the best of our knowledge, this is the first work to employ quantum deep learning techniques to classify anomalous events in video surveillance applications. Thirteen anomaly classes from the UCF-Crime dataset are considered. Based on the experimental results, the proposed model efficiently classifies data in terms of the confusion matrix, receiver operating characteristic (ROC), accuracy, area under the curve (AUC), precision, recall, and F1-score. The proposed QC-CNN attained a best accuracy of 95.65%, which is 5.37% higher than other existing models. To measure the efficiency of the proposed work, QC-CNN is also evaluated against classical and quantum models.
文摘Football is one of the most-watched sports, but analyzing players' performance is currently difficult and labor intensive. Performance analysis is done manually, which means that someone must watch video recordings and then log each player's performance. This includes the number of passes and shots taken by each player, the location of the action, and whether or not the play had a successful outcome. Due to the time-consuming nature of manual analysis, interest in automatic analysis tools is high despite the many interdependent phases involved, such as pitch segmentation, player and ball detection, assigning players to their teams, identifying individual players, activity recognition, etc. This paper proposes a system for developing an automatic video analysis tool for sports. The proposed system is the first to integrate multiple phases, such as segmenting the field, detecting the players and the ball, assigning players to their teams, and identifying players' jersey numbers. In team assignment, this research employed unsupervised learning based on convolutional autoencoders (CAEs) to learn discriminative latent representations, minimizing the latent embedding distance between players on the same team while simultaneously maximizing the distance between those on opposing teams. This paper also created a highly accurate approach for real-time detection of the ball. Furthermore, it addressed the lack of jersey number datasets by creating a new dataset with more than 6,500 images for numbers ranging from 0 to 99. Since achieving high performance in deep learning requires a large training set, and the collected dataset was not sufficient, this research utilized transfer learning (TL) to first pretrain the jersey number detection model on another large dataset and then fine-tune it on the target dataset to increase accuracy. To test the proposed system, this paper presents a comprehensive evaluation of its individual stages as well as of the system as a whole.
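The team-assignment idea, separating players by distances in a learned latent space, can be sketched without the CAE itself. The two-centroid clustering below is a generic stand-in operating on already-extracted embeddings; the CAE training, the contrastive distance objective, and the actual appearance features are beyond this sketch, and the embedding values are invented for illustration.

```python
import numpy as np

def assign_teams(embeddings, iters=20, seed=0):
    """Split player embeddings into two teams by two-centroid clustering.

    Stand-in for CAE-based assignment: once same-team embeddings are
    close and opposing-team embeddings are far apart, a simple
    two-means step recovers the team labels.
    """
    x = np.asarray(embeddings, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=2, replace=False)]
    for _ in range(iters):
        # Assign each player to the nearest centroid, then re-estimate.
        labels = np.argmin(((x[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean(axis=0)
    return labels

# Invented embeddings: team A near (0, 0), team B near (5, 5).
emb = np.array([[0.1, 0.0], [0.0, 0.2], [5.1, 4.9], [4.8, 5.2]])
labels = assign_teams(emb)
print(labels[0] == labels[1], labels[2] == labels[3], labels[0] != labels[2])  # → True True True
```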
基金supported in part by ZTE Industry-University-Institute Cooperation Funds.
文摘With the rapid development of immersive multimedia technologies, 360-degree video services have quickly gained popularity, and ensuring a sufficient sense of spatial presence for end users viewing 360-degree videos has become a new challenge. In this regard, accurately measuring users' sense of spatial presence is of fundamental importance for video service providers seeking to improve their service quality. Unfortunately, there is so far no efficient evaluation model for measuring the sense of spatial presence for 360-degree videos. In this paper, we first design an assessment framework to clarify the influencing factors of spatial presence. Related parameters of both 360-degree videos and head-mounted display devices are considered in this framework. Well-designed subjective experiments are then conducted to investigate the impact of the various influencing factors on the sense of presence. Based on the subjective ratings, we propose a spatial presence assessment model that can be easily deployed in 360-degree video applications. To the best of our knowledge, this is the first attempt in the literature to establish a quantitative spatial presence assessment model using easily extracted technical parameters. Experimental results demonstrate that the proposed model can reliably predict the sense of spatial presence.
文摘In this paper, we describe our efforts to find a novel solution for motion deblurring in videos. In addition, our solution must be camera-independent: it is fully implemented in software and is not aware of any of the characteristics of the camera. We found a solution by implementing a convolutional neural network-long short-term memory (CNN-LSTM) hybrid model. Our CNN-LSTM is able to deblur video without any knowledge of the camera hardware. This allows it to be deployed on any system in which the camera can be swapped for another model with different physical characteristics.
文摘Detecting feature points on the human body in video frames is a key step for tracking human movements. Existing methods leverage models of human pose and classification of pixels of the body image, yet occlusion and robustness remain open challenges. In this paper, we present an automatic, model-free feature point detection and action tracking method using a time-of-flight camera. Our method automatically detects feature points for movement abstraction. To overcome errors caused by miss-detection and occlusion, a refinement method is devised that uses the trajectories of the feature points to correct erroneous detections. Experiments were conducted on videos acquired with a Microsoft Kinect camera and on a publicly available video set, with comparisons against state-of-the-art methods. The results demonstrate that our proposed method delivers improved and reliable performance, with an average accuracy in the range of 90%. The trajectory-based refinement also proved effective, recovering detections with a success rate of 93.7%. Our method processed a frame in an average time of 71.1 ms.
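The trajectory-based refinement can be illustrated with a toy version: treat miss-detected frames as gaps in a feature point's trajectory and fill them from the neighboring detections. The paper's actual refinement is more involved; linear interpolation over NaN gaps is an assumed, minimal stand-in.

```python
import numpy as np

def refine_trajectory(track):
    """Fill miss-detections (NaN entries) in a 1-D feature-point
    trajectory by linear interpolation from the surrounding detections."""
    track = np.asarray(track, dtype=float)
    bad = np.isnan(track)                 # frames where detection failed
    idx = np.arange(len(track))
    track[bad] = np.interp(idx[bad], idx[~bad], track[~bad])
    return track

# Frames 2 and 3 are miss-detections on a steadily moving point.
print(refine_trajectory([10.0, 12.0, np.nan, np.nan, 18.0]))  # → [10. 12. 14. 16. 18.]
```

In practice each feature point would carry an (x, y) or (x, y, depth) track and the same interpolation is applied per coordinate.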
文摘Although compressive measurements save data storage and bandwidth, they are difficult to use directly for target tracking and classification without pixel reconstruction, because the Gaussian random matrix destroys the target location information in the original video frames. This paper summarizes our research on target tracking and classification directly in the compressive measurement domain. We focus on one particular type of compressive measurement based on pixel subsampling: the original pixels in video frames are randomly subsampled. Even in this special compressive sensing setting, conventional trackers do not perform satisfactorily. We propose a deep learning approach that integrates YOLO (You Only Look Once) and ResNet (residual network) for multiple target tracking and classification. YOLO is used for multiple target tracking, and ResNet for target classification. Extensive experiments using short-wave infrared (SWIR), mid-wave infrared (MWIR), and long-wave infrared (LWIR) videos demonstrated the efficacy of the proposed approach even though the training data are very scarce.
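The pixel-subsampling measurement described above, unlike a dense Gaussian projection, keeps every surviving pixel at its original location, which is why spatially-aware detectors can still operate on it. A minimal sketch (the sampling rate and frame size are arbitrary choices here, not the paper's settings):

```python
import numpy as np

def subsample_frame(frame, rate, rng):
    """Compressive measurement by random pixel subsampling: keep each
    pixel independently with probability `rate`, zero out the rest.
    Retained pixels keep their original spatial positions."""
    mask = rng.random(frame.shape) < rate
    return frame * mask, mask

rng = np.random.default_rng(42)
frame = rng.integers(0, 256, size=(64, 64)).astype(float)
measured, mask = subsample_frame(frame, rate=0.25, rng=rng)
print(round(mask.mean(), 2))  # roughly 0.25 of pixels survive
```

A tracker trained on such frames sees sparse but correctly positioned intensities, in contrast to a Gaussian-projected measurement where location information is scrambled.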
文摘Nowadays, people use online resources such as educational videos and courses. However, such videos and courses are mostly long, so summarizing them is valuable. The video contents (visual, audio, and subtitles) can be analyzed to generate textual summaries, i.e., notes. Video subtitles contain significant information; therefore, summarizing subtitles is an effective way to concentrate on the necessary details. Most existing studies use Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Semantic Analysis (LSA) models to create lecture summaries. This study takes another approach and applies Latent Dirichlet Allocation (LDA), which has proved effective in document summarization. Specifically, the proposed LDA summarization model follows three phases. The first phase prepares the subtitle file for modelling through preprocessing steps such as stop-word removal. In the second phase, the LDA model is trained on the subtitles to generate a keyword list used to extract important sentences. In the third phase, a summary is generated based on the keyword list. The summaries generated by LDA were lengthy; thus, a length enhancement method is proposed. For evaluation, the authors developed manual summaries of the existing "EDUVSUM" educational video dataset and compared the generated summaries against them using two methods: (i) Recall-Oriented Understudy for Gisting Evaluation (ROUGE) and (ii) human evaluation. The LDA-based summaries outperform those generated by TF-IDF and LSA. Besides reducing summary length, the proposed length enhancement method also improved the summaries' precision rates. Other domains, such as news videos, can apply the proposed method for video summarization.
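The three-phase pipeline can be sketched end to end. The sketch below substitutes a plain term-frequency keyword list for the trained LDA topic model (to stay dependency-free), so it shows the preprocess → keywords → sentence-scoring flow rather than the study's actual LDA step; the stop-word list and the toy subtitles are invented.

```python
import re
from collections import Counter

STOP = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "this", "we"}

def summarize(subtitles, n_keywords=5, n_sentences=2):
    """Phase 1: preprocess (split sentences, lowercase, drop stop words).
    Phase 2: build a keyword list (term frequency stands in for LDA).
    Phase 3: keep the sentences covering the most keywords, in order."""
    sentences = [s.strip() for s in re.split(r"[.!?]", subtitles) if s.strip()]
    words = [w for s in sentences for w in re.findall(r"[a-z']+", s.lower())
             if w not in STOP]
    keywords = {w for w, _ in Counter(words).most_common(n_keywords)}
    ranked = sorted(sentences,
                    key=lambda s: -len(keywords & set(re.findall(r"[a-z']+", s.lower()))))
    top = set(ranked[:n_sentences])
    return ". ".join(s for s in sentences if s in top) + "."

subs = ("Neural networks learn features from data. "
        "Data is split into training and test sets. "
        "My cat slept all day. "
        "Training adjusts network weights to fit the data.")
print(summarize(subs))
```

Swapping the `Counter`-based keyword step for topics drawn from a fitted LDA model recovers the study's design without changing the surrounding pipeline.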
文摘The trend of globalization has a profound impact on the development of the whole world, and many international events are held across countries. Publicity videos have become a significant tool for disseminating the connotations of a city and attracting people from all over the world. At the same time, such videos also reveal the different cultural values of China and foreign countries. By analysing these different cultural values, the importance of intercultural communication can be more deeply understood, which will help promote China's cultural soft power.
基金Project (No. 511568) supported by the European Commission within Framework Program 6 with the acronym 3DTV
文摘The authors propose a novel method for transporting multi-view videos that aims to keep the bandwidth requirements on both end users and servers as low as possible. The method is based on application layer multicast, where each end point receives only the selected number of views required for rendering video from its current viewpoint at any given time. The set of selected videos changes in real time as the user's viewpoint changes due to head or eye movements. Techniques for reducing black-outs during fast viewpoint changes were investigated. The performance of the approach was studied through network experiments.
基金This work was supported in part by the National Science Foundation Project of P.R. China (Grant Nos. 61503424, 61331013).
文摘In the current era of multimedia information, intelligent video action recognition and content analysis are increasingly urgent. In the past few years, video action recognition, an important direction in computer vision, has attracted many researchers and made much progress. First, this paper reviews the latest video action recognition methods based on deep neural networks and Markov logic networks. Second, we analyze the characteristics of each method and their performance based on experimental results. We then compare the emphases of these methods and discuss their application scenarios. Finally, we consider the development trends and future directions of this field.
基金supported by Innovate UK, which is a part of UK Research & Innovation, under the Knowledge Transfer Partnership (KTP) program (Project No. 11433); supported by the Grand Information Technology Research Center Program through the Institute of Information & Communications Technology Planning & Evaluation (IITP) funded by the Ministry of Science and ICT (MSIT), Korea (IITP-2020-2020-0-01612).
文摘With the advent of services such as telemedicine and telesurgery, providing continuous quality monitoring for these services has become a challenge for network operators. Quality standards for such services are application specific, as medical imagery is quite different from general-purpose images and videos. This paper presents a novel full-reference objective video quality metric that estimates the quality of wireless capsule endoscopy (WCE) videos containing bleeding regions. Bleeding regions in the gastrointestinal tract are the focus of this research, as bleeding is one of the major causes of several diseases within the tract. The method jointly estimates the diagnostic as well as the perceptual quality of WCE videos and accurately predicts quality in high correlation with the subjective differential mean opinion scores (DMOS). The proposed method combines motion quality estimates, bleeding-region quality estimates based on a support vector machine (SVM), and perceptual quality estimates using the pristine and impaired WCE videos. Our method, Quality Index for Bleeding Regions in Capsule Endoscopy (QI-BRiCE) videos, is one of its kind, and the results show high correlation in terms of Pearson's linear correlation coefficient (PLCC) and Spearman's rank order correlation coefficient (SROCC). An F-test is also provided in the results section to establish the statistical significance of the proposed method.
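The two agreement statistics reported above, PLCC and SROCC, are easy to state concretely. The sketch below assumes no tied scores (SROCC with ties needs fractional ranks); the sample values are invented, not data from the paper.

```python
import numpy as np

def plcc(x, y):
    """Pearson's linear correlation coefficient: degree of linear agreement."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

def srocc(x, y):
    """Spearman's rank order correlation: PLCC applied to ranks, so it
    rewards any monotonic relation, not just a linear one (no ties assumed)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return plcc(rank(x), rank(y))

# A perfectly monotonic but nonlinear predictor: SROCC is exactly 1,
# PLCC falls below 1 because the relation is not a straight line.
mos = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.0, 4.0, 9.0, 16.0, 25.0]
print(round(srocc(mos, pred), 3), round(plcc(mos, pred), 3))  # → 1.0 0.981
```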
基金supported by the National Natural Science Foundation of China (Nos. U2001202, 62072480, U1736118), the National Key R&D Program of China (Nos. 2019QY2202, 2019QY(Y)0207), the Key Areas R&D Program of Guangdong (No. 2019B010136002), and the Key Scientific Research Program of Guangzhou (No. 201804020068).
文摘In recent years, with the rapid development of deep learning technologies, neural network models have been applied to generate fake media. DeepFakes, a deep learning based forgery technology, can easily tamper with faces and generate fake videos that are difficult for human eyes to distinguish. The spread of face manipulation videos readily propagates false information, so it is important to develop effective detection methods to verify the authenticity of videos. Because current forgery technologies still struggle to generate all facial details, and because blending operations are used in the forgery process, the texture details of fake faces are insufficient. Therefore, in this paper, a new method is proposed to detect DeepFake videos. First, texture features are constructed based on the gradient domain, standard deviation, gray level co-occurrence matrix, and wavelet transform of the face region. Then, the features are processed by a feature selection method to form a discriminant feature vector, which is finally fed to an SVM for classification at the frame level. Experimental results on the mainstream DeepFake datasets demonstrate that the proposed method achieves ideal performance, proving its effectiveness for DeepFake video detection.
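Of the texture features listed, the gray level co-occurrence matrix (GLCM) is the easiest to make concrete. This sketch computes a GLCM for the horizontal-neighbor offset and two classic statistics derived from it; the 4-level quantization and the tiny test patches are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

def glcm_features(img, levels=4):
    """Gray level co-occurrence matrix for the (0, 1) horizontal offset,
    plus two standard texture statistics derived from it."""
    img = np.asarray(img)
    P = np.zeros((levels, levels))
    for a, b in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        P[a, b] += 1                               # count neighbor pairs
    P /= P.sum()                                   # joint probability of pairs
    i, j = np.indices(P.shape)
    contrast = float((P * (i - j) ** 2).sum())     # high when neighbors differ
    energy = float((P ** 2).sum())                 # high for uniform texture
    return contrast, energy

flat = np.zeros((4, 4), dtype=int)                 # perfectly uniform patch
stripes = np.tile([0, 3], (4, 2))                  # rows of 0, 3, 0, 3
print(glcm_features(flat))                         # → (0.0, 1.0)
print(glcm_features(stripes))
```

The intuition for DeepFake detection is that blended or synthesized faces shift such statistics relative to camera-captured skin texture.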
文摘In-loop filtering significantly helps detect and remove blocking artifacts across block boundaries in low-bitrate coded High Efficiency Video Coding (HEVC) frames and improves subjective visual quality in multimedia services over communication networks. However, with faster processing of complex videos at a low bitrate, some visible artifacts considerably degrade picture quality. In this paper, we propose a four-step fuzzy based adaptive deblocking filter selection technique. The proposed method efficiently removes quantization noise, blocking artifacts, and corner outliers for HEVC coded videos, even at low bitrates. We consider the Y (luma), U (chroma-blue), and V (chroma-red) components in parallel. Finally, we develop a fuzzy system to detect blocking artifacts and apply adaptive filters as required in all four quadrants, namely up 45°, down 45°, up 135°, and down 135°, across horizontal and vertical block boundaries. Experimentation was performed on a wide variety of videos, and objective and subjective analyses were carried out with MATLAB software and the human visual system (HVS). The proposed method substantially outperforms existing post-processing deblocking techniques in terms of YPSNR and BD_rate. The proposed method achieved YPSNR gains of 0.32–0.97 dB, and a BD_rate of +1.69% for the luma component and −0.18% (U) and −1.99% (V) for the chroma components with respect to state-of-the-art methods. The proposed method has low computational complexity and good parallel-processing behavior, making it suitable for real-time systems in the near future.
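YPSNR, the fidelity metric reported above, is simply PSNR computed on the luma (Y) plane. A minimal sketch for 8-bit luma follows; the synthetic test frames are invented for illustration.

```python
import numpy as np

def ypsnr(ref_y, deg_y):
    """PSNR of the luma (Y) plane for 8-bit frames, in dB.
    Returns infinity for identical frames (zero mean squared error)."""
    ref_y = np.asarray(ref_y, dtype=float)
    deg_y = np.asarray(deg_y, dtype=float)
    mse = ((ref_y - deg_y) ** 2).mean()
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(255.0 ** 2 / mse)       # 255 = 8-bit peak value

ref = np.zeros((8, 8))
print(ypsnr(ref, ref))                  # identical frames → inf
print(round(ypsnr(ref, ref + 16), 2))   # uniform error of 16 → MSE 256
```

Deblocking gains like the 0.32–0.97 dB reported above are differences between such YPSNR values computed with and without the filter.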
基金This work was supported by the Deanship of Scientific Research at King Khalid University through a General Research Project under Grant Number GRP/41/42.
文摘Nowadays, the most challenging and important problem in computer vision is to detect human activities and recognize them, with temporal information, from video data. Video datasets are generated using cameras available in various devices, which can be in static or dynamic positions, and are referred to as untrimmed videos. Smarter monitoring is a historical necessity in which commonly occurring, regular, and out-of-the-ordinary activities can be automatically identified using intelligent systems and computer vision technology. In a long video, human activity may be present anywhere, and a single or multiple human activities may appear. This paper presents a deep learning-based methodology to identify locally present human activities in video sequences captured by a single wide-view camera in a sports environment. The recognition process is split into four parts: first, the video is divided into sets of frames; then, the human body parts in a sequence of frames are identified; next, the human activity is identified using a convolutional neural network; and finally, the time information of the observed postures for each activity is determined with the help of a deep learning algorithm. The proposed approach has been tested on two different sports datasets, ActivityNet and THUMOS. Three sports activities, swimming, cricket bowling, and high jump, are considered in this paper and classified with temporal information, i.e., the start and end time of every activity present in the video. A convolutional neural network and long short-term memory are used for feature extraction in temporal action recognition from video data of sports activity. The outcomes show that the proposed method for activity recognition in the sports domain outperforms existing methods.
文摘Background: Real-time use of procedure videos as educational tools has not been studied. We sought to determine whether viewing a video of a medical procedure prior to performing the procedure in the emergency department improves the quality of procedural teaching, and whether videos are particularly beneficial during periods of emergency department crowding. Methods: In this single-centre, prospective, before-and-after study, standardized data collection forms were completed by both trainees and supervising emergency physicians (EPs) at the end of each emergency department shift in the before (August 2008-March 2009) and after (August 2009-March 2010) phases. Online procedure videos were introduced on emergency department computers in the after phase. The primary outcome measure was the EP rating of the quality of teaching provided (5-point Likert scale). The interaction between crowding and videos was also assessed, to determine whether videos provide a specific additional benefit during periods of emergency department crowding. Results: There were 1159 procedures performed by 192 trainees. The median number of procedures performed per shift was 1.0 (IQR 0-2.0). The mean EP rating of teaching provided was significantly higher in the group that viewed videos, at 4.2 versus 3.7 (p < 0.001). In the adjusted analysis, EP ratings increased by 0.5 with a video (p < 0.001), while the odds of a score of 5.0 were 2.2 times greater if a video was viewed (p = 0.03). The interaction of crowding and procedure videos was not significant (the use of videos increased the average score by 0.24 in times of crowding compared to times of non-crowding, p = 0.19). Conclusions: Use of procedural videos was associated with EP perception of improved quality of teaching around procedures. While EPs rated the quality of their teaching as improved overall, the effect of videos on teaching quality was the same in crowded settings as in non-crowded settings.
文摘Existing broadcasting schemes provide services for stored videos. The basic approach in these schemes is to divide the video into segments and organize them over the channels for proper transmission. Some schemes use segments as the basic unit, whereas others require segments to be further divided into subsegments. In a given scheme, the number of segments or subsegments depends on the bandwidth allocated to the video by the video server. To construct the segments, the video length must be known; if it is unknown, the segments cannot be constructed and the scheme cannot provide the video service. This is an important issue especially in live broadcasting applications, wherein the ending time of the video is unknown, for example, a cricket match. In this paper, we propose a mechanism for the conservative staircase scheme so that it can support live video broadcasting.