Abstract: Human activity detection and recognition is a challenging task. Video surveillance can benefit greatly from advances in the Internet of Things (IoT) and cloud computing. Artificial intelligence IoT (AIoT) based devices form the basis of a smart city. This research presents Intelligent Dynamic Gesture Recognition (IDGR), which uses a convolutional neural network (CNN) empowered by edit distance for video recognition. The proposed system has been evaluated using AIoT-enabled devices on static and dynamic gestures of Pakistani Sign Language (PSL); however, the proposed methodology can work efficiently for any type of video. The research concludes that deep learning and convolutional neural networks provide the most appropriate solution, retaining the discriminative and dynamic information of the input action. Dynamic gestures are recognized by applying CNN-based image recognition to keyframes extracted from the human activity. Edit distance is then used to find the label of the word to which a given set of frames belongs. Simulation results show that with 400 videos per human action, 100 epochs, and a 234×234 image size, the system achieves an accuracy of 90.79%, which is reasonable for a relatively small dataset when compared with previously published techniques.
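The abstract does not say how edit distance is applied to the CNN output, so the following is only a minimal sketch of one plausible reading. It assumes the CNN assigns each keyframe a symbolic label and that every word in the vocabulary has a reference sequence of such labels; the word whose reference sequence has the smallest Levenshtein distance to the predicted sequence is chosen. The function names, the `templates` dictionary, and the label strings are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    # prev[j] holds the distance between the first i-1 items of a and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty sequence
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def nearest_word(predicted: Sequence[str],
                 templates: Dict[str, Sequence[str]]) -> Tuple[str, int]:
    """Return the word whose reference keyframe sequence is closest to `predicted`."""
    scored = ((word, edit_distance(predicted, ref)) for word, ref in templates.items())
    return min(scored, key=lambda pair: pair[1])


# Hypothetical reference sequences of per-keyframe labels (assumption, not from the paper).
templates = {
    "hello": ["H1", "H2", "H3"],
    "thanks": ["T1", "T2", "H3"],
}

# Hypothetical CNN output for the keyframes of one video; the middle frame is misclassified.
predicted = ["H1", "X", "H3"]
word, distance = nearest_word(predicted, templates)
print(word, distance)
```

Under this assumption, a single misclassified keyframe costs only one substitution, so the video can still be matched to the intended word as long as no other reference sequence is closer.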