Application of Reinforcement Learning Algorithm in Robot Trajectory Optimization Control and Intelligent Deviation Correction (强化学习算法在机器人轨迹优化控制与智能纠偏中的应用)

Cited by: 3

Abstract: To address the low trajectory-control accuracy of traditional robot control algorithms, a trajectory control and deviation correction method based on an optimized reinforcement learning algorithm is proposed. The spatial position relationships and parameter changes of the robot's joints and links during operation are analyzed, and the robot's motion strategy for the next time step is formulated from the change in the current reward-and-punishment function value. The reinforcement learning algorithm is then optimized: the rewards already obtained are accumulated, the optimal trajectory is determined through a comprehensive judgment of these accumulated rewards, and dynamic trajectory deviation correction is applied. Within the robot's position control structure, joint position vector control and link spatial pose control are carried out synchronously, so that control accuracy is improved from a multi-dimensional control perspective. Simulation results show that the proposed algorithm achieves higher trajectory control accuracy, and that both the mean and the variance of the deviation between the end effector and the theoretical trajectory are small.
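The selection rule and the evaluation metric named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the reward definition (negative Euclidean deviation per step), the function names, and the sample trajectories are all assumptions made for illustration.

```python
import math
from statistics import fmean, pvariance


def deviations(actual, reference):
    """Euclidean distance between each end-effector point and its reference point."""
    return [math.dist(a, r) for a, r in zip(actual, reference)]


def cumulative_reward(actual, reference):
    # Assumed reward: the negative deviation at each step, summed over the trajectory.
    return -sum(deviations(actual, reference))


def select_trajectory(candidates, reference):
    """Pick the candidate trajectory with the highest accumulated reward."""
    return max(candidates, key=lambda traj: cumulative_reward(traj, reference))


# Hypothetical 2-D data: a straight reference path and two candidate paths.
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
candidates = [
    [(0.0, 0.3), (1.0, 0.3), (2.0, 0.3)],
    [(0.0, 0.1), (1.0, 0.0), (2.0, 0.1)],
]

best = select_trajectory(candidates, reference)
dev = deviations(best, reference)
mean_dev = fmean(dev)      # mean deviation from the reference trajectory
var_dev = pvariance(dev)   # population variance of the deviation
print(mean_dev, var_dev)
```

Here the second candidate is selected because its accumulated deviation is smaller; the mean and variance computed at the end correspond to the two quantities the abstract reports for the end effector.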

Authors: CHEN Yu-xiang (陈宇翔); LI Qiang-qiang (栗强强) (College of Architectural Engineering, Yiwu Industrial & Commercial College, Yiwu 322000, China; School of Big Data and Computer Science, Chongqing College of Mobile Communication, Chongqing 401520, China)

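Affiliations: College of Architectural Engineering, Yiwu Industrial & Commercial College (义乌工商职业技术学院建筑工程学院); School of Big Data and Computer Science, Chongqing College of Mobile Communication (重庆移通学院大数据与计算机科学学院)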

Source: Modular Machine Tool & Automatic Manufacturing Technique (《组合机床与自动化加工技术》), Peking University Core Journal, 2022, No. 11, pp. 111-114 (4 pages)

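Funding: Science and Technology Research Program of the Jinhua Science and Technology Bureau (2021-4-259); Construction Research Project of the Zhejiang Provincial Department of Housing and Urban-Rural Development (2021K257); Policy Research Project of the Zhejiang Provincial Department of Commerce (2021ZSY50).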

Keywords: reinforcement learning; trajectory optimization; deviation correction; reward-and-punishment function value

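Classification (CLC): TH165 [Mechanical Engineering: Machinery Manufacturing and Automation]; TG659 [Metallurgy and Metalworking: Metal Cutting and Machine Tools]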