Abstract
Frequent traffic accidents show that driving is a high-risk activity, and drivers' risky driving behaviors are a major cause of these accidents. Using an automated driving system to assist or replace human drivers is regarded as an effective way to fundamentally eliminate the driving risk caused by human factors. First, with the goal of maximizing the overall safety of intelligent vehicles, a human-vehicle risk game model was established using the entropy-weight technique for order preference by similarity to ideal solution (TOPSIS) and complete static game theory. A strategy function that maximizes relative utility was proposed and embedded in the reinforcement learning reward function, from which a reward and punishment mechanism oriented toward maximizing the expected vehicle safety was derived. Second, exploiting the strength of reinforcement learning in sequential decision-making problems, a human-machine shared driving control right decision method based on advantage actor-critic (A2C) was proposed. The output of the decision model was optimized by iterating the human and vehicle risk decision weights and the reward function, and the validity of the training process and results was verified with model performance evaluation indices. Finally, the influence of different transition timings on vehicle safety was analyzed through simulation tests, and a control right decision method that limits risky driver behavior and improves vehicle safety in a timely and effective manner was proposed. The results show that the proposed framework, which maps the human risk monitoring module and the vehicle risk monitoring module to the actor and critic modules of A2C respectively, lets the intelligent vehicle interact with the human and vehicle risk states and iteratively update its decisions through rewards to obtain the maximum return, thereby realizing a human-machine shared driving control right decision method oriented toward maximizing vehicle safety.
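The abstract names the entropy-weight TOPSIS method for evaluating human and vehicle risk but does not specify its indicators or normalization. The sketch below shows one common textbook formulation in plain Python: entropy weights computed from each indicator's proportion matrix, vector normalization, and each alternative's relative closeness to the ideal solution. The indicator values, the convention that every indicator is cost-type (larger means riskier), and the function names `entropy_weights` and `topsis` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def entropy_weights(X: Sequence[Sequence[float]]) -> List[float]:
    """Entropy weight method (hypothetical helper, not the paper's code).

    Rows are alternatives (e.g. driving states), columns are indicators.
    Assumes at least two rows and strictly positive entries.
    Indicators whose values spread more across rows get larger weights.
    """
    m, n = len(X), len(X[0])
    k = 1.0 / math.log(m)  # scales each entropy into [0, 1]
    divergences = []
    for j in range(n):
        col = [row[j] for row in X]
        total = sum(col)
        p = [x / total for x in col]  # share of each alternative in indicator j
        e = -k * sum(pi * math.log(pi) for pi in p if pi > 0)
        divergences.append(1.0 - e)  # degree of divergence of indicator j
    s = sum(divergences)
    if s == 0:  # every indicator constant: fall back to equal weights
        return [1.0 / n] * n
    return [d / s for d in divergences]


def topsis(X: Sequence[Sequence[float]], weights: Sequence[float],
           benefit: Sequence[bool]) -> List[float]:
    """Relative closeness of each row to the ideal solution, in [0, 1].

    benefit[j] is True if larger values of indicator j are better,
    False if larger values are worse (cost-type, e.g. a risk indicator).
    """
    m, n = len(X), len(X[0])
    # Vector-normalize each column, then apply the indicator weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in X)) for j in range(n)]
    V = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in X]
    cols = [[V[i][j] for i in range(m)] for j in range(n)]
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for v in V:
        d_best = math.dist(v, best)    # distance to the positive ideal
        d_worst = math.dist(v, worst)  # distance to the negative ideal
        total = d_best + d_worst
        scores.append(d_worst / total if total > 0 else 0.5)
    return scores


if __name__ == "__main__":
    # Hypothetical data: 3 driving states x 3 risk indicators (larger = riskier).
    X = [
        [0.2, 0.5, 1.0],
        [0.6, 1.5, 2.0],
        [1.0, 2.5, 4.0],
    ]
    w = entropy_weights(X)
    scores = topsis(X, w, benefit=[False, False, False])
    print("weights:", [round(x, 3) for x in w])
    print("closeness:", [round(c, 3) for c in scores])
```

With every indicator treated as cost-type, a closeness of 1.0 marks the state that is least risky on all indicators and 0.0 the riskiest, so 1 - C could serve as a larger-is-riskier score. How the paper turns such scores into payoffs for its complete static game and its A2C reward is not stated in the abstract.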
Authors
GUO Bo-cang (郭柏苍)
WANG Yin-lin (王胤霖)
XIE Xian-yi (谢宪毅)
JIN Li-sheng (金立生)
HAN Guang-de (韩广德)
School of Vehicle and Energy, Yanshan University, Qinhuangdao 066004, Hebei, China
Source
China Journal of Highway and Transport (《中国公路学报》)
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2022, No. 3, pp. 153-165 (13 pages)
Funding
National Natural Science Foundation of China (52072333, U19A2069)
Science and Technology Program of Hebei Province (E2020203092, 20310801D, F2021203107)
Keywords
automotive engineering
control right transition
reinforcement learning
human-machine shared driving
automotive human factors engineering
intelligent vehicle