Abstract
This paper presents a new utility-clustering reinforcement learning algorithm, U-Clustering. Unlike the U-Tree algorithm, it requires no generation and statistical testing of fringe nodes. The algorithm first groups instances whose histories match up to a certain length into clusters according to their observation-action values, then performs feature selection within each cluster, and finally compresses the selected features; the compressed features become new nodes in the agent's internal state-space tree. Simulation and experimental analysis on New York Driving [2,13], a difficult partially observable driving task, show that U-Clustering is an effective algorithm for solving large partially observable problems.
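Since the abstract only outlines the three steps (clustering instances by observation-action value, per-cluster feature selection, and feature compression), the following Python sketch illustrates one possible reading of that pipeline. The Instance type, the value_resolution binning, and the spread-based selection rule are assumptions introduced here for illustration, not the paper's own implementation.

    # A minimal, hypothetical sketch of the U-Clustering pipeline described in the
    # abstract; names and criteria are illustrative assumptions, not the authors' code.
    from dataclasses import dataclass
    from collections import defaultdict
    from typing import List, Dict, Tuple

    @dataclass
    class Instance:
        history: Tuple              # recent (observation, action) pairs up to some length
        q_value: float              # observation-action value estimate for this instance
        features: Dict[str, float]  # raw per-instance features

    def u_clustering(instances: List[Instance],
                     value_resolution: float = 0.1,
                     top_k_features: int = 3) -> Dict:
        """Group instances by (matching history, discretized value),
        select informative features per cluster, then compress them."""
        # 1) Clustering: instances with the same history whose observation-action
        #    values fall into the same bin are grouped together.
        clusters = defaultdict(list)
        for ins in instances:
            key = (ins.history, round(ins.q_value / value_resolution))
            clusters[key].append(ins)

        new_nodes = {}
        for key, members in clusters.items():
            # 2) Feature selection: keep the features whose values vary most
            #    within the cluster (a stand-in criterion; the paper's own
            #    selection rule may differ).
            names = members[0].features.keys()
            spread = {n: max(m.features[n] for m in members) -
                         min(m.features[n] for m in members) for n in names}
            selected = sorted(spread, key=spread.get, reverse=True)[:top_k_features]

            # 3) Feature compression: summarize each selected feature by its
            #    cluster mean; the compressed vector would become a new node
            #    in the agent's internal state-space tree.
            compressed = {n: sum(m.features[n] for m in members) / len(members)
                          for n in selected}
            new_nodes[key] = compressed
        return new_nodes

In this reading, each returned entry plays the role of a candidate state-space tree node keyed by the shared history and value bin of its cluster.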
Source
《计算机工程与应用》 (Computer Engineering and Applications)
CSCD
Peking University Core Journal (北大核心)
2005, No. 26, pp. 37-42, 74 (7 pages in total)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60075019)
Keywords
reinforcement learning
utility clustering
partially observable Markov decision processes (POMDPs)