Talk title: Minimax Weight Learning for Absorbing MDPs
Speaker: Yuqiang Li (李育强)
Time: Thursday, October 27, 2022, 14:00-15:00
Location:
In person: Room 644, Comprehensive Building (综合楼644)
Online: Tencent Meeting, ID 184-369-637
Abstract:
Reinforcement learning policy evaluation problems are often modeled as finite- or infinite-horizon MDPs, but such models are often unrealistic in practice. In this paper, we study off-policy policy evaluation for absorbing MDPs. Building on the Minimax Weight Learning (MWL) algorithm, we propose an algorithm, called MWLA, that directly estimates the importance ratio of the state-action measure when the behavior policy is unknown, under the assumption that the data consist of i.i.d. episodes. We investigate the Mean Square Error (MSE) bound of the MWLA method. In the episodic taxi environment, we show that the MSE of the MWLA method decreases as the number of episodes and the truncation length increase, significantly improving the accuracy of policy evaluation.
This talk is based on joint work with Fengying Li and Xianyi Wu.
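About the speaker:
Yuqiang Li is a professor and doctoral supervisor at the School of Statistics, East China Normal University, and director of the editorial office of the Chinese Journal of Applied Probability and Statistics (《应用概率统计》). Li's main research interests include the theory of stochastic processes and its applications, and reinforcement learning. Li has led more than ten research projects, funded by sources including the National Natural Science Foundation of China, the Natural Science Foundation of Shanghai, and the Key Scientific Research and Innovation Program of the Shanghai Municipal Education Commission. Li has published more than 30 papers in journals including Stochastic Processes and Their Applications, Bernoulli, Science China-Mathematics, and Journal of Applied Probability, and this work has been cited by dozens of scholars in China and abroad, including Professor Gorostiza, a member of the Mexican Academy of Sciences.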