AI-ML·Importance 7·2017. 07. 20.·OpenAI Blog

Proximal Policy Optimization

── EN ──────────────────

OpenAI has released Proximal Policy Optimization (PPO), a new class of reinforcement learning algorithms.

We are releasing a new class of reinforcement learning algorithms called Proximal Policy Optimization (PPO). PPO performs comparably to or better than state-of-the-art approaches while being much simpler to implement and tune. It has become the default reinforcement learning algorithm at OpenAI because of its ease of use and good performance.
