The person shown here is a basic user of this site. He or she may be contacted directly for language-related services.
English to Chinese
Chinese to English
Hao Wang
chasing tomorrow's sunrise
Local time: 19:21 CST (GMT+8)
Feedback from clients and colleagues on Willingness to Work Again: no feedback collected.
Freelancer
This person has a SecurePRO™ card.
Services: Translation, Interpreting, Editing/proofreading
Specializes in: Computers: Software; Computers: Systems, Networks; Mathematics & Statistics; Computers (general)
Also works in: Games / Video Games / Gaming / Casino; IT (Information Technology); Science (general); Certificates, Diplomas, Licenses, CVs
Sample translations submitted: 1
English to Chinese: Dimensions of Reinforcement Learning
Source text - English
In this book we have tried to present reinforcement learning not as a collection of individual methods, but as a coherent set of ideas cutting across methods. Each idea can be viewed as a dimension along which methods vary. The set of such dimensions spans a large space of possible methods. By exploring this space at the level of dimensions we hope to obtain the broadest and most lasting understanding. In this chapter we use the concept of dimensions in method space to recapitulate the view of reinforcement learning we have developed in this book and to identify some of the more important gaps in our coverage of the field.
All of the reinforcement learning methods we have explored in this book have three key ideas in common. First, the objective of all of them is the estimation of value functions. Second, all operate by backing up values along actual or possible state trajectories. Third, all follow the general strategy of generalized policy iteration (GPI), meaning that they maintain an approximate value function and an approximate policy, and they continually try to improve each on the basis of the other. These three ideas that the methods have in common circumscribe the subject covered in this book.
We suggest that value functions, backups, and GPI are powerful organizing principles potentially relevant to any model of intelligence.
Two of the most important dimensions along which the methods vary are shown in Figure 10.1. These dimensions have to do with the kind of backup used to improve the value function. The vertical dimension is whether they are sample backups (based on a sample trajectory) or full backups (based on a distribution of possible trajectories). Full backups of course require a model, whereas sample backups can be done either with or without a model (another dimension of variation). The horizontal dimension corresponds to the depth of backups, that is, to the degree of bootstrapping. At three of the four corners of the space are the three primary methods for estimating values: DP, TD, and Monte Carlo. Along the lower edge of the space are the sample-backup methods, ranging from one-step TD backups to full-return Monte Carlo backups. Between these is a spectrum including methods based on n-step backups and mixtures of n-step backups such as the λ-backups implemented by eligibility traces.
Translation - Chinese
我们在本书中探究的所有强化学习方法有三个主要的共同思想。第一，它们的目标都是值函数的估计；第二，它们都通过沿着实际的或可能的状态轨迹进行值备份来运作；第三，它们都遵循广义策略迭代（generalized policy iteration，GPI）的一般方式，意味着它们会维护近似值函数和近似策略，且不断地尝试着依据一方而改进另一方。那些方法共有的这三种思想描绘出了本书覆盖的主题。我们认为，值函数、备份和GPI是潜在地关联着任何智能模型的强大组织原理。
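The source passage above describes the "depth of backup" dimension: one-step TD backups at one end, full-return Monte Carlo backups at the other, with n-step backups spanning the spectrum between them. A minimal sketch of that idea, not taken from the passage itself (the function name, reward list, and value estimates are all illustrative assumptions):

```python
def n_step_return(rewards, values, t, n, gamma=0.9):
    """Compute the n-step return from time t:
    G = R_{t+1} + gamma*R_{t+2} + ... + gamma^(n-1)*R_{t+n} + gamma^n * V(S_{t+n}),
    truncated (no bootstrap term) if the episode ends within n steps."""
    T = len(rewards)                 # episode terminates after T rewards
    end = min(t + n, T)
    g = sum(gamma ** k * rewards[t + k] for k in range(end - t))
    if end < T:                      # bootstrap only if the episode continues
        g += gamma ** (end - t) * values[end]
    return g

# A toy 3-step episode: rewards R_1..R_3 and value estimates V(S_0)..V(S_3).
rewards = [0.0, 0.0, 1.0]
values = [0.5, 0.5, 0.5, 0.0]

td_target = n_step_return(rewards, values, t=0, n=1)  # one-step TD backup
mc_target = n_step_return(rewards, values, t=0, n=3)  # full-return Monte Carlo backup
```

With n = 1 the target bootstraps almost entirely from the next state's value estimate; with n equal to the episode length it uses only the observed rewards, which is exactly the TD-to-Monte-Carlo spectrum the passage describes.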
Years of translation experience: 14. Registered at ProZ.com: Dec 2008.
Adobe Acrobat, Adobe Illustrator, Adobe Photoshop, Déjà Vu, FrontPage, Microsoft Excel, Microsoft Word, PowerPoint
Keywords: Chinese, English, Mathematics, Maths, Computer, IT, research
Profile last updated
Dec 18, 2008