A Brief Look at Transformer and Reinforcement Learning

1. The mechanism of Transformer

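In 2017, Google proposed the Transformer model in the paper "Attention Is All You Need". It replaced the RNN structure commonly used in NLP tasks with a Self-Attention structure. Compared with an RNN, its biggest advantage is that it can be computed in parallel. A detailed diagram and a simplified diagram of the overall Transformer architecture are shown below: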
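To make the parallelism point concrete, here is a minimal NumPy sketch. It assumes a toy vanilla RNN and a stripped-down self-attention with no learned query/key/value projections; both are simplifications for illustration only. The RNN has to walk the sequence step by step because h_t depends on h_{t-1}, while self-attention computes all pairwise interactions in a single matrix product:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 5, 4
X = rng.normal(size=(seq_len, d))  # one sentence: seq_len tokens, each a d-dim embedding

# Toy vanilla RNN weights (hypothetical, random)
W_h = rng.normal(size=(d, d)) * 0.1
W_x = rng.normal(size=(d, d)) * 0.1

def rnn_states(X):
    """h_t = tanh(h_{t-1} W_h + x_t W_x): each step needs the previous one."""
    h = np.zeros(X.shape[1])
    states = []
    for x_t in X:                      # strictly sequential over time steps
        h = np.tanh(h @ W_h + x_t @ W_x)
        states.append(h)
    return np.stack(states)

def self_attention(X):
    """Simplified self-attention (Q = K = V = X): all positions in one shot."""
    scores = X @ X.T / np.sqrt(X.shape[1])              # (seq_len, seq_len), all pairs at once
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)       # row-wise softmax
    return weights @ X                                  # (seq_len, d)

H_rnn = rnn_states(X)
H_att = self_attention(X)
print(H_rnn.shape, H_att.shape)  # (5, 4) (5, 4)
```

Because the attention output is a function of the whole matrix X at once, every row can be computed simultaneously on parallel hardware; the RNN loop has no such shortcut.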

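The Transformer is essentially an Encoder-Decoder architecture, so the Transformer block in the middle of the figure can be split into two parts: the Encoder component and the Decoder component. The Encoder component is a stack of identical blocks (the paper uses 6 of them), and the Decoder component is likewise a stack of the same number of identical blocks (also 6 in the paper).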

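Each encoder block consists of two sub-layers: a Self-Attention layer and a fully connected (feed-forward) layer. The Self-Attention layer plays a role similar to a convolution: it is the layer that extracts features. Taking translation as an example, the Encoder is shown in the figure below (in essence, it adds a matrix of pairwise similarities between tokens, somewhat like a covariance matrix, called E).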
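A minimal sketch of one encoder block stacked 6 times, assuming single-head attention, random untrained weights, no biases, no positional encoding, and a layer norm without learned scale and shift. Reading the matrix called E above as the softmax-normalized score matrix softmax(QK^T / sqrt(d_k)) from the paper is an interpretation: each row holds how strongly one token attends to every other token. The helper names (init_block, encoder_block) are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_layers = 8, 32, 6  # paper: d_model=512, d_ff=2048; tiny values here for illustration

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-6):
    # Normalize each token vector to zero mean / unit variance (no learned scale/shift)
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def init_block():
    s = 1 / np.sqrt(d_model)
    return {
        "Wq": rng.normal(scale=s, size=(d_model, d_model)),
        "Wk": rng.normal(scale=s, size=(d_model, d_model)),
        "Wv": rng.normal(scale=s, size=(d_model, d_model)),
        "W1": rng.normal(scale=s, size=(d_model, d_ff)),
        "W2": rng.normal(scale=1 / np.sqrt(d_ff), size=(d_ff, d_model)),
    }

def encoder_block(X, p):
    # Sub-layer 1: self-attention; E is the (seq_len, seq_len) matrix of token-to-token weights
    Q, K, V = X @ p["Wq"], X @ p["Wk"], X @ p["Wv"]
    E = softmax(Q @ K.T / np.sqrt(d_model))
    X = layer_norm(X + E @ V)                       # residual connection + layer norm
    # Sub-layer 2: position-wise fully connected network with ReLU
    ffn = np.maximum(0, X @ p["W1"]) @ p["W2"]
    return layer_norm(X + ffn), E

X = rng.normal(size=(5, d_model))                   # 5 source tokens (embeddings assumed given)
blocks = [init_block() for _ in range(n_layers)]    # 6 identical (but separately weighted) blocks
for p in blocks:
    X, E = encoder_block(X, p)
print(X.shape, E.shape)  # (5, 8) (5, 5)
```

The output of the last block keeps one vector per source token; in the full model this is what the decoder attends to.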

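Each decoder block is built from the same kinds of sub-layers, but it has three of them: a masked Self-Attention layer, an Encoder-Decoder attention layer that attends over the encoder output, and a fully connected layer. Unlike the encoder's Self-Attention layer, the masked Self-Attention layer adds a matrix M that prevents looking into the future, i.e. the current word must not depend on the words that come after it. The Decoder is shown below.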

2. The mechanism of Reinforcement Learning

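Reinforcement learning models a task as a Markov decision process and learns how to make decisions from the feedback the environment provides. In practice, the most important issue is how costly it is to design the Reward. There are two ways to design it: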

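The first gives a small reward for every interaction with the environment; at the end, these small rewards are combined with weights into a total Reward.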

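Pros: the training cycle is short. Cons: it can prevent the truly optimal solution from emerging; if the goal is to surpass humans, it is better not to constrain the agent too much. Current real-world applications: autonomous driving, various board games, ChatGPT.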

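The second gives only a single final Reward. Its pros and cons are the opposite of the first. Real-world applications: various video games.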
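Both schemes are eventually summed into a return. A common choice, assumed here, is the discounted return-to-go G_t = r_t + γ r_{t+1} + γ² r_{t+2} + ..., which is one way to read the "weighted" total above. The two reward sequences are hypothetical:

```python
def returns_to_go(rewards, gamma=0.9):
    """G_t = r_t + gamma * G_{t+1}, computed backwards from the last step."""
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

dense = [0.1, 0.1, 0.1, 0.1, 1.0]   # hypothetical: a small reward at every interaction
sparse = [0.0, 0.0, 0.0, 0.0, 1.0]  # hypothetical: only a final reward
print(returns_to_go(dense))
print(returns_to_go(sparse))
```

With the dense scheme every step carries some reward of its own; with the sparse scheme the earlier steps only receive credit through the discounted final reward, which is one reason sparse-reward training tends to need more interaction.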

The relevant definitions are as follows:

Compared with unsupervised learning, reinforcement learning has a different goal. While the goal of unsupervised learning is to find similarities and differences between data points, the goal of reinforcement learning is to find a suitable action model that maximizes the agent's total cumulative reward. The figure below illustrates the action-reward feedback loop of a generic RL model.

Environment: the world in which the agent operates
State: the current situation of the agent
Reward: feedback from the environment
Policy: the method that maps the agent's state to actions
Value: the expected cumulative future reward an agent will receive starting from a particular state (or, for an action value, by taking a particular action in that state)
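The definitions above fit together in the agent-environment loop. A minimal sketch, assuming a hypothetical 1-D corridor environment (positions 0 to 4, goal at 4) and a fixed hand-written policy; Corridor, policy and run_episode are illustrative names, not a library API:

```python
class Corridor:
    """Hypothetical toy environment: positions 0..n-1, goal at n-1."""

    def __init__(self, n=5):
        self.n = n
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: -1 (left) or +1 (right); walls clamp the position
        self.state = min(max(self.state + action, 0), self.n - 1)
        done = self.state == self.n - 1
        reward = 1.0 if done else 0.0  # reward: feedback from the environment
        return self.state, reward, done

def policy(state):
    # policy: maps a state to an action; this one always moves right
    return +1

def run_episode(env, policy, max_steps=100):
    state, total, trajectory = env.reset(), 0.0, []
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        total += reward
        state = next_state
        if done:
            break
    return trajectory, total

trajectory, total = run_episode(Corridor(), policy)
print(total)  # 1.0
```

The value of a state under this policy would be the expected (discounted) reward collected from that state onward; learning a better policy means changing `policy` based on the rewards observed in loops like this one.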

Markov Decision Processes (MDPs) are mathematical frameworks for describing an environment in RL, and almost all RL problems can be formulated as MDPs. An MDP consists of a finite set of environment states S, a set of possible actions A(s) in each state, a real-valued reward function R(s), and a transition model P(s' | s, a). However, in real-world settings we usually lack prior knowledge of the environment dynamics. Model-free RL methods come in handy in such cases.
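To contrast the two settings, here is a sketch on a hypothetical corridor MDP. `value_iteration` uses the transition model P(s' | s, a) and R directly (the model is known), while `q_learning`, one standard model-free method picked here as an example, only samples transitions and never reads P. Rewards are attached to the state being entered, which is one reading of R(s); γ = 0.9, α = 0.5 and ε = 0.2 are arbitrary illustrative choices:

```python
import random

N, GOAL, GAMMA = 5, 4, 0.9
STATES = range(N)
ACTIONS = (-1, +1)

def next_state(s, a):
    return min(max(s + a, 0), N - 1)

# Model: P[(s, a)] maps each next state s' to P(s' | s, a); deterministic here.
P = {(s, a): {next_state(s, a): 1.0} for s in STATES for a in ACTIONS}

def R(s):
    # reward for entering state s
    return 1.0 if s == GOAL else 0.0

def value_iteration(tol=1e-9):
    """Model-based: repeatedly apply the Bellman optimality backup using P and R."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s == GOAL:
                continue  # terminal state keeps V = 0
            best = max(
                sum(p * (R(s2) + GAMMA * V[s2]) for s2, p in P[(s, a)].items())
                for a in ACTIONS
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def q_learning(episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    """Model-free: learns Q(s, a) from sampled transitions only; P is never read."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)                       # explore
            else:
                best = max(Q[(s, b)] for b in ACTIONS)        # exploit, ties broken at random
                a = rng.choice([b for b in ACTIONS if Q[(s, b)] == best])
            s2 = next_state(s, a)                             # the environment applies its dynamics
            r = R(s2)
            target = r if s2 == GOAL else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

V = value_iteration()
Q = q_learning()
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES if s != GOAL}
print(V)
print(greedy)
```

With γ = 0.9, value iteration gives V(3) = 1, V(2) = 0.9, V(1) = 0.81 and V(0) = 0.729 (up to floating-point rounding), and after enough episodes the greedy policy read off the learned Q table also moves right in every non-terminal state, without ever consulting P.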

More on ChatGPT applications

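1. Customer service: ChatGPT can be used for chatbots, intelligent customer service, and similar applications, helping users solve problems and improving customer satisfaction.

2. Finance: ChatGPT can be used for intelligent customer service, risk control, and similar applications, improving the efficiency and quality of financial services.

3. Healthcare: ChatGPT can be used for intelligent diagnosis, medical record management, medical Q&A, and similar applications, improving the efficiency and quality of medical services.

4. Retail: ChatGPT can be used for intelligent shopping guides, customer service, and similar applications, improving the user experience and driving sales.

5. Education: ChatGPT can be used for intelligent teaching, intelligent tutoring, and similar applications, improving the efficiency and quality of educational services.

6. Tourism: ChatGPT can be used for intelligent customer service, intelligent tour guides, and similar applications, improving the efficiency and quality of tourism services.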

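In short, ChatGPT can be applied in virtually any field that involves processing natural language, helping companies improve service quality, reduce costs, and increase efficiency.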

