
Agent Attention: On the Integration of Softmax and Linear Attention


This article is an interpretation and summary of the paper "Agent Attention: On the Integration of Softmax and Linear Attention" (original link given in the post). The paper proposes a new attention design for Transformers that combines the strengths of Linear Attention and Softmax Attention, striking a favorable balance between computational efficiency and representation power. Specifically, Agent Attention introduces an additional set of agent tokens that act as proxies for the query tokens.

An Elegant Fusion of Softmax and Linear Attention: Agent Attention Drives a New Upgrade of the Attention Mechanism - Zhihu

By 科技猛兽, edited by 极市平台. The article interprets Agent Attention, which integrates the Softmax and linear attention mechanisms (from Prof. Gao Huang's group at Tsinghua University). The attention module is the key component in Transformers. While the global attention mechanism offers high expressiveness, its excessive computational cost restricts its applicability in various scenarios. In this paper, we propose a novel attention paradigm, Agent Attention, to strike a favorable balance between computational efficiency and representation power.

This paper proposes a new attention mechanism, Agent Attention, which addresses the high computational complexity of the Softmax attention used in conventional Transformer models while preserving its strong expressive power. Notably, agent attention has shown remarkable performance in high-resolution scenarios, owing to its linear attention nature. For instance, when applied to Stable Diffusion, our agent attention accelerates generation and substantially enhances image generation quality without any additional training.

Agent Attention: On the Integration of Softmax and Linear Attention

In this paper, to break the low-rank dilemma of linear attention, we conduct rank analysis from two perspectives: the KV buffer and the output features. Consequently, we introduce Rank-Augmented Linear Attention (RALA), which rivals the performance of Softmax attention while maintaining linear complexity and high efficiency.

  • Implementation of Agent Attention in Pytorch
  • Agent Attention: On the Integration of Softmax and Linear Attention
  • dblp: Agent Attention: On the Integration of Softmax and Linear Attention.
  • [Deep Learning] Attention Mechanisms (7): Agent Attention - CSDN Blog

Agent Attention (代理注意力). Agent Attention: On the Integration of Softmax and Linear Attention. Summary: the paper proposes a new attention mechanism, Agent Attention, which introduces a set of agent tokens to balance computational efficiency and representation power. On Nov 1, 2024, Dongchen Han and others published Agent Attention: On the Integration of Softmax and Linear Attention (ResearchGate). Overview: the paper explores the integration of softmax and linear attention mechanisms in transformer models, aiming to improve performance and efficiency. It introduces a novel attention module called Agent Attention, which combines the strengths of both.

The paper introduces Agent Attention, integrating agent tokens with Softmax and linear attention to reduce computation and boost performance on vision tasks.

Agent Attention: On the Integration of Softmax and Linear Attention. Paper: https://arxiv.org/pdf/2312.08874. Problem: the widely used Softmax attention mechanism has excessively high computational complexity in vision Transformer models, which limits its application in many scenarios. Nonetheless, the unsatisfactory performance of linear attention greatly limits its practical application in various scenarios. In this paper, we take a step forward to close the gap between the linear and Softmax attention with novel theoretical analyses, which demystify the core factors behind the performance deviations. 3. The relationship of Agent Attention to Softmax and linear attention: Agent Attention is in fact an elegant integration of the two; it retains the global context modeling ability of Softmax attention while enjoying the efficient computational complexity of linear attention (see the complexity sketch below). 4. Experimental analysis.
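To make the complexity contrast concrete, here is a rough accounting (my own sketch, not from the paper's text: N is the number of image/query tokens, n the number of agent tokens, d the channel dimension; constants and projections are ignored):

```latex
% Softmax attention: a full N x N similarity map dominates the cost.
\Omega_{\mathrm{Softmax}} = \mathcal{O}\!\left(N^{2} d\right)

% Agent attention: two "thin" attention maps of shape n x N and N x n.
\Omega_{\mathrm{Agent}} = \mathcal{O}\!\left(n N d\right) + \mathcal{O}\!\left(N n d\right)
                        = \mathcal{O}\!\left(N n d\right)

% With a small, fixed number of agent tokens (n << N),
% the cost grows linearly in the token count N.
```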

[Deep Learning] Attention Mechanisms (7): Agent Attention - CSDN Blog

Notably, agent attention has shown remarkable performance in high-resolution scenarios, owing to its linear attention nature. For instance, when applied to Stable Diffusion, our agent attention accelerates generation and substantially enhances image generation quality without any additional training. Code is available at this https URL.

An illustration of our agent attention and agent attention module. (a) Agent attention uses agent tokens to aggregate global information and distribute it to individual image tokens, resulting in a practical integration of Softmax and linear attention. σ(·) represents the Softmax function. In (b), we depict the information flow of the agent attention module. As a showcase, we acquire agent tokens through pooling.
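The information flow in (b) can be sketched in a few lines of PyTorch. This is a minimal single-head illustration written for this summary, not the authors' released implementation: agent tokens are obtained here by 1-D adaptive pooling of the queries as a stand-in for the pooling used in the paper, and only the two attention steps of the full module are kept.

```python
import torch
import torch.nn.functional as F

def agent_attention(q, k, v, num_agents=49):
    """Minimal sketch of agent attention for a single head.

    q, k, v: (B, N, d) query/key/value tokens.
    num_agents: number of agent tokens n (n << N).
    """
    B, N, d = q.shape
    # Acquire agent tokens by pooling the queries (the "showcase" from the figure).
    a = F.adaptive_avg_pool1d(q.transpose(1, 2), num_agents).transpose(1, 2)  # (B, n, d)

    # Agent aggregation: agents attend to all keys/values (Softmax over N).
    agent_attn = torch.softmax(a @ k.transpose(1, 2) / d ** 0.5, dim=-1)      # (B, n, N)
    agent_v = agent_attn @ v                                                  # (B, n, d)

    # Agent broadcast: queries attend to the agents (Softmax over n).
    query_attn = torch.softmax(q @ a.transpose(1, 2) / d ** 0.5, dim=-1)      # (B, N, n)
    return query_attn @ agent_v                                               # (B, N, d)

# Toy usage: 196 tokens (a 14x14 feature map), 64 channels, 49 agent tokens.
x = torch.randn(2, 196, 64)
out = agent_attention(x, x, x)
print(out.shape)  # torch.Size([2, 196, 64])
```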

While the paper raises some questions about the complexity and interpretability of the Agent Attention module, its core contribution of integrating softmax and linear attention in a novel way is a significant step forward in the field.

Interestingly, we show that the proposed agent attention is equivalent to a generalized form of linear attention. Therefore, agent attention seamlessly integrates the powerful Softmax attention and the highly efficient linear attention. This work comes from Tsinghua University. It is an improvement over conventional Self-Attention (SA), again aiming to push the quadratic QK computational complexity toward linear. The approach taken here is to introduce a new type of query, called the Agent (A), which turns the (Q, K, V) paradigm of SA into a (Q, A, K, V) paradigm. Agent Attention: A Brief Look at the Integration of Softmax and Linear Attention. Abstract: The attention module is the key component in Transformers. While the global attention mechanism offers high expressiveness, its excessive computational cost restricts its applicability in various scenarios.

Interestingly, the paper shows that Agent Attention is equivalent to a generalized form of linear attention. Agent attention therefore seamlessly integrates the powerful Softmax attention and the highly efficient linear attention. Agent Attention: On the Integration of Softmax and Linear Attention. Method: the paper proposes a novel attention mechanism, Agent Attention, that strikes a favorable balance between computational efficiency and representation power.

Agent Attention: On the Integration of Softmax and Linear Attention: Paper and Code. The attention module is the key component in Transformers. While the global attention mechanism offers high expressiveness, its excessive computational cost restricts its applicability in various scenarios. In this paper, we propose a novel attention paradigm, Agent Attention, to strike a favorable balance between computational efficiency and representation power.

dblp: Agent Attention: On the Integration of Softmax and Linear Attention.

Interestingly, we show that the proposed agent attention is equivalent to a generalized form of linear attention. Therefore, agent attention seamlessly integrates the powerful Softmax attention and the highly efficient linear attention. Conventional attention is computed as follows (x is the input, W are the projection weights). Softmax Attention replaces Sim(Q, K) in that expression with the Softmax similarity, while Linear Attention uses a feature-map similarity for Sim(Q, K). For simplicity, Softmax Attention and Linear Attention can be written in a single unified form, and Agent Attention can then be written as a composition of two Softmax attentions, which is equivalent to a generalized linear attention (A being the introduced agent tokens).
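The formulas referenced above appeared as images in the original blog post and are missing here; the following is my reconstruction from the surrounding description (σ denotes the Softmax function, φ a kernel feature map, x the input, W the projection weights; scaling factors are omitted):

```latex
% Generic attention
Q = xW_Q,\qquad K = xW_K,\qquad V = xW_V,\qquad O = \mathrm{Sim}(Q,K)\,V

% Softmax attention
\mathrm{Sim}(Q,K) = \sigma\!\left(QK^{\top}\right)

% Linear attention
\mathrm{Sim}(Q,K) = \phi(Q)\,\phi(K)^{\top}

% Agent attention with agent tokens A: aggregate, then broadcast
O = \sigma\!\left(QA^{\top}\right)\left(\sigma\!\left(AK^{\top}\right)V\right)

% Reading the two Softmax maps as generalized feature maps, this is
% exactly the linear-attention form, computable in O(Nnd):
O = \phi_q(Q)\,\phi_k(K)^{\top} V,\qquad
\phi_q(Q) = \sigma\!\left(QA^{\top}\right),\quad
\phi_k(K) = \sigma\!\left(AK^{\top}\right)^{\!\top}
```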

Agent Attention introduces an additional set of agent tokens. These agent tokens first act as proxies for the query tokens, aggregating information from K and V, and then broadcast that information back to Q. Because the number of agent tokens can be made much smaller than the number of query tokens, Agent Attention is considerably more efficient than the widely adopted Softmax attention while preserving global context modeling.
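As a concrete illustration of why fewer agent tokens translate into lower cost, here is a small back-of-the-envelope script (my own example configuration, not taken from the paper): it counts attention-map entries for a 64×64 feature map with 64 agent tokens.

```python
# Attention-map sizes for N image tokens and n agent tokens.
N = 64 * 64   # e.g. a 64x64 feature map -> 4096 tokens
n = 64        # number of agent tokens (n << N)

softmax_entries = N * N          # one full N x N attention map
agent_entries = n * N + N * n    # agent aggregation (n x N) + broadcast (N x n)

print(f"Softmax attention map entries: {softmax_entries:,}")  # 16,777,216
print(f"Agent attention map entries:   {agent_entries:,}")    # 524,288
print(f"Reduction factor:              {softmax_entries / agent_entries:.0f}x")  # 32x
```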