AI-MLยท์ค‘์š”๋„ 8ยท2026. 05. 25.ยทr/MachineLearning

๐ƒ๐ž๐ฅ๐ญ๐š ๐€๐ญ๐ญ๐ž๐ง๐ญ๐ข๐จ๐ง ๐‘๐ž๐ฌ๐ข๐๐ฎ๐š๐ฅ๐ฌ [R]

โ”€โ”€ KO โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€

๋ธํƒ€ ์ฃผ์˜ ์ž”์ฐจ๊ฐ€ ์ถœ์‹œ๋˜์–ด ๊นŠ์€ ๋ ˆ์ด์–ด์—์„œ ๋ผ์šฐํŒ… ๋ถ•๊ดด๋ฅผ ๊ฐœ์„ ํ•ฉ๋‹ˆ๋‹ค.

๋ธํƒ€ ์ฃผ์˜ ์ž”์ฐจ(DAR)๋Š” ์ž”์—ฌ ์—ฐ๊ฒฐ์„ ๊ฐœ์„ ํ•˜๋Š” ์—…๊ทธ๋ ˆ์ด๋“œ๋กœ, ๊นŠ์€ ๋ ˆ์ด์–ด์—์„œ์˜ ๋ผ์šฐํŒ… ๋ถ•๊ดด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•ฉ๋‹ˆ๋‹ค. ๊ธฐ์กด์˜ ๋ˆ„์  ํžˆ๋“  ์ƒํƒœ ๋Œ€์‹  ๋ธํƒ€ ๊ฐ’์„ ์‚ฌ์šฉํ•ด ๋ผ์šฐํŒ…์˜ ์ •ํ™•๋„๋ฅผ ๋†’์ด๊ณ , != 1.8๋ฐฐ์˜ ๊ต์ฐจ ๋ ˆ์ด์–ด ๋ผ์šฐํŒ…์„ ํ–ฅ์ƒ์‹œํ‚ต๋‹ˆ๋‹ค. ๋ธํƒ€ ๋ธ”๋ก์€ ์ ์€ ํŒŒ๋ผ๋ฏธํ„ฐ ์˜ค๋ฒ„ํ—ค๋“œ๋กœ ์„ฑ๋Šฅ์„ ๋†’์ด๋ฉด์„œ๋„ ๋ฉ”๋ชจ๋ฆฌ ์‚ฌ์šฉ๋Ÿ‰์„ ๊ฐ์†Œ์‹œํ‚ต๋‹ˆ๋‹ค.


โ”€โ”€ EN โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€

Delta Attention Residuals are released to improve routing collapse in deep layers.

Delta Attention Residuals (DAR) are a drop-in upgrade to existing residual connections, addressing routing collapse issues prevalent in deep layers. By utilizing delta values instead of cumulative hidden states, DAR improves cross-layer routing by 1.8 times. The delta block introduces minimal parameter overhead while enhancing performance and reducing memory usage.

์›๋ฌธ ๋ณด๊ธฐ โ†’๋ชฉ๋ก์œผ๋กœ