AI-ML·Importance 7·2026-06-11·r/MachineLearning

Adaptive Tokenisation Via Temporal Redundancy Masking And Latent Inpainting [R]

── EN ──────────────────

Introducing a method for adaptive video tokenization that reduces computational overhead without iteration.

This paper addresses adaptive video tokenization, which dynamically allocates a token budget according to the visual complexity of each sequence. The authors propose an approach that reduces the computational overhead of existing methods: a parameter-free adaptive token allocation mechanism that applies a fixed threshold to identify and drop redundant latent positions. As a result, static scenes are compressed more aggressively, while dynamic sequences retain more tokens. Finally, the authors propose the Latent Inpainting Transformer, a simple and efficient model that reconstructs the dropped positions.
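The summary does not give the paper's exact redundancy criterion or the architecture of the Latent Inpainting Transformer, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes latents shaped frames × positions × channels, treats a position as redundant when its mean absolute change from the previous frame falls below a fixed threshold (one plausible reading of "temporal redundancy masking" in the title), and replaces the learned inpainting model with a naive copy-from-previous-frame fill. All function names (`keep_mask`, `drop_redundant`, `fill_from_previous`, `token_count`) are hypothetical.

```python
from typing import List, Optional

Vec = List[float]
Frames = List[List[Vec]]  # [T][N][C]: frames x latent positions x channels


def keep_mask(latents: Frames, threshold: float) -> List[List[bool]]:
    """keep[t][n] is True if latent position n of frame t is retained.

    Hypothetical criterion: frame 0 is kept in full; for t > 0 a position is
    dropped when its mean absolute change from the previous frame is below
    the fixed `threshold`. No learned parameters are involved.
    """
    mask = [[True] * len(latents[0])]
    for t in range(1, len(latents)):
        row = []
        for cur, prev in zip(latents[t], latents[t - 1]):
            change = sum(abs(a - b) for a, b in zip(cur, prev)) / len(cur)
            row.append(change >= threshold)
        mask.append(row)
    return mask


def drop_redundant(latents: Frames, mask: List[List[bool]]) -> List[List[Optional[Vec]]]:
    """Replace dropped positions with None; only kept vectors remain."""
    return [
        [vec if keep else None for vec, keep in zip(frame, row)]
        for frame, row in zip(latents, mask)
    ]


def fill_from_previous(sparse: List[List[Optional[Vec]]]) -> Frames:
    """Naive stand-in for the learned inpainting model: each dropped position
    is copied from the reconstructed previous frame. Assumes frame 0 has no
    dropped positions (guaranteed by keep_mask)."""
    out: Frames = []
    for t, frame in enumerate(sparse):
        recon = [
            list(vec) if vec is not None else list(out[t - 1][n])
            for n, vec in enumerate(frame)
        ]
        out.append(recon)
    return out


def token_count(mask: List[List[bool]]) -> int:
    """Number of latent tokens kept across the whole clip."""
    return sum(sum(row) for row in mask)


# Usage: 4 frames, 2 latent positions, 2 channels each.
static = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(4)]               # nothing moves
dynamic = [[[float(t), 0.0], [0.0, float(-t)]] for t in range(4)]   # every position changes

print(token_count(keep_mask(static, 0.1)))   # 2: only frame 0 survives
print(token_count(keep_mask(dynamic, 0.1)))  # 8: every position is kept
```

Because the threshold is fixed rather than learned, the token budget falls out of the content itself: the static clip collapses to its first frame, while the dynamic clip keeps all eight tokens. Comparing against the immediately preceding frame is a simplifying choice; slow drift below the threshold can accumulate, which is one reason a learned inpainting model (rather than the copy fill used here) is needed for good reconstruction.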
