AI-ML·Importance 7·2026. 05. 23.·r/MachineLearning

Per-pixel bounding-box regression + DBSCAN for handwritten word detection - visual walkthrough of WordDetectorNet [P]

── KO ──────────────────

WordDetectorNet is a handwritten word detection model that uses per-pixel bounding-box regression and DBSCAN.

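WordDetectorNet detects words at the pixel level. For every pixel classified as a word pixel, it regresses the distances from that pixel to the edges of the word's bounding box. The architecture is built on ResNet18 and fuses features from several intermediate stages. Candidate boxes are generated from the segmentation output, and DBSCAN is applied to them to produce the final detections. The post explains the model's architecture and training process in detail.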
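A minimal NumPy sketch of the per-pixel regression idea, assuming each word pixel stores four distances (top, bottom, left, right) to the edges of its word's box and that the network outputs a single word-pixel probability map. `encode_targets` and `decode_candidates` are hypothetical names, and the 0.5 threshold is an illustrative choice, not a value from the post. Encoding one box and decoding it again shows that every word pixel reproduces the same candidate box.

```python
import numpy as np


def encode_targets(boxes, height, width):
    """Hypothetical training-target encoding: mark word pixels and, for each,
    store distances to the top, bottom, left and right edges of its box.
    Boxes are (x0, y0, x1, y1) with exclusive x1/y1; later boxes overwrite
    earlier ones where they overlap."""
    mask = np.zeros((height, width), dtype=bool)
    geometry = np.zeros((4, height, width), dtype=np.float32)
    ys, xs = np.mgrid[0:height, 0:width]
    for x0, y0, x1, y1 in boxes:
        inside = (xs >= x0) & (xs < x1) & (ys >= y0) & (ys < y1)
        mask |= inside
        geometry[0][inside] = (ys - y0)[inside]  # distance to top edge
        geometry[1][inside] = (y1 - ys)[inside]  # distance to bottom edge
        geometry[2][inside] = (xs - x0)[inside]  # distance to left edge
        geometry[3][inside] = (x1 - xs)[inside]  # distance to right edge
    return mask, geometry


def decode_candidates(word_prob, geometry, threshold=0.5):
    """Turn every pixel whose word probability exceeds `threshold` into one
    candidate box (x0, y0, x1, y1) using its four regressed distances."""
    ys, xs = np.nonzero(word_prob > threshold)
    top, bottom, left, right = geometry[:, ys, xs]  # each has shape (N,)
    return np.stack([xs - left, ys - top, xs + right, ys + bottom], axis=1)


# A single 6x3-pixel word: every one of its pixels decodes to the same box.
mask, geometry = encode_targets([(2, 1, 8, 4)], height=10, width=12)
candidates = decode_candidates(mask.astype(np.float32), geometry)
print(candidates.shape)  # (18, 4)
```

On real predictions the regressed distances are noisy, so neighboring word pixels yield slightly different candidates for the same word; reconciling them is the job of the clustering step.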

── EN ──────────────────

WordDetectorNet uses per-pixel bounding box regression and DBSCAN for handwritten word detection.

WordDetectorNet takes a pixel-based approach to word detection: each pixel classified as a word pixel regresses its distances to the enclosing word's bounding box. The architecture uses a ResNet18 backbone and combines features from multiple scales. Every word pixel in the segmentation output yields a candidate box, and DBSCAN clusters these candidates into the final detections. The post walks through the model's architecture and training process in detail, highlighting its distinctive design choices.
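The clustering step can be sketched as follows, assuming 1 - IoU as the distance between candidate boxes and a per-coordinate median to merge each cluster into one word box. The post states only that DBSCAN is used, so the distance, the merge rule, the helper names and the `eps`/`min_samples` values here are all hypothetical. DBSCAN is written out in plain NumPy to keep the example dependency-free.

```python
import numpy as np


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def dbscan(dist, eps, min_samples):
    """Plain DBSCAN on a precomputed distance matrix; label -1 marks noise."""
    n = len(dist)
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        neighbors = list(np.nonzero(dist[i] <= eps)[0])  # includes i itself
        if len(neighbors) < min_samples:
            continue  # not a core point; may still become a border point later
        labels[i] = cluster
        queue = neighbors
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # reachable from a core point: join cluster
            if visited[j]:
                continue
            visited[j] = True
            j_neighbors = np.nonzero(dist[j] <= eps)[0]
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels


def cluster_boxes(candidates, eps=0.3, min_samples=3):
    """Cluster candidate boxes (distance = 1 - IoU) and merge each cluster
    into one box via the per-coordinate median. Noise candidates are dropped."""
    candidates = np.asarray(candidates, dtype=float)
    n = len(candidates)
    if n == 0:
        return []
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = 1.0 - iou(candidates[i], candidates[j])
    labels = dbscan(dist, eps, min_samples)
    return [np.median(candidates[labels == c], axis=0) for c in range(labels.max() + 1)]


# Two words, each seen as five slightly shifted candidates, plus one stray box.
offsets = np.array([[0, 0, 0, 0], [1, 0, 1, 0], [-1, 0, -1, 0], [0, 1, 0, 1], [0, -1, 0, -1]])
candidates = np.vstack([
    np.array([10, 10, 60, 30]) + offsets,
    np.array([80, 12, 140, 32]) + offsets,
    [[200, 200, 205, 204]],
])
words = cluster_boxes(candidates)
print(len(words))  # 2: the stray box has no overlapping neighbors and is noise
```

Because DBSCAN needs no preset number of clusters, the number of detected words falls out of the data, and isolated candidates with too few overlapping neighbors are discarded as noise rather than forced into a word.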
