Text Position-Aware Pixel Aggregation Network with Adaptive Gaussian Threshold: Detecting Text in the Wild

Published in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2023

Jiayu Xu#, Ailiang Lin# (equal contribution), Jinxing Li and Guangming Lu.

[PDF]

Abstract

In recent years, deep learning has significantly boosted scene text detection performance, and current segmentation-based scene text detectors can produce compact bounding boxes for irregular texts. However, these methods still struggle with crowded or overlapping texts because adjacent text instances conglutinate in the segmentation results. To address this issue, we propose a more accurate scene text detector, the Text Position-Aware Pixel Aggregation Network, termed TPPAN. Specifically, in the Adaptively Text Kernel Thresholding (ATKT) module, a Gaussian threshold representation is learned adaptively instead of being set to a constant, which yields more accurate text kernels. The Text Position-Aware Region Pixel Aggregation (TPAR-PA) module then predicts the text regions in relative positions and generates more accurate text contours. Extensive experiments demonstrate that the resulting detector achieves state-of-the-art performance on multi-oriented and curved scene text benchmarks.