The proliferation of gigapixel imaging poses new challenges for object detection and tracking because of its intense computational demands. Previous deep learning approaches, typically tailored to megapixel images, fall short at the gigapixel level. To bridge this gap, we introduce SaccadeMOT, a novel architecture for efficient gigapixel-level multi-object tracking. Drawing on our observations of density map regression in crowd counting and of small object detection, we propose a gigapixel detection paradigm that combines the strengths of both approaches. First, the “saccade” stage swiftly identifies regions likely to contain objects; the “gaze” stage then refines detection within these regions. This region selection is complemented by a robust tracking mechanism that combines head and body tracking, improving accuracy in scenes with occlusions. Validated on the PANDA dataset, SaccadeMOT achieves a 13× speedup over the existing state-of-the-art tracker BotSORT and also shows promise for gigapixel-level pathology analysis, particularly Whole Slide Imaging (WSI). This approach sets a new benchmark for handling ultra-high-resolution images, advancing both the speed and the precision of object tracking.
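The two-stage idea described above can be illustrated with a minimal sketch. It is not the SaccadeMOT implementation: the function names `saccade_select` and `gaze_detect`, the fixed tiling, the `min_count` threshold, and the `detector` callable are all hypothetical, and the density map is assumed to come from some coarse, downsampled pass over the image. The sketch only shows how a cheap region-selection step could limit where an expensive detector runs.

```python
import numpy as np

def saccade_select(density, tile, min_count):
    """Coarse stage (assumed form): keep tiles whose summed density is high enough.

    density   -- 2D array, a low-resolution density map of the scene
    tile      -- tile side length, in density-map cells
    min_count -- minimum summed density for a tile to be kept
    Returns a list of (row, col, height, width) boxes in density-map cells.
    """
    h, w = density.shape
    boxes = []
    for r in range(0, h, tile):
        for c in range(0, w, tile):
            patch = density[r:r + tile, c:c + tile]
            if patch.sum() >= min_count:
                boxes.append((r, c, patch.shape[0], patch.shape[1]))
    return boxes

def gaze_detect(image, boxes, scale, detector):
    """Fine stage (assumed form): run a detector only on the selected regions.

    image    -- full-resolution image array (rows, cols[, channels])
    boxes    -- output of saccade_select
    scale    -- full-resolution pixels per density-map cell
    detector -- hypothetical callable: crop -> list of (x, y, w, h) in crop coordinates
    Returns detections as (x, y, w, h) in full-image coordinates.
    """
    detections = []
    for r, c, bh, bw in boxes:
        y0, x0 = r * scale, c * scale
        crop = image[y0:y0 + bh * scale, x0:x0 + bw * scale]
        for x, y, w, h in detector(crop):
            # Shift each box from crop coordinates back to image coordinates.
            detections.append((x + x0, y + y0, w, h))
    return detections
```

In this sketch the detector never sees tiles that the coarse stage rejects, so the cost of the fine stage scales with the number of selected regions rather than with the full image area.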