


Recent and more accurate detection methods like Fast R-CNN and Faster R-CNN advocate using features computed from a single scale, because it offers a good trade-off between accuracy and speed. Multi-scale detection, however, still performs better, especially for small objects.


SSD and MS-CNN predict objects at multiple layers of the feature hierarchy without combining features or scores.
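As a rough illustration of predicting at several layers without combining features or scores, the sketch below applies a separate predictor to each level and simply pools the outputs. The names `predict_per_level`, `feature_maps`, and `heads` are hypothetical, and the toy scalar "features" stand in for real feature vectors; this is not the SSD or MS-CNN implementation.

```python
def predict_per_level(feature_maps, heads):
    """Score every cell of every level with that level's own head.

    No information flows between levels: each prediction depends only
    on the feature at its own (level, y, x) position.
    """
    detections = []
    for level, (fmap, head) in enumerate(zip(feature_maps, heads)):
        for y, row in enumerate(fmap):
            for x, feat in enumerate(row):
                detections.append((level, y, x, head(feat)))
    return detections


# Toy hierarchy (assumed for illustration): a 2x2 map and a 1x1 map.
feature_maps = [[[1, 2], [3, 4]], [[5]]]
# One hypothetical scoring function per level.
heads = [lambda f: f * 10, lambda f: f + 1]

dets = predict_per_level(feature_maps, heads)
```

The pooled list keeps the level index, so later steps can still tell which layer produced each prediction.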

Several recent methods exploit lateral/skip connections that associate low-level feature maps across resolutions and semantic levels, including U-Net and SharpMask for segmentation, … Ghiasi et al. present a Laplacian pyramidal presentation for FCNs to progressively refine segmentation.
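To make the idea of a lateral/skip connection concrete, here is a minimal sketch that upsamples a coarse map by a factor of 2 and merges it with a finer map. The function names are hypothetical, and the element-wise sum is an assumption chosen for brevity; the methods cited above differ in how they combine maps (U-Net, for instance, concatenates channels).

```python
def upsample_nearest_2x(fmap):
    """Double the height and width of a 2-D map by repeating each value."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat along width
        out.append(wide)
        out.append(list(wide))                     # repeat along height
    return out


def merge_skip(coarse, fine):
    """Add the upsampled coarse map to the fine map, element by element."""
    up = upsample_nearest_2x(coarse)
    if len(up) != len(fine) or len(up[0]) != len(fine[0]):
        raise ValueError("fine map must be exactly 2x the coarse map")
    return [[u + f for u, f in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine)]


# Toy inputs (assumed for illustration): a 1x2 coarse map, a 2x4 fine map.
coarse = [[10, 20]]
fine = [[1, 2, 3, 4], [5, 6, 7, 8]]

merged = merge_skip(coarse, fine)
```

Each output cell mixes a coarse-resolution value with a fine-resolution one, which is the sense in which such connections associate maps across resolutions.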

Although these methods adopt architectures with pyramidal shapes, they are unlike featurized image pyramids, where predictions are made independently at all levels (see Fig. 2). In fact, for