SuperPoint Outline

FRAMEWORK

THREE MAIN PARTS (pic above):

Interest Point Pre-Training
1. use a synthetic dataset (corner pts are easy to get, e.g. L/Y/T junctions…)
2. train a base detector (what is the detector's arch?)
3. transfer to real imgs (next steps)
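A minimal sketch of why synthetic data gives corner labels for free, assuming NumPy is available. The function names (`draw_rectangle`, `corners_to_label_map`) and the choice of a single axis-aligned rectangle are hypothetical stand-ins for illustration; the actual synthetic dataset uses a richer set of shapes.

```python
import numpy as np

def draw_rectangle(height=64, width=64, top=16, left=20, bottom=44, right=50):
    """Render a filled axis-aligned rectangle and return (image, corners).

    Corners are (row, col) pixel coordinates, known exactly by construction,
    so no manual annotation is needed.
    """
    img = np.zeros((height, width), dtype=np.float32)
    img[top:bottom + 1, left:right + 1] = 1.0
    corners = np.array([[top, left], [top, right],
                        [bottom, left], [bottom, right]])
    return img, corners

def corners_to_label_map(corners, shape):
    """Turn a list of (row, col) corners into a binary ground-truth map."""
    label = np.zeros(shape, dtype=np.uint8)
    label[corners[:, 0], corners[:, 1]] = 1
    return label

img, corners = draw_rectangle()
label = corners_to_label_map(corners, img.shape)
```

The pair `(img, label)` is the kind of supervised sample a base detector could be trained on.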
Interest Point Self-Labeling (sample random homographies, generate pseudo-GT)
1. run inference with the base detector (initial interest pts in real imgs)
2. use Homographic Adaptation (details?)
Joint Training (interest pts and descriptors)
1. loss
2. network arch
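For item 1, the joint objective as given in the SuperPoint paper combines a detector loss on an image and on its homography-warped copy with a descriptor loss. Here $X, X'$ are the detector outputs for the two images, $Y, Y'$ the pseudo-GT labels, $D, D'$ the descriptor outputs, $S$ the cell-to-cell correspondence indicator induced by the homography, and $\lambda$ a balancing weight (symbols not defined elsewhere in these notes):

$$\mathcal{L}(X, X', D, D'; Y, Y', S) = \mathcal{L}_p(X, Y) + \mathcal{L}_p(X', Y') + \lambda \, \mathcal{L}_d(D, D', S)$$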

encoder-decoder architecture
1. shared encoder (advantages?)
2. two heads
Shared Encoder
1. VGG-style
2. pixel cells
  three non-overlapping 2×2 max-pooling operations in the encoder result in 8×8 pixel cells
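The cell arithmetic can be checked with a few lines of Python. This is only a sketch; the helper name `cell_grid` and the requirement that the image size be divisible by the cell size are assumptions for illustration.

```python
def cell_grid(height, width, num_pools=3, pool=2):
    """Return (Hc, Wc, cell) after `num_pools` non-overlapping pool x pool poolings."""
    cell = pool ** num_pools  # 2 ** 3 = 8 pixels per cell side
    if height % cell or width % cell:
        raise ValueError("image size must be divisible by the cell size")
    return height // cell, width // cell, cell

# A 240 x 320 image becomes a 30 x 40 grid of 8 x 8 pixel cells.
print(cell_grid(240, 320))
```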
Interest Point Decoder
1. NO upsampling layers (high computation & unwanted checkerboard artifacts)
2. the interest point detection head is designed with an explicit decoder
  This decoder has no parameters and is known as “sub-pixel convolution”, “depth to space” in TensorFlow, or “pixel shuffle” in PyTorch
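A minimal NumPy sketch of this parameter-free decoder, assuming the head outputs a `(Hc, Wc, 65)` tensor as described in the paper (64 in-cell positions plus one "no interest point" dustbin channel). The row-major ordering of the 64 channels inside each 8×8 cell and the name `decode_heatmap` are assumptions made for illustration.

```python
import numpy as np

def decode_heatmap(logits, cell=8):
    """Map (Hc, Wc, cell*cell + 1) logits to a (Hc*cell, Wc*cell) heatmap."""
    # Channel-wise softmax (shifted by the max for numerical stability).
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    prob = e / e.sum(axis=-1, keepdims=True)
    # Drop the dustbin channel, keeping the 64 in-cell positions.
    prob = prob[..., :-1]
    hc, wc, _ = prob.shape
    # Depth to space: channel k goes to offset (k // cell, k % cell) in its cell.
    prob = prob.reshape(hc, wc, cell, cell)
    prob = prob.transpose(0, 2, 1, 3)  # (Hc, cell, Wc, cell)
    return prob.reshape(hc * cell, wc * cell)
```

No weights are involved: the decoder is a softmax, a slice, and a reshape, which is why it adds almost no computation.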
Descriptor Decoder
1. similar to UCN (Universal Correspondence Network)
2. performs bicubic interpolation of the descriptor map and then L2-normalizes it (fixed, i.e. non-learned)
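A rough sketch of the fixed descriptor decoder, assuming a coarse `(Hc, Wc, D)` descriptor map. Note the swap: nearest-neighbor upsampling via `np.repeat` is used here as a dependency-free stand-in for the bicubic interpolation named above; only the L2-normalization step matches the described decoder directly. The name `decode_descriptors` is hypothetical.

```python
import numpy as np

def decode_descriptors(coarse, cell=8, eps=1e-12):
    """Upsample (Hc, Wc, D) descriptors to (Hc*cell, Wc*cell, D), unit length."""
    # Stand-in upsampling: nearest neighbor instead of bicubic interpolation.
    dense = np.repeat(np.repeat(coarse, cell, axis=0), cell, axis=1)
    # L2-normalize every per-pixel descriptor along the channel axis.
    norm = np.linalg.norm(dense, axis=-1, keepdims=True)
    return dense / np.maximum(norm, eps)
```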

formulation
- $I$: input image
- $x$: resulting interest points
- $f_\theta$: network
- $\mathcal{H}$: homography
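With these symbols, the covariance property that an ideal detector should satisfy can be written as follows (as stated in the paper, where $\mathcal{H}(I)$ is the warped image and $\mathcal{H}x$ the warped points). Moving the homography to the right-hand side gives the form used by Homographic Adaptation:

$$x = f_\theta(I)$$

$$\mathcal{H}x = f_\theta(\mathcal{H}(I))$$

$$x = \mathcal{H}^{-1} f_\theta(\mathcal{H}(I))$$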
improved super-point detector
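In the paper, the improved detector averages the base detector's responses over $N_h$ sampled homographies ($N_h$ is not defined elsewhere in these notes):

$$\hat{F}(I; f_\theta) = \frac{1}{N_h} \sum_{i=1}^{N_h} \mathcal{H}_i^{-1} f_\theta(\mathcal{H}_i(I))$$

A minimal NumPy sketch of that aggregation follows. Several things are assumptions for illustration: the helper names, nearest-neighbor warping, treating `detector` as any callable that maps an image to a same-sized heatmap, and dividing by the per-pixel count of valid views rather than by a fixed $N_h$ so that border pixels seen in fewer views are not under-weighted.

```python
import numpy as np

def warp_image(img, H):
    """Warp img by homography H (source xy -> target xy), nearest neighbor.

    Returns the warped image and a mask of target pixels whose source
    location falls inside the image. Assumes a mild homography whose
    projective denominator stays positive over the image.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    tgt = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # 3 x N
    src = np.linalg.inv(H) @ tgt  # inverse mapping: target -> source
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(h * w, dtype=np.float64)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(h, w), valid.reshape(h, w)

def homographic_adaptation(img, detector, homographies):
    """Average detector responses over warped copies of img."""
    acc = np.zeros(img.shape, dtype=np.float64)
    count = np.zeros(img.shape, dtype=np.float64)
    for H in homographies:
        warped, _ = warp_image(img, H)                    # H(I)
        heat = detector(warped)                           # f_theta(H(I))
        back, mask = warp_image(heat, np.linalg.inv(H))   # H^{-1} f_theta(H(I))
        acc += back * mask
        count += mask
    return acc / np.maximum(count, 1)
```

Thresholding the returned heatmap would give the pseudo-GT interest points for a real image.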
choosing homographies
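One way to choose homographies is to compose simpler transformations (translation, scale, in-plane rotation, perspective distortion), which is the general strategy the paper describes. The sketch below is only an illustration under stated assumptions: it samples parameters uniformly, whereas the paper uses a truncated normal distribution, its perspective term is generic rather than symmetric, and the function name and all parameter ranges are hypothetical.

```python
import numpy as np

def sample_homography(rng, size=(240, 320), max_shift=0.1,
                      scale_range=(0.8, 1.2), max_angle=np.pi / 6,
                      max_persp=1e-3):
    """Sample a 3x3 homography as a composition of simple transforms."""
    h, w = size
    tx, ty = rng.uniform(-max_shift, max_shift, 2) * np.array([w, h])
    s = rng.uniform(*scale_range)
    a = rng.uniform(-max_angle, max_angle)
    px, py = rng.uniform(-max_persp, max_persp, 2)

    # Apply scale, rotation and perspective about the image center.
    C = np.array([[1, 0, -w / 2], [0, 1, -h / 2], [0, 0, 1.0]])
    T = np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1.0]])
    R = np.array([[np.cos(a), -np.sin(a), 0],
                  [np.sin(a), np.cos(a), 0],
                  [0, 0, 1.0]])
    S = np.diag([s, s, 1.0])
    P = np.array([[1, 0, 0], [0, 1, 0], [px, py, 1.0]])

    H = np.linalg.inv(C) @ T @ R @ S @ P @ C
    return H / H[2, 2]  # normalize so the bottom-right entry is 1

rng = np.random.default_rng(0)
H = sample_homography(rng)
```

Keeping each component's range small helps avoid extreme warps for which the detector's response would carry little information.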
Iterative Homographic Adaptation