Introduced in Girshick1), R-CNNs (Regions with Convolutional Neural Networks) are an object-detection architecture that applies a convolutional network to proposed image regions.
The Inception architecture2) combines R-CNNs with strategic use of $1 \times 1$ convolutional layers and some other techniques (such as pooling) to achieve high performance.
How is the Inception architecture3) different from the Le et al. architecture4)?
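
To make clear what I mean by a $1 \times 1$ convolutional layer: as I understand it, it is a linear map applied independently at every spatial position, mixing only the channels. Below is a minimal pure-Python sketch of that idea. The function name `conv1x1`, the height x width x channels layout, and the omission of bias and nonlinearity are my own assumptions for illustration, not taken from any of the cited papers.

```python
def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution (no bias, no activation).

    feature_map: nested lists shaped H x W x C_in (assumed layout)
    weights:     nested lists shaped C_out x C_in
    Returns nested lists shaped H x W x C_out.
    """
    return [
        [
            # Each output channel is a weighted sum over the input channels
            # of this single pixel; neighbouring pixels are never touched.
            [sum(w * x for w, x in zip(w_row, pixel)) for w_row in weights]
            for pixel in row
        ]
        for row in feature_map
    ]


# A 2x2 feature map with 3 input channels (illustrative values).
fmap = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[7.0, 8.0, 9.0], [0.0, 1.0, 0.0]],
]

# Reduce 3 channels to 2: keep channel 0, and sum channels 1 and 2.
w = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
]

out = conv1x1(fmap, w)
print(out)
```

The spatial size stays at 2x2 while the channel count drops from 3 to 2, which is the kind of dimensionality reduction I take "strategic use" to refer to.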