Fast R-CNN

The next advancement in region-based CNNs came with the Fast R-CNN architecture. Instead of processing each region of interest individually through a classification CNN, this architecture runs the entire image through a classification CNN only once. The image goes through a series of convolutional and pooling layers, and at the end of these layers we get a stack of feature maps.

We still need to identify regions of interest, but instead of cropping the original image, we project these proposals onto the smaller feature maps. Each region in a feature map corresponds to a larger region in the original image, so we can grab selected regions in the feature map and feed them one by one into a fully connected layer that generates a class for each of these regions. In this model we complete the most time-consuming step, processing an image through a series of convolutional layers, only once, and then selectively use the resulting feature maps to get our desired outputs.

Again, we have to handle the variable sizes of these projections, since layers further in the network expect input of a fixed size. So we do something called ROI pooling to warp these regions into a consistent size before giving them to a fully connected layer.

Now, this network is faster than R-CNN, but it's still slow when faced with a test image for which it has to generate region proposals, and it's still looking at regions that do not contain objects at all. The next architecture we'll look at aims to improve this region generation step.
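To make the projection and ROI pooling steps concrete, here is a minimal sketch in NumPy. It is a simplified illustration, not the exact Fast R-CNN implementation: the `stride` of 16 and the 2×2 output grid are assumed values chosen for readability (real networks typically pool to 7×7), and the projection simply divides image coordinates by the cumulative downsampling stride.

```python
import numpy as np

def roi_pool(feature_map, roi, stride=16, output_size=(2, 2)):
    """Project an image-space ROI (x1, y1, x2, y2) onto the feature map
    by dividing by the network stride, then max-pool the cropped region
    into a fixed output_size grid -- a simplified ROI pooling layer."""
    # Project image coordinates into feature-map coordinates.
    x1, y1, x2, y2 = [int(round(c / stride)) for c in roi]
    region = feature_map[:, y1:y2 + 1, x1:x2 + 1]
    c, h, w = region.shape
    out_h, out_w = output_size
    pooled = np.zeros((c, out_h, out_w), dtype=feature_map.dtype)
    # Split the region into a roughly even out_h x out_w grid of bins,
    # taking the max activation inside each bin; the max() guards
    # ensure every bin covers at least one cell.
    ys = np.linspace(0, h, out_h + 1).astype(int)
    xs = np.linspace(0, w, out_w + 1).astype(int)
    for i in range(out_h):
        for j in range(out_w):
            bin_ = region[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                             xs[j]:max(xs[j + 1], xs[j] + 1)]
            pooled[:, i, j] = bin_.max(axis=(1, 2))
    return pooled

# A toy 1-channel 8x8 feature map; any ROI, whatever its size,
# comes out as a fixed 1x2x2 tensor ready for a fully connected layer.
fm = np.arange(64, dtype=float).reshape(1, 8, 8)
pooled = roi_pool(fm, roi=(0, 0, 63, 63))
print(pooled.shape)  # (1, 2, 2)
```

Because every ROI is warped to the same grid, proposals of wildly different shapes can share one batch through the downstream fully connected layers.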
