3 – Pooling Layers

We’re now ready to introduce the second and final type of layer we’ll need before building our own convolutional neural networks: the pooling layer. Pooling layers often take convolutional layers as input. Recall that a convolutional layer is a stack of feature maps, with one feature map for each filter. A complicated dataset with many different object categories will require a large number of filters, each responsible for finding a pattern in the image. More filters mean a bigger stack, which means the dimensionality of our convolutional layers can get quite large. Higher dimensionality means more parameters, which can lead to overfitting. Thus, we need a method for reducing this dimensionality. This is the role of pooling layers within a convolutional neural network.

We’ll focus on two different types of pooling layers. The first is the max pooling layer. A max pooling layer takes a stack of feature maps as input; here we’ve enlarged and visualized all three of the feature maps. As with convolutional layers, we define a window size and a stride. In this case we’ll use a window size of two and a stride of two. To construct the max pooling layer, we work with each feature map separately. Let’s begin with the first feature map. We start with our window in the top-left corner of the image. The value of the corresponding node in the max pooling layer is just the maximum of the pixels contained in the window. In this case the window contained a 1, 9, 5, and 4, so the maximum is 9. If we continue this process for all of our feature maps, the output is a stack with the same number of feature maps, but each feature map has been reduced in width and height. In this case the width and height are half those of the previous convolutional layer.

Global average pooling is a bit different. For a layer of this type we specify neither a window size nor a stride. This type of pooling is a more extreme form of dimensionality reduction: it takes a stack of feature maps and computes the average value of the nodes for each map in the stack. As before, we work with each feature map separately, beginning with the first. To get the average value of the nodes, we first sum all the values, which yields 80. Then we divide by the total number of nodes, which is 16. This yields 5, the value for the corresponding node. Repeating the process for the remaining two feature maps, we get two values of 4. Our final output is a stack of feature maps where each feature map has been reduced to a single value. In this way, a global average pooling layer takes a 3D array and turns it into a vector; here we have a vector with three entries.

Let’s summarize what we’ve learned with the food analogy. We’ll think of a convolutional layer as a stack of pancakes, with one pancake for each feature map. Pooling layers take that stack and give us back a stack with the same number of pancakes, except the output pancakes are smaller in width and height. Non-global pooling layers represent a moderate reduction in pancake size, where each output pancake is generally about half as tall and half as wide as its corresponding input pancake. Global pooling layers reduce each input pancake to essentially a crumb, but we still have one crumb for each input pancake. In this video, we’ve focused on two types of pooling layers, but feel free to read about the others in the Keras documentation linked below.
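To make the arithmetic above concrete, here is a minimal sketch using TensorFlow/Keras (the library the lesson points to). The 4×4 feature map below is a hypothetical example constructed so that the top-left 2×2 window contains 1, 9, 5, and 4, and the sixteen values sum to 80, matching the numbers used in the walkthrough.

```python
import numpy as np
import tensorflow as tf

# A single 4x4 feature map. The top-left 2x2 window holds 1, 9, 5, 4,
# and all sixteen values sum to 80, as in the walkthrough above.
feature_map = np.array([
    [1., 9., 2., 3.],
    [5., 4., 6., 7.],
    [8., 2., 9., 1.],
    [3., 6., 5., 9.],
], dtype=np.float32)

# Keras pooling layers expect a 4D input: (batch, height, width, channels).
x = feature_map.reshape(1, 4, 4, 1)

max_pool = tf.keras.layers.MaxPooling2D(pool_size=2, strides=2)
global_avg_pool = tf.keras.layers.GlobalAveragePooling2D()

print(max_pool(x).numpy().squeeze())
# [[9. 7.]
#  [8. 9.]]  <- each entry is the max of one 2x2 window; width and height are halved
print(global_avg_pool(x).numpy().squeeze())
# 5.0        <- the whole feature map collapses to its average (80 / 16)
```

With three feature maps the channel dimension would be 3 instead of 1: max pooling would return a 2×2×3 stack, and global average pooling a vector with three entries, exactly as described above.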
