The first convolutional layer in a CNN applies a set of image filters to an input image and outputs a stack of feature maps. After such a network is trained, we can get a sense of what kinds of features this first layer has learned to extract by visualizing the learned weights for each of these filters. We know that filters are grids of weight values and so we can visualize them by treating them like a small image. 1 3 by three filter for a grayscale image can be pictured as a three-by-three grayscale pixel image, where each pixel contains a learned weight value. The bright pixels can be thought of as positive or high values for weights. The darker pixels correspond to small or negative weights. So, a kernel that looks like this with bright values on the top and dark values on the bottom, we might think that it subtracts bottom pixels from top pixels in a patch and so, it can detect horizontal edges in an image. For a CNN that analyzes color images, the filters just have another dimension for color, and can be displayed as small color images. Here are some examples of filters from the first convolutional layer of a CNN with 11 by 11 color filters. You can see that a lot of these filters are looking for specific kinds of geometric patterns, vertical or horizontal edges, which are represented by alternating bars of light and dark at different angles. You can also see corner detectors and filters that appear to detect areas of opposing colors with magenta and green bunches or blue and orange lines. Now you might be thinking great, this is really simple. I can just look at the filter weights of a trans CNN to see what my network is learning. But this becomes tricky as soon as you move further into the network and try to look at the second or even third convolutional layer. This is because these later layers are no longer directly connected to the input image. They’re connected to some output that has likely been pooled and processed by an activation function. So, visualizing the filters in these deeper layers doesn’t clearly tell us what these layers are actually seen in the original input image. So we’ll need to develop a different technique to really see what’s going on in these intermediate layers.