The final enhancement to DQNs that we will briefly look at is appropriately titled dueling networks. Here is a typical DQN architecture: a sequence of convolutional layers followed by a couple of fully connected layers that produce Q values. The core idea of dueling networks is to use two streams, one that estimates the state value function V(s), and one that estimates the advantage A(s, a) for each action. These streams may share some layers at the beginning, such as the convolutional layers, then branch off with their own fully connected layers. Finally, the desired Q values are obtained by combining the state value and the advantage values. The intuition behind this is that the values of most states don't vary a lot across actions, so it makes sense to try and estimate them directly. But we still need to capture the difference each action makes in a given state, and this is where the advantage function comes in. Some modifications are necessary to adapt Q-learning to this architecture, which you can find in the dueling networks paper. Along with double DQNs and prioritized experience replay, this technique has resulted in significant improvements over vanilla DQNs.
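The combining step can be sketched in a few lines of code. A minimal NumPy sketch follows, using the mean-subtraction aggregation Q(s, a) = V(s) + (A(s, a) − mean A(s, ·)) proposed in the dueling networks paper; the layer sizes, weight names, and single-layer streams here are illustrative assumptions, not the paper's actual Atari architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's network).
n_features, n_actions = 8, 4

# Shared trunk output for one state (stands in for the conv layers).
features = rng.standard_normal(n_features)

# Each stream gets its own fully connected layer.
W_value = rng.standard_normal((1, n_features))        # value stream -> V(s)
W_adv = rng.standard_normal((n_actions, n_features))  # advantage stream -> A(s, a)

v = W_value @ features  # scalar state value, shape (1,)
a = W_adv @ features    # per-action advantages, shape (n_actions,)

# Combine: Q(s, a) = V(s) + (A(s, a) - mean over actions of A).
# Subtracting the mean advantage makes V and A identifiable,
# one of the modifications discussed in the dueling networks paper.
q = v + (a - a.mean())

print(q.shape)  # one Q value per action: (4,)
```

Because the centered advantages sum to zero, the mean of the resulting Q values equals V(s), which is what lets the value stream learn the part of Q that is shared across actions.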