One way to solve this is to use random restarts, and this is just very simple. We start from a few different random places and do gradient descend from all of them. This increases the probability that we’ll get to the global minimum, or at least a pretty good local minimum.