Here is the crucial point: the gradient on q was computed as negative (derivative_f_wrt_q = -4), so the circuit wants q to decrease, and with a force of 4! So if the + gate wants to contribute to making the final output value larger, it needs to listen to the gradient signal coming from the top.
the circuit wants wants to contribute it needs to listen to the gradient signal
обосрался с объясниней автора.