Beyond Monte Carlo: leveraging temporal difference learning for superior performance in dynamic resource allocation