A state is Markov if and only if the future is independent of the past given the present: P[S_{t+1} | S_t] = P[S_{t+1} | S_1, ..., S_t]. In other words, the current state captures all relevant information from the history.
An RL agent may include three components: a policy, a value function, and a model. The policy is what the agent learns from experience: it maps states to actions and is improved by trial and error so as to maximize reward. The value function is a prediction of future reward; it evaluates how good or bad each state is. Lastly, the model predicts what the environment will do next. An agent solving an MDP is not required to have a model; model-free methods learn directly from experience.
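To make the three components concrete, here is a minimal sketch for a toy two-state MDP; the states, actions, rewards, and values are hypothetical and chosen purely for illustration:

```python
# Toy MDP with two states and two actions; all numbers are hypothetical.
STATES = ["A", "B"]
ACTIONS = ["stay", "move"]

# Policy: maps each state to an action (what the agent learns).
policy = {"A": "move", "B": "stay"}

# Value function: predicted future reward from each state.
value = {"A": 0.5, "B": 1.0}

# Model: predicts what the environment will do next, i.e.
# (next_state, reward) for each (state, action) pair.
model = {
    ("A", "stay"): ("A", 0.0),
    ("A", "move"): ("B", 1.0),
    ("B", "stay"): ("B", 0.0),
    ("B", "move"): ("A", 0.0),
}

def step(state, action):
    """Use the model to simulate one step of the environment."""
    return model[(state, action)]

next_state, reward = step("A", policy["A"])
print(next_state, reward)  # prints: B 1.0
```

Separating the three pieces this way mirrors the taxonomy above: an agent can carry any subset of them (e.g. a value-based agent keeps only the value function and derives its policy from it).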
These concepts can be applied to UAS routing algorithms, especially in grid-based systems.
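As a sketch of how this might look for grid-based routing, the following value iteration computes a value function over a small occupancy grid and extracts a greedy policy that routes toward a goal cell. The grid layout, obstacle positions, step cost, and discount factor are all assumptions made for illustration, not taken from the text:

```python
# Value iteration on a small grid; layout and parameters are assumed.
GRID_W, GRID_H = 4, 4
GOAL = (3, 3)
OBSTACLES = {(1, 1), (2, 1)}
GAMMA = 0.9          # discount factor
STEP_REWARD = -1.0   # cost per move encourages short routes
ACTIONS = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}

def in_bounds(c):
    x, y = c
    return 0 <= x < GRID_W and 0 <= y < GRID_H and c not in OBSTACLES

def move(s, a):
    """Next cell for action a; stay put if it would leave the grid or hit an obstacle."""
    nxt = (s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1])
    return nxt if in_bounds(nxt) else s

def value_iteration(iters=100):
    """Repeatedly back up V(s) = max_a [r + gamma * V(s')] until (near) convergence."""
    V = {(x, y): 0.0 for x in range(GRID_W) for y in range(GRID_H)
         if (x, y) not in OBSTACLES}
    for _ in range(iters):
        for s in V:
            if s == GOAL:
                continue  # goal is terminal; its value stays 0
            V[s] = max(STEP_REWARD + GAMMA * V[move(s, a)] for a in ACTIONS)
    return V

def greedy_policy(V):
    """At each cell, pick the action leading to the highest-valued neighbor."""
    return {s: max(ACTIONS, key=lambda a: V[move(s, a)]) for s in V if s != GOAL}

V = value_iteration()
pi = greedy_policy(V)
print(pi[(0, 0)], round(V[(0, 0)], 3))
```

Following the greedy policy from any free cell traces a shortest obstacle-avoiding route to the goal, which is the sense in which the value function "evaluates the goodness of states" for routing: cells farther from the goal accumulate more discounted step cost and therefore have lower value.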