written by @marc_lelarge after this thread.
We always assume that the graph we consider is connected (and finite) so that there is always a shortest path between and .
See Chapter 4 of Algorithms - Sanjoy Dasgupta, Christos H. Papadimitriou, and Umesh V. Vazirani
Dijkstra's shortest-path algorithm
Input: Graph ; positive edge lengths ; vertex
Output: For all vertices reachable from , the distance from to :
for all :
(using dist-values as keys)
while is not empty:
for all edges :
if :
At the end of the algorithm, will hold for each node the identity of the node immediately before it on the shortest path from to .
The right data structure for is a priority queue which maintians a set of elements (nodes) with associated numeric key values () and supports the following operations:
Note that each node is added once in before the while loop and removed from during an iteration of the while loop. For each node , can only decrease in this algorithm and is always an upper bound for the true distance from to .
Property : when is removed from , then is the correct distance from to and is the correct node before in the shortest path from to .
Proof : This can be proved by induction with the following inductive hypothesis: at the end of each iteration of the while loop, the following conditions hold with , and : (1) all nodes in are at distance from and all nodes in are at distance from , and (2) for every node , the value is the length of the shortest path from to whose intermediate nodes are constrained to be in (if no such path exists, the value is ).
The base case is straightforward since is the first removed element from so that .
We denote by the true distance between and .
Points (1) and (2) of the inductive hypothesis imply that all nodes are at from : . If this is not the case, by (2), there exists on the shortest path from to . Since the edge lenghts are positive, the partial path from to is the shortest path between and and its lenght is strictly less than the distance between and but by (1) contradicting the fact that . As a result, nodes have been removed from the queue in increasing order of .
Assume now that the inductive hypothesis is correct and let , and . Let be the last node in on the shortest path from to and the node on this path neighbor of and closer to : . We have . By the inductive hypothesis: and hence . But since , we have , so that and we proved Clearly all nodes in are at distance and (2) is correct. Suppose there exists with . We can apply the same argument as above by considering the shortest pasth form to and the last node in : so that and since : , a contradiction. Hence we proved (1).
Instead of computing all the shortest paths from , we now have a target and want to compute efficiently the shortest path from to .
We have access to a heuristic fucntion that estimates the length of the path from any node to the target .
The A* algorithm is a simple variation of the Dijkstra's algorithm with a single modification: the priority in the queue is now the function:
A* algorithm
Input: Graph ; positive edge lengths ; vertex ; a vertex and an associated heuristic function estimating the distance from to .
Output: under some conditions on the heuristic , a shortest path from to .
for all :
(using keys: )
while is not empty:
if :
break
for all edges :
if :
if :
We still have that for each node , can only decrease and is always an upper bound for the true distance from to . Note that is the first node removed from (and nerver added back to ).
If the heuristic function is zero: , A* reduces to Dijkstra's algorithm.
In this case, it follows from previous proof that when is decreased, is always in , hence there is no insertion in and all steps are the same in both algorithms.
Accessible here: Demo and GitHub code
Lemma : For an optimal path , if is removed from and all the 's for have been removed from then is the true distance between and and will never be inserted back into .
Proof : the claim is clearly true for . Assume it is true for all and is removed from . Between the removal of and some other nodes might have been removed from but since for all , these removals will not modify any of the (and not add them back to ). Now after removal of , since is a neighbor, we have .
Consider a variant of A*, where we remove the break
. This variant of A* terminates, i.e. will become empty after a finite number of iterations since the number of possible values for all the is finite. Moreover, thanks to previous Lemma, in this variant of A*, when is last removed from then is the true distance between and (and all the for have been removed from ). In particular, the only way for A* to fail to return an optimal path is because it removes the target from too early. Under some conditions on the heuristic, we can guarantee that this will nerver happen.
A heuristic function is admissible if it never overestimates the actual cost to get to the goal: , where is the distance between and the goal .
A heuristic is consistent (or monotone) if for each edge , .
Property : every consistent heuristic is also admissible.
Proof : By definition, we have and we need to prove that .
Consider a shortest path from to denoted . We have as . Then and so on so that we get .
Property : A* is equivalent to Dijkstra’s algorithm on a graph with reduced lengths . Note that, since Dijkstra’s algorithm requires arc costs to be nonnegative, the heuristic needs to be consistent.
Running Dijkstra's algorithm with the reduced lengths, if is the current path from to , we have:
The result follows since is a constant and the fact that if is updated then .
As a consequence A* returns an optimal path with reduced costs but since , this optimal path is also optimal for the original cost.
It turns out, we only need to have an admissible heuristic for A* to find shortest paths:
Property : if the heuristic is admissible, then A* always finds a shortest path.
This can be proved by contradiction. Assume that the path retruned by A* between and is not optimal, i.e. when is removed from . Consider just before is chosen. Denote by an optimal path from to . Since , and , by previous Lemma, there exists a node such that for all , and . But since has been chosen and , we have . But we have (since has been removed from and ) and (by admissibility). Hence, we get , a contradiction.
public
agreginfo