The JFA has desirable attributes in GPU computation, notably for its efficient performance. However, it is only an approximate algorithm and does not always compute the correct result for every pixel, although in practice errors are few and the magnitude of errors is generally small.[1]
Implementation
The JFA original formulation is simple to implement.
Take an grid of pixels[2] (like an image or texture). All pixels will start with an "undefined" color unless it is a uniquely-colored "seed" pixel. As the JFA progresses, each undefined pixel will be filled with a color corresponding to that of a seed pixel.
For each step size , run one iteration of the JFA:
Iterate over every pixel at .
For each neighbor at where :
if is undefined and is colored, change 's color to 's
if is colored and is colored, if where and are the seed pixels for and , respectively, then change 's color to 's.
Note that pixels may change color more than once in each step, and that the JFA does not specify a method for resolving cases where distances are equal, therefore the last-checked pixel's color is used above.
The JFA finishes after evaluating the last pixel in the last step size. Regardless of the content of the initial data, the innermost loop runs a total of times over each pixel, for an overall computational complexity of .
Variants
Some variants of JFA are:
Additional pass at the end: JFA+1 has one additional pass with step size of 1, i.e. the step sizes are N/2, N/4, ..., 1, 1; JFA+2 has two additional passes with step sizes of 2 and 1, i.e. the step sizes are N/2, N/4, ..., 1, 2, 1; JFA has additional passes, i.e. the step sizes are N/2, N/4, ..., 1, N/2, N/4, ..., 1. JFA+1 has much fewer errors than JFA, and JFA+2 has even fewer errors.[1]
Additional pass at the beginning: 1+JFA has one additional pass with step size of 1, i.e. the step sizes are 1, N/2, N/4, ..., 1. 1+JFA has very low error rate (similar to JFA+2) and the same performance as JFA+1.[3]
Half resolution: This variant runs normal JFA at a half resolution, and enlarge the result into the original resolution and run one additional pass with step size of 1. Because most of the passes has only half resolution, the speed of this variant is much faster than the full resolution JFA.[3]
The JFA has inspired the development of numerous similar algorithms. Some have well-defined error properties which make them useful for scientific computing.[11]
^The original paper uses the optimal case, a square grid, as an example, but a grid of any size works. Albeit with reduced efficiency. See this StackOverflow question for more.