EventAnalysis
1.3.0
|
#include <levenshtein-sse.hpp>
Static Public Member Functions | |
static void | perform (const Iterator1 &a, const Iterator2 &b, std::size_t &i, std::size_t j, std::size_t bLen, Vec1 &diag, const Vec2 &diag2) |
Some notes on the algorithm used here:
The standard Levenshtein metric is usually calculated using the Wagner-Fischer algorithm 1. This essentially means calculating Levenshtein(a[:n], b[:m]) for all prefixes a[:n] and b[:m] of a and b, storing the results in a table (and possibly "forgetting" the previous rows to save memory). Normally, this is performed row by row, but this approach has the disadvantage that it is not easily converted to using SIMD instructions.
The "diagonal" algorithm variants in this file calculate the minor diagonals one by one. There, the index i
always corresponds to the table row in which we currently are and j
always denotes the column. Note that the tables have a trivial i=0 row and j=0 column; The table entry [i,j] corresponds to the comparison a[i-1] == b[j-1]. A diagonal is defined by its index k
, yielding the inner loop invariant k == i + j.
The outer loop iterates over all diagonals from k = 0 to k = len(a)+len(b) (compare with the table in 1 to verify), meaning that the length of the diagonals may vary between iterations. The current diagonal and the previous diagonal are kept in memory, referred to as diag and diag2 respectively.
In the inner loop, the rows are treated in descending order, i.e. i
decreases from iteration to iteration, to avoid overwriting data needed in the next iteration.
Trivial implementation of one inner loop iteration.
Note that i is passed as a reference; The SIMD version may decrease it by more than 1.
|
inlinestatic |