algorithm - Finding all pairs of sequences that differ at exactly one position -
i need data structure representing set of sequences (all of same, known, length) following non-standard operation:
find 2 sequences in set differ @ 1 index. (or establish no such pair exists.)
if n
length of sequences , m
number of sequences, there obvious o(n*m*m)
algorithm. wonder if there standard way of solving more efficiently. i'm willing apply pre-processing if needed.
bonus points if instead of returning pair, algorithm returns all sequences differ @ same point.
alternatively, interested in solution can check efficiently whether particular sequence differs @ 1 index sequence in set. if helps, can assume in set, no 2 sequences have property.
edit: can assume n
reasonably small. this, mean improvements such o(log(n)*m*m)
not useful use case.
for each sequence , each position in sequence, calculate hash of sequence without position , add hash table. if there entry in table, have found potential pair differs in 1 position. using rolling hashes both start , end , combining them, can calculate each hash in constant time. total running time expected o(n*m).
Comments
Post a Comment