Python - NumPy indexing with multiple arrays
Given two sequences of data (of equal length) and a quality value for each data point, I want to calculate a similarity score based on a given scoring matrix.
What is an efficient way to vectorize the following loop:
    score = 0
    for i in xrange(len(seq1)):
        score += similarity[seq1[i], seq2[i], qual1[i], qual2[i]]
similarity is a 4-dimensional float array, shape=(32, 32, 100, 100); seq1, seq2, qual1 and qual2 are 1-dimensional int arrays of equal length (of order 1000 - 40000).
Shouldn't this just work(tm)?
    >>> score = 0
    >>> for i in xrange(len(seq1)):
    ...     score += similarity[seq1[i], seq2[i], qual1[i], qual2[i]]
    ...
    >>> score
    498.71792400493433
    >>> similarity[seq1, seq2, qual1, qual2].sum()
    498.71792400493433
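The one-line expression matches the loop because NumPy pairs up integer index arrays element by element when one array is supplied per axis. A minimal sketch on a small 2-D array shows this; the names `table`, `rows` and `cols` are made up for illustration and are not part of the question.

```python
import numpy as np

# Small hypothetical 2-D table standing in for the 4-D similarity array.
table = np.arange(12).reshape(3, 4)   # table[r, c] == 4 * r + c

rows = np.array([0, 2, 1])
cols = np.array([3, 0, 1])

# One index array per axis: elements are paired as (0, 3), (2, 0), (1, 1).
picked = table[rows, cols]

# The same selection written as an explicit Python loop.
looped = np.array([table[r, c] for r, c in zip(rows, cols)])

print(picked)   # [3 8 5]
```

The result has the shape of the index arrays, not of `table`, which is why a single `.sum()` over it gives the score.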
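Code:

    import numpy as np

    similarity = np.random.random((32, 32, 100, 100))
    n = 1000
    # One random index array per axis, each within that axis's bounds.
    seq1, seq2, qual1, qual2 = [np.random.randint(0, s, n) for s in similarity.shape]

    def slow():
        score = 0
        for i in xrange(len(seq1)):  # Python 2; use range() on Python 3
            score += similarity[seq1[i], seq2[i], qual1[i], qual2[i]]
        return score

    def fast():
        # Index all four axes at once with the integer arrays, then sum.
        return similarity[seq1, seq2, qual1, qual2].sum()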
Gives:

    >>> timeit slow()
    100 loops, best of 3: 3.59 ms per loop
    >>> timeit fast()
    10000 loops, best of 3: 143 µs per loop
    >>> np.allclose(slow(), fast())
    True
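As an aside that is not part of the answer above, the same lookup can be written against the flattened array by first converting the four coordinate arrays into flat offsets with `np.ravel_multi_index`. The sketch below assumes a recent NumPy (it uses `np.random.default_rng`) and an arbitrary seed; whether it is any faster than direct indexing depends on the NumPy version and the data, so it is worth timing before relying on it.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
similarity = rng.random((32, 32, 100, 100))
n = 1000
seq1, seq2, qual1, qual2 = [rng.integers(0, s, n) for s in similarity.shape]

# Convert the four coordinate arrays into offsets into the flattened table.
flat_idx = np.ravel_multi_index((seq1, seq2, qual1, qual2), similarity.shape)

# ravel() returns a view of a C-contiguous array, so the table is not copied.
flat_score = similarity.ravel()[flat_idx].sum()

# Direct indexing with one array per axis, as in fast().
direct_score = similarity[seq1, seq2, qual1, qual2].sum()

print(np.isclose(flat_score, direct_score))
```

Both expressions read the same `n` elements, so the two scores should agree up to floating-point rounding.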