python - NumPy indexing with multiple arrays


Given two sequences of data (of equal length), and quality values for each data point, I want to calculate a similarity score based upon a given scoring matrix.

What is the most efficient way to vectorize the following loop:

    score = 0
    for i in xrange(len(seq1)):
        score += similarity[seq1[i], seq2[i], qual1[i], qual2[i]]

similarity is a 4-dimensional float array, shape=(32, 32, 100, 100); seq1, seq2, qual1 and qual2 are 1-dimensional int arrays of equal length (of order 1000 - 40000).

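Shouldn't this just work(tm)? Indexing with all four integer arrays at once selects one element per position, which is exactly what the loop does: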

    >>> score = 0
    >>> for i in xrange(len(seq1)):
    ...     score += similarity[seq1[i], seq2[i], qual1[i], qual2[i]]
    ...
    >>> score
    498.71792400493433
    >>> similarity[seq1, seq2, qual1, qual2].sum()
    498.71792400493433
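To see why the one-line version matches the loop, here is a minimal sketch on a toy array. The names table, a, b, c and d are made up for illustration and are not from the question; the table is filled with 0..23 so each selected entry is easy to identify.

```python
import numpy as np

# Hypothetical toy table of shape (2, 3, 2, 2); entry values are 0..23.
table = np.arange(24, dtype=float).reshape(2, 3, 2, 2)

# Four index arrays of equal length, one per axis of the table.
a = np.array([0, 1, 1])
b = np.array([2, 0, 1])
c = np.array([1, 1, 0])
d = np.array([0, 1, 1])

# Integer-array indexing pairs the arrays up position by position:
# picked[k] == table[a[k], b[k], c[k], d[k]]
picked = table[a, b, c, d]

# The same selection written as an explicit loop, for comparison.
looped = np.array([table[a[k], b[k], c[k], d[k]] for k in range(len(a))])

print(picked)        # one value per position, shape (3,)
print(picked.sum())  # the vectorized "score"
```

The result has the shape of the index arrays, not of the table, so summing it gives the same score as accumulating inside the loop.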

code:

    import numpy as np

    similarity = np.random.random((32, 32, 100, 100))
    n = 1000
    # one random index array per axis, each within that axis's bounds
    seq1, seq2, qual1, qual2 = [np.random.randint(0, s, n) for s in similarity.shape]

    def slow():
        score = 0
        for i in xrange(len(seq1)):
            score += similarity[seq1[i], seq2[i], qual1[i], qual2[i]]
        return score

    def fast():
        return similarity[seq1, seq2, qual1, qual2].sum()

gives:

    >>> timeit slow()
    100 loops, best of 3: 3.59 ms per loop
    >>> timeit fast()
    10000 loops, best of 3: 143 µs per loop
    >>> np.allclose(slow(), fast())
    True
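The timings above come from IPython's timeit command on Python 2. As a rough sketch of how the same comparison could be rerun as a plain Python 3 script, the version below assumes range in place of xrange, NumPy's default_rng generator (NumPy 1.17 or later), and the standard-library timeit module; the seed and repeat counts are arbitrary choices, and the absolute numbers will differ by machine.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
similarity = rng.random((32, 32, 100, 100))
n = 1000
# One random index array per axis, each within that axis's bounds.
seq1, seq2, qual1, qual2 = [rng.integers(0, s, n) for s in similarity.shape]


def slow():
    """Accumulate the score one element at a time."""
    score = 0.0
    for i in range(len(seq1)):
        score += similarity[seq1[i], seq2[i], qual1[i], qual2[i]]
    return score


def fast():
    """Select all elements with integer-array indexing, then sum."""
    return similarity[seq1, seq2, qual1, qual2].sum()


calls = 20  # calls per timing run; arbitrary
t_slow = min(timeit.repeat(slow, number=calls, repeat=3)) / calls
t_fast = min(timeit.repeat(fast, number=calls, repeat=3)) / calls
print(f"slow: {t_slow * 1e3:.3f} ms per call")
print(f"fast: {t_fast * 1e3:.3f} ms per call")
print("same result:", np.isclose(slow(), fast()))
```

Comparing with np.isclose rather than == allows for the small floating-point difference that can arise because the loop and .sum() add the terms in a different order.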
