python - Representing a ragged array in numpy by padding


I have a 1-dimensional numpy array scores of scores associated with some objects. These objects belong to disjoint groups, and the scores of the items in the first group come first, followed by the scores of the items in the second group, etc.

I'd like to create a 2-dimensional array where each row corresponds to a group, and each entry is the score of one of its items. If all the groups are of the same size, I can just do:

scores.reshape((numgroups, groupsize)) 
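For instance, a minimal sketch of the equal-size case, assuming two groups of three items each (the values are made-up placeholders, not real scores):

```python
import numpy as np

# Made-up scores: two groups of three items each, stored back to back.
scores = np.arange(6.0)
num_groups, group_size = 2, 3

# Each row of the result holds the scores of one group.
scores_by_group = scores.reshape((num_groups, group_size))
# scores_by_group is [[0., 1., 2.], [3., 4., 5.]]
```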

Unfortunately, my groups may be of varying size. I understand that numpy doesn't support ragged arrays, but it is fine for me if the resulting array simply pads each row with a specified value to make all rows the same length.

To make this concrete, suppose I have a set a with 3 items, b with 2 items, and c with 4 items.

scores = numpy.array([f(a[0]), f(a[1]), f(a[2]), f(b[0]), f(b[1]),
                      f(c[0]), f(c[1]), f(c[2]), f(c[3])])
rowstarts = numpy.array([0, 3, 5])
paddingvalue = -1.0
scoresbygroup = groupintorows(scores, rowstarts, paddingvalue)

The desired value of scoresbygroup would be:

[[f(a[0]), f(a[1]), f(a[2]), -1.0],
 [f(b[0]), f(b[1]), -1.0, -1.0],
 [f(c[0]), f(c[1]), f(c[2]), f(c[3])]]

Is there some numpy function or composition of functions I can use to create groupintorows?

Background:

  • This operation will be used in calculating the loss for a minibatch in a gradient descent algorithm in Theano, which is why I need to keep it as a composition of numpy functions if possible, rather than falling back on native Python.
  • It's fine to assume there is a known maximum row size.
  • The original objects being scored are vectors and the scoring function is a matrix multiplication, which is why we flatten things out in the first place. It would be possible to pad everything up to the maximum item set size before doing the matrix multiplication, but the biggest set is over ten times bigger than the average set size, so this is undesirable for speed reasons.

Try this:

import numpy as np

scores = np.random.rand(9)
row_starts = np.array([0, 3, 5])
# Group boundaries: the row starts followed by the total number of scores.
row_ends = np.concatenate((row_starts, [len(scores)]))
lens = np.diff(row_ends)              # size of each group
pad_len = np.max(lens) - lens         # padding needed per group
# Insert the padding at the end of each group, then reshape.
where_to_pad = np.repeat(row_ends[1:], pad_len)
padding_value = -1.0
padded_scores = np.insert(scores, where_to_pad,
                          padding_value).reshape(-1, np.max(lens))

>>> padded_scores
array([[ 0.05878244,  0.40804443,  0.35640463, -1.        ],
       [ 0.39365072,  0.85313545, -1.        , -1.        ],
       [ 0.133687  ,  0.73651147,  0.98531828,  0.78940163]])
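The steps above can be wrapped into a function with the signature asked for in the question. The following is only a sketch: the names group_into_rows and group_into_rows_masked and the optional max_len argument (standing in for the "known maximum row size" mentioned in the question) are assumptions made for illustration. The second function is an alternative technique based on a boolean mask, not the approach from the answer, and the scores are made-up stand-ins for f(a[0]) ... f(c[3]).

```python
import numpy as np


def group_into_rows(scores, row_starts, padding_value, max_len=None):
    """Wrap the np.insert steps from the answer in one function.

    max_len is a hypothetical optional argument: a known maximum row size.
    """
    scores = np.asarray(scores)
    # Group boundaries: the row starts followed by the total length.
    bounds = np.concatenate((row_starts, [len(scores)]))
    lens = np.diff(bounds)
    if max_len is None:
        max_len = lens.max()
    # Insert the missing entries at the end of each group, then reshape.
    where_to_pad = np.repeat(bounds[1:], max_len - lens)
    return np.insert(scores, where_to_pad, padding_value).reshape(-1, max_len)


def group_into_rows_masked(scores, row_starts, padding_value, max_len=None):
    """Alternative (not from the answer): fill a padded array via a mask."""
    scores = np.asarray(scores)
    lens = np.diff(np.concatenate((row_starts, [len(scores)])))
    if max_len is None:
        max_len = lens.max()
    # mask[i, j] is True where row i holds a real score in column j.
    mask = np.arange(max_len) < lens[:, None]
    out = np.full(mask.shape, padding_value, dtype=scores.dtype)
    out[mask] = scores  # scores fill the True cells in row-major order
    return out


# Made-up scores for groups of sizes 3, 2 and 4.
scores = np.arange(1.0, 10.0)
row_starts = np.array([0, 3, 5])
by_group = group_into_rows(scores, row_starts, -1.0)
by_group_masked = group_into_rows_masked(scores, row_starts, -1.0)
# Both give [[1, 2, 3, -1], [4, 5, -1, -1], [6, 7, 8, 9]] (as floats).
```

Passing max_len fixes the output width in advance, for example group_into_rows(scores, row_starts, -1.0, max_len=5) returns a 3 x 5 array. Whether either variant translates directly to Theano is not something this sketch checks.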
