Accelerating a matrix multiplication in branch neural networks #25242

cshsgy · 2024-12-04T05:22:44Z

Hi there! I am writing code for a type-based matrix multiplication, a problem that comes up in branched neural networks. I have input data x of shape (n_seq, n_in) and a branch index array br of shape (n_seq,), holding integers from 0 to 19. Each branch has its own weight matrix, so the weights w have shape (n_branch, n_in, n_out). For each i in range(n_seq), the output is computed with the weights of that row's branch: y[i] = x[i] @ w[br[i]]
Let's focus on the forward pass for now. I am currently testing on V100 cards. I tried quite a few variants and found the following to be the most efficient:

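import jax.numpy as jnp
from jax import jit

@jit
def branched_linear(x, w, br):
    def single_branch(branch_idx):
        # mask is (n_branch, n_seq, 1): 1 where a row belongs to the branch, else 0
        mask = (br == branch_idx[:, None])[:, :, None]
        return (x @ w[branch_idx]) * mask
    return jnp.sum(single_branch(jnp.arange(20)), axis=0)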

This takes about 23 ms for a 1,280,000 × 64 input with n_out = 32. Since it does about 20 times more computation than needed, I think there should be a way to speed it up severalfold. Any suggestions?
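For checking any faster variant against the definition y[i] = x[i] @ w[br[i]], a direct per-row version may be handy. The sketch below is only a correctness reference, not a proposed fast path: it gathers one weight matrix per row, which materializes an (n_seq, n_in, n_out) array and is likely too memory-hungry at the sizes above. The name branched_linear_reference and the tiny demo shapes are assumptions made for illustration.

```python
import jax
import jax.numpy as jnp
import numpy as np


@jax.jit
def branched_linear_reference(x, w, br):
    # Gather one weight matrix per row: shape (n_seq, n_in, n_out).
    w_rows = w[br]
    # Contract over n_in row by row: y[s] = x[s] @ w[br[s]].
    return jnp.einsum("si,sio->so", x, w_rows)


# Tiny illustrative shapes (assumed for the demo, not the sizes from the post).
n_seq, n_in, n_out, n_branch = 8, 4, 3, 20
kx, kw, kb = jax.random.split(jax.random.PRNGKey(0), 3)
x = jax.random.normal(kx, (n_seq, n_in))
w = jax.random.normal(kw, (n_branch, n_in, n_out))
br = jax.random.randint(kb, (n_seq,), 0, n_branch)

y = branched_linear_reference(x, w, br)  # shape (n_seq, n_out)
```

Comparing a candidate implementation against this output with jnp.allclose on small inputs is a cheap way to confirm that the masking or grouping logic is right before benchmarking.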
