[inductor] Lower aten.index_add_/aten.index_add #885

Closed
Tracked by #93757
desertfire opened this issue Aug 18, 2022 · 7 comments · Fixed by #1292

Comments

desertfire commented Aug 18, 2022

Should be fairly similar to how aten.index_put_/aten.index_put is currently handled; see the existing index_put lowering.

@desertfire desertfire changed the title aten.index_add_ [inductor] Lower aten.index_add_ Aug 18, 2022
@desertfire desertfire changed the title [inductor] Lower aten.index_add_ [inductor] Lower aten.index_add_/aten.index_add Aug 18, 2022
@eellison eellison self-assigned this Aug 19, 2022
eellison commented

I looked into this a bit, and I'm not sure it has that much in common with index_put, since index_add can accumulate arbitrarily many indices into the same location. The usage also seems to be pretty different:

For fastNLP_Bert you have inputs like: Operator: aten.index_put_.default cnt: 1, ((T([6, 476], i64), [T([6], i64), T([6], i64)], T([], i64)), {}) (i.e., indices of length 6).

For hf_Longformer, e.g., you have the following: ((T([2359296], f16), 0, T([4718592], i64), T([4718592], f16)), {}) (i.e., indices of length 4718592).

I can't think of a lowering of index_add within inductor that makes much sense, at least within the current IR. Maybe a custom template op (similar to matmul/conv) would make sense? Or maybe there's a way to generalize this pattern with embedding_bag. Both are pretty common in HuggingFace models.

cc @ngimel @jansel

jansel commented Aug 24, 2022

You should be able to map this to ir.Scatter(..., scatter_mode="atomic_add").

ir.Scatter is very similar to ir.Pointwise, except:

ir.Pointwise computes

out[x] = inner_fn(x)

while ir.Scatter computes

out[output_indexer(x)] = inner_fn(x)

If you set scatter_mode="atomic_add" it instead does

out[output_indexer(x)] += inner_fn(x)

in a thread-safe way.

The iteration ranges of the scatter then become the iteration space of the scatter loops (which will be different from the size of the output).

The lowering should be something like this (warning, this is untested):

# assumed imports, following torchinductor's lowering.py conventions:
#   from torchinductor import ir
#   from torchinductor.virtualized import V, ops
def index_add_(self, dim, index, source, *, alpha=1):
    index_loader = index.make_loader()
    source_loader = source.make_loader()

    def output_indexer(idx):
        # Replace the coordinate along `dim` with the value loaded from
        # `index`, so each element of `source` lands at the right slice of self.
        idx = list(idx)
        idx[dim] = ops.indirect_indexing(index_loader([idx[dim]]))
        return idx

    def fn(idx):
        # The value to accumulate: alpha * source[idx]
        return ops.mul(
            ops.constant(alpha, source.get_dtype()),
            source_loader(idx),
        )

    # Iterate over the elements of `source` (not of the output) and
    # atomically accumulate into self at the scattered positions.
    scatter = ir.Scatter(
        device=self.get_device(),
        dtype=self.get_dtype(),
        inner_fn=fn,
        ranges=list(source.get_size()),
        output_indexer=output_indexer,
        scatter_mode="atomic_add",
    )
    buffer = ir.ComputedBuffer(
        None,
        ir.MutationLayout(self),
        scatter,
    )
    buffer.name = V.graph.register_buffer(buffer)
    return self
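
For reference, a lowering like this would presumably be hooked up with inductor's register_lowering decorator; a hypothetical registration (the import path is an assumption based on torchinductor's lowering.py) could look like:

import torch
from torchinductor.lowering import register_lowering

aten = torch.ops.aten

@register_lowering(aten.index_add_)
def index_add_(self, dim, index, source, *, alpha=1):
    ...  # body as in the sketch above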

ngimel commented Aug 24, 2022

For completeness, we should also add index_select (which can be done as Pointwise, similar to index).
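
A minimal sketch of what that Pointwise lowering could look like, following the conventions of the index_add_ sketch above (helper names and signatures are assumptions, untested):

def index_select(x, dim, index):
    x_loader = x.make_loader()
    index_loader = index.make_loader()

    # output shape: x's shape, with the size along `dim` replaced by len(index)
    sizes = list(x.get_size())
    sizes[dim] = index.get_size()[0]

    def inner_fn(out_idx):
        # map the output coordinate along `dim` through the index tensor
        src_idx = list(out_idx)
        src_idx[dim] = ops.indirect_indexing(index_loader([out_idx[dim]]))
        return x_loader(src_idx)

    return ir.Pointwise.create(
        device=x.get_device(),
        dtype=x.get_dtype(),
        inner_fn=inner_fn,
        ranges=sizes,
    )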

@desertfire desertfire assigned lezcano and unassigned eellison Sep 13, 2022
ngimel commented Sep 13, 2022

@eellison index_put also accumulates arbitrarily many indices into the same location.

lezcano commented Sep 14, 2022

This and index_copy can be done as decompositions in terms of index_put, doing something like x[:, :, ..., idx] = tensor. Writing a lowering should also be easy, as it's pretty much the same as index_select, which already has a lowering. I was going to implement both and see how they compare, but a priori I think they should be comparable, so the decomposition should be the better option.
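
A minimal sketch of that decomposition (the function name and the dim normalization are illustrative; index_put's accumulate flag does the summation):

import torch

def index_add_decomp(x, dim, index, source, *, alpha=1):
    # place `index` at position `dim`; the leading None slots mean
    # "keep the whole dimension", i.e. the x[:, :, ..., idx] pattern,
    # and accumulate=True sums values that hit the same location
    dim = dim % x.dim()
    indices = [None] * dim + [index]
    return torch.ops.aten.index_put(x, indices, source * alpha, True)

# quick check against eager index_add, with a repeated index to
# exercise accumulation
x = torch.zeros(3, 4)
index = torch.tensor([0, 2, 0])
source = torch.ones(3, 3)
assert torch.allclose(
    index_add_decomp(x, 1, index, source),
    torch.index_add(x, 1, index, source),
)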

lezcano commented Sep 14, 2022

The decompositions are here: pytorch/pytorch#85002.

@eellison if you implement the lowering, you can benchmark it against the decomposition and see how they fare.
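
A hypothetical micro-benchmark along those lines, reusing the hf_Longformer shapes quoted earlier (using the torchdynamo.optimize("inductor") entry point is an assumption about how either path would be exercised):

import torch
import torchdynamo
from torch.utils import benchmark

@torchdynamo.optimize("inductor")
def f(x, index, source):
    # compiled either via the lowering or via the decomposition,
    # depending on which is registered
    return torch.index_add(x, 0, index, source)

x = torch.zeros(2359296, device="cuda", dtype=torch.float16)
index = torch.randint(0, x.numel(), (4718592,), device="cuda")
source = torch.rand(4718592, device="cuda", dtype=torch.float16)

t = benchmark.Timer(
    stmt="f(x, index, source)",
    globals={"f": f, "x": x, "index": index, "source": source},
)
print(t.timeit(100))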

lezcano commented Sep 14, 2022

Of course, if you guys think that the decomposition is more convenient than the lowering, I could implement the decomposition for index_select and remove the lowering.
