Tags: yushangdi/executorch
Merge branch 'upstream/main' into debug_features
Change-Id: Iad9360b091111365847bde16fc8a1e8705a520f5

Merge branch 'upstream/main' into op_sigmoid
Change-Id: I0e688fae977eb090a135f8ff8828d2f641370a39

Merge branch 'upstream/main' into sub_op
Change-Id: Id70cc7f9d7787b02defb6981dbaf292937f1982f

Merge branch 'upstream/main' into op_full
Change-Id: I68062223b0baaf91192784e2eb04e06677c3280f
skip NoneType spec in vulkan_graph_builder

Summary: This comes up in dynamic-shape ops. Example error message:

```
RuntimeError: Cannot create value for spec of type <class 'NoneType'>
```

Differential Revision: D59028536
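The fix described above can be sketched as a guard that skips `None` specs instead of raising. This is a minimal illustration only: the names `create_value` and `build_values` are hypothetical stand-ins, not the real `vulkan_graph_builder` API.

```python
def create_value(spec):
    # Hypothetical stand-in for the builder's real value-creation step
    return f"value<{spec}>"

def build_values(specs):
    """Skip None specs rather than raising; dynamic-shape ops may emit them."""
    return [create_value(s) for s in specs if s is not None]

print(build_values(["float32[2,3]", None, "int64[4]"]))
# ['value<float32[2,3]>', 'value<int64[4]>']
```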
use index_put only in kv cache update to reduce number of operators (pytorch#3786)

Summary: Pull Request resolved: pytorch#3786

The decomposition from

```python
class IndexPut(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x, input_pos, value):
        x[:, :, input_pos] = value
        return x
```

is

```
opcode         name             target                      args                                             kwargs
-------------  ---------------  --------------------------  -----------------------------------------------  --------
placeholder    x                x                           ()                                               {}
placeholder    input_pos        input_pos                   ()                                               {}
placeholder    value            value                       ()                                               {}
call_function  slice_1          aten.slice.Tensor           (x, 0, 0, 9223372036854775807)                   {}
call_function  slice_2          aten.slice.Tensor           (slice_1, 1, 0, 9223372036854775807)             {}
call_function  index_put        aten.index_put.default      (slice_2, [None, None, input_pos], value)        {}
call_function  slice_3          aten.slice.Tensor           (x, 0, 0, 9223372036854775807)                   {}
call_function  slice_scatter    aten.slice_scatter.default  (slice_3, index_put, 1, 0, 9223372036854775807)  {}
call_function  slice_scatter_1  aten.slice_scatter.default  (x, slice_scatter, 0, 0, 9223372036854775807)    {}
output         output           output                      ((slice_scatter_1, slice_scatter_1),)            {}
```

However, `x[:, :, input_pos] = value` really just updates the contents of `x` with `value`; it is essentially a single `index_put`. By replacing `x[:, :, input_pos] = value` with `torch.ops.aten.index_put_(x, [None, None, input_pos], value)`, we reduce the number of operators from 6 to 1.

For

```python
class IndexPut(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x, input_pos, value):
        torch.ops.aten.index_put_(x, [None, None, input_pos], value)
        return x
```

the decomposition is

```
opcode         name       target                  args                                 kwargs
-------------  ---------  ----------------------  -----------------------------------  --------
placeholder    x          x                       ()                                   {}
placeholder    input_pos  input_pos               ()                                   {}
placeholder    value      value                   ()                                   {}
call_function  index_put  aten.index_put.default  (x, [None, None, input_pos], value)  {}
output         output     output                  ((index_put, index_put),)            {}
```

A more proper long-term fix is pattern matching that replaces the decomposed pattern with the simplified one.

Perf: for stories, before the diff

```
I 00:00:03.437290 executorch:runner.cpp:419] Prompt Tokens: 9 Generated Tokens: 118
I 00:00:03.437295 executorch:runner.cpp:425] Model Load Time: 0.763000 (seconds)
I 00:00:03.437301 executorch:runner.cpp:435] Total inference time: 2.661000 (seconds) Rate: 44.344231 (tokens/second)
I 00:00:03.437305 executorch:runner.cpp:443] Prompt evaluation: 0.185000 (seconds) Rate: 48.648649 (tokens/second)
I 00:00:03.437309 executorch:runner.cpp:454] Generated 118 tokens: 2.476000 (seconds) Rate: 47.657512 (tokens/second)
I 00:00:03.437313 executorch:runner.cpp:462] Time to first generated token: 0.206000 (seconds)
I 00:00:03.437315 executorch:runner.cpp:469] Sampling time over 127 tokens: 0.042000 (seconds)
```

After the diff

```
I 00:00:03.195257 executorch:runner.cpp:419] Prompt Tokens: 9 Generated Tokens: 118
I 00:00:03.195295 executorch:runner.cpp:425] Model Load Time: 0.683000 (seconds)
I 00:00:03.195314 executorch:runner.cpp:435] Total inference time: 2.502000 (seconds) Rate: 47.162270 (tokens/second)
I 00:00:03.195319 executorch:runner.cpp:443] Prompt evaluation: 0.175000 (seconds) Rate: 51.428571 (tokens/second)
I 00:00:03.195323 executorch:runner.cpp:454] Generated 118 tokens: 2.327000 (seconds) Rate: 50.709067 (tokens/second)
I 00:00:03.195327 executorch:runner.cpp:462] Time to first generated token: 0.195000 (seconds)
I 00:00:03.195330 executorch:runner.cpp:469] Sampling time over 127 tokens: 0.049000 (seconds)
```

Differential Revision: D57949659
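The semantics being collapsed into a single op can be sketched without torch: `x[:, :, input_pos] = value` on a 3-D KV-cache buffer writes `value` at the listed positions along the last dimension and leaves everything else untouched. The pure-Python `index_put_` below mirrors that one indexing pattern on nested lists; it is an illustration of the semantics, not the actual `aten.index_put_` implementation.

```python
def index_put_(x, input_pos, value):
    """In-place update along the last dim: x[:, :, input_pos] = value.

    x: nested list of shape (B, H, S); value: shape (B, H, len(input_pos)).
    Sketch of aten.index_put_ semantics for this indexing pattern only.
    """
    for b in range(len(x)):
        for h in range(len(x[b])):
            for k, pos in enumerate(input_pos):
                x[b][h][pos] = value[b][h][k]
    return x

# KV cache of shape (1, 2, 4); update positions 1 and 2
cache = [[[0, 0, 0, 0], [0, 0, 0, 0]]]
index_put_(cache, [1, 2], [[[5, 6], [7, 8]]])
print(cache)  # [[[0, 5, 6, 0], [0, 7, 8, 0]]]
```

Because the whole write is one scatter into `x`, there is nothing for the exporter to decompose into slice/slice_scatter pairs, which is why the in-place op traces to a single `aten.index_put.default` node.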
Add slice op to Arm backend

Implements node visitor and tests. Also implements an io_config in ArmQuantizer as a fallback. The io_config QuantizationConfig is applied to placeholders and outputs that are still missing annotation after all other annotation has been applied. The intended use is unit testing quantization of operations that lack quantization annotators.

Signed-off-by: Erik Lundell <erik.lundell@arm.com>
Change-Id: Iae7dc3f1dc2afe23776566f0e9904271cde0892a
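The io_config fallback described above can be sketched as a final pass over the graph. This is a hypothetical illustration: plain dicts stand in for FX graph nodes, and `io_config` stands in for a QuantizationConfig object; the real ArmQuantizer API may differ.

```python
def annotate_io_fallback(nodes, io_config):
    """After all operator-specific annotators have run, give any still
    unannotated placeholder or output node the fallback io_config."""
    for node in nodes:
        if node["op"] in ("placeholder", "output") and node.get("qconfig") is None:
            node["qconfig"] = io_config
    return nodes

graph = [
    {"op": "placeholder", "name": "x", "qconfig": None},
    {"op": "call_function", "name": "slice", "qconfig": "op_cfg"},
    {"op": "output", "name": "out", "qconfig": None},
]
annotate_io_fallback(graph, "io_cfg")
print([n["qconfig"] for n in graph])  # ['io_cfg', 'op_cfg', 'io_cfg']
```

Running the fallback last means it never overrides an operator annotator, which is what makes it safe to use as a catch-all in unit tests.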
Enable build on aarch64 linux (pytorch#3896) (pytorch#4017)

Summary: There is a mismatch in the torch and torchvision dependencies for the linux-aarch64 packages, and missing support in the resolve_buck script.

Signed-off-by: Per Åstrand <per.astrand@arm.com>
Change-Id: I491499ca5e524fd2788919b6446a370fe44fdb86
Pull Request resolved: pytorch#3896
Reviewed By: digantdesai
Differential Revision: D58741803
Pulled By: mergennachin
fbshipit-source-id: 7fe598da58ea6fc29726f38cfb394a9eda832c44
(cherry picked from commit 337174c)
Co-authored-by: Per Åstrand <per.astrand@arm.com>