Tags · yushangdi/executorch · GitHub

ciflow/trunk/4144

Merge branch 'upstream/main' into debug_features

Change-Id: Iad9360b091111365847bde16fc8a1e8705a520f5

ciflow/trunk/4114

Merge branch 'upstream/main' into op_sigmoid

Change-Id: I0e688fae977eb090a135f8ff8828d2f641370a39

ciflow/trunk/4074

Merge branch 'upstream/main' into sub_op

Change-Id: Id70cc7f9d7787b02defb6981dbaf292937f1982f

ciflow/trunk/4073

Merge branch 'upstream/main' into op_full

Change-Id: I68062223b0baaf91192784e2eb04e06677c3280f

ciflow/periodic/4316

skip NoneType spec in vulkan_graph_builder

Summary:
This comes up in dynamic shape ops.
Example error message: ``RuntimeError: Cannot create value for spec of type <class 'NoneType'>``

Differential Revision: D59028536

ciflow/trunk/3786

use index_put only in kv cache update to reduce number of operators (pytorch#3786)

Summary:
Pull Request resolved: pytorch#3786

The decomposition from

```
class IndexPut(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x, input_pos, value):
        x[:, :, input_pos] = value
        return x
```

is
```
opcode         name             target                      args                                             kwargs
-------------  ---------------  --------------------------  -----------------------------------------------  --------
placeholder    x                x                           ()                                               {}
placeholder    input_pos        input_pos                   ()                                               {}
placeholder    value            value                       ()                                               {}
call_function  slice_1          aten.slice.Tensor           (x, 0, 0, 9223372036854775807)                   {}
call_function  slice_2          aten.slice.Tensor           (slice_1, 1, 0, 9223372036854775807)             {}
call_function  index_put        aten.index_put.default      (slice_2, [None, None, input_pos], value)        {}
call_function  slice_3          aten.slice.Tensor           (x, 0, 0, 9223372036854775807)                   {}
call_function  slice_scatter    aten.slice_scatter.default  (slice_3, index_put, 1, 0, 9223372036854775807)  {}
call_function  slice_scatter_1  aten.slice_scatter.default  (x, slice_scatter, 0, 0, 9223372036854775807)    {}
output         output           output                      ((slice_scatter_1, slice_scatter_1),)            {}
```

However, `x[:, :, input_pos] = value` only updates the contents of `x` in place with `value`, which is essentially a single `index_put`.

By replacing `x[:, :, input_pos] = value` with `torch.ops.aten.index_put_(x, [None, None, input_pos], value)`, we reduce the number of operators from 6 to 1.

```
class IndexPut(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x, input_pos, value):
        # Write `value` into `x` at `input_pos` along the last dimension, in place.
        torch.ops.aten.index_put_(x, [None, None, input_pos], value)
        return x
```
The decomposition is
```
opcode         name       target                  args                                 kwargs
-------------  ---------  ----------------------  -----------------------------------  --------
placeholder    x          x                       ()                                   {}
placeholder    input_pos  input_pos               ()                                   {}
placeholder    value      value                   ()                                   {}
call_function  index_put  aten.index_put.default  (x, [None, None, input_pos], value)  {}
output         output     output                  ((index_put, index_put),)            {}
```
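
The "from 6 to 1" figure can be checked by counting the `call_function` rows in the two graph tables above. The snippet below is a minimal sketch that copies only the opcode and target columns by hand; the names `BEFORE`, `AFTER`, and `count_operators` are illustrative and not part of ExecuTorch or torch.fx.

```python
# Opcode/target pairs copied by hand from the two graph tables above.
BEFORE = [
    ("placeholder", "x"),
    ("placeholder", "input_pos"),
    ("placeholder", "value"),
    ("call_function", "aten.slice.Tensor"),
    ("call_function", "aten.slice.Tensor"),
    ("call_function", "aten.index_put.default"),
    ("call_function", "aten.slice.Tensor"),
    ("call_function", "aten.slice_scatter.default"),
    ("call_function", "aten.slice_scatter.default"),
    ("output", "output"),
]
AFTER = [
    ("placeholder", "x"),
    ("placeholder", "input_pos"),
    ("placeholder", "value"),
    ("call_function", "aten.index_put.default"),
    ("output", "output"),
]


def count_operators(rows):
    """Count graph nodes that invoke an operator (opcode == call_function)."""
    return sum(1 for opcode, _target in rows if opcode == "call_function")


before_ops = count_operators(BEFORE)
after_ops = count_operators(AFTER)
print(f"operators before: {before_ops}, after: {after_ops}")
```

Placeholders and the output node are excluded because they do not dispatch an operator at runtime.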

In the long term, a more robust way to address this is a pattern-matching pass that replaces the longer pattern with the simplified one.

Perf:
For stories, before the diff
```
I 00:00:03.437290 executorch:runner.cpp:419] 	Prompt Tokens: 9    Generated Tokens: 118
I 00:00:03.437295 executorch:runner.cpp:425] 	Model Load Time:		0.763000 (seconds)
I 00:00:03.437301 executorch:runner.cpp:435] 	Total inference time:		2.661000 (seconds)		 Rate: 	44.344231 (tokens/second)
I 00:00:03.437305 executorch:runner.cpp:443] 		Prompt evaluation:	0.185000 (seconds)		 Rate: 	48.648649 (tokens/second)
I 00:00:03.437309 executorch:runner.cpp:454] 		Generated 118 tokens:	2.476000 (seconds)		 Rate: 	47.657512 (tokens/second)
I 00:00:03.437313 executorch:runner.cpp:462] 	Time to first generated token:	0.206000 (seconds)
I 00:00:03.437315 executorch:runner.cpp:469] 	Sampling time over 127 tokens:	0.042000 (seconds)
```
After the diff
```
I 00:00:03.195257 executorch:runner.cpp:419] 	Prompt Tokens: 9    Generated Tokens: 118
I 00:00:03.195295 executorch:runner.cpp:425] 	Model Load Time:		0.683000 (seconds)
I 00:00:03.195314 executorch:runner.cpp:435] 	Total inference time:		2.502000 (seconds)		 Rate: 	47.162270 (tokens/second)
I 00:00:03.195319 executorch:runner.cpp:443] 		Prompt evaluation:	0.175000 (seconds)		 Rate: 	51.428571 (tokens/second)
I 00:00:03.195323 executorch:runner.cpp:454] 		Generated 118 tokens:	2.327000 (seconds)		 Rate: 	50.709067 (tokens/second)
I 00:00:03.195327 executorch:runner.cpp:462] 	Time to first generated token:	0.195000 (seconds)
I 00:00:03.195330 executorch:runner.cpp:469] 	Sampling time over 127 tokens:	0.049000 (seconds)
```
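
To put the two logs side by side, the sketch below recomputes the token rates from the logged token counts and durations and reports the relative change. It assumes rate = tokens / seconds, which matches the rates printed in the logs. The names `BEFORE`, `AFTER`, `rate`, and `speedup_percent` are illustrative. Each log is a single run, so the percentages are indicative only.

```python
# (tokens, seconds) pairs copied from the runner logs above.
BEFORE = {
    "total": (118, 2.661),     # Total inference time
    "prompt": (9, 0.185),      # Prompt evaluation
    "generate": (118, 2.476),  # Generated 118 tokens
}
AFTER = {
    "total": (118, 2.502),
    "prompt": (9, 0.175),
    "generate": (118, 2.327),
}


def rate(tokens, seconds):
    """Tokens per second for one phase."""
    return tokens / seconds


def speedup_percent(before, after):
    """Relative increase in tokens/second from `before` to `after`, in percent."""
    return (rate(*after) / rate(*before) - 1.0) * 100.0


speedups = {phase: speedup_percent(BEFORE[phase], AFTER[phase]) for phase in BEFORE}
for phase, pct in speedups.items():
    print(f"{phase}: {pct:.2f}% more tokens/second")
```

On these numbers, each phase comes out roughly 6% faster after the diff.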

Differential Revision: D57949659

ciflow/trunk/4076

more fix

ciflow/trunk/4072

Add slice op to Arm backend

Implements node visitor and tests.

Also implements an io_config in ArmQuantizer
as a fallback. The io_config
QuantizationConfig is applied to placeholders
and outputs that lack annotation after all
other annotations have been applied.

The intended use is for unit testing
quantization of operations
without quantization annotators.
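
The fallback behaviour described above can be sketched in plain Python. This is not the ArmQuantizer implementation: `Node`, `apply_io_fallback`, and the string-valued configs are hypothetical stand-ins, used only to show that the fallback is applied last and only to placeholders and outputs that are still unannotated.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a graph node; not the torch.fx Node class."""
    name: str
    op: str  # "placeholder", "call_function", or "output"
    annotation: Optional[str] = None  # stand-in for a quantization annotation


def apply_io_fallback(nodes: List[Node], io_config: Optional[str]) -> List[str]:
    """Annotate unannotated placeholders/outputs with io_config.

    Assumed to run after all other annotators. Returns the patched node names.
    """
    if io_config is None:
        return []
    patched = []
    for node in nodes:
        if node.op in ("placeholder", "output") and node.annotation is None:
            node.annotation = io_config
            patched.append(node.name)
    return patched


graph = [
    Node("x", "placeholder"),                                # no annotation yet
    Node("slice_1", "call_function"),                        # not an I/O node
    Node("output", "output", annotation="int8_per_tensor"),  # already annotated
]
patched = apply_io_fallback(graph, io_config="int8_io")
print(patched)
```

Only `x` is patched here: the operator node is left alone, and the existing annotation on the output is not overwritten.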

Signed-off-by: Erik Lundell <erik.lundell@arm.com>
Change-Id: Iae7dc3f1dc2afe23776566f0e9904271cde0892a

v0.3.0-rc1

Enable build on aarch64 linux (pytorch#3896) (pytorch#4017)

Summary:
There's a mismatch in the torch and torchvision dependencies for the linux-aarch64 packages, and support is missing in the resolve_buck script.

Signed-off-by: Per Åstrand <per.astrand@arm.com>

Change-Id: I491499ca5e524fd2788919b6446a370fe44fdb86

Pull Request resolved: pytorch#3896

Reviewed By: digantdesai

Differential Revision: D58741803

Pulled By: mergennachin

fbshipit-source-id: 7fe598da58ea6fc29726f38cfb394a9eda832c44
(cherry picked from commit 337174c)

Co-authored-by: Per Åstrand <per.astrand@arm.com>

v0.2.1

Upgrade versions. (pytorch#3903)
