Tags: yushangdi/executorch
Merge branch 'upstream/main' into debug_features
Change-Id: Iad9360b091111365847bde16fc8a1e8705a520f5

Merge branch 'upstream/main' into op_sigmoid
Change-Id: I0e688fae977eb090a135f8ff8828d2f641370a39

Merge branch 'upstream/main' into sub_op
Change-Id: Id70cc7f9d7787b02defb6981dbaf292937f1982f

Merge branch 'upstream/main' into op_full
Change-Id: I68062223b0baaf91192784e2eb04e06677c3280f
skip NoneType spec in vulkan_graph_builder

Summary: This comes up in dynamic-shape ops. Example error message:

```
RuntimeError: Cannot create value for spec of type <class 'NoneType'>
```

Differential Revision: D59028536
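The fix described above can be sketched as a guard that skips `None` specs instead of raising. This is a minimal illustration only: the names `create_value` and `build_values` are hypothetical stand-ins, not the real `vulkan_graph_builder` API.

```python
def create_value(spec):
    # Hypothetical stand-in for the builder's real value-creation step
    return f"value<{spec}>"

def build_values(specs):
    """Skip None specs rather than raising; dynamic-shape ops may emit them."""
    return [create_value(s) for s in specs if s is not None]

print(build_values(["float32[2,3]", None, "int64[4]"]))
# ['value<float32[2,3]>', 'value<int64[4]>']
```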
use index_put only in kv cache update to reduce number of operators (pytorch#3786)

Summary: Pull Request resolved: pytorch#3786

The decomposition from

```python
class IndexPut(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x, input_pos, value):
        x[:, :, input_pos] = value
        return x
```

is

```
opcode         name             target                      args                                             kwargs
-------------  ---------------  --------------------------  -----------------------------------------------  --------
placeholder    x                x                           ()                                               {}
placeholder    input_pos        input_pos                   ()                                               {}
placeholder    value            value                       ()                                               {}
call_function  slice_1          aten.slice.Tensor           (x, 0, 0, 9223372036854775807)                   {}
call_function  slice_2          aten.slice.Tensor           (slice_1, 1, 0, 9223372036854775807)             {}
call_function  index_put        aten.index_put.default      (slice_2, [None, None, input_pos], value)        {}
call_function  slice_3          aten.slice.Tensor           (x, 0, 0, 9223372036854775807)                   {}
call_function  slice_scatter    aten.slice_scatter.default  (slice_3, index_put, 1, 0, 9223372036854775807)  {}
call_function  slice_scatter_1  aten.slice_scatter.default  (x, slice_scatter, 0, 0, 9223372036854775807)    {}
output         output           output                      ((slice_scatter_1, slice_scatter_1),)            {}
```

However, `x[:, :, input_pos] = value` really just updates the contents of `x` with `value`; it is essentially a single `index_put`. By replacing `x[:, :, input_pos] = value` with `torch.ops.aten.index_put_(x, [None, None, input_pos], value)`, we reduce the number of operators from 6 to 1.

For

```python
class IndexPut(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x, input_pos, value):
        torch.ops.aten.index_put_(x, [None, None, input_pos], value)
        return x
```

the decomposition is

```
opcode         name       target                  args                                 kwargs
-------------  ---------  ----------------------  -----------------------------------  --------
placeholder    x          x                       ()                                   {}
placeholder    input_pos  input_pos               ()                                   {}
placeholder    value      value                   ()                                   {}
call_function  index_put  aten.index_put.default  (x, [None, None, input_pos], value)  {}
output         output     output                  ((index_put, index_put),)            {}
```

A more proper long-term fix is pattern matching that replaces the decomposed pattern with the simplified one.

Perf: for stories, before the diff

```
I 00:00:03.437290 executorch:runner.cpp:419] Prompt Tokens: 9 Generated Tokens: 118
I 00:00:03.437295 executorch:runner.cpp:425] Model Load Time: 0.763000 (seconds)
I 00:00:03.437301 executorch:runner.cpp:435] Total inference time: 2.661000 (seconds) Rate: 44.344231 (tokens/second)
I 00:00:03.437305 executorch:runner.cpp:443] Prompt evaluation: 0.185000 (seconds) Rate: 48.648649 (tokens/second)
I 00:00:03.437309 executorch:runner.cpp:454] Generated 118 tokens: 2.476000 (seconds) Rate: 47.657512 (tokens/second)
I 00:00:03.437313 executorch:runner.cpp:462] Time to first generated token: 0.206000 (seconds)
I 00:00:03.437315 executorch:runner.cpp:469] Sampling time over 127 tokens: 0.042000 (seconds)
```

After the diff

```
I 00:00:03.195257 executorch:runner.cpp:419] Prompt Tokens: 9 Generated Tokens: 118
I 00:00:03.195295 executorch:runner.cpp:425] Model Load Time: 0.683000 (seconds)
I 00:00:03.195314 executorch:runner.cpp:435] Total inference time: 2.502000 (seconds) Rate: 47.162270 (tokens/second)
I 00:00:03.195319 executorch:runner.cpp:443] Prompt evaluation: 0.175000 (seconds) Rate: 51.428571 (tokens/second)
I 00:00:03.195323 executorch:runner.cpp:454] Generated 118 tokens: 2.327000 (seconds) Rate: 50.709067 (tokens/second)
I 00:00:03.195327 executorch:runner.cpp:462] Time to first generated token: 0.195000 (seconds)
I 00:00:03.195330 executorch:runner.cpp:469] Sampling time over 127 tokens: 0.049000 (seconds)
```

Differential Revision: D57949659
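The semantics being collapsed into a single op can be sketched without torch: `x[:, :, input_pos] = value` on a 3-D KV-cache buffer writes `value` at the listed positions along the last dimension and leaves everything else untouched. The pure-Python `index_put_` below mirrors that one indexing pattern on nested lists; it is an illustration of the semantics, not the actual `aten.index_put_` implementation.

```python
def index_put_(x, input_pos, value):
    """In-place update along the last dim: x[:, :, input_pos] = value.

    x: nested list of shape (B, H, S); value: shape (B, H, len(input_pos)).
    Sketch of aten.index_put_ semantics for this indexing pattern only.
    """
    for b in range(len(x)):
        for h in range(len(x[b])):
            for k, pos in enumerate(input_pos):
                x[b][h][pos] = value[b][h][k]
    return x

# KV cache of shape (1, 2, 4); update positions 1 and 2
cache = [[[0, 0, 0, 0], [0, 0, 0, 0]]]
index_put_(cache, [1, 2], [[[5, 6], [7, 8]]])
print(cache)  # [[[0, 5, 6, 0], [0, 7, 8, 0]]]
```

Because the whole write is one scatter into `x`, there is nothing for the exporter to decompose into slice/slice_scatter pairs, which is why the in-place op traces to a single `aten.index_put.default` node.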
Add slice op to Arm backend

Implements node visitor and tests. Also implements an io_config in ArmQuantizer as a fallback. The io_config QuantizationConfig is applied to placeholders and outputs that are still missing annotation after all other annotation has been applied. The intended use is unit testing quantization of operations that lack quantization annotators.

Signed-off-by: Erik Lundell <erik.lundell@arm.com>
Change-Id: Iae7dc3f1dc2afe23776566f0e9904271cde0892a
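The io_config fallback described above can be sketched as a final pass over the graph. This is a hypothetical illustration: plain dicts stand in for FX graph nodes, and `io_config` stands in for a QuantizationConfig object; the real ArmQuantizer API may differ.

```python
def annotate_io_fallback(nodes, io_config):
    """After all operator-specific annotators have run, give any still
    unannotated placeholder or output node the fallback io_config."""
    for node in nodes:
        if node["op"] in ("placeholder", "output") and node.get("qconfig") is None:
            node["qconfig"] = io_config
    return nodes

graph = [
    {"op": "placeholder", "name": "x", "qconfig": None},
    {"op": "call_function", "name": "slice", "qconfig": "op_cfg"},
    {"op": "output", "name": "out", "qconfig": None},
]
annotate_io_fallback(graph, "io_cfg")
print([n["qconfig"] for n in graph])  # ['io_cfg', 'op_cfg', 'io_cfg']
```

Running the fallback last means it never overrides an operator annotator, which is what makes it safe to use as a catch-all in unit tests.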
Enable build on aarch64 linux (pytorch#3896) (pytorch#4017)

Summary: There is a mismatch in the torch and torchvision dependencies for the linux-aarch64 packages, and missing support in the resolve_buck script.

Signed-off-by: Per Åstrand <per.astrand@arm.com>
Change-Id: I491499ca5e524fd2788919b6446a370fe44fdb86
Pull Request resolved: pytorch#3896
Reviewed By: digantdesai
Differential Revision: D58741803
Pulled By: mergennachin
fbshipit-source-id: 7fe598da58ea6fc29726f38cfb394a9eda832c44
(cherry picked from commit 337174c)
Co-authored-by: Per Åstrand <per.astrand@arm.com>