8000 [LLHD] Update Deseq pass to work with process results by fabianschuiki · Pull Request #8403 · llvm/circt · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[LLHD] Update Deseq pass to work with process results #8403

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 17, 2025

Conversation

fabianschuiki
Copy link
Contributor

Add an updated version of the desequentialization pass. This new version expects all probes and drives to have been hoisted out of processes, and for the processes to produce the interesting signals as a result. Resets can now be distinguished from clocks more reliably. The previous version of the pass is left untouched for now, since it is used in the circt-verilog pipeline. Once process lowering also works with process results, we can get rid of the old pass implementations in one go.

This pass is an absolute headache to implement because of the fuzzy and ill-defined Verilog and VHDL semantics it has to match. I expect us to iterate heavily on this in the future, along the definition of llhd.process and potentially process flavors that are more convenient to analyze.

In particular, the pass has to do a lot of things in one go: it checks whether a process has only a single wait, if the wait carries past values into the present as block arguments, whether the IR expresses a posedge/negedge detection based on past and present values, whether the process properly observes the necessary triggers values, and it has to match the detected triggers against a very narrow list of things that a register can represent.

Life would be easier if we had a more restricted form of processes that already express the basic structure of a loop through a single wait, observed triggers, and which past values are required. This would then also allow us to detect and generate latches, which are missing completely right now.

The pass currently performs a data flow analysis by propagating DNFs and tables of potential result values across a lattice. I'm not too happy with this implementation, and doing a depth-first traversal of the IR may be a better approach in the future. We should probably revisit this once we have improved the process op a bit and update this pass again.

This commit also adds an implementation of the Disjunctive Normal Form as a way of expressing and manipulating boolean functions. This is necessary to reliably detect if a process computes the posedge or negedge of a value. In the future we may want to switch this to a more compact representation of the underlying truth table, which can be done fairly efficiently using APInt. The current implementation of DNF::optimize() already does this internally. Note that DNFs are very inefficient at representing arbitrary boolean functions, as they have a tendency to grow exponentially. Therefore care must be taken that only operations on potential trigger values of sequential processes are captured in the DNF, and all other values are kept as opaque values. This keeps the complexity bounded.

@fabianschuiki fabianschuiki marked this pull request as ready for review April 8, 2025 04:52
@fabianschuiki fabianschuiki requested a review from maerhart as a code owner April 8, 2025 04:52
Copy link
Member
@maerhart maerhart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't reviewed this in super detail, but it's better than what we currently have, so good to merge and iterate on.

I'm quite curious about how the DNF vs truth table approaches perform for various inputs parameterized by number of expression terms and number of primitives. Both are exponential, both can't handle processing of arbitrary/big enable conditions, etc. My guess would be that there is no significant difference in practice, but it'd be cool to see actual numbers 🤓

@@ -91,6 +91,7 @@ def FinalOp : LLHDOp<"final", [
def WaitOp : LLHDOp<"wait", [
AttrSizedOperandSegments,
HasParent<"ProcessOp">,
Pure,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning of making this op pure? It suspends execution, stores the current values of the block arguments, makes the process hold the values of the yields, etc. Which all sounds implicitly side-effecting to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I added this because it simplified a check in deseq that ensures a process has no side-effects besides what's visible through its results. But you definitely raise a good point, the actual op sounds very side-effecting. I'll revert this and make the check in deseq a bit more careful.

// Technically also non-constants can be reset values. However, since
// async resets are level sensitive (even though Verilog describes them as
// edge sensitive), the reset value would have to be part of the wait op's
// observed values. We don't check for that.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason we don't check for that or is it just not implemented yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just not implemented yet. In practice this doesn't really come up, since almost all flip-flop standard cells have a set or reset input that sets them to a constant 1 or 0, respectively. There are ways to cook up resets to a dynamic value, but they are pretty obscure and not really used in practice AFAIK. They are also pretty hard to describe in Verilog, since it generally gets the level-sensitivity of resets wrong.

Comment on lines +39 to +46
namespace circt {
namespace llhd {
static llvm::raw_ostream &operator<<(llvm::raw_ostream &os, const DNF &dnf) {
dnf.printWithValues(os);
return os;
}
} // namespace llhd
} // namespace circt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: shouldn't this be in DNF.h?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good point. I'll move that over.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just remembered why this is here. The unit tests want to define their own operator with different value printing. Since the Deseq pass is the only thing that uses this specific style of printing, and only for debugging, the << overload is in here.

propagate(cast<Block *>(node));
}
LLVM_DEBUG(llvm::dbgs() << "- Finished propagation\n");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Computing the DNF for all values is quite inefficient as the runtime is exponential in the number of terms. I see two options:

  • Change the DNF datastructure to just collect the terms and only reorder them to form a DNF and optimize it when asked to do so instead of on-the-fly as soon as a new term is added
  • Construct the DNFs backward from values that we need to know the DNF of

But we can just merge this and do some benchmarking/profiling as well as collecting statistics about expression sizes and number of primitive values to properly optimize this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent point. I initially had an implementation that performed a depth-first traversal of the interesting signals to determine DNFs for the trigger signals specifically. I think I'll change Deseq over to that approach again in a second iteration. The pass currently has pretty poor scaling for exactly the reasons you mention.

Add an updated version of the desequentialization pass. This new version
expects all probes and drives to have been hoisted out of processes, and
for the processes to produce the interesting signals as a result. Resets
can now be distinguished from clocks more reliably. The previous version
of the pass is left untouched for now, since it is used in the
circt-verilog pipeline. Once process lowering also works with process
results, we can get rid of the old pass implementations in one go.

This pass is an absolute headache to implement because of the fuzzy and
ill-defined Verilog and VHDL semantics it has to match. I expect us to
iterate heavily on this in the future, along the definition of
`llhd.process` and potentially process flavors that are more convenient
to analyze.

In particular, the pass has to do a lot of things in one go: it checks
whether a process has only a single wait, if the wait carries past
values into the present as block arguments, whether the IR expresses a
posedge/negedge detection based on past and present values, whether the
process properly observes the necessary triggers values, and it has to
match the detected triggers against a very narrow list of things that a
register can represent.

Life would be easier if we had a more restricted form of processes that
already express the basic structure of a loop through a single wait,
observed triggers, and which past values are required. This would then
also allow us to detect and generate latches, which are missing
completely right now.

The pass currently performs a data flow analysis by propagating DNFs and
tables of potential result values across a lattice. I'm not too happy
with this implementation, and doing a depth-first traversal of the IR
may be a better approach in the future. We should probably revisit this
once we have improved the process op a bit and update this pass again.

This commit also adds an implementation of the Disjunctive Normal Form
as a way of expressing and manipulating boolean functions. This is
necessary to reliably detect if a process computes the posedge or
negedge of a value. In the future we may want to switch this to a more
compact representation of the underlying truth table, which can be done
fairly efficiently using APInt. The current implementation of
`DNF::optimize()` already does this internally. Note that DNFs are very
inefficient at representing arbitrary boolean functions, as they have a
tendency to grow exponentially. Therefore care must be taken that only
operations on potential trigger values of sequential processes are
captured in the DNF, and all other values are kept as opaque values.
This keeps the complexity bounded.
@fabianschuiki
Copy link
Contributor Author

Going to land this and iterate on some of the performance stuff in follow-up PRs.

@fabianschuiki fabianschuiki merged commit d9ed306 into main Apr 17, 2025
5 checks passed
@fabianschuiki fabianschuiki deleted the fschuiki/deseq branch April 17, 2025 16:53
KelvinChung2000 pushed a commit to KelvinChung2000/circt that referenced this pull request Apr 22, 2025
Add an updated version of the desequentialization pass. This new version
expects all probes and drives to have been hoisted out of processes, and
for the processes to produce the interesting signals as a result. Resets
can now be distinguished from clocks more reliably. The previous version
of the pass is left untouched for now, since it is used in the
circt-verilog pipeline. Once process lowering also works with process
results, we can get rid of the old pass implementations in one go.

This pass is an absolute headache to implement because of the fuzzy and
ill-defined Verilog and VHDL semantics it has to match. I expect us to
iterate heavily on this in the future, along the definition of
`llhd.process` and potentially process flavors that are more convenient
to analyze.

In particular, the pass has to do a lot of things in one go: it checks
whether a process has only a single wait, if the wait carries past
values into the present as block arguments, whether the IR expresses a
posedge/negedge detection based on past and present values, whether the
process properly observes the necessary triggers values, and it has to
match the detected triggers against a very narrow list of things that a
register can represent.

Life would be easier if we had a more restricted form of processes that
already express the basic structure of a loop through a single wait,
observed triggers, and which past values are required. This would then
also allow us to detect and generate latches, which are missing
completely right now.

The pass currently performs a data flow analysis by propagating DNFs and
tables of potential result values across a lattice. I'm not too happy
with this implementation, and doing a depth-first traversal of the IR
may be a better approach in the future. We should probably revisit this
once we have improved the process op a bit and update this pass again.

This commit also adds an implementation of the Disjunctive Normal Form
as a way of expressing and manipulating boolean functions. This is
necessary to reliably detect if a process computes the posedge or
negedge of a value. In the future we may want to switch this to a more
compact representation of the underlying truth table, which can be done
fairly efficiently using APInt. The current implementation of
`DNF::optimize()` already does this internally. Note that DNFs are very
inefficient at representing arbitrary boolean functions, as they have a
tendency to grow exponentially. Therefore care must be taken that only
operations on potential trigger values of sequential processes are
captured in the DNF, and all other values are kept as opaque values.
This keeps the complexity bounded.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0