8000 Observable/Noise parameter naming · Issue #594 · PEtab-dev/PEtab · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Observable/Noise parameter naming #594

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
FFroehlich opened this issue Jan 31, 2025 · 25 comments
Open

Observable/Noise parameter naming #594

FFroehlich opened this issue Jan 31, 2025 · 25 comments
Milestone

Comments

@FFroehlich
Copy link
Collaborator

Currently, noise and observable parameter placeholders require _${observableId} as suffix. This suffix appears to be redundant and superfluous to me, as they need to be specified in the measurements table and in the measurements table you can only specify the parameter for the same observable.

I can see how adding the suffix could make it more explicit that these variables are different across observables, but that should already by evident from the fact that they have to be specified in the measurements table. Neither the linter nor simulators appear to check for correctness anyways (see https://github.com/Benchmarking-Initiative/Benchmark-Models-PEtab)

@dweindl
Copy link
Member
dweindl commented Jan 31, 2025

I think, the suffix was intended to reduce the chances of collision with already exiting parameters with a different purpose. (Not saying that it would be a good idea to name your parameters observableParameter1, ..., but you never know).

I am open to changing that, but I would prefer not just dropping the suffix, but instead, introducing a column placeholders (or similar), where one can specify the parameter Ids and the expected order for overriding via the measurements table. This way, one can use more descriptive parameter IDs.

Concrete example:

Current:

observableId observableFormula
obs1 observableParameter1_obs1 * X + observableParameter2_obs1

Proposed:

observableId observableFormula observablePlaceholders
obs1 scale * X + offset scale;offset

Neither the linter nor simulators appear to check for correctness anyways (see https://github.com/Benchmarking-Initiative/Benchmark-Models-PEtab)

I don't think so. It will not complain about the name as such, but it will not consider it as a placeholder. Whether that results in an error will depend on the rest of the petab files. (If not, please post an example/issue). (EDIT: Okay, I think you were referring to Benchmarking-Initiative/Benchmark-Models-PEtab#261)

@FFroehlich
Copy link
Collaborator Author

I think, the suffix was intended to reduce the chances of collision with already exiting parameters with a different purpose. (Not saying that it would be a good idea to name your parameters observableParameter1, ..., but you never know).

Well if it's only about collisions they could just universally be renamed to observableParameter1_placeholder.

@dweindl
Copy link
Member
dweindl commented Feb 11, 2025

Well if it's only about collisions they could just universally be renamed to observableParameter1_placeholder.

In principle, yes. (Or even shorter, placeholder$n.)

However, the problem remains, that those IDs aren't very descriptive and that certain IDs have special meaning. If we change the way these placeholders are handled/specified, I'd prefer directly addressing these issues as well.

@FFroehlich
Copy link
Collaborator Author

Well if it's only about collisions they could just universally be renamed to observableParameter1_placeholder.

In principle, yes. (Or even shorter, placeholder$n.)

However, the problem remains, that those IDs aren't very descriptive and that certain IDs have special meaning. If we change the way these placeholders are handled/specified, I'd prefer directly addressing these issues as well.

Sounds reasonable. How about using non-PEtab IDs, i.e. placeholder1.scale and placeholder2.offset`? Avoids introducing an additional column and simplifies parsing.

@dweindl
Copy link
Member
dweindl commented Feb 26, 2025

Sounds reasonable. How about using non-PEtab IDs, i.e. placeholder1.scale and placeholder2.offset`? Avoids introducing an additional column and simplifies parsing.

It would definitely be more descriptive.

However, I'm not so sure about the simpler parsing. I could imagine that not every symbolic engine will allow dots in their symbols. (Could use placeholder1(scale) instead. Not sure.) For parsing, the separate column would probably be the easiest. No reserved names, no changes in the current math expression grammar, ...

@dweindl dweindl added this to the PEtab 2.0.0 milestone Mar 24, 2025
@dilpath
Copy link
Member
dilpath commented Mar 26, 2025

Another suggestion is to allow arbitrary columns like scale and offset in the measurements table. These are treated as PEtab IDs and cannot appear anywhere else in the PEtab problem, except the observables table formulae.

Overall this is much nicer for the user and for tools I guess, but arbitrary columns are suboptimal.

Example of a single problem with two measurement tables, one per observable

Measurement table 1

observableId time measurement scale
obs1 0 0 1
obs1 0 0 2
obs1 0 0 3
obs1 0 0 4

Measurement table 2

observableId time measurement offset
obs2 0 0 1
obs2 0 0 2
obs2 0 0 3
obs2 0 0 4

Observable table

observableId observableFormula noiseFormula
obs1 scale * x1 1
obs2 x2 + offset 1

I split this into two measurement tables, since scale is empty for all obs2 measurements, and similarly offset 8000 and obs1. Otherwise the measurement table can get sparse/large.

@dilpath
Copy link
Member
dilpath commented Mar 26, 2025

Another suggestion is to support labels in the observableParameters column, where the labels cannot be reused for other observables nor appear anywhere else in the PEtab problem.

Measurements table

observableId time measurement observableParameters
obs1 0 0 scale=1
obs2 0 0 offset=4
obs3 0 0 scale_obs3=1;offset_obs3=2

Observables table

observableId observableFormula noiseFormula
obs1 scale * x1 1
obs2 x2 + offset 1
obs3 scale_obs3*x2 + offset_obs3 1

@dweindl
Copy link
Member
dweindl commented Mar 27, 2025

Thanks for the suggestions. What would be important for me when we change the current measurement-specific parameters approach:

  • It should be evident from the observables table which parameters are meant to be measurement-specific (i.e. overridable via the measurements table)
  • It should be evident from the observables table which parameters are shared across observables. So I'd be careful with allowing some, e.g., parameter scaling that is used in several observableFormulas, but is in fact different for each. Not really a technical constraint, but I think it might be misleading.
    Somewhat related, if we don't require unique parameter names and have a something like {observableId=y, observableFormula=scaling * x, noiseFormula=scaling * y where scaling is a placeholder in both cases, then substituting the observable formula for the observable symbol in the noise formula would have potentially unexpected effects.
  • We shouldn't have to change the math expression grammar
  • Having well-defined column names that are relevant for the handling of the parameter estimation problem (now that we just got rid of the wide conditions table, I wouldn't want to introduce additional parameter-id-based columns in the measurements table.
  • Ideally, we could avoid any use of reserved names (or as currently, even worse, context-specific reserved patterns)

Seeing directly from the measurement table which parameters are overridden would be nice and more robust, but is probably inconvenient to write. This would also be inconsistent with other cases, such as priorParameters, where we have lists of parameters. Either keyword parameters everywhere or nowhere, but that's probably a separate discussion.

@dilpath
Copy link
Member
dilpath commented Mar 27, 2025

It should be evident from the observables table which parameters are meant to be measurement-specific (i.e. overridable via the measurements table)

I'm fine with this, but: it's not evident which parameters are experiment/condition-specific, so why care about measurement-specific parameters? To make life a bit easier for tools?

Apart from that I agree on all points.

So I would vote first for your observablePlaceholders column (perhaps rather observableParameterIds).

@dweindl
Copy link
Member
dweindl commented Mar 27, 2025

it's not evident which parameters are experiment/condition-specific, so why care about measurement-specific parameters? To make life a bit easier for tools?

For me there is a slight difference: other parameters may be overridden in a condition-specific way, but those placeholders must be overridden for any measurements of the respective observable. (At least, that is how it used to be.) Therefore, I'd prefer seeing easily which ones have to be specified with the measurements.

@matthiaskoenig
Copy link
Contributor

I definitely like the suggestion by @dweindl much better then what we have right now.

observableId observableFormula observablePlaceholders
obs1 scale * X + offset scale;offset

@dweindl
Copy link
Member
dweindl commented Apr 25, 2025

@FFroehlich @fbergmann @sebapersson : Do you have any thoughts on this? Keep it as it was in v1? Change? (How?)

@sebapersson
Copy link
Contributor

I definitely like the suggestion by @dweindl much better then what we have right now.

I agree on this. This is something that should be straightforward to parse for an importer.

@dweindl
Copy link
Member
dweindl commented May 5, 2025

Follow-up question: would the placeholder parameter IDs have to be globally unique or would the following be allowed (scale & offset used as placeholders in different observables)?

observableId observableFormula observablePlaceholders
obs1 scale * X + offset scale;offset
obs2 scale * Y + offset scale;offset

At first glance, it might be confusing that obs1 and obs2 seemingly use the same scale & offset parameters (which in fact are completely independent). However, this is pretty much the same thing as with experiment/period-specific changes. So I'd say this should be allowed and implementations will have to take care of handling their namespaces properly.

@fbergmann
Copy link
Contributor

@dweindl Just for clarification, if I wanted to use other model elements in the observableFormula, that would be fine, right? And only if tokens from the observableFormula are not in the model, they would be sought in the placeholders column. Right?

As for uniqueness, in v1, things were unique by default, since we forced the suffix. As such, even without reading and memorising the spec, you would see the parameters belonging together. in V2, looking at the example above, I would then have to go to read / remember the spec, to recall that scale would always be different for obs1 and ob2 if used as in the example.

@dweindl
Copy link
Member
dweindl commented May 5, 2025

if I wanted to use other model elements in the observableFormula, that would be fine, right?

Yes, anything from the model file or the parameters table could be used there.

And only if tokens from the observableFormula are not in the model, they would be sought in the placeholders column. Right?

I would put it the other way around: If there is anything specified in the placeholders column, then the values of those entities needs to be retrieved from the measurement table row for the the current timepoint/experiment/observable.

As for uniqueness, in v1, things were unique by default, since we forced the suffix. As such, even without reading and memorising the spec, you would see the parameters belonging together. in V2, looking at the example above, I would then have to go to read / remember the spec, to recall that scale would always be different for obs1 and ob2 if used as in the example.

Yes and no. I think in terms of uniqueness it's similar to different simulation conditions (v1) or different experiments (v2), where the same parameter may take different values. Those overrides from the measurement table are kind of micro-conditions. The condition-specific parameters aren't evident from the model. However, here, the placeholder parameters are specified as such just next to where they are used. So I find it even more explicit here.

I don't have a very strong opinion on this matter. If it makes life significantly easier, I am also fine with requiring globally unique IDs.

Just be explicit about the example above: That would assume/require that scale and offset aren't defined in the model, parameters table, ...

@dilpath
Copy link
Member
dilpath commented May 5, 2025

If we separate the IDs into two groups "PEtab IDs" and "placeholder IDs" (IDs in the observablePlaceholders column), there should be no overlap between the two groups (edit: as you write above...). Apart from that, I am fine with duplicates in the "placeholder IDs" group.

But I prefer globally unique IDs, because I don't think it's completely correct to view overrides as micro-conditions. In PEtab conditions, the parameter that gets overridden with different values always has the same meaning: "this is the rate constant for reaction R". But in this case, we cannot say "this is the offset for observable O", rather just "this is the offset ID for all observables". The offset for each observable seems different conceptually, for example, each offset will have different units.

@dweindl
Copy link
Member
dweindl commented May 5, 2025

I don't think it's completely correct to view overrides as micro-conditions

Fair. I mixed the measurement-specific parameters in general with the re-use of parameter IDs. It only applies to the former.

@matthiaskoenig
Copy link
Contributor

I would prefer globally unique ids. This makes things often easier in the implementation and avoids confusion (at the cost of having to make the ids unique). For instance storing things in hash maps is so much easier with unique ids instead of having to create your own unique ids for objects.
Also one wants to display/use the results of the respective scale and offset. Latest here a disambiguation is required. So why not make them unique in the beginning.

@FFroehlich
Copy link
Collaborator Author

What's the point of having globally unique placeholders? Apart from time-point specific overrides (which is rare, as far as I know) everything else could be achieved by overriding scales/offsets in a condition specific manner, right?

@dweindl
Copy link
Member
dweindl commented May 8, 2025

What's the point of having globally unique placeholders?

It might be more convenient for annotations (e.g., units).

Apart from time-point specific overrides (which is rare, as far as I know) everything else could be achieved by overriding scales/offsets in a condition specific manner, right?

Yes, timepoint-specific output parameters are the main reason for those placeholders. Are you suggesting to drop that completely?

@FFroehlich
Copy link
Collaborator Author
FFroehlich commented May 10, 2025

What's the point of having globally unique placeholders?

It might be more convenient for annotations (e.g., units).

Fair.

Apart from time-point specific overrides (which is rare, as far as I know) everything else could be achieved by overriding scales/offsets in a condition specific manner, right?

Yes, timepoint-specific output parameters are the main reason for those placeholders. Are you suggesting to drop that completely?

Not sure, perhaps. With the new conditions/experiments spec, you could just take the "micro-condition" approach and add a ton of conditions that update the respective parameters, so there is not really any new functionality that we are supporting with this. Thus, we should be mainly concerned about how much this simplifies problem formulation and facilitates conceptual understanding of the spec. Introducing additional columns and introducing constraints about global uniqueness seem more geared towards formal correctness/ease of implementation rather than making it easier for the user.

@matthiaskoenig
Copy link
Contributor

I agree, we should keep this as simple as possible for the user.
But if this is some functionality which is rarely used and already possible via the conditions than we probably don't need this additional information. But I am not an expert in this and have never used it.

@dweindl
Copy link
Member
dweindl commented May 12, 2025

So far, I am more in favor of keeping this in some form.

It's true that some of the current uses could be replaced by condition-specifc parameters and that the remaining use cases don't seem to be super frequent, but nevertheless, they are there.

I also find it preferable to have the measurement-specific parameter changes specified together with the measurements, but that's just a matter of taste 8000 (cohesion vs. conciseness), I guess.

My feeling is that the simplification of the standard by removing placeholders is not that significant. If a user doesn't need it, they can just ignore it. There isn't much coupling to other parts of PEtab. The same for tool developers - they can just choose not support it (initially). (Having explicit mark up for those placeholders as proposed above would make it easier to recognize the use of that feature).

@matthiaskoenig
Copy link
Contributor

@dweindl perfect. I have no preference either way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants
0