8000 [wip] Revamp codegen by mzeitlin11 · Pull Request #314 · arlyon/async-stripe · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[wip] Revamp codegen #314

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 27 commits into from
Closed

[wip] Revamp codegen #314

wants to merge 27 commits into from

Conversation

mzeitlin11
Copy link
Collaborator
@mzeitlin11 mzeitlin11 commented Jan 4, 2023

Built(ish) on #311. This initially started as an attempt to better types involved in the codegen, but turned into more of a codegen refactor to handle (almost) all of the OpenAPI spec.

Also, all the ext files are no longer needed!

Left to do in this PR

  • Add back handling for bespoke prefixed ids. These ended up presenting some complexity since they are updated manually, so don't fit as nicely into the more automated codegen flow.
  • Then can add back traits for pagination handling
  • Add back constructor methods (new) for parameters, figure out how to handle lifetimes / borrowed data without making type definitions more complex
  • Document everything, make any edge cases clear
  • Currently ignoring Delete requests with body params. Instead of the currently separate delete and delete_query methods, would be nice to combine the delete / post functions into some builder style system so that delete and post can be handled identically.
  • Making sure doc comments are picked up where possible
  • Add tests for different categories of requests we generate that weren't previously covered
  • Looking into file organization (also relevant to questions below). Right now we generate a single file per top-level component in the spec because it makes dependency tracking straightforward (dependencies only introduced through refs in the schema). However, because lots of these components have long names, we start generating items with very long names. I hope that the x-stripeResource can be used to organize around a structure where inner_classes can just be in a module with their parent, making the structure much clearer (along with crate-splitting / features). For example, account and friends could become
account/
    mod.rs
    business_profile.rs
    future_requirements.rs
    ...

instead of the current

account.rs
account_business_profile.rs
account_future_requirements.rs
...

. We could then avoid exporting everything from resources and have some namespacing.

  • Solve open questions below!

Open Questions

Handling Breaking Changes

  • There will be many breaking changes in this PR. After fixing the TODOs above these should be limited to naming / import paths / duplicate struct handling. Not sure of a way to avoid breaking changes without adding a bunch of manual overrides to the codegen since the more consistent naming scheme (to fix Determine a way to unambiguously re-export types without silent conflicts. #154) changes lots of paths.
  • What is the best way to handle breaking changes? This will affect how to handle the other questions mentioned below since it might be best to break everything once all at once instead of multiple breaking changes, but that will make it harder to split up changes into smaller PR's.

Features / crate-splitting

  • Since we now generate everything, crate-splitting + features is more important than ever for compile-time sanity.
  • If we want to break as little as possible, could just try to add back the current feature set.
  • However, this might be a good time to work on feature discovery / crate-splitting because the changes make it a lot easier because the new codegen process includes dependency generation across components. There is now a --graph argument to the codegen which writes a graphviz-compatible outputs of the component dependency graph instead of generating code (but that will hopefully become much clearer if the file organization mentioned in TODOs is implemented. Right now it includes all components so is very hard to learn from).

My hope is that we could define features along the lines of feature x includes objects account, customer, feature y includes component balance, .... Dependency tracking would then enable understanding which features necessarily enable other features to fix #137. With a split into crates, this would make clear what cross-crate deps must need to be.

Hopefully, circular dependencies could be handled by just mandating that some specific items live in some crate stripe-core.

Improve handling of unions

With respect to the compile time pains mentioned above, another solution is to continue exploring usage of something like miniserde to improve compile times. Right now the unions are the main deterrent to trying this since they are not supported. Currently, we used serde(untagged) for these, but I think for most of these unions of object cases we could use the object field as a discriminant which hopefully allows serde to write more efficient (and smaller) implementations with better error messages. On the miniserde front, this would allow a reasonable automated generation of a deserialize impl through a structure like

match object {
   "account" -> deserialize account type,
   "customer" -> deserialize customer type,
}

When writing other unions, we often end up with a structure like

pub enum TrialEnd {
    Now(TrialEndNow),
    Timestamp(Timestamp)
}

pub enum TrialEndNow {
    Now
}

This happens because we only generate simple enums without fields or enums with fields. It would be great to simplify this representation into

pub enum TrialEnd {
    Now,
    Timestamp(Timestamp)
}

which would generate less code (hopefully help with compile time on its own) and make implementing Deserialize for unions easier

The other common union pattern that appears in the OpenAPI spec is a union of something like Account and DeletedAccount (which is returned by deleting account and some fetching methods). This was previously handled by adding a deleted field to top-level components with a Deleted counterpart and hardcoding the return value of a DELETE request as a Deleted<> struct parameterized by an id type. This doesn't seem to match the spec however - an Account should never have a deleted field and the return value of a deleting an account can also be the Account itself, the Deleted<> loses a bunch of information and forces hardcoding the request return value. However the automatically generated representation

pub enum AccountMaybeDeleted {
    Account(Account)
    Deleted(DeletedAccount)
}

is a breaking change. Deserialization could be done easily by using presence of the deleted: true field, but having to match on variants

Naming request functions

Naming requests is now harder since handling everything means fewer assumptions can be made. For now, this PR just names based on the request "operationId" field since that ensures unique names, but it gives pretty verbose / unexplanatory names. Not sure of best solutions here, best ideas I have are to perhaps just have some simple heuristic-based naming for common patterns (create / delete, etc.) then maybe just manually provide name mappings for other requests? (a pain, but there really aren't that many requests).

Handling duplicate generated content (but different name)

The OpenApi schema can only avoid duplicating content by using reference links. Especially in the case of request parameters, this is not done.

For example, the enum specifying webhook enabled events (which is huge) is specified in 2 separate parameters. The content is identical, but since they are not linked in the spec, they are created for each parameter with a different name following the spec. This was a problem as well before, but some cases were explicitly handled by adding a manual ext file and a mapping of field name / object to that specific type name. This works, but doesn't generalize and has the added issue that manually written types can fall out of date easily (for example the ApiVersion enum).

Since this PR introduces strict definitions of the content of types we generate, we can detect duplicates like these. The question becomes

  • Do we try to deduplicate as much as possible or only if the objects are large / repeated a ton?
  • How do we determine where deduplicated types live? Within in the same file this is easy, but not across files. Not sure if there are many cases of those.
  • What happens to the doc comment for the deduplicated type?
  • What happens to the name of the type?
    The main benefit of deduplicating would be (hopefully generating enough less code to improve compile time) and giving a better user experience. However, deduplicating a generated type will usually be a breaking change since the paths will change

(more questions/edits coming soon :))

@arlyon
Copy link
Owner
arlyon commented Jan 16, 2023

Wow! It's taken me a few days to work through this (and likely will a few more...) but I am really looking forward to getting this in.

@databasedav
Copy link
databasedav commented May 13, 2023

because it's not defined correctly in stripe's own spec, the code generation doesn't correctly define stuff like https://stripe.com/docs/api/prices/create#create_price-currency_options, which should be Option<HashMap<crate::currency::Currency, crate::generated::CurrencyOption>>

edit: also cashapp is missing from the payment method enums, which i also suspect is a problem on stripe's end

edit: sorry about spamming edits, just want to document whatever issues i find; stripe::generated::price::get_prices will throw

error serializing or deserializing a request: data[0].product invalid type: string, expected a sequence at line 18 column 17

because Price.products is expected to be a Vec<crate::generated::Product> when the old type seems correct, i'm not sure if this is a broader issue with Expandable, also passing expand: Some(vec!["data.product".to_string()]) just changes the invalid type to map, since it expects a vec

edit: it does seem to be something generally wrong with the Expandable fields, the same vec problem is present with CheckoutSession.{customer, payment_intent, etc.}

@databasedav
Copy link

@mzeitlin11 hi i'm interested in filling out the Expandable generation here, can u give me some high level guidance on what needs to be done?

@mzeitlin11
Copy link
Collaborator Author

@mzeitlin11 hi i'm interested in filling out the Expandable generation here, can u give me some high level guidance on what needs to be done?

Hi @databasedav, good timing, was just hoping to start getting back to this soon! I need to refamiliarize myself with what's going, rebase, and clean stuff up. Will probably be hard to work off this branch since lots of changes still required that will determine how Expandable (and some related missing pieces) best handled. Will let you know when this ends up in a state where tasks like implementing Expandable can reasonably be split out.

Wip

Revert gen changes

wip

Revert gen changes

Revert gen changes

REF: split out code writing more

Add back generated files

Keep boolean params out of constructor

[wip] Revamp codegen

Wip

[wip] Revamp codegen

Remove files to ignore

Remove more accidently added

Remove unused deleted

Remove redundant prelude

[skip ci] Remove some unused codegen

wip

add pre rebase
@mzeitlin11
Copy link
Collaborator Author

Status Update:

Main outstanding TODO's:

Crates / Features

  • Code generation now organizes based on x-stripeResource, so there is now some natural parent / children structure between different schema components, making this easier. This also keeps the import structure clearer and the types shorter since they are namespaced within modules.
  • When deciding crates, more structure could be extracted from the request paths, e.g. we can automatically determine that external_account should placed with account since it only requests nested paths starting with account.
  • On a related note, webhook handling is not implemented yet, since the crate structure will affect implementation here (since the webhook event will mess with dependencies unless feature gating is used).

Pagination

  • Pseudo-pagination support is pretty much in place, but requires some hacks around inferring which id fields should be used as cursors.
  • Right now we implement Paginable for the parameters to a request, e.g. ListAccount. However, this means that we need to implement Paginable for a bunch of different List* params, where what really matters is the return value, not the parameter type.
  • It seems more natural to implement Paginable for List<T> where T: Object. This would automatically handle paginable cases we currently don't handle, for example retrieving an Account comes with a List<ExternalAccount> that should be paginable.

Miniserde

  • Right now partial miniserde work is included under a min-ser flag. Some needed codegen improvements are still left as todo!() for generating Deserialize implementations (derives are impossible for cases like the enums with fields since the technique of using object key as a discriminant doesn't work naturally with the miniserde approach).

Deduplication

  • The lowest hanging fruit here is taking advantage of many parameter types having a title field. Within a single request file, it should be possible to deduplicate groups of parameters that have the same title field and an equivalent inferred type.
  • Should any further deduplication schemes be used to deduplicate across files?
  • Need to add ability to designate where certain types should be generated to. This could be used to deduplicate very common types and for something like APIVersion which is generated as a parameter to a single webhook request, but we also want to be importable from the top level so it can be used in the client.

@arlyon
Copy link
Owner
arlyon commented Jun 19, 2023

Hey! Thanks for continuing on this. I am back from holidays today 🏄 and plan on addressing all your questions with some ideas.

@arlyon
Copy link
Owner
arlyon commented Jun 20, 2023

Ok! I will have comments about the openapi crate as well soon...

Crates

I am going to find some time to update my code-splitting PR and see if I can apply the same analysis to this PR and see if there are any clean boundaries between modules.

Pagination

hacks around inferring which id fields should be used as cursors

I think this is OK, we have enough hacks in the previous impl so I don't mind leaving this and coming back to it later if necessary since the scope of the 'damage' is probably quite limited.

It seems more natural to implement Paginable for List where T: Object

That makes sense to me. Happy to brainstorm implementations if you want.

Miniserde

Great! Also happy to help prototype on this. My intuition says it will be fairly straight forward to just spit out a manual trait impl for enums.

Dedupe

There are a few interesting parts here, but I think this should be tackled later. Some points to consider:

  • it will probably depend on how (if?) we split packages; we may need to extract duplicate items into a separate crate, but can't have crate cycles so duplicates with cycles must stay within their crate...
  • we can probably hash the field names and types to find duplicates fairly easily
  • it would be interesting to find out why some fields are repeated rather than exposed in the schema..

@mzeitlin11
Copy link
Collaborator Author
mzeitlin11 commented Jun 20, 2023

Crates

I am going to find some time to update my code-splitting PR and see if I can apply the same analysis to this PR and see if there are any clean boundaries between modules.

Sounds great! Will push up an initial implementation of this soonish (hopefully), but that will be more about setting up the structure of having the codegen understand different crates and crate deps than finding smart module boundaries. Initial efforts gave some annoying cycles like in your PR, due to cycles from cases like

pub enum BalanceTransactionSource {
...
    IssuingAuthorization(crate::issuing::authorization::Authorization),
    IssuingDispute(crate::issuing::dispute::Dispute),
    IssuingTransaction(crate::issuing::transaction::Transaction),
...and many more
}

combined with

pub struct Authorization {
...
    /// List of balance transactions associated with this authorization.
    pub balance_transactions: Vec<crate::balance_transaction::BalanceTransaction>,
...
}

Without ending up with just a giant core type crate to break the cycle, the other solution I can think of would be to define the shared base module types in some core crate, but keep the requests in separate crates. e.g. in this case BalanceTransactionSource and issuing::authorization::Authorization would live in core, but the issuing related requests would live in their own crate (since most of the code bloat comes from the requests).

The annoyance with that (other than more codegen complexity) is that right now requests are namespaced as

impl issuing::authorization::Authorization {

}

which would have to change if issuing::authorization::Authorization lives in a different crate

EDIT: this kind of approach would also have the benefit of solving the webhook deserialization problem by keeping all the event types in the same crate so webhooks could be enabled without requiring everything else

@arlyon
Copy link
Owner
arlyon commented Jun 23, 2023

Nice. There is of course also the possibility of not splitting at all, since the code size with miniserde is so so much better, but let's give it a shot and see if we can find something that works.

@mzeitlin11
Copy link
Collaborator Author

Added some code messing around with type deduplication within files. Definitely needs cleanup / docs (as does everything else), but wanted to include a POC since it influences compile time. Need to finish off miniserde stuff to then see what compile time looks like and whether better crate splitting is necessary. (probably requiring an idea like mentioned in the bottom of #314 (comment))

Could also introduce granular features to help compile time pretty easily (since we have ideas of dependencies and features could just gate the related mod requests). e.g. "apple_pay_domain" would be a feature to enable just those specific requests. Only concern is that too many granular features makes library less usable since need to look through many features to find what needs to be enabled (and probably need to autogenerate some good feature table for the docs).

Related to #314 (comment), cc @arlyon if you have any thoughts about the stripe::Client - I know you've had some ideas about changing things with an MSRV that allows GAT's. Depending on the structure of that, might affect how we want to generate requests. But regardless of that, curious of your thoughts on current pattern of

Account::create(&client, param_struct) vs. more of a builder pattern like
CreateAccount::new(required_params).param1(value).param2(value).send(&client), more like how https://github.com/awslabs/aws-sdk-rust does things. Advantages of the former seem to be that it's more consistent with stripe libs in other languages, but the main annoyance is that it forces the Account requests to be defined in the same crate as the Account type signature, making crate splitting harder. Builder pattern also helps with something like #380

@adamchalmers
Copy link
adamchalmers commented Jul 4, 2023

Hi there! My company uses async-stripe in our backend. We're only using a small fraction of the Stripe API. So, I'd be happy to test out migrating breaking changes for a realistic use-case (I suspect there's a fair number of other users who also only use a small part of the API).

IMO, breaking changes are OK as long as there's a good migration guide. I think Clap 4.0 had a great migration guide as did Dropshot 9.0 here -- explaining how to migrate is good, breaking changes have to happen eventually. And especially if there's a good reason to make them -- I think many users would be thrilled by the compile time improvements, even if it requires some manual migrations.

If this PR is in an OK state, I'd be happy to start a WIP branch on my work's backend and report what my migration experience was like.

@arlyon
Copy link
Owner
arlyon commented Jul 6, 2023

@adamchalmers thanks for volunteering to help out. I think the general structure of the codegen is mostly there. We (well, @mzeitlin11) may move some things around internally to try to break the code up better and reduce the size of the compilation units and, as you saw, there is still the open question about which syntax to use for generating the requests.

If you are OK with these changes down the line then go ahead. Some feedback from an integration branch would be great.

Speaking of which @mzeitlin11 do you think it would be handy to schedule a synchronous call soon to hash out some of the remaining items? You can book something here https://usemotion.com/meet/alexander-lyon/meeting

@adamchalmers
Copy link

Started work on it. One note so far is that this PR requires time-core <= 0.1.0 due to MSRV.

https://github.com/mzeitlin11/async-stripe/blob/types/async-stripe/Cargo.toml#L110-L111

But the time crate as of version 0.3.16 has required time-core = 0.1.1, so to integrate this branch, I've had to downgrade my project to time 0.3.16 from the latest version (0.3.22) and ensure it stays there.

I'd suggest allowing users to use time-core 0.1.x in the final PR.

@mzeitlin11
Copy link
Collaborator Author
mzeitlin11 commented Aug 1, 2023

https://usemotion.com/meet/alexander-lyon/meeting

Booked time for next week!

Crate splitting is now implemented with this initial structure:
out_crate

Compile Time Notes

stripe_types is a special case which contains base types like Currency, id generation, and all the core types which must be included to avoid circular dependencies. This approach helps a ton with compilation pipelining, because stripe_types has no dependency on the client, so can start compiling as soon as serde / serde_json are ready. Having no requests in stripe_types adds the complexity that, for example, a type like Account lives in stripe_types, but the requests related to Account live in stripe_connect. This should be able to made opaque to the user, however, by just re-exporting stripe_types::Account in stripe_connect. In theory, stripe_types should never have to be a user dependency at all.

The majority of the code (so largest compile time costs) then lives in the other generated crates since the request-related structs are the majority of what we generate. Since requests don't depend on each other, feature-gating different API resources should be straightforward, without having to worry about incompatible feature sets. Once those request-related features are added, the compile time should shrink toward how long stripe_types takes to compile since most users use only a small bit of the giant API.

TODOS

  • Latest commits were working and smoke tests passing, but unfortunately updated the OpenAPI version and suddenly the spec fields inner_classes and in_class (used a bunch for crate inference and organization) were gone :(. So some more work needed to get this back into functional shape for the latest spec version.
  • Add request features per stripe resource.
  • Clean up some codegen, e.g. we don't actually need to expose types like AccountObject which are single-variant enums only used as discriminants with serde(untagged). (ideally these object names would allow using tagged deserialization as well).
  • The stripe client can now import ApiError from stripe_types, so this type can be used instead of the manual version, ideally fixing API error caused serialization issue #384
  • Any changes to address the open questions below
  • Figure out how to integrate webhook event deserialization without requiring the webhook feature to bring in the entire library (maybe need to feature-gate the event types?)
  • Add back list pagination.

Open Questions

  • Requests could be made more ergonomic. Right now creating an account looks like stripe_connect::account::create(&client, CreateAccount::new()) - requiring both the import of the function create and the CreateAccount parameters. It feels more user-friendly to instead have the API call be something like CreateAccount::new().send(&client). (implemented in latest commit)
  • Right now users need to include whichever crates (e.g. stripe_connect, stripe_payment) in which there are requests they want to use. It would be more complicated on the codegen side, but the main async-stripe crate could instead reexport these crates as necessary based on features, so users only need to depend on async-stripe.

Miniserde Notes

  • To shrink the scope of this pr and simplify the codegen changes, miniserde stuff was ripped out. The main issue I ran into is that untagged enums are at odds with how the miniserde Deserialize trait is defined. So initial implementations could use miniserde as a tool for deserializing untagged enums (64eb517#diff-806d0bc9794937ef7af03d15d2438671d3568ed8f3754bcac57ca1c9bc7c71afL12-L39), but not actually impl the miniserde::Deserialize trait, which meant that any struct including an untagged enum could not derive miniserde::Deserialize. So further work could probably figure out solutions, but there would likely need to be significant codegen changes where the code generation has to generate deserialization code, instead of relying on derives in most cases.

@mzeitlin11
Copy link
Collaborator Author

Pagination added back (closes #283, closes #246).

Slightly different organization, now implemented on both List* parameters and an existing List<T>. So instead of

let params = ListCustomer::new();
let stream = Customer::list(&client, &params).await.unwrap().paginate(params).stream(&client);

can do

let stream = ListCustomer::new().paginate().stream(&client);

And can also paginate any List<T> received, e.g.

let account = RetrieveAccount::new().send(&client).await.unwrap().
// We have one page of `external_accounts`, get the rest:
let external_account_stream = account.external_accounts.into_paginator().stream(&client);

@mzeitlin11
Copy link
Collaborator Author

Added back webhook handling in new crate stripe_webhook. Adding another crate annoying in some ways, but issue with keeping in async-stripe is that dependencies look like stripe_webhook -> generated/* -> async-stripe, so adding webhook handling in async-stripe would create circular dependencies since the client cannot depend on the generated types crates.

Also migrated the examples into standalone example binaries in examples/. Those also cannot live in async-stripe anymore since that crate essentially only contains client functionality now, but knows nothing about specific requests.

This is readyish for review / testing (With caveat that these changes require much more testing / docs before being ready for production. @adamchalmers any feedback on issues / annoyances using for real-world use case would be super helpful).

I'd still plan to make more smaller changes around naming + deduplicating some generated types (one example right now is recognizing shared types between component definitions and requests, e.g. AccountType and CreateAccountType should be deduplicated. And reexporting types from stripe_types should still be done so users don't always have to rely on IDE / docs to figure out whether a type like Account lives in stripe_types or the relevant crate (stripe_connect).

Also if there are any opinions on MSRV - once that's settled can remove the time-core pin (and hopefully other pins) and start getting CI running on these changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants
0