8000 [SYNPY-1599]: add json schema mixin class by linglp · Pull Request #1205 · Sage-Bionetworks/synapsePythonClient · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[SYNPY-1599]: add json schema mixin class #1205

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 33 commits into
base: develop
Choose a base branch
from
Draft

Conversation

linglp
Copy link
Contributor
@linglp linglp commented Jun 4, 2025

Problem:

  • We currently do not have an easy interface to bind, validate, and unbind jsonschemas.
  • We currently don’t have a function that handles asynchronous POST requests with pagination.

Solution:

  • Based on the documentation here, created BaseJsonSchema mixin class for project, folder, table, entity view, file
  • Created ContainerEntityJsonSchema mixin class for project and folder
  • Add a function in api_client.py to handle asynchronous POST requests with pagination
  • Modified async_to_sync decorator so that it could handle async generator.

Testing:

  • Add integration tests.


async def get_json_schema_from_entity_async(
self, *, synapse_client: Optional["Synapse"] = None
) -> Dict[str, Union[str, int, bool]]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For any of the return types of these functions try to convert them into a python dataclass with the snake_case format for attributes. That way someone that calls the function would be able to do:

(await my_folder.get_json_schema_from_entity_async()).json_schema_version_info.organization_id

We want to be very clear for users of the python client what fields are possible to access when they call these methods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion! I will make the update.

Copy link
Contributor Author
@linglp linglp Jun 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this look correct?

@dataclass
class JSONSchemaDerivedKeysResponse:
    keys: List[str]

async def get_json_schema_derived_keys_async(
        self, *, synapse_client: Optional["Synapse"] = None
    ) -> JSONSchemaDerivedKeysResponse:
        """Retrieve derived JSON schema keys for a given Synapse entity.

        Args:
            synapse_id (str): The Synapse ID of the entity for which to retrieve derived keys.

        Returns:
            JSONSchemaDerivedKeysResponse: An object containing the derived keys for the entity.
        """
        return JSONSchemaDerivedKeysResponse(**await get_json_schema_derived_keys(
            synapse_id=self.id, synapse_client=synapse_client
        ))

And when I call the function, I will do something like:

        response = await stored_folder.get_json_schema_derived_keys_async(
            synapse_client=self.syn
        )
       assert set(response.keys) == {"productId", "productName"}

instead of "response["keys"]"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that looks correct (Any the same pattern can be applied to other method calls too). The only thing to note: #1205 (comment)

It's probably not the best example that this API only has a single field: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/annotation/v2/Keys.html - However, by returning a dataclass object we are future proofing this method call to support adding additional fields (If Synapse ever does) to the return method without breaking existing code, since it would be NEW data added to the existing object.

synapse_id: str,
json_schema_uri: str,
*,
enable_derived_annos: bool = False,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set this as an optional parameter. This parameter should probably also be added to the sync function

Comment on lines 45 to 50
objectId: str # Synapse ID of the object (e.g., "syn12345678")
objectType: str # Type of the object (e.g., "entity")
objectEtag: str # ETag of the object at the time of validation
schema_id: str # Renamed from "schema$id" to a valid Python identifier
isValid: bool # True if the object content conforms to the schema
validatedOn: str # ISO 8601 timestamp of when validation occurred
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attributes on a dataclass can have a docstring. It's formatted like:

schema_id: str
"""
Renamed from "schema$id" to a valid Python identifier
"""

By formatting this as a docstring it will help it to show in the mkdocs pages when we surface these for the API reference pages.

@@ -113,7 +114,16 @@ def newmethod(self, *args, **kwargs):

async def wrapper(*args, **kwargs):
"""Wrapper for the function to be called in an async context."""
return await getattr(self, async_method_name)(*args, **kwargs)
result = getattr(self, async_method_name)(*args, **kwargs)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the code here so that it could handle async generator.


id: Optional[str] = None

async def bind_json_schema_to_entity_async(
Copy link
Member
@thomasyu888 thomasyu888 Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is ready or review, but I think the _to, _from entity is redundant in these mixins.

Similar to how there is set_permission and other mixins, since you are doing Entity.bind_json_schema, it makes it obvious that this is doing an operation on the entity.

To be explicit, this naming is good for the services (imo). I'll defer to @BryanFauble for his thoughts here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, Tom! I am thinking how about:

bind_schema_async
get_schema_async
delete_schema_async
validate_schema_async
get_schema_derived_keys_async
get_schema_validation_statistics_async
get_invalid_validation_async

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@linglp I think that is an excellent suggestion for naming the functions! That way we would also have the non-async functions like

bind_schema
get_schema
delete_schema
validate_schema
get_schema_derived_keys
get_schema_validation_statistics
get_invalid_validation

@@ -686,14 +686,20 @@ def json_schema_validation(self, json_schema_uri: str):
return response

@authentication_required
def bind_json_schema_to_entity(self, synapse_id: str, json_schema_uri: str):
def bind_json_schema_to_entity(
Copy link
Member
@thomasyu888 thomasyu888 Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire module will be replaced by the new OOP Model so you may not need to add to this unless you are using this particular function

I just wanted to make sure any new functions that were written included the derived annotation functionality


if TYPE_CHECKING:
from synapseclient import Synapse
from synapseclient.models.mixins import (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking that there aren't any circular dependencies here.

Copy link
Member
@thomasyu888 thomasyu888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great progress - I know this is in draft mode, and i think most of this is there, but I would add a tutorial page on the jsonschema services from an oop perspective, and then add examples In the docstrings.

Work with @BryanFauble on where those should be added.

Comment on lines 12 to +15
7. Copying a file
8. Storing an activity to a file
9. Retrieve an activity from a file
10. Bind a JSON schema to files and validate its contents
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize that I did not catch this earlier, but these files will eventually be removed in favor of the tutorial pages instead. The intention of these files was to give the very first iteration on what tutorial scripts could look like.

Take a look at this confluence page that covers tutorial creation: https://sagebionetworks.jira.com/wiki/spaces/SYNPY/pages/3864690727/SYNPY-1549+Python+Client+New+Service+Implementation+SOP#Tutorials

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0