8000 Added visit detail entity by ssuvorov-fls · Pull Request #174 · OHDSI/circe-be · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Added visit detail entity #174

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
Sep 23, 2022
Merged

Added visit detail entity #174

merged 27 commits into from
Sep 23, 2022

Conversation

ssuvorov-fls
Copy link
Contributor
@ssuvorov-fls ssuvorov-fls commented Aug 15, 2022

@chrisknoll, hi
Could you also check is sql query correct for visit detail criteria?
visit_detail.txt

@chrisknoll
Copy link
Collaborator

@ssuvorov-fls I'm not sure if you are attaching the 'sql rendered' sql or if you are sending me the translated form from one of the databases. Can you confirm if this is 'raw' sql or translated sql?

If it is 'raw' from the sql builder, then it's incorrect because we shouldn't see things like 'interval 1 day' in the output. We should see MicrosoftSql dialect, meaning, your date-from-string should look like 'DATEFROMPARTS()' and your adding dates should be DATEADD() and difference between dates should be 'DATEDIFF()'. You should be able to look at the existing criteria templates for examples of the sql functions and how they are used.

…entity

# Conflicts:
#	src/test/java/org/ohdsi/circe/cohortdefinition/CohortGeneration_5_0_0_Test.java
@ssuvorov-fls
Copy link
Contributor Author

@ssuvorov-fls I'm not sure if you are attaching the 'sql rendered' sql or if you are sending me the translated form from one of the databases. Can you confirm if this is 'raw' sql or translated sql?

If it is 'raw' from the sql builder, then it's incorrect because we shouldn't see things like 'interval 1 day' in the output. We should see MicrosoftSql dialect, meaning, your date-from-string should look like 'DATEFROMPARTS()' and your adding dates should be DATEADD() and difference between dates should be 'DATEDIFF()'. You should be able to look at the existing criteria templates for examples of the sql functions and how they are used.

I changed to raw version

@chrisknoll
Copy link
Collaborator

Strange formatting in your output, not sure if you formatted this or if there's another issue:

Example 1: defining codeset temp table (other temp table creation has same strange formatting):

CREATE TABLE #Codesets
(
    codeset_id
    int
    NOT
    NULL,
    concept_id
    bigint
    NOT
    NULL
)
;

Example 2: DateFromParts:

                               WHERE C.visit_start_date
                                   < DATEFROMPARTS(2022
                                   , 1
                                   , 15)
                                 AND C.visit_end_date
                                   < DATEFROMPARTS(2022
                                   , 8
                                   , 15)
                                 AND C.visit_type_concept_id in (44787754)
                                 AND DATEDIFF(d
                                   , C.visit_start_date
                                   , C.visit_end_date)
                                   > 10

Other style notes:
When doing a join, I like to put it in this form (which may not be consistent everywhere, but I try to catch it where I can:

                               from (
                                        select vd.*,
                                               row_number()
                                               over (PARTITION BY vd.person_id ORDER BY vd.visit_detail_start_date, vd.visit_detail_id) as ordinal
                                        FROM @cdm_database_schema.VISIT_DETAIL vd
                                    ) C
                                        JOIN @cdm_database_schema.PERSON P
                               on C.person_id = P.person_id
                                   JOIN @cdm_database_schema.CARE_SITE CS on C.care_site_id = CS.care_site_id

When doing this join, we want to put the ON clause on the same line as the JOIN, but any additional AND clauses to the join can be either in-line or indented on 8000 the next line. Example:

FROM TABLE_1 T1
JOIN TABLE_2 T2 ON T1.col1 = T2.col2
  and T1.col3 = T2.col3

As far as the actual logic of the query, I'm going to leave that to the unit tests to confirm the proper implementation, but I think we already have a test that will check the syntax o the query, but will need tests to test the individual criteria attributes are pulling the correct rows from the test set.

@ssuvorov-fls
Copy link
Contributor Author

It seems that my ide autoformatting breaks sql code
I changed it the version that I get from atlas

@ssuvorov-fls
Copy link
Contributor Author

@chrisknoll
Hi.
Could you help with PrintFriendlyTest.nullCohortTest?
There's a strange error during this test and I can't figure out what is the problem

@@ -25,14 +26,17 @@ public static String getCodesetJoinExpression(Integer standardCodesetId, String
String joinExpression = "";
ArrayList<String> codesetClauses = new ArrayList<>();
String excludeExpression = exclude ? "<>" : "=";
String aliasPostfix = StringUtils.replaceAll(UUID.randomUUID().toString(), "-", "");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're creating this in order to support multiple joins to the #codeset table, correct? I was thinking you could do an 'in clause instead of adding joins. As in:

WHERE  specialty_concept_id in (select concept_id from #codeset where codeset_id = @specialtyConceptId)

I think DBMSs can perform the necessary optimization where the in() is optimized to the same work as a join. The reason I like this form is that the criteria are all applied together in the WHERE clause and you don't have it mixed between JOINs and WHEREs.

I'm assuming that you followed the JOIN pattern from how we use the codeset_id when querying the domain tables (like for condition occurrence) but that was because, at the time, it wasn't an optional thing so putting it as part of the 'core' query made sense. But in this case, I think it would 'look' better if we just appended to the WHERE clauses (where concept_id in (select...) unless we can demonstrate a performance impact.

@chrisknoll
Copy link
Collaborator

I am sorry I missed your comment about the nullCohortTest. Let me touch base with you at tomorrow's meeting and you can give me the details about the issue.

@chrisknoll
Copy link
Collaborator

I'm sorry we didn't connect on the call, so I will pull down this branch and see if i can replicate your testing error.

@chrisknoll
Copy link
Collaborator

Side note: please set your editor to 2-space indent on this repo. 2-space indent was the original format, but you'll find cases where it is 4 space indent (mostly from odysseus updates) but I want to prevent that style change from continuing.

@chrisknoll
Copy link
Collaborator

@ssuvorov-fls , I've run the nullCohortTest on my local env on this branch, and it passes.

The nullCohrotTest() is just a null test when trying to call printFriendly.renderCohort() passing a NULL. The expected result is an exception (and this test attempts to cover code where null checking is being done).

What problem were you having with the test?

@chrisknoll
Copy link
Collaborator

Circling back on this: I need to look at the final output of the query when dealing with multiple concept sets in criteria, but if I understand the code properly, when you have multiple concept sets, the join clause is being expanded to make multiple joins back to the conceptset table, for each criteria check. I think the structure of the query looks like this (with 3 concept set criteria in this example):

select ...columns...
from cdm.VISIT_DETAIL vd
Join #codeset {uuid1}  on {uuid1}.concept_id = p.gender_id 
join #codeset {uuid2) on {uuid2}.concept_id = vd.visit_detail_type
join #codeset {uuid3) on {uuid2}.concept_id = PR.specialty_concept_id
WHERE
   {uuid1}.codeset_id = 1  and {uuid2}..codeset_id = 2  and  {uuid3}..codeset_id = 2

Vs:

select ...columns...
from cdm.VISIT_DETAIL vd
WHERE  p.gender_concept_id in (select concept_id from #codeset where codeset_id = 1)
  AND vd.visit_detail_type in (select concept_id from #codeset where codeset_id = 2)
  AND vPR.specialty_concept_id in (select concept_id from #codeset where codeset_id = 3)

What is nice about the second form is that the 'exclusion' form 'where we say "not in" is easy:

select ...columns...
from cdm.VISIT_DETAIL vd
WHERE  p.gender_concept_id not in (select concept_id from #codeset where codeset_id = 1)
  AND vd.visit_detail_type not in (select concept_id from #codeset where codeset_id = 2)
  AND vPR.specialty_concept_id not in (select concept_id from #codeset where codeset_id = 3)

Looking at the code, if you change the = to <>, I think it's an issue that you will get multiple rows returned in the join condition, ie:

Join #codeset {uuid1} on {uuid1}.concept_id <> p.gender_id .... {uuid1}.codeset_id = 1

This will return 1 row for each record that doesn't match the gender_concept_id...which is wrong, we should just return the row if it is true that none of the concepts in the concept set.

Therefore, I believe the path forward is to remove the JOIN approach and use IN specifically because of the way the 'not in' should be handled.

@ssuvorov-fls
Copy link
Contributor Author

Hi @chrisknoll
I fixed sql generation according to your suggestions

@chrisknoll
Copy link
Collaborator
chrisknoll commented Sep 16, 2022

@ssuvorov-fls ,

Thank you for your efforts on this. I was going to attempt to write some test cases to ensure that the in/not-in cases work as expected, but ran into a couple of concerns that I'd like to go over with you to figure out best way forward:

CDM version 5.3 vs. 5.4

VISIT_DETAIL was added in 5.3, but underwent changes in 5.4 Since we don't have a good way to configure the SQL builders to generate different versons of the queries based on CDM version. So, I spoke to @alondhe , and he confirms that he's on 5.3 but understand the troubles of implementing criteria that changes the underlying column names between releases. So, He's authorized that we can drop the changed columns from v5.3 to v5.4 from the VisitDetail criteria. These are:

  • v5.3 -> v5.4
  • Admitting_source_concept_id -> Admitted_from_concept_id
  • Admitting_source_value -> Admitted_from_source_value
  • Discharge_to_concept_id -> Discharged_to_concept_id
  • Discharge_to_source_value -> Discharged_to_source_value
  • Visit_detail_parent_id -> Parent_visit_detail_id

So, if any of those 5 were implemented in VisitDetail, you can remove those from the VisitDetail criteria and the UI in atlas.

Test Cases

I appreciate that you implemented test cases for the VisitDetail class, but, unfortunately, you altered the DDL for the cdm v5.0 when the VisitDetail was introduced in v5.3. In addition, you added test cases to the WindowCriteria_5_0_0_Test.java when the tests related to the v5.3 version should appear in a class dedicated to that version of tests (ie: WindowCriteria_5_3_0_Test.java). You'll notice we have a v5.2 schema for testing, but the only test there is to deal with the new 'condition_status' column. I think we should have a v5.3 schema for adding the VISIT_DETAIL table (ie: create a new file called cdm_v5.3.sql under the test/ddl folder)

It has been a major struggle for me to find a way to cleanly make Circe support multiple CDM versions in the same version of Circe. I do have a mental idea about an approach by creating CDM-versioned CohortExpressionQueryBuilder (that embeds the correct versions of the sql builders for the given CDM version) but I haven't made any progress. I'm still thinking that maybe a version of circe is specific to one version of a CDM, but that may not be possible with multiple CDM sources in WebAPI. I did try to begin the multi-cdm-version support, which is why the test cases were split out that way, so I'd ask that you try to follow that structure. If you think that it's too much effort or complexity, let me know and I'll try to figure out a compromise.

The second issue with testing was testing the 'concept-set based' criteria such has gender, provider specialty, etc. I attempted to work in those kind of tests into the WindowCriteria test cases you wrote, but noticed the above problem (that it was added to the CDM 5.0 set of tests) but also there isn't an easy way to add concept set expression #codeset temp table. But before delving into the details about a solution, I just wanted to get your feedback about structuring the test cases to align with the CDM version they are for.

Those are the remaining two issues, but they are fairly large: the v5.3 vs. 5.4 changes can be handled by not support either the 5.3 or 5.4 columns in circe, and at a later time we can add those changed columns if there is a demand for it (but only at the version 5.4 level, not supporting both v5.3 and 5.4 at the same time). The other issue on testing is something I do feel rather strongly about: we want to make sure that the codeset-based criteria works as expected, especially using the in vs. not-in options.

@chrisknoll
Copy link
Collaborator

I have pushed up a commit to re-organize the tests into cdm v5.3 groups. I would like to add a couple of tests to check the in/not-in behavior of concept-set-based criteria attributes (provider, gender, etc). I'll push up another commit for that tomorrow.

If you could remove the unused criteria attributes from the object model and Atlas UI, I'll add the new tests, and that should get this done for a new release that we can push out for WebAPI 2.12.

ssuvorov-fls and others added 2 commits September 19, 2022 13:10
Updated printfriendly to handle ConceptSetSelections.
Implemented tests for VisitDetail in/not in criteria.
@chrisknoll
Copy link
Collaborator

@ssuvorov-fls : I've made one last push with the following changes:

new class: ConceptSetSelection

As I was writing tests for in/not in, I noticed that the Gender, PlaceOfService and ProviderSpecialty was not handling the not-in logic (it assumed it was always 'in'). Because we are going to want this capability, I've introduced a new class: ConceptSetSelection that encapsulate a codesetId and if it is an exclusion or not (boolean: isExclusion). I understand that this is a bit of scope-creep moving from a simple Integer field into a complex object, but it was necessary to do this now because if we started with just an Inteer field and then later needed the 'isExclusion' field, we would have had these top level properties isGenderCsExclusion and isProviderSpecialtyExclusion that wouldn't have had that connection to the codesetId and if it should be applied as an exclusion (ie: not-in). But, I tried to take on most of the heavy lifting with this change: I modified the tests, object model, and print-friendly to handle this change. What is needed on your side is 2 parts: 1) Can you ensure that the 'Checkers' are doing the right thing with respect to the codesetIds that are embedded in ConceptSetSelection instances? I had trouble figuring out how the checkers were implemetned, and I'm hoping that it is simple for you to do: we should check inside ConceptSetSelections to see if a concept set is used, because we don't want a "fix it" on a concept set (ie: remove warning) when a concept set is only used inside a ConceptSetSelection. The part 2): On the AtlasUI side, you should introduce a new InputType called ConceptSetSelection with the properties described in that class. Follow the pattern in Period.js. Also on the subject of Atlas JS: in VisitDetail you will want to handle the conceptSets subscription to detect changes to the ConceptSet, such that if a codeset is deleted (see line 12 of the file), it should reset the CodeSetId of each of the *CS fields that match the CodeSetId that was just deleted.

With those 2 main changes, we should be all set to go with this PR. If you have any concerns, please ping me on Skype.

@ssuvorov-fls
Copy link
Contributor Author

@chrisknoll
We made changes according to your comment
Could you check it?

@chrisknoll
Copy link
Collaborator

Yes, that looks correct.

@chrisknoll chrisknoll merged commit 665a097 into master Sep 23, 2022
@chrisknoll chrisknoll deleted the added_visit_detail_entity branch September 27, 2022 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0