8000 ENG-268 Add seed/copy consolidate script by leite08 · Pull Request #3859 · metriport/metriport · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

ENG-268 Add seed/copy consolidate script #3859

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: develop
Choose a base branch
from

Conversation

leite08
Copy link
Member
@leite08 leite08 commented May 16, 2025

Dependencies

none

Description

  • Add see/copy consolidate script
  • Reorg fhir and cda scripts

Testing

  • Local
    • new script works
  • Staging
    • none
  • Sandbox
    • none
  • Production
    • none

Release Plan

  • Merge this

Summary by CodeRabbit

  • New Features
    • Introduced a script to copy and update consolidated patient data between IDs in AWS S3, supporting both JSON and non-JSON files, with progress logging and user confirmation.
    • Added a script to validate consolidated patient data bundles locally or from AWS S3, with detailed logging and error reporting.
    • Added a script to calculate total size of consolidated data files in S3 for given customer and patient IDs, with progress tracking and reporting.
  • Chores
    • Updated package configuration to support direct imports and type resolution for modules within the medical directory.
    • Improved concurrency and enhanced logging in patient file history processing for better runtime visibility.

Copy link
linear bot commented May 16, 2025

Copy link
coderabbitai bot commented May 16, 2025

Walkthrough

The shared package's package.json was updated to add export and type resolution patterns for the "medical" subpath, enabling direct imports and type support for files in the medical directory. New scripts were added: one to copy and adapt patient data between S3 buckets with confirmation and logging, and another to check a patient's consolidated data bundle either locally or from S3. Multiple import paths in various FHIR-related utility files were adjusted to reflect updated relative directory structures without changing any logic. Additionally, a new script was added to calculate consolidated data size in S3, and improvements were made to concurrency and logging in the file-history script.

Changes

File(s) Change Summary
packages/shared/package.json Extended exports and typesVersions to support the "medical" subpath, enabling direct imports and type resolution for medical/*.
packages/utils/src/consolidated/seed-patient-consolidated.ts Added a new script to copy patient data between S3 buckets, with confirmation, logging, and parallel processing.
packages/utils/src/consolidated/check-patient-consolidated.ts Added a new script to verify a patient's consolidated data bundle from local file or S3, with validation and logging.
packages/utils/src/s3/calculate-consolidated-size-in-s3.ts Added a new script to calculate total size of consolidated patient data files in S3, with concurrency and logging.
packages/utils/src/s3/file-history.ts Enhanced concurrency limit and improved logging with elapsed time formatting; consolidated imports.
packages/utils/src/fhir/fhir-converter/* Updated import paths to use deeper relative paths (e.g., from ../shared/ to ../../shared/) without changing logic or functionality.
packages/utils/src/fhir/fhir-deduplication/report/* Updated import paths to use deeper relative paths (e.g., from ../../shared/ to ../../../shared/) without changing logic or functionality.
packages/utils/src/fhir/fhir-hydration/* Updated import paths from ../shared/fs to ../../shared/fs without changing logic or functionality.
packages/utils/src/fhir/fhir-normalization/* Updated import paths from ../shared/fs to ../../shared/fs without changing logic or functionality.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant SeedScript
    participant AWS_S3

    User->>SeedScript: Run seed-patient-consolidated.ts
    SeedScript->>User: Display warning and confirmation prompt
    User->>SeedScript: Confirm operation
    SeedScript->>AWS_S3: List files with source prefix (medical and conversion buckets)
    loop For each file (parallel up to 30)
        SeedScript->>AWS_S3: Download file (if JSON) or copy file (if not JSON)
        alt JSON file
            SeedScript->>SeedScript: Replace source patient ID with destination patient ID
            SeedScript->>AWS_S3: Upload modified JSON to destination path
        else Non-JSON file
            SeedScript->>AWS_S3: Direct S3 copy to destination path
        end
    end
    SeedScript->>User: Log progress and completion
Loading
sequenceDiagram
    participant User
    participant CheckScript
    participant AWS_S3

    User->>CheckScript: Run check-patient-consolidated.ts with cxId, patientId, optional file path
    alt Local file path provided
        CheckScript->>CheckScript: Read bundle JSON from local file
    else No local file path
        CheckScript->>AWS_S3: Download bundle JSON from S3 bucket
    end
    CheckScript->>CheckScript: Parse JSON as FHIR Bundle
    CheckScript->>CheckScript: Run checkBundle validation
    CheckScript->>User: Log validation results and elapsed time
Loading

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

npm warn config production Use --omit=dev instead.
npm error code ERR_SSL_WRONG_VERSION_NUMBER
npm error errno ERR_SSL_WRONG_VERSION_NUMBER
npm error request to https://10.0.0.28:4873/punycode/-/punycode-2.3.1.tgz failed, reason: C0AC79999A7F0000:error:0A00010B:SSL routines:ssl3_get_record:wrong version number:../deps/openssl/openssl/ssl/record/ssl3_record.c:354:
npm error
npm error A complete log of this run can be found in: /.npm/_logs/2025-05-27T16_01_46_795Z-debug-0.log

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@leite08 leite08 changed the title ENG-268 Add see/copy consolidate script ENG-268 Add seed/copy consolidate script May 16, 2025
@leite08 leite08 marked this pull request as ready for review May 16, 2025 19:53
Copy link
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
packages/utils/src/consolidated/seed-patient-consolidated.ts (4)

13-13: Remove or explain commented import

The commented import suggests functionality that might be relevant but is not being used.

Either remove the commented import or add a comment explaining why it's kept but commented out:

-// import { isMimeTypeXML } from "@metriport/core/util/mime";
+// XML processing temporarily disabled - will be implemented in future version

54-54: Remove unused commented code

This line appears to be leftover development code that's not used.

-  // const destinationPath = createFolderName(sourceCxId, sourcePatientId);

76-76: Remove or explain commented functionality

There's commented code that suggests XML files might need special handling.

-        // if (isMimeTypeXML(fileType)) return;
+        // Skip XML files processing for now - will be implemented in ENG-XXX

110-110: Consider making parallel execution count configurable

The hard-coded value of 30 parallel executions might not be optimal for all environments or workloads.

-      { numberOfParallelExecutions: 30 }
+      { numberOfParallelExecutions: Number(process.env.PARALLEL_EXECUTIONS) || 30 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a3853e and d73527d.

📒 Files selected for processing (2)
  • packages/shared/package.json (2 hunks)
  • packages/utils/src/consolidated/seed-patient-consolidated.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.ts`: - Use the Onion Pattern to organize a package's code in layers - Try to use immutable code and avoid sharing state across different functions, objects, and systems - Try...

**/*.ts: - Use the Onion Pattern to organize a package's code in layers

  • Try to use immutable code and avoid sharing state across different functions, objects, and systems
  • Try to build code that's idempotent whenever possible
  • Prefer functional programming style functions: small, deterministic, 1 input, 1 output
  • Minimize coupling / dependencies
  • Avoid modifying objects received as parameter
  • Only add comments to code to explain why something was done, not how it works
  • Naming
    • classes, enums: PascalCase
    • constants, variables, functions: camelCase
    • file names: kebab-case
    • table and column names: snake_case
    • Use meaningful names, so whoever is reading the code understands what it means
    • Don’t use negative names, like notEnabled, prefer isDisabled
    • For numeric values, if the type doesn’t convey the unit, add the unit to the name
  • Typescript
    • Use types
    • Prefer const instead of let
    • Avoid any and casting from any to other types
    • Type predicates: only applicable to narrow down the type, not to force a complete type conversion
    • Prefer deconstructing parameters for functions instead of multiple parameters that might be of
      the same type
    • Don’t use null inside the app, only on code interacting with external interfaces/services,
      like DB and HTTP; convert to undefined before sending inwards into the code
    • Use async/await instead of .then()
    • Use the strict equality operator ===, don’t use abstract equality operator ==
    • When calling a Promise-returning function asynchronously (i.e., not awaiting), use .catch() to
      handle errors (see processAsyncError and emptyFunction depending on the case)
    • Date and Time
      • Always use buildDayjs() to create dayjs instances
      • Prefer dayjs.duration(...) to create duration consts and keep them as duration
  • Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
  • Avoid creating arrow functions
  • Use truthy syntax instead of in - i.e., if (data.link) not if ('link' in data)
  • Error handling
    • Pass the original error as the new one’s cause so the stack trace is persisted
    • Error messages should have a static message - add dynamic data to MetriportError's additionalInfo prop
    • Avoid sending multiple events to Sentry for a single error
  • Global constants and variables
    • Move literals to constants declared after imports when possible (avoid magic numbers)
    • Avoid shared, global objects
  • Avoid using console.log and console.error in packages other than utils, infra and shared,
    and try to use out().log instead
  • Avoid multi-line logs
    • don't send objects as a second parameter to console.log() or out().log()
    • don't create multi-line strings when using JSON.stringify()
  • Use eslint to enforce code style
  • Use prettier to format code
  • max column length is 100 chars
  • multi-line comments use /** */
  • scripts: top-level comments go after the import
  • packages/utils/src/consolidated/seed-patient-consolidated.ts
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (3)
packages/shared/package.json (2)

27-27: Well-structured addition to exports pattern

The addition of the medical subpath export pattern follows the established convention in the file and will allow consumers to directly import from the medical directory.


60-62: Type declarations properly added

The TypeScript type declarations for the medical subpath are correctly configured, maintaining consistency with other subpaths in the package.

packages/utils/src/consolidated/seed-patient-consolidated.ts (1)

57-58: 🛠️ Refactor suggestion

Use out().log consistently instead of console.log

According to the coding guidelines, direct use of console.log should be avoided in favor of out().log.

-    console.log(`>>> Processing bucket ${bucketName}...`);
+    log(`>>> Processing bucket ${bucketName}...`);
-    console.log(`... found ${sourceFiles.length} files to copy`);
+    log(`... found ${sourceFiles.length} files to copy`);
-        console.log(`... copied ${sourceFileKey} to ${destinationFileKey}`);
+        log(`... copied ${sourceFileKey} to ${destinationFileKey}`);

Also applies to: 60-60, 108-108

⛔ Skipped due to learnings
Learnt from: leite08
PR: metriport/metriport#3489
File: packages/api/src/routes/ehr/elation/appointment-webhook.ts:32-36
Timestamp: 2025-03-21T00:21:26.928Z
Learning: Use `out().log` instead of `console.log` for logging in packages other than utils, infra and shared.
Learnt from: thomasyopes
PR: metriport/metriport#3466
File: packages/api/src/routes/ehr/shared.ts:122-147
Timestamp: 2025-03-17T17:01:17.227Z
Learning: Avoid using console.log and console.error in packages other than utils, infra and shared, and try to use out().log instead according to Metriport's coding guidelines.

@leite08 leite08 force-pushed the eng-268-seed-consolidated-script branch 2 times, most recently from d5f2f4b to 5384386 Compare May 18, 2025 20:39
// STOP! Never point these to production buckets!
const medicalDocsBucketName = getEnvVarOrFail("MEDICAL_DOCUMENTS_BUCKET_NAME");
const conversionsBucketName = getEnvVarOrFail("CONVERSION_RESULT_BUCKET_NAME");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add some validation here in case something gets pointed to prod, we catch it?
if (medicalDocsBucketName.toLowerCase().includes("prod") || conversionsBucketName.toLowerCase().includes("prod")) throw some exception

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor
@RamilGaripov RamilGaripov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, but address @husainsyed's comments first. I like the one about prod warning

leite08 added 2 commits May 22, 2025 11:02
Ref eng-268

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Ref eng-268

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
@leite08 leite08 force-pushed the eng-268-seed-consolidated-script branch from 5384386 to 453b030 Compare May 22, 2025 14:02
Ref eng-268

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Copy link
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
packages/utils/src/consolidated/seed-patient-consolidated.ts (3)

25-28: 🛠️ Refactor suggestion

Make IDs configurable through command-line arguments

Empty string variables for source and destination IDs require manual file editing before use, which reduces usability and makes the script harder to maintain.

Consider using command-line arguments or environment variables instead:

-const sourceCxId = "";
-const sourcePatientId = "";
-const destinationCxId = "";
-const destinationPatientId = "";
+const sourceCxId = process.env.SOURCE_CX_ID || process.argv[2];
+const sourcePatientId = process.env.SOURCE_PATIENT_ID || process.argv[3];
+const destinationCxId = process.env.DESTINATION_CX_ID || process.argv[4];
+const destinationPatientId = process.env.DESTINATION_PATIENT_ID || process.argv[5];
+
+if (!sourceCxId || !sourcePatientId || !destinationCxId || !destinationPatientId) {
+  console.error("Missing required parameters. Usage:");
+  console.error("ts-node src/consolidated/seed-patient-consolidated.ts <sourceCxId> <sourcePatientId> <destinationCxId> <destinationPatientId>");
+  process.exit(1);
+}

129-134: 🛠️ Refactor suggestion

Enhance confirmation mechanism for better safety

The current confirmation is passive - it just waits for a fixed time period without requiring explicit user confirmation. This could lead to unintended operations if the script is started accidentally.

Consider implementing an active confirmation mechanism:

 async function displayWarningAndConfirmation(log: typeof console.log) {
   const msg = `You are about to copy the consolidated data of patient ${sourcePatientId} to ${destinationPatientId}.`;
   log(msg);
-  log("Cancel this now if you're not sure.");
-  await sleep(confirmationTime.asMilliseconds());
+  log("Type 'CONFIRM' to proceed or press Ctrl+C to cancel...");
+  
+  return new Promise<void>((resolve) => {
+    const stdin = process.stdin;
+    stdin.setRawMode(true);
+    stdin.resume();
+    stdin.setEncoding("utf8");
+    
+    let input = "";
+    const  string) => {
+      // ctrl-c
+      if (data === "\u0003") {
+        log("Operation cancelled");
+        process.exit(0);
+      }
+      
+      // Backspace
+      if (data === "\u007f" && input.length > 0) {
+        input = input.substring(0, input.length - 1);
+        process.stdout.write("\b \b");
+        return;
+      }
+      
+      if (data === "\r" || data === "\n") {
+        process.stdout.write("\n");
+        if (input === "CONFIRM") {
+          stdin.removeListener("data", onData);
+          stdin.setRawMode(false);
+          stdin.pause();
+          resolve();
+        } else {
+          log("Invalid confirmation. Operation cancelled.");
+          process.exit(1);
+        }
+        return;
+      }
+      
+      input += data;
+      process.stdout.write(data);
+    };
+    
+    stdin.on("data", onData);
+  });
 }

59-60: 🛠️ Refactor suggestion

Add a validation check for identical source and destination

The script should prevent accidentally copying a patient's data onto itself, which could potentially cause data corruption or unexpected behavior.

  const sourceFilenamePrefix = createFilePath(sourceCxId, sourcePatientId, "");
  const destinationFilenamePrefix = createFilePath(destinationCxId, destinationPatientId, "");
+  
+  // Prevent copying to the same patient
+  if (sourceCxId === destinationCxId && sourcePatientId === destinationPatientId) {
+    log("ERROR: Source and destination patient IDs are identical. Operation aborted.");
+    process.exit(1);
+  }
🧹 Nitpick comments (2)
packages/utils/src/consolidated/seed-patient-consolidated.ts (2)

63-68: Use consistent logging approach

The script mixes direct console.log calls with the structured logger created via out(""). Using a consistent logging approach improves maintainability and readability.

  async function copyData(bucketName: string) {
-    console.log(`>>> Processing bucket ${bucketName}...`);
+    log(`>>> Processing bucket ${bucketName}...`);
    const sourceFiles = await s3Utils.listObjects(bucketName, sourceFilenamePrefix);
    if (sourceFiles.length < 1) return;
-    console.log(`... found ${sourceFiles.length} files to copy`);
+    log(`... found ${sourceFiles.length} files to copy`);

    // Later in the function
-        console.log(`... copied ${sourceFileKey} to ${destinationFileKey}`);
+        log(`... copied ${sourceFileKey} to ${destinationFileKey}`);

Also applies to: 115-115


117-117: Consider adjusting the parallelism level based on expected workload

Setting numberOfParallelExecutions to 30 could potentially cause AWS API rate limiting issues if there are many files to process.

-      { numberOfParallelExecutions: 30 }
+      { numberOfParallelExecutions: 10 }

Consider a more conservative value initially, or make this configurable through an environment variable:

+const parallelExecutions = parseInt(process.env.PARALLEL_EXECUTIONS || "10", 10);

// Then use it later
-      { numberOfParallelExecutions: 30 }
+      { numberOfParallelExecutions: parallelExecutions }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 453b030 and ef0fcef.

📒 Files selected for processing (1)
  • packages/utils/src/consolidated/seed-patient-consolidated.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.ts`: - Use the Onion Pattern to organize a package's code in layers - Try to use immutable code and avoid sharing state across different functions, objects, and systems - Try...

**/*.ts: - Use the Onion Pattern to organize a package's code in layers

  • Try to use immutable code and avoid sharing state across different functions, objects, and systems
  • Try to build code that's idempotent whenever possible
  • Prefer functional programming style functions: small, deterministic, 1 input, 1 output
  • Minimize coupling / dependencies
  • Avoid modifying objects received as parameter
  • Only add comments to code to explain why something was done, not how it works
  • Naming
    • classes, enums: PascalCase
    • constants, variables, functions: camelCase
    • file names: kebab-case
    • table and column names: snake_case
    • Use meaningful names, so whoever is reading the code understands what it means
    • Don’t use negative names, like notEnabled, prefer isDisabled
    • For numeric values, if the type doesn’t convey the unit, add the unit to the name
  • Typescript
    • Use types
    • Prefer const instead of let
    • Avoid any and casting from any to other types
    • Type predicates: only applicable to narrow down the type, not to force a complete type conversion
    • Prefer deconstructing parameters for functions instead of multiple parameters that might be of
      the same type
    • Don’t use null inside the app, only on code interacting with external interfaces/services,
      like DB and HTTP; convert to undefined before sending inwards into the code
    • Use async/await instead of .then()
    • Use the strict equality operator ===, don’t use abstract equality operator ==
    • When calling a Promise-returning function asynchronously (i.e., not awaiting), use .catch() to
      handle errors (see processAsyncError and emptyFunction depending on the case)
    • Date and Time
      • Always use buildDayjs() to create dayjs instances
      • Prefer dayjs.duration(...) to create duration consts and keep them as duration
  • Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
  • Avoid creating arrow functions
  • Use truthy syntax instead of in - i.e., if (data.link) not if ('link' in data)
  • Error handling
    • Pass the original error as the new one’s cause so the stack trace is persisted
    • Error messages should have a static message - add dynamic data to MetriportError's additionalInfo prop
    • Avoid sending multiple events to Sentry for a single error
  • Global constants and variables
    • Move literals to constants declared after imports when possible (avoid magic numbers)
    • Avoid shared, global objects
  • Avoid using console.log and console.error in packages other than utils, infra and shared,
    and try to use out().log instead
  • Avoid multi-line logs
    • don't send objects as a second parameter to console.log() or out().log()
    • don't create multi-line strings when using JSON.stringify()
  • Use eslint to enforce code style
  • Use prettier to format code
  • max column length is 100 chars
  • multi-line comments use /** */
  • scripts: top-level comments go after the import
  • packages/utils/src/consolidated/seed-patient-consolidated.ts
🧠 Learnings (2)
📓 Common learnings
Learnt from: leite08
PR: metriport/metriport#3814
File: packages/api/src/routes/internal/medical/patient-consolidated.ts:141-174
Timestamp: 2025-05-20T21:26:26.804Z
Learning: The functionality introduced in packages/api/src/routes/internal/medical/patient-consolidated.ts is planned to be refactored in downstream PR #3857, including improvements to error handling and validation.
packages/utils/src/consolidated/seed-patient-consolidated.ts (1)
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/utils/src/patient-import/send-webhook.ts:41-55
Timestamp: 2025-04-24T02:17:02.661Z
Learning: Files under `packages/utils` don't need robust error handling as they are utility scripts meant for local/developer environments, not production code.
🧬 Code Graph Analysis (1)
packages/utils/src/consolidated/seed-patient-consolidated.ts (3)
packages/core/src/external/aws/s3.ts (1)
  • S3Utils (114-509)
packages/shared/src/index.ts (1)
  • sleep (13-13)
packages/core/src/util/concurrency.ts (1)
  • executeAsynchronously (83-130)

leite08 added 3 commits May 22, 2025 16:04
Ref eng-268

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Ref eng-268

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Ref eng-268

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Copy link
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
packages/utils/src/consolidated/check-patient-consolidated.ts (2)

38-40: Remove or document the purpose of the sleep call.

The sleep(100) call at the beginning of the main function seems arbitrary and its purpose isn't clear. It adds a delay without any explanation.

Either remove this sleep call if it's unnecessary or add a comment explaining its purpose:

async function main() {
-  await sleep(100);
+  // If sleep is necessary, add a comment explaining why
  initRunsFolder();

17-27: Add more detailed usage instructions in the documentation.

The documentation explains the purpose of the script but doesn't provide details on how to configure it before running.

Enhance the documentation with more detailed usage instructions:

/**
 * This script checks a patient's consolidated data using the checkBundle() function
 * from packages/core.
 *
 * Either set the bundleFilePath or the cxId and patientId.
 *
+ * Configuration:
+ * - bundleFilePath: Path to a local JSON file containing the bundle
+ * - cxId: Customer ID when downloading from S3
+ * - patientId: Patient ID when downloading from S3
+ *
+ * Environment variables required:
+ * - MEDICAL_DOCUMENTS_BUCKET_NAME: S3 bucket name for medical documents
+ * - AWS_REGION: AWS region for S3 operations
 *
 * Execute this with:
 * $ ts-node src/consolidated/check-patient-consolidated.ts
+ * 
+ * Or with arguments:
+ * $ ts-node src/consolidated/check-patient-consolidated.ts [bundleFilePath] [cxId] [patientId]
 */
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef0fcef and ea100ed.

📒 Files selected for processing (15)
  • packages/utils/src/consolidated/check-patient-consolidated.ts (1 hunks)
  • packages/utils/src/consolidated/seed-patient-consolidated.ts (1 hunks)
  • packages/utils/src/fhir/fhir-converter/convert-and-generate-mr.ts (1 hunks)
  • packages/utils/src/fhir/fhir-converter/convert.ts (1 hunks)
  • packages/utils/src/fhir/fhir-converter/count-bundle-refs.ts (1 hunks)
  • packages/utils/src/fhir/fhir-converter/count-resources.ts (1 hunks)
  • packages/utils/src/fhir/fhir-converter/integration-test.ts (1 hunks)
  • packages/utils/src/fhir/fhir-converter/s3-to-lambda.ts (1 hunks)
  • packages/utils/src/fhir/fhir-converter/shared.ts (1 hunks)
  • packages/utils/src/fhir/fhir-converter/validate-fhir-bundles.ts (1 hunks)
  • packages/utils/src/fhir/fhir-converter/validator/validate-converted-xmls.ts (1 hunks)
  • packages/utils/src/fhir/fhir-deduplication/report/get-files.ts (1 hunks)
  • packages/utils/src/fhir/fhir-deduplication/report/report-dedup.ts (1 hunks)
  • packages/utils/src/fhir/fhir-hydration/hydrate-files-local.ts (1 hunks)
  • packages/utils/src/fhir/fhir-normalization/normalize-files-local.ts (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • packages/utils/src/fhir/fhir-converter/s3-to-lambda.ts
🚧 Files skipped from review as they are similar to previous changes (13)
  • packages/utils/src/fhir/fhir-converter/integration-test.ts
  • packages/utils/src/fhir/fhir-converter/shared.ts
  • packages/utils/src/fhir/fhir-converter/validator/validate-converted-xmls.ts
  • packages/utils/src/fhir/fhir-converter/convert.ts
  • packages/utils/src/fhir/fhir-deduplication/report/get-files.ts
  • packages/utils/src/fhir/fhir-normalization/normalize-files-local.ts
  • packages/utils/src/fhir/fhir-converter/count-resources.ts
  • packages/utils/src/fhir/fhir-deduplication/report/report-dedup.ts
  • packages/utils/src/fhir/fhir-converter/count-bundle-refs.ts
  • packages/utils/src/fhir/fhir-converter/convert-and-generate-mr.ts
  • packages/utils/src/fhir/fhir-converter/validate-fhir-bundles.ts
  • packages/utils/src/fhir/fhir-hydration/hydrate-files-local.ts
  • packages/utils/src/consolidated/seed-patient-consolidated.ts
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.ts`: - Use the Onion Pattern to organize a package's code in layers - Try to use immutable code and avoid sharing state across different functions, objects, and systems - Try...

**/*.ts: - Use the Onion Pattern to organize a package's code in layers

  • Try to use immutable code and avoid sharing state across different functions, objects, and systems
  • Try to build code that's idempotent whenever possible
  • Prefer functional programming style functions: small, deterministic, 1 input, 1 output
  • Minimize coupling / dependencies
  • Avoid modifying objects received as parameter
  • Only add comments to code to explain why something was done, not how it works
  • Naming
    • classes, enums: PascalCase
    • constants, variables, functions: camelCase
    • file names: kebab-case
    • table and column names: snake_case
    • Use meaningful names, so whoever is reading the code understands what it means
    • Don’t use negative names, like notEnabled, prefer isDisabled
    • For numeric values, if the type doesn’t convey the unit, add the unit to the name
  • Typescript
    • Use types
    • Prefer const instead of let
    • Avoid any and casting from any to other types
    • Type predicates: only applicable to narrow down the type, not to force a complete type conversion
    • Prefer deconstructing parameters for functions instead of multiple parameters that might be of
      the same type
    • Don’t use null inside the app, only on code interacting with external interfaces/services,
      like DB and HTTP; convert to undefined before sending inwards into the code
    • Use async/await instead of .then()
    • Use the strict equality operator ===, don’t use abstract equality operator ==
    • When calling a Promise-returning function asynchronously (i.e., not awaiting), use .catch() to
      handle errors (see processAsyncError and emptyFunction depending on the case)
    • Date and Time
      • Always use buildDayjs() to create dayjs instances
      • Prefer dayjs.duration(...) to create duration consts and keep them as duration
  • Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
  • Avoid creating arrow functions
  • Use truthy syntax instead of in - i.e., if (data.link) not if ('link' in data)
  • Error handling
    • Pass the original error as the new one’s cause so the stack trace is persisted
    • Error messages should have a static message - add dynamic data to MetriportError's additionalInfo prop
    • Avoid sending multiple events to Sentry for a single error
  • Global constants and variables
    • Move literals to constants declared after imports when possible (avoid magic numbers)
    • Avoid shared, global objects
  • Avoid using console.log and console.error in packages other than utils, infra and shared,
    and try to use out().log instead
  • Avoid multi-line logs
    • don't send objects as a second parameter to console.log() or out().log()
    • don't create multi-line strings when using JSON.stringify()
  • Use eslint to enforce code style
  • Use prettier to format code
  • max column length is 100 chars
  • multi-line comments use /** */
  • scripts: top-level comments go after the import
  • packages/utils/src/consolidated/check-patient-consolidated.ts
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (3)
packages/utils/src/consolidated/check-patient-consolidated.ts (3)

1-16: LGTM! Import and configuration setup looks good.

The imports are well-organized with dotenv config correctly placed at the top. The dayjs plugin extension is properly set up.


33-37: LGTM! AWS and S3 setup is correct.

The S3 setup appropriately uses environment variables to get the bucket name and region.


62-65: LGTM! Elapsed time measurement and clean exit.

The elapsed time calculation and logging is a good practice for script performance tracking.

Comment on lines 55 to 60
try {
log(`>>> Checking bundle...`);
checkBundle(bundle, cxId, patientId);
} catch (error) {
log(`>>> Error: ${error}`);
}
Copy link
@coderabbitai coderabbitai bot May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve error handling and exit with appropriate code.

The script catches errors during bundle checking but continues execution and exits with code 0 (success) regardless of whether errors occurred. This could be misleading for automated processes that rely on exit codes.

Enhance the error handling to provide more context and exit with a non-zero code on failure:

  try {
    log(`>>> Checking bundle...`);
    checkBundle(bundle, cxId, patientId);
+   log(`>>> Bundle check completed successfully`);
  } catch (error) {
-   log(`>>> Error: ${error}`);
+   log(`>>> Error during bundle validation: ${error}`);
+   // Exit with error code
+   process.exit(1);
  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try {
log(`>>> Checking bundle...`);
checkBundle(bundle, cxId, patientId);
} catch (error) {
log(`>>> Error: ${error}`);
}
try {
log(`>>> Checking bundle...`);
checkBundle(bundle, cxId, patientId);
log(`>>> Bundle check completed successfully`);
} catch (error) {
log(`>>> Error during bundle validation: ${error}`);
// Exit with error code
process.exit(1);
}
🤖 Prompt for AI Agents
In packages/utils/src/consolidated/check-patient-consolidated.ts around lines 55
to 60, the current error handling logs the error but does not stop execution or
exit with a failure code. Modify the catch block to log the error with more
context and then exit the process with a non-zero exit code to indicate failure,
ensuring automated systems can detect the error properly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for error handling on util scripts

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leite08 Understood! I see from our previous discussions that utility scripts in the packages/utils directory have different requirements compared to production code and don't need comprehensive error handling. I'll keep this in mind for future reviews of utility scripts.

🐰
( ^.^ )
>


🧠 Learnings used
Learnt from: leite08
PR: metriport/metriport#3412
File: packages/utils/src/patient-import/send-webhook.ts:41-55
Timestamp: 2025-04-24T02:17:02.661Z
Learning: Files under `packages/utils` don't need robust error handling as they are utility scripts meant for local/developer environments, not production code.

Learnt from: leite08
PR: metriport/metriport#3611
File: packages/utils/src/carequality/download-directory.ts:68-82
Timestamp: 2025-04-05T16:02:22.473Z
Learning: Utility scripts in the packages/utils directory don't require comprehensive error handling like production code.

Learnt from: leite08
PR: metriport/metriport#3595
File: packages/utils/src/carequality/download-directory.ts:36-66
Timestamp: 2025-04-03T02:23:58.332Z
Learning: In the metriport codebase, utility scripts may not require the same level of error handling as production code, especially for simple or one-off tasks like in the CQ directory download script.

Learnt from: leite08
PR: metriport/metriport#3611
File: packages/utils/src/carequality/download-directory.ts:33-66
Timestamp: 2025-04-05T16:02:31.219Z
Learning: In the Metriport codebase, utility scripts in the packages/utils directory have different error handling requirements compared to production code, and do not necessarily need the same level of error handling.

Ref eng-268

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
@leite08 leite08 force-pushed the eng-268-seed-consolidated-script branch from d4faadb to 733bbee Compare May 26, 2025 16:25
Ref eng-268

Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Copy link
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
packages/utils/src/s3/calculate-consolidated-size-in-s3.ts (2)

41-41: Consider using more descriptive logging.

Following the coding guidelines for meaningful names and to align with the logging pattern used in the updated file-history.ts script.

-  console.log(`${Object.keys(cxIds).length} cxs, ${cxAndPatient.length} patients`);
+  console.log(`${Object.keys(cxIds).length} customers, ${cxAndPatient.length} patients`);

27-27: Consider providing example data for the cxAndPatient array.

The array is currently empty and needs to be populated by users. Consider adding example entries as comments to help users understand the expected format.

-const cxAndPatient: string[][] = [];
+const cxAndPatient: string[][] = [
+  // ["customer-id-1", "patient-id-1"],
+  // ["customer-id-2", "patient-id-2"],
+];
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 733bbee and a0a760e.

📒 Files selected for processing (2)
  • packages/utils/src/s3/calculate-consolidated-size-in-s3.ts (1 hunks)
  • packages/utils/src/s3/file-history.ts (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.ts`: - Use the Onion Pattern to organize a package's code in layers - Try to use immutable code and avoid sharing state across different functions, objects, and systems - Try...

**/*.ts: - Use the Onion Pattern to organize a package's code in layers

  • Try to use immutable code and avoid sharing state across different functions, objects, and systems
  • Try to build code that's idempotent whenever possible
  • Prefer functional programming style functions: small, deterministic, 1 input, 1 output
  • Minimize coupling / dependencies
  • Avoid modifying objects received as parameter
  • Only add comments to code to explain why something was done, not how it works
  • Naming
    • classes, enums: PascalCase
    • constants, variables, functions: camelCase
    • file names: kebab-case
    • table and column names: snake_case
    • Use meaningful names, so whoever is reading the code understands what it means
    • Don’t use negative names, like notEnabled, prefer isDisabled
    • For numeric values, if the type doesn’t convey the unit, add the unit to the name
  • Typescript
    • Use types
    • Prefer const instead of let
    • Avoid any and casting from any to other types
    • Type predicates: only applicable to narrow down the type, not to force a complete type conversion
    • Prefer deconstructing parameters for functions instead of multiple parameters that might be of
      the same type
    • Don’t use null inside the app, only on code interacting with external interfaces/services,
      like DB and HTTP; convert to undefined before sending inwards into the code
    • Use async/await instead of .then()
    • Use the strict equality operator ===, don’t use abstract equality operator ==
    • When calling a Promise-returning function asynchronously (i.e., not awaiting), use .catch() to
      handle errors (see processAsyncError and emptyFunction depending on the case)
    • Date and Time
      • Always use buildDayjs() to create dayjs instances
      • Prefer dayjs.duration(...) to create duration consts and keep them as duration
  • Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
  • Avoid creating arrow functions
  • Use truthy syntax instead of in - i.e., if (data.link) not if ('link' in data)
  • Error handling
    • Pass the original error as the new one’s cause so the stack trace is persisted
    • Error messages should have a static message - add dynamic data to MetriportError's additionalInfo prop
    • Avoid sending multiple events to Sentry for a single error
  • Global constants and variables
    • Move literals to constants declared after imports when possible (avoid magic numbers)
    • Avoid shared, global objects
  • Avoid using console.log and console.error in packages other than utils, infra and shared,
    and try to use out().log instead
  • Avoid multi-line logs
    • don't send objects as a second parameter to console.log() or out().log()
    • don't create multi-line strings when using JSON.stringify()
  • Use eslint to enforce code style
  • Use prettier to format code
  • max column length is 100 chars
  • multi-line comments use /** */
  • scripts: top-level comments go after the import
  • packages/utils/src/s3/calculate-consolidated-size-in-s3.ts
  • packages/utils/src/s3/file-history.ts
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: check-pr / lint-build-test
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (1)
packages/utils/src/s3/file-history.ts (1)

32-32: LGTM! Good improvements to concurrency and logging.

The changes improve the script's performance and user experience:

  • Increased parallel calls from 10 to 50 for better performance
  • Added timing information with consistent formatting
  • Added progress logging per customer
  • Used proper constants for configuration

These changes align well with the approach taken in the new calculate-consolidated-size-in-s3.ts script.

Also applies to: 52-52, 69-69, 82-82

Comment on lines +92 to +94
function inGigabyteStr(bytes: number): string {
return (bytes / 1_048_576).toFixed(2) + " GB";
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix incorrect byte to gigabyte conversion.

The function divides by 1_048_576 (1 MB) but labels the result as GB. The correct conversion should divide by 1_073_741_824 (1 GB).

Apply this fix:

function inGigabyteStr(bytes: number): string {
-  return (bytes / 1_048_576).toFixed(2) + " GB";
+  return (bytes / 1_073_741_824).toFixed(2) + " GB";
}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function inGigabyteStr(bytes: number): string {
return (bytes / 1_048_576).toFixed(2) + " GB";
}
function inGigabyteStr(bytes: number): string {
return (bytes / 1_073_741_824).toFixed(2) + " GB";
}
🤖 Prompt for AI Agents
In packages/utils/src/s3/calculate-consolidated-size-in-s3.ts at lines 92 to 94,
the function inGigabyteStr incorrectly converts bytes to gigabytes by dividing
by 1,048,576 (1 MB) instead of 1,073,741,824 (1 GB). Fix this by changing the
divisor to 1,073,741,824 to correctly convert bytes to gigabytes while keeping
the rest of the function unchanged.

for (const [cxId, entries] of Object.entries(cxIds)) {
let cxSize = 0;
const patientIds = entries.map(v => v[1]);
totalPatients = +patientIds.length;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix incorrect totalPatients calculation.

The current implementation reassigns totalPatients in each loop iteration instead of accumulating. This will result in totalPatients only reflecting the last customer's patient count.

Apply this fix:

-    totalPatients = +patientIds.length;
+    totalPatients += patientIds.length;

Note: The unary + operator is also unnecessary since patientIds.length is already a number.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
totalPatients = +patientIds.length;
- totalPatients = +patientIds.length;
+ totalPatients += patientIds.length;
🤖 Prompt for AI Agents
In packages/utils/src/s3/calculate-consolidated-size-in-s3.ts at line 50, the
code incorrectly assigns totalPatients to patientIds.length in each iteration,
overwriting the previous value. To fix this, change the assignment to accumulate
the count by adding patientIds.length to totalPatients instead of reassigning
it. Also, remove the unnecessary unary plus operator since patientIds.length is
already a number.

Comment on lines +36 to +37
await sleep(50); // Give some time to avoid mixing logs w/ Node's
console.log(`############ Starting... ############`);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Follow coding guidelines for logging.

The coding guidelines specify to avoid using console.log in packages other than utils, infra and shared, and try to use out().log instead.

Consider importing and using the proper logging utility:

+import { out } from "@metriport/core/util/out";

async function main() {
  await sleep(50); // Give some time to avoid mixing logs w/ Node's
-  console.log(`############ Starting... ############`);
+  out().log(`############ Starting... ############`);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
await sleep(50); // Give some time to avoid mixing logs w/ Node's
console.log(`############ Starting... ############`);
import { out } from "@metriport/core/util/out";
async function main() {
await sleep(50); // Give some time to avoid mixing logs w/ Node's
out().log(`############ Starting... ############`);
// …rest of main()
}
🤖 Prompt for AI Agents
In packages/utils/src/s3/calculate-consolidated-size-in-s3.ts at lines 36 to 37,
replace the console.log statement with the appropriate logging utility as per
coding guidelines. Import the logging function (e.g., out().log) from the
designated logging module and use it to output the "Starting..." message instead
of console.log to maintain consistency and adhere to the project's logging
standards.

@@ -48,9 +49,10 @@ type Result = {
async function main() {
await sleep(50); // Give some time to avoid mixing logs w/ Node's
console.log(`############ Starting... ############`);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Follow coding guidelines for logging.

Similar to the other script, this file uses console.log which violates the coding guidelines that specify to avoid using console.log in packages other than utils, infra and shared.

Consider updating all console.log calls to use the proper logging utility:

+import { out } from "@metriport/core/util/out";

// Replace all console.log calls with out().log
-  console.log(`############ Starting... ############`);
+  out().log(`############ Starting... ############`);

Also applies to: 55-55, 71-71, 82-82

🤖 Prompt for AI Agents
In packages/utils/src/s3/file-history.ts at lines 51, 55, 71, and 82, replace
all console.log calls with the appropriate logging utility as per the coding
guidelines. Identify the logging utility used in this project (e.g., a logger
instance or a logging module) and update each console.log statement to use that
instead, ensuring consistent and guideline-compliant logging throughout the
file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0