-
Notifications
You must be signed in to change notification settings - Fork 67
ENG-41 Adj OS bulk import error handling #3802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe changes introduce new utilities for extracting and handling errors from OpenSearch bulk responses, refactor bulk ingestion logic for improved error tracking and reporting, and propagate granular error handling through the ingestion pipeline. Error callbacks are now supported during semantic ingestion, enabling errors to be logged to NDJSON files. Sentry initialization is added for error monitoring. Changes
Sequence Diagram(s)sequenceDiagram
participant Script as semantic-ingest.ts
participant Sentry as Sentry
participant SemanticIngest as ingestSemantic
participant TextIngestor as OpenSearchTextIngestorDirect
participant OpenSearch as OpenSearch API
Script->>Sentry: initSentry()
Script->>SemanticIngest: ingestSemantic({ patient, onItemError })
SemanticIngest->>TextIngestor: ingestBulk({ cxId, patientId, resources, onItemError })
loop For each chunk of resources
TextIngestor->>OpenSearch: bulk API request
OpenSearch-->>TextIngestor: bulk response
TextIngestor->>TextIngestor: processErrorsFromBulkResponse(response, operation, onItemError)
alt If error in item
TextIngestor->>Script: onItemError(error) (writes to NDJSON)
end
end
sequenceDiagram
participant TextIngestor as OpenSearchTextIngestorDirect
participant BulkUtil as bulk.ts
participant OpenSearch as OpenSearch API
TextIngestor->>OpenSearch: bulk API request
OpenSearch-->>TextIngestor: bulk response
TextIngestor->>BulkUtil: processErrorsFromBulkResponse(response, operation, onItemError)
BulkUtil-->>TextIngestor: error count
alt If onItemError provided
BulkUtil->>Script: onItemError(error)
end
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
npm warn config production Use Note ⚡️ AI Code Reviews for VS Code, Cursor, WindsurfCodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback. ✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
e78869e
to
781f6bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (7)
packages/utils/src/shared/sentry.ts (1)
15-23
: Consider adding return type and error handling.The function implementation follows best practices by using environment-specific configuration. However, consider adding a return type annotation and potentially some error handling in case Sentry initialization fails.
-export function initSentry() { +export function initSentry(): void { Sentry.init({ dsn: sentryDsn, enabled: sentryDsn != null, environment: getEnvType(), sampleRate: 1.0, // error sample rate tracesSampleRate: 0, // transaction sample rate }); }packages/utils/src/open-search/semantic-ingest.ts (1)
39-44
: Consider adding error handling for file operations.The file path setup code is well-structured, but there's no error handling for potential file system operation failures. Consider adding try/catch blocks around file operations to handle potential errors gracefully.
const outputFolderName = `semantic-ingest`; -initRunsFolder(outputFolderName); -const getFolderName = buildGetDirPathInside(outputFolderName); -const outputFilePrefix = getFolderName(`${cxId}_${patientId}`); -const outputFilePath = outputFilePrefix + ".ndjson"; +try { + initRunsFolder(outputFolderName); + const getFolderName = buildGetDirPathInside(outputFolderName); + const outputFilePrefix = getFolderName(`${cxId}_${patientId}`); + const outputFilePath = outputFilePrefix + ".ndjson"; +} catch (error) { + console.error(`Failed to initialize output folder: ${error}`); + process.exit(1); +}packages/core/src/external/opensearch/text-ingestor-direct.ts (5)
19-19
: Fix grammatical issue in commentThe unit should be plural since the value is 2.
-const DEFAULT_BULK_INGESTION_TIMEOUT = dayjs.duration(2, "minute").asMilliseconds(); +const DEFAULT_BULK_INGESTION_TIMEOUT = dayjs.duration(2, "minutes").asMilliseconds();
29-29
: Consider adding a configuration option for bulk chunk sizeThe bulk chunk size has been increased significantly from what appears to be a previous value to 10,000. While the comments justify this based on average resource size, it would be better to make this configurable to handle varying resource sizes across different environments.
Consider adding a configuration parameter for the bulk chunk size:
-const bulkChunkSize = 10_000; +const DEFAULT_BULK_CHUNK_SIZE = 10_000; export type OpenSearchTextIngestorDirectConfig = OpenSearchIngestorConfig & { endpoint: string; username: string; password: string; settings?: OpenSearchFileIngestorDirectSettings; + bulkChunkSize?: number; }; // Then in constructor: constructor(readonly config: OpenSearchTextIngestorDirectConfig) { // ... this.bulkChunkSize = config.bulkChunkSize ?? DEFAULT_BULK_CHUNK_SIZE; } // Use this.bulkChunkSize in the ingestBulk method
44-44
: Add JSDoc for theonItemError
parameter in IngestBulkRequestFor consistency with your other documentation, you should add a JSDoc comment for the
onItemError
parameter in theIngestBulkRequest
type.export type IngestBulkRequest = { cxId: string; patientId: string; resources: IngestRequestResource[]; + /** Function to call for each item error during bulk ingestion. See buildOnItemError() for the default implementation. */ onItemError?: OnBulkItemError | undefined; };
175-183
: Extract flatMap operation to a more explicit implementationThe flatMap operation can be hard to understand at a glance. Consider using a more explicit approach with map + flat for better readability, especially for developers who may not 8000 be familiar with flatMap.
- const bulkRequest = resources.flatMap(resource => - resourceToBulkRequest({ - cxId, - patientId, - resource, - operation, - }) - ); + const bulkRequest = resources + .map(resource => + resourceToBulkRequest({ + cxId, + patientId, + resource, + operation, + }) + ) + .flat();
244-245
: Fix typo in comment and clarify the assumptionThere's a typo in the comment and the assumption should be clarified to avoid potential misunderstandings.
-/** - * Builds a function that will add the error to the list if it doesn't already exist. - * - * - * NOTE: Assumes all requeests are for the same index and operation! - * +/** + * Builds a function that will add the error to the list if it doesn't already exist. + * + * + * NOTE: Assumes all requests being processed share the same index and operation type! + * This function only tracks error counts by type, not by index or operation. + *
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
packages/core/src/external/opensearch/bulk.ts
(1 hunks)packages/core/src/external/opensearch/semantic/ingest.ts
(3 hunks)packages/core/src/external/opensearch/text-ingestor-direct.ts
(6 hunks)packages/utils/src/open-search/semantic-ingest.ts
(2 hunks)packages/utils/src/open-search/semantic-text-from-consolidated.ts
(1 hunks)packages/utils/src/shared/sentry.ts
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.ts`: - Use the Onion Pattern to organize a package's code in layers - Try to use immutable code and avoid sharing state across different functions, objects, and systems - Try...
**/*.ts
: - Use the Onion Pattern to organize a package's code in layers
- Try to use immutable code and avoid sharing state across different functions, objects, and systems
- Try to build code that's idempotent whenever possible
- Prefer functional programming style functions: small, deterministic, 1 input, 1 output
- Minimize coupling / dependencies
- Avoid modifying objects received as parameter
- Only add comments to code to explain why something was done, not how it works
- Naming
- classes, enums:
PascalCase
- constants, variables, functions:
camelCase
- file names:
kebab-case
- table and column names:
snake_case
- Use meaningful names, so whoever is reading the code understands what it means
- Don’t use negative names, like
notEnabled
, preferisDisabled
- For numeric values, if the type doesn’t convey the unit, add the unit to the name
- Typescript
- Use types
- Prefer
const
instead oflet
- Avoid
any
and casting fromany
to other types- Type predicates: only applicable to narrow down the type, not to force a complete type conversion
- Prefer deconstructing parameters for functions instead of multiple parameters that might be of
the same type- Don’t use
null
inside the app, only on code interacting with external interfaces/services,
like DB and HTTP; convert toundefined
before sending inwards into the code- Use
async/await
instead of.then()
- Use the strict equality operator
===
, don’t use abstract equality operator==
- When calling a Promise-returning function asynchronously (i.e., not awaiting), use
.catch()
to
handle errors (seeprocessAsyncError
andemptyFunction
depending on the case)- Date and Time
- Always use
buildDayjs()
to createdayjs
instances- Prefer
dayjs.duration(...)
to create duration consts and keep them asduration
- Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
- Avoid creating arrow functions
- Use truthy syntax instead of
in
- i.e.,if (data.link)
notif ('link' in data)
- Error handling
- Pass the original error as the new one’s
cause
so the stack trace is persisted- Error messages should have a static message - add dynamic data to MetriportError's
additionalInfo
prop- Avoid sending multiple events to Sentry for a single error
- Global constants and variables
- Move literals to constants declared after imports when possible (avoid magic numbers)
- Avoid shared, global objects
- Avoid using
console.log
andconsole.error
in packages other than utils, infra and shared,
and try to useout().log
instead- Avoid multi-line logs
- don't send objects as a second parameter to
console.log()
orout().log()
- don't create multi-line strings when using
JSON.stringify()
- Use
eslint
to enforce code style- Use
prettier
to format code- max column length is 100 chars
- multi-line comments use
/** */
- scripts: top-level comments go after the import
packages/core/src/external/opensearch/semantic/ingest.ts
packages/utils/src/open-search/semantic-text-from-consolidated.ts
packages/utils/src/shared/sentry.ts
packages/utils/src/open-search/semantic-ingest.ts
packages/core/src/external/opensearch/bulk.ts
packages/core/src/external/opensearch/text-ingestor-direct.ts
🧬 Code Graph Analysis (4)
packages/core/src/external/opensearch/semantic/ingest.ts (1)
packages/core/src/external/opensearch/bulk.ts (1)
OnBulkItemError
(23-23)
packages/utils/src/shared/sentry.ts (1)
packages/api/src/shared/config.ts (1)
getEnvVar
(10-10)
packages/utils/src/open-search/semantic-ingest.ts (5)
packages/utils/src/shared/folder.ts (2)
initRunsFolder
(9-27)buildGetDirPathInside
(37-43)packages/utils/src/shared/sentry.ts (1)
initSentry
(15-23)packages/core/src/external/opensearch/semantic/ingest.ts (1)
ingestSemantic
(14-49)packages/utils/src/shared/duration.ts (1)
elapsedTimeAsStr
(8-13)packages/core/src/external/opensearch/bulk.ts (1)
BulkResponseErrorItem
(15-21)
packages/core/src/external/opensearch/text-ingestor-direct.ts (3)
packages/core/src/external/opensearch/file-ingestor.ts (1)
IngestRequest
(3-7)packages/core/src/external/opensearch/index.ts (1)
contentFieldName
(7-7)packages/core/src/external/opensearch/bulk.ts (4)
OnBulkItemError
(23-23)getCountErrorsFromBulkResponse
(33-55)BulkOperation
(3-3)BulkResponseErrorItem
(15-21)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: check-pr / lint-build-test
- GitHub Check: check-pr / lint-build-test
- GitHub Check: Analyze (javascript)
🔇 Additional comments (12)
packages/utils/src/shared/sentry.ts (2)
1-5
: Good import organization and constant declaration.The imports are well-organized with the external library separated from internal imports. The constant is declared at the module level which is appropriate.
6-14
: Comprehensive JSDoc documentation.The comment block clearly explains the purpose of the function and includes important warnings about not enabling Sentry in production or scripts that might generate excessive events. The references to existing implementations are also helpful.
packages/core/src/external/opensearch/semantic/ingest.ts (3)
5-5
: Clean import addition.The import of
OnBulkItemError
is appropriately placed with other imports from the same module.
14-20
: Well-structured parameter interface.The function signature update properly incorporates the optional error callback parameter using TypeScript's optional property syntax in the parameter object. This follows the project's coding guidelines of preferring destructured parameters.
44-44
: Proper propagation of error callback.The optional error callback is correctly forwarded to the
ingestBulk
method, allowing for granular error handling during semantic ingestion without modifying the core ingestion logic.packages/utils/src/open-search/semantic-ingest.ts (5)
6-6
: Appropriate type import.The import of
BulkResponseErrorItem
type from the bulk module aligns with the error handling improvements in this PR.
12-15
: Well-organized imports for file operations and Sentry.The imports are properly organized, adding necessary utilities for file operations and error reporting.
19-27
: Clear documentation update.The comment has been expanded to accurately describe the new error handling functionality, which helps future developers understand the purpose and usage of the script.
48-48
: Good integration of Sentry initialization.Adding Sentry initialization at the beginning of the main function ensures errors are properly captured during execution.
54-54
: Proper usage of the error callback.The
onItemError
callback is correctly passed to theingestSemantic
function, enabling granular error handling during semantic ingestion.packages/utils/src/open-search/semantic-text-from-consolidated.ts (1)
36-39
: Good reorganization of folder initialization.Moving the folder initialization and file path setup to the beginning of the main function ensures that these operations are completed before any processing begins. This matches the pattern used in the semantic-ingest.ts file and improves code consistency.
packages/core/src/external/opensearch/text-ingestor-direct.ts (1)
115-154
: Good implementation of error tracking with the map data structureThe refactoring to use a Map for error tracking is excellent. This approach effectively groups similar errors and provides counts, which is more useful than a flat list of all errors. The addition of the
buildOnItemError
helper function creates a clean default implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixnship, everything i left is basically a nit
7eca2da
to
6eef068
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
packages/core/src/external/opensearch/text-ingestor-direct.ts (1)
267-271
: Improve error capturing and context in SentryWhen sending errors to Sentry, it's better to provide more structured context rather than JSON stringifying the error map. Sentry can handle structured data better than stringified JSON.
function captureErrors({ cxId, patientId, errors, log, }: { cxId: string; patientId: string; errors: Map<string, number>; log: typeof console.log; }) { const errorMapToObj = Object.fromEntries(errors.entries()); log(`Errors: `, () => JSON.stringify(errorMapToObj)); capture.error("Errors ingesting resources into OpenSearch", { - extra: { cxId, patientId, countPerErrorType: JSON.stringify(errorMapToObj, null, 2) }, + extra: { + cxId, + patientId, + countPerErrorType: errorMapToObj, + totalErrorCount: Array.from(errors.values()).reduce((sum, count) => sum + count, 0), + }, }); }
🧹 Nitpick comments (2)
packages/core/src/external/opensearch/text-ingestor-direct.ts (2)
139-150
: Consider parallel processing for better performanceThe current implementation processes chunks sequentially using a for-of loop. While this makes error tracking straightforward, it might be slower than processing chunks in parallel.
You could maintain parallel processing while still tracking errors accurately:
- for (const resourceChunk of chunks) { - const errorCountChunk = await this.ingestBulkInternal({ - cxId, - patientId, - resources: resourceChunk, - indexName, - client, - onItemError: _onItemError, - }); - errorCount += errorCountChunk; - } + const errorCounts = await Promise.all( + chunks.map(resourceChunk => + this.ingestBulkInternal({ + cxId, + patientId, + resources: resourceChunk, + indexName, + client, + onItemError: _onItemError, + }) + ) + ); + errorCount = errorCounts.reduce((sum, count) => sum + count, 0);
245-245
: Fix typo in commentThere's a typo in the comment: "requeests" should be "requests".
- NOTE: Assumes all requeests are for the same index and operation! + NOTE: Assumes all requests are for the same index and operation!
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite
📒 Files selected for processing (6)
packages/core/src/external/opensearch/bulk.ts
(1 hunks)packages/core/src/external/opensearch/semantic/ingest.ts
(3 hunks)packages/core/src/external/opensearch/text-ingestor-direct.ts
(6 hunks)packages/utils/src/open-search/semantic-ingest.ts
(2 hunks)packages/utils/src/open-search/semantic-text-from-consolidated.ts
(1 hunks)packages/utils/src/shared/sentry.ts
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- packages/utils/src/shared/sentry.ts
🚧 Files skipped from review as they are similar to previous changes (4)
- packages/utils/src/open-search/semantic-ingest.ts
- packages/core/src/external/opensearch/semantic/ingest.ts
- packages/utils/src/open-search/semantic-text-from-consolidated.ts
- packages/core/src/external/opensearch/bulk.ts
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.ts`: - Use the Onion Pattern to organize a package's code in layers - Try to use immutable code and avoid sharing state across different functions, objects, and systems - Try...
**/*.ts
: - Use the Onion Pattern to organize a package's code in layers
- Try to use immutable code and avoid sharing state across different functions, objects, and systems
- Try to build code that's idempotent whenever possible
- Prefer functional programming style functions: small, deterministic, 1 input, 1 output
- Minimize coupling / dependencies
- Avoid modifying objects received as parameter
- Only add comments to code to explain why something was done, not how it works
- Naming
- classes, enums:
PascalCase
- constants, variables, functions:
camelCase
- file names:
kebab-case
- table and column names:
snake_case
- Use meaningful names, so whoever is reading the code understands what it means
- Don’t use negative names, like
notEnabled
, preferisDisabled
- For numeric values, if the type doesn’t convey the unit, add the unit to the name
- Typescript
- Use types
- Prefer
const
instead oflet
- Avoid
any
and casting fromany
to other types- Type predicates: only applicable to n 8000 arrow down the type, not to force a complete type conversion
- Prefer deconstructing parameters for functions instead of multiple parameters that might be of
the same type- Don’t use
null
inside the app, only on code interacting with external interfaces/services,
like DB and HTTP; convert toundefined
before sending inwards into the code- Use
async/await
instead of.then()
- Use the strict equality operator
===
, don’t use abstract equality operator==
- When calling a Promise-returning function asynchronously (i.e., not awaiting), use
.catch()
to
handle errors (seeprocessAsyncError
andemptyFunction
depending on the case)- Date and Time
- Always use
buildDayjs()
to createdayjs
instances- Prefer
dayjs.duration(...)
to create duration consts and keep them asduration
- Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
- Avoid creating arrow functions
- Use truthy syntax instead of
in
- i.e.,if (data.link)
notif ('link' in data)
- Error handling
- Pass the original error as the new one’s
cause
so the stack trace is persisted- Error messages should have a static message - add dynamic data to MetriportError's
additionalInfo
prop- Avoid sending multiple events to Sentry for a single error
- Global constants and variables
- Move literals to constants declared after imports when possible (avoid magic numbers)
- Avoid shared, global objects
- Avoid using
console.log
andconsole.error
in packages other than utils, infra and shared,
and try to useout().log
instead- Avoid multi-line logs
- don't send objects as a second parameter to
console.log()
orout().log()
- don't create multi-line strings when using
JSON.stringify()
- Use
eslint
to enforce code style- Use
prettier
to format code- max column length is 100 chars
- multi-line comments use
/** */
- scripts: top-level comments go after the import
packages/core/src/external/opensearch/text-ingestor-direct.ts
🔇 Additional comments (7)
packages/core/src/external/opensearch/text-ingestor-direct.ts (7)
19-20
: 👍 Good improvement to timeout and retry settingsIncreasing the bulk ingestion timeout from 1 to 2 minutes and adding a maximum retry count of 3 for bulk requests improves reliability for larger batch operations.
29-29
: 👍 Appropriate bulk chunk size increaseIncreasing the bulk chunk size from 200 to 10,000 aligns with OpenSearch/Elasticsearch recommendations for bulk operations (5-15MB). With resources typically around 250 bytes, this is approximately 2.5MB per chunk, which is reasonable.
31-45
: Good refactoring of request typesThe separation of
IngestRequestResource
fromIngestRequest
improves modularity, and adding the optionalonItemError
callback supports the new error handling workflow.
135-138
: 👍 Effective error tracking implementationThe implementation of error tracking via a Map with error types as keys and counts as values is an excellent approach to group similar errors, which aligns perfectly with the PR's objective.
175-182
: 👍 Good extraction of resource to bulk request logicExtracting the bulk request building logic to a separate function improves readability and maintainability.
191-194
: 👍 Clean error handling with helper functionUsing the
getCountErrorsFromBulkResponse
helper function with the error callback provides a cleaner way to handle errors from bulk responses.
249-254
: 👍 Effective error grouping implementationThe
buildOnItemError
function efficiently groups errors by type and counts occurrences, fulfilling the PR objective of grouping similar errors with their counts.
e6b29ee
to
358f4a6
Compare
358f4a6
to
6bef7a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
packages/core/src/external/opensearch/text-ingestor-direct.ts (1)
256-272
: Improve error capturing and context in Sentry.When sending errors to Sentry, it's better to provide more structured context rather than JSON stringifying the error map. Sentry can handle structured data better than stringified JSON.
function captureErrors({ cxId, patientId, errors, log, }: { cxId: string; patientId: string; errors: Map<string, number>; log: typeof console.log; }) { const errorMapToObj = Object.fromEntries(errors.entries()); log(`Errors: `, () => JSON.stringify(errorMapToObj)); capture.error("Errors ingesting resources into OpenSearch", { - extra: { cxId, patientId, countPerErrorType: JSON.stringify(errorMapToObj, null, 2) }, + extra: { + cxId, + patientId, + countPerErrorType: errorMapToObj, + totalErrorCount: Array.from(errors.values()).reduce((sum, count) => sum + count, 0), + }, }); }
🧹 Nitpick comments (3)
packages/core/src/external/opensearch/text-ingestor-direct.ts (3)
19-20
: Consider adding a comment explaining the timeout increase.The bulk ingestion timeout has been doubled from 1 to 2 minutes, but there's no explanation for this change. Adding a comment would help future developers understand the reasoning behind this value.
-const DEFAULT_BULK_INGESTION_TIMEOUT = dayjs.duration(2, "minute").asMilliseconds(); +// Increased from 1 to 2 minutes to accommodate larger bulk operations +const DEFAULT_BULK_INGESTION_TIMEOUT = dayjs.duration(2, "minute").asMilliseconds();
241-254
: Fix typo in comment.There's a typo in the comment for the
buildOnItemError
function.-/** - * Builds a function that will add the error to the list if it doesn't already exist. - * - * - * NOTE: Assumes all requeests are for the same index and operation! - * +/** + * Builds a function that will add the error to the list if it doesn't already exist. + * + * + * NOTE: Assumes all requests are for the same index and operation! + *
249-254
: Consider optimizing Map operation.The current implementation reads and writes to the Map in separate operations. This can be optimized using a single operation.
function buildOnItemError(errors: Map<string, number>): OnBulkItemError { return (error: BulkResponseErrorItem) => { - const count = errors.get(error.type) ?? 0; - errors.set(error.type, count + 1); + errors.set(error.type, (errors.get(error.type) ?? 0) + 1); }; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite
📒 Files selected for processing (6)
packages/core/src/external/opensearch/bulk.ts
(1 hunks)packages/core/src/external/opensearch/semantic/ingest.ts
(3 hunks)packages/core/src/external/opensearch/text-ingestor-direct.ts
(6 hunks)packages/utils/src/open-search/semantic-ingest.ts
(2 hunks)packages/utils/src/open-search/semantic-text-from-consolidated.ts
(1 hunks)packages/utils/src/shared/sentry.ts
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- packages/utils/src/open-search/semantic-ingest.ts
- packages/utils/src/shared/sentry.ts
- packages/core/src/external/opensearch/semantic/ingest.ts
- packages/utils/src/open-search/semantic-text-from-consolidated.ts
- packages/core/src/external/opensearch/bulk.ts
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.ts`: - Use the Onion Pattern to organize a package's code in layers - Try to use immutable code and avoid sharing state across different functions, objects, and systems - Try...
**/*.ts
: - Use the Onion Pattern to organize a package's code in layers
- Try to use immutable code and avoid sharing state across different functions, objects, and systems
- Try to build code that's idempotent whenever possible
- Prefer functional programming style functions: small, deterministic, 1 input, 1 output
- Minimize coupling / dependencies
- Avoid modifying objects received as parameter
- Only add comments to code to explain why something was done, not how it works
- Naming
- classes, enums:
PascalCase
- constants, variables, functions:
camelCase
- file names:
kebab-case
- table and column names:
snake_case
- Use meaningful names, so whoever is reading the code understands what it means
- Don’t use negative names, like
notEnabled
, preferisDisabled
- For numeric values, if the type doesn’t convey the unit, add the unit to the name
- Typescript
- Use types
- Prefer
const
instead oflet
- Avoid
any
and casting fromany
to other types- Type predicates: only applicable to narrow down the type, not to force a complete type conversion
- Prefer deconstructing parameters for functions instead of multiple parameters that might be of
the same type- Don’t use
null
inside the app, only on code interacting with external interfaces/services,
like DB and HTTP; convert toundefined
before sending inwards into the code- Use
async/await
instead of.then()
- Use the strict equality operator
===
, don’t use abstract equality operator==
- When calling a Promise-returning function asynchronously (i.e., not awaiting), use
.catch()
to
handle errors (seeprocessAsyncError
andemptyFunction
depending on the case)- Date and Time
- Always use
buildDayjs()
to createdayjs
instances- Prefer
dayjs.duration(...)
to create duration consts and keep them asduration
- Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
- Avoid creating arrow functions
- Use truthy syntax instead of
in
- i.e.,if (data.link)
notif ('link' in data)
- Error handling
- Pass the original error as the new one’s
cause
so the stack trace is persisted- Error messages should have a static message - add dynamic data to MetriportError's
additionalInfo
prop- Avoid sending multiple events to Sentry for a single error
- Global constants and variables
- Move literals to constants declared after imports when possible (avoid magic numbers)
- Avoid shared, global objects
- Avoid using
console.log
andconsole.error
in packages other than utils, infra and shared,
and try to useout().log
instead- Avoid multi-line logs
- don't send objects as a second parameter to
console.log()
orout().log()
- don't create multi-line strings when using
JSON.stringify()
- Use
eslint
to enforce code style- Use
prettier
to format code- max column length is 100 chars
- multi-line comments use
/** */
- scripts: top-level comments go after the import
packages/core/src/external/opensearch/text-ingestor-direct.ts
🧬 Code Graph Analysis (1)
packages/core/src/external/opensearch/text-ingestor-direct.ts (3)
packages/core/src/external/opensearch/index.ts (1)
contentFieldName
(7-7)packages/core/src/external/opensearch/bulk.ts (4)
OnBulkItemError
(23-23)processErrorsFromBulkResponse
(33-55)BulkOperation
(3-3)BulkResponseErrorItem
(15-21)packages/lambdas/src/shared/capture.ts (1)
capture
(18-103)
🔇 Additional comments (7)
packages/core/src/external/opensearch/text-ingestor-direct.ts (7)
8-13
: Well-structured imports for bulk operation functionality.The imported types and functions from the bulk module provide a good separation of concerns and improve code organization.
29-29
: Approve the significantly increased bulk chunk size.The chunk size increase (from 200 to 10,000) is well-justified in the comment above. The calculation based on average resource size and Elasticsearch's recommendation supports this change.
34-39
: Good type refactoring for better code organization.Extracting the common resource fields into a separate
IngestRequestResource
type improves code organization and reusability.
43-44
: Good addition of error handling capability.Adding the optional
onItemError
callback to theIngestBulkRequest
type enables granular error handling during bulk operations, which aligns with the PR objectives.
115-155
: Improved bulk ingestion with better error handling.The rewritten
ingestBulk
method now properly processes resources in chunks, tracks errors by type, and provides better error reporting. This implementation aligns well with the PR objectives of improving error handling during bulk operations.
157-195
: Refactored ingestBulkInternal with improved error handling.The method now uses the
processErrorsFromBulkResponse
function from the bulk module and returns the error count, which is a cleaner approach than returning raw errors.
206-227
: Well-structured helper function for building bulk requests.The
resourceToBulkRequest
function encapsulates the logic for building bulk request bodies, improving code readability and maintainability.
6bef7a4
to
b39caab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
packages/core/src/external/opensearch/text-ingestor-direct.ts (1)
157-194
: 🛠️ Refactor suggestionConsider adding proper error handling for network or client failures.
While the method handles OpenSearch response errors via
processErrorsFromBulkResponse
, there's no try-catch block to handle connection issues or other exceptions that might occur during the client.bulk() call.const startedAt = Date.now(); - const response = await client.bulk( - { index: indexName, body: bulkRequest }, - { requestTimeout: DEFAULT_BULK_INGESTION_TIMEOUT, maxRetries: MAX_BULK_RETRIES } - ); - const time = Date.now() - startedAt; - - const errorCount = processErrorsFromBulkResponse(response, operation, onItemError); + try { + const response = await client.bulk( + { index: indexName, body: bulkRequest }, + { requestTimeout: DEFAULT_BULK_INGESTION_TIMEOUT, maxRetries: MAX_BULK_RETRIES } + ); + const time = Date.now() - startedAt; + + const errorCount = processErrorsFromBulkResponse(response, operation, onItemError); + debug(`${operation}ed ${resources.length} resources in ${time} ms, ${errorCount} errors`); + return errorCount; + } catch (error) { + debug(`Failed to bulk ${operation} ${resources.length} resources: ${error}`); + if (onItemError) { + // Handle all resources as failed with the same error + resources.forEach(() => + onItemError({ + status: 500, + type: error instanceof Error ? error.name : 'UnknownError', + reason: error instanceof Error ? error.message : String(error), + }) + ); + } + return resources.length; // All resources failed + } - debug(`${operation}ed ${resources.length} resources in ${time} ms, ${errorCount} errors`); - - return errorCount;
♻️ Duplicate comments (1)
packages/core/src/external/opensearch/text-ingestor-direct.ts (1)
267-271
: Improve error capturing and context in Sentry.When sending errors to Sentry, it's better to provide more structured context rather than JSON stringifying the error map. Sentry can handle structured data better than stringified JSON.
const errorMapToObj = Object.fromEntries(errors.entries()); log(`Errors: `, () => JSON.stringify(errorMapToObj)); capture.error("Errors ingesting resources into OpenSearch", { - extra: { cxId, patientId, countPerErrorType: JSON.stringify(errorMapToObj, null, 2) }, + extra: { + cxId, + patientId, + countPerErrorType: errorMapToObj, + totalErrorCount: Array.from(errors.values()).reduce((sum, count) => sum + count, 0), + }, });
🧹 Nitpick comments (2)
packages/core/src/external/opensearch/text-ingestor-direct.ts (2)
19-19
: Fix grammatical error in constant definition.The time unit "minute" should be plural when specifying multiple time units.
-const DEFAULT_BULK_INGESTION_TIMEOUT = dayjs.duration(2, "minute").asMilliseconds(); +const DEFAULT_BULK_INGESTION_TIMEOUT = dayjs.duration(2, "minutes").asMilliseconds();
206-227
: Add return type to resourceToBulkRequest function.The function is missing a return type definition, which would improve type safety and documentation.
function resourceToBulkRequest({ cxId, patientId, resource, operation, }: { cxId: string; patientId: string; resource: IngestRequestResource; operation: BulkOperation; -}) { +}): [Record<BulkOperation, { _id: string }>, IngestRequest] { const entryId = getEntryId(cxId, patientId, resource.resourceId); const document: IngestRequest = { cxId, patientId, resourceType: resource.resourceType, resourceId: resource.resourceId, content: resource.content, }; const cmd = { [operation]: { _id: entryId } }; return [cmd, document]; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite
📒 Files selected for processing (6)
packages/core/src/external/opensearch/bulk.ts
(1 hunks)packages/core/src/external/opensearch/semantic/ingest.ts
(3 hunks)packages/core/src/external/opensearch/text-ingestor-direct.ts
(6 hunks)packages/utils/src/open-search/semantic-ingest.ts
(2 hunks)packages/utils/src/open-search/semantic-text-from-consolidated.ts
(1 hunks)packages/utils/src/shared/sentry.ts
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- packages/utils/src/open-search/semantic-ingest.ts
- packages/utils/src/open-search/semantic-text-from-consolidated.ts
- packages/utils/src/shared/sentry.ts
- packages/core/src/external/opensearch/semantic/ingest.ts
- packages/core/src/external/opensearch/bulk.ts
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.ts`: - Use the Onion Pattern to organize a package's code in layers - Try to use immutable code and avoid sharing state across different functions, objects, and systems - Try...
**/*.ts
: - Use the Onion Pattern to organize a package's code in layers
- Try to use immutable code and avoid sharing state across different functions, objects, and systems
- Try to build code that's idempotent whenever possible
- Prefer functional programming style functions: small, deterministic, 1 input, 1 output
- Minimize coupling / dependencies
- Avoid modifying objects received as parameter
- Only add comments to code to explain why something was done, not how it works
- Naming
- classes, enums:
PascalCase
- constants, variables, functions:
camelCase
- file names:
kebab-case
- table and column names:
snake_case
- Use meaningful names, so whoever is reading the code understands what it means
- Don’t use negative names, like
notEnabled
, preferisDisabled
- For numeric values, if the type doesn’t convey the unit, add the unit to the name
- Typescript
- Use types
- Prefer
const
instead oflet
- Avoid
any
and casting fromany
to other types- Type predicates: only applicable to narrow down the type, not to force a complete type conversion
- Prefer deconstructing parameters for functions instead of multiple parameters that might be of
the same type- Don’t use
null
inside the app, only on code interacting with external interfaces/services,
like DB and HTTP; convert toundefined
before sending inwards into the code- Use
async/await
instead of.then()
- Use the strict equality operator
===
, don’t use abstract equality operator==
- When calling a Promise-returning function asynchronously (i.e., not awaiting), use
.catch()
to
handle errors (seeprocessAsyncError
andemptyFunction
depending on the case)- Date and Time
- Always use
buildDayjs()
to createdayjs
instances- Prefer
dayjs.duration(...)
to create duration consts and keep them asduration
- Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
- Avoid creating arrow functions
- Use truthy syntax instead of
in
- i.e.,if (data.link)
notif ('link' in data)
- Error handling
- Pass the original error as the new one’s
cause
so the stack trace is persisted- Error messages should have a static message - add dynamic data to MetriportError's
additionalInfo
prop- Avoid sending multiple events to Sentry for a single error
- Global constants and variables
- Move literals to constants declared after imports when possible (avoid magic numbers)
- Avoid shared, global objects
- Avoid using
console.log
andconsole.error
in packages other than utils, infra and shared,
and try to useout().log
instead- Avoid multi-line logs
- don't send objects as a second parameter to
console.log()
orout().log()
- don't create multi-line strings when using
JSON.stringify()
- Use
eslint
to enforce code style- Use
prettier
to format code- max column length is 100 chars
- multi-line comments use
/** */
- scripts: top-level comments go after the import
packages/core/src/external/opensearch/text-ingestor-direct.ts
🔇 Additional comments (1)
packages/core/src/external/opensearch/text-ingestor-direct.ts (1)
124-154
: LGTM! Error handling implementation looks solid.The bulk ingestion implementation properly tracks errors and aggregates them by type, which is aligned with the PR objective to group similar errors. The fallback error handler is a nice touch that ensures errors are always captured even when no custom handler is provided.
b39caab
to
f3c604b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
packages/core/src/external/opensearch/text-ingestor-direct.ts (1)
256-272
: Consider structured error data in Sentry.While the implementation works, consider the previous review suggestion to provide structured data to Sentry rather than stringified JSON for better error exploration.
function captureErrors({ cxId, patientId, errors, log, }: { cxId: string; patientId: string; errors: Map<string, number>; log: typeof console.log; }) { const errorMapToObj = Object.fromEntries(errors.entries()); log(`Errors: `, () => JSON.stringify(errorMapToObj)); capture.error("Errors ingesting resources into OpenSearch", { - extra: { cxId, patientId, countPerErrorType: JSON.stringify(errorMapToObj, null, 2) }, + extra: { + cxId, + patientId, + countPerErrorType: errorMapToObj, + totalErrorCount: Array.from(errors.values()).reduce((sum, count) => sum + count, 0), + }, }); }
🧹 Nitpick comments (2)
packages/core/src/external/opensearch/text-ingestor-direct.ts (2)
135-138
: Consider clearer naming for the error handler variable.The variable name
_onItemError
is very similar to the parameter nameonItemError
which could be confusing for future developers.-const _onItemError = onItemError ?? buildOnItemError(errors); +const effectiveOnItemError = onItemError ?? buildOnItemError(errors);
241-254
: Error aggregation helper with typo in comment.The
buildOnItemError
function effectively aggregates errors by type, but there's a typo in the comment.-* NOTE: Assumes all requeests are for the same index and operation! +* NOTE: Assumes all requests are for the same index and operation!
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite
📒 Files selected for processing (6)
packages/core/src/external/opensearch/bulk.ts
(1 hunks)packages/core/src/external/opensearch/semantic/ingest.ts
(3 hunks)packages/core/src/external/opensearch/text-ingestor-direct.ts
(6 hunks)packages/utils/src/open-search/semantic-ingest.ts
(2 hunks)packages/utils/src/open-search/semantic-text-from-consolidated.ts
(1 hunks)packages/utils/src/shared/sentry.ts
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- packages/utils/src/shared/sentry.ts
- packages/core/src/external/opensearch/semantic/ingest.ts
- packages/utils/src/open-search/semantic-text-from-consolidated.ts
- packages/utils/src/open-search/semantic-ingest.ts
- packages/core/src/external/opensearch/bulk.ts
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.ts`: - Use the Onion Pattern to organize a package's code in layers - Try to use immutable code and avoid sharing state across different functions, objects, and systems - Try...
**/*.ts
: - Use the Onion Pattern to organize a package's code in layers
- Try to use immutable code and avoid sharing state across different functions, objects, and systems
- Try to build code that's idempotent whenever possible
- Prefer functional programming style functions: small, deterministic, 1 input, 1 output
- Minimize coupling / dependencies
- Avoid modifying objects received as parameter
- Only add comments to code to explain why something was done, not how it works
- Naming
- classes, enums:
PascalCase
- constants, variables, functions:
camelCase
- file names:
kebab-case
- table and column names:
snake_case
- Use meaningful names, so whoever is reading the code understands what it means
- Don’t use negative names, like
notEnabled
, preferisDisabled
- For numeric values, if the type doesn’t convey the unit, add the unit to the name
- Typescript
- Use types
- Prefer
const
instead oflet
- Avoid
any
and casting fromany
to other types- Type predicates: only applicable to narrow down the type, not to force a complete type conversion
- Prefer deconstructing parameters for functions instead of multiple parameters that might be of
the same type- Don’t use
null
inside the app, only on code interacting with external interfaces/services,
like DB and HTTP; convert toundefined
before sending inwards into the code- Use
async/await
instead of.then()
- Use the strict equality operator
===
, don’t use abstract equality operator==
- When calling a Promise-returning function asynchronously (i.e., not awaiting), use
.catch()
to
handle errors (seeprocessAsyncError
andemptyFunction
depending on the case)- Date and Time
- Always use
buildDayjs()
to createdayjs
instances- Prefer
dayjs.duration(...)
to create duration consts and keep them asduration
- Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
- Avoid creating arrow functions
- Use truthy syntax instead of
in
- i.e.,if (data.link)
notif ('link' in data)
- Error handling
- Pass the original error as the new one’s
cause
so the stack trace is persisted- Error messages should have a static message - add dynamic data to MetriportError's
additionalInfo
prop- Avoid sending multiple events to Sentry for a single error
- Global constants and variables
- Move literals to constants declared after imports when possible (avoid magic numbers)
- Avoid shared, global objects
- Avoid using
console.log
andconsole.error
in packages other than utils, infra and shared,
and try to useout().log
instead- Avoid multi-line logs
- don't send objects as a second parameter to
console.log()
orout().log()
- don't create multi-line strings when using
JSON.stringify()
- Use
eslint
to enforce code style- Use
prettier
to format code- max column length is 100 chars
- multi-line comments use
/** */
- scripts: top-level comments go after the import
packages/core/src/external/opensearch/text-ingestor-direct.ts
🧬 Code Graph Analysis (1)
packages/core/src/external/opensearch/text-ingestor-direct.ts (3)
packages/core/src/external/opensearch/index.ts (1)
contentFieldName
(7-7)packages/core/src/external/opensearch/bulk.ts (4)
OnBulkItemError
(23-23)processErrorsFromBulkResponse
(33-55)BulkOperation
(3-3)BulkResponseErrorItem
(15-21)packages/lambdas/src/shared/capture.ts (1)
capture
(18-103)
🔇 Additional comments (10)
packages/core/src/external/opensearch/text-ingestor-direct.ts (10)
8-13
: LGTM: New imports added for bulk operations.These imports from the new
bulk.ts
module properly support the enhanced error handling capabilities being implemented.
34-39
: Good type refactoring for improved code structure.Extracting
IngestRequestResource
as a separate type and adding the optionalonItemError
callback parameter improve the interface design and support the new error handling functionality.Also applies to: 43-44
115-124
: Well-documented function with clear parameters.The JSDoc comments for
ingestBulk
clearly explain the purpose and parameters of the function, which is helpful for maintainability.
140-150
: LGTM: Properly handles chunks and tracks error count.The refactored loop correctly processes chunks of resources and accumulates the error count.
153-154
: Good error reporting with proper log context.The log includes the total resources processed, time taken, and error count, followed by error capture when errors exist.
175-182
: Improved bulk request creation with dedicated helper function.Extracting the bulk request creation logic to the
resourceToBulkRequest
helper function improves readability and maintainability.
186-187
: Enhanced bulk operation configuration.Using the increased timeout and setting a maximum retry count improves the robustness of bulk operations.
191-194
: Simplified error handling with bulk response processor.Using the new
processErrorsFromBulkResponse
utility function frombulk.ts
makes the error handling code more maintainable and consistent.
206-227
: Well-structured helper function for bulk request formatting.The
resourceToBulkRequest
function properly formats resources for OpenSearch bulk operations, making the main code cleaner.
19-20
:❓ Verification inconclusive
Configuration adjustments for improved performance.
The changes to these constants look good:
- Increasing timeout from 1 to 2 minutes allows more time for larger bulk operations
- Adding MAX_BULK_RETRIES ensures failed operations are retried
- Increasing bulkChunkSize from 200 to 10,000 aligns with the documentation comment about optimal bulk sizes
Consider verifying that these increased values don't cause memory pressure issues in your environment:
Also applies to: 29-29
🏁 Script executed:
#!/bin/bash # Check if there are any timeout errors in the logs related to OpenSearch bulk operations rg -i "timeout|opensearch.*error|econnreset" --glob "*.log"Length of output: 210
Approve code changes – updated timeout and retry settings
The adjustments look solid:
- Timeout increased from 1 to 2 minutes for larger bulk operations
- Added
MAX_BULK_RETRIES
to transparently retry failures- Bulk chunk size raised to 10,000 per the optimal-size comment
I ran
rg -i 'timeout|opensearch.*error|econnreset' --glob '*.log'
, but no log files were present in the repo. Please verify in your deployment environment that these increased settings don’t introduce memory pressure or timeout issues during OpenSearch bulk ingestion.Relevant location:
- packages/core/src/external/opensearch/text-ingestor-direct.ts (lines 19–20, 29)
f3c604b
to
9bbda99
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
packages/core/src/external/opensearch/text-ingestor-direct.ts (1)
256-272
: Improve error capturing in Sentry.When sending errors to Sentry, it would be better to provide structured data rather than stringifying the error map. This was previously suggested in a past review comment.
function captureErrors({ cxId, patientId, errors, log, }: { cxId: string; patientId: string; errors: Map<string, number>; log: typeof console.log; }) { const errorMapToObj = Object.fromEntries(errors.entries()); log(`Errors: `, () => JSON.stringify(errorMapToObj)); capture.error("Errors ingesting resources into OpenSearch", { - extra: { cxId, patientId, countPerErrorType: JSON.stringify(errorMapToObj, null, 2) }, + extra: { + cxId, + patientId, + countPerErrorType: errorMapToObj, + totalErrorCount: Array.from(errors.values()).reduce((sum, count) => sum + count, 0), + }, }); }
🧹 Nitpick comments (2)
packages/core/src/external/opensearch/text-ingestor-direct.ts (2)
19-20
: Consider using plural form for time unit.The singular form "minute" in
dayjs.duration(2, "minute")
should be "minutes" for consistency with dayjs API conventions.-const DEFAULT_BULK_INGESTION_TIMEOUT = dayjs.duration(2, "minute").asMilliseconds(); +const DEFAULT_BULK_INGESTION_TIMEOUT = dayjs.duration(2, "minutes").asMilliseconds();
241-254
: Fix typo in comment.There's a typo in the function comment.
-* NOTE: Assumes all requeests are for the same index and operation! +* NOTE: Assumes all requests are for the same index and operation!
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite
📒 Files selected for processing (6)
packages/core/src/external/opensearch/bulk.ts
(1 hunks)packages/core/src/external/opensearch/semantic/ingest.ts
(3 hunks)packages/core/src/external/opensearch/text-ingestor-direct.ts
(6 hunks)packages/utils/src/open-search/semantic-ingest.ts
(2 hunks)packages/utils/src/open-search/semantic-text-from-consolidated.ts
(1 hunks)packages/utils/src/shared/sentry.ts
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- packages/utils/src/open-search/semantic-ingest.ts
- packages/core/src/external/opensearch/semantic/ingest.ts
- packages/utils/src/shared/sentry.ts
- packages/core/src/external/opensearch/bulk.ts
- packages/utils/src/open-search/semantic-text-from-consolidated.ts
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.ts`: - Use the Onion Pattern to organize a package's code in layers - Try to use immutable code and avoid sharing state across different functions, objects, and systems - Try...
**/*.ts
: - Use the Onion Pattern to organize a package's code in layers
- Try to use immutable code and avoid sharing state across different functions, objects, and systems
- Try to build code that's idempotent whenever possible
- Prefer functional programming style functions: small, deterministic, 1 input, 1 output
- Minimize coupling / dependencies
- Avoid modifying objects received as parameter
- Only add comments to code to explain why something was done, not how it works
- Naming
- classes, enums:
PascalCase
- constants, variables, functions:
camelCase
- file names:
kebab-case
- table and column names:
snake_case
- Use meaningful names, so whoever is reading the code understands what it means
- Don’t use negative names, like
notEnabled
, preferisDisabled
- For numeric values, if the type doesn’t convey the unit, add the unit to the name
- Typescript
- Use types
- Prefer
const
instead oflet
- Avoid
any
and casting fromany
to other types- Type predicates: only applicable to narrow down the type, not to force a complete type conversion
- Prefer deconstructing parameters for functions instead of multiple parameters that might be of
the same type- Don’t use
null
inside the app, only on code interacting with external interfaces/services,
like DB and HTTP; convert toundefined
before sending inwards into the code- Use
async/await
instead of.then()
- Use the strict equality operator
===
, don’t use abstract equality operator==
- When calling a Promise-returning function asynchronously (i.e., not awaiting), use
.catch()
to
handle errors (seeprocessAsyncError
andemptyFunction
depending on the case)- Date and Time
- Always use
buildDayjs()
to createdayjs
instances- Prefer
dayjs.duration(...)
to create duration consts and keep them asduration
- Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
- Avoid creating arrow functions
- Use truthy syntax instead of
in
- i.e.,if (data.link)
notif ('link' in data)
- Error handling
- Pass the original error as the new one’s
cause
so the stack trace is persisted- Error messages should have a static message - add dynamic data to MetriportError's
additionalInfo
prop- Avoid sending multiple events to Sentry for a single error
- Global constants and variables
- Move literals to constants declared after imports when possible (avoid magic numbers)
- Avoid shared, global objects
- Avoid using
console.log
andconsole.error
in packages other than utils, infra and shared,
and try to useout().log
instead- Avoid multi-line logs
- don't send objects as a second parameter to
console.log()
orout().log()
- don't create multi-line strings when using
JSON.stringify()
- Use
eslint
to enforce code style- Use
prettier
to format code- max column length is 100 chars
- multi-line comments use
/** */
- scripts: top-level comments go after the import
packages/core/src/external/opensearch/text-ingestor-direct.ts
🧬 Code Graph Analysis (1)
packages/core/src/external/opensearch/text-ingestor-direct.ts (4)
packages/core/src/external/opensearch/file-ingestor.ts (1)
IngestRequest
(3-7)packages/core/src/external/opensearch/index.ts (1)
contentFieldName
(7-7)packages/core/src/external/opensearch/bulk.ts (4)
OnBulkItemError
(23-23)processErrorsFromBulkResponse
(33-55)BulkOperation
(3-3)BulkResponseErrorItem
(15-21)packages/lambdas/src/shared/capture.ts (1)
capture
(18-103)
🔇 Additional comments (6)
packages/core/src/external/opensearch/text-ingestor-direct.ts (6)
8-13
: Well-organized imports for new bulk error handling capabilities.The new imports from the "./bulk" module bring in the necessary types and functions for enhanced error handling in bulk operations, which aligns well with the PR objective of improving OpenSearch bulk import error handling.
29-29
: Significant increase in bulk chunk size is well-documented.The increase from 200 to 10,000 is substantial but justified by the detailed comment above explaining the rationale based on Elasticsearch's guidelines and observed resource sizes.
34-45
: Good refactoring of types to support error handling.The extraction of
IngestRequestResource
fromIngestRequest
and the addition of the optionalonItemError
callback inIngestBulkRequest
improve type safety and enable granular error handling, which is essential for the PR's goals.
115-155
: Improved bulk ingestion with robust error tracking.The refactored
ingestBulk
method now properly tracks errors by type, provides clear documentation, and includes error reporting through Sentry. This implementation effectively addresses the PR objective of grouping similar errors with their counts.
157-195
: Enhanced internal bulk ingestion with error handling.The
ingestBulkInternal
method now uses theprocessErrorsFromBulkResponse
function for better error extraction and propagates errors through the callback mechanism. The code is more modular and easier to maintain.
206-227
: Well-structured helper function for building bulk requests.The
resourceToBulkRequest
function cleanly separates the logic for creating bulk request entries, making the code more maintainable and easier to test.
9bbda99
to
7ad3fc7
Compare
Ref eng-41 Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
- group OS bulk errors to Sentry - store individual errors in NDJON when run from utils Ref eng-41 Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
Ref eng-268 Signed-off-by: Rafael Leite <2132564+leite08@users.noreply.github.com>
7ad3fc7
to
23c850f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
packages/core/src/external/opensearch/text-ingestor-direct.ts (1)
256-272
: Consider structured data in SentryWhile the current implementation works, consider providing structured data to Sentry instead of JSON stringifying the error map. This would make errors more searchable and filterable in Sentry.
const errorMapToObj = Object.fromEntries(errors.entries()); log(`Errors: `, () => JSON.stringify(errorMapToObj)); capture.error("Errors ingesting resources into OpenSearch", { - extra: { cxId, patientId, countPerErrorType: JSON.stringify(errorMapToObj, null, 2) }, + extra: { + cxId, + patientId, + countPerErrorType: errorMapToObj, + totalErrorCount: Array.from(errors.values()).reduce((sum, count) => sum + count, 0), + }, });
🧹 Nitpick comments (1)
packages/core/src/external/opensearch/text-ingestor-direct.ts (1)
140-150
: Consider adding error handling for the chunk processing loopWhile the error counting mechanism works well, the code could benefit from try/catch handling around the chunk processing loop to ensure that errors in one chunk don't prevent processing of subsequent chunks.
const chunks = chunk(resources, bulkChunkSize); + let errorsInChunks = false; for (const resourceChunk of chunks) { + try { const errorCountChunk = await this.ingestBulkInternal({ cxId, patientId, resources: resourceChunk, indexName, client, onItemError: _onItemError, }); errorCount += errorCountChunk; + } catch (error) { + errorsInChunks = true; + log(`Error processing chunk: ${error.message}`); + capture.error("Error processing chunk in OpenSearch bulk import", { + extra: { cxId, patientId, error }, + cause: error, + }); + } } + + if (errorsInChunks) { + 1036C log("Some chunks failed to process, see errors above"); + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
packages/core/src/external/opensearch/bulk.ts
(1 hunks)packages/core/src/external/opensearch/semantic/ingest.ts
(3 hunks)packages/core/src/external/opensearch/text-ingestor-direct.ts
(6 hunks)packages/utils/src/open-search/semantic-ingest.ts
(2 hunks)packages/utils/src/open-search/semantic-text-from-consolidated.ts
(1 hunks)packages/utils/src/shared/sentry.ts
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- packages/utils/src/open-search/semantic-ingest.ts
- packages/utils/src/open-search/semantic-text-from-consolidated.ts
- packages/core/src/external/opensearch/semantic/ingest.ts
- packages/utils/src/shared/sentry.ts
- packages/core/src/external/opensearch/bulk.ts
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.ts`: - Use the Onion Pattern to organize a package's code in layers - Try to use immutable code and avoid sharing state across different functions, objects, and systems - Try...
**/*.ts
: - Use the Onion Pattern to organize a package's code in layers
- Try to use immutable code and avoid sharing state across different functions, objects, and systems
- Try to build code that's idempotent whenever possible
- Prefer functional programming style functions: small, deterministic, 1 input, 1 output
- Minimize coupling / dependencies
- Avoid modifying objects received as parameter
- Only add comments to code to explain why something was done, not how it works
- Naming
- classes, enums:
PascalCase
- constants, variables, functions:
camelCase
- file names:
kebab-case
- table and column names:
snake_case
- Use meaningful names, so whoever is reading the code understands what it means
- Don’t use negative names, like
notEnabled
, preferisDisabled
- For numeric values, if the type doesn’t convey the unit, add the unit to the name
- Typescript
- Use types
- Prefer
const
instead oflet
- Avoid
any
and casting fromany
to other types- Type predicates: only applicable to narrow down the type, not to force a complete type conversion
- Prefer deconstructing parameters for functions instead of multiple parameters that might be of
the same type- Don’t use
null
inside the app, only on code interacting with external interfaces/services,
like DB and HTTP; convert toundefined
before sending inwards into the code- Use
async/await
instead of.then()
- Use the strict equality operator
===
, don’t use abstract equality operator==
- When calling a Promise-returning function asynchronously (i.e., not awaiting), use
.catch()
to
handle errors (seeprocessAsyncError
andemptyFunction
depending on the case)- Date and Time
- Always use
buildDayjs()
to createdayjs
instances- Prefer
dayjs.duration(...)
to create duration consts and keep them asduration
- Prefer Nullish Coalesce (??) than the OR operator (||) to provide a default value
- Avoid creating arrow functions
- Use truthy syntax instead of
in
- i.e.,if (data.link)
notif ('link' in data)
- Error handling
- Pass the original error as the new one’s
cause
so the stack trace is persisted- Error messages should have a static message - add dynamic data to MetriportError's
additionalInfo
prop- Avoid sending multiple events to Sentry for a single error
- Global constants and variables
- Move literals to constants declared after imports when possible (avoid magic numbers)
- Avoid shared, global objects
- Avoid using
console.log
andconsole.error
in packages other than utils, infra and shared,
and try to useout().log
instead- Avoid multi-line logs
- don't send objects as a second parameter to
console.log()
orout().log()
- don't create multi-line strings when using
JSON.stringify()
- Use
eslint
to enforce code style- Use
prettier
to format code- max column length is 100 chars
- multi-line comments use
/** */
- scripts: top-level comments go after the import
packages/core/src/external/opensearch/text-ingestor-direct.ts
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: check-pr / lint-build-test
- GitHub Check: check-pr / lint-build-test
- GitHub Check: Analyze (javascript)
🔇 Additional comments (13)
packages/core/src/external/opensearch/text-ingestor-direct.ts (13)
8-13
: Good use of structured imports for error handlingThe addition of these imports from the bulk module provides better organization and separation of concerns for error handling functionality.
19-20
: Improved configuration for bulk operationsIncreasing the timeout to 2 minutes and adding a maximum retry count of 3 enhances the robustness of bulk operations, particularly for larger datasets.
29-29
: Significant increase in bulk chunk sizeThe increase to 10,000 items per chunk aligns well with the explanatory comment about optimal bulk sizes. This should improve throughput while staying within recommended limits.
34-39
: Good type refactoringSplitting
IngestRequest
into two types enhances clarity and follows the principle of separation of concerns. This makes the code more maintainable.
43-44
: Improved flexibility with optional error callbackAdding the optional error callback aligns with the PR objective of enhancing error handling during bulk operations.
115-124
: Well-documented method signatureThe JSDoc comments clearly explain the purpose and parameters of the method, which aids in maintainability and follows coding guidelines.
135-138
: Effective error tracking implementationInitializing an error count and map with a default error handler allows for grouping similar errors, directly addressing the PR objectives.
153-154
: Good error reporting approachOnly capturing errors when they exist and providing a count in the log message aligns with the PR objectives of improving error reporting.
175-182
: Improved code organization with helper functionExtracting the bulk request creation logic to a helper function follows the functional programming principle from the coding guidelines and makes the code more maintainable.
186-187
: Enhanced reliability with timeout and retriesSetting an appropriate timeout and adding retries for bulk operations improves reliability, especially for larger datasets or when experiencing transient network issues.
191-194
: Effective error processing and loggingUsing the specialized error processing function and returning the error count provides clear visibility into operation success/failure rates.
206-227
: Well-structured helper functionThe
resourceToBulkRequest
function follows functional programming principles by being small, deterministic, and having clear inputs and outputs as recommended in the coding guidelines.
249-254
: Effective error counting implementationThe
buildOnItemError
function provides a clean way to count errors by type, supporting the PR objective of grouping similar errors.
Dependencies
Description
Improve OpenSearch's bulk import error handling.
Testing
Release Plan
Summary by CodeRabbit
Summary by CodeRabbit
New Features
Improvements
Refactor
Chores