DOCK-2589: Only add utf-8 charset to text and json mime types #6014

Merged

@svonworl (Contributor) commented Oct 14, 2024

Description
During review testing of #6013, Charles noticed that the parameter charset was being set to utf-8 for responses that contained Zip content (mime type application/zip). For example:

content-type: application/zip;charset=utf-8

The Zip MIME type (https://www.iana.org/assignments/media-types/application/zip) doesn't define any parameters, and its content is binary data, so a charset parameter in the response header is nonsensical.

This PR changes the webservice to only set the charset to utf-8 for text and JSON responses. Thanks to Charles for pinpointing where in the codebase this was happening!
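
The rule described above can be sketched on plain header strings. This is a minimal illustration, not the code in this PR: the class `CharsetPolicy` and the method `withCharset` are hypothetical names, treating "JSON" as exactly `application/json` is an assumption, and the real filter works on a JAX-RS `MediaType` rather than a `String`.

```java
import java.util.Locale;

public class CharsetPolicy {
    /**
     * Hypothetical helper: returns the Content-Type value to send, appending
     * charset=UTF-8 only for text/* and application/json (an assumed JSON definition).
     */
    public static String withCharset(String contentType) {
        String lower = contentType.toLowerCase(Locale.ROOT);
        // The media type proper is everything before the first ';'; parameters follow it.
        String mediaType = lower.split(";", 2)[0].trim();
        boolean isText = mediaType.startsWith("text/");
        boolean isJson = mediaType.equals("application/json");
        // Same "already has utf-8" check as the one visible in the diff.
        boolean hasUtf8 = lower.contains("charset=utf-8");
        if ((isText || isJson) && !hasUtf8) {
            return contentType + ";charset=UTF-8";
        }
        // Binary types such as application/zip pass through unchanged.
        return contentType;
    }

    public static void main(String[] args) {
        System.out.println(withCharset("application/zip"));  // application/zip
        System.out.println(withCharset("application/json")); // application/json;charset=UTF-8
        System.out.println(withCharset("text/plain"));       // text/plain;charset=UTF-8
    }
}
```

With a rule of this shape, a Zip response keeps its bare `application/zip` content type, while text and JSON responses keep the utf-8 charset that existing clients may rely on.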

Generally, per standards, text mime types should support the charset parameter: https://datatracker.ietf.org/doc/html/rfc2046#section-4.1.2

Opinions vary as to whether setting charset for JSON is necessary, proper, and/or good:
https://stackoverflow.com/questions/9254891/what-does-content-type-application-json-charset-utf-8-really-mean

However, it's what we've been doing for many years now, and it seems to work fine, so continuing to do so avoids breaking anything that happened to depend on that behavior.

Review Instructions
Retrieve a Zip per the endpoint in the ticket, and confirm that the content-type response header has no charset parameter. Retrieve a JSON response (for example, the organizations endpoint), and confirm that the charset is set to utf-8.
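
For reviewers who prefer to script the check, one possible helper is sketched below. It only parses a `content-type` header value that has already been retrieved; `ContentTypeCheck` and `charsetOf` are hypothetical names, and the parsing is deliberately simple (it does not handle every quoting rule for header parameters).

```java
import java.util.Locale;
import java.util.Optional;

public class ContentTypeCheck {
    /** Hypothetical helper: extracts the charset parameter from a Content-Type value, if present. */
    public static Optional<String> charsetOf(String headerValue) {
        String[] parts = headerValue.split(";");
        // parts[0] is the media type; any remaining parts are name=value parameters.
        for (int i = 1; i < parts.length; i++) {
            String[] nameValue = parts[i].trim().split("=", 2);
            if (nameValue.length == 2 && nameValue[0].trim().equalsIgnoreCase("charset")) {
                // Drop optional double quotes and normalize case for comparison.
                String value = nameValue[1].trim().replace("\"", "");
                return Optional.of(value.toLowerCase(Locale.ROOT));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Header from the description: the behavior before this PR.
        System.out.println(charsetOf("application/zip;charset=utf-8")); // Optional[utf-8]
        // Expected after this PR: no charset on Zip, utf-8 on JSON.
        System.out.println(charsetOf("application/zip"));                 // Optional.empty
        System.out.println(charsetOf("application/json;charset=UTF-8")); // Optional[utf-8]
    }
}
```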

Issue
https://ucsc-cgl.atlassian.net/browse/DOCK-2589
#6010

Security and Privacy

If there are any concerns that require extra attention from the security team, highlight them here and check the box when complete.

  • Security and Privacy assessed

e.g. Does this change...

  • Any user data we collect, or data location?
  • Access control, authentication or authorization?
  • Encryption features?

Please make sure that you've checked the following before submitting your pull request. Thanks!

  • Check that you pass the basic style checks and unit tests by running mvn clean install
  • Ensure that the PR targets the correct branch. Check the milestone or fix version of the ticket.
  • Follow the existing JPA patterns for queries, using named parameters, to avoid SQL injection
  • If you are changing dependencies, check the Snyk status check or the dashboard to ensure you are not introducing new high/critical vulnerabilities
  • Assume that inputs to the API can be malicious, and sanitize and/or check for Denial of Service type values, e.g., massive sizes
  • Do not serve user-uploaded binary images through the Dockstore API
  • Ensure that endpoints that only allow privileged access enforce that with the @RolesAllowed annotation
  • Do not create cookies, although this may change in the future
  • If this PR is for a user-facing feature, create and link a documentation ticket for this feature (usually in the same milestone as the linked issue). Style points if you create a documentation PR directly and link that instead.

codecov bot commented Oct 14, 2024

Codecov Report

Attention: Patch coverage is 80.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 74.44%. Comparing base (3258fe2) to head (c71fb37).
Report is 1 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...io/dockstore/webservice/CharsetResponseFilter.java | 80.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #6014   +/-   ##
==========================================
  Coverage      74.44%   74.44%           
- Complexity      5493     5495    +2     
==========================================
  Files            381      381           
  Lines          19785    19788    +3     
  Branches        2043     2044    +1     
==========================================
+ Hits           14728    14731    +3     
  Misses          4076     4076           
  Partials         981      981           
Flag Coverage Δ
bitbuckettests 26.65% <60.00%> (+<0.01%) ⬆️
hoverflytests 27.94% <60.00%> (+<0.01%) ⬆️
integrationtests 56.70% <60.00%> (+<0.01%) ⬆️
languageparsingtests 11.05% <60.00%> (+<0.01%) ⬆️
localstacktests 21.56% <60.00%> (+<0.01%) ⬆️
toolintegrationtests 30.03% <80.00%> (+0.01%) ⬆️
unit-tests_and_non-confidential-tests 25.78% <60.00%> (+<0.01%) ⬆️
workflowintegrationtests 38.05% <80.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

@@ -39,8 +39,12 @@ public class CharsetResponseFilter implements ContainerResponseFilter {
public void filter(ContainerRequestContext requestContext, ContainerResponseContext responseContext) {
MediaType contentType = responseContext.getMediaType();
if (contentType != null) {
if (!contentType.toString().toLowerCase().contains("charset=utf-8")) {
responseContext.getHeaders().putSingle("Content-Type", contentType.toString() + ";charset=UTF-8");
boolean isText = "text".equals(contentType.getType());
Collaborator

To be paranoid, should we have equalsIgnoreCase here and on the next line?

@svonworl (Contributor, Author) commented Oct 14, 2024

Done. I've never seen a non-standardly-capitalized mime type but iT COuLd HapPeN.
Next line compares "types" and should do it correctly.
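
A small sketch of why the case-insensitive comparison matters: media type names are case-insensitive per RFC 2045, so a type reported as `TEXT` should still count as text. `MediaTypeCase` and `isTextType` are hypothetical names used only for illustration; the argument stands in for what `contentType.getType()` returns in the filter.

```java
public class MediaTypeCase {
    /** Hypothetical helper: true if the top-level media type is "text", ignoring case. */
    public static boolean isTextType(String type) {
        // Calling the method on the literal keeps this null-safe:
        // equalsIgnoreCase(null) returns false instead of throwing.
        return "text".equalsIgnoreCase(type);
    }

    public static void main(String[] args) {
        System.out.println("text".equals("TEXT"));    // false: exact comparison misses it
        System.out.println(isTextType("TEXT"));       // true
        System.out.println(isTextType("application")); // false
        System.out.println(isTextType(null));          // false
    }
}
```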

@svonworl merged commit 4f79d9c into develop Oct 15, 2024
18 checks passed
@svonworl deleted the feature/dock-2589/only-set-charset-for-text-mime-types branch October 15, 2024 16:03