Adds `sanitize_html`, a whitelist based HTML sanitizer. by Kapu1178 · Pull Request #171 · tgstation/rust-g · GitHub

Adds sanitize_html, a whitelist based HTML sanitizer. #171


Merged
merged 7 commits into from
May 30, 2024

Kapu1178
Contributor
@Kapu1178 Kapu1178 commented Apr 21, 2024

Adds a customizable HTML sanitizer function using the Ammonia crate. Out of the box, it will:

  • Strip <script> and <style> tags, as well as their contents.
  • Prune all URL schemes, including byond://
  • Prune all HTML tags and attributes, but not the tags' contents.

By providing JSON-encoded lists, you can whitelist specific attributes or tags so that they are not pruned. I have included a curated tag list in the DM source file for this module that will whitelist most safe HTML tags.

It occurred to me that a lot of servers run things like old papercode, which does not sanitize input on the server side before it is viewable by a client. Sanitizing strings with DM would be an absolute performance nuke, assuming you could even make it bulletproof in the first place.
Here is a recommended default tag whitelist:

list(
	"b","br",
	"center", "code",
	"dd", "del", "div", "dl", "dt",
	"em",
	"font",
	"h1", "h2", "h3", "h4", "h5", "h6", "hr",
	"i", "ins",
	"li",
	"menu",
	"ol",
	"p", "pre",
	"span", "strong",
	"table",
	"tbody",
	"td",
	"th",
	"thead",
	"tfoot",
	"tr",
	"u",
	"ul",
)

@Kapu1178
Contributor Author
Kapu1178 commented Apr 21, 2024

Error: "sanitize = ["ammonia", "maplit", "serde_json"] is not sorted in Cargo.toml default features"

I am unsure how to fix this.

* * attribute_whitelist_json: a json_encode()'d list of HTML attributes to allow in the final string.
* * tag_whitelist_json: a json_encode()'d list of HTML tags to allow in the final string.
*/
#define rustg_sanitize_html(text, attribute_whitelist_json, tag_whitelist_json) RUSTG_CALL(RUST_G, "sanitize_html")(text, attribute_whitelist_json, tag_whitelist_json)
Member

Semicolon?

Contributor Author

I don't understand, am I missing something here?

Comment on lines +6 to +7
* * attribute_whitelist_json: a json_encode()'d list of HTML attributes to allow in the final string.
* * tag_whitelist_json: a json_encode()'d list of HTML tags to allow in the final string.
Member

The interface should take a list and call json_encode() on it itself.

Contributor Author

I didn't do this so that you can store pre-encoded global lists to save on perf.

Wouldn't that mean it's encoding on every call? The thing is that this will likely be called many times with only one or a few lists, so this introduces extra overhead.

.link_rel(Some("noopener")) // https://mathiasbynens.github.io/rel-noopener/
.url_schemes(prune_url_schemes)
.generic_attributes(attribute_whitelist)
.tags(tag_whitelist)
Contributor

wouldn't it make sense to keep this around rather than build it anew on every invocation

Contributor Author

I'd have to hash the arguments and such and that's out of my skill set presently.
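
For readers following this thread, the caching idea could be sketched roughly as below, using only the Rust standard library. This is not code from the PR. `SanitizerConfig` is a hypothetical stand-in for the configured `ammonia::Builder`, and `cached_config` and `CACHE` are illustrative names. The sketch assumes the two JSON strings are used directly as the map key, so `HashMap` does the hashing and no manual hashing of the arguments is needed.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for a fully configured sanitizer. In the PR this
/// role would be played by an `ammonia::Builder`; here it only records the
/// two whitelist arguments it was built from.
#[derive(Debug)]
struct SanitizerConfig {
    attribute_whitelist_json: String,
    tag_whitelist_json: String,
}

/// Cache key: the two JSON strings exactly as the caller passed them.
type CacheKey = (String, String);

/// Process-wide cache, created lazily on first use.
static CACHE: OnceLock<Mutex<HashMap<CacheKey, Arc<SanitizerConfig>>>> = OnceLock::new();

/// Returns the config for this argument pair, building it only the first
/// time the pair is seen. Later calls with the same strings get the same Arc.
fn cached_config(attribute_whitelist_json: &str, tag_whitelist_json: &str) -> Arc<SanitizerConfig> {
    let cache = CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    let mut map = cache.lock().unwrap();
    let key = (
        attribute_whitelist_json.to_owned(),
        tag_whitelist_json.to_owned(),
    );
    map.entry(key)
        .or_insert_with(|| {
            // The expensive construction step (JSON parsing, builder setup)
            // would go here in a real implementation.
            Arc::new(SanitizerConfig {
                attribute_whitelist_json: attribute_whitelist_json.to_owned(),
                tag_whitelist_json: tag_whitelist_json.to_owned(),
            })
        })
        .clone()
}
```

Calling `cached_config` twice with identical strings returns the same `Arc`, while a different pair of strings creates a new entry. The trade-off in this sketch is that each call still allocates the key and takes the lock, and the cache is never evicted, which seems acceptable only if a server uses a handful of distinct whitelists.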

@Kapu1178 Kapu1178 requested a review from ZeWaka May 13, 2024 19:34
Collaborator
@ZeWaka ZeWaka left a comment

Looks about right.

@optimumtact
Member

looks about right :+2:

@Kapu1178
Contributor Author

mods? mergies? @ZeWaka

@ZeWaka ZeWaka merged commit 6ef3516 into tgstation:master May 30, 2024
2 checks passed
@Kapu1178 Kapu1178 deleted the bbcode branch May 30, 2024 01:41
itsmeow pushed a commit to itsmeowForks/rust-g that referenced this pull request Mar 31, 2025
itsmeow added a commit to BeeStation/rust-g that referenced this pull request Mar 31, 2025
* Routine Update PR (tgstation#167)

* Standardize redis_reliablequeue connect/disconnect output (tgstation#150)

* More routine updates (tgstation#169)

* IconForge - Building spritesheets at the speed of light (tgstation#160)

* iconforge beta

* Start blending

* Huge cleanup

* Finish optimizing the thing

* Finish the thing!!

* Clean up a bit

* Re-add 32-bit thing

* Fix TOML sorting

* Add dmsrc

* Fix clippy suggestions

* Clippy.. stop being mean

* Cargo fmt + doc comments

* Code cleanup

* More cleanup, remove most unsafe unwrap()s, use Match syntax.

* Remove unnecessarily verbose casting

* Fix overlay blending

* Cleanup with new DMI version

* Cargo fmt

* Leaf test, DynamicImage->RgbaImage, better Error handling, DashMap, and cleanup command

* Fix

* Further tree optimizations, hashing optimization, cache icostrings more effectively.

* Optimize unique_icons insertion a little

* Fix macro

* Little more cleanup

* Add to README

* Update dmi, add caching logic.

* Address reviews

* Cleanup panic unwind

* Fix lint failure

* Fix bounds expansion crops, and properly index crops from 1,1

* Don't multiply by alpha if the base alpha is 0

* Fix subtract blending

* Don't hash the same DMI 500 times

* Address reviews

* Clippy fix

* v3.2.0 (tgstation#170)

* Adds `sanitize_html`, a whitelist based HTML sanitizer. (tgstation#171)

* Adds batchnoise to the default features set (tgstation#174)

* Typical Routine Updates (tgstation#175)

* Add task for building on windows (tgstation#176)

* v3.3.0 (tgstation#177)

* Fast poisson sampling (tgstation#178)

Co-authored-by: ZeWaka <zewakagamer@gmail.com>

* Add format argument to git revdate ffi (tgstation#179)

* Add method of parsing revdate for HEAD directly from logs (tgstation#180)

* use lines not split (tgstation#181)

* Windows 7 (tgstation#183)

* Allow compiling non-32bit under feature flag (tgstation#184)

Co-authored-by: ZeWaka <zewakagamer@gmail.com>

* 32bit readme (tgstation#186)

* v3.4.0 (tgstation#187)

* Fix a panic in `byond::parse_args` with debug assertions (tgstation#189)

* chore: routine updates (tgstation#190)

* more assorted package updates because bored (tgstation#191)

* last-minute updates (tgstation#193)

* v3.5.0 (tgstation#194)

* iconforge: Use height() for y axis to support non-square icons (tgstation#197)

* Add building of x64 libs to CI (tgstation#200)

* Add hash and iconforge as default features (tgstation#196)

* IconForge: GAGS (tgstation#188)

* 64 bit lib detection (tgstation#202)

* update mysql crate, trims a lot of deps (tgstation#203)

* Reset to correct versions

* IconForge: Sort GAGS output states (tgstation#206)

* IconForge: Improve GAGS frame/dir difference handling (tgstation#207)

* gamer release workflow

* v3.7.0 (tgstation#208)

* Fix release upload paths

* fully correct and rename files in CI/CD

* Massively optimizes `dmi_icon_states` (tgstation#209)

* Add support for timing out HTTP calls (tgstation#210)

* v3.8.0 (tgstation#211)

* fix default release name while i remember

* Feature: rustg_sound_length() (tgstation#192)

* update `rand` to `0.9`, `cargo update` (tgstation#204)

* Adjust CI to match our workflow

* Fix outdated upload-artifact version

---------

Co-authored-by: ZeWaka <zewakagamer@gmail.com>
Co-authored-by: Kapu1178 <75460809+Kapu1178@users.noreply.github.com>
Co-authored-by: GoldenAlpharex <58045821+GoldenAlpharex@users.noreply.github.com>
Co-authored-by: Fluffy <65877598+FluffyGhoster@users.noreply.github.com>
Co-authored-by: Zephyr <12817816+ZephyrTFA@users.noreply.github.com>
Co-authored-by: Mothblocks <35135081+Mothblocks@users.noreply.github.com>
Co-authored-by: Kyle Spier-Swenson <kyleshome@gmail.com>
Co-authored-by: Lucy <lucy@absolucy.moe>
Co-authored-by: tigercat2000 <nick.pilant2@gmail.com>
Co-authored-by: Amy <3855802+amylizzle@users.noreply.github.com>
Co-authored-by: Jordan Dominion <Cyberboss@users.noreply.github.com>
6 participants