Use BeautifulSoup for HTML sanitization

This short example shows how simple HTML sanitization could be with BeautifulSoup:

from bs4 import BeautifulSoup

html_content = """
<html>
  <body>
    <h1 class="title" >Title</h1>
    <script>alert('This is malicious');</script>
    <p id="para1" style="color: red;">This is a paragraph.</p>
  </body>
</html>
"""

soup = BeautifulSoup(html_content, "html.parser")

# Remove <script> and <style> elements together with their contents
for tag in soup(["script", "style"]):
    tag.decompose()

# Sanitize attributes: map each tag name to the attributes it may keep
allowed_attributes = {"p": ["id"], "h1": []}
for tag in soup.find_all(True):
    if tag.name in allowed_attributes:
        allowed = allowed_attributes[tag.name]
        tag.attrs = {key: value for key, value in tag.attrs.items() if key in allowed}
    else:
        tag.attrs = {}  # Tags not in the mapping keep their content but lose all attributes

print(soup.prettify())
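
The snippet above only filters attributes; tags outside the mapping (such as `<html>` or `<body>`) remain in the output. As a possible next step, here is a minimal sketch of a reusable helper that also unwraps tags not on the allowlist. The names `sanitize_html`, `ALLOWED`, and `DROP_WITH_CONTENT` are hypothetical and not part of BeautifulSoup or this project.

```python
from bs4 import BeautifulSoup

# Hypothetical allowlist: tag name -> attribute names that may be kept.
ALLOWED = {"h1": [], "p": ["id"]}
# Tags that are removed together with everything inside them.
DROP_WITH_CONTENT = ["script", "style"]


def sanitize_html(html, allowed=ALLOWED):
    """Return html with disallowed tags and attributes removed (illustrative sketch)."""
    soup = BeautifulSoup(html, "html.parser")

    # Delete dangerous elements including their contents.
    for tag in soup(DROP_WITH_CONTENT):
        tag.decompose()

    for tag in soup.find_all(True):
        if tag.name not in allowed:
            # Drop the tag itself but keep its children (text and nested tags).
            tag.unwrap()
        else:
            keep = allowed[tag.name]
            tag.attrs = {k: v for k, v in tag.attrs.items() if k in keep}

    return str(soup)


print(sanitize_html('<div onclick="x()"><p id="a" style="color:red">Hi <b>there</b></p></div>'))
```

Note that this sketch filters by attribute name only. If attributes such as `href` or `src` were ever allowed, their values would need separate checks (for example against `javascript:` URLs).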

We should consider this as part of #631.
