Use `beautifulsoup` for HTML-sanitization · Issue #1327 · viur-framework/viur-core · GitHub
Use beautifulsoup for HTML-sanitization #1327
Open
@phorward

Description

This short example demonstrates how simple HTML sanitization could be with beautifulsoup:

from bs4 import BeautifulSoup

html_content = """
<html>
  <body>
    <h1 class="title" >Title</h1>
    <script>alert('This is malicious');</script>
    <p id="para1" style="color: red;">This is a paragraph.</p>
  </body>
</html>
"""

soup = BeautifulSoup(html_content, "html.parser")

# Remove specific tags
for tag in soup(["script", "style"]):
    tag.decompose()

# Sanitize attributes
allowed_attributes = {"p": ["id"], "h1": []}
for tag in soup.find_all(True):
    if tag.name in allowed_attributes:
        tag.attrs = {key: value for key, value in tag.attrs.items() if key in allowed_attributes[tag.name]}
    else:
        tag.attrs = {}  # Remove all attributes for tags not in the allowed list

print(soup.prettify())
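
To make the idea easier to discuss, the snippet above could be wrapped into a reusable function. This is only a minimal sketch: the names `sanitize_html`, `ALLOWED_ATTRIBUTES` and `REMOVED_TAGS` are hypothetical and not part of viur-core, and the rules are the same as in the example (drop `script` and `style` entirely, keep only allowlisted attributes, strip all attributes from any other tag).

```python
from bs4 import BeautifulSoup

# Hypothetical configuration for illustration only; not taken from viur-core.
ALLOWED_ATTRIBUTES = {"p": ["id"], "h1": []}
# Tags that are removed together with their content.
REMOVED_TAGS = ["script", "style"]


def sanitize_html(html: str) -> str:
    """Return `html` with removed tags dropped and attributes filtered."""
    soup = BeautifulSoup(html, "html.parser")

    # Drop disallowed tags including everything inside them.
    for tag in soup(REMOVED_TAGS):
        tag.decompose()

    # Keep only allowlisted attributes; tags without an entry lose all attributes.
    for tag in soup.find_all(True):
        allowed = ALLOWED_ATTRIBUTES.get(tag.name, [])
        tag.attrs = {key: value for key, value in tag.attrs.items() if key in allowed}

    return str(soup)


print(sanitize_html('<p id="a" onclick="x()">Hi</p><script>alert(1)</script>'))
```

Note that, as in the original snippet, tags outside the attribute allowlist are kept and only lose their attributes. Whether unknown tags should instead be removed or unwrapped is an open design question.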

We should consider this as part of #631

Labels

feature: New feature or request
refactoring: Pull requests that refactor code but do not change its behavior.
security: For security related bugs
