Add torrent upload feature by apollotsantos · Pull Request #1046 · rivenmedia/riven · GitHub

Add torrent upload feature #1046


Open
wants to merge 4 commits into base: main
Conversation

apollotsantos
@apollotsantos apollotsantos commented Mar 26, 2025

Summary by CodeRabbit

  • New Features

    • Torrent File Upload: Users can now upload torrent files to initiate sessions directly. Files are validated to ensure only proper torrent files are accepted, with clear error feedback on unsupported uploads.
  • Enhancements

    • Streamlined backend processing for torrent uploads results in a more robust and user-friendly experience.
    • Added support for handling multipart data with the inclusion of a new dependency.

Contributor
coderabbitai bot commented Mar 26, 2025

Walkthrough

The pull request adds the python-multipart dependency in both the main and development sections of the pyproject.toml file. In the scraping module, it refactors the infohash processing logic by introducing an async helper function (_process_infohash) and a new endpoint (upload_torrent_file) for handling torrent file uploads. Additionally, the test suite is enhanced with new import statements and a dedicated test file for the /scrape/upload_torrent endpoint, ensuring proper handling of valid and invalid torrent file uploads.

Changes

File(s) Change Summary
pyproject.toml Added python-multipart = "^0.0.20" to both [tool.poetry.dependencies] and [tool.poetry.group.dev.dependencies] sections.
src/routers/secure/scrape.py Refactored infohash handling by extracting logic into async _process_infohash; added async upload_torrent_file for processing torrent uploads.
src/tests/test_alldebrid_downloader.py Added imports (hashlib, MagicMock, patch, TestClient, UploadFile) to support enhanced mocking and testing capabilities.
src/tests/test_scrape_endpoints.py Introduced a new test suite for the /scrape/upload_torrent endpoint, including tests for valid torrent file uploads and invalid file scenarios.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Endpoint as upload_torrent_file
    participant Bencoder as bencodepy
    participant Processor as _process_infohash
    Client->>Endpoint: POST /scrape/upload_torrent with torrent file
    Endpoint->>Endpoint: Validate file extension (.torrent)
    Endpoint->>Bencoder: Decode torrent file contents
    Bencoder-->>Endpoint: Return extracted infohash
    Endpoint->>Processor: Process infohash with item ID & source description
    Processor-->>Endpoint: Return session response
    Endpoint-->>Client: 200 OK with session data
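The flow in the diagram can be sketched without the web framework. This is a hedged illustration: `handle_upload` and `extract_infohash` are hypothetical names, not the PR's actual signatures, and the real endpoint works with FastAPI's `UploadFile` and raises `HTTPException` rather than `ValueError`.

```python
def handle_upload(filename: str, content: bytes, extract_infohash) -> dict:
    """Mirror the sequence above: validate the extension, decode the
    torrent, then hand the infohash to the session-processing step.

    `extract_infohash` stands in for the bencodepy-decode-plus-SHA1 step;
    in the PR the resulting hash is passed to _process_infohash.
    """
    if not filename.lower().endswith(".torrent"):
        # The real endpoint responds with HTTP 400 here.
        raise ValueError("Uploaded file must be a .torrent file")
    try:
        infohash = extract_infohash(content)
    except Exception as exc:
        raise ValueError(f"Failed to parse torrent: {exc}") from exc
    # The PR then delegates to _process_infohash(...) to start a session.
    return {"message": "Started manual scraping session from torrent file",
            "infohash": infohash}
```

Validation happens before any decoding, so malformed non-torrent uploads are rejected cheaply.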

Poem

I'm a rabbit hopping through lines of code,
Where new functions bloom on every road.
Async steps and tests are here to stay,
With dependencies lined up in a playful array.
A whimsical burrow of updates, joyful and bright!
🥕🐇 Hop on, the code garden is a delight!


Contributor
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🔭 Outside diff range comments (1)
src/routers/secure/items.py (1)

611-627: ⚠️ Potential issue

The torrent upload endpoint is incomplete.

While the endpoint successfully extracts the infohash from the uploaded torrent file, it doesn't actually do anything with it. The commented code suggests an intention to process the infohash using an existing workflow, but this implementation is missing.

Additionally, there are two issues flagged by static analysis:

  1. Don't use File(...) in parameter defaults. Instead, define it within the function body.
  2. Within the except clause, use from e to preserve the exception chain.

Apply this fix:

-async def upload_torrent(request: Request, file: UploadFile = File(...)) -> MessageResponse:
+async def upload_torrent(request: Request, file: UploadFile) -> MessageResponse:
     try:
         torrent_data = await file.read()
         infohash = parse_torrent_file(torrent_data)
-        # Use the existing workflow for processing the extracted infohash
-        # Assuming there is a function to handle the infohash
-        # process_infohash(infohash)
+        # TODO: Implement the actual processing of the infohash
+        # Consider using the same workflow as other endpoints that handle infohashes
         return {"message": f"Successfully uploaded and processed .torrent file with infohash {infohash}"}
     except Exception as e:
-        raise HTTPException(status_code=400, detail=f"Failed to process .torrent file: {str(e)}")
+        raise HTTPException(status_code=400, detail=f"Failed to process .torrent file: {str(e)}") from e

Also consider adding validation to ensure the uploaded file is actually a torrent file:

if not file.filename.endswith('.torrent'):
    raise HTTPException(status_code=400, detail="Uploaded file must be a .torrent file")
🧰 Tools
🪛 Ruff (0.8.2)

617-617: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


626-626: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🧹 Nitpick comments (3)
src/tests/test_torrent_upload.py (3)

1-1: Remove unused import

The pytest import is not being used in this file.

-import pytest
🧰 Tools
🪛 Ruff (0.8.2)

1-1: pytest imported but unused

Remove unused import: pytest

(F401)


7-14: Test ensures valid torrent uploads work correctly

The test properly validates that the torrent upload endpoint accepts valid torrent files and returns a 200 status code with a success message.

Consider using path variables or fixtures for test file paths to make the tests more robust:

-    with open("tests/test.torrent", "rb") as torrent_file:
+    # Use pathlib for more reliable file paths
+    from pathlib import Path
+    test_file = Path(__file__).parent / "test.torrent"
+    with open(test_file, "rb") as torrent_file:

16-23: Test ensures invalid torrent uploads are rejected

This test correctly verifies that invalid torrent files are rejected with a 400 status code and an appropriate error message.

Apply the same path improvement as suggested for the valid torrent test:

-    with open("tests/invalid.torrent", "rb") as torrent_file:
+    # Use pathlib for more reliable file paths
+    from pathlib import Path
+    invalid_test_file = Path(__file__).parent / "invalid.torrent"
+    with open(invalid_test_file, "rb") as torrent_file:
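A fuller version of this suggestion as a small stdlib-only helper; `asset_path` is an illustrative name, not something from the PR:

```python
from pathlib import Path

def asset_path(name: str) -> Path:
    # Resolve test assets relative to this test file rather than the
    # process working directory, so the suite passes no matter where
    # pytest is invoked from.
    return Path(__file__).parent / name

# e.g. open(asset_path("test.torrent"), "rb")
```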
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 11379dd and 64470de.

📒 Files selected for processing (6)
  • src/program/services/downloaders/alldebrid.py (3 hunks)
  • src/program/services/downloaders/models.py (1 hunks)
  • src/program/services/downloaders/realdebrid.py (2 hunks)
  • src/program/services/downloaders/shared.py (2 hunks)
  • src/routers/secure/items.py (3 hunks)
  • src/tests/test_torrent_upload.py (1 hunks)
🧰 Additional context used
🧬 Code Definitions (2)
src/program/services/downloaders/realdebrid.py (2)
src/program/services/downloaders/shared.py (3)
  • DownloaderBase (17-93)
  • parse_torrent_file (121-126)
  • add_torrent (46-59)
src/program/services/downloaders/alldebrid.py (2)
  • add_torrent (166-199)
  • execute (39-43)
src/program/services/downloaders/alldebrid.py (1)
src/program/services/downloaders/shared.py (2)
  • parse_torrent_file (121-126)
  • add_torrent (46-59)
🪛 Ruff (0.8.2)
src/program/services/downloaders/shared.py

125-125: Undefined name encode

(F821)

src/tests/test_torrent_upload.py

1-1: pytest imported but unused

Remove unused import: pytest

(F401)

src/routers/secure/items.py

617-617: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


626-626: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🔇 Additional comments (10)
src/program/services/downloaders/models.py (1)

164-169: LGTM - UploadedTorrent model looks well designed

The new UploadedTorrent class follows good design practices with appropriate attributes and types. The use of Field(default_factory=datetime.utcnow) for the uploaded_at field is correct as it properly creates a new timestamp for each instance.

src/program/services/downloaders/shared.py (2)

4-5: LGTM - Appropriate imports for torrent parsing

The added imports support the new torrent parsing functionality.


121-126:

✅ Verification successful

Fix missing encode function

The encode function is used on line 125 but is not defined or imported. This will cause a runtime error.

-from bencodepy import decode
+from bencodepy import decode, encode
-    infohash = hashlib.sha1(encode(info)).hexdigest()
+    # Make sure to import encode from bencodepy or implement the encoding here
+    infohash = hashlib.sha1(encode(info)).hexdigest()

🏁 Script executed:

#!/bin/bash
# Check if bencodepy provides an encode function
pip show bencodepy
pip install bencodepy
python -c "from bencodepy import encode; print('encode function exists')" || echo "encode function not found"

Length of output: 986


Import encode from bencodepy to prevent a runtime error

The verification confirmed that the installed version of bencodepy (v0.9.5) exports an encode function. To avoid issues when calling encode(info) in this file, please update the import as follows:

  • Change:
    from bencodepy import decode
  • To:
    from bencodepy import decode, encode

Also, consider adding an inline comment in the function to note that the encode function is imported from bencodepy. This update will resolve the issue of the missing function.

🧰 Tools
🪛 Ruff (0.8.2)

125-125: Undefined name encode

(F821)
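For context on what this computation does, here is a stdlib-only sketch of the infohash calculation. The minimal bencode encoder below is an illustrative stand-in for `bencodepy.encode`, not the library's implementation:

```python
import hashlib

def bencode(value) -> bytes:
    # Minimal bencode encoder (stand-in for bencodepy.encode):
    # int -> i<n>e, bytes -> <len>:<data>, list -> l...e,
    # dict -> d...e with keys emitted in sorted order (as BEP 3 requires).
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        return b"d" + b"".join(
            bencode(k) + bencode(v) for k, v in sorted(value.items())
        ) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")

def infohash_from_info(info: dict) -> str:
    # The BitTorrent infohash is the SHA-1 digest of the bencoded `info` dict.
    return hashlib.sha1(bencode(info)).hexdigest()

info = {b"name": b"example", b"piece length": 16384, b"pieces": b""}
print(infohash_from_info(info))  # a 40-character hex digest
```

Because dictionary keys are serialized in sorted order, the same `info` dict always produces the same infohash regardless of insertion order.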

src/program/services/downloaders/alldebrid.py (3)

25-25: LGTM - Correctly imports the new parse_torrent_file function

The new import gives this class access to the torrent parsing functionality.


166-170: LGTM - Updated method signature allows torrent file uploads

The function signature has been updated to accept an optional torrent_data parameter, and the docstring clearly explains the new functionality.


175-187: LGTM - Implementation properly handles both infohash and torrent file uploads

The code now intelligently handles both upload methods:

  1. If torrent_data is provided, it parses the torrent file and uses the POST endpoint
  2. Otherwise, it uses the existing GET endpoint with the infohash

This maintains backward compatibility while adding the new feature.

src/routers/secure/items.py (2)

7-7: Imports updated correctly for new functionality.

The necessary FastAPI imports for handling file uploads (UploadFile and File) have been added properly.


24-24: Import for torrent parsing function added correctly.

The parse_torrent_file function is properly imported from the shared module.

src/program/services/downloaders/realdebrid.py (2)

24-24: Import updated correctly.

The import statement now correctly includes parse_torrent_file from the shared module.


200-216:

✅ Verification successful

Implementation looks good but has a potential issue in error handling.

The method has been correctly updated to handle both infohash and torrent file uploads. However, there's a potential issue in the error handling code at line 230, where magnet might not be defined if torrent_data is provided.

Apply this fix to ensure the error is properly handled regardless of which path was taken:

    def add_torrent(self, infohash: str, torrent_data: Optional[bytes] = None) -> str:
        """Add a torrent by infohash or .torrent file"""
        try:
            if torrent_data:
                infohash = parse_torrent_file(torrent_data)
                response = self.api.request_handler.execute(
                    HttpMethod.POST,
                    "torrents/addTorrent",
                    files={"file": ("torrent", torrent_data, "application/x-bittorrent")}
                )
            else:
                magnet = f"magnet:?xt=urn:btih:{infohash}"
                response = self.api.request_handler.execute(
                    HttpMethod.POST,
                    "torrents/addMagnet",
                    data={"magnet": magnet.lower()}
                )
            return response["id"]
        except Exception as e:
            if len(e.args) > 0:
                if " 503 " in e.args[0]:
                    logger.debug(f"Failed to add torrent {infohash}: [503] Infringing Torrent or Service Unavailable")
                    raise RealDebridError("Infringing Torrent or Service Unavailable")
                elif " 429 " in e.args[0]:
                    logger.debug(f"Failed to add torrent {infohash}: [429] Rate Limit Exceeded")
                    raise RealDebridError("Rate Limit Exceeded")
                elif " 404 " in e.args[0]:
                    logger.debug(f"Failed to add torrent {infohash}: [404] Torrent Not Found or Service Unavailable")
                    raise RealDebridError("Torrent Not Found or Service Unavailable")
                elif " 400 " in e.args[0]:
-                    logger.debug(f"Failed to add torrent {infohash}: [400] Torrent file is not valid. Magnet: {magnet}")
+                    logger.debug(f"Failed to add torrent {infohash}: [400] Torrent file is not valid.")
                    raise RealDebridError("Torrent file is not valid")

Also, this method signature change could impact other parts of the codebase that call this method.


🏁 Script executed:

#!/bin/bash
# Find all instances of the add_torrent method being called to ensure they're compatible with the new signature

echo "Searching for calls to add_torrent method..."
rg "\.add_torrent\(" --type py | grep -v "def add_torrent"

Length of output: 645


Critical error handling issue resolved in add_torrent

The update correctly removes the undefined reference to magnet from the error log for the 400 error branch, ensuring that the exception message is built without relying on a variable that may not be defined when torrent_data is provided. Additionally, a search across the codebase confirms that all calls to add_torrent supply just the infohash (letting the optional torrent_data default to None), so no unintended breaking changes have been introduced.

  • Affected file: src/program/services/downloaders/realdebrid.py (error handling update at ~line 230)
  • Verified Call Sites:
    • src/routers/secure/scrape.py
    • src/program/services/downloaders/__init__.py
    • src/program/services/downloaders/alldebrid.py

@apollotsantos apollotsantos reopened this Mar 27, 2025
@apollotsantos
Author

@Gaisberg I've changed the implementation to leverage existing logic. Couldn't test yet

Contributor
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/routers/secure/scrape.py (1)

476-481: Move import statement to the top of the file.

The from bencodepy import decode, encode import should be moved to the top of the file with other imports. While placing it inside the try block does handle potential import errors, it's more conventional and cleaner to keep all imports together at the top.

import hashlib
+from bencodepy import decode, encode

from fastapi import APIRouter, BackgroundTasks, File, HTTPException, Request, UploadFile
from loguru import logger
...

# Inside the function
try:
-   from bencodepy import decode, encode
    torrent_dict = decode(torrent_content)
    info = torrent_dict[b'info']
    info_hash = hashlib.sha1(encode(info)).hexdigest()
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 89e9266 and ee1207a.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • pyproject.toml (2 hunks)
  • src/routers/secure/scrape.py (4 hunks)
  • src/tests/test_alldebrid_downloader.py (1 hunks)
  • src/tests/test_scrape_endpoints.py (1 hunks)
🧰 Additional context used
🧬 Code Definitions (2)
src/tests/test_alldebrid_downloader.py (1)
src/program/services/downloaders/alldebrid.py (1)
  • AllDebridDownloader (64-256)
src/routers/secure/scrape.py (1)
src/program/services/indexers/trakt.py (2)
  • TraktIndexer (14-122)
  • run (55-89)
🪛 Ruff (0.8.2)
src/tests/test_scrape_endpoints.py

1-1: No such file or directory (os error 2)

(E902)

🔇 Additional comments (13)
src/tests/test_alldebrid_downloader.py (2)

2-3: New imports for mock testing and file upload handling.

The added imports for hashlib, MagicMock, and patch will support hash generation and mocking in tests, which aligns with the torrent upload feature implementation.


6-7: FastAPI imports for file upload testing.

These imports are necessary for testing the new torrent upload functionality, allowing proper handling of test client operations and file upload testing.
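As a small illustration of the pattern these imports enable (toy names throughout; this is not the PR's actual test code):

```python
import hashlib
from unittest.mock import MagicMock, patch

def fetch_torrent_id(downloader, infohash: str) -> str:
    # Toy function standing in for code under test that calls a downloader.
    return downloader.add_torrent(infohash)

# MagicMock records calls and returns canned values:
downloader = MagicMock()
downloader.add_torrent.return_value = "torrent-123"
assert fetch_torrent_id(downloader, "deadbeef" * 5) == "torrent-123"
downloader.add_torrent.assert_called_once_with("deadbeef" * 5)

# patch temporarily swaps out a real attribute, e.g. pinning a hash value:
with patch("hashlib.sha1") as fake_sha1:
    fake_sha1.return_value.hexdigest.return_value = "f" * 40
    assert hashlib.sha1(b"anything").hexdigest() == "f" * 40
```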

pyproject.toml (2)

61-61: Added python-multipart to dev dependencies.

Including the same package in dev dependencies ensures it's available during development and testing. This is good practice for consistency between environments.


45-45:

✅ Verification successful

Added python-multipart dependency for file upload support.

The addition of python-multipart is necessary for handling uploaded torrent files in the new endpoint. This is a runtime dependency required by FastAPI for form data processing.


🌐 Web query:

Is version 0.0.20 of python-multipart the latest stable version?

💡 Result:

As of March 2025, version 0.0.20 of Python-Multipart is the latest stable release, with sources confirming its status across multiple platforms:

  • PyPI listing (Dec 2024) explicitly shows 0.0.20 as the latest version[3][6]
  • GitHub releases confirm version 0.0.20 (Dec 2024) as most recent, with active development continuing under Kludex's repository[6][9]
  • Linux distributions package this version as current:
    • Ubuntu's "Plucky Puffin" branch (active development) uses 0.0.20-1[1]
    • Arch Linux lists 0.0.20-1 as latest in Extra repository[4]
    • Debian's package tracker references 0.0.20-1 in main/sid[11]

Key Details

Release timeline from changelogs:

0.0.20 (2024-12-16) - Latest
  - Bugfix: Boundary handling improvements
0.0.19 (2024-11-30)
0.0.18 (2024-11-28)
...

The package maintains 100% test coverage and follows semantic versioning with strict type checking since 0.0.12 [9][10]. While another multipart package exists at version 1.2.1 [5], this is a separate project with different maintainers.

Citations:

- 1: https://launchpad.net/ubuntu/+source/python-multipart
- 2: https://packages.guix.gnu.org/packages/python-multipart/0.0.20/
- 3: https://pypi.org/project/python-multipart/
- 4: https://archlinux.org/packages/extra/any/python-python-multipart/
- 5: https://pypi.org/project/multipart/
- 6: https://github.com/andrew-d/python-multipart/releases
- 7: https://multipart.readthedocs.io
- 8: https://devguide.python.org/versions/
- 9: https://github.com/Kludex/python-multipart/blob/master/CHANGELOG.md
- 10: https://github.com/Kludex/python-multipart
- 11: https://tracker.debian.org/python-multipart

Python-multipart dependency version verified

File: pyproject.toml, Line 45:
  • The addition of python-multipart = "^0.0.20" is correct, as version 0.0.20 is confirmed to be the latest stable release.
  • This dependency is essential for handling file uploads with FastAPI's form data processing.

src/tests/test_scrape_endpoints.py (5)

1-15: Good test setup with appropriate imports and client initialization.

The test file properly imports necessary modules and sets up the test client. This establishes the groundwork for testing the new torrent upload endpoint.

🧰 Tools
🪛 Ruff (0.8.2)

1-1: No such file or directory (os error 2)

(E902)

20-33: Well-structured fixture for mocking bencodepy and hash generation.

The mock_bencodepy fixture correctly mocks the bencodepy library functions and hashlib's SHA1 functionality, ensuring consistent test behavior with a predetermined hash value that matches existing test data.

35-66: Comprehensive test service mocking.

The mock_services fixture properly sets up mocks for TraktIndexer, MediaItem, Downloader, and BackgroundTasks, along with resetting the session manager before each test. This ensures tests run in isolation and with predictable behavior.

67-102: Well-designed test for successful torrent file upload.

This test thoroughly validates the successful upload path, including:
  • Creating a fake torrent file
  • Mocking the necessary program services
  • Verifying the response status, message, session ID, and torrent ID
  • Confirming that _process_infohash was called with the correct parameters

The test provides excellent coverage of the happy-path flow.

103-129: Good test coverage for invalid file upload scenarios.

This test verifies that the endpoint rejects non-torrent files with a 400 status code and a meaningful error message. The assertion that _process_infohash is not called confirms the validation works as expected.
src/routers/secure/scrape.py (4)

6-6: Added hashlib import for torrent infohash calculation.

The hashlib library is required for computing SHA1 hashes from torrent file contents.

8-8: Updated imports to support file uploads.

Added File and UploadFile imports from FastAPI to handle uploaded torrent files.

189-245: Well-structured refactoring to extract common infohash processing logic.

The _process_infohash function effectively consolidates logic that was previously duplicated in start_manual_session, making the code more maintainable and easier to extend. The function handles:
  • Item identification via IMDb or database ID
  • Service initialization validation
  • Torrent availability checking
  • Session creation and management
  • Error handling

This refactoring follows good software engineering practices by reducing code duplication.

315-315: Updated start_manual_session to use the refactored helper function.

This change simplifies the function by delegating the processing to the new _process_infohash helper function.


Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Contributor
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/routers/secure/scrape.py (1)

464-464: Minor static analysis warning about function call in default argument.

The static analyzer flagged using File(...) as a default argument. While this is a common pattern in FastAPI, it's technically executing a function at module load time rather than function call time.

-async def upload_torrent_file(
-    request: Request,
-    background_tasks: BackgroundTasks,
-    item_id: str,
-    torrent_file: UploadFile = File(...),
-) -> StartSessionResponse:
+async def upload_torrent_file(
+    request: Request,
+    background_tasks: BackgroundTasks,
+    item_id: str,
+    torrent_file: UploadFile,
+) -> StartSessionResponse:

And update the FastAPI route decorator to specify the parameter as a file using FastAPI's dependency injection system instead.

🧰 Tools
🪛 Ruff (0.8.2)

464-464: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee1207a and 6d2269c.

📒 Files selected for processing (1)
  • src/routers/secure/scrape.py (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/routers/secure/scrape.py (3)
src/program/services/indexers/trakt.py (2)
  • TraktIndexer (14-122)
  • run (55-89)
src/program/db/db_functions.py (1)
  • get_item_by_id (23-44)
src/program/services/downloaders/shared.py (3)
  • get_instant_availability (30-41)
  • add_torrent (44-57)
  • get_torrent_info (71-81)
🪛 Ruff (0.8.2)
src/routers/secure/scrape.py

234-234: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


464-464: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


496-496: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🔇 Additional comments (3)
src/routers/secure/scrape.py (3)

6-8: Appropriate import additions for the new torrent upload feature.

The addition of hashlib and the expansion of FastAPI imports to include File and UploadFile properly support the new torrent file upload functionality.


315-315: Good refactoring to use the centralized infohash processing.

The simplified implementation of start_manual_session properly delegates to the new helper function, reducing code duplication.


468-482: Good implementation of file size validation to prevent DoS attacks.

The chunk-based file reading approach efficiently handles large files by:

  1. Reading in manageable chunks (8192 bytes)
  2. Checking the accumulated size against the limit
  3. Rejecting files exceeding the 10MB limit

This is a security best practice for file upload endpoints.
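The described approach can be sketched as follows. `read_limited` and its use of `ValueError` are illustrative since the endpoint's exact code is not quoted here; the real endpoint responds with an HTTPException instead.

```python
import io

MAX_BYTES = 10 * 1024 * 1024   # 10 MB cap, matching the endpoint's limit
CHUNK_SIZE = 8192              # read in 8 KiB chunks

def read_limited(stream, max_bytes: int = MAX_BYTES) -> bytes:
    """Accumulate a stream chunk by chunk, rejecting it as soon as the
    running total exceeds max_bytes, so an oversized upload is never
    fully buffered in memory."""
    buf = bytearray()
    while chunk := stream.read(CHUNK_SIZE):
        buf += chunk
        if len(buf) > max_bytes:
            # The endpoint raises an HTTPException here instead.
            raise ValueError("uploaded file exceeds size limit")
    return bytes(buf)
```

The key property is that rejection happens mid-read: at most one chunk past the limit is ever held, regardless of how large the incoming file is.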

Comment on lines +189 to +244

async def _process_infohash(
    request: Request,
    background_tasks: BackgroundTasks,
    item_id: str,
    info_hash: str,
    source_description: str = "manual"
) -> StartSessionResponse:
    """Common logic for processing an infohash from either a magnet link or a torrent file"""
    # Identify item based on IMDb or database ID
    if item_id.startswith("tt"):
        imdb_id = item_id
        item_id = None
    else:
        imdb_id = None
        item_id = item_id

    if services := request.app.program.services:
        indexer = services[TraktIndexer]
        downloader = services[Downloader]
    else:
        raise HTTPException(status_code=412, detail="Required services not initialized")

    initialize_downloader(downloader)

    if imdb_id:
        prepared_item = MediaItem({"imdb_id": imdb_id})
        item = next(indexer.run(prepared_item))
    else:
        item = db_functions.get_item_by_id(item_id)

    if not item:
        raise HTTPException(status_code=404, detail="Item not found")

    container = downloader.get_instant_availability(info_hash, item.type)
    if not container or not container.cached:
        raise HTTPException(status_code=400, detail=f"Torrent is not cached, please try another {source_description}")

    session = session_manager.create_session(item_id or imdb_id, info_hash)

    try:
        torrent_id: str = downloader.add_torrent(info_hash)
        torrent_info: TorrentInfo = downloader.get_torrent_info(torrent_id)
        session_manager.update_session(session.id, torrent_id=torrent_id, torrent_info=torrent_info, containers=container)
    except Exception as e:
        background_tasks.add_task(session_manager.abort_session, session.id)
        raise HTTPException(status_code=500, detail=str(e))

    data = {
        "message": f"Started manual scraping session from {source_description}",
        "session_id": session.id,
        "torrent_id": torrent_id,
        "torrent_info": torrent_info,
        "containers": container,
        "expires_at": session.expires_at.isoformat()
    }
    return StartSessionResponse(**data)
🛠️ Refactor suggestion

Well-designed extraction of common infohash processing logic.

Creating the _process_infohash helper function is an excellent refactoring that centralizes infohash handling logic for both magnet links and torrent file uploads. The code is well-structured with proper error handling, parameter typing, and a clear docstring.

However, there's an error handling improvement that should be made:

-        raise HTTPException(status_code=500, detail=str(e))
+        raise HTTPException(status_code=500, detail=str(e)) from e

This change preserves the original exception context, which is helpful for debugging and error tracing.
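The effect of `from e` is easy to see in a tiny standalone example (illustrative only, not code from the PR): the original exception survives as `__cause__`, so the traceback shows "The above exception was the direct cause of the following exception" rather than the misleading "During handling of the above exception, another exception occurred".

```python
def parse_port(raw: str) -> int:
    try:
        return int(raw)
    except ValueError as e:
        # "from e" records the ValueError as __cause__, preserving the
        # low-level context alongside the high-level error
        raise RuntimeError(f"invalid port: {raw!r}") from e

port = parse_port("8080")  # → 8080
```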

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def _process_infohash(
request: Request,
background_tasks: BackgroundTasks,
item_id: str,
info_hash: str,
source_description: str = "manual"
) -> StartSessionResponse:
"""Common logic for processing an infohash from either a magnet link or a torrent file"""
# Identify item based on IMDb or database ID
if item_id.startswith("tt"):
imdb_id = item_id
item_id = None
else:
imdb_id = None
item_id = item_id
if services := request.app.program.services:
indexer = services[TraktIndexer]
downloader = services[Downloader]
else:
raise HTTPException(status_code=412, detail="Required services not initialized")
initialize_downloader(downloader)
if imdb_id:
prepared_item = MediaItem({"imdb_id": imdb_id})
item = next(indexer.run(prepared_item))
else:
item = db_functions.get_item_by_id(item_id)
if not item:
raise HTTPException(status_code=404, detail="Item not found")
container = downloader.get_instant_availability(info_hash, item.type)
if not container or not container.cached:
raise HTTPException(status_code=400, detail=f"Torrent is not cached, please try another {source_description}")
session = session_manager.create_session(item_id or imdb_id, info_hash)
try:
torrent_id: str = downloader.add_torrent(info_hash)
torrent_info: TorrentInfo = downloader.get_torrent_info(torrent_id)
session_manager.update_session(session.id, torrent_id=torrent_id, torrent_info=torrent_info, containers=container)
except Exception as e:
background_tasks.add_task(session_manager.abort_session, session.id)
- raise HTTPException(status_code=500, detail=str(e))
+ raise HTTPException(status_code=500, detail=str(e)) from e
data = {
"message": f"Started manual scraping session from {source_description}",
"session_id": session.id,
"torrent_id": torrent_id,
"torrent_info": torrent_info,
"containers": container,
"expires_at": session.expires_at.isoformat()
}
return StartSessionResponse(**data)
🧰 Tools
🪛 Ruff (0.8.2)

234-234: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

Comment on lines +455 to +498
"/scrape/upload_torrent",
summary="Upload a torrent file to start a manual scraping session",
operation_id="upload_torrent_file"
)
async def upload_torrent_file(
request: Request,
background_tasks: BackgroundTasks,
item_id: str,
torrent_file: UploadFile = File(...),
) -> StartSessionResponse:
session_manager.cleanup_expired(background_tasks)

# Check file size limit (10MB is typically more than enough for torrent files)
MAX_FILE_SIZE = 10 * 1024 * 1024 # 10MB
file_size = 0
content = bytearray()

# Read the file in chunks to check size without loading all into memory
chunk = await torrent_file.read(8192)
while chunk:
file_size += len(chunk)
if file_size > MAX_FILE_SIZE:
raise HTTPException(status_code=413, detail="File too large, maximum size is 10MB")
content.extend(chunk)
chunk = await torrent_file.read(8192)

torrent_content = bytes(content)

# Validate the file is a .torrent file
if not torrent_file.filename.endswith('.torrent'):
raise HTTPException(status_code=400, detail="File must be a .torrent file")

# Extract infohash from torrent file
try:
from bencodepy import decode, encode
torrent_dict = decode(torrent_content)
info = torrent_dict[b'info']
info_hash = hashlib.sha1(encode(info)).hexdigest()
except Exception as e:
logger.error(f"Failed to extract infohash from torrent file: {e}")
raise HTTPException(status_code=400, detail="Invalid torrent file format")

return await _process_infohash(request, background_tasks, item_id, info_hash, "uploaded torrent file")
🛠️ Refactor suggestion

Well-implemented torrent file upload endpoint with security considerations.

The upload_torrent_file function properly:

  • Validates file extension
  • Implements file size limits (10MB) to prevent DoS attacks
  • Reads files in chunks to avoid memory issues
  • Safely extracts the infohash using bencodepy
  • Delegates to the common processing function

However, two issues need to be addressed:

-        raise HTTPException(status_code=400, detail="Invalid torrent file format")
+        raise HTTPException(status_code=400, detail="Invalid torrent file format") from e

Also, consider adding validation for the torrent content structure beyond just the file extension to ensure it's a valid torrent file before attempting to decode it.
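A structural check along those lines could look like the following (a hypothetical `validate_torrent_dict` helper, not part of the PR; the required keys come from the BitTorrent metainfo format, where `bencodepy` decodes dictionary keys as byte strings):

```python
def validate_torrent_dict(torrent_dict) -> None:
    """Reject decoded data that does not look like torrent metainfo."""
    if not isinstance(torrent_dict, dict) or b"info" not in torrent_dict:
        raise ValueError("top-level 'info' dictionary is missing")
    info = torrent_dict[b"info"]
    if not isinstance(info, dict):
        raise ValueError("'info' must be a dictionary")
    for key in (b"name", b"piece length", b"pieces"):
        if key not in info:
            raise ValueError(f"'info' is missing required key {key!r}")

# a minimal well-formed metainfo dict passes
validate_torrent_dict(
    {b"info": {b"name": b"f", b"piece length": 16384, b"pieces": b"\x00" * 20}}
)
```

The endpoint could call this between `decode()` and the SHA-1 computation, turning malformed-but-decodable uploads into a clean 400 instead of a `KeyError`.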

📝 Committable suggestion


Suggested change
@router.post(
"/scrape/upload_torrent",
summary="Upload a torrent file to start a manual scraping session",
operation_id="upload_torrent_file"
)
async def upload_torrent_file(
request: Request,
background_tasks: BackgroundTasks,
item_id: str,
torrent_file: UploadFile = File(...),
) -> StartSessionResponse:
session_manager.cleanup_expired(background_tasks)
# Check file size limit (10MB is typically more than enough for torrent files)
MAX_FILE_SIZE = 10 * 1024 * 1024 # 10MB
file_size = 0
content = bytearray()
# Read the file in chunks to check size without loading all into memory
chunk = await torrent_file.read(8192)
while chunk:
file_size += len(chunk)
if file_size > MAX_FILE_SIZE:
raise HTTPException(status_code=413, detail="File too large, maximum size is 10MB")
content.extend(chunk)
chunk = await torrent_file.read(8192)
torrent_content = bytes(content)
# Validate the file is a .torrent file
if not torrent_file.filename.endswith('.torrent'):
raise HTTPException(status_code=400, detail="File must be a .torrent file")
# Extract infohash from torrent file
try:
from bencodepy import decode, encode
torrent_dict = decode(torrent_content)
info = torrent_dict[b'info']
info_hash = hashlib.sha1(encode(info)).hexdigest()
except Exception as e:
logger.error(f"Failed to extract infohash from torrent file: {e}")
- raise HTTPException(status_code=400, detail="Invalid torrent file format")
+ raise HTTPException(status_code=400, detail="Invalid torrent file format") from e
return await _process_infohash(request, background_tasks, item_id, info_hash, "uploaded torrent file")
🧰 Tools
🪛 Ruff (0.8.2)

464-464: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


496-496: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)
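The infohash the endpoint computes is the SHA-1 digest of the bencoded `info` dictionary. The PR uses `bencodepy` for this; the same computation can be sketched with a minimal pure-Python bencoder (illustrative only — it covers just the four bencode types and assumes byte-string dictionary keys):

```python
import hashlib

def bencode(obj) -> bytes:
    """Minimal bencoder for the types that appear in .torrent metainfo."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        # bencode requires dictionary keys in sorted byte-string order
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in sorted(obj.items())) + b"e"
    raise TypeError(f"unsupported type: {type(obj)!r}")

info = {b"length": 1, b"name": b"example", b"piece length": 16384, b"pieces": b"\x00" * 20}
info_hash = hashlib.sha1(bencode(info)).hexdigest()  # 40-char hex digest
```

Such a helper is also handy in tests for the `/scrape/upload_torrent` endpoint: wrap the `info` dict in a top-level metainfo dictionary, bencode it, and upload the bytes as a synthetic `.torrent` file.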

@dreulavelle (Member)

@apollotsantos what's the status on this one?

@apollotsantos (Author)

@apollotsantos what's the status on this one?

I stopped work on this as I didn't have enough time to dedicate to it. Feel free to close it. Hopefully I can pick it up again soon.
