-
Notifications
You must be signed in to change notification settings - Fork 4.4k
Bugfix/browsermgr session #1200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…orts in browser_manager.py for consistency - Introduced a new .flake8 file to enforce coding standards including max line length, complexity, and docstring conventions. - Refactored import statements in browser_manager.py for better organization and consistency, ensuring all imports are grouped and sorted. - Cleaned up whitespace and formatting issues throughout the file to adhere to style guidelines.
WalkthroughA new Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant BrowserManager
participant ManagedBrowser
Client->>BrowserManager: get_page(crawlerRunConfig)
alt session_id provided and found
BrowserManager->>ManagedBrowser: get existing session context and page
BrowserManager-->>Client: return session page
else session_id provided but not found
BrowserManager->
8000
;>ManagedBrowser: create new browser context and page
BrowserManager->>BrowserManager: store new session in sessions dict
BrowserManager-->>Client: return new session page
else no session_id
BrowserManager->>ManagedBrowser: use shared or config-based context
BrowserManager-->>Client: return page
end
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
tests/browser/test_playwright_strategy.py (1)
232-296
: Consider refactoring to reduce local variable complexity.The session management test function has many local variables which could impact readability. Consider breaking this into smaller helper functions or reducing variable scope where possible.
Example refactor approach:
async def test_playwright_session_management(): """Test session management with Playwright browser.""" logger.info("Testing session management with Playwright browser", tag="TEST") browser_config = BrowserConfig(headless=True) manager = BrowserManager(browser_config=browser_config, logger=logger) try: await manager.start() logger.info("Browser launched successfully", tag="TEST") # Setup and verify sessions session_results = await _setup_and_verify_sessions(manager) return session_results except Exception as e: logger.error(f"Test failed: {str(e)}", tag="TEST") try: await manager.close() except Exception as e: logger.error(f"Error closing browser: {str(e)}", tag="TEST") return False async def _setup_and_verify_sessions(manager): # Extract session setup logic here pass🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 232-232: Too many local variables (17/15)
(R0914)
crawl4ai/browser_manager.py (2)
974-984
: Consider simplifying the else clause structure.The else clause after the return statement can be simplified by removing the else and de-indenting the code, as suggested by static analysis.
Apply this refactoring:
if crawlerRunConfig.session_id in self.sessions: context, page, _ = self.sessions[crawlerRunConfig.session_id] # Update last-used timestamp self.sessions[crawlerRunConfig.session_id] = (context, page, time.time()) return page, context - # If session does not exist, create a new context and page specifically for it. - else: - context = await self.create_browser_context(crawlerRunConfig) - await self.setup_context(context, crawlerRunConfig) - page = await context.new_page() - self.sessions[crawlerRunConfig.session_id] = ( - context, - page, - time.time(), - ) - return page, context + # If session does not exist, create a new context and page specifically for it. + context = await self.create_browser_context(crawlerRunConfig) + await self.setup_context(context, crawlerRunConfig) + page = await context.new_page() + self.sessions[crawlerRunConfig.session_id] = ( + context, + page, + time.time(), + ) + return page, context
866-880
: Consider simplifying nested if statements.The nested if statements can be combined into a single condition for better readability.
Apply this refactoring:
- if crawlerRunConfig: - # Check if there is value for crawlerRunConfig.proxy_config set add that to context - if crawlerRunConfig.proxy_config: + if crawlerRunConfig and crawlerRunConfig.proxy_config: + # Add proxy config to context proxy_settings = { "server": crawlerRunConfig.proxy_config.server, }🧰 Tools
🪛 Ruff (0.11.9)
866-868: Use a single
if
statement instead of nestedif
statements(SIM102)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
.flake8
(1 hunks).gitignore
(1 hunks)crawl4ai/browser_manager.py
(26 hunks)tests/browser/test_playwright_strategy.py
(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/browser/test_playwright_strategy.py (7)
crawl4ai/browser_manager.py (6)
BrowserManager
(520-1069)close
(1036-1069)start
(179-267)start
(581-625)get_page
(952-1009)kill_session
(1011-1023)crawl4ai/async_configs.py (2)
BrowserConfig
(323-598)CrawlerRunConfig
(671-1323)crawl4ai/async_logger.py (2)
AsyncLogger
(78-321)LogLevel
(12-27)tests/browser/test_cdp_strategy.py (1)
run_tests
(209-225)tests/browser/test_browser_manager.py (1)
run_tests
(170-187)tests/browser/test_builtin_strategy.py (1)
run_tests
(142-157)tests/browser/test_profiles.py (1)
run_tests
(158-173)
🪛 Pylint (3.3.7)
tests/browser/test_playwright_strategy.py
[error] 25-25: Possibly using variable 'BrowserConfig' before assignment
(E0606)
[error] 28-28: Possibly using variable 'BrowserManager' before assignment
(E0606)
[refactor] 232-232: Too many local variables (17/15)
(R0914)
crawl4ai/browser_manager.py
[refactor] 253-258: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 967-983: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
🪛 Ruff (0.11.9)
crawl4ai/browser_manager.py
866-868: Use a single if
statement instead of nested if
statements
(SIM102)
🔇 Additional comments (7)
.gitignore (1)
231-233
: LGTM: Clean repository maintenance.Good addition to exclude cursor editor artifacts from version control.
.flake8 (1)
1-31
: LGTM: Well-configured linting rules.The flake8 configuration establishes good code quality standards with:
- Reasonable complexity and line length limits
- Appropriate exclusions for Black formatter compatibility
- Sensible per-file ignores for different file types
This will help maintain consistent code style across the project.
tests/browser/test_playwright_strategy.py (3)
13-17
: LGTM: Improved import organization.The conditional import structure and explicit LogLevel import improve code clarity and organization.
59-61
: LGTM: Enhanced exception handling.Replacing bare except clauses with explicit exception catching and error logging significantly improves debugging capabilities and code robustness.
Also applies to: 107-108, 168-170, 226-228, 292-294
302-307
: Verify focused test execution is intentional.The run_tests function is currently only executing the session management test with other tests commented out. Ensure this is intentional for this PR focused on session isolation.
If this is temporary for PR testing, consider adding a comment indicating when to re-enable the other tests.
crawl4ai/browser_manager.py (2)
964-984
: LGTM: Critical session isolation fix implemented correctly.This change successfully addresses the main bug by ensuring each session gets its own dedicated browser context and page. The explicit session handling logic prevents context sharing between sessions, which was causing the "Target page, context or browser has been closed" error.
Key improvements:
- Clear separation of session-specific logic with comments
- New sessions create dedicated contexts instead of sharing
- Proper session storage with timestamps for TTL management
This implementation aligns perfectly with the PR objectives to enforce strict session isolation.
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 967-983: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
1-21
: LGTM: Improved import organization.The import formatting and organization improvements enhance code readability and maintain consistency with the new flake8 configuration.
Also there is a bug in the browser_profiler.py Bug ReportTitle: Reported By: User Description: Steps to Reproduce:
Expected Behavior: Actual Behavior: Root Cause Analysis: Resolution: Affected Files:
--- a/crawl4ai/browser_profiler.py
+++ b/crawl4ai/browser_profiler.py
@@ -621,22 +621,22 @@
self.logger.info(f"Debugging port: {debugging_port}", tag="CDP")
self.logger.info(f"Headless mode: {headless}", tag="CDP")
- # Create managed browser instance
- managed_browser = ManagedBrowser(
+ # Create a BrowserConfig object to pass to ManagedBrowser
+ browser_cfg = BrowserConfig(
browser_type=browser_type,
user_data_dir=profile_path,
headless=headless,
- logger=self.logger,
- debugging_port=debugging_port
+ debugging_port=debugging_port,
+ )
+
+ # Create managed browser instance
+ managed_browser = ManagedBrowser(
+ browser_config=browser_cfg,
+ logger=self.logger
)
# Set up signal handlers to ensure cleanup on interrupt
original_sigint = signal.getsignal(signal.SIGINT)
@@ -816,13 +816,19 @@
user_data_dir = os.path.join(self.builtin_browser_dir, "user_data")
os.makedirs(user_data_dir, exist_ok=True)
- # Create managed browser instance
- managed_browser = ManagedBrowser(
+ # Create a BrowserConfig object to pass to ManagedBrowser
+ browser_cfg = BrowserConfig(
browser_type=browser_type,
user_data_dir=user_data_dir,
headless=headless,
+ debugging_port=debugging_port,
+ )
+
+ # Create managed browser instance
+ managed_browser = ManagedBrowser(
+ browser_config=browser_cfg,
logger=self.logger,
- debugging_port=debugging_port
)
try: |
# Update last-used timestamp | ||
self.sessions[crawlerRunConfig.session_id] = (context, page, time.time()) | ||
return page, context | ||
# --- START: Session-specific logic --- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the fix for the session failure.
Summary
This pull request addresses a critical bug where the BrowserManager would improperly share a single browser context across multiple, distinct user sessions. This led to unpredictable behavior, most notably causing active sessions to fail when another session was terminated.
The fix enforces strict context isolation, ensuring that every new session is provisioned with its own dedicated browser context. This change guarantees session integrity and resolves the "Target page, context or browser has been closed" error observed in tests.
List of files changed and why
crawl4ai/browser_manager.py:
Refactored the get_page() method to create a new, dedicated BrowserContext whenever a new session_id is provided. This prevents context sharing between sessions.
Removed debugging logs that were added to trace the issue.
tests/browser/test_playwright_strategy.py:
Removed unused imports and corrected minor linter errors for code quality.
Replaced bare except: clauses with specific exception handling for more robust tests.
The test logic was slightly improved to more accurately reflect the state after a session is killed, though the primary fix was in the manager itself.
How Has This Been Tested?
The changes were verified by running the test_playwright_strategy.py test suite. Specifically, the test_playwright_session_management test, which previously failed reliably, now passes.
The test procedure is as follows:
Launch the test script: python tests/browser/test_playwright_strategy.py
The script initializes two separate sessions (session1 and session2), each with its own browser context.
It then kills session1.
Finally, it verifies that session2 remains fully functional, confirming that its context was not affected by the termination of session1.
Checklist:
[x] My code follows the style guidelines of this project
[x] I have performed a self-review of my own code
[ ] I have commented my code, particularly in hard-to-understand areas
[ ] I have made corresponding changes to the documentation
[x] I have added/updated unit tests that prove my fix is effective or that my feature works
[x] New and existing unit tests pass locally with my changes
Summary by CodeRabbit
.cursor/
directory from version control.