# Owl Browser — Full Documentation

This file contains the complete text content from Owl Browser's key documentation pages, concatenated for LLM ingestion.

Site: https://owlbrowser.net | By: Olib AI (https://olib.ai)

---

# Owl Browser — Product Overview

**The browser engine for automation at scale.**

Owl Browser is a high-performance browser engine built on a custom Chromium/CEF build, with C++ source-level fingerprint spoofing compiled directly into the Blink renderer. It is designed as a self-hosted replacement for Playwright and Puppeteer. The browser combines advanced antidetect capabilities (27 C++ override modules, 31 Chromium patches) with natural language interaction, enabling developers to build undetectable automation at scale. Because spoofing is implemented at the Chromium source level, spoofed values are produced by the same code paths as genuine browser properties — making them undetectable by JavaScript introspection, toString() analysis, or prototype chain checks.

The browser features a built-in vision LLM for intelligent page understanding, 170+ automation tools for comprehensive control, and fingerprint virtualization that defeats modern bot detection systems. Deploy as a Docker container with a built-in React control panel, REST API, WebSocket, and MCP integration.

**Website**: https://www.owlbrowser.net | **By**: Olib AI (https://www.olib.ai)

## Key Features

- **170+ Automation Tools** - Complete browser control via REST API, WebSocket, and MCP protocol (157 MCP tools for AI agent integration)
- **Natural Language Selectors** - Click, type, and interact using descriptions like "search button" or "email input" instead of brittle CSS selectors
- **Built-in Vision LLM** - On-device llama.cpp server with Qwen3-VL-2B model for page understanding, CAPTCHA solving, and intelligent automation
- **Third-Party LLM Support** - Connect to OpenAI, Claude, or any OpenAI-compatible API
- **Undetectable Stealth** - C++ source-level fingerprint spoofing compiled into a custom Chromium build (27 override modules, 31 patches), no JavaScript injection — spoofed values come from the same Blink code paths as real values, passing fingerprint.com and similar services
- **Docker-First Deployment** - Production-ready container with nginx, React control panel, TOR integration, and s6-overlay process supervision
- **Multi-Context Isolation** - Run up to 256 isolated browser sessions with unique fingerprints from a single instance
- **64-Socket Parallel IPC** - Concurrent command execution via a pool of Unix domain socket connections
- **Content Extraction** - Readability, HTML-to-Markdown, JSON extraction, site-specific templates, and full-site crawling
- **Cross-Platform** - Native builds for macOS (arm64/x64), Linux (x64), and Windows (x64)
- **SDKs** - Official Node.js and Python SDKs with async-first design and dynamic OpenAPI method generation

## Architecture Overview

Owl Browser follows a modular architecture with four layers: client applications, HTTP middleware, browser core, and container infrastructure.

```
+====================================================================+
|                          Docker Container                          |
|                                                                    |
|  +--------------------------------------------------------------+  |
|  |  nginx (port 80/443)                  s6-overlay supervised  |  |
|  |  - React Control Panel (/)                                   |  |
|  |  - API proxy (/execute/*, /ws, /tools, /health)              |  |
|  |  - TLS termination, gzip, security headers                   |  |
|  +-------------------------------+------------------------------+  |
|                                  |                                 |
|  +-------------------------------v------------------------------+  |
|  |  HTTP Server (port 8080)                    C99, poll-based  |  |
|  |  +--------------------------------------------------------+  |  |
|  |  | REST API | WebSocket | Auth | Rate Limit | CORS | Help |  |  |
|  |  +--------------------------------------------------------+  |  |
|  |  | Async IPC Layer (64-socket pool, request ID matching)  |  |  |
|  |  +--------------------------------------------------------+  |  |
|  +-------------------------------+------------------------------+  |
|                                  |                                 |
|  +-------------------------------v------------------------------+  |
|  |  Owl Browser Core                            CEF (Chromium)  |  |
|  |  +----------+ +----------+ +----------+ +----------+         |  |
|  |  | Context  | | Context  | | Context  | | Context  |         |  |
|  |  |    1     | |    2     | |    3     | |   ...    |         |  |
|  |  +----------+ +----------+ +----------+ +----------+         |  |
|  |                                                              |  |
|  |  +--------------+ +--------------+ +------------------+      |  |
|  |  | Stealth      | | AI Layer     | | Media Layer      |      |  |
|  |  | - VM Profiles| | - llama.cpp  | | - Recording      |      |  |
|  |  | - GPU Virt.  | | - Vision LLM | | - Live Streaming |      |  |
|  |  | - Spoof Mgr  | | - NLA Engine | | - Screenshots    |      |  |
|  |  +--------------+ +--------------+ +------------------+      |  |
|  +--------------------------------------------------------------+  |
|                                                                    |
|  +---------------------+                                           |
|  |  TOR (SOCKS: 9050)  |  s6-overlay supervised                    |
|  +---------------------+                                           |
+====================================================================+
                                       |
      +---------------+----------------+------------------+
      |               |                |                  |
  +---v---+     +-----v-----+    +-----v------+    +------v-----+
  |Node.js|     |  Python   |    |    MCP     |    |   Direct   |
  |  SDK  |     |    SDK    |    |   Client   |    |    IPC     |
  +-------+     +-----------+    +------------+    +------------+
```

### Browser Core

The browser core is built on a custom CEF (Chromium Embedded Framework) compiled from patched Chromium source:

- Multi-process architecture (browser, renderer, GPU processes)
- Off-screen rendering for headless operation
- IPC via Unix domain sockets or stdin/stdout pipes
- Context isolation with independent cookie jars and fingerprints
- Up to 256 concurrent browser contexts

### Stealth Layer (Chromium Source-Level)

All fingerprint spoofing is implemented as C++ modifications compiled directly into Chromium's Blink renderer:

- 27 C++ override modules covering navigator, canvas, WebGL, audio, fonts, WebRTC, timezone, and more
- 31 minimal hook patches that wire Chromium source files to the spoofing overlay
- VirtualMachine profiles define complete browser identities (delivered via CEF IPC)
- Spoofed values come from the same code paths as real values — undetectable by JS introspection
- Worker coverage: Dedicated, Shared, and Service Workers receive per-context configs

### AI Layer

On-device AI for intelligent automation:

- OwlLlamaServer manages model lifecycle
- OwlSemanticMatcher finds elements by description
- OwlNLA executes natural language commands
- OwlAIIntelligence provides page analysis

### Media Layer

Video and image capture:

- OwlVideoRecorder captures to MP4 via FFmpeg
- OwlLiveStreamer provides MJPEG streams
- Shared memory frame buffer for efficient capture
- Native screenshot capture with zoom support

## Deployment

### Docker Container

The production deployment runs as a Docker container with s6-overlay managing four services:

| Service | Type | Description |
|---------|------|-------------|
| `owl-init-config` | oneshot | Secrets injection, TLS setup, license validation |
| `tor` | longrun | TOR SOCKS proxy (port 9050) and control (port 9051) |
| `owl-http-server` | longrun | HTTP API server with embedded browser process |
| `nginx` | longrun | Reverse proxy, TLS termination, serves React panel |

```bash
# Build with BuildKit secrets (required)
DOCKER_BUILDKIT=1 docker build \
  --secret id=owl_nonce_hmac_secret,src=./secrets/nonce_hmac_secret.txt \
  --secret id=owl_vm_profile_db_pass,src=./secrets/vm_profile_db_pass.txt \
  -t olib-browser:latest -f docker/Dockerfile .
```

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `OWL_HTTP_TOKEN` | *required* | Bearer token for API authentication |
| `OWL_PANEL_PASSWORD` | *required* | Control panel login password |
| `OWL_HTTP_PORT` | 8080 | HTTP server port (internal) |
| `OWL_PANEL_PORT` | 80 | Nginx port (external) |
| `OWL_HTTP_MAX_CONNECTIONS` | 100 | Max concurrent HTTP connections |
| `OWL_HTTP_TIMEOUT` | 30000 | Request timeout (ms) |
| `OWL_BROWSER_TIMEOUT` | 60000 | Browser command timeout (ms) |
| `OWL_RATE_LIMIT_ENABLED` | true | Enable rate limiting |
| `OWL_RATE_LIMIT_REQUESTS` | 1000 | Requests per window |
| `OWL_RATE_LIMIT_WINDOW` | 60 | Window duration (seconds) |
| `OWL_RATE_LIMIT_BURST` | 200 | Burst allowance |
| `OWL_CORS_ENABLED` | false | Enable CORS headers |
| `OWL_TLS_ENABLED` | true | Enable HTTPS |
| `OWL_DEV_MODE` | false | Disable auth and TLS validation |

## HTTP Server & API

The HTTP server is a C99 middleware layer bridging REST/WebSocket clients to the browser process via IPC.
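
The rate-limit variables in the Environment Variables table can be read as the parameters of a token bucket. The sketch below is illustrative only. It assumes the refill rate is `OWL_RATE_LIMIT_REQUESTS / OWL_RATE_LIMIT_WINDOW` tokens per second and that the bucket capacity equals `OWL_RATE_LIMIT_BURST`, neither of which this documentation confirms. `TokenBucket`, `bucket_from_env`, and `check` are hypothetical names, and the real server is written in C99, not Python.

```python
import os
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Illustrative token bucket; not the server's actual implementation."""
    rate_per_sec: float  # tokens added per second
    capacity: float      # maximum number of tokens the bucket can hold
    tokens: float = 0.0  # tokens currently available
    last: float = 0.0    # timestamp (seconds) of the last refill

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def bucket_from_env(env=os.environ) -> TokenBucket:
    """Build a full bucket from the documented variables and their defaults."""
    requests = int(env.get("OWL_RATE_LIMIT_REQUESTS", "1000"))
    window = int(env.get("OWL_RATE_LIMIT_WINDOW", "60"))
    burst = int(env.get("OWL_RATE_LIMIT_BURST", "200"))
    # Assumption: refill rate = requests / window, capacity = burst.
    return TokenBucket(rate_per_sec=requests / window, capacity=burst, tokens=burst)


# One bucket per client IP (hypothetical bookkeeping for per-IP tracking).
buckets: dict[str, TokenBucket] = {}


def check(ip: str, now: float) -> bool:
    """Return True if a request from `ip` at time `now` should be admitted."""
    if ip not in buckets:
        buckets[ip] = bucket_from_env()
    return buckets[ip].allow(now)
```

Under these assumptions and the default values, a single client could send 200 requests at once and then sustain roughly 16 more per second.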

### Endpoints

| Endpoint | Auth | Description |
|----------|------|-------------|
| `GET /health` | No | Health check |
| `GET /tools` | Yes | List all tools with schemas |
| `POST /execute/{tool_name}` | Yes | Execute a browser tool |
| `WS /ws` | Yes | WebSocket for bidirectional communication |
| `GET /video/stream/{context_id}` | Yes | MJPEG live stream |
| `GET /video/recording/download/{filename}` | Yes | Download recorded video |
| `POST /help` | Yes | Semantic tool search (natural language) |
| `GET /openapi.json` | No | OpenAPI specification |
| `GET /agent-skills.md` | Yes | Full tool reference in MCP format |

### Tool Categories

| Category | Tools | Description |
|----------|-------|-------------|
| Context | 3 | Create, close, list browser contexts |
| Navigation | 7 | Navigate, reload, back, forward, history state |
| Interaction | 16 | Click, type, pick, drag, hover, keyboard |
| Content | 7 | Screenshot, text, HTML, markdown, JSON extraction |
| AI/LLM | 9 | Summarize, query, NLA, AI click/type/extract |
| Scroll | 5 | Scroll by pixels, to element, top, bottom |
| Wait | 5 | Wait for selector, network, URL, function, timeout |
| Video | 8 | Record, pause, resume, stop, live stream |
| CAPTCHA | 5 | Detect, classify, solve text/image/auto |
| Cookies | 3 | Get, set, delete cookies |
| Proxy | 4 | Set, get status, connect, disconnect |
| Profile | 6 | Create, load, save, get, update cookies, context info |
| Network | 6 | Interception rules, logging, mock responses |
| Downloads | 5 | Set path, list, wait, cancel downloads |
| Dialogs | 5 | Handle alerts, confirms, prompts |
| Tabs | 7 | List, create, switch, close tabs |
| Frames | 3 | List, switch to frame, switch to main |
| Elements | 10 | Visibility, enabled, checked, attributes, position |
| Demographics | 5 | Location, datetime, weather, homepage |
| License | 4 | Status, info, fingerprint, add license |

### API Access Methods

**HTTP REST API**

```
POST /execute/{tool_name}
Authorization: Bearer <token>
Content-Type: application/json

{"param1": "value1", "param2": "value2"}
```

**WebSocket**

```json
{"id": 1, "method": "browser_navigate", "params": {"context_id": "ctx_1", "url": "https://example.com"}}
```

**MCP Protocol**

```
Tools are exposed via MCP for direct AI agent integration.
Configure Claude Desktop or other MCP clients with the owl-browser-mcp server.

npx @olib-ai/owl-browser-mcp
```

**Native IPC**

```
Direct communication via Unix domain socket (/tmp/owl_browser_{id}.sock)
or stdin/stdout pipes.
```

## SDKs & Integration

### Node.js SDK (@olib-ai/owl-browser-sdk)

**Version**: 2.0.4 | **Min Node**: 18.0.0

```bash
npm install @olib-ai/owl-browser-sdk
```

### Python SDK (owl-browser)

**Version**: 2.0.4 | **Min Python**: 3.12

```bash
pip install owl-browser
```

### MCP Server (@olib-ai/owl-browser-mcp)

**Version**: 2.0.4 | **Protocol**: MCP 2025-11-25

- 157 browser automation tools exposed via MCP
- Compatible with Claude Desktop, Cursor, and other MCP clients

```bash
npx @olib-ai/owl-browser-mcp
```

## Browser Capabilities

### Fingerprint Spoofing

Owl Browser implements comprehensive fingerprint protection across all browser vectors at the Chromium C++ source level. Each browser context receives a complete, internally-consistent virtual machine profile. Because spoofing is compiled directly into Chromium's Blink renderer, spoofed values are produced by the same code paths as genuine values and cannot be detected through JavaScript introspection, toString() leaks, or prototype chain analysis.

**Canvas Fingerprint**: Deterministic noise applied to all canvas operations — HTMLCanvasElement.toDataURL, toBlob, getImageData, OffscreenCanvas, WebGL canvas sources. Per-context seeds ensure consistent fingerprints within a session while varying across contexts.
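
As a rough illustration of per-context seeding, the sketch below derives a small offset for each pixel channel from a hash of the seed and the pixel index. This is not Owl Browser's algorithm, which lives in C++ inside Blink and is not described in this document. `pixel_noise` and `apply_noise` are hypothetical helpers that only demonstrate the stated property: the same seed always produces the same output, while a different seed produces a different one.

```python
import hashlib


def pixel_noise(seed: int, index: int) -> int:
    """Return -1, 0, or +1, derived deterministically from (seed, index)."""
    digest = hashlib.sha256(f"{seed}:{index}".encode()).digest()
    return digest[0] % 3 - 1


def apply_noise(pixels: list[int], seed: int) -> list[int]:
    """Perturb 8-bit channel values, clamping results to the 0..255 range."""
    return [
        min(255, max(0, value + pixel_noise(seed, i)))
        for i, value in enumerate(pixels)
    ]


# Stand-in for channel data read back from a canvas (hypothetical values).
pixels = [120, 64, 200, 255, 0, 33, 90, 17] * 8

same_context_a = apply_noise(pixels, seed=1001)  # first read in context A
same_context_b = apply_noise(pixels, seed=1001)  # second read in context A
other_context = apply_noise(pixels, seed=2002)   # read in a different context
```

Here `same_context_a` and `same_context_b` are identical, so repeated reads within one context would hash to the same fingerprint, while `other_context` differs by at most one unit per channel.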

**WebGL Fingerprint**: C++ source-level GPU spoofing in the Chromium renderer provides undetectable WebGL fingerprints — complete GPU identity spoofing (vendor, renderer, unmasked values), all getParameter values matched to real GPU profiles, shader precision format emulation, extension lists matching target hardware, render output normalization, timing normalization to defeat DrawnApart-style attacks. Profiles for Intel (Gen9 through Arc), NVIDIA (Pascal through Blackwell), AMD (GCN through RDNA4), and Apple Silicon (M1 through M4).

**Audio Fingerprint**: AudioContext fingerprinting handled through property spoofing — sampleRate, baseLatency, outputLatency, AudioDestinationNode channel configuration, OfflineAudioContext rendering normalization, deterministic noise injection in getChannelData results.

**Font Fingerprint**: Platform-consistent font lists via DOM measurement hooks, canvas measureText normalization, document.fonts API interception, queryLocalFonts blocking.

**Navigator Properties**: Complete navigator object spoofing — userAgent, platform, hardwareConcurrency, deviceMemory, vendor, language, webdriver (always false), maxTouchPoints, Client Hints.

**Screen Properties**: screen.width/height, screen.availWidth/availHeight, screen.colorDepth, window.devicePixelRatio, screen.orientation, matchMedia queries.

**Timezone and Locale**: Full Date and Intl API coverage — Date.prototype methods, Intl.DateTimeFormat, automatic timezone detection from proxy geolocation, DST handling.

### Stealth Features

- **WebDriver Detection**: navigator.webdriver returns undefined (not false), no webdriver in navigator prototype, headless detection signals eliminated, Playwright/Puppeteer detection bypassed
- **CDP Detection Bypass**: Runtime.evaluate artifacts removed, HeapProfiler detection prevented
- **Headless Detection Bypass**: Chrome app property presence, plugin array population, WebGL rendering context validation
- **Iframe Propagation**: Stealth patches automatically propagate to same-origin and cross-origin iframes, nested iframes, Web Workers (Dedicated, Shared, Service Workers)

### AI Features

**Built-in Vision LLM**: llama.cpp server with Metal/CUDA acceleration, Qwen3-VL-2B model, 32K context window, 16 parallel request slots.

**Natural Language Selectors**: Find and interact with elements using descriptions. Semantic element matching with synonym expansion, multi-scorer ensemble for 90%+ accuracy, LLM fallback for ambiguous matches.

**Page Understanding**: Automatic page summarization with caching, natural language queries about page content, structure extraction, Readability-based main content extraction.

**CAPTCHA Solving**: Heuristic detection, classification (text, image-selection, checkbox, puzzle), text CAPTCHA via OCR, image grid CAPTCHA solving (reCAPTCHA, hCaptcha, Cloudflare), auto-submit and retry.

**Natural Language Automation (NLA)**: Execute complex tasks from plain English — command parsing to action plans, page state analysis, multi-step action sequences, error recovery and retry.

### Content Extraction

- **Readability**: Article detection and clean text extraction (Mozilla Readability algorithm port)
- **HTML-to-Markdown**: Clean Markdown with heading hierarchy, tables, code blocks, links, images
- **JSON Extraction**: Structured data using CSS selectors and custom rules
- **Site-Specific Templates**: Pre-built extractors for major websites (Amazon, Wikipedia, etc.)
- **Site Crawling**: Breadth-first crawling with configurable depth, URL filtering, content extraction

### Proxy and Network

**Proxy Types**: HTTP, HTTPS, SOCKS4, SOCKS5, SOCKS5h (remote DNS), Tor integration with circuit isolation.

**WebRTC Leak Protection**: RTCPeerConnection blocking/filtering, STUN/TURN server blocking, ICE candidate filtering.

**DNS Leak Protection**: SOCKS5h for remote DNS resolution, DNS prefetch disabling, DNS-over-HTTPS routing.

**Proxy Geolocation Sync**: Automatic timezone spoofing from GeoIP, language preference matching.

**Network Interception**: URL pattern matching (glob and regex), block/mock/modify/redirect actions, response body replacement, header modification, network logging.

**Resource Blocking**: 1000+ known ad network domains, analytics services (Google Analytics, Mixpanel, Amplitude, Segment), tracker domains, fingerprinting scripts.

### Profile Management

- Save and restore complete browser state: fingerprint seeds, cookies, local/session storage, IndexedDB
- AES-256-GCM encryption at rest with PBKDF2 key derivation
- Pre-built VM profiles: Windows 10/11 with Intel/NVIDIA/AMD GPUs, Ubuntu 22.04/24.04, macOS 14/15 with Apple Silicon and Intel

### Video Capabilities

- **Recording**: H.264 via FFmpeg, configurable frame rate (default 30fps), pause/resume support
- **Live Streaming**: MJPEG endpoint, configurable quality and frame rate, multiple concurrent viewers

## Security

SOC2-aligned security controls:

- **AES-256-GCM** for profile data at rest
- **RSA-2048** for license signatures
- **HMAC-SHA256** for request authentication
- **TLS 1.2/1.3** for all network communication
- **Bearer Token** and **JWT** (RS256/RS384/RS512) authentication
- Non-root container execution (uid 1000), capability dropping, read-only root filesystem
- Token bucket rate limiting with per-IP tracking

## License Types

| Type | Description |
|------|-------------|
| Trial | 14-day $0.99 evaluation license with the same access tier as Developer |
| Starter | Monthly subscription (3 seats) |
| Business | Annual license (10 seats) |
| Enterprise | Annual license (50 seats, priority support) |
| Developer | Individual developer license |

Contact: sales@olib.ai

---

# Owl Browser — Agent Skills Reference

Complete reference for AI agents to automate browsers using Owl Browser.

name: owl-browser
description: Automates browsers using Owl Browser, a high-performance browser engine with 158 tools for web navigation, element interaction, text extraction, screenshots, fingerprint management, CAPTCHA solving, proxy management, video recording, and cookie management.
compatibility: Requires a running Owl Browser instance (Docker or native). Network access needed to connect to the Owl Browser HTTP server.
version: "1.0.9"
website: https://www.owlbrowser.net

## Core Concepts

- **Context**: An isolated browser instance with its own fingerprint, cookies, and proxy. Create with `browser_create_context`, close with `browser_close_context`. Every tool call (except context creation/listing) requires a `context_id`.
- **Tool**: An atomic browser action (navigate, click, type, screenshot, etc.). Execute via SDK methods or REST `POST /execute/{tool_name}`.
- **Flow**: A JSON sequence of tool steps executed in order with variable resolution and expectations. Portable across SDKs.
- **Antidetect**: Each context gets a unique, realistic browser fingerprint (canvas, WebGL, audio, fonts, navigator, etc.) from a database of 100+ real device profiles.
- **Smart Selectors**: Any tool that takes a `selector` parameter accepts three formats — the browser auto-detects which one you're using:
  - **CSS selector**: `#submit-btn`, `.nav-link`, `input[name=email]`
  - **Coordinates**: `100x200` (clicks at x=100, y=200)
  - **Natural language**: `login button`, `email input`, `search icon`

  Natural language selectors use a built-in semantic matcher that scores page elements by text similarity, ARIA labels, placeholders, and visual context — no CSS inspection required. Prefer natural language when you don't know the exact selector; use CSS when you need precision.

## Environment Setup

Store your connection settings in a `.env` file (never commit this file):

```bash
# .env
OWL_ENDPOINT=https://your-domain.com
OWL_TOKEN=your-secret-token
```

## Python SDK

### Installation

```bash
pip install owl-browser python-dotenv
```

### Async Usage (Recommended)

```python
import os, asyncio
from dotenv import load_dotenv
from owl_browser import OwlBrowser, RemoteConfig

load_dotenv()

async def main():
    config = RemoteConfig(
        url=os.environ["OWL_ENDPOINT"],
        token=os.environ["OWL_TOKEN"]
    )
    async with OwlBrowser(config) as browser:
        ctx = await browser.create_context()
        context_id = ctx["context_id"]

        await browser.navigate(context_id=context_id, url="https://example.com")
        await browser.wait_for_network_idle(context_id=context_id)
        await browser.click(context_id=context_id, selector="button#submit")

        # Natural language selector — no need to inspect DOM
        await browser.type(context_id=context_id, selector="email input", text="user@example.com")

        screenshot = await browser.screenshot(context_id=context_id)
        text = await browser.extract_text(context_id=context_id, selector="h1")
        markdown = await browser.get_markdown(context_id=context_id)

        await browser.close_context(context_id=context_id)

asyncio.run(main())
```

## Node.js SDK

### Installation

```bash
npm install @olib-ai/owl-browser-sdk dotenv
```

### Quick Start

```typescript
import 'dotenv/config';
import { OwlBrowser } from '@olib-ai/owl-browser-sdk';

const browser = new OwlBrowser({
  url: process.env.OWL_ENDPOINT!,
  token: process.env.OWL_TOKEN!,
  apiPrefix: '' // '' for direct, '/api' for nginx proxy
});

await browser.connect();

const ctx = await browser.createContext();
const contextId = ctx.context_id;

await browser.navigate({ context_id: contextId, url: 'https://example.com' });
await browser.waitForNetworkIdle({ context_id: contextId });
await browser.click({ context_id: contextId, selector: 'button#submit' });

// Natural language — semantic matcher finds the right element
await browser.type({ context_id: contextId, selector: 'email input', text: 'user@example.com' });

const screenshot = await browser.screenshot({ context_id: contextId });
const markdown = await browser.getMarkdown({ context_id: contextId });

await browser.closeContext({ context_id: contextId });
await browser.close();
```

## REST API (Direct)

```
POST /api/execute/{tool_name}
Authorization: Bearer <token>
Content-Type: application/json
```

```bash
# Create context
curl -X POST -H "Authorization: Bearer $OWL_TOKEN" \
  -H "Content-Type: application/json" \
  $OWL_ENDPOINT/execute/browser_create_context

# Navigate
curl -X POST -H "Authorization: Bearer $OWL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"context_id": "ctx_000001", "url": "https://example.com"}' \
  $OWL_ENDPOINT/execute/browser_navigate

# Screenshot
curl -X POST -H "Authorization: Bearer $OWL_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"context_id": "ctx_000001"}' \
  $OWL_ENDPOINT/execute/browser_screenshot
```

## Flow JSON Format

Flows are portable JSON files that define a sequence of browser automation steps. They work with both the Python and Node.js SDKs.

```json
{
  "name": "My Automation Flow",
  "description": "Description of what this flow does",
  "steps": [
    { "type": "browser_navigate", "url": "https://example.com" },
    { "type": "browser_wait_for_network_idle" },
    { "type": "browser_type", "selector": "#email", "text": "user@example.com" },
    { "type": "browser_click", "selector": "#submit" },
    { "type": "browser_extract_text", "selector": ".result", "expected": { "contains": "Success" } }
  ]
}
```

Use `${prev}` to reference the previous step's result:

```json
{"type": "browser_navigate", "url": "${prev.url}/about"}
```

Supported expectations: `equals`, `contains`, `length`, `greaterThan`, `lessThan`, `notEmpty`, `matches`, `field`.

## Tools Reference

Tool name mapping:

- REST API / Flow JSON: use as-is (e.g., `browser_navigate`)
- Python SDK: strip `browser_` prefix → `navigate(context_id=cid, url=...)`
- Node.js SDK: strip `browser_` prefix, camelCase or snake_case → `browser.navigate({context_id, url})`

All browser tools require `context_id` (string) as first param. Exceptions: `browser_create_context`, `browser_list_contexts`, and all `http_*` tools.

### Context Management

- `browser_create_context` — Create a new isolated browser context with its own cookies, storage, fingerprint, and optional proxy/LLM configuration. Params: llm_enabled, llm_use_builtin, llm_endpoint, llm_model, llm_api_key, profile_path, proxy_type, proxy_host, proxy_port, proxy_username, proxy_password, proxy_stealth, is_tor, resource_blocking, os, gpu, timezone, screen_size
- `browser_go` — One-shot browser navigation. Creates a new context, navigates, extracts content, closes context. Params: url (required), wait_until, timeout, output, os, use_tor
- `browser_close_context` — Close a browser context and release all resources
- `browser_list_contexts` — List all currently active browser contexts

### Navigation

- `browser_navigate` — Navigate to a URL. Params: url (required), wait_until, timeout
- `browser_reload` — Reload the current page.
  Params: ignore_cache, wait_until, timeout
- `browser_go_back` — Navigate back in history. Params: wait_until, timeout
- `browser_go_forward` — Navigate forward in history. Params: wait_until, timeout
- `browser_can_go_back` — Check if navigation back is possible
- `browser_can_go_forward` — Check if navigation forward is possible
- `browser_set_content` — Set the page's HTML content directly. Params: html (required)

### Interaction

- `browser_click` — Click an element via CSS selector, XY coordinates, or natural language. Params: selector (required), hold_ms, index
- `browser_type` — Type text with human-like keystroke simulation. Params: selector, text (required), index
- `browser_pick` — Select an option from a dropdown. Params: selector (required), value (required)
- `browser_press_key` — Press a keyboard key. Params: key (required)
- `browser_submit_form` — Submit the focused form
- `browser_drag_drop` — Drag from start to end coordinates. Params: start_x, start_y, end_x, end_y (all required), mid_points
- `browser_html5_drag_drop` — HTML5 drag and drop using DragEvent. Params: source_selector (required), target_selector (required)
- `browser_mouse_move` — Move mouse along a natural curved path. Params: start_x, start_y, end_x, end_y (all required), steps, stop_points
- `browser_hover` — Hover over an element. Params: selector (required), index
- `browser_double_click` — Double-click an element. Params: selector (required), index
- `browser_right_click` — Right-click (context menu). Params: selector (required), index
- `browser_clear_input` — Clear text from an input field. Params: selector (required), index
- `browser_focus` — Set focus to an element. Params: selector (required), index
- `browser_blur` — Remove focus from an element. Params: selector (required)
- `browser_select_all` — Select all text in an input. Params: selector (required)
- `browser_keyboard_combo` — Press a key combination with modifiers.
  Params: combo (required)
- `browser_upload_file` — Upload files to a file input. Params: selector (required), file_paths (required)

### Content Extraction

- `browser_extract_text` — Extract visible text from page or element. Params: selector, regex, regex_group, index
- `browser_screenshot` — Capture PNG screenshot. Params: mode (viewport/element/fullpage), selector, scale
- `browser_highlight` — Visually highlight an element. Params: selector (required), border_color, background_color, index
- `browser_show_grid_overlay` — Display XY coordinate grid overlay. Params: horizontal_lines, vertical_lines, line_color, text_color
- `browser_get_html` — Extract HTML with configurable cleaning levels. Params: selector, clean_level (minimal/basic/aggressive)
- `browser_get_markdown` — Convert page content to clean Markdown. Params: include_links, include_images, max_length
- `browser_extract_site` — Crawl multiple pages of a website. Params: url (required), depth, max_pages, follow_external, output_format, include_images, include_metadata, exclude_patterns, timeout_per_page
- `browser_extract_site_progress` — Get progress of a site extraction job. Params: job_id (required)
- `browser_extract_site_result` — Get result of a completed extraction job. Params: job_id (required)
- `browser_extract_site_cancel` — Cancel a running extraction job. Params: job_id (required)
- `browser_extract_json` — Extract structured data as JSON. Params: template, selector
- `browser_detect_site` — Identify the type of website currently loaded
- `browser_list_templates` — List all available JSON extraction templates

### AI / LLM Tools

- `browser_summarize_page` — Generate structured AI summary of current page. Params: force_refresh
- `browser_query_page` — Ask a natural language question about page content. Params: query (required)
- `browser_llm_status` — Check if the LLM is ready to use
- `browser_nla` — Execute complex automation using natural language commands.
  Params: command (required)
- `browser_ai_click` — Click an element described in natural language using AI vision. Params: description (required)
- `browser_ai_type` — Type text into element described in natural language using AI vision. Params: description (required), text (required)
- `browser_ai_extract` — Extract specific information from the page using AI. Params: what (required)
- `browser_ai_query` — Ask a natural language question about the current page. Params: query (required)
- `browser_ai_analyze` — Perform comprehensive AI analysis of the current page
- `browser_find_element` — Find elements using natural language description. Params: description (required), max_results

### Scroll

- `browser_scroll_by` — Scroll by specified pixels. Params: y (required), x, verification_level
- `browser_scroll_to_element` — Scroll to bring element into view. Params: selector (required), index
- `browser_scroll_to_top` — Scroll to the top of the page
- `browser_scroll_to_bottom` — Scroll to the bottom of the page

### Wait

- `browser_wait_for_selector` — Wait for element to appear. Params: selector (required), timeout, index
- `browser_wait` — Pause execution for fixed milliseconds. Params: timeout (required)
- `browser_wait_for_network_idle` — Wait for network to be idle. Params: idle_time, timeout
- `browser_wait_for_function` — Wait for JavaScript condition. Params: js_function (required), polling, timeout
- `browser_wait_for_url` — Wait for URL to match pattern. Params: url_pattern (required), is_regex, timeout

### Page Info & Viewport

- `browser_get_page_info` — Get URL, title, meta, dimensions of current page
- `browser_get_page_map` — Get compact structured map of all interactive elements on the page. Params: intent, max_elements, region
- `browser_set_viewport` — Set viewport size.
  Params: width (required), height (required)
- `browser_reset_viewport` — Reset viewport to VM profile default
- `browser_zoom_in` — Zoom in 10%
- `browser_zoom_out` — Zoom out 10%
- `browser_zoom_reset` — Reset zoom to 100%
- `browser_get_console_log` — Read browser console logs. Params: level, filter, limit
- `browser_clear_console_log` — Clear all console logs

### Element Inspection

- `browser_is_visible` — Check if element is visible. Params: selector (required), index
- `browser_is_enabled` — Check if element is enabled. Params: selector (required), index
- `browser_is_checked` — Check if checkbox/radio is checked. Params: selector (required), index
- `browser_is_editable` — Check if element is editable. Params: selector (required), index
- `browser_count_elements` — Count elements matching a CSS selector. Params: selector (required)
- `browser_dispatch_event` — Dispatch a DOM event on an element. Params: selector (required), event_type (required), bubbles
- `browser_get_attribute` — Get HTML attribute value. Params: selector (required), attribute (required), index
- `browser_get_bounding_box` — Get position and size of element. Params: selector (required), index
- `browser_evaluate` — Execute arbitrary JavaScript. Params: script, expression, return_value
- `browser_get_element_at_position` — Get DOM element at XY coordinates. Params: x (required), y (required)
- `browser_get_interactive_elements` — Find all interactive elements on the page
- `browser_get_blocker_stats` — Get ad/tracker/analytics blocking statistics

### Clipboard

- `browser_clipboard_read` — Read text from system clipboard
- `browser_clipboard_write` — Write text to system clipboard. Params: text (required)
- `browser_clipboard_clear` — Clear the system clipboard

### Frames

- `browser_list_frames` — List all frames and iframes on the page
- `browser_switch_to_frame` — Switch to an iframe.
Params: frame_selector (required)
- `browser_switch_to_main_frame` — Switch back to the main frame

### Video Recording

- `browser_start_video_recording` — Begin recording browser session. Params: fps, codec
- `browser_pause_video_recording` — Pause recording
- `browser_resume_video_recording` — Resume recording
- `browser_stop_video_recording` — Stop and save recording
- `browser_get_video_recording_stats` — Get recording statistics
- `browser_download_video_recording` — Get download URL for recording
- `browser_start_live_stream` — Start MJPEG live stream. Params: fps, quality
- `browser_stop_live_stream` — Stop live stream
- `browser_get_live_stream_stats` — Get live stream statistics
- `browser_list_live_streams` — List all active live streams
- `browser_get_live_frame` — Get latest frame as base64 image

### CAPTCHA

- `browser_detect_captcha` — Detect if page has a CAPTCHA
- `browser_classify_captcha` — Identify the type of CAPTCHA
- `browser_solve_text_captcha` — Solve text-based CAPTCHA via OCR. Params: max_attempts
- `browser_solve_image_captcha` — Solve image-selection CAPTCHA. Params: max_attempts, provider
- `browser_solve_captcha` — Auto-detect and solve any CAPTCHA type. Params: max_attempts, provider

### Cookies

- `browser_get_cookies` — Get all cookies. Params: url
- `browser_set_cookie` — Set a cookie. Params: url (required), name (required), value (required), domain, path, secure, httpOnly, sameSite, expires
- `browser_delete_cookies` — Delete cookies. Params: url, cookie_name
- `browser_get_headers` — Get HTTP response headers. Params: url

### Proxy

- `browser_set_proxy` — Configure proxy settings.
Params: type (required), host (required), port (required), username, password, stealth, block_webrtc, spoof_timezone, spoof_language, is_tor
- `browser_get_proxy_status` — Get current proxy configuration and status
- `browser_connect_proxy` — Enable and connect the configured proxy
- `browser_disconnect_proxy` — Disable proxy and revert to direct connection
- `browser_set_timezone` — Override timezone at runtime. Params: timezone (required)

### Profiles

- `browser_create_profile` — Create new profile with randomized fingerprints. Params: name
- `browser_load_profile` — Load saved profile into context. Params: profile_path (required)
- `browser_save_profile` — Save context state to profile. Params: profile_name (required)
- `browser_download_profile` — Download a saved profile file. Params: profile_name (required)
- `browser_get_profile` — Get current profile state as JSON
- `browser_update_profile_cookies` — Update profile with current cookies
- `browser_get_context_info` — Get context info including VM profile and fingerprint hashes

### Network Interception

- `browser_add_network_rule` — Add interception rule. Params: url_pattern (required), action (required: allow/block/mock/redirect), is_regex, redirect_url, mock_body, mock_status, mock_content_type
- `browser_remove_network_rule` — Remove a rule by ID. Params: rule_id (required)
- `browser_enable_network_interception` — Enable/disable interception. Params: enable (required)
- `browser_get_network_log` — Get captured network requests log
- `browser_get_network_rules` — List all interception rules
- `browser_clear_network_log` — Clear network log entries
- `browser_enable_network_logging` — Enable/disable logging. Params: enable (required)

### Downloads

- `browser_set_download_path` — Configure download directory.
Params: path (required)
- `browser_get_downloads` — List all downloads with status
- `browser_get_active_downloads` — Get in-progress downloads
- `browser_wait_for_download` — Wait for download to complete. Params: download_id (required), timeout
- `browser_cancel_download` — Cancel an in-progress download. Params: download_id (required)

### Dialogs

- `browser_set_dialog_action` — Configure automatic dialog handling. Params: dialog_type (required: alert/confirm/prompt/beforeunload), action (required: accept/dismiss/accept_with_text), prompt_text
- `browser_get_pending_dialog` — Check for pending JavaScript dialogs
- `browser_get_dialogs` — Get all dialog events in context
- `browser_handle_dialog` — Manually handle a dialog. Params: dialog_id (required), accept (required), response_text
- `browser_wait_for_dialog` — Wait for dialog to appear. Params: timeout

### Tabs

- `browser_get_tabs` — List all tabs in context
- `browser_switch_tab` — Switch to a tab. Params: tab_id (required)
- `browser_close_tab` — Close a tab. Params: tab_id (required)
- `browser_new_tab` — Open a new tab. Params: url
- `browser_get_active_tab` — Get the currently active tab
- `browser_get_tab_count` — Get number of open tabs
- `browser_get_blocked_popups` — Get list of blocked popup URLs
- `browser_set_popup_policy` — Configure popup handling. Params: policy (required: allow/block/new_tab/background)

### Demographics

- `browser_get_demographics` — Get user demographics based on IP geolocation
- `browser_get_location` — Get geographic location from IP
- `browser_get_datetime` — Get current date, time, and day of week
- `browser_get_weather` — Get current weather for detected location

### License

- `browser_get_license_status` — Check license validity
- `browser_get_license_info` — Get license details and seat information
- `browser_get_hardware_fingerprint` — Get hardware fingerprint for license binding
- `browser_add_license` — Add/activate a license.
Params: license_content (required)

### HTTP Client Tools (No context_id required)

- `http_request` — Make HTTP/HTTPS request. Params: url (required), method, headers, body, cookies, auth_type, auth_username, auth_password, auth_token, proxy_type, proxy_host, proxy_port, use_tor, follow_redirects, timeout, ssl_verify, user_agent, output
- `http_download` — Download file from URL. Params: url (required), output_path, headers, proxy_type, proxy_host, proxy_port, use_tor, timeout, resume, user_agent
- `http_session_create` — Create persistent HTTP session with cookie management. Params: headers, user_agent, follow_redirects, ssl_verify, proxy_type, proxy_host, proxy_port, use_tor
- `http_session_request` — Make request in a persistent session. Params: session_id (required), url (required), method, headers, body, auth_type, timeout, output
- `http_session_get_cookies` — Get session cookies. Params: session_id (required), url
- `http_session_set_cookies` — Import cookies into session. Params: session_id (required), cookies (required)
- `http_session_close` — Close and destroy a session. Params: session_id (required)
- `http_session_list` — List all active HTTP sessions

---

# Owl Browser — Python SDK

Async-first Python SDK for Owl Browser automation with dynamic OpenAPI method generation and flow execution support.
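To make "dynamic method generation" concrete, here is a minimal sketch of the general technique. It is not the SDK's actual implementation. It assumes method names are derived by dropping the `browser_` prefix from tool names, which matches how `browser_navigate` appears as `browser.navigate(...)` in this documentation. `DynamicClient`, `fake_executor`, and the hard-coded tool list are hypothetical stand-ins for the real client, transport, and OpenAPI schema.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical transport signature: (tool_name, params) -> awaitable result.
Executor = Callable[[str, dict[str, Any]], Awaitable[dict[str, Any]]]


class DynamicClient:
    """Illustrative only: exposes each known tool name as an async method."""

    def __init__(self, tool_names: list[str], executor: Executor) -> None:
        self._executor = executor
        # Assumed naming convention: drop the "browser_" prefix;
        # tools without that prefix keep their full name.
        self._methods = {name.removeprefix("browser_"): name for name in tool_names}

    def __getattr__(self, method: str) -> Callable[..., Awaitable[dict[str, Any]]]:
        # Python calls __getattr__ only when normal attribute lookup fails.
        try:
            tool = self._methods[method]
        except KeyError:
            raise AttributeError(f"unknown tool method: {method}") from None

        async def call(**params: Any) -> dict[str, Any]:
            # Forward keyword arguments as the tool's parameters.
            return await self._executor(tool, params)

        return call


async def fake_executor(tool: str, params: dict[str, Any]) -> dict[str, Any]:
    # Stand-in for a network call: echo what would have been sent.
    return {"tool": tool, "params": params}


client = DynamicClient(["browser_navigate", "http_request"], fake_executor)
result = asyncio.run(client.navigate(context_id="ctx_1", url="https://example.com"))
print(result["tool"])  # prints: browser_navigate
```

Resolving names lazily in `__getattr__` keeps the client in step with whatever tool list it was given, at the cost of static discoverability. A real SDK would likely pair this with generated type stubs.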

## Features

- **Dynamic Method Generation**: Methods are automatically generated from the OpenAPI schema
- **Async-First Design**: Built with asyncio for optimal performance
- **Sync Wrappers**: Convenience methods for non-async code
- **Flow Execution**: Execute test flows with variable resolution and expectations
- **Type Safety**: Full type hints with Python 3.12+ features
- **Connection Pooling**: Efficient HTTP connection management
- **Retry Logic**: Automatic retries with exponential backoff

## Installation

```bash
pip install owl-browser
```

For development:

```bash
pip install owl-browser[dev]
```

## Quick Start

### Connection Modes

```python
from owl_browser import OwlBrowser, RemoteConfig

# Production (via nginx proxy) - this is the default
# Uses /api prefix: https://your-domain.com/api/execute/...
config = RemoteConfig(
    url="https://your-domain.com",
    token="your-token"
)

# Development (direct to http-server on port 8080)
# No prefix: http://localhost:8080/execute/...
config = RemoteConfig(
    url="http://localhost:8080",
    token="test-token",
    api_prefix=""  # Empty string for direct connection
)
```

### Async Usage (Recommended)

```python
import asyncio
from owl_browser import OwlBrowser, RemoteConfig

async def main():
    config = RemoteConfig(
        url="https://your-domain.com",
        token="your-secret-token"
    )

    async with OwlBrowser(config) as browser:
        ctx = await browser.create_context()
        context_id = ctx["context_id"]

        await browser.navigate(context_id=context_id, url="https://example.com")
        await browser.click(context_id=context_id, selector="button#submit")
        screenshot = await browser.screenshot(context_id=context_id)
        text = await browser.extract_text(context_id=context_id, selector="h1")
        print(f"Page title: {text}")

        await browser.close_context(context_id=context_id)

asyncio.run(main())
```

### Sync Usage

```python
from owl_browser import OwlBrowser, RemoteConfig

config = RemoteConfig(
    url="http://localhost:8080",
    token="your-secret-token"
)

browser = OwlBrowser(config)
browser.connect_sync()

ctx = browser.execute_sync("browser_create_context")
browser.execute_sync("browser_navigate", context_id=ctx["context_id"], url="https://example.com")
browser.execute_sync("browser_close_context", context_id=ctx["context_id"])

browser.close_sync()
```

## Authentication

### Bearer Token

```python
config = RemoteConfig(
    url="http://localhost:8080",
    token="your-secret-token"
)
```

### JWT Authentication

```python
from owl_browser import RemoteConfig, AuthMode, JWTConfig

config = RemoteConfig(
    url="http://localhost:8080",
    auth_mode=AuthMode.JWT,
    jwt=JWTConfig(
        private_key_path="/path/to/private.pem",
        expires_in=3600,        # 1 hour
        refresh_threshold=300,  # Refresh 5 minutes before expiry
        issuer="my-app",
        subject="user-123"
    )
)
```

## Flow Execution

```python
from owl_browser import OwlBrowser, RemoteConfig
from owl_browser.flow import FlowExecutor

async def run_flow():
    async with OwlBrowser(RemoteConfig(...)) as browser:
        ctx = await browser.create_context()
        executor = FlowExecutor(browser, ctx["context_id"])

        flow = FlowExecutor.load_flow("test-flows/navigation.json")
        result = await executor.execute(flow)

        if result.success:
            print(f"Flow completed in {result.total_duration_ms:.0f}ms")
            for step in result.steps:
                print(f"  [{step.step_index}] {step.tool_name}: {'OK' if step.success else 'FAIL'}")
        else:
            print(f"Flow failed: {result.error}")

        await browser.close_context(context_id=ctx["context_id"])
```

## Error Handling

```python
import asyncio

from owl_browser import (
    OwlBrowser,
    RemoteConfig,
    OwlBrowserError,
    ConnectionError,
    AuthenticationError,
    ToolExecutionError,
    TimeoutError,
)

config = RemoteConfig(url="http://localhost:8080", token="your-secret-token")

async def main():
    # "async with" is only valid inside a coroutine
    try:
        async with OwlBrowser(config) as browser:
            await browser.navigate(context_id="invalid", url="https://example.com")
    except AuthenticationError as e:
        print(f"Authentication failed: {e}")
    except ToolExecutionError as e:
        print(f"Tool {e.tool_name} failed: {e.message}")
    except TimeoutError as e:
        print(f"Operation timed out: {e}")
    except ConnectionError as e:
        print(f"Connection failed: {e}")

asyncio.run(main())
```

## Configuration Options

```python
from owl_browser import RemoteConfig, RetryConfig

config = RemoteConfig(
    url="https://your-domain.com",
    token="secret",
    timeout=30.0,  # seconds
    max_concurrent=10,
    retry=RetryConfig(
        max_retries=3,
        initial_delay_ms=100,
        max_delay_ms=10000,
        backoff_multiplier=2.0,
        jitter_factor=0.1
    ),
    api_prefix="/api",  # Default: "/api" (production); "" for direct connection
    verify_ssl=True
)
```

## Requirements

- Python 3.12+
- aiohttp >= 3.9.0
- pyjwt[crypto] >= 2.8.0
- cryptography >= 42.0.0

## Links

- Website: https://www.owlbrowser.net
- Documentation: https://www.owlbrowser.net/docs
- GitHub: https://github.com/Olib-AI/olib-browser

---

# Owl Browser — Node.js SDK

Node.js SDK v2 for Owl Browser — AI-native browser automation with antidetect capabilities.

## Features

- **Async-first design** — All operations are async/await based
- **Dynamic method generation** — 144+ browser tools available as typed methods
- **OpenAPI schema bundled** — Works offline, no need to fetch schema from server
- **Flow execution engine** — Run complex automation flows with conditions and expectations
- **JWT and Token auth** — Flexible authentication options
- **TypeScript support** — Full type definitions included
- **Retry with backoff** — Built-in retry logic with exponential backoff and jitter

## Installation

```bash
npm install @olib-ai/owl-browser-sdk
```

## Quick Start

```typescript
import { OwlBrowser, RemoteConfig } from '@olib-ai/owl-browser-sdk';

const browser = new OwlBrowser({
  url: 'http://localhost:8080',
  token: 'your-secret-token',
  apiPrefix: '' // Use '' for direct connection, '/api' for nginx proxy
});

await browser.connect();

const ctx = await browser.createContext();
const contextId = ctx.context_id;

await browser.navigate({ context_id: contextId, url: 'https://example.com' });
await browser.click({ context_id: contextId, selector: 'button#submit' });
const screenshot = await browser.screenshot({ context_id: contextId });

await browser.closeContext({ context_id: contextId });
await browser.close();
```

## Configuration

```typescript
interface RemoteConfig {
  // Required
  url: string;               // Server URL (e.g., 'http://localhost:8080')

  // Authentication (one required)
  token?: string;            // Bearer token for TOKEN auth
  authMode?: AuthMode;       // 'token' (default) or 'jwt'
  jwt?: JWTConfig;           // JWT configuration for JWT auth

  // Optional
  transport?: TransportMode; // 'http' (default) or 'websocket'
  timeout?: number;          // Request timeout in seconds (default: 30)
  maxConcurrent?: number;    // Max concurrent requests (default: 10)
  retry?: RetryConfig;       // Retry configuration
  verifySsl?: boolean;       // Verify SSL certificates (default: true)
  apiPrefix?: string;        // API prefix (default: '/api', use '' for direct)
}
```

### JWT Authentication

```typescript
import { OwlBrowser, AuthMode } from '@olib-ai/owl-browser-sdk';

const browser = new OwlBrowser({
  url: 'http://localhost:8080',
  authMode: AuthMode.JWT,
  jwt: {
    privateKeyPath: '/path/to/private.pem',
    expiresIn: 3600,        // Token validity in seconds
    refreshThreshold: 300,  // Refresh when < 300s remaining
    issuer: 'my-app',
    claims: { custom: 'data' }
  }
});
```

## Dynamic Methods

The SDK dynamically generates methods for all 144+ browser tools. Methods are available in both camelCase and snake_case:

```typescript
// These are equivalent
await browser.createContext();
await browser.create_context();

// In the calls below, ctx holds the context ID string
// (for example, the context_id field returned by createContext()).

// Navigation
await browser.navigate({ context_id: ctx, url: 'https://example.com' });
await browser.reload({ context_id: ctx });
await browser.goBack({ context_id: ctx });
await browser.goForward({ context_id: ctx });

// Interaction
await browser.click({ context_id: ctx, selector: '#button' });
await browser.type({ context_id: ctx, selector: '#input', text: 'Hello' });

// Data extraction
await browser.getHtml({ context_id: ctx });
await browser.getMarkdown({ context_id: ctx });
await browser.screenshot({ context_id: ctx });

// AI-powered tools
await browser.queryPage({ context_id: ctx, question: 'What is the title?' });
await browser.solveCaptcha({ context_id: ctx });
await browser.findElement({ context_id: ctx, description: 'login button' });
```

## Flow Execution

```typescript
import { OwlBrowser, FlowExecutor } from '@olib-ai/owl-browser-sdk';

const browser = new OwlBrowser({ url: '...', token: '...' });
await browser.connect();

const ctx = await browser.createContext();
const executor = new FlowExecutor(browser, ctx.context_id);

const flow = FlowExecutor.loadFlow('test-flows/navigation.json');
const result = await executor.execute(flow);

if (result.success) {
  console.log('Flow completed in', result.totalDurationMs, 'ms');
} else {
  console.error('Flow failed:', result.error);
}
```

### Conditional Branching

```json
{
  "type": "condition",
  "condition": {
    "source": "previous",
    "operator": "equals",
    "field": "success",
    "value": true
  },
  "onTrue": [
    { "type": "browser_click", "selector": "#continue" }
  ],
  "onFalse": [
    { "type": "browser_screenshot" }
  ]
}
```

## Error Handling

```typescript
import {
  OwlBrowserError,
  ConnectionError,
  AuthenticationError,
  ToolExecutionError,
  TimeoutError,
  RateLimitError,
  ElementNotFoundError
} from '@olib-ai/owl-browser-sdk';

try {
  await browser.click({ context_id: ctx, selector: '#nonexistent' });
} catch (e) {
  if (e instanceof ElementNotFoundError) {
    console.log('Element not found:', e.selector);
  } else if (e instanceof TimeoutError) {
    console.log('Operation timed out after', e.timeoutMs, 'ms');
  } else if (e instanceof RateLimitError) {
    console.log('Rate limited. Retry after', e.retryAfter, 'seconds');
  } else if (e instanceof AuthenticationError) {
    console.log('Auth failed:', e.message);
  }
}
```

## Advanced Usage

### OpenAPI Schema Access

```typescript
import { OpenAPILoader, getBundledSchema } from '@olib-ai/owl-browser-sdk';

const schema = getBundledSchema();
console.log('API Version:', schema.info.version);

const loader = new OpenAPILoader(schema);
for (const [name, tool] of loader.tools) {
  console.log(name + ':', tool.description);
}
```

## API Reference

### OwlBrowser

- `connect(): Promise` — Connect to server
- `close(): Promise` — Close connection
- `execute(toolName, params): Promise` — Execute any tool
- `healthCheck(): Promise` — Check server health
- `listTools(): string[]` — List all tool names
- `listMethods(): string[]` — List all method names
- `getTool(name): ToolDefinition | undefined` — Get tool definition

### FlowExecutor

- `execute(flow): Promise` — Execute a flow
- `abort(): void` — Abort current execution
- `reset(): void` — Reset abort flag
- `static loadFlow(path): Flow` — Load flow from JSON file
- `static parseFlow(data): Flow` — Parse flow from object

## Requirements

- Node.js 18+
- TypeScript 5+ (optional, for type definitions)

## Links

- **Website**: https://www.owlbrowser.net
- **Documentation**: https://www.owlbrowser.net/docs
- **GitHub**: https://github.com/Olib-AI/olib-browser
- **Support**: support@olib.ai