Function Reference

browser(url, headless=False, timeout=30, cookie_path=None)

Initialize and return a browser instance for web automation.

Parameters:
  • url – Target URL to navigate to

  • headless – Run browser in headless mode (default: False)

  • timeout – Maximum seconds to wait for elements to appear (default: 30)

  • cookie_path

    Path to cookies JSON file (optional)

    - Cookies MUST be in JSON format
    - Export from Chrome using "Cookie-Editor" extension
    - Cookie domain must match the target URL
    

Returns:

Browser instance or None if initialization fails

Return type:

WebDriver

Example

# Basic usage
driver = browser('https://google.com')
click(driver, 'id', 'search-button')

# Slow-loading site
driver = browser('https://slow-site.gov', timeout=90)
click(driver, 'id', 'submit-btn')  # Waits up to 90s

# Fast site testing
driver = browser('https://fast-site.com', timeout=5)
click(driver, 'id', 'login-btn')  # Fails fast in 5s

# With cookies
driver = browser('https://site.com', cookie_path='cookies.json')

# Headless mode
driver = browser('https://google.com', headless=True)

# Trigger a download and wait for it
driver = browser('https://example.com')
click(driver, 'id', 'download-button')
wait_download(download_dir=driver.download_dir)
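
The cookie_path notes above require a JSON file whose cookie domains match the target URL. The sketch below shows one way such a domain check could look. It assumes a Cookie-Editor style export (a JSON list of objects with a "domain" field); the helper name cookies_for_url is hypothetical and is not part of this library.

```python
import json
from urllib.parse import urlparse


def cookies_for_url(cookies, url):
    """Keep only cookies whose domain matches the host of `url`.

    `cookies` is a list of dicts with a "domain" key (assumed export layout).
    """
    host = urlparse(url).hostname or ""
    kept = []
    for cookie in cookies:
        domain = cookie.get("domain", "").lstrip(".")
        # Accept the exact host or a parent domain (".site.com" covers "www.site.com").
        if domain and (host == domain or host.endswith("." + domain)):
            kept.append(cookie)
    return kept


# Example data standing in for the contents of cookies.json.
exported = json.loads(
    '[{"name": "sid", "value": "abc", "domain": ".site.com"},'
    ' {"name": "other", "value": "x", "domain": "elsewhere.org"}]'
)
matching = cookies_for_url(exported, "https://www.site.com/login")
print([c["name"] for c in matching])
```

Running a check like this before calling browser() can help explain why a cookie file appears to have no effect.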

Note

Uses undetected-chromedriver (uc) to bypass bot detection. Requires Google Chrome to be installed.

Windows:

winget install Google.Chrome

Linux (Ubuntu/Debian/Mint):

wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
sudo apt-get install -f -y

Linux (RHEL/CentOS/Fedora):

wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
sudo rpm -i google-chrome-stable_current_x86_64.rpm

click(*where)

Performs left-click based on different input types.

Modes

  1. Image matching: Click on visual element

  2. OCR text matching: Click on text found on screen (with nth occurrence support)

  3. Coordinates: Click at specific x, y position

  4. Color matching: Click on specific color in region

  5. Selenium web element: Click on element in browser

Parameters:

*where – Variable arguments depending on click mode

Returns:

True if successful, False otherwise

Example

# Image matching
click('button.png')

# OCR text matching
click('Submit')                    # Click first occurrence
click('Submit', 2)                 # Click 2nd occurrence
click('Login', 0)                  # Click all occurrences

# Coordinates
click(100, 200)

# Color matching in region
click(x1, y1, x2, y2, r, g, b)              # Find and click color
click(x1, y1, x2, y2, r, g, b, tolerance)   # With tolerance

# Selenium (pass driver object first)
click(driver, 'id', 'submit-button')
click(driver, 'xpath', '//button[@id="submit"]')
click(driver, 'class', 'btn-primary')
click(driver, 'name', 'username')
click(driver, 'css', 'button.submit')
click(driver, 'tag', 'button')
click(driver, 'text', 'Click Here')
click(driver, 'partial', 'Click')
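
This page does not say how click() tells its five modes apart. The sketch below shows one plausible way to dispatch on the shape of the arguments. Everything in it is an assumption made for illustration: the helper name classify_click_args, the list of image extensions, and the rule that a non-string, non-integer first argument is a driver.

```python
from numbers import Integral

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp")  # assumed, not documented


def classify_click_args(*where):
    """Return the name of the click mode a set of arguments would select."""
    if not where:
        raise ValueError("at least one argument is required")
    first = where[0]
    all_ints = all(
        isinstance(a, Integral) and not isinstance(a, bool) for a in where
    )
    if all_ints and len(where) == 2:
        return "coordinates"          # (x, y)
    if all_ints and len(where) in (7, 8):
        return "color"                # (x1, y1, x2, y2, r, g, b[, tolerance])
    if isinstance(first, str):
        # A file name is treated as an image, any other string as OCR text.
        if first.lower().endswith(IMAGE_EXTENSIONS):
            return "image"
        return "ocr"
    if len(where) == 3 and all(isinstance(a, str) for a in where[1:]):
        return "selenium"             # (driver, selector_type, selector)
    raise ValueError(f"unrecognised argument pattern: {where!r}")


print(classify_click_args("button.png"))
print(classify_click_args("Submit", 2))
print(classify_click_args(100, 200))
```

The same argument shapes appear in click_right(), so a shared classifier of this kind would serve both functions.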

click_right(*where)

Performs right-click (context menu) based on different input types.

Modes

  1. Image matching: Right-click on visual element

  2. OCR text matching: Right-click on text found on screen (with nth occurrence support)

  3. Coordinates: Right-click at specific x, y position

  4. Color matching: Right-click on specific color in region

  5. Selenium web element: Right-click on element in browser

Parameters:

*where – Variable arguments depending on click mode

Returns:

True if successful, False otherwise

Example

# Image matching
click_right('button.png')

# OCR text matching
click_right('Submit')              # Right-click first occurrence
click_right('Submit', 2)           # Right-click 2nd occurrence
click_right('Login', 0)            # Right-click all occurrences

# Coordinates
click_right(100, 200)

# Color matching in region
click_right(x1, y1, x2, y2, r, g, b)              # Find and right-click color
click_right(x1, y1, x2, y2, r, g, b, tolerance)   # With tolerance

# Selenium (pass driver object first)
click_right(driver, 'id', 'submit-button')
click_right(driver, 'xpath', '//button[@id="submit"]')
click_right(driver, 'class', 'btn-primary')
click_right(driver, 'name', 'username')
click_right(driver, 'css', 'button.submit')
click_right(driver, 'tag', 'button')
click_right(driver, 'text', 'Click Here')
click_right(driver, 'partial', 'Click')

copy(*where)

Copies text from various sources: screen, clipboard, Selenium elements or web pages.

Modes

  1. Active window: Copy all content from current window

  2. Clipboard: Get current clipboard content

  3. Screen coordinates: Click at position and copy

  4. Selenium webpage: Copy entire page content

  5. Selenium element: Copy element text or attribute value

Parameters:

*where – Variable arguments depending on copy mode

Returns:

Copied text, or an empty string ('') if nothing was copied

Return type:

str

Example

# Active window - Copy everything from current window
# Ctrl+A, Ctrl+C from active window
copy()

# Clipboard
# Get current clipboard content
copy('clipboard')

# Screen coordinates
# Click at (500, 300) and copy
copy(500, 300)

# Selenium webpage - Copy entire page
# Copy all webpage content
copy(driver)

# Selenium element - Copy text
copy(driver, 'id', 'username-display')
copy(driver, 'xpath', '//div[@class="name"]')
copy(driver, 'class', 'user-info')
copy(driver, 'name', 'description')
copy(driver, 'css', 'p.content')
copy(driver, 'tag', 'h1')
copy(driver, 'text', 'Welcome')
copy(driver, 'partial', 'Hello')

# Selenium element - Copy attribute
copy(driver, 'id', 'download-link', 'href')         # Get link URL
copy(driver, 'class', 'product-img', 'src')         # Get image source
copy(driver, 'id', 'email-field', 'value')          # Get input value
copy(driver, 'xpath', '//a[@id="link"]', 'title')   # Get title attribute

csv_to_xlsx(csv_file=None, delete_csv=True)

Converts CSV file(s) to XLSX format.

Parameters:
  • csv_file – Path to CSV file or None to auto-detect single CSV in current directory

  • delete_csv – If True, deletes original CSV after conversion (default: True)

Returns:

Path of created XLSX file or None if error

Return type:

str

Output

  • Prints the detected CSV filename when auto-detected.

  • Prints conversion result showing source and destination filenames.

  • Prints confirmation when original CSV is deleted.

Example

# Auto-detect single CSV in current directory (deletes CSV by default)
csv_to_xlsx()                               # Finds, converts and deletes CSV

# Specific file (deletes CSV by default)
csv_to_xlsx('data.csv')                     # Converts and deletes data.csv

# Keep original CSV
csv_to_xlsx('report.csv', delete_csv=False) # Keeps report.csv
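
Auto-detection is described above only as finding a "single CSV in current directory". The sketch below shows how that lookup could work with the standard library. The helper name detect_single_csv is hypothetical, and returning None when several CSV files exist is an assumption about the intended behavior.

```python
import glob
import os
import tempfile


def detect_single_csv(directory="."):
    """Return the only .csv path in `directory`, or None if there are zero or several."""
    candidates = sorted(glob.glob(os.path.join(directory, "*.csv")))
    return candidates[0] if len(candidates) == 1 else None


# Demonstration in a throwaway folder so no real files are touched.
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "data.csv"), "w").close()
    only = detect_single_csv(folder)        # exactly one CSV present
    open(os.path.join(folder, "more.csv"), "w").close()
    ambiguous = detect_single_csv(folder)   # two CSVs present

print(os.path.basename(only), ambiguous)
```

Passing an explicit path to csv_to_xlsx() avoids any ambiguity when a folder holds more than one CSV file.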

date()

Get current day of month.

Returns:

Current day (1-31)

Return type:

int

Example

date() # 24

day()

Get current day of week.

Returns:

Day name in lowercase (monday, tuesday, wednesday, thursday, friday, saturday, sunday)

Return type:

str

Example

# Weekday check
if day() == 'monday':
    print("It is Monday today.")
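
date(), day() and the other clock helpers in this reference return plain values. The sketch below gives a rough standard-library equivalent, assuming the helpers read the local system clock (this page does not confirm that). The name day_name is hypothetical.

```python
from datetime import datetime

WEEKDAYS = [
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
]


def day_name(moment):
    """Lowercase English weekday name for a datetime (Monday is index 0)."""
    return WEEKDAYS[moment.weekday()]


now = datetime.now()
print(now.day)                           # day of month, 1-31
print(day_name(now))                     # weekday name in lowercase
print(now.hour, now.minute, now.second)  # 24-hour clock components
```

Indexing a fixed list rather than calling strftime('%A') keeps the weekday names in English regardless of the system locale.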

drag(*args)

Drag from source to target.

Modes

  1. PyAutoGUI screen drag: (x1, y1, x2, y2)

  2. Selenium element drag: (driver, src_type, src_selector, tgt_type, tgt_selector)

Parameters:
  • PyAutoGUI – (x1, y1, x2, y2)

  • Selenium – (driver, src_type, src_selector, tgt_type, tgt_selector)

Returns:

True if successful, False otherwise

Output

  • Prints drag confirmation showing source and target coordinates (PyAutoGUI).

  • Prints drag confirmation showing source and target selectors (Selenium).

Example

# Screen drag (PyAutoGUI) - 2 second duration
drag(100, 200, 500, 600)

# Web element drag (Selenium)
drag(driver, 'id', 'card-1', 'class', 'done-column')
drag(driver, 'xpath', '//li[1]', 'xpath', '//li[5]')

# Multiple drivers
driver1 = browser('https://trello.com')
driver2 = browser('https://jira.com')
drag(driver1, 'id', 'task-1', 'id', 'done-column')
drag(driver2, 'class', 'issue', 'class', 'backlog')

dropdown_select(driver_obj, selector_type, selector, selection_criteria)

Selects an item from a dropdown menu based on the provided criteria.

Parameters:
  • driver_obj – Selenium WebDriver instance

  • selector_type – Type of selector (‘id’, ‘name’, ‘xpath’, ‘class’, ‘css’, ‘tag’, ‘text’, ‘partial’)

  • selector – The value of the selector

  • selection_criteria – Index (int) or visible text (str) for selection

Returns:

True if successful, False otherwise

Output

  • Prints confirmation showing the selected option index or text.

Example

# Select by index
dropdown_select(driver, 'id', 'country-dropdown', 0)        # Select first option
dropdown_select(driver, 'id', 'country-dropdown', 2)        # Select third option

# Select by visible text
dropdown_select(driver, 'id', 'country-dropdown', 'United States')
dropdown_select(driver, 'name', 'language', 'English')
dropdown_select(driver, 'xpath', '//select[@name="city"]', 'New York')

# Different selector types
dropdown_select(driver, 'class', 'form-select', 'Option 1')
dropdown_select(driver, 'css', 'select.dropdown', 'Value')

erase(*args)

Erase/clear text from input fields.

Modes

  1. PyAutoGUI active window: ()

  2. Selenium specific element: (driver, selector_type, selector)

Parameters:

*args – Variable arguments depending on mode

Returns:

True if successful, False otherwise

Example

# PyAutoGUI mode (erase active window)
erase()                                  # Select all and delete (Ctrl+A, Delete)

# Selenium mode (erase specific element)
erase(driver, 'id', 'username')          # Clear username field
erase(driver, 'xpath', '//input[@name="email"]')  # Clear email field
erase(driver, 'class', 'search-box')     # Clear search box

find_browser(*args)

Find text in browser using Ctrl+F (find function).

Parameters:

*args – Variable arguments depending on mode

Returns:

True if successful, False otherwise

Output

  • Prints confirmation with the searched text (PyAutoGUI).

  • Prints confirmation if text was found and highlighted (Selenium).

  • Prints a message if text was not found on the page (Selenium).

Example

# PyAutoGUI mode (any window)
find_browser('Python')              # Find in active window
find_browser('error message')       # Find phrase

# Selenium mode (browser)
find_browser(driver, 'Python')      # Find in Selenium browser
find_browser(driver, 'contact us')  # Find phrase in browser

Note

  • PyAutoGUI: Opens find dialog (Ctrl+F), types search term, presses Enter, then Esc.

  • Selenium: Uses JavaScript to highlight matching text on the page in yellow and scrolls to the first match. Removes any previous highlights before applying new ones.

  • Default wait time between actions is 1 second (PyAutoGUI only).

find_key(data, key)

Recursively finds all values of a specified key in nested data structures (dictionaries, lists and tuples). Particularly useful for searching deeply nested JSON data from API responses or parsed files.

Parameters:
  • data – Data structure to search (dict, list or tuple)

  • key – Key name to find

Returns:

All values found for the key (empty list if not found)

Return type:

list

Example

# Single occurrence
data = {'name': 'John', 'age': 30}
name = find_key(data, 'name')[0]          # 'John'

# Multiple occurrences
data = {
    'user': {'id': 1, 'name': 'Alice'},
    'admin': {'id': 2, 'name': 'Bob'}
}
ids = find_key(data, 'id')                # [1, 2]

# Nested lists/tuples
data = {'users': [{'age': 25}, {'age': 30}]}
ages = find_key(data, 'age')              # [25, 30]

# API response workflow
import requests
response = requests.get('https://api.example.com/users').json()
ids = find_key(response, 'id')            # finds all 'id' values

# Parsed file workflow
import json
data = json.loads(read('data.json'))
hosts = find_key(data, 'host')            # finds all 'host' values
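
The recursive search described above can be sketched in a few lines. This is an illustrative stand-in, not the library's implementation. The name find_key_sketch is hypothetical, and continuing to search inside a value that already matched is an assumption.

```python
def find_key_sketch(data, key):
    """Collect every value stored under `key` in nested dicts, lists and tuples."""
    found = []
    if isinstance(data, dict):
        for k, value in data.items():
            if k == key:
                found.append(value)
            # Keep descending: the value itself may contain the key again.
            found.extend(find_key_sketch(value, key))
    elif isinstance(data, (list, tuple)):
        for item in data:
            found.extend(find_key_sketch(item, key))
    return found


nested = {
    "user": {"id": 1, "name": "Alice"},
    "admin": {"id": 2, "name": "Bob"},
    "groups": [{"id": 3}, ({"id": 4},)],
}
print(find_key_sketch(nested, "id"))
```

Because the result is always a list, callers can test it for emptiness before indexing with [0].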

find_str(string, starts_after, ends_before, index=0)

Extracts substring between two markers.

Parameters:
  • string – Text to search in

  • starts_after – Start extraction after this

  • ends_before – End extraction before this

  • index – Which match (0=first, -1=last, 1=second, etc.)

Returns:

Extracted string or None if not found

Return type:

str or None

Example

# Extract version number from string
text = 'Version: 1.0.5 released'
version = find_str(text, 'Version: ', ' released')
# version = '1.0.5'

# Extract last occurrence using index=-1
text = 'User: Alice logged in. User: Bob logged in'
last_user = find_str(text, 'User: ', ' logged', -1)
# last_user = 'Bob'
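
The sketch below shows one way marker-based extraction with an index could be written. It is not the library's code. The name find_str_sketch is hypothetical, and two choices are assumptions: matches are collected left to right without overlapping, and empty markers return None.

```python
def find_str_sketch(string, starts_after, ends_before, index=0):
    """Return the index-th substring found between two markers, or None."""
    if not starts_after or not ends_before:
        return None  # assumed behavior for empty markers
    matches = []
    pos = 0
    while True:
        start = string.find(starts_after, pos)
        if start == -1:
            break
        start += len(starts_after)
        end = string.find(ends_before, start)
        if end == -1:
            break
        matches.append(string[start:end])
        pos = end + len(ends_before)  # continue after the closing marker
    try:
        return matches[index]         # negative indexes count from the end
    except IndexError:
        return None


text = "User: Alice logged in. User: Bob logged in"
print(find_str_sketch(text, "User: ", " logged"))
print(find_str_sketch(text, "User: ", " logged", -1))
```

Collecting all matches first keeps the index handling identical to ordinary Python list indexing.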

hour()

Get current hour.

Returns:

Current hour (0-23) in 24-hour format

Return type:

int

Example

hour() # 14 (2 PM)

inspect()

Opens GUI to inspect pixel position and color with zoomed preview.

Usage

  1. Click on the Pixel Inspector window to bring it into focus.

  2. Move the mouse to the desired pixel.

  3. Press ‘ESC’ to capture.

Output

  • Prints position and RGB/HEX values to console.

  • Copies ‘x, y, r, g, b’ to clipboard.

log_setup(title)

Sets up logging and terminal styling for the script.

Combines terminal setup with comprehensive logging and automatic color-coded status indication. Creates a logs folder and saves all output with timestamps. Shows output in terminal while also saving to file.

Parameters:

title – Name for both terminal title and log file

Example

log_setup("MyScript")
print("This gets logged")
# ... script runs ...
# Terminal turns GREEN on success or RED on crash

Log file format

logs/log_MyScript_2026-03-26_14-30-45_IST_session-1.txt        (active - newest logs)
logs/log_MyScript_2026-03-26_14-30-45_IST_session-1_part_1.txt (2nd newest - rotated)
logs/log_MyScript_2026-03-26_14-30-45_IST_session-1_part_2.txt (3rd newest)
...
logs/log_MyScript_2026-03-26_14-30-45_IST_session-1_part_9.txt (oldest backup)

Session numbering

session-1 : First run of this script
session-2 : Second run of this script
session-3 : Third run, etc.
session-N : Automatically increments based on existing log files
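
The session number is said to increment based on existing log files. The sketch below shows how the next number could be derived from file names in the format shown above. The helper name next_session_number and the regular expression are assumptions based only on those sample names.

```python
import re


def next_session_number(existing_names, title):
    """Return 1 + the highest session number found for `title` (1 if none)."""
    pattern = re.compile(
        r"^log_" + re.escape(title)
        + r"_\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}_[^_]+"   # date, time, timezone
        + r"_session-(\d+)(?:_part_\d+)?\.txt$"           # session, optional part
    )
    highest = 0
    for name in existing_names:
        match = pattern.match(name)
        if match:
            highest = max(highest, int(match.group(1)))
    return highest + 1


names = [
    "log_MyScript_2026-03-26_14-30-45_IST_session-1.txt",
    "log_MyScript_2026-03-26_14-30-45_IST_session-1_part_1.txt",
    "log_MyScript_2026-03-27_09-00-00_IST_session-2.txt",
    "log_Other_2026-03-27_09-05-00_IST_session-7.txt",
]
print(next_session_number(names, "MyScript"))
```

Rotated part files carry the same session number as the active file, so they do not inflate the count.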

Features

  • Sets terminal title and colors (blue bg, white text)

  • Automatic color changes: Blue to Green (success) or Blue to Red (crash)

  • Automatic session numbering (increments from previous runs)

  • Captures all print() statements

  • Captures all errors and exceptions

  • Adds timestamp to each entry

  • Shows output in terminal AND saves to file

  • Automatic file rotation (10MB per file, max 10 files = 100MB per session)

  • Automatic cleanup (keeps max 100MB total logs across all sessions)
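
Several of the features above (capturing print(), adding timestamps, showing output in the terminal and in a file) can be illustrated with a small stream wrapper. This is a sketch of the general technique, not the library's implementation. The class name TimestampTee and the timestamp format are assumptions, and an in-memory buffer stands in for the log file.

```python
import io
import sys
from datetime import datetime


class TimestampTee:
    """Write text to the terminal unchanged and to a log stream with timestamps."""

    def __init__(self, terminal, log_stream):
        self.terminal = terminal
        self.log_stream = log_stream
        self._at_line_start = True

    def write(self, text):
        self.terminal.write(text)
        for chunk in text.splitlines(keepends=True):
            if self._at_line_start:
                # Prefix each new log line with the current time.
                stamp = datetime.now().strftime("[%Y-%m-%d %H:%M:%S] ")
                self.log_stream.write(stamp)
            self.log_stream.write(chunk)
            self._at_line_start = chunk.endswith("\n")
        return len(text)

    def flush(self):
        self.terminal.flush()
        self.log_stream.flush()


log_buffer = io.StringIO()            # stands in for the log file
original_stdout = sys.stdout
sys.stdout = TimestampTee(original_stdout, log_buffer)
try:
    print("This gets logged")
finally:
    sys.stdout = original_stdout      # always restore the real stdout
```

Restoring sys.stdout in a finally block keeps the terminal usable even if the wrapped code raises.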

Note

Terminal colors change automatically based on script outcome:

Blue background  : Script is running
Green background : Script completed successfully
Red background   : Script crashed (unhandled exception)

minute()

Get current minute.

Returns:

Current minute (0-59)

Return type:

int

Example

minute() # 23

month()

Get current month.

Returns:

Current month (1-12)

Return type:

int

Example

month() # 2 (February)

press(*keys)

Press keyboard keys with support for Selenium, PyAutoGUI and key combinations.

Modes

  1. PyAutoGUI single key: (key)

  2. PyAutoGUI key N times: (key, count)

  3. PyAutoGUI key combination: (key1, key2, …)

  4. Selenium driver key: (driver, key)

  5. Selenium driver key N times: (driver, key, count)

  6. Selenium driver key combination: (driver, key1, key2, …)

  7. Selenium element key: (driver, selector_type, selector, key)

Parameters:

*keys – Variable arguments for key presses

Returns:

True if successful, False otherwise

Example

# PyAutoGUI keys (no driver needed)
press("tab")                    # Single key
press("tab", 5)                 # Press 5 times
press("tab", -5)                # Press 5 times with shift held
press("ctrl", "a")              # Two-key combo
press("alt", "ctrl", "z")       # Three-key combo
press("num5")                   # Numpad 5
press("volumeup")               # Volume up
press("mute")                   # Volume mute (short form)
press("back")                   # Browser back (short form)
press("forward")                # Browser forward (short form)

# Selenium driver keys (pass driver object)
press(driver, "tab")
press(driver, "tab", 5)           # Press tab 5 times
press(driver, "tab", -5)          # Press shift+tab 5 times
press(driver, "ctrl", "c")
press(driver, "ctrl", "shift", "s")

# Selenium element + key (driver, selector_type, selector, key)
press(driver, "xpath", "//input", "enter")
press(driver, "id", "username", "tab")

Note

  • Negative count presses the key with Shift held (e.g. Shift+Tab for reverse navigation).

  • PyAutoGUI-only keys (num0-9, volumeup, volumedown, mute, back, forward, etc.) are not supported in Selenium mode.

  • Short forms supported: ‘back’ for browserback, ‘forward’ for browserforward, ‘mute’ for volumemute.

read(*args)

Extract text from screen (using OCR), files (by parsing file format) or a Selenium browser window.

Modes

  1. No arguments: OCR full screen

  2. 2 integers: OCR from (x, y) to bottom-right corner

  3. 4 integers: OCR specific region (x, y, width, height)

  4. 1 string: Read file by parsing its format

  5. 1 driver object: Take screenshot of Selenium browser and read using OCR

Supported file formats

  • Documents: PDF, DOCX, PPTX, ODT, RTF

  • Tabular: CSV, TSV, XLSX, SQLite

  • Structured: JSON, YAML, XML, INI/CFG

  • Text: TXT, LOG, MD

  • Web: HTML

  • Email: EML, MSG

  • eBooks: EPUB

  • Scripts: SH, BAT, PY

Parameters:

*args – Variable arguments depending on mode

Returns:

Extracted text or None if error

Return type:

str

Example

# OCR - Read entire screen
text = read()

# OCR - Read from (100, 200) to bottom-right corner
text = read(100, 200)

# OCR - Read specific region: x=100, y=200, width=400, height=300
text = read(100, 200, 400, 300)

# Selenium - Read text from browser window using OCR
d1 = browser('https://example.com')
text = read(d1)

# File - Read with extension
text = read('report.pdf')
text = read('data.csv')
text = read('script.py')

# File - Read without extension (auto-detects)
text = read('report')      # Finds report.pdf automatically
text = read('config')      # Finds config.yaml automatically

# Check if text on screen
if 'login' in read():
    print("Login visible!")

Note

The first OCR run downloads the models (~100MB); subsequent runs are fast.

run(target)

Runs a file or application on Windows and Linux.

Parameters:

target

File path or application name to execute

  • If target is a file path: Opens with default application

  • If target is an application name: Launches the application

  • For applications, the command must be available in system PATH

Raises:

NotImplementedError – If called on macOS

Output

  • Prints error message if file or application was not found.

  • Prints error message if permission was denied.

Example

# Open files with default application
run("sample.txt")           # Opens in default text editor
run("document.pdf")         # Opens in default PDF viewer
run(r"C:\Users\file.xlsx")  # Opens Excel file (raw string keeps backslashes literal)

# Launch applications (Windows)
run("calc")                 # Calculator
run("notepad")              # Notepad
run("mspaint")              # Paint

# Launch applications (Linux)
run("gedit")                # Text editor
run("firefox")              # Browser
run("gnome-calculator")     # Calculator

Note

  • Windows: Uses os.startfile for files, subprocess for applications.

  • Linux: Uses xdg-open for files, direct execution for applications. xdg-utils is required (included in Linux dependencies).

say(text, volume=1.0)

Speak text using offline Text-to-Speech via Piper TTS.

Parameters:
  • text – Text to speak

  • volume – Volume level 0.0 to 1.0 (default: 1.0)

Returns:

None

Example

say("Hello, how are you?")
say("Download complete.")
say("Error occurred, please try again.", volume=0.7)
say("Warning: Low battery.", volume=0.5)

Note

  • Automatically logs spoken text when log_setup() is active.

  • Requires: pip install piper-tts huggingface_hub

  • Linux requires: sudo apt install espeak-ng alsa-utils

  • Model files are saved in:

    Windows → %LOCALAPPDATA%\autocore\piper_models\
    Linux → ~/.local/share/autocore/piper_models/

  • Model size is approximately 60MB, downloaded once and reused.

  • Browse all voices at: https://huggingface.co/rhasspy/piper-voices

screenshot(*args)

Takes a screenshot and saves it to the current working directory.

Modes

  1. Full screen, auto-named: ()

  2. Full screen, custom name: (filename)

  3. From (x,y) to screen edge, auto-named: (x, y)

  4. From (x,y) to screen edge, custom name: (x, y, filename)

  5. Specific region, auto-named: (x, y, width, height)

  6. Specific region, custom name: (x, y, width, height, filename)

  7. Selenium variants of all above: (driver, …)

Parameters:
  • *args

    Variable arguments depending on usage

    - ()                              : Full screen, auto-named
    - (filename)                      : Full screen, custom filename
    - (x, y)                          : From (x,y) to screen edge, auto-named
    - (x, y, filename)                : From (x,y) to screen edge, custom filename
    - (x, y, width, height)           : Specific region, auto-named
    - (x, y, width, height, filename) : Specific region, custom filename
    - (driver, ...)                   : Same as above but captures from Selenium browser
    

  • Where

    • driver: Selenium WebDriver instance

    • x, y: Top-left corner coordinates of the screenshot region

    • width, height: Dimensions of the screenshot region

    • filename: Custom name to save the screenshot
      • .png extension is added automatically if not provided

      • If not provided, auto-generates: screenshot_YYYY-MM-DD_HH-MM-SS_<unix>.png
        Example: screenshot_2025-02-18_14-30-45_1708268445.png

Returns:

True if successful, False otherwise

Output

  • Prints the full path of the saved screenshot on success.

  • Prints error message if invalid arguments or coordinates are provided.

Example

# Full screen (PyAutoGUI)
screenshot()                                    # Full screen, auto-named
screenshot('desktop.png')                       # Full screen, custom name

# Selenium full page
screenshot(driver)                              # Selenium full page, auto-named
screenshot(driver, 'webpage.png')               # Selenium full page, custom name

# From top-left point to edge (PyAutoGUI)
screenshot(100, 200)                            # From (100,200) to edge, auto-named
screenshot(100, 200, 'crop.png')                # From (100,200) to edge, custom name

# From top-left point to edge (Selenium)
screenshot(driver, 100, 200)                    # Selenium from (100,200) to edge
screenshot(driver, 100, 200, 'page.png')        # Selenium, custom name

# Specific region (PyAutoGUI)
screenshot(0, 0, 500, 300)                      # Region: top-left (0,0), 500x300, auto-named
screenshot(0, 0, 500, 300, 'region.png')        # Region: top-left (0,0), 500x300, custom name

# Specific region (Selenium)
screenshot(driver, 0, 0, 800, 600)              # Selenium region, auto-named
screenshot(driver, 0, 0, 800, 600, 'sel.png')   # Selenium region, custom name
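
The file naming rules for screenshot() (automatic .png extension, timestamped auto-name) can be sketched as below. Both helper names are hypothetical. Treating "extension not provided" as "the name has no extension at all" is an assumption, as is using the local clock for the timestamp.

```python
import os
from datetime import datetime


def auto_screenshot_name(moment):
    """Build screenshot_YYYY-MM-DD_HH-MM-SS_<unix>.png for a given datetime."""
    stamp = moment.strftime("%Y-%m-%d_%H-%M-%S")
    unix = int(moment.timestamp())
    return f"screenshot_{stamp}_{unix}.png"


def with_png_extension(filename):
    """Append .png only when the custom name has no extension."""
    _, ext = os.path.splitext(filename)
    return filename if ext else filename + ".png"


print(auto_screenshot_name(datetime.now()))
print(with_png_extension("desktop"))
print(with_png_extension("desktop.png"))
```

Including the Unix timestamp alongside the readable date keeps names unique and easy to sort.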

scroll(*args, timeout=30)

Universal scroll function for both PyAutoGUI and Selenium.

Parameters:
  • *args – Variable arguments (see examples below)

  • timeout – Max seconds when scrolling to ‘bottom’/‘top’ (default: 30)

Returns:

True if successful, False otherwise

Output

  • Prints scroll direction and count on completion.

  • Prints progress every 10 scrolls for large scroll counts.

  • Prints confirmation when bottom or top is reached (Selenium).

Example

# PyAutoGUI Examples (scroll any window):
scroll()                      # Scroll down 1 time (default)
scroll(5)                     # Scroll down 5 times
scroll('down')                # Scroll down 1 time
scroll('down', 10)            # Scroll down 10 times
scroll('up', 5)               # Scroll up 5 times
scroll('bottom')              # Scroll down continuously for 30 seconds
scroll('bottom', timeout=60)  # Scroll down continuously for 60 seconds
scroll('top')                 # Scroll up continuously for 30 seconds

# Selenium Examples (pass driver object):
scroll(driver)                        # Scroll down 1 time in browser
scroll(driver, 5)                     # Scroll down 5 times in browser
scroll(driver, 'down')                # Scroll down 1 time in browser
scroll(driver, 'down', 10)            # Scroll down 10 times in browser
scroll(driver, 'up', 5)               # Scroll up 5 times in browser
scroll(driver, 'bottom')              # Scroll to bottom (auto-detect end)
scroll(driver, 'top')                 # Scroll to top (auto-detect end)
scroll(driver, 'bottom', timeout=120) # Scroll to bottom, max 2 minutes
scroll(driver, 'Login')               # Scroll to 1st instance of 'Login'
scroll(driver, 'Login', 2)            # Scroll to 2nd instance of 'Login'
scroll(driver, 'Login', -1)           # Scroll to last instance of 'Login'
scroll(driver, 'Login', -2)           # Scroll to 2nd last instance of 'Login'
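
In the 'Login' examples above, positive numbers count instances from 1 and negative numbers count from the end. The sketch below shows that index mapping on a plain list. The helper name pick_instance is hypothetical, and returning None for an out-of-range instance is an assumption.

```python
def pick_instance(matches, n=1):
    """Pick the n-th match: 1 is the first, 2 the second, -1 the last, -2 the second last."""
    if n == 0:
        raise ValueError("n must not be 0 in this sketch")
    # Positive n is 1-based; negative n already matches Python's indexing.
    index = n - 1 if n > 0 else n
    if -len(matches) <= index < len(matches):
        return matches[index]
    return None


found = ["header Login", "sidebar Login", "footer Login"]
print(pick_instance(found))
print(pick_instance(found, -1))
```

Only the positive branch needs adjusting, because Python lists are 0-based while the documented count starts at 1.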

second()

Get current second.

Returns:

Current second (0-59)

Return type:

int

Example

second() # 45

wait(*args, countdown=True)

Wait with countdown, wait for element or wait for color at pixel.

Parameters:
  • *args – Variable arguments (see examples)

  • countdown – If True, shows countdown display (default: True)

Returns:

True if successful, False if error or timeout

Example

# Countdown wait
wait(5)                              # Wait 5 seconds with countdown
wait(10, countdown=False)            # Wait 10 seconds silently
wait()                               # Wait 3 seconds (default)

# Wait for element (Selenium) - pass driver object
wait(driver, 'xpath', '//button')            # Wait max 180s with countdown
wait(driver, 'id', 'submit-btn', 10)         # Wait max 10s with countdown
wait(driver, 'class', 'content', 30, countdown=False)  # Wait silently for 30s

# Wait for color at pixel
wait(100, 200, 255, 0, 0)            # Wait for red (255,0,0) at (100,200) with countdown
wait(100, 200, 255, 0, 0, 30)        # Wait for red, max 30s with countdown
wait(500, 300, 0, 255, 0, 60, countdown=False)  # Wait silently
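
Waiting for an element or a pixel color both come down to polling a condition until a deadline. The sketch below shows that pattern in isolation. The name wait_until and the polling interval are assumptions, and a fake pixel reader stands in for a real screen query.

```python
import time


def wait_until(condition, timeout, interval=0.5):
    """Poll `condition()` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in pixel reader: reports red from the third read onwards.
reads = {"count": 0}


def fake_pixel(x, y):
    reads["count"] += 1
    return (255, 0, 0) if reads["count"] >= 3 else (0, 0, 0)


ok = wait_until(lambda: fake_pixel(100, 200) == (255, 0, 0), timeout=5, interval=0.01)
print(ok)
```

Using time.monotonic() for the deadline keeps the timeout correct even if the system clock is adjusted during the wait.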

wait_download(timeout=1200, url=None, filename=None, download_dir=None)

Wait for a browser-initiated download to complete or download a file directly via URL.

Modes

  1. URL mode (url provided): Downloads file directly using requests. Useful for file types blocked by browsers (.exe, .msix, .msi, etc.). File is saved to Python’s current working directory.

  2. Monitor mode (url not provided): Monitors the downloads folder for a browser-initiated download to complete.

Parameters:
  • timeout – Maximum seconds to wait for download completion (default: 1200)

  • url

    Direct download URL (optional):

    - If provided: Downloads file directly via requests
    - If None: Monitors downloads folder for browser-initiated download
    

  • filename

    Custom filename to save/rename the downloaded file (optional):

    - With extension (e.g. "myapp.exe")  : used as-is
    - Without extension (e.g. "myapp")   : extension borrowed from original file
    - If None: Original filename is kept
    - If multiple files are downloaded, only the first completed file is renamed
    

  • download_dir

    Custom download directory to monitor (monitor mode only, optional):

    - If provided: Uses specified path and skips all auto-detection
    - If None: Auto-detects using the priority order described in Note below
    

Returns:

str: Full path of the downloaded file (always includes extension) on success
False: On failure (download error, timeout, directory access issue, etc.)

Output

  • Prints download progress every 10 seconds showing elapsed time and file size.

  • Prints confirmation with final filename and saved path on completion.

  • Prints timeout message if download does not complete in time.

Example

wait_download()                                                   # Monitor downloads folder
wait_download(url='https://abc.com/file.msix')                    # Direct download via URL
wait_download(url='https://abc.com/file.msix', filename='myapp')  # Custom name, borrows extension
wait_download(300, filename='our_log.txt')                        # Monitor and rename on completion
wait_download(600, download_dir='/downloads')                     # Docker with custom path
wait_download(300, download_dir='D:/MyDownloads')                 # Windows custom path

# Use with browser() : pass driver.download_dir to guarantee alignment
driver = browser('https://example.com')
click(driver, 'id', 'download-button')
wait_download(download_dir=driver.download_dir)
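
The filename rules (use a name with an extension as-is, otherwise borrow the extension from the original file) can be sketched as follows. The helper name resolve_download_name is hypothetical and the code is an illustration of the documented rule, not the library's implementation.

```python
import os


def resolve_download_name(original_name, filename=None):
    """Apply the documented renaming rule to a finished download."""
    if filename is None:
        return original_name                  # keep the original name
    _, custom_ext = os.path.splitext(filename)
    if custom_ext:
        return filename                       # extension given: use as-is
    _, original_ext = os.path.splitext(original_name)
    return filename + original_ext            # borrow the original extension


print(resolve_download_name("file.msix", "myapp"))
print(resolve_download_name("file.msix", "myapp.exe"))
print(resolve_download_name("file.msix"))
```

Borrowing the extension means a caller can rename a download without knowing its file type in advance.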

Note

  • When download_dir is not provided, the folder is auto-detected in this order:
    1. DOWNLOAD_DIR environment variable (if set at OS level)

    2. /downloads folder (if running inside Docker)

    3. ~/Downloads (default fallback)

  • If a file was modified within the last 20 seconds before calling this function, it will be detected as a recently completed download and returned immediately. This handles cases where downloads complete very quickly before monitoring starts.
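
The detection order and the 20-second recency check in the notes above can be sketched with the standard library. Both helper names are hypothetical. Checking for /.dockerenv is a common heuristic for detecting Docker and is an assumption here, since this page does not say how Docker is detected.

```python
import os
import tempfile
import time


def detect_download_dir(environ=None, in_docker=None):
    """Follow the documented priority: DOWNLOAD_DIR, /downloads in Docker, ~/Downloads."""
    environ = os.environ if environ is None else environ
    if environ.get("DOWNLOAD_DIR"):
        return environ["DOWNLOAD_DIR"]
    if in_docker is None:
        in_docker = os.path.exists("/.dockerenv")  # assumed Docker heuristic
    if in_docker and os.path.isdir("/downloads"):
        return "/downloads"
    return os.path.join(os.path.expanduser("~"), "Downloads")


def recently_modified(path, window_seconds=20, now=None):
    """True if `path` was modified within the last `window_seconds` seconds."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) <= window_seconds


with tempfile.NamedTemporaryFile() as handle:
    fresh = recently_modified(handle.name)                        # just created
    stale = recently_modified(handle.name, now=time.time() + 60)  # simulated later check

print(detect_download_dir({"DOWNLOAD_DIR": "/tmp/dl"}), fresh, stale)
```

Passing download_dir explicitly, as in the browser() example, bypasses all of this detection.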

window(action=None, target=None, *args)

Unified window management function for Windows and Linux.

Parameters:
  • action

    Window operation to perform (default: ‘list’):

    'list'     : Get all window titles
    'title'    : Get active window title (or find full title if target provided)
    'focus'    : Bring window to foreground
    'close'    : Close window
    'minimize' : Minimize window
    'maximize' : Maximize window
    'resize'   : Resize window (requires width, height)
    'move'     : Move window (requires x, y)
    

  • target – Window title or pattern (required for most actions)

  • *args – Additional parameters (width, height for resize; x, y for move)

Returns:

Return type depends on action:

list/None : List of strings when action is None or 'list'
title     : String or None
others    : True if successful, False otherwise

Raises:
  • ValueError – If invalid action, missing required parameters or invalid dimensions/coordinates

  • NotImplementedError – If called on macOS

Output

  • Prints error if window not found, with suggestions for similar window titles (focus action only).

  • Prints error if wmctrl or xdotool is not installed (Linux only).

Example

# Get all windows (default)
window()                                    # ['Chrome', 'Notepad', 'Excel']
window('list')                              # ['Chrome', 'Notepad', 'Excel']

# Check if window exists
if 'Chrome' in window():
    print("Chrome is open!")

# Get active window title
window('title')                             # 'Google Chrome - New Tab'

# Find full title containing text
window('title', 'Chrome')                   # 'Google Chrome - New Tab'
window('title', 'Note')                     # 'Untitled - Notepad'

# Window operations
window('focus', 'Chrome')                   # Focus window
window('close', 'Notepad')                  # Close window
window('minimize', 'Excel')                 # Minimize window
window('maximize', 'Word')                  # Maximize window

# Position and size
window('resize', 'Chrome', 800, 600)        # Resize to 800x600
window('move', 'Chrome', 100, 200)          # Move to (100, 200)

# Side-by-side setup (1920x1080 display)
window('move', 'Chrome', 0, 0)              # Left side
window('resize', 'Chrome', 960, 1080)       # Half screen width
window('move', 'Code', 960, 0)              # Right side
window('resize', 'Code', 960, 1080)         # Half screen width

# Recording setup
window('resize', 'Demo', 1280, 720)         # 720p size
window('move', 'Demo', 320, 180)            # Centered on 1920x1080

Note

  • On Linux, resize and move automatically remove maximized/minimized state before applying changes, ensuring consistent behavior.

  • Target matching is case-insensitive and partial (e.g. ‘Chrome’ matches ‘Google Chrome - New Tab’).
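
Two ideas from this section can be checked with plain Python: case-insensitive partial title matching and the arithmetic behind the centered recording position. Both helper names are hypothetical, and returning the first matching title is an assumption.

```python
def find_full_title(titles, target):
    """Return the first title containing `target`, ignoring case, else None."""
    needle = target.lower()
    for title in titles:
        if needle in title.lower():
            return title
    return None


def centered_origin(screen_w, screen_h, win_w, win_h):
    """Top-left (x, y) that centers a window of win_w x win_h on the screen."""
    return (screen_w - win_w) // 2, (screen_h - win_h) // 2


open_windows = ["Google Chrome - New Tab", "Untitled - Notepad"]
print(find_full_title(open_windows, "chrome"))
print(centered_origin(1920, 1080, 1280, 720))
```

The second call reproduces the (320, 180) position used in the recording setup example for a 1280x720 window on a 1920x1080 display.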

write(*keys)

Write or type text using keyboard (PyAutoGUI or Selenium).

Parameters:

*keys – Variable arguments depending on mode

Returns:

True if successful, False otherwise

Output

  • Prints error message if element was not found (Selenium).

  • Prints error message if invalid arguments are provided.

Example

# PyAutoGUI mode (types in any active window)
write("Hello World")                                        # Types in active window
write("user@example.com")                                   # Types email
write("12345")                                              # Types numbers as string

# Selenium mode - type on active element in browser
write(driver, "Hello World")                                # Types on active element
write(driver, "Search query")                               # Types in focused input

# Selenium mode - type in specific element
write(driver, "id", "username", "john_doe")
write(driver, "xpath", "//input[@name='email']", "user@example.com")
write(driver, "class", "search-box", "Python tutorial")

Note

  • PyAutoGUI uses typewrite() which types one character at a time.

  • Selenium uses send_keys() which types the entire string at once.

year()

Get current year.

Returns:

Current year (e.g., 2026)

Return type:

int

Example

year() # 2026

zoom(*args)

Zoom in/out using steps or set zoom percentage.

Modes

  1. PyAutoGUI steps/reset: (value)

  2. Selenium steps: (driver, value) where value is -9 to +9

  3. Selenium percentage: (driver, value) where value is outside -9 to +9

  4. Selenium reset: (driver, 100) or (driver, 0)

Parameters:

*args

Variable arguments:

- (value): PyAutoGUI zoom steps/reset
- (driver, value): Selenium zoom steps/percentage/reset

Returns:

True if successful, False otherwise

Raises:

ValueError – If arguments are invalid, zoom value is not an integer, PyAutoGUI value is outside -9 to +9 (except 100), or Selenium percentage is less than 1%.

Output

  • Prints zoom direction and step count (PyAutoGUI).

  • Prints new zoom percentage after change (Selenium).

  • Prints confirmation when zoom is reset.

Value Logic

  • -9 to +9: Zoom steps (Ctrl+/Ctrl-)

  • 100 or 0: Reset to default/100%

  • Outside range (except 100): Percentage (Selenium only)
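
The value rules above, together with the documented ValueError cases, can be written as a small classifier. This is a sketch of the stated rules, not the library's code. The name classify_zoom and the returned labels are hypothetical.

```python
def classify_zoom(value, selenium=False):
    """Map a zoom value to 'reset', 'steps' or 'percentage' per the rules above."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("zoom value must be an integer")
    if value in (0, 100):
        return "reset"
    if -9 <= value <= 9:
        return "steps"
    if not selenium:
        # Percentages are documented as Selenium only.
        raise ValueError("PyAutoGUI mode accepts -9 to +9, 0 or 100")
    if value < 1:
        raise ValueError("percentage must be at least 1")
    return "percentage"


print(classify_zoom(3))
print(classify_zoom(100))
print(classify_zoom(150, selenium=True))
```

Checking the reset values first matters, because 0 also falls inside the -9 to +9 step range.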

Example

# PyAutoGUI (desktop apps)
zoom(3)              # Zoom in 3 steps
zoom(-5)             # Zoom out 5 steps
zoom(100)            # Reset to default (Ctrl+0)
zoom(0)              # Same as zoom(100)
# When zooming with the UI shortcuts (Ctrl and +/-) in Chrome, the zoom level
# steps through these percentages, from minimum to maximum:
#     25, 33, 50, 67, 75, 80, 90, 100, 110, 125, 150, 175, 200, 250, 300, 400, 500

# Selenium (browser) - Steps
zoom(driver, 3)      # Zoom in current + 3 * 10%
zoom(driver, -5)     # Zoom out current - 5 * 10%

# Selenium (browser) - Reset
zoom(driver, 100)    # Reset to 100%
zoom(driver, 0)      # Reset to 100%

# Selenium (browser) - Percentage
zoom(driver, 150)    # Set zoom to 150%
zoom(driver, 75)     # Set zoom to 75%
zoom(driver, 50)     # Set zoom to 50%
zoom(driver, 200)    # Set zoom to 200%

Note

  • Selenium zoom is applied via JavaScript and is not reflected in the Chrome URL bar or the kebab menu zoom indicator.

  • PyAutoGUI reset (0 or 100) uses Ctrl+0 which resets to the application’s default zoom, which may not always be 100% (e.g. a PDF viewer may default to ‘fit to page’ instead).

  • Selenium reset explicitly sets zoom to exactly 100%.