Function Reference
- browser(url, headless=False, timeout=30, cookie_path=None)
Initialize and return a browser instance for web automation.
- Parameters:
url – Target URL to navigate to
headless – Run browser in headless mode (default: False)
timeout – Maximum seconds to wait for elements to appear (default: 30)
cookie_path –
Path to cookies JSON file (optional)
- Cookies MUST be in JSON format
- Export from Chrome using "Cookie-Editor" extension
- Cookie domain must match the target URL
- Returns:
Browser instance or None if initialization fails
- Return type:
WebDriver
Example
# Basic usage
driver = browser('https://google.com')
click(driver, 'id', 'search-button')

# Slow-loading site
driver = browser('https://slow-site.gov', timeout=90)
click(driver, 'id', 'submit-btn')  # Waits up to 90s

# Fast site testing
driver = browser('https://fast-site.com', timeout=5)
click(driver, 'id', 'login-btn')  # Fails fast in 5s

# With cookies
driver = browser('https://site.com', cookie_path='cookies.json')

# Headless mode
driver = browser('https://google.com', headless=True)

# Trigger a download and wait for it
driver = browser('https://example.com')
click(driver, 'id', 'download-button')
wait_download(download_dir=driver.download_dir)
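The cookie requirements listed for cookie_path (JSON format, domain matching the target URL) can be checked before calling browser(). The following is a minimal sketch only: cookies_matching_url is a hypothetical helper, not part of this library, and it assumes the exported file is a JSON array of objects that each carry a "domain" key.

```python
import json
from urllib.parse import urlparse


def cookies_matching_url(cookie_json, url):
    """Return the cookies whose domain matches the host of `url`.

    Hypothetical helper, not part of the library. Assumes the export is a
    JSON array of objects that each carry a "domain" key.
    """
    host = urlparse(url).hostname or ""
    matching = []
    for cookie in json.loads(cookie_json):
        domain = cookie.get("domain", "").lstrip(".").lower()
        # A cookie for "example.com" also applies to "www.example.com".
        if domain and (host == domain or host.endswith("." + domain)):
            matching.append(cookie)
    return matching


# Made-up sample data for illustration.
exported = json.dumps([
    {"name": "sid", "value": "abc", "domain": ".example.com"},
    {"name": "other", "value": "xyz", "domain": "another.org"},
])
names = [c["name"] for c in cookies_matching_url(exported, "https://www.example.com/login")]
print(names)  # ['sid']
```

An empty result suggests the cookie file was exported from a different site than the one being opened.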
Note
Uses undetected-chromedriver (uc) to bypass bot detection. Requires Google Chrome to be installed.
- Windows:
winget install Google.Chrome
- Linux (Ubuntu/Debian/Mint):
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
sudo apt-get install -f -y
- Linux (RHEL/CentOS/Fedora):
wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm
sudo rpm -i google-chrome-stable_current_x86_64.rpm
- click(*where)
Performs left-click based on different input types.
Modes
Image matching: Click on visual element
OCR text matching: Click on text found on screen (with nth occurrence support)
Coordinates: Click at specific x, y position
Color matching: Click on specific color in region
Selenium web element: Click on element in browser
- Parameters:
*where – Variable arguments depending on click mode
- Returns:
True if successful, False otherwise
Example
# Image matching
click('button.png')

# OCR text matching
click('Submit')  # Click first occurrence
click('Submit', 2)  # Click 2nd occurrence
click('Login', 0)  # Click all occurrences

# Coordinates
click(100, 200)

# Color matching in region
click(x1, y1, x2, y2, r, g, b)  # Find and click color
click(x1, y1, x2, y2, r, g, b, tolerance)  # With tolerance

# Selenium (pass driver object first)
click(driver, 'id', 'submit-button')
click(driver, 'xpath', '//button[@id="submit"]')
click(driver, 'class', 'btn-primary')
click(driver, 'name', 'username')
click(driver, 'css', 'button.submit')
click(driver, 'tag', 'button')
click(driver, 'text', 'Click Here')
click(driver, 'partial', 'Click')
- click_right(*where)
Performs right-click (context menu) based on different input types.
Modes
Image matching: Right-click on visual element
OCR text matching: Right-click on text found on screen (with nth occurrence support)
Coordinates: Right-click at specific x, y position
Color matching: Right-click on specific color in region
Selenium web element: Right-click on element in browser
- Parameters:
*where – Variable arguments depending on click mode
- Returns:
True if successful, False otherwise
Example
# Image matching
click_right('button.png')

# OCR text matching
click_right('Submit')  # Right-click first occurrence
click_right('Submit', 2)  # Right-click 2nd occurrence
click_right('Login', 0)  # Right-click all occurrences

# Coordinates
click_right(100, 200)

# Color matching in region
click_right(x1, y1, x2, y2, r, g, b)  # Find and right-click color
click_right(x1, y1, x2, y2, r, g, b, tolerance)  # With tolerance

# Selenium (pass driver object first)
click_right(driver, 'id', 'submit-button')
click_right(driver, 'xpath', '//button[@id="submit"]')
click_right(driver, 'class', 'btn-primary')
click_right(driver, 'name', 'username')
click_right(driver, 'css', 'button.submit')
click_right(driver, 'tag', 'button')
click_right(driver, 'text', 'Click Here')
click_right(driver, 'partial', 'Click')
- copy(*where)
Copies text from various sources: screen, clipboard, Selenium elements or web pages.
Modes
Active window: Copy all content from current window
Clipboard: Get current clipboard content
Screen coordinates: Click at position and copy
Selenium webpage: Copy entire page content
Selenium element: Copy element text or attribute value
- Parameters:
*where – Variable arguments depending on copy mode
- Returns:
Copied text or ‘’ if nothing was copied
- Return type:
str
Example
# Active window - Copy everything from current window
# Ctrl+A, Ctrl+C from active window
copy()

# Clipboard
# Get current clipboard content
copy('clipboard')

# Screen coordinates
# Click at (500, 300) and copy
copy(500, 300)

# Selenium webpage - Copy entire page
# Copy all webpage content
copy(driver)

# Selenium element - Copy text
copy(driver, 'id', 'username-display')
copy(driver, 'xpath', '//div[@class="name"]')
copy(driver, 'class', 'user-info')
copy(driver, 'name', 'description')
copy(driver, 'css', 'p.content')
copy(driver, 'tag', 'h1')
copy(driver, 'text', 'Welcome')
copy(driver, 'partial', 'Hello')

# Selenium element - Copy attribute
copy(driver, 'id', 'download-link', 'href')  # Get link URL
copy(driver, 'class', 'product-img', 'src')  # Get image source
copy(driver, 'id', 'email-field', 'value')  # Get input value
copy(driver, 'xpath', '//a[@id="link"]', 'title')  # Get title attribute
- csv_to_xlsx(csv_file=None, delete_csv=True)
Converts CSV file(s) to XLSX format.
- Parameters:
csv_file – Path to CSV file or None to auto-detect single CSV in current directory
delete_csv – If True, deletes original CSV after conversion (default: True)
- Returns:
Path of created XLSX file or None if error
- Return type:
str
Output
Prints the detected CSV filename when auto-detected.
Prints conversion result showing source and destination filenames.
Prints confirmation when original CSV is deleted.
Example
# Auto-detect single CSV in current directory (deletes CSV by default)
csv_to_xlsx()  # Finds, converts and deletes CSV

# Specific file (deletes CSV by default)
csv_to_xlsx('data.csv')  # Converts and deletes data.csv

# Keep original CSV
csv_to_xlsx('report.csv', delete_csv=False)  # Keeps report.csv
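The auto-detect mode relies on there being a single CSV in the current directory. A rough sketch of such a check is shown below, using only the standard library. detect_single_csv is a hypothetical name, and returning None when zero or several CSV files exist is an assumption, since the documented behavior for that case is not stated here.

```python
import glob
import os
import tempfile


def detect_single_csv(directory="."):
    """Return the only *.csv path in `directory`, or None if there are zero or several.

    Hypothetical helper; the None-on-ambiguity rule is an assumption.
    """
    candidates = sorted(glob.glob(os.path.join(directory, "*.csv")))
    return candidates[0] if len(candidates) == 1 else None


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "data.csv"), "w").close()
    one = detect_single_csv(tmp)   # exactly one CSV present
    open(os.path.join(tmp, "extra.csv"), "w").close()
    two = detect_single_csv(tmp)   # two CSVs present, so ambiguous

print(os.path.basename(one), two)  # data.csv None
```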
- date()
Get current day of month.
- Returns:
Current day (1-31)
- Return type:
int
Example
date() # 24
- day()
Get current day of week.
- Returns:
Day name in lowercase (monday, tuesday, wednesday, thursday, friday, saturday, sunday)
- Return type:
str
Example
# Weekday check
if day() == 'monday':
    print("It is Monday today.")
- drag(*args)
Drag from source to target.
Modes
PyAutoGUI screen drag: (x1, y1, x2, y2)
Selenium element drag: (driver, src_type, src_selector, tgt_type, tgt_selector)
- Parameters:
PyAutoGUI – (x1, y1, x2, y2)
Selenium – (driver, src_type, src_selector, tgt_type, tgt_selector)
- Returns:
True if successful, False otherwise
Output
Prints drag confirmation showing source and target coordinates (PyAutoGUI).
Prints drag confirmation showing source and target selectors (Selenium).
Example
# Screen drag (PyAutoGUI) - 2 second duration
drag(100, 200, 500, 600)

# Web element drag (Selenium)
drag(driver, 'id', 'card-1', 'class', 'done-column')
drag(driver, 'xpath', '//li[1]', 'xpath', '//li[5]')

# Multiple drivers
driver1 = browser('https://trello.com')
driver2 = browser('https://jira.com')
drag(driver1, 'id', 'task-1', 'id', 'done-column')
drag(driver2, 'class', 'issue', 'class', 'backlog')
- dropdown_select(driver_obj, selector_type, selector, selection_criteria)
Selects an item from a dropdown menu based on the provided criteria.
- Parameters:
driver_obj – Selenium WebDriver instance
selector_type – Type of selector (‘id’, ‘name’, ‘xpath’, ‘class’, ‘css’, ‘tag’, ‘text’, ‘partial’)
selector – The value of the selector
selection_criteria – Index (int) or visible text (str) for selection
- Returns:
True if successful, False otherwise
Output
Prints confirmation showing the selected option index or text.
Example
# Select by index
dropdown_select(driver, 'id', 'country-dropdown', 0)  # Select first option
dropdown_select(driver, 'id', 'country-dropdown', 2)  # Select third option

# Select by visible text
dropdown_select(driver, 'id', 'country-dropdown', 'United States')
dropdown_select(driver, 'name', 'language', 'English')
dropdown_select(driver, 'xpath', '//select[@name="city"]', 'New York')

# Different selector types
dropdown_select(driver, 'class', 'form-select', 'Option 1')
dropdown_select(driver, 'css', 'select.dropdown', 'Value')
- erase(*args)
Erase/clear text from input fields.
Modes
PyAutoGUI active window: ()
Selenium specific element: (driver, selector_type, selector)
- Parameters:
*args – Variable arguments depending on mode
- Returns:
True if successful, False otherwise
Example
# PyAutoGUI mode (erase active window)
erase()  # Select all and delete (Ctrl+A, Delete)

# Selenium mode (erase specific element)
erase(driver, 'id', 'username')  # Clear username field
erase(driver, 'xpath', '//input[@name="email"]')  # Clear email field
erase(driver, 'class', 'search-box')  # Clear search box
- find_browser(*args)
Find text in browser using Ctrl+F (find function).
- Parameters:
*args – Variable arguments depending on mode
- Returns:
True if successful, False otherwise
Output
Prints confirmation with the searched text (PyAutoGUI).
Prints confirmation if text was found and highlighted (Selenium).
Prints a message if text was not found on the page (Selenium).
Example
# PyAutoGUI mode (any window)
find_browser('Python')  # Find in active window
find_browser('error message')  # Find phrase

# Selenium mode (browser)
find_browser(driver, 'Python')  # Find in Selenium browser
find_browser(driver, 'contact us')  # Find phrase in browser
Note
PyAutoGUI: Opens find dialog (Ctrl+F), types search term, presses Enter, then Esc.
Selenium: Uses JavaScript to highlight matching text on the page in yellow and scrolls to the first match. Removes any previous highlights before applying new ones.
Default wait time between actions is 1 second (PyAutoGUI only).
- find_key(data, key)
Recursively finds all values of a specified key in nested data structures (dictionaries, lists and tuples). Particularly useful for searching deeply nested JSON data from API responses or parsed files.
- Parameters:
data – Data structure to search (dict, list or tuple)
key – Key name to find
- Returns:
All values found for the key (empty list if not found)
- Return type:
list
Example
import json
import requests

# Single occurrence
data = {'name': 'John', 'age': 30}
name = find_key(data, 'name')[0]  # 'John'

# Multiple occurrences
data = {
    'user': {'id': 1, 'name': 'Alice'},
    'admin': {'id': 2, 'name': 'Bob'}
}
ids = find_key(data, 'id')  # [1, 2]

# Nested lists/tuples
data = {'users': [{'age': 25}, {'age': 30}]}
ages = find_key(data, 'age')  # [25, 30]

# API response workflow
response = requests.get('https://api.example.com/users').json()
ids = find_key(response, 'id')  # finds all 'id' values

# Parsed file workflow
data = json.loads(read('data.json'))
hosts = find_key(data, 'host')  # finds all 'host' values
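To make the recursive search concrete, here is one way such a lookup could be written. This is an illustrative sketch, not the library's implementation: find_key_sketch is a hypothetical name, and descending into a matched value to look for further matches is an assumption.

```python
def find_key_sketch(data, key):
    """Collect every value stored under `key` in nested dicts, lists and tuples."""
    found = []
    if isinstance(data, dict):
        for k, v in data.items():
            if k == key:
                found.append(v)
            # Assumption: keep searching inside the value, even if it matched.
            found.extend(find_key_sketch(v, key))
    elif isinstance(data, (list, tuple)):
        for item in data:
            found.extend(find_key_sketch(item, key))
    return found


nested = {
    'user': {'id': 1, 'name': 'Alice'},
    'admin': {'id': 2, 'name': 'Bob'},
    'groups': [{'id': 3}, ({'id': 4},)],
}
print(find_key_sketch(nested, 'id'))       # [1, 2, 3, 4]
print(find_key_sketch(nested, 'missing'))  # []
```

Results come back in traversal order, which for dictionaries is insertion order.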
- find_str(string, starts_after, ends_before, index=0)
Extracts substring between two markers.
- Parameters:
string – Text to search in
starts_after – Start extraction after this
ends_before – End extraction before this
index – Which match (0=first, -1=last, 1=second, etc.)
- Returns:
Extracted string or None if not found
- Return type:
str or None
Example
# Extract version number from string
text = 'Version: 1.0.5 released'
version = find_str(text, 'Version: ', ' released')
# version = '1.0.5'

# Extract last occurrence using index=-1
text = 'User: Alice logged in. User: Bob logged in'
last_user = find_str(text, 'User: ', ' logged', -1)
# last_user = 'Bob'
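The index semantics (0 = first, 1 = second, -1 = last) map naturally onto Python list indexing. The sketch below shows one possible approach under that assumption. find_str_sketch is a hypothetical name, and scanning for non-overlapping matches from left to right is an assumption about the matching rule.

```python
def find_str_sketch(string, starts_after, ends_before, index=0):
    """Return the index-th substring found between the two markers, or None."""
    if not starts_after or not ends_before:
        return None  # Assumption: empty markers are treated as "not found".
    matches = []
    pos = 0
    while True:
        start = string.find(starts_after, pos)
        if start == -1:
            break
        start += len(starts_after)
        end = string.find(ends_before, start)
        if end == -1:
            break
        matches.append(string[start:end])
        pos = end + len(ends_before)  # Continue after this match.
    try:
        return matches[index]
    except IndexError:
        return None


text = 'User: Alice logged in. User: Bob logged in'
print(find_str_sketch(text, 'User: ', ' logged'))      # Alice
print(find_str_sketch(text, 'User: ', ' logged', -1))  # Bob
print(find_str_sketch(text, 'Admin: ', ' logged'))     # None
```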
- hour()
Get current hour.
- Returns:
Current hour (0-23) in 24-hour format
- Return type:
int
Example
hour() # 14 (2 PM)
- inspect()
Opens GUI to inspect pixel position and color with zoomed preview.
Usage
Click on the Pixel Inspector window to bring it into focus.
Move the mouse to the desired pixel.
Press ‘ESC’ to capture.
Output
Prints position and RGB/HEX values to console.
Copies ‘x, y, r, g, b’ to clipboard.
- log_setup(title)
Sets up logging and terminal styling for the script.
Combines terminal setup with comprehensive logging and automatic color-coded status indication. Creates a logs folder and saves all output with timestamps. Shows output in terminal while also saving to file.
- Parameters:
title – Name for both terminal title and log file
Example
log_setup("MyScript")
print("This gets logged")
# ... script runs ...
# Terminal turns GREEN on success or RED on crash
Log file format
logs/log_MyScript_2026-03-26_14-30-45_IST_session-1.txt (active - newest logs)
logs/log_MyScript_2026-03-26_14-30-45_IST_session-1_part_1.txt (2nd newest - rotated)
logs/log_MyScript_2026-03-26_14-30-45_IST_session-1_part_2.txt (3rd newest)
...
logs/log_MyScript_2026-03-26_14-30-45_IST_session-1_part_9.txt (oldest backup)
Session numbering
session-1 : First run of this script
session-2 : Second run of this script
session-3 : Third run, etc.
session-N : Automatically increments based on existing log files
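Session numbers are derived from the log files that already exist. As a rough illustration of that idea, the sketch below parses file names in the format shown above and returns the next number. next_session_number is a hypothetical helper, and counting sessions per title, as well as the exact regular expression, are assumptions.

```python
import re


def next_session_number(existing_files, title):
    """Return 1 + the highest session number already used for `title`."""
    pattern = re.compile(
        r"^log_" + re.escape(title)
        + r"_\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}_\w+_session-(\d+)(?:_part_\d+)?\.txt$"
    )
    used = []
    for name in existing_files:
        match = pattern.match(name)
        if match:
            used.append(int(match.group(1)))
    return max(used, default=0) + 1


files = [
    "log_MyScript_2026-03-26_14-30-45_IST_session-1.txt",
    "log_MyScript_2026-03-26_14-30-45_IST_session-1_part_1.txt",
    "log_MyScript_2026-03-27_09-00-00_IST_session-2.txt",
    "log_Other_2026-03-27_09-00-00_IST_session-7.txt",
]
print(next_session_number(files, "MyScript"))  # 3
print(next_session_number([], "MyScript"))     # 1
```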
Features
Sets terminal title and colors (blue bg, white text)
Automatic color changes: Blue to Green (success) or Blue to Red (crash)
Automatic session numbering (increments from previous runs)
Captures all print() statements
Captures all errors and exceptions
Adds timestamp to each entry
Shows output in terminal AND saves to file
Automatic file rotation (10MB per file, max 10 files = 100MB per session)
Automatic cleanup (keeps max 100MB total logs across all sessions)
Note
Terminal colors change automatically based on script outcome:
Blue background : Script is running
Green background : Script completed successfully
Red background : Script crashed (unhandled exception)
- minute()
Get current minute.
- Returns:
Current minute (0-59)
- Return type:
int
Example
minute() # 23
- month()
Get current month.
- Returns:
Current month (1-12)
- Return type:
int
Example
month() # 2 (February)
- press(*keys)
Press keyboard keys with support for Selenium, PyAutoGUI and key combinations.
Modes
PyAutoGUI single key: (key)
PyAutoGUI key N times: (key, count)
PyAutoGUI key combination: (key1, key2, …)
Selenium driver key: (driver, key)
Selenium driver key N times: (driver, key, count)
Selenium driver key combination: (driver, key1, key2, …)
Selenium element key: (driver, selector_type, selector, key)
- Parameters:
*keys – Variable arguments for key presses
- Returns:
True if successful, False otherwise
Example
# PyAutoGUI keys (no driver needed)
press("tab")  # Single key
press("tab", 5)  # Press 5 times
press("tab", -5)  # Press 5 times with shift held
press("ctrl", "a")  # Two-key combo
press("alt", "ctrl", "z")  # Three-key combo
press("num5")  # Numpad 5
press("volumeup")  # Volume up
press("mute")  # Volume mute (short form)
press("back")  # Browser back (short form)
press("forward")  # Browser forward (short form)

# Selenium driver keys (pass driver object)
press(driver, "tab")
press(driver, "tab", 5)  # Press tab 5 times
press(driver, "tab", -5)  # Press shift+tab 5 times
press(driver, "ctrl", "c")
press(driver, "ctrl", "shift", "s")

# Selenium element + key (driver, selector_type, selector, key)
press(driver, "xpath", "//input", "enter")
press(driver, "id", "username", "tab")
Note
Negative count presses the key with Shift held (e.g. Shift+Tab for reverse navigation).
PyAutoGUI-only keys (num0-9, volumeup, volumedown, mute, back, forward, etc.) are not supported in Selenium mode.
Short forms supported: ‘back’ for browserback, ‘forward’ for browserforward, ‘mute’ for volumemute.
- read(*args)
Extract text from screen (using OCR), files (by parsing file format) or a Selenium browser window.
Modes
No arguments: OCR full screen
2 integers: OCR from (x, y) to bottom-right corner
4 integers: OCR specific region (x, y, width, height)
1 string: Read file by parsing its format
1 driver object: Take screenshot of Selenium browser and read using OCR
Supported file formats
Documents: PDF, DOCX, PPTX, ODT, RTF
Tabular: CSV, TSV, XLSX, SQLite
Structured: JSON, YAML, XML, INI/CFG
Text: TXT, LOG, MD
Web: HTML
Email: EML, MSG
eBooks: EPUB
Scripts: SH, BAT, PY
- Parameters:
*args – Variable arguments depending on mode
- Returns:
Extracted text or None if error
- Return type:
str
Example
# OCR - Read entire screen
text = read()

# OCR - Read from (100, 200) to bottom-right corner
text = read(100, 200)

# OCR - Read specific region: x=100, y=200, width=400, height=300
text = read(100, 200, 400, 300)

# Selenium - Read text from browser window using OCR
d1 = browser('https://example.com')
text = read(d1)

# File - Read with extension
text = read('report.pdf')
text = read('data.csv')
text = read('script.py')

# File - Read without extension (auto-detects)
text = read('report')  # Finds report.pdf automatically
text = read('config')  # Finds config.yaml automatically

# Check if text on screen
if 'login' in read():
    print("Login visible!")
Note
OCR first run downloads models (~100MB), subsequent runs are fast.
- run(target)
Runs a file or application on Windows and Linux.
- Parameters:
target –
File path or application name to execute
If target is a file path: Opens with default application
If target is an application name: Launches the application
For applications, the command must be available in system PATH
- Raises:
NotImplementedError – If called on macOS
Output
Prints error message if file or application was not found.
Prints error message if permission was denied.
Example
# Open files with default application
run("sample.txt")  # Opens in default text editor
run("document.pdf")  # Opens in default PDF viewer
run(r"C:\Users\file.xlsx")  # Opens Excel file (raw string so backslashes are not treated as escapes)

# Launch applications (Windows)
run("calc")  # Calculator
run("notepad")  # Notepad
run("mspaint")  # Paint

# Launch applications (Linux)
run("gedit")  # Text editor
run("firefox")  # Browser
run("gnome-calculator")  # Calculator
Note
Windows: Uses os.startfile for files, subprocess for applications.
Linux: Uses xdg-open for files, direct execution for applications. xdg-utils is required (included in Linux dependencies).
- say(text, volume=1.0)
Speak text using offline Text-to-Speech via Piper TTS.
- Parameters:
text – Text to speak
volume – Volume level 0.0 to 1.0 (default: 1.0)
- Returns:
None
Example
say("Hello, how are you?")
say("Download complete.")
say("Error occurred, please try again.", volume=0.7)
say("Warning: Low battery.", volume=0.5)
Note
Automatically logs spoken text when log_setup() is active.
Requires: pip install piper-tts huggingface_hub
Linux requires: sudo apt install espeak-ng alsa-utils
Model files are saved in:
- Windows → %LOCALAPPDATA%\autocore\piper_models
- Linux → ~/.local/share/autocore/piper_models/
Model size is approximately 60MB, downloaded once and reused.
Browse all voices at: https://huggingface.co/rhasspy/piper-voices
- screenshot(*args)
Takes a screenshot and saves it to the current working directory.
Modes
Full screen, auto-named: ()
Full screen, custom name: (filename)
From (x,y) to screen edge, auto-named: (x, y)
From (x,y) to screen edge, custom name: (x, y, filename)
Specific region, auto-named: (x, y, width, height)
Specific region, custom name: (x, y, width, height, filename)
Selenium variants of all above: (driver, …)
- Parameters:
*args –
Variable arguments depending on usage
- () : Full screen, auto-named
- (filename) : Full screen, custom filename
- (x, y) : From (x,y) to screen edge, auto-named
- (x, y, filename) : From (x,y) to screen edge, custom filename
- (x, y, width, height) : Specific region, auto-named
- (x, y, width, height, filename) : Specific region, custom filename
- (driver, ...) : Same as above but captures from Selenium browser
Where –
driver: Selenium WebDriver instance
x, y: Top-left corner coordinates of the screenshot region
width, height: Dimensions of the screenshot region
- filename: Custom name to save the screenshot
.png extension is added automatically if not provided
If not provided, auto-generates: screenshot_YYYY-MM-DD_HH-MM-SS_<unix>.png
Example: screenshot_2025-02-18_14-30-45_1708268445.png
- Returns:
True if successful, False otherwise
Output
Prints the full path of the saved screenshot on success.
Prints error message if invalid arguments or coordinates are provided.
Example
# Full screen (PyAutoGUI)
screenshot()  # Full screen, auto-named
screenshot('desktop.png')  # Full screen, custom name

# Selenium full page
screenshot(driver)  # Selenium full page, auto-named
screenshot(driver, 'webpage.png')  # Selenium full page, custom name

# From top-left point to edge (PyAutoGUI)
screenshot(100, 200)  # From (100,200) to edge, auto-named
screenshot(100, 200, 'crop.png')  # From (100,200) to edge, custom name

# From top-left point to edge (Selenium)
screenshot(driver, 100, 200)  # Selenium from (100,200) to edge
screenshot(driver, 100, 200, 'page.png')  # Selenium, custom name

# Specific region (PyAutoGUI)
screenshot(0, 0, 500, 300)  # Region: top-left (0,0), 500x300, auto-named
screenshot(0, 0, 500, 300, 'region.png')  # Region: top-left (0,0), 500x300, custom name

# Specific region (Selenium)
screenshot(driver, 0, 0, 800, 600)  # Selenium region, auto-named
screenshot(driver, 0, 0, 800, 600, 'sel.png')  # Selenium region, custom name
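The file naming rules described under filename (automatic .png extension, timestamped default name) can be illustrated with a short sketch. screenshot_filename_sketch is a hypothetical helper, not the library's code. Appending .png to any name that does not already end in .png is an assumption, as is using local time for the timestamp.

```python
import re
from datetime import datetime


def screenshot_filename_sketch(filename=None, now=None):
    """Build a screenshot file name following the documented naming pattern."""
    now = now or datetime.now()
    if filename is None:
        # Pattern: screenshot_YYYY-MM-DD_HH-MM-SS_<unix>.png
        stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
        return f"screenshot_{stamp}_{int(now.timestamp())}.png"
    # Assumption: anything not already ending in .png gets the extension appended.
    if not filename.lower().endswith(".png"):
        filename += ".png"
    return filename


print(screenshot_filename_sketch("desktop"))      # desktop.png
print(screenshot_filename_sketch("desktop.png"))  # desktop.png
print(screenshot_filename_sketch())               # timestamped name, varies per call
```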
- scroll(*args, timeout=30)
Universal scroll function for both PyAutoGUI and Selenium.
- Parameters:
*args – Variable arguments (see examples below)
timeout – Max seconds when scrolling to ‘bottom’/‘top’ (default: 30)
- Returns:
True if successful, False otherwise
Output
Prints scroll direction and count on completion.
Prints progress every 10 scrolls for large scroll counts.
Prints confirmation when bottom or top is reached (Selenium).
Example
# PyAutoGUI Examples (scroll any window):
scroll()  # Scroll down 1 time (default)
scroll(5)  # Scroll down 5 times
scroll('down')  # Scroll down 1 time
scroll('down', 10)  # Scroll down 10 times
scroll('up', 5)  # Scroll up 5 times
scroll('bottom')  # Scroll down continuously for 30 seconds
scroll('bottom', timeout=60)  # Scroll down continuously for 60 seconds
scroll('top')  # Scroll up continuously for 30 seconds

# Selenium Examples (pass driver object):
scroll(driver)  # Scroll down 1 time in browser
scroll(driver, 5)  # Scroll down 5 times in browser
scroll(driver, 'down')  # Scroll down 1 time in browser
scroll(driver, 'down', 10)  # Scroll down 10 times in browser
scroll(driver, 'up', 5)  # Scroll up 5 times in browser
scroll(driver, 'bottom')  # Scroll to bottom (auto-detect end)
scroll(driver, 'top')  # Scroll to top (auto-detect end)
scroll(driver, 'bottom', timeout=120)  # Scroll to bottom, max 2 minutes
scroll(driver, 'Login')  # Scroll to 1st instance of 'Login'
scroll(driver, 'Login', 2)  # Scroll to 2nd instance of 'Login'
scroll(driver, 'Login', -1)  # Scroll to last instance of 'Login'
scroll(driver, 'Login', -2)  # Scroll to 2nd last instance of 'Login'
- second()
Get current second.
- Returns:
Current second (0-59)
- Return type:
int
Example
second() # 45
- wait(*args, countdown=True)
Wait with countdown, wait for element or wait for color at pixel.
- Parameters:
*args – Variable arguments (see examples)
countdown – If True, shows countdown display (default: True)
- Returns:
True if successful, False if error or timeout
Example
# Countdown wait
wait(5)  # Wait 5 seconds with countdown
wait(10, countdown=False)  # Wait 10 seconds silently
wait()  # Wait 3 seconds (default)

# Wait for element (Selenium) - pass driver object
wait(driver, 'xpath', '//button')  # Wait max 180s with countdown
wait(driver, 'id', 'submit-btn', 10)  # Wait max 10s with countdown
wait(driver, 'class', 'content', 30, countdown=False)  # Wait silently for 30s

# Wait for color at pixel
wait(100, 200, 255, 0, 0)  # Wait for red (255,0,0) at (100,200) with countdown
wait(100, 200, 255, 0, 0, 30)  # Wait for red, max 30s with countdown
wait(500, 300, 0, 255, 0, 60, countdown=False)  # Wait silently
- wait_download(timeout=1200, url=None, filename=None, download_dir=None)
Wait for a browser-initiated download to complete or download a file directly via URL.
Modes
URL mode (url provided): Downloads file directly using requests. Useful for file types blocked by browsers (.exe, .msix, .msi, etc.). File is saved to Python’s current working directory.
Monitor mode (url not provided): Monitors the downloads folder for a browser-initiated download to complete.
- Parameters:
timeout – Maximum seconds to wait for download completion (default: 1200)
url –
Direct download URL (optional):
- If provided: Downloads file directly via requests
- If None: Monitors downloads folder for browser-initiated download
filename –
Custom filename to save/rename the downloaded file (optional):
- With extension (e.g. "myapp.exe") : used as-is
- Without extension (e.g. "myapp") : extension borrowed from original file
- If None: Original filename is kept
- If multiple files are downloaded, only the first completed file is renamed
download_dir –
Custom download directory to monitor (monitor mode only, optional):
- If provided: Uses specified path and skips all auto-detection
- If None: Auto-detects using the priority order described in Note below
- Returns:
str: Full path of the downloaded file (always includes extension) on success
False: On failure (download error, timeout, directory access issue, etc.)
Output
Prints download progress every 10 seconds showing elapsed time and file size.
Prints confirmation with final filename and saved path on completion.
Prints timeout message if download does not complete in time.
Example
wait_download()  # Monitor downloads folder
wait_download(url='https://abc.com/file.msix')  # Direct download via URL
wait_download(url='https://abc.com/file.msix', filename='myapp')  # Custom name, borrows extension
wait_download(300, filename='our_log.txt')  # Monitor and rename on completion
wait_download(600, download_dir='/downloads')  # Docker with custom path
wait_download(300, download_dir='D:/MyDownloads')  # Windows custom path

# Use with browser() : pass driver.download_dir to guarantee alignment
driver = browser('https://example.com')
click(driver, 'id', 'download-button')
wait_download(download_dir=driver.download_dir)
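The filename rules (used as-is with an extension, extension borrowed otherwise, original name kept when None) can be expressed in a few lines. The sketch below is illustrative only: resolve_download_name is a hypothetical helper, and using os.path.splitext to decide whether a name "has an extension" is an assumption.

```python
import os


def resolve_download_name(original_name, filename=None):
    """Apply the documented renaming rules to a completed download."""
    if filename is None:
        return original_name              # Original filename is kept.
    if os.path.splitext(filename)[1]:
        return filename                   # Has an extension: used as-is.
    # No extension given: borrow it from the original file.
    return filename + os.path.splitext(original_name)[1]


print(resolve_download_name("file.msix"))               # file.msix
print(resolve_download_name("file.msix", "myapp"))      # myapp.msix
print(resolve_download_name("file.msix", "myapp.exe"))  # myapp.exe
```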
Note
- When download_dir is not provided, the folder is auto-detected in this order:
DOWNLOAD_DIR environment variable (if set at OS level)
/downloads folder (if running inside Docker)
~/Downloads (default fallback)
If a file was modified within the last 20 seconds before calling this function, it will be detected as a recently completed download and returned immediately. This handles cases where downloads complete very quickly before monitoring starts.
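The auto-detection order listed in this note can be sketched as a simple priority chain. detect_download_dir is a hypothetical helper, and checking for /.dockerenv to decide whether the script runs inside Docker is an assumption; the library may detect Docker differently.

```python
import os


def detect_download_dir(download_dir=None, environ=None, in_docker=None):
    """Pick the folder to monitor, following the documented priority order."""
    environ = os.environ if environ is None else environ
    if download_dir:
        return download_dir                       # Explicit path skips auto-detection.
    if environ.get("DOWNLOAD_DIR"):
        return environ["DOWNLOAD_DIR"]            # 1. Environment variable.
    if in_docker is None:
        # Assumption: presence of /.dockerenv signals a Docker container.
        in_docker = os.path.exists("/.dockerenv")
    if in_docker:
        return "/downloads"                       # 2. Docker folder.
    return os.path.join(os.path.expanduser("~"), "Downloads")  # 3. Default fallback.


print(detect_download_dir("D:/MyDownloads", {}, False))        # D:/MyDownloads
print(detect_download_dir(None, {"DOWNLOAD_DIR": "/data"}, True))  # /data
print(detect_download_dir(None, {}, True))                     # /downloads
```

The environ and in_docker parameters exist only so the order can be exercised without changing the real environment.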
- window(action=None, target=None, *args)
Unified window management function for Windows and Linux.
- Parameters:
action –
Window operation to perform (default: ‘list’):
'list' : Get all window titles
'title' : Get active window title (or find full title if target provided)
'focus' : Bring window to foreground
'close' : Close window
'minimize' : Minimize window
'maximize' : Maximize window
'resize' : Resize window (requires width, height)
'move' : Move window (requires x, y)
target – Window title or pattern (required for most actions)
*args – Additional parameters (width, height for resize; x, y for move)
- Returns:
Return type depends on action:
list/None : List of strings when action is None or 'list'
title : String or None
others : True if successful, False otherwise
- Raises:
ValueError – If invalid action, missing required parameters or invalid dimensions/coordinates
NotImplementedError – If called on macOS
Output
Prints error if window not found, with suggestions for similar window titles (focus action only).
Prints error if wmctrl or xdotool is not installed (Linux only).
Example
# Get all windows (default)
window()  # ['Chrome', 'Notepad', 'Excel']
window('list')  # ['Chrome', 'Notepad', 'Excel']

# Check if window exists
if 'Chrome' in window():
    print("Chrome is open!")

# Get active window title
window('title')  # 'Google Chrome - New Tab'

# Find full title containing text
window('title', 'Chrome')  # 'Google Chrome - New Tab'
window('title', 'Note')  # 'Untitled - Notepad'

# Window operations
window('focus', 'Chrome')  # Focus window
window('close', 'Notepad')  # Close window
window('minimize', 'Excel')  # Minimize window
window('maximize', 'Word')  # Maximize window

# Position and size
window('resize', 'Chrome', 800, 600)  # Resize to 800x600
window('move', 'Chrome', 100, 200)  # Move to (100, 200)

# Side-by-side setup (1920x1080 display)
window('move', 'Chrome', 0, 0)  # Left side
window('resize', 'Chrome', 960, 1080)  # Half screen width
window('move', 'Code', 960, 0)  # Right side
window('resize', 'Code', 960, 1080)  # Half screen width

# Recording setup
window('resize', 'Demo', 1280, 720)  # 720p size
window('move', 'Demo', 320, 180)  # Centered on 1920x1080
Note
On Linux, resize and move automatically remove maximized/minimized state before applying changes, ensuring consistent behavior.
Target matching is case-insensitive and partial (e.g. ‘Chrome’ matches ‘Google Chrome - New Tab’).
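The case-insensitive, partial matching described above amounts to a lowercase substring test. A minimal sketch follows. find_window_title is a hypothetical helper, and returning the first match in list order is an assumption.

```python
def find_window_title(titles, target):
    """Return the first title containing `target`, ignoring case, or None."""
    needle = target.lower()
    for title in titles:
        if needle in title.lower():
            return title
    return None


# Made-up window titles for illustration.
open_windows = ['Google Chrome - New Tab', 'Untitled - Notepad']
print(find_window_title(open_windows, 'chrome'))  # Google Chrome - New Tab
print(find_window_title(open_windows, 'Note'))    # Untitled - Notepad
print(find_window_title(open_windows, 'Excel'))   # None
```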
- write(*keys)
Write or type text using keyboard (PyAutoGUI or Selenium).
- Parameters:
*keys – Variable arguments depending on mode
- Returns:
True if successful, False otherwise
Output
Prints error message if element was not found (Selenium).
Prints error message if invalid arguments are provided.
Example
# PyAutoGUI mode (types in any active window)
write("Hello World")  # Types in active window
write("user@example.com")  # Types email
write("12345")  # Types numbers as string

# Selenium mode - type on active element in browser
write(driver, "Hello World")  # Types on active element
write(driver, "Search query")  # Types in focused input

# Selenium mode - type in specific element
write(driver, "id", "username", "john_doe")
write(driver, "xpath", "//input[@name='email']", "user@example.com")
write(driver, "class", "search-box", "Python tutorial")
Note
PyAutoGUI uses typewrite() which types one character at a time.
Selenium uses send_keys() which types the entire string at once.
- year()
Get current year.
- Returns:
Current year (e.g., 2026)
- Return type:
int
Example
year() # 2026
- zoom(*args)
Zoom in/out using steps or set zoom percentage.
Modes
PyAutoGUI steps/reset: (value)
Selenium steps: (driver, value) where value is -9 to +9
Selenium percentage: (driver, value) where value is outside -9 to +9
Selenium reset: (driver, 100) or (driver, 0)
- Parameters:
*args –
Variable arguments:
- (value): PyAutoGUI zoom steps/reset
- (driver, value): Selenium zoom steps/percentage/reset
- Returns:
True if successful, False otherwise
- Raises:
ValueError – If arguments are invalid, zoom value is not an integer, PyAutoGUI value is outside -9 to +9 (except 100), or Selenium percentage is less than 1%.
Output
Prints zoom direction and step count (PyAutoGUI).
Prints new zoom percentage after change (Selenium).
Prints confirmation when zoom is reset.
Value Logic
-9 to +9: Zoom steps (Ctrl+/Ctrl-)
100 or 0: Reset to default/100%
Outside range (except 100): Percentage (Selenium only)
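The value logic above, combined with the documented ValueError cases, can be summarized as a small classifier. This is a sketch for illustration, not the library's code: classify_zoom is a hypothetical helper, and rejecting booleans as non-integers is an assumption.

```python
def classify_zoom(value, selenium=False):
    """Return 'reset', 'steps' or 'percentage' for a zoom value, or raise ValueError."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("zoom value must be an integer")
    if value in (0, 100):
        return "reset"
    if -9 <= value <= 9:
        return "steps"
    if not selenium:
        raise ValueError("PyAutoGUI mode accepts only -9 to +9, 0 or 100")
    if value < 1:
        raise ValueError("Selenium percentage must be at least 1%")
    return "percentage"


print(classify_zoom(3))                    # steps
print(classify_zoom(0))                    # reset
print(classify_zoom(150, selenium=True))   # percentage
```

Checking for reset before the step range matters, because 0 falls inside -9 to +9 but is documented as a reset.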
Example
# PyAutoGUI (desktop apps)
zoom(3)  # Zoom in 3 steps
zoom(-5)  # Zoom out 5 steps
zoom(100)  # Reset to default (Ctrl+0)
zoom(0)  # Reset to default (Ctrl+0)

# When zoom in/out is performed using UI (Ctrl and +/-) in Chrome, the min %,
# zoom states % and max % follow this order:
# (25, 33, 50, 67, 75, 80, 90, 100, 110, 125, 150, 175, 200, 250, 300, 400, 500)

# Selenium (browser) - Steps
zoom(driver, 3)  # Zoom in: current + 3 * 10%
zoom(driver, -5)  # Zoom out: current - 5 * 10%

# Selenium (browser) - Reset
zoom(driver, 100)  # Reset to 100%
zoom(driver, 0)  # Reset to 100%

# Selenium (browser) - Percentage
zoom(driver, 150)  # Set zoom to 150%
zoom(driver, 75)  # Set zoom to 75%
zoom(driver, 50)  # Set zoom to 50%
zoom(driver, 200)  # Set zoom to 200%
Note
Selenium zoom is applied via JavaScript and is not reflected in the Chrome URL bar or the kebab menu zoom indicator.
PyAutoGUI reset (0 or 100) uses Ctrl+0 which resets to the application’s default zoom, which may not always be 100% (e.g. a PDF viewer may default to ‘fit to page’ instead).
Selenium reset explicitly sets zoom to exactly 100%.