Vision Library API

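The vision methods below read and process the screen: OCR text capture, pixel color capture, color and brightness comparison, template matching, and raw frame capture.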

macro_studio.vision

captureScreenText

Python
captureScreenText(bounds: QRect) -> str

Captures a region of the screen and extracts text using Tesseract OCR.

This method grabs the screen region via MSS, converts the buffer to grayscale and then to a binary (black-and-white) image for better contrast, and passes the result to the Tesseract engine.

Parameters:

bounds (QRect, required): The rectangular area of the screen to read from.

Returns:

str: The extracted text string, stripped of leading/trailing whitespace.

Raises:

FileNotFoundError: If the Tesseract OCR binary is not installed at the path specified in 'pytesseract.pytesseract.tesseract_cmd'.
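
The grayscale-then-binary preprocessing that captureScreenText describes can be illustrated without any imaging library. The sketch below is not the library's implementation: the `binarize` helper, the luma weights (0.299, 0.587, 0.114), and the fixed threshold of 128 are all assumptions chosen for illustration, and nested lists stand in for the captured buffer.

```python
def binarize(pixels, threshold=128):
    """Convert rows of (r, g, b) tuples into rows of 0 or 255 values.

    Hypothetical helper: the weights and threshold are illustrative
    assumptions, not values taken from macro_studio.
    """
    result = []
    for row in pixels:
        out_row = []
        for r, g, b in row:
            # Step 1: collapse the three channels into one gray level.
            gray = 0.299 * r + 0.587 * g + 0.114 * b
            # Step 2: push every pixel to pure black or pure white.
            out_row.append(255 if gray >= threshold else 0)
        result.append(out_row)
    return result


# A 2x2 "screenshot": light text background, dark glyph, and two mid-tones.
sample = [
    [(250, 250, 250), (20, 20, 20)],
    [(90, 90, 200), (200, 200, 90)],
]
print(binarize(sample))  # [[255, 0], [0, 255]]
```

Pushing every pixel to black or white removes the soft anti-aliased edges around glyphs, which is why this kind of step tends to help OCR on low-contrast UI text.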

captureScreenColor

Python
captureScreenColor(point: QPoint) -> QColor

Captures the QColor of a specific pixel on the screen.

Parameters:

point (QPoint, required): The specific pixel location to read from.

Returns:

QColor: The QColor of the specified pixel.

isColorSimilar

Python
isColorSimilar(color_a: QColor, color_b: QColor, tolerance: int = 10) -> bool

Checks if two colors are within a certain Euclidean distance in RGB space.

Parameters:

color_a (QColor, required): The first color to compare (usually captured from the screen).
color_b (QColor, required): The second color to compare (usually the target variable).
tolerance (int, default 10): The maximum Euclidean distance allowed between colors. 0 is an exact match, 10-20 is tight, 50+ is loose.

Returns:

bool: True if the distance between the two colors is <= tolerance, False otherwise.
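
To make the tolerance scale for isColorSimilar concrete, here is a minimal sketch of a Euclidean RGB comparison. It uses plain (r, g, b) tuples instead of QColor so it runs without Qt. The function names are hypothetical and this is not the library's code. It compares squared distances to stay in integer arithmetic.

```python
def rgb_distance_sq(rgb_a, rgb_b):
    """Squared Euclidean distance between two (r, g, b) tuples."""
    return sum((a - b) ** 2 for a, b in zip(rgb_a, rgb_b))


def is_color_similar(rgb_a, rgb_b, tolerance=10):
    """True if the Euclidean RGB distance is <= tolerance.

    Comparing squared values avoids the square root and keeps
    the boundary case exact.
    """
    return rgb_distance_sq(rgb_a, rgb_b) <= tolerance ** 2


# Distance is sqrt(25 + 16 + 4), about 6.7, so within the default tolerance.
print(is_color_similar((200, 100, 50), (205, 104, 52)))  # True
# A 30-level shift in one channel exceeds the default tolerance of 10.
print(is_color_similar((200, 100, 50), (230, 100, 50)))  # False
```

With a tolerance of 10, a color may differ by 10 levels in a single channel, or by about 5 to 6 levels in every channel at once.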

isColorSimilarPerceptual

Python
isColorSimilarPerceptual(color_a: QColor, color_b: QColor, tolerance: int = 10) -> bool

Checks if two colors are within a certain distance in RGB space, with each channel weighted to reflect human perception.

Best for distinguishing between subtle UI shades (e.g., 'Active' vs 'Inactive' buttons).

Parameters:

color_a (QColor, required): The first color to compare (usually captured from the screen).
color_b (QColor, required): The second color to compare (usually the target variable).
tolerance (int, default 10): The maximum perceptually weighted distance allowed between colors. 0 is an exact match, 10-20 is tight, 50+ is loose.

Returns:

bool: True if the weighted distance between the two colors is <= tolerance, False otherwise.
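
The weights used by isColorSimilarPerceptual are not listed here, so the sketch below substitutes one widely used approximation, the "redmean" weighted RGB distance, purely to show how channel weighting changes the result. The function names are hypothetical, and plain (r, g, b) tuples stand in for QColor.

```python
import math


def perceptual_distance(rgb_a, rgb_b):
    """Weighted RGB distance using the 'redmean' approximation.

    Assumption: these weights are illustrative and may differ from
    the ones macro_studio applies.
    """
    r_mean = (rgb_a[0] + rgb_b[0]) / 2
    dr = rgb_a[0] - rgb_b[0]
    dg = rgb_a[1] - rgb_b[1]
    db = rgb_a[2] - rgb_b[2]
    return math.sqrt(
        (2 + r_mean / 256) * dr * dr
        + 4 * dg * dg
        + (2 + (255 - r_mean) / 256) * db * db
    )


def is_color_similar_perceptual(rgb_a, rgb_b, tolerance=10):
    return perceptual_distance(rgb_a, rgb_b) <= tolerance


base = (100, 100, 100)
greener = (100, 110, 100)  # +10 in green
bluer = (100, 100, 110)    # +10 in blue

# Both shifts are 10 levels, but green is weighted more heavily.
print(is_color_similar_perceptual(base, greener, tolerance=18))  # False (distance 20.0)
print(is_color_similar_perceptual(base, bluer, tolerance=18))    # True (distance about 16.1)
```

Under a plain Euclidean metric both shifts have distance 10. Under this weighting the green shift counts for more, so the same tolerance value is effectively stricter than in isColorSimilar.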

isBrightnessSimilar

Python
isBrightnessSimilar(color_a: QColor, color_b: QColor, tolerance: int = 10) -> bool

Checks if the lightness/luminance of two colors is similar.

Best for detecting if a screen region flashes, dims, or highlights, regardless of the actual color hue.

Parameters:

color_a (QColor, required): The first color to compare (usually captured from the screen).
color_b (QColor, required): The second color to compare (usually the target variable).
tolerance (int, default 10): The maximum brightness difference allowed between colors. 0 is an exact match, 10-20 is tight, 50+ is loose.

Returns:

bool: True if the brightness difference between the two colors is <= tolerance, False otherwise.
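
A brightness-only comparison like isBrightnessSimilar ignores hue, which the following sketch illustrates. It assumes brightness is measured as Rec. 601 luma (0.299 R + 0.587 G + 0.114 B). The library may use a different measure such as HSL lightness. The function names are hypothetical.

```python
def luma(rgb):
    """Approximate perceived brightness of an (r, g, b) tuple.

    Assumption: Rec. 601 luma weights, chosen for illustration only.
    """
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def is_brightness_similar(rgb_a, rgb_b, tolerance=10):
    """True if the two colors differ in brightness by at most tolerance."""
    return abs(luma(rgb_a) - luma(rgb_b)) <= tolerance


# Pure red and a dark gray have very different hues but nearly equal luma.
print(is_brightness_similar((255, 0, 0), (76, 76, 76)))      # True
# Two grays of the same hue but clearly different brightness.
print(is_brightness_similar((76, 76, 76), (120, 120, 120)))  # False
```

This is why a brightness check suits flash or dim detection: a button that changes from red to an equally bright gray would pass, while the same button dimming would not.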

findImageCenter

Python
findImageCenter(template_path: str, bounds: QRect | None = None, threshold: float = 0.8) -> tuple[QPoint, float] | None

Finds an image template on the screen and returns its absolute center coordinates.

Parameters:

template_path (str, required): Path to the template image.
bounds (QRect | None, default None): The bounds to search for the template in. If no bounds are provided, the entire primary monitor is searched.
threshold (float, default 0.8): Minimum confidence score for a result to count as a match.

Returns:

tuple[QPoint, float] | None: The absolute center coordinates of the matched template and the confidence score, or None if no match is found.
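
The "absolute center" returned by findImageCenter can be pictured with a little coordinate arithmetic. The sketch assumes a matcher that reports the top-left corner of the best match relative to the searched region, which is a common convention but is not confirmed here for this library. The `match_center` helper and its tuple arguments are hypothetical.

```python
def match_center(bounds_origin, match_top_left, template_size):
    """Convert a region-relative match position to absolute center coordinates.

    bounds_origin:  (x, y) of the searched region on the screen
    match_top_left: (x, y) of the match, relative to that region
    template_size:  (width, height) of the template image
    """
    bx, by = bounds_origin
    mx, my = match_top_left
    tw, th = template_size
    # Shift into screen space, then move from the corner to the middle.
    return (bx + mx + tw // 2, by + my + th // 2)


# Region starts at (100, 200); a 20x10 template matched at (30, 40) inside it.
print(match_center((100, 200), (30, 40), (20, 10)))  # (140, 245)
```

Because the method returns None when nothing clears the threshold, callers should check for None before unpacking the (QPoint, float) tuple.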

getScreenState

Python
getScreenState(bounds: QRect | None = None) -> np.ndarray

Captures a region of the screen and returns it as a BGR NumPy array for custom processing.

Parameters:

bounds (QRect | None, default None): The region to capture. If None, the whole screen is captured.

Returns:

ndarray: The captured region as a BGR NumPy array.
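
Because getScreenState returns pixels in BGR order, code that expects RGB has to reverse the channels first. The sketch below shows the idea with nested lists standing in for the array so it runs without NumPy. The `bgr_to_rgb` helper is hypothetical. On a real ndarray of shape (height, width, 3) the equivalent is the slice `frame[..., ::-1]`.

```python
def bgr_to_rgb(frame):
    """Reverse the channel order of every pixel in a rows-of-pixels frame."""
    return [[pixel[::-1] for pixel in row] for row in frame]


# One row, two pixels, stored as (blue, green, red).
bgr_frame = [[(255, 0, 0), (0, 0, 255)]]  # pure blue, then pure red

rgb_frame = bgr_to_rgb(bgr_frame)
print(rgb_frame)  # [[(0, 0, 255), (255, 0, 0)]]
```

Mixing up the two orders swaps red and blue, so a color check against an RGB target would silently fail on a BGR frame.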