Vision Library API

Here are the vision methods for reading and processing the screen.

macro_studio.vision

captureScreenText

Python

captureScreenText(bounds: QRect) -> str

Captures a region of the screen and extracts text using Tesseract OCR.

This method grabs the screen region via MSS, converts the buffer to grayscale, binarizes it for better contrast, and then passes the result to the Tesseract engine.

Parameters:

Name	Type	Description	Default
`bounds`	`QRect`	The rectangular area of the screen to read from.	required

Returns:

Type	Description
`str`	The extracted text string, stripped of leading/trailing whitespace.

Raises:

Type	Description
`FileNotFoundError`	If the Tesseract OCR binary is not installed at the path specified in `pytesseract.pytesseract.tesseract_cmd`.
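
The grayscale-and-binarize step described above can be illustrated without any imaging libraries. This is a minimal sketch, not the library's implementation: the helper name `to_binary`, the Rec. 601 luma weights, and the fixed threshold of 128 are all assumptions chosen for illustration, and the real method may threshold differently.

```python
def to_binary(pixels_rgb, threshold=128):
    """Convert rows of (r, g, b) pixels to a black/white image (0 or 255).

    Hypothetical helper: weights and threshold are illustrative assumptions.
    """
    binary = []
    for row in pixels_rgb:
        out_row = []
        for r, g, b in row:
            # Step 1: collapse the three channels into one gray value.
            gray = 0.299 * r + 0.587 * g + 0.114 * b
            # Step 2: push every pixel to pure white or pure black.
            out_row.append(255 if gray >= threshold else 0)
        binary.append(out_row)
    return binary


# A tiny 2x2 "screenshot": light background pixels and dark text pixels.
sample = [
    [(255, 255, 255), (10, 10, 10)],
    [(200, 200, 200), (90, 90, 90)],
]
result = to_binary(sample)
```

Binarizing removes mid-gray anti-aliasing and background tint, which is why it tends to help OCR on small UI text.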

captureScreenColor

Python

captureScreenColor(point: QPoint) -> QColor

Captures the QColor of a specific pixel on the screen.

Parameters:

Name	Type	Description	Default
`point`	`QPoint`	The specific pixel location to read from.	required

Returns:

Type	Description
`QColor`	The QColor of the specified pixel.

isColorSimilar

Python

isColorSimilar(color_a: QColor, color_b: QColor, tolerance: int = 10) -> bool

Checks if two colors are within a certain Euclidean distance in RGB space.

Parameters:

Name	Type	Description	Default
`color_a`	`QColor`	The first color to compare (usually captured from the screen).	required
`color_b`	`QColor`	The second color to compare (usually the target variable).	required
`tolerance`	`int`	The maximum Euclidean distance allowed between colors. 0 is an exact match, 10-20 is tight, 50+ is loose.	`10`

Returns:

Type	Description
`bool`	True if the distance between the two colors is <= tolerance, False otherwise.
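
To make the tolerance scale concrete, the Euclidean comparison can be sketched on plain `(r, g, b)` tuples. The function names `rgb_distance` and `is_similar` are hypothetical stand-ins, not part of `macro_studio.vision`, and the sketch assumes 8-bit channels.

```python
import math


def rgb_distance(a, b):
    """Straight-line distance between two (r, g, b) tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_similar(a, b, tolerance=10):
    """Hypothetical stand-in: True when the distance is within tolerance."""
    return rgb_distance(a, b) <= tolerance


red = (255, 0, 0)
slightly_off_red = (250, 5, 0)   # distance is sqrt(25 + 25), about 7.07
darker_red = (200, 0, 0)         # distance is exactly 55

close = is_similar(red, slightly_off_red)               # within the default 10
far = is_similar(red, darker_red)                       # 55 exceeds 10
far_loose = is_similar(red, darker_red, tolerance=60)   # 55 fits within 60
```

For 8-bit channels the largest possible distance is between black and white, roughly 441.7, so a tolerance of 50 already covers more than a tenth of the full range.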

isColorSimilarPerceptual

Python

isColorSimilarPerceptual(color_a: QColor, color_b: QColor, tolerance: int = 10) -> bool

Checks if two colors are within a certain distance in an RGB space weighted to reflect human perception.

Best for distinguishing between subtle UI shades (e.g., 'Active' vs 'Inactive' buttons).

Parameters:

Name	Type	Description	Default
`color_a`	`QColor`	The first color to compare (usually captured from the screen).	required
`color_b`	`QColor`	The second color to compare (usually the target variable).	required
`tolerance`	`int`	The maximum perceptually weighted distance allowed between colors. 0 is an exact match, 10-20 is tight, 50+ is loose.	`10`

Returns:

Type	Description
`bool`	True if the distance between the two colors is <= tolerance, False otherwise.
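
The documentation does not state which channel weights the method uses. As one possible illustration, the sketch below applies the widely used "redmean" approximation, which weights green most heavily and shifts weight between red and blue depending on how red the colors are. The function name `weighted_distance` is hypothetical, and the weights are an assumption rather than the library's actual formula.

```python
import math


def weighted_distance(a, b):
    """Perceptually weighted RGB distance ("redmean" approximation).

    Assumption: illustrative weights only, not taken from macro_studio.
    """
    r_mean = (a[0] + b[0]) / 2
    d_r = a[0] - b[0]
    d_g = a[1] - b[1]
    d_b = a[2] - b[2]
    return math.sqrt(
        (2 + r_mean / 256) * d_r ** 2
        + 4 * d_g ** 2
        + (2 + (255 - r_mean) / 256) * d_b ** 2
    )


black = (0, 0, 0)
green_shift = weighted_distance(black, (0, 10, 0))  # green counts the most
red_shift = weighted_distance(black, (10, 0, 0))    # same step in red counts less
```

The same 10-unit step yields a larger distance in green than in red, which is the behavior that lets a weighted metric separate shades a plain Euclidean check would treat as equally far apart.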

isBrightnessSimilar

Python

isBrightnessSimilar(color_a: QColor, color_b: QColor, tolerance: int = 10) -> bool

Checks if the lightness/luminance of two colors is similar.

Best for detecting if a screen region flashes, dims, or highlights, regardless of the actual color hue.

Parameters:

Name	Type	Description	Default
`color_a`	`QColor`	The first color to compare (usually captured from the screen).	required
`color_b`	`QColor`	The second color to compare (usually the target variable).	required
`tolerance`	`int`	The maximum brightness difference allowed between colors. 0 is an exact match, 10-20 is tight, 50+ is loose.	`10`

Returns:

Type	Description
`bool`	True if the brightness difference between the two colors is <= tolerance, False otherwise.
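
A brightness-only comparison ignores hue, so two very different colors can still count as similar. The sketch below shows the idea using Rec. 601 luma weights; the helper names `luma` and `is_brightness_similar` are hypothetical, and the actual method may derive lightness differently (for example from an HSL conversion).

```python
def luma(rgb):
    """Approximate perceived brightness (0-255) of an (r, g, b) tuple.

    Assumption: Rec. 601 weights, chosen only for illustration.
    """
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def is_brightness_similar(a, b, tolerance=10):
    """Hypothetical stand-in: compare brightness only, ignoring hue."""
    return abs(luma(a) - luma(b)) <= tolerance


dark_red = (200, 0, 0)     # luma is about 59.8
mid_green = (0, 102, 0)    # luma is about 59.9
white = (255, 255, 255)
gray = (128, 128, 128)

same_brightness = is_brightness_similar(dark_red, mid_green)
dimmed = is_brightness_similar(white, gray)
```

Here `dark_red` and `mid_green` match on brightness despite being far apart in RGB space, while `white` and `gray` do not, which is the pattern to look for when detecting a flash or dim effect.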

findImageCenter

Python

findImageCenter(template_path: str, bounds: QRect | None = None, threshold: float = 0.8) -> tuple[QPoint, float] | None

Finds an image template on the screen and returns its absolute center coordinates.

Parameters:

Name	Type	Description	Default
`template_path`	`str`	Path to the template image.	required
`bounds`	`QRect \| None`	The bounds to search for the template in. If no bounds are provided, it searches the entire primary monitor.	`None`
`threshold`	`float`	Minimum confidence score required for a result to count as a match.	`0.8`

Returns:

Type	Description
`tuple[QPoint, float] \| None`	The absolute center coordinates of the found template object and the confidence score, or None if not found.
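
The phrase "absolute center coordinates" implies a conversion from a match position inside the search region to a position on the full screen. The arithmetic can be sketched as follows, under the assumptions that the matcher reports the top-left corner of the best match relative to `bounds` and that a score equal to the threshold counts as a match. Both helpers are hypothetical and use plain tuples in place of `QPoint`.

```python
def absolute_center(match_top_left, template_size, bounds_origin=(0, 0)):
    """Convert a match's top-left corner (relative to the search bounds)
    into the template's center in absolute screen coordinates."""
    match_x, match_y = match_top_left
    width, height = template_size
    origin_x, origin_y = bounds_origin
    return (origin_x + match_x + width // 2,
            origin_y + match_y + height // 2)


def pick_match(score, center, threshold=0.8):
    """Return (center, score) when the score meets the threshold, else None."""
    return (center, score) if score >= threshold else None


# A 20x10 template found at (30, 40) inside a region whose origin is (100, 200).
center = absolute_center((30, 40), (20, 10), bounds_origin=(100, 200))

accepted = pick_match(0.93, center)   # confident enough to report
rejected = pick_match(0.75, center)   # below the default threshold
```

Because the function can return `None`, callers should check the result before unpacking the point and score.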

getScreenState

Python

getScreenState(bounds: QRect | None = None) -> np.ndarray

Captures a region and returns it as a BGR numpy array for custom processing.

Parameters:

Name	Type	Description	Default
`bounds`	`QRect \| None`	The region to capture. If `None`, captures the whole screen.	`None`

Returns:

Type	Description
`ndarray`	A BGR numpy array for custom processing.
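
Because the array is in BGR order, target colors written as RGB must be reversed before comparison. The sketch below shows one kind of custom processing on such an array, using a synthetic frame in place of a real capture. The helper `count_matching_pixels` is hypothetical, and the sketch assumes an array of shape `(height, width, channels)` with 8-bit values.

```python
import numpy as np


def count_matching_pixels(frame_bgr, target_rgb, tolerance=10.0):
    """Count pixels within a Euclidean distance of target_rgb.

    Hypothetical helper: the frame is BGR, so the RGB target is reversed.
    """
    target_bgr = np.array(target_rgb[::-1], dtype=np.float64)
    # Use only the first three channels in case an alpha channel is present.
    diff = frame_bgr[..., :3].astype(np.float64) - target_bgr
    distances = np.sqrt((diff ** 2).sum(axis=-1))
    return int((distances <= tolerance).sum())


# Synthetic stand-in for a captured frame: 4x4 pixels, top half pure red.
frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[:2, :] = (0, 0, 255)  # red expressed in BGR order

red_pixels = count_matching_pixels(frame, (255, 0, 0))
blue_pixels = count_matching_pixels(frame, (0, 0, 255))
```

Forgetting the channel order is a common source of bugs: the top half of this frame is red, so a search for blue finds nothing even though the stored values are `(0, 0, 255)`.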