Track Overview

The FRAME Track is the single-image track of the ORena SAVE FOCUS Challenge. It evaluates whether a submitted algorithm can answer clinically relevant questions from one laparoscopic image, using only the image, associated metadata, and the provided question.

This page provides the technical track specification. Dataset composition, taxonomy details, resources, and general challenge background are described in the corresponding Overview and Data tabs.

FRAME Track

Description

The FRAME Track focuses on visual understanding at a single time point. Each task instance consists of a surgical RGB image and a natural-language question about foreign objects visible in the scene. The algorithm must return a short text answer.

The track targets capabilities that can be evaluated without temporal context, including:

Object identification: semantic classification of visible object instances
Object attributes and state: recognition of observable object properties or handling states
Object spatial localization (camera): object localization relative to the image plane
Object spatial localization (situs): object localization relative to visible anatomical structures
Object aggregation: aggregation of information across visible foreign object instances and/or categories

Algorithm Docker Input

The algorithm input consists of an image and a question. The question bundles the relevant metadata with the question text.

Image	Surgical RGB image from a laparoscopic procedure.
Question	Natural-language VQA question including the relevant metadata, such as procedure name, time point in the video, expected output, and a list of the foreign object classes.

The exact file structure and schema will follow the official submission template repository.

Algorithm Docker Output

Answer

Short text answer to the provided question.

The exact output format and validation rules will follow the official submission template repository.
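The input and output described above can be pictured as a simple function from (image, question) to a short string. The sketch below only illustrates that contract. The names FrameTask, run_task, and serialize_answer, the "answer" JSON key, and the example file name are hypothetical and are not taken from the submission template repository, which remains the authoritative source for the file structure and schema.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class FrameTask:
    """One FRAME task instance (hypothetical container, not the official schema)."""
    image_path: Path  # surgical RGB image from a laparoscopic procedure
    question: str     # question text, including the relevant metadata


def run_task(task: FrameTask, model: Callable[[Path, str], str]) -> str:
    """Call the model once and return a short, single-line text answer."""
    raw = model(task.image_path, task.question)
    # Collapse all whitespace runs so the answer is one trimmed line.
    return " ".join(str(raw).split())


def serialize_answer(answer: str) -> str:
    """Assumed JSON wrapping; the real output format is defined by the template."""
    return json.dumps({"answer": answer})


def dummy_model(image_path: Path, question: str) -> str:
    """Stand-in for a real VQA model; ignores its inputs."""
    return "  surgical\n clip "


task = FrameTask(Path("frame.png"), "Which foreign object is visible?")
print(serialize_answer(run_task(task, dummy_model)))  # {"answer": "surgical clip"}
```

Keeping the model behind a plain callable means the same wrapper can be reused unchanged once the official input and output schema is known.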

Runtime Environment

AWS Hardware
NVIDIA L40S Tensor Core GPU
48 GB VRAM

Time Limit
5 seconds per question

Execution
Docker container
No internet access during inference
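Because the limit applies per question, it can help to guard each inference call with a time budget during local testing. The following is a minimal sketch, not part of the official template. The safety margin, the fallback answer "unknown", and the function name answer_within_budget are assumptions made for illustration. Note that a Python thread cannot be stopped forcibly, so a timed-out prediction keeps running in the background; the wrapper only ensures the caller gets an answer in time.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

TIME_LIMIT_S = 5.0           # per-question limit stated in the track specification
SAFETY_MARGIN_S = 0.5        # assumed headroom for I/O and container overhead
FALLBACK_ANSWER = "unknown"  # assumed placeholder, not an official answer value


def answer_within_budget(predict, image, question,
                         budget_s=TIME_LIMIT_S - SAFETY_MARGIN_S):
    """Run predict(image, question); return (answer, elapsed_seconds).

    If the prediction does not finish within budget_s, the fallback
    answer is returned instead of waiting for the result.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    start = time.perf_counter()
    future = executor.submit(predict, image, question)
    try:
        answer = future.result(timeout=budget_s)
    except FutureTimeout:
        answer = FALLBACK_ANSWER
    finally:
        # Do not block on a worker that is still running.
        executor.shutdown(wait=False)
    return answer, time.perf_counter() - start


def fast_model(image, question):
    """Stand-in model that answers immediately."""
    return "grasper"


answer, elapsed = answer_within_budget(fast_model, None, "Which tool is visible?")
print(answer, f"{elapsed:.3f}s")
```

Logging the elapsed time per question on hardware comparable to the target GPU gives an early signal of whether a model fits the 5-second limit before submission.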

Evaluation Scope

FRAME submissions are evaluated on single-frame question answering. The track does not require temporal grounding, duration estimation, temporal ordering, or retrieval-status reasoning across multiple video frames. Questions are restricted to information that can be inferred from the provided image and metadata.

Official Track Document

For the full formal specification, please consult the official FRAME Track document:

👉 FRAME Track PDF