Track Overview
The FRAME Track is the single-image track of the ORena SAVE FOCUS Challenge. It evaluates whether a submitted algorithm can answer clinically relevant questions from one laparoscopic image, using only the image, associated metadata, and the provided question.
This page provides the technical track specification. Dataset composition, taxonomy details, resources, and general challenge background are described in the corresponding Overview and Data tabs.
FRAME Track
Description
The FRAME Track focuses on visual understanding at a single time point. Each task instance consists of a surgical RGB image and a natural-language question about foreign objects visible in the scene. The algorithm must return a short text answer.
The track targets capabilities that can be evaluated without temporal context, including:
- Object identification: semantic classification of visible object instances
- Object attributes and state: recognition of observable object properties or handling states
- Object spatial localization (camera): object localization relative to the image plane
- Object spatial localization (situs): object localization relative to visible anatomical structures
- Object aggregation: aggregation of information across visible foreign object instances and/or categories
Algorithm Docker Input
The algorithm receives two inputs: the image and the question. The question bundles the relevant metadata with the question text itself.
| Image | Surgical RGB image from a laparoscopic procedure. |
| Question | Natural-language VQA question including the relevant metadata, such as procedure name, time point in the video, expected output, and a list of the foreign object classes. |
The exact file structure and schema will follow the official submission template repository.
Algorithm Docker Output
| Answer | Short text answer to the provided question. |
The exact output format and validation rules will follow the official submission template repository.
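As a rough illustration of the input and output described above, the following Python sketch models one task instance and a function that returns a short text answer. The names `FrameTask`, `normalize_answer`, and `answer_question`, the `max_words` cap, and the placeholder prediction are all assumptions made for illustration. The official submission template repository defines the actual file structure, schema, and validation rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameTask:
    """One FRAME task instance (hypothetical container, not the official schema)."""
    image_path: str  # assumed: path to the surgical RGB image
    question: str    # question text, including the embedded metadata


def normalize_answer(raw: str, max_words: int = 10) -> str:
    """Collapse whitespace and cap the answer length.

    The 10-word cap is an illustrative assumption; the track only asks
    for a "short text answer".
    """
    words = raw.split()
    return " ".join(words[:max_words])


def answer_question(task: FrameTask) -> str:
    """Return a short text answer for a single image/question pair.

    A real submission would run a vision-language model on the image
    here; this stub returns a fixed placeholder string instead.
    """
    raw_prediction = "  unknown \n"  # placeholder for a model output
    return normalize_answer(raw_prediction)
```

Keeping the answer normalization separate from the model call makes it easier to adapt the output once the official format is published.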
Runtime Environment
| AWS Hardware | NVIDIA L40S Tensor Core GPU, 48 GB VRAM |
| Time Limit | 5 seconds per question |
| Execution | Docker container; no internet access during inference |
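To check locally whether an algorithm stays within the 5-second limit per question, a simple wall-clock wrapper can help. The sketch below is a minimal example, assuming a hypothetical `predict(image, question)` callable. It is not the official timing mechanism, and timings on local hardware may differ from those on the evaluation GPU.

```python
import time

TIME_LIMIT_S = 5.0  # per-question limit stated in the runtime environment


def timed_answer(predict, image, question, limit_s=TIME_LIMIT_S):
    """Run one prediction and report whether it stayed within the limit.

    `predict` is a hypothetical callable taking (image, question) and
    returning a text answer. Returns (answer, elapsed_seconds, within_limit).
    """
    start = time.perf_counter()
    answer = predict(image, question)
    elapsed = time.perf_counter() - start
    return answer, elapsed, elapsed <= limit_s
```

For example, `timed_answer(my_model, image, question)` returns the answer together with the measured latency, which can be logged across a validation set to spot slow questions before submission (`my_model` being whatever callable wraps the submission's inference code).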
Evaluation Scope
FRAME submissions are evaluated on single-frame question answering. The track does not require temporal grounding, duration estimation, temporal ordering, or retrieval-status reasoning across multiple video frames. Questions are restricted to information that can be inferred from the provided image and metadata.
Official Track Document
For the full formal specification, please consult the official FRAME Track document:
👉 FRAME Track PDF