Track Overview

The FRAME Track is the single-image track of the ORena SAVE FOCUS Challenge. It evaluates whether a submitted algorithm can answer clinically relevant questions from one laparoscopic image, using only the image, associated metadata, and the provided question.

This page provides the technical track specification. Dataset composition, taxonomy details, resources, and general challenge background are described in the corresponding Overview and Data tabs.


FRAME Track

Description

The FRAME Track focuses on visual understanding at a single time point. Each task instance consists of a surgical RGB image and a natural-language question about foreign objects visible in the scene. The algorithm must return a short text answer.

The track targets capabilities that can be evaluated without temporal context, including:

  • Object identification: semantic classification of visible object instances
  • Object attributes and state: recognition of observable object properties or handling states
  • Object spatial localization (camera): object localization relative to the image plane
  • Object spatial localization (situs): object localization relative to visible anatomical structures
  • Object aggregation: aggregation of information across visible foreign object instances and/or categories

    Algorithm Docker Input

    The algorithm receives two inputs: the image and the question. The question bundles the relevant metadata together with the question text itself.

    Image: Surgical RGB image from a laparoscopic procedure.
    Question: Natural-language VQA question including the relevant metadata, such as the procedure name, the time point in the video, the expected output, and a list of the foreign object classes.

    The exact file structure and schema will follow the official submission template repository.


    Algorithm Docker Output

    Answer: Short text answer to the provided question.

    The exact output format and validation rules will follow the official submission template repository.
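The following Python sketch makes the input and output contract concrete by modelling one task instance and normalizing a raw model output into a short, single-line answer. It is a hypothetical illustration only: the class `FrameQuestion`, its field names, and the helper `normalize_answer` are assumptions and are not part of the official submission template, which defines the real file structure and schema.

```python
from dataclasses import dataclass, field


@dataclass
class FrameQuestion:
    """Hypothetical container for one FRAME task instance.

    Field names are illustrative assumptions; the official submission
    template defines the actual schema.
    """
    image_path: str                 # path to the surgical RGB image
    question_text: str              # natural-language VQA question
    procedure: str = ""             # metadata: procedure name
    time_point: str = ""            # metadata: time point in the video
    expected_output: str = ""       # metadata: expected output
    object_classes: list[str] = field(default_factory=list)  # foreign object classes


def normalize_answer(raw: str) -> str:
    """Collapse all whitespace (including newlines) into single spaces."""
    return " ".join(raw.split())
```

Keeping answers on a single line with collapsed whitespace is one simple way to stay close to a "short text answer"; the actual validation rules are those of the official template.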


    Runtime Environment

    AWS Hardware: NVIDIA L40S Tensor Core GPU, 48 GB VRAM
    Time Limit: 5 seconds per question
    Execution: Docker container, no internet access during inference
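Because the time limit applies per question, an algorithm may want to guard against occasional slow inference. The Python sketch below assumes that the limit covers the full call for a single question. It runs a hypothetical `predict` callable in a worker thread and returns a fallback answer if the call exceeds a budget set slightly below 5 seconds. The names `answer_within_budget` and `SAFETY_MARGIN_S`, and the fallback string `"unknown"`, are illustrative assumptions; how the official evaluation measures time is defined by the organizers.

```python
import concurrent.futures
import time
from typing import Callable

TIME_LIMIT_S = 5.0     # per-question limit stated for the FRAME Track
SAFETY_MARGIN_S = 0.5  # hypothetical headroom for I/O and container overhead


def answer_within_budget(
    predict: Callable[[str], str],
    question: str,
    executor: concurrent.futures.Executor,
    budget_s: float = TIME_LIMIT_S - SAFETY_MARGIN_S,
    fallback: str = "unknown",
) -> tuple[str, float]:
    """Run predict(question) and return (answer, elapsed seconds).

    If predict does not finish within budget_s, the fallback answer is
    returned instead. The worker thread cannot be forcibly stopped, so a
    timed-out call keeps running in the background and occupies a worker.
    """
    start = time.perf_counter()
    future = executor.submit(predict, question)
    try:
        answer = future.result(timeout=budget_s)
    except concurrent.futures.TimeoutError:
        answer = fallback
    return answer, time.perf_counter() - start
```

In use, one `concurrent.futures.ThreadPoolExecutor` with a spare worker could be created once at container start-up and passed to every call, so that a single stuck call does not immediately block the next question. A fallback answer is only a safety net; keeping the model itself well within the limit remains the primary goal.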

    Evaluation Scope

    FRAME submissions are evaluated on single-frame question answering. The track does not require temporal grounding, duration estimation, temporal ordering, or retrieval-status reasoning across multiple video frames. Questions are restricted to information that can be inferred from the provided image and metadata.


    Official Track Document

    For the full formal specification, please consult the official FRAME Track document:

    👉 FRAME Track PDF