ISO/IEC 14496-11:2005 specifies the coded representation of interactive audio-visual scenes and applications.
It specifies the following tools:
- the coded representation of the spatio-temporal positioning of audio-visual objects as well as their behaviour in response to interaction (scene description);
- the coded representation of synthetic two-dimensional (2D) or three-dimensional (3D) objects that can be manifested audibly and/or visually;
- the Extensible MPEG-4 Textual (XMT) format, a textual representation of the multimedia content described in ISO/IEC 14496 using the Extensible Markup Language (XML); and
- a system-level description of an application engine (format, delivery, lifecycle, and behaviour of downloadable Java byte code applications).