<style type="text/css"> body:after { content: url(' '); position: fixed; bottom: 1.0em; left: 1.0em; } </style>

# ONVIF and GStreamer

Note: Hello everyone. My name is Mathieu Duponchelle and I'm an engineer at Centricular. The subject of this talk is ONVIF support in GStreamer, and some work I did this year to enable it.

---

### ONVIF

* Open Network Video Interface Forum
* Open standard
* Originally a collaboration between Axis, Bosch and Sony

Note: ONVIF stands for Open Network Video Interface Forum, it is an open standard that was initially developed as a collaboration between Axis, Bosch and Sony. Further members have joined since then, the latest figures from 2016 state that 461 members are now registered, with more than 6900 conformant products.

---

### Use cases

* Video surveillance
* Indexing and serving archive footage

Note: The main use cases covered by the standard are Video Surveillance, and indexing and serving archive footage. For example, it can allow interoperability of an application to review surveillance footage with the hardware that captured and stored said footage, if the hardware vendor has made their hardware ONVIF-conformant.

---

### Specification

* Actually, specification**s**
* My scope was the Streaming specification
* Other specifications:
  - Core specification: networking, device discovery, capabilities etc
  - More targeted specifications, eg. "Access Control", "Credential"

Note: The ONVIF standard consists of a set of specifications, my scope for this work was the Playback section of the Streaming specification. The other main aspect of this specification covers the implementation of transport over WebSocket, which can enable direct playback in browsers.
Other specifications include the Core specification, which addresses networking considerations, how devices should advertise themselves, how they can be discovered, their capabilities, what footage is available for what absolute times, and many other aspects I am not overly familiar with. On top of this specification, there are many other more targeted specifications, covering subjects such as Access Control, Credentials, and the subject of my talk today, the Streaming specification.

---

### Conformance disclaimer

* GStreamer is *not* an official ONVIF-conformant product
* This requires passing a members-only test suite
* Profile-based conformance

Note: A quick disclaimer before we get further into this: I am not claiming that GStreamer is an ONVIF-conformant product. Conformance is validated by a test tool, which is only available to official members. In addition, conformance is per profile: there are multiple profiles for different applications and use cases, and conformance to each profile requires implementing and supporting different ONVIF specifications. However, the work I did can make it easier for application writers and / or hardware vendors to use GStreamer to develop ONVIF-conformant products, comparable for instance to the work Sebastian did to support ONVIF backchannels in gst-rtsp-server. I am of course interested in hearing about such products, and can potentially offer assistance with passing conformance tests.

---

### GStreamer components

* gst-rtsp-server
* rtspsrc (-good)

Note: With this out of the way, I will now discuss the improvements I made as part of this work. I focused mainly on gst-rtsp-server, and on rtspsrc for the client side. In addition, I implemented some new API in GStreamer core, and updated a few plugins to support it.
---

### Server side

* ONVIF streaming is built on top of RTSP
* Some features were missing (Scale / Speed)
* New RTSP headers

Note: The ONVIF streaming specification is built on top of RTSP, and depends on a sufficiently featured RTSP server implementation. Luckily gst-rtsp-server already implemented most of the mandatory features, with two notable exceptions, the support of the Scale and Speed headers in PLAY requests. On top of that, ONVIF specifies some new headers in various standard requests and responses, along with the expected behaviour when these headers are present.

---

### Server side - Speed

* Controls the transfer rate of the server
* Must be a positive number
* With Speed: 2.0:
  - Transported bitrate == 2 * nominal bitrate
  - RTP timestamps are scaled accordingly

Note: The Speed header, as specified by the RTSP RFC, controls the transfer rate delivered by the server. In effect, with a Speed of 2.0, the client will receive the same payload, at twice the delivery rate. The RTP timestamps will however be scaled accordingly: where the client would have received packets with timestamps 0, 100 and 200, it will receive the same packets with timestamps 0, 50 and 100. The Speed cannot be negative, and another mechanism is required to trigger reverse playback.

---

### Server side - Speed in GStreamer terms

* gst-rtsp-server sends a Seek { rate: 2.0 }
* Sources send a Segment { rate: 2.0 }
* Sources then send buffers timestamped as usual
* rtpbasepayload takes care of scaling RTP timestamps

Note: The implementation of the support for the Speed header in gst-rtsp-server relies on an existing GStreamer mechanism: a seek event is sent upstream with a matching rate, and handling elements such as sources and demuxers reply with a segment with a corresponding rate.
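As a rough illustration of the timestamp scaling that a Speed header implies, here is a minimal sketch in plain C. It is not the payloader code: `sketch_scale_rtp_offset` is a hypothetical helper, and it only reproduces the arithmetic of the 0, 100, 200 example for offsets relative to the first packet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a GStreamer API: scales an RTP timestamp offset
 * (relative to the first packet of the stream) by the requested Speed. */
static uint32_t
sketch_scale_rtp_offset (uint32_t nominal_offset, double speed)
{
  /* The Speed header must be a positive number */
  assert (speed > 0.0);

  /* Divide by the speed and round to the nearest tick */
  return (uint32_t) (uint64_t) ((double) nominal_offset / speed + 0.5);
}
```

With a speed of 2.0, the offsets 0, 100 and 200 map to 0, 50 and 100, as in the example given for the Speed header.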
From the perspective of the sources, nothing more is needed: they only have to send buffers with the same timestamps and durations as they would at normal speed, and the base GStreamer payloader will take care of scaling the RTP timestamps according to the rate of the segment it received.

---

### Server side - Scale

* Controls the timescale of the server
* Can be negative, in which case playback is reversed
* With Scale: 2.0:
  - Transported bitrate == nominal bitrate
  - The server alters the timeline of the transmitted media

Note: As I said before, the direction of data transmission cannot be controlled with the Speed header, and RTSP instead exposes this through another header, Scale. Scale can be either a negative or a positive number, and can be used in combination with Speed. When Scale has an absolute value of 1.0, it simply controls the direction of playback. However it can also hold other values, in which case its behaviour differs from that of Speed: the server is expected to keep delivering data at the nominal rate, but to alter the transmitted media in such a way that its timeline matches the required value. To put it simply, if a Scale of 2.0 is requested, the server is expected to either drop every other sample from the stream, or reencode the original media to achieve a similar effect.

---

### Server side - Scale in GStreamer terms

* gst-rtsp-server sends a Seek { rate: 2.0, flags: TRICKMODE }
* Sources send back a Segment { rate: 1.0, applied_rate: 2.0, flags: TRICKMODE }
* Sources then usually have to reencode, dropping every other frame before the encoder

Note: In GStreamer terms, this corresponds to a TRICKMODE seek: the server sends a seek upstream with a rate of 2.0 and the TRICKMODE flag set, and the handling elements reply with a Segment with a rate of 1.0, an applied_rate of 2.0, and the TRICKMODE flag set as well.
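The difference between the two kinds of seek can be sketched with simplified stand-in types. None of the names below are GStreamer API: `SketchSeek`, `SketchSegment` and `sketch_answer_seek` are hypothetical, and only forward playback is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the seek and segment events, not the real types */
typedef struct { double rate; bool trickmode; } SketchSeek;
typedef struct { double rate; double applied_rate; bool trickmode; } SketchSegment;

/* Models how a source could answer a seek with a segment */
static SketchSegment
sketch_answer_seek (SketchSeek seek)
{
  SketchSegment segment;

  /* Assumption of this sketch: reverse playback is left out */
  assert (seek.rate > 0.0);

  if (seek.trickmode) {
    /* Scale: the source alters the media itself, the rate is already applied */
    segment.rate = 1.0;
    segment.applied_rate = seek.rate;
  } else {
    /* Speed: buffers are left untouched, downstream has to honour the rate */
    segment.rate = seek.rate;
    segment.applied_rate = 1.0;
  }
  segment.trickmode = seek.trickmode;

  return segment;
}
```

A TRICKMODE seek with a rate of 2.0 yields a segment with a rate of 1.0 and an applied_rate of 2.0, while the same seek without the flag yields a segment with a rate of 2.0.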
If the original media consists of a video stream, with variable sizes for each frame, different types of frames etc, the required effect cannot be achieved by simply dropping every other frame and retimestamping the rest. Instead, the server should reencode the video stream, dropping every other decoded frame and setting a target bitrate on the encoder.

---

### Server side - Reverse playback behaviour

* Format without GOP: strict reverse order
* With GOP:
  - Start with the last GOP
  - Transmit it in forward order
  - Continue with the second to last GOP
  - rinse / repeat
* Fortunately that's the GStreamer behaviour

Note: When Scale is used to trigger reverse playback, for instance with a value of -1.0, the server cannot simply send packets in reverse order. While this is valid for simple formats, the expected behaviour is slightly more complex for video formats with Groups of Pictures, such as H264 with its I / P / B frames: the server should start with the last Group of Pictures, send it in forward order, then send the second to last Group of Pictures, and so on. Fortunately, that's also the behaviour in GStreamer: when a demuxer receives a seek with a negative rate, it sends buffers out in a similar fashion. The decoder then accumulates buffers until it has received a full GOP, and performs reversing at the GOP level.

---

### Server side - Combining Scale and Speed

* Valid use case
* Only ABS(Scale) == 1.0 && Speed != 1.0 is supported
* New field probably needed in seek and segment events

Note: As you may have noticed, while combining Scale and Speed in PLAY requests is a valid use case, it is not possible to express the combination of a Speed different from 1.0 and a Scale with an absolute value different from 1.0, as in GStreamer both features use the same rate field in seek events, albeit with different flags set.
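The restriction can be written down as a small predicate. This is only a sketch of the rule stated above, not the check performed by gst-rtsp-server, and `sketch_combination_is_expressible` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when a Scale / Speed pair fits in the single rate field of a
 * seek event, following the rule stated in the talk. */
static bool
sketch_combination_is_expressible (double scale, double speed)
{
  /* The Speed header must be a positive number */
  assert (speed > 0.0);

  /* Either only Scale is in effect, or Scale merely selects the direction */
  return speed == 1.0 || scale == 1.0 || scale == -1.0;
}
```

For instance, Scale: -1.0 with Speed: 2.0 is expressible, while Scale: 2.0 with Speed: 2.0 is not.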
As a consequence, for now only a specific subset of combinations is supported by gst-rtsp-server, and a new mechanism will need to be designed to fully support combining both headers.

---

### Server side - feature check

* `Require: onvif-replay` header
* `gst_rtsp_media_factory_set_media_gtype (factory, GST_TYPE_RTSP_ONVIF_MEDIA);`

Note: Let's now get into the ONVIF additions to RTSP. The first of those is a simple feature check, for clients to query ONVIF replay support from the server. Clients can add a Require: onvif-replay header in their SETUP requests, and if the GstRTSPMediaFactory was set up to serve ONVIF media, it will reply positively to the feature check.

---

### Server side - track identifiers

* `x-onvif-track: XXX` in `DESCRIBE` responses
* `gst_rtsp_media_factory_set_media_gtype (factory, GST_TYPE_RTSP_ONVIF_MEDIA);`

Note: Another "informative" addition is the labeling of tracks in DESCRIBE responses: here again the user of the API simply needs to set up the media factory appropriately, and track identifiers will be added to DESCRIBE responses automatically.

---

### Server side - RTP header extension

* Carries an NTP timestamp and various flags
* NTP timestamp corresponds to the original capture time
* `rtponviftimestamp` element, has to be added by the user

Note: ONVIF also mandates that the server MUST add a custom extension to the headers of the RTP packets it sends out. This header extension contains an NTP timestamp corresponding to the original capture time, and various one-bit flags carrying additional information about the packet.

The first packet of a "synchronization point" (a keyframe) must have its "C" bit set. This is inferred from the DELTA_UNIT buffer flag, which is unset on keyframes.

The last packet of a contiguous section of recording must have the "E" bit set. This is inferred from the confusingly-named "discont" field of a custom event parsed by rtponviftimestamp. It is the responsibility of the application to send this event.
The "D" bit must be set on the first packet of a continuous set of packets. This is inferred from the DISCONT flag, which is expected to be set at the beginning of each GOP in reverse playback.

The "T" bit must be set when no more data is available, and is inferred from the EOS event.

An element already existed in GStreamer for this purpose, "rtponviftimestamp". Some of the specified flags were missing, so I updated it to cover all of them. For now, the user is expected to add the element manually when constructing the GstRTSPMedia. In the future we will probably want gst-rtsp-server to do so automatically. An equivalent element, "rtponvifparse", must be used on the client side, as we'll see later.

---

### Server side - Rate-Control

* TCP transport mandatory
* `Rate-Control=no`: data is transferred as fast as possible
* Transfer is paced only by the network / client
* Meant as a file transfer mechanism, more on that later
* `gst_rtsp_media_factory_set_media_gtype (factory, GST_TYPE_RTSP_ONVIF_MEDIA);`

Note: Let's now discuss the main attraction, the ONVIF-specific RTSP trick modes. The first of those is the Rate-Control header. ONVIF mandates the use of TCP as the transport protocol, and thanks to that data can be delivered "as fast as possible", with the transfer rate only paced by the network and the consumption rate of the client. The intended use case for this feature is file transfer, but it can also have other applications. Here again, the user only has to set the appropriate media type on the media factory, and the feature will be automatically available. On the implementation side, it is simply a matter of setting sync to false on the sinks of the pipeline, with some minor details regarding the RTP timestamps. All this is taken care of automatically.
---

### Server side - Frames trick modes

* Controls what types of video frames are transmitted
* Optionally controls the minimum interval
* Typical use case is fast forward / fast rewind

Note: Another ONVIF-specific header is the new "Frames" header. It can be used to control what types of frames the server will send out, with an optional interval, and is useful for limiting the data rate without reencoding. When doing very fast forward, for example 16x, using this will spare bandwidth, as well as CPU resources on the client side.

---

### Server side - Frames trick modes

* `Frames: intra`: Transmit only keyframes (I)
* `Frames: predicted`: Transmit only keyframes and forward predicted frames (I and P)
* `Frames: intra/<interval>`: Transmit only keyframes, with a minimum interval

Note: A few different values are available: using Frames: intra will cause the server to only send keyframes, while predicted will cause it to also send forward predicted frames, omitting bidirectional frames (B-frames) if there were any. An optional interval can be specified when intra is used as the value. It can be used to deal with so-called "I-frame storms", where a stream consists mostly or entirely of keyframes, which would otherwise make the Frames trick mode ineffective.

---

### Server side - Frames trick modes (in GStreamer terms)

* `Frames: intra`: `SEEK_FLAG_TRICKMODE_KEY_UNITS`
* `Frames: predicted`: `SEEK_FLAG_TRICKMODE_FORWARD_PREDICTED` (new)
* `<interval>`: `gst_event_set_seek_trickmode_interval()` (new)

Note: In GStreamer terms, intra is translated to the TRICKMODE_KEY_UNITS seek flag, which already exists and is handled by several demuxers. We added a new flag for predicted, "TRICKMODE_FORWARD_PREDICTED", and new API for the seek event to set and get the trickmode interval. Currently, only qtdemux supports requesting an interval, and the FORWARD_PREDICTED trickmode is only supported by h264parse, which should be added by the user in the media pipeline.
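The effect of the different Frames values on a stream can be sketched as a simple filter. This is an illustration only, not what qtdemux or h264parse do internally: all `Sketch*` names are hypothetical, timestamps are assumed to be increasing, and the interval is assumed to be expressed in the same unit as the timestamps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SKETCH_FRAME_I, SKETCH_FRAME_P, SKETCH_FRAME_B } SketchFrameType;
typedef enum { SKETCH_FRAMES_ALL, SKETCH_FRAMES_INTRA, SKETCH_FRAMES_PREDICTED } SketchFramesMode;
typedef struct { SketchFrameType type; uint64_t timestamp; } SketchFrame;

/* Copies the frames selected by the trick mode into out, which must have
 * room for n entries, and returns how many were kept. The interval is only
 * used in intra mode, 0 disables it. */
static size_t
sketch_filter_frames (const SketchFrame *in, size_t n, SketchFramesMode mode,
    uint64_t interval, SketchFrame *out)
{
  size_t n_out = 0;
  bool have_last = false;
  uint64_t last_ts = 0;

  assert (in != NULL && out != NULL);

  for (size_t i = 0; i < n; i++) {
    bool keep = true;

    if (mode == SKETCH_FRAMES_INTRA) {
      /* Keyframes only, at least interval apart from the last one kept */
      keep = in[i].type == SKETCH_FRAME_I;
      if (keep && have_last && in[i].timestamp - last_ts < interval)
        keep = false;
      if (keep) {
        have_last = true;
        last_ts = in[i].timestamp;
      }
    } else if (mode == SKETCH_FRAMES_PREDICTED) {
      /* Keyframes and forward predicted frames, B-frames are omitted */
      keep = in[i].type != SKETCH_FRAME_B;
    }

    if (keep)
      out[n_out++] = in[i];
  }

  return n_out;
}
```

The interval check is what keeps an "I-frame storm" under control: keyframes that follow the previously transmitted one too closely are skipped.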
Unlike the KEY_UNITS trickmode, we could not add support for this directly in our target demuxer, qtdemux. In MP4s, keyframes are marked as such, but predicted frames are not distinguishable from other frames without parsing the codec bitstream.

---

### Client side

* `rtspsrc`, `rtponvifparse`
* Different approaches depending on the needed feature level

Note: Let's talk about the client side now: I will go over the relevant elements, then illustrate with a simple demonstration. The standard RTSP client element in GStreamer is rtspsrc, and only a few additions were necessary to make it ONVIF compatible:

* The onvif-mode property was added. When set to TRUE, seek positions are interpreted as nanoseconds since prime epoch (January the first 1900), TCP is selected as the transport protocol, and trickmode flags are transformed into the appropriate headers.
* The onvif-rate-control property was added to control the setting of the Rate-Control header.
* Another property was added, "is-live". Setting it to FALSE makes the source preroll, and also means the client side pipeline can be PAUSED without triggering a PAUSE request.

---

### Client side demo: foreword

* Server example emulates a footage archive
* It has been recording since prime epoch (01-01-1900)
* Same footage over and over, with regular blank intervals

Note: The demonstration I will run now is made up of two executables, a server and a client. The server side emulates a footage archive. This synthetic archive has been recording the same footage over and over again since prime epoch, with evenly-sized blank intervals in between. For example, given a one minute clip, the server will send one minute of footage for the first minute of the last century, then nothing for 5 seconds, then the same minute of footage again.

---

### Demo

Note: Now is probably the right time to tell you a dirty little secret: the demo you just saw is always using Rate-Control=no. Remember the slide earlier where I mentioned that the intended use case for this was file transfer?
Well it turns out that it can also give a smart enough client complete control over the pace of transfer, and that's actually what we used to implement reverse playback satisfactorily. A major issue when reversing playback is deciding how much data should be sent as fast as possible for a full GOP to be decoded at the client side. This is further compounded by the obvious fact that GOPs aren't necessarily always the same size, and trying to be clever at the server side by looking at the actual media stream seemed like a clear road to catastrophe. With Rate-Control=no, all these problems go away, and the decoder can perform its usual job for reversing a stream, acting as a dynamically sized buffer.

* Mention the fact that rtspsrc doesn't act as a live source in our case
* Mention the workaround for transmitting buffers outside the Range in reverse

---

### Instant rate change

* Patches exist to support client-side only instant rate changes
* Request needed only for changing direction

---

### Trying this out

* Examples in `gst-rtsp-server`:
  - `examples/test-onvif-server`
  - `examples/test-onvif-client`
* Tests in `gst-rtsp-server`: `tests/check/gst/onvif.c`

---

### Thanks

* Axis

---

# Questions