# GES

## Handling seeks

![](https://i.imgur.com/dhP3fCn.png)

OK, I drew myself some diagrams for the different timescales. From bottom to top, the elements are:

* the timeline, which uses the coordinates (T),
* the input of the `nleoperation`, which uses the coordinates (A_i),
* the `nleoperation` itself, which contains sub-elements not shown,
* the output of the `nleoperation`, which uses the coordinates (A_o),
* the input of the `nlesource`, which uses the coordinates (B_i),
* the `nlesource`, which contains sub-elements that change the playback rate,
* the output after the rate changing sub-element, which uses the coordinates (B_o),
* the input of the decoder within the `nlesource`, which uses the coordinates (C_i).

Note that the different coordinates are non-linearly related.

The `nleoperation` has the properties:

* start_a = 6
* duration_a = 6
* inpoint_a = 5

The start and duration are in the timeline coordinates (T), while inpoint is a difference in the output coordinates (A_o).

The `nlesource` has the properties:

* start_b = ?
* duration_b = ?
* inpoint_b = 3

The problem here is that start and duration are to be given in the timeline coordinates (T). Before time 5 in coordinates (B_i), there is no way of translating to (T) coordinates because the nleoperation provides no mapping in this area (that's why I used dots, rather than solid lines, for any region not covered by the nleoperation).

Therefore, it seems that start and duration are only meaningful for the most downstream nleobject. For subsequent nleobjects, I think choosing a start and duration that simply match the previous nleobject makes sense, i.e. choosing

* start_b = start_a
* duration_b = duration_a

> [name=Thibault Saunier] What do you mean by "choosing"? nleobject start/duration are set by users (I think it is a wording/context issue rather than you making a reasoning error)

The reason inpoint_b is 3 is that this is the shift that is performed when translating from (B_o) to (C_i).
In some contexts, a negative inpoint may then be relevant. Note, in the above, I'm defining inpoint to be exactly this shift. I think other definitions would be too complicated.

If nleoperation was an effect, it would have a zero inpoint. Provided the time 0 is always mapped to 0 by the effect, and rate changes within nlesource similarly map 0 to 0, then inpoint_b would indeed refer to the media start time. However, we are allowing for the possibility of effects that do perform a shift in media, and therefore do not map 0 to 0. Effectively, we can think of inpoint as a basic form of such an effect.

But, back to the general form in the diagram. We can see that a seek position in (T) would need to be shifted by start_a to get into the (A_i) coordinates. Therefore, we want translate_incoming_seek for the nlesource to simply apply

```
new_pos = current_pos - start        eq.1
```

We do not want to apply any media_duration_factor!!

When this seek is passed to the sub-elements of nleoperation, it is transformed to

```
new_pos = f_a(current_pos)        eq.2
```

which puts the seek in the (A_o) coordinates. f_a is generally an unknown function.

Now, to translate this seek to the (B_i) coordinates, we need to shift by inpoint_a. However, the nlesource will not have direct access to inpoint_a, so translate_outgoing_seek will need to perform

```
new_pos = current_pos + inpoint        eq.3a
```

Now, really, for any nleobject that is not adjacent to the timeline (any upstream element), we would want translate_incoming_seek to do nothing to the seek. However, I'm not sure whether it is possible for an nleobject to know where it sits relative to the timeline. If we look at eq.1, we can see that we have already set the behaviour of translate_incoming_seek to subtract start_a.
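
To make the chain eq.1, eq.2, eq.3a concrete, here is a minimal Python sketch using the diagram's values start_a = 6 and inpoint_a = 5. The Python functions are hypothetical stand-ins named after the nle hooks, not the actual implementation, and f_a is assumed to be the identity purely for illustration (it is unknown in general). The seek position 8 and the choice start_b = start_a are likewise assumptions.

```python
def translate_incoming_seek(position, start):
    """eq.1: shift a seek position by the object's start."""
    return position - start


def translate_outgoing_seek(position, inpoint):
    """eq.3a: shift a seek position by the object's inpoint."""
    return position + inpoint


def f_a(position):
    """Stand-in for the unknown mapping of eq.2; identity is an assumption."""
    return position


START_A, INPOINT_A = 6, 5  # nleoperation properties from the diagram
START_B = START_A          # assumed start of the nlesource

seek_t = 8                                                # seek position in (T), chosen arbitrarily
seek_a_i = translate_incoming_seek(seek_t, START_A)       # position in (A_i)
seek_a_o = f_a(seek_a_i)                                  # position in (A_o)
seek_b_i = translate_outgoing_seek(seek_a_o, INPOINT_A)   # position in (B_i)

# If the upstream nlesource applies eq.1 as well, it shifts a position
# that was already in (B_i), so the result is off by START_B.
seek_after_nlesource = translate_incoming_seek(seek_b_i, START_B)
```

Under these assumptions the seek reaches (B_i) at 7, but a second application of eq.1 by the nlesource would move it to 1, which is the unwanted extra shift described above.
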
Therefore, if we make sure that start_a == start_b, then we could change translate_outgoing_seek to

```
new_pos = current_pos + inpoint + start        eq.3b
```

so that when the nlesource applies translate_incoming_seek using eq.1, it will reverse this + start operation. The current behaviour seems to rely on this trick.

Two problems with this trick:

* we would have to make sure that all start properties are exactly the same within a stack for this to work.
* if the element upstream from an nleobject is not another nleobject, then this additional shift will not be reversed, and the seek will be incorrect.

> But, I feel like the second case shouldn't really happen.

Now, for the nlesource things are a bit more difficult. Suppose we have received a seek, which has already been processed by the standard translate_incoming_seek for nleobjects, i.e. we have a seek already in the (B_i) coordinates. If the nlesource had no internal rate-changing effects, then we could translate the seek from the (B_i) coordinates to the (C_i) coordinates using

```
new_pos = current_pos + inpoint        eq.4
```

And this seek could be passed through the sub-elements to eventually reach the decoder. Two problems:

* eq.4 would have to be applied in addition to eq.1, so we would have to make translate_incoming_seek check if we are an nlesource instance, and then send the seek off to be further modified.
* if nlesource contains sub-elements that change seek positions (like videorate) then this won't work.

To expand on the second point, if the effect of the sub-elements on the seek position is some function f_b, then to translate from the (B_i) to the (C_i) coordinates, we would want:

```
new_position = f_b(current_position) + inpoint        eq.5
```

i.e., we need to apply f_b before adding the inpoint shift. But we are trying to apply the shift before passing it to the sub-elements!

I've thought of four options:

1. calculate eq.5 on receiving the seek on the nlesource.
   Pass the new seek position to the sub-elements, but add a seek flag that tells these elements to not change the seek positions.

2. if f_b can be reversed, calculate f_b^-1(f_b(current_position) + inpoint), and pass this seek position to the sub-elements (with no seek flags). The sub-elements will reverse the f_b^-1.

3. do not allow an nlesource to have such sub-elements, and ignore its inpoint value. I.e. an nlesource is basically just a decoder that exposes the raw data (like the otio MediaReference object). Instead, make a GESUriSource correspond to an nlesource-nleoperation pair. The nlesource would just hold a reference to the media file. The nleoperation would hold the rate and inpoint properties of the GESUriSource. Then, the nleoperation would apply the rate change, and afterwards would apply the inpoint shift using translate_outgoing_seek.

4. create a new special element whose only job is to shift seeks. Insert this new element between the seek-changing elements and the decoder. I.e., make the internal pipeline something like

   ```
   decoder ! shifter shift=inpoint ! videorate ! ...
   ```

   The shifter element would pass on all received data verbatim, except for seek events, to which it will apply a shift.

Note, if we are only concerned with fixed rate changes to a media source, then option 1. would correspond to calculating

```
new_position = rate * current_position + inpoint        eq.6
```

and option 2. would correspond to

```
new_position = current_position + inpoint / rate        eq.7
```
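
As a quick sanity check that eq.6 and eq.7 agree, here is a small Python sketch. It assumes a fixed-rate mapping f_b(position) = rate * position, which is what eq.6 implies. The function names, the rate of 2 and the seek position 4 are all hypothetical; only inpoint = 3 is taken from the diagram. Exact fractions are used to avoid floating point noise.

```python
from fractions import Fraction


def f_b(position, rate):
    """Assumed fixed-rate seek mapping applied by the sub-elements."""
    return rate * position


def option1_position(position, rate, inpoint):
    """eq.6: final (C_i) position, computed up front (sub-elements must not rescale it)."""
    return rate * position + inpoint


def option2_position(position, rate, inpoint):
    """eq.7: pre-compensated position, handed to sub-elements that still apply f_b."""
    return position + inpoint / rate


rate = Fraction(2)      # assumed playback rate
inpoint = Fraction(3)   # inpoint_b from the diagram
position = Fraction(4)  # arbitrary seek position in (B_i)

direct = option1_position(position, rate, inpoint)
via_sub_elements = f_b(option2_position(position, rate, inpoint), rate)

# Both routes should land on the same decoder position.
assert direct == via_sub_elements
```

With these values option 2 passes 11/2 to the sub-elements, and after the assumed rate mapping both options reach position 11 at the decoder.
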