# Dataset Size Report
### Deep Signatures for Indexing and Retrieval in Large Motion Databases
[Paper Link](https://dl.acm.org/doi/10.1145/2822013.2822024)
This is actually quite an interesting paper: they propose a motion compression and indexing method specifically for data-driven motion research. However, their method is quite different from ours.
They use two datasets:
1. the diversified CMU Motion Database
2. a medium-sized subset of the CMU Motion Database with manual segmentation and annotations
They also cut the dataset into many motion segments.
**Each motion segment is about 50 frames (subsampled from roughly 100-frame automatic segments). Manually extracted segments range from 40 frames to several hundred frames, depending on the motion content.**
Refer to Sections 4.1, 5.2.1, and 5.2.2.
>Motion segments are the fundamental units in our motion database.
Original, captured motion clips can be segmented either manually
or automatically. It is up to the users how they want to manually
segment the motion clips. For large database like the CMU Motion
Database with thousands of motion clips, it is less feasible to perform manual segmentation. We provide a very basic velocity-based
automatic segmentation method.
>We first compute the weighted velocity of joint rotations per
frame. Torso joint rotations are given higher weights than end-effector joint rotations like the wrists and fingers. If the weighted
velocity of frame i is a local minima, and below a velocity
threshold, frame i is then detected as a segmentation point. We
use this segmentation method to process large motion databases.
Typically, an automatically segmented motion segment is about
one hundred frames long and is subsampled to 50 frames. We
discard short segments which are less than 20 frames long.
>The duration of the manually extracted segments varies, from 40
frames to hundreds of frames, depending on the motion content.
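
A minimal sketch of this velocity-based segmentation, assuming joint rotations arrive as a (frames × joints × 3) array; the joint weights, threshold value, and function names are illustrative and not taken from the paper:
```python
import numpy as np

def segment_by_velocity(joint_rotations, joint_weights, velocity_threshold=0.05,
                        min_length=20, target_length=50):
    """Velocity-based segmentation sketch (parameters are illustrative).

    joint_rotations: (num_frames, num_joints, 3) array of joint rotations.
    joint_weights:   (num_joints,) weights, e.g. larger for torso joints than
                     for end-effectors such as wrists and fingers.
    """
    # Per-frame weighted rotational velocity (finite differences between frames).
    velocities = np.linalg.norm(np.diff(joint_rotations, axis=0), axis=2)  # (F-1, J)
    weighted = velocities @ joint_weights                                  # (F-1,)

    # A frame is a segmentation point if its weighted velocity is a local
    # minimum and lies below the threshold.
    cut_points = [0]
    for i in range(1, len(weighted) - 1):
        if (weighted[i] < weighted[i - 1] and weighted[i] < weighted[i + 1]
                and weighted[i] < velocity_threshold):
            cut_points.append(i)
    cut_points.append(len(joint_rotations))

    # Build segments, discard short ones, and subsample each to a fixed length.
    segments = []
    for start, end in zip(cut_points[:-1], cut_points[1:]):
        if end - start < min_length:
            continue  # discard segments shorter than 20 frames
        idx = np.linspace(start, end - 1, target_length).astype(int)
        segments.append(joint_rotations[idx])
    return segments
```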
-----
### Motion Graphs
[Paper Link](https://research.cs.wisc.edu/graphics/Papers/Gleicher/Mocap/mograph.pdf)
They tested their method on multiple datasets and styles of motion. They didn't mention the exact size in megabytes; instead, the number of frames is given.
3 datasets:
1. Walking motion: 3,000 frames (100 seconds)
2. Martial arts: 3,000 frames (100 seconds)
3. Walk + sneak + martial arts: 6,000 frames (200 seconds)
They did mention that they use branch and bound to make the graph-walk search tractable.
**So I would think the motion segment length is 25-120 frames.**
Refer to Section 4.2
>While branch and bound reduces the number of graph walks we
have to test against f , it does not change the fact that the search
process is inherently exponential — it merely lowers the effective
branching factor. For this reason we generate a graph walk incrementally. At each step we use branch and bound to find an optimal
graph walk of n frames. We retain the first m frames of this graph
walk and use the final retained node as a starting point for another
search. This process continues until a complete graph walk is generated. In our implementation we used values of n from 80 to 120
frames (2⅔ to 4 seconds) and m from 25 to 30 frames (about one
second).
------
### Compression of Motion Capture Databases
[Paper Link](https://dl.acm.org/doi/10.1145/1141911.1141971)
They applied their compression method to large datasets combining multiple kinds of locomotion:
1. Database 1 (standing, walking, running, skipping, etc.): 180 MB (1.5 hours)
2. Database 2 (the same CMU dataset we use): 1085 MB (6.5 hours)
During compression, they actually split the motion database into clips of *k* subsequent frames. More details in Section 4.5
**Conclusion: I think they apply their compression algorithm to clips of 16-32 frames each.**
>For compression, we split the motion database into clips of k
subsequent frames. For example, the first k frames is the first clip
and the next k subsequent frames is the second. The clip size (k)
is a compression parameter that affects the compression. We will
discuss compression parameters in Section 4.5
>4.5 *k*: the number of frames in a clip. The bigger this number is,
the smoother the reconstructed signal will be. If k is too small,
then the compression will not be able to take full advantage of
the temporal coherence. If it is too big, the correlation between joints will not be linear and CPCA will perform poorly.
Optimal numbers we found are 16-32 frames (130 - 270 milliseconds).
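
A minimal sketch of the clip-splitting step, with plain PCA standing in for the paper's clustered PCA (CPCA); the array shapes, function names, and component count are assumptions:
```python
import numpy as np

def split_into_clips(frames, k):
    """Split a (num_frames, dof) motion matrix into clips of k subsequent
    frames; a trailing remainder shorter than k is simply dropped here."""
    num_clips = len(frames) // k
    return frames[:num_clips * k].reshape(num_clips, k * frames.shape[1])

def pca_compress(clips, num_components):
    """Compress clip vectors with plain PCA (only a stand-in for CPCA,
    to show how the clip size k enters the pipeline)."""
    mean = clips.mean(axis=0)
    centered = clips - mean
    # Principal directions from the SVD of the centered clip matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:num_components]      # (num_components, k * dof)
    coeffs = centered @ basis.T      # low-dimensional representation per clip
    return mean, basis, coeffs

def pca_reconstruct(mean, basis, coeffs):
    return coeffs @ basis + mean

# Illustrative usage; k = 16-32 frames is the range the paper found optimal.
# frames = np.load("motion.npy")    # (num_frames, dof), hypothetical file
# clips = split_into_clips(frames, k=32)
# mean, basis, coeffs = pca_compress(clips, num_components=20)
```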
-----
### Automated Extraction and Parameterization of Motions in Large Data Sets
[Paper Link](https://dl.acm.org/doi/abs/10.1145/1015706.1015760)
This paper proposes a motion clustering algorithm for finding similar motions.
Dataset: 37,000 frames (a little over 10 minutes at 60 Hz)
This data was then divided into thirty files ranging in length from 3 s to 75 s, so each clip contains between 3 and 75 seconds of motion.
Refer to Section 3.4
>We tested our match web implementation on a data set containing 37,000 frames, or a little over 10 minutes of motion sampled
at 60Hz. This data was divided into thirty files ranging in length
from 3s to 75s, and it included both motions where the actor performed a scripted sequence of specific moves and motions consisting of random variations of the same action. The former class of
motions included picking up and putting back objects at predefined
locations, walking/jogging in a spiral at different speeds, stepping
onto/off of platforms of various heights, and sitting down/standing
up using chairs of various heights. The latter class of motions consisted of kicks, punches, cartwheels, jumping, and hopping on one
foot.
-----
### A deep learning framework for character motion synthesis and editing
[Paper Link](https://dl.acm.org/doi/abs/10.1145/2897824.2925975)
Even though this is not a motion compression paper, they separated the entire motion dataset into small clips for better training.
Dataset: CMU + their internal captures; the total size is about double that of the CMU mocap dataset.
Fixed window size: 240 frames for each clip.
>In general, our model does not require motion clips to have a fixed
length, but having a fixed window size during training can improve
the speed, so for this purpose we separate the motion database
into overlapping windows of n frames (overlapped by n/2 frames),
where n = 240 in our experiments. This results in a final input vector, representing a single sample from the database, as $X \in \mathbb{R}^{n \times d}$,
with n being the window size and d the degrees of freedom of the
body model, which is 70 in our experiments. After training the window size n is not fixed in our framework, thus it can handle motions
of arbitrary lengths.
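
A small sketch of the overlapping-window split described above; only the window size n = 240 and the n/2 overlap come from the paper, the rest (names, shapes, data) is assumed:
```python
import numpy as np

def to_overlapping_windows(motion, n=240):
    """Split a (num_frames, d) motion array into windows of n frames that
    overlap by n/2 frames, i.e. consecutive windows start n/2 frames apart."""
    stride = n // 2
    windows = [motion[start:start + n]
               for start in range(0, len(motion) - n + 1, stride)]
    return np.stack(windows)  # (num_windows, n, d), with d = 70 in the paper

# e.g. a database with d = 70 degrees of freedom:
# X = to_overlapping_windows(np.random.randn(10000, 70))  # -> (num_windows, 240, 70)
```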