# Dataset Size Report

### Deep Signatures for Indexing and Retrieval in Large Motion Databases

[Paper Link](https://dl.acm.org/doi/10.1145/2822013.2822024)

This is quite an interesting paper: they propose a motion compression and indexing method specifically for data-driven motion research. However, their method is quite different from ours.

They have 2 datasets:

1. the diversified CMU Motion Database
2. a medium-sized subset of the CMU Motion Database with manual segmentation and annotations

They also cut the datasets into many motion segments. **Each motion segment is about 50 frames. Depending on the dataset, they also used motion segments 100 to 200 frames long.** Refer to Sections 4.1, 5.2.1, and 5.2.2:

>Motion segments are the fundamental units in our motion database. Original, captured motion clips can be segmented either manually or automatically. It is up to the users how they want to manually segment the motion clips. For large database like the CMU Motion Database with thousands of motion clips, it is less feasible to perform manual segmentation. We provide a very basic velocity-based automatic segmentation method.

>We first compute the weighted velocity of joint rotations per frame. Torso joint rotations are given higher weights than endeffector joint rotations like the wrists and fingers. If the weighted velocity of frame i is a local minima, and below a velocity threshold, frame i is then detected as a segmentation point. We use this segmentation method to process large motion databases. Typically, an automatically segmented motion segment is about one hundred frames long and is subsampled to 50 frames. We discard short segments which are less than 20 frames long.

>The duration of the manually extracted segments varies, from 40 frames to hundreds of frames, depending on the motion content.
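To make the quoted segmentation rule concrete, here is a minimal sketch of the velocity-based splitting described above. The joint weights, the threshold value, and all function/variable names are my own assumptions for illustration, not values from the paper.

```python
import numpy as np

def segment_motion(joint_rotations, joint_weights, vel_threshold=0.1,
                   min_len=20, target_len=50):
    """Hypothetical sketch of velocity-based segmentation.

    joint_rotations: (num_frames, num_joints, 3) array of joint angles.
    joint_weights:   (num_joints,) weights, higher for torso joints.
    """
    # Weighted rotational velocity per frame (finite differences).
    vel = np.abs(np.diff(joint_rotations, axis=0))                 # (F-1, J, 3)
    weighted_vel = (vel.sum(axis=2) * joint_weights).sum(axis=1)   # (F-1,)

    # A frame is a cut point if its weighted velocity is a local
    # minimum and below the velocity threshold.
    cuts = [0]
    for i in range(1, len(weighted_vel) - 1):
        if (weighted_vel[i] < weighted_vel[i - 1]
                and weighted_vel[i] < weighted_vel[i + 1]
                and weighted_vel[i] < vel_threshold):
            cuts.append(i)
    cuts.append(len(joint_rotations))

    # Build segments, discard short ones (< 20 frames),
    # and subsample every kept segment to 50 frames.
    segments = []
    for start, end in zip(cuts[:-1], cuts[1:]):
        if end - start < min_len:
            continue
        idx = np.linspace(start, end - 1, target_len).astype(int)
        segments.append(joint_rotations[idx])
    return segments
```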
-----

### Motion Graph

[Paper Link](https://research.cs.wisc.edu/graphics/Papers/Gleicher/Mocap/mograph.pdf)

They tested their method on multiple datasets and styles of motion. They do not give the exact size in megabytes; instead, the number of frames is given.

3 datasets:

- Walking motion: 3000 frames (100 seconds)
- Martial art: 3000 frames (100 seconds)
- Walk + sneak + martial art: 6000 frames (200 seconds)

They do mention that they use branch and bound for better motion searching. **So I would think the motion segment is 25-120 frames.** Refer to Section 4.2:

>While branch and bound reduces the number of graph walks we have to test against f , it does not change the fact that the search process is inherently exponential — it merely lowers the effective branching factor. For this reason we generate a graph walk incrementally. At each step we use branch and bound to find an optimal graph walk of n frames. We retain the first m frames of this graph walk and use the final retained node as a starting point for another search. This process continues until a complete graph walk is generated. In our implementation we used values of n from 80 to 120 frames ($2\frac{2}{3}$ to 4 seconds) and m from 25 to 30 frames (about one second).

-----

### Compression of Motion Capture Databases

[Paper Link](https://dl.acm.org/doi/10.1145/1141911.1141971)

They applied their compression method to large datasets combining multiple kinds of locomotion:

- Database 1 (standing, walking, running, skipping, etc.): 180 MB (1.5 hours)
- Database 2 (the CMU dataset, the same one we use): 1085 MB (6.5 hours)

During compression, they split the motion database into clips of *k* subsequent frames. More details are in Section 4.5.

**Conclusion: I think they apply their compression algorithm to chunks of 16-32 frames of motion data.**

>For compression, we split the motion database into clips of k subsequent frames. For example, the first k frames is the first clip and the next k subsequent frames is the second. The clip size (k) is a compression parameter that affects the compression. We will discuss compression parameters in Section 4.5

>4.5 *k*: the number of frames in a clip. The bigger this number is, the smoother the reconstructed signal will be. If k is too small, then the compression will not be able to take full advantage of the temporal coherence. If it is too big, the correlation between joints will not be linear and CPCA will perform poorly. Optimal numbers we found are 16-32 frames (130 - 270 milliseconds).

-----

### Automated Extraction and Parameterization of Motions in Large Data Sets

[Paper Link](https://dl.acm.org/doi/abs/10.1145/1015706.1015760)

This paper proposes a motion clustering algorithm for finding similar motions.

Dataset: 37,000 frames (10 minutes), divided into 30 files ranging from 3 s to 75 s, so each clip contains 3 s to 75 s of motion data. Refer to Section 3.4:

>We tested our match web implementation on a data set containing 37,000 frames, or a little over 10 minutes of motion sampled at 60Hz. This data was divided into thirty files ranging in length from 3s to 75s, and it included both motions where the actor performed a scripted sequence of specific moves and motions consisting of random variations of the same action. The former class of motions included picking up and putting back objects at predefined locations, walking/jogging in a spiral at different speeds, stepping onto/off of platforms of various heights, and sitting down/standing up using chairs of various heights. The latter class of motions consisted of kicks, punches, cartwheels, jumping, and hopping on one foot.

-----

### A deep learning framework for character motion synthesis and editing

[Paper Link](https://dl.acm.org/doi/abs/10.1145/2897824.2925975)

Even though this is not a motion compression paper, they separated the entire motion dataset into small clips for better training.

Dataset: CMU + their internal captures (total size is double the CMU mocap dataset), with a fixed window size of 240 frames per clip.

>In general, our model does not require motion clips to have a fixed length, but having a fixed window size during training can improve the speed, so for this purpose we separate the motion database into overlapping windows of n frames (overlapped by n/2 frames), where n = 240 in our experiments. This results in a final input vector, representing a single sample from the database, as $X \in \mathbb{R}^{n \times d}$ with n being the window size and d the degrees of freedom of the body model, which is 70 in our experiments. After training the window size n is not fixed in our framework, thus it can handle motions of arbitrary lengths.
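As a concrete illustration of the windowing scheme in the quote above, here is a minimal sketch that cuts a motion array into overlapping windows of n frames with an n/2 overlap. The function and variable names are my own, not from the paper.

```python
import numpy as np

def make_training_windows(motion, n=240):
    """Hypothetical sketch: split a motion of shape (num_frames, d) into
    overlapping windows of n frames, overlapped by n/2 frames, giving
    training samples of shape (n, d)."""
    step = n // 2
    windows = [motion[start:start + n]
               for start in range(0, motion.shape[0] - n + 1, step)]
    return (np.stack(windows) if windows
            else np.empty((0, n, motion.shape[1])))

# Example: d = 70 degrees of freedom, as in the paper's experiments.
motion = np.random.randn(1000, 70)
X = make_training_windows(motion)   # shape (7, 240, 70)
```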