# Data Preprocessing
Data preprocessing is divided into five stages:
* Collection of real photos
* Preprocessing of 3D Models
* Preprocessing of Exemplars
* Preprocessing of Exemplar-Shape Pairs
* Training data generation
## 1. Collection of real photos
Collect a large number of real photos of the target category (e.g., chairs) from the Internet. The photos should have a pure-white background, be as sharp as possible, and contain as little clutter as possible. A hypothetical filter for this is sketched below.
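The repository does not ship a filter for this, but a quick automated pass can weed out images with busy backgrounds. A minimal sketch; the paths, thresholds, and helper name are illustrative, not part of the codebase:
```
# Hypothetical helper (not part of the repository): keep only images whose
# border pixels are near-white, a cheap proxy for a clean white background.
import os
import numpy as np
from PIL import Image

def has_white_background(path, border=10, threshold=240, min_ratio=0.95):
    """True if at least `min_ratio` of the border pixels are near-white."""
    img = np.asarray(Image.open(path).convert("RGB"))
    strips = [img[:border], img[-border:], img[:, :border], img[:, -border:]]
    border_px = np.concatenate([s.reshape(-1, 3) for s in strips])
    white = np.all(border_px >= threshold, axis=1)
    return white.mean() >= min_ratio

raw_dir, keep_dir = "data/exemplars/raw", "data/exemplars/clean"  # assumed layout
os.makedirs(keep_dir, exist_ok=True)
for name in os.listdir(raw_dir):
    src = os.path.join(raw_dir, name)
    if has_white_background(src):
        os.replace(src, os.path.join(keep_dir, name))
```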
## 2. Preprocessing of 3D Models
**Step 1:** Download the ShapeNet and PartNet datasets from their official websites into the `data/3D_Dataset` folder and unzip them. You only need to extract the files for the category you want (e.g., chair).
**Step 2:** Merge the parts of each PartNet model into a whole shape, and assign the corresponding semantics to each part according to the PartNet semantic tree (a sketch of the idea follows the command).
```
python -m src.data_preprocess.shapes.merge_partnet_chair
```
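`merge_partnet_chair` performs the merge; for orientation, here is a minimal sketch of the same idea, assuming the public PartNet layout (a `result.json` hierarchy plus per-part OBJ files under `objs/`) and the `trimesh` package — both assumptions, not requirements stated by the repository:
```
# Illustrative sketch only: walk a PartNet result.json hierarchy, load every
# leaf part mesh, tag it with its semantic path, and concatenate into one shape.
import json
import os
import trimesh

def collect_leaves(node, prefix=""):
    """Yield (semantic_path, obj_basenames) for every leaf of the part tree."""
    label = f"{prefix}/{node['name']}" if prefix else node["name"]
    if node.get("children"):
        for child in node["children"]:
            yield from collect_leaves(child, label)
    else:
        yield label, node.get("objs", [])

anno_dir = "data/3D_Dataset/partnet/<anno_id>"  # one PartNet annotation folder
root = json.load(open(os.path.join(anno_dir, "result.json")))[0]

parts = []
for semantic, objs in collect_leaves(root):
    for obj in objs:
        mesh = trimesh.load(os.path.join(anno_dir, "objs", obj + ".obj"), force="mesh")
        mesh.metadata["semantic"] = semantic  # keep the semantic label with the part
        parts.append(mesh)

whole = trimesh.util.concatenate(parts)  # the merged whole shape
```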
**Step 3:** Preprocess the PartNet shapes so that they have UV maps. Make sure `PARTNET_CORE_DIR`, `PARTNET_TAXONOMY_PATH`, and `PARTNET_META_DIR` are set properly, then run:
```
python -m src.data_preprocess.shapes.preprocess_partnet
```
**Step 4:** To leverage ShapeNet's material prior when generating training data, align the ShapeNet shapes to the PartNet shapes (a toy bounding-box alignment is sketched after the command).
```
python -m src.data_preprocess.shapes.align_shapenet2partnet
```
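The script above is the source of truth for the alignment. As a rough intuition, aligning two copies of the same object that were normalized differently often reduces to matching bounding-box centers and scales; a toy sketch, not the repository's algorithm, with placeholder file names:
```
# Toy illustration (not the repository's algorithm): align a ShapeNet mesh to
# its PartNet counterpart by matching bounding-box centers and scales.
import trimesh

shapenet = trimesh.load("shapenet_model.obj", force="mesh")  # placeholder paths
partnet = trimesh.load("partnet_model.obj", force="mesh")

def center_and_scale(mesh):
    lo, hi = mesh.bounds
    return (lo + hi) / 2.0, (hi - lo).max()

src_c, src_s = center_and_scale(shapenet)
dst_c, dst_s = center_and_scale(partnet)

# Translate to the PartNet center and rescale to the PartNet extent.
shapenet.apply_translation(-src_c)
shapenet.apply_scale(dst_s / src_s)
shapenet.apply_translation(dst_c)
```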
**Step 5:** For each PartNet shape, find the corresponding ShapeNet shape and generate its UV map; the ShapeNet shape's segmentation provides reasonable part boundaries that guide the later generation of segmentation training data.
```
python -m src.data_preprocess.shapes.preprocess_shapenet
```
## 3. Preprocessing of Exemplars
**Step 1:** Create an Exemplars table in the database.
```
docker exec -i $(docker-compose ps -q postgres) psql -U photoshape_user photoshape_db < data/postgres/exemplars.sql
```
**Step 2:** Insert every collected real image exemplar into the Exemplars table, e.g. with a short script as sketched below.
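A minimal sketch of this insertion with `psycopg2`; the column names (`path`, `category`) and directory layout are assumptions — consult `data/postgres/exemplars.sql` for the actual schema:
```
# Hypothetical sketch: register every collected exemplar image in the
# exemplars table. Column names are assumptions; see exemplars.sql.
import os
import psycopg2

conn = psycopg2.connect(dbname="photoshape_db", user="photoshape_user",
                        host="localhost")
exemplar_dir = "data/exemplars/chair"  # assumed layout
with conn, conn.cursor() as cur:
    for name in sorted(os.listdir(exemplar_dir)):
        cur.execute(
            "INSERT INTO exemplars (path, category) VALUES (%s, %s)",
            (os.path.join(exemplar_dir, name), "chair"),
        )
conn.close()
```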
**Step 3:** Compute a substance map for each real exemplar.
```
python -m src.data_preprocess.exemplars.compute_substance_maps --category chair
```
## 4. Preprocessing of Exemplar-Shape Pairs
**Step 1:** Render each shape from 200 different camera views; the feature vectors of these renderings are computed and stored in the next step. The rendering scripts use VirtualGL, hence the `DISPLAY=:0 vglrun` prefix.
```
DISPLAY=:0 vglrun python -m src.data_preprocess.shapes.generate_alignment_rends
```
**Step 2:** Extract and store the feature vectors of the renderings from the 200 camera views (a minimal sketch of such a feature extractor follows the command).
```
python -m src.data_preprocess.shapes.generate_alignment_features --chair
```
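Conceptually, both this step and the next one run every image through a fixed CNN and store the embeddings. A minimal sketch using a pretrained torchvision ResNet-18; the network actually used by the repository may differ:
```
# Illustrative feature extractor (the repository may use a different network):
# embed an image with a pretrained ResNet-18, unit-normalized for matching.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.resnet18(pretrained=True)
model.fc = torch.nn.Identity()  # drop the classifier, keep the 512-d embedding
model.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(path):
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    f = model(x).squeeze(0)
    return f / f.norm()  # unit norm, so cosine similarity is a dot product
```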
**Step 3:** Extract and store the feature vectors of the exemplars.
```
python -m src.data_preprocess.exemplars.compute_alignment_features --chair
```
**Step 4:** Match the feature vectors of shapes and exemplars to find matching pairs (a conceptual sketch follows the command).
```
python -m src.data_preprocess.pair.compute_pairs
```
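`compute_pairs` implements the matching; conceptually it is a nearest-neighbor search between the two feature sets. A sketch with placeholder file names:
```
# Conceptual sketch of pair computation: for each exemplar feature, find the
# nearest rendered-view feature (cosine similarity on unit-normalized vectors).
import numpy as np

exemplar_feats = np.load("exemplar_features.npy")  # (E, D), placeholder files
render_feats = np.load("render_features.npy")      # (R, D), R = shapes x views

sims = exemplar_feats @ render_feats.T             # (E, R) cosine similarities
best = sims.argmax(axis=1)                         # best rendered view per exemplar
for e, r in enumerate(best):
    print(f"exemplar {e} -> rendering {r} (sim={sims[e, r]:.3f})")
```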
**Step 5:** Compute the part masks of each paired shape under its matched view for subsequent processing.
```
DISPLAY=:0 vglrun python -m src.data_preprocess.pair.generate_part_renderings
```
**Step 6:** Compute the SIFT flow from each exemplar to its matched shape rendering.
```
python -m src.data_preprocess.pair.compute_flows
```
**Step 7:** Compute the warped renderings that map each exemplar onto its matched shape (an illustrative warp is sketched after the command).
```
python -m src.data_preprocess.pair.generate_part_warped_renderings
```
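Applying a dense flow field is essentially a remap operation. An illustrative warp with OpenCV, assuming `flow[y, x]` holds the `(dx, dy)` offsets produced by the previous step; file names are placeholders:
```
# Illustrative warp (not the repository's exact code): apply a dense flow
# field to an exemplar so it lines up with the shape rendering.
import cv2
import numpy as np

exemplar = cv2.imread("exemplar.png")  # placeholder file names
flow = np.load("flow.npy")             # (H, W, 2) offsets from SIFT Flow

h, w = flow.shape[:2]
grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
map_x = (grid_x + flow[..., 0]).astype(np.float32)
map_y = (grid_y + flow[..., 1]).astype(np.float32)
warped = cv2.remap(exemplar, map_x, map_y, interpolation=cv2.INTER_LINEAR)
cv2.imwrite("warped.png", warped)
```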
## 5. Training data generation
**Step 1:** Keep only the pairs with a reasonable camera viewpoint.
```
python -m src.data_preprocess.generate_data.camera_pose_piror
```
**Step 2:** Regroup each PartNet shape into more reasonable parts according to its aligned ShapeNet shape, and tally the substance distribution of each part semantic over the exemplar of every pair; the statistics are saved to a JSON file. Start the collector server first, then run the client in a second terminal.
```
python -m src.data_preprocess.generate_data.colletor_server --port 6667 ./data/trainning_data/masks/
DISPLAY=:0 vglrun python -m src.data_preprocess.generate_data.generate_new_data --category chair --client-id 1 --host 127.0.0.1 --port 6667
```
**Step 3:** Collect statistics comparing the material assignments of multi-part ShapeNet shapes with those of the corresponding PartNet shapes.
```
python -m src.data_preprocess.generate_data.analysis_shapenet_material_mask
```
**Step 4:** Compute the correlation between part semantics (a toy sketch of one possible counting scheme follows the command).
```
python -m src.data_preprocess.generate_data.compute_material_semantic_matrix
```
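One plausible reading of this matrix is a count of how often two part semantics receive the same material within a shape, which later guides part grouping. A toy sketch under that assumption; the input format is invented for illustration:
```
# Toy sketch: build a part-correlation matrix counting how often two part
# semantics share a material within a shape. The input format is assumed.
import itertools
import numpy as np

# Hypothetical input: per shape, a mapping from part semantic to material id.
assignments = [
    {"chair/back": 3, "chair/seat": 3, "chair/leg": 7},
    {"chair/back": 1, "chair/seat": 1, "chair/leg": 1},
]

semantics = sorted({s for a in assignments for s in a})
index = {s: i for i, s in enumerate(semantics)}
corr = np.zeros((len(semantics), len(semantics)), dtype=int)

for a in assignments:
    for s1, s2 in itertools.combinations(a, 2):
        if a[s1] == a[s2]:  # the two parts share a material in this shape
            corr[index[s1], index[s2]] += 1
            corr[index[s2], index[s1]] += 1
```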
**Step 5:** Visualize the part correlations, select the reasonable pairs to merge, and record them in a txt file.
```
python -m src.data_preprocess.generate_data.vis_merge_material
```
**Step 6:** Extract the semantic segmentation maps of all the Shapes.
```
python -m src.data_preprocess.generate_data.extract_image_test
```
**Step 7:** Check whether the segmentation maps of the shapes after part grouping are reasonable. If any segmentation is unreasonable, add or remove pairs, update the previously mentioned txt file, and run the following once all pairs have been finalized.
```
python -m src.data_preprocess.generate_data.group_materials
```
**Step 8:** Generate a Blender file for each pair.
```
python -m src.data_preprocess.generate_data.generate_blender_file --category chair --num-workers 12
```
**Step 9:** Render the training data with Blender from the generated `.blend` files. Blender runs headless (`-b`), executes the rendering script with `-P`, and forwards the arguments after `--` to that script.
```
find ./blender_all_chair -name '*.inferred.blend' -exec ../../blender-2.83/blender -b -P blender2.83_img_mask.py -- {} \;
```
**Step 10:** Post-process the rendered data: split it into dataset partitions, find the training pairs, rename files, and so on.
```
python -m src.data_preprocess.generate_data.preprocess_trainig_data
```