## DF Recovery Procedure Outline
The following procedure relies on Director's ability to deploy clouds using predictable IP addresses, together with in-depth knowledge of the operational characteristics of the Heat database and Heat Engine.
**Before beginning this procedure, please read everything within this doc.**
#### Administrative Tasks
The individual performing the administrative tasks will need to pull information onto the Director node to create a complete set of templates for recovering this environment. This procedure assumes all associated backups have been created and the deployment is ready for recovery.
###### Setting up Predictable IP Addresses
The use of predictable IP addresses follows a specific file format; [see the official documentation for OSP 13 and predictable IP addresses](https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/advanced_overcloud_customization/sect-controlling_node_placement#sect-Predictable_IPs) for more information on the requirements.
> Please note that the order of the IP addresses corresponds to the numbering order of the host indices; i.e., controller-0 receives the first IP in each list, controller-1 the second, and so on.
###### Example `ips_from_pool.yaml` file (for illustration only, not to be used as-is)
* Source file: `/usr/share/openstack-tripleo-heat-templates/environments/ips-from-pool-all.yaml`
The following example file is stored within the `stack` user's home folder.
``` yaml
resource_registry:
  OS::TripleO::Controller::Ports::ExternalPort: /usr/share/openstack-tripleo-heat-templates/network/ports/external_from_pool.yaml
  OS::TripleO::Controller::Ports::InternalApiPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api_from_pool.yaml
  OS::TripleO::Controller::Ports::StoragePort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage_from_pool.yaml
  OS::TripleO::Controller::Ports::StorageMgmtPort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage_mgmt_from_pool.yaml
  OS::TripleO::Controller::Ports::TenantPort: /usr/share/openstack-tripleo-heat-templates/network/ports/tenant_from_pool.yaml
  # Management network is optional and disabled by default
  #OS::TripleO::Controller::Ports::ManagementPort: /usr/share/openstack-tripleo-heat-templates/network/ports/management_from_pool.yaml
  OS::TripleO::Compute::Ports::ExternalPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Compute::Ports::InternalApiPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api_from_pool.yaml
  OS::TripleO::Compute::Ports::StoragePort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage_from_pool.yaml
  OS::TripleO::Compute::Ports::StorageMgmtPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::Compute::Ports::TenantPort: /usr/share/openstack-tripleo-heat-templates/network/ports/tenant_from_pool.yaml
  #OS::TripleO::Compute::Ports::ManagementPort: /usr/share/openstack-tripleo-heat-templates/network/ports/management_from_pool.yaml
  OS::TripleO::CephStorage::Ports::ExternalPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::CephStorage::Ports::InternalApiPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::CephStorage::Ports::StoragePort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage_from_pool.yaml
  OS::TripleO::CephStorage::Ports::StorageMgmtPort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage_mgmt_from_pool.yaml
  OS::TripleO::CephStorage::Ports::TenantPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  #OS::TripleO::CephStorage::Ports::ManagementPort: /usr/share/openstack-tripleo-heat-templates/network/ports/management_from_pool.yaml
  OS::TripleO::ObjectStorage::Ports::ExternalPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::ObjectStorage::Ports::InternalApiPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api_from_pool.yaml
  OS::TripleO::ObjectStorage::Ports::StoragePort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage_from_pool.yaml
  OS::TripleO::ObjectStorage::Ports::StorageMgmtPort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage_mgmt_from_pool.yaml
  OS::TripleO::ObjectStorage::Ports::TenantPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  #OS::TripleO::ObjectStorage::Ports::ManagementPort: /usr/share/openstack-tripleo-heat-templates/network/ports/management_from_pool.yaml
  OS::TripleO::BlockStorage::Ports::ExternalPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  OS::TripleO::BlockStorage::Ports::InternalApiPort: /usr/share/openstack-tripleo-heat-templates/network/ports/internal_api_from_pool.yaml
  OS::TripleO::BlockStorage::Ports::StoragePort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage_from_pool.yaml
  OS::TripleO::BlockStorage::Ports::StorageMgmtPort: /usr/share/openstack-tripleo-heat-templates/network/ports/storage_mgmt_from_pool.yaml
  OS::TripleO::BlockStorage::Ports::TenantPort: /usr/share/openstack-tripleo-heat-templates/network/ports/noop.yaml
  #OS::TripleO::BlockStorage::Ports::ManagementPort: /usr/share/openstack-tripleo-heat-templates/network/ports/management_from_pool.yaml
parameter_defaults:
  ControllerIPs:
    # Each controller gets an IP from the lists below: first controller, first IP
    ctlplane:
      - 192.168.24.23
    external:
      - 10.46.44.25
    internal_api:
      - 172.17.1.116
    storage:
      - 172.17.3.133
    storage_mgmt:
      - 172.17.4.128
    tenant:
      - 172.17.2.200
  ComputeIPs:
    # Each compute gets an IP from the lists below: first compute, first IP
    ctlplane:
      - 192.168.24.24
    internal_api:
      - 172.17.1.17
    storage:
      - 172.17.3.101
    tenant:
      - 172.17.2.171
```
Once the predictable IP address file has been constructed, add it to the deployment as the very last environment file included in the stack; a typical inclusion looks like `-e /home/stack/ips_from_pool.yaml`.
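For illustration only, a deploy invocation with the file appended last might look like the following; every other environment file shown here is a placeholder, and the real command must come from the environment's documented deployment:
``` shell
openstack overcloud deploy --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/templates/network-environment.yaml \
  -e /home/stack/ips_from_pool.yaml
```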
> The IP address information can be obtained from the os-net-config JSON file located at `/etc/os-net-config/config.json` on each Overcloud node.
> The purpose of each IP range is described in the `network_data.yaml` file on the Undercloud that was used for the Overcloud deployment.
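A hedged one-liner for pulling addresses out of that file, assuming `jq` is available on the node (otherwise inspect the file by hand); note that the exact JSON layout varies per environment:
``` shell
# List each top-level device and its addresses from the os-net-config file
# (bridge members may carry addresses as well; adjust the filter if needed)
sudo jq '.network_config[] | {device: .name, addresses: .addresses}' /etc/os-net-config/config.json
```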
**Remove the chosen predictable IPs from the provisioning pools. This ensures the addresses reserved for the overcloud nodes are not handed out by accident from the regular DHCP pools.**
* Modify `network_data.yaml` (or equivalent) to shrink the DHCP allocation pools so they do not conflict with the predictable IPs; a sketch follows the note below
> If applicable: for more information on modifying the network DHCP pools, please refer to the [official OSP 13 documentation for modifying basic network isolation](https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/advanced_overcloud_customization/basic-network-isolation#modifying-isolated-network-configuration).
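As a sketch only, a narrowed allocation pool in `network_data.yaml` might look like the following; the subnet and range values are placeholders and must match the actual environment:
``` yaml
- name: InternalApi
  name_lower: internal_api
  vip: true
  ip_subnet: '172.17.1.0/24'
  # Pool ends below the reserved predictable IP 172.17.1.116
  allocation_pools: [{'start': '172.17.1.10', 'end': '172.17.1.115'}]
```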
###### Disable Cleaning
To be sure there are no adverse effects, change each node's driver to *fake_pxe* (so Ironic does not control power or PXE) and ensure that node cleaning is **disabled**.
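A hedged sketch of the relevant commands; the node UUID is a placeholder, and driver handling may differ per environment:
``` shell
# Switch a node to the fake_pxe driver so Ironic performs no real power/PXE actions
openstack baremetal node set <NODE_UUID> --driver fake_pxe

# Confirm automated cleaning is off on the Undercloud (clean_nodes should be false)
grep clean_nodes ~/undercloud.conf
```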
###### Ceph FSID Setting
We recommend hard-coding the Ceph `FSID` in the customer's templates to further ensure nothing is lost or modified in the deployment configuration.
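As a sketch, the FSID can be read from a running monitor with `sudo ceph fsid` and then pinned through the `CephClusterFSID` parameter in an environment file; the UUID below is a placeholder:
``` yaml
parameter_defaults:
  # Value taken from `ceph fsid` on an existing monitor; placeholder shown
  CephClusterFSID: '4b5c8c0a-ff60-454b-a1b4-9747aa737d19'
```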
###### Database Backups
Before running anything potentially destructive, perform a full backup of the databases. The database on the Undercloud is in a precarious state, so capture its current condition now to ensure it can be restored in the event of any adverse effects.
``` shell
# Escalate your privileges to root
sudo su -
# Create a plain text dump of all databases on the undercloud
mysqldump --all-databases > alldbs.sql
```
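Should a rollback ever be needed, the dump can be fed back in; a minimal sketch, assuming the dump was taken as above:
``` shell
# Restore all databases from the plain-text dump (destructive; use only for rollback)
mysql < alldbs.sql
```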
###### Workload Backups
It is recommended that the cloud owner perform workload backups before any recovery operation proceeds. While this step is not strictly required, it provides insurance that recovery is possible should anything go wrong; this is especially important for workloads built by hand without accompanying automation.
> The backup is an extra precaution we hope not to have to use. The procedure includes stopping Ceph temporarily to back up a Ceph Mon; this is not a data backup.
* Back up the overcloud: [Official OSP 13 documentation](https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/undercloud_and_control_plane_back_up_and_restore/index).
Collect all relevant information about the cloud workload using all available tenant **\$RC** files.
``` shell
# Source tenant file
source $RC
# Pull container list
openstack container list
# Pull server list
openstack server list
```
> `openstack container list` is run here to check whether any Swift containers exist; in this case they are backed by RGW. Any important containers can be downloaded with the `swift` CLI tools (see the sketch below). Steps for `cinder` or `glance` have not been included; however, a similar discovery and download/export process can be executed as needed.
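A minimal sketch of pulling a container's objects down with the Swift CLI; the container name is a placeholder:
``` shell
# Download every object in the container to the current directory
swift download <container_name>
```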
Once instances attached to the Ceph datastore have been identified, it should be possible to snapshot running VMs to images, as sketched below.
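A hedged example of snapshotting one instance; the names are placeholders:
``` shell
# With the owning tenant's RC file sourced, snapshot the instance to a Glance image
openstack server image create --name <instance>-recovery-snapshot <instance>
```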
#### Modifying the *heat* Database
Once the database is backed up, begin the restoration process. The first step is to update the Heat database so that the failed resources are set to *update*. To access the database, run the following commands.
``` shell
# Escalate your privileges to root
sudo su -
# Access the mysql REPL
mysql
```
Once at the SQL prompt, run the following statements to set the failed and deleted resources to *update*.
``` sql
USE heat;
-- Re-mark resources and stacks stuck in a failed DELETE so Heat treats them as updatable
UPDATE resource SET action='UPDATE' WHERE action='DELETE' AND status='FAILED';
UPDATE stack SET action='UPDATE' WHERE action='DELETE' AND status='FAILED';
FLUSH PRIVILEGES;
EXIT;
```
#### Redeployment
With everything in place and all backups secured, we're now able to run the update deployment process.
> The command used for the deployment is unique for every environment and should be well documented for the customer.
Upon completion of the deployment process, Director will have regained control of the stack.
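As a final sanity check, standard Undercloud commands can confirm Heat's view of the stack:
``` shell
# From the Undercloud, confirm the overcloud stack reports a healthy state
source ~/stackrc
openstack stack list
# Expect the stack (typically "overcloud") to show UPDATE_COMPLETE
```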