Reporte EPEC GC === ###### tags: `epec_gc` `reporte` :::info - **Info:** Lista de los intentos para migrar la DB. - **Date:** 27/08/2020 ::: ## Estado de los planes - :o: : abortada - :x: : error - :construction: en progreso - :exclamation: : descartado - :hourglass: : pendiente - :ok: : implementado y operativo |numero | plan |estado | |-------|----------|--------| |1 | mover la db al chasis nuevo | :exclamation: | |2 | generar dump de la db sin la telemetria y exportarla en la nueva db | :o: | |3 | generar dump de la db sin las tablas de mayor peso y exportarla en la nueva db | :x: | |4 | hacer el restore de un backup existente | :o: | |5 | borrar datos de la db vieja | :ok: | |6 | arrancar con una db “limpia” | :o: | |7 | cambiar la IP usada para conectarse a la DB | :x: | |8 | hacer uqe "alguien" copie manualmente con un disco | :exclamation: | |9 | pedir a epec que haga la copia del volumen entre servers | :construction: | |10| hacer un restore selectivo de un backup | :x: | |11| hacer la restauracion en una maquina | :o: | |12| hacer snapshot de la VM | :exclamation: | ## Estado de la Base de datos ::: info **Tamaño en disco**: 542 GB :::spoiler resultado comando `du` ```bash ascentio@tmov03:/opt/thingsboard-db$ sudo du -hs * 100G backups 20K changesets 542G data 4.0K docker-compose.yml 4.0K thingsboard-db.service ascentio@tmov03:/opt/thingsboard-db$ ``` ::: ::: info **tamaño de las tablas de la base de datos (las 4 mayores)** |table_name |row_estimate|total |index |table | |-----------------------|------------|----------|----------|-------| |ts_kv |1.19997E+9 |397 GB |252 GB |145 GB | |event |7.399E+7 |139 GB |18 GB |110 GB | |ts_kv_latest |2.58909E+6 |2643 MB |1271 MB |1373 MB| |attribute_kv |327984 |1659 MB |643 MB |1017 MB| :::spoiler resultado completo |table_name |row_estimate|total |index |table | |-----------------------|------------|----------|----------|-------| |ts_kv |1.19997E+9 |397 GB |252 GB |145 GB | |event |7.399E+7 |139 GB |18 GB |110 GB | |ts_kv_latest |2.58909E+6 |2643 MB |1271 MB |1373 MB| |attribute_kv |327984 |1659 MB |643 MB |1017 MB| |alarm |5835 |815 MB |24 MB |204 MB | |relation |56013 |18 MB |10104 kB |8360 kB| |audit_log |7271 |3568 kB |368 kB |3112 kB| |pg_statistic |917 |2608 kB |136 kB |1840 kB| |widget_type |71 |1888 kB |16 kB |136 kB | |device_credentials |4037 |1416 kB |232 kB |1176 kB| |pg_depend |6490 |1200 kB |712 kB |488 kB | |device |4037 |1032 kB |240 kB |784 kB | |pg_proc |2961 |992 kB |344 kB |640 kB | |pg_attribute |3224 |720 kB |216 kB |504 kB | |pg_rewrite |121 |616 kB |32 kB |128 kB | |pg_description |3130 |544 kB |184 kB |352 kB | |dashboard |11 |336 kB |16 kB |48 kB | |component_descriptor |42 |304 kB |72 kB |168 kB | |pg_collation |705 |296 kB |80 kB |216 kB | |pg_class |441 |248 kB |120 kB |128 kB | |pg_operator |788 |240 kB |88 kB |152 kB | |pg_amop |722 |200 kB |112 kB |88 kB | |pg_type |451 |192 kB |72 kB |120 kB | |pg_amproc |458 |128 kB |64 kB |64 kB | |pg_constraint |31 |128 kB |80 kB |40 kB | |user_credentials |11 |112 kB |64 kB |40 kB | |pg_index |187 |104 kB |32 kB |72 kB | |pg_conversion |132 |104 kB |48 kB |56 kB | |sql_features |671 |96 kB |0 bytes |88 kB | |pg_opclass |134 |88 kB |32 kB |56 kB | |tb_user |11 |80 kB |32 kB |40 kB | |pg_cast |226 |80 kB |32 kB |48 kB | |pg_ts_config_map |304 |80 kB |32 kB |48 kB | |pg_opfamily |116 |80 kB |32 kB |48 kB | |pg_language |4 |72 kB |32 kB |40 kB | |pg_auth_members |3 |72 kB |32 kB |40 kB | |pg_ts_template |5 |72 kB |32 kB |40 kB | |pg_database |5 |72 kB |32 kB |40 kB | |pg_tablespace |2 |72 kB |32 kB |40 kB | |pg_authid |9 |72 kB |32 kB |40 kB | |pg_namespace |6 |72 kB |32 kB |40 kB | |pg_ts_parser |1 |72 kB |32 kB |40 kB | |pg_ts_dict |16 |72 kB |32 kB |40 kB | |pg_extension |1 |72 kB |32 kB |40 kB | |pg_am |6 |72 kB |32 kB |40 kB | |pg_ts_config |16 |72 kB |32 kB |40 kB | |pg_shdepend |9 |72 kB |32 kB |40 kB | |tenant |2 |64 kB |16 kB |40 kB | |rule_node |23 |64 kB |16 kB |40 kB | |rule_chain |5 |64 kB |16 kB |40 kB | |widgets_bundle |12 |64 kB |16 kB |40 kB | |pg_aggregate |138 |64 kB |16 kB |48 kB | |customer |1 |64 kB |16 kB |40 kB | |pg_init_privs |162 |64 kB |16 kB |48 kB | |admin_settings |2 |64 kB |16 kB |40 kB | |pg_shdescription |3 |64 kB |16 kB |40 kB | |pg_range |6 |56 kB |16 kB |40 kB | |pg_pltemplate |8 |56 kB |16 kB |40 kB | |databasechangeloglock |1 |56 kB |16 kB |40 kB | |databasechangelog |5 |48 kB |0 bytes |40 kB | |sql_languages |4 |48 kB |0 bytes |40 kB | |sql_implementation_info|12 |48 kB |0 bytes |40 kB | |sql_sizing |23 |48 kB |0 bytes |40 kB | |sql_parts |9 |48 kB |0 bytes |40 kB | |sql_packages |10 |48 kB |0 bytes |40 kB | |pg_trigger |0 |32 kB |24 kB |0 bytes| |pg_statistic_ext |0 |32 kB |24 kB |0 bytes| |pg_attrdef |0 |24 kB |16 kB |0 bytes| |pg_enum |0 |24 kB |24 kB |0 bytes| |pg_foreign_server |0 |16 kB |16 kB |0 bytes| |asset |0 |16 kB |8192 bytes|0 bytes| |entity_view |0 |16 kB |8192 bytes|0 bytes| |pg_user_mapping |0 |16 kB |16 kB |0 bytes| |pg_subscription |0 |16 kB |16 kB |0 bytes| |pg_inherits |0 |16 kB |16 kB |0 bytes| |pg_event_trigger |0 |16 kB |16 kB |0 bytes| |pg_db_role_setting |0 |16 kB |8192 bytes|0 bytes| |pg_foreign_data_wrapper|0 |16 kB |16 kB |0 bytes| |pg_policy |0 |16 kB |16 kB |0 bytes| |pg_replication_origin |0 |16 kB |16 kB |0 bytes| |pg_default_acl |0 |16 kB |16 kB |0 bytes| |pg_seclabel |0 |16 kB |8192 bytes|0 bytes| |pg_shseclabel |0 |16 kB |8192 bytes|0 bytes| |pg_transform |0 |16 kB |16 kB |0 bytes| |pg_publication |0 |16 kB |16 kB |0 bytes| |pg_publication_rel |0 |16 kB |16 kB |0 bytes| |pg_subscription_rel |0 |8192 bytes|8192 bytes|0 bytes| |pg_sequence |0 |8192 bytes|8192 bytes|0 bytes| |pg_partitioned_table |0 |8192 bytes|8192 bytes|0 bytes| |pg_largeobject_metadata|0 |8192 bytes|8192 bytes|0 bytes| |pg_largeobject |0 |8192 bytes|8192 bytes|0 bytes| |sql_sizing_profiles |0 |8192 bytes|0 bytes |0 bytes| |pg_foreign_table |0 |8192 bytes|8192 bytes|0 bytes| :::spoiler query sql ```sql SELECT table_name, row_estimate, pg_size_pretty(total_bytes) AS total , pg_size_pretty(index_bytes) AS index , pg_size_pretty(table_bytes) AS table FROM ( SELECT *, total_bytes-index_bytes-coalesce(toast_bytes,0) AS table_bytes FROM ( SELECT c.oid,nspname AS table_schema, relname AS table_name , c.reltuples AS row_estimate , pg_total_relation_size(c.oid) AS total_bytes , pg_indexes_size(c.oid) AS index_bytes , pg_total_relation_size(reltoastrelid) AS toast_bytes FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace WHERE relkind = 'r' ) a ORDER BY total_bytes Desc ) a; ``` ::: ::: info **Datos de los server** - srv-db: - ip: 192.168.209.28 y 10.1.45.28 - srv-tb: - ip: 192.168.209.27 y 10.1.45.27 ::: ## Plan 1: mover la db al chasis nuevo - **tareas**: - [ ] paso 1: crear la db en la vm - [ ] paso 2: scp de la VM vieja a la vm nueva - [ ] paso 3: verificación - [ ] paso 4: iniciar la lectura de los perfiles - **Problemas encontrados**: - Se intento este camino en el primer dia, pero nos dimos cuenta que la tasa de tranferencia es de aprox 1.4MB/s (Megabyte/sec) - el calculo para la tranferencia llevaria varios dias ::: danger **Estado**: Este plan no se ejecuto. ::: ## Plan 2: generar dump de la db sin la telemetria y exportarla en la nueva db - **tareas**: - [ ] paso 1: generar el dump - [ ] paso 2: crear la db en la vm - [ ] paso 3: hacer el restore - [ ] paso 4: verificación - [ ] paso 5: iniciar la lectura de los perfiles - **Problemas encontrados**: - existen microcortes en la conexion ssh, por lo que se genero un job en jenkins para que se pueda ejecutar - el proceso lleva 15hs ejecutandose y sigue en proceso ::: warning **Estado**: En ejecución... ::: ## Plan 3: generar dump de la db sin las tablas de mayor peso y exportarla en la nueva db - **tareas**: - [ ] paso 1: generar el dump - [ ] paso 2: crear la db en la vm - [ ] paso 3: hacer el restore - [ ] paso 4: verificación - [ ] paso 5: iniciar la lectura de los perfiles - **Problemas encontrados**: - se realizo el proceso via ssh - al realizar el dump se encontro un mensaje de error - microcortes en el proceso de restore ::: danger **Estado**: Error al hacer la restauracion del dump ::: spoiler detalle del error ```bash ascentio@tmov04:/opt/thingsboard-db$ sudo docker exec --interactive thingsboard-db_postgres_1 pg_restore -F c -U thingsboard -v -d thingsboard < backups/thingsboard-dump_26_08_2020_local_2.sql.gz pg_restore: connecting to database for restore pg_restore: creating TABLE "public.admin_settings" pg_restore: creating TABLE "public.alarm" pg_restore: creating TABLE "public.asset" pg_restore: creating TABLE "public.attribute_kv" pg_restore: creating TABLE "public.audit_log" pg_restore: creating TABLE "public.component_descriptor" pg_restore: creating TABLE "public.customer" pg_restore: creating TABLE "public.dashboard" pg_restore: creating TABLE "public.databasechangelog" pg_restore: creating TABLE "public.databasechangeloglock" pg_restore: creating TABLE "public.device" pg_restore: creating TABLE "public.device_credentials" pg_restore: creating TABLE "public.entity_view" pg_restore: creating TABLE "public.event" pg_restore: creating TABLE "public.relation" pg_restore: creating TABLE "public.rule_chain" pg_restore: creating TABLE "public.rule_node" pg_restore: creating TABLE "public.tb_user" pg_restore: creating TABLE "public.tenant" pg_restore: creating TABLE "public.user_credentials" pg_restore: creating TABLE "public.widget_type" pg_restore: creating TABLE "public.widgets_bundle" pg_restore: processing data for table "public.admin_settings" pg_restore: processing data for table "public.alarm" pg_restore: processing data for table "public.asset" pg_restore: processing data for table "public.attribute_kv" pg_restore: processing data for table "public.audit_log" pg_restore: processing data for table "public.component_descriptor" pg_restore: processing data for table "public.customer" pg_restore: processing data for table "public.dashboard" pg_restore: processing data for table "public.databasechangelog" pg_restore: processing data for table "public.databasechangeloglock" pg_restore: processing data for table "public.device" pg_restore: processing data for table "public.device_credentials" pg_restore: processing data for table "public.entity_view" pg_restore: processing data for table "public.event" pg_restore: [custom archiver] could not read from input file: end of file ascentio@tmov04:/opt/thingsboard-db$ ``` ::: ## Plan 4: hacer el restore de un backup existente - **tareas**: - [ ] paso 1: scp de la VM vieja a la vm nueva - [ ] paso 2: restore de la db - [ ] paso 3: verificación - [ ] paso 4: iniciar la lectura de los perfiles - **Problemas encontrados**: - microcortes, por lo que la tarea se repitio un par de veces - se genero un job de jenkins para poder completar el restore - al verificar la restauración, se perdieron los accesos de usuarios - problemas hasta poder definir el job de jenkins ::: danger **Estado**: Este plan no se completo. ::: ## Plan 5: borrar datos de la db vieja - **tareas**: - [ ] paso 1: descargar backup del mes de julio - [ ] paso 1: hacer la restauracion - [ ] paso 2: eliminar los datos de las tablas `events`, `ts_kv` y `ts_kv_latest` - [ ] paso 3: comprimir el volumen y pos scp subirlo con el nombre de archivo: `epec_gc_wthitout_tk.zip` - [ ] paso 4: integrar el volumen - [ ] paso 5: verificación - [ ] paso 6: hacer una lectura de perfiles para chequear la integración - [ ] paso 7: hacer una lectura de billing completa - [ ] paso 6: iniciar la lectura de los perfiles - **Problemas encontrados**: - no hemos podido restaurar un backup, por lo que se podria hacer una copia del volumen - solucion: restauramos ls db en una maquina personal ::: warning **Estado**: pendiente ::: ## Plan 6: arrancar con una db "limpia" - **tareas**: - [X] paso 1: generar una db en ambiente de desarrollo - [X] paso 2: hacer un backup del ambiente de desarrollo y ponerlo en producción - [X] paso 3: verificación - [ ] paso 4: integrar con los gateways - [ ] paso 5: iniciar lectura de billing para que se carguen los medidores - [ ] paso 6: migrar las vistas - [ ] paso 7: iniciar lectura de perfiles desde el 1 de agosto - **Problemas encontrados**: - ::: danger **Estado**: descartado ya que funciono el plan 5 y al querer replicar, notamos que no tenemos el detalle para replicar la VM tal cual esta en produccion. Ejemplo: no tenemos la contraseña del cliente que usa el sistema de facturación ::: ## Plan 7: cambiar la IP usada para conectarse a la DB - **tareas**: - [ ] paso 1: actualizar la configuración - [ ] paso 2: verificación - [ ] paso 3: iniciar la lectura de los perfiles - **Problemas encontrados**: - se repite el error por el que intentamos mover la db :::spoiler ```bash tb-gateway_1 | 2020-08-27 12:35:34,306 [nioEventLoopGroup-2-57] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway_1 | 2020-08-27 12:35:37,304 [nioEventLoopGroup-2-59] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway_1 | 2020-08-27 12:35:40,339 [nioEventLoopGroup-2-62] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-04_1 | 2020-08-27 12:35:40,487 [nioEventLoopGroup-2-48] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-06_1 | 2020-08-27 12:35:41,471 [nioEventLoopGroup-2-18] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-04_1 | 2020-08-27 12:35:41,452 [nioEventLoopGroup-2-50] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-07_1 | 2020-08-27 12:35:42,946 [nioEventLoopGroup-2-69] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway_1 | 2020-08-27 12:35:43,252 [nioEventLoopGroup-2-64] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-03_1 | 2020-08-27 12:35:42,686 [nioEventLoopGroup-2-52] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-04_1 | 2020-08-27 12:35:42,839 [nioEventLoopGroup-2-54] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-05_1 | 2020-08-27 12:35:43,228 [nioEventLoopGroup-2-94] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-06_1 | 2020-08-27 12:35:43,551 [nioEventLoopGroup-2-20] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-04_1 | 2020-08-27 12:35:44,530 [nioEventLoopGroup-2-55] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway_1 | 2020-08-27 12:35:45,312 [nioEventLoopGroup-2-66] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-03_1 | 2020-08-27 12:35:45,520 [nioEventLoopGroup-2-54] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-07_1 | 2020-08-27 12:35:45,062 [nioEventLoopGroup-2-97] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-06_1 | 2020-08-27 12:35:46,588 [nioEventLoopGroup-2-22] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-04_1 | 2020-08-27 12:35:47,054 [nioEventLoopGroup-2-59] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway_1 | 2020-08-27 12:35:48,342 [nioEventLoopGroup-2-4] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-03_1 | 2020-08-27 12:35:47,645 [nioEventLoopGroup-2-57] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-05_1 | 2020-08-27 12:35:49,016 [nioEventLoopGroup-2-95] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway_1 | 2020-08-27 12:35:51,424 [nioEventLoopGroup-2-6] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-03_1 | 2020-08-27 12:35:51,574 [nioEventLoopGroup-2-60] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-04_1 | 2020-08-27 12:35:51,052 [nioEventLoopGroup-2-62] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-07_1 | 2020-08-27 12:35:51,638 [nioEventLoopGroup-2-10] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-06_1 | 2020-08-27 12:35:51,864 [nioEventLoopGroup-2-24] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-04_1 | 2020-08-27 12:35:54,052 [nioEventLoopGroup-2-64] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-05_1 | 2020-08-27 12:35:54,624 [nioEventLoopGroup-2-96] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway-07_1 | 2020-08-27 12:35:54,971 [nioEventLoopGroup-2-12] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. tb-gateway_1 | 2020-08-27 12:35:55,314 [nioEventLoopGroup-2-8] WARN o.t.g.s.gateway.MqttGatewayService - Lost connection to ThingsBoard. ``` ::: ::: danger **Estado**: genera el mismo error de conexión ::: ## Plan 8: hacer que "alguien" copie manualmente con un disco - **tareas**: - [ ] paso 1: coordinar con alguien de EPEC para que haga una copia a disco de la carpeta y la copie en el segundo server - [ ] paso 2: integrar la copia con el sistema - [ ] paso 3: verificación - [ ] paso 4: iniciar lectura de perfiles - **Problemas encontrados**: - dependemos de la capacidad del disco que tenga la gente de epec - ::: danger **Estado**: la copia se realizo via remota ::: ## Plan 9: pedir a epec que haga la copia del volumen entre servers ::: danger **Estado**: se descarta ::: ## Plan 10: hacer un restore selectivo de un backup ::: danger **Estado**: no se pudo realizar ya que el backup generado es un sql. ::: ## Plan 11: hacer la restauracion en una maquina - **tareas**: - [ ] paso 1: restaurar la db en una máquina - [ ] paso 2: hacer backup del volumen - [ ] paso 3: verificación - [ ] paso 4: borrar los datos de telemetria - [ ] paso 5: mover la DB a produccion - [ ] paso 6: verificacion en produccion - [ ] paso 7: iniciar lectura de perfiles desde el 1 de agosto - **Problemas encontrados**: - - ::: danger **Estado**: se realizo, pero la db restaurada se uso para poder ejecutar el plan 5, ya que al tener menos datos, la db comprimida ocupaba 500MB. ::: ## Plan 12: hacer snapshot de la VM ::: danger **Estado**: descartado ::: ## Notas <!-- Other important details discussed during the meeting can be entered here. --> sss