Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debug abortion of automatic backups #25

Open
simon-wacker opened this issue Jan 4, 2022 · 0 comments
Open

Debug abortion of automatic backups #25

simon-wacker opened this issue Jan 4, 2022 · 0 comments
Labels
bug Something isn't working ISE Fraunhofer Institute for Solar Energy Systems ISE production

Comments

@simon-wacker
Copy link
Contributor

After running make backup-database multiple times in the directory /app/machine aborting at least once during the process, the database service became unhealthy, exited and attempted various restarts. The logs read

database_1  | PostgreSQL Database directory appears to contain a database; Skipping initialization
database_1  | 2022-01-04 16:19:08.898 UTC [1] LOG:  starting PostgreSQL 13.4 (Debian 13.4-4.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
database_1  | 2022-01-04 16:19:08.898 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
database_1  | 2022-01-04 16:19:08.898 UTC [1] LOG:  listening on IPv6 address "::", port 5432
database_1  | 2022-01-04 16:19:08.902 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database_1  | 2022-01-04 16:19:08.907 UTC [25] LOG:  database system was shut down at 2022-01-04 15:57:02 UTC
database_1  | 2022-01-04 16:19:08.907 UTC [25] LOG:  invalid record length at 0/16CC8D8: wanted 24, got 0
database_1  | 2022-01-04 16:19:08.907 UTC [25] LOG:  invalid primary checkpoint record
database_1  | 2022-01-04 16:19:08.907 UTC [25] PANIC:  could not locate a valid checkpoint record
database_1  | 2022-01-04 16:19:09.263 UTC [1] LOG:  startup process (PID 25) was terminated by signal 6: Aborted
database_1  | 2022-01-04 16:19:09.267 UTC [1] LOG:  aborting startup due to startup process failure
database_1  | 2022-01-04 16:19:09.283 UTC [1] LOG:  database system is shut down

and

database_1  | PostgreSQL Database directory appears to contain a database; Skipping initialization
database_1  | 2022-01-04 16:33:28.642 UTC [1] LOG:  starting PostgreSQL 13.5 (Debian 13.5-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
database_1  | 2022-01-04 16:33:28.643 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
database_1  | 2022-01-04 16:33:28.643 UTC [1] LOG:  listening on IPv6 address "::", port 5432
database_1  | 2022-01-04 16:33:28.663 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database_1  | 2022-01-04 16:33:28.670 UTC [25] LOG:  database system was shut down at 2022-01-04 16:32:47 UTC
database_1  | 2022-01-04 16:33:28.744 UTC [1] LOG:  database system is ready to accept connections
database_1  | 2022-01-04 16:33:40.877 UTC [1] LOG:  received fast shutdown request
database_1  | 2022-01-04 16:33:40.879 UTC [1] LOG:  aborting any active transactions
database_1  | 2022-01-04 16:33:40.883 UTC [1] LOG:  background worker "logical replication launcher" (PID 31) exited with exit code 1
database_1  | 2022-01-04 16:33:40.885 UTC [26] LOG:  shutting down
database_1  | 2022-01-04 16:33:40.905 UTC [1] LOG:  database system is shut down
...
database_1  | 2022-01-04 15:53:51.648 UTC [1] LOG:  database system is shut down
database_1  | PostgreSQL Database directory appears to contain a database; Skipping initialization
database_1  | 2022-01-04 15:53:52.464 UTC [1] LOG:  starting PostgreSQL 13.4 (Debian 13.4-4.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
database_1  | 2022-01-04 15:53:52.465 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
database_1  | 2022-01-04 15:53:52.465 UTC [1] LOG:  listening on IPv6 address "::", port 5432
database_1  | 2022-01-04 15:53:52.468 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database_1  | 2022-01-04 15:53:52.474 UTC [25] LOG:  database system was shut down at 2022-01-04 15:53:41 UTC
database_1  | 2022-01-04 15:53:52.479 UTC [1] LOG:  database system is ready to accept connections
database_1  | 2022-01-04 15:56:52.191 UTC [1] LOG:  could not open file "postmaster.pid": No such file or directory
database_1  | 2022-01-04 15:56:52.191 UTC [1] LOG:  performing immediate shutdown because data directory lock file is invalid
database_1  | 2022-01-04 15:56:52.191 UTC [1] LOG:  received immediate shutdown request
database_1  | 2022-01-04 15:56:52.191 UTC [1] LOG:  could not open file "postmaster.pid": No such file or directory
database_1  | 2022-01-04 15:56:52.193 UTC [29] WARNING:  terminating connection because of crash of another server process
database_1  | 2022-01-04 15:56:52.193 UTC [29] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
database_1  | 2022-01-04 15:56:52.193 UTC [29] HINT:  In a moment you should be able to reconnect to the database and repeat your command.

How did that happen?

To fix the problem I dropped into a shell in a new database container by running

docker-compose --file docker-compose.production.yml --project-name database_production run database bash

I changed to directory /usr/lib/postgresql/13/bin and I became the user `postgres by running

cd /usr/lib/postgresql/13/bin
su postgres

I then reset the write-ahead log by running

./pg_resetwal /var/lib/postgresql/data/

This may leave the database in an inconsistent state. For details see the documentation of pg_resetwal. However, it makes it possible to start the server again. You can the restore the database from the latest backup.

@simon-wacker simon-wacker added bug Something isn't working production labels Jan 4, 2022
@christoph-maurer christoph-maurer moved this to 📋 Backlog in EQWIN-P Jan 17, 2023
@simon-wacker simon-wacker added the ISE Fraunhofer Institute for Solar Energy Systems ISE label Jan 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working ISE Fraunhofer Institute for Solar Energy Systems ISE production
Projects
Status: 📋 Backlog
Development

No branches or pull requests

1 participant