Akkoma server going down daily

My self-hosted server has started to go down daily. If Akkoma is fine, it has to be something else, but how do I find out?

I see this in /var/log/nginx/error.log:

2024/08/15 01:18:14 [crit] 925#925: *46468 SSL_do_handshake() failed (SSL: error:0A00006C:SSL routines::bad key share) while SSL handshaking, client: 40.83.135.136, server: 0.0.0.0:443
2024/08/15 01:41:58 [crit] 923#923: *47580 SSL_do_handshake() failed (SSL: error:0A00006C:SSL routines::bad key share) while SSL handshaking, client: 212.102.40.218, server: 0.0.0.0:443
2024/08/15 01:55:10 [error] 925#925: *48112 unexpected range in slice response: 0-1048576 while reading response header from upstream, client: 66.249.70.174, server: media.example.com, request: "GET /proxy/rMN3qUOiAl3ROGTGwz4U9mRMhaY/aHR0cHM6Ly9pbWFnZS5ub3N0ci5idWlsZC9mZWFlODQzNTI4YzMwMzQzOTY2ZWY5ZWQyOTZmMGNiN2E5ZTNlNTY1YjUzZWM0YjFhNjdmYmU3ZDQzZDY2ZWIxLmpwZw/feae843528c30343966ef9ed296f0cb7a9e3e565b53ec4b1a67fbe7d43d66eb1.jpg HTTP/1.1", upstream: "http://127.0.0.1:4000/proxy/rMN3qUOiAl3ROGTGwz4U9mRMhaY/aHR0cHM6Ly9pbWFnZS5ub3N0ci5idWlsZC9mZWFlODQzNTI4YzMwMzQzOTY2ZWY5ZWQyOTZmMGNiN2E5ZTNlNTY1YjUzZWM0YjFhNjdmYmU3ZDQzZDY2ZWIxLmpwZw/feae843528c30343966ef9ed296f0cb7a9e3e565b53ec4b1a67fbe7d43d66eb1.jpg", host: "media.example.com"
2024/08/15 01:57:00 [crit] 924#924: *48189 SSL_do_handshake() failed (SSL: error:0A00006C:SSL routines::bad key share) while SSL handshaking, client: 167.172.34.93, server: 0.0.0.0:443
2024/08/15 04:11:21 [error] 924#924: OCSP responder timed out (110: Connection timed out) while requesting certificate status, responder: e5.o.lencr.org, peer: 184.51.252.197:80, certificate: "/etc/letsencrypt/live/subdomain.example.com/fullchain.pem"
2024/08/15 04:11:21 [error] 923#923: OCSP responder timed out (110: Connection timed out) while requesting certificate status, responder: e5.o.lencr.org, peer: 184.51.252.197:80, certificate: "/etc/letsencrypt/live/subdomain.example.com/fullchain.pem"
2024/08/15 03:17:14 [error] 957#957: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 178.63.49.103, server: subdomain.example.com, request: "POST /inbox HTTP/1.1", upstream: "http://127.0.0.1:4000/inbox", host: "subdomain.example.com"

I am 100% guessing here, but that looks like you have the media proxy enabled, and it's trying to fetch remote media through the proxy, but the other server is having problems with its HTTPS certificates (or maybe the media just doesn't exist (any more)?). Akkoma shouldn't go down because of that, though :thinking:

Can you reproduce this? Like if you go to <your_instance>/proxy/rMN3qUOiAl3ROGTGwz4U9mRMhaY/aHR0cHM6Ly9pbWFnZS5ub3N0ci5idWlsZC9mZWFlODQzNTI4YzMwMzQzOTY2ZWY5ZWQyOTZmMGNiN2E5ZTNlNTY1YjUzZWM0YjFhNjdmYmU3ZDQzZDY2ZWIxLmpwZw/feae843528c30343966ef9ed296f0cb7a9e3e565b53ec4b1a67fbe7d43d66eb1.jpg, do you get this error? And, if so, does Akkoma go down?
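
If you'd rather test from a shell, something like this prints just the HTTP status code (fill in your own domain; the path is the one from your log):

curl -s -o /dev/null -w "%{http_code}\n" "https://<your_instance>/proxy/rMN3qUOiAl3ROGTGwz4U9mRMhaY/aHR0cHM6Ly9pbWFnZS5ub3N0ci5idWlsZC9mZWFlODQzNTI4YzMwMzQzOTY2ZWY5ZWQyOTZmMGNiN2E5ZTNlNTY1YjUzZWM0YjFhNjdmYmU3ZDQzZDY2ZWIxLmpwZw/feae843528c30343966ef9ed296f0cb7a9e3e565b53ec4b1a67fbe7d43d66eb1.jpg"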

You can also check the Akkoma logs after it goes down. When using systemd, that should be journalctl -u akkoma. Or, if you want to follow new log lines in real time, journalctl -fu akkoma.
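
Since your box needs a hard reboot, the interesting lines are usually in the previous boot's journal, e.g.:

journalctl -u akkoma -b -1 -r    # -b -1 = previous boot, -r = newest first

That only works if journald storage is persistent; journalctl --list-boots shows whether older boots are kept at all.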

I only get a 404 Not Found, and Akkoma doesn't go down from that.

When the site is down I get a "The connection has timed out" error and I have to unplug the server and restart it; there is no SSH to the server when the site is down.

Wait, I'm confused.

So you've given a log for something that just 404s, and not for the totally separate issue that takes your server down?

From your description it sounds like you're running out of memory.

please be patient, I am running Akkoma as a non-techie

Normally the CPU is about 90% idle, but there are internal server errors, and after such an error, when I check the server, all the memory plus the swap file is in use. That alone doesn't kill the server, though; I can watch it happening.

When the server goes down I can't see anything: no SSH, and I have to unplug the server and restart it.

The server crash happens at the same time every day, during the night in my time zone.

If it is not Akkoma, what could it be?

So you have a time frame where you can see the resources being used, right? You can use top to see what processes are running and what resources they take. If you are on a Debian-based distro (e.g. Debian or Ubuntu; possibly other distros too, but I'm unsure), you can press shift+M to sort by memory usage and shift+P to sort by CPU usage. This should tell you what process is using everything up, and whether it's Akkoma or not.
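
If top is fiddly, a one-shot alternative along these lines works too:

ps aux --sort=-%mem | head -n 10    # ten biggest memory users
ps aux --sort=-%cpu | head -n 10    # ten biggest CPU users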

I had a performance monitor (Glances) on the server but I removed it; there are simple commands that do the same thing: free -h and mpstat -P ALL and cat /sys/class/thermal/thermal_zone*/temp
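
One note on the last command: the thermal_zone files report millidegrees Celsius, so a small conversion makes them readable, e.g.:

awk '{ printf "%.1f C\n", $1/1000 }' /sys/class/thermal/thermal_zone*/temp    # one line per zone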

There are days without a server crash; hopefully the problem will go away by itself.

The problem is still there. Most days the server goes down during the night: no SSH, and it has to be unplugged and restarted. It is a Raspberry Pi 4 with 4GB of memory.
What would you do to find out why this happens?

root@server:/home/debian# free -h
               total        used        free      shared  buff/cache   available
Mem:           3.7Gi       851Mi       1.7Gi       173Mi       1.4Gi       2.9Gi
Swap:          199Mi          0B       199Mi

Check dmesg before rebooting, or wherever kernel messages are logged for prior boots (depends on the distro; e.g. /var/log/kern.log*), to see whether you ran out of memory and some daemons were OOM-killed.
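
For example, to search the previous boot for OOM kills (again assuming persistent journald storage):

journalctl -k -b -1 | grep -iE 'out of memory|oom'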

If this doesn't help, perhaps bring back the performance monitor you dropped before to get more insight into what's happening before Akkoma and SSH go down. Even if it's related to Akkoma, without more info it's impossible to help you.
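
A crude stand-in for a full monitor is a loop that snapshots resource usage to disk every minute, so the data survives the freeze; a minimal sketch (the log path is just an example):

while true; do
  { date; free -h; ps aux --sort=-%mem | head -n 6; echo; } >> /var/log/resource-snapshots.log
  sleep 60
done

Run it in tmux/screen or as a systemd service; after the next crash, the end of that file shows what was eating memory.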

I see this with the command journalctl -k -b -2 -r:

Aug 17 20:30:54 server kernel: Out of memory: Killed process 939 (beam.smp) total-vm:6556396kB, anon-rss:3413188kB, file-rss:5376kB>
Aug 17 20:30:54 server kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,>

I don't have /var/log/kern.log on my Debian 12.

why should the whole server go down if Akkoma goes down?

my problem is similar to this:

“Debian 12 system crash/halt/shutdown at 3am (and random crashes)”

“Hi, around 10 days ago I updated my Debian server from 12.5 to 12.6 and since that the system crashes/halts every day at 3am, and sometimes just randomly crashes/halts, and every time this happens I have to hard reset the system.”

https://www.reddit.com/r/debian/comments/1dxx968/debian_12_system_crashhaltshutdown_at_3am_and/?rdt=61915

You previously mentioned no SSH, no Akkoma. If you're running out of memory and processes get killed, it's possible that not only Akkoma but also SSH got killed. If memory is the issue, the real solution is to get more memory or move some services to a different system. You may be able to alleviate some symptoms with cgroup limits (to set expectations for how much memory a process should be able to use, and to have more control over when what gets killed), zram/zswap, and swap, but those only plaster over the lack of resources, they don't solve it.
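
For the cgroup route, with systemd that can be a simple drop-in for the Akkoma unit (assuming the unit is called akkoma and you're on cgroup v2, the Debian 12 default; 2G is just an example limit). Run systemctl edit akkoma and add:

[Service]
MemoryMax=2G

then systemctl restart akkoma. With a limit like that the kernel OOM-kills only Akkoma when it crosses it, instead of the whole system thrashing until sshd dies too.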

If your whole system freezes, it cannot be related to Akkoma, and if it's indeed the same as the thread you linked, it's some kernel and/or watchdog bug. We cannot help you with bugs in the core system or any other non-Akkoma components.

This is what happens before the server goes down:

[screenshot: error-1]

then the memory use and CPU shoot up and the server goes down:

[screenshot: error-2]

This was yesterday; after the relaunch it looks different in Monitorix.

Is there a minimum RAM requirement for Akkoma now that exceeds the 4GB that I have?

I use Nostr and things, maybe that is the problem?

zram/zswap and swap didn't help; now I have Monit restart Akkoma when a certain amount of RAM has been used. The server doesn't go down anymore.
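
The Monit rule is roughly this shape (a sketch, not my exact config; the process match, systemctl paths, and threshold depend on your setup):

check process akkoma matching "beam.smp"
  start program = "/usr/bin/systemctl start akkoma"
  stop program = "/usr/bin/systemctl stop akkoma"
  if totalmem > 2500 MB for 2 cycles then restart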

I don't know how much memory Akkoma needs; Pleroma used to run with 2GB of RAM.

Yeah, I used to run on 1GB RAM and 512MB swap, and I set swappiness really low so it cleared; otherwise it would just fill up and bog down. At 2GB I never had that issue, but I believe Elixir apps grow to fill (reserve) available memory.
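
For reference, swappiness can be set like this (10 is just an example value; the kernel default is 60):

sysctl vm.swappiness                                             # show the current value
echo 'vm.swappiness=10' | sudo tee /etc/sysctl.d/99-swappiness.conf
sudo sysctl --system                                             # apply settings from /etc/sysctl.d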