Hey there,
I seem to have what I would characterize as federation issues on my solo instance.
Symptoms are:
a variable number of posts/boosts by followed accounts never seem to reach my timeline; used to be one here and there, but recently I’ve been seemingly missing all posts from 3 heavy posters; it lasted for a while then somehow came back for one of them, although in a chaotic way
the follow mechanism seems to be broken: for 2 of the accounts mentioned above, I unfollowed and then followed again, thinking it might reset some broken link. Alas, I was no longer able to restore the follow! On my end, it says “Request sent!” forever, although on the followed account’s profile on their instance, my account appears in the followers list… I tested by creating an account on a big Mastodon instance and noticed the same behaviour: the remote instance says I’m a follower, but my local instance says “Request sent!” forever
Neither my instance nor my account seems to be blocked: I’m still successfully following accounts from those same instances.
Also, my instance seems to work correctly with other accounts; it’s not as if my timeline were empty.
Is there a way for me to investigate this any further? The instance logs and the browser console don’t seem to have any info on this, or at least I couldn’t find any.
Cheers!
I eventually updated the following_relationships table by hand, changing the status from “pending” (1) to “following” (2) for the affected profiles:
this (obviously) changed the visible status when checking the followed profiles on my instance to “followed” and added them to the list of people I follow in my profile
this also seemed to be enough to reinstate the follow from the feed perspective, because I started to see some posts and reposts from these people in my feed again – not all of them, though, for some reason
I noticed that the following_relationships table seemed to contain all activity related to trying to follow people, so maybe it got corrupted at some point, or there is a bug somewhere in the way it is managed; I don’t know.
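For anyone tempted to try the same workaround, here is a minimal sketch of what such an update could look like, written as a small Python helper that only builds the statement. The table name and the 1/2 status values come from the post above; the column names (state, follower_id, following_id) and the `%s` placeholder style (as used by psycopg) are assumptions, so check the real schema first (for example with `\d following_relationships` in psql) and back up the database before running anything.

```python
# Sketch only: builds a parameterized UPDATE, it does not touch any database.
# ASSUMPTION: the columns are named state, follower_id and following_id.
PENDING = 1    # "pending", as seen in the table
FOLLOWING = 2  # "following"


def build_fix_follow_query(follower_id, following_ids):
    """Return (sql, params) that flips pending follows to following.

    Only rows for the given follower and the listed followed accounts are
    targeted, and only if they are still in the pending state.
    """
    if not following_ids:
        raise ValueError("need at least one followed account id")
    placeholders = ", ".join(["%s"] * len(following_ids))
    sql = (
        "UPDATE following_relationships "
        "SET state = %s "
        "WHERE follower_id = %s "
        "AND state = %s "
        f"AND following_id IN ({placeholders})"
    )
    params = (FOLLOWING, follower_id, PENDING, *following_ids)
    return sql, params


# The ids below are made up for illustration.
sql, params = build_fix_follow_query("my-user-id", ["account-a", "account-b"])
print(sql)
print(params)
```

Restricting the WHERE clause to rows that are still pending keeps the change narrow, which matters given that the table seems to hold every follow attempt.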
Now, the fetch part is still chaotic but improving(?): I seem to receive recent activity in a timely manner in most cases, although from time to time I get pretty old posts/reposts (several days to several months) for some reason (might also be related to the DB maintenance tasks I ran shortly after my first post).
OK so I might have found the issue, thanks to the new Oban dashboard…
I opened it for the first time today and noticed in the Jobs panel that there were dozens of jobs started more than a year ago that would never stop.
After first increasing the queue limits that were exceeded, I realized that I could cancel the jobs altogether directly through the web interface, and that I should: these jobs were outdated fetch requests that most likely clogged up the job queue, which would explain my previous problems.
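Without the dashboard, the same check can be roughed out by hand. The sketch below assumes the job rows have been exported from Oban's oban_jobs table into Python dicts carrying its state and attempted_at columns; the one-day threshold and the sample rows are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def find_stuck_jobs(jobs, now, max_age=timedelta(days=1)):
    """Return jobs still in the 'executing' state that started too long ago.

    jobs: iterable of dicts with at least 'state' and 'attempted_at' keys
          (ASSUMPTION: mirrors the oban_jobs columns of the same names).
    now: timezone-aware datetime used as the reference point.
    max_age: hypothetical cut-off; anything executing longer is reported.
    """
    return [
        job
        for job in jobs
        if job["state"] == "executing"
        and job["attempted_at"] is not None
        and now - job["attempted_at"] > max_age
    ]


# Made-up sample rows: one job stuck for over a year, one recent, one finished.
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
sample = [
    {"id": 1, "state": "executing", "attempted_at": now - timedelta(days=400)},
    {"id": 2, "state": "executing", "attempted_at": now - timedelta(minutes=5)},
    {"id": 3, "state": "completed", "attempted_at": now - timedelta(days=400)},
]
print([job["id"] for job in find_stuck_jobs(sample, now)])  # prints [1]
```

This only lists candidates; cancelling them is still best done through the dashboard, as described above.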
I’m still monitoring incoming messages to confirm updates are now back to normal.
In any case, I don’t think pleroma_ctl has any way of figuring out what jobs are locked/pending, so the Oban dashboard really is a nice feature!
For some reason a migration hadn’t been applied properly, and jobs wouldn’t run. The most notable symptom was that old jobs weren’t removed, so I had a whole bunch in state ‘completed’, ‘cancelled’ and ‘discarded’ that should have been removed.
Federation worked partly, but retries didn’t happen. It took me much longer to realise this, though, so I’m not sure if it’s the same problem as you are having.
In my case, the database was missing the oban_peers table. I forced the migration to run again, although in a somewhat hacky way.
Ah! Thanks for your insight, but I eventually figured out where the issue was and it’s far simpler (and stupider) than that.
I happened to have set up a borked fail2ban rule that would ban most fediverse servers over time -_-"
Maybe one thing that could be improved on the Akkoma side is that even debug logging won’t show you the HTTP status and message the remote server receives (and sends to your server) when it tries to confirm the subscription.
I set up a Mitra instance, which does show this, and this is what eventually helped me get to the root cause, as the 401 error message was clear enough for me to understand that remote servers were somehow blocked on my side (considering I didn’t know the process flow behind AP subscription).
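In case it helps someone hitting the same wall, here is a minimal sketch of a sanity check for this situation: compare the IPs fail2ban has banned (for example, pasted from the output of `fail2ban-client status <jail>`) against the addresses that the instances you follow resolve to. The function names, hostnames and IPs below are made up for illustration. One caveat: larger instances may deliver from addresses other than the ones their hostname resolves to, so an empty result does not prove that nothing is banned.

```python
import socket


def resolve_ips(hostname):
    """Return the set of IP addresses a hostname currently resolves to."""
    try:
        infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def banned_instances(banned_ips, instance_hosts, resolver=resolve_ips):
    """Map each instance hostname to the banned IPs it resolves to, if any."""
    banned = set(banned_ips)
    hits = {}
    for host in instance_hosts:
        overlap = resolver(host) & banned
        if overlap:
            hits[host] = sorted(overlap)
    return hits


# Offline demonstration with a fake resolver and documentation-range IPs.
fake_dns = {
    "a.example": {"203.0.113.7"},
    "b.example": {"198.51.100.9"},
}
result = banned_instances(
    {"203.0.113.7"},
    ["a.example", "b.example"],
    resolver=lambda host: fake_dns.get(host, set()),
)
print(result)  # prints {'a.example': ['203.0.113.7']}
```

For a real check, drop the `resolver` argument so the default DNS lookup is used, and pass the hostnames of the instances whose posts went missing.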