Cannot fetch own profile statuses while logged-in

While logged in, if I browse my profile, it causes a request to be made to /api/v1/accounts/1/statuses?with_muted=true&exclude_replies=1&limit=20, which ends up failing with a 502 because Akkoma takes too long to reply (I raised a few proxy_*_timeout values in nginx to 2400, same problem).

However, if I make that call in a private window in Firefox, or with curl, it works and returns in 2-3 s!
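
Presumably the only difference between the two is the Authorization header the logged-in front-end sends, so roughly something like this (token redacted, instance is mine):

# anonymous request, comes back in a few seconds
curl -s -o /dev/null -w "%{http_code} %{time_total}s\n" "https://pleroma.otter.sh/api/v1/accounts/1/statuses?with_muted=true&exclude_replies=1&limit=20"

# same request with my access token, like the logged-in front-end does; this is the one that ends in a 502
curl -s -o /dev/null -w "%{http_code} %{time_total}s\n" -H "Authorization: Bearer REDACTED" "https://pleroma.otter.sh/api/v1/accounts/1/statuses?with_muted=true&exclude_replies=1&limit=20"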

When the request fails while logged in, I get the following in my PostgreSQL log:

2022-12-05 20:17:32.181 CET [4137675] pleroma@pleroma ERROR:  canceling statement due to user request
2022-12-05 20:17:32.181 CET [4137675] pleroma@pleroma CONTEXT:  SQL statement "SELECT array_agg(following.follower_address)                           FROM following_relationships
          JOIN users ON users.id = following_relationships.follower_id
          JOIN users AS following ON following.id = following_relationships.following_id
          WHERE users.ap_id = actor"
        PL/pgSQL function thread_visibility(character varying,character varying,character varying) line 11 at SQL statement
2022-12-05 20:17:32.181 CET [4137675] pleroma@pleroma STATEMENT:  SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", b2."id", b2."user_id", b2."activity_id", b2."inserted_at", b2."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at", NOT (t3."id" IS NULL) FROM "activities" AS a0 INNER JOIN "objects" AS o1 ON (o1."data"->>'id') = COALESCE(a0."data"->'object'->>'id', a0."data"->>'object') LEFT OUTER JOIN "bookmarks" AS b2 ON (b2."user_id" = $1) AND (b2."activity_id" = a0."id") LEFT OUTER JOIN "thread_mutes" AS t3 ON (t3."user_id" = $2) AND (t3."context" = a0."data"->>'context') INNER JOIN LATERAL (SELECT is_active from users WHERE ap_id = a0."actor" AND is_active = TRUE) AS f4 ON TRUE WHERE ((($3 && a0."recipients")) OR (a0."actor" = $4)) AND (o1."data"->>'inReplyTo' is null) AND (a0."actor" = $5) AND (a0."data"->>'type' = ANY($6)) AND (not (a0."actor" = ANY($7))) AND (((not (a0."recipients" && $8)) or a0."actor" = $9)) AND ((recipients_contain_blocked_domains(a0."recipients", $10) = false) or a0."actor" = $11) AND (not (a0."data"->>'type' = 'Announce' and a0."data"->'to' ?| $12)) AND ((not (split_part(a0."actor", '/', 3) = ANY($13))) or a0."actor" = ANY($14)) AND ((not (split_part(o1."data"->>'actor', '/', 3) = ANY($15))) or (o1."data"->>'actor') = ANY($16)) AND (thread_visibility($17, (a0."data")->>'id', $18) = true) AND (not ( a0."data"->>'type' = 'Announce' and a0."actor" = ANY($19))) AND (not(o1."data"->>'type' = 'Answer')) AND (NOT (a0."actor" = ANY($20))) ORDER BY a0."id" desc nulls last LIMIT $21
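
For completeness, the statement being cancelled lives inside the thread_visibility() PL/pgSQL helper mentioned in the log; its definition can be dumped with something like this (the database here is named pleroma):

# print the source of the thread_visibility() function referenced in the log above
psql -d pleroma -c "SELECT pg_get_functiondef('thread_visibility(varchar,varchar,varchar)'::regprocedure);"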

Not sure if it's useful, but in my Akkoma config for Postgres I have:

pool_size: 50,
queue_target: 5000,
timeout: 120000,
prepare: :named,
parameters: [
  plan_cache_mode: "force_custom_plan"
]

The first two are left over from the Pleroma era and its DB issues; the timeout was added in the hope of “fixing” the slow response, without success. The whole DB was reimported a few days ago, so the indexes are “fresh”.

Any ideas on how to debug this? I am on the latest stable, and this was previously a Pleroma instance I’ve been running since 2018, to give you an idea of the mess the database contains.

going to need a bunch more info

what pleroma version did you migrate from?

what postgres version?

also turn the postgres pool down to 10, that shouldn’t be increased

It was Pleroma 2.4.2, and PostgreSQL 15.1.

OK for the pool, I’ve restarted with it set to 10.

i have a command for you

can you run

wget https://akkoma.dev/AkkomaGang/doctor/raw/branch/main/schema_diff.sh && bash schema_diff.sh DATABASE_NAME

and report the output?
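
(it basically diffs the migration versions recorded in your schema_migrations table against a reference list - if you want to eyeball them yourself, something like this shows what it is looking at:)

# list the migration versions ecto has recorded as applied
psql -d DATABASE_NAME -c "SELECT version FROM schema_migrations ORDER BY version;"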

done, result:

DIFF FOLLOWS
54a55
> 20180829082743
74a76
> 20190128181211
100a103
> 20190510135645
113a117
> 20190711042020
120a125
> 20190824195028
161a167
> 20191104143558

interesting, you’ve got a bunch of migrations which i can find no reference to in either pleroma or akkoma - i bet they cause the issue

might have to dive deeper into a schema diff

can you run

pg_dump -s DBNAME | sed "s|; Owner: .*||" 

and attach the output? (should be a whole bunch of sql)
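
(for reference, on my end i’ll just diff it against a schema dumped the same way from a known-good instance, roughly like this:)

# dump both schemas with the Owner lines stripped, then diff them
pg_dump -s REFERENCE_DB | sed "s|; Owner: .*||" > reference.sql
pg_dump -s DBNAME | sed "s|; Owner: .*||" > yours.sql
diff reference.sql yours.sql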

I was able to find these ones:

  • 20190510135645 git.pleroma.social/pleroma/pleroma/-/commit/8d4e0342e1b5ebbe486dc538e3c8fe81d53220e6
  • 20190711042020 & 20191104143558: git.pleroma.social/pleroma/pleroma/-/issues/2354#note_80517 references them as “file not found”

as for the pg_dump, here it is: https://dev.otter.sh/-/snippets/8/raw/main/snippetfile1.txt

ok doing a diff between that schema and the one i have, the difference is rum

now i’ve absolutely no guarantee that this will work, none whatsoever, but you’ve got a backup so what’s the worst that can happen? (slight pained expression)

want to be a guinea pig?

config :pleroma, :database, rum_enabled: false

then

MIX_ENV=prod mix ecto.rollback --migrations-path priv/repo/optional_migrations/rum_indexing 

and see how that goes

rum is typically slower, so maybe it’s noticeably slower in this case

Yeah, I am able to make backups, that’s not an issue.
I’m using a docker build, so I will have to look at how to do that rollback properly-ish.
I will do that later this week, as the previous restore I had to do was… not quick (the DB is an 18 GB monster).
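
Roughly what I expect that rollback to look like from the docker side (untested sketch; the container name is just a placeholder for whatever my setup uses):

# sketch only: run the optional rum rollback inside the running Akkoma container
# ("akkoma" is a placeholder container name; assumes mix and the repo are available in the image)
docker exec -e MIX_ENV=prod -it akkoma mix ecto.rollback --migrations-path priv/repo/optional_migrations/rum_indexing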

It’s done, but same issue.

Difference between the two schemas is now:

169,184d168
< -- Name: objects_fts_update(); Type: FUNCTION; Schema: public
< --
<
< CREATE FUNCTION public.objects_fts_update() RETURNS trigger
<     LANGUAGE plpgsql
<     AS $$
<     begin
<       new.fts_content := to_tsvector('english', new.data->>'content');
<       return new;
<     end
<     $$;
<
<
< ALTER FUNCTION public.objects_fts_update() OWNER TO pleroma;
<
< --
1428,1429c1412
<     updated_at timestamp without time zone NOT NULL,
<     fts_content tsvector
---
>     updated_at timestamp without time zone NOT NULL
2904c2887
< CREATE INDEX objects_fts ON public.objects USING rum (fts_content public.rum_tsvector_addon_ops, inserted_at) WITH (attach=inserted_at, "to"=fts_content);
---
> CREATE INDEX objects_fts ON public.objects USING gin (to_tsvector('english'::regconfig, (data ->> 'content'::text)));
3174,3180d3156
< -- Name: objects tsvectorupdate; Type: TRIGGER; Schema: public
< --
<
< CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE ON public.objects FOR EACH ROW EXECUTE FUNCTION public.objects_fts_update();
<
<
< --

ok, not rum then

further SQL debugging

you’re from-source right?

any chance you could switch to the diagnostics branch (you’ll have to ecto migrate on that)

then you can run

mix pleroma.diagnostics user_timeline your_username your_username

to get query plan

Yes I’m building a docker image from the repository.

I will try tomorrow and report back, thanks!

I finally have some time to do that, but there is no diagnostics branch I can fetch on AkkomaGang/akkoma :frowning:

Update: it seems to have been merged into develop, and then into stable.

I’m updating stable and will test.

OK, I ran the diagnostics command, and it is extremely fast… meanwhile, the endpoint mentioned in the first post still times out when logged in.

Here is the final explain analyze:

Limit  (cost=6253.18..6699.77 rows=20 width=1702) (actual time=472.605..657.366 rows=20 loops=1)
  ->  Nested Loop Left Join  (cost=6253.18..12728.79 rows=290 width=1702) (actual time=472.605..657.355 rows=20 loops=1)
        Join Filter: ((t3.context)::text = (a0.data ->> 'context'::text))
        Rows Removed by Join Filter: 60
        ->  Nested Loop Left Join  (cost=6253.18..12722.68 rows=290 width=1701) (actual time=472.592..657.294 rows=20 loops=1)
              ->  Nested Loop  (cost=6253.03..12629.47 rows=290 width=1645) (actual time=472.587..657.231 rows=20 loops=1)
                    ->  Nested Loop  (cost=0.84..16.89 rows=1 width=76) (actual time=0.034..0.035 rows=1 loops=1)
                          ->  Index Scan using users_ap_id_index on users  (cost=0.42..8.44 rows=1 width=38) (actual time=0.022..0.022 rows=1 loops=1)
                                Index Cond: ((ap_id)::text = 'https://pleroma.otter.sh/users/dashie'::text)
                                Filter: is_active
                          ->  Index Scan using users_ap_id_index on users u5  (cost=0.42..8.44 rows=1 width=38) (actual time=0.012..0.012 rows=1 loops=1)
                                Index Cond: ((ap_id)::text = 'https://pleroma.otter.sh/users/dashie'::text)
                                Filter: (NOT invisible)
                    ->  Nested Loop  (cost=6252.19..12609.69 rows=290 width=1645) (actual time=472.550..657.184 rows=20 loops=1)
                          ->  Bitmap Heap Scan on activities a0  (cost=6251.63..10110.73 rows=291 width=778) (actual time=472.527..656.745 rows=20 loops=1)
                                Recheck Cond: (((data ->> 'type'::text) = ANY ('{Create,Announce}'::text[])) AND ((actor)::text = 'https://pleroma.otter.sh/users/dashie'::text))
                                Rows Removed by Index Recheck: 72
                                Filter: thread_visibility('https://pleroma.otter.sh/users/dashie'::character varying, ((data ->> 'id'::text))::character varying, 'https://pleroma.otter.sh/#Public'::character varying)
                                Rows Removed by Filter: 1
                                Heap Blocks: exact=13 lossy=9
                                ->  BitmapAnd  (cost=6251.63..6251.63 rows=928 width=0) (actual time=440.971..440.971 rows=0 loops=1)
                                      ->  Bitmap Index Scan on activities_context_index  (cost=0.00..1894.35 rows=59898 width=0) (actual time=409.545..409.545 rows=4861403 loops=1)
                                            Index Cond: ((data ->> 'type'::text) = ANY ('{Create,Announce}'::text[]))
                                      ->  Bitmap Index Scan on "activities_actor_id_DESC_NULLS_LAST_index"  (cost=0.00..4356.88 rows=92843 width=0) (actual time=11.588..11.588 rows=86056 loops=1)
                                            Index Cond: ((actor)::text = 'https://pleroma.otter.sh/users/dashie'::text)
                          ->  Index Scan using objects_unique_apid_index on objects o1  (cost=0.56..8.59 rows=1 width=867) (actual time=0.018..0.018 rows=1 loops=20)
                                Index Cond: ((data ->> 'id'::text) = COALESCE(((a0.data -> 'object'::text) ->> 'id'::text), (a0.data ->> 'object'::text)))
                                Filter: ((data ->> 'type'::text) <> 'Answer'::text)
              ->  Index Scan using bookmarks_user_id_activity_id_index on bookmarks b2  (cost=0.15..0.32 rows=1 width=56) (actual time=0.002..0.002 rows=0 loops=20)
                    Index Cond: ((user_id = '00000000-0000-0000-0000-000000000001'::uuid) AND (activity_id = a0.id))
        ->  Materialize  (cost=0.00..1.04 rows=1 width=524) (actual time=0.001..0.001 rows=3 loops=20)
              ->  Seq Scan on thread_mutes t3  (cost=0.00..1.04 rows=1 width=524) (actual time=0.005..0.006 rows=3 loops=1)
                    Filter: (user_id = '00000000-0000-0000-0000-000000000001'::uuid)
Planning Time: 3.918 ms
Execution Time: 659.274 ms

ok there is one other possibility

you could try disabling rich media

https://docs.akkoma.dev/stable/configuration/cheatsheet/#rich_media-consumer

if the db is returning happily this might be the issue

I turned it off in AdminFE, saved, and even restarted the container; no luck, still hitting that cursed issue.

you could try finding out if it’s a specific status messing with you

take the endpoint `/api/v1/accounts/1/statuses?with_muted=true&exclude_replies=1&limit=1` and gradually increase the limit until it errors, then we’ll know it’s a specific status
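
something like this if you want to script it (TOKEN being an access token for your account):

# walk the limit up until the request starts failing
for limit in 1 2 5 10 20; do
  printf 'limit=%s: ' "$limit"
  curl -s -o /dev/null -w '%{http_code} in %{time_total}s\n' -H "Authorization: Bearer $TOKEN" "https://pleroma.otter.sh/api/v1/accounts/1/statuses?with_muted=true&exclude_replies=1&limit=$limit"
done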

it errors even with limit=1 :smiling_face_with_tear:

Did more tests: this happens with any user whose timeline I try to view (tried my local user and a remote user). Logged in, nothing loads (except pinned posts, apparently? but that might come from another endpoint); logged out, no issues.

I have placed a few Logger.debug() calls around, and the last one that fires before the timeout is in lib/pleroma/web/activity_pub/activity_pub.ex, in the function def fetch_activities(recipients, opts \\ %{}, pagination \\ :keyset) do; I placed my logger right before list_memberships, so I guess it’s happening somewhere in fetch_activities_query.

After that it starts calling way too much stuff, and I don’t really have time to dig further for now.
However, I can run any patch or test you might have to debug this issue further.

that would be extremely weird, since the mix task you ran earlier does that exact query, to the letter