Broke my server trying to configure cloud storage

fedward · 19 July 2023 18:12

I connected my server to use a Digital Ocean space and confirmed that it successfully uploaded media to the space, except it uploaded to the root directory and not the child path I would prefer.

I made what I thought would be the right change to my configuration, and instead I caused Akkoma to crash, and it won’t restart now. Even if I try to do a config dump on the command line I get the same error. Here’s the meat of the failure:

17:56:52.462 [error] Failed to start!
17:56:52.485 [error] {:error, {:shutdown, {:failed_to_start_child, Pleroma.Config.TransferTask, {:EXIT, {%Protocol.UndefinedError{protocol: Enumerable, value: nil, description: ""}, [{Enumerable, :impl_for!, 1, [file: ~c"lib/enum.ex", line: 1]}, {Enumerable, :reduce, 3, [file: ~c"lib/enum.ex", line: 166]}, {Enum, :each, 2, [file: ~c"lib/enum.ex", line: 4387]}, {Pleroma.Config.TransferTask, :configure, 1, [file: ~c"lib/pleroma/config/transfer_task.ex", line: 113]}, {Enum, :"-each/2-lists^foreach/1-0-", 2, [file: ~c"lib/enum.ex", line: 984]}, {Pleroma.Config.TransferTask, :load_and_update_env, 2, [file: ~c"lib/pleroma/config/transfer_task.ex", line: 51]}, {Pleroma.Config.TransferTask, :start_link, 1, [file: ~c"lib/pleroma/config/transfer_task.ex", line: 34]}, {:supervisor, :do_start_child_i, 3, [file: ~c"supervisor.erl", line: 420]}]}}}}}
17:56:52.486 [notice] Application pleroma exited: Pleroma.Application.start(:normal, ) returned an error: shutdown: failed to start child: Pleroma.Config.TransferTask
** (EXIT) an exception was raised:
** (Protocol.UndefinedError) protocol Enumerable not implemented for nil of type Atom. This protocol is implemented for the following type(s): DBConnection.PrepareStream, DBConnection.Stream, Date.Range, Ecto.Adapters.SQL.Stream, File.Stream, Floki.HTMLTree, Function, GenEvent.Stream, HashDict, HashSet, IO.Stream, Jason.OrderedObject, List, Map, MapSet, Phoenix.LiveView.LiveStream, Postgrex.Stream, Range, Stream, Timex.Interval
(elixir 1.15.0) lib/enum.ex:1: Enumerable.impl_for!/1
(elixir 1.15.0) lib/enum.ex:166: Enumerable.reduce/3
(elixir 1.15.0) lib/enum.ex:4387: Enum.each/2
(pleroma 3.9.3) lib/pleroma/config/transfer_task.ex:113: Pleroma.Config.TransferTask.configure/1
(elixir 1.15.0) lib/enum.ex:984: Enum."-each/2-lists^foreach/1-0-"/2
(pleroma 3.9.3) lib/pleroma/config/transfer_task.ex:51: Pleroma.Config.TransferTask.load_and_update_env/2
(pleroma 3.9.3) lib/pleroma/config/transfer_task.ex:34: Pleroma.Config.TransferTask.start_link/1
(stdlib 4.3.1.2) supervisor.erl:420: :supervisor.do_start_child_i/3

I would be happy if I could revert it to local storage via the command line or even with sql, but I’m not sure how to do that.

How can I get my server up and running again?

fedward · 19 July 2023 20:03

Sigh. I managed to get my server up and running again by reloading my last configuration backup, but now I can’t configure any settings that don’t already exist. In the admin console, whenever I click an input that should have more options, instead the UI says “no data.” For instance, under Pleroma.Upload, “Uploader” is set to “Pleroma.Uploaders.Local” and I can’t even select the one for s3. On MRF, the selected policies include Pleroma.Web.ActivityPub.MRF.SimplePolicy, but the actual configuration block for it isn’t appearing in the interface.

What have I done, and how do I undo it?

ilja · 20 July 2023 05:18

I have no idea what went wrong or what’s causing your current issues, but fwiw, DB config is stored in the config table: select * from public.config; Changing the values directly is non-trivial, so don’t try that. But if you know which config entry is causing trouble, you can delete the row based on group and/or key.

Or maybe start with a hard refresh of the admin screen if you haven’t already tried that, maybe it’s just that some old values are cached, idk. ctrl+shift+r or ctrl+F5.

fedward · 20 July 2023 17:55

Yeah, it’s very odd. I first tried deleting the last couple configs I’d added just like you suggest, but there was clearly something corrupted that was causing that enumeration error. I got it up and running by truncating the config table and doing a fresh migrate_to_db, and I’ve been restoring the configuration changes I’d made since the last backup. I need better backup hygiene, but that’s another issue. (I have a better backup right now since I’ve used this opportunity to check that my configs survive a round trip from export to import, but I didn’t have that habit in general).

At this point posting and federation all work, and some option pickers in the admin interface work, but other option pickers still just come up as “no data.”

fedward · 20 July 2023 20:03

This just gets weirder. Logging isn’t turned on by default, and if I try to enable it the following things happen:

1. I get a 500 error in the browser with the admin UI.
2. The server crashes.
3. The server will fail to restart (or even dump configs) until I do this: delete from config where key = ':backends';

I’m going to see what happens if I hardcode the logging in the prod.secret.exs file and do another migration. That should at least let me identify if the setting in the database is the problem or if something’s happening in the UI that’s causing the value to be corrupted on save.
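
For anyone following along, a minimal sketch of what hardcoding the logging in prod.secret.exs could look like. The console backend and the :info level are assumptions for illustration; the thread does not say which backend or level was actually used.

```elixir
# Fragment for an existing config/prod.secret.exs (the file already starts
# with `import Config`, so it is not repeated here).

# Assumed value: log to the console only. This is the group/key pair
# (:logger / :backends) that ends up as the ':backends' row in the config table.
config :logger, backends: [:console]

# Assumed value: a moderate log level for the console backend.
config :logger, :console, level: :info
```

After editing the file, the settings still have to be migrated into the database again (the migrate_to_db step mentioned earlier in the thread) before they show up in the admin UI.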

snott · 21 July 2023 10:00

I have had issues before with in-db config (my own fault of course). I ended up purging the config from the db and starting again: migrating to db from the config file, and then re-entering what I needed back into AdminFE.

fedward · 21 July 2023 17:16

Yeah, that’s what I’ve done. In fact I’ve done it multiple times now just to test that I have a complete backup of my config. Remember kids, if you haven’t tested your backups, you don’t have backups.

fedward · 21 July 2023 17:28

So anyway. I hardcoded logging in prod.secret.exs and the value went into the DB without being corrupted. Once logging was working correctly, fixing my S3 driver configuration was a snap. Pro tips: the host value in the :ex_aws :s3 config should omit the “https://” part, and if you are using Digital Ocean Spaces and you want everything to go in a folder instead of the top level, put the folder in the bucket like it’s a URI (bucket: "bucketname/foldername").
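
A rough sketch of how those two tips might fit together in prod.secret.exs. Every concrete value here (the nyc3 region and endpoint, the keys, the base_url) is a placeholder assumption rather than the actual configuration; only the host-without-scheme and bucket-with-folder shapes come from the tips above.

```elixir
# Fragment for an existing config/prod.secret.exs (the file already starts
# with `import Config`, so it is not repeated here).

# Switch uploads from Pleroma.Uploaders.Local to the S3 uploader.
config :pleroma, Pleroma.Upload,
  uploader: Pleroma.Uploaders.S3,
  # Placeholder: the public URL media should be served from.
  base_url: "https://media.example.com"

# Tip 2: put the folder in the bucket value, like a URI path.
config :pleroma, Pleroma.Uploaders.S3,
  bucket: "bucketname/foldername"

# Tip 1: host has no "https://" prefix; the scheme is a separate key.
config :ex_aws, :s3,
  access_key_id: "SPACES_KEY_PLACEHOLDER",
  secret_access_key: "SPACES_SECRET_PLACEHOLDER",
  region: "nyc3",
  scheme: "https://",
  host: "nyc3.digitaloceanspaces.com"
```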

As for the crashing and UI weirdness, my working theory is now that something got corrupted by the admin UI when I originally saved a change to my S3 configuration. I’ve been changing individual settings with dump_to_file and load_from_file so that I’m not reintroducing whatever problem happened in the admin UI. I suspect that something isn’t 100% right in the prod.secret.exs that I’ve created with all my round trips through migrate_from_db and migrate_to_db, but the system hasn’t been crashing as I’ve been loading new settings through careful crafting (or iterative editing) of individual JSON files.

I would love to figure out how this happened in the first place and file a replicable bug, but I’m not there yet.