Swarm mode almanac

This page is a Work In Progress

A page to understand WTF is going on with Swarm mode: how we rely on it, how we might stop relying on it, and other related threads. Please add to this page as you see fit! If we can establish a shared understanding of what is going on under the hood, we can come up with a collective solution that meets everyone's needs.

Support matrix

In practice, this is what we currently rely on Swarm mode for.

| Feature | Explanation |
| --- | --- |
| Encrypted secrets | When you run `abra secret generate`, it uses something like `printf foo \| docker secret create foo -` under the hood. This only works once the server is in Swarm mode (e.g. after `docker swarm init` or `docker swarm join`). Swarm mode securely transports your secret and stores it encrypted on the server. `docker compose` does not support encrypting or storing secrets because it only runs client-side. |
| Template driver | If you use `template_driver: golang` in your `compose.yml` to insert secrets or environment variables into your configs, then you are using a template driver. This feature has almost no documentation, does not appear to be part of the Compose Spec, and is completely blocked by `docker compose` (source). Several recipes use this feature and it seems crucial for our usage. |
| Stacks | A service is the key concept here. A stack is a shared namespace of services along with their networks, volumes, configs, etc. The concept of a stack is unique to Swarm mode, so any replacement for Swarm mode would have to implement this kind of namespacing for backwards compatibility. See `psviderski/uncloud#94` for more. |
| Orchestration | When you run `abra app deploy`, we run a slightly customised `docker stack deploy` under the hood. Swarm mode is supposed to automagically handle zero-downtime updates and rollbacks if things fail. However, we're seeing the limitations of this approach. |
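To make the template driver row more concrete: `template_driver: golang` renders configs with Go's `text/template` language, and Docker documents helpers such as `{{ secret "name" }}` and `{{ env "VAR" }}` for config templates. The sketch below is a minimal, hypothetical stand-in for that behaviour, assuming secrets and environment variables are passed in as plain maps. It is not Swarm's actual implementation, and the `renderConfig` function, the example secret and the `DOMAIN` variable are illustrative names only.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// renderConfig renders a config template with Go's text/template package,
// the templating language behind `template_driver: golang`.
// The secrets and env maps are hypothetical stand-ins for the values Swarm
// would inject; this is not Swarm's actual implementation.
func renderConfig(tmpl string, secrets, env map[string]string) (string, error) {
	funcs := template.FuncMap{
		// secret looks up a secret by name and fails rendering if it is missing.
		"secret": func(name string) (string, error) {
			v, ok := secrets[name]
			if !ok {
				return "", fmt.Errorf("secret %q not found", name)
			}
			return v, nil
		},
		// env returns the named environment variable, or "" if it is unset.
		"env": func(name string) string { return env[name] },
	}
	// Funcs must be registered before Parse so the template can call them.
	t, err := template.New("config").Funcs(funcs).Parse(tmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, nil); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	// A hypothetical config file that pulls in one secret and one env var.
	tmpl := "db_password = {{ secret \"db_password\" }}\ndomain = {{ env \"DOMAIN\" }}\n"
	out, err := renderConfig(tmpl,
		map[string]string{"db_password": "s3cret"},
		map[string]string{"DOMAIN": "example.com"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Running this prints `db_password = s3cret` followed by `domain = example.com`. The point is that any Swarm replacement would need some equivalent rendering step, fed by wherever secrets end up living, to keep existing recipe templates working.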

Unsupport matrix

| Feature | Explanation |
| --- | --- |
| Multi-node | This is possible, but it doesn't seem like anyone in our community is really doing it. We believe the majority of Co-op Cloud installs are single-node. There is also a lack of CSI support for coordinating storage across multiple hosts when using Swarm mode. This means we effectively throw out most of Swarm mode's features. |

Limitations

Swarm mode is still eerily underdeveloped and lacking in features as a system. There are still lurking networking and stability bugs that people commonly run into. We're grateful for the undercover live reporting below from in-the-know people adjacent to our network. There are even folx inside Docker who are apparently calling it abandonware (source). All this does not really put us at ease.

Docker whiskey leaks

https://www.mirantis.com/blog/mirantis-guarantees-long-term-support-for-swarm/

Mirantis' relationship with "swarm" is very confusing! my understanding is that there are people (or one person? lol) at mirantis who do some work on the orchestration engine that is "docker swarm," but only to the extent that it supports mirantis' platform. i don't believe there's any active feature development beyond that. you're right that it's a misleading headline -- it sounds to me that they're just saying that they'll continue swarm support in their v3 kubernetes platform, not that they're committed to developing swarm as an orchestration system.

Way back when (i guess in 2019? before my time!), docker sold off its enterprise platform which was called "swarm" to mirantis, so that's still a product that mirantis has and has developed in their way, but it's not the open-source swarm(kit) that's part of the docker cli. this is a good quick explanation: https://forums.docker.com/t/docker-swarmkit-and-the-mirantis-deal-not-docker-swarm/88886

The orchestration features of Swarm mode are opaque, which makes failed deployments difficult to understand. This can cause a litany of issues. For example, your database may already have been migrated by the time your failing app is rolled back, and the rolled-back version may not support the new schema. This is being discussed extensively on organising#682.

Potential alternatives

uncloud.run: The Uncloud folks are creating a very different system: something beyond compose, but not k8s and not Swarm. This means they have to implement a lot of orchestration features from scratch. However, they're going for a nice approach: a straightforward imperative deployment model (it supports `depends_on` for predictable ordering during deployments). They're choosing which parts of the Compose Spec they implement, and it's noteworthy that they don't implement secrets yet. See the Compose support matrix for more. They are, however, very focused on multi-node functionality. It's a system to keep an eye on, in the hope that we can use some part of it in the future. Lines of communication have been opened.
docker compose: Plain old docker compose. A more elegant weapon for a more civilised age. It is, however, missing features we need, such as encrypted secrets and `template_driver` support, and there may be more things missing. They are developing a promising SDK that exposes a public API for handling various operations. This would need some serious investigation and most likely some custom solutions for the features we're missing.

What we need

Something that is backwards compatible with our existing recipe configuration commons and current deployments. We can't just reinvent the wheel, because we all rely on this system. So we need to look towards incremental improvements or changes that are backwards compatible. We can always agree to change the config commons or some shared practices, but then we need to reach a clear agreement through our decision-making process. This is the social part.
Some way of conveniently using secrets when deploying services. This method should make it easy to work as a team without straying too far from our established Git Ops workflow of sharing $ABRA_DIR. Secrets don't need to be encrypted and stored on the server (which would remove the need for Swarm mode to handle them), as long as they're mounted as secrets in the usual /run/secrets/<name> manner at runtime.
Template driver support so we can template values into our configurations. This is used in enough recipes to warrant continued support.
A way to namespace services into a deployment, aka a "Docker Stack". This would appear to be a minor implementation detail after all is said and done. It's services all the way down and they have some linked networks/configs/volumes/etc. and a shared naming convention.
Some way to achieve Fearless YunoHost-esque Upgrades. In other words, a predictable way to deploy, upgrade and roll back, and some way to intervene when things go wrong. It should be easy for everyone to understand and would give operators real stability. I think we want some sort of super simple anti-orchestration implementation.