I am generally a huge vertical sharding skeptic but there are special cases where it is beneficial. If you have a simple query pattern on one table that represents a big fraction of your entire workload you can put it into its own instance and it becomes much easier to monitor. It’s easy to see why vertical sharding is sometimes the right answer by inverting the decision: should we put two unrelated large applications on the same instance? Obviously not, there is no benefit and ops becomes more difficult.
Using it right now, the community feels non-existent and the documentation is as barebones as it gets. Luckily my requirements are easy since I only need schema-based sharding so from an operations perspective, it's "just" managing one database clusters for each shard but I would just not use Postgres the next time and instead go with Vitess since at least you're not all on your own then.
If you truly reach the limit of vertical scaling with Postgres, 9/10 I guarantee you have some hideously unoptimized schema and queries. The remaining 1/10, I hope, has actual DB experts on staff that know what they’re doing.
I'll make an excuse for PostgreSQL lacking that feature: it's a transactional relational database with extremely well understood performance characteristics and a reputation for robustly delivering new reliable features without breaking backwards compatibility.
Adding sharding-style horizontal scaling to that while maintaining the characteristics of PostgreSQL that make it so popular is a very tall order. My suspicion is that handling sharded horizontal write scalability through external systems and extensions is a better architectural fit than trying to bake that into core.
Of course - but that is the best case scenario. You will need to support other kinds of queries as well, including writes, which is where it gets even more complicated. The guarantees provided by your RDBMS go away when you shard your database like this. Transactions are local to each database so writes to multiple cannot be a single transaction anymore.
"Tables (apps) are never really split up. Even in the perfect world, two apps developed by the same company will need to talk to each other"
I don't understand why this sentence appears to treat "tables" and "apps" as the same concept. Is this a use of the term "app" that I'm not familiar with?
---
TLDR of this post is that it's promoting a new open source (AGPL) horizontal sharding solution for PostgreSQL called PgDog: https://github.com/pgdogdev/pgdog
Looks like the trick this one uses is to parse your SQL queries inside a custom router/load-balancer and redirect them based on introspecting the WHERE clause of a SELECT/UPDATE/DELETE or the VALUES clause of an INSERT to identify sharding keys: https://docs.pgdog.dev/features/sharding/query-routing/
It let you do all sorts of clever tricks with Lua scripting, though oddly enough it looks like the repo was archived back in 2024 (and the last commit to it was 11 years ago), and the linked docs on https://dev.mysql.com/doc/mysql-proxy/en/ are a 404 now.
I am generally a huge vertical sharding skeptic but there are special cases where it is beneficial. If you have a simple query pattern on one table that represents a big fraction of your entire workload you can put it into its own instance and it becomes much easier to monitor. It’s easy to see why vertical sharding is sometimes the right answer by inverting the decision: should we put two unrelated large applications on the same instance? Obviously not, there is no benefit and ops becomes more difficult.
I've not evaluated it or ever worked with it myself, but CitusDB has been working on PostgreSQL horizontal scaling for quite some time now, right?
Does anyone have positive or negative experiences with Citus?
Using it right now, the community feels non-existent and the documentation is as barebones as it gets. Luckily my requirements are easy since I only need schema-based sharding so from an operations perspective, it's "just" managing one database clusters for each shard but I would just not use Postgres the next time and instead go with Vitess since at least you're not all on your own then.
If you truly reach the limit of vertical scaling with Postgres, 9/10 I guarantee you have some hideously unoptimized schema and queries. The remaining 1/10, I hope, has actual DB experts on staff that know what they’re doing.
Written in 2025 and still acting databases like Cassandra are trendy. It's 16 years old.
Stop making excuses for PostgreSQL lacking a built-in, supported horizontal scalability solution.
I'll make an excuse for PostgreSQL lacking that feature: it's a transactional relational database with extremely well understood performance characteristics and a reputation for robustly delivering new reliable features without breaking backwards compatibility.
They have added horizontal scalability in the form of built-in replication: https://www.postgresql.org/docs/current/high-availability.ht...
Adding sharding-style horizontal scaling to that while maintaining the characteristics of PostgreSQL that make it so popular is a very tall order. My suspicion is that handling sharded horizontal write scalability through external systems and extensions is a better architectural fit than trying to bake that into core.
What’s wrong with the Postgres Citus extension for this use case?
Surely there must be a way to do the joins in software, without doing it by hand, eg a SQL-like library? Pandas or equivalent?
Of course - but that is the best case scenario. You will need to support other kinds of queries as well, including writes, which is where it gets even more complicated. The guarantees provided by your RDBMS go away when you shard your database like this. Transactions are local to each database so writes to multiple cannot be a single transaction anymore.
"Tables (apps) are never really split up. Even in the perfect world, two apps developed by the same company will need to talk to each other"
I don't understand why this sentence appears to treat "tables" and "apps" as the same concept. Is this a use of the term "app" that I'm not familiar with?
---
TLDR of this post is that it's promoting a new open source (AGPL) horizontal sharding solution for PostgreSQL called PgDog: https://github.com/pgdogdev/pgdog
Looks like the trick this one uses is to parse your SQL queries inside a custom router/load-balancer and redirect them based on introspecting the WHERE clause of a SELECT/UPDATE/DELETE or the VALUES clause of an INSERT to identify sharding keys: https://docs.pgdog.dev/features/sharding/query-routing/
There was a MySQL proxy a decade or two ago that did that.
Do you mean this one? https://github.com/mysql/mysql-proxy
It let you do all sorts of clever tricks with Lua scripting, though oddly enough it looks like the repo was archived back in 2024 (and the last commit to it was 11 years ago), and the linked docs on https://dev.mysql.com/doc/mysql-proxy/en/ are a 404 now.
MySQL also has https://github.com/vitessio/vitess which is alive and well.
[dead]