Scalable tools
Distributed, multi-node systems like MongoDB, RabbitMQ, Kubernetes, and microservices are often grouped together and called “scalable”. The label suggests that these tools can be used at scale, as opposed to everything else, which supposedly cannot.
This suggestion is wrong:
- these tools do not guarantee scalability, and
- single-node systems can scale fine, most (like 99.9%) of the time.
So what these multi-node “scalable” tools really offer is a potentially higher ceiling. And you probably won’t be anywhere close to your ceiling unless you are one of the biggest companies in the world, and even then you likely only hit it in certain hotspots.
Funny: “MongoDB is Web Scale”
Better ways to scale
Your algorithms and data structures matter 100x more than your tools.
Set up some benchmarks and know the limitations of different operations. For common latencies, I find these back-of-the-envelope calculations useful.
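As a minimal sketch of what such a benchmark can look like (Python assumed; the table and operation are made up), `timeit` is enough to get a rough per-operation latency:

```python
import itertools
import sqlite3
import timeit

# Throwaway in-memory database: we're measuring the operation, not the disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
counter = itertools.count()

def insert_one():
    conn.execute("INSERT INTO kv VALUES (?, ?)", (str(next(counter)), "value"))

# timeit returns total seconds for `number` runs; divide for per-op latency.
runs = 10_000
total = timeit.timeit(insert_one, number=runs)
print(f"{total / runs * 1e6:.1f} µs per insert")
```

Numbers like this, compared against the back-of-the-envelope latencies above, tell you quickly whether a bottleneck is plausible or imagined.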
Look around a bit to understand why your more boring, “not so scalable” tools are slow. They might have features or options that fix your bottlenecks, e.g.:
- SQLite has WAL mode, which vastly improves write performance
- Postgres has a feature called Materialized Views that removes the performance issues associated with many joins
- SQL databases have indexes, which you should be using appropriately (see Use the Index, Luke)
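To make the SQLite points concrete, here is a sketch of enabling WAL mode and adding an index (the database file, table, and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect("app.db")  # hypothetical database file

# WAL mode lets readers proceed while a writer is active,
# which vastly improves write throughput under read load.
conn.execute("PRAGMA journal_mode=WAL")

conn.execute(
    "CREATE TABLE IF NOT EXISTS orders "
    "(id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

# An index turns full-table scans on customer_id lookups into b-tree seeks.
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_orders_customer ON orders (customer_id)"
)
conn.commit()
```

Two statements, and the “boring” tool now handles a write-heavy, lookup-heavy workload it would otherwise choke on.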
Use caching to ease the load on your main database.
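One minimal form of this (a sketch; the key, TTL, and fetch function are all illustrative) is a read-through cache in front of hot queries:

```python
import time

_cache = {}  # key -> (expiry_timestamp, value)
TTL_SECONDS = 60

def get_user(user_id, fetch_from_db):
    """Read-through cache: hit the database only on a miss or expiry."""
    now = time.monotonic()
    hit = _cache.get(user_id)
    if hit and hit[0] > now:
        return hit[1]                      # served from memory, no DB load
    value = fetch_from_db(user_id)         # the expensive database round trip
    _cache[user_id] = (now + TTL_SECONDS, value)
    return value
```

When you don’t need expiry, `functools.lru_cache` gives you this pattern for free.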
Run multiple servers, one per client / namespace / region / etc., AKA single tenancy. See DHH’s Multi-tenancy is what’s hard about scaling services.
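A sketch of the single-tenant idea (tenant names and file layout are made up): give each client its own database, so load and data are partitioned by construction rather than by sharding logic:

```python
import sqlite3

_connections = {}

def db_for_tenant(tenant):
    """One database file per tenant: no cross-tenant contention or queries."""
    if tenant not in _connections:
        conn = sqlite3.connect(f"{tenant}.db")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events "
            "(id INTEGER PRIMARY KEY, payload TEXT)"
        )
        _connections[tenant] = conn
    return _connections[tenant]

# Each tenant's writes land in its own file; scaling out means adding
# machines and moving tenants, not resharding one giant database.
db_for_tenant("acme").execute("INSERT INTO events (payload) VALUES ('hello')")
db_for_tenant("globex").execute("INSERT INTO events (payload) VALUES ('hi')")
```

The routing layer stays trivial, and a hot tenant can be moved to its own hardware without touching anyone else’s data.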
Tennis and trade-offs
As a recent beginner dabbling in tennis, I was in the market for a racket. I bought one for $30, not $200. The $200 one would have been better, but as a non-professional, the marginal improvement wasn’t worth the 6x investment.
The tradeoff between the two rackets is simple: the expensive one costs more but is probably better in every way.
The list of tradeoffs for technology with higher ceilings is longer, and it’s often not taken into consideration at all:
- complexity
- latency
- consistency - distributed systems often have to sacrifice immediate consistency (as they achieve higher throughput via eventual consistency)
- operating costs
- learning curve
Ya ain’t gonna need it (YAGNI)
There are tons of mantras (YAGNI, Premature Optimization) that emphasize focusing on the now/soon rather than focusing on “what might be in the future”.
And for good reason: there are simply too many variables to predict once you start thinking in orders of magnitude.
“I’ve built three distributed job systems at this point. A handy rule of thumb which I have promoted for years is ‘build for 10x your current scale.’” - random hn dude