PHP at Scale #17
Deep dive into databases
This month, I want to talk about databases. Not the basics - you know how to write queries and set up indexes. I’m talking about the advanced techniques that separate applications that handle a few thousand users from those that serve millions without breaking a sweat.
Here’s the thing: most of us learned databases through ORMs. Doctrine or Eloquent became our window to the database world, and for years, that was enough. But as your application grows, you start hitting walls. Queries that worked fine start timing out. Features that seemed simple become performance nightmares. And suddenly, you need to understand what’s actually happening beneath your ORM’s abstraction layer.
I’ve collected six areas where I consistently see senior PHP developers level up their database game. Some of these I learned the hard way. Let’s dig in.
Processing 11 million rows in minutes
An interesting piece where Brent shares the journey of optimizing a batch job from 30 events per second to 50,000. The techniques he used weren’t revolutionary - removing unnecessary ORDER BY clauses, bypassing the ORM entirely for bulk operations, and wrapping work in transactions. But the results were transformational.
The truth is that most modern ORMs (especially those that do not use an active record approach) allow you to optimize queries easily. If you look at Doctrine, you can fine-tune the query to join the right tables when required, load partial objects or even fetch data into DTOs. Doctrine also has documentation on how to batch insert data. But there are three problems with ORMs:
It takes time and requires knowledge of the ORM in use, debugging the resulting queries, and so on.
The object/SQL transformation also comes at a cost, and sometimes that cost becomes too big.
Using an ORM can make us developers lazy - we don’t spend the 30 seconds to think about what data we actually need; we just pull objects from the DB and assume it’s fine.
I’m not saying abandon your ORM - it’s still excellent for complex business logic with a handful of entities. But when you’re processing reports, running exports, or handling background jobs at scale, learn to drop down to DBAL or native queries.
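As a rough illustration of what "dropping down to DBAL" can look like, here is a hedged sketch of a chunked bulk insert. It assumes an existing Doctrine DBAL connection; the `events` table, its columns, and the chunk size are made up for the example:

```php
<?php
// Sketch: bulk-inserting rows with Doctrine DBAL instead of hydrating entities.
// The table and column names are hypothetical; adjust to your schema.

function bulkInsertEvents(\Doctrine\DBAL\Connection $connection, iterable $rows, int $chunkSize = 1000): void
{
    $buffer = [];
    foreach ($rows as $row) {
        $buffer[] = $row;
        if (count($buffer) >= $chunkSize) {
            flushChunk($connection, $buffer);
            $buffer = [];
        }
    }
    if ($buffer !== []) {
        flushChunk($connection, $buffer);
    }
}

function flushChunk(\Doctrine\DBAL\Connection $connection, array $chunk): void
{
    // One multi-row INSERT per chunk, wrapped in a transaction:
    // no ORM hydration, no per-row round trips.
    $connection->transactional(function (\Doctrine\DBAL\Connection $conn) use ($chunk): void {
        $placeholders = implode(', ', array_fill(0, count($chunk), '(?, ?)'));
        $params = [];
        foreach ($chunk as [$type, $payload]) {
            $params[] = $type;
            $params[] = $payload;
        }
        $conn->executeStatement(
            "INSERT INTO events (type, payload) VALUES {$placeholders}",
            $params
        );
    });
}
```

The same idea applies to reads: fetch plain arrays or DTOs with `fetchAllAssociative()` instead of full entities when you only need the data.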
I’ve seen too many projects that were slow simply because loading one page triggered 1000 queries. But I’ve also seen situations where adding all the required joins to those queries brought no improvement, because the resulting structure fetched 50MB of data from the DB. In such cases, a separate read model can improve performance by an order of magnitude.
Side-note: while this blog post nicely summarizes some optimization flows, keep in mind that for data processing, you can also use ETL tools like Flow PHP, and it will be even faster.
PostgreSQL JSONB: The Hybrid Sweet Spot
If you’re using PostgreSQL, JSONB might be the most underused feature in your toolkit. It gives you document database flexibility without abandoning relational guarantees. Perfect for those messy real-world cases where your schema can’t anticipate every variation.
I’ve seen teams reach for MongoDB the moment they need flexible schemas, only to struggle later when they need joins or transactions. Meanwhile, PostgreSQL’s JSONB handles both use cases elegantly. Oskar wrote a thoughtful piece on JSONB that covers the hybrid approach: keeping your core domain in proper columns while using JSONB for metadata, configurations, or user-defined attributes.
The key is understanding when to use it. JSONB shines for: configuration blobs that vary per entity, audit logs with variable payloads, metadata that doesn’t need relational integrity, and rapid prototyping where the schema is still evolving. But don’t go overboard: I’ve seen teams stuff everything into JSONB columns and lose all the benefits of relational modelling.
When you do use JSONB, index it properly. GIN indexes on JSONB columns can be surprisingly fast, but only if you set them up correctly. The Crunchy Data team has a practical guide on indexing JSONB in Postgres that’s worth bookmarking.
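To make the idea concrete, here is a sketch of the hybrid approach with a made-up `products` table - core fields as proper columns, flexible attributes in JSONB, plus a GIN index:

```sql
-- Hypothetical table: core domain in real columns, variable data in JSONB.
CREATE TABLE products (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name       text NOT NULL,
    price      numeric(10, 2) NOT NULL,
    attributes jsonb NOT NULL DEFAULT '{}'
);

-- GIN index to speed up containment queries on the JSONB column.
-- The jsonb_path_ops operator class gives a smaller, faster index,
-- but only supports the @> containment operator.
CREATE INDEX idx_products_attributes
    ON products USING GIN (attributes jsonb_path_ops);

-- This containment query can now use the index:
SELECT id, name
FROM products
WHERE attributes @> '{"color": "red", "size": "XL"}';
```

If you also need queries on individual keys with other operators, a plain GIN index (without `jsonb_path_ops`) or an expression index on the specific key may be a better fit.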
From my personal experience: unfortunately, not all engines handle JSON as well as PostgreSQL does. I remember a MariaDB-based project where it turned out that merely selecting a JSON column caused a big performance hit - before we even filtered by it!
Search Beyond LIKE: Full-Text and Vector Search
Quite often, the projects we work with include a search feature built on code like WHERE name LIKE '%query%'. It kind of works - until the table grows and users complain that search is slow and doesn’t find what they’re looking for. A single typo is enough to hide the right result.
PostgreSQL’s built-in full-text search is surprisingly capable and criminally underused. It handles stemming (so “running” matches “run”), ignores stop words, and can rank results by relevance - all without an external service. Remi Mercier wrote a beginner-friendly guide to PostgreSQL full-text search that explains tokens, lexemes, and practical implementation without the usual documentation density.
For most internal tools and admin panels or even lower-traffic features, PostgreSQL FTS is more than enough. You don’t need Elasticsearch for a product search that handles a few thousand items. A GIN index on a tsvector column and some tweaking of weights will get you 90% of the way there with zero additional infrastructure.
One caveat: it does not support all languages out of the box, but you can easily add support for the ones required.
But what about semantic search? The kind where “comfortable shoes for walking” finds products tagged as “ergonomic sneakers”? That’s where vector search comes in.
Ben Bjurstrom created a pgvector driver for Laravel Scout that makes vector similarity search accessible without a dedicated vector database. For Symfony developers, Pauline Vos demonstrated the full stack at SymfonyCon 2025 with her archaeology matcher using Symfony AI - a fun demo that shows how to combine text and image embeddings for similarity search.
Is vector search something every PHP application needs? No. But if you’re building features that require understanding meaning rather than matching keywords - recommendations, semantic search, content similarity - it’s now within reach without leaving the PHP ecosystem. I’ve shared examples for Laravel and Symfony, but embedding and storing vectors is really easy to implement. And you can do it with PostgreSQL or some dedicated database engines.
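If you are curious what the database side of this looks like with pgvector, here is a minimal sketch. The `items` table is hypothetical, and the 1536 dimension is just a common embedding size - adjust it to the model you use:

```sql
-- Sketch: similarity search with the pgvector extension.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE items (
    id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(1536)
);

-- Approximate nearest-neighbour index using cosine distance.
CREATE INDEX idx_items_embedding
    ON items USING hnsw (embedding vector_cosine_ops);

-- Find the 5 items most similar to a query embedding
-- ($1 is the embedding you computed in PHP for the search phrase):
SELECT id, content
FROM items
ORDER BY embedding <=> $1
LIMIT 5;
```

Your PHP code only needs to call an embedding API for the search phrase and bind the resulting vector as a parameter - the heavy lifting stays in the database.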
I’ve recently reimplemented a search feature to include vectors, and the outcomes are great. The search feature is now also connected to an AI model, and when it needs more data, it can easily find the matching content. And the MVP implementation took an hour to build!
Specialized Databases
I believe modern SQL databases can handle 95% of what most applications need. But that remaining 5% is worth knowing about, so you recognize when you’ve hit their limits.
I don’t link any specific articles in this section, but some use-case-specific databases that worked well for us:
ClickHouse for analytics: If you’re running aggregation queries over millions of rows - dashboards, reports, time-series analysis - a columnar database like ClickHouse can be 100x faster. It’s not a replacement for your transactional database, but it’s excellent for read-heavy analytical workloads. The pattern I’ve seen work well is streaming events to ClickHouse for analytics while keeping your source of truth in a transactional database. This helped us provide blazing-fast price insights without adding any load to our main database.
Neo4j for graph relationships: When your JOINs start looking like spider webs - social connections, recommendation engines, fraud detection, dependency analysis - a graph database makes those queries natural instead of painful. Five levels of JOINs in SQL become a simple traversal pattern.
The key insight is that these aren’t either/or decisions. Mature systems often have multiple databases, each handling what it’s best at. Your transactional data lives in PostgreSQL, your analytics flow to ClickHouse, and your relationship queries hit Neo4j. The complexity is real, but so are the performance gains when you match the tool to the problem. Just don’t include them when they are not needed ;)
I recall a project where Neo4j let us simplify things a lot - a system calculating routes inside buildings for blind people. We stored the building’s map of beacons as a graph: each beacon is a node, and the connections carry the distance, the time required to travel between them, and whether the path is wheelchair-accessible. Connecting two points then became a rather simple query :)
But I’ve also made the mistake of including Elasticsearch as a specialized search database; although it worked perfectly, it was simply too big for such a small project. I hadn’t weighed the resources against the traffic that would actually reach it.
Expand/Contract: making a breaking change without a big bang
Schema changes in production used to terrify me. A simple ALTER TABLE ADD COLUMN on a 100-million row table can lock it for minutes, and minutes of downtime mean angry users and panicked Slack channels.
For simple changes - adding a nullable column, creating a new table, adding an index - you often don’t need much ceremony. Just make sure the column has a default or allows NULL, and your existing code will keep working. This “additive-only” approach handles 80% of migrations without any special patterns.
But what about breaking changes? Renaming a column, changing a data type, splitting a table? That’s where the expand-contract pattern comes in. This article explains it clearly - you expand the schema to support both old and new structures, migrate the data, update your application, then contract by removing the old structure.
It sounds like more work, and it is. But it’s also the difference between a planned migration and a 3 AM incident. The pattern works for any database - PostgreSQL, MySQL, whatever you’re running.
A few practical tips: never run migrations in the same deployment as code changes (separate them), use CREATE INDEX CONCURRENTLY in PostgreSQL, test your migrations against a production-sized dataset before running them live, and always have a rollback plan that doesn’t require restoring from backup.
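Put together, the expand-contract steps for a column rename look roughly like this - a sketch using a hypothetical `users` table where `fullname` becomes `display_name`:

```sql
-- 1. Expand: add the new column; existing code keeps working untouched.
ALTER TABLE users ADD COLUMN display_name text;

-- 2. Migrate: deploy code that writes to BOTH columns, then backfill
--    the historical rows (in batches on large tables, to limit lock time).
UPDATE users SET display_name = fullname WHERE display_name IS NULL;

-- 3. Switch: deploy code that reads and writes only display_name.

-- 4. Contract: once nothing references the old column, drop it.
ALTER TABLE users DROP COLUMN fullname;
```

Each step is independently deployable and independently reversible - which is the whole point: at no moment does the schema and the running code disagree.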
I recall a project run by another team at a company I worked for. They unfortunately had an issue with a released version, and they learned the hard way that rolling back their database changes took four hours. I don’t wish that situation on anyone - you just sit and wait until it finishes, hoping it won’t fail on some stupid issue.
DevOps Considerations (A Brief Mention)
I originally planned to cover the operational side - read replicas, connection pooling, table partitioning strategies, and monitoring with pg_stat_statements. But this newsletter is already long, and each of those topics deserves proper treatment.
So consider this a preview: if you’re scaling your database, these are the levers you’ll eventually need to pull. Read replicas for scaling reads, connection pooling for handling more concurrent users, partitioning for managing huge tables, and monitoring to understand what’s actually slow.
I might cover these in a future edition. Let me know if there’s a specific area you’d like me to prioritize.
Don’t overcomplicate
On a closing note, I would like to highlight that in most cases, you can achieve exceptional improvements with simple changes - be it a dedicated read model with a fine-tuned query. If you need more proof that PostgreSQL can handle serious scale, OpenAI recently shared how they run ChatGPT for 800 million users on a single-primary PostgreSQL instance - no sharding, just fundamentals done right: read replicas, connection pooling, and disciplined schema changes.
PS. We help teams modernize PHP applications and fix the architectural problems that hold them back - the kind of stuff I write about here. If that sounds like your situation, we currently have a slot for a new project. Just email me or reach out on LinkedIn.
Why is this newsletter for me?
If you are passionate about well-crafted software products and despise poor software design, this newsletter is for you! With a focus on mature PHP usage, best practices, and effective tools, you'll gain valuable insights and techniques to enhance your PHP projects and keep your skills up to date.
I hope this edition of PHP at Scale is informative and inspiring. I aim to provide the tools and knowledge you need to excel in your PHP development journey. As always, I welcome your feedback and suggestions for future topics. Stay tuned for more insights, tips, and best practices in our upcoming issues.
May thy software be mature!


