How a bulk marketing campaign starved transactional email at Prachyam Studios — and how I fixed it by understanding Postfix queue internals and building a priority queue on top of Mailcow.
It was a Tuesday afternoon when the support tickets started coming in.
"Password reset email never arrived." "Can't log in, OTP not received." A user who'd been waiting forty minutes for a simple reset link. Then another. Then five more. Our Mailcow dashboard showed no errors. Postfix wasn't bouncing anything. The queue had messages in it — a lot of them, actually — and they were moving. Just not the ones that needed to move.
We were two months into running a self-hosted Mailcow stack at Prachyam Studios, a media company with two offices (Pune and Varanasi), 20 people, and an active marketing operation. I'd proposed the setup myself in a cost-reduction meeting: take the marketing email budget we were burning on Mailchimp — conservatively $300,000 for the 6-month campaign we had planned, given our ~350 million-record target dataset sat far above Mailchimp's 200k-contact Premium cap — and replace it with a self-hosted stack running on RackNerd VPS instances. Total infra cost for the campaign: roughly ₹9,000 (~$97). The math was obviously right. The configuration, as it turned out, needed work.
Six RackNerd VPS servers. Twelve custom sending domains, each with SPF, DKIM, and DMARC configured — rotating domains and IPs so no single deliverability incident could take down the whole operation. Mailcow running as a Docker Compose stack on each server: Postfix for the MTA, Dovecot for IMAP, Rspamd for spam filtering, the full bundle. Internal team mail for all 20 people ran through the same servers. So did onboarding emails, password resets, OTPs — every transactional notification the platform sent.
That last part is where it went wrong.
The marketing team had launched a new batch. Nothing unusual — they'd done several runs by this point. This one was a larger send, pushing out into the deeper segments of the dataset. I was focused on something else when the first ticket came in. By the time the third one landed, I was looking at the queue.
The numbers looked fine at first glance. Messages were processing. Rspamd scores were clean. No obvious delivery failures in /var/log/mail.log. I ran postqueue -p on the active sending server and got back several thousand lines. The queue was full — not stuck, just backed up with the marketing batch. That's when the shape of the problem started to form.
I piped the output and looked at the message timestamps:
postqueue -p | grep "^[A-F0-9]" | awk '{print $3, $4, $5, $6}' | head -40
The oldest messages in the queue were from an hour ago. They were transactional — notifications, a password reset, two OTP sends. Sitting behind thousands of marketing messages that had arrived after them but had the same queue priority: zero. Postfix doesn't care what an email contains. It processes the queue roughly in arrival order, with some nuance for connection limits and retry intervals — but there is no concept of "this one matters more than that one" in a default configuration.
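A quick census of who owns the queue makes this kind of starvation obvious. Something like the following groups queued mail by sender domain (the field positions assume the stock postqueue -p layout and short hexadecimal queue IDs):

# First line of each postqueue -p entry:
#   <queue-id>[*!]  <size>  <day mon dd hh:mm:ss>  <sender>
postqueue -p \
  | awk '/^[0-9A-F]/ && NF >= 7 { if (split($7, p, "@") == 2) print p[2] }' \
  | sort | uniq -c | sort -rn | head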
My first wrong assumption was that Rspamd was throttling something. I checked the Rspamd web interface and the greylisting logs. Nothing unusual. My second wrong assumption was that the sending rate limits were too aggressive on that particular server. I checked smtp_destination_rate_delay and smtp_destination_concurrency_limit. They were fine — tuned appropriately for the volume. The marketing mail was moving at a good clip.
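Both are one postconf call away; when they aren't set explicitly, they fall back to the default_destination_* values of 0s and 20:

# Effective per-transport limits for the stock smtp transport
postconf smtp_destination_rate_delay smtp_destination_concurrency_limit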
That was exactly the problem. The marketing mail was moving at a good clip, and it had an hours-long head start.
Postfix manages several queue directories: incoming, active, deferred, hold, corrupt. The active queue is what's actually being delivered — Postfix moves messages from incoming to active up to a configurable limit (qmgr_message_active_limit, default 20,000 messages). Once active is full, new incoming messages wait in incoming regardless of their priority.
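You can watch the stages directly on disk. The paths below assume the stock queue_directory; inside Mailcow, run this in the postfix-mailcow container:

# Count the messages sitting in each queue stage
for q in incoming active deferred; do
  printf '%-9s %s\n' "$q" "$(find /var/spool/postfix/$q -type f | wc -l)"
done

# The admission cap on the active queue
postconf qmgr_message_active_limit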
When a marketing batch drops 50,000 messages into incoming at once and the active queue fills with them, a transactional message that arrives mid-batch joins the back of the incoming line. It won't see the active queue until the batch drains enough to make room — and at the sending rates we were running, that took a long time.
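The qshape(1) tool that ships with Postfix makes the head start visible as an age histogram: one row per domain, one column per age bucket (sender domains with -s, recipient domains by default). During the incident it would have shown one sender domain owning nearly every young bucket while everything else aged:

# Sender-domain age distribution of the incoming and active queues
qshape -s incoming active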
This is not a bug in Postfix. It's doing exactly what it's configured to do: deliver mail. It's just that "deliver mail" and "deliver this mail first" are two different requirements, and the default configuration only satisfies the first one.
The fix Postfix provides for this is transport maps — the ability to route different mail through different delivery "transports," each with its own rate limits, concurrency settings, and queue behavior. Two transports means two queues, independent of each other. A transactional message routed through the internal transport will never compete with a marketing message routed through the bulk transport for the same active queue slot.
The implementation required three pieces working together.
First, a custom transport definition in master.cf — two separate SMTP transports, one for bulk and one for internal/transactional sends:
# /etc/postfix/master.cf (appended)
# service   type  private unpriv  chroot  wakeup  maxproc command
bulk        unix  -       -       n       -       20      smtp
internal    unix  -       -       n       -       50      smtp

Both entries run the stock smtp(8) delivery agent; all that differs is the name and the process limit. The names are the hook: the queue manager looks up each transport's scheduling limits by them, which is piece three below. The intent is that the bulk transport is deliberately rate-limited, with a 2-second delay between deliveries per destination and low concurrency, while the internal transport runs with no artificial delay and higher concurrency, so transactional messages move as fast as the destination will accept them.
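A quick sanity check that master picked both services up (postconf -M queries master.cf entries on any recent Postfix):

# Show the two new service entries as master.cf sees them
postconf -M bulk/unix internal/unix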
Second, a map in main.cf to route mail by sender domain. There's a catch here: transport_maps matches on the recipient, and both streams deliver to arbitrary external mailboxes, so recipient-based routing can't tell them apart. The routing has to key on the envelope sender, which is what sender_dependent_default_transport_maps does:

# /etc/postfix/main.cf
sender_dependent_default_transport_maps = hash:/etc/postfix/transport_maps
# Anything not matched in the map falls through to default_transport
default_transport = internal:

# /etc/postfix/transport_maps
# Marketing sender domains → bulk transport
@campaigns.prachyam.com    bulk:
@marketing.prachyam.com    bulk:

Third — and this is the part that's easy to miss — the per-transport limits have to go where the queue manager can actually see them. The *_destination_* scheduling parameters are read by qmgr(8) from main.cf, keyed on the master.cf service name. Setting them as -o overrides on the bulk and internal entries looks plausible (it's the version most forum posts hand you), but those overrides only configure the smtp(8) delivery agents and never reach the queue manager that does the scheduling:
# /etc/postfix/main.cf
# qmgr(8) reads these here, keyed on the master.cf service names
bulk_destination_rate_delay = 2s
bulk_destination_concurrency_limit = 10
bulk_extra_recipient_limit = 10
internal_destination_rate_delay = 0s
internal_destination_concurrency_limit = 50
# The active-queue cap itself stays global, at its default
qmgr_message_active_limit = 20000

After running postmap /etc/postfix/transport_maps and reloading Postfix, the change was immediate. I watched postqueue -p in one terminal and the mail log in another. The transactional messages that had been sitting in queue for over an hour cleared within minutes. The marketing batch kept going, unaffected — it just no longer owned the whole queue.
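The apply-and-verify loop, roughly as I ran it: postmap -q replays the exact lookup Postfix will make, and the last line of postqueue -p is a running total:

postmap /etc/postfix/transport_maps
postfix reload

# Replay the sender-domain lookup
postmap -q "@campaigns.prachyam.com" hash:/etc/postfix/transport_maps
# should print: bulk:

# Watch the backlog drain; the summary line reads
# "-- NNN Kbytes in NNN Requests."
watch -n 10 'postqueue -p | tail -n 1'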
The starvation problem went away. Not "improved" — went away. Internal mail started delivering within its normal window regardless of what the marketing batch was doing. The two queues were independent, and they stayed that way through the rest of the campaign run.
Across the full 6-month campaign, the stack processed approximately 20 million outbound emails to the marketing dataset, plus internal mail for 20 people. Total infra cost: ₹14,000/year (~$151) for six servers and twelve domains. The Mailchimp alternative for the same volume would have been a custom enterprise quote north of $300,000. Even AWS SES + Listmonk — the credible DIY middle ground — would have run around $7,500 for the campaign run alone.
The priority queue configuration wasn't a huge amount of code. The relevant master.cf and main.cf additions come to about ten lines, and the transport map is straightforward. What took time was understanding why the default configuration didn't handle this, rather than just cargo-culting a solution from a forum post. The queue semantics, the transport abstraction, the way qmgr manages active slots — once those clicked, the fix was obvious.
What would I do differently? The honest answer: separate the transactional and bulk sending paths earlier, before they're sharing a server at all. The priority queue configuration works, and it worked throughout the rest of the campaign, but it's a configuration solution to an architecture problem. If the marketing blast had been running through a dedicated server from the start, the starvation could never have happened — there's nothing to share.
The reason I didn't do that initially was operational: fewer servers means fewer log streams to watch, fewer configs to keep in sync, fewer things to go wrong in unfamiliar ways. For a team without a dedicated DevOps person, that reasoning held up. But the configuration complexity of the priority queue — the transport maps, the per-transport rate limits, the DKIM alignment across queue classes — is its own kind of operational surface. Whether it's cheaper than a second server depends on how much you trust your own Postfix knowledge, which before this incident I'd overestimated.
I'd also automate the DNS verification checks post-DKIM rotation across all twelve domains. The current setup requires manually running dig against each domain to confirm SPF, DKIM, and DMARC records are propagating correctly. A small shell script could catch misconfigurations before they affect deliverability instead of after. That's a gap that hasn't bitten us yet, but only because we've been careful.
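A sketch of that script. The selector and resolver are assumptions (Mailcow's default DKIM selector is dkim; substitute whatever each domain actually uses), and the domain list is abbreviated:

#!/usr/bin/env bash
# Verify SPF, DKIM, and DMARC TXT records for every sending domain.
set -u
SELECTOR="dkim"          # assumed Mailcow default selector
RESOLVER="@1.1.1.1"      # external resolver, avoids stale local caches
DOMAINS="campaigns.prachyam.com marketing.prachyam.com"   # ...all twelve

fail=0
for d in $DOMAINS; do
  spf=$(dig +short "$RESOLVER" TXT "$d" | grep -c 'v=spf1')
  dkim=$(dig +short "$RESOLVER" TXT "$SELECTOR._domainkey.$d" | grep -c 'v=DKIM1')
  dmarc=$(dig +short "$RESOLVER" TXT "_dmarc.$d" | grep -c 'v=DMARC1')
  [ "$spf" -ge 1 ]   || { echo "MISSING SPF:   $d"; fail=1; }
  [ "$dkim" -ge 1 ]  || { echo "MISSING DKIM:  $SELECTOR._domainkey.$d"; fail=1; }
  [ "$dmarc" -ge 1 ] || { echo "MISSING DMARC: _dmarc.$d"; fail=1; }
done
exit $fail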
The mail stack is handed off to the Prachyam team now. The priority queue is running. The cost savings are permanent as long as the servers stay up. But the thing I think about is those transactional messages sitting in the queue for over an hour — the users who couldn't log in, couldn't reset their passwords, were just waiting. A configuration decision I hadn't made yet was the direct cause of that. Getting the fix right required understanding the system, not just fixing the symptom. That's the part worth keeping.