TestMesh
Architecture

Modular Monolith Pattern

Why TestMesh uses a modular monolith and how it enables future microservices migration.

TestMesh is structured as a modular monolith: a single deployable binary organized into domain modules with enforced boundaries. This is a deliberate design choice, not a stepping stone toward microservices.

What Is a Modular Monolith?

A modular monolith is a single process divided into modules that have:

  • Clear ownership: each module owns its data and logic
  • Explicit interfaces: modules interact through defined Go interfaces, not internal package access
  • No circular dependencies: the dependency graph is a DAG
  • Separate database schemas: each domain owns its schema, not individual tables across schemas

It is different from a "big ball of mud" monolith (no structure) and different from microservices (separate processes).

Modular Monolith                     Microservices
────────────────────                 ────────────────────
┌───────────────────┐                ┌──────┐  ┌──────┐
│  Single Process   │                │ API  │  │ Jobs │
│  ┌─────────────┐  │                └──┬───┘  └──┬───┘
│  │  API Domain │  │                   │ HTTP    │ gRPC
│  ├─────────────┤  │                ┌──▼─────────▼───┐
│  │  Runner     │  │                │    Runner      │
│  ├─────────────┤  │                └────────┬───────┘
│  │  Scheduler  │  │                         │ HTTP
│  ├─────────────┤  │                ┌─────────▼──────┐
│  │  Storage    │  │                │   Storage      │
│  └─────────────┘  │                └────────────────┘
└───────────────────┘

Benefits for This Use Case

For Development Speed

A single codebase means:

  • One go build produces the entire system
  • Cross-domain refactors are IDE-assisted (rename across modules instantly)
  • No API contracts to maintain between internal services
  • Stack traces include the full call chain, not "service A called service B returned 500"

For Performance

In-process function calls are approximately 1000x faster than HTTP between services:

Communication Type          Latency
────────────────────────    ──────────────────
In-process function call    ~1–10 microseconds
HTTP to localhost           ~1–5 milliseconds
HTTP across network         ~5–50 milliseconds

For a test execution that calls the runner 100 times per flow, the gap compounds: roughly 100–500 ms of added latency over localhost HTTP, versus well under a millisecond in-process.

For Operational Simplicity

  • One binary to deploy, monitor, and debug
  • One set of logs to search
  • One deployment unit to roll back
  • Database transactions span domains — no distributed transaction complexity

For Future Flexibility

The modular structure means microservices extraction is a mechanical change when needed:

  1. Add an HTTP or gRPC API layer on top of the domain
  2. Replace direct function calls with RPC calls at the call sites
  3. Deploy as a separate service
  4. Update Kubernetes manifests

Estimated effort per domain: 2–4 weeks.


Domain Boundaries

What Makes a Boundary

Each domain in TestMesh satisfies:

  1. Single responsibility: the domain does one thing (execute flows, store data, schedule jobs, handle HTTP)
  2. Its own schema: flows.*, executions.*, scheduler.* — not shared tables
  3. Exported interface: other domains call the domain through its public Go interface, not internal implementation details
  4. No upward calls: lower layers never depend on higher ones (Storage doesn't call Runner; Runner doesn't call API)

The Allowed Dependency Graph

API ──────────────────► Scheduler
 │                          │
 │                          │ (Redis Streams)
 │                          ▼
 └──────────────────► Runner


                       Storage


                        Shared
                    (DB, Redis, Config, Logger)

(Storage and Shared sit below the other domains: anything above may depend on them, never the reverse.)

Reverse arrows are not allowed. This is enforced via Go's internal/ package visibility — each domain's implementation is unexported; only its public interface is accessible.
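
On disk, that enforcement might look like the following (a hypothetical layout; file and package names are illustrative):

```
testmesh/
├── api/
│   ├── api.go          // exported interface + constructor
│   └── internal/       // handlers, routing — invisible to other domains
├── runner/
│   ├── runner.go       // Executor interface + constructor
│   └── internal/       // execution engine
├── scheduler/
│   ├── scheduler.go
│   └── internal/
├── storage/
│   ├── storage.go      // FlowRepository interface
│   └── internal/
└── shared/             // DB, Redis, Config, Logger
```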

Interface Example

// In package runner: the domain exposes a small public interface
type Executor interface {
    Execute(ctx context.Context, flow *Flow) (*ExecutionResult, error)
}

// In package api: the API domain depends on the interface,
// not the concrete implementation
type FlowHandler struct {
    executor runner.Executor  // interface, not *runner.ConcreteExecutor
    flowRepo storage.FlowRepository
}

Database Schema Organization

Each domain owns its own PostgreSQL schema:

-- API domain does not own any tables directly
-- Runner domain
CREATE SCHEMA executions;
CREATE TABLE executions.executions (...);
CREATE TABLE executions.execution_steps (...);

-- Scheduler domain
CREATE SCHEMA scheduler;
CREATE TABLE scheduler.schedules (...);
CREATE TABLE scheduler.jobs (...);

-- Storage domain
CREATE SCHEMA flows;
CREATE TABLE flows.flows (...);
CREATE TABLE flows.versions (...);

This makes a future database split straightforward: dump each schema to its own instance and update the connection strings. Because no tables are shared across domains, there is nothing to untangle.


Migration Path to Microservices

The current architecture is production-ready and scales well with horizontal replication. Microservices extraction should only happen when there is a concrete, measured reason — not as a proactive architectural exercise.

When to Split

Split a domain into a separate service when:

  • That domain needs 10x more capacity than the rest of the system
  • Teams need to deploy it independently on a different schedule
  • The domain needs a different technology (e.g., the runner needs GPU access)
  • Team structure has grown to where separate ownership is needed

Do not split just because:

  • It "feels more modern"
  • You expect to need it someday
  • Traffic is growing but still manageable with replicas

Phase 1: Extract Storage

// Before (in-process)
flow, err := storage.Get(ctx, id)

// After (HTTP client)
client := flowsapi.NewClient("http://storage-service:5016")
flow, err := client.GetFlow(ctx, id)

Cost: +1–10 ms per storage call. Benefit: storage scales independently.

Phase 2: Extract Runner

// Before (in-process)
result, err := executor.Execute(ctx, flow)

// After (gRPC)
client := runnerapi.NewClient("runner-service:50051")
result, err := client.Execute(ctx, flow)

Cost: +1–10 ms per execution dispatch. Benefit: runner scales independently.

Phase 3: Extract Scheduler

The scheduler already communicates with the runner via Redis Streams — no code change needed to deploy it as a separate process. Just point it at the same Redis instance and give it the runner service address.


Summary

Property          Value
────────────────  ────────────────────────────────────────────────────────
Architecture      Modular Monolith
Domains           4 (API, Runner, Scheduler, Storage)
Communication     In-process function calls + Redis Streams (async)
Database          Single PostgreSQL instance with separate schemas per domain
Deployment        Single binary + Docker / Kubernetes
Migration cost    2–4 weeks per domain when splitting is needed
