
Building a Production-Ready File Upload System: My 7-Phase Journey


A detailed walkthrough of building a scalable, secure file upload system with direct-to-S3 uploads, virus scanning, and background processing.


Why I Built This

Like many developers, I started with the "simple" approach to file uploads: receive the file on my server, save it somewhere, and call it done. Then reality hit:

  • Server memory exploded with large file uploads
  • Users waited forever while files uploaded AND processed
  • No virus scanning = security nightmare waiting to happen
  • Zero scalability - couldn't handle multiple uploads simultaneously

I needed something production-ready. So I built it, learned a ton, and now I'm sharing the entire journey with you.

What You'll Learn

By the end of this post, you'll understand:

  • Why direct-to-S3 uploads matter (and how to implement them)
  • How to scan files for viruses without blocking your API
  • Background job processing with BullMQ and Redis
  • Making your system Kubernetes-ready with health checks
  • Packaging everything for production deployment

Tech Stack: NestJS, BullMQ, Redis, S3, ClamAV, Sharp, Docker, Kubernetes

Quick Tech Stack Overview

Before we dive in, here's what each technology does:

  • NestJS: A progressive Node.js framework with TypeScript support, dependency injection, and excellent structure. Think Express.js but with batteries included and enterprise-ready patterns built-in.

  • BullMQ: Redis-based job queue for handling background tasks. Manages job scheduling, retries, priorities, and failure handling.

  • Redis: In-memory data store used by BullMQ. Super fast (sub-millisecond operations) for coordinating work between services.

  • S3 (AWS): Cloud object storage with infinite scalability. Stores files with 99.999999999% durability across multiple data centers.

  • ClamAV: Open-source antivirus engine for scanning uploaded files for malware, viruses, and malicious content.

  • Sharp: High-performance image processing library. Resizes, crops, and optimizes images 4-5x faster than alternatives.

  • Docker: Containerization platform that packages applications with all dependencies. "Works on my machine" becomes "works everywhere."

  • Kubernetes: Container orchestration system that manages deployment, scaling, and operation of containerized applications across clusters.


Phase 1: The Naive Approach (And Why It Failed)

What I Built First

// DON'T DO THIS - The naive approach
@Post('upload')
@UseInterceptors(FileInterceptor('file')) // Multer buffers the file for us
async uploadFile(@UploadedFile() file: Express.Multer.File) {
  // File goes through your server's memory
  await this.saveFile(file);
  return { success: true };
}

The Problems

  1. Memory Issues: A 100MB file = 100MB in server RAM
  2. Slow: Upload → Process → Respond (everything sequential)
  3. Not Scalable: 10 concurrent uploads = server crash
  4. No Validation: Viruses? Malicious files? Good luck!

What I Learned

Lesson #1: Your server shouldn't be a middleman for large files.

The solution? Presigned URLs - let clients upload directly to S3.


Phase 2: Direct-to-S3 with Presigned URLs

The Breakthrough

What is S3? Amazon S3 (Simple Storage Service) is a cloud object storage service that's designed to store and retrieve any amount of data from anywhere. Think of it as an infinitely scalable hard drive in the cloud. It's highly reliable (99.999999999% durability), fast, and handles millions of requests per second. Instead of storing files on your server's limited disk space, you store them in S3 where they're automatically replicated across multiple data centers.

Instead of files flowing through my server, I became a URL generator:

// The better way - Presigned URLs
@Post('upload/request')
async requestUpload(@Body() dto: UploadRequestDto) {
  // 1. Validate the request (file type, size)
  this.validateRequest(dto);

  // 2. Generate presigned URL
  const uploadUrl = await this.s3Service.generatePresignedUrl({
    bucket: this.configService.get('S3_BUCKET'),
    key: `uploads/${uuid()}/${dto.filename}`,
    contentType: dto.contentType,
    size: dto.size,
    expiresIn: 300 // 5 minutes
  });

  // 3. Store metadata (we'll track this upload)
  const fileId = await this.uploadRepo.create({
    filename: dto.filename,
    status: 'pending',
    size: dto.size,
  });

  return { uploadUrl, fileId };
}
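
The `validateRequest` call above does the real gatekeeping before any URL is issued. A minimal sketch of such a validator - the allow-list, the 50MB cap, and the filename rules here are my own assumptions, not the repo's actual limits:

```typescript
// Hypothetical request-side validation; tune the allow-list and cap to your needs
const ALLOWED_TYPES = new Set(["image/jpeg", "image/png", "application/pdf"]);
const MAX_SIZE_BYTES = 50 * 1024 * 1024; // 50MB

interface UploadRequestDto {
  filename: string;
  contentType: string;
  size: number;
}

function validateRequest(dto: UploadRequestDto): void {
  if (!ALLOWED_TYPES.has(dto.contentType)) {
    throw new Error(`Content type not allowed: ${dto.contentType}`);
  }
  if (dto.size <= 0 || dto.size > MAX_SIZE_BYTES) {
    throw new Error(`File size out of range: ${dto.size}`);
  }
  if (dto.filename.includes("/") || dto.filename.includes("..")) {
    throw new Error("Suspicious filename");
  }
}
```

Rejecting here is cheap: no presigned URL is ever minted for a request that would fail anyway.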

How It Works

Client                    API                     S3
  |                        |                       |
  |---(1) Request URL----->|                       |
  |                        |---(2) Generate------->|
  |                        |<----Presigned URL-----|
  |<---(3) Return URL------|                       |
  |                                                 |
  |-----------(4) Upload directly----------------->|
  |                                                 |

What are Presigned URLs? A presigned URL is a temporary, secure URL that grants time-limited access to a specific S3 object. Instead of giving users permanent access credentials, you generate a URL that's only valid for a short period (like 5 minutes) and for a specific operation (like uploading one file). Think of it as a temporary key card that expires - secure, controlled, and perfect for allowing direct uploads without exposing your AWS credentials.

The Magic

  • Zero server memory usage for file content
  • Parallel uploads - S3 handles the load
  • Secure - URLs expire in 5 minutes and are scoped to a single key and operation
  • Fast - Direct connection to S3
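
From the client's side, step 4 is just a plain HTTP PUT to the presigned URL - no SDK, no credentials. A hedged sketch of the two-step flow (the endpoint shape is illustrative, and the fetch implementation is injected so the logic stays testable):

```typescript
type Fetcher = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: Uint8Array }
) => Promise<{ ok: boolean }>;

// Two-step client flow: ask the API for a URL, then PUT the bytes straight to S3
async function uploadDirect(
  file: { name: string; type: string; size: number; bytes: Uint8Array },
  requestUrl: (dto: object) => Promise<{ uploadUrl: string; fileId: string }>,
  fetcher: Fetcher
): Promise<string> {
  // 1. Ask our API for a presigned URL (hypothetical endpoint shape)
  const { uploadUrl, fileId } = await requestUrl({
    filename: file.name,
    contentType: file.type,
    size: file.size,
  });

  // 2. Upload directly to S3 - the API server never sees the file body
  const res = await fetcher(uploadUrl, {
    method: "PUT",
    headers: { "Content-Type": file.type },
    body: file.bytes,
  });
  if (!res.ok) throw new Error("Direct upload failed");

  return fileId; // poll this id later for scan/processing status
}
```

In a browser you'd pass `fetch` as the fetcher and a `File` object's bytes as the body; the Content-Type header must match what was signed.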

Implementation Deep Dive

// src/s3/s3.service.ts
@Injectable()
export class S3Service {
  private readonly s3: S3Client;

  constructor(private config: ConfigService) {
    this.s3 = new S3Client({
      region: config.get("AWS_REGION"),
      credentials: {
        accessKeyId: config.get("AWS_ACCESS_KEY_ID"),
        secretAccessKey: config.get("AWS_SECRET_ACCESS_KEY"),
      },
    });
  }

  async generatePresignedUrl(params: PresignedUrlParams): Promise<string> {
    const command = new PutObjectCommand({
      Bucket: params.bucket,
      Key: params.key,
      ContentType: params.contentType,
      // Security: pin the exact size the client declared
      ContentLength: params.size,
    });

    return getSignedUrl(this.s3, command, {
      expiresIn: params.expiresIn,
    });
  }
}
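
One detail worth calling out: the object key embeds a user-supplied filename, so it pays to normalize it before signing anything. A small sketch - the exact sanitization rules here are my own assumption, not the repo's:

```typescript
import { randomUUID } from "crypto";

// Build a collision-free S3 key from an untrusted filename.
// randomUUID() namespaces each upload; the filename is reduced to a safe charset.
function buildObjectKey(filename: string): string {
  const safeName = filename
    .split(/[\\/]/).pop()!            // drop any path components
    .replace(/[^a-zA-Z0-9._-]/g, "_") // allow only a conservative charset
    .slice(0, 128);                   // keep keys short
  return `uploads/${randomUUID()}/${safeName || "file"}`;
}
```

The UUID prefix also means two users uploading `report.pdf` at the same moment can never clobber each other.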

What I Learned

Lesson #2: Use presigned URLs for uploads. Your server generates URLs, S3 handles the heavy lifting.

But now I had a new problem: How do I know when uploads complete? And how do I scan them for viruses?


Phase 3: Virus Scanning with ClamAV

The Security Wake-Up Call

Allowing direct uploads is great for performance, but terrifying for security. Users could upload anything:

  • Malware
  • Ransomware
  • Trojan horses
  • Viruses disguised as PDFs

Enter ClamAV

What is ClamAV? ClamAV (Clam AntiVirus) is an open-source antivirus engine designed for detecting trojans, viruses, malware, and other malicious threats. It's free, actively maintained, and widely used in production systems for scanning files. Unlike commercial antivirus software, it's designed to be integrated into applications via its API. It maintains an up-to-date virus database and can scan files in milliseconds to seconds depending on size.

I integrated ClamAV into my system:

// src/scanning/clamav.service.ts
@Injectable()
export class ClamAVService implements OnModuleInit {
  private client: NodeClam;

  async onModuleInit(): Promise<void> {
    // Connect to the clamd daemon once at startup (clamscan npm package)
    this.client = await new NodeClam().init({
      clamdscan: { host: "clamav", port: 3310 },
    });
  }

  async scanFile(s3Key: string): Promise<ScanResult> {
    // 1. Download file from S3 to temp location
    const tempPath = await this.downloadToTemp(s3Key);

    try {
      // 2. Scan with ClamAV
      const { isInfected, viruses } = await this.client.scanFile(tempPath);

      if (isInfected) {
        // 3. Quarantine infected files
        await this.quarantineFile(s3Key);

        return {
          status: "infected",
          threats: viruses,
        };
      }

      return { status: "clean" };
    } finally {
      // 4. Always cleanup temp files
      await fs.unlink(tempPath);
    }
  }

  private async quarantineFile(s3Key: string): Promise<void> {
    // Move to quarantine bucket
    await this.s3.copyObject({
      CopySource: `${this.bucket}/${s3Key}`,
      Bucket: this.quarantineBucket,
      Key: s3Key,
    });

    // Delete from main bucket
    await this.s3.deleteObject({
      Bucket: this.bucket,
      Key: s3Key,
    });
  }
}

The Architecture

Upload Complete
      |
      v
[S3 Event Notification]
      |
      v
[Trigger Scan Job]
      |
      v
[ClamAV Scans File]
      |
      +--- Clean? --> [Mark as safe]
      |
      +--- Infected? --> [Quarantine + Alert]
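
The first hop in that diagram, the S3 event notification, has one gotcha: the payload nests bucket and key, and object keys arrive URL-encoded with spaces as `+`. A sketch of unpacking it (the record shape follows S3's documented event format; the surrounding handler wiring is assumed):

```typescript
// Minimal S3 event record shape - only the fields we actually use
interface S3Event {
  Records: Array<{
    s3: { bucket: { name: string }; object: { key: string } };
  }>;
}

// Extract (bucket, key) pairs; S3 URL-encodes keys and encodes spaces as '+'
function extractUploads(event: S3Event): Array<{ bucket: string; key: string }> {
  return event.Records.map((r) => ({
    bucket: r.s3.bucket.name,
    key: decodeURIComponent(r.s3.object.key.replace(/\+/g, " ")),
  }));
}
```

Forgetting the decode step is a classic bug: the scan job then looks up a key with literal `%2F` or `+` in it and finds nothing.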

Docker Setup

Running ClamAV locally for development:

# docker-compose.yml
services:
  clamav:
    image: clamav/clamav:latest
    ports:
      - "3310:3310"
    volumes:
      - clamav-data:/var/lib/clamav
    environment:
      - CLAMAV_NO_FRESHCLAM=false
    healthcheck:
      test: ["CMD", "clamdscan", "--ping"]
      interval: 30s
      timeout: 10s
      retries: 3

What I Learned

Lesson #3: Never trust user uploads. Scan everything before making it available.

But scanning takes time (5-30 seconds per file). I couldn't block the API waiting for scans...


Phase 4: Background Jobs with BullMQ

The Problem

Virus scanning is slow:

  • Small files: 2-5 seconds
  • Large files: 10-30 seconds
  • Images to process: +10 seconds

I couldn't make users wait 40+ seconds for a response!

The Solution: Job Queues

Enter BullMQ - a Redis-based job queue.

What is BullMQ? It's a powerful Node.js library that turns Redis into a robust job queue system. Think of it as a to-do list manager for your application: you add tasks to the queue, and worker processes pick them up and execute them in the background. BullMQ handles all the complexity of job scheduling, retries, priorities, and failure handling. It's production-ready, widely used, and integrates seamlessly with NestJS.

Here's how I used it:

// src/jobs/scan-job.producer.ts
@Injectable()
export class ScanJobProducer {
  constructor(
    @InjectQueue("file-processing")
    private queue: Queue
  ) {}

  async queueFileScan(fileId: string, s3Key: string): Promise<void> {
    await this.queue.add(
      "scan-file",
      { fileId, s3Key },
      {
        priority: 1, // Scans jump the queue (lower number = higher priority)
        attempts: 3,
        backoff: {
          type: "exponential",
          delay: 2000,
        },
      }
    );
  }
}

// workers/scan-job.consumer.ts
@Processor("file-processing")
export class ScanJobConsumer extends WorkerHost {
  constructor(
    private clamav: ClamAVService,
    private uploads: UploadRepository,
    private alertService: AlertService
  ) {
    super();
  }

  // BullMQ (via @nestjs/bullmq) routes every job here; dispatch on job.name
  async process(job: Job<ScanJobData>): Promise<void> {
    if (job.name !== "scan-file") return;

    const { fileId, s3Key } = job.data;

    // Update status: scanning
    await this.uploads.update(fileId, { status: "scanning" });

    // Perform scan
    const result = await this.clamav.scanFile(s3Key);

    if (result.status === "infected") {
      await this.uploads.update(fileId, {
        status: "quarantined",
        scanResult: result.threats,
      });

      // Alert admins
      await this.alertService.notifyInfectedFile(fileId);
    } else {
      await this.uploads.update(fileId, {
        status: "clean",
        scannedAt: new Date(),
      });

      // Queue image processing if needed
      if (this.isImage(s3Key)) {
        await this.queueImageProcessing(fileId, s3Key);
      }
    }
  }
}

The Flow

API Request
    |
    v
[Generate Presigned URL]
    |
    v
[Return immediately] <---- User gets fast response
    |
    v
[Queue scan job]
    |
    v
[Worker picks up job]
    |
    v
[Scan in background]
    |
    v
[Update status in DB]

Job Priorities

// High priority: Security scans
await queue.add("scan-file", data, { priority: 1 });

// Medium priority: Image processing
await queue.add("process-image", data, { priority: 5 });

// Low priority: Analytics
await queue.add("update-stats", data, { priority: 10 });
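
For the retry options used above, BullMQ's built-in exponential strategy doubles the wait on every attempt - delay * 2^(n-1), if I have the formula right. A quick sketch of the resulting schedule:

```typescript
// Delay before retry n (1-based), assuming BullMQ's documented
// exponential backoff formula: delay * 2^(attempt - 1)
function backoffSchedule(baseDelayMs: number, attempts: number): number[] {
  return Array.from({ length: attempts }, (_, i) => baseDelayMs * 2 ** i);
}

// With { attempts: 3, backoff: { type: "exponential", delay: 2000 } }
// the retries wait roughly 2s, 4s, then 8s before giving up
```

Three attempts with a 2s base keeps transient ClamAV hiccups invisible while capping the worst case at about 14 seconds of waiting.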

Monitoring Jobs

// Check job status
@Get('upload/:id/status')
async getStatus(@Param('id') fileId: string) {
  const upload = await this.uploadRepo.findOne(fileId);

  return {
    status: upload.status, // pending, scanning, clean, infected
    progress: upload.progress,
    result: upload.scanResult,
  };
}
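
On the client, that status endpoint pairs naturally with a small polling loop. A sketch with the HTTP call injected (the interval, attempt cap, and terminal states are assumptions):

```typescript
type StatusFetcher = (fileId: string) => Promise<{ status: string }>;

// Poll until the upload reaches a terminal state, with a hard cap on attempts
async function waitForScan(
  fileId: string,
  getStatus: StatusFetcher,
  { intervalMs = 1000, maxAttempts = 30 } = {}
): Promise<string> {
  const terminal = new Set(["clean", "infected", "quarantined"]);
  for (let i = 0; i < maxAttempts; i++) {
    const { status } = await getStatus(fileId);
    if (terminal.has(status)) return status;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error("Timed out waiting for scan");
}
```

In a real UI you'd likely swap this for server-sent events or a webhook, but polling is the simplest thing that works.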

What I Learned

Lesson #4: Separate sync from async operations. Return fast, process in the background.

Now I had async processing, but what about images? Users upload photos - they need thumbnails!


Phase 5: Image Processing with Sharp

The Image Problem

Users upload high-resolution images:

  • 4K photos: 8-15 MB
  • Phone pics: 3-8 MB
  • Screenshots: 1-5 MB

Nobody wants to load 10MB images in a gallery. I needed:

  • Thumbnails (150x150)
  • Medium sizes (800x600)
  • Optimized originals
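
Those three targets can live in one declarative spec, which keeps the worker from hard-coding sizes. A sketch - the dimensions and qualities mirror the ones used in the worker below, but pulling them into config is my own choice:

```typescript
// One place to declare every variant the pipeline should produce
interface VariantSpec {
  width: number;
  height: number;
  fit: "cover" | "inside"; // matches Sharp's resize fit options
  quality: number;
}

const VARIANTS: Record<string, VariantSpec> = {
  thumbnail: { width: 150, height: 150, fit: "cover", quality: 80 },
  medium: { width: 800, height: 600, fit: "inside", quality: 85 },
};

// Derive the S3 key for a variant from the original key
function variantKey(s3Key: string, variant: string): string {
  return `${s3Key}-${variant === "thumbnail" ? "thumb" : variant}.jpg`;
}
```

Adding a new size later then means one new entry in `VARIANTS`, not another copy-pasted worker method.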

Sharp to the Rescue

What is Sharp? Sharp is a high-performance Node.js image processing library built on libvips. It's incredibly fast - typically 4-5x faster than ImageMagick or GraphicsMagick - because it's written in C++ and optimized for modern CPUs. Sharp can resize, crop, rotate, and convert images between formats with minimal memory usage. It's the go-to choice for production image processing in Node.js applications.

Here's how I used Sharp:

// workers/image-processing.consumer.ts
@Processor("file-processing")
export class ImageProcessingConsumer extends WorkerHost {
  constructor(
    private s3: S3Service,
    private uploads: UploadRepository
  ) {
    super();
  }

  // Same queue as the scan consumer, so dispatch on job.name here too
  async process(job: Job<ImageJobData>): Promise<void> {
    if (job.name !== "process-image") return;

    const { fileId, s3Key } = job.data;

    // Download original
    const imageBuffer = await this.s3.downloadFile(s3Key);

    // Process in parallel
    await Promise.all([
      this.createThumbnail(imageBuffer, s3Key),
      this.createMedium(imageBuffer, s3Key),
      this.optimizeOriginal(imageBuffer, s3Key),
    ]);

    // Update metadata
    await this.uploads.update(fileId, {
      processed: true,
      variants: {
        thumbnail: `${s3Key}-thumb.jpg`,
        medium: `${s3Key}-medium.jpg`,
        original: s3Key,
      },
    });
  }

  private async createThumbnail(buffer: Buffer, s3Key: string): Promise<void> {
    const thumbnail = await sharp(buffer)
      .resize(150, 150, {
        fit: "cover",
        position: "center",
      })
      .jpeg({ quality: 80 })
      .toBuffer();

    await this.s3.uploadFile(`${s3Key}-thumb.jpg`, thumbnail, "image/jpeg");
  }

  private async createMedium(buffer: Buffer, s3Key: string): Promise<void> {
    const medium = await sharp(buffer)
      .resize(800, 600, {
        fit: "inside",
        withoutEnlargement: true,
      })
      .jpeg({ quality: 85 })
      .toBuffer();

    await this.s3.uploadFile(`${s3Key}-medium.jpg`, medium, "image/jpeg");
  }

  private async optimizeOriginal(buffer: Buffer, s3Key: string): Promise<void> {
    const metadata = await sharp(buffer).metadata();

    // Only optimize if it's huge
    if (metadata.width > 2000 || metadata.height > 2000) {
      const optimized = await sharp(buffer)
        .resize(2000, 2000, {
          fit: "inside",
          withoutEnlargement: true,
        })
        .jpeg({ quality: 90 })
        .toBuffer();

      await this.s3.uploadFile(s3Key, optimized, "image/jpeg");
    }
  }
}

The Processing Pipeline

Image Upload
    |
    v
[Scan for viruses]
    |
    v
Clean? --> [Queue image processing]
    |
    v
[Download from S3]
    |
    v
[Process in parallel]
    |
    +---> [Thumbnail 150x150]
    |
    +---> [Medium 800x600]
    |
    +---> [Optimized original]
    |
    v
[Upload variants to S3]
    |
    v
[Update metadata]

Performance Wins

// Before: Sequential processing
const thumb = await createThumbnail(); // 2s
const medium = await createMedium(); // 3s
const optimized = await optimize(); // 4s
// Total: 9 seconds

// After: Parallel processing
await Promise.all([
  createThumbnail(), // 2s
  createMedium(), // 3s
  optimize(), // 4s
]);
// Total: 4 seconds (bottlenecked by slowest)
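
A runnable toy version of that comparison, with timed stand-ins for the three Sharp operations (the delays are made up; only the shape of the win matters):

```typescript
// Stand-in for a processing step that takes `ms` milliseconds
const step = (name: string, ms: number): Promise<string> =>
  new Promise((resolve) => setTimeout(() => resolve(name), ms));

async function processParallel(): Promise<string[]> {
  // All three run concurrently; total time ~ the slowest step,
  // where running them sequentially would cost the sum
  return Promise.all([
    step("thumbnail", 20),
    step("medium", 30),
    step("optimized", 40),
  ]);
}
```

A nice property: Promise.all preserves input order in its results regardless of completion order, so the variants metadata stays stable.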

What I Learned

Lesson #5: Process images in the background. Generate multiple sizes in parallel. Users love fast galleries.

Everything worked locally, but how do I know if it's healthy in production?


Phase 6: Health Checks & Kubernetes Readiness

The Production Reality

In production (especially Kubernetes), you need to answer:

  • Is my app alive? (Liveness probe) - Should Kubernetes restart my container if it's stuck?
  • Is my app ready for traffic? (Readiness probe) - Should Kubernetes send requests to this instance?
  • Are my dependencies healthy? (Database, Redis, S3, ClamAV) - Can my app actually do its job?

Why health checks matter: Without them, Kubernetes keeps sending traffic to broken pods, or doesn't restart crashed containers. Health checks are how Kubernetes knows when something's wrong and what action to take. Think of them as the heartbeat and diagnostic system that keeps your application resilient.

Health Check Implementation

// src/health/health.controller.ts
@Controller("health")
export class HealthController {
  constructor(
    private health: HealthCheckService,
    private db: TypeOrmHealthIndicator,
    private redis: RedisHealthIndicator,
    private disk: DiskHealthIndicator,
    private s3: S3Service,
    private clamav: ClamAVService,
    private config: ConfigService
  ) {}

  // Basic liveness: "Am I running?"
  @Get("liveness")
  @HealthCheck()
  liveness() {
    // Terminus expects indicator results keyed by name
    return this.health.check([async () => ({ app: { status: "up" } })]);
  }

  // Detailed readiness: "Can I serve traffic?"
  @Get("readiness")
  @HealthCheck()
  readiness() {
    return this.health.check([
      // Database
      () => this.db.pingCheck("database", { timeout: 2000 }),

      // Redis
      () => this.redis.pingCheck("redis", { timeout: 2000 }),

      // S3 connectivity
      async () => this.checkS3(),

      // ClamAV
      async () => this.checkClamAV(),

      // Disk space
      () =>
        this.disk.checkStorage("disk", {
          thresholdPercent: 0.9, // Alert at 90%
          path: "/",
        }),
    ]);
  }

  @Get()
  @HealthCheck()
  check() {
    return this.health.check([
      () => this.db.pingCheck("database"),
      () => this.redis.pingCheck("redis"),
      async () => this.checkS3(),
      async () => this.checkClamAV(),
      () =>
        this.disk.checkStorage("disk", {
          thresholdPercent: 0.9,
          path: "/",
        }),
    ]);
  }

  private async checkS3(): Promise<HealthIndicatorResult> {
    try {
      await this.s3.headBucket({
        Bucket: this.config.get("S3_BUCKET"),
      });
      return { s3: { status: "up" } };
    } catch (error) {
      return { s3: { status: "down", error: error.message } };
    }
  }

  private async checkClamAV(): Promise<HealthIndicatorResult> {
    try {
      await this.clamav.ping();
      return { clamav: { status: "up" } };
    } catch (error) {
      return { clamav: { status: "down", error: error.message } };
    }
  }
}

Kubernetes Configuration

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: file-upload-api
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: api
          image: file-upload-api:latest
          ports:
            - containerPort: 3000

          # Liveness: Restart if this fails
          livenessProbe:
            httpGet:
              path: /health/liveness
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3

          # Readiness: Don't send traffic if this fails
          readinessProbe:
            httpGet:
              path: /health/readiness
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 2

Why This Matters

Without health checks:

Pod crashes --> Kubernetes doesn't know --> Traffic keeps coming --> Errors

With health checks:

Pod unhealthy --> Readiness fails --> No traffic --> Time to recover
Pod crashes --> Liveness fails --> Auto-restart --> Back online

Monitoring in Action

# Check health
curl http://api/health

# Response
{
  "status": "ok",
  "info": {
    "database": { "status": "up" },
    "redis": { "status": "up" },
    "s3": { "status": "up" },
    "clamav": { "status": "up" },
    "disk": { "status": "up", "usage": "45%" }
  }
}

# If something's wrong
{
  "status": "error",
  "error": {
    "clamav": { "status": "down", "error": "Connection refused" }
  }
}

What I Learned

Lesson #6: Health checks are non-negotiable in production. Kubernetes needs to know when to restart your pods and when to route traffic.

Now I had a production-ready system, but how do I deploy it?


Phase 7: Production Packaging & Deployment

The Deployment Challenge

I had working code, but needed:

  • Docker images for API and workers
  • Docker Compose for local/staging
  • Helm charts for Kubernetes
  • Documentation for deployment
  • SDK generation for client libraries

Docker Multi-Stage Builds

Why multi-stage builds? A multi-stage Dockerfile uses multiple FROM statements to create intermediate build stages. The first stage compiles your code with all dev dependencies (compilers, build tools, etc.), while the final stage copies only the compiled output and production dependencies. This results in much smaller images (often 50-70% smaller) because you're not shipping build tools to production. Smaller images = faster deployments, less storage, and smaller attack surface.

# apps/api/Dockerfile
FROM node:20-alpine AS builder

WORKDIR /app

# Install dependencies
COPY package*.json ./
RUN npm ci

# Build
COPY . .
RUN npm run build

# Production image
FROM node:20-alpine AS runner

WORKDIR /app

# Copy only production dependencies
COPY package*.json ./
RUN npm ci --omit=dev

# Copy built app
COPY --from=builder /app/dist ./dist

# Non-root user for security
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001
USER nodejs

EXPOSE 3000

CMD ["node", "dist/main"]

Docker Compose for Local Development

For local development, I use MinIO instead of AWS S3. MinIO is an open-source, S3-compatible object storage server that perfectly mimics AWS S3's API. The beauty? You can develop and test locally without touching AWS, and when you deploy to production, you simply swap the endpoint from MinIO to S3 - no code changes needed. It's a game-changer for local development.

# deploy/compose/docker-compose.yml
version: "3.8"

services:
  # API Server
  api:
    build:
      context: ../../apps/api
      dockerfile: Dockerfile
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - S3_ENDPOINT=http://minio:9000
      - CLAMAV_HOST=clamav
      - CLAMAV_PORT=3310
    depends_on:
      redis:
        condition: service_healthy
      minio:
        condition: service_healthy
      clamav:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  # Background Workers
  workers:
    build:
      context: ../../workers
      dockerfile: Dockerfile
    environment:
      - NODE_ENV=production
      - REDIS_HOST=redis
      - S3_ENDPOINT=http://minio:9000
      - CLAMAV_HOST=clamav
    depends_on:
      - redis
      - minio
      - clamav

  # Redis (Job Queue)
  # What is Redis? It's an in-memory data store that's incredibly fast (sub-millisecond latency).
  # We use it as the backbone for BullMQ to store job queues, track job status, and coordinate
  # between multiple worker processes. Think of it as a super-fast, shared memory space that
  # all your services can access to communicate and coordinate work.
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 5

  # MinIO (S3-compatible storage)
  # What is MinIO? It's an open-source, S3-compatible object storage server
  # that you can run locally or self-host. Perfect for development and testing
  # because it mimics AWS S3's API exactly, but runs on your machine.
  # In production, you'd swap this for real AWS S3 with zero code changes.
  minio:
    image: minio/minio:latest
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      - MINIO_ROOT_USER=minioadmin
      - MINIO_ROOT_PASSWORD=minioadmin
    command: server /data --console-address ":9001"
    volumes:
      - minio-data:/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  # ClamAV (Virus Scanner)
  clamav:
    image: clamav/clamav:latest
    ports:
      - "3310:3310"
    volumes:
      - clamav-data:/var/lib/clamav
    healthcheck:
      test: ["CMD", "clamdscan", "--ping"]
      interval: 60s
      timeout: 10s
      retries: 3
      start_period: 300s # ClamAV takes time to start

  # Redis Commander (UI for debugging)
  redis-commander:
    image: rediscommander/redis-commander:latest
    ports:
      - "8081:8081"
    environment:
      - REDIS_HOSTS=local:redis:6379
    depends_on:
      - redis

volumes:
  redis-data:
  minio-data:
  clamav-data:

Helm Charts for Kubernetes

What is Helm? Helm is the "package manager for Kubernetes" - think npm for Node.js or apt for Linux, but for Kubernetes applications. Instead of managing dozens of YAML files manually, Helm lets you define your entire application stack in a reusable "chart" with templating and versioning. You can deploy, upgrade, and rollback complex applications with a single command. It's essential for managing Kubernetes applications at scale.

# deploy/helm/file-upload-system/values.yaml
replicaCount: 3

image:
  repository: your-registry/file-upload-api
  tag: "1.0.0"
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 3000

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: api.yourdomain.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: api-tls
      hosts:
        - api.yourdomain.com

resources:
  limits:
    cpu: 1000m
    memory: 1Gi
  requests:
    cpu: 500m
    memory: 512Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

redis:
  enabled: true
  master:
    persistence:
      size: 8Gi

s3:
  bucket: my-uploads
  region: us-east-1

clamav:
  enabled: true
  resources:
    limits:
      memory: 2Gi
    requests:
      memory: 1Gi

Deployment Commands

# Local development with Docker Compose
docker-compose -f deploy/compose/docker-compose.yml up -d

# Kubernetes with Helm
helm install file-upload ./deploy/helm/file-upload-system \
  --set image.tag=v1.0.0 \
  --set s3.bucket=prod-uploads \
  --set ingress.hosts[0].host=api.yourcompany.com \
  --namespace production

# Update deployment
helm upgrade file-upload ./deploy/helm/file-upload-system \
  --set image.tag=v1.1.0

# Rollback if needed
helm rollback file-upload

SDK Generation

I created scripts to auto-generate client SDKs from my OpenAPI spec:

# scripts/generate-sdk.sh
#!/bin/bash

# Generate TypeScript SDK
npx @openapitools/openapi-generator-cli generate \
  -i openapi/openapi.yaml \
  -g typescript-axios \
  -o sdks/typescript \
  --additional-properties=npmName=@yourcompany/upload-sdk

# Generate Python SDK
npx @openapitools/openapi-generator-cli generate \
  -i openapi/openapi.yaml \
  -g python \
  -o sdks/python \
  --additional-properties=packageName=upload_sdk

What I Learned

Lesson #7: Production readiness means more than working code. You need Docker images, orchestration configs, monitoring, and docs.


Key Takeaways

Architecture Principles

  1. Decouple upload from processing

    • Use presigned URLs for uploads
    • Queue background jobs for processing
    • Return responses immediately
  2. Security is not optional

    • Scan every file for viruses
    • Validate content types and sizes
    • Use time-limited, tightly scoped URLs
  3. Design for failure

    • Retry failed jobs with exponential backoff
    • Quarantine infected files automatically
    • Health checks for all dependencies
  4. Make it observable

    • Health endpoints for monitoring
    • Job queue visibility
    • Structured logging
  5. Package for production

    • Docker images with multi-stage builds
    • Helm charts for Kubernetes
    • Docker Compose for local dev

Performance Wins

  • Direct-to-S3: Eliminated server bottleneck
  • Parallel processing: 60% faster image generation
  • Background jobs: Sub-second API responses
  • Horizontal scaling: 3+ replicas in Kubernetes

Tech Choices That Worked

NestJS - Great DI, decorators, TypeScript support
BullMQ - Reliable job queue with Redis
Sharp - Blazing fast image processing
ClamAV - Free, effective virus scanning
Docker Compose - Easy local development
Helm - Standard Kubernetes deployments


Results

Before (Naive approach):

  • Upload time: 15-30s for 10MB file
  • Server memory: 500MB+ per upload
  • Concurrent uploads: Max 5
  • Security: None

After (Production system):

  • Upload time: 2-3s for 10MB file
  • Server memory: <50MB per upload
  • Concurrent uploads: Unlimited (S3 handles it)
  • Security: 100% scanned, quarantine system

Try It Yourself

The entire project is open source and ready to run:

# Clone the repo
git clone https://github.com/Aymen-Guerrouf/file-upload-system

# Start everything with Docker Compose
cd file-upload-system
docker-compose -f deploy/compose/docker-compose.yml up -d

# Test it
curl http://localhost:3000/health

GitHub: Aymen-Guerrouf/file-upload-system


Final Thoughts

Building this system taught me that production-ready means so much more than "it works on my machine":

  • It means thinking about scale from day one
  • It means security can't be an afterthought
  • It means observability and monitoring are critical
  • It means packaging and deployment matter

If you're building a file upload system, don't start with the naive approach. Learn from my mistakes and build it right the first time.

What's Next?

Some ideas for extending this system:

  • CDN integration for serving files
  • Webhooks to notify clients of processing completion
  • Advanced image processing (face detection, OCR)
  • Video processing (transcoding, thumbnails)
  • Metrics and dashboards (Prometheus + Grafana)

Let's Connect

Found this helpful? Have questions? Want to contribute?

If you learned something, give the repo a star - it helps others find it!


Built with love and lots of coffee. Happy coding!
