Building a Distributed Storage Metadata Service in Go

Modern distributed storage systems like Amazon S3, Google Drive, and HDFS are built on a deceptively simple idea: separate metadata from data, and manage storage in chunks instead of files.

In this project, I built a production-style metadata service in Go that mimics core concepts of distributed object storage systems:

Chunk-based storage (4MB fixed-size blocks)
Content-addressed storage using SHA-256 hashing
Deduplication using reference counting
gRPC streaming upload/download APIs
PostgreSQL-backed metadata layer
Background garbage collection system

This article breaks down the architecture, design decisions, and internal mechanics of the system.

System Overview

At a high level, a client talks to four server-side components:

Client
  ↓
gRPC Metadata Service (Control Plane)
  ↓
PostgreSQL (Metadata Store)
  ↓
Local Disk Storage (Chunk Store)
  ↓
Garbage Collection Worker

The system is designed to simulate how real-world storage engines separate:

Control plane - metadata, indexing, lifecycle
Data plane - actual file storage
Background maintenance - GC, cleanup

Core Design Goals

The system was designed with the following goals:

1. Efficient large file handling

Files are split into 4MB chunks and streamed.

2. Deduplication at the storage layer

Identical chunks are stored only once using SHA-256 hashing.

3. Strong metadata consistency

PostgreSQL acts as the single source of truth.

4. Fault-tolerant deletion

Deletion is delayed and handled via garbage collection.

5. Streaming-first design

No full file buffering on the server side.

Architecture Breakdown

1. gRPC API Layer (Metadata Service)

The API layer exposes three core operations:

UploadObject (streaming upload)
DownloadObject (streaming download)
DeleteObject (logical deletion)

Why gRPC streaming?

Instead of uploading files in a single request, the system uses streaming:

Handles large files efficiently
Avoids memory pressure
Enables real-time chunk processing
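As a rough illustration of the streaming idea, the sketch below reads an input in fixed-size pieces and hands each piece to a callback, so only one chunk is in memory at a time. The names streamChunks, chunkLengths, and the send callback are hypothetical stand-ins; the article does not show the actual gRPC stream API.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// chunkSize is the 4 MB chunk size described in the article.
const chunkSize = 4 << 20

// streamChunks reads r in pieces of at most size bytes and passes each piece
// to send. send stands in for a gRPC stream's Send call (hypothetical).
// It returns the number of chunks sent.
func streamChunks(r io.Reader, size int, send func(chunk []byte) error) (int, error) {
	buf := make([]byte, size)
	sent := 0
	for {
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			// buf is reused on the next iteration, so send must not retain it.
			if sendErr := send(buf[:n]); sendErr != nil {
				return sent, sendErr
			}
			sent++
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return sent, nil // end of input, possibly after a short final chunk
		}
		if err != nil {
			return sent, err
		}
	}
}

// chunkLengths streams s and records the length of every chunk sent.
// It exists only to make the demo easy to inspect.
func chunkLengths(s string, size int) []int {
	var lengths []int
	_, _ = streamChunks(strings.NewReader(s), size, func(c []byte) error {
		lengths = append(lengths, len(c))
		return nil
	})
	return lengths
}

func main() {
	// A tiny chunk size keeps the demo readable; the service would use chunkSize.
	fmt.Println(chunkLengths("abcdefghij", 4)) // prints [4 4 2]
}
```

The last chunk is allowed to be shorter than the chunk size, which is why io.ErrUnexpectedEOF is treated as a normal end of input here.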

2. Upload Pipeline

The upload flow is the most critical part of the system.

Step-by-step flow

Client streams the file in 4MB chunks
Server receives a chunk
SHA-256 hash is computed
Chunk is written to disk (if not already present)
Chunk metadata is stored in PostgreSQL
Object -> chunk mapping is created

Key design decision: content addressing

Each chunk is stored using:

SHA256(chunk_data) → filename

This ensures:

Deterministic storage
Deduplication across all objects
Fast lookup without scanning disk

Storage layout

data/chunks/ab/cd/<sha256-hash>

This prevents single-directory file explosion and improves filesystem performance.
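A minimal sketch of content addressing with this layout follows. It assumes the two directory levels are the first two byte pairs of the hex hash, which the layout above suggests but does not state. The function names (chunkHash, chunkPath, storeChunk, storeTwice) are illustrative, not taken from the project.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// chunkHash returns the lowercase hex SHA-256 digest used as the chunk's name.
func chunkHash(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// chunkPath builds <root>/ab/cd/<hash> from the first four hex characters.
// The sharding rule is an assumption based on the layout shown above.
func chunkPath(root, hash string) string {
	return filepath.Join(root, hash[0:2], hash[2:4], hash)
}

// storeChunk writes data under its content address and reports whether a new
// file was created. An existing file means the chunk is a duplicate.
func storeChunk(root string, data []byte) (bool, error) {
	path := chunkPath(root, chunkHash(data))
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return false, err
	}
	// O_EXCL makes creation fail if the file exists, so there is no separate
	// check-then-write step for concurrent uploads to race on.
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o644)
	if os.IsExist(err) {
		return false, nil // already stored
	}
	if err != nil {
		return false, err
	}
	defer f.Close()
	if _, err := f.Write(data); err != nil {
		return false, err
	}
	return true, nil
}

// storeTwice stores the same bytes twice in a throwaway directory and reports
// whether each call created a file.
func storeTwice(data []byte) (first, second bool, err error) {
	root, err := os.MkdirTemp("", "chunks")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(root)
	if first, err = storeChunk(root, data); err != nil {
		return first, false, err
	}
	second, err = storeChunk(root, data)
	return first, second, err
}

func main() {
	fmt.Println(chunkPath("data/chunks", chunkHash([]byte("abc"))))
	fmt.Println(storeTwice([]byte("abc"))) // prints true false <nil>
}
```

One caveat of this sketch: a crash in the middle of the write would leave a partial file that later uploads treat as present. Writing to a temporary file and renaming it into place is a common way to avoid that.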

3. Metadata Schema (PostgreSQL)

PostgreSQL acts as the system's source of truth.

Objects table

Stores file-level metadata:

object_id (UUID)
name
status (PENDING -> READY)
deleted flag
timestamps

Chunks table

Stores deduplicated chunks:

chunk_id
hash
storage_path
ref_count

object_chunks table

Maintains ordering:

object_id
chunk_id
order_index

This is essential for reconstructing files correctly during download.
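The three tables could be mirrored in Go roughly as follows. The field names and types are assumptions (the article lists the columns but not their SQL types), and orderedChunkIDs is a hypothetical in-memory equivalent of selecting an object's chunks ordered by order_index.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Object mirrors a row of the objects table (assumed field types).
type Object struct {
	ObjectID  string // UUID
	Name      string
	Status    string // "PENDING" or "READY"
	Deleted   bool
	CreatedAt time.Time // timestamp column names are assumptions
	UpdatedAt time.Time
}

// Chunk mirrors a row of the chunks table.
type Chunk struct {
	ChunkID     int64
	Hash        string
	StoragePath string
	RefCount    int
}

// ObjectChunk mirrors a row of object_chunks.
type ObjectChunk struct {
	ObjectID   string
	ChunkID    int64
	OrderIndex int
}

// orderedChunkIDs returns the chunk IDs of one object sorted by OrderIndex.
func orderedChunkIDs(rows []ObjectChunk, objectID string) []int64 {
	var mine []ObjectChunk
	for _, r := range rows {
		if r.ObjectID == objectID {
			mine = append(mine, r)
		}
	}
	sort.Slice(mine, func(i, j int) bool { return mine[i].OrderIndex < mine[j].OrderIndex })
	ids := make([]int64, len(mine))
	for i, r := range mine {
		ids[i] = r.ChunkID
	}
	return ids
}

func main() {
	rows := []ObjectChunk{
		{ObjectID: "obj-a", ChunkID: 30, OrderIndex: 2},
		{ObjectID: "obj-b", ChunkID: 20, OrderIndex: 0},
		{ObjectID: "obj-a", ChunkID: 10, OrderIndex: 0},
		{ObjectID: "obj-a", ChunkID: 20, OrderIndex: 1},
	}
	fmt.Println(orderedChunkIDs(rows, "obj-a")) // prints [10 20 30]
}
```

Note that chunk 20 belongs to both objects in this example, which is exactly what the join table makes possible.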

Deduplication System

The system implements hash-based deduplication with reference counting.

How it works

When a chunk is uploaded:

If the chunk already exists -> increment ref_count
Else -> create a new file and DB entry

Example

Object A → [chunk1, chunk2, chunk3]
Object B → [chunk2, chunk3, chunk4]

Chunks 2 and 3 are shared.

This reduces storage usage significantly for:

Backups
Versioned files
Repeated uploads
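The Object A / Object B example can be replayed with a small in-memory model. This is a sketch only: chunkIndex is a hypothetical stand-in for the chunks table, keyed by SHA-256 hash, and the chunk contents are just the strings "chunk1" to "chunk4".

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkIndex is an in-memory stand-in for the chunks table: hash -> ref_count.
type chunkIndex struct {
	refCount map[string]int
}

// addChunk records one reference to data and reports whether the content was
// new, meaning it would have to be written to disk.
func (ix *chunkIndex) addChunk(data []byte) (hash string, isNew bool) {
	sum := sha256.Sum256(data)
	hash = hex.EncodeToString(sum[:])
	_, exists := ix.refCount[hash]
	ix.refCount[hash]++
	return hash, !exists
}

// upload adds every chunk of one object and returns how many were new.
func (ix *chunkIndex) upload(chunks ...string) (written int) {
	for _, c := range chunks {
		if _, isNew := ix.addChunk([]byte(c)); isNew {
			written++
		}
	}
	return written
}

// refs returns the current ref_count for the given chunk content.
func (ix *chunkIndex) refs(chunk string) int {
	sum := sha256.Sum256([]byte(chunk))
	return ix.refCount[hex.EncodeToString(sum[:])]
}

func main() {
	ix := &chunkIndex{refCount: map[string]int{}}
	a := ix.upload("chunk1", "chunk2", "chunk3") // Object A
	b := ix.upload("chunk2", "chunk3", "chunk4") // Object B
	fmt.Println(a, b, len(ix.refCount))               // prints 3 1 4
	fmt.Println(ix.refs("chunk1"), ix.refs("chunk2")) // prints 1 2
}
```

Six chunks are uploaded but only four are stored. In PostgreSQL the same check-and-increment could be a single INSERT ... ON CONFLICT ... DO UPDATE statement, assuming a unique constraint on hash; the article does not say how the service does it.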

Deletion Model (Two-Phase Design)

Deletion is not immediate. Instead, it follows a two-phase lifecycle.

Phase 1: Logical deletion

Mark object as deleted = true
Remove object -> chunk mappings
Decrement chunk reference counts

At this stage, data is still physically present on disk.
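Phase 1 can be sketched with an in-memory model. The store type and its maps are hypothetical stand-ins for the three tables plus the disk; the point is that logicalDelete changes metadata only and never touches the stored bytes.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a simplified in-memory model of the metadata and the chunk files.
type store struct {
	deleted  map[string]bool     // object_id -> deleted flag
	mappings map[string][]string // object_id -> ordered chunk hashes
	refCount map[string]int      // chunk hash -> ref_count
	onDisk   map[string]bool     // chunk hash -> file still present
}

// logicalDelete performs phase 1: flag the object, decrement the reference
// count of each of its chunks, and drop its mappings. It never modifies onDisk.
func (s *store) logicalDelete(objectID string) error {
	hashes, ok := s.mappings[objectID]
	if !ok || s.deleted[objectID] {
		// Refusing a second delete prevents decrementing ref counts twice.
		return errors.New("object not found or already deleted")
	}
	s.deleted[objectID] = true
	for _, h := range hashes {
		s.refCount[h]--
	}
	delete(s.mappings, objectID)
	return nil
}

// newDemoStore builds the Object A / Object B state used earlier in the article.
func newDemoStore() *store {
	return &store{
		deleted: map[string]bool{},
		mappings: map[string][]string{
			"A": {"c1", "c2", "c3"},
			"B": {"c2", "c3", "c4"},
		},
		refCount: map[string]int{"c1": 1, "c2": 2, "c3": 2, "c4": 1},
		onDisk:   map[string]bool{"c1": true, "c2": true, "c3": true, "c4": true},
	}
}

func main() {
	s := newDemoStore()
	err := s.logicalDelete("A")
	// c1 drops to zero references but its file is still present.
	fmt.Println(err, s.refCount["c1"], s.refCount["c2"], s.onDisk["c1"]) // prints <nil> 0 1 true
}
```

In the real service these three updates would presumably run inside one PostgreSQL transaction so that a crash cannot leave the counts half-updated.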

Phase 2: Garbage collection

A background worker periodically:

Finds orphan chunks
Deletes files with ref_count == 0
Removes metadata entries

This ensures:

No races between physical deletion and in-flight uploads or reads
Safe concurrent uploads/deletes
Crash-safe cleanup

Garbage Collection System

The GC worker runs periodically in the background and performs the following.

1. Deleted object cleanup

Finds objects marked as deleted
Processes associated chunks
Removes metadata safely

2. Orphan chunk detection

Chunks not referenced by any object, that is, rows c of the chunks table for which:

NOT EXISTS (SELECT 1 FROM object_chunks oc WHERE oc.chunk_id = c.chunk_id)

...are considered garbage.

3. Physical deletion

Removes the file from disk
Deletes the DB record
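A single GC pass over these two steps might look like the sketch below. The chunkRow type, the referenced set (standing in for the NOT EXISTS check against object_chunks), and the function names are all hypothetical; the real worker would read and delete rows in PostgreSQL.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// chunkRow is a simplified chunks-table row (hypothetical shape).
type chunkRow struct {
	Hash        string
	StoragePath string
}

// sweep removes the file of every unreferenced chunk and then drops its row.
// It returns the rows that remain and the number of chunks collected.
// Rows whose file could not be removed are kept so a later pass can retry.
func sweep(chunks []chunkRow, referenced map[string]bool) (kept []chunkRow, removed int, err error) {
	for _, c := range chunks {
		if referenced[c.Hash] {
			kept = append(kept, c)
			continue
		}
		// A file that is already gone counts as removed, which makes the
		// pass safe to repeat after a crash.
		if rmErr := os.Remove(c.StoragePath); rmErr != nil && !os.IsNotExist(rmErr) {
			kept = append(kept, c)
			err = rmErr
			continue
		}
		removed++
	}
	return kept, removed, err
}

// demoSweep creates one live and one orphan chunk file in a temp directory,
// runs a pass, and reports what happened.
func demoSweep() (int, bool, bool, error) {
	dir, err := os.MkdirTemp("", "gc")
	if err != nil {
		return 0, false, false, err
	}
	defer os.RemoveAll(dir)
	live := chunkRow{Hash: "aaaa", StoragePath: filepath.Join(dir, "aaaa")}
	orphan := chunkRow{Hash: "bbbb", StoragePath: filepath.Join(dir, "bbbb")}
	for _, c := range []chunkRow{live, orphan} {
		if err := os.WriteFile(c.StoragePath, []byte("x"), 0o644); err != nil {
			return 0, false, false, err
		}
	}
	kept, removed, err := sweep([]chunkRow{live, orphan}, map[string]bool{"aaaa": true})
	_, statErr := os.Stat(orphan.StoragePath)
	orphanGone := os.IsNotExist(statErr)
	liveKept := len(kept) == 1 && kept[0].Hash == "aaaa"
	return removed, orphanGone, liveKept, err
}

func main() {
	fmt.Println(demoSweep()) // prints 1 true true <nil>
}
```

Removing the file before the row matches the order listed above. One argument for that order is that a crash in between leaves a row pointing at a missing file, which the next pass can clean up, rather than an untracked file on disk.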

Why background GC instead of immediate deletion?

Because immediate deletion can cause:

Race conditions during uploads
Inconsistent reference counts
Partial reads during streaming

Download Flow

The download process reconstructs files deterministically.

Steps

Fetch object metadata
Retrieve ordered chunks
Stream each chunk via gRPC
Write sequentially to the client file

Guarantee

The reconstructed file is byte-identical to the original input.
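The sketch below illustrates why ordered chunk hashes are enough to rebuild a file exactly. writeObject and roundTrip are hypothetical names; the fetch callback stands in for reading one chunk file from disk, and the io.Writer stands in for the gRPC response stream.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
)

// writeObject writes the chunks named by orderedHashes to w, in order, and
// returns the number of bytes written.
func writeObject(w io.Writer, orderedHashes []string, fetch func(hash string) ([]byte, error)) (int64, error) {
	var total int64
	for _, h := range orderedHashes {
		data, err := fetch(h)
		if err != nil {
			return total, fmt.Errorf("chunk %s: %w", h, err)
		}
		n, err := w.Write(data)
		total += int64(n)
		if err != nil {
			return total, err
		}
	}
	return total, nil
}

// roundTrip splits input into size-byte chunks keyed by hash, rebuilds it with
// writeObject, and reports whether the result is byte-identical.
func roundTrip(input []byte, size int) bool {
	store := map[string][]byte{} // deduplicated chunk store
	var order []string           // per-object ordered list of hashes
	for start := 0; start < len(input); start += size {
		end := start + size
		if end > len(input) {
			end = len(input)
		}
		sum := sha256.Sum256(input[start:end])
		h := hex.EncodeToString(sum[:])
		store[h] = input[start:end]
		order = append(order, h)
	}
	var out bytes.Buffer
	_, err := writeObject(&out, order, func(h string) ([]byte, error) {
		data, ok := store[h]
		if !ok {
			return nil, fmt.Errorf("not found")
		}
		return data, nil
	})
	return err == nil && bytes.Equal(out.Bytes(), input)
}

func main() {
	// "abcd" appears twice, so the store holds 2 chunks for a 3-chunk object.
	fmt.Println(roundTrip([]byte("abcdabcdxy"), 4)) // prints true
}
```

The ordered list may contain the same hash more than once, which is why ordering lives in the mapping table rather than on the chunk itself.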

Consistency & Transactions

All critical operations use PostgreSQL transactions:

Upload object
Delete object
Chunk mapping updates

This ensures:

Atomicity
No partial writes
Safe failure recovery
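A common way to get this behavior with Go's database/sql package is a small helper that commits on success and rolls back on any error. The sketch below is not the project's code: withTx is a hypothetical helper, and the fake driver exists only so the example runs without a PostgreSQL server. A real service would open the database with a PostgreSQL driver instead.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
)

// withTx runs fn inside a transaction: commit if fn succeeds, roll back otherwise.
func withTx(ctx context.Context, db *sql.DB, fn func(tx *sql.Tx) error) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	// After a successful Commit this Rollback does nothing and returns sql.ErrTxDone.
	defer tx.Rollback()
	if err := fn(tx); err != nil {
		return err
	}
	return tx.Commit()
}

// --- stand-in driver so the sketch runs without PostgreSQL (not a real driver) ---

var commits, rollbacks int

type fakeDriver struct{}
type fakeConn struct{}
type fakeTx struct{}

func (fakeDriver) Open(string) (driver.Conn, error) { return fakeConn{}, nil }
func (fakeConn) Prepare(string) (driver.Stmt, error) {
	return nil, errors.New("statements are not supported by the stand-in driver")
}
func (fakeConn) Close() error              { return nil }
func (fakeConn) Begin() (driver.Tx, error) { return fakeTx{}, nil }
func (fakeTx) Commit() error               { commits++; return nil }
func (fakeTx) Rollback() error             { rollbacks++; return nil }

func init() { sql.Register("fake-pg", fakeDriver{}) }

// runDemo runs one succeeding and one failing transaction and returns how many
// commits and rollbacks reached the driver.
func runDemo() (int, int, error) {
	commits, rollbacks = 0, 0
	db, err := sql.Open("fake-pg", "")
	if err != nil {
		return 0, 0, err
	}
	defer db.Close()
	ctx := context.Background()
	if err := withTx(ctx, db, func(*sql.Tx) error { return nil }); err != nil {
		return commits, rollbacks, err
	}
	failed := withTx(ctx, db, func(*sql.Tx) error { return errors.New("chunk mapping failed") })
	if failed == nil {
		return commits, rollbacks, errors.New("expected the second transaction to fail")
	}
	return commits, rollbacks, nil
}

func main() {
	fmt.Println(runDemo()) // prints 1 1 <nil>
}
```

With a helper like this, an upload that fails while writing chunk mappings leaves no partial rows behind, because nothing is visible until Commit returns.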

Performance Characteristics

What this system optimizes for

Large file streaming
Storage deduplication
Metadata consistency
Sequential reconstruction speed

What it intentionally does NOT optimize

Distributed scaling
Multi-node replication
High availability
Global consistency protocols

Key System Properties

Property	Description
Content Addressable Storage	Chunks identified by SHA-256 hash
Deduplication	Same chunk stored only once
Streaming Upload/Download	No full-file buffering
Safe Garbage Collection	Reference-count + background cleanup
Transactional Metadata Layer	PostgreSQL ensures correctness

Limitations

This is a single-node prototype system, not a production storage engine.

Limitations include:

No distributed chunk replication
No fault tolerance across nodes
No erasure coding
No consensus layer (Raft, etc.)
GC is periodic, not event-triggered
Local disk storage only

Even as a prototype, this system demonstrates real-world engineering concepts used in:

Amazon S3 - object storage design
HDFS - chunk-based storage model
Git - content-addressed storage principles
Ceph - metadata and object separation

Conclusion

This project is a miniature storage engine, built to explore how real distributed storage systems are designed internally.

It brings together the following:

Systems programming (Go)
Database design (PostgreSQL)
Storage architecture (CAS + deduplication)
Network programming (gRPC streaming)
Background processing (GC systems)
