
How to build a Distributed File System

A simple DFS inspired by the Google File System

27 October, 2020


TLDR ⏰

The best way to learn about a system is to build one yourself. I took two weeks to study, research, implement, and test a simple Distributed File System based on lessons from the Google File System and Hadoop's HDFS. If you're interested in Big Data, I hope you find this helpful!
Languages
Java

Serialization Frameworks
Protobuf

Cloud Hosting
Amazon EC2

Databases
Hadoop
System Design

Client Node

Responsibilities
The client is responsible for user interaction; it is the component you will interact with directly. It handles:
Storing a file
Requesting and retrieving a file
Before storing a file, the client breaks it into multiple chunks, asks the Controller which Storage Nodes should hold (and later serve) each chunk, and then delivers the chunks to those Storage Nodes to be stored safely. A minimal chunking sketch follows.
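Here is a rough sketch of how the client might split a file into fixed-size chunks before storage. The `Chunk` record and the chunk-size parameter are assumptions for illustration, not the exact types used in the project.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical chunk holder: (file name, chunk number, raw bytes).
record Chunk(String fileName, int chunkNumber, byte[] data) {}

class FileChunker {
    // Splits the file at 'path' into chunks of at most 'chunkSize' bytes each.
    static List<Chunk> split(Path path, int chunkSize) throws IOException {
        List<Chunk> chunks = new ArrayList<>();
        try (InputStream in = Files.newInputStream(path)) {
            byte[] buffer = new byte[chunkSize];
            int chunkNumber = 0;
            int read;
            while ((read = in.readNBytes(buffer, 0, chunkSize)) > 0) {
                byte[] data = new byte[read];
                System.arraycopy(buffer, 0, data, 0, read);
                chunks.add(new Chunk(path.getFileName().toString(), chunkNumber++, data));
            }
        }
        return chunks;
    }
}
```

Each chunk carries its own number so the file can be reassembled in order later, regardless of which Storage Node served it.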

Data Structures
Map from each filename to the list of its chunks and the Storage Nodes where each chunk is stored

Features
Parallel Retrievals: the client splits files into multiple chunks before storage and retrieves those chunks in parallel using a thread pool (see the sketch after this list).
Chunk size is supplied as an input by the user
Reconstructs the file after retrieving all chunks from the Storage Nodes
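A sketch of the parallel retrieval idea, assuming a hypothetical `fetchChunk` call that asks a Storage Node for one chunk; the real project wires this call to Protobuf messages over sockets.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

abstract class ParallelRetriever {
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    // Hypothetical: asks the given Storage Node for one chunk of 'fileName' over the network.
    abstract byte[] fetchChunk(String storageNode, String fileName, int chunkNumber) throws Exception;

    // Fetches all chunks in parallel, then concatenates them in order to rebuild the file.
    byte[] retrieve(String fileName, List<String> nodes, int totalChunks) throws Exception {
        List<Future<byte[]>> futures = new ArrayList<>();
        for (int i = 0; i < totalChunks; i++) {
            final int chunkNumber = i;
            final String node = nodes.get(i % nodes.size());
            futures.add(pool.submit(() -> fetchChunk(node, fileName, chunkNumber)));
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Future<byte[]> f : futures) {
            out.write(f.get()); // results are appended in chunk order while downloads overlap
        }
        return out.toByteArray();
    }
}
```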

Controller

The Controller is responsible for managing resources in the system, somewhat like an HDFS NameNode.

Features
Probabilistic Routing: to enable lookups without requiring excessive RAM, client requests are routed probabilistically to the relevant Storage Nodes via a collection of Bloom filters (a minimal sketch follows this list)
Maintains metadata on the contents stored by each Storage Node
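A minimal Bloom filter sketch to show the idea; the sizes and hash scheme here are illustrative assumptions, not the project's actual parameters.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// One Bloom filter per Storage Node: records which file/chunk names the node holds.
class BloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int position(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = new String(key, StandardCharsets.UTF_8).hashCode() * 31 + 17;
        return Math.floorMod(h1 + i * h2, size);
    }

    void put(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) bits.set(position(k, i));
    }

    // May return false positives, never false negatives.
    boolean mightContain(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(k, i))) return false;
        }
        return true;
    }
}
```

The Controller would keep one such filter per Storage Node and, on a retrieval request, return every node whose filter answers `mightContain(fileName) == true`.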

Data Structures:
Active Storage Nodes
Routing table: one Bloom filter per Storage Node
Metadata on Storage Node contents
Timestamp of the last heartbeat received from each Storage Node

Responsibilities
File storage: When the client wishes to store a new file, it sends a storage request to the Controller, which responds with a list of destination Storage Nodes (plus replica locations) to send the chunks to.
File retrieval: Returns to the client a list of Storage Nodes that 'may' contain the requested chunk, using the Bloom filters to check for its possible presence.
Maintaining Storage Nodes: Receives heartbeats from Storage Nodes to continuously update each node's status (memory availability, timestamp, Storage Node ID, host name, port number).
Maintaining a routing table: A Bloom filter of file names is maintained for each Storage Node. When the Controller receives a file retrieval request from a client, it queries each node's Bloom filter with the file name and returns the list of matching nodes (due to the nature of Bloom filters, this may include false positives).
Fault tolerance and Node Recovery:
The Controller is responsible for detecting Storage Node failures and ensuring the system's replication level is maintained. Every chunk is stored with n-1 additional replicas, where n is the replication factor set by the programmer. If a Storage Node goes down, the Controller restores the replication level by creating more chunk copies.
At each interval, the Controller checks the timestamp of the last heartbeat received from each Storage Node. If the timestamp is older than a given threshold, it assumes the node is inactive and recovers all chunks that node held (a sketch of this check follows).
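A sketch of the heartbeat timeout check, assuming a hypothetical `recoverChunks` routine that re-replicates everything a dead node held; the names and threshold value are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class FailureDetector {
    // Last heartbeat time (millis) reported by each Storage Node, keyed by node id.
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();
    private final long timeoutMillis = 15_000; // illustrative threshold

    void onHeartbeat(String nodeId) {
        lastHeartbeat.put(nodeId, System.currentTimeMillis());
    }

    // Called periodically by the Controller.
    void checkForFailures() {
        long now = System.currentTimeMillis();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (now - e.getValue() > timeoutMillis) {
                lastHeartbeat.remove(e.getKey());
                recoverChunks(e.getKey()); // re-replicate everything this node held
            }
        }
    }

    // Hypothetical: copies each chunk the failed node held from a surviving replica
    // to another active Storage Node so the replication level is restored.
    void recoverChunks(String failedNodeId) { /* omitted in this sketch */ }
}
```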

Storage Node

Features
Entropy-Driven Compression: a file's entropy is analyzed before storage; low-entropy files are compressed to save disk space (see the entropy sketch after this list).
When a chunk is stored, it will be checksummed so on-disk corruption can be detected.
When a chunk is being retrieved, its checksum will be checked. If it does not match, on-disk corruption is detected, and the Node will retrieve an uncorrupted copy from its replicas.
Accepted messages:
Store chunk [File name, Chunk Number, Chunk Data]
Get total number of chunks [File name]
Retrieve chunk [File name, Chunk Number]
List chunks and file names [No input]
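A sketch of the entropy check, assuming Shannon entropy over byte frequencies and an illustrative 6 bits-per-byte cut-off; the actual threshold used in the project may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

class EntropyCompressor {
    // Shannon entropy in bits per byte (0.0 = constant data, 8.0 = uniformly random).
    static double entropy(byte[] data) {
        long[] counts = new long[256];
        for (byte b : data) counts[b & 0xFF]++;
        double entropy = 0.0;
        for (long c : counts) {
            if (c == 0) continue;
            double p = (double) c / data.length;
            entropy -= p * (Math.log(p) / Math.log(2));
        }
        return entropy;
    }

    // Compress only if the data looks compressible (illustrative threshold of 6 bits/byte).
    static byte[] maybeCompress(byte[] data) throws IOException {
        if (entropy(data) >= 6.0) return data;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        }
        return out.toByteArray();
    }
}
```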

Data Structures
File checksums
Zipped files
Map from each file to the chunks stored for it on this node

Responsibilities
Storage nodes are responsible for storing and retrieving file chunks.
When retrieving a chunk, if it has been corrupted, the Storage Node fetches a clean copy from one of its replicas and replaces the corrupted version before responding to the client request (a sketch follows).
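A minimal sketch of checksum verification on retrieval, using SHA-1 as an assumed checksum algorithm; the repair path (fetching a clean copy from a replica) is only indicated by a comment.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class ChunkVerifier {
    // Checksum computed when the chunk is first stored (SHA-1 assumed here).
    static byte[] checksum(byte[] data) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-1").digest(data);
    }

    // Reads a chunk from disk and compares it against the checksum recorded at store time.
    static byte[] readVerified(Path chunkPath, byte[] storedChecksum) throws Exception {
        byte[] data = Files.readAllBytes(chunkPath);
        if (Arrays.equals(checksum(data), storedChecksum)) {
            return data;
        }
        // Corruption detected: fetch an uncorrupted copy from a replica,
        // overwrite the local chunk, then serve the repaired bytes (omitted here).
        throw new IllegalStateException("chunk corrupted on disk: " + chunkPath);
    }
}
```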

Architecture & Workflows

Communication
Interoperability: the DFS uses Google Protocol Buffers to serialize messages rather than Java serialization. This allows other applications to easily implement the wire format.
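As an illustration, a hypothetical generated message class (say `StoreChunkRequest`, produced by `protoc` from a `.proto` schema with a file name, chunk number, and data field) could be framed over a socket using Protobuf's built-in delimited I/O. The message type is an assumption for this sketch, not the project's actual schema, and the code only compiles once that class has been generated.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

// Assumes a class StoreChunkRequest generated by protoc from a .proto definition
// with fields: file_name (string), chunk_number (int32), data (bytes).
class Wire {
    static void send(Socket socket, StoreChunkRequest request) throws Exception {
        OutputStream out = socket.getOutputStream();
        // Length-prefixed framing so the receiver knows where each message ends.
        request.writeDelimitedTo(out);
        out.flush();
    }

    static StoreChunkRequest receive(Socket socket) throws Exception {
        InputStream in = socket.getInputStream();
        return StoreChunkRequest.parseDelimitedFrom(in);
    }
}
```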

Store a file Workflow
[Diagram: store a file workflow]

Retrieve a file Workflow
[Diagram: retrieve a file workflow]

File Corruption Workflow
[Diagram: file corruption workflow]

Node recovery & Fault Tolerance Workflow
[Diagram: node recovery & fault tolerance workflow]

big data

protocol buffers

hadoop

google file system


Masa Ueda

Seattle, WA, USA

Software Engineer @ AWS App Sync. Everyday is Day One. Always excited about all things DevOps and BigData.
