
How to build a Distributed File System

A simple DFS inspired by the Google File System

27 October, 2020


TLDR ⏰

The best way to learn about a system is to build one yourself. I took two weeks to study, research, implement, and test a simple Distributed File System based on lessons from the Google File System and Hadoop's HDFS. If you're interested in Big Data, I hope you find this helpful!
Languages
Java

Serialization Frameworks
Protobuf

Cloud Hosting
Amazon EC2

Databases
Hadoop
System Design

Client Node

Responsibilities
The client is responsible for user interaction; it is the component you will interact with directly. It handles:
Storing a file
Requesting and retrieving a file
Before storing a file, the client breaks it into multiple chunks, asks the Controller which Storage Nodes should hold (and later serve) each chunk, and then delivers the chunks to those Storage Nodes to be stored safely. A minimal chunking sketch follows.
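Here is a rough sketch of how the client might split a file into fixed-size chunks before storage. The `Chunk` record and the chunk-size parameter are assumptions for illustration, not the exact types used in the project.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical chunk holder: (file name, chunk number, raw bytes).
record Chunk(String fileName, int chunkNumber, byte[] data) {}

class FileChunker {
    // Splits the file at 'path' into chunks of at most 'chunkSize' bytes each.
    static List<Chunk> split(Path path, int chunkSize) throws IOException {
        List<Chunk> chunks = new ArrayList<>();
        try (InputStream in = Files.newInputStream(path)) {
            byte[] buffer = new byte[chunkSize];
            int chunkNumber = 0;
            int read;
            while ((read = in.readNBytes(buffer, 0, chunkSize)) > 0) {
                byte[] data = new byte[read];
                System.arraycopy(buffer, 0, data, 0, read);
                chunks.add(new Chunk(path.getFileName().toString(), chunkNumber++, data));
            }
        }
        return chunks;
    }
}
```

Each chunk carries its own number so the file can be reassembled in order later, regardless of which Storage Node served it.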

Data Structures
Map from each filename to the list of its chunks and the Storage Nodes where each chunk is stored

Features
Parallel Retrievals: the client splits files into multiple chunks before storage and retrieves those chunks in parallel using a thread pool (see the sketch after this list).
Chunk size is supplied as an input by the user
Reconstructs the file after retrieving all chunks from the Storage Nodes
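A sketch of the parallel retrieval idea, assuming a hypothetical `fetchChunk` call that asks a Storage Node for one chunk; the real project wires this call to Protobuf messages over sockets.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

abstract class ParallelRetriever {
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    // Hypothetical: asks the given Storage Node for one chunk of 'fileName' over the network.
    abstract byte[] fetchChunk(String storageNode, String fileName, int chunkNumber) throws Exception;

    // Fetches all chunks in parallel, then concatenates them in order to rebuild the file.
    byte[] retrieve(String fileName, List<String> nodes, int totalChunks) throws Exception {
        List<Future<byte[]>> futures = new ArrayList<>();
        for (int i = 0; i < totalChunks; i++) {
            final int chunkNumber = i;
            final String node = nodes.get(i % nodes.size());
            futures.add(pool.submit(() -> fetchChunk(node, fileName, chunkNumber)));
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Future<byte[]> f : futures) {
            out.write(f.get()); // results are appended in chunk order while downloads overlap
        }
        return out.toByteArray();
    }
}
```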

Controller

The Controller is responsible for managing resources in the system, somewhat like an HDFS NameNode.

Features
Probabilistic Routing: to enable lookups without requiring excessive RAM, client requests are routed probabilistically to the relevant Storage Nodes via a collection of Bloom filters (a minimal sketch follows this list)
Maintains metadata on the contents stored by each Storage Node
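A minimal Bloom filter sketch to show the idea; the sizes and hash scheme here are illustrative assumptions, not the project's actual parameters.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// One Bloom filter per Storage Node: records which file/chunk names the node holds.
class BloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int position(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = new String(key, StandardCharsets.UTF_8).hashCode() * 31 + 17;
        return Math.floorMod(h1 + i * h2, size);
    }

    void put(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) bits.set(position(k, i));
    }

    // May return false positives, never false negatives.
    boolean mightContain(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(k, i))) return false;
        }
        return true;
    }
}
```

The Controller would keep one such filter per Storage Node and, on a retrieval request, return every node whose filter answers `mightContain(fileName) == true`.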

Data Structures:
Active Storage Nodes
Routing table: one Bloom filter per Storage Node
Metadata on Storage Node contents
Timestamp of the last heartbeat received from each Storage Node

Responsibilities
File storage: When the client wishes to store a new file, it sends a storage request to the Controller, which responds with a list of destination Storage Nodes (plus replica locations) to send the chunks to.
File retrieval: Returns to the client a list of Storage Nodes that 'may' contain the requested chunk, using the Bloom filters to check for its possible presence.
Maintaining Storage Nodes: Receives heartbeats from Storage Nodes to continuously update each node's status (memory availability, timestamp, Storage Node ID, host name, port number).
Maintaining a routing table: A Bloom filter of file names is maintained for each Storage Node. When the Controller receives a file retrieval request from a client, it queries each node's Bloom filter with the file name and returns the list of matching nodes (due to the nature of Bloom filters, this may include false positives).
Fault tolerance and Node Recovery:
The Controller is responsible for detecting Storage Node failures and ensuring the system's replication level is maintained. Every chunk is stored with n-1 additional replicas, where n is the replication factor set by the programmer. If a Storage Node goes down, the Controller restores the replication level by creating more chunk copies.
At each interval, the Controller checks the timestamp of the last heartbeat received from each Storage Node. If the timestamp is older than a given threshold, it assumes the node is inactive and recovers all chunks that node held (a sketch of this check follows).
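A sketch of the heartbeat timeout check, assuming a hypothetical `recoverChunks` routine that re-replicates everything a dead node held; the names and threshold value are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class FailureDetector {
    // Last heartbeat time (millis) reported by each Storage Node, keyed by node id.
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();
    private final long timeoutMillis = 15_000; // illustrative threshold

    void onHeartbeat(String nodeId) {
        lastHeartbeat.put(nodeId, System.currentTimeMillis());
    }

    // Called periodically by the Controller.
    void checkForFailures() {
        long now = System.currentTimeMillis();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (now - e.getValue() > timeoutMillis) {
                lastHeartbeat.remove(e.getKey());
                recoverChunks(e.getKey()); // re-replicate everything this node held
            }
        }
    }

    // Hypothetical: copies each chunk the failed node held from a surviving replica
    // to another active Storage Node so the replication level is restored.
    void recoverChunks(String failedNodeId) { /* omitted in this sketch */ }
}
```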

Storage Node

Features
Entropy-Driven Compression: a file's entropy is analyzed before storage; low-entropy files are compressed to save disk space (see the entropy sketch after this list).
When a chunk is stored, it will be checksummed so on-disk corruption can be detected.
When a chunk is being retrieved, its checksum will be checked. If it does not match, on-disk corruption is detected, and the Node will retrieve an uncorrupted copy from its replicas.
Accepted messages:
Store chunk [File name, Chunk Number, Chunk Data]
Get total number of chunks [File name]
Retrieve chunk [File name, Chunk Number]
List chunks and file names [No input]
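A sketch of the entropy check, assuming Shannon entropy over byte frequencies and an illustrative 6 bits-per-byte cut-off; the actual threshold used in the project may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

class EntropyCompressor {
    // Shannon entropy in bits per byte (0.0 = constant data, 8.0 = uniformly random).
    static double entropy(byte[] data) {
        long[] counts = new long[256];
        for (byte b : data) counts[b & 0xFF]++;
        double entropy = 0.0;
        for (long c : counts) {
            if (c == 0) continue;
            double p = (double) c / data.length;
            entropy -= p * (Math.log(p) / Math.log(2));
        }
        return entropy;
    }

    // Compress only if the data looks compressible (illustrative threshold of 6 bits/byte).
    static byte[] maybeCompress(byte[] data) throws IOException {
        if (entropy(data) >= 6.0) return data;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        }
        return out.toByteArray();
    }
}
```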

Data Structures
File checksums
Zipped files
Map from each file to the chunks stored for it on this node

Responsibilities
Storage nodes are responsible for storing and retrieving file chunks.
When retrieving a chunk, if it has been corrupted, the Storage Node fetches a clean copy from one of its replicas and replaces the corrupted version before responding to the client request (a sketch follows).
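A minimal sketch of checksum verification on retrieval, using SHA-1 as an assumed checksum algorithm; the repair path (fetching a clean copy from a replica) is only indicated by a comment.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class ChunkVerifier {
    // Checksum computed when the chunk is first stored (SHA-1 assumed here).
    static byte[] checksum(byte[] data) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-1").digest(data);
    }

    // Reads a chunk from disk and compares it against the checksum recorded at store time.
    static byte[] readVerified(Path chunkPath, byte[] storedChecksum) throws Exception {
        byte[] data = Files.readAllBytes(chunkPath);
        if (Arrays.equals(checksum(data), storedChecksum)) {
            return data;
        }
        // Corruption detected: fetch an uncorrupted copy from a replica,
        // overwrite the local chunk, then serve the repaired bytes (omitted here).
        throw new IllegalStateException("chunk corrupted on disk: " + chunkPath);
    }
}
```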

Architecture & Workflows

Communication
Interoperability: the DFS uses Google Protocol Buffers to serialize messages rather than Java serialization. This allows other applications to easily implement the wire format.
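As an illustration, a hypothetical generated message class (say `StoreChunkRequest`, produced by `protoc` from a `.proto` schema with a file name, chunk number, and data field) could be framed over a socket using Protobuf's built-in delimited I/O. The message type is an assumption for this sketch, not the project's actual schema, and the code only compiles once that class has been generated.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

// Assumes a class StoreChunkRequest generated by protoc from a .proto definition
// with fields: file_name (string), chunk_number (int32), data (bytes).
class Wire {
    static void send(Socket socket, StoreChunkRequest request) throws Exception {
        OutputStream out = socket.getOutputStream();
        // Length-prefixed framing so the receiver knows where each message ends.
        request.writeDelimitedTo(out);
        out.flush();
    }

    static StoreChunkRequest receive(Socket socket) throws Exception {
        InputStream in = socket.getInputStream();
        return StoreChunkRequest.parseDelimitedFrom(in);
    }
}
```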

Store a file Workflow
[Diagram: store a file workflow]

Retrieve a file Workflow
[Diagram: retrieve a file workflow]

File Corruption Workflow
[Diagram: file corruption workflow]

Node recovery & Fault Tolerance Workflow
[Diagram: node recovery & fault tolerance workflow]

big data

protocol buffers

hadoop

google file system


Masa Ueda

Seattle, WA, USA

Software Engineer @ AWS App Sync. Everyday is Day One. Always excited about all things DevOps and BigData.
