Building a Large File Chunked Upload/Download System

August 27, 2025

Introduction

Recently, while learning Node.js + frontend development, I encountered a very practical problem: how to elegantly handle large file uploads and downloads? Traditional single-request uploads (sending the whole file in one go) become inadequate when dealing with multi-GB videos and data packages. After a period of research and practice, I implemented a complete large file chunked upload/download system that supports resumable uploads, instant uploads, and other features.

Today, I want to share the entire implementation process with everyone, hoping to help fellow developers who are learning Node.js + frontend development.

Technology Stack Selection

Before we begin, let me introduce the technology stack I chose:

Backend (download-service-backend):

  • Node.js + Express: Lightweight, rapid API development
  • TypeScript: Type safety, reducing runtime errors
  • Multer: Professional file upload middleware
  • fs-extra: Enhanced file system operations
  • crypto: Node.js built-in encryption module for calculating file hashes

Frontend (download-service-web-app):

  • React 19 + TypeScript: Modern frontend framework
  • Vite: Fast build tool
  • Tailwind CSS: Utility-first styling framework
  • js-md5: Client-side hash calculation

System Architecture Overview

The following diagram illustrates the complete architecture of our large file chunked upload/download system:

[System architecture diagram]

Core Concepts Explained

Before diving into the code, let's understand several key concepts:

1. File Chunking

Breaking a large file into smaller pieces (I set the chunk size to 2MB) and uploading them separately. Benefits include:

  • Reducing the risk of single transmission failures
  • Supporting parallel uploads (though my demo uses serial uploads)
  • Only needing to retransmit failed chunks when network interruptions occur

2. Resume Upload

After network interruptions, the ability to continue uploading from where it left off rather than starting over. Implementation principle:

  • Server records uploaded chunks
  • Client queries progress when reconnecting
  • Only uploads missing chunks

3. Instant Upload

If the server already has the same file (determined by hash value), it directly returns the download link without needing to upload again.

4. Range Requests (HTTP Range Requests)

Supporting resumable downloads, where clients can request specific byte ranges of files.

System Architecture Diagram

[System architecture diagram]

Detailed Implementation Process

Step 1: Frontend File Processing

1.1 File Hash Calculation

First, we need to generate a unique identifier for files using MD5 hash:

// src/utils/upload.ts
import { md5 } from "js-md5";

export async function calculateFileHash(file: File): Promise<string> {
  const buffer = await file.arrayBuffer();
  const hash = md5(buffer);
  return hash;
}

Technical Points:

  • Using file.arrayBuffer() to read the entire file into memory
  • For extremely large files, consider chunked (incremental) hash calculation to save memory; a sketch follows below
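
For reference, here is a minimal sketch of such an incremental calculation, assuming js-md5's md5.create() incremental API and an 8MB read size (both details are assumptions for illustration, not necessarily what the project above uses):

// Sketch: hash the file in slices instead of loading it all at once
import { md5 } from "js-md5";

const HASH_READ_SIZE = 8 * 1024 * 1024; // read 8MB at a time

export async function calculateFileHashChunked(file: File): Promise<string> {
  const hasher = md5.create(); // incremental hasher from js-md5
  let offset = 0;

  while (offset < file.size) {
    const end = Math.min(offset + HASH_READ_SIZE, file.size);
    // File.slice creates a lightweight Blob view; arrayBuffer() reads only this slice
    const slice = await file.slice(offset, end).arrayBuffer();
    hasher.update(slice);
    offset = end;
  }

  return hasher.hex();
}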

1.2 File Chunking

// Default chunk size 2MB
export const DEFAULT_CHUNK_SIZE = 2 * 1024 * 1024;

export function createFileChunks(
  file: File,
  chunkSize: number = DEFAULT_CHUNK_SIZE
): Blob[] {
  const chunks: Blob[] = [];
  let start = 0;

  while (start < file.size) {
    const end = Math.min(start + chunkSize, file.size);
    chunks.push(file.slice(start, end)); // Key: Using File.slice to create chunks
    start = end;
  }

  return chunks;
}

Technical Points:

  • File.slice() doesn't copy the underlying data; it only creates a reference to a byte range, so it's memory-friendly
  • Chunk size needs to be balanced: too small increases HTTP request overhead, too large means more data to retransmit when a single chunk fails

1.3 Upload State Management

I use React Hooks to manage upload state:

// src/hooks/useFileUpload.ts
export const UploadStatus = {
  IDLE: "idle", // Waiting for upload
  HASHING: "hashing", // Calculating hash
  UPLOADING: "uploading", // Uploading
  PAUSED: "paused", // Paused
  COMPLETED: "completed", // Completed
  ERROR: "error", // Upload failed
} as const;

export type UploadStatus = (typeof UploadStatus)[keyof typeof UploadStatus];

export interface UploadItem {
  id: string;
  file: File;
  fileHash: string;
  status: UploadStatus;
  progress: number;
  uploadedChunks: number[]; // Uploaded chunk indices
  needUpload: number[]; // Chunks that need to be uploaded
  totalChunks: number;
  chunkSize: number;
  error?: string;
  downloadUrl?: string;
}
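
To connect this state with the backend described next, here is a condensed sketch of the upload loop: it first asks the server which chunks are still missing, then uploads only those, one by one, with a FormData request that can be cancelled through an AbortController (see Challenge 5 below). The endpoint paths and the "chunk" form-field name are assumptions for illustration.

// Condensed sketch of the upload loop (endpoint paths are assumptions)
async function uploadFile(item: UploadItem, chunks: Blob[], signal: AbortSignal) {
  // 1. Ask the server which chunks are still missing (also covers instant upload)
  const initRes = await fetch("/api/upload/init", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      fileName: item.file.name,
      fileSize: item.file.size,
      totalChunks: item.totalChunks,
      fileHash: item.fileHash,
    }),
    signal,
  });
  const { data } = await initRes.json();
  if (data.downloadUrl) return data.downloadUrl; // instant upload: file already exists

  // 2. Upload only the missing chunks, one by one
  for (const index of data.needUpload as number[]) {
    const form = new FormData();
    form.append("chunk", chunks[index]);
    form.append("chunkIndex", String(index));
    form.append("totalChunks", String(item.totalChunks));
    form.append("fileHash", item.fileHash);
    form.append("fileName", item.file.name);
    form.append("totalSize", String(item.file.size));

    await fetch("/api/upload/chunk", { method: "POST", body: form, signal });
  }
}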

Step 2: Backend API Design

2.1 Upload Initialization Interface

// src/controllers/uploadController.ts
async initUpload(req: Request, res: Response): Promise<void> {
  const { fileName, fileSize, totalChunks, fileHash } = req.body;

  // 1. Check if file already exists (instant upload)
  const completedFilePath = path.join(this.completedDir, fileHash);
  const isCompleted = await fs.pathExists(completedFilePath);

  if (isCompleted) {
    res.json({
      success: true,
      message: 'File already exists, no need to upload again',
      data: {
        fileHash,
        downloadUrl: `/api/download/${fileHash}`
      }
    });
    return;
  }

  // 2. Check uploaded chunks (key for resumable upload)
  const chunkDirPath = path.join(this.chunksDir, fileHash);
  await ensureDir(chunkDirPath);

  const uploadedChunks: number[] = [];
  for (let i = 0; i < totalChunks; i++) {
    const chunkPath = path.join(chunkDirPath, `chunk_${i}`);
    if (await fs.pathExists(chunkPath)) {
      uploadedChunks.push(i);
    }
  }

  // 3. Calculate chunks that need to be uploaded
  const needUpload = Array.from({ length: totalChunks }, (_, i) => i)
    .filter(i => !uploadedChunks.includes(i));

  res.json({
    success: true,
    message: 'Initialization successful',
    data: {
      fileHash,
      needUpload  // Tell frontend which chunks need to be uploaded
    }
  });
}

Technical Points:

  • Implementing resumable upload by checking chunk file existence
  • Returning needUpload array, frontend only uploads missing chunks

2.2 Chunk Upload Interface

async uploadChunk(req: Request, res: Response): Promise<void> {
  if (!req.file) {
    res.status(400).json({ success: false, message: 'No uploaded file found' });
    return;
  }

  const { chunkIndex, totalChunks, fileHash, fileName, totalSize } = req.body;

  const chunkInfo: ChunkInfo = {
    chunkIndex: parseInt(chunkIndex),
    chunkSize: req.file.size,
    totalChunks: parseInt(totalChunks),
    fileHash,
    fileName,
    totalSize: parseInt(totalSize)
  };

  // Save chunk
  const chunkBuffer = req.file.buffer;
  const success = await this.uploadService.uploadChunk(chunkInfo, chunkBuffer);

  if (success) {
    // Check if all chunks have been uploaded
    const progress = await this.uploadService.checkUploadStatus(fileHash);

    res.json({
      success: true,
      message: `Chunk ${chunkIndex} uploaded successfully`,
      data: {
        fileHash,
        uploadedChunks: progress?.uploadedChunks || [],
        ...(progress?.isCompleted && {
          downloadUrl: `/api/download/${fileHash}`
        })
      }
    });
  } else {
    res.status(500).json({ success: false, message: `Failed to save chunk ${chunkIndex}` });
  }
}
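
The controller above delegates persistence to the upload service. Roughly, uploadService.uploadChunk writes the buffer to chunk_<index> under the file's hash directory and triggers the merge once every chunk is present; a simplified sketch (not the exact project code):

// src/services/uploadService.ts (simplified sketch)
async uploadChunk(chunkInfo: ChunkInfo, chunkBuffer: Buffer): Promise<boolean> {
  const chunkDirPath = path.join(this.chunksDir, chunkInfo.fileHash);
  await fs.ensureDir(chunkDirPath);

  // Persist this chunk as chunk_<index>
  const chunkPath = path.join(chunkDirPath, `chunk_${chunkInfo.chunkIndex}`);
  await fs.writeFile(chunkPath, chunkBuffer);

  // If every chunk is now on disk, merge them into the final file
  const files = await fs.readdir(chunkDirPath);
  if (files.length === chunkInfo.totalChunks) {
    await this.mergeFile(chunkInfo.fileHash, chunkInfo);
  }

  return true;
}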

2.3 Chunk Merging Logic

This is the most critical part of the entire system:

// src/utils/fileSystem.ts
export async function mergeChunks(
  chunkDir: string,
  outputPath: string,
  totalChunks: number
): Promise<void> {
  const writeStream = fs.createWriteStream(outputPath);

  try {
    // Key: Process each chunk in order
    for (let i = 0; i < totalChunks; i++) {
      const chunkPath = path.join(chunkDir, `chunk_${i}`);

      // Check if chunk file exists
      if (!(await fs.pathExists(chunkPath))) {
        throw new Error(`Chunk file does not exist: chunk_${i}`);
      }

      const chunkBuffer = await fs.readFile(chunkPath);

      // Use Promise to wrap write operation, ensuring chunks are written in order
      await new Promise<void>((resolve, reject) => {
        writeStream.write(chunkBuffer, (error) => {
          if (error) reject(error);
          else resolve();
        });
      });
    }

    writeStream.end();

    // Wait for write completion
    await new Promise<void>((resolve, reject) => {
      writeStream.on("finish", resolve);
      writeStream.on("error", reject);
    });
  } catch (error) {
    writeStream.destroy();
    throw error;
  }
}

Technical Points:

  • Must merge chunks in index order to ensure file integrity
  • Use Promise to wrap asynchronous write operations
  • Properly clean up resources on errors

2.4 File Integrity Verification

// src/services/uploadService.ts
private async mergeFile(fileHash: string, chunkInfo: ChunkInfo): Promise<void> {
  const chunkDirPath = path.join(this.chunksDir, fileHash);
  const outputPath = path.join(this.completedDir, fileHash);

  // Merge chunks
  await mergeChunks(chunkDirPath, outputPath, chunkInfo.totalChunks);

  // Key: Verify file integrity
  const mergedFileHash = await calculateFileHash(outputPath);
  if (mergedFileHash !== fileHash) {
    await safeDeleteFile(outputPath);  // Delete corrupted file
    throw new Error('File hash mismatch after merging, file may be corrupted');
  }

  // Clean up temporary chunk files
  await fs.remove(chunkDirPath);
}
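
The calculateFileHash used here runs on the server and isn't shown above. A streaming implementation with Node's built-in crypto module, so that large files never sit fully in memory, could look like this (a sketch, assuming MD5 is used on both sides so the digests match):

// Server-side hash calculation (sketch): streaming MD5 over the merged file
import crypto from "crypto";
import fs from "fs";

export function calculateFileHash(filePath: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash("md5");
    const stream = fs.createReadStream(filePath);

    stream.on("data", (chunk) => hash.update(chunk)); // feed the hash chunk by chunk
    stream.on("end", () => resolve(hash.digest("hex")));
    stream.on("error", reject);
  });
}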

Step 3: Resumable Download

3.1 HTTP Range Request Handling

// src/services/downloadService.ts
private async handleRangeRequest(
  filePath: string,
  range: string,
  fileSize: number,
  res: Response
): Promise<void> {
  // Parse Range header: bytes=1024-2048
  const parts = range.replace(/bytes=/, '').split('-');
  const start = parseInt(parts[0], 10);
  const end = parts[1] ? parseInt(parts[1], 10) : fileSize - 1;

  if (start >= fileSize || end >= fileSize) {
    res.status(416).setHeader('Content-Range', `bytes */${fileSize}`);
    res.end();
    return;
  }

  const chunkSize = (end - start) + 1;
  const readStream = fs.createReadStream(filePath, { start, end });

  // Set 206 Partial Content response
  res.status(206);
  res.setHeader('Content-Range', `bytes ${start}-${end}/${fileSize}`);
  res.setHeader('Accept-Ranges', 'bytes');
  res.setHeader('Content-Length', chunkSize.toString());

  await new Promise<void>((resolve, reject) => {
    readStream.pipe(res);
    readStream.on('end', resolve);
    readStream.on('error', reject);
  });
}

Technical Points:

  • Status code 206 indicates partial content
  • Content-Range header tells client the current byte range being returned
  • Accept-Ranges: bytes declares support for byte range requests
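
For completeness, here is roughly how the download entry point can dispatch to handleRangeRequest; the method name downloadFile and the non-Range handling are illustrative assumptions, not the exact project code:

// src/services/downloadService.ts (simplified sketch of the entry point)
async downloadFile(fileHash: string, req: Request, res: Response): Promise<void> {
  const filePath = path.join(this.completedDir, fileHash);
  const { size: fileSize } = await fs.stat(filePath);

  const range = req.headers.range;
  if (range) {
    // Client asked for a byte range: answer with 206 Partial Content
    await this.handleRangeRequest(filePath, range, fileSize, res);
    return;
  }

  // No Range header: stream the whole file
  res.setHeader("Accept-Ranges", "bytes");
  res.setHeader("Content-Length", fileSize.toString());
  fs.createReadStream(filePath).pipe(res);
}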

Complete Upload Flow Sequence Diagram

[Sequence diagram: complete upload flow]

Technical Challenges and Solutions

Challenge 1: Large File Memory Management

Problem: Reading large files into memory in one go can exhaust available memory and crash the process.

Solution:

// ❌ Wrong approach: Read large file at once
const fileBuffer = await fs.readFile(filePath);

// ✅ Correct approach: Use streaming
const readStream = fs.createReadStream(filePath);
readStream.pipe(writeStream);

Challenge 2: Chunk Order Guarantee

Problem: Asynchronous uploads may cause chunks to be out of order, corrupting files during merging.

Solution:

// Force processing by index order during merging
for (let i = 0; i < totalChunks; i++) {
  const chunkPath = path.join(chunkDir, `chunk_${i}`);
  // Write each chunk in order
}

Challenge 3: Concurrency Safety

Problem: Multiple clients uploading the same file simultaneously may create race conditions.

Solution:

// Create the lock file atomically: the "wx" flag fails if the file already exists,
// so the existence check and the creation happen in a single operation (no race window)
const lockFile = path.join(chunkDir, ".lock");
try {
  await fs.writeFile(lockFile, "", { flag: "wx" });
} catch {
  throw new Error("File is being processed by another process");
}

Challenge 4: Error Recovery

Problem: How to recover upload state after network interruptions or service restarts.

Solution:

// Persist upload progress to database or files
private uploadProgress = new Map<string, UploadProgress>();

// Recover state by scanning disk when service restarts
async recoverUploadProgress() {
  const chunkDirs = await fs.readdir(this.chunksDir);
  for (const fileHash of chunkDirs) {
    // Scan chunk files to recover progress
  }
}
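
One way to fill in that scan, assuming UploadProgress records at least the uploaded chunk indices per file hash (the actual shape likely has more fields):

// Sketch: rebuild in-memory progress from the chunk files left on disk
async recoverUploadProgress(): Promise<void> {
  const chunkDirs = await fs.readdir(this.chunksDir);
  for (const fileHash of chunkDirs) {
    const files = await fs.readdir(path.join(this.chunksDir, fileHash));
    // Chunk files are named chunk_<index>; recover the indices already present
    const uploadedChunks = files
      .filter((name) => name.startsWith("chunk_"))
      .map((name) => parseInt(name.replace("chunk_", ""), 10))
      .sort((a, b) => a - b);
    this.uploadProgress.set(fileHash, { fileHash, uploadedChunks, isCompleted: false });
  }
}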

Challenge 5: Frontend State Management

Problem: Complex upload states (pause, resume, retry) are difficult to manage.

Solution:

// Use AbortController to manage requests
const abortController = new AbortController();
abortControllersRef.current.set(id, abortController);

// Cancel requests when pausing
const pauseUpload = useCallback((id: string) => {
  const abortController = abortControllersRef.current.get(id);
  if (abortController) {
    abortController.abort();
  }
}, []);

Performance Optimization Suggestions

1. Chunk Size Optimization

// Dynamically adjust chunk size based on network environment
const getOptimalChunkSize = (networkSpeed: number) => {
  if (networkSpeed > 10) return 5 * 1024 * 1024; // 5MB for fast network
  if (networkSpeed > 1) return 2 * 1024 * 1024; // 2MB for normal network
  return 1 * 1024 * 1024; // 1MB for slow network
};

2. Concurrent Upload Control

// Limit the number of simultaneous chunk uploads
const maxConcurrentUploads = 3;
const uploadQueue = new Map();
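
A small promise pool is enough to cap concurrency without any extra library; a minimal sketch (uploadWithConcurrencyLimit is a hypothetical helper):

// Sketch: run at most `limit` tasks at a time from a shared queue
async function uploadWithConcurrencyLimit<T>(
  tasks: (() => Promise<T>)[],
  limit = 3
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  // Start up to `limit` workers that pull tasks until the queue is empty
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
  return results;
}

Each chunk upload becomes a task like () => uploadChunkRequest(index), and at most `limit` requests are in flight at any moment.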

3. Client-side Caching

// Cache calculated file hashes
const hashCache = new Map<string, string>();
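
To make the cache useful, the key has to identify a file cheaply; name + size + lastModified works in practice. A sketch (getFileHashCached is a hypothetical wrapper around calculateFileHash from Step 1.1):

// Sketch: skip re-hashing a file that was already hashed in this session
const getCacheKey = (file: File) => `${file.name}-${file.size}-${file.lastModified}`;

export async function getFileHashCached(file: File): Promise<string> {
  const key = getCacheKey(file);
  const cached = hashCache.get(key);
  if (cached) return cached;

  const hash = await calculateFileHash(file); // from Step 1.1
  hashCache.set(key, hash);
  return hash;
}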

Deployment Considerations

1. Nginx Configuration

# Increase upload file size limit
client_max_body_size 100M;

# Timeout settings
client_body_timeout 60s;
client_header_timeout 60s;

2. Node.js Configuration

// Increase request body size limit
app.use(express.json({ limit: "50mb" }));
app.use(express.urlencoded({ extended: true, limit: "50mb" }));
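
Note that the chunk uploads themselves go through Multer rather than the JSON body parser. A possible Multer setup in memory-storage mode, which is what makes req.file.buffer available in uploadChunk (the field name and size limit are assumptions):

// Sketch: Multer with in-memory storage so chunks arrive as Buffers
import multer from "multer";

const upload = multer({
  storage: multer.memoryStorage(),
  limits: { fileSize: 10 * 1024 * 1024 }, // headroom above the 2MB chunk size
});

// router.post("/upload/chunk", upload.single("chunk"), (req, res) => uploadController.uploadChunk(req, res));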

3. File Cleanup Strategy

// Regularly clean up expired temporary chunks (chunksDir is the temporary chunk root)
const cleanupOldChunks = async () => {
  const oneDayAgo = Date.now() - 24 * 60 * 60 * 1000;
  // Remove chunk directories whose last modification is older than 1 day
  for (const entry of await fs.readdir(chunksDir)) {
    const dirPath = path.join(chunksDir, entry);
    const { mtimeMs } = await fs.stat(dirPath);
    if (mtimeMs < oneDayAgo) await fs.remove(dirPath);
  }
};

Summary and Learning Recommendations

Through this project, I gained a deep understanding of the technical principles behind file uploads and downloads. The main takeaways:

Key Technical Points

  1. File Chunking: Proper use of File.slice()
  2. Hash Calculation: Different MD5 implementations on the frontend (js-md5) and backend (Node's crypto) that must produce the same digest
  3. Streaming: Avoiding large file memory issues
  4. HTTP Range: Standard implementation of resumable downloads
  5. State Management: Complex asynchronous flow state control

Advice for Beginners

  1. Understand concepts before coding: Get clear on chunking, hashing, Range requests, etc.

  2. Emphasize error handling: Must consider exceptions like network interruptions, file corruption

  3. Implement features incrementally:

    • Version 1: Basic file upload/download
    • Version 2: Add chunking functionality
    • Version 3: Implement resumable uploads
    • Version 4: Add instant upload
  4. Test edge cases extensively:

    • Extra large files (several GB)
    • Network interruption recovery
    • Concurrent uploads
    • Service restarts
  5. Learn related technologies:

    • Node.js Streams
    • Deep HTTP protocol understanding
    • Frontend File APIs
    • Advanced TypeScript usage

Extension Directions

This basic version can be further optimized:

  1. Database Persistence: Store upload progress in database
  2. Multi-file Concurrency: Support simultaneous upload of multiple files
  3. Chunk Verification: Hash verification for each chunk
  4. Compressed Transmission: Compress chunks before transmission
  5. CDN Integration: Store files in cloud storage services

I hope this article helps fellow developers learning Node.js + frontend development! Feel free to reach out with any questions for discussion.


Project Repository: