Jun 17, 2026 | system design, containers, backend engineering, software engineering, docker, leetcode, dsa, algorithms, devops, software architecture

How Leetcode handles millions of code submissions.

How does LeetCode securely handle millions of code submissions? Dive into the system design, exploring Docker containers, worker queues, caching, and vital security constraints.

If you're a CS student or a software engineer, you've probably heard of platforms like Leetcode.

If you haven't, it's basically a platform where you strive to solve different curated programming problems on data structures and algorithms in the most efficient way.

Now that that's out of the way, let's get started.

Since this is not a complete system design, we're only going to focus on how Leetcode (and probably other platforms) handles users' submissions to a problem. To that end, we will start with a predefined architecture and iterate from there.

Let's get started!

Initial Architecture

Our initial architecture is pretty simple (we only serve the requirement, without worrying about scale yet).

The Client makes a request through the API Gateway to our Submission Service to submit the user's solution and create a record in our database.
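As a rough illustration of what "create a record" could mean, here is a minimal sketch of the Submission Service's write path. The field names, the `PENDING` status, and the in-memory `SUBMISSIONS` dict standing in for the database are all assumptions made for this example, not Leetcode's actual schema.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Submission:
    """Hypothetical shape of a submission record."""
    problem_id: str
    user_id: str
    lang: str
    code: str
    status: str = "PENDING"  # assumed initial status
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


# In-memory stand-in for the submissions table (assumption for this sketch).
SUBMISSIONS: dict[str, Submission] = {}


def create_submission(problem_id: str, user_id: str, lang: str, code: str) -> Submission:
    """Store a new submission and return it so the API can reply with its id."""
    submission = Submission(problem_id, user_id, lang, code)
    SUBMISSIONS[submission.id] = submission
    return submission
```

The API handler would return `submission.id` to the client, which is what later status lookups would be keyed on.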

How do we run the user's code?

The next obvious question is: how do we run the user's code and check their solution against our test cases?

Let's explore some ways:

1. Running the code on our server: This is the worst way to run the user's code. It exposes our server to bad actors who can submit malicious code that compromises our security and consumes our resources.

2. Running in Serverless Functions: While this is 100% doable and a much better approach, serverless functions incur cold-start latency, which can really hamper the user experience.

3. Running in Virtual Machines: This is another great approach that isolates the user's code, but it comes with similar bottlenecks. VMs also have cold-start times, and managing their lifecycle to make sure malicious code doesn't consume too many resources adds operational overhead.

4. Running in Docker Containers: This is a great and arguably the optimal approach to our problem. Containers are lightweight, isolated environments for running code. They are like VMs, but much faster and more efficient. This is what we're going to choose as our solution.

Since platforms like Leetcode allow users to submit solutions in different programming languages, we will have containers running for each programming language. In our request body, we can accept a parameter called `lang`, which we use to choose the right container to run the user's code.

Great, let's update our architecture and move on to figure out how we're going to scale our system.
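The `lang` lookup could be as simple as a dictionary. This is only a sketch: the image names below are made up for illustration, and a real platform would likely keep this mapping in configuration.

```python
# Hypothetical image names; the article does not say what the real ones are.
RUNNER_IMAGES = {
    "python": "runner-python:latest",
    "java": "runner-java:latest",
    "cpp": "runner-cpp:latest",
}


def pick_runner_image(lang: str) -> str:
    """Map the request's `lang` parameter to the container image that runs it."""
    key = lang.strip().lower()
    if key not in RUNNER_IMAGES:
        # Reject unknown languages early instead of failing inside a worker.
        raise ValueError(f"unsupported language: {lang!r}")
    return RUNNER_IMAGES[key]
```

Validating `lang` at the API boundary means an unsupported value can be turned into a client error before any job is created.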

Update with containers for running code.

Scaling to Millions of Users

Leetcode handles hundreds of thousands of submissions per second (especially during contests).

Here's what we can do to handle this scale:

1. Vertical Scaling: This basically means adding more compute resources to our server (and there's a limit to it). This is a bad decision, and you're probably not an engineer if you thought of this.

2. Horizontally Scaling Containers: We can autoscale the number of containers for each language depending on demand. On AWS, something like ECS would be good enough for this.

The only downside here is that there's a risk of over-provisioning, which could result in higher costs for unused resources. While this is a manageable risk and modern cloud providers make it easy to scale down and up, we can still do better.

3. Horizontally Scaling Containers with Queue: We can take the same exact approach as above but add a queue between the API server and the containers. This will allow us to buffer submissions during peak times and ensure that we don't overwhelm the containers.

We can use a managed queue service like SQS to handle this for us and scale the number of containers up or down based on the queue's size.
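To make "scale based on the queue's size" concrete, here is one possible sizing rule, sketched as a pure function. The parameter names and default numbers (`jobs_per_worker`, `min_workers`, `max_workers`) are assumptions for illustration; in practice a managed autoscaler would apply a policy like this for you.

```python
import math


def desired_workers(queue_depth: int, jobs_per_worker: int = 10,
                    min_workers: int = 1, max_workers: int = 50) -> int:
    """Pick a worker count proportional to the backlog, clamped to a safe range."""
    needed = math.ceil(queue_depth / jobs_per_worker)
    # The floor keeps some capacity warm; the ceiling caps the bill.
    return max(min_workers, min(max_workers, needed))
```

For example, `desired_workers(95)` asks for 10 workers, while an empty queue still keeps one worker running and a huge backlog is capped at 50.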

Here's how it's going to work:
When the client makes a request to submit a solution, we create a submission record in our database and push a job to run the code onto the queue. Worker nodes pull these jobs one by one and run the code in the respective language's container. When the run is complete, the worker updates the result of the submission in the database.
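The flow above can be sketched in a few lines. To keep the example runnable on its own, Python's in-process `queue.Queue` stands in for SQS, a dict stands in for the database, and `run_in_container` is a hypothetical placeholder that always returns `ACCEPTED`; the status names are assumptions too.

```python
import queue

jobs: "queue.Queue[str]" = queue.Queue()  # stand-in for the managed queue
db: dict[str, dict] = {}                  # stand-in for the submissions table


def submit(submission_id: str, lang: str, code: str) -> None:
    """API side: persist the record first, then enqueue a job that points at it."""
    db[submission_id] = {"lang": lang, "code": code, "status": "PENDING"}
    jobs.put(submission_id)


def run_in_container(lang: str, code: str) -> str:
    """Hypothetical placeholder for running the code in the language's container."""
    return "ACCEPTED"


def worker_drain(run=run_in_container) -> int:
    """Worker side: pull jobs one by one until the queue is empty."""
    processed = 0
    while True:
        try:
            submission_id = jobs.get_nowait()
        except queue.Empty:
            return processed
        record = db[submission_id]
        record["status"] = "RUNNING"
        try:
            record["status"] = run(record["lang"], record["code"])
        except Exception:
            # A crash in the runner should not leave the record stuck in RUNNING.
            record["status"] = "INTERNAL_ERROR"
        finally:
            jobs.task_done()
        processed += 1
```

Writing the record before enqueueing means a worker never picks up a job whose submission it cannot find.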

To show the client updates about the run status of their submission, we can have it poll a lightweight HTTP endpoint that fetches the status of the submission every 1-3s. We could've chosen SSE here, but since this doesn't require truly real-time updates, the implementation complexity isn't worth it.
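A polling loop of this kind might look like the sketch below. `fetch_status` is a hypothetical callable that wraps the HTTP request to the status endpoint, and the set of terminal statuses is an assumption, since the article does not list them.

```python
import time

# Assumed set of statuses after which polling can stop.
TERMINAL = {"ACCEPTED", "WRONG_ANSWER", "TIME_LIMIT_EXCEEDED",
            "RUNTIME_ERROR", "INTERNAL_ERROR"}


def poll_status(fetch_status, submission_id: str, interval: float = 1.0,
                max_wait: float = 30.0, sleep=time.sleep) -> str:
    """Call fetch_status(submission_id) every `interval` seconds until it is final."""
    waited = 0.0
    while True:
        status = fetch_status(submission_id)
        if status in TERMINAL:
            return status
        if waited + interval > max_wait:
            # Give up instead of polling forever if the job never finishes.
            raise TimeoutError(f"submission {submission_id} still {status}")
        sleep(interval)
        waited += interval
```

The `sleep` parameter exists only so the loop can be exercised without real delays; a browser client would do the same thing with a timer.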

To minimize load on our database, the results can be cached in a Redis KV store keyed by the problem and submission ids, so that polling doesn't overwhelm the database with read requests. The lightweight endpoint can then fetch the status from this cache.

Network tab showing the request on submission in Leetcode

Next time you submit a solution, open the Network tab in dev tools; you will see requests being made to the /check endpoint to retrieve the status of your submission.
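Here is a small sketch of that read path. Plain dicts stand in for both Redis and the database so the example runs on its own, and the `submission:<problem>:<submission>` key format is an assumption; with a real Redis client the same logic would use its get and set calls, probably with an expiry on each key.

```python
cache: dict[str, str] = {}                   # stand-in for the Redis KV store
database: dict[tuple[str, str], str] = {}    # stand-in for the submissions table


def cache_key(problem_id: str, submission_id: str) -> str:
    """Hypothetical key format built from the problem and submission ids."""
    return f"submission:{problem_id}:{submission_id}"


def write_result(problem_id: str, submission_id: str, status: str) -> None:
    """Worker side: persist the result, then refresh the cached copy."""
    database[(problem_id, submission_id)] = status
    cache[cache_key(problem_id, submission_id)] = status


def read_status(problem_id: str, submission_id: str):
    """Polling endpoint: serve from the cache, touch the database only on a miss."""
    key = cache_key(problem_id, submission_id)
    if key in cache:
        return cache[key]
    status = database.get((problem_id, submission_id))
    if status is not None:
        cache[key] = status  # populate the cache for the next poll
    return status
```

Because the worker updates the cache whenever it writes a result, repeated polls for the same submission do not reach the database.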

Alright! We understand how Leetcode handles your submissions at scale. Now let's update our architecture diagram and consider some security measures.

Final Architecture and Security Measures

Above, we have the updated architecture diagram to handle millions of users on our system.

Now let's discuss some security measures we can take while running code to protect our system from bad actors:

Read Only Filesystem: To prevent users from writing to the filesystem, we can mount the code directory as read-only.
Compute Resource Bounds: To prevent users from consuming excessive resources, we can set CPU and memory limits on the container. A container that exceeds its memory limit is killed, and one that hits its CPU limit is throttled, which prevents resource exhaustion.
Explicit Timeout: To prevent users from running infinite loops, we can wrap the user's code in a timeout that kills the process if it runs for longer than a predefined time limit, say 5 seconds.
Limit Network Access: To prevent users from making network requests, we can disable network access in the container.
Restricted System Calls (Seccomp): We can use seccomp to block the system calls a container could otherwise use to compromise the host system.
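Most of these measures map onto `docker run` options. The sketch below builds such a command and wraps it in a timeout; the image name, the `/code` mount path, the limit values, and the result labels are assumptions chosen for illustration, and the seccomp profile path is whatever profile file you maintain.

```python
import subprocess


def build_docker_command(image, code_dir, run_cmd, *, memory="256m",
                         cpus="0.5", seccomp_profile=None):
    """Assemble a `docker run` invocation with the restrictions listed above."""
    cmd = [
        "docker", "run", "--rm",
        "--read-only",                         # read-only root filesystem
        "--network", "none",                   # no network access
        "--memory", memory,                    # memory cap
        "--cpus", cpus,                        # CPU cap
        "--pids-limit", "64",                  # bounds process count (fork bombs)
        "--volume", f"{code_dir}:/code:ro",    # user code mounted read-only
    ]
    if seccomp_profile:
        cmd += ["--security-opt", f"seccomp={seccomp_profile}"]
    cmd.append(image)
    cmd += list(run_cmd)
    return cmd


def run_with_timeout(cmd, time_limit_s=5.0):
    """Run a command and classify the outcome, enforcing a wall-clock limit."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=time_limit_s)
    except subprocess.TimeoutExpired:
        # This stops the local process we launched. With Docker that is the
        # CLI client, so a real worker would also have to stop the container.
        return "TIME_LIMIT_EXCEEDED"
    return "OK" if done.returncode == 0 else "RUNTIME_ERROR"
```

A worker could call `run_with_timeout(build_docker_command("runner-python:latest", "/tmp/sub-1", ["python", "/code/main.py"]))`, where the image and paths are again placeholders.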

Alright! With that, we've reached the end of this article.

We started with an initial architecture and built our way up to support millions of users and also learned about some measures we could take to protect our system from bad actors.

I regularly post content related to Software Engineering, AI, tech, and startups.
If you liked this article, you can follow me on X and LinkedIn and check out my other blogs too.

That's it for this one. Have a great day :)