EP10: Learning System Design | URL Shortener | Part 2 Engineering for Scale
Designing URL ID Generation That Scales to Billions
Howdy friends,
Time for part 2.
In part 1, I started learning system design with Amazon Q using a URL shortener problem. That part was about something invisible but very important: learning how to ask better questions before jumping into architecture.
Now we move into the engineering side.
I am still using Amazon Q like an interviewer. I answer as a candidate. I give my solution. If my thinking is weak, it corrects me. If something is missing, it pushes me to think deeper. We go back and forth until the idea becomes stronger.
Today’s main focus is:
How do we generate short URLs that can scale to billions of users?
If we design this wrong now, later we might need heavy changes in application logic. So I want something that works long term.
There are two types of short URLs in our design.
Custom URL.
Auto generated URL.
Let me start with custom.
Custom URLs are mostly for premium users. For example, someone might want short.url/google or short.url/cloudwithalon.
But here I realized something. Anyone could claim a popular name and redirect it to any site. That invites misuse and brand problems.
So I suggested we either use an open source dataset, if one exists, or maintain our own reserved keywords list, starting small and growing it over time.
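A minimal sketch of the reserved keywords idea, assuming the list lives in memory and that aliases are compared case-insensitively. The seed names, the function names, and the set of already taken aliases are all hypothetical placeholders, not part of the actual design.

```python
# Assumed seed list; in practice this would start small and grow over time.
RESERVED_ALIASES = {"google", "amazon", "admin", "login"}


def normalize(alias: str) -> str:
    """Compare aliases case-insensitively and ignore surrounding spaces."""
    return alias.strip().lower()


def is_alias_allowed(alias: str, taken: set[str]) -> bool:
    """Return True only if the alias is non-empty, not reserved, and not taken."""
    key = normalize(alias)
    if not key:
        return False
    return key not in RESERVED_ALIASES and key not in taken


taken_aliases = {"cloudwithalon"}  # hypothetical aliases already claimed
for candidate in ("Google", "cloudwithalon", "my-new-link"):
    print(candidate, "->", is_alias_allowed(candidate, taken_aliases))
```

Normalizing before the lookup means short.url/Google and short.url/google are treated as the same name, so a reserved word cannot be claimed just by changing the letter case.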
Then we moved to auto generated URLs.
This is where I got stuck for a long time.
The big question was: how do we generate unique short IDs that scale to billions with no collisions?
My first thoughts were random string generators or hashing the original URL. But I remembered collisions can happen.
Even if rare, at billion scale rare is not rare anymore.
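To put a rough number on "rare is not rare anymore", here is a small sketch using the standard birthday approximation. It assumes random 7-character IDs drawn from 62 symbols (0-9, a-z, A-Z). That ID length and alphabet are my assumption for illustration, not something fixed in the design.

```python
import math


def collision_probability(n_ids: int, id_space: int) -> float:
    """Birthday approximation: chance that at least two of n_ids random IDs match."""
    exponent = -n_ids * (n_ids - 1) / (2 * id_space)
    # 1 - e^exponent, computed with expm1 so tiny probabilities stay accurate.
    return -math.expm1(exponent)


ID_SPACE = 62 ** 7  # assumed: 7-character random IDs over 62 symbols

for n in (1_000, 1_000_000, 1_000_000_000):
    p = collision_probability(n, ID_SPACE)
    print(f"{n:>13,} random IDs -> P(at least one collision) ~ {p:.6f}")
```

Under these assumptions a collision is negligible at a thousand IDs, already above ten percent at a million, and practically certain at a billion, so a random scheme needs a check-and-retry step on every insert.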
I was not fully confident in my answer. Then Amazon Q introduced counter based string generation.
Instead of random, we use a counter and then encode it.
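That means we use increasing numbers like 12, 13, 14 for each auto generated URL and then encode them into a short string.
Example, Base64-encoding each number as text:
12 becomes MTIK
13 becomes MTMK
And so on.
10,000,000,000 becomes MTAwMDAwMDAwMDAK
One caveat: these strings are the Base64 of the decimal text plus a trailing newline (the output of a command like echo 12 | base64), so they are unique but not compact. Ten billion comes out at 16 characters, longer than the 11 digits we started with. Converting the integer value itself to a base with 62 or 64 URL-safe symbols keeps it to 6 characters, because 62^6 is about 56.8 billion.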
That was a shift for me.
Instead of hoping values do not collide, we guarantee uniqueness because each number is unique.
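A minimal sketch of counter based generation, assuming we convert the integer value itself into Base62 (0-9, a-z, A-Z) rather than Base64-encoding its decimal text. The alphabet order, the function names, and the in-process counter are my assumptions for illustration. Where the real counter lives is still an open question.

```python
import itertools

# Assumed alphabet: digits, lowercase, uppercase (Base62), all URL-safe.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)


def encode_base62(n: int) -> str:
    """Convert a non-negative counter value into a short Base62 string."""
    if n < 0:
        raise ValueError("counter values must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, remainder = divmod(n, BASE)
        chars.append(ALPHABET[remainder])
    # Remainders come out least significant first, so reverse them.
    return "".join(reversed(chars))


def decode_base62(short_id: str) -> int:
    """Recover the original counter value from its Base62 string."""
    n = 0
    for ch in short_id:
        n = n * BASE + ALPHABET.index(ch)
    return n


# Stand-in for the real counter: a single in-process sequence.
counter = itertools.count(start=12)
for _ in range(3):
    value = next(counter)
    print(value, "->", encode_base62(value))  # 12 -> c, 13 -> d, 14 -> e

print(len(encode_base62(10_000_000_000)))  # 6 characters for ten billion
```

Because every counter value is distinct and the encoding can be reversed with decode_base62, two different numbers can never produce the same short ID.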
But then new questions came.
Where does this counter live?
What type of database do we need?
What should the API look like?
If we have multiple servers, how do they coordinate?
At this point, I realized generating the short URL is not the real problem.
Making it work in a distributed environment is the real challenge.
Now we move into application and database scaling.
I am still thinking through this part and discussing it with Amazon Q. I will continue in the next part once it is clearer to me.
Growing into a better cloud architect every day.
Alon





