LLD Note
Requirements and Capacity Math
A URL shortener is usually read-heavy. Creation writes are important, but the redirect path is the product surface: users expect a short link to resolve in a few milliseconds, even when a campaign link goes viral.
A 7-character Base62 code gives 62^7 combinations, roughly 3.5 trillion possible codes. That gives enough headroom for large interview assumptions while keeping links short and readable.
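The keyspace arithmetic is easy to check directly. The sketch below also estimates how long the keyspace would last; the creation rate of 1,000 new links per second is an illustrative assumption, not a figure from this note.

```python
# Keyspace for 7-character Base62 codes.
ALPHABET_SIZE = 62
CODE_LENGTH = 7
keyspace = ALPHABET_SIZE ** CODE_LENGTH  # 3,521,614,606,208

# Assumed write rate, for illustration only.
ASSUMED_CREATES_PER_SECOND = 1_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

years_to_exhaust = keyspace / (ASSUMED_CREATES_PER_SECOND * SECONDS_PER_YEAR)
print(f"{keyspace:,} codes, about {years_to_exhaust:.0f} years at the assumed rate")
```

At that assumed rate the keyspace lasts a little over a century, which is why 7 characters is a comfortable default for interview-scale numbers.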
The design should separate create-time guarantees from redirect-time speed. The database guarantees uniqueness and ownership; Redis and edge routing protect the hot read path.
- Core APIs are create short link, resolve short link, disable link, and read analytics.
- Writes store durable mappings; reads prefer cache and fall back to storage.
- Redirect latency matters more than analytics freshness.
Short Code Generation Tradeoffs
Random codes are simple, but collision probability grows as the keyspace fills, and every collision forces another write attempt. Hash slicing produces deterministic codes, but collisions still need handling, and unless the hash is salted per account, two users who shorten the same URL get the same code, which reveals that the URL was already shortened.
A database auto-increment counter encoded as Base62 is easy to reason about, but it centralizes allocation and makes codes predictable. A distributed range allocator or key generation service avoids per-request coordination by leasing blocks of IDs to creation workers.
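A minimal sketch of block leasing, assuming an in-memory allocator guarded by a lock. The class names `RangeAllocator` and `CreationWorker` and the block size are hypothetical; a real allocator would persist its high-water mark durably before handing out a block.

```python
import threading


class RangeAllocator:
    """Central allocator: hands out disjoint, contiguous ID blocks."""

    def __init__(self, start: int = 0, block_size: int = 1000) -> None:
        self._next = start
        self._block_size = block_size
        self._lock = threading.Lock()

    def lease(self) -> range:
        # One coordinated step per block, not per ID.
        with self._lock:
            block = range(self._next, self._next + self._block_size)
            self._next += self._block_size
            return block


class CreationWorker:
    """Serves IDs from a leased block and leases again only when it runs out."""

    def __init__(self, allocator: RangeAllocator) -> None:
        self._allocator = allocator
        self._ids = iter(())  # empty until the first lease

    def next_id(self) -> int:
        try:
            return next(self._ids)
        except StopIteration:
            self._ids = iter(self._allocator.lease())
            return next(self._ids)


allocator = RangeAllocator(block_size=3)
a, b = CreationWorker(allocator), CreationWorker(allocator)
ids = [a.next_id(), b.next_id(), a.next_id(), a.next_id(), a.next_id()]
# a leases 0..2, b leases 3..5, then a leases 6..8, so ids == [0, 3, 1, 2, 6]
```

One consequence of this scheme is that IDs left in a block are skipped if a worker restarts, which is acceptable given the size of the keyspace.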
The chosen design uses a range allocator, encodes IDs as 7-character Base62, optionally applies a reversible shuffle to the ID before encoding so that consecutive codes do not look sequential, and still relies on a unique storage constraint as the final correctness boundary.
- Generated codes use allocator plus Base62.
- Custom aliases use explicit unique reservation.
- Storage uniqueness is non-negotiable, even when the allocator is trusted.
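The encode step for generated codes could be sketched as follows. The alphabet order, the fixed-width padding, and the affine shuffle with its `MULTIPLIER` and `OFFSET` constants are assumptions chosen for illustration. The shuffle only hides ordering; it is not a security measure, since an affine map can be recovered from a few observed codes.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols
CODE_LENGTH = 7
KEYSPACE = len(ALPHABET) ** CODE_LENGTH

# Hypothetical constants for an affine permutation of [0, KEYSPACE).
# MULTIPLIER must be coprime with KEYSPACE = 2^7 * 31^7 (odd, not a multiple of 31).
MULTIPLIER = 1_000_003
OFFSET = 123_456_789


def shuffle_id(n: int) -> int:
    """Bijective on [0, KEYSPACE), so distinct IDs map to distinct values."""
    return (n * MULTIPLIER + OFFSET) % KEYSPACE


def encode_base62(n: int) -> str:
    """Encode n as a fixed-width, zero-padded Base62 string."""
    if not 0 <= n < KEYSPACE:
        raise ValueError("id outside the 7-character keyspace")
    chars = []
    for _ in range(CODE_LENGTH):
        n, rem = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62 for codes drawn from ALPHABET."""
    n = 0
    for ch in code:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return n


def short_code_for(allocated_id: int) -> str:
    return encode_base62(shuffle_id(allocated_id))
```

Because the shuffle is a bijection, it cannot introduce collisions between allocator IDs. The unique constraint in storage remains the final check, for example against custom aliases that happen to look like generated codes.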
Redirect Read Path and Cache Strategy
The redirect service first parses the code, rejects impossible inputs, checks a negative cache or a Bloom-filter-style guard, then reads Redis for the active mapping metadata. Only cache misses go to the mapping store.
Cache-aside works well because the mapping store remains the source of truth. Creation can warm Redis, but creation should still succeed if Redis is temporarily unavailable. The next redirect can backfill the cache after a database read.
Invalid short codes should be negative-cached briefly. That protects the database from enumeration attacks without permanently hiding a code that might be created later.
- Cache value should include long URL, status, expiry, owner or tenant metadata, and redirect mode.
- Expired, disabled, or abuse-blocked links should use short TTLs or explicit invalidation.
- Hot links may need replication, edge caching, or per-code load protection.
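The read path described in this section can be sketched with in-memory stand-ins. `TtlCache` plays the role of Redis and a plain dict plays the role of the mapping store; both, along with the code pattern and the TTL values, are assumptions for illustration rather than the note's actual components.

```python
import re
import time
from typing import Callable, Optional

CODE_PATTERN = re.compile(r"[0-9A-Za-z]{1,32}")  # assumed syntax for codes
NEGATIVE_TTL_SECONDS = 30    # assumed; short so newly created codes appear quickly
POSITIVE_TTL_SECONDS = 3600  # assumed
NOT_FOUND = "__not_found__"  # negative-cache sentinel


class TtlCache:
    """Tiny in-memory stand-in for a cache with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: dict[str, tuple[float, object]] = {}

    def get(self, key: str) -> Optional[object]:
        entry = self._items.get(key)
        if entry is None or entry[0] <= self._clock():
            self._items.pop(key, None)  # missing or expired
            return None
        return entry[1]

    def set(self, key: str, value: object, ttl: float) -> None:
        self._items[key] = (self._clock() + ttl, value)


def resolve(code: str, cache: TtlCache, db: dict[str, dict]) -> Optional[dict]:
    """Return mapping metadata for code, or None if it does not exist."""
    if not CODE_PATTERN.fullmatch(code):
        return None  # impossible input: never touches cache or storage
    cached = cache.get(code)
    if cached == NOT_FOUND:
        return None  # negative-cache hit
    if cached is not None:
        return cached  # positive-cache hit
    mapping = db.get(code)  # cache miss: read the source of truth
    if mapping is None:
        cache.set(code, NOT_FOUND, NEGATIVE_TTL_SECONDS)
        return None
    cache.set(code, mapping, POSITIVE_TTL_SECONDS)  # backfill after the read
    return mapping
```

The short negative TTL is the trade-off from the text: repeated probes for a missing code stop reaching storage, while a code created moments later becomes resolvable once the negative entry expires.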
Custom Aliases, Expiration, and Abuse
Custom aliases are a product feature and a security surface. They need strict length and character rules, reserved words, brand-sensitive deny lists, and atomic uniqueness checks so two users cannot claim the same alias.
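As a sketch of those rules, the example below validates an alias and then claims it with a single insert, letting a primary-key constraint decide races. The length limits, the reserved-word list, the table layout, and the use of SQLite are all assumptions for illustration.

```python
import re
import sqlite3

# Assumed rules: 3-30 chars, lowercase letters, digits, and inner hyphens.
ALIAS_PATTERN = re.compile(r"[a-z0-9][a-z0-9-]{1,28}[a-z0-9]")
RESERVED = {"admin", "api", "login", "help", "static"}  # example entries only


def validate_alias(alias: str) -> str:
    """Normalize the alias and reject anything outside the assumed rules."""
    alias = alias.strip().lower()
    if not ALIAS_PATTERN.fullmatch(alias):
        raise ValueError("alias must be 3-30 chars of a-z, 0-9, or inner hyphens")
    if alias in RESERVED:
        raise ValueError("alias is reserved")
    return alias


def reserve_alias(conn: sqlite3.Connection, alias: str, owner: str, long_url: str) -> bool:
    """Claim the alias atomically; the PRIMARY KEY constraint decides races."""
    alias = validate_alias(alias)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO links (short_code, owner, long_url) VALUES (?, ?, ?)",
                (alias, owner, long_url),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # someone else already holds this alias


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (short_code TEXT PRIMARY KEY, owner TEXT, long_url TEXT)")
```

Inserting and catching the constraint violation avoids the check-then-insert race, where two requests both see the alias as free and both try to claim it.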
Long URLs should be validated beyond syntax. The system should reject unsafe protocols, private IP targets, localhost or metadata-service destinations, known phishing domains, malware links, and suspicious redirect chains.
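A first-pass target check might look like the sketch below, which covers only the protocol, localhost, and IP-literal cases. The function name `is_safe_target` and the blocked-host list are hypothetical, and phishing, malware, and redirect-chain checks would need external reputation data that is not modeled here.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}
BLOCKED_HOSTS = {"localhost", "metadata.google.internal"}  # example entries only


def is_safe_target(url: str) -> bool:
    """Cheap structural checks; returns False for obviously unsafe targets."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").rstrip(".").lower()
    except ValueError:
        return False  # malformed URL, e.g. a broken IPv6 literal
    if parts.scheme not in ALLOWED_SCHEMES:  # rejects javascript:, file:, data:, ...
        return False
    if not host or host in BLOCKED_HOSTS or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a hostname, not an IP literal; DNS and reputation checks go here
    # Excludes loopback, private ranges, and link-local 169.254.169.254.
    return ip.is_global
```

A hostname that resolves to a private address passes this check, so any component that actually fetches the target (a preview renderer or scanner, for instance) would need to validate the resolved address as well.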
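A link can be active, expired, disabled by owner, or abuse-blocked by platform policy. Redirect resolution must evaluate those states on every request. That is another reason 302 is the default redirect status: browsers and intermediaries may cache a 301 and stop contacting the service, so later state changes would never take effect for those clients.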
- Use 302 by default because links may expire, be disabled, or become unsafe.
- Use 301 only for stable links where analytics and mutability are not required.
- Support safe preview pages for suspicious but not yet blocked destinations.
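The state checks and status choice above can be sketched as one function. The `Link` fields, the `/preview/` path, and the 404, 410, and 403 responses for missing, expired or disabled, and blocked links are assumptions; the note itself only fixes the 302-versus-301 decision.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Link:
    short_code: str
    long_url: str
    status: str = "active"  # "active", "disabled", or "blocked"
    expires_at: Optional[datetime] = None
    permanent: bool = False   # opt-in for stable links only
    suspicious: bool = False  # flagged but not yet blocked


def redirect_response(link: Optional[Link], now: datetime) -> tuple[int, Optional[str]]:
    """Return (HTTP status, Location header or None), evaluated per request."""
    if link is None:
        return 404, None
    if link.status == "blocked":
        return 403, None
    if link.status == "disabled":
        return 410, None
    if link.expires_at is not None and link.expires_at <= now:
        return 410, None
    if link.suspicious:
        return 302, f"/preview/{link.short_code}"  # hypothetical preview route
    return (301 if link.permanent else 302), link.long_url


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
expired = Link("abc1234", "https://example.com", expires_at=now - timedelta(days=1))
```

Here `redirect_response(expired, now)` yields a 410 with no Location header, while an active link with `permanent=False` yields a 302 to its long URL.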
Analytics Without Slowing Redirects
Every redirect can produce valuable analytics: click time, short code, account, referrer, country, user agent, device type, status, and request id. The redirect path should emit this as an event and continue, not wait for aggregation.
Aggregation workers can compute total clicks, unique visitors, top referrers, geography, device split, abuse signals, and campaign conversion summaries. If the analytics queue is down, redirects should continue while events are sampled, buffered, or dropped according to policy.
- Analytics is eventually consistent.
- Click event publishing should have a tiny timeout.
- Redirect success should not depend on analytics storage availability.
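One way to keep analytics off the critical path is a bounded in-process buffer that drops events when full. In the sketch below, `ClickPublisher`, the buffer size, and the drop-on-full policy are assumptions; the non-blocking `put_nowait` call stands in for the "tiny timeout" on publishing.

```python
import queue
import time


class ClickPublisher:
    """Bounded buffer between the redirect path and a background sender."""

    def __init__(self, max_buffered: int = 10_000) -> None:
        self._buffer: queue.Queue = queue.Queue(maxsize=max_buffered)
        self.dropped = 0  # worth exporting as a metric

    def publish(self, event: dict) -> bool:
        """Never blocks the redirect; drops the event if the buffer is full."""
        try:
            self._buffer.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, max_events: int) -> list[dict]:
        """Called by a background sender; returns up to max_events events."""
        batch = []
        while len(batch) < max_events:
            try:
                batch.append(self._buffer.get_nowait())
            except queue.Empty:
                break
        return batch


def handle_redirect(code: str, long_url: str, publisher: ClickPublisher) -> tuple[int, str]:
    # Emit the click event and continue; the result of publish is ignored on purpose.
    publisher.publish({"short_code": code, "ts": time.time(), "status": 302})
    return 302, long_url
```

If the downstream queue is unavailable, the buffer fills, `dropped` climbs, and redirects keep returning normally, which matches the policy that redirect success should not depend on analytics.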
Sharding and Operational Reliability
The mapping store can shard by hash of shortCode so generated IDs and custom aliases distribute evenly. Hashed partitioning avoids a single hot write range when codes are created sequentially by an allocator.
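A minimal sketch of hash-based shard selection, assuming a fixed shard count and SHA-256 as the hash. The function name `shard_for` is hypothetical. A stable hash is used instead of Python's built-in `hash()`, which is randomized per process for strings.

```python
import hashlib


def shard_for(short_code: str, shard_count: int) -> int:
    """Map a short code to a shard index in [0, shard_count)."""
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Codes that are adjacent in allocation order land on unrelated shards, which is what removes the hot write range. Plain modulo placement moves most keys when `shard_count` changes, so a design that expects resharding might prefer consistent hashing or a fixed number of virtual shards.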
Redirect services should be stateless and horizontally scaled. Redis should be configured with an eviction policy, memory limits, replication, and sensible TTLs so hot mappings stay available while cold links fall back to storage.
The allocator needs its own reliability story. Creation workers should lease ranges in batches, keep small local buffers, and fall back to another allocator replica or safe random-with-unique-insert mode during allocator incidents.
- Use read replicas for redirect fallback reads; if a replica misses a freshly created code because of replication lag, retry against the primary.
- Alert on cache miss spikes, redirect latency, allocator exhaustion, DB errors, and abuse-blocked traffic.
- Protect hot short codes with replicated cache entries and edge-aware traffic routing.
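The random-with-unique-insert fallback mentioned for allocator incidents could be sketched as follows. The retry limit, the table layout, and the use of SQLite are assumptions; the point is that the unique constraint, not the generator, decides whether a code is free.

```python
import secrets
import sqlite3
import string
from typing import Callable

ALPHABET = string.digits + string.ascii_letters  # 62 symbols
CODE_LENGTH = 7
MAX_ATTEMPTS = 5  # assumed retry budget


def random_code() -> str:
    return "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))


def create_with_random_code(
    conn: sqlite3.Connection,
    long_url: str,
    generate: Callable[[], str] = random_code,
) -> str:
    """Insert under a random code, retrying when the unique constraint rejects it."""
    for _ in range(MAX_ATTEMPTS):
        code = generate()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO links (short_code, long_url) VALUES (?, ?)",
                    (code, long_url),
                )
            return code
        except sqlite3.IntegrityError:
            continue  # collision: pick another code
    raise RuntimeError("could not find a free code")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (short_code TEXT PRIMARY KEY, long_url TEXT)")
```

Codes created in this mode can later coincide with an allocator-issued code, so the allocator path needs the same handling of constraint violations once it recovers.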