Architecture
How to Read a Codebase You Did Not Write
A practical investigation process for mapping unfamiliar systems quickly without getting lost in every file.
Reading an unfamiliar codebase is not the same as reading a book. There is no guaranteed beginning, middle, or end.
The fastest path is not to inspect every file. It is to build a working map of the system, then deepen understanding where the risk or value is highest.
Start With the Product
Before opening code, understand what the software does.
What are the core workflows? Who uses it? What data does it protect? What parts of the product make money, reduce cost, or carry compliance obligations?
This context changes how you read the code. A sprawling, messy module in a low-value admin tool matters less than a small module that controls billing, permissions, or customer onboarding.
Find the Entry Points
Look for routes, controllers, scheduled jobs, event handlers, command-line scripts, and API endpoints.
Entry points show how external requests become internal behaviour. They also reveal naming conventions, domain boundaries, and common dependency paths.
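As a rough illustration of that search, the sketch below scans source text for a few common entry-point shapes. The patterns, the `find_entry_points` helper, and the sample file contents are all assumptions made for this example; a real codebase will need patterns matched to the frameworks it actually uses.

```python
import re

# Hypothetical patterns; adjust them to the frameworks used in the codebase.
ENTRY_POINT_PATTERNS = {
    "http_route": re.compile(r"@\w+\.(?:route|get|post|put|delete)\("),
    "cli_script": re.compile(r"if __name__ == [\"']__main__[\"']"),
    "scheduled_job": re.compile(r"@\w*(?:cron|schedule|periodic_task)\w*", re.IGNORECASE),
}


def find_entry_points(files):
    """Return (path, line_number, kind) for each line matching a pattern.

    `files` maps a path to that file's source text.
    """
    hits = []
    for path, text in sorted(files.items()):
        for number, line in enumerate(text.splitlines(), start=1):
            for kind, pattern in ENTRY_POINT_PATTERNS.items():
                if pattern.search(line):
                    hits.append((path, number, kind))
    return hits


# Made-up file contents, used only to demonstrate the helper.
sample = {
    "app/api.py": "@app.get('/invoices')\ndef list_invoices():\n    return []\n",
    "tools/backfill.py": "def main():\n    pass\n\nif __name__ == '__main__':\n    main()\n",
}
entry_points = find_entry_points(sample)
for hit in entry_points:
    print(hit)
```

A text search like this only produces a starting list. Routes registered dynamically or through configuration files will not show up, so treat the result as a prompt for further reading rather than a complete inventory.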
Map the Data
Most systems are easier to understand once the data model is visible.
Find the core tables, entities, schemas, queues, and external data sources. Identify which data is authoritative and which is derived. Look for shared databases, duplicated fields, and unclear ownership.
The shape of the data often explains architecture choices that are not obvious from code alone.
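One cheap way to get a first view of the data model is to pull table names and foreign-key references out of schema files. The sketch below assumes plain SQL DDL with unquoted, unqualified identifiers; `map_tables` and the sample schema are hypothetical and exist only to show the idea.

```python
import re

CREATE_TABLE = re.compile(
    r"CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?(\w+)", re.IGNORECASE
)
REFERENCES = re.compile(r"REFERENCES\s+(\w+)", re.IGNORECASE)


def map_tables(ddl):
    """Return {table: sorted list of tables it references} from simple DDL.

    Assumes statements are separated by semicolons and identifiers are
    unquoted; this is a sketch, not a SQL parser.
    """
    tables = {}
    for statement in ddl.split(";"):
        match = CREATE_TABLE.search(statement)
        if match:
            tables[match.group(1)] = sorted(set(REFERENCES.findall(statement)))
    return tables


# Made-up schema for illustration.
sample_ddl = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE invoices (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total_cents INTEGER
);
"""
table_map = map_tables(sample_ddl)
for table, references in table_map.items():
    print(table, "->", references)
```

Tables that many others reference are good candidates for authoritative data. Tables that nothing references, or that duplicate fields held elsewhere, are worth a question about ownership.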
Use History
Commit history tells you where change happens.
Files that changed recently or frequently are often more important than files that merely look complicated. Look for recurring changes, repeated bug fixes, large refactors, and modules touched by many developers.
History shows the living parts of the system.
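If the project uses Git, a short script can rank files by how often they change. The sketch below counts paths in the output of `git log --name-only --pretty=format:`. The helper names, the six-month window, and the sample log are assumptions for illustration.

```python
import subprocess
from collections import Counter


def count_changes(log_text):
    """Count how many commits touched each path.

    Expects the output of `git log --name-only --pretty=format:`, which
    lists one changed path per line with blank lines between commits.
    """
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


def read_git_log(repo_path, since="6 months ago"):
    """Run git in `repo_path` and return the raw path list (needs git on PATH)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}",
         "--name-only", "--pretty=format:"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Made-up log output standing in for three commits.
sample_log = (
    "billing/invoice.py\nbilling/tax.py\n\n"
    "billing/invoice.py\n\n"
    "admin/report.py\nbilling/invoice.py\n"
)
changes = count_changes(sample_log)
for path, commits in changes.most_common():
    print(commits, path)
```

On a real repository, `count_changes(read_git_log("."))` would produce the same kind of ranking. Change counts show where activity is, not why it happens, so pair the top entries with a read of their commit messages.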
Follow One Workflow End to End
Choose a business-critical workflow and trace it completely.
Start at the user action or API call. Follow validation, business rules, persistence, integration calls, events, background jobs, notifications, and observability.
A full trace shows how the system actually behaves, and it usually does so faster than static diagrams, which may be incomplete or out of date.
Separate Known From Inferred
When reading unfamiliar code, write down what is known, what is inferred, and what needs validation.
This prevents confident but wrong conclusions. It also gives the team a useful investigation record.
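The record does not need special tooling; a shared document is enough. For teams that prefer something structured, here is a minimal sketch. The `Finding` type, the `Confidence` levels, and the example claims are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    KNOWN = "known"                        # verified by reading or running the code
    INFERRED = "inferred"                  # plausible, but not yet verified
    NEEDS_VALIDATION = "needs validation"  # open question for the team


@dataclass
class Finding:
    claim: str
    confidence: Confidence
    evidence: str = ""


def group_by_confidence(findings):
    """Group findings so unverified claims are easy to review."""
    grouped = {level: [] for level in Confidence}
    for finding in findings:
        grouped[finding.confidence].append(finding)
    return grouped


# Made-up findings for illustration.
findings = [
    Finding("Invoices are created in billing/invoice.py", Confidence.KNOWN,
            "traced from the API handler"),
    Finding("Tax rules are cached per region", Confidence.INFERRED),
    Finding("Who owns the nightly export job?", Confidence.NEEDS_VALIDATION),
]
record = group_by_confidence(findings)
for level, items in record.items():
    print(level.value, [item.claim for item in items])
```

Keeping evidence next to each known claim makes it easier to move items between groups as the investigation continues.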
Look for Risk Signals
Useful risk signals include:
- Complex branching in critical workflows.
- Missing tests around important behaviour.
- Shared mutable state.
- Global configuration.
- Hidden dependencies.
- Manual deployment or migration steps.
- Security-sensitive logic without clear ownership.
These are the areas to inspect more deeply.
Stop When You Know Enough
The goal is not total knowledge. The goal is decision-quality understanding.
A good codebase review should explain the system's shape, important behaviours, main risks, and next investigation steps. That is usually more valuable than reading every line.