You've used git commit hundreds of times. But do you know what actually happens when you run it?
Most developers treat Git like a black box—type the magic spell, hope for the best. When something goes wrong (detached HEAD, lost commits, merge nightmares), they panic because they don't understand the mechanics.
This article changes that.
After reading this, you'll see Git not as a mysterious version control system, but as what it really is: a simple key-value database with some clever pointers on top.
What you'll learn:
- What the .git folder actually contains (and why deleting it erases everything)
- How Git stores your files (Blobs, Trees, Commits)
- What really happens during git add and git commit
- Why branches are "free" and commits are permanent
- How to explore Git's internals yourself
Part 1: The .git Folder — Where Everything Lives
When you run git init or git clone, Git creates a hidden folder called .git in your project root. This folder IS your repository.
my-project/
├── src/
├── package.json
├── README.md
└── .git/ ← This IS the repository. Everything else is just "the current version"
[!IMPORTANT]
Delete .git = Delete all local history. The project files remain, but every commit, every branch, every bit of version control on this machine is gone (anything already pushed to a remote survives there). The .git folder is not a cache or backup; it IS Git.
What's Inside .git?
# Run this in any Git repository
ls -la .git/
# Output:
.git/
├── HEAD ← "You Are Here" marker
├── config ← Repository settings (remotes, user info)
├── description ← Used by GitWeb (you can ignore this)
├── hooks/ ← Scripts that run on events (commit, push, etc.)
├── index ← The Staging Area (binary file)
├── logs/ ← History of where HEAD and refs have been (reflog)
├── objects/ ← THE DATABASE: All your files, folders, and commits
├── refs/ ← Branch and tag pointers
│ ├── heads/ ← Local branches (each is a text file!)
│ └── tags/ ← Tags
└── packed-refs ← Optimization: many refs stored together in one file
The Big Three you need to understand:
- objects/: The database storing all content
- refs/: The labels (branches, tags) pointing to commits
- HEAD: The pointer saying "you are currently here"
Everything else is configuration or optimization. Master these three, and you master Git.
Part 2: Git Objects — The Building Blocks
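Git stores everything as objects in the .git/objects/ directory. There are four object types in total, but only three you need to know for everyday work (the fourth, the annotated tag object, is not covered here):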
| Object Type | What It Stores | Analogy |
|---|---|---|
| Blob | File content (just the bytes) | A page of text |
| Tree | Directory structure (list of blobs and other trees) | A table of contents |
| Commit | Snapshot metadata (points to a tree + parent + message) | A chapter in a book |
Why Only Three?
This is Git's genius: by breaking everything into these three primitives, Git can:
- Deduplicate identical content automatically
- Verify integrity with cryptographic hashes
- Build any structure from simple building blocks
2.1 Blobs: The Content Store
A blob (Binary Large Object) stores the raw content of a file—nothing else. No filename. No permissions. Just bytes.
Example: You create a file called hello.txt with content Hello, World!
Git doesn't store:
filename: hello.txt
content: Hello, World!
Git stores ONLY:
blob: Hello, World!
Why no filename? Because the filename is metadata, stored separately in the Tree. This allows Git to detect when you rename a file—the blob stays the same, only the Tree changes.
The SHA-1 Hash (Content Address)
Every object gets a 40-character hexadecimal ID computed from its content (plus a short type-and-size header that Git prepends):
echo "Hello, World!" | git hash-object --stdin
# Output: 8ab686eafeb1f44702738c8b0f24f2567c36da6d
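This hash IS the address. hash-object on its own only computes the ID; once the blob is actually written (by git add, or by passing -w to hash-object), Git stores it at: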
.git/objects/8a/b686eafeb1f44702738c8b0f24f2567c36da6d
↑↑
First 2 characters = directory name
[!NOTE]
Why hashing matters: If anyone changes even one byte of your file, the hash changes completely. Git uses this to guarantee data integrity—if the hash matches, the content is exactly what was saved.
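One detail worth checking for yourself: Git does not hash the bare file bytes. It hashes a small header (the object type, the size in bytes, and a NUL byte) followed by the content. The sketch below reproduces the hash shown above by hand. It assumes sha1sum is available (typical on Linux; macOS ships shasum instead) and that Git is using its default SHA-1 object format.

```shell
# The bytes Git hashes for a blob are: "blob <size>\0<content>".
# "Hello, World!" plus the newline that echo appends is 14 bytes.
by_hand=$(printf 'blob 14\0Hello, World!\n' | sha1sum | cut -d' ' -f1)

# Ask Git to compute the ID for the same content.
by_git=$(echo "Hello, World!" | git hash-object --stdin)

echo "by hand: $by_hand"
echo "by git:  $by_git"
```

Both lines should print the same 8ab686ea... ID, which is why two people on different machines always get the same hash for the same file content.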
Same Content = Same Blob
Create two files with identical content:
echo "Hello, World!" > file1.txt
echo "Hello, World!" > file2.txt
Git creates only ONE blob. Both files point to the same 8ab686e... object. This is how Git saves space—duplicates are free.
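You can watch the deduplication happen in a throwaway repository. This is a minimal sketch: the temporary directory from mktemp and the file names are just for illustration.

```shell
# Work in a scratch repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo" && git init -q

echo "Hello, World!" > file1.txt
echo "Hello, World!" > file2.txt
git add file1.txt file2.txt

# Two index entries, but both carry the same blob hash.
git ls-files --stage

# Count the distinct blob hashes in the index and the object files on disk.
unique_blobs=$(git ls-files --stage | awk '{print $2}' | sort -u | wc -l)
object_files=$(find .git/objects -type f | wc -l)
echo "distinct blobs in index: $unique_blobs"
echo "object files on disk:    $object_files"
```

Two files were staged, yet the object database holds a single blob.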
2.2 Trees: The Directory Structure
A tree is like a directory listing. It contains:
- Pointers to blobs (files)
- Pointers to other trees (subdirectories)
- Filenames and permissions
Example Tree:
100644 blob 8ab686ea... hello.txt
100644 blob 2b7e1f5c... style.css
040000 tree 5d3c8f2a... src/
| Entry | Meaning |
|---|---|
| 100644 | File permissions (normal file) |
| blob | Object type |
| 8ab686ea... | SHA-1 hash of the blob |
| hello.txt | Filename |
Why this structure?
- Filenames live HERE, not in blobs
- Renaming a file = new tree, same blob (efficient!)
- Subdirectories are just nested trees
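The rename claim is easy to test. The sketch below stages a file, snapshots the index as a tree with the plumbing command git write-tree, renames the file, and snapshots again. The scratch repository and file names are assumptions made for the demo.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

echo "Hello, World!" > hello.txt
git add hello.txt
tree_before=$(git write-tree)                           # tree built from the index
blob_before=$(git ls-files --stage | awk '{print $2}')

# Rename on disk, then let Git stage the removal and the addition.
mv hello.txt greeting.txt
git add -A
tree_after=$(git write-tree)
blob_after=$(git ls-files --stage | awk '{print $2}')

echo "before:" && git cat-file -p "$tree_before"
echo "after:"  && git cat-file -p "$tree_after"
```

The two tree listings differ only in the filename column. The blob hash is identical, so no file content was stored a second time.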
Diagram: Tree → Blob Relationship
Tree (root directory)
┌──────────────────────────────────────┐
│ 100644 blob abc123 index.html │
│ 100644 blob def456 style.css │
│ 040000 tree 789abc src/ │
└──────────────────────────────────────┘
│
┌─────────┴─────────┐
▼ ▼
Blob (abc123) Tree (789abc)
┌────────────┐ ┌─────────────────────┐
│ <html> │ │ 100644 blob aaa111 │
│ <head> │ │ app.js │
│ ... │ │ 100644 blob bbb222 │
└────────────┘ │ utils.js │
└─────────────────────┘
2.3 Commits: Snapshots in Time
A commit is the glue that holds everything together. It contains:
| Field | What It Is |
|---|---|
| tree | Pointer to the root tree (the project snapshot) |
| parent | Pointer to the previous commit (or none for first commit) |
| author | Who wrote the code + timestamp |
| committer | Who made the commit + timestamp |
| message | Your commit message |
Example Commit Object:
tree 5d3c8f2a4b1e0f3d2c1a0b9e8d7c6f5a4b3c2d1e
parent 7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b
author Piyush <piyush@example.com> 1706712000 +0530
committer Piyush <piyush@example.com> 1706712000 +0530

feat: add user login functionality
Why parent matters:
- Creates a linked list of history
- Each commit knows what came before it
- Commits only look BACKWARD, never forward
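To see the backward-only chain in action, you can walk it by hand: read the parent field out of each commit object and jump to it until there is no parent left. A small sketch, assuming a scratch repository with three linear commits (the identity and messages are placeholders):

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

for msg in first second third; do
  echo "$msg" > notes.txt
  git add notes.txt
  git commit -q -m "$msg"
done

# Start at the tip and follow "parent" lines until the root commit.
commit=$(git rev-parse HEAD)
count=0
while [ -n "$commit" ]; do
  git log -1 --format='%h %s' "$commit"
  count=$((count + 1))
  # The root commit has no parent line, so this becomes empty and the loop ends.
  commit=$(git cat-file -p "$commit" | sed -n 's/^parent //p')
done
echo "walked $count commits"
```

This only handles linear history. A merge commit has more than one parent line, which a real traversal would need to follow as well.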
The Time Travel Analogy
Imagine Git history as a timeline:
- You are in 2024 (the latest commit)
- You time travel back to 1990 (an older commit)
- You decide to stay in 1990 and create a new commit
This new commit branches off from 1990. It doesn't connect to 2024 because, in this new timeline, 2024 hasn't happened yet. If the new commit automatically referenced the "future" commit, it would create a loop rather than a history.
This is why commits made on a detached HEAD end up orphaned: they sit on an alternate timeline that no branch points to.
Diagram: The Complete Object Relationship
┌─────────────────────────────────────────────────────────────────────────────┐
│ COMPLETE OBJECT RELATIONSHIP │
└─────────────────────────────────────────────────────────────────────────────┘
Commit (abc123) Commit (def456)
┌──────────────────────┐ ┌──────────────────────┐
│ tree: 111aaa │ │ tree: 222bbb │
│ parent: def456 ────┼──────────►│ parent: (none) │
│ author: Piyush │ │ author: Piyush │
│ message: "Add login" │ │ message: "Initial" │
└──────────────────────┘ └──────────────────────┘
│ │
▼ ▼
Tree (111aaa) Tree (222bbb)
┌─────────────┐ ┌─────────────┐
│ login.html │ │ index.html │
│ style.css │ │ README.md │
│ src/ │ └─────────────┘
└─────────────┘ │
│ ▼
▼ Blobs (files)
Blobs (files)
Each commit is a COMPLETE SNAPSHOT, not a diff!
Git calculates diffs on-the-fly by comparing trees.
[!NOTE]
Commits are snapshots, not diffs. Git doesn't store "what changed." It stores the entire tree at that point. Diffs are calculated when you ask for them by comparing two commits.
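A quick way to convince yourself of the snapshot model: make two commits where only one file changes, then compare what each commit's tree records. This sketch assumes a scratch repository; the file names are illustrative.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo "stable" > README.md
echo "v1" > app.js
git add . && git commit -q -m "first"

echo "v2" > app.js
git add . && git commit -q -m "second"

# Each commit's tree lists every file, changed or not.
git ls-tree HEAD~1
git ls-tree HEAD

readme_old=$(git rev-parse HEAD~1:README.md)
readme_new=$(git rev-parse HEAD:README.md)
app_old=$(git rev-parse HEAD~1:app.js)
app_new=$(git rev-parse HEAD:app.js)

# The diff is derived afterwards by comparing the two trees.
changed=$(git diff-tree -r --name-only HEAD~1 HEAD)
echo "changed between the two snapshots: $changed"
```

Both trees list README.md, and both point at the same blob for it. A full snapshot costs almost nothing when most files are unchanged, because unchanged files are just repeated pointers.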
Part 3: What Happens During git add
Now that you understand the building blocks, let's trace what happens when you run git add.
The Three Areas
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ WORKING DIR │ │ STAGING AREA │ │ REPOSITORY │
│ │ git │ (Index) │ git │ (.git/objects)│
│ Your files │ add │ .git/index │commit│ │
│ on disk │─────►│ │─────►│ Permanent │
│ │ │ "Ready to │ │ history │
│ │ │ commit" │ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Step-by-Step: What git add src/login.js Does
1. Hash the file content
# Git calculates the SHA-1 hash of a short header plus the file content
sha1("blob <size>\0" + "content of login.js") = 8ab686eafeb1f...
2. Create a blob object
# If this hash doesn't exist yet, Git creates the blob:
.git/objects/8a/b686eafeb1f44702738c8b0f24f2567c36da6d
3. Update the index (staging area)
# Git updates .git/index with:
# "When you commit, include: src/login.js → blob 8ab686e..."
Verify It Yourself
# Create a file
echo "console.log('Hello');" > test.js
# Stage it
git add test.js
# See what's in the staging area
git ls-files --stage
# Output: 100644 a1b2c3d4e5f6... 0 test.js
# Fields: permissions, blob hash, stage number, filename
# The blob now exists in .git/objects/
find .git/objects -type f
# You'll see: .git/objects/a1/b2c3d4e5f6...
Key insight: git add doesn't just "stage" a file—it creates the blob object immediately. The staging area is a list of "blobs I want to commit."
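One consequence is worth trying: because the blob is written at git add time, editing the file afterwards does not change what is staged. A minimal sketch in a scratch repository (the file name and contents are just examples):

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

echo "console.log('Hello');" > test.js
git add test.js
staged=$(git ls-files --stage test.js | awk '{print $2}')

# Edit the file again WITHOUT re-staging it.
echo "console.log('Changed');" > test.js

# The staged object is a blob, and it still holds the content from git add.
git cat-file -t "$staged"
git cat-file -p "$staged"
```

This is why a commit can contain an older version of a file than the one in your editor: the index points at the blob created by the last git add.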
Part 4: What Happens During git commit
When you run git commit, Git does three things:
Step 1: Create Tree Object(s)
Git reads the staging area (.git/index) and creates tree objects representing the directory structure.
Index says:
- src/login.js → blob 8ab686e
- src/auth.js → blob 4d5e6f0
- style.css → blob 2b7e1f5
Git creates:
- Tree for src/ (pointing to login.js and auth.js blobs)
- Tree for root (pointing to src/ tree and style.css blob)
Step 2: Create Commit Object
Git creates a commit containing:
- Pointer to the root tree
- Pointer to the current HEAD commit (parent)
- Your author info and message
# The new commit object:
tree 5d3c8f2a4b...
parent 7a8b9c0d1e... ← Current HEAD becomes parent
author Piyush <...>

feat: add login ← The message follows a blank line; there is no "message:" field
Step 3: Update the Branch Pointer
# Before commit:
.git/refs/heads/main → 7a8b9c0d1e...
# After commit:
.git/refs/heads/main → abc123def4... ← Updated to new commit!
That's it. The branch file is just updated with the new commit's hash.
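The three steps map onto three plumbing commands, so you can make a commit without ever typing git commit. This is a sketch of the idea rather than what the porcelain command literally runs internally. It assumes a scratch repository with no commits yet, so the new commit has no parent.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo "body { margin: 0; }" > style.css
git add style.css

# Step 1: turn the index into tree object(s).
tree=$(git write-tree)

# Step 2: wrap the tree in a commit object (the message is read from stdin).
commit=$(echo "feat: add stylesheet" | git commit-tree "$tree")

# Step 3: point the current branch at the new commit.
branch=$(git symbolic-ref HEAD)   # e.g. refs/heads/main
git update-ref "$branch" "$commit"

git log --oneline
```

For a later commit you would pass the current tip with -p, for example git commit-tree "$tree" -p HEAD, which is how the parent pointer gets filled in.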
Diagram: The Complete Flow
┌─────────────────────────────────────────────────────────────────────────────┐
│ git add → git commit FLOW │
└─────────────────────────────────────────────────────────────────────────────┘
WORKING DIRECTORY STAGING AREA REPOSITORY
┌─────────────────┐ ┌─────────────┐ ┌─────────────────┐
│ │ │ │ │ │
│ login.js ─────┼── add ───►│ login.js │ │ objects/ │
│ (modified) │ │ (blob hash) │ │ ├── blobs │
│ │ │ │── commit─►│ ├── trees │
│ style.css ─────┼── add ───►│ style.css │ │ └── commits │
│ (modified) │ │ (blob hash) │ │ │
│ │ │ │ │ refs/heads/ │
└─────────────────┘ └─────────────┘ │ └── main ─────┼─┐
│ (updated) │ │
└─────────────────┘ │
▲ │
│ │
└─────────┘
Points to new
commit hash
Part 5: Refs and HEAD — The Label System
Branches Are Just Text Files
This is the most liberating Git insight: a branch is literally a text file containing a 40-character hash.
# See what 'main' branch points to:
cat .git/refs/heads/main
# Output: abc123def456789...
# That's it. That's the entire branch.
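If a branch is only a file with a hash in it, you should be able to create one with a plain redirect. The sketch below does exactly that. It assumes the default "files" ref storage and a SHA-1 repository; the branch name handmade is made up for the demo. In real work, prefer git branch or git update-ref, which also handle packed refs and locking.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
echo "hi" > a.txt && git add a.txt && git commit -q -m "first"

# Create a branch the hard way: write the commit hash into a file.
git rev-parse HEAD > .git/refs/heads/handmade

# Git now lists it like any other branch.
git branch --list

# 40 hex characters plus a newline.
size=$(wc -c < .git/refs/heads/handmade)
echo "branch file size in bytes: $size"
```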
Why this matters:
- Creating a branch is instant (just create a tiny file)
- Deleting a branch doesn't delete commits
- A fast-forward merge is just moving a pointer (a true merge adds one commit with two parents, then moves the pointer)
HEAD: The "You Are Here" Marker
HEAD tells Git where you currently are. It usually points to a branch:
cat .git/HEAD
# Output: ref: refs/heads/main
This means: "I'm on the main branch."
Normal vs Detached HEAD
| State | HEAD Contains | What Happens on Commit |
|---|---|---|
| Normal | ref: refs/heads/main | Branch moves forward with you |
| Detached | abc123def456... (raw hash) | No branch moves; commit is orphaned |
Normal State:
HEAD → refs/heads/main → Commit C
↑
You commit D
↓
HEAD → refs/heads/main → Commit D
Detached State:
HEAD → Commit B (directly)
↑
You commit D
↓
HEAD → Commit D
But 'main' still points to Commit C!
D has no branch. It's an orphan.
[!CAUTION]
Detached HEAD warning: If you commit in detached HEAD state, your work is at risk. Always create a branch (git checkout -b new-branch) before committing if you're detached.
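Here is the whole detached HEAD story in one runnable sketch: detach, commit, then attach a branch so the work has a name. It assumes a scratch repository; the branch name rescue-work is arbitrary.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
echo "one" > a.txt && git add a.txt && git commit -q -m "first"

cat .git/HEAD                  # ref: refs/heads/<your default branch>

# Stay on the same commit, but stop following any branch.
git checkout -q --detach
cat .git/HEAD                  # now a raw commit hash

echo "two" > a.txt && git commit -q -am "work done while detached"
detached=$(git rev-parse HEAD)

# Give the new commit a branch so it stays reachable.
git checkout -q -b rescue-work
cat .git/HEAD                  # ref: refs/heads/rescue-work
```

Nothing was copied or moved in the last step. Git wrote one small file under refs/heads and changed HEAD back into a symbolic reference.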
What Counts as a "Reference"?
| Reference Type | Example | Stable? |
|---|---|---|
| Branch name | main, feature-x | ✅ Yes |
| Tag | v1.0.0 | ✅ Yes |
| HEAD (on a branch) | HEAD → main → commit | ✅ Yes |
| Detached HEAD | HEAD → commit directly | ❌ Only while you're there! |
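The moment you leave a detached commit, no branch or tag points to it. Only the reflog still remembers it, so once that reflog entry expires the commit becomes eligible for garbage collection.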
Reachability: Why Some Commits Survive
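Git's garbage collector deletes objects that nothing can reach. What it checks is reachability: starting from every ref (and every reflog entry), can this object be found by following pointers?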
main → C → B → A (all reachable from main)
↑
└─ D (orphan - no branch points here)
Garbage collector:
✓ A is reachable from main (through C → B → A)
✓ B is reachable from main (through C → B)
✓ C is reachable from main (directly)
✗ D is NOT reachable - will eventually be deleted
The Chain of Custody: As long as a branch points to the tip, all ancestor commits are protected because Git follows parent pointers.
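The A, B, C, D picture above can be rebuilt and inspected with real commands. This sketch assumes a scratch repository. It passes --no-reflogs to git fsck so that the reflog is ignored, which approximates how things look after the reflog entry for D has expired.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

for msg in A B C; do
  echo "$msg" > log.txt && git add log.txt && git commit -q -m "$msg"
done
branch=$(git symbolic-ref --short HEAD)

# Build commit D on top of B with no branch pointing at it.
git checkout -q --detach HEAD~1
echo "D" > log.txt && git commit -q -am "D"
orphan=$(git rev-parse HEAD)
git checkout -q "$branch"

# B is contained in the branch; D is contained in nothing.
git branch --contains "${branch}~1"
git branch --contains "$orphan"

# List commits that no ref can reach (reflog deliberately ignored).
git fsck --unreachable --no-reflogs 2>/dev/null | grep commit
```

The last command should report D as an unreachable commit, while A, B, and C never appear because the branch protects them through the parent chain.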
The Mountain Climbers Analogy
Imagine a team of mountain climbers roped together:
- The Helicopter (Branch) is holding the top climber (C)
- Climber C is holding the rope for Climber B
- Climber B is holding the rope for Climber A
Even though the helicopter only holds C, climbers A and B don't fall because they're chained to C.
Your detached commit D is a climber who tied their rope to B, but B isn't holding onto D. If the helicopter (branch) doesn't come down to pick up D specifically, D falls.
Key insight: Parent pointers only go backward. B doesn't know D exists.
Part 6: Hands-On Exploration
Commands to Inspect Git Objects
# See what type an object is
git cat-file -t abc123
# Output: commit, tree, or blob
# See the content of an object
git cat-file -p abc123
# Output: The actual content
Example: Trace a Commit to Its Files
# 1. Get the latest commit hash
git rev-parse HEAD
# Output: abc123def456...
# 2. See the commit object
git cat-file -p abc123
# Output:
# tree 111aaa222bbb...
# parent 333ccc444ddd...
# author Piyush <...>
# ... message ...
# 3. See the tree (directory snapshot)
git cat-file -p 111aaa
# Output:
# 100644 blob 555eee666fff login.html
# 100644 blob 777aaa888bbb style.css
# 040000 tree 999ccc000ddd src
# 4. See a blob (file content)
git cat-file -p 555eee
# Output: The actual HTML content!
The Reflog: Your Safety Net
Even if you lose a commit, Git remembers where HEAD has been:
git reflog
# Output:
# abc123 HEAD@{0}: commit: feat: add login
# def456 HEAD@{1}: checkout: moving from main to feature
# 789abc HEAD@{2}: commit: fix: typo
Recover a lost commit:
# Find it in reflog
git reflog
# Create a branch to save it
git branch rescue-branch abc123
[!TIP]
Reflog entries expire after 30 days for unreachable commits and 90 days for reachable ones. Act quickly!
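To rehearse a recovery before you need one, the sketch below throws away a commit with git reset --hard and then brings it back from the reflog. It assumes a scratch repository with reflogs enabled (the default for normal, non-bare repositories); rescue-branch is the same illustrative name used above.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
echo "one" > a.txt && git add a.txt && git commit -q -m "first"
echo "two" > a.txt && git commit -q -am "second"
lost=$(git rev-parse HEAD)

# "Lose" the second commit: the branch now points at the first one.
git reset -q --hard HEAD~1

# The reflog still records where HEAD was one move ago.
git reflog
git branch rescue-branch 'HEAD@{1}'

git log --oneline rescue-branch
```

HEAD@{1} means "where HEAD pointed one move ago", so you do not even need to copy the hash out of the reflog output.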
Part 7: The Mental Model Summary
After all this, here's the simple truth:
Git Is a Content-Addressable Filesystem
| Concept | Reality |
|---|---|
| Repository | A folder called .git with a key-value database |
| Blob | File content, addressed by SHA-1 hash |
| Tree | Directory listing, addressed by SHA-1 hash |
| Commit | Metadata pointing to a tree + parent |
| Branch | A text file containing a commit hash |
| HEAD | A text file saying which branch you're on |
| git add | Create blob, update index |
| git commit | Create tree + commit, update branch pointer |
Visual Summary
┌─────────────────────────────────────────────────────────────────────────────┐
│ GIT'S ARCHITECTURE │
└─────────────────────────────────────────────────────────────────────────────┘
You (HEAD)
│
▼
┌─────────┐
│ main │ ← Branch (text file)
└────┬────┘
│ (contains hash)
▼
┌─────────┐
│ Commit │ ← Commit object
│ abc123 │
└────┬────┘
│ (tree pointer)
▼
┌─────────┐
│ Tree │ ← Root directory
│ 111aaa │
└────┬────┘
│ (blob and tree pointers)
┌───────────┼───────────┐
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│ Blob │ │ Blob │ │ Tree │
│ file1 │ │ file2 │ │ src/ │
└────────┘ └────────┘ └────────┘
Conclusion: Why This Knowledge Matters
Understanding Git's internals transforms you from a command-memorizer to a confident user:
| Before | After |
|---|---|
| "I ran reset and lost my work!" | "I know it's in reflog for 30 days" |
| "Detached HEAD is scary" | "Just means HEAD points to hash, not branch" |
| "Branches are expensive to create" | "They're just 41-byte text files" |
| "Git is mysterious" | "Git is a key-value store with pointers" |
Your Next Steps
- Explore your own .git folder — Run the commands from Part 6
- Create a throwaway repo and experiment — Break things on purpose
- Read the hash — When you see error messages with hashes, you now know what they mean
The key insight: Every complex Git operation (rebase, cherry-pick, reset) is just manipulating objects and pointers. Once you see the database, the commands become obvious.
You now understand Git better than 90% of developers. Use this power wisely. 🚀
Have questions? Found this helpful? Let me know in the comments below!