Pratham

Posted on Jan 31

Inside Git: How It Works and the Role of the .git Folder

#webdev #git #versioncontrol #chaicode

You've used git commit hundreds of times. But do you know what actually happens when you run it?

Most developers treat Git like a black box—type the magic spell, hope for the best. When something goes wrong (detached HEAD, lost commits, merge nightmares), they panic because they don't understand the mechanics.

This article changes that.

After reading this, you'll see Git not as a mysterious version control system, but as what it really is: a simple key-value database with some clever pointers on top.

What you'll learn:

What the .git folder actually contains (and why deleting it erases everything)
How Git stores your files (Blobs, Trees, Commits)
What really happens during git add and git commit
Why branches are "free" and why commits stay safe as long as something points to them
How to explore Git's internals yourself

Part 1: The .git Folder — Where Everything Lives

When you run git init or git clone, Git creates a hidden folder called .git in your project root. This folder IS your repository.

my-project/
├── src/
├── package.json
├── README.md
└── .git/           ← This IS the repository. Everything else is just "the current version"

[!IMPORTANT]
Delete .git = Delete all local history. The project files remain, but every commit, every branch, every bit of version control that was not pushed somewhere else is gone. The .git folder is not a cache or backup; it IS the repository.

What's Inside .git?

# Run this in any Git repository
ls -la .git/

# Output:
.git/
├── HEAD                 ← "You Are Here" marker
├── config               ← Repository settings (remotes, user info)
├── description          ← Used by GitWeb (you can ignore this)
├── hooks/               ← Scripts that run on events (commit, push, etc.)
├── index                ← The Staging Area (binary file)
├── logs/                ← History of where HEAD and refs have been (reflog)
├── objects/             ← THE DATABASE: All your files, folders, and commits
├── refs/                ← Branch and tag pointers
│   ├── heads/           ← Local branches (each is a text file!)
│   └── tags/            ← Tags
└── packed-refs          ← Optimization: many refs stored together in one file

The Big Three you need to understand:

objects/ — The database storing all content
refs/ — The labels (branches, tags) pointing to commits
HEAD — The pointer saying "you are currently here"

Everything else is configuration or optimization. Master these three, and you master Git.

Part 2: Git Objects — The Building Blocks

Git stores everything as objects in the .git/objects/ directory. Git has four object types, but three of them do almost all the work (the fourth, the annotated tag object, is not covered here):

Object Type	What It Stores	Analogy
Blob	File content (just the bytes)	A page of text
Tree	Directory structure (list of blobs and other trees)	A table of contents
Commit	Snapshot metadata (points to a tree + parent + message)	A chapter in a book

Why Only Three?

This is Git's genius: by breaking everything into these three primitives, Git can:

Deduplicate identical content automatically
Verify integrity with cryptographic hashes
Build any structure from simple building blocks

2.1 Blobs: The Content Store

A blob (Binary Large Object) stores the raw content of a file—nothing else. No filename. No permissions. Just bytes.

Example: You create a file called hello.txt with content Hello, World!

Git doesn't store:

filename: hello.txt
content: Hello, World!

Git stores ONLY:

blob: Hello, World!

Why no filename? Because the filename is metadata, stored separately in the Tree. This allows Git to detect when you rename a file—the blob stays the same, only the Tree changes.

The SHA-1 Hash (Content Address)

Every object gets a 40-character hexadecimal ID computed from its type, size, and content:

echo "Hello, World!" | git hash-object --stdin
# Output: 8ab686eafeb1f44702738c8b0f24f2567c36da6d

This hash IS the address. git hash-object only prints the ID; once the blob is actually written (for example by git add), Git stores it at:

.git/objects/8a/b686eafeb1f44702738c8b0f24f2567c36da6d
              ↑↑
              First 2 characters = directory name

[!NOTE]
Why hashing matters: If anyone changes even one byte of your file, the hash changes completely. Git uses this to guarantee data integrity—if the hash matches, the content is exactly what was saved.
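To see where the ID comes from, here is a minimal Python sketch of the calculation. It assumes a repository using the default SHA-1 object format; the function names are made up for this example. The key detail is that Git hashes a short header ("blob", the size in bytes, and a NUL byte) followed by the content, not the content alone.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Header is "blob <size in bytes>\0", then the raw file bytes follow.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Split an ID into the directory/file layout used under .git/objects/."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


if __name__ == "__main__":
    # echo appends a newline, so the bytes being hashed are "Hello, World!\n".
    oid = git_blob_id(b"Hello, World!\n")
    print(oid)
    print(loose_object_path(oid))
```

Run it and compare the first line against the git hash-object output above. Change a single byte of the input and the ID changes completely.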

Same Content = Same Blob

Create two files with identical content:

echo "Hello, World!" > file1.txt
echo "Hello, World!" > file2.txt

Git creates only ONE blob. Both files point to the same 8ab686e... object. This is how Git saves space—duplicates are free.
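The deduplication falls out of content addressing for free. The toy store below is only a model (an in-memory dict standing in for .git/objects, with a hypothetical ObjectStore class), but it shows why writing the same bytes twice cannot create a second object.

```python
import hashlib


class ObjectStore:
    """Toy in-memory stand-in for .git/objects: a dict keyed by content hash."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put_blob(self, content: bytes) -> str:
        header = b"blob " + str(len(content)).encode("ascii") + b"\0"
        oid = hashlib.sha1(header + content).hexdigest()
        # Identical content hashes to the same key, so nothing new is stored.
        self.objects.setdefault(oid, content)
        return oid


if __name__ == "__main__":
    store = ObjectStore()
    id1 = store.put_blob(b"Hello, World!\n")  # file1.txt
    id2 = store.put_blob(b"Hello, World!\n")  # file2.txt
    print(id1 == id2, len(store.objects))     # prints: True 1
```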

2.2 Trees: The Directory Structure

A tree is like a directory listing. It contains:

Pointers to blobs (files)
Pointers to other trees (subdirectories)
Filenames and permissions

Example Tree:

100644 blob 8ab686ea... hello.txt
100644 blob 2b7e1f5c... style.css
040000 tree 5d3c8f2a... src/

Entry	Meaning
`100644`	File mode (regular, non-executable file)
`blob`	Object type
`8ab686ea...`	SHA-1 hash of the blob
`hello.txt`	Filename

Why this structure?

Filenames live HERE, not in blobs
Renaming a file = new tree, same blob (efficient!)
Subdirectories are just nested trees
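The rename point can be checked with a small Python sketch. It serializes a tree the way Git does on disk (each entry is the mode, a space, the name, a NUL byte, then the 20 raw bytes of the object ID) and hashes it. The helper names are invented for this example, and the plain name sort is only correct for file entries, since Git orders subdirectory entries by a slightly different rule.

```python
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Hash "<kind> <size>\\0" followed by the body, as Git does for every object."""
    header = kind.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(entries: list[tuple[str, str, str]]) -> str:
    """entries: (mode, name, hex object ID) for the files in one directory."""
    body = b""
    # Entries are stored sorted by name (files only in this sketch).
    for mode, name, oid in sorted(entries, key=lambda e: e[1].encode("utf-8")):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(oid)  # 20 raw bytes, not 40 hex characters
    return object_id("tree", body)


if __name__ == "__main__":
    blob = object_id("blob", b"Hello, World!\n")
    before = tree_id([("100644", "hello.txt", blob)])
    after = tree_id([("100644", "greeting.txt", blob)])  # same blob, new name
    print(blob)
    print(before != after)  # prints: True
```

The blob ID is computed once and reused; only the tree that names it gets a new ID.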

Diagram: Tree → Blob Relationship

                         Tree (root directory)
                         ┌──────────────────────────────────────┐
                         │ 100644 blob abc123  index.html       │
                         │ 100644 blob def456  style.css        │
                         │ 040000 tree 789abc  src/             │
                         └──────────────────────────────────────┘
                                        │
                              ┌─────────┴─────────┐
                              ▼                   ▼
                    Blob (abc123)           Tree (789abc)
                    ┌────────────┐          ┌─────────────────────┐
                    │ <html>     │          │ 100644 blob aaa111  │
                    │ <head>     │          │   app.js            │
                    │ ...        │          │ 100644 blob bbb222  │
                    └────────────┘          │   utils.js          │
                                            └─────────────────────┘

2.3 Commits: Snapshots in Time

A commit is the glue that holds everything together. It contains:

Field	What It Is
`tree`	Pointer to the root tree (the project snapshot)
`parent`	Pointer to the previous commit (none for the first commit, two or more for a merge)
`author`	Who wrote the code + timestamp
`committer`	Who made the commit + timestamp
`message`	Your commit message

Example Commit Object:

tree 5d3c8f2a4b1e0f3d2c1a0b9e8d7c6f5a4b3c2d1e
parent 7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b
author Piyush <piyush@example.com> 1706712000 +0530
committer Piyush <piyush@example.com> 1706712000 +0530

feat: add user login functionality

Why parent matters:

Creates a chain of history (a graph, once merge commits add a second parent)
Each commit knows what came before it
Commits only look BACKWARD, never forward
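Because the parent's ID is part of the bytes being hashed, a commit's own ID depends on everything that came before it. Here is a rough Python sketch of that, using the field layout from the example commit above. The commit_id function is a name invented for this example, and it reuses the author line as the committer line to stay short.

```python
import hashlib


def commit_id(tree: str, parents: list[str], author: str, message: str) -> str:
    """Serialize a commit like the example above and return its SHA-1 ID."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # zero, one, or several parents
    lines += [f"author {author}", f"committer {author}", "", message]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


if __name__ == "__main__":
    tree = "5d3c8f2a4b1e0f3d2c1a0b9e8d7c6f5a4b3c2d1e"
    who = "Piyush <piyush@example.com> 1706712000 +0530"
    msg = "feat: add user login functionality"

    on_parent = commit_id(tree, ["7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b"], who, msg)
    as_root = commit_id(tree, [], who, msg)
    # Same tree, author, and message, but a different parent gives a different ID.
    print(on_parent != as_root)  # prints: True
```

This is also why rewriting an old commit changes the ID of every commit after it.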

The Time Travel Analogy

Imagine Git history as a timeline:

You are in 2024 (the latest commit)
You time travel back to 1990 (an older commit)
You decide to stay in 1990 and create a new commit

This new commit branches off from 1990. It doesn't connect to 2024 because, in this new timeline, 2024 hasn't happened yet. If the new commit automatically referenced the "future" commit, it would create a loop rather than a history.

This is why commits made on a detached HEAD end up orphaned: they sit on an alternate timeline that no branch points to.

Diagram: The Complete Object Relationship

┌─────────────────────────────────────────────────────────────────────────────┐
│                           COMPLETE OBJECT RELATIONSHIP                      │
└─────────────────────────────────────────────────────────────────────────────┘

    Commit (abc123)                    Commit (def456)
    ┌──────────────────────┐           ┌──────────────────────┐
    │ tree: 111aaa         │           │ tree: 222bbb         │
    │ parent: def456   ────┼──────────►│ parent: (none)       │
    │ author: Piyush       │           │ author: Piyush       │
    │ message: "Add login" │           │ message: "Initial"   │
    └──────────────────────┘           └──────────────────────┘
              │                                  │
              ▼                                  ▼
        Tree (111aaa)                      Tree (222bbb)
        ┌─────────────┐                    ┌─────────────┐
        │ login.html  │                    │ index.html  │
        │ style.css   │                    │ README.md   │
        │ src/        │                    └─────────────┘
        └─────────────┘                           │
              │                                   ▼
              ▼                            Blobs (files)
        Blobs (files)

    Each commit is a COMPLETE SNAPSHOT, not a diff!
    Git calculates diffs on-the-fly by comparing trees.

[!NOTE]
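Commits are snapshots, not diffs. Conceptually, Git doesn't store "what changed." Each commit points to the full tree at that point, and unchanged files and directories simply reuse the blobs and trees that already exist. Diffs are calculated when you ask for them by comparing two commits. (Packfiles do delta-compress objects on disk, but that is a storage optimization, not part of the model.)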
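Computing a diff from two snapshots is cheap because comparing blob IDs is enough to know whether a file changed. The sketch below is a simplified model: each snapshot is flattened to a dict of path to blob ID, and the IDs are short made-up placeholders, not real hashes.

```python
def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two flattened snapshots that map path -> blob ID."""
    return {
        "added": sorted(p for p in new if p not in old),
        "removed": sorted(p for p in old if p not in new),
        # Same path, different blob ID: the content changed.
        "modified": sorted(p for p in new if p in old and old[p] != new[p]),
    }


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    initial = {"index.html": "aaa111", "README.md": "bbb222"}
    add_login = {"index.html": "ccc333", "README.md": "bbb222", "login.html": "ddd444"}
    print(diff_snapshots(initial, add_login))
```

Git's real implementation compares tree objects level by level and can skip an entire subdirectory when its tree ID is unchanged.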

Part 3: What Happens During `git add`

Now that you understand the building blocks, let's trace what happens when you run git add.

The Three Areas

┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│  WORKING DIR    │      │  STAGING AREA   │      │   REPOSITORY    │
│                 │ git  │    (Index)      │ git  │   (.git/objects)│
│  Your files     │ add  │ .git/index      │commit│                 │
│  on disk        │─────►│                 │─────►│  Permanent      │
│                 │      │  "Ready to      │      │  history        │
│                 │      │   commit"       │      │                 │
└─────────────────┘      └─────────────────┘      └─────────────────┘

Step-by-Step: What `git add src/login.js` Does

1. Hash the file content

# Git calculates the SHA-1 hash of a short header plus the file content
sha1("blob <size>\0" + "content of login.js") = 8ab686eafeb1f...

2. Create a blob object

# If this hash doesn't exist yet, Git creates the blob:
.git/objects/8a/b686eafeb1f44702738c8b0f24f2567c36da6d

3. Update the index (staging area)

# Git updates .git/index with:
# "When you commit, include: src/login.js → blob 8ab686e..."

Verify It Yourself

# Create a file
echo "console.log('Hello');" > test.js

# Stage it
git add test.js

# See what's in the staging area
git ls-files --stage
# Output: 100644 a1b2c3d4e5f6... 0    test.js
#         ↑       ↑              ↑    ↑
#       perms    blob hash    stage  filename

# The blob now exists in .git/objects/
find .git/objects -type f
# You'll see: .git/objects/a1/b2c3d4e5f6...

Key insight: git add doesn't just "stage" a file—it creates the blob object immediately. The staging area is a list of "blobs I want to commit."
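The three steps above can be sketched in a few lines of Python. This is a toy model, not Git's code: the add function is made up for this example, the index is a plain dict (the real .git/index is a binary file with more fields), and the objects are written into a scratch directory rather than a real repository. The loose object layout and zlib compression do match what Git writes.

```python
import hashlib
import os
import tempfile
import zlib


def add(git_dir: str, index: dict[str, str], path: str, content: bytes) -> str:
    """Toy `git add`: hash the content, write a loose blob, record it in the index."""
    data = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    oid = hashlib.sha1(data).hexdigest()                      # 1. hash
    obj_path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    if not os.path.exists(obj_path):                          # 2. create blob once
        os.makedirs(os.path.dirname(obj_path), exist_ok=True)
        with open(obj_path, "wb") as f:
            f.write(zlib.compress(data))                      # loose objects are zlib-compressed
    index[path] = oid                                         # 3. update the index
    return oid


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as git_dir:  # scratch dir, not a real repo
        index: dict[str, str] = {}
        oid = add(git_dir, index, "test.js", b"console.log('Hello');\n")
        print(index)
        print(os.path.exists(os.path.join(git_dir, "objects", oid[:2], oid[2:])))
```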

Part 4: What Happens During `git commit`

When you run git commit, Git does three things:

Step 1: Create Tree Object(s)

Git reads the staging area (.git/index) and creates tree objects representing the directory structure.

Index says:
  - src/login.js → blob 8ab686e
  - src/auth.js  → blob 4d5e6f0
  - style.css    → blob 2b7e1f5

Git creates:
  - Tree for src/ (pointing to login.js and auth.js blobs)
  - Tree for root (pointing to src/ tree and style.css blob)

Step 2: Create Commit Object

Git creates a commit containing:

Pointer to the root tree
Pointer to the current HEAD commit (parent)
Your author info and message

# The new commit object:
tree 5d3c8f2a4b...
parent 7a8b9c0d1e...  ← Current HEAD becomes parent
author Piyush <...>
committer Piyush <...>

feat: add login

Step 3: Update the Branch Pointer

# Before commit:
.git/refs/heads/main → 7a8b9c0d1e...

# After commit:
.git/refs/heads/main → abc123def4...  ← Updated to new commit!

That's it. The branch file is just updated with the new commit's hash.
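Here is a compact Python model of those three steps. Everything in it is a simplification chosen for readability: objects and refs are plain dicts, the function names are invented, trees are stored as readable text (the way git cat-file -p prints them) rather than in Git's binary format, and author lines are left out. The resulting IDs therefore will not match real Git IDs.

```python
import hashlib


def store(objects: dict[str, str], kind: str, body: str) -> str:
    """Save a text object under the SHA-1 of its type, size, and body."""
    data = body.encode("utf-8")
    header = f"{kind} {len(data)}".encode("ascii") + b"\0"
    oid = hashlib.sha1(header + data).hexdigest()
    objects[oid] = body
    return oid


def write_tree(objects: dict[str, str], index: dict[str, str], prefix: str = "") -> str:
    """Step 1: turn the flat index (path -> blob ID) into nested tree objects."""
    files, subdirs = [], set()
    for path, blob in index.items():
        if not path.startswith(prefix):
            continue
        rest = path[len(prefix):]
        if "/" in rest:
            subdirs.add(rest.split("/", 1)[0])     # belongs to a subdirectory
        else:
            files.append(f"100644 blob {blob}\t{rest}")
    dirs = [f"040000 tree {write_tree(objects, index, prefix + d + '/')}\t{d}"
            for d in subdirs]
    entries = sorted(files + dirs, key=lambda line: line.split("\t")[1])
    return store(objects, "tree", "\n".join(entries))


def commit(objects: dict[str, str], refs: dict[str, str],
           index: dict[str, str], branch: str, message: str) -> str:
    tree = write_tree(objects, index)                        # Step 1: trees
    parent = refs.get(branch)                                # current tip, if any
    body = f"tree {tree}\n"
    body += f"parent {parent}\n" if parent else ""
    body += f"\n{message}\n"
    oid = store(objects, "commit", body)                     # Step 2: commit object
    refs[branch] = oid                                       # Step 3: move the branch
    return oid


if __name__ == "__main__":
    objects: dict[str, str] = {}
    refs: dict[str, str] = {}
    index = {"src/login.js": "8ab686e", "src/auth.js": "4d5e6f0", "style.css": "2b7e1f5"}
    first = commit(objects, refs, index, "main", "feat: add login")
    print(objects[first])
    print(refs)
```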

Diagram: The Complete Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                        git add → git commit FLOW                            │
└─────────────────────────────────────────────────────────────────────────────┘

  WORKING DIRECTORY              STAGING AREA                REPOSITORY
  ┌─────────────────┐           ┌─────────────┐           ┌─────────────────┐
  │                 │           │             │           │                 │
  │  login.js  ─────┼── add ───►│ login.js    │           │  objects/       │
  │  (modified)     │           │ (blob hash) │           │   ├── blobs     │
  │                 │           │             │── commit─►│   ├── trees     │
  │  style.css ─────┼── add ───►│ style.css   │           │   └── commits   │
  │  (modified)     │           │ (blob hash) │           │                 │
  │                 │           │             │           │  refs/heads/    │
  └─────────────────┘           └─────────────┘           │   └── main ─────┼─┐
                                                          │     (updated)   │ │
                                                          └─────────────────┘ │
                                                                    ▲         │
                                                                    │         │
                                                                    └─────────┘
                                                                  Points to new
                                                                  commit hash

Part 5: Refs and HEAD — The Label System

Branches Are Just Text Files

This is the most liberating Git insight: a branch is literally a text file containing a 40-character hash.

# See what 'main' branch points to:
cat .git/refs/heads/main
# Output: abc123def456789...

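# That's it. That's the entire branch.
# (If the file is missing, the ref may live in .git/packed-refs instead;
#  `git rev-parse main` works either way.)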

Why this matters:

Creating a branch is instant (just create a tiny file)
Deleting a branch doesn't delete commits
A fast-forward merge is just moving a pointer (other merges add one commit with two parents)

HEAD: The "You Are Here" Marker

HEAD tells Git where you currently are. It usually points to a branch:

cat .git/HEAD
# Output: ref: refs/heads/main

This means: "I'm on the main branch."
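Following that pointer by hand takes two file reads. The sketch below is a hypothetical helper, not part of Git. It assumes the branch exists as a loose file under refs/heads/ and ignores packed-refs, and the demo builds a scratch directory instead of touching a real repository.

```python
import os
import tempfile
from typing import Optional


def resolve_head(git_dir: str) -> tuple[Optional[str], str]:
    """Return (branch name, commit hash); the branch is None when HEAD is a raw hash."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]                      # e.g. refs/heads/main
        with open(os.path.join(git_dir, ref)) as f:    # loose ref only, no packed-refs
            return ref.removeprefix("refs/heads/"), f.read().strip()
    return None, head


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as git_dir:     # stand-in for a real .git
        os.makedirs(os.path.join(git_dir, "refs", "heads"))
        with open(os.path.join(git_dir, "refs", "heads", "main"), "w") as f:
            f.write("7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b\n")
        with open(os.path.join(git_dir, "HEAD"), "w") as f:
            f.write("ref: refs/heads/main\n")
        print(resolve_head(git_dir))
```

Pointing it at the .git folder of a real repository should report the branch you are on, provided that branch has not been packed.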

Normal vs Detached HEAD

State	HEAD Contains	What Happens on Commit
Normal	`ref: refs/heads/main`	Branch moves forward with you
Detached	`abc123def456...` (raw hash)	No branch moves; commit is orphaned

Normal State:

HEAD → refs/heads/main → Commit C
                              ↑
                         You commit D
                              ↓
HEAD → refs/heads/main → Commit D

Detached State:

HEAD → Commit B (directly)
            ↑
       You commit D
            ↓
HEAD → Commit D

But 'main' still points to Commit C!
D has no branch. It's an orphan.

[!CAUTION]
Detached HEAD warning: If you commit in detached HEAD state, your work is at risk. Always create a branch (git checkout -b new-branch) before committing if you're detached.
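The difference between the two states fits in one if statement. This Python sketch is a simplified model with invented names: refs is a dict, HEAD is a string, and commits are single letters as in the diagrams above.

```python
def record_commit(refs: dict[str, str], head: str, new_commit: str) -> str:
    """Return the new HEAD value after a commit.

    head is either "ref: refs/heads/<name>" (normal) or a raw commit ID (detached).
    """
    if head.startswith("ref: "):
        refs[head[len("ref: "):]] = new_commit  # the branch moves; HEAD still names it
        return head
    return new_commit                            # only HEAD moves; no branch is updated


if __name__ == "__main__":
    refs = {"refs/heads/main": "C"}
    head = record_commit(refs, "ref: refs/heads/main", "D")
    print(head, refs)   # normal: main now points to D

    refs = {"refs/heads/main": "C"}
    head = record_commit(refs, "B", "D")
    print(head, refs)   # detached: HEAD is D, main still points to C
```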

What Counts as a "Reference"?

Reference Type	Example	Stable?
Branch name	`main`, `feature-x`	✅ Yes
Tag	`v1.0.0`	✅ Yes
HEAD (on a branch)	`HEAD` → `main` → commit	✅ Yes
Detached HEAD	`HEAD` → commit directly	❌ Only while you're there!

Once you leave a detached commit, no branch or tag refers to it. Only the reflog still remembers it, and when those entries expire the commit becomes eligible for garbage collection.

Reachability: Why Some Commits Survive

Git's garbage collector removes objects that cannot be reached from any reference. Reachability is what it checks:

main → C → B → A (all reachable from main)
           ↑
           └─ D (orphan - no branch points here)

Garbage collector:
✓ A is reachable from main (through C → B → A)
✓ B is reachable from main (through C → B)
✓ C is reachable from main (directly)
✗ D is NOT reachable - eligible for deletion once its reflog entries expire

The Chain of Custody: As long as a branch points to the tip, all ancestor commits are protected because Git follows parent pointers.
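The reachability check is a plain graph walk along parent pointers. The Python sketch below models the diagram above with single-letter commits. It is a simplification: real Git also starts the walk from tags, reflog entries, and a few other roots, not just branch tips.

```python
def reachable(parents: dict[str, list[str]], tips: list[str]) -> set[str]:
    """Collect every commit reachable from the given tips via parent pointers."""
    seen: set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])  # walk backward only
    return seen


if __name__ == "__main__":
    # A <- B <- C (main), and D branching off B with no ref pointing at it.
    parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
    kept = reachable(parents, tips=["C"])
    print(sorted(kept))                 # prints: ['A', 'B', 'C']
    print(sorted(set(parents) - kept))  # prints: ['D']
```

Add a branch that points at D to the list of tips and D survives as well.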

The Mountain Climbers Analogy

Imagine a team of mountain climbers roped together:

The Helicopter (Branch) is holding the top climber (C)
Climber C is holding the rope for Climber B
Climber B is holding the rope for Climber A

Even though the helicopter only holds C, climbers A and B don't fall because they're chained to C.

Your detached commit D is a climber who tied their rope to B, but B isn't holding onto D. If the helicopter (branch) doesn't come down to pick up D specifically, D falls.

Key insight: Parent pointers only go backward. B doesn't know D exists.

Part 6: Hands-On Exploration

Commands to Inspect Git Objects

# See what type an object is
git cat-file -t abc123
# Output: commit, tree, or blob (or tag, for an annotated tag)

# See the content of an object
git cat-file -p abc123
# Output: The actual content

Example: Trace a Commit to Its Files

# 1. Get the latest commit hash
git rev-parse HEAD
# Output: abc123def456...

# 2. See the commit object
git cat-file -p abc123
# Output:
# tree 111aaa222bbb...
# parent 333ccc444ddd...
# author Piyush <...>
# ... message ...

# 3. See the tree (directory snapshot)
git cat-file -p 111aaa
# Output:
# 100644 blob 555eee666fff    login.html
# 100644 blob 777ggg888hhh    style.css
# 040000 tree 999iii000jjj    src

# 4. See a blob (file content)
git cat-file -p 555eee
# Output: The actual HTML content!

The Reflog: Your Safety Net

Even if you lose a commit, Git remembers where HEAD has been:

git reflog
# Output:
# abc123 HEAD@{0}: commit: feat: add login
# def456 HEAD@{1}: checkout: moving from main to feature
# 789abc HEAD@{2}: commit: fix: typo

Recover a lost commit:

# Find it in reflog
git reflog

# Create a branch to save it
git branch rescue-branch abc123

[!TIP]
By default, reflog entries expire after 30 days for unreachable commits and 90 days for reachable ones, so recover lost work sooner rather than later.
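The reflog itself is just another text file, .git/logs/HEAD, with one line per HEAD movement: the old ID, the new ID, who did it and when, then a tab and the message. A small Python sketch can pull the IDs out. The function name is invented, and the sample line uses made-up hashes for illustration.

```python
def parse_reflog_line(line: str) -> dict[str, str]:
    """Split one line of .git/logs/HEAD into old ID, new ID, and message."""
    meta, _, message = line.rstrip("\n").partition("\t")
    old, new, _identity = meta.split(" ", 2)  # identity = name, email, timestamp, zone
    return {"old": old, "new": new, "message": message}


if __name__ == "__main__":
    # Hypothetical entry; real IDs are 40 hex characters like these.
    sample = (
        "7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b "
        "5d3c8f2a4b1e0f3d2c1a0b9e8d7c6f5a4b3c2d1e "
        "Piyush <piyush@example.com> 1706712000 +0530\t"
        "commit: feat: add login\n"
    )
    entry = parse_reflog_line(sample)
    print(entry["new"], entry["message"])
```

The "new" ID of the entry you care about is what you would pass to git branch to rescue the commit.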

Part 7: The Mental Model Summary

After all this, here's the simple truth:

Git Is a Content-Addressable Filesystem

Concept	Reality
Repository	A folder called `.git` with a key-value database
Blob	File content, addressed by SHA-1 hash
Tree	Directory listing, addressed by SHA-1 hash
Commit	Metadata pointing to a tree + parent
Branch	A text file containing a commit hash
HEAD	A text file saying which branch you're on
`git add`	Create blob, update index
`git commit`	Create tree + commit, update branch pointer

Visual Summary

┌─────────────────────────────────────────────────────────────────────────────┐
│                           GIT'S ARCHITECTURE                                │
└─────────────────────────────────────────────────────────────────────────────┘

                    You (HEAD)
                        │
                        ▼
                   ┌─────────┐
                   │  main   │  ← Branch (text file)
                   └────┬────┘
                        │ (contains hash)
                        ▼
                   ┌─────────┐
                   │ Commit  │  ← Commit object
                   │ abc123  │
                   └────┬────┘
                        │ (tree pointer)
                        ▼
                   ┌─────────┐
                   │  Tree   │  ← Root directory
                   │ 111aaa  │
                   └────┬────┘
                        │ (blob and tree pointers)
            ┌───────────┼───────────┐
            ▼           ▼           ▼
       ┌────────┐  ┌────────┐  ┌────────┐
       │  Blob  │  │  Blob  │  │  Tree  │
       │ file1  │  │ file2  │  │  src/  │
       └────────┘  └────────┘  └────────┘

Conclusion: Why This Knowledge Matters

Understanding Git's internals transforms you from a command-memorizer to a confident user:

Before	After
"I ran reset and lost my work!"	"I know committed work stays in the reflog for at least 30 days by default"
"Detached HEAD is scary"	"Just means HEAD points to hash, not branch"
"Branches are expensive to create"	"They're just 41-byte text files"
"Git is mysterious"	"Git is a key-value store with pointers"

Your Next Steps

Explore your own .git folder — Run the commands from Part 6
Create a throwaway repo and experiment — Break things on purpose
Read the hash — When you see error messages with hashes, you now know what they mean

The key insight: Every complex Git operation (rebase, cherry-pick, reset) is just manipulating objects and pointers. Once you see the database, the commands become obvious.

You now have a working mental model of what Git does under the hood. Use this power wisely. 🚀

Have questions? Found this helpful? Let me know in the comments below!
