DEV Community

Rost

Posted on • Originally published at glukhov.org

Converting Windows Text to Linux Format

Line ending inconsistencies between Windows and Linux systems cause formatting issues, Git warnings, and script failures.
This comprehensive guide covers detection, conversion, and prevention strategies.

Understanding Line Ending Differences

Operating systems use different conventions to mark the end of a line in text files, creating compatibility challenges in cross-platform development:

  • Windows: Carriage Return + Line Feed (\r\n or CRLF, hex 0D 0A)
  • Linux/Unix: Line Feed only (\n or LF, hex 0A)
  • Classic Mac OS: Carriage Return only (\r or CR, hex 0D)

This historical difference stems from typewriter mechanics. Windows inherited the CRLF convention from DOS, which maintained compatibility with teletype machines that required both a carriage return (move to line start) and line feed (advance paper).
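
To make the byte-level difference concrete, the following sketch writes the same two lines in each convention and dumps the raw bytes with od. The temporary directory and file names are illustrative assumptions, not part of any tool.

```shell
# Write identical text using each line-ending convention.
tmpdir=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$tmpdir/windows.txt"      # CRLF
printf 'one\ntwo\n'     > "$tmpdir/unix.txt"         # LF
printf 'one\rtwo\r'     > "$tmpdir/classic-mac.txt"  # CR

# Dump each file byte by byte; od -c prints CR as \r and LF as \n.
for f in "$tmpdir"/*.txt; do
    echo "== $(basename "$f")"
    od -c "$f"
done
```

The Windows file is two bytes longer than the Unix one: one extra carriage return per line.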

Common Problems Caused by Line Ending Mismatches

1. Script Execution Failures

Bash scripts with Windows line endings fail with cryptic errors:

bash: ./script.sh: /bin/bash^M: bad interpreter: No such file or directory

The ^M character (carriage return) becomes part of the shebang line, causing the interpreter lookup to fail.
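
The failure is easy to reproduce. The sketch below, which assumes a throwaway temporary directory and hypothetical file names, writes a two-line script with CRLF endings, shows the stray \r on the shebang line, and then strips it.

```shell
tmpdir=$(mktemp -d)

# A script saved with Windows line endings.
printf '#!/bin/bash\r\necho hello\r\n' > "$tmpdir/crlf.sh"

# The shebang line ends in \r \n, so the kernel looks for "/bin/bash\r".
head -n 1 "$tmpdir/crlf.sh" | od -c

# Removing the carriage returns yields a script that runs normally.
tr -d '\r' < "$tmpdir/crlf.sh" > "$tmpdir/fixed.sh"
bash "$tmpdir/fixed.sh"
```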

2. Git Warnings and Diff Noise

When committing Windows files to Git on Linux, you'll see:

warning: CRLF will be replaced by LF in file.txt.
The file will have its original line endings in your working directory

Git diffs may show entire files as changed when only line endings differ, obscuring actual code changes.

3. Visual Artifacts in Editors

Linux text editors that don't auto-detect line endings display ^M characters at line ends, making files difficult to read and edit. This is especially problematic in Hugo markdown files where it can break frontmatter parsing.

4. Data Processing Issues

Scripts parsing text files may include carriage returns in extracted data, causing comparison failures and unexpected behavior in data pipelines.
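
A small illustration of the comparison failure, assuming a hypothetical key=value file named result.txt: the extracted value looks like ok on screen but carries an invisible trailing carriage return.

```shell
tmpdir=$(mktemp -d)
printf 'status=ok\r\n' > "$tmpdir/result.txt"

# Command substitution strips the trailing newline but keeps the \r.
value=$(cut -d= -f2 "$tmpdir/result.txt")

if [ "$value" = "ok" ]; then
    echo "match"
else
    echo "no match: value is ${#value} characters long"
fi

# Trim one trailing carriage return before comparing.
cr=$(printf '\r')
clean=${value%"$cr"}
if [ "$clean" = "ok" ]; then
    echo "match after trimming"
fi
```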

Detecting Windows Line Endings

Before converting files, identify which ones need conversion to avoid unnecessary modifications.

Method 1: Using the file Command

The most reliable detection method:

file content/post/my-post/index.md

Output examples:

# Windows line endings:
index.md: UTF-8 Unicode text, with CRLF line terminators

# Linux line endings:
index.md: UTF-8 Unicode text

# Mixed line endings (problematic):
index.md: UTF-8 Unicode text, with CRLF, LF line terminators

Method 2: Visual Inspection with cat

Display control characters:

cat -A filename.txt

Windows files show ^M$ at line ends, while Linux files show only $. The -A flag is a GNU extension; on macOS and BSD, cat -vet produces the same view.

Method 3: Using grep

Search for carriage returns:

grep -rl $'\r' content/post/2025/11/

This identifies all files containing CRLF in the specified directory.

Method 4: Hexdump Analysis

For detailed byte-level inspection:

hexdump -C filename.txt | head -n 20

Look for 0d 0a (CRLF) versus 0a (LF) sequences.
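
When a script needs a machine-readable answer rather than the wording of file's output, line endings can be counted directly. The classify_eol function below is a hypothetical helper, not a standard tool; it assumes a grep that treats the carriage return as an ordinary character.

```shell
# classify_eol FILE
# Prints one of: empty, lf, crlf, mixed.
classify_eol() {
    cr=$(printf '\r')
    # Lines that end in a carriage return (grep -c prints 0 when none match).
    crlf_lines=$(grep -c "${cr}\$" "$1" || true)
    # All lines, including a final line without a trailing newline.
    total_lines=$(grep -c '' "$1" || true)

    if [ "$total_lines" -eq 0 ]; then
        echo "empty"
    elif [ "$crlf_lines" -eq 0 ]; then
        echo "lf"
    elif [ "$crlf_lines" -eq "$total_lines" ]; then
        echo "crlf"
    else
        echo "mixed"
    fi
}

# Example: classify_eol content/post/my-post/index.md
```

One simplification to be aware of: a CRLF file whose last line has no terminator at all is reported as mixed, because that final line does not end in a carriage return.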

Converting Windows to Linux Format

Multiple tools provide reliable conversion with different trade-offs in availability, features, and performance.

Solution 1: dos2unix (Recommended)

The most robust and feature-rich solution specifically designed for line ending conversion.

Installation

# Ubuntu/Debian
sudo apt install dos2unix

# Red Hat/CentOS/Fedora (on current releases, dnf replaces yum)
sudo yum install dos2unix

# macOS (Homebrew)
brew install dos2unix

# Arch Linux
sudo pacman -S dos2unix

Basic Usage

# Convert single file (modifies in-place)
dos2unix filename.txt

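# Keep the original: write the converted text to a new file
# (dos2unix has no backup flag; -b means "keep the byte order mark")
dos2unix -n filename.txt filename.unix.txt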

# Convert multiple files
dos2unix file1.txt file2.txt file3.txt

# Convert with wildcards
dos2unix *.txt
dos2unix content/post/2025/11/*/index.md

Advanced Options

# Preview - report line-break counts without modifying the file
dos2unix -i filename.txt

# Keep modification timestamp
dos2unix -k filename.txt

# Force conversion of files detected as binary (skipped by default)
dos2unix -f filename.txt

# Recursive conversion
find . -name "*.md" -exec dos2unix {} \;

# Convert all markdown files in directory tree
find content/post -type f -name "*.md" -exec dos2unix {} \;
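
Before a recursive conversion, it can help to list the files that actually contain carriage returns, so untouched files keep their timestamps. A minimal sketch, assuming GNU grep (for -r, -I, and --include); list_cr_files is a hypothetical helper name.

```shell
# list_cr_files DIR GLOB
# Prints every non-binary file under DIR matching GLOB that contains a CR.
list_cr_files() {
    cr=$(printf '\r')
    # -r recurse, -l print names only, -I skip binary files.
    grep -rlI --include="$2" "$cr" "$1" || true
}

# Example: list_cr_files content/post '*.md'
```

With GNU grep, adding -Z and piping the result to xargs -0 dos2unix converts only those files while coping with spaces in names.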

Batch Processing Hugo Posts:

# Convert all index.md files in 2025 posts
# (in bash, ** only recurses after enabling globstar)
shopt -s globstar
dos2unix content/post/2025/**/index.md

# Convert all markdown files excluding specific directories
find content/post -name "*.md" ! -path "*/drafts/*" -exec dos2unix {} \;

Solution 2: sed Command

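sed is installed on every Unix system, so no extra packages are needed, though it is less efficient for large batches. The -i flag and the \r escape used below are extensions that GNU sed supports; other implementations may differ.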

# Convert single file
sed -i 's/\r$//' filename.txt

# Convert multiple files with loop
for file in content/post/2025/11/*/index.md; do 
    sed -i 's/\r$//' "$file"
done

# Convert with backup
sed -i.bak 's/\r$//' filename.txt

# Recursive with find
find . -name "*.txt" -exec sed -i 's/\r$//' {} \;

Important Notes

  • sed -i modifies files in-place
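  • On macOS (BSD sed), -i requires an explicit suffix argument and \r may not be recognized, so pass a literal carriage return: sed -i '' $'s/\r$//' filename.txt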
  • Creates temporary files during processing
  • Slower than dos2unix for large file sets
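
Because -i behaves differently in GNU and BSD sed, one portable alternative is to skip it and manage the temporary file explicitly. The strip_cr function below is a hypothetical sketch along those lines; it passes a literal carriage return to sed instead of relying on the \r escape.

```shell
# strip_cr FILE...
# Removes one trailing carriage return from every line of each file.
strip_cr() {
    cr=$(printf '\r')
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        # Write the converted text to a temp file, then copy it back so the
        # original file keeps its permissions.
        if sed "s/${cr}\$//" "$f" > "$tmp"; then
            cat "$tmp" > "$f"
        fi
        rm -f "$tmp"
    done
}

# Example: strip_cr filename.txt
```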

Solution 3: tr Command

Pipe-based approach useful in data processing workflows:

# Basic conversion (requires output redirection)
tr -d '\r' < input.txt > output.txt

# Process and convert in pipeline
cat input.txt | tr -d '\r' | process_data.sh

# Cannot modify in-place - use temp file
tr -d '\r' < input.txt > temp.txt && mv temp.txt input.txt

Advantages

  • Available on all Unix systems
  • Excellent for streaming data
  • Integrates well in pipes

Disadvantages

  • Cannot modify files in-place
  • Requires manual backup handling
  • Less convenient for batch operations
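
One behavioral difference is worth seeing side by side: tr -d '\r' deletes every carriage return in the stream, while the sed expression s/\r$// removes only one at the end of each line. The sample input below is made up for illustration.

```shell
cr=$(printf '\r')

# The input has a CR in the middle of the line and another before the newline.
printf 'a\rb\r\n' | tr -d '\r' | od -c          # both CRs removed
printf 'a\rb\r\n' | sed "s/${cr}\$//" | od -c   # only the trailing CR removed
```

For ordinary CRLF text files the two approaches give the same result.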

Solution 4: Using awk

Alternative for complex text processing:

awk '{sub(/\r$/,"")}1' input.txt > output.txt

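# Or treat CRLF as the record separator
# (a multi-character RS needs gawk or mawk; POSIX awk uses only the first character)
awk 'BEGIN{RS="\r\n"} {print}' input.txt > output.txt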

Comparison Table

Tool      In-place  Batch  Backup            Speed   Availability
dos2unix  Yes       Yes    New file via -n   Fast    Requires install
sed       Yes       Yes    Yes (-i.bak)      Medium  Built-in
tr        No        No     No                Fast    Built-in
awk       No        No     No                Medium  Built-in

Prevention Strategies

Preventing Windows line endings is more efficient than repeatedly converting files.

Git Configuration

Configure Git to automatically normalize line endings across platforms.

Option 1: Repository-level (.gitattributes)

Create .gitattributes in repository root:

# Auto detect text files and normalize to LF
* text=auto

# Explicitly declare text files
*.md text
*.txt text
*.sh text eol=lf
*.py text eol=lf
*.go text eol=lf
*.js text eol=lf
*.json text eol=lf

# Binary files
*.jpg binary
*.png binary
*.pdf binary

This ensures consistent line endings regardless of platform and prevents unnecessary conversions.
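
To confirm that the attributes resolve as intended, Git can report them per path. The following sketch builds a throwaway repository (the file names are illustrative) and uses git check-attr together with git ls-files --eol, which is available in recent Git versions.

```shell
repo=$(mktemp -d)
git -C "$repo" init -q

# Minimal attributes file plus a script saved with CRLF endings.
printf '* text=auto\n*.sh text eol=lf\n' > "$repo/.gitattributes"
printf '#!/bin/bash\r\necho hi\r\n' > "$repo/script.sh"
git -C "$repo" add .gitattributes script.sh 2>/dev/null

# Which attributes apply to this path?
git -C "$repo" check-attr text eol -- script.sh

# i/ shows line endings in the index, w/ those in the working tree.
git -C "$repo" ls-files --eol script.sh
```

In this setup the index entry is normalized to LF even though the working-tree copy still has CRLF.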

Option 2: Global User Configuration

Configure Git behavior for all repositories:

# Linux/macOS: Convert CRLF to LF on commit, leave LF unchanged
git config --global core.autocrlf input

# Windows: Convert LF to CRLF on checkout, CRLF to LF on commit
git config --global core.autocrlf true

# Disable auto-conversion (rely on .gitattributes only)
git config --global core.autocrlf false

Recommended Setup

  • Linux/macOS developers: core.autocrlf input
  • Windows developers: core.autocrlf true
  • All projects: Use .gitattributes for explicit control

Normalizing Existing Repository

If your repository already contains mixed line endings:

# Commit or stash pending work first, then re-apply the line-ending
# rules from .gitattributes to every tracked file (Git 2.16+)
git add --renormalize .

# Review which files changed
git status

# Commit the normalized files
git commit -m "Normalize line endings"

Editor Configuration

Configure text editors to use Unix line endings by default.

Visual Studio Code (settings.json)

{
  "files.eol": "\n",
  "files.encoding": "utf8",
  "files.insertFinalNewline": true,
  "files.trimTrailingWhitespace": true
}

Set per-language if needed:

{
  "[markdown]": {
    "files.eol": "\n"
  }
}

Vim/Neovim (.vimrc)

set fileformat=unix
set fileformats=unix,dos

Emacs (.emacs or init.el)

(setq-default buffer-file-coding-system 'utf-8-unix)

Sublime Text (Preferences.sublime-settings)

{
  "default_line_ending": "unix"
}

JetBrains IDEs (Settings → Editor → Code Style)

  • Line separator: Unix and macOS (\n)

EditorConfig

Create .editorconfig in project root for cross-editor compatibility:

root = true

[*]
end_of_line = lf
charset = utf-8
insert_final_newline = true
trim_trailing_whitespace = true

[*.md]
trim_trailing_whitespace = false

[*.{sh,bash}]
end_of_line = lf

[*.bat]
end_of_line = crlf

Many editors honor EditorConfig settings natively or through a plugin (VS Code, for example, needs the EditorConfig extension), which keeps line endings consistent across team members using different editors.

Automation and Scripting

Integrate line ending checks into development workflows to catch issues early.

Pre-commit Git Hook

Create .git/hooks/pre-commit:

#!/bin/bash
# Check for files with CRLF line endings

CRLF_FILES=""

# Read names line by line so paths containing spaces stay intact
while IFS= read -r FILE; do
    if file "$FILE" | grep -q "CRLF"; then
        CRLF_FILES="$CRLF_FILES\n  $FILE"
    fi
done < <(git diff --cached --name-only --diff-filter=ACM)

if [ -n "$CRLF_FILES" ]; then
    echo "Error: The following files have Windows line endings (CRLF):"
    echo -e "$CRLF_FILES"
    echo ""
    echo "Convert them using: dos2unix <filename>"
    echo "Or configure your editor to use Unix line endings (LF)"
    exit 1
fi

exit 0

Make executable:

chmod +x .git/hooks/pre-commit
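
A hook that runs file on the working-tree copy can disagree with what will actually be committed, for example when core.autocrlf or .gitattributes normalizes content on git add. As a hedged variation, the hypothetical staged_crlf_files function below inspects the staged blobs instead. It is a simplified sketch: it must run from the repository root and does not handle file names that contain newlines or characters Git quotes.

```shell
# staged_crlf_files
# Prints staged (added, copied, or modified) text files whose staged
# content has at least one line ending in CR.
staged_crlf_files() {
    cr=$(printf '\r')
    git diff --cached --name-only --diff-filter=ACM |
    while IFS= read -r f; do
        # ":path" names the blob in the index; -I ignores binary content.
        if git show ":$f" | grep -qI "${cr}\$"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Inside a pre-commit hook, a non-empty result from staged_crlf_files would be the signal to print the file list and exit with status 1.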

Continuous Integration Check

Add to CI pipeline (GitHub Actions example):

name: Check Line Endings

on: [push, pull_request]

jobs:
  check-line-endings:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Check for CRLF line endings
        run: |
          if git ls-files -z | xargs -0 file | grep CRLF; then
            echo "Error: Files with CRLF line endings detected"
            exit 1
          fi
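
An alternative check that does not depend on the wording of file's output is to let Git search the tracked files itself. This is a sketch, assuming it runs from the repository root; the function name check_no_cr is hypothetical. git grep -I skips binary files and -l prints only file names.

```shell
# check_no_cr
# Returns 1 and lists the offenders if any tracked text file in the
# working tree contains a carriage return.
check_no_cr() {
    cr=$(printf '\r')
    if git grep -I -l -e "$cr" -- .; then
        echo "Error: files with carriage returns detected" >&2
        return 1
    fi
    echo "No carriage returns found"
}
```

Because the function returns a non-zero status on failure, calling it as the last command of a CI step is enough to fail the job.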

Bulk Conversion Script

Create convert-line-endings.sh for project maintenance:

#!/bin/bash
# Convert all text files in project to Unix line endings

set -e

EXTENSIONS=("md" "txt" "sh" "py" "go" "js" "json" "yml" "yaml" "toml")

echo "Converting line endings to Unix format..."

for ext in "${EXTENSIONS[@]}"; do
    echo "Processing *.$ext files..."
    find . -name "*.$ext" ! -path "*/node_modules/*" ! -path "*/.git/*" \
        -exec dos2unix {} \; 2>/dev/null || true
done

echo "Conversion complete!"

Troubleshooting Common Issues

Issue 1: Script Still Fails After Conversion

Symptom: Bash script converted with dos2unix still shows interpreter errors.

Solution: Check file encoding and byte order mark (BOM):

# Check encoding
file -i script.sh

# Remove BOM if present
sed -i '1s/^\xEF\xBB\xBF//' script.sh

# Verify shebang line
head -n 1 script.sh | od -c
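
Before stripping anything, it may be worth confirming that a BOM is really there. The has_bom function below is a hypothetical helper that compares the first three bytes of a file with the UTF-8 BOM sequence EF BB BF.

```shell
# has_bom FILE
# Succeeds if FILE starts with the UTF-8 byte order mark.
has_bom() {
    # od prints the first three bytes as hex; tr removes spacing and newlines.
    first=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$first" = "efbbbf" ]
}

# Example: has_bom script.sh && echo "BOM found"
```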

Issue 2: Mixed Line Endings in Single File

Symptom: File shows both CRLF and LF endings.

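Solution: dos2unix converts every CRLF it finds and leaves existing LF endings alone, so a plain run normalizes a mixed file (the -f flag is only needed for files detected as binary):

dos2unix filename.txt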

Or use more aggressive sed:

# Delete every carriage return, including any that are not at the end of a line
sed -i 's/\r//g' filename.txt

Issue 3: Git Still Shows File as Modified

Symptom: After converting line endings, Git shows file as modified with no visible changes.

Solution: Refresh Git index:

git add -u
git status

# If still showing, check Git config
git config core.autocrlf

# Temporarily disable autocrlf
git config core.autocrlf false
git add -u

Issue 4: Hugo Build Fails After Conversion

Symptom: Hugo fails to parse frontmatter after line ending conversion.

Solution: Check for Unicode BOM and frontmatter syntax:

# Remove BOM from markdown files
find content -name "*.md" -exec sed -i '1s/^\xEF\xBB\xBF//' {} \;

# Verify YAML frontmatter
hugo --debug

Issue 5: dos2unix Not Available

Symptom: System doesn't have dos2unix and you can't install packages.

Solution: Use portable shell function:

dos2unix_portable() {
    sed -i.bak 's/\r$//' "$1" && rm "${1}.bak"
}

dos2unix_portable filename.txt

Special Cases for Hugo Sites

Hugo static sites have specific considerations for line endings, particularly in content files and configuration.

Converting Hugo Content

# Convert all markdown content files
find content -name "*.md" -exec dos2unix {} \;

# Convert configuration files
dos2unix config.toml config.yaml

# Convert i18n translation files
find i18n -name "*.yaml" -exec dos2unix {} \;

# Convert layout templates
find layouts -name "*.html" -exec dos2unix {} \;

Handling Frontmatter

YAML frontmatter is particularly sensitive to line ending issues. Ensure consistency:

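# Check frontmatter-containing files (in bash, ** needs globstar)
shopt -s globstar
for file in content/post/**/index.md; do
    # Strip the CR first; otherwise a CRLF first line ("---\r") never matches ^---$
    if head -n 1 "$file" | tr -d '\r' | grep -q "^---$"; then
        file "$file"
    fi
done | grep CRLF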

Hugo Build Scripts

Ensure build and deployment scripts use Unix line endings:

dos2unix deploy.sh build.sh
chmod +x deploy.sh build.sh

Performance Considerations

For large projects with thousands of files, conversion performance matters.

Benchmark Comparison

Converting 1000 markdown files:

# dos2unix: ~2 seconds
time find . -name "*.md" -exec dos2unix {} \;

# sed: ~8 seconds
time find . -name "*.md" -exec sed -i 's/\r$//' {} \;

# Parallel dos2unix: ~0.5 seconds
time find . -name "*.md" -print0 | xargs -0 -P 4 dos2unix

Parallel Processing

Use GNU Parallel or xargs for faster batch conversion:

# Using xargs with parallel execution
find . -name "*.md" -print0 | xargs -0 -P 8 dos2unix

# Using GNU Parallel
find . -name "*.md" | parallel -j 8 dos2unix {}

Cross-Platform Development Best Practices

Establish team conventions to prevent line ending issues from the start.

1. Repository Setup Checklist

  • [ ] Add .gitattributes with text file declarations
  • [ ] Set core.autocrlf in team documentation
  • [ ] Include .editorconfig in repository
  • [ ] Add pre-commit hooks for validation
  • [ ] Document line ending policy in README

2. Team Onboarding

New team members should configure:

# Clone repository
git clone <repository>
cd <repository>

# Configure Git
git config core.autocrlf input  # Linux/macOS
git config core.autocrlf true   # Windows

# Verify setup
git config --list | grep autocrlf
cat .gitattributes

3. Code Review Guidelines

  • Reject PRs with line ending-only changes
  • Use git diff --ignore-cr-at-eol for reviews
  • Enable line ending checks in CI/CD

4. Documentation

Include in project README:

## Line Ending Convention

This project uses Unix line endings (LF) for all text files.

**Setup:**

- Linux/macOS: git config core.autocrlf input
- Windows: git config core.autocrlf true

**Converting Files:**
dos2unix filename.txt

See .gitattributes for file-specific configurations.
