UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: unexpected end of data: what it means and how to fix it

TL;DR: Validate locally, fix the first real error, validate again (no upload).

Fix UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: unexpected end of data by decoding safely and locally (no upload).

What the error means

UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: unexpected end of data means a decoder rejected the input as invalid encoding. The fastest path is to identify what format you have, normalize it, then decode again.

Most common real-world causes

  • A UTF-8 BOM (byte order mark) or invisible leading characters were introduced at the start.
  • The bytes are not actually UTF-8 (file is UTF-16/Windows-1251/Latin-1, or it's binary data).
  • The input is not actually encoded in the expected format (Base64 vs Base64URL vs plain text).
  • You copied only part of the string (truncated token/payload).
  • Whitespace/newlines were introduced during copy/paste.
  • Wrong character set: URL-safe Base64 uses '-' and '_' instead of '+' and '/'.
  • You decoded using the wrong function (decodeURIComponent on non-URL-encoded data, atob on non-Base64).

Fast debugging steps

  • If you control the source, re-export as UTF-8 without BOM; otherwise decode with the right codec.
  • Inspect the first bytes (hex) and try decoding with the correct encoding (utf-8-sig can strip BOM).
  • Confirm what you are decoding (URL encoding, Base64, Base64URL, JWT).
  • Trim whitespace and remove line breaks before decoding.
  • If it's a JWT, ensure it has 3 dot-separated parts (header.payload.signature).
  • If it's Base64URL, convert '-' -> '+' and '_' -> '/' and add padding if needed.

Code example (python)

# Python: handle UTF-8 BOM / wrong encoding safely
from pathlib import Path

p = Path('input.txt')
raw = p.read_bytes()
print('first bytes:', raw[:12])

# Try UTF-8 with BOM stripping
text = raw.decode('utf-8-sig', errors='strict')
print(text[:200])

# If that still fails, the file is not UTF-8. Try the correct encoding (example):
# text = raw.decode('cp1251', errors='strict')
# print(text[:200])

Fix without uploading data

Encoded strings often contain secrets (tokens, IDs). Decode locally and share only redacted snippets.

FAQ

Is Base64 the same as Base64URL? No. Base64URL uses '-' and '_' and often omits padding. Normalize before decoding.

Does decoding a JWT verify it? No. Decoding shows claims; verification requires the signing key.

Privacy & Security
All processing happens locally in your browser. Files are never uploaded.