UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: unexpected end of data: what it means and how to fix it
TL;DR: Validate locally, fix the first real error, validate again (no upload).
Fix UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: unexpected end of data by decoding safely and locally (no upload).
What the error means
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: unexpected end of data means a decoder rejected the input as invalid encoding. The fastest path is to identify what format you have, normalize it, then decode again.
Most common real-world causes
- A UTF-8 BOM (byte order mark) or invisible leading characters were introduced at the start.
- The bytes are not actually UTF-8 (file is UTF-16/Windows-1251/Latin-1, or it's binary data).
- The input is not actually encoded in the expected format (Base64 vs Base64URL vs plain text).
- You copied only part of the string (truncated token/payload).
- Whitespace/newlines were introduced during copy/paste.
- Wrong character set: URL-safe Base64 uses '-' and '_' instead of '+' and '/'.
- You decoded using the wrong function (decodeURIComponent on non-URL-encoded data, atob on non-Base64).
Fast debugging steps
- If you control the source, re-export as UTF-8 without BOM; otherwise decode with the right codec.
- Inspect the first bytes (hex) and try decoding with the correct encoding (utf-8-sig can strip BOM).
- Confirm what you are decoding (URL encoding, Base64, Base64URL, JWT).
- Trim whitespace and remove line breaks before decoding.
- If it's a JWT, ensure it has 3 dot-separated parts (header.payload.signature).
- If it's Base64URL, convert '-' -> '+' and '_' -> '/' and add padding if needed.
Code example (python)
# Python: handle UTF-8 BOM / wrong encoding safely
from pathlib import Path
p = Path('input.txt')
raw = p.read_bytes()
print('first bytes:', raw[:12])
# Try UTF-8 with BOM stripping
text = raw.decode('utf-8-sig', errors='strict')
print(text[:200])
# If that still fails, the file is not UTF-8. Try the correct encoding (example):
# text = raw.decode('cp1251', errors='strict')
# print(text[:200])
Fix without uploading data
Encoded strings often contain secrets (tokens, IDs). Decode locally and share only redacted snippets.
- URL Encode/Decode for percent-encoding.
- Base64 Encode/Decode for Base64/Base64URL payloads.
- JWT Decoder to inspect header/payload without uploads.
FAQ
Is Base64 the same as Base64URL? No. Base64URL uses '-' and '_' and often omits padding. Normalize before decoding.
Does decoding a JWT verify it? No. Decoding shows claims; verification requires the signing key.
Related tools
Related guides
Privacy & Security
All processing happens locally in your browser. Files are never uploaded.