lxml.etree.XMLSyntaxError: EntityRef: expecting ';': what it means and how to fix it
Troubleshoot lxml.etree.XMLSyntaxError: EntityRef: expecting ';' and fix invalid XML locally (no upload).
What the error means
lxml.etree.XMLSyntaxError: EntityRef: expecting ';' means the XML parser found invalid markup. Most XML issues are fixable by correcting the first broken tag/character and re-validating.
Most common real-world causes
- A raw '&' appears in text (needs escaping as &), or an entity is incomplete.
- Mismatched tags (opening/closing tags do not match).
- Unescaped reserved characters (especially & in text).
- Multiple root elements (XML must have exactly one root).
- Invalid characters (control chars, bad Unicode) inside the file.
- Wrong encoding or a stray BOM/prolog mismatch.
- Cut/truncated XML (missing closing tags).
Fast debugging steps
- Search for '&' in the failing area and replace '&' with '&' when it is plain text.
- Validate the first error first. Fixing the first broken tag usually resolves many downstream errors.
- Search around the reported line/column for mismatched tags or a raw '&' that should be '&'.
- Confirm the XML starts with a single root and ends with the matching closing tag.
- If the file comes from a feed/export, try a smaller sample to isolate the first invalid record.
Code example (lxml)
from lxml import etree
try:
# Use bytes if you're unsure about encoding
root = etree.fromstring(xml_bytes)
print(root.tag)
except etree.XMLSyntaxError as e:
print('XMLSyntaxError:', e)
# e.error_log can include line/column details
print('error_log:', e.error_log)
Fix without uploading data
If the XML contains sensitive data, validate and convert locally. No Upload Tools runs 100% in your browser.
- XML to JSON to trigger strict parsing and see the first parser error.
- JSON Validator to validate output if you post-process to JSON.
Workflow: validate -> fix the first error -> validate again -> export/convert.
FAQ
Why do I get different errors in different tools? XML parsers differ. Always fix the first error in the raw input and re-check.
Should I regex-fix XML? Avoid blind regex edits. Validate after each change so you know what you fixed.
Related by intent
Closest pages and hubs to accelerate crawl discovery and first impressions.