UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0
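The file you’re reading isn’t UTF-8 encoded, but Python is decoding it as UTF-8, either because you passed encoding="utf-8" or because UTF-8 is your platform’s default. The byte 0xFF can never appear in valid UTF-8. When it is the very first byte, the file often starts with a UTF-16 little-endian byte order mark (FF FE), as written by Windows tools that save “Unicode” text. It can also mean the file isn’t text at all: JPEG images, for example, begin with FF D8.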
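To see what the file actually starts with, you can peek at its first few bytes and compare them against the byte order marks defined in the standard `codecs` module. This is a minimal sketch; `sniff_bom` is a hypothetical helper name, and getting `None` back proves nothing, because most UTF-8 and cp1252 files have no BOM at all.

```python
import codecs
from typing import Optional

# Known byte order marks and the codec that understands each one.
# The UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE),
# so the UTF-32 entries must be checked first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(raw: bytes) -> Optional[str]:
    """Return a codec name if raw starts with a known BOM, else None."""
    for bom, codec in _BOMS:
        if raw.startswith(bom):
            return codec
    return None


# Usage (assumes data.csv exists):
#   with open("data.csv", "rb") as f:
#       print(sniff_bom(f.read(4)))  # the longest BOM is 4 bytes
```

The returned names are ordinary Python codecs: `utf-16` and `utf-32` read the BOM to pick the byte order and strip it, and `utf-8-sig` strips a UTF-8 BOM, so the result can be passed straight to `open(..., encoding=...)`.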
Fix 1: Specify the correct encoding
# ❌ Uses the locale's default encoding (UTF-8 on most Linux and macOS systems)
with open("data.csv") as f:
    content = f.read()
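# ✅ Try latin-1 (ISO-8859-1, covers most Western European text)
# Note: latin-1 maps every byte to a character, so it never raises,
# but the text comes out garbled if the file is really in another encoding
with open("data.csv", encoding="latin-1") as f:
    content = f.read()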
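# ✅ Or Windows-1252, the legacy default on Western-language Windows
with open("data.csv", encoding="cp1252") as f:
    content = f.read()
# ✅ Or UTF-16, if the file starts with the bytes FF FE
with open("data.csv", encoding="utf-16") as f:
    content = f.read()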
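If you don't know which of these encodings applies, one common approach is to try a short list in order and stop at the first one that decodes without error. This is a sketch under stated assumptions: `decode_with_fallback` is a hypothetical helper, the candidate order is a judgment call, and a clean decode only means no error was raised, not that the text is correct. latin-1 goes last because it accepts every byte, so it acts as a catch-all.

```python
def decode_with_fallback(raw: bytes, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each encoding in order; return (text, encoding) for the first that works."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # these bytes are invalid in enc, try the next candidate
    raise ValueError(f"none of {encodings} could decode the data")


# Usage (assumes data.csv exists):
#   with open("data.csv", "rb") as f:
#       text, used = decode_with_fallback(f.read())
```

Decoding bytes directly skips the newline translation that text-mode `open()` performs, so Windows `\r\n` line endings are kept as-is. Note also that a UTF-16 file would "succeed" as cp1252 here and come out as mojibake, which is why checking for a BOM first is worthwhile.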
Fix 2: Detect the encoding
pip install chardet
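import chardet
# Read the raw bytes so nothing gets decoded yet
with open("data.csv", "rb") as f:
    raw = f.read()
result = chardet.detect(raw)
print(result)  # e.g. {'encoding': 'ISO-8859-1', 'confidence': 0.73, ...}
# detect() reports None as the encoding when it can't make a guess
encoding = result["encoding"] or "latin-1"
with open("data.csv", encoding=encoding) as f:
    content = f.read()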
Fix 3: Ignore or replace bad characters
# Skip bad characters
with open("data.csv", encoding="utf-8", errors="ignore") as f:
content = f.read()
# Replace bad characters with ?
with open("data.csv", encoding="utf-8", errors="replace") as f:
content = f.read()
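To compare what each handler does to the same input, you can decode a small sample in memory. The sample `b"caf\xe9 ok"` is made up for illustration: it is "café ok" in cp1252, and its 0xE9 byte is invalid UTF-8 at that position. Besides `ignore` and `replace`, Python's standard error handlers also include `backslashreplace`, which keeps the bad byte visible as an escape sequence, and `surrogateescape`, which lets you encode the text back to the exact original bytes.

```python
raw = b"caf\xe9 ok"  # "café ok" in cp1252; 0xE9 is not valid UTF-8 here

# Decode the same bytes once per error handler
results = {
    handler: raw.decode("utf-8", errors=handler)
    for handler in ("ignore", "replace", "backslashreplace", "surrogateescape")
}

for handler, text in results.items():
    print(f"{handler:>16}: {text!r}")

# surrogateescape is reversible: encoding with the same handler restores the bytes
assert results["surrogateescape"].encode("utf-8", errors="surrogateescape") == raw
```

`ignore` and `replace` both lose the original byte, so prefer `backslashreplace` when you want to see what went wrong, or `surrogateescape` when you need to write the data back out unchanged.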
Fix 4: Read as binary
# If you don't need text (e.g., images, PDFs)
with open("file.bin", "rb") as f:
    data = f.read()