TLDR: Convert your problem file with Sublime Text by opening the file and choosing “Save with Encoding” > UTF-8.
Alternatively, use
iconv -t UTF-8//TRANSLIT -c Zip_Zhvi_SingleFamilyResidence.csv > new_file.csv
When does this error happen?
I wanted to parse the housing data that Zillow publishes on their research page. Zip code is a great level of detail for tracking single-family home values.
However, when I downloaded this data set as “Zip_Zhvi_SingleFamilyResidence.csv”, I could not simply load it into pandas.
This last line seemed like the clue:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 4: invalid continuation byte
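To see where a message like this comes from, here is a minimal sketch in plain Python. The sample bytes are made up for illustration and are not taken from the Zillow file: 0xf1 is “ñ” in Latin-1, but in UTF-8 it announces a multi-byte character, so the decoder chokes on whatever ordinary byte follows it.

```python
# Hypothetical sample: "Cañon" encoded as Latin-1 (not bytes from the real CSV).
raw = "Ca\u00f1on".encode("latin-1")   # b'Ca\xf1on'


def try_decode(data, encoding):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        return err


utf8_result = try_decode(raw, "utf-8")      # fails on byte 0xf1
latin1_result = try_decode(raw, "latin-1")  # decodes cleanly

print(utf8_result)
print(latin1_result)
```

The same bytes are fine or broken depending only on which encoding you assume when reading them.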
Well, what format is that file?
Using a Mac, we can use
file -I <file_name>
Oh, great! It’s “us-ascii”, so we just pass that as the encoding
Oh, maybe I need to specify the encoding I want. WHY PANDAS, WHY!?
Why does this error happen?
The file contains at least one byte that is not valid UTF-8 (0xf1 in the error above), maybe because you accidentally
opened the file in Excel before opening ipython, or because Zillow saves in a crazy format.
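If you want to know which lines are actually causing trouble, a rough sketch like the one below can help. The function name find_bad_bytes and its parameters are my own invention for illustration; it simply reads the file as raw bytes and reports the first undecodable byte on each offending line.

```python
def find_bad_bytes(path, encoding="utf-8", limit=5):
    """Return up to `limit` (line_number, offset, byte) tuples for lines that fail to decode."""
    problems = []
    with open(path, "rb") as handle:  # raw bytes, nothing is decoded by open()
        for line_number, raw_line in enumerate(handle, start=1):
            try:
                raw_line.decode(encoding)
            except UnicodeDecodeError as err:
                # err.start is the offset of the first bad byte within this line
                problems.append((line_number, err.start, raw_line[err.start]))
                if len(problems) >= limit:
                    break
    return problems
```

Calling find_bad_bytes("Zip_Zhvi_SingleFamilyResidence.csv") should list the first few offending lines, which is usually enough to guess what the original encoding was. A stray non-ASCII byte deep in a large file is easy to miss by eye.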
Awesome, let’s just convert it
Let’s use the *nix program iconv to convert the file. According to the man page (man iconv), “The iconv program converts text from one encoding to another encoding.” Great! Let’s use this.
iconv -f us-ascii -t utf-8 < Zip_Zhvi_SingleFamilyResidence.csv > new_zip_code_file.csv
iconv, that’s your only job… you know, unix philosophy, one program, one job done well etc etc.
It turns out that if you append “//TRANSLIT” to the target encoding, characters are transliterated when needed and
possible (man page). The -c flag discards any characters that still cannot be converted.
Solution 1 – iconv
> iconv -t UTF-8//TRANSLIT -c Zip_Zhvi_SingleFamilyResidence.csv > new_file.csv
> mv new_file.csv Zip_Zhvi_SingleFamilyResidence.csv
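If iconv is not available (say, on Windows), roughly the same conversion can be sketched in Python. This is not a drop-in replacement for the iconv command above: the function name convert_to_utf8 is hypothetical, and the default source encoding of latin-1 is only a guess on my part. Latin-1 can decode any byte, so this never fails, but the characters will come out wrong if the guess is wrong.

```python
def convert_to_utf8(src_path, dst_path, source_encoding="latin-1"):
    """Re-encode a text file as UTF-8, reading it with an assumed source encoding."""
    # newline="" leaves line endings exactly as they are in the source file.
    with open(src_path, "r", encoding=source_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:  # stream line by line so large files stay cheap
            dst.write(line)
```

For example, convert_to_utf8("Zip_Zhvi_SingleFamilyResidence.csv", "new_file.csv") would write a UTF-8 copy next to the original.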
Solution 2 (easier to remember) – Sublime Text
Is there a better free editor than Sublime? Be a good citizen and buy your license.
Step 1: Open your file in Sublime Text
Step 2: Save with Encoding > UTF-8
read_csv to your heart’s desire 🙂
ipython> data = pd.read_csv("new_file.csv")
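If you would rather not convert the file at all, read_csv also accepts an encoding argument. Here is a small sketch that tries UTF-8 first and falls back to another encoding. The helper name read_csv_with_fallback is made up, and latin-1 as the fallback is an assumption about the file, not something I have confirmed for the Zillow data.

```python
import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Try each encoding in turn and return (DataFrame, encoding_used)."""
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding), encoding
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next encoding
    if last_error is None:
        raise ValueError("no encodings were given")
    raise last_error
```

Usage would look like data, used = read_csv_with_fallback("Zip_Zhvi_SingleFamilyResidence.csv"). Check a few rows with accented names afterwards, since a wrong fallback encoding loads without error but garbles those characters.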