Clean up UTF-8 files

This command:

iconv -f utf-8 -t utf-8 -c file.txt

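writes a cleaned copy of file.txt to standard output, skipping any invalid byte sequences. It does not modify file.txt itself, so redirect the output to a new file (e.g. > clean.txt) to keep the result.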

-f sets the source encoding, -t sets the target encoding, and -c omits characters that cannot be converted, which here means any invalid UTF-8 sequence.
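A minimal sketch of cleaning file.txt in place, assuming GNU (glibc) iconv and coreutils mktemp. The sample input and the temporary-file step are illustrative additions: the output goes to a temporary file first because redirecting straight back into file.txt would truncate it before iconv reads it.

```shell
# Illustrative input: valid UTF-8 ("café") plus one invalid byte (0xFF).
printf 'caf\303\251 \377ok\n' > file.txt

# Convert into a temporary file; "> file.txt" here would empty the input first.
tmp=$(mktemp)
# Exit status is deliberately not checked: glibc iconv can return non-zero
# even with -c when it had to drop invalid input.
iconv -f utf-8 -t utf-8 -c file.txt > "$tmp"

# Copy back with cat so file.txt keeps its original permissions, then clean up.
cat "$tmp" > file.txt
rm -f "$tmp"

cat file.txt
```

Once the file is clean, the same command without -c (iconv -f utf-8 -t utf-8 file.txt > /dev/null) should succeed silently, which is a quick way to check the result.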

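You can also use the "strings" utility (GNU binutils, standard on Linux), but it is much cruder: by default it prints only runs of at least 4 printable ASCII characters, each on its own line, so it also discards valid non-ASCII UTF-8 text.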
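To see how much strings discards, here is a small sketch assuming GNU binutils strings with its defaults (7-bit ASCII, minimum run length 4); sample.txt and its contents are illustrative. Both the invalid byte 0xFF and the valid "é" break the text, and the leftover "caf" is dropped because it is shorter than 4 characters.

```shell
# Illustrative input: ASCII text, an invalid byte (0xFF), then valid UTF-8 "café au lait".
printf 'hello world\377caf\303\251 au lait\n' > sample.txt

# Default GNU strings: each run of 4+ printable ASCII characters on its own line.
strings sample.txt
# Prints two lines: "hello world" and " au lait"
```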