Thread: UTF-8 Encoding for Text Files

  1. #1
    IV Newbie
    Join Date
    Jul 2018
    Posts
    1

    UTF-8 Encoding for Text Files

    When IrfanView opens a text file, such as a list of file names for use in a slideshow, it parses the file incorrectly if the file contains UTF-8 characters but was not saved as UTF-8 with a BOM (byte order mark). As a result, any listed file whose name contains Unicode characters cannot be opened.

    A workaround is to open the list in an editor such as Notepad++, change the encoding from UTF-8 to UTF-8 with BOM, and save the file.

    The failure is cryptic, so people are likely to report it as a Unicode bug, unopenable files, and so on.

    You can avoid this by detecting the encoding of the text file from its contents. Many programs do this routinely, so there should be existing code on GitHub that you can adapt for your next update.
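
    To illustrate the kind of detection meant here, the following is a minimal Python sketch, not IrfanView's actual code. The function names and the cp1252 fallback are assumptions made for the example: it checks for a BOM first, then tries a strict UTF-8 decode, and only then falls back to an ANSI code page.

```python
import codecs


def detect_text_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Guess the encoding of a list file from its raw bytes.

    Hypothetical helper: the fallback code page is an assumption and
    would normally be the system's ANSI code page.
    """
    # A BOM, when present, identifies the encoding directly.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and strips the BOM
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"     # decodes using the BOM and strips it

    # No BOM: bytes that decode as strict UTF-8 are treated as UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback


def read_file_list(data: bytes) -> list:
    """Decode a list file and return its non-empty lines."""
    text = data.decode(detect_text_encoding(data))
    return [line for line in text.splitlines() if line.strip()]
```

    With this approach, a list saved as plain UTF-8 (no BOM) and one saved as UTF-8 with BOM would both yield the same file names. The sketch does not try to recognise UTF-16 or UTF-32 without a BOM.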
    Last edited by Bhikkhu Pesala; 05.07.2018 at 06:57 PM. Reason: Fixed thread title

  2. #2
    Moderator Enterprise User Bhikkhu Pesala's Avatar
    Join Date
    May 2007
    Location
    East London
    Posts
    5,784
    Version
    IrfanView 4.51
    OS
    64-bit Win 10
    CPU Cores
    1

    If you save a slideshow text file from IrfanView that includes Unicode characters, the file will look like this in Notepad or Notepad2:

    ; UNICODE FILE - edit with care ;-)


    C:\TEMP\IrfanView 4.51 32-bit\UTF-8 Filenāme.jpg

    If you edit text files in other apps, it is up to you to make sure they stay compatible with the encoding used by IrfanView.

    I won't forward this report, as I am sure that Irfan Skiljan will just say that it is not a bug in IrfanView. If you disagree, read the sticky thread and submit your own bug report, then report back to tell us what he said.

  3. #3
    Power User j7n's Avatar
    Join Date
    Jun 2006
    Location
    Cyberspace
    Posts
    515
    Version
    IrfanView 4.51
    OS
    32-bit Win Server 2003 SP1
    CPU Cores
    1

    A slideshow list saved from IrfanView is a 16-bit Unicode (UTF-16) file with a byte order mark (BOM). The first two bytes are FF FE. They must be preserved while editing, or IrfanView will not load the list back in. Without the BOM, it loads a list as ANSI by default, or loads nothing if the list contains null bytes, as a normal 16-bit Unicode file does.
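
    As a rough illustration of that byte layout, here is a small Python sketch. The helper names are made up for the example, and it only shows the FF FE marker and the UTF-16 little-endian body, not IrfanView's full list format.

```python
import codecs


def encode_list_utf16_bom(lines):
    """Encode file names as UTF-16 LE, prefixed with the FF FE BOM."""
    body = "\r\n".join(lines) + "\r\n"
    return codecs.BOM_UTF16_LE + body.encode("utf-16-le")


def has_utf16_le_bom(data: bytes) -> bool:
    """Return True if the data starts with the bytes FF FE."""
    return data[:2] == b"\xff\xfe"


data = encode_list_utf16_bom(["C:\\TEMP\\Filenāme.jpg"])
print(has_utf16_le_bom(data))      # BOM present
print(has_utf16_le_bom(data[2:]))  # BOM stripped by a careless edit
print(b"\x00" in data[2:])         # ASCII letters carry a null high byte
```

    The last line shows where the null bytes come from: every ASCII character in a UTF-16 file is stored as two bytes, one of which is zero.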

    Could Irfan reliably distinguish an ANSI Windows code page from UTF-8 if only one or two special characters appear in the list? Filenames are usually kept ANSI-safe for compatibility with old programs, but sometimes a special character slips in.
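
    One way to explore that question is the Python experiment below. It is only a sketch: cp1252 stands in for "the ANSI code page" and the file name is invented. The idea is that a single accented byte from cp1252 followed by ASCII is not valid UTF-8, so a strict UTF-8 check usually separates the two even with one special character, although rare byte pairs remain ambiguous.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


ansi_name = "café.jpg".encode("cp1252")  # b"caf\xe9.jpg"
utf8_name = "café.jpg".encode("utf-8")   # b"caf\xc3\xa9.jpg"

print(looks_like_utf8(ansi_name))  # False: E9 needs continuation bytes
print(looks_like_utf8(utf8_name))  # True

# The reverse test is useless: cp1252 accepts the UTF-8 bytes and
# silently produces mojibake instead of raising an error.
print(utf8_name.decode("cp1252"))

# Rare ambiguous case: these two cp1252 characters happen to form
# a valid UTF-8 sequence (C3 A9).
print(looks_like_utf8("Ã©".encode("cp1252")))  # True
```

    So the check would have to try UTF-8 first and treat ANSI as the fallback, accepting that an unusual ANSI name could occasionally be misread.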

  4. #4
    Moderator Enterprise User Bhikkhu Pesala's Avatar
    Join Date
    May 2007
    Location
    East London
    Posts
    5,784
    Version
    IrfanView 4.51
    OS
    64-bit Win 10
    CPU Cores
    1
    Last edited by Bhikkhu Pesala; 07.07.2018 at 06:21 AM.
