Unicode Filenames

    #16
    Nice idea to add some data. Here's a PDF with a description of the Unicode structure:



    And another one, an HTM file, with the code charts.

    0.6180339887
    Rest In Peace, Sam!



      #17
      A few small points, which I'll explain below:


      1. There's no need to rebuild the entire system from 8-bits to 16-bits.
      2. We really need global language support.
      3. A temporary solution.
      4. Quoted.


      1. There's no need to rebuild the entire system from 8-bits to 16-bits.

      Users with different locales can read files with local names in their own environments. I did some small experiments. I installed IrfanView on a Spanish Windows XP Pro and it reads almost all files with Western names. I installed it on a Chinese Windows XP, and it reads files with Japanese, Chinese and basic 128-character Western names, with some accents but not Ç and Ñ. OK, then I installed AppLocale* and created two shortcuts for IrfanView, one run through AppLocale and the normal one. That way I can "AppLocale" IrfanView under a Chinese environment and open files with Chinese and Japanese names. The problem is that, because I have to tell AppLocale explicitly when to open a program with its support, I can't set it as the double-click default. The point is, under a Spanish environment I can open a Chinese-named file just by using AppLocale. I figure the name only affects the way the file is opened or accessed.

      Of course a tool like Total Commander demands a lot of Unicode support, and of course a lot of coding: it's a file-management tool and works constantly with file names. But IrfanView only needs it to access the file buffer, that is, to open, to save and to rename, nothing else. Maybe thumbnails demand more support.

      2. We really need global language support.

      I can't keep asking my Chinese teacher to rename the files she gives me into a Western encoding. What's more, I worked in the language department at university, and there she was the one asking me for computer support. And of course it is not a professional answer to say "Errm, teacher, please rename all your files to a Western encoding because we don't support it." We can open her Chinese-named files: songs in Winamp, letters in Word, spreadsheets in Excel, and so on. And images? "Oh, please rename the hundred or more images you took on the day of the event into a Western encoding..." No, the answer turned into "let me find a program that supports Chinese in file names." Of course, I found one.

      We can't limit compatibility between systems. Even current Windows systems are able to show Japanese file names in the DOS prompt (provided you installed the right font). Linux supports UTF-8, I think the Macintosh does too, and so on.

      Some files must keep their original names. If I name a file zhang, nobody would know who or what it is; zhang needs extra accent characters like zhǎng or zhāng, and those tone marks ("accents") are also special characters... and it still does not reflect the content of the file.

      Some people like me may have tons of images, maybe 10 GB or more, and a lot of them have Chinese names. In my case I have about 6 GB of images with Eastern names. Of course I have no time to rename all of them, and I cannot use IrfanView's batch conversion to rename them, so I would have to search for a tool able to rename all of them... Hey!! If I have to spend my time searching for such a tool, then I'll search instead for an image viewer that supports files with special characters in their names. I'm talking about facts: there's no time to rename all those files, and I don't want to lose relevant information placed in the file name just to obey an obsolete rule of thumb...

      3. A temporary solution.

      I'm using AppLocale to force IrfanView to read Chinese and Japanese characters. I have no way to tell it to use one environment or another automatically, so I have to use Kujawiak Viewer as the double-click default application and place a link to IrfanView in my dock. But I'm tiring of this; I want Kujawiak to be as good as IrfanView is, or to see IrfanView support character encodings as Kujawiak does...

      4. Quoted

      Originally posted by j7n View Post
      I have absolutely no idea how Japanese or Chinese users work with a computer. I've seen crazy tools called IME for Windows and can't imagine typing everyday text using any other thing than simple PS/2 keyboard.
      IMEs are really simple tools these days; I use them every day and there's no problem with them. The best IMEs are almost as simple as a plain keyboard, and the best tool I've seen is the Google Pinyin IME. I mention it for two reasons. One, to clarify: Chinese, Japanese, Korean, Arabic... people write in their own character encodings. They don't use Western encodings, and of course they prefer (and even really need) to write some names using those encodings. For example, if I have to save a file for somebody named Zhang, I often need to use the correct character, because there are too many characters for the sound Zhang. The second reason: even something as big as Google is noticing the demand for Unicode support, and it's doing its part... then...

      * AppLocale is a tool for Windows XP that allows users to force a language environment onto a program. For example, you can open IrfanView or WinRAR and make the program "believe" it is running under a Korean environment. AppLocale works on a per-program basis: you open a program through it, and that program is affected by the environment "change" while the other programs keep working in the real language environment. What's more, you can open one IrfanView in your local environment, then open a second AppLocaled IrfanView, and then a third, AppLocaled into a different environment.

      A better applocale version is the piapplocale (papplocale).
      more cablop?
      http://cablop.net



        #18
        Thanks a lot for your written argumentation.
        A thorough investigation of the issues here; I've learned about the practical aspects involved.
        No wonder software like AppLocale has been developed to bridge the presentation. And I always like the elegance
        when such an app operates in a local environment instead of changing the whole system.
        But these are just tools to decrease the burden.
        I fully agree with:
        2. We really need global language support.

        ~ 1. There's no need to rebuild the entire system from 8-bits to 16-bits. ~
        I have my doubts, at least about obtaining smooth functioning; not so much in file naming, but in other areas.
        ASCII itself is 7 bits as a basis; the 8-bit extended sets add an area of special characters beyond 128.
        I can imagine a 16-bit set having a 'basic' set of 256, and an expanded set in the higher bits containing the 'special characters'.

        Another aspect I wonder about: Chinese characters are combinations of 'entities', not of letters.
        So if this should be represented on a screen by calling a database to produce the proper pixels, I suppose making such a character would imply combining several pixel patterns into a new one. Characters would have 'layers'.


          #19
          I think Chinese letters are just letters from the Unicode point of view. If a character can be combined from simpler shapes instead of being entirely redrawn, that can be done at the font level – in order to reduce the size of the font. The system would not care either way. It is also done for normal Latin characters with additional marks (äéņ, etc.), not just Far East ones.

          We really need global language support.
          Unicode support is already there where we need it most: in text processing, both typographical (Word) and plaintext (EmEditor), and in web publishing. Unicode works there even under Win98. It makes no sense to entirely rebuild the way computers see text. I'd compare the work needed to convert everything to 16 bits with teaching every person in the world a new universal language.

          In the past I've read about attempts to internationalize the Internet's domain name system. DNS records would be encoded similarly to HTML entities and could contain any crazy characters. Some agencies offered considerably reduced prices to register these domains. Complete BS.



            #20
            Good points. But it would be nice if IV could handle some aspects better.

            Maybe 'special characters' should be defined as those one shouldn't use because they're part of some syntax,
            like a plain '&' in an HTML source.


              #21
              +1.
              I just wanted to add this thread, but it's already here.
              Unicode name support is needed.



                #22
                About character encodings

                There are two approaches to it.

                One is to use an index like you said: a fixed 8 bits per character, 16 bits per character, and so on. This is the fixed-width 'Unicode' way.

                The second approach, like UTF-8, is better: Western characters use 8 bits, and other characters use more bits, the same way telephone numbers do. Local numbers have just 7 digits (8 in some locations), while long-distance numbers start with a unique identifier and use more digits. Same here.
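
                The telephone-number analogy can be checked in a couple of lines. A sketch in Python rather than IV's C++, just because it's self-contained; the sample characters are arbitrary:

```python
# In UTF-8, plain ASCII stays one byte per character, while characters
# further out in Unicode use progressively more bytes.
samples = {
    "A": 1,   # basic Latin: same single byte as plain ASCII
    "ñ": 2,   # Latin letter with diacritic
    "汉": 3,  # CJK ideograph
    "𝄞": 4,   # outside the Basic Multilingual Plane
}
for ch, expected in samples.items():
    encoded = ch.encode("utf-8")
    print(ch, "->", len(encoded), "bytes:", encoded.hex())
    assert len(encoded) == expected
```

                This is also why a plain ASCII file is already valid UTF-8: the single-byte range is unchanged.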

                You can test it using a plain file and a browser: just copy and paste some Western text into a txt file, save it normally, then open it in the browser and change the encoding of the view to UTF-8, and you'll notice the basic characters remain intact.

                Even better, this trick is useful when you have a badly formatted file in Chinese or Japanese and you need to recover the "garbage" characters. Place the text inside a txt file, open it in the browser, and keep changing the character encoding until you find the correct text...
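
                The same recovery trick can be done programmatically. A hedged sketch: the candidate list below is an assumption, and a successful decode doesn't guarantee the text is right, so eyeballing the result is still needed:

```python
# Try a list of candidate encodings until one decodes without error,
# mirroring the "keep switching the encoding" recovery trick.
def guess_text(raw: bytes, candidates=("utf-8", "gbk", "shift_jis", "cp1252")):
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    return None, None

raw = "汉字".encode("gbk")      # pretend these bytes arrived mislabeled
text, enc = guess_text(raw)
print(text, "recovered via", enc)
```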

                Characters are just a matter of bits, and there are some libraries available to handle them.

                About Chinese components: in the encoding, each character is just a character; that's simple. The radicals and strokes are only used for font display; we don't need to care much about that, just use the system font renderer and go ahead.

                Is IV programmed in C++?
                Last edited by cablop; 10.12.2007, 02:35 AM.


                  #23
                  Second approach, like utf8, is better
                  It does not remove the need to convert everything to Unicode. If you simply stick UTF-8 strings in place of ANSI strings, you'll break the correctly functioning special characters that make use of the eighth bit (128-255). It's not that you can currently have only 7-bit basic Latin.
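
                  The eighth-bit breakage is easy to demonstrate; a minimal sketch, with cp1252 standing in for the Western ANSI codepage:

```python
# "ñ" is the single byte 0xF1 under the Windows-1252 ANSI codepage,
# but that byte on its own is not valid UTF-8.
ansi = "ñ".encode("cp1252")
assert ansi == b"\xf1"

try:
    ansi.decode("utf-8")
except UnicodeDecodeError:
    print("0xF1 alone is not valid UTF-8")

# UTF-8 needs two bytes for the same character:
assert "ñ".encode("utf-8") == b"\xc3\xb1"
```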

                  Originally posted by cablop View Post
                  You can test it using a plain file and a browser, just copy and paste some western code in a txt file, and save it normally, then open it in a browser and change the encoding of this view to utf8 and you notice basic characters remains intact.

                  Even more, this trick is useful when you have a bad formated file in chinese or japanese and you need to recover the "garbage" characters. Place the code inside a txt file and open in the browser and start changing the character encoding to find the correct text...
                  I find the EmEditor 3 plaintext editor very useful for this purpose. It has more encodings available, including obscure DOS/OEM codepages and all Unicode variants (basically every NLS installed on the system). It's paid software though, if you care.

                  About chinese components, in encoding each character is just a character, that's simple. The radicals and strokes are just used for font displaying, we don't need to take much care about it, just use the system font renderer and go ahead.
                  I agree. Recently I read an article on Wikipedia about precomposed characters and got the impression that this normal way of treating special symbols is somehow considered old-fashioned. The modern way of doing things is to let the software combine the complex letter from its fragments. While separate 'diacritical marks' might indeed be useful in certain situations, such as when placing accents over arbitrary letters, I don't believe it's the right way to process everyday text.

                  Wikipedia claims that "The precomposed characters are included in the character set to aid computer systems with incomplete Unicode support, where decomposed equivalent characters may render incorrectly."

                  I wonder if it has anything to do with the fact that computer newbies usually see special characters as the base letter plus another typographical entity.
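
                  The precomposed/decomposed difference is visible directly; a sketch using Python's unicodedata module (the caron example mirrors the pinyin tone marks mentioned earlier in the thread):

```python
import unicodedata

# "ǎ" can be one precomposed code point (NFC) or a base letter plus a
# combining caron (NFD); both normalize back to the same NFC string.
precomposed = "ǎ"                                  # U+01CE
decomposed = unicodedata.normalize("NFD", precomposed)

assert len(precomposed) == 1
assert decomposed == "a\u030c" and len(decomposed) == 2
assert unicodedata.normalize("NFC", decomposed) == precomposed
```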
                  Last edited by j7n; 10.12.2007, 03:36 PM.



                    #24
                    You are right, 8-bit based characters change in UTF-8. That's right. But what I mean is that simple text things do not need to be changed.

                    And about the composition of characters by parts: we don't need to care about it. We just need to manage the names in UTF-8/Unicode.

                    I was reading my FAT32 and NTFS file systems in Linux. From that, I believe Windows XP is saving file names in UTF-8, which would mean it is not using 16-bit Unicode. If I don't use UTF-8, some characters, like ñ or ç, are rendered as another character, but I can still read the files. That makes me think that, regardless of how each character is represented, file management just takes care of the number: the system is working with bytes, not with characters. If we can read UTF-8, all we need is to be able to retrieve the string's BYTES and to rebuild the UTF-8 string from BYTES when needed. That way we just translate file names from UTF-8 to bytes when we read them internally in the program, and translate from bytes back to UTF-8 when the program interacts with the operating system, for example when renaming and searching... But it's just an idea.
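
                    The byte round-trip idea can be sketched like this (the file names are made up for the example):

```python
# Keep names as UTF-8 bytes internally; decode/encode only at the
# boundary with the operating system.
for name in ("zhǎng.jpg", "汉字.png", "año.jpg"):
    raw = name.encode("utf-8")          # what would sit on disk
    assert raw.decode("utf-8") == name  # lossless round trip

# Decoding the same bytes with a one-byte codepage reproduces the
# "ñ or ç rendered as another character" effect described above:
assert "año.jpg".encode("utf-8").decode("latin-1") == "aÃ±o.jpg"
```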


                      #25
                      The Windows interface supports ANSI strings and 16-bit wide characters. There are functions to convert between both using a codepage parameter. Most Windows functions that take strings as parameters or in structures are available in these two variants, marked with an A or a W at the end. That's it.

                      It's a lot of work to port quite old program code to Unicode. Filenames and paths are not just used to access files; they must also be presented in the user interface. As soon as you use one element in Unicode, you also have to change the other strings, because there are no functions allowing a mixture of parameter types.
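
                      As a rough analogy (in Python, not the actual Win32 calls): the A side works on bytes in some ANSI codepage, the W side on decoded wide strings, and a codepage parameter drives the conversion, much as MultiByteToWideChar does. The codepage choice below is an assumption:

```python
# Toy model of the A/W split described above; cp1252 stands in for the
# Western ANSI codepage.
def multi_byte_to_wide(raw: bytes, codepage: str = "cp1252") -> str:
    return raw.decode(codepage)

def wide_to_multi_byte(text: str, codepage: str = "cp1252") -> bytes:
    # Characters outside the codepage can't be represented, which is
    # exactly what an ANSI-only program runs into with CJK file names.
    return text.encode(codepage, errors="replace")

assert multi_byte_to_wide(b"ni\xf1o") == "niño"   # 0xF1 is ñ in cp1252
assert wide_to_multi_byte("汉") == b"?"           # lost on the A side
```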

                      Then there is the issue of plugins. They are not all under the control of IrfanView.

                      I'm not sure that there will be a Unicode version of IrfanView using the current code base. There are ways to fake the program out; maybe that will be the solution.

                      I would say: to get Unicode, rewrite IrfanView from scratch, taking the chance to clean up all the small oddities. There would then also be a chance to get a Linux or Mac version.



                        #26
                        I was reading my fat32 and ntfs file systems in linux. Doing that i know Windows XP is saving file names in utf-8, it does mean it is not using unicode.
                        Dunno about NTFS, but FAT32 has long filenames (LFNs) in 16-bit Unicode. UTF-8 is the same Unicode but with the code points distributed across one to four bytes, omitting the most significant bits if they're zero.
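
                        Side by side, the two on-disk layouts look like this (a sketch; UTF-16-LE here stands in for the 16-bit units FAT32 LFNs use):

```python
# UTF-16 uses 16-bit units even for ASCII, while UTF-8 spreads code
# points across one to four bytes.
for ch in ("A", "汉"):
    print(ch,
          "utf-16-le:", ch.encode("utf-16-le").hex(" "),
          "utf-8:", ch.encode("utf-8").hex(" "))

assert "A".encode("utf-16-le") == b"A\x00"   # 16 bits even for ASCII
assert len("汉".encode("utf-8")) == 3        # three bytes in UTF-8
assert len("汉".encode("utf-16-le")) == 2    # one 16-bit unit
```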



                          #27
                          I found two programs that read UTF file names: one is Kujawiak, the other is Quivi. Both are open source.

                          The advantage in this case is that some parts of the Quivi source code are under an MIT-style license, which means you can use its code in closed-source applications. I mean, we can take Quivi, read its code, legally implement it in IrfanView, and we don't need to release any bit of code. Of course we must check whether the code we need is MIT-licensed or not.

                          The other solution could be to join Irfan and develop a Unicode support solution for IV...

                          I think Unicode file name support is a hard task to accomplish, but the lack of this feature is a great menace to this image viewer.

                          At this moment the advantage of IV over Quivi is that IV opens files really, really fast and reliably, and has many more features. But it's just a matter of time before an open-source application reaches the status of other applications.


                          Note: I took a second look at the Quivi license and I think the whole package is under a "FreeImage Public License - Version 1.0", but the Quivi code itself, without the other packages, is MIT-style licensed.
                          Last edited by cablop; 25.12.2007, 08:01 PM.


                            #28
                            Originally posted by cablop View Post
                            I think unicode file names support is a hard task to accomplish, but a lack in this feature is a great menace to this image viewer.
                            I fail to see how the lack of Unicode filename support could be a great menace to IrfanView.

                            I think one needs to keep a sense of proportion here. IrfanView is not going to be replaced by other free image viewers due to a lack of Unicode support. Other features should be a higher priority for Irfan to spend his precious development time on.

                            For example, toolbar and shortcut customisation would probably be of greater benefit for most users than Unicode filename support. Fixing the bugs with zooming is the highest priority, in my view.
                            Before you post ... Edit your profile • IrfanView 4.62 • Windows 10 Home 19045.2486

                            Irfan Paint • Irfan View Help • IrfanPaint Help • Riot.dll • More Skins • FastStone Capture • Uploads



                              #29
                              Of course it is a menace!

                              For example, I can't see my images, and I have no time to rename them (a collection of about 20 GB of images, with more than 4 GB of them under Japanese, Chinese and Korean names, not to mention what happens when I share my Spanish accented file names with friends in China and Japan). Of course I cannot use IV's batch rename; IV is not reading their original names.

                              What I mean is, if IrfanView only supports its own locale, I cannot exchange files with people around the world.

                              IV supports free and commercial use, so here are a few scenarios where this lack renders IV useless:
                              • Use at home, exchanging photos with friends in Japan, Korea, China, Poland and Russia... I can't open their files without some tricks first.
                              • Working with designers from other countries (mostly freelance work). I haven't done that yet; I know I'd need to buy some commercial licences of IV, but of course I'll pay for a useful tool.
                              • Working at the university's foreign language department resource center. We had Japanese, Chinese and Russian there, and no way to open images from homework, material teachers brought to the center, and so on. Of course not all of us on the technical support staff knew every language at the center, so we just needed things to work in an easy way: "Teacher, do you need to print that? OK, just wait a minute." (Regardless of whether we could read it, the computer must.)
                              We like IrfanView a lot; it's one of the best programs I've ever used. But it's hard to have to use low-level tricks to make it work.

                              In fact, the world is moving towards globalization and language exchange. Windows 2000 supported Unicode (seven years ago), Windows XP improved on it, and even the painful Windows Vista has improved Unicode/UTF-8 support. All the leading applications support it: Office, media players, Internet browsers, mail clients, Java, PHP, Python, MySQL, PostgreSQL, Macromedia, Corel and so on. I moved from WinRAR to 7-Zip because the 7-Zip program and format work better with character names. And I'm just talking about Windows. Linux has nice language support, for free: you can switch from one user session with the whole interface in Portuguese to another user session on the same machine with the whole interface in Chinese: menus, toolbars, help files, pop-up menus, everything! More than one language environment in the same operating system.

                              No matter how nice, good, necessary or powerful a new program feature could be, it becomes useless if we cannot open the files.

                              In other words, applications with no Unicode support are in the same situation as hardware with no Y2K support was.

                              Do you want to see IV-styled software able to open files with Chinese or Korean file names? Take a look at Universal Viewer (ATViewer) and see how similar to IV it is, but it does its homework.


                                #30
                                Originally posted by cablop View Post
                                Of course it is a menace!
                                I think you have no idea at all what most IV users need. Users from Asia who are sending you files with Unicode filenames/paths obviously do not use IV. Those who are using IV clearly do not regard Unicode support as essential. Many other things will be a higher priority for them as they never need to open such files. For them, time spent on adding Unicode support means time lost for other tasks.

                                This feature request is not even in the pipeline for development as far as I can tell, though it has been known about for a long time.

                                IV is not like a browser or a word processor. It can process the image data if the file is renamed. I suggest looking for a file-renaming utility that can handle Unicode.
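
                                A minimal sketch of such a renaming workaround, assuming a made-up scheme (strip combining accents, spell out remaining code points); a real utility would do something smarter, and of course the original name is lost:

```python
import os
import tempfile
import unicodedata

def ascii_name(name: str) -> str:
    # Decompose, drop detached accents, and spell out any remaining
    # non-ASCII character as its code point so names don't collide.
    out = []
    for ch in unicodedata.normalize("NFKD", name):
        if ch.isascii():
            out.append(ch)
        elif not unicodedata.combining(ch):
            out.append("u%04x" % ord(ch))
    return "".join(out)

with tempfile.TemporaryDirectory() as d:
    old = os.path.join(d, "zhǎng汉.jpg")
    open(old, "w").close()
    os.rename(old, os.path.join(d, ascii_name("zhǎng汉.jpg")))
    print(os.listdir(d))   # the renamed file is now plain ASCII
```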
