
Seeking definitions, guidelines for "Config" settings in Kadmos plug-in (OCR)


    The Kadmos plug-in for IrfanView does not have a Help option in its "Config" window.
    Altering even one default setting, for example deselecting "spot removal", can improve OCR recognition. However, many terms used within "Config" are not intuitively obvious, such as "reject limit", "reject level", and "alternative segmentation". Also, the number of possible configuration choices is very large, so discovering the optimum settings becomes a matter of trial and error. Where can I find definitions and guidelines for "Config" so I can get better Kadmos results? Thank you.

    #2
    I don't see any meaningful help file. I would suggest looking around the Kadmos site and contacting them for some user-friendly documentation. Perhaps it has not been written yet. All I could find looked like a programmer's manual.
    Before you post ... Edit your profile • IrfanView 4.62 • Windows 10 Home 19045.2486

    Irfan PaintIrfan View HelpIrfanPaint HelpRiot.dllMore SkinsFastStone CaptureUploads

    Comment


      #3
      Building a record of "Config" experiments

      I will pursue that. Thank you.
      Meanwhile I am building a record of my "Config" experiments.

      Comment


        #4
        Please let us know if you find out anything useful. I am sure others will be interested.

        Comment


          #5
          Kadmos developers' manual

          Kadmos' downloadable developer's manual seems to be of little use to end users, although it does include this parenthetical statement: "The characters given under font have to be at least 10 pixels in width, but maximum 20 pixels (xminmax). The height of the characters is at least 15 pixels, but maximum 30 pixels (yminmax)." I have not examined the manual further. I did send a query about a user manual to the North American Kadmos representative listed on the Kadmos site.
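The size limits quoted from the manual can be turned into a quick check on a measured character box. This is only a sketch: the function name is made up for illustration, and whether these limits apply to every Kadmos font or classifier is not stated in the thread.

```python
# Limits quoted from the developer's manual (xminmax / yminmax), in pixels.
X_MIN, X_MAX = 10, 20  # allowed character width
Y_MIN, Y_MAX = 15, 30  # allowed character height


def fits_font_limits(width_px: int, height_px: int) -> bool:
    """Return True if a character box lies inside the quoted size limits.

    Hypothetical helper, not part of any Kadmos API.
    """
    return X_MIN <= width_px <= X_MAX and Y_MIN <= height_px <= Y_MAX


# A 12x24 pixel character is inside the limits; an 8x12 one is too small.
ok_box = fits_font_limits(12, 24)
small_box = fits_font_limits(8, 12)
```

If a scan fails a check like this, rescanning at a different resolution before running OCR may be worth trying.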

          Comment


            #6
            Kadmos' response to my request for a guide to Config use was this on 12 Jan 2010:

            "Thank you for using KADMOS within Irfanview.
            That integration is intended as a demo to showcase the powerful functionality available in the KADMOS SDK.
            The options are described in detail in the manual, but from a developer's perspective.



            From the online manual...

            reject_limit

            The value of this parameter determines the level of confidence (rec_value) for which alternatives are provided internally and returned in the recognition results. This has significant impact on computing time. The default value is 150.

            alternative segmentation (in bold below)

            typograph
            This parameter allows additional information about the given images to be submitted to the REL and REP modules to simplify and speed up segmentation (if such information is available). The predefined parameter values below can be combined using the logical OR operator "|" as long as the combinations make sense. Of course, not all combinations make sense.
            For REL and REP:

            TYPO_PROPORTIONAL

            Proportional spacing is assumed.

            TYPO_EQUIDISTANT

            Equidistant spacing (fixed or monospacing) is assumed.

            TYPO_NOLIGATURES

            With many fonts, but especially with handprint, some neighbouring characters may overlap, and touch each other. This is called a 'ligature'. Segmenting ligatures is a difficult problem and requires special algorithms. With fonts such as equidistant machine print there are normally no ligatures, so the related algorithms can be switched off. In this case this value must be set.

            TYPO_NOTOUCHINGCHARS

            No attached characters are assumed.

            TYPO_NOSEGALTERNATIV

            Segmentation into single characters is the most difficult task in character recognition. If characters are badly recognized, KADMOS tries alternative possibilities for segmentation. To switch this off (for example, if very good images are to be recognized) this value must be set.

            TYPO_4_SEGALTERNATIV, TYPO_8_SEGALTERNATIV

            4 or 8 segmentation alternatives are possible.

            TYPO_KEEPIMG

            For good segmentation it is insufficient to describe the segmented characters (lines or dots) by their surrounding rectangle only. If there is a need for direct access to the segmented images, then they must be stored separately, and this parameter value has to be set. The segmented images then can be accessed through result_image.
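To illustrate how values like these are combined with "|", here is a small Python sketch. The flag names follow the quoted manual, but the numeric values, the `Typo` class, and the list of conflicting pairs are all assumptions made for illustration; the real SDK is a C API with its own constants. Two names are reordered because a Python identifier cannot start with a digit.

```python
from enum import IntFlag


class Typo(IntFlag):
    """Flag names from the quoted manual; the numeric values are invented."""
    PROPORTIONAL = 0x01
    EQUIDISTANT = 0x02
    NOLIGATURES = 0x04
    NOTOUCHINGCHARS = 0x08
    NOSEGALTERNATIV = 0x10
    SEGALTERNATIV_4 = 0x20  # stands in for TYPO_4_SEGALTERNATIV
    SEGALTERNATIV_8 = 0x40  # stands in for TYPO_8_SEGALTERNATIV
    KEEPIMG = 0x80


# Pairs assumed (not stated by the manual) to contradict each other.
CONFLICTS = [
    (Typo.PROPORTIONAL, Typo.EQUIDISTANT),
    (Typo.NOSEGALTERNATIV, Typo.SEGALTERNATIV_4),
    (Typo.NOSEGALTERNATIV, Typo.SEGALTERNATIV_8),
]


def makes_sense(flags: Typo) -> bool:
    """Return False if the combination contains an assumed conflicting pair."""
    return not any((a in flags) and (b in flags) for a, b in CONFLICTS)


# Clean monospaced print: fixed spacing, no ligatures, no retries.
clean_monospace = Typo.EQUIDISTANT | Typo.NOLIGATURES | Typo.NOSEGALTERNATIV

# Proportional and equidistant spacing cannot both be assumed.
contradictory = Typo.PROPORTIONAL | Typo.EQUIDISTANT
```

Each checkbox in the plug-in's "Config" window presumably toggles one such bit, which would explain why the number of possible combinations is so large.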

            reject level (not directly defined as part of the API)

            Controls the confidence level at which an internal function replaces a character result with the default reject character." <end of Kadmos' response for help>
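The reject level description can be pictured with a short sketch. Everything here is an assumption for illustration: the thread does not say whether a larger value means more or less confidence (the sketch assumes larger means less confident), the reject character "~" is invented, and `apply_reject_level` is not a Kadmos function.

```python
from typing import List, Tuple

# Hypothetical reject character; the real default is not given in the thread.
REJECT_CHAR = "~"


def apply_reject_level(results: List[Tuple[str, int]], reject_level: int) -> str:
    """Replace every character whose value exceeds reject_level.

    Assumption: a larger value means a less confident recognition result.
    """
    return "".join(
        ch if value <= reject_level else REJECT_CHAR
        for ch, value in results
    )


# Made-up (character, value) pairs standing in for recognition results.
sample = [("K", 10), ("a", 40), ("d", 200), ("m", 90), ("o", 160), ("s", 20)]

strict = apply_reject_level(sample, reject_level=100)   # "Ka~m~s"
lenient = apply_reject_level(sample, reject_level=255)  # "Kadmos"
```

Under these assumptions, a stricter reject level trades wrong characters for visible reject marks, which are easier to find when proofreading.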

            Comment


              #7
              I think writing a report about the configuration would be very useful!
              0.6180339887
              Rest In Peace, Sam!

              Comment


                #8
                Remove Kadmos as a plug-in, replace with FreeOCR

                Owing to the lack of a Kadmos user guide and Kadmos' lackluster desire to provide help, I have abandoned further research into that IrfanView plug-in. However, I would like to alert the forum to a superior and 100% free OCR program. While Kadmos for IrfanView, even with much tweaking, still produced more than 25 errors when converting a column of text, FreeOCR produced only two, and those were hyphens. The program is well-designed and intuitive. According to its Help button, FreeOCR, V.3.0, Jan 2010, was written by Ralph Richardson using Tesseract v2.04. It is released under the Apache License, Version 2.0. The program is available at www.paperfile.net and is registered to Ralph Richardson 4 Victoria Avenue, Hornsea, HU181NH UK. The program declares there is no support because it has no staff. I have been unable to find Richardson via typical searches, and I do not know if the address above is for the ISP or for Richardson or for both. This person needs to be told that he has provided an immense service to writers. IrfanView should look into this program as a replacement plug-in for Kadmos.

                Comment


                  #9
                  FreeOCR is one of the programs on my review pages.

                  It's fine for occasional use. To be honest, for anything more than occasional use it would be better to buy Abbyy FineReader Pro or similar.

                  Comment


                    #10
                    Originally posted by Bhikkhu Pesala
                    FreeOCR is one of the programs on my review pages.

                    It's fine for occasional use. To be honest, for anything more than occasional use it would be better to buy Abbyy FineReader Pro or similar.
                    Your reviews are most useful. I downloaded Jarte at http://homepage.ntlworld.com/pesala/...tml/jarte.html and have enjoyed it.

                    Comment


                      #11
                      Like ggia, I too do not find the KADMOS OCR Plug-in too useful, as it gives too many errors. Besides, KADMOS has a mind-boggling variety of settings which I do not have the time (or inclination) to study!

                      Here is a comparison of FreeOCR and KADMOS on a small, single column of text: the original is the scan of a newspaper article, which you can see in the FreeOCR window. The OCR results are for the second column of text in the newspaper article. The settings used for both FreeOCR and KADMOS are their respective defaults.

                      Clearly, even in this tiny picture, FreeOCR gives significantly better results! Since we are comparing freeware, I think IrfanView will definitely benefit by switching from KADMOS to FreeOCR!
                      Last edited by WellOiledPC; 14.01.2010, 04:59 AM.
                      Download IrfanView Help Manual from:
                      IrfanView Website - Here
                      Sam_Zen's Website - Here
                      Author's Website - Here

                      Comment


                        #12
                        Yes. A plugin using the open-source Tesseract OCR engine would be nice. FreeOCR does not seem to support command-line operation.

                        EDIT: OCRopus might be the solution in the future.
                        Last edited by teohhanhui; 16.01.2010, 11:25 AM.

                        Comment


                          #13
                          Welcome to the Forum, teohhanhui.

                          OCRopus still seems to be some time in the making, but could be worth watching...

                          Comment


                            #14
                            Originally posted by Bhikkhu Pesala
                            FreeOCR is one of the programs on my review pages.

                            It's fine for occasional use. To be honest, for anything more than occasional use it would be better to buy Abbyy FineReader Pro or similar.
                            I know this is a necrothread, but since others are bound to skip to the bottom of the thread in the future and see the last post for the recommended software to do OCR ... I just have to say (since nobody else has yet) that I found FreeOCR to be exceptionally good. I have no idea why I'd need anything else. There's no malware or nagware or anything associated with it as far as I can tell. Not even on install.

                            I'd like to know what advantages Abbyy FineReader Pro has that FreeOCR doesn't have. I'll admit the one thing I wish FreeOCR did was join each paragraph into its own line, but I just made an OpenOffice spreadsheet that does that for me: http://liveinterface.com/davea0511fi...20COMBINER.zip

                            Just saying. Try FreeOCR before you decide to drop a wad-o-cash.
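For anyone who would rather script the paragraph joining described above, here is a minimal Python sketch. It is not the spreadsheet linked in the post; the function name is made up, and it assumes that paragraphs in the OCR output are separated by at least one blank line. Words hyphenated across line breaks are not rejoined.

```python
import re


def join_paragraph_lines(text: str) -> str:
    """Put each paragraph of OCR output on its own single line.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for paragraph in paragraphs:
        # Drop empty fragments and trim stray spaces around each line.
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        joined.append(" ".join(lines))
    return "\n".join(joined)


ocr_output = "First line\nof one paragraph.\n\nSecond\nparagraph."
result = join_paragraph_lines(ocr_output)
# result holds two lines, one per paragraph.
```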

                            Comment


                              #15
                              Originally posted by davea0511
                              I'd like to know what advantages abbyy-finereader-pro has that FreeOCR doesn't have. I'll admit the one thing I wish FreeOCR did was parse together each paragraph into its own line,
                              Professional OCR software will preserve formatting, recognise fonts, and so on. FreeOCR has a button to remove line breaks. Never having needed to use OCR for more than a page, I am also happy to use FreeOCR. I doubt whether its limited feature set would satisfy someone using OCR regularly.

                              Comment
