Batch Removing Metadata (EXIF, XMP...) Increases File Size - Why?


    Hi All,

    When I processed several JPG images with Batch Conversion to remove all metadata (File > Batch Conversion > Options, all boxes unchecked), almost all of the files increased in size. I only wanted to clean out the metadata, so this makes no sense to me. I was under the impression that removing the metadata would make the files smaller or, at worst, leave them the same size.

    All the "Advanced" features are turned off (Resize, crop, color, etc.).

    Sizes increased by as little as 5% and by as much as 40%. Any ideas on this?
    However, if the larger file size means better image quality, I'm OK with that. I just can't see any visual difference between the old and new images.

    #2
    Hi Brian,

    you can also use ImageMagick for this. From the command line you can remove all EXIF data:

    Code:
    mogrify -strip sample.jpg
    It would be great if you could provide a test file. Then I can try it with the command-line tool and we can check the file size afterwards!

    You run the test with IrfanView, I run it with ImageMagick, and then we compare our results!

    Stefan

      #3
      Hi Stefan,

      I'm not proficient with the command line. I'm a GUI guy.
      Here's a folder that includes 43 images and an Excel file comparing the file sizes before and after IrfanView. Now that I see them side by side, it makes no sense what's going on. Take note of TerriGarr.jpg in the spreadsheet: it went from 47KB to 113KB after removing all metadata. The test had all "Options" boxes unchecked (no XMP, no EXIF, etc.).

      Thanks.
      Last edited by BJ1200; 20.03.2012, 03:58 AM.

        #4
        Hi,

        input files:

        Code:
        [stefan@politeia Invelos Update]$ du -skb * > old.txt
        66633    allen bestwick.jpg
        29759    Andrea Romano.jpg
        28329    barney hall.jpg
        35175    ben drew.jpg
        105164    billoah greene.jpg
        42672    BJ Hogg.jpg
        152343    Bobby Labonte.jpg
        204402    Bruce Timm.jpg
        175466    Cary Alexander.jpg
        37763    catalina rodriguez.jpg
        71735    Cathy Cahin Ryan.jpg
        25291    Christian Murphy.jpg
        47163    Danny Jacobs.jpg
        171149    david snell.jpg
        200301    jack oconnell.jpg
        5244    Jamie Downey.jpg
        311628    Jeff Gordon.jpg
        221497    Jimmie Johnson.jpg
        14715    Joe Moore.jpg
        56234    john fricker.jpg
        36920    John Pickard.jpg
        25908    Joseph Gilgun.jpg
        124342    julianne grossman.jpg
        28086    Keith Ferguson.jpg
        83457    Ken Leung.jpg
        189489    Keyonna Patterson.jpg
        74552    Liz Dainiels.jpg
        104490    Mary Alice Haney.jpg
        221684    Matt Kenseth.jpg
        19702    michael chiklis.jpg
        27022    Micke Spreitz.jpg
        28632    rasmus hardiker.jpg
        33961    rebecca chaney.jpg
        69179    Rick Gifford.jpg
        137515    ryan newman.jpg
        24359    Saratoga Ballentine.jpg
        266524    simon farnaby.jpg
        155083    steven christopher parker.jpg
        125360    Susan Eisenberg.jpg
        48147    Teri Garr.jpg
        162943    Tony Stewart.jpg
        81470    Wade Williams.jpg
        (Sizes in bytes)

        Then remove all exif:

        Code:
        [stefan@politeia Invelos Update]$ mogrify -strip *.jpg
        Gives:

        Code:
        [stefan@politeia Invelos Update]$ du -skb * > new.txt
        63806    allen bestwick.jpg
        27437    Andrea Romano.jpg
        25036    barney hall.jpg
        33120    ben drew.jpg
        95501    billoah greene.jpg
        42055    BJ Hogg.jpg
        143919    Bobby Labonte.jpg
        181433    Bruce Timm.jpg
        113178    Cary Alexander.jpg
        34161    catalina rodriguez.jpg
        71689    Cathy Cahin Ryan.jpg
        22670    Christian Murphy.jpg
        43362    Danny Jacobs.jpg
        154290    david snell.jpg
        180480    jack oconnell.jpg
        4579    Jamie Downey.jpg
        284421    Jeff Gordon.jpg
        228996    Jimmie Johnson.jpg
        14712    Joe Moore.jpg
        59687    john fricker.jpg
        32846    John Pickard.jpg
        26275    Joseph Gilgun.jpg
        116324    julianne grossman.jpg
        25936    Keith Ferguson.jpg
        77889    Ken Leung.jpg
        178470    Keyonna Patterson.jpg
        68493    Liz Dainiels.jpg
        97140    Mary Alice Haney.jpg
        229877    Matt Kenseth.jpg
        20175    michael chiklis.jpg
        24876    Micke Spreitz.jpg
        25053    rasmus hardiker.jpg
        29629    rebecca chaney.jpg
        66073    Rick Gifford.jpg
        69591    ryan newman.jpg
        22413    Saratoga Ballentine.jpg
        249183    simon farnaby.jpg
        147880    steven christopher parker.jpg
        133561    Susan Eisenberg.jpg
        47317    Teri Garr.jpg
        94759    Tony Stewart.jpg
        78263    Wade Williams.jpg
        So I'm going to write a little C++ program that stores the file name and file size from each list in its own map:

        Code:
        #include <map>
        #include <iostream>
        #include <fstream>
        #include <string>
        #include <boost/regex.hpp>
        #include <boost/lexical_cast.hpp>
        
        using namespace std;
        using namespace boost;
        
        typedef map<string, long> dataSheet;
        
        
        // Reads lines of the form "<size> <file name>" (du output) into the given map
        void readFileIntoMap(const string &fileName, dataSheet &sheet) {
        	
        	ifstream inputFileStream{fileName.c_str()};
        	
        	// Compile the pattern once; \d+ ensures lexical_cast never sees an empty size
        	const regex pattern{"(\\d+)\\s+(.+)"};
        	
        	string currentLine;
        	
        	while(getline(inputFileStream, currentLine)) {
        		
        		smatch result;
        		
        		// Lines that do not match the pattern are skipped silently
        		if (regex_match(currentLine, result, pattern)) {
        			
        			string currentFileName{result[2].str()};
        			
        			long currentFileSize = boost::lexical_cast<long>(result[1].str());
        			
        			sheet.insert(dataSheet::value_type(currentFileName, currentFileSize));
        		}
        	}
        }
        
        int main()  {
        	
        	string beforeStrippingExif{"./old.txt"};
        	string afterStrippingExif{"./new.txt"};
        	
        	// Map contains filename and filesize - multiple filenames are not allowed!!!!
        	// We are a bit restricted: no unicode filenames, please ;)
        	dataSheet exifStrippingSheetBefore;
        	dataSheet exifStrippingSheetAfter;
        		
        	readFileIntoMap(beforeStrippingExif, exifStrippingSheetBefore);
        	readFileIntoMap(afterStrippingExif, exifStrippingSheetAfter);
        	
        	cout << "FileSize - size before stripping / size after stripping - diff" << endl;
        	
        	for (auto it = exifStrippingSheetBefore.cbegin(); it != exifStrippingSheetBefore.cend(); ++it) {
        		// print some statistics
        		
        		if (exifStrippingSheetAfter.count(it->first)) {
        			cout << it->first << " - " << it->second;
        			cout << " / " << exifStrippingSheetAfter[it->first] << " - " << (long)it->second - (long)exifStrippingSheetAfter[it->first] << endl;
        		}
        	}
        }
        Compiled via:

        Code:
        [stefan@politeia Invelos Update]$ g++ -std=c++0x -lboost_regex exifSheet.cpp
        Results:

        Code:
        FileSize - size before stripping / size after stripping - diff
        Andrea Romano.jpg - 29759 / 27437 - 2322
        BJ Hogg.jpg - 42672 / 42055 - 617
        Bobby Labonte.jpg - 152343 / 143919 - 8424
        Bruce Timm.jpg - 204402 / 181433 - 22969
        Cary Alexander.jpg - 175466 / 113178 - 62288
        Cathy Cahin Ryan.jpg - 71735 / 71689 - 46
        Christian Murphy.jpg - 25291 / 22670 - 2621
        Danny Jacobs.jpg - 47163 / 43362 - 3801
        Jamie Downey.jpg - 5244 / 4579 - 665
        Jeff Gordon.jpg - 311628 / 284421 - 27207
        Jimmie Johnson.jpg - 221497 / 228996 - -7499
        Joe Moore.jpg - 14715 / 14712 - 3
        John Pickard.jpg - 36920 / 32846 - 4074
        Joseph Gilgun.jpg - 25908 / 26275 - -367
        Keith Ferguson.jpg - 28086 / 25936 - 2150
        Ken Leung.jpg - 83457 / 77889 - 5568
        Keyonna Patterson.jpg - 189489 / 178470 - 11019
        Liz Dainiels.jpg - 74552 / 68493 - 6059
        Mary Alice Haney.jpg - 104490 / 97140 - 7350
        Matt Kenseth.jpg - 221684 / 229877 - -8193
        Micke Spreitz.jpg - 27022 / 24876 - 2146
        Rick Gifford.jpg - 69179 / 66073 - 3106
        Saratoga Ballentine.jpg - 24359 / 22413 - 1946
        Susan Eisenberg.jpg - 125360 / 133561 - -8201
        Teri Garr.jpg - 48147 / 47317 - 830
        Tony Stewart.jpg - 162943 / 94759 - 68184
        Wade Williams.jpg - 81470 / 78263 - 3207
        allen bestwick.jpg - 66633 / 63806 - 2827
        barney hall.jpg - 28329 / 25036 - 3293
        ben drew.jpg - 35175 / 33120 - 2055
        billoah greene.jpg - 105164 / 95501 - 9663
        catalina rodriguez.jpg - 37763 / 34161 - 3602
        david snell.jpg - 171149 / 154290 - 16859
        jack oconnell.jpg - 200301 / 180480 - 19821
        john fricker.jpg - 56234 / 59687 - -3453
        julianne grossman.jpg - 124342 / 116324 - 8018
        michael chiklis.jpg - 19702 / 20175 - -473
        rasmus hardiker.jpg - 28632 / 25053 - 3579
        rebecca chaney.jpg - 33961 / 29629 - 4332
        ryan newman.jpg - 137515 / 69591 - 67924
        simon farnaby.jpg - 266524 / 249183 - 17341
        steven christopher parker.jpg - 155083 / 147880 - 7203
        Stefan

          #5
          Hi,

          what happened with:

          Jimmie Johnson.jpg - 221497 / 228996 - -7499
          Joseph Gilgun.jpg - 25908 / 26275 - -367
          Matt Kenseth.jpg - 221684 / 229877 - -8193
          Susan Eisenberg.jpg - 125360 / 133561 - -8201
          john fricker.jpg - 56234 / 59687 - -3453
          michael chiklis.jpg - 19702 / 20175 - -473
          A negative diff means the file is bigger after EXIF stripping.

          Stefan

          EDIT: I'm using a Fedora (16) system, gcc 4.6.2, ImageMagick 6.7.0-10 2011-11-03 Q16

            #6
            Hi,

            I just found out that the strip command also re-compresses the images, so the pixel data changes:

            Code:
            [stefan@politeia Pictures]$ perceptualdiff org.jpg mod.jpg 
            FAIL: Images are visibly different
            50744 pixels are different
            
            [stefan@politeia Pictures]$ perceptualdiff andrea_org.jpg andrea_mod.jpg 
            FAIL: Images are visibly different
            1790 pixels are different
            Very bad. So I tried the Perl-based tool "exiftool" to remove the EXIF metadata:

            Code:
            FileSize - size before stripping / size after stripping - diff
            Andrea Romano.jpg - 29759 / 29481 - 278
            BJ Hogg.jpg - 42672 / 42394 - 278
            Bobby Labonte.jpg - 152343 / 152065 - 278
            Bruce Timm.jpg - 204402 / 204124 - 278
            Cary Alexander.jpg - 175466 / 117385 - 58081
            Cathy Cahin Ryan.jpg - 71735 / 71717 - 18
            Christian Murphy.jpg - 25291 / 25013 - 278
            Danny Jacobs.jpg - 47163 / 46885 - 278
            Jamie Downey.jpg - 5244 / 5226 - 18
            Jeff Gordon.jpg - 311628 / 311350 - 278
            Jimmie Johnson.jpg - 221497 / 221219 - 278
            Joe Moore.jpg - 14715 / 14697 - 18
            John Pickard.jpg - 36920 / 36642 - 278
            Joseph Gilgun.jpg - 25908 / 25855 - 53
            Keith Ferguson.jpg - 28086 / 27808 - 278
            Ken Leung.jpg - 83457 / 83179 - 278
            Keyonna Patterson.jpg - 189489 / 189211 - 278
            Liz Dainiels.jpg - 74552 / 74274 - 278
            Mary Alice Haney.jpg - 104490 / 104212 - 278
            Matt Kenseth.jpg - 221684 / 221406 - 278
            Micke Spreitz.jpg - 27022 / 26744 - 278
            Rick Gifford.jpg - 69179 / 68901 - 278
            Saratoga Ballentine.jpg - 24359 / 24081 - 278
            Susan Eisenberg.jpg - 125360 / 125082 - 278
            Teri Garr.jpg - 48147 / 48068 - 79
            Tony Stewart.jpg - 162943 / 103802 - 59141
            Wade Williams.jpg - 81470 / 81192 - 278
            allen bestwick.jpg - 66633 / 66355 - 278
            barney hall.jpg - 28329 / 28051 - 278
            ben drew.jpg - 35175 / 34897 - 278
            billoah greene.jpg - 105164 / 104886 - 278
            catalina rodriguez.jpg - 37763 / 37485 - 278
            david snell.jpg - 171149 / 170871 - 278
            jack oconnell.jpg - 200301 / 179632 - 20669
            john fricker.jpg - 56234 / 55956 - 278
            julianne grossman.jpg - 124342 / 124064 - 278
            michael chiklis.jpg - 19702 / 16534 - 3168
            rasmus hardiker.jpg - 28632 / 28354 - 278
            rebecca chaney.jpg - 33961 / 33683 - 278
            ryan newman.jpg - 137515 / 78625 - 58890
            simon farnaby.jpg - 266524 / 266246 - 278
            steven christopher parker.jpg - 155083 / 154805 - 278
            Diffing an original against an exiftool-edited file:

            Code:
            [stefan@politeia Pictures]$ perceptualdiff Jimmie\ Johnson.jpg mod.jpg
            No difference! That means exiftool does a great job of removing EXIF metadata without touching the image data (only the file size gets smaller).

            Hope this helps someone.

            Stefan

              #7
              Thanks Stefan, you just put me through Fortran flashbacks from college. It's good to know that it's messing with the images. Fortunately, I already have ExifTool... wait for it... their GUI executable. I'm not sure their GUI can clean metadata. I will look into it.

                #8
                Hey Stefan,

                ExifTool GUI does a great job of cleaning metadata. When it finishes, it creates a report of any files that had errors and couldn't be processed, including explanations of the errors. However, on a set of 16,000 images it stalled at the end and couldn't generate the report (Not Responding). I used IrfanView to reconvert the error files to JPG, and that fixed all the errors.

                  #9
                  Hi,

                  you should try a plain batch script that calls exiftool directly (the ExifTool GUI calls it too). Maybe that works.

                  Stefan
