Thread: Files header strings

  1. #1
    Moderator Sam_Zen's Avatar
    Join Date
    May 2007
    Location
    NL
    Posts
    1,896
    Version
    IrfanView 4.27
    OS
    Win XP Home SP1
    CPU Cores
    1

    Files header strings

    Almost every graphics file has a 'header part' that identifies the nature of the bitmap/video before the actual data begins.
    When a file has been given the wrong extension and this causes trouble, it's important to check the header of the file to identify which format one is really dealing with, so that the extension can be corrected.
    This can be checked by opening the file in an ASCII editor, like Notepad or an equivalent.
    Most of the content will be binary codes, probably shown as blocks, but some recognizable strings will appear in the first lines.

    Bitmaps:
    BMP - First chars : "BM"
    JPG - On first line : "JFIF"
    JPG - From camera with EXIF data : On first line "Exif", two blocks, then "II"
    PNG - On first line : "PNG"
    GIF - First chars : "GIF89a" (very old GIFs : "GIF87a")
    TIF - (no compression) First chars : "II" or "MM"
    JP2 - On first line : "jP"
    PSP - First chars : "Paint Shop Pro Image File"

    Video:
    MPG - After three codes : "º!" (hex: BA 21)
    AVI - On first line : "RIFF" and "AVI LIST"
    WMV - First chars : "0&²uŽfÏ" (hex: 30 26 B2 75 8E 66 CF)
    FLV - First chars : "FLV"
    RM - First chars : ".RMF"
    SWF - First chars : "FWS"
    MP4 - On first line : "mp41"
    MOV - Variable so far. I noticed the presence of the string "moov"

    Some audio formats as well:
    WAV - On first line : "RIFF" and "WAVEfmt"
    WAV - (Compressed ADPCM) "RIFF" and "WAVEfmt 2"
    AU - First chars : ".snd"
    IFF - (Amiga) Some strings : "FORM - 8SVXVHDR - CHAN - BODY"
    AIF - (Apple) Some strings : "FORM - AIFFCOMM - SSND"
    WMA - First chars : "0&²uŽfϦ٠ª bÎl" (hex: 30 26 B2 75 8E 66 CF 11 A6 D9 00 AA 00 62 CE 6C)
    OGG - First chars : "OggS"
    FLAC - First chars : "fLaC"
    RA - First chars : ".RMF"

    Some other formats:
    ZIP - First chars : "PK"
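
    The check described above can also be automated. Below is a minimal Python sketch that covers only part of the list. The function names, the 32-byte "first line" window and the order of the checks are my own assumptions, not something taken from a format specification.

```python
from typing import Optional

# "First chars" entries from the list: must sit at the very start of the file.
# Longer strings are listed first so that short ones ("BM", "II") match last.
FIRST_CHARS = [
    (b"Paint Shop Pro Image File", "PSP"),
    (bytes.fromhex("3026B2758E66CF11A6D900AA0062CE6C"), "WMV/WMA"),
    (b"GIF89a", "GIF"),
    (b"GIF87", "GIF"),
    (b".RMF", "RM/RA"),
    (b".snd", "AU"),
    (b"OggS", "OGG"),
    (b"fLaC", "FLAC"),
    (b"FLV", "FLV"),
    (b"FWS", "SWF"),
    (b"BM", "BMP"),
    (b"II", "TIF"),
    (b"MM", "TIF"),
    (b"PK", "ZIP"),
]

# "On first line" entries: may appear anywhere in the first few bytes.
FIRST_LINE = [
    (b"AVI LIST", "AVI"),
    (b"WAVEfmt", "WAV"),
    (b"JFIF", "JPG"),
    (b"Exif", "JPG"),
    (b"PNG", "PNG"),
    (b"mp41", "MP4"),
    (b"jP", "JP2"),
]


def guess_format(head: bytes, first_line_len: int = 32) -> Optional[str]:
    """Return a format name from the list above, or None if nothing matches."""
    for magic, name in FIRST_CHARS:
        if head.startswith(magic):
            return name
    for magic, name in FIRST_LINE:
        if magic in head[:first_line_len]:
            return name
    return None


def guess_file(path: str) -> Optional[str]:
    """Read only the start of the file; the header strings live there."""
    with open(path, "rb") as f:
        return guess_format(f.read(64))
```

    Short strings like "BM" or "PK" can match by accident, so treat the result as a hint and not as proof.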
    Last edited by Sam_Zen; 13.11.2007 at 01:02 AM.
    0.6180339887
    Rest In Peace, Sam!

  2. #2
    Moderator Sam_Zen's Avatar

    About

    I've received some comments about this thread not being based on facts. As Matera wrote, this is the report of an observation.

    And I want to state that this survey was done with a plain ASCII viewer, not with a fancy word processor.
    Maybe some strings with 'odd' characters will be displayed differently in this post for different users, like those of WMV or WMA.
    But even then, the altered string should still be recognizable by comparison.

    I could have used a hex viewer as well, to be more precise, but this would have forced me to describe the exact position in the file of every specific hex string.
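
    For those who want both views at once, here is a small Python sketch of a hex dump that prints the offset, the hex codes and the readable characters side by side. The layout (16 bytes per row, dots for unprintable codes) is just my own choice for the illustration.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as 'offset  hex codes  ascii', one row per `width` bytes."""
    pad = width * 3 - 1  # room for "XX " per byte, minus the trailing space
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if b in range(32, 127) else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part.ljust(pad)}  {text_part}")
    return "\n".join(lines)
```

    Something like print(hexdump(open("some_file", "rb").read(64))) shows the first four rows of a file, which is normally enough to spot the strings in the list.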

    I must admit that the information on the TIF format is incomplete, because I almost never use it, so I don't have many files to test.

  3. #3
    IV Amateur
    Join Date
    Dec 2006
    Posts
    11
    Version
    IrfanView 3.98
    OS
    Win XP Home SP2

    TIFF files can also start with MM. To be exact, the first two characters give the byte order (MM = Motorola format = most significant byte first, II = Intel format = least significant byte first). The next two characters are the 16-bit number "42" in the appropriate order, so the first four bytes in hexadecimal are 49 49 2a 00 (II*null) or 4d 4d 00 2a (MMnull*).
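
    As a rough illustration of that four-byte check, here is a short Python sketch. The function name and the "little"/"big" return values are my own choices.

```python
from typing import Optional


def tiff_byte_order(head: bytes) -> Optional[str]:
    """Return 'little' (II), 'big' (MM), or None if this is not a TIFF header."""
    if head[:2] == b"II":
        order = "little"   # Intel: least significant byte first
    elif head[:2] == b"MM":
        order = "big"      # Motorola: most significant byte first
    else:
        return None
    # Bytes 2-3 must hold the number 42, stored in the order just announced.
    if int.from_bytes(head[2:4], order) != 42:
        return None
    return order
```

    Checking the 42 as well as the two letters avoids mistaking any file that merely starts with "II" or "MM" for a TIFF.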

    (This is one of the dumbest decisions ever. The idea was to allow different machines to handle TIFFs in their native format, but as all TIFF-reading software has to understand both formats in order for TIFFs to be portable it just means extra work for everyone with no real benefit.)

  4. #4
    Occasional User
    Join Date
    Dec 2006
    Posts
    113

    Quote, originally posted by MatthewW:
    (This is one of the dumbest decisions ever. The idea was to allow different machines to handle TIFFs in their native format, but as all TIFF-reading software has to understand both formats in order for TIFFs to be portable it just means extra work for everyone with no real benefit.)
    Why? This was quite a fair decision. Each machine can read files in its own byte order quickly and only incurs overhead for the other format.

  5. #5
    IV Amateur

    Quote, originally posted by midora:
    Why? This was quite a fair decision. Each machine can read files in its own byte order quickly and only incurs overhead for the other format.
    Because they can't read the values quickly anyway. It actually takes longer to decide which way the bytes should be and then deal with them as a word than it would take to handle them as individual bytes with a fixed ordering, especially on modern processors where a branch can be costly. Formats such as JFIF and PNG specify the byte ordering and don't suffer for it, and TIFF's provision for two byte orderings merely means that code to handle TIFFs is unnecessarily bulky and slow.

  6. #6
    Moderator Sam_Zen's Avatar

    Nice info, MatthewW.

    A good idea to mention the hex code as well.
    But only if necessary. I don't see a reason to repeat 'JFIF' with '4A 46 49 46'.
    This was meant as a quick look in the first place. Since everyone at least has Notepad, it's the shortest way.
    I use the Lister of TC for this, so I can switch views between ASCII and hex.
    So I will add the hex code to the items above with the 'odd' characters, to give precise information.

    Btw, this difference between Motorola and Intel already caused trouble in the DOS days on an XT.

  7. #7
    Occasional User

    Quote, originally posted by MatthewW:
    Because they can't read the values quickly anyway. It actually takes longer to decide which way the bytes should be and then deal with them as a word than it would take to handle them as individual bytes with a fixed ordering.
    But you decide which kind of decoder to use just once, when you open the file, and not for each single word. So each processor gets the optimal speed for its native byte order. So this is optimal.
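
    To make the "decide once" idea concrete, here is a minimal Python sketch. It assumes the byte order was already determined from the header when the file was opened and is passed in as "little" or "big". The function name is hypothetical, and the sketch says nothing about which approach is actually faster.

```python
def read_u16_values(data: bytes, byteorder: str, count: int, offset: int = 0) -> list:
    """Read `count` unsigned 16-bit values starting at `offset`.

    `byteorder` ('little' or 'big') is decided once by the caller,
    so there is no per-value test of the file's byte order in here.
    """
    return [
        int.from_bytes(data[offset + 2 * i: offset + 2 * i + 2], byteorder)
        for i in range(count)
    ]
```

    For example, read_u16_values(b"\x01\x00\x02\x00", "little", 2) and read_u16_values(b"\x00\x01\x00\x02", "big", 2) both give [1, 2].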

  8. #8
    Power User j7n's Avatar
    Join Date
    Jun 2006
    Location
    Cyberspace
    Posts
    535
    Version
    IrfanView 4.51
    OS
    32-bit Win Server 2003 SP1
    CPU Cores
    1

    Hi Sam_Zen.

    You had a great idea in starting this table of file headers. Very useful. In the past I used a program called WhatFormat. The problem with it, besides Visual Basic, was that it only worked with formats deemed important by the author. A hex editor, or Notepad for smaller files, is the only reliable and universal tool. It's like an oscilloscope, where you can feed in any signal and in time learn to recognize various patterns.

    I've been guessing file formats using a hex editor for some time. I hope you don't mind if I make a few remarks. The special (Unicode) characters you have mentioned in the first post have little meaning since their appearance depends on the current codepage.

    Newer flash animations may also begin with "CWS".

    WMA, or I guess actually ASF, files can visually be identified by the presence of "0&" followed later by "Seh". Then they may also contain metadata tags in Unicode beginning with "WM/".

    The old-school MPEG formats don't have a distinct header. Instead each packet in the stream begins with a predefined pattern of bits that carries information about the parameters of that packet. This is actually very good, since most of these MPEG streams can be cut and still be valid for immediate playback. MPEGs may be identified by repeated occurrences of this packet header.

    For example, MPEG-2 PS, as on Video DVDs, can be identified by a regularly repeating sequence of "00 00 01 BA 44", which visually appears as a repeating "║D" (if using a DOS/OEM codepage).
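
    Counting those repeats is easy to script. The Python sketch below only looks for the "00 00 01 BA" part of the sequence. The function names and the threshold of three hits are my own assumptions, so take the result as a hint only.

```python
PACK_START = bytes.fromhex("000001BA")  # start of each pack in the stream


def count_pack_headers(data: bytes) -> int:
    """Count non-overlapping occurrences of the pack start pattern."""
    return data.count(PACK_START)


def looks_like_mpeg_ps(data: bytes, min_headers: int = 3) -> bool:
    """Guess 'MPEG program stream' if the pattern repeats often enough."""
    return count_pack_headers(data) >= min_headers
```

    Feeding it the first few hundred kilobytes of a file should be enough, since the pattern repeats throughout the stream.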

    Continuing from the previous point, each AC-3 frame starts with "0B 77".

    MP3 files can't be reliably identified by the packet headers. You could look for repeated "FF FB" for the most common near-CD-quality files. The most popular and industry-standard LAME encoder will write its name and version in the stream. You may also encounter the following dwords near the very start of the file: "Xing" (VBR), "Info" (CBR made by LAME), "VBRI" (VBR by Fraunhofer). MP3s tagged with the terrible ID3v2 standard start with "ID3".
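
    Since no single marker is reliable, one option is to collect all of these hints and judge them together. Here is a rough Python sketch. The function name, the dictionary keys and the 4096-byte scan window are made up for the illustration.

```python
def mp3_hints(data: bytes, scan_len: int = 4096) -> dict:
    """Collect the MP3 clues mentioned above; none of them is proof on its own."""
    head = data[:scan_len]
    markers = (b"Xing", b"Info", b"VBRI")
    return {
        # ID3v2 tags sit at the very start of the file.
        "id3v2_tag": data.startswith(b"ID3"),
        # First VBR/CBR marker found near the start, or None.
        "vbr_marker": next((m.decode("ascii") for m in markers if m in head), None),
        # LAME writes its name into the stream.
        "lame_string": b"LAME" in head,
        # Frame headers of the most common files begin with FF FB.
        "ff_fb_count": data.count(b"\xff\xfb"),
    }
```

    A file with a high "ff_fb_count" plus one of the markers is very likely an MP3. A lone "FF FB" here and there can also occur by chance in other binary data.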

    Each packet of MP2 data starts with "FF FD" for the most common near-CD-quality files.

    Each packet of DTS audio starts with "7F FE 80" for 44.1 kHz, surround. Needs additional verification for other configurations. Very important to know when extracting DTS from various containers such as SPDIF or CDDA.

    Monkey's Audio (also known as APE, MAC) files start with "MAC".

    Matroska media files will nicely report themselves on the first line as "matroska".

    RAR archives always begin with "Rar!".

    CorelDraw CDR files contain "RIFF" followed by "CDRBvrsn".

    Zip. For additional verification, scroll to the end of the file for a list of all archived filenames, each item also beginning with "PK".
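
    That end-of-file list can be counted with a few lines of Python. In this sketch I search for "PK" followed by the bytes 01 02, which is the marker the ZIP format uses for each entry in that list. That detail comes from the ZIP format, not from the observation above, and the function name is hypothetical.

```python
CENTRAL_DIR_ENTRY = b"PK\x01\x02"  # marks one filename entry in the list at the end


def zip_listed_entries(data: bytes) -> int:
    """Return how many directory entries a ZIP-looking file lists, else 0."""
    if not data.startswith(b"PK"):
        return 0
    # Compressed data could contain the same four bytes by coincidence,
    # so this count is a plausibility check and not an exact parse.
    return data.count(CENTRAL_DIR_ENTRY)
```

    For a strict answer, Python's own zipfile module (zipfile.is_zipfile) does a proper check.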

  9. #9
    Moderator Sam_Zen's Avatar

    @ j7n
    An excellent contribution.
    To add some more :
    XM module tracker files start with "Extended Module: " plus track title.
    IT module tracker files start with "IMPM" plus track title.
    The difference between a stereo WAV and a 4-channel WAV : instead of the string "data", it's "qdata".
    PK - files for quick loading of WAV files in Cool Edit. Start with "F1 06"
    TER starts with "TERRAGENTERRAIN"
    PDF starts with "%PDF-" plus version
    MID starts with "MThd" - a few bytes further "ÀMTrk" (hex: C0 4D 54 72 6B)

    A nice tool for investigating the contents of files, especially executables like .exe or .dll, is Textscan from AnalogX.
    It can show possible ASCII strings inside a file, among the binary codes, for identification.
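
    The same kind of string scan can be sketched in a few lines of Python. This is not how Textscan itself works, just an illustration of the idea. The function name and the minimum length of 4 characters are my own assumptions.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII characters at least `min_len` long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.decode("ascii") for m in pattern.findall(data)]
```

    Run on the start of a MIDI file, for instance, this should list "MThd" and "MTrk" while skipping the binary codes in between.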

  10. #10
    Advanced User matera's Avatar
    Join Date
    May 2006
    Location
    3 miles below poverty level
    Posts
    1,383
    Version
    IrfanView 4.44
    OS
    64-bit Linux Distribution
    CPU Cores
    4

    Speaking of hex viewers/editors, I have found Tiny Hexer from www.mirkes.de to be the most helpful of all tools. Unlike most hex editors, it has a multi-document interface, and it is very fast and configurable. It can be used to compare two different files side by side, by tiling the windows.

    I have used it to repair a file with a damaged header, copying parts from a known good JPG to a corrupted one.
    Its: Belongs to "It"
    It's: Shortened form of "It is"
    ---------------------
    Lose: Fail to keep
    Loose: Not tight

    ---------------------
    Plurals do not require apostrophes
