Microsoft common object file format




















There is a rather complete set of rules that linkers follow to decide which sections to combine and how. A section in an OBJ file may be intended for the linker's use, and not make it into the final executable. A section like this would be intended for the compiler to pass information to the linker.

Sections have two alignment values, one within the disk file and the other in memory. The PE file header specifies both of these values, which can differ. Each section starts at an offset that's some multiple of the alignment value.

For instance, every section in the PE file begins at a file offset that's a multiple of the file alignment value. Once mapped into memory, sections always start on at least a page boundary. That is, when a PE section is mapped into memory, the first byte of each section corresponds to the start of a memory page.
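The rounding rule behind both alignment values can be sketched in a few lines. The helper name `align_up` and the sample alignments below are illustrative assumptions, not values read from any particular file.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


# Hypothetical values chosen only for illustration.
FILE_ALIGNMENT = 0x200
SECTION_ALIGNMENT = 0x1000

raw_size = 0x2345  # bytes of section data before padding
size_on_disk = align_up(raw_size, FILE_ALIGNMENT)
size_in_memory = align_up(raw_size, SECTION_ALIGNMENT)
```

The same section therefore occupies a different amount of space on disk and in memory whenever the two alignment values differ.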

It's possible to create PE files in which the sections start at the same offset in the file as they start from the load address in memory. This makes for larger executables, but can speed loading under Windows 9x or Windows Me.

An interesting linker feature is the ability to merge sections. If two sections have similar, compatible attributes, they can usually be combined into a single section at link time, typically through a linker option. The advantage to merging sections is that it saves space, both on disk and in memory.

At a minimum, each section occupies one page in memory. If you can reduce the number of sections in an executable from four to three, there's a decent chance you'll use one less page of memory. Of course, this depends on whether the unused space at the end of the two merged sections adds up to a page. Things can get interesting when you're merging sections, as there are no hard and fast rules as to what's allowed, and some merges that were possible prior to Visual Studio .NET are no longer allowed.

In Visual Studio .NET, the linker itself often merges parts of the imports data into a read-only section. Since portions of the imports data are written to by the Windows loader when they are loaded into memory, you might wonder how they can be put in a read-only section.

The loader temporarily makes those pages writable while it fills in the imports table. Once the imports table is initialized, the pages are then set back to their original protection attributes.

In an executable file, there are many places where an in-memory address needs to be specified.

For instance, the address of a global variable is needed when referencing it. PE files can load just about anywhere in the process address space. While they do have a preferred load address, you can't rely on the executable file actually loading there. For this reason, it's important to have some way of specifying addresses that are independent of where the executable file loads.

For instance, consider an EXE file loaded at some address, with its code section at a higher address. The RVA of the code section is the address of the code section minus the load address.
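As a small worked sketch of that arithmetic, the snippet below converts in both directions. The function names and every address in it are hypothetical, picked only to make the subtraction easy to follow.

```python
def va_to_rva(address: int, load_address: int) -> int:
    """RVA = in-memory address minus the address the file loaded at."""
    if address < load_address:
        raise ValueError("address lies below the load address")
    return address - load_address


def rva_to_va(rva: int, load_address: int) -> int:
    """Reverse the process: add the RVA to the actual load address."""
    return load_address + rva


# Hypothetical addresses, for illustration only.
load_address = 0x10000000
code_section = 0x10001000

rva = va_to_rva(code_section, load_address)

# The RVA is unchanged if the same file loads somewhere else.
relocated = rva_to_va(rva, 0x20000000)
```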

To convert an RVA to an actual address, simply reverse the process: add the RVA to the actual load address to find the actual memory address. Want to go spelunking through some arbitrary DLL's data structures in memory? Here's how. There are many data structures within executable files that need to be quickly located. Some obvious examples are the imports, exports, resources, and base relocations.

All of these well-known data structures are found in a consistent manner, and the location is known as the DataDirectory. The DataDirectory is an array of 16 structures.

Each array entry has a predefined meaning for what it refers to. A more detailed description of many of the pointed-to data structures will be included in Part 2 of this article.
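A minimal sketch of reading such an array follows, assuming each of the 16 entries is a pair of little-endian 32-bit values (an RVA followed by a size). The buffer is synthetic, the function name is made up for this example, and only the indices for the four structures named above are labeled.

```python
import struct

ENTRY = struct.Struct("<II")  # (RVA, size) per entry, 8 bytes

# Indices for the structures mentioned in the text; other slots are left unnamed here.
NAMES = {0: "exports", 1: "imports", 2: "resources", 5: "base relocations"}


def parse_data_directory(blob: bytes, count: int = 16):
    """Return a list of (rva, size) tuples, one per directory slot."""
    return [ENTRY.unpack_from(blob, i * ENTRY.size) for i in range(count)]


# Synthetic directory with only the imports slot filled in (made-up values).
blob = bytearray(16 * ENTRY.size)
ENTRY.pack_into(blob, 1 * ENTRY.size, 0x2000, 0x50)

directory = parse_data_directory(bytes(blob))
present = {
    NAMES.get(i, f"entry {i}"): entry
    for i, entry in enumerate(directory)
    if entry != (0, 0)
}
```

A slot whose RVA and size are both zero is treated here as absent.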

When you use code or data from another DLL, you're importing it. When any PE file loads, one of the jobs of the Windows loader is to locate all the imported functions and data and make those addresses available to the file being loaded.

I'll save the detailed discussion of data structures used to accomplish this for Part 2 of this article, but it's worth going over the concepts here at a high level. You don't have to do anything to make the addresses of the imported APIs available to your code. The loader takes care of it all.

The alternative is explicit linking. When implicitly linking, the resolution process for the main EXE file and all its dependent DLLs occurs when the program first starts.


Number of data-dictionary entries in the remainder of the Optional Header. Function called just after thread initialization. Function called just before thread initialization; does not apply to the first thread allocated.

Base Relocation Table. Total size of the section when loaded. Address of the first byte of the section, when loaded into memory, relative to the image base. Physical file size of the initialized data. Pointer to Linenumbers.

A value indicating what kind of relocation should be performed. Reference to a section address. Reference to an offset from a section address. Direct reference to a bit address. Reference to the high portion of a bit address. Direct reference to a bit address which is relative to the image base. Hint to the processor indicating which cache line will be loaded into the Instruction cache, for the target of a jump.

Reference to a bit address relative to the image base. When nonzero, this field specifies a one-based line number. Symbol Table Index. Used when Linenumber is 0: index to symbol table entry for a function. Used when Linenumber is greater than 0: relative virtual address of the executable code that corresponds to the source line indicated. Name of the symbol, represented by union of three structures. Value associated with the symbol. Signed integer identifying the section, using a one-based index into the Section Table.

A number representing type. Enumerated value representing storage class. Number of Aux Symbols. An array of eight bytes. Symbol record is not yet assigned a section. The symbol has a value but is not an address. The symbol provides general type or debugging information but does not correspond to a section. Used by Microsoft tools for external symbols. Used by Microsoft tools for symbol records that define the extent of a function: begin function named.

Weak external. Size of the executable code for the function itself. File offset of the first COFF line-number entry for the function, or zero if none exists. Pointer to Next Function. Symbol-table index of the record for the next function. Auxiliary Format The Value field is unused. A symbol record named. The Value field gives the number of lines in the function. The Value field has the same number as the Total Size field in the function-definition symbol record.

Actual ordinal line number 1, 2, 3, etc. Symbol-table index of the next. A value of 1 indicating that no library search for sym1 should be performed, or a value of 2 indicating that the linker should search all libraries. Number Of Linenumbers. Checksum for communal data. The linker generates a warning if more than one section defines the same COMDAT symbol, but links in one of the sections anyway.

The linker chooses an arbitrary section among the duplicate sections having same COMDAT symbol ; however, all must be the same size or the linker generates a warning. All duplicate sections must match exactly. Format of debugging information: this field enables support of multiple debuggers. Pointer to Raw Data. COFF debug information line numbers, symbol table, and string table.

A table with just one row unlike the debug directory. An array of RVAs of exported symbols. Array of the ordinals that correspond to members of the Name Pointer Table. Starting ordinal number for exports in this image. Address of the Export Name Pointer Table, relative to the image base. Address of the exported symbol when loaded into memory, relative to the image base. Relative virtual address of the Import Lookup Table; this table contains a name or ordinal for each import.

Relative virtual address of the Import Address Table: this table is identical in contents to the Import Lookup Table until the image is bound. Index into the Export Name Pointer Table. ASCII string containing name to import.

A trailing zero pad byte appears after the trailing null byte, if necessary, to align the next entry on an even boundary. The Reserved parameter should be set to zero. The Reason parameter can take the following values:.

Current versions of the Microsoft linker and Windows XP and later versions of Windows use a new version of this structure for 32-bit x86-based systems that include reserved SEH technology. This provides a list of safe structured exception handlers that the operating system uses during exception dispatching. If a handler in such an image is not in that list, the operating system terminates the application. This helps prevent the "x86 exception handler hijacking" exploit that has been used in the past to take control of the operating system.

The Microsoft linker automatically provides a default load configuration structure to include the reserved SEH data. If the user code already provides a load configuration structure, it must include the new reserved SEH fields. The data directory entry for a pre-reserved SEH load configuration structure must specify a particular size of the load configuration structure because the operating system loader always expects it to be a certain value.

In that regard, the size is really only a version check. For compatibility with Windows XP and earlier versions of Windows, the size must be 64 for x86 images.

Delayload import table in its own. Module contains suppressed export information. This also implies that the address taken IAT table is also present in the load config. Mask for the subfield that contains the stride of Control Flow Guard function table entries (that is, the additional count of bytes per table entry). Additionally, the Windows SDK winnt.

Resources are indexed by a multiple-level binary-sorted tree structure. By convention, however, Windows uses three levels: Type, Name, and Language. A series of resource directory tables relates all of the levels in the following way: Each directory table is followed by a series of directory entries that give the name or identifier (ID) for that level (Type, Name, or Language level) and an address of either a data description or another directory table.

If the address points to a data description, then the data is a leaf in the tree. If the address points to another directory table, then that table lists directory entries at the next level down. A leaf's Type, Name, and Language IDs are determined by the path that is taken through directory tables to reach the leaf. The first table determines Type ID, the second table pointed to by the directory entry in the first table determines Name ID, and the third table determines Language ID.
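The way a leaf inherits its Type, Name, and Language IDs from the path taken to reach it can be sketched with nested dictionaries standing in for directory tables. This models only the tree shape, not the on-disk layout; the IDs, names, and data below are made up for illustration.

```python
def iter_leaves(table, path=()):
    """Yield (path, data) for every leaf, where path records the IDs walked."""
    for key, child in table.items():
        if isinstance(child, dict):
            # Another directory table: descend one level.
            yield from iter_leaves(child, path + (key,))
        else:
            # A data description: this is a leaf.
            yield path + (key,), child


# Hypothetical tree: level 1 is Type, level 2 is Name, level 3 is Language.
resources = {
    16: {1: {1033: b"version data"}},
    "MYTYPE": {"LOGO": {1033: b"en", 1036: b"fr"}},
}

leaves = list(iter_leaves(resources))
```

Each yielded path has three elements because the sample tree follows the three-level convention; a deeper or shallower tree would simply produce longer or shorter paths.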

Each resource directory table has the following format. This data structure should be considered the heading of a table because the table actually consists of directory entries described in section 6. The directory entries make up the rows of a table. Each resource directory entry has the following format. Whether the entry is a Name or ID entry is indicated by the resource directory table, which indicates how many Name and ID entries follow it remember that all the Name entries precede all the ID entries for the table.

All entries for the table are sorted in ascending order: the Name entries by case-sensitive string and the ID entries by numeric value. The resource directory string area consists of Unicode strings, which are word-aligned. These strings are stored together after the last Resource Directory entry and before the first Resource Data entry. This minimizes the impact of these variable-length strings on the alignment of the fixed-size directory entries.

Each resource directory string has the following format:. Each Resource Data entry describes an actual unit of raw data in the Resource Data area. A Resource Data entry has the following format:. CLR metadata is stored in this section. It is used to indicate that the object file contains managed code. The format of the metadata is not documented, but can be handed to the CLR interfaces for handling metadata. The valid exception handlers of an object are listed in the.

It contains the COFF symbol index of each valid handler, using 4 bytes per index. The COFF archive format provides a standard mechanism for storing collections of object files. These collections are commonly called libraries in programming documentation. The first 8 bytes of an archive consist of the file signature. The rest of the archive consists of a series of archive members, as follows. The first and second members are "linker members." Typically, a linker places information into these archive members.

The linker members contain the directory of the archive. The third member is the "longnames" member. This optional member consists of a series of null-terminated ASCII strings in which each string is the name of another archive member.

The rest of the archive consists of standard object-file members. Each of these members contains the contents of one object file in its entirety. An archive member header precedes each member. The archive file signature identifies the file type. Any utility (for example, a linker) that takes an archive file as input can check the file type by reading this signature.
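A signature check of this kind might look like the sketch below. It assumes the 8-byte signature is the ASCII string `!<arch>` followed by a newline; the function name is made up for this example.

```python
ARCHIVE_SIGNATURE = b"!<arch>\n"  # assumed 8-byte archive signature


def is_coff_archive(data: bytes) -> bool:
    """Return True if the buffer starts with the archive signature."""
    return data[:len(ARCHIVE_SIGNATURE)] == ARCHIVE_SIGNATURE


looks_like_archive = is_coff_archive(b"!<arch>\n" + b"rest of file")
looks_like_image = is_coff_archive(b"MZ\x90\x00")
```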

Each member (linker, longnames, or object-file member) is preceded by a header. An archive member header has the following format, in which each field is an ASCII text string that is left justified and padded with spaces to the end of the field. There is no terminating null character in any of these fields. Each member header starts on the first even address after the end of the previous archive member.
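The fixed-width, space-padded layout lends itself to simple slicing. The sketch below assumes a 60-byte header with fields of 16 (name), 12 (date), 6 (user ID), 6 (group ID), 8 (mode), 10 (size), and 2 (end-of-header marker) bytes; the class and function names are illustrative, and the sample header is synthetic.

```python
from typing import NamedTuple

HEADER_SIZE = 60  # assumed total size of the fields listed above


class MemberHeader(NamedTuple):
    name: str
    date: str
    user_id: str
    group_id: str
    mode: str
    size: int


def parse_member_header(raw: bytes) -> MemberHeader:
    """Slice the space-padded ASCII fields out of one member header."""
    if len(raw) < HEADER_SIZE:
        raise ValueError("header is shorter than 60 bytes")
    if raw[58:60] != b"`\n":
        raise ValueError("missing end-of-header marker")

    def field(start: int, length: int) -> str:
        return raw[start:start + length].decode("ascii").rstrip(" ")

    return MemberHeader(
        name=field(0, 16),
        date=field(16, 12),
        user_id=field(28, 6),
        group_id=field(34, 6),
        mode=field(40, 8),
        size=int(field(48, 10)),
    )


def next_header_offset(data_start: int, size: int) -> int:
    """Headers start on the first even address after the previous member."""
    end = data_start + size
    return end + (end & 1)


# Synthetic header for a member named "/" holding 123 bytes of data.
sample = (
    b"/".ljust(16) + b"0".ljust(12) + b"".ljust(6) + b"".ljust(6)
    + b"0".ljust(8) + b"123".ljust(10) + b"`\n"
)
header = parse_member_header(sample)
```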

The Name field has one of the formats shown in the following table. As mentioned earlier, each of these strings is left justified and padded with trailing spaces within a field of 16 bytes:. The first linker member is included for backward compatibility. It is not used by current linkers, but its format must be correct.

This linker member provides a directory of symbol names, as does the second linker member. For each symbol, the information indicates where to find the archive member that contains the symbol. The elements in the offsets array must be arranged in ascending order.

This fact implies that the symbols in the string table must be arranged according to the order of archive members. For example, all the symbols in the first object-file member would have to be listed before the symbols in the second object file. Although both linker members provide a directory of symbols and archive members that contain them, the second linker member is used in preference to the first by all current linkers.

The second linker member includes symbol names in lexical order, which enables faster searching by name. The longnames member is a series of strings of archive member names. A name appears here only when there is insufficient room in the Name field 16 bytes.

The longnames member is optional. It can be empty with only a header, or it can be completely absent without even a header. The strings are null-terminated. Each string begins immediately after the null byte in the previous string. Traditional import libraries, that is, libraries that describe the exports from one image for use by another, typically follow the layout described in section 7, Archive Library File Format.

The primary difference is that import library members contain pseudo-object files instead of real ones, in which each member includes the section contributions that are required to build the import tables that are described in section 6.

The section contributions for an import can be inferred from a small set of information. The linker can either generate the complete, verbose information into the import library for each member at the time of the library's creation or write only the canonical information to the library and let the application that later uses it generate the necessary data on the fly.

This is sufficient information to accurately reconstruct the entire contents of the member at the time of its use. This structure is followed by two null-terminated strings that describe the imported symbol's name and the DLL from which it came.

These values are used to determine which section contributions must be generated by the tool that uses the library if it must access that data. The null-terminated import symbol name immediately follows its associated import header. The following values are defined for the Name Type field in the import header. They indicate how the name is to be used to generate the correct symbols that represent the import:.

Several attribute certificates are expected to be used to verify the integrity of the images. However, the most common is Authenticode signature. To accomplish this task, Authenticode signatures contain something called a PE image hash. The Authenticode PE image hash, or file hash for short, is similar to a file checksum in that it produces a small value that relates to the integrity of a file. A checksum is produced by a simple algorithm and is used primarily to detect memory failures.

That is, it is used to detect whether a block of memory on disk has gone bad and the values stored there have become corrupted. However, unlike most checksum algorithms, it is very difficult to modify a file so that it has the same file hash as its original unmodified form. That is, a checksum is intended to detect simple memory failures that lead to corruption, but a file hash can be used to detect intentional and even subtle modifications to a file, such as those introduced by viruses, hackers, or Trojan horse programs.

In an Authenticode signature, the file hash is digitally signed by using a private key known only to the signer of the file. A software consumer can verify the integrity of the file by calculating the hash value of the file and comparing it to the value of signed hash contained in the Authenticode digital signature.

If the file hashes do not match, part of the file covered by the PE image hash has been modified. It is not possible or desirable to include all image file data in the calculation of the PE image hash. Sometimes it simply presents undesirable characteristics (for example, debugging information cannot be removed from publicly released files); sometimes it is simply impossible.

For example, it is not possible to include all information within an image file in an Authenticode signature, then insert the Authenticode signature that contains that PE image hash into the PE image, and later be able to generate an identical PE image hash by including all image file data in the calculation again, because the file now contains the Authenticode signature that was not originally there.

This section describes how a PE image hash is calculated and what parts of the PE image can be modified without invalidating the Authenticode signature. The PE image hash for a specific file can be included in a separate catalog file without including an attribute certificate within the hashed file. This is relevant, because it becomes possible to invalidate the PE image hash in an Authenticode-signed catalog file by modifying a PE image that does not actually contain an Authenticode signature.

All data in sections of the PE image that are specified in the section table are hashed in their entirety except for the following exclusion ranges:. The file CheckSum field of the Windows-specific fields of the optional header. This checksum includes the entire file including any attribute certificates in the file. In all likelihood, the checksum will be different than the original value after inserting the Authenticode signature.

Information related to attribute certificates. The areas of the PE image that are related to the Authenticode signature are not included in the calculation of the PE image hash because Authenticode signatures can be added to or removed from an image without affecting the overall integrity of the image.

This is not a problem, because there are user scenarios that depend on re-signing PE images or adding a time stamp. Authenticode excludes the following information from the hash calculation:. The Certificate Table and corresponding certificates that are pointed to by the Certificate Table field listed immediately above. To calculate the PE image hash, Authenticode orders the sections that are specified in the section table by address range, then hashes the resulting sequence of bytes, passing over the exclusion ranges.

Information past the end of the last section. The area past the last section (defined by highest offset) is not hashed. This area commonly contains debug information. Debug information can generally be considered advisory to debuggers; it does not affect the actual integrity of the executable program.

It is quite literally possible to remove debug information from an image after a product has been delivered and not affect the functionality of the program. In fact, this is sometimes done as a disk-saving measure. It is worth noting that debug information contained within the specified sections of the PE Image cannot be removed without invalidating the Authenticode signature.
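The idea of hashing a byte sequence while passing over exclusion ranges can be illustrated with a simplified sketch. This is not the full Authenticode procedure: it does not locate the checksum or certificate fields or order sections, and the offsets, the function name, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def hash_excluding(data: bytes, exclusions, algorithm: str = "sha256") -> str:
    """Hash data while skipping each (offset, length) exclusion range."""
    digest = hashlib.new(algorithm)
    position = 0
    for start, length in sorted(exclusions):
        if start < position:
            raise ValueError("exclusion ranges overlap")
        digest.update(data[position:start])  # bytes before the excluded range
        position = start + length            # jump past the excluded range
    digest.update(data[position:])
    return digest.hexdigest()


# Hypothetical image bytes and exclusion ranges.
image = bytearray(range(64))
excluded = [(8, 4), (40, 8)]

before = hash_excluding(bytes(image), excluded)

image[9] ^= 0xFF   # change a byte inside an excluded range
same = hash_excluding(bytes(image), excluded)

image[20] ^= 0xFF  # change a byte that is covered by the hash
changed = hash_excluding(bytes(image), excluded)
```

Editing an excluded byte leaves the digest untouched, which is why a signature can be added to or removed from an image without invalidating the hash, while any edit to covered bytes changes the digest.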

You can use the makecert and signtool tools provided in the Windows Platform SDK to experiment with creating and verifying Authenticode signatures. For more information, see Reference, below. Creating, Viewing, and Managing Certificates. Kernel-Mode Code Signing Walkthrough. ImageHlp Functions.

Note This document is provided to aid in the development of tools and applications for Windows but is not guaranteed to be a complete specification in all respects. Note Statically declared TLS data objects can be used only in statically loaded image files.

A certificate that is used to associate verifiable statements with an image. A number of different verifiable statements can be associated with a file; one of the most useful ones is a statement by a software manufacturer that indicates what the message digest of the image is expected to be.

A message digest is similar to a checksum except that it is extremely difficult to forge. Therefore, it is very difficult to modify a file to have the same message digest as the original file.

The statement can be verified as being made by the manufacturer by using public or private key cryptography schemes. This document does not describe details about attribute certificates other than to allow for their insertion into image files. In most cases, the format of each stamp is the same as that used by the time functions in the C run-time library. The location of an item within the file itself, before being processed by the linker (in the case of object files) or the loader (in the case of image files).

In other words, this is a position within the file as stored on disk. A file that is given as input to the linker. The linker produces an image file, which in turn is used as input by the loader. The term "object file" does not necessarily imply any connection to object-oriented programming.

A description of a field that indicates that the value of the field must be zero for generators and consumers must ignore the field. In an image file, this is the address of an item after it is loaded into memory, with the base address of the image file subtracted from it. The RVA of an item almost always differs from its position within the file on disk (file pointer). In an object file, an RVA is less meaningful because memory locations are not assigned. In this case, an RVA would be an address within a section (described later in this table), to which a relocation is later applied during linking.

For simplicity, a compiler should just set the first RVA in each section to zero. For example, all code in an object file can be combined within a single section or depending on compiler behavior each function can occupy its own section. With more sections, there is more file overhead, but the linker is able to link in code more selectively. A section is similar to a segment in Intel architecture. All the raw data in a section must be loaded contiguously.

In addition, an image file can contain a number of sections, such as. Same as RVA, except that the base address of the image file is not subtracted. The address is called a VA because Windows creates a distinct VA space for each process, independent of physical memory. For almost all purposes, a VA should be considered just an address. A VA is not as predictable as an RVA because the loader might not load the image at its preferred location.

The number that identifies the type of target machine. For more information, see Machine Types. The number of sections. This indicates the size of the section table, which immediately follows the headers. This value should be zero for an image because COFF debugging information is deprecated. The number of entries in the symbol table. This data can be used to locate the string table, which immediately follows the symbol table.

The size of the optional header, which is required for executable files but not for object files. This value should be zero for an object file. For a description of the header format, see Optional Header Image Only. The flags that indicate the attributes of the file.

For specific flag values, see Characteristics. This indicates that the file does not contain base relocations and must therefore be loaded at its preferred base address. If the base address is not available, the loader reports an error. The default behavior of the linker is to strip base relocations from executable EXE files.

Image only. This indicates that the image file is valid and can be run. If this flag is not set, it indicates a linker error. COFF symbol table entries for local symbols have been removed. This flag is deprecated and should be zero. Aggressively trim working set. This flag is deprecated for Windows and later and must be zero.

The image file is a dynamic-link library DLL. Such files are considered executable files for almost all purposes, although they cannot be directly run.
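The header fields and flags described above can be read with a short sketch. It assumes the COFF file header is 20 bytes of little-endian fields in the order machine, number of sections, time stamp, symbol table pointer, number of symbols, optional header size, characteristics. The flag table covers only three of the flags discussed, and the sample header values are made up.

```python
import struct

COFF_HEADER = struct.Struct("<HHIIIHH")  # 20 bytes

# A subset of Characteristics bits, enough for the flags discussed here.
FLAGS = {
    0x0001: "RELOCS_STRIPPED",
    0x0002: "EXECUTABLE_IMAGE",
    0x2000: "DLL",
}


def parse_coff_header(raw: bytes) -> dict:
    """Unpack the fixed-size COFF file header into named fields."""
    (machine, n_sections, timestamp, symtab_ptr,
     n_symbols, opt_size, characteristics) = COFF_HEADER.unpack_from(raw)
    return {
        "machine": machine,
        "number_of_sections": n_sections,
        "time_date_stamp": timestamp,
        "pointer_to_symbol_table": symtab_ptr,
        "number_of_symbols": n_symbols,
        "size_of_optional_header": opt_size,
        "characteristics": characteristics,
    }


def describe_flags(characteristics: int) -> list:
    """List the names of the known flags that are set."""
    return [name for bit, name in FLAGS.items() if characteristics & bit]


# Synthetic header: three sections, flagged as an executable image and a DLL.
raw = COFF_HEADER.pack(0x14C, 3, 0, 0, 0, 0xE0, 0x2002)
header = parse_coff_header(raw)
flags = describe_flags(header["characteristics"])
```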

The unsigned integer that identifies the state of the image file. The most common number is 0x10B, which identifies it as a normal executable file. The size of the code text section, or the sum of all code sections if there are multiple sections. The size of the initialized data section, or the sum of all such sections if there are multiple data sections.

The address of the entry point relative to the image base when the executable file is loaded into memory. For program images, this is the starting address. For device drivers, this is the address of the initialization function. An entry point is optional for DLLs. When no entry point is present, this field must be zero.

The address that is relative to the image base of the beginning-of-code section when it is loaded into memory. The address that is relative to the image base of the beginning-of-data section when it is loaded into memory.

The preferred address of the first byte of image when loaded into memory; must be a multiple of 64 K. The default for DLLs is 0x The alignment in bytes of sections when they are loaded into memory.

It must be greater than or equal to FileAlignment. The default is the page size for the architecture. The alignment factor in bytes that is used to align the raw data of sections in the image file. The value should be a power of 2 between and 64 K, inclusive. The default is The size in bytes of the image, including all headers, as the image is loaded in memory. It must be a multiple of SectionAlignment. The image file checksum. The following are checked for validation at load time: all drivers, any DLL loaded at boot time, and any DLL that is loaded into a critical Windows process.
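The constraints that tie these fields together can be collected into a small consistency check. This is a sketch with a made-up function name: it tests only the rules spelled out in the text (ImageBase a multiple of 64 K, FileAlignment a power of 2 no larger than 64 K, SectionAlignment at least FileAlignment, SizeOfImage a multiple of SectionAlignment), and the sample values are hypothetical.

```python
def check_layout(image_base: int, section_alignment: int,
                 file_alignment: int, size_of_image: int) -> list:
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    if image_base % 0x10000:
        problems.append("ImageBase is not a multiple of 64 K")
    if file_alignment <= 0 or file_alignment & (file_alignment - 1):
        problems.append("FileAlignment is not a power of 2")
    if file_alignment > 0x10000:
        problems.append("FileAlignment exceeds 64 K")
    if section_alignment < file_alignment:
        problems.append("SectionAlignment is smaller than FileAlignment")
    if section_alignment > 0 and size_of_image % section_alignment:
        problems.append("SizeOfImage is not a multiple of SectionAlignment")
    return problems


# Hypothetical field values.
consistent = check_layout(0x10000000, 0x1000, 0x200, 0x5000)
inconsistent = check_layout(0x10001000, 0x200, 0x1000, 0x5100)
```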

The subsystem that is required to run this image. For more information, see Windows Subsystem. For more information, see DLL Characteristics later in this specification. The size of the stack to reserve. Only SizeOfStackCommit is committed; the rest is made available one page at a time until the reserve size is reached. The size of the local heap space to reserve.


