Delphi Basics Articles


How to Enable Vertical Tabs in Google Chrome

posted 27 Sep 2011, 02:56 by Delphi Basics

Enabling vertical tabs is easy in new versions of Google Chrome.

 To get started, open Chrome and type about:flags in the URL bar and press enter. This will take you to Chrome’s hidden experimental settings page.

Click Enable under Side Tabs.

Click the Restart Now button at the bottom of the page.

To activate side tabs, right click on a tab in Chrome and select Use Side Tabs.

Aphex - Dissection of a Backdoor

posted 28 Feb 2011, 14:32 by Delphi Basics   [ updated 2 Apr 2012, 06:10 ]

Back in the day when backdoors still had room to grow, Aphex was signed on with a major publisher to write a book on the subject. He missed a few deadlines and they ended up axing the project. In 2004, he sent me a few chapters to check out. Here is chapter two for the internet museum (hope he doesn't mind).
-stm 

Remember:
akcom, drocon, archphase, MrJinxy, Olympus, PrinceAli, d-one, nelix, k0nsl, aza, b0b, slash, illy and everyone else that experienced the #trinity days. 

Aphex - Dissection of a Backdoor.doc



Installing Oracle Database 10g Express Edition on Windows Vista

posted 26 Oct 2010, 15:42 by Delphi Basics

I had a lot of trouble installing Oracle Database on my Windows Vista machine. 
Enjoy a step by step tutorial from installation to working with demonstration data. 

  1. Download:
    Oracle Database 10g Express Edition.
    "Single-byte LATIN1 database for Western European language storage, with the Database Homepage user interface in English only".
    http://www.oracle.com/technetwork/database/express-edition/downloads/102xewinsoft-090667.html
  2. Installation:
    Unzip both archives.
    Right click on Setup.exe and click Properties. Select the Compatibility tab and check the box to run the program in compatibility mode for Windows XP (Service Pack 2).
    Open System by clicking the Start button, clicking Control Panel, clicking System and Maintenance, and then clicking System.
    In the left pane, click Advanced system settings.
    On the Advanced tab, under Performance, click Settings.
    Click the Advanced tab, and then, under Virtual memory, click Change.
    Clear the Automatically manage paging file size for all drives check box.
    Under Drive [Volume Label], click the drive that contains the paging file you want to change.
    Click Custom size, type 4603 in both the Initial size (MB) and Maximum size (MB) boxes, click Set, and then click OK.
    Run Setup.exe in the ds_windows_x86_101202_disk1 folder.
    When requested, navigate to the stage folder in the ds_windows_x86_101202_disk2 folder.
    Follow on-screen instructions to complete installation.
    NOTE: Be sure to remember the SYSTEM account password.
  3. Access demonstration data:
    Navigate to Oracle Database 10g Express Edition in your start menu.
    Click Run SQL Command Line.
    Enter the following commands (with a carriage return at the end of each line):
    conn system/[SYSTEMPASSWORDSEENOTE]@xe
    alter user hr account unlock;
    conn hr/hr@xe
    Follow instructions to change hr account password.
  4. Working with demonstration data in the command line:
    Enter the following command to list the tables available to the hr account:
    select table_name from user_tables;
    Other SQL commands can be executed here.
  5. Working with demonstration data in GUI:
    Navigate to Oracle Database 10g Express Edition in your start menu.
    Click Go To Database Home Page.
    Log in using the username hr and the password you selected for the hr account.
    You can now interface with the data graphically.
tutorial complete :D

How to Import / Export Google Sites

posted 23 Mar 2010, 16:32 by Delphi Basics   [ updated 23 Mar 2010, 16:57 ]

Google has released an API for Google Sites [http://code.google.com/apis/sites/] that lets you create or edit pages, upload or download attachments, and monitor the activity of a site programmatically. The API could be used to create a new interface for Google Sites, to upload files from other sources or to migrate your data.

Google's Data Liberation team built a Java application for importing and exporting Google Sites. The application lets you export the pages from a site and all their attachments to a folder.

"The folder structure of an exported site is meant to mimic the Sites UI as closely as possible. Thus if exporting to a directory "rootdirectory," a top-level page normally located at webspace/pagename, would be in a file named index.html, located in rootdirectory/pagename. A subpage of that page, normally located at webspace/pagename/subpage, would be in a file named index.html in rootdirectory/pagename/subpage. Attachments are downloaded to the same directory as the index.html page to which they belong," mentions the user guide. [http://code.google.com/p/google-sites-liberation/wiki/UsersGuide].

You should only enter the domain name if you use Google Apps. "Webspace" is the name of your site: http://sites.google.com/site/sitename/.
In my personal experience, I left the Domain field blank.
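
The folder layout quoted from the user guide above can be sketched in a few lines of Python. This is only an illustration of the mapping rule, not code from the export tool itself; the function name `exported_file` is made up for this example.

```python
from pathlib import PurePosixPath


def exported_file(root: str, page_path: str) -> PurePosixPath:
    """Return where the page at webspace/<page_path> lands in the export.

    Hypothetical helper: each page becomes an index.html inside a
    directory that mirrors the page's path in the site.
    """
    parts = page_path.strip("/").split("/")
    return PurePosixPath(root, *parts, "index.html")


print(exported_file("rootdirectory", "pagename"))
print(exported_file("rootdirectory", "pagename/subpage"))
```

Attachments would sit next to the `index.html` of the page they belong to, so the same directory path applies to them.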

Unfortunately, you can't use this tool to import HTML files to an existing site. The importing option is only useful for the sites exported using the same application.

Here is the original link to the application:
http://code.google.com/p/google-sites-liberation/

If the above link is not working, you may use this personal direct download link:
http://h1.ripway.com/delphibasics/DelphiBasicsUploads/ImportExportGoogleSites.7z

The Python Paradox

posted 21 Mar 2010, 19:40 by Delphi Basics

In a recent talk I said something that upset a lot of people: that you could get smarter programmers to work on a Python project than you could to work on a Java project.

I didn't mean by this that Java programmers are dumb. I meant that Python programmers are smart. It's a lot of work to learn a new programming language. And people don't learn Python because it will get them a job; they learn it because they genuinely like to program and aren't satisfied with the languages they already know.

Which makes them exactly the kind of programmers companies should want to hire. Hence what, for lack of a better name, I'll call the Python paradox: if a company chooses to write its software in a comparatively esoteric language, they'll be able to hire better programmers, because they'll attract only those who cared enough to learn it. And for programmers the paradox is even more pronounced: the language to learn, if you want to get a good job, is a language that people don't learn merely to get a job.

Only a few companies have been smart enough to realize this so far. But there is a kind of selection going on here too: they're exactly the companies programmers would most like to work for. Google, for example. When they advertise Java programming jobs, they also want Python experience.

A friend of mine who knows nearly all the widely used languages uses Python for most of his projects. He says the main reason is that he likes the way source code looks. That may seem a frivolous reason to choose one language over another. But it is not so frivolous as it sounds: when you program, you spend more time reading code than writing it. You push blobs of source code around the way a sculptor does blobs of clay. So a language that makes source code ugly is maddening to an exacting programmer, as clay full of lumps would be to a sculptor.

At the mention of ugly source code, people will of course think of Perl. But the superficial ugliness of Perl is not the sort I mean. Real ugliness is not harsh-looking syntax, but having to build programs out of the wrong concepts. Perl may look like a cartoon character swearing, but there are cases where it surpasses Python conceptually.

So far, anyway. Both languages are of course moving targets. But they share, along with Ruby (and Icon, and Joy, and J, and Lisp, and Smalltalk) the fact that they're created by, and used by, people who really care about programming. And those tend to be the ones who do it well.

Read more: http://en.wikipedia.org/wiki/Java_%28programming_language%29
                http://en.wikipedia.org/wiki/Python_%28programming_language%29

Portable Executable [ PE ] File Format - Overview

posted 16 Mar 2010, 09:34 by Delphi Basics   [ updated 16 Mar 2010, 09:43 ]

PE stands for Portable Executable. It's the native file format of Win32. Its specification is derived somewhat from the Unix COFF (Common Object File Format). The meaning of "portable executable" is that the file format is universal across Win32 platforms: the PE loader of every Win32 platform recognizes and uses this file format even when Windows is running on CPU platforms other than Intel. It doesn't mean your PE executables can be ported to other CPU platforms without change. Every Win32 executable (except VxDs and 16-bit DLLs) uses the PE file format. Even NT's kernel mode drivers use the PE file format. Thus studying the PE file format gives you valuable insights into the structure of Windows.

Let's jump into the general outline of PE file format without further ado.

DOS MZ header
DOS stub
PE header
Section table
Section 1
Section 2
Section ...
Section n


The outline above is the general layout of a PE file. All PE files (even 32-bit DLLs) must start with a simple DOS MZ header. We usually aren't much interested in this structure. It is there in case the program is run from DOS, so that DOS can recognize it as a valid executable and run the DOS stub, which is stored right after the MZ header. The DOS stub is actually a valid EXE that is executed when the operating system doesn't know about the PE file format. It can simply display a string like "This program requires Windows" or it can be a full-blown DOS program, depending on the intent of the programmer. We are also not very interested in the DOS stub: it's usually provided by the assembler/compiler. In most cases, it simply uses int 21h, service 9 to print a string saying "This program cannot run in DOS mode".

After the DOS stub comes the PE header. The PE header is a general term for the PE-related structure named IMAGE_NT_HEADERS. This structure contains many essential fields that are used by the PE loader. You will become quite familiar with it as you learn more about the PE file format. When the program is executed in an operating system that knows about the PE file format, the PE loader can find the starting offset of the PE header from the DOS MZ header. Thus it can skip the DOS stub and go directly to the PE header, which is the real file header.
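
As a rough sketch of that lookup: the DOS MZ header stores the file offset of the PE header in its e_lfanew field, a 4-byte little-endian value at offset 0x3C. The function name and the synthetic sample bytes below are made up for illustration; a real file would be read from disk instead.

```python
import struct


def pe_header_offset(image: bytes) -> int:
    """Return the file offset of the PE header, or raise ValueError."""
    if len(image) < 0x40 or image[:2] != b"MZ":
        raise ValueError("missing DOS MZ header")
    # e_lfanew: 4-byte little-endian offset stored at 0x3C in the DOS header.
    (offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[offset:offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    return offset


# Synthetic image: 64-byte DOS header, 64 zero bytes standing in for the
# DOS stub, then the PE signature at offset 0x80.
dos_header = b"MZ" + bytes(0x3A) + struct.pack("<I", 0x80)
sample = dos_header + bytes(0x40) + b"PE\0\0"
print(hex(pe_header_offset(sample)))  # 0x80
```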

The real content of the PE file is divided into blocks called sections. A section is nothing more than a block of data with common attributes such as code/data, read/write etc. You can think of a PE file as a logical disk. The PE header is the boot sector and the sections are files on the disk. The files can have different attributes such as read-only, system, hidden, archive and so on. I want to make it clear from this point onwards that data is grouped into a section on the basis of common attributes, not on a logical basis. It doesn't matter how the code/data are used: if the data/code in the PE file have the same attributes, they can be lumped together in a section. You should not think of a section as "data", "code" or some other logical concept: sections can contain both code and data provided that they have the same attributes. If you have a block of data that you want to be read-only, you can put that data in a section that is marked as read-only. When the PE loader maps the sections into memory, it examines the attributes of the sections and gives the memory block occupied by each section the indicated attributes.
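
To make the idea of section attributes concrete, here is a small sketch that decodes a section's attribute bits. The flag values are the IMAGE_SCN_* constants for the Characteristics field of a section header as defined in winnt.h; the short labels and the function name are chosen for this example only.

```python
# A few IMAGE_SCN_* flag values (from winnt.h) with informal labels.
SECTION_FLAGS = {
    0x00000020: "code",
    0x00000040: "initialized data",
    0x00000080: "uninitialized data",
    0x20000000: "execute",
    0x40000000: "read",
    0x80000000: "write",
}


def describe_section(characteristics: int) -> list:
    """List the labels of every known flag set in a Characteristics value."""
    return [label for bit, label in SECTION_FLAGS.items() if characteristics & bit]


print(describe_section(0x60000020))  # value commonly seen on a .text section
print(describe_section(0xC0000040))  # value commonly seen on a .data section
```

Note that nothing in the flags says what the bytes are used for; they only describe how the memory should be treated, which is the point made above.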

If we view the PE file format as a logical disk, the PE header as the boot sector and the sections as files, we still don't have enough information to find out where the files reside on the disk, i.e. we haven't discussed the directory equivalent of the PE file format. Immediately following the PE header is the section table, which is an array of structures. Each structure contains information about one section in the PE file, such as its attributes, its file offset and its virtual offset. If there are 5 sections in the PE file, there will be exactly 5 members in this structure array. We can then view the section table as the root directory of the logical disk. Each member of the array is equivalent to a directory entry in the root directory.
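
A minimal sketch of reading the section table, assuming the standard layout: a 20-byte file header follows the 4-byte PE signature, the optional header follows that, and each section table entry (IMAGE_SECTION_HEADER) is 40 bytes. The function name, dictionary keys and sample values are made up for illustration, and the sample deliberately has no optional header to keep it short.

```python
import struct

# Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics  (40 bytes in total)
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")


def read_section_table(image: bytes, pe_offset: int) -> list:
    """Return one dict per section table entry."""
    (count,) = struct.unpack_from("<H", image, pe_offset + 6)      # NumberOfSections
    (opt_size,) = struct.unpack_from("<H", image, pe_offset + 20)  # SizeOfOptionalHeader
    table = pe_offset + 24 + opt_size  # table starts right after the optional header
    sections = []
    for i in range(count):
        f = SECTION_HEADER.unpack_from(image, table + i * SECTION_HEADER.size)
        sections.append({
            "name": f[0].rstrip(b"\0").decode("ascii"),
            "virtual_size": f[1],
            "virtual_address": f[2],
            "raw_size": f[3],
            "raw_offset": f[4],
            "characteristics": f[9],
        })
    return sections


# Synthetic headers: two sections, zero-length optional header.
file_header = struct.pack("<HHIIIHH", 0x14C, 2, 0, 0, 0, 0, 0)
text = SECTION_HEADER.pack(b".text", 0x1000, 0x1000, 0x200, 0x400, 0, 0, 0, 0, 0x60000020)
data = SECTION_HEADER.pack(b".data", 0x800, 0x2000, 0x200, 0x600, 0, 0, 0, 0, 0xC0000040)
sample_headers = b"PE\0\0" + file_header + text + data

for s in read_section_table(sample_headers, 0):
    print(s["name"], hex(s["virtual_address"]), hex(s["raw_offset"]))
```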

That's all about the physical layout of the PE file format. I'll summarize the major steps in loading a PE file into memory below:

  1. When the PE file is run, the PE loader examines the DOS MZ header for the offset of the PE header. If found, it skips to the PE header.
  2. The PE loader checks if the PE header is valid. If so, it goes to the end of the PE header.
  3. Immediately following the PE header is the section table. The PE loader reads information about the sections and maps those sections into memory using file mapping. It also gives each section the attributes as specified in the section table.
  4. After the PE file is mapped into memory, the PE loader concerns itself with the logical parts of the PE file, such as the import table.

The above steps are an oversimplification and are based on my own observation. There may be some inaccuracies, but they should give you a clear picture of the process.
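
Step 3 can be imitated in a toy form. The sketch below copies each section's bytes from its file offset to its virtual offset in a flat buffer. It is not how the real loader works (the loader uses file mapping and sets page protections); the function name, dictionary keys and sample data are all made up for this illustration.

```python
def map_sections(image: bytes, sections: list, image_size: int) -> bytearray:
    """Copy each section's raw bytes to its virtual offset in a zeroed buffer."""
    memory = bytearray(image_size)  # zero-filled, so uninitialized data reads as 0
    for s in sections:
        # Copy no more than the section occupies in memory; a virtual size
        # of 0 is treated here as "use the raw size".
        length = min(s["raw_size"], s["virtual_size"] or s["raw_size"])
        raw = image[s["raw_offset"]:s["raw_offset"] + length]
        memory[s["virtual_address"]:s["virtual_address"] + len(raw)] = raw
    return memory


# Toy file: "CODE" stored at file offset 0x10, "DATA" at file offset 0x20.
file_bytes = bytes(0x10) + b"CODE" + bytes(0x0C) + b"DATA"
layout = [
    {"raw_offset": 0x10, "raw_size": 4, "virtual_address": 0x100, "virtual_size": 4},
    {"raw_offset": 0x20, "raw_size": 4, "virtual_address": 0x200, "virtual_size": 0x40},
]
mem = map_sections(file_bytes, layout, 0x300)
print(bytes(mem[0x100:0x104]), bytes(mem[0x200:0x204]))
```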

UPX, the Ultimate Packer for eXecutables

posted 15 Mar 2010, 17:24 by Delphi Basics   [ updated 15 Mar 2010, 17:27 ]

UPX, the Ultimate Packer for eXecutables, is a free and open source executable packer supporting a number of file formats from different operating systems.

Read more about UPX from Official Site: http://upx.sourceforge.net/#download
                                            Wikipedia:    http://en.wikipedia.org/wiki/UPX

Read about executable compression: http://en.wikipedia.org/wiki/Executable_compression

At DelphiBasics, we pack all our executables with UPX to minimize the storage required for our resources.

To pack an executable with UPX, just drag it onto the UPX program in Windows Explorer. :)

An In-Depth Look into the Win32 Portable Executable File Format - Part 2

posted 15 Mar 2010, 17:04 by Delphi Basics   [ updated 20 Nov 2010, 18:34 ]

See Part 1 here: An In-Depth Look into the Win32 Portable Executable File Format - Part 1

Source: http://msdn.microsoft.com/en-us/magazine/cc301808.aspx

SUMMARY The Win32 Portable Executable File Format (PE) was designed to be a standard executable format for use on all versions of the operating systems on all supported processors. Since its introduction, the PE format has undergone incremental changes, and the introduction of 64-bit Windows has required a few more. Part 1 of this series presented an overview and covered RVAs, the data directory, and the headers. This month in Part 2 the various sections of the executable are explored. The discussion includes the exports section, export forwarding, binding, and delayloading. The debug directory, thread local storage, and the resources sections are also covered.

Last month in Part 1 of this article, I began a comprehensive tour of Portable Executable (PE) files. I described the history of PE files and the data structures that make up the headers, including the section table. The PE headers and section table tell you what kind of code and data exists in the executable and where you should look to find it.
      This month I'll describe the more commonly encountered sections. If you're not familiar with basic PE file concepts, you should read Part 1 of this article first.
      Last month I described how a section is a chunk of code or data that logically belongs together. For example, all the data that comprises an executable's import tables are in a section. Let's look at some of the sections you'll encounter in executables and OBJs. Unless otherwise stated, the section names in Figure 1 come from Microsoft tools.

Figure 1 Section Names
Name | Description
.text | The default code section.
.data | The default read/write data section. Global variables typically go here.
.rdata | The default read-only data section. String literals and C++/COM vtables are examples of items put into .rdata.
.idata | The imports table. It has become common practice (either explicitly, or via linker default behavior) to merge the .idata section into another section, typically .rdata. By default, the linker only merges the .idata section into another section when creating a release mode executable.
.edata | The exports table. When creating an executable that exports APIs or data, the linker creates an .EXP file. The .EXP file contains an .edata section that's added into the final executable. Like the .idata section, the .edata section is often found merged into the .text or .rdata sections.
.rsrc | The resources. This section is read-only. However, it should not be named anything other than .rsrc, and should not be merged into other sections.
.bss | Uninitialized data. Rarely found in executables created with recent linkers. Instead, the VirtualSize of the executable's .data section is expanded to make enough room for uninitialized data.
.crt | Data added for supporting the C++ runtime (CRT). A good example is the function pointers that are used to call the constructors and destructors of static C++ objects.
.tls | Data for supporting thread local storage variables declared with __declspec(thread). This includes the initial value of the data, as well as additional variables needed by the runtime.
.reloc | The base relocations in an executable. Base relocations are generally only needed for DLLs and not EXEs. In release mode, the linker doesn't emit base relocations for EXE files. Relocations can be removed when linking with the /FIXED switch.
.sdata | "Short" read/write data that can be addressed relative to the global pointer. Used for the IA-64 and other architectures that use a global pointer register. Regular-sized global variables on the IA-64 will go in this section.
.srdata | "Short" read-only data that can be addressed relative to the global pointer. Used on the IA-64 and other architectures that use a global pointer register.
.pdata | The exception table. Contains an array of IMAGE_RUNTIME_FUNCTION_ENTRY structures, which are CPU-specific. Pointed to by the IMAGE_DIRECTORY_ENTRY_EXCEPTION slot in the DataDirectory. Used for architectures with table-based exception handling, such as the IA-64. The only architecture that doesn't use table-based exception handling is the x86.
.debug$S | Codeview format symbols in the OBJ file. This is a stream of variable-length CodeView format symbol records.
.debug$T | Codeview format type records in the OBJ file. This is a stream of variable-length CodeView format type records.
.debug$P | Found in the OBJ file when using precompiled headers.
.drectve | Contains linker directives and is only found in OBJs. Directives are ASCII strings that could be passed on the linker command line, for instance -defaultlib:LIBC. Directives are separated by a space character.
.didat | Delayload import data. Found in executables built in nonrelease mode. In release mode, the delayload data is merged into another section.

The Exports Section

      When an EXE exports code or data, it's making functions or variables usable by other EXEs. To keep things simple, I'll refer to exported functions and exported variables by the term "symbols." At a minimum, to export something, the address of an exported symbol needs to be obtainable in a defined manner. Each exported symbol has an ordinal number associated with it that can be used to look it up. Also, there is almost always an ASCII name associated with the symbol. Traditionally, the exported symbol name is the same as the name of the function or variable in the originating source file, although they can also be made to differ.
      Typically, when an executable imports a symbol, it uses the symbol name rather than its ordinal. However, when importing by name, the system just uses the name to look up the export ordinal of the desired symbol, and retrieves the address using the ordinal value. It would be slightly faster if an ordinal had been used in the first place. Exporting and importing by name is solely a convenience for programmers.
      The use of the ORDINAL keyword in the Exports section of a .DEF file tells the linker to create an import library that forces an API to be imported by ordinal, not by name.
      I'll begin with the IMAGE_EXPORT_DIRECTORY structure, which is shown in Figure 2.

Figure 2 IMAGE_EXPORT_DIRECTORY Structure Members
Size | Member | Description
DWORD | Characteristics | Flags for the exports. Currently, none are defined.
DWORD | TimeDateStamp | The time/date that the exports were created. This field has the same definition as the IMAGE_NT_HEADERS.FileHeader.TimeDateStamp (number of seconds since 1/1/1970 GMT).
WORD | MajorVersion | The major version number of the exports. Not used, and set to 0.
WORD | MinorVersion | The minor version number of the exports. Not used, and set to 0.
DWORD | Name | A relative virtual address (RVA) to an ASCII string with the DLL name associated with these exports (for example, KERNEL32.DLL).
DWORD | Base | This field contains the starting ordinal value to be used for this executable's exports. Normally, this value is 1, but it's not required to be so. When looking up an export by ordinal, the value of this field is subtracted from the ordinal, with the result used as a zero-based index into the Export Address Table (EAT).
DWORD | NumberOfFunctions | The number of entries in the EAT. Note that some entries may be 0, indicating that no code/data is exported with that ordinal value.
DWORD | NumberOfNames | The number of entries in the Export Names Table (ENT). This value will always be less than or equal to the NumberOfFunctions field. It will be less when there are symbols exported by ordinal only. It can also be less if there are numeric gaps in the assigned ordinals. This field is also the size of the export ordinal table (below).
DWORD | AddressOfFunctions | The RVA of the EAT. The EAT is an array of RVAs. Each nonzero RVA in the array corresponds to an exported symbol.
DWORD | AddressOfNames | The RVA of the ENT. The ENT is an array of RVAs to ASCII strings. Each ASCII string corresponds to a symbol exported by name. This table is sorted so that the ASCII strings are in order. This allows the loader to do a binary search when looking for an exported symbol. The sorting of the names is binary (like the C++ RTL strcmp function provides), rather than a locale-specific alphabetic ordering.
DWORD | AddressOfNameOrdinals | The RVA of the export ordinal table. This table is an array of WORDs. This table maps an array index from the ENT into the corresponding export address table entry.

The exports directory points to three arrays and a table of ASCII strings. The only required array is the Export Address Table (EAT), which is an array of function pointers that contain the address of an exported function. An export ordinal is simply an index into this array (see Figure 3).



Figure 3 The IMAGE_EXPORT_DIRECTORY Structure

      Let's go through an example to show exports at work. Figure 4 shows some of the exports from KERNEL32.DLL.
Figure 4 KERNEL32 Exports
exports table:
Name: KERNEL32.dll
Characteristics: 00000000
TimeDateStamp: 3B7DDFD8 -> Fri Aug 17 23:24:08 2001
Version: 0.00
Ordinal base: 00000001
# of functions: 000003A0
# of Names: 000003A0

Entry Pt Ordn Name
00012ADA 1 ActivateActCtx
000082C2 2 AddAtomA
•••remainder of exports omitted
Let's say you've called GetProcAddress on the AddAtomA API in KERNEL32. The system begins by locating KERNEL32's IMAGE_EXPORT_DIRECTORY. From that, it obtains the start address of the Export Names Table (ENT). Knowing that there are 0x3A0 entries in the array, it does a binary search of the names until it finds the string "AddAtomA".
      Let's say that the loader finds AddAtomA to be the second array entry. The loader then reads the corresponding second value from the export ordinal table. This value is the export ordinal of AddAtomA. Using the export ordinal as an index into the EAT (and taking into account the Base field value), it turns out that AddAtomA is at a relative virtual address (RVA) of 0x82C2. Adding 0x82C2 to the load address of KERNEL32 yields the actual address of AddAtomA.
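
The lookup just described can be sketched with plain Python lists standing in for the three export arrays. This is a simplified model, not the loader's code: the function names are invented, and the ordinal table entries are assumed to already be zero-based EAT indices, as the AddressOfNameOrdinals description in Figure 2 suggests. The names and RVAs come from Figure 4.

```python
from bisect import bisect_left


def lookup_by_name(name, names, name_ordinals, functions):
    """Binary-search the ENT, then go through the ordinal table into the EAT."""
    i = bisect_left(names, name)  # bytes compare byte by byte, like strcmp
    if i == len(names) or names[i] != name:
        return None
    return functions[name_ordinals[i]]


def lookup_by_ordinal(ordinal, base, functions):
    """Subtract Base from the ordinal to get a zero-based EAT index."""
    index = ordinal - base
    if not 0 <= index < len(functions) or functions[index] == 0:
        return None  # out of range, or a gap in the assigned ordinals
    return functions[index]


names = [b"ActivateActCtx", b"AddAtomA"]  # ENT (sorted)
name_ordinals = [0, 1]                    # export ordinal table
functions = [0x12ADA, 0x82C2]             # EAT

print(hex(lookup_by_name(b"AddAtomA", names, name_ordinals, functions)))  # 0x82c2
print(hex(lookup_by_ordinal(2, 1, functions)))                            # 0x82c2
```

Adding the returned RVA to the module's load address would give the final address, as in the AddAtomA walkthrough.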

Export Forwarding

      A particularly slick feature of exports is the ability to "forward" an export to another DLL. For example, in Windows NT®, Windows® 2000, and Windows XP, the KERNEL32 HeapAlloc function is forwarded to the RtlAllocateHeap function exported by NTDLL. Forwarding is performed at link time by a special syntax in the EXPORTS section of the .DEF file. Using HeapAlloc as an example, KERNEL32's DEF file would contain:
   EXPORTS
   •••
   HeapAlloc = NTDLL.RtlAllocateHeap
      How can you tell if a function is forwarded rather than exported normally? It's somewhat tricky. Normally, the EAT contains the RVA of the exported symbol. However, if the function's RVA is inside the exports section (as given by the VirtualAddress and Size fields in the DataDirectory), the symbol is forwarded.
      When a symbol is forwarded, its RVA obviously can't be a code or data address in the current module. Instead, the RVA points to an ASCII string of the DLL and symbol name to which it is forwarded. In the prior example, it would be NTDLL.RtlAllocateHeap.
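
The forwarding test is a simple range check, sketched below. The function names and the directory RVA and size used in the example are hypothetical; splitting the forwarder string at the first dot is also an assumption made for this sketch.

```python
def is_forwarded(export_rva: int, dir_rva: int, dir_size: int) -> bool:
    """True if an EAT entry points inside the export directory's own range."""
    return dir_rva <= export_rva < dir_rva + dir_size


def split_forwarder(forwarder: str):
    """Split a forwarder string such as 'NTDLL.RtlAllocateHeap' into (dll, symbol)."""
    dll, _, symbol = forwarder.partition(".")
    return dll, symbol


# Hypothetical export directory occupying RVAs 0x5000 to 0x57FF.
print(is_forwarded(0x5123, 0x5000, 0x800))  # inside the range: forwarded
print(is_forwarded(0x82C2, 0x5000, 0x800))  # outside the range: normal export
print(split_forwarder("NTDLL.RtlAllocateHeap"))
```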

The Imports Section

      The opposite of exporting a function or variable is importing it. In keeping with the prior section, I'll use the term "symbol" to collectively refer to imported functions and imported variables.
      The anchor of the imports data is the IMAGE_IMPORT_DESCRIPTOR structure. The DataDirectory entry for imports points to an array of these structures. There's one IMAGE_IMPORT_DESCRIPTOR for each imported executable. The end of the IMAGE_IMPORT_DESCRIPTOR array is indicated by an entry with fields all set to 0. Figure 5 shows the contents of an IMAGE_IMPORT_DESCRIPTOR.

Figure 5 IMAGE_IMPORT_DESCRIPTOR Structure
Size | Member | Description
DWORD | OriginalFirstThunk | This field is badly named. It contains the RVA of the Import Name Table (INT). This is an array of IMAGE_THUNK_DATA structures. This field is set to 0 to indicate the end of the array of IMAGE_IMPORT_DESCRIPTORs.
DWORD | TimeDateStamp | This is 0 if this executable is not bound against the imported DLL. When binding in the old style (see the section on Binding), this field contains the time/date stamp (number of seconds since 1/1/1970 GMT) when the binding occurred. When binding in the new style, this field is set to -1.
DWORD | ForwarderChain | This is the index of the first forwarded API. Set to -1 if no forwarders. Only used for old-style binding, which could not handle forwarded APIs efficiently.
DWORD | Name | The RVA of the ASCII string with the name of the imported DLL.
DWORD | FirstThunk | Contains the RVA of the Import Address Table (IAT). This is an array of IMAGE_THUNK_DATA structures.
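
Walking the descriptor array can be sketched as follows: read 20-byte records (five DWORDs, in the order shown in Figure 5) until one with all fields set to 0 turns up. The function name and the RVA values in the sample are invented for this example.

```python
import struct

IMPORT_DESCRIPTOR = struct.Struct("<IIIII")  # five DWORDs, 20 bytes
FIELD_NAMES = ("OriginalFirstThunk", "TimeDateStamp", "ForwarderChain",
               "Name", "FirstThunk")


def read_import_descriptors(data: bytes, offset: int = 0) -> list:
    """Collect descriptors until the all-zero terminating entry."""
    descriptors = []
    while True:
        fields = IMPORT_DESCRIPTOR.unpack_from(data, offset)
        if not any(fields):
            break  # an entry with every field 0 ends the array
        descriptors.append(dict(zip(FIELD_NAMES, fields)))
        offset += IMPORT_DESCRIPTOR.size
    return descriptors


# Two made-up descriptors followed by the terminator.
sample_imports = (
    IMPORT_DESCRIPTOR.pack(0x3000, 0, 0, 0x3100, 0x2000)
    + IMPORT_DESCRIPTOR.pack(0x3040, 0, 0, 0x3110, 0x2020)
    + bytes(IMPORT_DESCRIPTOR.size)
)
for d in read_import_descriptors(sample_imports):
    print(hex(d["Name"]), hex(d["OriginalFirstThunk"]), hex(d["FirstThunk"]))
```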

      Each IMAGE_IMPORT_DESCRIPTOR typically points to two essentially identical arrays. These arrays have been called by several names, but the two most common names are the Import Address Table (IAT) and the Import Name Table (INT). Figure 6 shows an executable importing some APIs from USER32.DLL.


Figure 6 Two Parallel Arrays of Pointers

      Both arrays have elements of type IMAGE_THUNK_DATA, which is a pointer-sized union. Each IMAGE_THUNK_DATA element corresponds to one imported function from the executable. The ends of both arrays are indicated by an IMAGE_THUNK_DATA element with a value of zero. The IMAGE_THUNK_DATA union is a DWORD with these interpretations:
DWORD Function;        // Memory address of the imported function
DWORD Ordinal;         // Ordinal value of imported API
DWORD AddressOfData;   // RVA to an IMAGE_IMPORT_BY_NAME with the imported API name
DWORD ForwarderString; // RVA to a forwarder string
      The IMAGE_THUNK_DATA structures within the IAT lead a dual-purpose life. In the executable file, they contain either the ordinal of the imported API or an RVA to an IMAGE_IMPORT_BY_NAME structure. The IMAGE_IMPORT_BY_NAME structure is just a WORD, followed by a string naming the imported API. The WORD value is a "hint" to the loader as to what the ordinal of the imported API might be. When the loader brings in the executable, it overwrites each IAT entry with the actual address of the imported function. This is a key point to understand before proceeding. I highly recommend reading Russell Osterlund's article in this issue which describes the steps that the Windows loader takes.
      Before the executable is loaded, is there a way you can tell if an IMAGE_THUNK_DATA structure contains an import ordinal, as opposed to an RVA to an IMAGE_IMPORT_BY_NAME structure? The key is the high bit of the IMAGE_THUNK_DATA value. If set, the bottom 31 bits (or 63 bits for a 64-bit executable) are treated as an ordinal value. If the high bit isn't set, the IMAGE_THUNK_DATA value is an RVA to the IMAGE_IMPORT_BY_NAME.
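
The high-bit test can be written as a short sketch. The function name and the returned labels are made up; the masking follows the 31-bit and 63-bit description in the text.

```python
def decode_thunk(value: int, is_64bit: bool = False):
    """Classify an on-disk IMAGE_THUNK_DATA value by its high bit."""
    high_bit = 1 << (63 if is_64bit else 31)
    if value & high_bit:
        # High bit set: the remaining low bits hold the import ordinal.
        return ("ordinal", value & (high_bit - 1))
    # High bit clear: the value is an RVA to an IMAGE_IMPORT_BY_NAME.
    return ("name_rva", value)


print(decode_thunk(0x80000010))                # import by ordinal 0x10
print(decode_thunk(0x00003120))                # import by name, RVA 0x3120
print(decode_thunk(0x8000000000000010, True))  # 64-bit import by ordinal
```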
      The other array, the INT, is essentially identical to the IAT. It's also an array of IMAGE_THUNK_DATA structures. The key difference is that the INT isn't overwritten by the loader when brought into memory. Why have two parallel arrays for each set of APIs imported from a DLL? The answer is in a concept called binding. When the binding process rewrites the IAT in the file (I'll describe this process later), some way of getting the original information needs to remain. The INT, which is a duplicate copy of the information, is just the ticket.
      An INT isn't required for an executable to load. However, if not present, the executable cannot be bound. The Microsoft linker seems to always emit an INT, but for a long time, the Borland linker (TLINK) did not. The Borland-created files could not be bound.
      In early Microsoft linkers, the imports section wasn't all that special to the linker. All the data that made up an executable's imports came from import libraries. You could see this for yourself by running Dumpbin or PEDUMP on an import library. You'd find sections with names like .idata$3 and .idata$4. The linker simply followed its rules for combining sections, and all the structures and arrays magically fell into place. A few years back, Microsoft introduced a new import library format that creates significantly smaller import libraries at the cost of the linker taking a more active role in creating the import data.

Binding

      When an executable is bound (via the Bind program, for instance), the IMAGE_THUNK_DATA structures in the IAT are overwritten with the actual address of the imported function. The executable file on disk has the actual in-memory addresses of APIs in other DLLs in its IAT. When loading a bound executable, the Windows loader can bypass the step of looking up each imported API and writing it to the IAT. The correct address is already there! This only happens if the stars align properly, however. My May 2000 column contains some benchmarks on just how much load-time speed increase you can get from binding executables.
      You probably have a healthy skepticism about the safety of executable binding. After all, what if you bind your executable and the DLLs that it imports change? When this happens, all the addresses in the IAT are invalid. The loader checks for this situation and reacts accordingly. If the addresses in the IAT are stale, the loader still has all the necessary information from the INT to resolve the addresses of the imported APIs.
      Binding your programs at installation time is the best possible scenario. The BindImage action of the Windows installer will do this for you. Alternatively, IMAGEHLP.DLL provides the BindImageEx API. Either way, binding is a good idea. If the loader determines that the binding information is current, executables load faster. If the binding information becomes stale, you're no worse off than if you hadn't bound in the first place.
      One of the key steps in making binding effective is for the loader to determine if the binding information in the IAT is current. When an executable is bound, information about the referenced DLLs is placed into the executable. The loader checks this information to make a quick determination of the binding validity. This information wasn't added with the first implementation of binding. Thus, an executable can be bound in the old way or the new way. The new way is what I'll describe here.
      The key data structure in determining the validity of bound imports is an IMAGE_BOUND_IMPORT_DESCRIPTOR. A bound executable contains a list of these structures. Each IMAGE_BOUND_IMPORT_DESCRIPTOR structure represents the time/date stamp of one imported DLL that has been bound against. The RVA of the list is given by the IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT element in the DataDirectory. The elements of the IMAGE_BOUND_IMPORT_DESCRIPTOR are:
  • TimeDateStamp, a DWORD that contains the time/date stamp of the imported DLL.
  • OffsetModuleName, a WORD that contains an offset to a string with the name of the imported DLL. This field is an offset (not an RVA) from the first IMAGE_BOUND_IMPORT_DESCRIPTOR.
  • NumberOfModuleForwarderRefs, a WORD that contains the number of IMAGE_BOUND_FORWARDER_REF structures that immediately follow this structure. These structures are identical to the IMAGE_BOUND_IMPORT_DESCRIPTOR except that the last WORD (the NumberOfModuleForwarderRefs) is reserved.
      In a simple world, the IMAGE_BOUND_IMPORT_DESCRIPTORs for each imported DLL would be a simple array. But, when binding against an API that's forwarded to another DLL, the validity of the forwarded DLL has to be checked too. Thus, the IMAGE_BOUND_FORWARDER_REF structures are interleaved with the IMAGE_BOUND_IMPORT_DESCRIPTORs.
      Let's say you linked against HeapAlloc, which is forwarded to RtlAllocateHeap in NTDLL. Then you ran BIND on your executable. In your EXE, you'd have an IMAGE_BOUND_IMPORT_DESCRIPTOR for KERNEL32.DLL, followed by an IMAGE_BOUND_FORWARDER_REF for NTDLL.DLL. Immediately following that might be additional IMAGE_BOUND_IMPORT_DESCRIPTORs for other DLLs you imported and bound against.
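      To make the interleaving concrete, here is a minimal Python sketch (not part of the original article) that walks a bound import list laid out as described above. The sample bytes, the time/date stamps, and the helper names are hypothetical, and the sketch assumes the list ends with an all-zero descriptor, which the text above does not state.

```python
import struct

# TimeDateStamp (DWORD), OffsetModuleName (WORD), NumberOfModuleForwarderRefs (WORD)
ENTRY = struct.Struct("<IHH")


def read_name(data, offset):
    """Read a NUL-terminated ASCII name; offset is from the first descriptor."""
    end = data.index(b"\0", offset)
    return data[offset:end].decode("ascii")


def walk_bound_imports(data):
    """Return [(dll_name, timestamp, [(forwarder_name, timestamp), ...]), ...]."""
    modules = []
    pos = 0
    while True:
        stamp, name_off, fwd_count = ENTRY.unpack_from(data, pos)
        if stamp == 0 and name_off == 0:
            break  # assumption: an all-zero descriptor terminates the list
        pos += ENTRY.size
        forwarders = []
        # The forwarder refs immediately follow their owning descriptor.
        for _ in range(fwd_count):
            f_stamp, f_name_off, _reserved = ENTRY.unpack_from(data, pos)
            forwarders.append((read_name(data, f_name_off), f_stamp))
            pos += ENTRY.size
        modules.append((read_name(data, name_off), stamp, forwarders))
    return modules


def build_sample():
    """Hypothetical data: KERNEL32 (one forwarder to NTDLL), then USER32."""
    names = [b"KERNEL32.DLL", b"NTDLL.DLL", b"USER32.DLL"]
    strings_start = 4 * ENTRY.size  # three entries plus the terminator
    offsets, blob = [], b""
    for name in names:
        offsets.append(strings_start + len(blob))
        blob += name + b"\0"
    table = (ENTRY.pack(0x3B7D0001, offsets[0], 1)    # KERNEL32 descriptor
             + ENTRY.pack(0x3B7D0002, offsets[1], 0)  # NTDLL forwarder ref
             + ENTRY.pack(0x3B7D0003, offsets[2], 0)  # USER32 descriptor
             + ENTRY.pack(0, 0, 0))                   # terminator
    return table + blob


for module in walk_bound_imports(build_sample()):
    print(module)
```

      Note that the name offsets are measured from the start of the list, not from the image base, which is why the sketch never needs an RVA translation.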

Delayload Data

      Earlier I described how delayloading a DLL is a hybrid approach between an implicit import and explicitly importing APIs via LoadLibrary and GetProcAddress. Now let's take a look at the data structures and see how delayloading works.
      Remember that delayloading is not an operating system feature. It's implemented entirely by additional code and data added by the linker and runtime library. As such, you won't find many references to delayloading in WINNT.H. However, you can see definite parallels between the delayload data and regular imports data.
      The delayload data is pointed to by the IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT entry in the DataDirectory. This is an RVA to an array of ImgDelayDescr structures, defined in DelayImp.H from Visual C++. Figure 7 shows the contents. There's one ImgDelayDescr for each delayload imported DLL.

Figure 7 ImgDelayDescr Structure
Size | Member | Description
DWORD | grAttrs | The attributes for this structure. Currently, the only flag defined is dlattrRva (1), indicating that the address fields in the structure should be treated as RVAs, rather than virtual addresses.
RVA | rvaDLLName | An RVA to a string with the name of the imported DLL. This string is passed to LoadLibrary.
RVA | rvaHmod | An RVA to an HMODULE-sized memory location. When the delayloaded DLL is brought into memory, its HMODULE is stored at this location.
RVA | rvaIAT | An RVA to the Import Address Table for this DLL. This is the same format as a regular IAT.
RVA | rvaINT | An RVA to the Import Name Table for this DLL. This is the same format as a regular INT.
RVA | rvaBoundIAT | An RVA to the optional bound copy of the Import Address Table for this DLL. This is the same format as a regular IAT. Currently, this copy of the IAT is not actually bound, but this feature may be added in future versions of the BIND program.
RVA | rvaUnloadIAT | An RVA to the optional unbound copy of the original Import Address Table for this DLL. This is the same format as a regular IAT. Currently always set to 0.
DWORD | dwTimeStamp | The date/time stamp of the delayload imported DLL. Normally set to 0.

      The key thing to glean from ImgDelayDescr is that it contains the addresses of an IAT and an INT for the DLL. These tables are identical in format to their regular imports equivalent, only they're written to and read by the runtime library code rather than the operating system. When you call an API from a delayloaded DLL for the first time, the runtime calls LoadLibrary (if necessary), and then GetProcAddress. The resulting address is stored in the delayload IAT so that future calls go directly to the API.
      There is a bit of goofiness about the delayload data that needs explanation. In its original incarnation in Visual C++ 6.0, all ImgDelayDescr fields containing addresses used virtual addresses, rather than RVAs. That is, they contained actual addresses where the delayload data could be found. These fields are DWORDs, the size of a pointer on the x86.
      Now fast-forward to IA-64 support. All of a sudden, 4 bytes isn't enough to hold a complete address. Ooops! At this point, Microsoft did the correct thing and changed the fields containing addresses to RVAs. As shown in Figure 7, I've used the revised structure definitions and names.
      There is still the issue of determining whether an ImgDelayDescr is using RVAs or virtual addresses. The structure has a field to hold flag values. When the "1" bit of the grAttrs field is on, the structure members should be treated as RVAs. This is the only option starting with Visual Studio® .NET and the 64-bit compiler. If that bit in grAttrs is off, the ImgDelayDescr fields are virtual addresses.
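      The check on grAttrs can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and it assumes that old-style virtual addresses read from the file on disk are relative to the preferred load address.

```python
DLATTR_RVA = 0x1  # the dlattrRva flag described in Figure 7


def delay_field_to_rva(gr_attrs, field_value, image_base):
    """Normalize an ImgDelayDescr address field to an RVA.

    image_base is the preferred load address (an assumption of this sketch:
    the descriptor is being read from the file on disk, before relocation).
    """
    if gr_attrs & DLATTR_RVA:
        return field_value           # new format: the field already is an RVA
    return field_value - image_base  # old Visual C++ 6.0 format: virtual address


# New-style descriptor: the value is used as is.
print(hex(delay_field_to_rva(1, 0x2000, 0x00400000)))
# Old-style descriptor: subtract the preferred load address.
print(hex(delay_field_to_rva(0, 0x00402000, 0x00400000)))
```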

The Resources Section

      Of all the sections within a PE, the resources are the most complicated to navigate. Here, I'll describe just the data structures that are used to get to the raw resource data such as icons, bitmaps, and dialogs. I won't go into the actual format of the resource data since it's beyond the scope of this article.
      The resources are found in a section called .rsrc. The IMAGE_DIRECTORY_ENTRY_RESOURCE entry in the DataDirectory contains the RVA and size of the resources. For various reasons, the resources are organized in a manner similar to a file system—with directory and leaf nodes.
      The resource pointer from the DataDirectory points to a structure of type IMAGE_RESOURCE_DIRECTORY. The IMAGE_RESOURCE_DIRECTORY structure contains unused Characteristics, TimeDateStamp, and version number fields. The only interesting fields in an IMAGE_RESOURCE_DIRECTORY are the NumberOfNamedEntries and the NumberOfIdEntries.
      Following each IMAGE_RESOURCE_DIRECTORY structure is an array of IMAGE_RESOURCE_DIRECTORY_ENTRY structures. Adding the NumberOfNamedEntries and NumberOfIdEntries fields from the IMAGE_RESOURCE_DIRECTORY yields the count of IMAGE_RESOURCE_DIRECTORY_ENTRYs. (If all these data structure names are painful for you to read, let me tell you, it's also awkward writing about them!)
      A directory entry points to either another resource directory or to the data for an individual resource. When the directory entry points to another resource directory, the high bit of the second DWORD in the structure is set and the remaining 31 bits are an offset to the resource directory. The offset is relative to the beginning of the resource section, not an RVA.
      When a directory entry points to an actual resource instance, the high bit of the second DWORD is clear. The remaining 31 bits are the offset to the resource instance (for example, a dialog). Again, the offset is relative to the resource section, not an RVA.
      Directory entries can be named or identified by an ID value. This is consistent with resources in an .RC file where you can specify a name or an ID for a resource instance. In the directory entry, when the high bit of the first DWORD is set, the remaining 31 bits are an offset to the string name of the resource. If the high bit is clear, the bottom 16 bits contain the ordinal identifier.
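      The bit tests described in the last three paragraphs can be sketched as follows. This is a minimal Python illustration rather than code from PEDUMP; the function name and the tuple-based return value are my own choices, and the second sample entry uses made-up offsets.

```python
import struct

HIGH_BIT = 0x80000000


def decode_resource_entry(raw):
    """Decode one 8-byte IMAGE_RESOURCE_DIRECTORY_ENTRY (two DWORDs)."""
    name_field, offset_field = struct.unpack("<II", raw)

    # First DWORD: a string name offset (high bit set) or a 16-bit ID.
    if name_field & HIGH_BIT:
        ident = ("name_offset", name_field & 0x7FFFFFFF)
    else:
        ident = ("id", name_field & 0xFFFF)

    # Second DWORD: a subdirectory (high bit set) or a resource data entry.
    # Both offsets are relative to the start of the resource section.
    if offset_field & HIGH_BIT:
        target = ("directory", offset_field & 0x7FFFFFFF)
    else:
        target = ("data", offset_field)
    return ident, target


# An ID entry that points at resource data.
print(decode_resource_entry(struct.pack("<II", 0x00000409, 0x00000128)))
# A named entry that points at another directory (hypothetical offsets).
print(decode_resource_entry(struct.pack("<II", 0x80000150, 0x80000030)))
```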
      Enough theory! Let's look at an actual resource section and decipher what it means. Figure 8 shows abbreviated PEDUMP output for the resources in ADVAPI32.DLL.

Figure 8 Resources from ADVAPI32.DLL
Resources (RVA: 6B000)
ResDir (0) Entries:03 (Named:01, ID:02) TimeDate:00000000
    -------------------------------
    ResDir (MOFDATA) Entries:01 (Named:01, ID:00) TimeDate:00000000
        ResDir (MOFRESOURCENAME) Entries:01 (Named:00, ID:01) TimeDate:00000000
            ID: 00000409 DataEntryOffs: 00000128
            DataRVA: 6B6F0 DataSize: 190F5 CodePage: 0
    -------------------------------
    ResDir (STRING) Entries:01 (Named:00, ID:01) TimeDate:00000000
        ResDir (C36) Entries:01 (Named:00, ID:01) TimeDate:00000000
            ID: 00000409 DataEntryOffs: 00000138
            DataRVA: 6B1B0 DataSize: 0053C CodePage: 0
    -------------------------------
    ResDir (RCDATA) Entries:01 (Named:00, ID:01) TimeDate:00000000
        ResDir (66) Entries:01 (Named:00, ID:01) TimeDate:00000000
            ID: 00000409 DataEntryOffs: 00000148
            DataRVA: 85908 DataSize: 0005C CodePage: 0
Each line that starts with "ResDir" corresponds to an IMAGE_RESOURCE_DIRECTORY structure. Following "ResDir" is the name of the resource directory, in parentheses. In this example, there are resource directories named 0, MOFDATA, MOFRESOURCENAME, STRING, C36, RCDATA, and 66. Following the name is the combined number of directory entries (both named and by ID). In this example, the topmost directory has three immediate directory entries, while all the other directories contain a single entry.
      In everyday use, the topmost directory is analogous to the root directory of a file system. Each directory entry below the "root" is always a directory in its own right. Each of these second-level directories corresponds to a resource type (string tables, dialogs, menus, and so on). Underneath each of the second-level "resource type" directories, you'll find third-level subdirectories.
      There's a third-level subdirectory for each resource instance. For example, if there were five dialogs, there would be a second-level DIALOG directory with five directory entries beneath it. Each of the five directory entries would themselves be a directory. The name of the directory entry corresponds to the name or ID of the resource instance. Under each of these directory entries is a single item which contains the offset to the resource data. Simple, no?
      If you learn more efficiently by reading code, be sure to check out the resource dumping code in PEDUMP (see the February 2002 code download for this article). Besides displaying all the resource directories and their entries, it also dumps out several of the more common types of resource instances such as dialogs.

Base Relocations

      In many locations in an executable, you'll find memory addresses. When an executable is linked, it's given a preferred load address. These memory addresses are only correct if the executable loads at the preferred load address specified by the ImageBase field in the IMAGE_OPTIONAL_HEADER structure.
      If the loader needs to load the executable at another address, all the addresses in the executable will be incorrect. This entails extra work for the loader. The May 2000 Under The Hood column (mentioned earlier) describes the performance hit when DLLs have the same preferred load addresses and how the REBASE tool can help.
      The base relocations tell the loader every location in the executable that needs to be modified if the executable doesn't load at the preferred load address. Luckily for the loader, it doesn't need to know any details about how the address is being used. It just knows that there's a list of locations that need to be modified in some consistent way.
      Let's look at an x86-based example to make this clear. Say you have the following instruction, which loads the value of a global variable (at address 0x0040D434) into the ECX register:
00401020: 8B 0D 34 D4 40 00  mov ecx,dword ptr [0x0040D434]
The instruction is at address 0x00401020 and is six bytes long. The first two bytes (0x8B 0x0D) make up the opcode of the instruction. The remaining four bytes hold a DWORD address (0x0040D434). In this example, the instruction is from an executable with a preferred load address of 0x00400000. The global variable is therefore at an RVA of 0xD434.
      If the executable does load at 0x00400000, the instruction can run exactly as is. But let's say that the executable somehow gets loaded at an address of 0x00500000. If this happens, the last four bytes of the instruction need to be changed to 0x0050D434.
      How can the loader make this change? The loader compares the preferred and actual load addresses and calculates a delta. In this case, the delta value is 0x00100000. This delta can be added to the value of the DWORD-sized address to come up with the new address of the variable. In the previous example, there would be a base relocation for address 0x00401022, which is the location of the DWORD in the instruction.
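      The arithmetic in this example is easy to check. The following Python sketch (an illustration, not the loader's actual code; the function name is hypothetical) patches the six instruction bytes from the example with the delta.

```python
import struct


def apply_highlow(buf, offset, delta):
    """Add delta to the little-endian DWORD stored at buf[offset:offset + 4]."""
    (old,) = struct.unpack_from("<I", buf, offset)
    struct.pack_into("<I", buf, offset, (old + delta) & 0xFFFFFFFF)


# The instruction from the example: mov ecx, dword ptr [0x0040D434]
code = bytearray.fromhex("8B0D34D44000")
instruction_address = 0x00401020
relocation_address = 0x00401022      # location of the DWORD inside the instruction

preferred_base = 0x00400000
actual_base = 0x00500000
delta = actual_base - preferred_base  # 0x00100000

apply_highlow(code, relocation_address - instruction_address, delta)
print(code.hex())  # 8b0d34d45000, that is, mov ecx, dword ptr [0x0050D434]
```

      The opcode bytes are untouched; only the embedded address changes, which is why the loader needs no knowledge of the instruction itself.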
      In a nutshell, base relocations are just a list of locations in an executable where a delta value needs to be added to the existing contents of memory. The pages of an executable are brought into memory only as they're needed, and the format of the base relocations reflects this. The base relocations reside in a section called .reloc, but the correct way to find them is from the DataDirectory using the IMAGE_DIRECTORY_ENTRY_BASERELOC entry.
      Base relocations are a series of very simple IMAGE_BASE_RELOCATION structures. The VirtualAddress field contains the RVA of the memory range to which the relocations belong. The SizeOfBlock field indicates how many bytes make up the relocation information for this base, including the size of the IMAGE_BASE_RELOCATION structure.
      Immediately following the IMAGE_BASE_RELOCATION structure is a variable number of WORD values. The number of WORDs can be deduced from the SizeOfBlock field. Each WORD consists of two parts. The top 4 bits indicate the type of relocation, as given by the IMAGE_REL_BASED_xxx values in WINNT.H. The bottom 12 bits are an offset, relative to the VirtualAddress field, where the relocation should be applied.
      In the previous example of base relocations, I simplified things a bit. There are actually multiple types of base relocations and methods for how they're applied. For x86 executables, all base relocations are of type IMAGE_REL_BASED_HIGHLOW. You will often see a relocation of type IMAGE_REL_BASED_ABSOLUTE at the end of a group of relocations. These relocations do nothing, and are there just to pad things so that the next IMAGE_BASE_RELOCATION is aligned on a 4-byte boundary.
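      As a rough sketch of how the blocks described above might be walked, the following Python decodes a hand-built relocation block. The sample data is hypothetical; the two type values are the ones WINNT.H assigns to IMAGE_REL_BASED_ABSOLUTE and IMAGE_REL_BASED_HIGHLOW.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0  # padding entry, nothing to apply
IMAGE_REL_BASED_HIGHLOW = 3   # add the delta to a 32-bit address

BLOCK_HEADER = struct.Struct("<II")  # VirtualAddress, SizeOfBlock


def parse_reloc_blocks(data):
    """Return [(type, rva), ...] for every non-padding relocation in data."""
    relocations = []
    pos = 0
    while pos + BLOCK_HEADER.size <= len(data):
        page_rva, block_size = BLOCK_HEADER.unpack_from(data, pos)
        if block_size < BLOCK_HEADER.size:
            break  # defensive stop on a malformed or zeroed block
        # SizeOfBlock includes the 8-byte header, so subtract it first.
        count = (block_size - BLOCK_HEADER.size) // 2
        for i in range(count):
            (word,) = struct.unpack_from("<H", data, pos + BLOCK_HEADER.size + 2 * i)
            rel_type = word >> 12    # top 4 bits
            offset = word & 0x0FFF   # bottom 12 bits
            if rel_type != IMAGE_REL_BASED_ABSOLUTE:
                relocations.append((rel_type, page_rva + offset))
        pos += block_size
    return relocations


# Hypothetical block for the page at RVA 0x1000: three HIGHLOW entries plus
# one ABSOLUTE entry that pads the block to a 4-byte boundary.
sample = BLOCK_HEADER.pack(0x1000, 16) + struct.pack(
    "<4H", 0x3022, 0x3030, 0x3044, 0x0000)

for rel_type, rva in parse_reloc_blocks(sample):
    print(rel_type, hex(rva))
```

      The first decoded entry lands at RVA 0x1022, which corresponds to the relocated DWORD at address 0x00401022 in the earlier instruction example.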
      For IA-64 executables, the relocations seem to always be of type IMAGE_REL_BASED_DIR64. As with x86 relocations, there will often be IMAGE_REL_BASED_ABSOLUTE relocations used for padding. Interestingly, although pages in IA-64 EXEs are 8KB, the base relocations are still done in 4KB chunks.
      In Visual C++ 6.0, the linker omits relocations for EXEs when doing a release build. This is because EXEs are the first thing brought into an address space, and therefore are essentially guaranteed to load at the preferred load address. DLLs aren't so lucky, so base relocations should always be left in, unless you have a reason to omit them with the /FIXED switch. In Visual Studio .NET, the linker omits base relocations for debug and release mode EXE files.

The Debug Directory

      When an executable is built with debug information, it's customary to include details about the format of the information and where it is. The operating system doesn't require this to run the executable, but it's useful for development tools. An EXE can have multiple forms of debug information; a data structure known as the debug directory indicates what's available.
      The DebugDirectory is found via the IMAGE_DIRECTORY_ENTRY_DEBUG slot in the DataDirectory. It consists of an array of IMAGE_DEBUG_DIRECTORY structures (see Figure 9), one for each type of debug information. The number of elements in the debug directory can be calculated using the Size field in the DataDirectory.

Figure 9 Fields of IMAGE_DEBUG_DIRECTORY
Size | Member | Description
DWORD | Characteristics | Unused and set to 0.
DWORD | TimeDateStamp | The time/date stamp of this debug information (number of seconds since 1/1/1970, GMT).
WORD | MajorVersion | The major version of this debug information. Unused.
WORD | MinorVersion | The minor version of this debug information. Unused.
DWORD | Type | The type of the debug information. The following types are the most commonly encountered:
    IMAGE_DEBUG_TYPE_COFF
    IMAGE_DEBUG_TYPE_CODEVIEW // Including PDB files
    IMAGE_DEBUG_TYPE_FPO // Frame pointer omission
    IMAGE_DEBUG_TYPE_MISC // IMAGE_DEBUG_MISC
    IMAGE_DEBUG_TYPE_OMAP_TO_SRC
    IMAGE_DEBUG_TYPE_OMAP_FROM_SRC
    IMAGE_DEBUG_TYPE_BORLAND // Borland format
DWORD | SizeOfData | The size of the debug data in this file. Doesn't count the size of external debug files such as .PDBs.
DWORD | AddressOfRawData | The RVA of the debug data, when mapped into memory. Set to 0 if the debug data isn't mapped in.
DWORD | PointerToRawData | The file offset of the debug data (not an RVA).
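      Because every IMAGE_DEBUG_DIRECTORY is the same size, the element count falls out of a simple division. The following Python sketch assumes the field layout of Figure 9 (28 bytes per entry) and the numeric type values from WINNT.H; the sample entries and the dictionary keys are hypothetical.

```python
import struct

# Characteristics, TimeDateStamp, MajorVersion, MinorVersion, Type,
# SizeOfData, AddressOfRawData, PointerToRawData
DEBUG_DIR = struct.Struct("<IIHHIIII")  # 28 bytes

TYPE_NAMES = {1: "COFF", 2: "CODEVIEW", 3: "FPO", 4: "MISC"}


def parse_debug_directory(data, directory_size):
    """Decode the debug directory; directory_size is the DataDirectory Size field."""
    count = directory_size // DEBUG_DIR.size
    entries = []
    for i in range(count):
        (_chars, stamp, _major, _minor, dbg_type,
         size, rva, file_offset) = DEBUG_DIR.unpack_from(data, i * DEBUG_DIR.size)
        entries.append({
            "type": TYPE_NAMES.get(dbg_type, str(dbg_type)),
            "timestamp": stamp,
            "size": size,
            "rva": rva,                  # 0 if the data isn't mapped in
            "file_offset": file_offset,  # a file offset, not an RVA
        })
    return entries


# Two hypothetical entries: CodeView data that is mapped, FPO data that is not.
sample = (DEBUG_DIR.pack(0, 0x3C000000, 0, 0, 2, 0x40, 0x3000, 0x1E00)
          + DEBUG_DIR.pack(0, 0x3C000000, 0, 0, 3, 0x10, 0, 0x1E40))

for entry in parse_debug_directory(sample, len(sample)):
    print(entry)
```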

      By far, the most prevalent form of debug information today is the PDB file. The PDB file is essentially an evolution of CodeView-style debug information. The presence of PDB information is indicated by a debug directory entry of type IMAGE_DEBUG_TYPE_CODEVIEW. If you examine the data pointed to by this entry, you'll find a short CodeView-style header. The majority of this debug data is just a path to the external PDB file. In Visual Studio 6.0, the debug header began with an NB10 signature. In Visual Studio .NET, the header begins with an RSDS signature.
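      Pulling the PDB path out of that header might look like the sketch below. The article doesn't spell out the header layouts, so the field sizes here are an assumption based on the commonly documented forms: NB10 is followed by a DWORD offset, a DWORD time/date stamp, and a DWORD age; RSDS is followed by a 16-byte GUID and a DWORD age. In both cases a NUL-terminated path comes last. The sample blob and path are made up.

```python
import struct


def pdb_path_from_codeview(blob):
    """Return (signature, pdb_path), or None if the signature isn't recognized."""
    signature = blob[:4]
    if signature == b"RSDS":
        path_start = 4 + 16 + 4      # signature, GUID, age (assumed layout)
    elif signature == b"NB10":
        path_start = 4 + 4 + 4 + 4   # signature, offset, timestamp, age (assumed)
    else:
        return None
    end = blob.index(b"\0", path_start)
    return signature.decode("ascii"), blob[path_start:end].decode("utf-8")


# Hypothetical RSDS record with a zeroed GUID and an age of 1.
rsds = b"RSDS" + bytes(16) + struct.pack("<I", 1) + b"c:\\build\\sample.pdb\0"
# Hypothetical NB10 record.
nb10 = b"NB10" + struct.pack("<III", 0, 0x3C000000, 1) + b"sample.pdb\0"

print(pdb_path_from_codeview(rsds))
print(pdb_path_from_codeview(nb10))
```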
      In Visual Studio 6.0, COFF debug information can be generated with the /DEBUGTYPE:COFF linker switch. This capability is gone in Visual Studio .NET. Frame Pointer Omission (FPO) debug information comes into play with optimized x86 code, where the function may not have a regular stack frame. FPO data allows the debugger to locate local variables and parameters.
      The two types of OMAP debug information exist only for Microsoft programs. Microsoft has an internal tool that reorganizes the code in executable files to minimize paging. (Yes, more than the Working Set Tuner can do.) The OMAP information lets tools convert between the original addresses in the debug information and the new addresses after having been moved.
      Incidentally, DBG files also contain a debug directory like I just described. DBG files were prevalent in the Windows NT 4.0 era, and they contained primarily COFF debug information. However, they've been phased out in favor of PDB files in Windows XP.

The .NET Header

      Executables produced for the Microsoft .NET environment are first and foremost PE files. However, in most cases normal code and data in a .NET file are minimal. The primary purpose of a .NET executable is to get the .NET-specific information such as metadata and intermediate language (IL) into memory. In addition, a .NET executable links against MSCOREE.DLL. This DLL is the starting point for a .NET process. When a .NET executable loads, its entry point is usually a tiny stub of code. That stub just jumps to an exported function in MSCOREE.DLL (_CorExeMain or _CorDllMain). From there, MSCOREE takes charge, and starts using the metadata and IL from the executable file. This setup is similar to the way apps in Visual Basic (prior to .NET) used MSVBVM60.DLL.
      The starting point for .NET information is the IMAGE_COR20_HEADER structure, currently defined in CorHDR.H from the .NET Framework SDK and more recent versions of WINNT.H. The IMAGE_COR20_HEADER is pointed to by the IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR entry in the DataDirectory. Figure 10 shows the fields of an IMAGE_COR20_HEADER. The format of the metadata, method IL, and other things pointed to by the IMAGE_COR20_HEADER will be described in a subsequent article.

Figure 10 IMAGE_COR20_HEADER Structure
Type | Member | Description
DWORD | cb | Size of the header in bytes.
WORD | MajorRuntimeVersion | The minimum version of the runtime required to run this program. For the first release of .NET, this value is 2.
WORD | MinorRuntimeVersion | The minor portion of the version. Currently 0.
IMAGE_DATA_DIRECTORY | MetaData | The RVA to the metadata tables.
DWORD | Flags | Flag values containing attributes for this image. These values are currently defined as:
    COMIMAGE_FLAGS_ILONLY // Image contains only IL code that is not required to run on a specific CPU.
    COMIMAGE_FLAGS_32BITREQUIRED // Only runs in 32-bit processes.
    COMIMAGE_FLAGS_IL_LIBRARY
    COMIMAGE_FLAGS_STRONGNAMESIGNED // Image is signed with hash data
    COMIMAGE_FLAGS_TRACKDEBUGDATA // Causes the JIT/runtime to keep debug information around for methods.
DWORD | EntryPointToken | Token for the MethodDef of the entry point for the image. The .NET runtime calls this method to begin managed execution in the file.
IMAGE_DATA_DIRECTORY | Resources | The RVA and size of the .NET resources.
IMAGE_DATA_DIRECTORY | StrongNameSignature | The RVA of the strong name hash data.
IMAGE_DATA_DIRECTORY | CodeManagerTable | The RVA of the code manager table. A code manager contains the code required to obtain the state of a running program (such as tracing the stack and tracking GC references).
IMAGE_DATA_DIRECTORY | VTableFixups | The RVA of an array of function pointers that need fixups. This is for support of unmanaged C++ vtables.
IMAGE_DATA_DIRECTORY | ExportAddressTableJumps | The RVA to an array of RVAs where export JMP thunks are written. These thunks allow managed methods to be exported so that unmanaged code can call them.
IMAGE_DATA_DIRECTORY | ManagedNativeHeader | For internal use of the .NET runtime in memory. Set to 0 in the executable.
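      As an illustration of the layout in Figure 10, the following Python sketch unpacks a hand-built IMAGE_COR20_HEADER and names the flags that are set. The numeric flag values are the ones CorHDR.H defines; the sample header contents, the function name, and the dictionary keys are hypothetical.

```python
import struct

# cb, MajorRuntimeVersion, MinorRuntimeVersion, then sixteen DWORDs:
# MetaData (RVA, size), Flags, EntryPointToken, and six more data directories.
COR20 = struct.Struct("<IHH16I")  # 72 bytes

FLAG_NAMES = {
    0x00000001: "COMIMAGE_FLAGS_ILONLY",
    0x00000002: "COMIMAGE_FLAGS_32BITREQUIRED",
    0x00000004: "COMIMAGE_FLAGS_IL_LIBRARY",
    0x00000008: "COMIMAGE_FLAGS_STRONGNAMESIGNED",
    0x00010000: "COMIMAGE_FLAGS_TRACKDEBUGDATA",
}


def parse_cor20(raw):
    """Decode the leading fields of an IMAGE_COR20_HEADER."""
    fields = COR20.unpack_from(raw)
    flags = fields[5]
    return {
        "cb": fields[0],
        "runtime_version": (fields[1], fields[2]),
        "metadata": (fields[3], fields[4]),  # RVA and size
        "flags": [name for bit, name in FLAG_NAMES.items() if flags & bit],
        "entry_point_token": fields[6],
    }


# Hypothetical header: IL-only, strong-name signed, entry point token 0x06000001.
sample = COR20.pack(72, 2, 0, 0x2000, 0x500, 0x1 | 0x8, 0x06000001, *([0] * 12))
print(parse_cor20(sample))
```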

TLS Initialization

      When using thread local variables declared with __declspec(thread), the compiler puts them in a section named .tls. When the system sees a new thread starting, it allocates memory from the process heap to hold the thread local variables for the thread. This memory is initialized from the values in the .tls section. The system also puts a pointer to the allocated memory in the TLS array, pointed to by FS:[2Ch] (on the x86 architecture).
      The presence of thread local storage (TLS) data in an executable is indicated by a nonzero IMAGE_DIRECTORY_ENTRY_TLS entry in the DataDirectory. If nonzero, the entry points to an IMAGE_TLS_DIRECTORY structure, shown in Figure 11.

Figure 11 IMAGE_TLS_DIRECTORY Structure
Size | Member | Description
DWORD | StartAddressOfRawData | The beginning address of a range of memory used to initialize a new thread's TLS data in memory.
DWORD | EndAddressOfRawData | The ending address of the range of memory used to initialize a new thread's TLS data in memory.
DWORD | AddressOfIndex | When the executable is brought into memory and a .tls section is present, the loader allocates a TLS handle via TlsAlloc. It stores the handle at the address given by this field. The runtime library uses this index to locate the thread local data.
DWORD | AddressOfCallBacks | Address of an array of PIMAGE_TLS_CALLBACK function pointers. When a thread is created or destroyed, each function in the list is called. The end of the list is indicated by a pointer-sized variable set to 0. In normal Visual C++ executables, this list is empty.
DWORD | SizeOfZeroFill | The size in bytes of the initialization data, beyond the initialized data delimited by the StartAddressOfRawData and EndAddressOfRawData fields. All per-thread data after this range is initialized to 0.
DWORD | Characteristics | Reserved. Currently set to 0.

      It's important to note that the addresses in the IMAGE_TLS_DIRECTORY structure are virtual addresses, not RVAs. Thus, they will get modified by base relocations if the executable doesn't load at its preferred load address. Also, the IMAGE_TLS_DIRECTORY itself is not in the .tls section; it resides in the .rdata section.
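      Since these fields hold virtual addresses, a tool reading the file on disk has to subtract the preferred load address to get RVAs. The Python sketch below does that for the 32-bit layout in Figure 11. The sample values and key names are hypothetical, and the per-thread size calculation (the initialized range plus SizeOfZeroFill) is my reading of the field descriptions rather than something the article states outright.

```python
import struct

# StartAddressOfRawData, EndAddressOfRawData, AddressOfIndex,
# AddressOfCallBacks, SizeOfZeroFill, Characteristics
TLS_DIR32 = struct.Struct("<6I")  # 24 bytes


def summarize_tls(raw, image_base):
    """Convert the virtual addresses in a 32-bit IMAGE_TLS_DIRECTORY to RVAs."""
    start, end, index, callbacks, zero_fill, _chars = TLS_DIR32.unpack_from(raw)
    template_size = end - start
    return {
        "template_rva": start - image_base,
        "template_size": template_size,
        "per_thread_size": template_size + zero_fill,  # assumed interpretation
        "index_rva": index - image_base,
        "callbacks_rva": callbacks - image_base if callbacks else None,
    }


# Hypothetical directory from an image with a preferred base of 0x00400000.
sample = TLS_DIR32.pack(0x0040F000, 0x0040F010, 0x0040E120, 0, 0x20, 0)
print(summarize_tls(sample, 0x00400000))
```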

Program Exception Data

      Some architectures (including the IA-64) don't use frame-based exception handling, like the x86 does; instead, they use table-based exception handling in which there is a table containing information about every function that might be affected by exception unwinding. The data for each function includes the starting address, the ending address, and information about how and where the exception should be handled. When an exception occurs, the system searches through the tables to locate the appropriate entry and handles it.
      The exception table is an array of IMAGE_RUNTIME_FUNCTION_ENTRY structures. The array is pointed to by the IMAGE_DIRECTORY_ENTRY_EXCEPTION entry in the DataDirectory. The format of the IMAGE_RUNTIME_FUNCTION_ENTRY structure varies from architecture to architecture. For the IA-64, the layout looks like this:
DWORD BeginAddress;
DWORD EndAddress;
DWORD UnwindInfoAddress;
The format of the UnwindInfoAddress data isn't given in WINNT.H. However, the format can be found in Chapter 11 of the "IA-64 Software Conventions and Runtime Architecture Guide" from Intel.
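      To give a feel for the table search, here is a hedged Python sketch that looks up the entry covering a given address. It assumes two things the article doesn't state: that the entries are sorted by BeginAddress (which allows a binary search) and that EndAddress is exclusive. The sample table is made up.

```python
import bisect
import struct

ENTRY = struct.Struct("<III")  # BeginAddress, EndAddress, UnwindInfoAddress


def find_function_entry(table, address):
    """Return the (begin, end, unwind_info) entry covering address, or None."""
    entries = [ENTRY.unpack_from(table, i * ENTRY.size)
               for i in range(len(table) // ENTRY.size)]
    begins = [begin for begin, _end, _unwind in entries]
    # Assumption: entries are sorted by BeginAddress.
    i = bisect.bisect_right(begins, address) - 1
    if i >= 0:
        begin, end, _unwind = entries[i]
        if begin <= address < end:  # assumption: EndAddress is exclusive
            return entries[i]
    return None


# Hypothetical three-function table.
sample = (ENTRY.pack(0x1000, 0x1040, 0x5000)
          + ENTRY.pack(0x1040, 0x10A0, 0x5010)
          + ENTRY.pack(0x2000, 0x2030, 0x5020))

print(find_function_entry(sample, 0x1050))
print(find_function_entry(sample, 0x1FFF))
```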

Wrap-up

      The Portable Executable format is a well-structured and relatively simple executable format. It's particularly nice that PE files can be mapped directly into memory so that the data structures on disk are the same as those Windows uses at runtime. I've also been surprised at how well the PE format has held up with all the various changes that have been thrown at it in the past 10 years, including the transition to 64-bit Windows and .NET.
      Although I've covered many aspects of PE files, there are still topics that I haven't gotten to. There are flags, attributes, and data structures that occur infrequently enough that I decided not to describe them here. However, I hope that this "big picture" introduction to PE files has made the Microsoft PE specifications easier for you to understand.

An In-Depth Look into the Win32 Portable Executable File Format - Part 1

posted 15 Mar 2010, 16:57 by Delphi Basics   [ updated 15 Mar 2010, 17:18 ]

See Part 2 here: An In-Depth Look into the Win32 Portable Executable File Format - Part 2

Source: http://msdn.microsoft.com/en-us/magazine/bb985992.aspx

SUMMARY A good understanding of the Portable Executable (PE) file format leads to a good understanding of the operating system. If you know what's in your DLLs and EXEs, you'll be a more knowledgeable programmer. This article, the first of a two-part series, looks at the changes to the PE format that have occurred over the last few years, along with an overview of the format itself.
      After this update, the author discusses how the PE format fits into applications written for .NET, PE file sections, RVAs, the DataDirectory, and the importing of functions. An appendix includes lists of the relevant image header structures and their descriptions.

A long time ago, in a galaxy far away, I wrote one of my first articles for Microsoft Systems Journal (now MSDN® Magazine). The article, "Peering Inside the PE: A Tour of the Win32 Portable Executable File Format," turned out to be more popular than I had expected. To this day, I still hear from people (even within Microsoft) who use that article, which is still available from the MSDN Library. Unfortunately, the problem with articles is that they're static. The world of Win32® has changed quite a bit in the intervening years, and the article is severely dated. I'll remedy that situation in a two-part article starting this month.
      You might be wondering why you should care about the executable file format. The answer is the same now as it was then: an operating system's executable format and data structures reveal quite a bit about the underlying operating system. By understanding what's in your EXEs and DLLs, you'll find that you've become a better programmer all around.
      Sure, you could learn a lot of what I'll tell you by reading the Microsoft specification. However, like most specs, it sacrifices readability for completeness. My focus in this article will be to explain the most relevant parts of the story, while filling in the hows and whys that don't fit neatly into a formal specification. In addition, I have some goodies in this article that don't seem to appear in any official Microsoft documentation.

Bridging the Gap

      Let me give you just a few examples of what has changed since I wrote the article in 1994. Since 16-bit Windows® is history, there's no need to compare and contrast the format to the Win16 New Executable format. Another welcome departure from the scene is Win32s®. This was the abomination that ran Win32 binaries very shakily atop Windows 3.1.
      Back then, Windows 95 (codenamed "Chicago" at the time) wasn't even released. Windows NT® was still at version 3.5, and the linker gurus at Microsoft hadn't yet started getting aggressive with their optimizations. However, there were MIPS and DEC Alpha implementations of Windows NT that added to the story.
      And what about all the new things that have come along since that article? 64-bit Windows introduces its own variation of the Portable Executable (PE) format. Windows CE adds all sorts of new processor types. Optimizations such as delay loading of DLLs, section merging, and binding were still over the horizon. There are many new things to shoehorn into the story.
      And let's not forget about Microsoft® .NET. Where does it fit in? To the operating system, .NET executables are just plain old Win32 executable files. However, the .NET runtime recognizes data within these executable files as the metadata and intermediate language that are so central to .NET. In this article, I'll knock on the door of the .NET metadata format, but save a thorough survey of its full splendor for a subsequent article.
      And if all these additions and subtractions to the world of Win32 weren't enough justification to remake the article with modern day special effects, there are also errors in the original piece that make me cringe. For example, my description of Thread Local Storage (TLS) support was way out in left field. Likewise, my description of the date/time stamp DWORD used throughout the file format is accurate only if you live in the Pacific time zone!
      In addition, many things that were true then are incorrect now. I had stated that the .rdata section wasn't really used for anything important. Today, it certainly is. I also said that the .idata section is a read/write section, which has been found to be most untrue by people trying to do API interception today.
      Along with a complete update of the PE format story in this article, I've also overhauled the PEDUMP program, which displays the contents of PE files. PEDUMP can be compiled and run on both the x86 and IA-64 platforms, and can dump both 32 and 64-bit PE files. Most importantly, full source code for PEDUMP is available for download from the link at the top of this article, so you have a working example of the concepts and data structures described here.

Overview of the PE File Format

      Microsoft introduced the PE File format, more commonly known as the PE format, as part of the original Win32 specifications. However, PE files are derived from the earlier Common Object File Format (COFF) found on VAX/VMS. This makes sense since much of the original Windows NT team came from Digital Equipment Corporation. It was natural for these developers to use existing code to quickly bootstrap the new Windows NT platform.
      The term "Portable Executable" was chosen because the intent was to have a common file format for all flavors of Windows, on all supported CPUs. To a large extent, this goal has been achieved with the same format used on Windows NT and descendants, Windows 95 and descendants, and Windows CE.
      OBJ files emitted by Microsoft compilers use the COFF format. You can get an idea of how old the COFF format is by looking at some of its fields, which use octal encoding! COFF OBJ files have many data structures and enumerations in common with PE files, and I'll mention some of them as I go along.
      The addition of 64-bit Windows required just a few modifications to the PE format. This new format is called PE32+. No new fields were added, and only one field in the PE format was deleted. The remaining changes are simply the widening of certain fields from 32 bits to 64 bits. In most of these cases, you can write code that simply works with both 32 and 64-bit PE files. The Windows header files have the magic pixie dust to make the differences invisible to most C++-based code.
      The distinction between EXE and DLL files is entirely one of semantics. They both use the exact same PE format. The only difference is a single bit that indicates if the file should be treated as an EXE or as a DLL. Even the DLL file extension is artificial. You can have DLLs with entirely different extensions—for instance .OCX controls and Control Panel applets (.CPL files) are DLLs.
      A very handy aspect of PE files is that the data structures on disk are the same data structures used in memory. Loading an executable into memory (for example, by calling LoadLibrary) is primarily a matter of mapping certain ranges of a PE file into the address space. Thus, a data structure like the IMAGE_NT_HEADERS (which I'll examine later) is identical on disk and in memory. The key point is that if you know how to find something in a PE file, you can almost certainly find the same information when the file is loaded in memory.
      It's important to note that PE files are not just mapped into memory as a single memory-mapped file. Instead, the Windows loader looks at the PE file and decides what portions of the file to map in. This mapping is consistent in that higher offsets in the file correspond to higher memory addresses when mapped into memory. The offset of an item in the disk file may differ from its offset once loaded into memory. However, all the information is present to allow you to make the translation from disk offset to memory offset (see Figure 1).


Figure 1 Offsets

      When PE files are loaded into memory via the Windows loader, the in-memory version is known as a module. The starting address where the file mapping begins is called an HMODULE. This is a point worth remembering: given an HMODULE, you know what data structure to expect at that address, and you can use that knowledge to find all the other data structures in memory. This powerful capability can be exploited for other purposes such as API interception. (To be completely accurate, an HMODULE isn't the same as the load address under Windows CE, but that's a story for yet another day.)
      A module in memory represents all the code, data, and resources from an executable file that is needed by a process. Other parts of a PE file may be read, but not mapped in (for instance, relocations). Some parts may not be mapped in at all, for example, when debug information is placed at the end of the file. A field in the PE header tells the system how much memory needs to be set aside for mapping the executable into memory. Data that won't be mapped in is placed at the end of the file, past any parts that will be mapped in.
      The central location where the PE format (as well as COFF files) is described is WINNT.H. Within this header file, you'll find nearly every structure definition, enumeration, and #define needed to work with PE files or the equivalent structures in memory. Sure, there is documentation elsewhere. MSDN has the "Microsoft Portable Executable and Common Object File Format Specification," for instance (see the October 2001 MSDN CD under Specifications). But WINNT.H is the final word on what PE files look like.
      There are many tools for examining PE files. Among them are Dumpbin from Visual Studio, and Depends from the Platform SDK. I particularly like Depends because it has a very succinct way of examining a file's imports and exports. A great free PE viewer is PEBrowse Professional, from Smidgeonsoft (http://www.smidgeonsoft.com). The PEDUMP program included with this article is also very comprehensive, and does almost everything Dumpbin does.
      From an API standpoint, the primary mechanism provided by Microsoft for reading and modifying PE files is IMAGEHLP.DLL.
      Before I start looking at the specifics of PE files, it's worthwhile to first review a few basic concepts that thread their way through the entire subject of PE files. In the following sections, I will discuss PE file sections, relative virtual addresses (RVAs), the data directory, and how functions are imported.

PE File Sections

      A PE file section represents code or data of some sort. While code is just code, there are multiple types of data. Besides read/write program data (such as global variables), other types of data in sections include API import and export tables, resources, and relocations. Each section has its own set of in-memory attributes, including whether the section contains code, whether it's read-only or read/write, and whether the data in the section is shared between all processes using the executable.
      Generally speaking, all the code or data in a section is logically related in some way. There are usually at least two sections in a PE file: one for code, the other for data. Commonly, there's at least one other type of data section in a PE file. I'll look at the various kinds of sections in Part 2 of this article next month.
      Each section has a distinct name. This name is intended to convey the purpose of the section. For example, a section called .rdata indicates a read-only data section. Section names are used solely for the benefit of humans, and are insignificant to the operating system. A section named FOOBAR is just as valid as a section called .text. Microsoft typically prefixes their section names with a period, but it's not a requirement. For years, the Borland linker used section names like CODE and DATA.
      While compilers have a standard set of sections that they generate, there's nothing magical about them. You can create and name your own sections, and the linker happily includes them in the executable. In Visual C++, you can tell the compiler to insert code or data into a section that you name with #pragma statements. For instance, the statement
#pragma data_seg( "MY_DATA" )
causes all data emitted by Visual C++ to go into a section called MY_DATA, rather than the default .data section. Most programs are fine using the default sections emitted by the compiler, but occasionally you may have funky requirements which necessitate putting code or data into a separate section.
      Sections don't spring fully formed from the linker; rather, they start out in OBJ files, usually placed there by the compiler. The linker's job is to combine all the required sections from OBJ files and libraries into the appropriate final section in the PE file. For example, each OBJ file in your project probably has at least a .text section, which contains code. The linker takes all the sections named .text from the various OBJ files and combines them into a single .text section in the PE file. Likewise, all the sections named .data from the various OBJs are combined into a single .data section in the PE file. Code and data from .LIB files are also typically included in an executable, but that subject is outside the scope of this article.
      There is a rather complete set of rules that linkers follow to decide which sections to combine and how. I gave an introduction to the linker algorithms in the July 1997 Under The Hood column in MSJ. A section in an OBJ file may be intended for the linker's use, and not make it into the final executable. A section like this would be intended for the compiler to pass information to the linker.
      Sections have two alignment values, one within the disk file and the other in memory. The PE file header specifies both of these values, which can differ. Each section starts at an offset that's some multiple of the alignment value. For instance, in the PE file, a typical alignment would be 0x200. Thus, every section begins at a file offset that's a multiple of 0x200.
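The alignment rule can be sketched as a small rounding helper. This is only an illustration: align_up is a hypothetical name, not something defined in WINNT.H, and it assumes the alignment is a nonzero power of two, which is the usual case for PE alignment values.

```c
#include <stdint.h>

/* Round value up to the next multiple of alignment.
   Assumes alignment is a nonzero power of two. */
uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}
```

Applied to the KERNEL32.DLL listing shown a little further on, align_up(0x74658, 0x200) gives 0x74800, which is the raw data size reported for .text, and align_up(0x1000 + 0x74658, 0x1000) gives 0x76000, the address where .data begins. A raw size is not always the rounded-up virtual size, though; the .data section in the same listing has a raw size smaller than its virtual size.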
      Once mapped into memory, sections always start on at least a page boundary. That is, when a PE section is mapped into memory, the first byte of each section corresponds to a memory page. On x86 CPUs, pages are 4KB aligned, while on the IA-64, they're 8KB aligned. The following code shows a snippet of PEDUMP output for the .text and .data section of the Windows XP KERNEL32.DLL.
Section Table
01 .text VirtSize: 00074658 VirtAddr: 00001000
raw data offs: 00000400 raw data size: 00074800
•••
02 .data VirtSize: 000028CA VirtAddr: 00076000
raw data offs: 00074C00 raw data size: 00002400
The .text section is at offset 0x400 in the PE file and will be 0x1000 bytes above the load address of KERNEL32 in memory. Likewise, the .data section is at file offset 0x74C00 and will be 0x76000 bytes above KERNEL32's load address in memory.
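The translation between memory offsets and file offsets can be sketched from the numbers in this listing. The SectionInfo structure and the rva_to_file_offset function below are hypothetical names made up for this example; the real section table entry in WINNT.H carries more fields than the four used here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of one section table entry. */
typedef struct {
    uint32_t virtual_address;  /* offset from the load address in memory */
    uint32_t virtual_size;     /* size of the section in memory */
    uint32_t raw_offset;       /* file offset of the section's data */
    uint32_t raw_size;         /* size of the section's data on disk */
} SectionInfo;

/* Translate an in-memory offset to a file offset.
   Returns 1 and stores the result on success, or 0 when the
   offset is not backed by file data in any listed section. */
int rva_to_file_offset(const SectionInfo *sections, size_t count,
                       uint32_t rva, uint32_t *file_offset)
{
    for (size_t i = 0; i < count; i++) {
        const SectionInfo *s = &sections[i];
        if (rva >= s->virtual_address &&
            rva - s->virtual_address < s->raw_size) {
            *file_offset = s->raw_offset + (rva - s->virtual_address);
            return 1;
        }
    }
    return 0;
}
```

With the two KERNEL32 sections above, an in-memory offset of 0x1000 maps to file offset 0x400, and 0x76010 maps to 0x74C10. An offset such as 0x78500 lies inside .data in memory but past its raw data, so the sketch reports it as having no file offset.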
      It's possible to create PE files in which the sections start at the same offset in the file as they start from the load address in memory. This makes for larger executables, but can speed loading under Windows 9x or Windows Me. The default /OPT:WIN98 linker option (introduced in Visual Studio 6.0) causes PE files to be created this way. In Visual Studio® .NET, the linker may or may not use /OPT:NOWIN98, depending on whether the file is small enough.
      An interesting linker feature is the ability to merge sections. If two sections have similar, compatible attributes, they can usually be combined into a single section at link time. This is done via the linker /merge switch. For instance, the following linker option combines the .rdata and .text sections into a single section called .text:
/MERGE:.rdata=.text
      The advantage to merging sections is that it saves space, both on disk and in memory. At a minimum, each section occupies one page in memory. If you can reduce the number of sections in an executable from four to three, there's a decent chance you'll use one less page of memory. Of course, this depends on whether the unused space at the end of the two merged sections adds up to a page.
      Things can get interesting when you're merging sections, as there are no hard and fast rules as to what's allowed. For example, it's OK to merge .rdata into .text, but you shouldn't merge .rsrc, .reloc, or .pdata into other sections. Prior to Visual Studio .NET, you could merge .idata into other sections. In Visual Studio .NET, this is not allowed, but the linker often merges parts of the .idata into other sections, such as .rdata, when doing a release build.
      Since portions of the imports data are written to by the Windows loader when they are loaded into memory, you might wonder how they can be put in a read-only section. This situation works because at load time the system can temporarily set the attributes of the pages containing the imports data to read/write. Once the imports table is initialized, the pages are then set back to their original protection attributes.

Relative Virtual Addresses

      In an executable file, there are many places where an in-memory address needs to be specified. For instance, the address of a global variable is needed when referencing it. PE files can load just about anywhere in the process address space. While they do have a preferred load address, you can't rely on the executable file actually loading there. For this reason, it's important to have some way of specifying addresses that are independent of where the executable file loads.
      To avoid having hardcoded memory addresses in PE files, RVAs are used. An RVA is simply an offset in memory, relative to where the PE file was loaded. For instance, consider an EXE file loaded at address 0x400000, with its code section at address 0x401000. The RVA of the code section would be:
(target address) 0x401000 - (load address) 0x400000 = (RVA) 0x1000
      To convert an RVA to an actual address, simply reverse the process: add the RVA to the actual load address to find the actual memory address. Incidentally, the actual memory address is called a Virtual Address (VA) in PE parlance. Another way to think of a VA is that it's an RVA with the preferred load address added in. Don't forget the earlier point I made that a load address is the same as the HMODULE.
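The two conversions amount to one addition and one subtraction. The helper names below are hypothetical and exist only for this sketch; they assume the address passed in really belongs to the module loaded at load_address.

```c
#include <stdint.h>

/* RVA -> VA: add the RVA to the actual load address. */
uintptr_t rva_to_va(uintptr_t load_address, uint32_t rva)
{
    return load_address + rva;
}

/* VA -> RVA: subtract the load address from the actual address. */
uint32_t va_to_rva(uintptr_t load_address, uintptr_t va)
{
    return (uint32_t)(va - load_address);
}
```

For the EXE in the example above, va_to_rva(0x400000, 0x401000) yields 0x1000, and rva_to_va(0x400000, 0x1000) yields 0x401000 again.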
      Want to go spelunking through some arbitrary DLL's data structures in memory? Here's how. Call GetModuleHandle with the name of the DLL. The HMODULE that's returned is just a load address; you can apply your knowledge of the PE file structures to find anything you want within the module.

The Data Directory

      There are many data structures within executable files that need to be quickly located. Some obvious examples are the imports, exports, resources, and base relocations. All of these well-known data structures are found in a consistent manner, and the location is known as the DataDirectory.
      The DataDirectory is an array of 16 structures. Each array entry has a predefined meaning for what it refers to. The IMAGE_DIRECTORY_ENTRY_xxx #defines are array indexes into the DataDirectory (from 0 to 15). Figure 2 describes what each of the IMAGE_DIRECTORY_ENTRY_xxx values refers to. A more detailed description of many of the pointed-to data structures will be included in Part 2 of this article.

Figure 2 IMAGE_DATA_DIRECTORY Values
Value
Description
IMAGE_DIRECTORY_ENTRY_EXPORT
Points to the exports (an IMAGE_EXPORT_DIRECTORY structure).
IMAGE_DIRECTORY_ENTRY_IMPORT
Points to the imports (an array of IMAGE_IMPORT_DESCRIPTOR structures).
IMAGE_DIRECTORY_ENTRY_RESOURCE
Points to the resources (an IMAGE_RESOURCE_DIRECTORY structure).
IMAGE_DIRECTORY_ENTRY_EXCEPTION
Points to the exception handler table (an array of IMAGE_RUNTIME_FUNCTION_ENTRY structures). CPU-specific and for table-based exception handling. Used on every CPU except the x86.
IMAGE_DIRECTORY_ENTRY_SECURITY
Points to a list of WIN_CERTIFICATE structures, defined in WinTrust.H. Not mapped into memory as part of the image. Therefore, the VirtualAddress field is a file offset, rather than an RVA.
IMAGE_DIRECTORY_ENTRY_BASERELOC
Points to the base relocation information.
IMAGE_DIRECTORY_ENTRY_DEBUG
Points to an array of IMAGE_DEBUG_DIRECTORY structures, each describing some debug information for the image. Early Borland linkers set the Size field of this IMAGE_DATA_DIRECTORY entry to the number of structures, rather than the size in bytes. To get the number of IMAGE_DEBUG_DIRECTORYs, divide the Size field by the size of an IMAGE_DEBUG_DIRECTORY.
IMAGE_DIRECTORY_ENTRY_ARCHITECTURE
Points to architecture-specific data, which is an array of IMAGE_ARCHITECTURE_HEADER structures. Not used for x86 or IA-64, but appears to have been used for DEC/Compaq Alpha.
IMAGE_DIRECTORY_ENTRY_GLOBALPTR
The VirtualAddress field is the RVA to be used as the global pointer (gp) on certain architectures. Not used on x86, but is used on IA-64. The Size field isn't used. See the November 2000 Under The Hood column for more information on the IA-64 gp.
IMAGE_DIRECTORY_ENTRY_TLS
Points to the Thread Local Storage initialization section.
IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG
Points to an IMAGE_LOAD_CONFIG_DIRECTORY structure. The information in an IMAGE_LOAD_CONFIG_DIRECTORY is specific to Windows NT, Windows 2000, and Windows XP (for example, the GlobalFlag value). To put this structure in your executable, you need to define a global structure with the name __load_config_used, and of type IMAGE_LOAD_CONFIG_DIRECTORY. For non-x86 architectures, the symbol name needs to be _load_config_used (with a single underscore). If you do try to include an IMAGE_LOAD_CONFIG_DIRECTORY, it can be tricky to get the name right in your C++ code. The symbol name that the linker sees must be exactly: __load_config_used (with two underscores). The C++ compiler adds an underscore to global symbols. In addition, it decorates global symbols with type information. So, to get everything right, in your C++ code, you'd have something like this:
extern "C"
IMAGE_LOAD_CONFIG_DIRECTORY _load_config_used = {...}
IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT
Points to an array of IMAGE_BOUND_IMPORT_DESCRIPTORs, one for each DLL that this image has bound against. The timestamps in the array entries allow the loader to quickly determine whether the binding is fresh. If stale, the loader ignores the binding information and resolves the imported APIs normally.
IMAGE_DIRECTORY_ENTRY_IAT
Points to the beginning of the first Import Address Table (IAT). The IATs for each imported DLL appear sequentially in memory. The Size field indicates the total size of all the IATs. The loader uses this address and size to temporarily mark the IATs as read-write during import resolution.
IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT
Points to the delayload information, which is an array of CImgDelayDescr structures, defined in DELAYIMP.H from Visual C++. Delayloaded DLLs aren't loaded until the first call to an API in them occurs. It's important to note that Windows has no implicit knowledge of delay loading DLLs. The delayload feature is completely implemented by the linker and runtime library.
IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR
This value has been renamed to IMAGE_DIRECTORY_ENTRY_COMHEADER in more recent updates to the system header files. It points to the top-level information for .NET information in the executable, including metadata. This information is in the form of an IMAGE_COR20_HEADER structure.
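Looking up an entry in the DataDirectory can be sketched as a bounds-checked array access. The names below are stand-ins chosen so the example builds without the Windows headers: DataDirectoryEntry has the same two-DWORD layout as IMAGE_DATA_DIRECTORY, and the DIR_ENTRY_xxx constants mirror the values of the corresponding IMAGE_DIRECTORY_ENTRY_xxx #defines. Treating a zero VirtualAddress as "not present" is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as IMAGE_DATA_DIRECTORY, under a stand-in name. */
typedef struct {
    uint32_t VirtualAddress;  /* RVA (a file offset for the security entry) */
    uint32_t Size;            /* size in bytes */
} DataDirectoryEntry;

/* Stand-ins for a few IMAGE_DIRECTORY_ENTRY_xxx indexes. */
enum {
    DIR_ENTRY_EXPORT    = 0,
    DIR_ENTRY_IMPORT    = 1,
    DIR_ENTRY_RESOURCE  = 2,
    DIR_ENTRY_BASERELOC = 5,
    DIR_ENTRY_COUNT     = 16
};

/* Return the entry at index, or NULL when the index is out of
   range or the entry is empty. count is the number of entries
   actually available in the array. */
const DataDirectoryEntry *find_directory(const DataDirectoryEntry *dirs,
                                         uint32_t count, unsigned index)
{
    if (index >= count || index >= DIR_ENTRY_COUNT)
        return NULL;
    if (dirs[index].VirtualAddress == 0)
        return NULL;
    return &dirs[index];
}
```

A caller would typically pass the array from the optional header and then convert the returned VirtualAddress to a pointer by adding the module's load address.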

Importing Functions

      When you use code or data from another DLL, you're importing it. When any PE file loads, one of the jobs of the Windows loader is to locate all the imported functions and data and make those addresses available to the file being loaded. I'll save the detailed discussion of data structures used to accomplish this for Part 2 of this article, but it's worth going over the concepts here at a high level.
      When you link directly against the code and data of another DLL, you're implicitly linking against the DLL. You don't have to do anything to make the addresses of the imported APIs available to your code. The loader takes care of it all. The alternative is explicit linking. This means explicitly making sure that the target DLL is loaded and then looking up the address of the APIs. This is almost always done via the LoadLibrary and GetProcAddress APIs.
      When you implicitly link against an API, LoadLibrary and GetProcAddress-like code still executes, but the loader does it for you automatically. The loader also ensures that any additional DLLs needed by the PE file being loaded are also loaded. For instance, every normal program created with Visual C++® links against KERNEL32.DLL. KERNEL32.DLL in turn imports functions from NTDLL.DLL. Likewise, if you import from GDI32.DLL, it will have dependencies on the USER32, ADVAPI32, NTDLL, and KERNEL32 DLLs, which the loader makes sure are loaded and all imports resolved. (Visual Basic 6.0 and the Microsoft .NET executables directly link against a different DLL than KERNEL32, but the same principles apply.)
      When implicitly linking, the resolution process for the main EXE file and all its dependent DLLs occurs when the program first starts. If there are any problems (for example, a referenced DLL that can't be found), the process is aborted.
      Visual C++ 6.0 added the delayload feature, which is a hybrid between implicit linking and explicit linking. When you delayload against a DLL, the linker emits something that looks very similar to the data for a regular imported DLL. However, the operating system ignores this data. Instead, the first time a call to one of the delayloaded APIs occurs, special stubs added by the linker cause the DLL to be loaded (if it's not already in memory), followed by a call to GetProcAddress to locate the called API. Additional magic makes it so that subsequent calls to the API are just as efficient as if the API had been imported normally.
      Within a PE file, there's an array of data structures, one per imported DLL. Each of these structures gives the name of the imported DLL and points to an array of function pointers. The array of function pointers is known as the import address table (IAT). Each imported API has its own reserved spot in the IAT where the address of the imported function is written by the Windows loader. This last point is particularly important: once a module is loaded, the IAT contains the address that is invoked when calling imported APIs.
      The beauty of the IAT is that there's just one place in a PE file where an imported API's address is stored. No matter how many source files you scatter calls to a given API through, all the calls go through the same function pointer in the IAT.
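The "one place" property can be imitated in plain C with a table of function pointers. Everything in this sketch is hypothetical (fake_iat, real_api, hooked_api, and the two callers); it is a model of the idea, not the loader's actual data structures.

```c
typedef int (*ApiFunc)(int);

/* Stand-in for the real imported function. */
int real_api(int x) { return x + 1; }

/* Stand-in for a replacement installed by an interceptor. */
int hooked_api(int x) { return x * 100; }

/* Model of an IAT: one slot per imported function. */
ApiFunc fake_iat[1] = { real_api };

/* Two unrelated call sites; both go through the same slot. */
int caller_one(int x) { return fake_iat[0](x); }
int caller_two(int x) { return fake_iat[0](x) + fake_iat[0](x); }
```

Overwriting fake_iat[0] with hooked_api redirects both callers at once, without touching either call site. This is the property that API interception techniques rely on.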
      Let's examine what the call to an imported API looks like. There are two cases to consider: the efficient way and inefficient way. In the best case, a call to an imported API looks like this:
CALL DWORD PTR [0x00405030]
If you're not familiar with x86 assembly language, this is a call through a function pointer. Whatever DWORD-sized value is at 0x405030 is where the CALL instruction will send control. In the previous example, address 0x405030 lies within the IAT.
      The less efficient call to an imported API looks like this:
CALL 0x0040100C
•••
0x0040100C:
JMP DWORD PTR [0x00405030]
In this situation, the CALL transfers control to a small stub. The stub is a JMP to the address whose value is at 0x405030. Again, remember that 0x405030 is an entry within the IAT. In a nutshell, the less efficient imported API call uses five bytes of additional code, and takes longer to execute because of the extra JMP.
      You're probably wondering why the less efficient method would ever be used. There's a good explanation. Left to its own devices, the compiler can't distinguish between imported API calls and ordinary functions within the same module. As such, the compiler emits a CALL instruction of the form
CALL XXXXXXXX
where XXXXXXXX is an actual code address that will be filled in by the linker later. Note that this last CALL instruction isn't through a function pointer. Rather, it's an actual code address. To keep the cosmic karma in balance, the linker needs to have a chunk of code to substitute for XXXXXXXX. The simplest way to do this is to make the call point to a JMP stub, like you just saw.
      Where does the JMP stub come from? Surprisingly, it comes from the import library for the imported function. If you were to examine an import library, and examine the code associated with the imported API name, you'd see that it's a JMP stub like the one just shown. What this means is that by default, in the absence of any intervention, imported API calls will use the less efficient form.
      Logically, the next question to ask is how to get the optimized form. The answer comes in the form of a hint you give to the compiler. The __declspec(dllimport) function modifier tells the compiler that the function resides in another DLL and that the compiler should generate this instruction
CALL DWORD PTR [XXXXXXXX]
rather than this one:
CALL XXXXXXXX
      In addition, the compiler emits information telling the linker to resolve the function pointer portion of the instruction to a symbol formed by prefixing __imp_ to the function's decorated name. For instance, if you were calling a __cdecl function named MyFunction on x86, where the decorated name is _MyFunction, the symbol name would be __imp__MyFunction. Looking in an import library, you'll see that in addition to the regular symbol name, there's also a symbol with the __imp_ prefix on it. This __imp_ symbol resolves directly to the IAT entry, rather than to the JMP stub.
      So what does this mean in your everyday life? If you're writing exported functions and providing a .H file for them, remember to use the __declspec(dllimport) modifier with the function:
__declspec(dllimport) void Foo(void);
If you look at the Windows system header files, you'll find that they use __declspec(dllimport) for the Windows APIs. It's not easy to see this, but if you search for the DECLSPEC_IMPORT macro defined in WINNT.H, and which is used in files such as WinBase.H, you'll see how __declspec(dllimport) is prepended to the system API declarations.

PE File Structure

      Now let's dig into the actual format of PE files. I'll start from the beginning of the file, and describe the data structures that are present in every PE file. Afterwards, I'll describe the more specialized data structures (such as imports or resources) that reside within a PE's sections. All of the data structures that I'll discuss below are defined in WINNT.H, unless otherwise noted.
      In many cases, there are matching 32 and 64-bit data structures—for example, IMAGE_NT_HEADERS32 and IMAGE_NT_HEADERS64. These structures are almost always identical, except for some widened fields in the 64-bit versions. If you're trying to write portable code, there are #defines in WINNT.H which select the appropriate 32 or 64-bit structures and alias them to a size-agnostic name (in the previous example, it would be IMAGE_NT_HEADERS). The structure selected depends on which mode you're compiling for (specifically, whether _WIN64 is defined or not). You should only need to use the 32 or 64-bit specific versions of the structures if you're working with a PE file with size characteristics that are different from those of the platform you're compiling for.

The MS-DOS Header

      Every PE file begins with a small MS-DOS® executable. The need for this stub executable arose in the early days of Windows, before a significant number of consumers were running it. When executed on a machine without Windows, the program could at least print out a message saying that Windows was required to run the executable.
      The first bytes of a PE file begin with the traditional MS-DOS header, called an IMAGE_DOS_HEADER. The only two values of any importance are e_magic and e_lfanew. The e_lfanew field contains the file offset of the PE header. The e_magic field (a WORD) needs to be set to the value 0x5A4D. There's a #define for this value, named IMAGE_DOS_SIGNATURE. In ASCII representation, 0x5A4D is MZ, the initials of Mark Zbikowski, one of the original architects of MS-DOS.
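Locating the PE header from these two fields can be sketched as follows. The function and macro names are hypothetical; the sketch reads the bytes by hand rather than using IMAGE_DOS_HEADER so that it builds without the Windows headers, and it relies on e_lfanew being the last field of the 64-byte header, at offset 0x3C.

```c
#include <stddef.h>
#include <stdint.h>

#define DOS_SIGNATURE   0x5A4Du  /* same value as IMAGE_DOS_SIGNATURE */
#define E_LFANEW_OFFSET 0x3Cu    /* position of e_lfanew in the header */

/* Read little-endian values from a byte buffer. */
uint16_t read_u16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and stores the PE header's file offset when the buffer
   starts with an MZ header whose e_lfanew points inside the buffer;
   returns 0 otherwise. */
int find_pe_header_offset(const uint8_t *file, size_t size,
                          uint32_t *pe_offset)
{
    uint32_t offset;

    if (size < E_LFANEW_OFFSET + 4)
        return 0;
    if (read_u16le(file) != DOS_SIGNATURE)
        return 0;
    offset = read_u32le(file + E_LFANEW_OFFSET);
    if (offset >= size)
        return 0;
    *pe_offset = offset;
    return 1;
}
```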

The IMAGE_NT_HEADERS Header

      The IMAGE_NT_HEADERS structure is the primary location where specifics of the PE file are stored. Its offset is given by the e_lfanew field in the IMAGE_DOS_HEADER at the beginning of the file. There are actually two versions of the IMAGE_NT_HEADERS structure, one for 32-bit executables and the other for 64-bit versions. The differences are so minor that I'll consider them to be the same for the purposes of this discussion. The only correct, Microsoft-approved way of differentiating between the two formats is via the value of the Magic field in the IMAGE_OPTIONAL_HEADER (described shortly).
      An IMAGE_NT_HEADERS structure is comprised of three fields:
typedef struct _IMAGE_NT_HEADERS {
DWORD Signature;
IMAGE_FILE_HEADER FileHeader;
IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
In a valid PE file, the Signature field is set to the value 0x00004550, which in ASCII is "PE\0\0" (the letters P and E followed by two zero bytes). A #define, IMAGE_NT_SIGNATURE, is defined for this value. The second field, a struct of type IMAGE_FILE_HEADER, predates PE files. It contains some basic information about the file; most importantly, a field describing the size of the optional data that follows it. In PE files, this optional data is very much required, but is still called the IMAGE_OPTIONAL_HEADER.
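Checking the signature and then telling the 32-bit and 64-bit formats apart by the Magic field can be sketched like this. The names are hypothetical. The sketch assumes the buffer starts at the IMAGE_NT_HEADERS, and it computes the position of Magic as 24 bytes in: the 4-byte Signature plus the 20-byte IMAGE_FILE_HEADER.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPTIONAL_HEADER_OFFSET 24u    /* 4-byte Signature + 20-byte file header */
#define MAGIC_PE32             0x10Bu /* IMAGE_NT_OPTIONAL_HDR32_MAGIC */
#define MAGIC_PE32_PLUS        0x20Bu /* IMAGE_NT_OPTIONAL_HDR64_MAGIC */

/* pe points at the start of the IMAGE_NT_HEADERS.
   Returns 32 for PE32, 64 for PE32+, or 0 if the signature or
   the Magic value is not recognized. */
int pe_bitness(const uint8_t *pe, size_t size)
{
    static const uint8_t signature[4] = { 'P', 'E', 0, 0 };
    uint16_t magic;

    if (size < OPTIONAL_HEADER_OFFSET + 2)
        return 0;
    if (memcmp(pe, signature, sizeof signature) != 0)
        return 0;

    /* Magic is the first WORD of the optional header, little-endian. */
    magic = (uint16_t)(pe[OPTIONAL_HEADER_OFFSET] |
                       (pe[OPTIONAL_HEADER_OFFSET + 1] << 8));
    if (magic == MAGIC_PE32)
        return 32;
    if (magic == MAGIC_PE32_PLUS)
        return 64;
    return 0;
}
```

A tool that has to handle both formats on any host would branch on this result and then use the IMAGE_NT_HEADERS32 or IMAGE_NT_HEADERS64 layout explicitly.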
      Figure 3 shows the fields of the IMAGE_FILE_HEADER structure, with additional notes for the fields. This structure can also be found at the very beginning of COFF OBJ files.

Figure 3
IMAGE_FILE_HEADER
Size
Field
Description
WORD
Machine
The target CPU for this executable. Common values are:
IMAGE_FILE_MACHINE_I386    0x014c // Intel 386
IMAGE_FILE_MACHINE_IA64    0x0200 // Intel IA-64 (Itanium)
WORD
NumberOfSections
Indicates how many sections are in the section table. The section table immediately follows the IMAGE_NT_HEADERS.
DWORD
TimeDateStamp
Indicates the time when the file was created. This value is the number of seconds since January 1, 1970, Greenwich Mean Time (GMT). This value is a more accurate indicator of when the file was created than is the file system date/time. An easy way to translate this value into a human-readable string is with the C runtime ctime function (which is time-zone-sensitive!). Another useful function for working with this field is gmtime.
DWORD
PointerToSymbolTable
The file offset of the COFF symbol table, described in section 5.4 of the Microsoft specification. COFF symbol tables are relatively rare in PE files, as newer debug formats have taken over. Prior to Visual Studio .NET, a COFF symbol table could be created by specifying the linker switch /DEBUGTYPE:COFF. COFF symbol tables are almost always found in OBJ files. Set to 0 if no symbol table is present.
DWORD
NumberOfSymbols
Number of symbols in the COFF symbol table, if present. COFF symbols are a fixed size structure, and this field is needed to find the end of the COFF symbols. Immediately following the COFF symbols is a string table used to hold longer symbol names.
WORD
SizeOfOptionalHeader
The size of the optional data that follows the IMAGE_FILE_HEADER. In PE files, this data is the IMAGE_OPTIONAL_HEADER. This size is different depending on whether it's a 32 or 64-bit file. For 32-bit PE files, this field is usually 224. For 64-bit PE32+ files, it's usually 240. However, these sizes are just minimum values, and larger values could appear.
WORD
Characteristics
A set of bit flags indicating attributes of the file. Valid values of these flags are the IMAGE_FILE_xxx values defined in WINNT.H. Some of the more common values include those listed in Figure 4.
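For the TimeDateStamp field in Figure 3, a time-zone-independent conversion can be sketched with gmtime and strftime from the standard C library. The function name and the output format are choices made for this example only.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Format a TimeDateStamp value as a UTC date and time string.
   Returns 1 on success, 0 if the conversion fails or the buffer
   is too small. Note that gmtime returns a pointer to shared
   static storage, so this sketch is not thread-safe. */
int format_time_date_stamp(uint32_t stamp, char *out, size_t out_size)
{
    time_t t = (time_t)stamp;
    struct tm *utc = gmtime(&t);

    if (utc == NULL)
        return 0;
    return strftime(out, out_size, "%Y-%m-%d %H:%M:%S", utc) != 0;
}
```

Because gmtime does not apply the local time zone, the same file produces the same string on every machine, which avoids the Pacific-time-zone trap mentioned at the start of the article.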

Figure 4 lists the common values of IMAGE_FILE_xxx.
Figure 4 IMAGE_FILE_XXX
Value
Description
IMAGE_FILE_RELOCS_STRIPPED
Relocation information stripped from a file.
IMAGE_FILE_EXECUTABLE_IMAGE
The file is executable.
IMAGE_FILE_AGGRESIVE_WS_TRIM
Lets the OS aggressively trim the working set.
IMAGE_FILE_LARGE_ADDRESS_AWARE
The application can handle addresses greater than two gigabytes.
IMAGE_FILE_32BIT_MACHINE
This requires a 32-bit word machine.
IMAGE_FILE_DEBUG_STRIPPED
Debug information is stripped to a .DBG file.
IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP
If the image is on removable media, copy to and run from the swap file.
IMAGE_FILE_NET_RUN_FROM_SWAP
If the image is on a network, copy to and run from the swap file.
IMAGE_FILE_DLL
The file is a DLL.
IMAGE_FILE_UP_SYSTEM_ONLY
The file should only be run on single-processor machines.
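Testing these flags is ordinary bit masking. In the sketch below, the macro names are stand-ins, with values matching IMAGE_FILE_EXECUTABLE_IMAGE and IMAGE_FILE_DLL in WINNT.H, and describe_image is a hypothetical helper. It shows the single bit that separates an EXE from a DLL.

```c
#include <stdint.h>

#define FILE_EXECUTABLE_IMAGE 0x0002u  /* IMAGE_FILE_EXECUTABLE_IMAGE */
#define FILE_DLL              0x2000u  /* IMAGE_FILE_DLL */

/* Classify a file from its Characteristics field. */
const char *describe_image(uint16_t characteristics)
{
    if (!(characteristics & FILE_EXECUTABLE_IMAGE))
        return "not an executable image";
    return (characteristics & FILE_DLL) ? "DLL" : "EXE";
}
```

A Characteristics value of 0x2102, for example, has both the executable and DLL bits set, while 0x010F has the executable bit but not the DLL bit.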

Figure 5 shows the members of the IMAGE_OPTIONAL_HEADER structure.
Figure 5 IMAGE_OPTIONAL_HEADER
Size | Structure Member | Description
WORD | Magic | A signature WORD, identifying what type of header this is. The two most common values are IMAGE_NT_OPTIONAL_HDR32_MAGIC (0x10b) and IMAGE_NT_OPTIONAL_HDR64_MAGIC (0x20b).
BYTE | MajorLinkerVersion | The major version of the linker used to build this executable. For PE files from the Microsoft linker, this version number corresponds to the Visual Studio version number (for example, version 6 for Visual Studio 6.0).
BYTE | MinorLinkerVersion | The minor version of the linker used to build this executable.
DWORD | SizeOfCode | The combined total size of all sections with the IMAGE_SCN_CNT_CODE attribute.
DWORD | SizeOfInitializedData | The combined size of all initialized data sections.
DWORD | SizeOfUninitializedData | The size of all sections with the uninitialized data attributes. This field will often be 0, since the linker can append uninitialized data to the end of regular data sections.
DWORD | AddressOfEntryPoint | The RVA of the first code byte in the file that will be executed. For DLLs, this entry point is called during process initialization and shutdown and during thread creation and destruction. In most executables, this address doesn't directly point to main, WinMain, or DllMain. Rather, it points to runtime library code that calls the aforementioned functions. This field can be set to 0 in DLLs, in which case none of the previous notifications will be received. The linker /NOENTRY switch sets this field to 0.
DWORD | BaseOfCode | The RVA of the first byte of code when loaded in memory.
DWORD | BaseOfData | Theoretically, the RVA of the first byte of data when loaded into memory. However, the values for this field are inconsistent with different versions of the Microsoft linker. This field is not present in 64-bit executables.
DWORD | ImageBase | The preferred load address of this file in memory. The loader attempts to load the PE file at this address if possible (that is, if nothing else currently occupies that memory, it's aligned properly and at a legal address, and so on). If the executable loads at this address, the loader can skip the step of applying base relocations (described in Part 2 of this article). For EXEs, the default ImageBase is 0x400000. For DLLs, it's 0x10000000. The ImageBase can be set at link time with the /BASE switch, or later with the REBASE utility.
DWORD | SectionAlignment | The alignment of sections when loaded into memory. The alignment must be greater than or equal to the file alignment field (mentioned next). The default alignment is the page size of the target CPU. For user mode executables to run under Windows 9x or Windows Me, the minimum alignment size is a page (4KB). This field can be set with the linker /ALIGN switch.
DWORD | FileAlignment | The alignment of sections within the PE file. For x86 executables, this value is usually either 0x200 or 0x1000. The default has changed with different versions of the Microsoft linker. This value must be a power of 2, and if the SectionAlignment is less than the CPU's page size, this field must match the SectionAlignment. The linker switch /OPT:WIN98 sets the file alignment on x86 executables to 0x1000, while /OPT:NOWIN98 sets the alignment to 0x200.
WORD | MajorOperatingSystemVersion | The major version number of the required operating system. With the advent of so many versions of Windows, this field has effectively become irrelevant.
WORD | MinorOperatingSystemVersion | The minor version number of the required OS.
WORD | MajorImageVersion | The major version number of this file. Unused by the system and can be 0. It can be set with the linker /VERSION switch.
WORD | MinorImageVersion | The minor version number of this file.
WORD | MajorSubsystemVersion | The major version of the operating subsystem needed for this executable. At one time, it was used to indicate that the newer Windows 95 or Windows NT 4.0 user interface was required, as opposed to older versions of the Windows NT interface. Today, because of the proliferation of the various versions of Windows, this field is effectively unused by the system and is typically set to the value 4. Set with the linker /SUBSYSTEM switch.
WORD | MinorSubsystemVersion | The minor version of the operating subsystem needed for this executable.
DWORD | Win32VersionValue | Another field that never took off. Typically set to 0.
DWORD | SizeOfImage | The RVA that would be assigned to the section following the last section if it existed. This is effectively the amount of memory that the system needs to reserve when loading this file into memory. This field must be a multiple of the section alignment.
DWORD | SizeOfHeaders | The combined size of the MS-DOS header, PE headers, and section table. All of these items will occur before any code or data sections in the PE file. The value of this field is rounded up to a multiple of the file alignment.
DWORD | CheckSum | The checksum of the image. The CheckSumMappedFile API in IMAGEHLP.DLL can calculate this value. Checksums are required for kernel-mode drivers and some system DLLs. Otherwise, this field can be 0. The checksum is placed in the file when the /RELEASE linker switch is used.
WORD | Subsystem | An enum value indicating what subsystem (user interface type) the executable expects. This field is only important for EXEs. Important values include IMAGE_SUBSYSTEM_NATIVE (the image doesn't require a subsystem), IMAGE_SUBSYSTEM_WINDOWS_GUI (use the Windows GUI), and IMAGE_SUBSYSTEM_WINDOWS_CUI (run as a console mode application; when run, the OS creates a console window for it and provides stdin, stdout, and stderr file handles).
WORD | DllCharacteristics | Flags indicating characteristics of this DLL. These correspond to the IMAGE_DLLCHARACTERISTICS_xxx #defines. Current values are IMAGE_DLLCHARACTERISTICS_NO_BIND (do not bind this image), IMAGE_DLLCHARACTERISTICS_WDM_DRIVER (the driver uses the WDM model), and IMAGE_DLLCHARACTERISTICS_TERMINAL_SERVER_AWARE (when the terminal server loads an application that is not Terminal Services-aware, it also loads a DLL that contains compatibility code).
DWORD | SizeOfStackReserve | In EXE files, the maximum size the stack of the initial thread in the process can grow to. This is 1MB by default. Not all this memory is committed initially.
DWORD | SizeOfStackCommit | In EXE files, the amount of memory initially committed to the stack. By default, this field is 4KB.
DWORD | SizeOfHeapReserve | In EXE files, the initial reserved size of the default process heap. This is 1MB by default. In current versions of Windows, however, the heap can grow beyond this size without intervention by the user.
DWORD | SizeOfHeapCommit | In EXE files, the size of memory committed to the heap. By default, this is 4KB.
DWORD | LoaderFlags | This is obsolete.
DWORD | NumberOfRvaAndSizes | At the end of the IMAGE_NT_HEADERS structure is an array of IMAGE_DATA_DIRECTORY structures. This field contains the number of entries in the array. This field has been 16 since the earliest releases of Windows NT.
IMAGE_DATA_DIRECTORY | DataDirectory[16] | An array of IMAGE_DATA_DIRECTORY structures. Each structure contains the RVA and size of some important part of the executable (for instance, imports, exports, resources).
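
Since the Magic field is the first WORD of the optional header, a tool can read it before deciding whether to interpret the rest of the structure as the 32-bit or the 64-bit layout. The sketch below illustrates that check under stated assumptions: the PE_ macro names and both helper functions are hypothetical, and the caller is assumed to have already located the optional header in a byte buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PE_OPTIONAL_HDR32_MAGIC 0x10b /* IMAGE_NT_OPTIONAL_HDR32_MAGIC */
#define PE_OPTIONAL_HDR64_MAGIC 0x20b /* IMAGE_NT_OPTIONAL_HDR64_MAGIC */

/* Reads a little-endian WORD from a byte buffer, independent of the
   byte order of the machine running this code. */
uint16_t pe_read_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 32 or 64 according to the Magic field, or 0 when the buffer is
   too short or the value is not one of the two common signatures. */
int pe_optional_header_bits(const uint8_t *opt_hdr, size_t len)
{
    assert(opt_hdr != NULL);
    if (len < 2)
        return 0;
    switch (pe_read_u16(opt_hdr)) {
    case PE_OPTIONAL_HDR32_MAGIC: return 32;
    case PE_OPTIONAL_HDR64_MAGIC: return 64;
    default:                      return 0;
    }
}
```

A caller would typically branch on the result before touching fields such as BaseOfData, which exists only in the 32-bit layout.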

      The DataDirectory array at the end of the IMAGE_OPTIONAL_HEADERs is the address book for important locations within the executable. Each DataDirectory entry looks like this:
typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD VirtualAddress; // RVA of the data
    DWORD Size;           // Size of the data
} IMAGE_DATA_DIRECTORY;
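
A lookup in this "address book" should be bounded by NumberOfRvaAndSizes rather than assuming all 16 entries exist. The following is a rough sketch of such a lookup. The PeDataDirectory type and pe_get_directory function are hypothetical names, the three index constants mirror the IMAGE_DIRECTORY_ENTRY_xxx values in WINNT.H, and treating a zero RVA or zero size as "not present" is an assumption of this sketch rather than something the text above states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as IMAGE_DATA_DIRECTORY, redeclared so the sketch builds
   without the Windows SDK. */
typedef struct {
    uint32_t VirtualAddress; /* RVA of the data */
    uint32_t Size;           /* size of the data */
} PeDataDirectory;

/* Indices mirror IMAGE_DIRECTORY_ENTRY_EXPORT, _IMPORT, and _RESOURCE. */
enum { PE_DIR_EXPORT = 0, PE_DIR_IMPORT = 1, PE_DIR_RESOURCE = 2 };

/* Returns the entry at `index`, or NULL when the index is beyond
   NumberOfRvaAndSizes or the entry is empty (assumed to mean that the
   RVA or the size is zero). */
const PeDataDirectory *pe_get_directory(const PeDataDirectory *dirs,
                                        uint32_t number_of_rva_and_sizes,
                                        uint32_t index)
{
    assert(dirs != NULL);
    if (index >= number_of_rva_and_sizes)
        return NULL;
    if (dirs[index].VirtualAddress == 0 || dirs[index].Size == 0)
        return NULL;
    return &dirs[index];
}
```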

The Section Table

      Immediately following the IMAGE_NT_HEADERS is the section table. The section table is an array of IMAGE_SECTION_HEADER structures. An IMAGE_SECTION_HEADER provides information about its associated section, including location, length, and characteristics. Figure 6 contains a description of the IMAGE_SECTION_HEADER fields. The number of IMAGE_SECTION_HEADER structures is given by the IMAGE_NT_HEADERS.FileHeader.NumberOfSections field.
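
Because the section table immediately follows the IMAGE_NT_HEADERS, the file offset of any section header can be computed from three values already in the headers. The sketch below shows that arithmetic under stated assumptions: the function name and parameters are hypothetical, e_lfanew is taken to be the file offset of the PE signature from the MS-DOS header, and the fixed sizes (a 4-byte signature, a 20-byte IMAGE_FILE_HEADER, and 40 bytes per IMAGE_SECTION_HEADER) are the standard structure sizes.

```c
#include <assert.h>
#include <stdint.h>

enum {
    PE_SIGNATURE_SIZE      = 4,  /* "PE\0\0" */
    PE_FILE_HEADER_SIZE    = 20, /* sizeof(IMAGE_FILE_HEADER) */
    PE_SECTION_HEADER_SIZE = 40  /* sizeof(IMAGE_SECTION_HEADER) */
};

/* Returns the file offset of section header number `index`.
   e_lfanew is the offset of the PE signature, and
   size_of_optional_header comes from the file header, so the same
   arithmetic covers both 32-bit and 64-bit images. */
uint32_t pe_section_header_offset(uint32_t e_lfanew,
                                  uint16_t size_of_optional_header,
                                  uint16_t number_of_sections,
                                  uint16_t index)
{
    assert(index < number_of_sections);
    return e_lfanew
         + PE_SIGNATURE_SIZE
         + PE_FILE_HEADER_SIZE
         + size_of_optional_header
         + (uint32_t)index * PE_SECTION_HEADER_SIZE;
}
```

Using SizeOfOptionalHeader from the file header, instead of a compile-time structure size, is a deliberate choice here: it keeps the calculation correct when the optional header is the 64-bit variant.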

Figure 6 The IMAGE_SECTION_HEADER
Size | Field | Description
BYTE | Name[8] | The ASCII name of the section. A section name is not guaranteed to be null-terminated. If you specify a section name longer than eight characters, the linker truncates it to eight characters in the executable. A mechanism exists for allowing longer section names in OBJ files. Section names often start with a period, but this is not a requirement. Section names with a $ in the name get special treatment from the linker. Sections with identical names prior to the $ character are merged. The characters following the $ provide an alphabetic ordering for how the merged sections appear in the final section. There's quite a bit more to the subject of sections with $ in the name and how they're combined, but the details are outside the scope of this article.
DWORD | Misc.VirtualSize | Indicates the actual, used size of the section. This field may be larger or smaller than the SizeOfRawData field. If the VirtualSize is larger, the SizeOfRawData field is the size of the initialized data from the executable, and the remaining bytes up to the VirtualSize should be zero-padded. This field is set to 0 in OBJ files.
DWORD | VirtualAddress | In executables, indicates the RVA where the section begins in memory. Should be set to 0 in OBJs.
DWORD | SizeOfRawData | The size (in bytes) of data stored for the section in the executable or OBJ. For executables, this must be a multiple of the file alignment given in the PE header. If set to 0, the section is uninitialized data.
DWORD | PointerToRawData | The file offset where the data for the section begins. For executables, this value must be a multiple of the file alignment given in the PE header.
DWORD | PointerToRelocations | The file offset of relocations for this section. This is only used in OBJs and set to zero for executables. In OBJs, it points to an array of IMAGE_RELOCATION structures if non-zero.
DWORD | PointerToLinenumbers | The file offset for COFF-style line numbers for this section. Points to an array of IMAGE_LINENUMBER structures if non-zero. Only used when COFF line numbers are emitted.
WORD | NumberOfRelocations | The number of relocations pointed to by the PointerToRelocations field. Should be 0 in executables.
WORD | NumberOfLinenumbers | The number of line numbers pointed to by the PointerToLinenumbers field. Only used when COFF line numbers are emitted.
DWORD | Characteristics | Flags OR'ed together, indicating the attributes of this section. Many of these flags can be set with the linker's /SECTION option. Common values include those listed in Figure 7.
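
The VirtualAddress, SizeOfRawData, and PointerToRawData fields together are what let a tool translate an RVA into a file offset. The following is a minimal sketch of that translation, not the method used by PEDUMP or any particular tool. PeSectionInfo is a hypothetical struct that keeps only the four fields needed, and the function returns -1 for any RVA that is not backed by raw data in the file (for example, the zero-padded tail of a section whose VirtualSize exceeds its SizeOfRawData).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of IMAGE_SECTION_HEADER used by this sketch. */
typedef struct {
    uint32_t VirtualSize;      /* used size of the section in memory */
    uint32_t VirtualAddress;   /* RVA where the section begins */
    uint32_t SizeOfRawData;    /* bytes stored in the file */
    uint32_t PointerToRawData; /* file offset of those bytes */
} PeSectionInfo;

/* Maps an RVA to a file offset by finding the section whose raw data
   covers it. Returns -1 when no section stores bytes for that RVA. */
int64_t pe_rva_to_offset(const PeSectionInfo *sections, size_t count,
                         uint32_t rva)
{
    assert(sections != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        uint32_t start = sections[i].VirtualAddress;
        if (rva >= start && rva - start < sections[i].SizeOfRawData)
            return (int64_t)sections[i].PointerToRawData + (rva - start);
    }
    return -1;
}
```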

      The file alignment of sections in the executable file can have a significant impact on the resulting file size. In Visual Studio 6.0, the linker defaulted to aligning sections in the file on 4KB boundaries, unless /OPT:NOWIN98 or the /ALIGN switch was used. The Visual Studio .NET linker, while still defaulting to /OPT:WIN98, determines if the executable is below a certain size and if that is the case uses 0x200-byte alignment.
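
The effect on file size comes from rounding each section's raw data up to the file alignment. The sketch below illustrates the arithmetic with hypothetical helper names (pe_align_up and pe_total_raw_size) and made-up section sizes; it assumes the alignment is a power of 2, as the FileAlignment field requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounds `value` up to the next multiple of `alignment`, which must be
   a nonzero power of 2. */
uint32_t pe_align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Sums the on-disk size of a set of sections after each one is padded
   to the file alignment. */
uint32_t pe_total_raw_size(const uint32_t *sizes, size_t count,
                           uint32_t file_alignment)
{
    uint32_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += pe_align_up(sizes[i], file_alignment);
    return total;
}
```

For three illustrative sections of 0x123, 0x450, and 0x10 bytes, the padded total is 0xA00 bytes with 0x200-byte alignment but 0x3000 bytes with 0x1000-byte alignment, which is why small executables benefit most from the smaller alignment.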
      Another interesting alignment comes from the .NET file specification. It says that .NET executables should have an in-memory alignment of 8KB, rather than the expected 4KB for x86 binaries. This is to ensure that .NET executables built with x86 entry point code can still run under IA-64. If the in-memory section alignment were 4KB, the IA-64 loader wouldn't be able to load the file, since pages are 8KB on IA-64 versions of Windows.

Wrap-up

      That's it for the headers of PE files. In Part 2 of this article I'll continue the tour of portable executable files by looking at commonly encountered sections. Then I'll describe the major data structures within those sections, including imports, exports, and resources. And finally, I'll go over the source for the updated and vastly improved PEDUMP.

Preserve Code Folding When Reopening a Project

posted 13 Mar 2010, 17:06 by Delphi Basics

Introduced in Delphi 8, code folding is a feature of the Delphi IDE that lets you collapse (hide) and expand (show) sections of code to make it easier to navigate and read. By default, the Delphi Code Editor adds folding regions to classes, functions, procedures, and unit sections, and all regions start expanded. Note the little "-" sign to the left of a procedure declaration line: click it to collapse the procedure's implementation code, and click it again to expand it.

Custom collapsible code blocks can easily be created in the Delphi Code Editor using the "{$REGION 'Caption'}" and "{$ENDREGION}" directives. By default, however, the folding state is not remembered: if you collapse a region and close the project, the next time you open the project (or a code unit) the collapsed block will be expanded again.

In Delphi 2005, a feature was introduced to preserve code folding so that folded code stays folded after the project is reopened. To enable it, check the Project desktop box:
Tools - Options - Environment Options - Autosave options - Project desktop (check box).

Delphi saves the arrangement of your desktop when you close a project or close the IDE. When you later open the same project, all files opened when the project was last closed are opened again, regardless of whether they are used by the project. Folded code will stay folded when the project is reopened. :)

Read another article here:
http://delphi.about.com/od/delphitips2009/qt/preserve-code-folding.htm

Screenshot coming later.
