BMOW title
Floppy Emu banner

What Happens Before main()

Did you know that a C program’s main() function is not the first code to be run? Depending on the program and the compiler, there are all kinds of interesting and complex functions that get run before main(), automatically inserted by the compiler and invisible to casual observers. For the past several days I’ve been on a quest to reverse engineer a minimal C program, to see what’s inside the executable file and how it’s put together. I was generally aware that some kind of special initialization happened before main() was called, but knew nothing about the details. As it turned out, understanding what happens before main() proved to be central to explaining large chunks of mystery code that I’d struggled with during my first analysis.

In my previous post, I used dumpbin, OllyDbg, and the IDA disassembler to examine the contents of a Windows executable file created from an 18 line C program. This example program is a text console application that only references printf, scanf, and strlen. The C functions compile into 120 bytes of x86 code. Yet dumpbin revealed that the executable file contained 2234 bytes of code, and imported 38 different functions from DLLs. It also located over 1300 bytes of unknown data and constants. The implementations of printf etc were in a C runtime library DLL, so that couldn’t explain the unexpected code bloat. Something else was at work.

 
Scaffold for a C Program

By compiling with debug symbols, loading the executable in a debugger, and examining the disassembly, I was able to see the true structure of the example program. This included all the things happening behind the scenes. You can view the complete disassembly with symbols here. Here’s an outline, based on compiling with Microsoft Visual Studio Express 2012, for a release build with compiler settings selected to eliminate all extras like C++ exception handling and array bounds checking. Pseudocode function names are my descriptions and don’t necessarily match the names obtained from debug symbols.

ProgramEntryPoint()
{
    security_init_cookie();
    
    // beginning of __tmainCRTStartup()
    setup_SEH_frame();
    
    // call init functions from a table of function pointers:
    
    // from pre_c_init()
    is_managed_app = ParseAppHeader(); // checks for initial "MZ" bytes, PE header fields 
    init_exit_callbacks();
    run_time_error_checking_initialize(); // calls init functions from an empty table
    matherr();
    setusermatherr(matherr);
    setdefaultprecision(); // calls controlfp_s(0) and maybe calls invoke_watson()
    configthreadlocale(); // for C library function string formatting of numbers and time

    CxxSetUnhandledExceptionFilter(myExceptionFilter);

    // from pre_cpp_init()
    register_exit_callback(run_time_error_checking_terminate); 
    get_command_line_args();

    // check tls_init_callback
    if (dynamic_thread_local_storage_callback != 0 && IsNonWritableInCurrentImage())
    {
        dynamic_thread_local_storage_callback();
    }

    // now the C program runs
    retVal = main();

    // C program has now finished
    if (!is_managed_app)
    {
        // clean-up C library, and terminate process
        exit(retVal); 
    }
    else
    {
        // clean-up C library, but do not terminate process
        cexit();

        cleanup_SEH_frame();
        return retVal;
    }
}
 
IsNonWritableInCurrentImage()
{
    check_security_cookie();

    return (ValidateImageBase() &&
            IsNonWritable(FindPESection()));
}

myExceptionFilter()
{
    if (IsRecognizedExceptionType())
    {
         terminate();
         break_in_debugger();
    }
}

register_exit_callback(pCallback)
{
   setup_SEH_frame();
   onexit(pCallback);
   // also maintains onexit callbacks for DLLs
   cleanup_SEH_frame();
}

This was enough to help me identify the general purpose of most of the code in the executable file, even if the details weren’t all entirely clear. During the program analysis in my previous post, I was confused by large chunks of code that didn’t appear to be called from anywhere. The answer to that mystery was tables of function pointers, which I discovered are used in many places during program startup to call a whole series of initialization functions. The addresses of the functions are stored in a table in the data section, and then the address of the table is passed to _initterm. I’d thought _initterm had something to do with terminal settings, but it’s actually just a helper function to iterate over a table and call each function.

Even with that mystery explained, there were still quite a few snippets of unreachable code in the disassembly. Most of these were only 5 or 10 lines of code, and appeared to be related to other nearby functions. My guess is that many of these scaffold/startup functions were written in assembly language by Microsoft developers, and the linker can’t tell which lines are actually used or not. As a result of some conditionally-included features, or just carelessness on the part of the compiler development team, a few lines of orphaned code were left over and got included into my example program’s executable.

 
Exploring the Scaffold Functions

Let’s start at the entry point and work our way through the scaffold functions.

security_init_cookie is related to a compiler-generated security feature that checks for buffer overruns. This function generates a cookie value based on the current time and other data that’s difficult for an attacker to predict. On entry to an overrun-protected function, the cookie is put on the stack, and on exit, the value on the stack is compared with the global cookie. In this example program, buffer overrun checking was explicitly disabled in the compiler settings, yet security_init_cookie is called anyway. Hmm.

Next the structured exception handling frame is configured on the stack. SEH is a Windows mechanism that’s used to catch and handle CPU exceptions. I’ve never used them, but I believe they can be used to handle errors like division by zero or invalid memory references.

The next set of functions are called from a pointer table that’s placed in the data section, rather than by direct function calls. The code parses the in-memory executable header, including the DOS and PE headers, to determine whether this is a managed app or if it’s native code. It then initializes the exit callbacks, a mechanism that can be used to register other functions to be called when the program exits. Following this, it calls a function to initialize run-time error checks, another compiler-generated feature that can catch problems with type conversions and uninitialized variables. In the example program, run-time error checks were disabled in the compiler settings. The call to init RTC is still present, but it uses an empty table of function pointers to do its work, and so it ultimately does nothing.

After this it calls the math error handler, and then installs that error handler. I’m not sure why it directly calls the math error handler first, but it’s a stub function that does nothing and returns zero.

The call after the math handler initialization is to an internal function called setdefaultprecision, which sets the precision used for floating point calculations. The implementation of this function is curious. It calls controlfp_s(0) to set the precision, and if this returns an error, it invokes the Doctor Watson debugger. This is the only place in any of the scaffolding code where Doctor Watson is referenced or used. If it’s used at all, I would have expected to see it as part of the exception handling mechanism, but in fact it’s only called here during initialization of the floating point precision.

The last task performed by pre_c_init is to configure the locale settings, to help make correctly-formatted numbers and date strings in the C standard library functions.

Next, the scaffold code registers a handler to be used for SEH exceptions. This handler is mostly useless. If the exception is one of four recognized types, the handler calls terminate and then performs an INT 3 debugger break. Otherwise it just returns without doing anything.

After that, the code registers an exit callback function which terminates the run-time error checking feature. The registration mechanism makes use of SEH frames. It also appears to handle exit functions for DLLs, although I was unclear about exactly how that works. I assume that if a DLL used by the program needs to perform some kind of clean-up code or destructors before the program exits, it can register a callback here.

get_command_line_args does what it sounds like, and initializes argc, argv, and envp. I never really thought about it before, but of course these need to be provided by the operating system somehow, and this is where it happens.

The next piece of code is the most complicated and confusing of the whole lot. The code checks the value of something called __dyn_tls_init_callback, which is a global variable initialized to zero in the program’s .data section. This appears related to thread local storage – an area of memory that’s unique for each thread. If __dyn_tls_init_callback is not zero (though I don’t see any mechanism that could make it be non-zero), it calls another internal function called IsNonWritableInCurrentImage. This is the beginning of a fairly involved group of functions that scan the in-memory DOS and PE headers, and attempt to locate a particular section in the PE header. Depending on what it finds there, it may or may not call the __dyn_tls_init_callback function. Notably, IsNonWritableInCurrentImage also makes use of the security cookie for detecting buffer overruns.

Finally, after all this setup work, at last it’s time to call the C main() function. Hooray! This is where the real work happens, and what most people think of as “the program” when they talk about a C-based software application.

Eventually the C program finishes its work, and control returns from main(). The scaffolding code is now responsible for cleaning things up and shutting everything down in an orderly manner. If it was previously determined that this is not a managed app, the code simply calls exit() to terminate the process. On the other hand, if it is a managed app, the scaffold code calls cexit(), cleans up the SEH frame, and returns control to whomever originally called the entry point.

 
Efficiency

From my description, I hope it’s clear that the scaffold functions aren’t especially space-efficient. Probably most people don’t care about a few hundred or few thousand bytes of code wasted, but it’s easy to see where some optimizations could be made:

When the compiler knows ahead of time that RTC checking is disabled, it should completely eliminate the functions related to RTC initialization and cleanup, instead of retaining them but having them iterate over an empty function table.

If the main() function doesn’t use argc and argv, then don’t bother to call get_command_line_args().

The compiler must know whether it’s making a native or managed app, so it can set the scaffold behavior as needed for each case. This would be far simpler than including code to parse the PE header at runtime, and shutdown/cleanup code that must handle both native and managed cases.

 
Bypassing the Scaffolding

While the inefficiencies of the scaffold code are annoying, what’s more bothersome is that many of the scaffold features simply can’t be turned off by any compiler setting that I’ve found. If we group the scaffold functions into broad categories, it looks like this:

  • buffer overrun detection
  • SEH handling
  • run-time error checks (RTC)
  • math error handling and default precision
  • exit callbacks
  • thread local storage
  • command line args
  • managed/native app detection
  • locale settings

It would be great if there were compiler settings that could be used to disable each of these features when appropriate, for squeezing the last few hundred bytes out of the code. What’s maddening is that there are settings to disable the first two, but it appears they only prevent the features from being used in the main body of code. Support for the features is still present in the executable, because the scaffolding code uses them.

Another approach is to define a custom entry point for the program, and bypass the scaffolding completely. This could be as simple as adding

int MyEntryPoint()
{
    return main(0, NULL);
}

and then setting the program’s entry point to MyEntryPoint in the advanced linker settings. This causes all of the standard scaffold code to be omitted, and with my example program it shrunk the executable from 6144 to 2560 bytes. It also drastically reduced the number of external functions in the imports list, from 38 to 3.

Caution: when using this approach, none of the standard systems will be initialized. The program will misbehave or crash if it attempts to use the command line args, or thread local storage, or locale-dependent functions in the C runtime. The custom entry point can initialize many of these manually if needed. Most of the necessary functions like __getmainargs are documented in MSDN. The rest can be handled by using the debugger to examine the scaffold code, and copying what it does.

Read 5 comments and join the conversation 

5 Comments so far

  1. Buddy October 2nd, 2015 9:05 pm

    Check out how small you can get it — 97 bytes!
    http://www.phreedom.org/research/tinype/

  2. IuriC October 4th, 2015 4:31 pm

    Nice series of articles! Keep it going!

    IuriC

  3. WestfW November 15th, 2015 4:11 pm

    A similar analysis for an Arduino sketch is here: https://blog.adafruit.com/2015/11/15/what-happens-before-main/
    (I guess it’s a bit outdated by now, but should still be somewhat interesting.)

  4. Steve Chamberlin November 16th, 2015 5:15 pm

    Wait, a similar analysis for an Arduino sketch? No, that’s just my own BMOW write-up reposted at Adafruit.

  5. WestfW November 16th, 2015 6:07 pm

Leave a reply. Comments may not be monitored regularly. For product support questions, visit the Contact page.