In this blog post, I will talk about the capabilities of the Eclipse OMR Compiler’s option processing framework, its current limitations, and share my experience re-working the options processing framework to address these issues.
The OMR Compiler Options Processing Framework
The OMR Compiler technology’s option processing framework provides a very powerful mechanism to control its run-time behaviour. The options can be specified through environment variables, or as command line arguments when the compiler is being used as part of a runtime. Useful capabilities include controlling the level of optimization to apply, enabling or disabling features, and selecting the methods to compile and applying certain options to their compilations. These capabilities make the option processing framework a very useful tool for OMR Compiler developers, for everything from playing with experimental features to problem determination. To learn more about how to use the OMR Compiler options, you can read more here.
However, there are a number of issues that exist in the current options processing framework that make it difficult for both new and experienced OMR Compiler developers to work with.
Challenges of working with OMR::Options
To demonstrate some of the challenges of working with OMR::Options, I will walk you through the process of adding a new boolean option.
Regardless of whether this option is used within OMR or downstream projects, you will first need to go to omr/compiler/control/OMROptions.hpp and find the TR_CompilationOptions enum definition that looks something like this:
enum TR_CompilationOptions
   {
   . . .
   TR_DisableStripMining       = 0x00040000 + 8,
   TR_EnableSharedCacheTiming  = 0x00080000 + 8,
   // Available                = 0x00100000 + 8,
   . . .
   }
The boolean options are stored in a single 32-bit integer field and are set and queried by bitmasking with the values defined in enum TR_CompilationOptions. To add a new option, you need to define it in one of the available slots in the enum definition (e.g., 0x00100000 + 8 in the snippet above).
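To make the bitmasking concrete, here is a minimal Python sketch of setting and querying flags in one 32-bit word. It is only an illustration of the technique: the mask constants are made up for the example, the "+ 8" component of the real enum values is deliberately left out, and the helper names set_option and get_option are mine, not the OMR functions.

```python
# Illustrative masks only; the real values live in TR_CompilationOptions,
# and the "+ 8" component of those values is deliberately left out here.
DISABLE_STRIP_MINING = 0x00040000
ENABLE_SHARED_CACHE_TIMING = 0x00080000
WORD_MASK = 0xFFFFFFFF  # keep the field within 32 bits


def set_option(word: int, mask: int, value: bool = True) -> int:
    """Return the option word with the bit(s) in mask set or cleared."""
    if value:
        return (word | mask) & WORD_MASK
    return word & ~mask & WORD_MASK


def get_option(word: int, mask: int) -> bool:
    """Report whether every bit in mask is set in the option word."""
    return (word & mask) == mask


options_word = 0
options_word = set_option(options_word, DISABLE_STRIP_MINING)
print(get_option(options_word, DISABLE_STRIP_MINING))        # True
print(get_option(options_word, ENABLE_SHARED_CACHE_TIMING))  # False
```

Every new boolean option needs its own unused bit, which is why the enum has to be searched for an available slot.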
The next step is to add the ability to toggle this option through the command line, for which you will need to add an entry to the compiler option table defined in omr/compiler/control/OMROptions.cpp. The option table is sorted alphabetically, so you have to ensure that your new entry does not break the order. Here is what the option table looks like:
// The following options must be placed in alphabetical order for them to work properly
TR::OptionTable OMR::Options::_jitOptions[] = {
   { "abstractTimeGracePeriodInliningAggressiveness=", "O<nnn>Time to maintain full inlining aggressiveness\t",
        TR::Options::setStaticNumeric, (intptrj_t)&OMR::Options::_abstractTimeGracePeriod, 0, "F%d", NOT_IN_SUBSET },
   { "abstractTimeToReduceInliningAggressiveness=", "O<nnn>Time to lower inlining aggressiveness from highest to lowest level\t",
        TR::Options::setStaticNumeric, (intptrj_t)&OMR::Options::_abstractTimeToReduceInliningAggressiveness, 0, "F%d", NOT_IN_SUBSET },
   {"acceptHugeMethods", "O\tallow processing of really large methods", SET_OPTION_BIT(TR_ProcessHugeMethods), "F" },
   . . .
The option table in OMROptions.cpp is very large, mainly because it has to contain entries that are used in all downstream projects (e.g., OpenJ9). This is problematic because as more language runtimes consume OMR, more of these project-specific options would have to be added into this table. It may also result in running out of space in the enum definition to add any new boolean options. The option table has to be in alphabetical order because the command line options are matched to the appropriate entry in the table using a string comparison binary search.
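The dependence on ordering is easy to see in a small Python sketch of a binary search over option names. This is a simplification under my own assumptions: it matches whole names exactly and case-sensitively, whereas the real table matching may differ, and find_option is a hypothetical helper rather than an OMR function.

```python
from bisect import bisect_left

# Names taken from the table excerpt above; the list must already be sorted.
OPTION_NAMES = [
    "abstractTimeGracePeriodInliningAggressiveness=",
    "abstractTimeToReduceInliningAggressiveness=",
    "acceptHugeMethods",
]


def find_option(names, wanted):
    """Binary-search a sorted list of names; return the index or -1."""
    i = bisect_left(names, wanted)
    if i < len(names) and names[i] == wanted:
        return i
    return -1


print(find_option(OPTION_NAMES, "acceptHugeMethods"))  # 2

# An entry placed out of order is missed even though it is in the list.
unsorted_names = ["acceptHugeMethods"] + OPTION_NAMES[:2]
print(find_option(unsorted_names, "acceptHugeMethods"))  # -1
```

The second lookup fails silently, which is the kind of mistake a misplaced table entry can cause.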
The OMR::Options class is very large, combining all aspects of option processing in one place. As demonstrated above, anyone who wants to add a new option has a lot of code to navigate. Another limitation developers have to deal with is that there is no standard way to set default values for options, so they have to be set somewhere during the initialization of the options. This does not all happen in one place, which makes it harder to track how the options are set and modified during the lifetime of a compilation or runtime.
Redesigning the options processing framework
I spent a lot of time trying to grasp how the option processing mechanism worked and experimented with different approaches to solve the issues discussed in the previous section. My task was to redesign the OMR Options class to address known limitations without removing any existing capabilities or negatively affecting performance.
The final solution involved using a script to process some parts of the options framework at build time. The script, called `options-gen`, takes JSON files containing the information required to generate data fields for the options, as well as the table entries required for processing those options when they are specified in the command line arguments.
I’ve split the responsibilities of option processing across several components, namely:
- options-gen
- CompilerOptions
- CompilerOptionsManager
- OptionsBuilder
- OptionProcessors
I will now explain the roles of these components and talk about some of the design decisions I’ve made along the way. Hopefully, this will help you get a picture of how they work together and address the issues with the existing Options class.
options-gen
options-gen is a Python script that runs at build time and simplifies the management of the option list. The input to this script is a JSON file containing OMR options (and optionally, downstream project-specific options). It introduces a number of new capabilities to option processing:
- removes the need to have any downstream project options within the OMR project
- simplifies adding new options and removing existing ones
- provides a mechanism to set default values for options
- generates a hash table for option look up at build time, reducing the cost of initializing options at start-up
The file containing the list of OMR options is called OMROptions.json. In its current state, it contains downstream project options too. When downstream projects migrate to using the new option processing framework, the project specific options will be moved to Options.json in the downstream projects. This migration will happen over time as we identify what must stay in OMR and what needs to go to their project-specific option list. options-gen combines the data from OMROptions.json and Options.json and processes them together.
Here is what a JSON entry looks like:
{
  "name": "x86UseMFENCE",
  "category": "M",
  "desc": "Enable to use mfence to handle volatile store",
  "option-member": "TR_X86UseMFENCE",
  "type": "bool",
  "default": "false",
  "processing-fn": "setTrue",
  "subsettable": "no"
}
- name: The name field is used as an identifier for an option. Every entry should have a unique name. options-gen will warn you if there are duplicate option names. Options provided in the command line and the environment are matched against this field.
- category: The category field is used to group the options into different categories (eg, codegen, optimization, logging, etc). In this case, “M” stands for miscellaneous options. This information will be used when outputting options help text.
- desc: Option description text explaining what this option does. It will be used when outputting options help text.
- option-member: This field is used to identify the field in the options class that stores the value of the option.
- type: Type of the option data.
- default: This is the value the option is initialized to, and it will remain that way unless changed through the command line, an environment variable, or some other logic after options have been processed. For boolean options, the default is false. Numeric and textual options may use this field to set a default value, and may also use an additional field in the JSON object to supply a parameter to the processing function.
- processing-fn: This is the name of the function in the TR::OptionProcessors class that will be used to process this option if it appears on the command line or environment variable.
- subsettable: This field is used to set whether an option can be a part of an option subset to apply to specific method compilations.
To add a new option, all you need to do is to add a new entry anywhere in the JSON file and re-build. If the available option processing functions do not meet your requirements, you may define a new function in TR::OptionProcessors and specify its name in the processing-fn field.
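As a rough illustration of what a build-time generator can do with such entries, the Python sketch below reads them, reports duplicate names, and emits one data member per option. It is not the actual options-gen code: the top-level JSON list, the helper names, and the emitted declaration format are all assumptions made for the example.

```python
import json

# Same fields as the entry shown above; a real input file would list many entries.
# Wrapping the entries in a top-level list is an assumption of this sketch.
OPTIONS_JSON = """
[
  {
    "name": "x86UseMFENCE",
    "category": "M",
    "desc": "Enable to use mfence to handle volatile store",
    "option-member": "TR_X86UseMFENCE",
    "type": "bool",
    "default": "false",
    "processing-fn": "setTrue",
    "subsettable": "no"
  }
]
"""


def find_duplicate_names(entries):
    """Return option names that appear more than once, in first-seen order."""
    seen, duplicates = set(), []
    for entry in entries:
        name = entry["name"]
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates


def member_declarations(entries):
    """Emit one C++-style data member per option (hypothetical output format)."""
    return [
        f'{e["type"]} {e["option-member"]} = {e["default"]};'
        for e in entries
    ]


entries = json.loads(OPTIONS_JSON)
for name in find_duplicate_names(entries):
    print(f"warning: duplicate option name {name!r}")
print("\n".join(member_declarations(entries)))
```

Because the entries are plain data, their order in the file does not matter and the default value sits next to the option it belongs to.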
options-gen generates the following files at build time; they are included in different files of the new options processing classes:
- OptionCharMap.inc: an array of case-insensitive values associated with the characters used in the hashing function to determine the index into the hash table. This array is indexed using ASCII values.
- OptionTableProperties.inc: contains macro definitions regarding the option table properties such as min/max hash value, table size, etc
- OptionTableEntries.inc: contains an aggregate initialization list for the hash table
- Options.inc: contains the data members of the CompilerOptions class
- OptionTranslatingSwitch.inc: contains the body of a switch statement that helps translate between the option word masks used in the existing Options class and the corresponding pointer to member of the new CompilerOptions class. This is temporarily in place to enable querying the new options class using the existing getOption(TR_CompilationOptions o) and setOption(TR_CompilationOptions o) functions.
- OptionEnumToStringSwitch.inc: contains the body of a switch statement that translates the existing option masks to strings that can be used for debugging cases of option setting mismatches
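The translation step can be pictured with a short Python sketch in which a dictionary stands in for the generated switch statement. The enum values come from the excerpt earlier in the post; the member names, the CompilerOptions stand-in, and the helper functions are hypothetical and only mirror the idea of reaching a per-option field through the old mask.

```python
# Enum values copied from the TR_CompilationOptions excerpt earlier in the post.
TR_DisableStripMining = 0x00040000 + 8
TR_EnableSharedCacheTiming = 0x00080000 + 8

# Hypothetical mapping; the generated switch plays this role in C++.
MASK_TO_MEMBER = {
    TR_DisableStripMining: "TR_DisableStripMining",
    TR_EnableSharedCacheTiming: "TR_EnableSharedCacheTiming",
}


class CompilerOptions:
    """Stand-in for the new class: one boolean attribute per option."""

    def __init__(self):
        self.TR_DisableStripMining = False
        self.TR_EnableSharedCacheTiming = False


def set_option(options, mask, value=True):
    """Set the field that corresponds to an old-style option mask."""
    setattr(options, MASK_TO_MEMBER[mask], value)


def get_option(options, mask):
    """Read the field that corresponds to an old-style option mask."""
    return getattr(options, MASK_TO_MEMBER[mask])


new_options = CompilerOptions()
set_option(new_options, TR_DisableStripMining)
print(get_option(new_options, TR_DisableStripMining))       # True
print(get_option(new_options, TR_EnableSharedCacheTiming))  # False
```

Existing call sites can keep passing the old enum values while the data itself lives in the new fields.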
The option lookup table
The generated OptionCharMap.inc, OptionTableProperties.inc, and OptionTableEntries.inc files are used for the option lookup table. As discussed earlier, the existing option table design looks something like this:
{ {aa}, {ab}, {ad}, {ba}, . . NULL }
Only the option name field is shown for each entry (with dummy option names) to keep things simple. Matching a command-line option to its entry requires a binary search on the entire OMR option table, as well as on the front-end option table if the option was not found in the OMR option table.
Since we already know all the option names at build time, it was a good opportunity to make use of a hash table generated at build time. It had to be done by options-gen because not all of OMR’s C++ build compilers supported constant expressions to set this table up at build time. The approach involves options-gen reading the JSON files, calculating hash values using a hashing algorithm, creating an aggregate initializer list for an array of table entries, and writing it to OptionTableEntries.inc. This is what it looks like:
{ {{aa}}, {{ab},{ba}}, {}, {{ad}}, . . {} }
Again, I’ve simplified the entries to make it easier to understand. To get the entry for an option in the command line, OptionsBuilder simply calculates the hash value of the option using the same hashing algorithm used by options-gen to generate the table. The hash value is used to index into the table, obtaining the “bucket”. A bucket can be thought of as a row in the table. The bucket can have multiple entries if there are collisions, so after accessing the bucket, a string compare is required to verify that we got the correct entry, trying the other entries in the bucket if not. With the current set of boolean options in OMR, the worst case scenario is 4 entries in a single bucket, so the impact is minimal.
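The build-then-lookup flow can be sketched in a few lines of Python. The hash function here (a case-insensitive sum of character codes modulo a small table size) is a stand-in I picked for the example, not the one options-gen uses, and the case-insensitive string compare is likewise an assumption.

```python
# Hypothetical hash: sum of per-character values, case-insensitive,
# modulo a small table size chosen for the example.
TABLE_SIZE = 8


def char_value(ch: str) -> int:
    """Stand-in for the generated character map: upper and lower case agree."""
    return ord(ch.lower())


def hash_name(name: str) -> int:
    """Map an option name to a bucket index."""
    return sum(char_value(ch) for ch in name) % TABLE_SIZE


def build_table(names):
    """Build-time step: place every option name in its bucket."""
    table = [[] for _ in range(TABLE_SIZE)]
    for name in names:
        table[hash_name(name)].append(name)
    return table


def lookup(table, wanted):
    """Run-time step: hash to a bucket, then string-compare within it."""
    for candidate in table[hash_name(wanted)]:
        if candidate.lower() == wanted.lower():
            return candidate
    return None


table = build_table(["aa", "ab", "ad", "ba"])
print(lookup(table, "ba"))  # ba
print(lookup(table, "zz"))  # None
```

With this toy hash, "ab" and "ba" land in the same bucket, so the lookup for "ba" needs two string comparisons, much like the two-entry bucket in the illustration above.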
The hash table is neither perfect nor minimal. I spent some time investigating the possibility of generating a perfect hash table so that I could completely eliminate string comparisons when looking up the table. The approach I tried was to mimic gperf’s mechanism for generating perfect hash functions. The algorithm I implemented would try to come up with the right combination of associated values for every letter such that no two input strings would hash to the same value. This is why the OptionCharMap.inc file is generated: it is used by the C++ hashing algorithm to reach the same hash value for a given option name.
Although options-gen was able to create a perfect hash table, it turned out to be unsuitable in this case because there were many option names in OMR and they were too similar to each other. This resulted in options-gen requiring a long time (about a minute) to come up with a combination of associated values that made every option hash to a unique value. Another drawback of that approach was that the generated files would not be the same for the same input due to some randomness involved in the algorithm, which may complicate error reproduction and debugging. Furthermore, the hash value range was extremely large, resulting in a very large table. Therefore, I decided to allow collisions in the hash table.
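For readers curious about the search for associated values, here is a much-simplified Python sketch of the idea: keep trying random per-character values until every name hashes to a distinct number. It is not the algorithm from options-gen or gperf; the hash (a plain sum over all characters), the value range, the attempt limit, and the fixed seed are assumptions chosen to keep the example small and repeatable.

```python
import random


def hash_name(name, values):
    """Sum the associated value of each character (case-insensitive)."""
    return sum(values[ch] for ch in name.lower())


def find_perfect_values(names, max_value=16, max_attempts=10_000, seed=0):
    """Randomly try character values until every name hashes uniquely."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    alphabet = sorted({ch for name in names for ch in name.lower()})
    for _ in range(max_attempts):
        values = {ch: rng.randrange(max_value) for ch in alphabet}
        hashes = {hash_name(name, values) for name in names}
        if len(hashes) == len(names):
            return values
    return None  # gave up: no collision-free assignment found


# Dummy names; anagrams such as "ab" and "ba" could never be separated
# by this order-independent sum, so none are included.
names = ["aa", "ab", "ad", "bd"]
values = find_perfect_values(names)
print(values is not None)
```

Four short names are separated almost immediately, but the search space grows quickly as names become more numerous and more alike, which matches the slowdown described above.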
CompilerOptions
The members of the class CompilerOptions are used to store the option data to query from. The list of members of this class is in the generated Options.inc file. Every option gets its own field in this class.
For boolean options, this is a departure from the existing implementation, where a single 32-bit integer field is used to store all the option data. The integer field was read and set through bit manipulation, using the option flag words as masks. This approach was very space-efficient, but querying the bit settings had more overhead than simply querying boolean fields. To verify that, I implemented a benchmark that queried random options using the two approaches 1 billion times each; the boolean field approach was about 3x faster to query and set than the bit manipulation approach. The trade-off is that boolean fields need slightly more memory to store the boolean options.
OptionProcessors
OptionProcessors is an extensible class that contains the option processing functions. There will be a set of standard processing functions (e.g., setTrue, setFalse, setInt32, setStringFromOptionArg, etc.) that you may use for processing an option you add to the JSON files. If you need your option to be processed in a very specific way not handled by the existing processors (e.g., an option affecting multiple members of the option class), you may simply define your own and specify its name in the processing-fn field of the JSON object.
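The relationship between the processing-fn field and the processing functions can be sketched in Python as a lookup by name. This only mirrors the idea: the real OptionProcessors is a C++ class, and the function signatures, the CompilerOptions stand-in, and the process helper below are assumptions of the example.

```python
class CompilerOptions:
    """Stand-in for the generated class: one attribute per option."""

    def __init__(self):
        self.TR_X86UseMFENCE = False


class OptionProcessors:
    """Processing functions, looked up by the name given in processing-fn."""

    @staticmethod
    def setTrue(options, member):
        setattr(options, member, True)

    @staticmethod
    def setFalse(options, member):
        setattr(options, member, False)


def process(options, entry):
    """Apply the processing function named in a JSON-style entry."""
    processor = getattr(OptionProcessors, entry["processing-fn"])
    processor(options, entry["option-member"])


entry = {"option-member": "TR_X86UseMFENCE", "processing-fn": "setTrue"}
options = CompilerOptions()
process(options, entry)
print(options.TR_X86UseMFENCE)  # True
```

Adding a custom processor then amounts to adding one more function and naming it in the entry.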
CompilerOptionsManager and OptionsBuilder
CompilerOptionsManager is meant to manage the different aspects of option processing, such as initializing the options, returning the appropriate options for the current method being compiled, etc.
OptionsBuilder’s main role is to parse the command-line string, get the appropriate option table entries, and process the options (and option sets, if applicable).
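The parsing step might look roughly like the Python sketch below. It assumes a comma-separated option string in which a value follows an "=" sign, and it ignores option sets entirely; parse_options is a hypothetical helper, not part of the OMR code.

```python
def parse_options(option_string):
    """Split 'a,b=3' into [('a', None), ('b', '3')]; blank items are skipped."""
    parsed = []
    for item in option_string.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        parsed.append((name, value if sep else None))
    return parsed


print(parse_options("acceptHugeMethods,abstractTimeGracePeriodInliningAggressiveness=100"))
```

Each parsed name would then be looked up in the option table and handed to its processing function.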
Current status
I have opened a pull request contributing the main components of the new option processing framework I just discussed in this blog. It only introduces support for boolean options. By using #if/#else pre-processor directives, I set up a transition state where the new options can be used only by explicitly defining the NEW_OPTIONS macro, which can be done by running cmake configure with -DOMR_COMPILER_NEW_OPTIONS=ON. Currently, the new options are initialized with the existing options to make the transition state easier to manage, but eventually this will be managed by the CompilerOptionsManager.
To check for any potential performance regression, I used an OpenJ9 build that makes use of the new options and ran DayTrader startup and throughput benchmarks on x86-64 Linux, PPC64LE Linux, and Z Linux. The results suggest that initializing and querying the new option processing mechanism did not cause any performance regression, even though every call of getOption() also compared the result obtained from the existing Options object with the one from the new CompilerOptions object.
What’s next
Over the next few months, I will open a series of pull requests across OMR and OpenJ9 to add the remaining parts of the new options processing framework and eventually completely replace the existing Options class.
Thanks for reading my first Eclipse OMR blog!
About me
I will start the last year of my undergraduate program at the University of Alberta in September this year, majoring in Computing Science with a minor in Mathematics. I was introduced to the Eclipse OMR project in the summer of 2017, when I joined Dr. Sarah Nadi’s research group at the University of Alberta. As an undergraduate summer research assistant, I was part of the team that started a new IBM CAS research project working on a variability-aware static analysis tool for C++ projects. The goal was to leverage Clang’s static analysis capabilities and introduce a variant handling mechanism so that the tool can analyze multiple build variants simultaneously and report errors in any configuration. Such a tool would help eliminate errors that only manifest themselves in certain build configurations. This project was motivated by the OMR Compiler’s complex variability mechanism, which makes use of -D directives as well as varying the -I paths to determine the inheritance hierarchy for its extensible classes implementing a static polymorphism mechanism. A publication we made in late 2017 talks about the project in more detail, and you can read it here.
My work with Clang in the research project got me interested in working with and learning more about compilers, and so the following year I joined the OMR Compiler team in the IBM Toronto Software Lab to satisfy my program’s industrial experience requirement.
I was recently trying to figure out how we set default values for options in the current codebase, and it was less than completely intuitive. This new framework sounds great to me!
Great post, many thanks 🙂
Heroic! That framework has been begging for a revamp for a long, long time.