2.0.5-beta (revision 1823)
OpenMP Pragma And Region Instrumentor
Latest Release News

Modularization

The instrumentation functionality in OPARI2 was rewritten in a way to make it easier to add extended support for more paradigms than OpenMP and POMP2 user instrumentation. These changes are mostly under the hood, so the user experience should stay mostly the same. However some command line options have changed:

LINK STEP

OPARI2 uses a new mechanism to link files. The main advantage is, that no opari.rc file is needed anymore. Libraries can now be preinstrumented and parallel builds are supported. To achieve this, the handles for parallel regions are instrumented using a ctc_string.

POMP2

The POMP2 interface is not compatible with the original POMP interface. All functions of the new API begin with POMP2_. The declaration prototypes can be found in pomp2_lib.h.

POMP2_Parallel_fork

The POMP2_Parallel_fork() call has an additional argument to pass the requested number of threads to the POMP2 library. This allows the library to prepare data structures and allocate memory for the threads before they are created. The value passed to the library is determined as follows:

In Fortran, instead of omp_get_max_threads(), a wrapper function pomp_get_max_threads_XXX_X is used. This function is needed to avoid multiple definitions of omp_get_max_threads() since we do not know whether it is defined in the user code or not. Removing all definitions in the user code would require much more Fortran parsing than is done with OPARI2, since function definitions cannot easily be distinguished from variable definitions.

pomp_tpd

If it is necessary for the POMP2 library to pass information from the master thread to its children, the option –tpd can be used. OPARI2 uses the copyin clause to pass a threadprivate variable pomp_tpd to the newly spawned threads at the beginning of a parallel region. This is a 64 bit integer variable, since Fortran does not allow pointers. However a pointer can be stored in this variable, passed to child threads with the copyin clause (in C/C++ or Fortran) and later on be cast back to a pointer in the pomp library.

To support mixed programming (C/Fortran) the variable name depends on the name mangling of the Fortran compiler. This means, for GNU, Sun, Intel and PGI C compilers the variable is called pomp_tpd_ and for IBM it is called pomp_tpd in C. In Fortran it is of course always called pomp_tpd. The –tpd-mangling option can be used to change this. The variable is declared extern in all program units, so the pomp library contains the actual variable declaration of pomp_tpd as a 64 bit integer.

Tasking construct

In OpenMP 3.0 the new tasking construct was introduced. All parts of a program are now implicitly executed as tasks and the user gets the possibility of creating tasks that can be scheduled for asynchronous execution. Furthermore these tasks can be interrupted at certain scheduling points and resumed later on (see the OpenMP API 3.0 for more detailed information).

OPARI2 instruments functions POMP2_Task_create_begin and POMP2_Task_create_end to allow the recording of the task creation time. For the task execution time, the functions POMP2_Task_begin and POMP2_Task_end are instrumented in the code. To correctly record a profile or a trace of a program execution these different instances of tasks need to be differentiated. Since OpenMP does not provide Task ids, the performance measurement system needs to create and maintain own task ids. This cannot be done by code instrumentation as done by OPARI2 alone but requires some administration of task ids during runtime. To allow the measurement system to administrate these ids, additional task id parameters (pomp_old_task/pomp_new_task) were added to all functions belonging to OpenMP constructs which are task scheduling points. With this package there is a "dummy" library, which can be used as an adapter to your measurement system. This library contains all the relevant functionality to keep track of the different instances of tasks and it is highly recommended to use it as a template to implement your own adapter for your measurement system.

For more detailed information on this mechanism see:
"How to Reconcile Event-Based Performance Analysis with Tasking in OpenMP"
by Daniel Lorenz, Bernd Mohr, Christian Rössel, Dirk Schmidl, and Felix Wolf
In: Proc. of 6th Int. Workshop of OpenMP (IWOMP), LNCS, vol. 6132, pp. 109121
DOI: 10.1007/978-3-642-13217-9_9

Preprocessing of source files

OPARI2 allows to instrument preprocessed source files. This feature is useful when header files contain OpenMP code or when preprocessor defines are used around OpenMP constructs. To use this feature, some special steps have to be taken before the instrumentation, to ensure, that the instrumented code is at the right position.

These steps are:

  1. Add the following lines as first lines of your source file.
   #include <stdint.h>
   #include <opari2/pomp2_lib.h>
   ___POMP2_INCLUDE___
  1. Preprocess the source file with the same preprocessor defines as usual. Usually compilers provide an option to do this step (e.g. -E for the gcc compiler). We recommend to use the same compiler for this step and for the compilation later on, since compilers set additional defines.
  2. Instrument the generated file with OPARI2 and the flag –preprocessed.
  3. Proceed with the instrumented file as usual.