Timemory
This profiling tool is designed to work with the modular performance analysis library timemory. Timemory is a template library whose goal is to provide a flexible and highly efficient API for implementing one or more performance analysis tools in the form of "components", which are recursively injected by other components or explicitly listed in variadic tuple-like structures.
This connector provides an easy method for Kokkos developers to generate custom analysis metrics: Kokkos developers can write their own component specifications and then add the name of the component to the profile_entry_t component tuple. In general, a component that collects data should be a struct with the following minimum specification:
- A default constructor
- Inherit from CRTP base class base<this_type, value_type>
  - this_type is the component itself
  - value_type is the measurement data type
    - E.g. a wall-clock timer may store the start/stop values as integer timestamps
- T get() const
  - T can be any type
    - E.g. a wall-clock timer may convert the difference of the integer timestamps into a double-precision floating-point value
- U get_display() const
  - U can be any type
  - This should return a type that supports operator<< for printing to screen
- static value_type record()
  - This is used for one-time measurements
- static std::string label()
  - This is used for generating the output file name
- static std::string description()
- void start() member function
  - This should (generally) update value_type value inherited from the base class
- void stop() member function
  - This should (generally) update value_type accum inherited from the base class with the delta
namespace tim
{
namespace component
{
// Counts how many times the instrumented region is entered.
struct trip_count : public base<trip_count, int64_t>
{
    using value_type = int64_t;
    using this_type  = trip_count;
    using base_type  = base<this_type, value_type>;

    static std::string label() { return "trip_count"; }
    static std::string description() { return "Number of invocations"; }
    // Each invocation counts as one trip.
    static value_type record() { return 1; }

    value_type get() const { return accum; }
    value_type get_display() const { return get(); }

    // start() stores the measurement in `value`; stop() adds it to `accum`.
    void start() { value = record(); }
    void stop() { accum += value; }
};
}  // namespace component
}  // namespace tim
Several pre-built components are provided; these can be queried with the timemory-avail command-line tool.
Using the pre-built components or a user-built component is straightforward. Components can be explicitly specified at compile time or used within other components. For example, component_tuple<wall_clock, cpu_clock> creates a single handle for a wall-clock timer and a cpu-clock timer that are started and stopped via the obj.start() and obj.stop() member functions. component_tuple<user_tuple_bundle> obj combined with user_tuple_bundle::configure<wall_clock, cpu_clock>() accomplishes the same result. With the former method, direct access to the tools is possible in C++, e.g. obj.get<wall_clock>().start(), but the components must be specified at compile time. The latter method does not allow direct access to the tools but allows them to be configured dynamically at runtime.
First install the timemory library. The timemory library uses a standard CMake build system. It is recommended to toggle the statistics settings as desired and to build the Python interface, which provides plotting for the Kokkos output in addition to the Python profiling and instrumentation capabilities.
git clone https://github.com/NERSC/timemory.git timemory
mkdir build-timemory
cd build-timemory
cmake -DCMAKE_INSTALL_PREFIX=/usr/local -DBUILD_STATIC_LIBS=OFF ../timemory
make -j8
make install -j8
The TIMEMORY_REQUIRE_PACKAGES=ON option adds REQUIRED to every find_package(...) call. This is useful for ensuring that the installation includes all the tools you might want to
use in Kokkos-tools. External packages
The build system uses CMake. Prepend the root folder of the timemory installation to the CMAKE_PREFIX_PATH environment variable, run CMake, and build/install. Various options of the form USE_<PACKAGE> should be configured to activate the corresponding external libraries. Timers and memory measurements require no external packages. By default, all forms of output are generated.
The output comes in four forms: printed to the screen at the end of the application, written to text at the end of the application, written to JSON at the end of the application, and, provided the Python interface was built and JSON output is enabled, plotted from the JSON output. If any errors occur in the plotting, ensure the timemory Python installation is in PYTHONPATH and the Python installation includes matplotlib and pillow.
The default kp_timemory.so connector library uses the KOKKOS_TIMEMORY_COMPONENTS environment variable to specify the components to measure. The option BUILD_CONFIG=ON will generate several connector libraries that explicitly measure certain components, e.g. kp_timemory_timers.so will generate a connector for component_tuple<wall_clock, cpu_clock, cpu_util>, kp_timemory_memory.so will generate a connector for component_tuple<peak_rss, page_rss, virtual_memory>, etc.
For roofline capabilities, PAPI is used for generating the CPU roofline and CUPTI is used for generating the (NVIDIA) GPU roofline. Rooflines require running the application twice. First set KOKKOS_ROOFLINE=ON, set TIMEMORY_OUTPUT_PATH=<OUTPUT-DIR>, run once with TIMEMORY_ROOFLINE_MODE=op, and then a second time with TIMEMORY_ROOFLINE_MODE=ai. At the end of the application, timemory will run roofline models to empirically determine the peak performance of the hardware. Then use the timemory-roofline command-line tool and specify the files output during the run, e.g. timemory-roofline -t gpu_roofline -op timemory-output/gpu_roofline_counters.json -ai timemory-output/gpu_roofline_activity.json.
Unless KOKKOS_ROOFLINE is set to ON, output will be located in timemory-output/<DATE-TIME>. The <DATE-TIME> uses strftime formatting specifications and can be altered via the TIMEMORY_TIME_FORMAT environment variable. The default is "%F_%I.%M_%p".
SAND2017-3786