Skip site navigation (1)Skip section navigation (2)

FreeBSD Manual Pages

  
 
  

home | help
LLVMOPENMP(1)			  LLVM/OpenMP			 LLVMOPENMP(1)

NAME
       llvmopenmp - LLVM/OpenMP

       NOTE:
	  This document	is a work in progress and most of the expected content
	  is not yet available.	While you can expect changes, we  always  wel-
	  come	 feedback   and	  additions.  Please  contact,	e.g.,  through
	  openmp-dev@lists.llvm.org.

       OpenMP impacts various parts of the LLVM	project, from the frontends (-
       Clang and Flang), through middle-end optimizations, up to the multitude
       of available OpenMP runtimes.

       A high-level overview of	OpenMP in LLVM can be found here.

OPENMP IN LLVM --- DESIGN OVERVIEW
   Resources
       o OpenMP	 Booth	@  SC19:  "OpenMP   clang   and	  flang	  Development"
	 https://youtu.be/6yOa-hRi63M

   LLVM/OpenMP Runtimes
       There are four distinct types of	LLVM/OpenMP runtimes

   LLVM/OpenMP Host Runtime (libomp)
       An  early (2015)	design document	for the	LLVM/OpenMP host runtime, aka.
       libomp.so, is available as a pdf.

   LLVM/OpenMP Target Host Runtime (libomptarget)
   Environment Variables
       libomptarget uses environment variables to control  different  features
       of  the	library	at runtime. This allows	the user to obtain useful run-
       time information	as well	as enable or disable certain features. A  full
       list of supported environment variables is defined below.

	  o LIBOMPTARGET_DEBUG=<Num>

	  o LIBOMPTARGET_PROFILE=<Filename>

	  o LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD=<Num>

	  o LIBOMPTARGET_INFO=<Num>

   LIBOMPTARGET_DEBUG
       LIBOMPTARGET_DEBUG  controls  whether or	not debugging information will
       be displayed. This feature is only availible if libomptarget was	 built
       with  -DOMPTARGET_DEBUG.	 The debugging output provided is intended for
       use by libomptarget developers. More user-friendly output is  presented
       when using LIBOMPTARGET_INFO.

   LIBOMPTARGET_PROFILE
       LIBOMPTARGET_PROFILE  allows libomptarget to generate time profile out-
       put similar to Clang's -ftime-trace option. This	generates a JSON  file
       based on	Chrome Tracing that can	be viewed with chrome://tracing	or the
       Speedscope App. Building	this feature depends on	the LLVM  Support  Li-
       brary  for  time	trace output. Using this library is enabled by default
       when building using the CMake option OPENMP_ENABLE_LIBOMPTARGET_PROFIL-
       ING. The	output will be saved to	the filename specified by the environ-
       ment variable. For multi-threaded applications, profiling in libomp  is
       also needed. Setting the	CMake option OPENMP_ENABLE_LIBOMP_PROFILING=ON
       to enable the feature. Note that	this will turn libomp into a  C++  li-
       brary.

   LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD
       LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD sets the threshold	size for which
       the libomptarget	memory manager will handle the allocation. Any alloca-
       tions larger than this threshold	will not use the memory	manager	and be
       freed after the device kernel exits. The	 default  threshold  value  is
       8KB.  If	 LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD	is set to 0 the	memory
       manager will be completely disabled.

   LIBOMPTARGET_INFO
       LIBOMPTARGET_INFO allows	the user to request different types of runtime
       information from	libomptarget. LIBOMPTARGET_INFO	uses a 32-bit field to
       enable or disable different types of information. This includes	infor-
       mation  about  data-mappings and	kernel execution. It is	recommended to
       build your application with debugging information  enabled,  this  will
       enable filenames	and variable declarations in the information messages.
       OpenMP Debugging	information is enabled at any level of debugging so  a
       full  debug runtime is not required.  For minimal debugging information
       compile with -gline-tables-only,	or compile with	-g for full debug  in-
       formation. A full list of flags supported by LIBOMPTARGET_INFO is given
       below.

	  o Print all data arguments upon entering an  OpenMP  device  kernel:
	    0x01

	  o Indicate  when  a mapped address already exists in the device map-
	    ping table:	0x02

	  o Dump the contents of the device pointer map	at kernel exit:	0x04

	  o Print OpenMP kernel	information from device	plugins: 0x10

       Any combination of these	flags can be used by setting  the  appropriate
       bits. For example, to enable printing all data active in	an OpenMP tar-
       get region along	with CUDA information, run the following bash command.

	  $ env	LIBOMPTARGET_INFO=$((1 << 0x1 |	1 << 0x10)) ./your-application

       Or, to enable every flag	run with every bit set.

	  $ env	LIBOMPTARGET_INFO=-1 ./your-application

       For example, given a small application implementing the ZAXPY BLAS rou-
       tine,  Libomptarget  can	provide	useful information about data mappings
       and thread usages.

	  #include <complex>

	  using	complex	= std::complex<double>;

	  void zaxpy(complex *X, complex *Y, complex D,	std::size_t N) {
	  #pragma omp target teams distribute parallel for
	    for	(std::size_t i = 0; i <	N; ++i)
	      Y[i] = D * X[i] +	Y[i];
	  }

	  int main() {
	    const std::size_t N	= 1024;
	    complex X[N], Y[N],	D;
	  #pragma omp target data map(to:X[0 : N]) map(tofrom:Y[0 : N])
	    zaxpy(X, Y,	D, N);
	  }

       Compiling this code targeting nvptx64 with all information enabled will
       provide the following output from the runtime library.

	  $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 -gline-tables-only zaxpy.cpp -o zaxpy
	  $ env	LIBOMPTARGET_INFO=-1 ./zaxpy

	  Info:	Device supports	up to 65536 CUDA blocks	and 1024 threads with a	warp size of 32
	  Info:	Entering OpenMP	data region at zaxpy.cpp:14:1 with 2 arguments:
	  Info:	to(X[0:N])[16384]
	  Info:	tofrom(Y[0:N])[16384]
	  Info:	OpenMP Host-Device pointer mappings after block	at zaxpy.cpp:14:1:
	  Info:	Host Ptr	   Target Ptr	      Size (B) RefCount	Declaration
	  Info:	0x00007fff963f4000 0x00007fd225004000 16384    1	Y[0:N] at zaxpy.cpp:13:17
	  Info:	0x00007fff963f8000 0x00007fd225000000 16384    1	X[0:N] at zaxpy.cpp:13:11
	  Info:	Entering OpenMP	kernel at zaxpy.cpp:6:1	with 4 arguments:
	  Info:	firstprivate(N)[8] (implicit)
	  Info:	use_address(Y)[0] (implicit)
	  Info:	tofrom(D)[16] (implicit)
	  Info:	use_address(X)[0] (implicit)
	  Info:	Mapping	exists (implicit) with HstPtrBegin=0x00007ffe37d8be80,
		TgtPtrBegin=0x00007f90ff004000,	Size=0,	updated	RefCount=2, Name=Y
	  Info:	Mapping	exists (implicit) with HstPtrBegin=0x00007ffe37d8fe80,
		TgtPtrBegin=0x00007f90ff000000,	Size=0,	updated	RefCount=2, Name=X
	  Info:	Launching kernel __omp_offloading_fd02_c2c4ac1a__Z5daxpyPNSt3__17complexIdEES2_S1_m_l6
		with 8 blocks and 128 threads in SPMD mode
	  Info:	OpenMP Host-Device pointer mappings after block	at zaxpy.cpp:6:1:
	  Info:	Host Ptr	   Target Ptr	      Size (B) RefCount	Declaration
	  Info:	0x00007fff963f4000 0x00007fd225004000 16384    1	Y[0:N] at zaxpy.cpp:13:17
	  Info:	0x00007fff963f8000 0x00007fd225000000 16384    1	X[0:N] at zaxpy.cpp:13:11
	  Info:	Exiting	OpenMP data region at zaxpy.cpp:14:1 with 2 arguments:
	  Info:	to(X[0:N])[16384]
	  Info:	tofrom(Y[0:N])[16384]

       From  this  information,	we can see the OpenMP kernel being launched on
       the CUDA	device with enough threads and blocks for all 1024  iterations
       of  the	loop  in simplified SPMD Mode. The information from the	OpenMP
       data region shows the two arrays	X and Y	being copied from the host  to
       the  device. This creates an entry in the host-device mapping table as-
       sociating the host pointers to the newly	created	device data. The  data
       mappings	 in  the  OpenMP device	kernel show the	default	mappings being
       used for	all the	variables used implicitly on the device. Because X and
       Y are already mapped in the device's table, no new entries are created.
       Additionally, the default mapping shows that D will be copied back from
       the  device  once  the  OpenMP device kernel region ends	even though it
       isn't written to. Finally, at the end of	the OpenMP data	region the en-
       tries for X and Y are removed from the table.

   Errors:
       libomptarget  provides error messages when the program fails inside the
       OpenMP target region. Common causes of  failure	could  be  an  invalid
       pointer access, running out of device memory, or	trying to offload when
       the device is busy. If the application was built	with debugging symbols
       the error messages will additionally provide the	source location	of the
       OpenMP target region.

       For example, consider the following code	that implements	a simple  par-
       allel  reduction	on the GPU. This code has a bug	that causes it to fail
       in the offloading region.

	  #include <cstdio>

	  double sum(double *A,	std::size_t N) {
	    double sum = 0.0;
	  #pragma omp target teams distribute parallel for reduction(+:sum)
	    for	(int i = 0; i <	N; ++i)
	      sum += A[i];

	    return sum;
	  }

	  int main() {
	    const int N	= 1024;
	    double A[N];
	    sum(A, N);
	  }

       If this code is compiled	and run, there will be an error	message	 indi-
       cating what is going wrong.

	  $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 -gline-tables-only sum.cpp -o	sum
	  $ ./sum

	  CUDA error: Error when copying data from device to host.
	  CUDA error: an illegal memory	access was encountered
	  Libomptarget error: Copying data from	device failed.
	  Libomptarget error: Call to targetDataEnd failed, abort target.
	  Libomptarget error: Failed to	process	data after launching the kernel.
	  Libomptarget error: Run with LIBOMPTARGET_INFO=4 to dump host-target pointer mappings.
	  sum.cpp:5:1: Libomptarget error 1: failure of	target construct while offloading is mandatory

       This  shows  that there is an illegal memory access occuring inside the
       OpenMP target region once execution has moved to	the CUDA device,  sug-
       gesting	a  segmentation	 fault.	 This  then causes a chain reaction of
       failures	in libomptarget. Another message suggests using	the LIBOMPTAR-
       GET_INFO	environment variable as	described in Environment Variables. If
       we do this it will print	the sate of the	host-target  pointer  mappings
       at the time of failure.

	  $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 -gline-tables-only sum.cpp -o	sum
	  $ env	LIBOMPTARGET_INFO=4 ./sum

	  info:	OpenMP Host-Device pointer mappings after block	at sum.cpp:5:1:
	  info:	Host Ptr	   Target Ptr	      Size (B) RefCount	Declaration
	  info:	0x00007ffc058280f8 0x00007f4186600000 8	       1	sum at sum.cpp:4:10

       This tells us that the only data	mapped between the host	and the	device
       is the sum variable that	will be	copied back from the device  once  the
       reduction  has ended. There is no entry mapping the host	array A	to the
       device. In this situation, the compiler cannot determine	 the  size  of
       the  array at compile time so it	will simply assume that	the pointer is
       mapped on the device already by default.	The solution is	to add an  ex-
       plicit map clause in the	target region.

	  double sum(double *A,	std::size_t N) {
	    double sum = 0.0;
	  #pragma omp target teams distribute parallel for reduction(+:sum) map(to:A[0 : N])
	    for	(int i = 0; i <	N; ++i)
	      sum += A[i];

	    return sum;
	  }

   OpenMP in LLVM --- Offloading Design
   OpenMP Target Offloading ---	SPMD Mode
   OpenMP Target Offloading ---	Generic	Mode
   LLVM/OpenMP Target Host Runtime Plugins (libomptarget.rtl.XXXX)
   Remote Offloading Plugin:
       The remote offloading plugin permits the	execution of OpenMP target re-
       gions on	devices	in remote hosts	in addition to the  devices  connected
       to  the	local  host. All target	devices	on the remote host will	be ex-
       posed to	the application	as if they were	local devices,	that  is,  the
       remote  host  CPU  or its GPUs can be offloaded to with the appropriate
       device number. If the server is running on the same host,  each	device
       may  be	identified  twice:  once  through  the device plugins and once
       through the device plugins that the server application has access to.

       This plugin consists  of	 libomptarget.rtl.rpc.so  and  openmp-offload-
       ing-server which	should be running on the (remote) host.	The server ap-
       plication does not have to be running on	a remote host, and can instead
       be  used	 on  the same host in order to debug memory mapping during of-
       floading.  These	are implemented	via gRPC/protobuf so  these  libraries
       are  required  to  build	and use	this plugin. The server	must also have
       access to the necessary target-specific plugins in order	to perform the
       offloading.

       Due  to	the experimental nature	of this	plugin,	the CMake variable LI-
       BOMPTARGET_ENABLE_EXPERIMENTAL_REMOTE_PLUGIN must be set	 in  order  to
       build  this  plugin.  For example, the rpc plugin is not	designed to be
       thread-safe, the	server cannot concurrently handle offloading from mul-
       tiple applications at once (it is synchronous) and will terminate after
       a single	execution. Note	that openmp-offloading-server is unable	to re-
       mote  offload  onto  a remote host itself and will error	out if this is
       attempted.

       Remote offloading is configured via environment variables at runtime of
       the OpenMP application:

	      o	LIBOMPTARGET_RPC_ADDRESS=<Address>:<Port>

	      o	LIBOMPTARGET_RPC_ALLOCATOR_MAX=<NumBytes>

	      o	LIBOMPTARGET_BLOCK_SIZE=<NumBytes>

	      o	LIBOMPTARGET_RPC_LATENCY=<Seconds>

   LIBOMPTARGET_RPC_ADDRESS
       The  address  and port at which the server is running. This needs to be
       set for the server and the application, the default is 0.0.0.0:50051. A
       single OpenMP executable	can offload onto multiple remote hosts by set-
       ting this to comma-seperated values of the addresses.

   LIBOMPTARGET_RPC_ALLOCATOR_MAX
       After allocating	this size, the protobuf	allocator will clear. This can
       be set for both endpoints.

   LIBOMPTARGET_BLOCK_SIZE
       This  is	 the  maximum  size  of	 a single message while	streaming data
       transfers between the two endpoints and can be set for both endpoints.

   LIBOMPTARGET_RPC_LATENCY
       This is the maximum amount of time the client will wait for a  response
       from the	server.

   LLVM/OpenMP Target Device Runtime (libomptarget-ARCH-SUBARCH.bc)
       LLVM,  since version 11 (12 Oct 2020), has an OpenMP-Aware optimization
       pass as well as the ability to perform  "scalar	optimizations"	across
       OpenMP region boundaries.

       In-depth	discussion of the topic	can be found here.

OPENMP OPTIMIZATIONS IN	LLVM
       LLVM,  since version 11 (12 Oct 2020), has an OpenMP-Aware optimization
       pass as well as the ability to perform  "scalar	optimizations"	across
       OpenMP region boundaries.

   OpenMP-Aware	Optimizations
   OpenMPOpt
   Resources
       o 2020  LLVM Developersa	Meeting: "(OpenMP) Parallelism-Aware Optimiza-
	 tions"	https://youtu.be/gtxWkeLCxmU

       o 2019  EuroLLVM	 Developersa  Meeting:	"Compiler  Optimizations   for
	 (OpenMP) Target Offloading to GPUs" https://youtu.be/3AbS82C3X30

   OpenMP-Unaware Optimizations
   Resources
       o 2018  LLVM  Developersa  Meeting: "Optimizing Indirections, using ab-
	 stractions without remorse" https://youtu.be/zfiHaPaoQPc

       o 2019 LLVM Developersa	Meeting:  "The	Attributor:  A	Versatile  In-
	 ter-procedural		 Fixpoint	  Iteration	    Framework"
	 https://youtu.be/CzWkc_JcfS0

       LLVM has	an elaborate ecosystem around analysis	and  optimization  re-
       marks  issues  during  compilation. The remarks can be enabled from the
       clang frontend [1] [2] in various formats [3] [4] to be used by	tools,
       i.a., opt-viewer	or llvm-opt-report (dated).

       The  OpenMP  optimizations in LLVM have been developed with remark sup-
       port as a priority. For a list of OpenMP	specific remarks and more  in-
       formation on them, please refer to remarks/OptimizationRemarks.

       o [1]
	 https://clang.llvm.org/docs/UsersManual.html#options-to-emit-optimization-reports

       o [2]
	 https://clang.llvm.org/docs/ClangCommandLineReference.html#diagnostic-flags

       o [3]
	 https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-foptimization-record-file

       o [4]
	 https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang1-fsave-optimization-record

OPENMP OPTIMIZATION REMARKS
       The OpenMP-Aware	optimization pass is able to generate compiler remarks
       for  performed  and  missed   optimisations.   To   emit	  them,	  pass
       -Rpass=openmp-opt,	     -Rpass-analysis=openmp-opt,	   and
       -Rpass-missed=openmp-opt	to the Clang invocation.  For more information
       and  features  of  the  remark system the clang documentation should be
       consulted:

       o Clang options to emit optimization reports

       o Clang diagnostic and remark flags

       o The	   -foptimization-record-file	    flag	and	   the
	 -fsave-optimization-record flag

       Some  OpenMP remarks start with a "tag",	like [OMP100], which indicates
       that there is further information about them on this page. To  directly
       jump	 to	 the	  respective	  entry,      navigate	    to
       https://openmp.llvm.org/docs/remarks/OptimizationRemarks.html#ompXXX
       where XXX is the	three digit code shown in the tag.

					----

   [OMP100] Potentially	unknown	OpenMP target region caller
       A function remark that indicates	the function, when compiled for	a GPU,
       is potentially called from outside the translation unit.	 Note  that  a
       remark  is  only	 issued	 if  we	tried to perform an optimization which
       would require us	to know	all callers on the GPU.

       To facilitate OpenMP semantics on GPUs we provide a  runtime  mechanism
       through	which  the code	that makes up the body of a parallel region is
       shared with the threads in the team. Generally we use  the  address  of
       the outlined parallel region to identify	the code that needs to be exe-
       cuted. If we know all target regions that reach the parallel region  we
       can  avoid  this	 function pointer passing scheme and often improve the
       register	usage on the GPU. However, If a	parallel region	on the GPU  is
       in  a function with external linkage we may not know all	callers	stati-
       cally. If there are outside callers within target regions, this	remark
       is  to  be  ignored. If there are no such callers, users	can modify the
       linkage and  thereby  help  optimization	 with  a  static  or  __attri-
       bute__((internal))  function annotation.	If changing the	linkage	is im-
       possible, e.g., because there are outside callers on the	host, one  can
       split the function into an external visible interface which is not com-
       piled for the target and	an internal implementation which  is  compiled
       for the target and should be called from	within the target region.

       Dealing	with  OpenMP can be complicated. For help with the setup of an
       OpenMP (offload)	capable	compiler  toolchain,  its  usage,  and	common
       problems, consult the Support and FAQ page.

       We  also	 encourage  everyone  interested  in OpenMP in LLVM to get in-
       volved.

SUPPORT, GETTING INVOLVED, AND FAQ
       Please do not hesitate to reach out to us via openmp-dev@lists.llvm.org
       or join one of our regular calls. Some common questions are answered in
       the FAQ.

   Calls
   OpenMP in LLVM Technical Call
       o Development updates on	OpenMP (and OpenACC) in	the LLVM Project,  in-
	 cluding Clang,	optimization, and runtime work.

       o Join OpenMP in	LLVM Technical Call.

       o Time: Weekly call on every Wednesday 7:00 AM Pacific time.

       o Meeting minutes are here.

       o Status	tracking page.

   OpenMP in Flang Technical Call
       o Development updates on	OpenMP and OpenACC in the Flang	Project.

       o Join OpenMP in	Flang Technical	Call

       o Time: Weekly call on every Thursdays 8:00 AM Pacific time.

       o Meeting minutes are here.

       o Status	tracking page.

   FAQ
       NOTE:
	  The  FAQ  is	a work in progress and most of the expected content is
	  not yet available. While you can expect changes, we  always  welcome
	  feedback    and    additions.	   Please   contact,   e.g.,   through
	  openmp-dev@lists.llvm.org.

   Q: How to contribute	a patch	to the webpage or any other part?
       All patches go through the regular LLVM review process.

   Q: How to build an OpenMP offload capable compiler?
       To build	an effective OpenMP offload capable compiler, only  one	 extra
       CMake  option,  LLVM_ENABLE_RUNTIMES="openmp",  is needed when building
       LLVM (Generic information about	building  LLVM	is  available  here.).
       Make  sure  all	backends that are targeted by OpenMP to	be enabled. By
       default,	Clang will be built with all backends enabled.

       If your build machine is	not the	target machine or automatic  detection
       of the available	GPUs failed, you should	also set:

       o CLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_XX where XX	is the architecture of
	 your GPU, e.g,	80.

       o LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=YY where YY  is  the  numeric
	 compute capacity of your GPU, e.g., 75.

       NOTE:
	  The  compiler	 that  generates  the  offload code should be the same
	  (version) as the compiler that builds	the  OpenMP  device  runtimes.
	  The OpenMP host runtime can be built by a different compiler.

   Q: Does OpenMP offloading support work in pre-packaged LLVM releases?
       For  now,  the  answer is most likely no. Please	see Q: How to build an
       OpenMP offload capable compiler?.

   Q: Does OpenMP offloading support work in packages distributed as  part  of
       my OS?
       For  now,  the  answer is most likely no. Please	see Q: How to build an
       OpenMP offload capable compiler?.

   Q: Does Clang support _math.h_ and _complex.h_ operations in	OpenMP	target
       on GPUs?
       Yes,  LLVM/Clang	allows math functions and complex arithmetic inside of
       OpenMP target regions that are compiled for GPUs.

       Clang provides a	set of wrapper	headers	 that  are  found  first  when
       math.h  and  complex.h,	for  C,	cmath and complex, for C++, or similar
       headers are included by the application.	These wrappers will eventually
       include	the system version of the corresponding	header file after set-
       ting up a target	device specific	environment. The fact that the	system
       header is included is important because they differ based on the	archi-
       tecture and operating system and	may  contain  preprocessor,  variable,
       and function definitions	that need to be	available in the target	region
       regardless of the targeted device architecture.	However, various func-
       tions  may  require  specialized	device versions, e.g., sin, and	others
       are only	available on certain devices,  e.g.,  __umul64hi.  To  provide
       "native"	 support  for math and complex on the respective architecture,
       Clang will wrap the "native" math functions, e.g., as provided  by  the
       device  vendor, in an OpenMP begin/end declare variant. These functions
       will then be picked up instead of the host  versions  while  host  only
       variables  and function definitions are still available.	Complex	arith-
       metic and functions are support through	a  similar  mechanism.	It  is
       worth  noting  that  this support requires extensions to	the OpenMP be-
       gin/end declare variant	context	 selector  that	 are  exposed  through
       LLVM/Clang to the user as well.

   Q: What is a	way to debug errors from mapping memory	to a target device?
       An  experimental	way to debug these errors is to	use remote process of-
       floading.   By  using   libomptarget.rtl.rpc.so	 and   openmp-offload-
       ing-server,  it	is possible to explicitly perform memory transfers be-
       tween processes on the host CPU and run sanitizers while	 doing	so  in
       order to	catch these errors.

       The current (in-progress) release notes can be found here while release
       notes for releases, starting with LLVM 12, will	be  available  on  the
       Download	Page.

OPENMP 12.0.0 RELEASE NOTES
       WARNING:
	  These	 are  in-progress  notes for the upcoming LLVM 12.0.0 release.
	  Release notes	for previous releases can be  found  on	 the  Download
	  Page.

   Introduction
       This  document  contains	 the release notes for the OpenMP runtime, re-
       lease 12.0.0.  Here we describe the status of OpenMP,  including	 major
       improvements  from  the	previous  release.  All	OpenMP releases	may be
       downloaded from the LLVM	releases web site.

   Non-comprehensive list of changes in	this release
       o Extended the libomptarget API functions to  include  source  location
	 information  and OpenMP target	mapper support.	This allows libomptar-
	 get to	know the source	location of the	OpenMP region it is executing,
	 as well as the	name and declarations of all the variables used	inside
	 the region. Each function generated now uses its mapper variant.  The
	 old  API calls	now call into the new API functions with nullptr argu-
	 ments for backwards compatibility with	old binaries. Source  location
	 information  for  libomptarget	is now generated by Clang at any level
	 of debugging information.

       o Added improved	error messages for libomptarget	and CUDA plugins.  Er-
	 ror messages are now presented	without	requiring a debug build	of li-
	 bomptarget. The newly added source location information can  also  be
	 used  to identify which OpenMP	target region the failure occurred in.
	 More information can be found here.

       o Added additional environment variables	to control output from the li-
	 bomptarget  runtime  library.	LIBOMPTARGET_PROFILE  to generate time
	 profile output	similar	to Clang's  -ftime-trace  option.   LIBOMPTAR-
	 GET_MEMORY_MANAGER_THRESHOLD  sets  the  threshold size for which the
	 libomptarget memory manager will handle the  allocation.   LIBOMPTAR-
	 GET_INFO  allows the user to request certain information from the li-
	 bomptarget runtime using a 32-bit field. A full description  of  each
	 environment variable is described here.

       o target	 nowait	 was supported via hidden helper task, which is	a task
	 not bound to any parallel region. A hidden helper team	with a	number
	 of  threads  is  created when the first hidden	helper task is encoun-
	 tered.	The number of threads can be configured	 via  the  environment
	 variable LIBOMP_NUM_HIDDEN_HELPER_THREADS. By default it is 8.	If LI-
	 BOMP_NUM_HIDDEN_HELPER_THREADS=0, hidden helper task is disabled  and
	 falls	back to	a regular OpenMP task. It can also be disabled by set-
	 ting the environment variable LIBOMP_USE_HIDDEN_HELPER_TASK=OFF.

       o deviceRTLs for	NVPTX platform is  CUDA	 free  now.  It	 is  generally
	 OpenMP	  code.	   Target   dependent	parts	are  implemented  with
	 Clang/LLVM/NVVM intrinsics. CUDA SDK is also dropped as a  dependence
	 to  build  the	device runtime,	which means device runtime can also be
	 built on a CUDA free system. However, it is disabled by default.  Set
	 the  CMake  variable  LIBOMPTARGET_BUILD_NVPTX_BCLIB=ON to enable the
	 build of NVPTX	device runtime on a CUDA free system. gcc-multilib and
	 g++-multilib  are required. If	CUDA is	found, the device runtime will
	 be built by default.

	 o Static NVPTX	 device	 runtime  library  (libomptarget-nvptx.a)  was
	   dropped.

	 A  bitcode library is required	to build an OpenMP program. If the li-
	 brary is not found in the default path	or any of the paths defined by
	 LIBRARY_PATH, an error	will be	raised.	User can also specify the path
	 to the	bitcode	device library via --libomptarget-nvptx-bc-path=.

AUTHOR
       unknown

COPYRIGHT
       2013-2021, LLVM/OpenMP

				 Sep 21, 2021			 LLVMOPENMP(1)

NAME | OPENMP IN LLVM --- DESIGN OVERVIEW | OPENMP OPTIMIZATIONS IN LLVM | OPENMP OPTIMIZATION REMARKS | SUPPORT, GETTING INVOLVED, AND FAQ | OPENMP 12.0.0 RELEASE NOTES | AUTHOR | COPYRIGHT

Want to link to this manual page? Use this URL:
<https://www.freebsd.org/cgi/man.cgi?query=llvmopenmp12&sektion=1&manpath=FreeBSD+13.0-RELEASE+and+Ports>

home | help