
Clang fully supports OpenMP 4.5, almost all of 5.0, and most of 5.1 and 5.2. Clang supports offloading to X86_64, AArch64, PPC64[LE], NVIDIA GPUs (all models), and AMD GPUs (all models).
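The following is a minimal offloading sketch in C. It uses a hypothetical `saxpy` function and standard OpenMP `target` constructs. The build lines in the comment assume an offload-capable Clang installation. Depending on the Clang version, the AMD GPU case may also need an explicit architecture (for example via `--offload-arch`). If no device is available, or the file is built without OpenMP, the target region runs on the host.

```c
#include <assert.h>

/* Possible builds (target triples passed to -fopenmp-targets):
 *   clang -O2 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda saxpy.c
 *   clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa saxpy.c
 *   clang -O2 -fopenmp -fopenmp-targets=x86_64-pc-linux-gnu saxpy.c */

/* y[i] = a * x[i] + y[i], executed on the default device when one exists,
 * otherwise on the host (host fallback). */
void saxpy(int n, float a, const float *x, float *y) {
  assert(n >= 0);
  /* Copy x to the device, copy y to the device and back, and spread the
   * loop iterations over teams of device threads. */
#pragma omp target teams distribute parallel for \
    map(to : x[0:n]) map(tofrom : y[0:n])
  for (int i = 0; i < n; ++i)
    y[i] = a * x[i] + y[i];
}
```

The same source can be built for any of the listed targets. Only the `-fopenmp-targets` value changes.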

In addition, the LLVM OpenMP runtime libomp supports the OpenMP Tools Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and macOS. OMPT is also supported for NVIDIA and AMD GPUs.

For the lists of supported features from OpenMP 5.0 and 5.1, see the OpenMP 5.0 implementation details and OpenMP 5.1 implementation details.

  • New collapse clause scheme to avoid expensive remainder operations: after a loop nest is collapsed with the collapse clause, the loop index variables are computed with multiplications and additions instead of the expensive remainder operation.
  • When the collapse clause is applied to a loop nest whose loop sizes are not known at compile time, the loop counter representation is extended to 64 bits by default. To avoid this conservative choice and use at most 32 bits, compile your program with the -fopenmp-optimistic-collapse flag.
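The following C sketch illustrates both collapse points. It is not Clang's generated code. The functions `delinearize`, `sum2d`, and `fits_32bit_collapse` are hypothetical helpers. `delinearize` recovers `(i, j)` from the collapsed linear index with one division, a multiply, and a subtraction, where a naive version would use a separate `%`. `fits_32bit_collapse` shows why runtime-sized nests can need a 64-bit counter.

```c
#include <assert.h>
#include <stdint.h>

/* Recover (i, j) from linear index k of a collapsed 2-level nest whose
 * inner loop has m iterations, without a separate remainder operation. */
void delinearize(int64_t k, int64_t m, int64_t *i, int64_t *j) {
  assert(k >= 0 && m > 0);
  *i = k / m;
  *j = k - *i * m; /* equals k % m for k >= 0, m > 0 */
}

/* Sum an n x m row-major array. collapse(2) turns the nest into a single
 * iteration space of n * m iterations. */
double sum2d(int n, int m, const double *a) {
  double s = 0.0;
#pragma omp parallel for collapse(2) reduction(+ : s)
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < m; ++j)
      s += a[(int64_t)i * m + j];
  return s;
}

/* True when the collapsed trip count n * m fits in a signed 32-bit value.
 * Because n and m are only known at run time, the compiler cannot check
 * this statically, hence the 64-bit default. */
int fits_32bit_collapse(int32_t n, int32_t m) {
  assert(n >= 0 && m >= 0);
  int64_t trip = (int64_t)n * (int64_t)m;
  return trip <= INT32_MAX;
}
```

With `-fopenmp-optimistic-collapse`, the programmer effectively promises that the trip count stays within the 32-bit case.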

Clang supports two data-sharing models for Cuda devices: Generic and Cuda modes. The default mode is Generic. Cuda mode can give additional performance and is activated with the -fopenmp-cuda-mode flag. In Generic mode, all local variables that can be shared in parallel regions are stored in global memory. In Cuda mode, local variables are not shared between threads, and it is the user's responsibility to share the required data between threads in parallel regions. The optimizer can often reduce the cost of Generic mode to the level of Cuda mode, but the flag, as well as other assumption flags, can be used for tuning.
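The following C sketch shows the kind of variable the data-sharing mode affects. The function `scale_on_device` is hypothetical. Its `factor` is a local of the target region's sequential part that the inner parallel region reads. In Generic mode (the default), such a variable can be placed in global memory so that all device threads see it. With `-fopenmp-cuda-mode`, sharing it correctly becomes the user's responsibility, as described above. The build line is an assumption about a typical NVIDIA setup.

```c
#include <assert.h>

/* Possible build: clang -O2 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda
 * (add -fopenmp-cuda-mode to switch data-sharing models). */

/* Multiply every element of v by a factor computed inside the target
 * region. Falls back to the host if no device is available. */
void scale_on_device(int n, double *v, double base) {
  assert(n >= 0);
#pragma omp target map(tofrom : v[0:n])
  {
    /* Local of the target region, shared by the parallel region below:
     * this is what Generic mode globalizes for the device threads. */
    double factor = base * 2.0;
#pragma omp parallel for shared(factor)
    for (int i = 0; i < n; ++i)
      v[i] *= factor;
  }
}
```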

The following features are not supported or have limited support for Cuda devices:

  • Cancellation constructs are not supported.
  • Doacross loop nests are not supported.
  • User-defined reductions are supported only for trivial types.
  • Nested parallelism: inner parallel regions are executed sequentially.
  • Debug information for OpenMP target regions is supported, but it may sometimes be necessary to manually specify the address class of the inspected variables. In some cases, local variables are actually allocated in global memory, but the debug info may not be aware of it.
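The following C sketch shows a user-defined reduction over a trivial type, the case the restriction above still permits. The struct `range_t` and the functions `range_merge`, `range_identity`, and `find_range` are hypothetical names. `omp_out`, `omp_in`, and `omp_priv` are the identifiers OpenMP defines for `declare reduction`. The example runs on the host. Whether a particular user-defined reduction works inside a device region depends on the limitation listed above.

```c
#include <assert.h>
#include <float.h>

/* A trivial (plain C) struct type. */
typedef struct {
  double min;
  double max;
} range_t;

/* Combine two partial results. */
range_t range_merge(range_t a, range_t b) {
  range_t r;
  r.min = a.min < b.min ? a.min : b.min;
  r.max = a.max > b.max ? a.max : b.max;
  return r;
}

/* Identity element: any real value lowers min and raises max. */
range_t range_identity(void) {
  range_t r = {DBL_MAX, -DBL_MAX};
  return r;
}

#pragma omp declare reduction(range_red : range_t : \
    omp_out = range_merge(omp_out, omp_in)) \
    initializer(omp_priv = range_identity())

/* Smallest and largest element of x[0..n-1]. */
range_t find_range(int n, const double *x) {
  assert(n > 0);
  range_t r = range_identity();
#pragma omp parallel for reduction(range_red : r)
  for (int i = 0; i < n; ++i) {
    if (x[i] < r.min) r.min = x[i];
    if (x[i] > r.max) r.max = x[i];
  }
  return r;
}
```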

The following table provides a quick overview of various OpenMP 5.0 features and their implementation status. Please post on the Discourse forums (Runtimes - OpenMP category) for more information or if you want to help with the implementation.

Category | Feature | Status | Reviews
loop | support != in the canonical loop form | done | D54441
loop | #pragma omp loop (directive) | partial | D145823 (combined forms)
loop | #pragma omp loop bind | worked on | D144634 (needs review)
loop | collapse imperfectly nested loop | done |
loop | collapse non-rectangular nested loop | done |
loop | C++ range-based for loop | done |
loop | clause: if for SIMD directives | done |
loop | inclusive scan (matching C++17 PSTL) | done |
memory management | memory allocators | done | r341687, r357929
memory management | allocate directive and allocate clause | done | r355614, r335952
OMPD | OMPD interfaces | done | https://reviews.llvm.org/D99914 (Supports only HOST(CPU) and Linux)
OMPT | OMPT interfaces (callback support) | done |
thread affinity | thread affinity | done |
task | taskloop reduction | done |
task | task affinity | not upstream | https://github.com/jklinkenberg/openmp/tree/task-affinity
task | clause: depend on the taskwait construct | done | D113540 (regular codegen only)
task | depend objects and detachable tasks | done |
task | mutexinoutset dependence-type for tasks | done | D53380, D57576
task | combined taskloop constructs | done |
task | master taskloop | done |
task | parallel master taskloop | done |
task | master taskloop simd | done |
task | parallel master taskloop simd | done |
SIMD | atomic and simd constructs inside SIMD code | done |
SIMD | SIMD nontemporal | done |
device | infer target functions from initializers | worked on |
device | infer target variables from initializers | done | D146418
device | OMP_TARGET_OFFLOAD environment variable | done | D50522
device | support full 'defaultmap' functionality | done | D69204
device | device specific functions | done |
device | clause: device_type | done |
device | clause: extended device | done |
device | clause: uses_allocators clause | done |
device | clause: in_reduction | worked on | r308768
device | omp_get_device_num() | done | D54342, D128347
device | structure mapping of references | unclaimed |
device | nested target declare | done | D51378
device | implicitly map 'this' (this[:1]) | done | D55982
device | allow access to the reference count (omp_target_is_present) | done |
device | requires directive | done |
device | clause: unified_shared_memory | done | D52625, D52359
device | clause: unified_address | partial |
device | clause: reverse_offload | partial | D52780, D155003
device | clause: atomic_default_mem_order | done | D53513
device | clause: dynamic_allocators | unclaimed parts | D53079
device | user-defined mappers | done | D56326, D58638, D58523, D58074, D60972, D59474
device | map array-section with implicit mapper | done | llvm#101101
device | mapping lambda expression | done | D51107
device | clause: use_device_addr for target data | done |
device | support close modifier on map clause | done | D55719, D55892
device | teams construct on the host device | done | r371553
device | support non-contiguous array sections for target update | done |
device | pointer attachment | done |
atomic | hints for the atomic construct | done | D51233
base language | C11 support | done |
base language | C++11/14/17 support | done |
base language | lambda support | done |
misc | array shaping | done | D74144
misc | library shutdown (omp_pause_resource[_all]) | done | D55078
misc | metadirectives | mostly done | D91944
misc | conditional modifier for lastprivate clause | done |
misc | iterator and multidependences | done |
misc | depobj directive and depobj dependency kind | done |
misc | user-defined function variants | done | D67294, D64095, D71847, D71830, D109635
misc | pointer/reference to pointer based array reductions | done |
misc | prevent new type definitions in clauses | done |
memory model | memory model update (seq_cst, acq_rel, release, acquire, ...) | done |
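The following C sketch exercises two loop features the table marks as done: inclusive scan and collapsing a non-rectangular nest. The functions `prefix_sum` and `count_upper_pairs` are hypothetical. In the scan loop, the statement before `#pragma omp scan` is the input phase. The statement after it sees the inclusive prefix value.

```c
#include <assert.h>

/* Inclusive prefix sum: out[i] = in[0] + ... + in[i]. */
void prefix_sum(int n, const int *in, int *out) {
  assert(n >= 0);
  int sum = 0;
#pragma omp parallel for reduction(inscan, + : sum)
  for (int i = 0; i < n; ++i) {
    sum += in[i]; /* input phase */
#pragma omp scan inclusive(sum)
    out[i] = sum; /* scan phase: inclusive prefix up to i */
  }
}

/* Count pairs (i, j) with 0 <= i <= j < n. The inner lower bound depends
 * on i, so the collapsed nest is non-rectangular. */
long count_upper_pairs(int n) {
  long count = 0;
#pragma omp parallel for collapse(2) reduction(+ : count)
  for (int i = 0; i < n; ++i)
    for (int j = i; j < n; ++j)
      count += 1;
  return count;
}
```

`count_upper_pairs(n)` should equal n(n+1)/2, which makes it easy to check that the non-rectangular collapse covers every iteration exactly once.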

The following table provides a quick overview of various OpenMP 5.1 features and their implementation status. Please post on the Discourse forums (Runtimes - OpenMP category) for more information or if you want to help with the implementation.

Category | Feature | Status | Reviews
atomic | 'compare' clause on atomic construct | done | D120290, D120007, D118632, D120200, D116261, D118547, D116637
atomic | 'fail' clause on atomic construct | worked on | D123235 (in progress)
base language | C++ attribute specifier syntax | done | D105648
device | 'present' map type modifier | done | D83061, D83062, D84422
device | 'present' motion modifier | done | D84711, D84712
device | 'present' in defaultmap clause | done | D92427
device | map clause reordering based on 'present' modifier | unclaimed |
device | device-specific environment variables | unclaimed |
device | omp_target_is_accessible routine | unclaimed |
device | omp_get_mapped_ptr routine | done | D141545
device | new async target memory copy routines | done | D136103
device | thread_limit clause on target construct | partial | D141540 (offload), D152054 (host, in progress)
device | has_device_addr clause on target construct | unclaimed |
device | iterators in map clause or motion clauses | unclaimed |
device | indirect clause on declare target directive | unclaimed |
device | allow virtual function calls for mapped object on device | partial |
device | interop construct | partial | parsing/sema done: D98558, D98834, D98815
device | assorted routines for querying interoperable properties | partial | D106674
loop | Loop tiling transformation | done | D76342
loop | Loop unrolling transformation | done | D99459
loop | 'reproducible'/'unconstrained' modifiers in 'order' clause | partial | D127855
memory management | alignment for allocate directive and clause | done | D115683
memory management | 'allocator' modifier for allocate clause | done | llvm#114883
memory management | 'align' modifier for allocate clause | done | llvm#121814
memory management | new memory management routines | unclaimed |
memory management | changes to omp_alloctrait_key enum | unclaimed |
memory model | seq_cst clause on flush construct | done | llvm#114072
misc | 'omp_all_memory' keyword and use in 'depend' clause | done | D125828, D126321
misc | error directive | done | D139166
misc | scope construct | done | D157933, llvm#109197
misc | routines for controlling and querying team regions | partial | D95003 (libomp only)
misc | changes to ompt_scope_endpoint_t enum | unclaimed |
misc | omp_display_env routine | done | D74956
misc | extended OMP_PLACES syntax | unclaimed |
misc | OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT env vars | done | D138769
misc | 'target_device' selector in context specifier | worked on |
misc | begin/end declare variant | done | D71179
misc | dispatch construct and function variant argument adjustment | worked on | D99537, D99679
misc | assumes directives | worked on |
misc | assume directive | done |
misc | nothing directive | done | D123286
misc | masked construct and related combined constructs | done | D99995, D100514, PR-121741 (parallel_masked_taskloop), PR-121746 (parallel_masked_task_loop_simd), PR-121914 (masked_taskloop), PR-121916 (masked_taskloop_simd)
misc | default(firstprivate) & default(private) | done | D75591 (firstprivate), D125912 (private)
other | deprecating master construct | unclaimed |
OMPT | new barrier types added to ompt_sync_region_t enum | unclaimed |
OMPT | async data transfers added to ompt_target_data_op_t enum | unclaimed |
OMPT | new barrier state values added to ompt_state_t enum | unclaimed |
OMPT | new 'emi' callbacks for external monitoring interfaces | done |
OMPT | device tracing interface | unclaimed |
task | 'strict' modifier for taskloop construct | unclaimed |
task | inoutset in depend clause | done | D97085, D118383
task | nowait clause on taskwait | partial | parsing/sema done: D131830, D141531
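The following C sketch uses three 5.1 features the table marks as done: loop tiling, loop unrolling, and the `compare` clause on `atomic`. The functions `transpose`, `sum_ints`, and `max_int` are hypothetical. The comment assumes that older Clang releases may need `-fopenmp-version=51` to accept these constructs. Without OpenMP, the pragmas are ignored and the loops run serially with the same results.

```c
#include <assert.h>

/* Possible build: clang -O2 -fopenmp -fopenmp-version=51 example.c */

/* Loop tiling: the 2-level nest is traversed in 8 x 8 tiles. */
void transpose(int n, const double *a, double *b) {
  assert(n >= 0);
#pragma omp tile sizes(8, 8)
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      b[j * n + i] = a[i * n + j];
}

/* Loop unrolling: partially unroll by a factor of 4. */
int sum_ints(int n, const int *v) {
  int s = 0;
#pragma omp unroll partial(4)
  for (int i = 0; i < n; ++i)
    s += v[i];
  return s;
}

/* atomic compare: the shared maximum m is replaced only when a thread's
 * value is larger (conditional update form "if (expr > x) { x = expr; }"). */
int max_int(int n, const int *v) {
  assert(n > 0);
  int m = v[0];
#pragma omp parallel for
  for (int i = 1; i < n; ++i) {
    int val = v[i];
#pragma omp atomic compare
    if (val > m) { m = val; }
  }
  return m;
}
```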

The following table provides a quick overview of various OpenMP 6.0 features and their implementation status. Please post on the Discourse forums (Runtimes - OpenMP category) for more information or if you want to help with the implementation.

Feature | C/C++ Status | Fortran Status | Reviews
free-agent threads | unclaimed | unclaimed |
threadset clause | worked on | unclaimed |
Recording of task graphs | unclaimed | unclaimed |
Parallel inductions | unclaimed | unclaimed |
init_complete for scan directive | unclaimed | unclaimed |
Loop transformation constructs | unclaimed | unclaimed |
loop stripe transformation | done | | llvm#119891
work distribute construct | unclaimed | unclaimed |
task_iteration | unclaimed | unclaimed |
memscope clause for atomic and flush | unclaimed | unclaimed |
transparent clause (hull tasks) | unclaimed | unclaimed |
rule-based compound directives | unclaimed | unclaimed |
C23, C++23 | unclaimed | unclaimed |
Fortran 2023 | unclaimed | unclaimed |
decl attribute for declarative directives | unclaimed | unclaimed |
C attribute syntax | unclaimed | unclaimed |
pure directives in DO CONCURRENT | unclaimed | unclaimed |
Optional argument for all clauses | unclaimed | unclaimed |
Function references for locator list items | unclaimed | unclaimed |
All clauses accept directive name modifier | unclaimed | unclaimed |
Extensions to depobj construct | unclaimed | unclaimed |
Extensions to atomic construct | unclaimed | unclaimed |
Private reductions | partial | unclaimed | Parse/Sema: llvm#129938
Self maps | partial | unclaimed | parsing/sema done: llvm#129888
Release map type for declare mapper | unclaimed | unclaimed |
Extensions to interop construct | unclaimed | unclaimed |
no_openmp_constructs | done | unclaimed | llvm#125933
safe_sync and progress with identifier and API | unclaimed | unclaimed |
OpenMP directives in concurrent loop regions | done | unclaimed | llvm#125621
atomic constructs on concurrent loop regions | done | unclaimed | llvm#125621
Loop construct with DO CONCURRENT | unclaimed | unclaimed |
device_type clause for target construct | unclaimed | unclaimed |
nowait for ancestor target directives | unclaimed | unclaimed |
New API for devices' num_teams/thread_limit | unclaimed | unclaimed |
Host and device environment variables | unclaimed | unclaimed |
num_threads ICV and clause accepts list | unclaimed | unclaimed |
Numeric names for environment variables | unclaimed | unclaimed |
Increment between places for OMP_PLACES | unclaimed | unclaimed |
OMP_AVAILABLE_DEVICES envirable | unclaimed | unclaimed |
Traits for default device envirable | unclaimed | unclaimed |
Optionally omit array length expression | unclaimed | unclaimed |
Canonical loop sequences | unclaimed | unclaimed |
Clarifications to Fortran map semantics | unclaimed | unclaimed |
default clause at target construct | unclaimed | unclaimed |
ref count update use_device_{ptr, addr} | unclaimed | unclaimed |
Clarifications to implicit reductions | unclaimed | unclaimed |
ref modifier for map clauses | unclaimed | unclaimed |
map-type modifiers in arbitrary position | done | unclaimed | llvm#90499
Lift nesting restriction on concurrent loop | done | unclaimed | llvm#125621
priority clause for target constructs | unclaimed | unclaimed |
changes to target_data construct | unclaimed | unclaimed |
Non-const do_not_sync for nowait/nogroup | unclaimed | unclaimed |

The following table provides a quick overview of various OpenMP extensions and their implementation status. These extensions are not currently defined by any standard, so links to the associated LLVM documentation are provided instead. As these extensions mature, they will be considered for standardization. Please post on the Discourse forums (Runtimes - OpenMP category) to provide feedback.

Category | Feature | Status | Reviews
atomic extension | 'atomic' strictly nested within 'teams' | prototyped | D126323
device extension | 'ompx_hold' map type modifier | prototyped | D106509, D106510
device extension | 'ompx_bare' clause on 'target teams' construct | prototyped | #66844, #70612
device extension | Multi-dim 'num_teams' and 'thread_limit' clause on 'target teams ompx_bare' construct | partial | #99732, #101407, #102715
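The following C sketch uses the `ompx_hold` map type modifier from the table. It is an LLVM extension, not standard OpenMP. As the LLVM extension documentation describes it, `ompx_hold` is intended to keep a list item mapped for the whole construct even if other operations inside it would otherwise drop the mapping. Consult that documentation for the precise reference-count rules. The function `double_twice` is hypothetical. The sketch also assumes a Clang build that accepts OpenMP extensions (`-fopenmp-extensions`, if your version does not enable them by default). On a machine without a device, the target regions fall back to the host.

```c
#include <assert.h>

/* Keep x mapped on the device across two target regions, with the
 * LLVM-specific ompx_hold modifier on the enclosing target data. */
void double_twice(int n, double *x) {
  assert(n >= 0);
#pragma omp target data map(ompx_hold, tofrom : x[0:n])
  {
    /* x is already present, so these regions reuse the held mapping. */
#pragma omp target teams distribute parallel for
    for (int i = 0; i < n; ++i)
      x[i] *= 2.0;
#pragma omp target teams distribute parallel for
    for (int i = 0; i < n; ++i)
      x[i] *= 2.0;
  }
}
```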