These functions from C++ ABI are defined in
compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp and are supposed to
replace implementations from libstdc++/libc++abi.
We need to export them for the same reason we export other
interceptors and TSan runtime functions - e.g. if a dlopen-ed shared
library depends on `__cxa_guard_acquire`, it needs to pick up the
exported definition from the TSan runtime that was linked into the main
executable calling dlopen().
However, because the `__cxa_guard_` functions don't use traditional
interceptor machinery, they are omitted from the auto-generated
`libclang_rt.tsan.a.syms` files. Fix this by adding them to the
tsan.syms.extra file explicitly.
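A minimal sketch of the kind of code that pulls in these symbols (the names below are illustrative, not from the patch): any function-local static with a non-constant initializer compiles into first-use initialization guarded by `__cxa_guard_acquire`/`__cxa_guard_release`, so a dlopen-ed library containing it must be able to resolve those symbols from the main executable's TSan runtime.

```cpp
#include <cassert>

// Illustrative only: the compiler emits __cxa_guard_acquire /
// __cxa_guard_release around the first-time initialization of `value`.
// If this code lives in a dlopen-ed shared library in a TSan build,
// those calls must bind to the definitions exported from the TSan
// runtime linked into the main executable.
static int initCount = 0;

int expensiveInit() { return ++initCount; }

int &guardedSingleton() {
  static int value = expensiveInit(); // guarded by __cxa_guard_*
  return value;
}
```

The guard machinery ensures `expensiveInit` runs exactly once even under concurrent first calls, which is precisely the behavior TSan's interposed definitions must preserve.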
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
While figuring out how to perform an atomic exchange on a memref, I
tried the generic atomic rmw with the yielded value captured from the
enclosing scope (instead of a plain atomic_rmw with
`arith::AtomicRMWKind::assign`). Instead of segfaulting, this PR changes
the pass to produce an error when the result is not found in the
region's IR map.
It might be more useful to give a suggestion to the user, but giving an
error message instead of a crash is at least an improvement, I think.
See: #172184
Add an AssumeSingleUse (default = false) argument to mayFoldIntoVector to
allow us to match combineVectorSizedSetCCEquality behaviour with
AssumeSingleUse=true.
Hopefully we can drop AssumeSingleUse entirely soon, but there are a
number of messy test regressions that need handling first.
Typedef/using declarations in structs and classes were not created with
the native PDB plugin. The following would only create `Foo` and
`Foo::Bar`:
```cpp
struct Foo {
  struct Bar {};
  using Baz = Bar;
  using Int = int;
};
```
With this PR, they're created. One complication is that typedefs and
nested types show up identically. The example from above gives:
```
0x1006 | LF_FIELDLIST [size = 40, hash = 0x2E844]
- LF_NESTTYPE [name = `Bar`, parent = 0x1002]
- LF_NESTTYPE [name = `Baz`, parent = 0x1002]
- LF_NESTTYPE [name = `Int`, parent = 0x0074 (int)]
```
To distinguish nested types and typedefs, we check if the parent of a
type is equal to the current one (`parent(0x1002) == 0x1006`) and if the
basename matches the nested type name.
Commit c6f501d479 fixed an issue where some tests were incorrectly
marked as unsupported for a bootstrapping build. This exposed, in our
'slow' full-bootstrap qemu-system CI, that fdr-mode.cpp fails on
RISC-V. We mark it as unsupported. I believe _xray_ArgLoggerEntry needs
to be implemented in xray_trampoline_risc*.S for this to work.
Noticed while trying to replace the IsVectorBitCastCheap helper with
mayFoldIntoVector (still some work to do as we have a number of multiuse
cases) - technically it's possible for an extload to reach this point.
This is the last of the generic instructions created from MVE
intrinsics. It was a little more awkward than the others due to it
taking a Type as one of the arguments. This creates a new function to
create the intrinsic we need.
Plugins appear to be broken on AIX, CI fails. There is logic in
CMakeLists for plugins+AIX, but it was never tested before...
Note: when plugins work, also enable tests in Examples/IRTransforms.
There's no Sparc support for JIT tests, so disable the JIT tests in the
examples (copied from ExecutionEngine/lit.local.cfg).
Following the suggestions in #170061, I replaced `SmallVector<SDValue>`
with `std::array<SDValue, NumPatterns>` and `SmallBitVector` with
`Bitset<NumPatterns>`.
I had to make some changes to the `collectLeaves` and
`reassociatableMatchHelper` functions. In `collectLeaves` specifically,
I changed the return type so I could propagate a failure in case the
number of found leaves is greater than the number of expected patterns.
I also added a new unit test that, together with the one already present
in the previous line, checks that the matching fails in the cases where
the number of patterns is smaller or larger than the number of leaves.
I don't think this is going to completely address the increased compile
time reported in #169644, but hopefully it leads to an improvement.
For some reason, the current `std::barrier`'s wait implementation polls
the underlying atomic in a loop with sleeps instead of using the native
wait.
This change should also indirectly fix the performance issue of
`std::barrier` described in
https://github.com/llvm/llvm-project/issues/123855.
Fixes #123855
Its current effect is to use the `rootUri` provided by the client as the
working directory for fallback commands.
Future effects will include other behaviors appropriate for cases
where clangd is being run in the context of editing a single
workspace, such as using the `compile_commands.json` file in the
workspace root for all opened files.
The flag is hidden until other behaviors are implemented and they
constitute a cohesive "mode" for users.
Original PR in #155905, which was reverted due to a UBSan failure
that is fixed in this version.
It looks like these were copied from fp16 tests, and the intrinsic types
were never updated. Also remove some old definitions that are no longer
required.
This reverts commit 2612dc9b5f but keeps
`Predicates = [HasNDD]` removed.
There are two issues identified related to the change. One is that
INSERT_SUBREG cannot guarantee source and dest to be the same register;
this mostly happens at O0. The other is that zero_extend is not a chain
node; as a result, we lose the chain for SETZUCCr.
If we do not collect the diagnostics from the
CollectDiagnosticsToStringScope, then even when the named_sequence
applies successfully, the Scope object's destructor will assert (with an
unhelpful message).
This comment was written 12 years ago. It's no longer correct to say
that diagnostic reporting is under heavy development, and we seem to be
doing just fine without tablegenned IDs, so I think we can simply remove
it.
Add test coverage for remark when runtime checks are not profitable with
threshold provided.
Also make sure that the X86 remark tests actually pass an X86 triple,
which is needed for the threshold remark.
Also clean up the tests a bit.
PHIs that are larger than a legal integer type are split into multiple
virtual registers that are numbered sequentially. We can propagate the
known bits for each of these registers individually.
Big endian is not supported yet because the register order needs to be
reversed.
Fixes #171671
The function `EvaluateAsBooleanCondition` assumes (asserts) that the
input `Expr` is not in a dependent context, so it is the caller's
responsibility to check that condition before the call. This commit fixes
the issue in `UnsafeBufferUsage.cpp`.
The issue was first found downstream because of some code that has not
been upstreamed. This commit also includes that un-upstreamed code.
rdar://166217941
Fixes #70949. Prior to PR #151378 memory locations were incorrect; that
patch prevented the emission of the incorrect locations.
This patch fixes the underlying issue.
SelectionDAG::areNonVolatileConsecutiveLoads will only match loads that
have a MemoryVT the same size as the stride byte size, which will fail
for cases where large loads have been split (typically by
shift+truncates) and we're trying to stitch them back together.
As a fallback, this patch checks for cases where the candidate element is
a whole multiple of the MemoryVT byte size away from the base load.
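The source-level shape of the pattern involved might look like the following (a hypothetical example, not taken from the patch's tests): a wide load expressed as narrow loads combined with shifts, which the DAG combiner then tries to stitch back into one consecutive load. Each narrow piece sits at a byte offset that is a multiple of its own MemoryVT size from the base.

```cpp
#include <cstdint>

// Each p[i] is a 1-byte load at offset i from the base; stitched
// together they reconstruct a little-endian 32-bit load. The new
// fallback recognizes pieces whose distance from the base load is a
// whole multiple of the MemoryVT byte size.
uint32_t load32ViaBytes(const uint8_t *p) {
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}
```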
## Introduction
This patch implements `ranges::elements_of` from
[P2502R2](https://wg21.link/P2502R2). Specializations of `elements_of`
encapsulate a range and act as a tag in overload sets to disambiguate
when a range should be treated as a sequence rather than a single value.
```cpp
template <bool YieldElements>
std::generator<std::any> f(std::ranges::input_range auto &&r) {
  if constexpr (YieldElements) {
    co_yield std::ranges::elements_of(r);
  } else {
    co_yield r;
  }
}
```
## Reference
- [P2502R2: `std::generator`: Synchronous Coroutine Generator for
Ranges](https://wg21.link/P2502R2)
- [[range.elementsof]](https://eel.is/c++draft/range.elementsof)
Partially addresses #105226
---------
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
Co-authored-by: A. Jiang <de34@live.cn>
`[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue.
- https://libcxx.llvm.org/CodingGuidelines.html
There appears to be an issue with annotating `operator*` and
`operator/`, see: https://llvm.org/PR171031
---------
Co-authored-by: A. Jiang <de34@live.cn>
Build examples and example plug-ins by default when running tests. If
examples are unwanted, they can still be disabled completely using
LLVM_INCLUDE_EXAMPLES=OFF. Plugin tests depend on examples and it is
beneficial to test them by default. By default, Examples will still not
be included in the default target or be installed, this remains
controlled by LLVM_BUILD_EXAMPLES (which defaults to OFF).
The additional cost for building examples for tests is 17 compilation
units (12 C++, 5 C), which should be tolerable.
I don't know how broken the examples currently are in the various build
configurations, but if we find breakage, it would be good to fix it.
Pull Request: https://github.com/llvm/llvm-project/pull/171998
Clients should be able to build the ORC runtime with or without
exceptions/RTTI, and this choice should be able to be made independently
of the corresponding settings for LLVM (e.g. it should be fine to build
LLVM with exceptions/RTTI disabled, and orc-rt with them enabled).
The orc-rt-c/config.h header will provide C defines that can be used by
both the ORC runtime and API clients to determine the value of the
options.
Future patches should build on this work to provide APIs that enable
some interoperability between the ORC runtime's error return mechanism
(Error/Expected) and C++ exceptions.
Using CallableTraitsHelper simplifies the expression of
ErrorHandlerTraits, which is responsible for running handlers based on
their argument and return types.
In a previous PR I fixed one case where subnormal long doubles would
cause an infinite loop in printf. It was an improper fix though. The
problem was that a shift on the fixed point representation would
sometimes go negative, since the effective exponent of a subnormal is
lower than the minimum allowed exponent value. This patch extends the
fixed point representation to have space for subnormals, and adds an
assert to check that lshifts are always positive. The previous fix of
sometimes shifting right instead of left caused a loss of precision,
which in turn sometimes caused infinite loops in the %e code.
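Why the shift amount could go negative can be demonstrated with standard floating-point machinery (a self-contained illustration, not the printf code itself): the effective exponent of a subnormal lies below the minimum normal exponent, so a shift derived from "exponent minus minimum exponent" is negative for subnormal inputs unless the fixed-point representation reserves extra space.

```cpp
#include <cfloat>
#include <cmath>

// DBL_TRUE_MIN is the smallest positive subnormal double. frexp
// reports its binary exponent, which sits well below DBL_MIN_EXP
// (the minimum exponent of a *normal* double).
bool subnormalExponentBelowMin() {
  int exp = 0;
  std::frexp(DBL_TRUE_MIN, &exp);
  return exp < DBL_MIN_EXP; // true: subnormals dip below the normal range
}
```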
This is the second patch in a series that removes the dependency of
clangDependencyScanning on clangDriver, splitting the work from
#169964 into smaller changes (see comment linked below).
This patch updates the by-name scanning interface in
DependencyScanningWorker to accept only -cc1 command lines directly
and moves the logic for handling driver-style command lines into
DependencyScanningTool in clangTooling.
Support for -cc1 command lines in by-name scanning is introduced in
this patch.
The next patch will update the remaining parts of
DependencyScanningWorker to operate only on -cc1 command lines,
allowing its dependency on clangDriver to be removed.
https://github.com/llvm/llvm-project/pull/169964#pullrequestreview-3545879529
When the command interpreter is asked to not echo commands but still
print errors, a user has no idea what command caused the error.
For example, when I add `bogus` in my `~/.lldbinit`:
```
$ lldb
error: 'bogus' is not a valid command.
```
Things are even more confusing when we have inline diagnostics, which
point to nothing. For example, when I add `settings set target.run-args
-foo` to my `~/.lldbinit`:
```
❯ lldb
˄˜˜˜
╰─ error: unknown or ambiguous option
```
We should still echo the command if the command fails, making it obvious
which command caused the failure and fixing the inline diagnostics.
Fixes #171514
Enable `FailOnUnsupportedFP` for `ConvertToLLVMPattern` and set it to
`true` for all `math-to-llvm` patterns. This fixes various invalid
lowerings of `math` ops on `fp8`/`fp4` types.
This patch adds the ability to produce a summary report with a few KPIs
in the compare-benchmarks script. This is useful to regularly monitor
the progress of the library on these KPIs.
Example usage:
```
compare-benchmarks libstdcxx.lnt llvm-20.lnt llvm-21.lnt main.lnt \
    --series-names "GNU,LLVM 20,LLVM 21,LLVM main" \
    --format kpi \
    --noise-threshold 0.1 \
    --meta-candidate 'LLVM'
```
This would produce a short report showing the evolution of benchmarks
in the given LLVM releases as compared to a GNU baseline.
This patch adds register bank legalization support for buffer load byte
and short operations in the AMDGPU GlobalISel pipeline.
This is a re-land of #167798. I have fixed the failing test
/CodeGen/AMDGPU/GlobalISel/buffer-load-byte-short.ll
This PR migrates support for wide string literals from the incubator to
upstream.
## Changes
- Implement wide string literal support in
`getConstantArrayFromStringLiteral`
- Handle wchar_t, char16_t, and char32_t string literals
- Collect code units and create constant arrays with IntAttr elements
- Use ZeroAttr for null-filled strings
## Testing
- Copied `wide-string.cpp` test file from incubator
- Expanded test to include wchar_t test cases (incubator only had
char16_t and char32_t)
- All tests pass
---------
Co-authored-by: Andy Kaylor <akaylor@nvidia.com>
Fixes #168960
Adds `ICK_HLSL_Matrix_Splat` and hooks it up to
`PerformImplicitConversion` and `IsMatrixConversion`. Map these to
`CK_HLSLAggregateSplatCast`.
Considering that the current loop fusion only supports adjacent loops,
we are able to simplify the checks in this pass. By removing
`isControlFlowEquivalent` check, this patch fixes multiple issues
including #166560, #166535, #165031, #80301 and #168263.
Now only the sequential/adjacent candidates are collected in the same
list. This patch is the implementation of approach 2 discussed in post
#171207.
Since `FatRawBufferCastOp` preserves the shape of its source operand,
the result dimensions can be reified by querying the source's
dimensions.
---------
Signed-off-by: Yu-Zhewen <zhewenyu@amd.com>
Guard "brx.idx" generation to appropriate PTX ISA and SM version.
In addition, do some minor refactoring moving the expansion into ISel as
doing this during operation legalization is more complex and offers no
benefits.
Fixes https://github.com/llvm/llvm-project/issues/171709
As per @arsenm 's instructions, I've separated the non-functional
changes from https://github.com/llvm/llvm-project/pull/169958.
Afterwards I'll tackle the functional ones one by one. I hope I did
everything right this time.
Full descriptions in the article:
https://pvs-studio.com/en/blog/posts/cpp/1318/
3. Array overrun is possible.
The PVS-Studio warning: V557 Array overrun is possible. The value of
'regIdx' index could reach 31. VEAsmParser.cpp 696
10. Excessive check.
The PVS-Studio warning: V547 Expression 'IsLeaf' is always false.
PPCInstrInfo.cpp 419
11. Doubling the same check.
The PVS-Studio warning: V581 The conditional expressions of the 'if'
statements situated alongside each other are identical. Check lines:
5820, 5823. PPCInstrInfo.cpp 5823
15. Excessive check.
The PVS-Studio warning: V547 Expression 'i != e' is always true.
MachineFunction.cpp 1444
17. Excessive assignment.
The PVS-Studio warning: V1048 The 'FirstOp' variable was assigned the
same value. MachineInstr.cpp 1995
18. Excessive check.
The PVS-Studio warning: V547 Expression 'AllSame' is always true.
SimplifyCFG.cpp 1914
19. Excessive check.
The PVS-Studio warning: V547 Expression 'AbbrevDecl' is always true.
LVDWARFReader.cpp 398
Fix the args in the mms-bitfields test file to be aligned with the same
test in classical codegen (clang/test/CodeGen/mms-bitfields.c) after
#71148 was merged.
Delinearization has its own `isKnownNonNegative` function, which wraps
`ScalarEvolution::isKnownNonNegative` and adds additional logic. The
additional logic is that, for a pointer addrec `{a,+,b}`, if the pointer
has `inbounds` and both `a` and `b` are known to be non-negative, then
the addrec is also known non-negative (i.e., it doesn't wrap). This
reasoning is incorrect. If the GEP and/or load/store using the pointer
are not unconditionally executed in the loop, then the addrec can still
wrap. Even though no actual example has been found where this causes a
miscompilation (probably because the subsequent checks fail, so the
validation also fails), simply replacing it with
`ScalarEvolution::isKnownNonNegative` is safer, especially since it
doesn't cause any regressions in the existing tests.
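The hole in the reasoning can be shown with a hypothetical loop shape (illustrative, not a known miscompiling case): when the access guarded by the addrec is conditional, "inbounds plus non-negative start and step" says nothing about the iterations where the access never executes, so the addrec may still wrap on those iterations.

```cpp
// The store is guarded, so on iterations where `i >= limit` the
// address {p,+,1} is never materialized; inbounds reasoning about the
// GEP therefore does not constrain those lanes.
void guardedStore(int *p, int n, int limit) {
  for (int i = 0; i < n; ++i)
    if (i < limit)  // access is conditional, not unconditionally executed
      p[i] = i;
}
```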
Resolves #169811
We fall back to `Objective-C++` when running C++ expressions in frames
that don't have debug-info, but we were missing a fallback note for this
situation. We now print the following note on expression errors:
```
note: Possibly stopped inside system library, so speculatively enabled Objective-C. Ran expression as 'Objective C++'.
```
PR #170908 introduced an unconditional dereference of the local target
variable name, but in rare cases (I am not sure this can be reproduced
without `-mllvm -inline-all` currently), such a variable may not have a
unique name on the alloca. For instance, this can happen after a call to a
function with TARGET character result is inlined. The alloca is a temp
on the caller side, that gets the TARGET attribute in the inlined scope
via the result name.
This commit implements the gcc_struct attribute with behavior similar to
that of GCC. The current behavior is as follows:
When ItaniumRecordLayoutBuilder is used, [[gcc_struct]] will locally
cancel the effect of -mms-bitfields on a record. If -mms-bitfields is
not supplied and is not a default behavior on a target, [[gcc_struct]]
will be a no-op. This should provide enough compatibility with GCC.
If C++ ABI is "Microsoft", [[gcc_struct]] will currently always produce
a diagnostic, since support for it is not yet implemented in
MicrosoftRecordLayoutBuilder. Note, however, that all the infrastructure
is ready for the future implementation.
In particular, the check for the default value of -mms-bitfields is moved
from the driver to ASTContext, since it now non-trivially depends on other
supplied flags. This also, unfortunately, makes it impossible to use
usual argument parsing for `-m{no-,}ms-bitfields`.
The patch doesn't introduce any backwards-incompatible changes, except
for situations when cc1 is called directly with `-mms-bitfields` option.
Work towards #24757
---------
Co-authored-by: Martin Storsjö <martin@martin.st>
This patch fixes the lowering of the newly
added mbarrier.arrive Op w.r.t return value.
(Follow-up of PR #170545)
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
This patch removes explicit dependencies on cxx_experimental for
installations that are local to the test suite. Such dependencies
are not required anymore from the test-suite installation targets
since the proper dependency is now encoded between cxx and
cxx_experimental.
The utility expressions in the `InstrumentationRuntime` plugins are just
plain C code, but we run them as `ObjC++`. That meant we were doing
redundant work (like looking up decls in the Objective-C runtime). The
sanitizer tests sporadically time out while looking up function symbols
in the Objective-C runtime. This patch switches the expression language
to `C`.
Didn't find a great way of testing this other than looking at the
expression log.
rdar://165656320
Instead of returning an `Expected<vector<...>>` it now returns an Error,
and receives a vector argument to fill in. This will be useful to
support a change where ParseMultiMemReadPacket will be called multiple
times in a loop with the same vector; without this change, we would have
to concatenate vectors and copy memory around.
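The calling convention this enables can be sketched as follows (names are hypothetical, with `std::optional<std::string>` standing in for the error type): the parser appends into a caller-owned vector, so calling it repeatedly with the same vector accumulates results without concatenating intermediate vectors.

```cpp
#include <optional>
#include <string>
#include <vector>

// Sketch: return an error (or nullopt on success) and fill a
// caller-provided vector instead of returning a fresh one.
std::optional<std::string> parsePacketChunk(int chunk,
                                            std::vector<int> &out) {
  if (chunk < 0)
    return "invalid chunk"; // error path, vector left as-is
  out.push_back(chunk);     // stand-in for parsed memory-read entries
  return std::nullopt;      // success
}
```

A loop can then reuse one `std::vector` across calls, which is the memory-copy savings the change is after.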
A buildbot failure in https://github.com/llvm/llvm-project/pull/170323
when expensive checks were used highlighted that some of these patterns
were missing.
This patch adds `V_INDIRECT_REG_{READ/WRITE}_GPR_IDX` and
`V/S_INDIRECT_REG_WRITE_MOVREL` for `V6` and `V7` vector sizes.
This simplifies the code by removing the manual optimization for size ==
1, and also gives us an optimization for other small sizes.
Accept a `llvm::SmallVector` by value for the constructor and move it
into the destination, rather than accepting `ArrayRef` that we copy
from. This also lets us not have to construct a reference to the
elements of a `std::initializer_list`, which requires reading the
implementation of the constructor to know whether it's safe.
Also explicitly document that the constructor requires the input indexes
to have a size of at least 1.
Added `ds.atomic.barrier.arrive.rtn.b64` and
`ds.atomic.async.barrier.arrive.b64` to ROCDL. These are parts of the
LDS memory barrier concept in GFX1250. Also added alias analysis to
`global/flat` data prefetch ops. Extended rocdl tests.
This patch improves the way lldb checks if the terminal it's opened in
(if any) supports Unicode or not.
On POSIX systems, we check if `LANG` contains `UTF-8`.
On Windows, we always return `true` since we use the `WriteToConsoleW`
API.
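The POSIX-side check can be sketched like this (a minimal model of the behavior described above; the real implementation lives in lldb's Host layer):

```cpp
#include <cstring>

// Treat the terminal as Unicode-capable when the LANG value mentions
// "UTF-8" (e.g. "en_US.UTF-8"); a null or plain "C" locale does not.
bool langLooksLikeUTF8(const char *lang) {
  return lang != nullptr && std::strstr(lang, "UTF-8") != nullptr;
}
```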
This is a relanding of https://github.com/llvm/llvm-project/pull/168603.
The tests failed because the bots support Unicode but the tests expect
ASCII. To avoid different outputs depending on the environment the tests
are running in, this patch always forces ASCII in the tests.
Since #170838 we no longer canonicalise away whole-lane shuffles of
horizontal ops, so we need to better handle cases where widened shuffle
masks might still contain undefs.
Fixes #172010
To support the `-fvisibility=...` option in Flang, we need a pass to
rewrite all the global definitions in the LLVM dialect that have the
default visibility to have the specified visibility. This change adds
such a pass.
Note that I did not add an option for `visibility=default`; I believe
this makes sense for compiler drivers since users may want to tack an
option on at the end of a compile line to override earlier options, but
I don't think it makes sense for this pass to accept
`visibility=default`--it would just be an early exit IIUC.
We used to search for constants using the name we parsed. For C++, this
would mean using the demangled struct name (from the unique name). This
name is not always equal to the one used for the struct's name by the
compiler. For example:
```
0x105E | LF_STRUCTURE [size = 120, hash = 0xF38F] ``anonymous namespace'::Anonymous<A::B::C<void> >::D`
unique name: `.?AUD@?$Anonymous@U?$C@X@B@A@@@?A0x8C295248@@`
```
We would use the unique name and get to `(anonymous
namespace)::Anonymous<struct A::B::C<void>>::D`. Then, when finding the
constant in the field list, we'd search for `(anonymous
namespace)::Anonymous<struct A::B::C<void>>::D::StaticMember`. This
wouldn't yield any results, because the constant will use the demangled
name as given by the compiler.
With this PR, we use the struct's name as given in the PDB and append
the member name.
Invent `StylizedInstance` class to store special variables together with
the instantiated expression in omp::clause::Initializer. This will
eliminate the need for visiting the original AST nodes in lowering to
MLIR.
The bind clause specifies the name of the function to call on the
device, and takes either a string or identifier(per the standard):
"If the name is specified as an identifier, it is called as if the name
were specified in the language being compiled. If the name is specified
as a string, the string is used for the procedure name unmodified".
The latter (as a string) is already implemented; this patch implements
the former. Unfortunately, there seems to be no existing implementation
of this in C++. In other languages, the 'name' of a function is
sufficient to identify it (in this case 'bind' can refer to undeclared
functions), so it is possible to figure out what the name should be. In
C++ with overloading (without a discriminator, a la Fortran), a name
only names an infinite overload set.
So, in order to implement this, I've decided that the 'called as'
(bound) function must have the same signature as the one marked by the
'routine'. This is trivially sensible for non-member functions, but
requires a bit more thought for member functions (and thus lambda call
operators). In this case, we 'promote' the type of the function to a
'free' function by turning the implicit 'this' into an explicit 'this'.
I believe this is the most sensible and reasonable way to implement
this, and really the only way to make something usable.
Binutils allows `-p`, which prevents it from modifying the timestamp.
`llvm-objcopy` already preserves the timestamp in the COFF header, so
the only missing piece is allowing the option in the config manager.
This is a follow-up to https://github.com/llvm/llvm-project/pull/152020,
continuing the removal of now-redundant `if(process && target)` checks.
Since this causes a diff in every line of the affected functions, this
commit also uses the opportunity to create some helper functions and
reduce nesting of the affected methods by rewriting all pre-condition
checks as early returns, while remaining strictly NFC.
This has exposed some odd behaviors:
1. `SBFrame::GetVariables` has a variable `num_produced` which is
clearly meant to be incremented on every iteration of the loop but it is
only incremented once, after the loop. So its value is always 0 or
1. The variable now lives in `FetchVariablesUnlessInterrupted`.
2. `SBFrame::GetVariables` has an interruption mechanism for local
variables, but not for "recognized arguments". It's unclear if this is
by design or not, but it is now evident that there is a discrepancy
there.
3. In `SBFrame::EvaluateExpression` we only log some error paths, but
not all of them.
To stick to the strictly NFC nature of this patch, it does not address
any of these issues.
This change adds dense and sparse MMA with block scaling intrinsics to
MLIR -> NVVM IR -> NVPTX flow. NVVM and NVPTX implementation is based on
PTX ISA 9.0.
Commit
7e7ea9c535
added tensor support for scatter, but running the existing
canonicalization on tensors causes bugs, so we fix the canonicalization
with tensor output.
Closes https://github.com/llvm/llvm-project/issues/168695
---------
Signed-off-by: Ryutaro Okada <1015ryu88@gmail.com>
#169166 reported some false positives in this check. They stemmed from
the fact that it assumed `TypedefTypeLoc`s and `TagTypeLoc`s couldn't be
dependent. Turns out, this incorrect assumption leads to some false
*negatives* as well: https://godbolt.org/z/K6EvfrE6a. This PR fixes
those.
This patch changes the transfer_write -> transfer_read load store
forwarding canonicalization pattern to work based on permutation maps
and less on adhoc logic. The old logic couldn't canonicalize a simple
unit dim broadcast through transfer_write/transfer_read which is added
as a test in this patch.
This patch also details what would be needed to better support cases
which are not yet implemented.
This reverts commit 3847648e84.
Relands https://github.com/llvm/llvm-project/pull/158043 which got
auto-merged on a revision which wasn't approved.
The only addition to the approved version was that we adjust how we set
the time for failed tests. We used to just assign it the negative value
of the elapsed time. But if the test failed with `0` seconds (which some
of the new tests do), we would mark it `-0`. But the check for whether
something failed checks for `time < 0`. That messed with the new
`--filter-failed` option of this PR. This was only an issue on Windows
CI, but presumably can happen on any platform. Happy to do this in a
separate PR.
---- Original PR
This patch adds a new --filter-failed option to llvm-lit, which when
set, will only run the tests that have previously failed.
OpenACC data clauses of structured constructs may contain component
references (`obj%comp`, or `obj%array(i:j:k)`, ...).
This change allows using the ACC dialect data operation result for such
clauses every time the component is referred to inside the scope of the
construct.
The bulk of the change is to add the ability to map
`evaluate::Component` to mlir values in the symbol map used in lowering.
This is done by adding the `ComponentMap` helper class to the lowering
symbol map, and using it to override `evaluate::Component` reference
lowering in expression lowering (ConvertExprToHLFIR.cpp).
Some changes are made in Lower/Support/Utils.h in order to set-up/expose
the hashing/equality helpers needed to use `evaluate::Component` in
llvm::DenseMap.
In OpenACC.cpp, `genPrivatizationRecipes` and `genDataOperandOperations`
are merged to unify the processing of Designator in data clauses.
New code is added to unwrap the rightmost `evaluate::Component`, if any,
and remap it.
Note that when the rightmost part is an array reference on a component
(`obj%array(i:j:k)`), the whole component `obj%array` is remapped, and
the array reference is dealt with via a bound operand, like for
whole-object arrays.
After this patch, all designators in data clauses on structured
constructs will be remapped, except for array references in
private/firstprivate/reduction (the result type of the related operation
needs to be changed and the recipe adapted to "offset back"), component
references in
reduction (this patch is adding a TODO for it), and device_ptr
(previously bypassed remapping because of issues with descriptors,
should be OK to lift it in a further patch).
Note that this patch assumes that it is illegal to have code with
intermediate variable indexing in the component reference (e.g.
`array(i)%comp`) where the value of the index would be changed inside
the region (e.g., `i` is assigned a new value), or where the component
would be used with different indices meant to have the same value as the
one used in the clause (e.g. `array(j)%comp` where `j` is meant to be
the same as `i`). I will try to add a warning in semantics for such
questionable/risky usages.
This patch implements a workaround for a VSCode bug that causes it to
send disassemble requests with an empty memory reference. You can find a
more detailed description
[here](https://github.com/microsoft/vscode/pull/270361). I propose to
allow an empty memory reference and return invalid instructions when this
occurs.
Error log example:
```
1759923554.517830610 (stdio) --> {"command":"disassemble","arguments":{"memoryReference":"","offset":0,"instructionOffset":-50,"instructionCount":50,"resolveSymbols":true},"type":"request","seq":3}
1759923554.518007517 (stdio) queued (command=disassemble seq=3)
1759923554.518254757 (stdio) <-- {"body":{"error":{"format":"invalid arguments for request 'disassemble': malformed memory reference at arguments.memoryReference\n{\n \"instructionCount\": 50,\n \"instructionOffset\": -50,\n \"memoryReference\": /* error: malformed memory reference */ \"\",\n \"offset\": 0,\n \"resolveSymbols\": true\n}","id":3,"showUser":true}},"command":"disassemble","request_seq":3,"seq":0,"success":false,"type":"response"}
```
I am not sure that we should add a workaround here when the bug is on the
VSCode side, but I think this bug affects our users. WDYT?
This PR adds the cost model for the loop dependence mask intrinsics,
both for cases where they must be expanded and when they can be lowered
for AArch64.
---------
Co-authored-by: Benjamin Maxwell <benjamin.maxwell@arm.com>
This reverts commit
54a4da9df6.
MSVC supports an extension that allows deleting an array of objects via a
pointer whose static type doesn't match its dynamic type. This is done
via generation of special destructors - vector deleting destructors.
MSVC's virtual tables always contain a pointer to the vector deleting
destructor for classes with virtual destructors, so not having this
extension implemented causes clang to generate code that is not
compatible with the code generated by MSVC, because clang always puts a
pointer to a scalar deleting destructor in the vtable. As a bonus, the
deletion of an array of polymorphic objects will work just like it does
with MSVC - no memory leaks, and the correct destructors are called.
This patch makes clang emit code that is compatible with code produced
by MSVC but not with code produced by older versions of clang, so the
new behavior can be disabled by passing -fclang-abi-compat=21 (or
lower).
Fixes https://github.com/llvm/llvm-project/issues/19772
This patch is motivated by the Tosa conformance test
negate_32x45x49_i16_full failure.
The TosaToLinalg pass has an optimization that rewrites a Tosa negate as
a sub when the zero points are zero. However, when the input value is
the minimum negative number, this transformation causes an underflow. By
removing the transformation, the zp = 0 case goes through the type
promotion path, which avoids the underflow.
The promoted type can range from int32 up to int48. The TOSA negate
specification does not mention support for int48; should we consider
removing the promotion to int48 to stay aligned with the TOSA spec?
Both `LOOP_DEPENDENCE_WAR_MASK` and `LOOP_DEPENDENCE_RAW_MASK` are
currently hard to split correctly, and there are a number of incorrect
cases.
The difficulty comes from how the intrinsics are defined. For example,
take `LOOP_DEPENDENCE_WAR_MASK`.
It is defined as the OR of:
* `(ptrB - ptrA) <= 0`
* `elementSize * lane < (ptrB - ptrA)`
Now, if we want to split a loop dependence mask for the high half of the
mask we want to compute:
* `(ptrB - ptrA) <= 0`
* `elementSize * (lane + LoVT.getElementCount()) < (ptrB - ptrA)`
However, with the current opcode definitions, we can only modify ptrA or
ptrB, which may change the result of the first condition, even though it
should be invariant to the lane.
This patch resolves these cases by adding a "lane offset" to the ISD
opcodes. The lane offset is always a constant. For scalable masks, it is
implicitly multiplied by vscale.
This makes splitting trivial, as we now simply increment the lane offset
by `LoVT.getElementCount()`.
Note: In the AArch64 backend, we only support zero lane offsets (as
other cases are tricky to lower to whilewr/rw).
---------
Co-authored-by: Benjamin Maxwell <benjamin.maxwell@arm.com>
`gatherDataOperandAddrAndBounds` did not handle pointers and allocatable
pointer components (`obj%p`) in the same way as pointer and allocatable
whole objects (`p`).
The difference is that whole objects are kept as a descriptor address
(`fir.ref<fir.box>`) in the acc data operation while components were
dereferenced (`fir.box<>`).
I do not think this was intentional; it is mainly a side effect of
`genExprAddr` generating a dereference for pointer/allocatable
components.
In the work that I am doing on remapping components, this is an issue
because the data operation must return a `fir.ref<fir.box>` so that I
can remap any appearance of the component to it (which could be in a
pointer association statement, for instance, requiring access to a
descriptor address as opposed to a value).
Fixed a crash in Blender due to some weird control flow.
The issue was with the "merge" function which was only looking at the
keys of the "Other" VMem/SGPR maps. It needs to look at the keys of both
maps and merge them.
Original commit message below
----
The pass was already "reinventing" the concept just to deal with 16 bit
registers. Clean up the entire tracking logic to only use register
units.
There are no test changes because functionality didn't change, except:
- We can now track more LDS DMA IDs if we need it (up to `1 << 16`)
- The debug prints also changed a bit because we now talk in terms of
register units.
This also changes the tracking to use a DenseMap instead of a massive
fixed size table. This trades a bit of access speed for a smaller memory
footprint. Allocating and memsetting a huge table to zero caused a
non-negligible performance impact (I've observed up to 50% of the time
in the pass spent in the `memcpy` built-in on a big test file).
I also think we don't access these often enough to really justify using
a vector. We do a few accesses per instruction, but not much more. In a
huge 120MB LL file, I can barely see the trace of the DenseMap accesses.
This is precommitting a full reproducer of one of our motivating
examples. Looking at a full reproducer is helpful for further discussion
on DependenceAnalysis and Delinearization issues and the runtime
predicates discussion. I appreciate that this is a larger than usual
test case, but that is by design, because I think it is useful to look
at the whole thing with all of its complexities.
I have given useful names to all the relevant loop variables, and the
relevant blocks in these loops and their functions, but have
intentionally not done that for others as there are quite a few more.
Instead of this:
```
0x00018cff: DW_TAG_imported_declaration
DW_AT_decl_line (12)
DW_AT_import (0x0000000000018cfb)
```
print:
```
0x00018cff: DW_TAG_imported_declaration
DW_AT_decl_line (12)
DW_AT_import (0x0000000000018cfb "platform")
```
Where `0x0000000000018cfb` in this example could be a `DW_TAG_module`
with `DW_AT_name ("platform")`
The previous PR,
https://github.com/llvm/llvm-project/pull/136821, had test failures on
the llvm-clang-x86_64-expensive-checks-ubuntu builder when
`-verify-machineinstrs` was enabled, so that PR was reverted.
This new PR:
1. adds `-verify-machineinstrs` to the RUN command.
2. writes the register before reading it to avoid the error `Bad machine
code: Using an undefined physical register`.
Fixes #47656.
This takes an ABI break unconditionally, since it's small enough that
nobody should be affected. This both simplifies `bitset` a bit and makes
us more conforming.
There is a generic fold for recursively calling simplifyICmpInst with
the ptrtoint cast stripped:
9b6b52b534/llvm/lib/Analysis/InstructionSimplify.cpp (L3850-L3867)
As such, we shouldn't have to explicitly do this for the
computePointerICmp() fold.
This is not strictly NFC because the recursion limit applies to the
generic fold, though I wouldn't expect this to matter in practice.
As the comment on the method indicates, this method is supposed to
produce a splat for vector types. However, currently it has a
ConstantInt return type that is incompatible with that. There is a
separate overload on IntegerType -- only that one should return
ConstantInt.
This also requires adjusting Type::getIntNTy() to return IntegerType
(matching the normal Type API), so it uses the right overload.
Explicitly cast the value to (int) before negating, so it gets properly
sign extended. Otherwise we end up with a large unsigned value instead
of a negative value for large bit widths.
This was found while working on
https://github.com/llvm/llvm-project/pull/171456.
Lockdown instructions for vector compares `not equal to non-zero` (e.g.
`vec[i] != 7`). The current implementation could be improved by removing
the negation and using the identity ``` 0xFFFF + 1 = 0 and 0 + 1 = 0 ```
Co-authored-by: himadhith <himadhith.v@ibm.com>
Trace.local_head is currently uninitialized when Trace is created. It is
first initialized when the first event is added to the trace, via the
first call to TraceSwitchPartImpl.
However, ThreadContext::OnFinished uses local_head, assuming that it is
initialized. If it has not been initialized, we have undefined behavior,
likely crashing if the contents are garbage. The allocator (Alloc)
reuses previous allocations, so the contents of the uninitialized
memory are arbitrary.
In a C/C++ TSan binary it is likely very difficult for a thread to start
and exit without a single event in between. For Go programs, code running
in the Go runtime itself is not TSan-instrumented, so goroutines that
exclusively run runtime code (such as GC workers) can quite reasonably
have no TSan events.
The addition of such a goroutine to the Go test.c is sufficient to
trigger this case, though for reliable failure (segfault) I've found it
necessary to poison the ThreadContext allocation like so:
```
diff --git a/compiler-rt/lib/tsan/rtl/tsan_rtl.cpp b/compiler-rt/lib/tsan/rtl/tsan_rtl.cpp index feee566f44..352db9aa7c 100644
--- a/compiler-rt/lib/tsan/rtl/tsan_rtl.cpp
+++ b/compiler-rt/lib/tsan/rtl/tsan_rtl.cpp
@@ -392,7 +392,9 @@
report_mtx(MutexTypeReport),
nreported(),
thread_registry([](Tid tid) -> ThreadContextBase* {
- return new (Alloc(sizeof(ThreadContext))) ThreadContext(tid);
+ void* ptr = Alloc(sizeof(ThreadContext));
+ internal_memset(ptr, 0xde, sizeof(ThreadContext));
+ return new (ptr) ThreadContext(tid);
}),
racy_mtx(MutexTypeRacy),
racy_stacks(),
```
The fix is trivial: local_head should be zero-initialized.
This bug was introduced by #108323, where the loc and ip were not
properly set. It may lead to errors when the operations are not inserted
into the IR linearly.
`[[nodiscard]]` should be applied to functions where discarding the
return value is most likely a correctness issue.
- https://libcxx.llvm.org/CodingGuidelines.html
Some classes in `<format>` were already annotated. This patch annotates
the remaining ones.
Use that instead of register class to detect the mask operand in
lowerRISCVVMachineInstrToMCInst.
There are other instructions like vmerge and vadc that have a VMV0
operand that isn't optional and do not reach this code. Having a
dedicated marker for the optional mask is more precise.
VMaskOp primarily exists for parsing/printing in the MC layer, where the
mask is optional. The vector pseudos are split into masked and unmasked
versions. The mask is always required for the masked version.
That check doesn't seem very useful. For non-dependent context records,
ShouldDeleteSpecialMember is called when checking implicitly defined
member functions, before the anonymous flag which the check relies on is
set. (One could notice that in ParseCXXClassMemberDeclaration,
ParseDeclarationSpecifiers ends up calling
ShouldDeleteSpecialMember, while the flag is only set later in
ParsedFreeStandingDeclSpec.)
For dependent contexts, this check actually breaks correctness: since we
don't create those special members until the template is instantiated,
their deletion checks are skipped because of the anonymity.
There's only one regression, in an ObjC test about notes; we are more
explanatory now.
Fixes https://github.com/llvm/llvm-project/issues/167217
When an alloc slice's users include llvm.protected.field.ptr intrinsics
and their discriminators are consistent, drop the intrinsics in order
to avoid unnecessary pointer sign and auth operations.
Reviewers: nikic
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/151650
Fixes #58676
- Make /Zpr and /Zpc turn on the -fmatrix-memory-layout= row-major and
column-major flags
- Add the new DXC driver flags to Options.td
- Error in the HLSL toolchain when both flags are specified
- Add the new error diagnostic to DiagnosticDriverKinds.td
- Propagate the flag via the Clang toolchain
Remove all inline links to Intel and IBM compiler options from the
comparison tables, as these links have become stale (Intel links
redirect to generic pages, IBM links redirect to PDF-only pages).
Option names are preserved for readability. The Data sources section
still contains links to the main documentation pages.
Details:
- Removed 43 Intel compiler option links
- Removed 35 IBM compiler option links
- Removed 2 stale links in notes section
- Updated documentation text accordingly
Fixes #171464
---------
Co-authored-by: Tarun Prabhu <tarun@lanl.gov>
This change is motivated by the overall goal of finding alternative ways
to promote allocas to VGPRs. The current solution is effectively limited
to allocas whose size matches a register class, and we can't keep adding
more register classes. We have some downstream work in this direction,
and I'm currently looking at cleaning that up to bring it upstream.
This refactor paves the way to adding a third way of promoting allocas,
on top of the existing alloca-to-vector and alloca-to-LDS. Much of the
analysis can be shared between the different promotion techniques.
Additionally, the idea behind splitting the pass into an analysis phase
and a commit phase is that it ought to allow us to more easily make
better "big picture" decisions about which allocas to promote, and how,
in the future.
@@ -89,3 +89,44 @@ bool mixedBinaryAndCpp(Value a, Value b, bool c) {
return a < b == c;
}
// CHECK-MESSAGES: :[[@LINE-2]]:12: warning: chained comparison 'v0 < v1 == v2' may generate unintended results, use parentheses to specify order of evaluation or a logical operator to separate comparison expressions [bugprone-chained-comparison]
#define CHAINED_COMPARE(a, b, c) (a < b < c)
void macro_test(int x, int y, int z) {
bool result = CHAINED_COMPARE(x, y, z);
}
// CHECK-MESSAGES: :[[@LINE-2]]:35: warning: chained comparison 'v0 < v1 < v2' may generate unintended results, use parentheses to specify order of evaluation or a logical operator to separate comparison expressions [bugprone-chained-comparison]
#define NESTED_LESS(a, b) a < b
#define NESTED_CHAIN(a, b, c) NESTED_LESS(a, b) < c
void nested_macro_test(int x, int y, int z) {
bool result = NESTED_CHAIN(x, y, z);
}
// CHECK-MESSAGES: :[[@LINE-2]]:32: warning: chained comparison 'v0 < v1 < v2' may generate unintended results, use parentheses to specify order of evaluation or a logical operator to separate comparison expressions [bugprone-chained-comparison]
#define LESS_OP <
void operator_macro_test(int x, int y, int z) {
bool result = x LESS_OP y LESS_OP z;
}
// CHECK-MESSAGES: :[[@LINE-2]]:19: warning: chained comparison 'v0 < v1 < v2' may generate unintended results, use parentheses to specify order of evaluation or a logical operator to separate comparison expressions [bugprone-chained-comparison]
#define PARTIAL_LESS(a, b) a < b
void mixed_macro_test(int x, int y, int z) {
bool result = PARTIAL_LESS(x, y) < z;
}
// CHECK-MESSAGES: :[[@LINE-2]]:32: warning: chained comparison 'v0 < v1 < v2' may generate unintended results, use parentheses to specify order of evaluation or a logical operator to separate comparison expressions [bugprone-chained-comparison]
void if_macro_test(int x, int y, int z) {
if (CHAINED_COMPARE(x, y, z)) {}
}
// CHECK-MESSAGES: :[[@LINE-2]]:25: warning: chained comparison 'v0 < v1 < v2' may generate unintended results, use parentheses to specify order of evaluation or a logical operator to separate comparison expressions [bugprone-chained-comparison]
// CHECK-MESSAGES: :[[@LINE-2]]:36: warning: chained comparison 'v0 < v1 < v2 < v3' may generate unintended results, use parentheses to specify order of evaluation or a logical operator to separate comparison expressions [bugprone-chained-comparison]
Instead of specifying the mappings manually, it can be convenient to use the ``-fprebuilt-module-path`` option. Let's also use ``-fimplicit-module-maps`` instead of manually pointing to our module map.