Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[analyzer] Widen loops: Augment assignment note when values invalidated at the start of a loop #122398

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dgg5503
Copy link
Contributor

@dgg5503 dgg5503 commented Jan 10, 2025

This PR attempts to improve the value assignment diagnostic specifically for the case where widen-loops is enabled during Clang static analysis.

The motivation behind this diagnostic is the fact that the current diagnostic is a bit confusing at first glance. For example:

class A {
public:
  void m_fn1();
};
void fn1() {
  A a;
  A *b = &a;
  for (;;) {
    (void)!b;
    b->m_fn1();
  }
}
> clang -cc1 -internal-isystem C:\...\lib\clang\include -nostdsysteminc -analyze -analyzer-constraints=range -setup-static-analyzer -analyzer-checker=core -analyzer-max-loop 2 -analyzer-config widen-loops=true -analyzer-output=text loop-widening-notes-best.cpp

loop-widening-notes-best.cpp:10:5: warning: Called C++ object pointer is null [core.CallAndMessage]
   10 |     b->m_fn1();
      |     ^
loop-widening-notes-best.cpp:8:3: note: Loop condition is true.  Entering loop body
    8 |   for (;;) {
      |   ^
loop-widening-notes-best.cpp:8:3: note: Value assigned to 'b'
loop-widening-notes-best.cpp:8:3: note: Loop condition is true.  Entering loop body
loop-widening-notes-best.cpp:9:11: note: Assuming 'b' is null
    9 |     (void)!b;
      |           ^~
loop-widening-notes-best.cpp:10:5: note: Called C++ object pointer is null
   10 |     b->m_fn1();
      |     ^
1 warning generated.

The message loop-widening-notes-best.cpp:8:3: note: Value assigned to 'b' appearing where it is makes technical sense if you understand what widen-loops does behind the scenes. In short, widen-loops invalidates nearly all values in the current state before the start of the loop. At least in this example, the variable b is invalidated at the start of the for loop and, as part of the process of invalidation, is reassigned a conjured symbol hence Value assigned to 'b'. Without that knowledge, it is not intuitive from the diagnostic why the assignment message appears where it does.

In such cases, I'd like to make the diagnostic a bit more verbose. I propose something like:

loop-widening-notes-best.cpp:8:3: note: Value assigned to 'b' (invalidation as part of widen-loops made this symbolic here)

This indicates to the user that b has been invalidated and turned into a symbolic representation because of actions taken by widen-loops.

The implementation I present in this PR minimally passes all Clang static analysis LIT tests, however, I am not confident the approach I've taken is idiomatic with respect to the design of Clang's static analysis. I have what I think is a slightly more general solution here where MemRegion invalidations are tracked for each ProgramState, but it also has its own pitfalls (see TODO comments in the diff). I'd be curious to know if there's a preference in either approach, or, if a different approach should be taken.

I am open to any and all suggestions as my knowledge in the Clang static analyzer is limited.

…ed at the start of a loop

This patch augments the assignment note to inform the user when a value
is invalidated and reassigned a conjured symbol as part of loop
widening.
@dgg5503 dgg5503 requested a review from jcsxky January 10, 2025 01:33
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Jan 10, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 10, 2025

@llvm/pr-subscribers-clang-static-analyzer-1

@llvm/pr-subscribers-clang

Author: Douglas (dgg5503)

Changes

This PR attempts to improve the value assignment diagnostic specifically for the case where widen-loops is enabled during Clang static analysis.

The motivation behind this diagnostic is the fact that the current diagnostic is a bit confusing at first glance. For example:

class A {
public:
  void m_fn1();
};
void fn1() {
  A a;
  A *b = &a;
  for (;;) {
    (void)!b;
    b->m_fn1();
  }
}
> clang -cc1 -internal-isystem C:\...\lib\clang\include -nostdsysteminc -analyze -analyzer-constraints=range -setup-static-analyzer -analyzer-checker=core -analyzer-max-loop 2 -analyzer-config widen-loops=true -analyzer-output=text loop-widening-notes-best.cpp

loop-widening-notes-best.cpp:10:5: warning: Called C++ object pointer is null [core.CallAndMessage]
   10 |     b->m_fn1();
      |     ^
loop-widening-notes-best.cpp:8:3: note: Loop condition is true.  Entering loop body
    8 |   for (;;) {
      |   ^
loop-widening-notes-best.cpp:8:3: note: Value assigned to 'b'
loop-widening-notes-best.cpp:8:3: note: Loop condition is true.  Entering loop body
loop-widening-notes-best.cpp:9:11: note: Assuming 'b' is null
    9 |     (void)!b;
      |           ^~
loop-widening-notes-best.cpp:10:5: note: Called C++ object pointer is null
   10 |     b->m_fn1();
      |     ^
1 warning generated.

The message loop-widening-notes-best.cpp:8:3: note: Value assigned to 'b' appearing where it is makes technical sense if you understand what widen-loops does behind the scenes. In short, widen-loops invalidates nearly all values in the current state before the start of the loop. At least in this example, the variable b is invalidated at the start of the for loop and, as part of the process of invalidation, is reassigned a conjured symbol hence Value assigned to 'b'. Without that knowledge, it is not intuitive from the diagnostic why the assignment message appears where it does.

In such cases, I'd like to make the diagnostic a bit more verbose. I propose something like:

loop-widening-notes-best.cpp:8:3: note: Value assigned to 'b' (invalidation as part of widen-loops made this symbolic here)

This indicates to the user that b has been invalidated and turned into a symbolic representation because of actions taken by widen-loops.

The implementation I present in this PR minimally passes all Clang static analysis LIT tests, however, I am not confident the approach I've taken is idiomatic with respect to the design of Clang's static analysis. I have what I think is a slightly more general solution here where MemRegion invalidations are tracked for each ProgramState, but it also has its own pitfalls (see TODO comments in the diff). I'd be curious to know if there's a preference in either approach, or, if a different approach should be taken.

I am open to any and all suggestions as my knowledge in the Clang static analyzer is limited.


Full diff: https://github.com/llvm/llvm-project/pull/122398.diff

5 Files Affected:

  • (modified) clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h (+4)
  • (modified) clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp (+43)
  • (modified) clang/lib/StaticAnalyzer/Core/ExprEngine.cpp (+6-1)
  • (modified) clang/test/Analysis/loop-widening-notes.cpp (+5-5)
  • (modified) clang/test/Analysis/loop-widening.cpp (+1-1)
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
index 20c446e33ef9a5..6017b7e0ea8f81 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
@@ -290,6 +290,10 @@ class ExprEngine {
   /// This tag applies to a node created after removeDead.
   static const ProgramPointTag *cleanupNodeTag();
 
+  /// A tag to track when loop widening occurs.
+  /// This tag applies to a node created after getWidenedLoopState.
+  static const ProgramPointTag *loopWideningNodeTag();
+
   /// processCFGElement - Called by CoreEngine. Used to generate new successor
   ///  nodes by processing the 'effects' of a CFG element.
   void processCFGElement(const CFGElement E, ExplodedNode *Pred,
diff --git a/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp b/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
index a9b4dbb39b5bd6..53494b5cf9cca8 100644
--- a/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
+++ b/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
@@ -1353,6 +1353,49 @@ static void showBRDefaultDiagnostics(llvm::raw_svector_ostream &OS,
     OS << " to ";
     SI.Dest->printPretty(OS);
   }
+
+  // If the value was part of a loop widening node and its value was
+  // invalidated (i.e. replaced with a conjured symbol) then let the user know
+  // that it was due to loop widening.
+  if (SI.StoreSite && SI.Dest &&
+      SI.StoreSite->getLocation().getTag() ==
+          ExprEngine::loopWideningNodeTag()) {
+
+    // TODO: Is it always the case that there's only one predecessor?
+    assert(SI.StoreSite->hasSinglePred() &&
+           "more than one predecessor found, this needs to be handled...");
+    if (const ExplodedNode *previous = SI.StoreSite->getFirstPred()) {
+      // Was the associated memory region for this variable changed from
+      // non-Symbolic to Symbolic because of invalidation?
+      // TODO: Better yet, if there was simply a way to know if a given
+      // SVal / MemRegion was invalidated as part of the current state,
+      // then that should be more robust than what's going on here.
+      // Could we somehow make use of "RegionChanges" /
+      // "runCheckersForRegionChanges" and the ExplicitRegions parameter.
+      // Still need to somehow know when a particular Global
+      // Variable is invalidated. I have not found this to be possible with
+      // "RegionChanges" unless I'm missing something...
+      const ProgramState *lastState = previous->getState().get();
+      const SVal &lastSVal = lastState->getSVal(SI.Dest);
+      const SymbolRef valueSymbol = SI.Value.getAsSymbol(true);
+      const SymbolRef lastSymbol = lastSVal.getAsSymbol(true);
+
+      bool isNowSymbolic = false;
+      if (valueSymbol) {
+        if (!lastSymbol) {
+          // Last state did not have a symbol for SI.Value in current state.
+          // Probably (?) invalidated.
+          isNowSymbolic = true;
+        } else {
+          // If the symbol types differ, then there was likely invalidation
+          isNowSymbolic = valueSymbol->getKind() != lastSymbol->getKind();
+        }
+      }
+
+      if (isNowSymbolic)
+        OS << " (invalidation as part of widen-loops made this symbolic here)";
+    }
+  }
 }
 
 static bool isTrivialCopyOrMoveCtor(const CXXConstructExpr *CE) {
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index ff8bdcea9a2201..4660edbf3dde78 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -1109,6 +1109,11 @@ const ProgramPointTag *ExprEngine::cleanupNodeTag() {
   return &cleanupTag;
 }
 
+const ProgramPointTag *ExprEngine::loopWideningNodeTag() {
+  static SimpleProgramPointTag loopWideningTag(TagProviderName, "Widen Loop");
+  return &loopWideningTag;
+}
+
 void ExprEngine::ProcessStmt(const Stmt *currStmt, ExplodedNode *Pred) {
   // Reclaim any unnecessary nodes in the ExplodedGraph.
   G.reclaimRecentlyAllocatedNodes();
@@ -2546,7 +2551,7 @@ void ExprEngine::processCFGBlockEntrance(const BlockEdge &L,
     const LocationContext *LCtx = Pred->getLocationContext();
     ProgramStateRef WidenedState =
         getWidenedLoopState(Pred->getState(), LCtx, BlockCount, Term);
-    nodeBuilder.generateNode(WidenedState, Pred);
+    nodeBuilder.generateNode(WidenedState, Pred, loopWideningNodeTag());
     return;
   }
 
diff --git a/clang/test/Analysis/loop-widening-notes.cpp b/clang/test/Analysis/loop-widening-notes.cpp
index a3f030dfe98826..ac98d81c23475e 100644
--- a/clang/test/Analysis/loop-widening-notes.cpp
+++ b/clang/test/Analysis/loop-widening-notes.cpp
@@ -9,7 +9,7 @@ int test_for_bug_25609() {
     bar();
   for (int i = 0;  // expected-note {{Loop condition is true.  Entering loop body}}                    
                    // expected-note@-1 {{Loop condition is false. Execution continues on line 16}}
-       ++i,        // expected-note {{Value assigned to 'p_a'}} 
+       ++i,        // expected-note {{Value assigned to 'p_a' (invalidation as part of widen-loops made this symbolic here)}}
        i < flag_a;
        ++i) {}
                                       
@@ -23,7 +23,7 @@ int while_analyzer_output() {
   flag_b = 100;
   int num = 10;
   while (flag_b-- > 0) { // expected-note {{Loop condition is true.  Entering loop body}} 
-                         // expected-note@-1 {{Value assigned to 'num'}} 
+                         // expected-note@-1 {{Value assigned to 'num' (invalidation as part of widen-loops made this symbolic here)}}
                          // expected-note@-2 {{Loop condition is false. Execution continues on line 30}}
     num = flag_b;
   }
@@ -45,7 +45,7 @@ int do_while_analyzer_output() {
   do {   // expected-note {{Loop condition is true. Execution continues on line 47}} 
          // expected-note@-1 {{Loop condition is false.  Exiting loop}}
     num--;
-  } while (flag_c-- > 0); //expected-note {{Value assigned to 'num'}}
+  } while (flag_c-- > 0); //expected-note {{Value assigned to 'num' (invalidation as part of widen-loops made this symbolic here)}}
   int local = 0;
   if (num == 0)       // expected-note {{Assuming 'num' is equal to 0}} 
                       // expected-note@-1 {{Taking true branch}}
@@ -59,7 +59,7 @@ int test_for_loop() {
   int num = 10;
   for (int i = 0;    // expected-note {{Loop condition is true.  Entering loop body}} 
                      // expected-note@-1 {{Loop condition is false. Execution continues on line 67}}
-       new int(10),  // expected-note {{Value assigned to 'num'}}
+       new int(10),  // expected-note {{Value assigned to 'num' (invalidation as part of widen-loops made this symbolic here)}}
        i < flag_d;
        ++i) {         
     ++num;
@@ -73,7 +73,7 @@ int test_for_loop() {
 
 int test_for_range_loop() {
   int arr[10] = {0};
-  for(auto x : arr) { // expected-note {{Assigning value}} 
+  for(auto x : arr) { // expected-note {{Assigning value (invalidation as part of widen-loops made this symbolic here)}}
     ++x;
   }
   if (arr[0] == 0)   // expected-note {{Assuming the condition is true}} 
diff --git a/clang/test/Analysis/loop-widening.cpp b/clang/test/Analysis/loop-widening.cpp
index fbcb72dee160ae..0b05267c463fc3 100644
--- a/clang/test/Analysis/loop-widening.cpp
+++ b/clang/test/Analysis/loop-widening.cpp
@@ -16,7 +16,7 @@ void fn1() {
 
   for (;;) { // expected-note{{Loop condition is true.  Entering loop body}}
              // expected-note@-1{{Loop condition is true.  Entering loop body}}
-             // expected-note@-2{{Value assigned to 'b'}}
+             // expected-note@-2{{Value assigned to 'b' (invalidation as part of widen-loops made this symbolic here)}}
              // no crash during bug report construction
 
     g = !b;     // expected-note{{Assuming 'b' is null}}

Comment on lines +1370 to +1377
// TODO: Better yet, if there was simply a way to know if a given
// SVal / MemRegion was invalidated as part of the current state,
// then that should be more robust than what's going on here.
// Could we somehow make use of "RegionChanges" /
// "runCheckersForRegionChanges" and the ExplicitRegions parameter.
// Still need to somehow know when a particular Global
// Variable is invalidated. I have not found this to be possible with
// "RegionChanges" unless I'm missing something...
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to implement this in main...dgg5503:llvm-project:dgg5503/main/invalidation-diagnostic-b but all MemRegions from the ProgramState and before are tracked. I'm wondering if there's a way to clear InvalidatedMemoryRegionSet at the start of each ProgramState, or, if there's a different approach I should be taking...

@steakhal
Copy link
Contributor

I agree that it's important to augment the bug reports with information about the source of the symbols. Especially for conjured and derived symbols that are produced widely because of invalidations.
Loop widening is just one source of invalidation, and we could generalize it slightly.

Read this post of mine: https://discourse.llvm.org/t/memory-region-invalidation-tracking-improvements/62432
Unfortunately, I never got to implement it, but I'd be very happy as a reviewer to guide you through.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants