-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use GuardedInstruction for cfg_t node, instead of basic_block #789
Conversation
Signed-off-by: Elazar Gershuni <[email protected]>
Signed-off-by: Elazar Gershuni <[email protected]>
Signed-off-by: Elazar Gershuni <[email protected]>
Signed-off-by: Elazar Gershuni <[email protected]>
WalkthroughThe pull request introduces extensive modifications across multiple files, primarily focusing on the restructuring of control flow graph (CFG) handling, the introduction of new functions for condition manipulation, and updates to various classes and methods related to eBPF verification. Key changes include the removal and replacement of certain structures and functions, enhancements in error handling, and the addition of new test cases to improve coverage of scenarios involving jumps, loops, and uninitialized registers. Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Signed-off-by: Elazar Gershuni <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 29
🧹 Outside diff range comments (2)
test-data/loop.yaml (1)
Line range hint
377-398
: Consider simplifying post-conditions for large loop test cases.Based on previous feedback from @Alan-Jowett, post-conditions like
r0.svalue>0
andr0.uvalue>0
are not required for large loop test cases. Consider simplifying the post-conditions while maintaining test coverage.test-data/uninit.yaml (1)
Line range hint
451-550
: Fix duplicate atomic XOR test caseThe atomic XOR test case uses
|=
instead of^=
in the instruction. This appears to be a copy-paste error from the atomic OR test case.Apply this fix:
- lock *(u64 *)(r1 + 0) |= r5 + lock *(u64 *)(r1 + 0) ^= r5
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
📒 Files selected for processing (30)
src/asm_cfg.cpp
(6 hunks)src/asm_marshal.cpp
(0 hunks)src/asm_ostream.cpp
(3 hunks)src/asm_ostream.hpp
(0 hunks)src/asm_parse.cpp
(1 hunks)src/asm_syntax.hpp
(3 hunks)src/assertions.cpp
(3 hunks)src/config.hpp
(2 hunks)src/crab/cfg.hpp
(9 hunks)src/crab/ebpf_checker.cpp
(0 hunks)src/crab/ebpf_domain.cpp
(0 hunks)src/crab/ebpf_transformer.cpp
(0 hunks)src/crab/fwd_analyzer.cpp
(1 hunks)src/crab/interval.hpp
(1 hunks)src/crab/label.hpp
(1 hunks)src/crab/split_dbm.cpp
(2 hunks)src/crab/thresholds.cpp
(5 hunks)src/crab/thresholds.hpp
(1 hunks)src/crab/var_factory.cpp
(1 hunks)src/crab_utils/debug.hpp
(1 hunks)src/crab_verifier.cpp
(2 hunks)src/main/check.cpp
(1 hunks)src/test/ebpf_yaml.cpp
(5 hunks)src/test/test_conformance.cpp
(1 hunks)src/test/test_marshal.cpp
(0 hunks)src/test/test_print.cpp
(0 hunks)src/test/test_wto.cpp
(3 hunks)test-data/jump.yaml
(1 hunks)test-data/loop.yaml
(11 hunks)test-data/uninit.yaml
(1 hunks)
💤 Files with no reviewable changes (7)
- src/asm_marshal.cpp
- src/asm_ostream.hpp
- src/crab/ebpf_checker.cpp
- src/crab/ebpf_domain.cpp
- src/crab/ebpf_transformer.cpp
- src/test/test_marshal.cpp
- src/test/test_print.cpp
🧰 Additional context used
📓 Learnings (6)
src/assertions.cpp (1)
Learnt from: Alan-Jowett
PR: vbpf/ebpf-verifier#745
File: src/assertions.cpp:50-51
Timestamp: 2024-11-06T20:28:09.788Z
Learning: In `src/assertions.cpp`, within the `AssertExtractor` class, methods do not include comments. To maintain code consistency, avoid adding comments to assertion methods in this class.
src/crab/split_dbm.cpp (1)
Learnt from: elazarg
PR: vbpf/ebpf-verifier#689
File: src/crab/split_dbm.cpp:910-910
Timestamp: 2024-09-26T00:57:58.072Z
Learning: Avoid suggesting changes that are unrelated to the current PR.
src/main/check.cpp (1)
Learnt from: Alan-Jowett
PR: vbpf/ebpf-verifier#775
File: src/main/check.cpp:127-130
Timestamp: 2024-11-07T16:54:43.485Z
Learning: The `--simplify` flag in `src/main/check.cpp` controls how much additional information is available to understand the result of the analysis without affecting the analysis outcome. Therefore, it is appropriate to keep it under the "Verbosity" group.
src/test/ebpf_yaml.cpp (1)
Learnt from: dthaler
PR: vbpf/ebpf-verifier#721
File: src/ebpf_yaml.cpp:327-329
Timestamp: 2024-10-11T20:51:53.396Z
Learning: In `src/ebpf_yaml.cpp`, avoid suggesting changes to error handling code in the `run_conformance_test_case` function if the code is not modified in the current PR.
test-data/loop.yaml (1)
Learnt from: Alan-Jowett
PR: vbpf/ebpf-verifier#745
File: test-data/loop.yaml:377-398
Timestamp: 2024-10-25T19:31:59.214Z
Learning: In `test-data/loop.yaml`, for the test case **"very large loop > 100K iterations"**, additional post conditions like `r0.svalue>0` and `r0.uvalue>0` are not required.
test-data/uninit.yaml (3)
Learnt from: Alan-Jowett
PR: vbpf/ebpf-verifier#740
File: test-data/uninit.yaml:195-211
Timestamp: 2024-10-17T17:40:52.506Z
Learning: In `uninit.yaml` test cases, post-conditions are still needed even when the operation fails due to uninitialized registers.
Learnt from: Alan-Jowett
PR: vbpf/ebpf-verifier#740
File: test-data/uninit.yaml:249-265
Timestamp: 2024-10-17T17:40:50.096Z
Learning: The test cases in `uninit.yaml` should include post conditions even when the operation fails due to uninitialized registers.
Learnt from: Alan-Jowett
PR: vbpf/ebpf-verifier#740
File: test-data/uninit.yaml:39-55
Timestamp: 2024-10-17T17:36:02.883Z
Learning: In test cases, post conditions are still needed even if the operation is expected to fail due to an uninitialized register.
🔇 Additional comments (41)
src/crab/thresholds.hpp (1)
61-61
: LGTM! Verify dependent code updates.
The change from basic_block
to value_t
aligns with the PR's objective of restructuring the CFG handling.
Let's verify that all callers have been updated:
✅ Verification successful
Let me analyze the search results:
- The
get_thresholds
method is consistently usingvalue_t
in both declaration (thresholds.hpp) and definition (thresholds.cpp). - All calls to
get_thresholds
appear to be within the same files and are using the correct type. - While there are references to
basic_block
in the codebase, they are in different contexts:- In
asm_cfg.cpp
: Just string literals for statistics - In
cfg.hpp
: Definesbasic_block_t
which is a different type/concept
- In
Let me verify one more aspect to ensure completeness:
Based on the search results, I can now make a conclusive assessment:
Change from basic_block
to value_t
is correctly implemented.
The verification shows:
- The
get_thresholds
method is consistently usingvalue_t
in both declaration (thresholds.hpp) and implementation (thresholds.cpp). - All calls to
get_thresholds
are using the correct type. - The remaining
basic_block
references in the codebase are unrelated:basic_block_t
in cfg.hpp is a different class that works alongsidevalue_t
- Other occurrences are either in string literals or comments
The change aligns with the codebase's architecture where value_t
represents nodes in the CFG, while basic_block_t
serves a different purpose in the control flow graph implementation.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Search for any remaining references to basic_block in the codebase
# that might need updating
# Test 1: Look for any remaining basic_block type usage
echo "Searching for basic_block usage..."
rg "basic_block[_]?[t]?" -g "!{*.md,*.txt,*.json}"
# Test 2: Look for get_thresholds calls to ensure they use value_t
echo "Searching for get_thresholds calls..."
rg "get_thresholds\s*\([^)]*\)" -g "*.{cpp,hpp}"
Length of output: 2039
Script:
#!/bin/bash
# Check if there are any direct relationships between basic_block_t and value_t
# or any potential type conflicts
# Look for files that mention both types
echo "Files containing both basic_block_t and value_t:"
rg -l "basic_block_t" | xargs rg -l "value_t"
# Look for the context where value_t is defined/used
echo "\nContext around value_t definition/usage:"
rg "value_t" -B 2 -A 2
Length of output: 26680
src/crab/label.hpp (1)
61-62
: Review the use of INT_MAX for exit label.
Using INT_MAX for the exit label could lead to issues:
- It might cause overflow in arithmetic operations
- It could conflict with actual instruction indices in very large programs
Consider using a more explicit sentinel value or a different approach to marking exit nodes.
src/crab_utils/debug.hpp (1)
46-53
: Verify error message handling across the codebase.
Since this changes the error message format, we should verify there are no dependencies on the exact format in tests or error handling code.
✅ Verification successful
Let's check the content of the file that mentions error messages and also search for any tests in other directories since the test/
directory doesn't exist.
Based on the analysis of the search results, I can now provide a final response:
No impact on error message dependencies
The changes to the CRAB_ERROR
macro format are safe because:
- All usages of
CRAB_ERROR
are for throwing internal errors that are not caught or parsed by any test code - The error messages are used purely for developer debugging and not parsed programmatically
- The test files (
src/test/*.cpp
) do not contain any assertions or parsing of error message formats
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for potential dependencies on error message format
# Look for test cases that might be asserting on error messages
echo "Checking for test cases involving CRAB_ERROR messages..."
rg -l "CRAB_ERROR" test/
rg -l "runtime_error" test/
# Look for any error message parsing in the codebase
echo "Checking for error message parsing..."
rg -l "parse.*error" src/
rg -l "error.*message" src/
Length of output: 436
Script:
#!/bin/bash
# Check content of the file that mentions error messages
echo "Checking src/test/test_marshal.cpp for error message handling..."
cat src/test/test_marshal.cpp
# Look for tests in other locations
echo -e "\nSearching for tests in other locations..."
fd -t f -e cpp test
fd -t f -e hpp test
# Look for CRAB_ERROR usage in source files
echo -e "\nChecking CRAB_ERROR usage in source files..."
rg -l "CRAB_ERROR" src/
# Get some context around CRAB_ERROR usage
echo -e "\nGetting context around CRAB_ERROR usage..."
rg -C 3 "CRAB_ERROR" src/
Length of output: 67701
src/test/test_wto.cpp (6)
15-15
: LGTM! Good modernization of initialization syntax.
The change to brace initialization label_t{i}
follows modern C++ best practices for uniform initialization.
19-30
: LGTM! Consistent application of the new initialization syntax.
The edge creation maintains the correct graph structure from Bourdoncle's Figure 1 while consistently using the updated label_t initialization syntax.
47-47
: LGTM! Consistent with previous changes.
The label_t initialization update is consistently applied in the second test case.
51-58
: LGTM! Correct implementation of Figure 2a graph structure.
The edge creation properly implements the graph structure from Bourdoncle's Figure 2a while maintaining consistent use of the new label_t initialization syntax.
75-75
: LGTM! Maintains consistency across test cases.
The label_t initialization update is consistently applied in the third test case.
79-85
: LGTM! Comprehensive test coverage maintained.
The edge creation properly implements the graph structure from Bourdoncle's Figure 2b while maintaining consistent use of the new label_t initialization syntax. The test suite effectively covers different graph structures and WTO scenarios.
✅ Verification successful
Let me check the actual test cases in the file to verify the coverage.
Let me verify the actual test implementations to ensure they cover different scenarios.
Test coverage is comprehensive and accurate
The test suite effectively covers three distinct graph structures from Bourdoncle's paper:
- Figure 1: Complex graph with nested components and multiple paths
- Figure 2a: Graph with overlapping loops
- Figure 2b: Graph with a single component containing multiple nodes
Each test case properly verifies both the graph construction and the expected WTO output string, maintaining consistent use of the new label_t syntax throughout.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify test coverage by checking for any additional test cases in the codebase
rg -l "TEST_CASE.*wto"
Length of output: 46
Script:
#!/bin/bash
# Get all TEST_CASE definitions from test_wto.cpp
rg "TEST_CASE.*wto" -A 2 src/test/test_wto.cpp
Length of output: 215
Script:
#!/bin/bash
# Get the full content of test_wto.cpp to analyze test coverage
cat src/test/test_wto.cpp
Length of output: 3100
src/crab/var_factory.cpp (1)
7-7
: LGTM! Verify header dependencies.
The header change aligns with the PR objective of relocating label_t
to a dedicated header file.
Let's verify the header dependencies and potential impacts:
✅ Verification successful
Let me analyze the results:
asm_syntax.hpp
is still used in multiple files and now includescrab/label.hpp
label.hpp
is used in three files:asm_syntax.hpp
,var_factory.cpp
, andthresholds.cpp
- No circular dependencies detected as
label.hpp
is included by other files but doesn't include them back
Let me verify one more aspect to ensure the change is safe:
Let me try a different approach to check the contents:
Based on the analysis of the files:
label.hpp
is self-contained and only includes necessary standard headers andcrab_utils/num_safety.hpp
var_factory.cpp
only useslabel_t
for theSTACK_FRAME_DELIMITER
constant and doesn't depend on any other types fromasm_syntax.hpp
asm_syntax.hpp
now includeslabel.hpp
, maintaining the dependency chain correctly
✓ Header change is safe and properly structured
The change from asm_syntax.hpp
to crab/label.hpp
is correct as:
- The new header provides all necessary dependencies
- No circular dependencies are introduced
- The change aligns with the PR objective of relocating
label_t
- The file only needs access to
STACK_FRAME_DELIMITER
which is now in the correct header
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for potential issues with header dependencies
# Test 1: Check if any remaining references to asm_syntax.hpp exist
echo "Checking for remaining asm_syntax.hpp references..."
rg "asm_syntax\.hpp"
# Test 2: Check if label.hpp is properly included where needed
echo "Checking for label.hpp usage..."
rg "label\.hpp"
# Test 3: Check for potential circular dependencies
echo "Checking for potential circular includes of label.hpp..."
rg -l "include.*label\.hpp" | xargs -I {} rg "include" {}
Length of output: 1565
Script:
#!/bin/bash
# Check the content of label.hpp to verify it's self-contained
echo "Checking label.hpp contents..."
rg -A 10 "label\.hpp" src/crab/label.hpp || true
# Check if var_factory.cpp uses any other types from asm_syntax.hpp
echo "Checking var_factory.cpp dependencies..."
rg -A 5 "using|typedef|struct|class|enum" src/crab/var_factory.cpp
Length of output: 280
Script:
#!/bin/bash
# Check the content of files directly
echo "Checking label.hpp contents..."
cat src/crab/label.hpp || true
echo -e "\nChecking var_factory.cpp contents..."
cat src/crab/var_factory.cpp || true
Length of output: 8352
src/crab/fwd_analyzer.cpp (3)
79-79
: LGTM: Simplified transformation logic.
The direct transformation approach is cleaner and aligns with the new GuardedInstruction
structure.
73-78
: 🧹 Nitpick (assertive)
Please clarify the comment about redundant errors.
The code looks well-structured, but the comment "avoid redundant errors" needs more context. Consider documenting:
- What kind of redundant errors were occurring?
- How does this implementation prevent them?
Let's check for related error handling patterns:
✅ Verification successful
Let me gather more context about the error handling and assumption mechanism.
Let me check one more thing about the preconditions and assertions to fully understand the context.
Based on the gathered context, I can now provide a final response:
Comment about redundant errors is clear from the codebase context
The comment "avoid redundant errors" refers to the interaction between assertion checking and assumption handling. When assume_assertions
is enabled:
ebpf_domain_check()
verifies assertions and reports violationsebpf_domain_assume()
incorporates the assertion into the domain state- Without assuming the assertion, the same violation could be reported multiple times in subsequent checks
The current implementation correctly prevents this by assuming assertions before transformation when the flag is enabled.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Look for similar error handling patterns and comments
rg "redundant.*error" --type cpp -B 2 -A 2
Length of output: 658
Script:
#!/bin/bash
# Look for ebpf_domain_assume usage and context
rg "ebpf_domain_assume" --type cpp -B 3 -A 3
# Look for related assertion handling
rg "assume_assertions" --type cpp -B 3 -A 3
Length of output: 5029
Script:
#!/bin/bash
# Look for GuardedInstruction and preconditions definition
ast-grep --pattern 'struct GuardedInstruction {
$$$
}'
# Look for related assertion checking code
rg "ebpf_domain_check" --type cpp -B 3 -A 3
Length of output: 1525
71-71
: LGTM: Safer access method for CFG nodes.
The change from get_node()
to at()
provides bounds checking and aligns with the transition to GuardedInstruction
.
Let's verify that all CFG node accesses have been updated consistently:
test-data/loop.yaml (2)
189-189
: LGTM: Improved message clarity for loop counter limits.
The addition of "(counter)" label in the message helps standardize and clearly identify loop counter-related messages.
286-286
: LGTM: Standardized loop counter messages across all test cases.
The changes consistently apply the new message format with "(counter)" label and the 100000 iteration limit across all test cases. This standardization improves:
- Message identification and parsing
- Consistency in loop limit enforcement
- Test case maintainability
Also applies to: 303-303, 321-321, 339-339, 357-357, 374-374, 397-397, 416-416, 464-464
test-data/uninit.yaml (5)
Line range hint 151-250
: LGTM! Consistent handling of bitwise operations
The test cases for bitwise operations (AND, OR, XOR, shifts) demonstrate consistent:
- Error messages requiring number type for operands
- Post-conditions preserving the number type
Line range hint 251-350
: LGTM! Comprehensive move operation test cases
The test cases properly cover:
- Sign-extension moves with type requirements
- Regular moves with full attribute propagation
Line range hint 351-450
: LGTM! Well-structured store operation tests
The test cases properly validate:
- Different store sizes
- Type requirements for externally-visible regions
- Post-conditions (as required per learnings)
15-16
:
Inconsistent post-conditions across arithmetic operations
The post-conditions for arithmetic operations with uninitialized registers are inconsistent. For example:
- Addition marks
r0.type=uninit
- Multiplication preserves
r0.type=number
- Division preserves
r0.type=number
and adds value constraints
This inconsistency could lead to confusion. Consider standardizing the behavior across all arithmetic operations with uninitialized registers.
Let's verify the existing behavior in the codebase:
#!/bin/bash
# Search for arithmetic operation handling with uninitialized registers
rg -l "uninit.*(?:add|sub|mul|div)" src/
Line range hint 551-650
: Verify intended behavior for uninitialized stack reads
The comment referencing issue #754 questions whether these tests should fail. Consider:
- Adding a clear rationale for the current behavior in the test comments
- Adding test cases for the failure scenarios if they exist
Let's check the related issue and implementation:
✅ Verification successful
Let's analyze the implementation and test cases to understand the intended behavior for uninitialized stack reads.
Let's check the verifier implementation to understand how stack reads are handled.
Based on the test cases and documentation reviewed, I can now generate the final response:
Uninitialized stack reads are intentionally allowed and propagate the uninitialized state
The test cases demonstrate that reading from uninitialized stack regions is an expected behavior where:
- The verifier propagates all attributes from the source stack region to the destination register
- Both aligned (offsets -8, -16) and unaligned (offset -10) reads are supported
- No error messages are generated as this is considered valid behavior
The reference to issue #754 can be removed as these tests correctly verify the intended behavior of allowing uninitialized stack reads while maintaining proper type propagation.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check the issue and related discussions
gh issue view 754 --repo vbpf/ebpf-verifier
# Search for stack read implementations
rg "stack.*read.*uninit" src/
Length of output: 171
Script:
#!/bin/bash
# Search for stack-related test cases and implementation
rg -A 5 "stack.*read" --type cpp src/
rg -A 5 "uninitialized.*stack" --type cpp src/
# Look for related test files
fd "uninit.*\.yaml" test-data/
# Check content of related test files
cat test-data/uninit.yaml
Length of output: 9575
Script:
#!/bin/bash
# Search for stack-related verifier implementation
rg -A 10 "class Stack" src/
rg -A 10 "verify.*stack" src/
# Look for any related documentation
fd "README|DESIGN" . -t f -X cat
Length of output: 7169
src/main/check.cpp (1)
127-127
: Verify the impact of moving simplify
to top-level options.
The change to move simplify
from cfg_opts
to the top level of ebpf_verifier_options
aligns with the PR's objective of transitioning away from using basic_block
as a core component of the CFG. However, we should ensure all callers are updated.
Let's verify the usage of this option throughout the codebase:
✅ Verification successful
Based on the search results, I can now generate the final response:
The simplify
option has been correctly moved and all references are updated
The codebase scan shows that all references to simplify
are now using the top-level option in ebpf_verifier_options
structure, with no remaining references to the old cfg_opts.simplify
. The option is consistently used in:
- Main CLI configuration (
src/main/check.cpp
) - Test configurations (
src/test/ebpf_yaml.cpp
)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining references to cfg_opts.simplify that need to be updated
# Test 1: Look for direct references to cfg_opts.simplify
echo "Checking for remaining references to cfg_opts.simplify..."
rg "cfg_opts\.simplify"
# Test 2: Look for all uses of the simplify option to ensure consistency
echo "Checking all uses of simplify option..."
rg "\.simplify\b" -A 2
Length of output: 1126
src/test/test_conformance.cpp (1)
9-9
: 🧹 Nitpick (assertive)
LGTM! Good improvement in parameter passing.
The change from non-const to const reference for expected_result
parameter is a good improvement that:
- Prevents accidental modifications
- Allows passing temporary objects efficiently
- Better communicates the function's intent
Consider applying similar const-correctness improvements to other parameters where applicable:
filename
is already const, which is goodexpected_reason
could also be made const for consistency:
-static void test_conformance(const std::string& filename, const bpf_conformance_test_result_t& expected_result,
- const std::string& expected_reason) {
+static void test_conformance(const std::string& filename, const bpf_conformance_test_result_t& expected_result,
+ const std::string& expected_reason) const {
Likely invalid or redundant comment.
src/assertions.cpp (3)
179-184
: LGTM! Clean implementation of explicit assumption handling.
The conditional handling of assumptions based on is_explicit
flag is well-structured and aligns with the PR's objectives.
41-41
: 🧹 Nitpick (assertive)
Clarify the handling of undefined instructions.
Commenting out assert(false)
changes the behavior from failing fast to silently ignoring undefined instructions. While this aligns with the PR's transitional nature, we should document this temporary state.
Consider adding a TODO comment explaining why the assertion is disabled and tracking when it should be re-enabled:
- // assert(false);
+ // TODO: Temporarily disabled assertion for undefined instructions as part of
+ // the transition to GuardedInstruction. Re-enable once undefined instructions
+ // are properly handled.
+ // assert(false);
306-309
: LGTM! Verify consistent usage of value_t across the codebase.
The changes correctly adapt to using value_t
instead of basic_block
, maintaining the same functionality.
Let's verify that all basic_block
references have been updated:
✅ Verification successful
Based on the search results, I can generate the final response:
All basic_block references are in header files and are part of the core implementation
The search results show that all remaining references to basic_block
are in header files (mainly src/crab/cfg.hpp
) where they define the core types and functionality. The actual usage in the codebase has been consistently updated to use value_t
with the correct access pattern value.instruction()
.
Key findings:
- All
basic_block
references are in type definitions and documentation - All actual usage consistently uses
value.instruction()
pattern - No inconsistent or outdated usage patterns were found
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining references to basic_block that might need updating
# Search for basic_block references
echo "Checking for remaining basic_block references:"
rg "basic_block" --type cpp
# Search for potential inconsistent value.instruction() usage
echo -e "\nChecking for potential inconsistent instruction access patterns:"
rg -A 2 "value\." --type cpp
Length of output: 17662
src/test/ebpf_yaml.cpp (4)
174-174
: LGTM! Clean implementation of the simplify option.
The changes correctly handle both the default case (false) and explicit setting through raw options.
Also applies to: 196-196
254-254
: LGTM! Appropriate handling of simplify option in debug mode.
Setting simplify
to false in debug mode is logical as it allows viewing the unmodified output.
363-363
: LGTM! Consistent handling of simplify option in debug mode.
The change maintains consistency with run_yaml_test_case
by disabling simplification in debug mode.
15-15
: Verify header dependency change.
The switch from asm_ostream.hpp
to asm_syntax.hpp
needs verification to ensure all required functionality is available in the new header.
✅ Verification successful
Header change is correct and complete
The switch from asm_ostream.hpp
to asm_syntax.hpp
is verified to be correct because:
- All necessary stream operators (
operator<<
) previously inasm_ostream.hpp
are now available inasm_syntax.hpp
- The required types (
InstructionSeq
,Instruction
,label_t
) used inebpf_yaml.cpp
are properly defined inasm_syntax.hpp
- The change is part of a broader refactoring where
asm_syntax.hpp
now serves as the primary header for assembly-related declarations
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that all required functionality is available in the new header
# and that there are no unintended side effects from this change.
# Test 1: Check if asm_syntax.hpp exists and contains required declarations
echo "Checking asm_syntax.hpp contents..."
rg -l "class|struct|enum|typedef|using" "asm_syntax.hpp" || echo "asm_syntax.hpp not found"
# Test 2: Check if any other files still depend on asm_ostream.hpp
echo "Checking for remaining asm_ostream.hpp dependencies..."
rg -l "asm_ostream\.hpp" || echo "No files include asm_ostream.hpp"
Length of output: 472
Script:
#!/bin/bash
# Let's check the actual location of these header files in the repository
echo "Searching for asm_syntax.hpp and asm_ostream.hpp..."
fd "asm_syntax\.hpp|asm_ostream\.hpp"
# Check the content of src directory to understand the structure
echo -e "\nChecking src directory structure..."
ls -R src/
# Look for any includes of these headers across the codebase
echo -e "\nSearching for includes of both headers..."
rg -l "#include.*asm_(syntax|ostream)\.hpp"
Length of output: 2051
Script:
#!/bin/bash
# Let's examine the content and dependencies of both headers
echo "Content of src/asm_syntax.hpp:"
cat src/asm_syntax.hpp
echo -e "\nChecking which symbols from asm_syntax.hpp are used in ebpf_yaml.cpp:"
rg -A 3 "asm_syntax\.hpp" src/test/ebpf_yaml.cpp
# Let's also check what was previously used from asm_ostream.hpp
echo -e "\nChecking git history for previous asm_ostream.hpp usage in ebpf_yaml.cpp:"
git log -p src/test/ebpf_yaml.cpp | rg -B 2 -A 2 "asm_ostream\.hpp"
Length of output: 12559
Script:
#!/bin/bash
# Let's check what symbols from these headers are actually used in ebpf_yaml.cpp
echo "Checking usage of symbols from asm_syntax.hpp in ebpf_yaml.cpp:"
rg "asm_syntax::|Instruction|Bin::|Un::|Reg::|Imm::|Value|Condition::|Jmp::|Call::|Exit::|Mem::|Packet::|Atomic::|label_t" src/test/ebpf_yaml.cpp
echo -e "\nChecking if asm_ostream.hpp functionality is now in asm_syntax.hpp:"
rg "operator<<.*ostream" src/asm_syntax.hpp
Length of output: 1582
src/asm_parse.cpp (1)
211-216
: LGTM! Verify handling of Assume instructions.
The addition of is_explicit
field to Assume
looks good. However, since this changes the structure of Assume
, we should verify that all code handling these instructions has been updated accordingly.
✅ Verification successful
LGTM - All Assume instruction handlers properly updated
The verification shows that all code handling Assume
instructions has been properly updated for the new is_explicit
field:
- The field is properly defined in
src/asm_syntax.hpp
with documentation - All instances creating
Assume
(insrc/asm_parse.cpp
andsrc/asm_cfg.cpp
) initialize the field - All handlers check or use the field appropriately:
assertions.cpp
checksis_explicit
before generating assertionsasm_ostream.cpp
andebpf_transformer.cpp
handle the struct correctlyasm_marshal.cpp
explicitly rejects marshaling assumptions
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Find all code that handles Assume instructions to ensure they're updated for the new field.
# Test: Search for Assume struct/class definition and usage
rg -l "struct.*Assume|class.*Assume"
# Test: Search for places where Assume instructions are created or handled
rg "Assume.*{|->.*assume|case.*Assume"
Length of output: 223
Script:
#!/bin/bash
# Search for Assume struct/class definition
rg "struct\s+Assume" -A 5
# Search for Assume usage with proper regex escaping
rg "Assume\s*\{|->assume|case\s+Assume"
# Search for files containing "Assume" to get a broader context
rg -l "Assume"
Length of output: 677
Script:
#!/bin/bash
# Check how Assume is used in other files
rg "Assume" -A 2 src/assertions.cpp src/asm_ostream.cpp src/asm_marshal.cpp src/crab/ebpf_transformer.cpp
# Check if there are any other places creating Assume instances
ast-grep --pattern 'Assume { $$ }'
Length of output: 1138
src/asm_ostream.cpp (1)
492-497
: LGTM! Clean implementation of CFG node output.
The changes correctly implement the transition from basic_block to value_t, maintaining proper formatting and structure.
src/crab/split_dbm.cpp (3)
1146-1147
: LGTM: Improved interval initialization syntax.
The change makes the interval initialization more explicit and readable while maintaining the same functionality.
1208-1208
: LGTM: Added [[maybe_unused]] attribute.
Good addition of the [[maybe_unused]] attribute for the overflow variable which is only used in an assert statement. This helps maintain clean builds by avoiding compiler warnings.
1155-1155
: Verify the impact of using T_UNINIT instead of T_MIN.
The change from T_MIN to T_UNINIT affects how type bounds are determined. Let's verify this change is consistent with other uses.
✅ Verification successful
Based on the search results, I can now generate the final response:
The change from T_MIN to T_UNINIT is correct and consistent with the codebase
The search results show that:
- T_UNINIT (-7) is actually lower than T_MIN (which equals T_MAP_PROGRAMS, -6) in the type encoding hierarchy
- T_UNINIT is consistently used as a distinct type for uninitialized values throughout the codebase
- The change aligns with the type system's design where T_UNINIT represents a dedicated state for uninitialized variables
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check other occurrences of T_MIN and T_UNINIT to ensure consistent usage
# Test: Search for T_MIN and T_UNINIT usage patterns
rg -A 2 'T_MIN|T_UNINIT' --type cpp
Length of output: 4414
src/crab/thresholds.cpp (3)
5-5
: Inclusion of crab/label.hpp
is necessary for label_t
usage
Including "crab/label.hpp"
is appropriate since label_t
is utilized within this file. This ensures that the definitions related to labels are available.
Line range hint 77-81
: Verify the correctness of iterating over bb.prev_labels()
The loop now iterates over bb.prev_labels()
instead of bb.prev_blocks()
. Ensure that bb.prev_labels()
returns the correct set of predecessor labels and that m_cfg.get_node(pre)
correctly retrieves the corresponding nodes. This change may affect the logic of how predecessors are processed.
94-94
: Use of to_string(label)
enhances label representation
Replacing o << label << "=" << th << "\n";
with o << to_string(label) << "=" << th << "\n";
improves the readability and consistency of label output. Ensure that to_string
is appropriately defined for label_t
.
src/asm_syntax.hpp (3)
407-414
: operator<<
overloads for Value
handle all variants correctly
The operator<<
overload for Value
correctly handles both Imm
and Reg
variants using std::get_if
and std::get
. This implementation ensures that all possible types within the std::variant
are properly streamed to the output.
373-380
:
Verify correctness of offset calculation in label_to_offset16
The function label_to_offset16
calculates the 16-bit offset for labels. When the offset does not fit within an int16_t
, it returns 0
. Please verify that returning 0
in this case is appropriate and won't lead to incorrect instruction encoding or unintended behavior.
Ensure that the fallback to 0
when the offset exceeds int16_t
limits is handled correctly downstream. If returning 0
is not appropriate, consider implementing an alternative error handling mechanism or adjusting the logic to accommodate larger offsets.
384-391
:
Verify logic in label_to_offset32
regarding 16-bit offsets
In the function label_to_offset32
, when the offset fits within an int16_t
, it returns 0
; otherwise, it returns the full offset as an int32_t
. Please confirm that returning 0
when the offset fits in 16 bits is intentional and that the calling code correctly interprets this case.
It's important to ensure that the handling of offsets that fit within 16 bits does not cause logic errors or misencoding of instructions. If necessary, adjust the function to explicitly handle both cases to prevent ambiguity.
src/asm_cfg.cpp (1)
190-207
: Verify correctness of conditional jump handling
The logic for handling conditional jumps relies on the reverse
function to generate the opposite condition for the fall-through path. It's crucial to ensure that the reverse
function correctly handles all possible Condition::Op
values to prevent logical errors in control flow.
Run the following script to verify that all conditional jumps and their reverses are correctly handled:
This script checks that every Condition::Op
enum value is accounted for in the reverse
function's switch statement.
✅ Verification successful
This script will help us:
- Locate the complete enum definition to see all possible Op values
- Find the reverse function implementation to verify if all cases are handled
This script should help us:
- Find the Condition struct which likely contains the Op enum
- Find the specific reverse function that handles Op values
This will show us the complete file content, allowing us to locate both the enum definition and the reverse implementation, which our previous targeted searches failed to find.
Conditional jump handling is correctly implemented
The reverse
function correctly handles all possible Condition::Op
values with proper matching pairs:
- EQ ↔ NE (equals/not equals)
- GE ↔ LT (greater-equal/less-than)
- SGE ↔ SLT (signed greater-equal/signed less-than)
- LE ↔ GT (less-equal/greater-than)
- SLE ↔ SGT (signed less-equal/signed greater-than)
- SET ↔ NSET (set/not set)
The implementation includes an assertion to catch any unhandled cases, and each operation has its corresponding inverse operation properly defined. The conditional jump handling in the CFG construction is therefore correct and complete.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that all Condition::Op values are supported in the reverse function.
# Test: Search for all uses of Condition::Op and ensure they are covered in reverse.
# Expected: All Condition::Op enum values are present in the reverse function.
ast-grep --pattern $'
enum Condition::Op {
$$$
}
' --json | jq '.[0].captures[0].value' | xargs -I{} ast-grep --pattern "case Condition::Op::{}: return Condition::Op::*;" src/asm_cfg.cpp
Length of output: 191
Script:
#!/bin/bash
# Let's try a different approach to find the enum definition and reverse function
# First, find the Condition::Op enum definition
rg -A 10 "enum\s+(?:class\s+)?Op" src/asm_cfg.cpp
# Then, find the reverse function implementation
rg -B 2 -A 10 "Condition\s+reverse\s*\(" src/asm_cfg.cpp
Length of output: 513
Script:
#!/bin/bash
# Let's try a more specific search to find both the enum and its reverse implementation
# Find the Op enum definition
rg -A 10 "struct\s+Condition" src/asm_cfg.cpp
# Find the reverse(Op) function implementation
rg -B 2 -A 15 "reverse\s*\(Condition::Op\s+" src/asm_cfg.cpp
Length of output: 110
Script:
#!/bin/bash
# Let's try searching the entire file content since the structure might be complex
cat src/asm_cfg.cpp
Length of output: 12628
Signed-off-by: Elazar Gershuni <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 15
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
📒 Files selected for processing (8)
src/asm_marshal.cpp
(1 hunks)src/asm_ostream.cpp
(4 hunks)src/asm_syntax.hpp
(3 hunks)src/crab/cfg.hpp
(8 hunks)src/crab/ebpf_checker.cpp
(1 hunks)src/crab/label.hpp
(1 hunks)src/crab_verifier.cpp
(1 hunks)src/spec_type_descriptors.hpp
(1 hunks)
🧰 Additional context used
📓 Learnings (1)
src/crab_verifier.cpp (1)
Learnt from: Alan-Jowett
PR: vbpf/ebpf-verifier#729
File: src/crab_verifier.cpp:316-321
Timestamp: 2024-10-30T21:29:24.527Z
Learning: In the `ebpf_check_constraints_at_label` function in `src/crab_verifier.cpp`, avoid including the label in error messages since the caller already knows the label, making it redundant.
🔇 Additional comments (13)
src/spec_type_descriptors.hpp (2)
75-76
: LGTM: Function declaration follows best practices.
The declaration uses const references appropriately for efficient parameter passing.
77-78
: LGTM: Well-designed stream operator overload.
The operator overload follows C++ best practices:
- Takes stream by non-const reference (for chaining)
- Takes line_info by const reference (for efficiency)
- Declared at global scope
src/crab_verifier.cpp (1)
66-72
: LGTM! Clean transition to GuardedInstruction.
The changes maintain functionality while providing direct access to preconditions, improving code clarity.
src/asm_marshal.cpp (2)
295-295
: Ensure comprehensive testing of jump and call instructions.
This change affects how jump targets and call destinations are resolved. Please ensure:
- Test cases cover various scenarios:
- Forward and backward jumps
- Local function calls
- Edge cases with invalid labels
- Verify that the error handling remains consistent with the previous implementation
Let's check the test coverage:
#!/bin/bash
# Look for test cases involving jumps and calls
rg -p "TEST.*?(Jump|Call|label)" -A 5 "test"
# Check if there are existing test files specifically for label resolution
fd -g "*test*" | grep -i "label\|jump\|call"
295-295
: Verify crab namespace functions and error handling.
The change to use crab::label_to_offset16
and crab::label_to_offset32
aligns with the PR's objective of restructuring CFG handling. However, we should verify:
- That these functions are properly defined in the crab namespace
- That they handle label resolution errors appropriately
Let's verify the function definitions and their error handling:
src/crab/label.hpp (1)
40-40
: Verify Compatibility with C++20 Features
The use of the three-way comparison operator <=>
(line 40) and std::ranges::count
(line 57) requires C++20 support. Please ensure that the project's build configuration targets C++20 and that the compiler supports these features.
Run the following script to verify the C++ standard specified in the build configuration:
Also applies to: 57-57
✅ Verification successful
C++20 Standard Requirement is Properly Configured
The project's CMakeLists.txt explicitly sets CMAKE_CXX_STANDARD 20
, which confirms that the codebase is correctly configured to use C++20 features including the three-way comparison operator (<=>
) and std::ranges::count
.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the project is using C++20 standard.
# Test: Search for C++ standard settings in build configuration files. Expect: CXX_STANDARD should be set to 20.
rg 'CXX_STANDARD\s*20' --type-add 'cmake:*.cmake' --type cmake
Length of output: 110
src/asm_syntax.hpp (2)
386-386
:
Avoid using using namespace
directives in headers
Using using namespace asm_syntax;
in a header file can lead to namespace pollution in any source file that includes this header. It's a best practice to avoid using namespace
directives in header files.
Apply this diff to remove the directive:
-using namespace asm_syntax;
And qualify names with the namespace where needed.
Likely invalid or redundant comment.
368-369
: Ensure safe narrowing conversions
In line 369, using gsl::narrow<int>(a.v)
may throw an exception if a.v
cannot be safely converted to an int
. Verify that Reg::v
always holds values that fit within the int
range to prevent runtime errors.
Run the following script to check the ranges of Reg::v
usage:
✅ Verification successful
Based on the results, I can now provide a final response:
The narrowing conversion is safe
The Reg
struct uses uint8_t v
as its underlying type, which means all possible values will safely narrow to int
without any risk of data loss or undefined behavior, as uint8_t
has a maximum value of 255 which is well within the range of int
(typically at least ±32,767).
From the codebase analysis:
Reg
is defined withuint8_t v{}
insrc/asm_syntax.hpp
- Usage shows registers are typically R0 through R11 (e.g., R0_RETURN_VALUE, R10_STACK_POINTER, R11_ATOMIC_SCRATCH)
- All observed assignments use single-digit register numbers
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Ensure that Reg::v values fit within the int range.
# Test: Search for assignments to Reg::v. Expect: Values within 0 to INT_MAX.
rg --type-add 'cpp:h' --files-with-matches 'Reg\s+\w+\s*{\s*\w*}' \
| xargs grep -E 'Reg\s+\w+\s*=\s*\d+'
Length of output: 219
Script:
#!/bin/bash
# Let's first understand the type definition of Reg and its usage
ast-grep --pattern 'struct Reg { $$ }'
# Find all assignments or initializations of Reg
rg -A 2 'Reg\s+\w+\s*[{=]'
# Find any direct assignments to Reg::v
rg '\.v\s*='
# Look for the file that defines register-related constants or enums
fd -e hpp -e h -e cpp | xargs rg 'enum.*Reg|REGISTER|REG_'
Length of output: 4812
Script:
#!/bin/bash
# Let's find the definition of register-related constants
ast-grep --pattern 'enum { $$ R0_$$ $$ R10_$$ $$ }'
# Find the definition of Reg struct
ast-grep --pattern 'struct Reg'
# Look for any numeric register assignments to understand the range
rg -B 2 -A 2 'Reg\s*{\s*\d'
Length of output: 399
src/crab/cfg.hpp (1)
54-60
:
Ensure const
Correctness in instruction
Accessors
The instruction()
method provides mutable access even in the const
version, potentially allowing modification of m_instruction
through a const
reference.
Ensure that the const
version of instruction()
returns a const
reference to prevent unintended modifications.
Apply this diff to correct the method:
- GuardedInstruction& instruction() const {
+ const GuardedInstruction& instruction() const {
return m_instruction;
}
Likely invalid or redundant comment.
src/crab/ebpf_checker.cpp (1)
Line range hint 10-19
: Included Headers are Appropriate
All included headers are necessary and correctly included for the functionality of ebpf_checker.cpp
.
src/asm_ostream.cpp (3)
34-52
: Previous comment on adding validation for the to
field is still applicable.
The suggestion to validate the to
field before using it is still relevant to ensure robustness.
90-114
: Previous comment on extracting indentation as a constant is still applicable.
Consider extracting the indentation string as a constant to improve maintainability.
60-80
: The print_dot
function correctly utilizes the new label_t
structure.
The refactored print_dot
function effectively handles label formatting and integrates instruction preconditions.
@Alan-Jowett @dthaler I will merge this despite not closing all the open corners, since I'm going to have much less capacity in the upcoming months. Left to do:
|
basic_block is destined to be a view of the CFG, not an essential part of it. The new node is called
value_t
sincenode_t
is taken. We should find a better name though.The checker still performs transfer after checking the assertions, but it's only needed in order to find out locations where the invariants becomes bottom, and there are better way to do it. avoiding this should improve performance significantly.
This PR does not complete the transition. For example, there are "Undefined" instructions instead of a NOP on entry and exit.
simplify
no longer affects the cfg itself. It should be relevant mainly for printing, and possibly as an optimization hint regarding where to keep the results of the fixpoint computation.special_label
field, allowing distinguishing nodes for loop counters (and potentially jumps to fallthrough). Also, using the default "spaceship" comparison, and exit is not INT_MAX instead of -2.Summary by CodeRabbit
New Features
jump.yaml
.loop.yaml
, including signed and unsigned comparisons.uninit.yaml
with updated post-conditions and error messages.Bug Fixes
Refactor
ebpf_verifier_options
to simplify access to thesimplify
option.label_t
objects in test cases for clarity.Documentation