
CNDB-12154: Port latest commits from main to main-5.0 #1467

Merged: 140 commits merged into main-5.0 from CNDB-12154 on Jan 17, 2025

Conversation


@djatnieks djatnieks commented Dec 18, 2024

What is the issue

The main-5.0 branch needs updating with the latest commits from main

What does this PR fix and why was it fixed

Ports the latest commits from main to main-5.0.

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits

```java
 * @return the accumulated BTree size of the data contained in this update.
 */
long accumulatedDataSize();
```

@djatnieks djatnieks (Author) Dec 18, 2024

For Review:
withOnlyPresentColumns, accumulatedDataSize and unsharedHeapSize come from 5.0, before PartitionUpdate became an interface.

The implementations have moved to BTreePartitionUpdate and TriePartitionUpdate.

```java
/**
 * The size of the data contained in this update.
 *
 * @return the size of the data contained in this update.
 */
int dataSize();

// FIXME review
long unsharedHeapSize();
```

@djatnieks djatnieks (Author) Dec 18, 2024

For Review:
withOnlyPresentColumns, accumulatedDataSize and unsharedHeapSize come from 5.0, before PartitionUpdate became an interface.

The implementations have moved to BTreePartitionUpdate and TriePartitionUpdate.

```java
public long accumulatedDataSize()
{
    return dataSize;
}
```
@djatnieks djatnieks (Author) Dec 18, 2024

For Review:
withOnlyPresentColumns, accumulatedDataSize and unsharedHeapSize come from 5.0, before PartitionUpdate became an interface.

The implementations have moved to BTreePartitionUpdate and TriePartitionUpdate.

```java
public long unsharedHeapSize()
{
    return dataSize;
}
```
@djatnieks djatnieks (Author) Dec 18, 2024

For Review:
withOnlyPresentColumns, accumulatedDataSize and unsharedHeapSize come from 5.0, before PartitionUpdate became an interface.

The implementations have moved to BTreePartitionUpdate and TriePartitionUpdate.

Reviewer:

Here's one possible implementation:

```diff
Subject: [PATCH] CNDB-11499: Fix incorrect thread names in CompactionControllerTest
---
Index: src/java/org/apache/cassandra/db/partitions/TrieBackedPartition.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/java/org/apache/cassandra/db/partitions/TrieBackedPartition.java b/src/java/org/apache/cassandra/db/partitions/TrieBackedPartition.java
--- a/src/java/org/apache/cassandra/db/partitions/TrieBackedPartition.java	(revision 4f494eecc511ef7e5c77a6d93cc9b265c3e5ae3a)
+++ b/src/java/org/apache/cassandra/db/partitions/TrieBackedPartition.java	(date 1736169386706)
@@ -144,6 +144,16 @@
             return BTree.accumulate(columnsBTree, (ColumnData cd, long v) -> v + cd.unsharedHeapSizeExcludingData(), heapSize);
         }
 
+        public long unsharedHeapSize()
+        {
+            long heapSize = EMPTY_ROWDATA_SIZE
+                            + BTree.sizeOfStructureOnHeap(columnsBTree)
+                            + livenessInfo.unsharedHeapSize()
+                            + deletion.unsharedHeapSize();
+
+            return BTree.accumulate(columnsBTree, (ColumnData cd, long v) -> v + cd.unsharedHeapSize(), heapSize);
+        }
+
         public String toString()
         {
             return "row " + livenessInfo + " size " + dataSize();
Index: src/java/org/apache/cassandra/db/partitions/TriePartitionUpdate.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/java/org/apache/cassandra/db/partitions/TriePartitionUpdate.java b/src/java/org/apache/cassandra/db/partitions/TriePartitionUpdate.java
--- a/src/java/org/apache/cassandra/db/partitions/TriePartitionUpdate.java	(revision 4f494eecc511ef7e5c77a6d93cc9b265c3e5ae3a)
+++ b/src/java/org/apache/cassandra/db/partitions/TriePartitionUpdate.java	(date 1736169423008)
@@ -334,7 +334,17 @@
     @Override
     public long unsharedHeapSize()
     {
-        return dataSize;
+        assert trie instanceof InMemoryTrie;
+        InMemoryTrie<Object> inMemoryTrie = (InMemoryTrie<Object>) trie;
+        long heapSize = inMemoryTrie.usedSizeOnHeap();
+        for (Object o : inMemoryTrie.values())
+        {
+            if (o instanceof RowData)
+                heapSize += ((RowData) o).unsharedHeapSize();
+            else
+                heapSize += ((DeletionInfo) o).unsharedHeapSize();
+        }
+        return heapSize;
     }
 
     /**
```

```java
{
    return BTree.<Row>accumulate(holder.tree, (row, value) -> row.dataSize() + value, 0L);
}
```

@djatnieks djatnieks (Author) Dec 18, 2024

For Review:
withOnlyPresentColumns, accumulatedDataSize and unsharedHeapSize come from 5.0, before PartitionUpdate became an interface.

The implementations have moved to BTreePartitionUpdate and TriePartitionUpdate.

Reviewer:

accumulatedDataSize doesn't seem to be present in 5.0, only dataSize (with an implementation that properly adds the static row and deletion sizes)?

@djatnieks (Author)

Oh, you're right.

I overlooked that I actually added accumulatedDataSize in an attempt to fix a test in CNDB-11010.

Maybe the real problem with that test was that main already uses TrieMemtable by default while main-5.0 has been using SkipListMemtable? If that's the case, the "fix" was only needed for that scenario, and if TrieMemtable is used it won't be necessary at all.

Anyway, accumulatedDataSize is only used in that SensorsWriteTest.testMultipleRowsMutationWithClusteringKey test, so it's probably best to just get rid of it here and revisit that test later if it fails again.

@djatnieks (Author)

Note that here, on branch CNDB-12154, the default in MemtableParams is still SkipListMemtable. I opened https://github.com/riptano/cndb/issues/12290 to change this to TrieMemtable.

```java
return BTree.<Row>accumulate(holder.tree, (row, value) -> row.unsharedHeapSize() + value, 0L)
       + holder.staticRow.unsharedHeapSize();
}
```

@djatnieks djatnieks (Author) Dec 18, 2024

For Review:
withOnlyPresentColumns, accumulatedDataSize and unsharedHeapSize come from 5.0, before PartitionUpdate became an interface.

The implementations have moved to BTreePartitionUpdate and TriePartitionUpdate.

Reviewer:

We should account for the deletion info as well.
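For illustration, a minimal sketch of that suggestion, assuming the update exposes its DeletionInfo via a deletionInfo() accessor (an assumption, not necessarily the final change):

```java
// Sketch: include the deletion info in the heap-size accounting alongside the rows.
public long unsharedHeapSize()
{
    return BTree.<Row>accumulate(holder.tree, (row, value) -> row.unsharedHeapSize() + value, 0L)
           + holder.staticRow.unsharedHeapSize()
           + deletionInfo().unsharedHeapSize(); // assumed accessor for this update's DeletionInfo
}
```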

```java
RegularAndStaticColumns columns = RegularAndStaticColumns.builder().addAll(columnSet).build();
return new BTreePartitionUpdate(metadata, partitionKey, holder.withColumns(columns), deletionInfo.mutableCopy(), false);
}
```

@djatnieks (Author)

For Review:
withOnlyPresentColumns, accumulatedDataSize and unsharedHeapSize come from 5.0, before PartitionUpdate became an interface.

The implementations have moved to BTreePartitionUpdate and TriePartitionUpdate.


```java
    return count;
}
```

@djatnieks (Author)

For Review:
affectedRowCount and affectedColumnCount were both present in the 5.0 PartitionUpdate and were made default methods here when PartitionUpdate became an interface.
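As a rough sketch of the shape this takes (illustrative only; rowCount() and staticRow() are assumed to be available on the interface):

```java
// Illustrative default method on the PartitionUpdate interface, not the exact implementation.
default int affectedRowCount()
{
    int count = rowCount();
    if (!staticRow().isEmpty())
        count++; // the static row counts as an affected row when present
    return count;
}
```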

```java
@@ -188,7 +199,18 @@ static PartitionUpdate merge(List<? extends PartitionUpdate> updates)
        return updates.get(0).metadata().partitionUpdateFactory().merge(updates);
    }

    public static SimpleBuilder simpleBuilder(TableMetadata metadata, Object... partitionKeyValues)

PartitionUpdate withOnlyPresentColumns();
```
@djatnieks (Author)

For Review:
withOnlyPresentColumns, accumulatedDataSize and unsharedHeapSize come from 5.0, before PartitionUpdate became an interface.

The implementations have moved to BTreePartitionUpdate and TriePartitionUpdate.

```java
RegularAndStaticColumns columns = RegularAndStaticColumns.builder().addAll(columnSet).build();
return new TriePartitionUpdate(metadata, partitionKey, columns, stats, rowCountIncludingStatic, dataSize, trie, false);
}
```

@djatnieks (Author)

For Review:
withOnlyPresentColumns, accumulatedDataSize and unsharedHeapSize come from 5.0, before PartitionUpdate became an interface.

The implementations have moved to BTreePartitionUpdate and TriePartitionUpdate.

Reviewer:

This looks good.

```java
                               1, 1);
cluster.coordinator(1).execute("insert into " + KEYSPACE + ".x (id, ck, i) VALUES (?, " + json(1, 2) + ", ? )", ConsistencyLevel.ALL,
                               2, 2);
cluster.get(2).flush(KEYSPACE);
```
@djatnieks djatnieks (Author) Dec 18, 2024

For Review:

IIRC, the changes made here did not work for main-5.0, and since CASSANDRA-19764 seems to still be in progress, I reverted the change made by CNDB-9850, with the idea that when CASSANDRA-19764 lands in Apache it will also be reflected in main-5.0 after a subsequent rebase.

@blambov blambov Jan 6, 2025

With TCM it is no longer possible to get into this state, but in 5.0 it still is. Changing the test to do

```java
    @Test
    public void testDivergentSchemas() throws Throwable
    {
        try (Cluster cluster = init(Cluster.create(2)))
        {
            cluster.schemaChange("create type " + KEYSPACE + ".a (foo text)");
            cluster.schemaChange("create table " + KEYSPACE + ".x (id int, ck frozen<a>, i int, primary key (id, ck))");

            cluster.get(1).executeInternal("alter type " + KEYSPACE + ".a add bar text");
            cluster.coordinator(1).execute("insert into " + KEYSPACE + ".x (id, ck, i) VALUES (?, " + json(1, 2) + ", ? )", ConsistencyLevel.ALL,
                                           1, 2);
            cluster.coordinator(1).execute("insert into " + KEYSPACE + ".x (id, ck, i) VALUES (?, " + json(1, 1) + ", ? )", ConsistencyLevel.ALL,
                                           1, 1);

            cluster.get(2).flush(KEYSPACE);

            Object[][] res1 = cluster.coordinator(1).execute("select i from " + KEYSPACE + ".x WHERE id = 1", ConsistencyLevel.ALL);
            Object[][] res2 = cluster.coordinator(2).execute("select i from " + KEYSPACE + ".x WHERE id = 1", ConsistencyLevel.ALL);
            Assert.assertArrayEquals(res1, res2);
        }
    }
```

fails with corruption.

I don't know what changed to mute the exception we got when flushing in main. Is this running with the latest trie memtable implementation?

```java
int rowsPerPartition = 1;

int partitions;
```

@djatnieks djatnieks (Author) Dec 18, 2024

For Review:

The changes in this class come from the WriteTest changes made in CNDB-9850.

```diff
     }
 }
 
 public Object[] writeArguments(long i)
 {
-    return new Object[] { i, i, i };
+    return new Object[] { i % partitions, i, i };
```
@djatnieks (Author)

I think this was just missed during the first pass of cherry-picking from main.

"trie");
"trie",
"trie_stage1",
"persistent_memory");
@djatnieks (Author)

For Review:

Trying to align with main - just want to double check.

Reviewer:

Looks good

```diff
@@ -79,6 +79,7 @@ public static List<Object> parameters()
     {
         return ImmutableList.of("skiplist",
                                 "skiplist_sharded",
+                                "trie_stage1",
                                 "trie");
     }
```
@djatnieks (Author)

For Review:

Trying to align with main - just want to double check.

Reviewer:

Looks good


```diff
 public class TrieToDotTest
 {
     @Test
     public void testToDotContent() throws Exception
     {
-        InMemoryTrie<String> trie = new InMemoryTrie<>(BufferType.OFF_HEAP);
+        InMemoryTrie<Object> trie = new InMemoryTrie<>(ByteComparable.Version.OSS50, BufferType.OFF_HEAP, InMemoryTrie.ExpectedLifetime.LONG, null);
```
@djatnieks (Author)

For Review:

Double check param values used here.

Reviewer:

It doesn't matter here; it could be just `InMemoryTrie.shortLived(ByteComparable.Version.OSS50)`.

These two tests are not really testing anything, just showing how to generate dot/mermaid graphs from a trie.
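That is, the construction could collapse to a single call, using the shortLived factory named above (sketch only):

```java
// Suggested simplification per the review comment; avoids spelling out lifetime/buffer parameters.
InMemoryTrie<String> trie = InMemoryTrie.shortLived(ByteComparable.Version.OSS50);
```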


```diff
 public class TrieToMermaidTest
 {
     @Test
     public void testToMermaidContent() throws Exception
     {
-        InMemoryTrie<String> trie = new InMemoryTrie<>(BufferType.OFF_HEAP);
+        InMemoryTrie<Object> trie = new InMemoryTrie<>(ByteComparable.Version.OSS50, BufferType.OFF_HEAP, InMemoryTrie.ExpectedLifetime.LONG, null);
```
@djatnieks (Author)

For Review:

Double check param values used here.

Reviewer:

As above, use shortLived.

```yaml
trie:
  class_name: TrieMemtable
  parameters:
    shards: 4
trie_stage1:
  class_name: TrieMemtableStage1
skiplist_sharded:
  class_name: ShardedSkipListMemtable
  parameters:
```
@djatnieks djatnieks (Author) Dec 18, 2024

For Review:

Aligning defined memtable types with main for use in MemtableQuickTest and MemtableSizeTestBase

```java
return Arrays.asList(new Object[]{ Config.DiskAccessMode.standard, latestVersion },
                     new Object[]{ Config.DiskAccessMode.mmap, latestVersion },
                     new Object[]{ Config.DiskAccessMode.standard, legacyVersion },
                     new Object[]{ Config.DiskAccessMode.mmap, legacyVersion });
}
```
@djatnieks (Author)

For Review:

In 5.0 RowIndexWriter uses o.a.c.io.sstable.Version, while main is using ByteComparable.Version.

I've kept the 5.0 io.sstable.Version here.

latestVersion and legacyVersion are defined as:

    private static final Version latestVersion = new BtiFormat(null).getLatestVersion();
    private static final Version legacyVersion = new BtiFormat(null).getVersion("aa");

Reviewer:

I would change these explicitly to "da", "ca", and "aa" to test all three ByteComparable versions.

@djatnieks (Author)

Okay, I'll do that
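For reference, a sketch of that change against the snippet above (version strings from the review; field names are placeholders):

```java
// Cover all three ByteComparable versions explicitly: "da" (OSS50), "ca" (OSS41), "aa" (LEGACY).
private static final Version daVersion = new BtiFormat(null).getVersion("da");
private static final Version caVersion = new BtiFormat(null).getVersion("ca");
private static final Version aaVersion = new BtiFormat(null).getVersion("aa");
```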

```diff
@@ -362,7 +362,7 @@ static class BtiVersion extends Version
 
         isLatestVersion = version.compareTo(current_version) == 0;
         correspondingMessagingVersion = MessagingService.VERSION_50;
-        byteComparableVersion = version.compareTo("ca") >= 0 ? ByteComparable.Version.OSS50 : ByteComparable.Version.LEGACY;
+        byteComparableVersion = version.compareTo("ca") >= 0 ? ByteComparable.Version.OSS41 : ByteComparable.Version.LEGACY;
         hasOldBfFormat = aOrLater && !bOrLater;
```
@djatnieks (Author)

For Review:

Most tests pass using OSS41, while many fail using OSS50, so I decided to keep the alignment with main.

@blambov blambov Jan 6, 2025

The DSE versions (pre-cx) should use legacy, main-based versions (cx) should be on OSS41, and main-5.0-based ones (dx) should be on OSS50.

@djatnieks djatnieks (Author) Jan 6, 2025

Doh, of course, that makes sense.

Something like:

byteComparableVersion = version.compareTo("da") >= 0 ? ByteComparable.Version.OSS50 
                                                     : version.compareTo("ca") >= 0 ? ByteComparable.Version.OSS41 
                                                                                    : ByteComparable.Version.LEGACY;

" org.apache.cassandra.db.compaction.CompactionControllerTest.memtableRaceFinishLatch, " +
" 5, java.util.concurrent.TimeUnit.SECONDS);")
})
public void testMemtableRace() throws Exception
@djatnieks djatnieks (Author) Dec 18, 2024

For Review:

CompactionControllerTest.testMemtableRace fails.

Changing the MemtableParams default memtable factory from SkipListMemtableFactory to DefaultMemtableFactory (TrieMemtable) makes testMemtableRace pass if run by itself. If the entire CompactionControllerTest is run in IntelliJ, it still fails. Not sure what's happening.

@djatnieks (Author)

I opened https://github.com/riptano/cndb/issues/12218 for this test failure

```java
///
/// A problem with this will only appear as intermittent failures, never treat this test as flaky.
@RunWith(Parameterized.class)
public class MemtableThreadedTest extends CQLTester
```
@djatnieks (Author)

For Review:
Running locally, MemtableThreadedTest fails (hangs) on the command line, but passes in IntelliJ.
In CI (link above), MemtableThreadedTest.testConsistentUpdates[PersistentMemoryMemtable] fails with a timeout.

@djatnieks (Author)

I opened https://github.com/riptano/cndb/issues/12218 for this test failure

@djatnieks djatnieks marked this pull request as ready for review December 19, 2024 23:53
@djatnieks djatnieks changed the title CNDB-12154 CNDB-12154: Port latest commits from main to main-5.0 Dec 19, 2024
```diff
 /**
  * This class exists solely to avoid initialization of the default memtable class.
  * Some tests want to setup table parameters before initializing DatabaseDescriptor -- this allows them to do so.
  */
-public class DefaultMemtableFactory
+public class DefaultMemtableFactory implements Memtable.Factory
```
Reviewer:

It seems I forgot in CNDB-9850 that the OSS branch has a better solution to the problem this is meant to solve: extract the factory outside the memtable class, so that it no longer has a dependency on the memtable class being constructed. On the OSS branch that's the SkipListMemtableFactory; here we can do the same by moving TrieMemtable.Factory to top level as TrieMemtableFactory, and then referencing its INSTANCE from the default params no longer involves initializing TrieMemtable.

This avoids having to reproduce every method of Memtable.Factory here, which we can easily forget to do.
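A minimal sketch of that refactor (the create signature and the TrieMemtable constructor shown here are assumptions; the real Memtable.Factory may have more members, which the top-level factory would inherit as defaults):

```java
import java.util.concurrent.atomic.AtomicReference;

import org.apache.cassandra.db.commitlog.CommitLogPosition;
import org.apache.cassandra.db.memtable.Memtable;
import org.apache.cassandra.db.memtable.TrieMemtable;
import org.apache.cassandra.schema.TableMetadataRef;

// Top-level factory whose INSTANCE can be referenced from the default memtable params
// without initializing TrieMemtable: the TrieMemtable reference only appears inside a
// method body, so its class initialization is deferred until create() actually runs.
public class TrieMemtableFactory implements Memtable.Factory
{
    public static final TrieMemtableFactory INSTANCE = new TrieMemtableFactory();

    @Override
    public Memtable create(AtomicReference<CommitLogPosition> commitLogLowerBound,
                           TableMetadataRef metadataRef,
                           Memtable.Owner owner)
    {
        return new TrieMemtable(commitLogLowerBound, metadataRef, owner); // assumed constructor
    }
}
```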

@djatnieks (Author)

Ok, I'll do that

@cassci-bot

❌ Build ds-cassandra-pr-gate/PR-1467 rejected by Butler


62 new test failure(s) in 14 builds


Found 62 new test failures

Showing only first 15 new test failures

| Test | Explanation | Branch history | Upstream history |
| --- | --- | --- | --- |
| o.a.c.d.c.CompactionControllerTest.testMemtable... | regression | 🔴🔴🔴🔴🔴🔴🔴 | |
| ...adCommitLogAndSSTablesWithDroppedColumnTestCC40 | regression | 🔴🔴🔴🔴🔴🔴🔴 | |
| ...adCommitLogAndSSTablesWithDroppedColumnTestCC50 | regression | 🔴🔴🔴🔴🔴🔴🔴 | |
| ...sponseDoesNotLogTest.dispatcherErrorDoesNotLock | regression | 🔴🔴🔴🔴🔴🔴🔴 | |
| ...ersionTest.v4ConnectionCleansUpThreadLocalState | regression | 🔴🔴🔴🔴🔴🔴🔴 | |
| ...eadSizeWarningTest.warnThresholdSinglePartition | regression | 🔴🔴🔴🔴🔴🔴🔴 | |
| ...Test.warnThresholdSinglePartitionWithReadRepair | regression | 🔴🔴🔴🔴🔴🔴🔴 | |
| ...lIndexImplementationsTest.testDisjunction[SASI] | regression | 🔴🔴🔴🔴🔴🔴🔴 | |
| o.a.c.i.c.CompressionMetadataTest.testMemoryIsF... | regression | 🔴🔴🔴🔴🔴🔴🔴 | |
| ....i.c.CompressionMetadataTest.testMemoryIsShared | regression | 🔴🔴🔴🔴🔴🔴🔴 | |
| ...ientRequestMetricsLatenciesTest.testReadMetrics | regression | 🔴🔴🔴🔴🔴🔴🔴 | |
| ...entRequestMetricsLatenciesTest.testWriteMetrics | regression | 🔴🔴🔴🔴🔴🔴🔴 | |
| ...ntIrWithPreviewFuzzTest.concurrentIrWithPreview | regression | 🔴🔴🔴🔴🔴🔴🔴 | |
| o.a.c.t.SSTablePartitionsTest.testMinRows | regression | 🔴🔴🔴🔴🔴🔴🔴 | |
| o.a.c.t.TransportTest.testAsyncTransport | regression | 🔴🔴🔴🔴🔴🔴🔴 | |

Found 37 known test failures

This additional metadata is somewhat valuable in the context of troubleshooting. Recently, we had an issue where the checksum itself was not (over)written and so it was stored as 0. In many cases this extra detail won't matter, but since it is cheap and can help, I propose adding some additional metadata when checksums don't match.
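For illustration only (not the actual patch), the idea could look like the following; the method and file parameter are hypothetical:

```java
import java.io.IOException;

import org.apache.cassandra.io.sstable.CorruptSSTableException;
import org.apache.cassandra.io.util.File;

// Hypothetical sketch: report both checksums on mismatch, so a checksum that was
// never (over)written and stored as 0 is immediately visible in the logs.
static void validateChecksum(long expected, long actual, File chunkFile)
{
    if (actual != expected)
        throw new CorruptSSTableException(
            new IOException(String.format("Checksum mismatch: expected %d but read %d", expected, actual)),
            chunkFile);
}
```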
* Implement FSError#getMessage to ensure file name is logged

For this code block:

```java
var t = new FSWriteError(new IOException("Test failure"), new File("my", "file"));
logger.error("error", t);
```

We used to log:

```
ERROR [main] 2024-09-19 11:09:18,599 VectorTypeTest.java:118 - error
org.apache.cassandra.io.FSWriteError: java.io.IOException: Test failure
	at org.apache.cassandra.index.sai.cql.VectorTypeTest.endToEndTest(VectorTypeTest.java:117)
	... (JUnit and IntelliJ runner frames elided)
Caused by: java.io.IOException: Test failure
	... 42 common frames omitted
```

Now we will log:

```
ERROR [main] 2024-09-19 11:10:02,910 VectorTypeTest.java:118 - error
org.apache.cassandra.io.FSWriteError: my/file
	at org.apache.cassandra.index.sai.cql.VectorTypeTest.endToEndTest(VectorTypeTest.java:117)
	... (JUnit and IntelliJ runner frames elided)
Caused by: java.io.IOException: Test failure
	... 42 common frames omitted
```

* Add super.getMessage() to the message
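A hedged sketch of what those two bullets describe together (FSError's stored path field and the exact message format are assumptions, not the actual change):

```java
// In FSError: include the file name in the message, and (per the follow-up commit)
// append super.getMessage() so the cause's detail is not lost.
@Override
public String getMessage()
{
    String superMessage = super.getMessage();
    return superMessage == null ? String.valueOf(path)            // path: assumed file field
                                : path + " (" + superMessage + ")";
}
```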
k-rus and others added 26 commits January 17, 2025 12:57
…strictions (#1449)

Closes riptano/cndb#12139

This PR adds a test of row count of a SAI plan in the presence of
restrictions. Currently it tests queries with inequality, equality and
half-ranges on different SAI column types and with or without
histograms.
…g VIntOutOfRangeException to the catch block of SSTableIdentityIterator.exhaust method.
…pactionProgress to return the operation type from the first entry in the shared progress; needed in cases where a CompactionTask's type is changed after creation.
…opriate (#1469)

Fixes riptano/cndb#12239

We found that System.nanoTime was consuming significant CPU; because
the timeout is high enough, we can accept the inaccuracy.

- [ ] Make sure there is a PR in the CNDB project updating the Converged
Cassandra version
- [ ] Use `NoSpamLogger` for log lines that may appear frequently in the
logs
- [ ] Verify test results on Butler
- [ ] Test coverage for new/modified code is > 80%
- [ ] Proper code formatting
- [ ] Proper title for each commit starting with the project-issue
number, like CNDB-1234
- [ ] Each commit has a meaningful description
- [ ] Each commit is not very long and contains related changes
- [ ] Renames, moves and reformatting are in distinct commits
Fixes regression in jvector 3.0.4 when compacting PQVectors larger than 2GB
### What is the issue
SimpleClientPerfTest has been failing in CI since changes from
CNDB-10759

### What does this PR fix and why was it fixed
This change in `SimpleClientPerfTest` updates the anonymous class
`Message.Codec<QueryMessage>` to override the correct method, `public
CompletableFuture<Response> maybeExecuteAsync` from `QueryMessage`,
whose signature was changed as part of CNDB-10759.

### Checklist before you submit for review
- [ ] Make sure there is a PR in the CNDB project updating the Converged
Cassandra version
- [ ] Use `NoSpamLogger` for log lines that may appear frequently in the
logs
- [ ] Verify test results on Butler
- [ ] Test coverage for new/modified code is > 80%
- [ ] Proper code formatting
- [ ] Proper title for each commit starting with the project-issue
number, like CNDB-1234
- [ ] Each commit has a meaningful description
- [ ] Each commit is not very long and contains related changes
- [ ] Renames, moves and reformatting are in distinct commits
…ing for async batchlog removal (#1485)

The test asserts that the batchlog is removed immediately after the
write completes, but removal of the batchlog is async and can be
delayed, particularly in resource-limited environments like CI.
The test generates partition index accesses by reusing the same key, and if the key cache is enabled, the test will fail for bigtable profiles because the key will be in the key cache.
…by filtering queries (#1484)

Queries creating fake index contexts each create their own context,
which can then race on metric registration (as the metrics have the same
patterns). This can cause a query to fail. These metrics are superfluous;
we can skip creating them entirely.
…ate.accumulatedDataSize; it only worked to fix SensorsWriteTest.testMultipleRowsMutationWithClusteringKey for SkipListMemtable and may not be necessary if the default memtable becomes TrieMemtable. Revisit SensorsWriteTest later if necessary.
Move static class TrieMemtable.Factory to TrieMemtableFactory class;
Use suggested TriePartitionUpdate.unsharedHeapSize implementation;
Use InMemoryTrie.shortLived in TrieToDotTest and TrieToMermaidTest;
Add specific versions aa, ca, and da to RowIndexTest;
Add addMemoryUsageTo in SkipListMemtable and TrieMemtable
Add TrieMemtable.switchOut
Enqueue start time will now be correctly measured for every task
execution API.

Enqueue start times were measured at the wrong moment for some APIs.

We now measure the enqueue start time at the point of entry to each task
execution API, so the metrics report correct enqueue times.
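Conceptually, the pattern is as follows (hypothetical names, not the actual executor code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: each task-execution API stamps the enqueue start time on entry,
// so queue time is measured from the moment of submission, uniformly across APIs.
class TimedExecutor
{
    private final ExecutorService delegate = Executors.newSingleThreadExecutor();

    public void execute(Runnable task)
    {
        long enqueueStartNanos = System.nanoTime(); // captured at API entry, not later
        delegate.execute(() -> {
            recordQueueTime(System.nanoTime() - enqueueStartNanos);
            task.run();
        });
    }

    private void recordQueueTime(long nanos)
    {
        // update the enqueue-time metric here (omitted in this sketch)
    }
}
```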
… while updating metrics (#1482)

A JVM system property was read and parsed on the hot path.

Cache the value in a constant.

Co-authored-by: Joel Knighton <[email protected]>
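The fix pattern, sketched (property name hypothetical):

```java
// Parse the system property once at class load instead of on every metric update.
private static final long METRICS_INTERVAL_MS =
    Long.getLong("cassandra.hypothetical.metrics_interval_ms", 100L);
```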
#1295)

Addresses: riptano/cndb#8501
CNDB PR: riptano/cndb#11681

The idea is to expose sensors to CQL clients via the native
protocol's custom payload (which is available in V4 or higher). This
first iteration adds `READ_BYTES` and `WRITE_BYTES`.
…for tidiers to run (#1498)

This test relies on file system state being blank, so truncating isn't enough unless we wait for tidiers to run.
QueryView#build fell into an infinite loop in January certification which led to query timeouts.

This PR fundamentally changes the algorithm used in QueryView#build.

Instead of matching sstables and memtables to indexes we know, we go in the opposite direction now: we find a corresponding index for each sstable and memtable. That simplifies the code and reduces the need for retrying.

Now we only need to retry memtable index lookup failures, because memtables are not refcounted
explicitly like sstables, so we can't prevent their concurrent release. But even retrying after a memtable index lookup failure is now limited to once per memtable.

We don't have to retry sstable index lookups, because as long as we hold the refViewFragment with the referenced sstables, their indexes won't be released, so the lookups should never fail.
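In pseudocode form, the inverted matching looks roughly like this (Index, lookupSSTableIndex and lookupMemtableIndex are illustrative stand-ins, not the actual QueryView code):

```java
// Conceptual sketch: find an index for each sstable/memtable, instead of
// matching known indexes back to sstables and memtables.
List<Index> buildView(List<SSTableReader> referencedSSTables, List<Memtable> liveMemtables)
{
    List<Index> indexes = new ArrayList<>();

    // While the refViewFragment holds sstable refs, their indexes cannot be
    // released, so these lookups never need retrying.
    for (SSTableReader sstable : referencedSSTables)
        indexes.add(lookupSSTableIndex(sstable));

    // Memtables are not refcounted, so a lookup can race with a concurrent
    // release; retry at most once per memtable.
    for (Memtable memtable : liveMemtables)
    {
        Index index = lookupMemtableIndex(memtable);
        if (index == null)
            index = lookupMemtableIndex(memtable);
        if (index != null)
            indexes.add(index);
    }
    return indexes;
}
```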
…eComparable (#1504)

### What is the issue
Fixes: riptano/cndb#12445

### What does this PR fix and why was it fixed
Appears to be a regression introduced by 338902c or #1177

The general issue is that SAI uses `OSS41` to encode its primary keys in the trie index, but this code path attempted to interpret the clustering columns using `OSS50`.
@djatnieks djatnieks merged commit 02f5fa3 into main-5.0 Jan 17, 2025
515 of 559 checks passed
@djatnieks djatnieks deleted the CNDB-12154 branch January 17, 2025 23:55