Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Text and Keyword aggregation integration tests #176

Draft
wants to merge 9 commits into
base: Integ-newDataTypeForTextAggregations
Choose a base branch
from
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,7 @@
import static org.opensearch.sql.legacy.TestUtils.getPhraseIndexMapping;
import static org.opensearch.sql.legacy.TestUtils.getResponseBody;
import static org.opensearch.sql.legacy.TestUtils.getStringIndexMapping;
import static org.opensearch.sql.legacy.TestUtils.getDataTextKeywordIndexMapping;
import static org.opensearch.sql.legacy.TestUtils.getWeblogsIndexMapping;
import static org.opensearch.sql.legacy.TestUtils.isIndexExist;
import static org.opensearch.sql.legacy.TestUtils.loadDataByRestClient;
Expand Down Expand Up @@ -584,7 +585,11 @@ public enum Index {
CALCS(TestsConstants.TEST_INDEX_CALCS,
"calcs",
getMappingFile("calcs_index_mappings.json"),
"src/test/resources/calcs.json"),;
"src/test/resources/calcs.json"),
TEXTKEYWORD(TestsConstants.TEST_INDEX_TEXTKEYWORD,
"textkeyword",
getMappingFile("text_keyword_index_mapping.json"),
"src/test/resources/text_keyword_index.json");

private final String name;
private final String type;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -243,6 +243,11 @@ public static String getDataTypeNonnumericIndexMapping() {
return getMappingFile(mappingFile);
}

public static String getDataTextKeywordIndexMapping() {
String mappingFile = "text_keyword_index_mapping.json";
return getMappingFile(mappingFile);
}

public static void loadBulk(Client client, String jsonPath, String defaultIndex)
throws Exception {
System.out.println(String.format("Loading file %s into opensearch cluster", jsonPath));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,7 @@ public class TestsConstants {
public final static String TEST_INDEX_BEER = TEST_INDEX + "_beer";
public final static String TEST_INDEX_NULL_MISSING = TEST_INDEX + "_null_missing";
public final static String TEST_INDEX_CALCS = TEST_INDEX + "_calcs";
public final static String TEST_INDEX_TEXTKEYWORD = TEST_INDEX + "_textkeyword";

public final static String DATE_FORMAT = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'";
public final static String TS_DATE_FORMAT = "yyyy-MM-dd HH:mm:ss.SSS";
Expand Down
67 changes: 67 additions & 0 deletions integ-test/src/test/java/org/opensearch/sql/sql/TextTypeIT.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.sql.sql;

import org.junit.Test;
import org.opensearch.sql.legacy.SQLIntegTestCase;
import java.io.IOException;


import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_TEXTKEYWORD;
import static org.opensearch.sql.util.MatcherUtils.schema;
import static org.opensearch.sql.util.MatcherUtils.verifySchema;

public class TextTypeIT extends SQLIntegTestCase {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any other tests that you tried? These only look like aggregation tests, but it would be good to know if some of the following would also work:

  • WHERE field LIKE "keyFD??"
  • WHERE wildcard("field", "keyFD??")
  • SELECT field LIKE "keyFD??"
    and maybe a couple of string-like functions:
  • SELECT LOCATE("FD", field)
  • SELECT SUBSTRING(field, 3, 2)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also try

  • SELECT POSITION(substring IN field) since it seems to fail for text fields (as Margarit demonstrated)

Copy link
Author

@MitchellGale MitchellGale Nov 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SELECT POSITION(substring IN field)

Passed:

  • selectPositionKeyword
  • selectPositionText
  • selectPositionTextKeywordFieldNoFieldData
  • selectPositionTypeTextFieldData
  • selectLocateTextDataFieldNoFields
  • selectPositionTextDataFieldNoFields

Failed:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check selectPositionTextDataFieldNoFields - its giving you a parser error.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, this looks exclusively like an issue with aggregation on text fields.
Strangely, I thought POSITION was going to fail on text fields, since it was failing for Margarit earlier.



@Override
public void init() throws Exception {
super.init();
loadIndex(Index.TEXTKEYWORD);
}

@Test
public void textKeywordTest() throws IOException {
var result = executeJdbcRequest(String.format("select typeText from %s", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("typeText", null, "text"));
}

@Test
public void aggregateOnText() throws IOException {
var result = executeJdbcRequest(String.format("select sum(int0) from %s GROUP BY typeText", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("sum(int0)", null, "integer"));
}

@Test
public void aggregateOnKeyword() throws IOException {
var result = executeJdbcRequest(String.format("select sum(int0) from %s GROUP BY typeKeyword", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("sum(int0)", null, "integer"));
}

@Test
public void aggregateOnTextFieldData() throws IOException {
var result = executeJdbcRequest(String.format("select sum(int0) from %s GROUP BY typeTextFieldData", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("sum(int0)", null, "integer"));
}

@Test
public void aggregateOnKeywordFieldData() throws IOException {
var result = executeJdbcRequest(String.format("select sum(int0) from %s GROUP BY typeKeywordFieldNoFieldData", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("sum(int0)", null, "integer"));
}

@Test
public void aggregateOnTextAndFieldDataNoFields() throws IOException {
var result = executeJdbcRequest(String.format("select sum(int0) from %s GROUP BY textDataFieldNoFields", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("sum(int0)", null, "integer"));
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
{
"mappings" : {
"properties" : {
"typeKeyword" : {
"type" : "keyword"
},
"typeText" : {
"type" : "text"
},
"typeKeywordFieldNoFieldData" : {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 10

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try adding a row with more than 10 words. Interesting to test this too.

}
} },
"typeTextFieldData" : {
"type": "text",
"fielddata": true,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be interesting to know how it works if we have a text field with "fielddata": true but doesn't have any fields defined. Does this behave just like typeTextFieldData?

Copy link
Author

@MitchellGale MitchellGale Nov 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 2a8fbac. It passes.

"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 10
}
}
},
"textDataFieldNoFields" : {
"type": "text",
"fielddata": true
},
"int0" : {
"type": "integer"
}
}
}
}
16 changes: 16 additions & 0 deletions integ-test/src/test/resources/text_keyword_index.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
{"index": {}}
{"typeKeyword": "key00", "typeText": "text00", "typeKeywordFieldNoFieldData": "keyword00","typeTextFieldData": "keyFD00", "typeKeywordFieldData": "textFD00", "textDataFieldNoFields": "textFDNF00","int0": 0}
{"index": {}}
{"typeKeyword": "key01", "typeText": "text01", "typeKeywordFieldNoFieldData": "keyword01", "typeTextFieldData": "keyFD01", "typeTextFieldData": "textFD01OverTen", "textDataFieldNoFields": "textFDNF01", "int0": 1}
{"index": {},typeKeywordFieldData

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A typo? How it works?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved. Issue created opensearch-project#1110

{"typeKeyword": "key02", "typeText": "text02", "typeKeywordFieldNoFieldData": "keyword02", "typeTextFieldData": "keyFD02", "typeKeywordFieldData": "textFD02", "textDataFieldNoFields": "textFDNF02", "int0": 2}
{"index": {}}
{"typeKeyword": "key03", "typeText": "text03", "typeKeywordFieldNoFieldData": "keyword03", "typeTextFieldData": "keyFD03OverTen", "typeKeywordFieldData": "textFD03", "textDataFieldNoFields": "textFDNF03", "int0": 3}
{"index": {}}
{"typeKeyword": "key04", "typeText": "text04", "typeKeywordFieldNoFieldData": "keyword04", "typeTextFieldData": "keyFD04", "typeKeywordFieldData": "textFD04", "textDataFieldNoFields": "textFDNF04", "int0": 4}
{"index": {}}
{"typeKeyword": "key05", "typeText": "text05", "typeKeywordFieldNoFieldData": "keyword05", "typeTextFieldData": "keyFD05", "typeKeywordFieldData": "textFD0OverTen5", "textDataFieldNoFields": "textFDNF05", "int0": 5}
{"index": {}}
{"typeKeyword": "key06", "typeText": "text06", "typeKeywordFieldNoFieldData": "keyword06", "typeTextFieldData": "keyFD06OverTen", "typeKeywordFieldData": "textFD06", "textDataFieldNoFields": "textFDNF06", "int0": 6}
{"index": {}}
{"typeKeyword": "key07", "typeText": "text07", "typeKeywordFieldNoFieldData": "keyword07", "typeTextFieldData": "keyFD07", "typeKeywordFieldData": "textFD07", "textDataFieldNoFields": "textFDNF07", "int0": 7}