Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48375][SQL] Add support for SIGNAL statement #49726

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

miland-db
Copy link
Contributor

@miland-db miland-db commented Jan 29, 2025

What changes were proposed in this pull request?

SIGNAL statement to enable users to raise an error from arbitrary place in code.

Why are the changes needed?

The intent is to add the possibility for user to raise builtin and user defined conditions/sqlstates. This will enable users to do error handling in a way similar to other popular programming languages.

Limitations

It is only possible to raise errors with user defined conditions declared in the same scope as signal statement.

Does this PR introduce any user-facing change?

No

How was this patch tested?

There are already existing test suites for SQL scripting that have been improved to test new functionalities:

  • SqlScriptingParserSuite
  • SqlScriptingExecutionSuite

Was this patch authored or co-authored using generative AI tooling?

No

@miland-db miland-db marked this pull request as draft January 29, 2025 16:17
@miland-db miland-db force-pushed the milan-dankovic_data/signal-statement-1 branch from 3e7ab54 to ca6f439 Compare January 31, 2025 13:09
sqlState = sqlState,
message = getMessageText,
cause = null,
origin = CurrentOrigin.get
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this ain't right, will update the comment tomorrow, writing just so it doesn't get forgotten

# Conflicts:
#	common/utils/src/main/resources/error/error-conditions.json
#	sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/SqlScriptingLogicalPlans.scala
#	sql/catalyst/src/main/scala/org/apache/spark/sql/errors/SqlScriptingErrors.scala
#	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/SqlScriptingParserSuite.scala
#	sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecution.scala
#	sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionContext.scala
#	sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala
#	sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingInterpreter.scala
#	sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionSuite.scala
#	sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerWithSparkContextSuite.scala
@github-actions github-actions bot removed the DOCS label Feb 4, 2025
@github-actions github-actions bot added the DOCS label Feb 4, 2025
@miland-db miland-db changed the title [DRAFT][SPARK-48375][SQL] Add support for SIGNAL statement [SPARK-48375][SQL] Add support for SIGNAL statement Feb 5, 2025
@miland-db miland-db marked this pull request as ready for review February 5, 2025 17:16
@miland-db
Copy link
Contributor Author

Adding reviewers: @cloud-fan, @davidm-db, @dejankrak-db, @dusantism-db, @MaxGekk

case class SignalStatement(
var isBuiltinError: Boolean = false,
errorCondition: Option[String] = None,
var sqlState: Option[String] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we put mutable var in a case class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sqlState is resolved later for some kind of SIGNAL statements and not during parsing. I need a way to set it when I resolve it. One way is to create a new instance where we copy everything from the existing instance and provide new resolved sqlState, or to use var. Any thoughts @cloud-fan?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

creating new instance is more scala-style and let's do it

@@ -2413,6 +2413,21 @@ class SqlScriptingParserSuite extends SparkFunSuite with SQLHelper {
head.asInstanceOf[SingleStatement].getText == "SELECT 3")
}

test("signal statement - invalid sqlstate") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we add some positive tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add additional parser and execution tests for SIGNAL.

val errorCondition: Option[String] = None,
val sqlState: Option[String] = None,
val message: Either[String, UnresolvedAttribute],
val msgArguments: Option[UnresolvedAttribute],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we allow users to specify parameters like the TBLPROPERTIES syntax?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srielau suggested that for the first version providing the variable that holds the map of arguments is enough. Next steps will be to add support for other ways to specify message arguments, e.g. directly using map(...) function etc.

@@ -3602,6 +3602,12 @@
],
"sqlState" : "42K09"
},
"INVALID_VARIABLE_TYPE_FOR_SIGNAL_STATEMENT" : {
"message" : [
"Variable type must be Map<String, String> type but got <varType>."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you output Map<String, String> in the same way as toSQLType

@@ -65,6 +65,7 @@ compoundStatement
| beginEndCompoundBlock
| declareConditionStatement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Can be done in a follow-up]

We should fix the DECLARE CONDITION statement. The simple case when we declare an "empty" condition, like DECLARE cond CONDITION; fails currently - CONDITION is recognized as a type and an exception is thrown that the type is not recognized.
I think the simplest (and best) way to do this is to reorder the parser rules and add a comment that declareConditionStatement needs to go before statement. I think the parser already relies on the rules order in multiple places and that this should be fine. So, I think we should go with this approach, but add a comment in a PR description about other possible approaches that we discussed offline.
Let's add the tests for this case as well.


SignalStatement(
errorCondition = Some(ctx.conditionName.getText.toUpperCase(Locale.ROOT)),
sqlState = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a comment that explains that we set the sqlState to None here and fill it in in visitCompoundBodyImpl once all the conditions have been resolved.

Comment on lines +369 to +373
// Try to get sqlState if condition is user defined.
signalStatement.sqlState = conditions.get(signalStatement.errorCondition.get)

// If condition is not user defined, check if it is a spark defined error condition.
if (signalStatement.sqlState.isEmpty) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a small comment explaining why we have this logic of filling in sqlState here would be nice.

override def visitSignalStatementWithCondition(
ctx: SignalStatementWithConditionContext): SignalStatement = {
val messageString = Option(ctx.msgStr)
.map(sl => Left(string(visitStringLit(sl)).replace("'", "")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are a bit inconsistent with visiting string literals in visitSignalStatementWithCondition and visitSignalStatementWithSqlState - let's double check what is the easiest/most compact way to do this. Also, do we need to replace quotes?

messageParameters,
isBuiltinError),
cause)
with SparkThrowable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indentation, should be the same level as extends

Comment on lines +420 to +421
errorCondition: Option[String] = None,
var sqlState: Option[String] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's fine to keep both errorCondition and sqlState as Options in the logical plan (due to the resolution in the visitors), but since we need to have both filled in, I would remove the Options from the exec node and add assert in the interpreter's bulidExecutionPlan

val argsValue = argsReference.eval(null)

if (argsValue == null) {
throw SqlScriptingErrors.nullVariableSignalStatement(CurrentOrigin.get, args.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cannot do CurrentOrigin.get here as well, we have the origin as the param of SingleStatementExec


if (varReferenceValue == null) {
throw SqlScriptingErrors
.nullVariableSignalStatement(CurrentOrigin.get, u.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing for CurrentOrigin.get


if (!argsReference.dataType.sameType(MapType(StringType, StringType))) {
throw SqlScriptingErrors
.invalidSignalStatementVariableType(CurrentOrigin.get, argsReference.dataType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace CurrentOrigin.get here as well


if (!varReference.dataType.sameType(StringType)) {
throw SqlScriptingErrors
.invalidSignalStatementVariableType(CurrentOrigin.get, varReference.dataType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace CurrentOrigin.get here as well

messageParameters: Map[String, String] = Map.empty,
isBuiltinError: Boolean = false)
extends Exception(
errorMessageWithLineNumber(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not adding SQL State to the exception message and I think we should do it.
The regular exceptions are in format [<sql_state>] <message_text>. For SqlScriptingException we did {<error_line>} [<sql_state>] <message_text> and we should do the same for SqlScriptingRuntimeException

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just noticed that it looks like this is fixed in SqlScriptingRuntimeException.errorMessageWithLineNumber?

Comment on lines +27 to +33
condition: String,
sqlState: Option[String] = None,
message: String,
cause: Throwable,
val origin: Origin,
messageParameters: Map[String, String] = Map.empty,
isBuiltinError: Boolean = false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indentation seems off here

import org.apache.spark.sql.catalyst.trees.Origin
import org.apache.spark.sql.exceptions.SqlScriptingRuntimeException.errorMessageWithLineNumber

class SqlScriptingRuntimeException (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Can be done in a follow-up PR - together with the other related comment in SqlScriptingExecution]

nit: maybe adding a comment explaining what is this exception used for (wrapping all of the SparkThrowable exceptions thrown from the script into this, etc) would be nice

@@ -100,13 +100,13 @@ class SqlScriptingExecution(
// While we don't have a result statement, execute the statements.
while (currentStatement.isDefined) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Can be done in a follow-up PR]

In handleException we should add logic that converts all of the SparkThrowable exceptions to the SqlScriptingRuntimeException.

This has a few consequences:

  • We need to start propagating the exception to findHandler functions instead of condition and SQL state.
  • We need to start tracking the currentStatementOrigin for each statement here.
  • Propagate currentStatementOrigin to handle exception in order to properly construct SqlScriptingRuntimeException.
  • withErrorHandling has to have Origin as an input param as well.

I've sent you the example commit offline regarding this.

val session: SparkSession,
override val origin: Origin)
extends LeafStatementExec
with ColumnResolutionHelper with WithOrigin {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indentation should match the extends

Comment on lines +1028 to +1033
* Executable node for Signal Statement.
* @param errorCondition Name of the error condition/SQL State for error that will be thrown.
* @param sqlState SQL State of the error that will be thrown.
* @param message Error message (either string or variable name).
* @param msgArguments Error message parameters for builtin conditions.
* @param session Spark session that SQL script is executed within.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: missing some of the params in the comment

Comment on lines +1036 to +1037
val errorCondition: Option[String] = None,
val sqlState: Option[String] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned in one of the previous comments, let's make these mandatory

@@ -274,6 +274,16 @@ case class SqlScriptingInterpreter(session: SparkSession) {
case iterateStatement: IterateStatement =>
new IterateStatementExec(iterateStatement.label)

case signalStatement: SignalStatement =>
new SignalStatementExec(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned in one of the previous comments, let's add assert that errorCondition and sqlState are non-empty in signalStatement

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants