-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-48375][SQL] Add support for SIGNAL statement #49726
base: master
Are you sure you want to change the base?
[SPARK-48375][SQL] Add support for SIGNAL statement #49726
Conversation
sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecution.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/exceptions/SqlScriptingRuntimeException.scala
Outdated
Show resolved
Hide resolved
3e7ab54
to
ca6f439
Compare
sqlState = sqlState, | ||
message = getMessageText, | ||
cause = null, | ||
origin = CurrentOrigin.get |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this ain't right, will update the comment tomorrow, writing just so it doesn't get forgotten
# Conflicts: # common/utils/src/main/resources/error/error-conditions.json # sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 # sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala # sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/SqlScriptingLogicalPlans.scala # sql/catalyst/src/main/scala/org/apache/spark/sql/errors/SqlScriptingErrors.scala # sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/SqlScriptingParserSuite.scala # sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecution.scala # sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionContext.scala # sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala # sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingInterpreter.scala # sql/core/src/test/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionSuite.scala # sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerWithSparkContextSuite.scala
Adding reviewers: @cloud-fan, @davidm-db, @dejankrak-db, @dusantism-db, @MaxGekk |
case class SignalStatement( | ||
var isBuiltinError: Boolean = false, | ||
errorCondition: Option[String] = None, | ||
var sqlState: Option[String] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we put mutable var
in a case class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sqlState
is resolved later for some kind of SIGNAL
statements and not during parsing. I need a way to set it when I resolve it. One way is to create a new instance where we copy everything from the existing instance and provide new resolved sqlState
, or to use var
. Any thoughts @cloud-fan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
creating new instance is more scala-style and let's do it
@@ -2413,6 +2413,21 @@ class SqlScriptingParserSuite extends SparkFunSuite with SQLHelper { | |||
head.asInstanceOf[SingleStatement].getText == "SELECT 3") | |||
} | |||
|
|||
test("signal statement - invalid sqlstate") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we add some positive tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add additional parser and execution tests for SIGNAL.
sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala
Show resolved
Hide resolved
val errorCondition: Option[String] = None, | ||
val sqlState: Option[String] = None, | ||
val message: Either[String, UnresolvedAttribute], | ||
val msgArguments: Option[UnresolvedAttribute], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we allow users to specify parameters like the TBLPROPERTIES syntax?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srielau suggested that for the first version providing the variable that holds the map of arguments is enough. Next steps will be to add support for other ways to specify message arguments, e.g. directly using map(...)
function etc.
@@ -3602,6 +3602,12 @@ | |||
], | |||
"sqlState" : "42K09" | |||
}, | |||
"INVALID_VARIABLE_TYPE_FOR_SIGNAL_STATEMENT" : { | |||
"message" : [ | |||
"Variable type must be Map<String, String> type but got <varType>." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you output Map<String, String>
in the same way as toSQLType
...st/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/SqlScriptingLogicalPlans.scala
Show resolved
Hide resolved
@@ -65,6 +65,7 @@ compoundStatement | |||
| beginEndCompoundBlock | |||
| declareConditionStatement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Can be done in a follow-up]
We should fix the DECLARE CONDITION
statement. The simple case when we declare an "empty" condition, like DECLARE cond CONDITION;
fails currently - CONDITION
is recognized as a type and an exception is thrown that the type is not recognized.
I think the simplest (and best) way to do this is to reorder the parser rules and add a comment that declareConditionStatement
needs to go before statement
. I think the parser already relies on the rules order in multiple places and that this should be fine. So, I think we should go with this approach, but add a comment in a PR description about other possible approaches that we discussed offline.
Let's add the tests for this case as well.
|
||
SignalStatement( | ||
errorCondition = Some(ctx.conditionName.getText.toUpperCase(Locale.ROOT)), | ||
sqlState = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a comment that explains that we set the sqlState
to None
here and fill it in in visitCompoundBodyImpl
once all the conditions have been resolved.
// Try to get sqlState if condition is user defined. | ||
signalStatement.sqlState = conditions.get(signalStatement.errorCondition.get) | ||
|
||
// If condition is not user defined, check if it is a spark defined error condition. | ||
if (signalStatement.sqlState.isEmpty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, a small comment explaining why we have this logic of filling in sqlState
here would be nice.
override def visitSignalStatementWithCondition( | ||
ctx: SignalStatementWithConditionContext): SignalStatement = { | ||
val messageString = Option(ctx.msgStr) | ||
.map(sl => Left(string(visitStringLit(sl)).replace("'", ""))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are a bit inconsistent with visiting string literals in visitSignalStatementWithCondition
and visitSignalStatementWithSqlState
- let's double check what is the easiest/most compact way to do this. Also, do we need to replace quotes?
messageParameters, | ||
isBuiltinError), | ||
cause) | ||
with SparkThrowable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indentation, should be the same level as extends
errorCondition: Option[String] = None, | ||
var sqlState: Option[String] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's fine to keep both errorCondition
and sqlState
as Options
in the logical plan (due to the resolution in the visitors), but since we need to have both filled in, I would remove the Options
from the exec node and add assert
in the interpreter's bulidExecutionPlan
val argsValue = argsReference.eval(null) | ||
|
||
if (argsValue == null) { | ||
throw SqlScriptingErrors.nullVariableSignalStatement(CurrentOrigin.get, args.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cannot do CurrentOrigin.get
here as well, we have the origin
as the param of SingleStatementExec
|
||
if (varReferenceValue == null) { | ||
throw SqlScriptingErrors | ||
.nullVariableSignalStatement(CurrentOrigin.get, u.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same thing for CurrentOrigin.get
|
||
if (!argsReference.dataType.sameType(MapType(StringType, StringType))) { | ||
throw SqlScriptingErrors | ||
.invalidSignalStatementVariableType(CurrentOrigin.get, argsReference.dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace CurrentOrigin.get
here as well
|
||
if (!varReference.dataType.sameType(StringType)) { | ||
throw SqlScriptingErrors | ||
.invalidSignalStatementVariableType(CurrentOrigin.get, varReference.dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace CurrentOrigin.get
here as well
messageParameters: Map[String, String] = Map.empty, | ||
isBuiltinError: Boolean = false) | ||
extends Exception( | ||
errorMessageWithLineNumber( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not adding SQL State to the exception message and I think we should do it.
The regular exceptions are in format [<sql_state>] <message_text>
. For SqlScriptingException
we did {<error_line>} [<sql_state>] <message_text>
and we should do the same for SqlScriptingRuntimeException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just noticed that it looks like this is fixed in SqlScriptingRuntimeException.errorMessageWithLineNumber
?
condition: String, | ||
sqlState: Option[String] = None, | ||
message: String, | ||
cause: Throwable, | ||
val origin: Origin, | ||
messageParameters: Map[String, String] = Map.empty, | ||
isBuiltinError: Boolean = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indentation seems off here
import org.apache.spark.sql.catalyst.trees.Origin | ||
import org.apache.spark.sql.exceptions.SqlScriptingRuntimeException.errorMessageWithLineNumber | ||
|
||
class SqlScriptingRuntimeException ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Can be done in a follow-up PR - together with the other related comment in SqlScriptingExecution
]
nit: maybe adding a comment explaining what is this exception used for (wrapping all of the SparkThrowable
exceptions thrown from the script into this, etc) would be nice
@@ -100,13 +100,13 @@ class SqlScriptingExecution( | |||
// While we don't have a result statement, execute the statements. | |||
while (currentStatement.isDefined) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Can be done in a follow-up PR]
In handleException
we should add logic that converts all of the SparkThrowable
exceptions to the SqlScriptingRuntimeException
.
This has a few consequences:
- We need to start propagating the exception to
findHandler
functions instead of condition and SQL state. - We need to start tracking the
currentStatementOrigin
for each statement here. - Propagate
currentStatementOrigin
to handle exception in order to properly constructSqlScriptingRuntimeException
. withErrorHandling
has to haveOrigin
as an input param as well.
I've sent you the example commit offline regarding this.
val session: SparkSession, | ||
override val origin: Origin) | ||
extends LeafStatementExec | ||
with ColumnResolutionHelper with WithOrigin { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indentation should match the extends
* Executable node for Signal Statement. | ||
* @param errorCondition Name of the error condition/SQL State for error that will be thrown. | ||
* @param sqlState SQL State of the error that will be thrown. | ||
* @param message Error message (either string or variable name). | ||
* @param msgArguments Error message parameters for builtin conditions. | ||
* @param session Spark session that SQL script is executed within. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: missing some of the params in the comment
val errorCondition: Option[String] = None, | ||
val sqlState: Option[String] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as mentioned in one of the previous comments, let's make these mandatory
@@ -274,6 +274,16 @@ case class SqlScriptingInterpreter(session: SparkSession) { | |||
case iterateStatement: IterateStatement => | |||
new IterateStatementExec(iterateStatement.label) | |||
|
|||
case signalStatement: SignalStatement => | |||
new SignalStatementExec( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as mentioned in one of the previous comments, let's add assert that errorCondition
and sqlState
are non-empty in signalStatement
What changes were proposed in this pull request?
SIGNAL
statement to enable users to raise an error from arbitrary place in code.Why are the changes needed?
The intent is to add the possibility for user to raise builtin and user defined conditions/sqlstates. This will enable users to do error handling in a way similar to other popular programming languages.
Limitations
It is only possible to raise errors with user defined conditions declared in the same scope as signal statement.
Does this PR introduce any user-facing change?
No
How was this patch tested?
There are already existing test suites for SQL scripting that have been improved to test new functionalities:
SqlScriptingParserSuite
SqlScriptingExecutionSuite
Was this patch authored or co-authored using generative AI tooling?
No