
[SPARK-51694][SQL] Fix load v2 function with non-built-in V2 session catalog #50495

Open · wants to merge 2 commits into base: master
Conversation

@Zouxxyy (Contributor) commented Apr 2, 2025

What changes were proposed in this pull request?

Users may extend the session catalog via CatalogExtension and register it as spark_catalog. If such a catalog adds new v2 functions, the current logic cannot load them, because at present only v1 functions are loaded for spark_catalog.

For example, with the configuration:

spark.sql.catalog.spark_catalog=MySessionFunctionCatalog
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.catalog.{CatalogExtension, Identifier}
import org.apache.spark.sql.connector.catalog.functions.{BoundFunction, ScalarFunction, UnboundFunction}
import org.apache.spark.sql.types.{DataType, IntegerType, StructType}

class MySessionFunctionCatalog extends CatalogExtension {
  // Serve a custom v2 function for sys.my_function; delegate everything else.
  override def loadFunction(ident: Identifier): UnboundFunction = {
    if (ident.equals(Identifier.of(Array("sys"), "my_function"))) {
      new UnboundFunction {
        override def bind(inputType: StructType): BoundFunction = new ScalarFunction[Int] {
          override def inputTypes(): Array[DataType] = Array(IntegerType)
          override def resultType(): DataType = IntegerType
          override def name(): String = "my_function"
          // Constant result, just to make a successful lookup observable.
          override def produceResult(input: InternalRow): Int = 123
        }
        override def description(): String = "hello"
        override def name(): String = "my_function"
      }
    } else {
      super.loadFunction(ident)
    }
  }
}

Running SELECT sys.my_function(1) then fails with the following exception:

org.apache.spark.sql.AnalysisException: [ROUTINE_NOT_FOUND] The routine `sys`.`my_function` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema and catalog, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP ... IF EXISTS. SQLSTATE: 42883; line 1 pos 7
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.failFunctionLookup(SessionCatalog.scala:1969)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.resolvePersistentFunctionInternal(SessionCatalog.scala:2154)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.resolvePersistentFunction(SessionCatalog.scala:2096)
	at org.apache.spark.sql.catalyst.analysis.FunctionResolution.resolveV1Function(FunctionResolution.scala:129)
	at org.apache.spark.sql.catalyst.analysis.FunctionResolution.$anonfun$resolveFunction$2(FunctionResolution.scala:66)

Why are the changes needed?

Fixes the bug described above.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a test: load v2 function with a non-built-in V2 session catalog.
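
The added test presumably looks roughly like the following sketch, assuming Spark's usual test helpers (withSQLConf, checkAnswer, sql) and the MySessionFunctionCatalog class from the description; the actual test in the PR may be structured differently:

```scala
// Sketch only, not the PR's actual test code.
test("SPARK-51694: load v2 function with non-built-in V2 session catalog") {
  withSQLConf("spark.sql.catalog.spark_catalog" ->
      classOf[MySessionFunctionCatalog].getName) {
    // The v2 function registered under sys.my_function always returns 123,
    // so a successful resolution through the V2 path is easy to assert.
    checkAnswer(sql("SELECT sys.my_function(1)"), Row(123))
  }
}
```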

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Apr 2, 2025
@Zouxxyy Zouxxyy changed the title [SPARK-51694][SQL] Fix load v2 function with non-buildin V2 catalog [SPARK-51694][SQL] Fix load v2 function with non-buildin V2 session catalog Apr 2, 2025
@Zouxxyy Zouxxyy force-pushed the dev/fix-load-funtion branch from f99f603 to 2418cb5 on April 2, 2025 15:08
@@ -850,10 +850,11 @@ VALUES(IDENTIFIER('a.b.c.d')())
 -- !query analysis
 org.apache.spark.sql.AnalysisException
 {
-  "errorClass" : "IDENTIFIER_TOO_MANY_NAME_PARTS",
-  "sqlState" : "42601",
+  "errorClass" : "REQUIRES_SINGLE_PART_NAMESPACE",
@Zouxxyy (Contributor, Author) commented Apr 3, 2025:
The reason for modifying these two golden files: previously, when using V2SessionCatalog, ident.asFunctionIdentifier was used, which threw the exception IDENTIFIER_TOO_MANY_NAME_PARTS.
After this change, resolution follows the V2 logic, which first tries catalog.asFunctionCatalog and throws the V2 exception REQUIRES_SINGLE_PART_NAMESPACE.

The two errors mean essentially the same thing, but the latter behavior is more reasonable and more commonly used, as exemplified by lines 757-767.

-- !query
CREATE TABLE IDENTIFIER('a.b.c')(c1 INT) USING csv
-- !query analysis
org.apache.spark.sql.AnalysisException
{
  "errorClass" : "REQUIRES_SINGLE_PART_NAMESPACE",
  "sqlState" : "42K05",
  "messageParameters" : {
    "namespace" : "`a`.`b`",
    "sessionCatalog" : "spark_catalog"
  }
}
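
The resolution-order change described in this comment can be sketched in plain Scala. This is an illustrative model only, with made-up names and no Spark dependencies; it is not the actual Spark internals:

```scala
// Models "try the V2 FunctionCatalog path first, fall back to V1 only when
// the catalog does not expose the V2 function capability".
object ResolutionSketch {
  sealed trait Resolved
  final case class V2Function(ident: String) extends Resolved
  final case class V1Function(ident: String) extends Resolved

  // A catalog may or may not expose a V2 function-loading capability
  // (the analogue of catalog.asFunctionCatalog succeeding).
  final case class Catalog(v2Load: Option[String => Resolved])

  def resolve(catalog: Catalog, name: String,
      v1Lookup: String => Resolved): Resolved =
    catalog.v2Load match {
      case Some(load) => load(name)     // V2 path first, even for spark_catalog
      case None       => v1Lookup(name) // old V1 session-catalog path
    }
}
```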
