
[SPARK-51694][SQL] Fix load v2 function with non-built-in V2 session catalog #50495

Open · wants to merge 2 commits into base: master
Conversation

@Zouxxyy (Contributor) commented Apr 2, 2025

What changes were proposed in this pull request?

Users may extend the session catalog via CatalogExtension and register it as spark_catalog. If such a catalog adds new v2 functions, the current logic cannot load them, because at present only v1 functions are loaded for spark_catalog.

For example, with the configuration:

spark.sql.catalog.spark_catalog=MySessionFunctionCatalog
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.catalog.{CatalogExtension, Identifier}
import org.apache.spark.sql.connector.catalog.functions.{BoundFunction, ScalarFunction, UnboundFunction}
import org.apache.spark.sql.types.{DataType, IntegerType, StructType}

class MySessionFunctionCatalog extends CatalogExtension {
  // Serve a custom v2 function for sys.my_function; delegate everything else.
  override def loadFunction(ident: Identifier): UnboundFunction = {
    if (ident.equals(Identifier.of(Array("sys"), "my_function"))) {
      new UnboundFunction {
        override def bind(inputType: StructType): BoundFunction = new ScalarFunction[Int] {
          override def inputTypes(): Array[DataType] = Array(IntegerType)
          override def resultType(): DataType = IntegerType
          override def name(): String = "my_function"
          // Constant result, just to make a successful lookup observable.
          override def produceResult(input: InternalRow): Int = 123
        }
        override def description(): String = "hello"
        override def name(): String = "my_function"
      }
    } else {
      super.loadFunction(ident)
    }
  }
}

Running SELECT sys.my_function(1) then fails with the following exception:

org.apache.spark.sql.AnalysisException: [ROUTINE_NOT_FOUND] The routine `sys`.`my_function` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema and catalog, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP ... IF EXISTS. SQLSTATE: 42883; line 1 pos 7
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.failFunctionLookup(SessionCatalog.scala:1969)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.resolvePersistentFunctionInternal(SessionCatalog.scala:2154)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.resolvePersistentFunction(SessionCatalog.scala:2096)
	at org.apache.spark.sql.catalyst.analysis.FunctionResolution.resolveV1Function(FunctionResolution.scala:129)
	at org.apache.spark.sql.catalyst.analysis.FunctionResolution.$anonfun$resolveFunction$2(FunctionResolution.scala:66)

Why are the changes needed?

Fixes the bug described above.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a test: load v2 function with a non-built-in V2 session catalog.
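
The added test presumably looks roughly like the following sketch, assuming Spark's usual test helpers (withSQLConf, checkAnswer, sql) and the MySessionFunctionCatalog class from the description; the actual test in the PR may be structured differently:

```scala
// Sketch only, not the PR's actual test code.
test("SPARK-51694: load v2 function with non-built-in V2 session catalog") {
  withSQLConf("spark.sql.catalog.spark_catalog" ->
      classOf[MySessionFunctionCatalog].getName) {
    // The v2 function registered under sys.my_function always returns 123,
    // so a successful resolution through the V2 path is easy to assert.
    checkAnswer(sql("SELECT sys.my_function(1)"), Row(123))
  }
}
```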

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Apr 2, 2025
@Zouxxyy Zouxxyy changed the title [SPARK-51694][SQL] Fix load v2 function with non-buildin V2 catalog [SPARK-51694][SQL] Fix load v2 function with non-buildin V2 session catalog Apr 2, 2025
@Zouxxyy Zouxxyy force-pushed the dev/fix-load-funtion branch from f99f603 to 2418cb5 on April 2, 2025 15:08
@@ -850,10 +850,11 @@ VALUES(IDENTIFIER('a.b.c.d')())
 -- !query analysis
 org.apache.spark.sql.AnalysisException
 {
-  "errorClass" : "IDENTIFIER_TOO_MANY_NAME_PARTS",
-  "sqlState" : "42601",
+  "errorClass" : "REQUIRES_SINGLE_PART_NAMESPACE",
@Zouxxyy (Contributor, Author) commented Apr 3, 2025:
The reason for modifying these two golden files: previously, when using V2SessionCatalog, ident.asFunctionIdentifier was used, which threw the exception IDENTIFIER_TOO_MANY_NAME_PARTS.
After this change, resolution follows the V2 logic, which first tries catalog.asFunctionCatalog and throws the V2 exception REQUIRES_SINGLE_PART_NAMESPACE.

The two errors mean essentially the same thing, but the latter behavior is more reasonable and more commonly used, as exemplified by lines 757-767.

-- !query
CREATE TABLE IDENTIFIER('a.b.c')(c1 INT) USING csv
-- !query analysis
org.apache.spark.sql.AnalysisException
{
  "errorClass" : "REQUIRES_SINGLE_PART_NAMESPACE",
  "sqlState" : "42K05",
  "messageParameters" : {
    "namespace" : "`a`.`b`",
    "sessionCatalog" : "spark_catalog"
  }
}
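
The resolution-order change described in this comment can be sketched in plain Scala. This is an illustrative model only, with made-up names and no Spark dependencies; it is not the actual Spark internals:

```scala
// Models "try the V2 FunctionCatalog path first, fall back to V1 only when
// the catalog does not expose the V2 function capability".
object ResolutionSketch {
  sealed trait Resolved
  final case class V2Function(ident: String) extends Resolved
  final case class V1Function(ident: String) extends Resolved

  // A catalog may or may not expose a V2 function-loading capability
  // (the analogue of catalog.asFunctionCatalog succeeding).
  final case class Catalog(v2Load: Option[String => Resolved])

  def resolve(catalog: Catalog, name: String,
      v1Lookup: String => Resolved): Resolved =
    catalog.v2Load match {
      case Some(load) => load(name)     // V2 path first, even for spark_catalog
      case None       => v1Lookup(name) // old V1 session-catalog path
    }
}
```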
