Optimize metadata query with distinct + limit #25460

Open
wants to merge 1 commit into
base: master

Conversation

jinyangli34
Contributor

Description

Handle DistinctLimitNode in MetadataQueryOptimizer

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Mar 31, 2025
@github-actions github-actions bot added the iceberg Iceberg connector label Mar 31, 2025
@jinyangli34 jinyangli34 marked this pull request as ready for review March 31, 2025 16:20
@wendigo wendigo requested review from kasiafi and Copilot April 1, 2025 12:02

@Copilot Copilot AI left a comment

Pull Request Overview

This PR optimizes metadata queries by introducing special handling for DistinctLimitNode in the MetadataQueryOptimizer and updating the AggregationNode optimization flow.

  • Introduces a new visitor method that handles DistinctLimitNode.
  • Refactors AggregationNode handling to use a common Optional-based table scan optimization.
  • Adds a new test case for DISTINCT queries with LIMIT in the Iceberg plugin.
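
The Optional-based flow summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the node classes, the `metadataRows` field, and the `rewrite` method are hypothetical stand-ins for the planner types, and `tryOptimizeTableScan` uses the name suggested in the review rather than the PR's `optimize`. The point is only that one helper returning an Optional can serve several visitors, with the original node kept when no rewrite applies.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified stand-ins for planner nodes; NOT Trino's real classes.
final class MetadataRewriteSketch {
    interface PlanNode {}

    static final class TableScan implements PlanNode {
        // Assumption: null means the scan cannot be answered from metadata alone.
        final List<String> metadataRows;
        TableScan(List<String> metadataRows) { this.metadataRows = metadataRows; }
    }

    static final class Values implements PlanNode {
        final List<String> rows;
        Values(List<String> rows) { this.rows = rows; }
    }

    static final class DistinctLimit implements PlanNode {
        final PlanNode source;
        final long limit;
        DistinctLimit(PlanNode source, long limit) { this.source = source; this.limit = limit; }
    }

    // Simplified lookup: only a direct table scan child is recognized here.
    static Optional<TableScan> findTableScan(PlanNode node) {
        return node instanceof TableScan ? Optional.of((TableScan) node) : Optional.empty();
    }

    // Returns a Values node only when metadata can answer the scan.
    static Optional<Values> tryOptimizeTableScan(TableScan scan) {
        return Optional.ofNullable(scan.metadataRows).map(Values::new);
    }

    // Swap the scan for Values when possible; otherwise return the node unchanged.
    static PlanNode rewrite(DistinctLimit node) {
        return findTableScan(node.source)
                .flatMap(MetadataRewriteSketch::tryOptimizeTableScan)
                .<PlanNode>map(values -> new DistinctLimit(values, node.limit))
                .orElse(node);
    }
}
```

Because the fallback lives in a single `orElse(node)`, a visitor built this way returns the untouched plan whenever either step yields an empty Optional.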

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File | Description
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestMetadataQueryOptimization.java | Adds a new test case verifying the plan for "SELECT DISTINCT ... LIMIT" queries.
core/trino-main/src/main/java/io/trino/sql/planner/optimizations/MetadataQueryOptimizer.java | Introduces and uses new logic to optimize DistinctLimitNode and refactors AggregationNode optimization.
Comments suppressed due to low confidence (2)

core/trino-main/src/main/java/io/trino/sql/planner/optimizations/MetadataQueryOptimizer.java:127

  • [nitpick] Consider renaming the method 'optimize' to 'tryOptimizeTableScan' (or a similar descriptive name) to better communicate that it may not always perform an optimization.
private Optional<ValuesNode> optimize(TableScanNode tableScan)

plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestMetadataQueryOptimization.java:107

  • [nitpick] Consider adding an additional test case that verifies the fallback behavior when the optimization conditions are not met to ensure complete coverage.
assertPlan( format("SELECT DISTINCT b, c FROM %s LIMIT 10", testTable),
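
For readers less familiar with the query shape under test, the semantics of "SELECT DISTINCT ... LIMIT" can be approximated in plain Java as below. This is an illustrative sketch only: `DistinctLimitSketch` and `distinctLimit` are hypothetical names, rows are modeled as lists of strings, and nothing here reflects how the engine evaluates a DistinctLimitNode.

```java
import java.util.List;
import java.util.stream.Collectors;

final class DistinctLimitSketch {
    // Keep at most `limit` distinct rows, in encounter order.
    static List<List<String>> distinctLimit(List<List<String>> rows, long limit) {
        return rows.stream()
                .distinct()      // drop duplicate rows (List equality is element-wise)
                .limit(limit)    // stop once enough distinct rows have been seen
                .collect(Collectors.toList());
    }
}
```

Applied to a small set of rows already available from metadata, this gives the same answer as scanning the table, which is why replacing the scan with a values node is a valid rewrite when the optimization conditions hold.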

Member

@kasiafi kasiafi left a comment

LGTM

TableScanNode tableScan = result.get();
return findTableScan(node.getSource())
.flatMap(this::optimize)
.map(v -> SimplePlanRewriter.rewriteWith(new Replacer(v), node))
Member

rename v to values

{
return findTableScan(node.getSource())
.flatMap(this::optimize)
.map(v -> SimplePlanRewriter.rewriteWith(new Replacer(v), node))
Member

rename v to values

Labels
cla-signed iceberg Iceberg connector
2 participants