Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Semantic Search: Add support for async queries and FAR tool #77906

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

tmat
Copy link
Member

@tmat tmat commented Mar 29, 2025

Allows semantic search queries to be async and to call additional pre-defined APIs that provide IDE layer capabilities without referencing Workspace/Feature layer.

Currently the additional APIs are:

public static class Tools
{
    /// <summary>
    /// Returns all syntax nodes that reference a given symbol.
    /// </summary>
    public IEnumerable<SyntaxNode> FindReferencingSyntaxNodes(this ISymbol symbol);

    /// <summary>
    /// Returns semantic model for given syntax tree.
    /// Use to access semantic information from syntax nodes or trees.
    /// </summary>
    public Task<SemanticModel> GetSemanticModelAsync(this SyntaxTree tree);
}

The implementation uses Solution behind the scenes to provide access to FAR and Semantic Model across the whole solution.
It is injected into the generated query assembly via reflection.

@dotnet-issue-labeler dotnet-issue-labeler bot added Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead labels Mar 29, 2025
@tmat
Copy link
Member Author

tmat commented Mar 29, 2025

@CyrusNajmabadi Quick impl of the FAR tool for semantic search we talked about: ReferencingSymbolsFinder. Borrowed some code from producer-consumer utils to convert callbacks to IAsyncEnumerable. This can likely be generalized into another helper in the utils. We can also refactor IFindUsagesService to accept ISymbol as input so that I don't need to go from ISymbol to location and then back to ISymbol. Might want to add another service that's not ILanguageService, since I do not start with document and don't know the language.

@tmat tmat marked this pull request as ready for review April 1, 2025 00:32
@tmat tmat requested review from a team as code owners April 1, 2025 00:32
@CyrusNajmabadi
Copy link
Member

The implementation uses Solution behind the scenes to provide access to FAR and Semantic Model across the whole solution.

I love this. That said, i think we might want some docs somewhere (and maybe they're already in teh PR) as to how this works and why thsi is safe. Someone on the team may go: "but wait... a SYntaxTree doesn't uniquely map to a SemanticModel!"

We want to explain that all these 'tools' operate at single snapshot point in time and thus these things do work.

continue;
}

var enclosingSymbol = semanticModel.GetEnclosingSymbol(reference.SourceSpan.SourceSpan.Start, cancellationToken);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't understnad this. why are we returning the enclosing symbol for a found reference?

public async Task FindReferencingSyntaxNodes()
{
using var workspace = TestWorkspace.Create("""
<Workspace>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. indent one more.

<value>Parameter type '{0}' is not among supported types: {1}</value>
</data>
<data name="Return_type_0_is_not_among_supported_types_1" xml:space="preserve">
<value>Return type '{0}' is not among supported types: {1}</value>
</data>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice.

private static bool TryGetFindMethod(Assembly queryAssembly, [NotNullWhen(true)] out MethodInfo? method, out QueryKind queryKind, out string? error, out string[]? errorMessageArgs)
private static void SetModuleCancellationToken(Assembly queryAssembly, CancellationToken cancellationToken)
{
var pidType = queryAssembly.GetType("<PrivateImplementationDetails>", throwOnError: true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who ends up potentially catching if this happens? or should this be impossible as we own the emitting of these assemblies?

@@ -207,7 +236,7 @@ private static bool TryGetFindMethod(Assembly queryAssembly, [NotNullWhen(true)]
{
try
{
(method, queryKind) = GetFindMethod(program, ref error);
(method, queryKind, isAsync) = GetFindMethod(program, ref error);
}
catch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't love the naked catch. Why not FatalError report this?

error = string.Format(
FeaturesResources.Return_type_0_is_not_among_supported_types_1,
method.ReturnType,
$"'{typeof(IEnumerable<ISymbol>).Name}', '{typeof(IAsyncEnumerable<ISymbol>).Name}'");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does .Name do a good job here? i thought with instantiated generics it was something awful :) do you have a test showing what error message we actually get?

personally, it might be better to literally hardcode "IEnumerable<ISymbol>".

--

Oh, looking at this in a console app, you just get IEnumerable`1. That doesn't seem great here. Consider just hard coding the name.

else
{
foreach (var symbol in (IEnumerable<ISymbol?>)symbols)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider checking CT here. same with foreach loop above.

foreach (var node in e.FindReferencingSyntaxNodes())
{
var model = await node.SyntaxTree.GetSemanticModelAsync()
yield return model.GetEnclosingSymbol(node.SpanStart);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super cool.

@@ -207,7 +236,7 @@ private static bool TryGetFindMethod(Assembly queryAssembly, [NotNullWhen(true)]
{
try
{
(method, queryKind) = GetFindMethod(program, ref error);
(method, queryKind, isAsync) = GetFindMethod(program, ref error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so... looking at how this is used, i have an alternate suggestion. instead of returning this isAsync flag, just always return an IAsyncEnumerable. If you get an IEnumerable, then that is trivially wrappable in an IAsyncEnumerable. Which then means all subsequent code operates uniformly.

@@ -130,30 +130,49 @@ private async ValueTask InvokeAsync(Project project, Compilation compilation, ob

try
{
var symbols = (IEnumerable<ISymbol?>?)method.Invoke(null, [entity]) ?? [];
var symbols = method.Invoke(null, [entity]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, basically here get teh result, and if is an IEnumerable, wrap into an IAsyncEnumerable. And now everything is consistent, and you don't need the flag passed through.

{
var definitionItem = await symbol.ToClassifiedDefinitionItemAsync(
classificationOptions, project.Solution, s_findReferencesSearchOptions, isPrimary: true, includeHiddenLocations: false, cancellationToken).ConfigureAwait(false);
executionTime += Stopwatch.GetElapsedTime(executionStart);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aside: odd that all the symbols are getting "isPrimary: true". but probably doesn't matter that much.

internal sealed class ReferencingSyntaxFinder(Solution solution, OptionsProvider<ClassificationOptions> classificationOptions, CancellationToken cancellationToken)
{
public IEnumerable<SyntaxNode> FindSyntaxNodes(ISymbol symbol)
=> FindSyntaxNodesAsync(symbol).ToBlockingEnumerable(cancellationToken);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice. i didn't realize they had this.


namespace Microsoft.CodeAnalysis.SemanticSearch;

internal sealed class ReferencingSyntaxFinder(Solution solution, OptionsProvider<ClassificationOptions> classificationOptions, CancellationToken cancellationToken)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
internal sealed class ReferencingSyntaxFinder(Solution solution, OptionsProvider<ClassificationOptions> classificationOptions, CancellationToken cancellationToken)
internal sealed class ReferencingSyntaxFinder(
Solution solution,
OptionsProvider<ClassificationOptions> classificationOptions,
CancellationToken cancellationToken)

nit: when lines are too long for github, i like that as a good choice to wrap.

if (syntaxRef == null)
{
yield break;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hrmm... this seems wrong. it would mean that someone couldn'lt, for example, find referencing syntax nodes for a metadata symbol.

public IEnumerable<SyntaxNode> FindSyntaxNodes(ISymbol symbol)
=> FindSyntaxNodesAsync(symbol).ToBlockingEnumerable(cancellationToken);

public async IAsyncEnumerable<SyntaxNode> FindSyntaxNodesAsync(ISymbol symbol)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think i prefer FindReferencingSyntaxNodes and FindReferencingSyntaxNodesAsync. how do you feel?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems redundant since the containing type is already called ReferencingSyntaxFinder. Maybe just Find :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh true. i was thinking this was the name a user saw.

  1. i'm fine with this being called 'Find'
  2. i like the name you chose (FindReferencingSyntaxNodes) for what users see.

#resolved.

Exception? exception = null;
try
{
await service.FindReferencesAsync(context, document, location.SourceSpan.Start, classificationOptions, cancellationToken).ConfigureAwait(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. this seems like a super strange way to do things :) i get hwat you're doing. you're mapping back from a source symbol to its decl location, so ou can call this here. However, you shouldn't have to IMO. You should be able to literally just call intot his directly

public static partial class SymbolFinder
{
    internal static async Task FindReferencesAsync(
        ISymbol symbol,
        Solution solution,
        IStreamingFindReferencesProgress progress,
        IImmutableSet<Document>? documents,
        FindReferencesSearchOptions options,
        CancellationToken cancellationToken)
    {

And then just have your IStreamingFindReferencesProgress push into a channel. And just return channel.Reader.ReadAllAsync.

It really should be super simple.

/// Returns <see cref="SemanticModel"/> for any <see cref="SyntaxTree"/> in the <see cref="Solution"/>.
/// </summary>
public Task<SemanticModel> GetSemanticModelAsync(SyntaxTree tree)
=> solution.GetRequiredDocument(tree).GetRequiredSemanticModelAsync(cancellationToken).AsTask();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so cool.

Note: we might want to doc that we should consider a caching strategy here. Personally, i think it probably makes sense that for the lifetime of of the entire LLM call that all the models are kept alive. And only dumped at the end.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead VSCode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants