Hoping for confirmation of a few high-level ideas? #921

Open
plbremer opened this issue Mar 27, 2025 · 0 comments
Labels
documentation (Improvements or additions to documentation), question (Further information is requested)

Comments

@plbremer

Hi,

Thanks for putting together this very compelling tooling. I was hoping to ask a few specific questions about what is going on to make sure that everything is working as we expect before trying to productionize :)

  1. We can/should build a classic document retrieval index with Tantivy up front when we have between 10,000 and 100,000 documents. This index does not involve a vector store at all.
  2. In the publication's Figure 1a, the Tantivy document store is the tool that the Paper Search agent is interacting with.
  3. Any vectorization that occurs happens on the fly with the Gather Evidence agent. Where are these vectors stored? Is it possible to slowly accumulate them somewhere? I recognize that we can save a Docs object; however, every query will probably retrieve a unique set of documents, so it is not clear whether we can meaningfully aggregate previous vectorizations. (Obviously the system works even if we can't accumulate them meaningfully.)
  4. The README mentions options for larger-than-memory vector stores. Is this relevant for anything other than opting for a tremendously large k? Can we parametrically avoid this?
  5. If you have custom citations, or no citations, will the Citation Traversal agent simply not operate? Where does the citation graph come from? If I have internal documents, can I provide my own?
  6. It looks like my answers triggered the creation of an index. Is there any documentation around interacting with that SearchIndex?
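
To make question 3 more concrete, here is a minimal sketch of what "accumulating vectors" could look like in principle. It does not use the project's API at all: `EmbeddingCache`, `toy_embed`, and `counting_embed` are hypothetical names made up for illustration, and keying on a hash of the chunk text is an assumption about how chunks might be identified across queries.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

Vector = List[float]


class EmbeddingCache:
    """Hypothetical cache mapping a hash of chunk text to its embedding."""

    def __init__(self, embed_fn: Callable[[Sequence[str]], List[Vector]]) -> None:
        self._embed_fn = embed_fn
        self._store: Dict[str, Vector] = {}

    @staticmethod
    def _key(text: str) -> str:
        # Identical chunk text always maps to the same key.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def embed(self, texts: Sequence[str]) -> List[Vector]:
        # Collect only the chunks that have never been embedded before.
        missing: Dict[str, str] = {}
        for text in texts:
            key = self._key(text)
            if key not in self._store and key not in missing:
                missing[key] = text
        if missing:
            vectors = self._embed_fn(list(missing.values()))
            for key, vector in zip(missing.keys(), vectors):
                self._store[key] = vector
        # Return vectors in the same order as the input texts.
        return [self._store[self._key(text)] for text in texts]

    def __len__(self) -> int:
        return len(self._store)


def toy_embed(texts: Sequence[str]) -> List[Vector]:
    # Stand-in for a real embedding model (assumption, not a real API).
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


calls: List[int] = []


def counting_embed(texts: Sequence[str]) -> List[Vector]:
    # Record how many chunks are actually sent to the embedder per call.
    calls.append(len(texts))
    return toy_embed(texts)


cache = EmbeddingCache(counting_embed)
cache.embed(["chunk A", "chunk B"])  # first query embeds both chunks
cache.embed(["chunk B", "chunk C"])  # second query only embeds "chunk C"
```

In this sketch, overlapping chunks between queries are embedded once and reused, so the cache grows only with genuinely new text. Whether something like this is possible or useful with the Docs object is exactly what I am unsure about.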

Thanks for your time.

@dosubot (bot) added the documentation and question labels on Mar 27, 2025