LiteLLM configuration for caching and streaming #904
Comments
To correctly enable streaming and caching in the …

By following these steps, you should be able to configure …
Any solution for these problems?
To address the issues you're encountering with the …

By ensuring these configurations are correctly set, you should be able to resolve the issues with streaming and caching in the …
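For reference, here is a minimal sketch of what such a configuration can look like when calling litellm directly, outside of paperqa. This is an assumption-level example rather than a confirmed recipe: the model name gpt-4o-mini is only a placeholder, and exact caching behaviour can vary between litellm versions.

```python
import litellm
from litellm.caching.caching import Cache

# litellm.cache must be a Cache object, not a bool.
litellm.cache = Cache()  # in-memory ("local") cache by default

messages = [{"role": "user", "content": "Tell me a short joke"}]

# Cached call: caching=True opts this request into the cache configured above.
cached = litellm.completion(model="gpt-4o-mini", messages=messages, caching=True)
print(cached.choices[0].message.content)

# Streamed call: stream=True makes completion() yield chunks to iterate over.
stream = litellm.completion(model="gpt-4o-mini", messages=messages, stream=True)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```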
I guess …
Hello @Snikch63200,

Please notice that the LLMs we use in paperqa come from lmi, which is a wrapper over litellm. That said, I am assuming you want to cache locally, is that right? This page shows how to do it with litellm.

Please check this code snippet to get it done in lmi:

```python
from lmi import LiteLLMModel
from aviary.core import Message
import litellm
from litellm.caching.caching import Cache

# Set up caching -- I did not need this setup. But since it's in the litellm page, I left it here
# litellm.cache = Cache()

llm = LiteLLMModel(name="gpt-4o-mini")  # Change it to your llm

r = await llm.call_single(
    messages=[Message(role="user", content="Tell me a very very very random joke")],
    caching=True,
)
print(r.text)
```

I could rerun it multiple times and get the same output. Let me know if it works for you.
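If the cache should also survive restarts, litellm supports a disk-backed cache. The sketch below is a hedged variation on the snippet above: it assumes a litellm version that provides the "disk" cache type (which relies on the diskcache package), uses a placeholder model name, and is not paperqa's own configuration.

```python
import asyncio

import litellm
from aviary.core import Message
from litellm.caching.caching import Cache
from lmi import LiteLLMModel

# Persist cached completions on disk instead of in memory,
# so repeated runs of the script can reuse earlier responses.
litellm.cache = Cache(type="disk")


async def main() -> None:
    llm = LiteLLMModel(name="gpt-4o-mini")  # placeholder -- change to your model
    result = await llm.call_single(
        messages=[Message(role="user", content="Tell me a very very very random joke")],
        caching=True,  # opt this call into the cache configured above
    )
    print(result.text)


# Run as a plain script; inside a notebook, just `await main()` instead.
asyncio.run(main())
```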
Hello @maykcaldas,

Thanks for your answer. Your snippet works perfectly with llama3.3. So I think there could be two explanations:

Best regards.
Hey @Snikch63200,

I don't think PQA uses caching anywhere. But you're right about streaming. Could you tell me why you need the answer to be streamed back to you? In our ecosystem, we use …

Best,
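On the streaming side, here is a hedged sketch of a callback pattern. The assumption (not confirmed in this thread) is that the installed lmi version accepts a callbacks list on call_single and invokes each callback with text chunks as they stream in; check the lmi source for the exact signature. The model name is again a placeholder.

```python
from aviary.core import Message
from lmi import LiteLLMModel


def print_chunk(chunk: str) -> None:
    # Assumed to be called with each piece of streamed text as it arrives.
    print(chunk, end="", flush=True)


llm = LiteLLMModel(name="gpt-4o-mini")  # placeholder -- change to your model

# Assumption: passing callbacks makes lmi stream the completion under the hood.
result = await llm.call_single(
    messages=[Message(role="user", content="Tell me a very very very random joke")],
    callbacks=[print_chunk],
)
print()  # newline after the streamed output
```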
Hi,

I am trying to configure LiteLLM to enable caching and streaming. The 'stream' parameter seems to be unrecognized as a kwarg by litellm, and setting

litellm.cache = True

leads to an error indicating that this attribute cannot be a bool. I also tried passing a Cache() object, but that leads to an error as well. What is the correct way to enable streaming and caching?