Added example Podcast_and_Audio_Transcription #665
Adds automated audio transcription using Gemini 2.0 with:
✅ Speaker identification (labeled or as Speaker A/B)
✅ Precision timestamps ([HH:MM:SS])
✅ Music/sound effect detection (e.g., [Jingle] or [Song Name])
✅ Clean text output with [END] marker
Testing: Verified with podcasts & call recordings.
Deps: jinja2, Gemini API client.
Useful for podcasts, interviews, and call analysis.
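For downstream use such as search or call analysis, a transcript in this shape can be turned into structured records. The following is a minimal sketch, not code from the notebook: it assumes each spoken line looks like `[HH:MM:SS] Speaker: text`, that sound events appear as a bracketed tag after the timestamp, and that `[END]` terminates the transcript. The names `Segment` and `parse_transcript` are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed line layout (not confirmed by the PR): "[HH:MM:SS] Speaker: text"
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s*(.*)$")


@dataclass
class Segment:
    seconds: int  # offset from the start of the audio
    speaker: str  # "" for music / sound-effect lines
    text: str


def parse_transcript(raw: str) -> list[Segment]:
    """Parse timestamped lines until the [END] marker is reached."""
    segments = []
    for line in raw.splitlines():
        line = line.strip()
        if line == "[END]":
            break  # everything after the marker is ignored
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank or unrecognised lines
        h, m, s, rest = match.groups()
        offset = int(h) * 3600 + int(m) * 60 + int(s)
        speaker, sep, text = rest.partition(":")
        if rest.startswith("[") or not sep:
            # Sound event such as "[Jingle]", or a line without a speaker label
            segments.append(Segment(offset, "", rest))
        else:
            segments.append(Segment(offset, speaker.strip(), text.strip()))
    return segments
```

If the model's actual output uses a different layout, only `LINE_RE` and the speaker split would need to change.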
Thanks @SonjeVilas, that's an interesting example. I won't have time to review it today but I'll try to do it next week.
@@ -0,0 +1,289 @@
{
Line #1. %pip install google-genai jinja2
Since google-genai is already installed, it might be cleaner to move the jinja2 installation to the top like genai.
Ok I can remove that from the code snippet.
Small nitpick: it’d be good to follow the string formatting conventions used in the cookbook repo—keeping lines concise and using proper indentation for multi-line strings. For example, the formatting could look like:
Template("""
    Generate a transcript of the episode. Include timestamps and identify speakers.

    Speakers are:
    {% for speaker in speakers %}- {{ speaker }}{% if not loop.last %}\n{% endif %}{% endfor %}

    ...
""")
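One way to keep a multi-line prompt indented in the source without leaking that indentation into the prompt text is `textwrap.dedent`. The sketch below is illustrative only: it swaps the Jinja2 loop for a plain `str.join` so that it runs with the standard library alone, and `build_prompt` is a hypothetical helper, not something from the notebook.

```python
import textwrap


def build_prompt(speakers: list[str]) -> str:
    """Build the transcription prompt with one "- name" line per speaker."""
    # dedent() strips the common leading whitespace, so the string can be
    # indented to match the surrounding code.
    header = textwrap.dedent("""\
        Generate a transcript of the episode.
        Include timestamps and identify speakers.

        Speakers are:
    """)
    speaker_lines = "\n".join(f"- {name}" for name in speakers)
    return header + speaker_lines


print(build_prompt(["Alice", "Bob"]))
```

The same `dedent` wrapper could be applied to the string passed to `Template(...)` if the Jinja2 version is kept.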
ok, I can update it
Okay, I'll make that change.
It would be better to add an example that readers can use right away. It has to be open sourced.
Worst case, you can use the Apollo 11 audio recording we're already using in other notebooks.
EDIT: better idea, make one using NotebookLM!
Ok, I can make the notebook more readable to everyone
Line #47. model="gemini-2.0-flash",
Can you add a "MODEL_ID" variable like this?
MODEL_ID="gemini-2.0-flash" # @param ["gemini-2.0-flash-lite","gemini-2.0-flash","gemini-2.5-pro-exp-03-25"] {"allow-input":true, isTemplate: true}
As it would make the notebook easier to maintain in the future.
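As a rough sketch of why this helps: with the model name held in one constant, every call site picks up a change automatically. `client.models.generate_content(model=..., contents=...)` is the `google-genai` SDK call; the `transcribe` helper and its parameters are hypothetical, and the client is passed in so the function makes no assumption about how the API key is configured.

```python
MODEL_ID = "gemini-2.0-flash"  # single place to switch models


def transcribe(client, prompt, audio_file):
    """Send the prompt and an audio file to the model named by MODEL_ID.

    `client` is expected to be a google.genai.Client; `audio_file` stands for
    whatever the notebook obtained when uploading the audio (an assumption).
    """
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=[prompt, audio_file],
    )
    return response.text
```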
Yeah! I can definitely make that change.
You need to fix the URL used by the button (cf. https://github.com/google-gemini/cookbook/actions/runs/14262332934/job/39989516292?pr=665)
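Assuming the button in question is the notebook's "Open in Colab" badge, its link normally follows Colab's GitHub URL scheme. The helper below is a small sketch for deriving that link from a repository path; `colab_url` is a hypothetical name, and the notebook path in the usage line is a guess based on the PR title, not a confirmed location.

```python
REPO = "google-gemini/cookbook"  # taken from the repository URL in this PR


def colab_url(notebook_path: str, branch: str = "main") -> str:
    """Return the Colab link for a notebook stored in the GitHub repo."""
    return (
        "https://colab.research.google.com/github/"
        f"{REPO}/blob/{branch}/{notebook_path.lstrip('/')}"
    )


# Hypothetical path: the actual file name and directory may differ.
print(colab_url("examples/Podcast_and_Audio_Transcription.ipynb"))
```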
OK, thank you for pointing that out. I can resolve the issue.
Hello @SonjeVilas,
That's a nice example. On top of what @andycandy already reported, and the minor stuff I pointed out, can you:
- move the notebook into the examples/ directory
- add a link to it in the examples' README
- add a "what's next" section at the end of the notebook, pointing to similar notebooks (or just your preferred ones).
- run the formatting script (cf. https://github.com/google-gemini/cookbook/actions/runs/14262332934/job/39989516304?pr=665)
Thanks again!