“Extracting wisdom” from conference videos

PyCon US happened in May, and this month its 154 videos gradually started being published on YouTube. Between then and now, many other interesting conferences took place. That’s a lot of talks, presentations, and content to digest.

The truth is that I, like most people, won’t watch it all, since our time is limited. One option is to look at the titles and descriptions, then guess what might be the most interesting content. This is a gamble, and in my experience I am often disappointed with my picks.

This process then has to be repeated for each of the dozens of conferences a person is interested in. What if we had a way of:

Finding the best videos to watch, based on our “needs”.
Extracting the main teachings of all content.
Storing it in a consumable / searchable way.

It sounds like a lot of work. But fortunately, in 2024 our digital assistants can help us with that.

A couple of months ago, I wrote a blog post about how I run these AI tools on my own device without leaking any data to external services. Today, I’ll describe how I used them to extract the key information and learnings from all the videos I won’t be able to watch.

In the process, I will also share the results publicly. So, let’s get started.

The tools

As I’ve mentioned in the other post, I use “ollama” to run the AI models locally. Since this task requires digesting a lot of content and my machine has modest resources, I will rely on llama3:8b. A bigger model might produce better results, but it would take forever.

The next steps are to provide the model with the content that needs to be analyzed, and then precisely instruct it on what to do. For this part of the task, I will rely on “fabric”.

“Fabric” provides a tool to download the transcripts from YouTube videos and contains a collection of carefully curated prompts (aka instructions) for the AI models to do a multitude of tasks. For my goal, I will use the extract_wisdom pattern/prompt.
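To make the flow more concrete, here is a minimal sketch of what "give a local model a prompt plus a transcript" can look like. It is not how fabric is implemented and it does not use fabric at all: it talks directly to Ollama's HTTP API (the `/api/generate` endpoint on the default `localhost:11434` address). The function names `build_request` and `extract_notes` are made up for this illustration, and both the pattern text and the transcript are assumed to have been obtained beforehand.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_request(pattern: str, transcript: str, model: str = "llama3:8b") -> dict:
    """Build the JSON payload: the pattern is the system prompt, the transcript is the input."""
    return {
        "model": model,
        "system": pattern,
        "prompt": transcript,
        "stream": False,  # ask for a single JSON reply instead of a token stream
    }


def extract_notes(pattern: str, transcript: str, model: str = "llama3:8b") -> str:
    """Send the payload to a locally running Ollama and return the generated text."""
    body = json.dumps(build_request(pattern, transcript, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())["response"]
```

Calling `extract_notes` only works while Ollama is running locally and the model has already been pulled; `build_request` can be inspected without either.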

Example output

To demonstrate what can be achieved by letting the model analyze a conference video, following the instructions of a good prompt, I’m including an example.

I used “extract_wisdom” on the following video: Iterate, Iterate, Iterate! (Moshe Zadka) — PyTexas 2024

I obtained the following output:

**SUMMARY**
Mosha ZKA presents on iterators and generators in Python. He discusses how to use them, their benefits, and common pitfalls.

**IDEAS:**
• Iterators are objects that define a sequence of values.
• Generators are special types of functions that can be used as iterators.
• The `iter()` function returns an iterator object.
• Use the `next()` method to get the next value from an iterator.
• Avoid using infinite loops with generators.
• Use iterators and generators to process large datasets efficiently.
• Python's built-in functions, such as `range()`, are implemented using iterators.
• You can use the `enumerate()` function to iterate over both indices and values.
• The `zip()` function takes iterables of equal length and returns an iterator.
• Use the `groupby()` function from the `itertools` module to group items based on a common attribute.
• Avoid using `for` loops with generators, as they can lead to infinite loops.

**INSIGHTS:**
• Iterators and generators are essential tools for working with large datasets in Python.
• Using iterators and generators can simplify your code and improve performance.
• Be mindful of the differences between iterators and generators, and use them accordingly.
• Avoid using `for` loops with generators, as they can lead to infinite loops.
• Use the `iter()` function to create an iterator object from a sequence or other iterable.
• Use the `next()` method to get the next value from an iterator.
• Iterators and generators can help you write more efficient and scalable code.

**QUOTES:**
• "Iterators are objects that define a sequence of values."
• "Generators are special types of functions that can be used as iterators."
• "The `iter()` function returns an iterator object."

**HABITS:**
• Set aside time each day to learn new programming concepts.
• Read books on Python programming and related topics.
• Practice writing code using iterators and generators.
• Join online communities and forums for Python developers.

**FACTS:**
• The first version of Python was released in 1991.
• Guido van Rossum, the creator of Python, wanted to create a scripting language that was easy to learn and use.
• Python is now one of the most popular programming languages worldwide.
• The `itertools` module in Python provides many useful functions for working with iterators.

**REFERENCES:**
• "Python Crash Course" by Eric Matthes
• "Automate the Boring Stuff with Python" by Al Sweigart and others

**ONE-SENTENCE TAKEAWAY:**
Iterators and generators are powerful tools that can simplify your code, improve performance, and help you write more scalable programs.

**RECOMMENDATIONS:**
• Use iterators and generators to process large datasets efficiently.
• Avoid using `for` loops with generators, as they can lead to infinite loops.
• Practice writing code using iterators and generators.
• Read books on Python programming and related topics.
• Join online communities and forums for Python developers.

Evaluating the results

After going through the notes and then picking some videos to watch, it is clear that the extracted content is far from 100% accurate. I’ve noticed things such as:

Focusing on only a small part of the video.
Highlighting superfluous stuff, while missing content that I would classify as important.
Misinterpretation of what has been said (I only found one occurrence of this, but I assume there will be more. It is in the example shown above; try to find it).

Nevertheless, I still found the results helpful for the purpose I was aiming for. I guess that some issues might be related to:

The model that I’ve chosen. Perhaps using a bigger one would render better notes.
The fact that the approach relies on transcripts. The model misses the information that is communicated visually (slides, demos, etc.).

This approach should provide better results when applied to written content and podcast transcripts, which rely much less on visuals.

The repository

I’ve written a quick script to run this information extraction on all videos of a YouTube playlist (nothing fancy that’s worth sharing). I’ve also created a repository where I store the results obtained when I run it for a conference playlist (PyCon is not there yet, since not all videos have been released).
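The general shape of such a script can be sketched as follows. This is an illustrative sketch rather than the actual script: `process_playlist`, its parameters, and the numbered `.md` file naming are all assumptions. The transcript download and the model call are passed in as plain functions, so the loop itself does not depend on any particular tool.

```python
from pathlib import Path
from typing import Callable, Iterable


def process_playlist(
    video_urls: Iterable[str],
    get_transcript: Callable[[str], str],
    extract: Callable[[str], str],
    output_dir: Path,
) -> list[Path]:
    """Run the extraction for each video and write one notes file per video."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, url in enumerate(video_urls, start=1):
        target = output_dir / f"{index:03d}.md"
        if target.exists():
            continue  # already processed in a previous run, skip the slow model call
        try:
            notes = extract(get_transcript(url))
        except Exception as error:  # e.g. a video without a transcript
            print(f"Skipping {url}: {error}")
            continue
        target.write_text(f"Source: {url}\n\n{notes}\n", encoding="utf-8")
        written.append(target)
    return written
```

Skipping files that already exist makes the run resumable, which matters when each video takes minutes on a modest machine.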

Every time I do this for a conference I’m interested in, I will share these automatically generated notes there.
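Once the notes exist as files, finding the talks relevant to a given need can be as simple as a keyword search. The sketch below assumes the notes are stored as `.md` files in a single directory, which is a hypothetical layout and not necessarily the one used in the repository; `search_notes` is likewise a made-up name.

```python
from pathlib import Path


def search_notes(notes_dir: Path, term: str) -> list[tuple[str, int]]:
    """Return (file name, hit count) pairs for notes mentioning the term, most hits first."""
    needle = term.lower()
    hits = []
    for path in sorted(notes_dir.glob("*.md")):
        # Case-insensitive count of how often the term appears in this file
        count = path.read_text(encoding="utf-8").lower().count(needle)
        if count:
            hits.append((path.name, count))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Given how imperfect the generated notes are, a hit count like this is only a rough hint about which videos to open first.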

You are welcome to ask for a specific conference to be added. If it is on YouTube, I can likely generate the notes. Just create a new issue in the repository.