Author: Gonçalo Valério

  • The books I enjoyed the most in 2025

    Here we are at the end of another year, so I will share again the two books I enjoyed the most.

    Before starting, here are the links to similar posts covering the previous years:

    This time, after thoughtful consideration, I finally included one Portuguese book. It may not be that interesting for an international audience, but based on the defined criteria (my enjoyment), it made it on its own merit.

    “The Idea Factory”, by Jon Gertner

    This is not a new book, but I only had the time to read it early this year. It’s a fascinating and detailed story about how a single institution was able to produce so many groundbreaking innovations for such a long period of time.

    Bell Labs was indeed a remarkable place that was only possible due to the special circumstances of its time. Nevertheless, the number of incredibly smart people that ended up working there (so many well-known names) and their discoveries, innovations, and inventions is unbelievable.

    The book covers everything from the early years that led to the creation of the laboratory to its demise in the final decades of the 20th century. It doesn’t just tell you what was invented there; it also walks you through the backstory and the behind-the-scenes events that led to the final outcome.

    It is an entertaining and well-written story.

    “As Causas do Atraso Português”, by Nuno Palma

    This book tries to provide an answer to a common question and discussion topic here in Portugal.

    The fact that the country missed the train (of development) a long time ago is not up for debate anymore; the more interesting questions are: what circumstances contributed to it? What changed? When did it change? Who bears the responsibility? What should we look at to avoid a similar situation in the future?

    Written by an economist and supported by historical economic data, the book provides a different perspective.

    Throughout more than 300 pages, the author goes back five centuries to rebut common explanations and culprits. He then presents and supports a different view of the events, the actors, and the circumstances that eventually made Portugal fall behind the developed world.

    Perhaps he is a bit controversial; perhaps he is spot-on. I don’t know, but ultimately it is a very intriguing read that I recommend to my Portuguese peers.

  • More app recommendations

    The good part of having a personal blog is that I can write about whatever comes to my mind. Today I was thinking of how people find the software they use, how many people end up using the same apps because they don’t know any alternatives, and the fact that many creators (especially open-source ones) deserve more recognition.

    Over the years, I have already shared some of the software I use and am glad exists. In those posts you can find:

    I still stand behind those choices, and now I will introduce a few more. The key difference this time is that I will focus on apps that, I believe, aren’t as well known as the ones mentioned above, while being just as useful in my day-to-day.


    Skanlite

    On my Linux system, this little tool doesn’t let me down; it detects and connects to my network printer and allows me to quickly scan documents. Just works.

    Strawberry Music Player

    I’m not an audiophile, but whenever I need to play or organize some audio files stored locally, it provides all the features that I need. In the past I used VLC for everything, but it feels nice to have a dedicated tool.

    DBeaver

    From past discussions, I don’t know many people that resort to these graphical tools to explore and manage their databases. I also don’t use them often, but when I do, DBeaver always works like a charm. It is also useful to design some ER diagrams.

    Spectacle

    What can I say? The default tool to take screenshots in KDE’s Plasma desktop is the best one I ever used for that purpose.

    QOwnNotes

    Note taking is perhaps one of the app categories with the most alternatives out there. I tried many; this is the one that worked for me. It is very versatile, works with plain-text documents, and has useful integrations with Nextcloud.


    All of these are local apps that respect your “freedom,” given that I try very hard to avoid Electron apps, especially those that require an internet connection to work. That type of software I run in the web browser.

    OK, OK… I still have Signal installed, since I don’t have solid alternatives there, but it is the last remaining Electron app on the system.

    Let me know what you think and if there’s a better replacement for any of them.

  • Django: Deferred constraint enforcement

    Another Friday, another Django-related post. I guess this blog is becoming a bit monothematic. I promise the next ones will bring the much-needed diversity of content, but today let’s explore a very useful feature of Django’s ORM.

    Ok… Ok… it’s more of a feature of PostgreSQL that Django supports, and it isn’t available on the other database backends. But let’s dive in anyway.

    Let’s imagine this incredibly simplistic scenario where you have the following model:

    class Player(models.Model):
      team = models.ForeignKey(Team, on_delete=models.CASCADE)
      squad_number = models.PositiveSmallIntegerField()
    
      class Meta:
        constraints = [
          models.UniqueConstraint(
            name="unique_squad_number",
            fields=["team", "squad_number"],
          )
        ]

    So a team has many players and each player has a different squad/shirt number. Only one player can use that number for a given team.

    Users can select their teams and then rearrange their players’ numbers however they like. To keep it simple, let’s assume it is done through the Django admin, using a Player inline on the Team model admin.
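
    As a minimal sketch (purely illustrative, not code from the original setup), such an admin configuration could look like this:

    from django.contrib import admin

    from .models import Player, Team


    class PlayerInline(admin.TabularInline):
        # Edit the squad's players directly on the Team page
        model = Player


    @admin.register(Team)
    class TeamAdmin(admin.ModelAdmin):
        inlines = [PlayerInline]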

    We add proper form validation to ensure that no two players in the submitted squad are assigned the same squad_number. Things work great until you start noticing that, despite your validation and despite the user’s input not assigning the same number to more than one player, integrity errors are flying around. What’s happening?

    Well, when the system tries to update the player records after they have been correctly validated, each update/insertion is checked against the constraint individually (even when done atomically within a transaction). This means that, depending on the order of the updates, or in certain situations even when all updates carry correct data, integrity errors will be raised due to conflicts with the data currently stored in the database.

    The solution? Deferring the integrity checks to the end of the transaction. Here’s how:

    class Player(models.Model):
      team = models.ForeignKey(Team, on_delete=models.CASCADE)
      squad_number = models.PositiveSmallIntegerField()
    
      class Meta:
        constraints = [
          models.UniqueConstraint(
            name="unique_squad_number",
            fields=["team", "squad_number"],
            deferrable=models.Deferrable.DEFERRED,
          )
        ]

    Now, when you save multiple objects within a single transaction, you will no longer see those errors if the input data is valid.
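
    To make the difference concrete, here is a minimal sketch (player_a and player_b are hypothetical Player instances from the same team) of an operation that only succeeds with the deferred constraint:

    from django.db import transaction

    # Swap the squad numbers of two players; with DEFERRED, the unique
    # constraint is only checked when the transaction commits, so the
    # intermediate duplicate no longer triggers an IntegrityError.
    with transaction.atomic():
        player_a.squad_number, player_b.squad_number = (
            player_b.squad_number,
            player_a.squad_number,
        )
        player_a.save(update_fields=["squad_number"])
        player_b.save(update_fields=["squad_number"])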

  • Django: Overriding translations from dependencies

    This week, I’ll continue on the same theme of my previous “Django Friday Tips” post. Essentially, we will keep addressing small annoyances that can surface while developing your multilingual project.

    The challenge for this article shows up when a given string from a package that is a dependency of your project is either:

    • Not translated in the language you are targeting.
    • Translated in a slightly different way than you desire.

    As we are all aware, most packages use English by default, and the most popular ones often provide translations for the languages with more active users willing to contribute. But these efforts are laborious, and translations can differ between regions, even when they share the same base language.

    Contributing upstream might not always be an option.

    This means that to maintain the coherence of the interface of your project, you need to adapt these translations locally.

    Handling the localization of the code that lives in your own repository is straightforward and well documented in Django: it collects the strings and adds the translation files to the locale path (per app or per project).

    For the other packages, the strings and translations are located within their own directory hierarchy, outside the reach of the makemessages command. At runtime, however, Django goes through all of these locale paths and uses the first match it finds.

    With this in mind, the easiest and most straightforward way I was able to find to achieve this goal was:

    Create a file in your project (in an app directory or in a common project directory), let’s call it locale_overrides.py, and put there the exact strings from your dependency (Django or another) that you wish to translate:

    from django.utils.translation import gettext_lazy as _
    
    locale_overrides = [
        _("This could sound better."),
        ...
    ]

    Then run manage.py makemessages, translate the new lines in the .po file as you wish, then finally compile your new translations with manage.py compilemessages.
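
    For illustration purposes only, the resulting entry in your project’s django.po could then look something like this (the path comment and the translation below are made up):

    #: myproject/locale_overrides.py:4
    msgid "This could sound better."
    msgstr "Isto podia soar melhor."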

    Since your new translations are found first when your app looks for them, they will be picked instead of the “original” ones.

    For tiny adjustments, this method works great. When the amount of content starts growing too much, a new approach might be more appropriate, but that will be a topic for another time.

  • Security.txt in the wild: 2025 edition

    One year ago, I checked the top 1 million “websites” for a security.txt file and then posted the results in this blog. As it was described at the time, I used a tool written by someone else who had already run this “experiment” in 2022.

    You can look at that post if you are keen to know what this file is, why I was curious about the adoption numbers, and what last year’s results were.

    As promised, I am collecting and publishing this information on the blog again this year. Yes, I did remember or, more precisely, my calendar app did.

    The first step was to download the same software again, and the second step was to download the most recent list of the top 1 million domains from the same source.

    Then, after consuming energy for a few hours and using some bandwidth, the results were the following:

                                      Total   Change from 2024
    Sites scanned                    999968            -0,003%
    Sites with a valid file            1773               -81%
    Sites with an invalid file        12140              +454%
    Sites without a file             986055             -0,25%

                        Contact    Policy    Hiring   Encryption    Expiry
    Sites with value      12019      4526      3107         3052      8480
    Change from 2024     +30,3%    +23,2%    +21,2%       +15,2%    +70,9%

    Overall, there was an expected increase in usage; however, the change from last year is again underwhelming. The number of domains with the file went from 11501 to 13913, which is a minimal improvement.

    The valid/invalid numbers seem to be messed up, but this could be due to the software being out of date with respect to the spec. I didn’t waste too much time on this.

    Even setting aside the limitations described in the original author’s post and ignoring the valid-file detection issue, I think these results might not reflect reality, due to the huge number of errors found in the output file.

    Overall, adoption seems to be progressing, but it still seems very far from being something mainstream.

    If I do this next year, perhaps it will be better to use a different methodology and tools, so I can obtain more reliable results.

  • Status of old PyPI projects: archived

    Since late January, the Python Package Index (PyPI) has supported archiving projects/packages. This is a very welcome feature, since it tells you, without any doubt, when a package is no longer maintained and will not receive any further updates.

    It makes it easier for a person looking for packages to know which ones deserve a closer inspection and which ones sit there abandoned, polluting the results.

    Previously, the only viable way to retire a package was to add a disclaimer to the README and let it sit there indefinitely, treated just like the other active packages.

    “You had the option of deleting the package”, you might say. Yes, but as I explained in a previous post, this is dangerous and should be avoided. So, archiving is in my view the best course of action when a person no longer wants to maintain their published packages and projects.

    With this in mind, this week I decided to do my part and archive the old packages that I had published for different reasons and that had been abandoned for years. These were:

    • mdvis: a small package I wrote many years ago, mostly to learn how to publish things on PyPI.
    • auto-tune: something I was about to start working on for a previous employer and that was cancelled at the last minute.
    • django-cryptolock: an experiment done for a previous client. It tried to implement an existing proposal for an authentication scheme, using Monero wallets.
    • monero-python: a few years ago, during my day-to-day work, this package was removed (then renamed by the original author). At the time, it was a direct dependency for many projects and tools, which meant a malicious actor could have taken the name and compromised those systems. As a precaution, I grabbed the freed-up name. It has been sitting there empty ever since.

    Now it is your turn.

    After a sufficient number of packages get marked as archived, we can hope for some enhancements to the search functionality of PyPI. Namely, a way of filtering out archived packages from the results and a visual marker for them in the list view. One step at a time.

  • Why isn’t my translation showing up?

    Here we go again with another post in this blog’s irregular column, entitled Django Friday Tips. Today, let’s address a silent issue that anyone who has worked with internationalization (i18n) has almost certainly already faced.

    You add a string that must be translated:

    from django.utils.translation import gettext_lazy as _
    
    some_variable = _("A key that needs translation")

    You then execute the manage.py makemessages --locale pt command, go to the generated django.po file and edit the translation:

    msgid "A key that needs translation"
    msgstr "Uma chave que precisa de tradução"

    You compile (manage.py compilemessages --locale pt), and proceed with your work.

    A few moments later, when checking the results of your hard work… nothing; the text still shows the key (in English).

    Time to double-check the code (did I forget the gettext stuff?), the translation file, the i18n settings, etc. What the hell?

    Don’t waste any more time; most likely the translation is marked as fuzzy, like this:

    #: myapp/module.py:3
    #, fuzzy
    #| msgid "Another key that needed translation"
    msgid "A key that needs translation"
    msgstr "Uma chave que precisa de tradução"

    You see, you didn’t notice that #, fuzzy line and kept it there. The command that compiles those translations ignores the messages marked as fuzzy.

    So the solution is to remove those two extra lines, or to compile with the --use-fuzzy flag. That’s it: compile, and you should be able to proceed with the problem solved.

  • The books I enjoyed the most in 2024

    Another year went by, and another batch of books was consumed. Just like I did last year, I want to share the ones that I enjoyed the most.

    But what kind of metric is that? Truth be told, it is not an objective one. Last year, I clearly described it like this:

    I don’t mean they are masterpieces or references in a given field, what I mean is that I truly enjoyed the experience. It could be because of the subject, the kind of book, the writing style or for any other reason.

    What matters is that I was able to appreciate the time I spent reading them.

    And I still think it is what truly matters.

    So this year I will repeat the formula: two more books that were entirely worth the money.

    “Broken Money”, by Lyn Alden

    This is a book about money (surprise, surprise). Not in the usual sense of telling the reader how to earn it or how to spend it. The focus is instead on what money is, what forms of it existed throughout history, how it was used, and how each of those forms failed to fulfil its purpose at a given time.

    As the book progresses, it introduces the reader to important financial concepts, practices, and institutions that were born to fulfil certain needs, or to accomplish a desired outcome. It discusses their purposes and their problems.

    When describing the current state of affairs, the author focuses on how the existing financial system doesn’t serve all people equally. Example after example, we can see how some benefit from it, while others are harmed by it, over and over again.

    The book ends by taking a look at the internet age and exploring “alternatives” that are surfacing on the horizon.

    It had a real impact on how I see money and the financial system.

    “Masters of Doom”, by David Kushner

    Another great book that was a joy to read was “Masters of Doom”, and I guess that every kid from the 90s that touched a PC during that time will know at least one game that is mentioned there.

    It tells the story about the people behind “id Software” and their journey throughout most of the decade while they developed and released games such as Commander Keen, Wolfenstein 3D, Doom, and Quake.

    As a kid, I remember playing and enjoying some of those games, many hours of fun and excitement. I was too young to know or follow the stories and the dramas of the game development industry, but I definitely hold great memories of the outcome.

    In the book you will find how they met, the ups, the downs, the drama, etc. You know, the whole rollercoaster that any new and successful company eventually goes through.

    While many other people were involved in making those games, and eventually in making the company prosper, the two main characters in this story are John Carmack and John Romero. With very distinct personalities, it is remarkable how far they were able to take this endeavor together.

    If you lived during that time, I guess you will enjoy the book.

  • Optimizing mastodon for a single user

    I’ve been participating in the Fediverse through my own mastodon instance since 2017.

    What started as an experiment to test new things, focused on exploring decentralized and federated alternatives for communicating on top of the internet, stuck. At the end of 2024, I’m still there.

    The rhetoric on this network is that you should find an instance with a community that you like and start from there. At the time, I thought that having full control over what I publish was a more interesting approach.

    Nowadays, there are multiple alternative software implementations that you can use to join the network (this blog is a recent example), some of which can be more appropriate for distinct use cases. At the time, the obvious choice was Mastodon in single-user mode, but oh boy, it is heavy.

    Just to give you a glimpse, the container image surpasses 1 GB in size, and you must run at least 3 of those, plus a PostgreSQL database, a Redis broker, and optionally Elasticsearch.

    For a multi-user instance, this might make total sense, but for a low-traffic, single-user service, it is too much overhead and can get expensive.

    A more lightweight implementation would fit my needs much better, but just thinking about the migration process gives me cold feet. I’m also very used to the apps I ended up using to interact with my instance, which might be specific to Mastodon’s API.

    So, I decided to go in a different direction and look for the available configurations that would allow me to reduce the weight of the software on my small machine; in other words, to run Mastodon on the smallest machine possible.

    My config

    To achieve this goal, and after taking a closer look at the available options, these are the settings that I ended up changing over time and that produced some improvements (a sketch of how they fit together follows the list):

    • ES_ENABLED=false — I don’t need advanced search capabilities, so I don’t need to run this extra piece of software. This was a decision I made on day 1.
    • STREAMING_CLUSTER_NUM=1 — This is an old setting that manages the number of processes that deal with web sockets and the events that are sent to the frontend. For 1 user, we don’t need more than one. In recent versions, this setting was removed, and the value is always one.
    • SIDEKIQ_CONCURRENCY=1 — Processing background tasks in a timely fashion is fundamental for how Mastodon works, but for an instance with a single user, 1 or 2 workers should be more than enough. The default value is 5; I’ve used 2 for years, but 1 should be enough.
    • WEB_CONCURRENCY=1 — Dealing with a low volume of requests doesn’t require many workers, but having at least some concurrency is important. We can achieve that with threads, so we can keep the number of processes at 1.
    • MAX_THREADS=4 — The default is 5; I reduced it to 4, and perhaps I could go even further, but I don’t think I would see any significant gains.
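
    For reference, a sketch of how these settings could look as plain environment variables in a typical .env.production file (assuming a setup along those lines):

    ES_ENABLED=false
    SIDEKIQ_CONCURRENCY=1
    WEB_CONCURRENCY=1
    MAX_THREADS=4
    # STREAMING_CLUSTER_NUM is no longer needed in recent versions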

    To save some disk space, I also periodically remove old content from users that live on other servers. I do that in two ways:

    • Changed the media cache retention period to 2 days and the user archive retention period to 7 days, in Administration > Server Settings > Content Retention.
    • Periodically run the tootctl media remove and tootctl preview_cards remove commands.

    Result

    In the end, I was able to reduce the resources used by my instance and avoid many of the alerts my monitoring tools were throwing all the time. However, I wasn’t able to downsize my machine and reduce my costs.

    It still requires at least 2 GB of RAM to run well, although with these changes there’s much more breathing room.

    If there is a lesson to be learned or a recommendation to be made with this post, it is that if you want to participate in the Fediverse while having complete control, you should opt for a lighter implementation.

    Do you know any other quick tips that I could try to optimize my instance further? Let me know.

  • An experiment in fighting spam on public forms using “proof of work”

    Spam is everywhere: email accounts, mailboxes, websites with comments, cellphones, social media accounts, public forms, etc. We all know it; it is a plague.

    Over the years, there have been multiple attempts to fight spam, with various degrees of success, some more effective than others, some with more side effects, some simple, some complex, some proprietary…

    Online, one of the most successful approaches has been captchas. Just like spam, these annoying little challenges are everywhere. I’m not a fan of captchas, for many reasons, but mainly because the experience for humans is often painful: they waste our time and make us work for free.

    So, for public forms on my websites, I usually don’t use any sort of measure to fight spam. The experience for humans is straightforward, and later I deal with the spam myself.

    It isn’t too much work, but these are also just a few low-traffic websites. Obviously, this doesn’t “scale”.

    So, in March, I decided to do a little experiment, to try to find a way of reducing the amount of spam I receive from these forms. I would rather not attach an external service, use a captcha, or add anything else that could change the user’s experience.

    Proof-of-work enters the room

    It is public knowledge that the “proof of work” mechanism used in Bitcoin started as one of these attempts to fight email spam. So this is not a novel idea; it is decades old.

    The question is, would some version of it work for my public forms? The answer is yes, and in fact, it didn’t take much to get some meaningful results.

    So, without spending too much effort, I added a homegrown script to one of my forms that would do the following before the form is submitted to the server:

    fields = (grab all form fields)
    content = (concatenate contents of all fields)
    difficulty = (get the desired number of leading zeros)
    loop
      nonce = (get new nonce value)
      hash_input = (concatenate nonce with content)
      hash = (SHA256 digest of hash_input)
      if hash meets difficulty
        Add nonce to form
        break
      end if
    end loop
    submit form

    On the server side, I just calculate the hash of the contents plus the nonce, and check if it matches the desired difficulty. If it doesn’t, I discard the message.
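
    As a minimal sketch of that server-side check in Python (the function name and the difficulty value are illustrative, not the actual implementation):

    import hashlib

    DIFFICULTY = 4  # required number of leading zeros (illustrative value)


    def proof_is_valid(content: str, nonce: str) -> bool:
        # Repeat the client's computation: hash the nonce concatenated
        # with the submitted form contents
        digest = hashlib.sha256((nonce + content).encode("utf-8")).hexdigest()
        # Accept the submission only if the digest meets the difficulty
        return digest.startswith("0" * DIFFICULTY)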

    The mechanism described above, obviously, has serious flaws and could be bypassed in multiple ways (please don’t rely on it). It is not even difficult to figure out. But for the sake of the experiment, this was the starting point.

    I just tuned the difficulty parameter a bit, to something a human couldn’t distinguish from a slow website but that could be impactful for bots (nothing scientific).

    The results

    Six months went by, and the results are better than I initially expected, especially because I thought I would have to gradually improve the implementation as the spammers figured out how to bypass this little trick.

    I also assumed that at least some of them would use headless browsers, to deal with all the JavaScript that websites include nowadays, and automatically bypass this protection.

    In the end, I was dead wrong. This little form went from ~15 spam submissions every single day to 0. During the whole 6-month period, a total of 1 spam message got through.

    But you might ask, has the traffic stopped? Did the bots stop submitting?

    Chart of the number of form submissions during the last 30 days.

    As the chart above shows, no, the spammers continued to submit their home-cooked spam. During the last 30 days, a total of 210 POST requests were sent, but the spam was simply filtered out.

    So, what did I learn from this experiment? I guess the main lesson was that these particular spammers are really low-effort creatures. You raise the bar a little, and they stop being effective.

    The second lesson was that we can definitely fight spam without relying on third parties and without wasting our users’/visitors’ time. A better implementation is undoubtedly required, but for low-traffic websites, it might be a good enough solution. We just need an off-the-shelf component that is easy to integrate (I guess some already exist; I just didn’t spend too much time exploring).

    I’m also curious about other alternatives, such as requiring a micropayment to submit the form (like a fraction of a cent). Until now, this would require a third-party integration and be a pain in every way, but with Bitcoin’s Lightning Network becoming ubiquitous, it might become a viable alternative (there are similar projects out there that work great).

  • Hawkpost enters “maintenance only” mode

    In practice, this already happened a couple of years ago; now we are just making it official.

    For those who don’t know, Hawkpost is a side project that I started while at Whitesmith back in 2016 (8+ years ago). I’ve written about it here in the blog on several occasions.

    To sum it up, it is a tool made to solve a problem that, at the time, I frequently saw in the wild while doing typical agency/studio work: clients and most other people shared credentials and other secrets for their projects through insecure means (in plain text on chats, emails, etc.). It bothered me to the point of trying to figure out a solution that was both easy to use for me and my coworkers and obvious/transparent to people who simply don’t care about these things.

    Awareness about encryption at the time, while making rapid progress, was not as widespread as it is today. Tools were not as easy to use.

    Hawkpost ended up being very useful for many people. It didn’t have to be perfect; it just needed to improve the existing state of affairs, and it did.

    Eight years later, things have changed. I no longer do agency work, I’ve changed workplaces, awareness has improved a lot, and many other tools have appeared on the market. Hawkpost’s development has stalled, and while it still has its users, we haven’t seen much overall interest in continuing to work on it.

    To be sincere, I don’t use it anymore. That’s because I have other tools at my disposal that are much better for the specific use-cases they address, and perhaps also better for Hawkpost’s original purpose.

    Here are some examples:

    • For sharing credentials within a non-technical team (if you really must): Use a proper team password manager such as Bitwarden or 1Password.
    • For sharing files and other sizable data: one good alternative is to use send (the successor of Firefox Send). It also has an official CLI client.
    • For sharing and working on encrypted documents: CryptPad has a whole range of applications where data is encrypted E2E.

    So, this week, we released version 1.4.0 of Hawkpost. It fixes some bugs, updates major dependencies, and makes sure the project is in good shape to continue receiving small updates. The full list of changes can be found here.

    However, new features or other big improvements won’t be merged from now on (at least for the foreseeable future). The project is in “maintenance only” mode. Security issues and anything that could make the project unusable will be handled, but nothing else.

  • Is it “/.well-known/”?

    Ironically, in my experience, the .well-known directory doesn’t do justice to its name, even in use cases that would fit nicely within its original purpose.

    But I’m getting a bit ahead of myself. Let’s first start with what it is, then move on to discuss where it’s used. We’ll do this quickly; otherwise, this post will get boring really fast.

    Let’s look at what the RFC has to say:

    Some applications on the Web require the discovery of information about an origin before making a request.

    … designate a “well-known location” for data or services related to the origin overall, so that it can be easily located.

    … this memo reserves a path prefix in HTTP, HTTPS, WebSocket (WS), and Secure WebSocket (WSS) URIs for these “well-known locations”, “/.well-known/”. Future specifications that need to define a resource for such metadata can register their use to avoid collisions and minimise impingement upon origins’ URI space.

    So, briefly, it is a standard place, or set of standard URIs, that can be used by people or automated processes to obtain (meta)data about resources of the domain in question. The purpose of the requests and the content of the responses don’t even need to be related to the web.

    The RFC introduces the need for this “place”, by providing the example of the “Robots Exclusion Protocol” (robots.txt), which is a good example… that paradoxically doesn’t use the well-known path.

    Now that the idea is more or less settled, here are other examples of cool and useful protocols that actually make use of it.


    ACME HTTP Challenge

    The use case here is that an external entity needs to verify that you control the domain. To prove it, you place a unique/secret “token” at a certain path, so this entity can make a request and check that it is there.

    Many Let’s Encrypt tools make use of this approach.

    Security.txt

    This one is a bit obvious, and I already addressed it in previous posts (here and here). It is just a standard place to put your security contacts, so that researchers can easily find all the data they need to alert you about any of their findings.
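
    As an illustration, a minimal security.txt (with made-up values) could look something like this:

    Contact: mailto:security@example.com
    Expires: 2026-12-31T23:00:00.000Z
    Encryption: https://example.com/pgp-key.txt
    Preferred-Languages: en, pt
    Policy: https://example.com/security-policy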

    Web Key Directory (PGP)

    Traditionally, OpenPGP relied on “key servers” and the web of trust for people to fetch the correct public keys for a given email address. With the “Web Key Directory”, domain owners can expose the correct and up-to-date public keys for the associated addresses at a well-known path. Email clients can then quickly fetch them just by knowing the address itself.

    Lightning Address / LN URL Pay

    Sending on-chain Bitcoin to pay for a beer at the bar, or to send a small tip, is not that useful or practical (confirmation times and long addresses will get in the way).

    For small payments in Bitcoin, the lightning network is what you should use. While instantaneous, this approach requires a small dance between both wallets (showing a QR code, etc.).

    Using a lightning address (which looks essentially the same as an email address) solves this problem. You type the address and send the funds; done. Your wallet takes care of figuring out the rest. To accomplish that, it fetches all the information from a standard place under the /.well-known/ path.

    I wrote about it before, and if you wish, you can buy me a beer by sending a few “sats” to my “email” address.

    • Suffix: lnurlp
    • Details 1 / 2

    Password-Change

    This feature allows password managers to know where the form to change the password of a given website is located. It lets users go straight to that page from the password manager’s UI.

    Digital Asset Link

    Have you ever tapped a link on your Android smartphone and received a suggestion to open it in a certain app instead of the default web browser?

    Me too; now you know how it is done.


    The whole list of registered well-known URIs can be found here. But I guess there are way more suffixes in use, since 2 of the 6 mentioned above are not there, despite being widely used within their ecosystems.

    That’s it. Looking at the list above gives us a glimpse of how certain things are implemented, and a few good ideas for things we could add to our own domains/websites.

  • “Extracting wisdom” from conference videos

    PyCon US happened in May and, this month, the 154 videos gradually started being published on YouTube. Between then and now, many other interesting conferences took place. That’s a lot of talks, presentations, and content to be digested.

    The truth is, I, like most people, won’t watch it all, since our time is limited. One option is to look at the titles and descriptions, then guess what might be the most interesting content. This is a gamble, and my experience tells me that I often end up disappointed with my picks.

    This process can be repeated for the dozens of conferences a person is interested in. What if we could have a way of:

    • Finding the best videos to watch, based on our “needs”.
    • Extracting the main teachings of all content.
    • Storing it in a consumable/searchable way.

    It sounds like a lot of work. But fortunately, in 2024 our digital assistants can help us with that.

    A couple of months ago, I wrote a blog post about how I run these AI tools on my device without leaking any data to external services. Today, I’ll describe how I used them to help me with the task of extracting the key information and learnings from all the videos I won’t be able to watch.

    In the process, I will also share the results publicly. So, let’s get started.

    The tools

    As I’ve mentioned in the other post, I use “ollama” to run the AI models locally. For this task, since it will require the digestion of a lot of content and given my machine has modest resources, I will rely on llama3:8b. A bigger model might render better results, but it would take forever.

    The next steps are to provide the model with the content that needs to be analyzed, and then precisely instruct it on what to do. For this part of the task, I will rely on “fabric”.

    “Fabric” provides a tool to download the transcripts from YouTube videos and contains a collection of carefully curated prompts (aka instructions) for the AI models to do a multitude of tasks. For my goal, I will use the extract_wisdom pattern/prompt.

    Example output

    To demonstrate what can be achieved by letting the model analyze a conference video, following the instructions of a good prompt, I’m including an example.

    I used “extract_wisdom” on the following video: Iterate, Iterate, Iterate! (Moshe Zadka) — PyTexas 2024

    I obtained the following output:

    **SUMMARY**
    Mosha ZKA presents on iterators and generators in Python. He discusses how to use them, their benefits, and common pitfalls.
    
    **IDEAS:**
    • Iterators are objects that define a sequence of values.
    • Generators are special types of functions that can be used as iterators.
    • The `iter()` function returns an iterator object.
    • Use the `next()` method to get the next value from an iterator.
    • Avoid using infinite loops with generators.
    • Use iterators and generators to process large datasets efficiently.
    • Python's built-in functions, such as `range()`, are implemented using iterators.
    • You can use the `enumerate()` function to iterate over both indices and values.
    • The `zip()` function takes iterables of equal length and returns an iterator.
    • Use the `groupby()` function from the `itertools` module to group items based on a common attribute.
    • Avoid using `for` loops with generators, as they can lead to infinite loops.
    
    **INSIGHTS:**
    • Iterators and generators are essential tools for working with large datasets in Python.
    • Using iterators and generators can simplify your code and improve performance.
    • Be mindful of the differences between iterators and generators, and use them accordingly.
    • Avoid using `for` loops with generators, as they can lead to infinite loops.
    • Use the `iter()` function to create an iterator object from a sequence or other iterable.
    • Use the `next()` method to get the next value from an iterator.
    • Iterators and generators can help you write more efficient and scalable code.
    
    **QUOTES:**
    • "Iterators are objects that define a sequence of values."
    • "Generators are special types of functions that can be used as iterators."
    • "The `iter()` function returns an iterator object."
    
    **HABITS:**
    • Set aside time each day to learn new programming concepts.
    • Read books on Python programming and related topics.
    • Practice writing code using iterators and generators.
    • Join online communities and forums for Python developers.
    
    **FACTS:**
    • The first version of Python was released in 1991.
    • Guido van Rossum, the creator of Python, wanted to create a scripting language that was easy to learn and use.
    • Python is now one of the most popular programming languages worldwide.
    • The `itertools` module in Python provides many useful functions for working with iterators.
    
    **REFERENCES:**
    • "Python Crash Course" by Eric Matthes
    • "Automate the Boring Stuff with Python" by Al Sweigart and others
    
    **ONE-SENTENCE TAKEAWAY:**
    Iterators and generators are powerful tools that can simplify your code, improve performance, and help you write more scalable programs.
    
    **RECOMMENDATIONS:**
    • Use iterators and generators to process large datasets efficiently.
    • Avoid using `for` loops with generators, as they can lead to infinite loops.
    • Practice writing code using iterators and generators.
    • Read books on Python programming and related topics.
    • Join online communities and forums for Python developers.

    Evaluating the results

    After going through the notes and then picking some videos, it is clear that the extracted content is not even close to being 100% spot on. I’ve noticed things such as:

    • Focusing on only a small part of the video.
    • Highlighting superfluous stuff, while missing content that I would classify as important.
    • Misinterpretation of what has been said (I only found one occurrence of this, but I assume there will be more. It is in the example I’ve shown above, try to find it).

    Nevertheless, I still found the results helpful for the purpose I was aiming for. I guess that some issues might be related to:

    1. The model that I’ve chosen. Perhaps using a bigger one would render better notes.
    2. The fact that the approach relies on transcripts. The model misses the information that is communicated visually (slides, demos, etc.).

    This approach would definitely provide better results when applied to written content and podcast transcripts.

    The repository

    I’ve written a quick script to run this information extraction on all videos of a YouTube playlist (nothing fancy that’s worth sharing). I’ve also created a repository where I store the results obtained when I run it for a conference playlist (PyCon is still not there, since not all videos have been released yet).

    Every time I do this for a conference I’m interested in, I will share these automatically generated notes there.

    You are welcome to ask for a specific conference to be added. If it is on YouTube, it is likely I can generate them. Just create a new issue in the repository.

  • Ways to have an atomic counter in Django

    This week, I’m back with my tremendously irregular Django tips series, where I share small pieces of code and approaches to common problems that developers face when working on their web applications.

    The topic of today’s post is how to implement a counter that isn’t vulnerable to race conditions. Counting is everywhere: when handling money, when enforcing access limits, when scoring, etc.

    One common rookie mistake is to do it like this:

    model = MyModel.objects.get(id=id)
    model.count += 1
    model.save(update_fields=["count"])

    An approach that is subject to race conditions, as described below:

    • Process 1 gets count value (let’s say it is currently 5)
    • Process 2 gets the count value (also 5)
    • Process 1 increments and saves
    • Process 2 increments and saves

    Instead of 7, you end up with 6 in your records.

    On a low stakes project or in a situation where precision is not that important, this might do the trick and not become a problem. However, if you need accuracy, you will need to do it differently.

    Approach 1

    with transaction.atomic():
        model = (
            MyModel.objects.select_for_update()
            .get(id=id)
        )
        model.count += 1
        model.save(update_fields=["count"])

    In this approach, when you first fetch the record, you ask the database to lock it. While you are handling it, no one else can modify it.

    Since it locks the records, it can create a bottleneck. You will have to evaluate whether it fits your application’s access patterns. As a rule of thumb, it should be used when you require access to the final value.

    Approach 2

    from django.db.models import F
    
    MyModel.objects.filter(id=id).update(
        count=F("count") + 1
    )

    In this approach, you don’t lock any rows or need to explicitly work inside a transaction. Here, you just tell the database that it should add 1 to the value that is currently there. The database will take care of atomically incrementing the value.

    It should be faster, since multiple processes can access and modify the record “at the same time”. Ideally, you would use it when you don’t need to access the final value.

    Approach 3

    from django.core.cache import cache
    
    cache.incr(f"mymodel_{id}_count", 1)

    If your counter has a limited lifetime, and you would rather not pay the cost of a database write, using your cache backend could provide you with an even faster method.

    The downside is the weaker persistence and the fact that your cache backend needs to support atomic increments. As far as I can tell, you are well served with Redis and Memcached.
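
    One extra detail worth noting: Django’s cache.incr() raises a ValueError when the key does not exist yet, so a common pattern (sketched here with an illustrative key name and timeout) is to create the key first:

    from django.core.cache import cache

    key = f"mymodel_{id}_count"
    # add() only sets the value when the key is missing, so calling it
    # before every increment is safe; timeout bounds the counter's lifetime
    cache.add(key, 0, timeout=3600)
    cache.incr(key, 1)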

    For today, this is it. Please let me know if I forgot anything.

  • Are Redis ACL password protections weak?

    Earlier this year, I decided to explore Redis functionality a bit more deeply than my typical use-cases would require. Mostly due to curiosity, but also to have better knowledge of this tool in my “tool belt”.

    Curiously, a few months later, the whole ecosystem started boiling. Now we have Redis, Valkey, Redict, Garnet, and perhaps a few more. The space is hot right now and forks/alternatives are popping up like mushrooms.

    One thing commonly inherited from Redis is storing user passwords as plain SHA256 hashes. When I learned about this, I found it odd, since it goes against common best practices: the algorithm is very fast to brute-force, the scheme does not protect against the usage of rainbow tables, etc.

    Instead of judging too fast, a better approach is to understand the reasoning behind this decision, the limitations imposed by the use cases, and the threats such an application might face.

    But first, let’s take a look at a more standard approach.

    Best practices for storing user passwords

    According to OWASP’s documentation on the subject, the following measures are important for applications storing users’ passwords:

    1. Use a strong and slow key derivation function (KDF).
    2. Add salt (if the KDF doesn’t include it already).
    3. Add a pepper.

    The idea behind 1. is that computing a single hash should have a non-trivial cost (in time and memory), to decrease the speed at which an attacker can attempt to crack the stolen records.

    Adding a “salt” protects against the usage of “rainbow tables”, in other words, doesn’t let the attacker simply compare the values with precomputed hashes of common passwords.

    The “pepper” (a common random string used for all records) adds an extra layer of protection, given that, unlike the “salt”, it is not stored with the data, so the attacker will be missing that piece of information.

    Why does Redis use SHA256

    To store user passwords, Redis relies on a vanilla SHA256 hash: no multiple iterations for stretching, no salt, no pepper, nor any other measure.

    Since SHA256 is meant to be very fast and lightweight, it will be easier for an attacker to crack the hash.

    So why this decision? Understanding the use cases of Redis makes it clear that establishing and authenticating connections needs to be very, very fast. The documentation is explicit about it:

    Using SHA256 provides the ability to avoid storing the password in clear text while still allowing for a very fast AUTH command, which is a very important feature of Redis and is coherent with what clients expect from Redis.

    Redis Documentation

    So this is a constraint that rules out the usage of standard KDF algorithms.

    For this reason, slowing down the password authentication, in order to use an algorithm that uses time and space to make password cracking hard, is a very poor choice. What we suggest instead is to generate strong passwords, so that nobody will be able to crack it using a dictionary or a brute force attack even if they have the hash.

    Redis Documentation

    So far, understandable. However, my agreement ends with the last sentence of the above quote.

    How can it be improved?

    The documentation leaves to the user (aka server administrator) the responsibility of setting strong passwords. In their words, if you set passwords that are lengthy and not guessable, you are safe.

    In my opinion, this approach doesn’t fit well with the “Secure by default” principle, which, I think, is essential nowadays.

    It leaves to the user the responsibility not only to set a strong password, but also to ensure that the password is almost uncrackable (a 32-byte random string, per their docs). Experience tells me that most users and admins won’t be aware of this or won’t do it.

    Another point made to support the “vanilla SHA256” approach is:

    Often when you are able to access the hashed password itself, by having full access to the Redis commands of a given server, or corrupting the system itself, you already have access to what the password is protecting: the Redis instance stability and the data it contains.

    Redis Documentation

    This is not entirely true, since ACL rules and users can be defined in configuration files and managed externally. These files contain the SHA256 hashes, which means that in many setups and scenarios the hashes won’t live only on the Redis server; this kind of configuration will be managed and stored elsewhere.

    I’m not the only one who thinks the current approach is not enough; the teams behind compatible alternative implementations seem to share these concerns.

    So, after so many words and taking much of your precious time, you might ask, “what do you propose?”.

    Given the requirements for extremely fast connections and authentication, the first and main improvement would be to start using a “salt”. It is simple and won’t have any performance impact.

    The “salt” would make the hashes of not so strong passwords harder to crack, given that each password would have an extra random string that would have to be considered individually. Furthermore, this change could be made backwards compatible and added to existing external configuration files.
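
    To make this concrete, here is a minimal sketch in Python (purely illustrative, not how Redis implements it) of the difference between the current scheme and a salted one:

    import hashlib
    import secrets

    password = b"correct horse battery staple"

    # Current scheme: a plain SHA256 digest, identical for every user who
    # picks the same password and directly comparable against rainbow tables
    plain_hash = hashlib.sha256(password).hexdigest()

    # Salted scheme: a random per-user salt, stored next to the hash, makes
    # every record unique and forces the attacker to crack each one separately
    salt = secrets.token_bytes(16)
    salted_hash = hashlib.sha256(salt + password).hexdigest()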

    Then, I would consider picking a key-stretching approach or a more appropriate KDF to generate the hashes. This would need to be carefully benchmarked to minimize the performance impact; a small percentage of the time it takes to initiate an authenticated connection could be a good compromise.

    I would skip, for now, the usage of a “pepper”, since it is not clear how it could be managed from the user’s side. Pushing this responsibility to the user (the Redis server operator) would create more complexity than benefit.

    An alternative approach, which would also be easy to implement and more secure than the current one, would be to automatically generate the “passwords” for users by default. They would work like regular API keys, since it seems this is how Redis sees them:

    However ACL passwords are not really passwords. They are shared secrets between the server and the client, because the password is not an authentication token used by a human being.

    Redis Documentation

    The code already exists:

    …there is a special ACL command ACL GENPASS that generates passwords using the system cryptographic pseudorandom generator: …

    The command outputs a 32-byte (256-bit) pseudorandom string converted to a 64-byte alphanumerical string.

    Redis Documentation

    So it could be just a matter of requiring the user to explicitly bypass this automatic “API key” generation in order to set a custom password.

    Summing it up

    To simply answer the question asked in the title: yes, I do think the user passwords could be better protected.

    Given the requirements and use-cases, it is understandable that there is a need to be fast. However, Redis should do more to protect the users’ passwords or at least ensure that users know what they are doing and pick an almost “uncrackable” password.

    So I ended up proposing:

    • An easy improvement: Add a salt.
    • A better improvement: Switch to a more appropriate KDF, with a low work factor for performance reasons.
    • A different approach: Automatically generate by default a strong password for the ACL users.