Categories
Python Software Development

Django Friday Tips: Password validation

This time I’m going to address Django’s built-in authentication system, more specifically the ways we can build custom improvements over the already very solid foundations it provides.

The idea for this post came from an article summing up some considerations to keep in mind when dealing with passwords. Most of those considerations are about what controls to implement (what “types” of passwords to accept) and how to securely store those passwords. By default Django does the following:

  • Passwords are stored using PBKDF2. Other alternatives, such as Argon2 and bcrypt, can be configured through the PASSWORD_HASHERS setting (see the sketch after this list).
  • With every Django release the “strength”/cost of this algorithm is increased. For example, version 3.1 applied 216000 iterations and the latest version (3.2 at the time of writing) applies 260000. The migration from one to the other is done automatically once the user logs in.
  • There is a set of validators that control the kinds of passwords allowed to be used in the system, such as enforcing a minimum length. These validators are defined in the AUTH_PASSWORD_VALIDATORS setting.
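
As a reference, here is a minimal sketch of how the hashing algorithm can be configured. The class paths below are the hashers Django ships with (Argon2 and bcrypt require installing the extra argon2-cffi and bcrypt packages, respectively):

# settings.py
# The first entry is used to hash new passwords; the remaining ones
# are still accepted, so existing hashes keep working and are
# upgraded to the preferred hasher when the user logs in.
PASSWORD_HASHERS = [
    "django.contrib.auth.hashers.Argon2PasswordHasher",
    "django.contrib.auth.hashers.PBKDF2PasswordHasher",
    "django.contrib.auth.hashers.PBKDF2SHA1PasswordHasher",
    "django.contrib.auth.hashers.BCryptSHA256PasswordHasher",
]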

By default, when we start a new project, these are the included validators:

  • UserAttributeSimilarityValidator
  • MinimumLengthValidator
  • CommonPasswordValidator
  • NumericPasswordValidator
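
In settings.py this list looks like the following (this matches what django-admin startproject generates):

AUTH_PASSWORD_VALIDATORS = [
    {
        "NAME": "django.contrib.auth.password_validation.UserAttributeSimilarityValidator",
    },
    {
        "NAME": "django.contrib.auth.password_validation.MinimumLengthValidator",
    },
    {
        "NAME": "django.contrib.auth.password_validation.CommonPasswordValidator",
    },
    {
        "NAME": "django.contrib.auth.password_validation.NumericPasswordValidator",
    },
]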

The names are very descriptive and, I would say, a good starting point. But as the article mentions, the next step is to make sure users aren’t reusing previously breached passwords or passwords that are known to be easily guessed (even when complying with the other rules). CommonPasswordValidator already does part of this job, but with a very limited list (20000 entries).

Improving password validation

So for the rest of this post I will show you some ideas on how we can make this even better; more precisely, how to prevent users from picking a known weak password.

1. Use your own list

The easiest approach, but also the most limited one, is providing your own list to `CommonPasswordValidator`, containing more entries than the ones provided by default. The list must be provided as a file with one lower-case entry per line. It can be set like this (as one of the entries in AUTH_PASSWORD_VALIDATORS):

{
  "NAME": "django.contrib.auth.password_validation.CommonPasswordValidator",
  "OPTIONS": {"password_list_path": "<path_to_your_file>"}
}
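
If you already have a bigger word list at hand, a tiny sketch like the one below can normalize it into the expected format (the file names are just examples):

# normalize_wordlist.py
# Lower-case and deduplicate a raw word list, one entry per line.
seen = set()
with open("raw_passwords.txt", encoding="utf-8") as src, \
        open("common_passwords.txt", "w", encoding="utf-8") as dst:
    for line in src:
        entry = line.strip().lower()
        if entry and entry not in seen:
            seen.add(entry)
            dst.write(entry + "\n")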

2. Use zxcvbn-python

Another approach is to use an existing and well-known library that evaluates the password, compares it against a list of known passwords (30000 entries), and also takes into account slight variations and common patterns.

To use zxcvbn-python we need to implement our own validator, something that isn’t hard and can be done this way:

# <your_app>/validators.py

from django.core.exceptions import ValidationError
from zxcvbn import zxcvbn


class ZxcvbnValidator:
    def __init__(self, min_score=3):
        self.min_score = min_score

    def validate(self, password, user=None):
        # Pass the user's own attributes to zxcvbn, so passwords
        # derived from them are also penalized.
        user_info = []
        if user:
            user_info = [
                user.email,
                user.first_name,
                user.last_name,
                user.username,
            ]
        result = zxcvbn(password, user_inputs=user_info)

        if result.get("score") < self.min_score:
            raise ValidationError(
                "This password is too weak",
                code="not_strong_enough",
                params={"min_score": self.min_score},
            )

    def get_help_text(self):
        return "The password must be long and not obvious"

Then we just need to add it to the settings, just like the other validators. It’s an improvement, but we can still do better.
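
For completeness, a sketch of that settings entry, appended to the defaults shown earlier (the OPTIONS dictionary is passed as keyword arguments to the validator’s __init__):

AUTH_PASSWORD_VALIDATORS = [
    # ... the default validators shown above ...
    {
        "NAME": "<your_app>.validators.ZxcvbnValidator",
        "OPTIONS": {"min_score": 3},
    },
]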

3. Use “have i been pwned?”

As suggested by the article, a good approach is to make use of the biggest source of leaked passwords we have available, haveibeenpwned.com.

The full list is available for download, but I find it hard to justify a 12GiB dependency on most projects. The alternative is to use their API (documentation available here), but again we must build our own validator.

# <your_app>/validators.py

from hashlib import sha1
from io import StringIO

from django.core.exceptions import ValidationError

import requests
from requests.exceptions import RequestException

class LeakedPasswordValidator:
    def validate(self, password, user=None):
        # The API relies on k-anonymity: we only send the first 5
        # characters of the SHA-1 digest and compare the returned
        # suffixes locally, so the password never leaves our server.
        digest = sha1(password.encode("utf-8")).hexdigest().upper()
        url = "https://api.pwnedpasswords.com/range/"

        try:
            resp = requests.get(f"{url}{digest[:5]}")
            resp.raise_for_status()
        except RequestException:
            raise ValidationError(
                "Unable to evaluate password.",
                code="network_failure",
            )

        # Each line of the response has the format <suffix>:<count>
        lines = StringIO(resp.text).readlines()
        for line in lines:
            suffix = line.split(":")[0]

            if digest == f"{digest[:5]}{suffix}":
                raise ValidationError(
                    "This password has been leaked before",
                    code="leaked_password",
                )

    def get_help_text(self):
        return "Use a different password"

Then add it to the settings, exactly like the previous one.
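
As a quick check that everything is wired up correctly, Django’s validate_password helper runs all the configured validators at once and aggregates their errors:

from django.contrib.auth.password_validation import validate_password
from django.core.exceptions import ValidationError

try:
    validate_password("hunter2")
except ValidationError as error:
    # One message per failing validator
    print(error.messages)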

Edit: As suggested by one reader, instead of this custom implementation we could use pwned-passwords-django (which does practically the same thing).

And that’s it for today. If you have any suggestions for other improvements related to this matter, please share them in the comments; I would like to hear about them.

Categories
Software Development

Documentation done right

One critical piece of the software development process that often gets neglected by companies and by many open-source projects is explaining how the software works and how it can be used to solve the problem in question.

Documentation is often lacking and people have a hard time figuring out how they can use or contribute to a particular piece of software. I think most developers and users have faced this situation at least once.

Looking at it from the other side, it isn’t always easy to pick and share the right information so others can hit the ground running. The fact that not everybody starts from the same point or has the same goal makes the job a bit harder.

One approach to solve this problem that I like is Divio’s documentation system.

[Image: The components of Divio’s documentation system, showing the 4 quadrants and their relations.]

It splits the problem into 4 areas (tutorials, how-to guides, technical reference, and explanation), targeting different stages and needs of the person reading the documentation. Django uses this system and is frequently praised for having great documentation.

From a user’s point of view it looks solid. You should take a look and apply it to your packages/projects; I surely will.

Categories
Software Development Technology and Internet

Mirroring GitHub Repositories

Git by itself is a distributed version control system (a very popular one), but over the years organizations started to rely on internet services to manage their repositories, and those services eventually became the central/single source of truth for their code.

The most well-known service out there is GitHub (now owned by Microsoft), which nowadays is synonymous with git for a huge number of people. Many other services exist, such as GitLab and Bitbucket, but GitHub gained notoriety above all others, especially for hosting small (and some large) open-source projects.

These centralized services provide many more features that help manage, test and deploy software; functionality not directly related to the main purpose of git.

Relying on these central services is very useful, but like everything in life it is a trade-off. Many large open-source organizations (such as KDE, GNOME and Debian) don’t rely on these companies, because the risks involved are not worth the convenience of letting these platforms host their code and other data.

Over time we have been witnessing some of these risks, such as a project (and all its related data) being taken down without its owners having any chance to defend themselves (Example 1 and Example 2). Very similar to what some content creators have been experiencing with YouTube (I really like this one).

When this happens, you or your organization don’t lose the code itself, since you almost certainly have copies on your own devices (thanks to git), but you lose everything else: issues, projects, automated actions, documentation and, essentially, the URL by which your project is known.

Since GitHub is just too convenient to collaborate with other people, we can’t just leave. In this post I explain an easy alternative to minimize the risks described above, which I implemented myself after reading many guides and tools made by others who tried to address this problem before.

The main idea is to automatically mirror everything on a machine that I own and make it publicly available side by side with the GitHub URLs. The work will still be done on GitHub, but it can easily be switched over if something happens.

The software

To achieve the desired outcome I researched a few tools, and the one that seemed to fit all my requirements (works with git and is lightweight) was Gitea. Next I will describe the steps I took.

The Setup

This part was very simple, I just followed the instructions present in the documentation for a docker-based install. Something like this:

version: "3"

networks:
  gitea:
    external: false

services:
  server:
    image: gitea/gitea:latest
    container_name: gitea
    environment:
      - USER_UID=1000
      - USER_GID=1000
    restart: always
    networks:
      - gitea
    volumes:
      - ./gitea:/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      - "3000:3000"
      - "222:22"

If you are doing the same, don’t copy the snippet above. Take a look here for updated instructions.

Since my website is not supposed to have much concurrent activity, using an SQLite database is more than enough. So after launching the container, I chose this database type and made sure I disabled all the functionality I won’t need.

[Image: Part of Gitea’s configuration page, showing the server and third-party service settings.]

After this step, you should be logged in as an admin. The next step is to create a new migration through the top right menu. We just need to choose the “GitHub” option and continue. You should see the screen below:

[Image: Creating a new GitHub migration/mirror in Gitea.]

If you choose the “This repository will be a mirror” option, Gitea will keep your repository and wiki in sync with the original, but unfortunately it will not do the same for issues, labels, milestones and releases. So if you need that information, the best approach is to uncheck this field and do a normal migration. To keep that information updated you will have to repeat this process periodically.

Once migrated, do the same for your other repositories.
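
Since repeating the migration by hand for many repositories gets tedious, the same can be scripted against Gitea’s API. Below is a rough sketch of that idea; the endpoint and payload fields are assumptions based on the /api/v1/repos/migrate route and should be checked against the API documentation of your Gitea version:

# refresh_mirror.py - hypothetical helper; verify the payload fields
# against your Gitea version's API documentation before using it.
import requests

GITEA_API = "https://git.example.com/api/v1"  # assumption: your instance URL
HEADERS = {"Authorization": "token <your_gitea_api_token>"}

def refresh(owner, name, clone_url):
    # Drop the stale copy, then migrate again to pick up new
    # issues, labels, milestones and releases.
    requests.delete(f"{GITEA_API}/repos/{owner}/{name}", headers=HEADERS)
    resp = requests.post(
        f"{GITEA_API}/repos/migrate",
        headers=HEADERS,
        json={
            "clone_addr": clone_url,
            "repo_owner": owner,
            "repo_name": name,
            "service": "github",
            "issues": True,
            "labels": True,
            "milestones": True,
            "releases": True,
            "wiki": True,
        },
    )
    resp.raise_for_status()

refresh("<owner>", "<repo>", "https://github.com/<owner>/<repo>")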

Conclusion

Having an alternative with a backup of the general GitHub data ended up being quite easy to set up. However, the mirror feature would be much more valuable if it included the other items available on the standard migration.

During my research for solutions, I found Fossil, which looks very interesting and is something that I would like to explore in the future, but at the moment all repositories are based on Git and, for practical reasons, that won’t change for the time being.

With this change, my public repositories can be found in:

Categories
Python Software Development

Why you shouldn’t remove your package from PyPI

Nowadays most software developed in the Python language relies on external packages (dependencies) to get the job done. Correctly managing this “supply chain” ends up being very important and has a big impact on the end product.

As a developer you should be cautious about the dependencies you include in your project, as I explained in a previous post, but you are always dependent on the work done by the maintainers of those packages.

As a public package owner/maintainer, you also have to be aware that the code you write, your decisions and your actions will have an impact on the projects that depend directly or indirectly on your package.

With this small introduction we arrive at the topic of this post, which is “What should I do as a maintainer when I no longer want to support a given package?” or “How do I properly rename my package?”.

In both of these situations you might think “I will start by removing the package from PyPI”. I hope the next lines will convince you that this is the worst thing you can do, for two reasons:

  • You will break the code or the build systems of all projects that depend on the current or past versions of your package.
  • You will free the namespace for others to use, and if your package is popular enough it might become a juicy target for any malicious actor.

TL;DR: you will screw your “users”.

The left-pad incident, while it didn’t happen in the Python ecosystem, is a well-known example of the first point and shows what happens when a popular package gets removed from the public index.

Malicious actors usually register packages using names similar to other popular packages, with the hope that a user will end up installing them by mistake; something that has already been found multiple times on PyPI. Now imagine if that package name suddenly becomes available and is already trusted by other projects.

What should you do then?

Just don’t delete the package.

I admit that on some rare occasions it might be required, but most of the time the best thing to do is to leave it there (especially for open-source ones).

Adding a warning to the code and informing the users in the README file that the package is no longer maintained or safe to use is also a nice thing to do.

A good example of this process being done properly was the renaming of model-mommy to model-bakery; as a user, it was painless. Here’s an overview of the steps they took:

  1. A new source code repository was created with the same contents. (This step is optional)
  2. After doing the required changes a new package was uploaded to PyPI.
  3. Deprecation warnings were added to the old code, mentioning the new package (see the sketch after this list).
  4. The documentation was updated mentioning the new package and making it clear the old package will no longer be maintained.
  5. A new release of the old package was created, so the user could see the deprecation warnings.
  6. All further development was done on the new package.
  7. The old code repository was archived.
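
Step 3 can be as simple as a module-level warning. A minimal sketch, using hypothetical package names:

# old_package/__init__.py
import warnings

warnings.warn(
    "Important: old_package is no longer maintained. "
    "Please use new_package instead: https://pypi.org/project/new-package/",
    DeprecationWarning,
)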

So here is what is shown every time the test suite of an affected project is executed:

/lib/python3.7/site-packages/model_mommy/__init__.py:7: DeprecationWarning: Important: model_mommy is no longer maintained. Please use model_bakery instead: https://pypi.org/project/model-bakery/

In the end, even though I didn’t update right away, everything kept working and I was constantly reminded that I needed to make the change.

Categories
Software Development Technology and Internet

CSP headers using Cloudflare Workers

Last January I wrote a small post about setting up a “Content-Security-Policy” header for this blog. In that post I described the steps I took to reach a final result that I thought was good enough, given the “threats” this website faces.

This process usually isn’t hard if you develop the website’s software and have a high level of control over the development decisions; the end result is usually a simple yet very strict policy. However, if you do not have that degree of control over the code (and do not want to break the functionality), the policy can end up more complex and lax than you were initially hoping for. That’s what happened in my case, since I currently use a standard installation of WordPress for the blog.

The end result was a different security policy for different routes and sections (this part was not included in the blog post), which made the web server configuration quite messy.

(With this intro, you might have already noticed that I’m just making excuses to redo the initial and working implementation, in order to test some sort of new technology)

Given that the blog is behind the Cloudflare CDN, and that they introduced their “serverless” product called “Workers” a while ago, I decided I could try to manage the policy dynamically on their servers.

Browser <--> CF Edge Server <--> Web Server <--> App

The line above describes the current setup, so instead of adding the CSP header at the “App” or “Web Server” stages, the header is now added to the response at the last stage before it reaches the browser. Let me describe how I’ve done it.

Cloudflare Workers

First, a very small introduction to Workers; you can find more detailed information on Workers.dev.

So, first Cloudflare added the V8 engine to all the edge servers that route their clients’ traffic, and then more recently they started letting users write small programs that run on those servers inside the V8 sandbox.

The programs are built very similarly to how you would build a service worker (they use the same API), the main difference being where the code runs (browser vs edge server).

These “serverless” scripts can then be called directly through a specific endpoint provided by Cloudflare. In this case they should create and return a response to the requests.

Or you can instruct Cloudflare to execute them on specific routes of your website. This means that the worker can generate the response, execute any action before the request reaches your website, or change the response that is returned.

This service is charged based on the number of requests handled by the “workers”.

The implementation

Going back to the original problem: based on the above description, we can dynamically introduce or modify the “Content-Security-Policy” header of each response that goes through the worker, which gives us a high degree of flexibility.

So for my case, a simple script like the one below did the job just fine.

addEventListener('fetch', event => {
  event.respondWith(handleRequest(event.request))
})

/**
 * Forward the request and swap the response's CSP header
 * @param {Request} request
 */
async function handleRequest(request) {
  const policy = "<your-custom-policy-here>"
  // Forward the request to the origin server
  const originalResponse = await fetch(request)
  // Build a mutable copy of the response and swap the header
  const response = new Response(originalResponse.body, originalResponse)
  response.headers.set('Content-Security-Policy', policy)
  return response
}

The script listens for each request and passes it to a handler function, which forwards the request to the origin server, builds a mutable copy of the response, replaces the CSP header with the defined policy and then returns it.

If I needed something more complex, like slightly changing the policy depending on the User-Agent (to make sure different browsers behave as expected, given their different implementations or compatibility issues), it would also be easy. This is something that would be harder to achieve in the config file of a regular web server (Nginx, Apache, etc.).

Enabling the worker

Now that the script is done and the worker deployed, in order to make it run on certain requests to my blog I just had to go to the Cloudflare dashboard of my domain, click on the “Workers” section and add the routes I want it to be executed on:

[Image: Configuring the routes that will use the worker.]

The settings displayed in the picture above will run the worker on all requests to this blog, but it can be made more specific, and I can even have multiple workers for different routes.

Some sort of conclusion

Despite the use-case described in this post being very simple, there is potential in this new “serverless” offering from Cloudflare. It definitely helped me solve the problem of having different policies for different sections of the website without much trouble.

In the future I might come back to it, to explore other use-cases or implementation details.

Categories
Software Development

Rust examples and exercises

Learning to program in Rust is not as easy as in many other languages out there, because it ends up having different constraints and new concepts that you will have to go through; in the beginning everybody fights the compiler at least a little bit.

I started this journey a while ago; however, I’ve been progressing slowly, just dedicating some time once in a while when I don’t have anything else to do.

I did what many recommendations on the internet tell you to do: start by reading the official book, which is in fact pretty good. But after reading one or two chapters, you need to practice and play with the language to get a feel for it and explore the concepts you have just learned.

So in this small post I just want to share two open resources that you can use while reading the book, to practice what you have just learned.

The first one is a website with examples you can modify and execute live in the browser called Rust by Example.

The second is an official Rust project that will put your knowledge to the test, called Rustlings.

You can run the exercises one by one, or use rustlings watch, which reloads each exercise until you solve it.

This is it. I hope they end up being helpful to someone else as well.

Categories
Software Development

Keep your dependencies under check

Nowadays most software projects with a “decent size” rely on many software dependencies, or in other words: libraries and tools developed by other people, which are usually under constant change.

The reasons for this are clear and range from implementing common patterns and avoiding repetition, to accelerating development, to relying on mature implementations that avoid some pitfalls. Sometimes projects rely on way too many dependencies for simple things (remember the left-pad fiasco?).

Once these dependencies are loaded, integrated and working as expected, people often forget they are there, and many times they stay untouched for long periods. Even when newer versions are released, unless something starts breaking nobody remembers to keep them up to date; a situation that might lead to security vulnerabilities, not in your code, but in the code your project depends on.

Of course I’m not telling you anything new. What I intend to achieve with this post is to show that there are many tools available to help you fight this problem. When you integrate them in your CI or in another step of your development process, they will keep you informed about which dependencies have known security vulnerabilities and what you should upgrade as soon as possible.

The majority of the programming languages have this sort of tool, so a little search should help you find the one that best suits your stack. Some examples are safety for Python, npm audit for JavaScript, bundler-audit for Ruby and cargo-audit for Rust.

As an example, here is what I needed to do in order to check the dependencies of Hawkpost (an open-source project that I’m deeply involved with at the moment):

$ safety check --full-report -r requirements/requirements.txt
safety report
---
No known security vulnerabilities found

For most of these tools the basic check is this simple to run, and in the long run it might save you from some headaches.

Update (26-06-2018): Added cargo-audit to the list