Categories
Python

Django Friday Tips: Subresource Integrity

As you might have guessed from the title, today’s tip is about how to add “Subresource integrity” (SRI) checks to your website’s static assets.

First lets see what SRI is. According to the Mozilla’s Developers Network:

Subresource Integrity (SRI) is a security feature that enables browsers to verify that resources they fetch (for example, from a CDN) are delivered without unexpected manipulation. It works by allowing you to provide a cryptographic hash that a fetched resource must match.

Source: MDN

So basically, if you don’t serve all your static assets and rely on any sort of external provider, you can force the browser to check that the delivered contents are exactly the ones you expect.

To trigger that behavior you just need to add the hash of the content to the integrity attribute of the <script> and/or <link> elements in question.

Something like this:

<script src="https://cdn.jsdelivr.net/npm/vue@2.6.12/dist/vue.min.js" integrity="sha256-KSlsysqp7TXtFo/FHjb1T9b425x3hrvzjMWaJyKbpcI=" crossorigin="anonymous"></script>

Using SRI in a Django project

This is all very nice but adding this info manually isn’t that fun or even practical, when your resources might change frequently or are built dynamically on each deployment.

To help with this task I recently found a little tool called django-sri that automates these steps for you (and is compatible with whitenoise if you happen to use it).

After the install, you just need to replace the {% static ... %} tags in your templates with the new one provided by this package ({% sri_static .. %}) and the integrity attribute will be automatically added.

Categories
Software Development

Documentation done right

One critical piece of the software development process that often gets neglected by companies and also by many open-source projects is explaining how it works and how it can be used to solve the problem in question.

Documentation is often lacking and people have an hard time figuring out how they can use or contribute to a particular piece of software. I think most developers and users have faced this situation at least once.

Looking at it from the other side, it isn’t always easy to pick and share the right information so others can hit the ground running. The fact that not everybody is starting from the same point and have the same goal, makes the job a bit harder.

One approach to solve this problem that I like is Divio’s documentation system.

Divio's documentation system explained. Showing the 4 quadrants and their relations.
The components of Divio’s documentation system.

It splits the problems in 4 areas targeting different stages and needs of the person reading the documentation. Django uses this system and is frequently praised for having great documentation.

From a user point of view it looks solid. You should take a look and apply it on your packages/projects, I surely will.

Categories
Python

Django Friday Tips: Permissions in the Admin

In this year’s first issue of my irregular Django quick tips series, lets look at the builtin tools available for managing access control.

The framework offers a comprehensive authentication and authorization system that is able to handle the common requirements of most websites without even needing any external library.

Most of the time, simple websites only make use of the “authentication” features, such as registration, login and logout. On more complex systems only authenticating the users is not enough, since different users or even groups of users will have access to distinct sets of features and data records.

This is when the “authorization” / access control features become handy. As you will see they are very simple to use as soon as you understand the implementation and concepts behind them. Today I’m gonna focus on how to use these permissions on the Admin, perhaps in a future post I can address the usage of permissions on other situations. In any case Django has excellent documentation, so a quick visit to this page will tell you what you need to know.

Under the hood

Simplified Entity-Relationship diagram of Django's authentication and authorization features.
ER diagram of Django’s “auth” package

The above picture is a quick illustration of how this feature is laid out in the database. So a User can belong to multiple groups and have multiple permissions, each Group can also have multiple permissions. So a user has a given permission if it is directly associated with him or or if it is associated with a group the user belongs to.

When a new model is added 4 permissions are created for that particular model, later if we need more we can manually add them. Those permissions are <app>.add_<model>, <app>.view_<model>, <app>.update_<model> and <app>.delete_<model>.

For demonstration purposes I will start with these to show how the admin behaves and then show how to implement an action that’s only executed if the user has the right permission.

The scenario

Lets image we have a “store” with some items being sold and it also has a blog to announce new products and promotions. Here’s what the admin looks like for the “superuser”:

Admin view, with all models being displayed.
The admin showing all the available models

We have several models for the described functionality and on the right you can see that I added a test user. At the start, this test user is just marked as regular “staff” (is_staff=True), without any permissions. For him the admin looks like this:

A view of Django admin without any model listed.
No permissions

After logging in, he can’t do anything. The store manager needs the test user to be able to view and edit articles on their blog. Since we expect in the future that multiple users will be able to do this, instead of assigning these permissions directly, lets create a group called “editors” and assign those permissions to that group.

Only two permissions for this group of users

Afterwards we also add the test user to that group (in the user details page). Then when he checks the admin he can see and edit the articles as desired, but not add or delete them.

Screenshot of the Django admin, from the perspective of a user with only "view" and "change" permissions.
No “Add” button there

The actions

Down the line, the test user starts doing other kinds of tasks, one of them being “reviewing the orders and then, if everything is correct, mark them as ready for shipment”. In this case, we don’t want him to be able to edit the order details or change anything else, so the existing “update” permissions cannot be used.

What we need now is to create a custom admin action and a new permission that would let specific users (or groups) execute that action. Lets start with the later:

class Order(models.Model):
    ...
    class Meta:
        ...
        permissions = [("set_order_ready", "Can mark the order as ready for shipment")]

What we are doing above, is telling Django there is one more permission that should be created for this model, a permission that we will use ourselves.

Once this is done (you need to run manage.py migrate), we can now create the action and ensure we check that the user executing it has the newly created permission:

class OrderAdmin(admin.ModelAdmin):
    ...
    actions = ["mark_as_ready"]

    def mark_as_ready(self, request, queryset):
        if request.user.has_perm("shop.set_order_ready"):
            queryset.update(ready=True)
            self.message_user(
                request, "Selected orders marked as ready", messages.SUCCESS
            )
        else:
            self.message_user(
                request, "You are not allowed to execute this action", messages.ERROR
            )

    mark_as_ready.short_description = "Mark selected orders as ready"

As you can see, we first check the user as the right permission, using has_perm and the newly defined permission name before proceeding with the changes.

And boom .. now we have this new feature that only lets certain users mark the orders as ready for shipment. If we try to execute this action with the test user (that does not have yet the required permission):

No permission assigned, no action for you sir

Finally we just add the permission to the user and it’s done. For today this is it, I hope you find it useful.

Categories
Software Development Technology and Internet

Mirroring GitHub Repositories

Git by itself is a distributed version control system (a very popular one), but over the years organizations started to rely on some internet services to manage their repositories and those services eventually become the central/single source of truth for their code.

The most well known service out there is GitHub (now owned by Microsoft), which nowadays is synonymous of git for a huge amount of people. Many other services exist, such as Gitlab and BitBucked, but GitHub gained a notoriety above all others, specially for hosting small (and some large) open source projects.

These centralized services provide many more features that help managing, testing and deploying software. Functionality not directly related to the main purpose of git.

Relying on these central services is very useful but as everything in life, it is a trade-off. Many large open source organizations don’t rely on these companies (such as KDE, Gnome, Debian, etc), because the risks involved are not worth the convenience of letting these platforms host their code and other data.

Over time we have been witnessing some of these risks, such as your project (and all the related data) being taken down without you having any chance to defend yourself (Example 1 and Example 2). Very similar to what some content creators have been experiencing with Youtube (I really like this one).

When this happens, your or your organizations don’t lose the code itself since you almost certainly have copies on your own devices (thanks to git), but you lose everything else, issues, projects, automated actions, documentation and essentially the known used by URL of your project.

Since Github is just too convenient to collaborate with other people, we can’t just leave. In this post I explain an easy alternative to minimize the risks described above, that I implemented myself after reading many guides and tools made by others that also tried to address this problem before.

The main idea is to automatically mirror everything in a machine that I own and make it publicly available side by side with the GitHub URLs, the work will still be done in Github but can be easily switched over if something happens.

The software

To achieve the desired outcome I’ve researched a few tools and the one that seemed to fit all my requirements (work with git and be lightweight) was “Gitea“. Next I will describe the steps I took.

The Setup

This part was very simple, I just followed the instructions present on the documentation for a docker based install. Something like this:

version: "3"

networks:
  gitea:
    external: false

services:
  server:
    image: gitea/gitea:latest
    container_name: gitea
    environment:
      - USER_UID=1000
      - USER_GID=1000
    restart: always
    networks:
      - gitea
    volumes:
      - ./gitea:/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      - "3000:3000"
      - "222:22"

If you are doing the same, don’t copy the snippet above. Take look here for updated instructions.

Since my website is not supposed to have much concurrent activity, using an SQLite database is more than enough. So after launching the container, I chose this database type and made sure I disabled the all the functionality I won’t need.

Part of Gitea's configuration page. Server and Third-Party Service Settings.
Part of the Gitea’s configuration page

After this step, you should be logged in as an admin. The next step is to create a new migration on the top right menu. We just need to choose the “Github” option and continue. You should see the below screen:

Screenshot of the page that lets the users create a new migration/mirror in Gitea.
Creating a new Github migration/mirror in Gitea.

If you choose This repository will be a mirror option, Gitea will keep your repository and wiki in sync with the original, but unfortunately it will not do the same for issues, labels, milestones and releases. So if you need that information, the best approach is to uncheck this field and do a normal migration. To keep that information updated you will have to repeat this process periodically.

Once migrated, do the same for your other repositories.

Conclusion

Having an alternative with a backup of the general Github data ended up being quite easy to set up. However the mirror feature would be much more valuable if it included the other items available on the standard migration.

During my research for solutions, I found Fossil, which looks very interesting and something that I would like to explore in the future, but at the moment all repositories are based on Git and for practical reasons that won’t change for the time being.

With this change, my public repositories can be found in:

Categories
Python

Django Friday Tips: Inspecting ORM queries

Today lets look at the tools Django provides out of the box to debug the queries made to the database using the ORM.

This isn’t an uncommon task. Almost everyone who works on a non-trivial Django application faces situations where the ORM does not return the correct data or a particular operation as taking too long.

The best way to understand what is happening behind the scenes when you build database queries using your defined models, managers and querysets, is to look at the resulting SQL.

The standard way of doing this is to set the logging configuration to print all queries done by the ORM to the console. This way when you browse your website you can check them in real time. Here is an example config:

LOGGING = {
    ...
    'handlers': {
        'console': {
            'level': 'DEBUG',
            'filters': ['require_debug_true'],
            'class': 'logging.StreamHandler',
        },
        ...
    },
    'loggers': {
        ...
        'django.db.backends': {
            'level': 'DEBUG',
            'handlers': ['console', ],
        },
    }
}

The result will be something like this:

...
web_1     | (0.001) SELECT MAX("axes_accessattempt"."failures_since_start") AS "failures_since_start__max" FROM "axes_accessattempt" WHERE ("axes_accessattempt"."ip_address" = '172.18.0.1'::inet AND "axes_accessattempt"."attempt_time" >= '2020-09-18T17:43:19.844650+00:00'::timestamptz); args=(Inet('172.18.0.1'), datetime.datetime(2020, 9, 18, 17, 43, 19, 844650, tzinfo=<UTC>))
web_1     | (0.001) SELECT MAX("axes_accessattempt"."failures_since_start") AS "failures_since_start__max" FROM "axes_accessattempt" WHERE ("axes_accessattempt"."ip_address" = '172.18.0.1'::inet AND "axes_accessattempt"."attempt_time" >= '2020-09-18T17:43:19.844650+00:00'::timestamptz); args=(Inet('172.18.0.1'), datetime.datetime(2020, 9, 18, 17, 43, 19, 844650, tzinfo=<UTC>))
web_1     | Bad Request: /users/login/
web_1     | [18/Sep/2020 18:43:20] "POST /users/login/ HTTP/1.1" 400 2687

Note: The console output will get a bit noisy

Now lets suppose this logging config is turned off by default (for example, in a staging server). You are manually debugging your app using the Django shell and doing some queries to inspect the resulting data. In this case str(queryset.query) is very helpful to check if the query you have built is the one you intended to. Here’s an example:

>>> box_qs = Box.objects.filter(expires_at__gt=timezone.now()).exclude(owner_id=10)
>>> str(box_qs.query)
'SELECT "boxes_box"."id", "boxes_box"."name", "boxes_box"."description", "boxes_box"."uuid", "boxes_box"."owner_id", "boxes_box"."created_at", "boxes_box"."updated_at", "boxes_box"."expires_at", "boxes_box"."status", "boxes_box"."max_messages", "boxes_box"."last_sent_at" FROM "boxes_box" WHERE ("boxes_box"."expires_at" > 2020-09-18 18:06:25.535802+00:00 AND NOT ("boxes_box"."owner_id" = 10))'

If the problem is related to performance, you can check the query plan to see if it hits the right indexes using the .explain() method, like you would normally do in SQL.

>>> print(box_qs.explain(verbose=True))
Seq Scan on public.boxes_box  (cost=0.00..13.00 rows=66 width=370)
  Output: id, name, description, uuid, owner_id, created_at, updated_at, expires_at, status, max_messages, last_sent_at
  Filter: ((boxes_box.expires_at > '2020-09-18 18:06:25.535802+00'::timestamp with time zone) AND (boxes_box.owner_id <> 10))

This is it, I hope you find it useful.

Categories
Random Bits Technology and Internet

Giving a new life to old phones

Nowadays, in some “developed” countries, it is very common for people to have a bunch of old phones stored somewhere in a drawer. Ten years have passed since smartphones became ubiquitous and those devices tend to become unusable very quickly, at least for their primary purpose. Either a small component breaks, the vendor stops providing updates, newer apps don’t support those older versions, etc.

The thing is, these phones are still powerful computers. It would be great if we could give them another life once they are no longer fit for regular day to day use or the owner just wants to try a shiny new device.

I never had many smartphones, mines tend to last many years, but I still have one or two lying around. Recently I started thinking of new uses for them, make them work instead of just gathering dust. A quick search on the internet tells me that many people already had the same idea (I’m quite late to the party) and have been working on cool things to do with these devices.

However, most of these articles just throw the idea at you, without telling you how to do it. Others assume that your device is relatively recent.

Of course the difficulty increases with the age of the phone, in my case the software that I will be able to run on a 10 year old Samsung Galaxy S will not be as easy to find as the software that I can run on another device with just one or two years.

Bellow is a list posts I found online with cool things you can do with your old phones. What sets this list apart from other results is that all the items aren’t just ideas, they contain step by step instructions of how to achieve the end result.

You don’t have to follow the provided instructions rigorously and you should introduce some variations that are more appropriate to your use case.

Have fun and reuse your old devices.

Categories
Python Software Development

Why you shouldn’t remove your package from PyPI

Nowadays most software developed using the Python language relies on external packages (dependencies) to get the job done. Correctly managing this “supply-chain” ends up being very important and having a big impact on the end product.

As a developer you should be cautious about the dependencies you include on your project, as I explained in a previous post, but you are always dependent on the job done by the maintainers of those packages.

As a public package owner/maintainer, you also have to be aware that the code you write, your decisions and your actions will have an impact on the projects that depend directly or indirectly on your package.

With this small introduction we arrive to the topic of this post, which is “What to do as a maintainer when you no longer want to support a given package?” or ” How to properly rename my package?”.

In both of these situations you might think “I will start by removing the package from PyPI”, I hope the next lines will convince you that this is the worst you can do, for two reasons:

  • You will break the code or the build systems of all projects that depend on the current or past versions of your package.
  • You will free the namespace for others to use and if your package is popular enough this might become a juicy target for any malicious actor.

TLDR: your will screw your “users”.

The left-pad incident, while it didn’t happen in the python ecosystem, is a well known example of the first point and shows what happens when a popular package gets removed from the public index.

Malicious actors usually register packages using names that are similar to other popular packages with the hope that a user will end up installing them by mistake, something that already has been found multiple times on PyPI. Now imagine if that package name suddenly becomes available and is already trusted by other projects.

What should you do it then?

Just don’t delete the package.

I admit that in some rare occasions it might be required, but most of the time the best thing to do is to leave it there (specially for open-source ones).

Adding a warning to the code and informing the users in the README file that the package is no longer maintained or safe to use is also a nice thing to do.

A good example of this process being done properly was the renaming of model-mommy to model-bakery, as a user it was painless. Here’s an overview of the steps they took:

  1. A new source code repository was created with the same contents. (This step is optional)
  2. After doing the required changes a new package was uploaded to PyPI.
  3. Deprecation warnings were added to the old code, mentioning the new package.
  4. The documentation was updated mentioning the new package and making it clear the old package will no longer be maintained.
  5. A new release of the old package was created, so the user could see the deprecation warnings.
  6. All further development was done on the new package.
  7. The old code repository was archived.

So here is what is shown every time the test suite of an affected project is executed:

/lib/python3.7/site-packages/model_mommy/__init__.py:7: DeprecationWarning: Important: model_mommy is no longer maintained. Please use model_bakery instead: https://pypi.org/project/model-bakery/

In the end, even though I didn’t update right away, everything kept working and I was constantly reminded that I needed to make the change.

Categories
Random Bits Technology and Internet

Dynamic DNS using Cloudflare Workers

In this post I’ll try to describe a simple solution, that I came up with, to solve the issue of dynamically updating DNS records when the IP addresses of your machines/instances changes frequently.

While Dynamic DNS isn’t a new thing and many services/tools around the internet already provide solutions to this problem (for more than 2 decades), I had a few requirements that ruled out most of them:

  • I didn’t want to sign up to a new account in one of these external services.
  • I would prefer to use a domain name under my control.
  • I don’t trust the machine/instance that executes the update agent, so according to the principle of the least privilege, the client should only able to update one DNS record.

The first and second points rule out the usual DDNS service providers and the third point forbids me from using the Cloudflare API as is (like it is done in other blog posts), since the permissions we are allowed to setup for a new API token aren’t granular enough to only allow access to a single DNS record, at best I would’ve to give access to all records under that domain.

My solution to the problem at hand was to put a worker is front of the API, basically delegating half of the work to this “serverless function”. The flow is the following;

  • agent gets IP address and timestamp
  • agent signs the data using a previously known key
  • agent contacts the worker
  • worker verifies signature, IP address and timestamp
  • worker fetches DNS record info of a predefined subdomain
  • If the IP address is the same, nothing needs to be done
  • If the IP address is different, worker updates DNS record
  • worker notifies the agent of the outcome

Nothing too fancy or clever, right? But is works like a charm.

I’ve published my implementation on GitHub with a FOSS license, so anyone can modify and reuse. It doesn’t require any extra dependencies, it consists of only two files and you just need to drop them at the right locations and you’re ready to go. The repository can be found here and the README.md contains the detailed steps to deploy it.

There are other small features that could be implemented, such as using the same worker with several agents that need to update different records, so only one of these “serverless functions” would be required. But these improvements will have wait for another time, for now I just needed something that worked well for this particular case and that could be easily deployed in a short time.

Categories
Security

Security.txt

Some days ago while scrolling my mastodon‘s feed (for those who don’t know it is like Tweeter but instead of being a single website, the whole network is composed by many different entities that interact with each other), I found the following message:

To server admins:

It is a good practice to provide contact details, so others can contact you in case of security vulnerabilities or questions regarding your privacy policy.

One upcoming but already widespread format is the security.txt file at https://your-server/.well-known/security.txt.

See https://securitytxt.org/ and https://infosec-handbook.eu/.well-known/security.txt.

@infosechandbook@chaos.social

It caught my attention because my personal domain didn’t had one at the time. I’ve added it to other projects in the past, but do I need one for a personal domain?

After some thought, I couldn’t find any reason why I shouldn’t add one in this particular case. So as you might already have guessed, this post is about the steps I took to add it to my domain.

What is it?

A small text file, just like robots.txt, placed in a well known location, containing details about procedures, contacts and other key information required for security professionals to properly disclose their findings.

Or in other words: Contact details in a text file.

security.txt isn’t yet an official standard (still a draft) but it addresses a common issue that security researches encounter during their day to day activity: sometimes it’s harder to report a problem than it is to find it. I always remember the case of a Portuguese citizen, who spent ~5 months trying to contact someone that could fix some serious vulnerabilities in a governmental website.

Even though it isn’t an accepted standard yet, it’s already being used in the wild:

Need more examples? A small search finds it for you very quickly or you can also read here a small analysis of the current status on Alexa’s top 1000 websites.

Implementation

So to help the cause I added one for this domain. It can be found at https://ovalerio.net/.well-known/security.txt

Below are the steps I took:

  1. Go to https://securitytxt.org/ and fill the required fields of the form present on that website.
  2. Fill the extra fields if they apply.
  3. Generate the text document.
  4. Sign the content using your PGP key
    gpg --clear-sign security.txt
  5. Publish the signed file on your domain under https://<domain>/.well-known/security.txt

As you can see, this is a very low effort task and it can generate very high returns, if it leads to a disclosure of a serious vulnerability that otherwise would have gone unreported.

Categories
Python

Django Friday Tips: Feature Flags

This time, as you can deduce from the title, I will address the topic of how to use feature flags on Django websites and applications. This is an incredible functionality to have, specially if you need to continuously roll new code to production environments that might not be ready to be released.

But first what are Feature Flags? The Wikipedia tells us this:

A feature toggle (also feature switch, feature flag, …) is a technique in software development that attempts to provide an alternative to maintaining multiple branches in source code (known as feature branches), such that a software feature can be tested even before it is completed and ready for release. Feature toggle is used to hide, enable or disable the feature during runtime.

Wikipedia

It seems a pretty clear explanation and it gives us a glimpse of the potential of having this capability in a given project. Exploring the concept a bit more it uncovers a nice set of possibilities and use cases, such as:

  • Canary Releases
  • Instant Rollbacks
  • AB Testing
  • Testing features with production data

To dive further into the concept I recommend starting by reading this article, that gives you a very detailed explanation of the overall idea.

In the rest of the post I will describe how this kind of functionality can easily be included in a standard Django application. Overtime many packages were built to solve this problem however most aren’t maintained anymore, so for this post I picked django-waffle given it’s one of the few that are still in active development.

As an example scenario lets image a company that provides a suite of online office tools and is currently in the process of introducing a new product while redoing the main website’s design. The team wants some trusted users and the developers to have access to the unfinished product in production and a small group of random users to view the new design.

With the above scenario in mind, we start by install the package and adding it to our project by following the instructions present on the official documentation.

Now picking the /products page that is supposed to displays the list of existing products, we can implement it this way:

# views.py
from django.shortcuts import render

from waffle import flag_is_active


def products(request):
    if flag_is_active(request, "new-design"):
        return render(request, "new-design/product_list.html")
    else:
        return render(request, "product_list.html")
# templates/products.html
{% load waffle_tags %}

<!DOCTYPE html>
<html>
<head>
    <title>Available Products</title>
</head>
<body>
    <ul>
        <li><a href="/spreadsheets">Spreadsheet</a></li>
        <li><a href="/presentations">Presentation</a></li>
        <li><a href="/chat">Chat</a></li>
        <li><a href="/emails">Marketing emails</a></li>
        {% flag "document-manager" %}
            <li><a href="/documents"></a>Document manager</li>
        {% endflag %}
    </ul>
</body>
</html>

You can see above that 2 conditions are checked while processing a given request. These conditions are the flags, which are models on the database with certain criteria that will be evaluated against the provided request in order to determine if they are active or not.

Now on the database we can config the behavior of this code by editing the flag objects. Here are the two objects that I created (retrieved using the dumpdata command):

  {
    "model": "waffle.flag",
    "pk": 1,
    "fields": {
      "name": "new-design",
      "everyone": null,
      "percent": "2.0",
      "testing": false,
      "superusers": false,
      "staff": false,
      "authenticated": false,
      "languages": "",
      "rollout": false,
      "note": "",
      "created": "2020-04-17T18:41:31Z",
      "modified": "2020-04-17T18:51:10.383Z",
      "groups": [],
      "users": []
    }
  },
  {
    "model": "waffle.flag",
    "pk": 2,
    "fields": {
      "name": "document-manager",
      "everyone": null,
      "percent": null,
      "testing": false,
      "superusers": true,
      "staff": false,
      "authenticated": false,
      "languages": "",
      "rollout": false,
      "note": "",
      "created": "2020-04-17T18:43:27Z",
      "modified": "2020-04-17T19:02:31.762Z",
      "groups": [
        1,  # Dev Team
        2   # Beta Customers
      ],
      "users": []
    }
  }

So in this case new-design is available to 2% of the users and document-manager only for the Dev Team and Beta Customers user groups.

And for today this is it.