Tag: Python

  • Why you shouldn’t remove your package from PyPI

    Nowadays most software developed using the Python language relies on external packages (dependencies) to get the job done. Correctly managing this “supply chain” is very important and has a big impact on the end product.

    As a developer you should be cautious about the dependencies you include in your project, as I explained in a previous post, but you always depend on the work done by the maintainers of those packages.

    As a public package owner/maintainer, you also have to be aware that the code you write, your decisions and your actions will have an impact on the projects that depend directly or indirectly on your package.

    With this small introduction we arrive at the topic of this post, which is “What to do as a maintainer when you no longer want to support a given package?” or “How to properly rename my package?”.

    In both of these situations you might think “I will start by removing the package from PyPI”. I hope the next lines will convince you that this is the worst thing you can do, for two reasons:

    • You will break the code or the build systems of all projects that depend on the current or past versions of your package.
    • You will free the namespace for others to use, and if your package is popular enough it might become a juicy target for malicious actors.

    TL;DR: you will screw your “users”.

    The left-pad incident, while it didn’t happen in the Python ecosystem, is a well-known example of the first point and shows what happens when a popular package gets removed from the public index.

    Malicious actors usually register packages using names that are similar to other popular packages, hoping that a user will end up installing them by mistake, something that has already been found multiple times on PyPI. Now imagine if a trusted package name suddenly becomes available for anyone to claim.

    What should you do then?

    Just don’t delete the package.

    I admit that on some rare occasions it might be required, but most of the time the best thing to do is to leave it there (especially for open-source packages).

    Adding a warning to the code and informing the users in the README file that the package is no longer maintained or safe to use is also a nice thing to do.

    A good example of this process being done properly was the renaming of model-mommy to model-bakery; as a user, the transition was painless. Here’s an overview of the steps they took:

    1. A new source code repository was created with the same contents. (This step is optional)
    2. After doing the required changes a new package was uploaded to PyPI.
    3. Deprecation warnings were added to the old code, mentioning the new package.
    4. The documentation was updated mentioning the new package and making it clear the old package will no longer be maintained.
    5. A new release of the old package was created, so the user could see the deprecation warnings.
    6. All further development was done on the new package.
    7. The old code repository was archived.
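    Step 3 above can be as simple as emitting a warning when the old package is imported. Here is a minimal sketch of that idea (the package and message names are made up for illustration):

    ```python
    # old_package/__init__.py  (hypothetical deprecated package)
    import warnings

    # Emitted once, on import of the deprecated package
    warnings.warn(
        "Important: old_package is no longer maintained. "
        "Please use new_package instead.",
        DeprecationWarning,
    )
    ```

    Test runners typically surface these warnings, which is exactly how affected users end up being reminded.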

    So here is what is shown every time the test suite of an affected project is executed:

    /lib/python3.7/site-packages/model_mommy/__init__.py:7: DeprecationWarning: Important: model_mommy is no longer maintained. Please use model_bakery instead: https://pypi.org/project/model-bakery/

    In the end, even though I didn’t update right away, everything kept working and I was constantly reminded that I needed to make the change.

  • Django Friday Tips: Feature Flags

    This time, as you can deduce from the title, I will address the topic of how to use feature flags on Django websites and applications. This is incredibly useful functionality to have, especially if you need to continuously roll out new code to production environments when it might not be ready to be released.

    But first, what are feature flags? Wikipedia tells us this:

    A feature toggle (also feature switch, feature flag, …) is a technique in software development that attempts to provide an alternative to maintaining multiple branches in source code (known as feature branches), such that a software feature can be tested even before it is completed and ready for release. Feature toggle is used to hide, enable or disable the feature during runtime.

    Wikipedia

    It’s a pretty clear explanation and it gives us a glimpse of the potential of having this capability in a given project. Exploring the concept a bit more uncovers a nice set of possibilities and use cases, such as:

    • Canary Releases
    • Instant Rollbacks
    • A/B Testing
    • Testing features with production data

    To dive further into the concept I recommend starting by reading this article, which gives you a very detailed explanation of the overall idea.

    In the rest of the post I will describe how this kind of functionality can easily be included in a standard Django application. Over time many packages were built to solve this problem, however most aren’t maintained anymore, so for this post I picked django-waffle, given it’s one of the few still in active development.

    As an example scenario, let’s imagine a company that provides a suite of online office tools and is currently in the process of introducing a new product while redoing the main website’s design. The team wants some trusted users and the developers to have access to the unfinished product in production, and a small group of random users to view the new design.

    With the above scenario in mind, we start by installing the package and adding it to our project, following the instructions in the official documentation.

    Now, picking the /products page that is supposed to display the list of existing products, we can implement it this way:

    # views.py
    from django.shortcuts import render
    
    from waffle import flag_is_active
    
    
    def products(request):
        if flag_is_active(request, "new-design"):
            return render(request, "new-design/product_list.html")
        else:
            return render(request, "product_list.html")
    # templates/product_list.html
    {% load waffle_tags %}
    
    <!DOCTYPE html>
    <html>
    <head>
        <title>Available Products</title>
    </head>
    <body>
        <ul>
            <li><a href="/spreadsheets">Spreadsheet</a></li>
            <li><a href="/presentations">Presentation</a></li>
            <li><a href="/chat">Chat</a></li>
            <li><a href="/emails">Marketing emails</a></li>
            {% flag "document-manager" %}
            <li><a href="/documents">Document manager</a></li>
            {% endflag %}
        </ul>
    </body>
    </html>

    You can see above that two conditions are checked while processing a given request. These conditions are the flags: models in the database with certain criteria that are evaluated against the provided request in order to determine whether they are active or not.

    Now, in the database, we can configure the behavior of this code by editing the flag objects. Here are the two objects that I created (retrieved using the dumpdata command):

      {
        "model": "waffle.flag",
        "pk": 1,
        "fields": {
          "name": "new-design",
          "everyone": null,
          "percent": "2.0",
          "testing": false,
          "superusers": false,
          "staff": false,
          "authenticated": false,
          "languages": "",
          "rollout": false,
          "note": "",
          "created": "2020-04-17T18:41:31Z",
          "modified": "2020-04-17T18:51:10.383Z",
          "groups": [],
          "users": []
        }
      },
      {
        "model": "waffle.flag",
        "pk": 2,
        "fields": {
          "name": "document-manager",
          "everyone": null,
          "percent": null,
          "testing": false,
          "superusers": true,
          "staff": false,
          "authenticated": false,
          "languages": "",
          "rollout": false,
          "note": "",
          "created": "2020-04-17T18:43:27Z",
          "modified": "2020-04-17T19:02:31.762Z",
          "groups": [
            1,  # Dev Team
            2   # Beta Customers
          ],
          "users": []
        }
      }

    So in this case new-design is available to 2% of the users, while document-manager is only available to the Dev Team and Beta Customers user groups.
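    Under the hood, a percentage flag like new-design amounts to placing each visitor in a bucket and activating the flag when the bucket falls below the threshold (waffle itself uses a random draw plus a cookie to keep the decision sticky across requests). Here is a rough, self-contained sketch of a deterministic variant, with made-up names:

    ```python
    import hashlib


    def is_active_for(visitor_id: str, percent: float) -> bool:
        # Hash the visitor into a stable bucket between 0.0 and 99.9,
        # then compare it against the rollout percentage
        digest = hashlib.sha256(visitor_id.encode()).hexdigest()
        bucket = (int(digest, 16) % 1000) / 10
        return bucket < percent


    # Roughly 2% of a large population should get the flag
    active = sum(is_active_for(f"user-{i}", 2.0) for i in range(10_000))
    print(f"{active} of 10000 visitors see the new design")
    ```

    The advantage of a deterministic bucket is that the same visitor always gets the same answer without storing anything, at the cost of the assignment no longer being random per request.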

    And for today this is it.

  • Django Friday Tips: Testing emails

    I haven’t written one of these supposedly weekly posts with small Django tips for a while, but at least I always post them on Fridays.

    This time I’m gonna address how we can test emails with the tools that Django provides and, more precisely, how to check the attachments of those emails.

    The testing behavior of emails is very well documented (Django’s documentation is one of the best I’ve seen) and can be found here.

    Summing it up, if you want to test some business logic that sends an email, Django replaces the EMAIL_BACKEND setting with a testing backend during the execution of your test suite and makes the outbox available through django.core.mail.outbox.

    But what about attachments? Since each item in the testing outbox is an instance of the EmailMessage class, it contains an attribute named “attachments” (surprise!) that is a list of tuples with all the relevant information:

    ("<filename>", "<contents>", "<mime type>")

    Here is an example:

    # utils.py
    from django.core.mail import EmailMessage
    
    
    def some_function_that_sends_emails():
        msg = EmailMessage(
            subject="Example email",
            body="This is the content of the email",
            from_email="some@email.address",
            to=["destination@email.address"],
        )
        msg.attach("sometext.txt", "The content of the file", "text/plain")
        msg.send()
    
    
    # tests.py
    from django.test import TestCase
    from django.core import mail
    
    from .utils import some_function_that_sends_emails
    
    
    class ExampleEmailTest(TestCase):
        def test_example_function(self):
            some_function_that_sends_emails()
    
            self.assertEqual(len(mail.outbox), 1)
    
            email_message = mail.outbox[0]
            self.assertEqual(email_message.subject, "Example email")
            self.assertEqual(email_message.body, "This is the content of the email")
            self.assertEqual(len(email_message.attachments), 1)
    
            file_name, content, mimetype = email_message.attachments[0]
            self.assertEqual(file_name, "sometext.txt")
            self.assertEqual(content, "The content of the file")
            self.assertEqual(mimetype, "text/plain")

    If you are using pytest-django the same can be achieved with the mailoutbox fixture:

    import pytest
    
    from .utils import some_function_that_sends_emails
    
    
    def test_example_function(mailoutbox):
        some_function_that_sends_emails()
    
        assert len(mailoutbox) == 1
    
        email_message = mailoutbox[0]
        assert email_message.subject == "Example email"
        assert email_message.body == "This is the content of the email"
        assert len(email_message.attachments) == 1
    
        file_name, content, mimetype = email_message.attachments[0]
        assert file_name == "sometext.txt"
        assert content == "The content of the file"
        assert mimetype == "text/plain"

    And this is it for today.

  • 8 useful dev dependencies for django projects

    In this post I’m gonna list some very useful tools I often use when developing a Django project. These packages help me improve the development speed, write better code and also find/debug problems faster.

    So let’s start:

    Black

    This one is to avoid useless discussions about preferences and taste related to code formatting. Now I just install black and let it take care of these matters; it doesn’t have any configuration (with one or two exceptions) and, if your code does not have any syntax errors, it will be automatically formatted according to a “style” that is reasonable.

    Note: Many editors can be configured to automatically run black on every file save.

    https://github.com/python/black

    PyLint

    Using a code linter (a kind of static analysis tool) is also very easy; it can be integrated with your editor and allows you to catch many issues without even running your code, such as missing imports, unused variables, missing parentheses and other programming errors. There are a few other linters available, but in this case pylint does the job well and I never bothered to switch.

    https://www.pylint.org/

    Pytest

    Python has a unit testing framework included in its standard library (unittest) that works great, however I found that there is an external package that makes me more productive and my tests much clearer.

    That package is pytest and once you learn the concepts it is a joy to work with. A nice extra is that it recognizes your older unittest tests and is able to execute them anyway, so no need to refactor the test suite to start using it.
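    For instance, pytest lets you write plain assert statements instead of unittest’s assertEqual-style methods, and rewrites them to show both sides of the comparison when they fail. A minimal sketch (the function under test is just a made-up example):

    ```python
    # test_slugify.py -- a minimal pytest-style test module
    def slugify(title: str) -> str:
        """Tiny example function under test."""
        return title.lower().strip().replace(" ", "-")


    def test_slugify():
        # Plain asserts, no TestCase class or assertEqual needed
        assert slugify("Hello World") == "hello-world"
        assert slugify("  Django Tips ") == "django-tips"
    ```

    Running pytest in that folder picks the file up automatically, thanks to the test_ naming convention.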

    https://docs.pytest.org/en/latest/

    Pytest-django

    This package, as the name indicates, adds the required support and some useful utilities to test your Django projects using pytest. With it, instead of python manage.py test, you will just execute pytest like in any other python project.
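    The only wiring it needs is a pointer to your settings module, for example in a pytest.ini file at the project root (the project name below is a placeholder):

    ```ini
    # pytest.ini
    [pytest]
    DJANGO_SETTINGS_MODULE = yourproject.settings
    ```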

    https://pytest-django.readthedocs.io

    Django-debug-toolbar

    Debug toolbar is a web panel added to your pages that lets you inspect a request’s content, database queries, template generation, etc. It provides lots of useful information so the viewer can understand how the whole page rendering is behaving.

    It can also be extended with other plugins that provide more specific information, such as flamegraphs, HTML validators and other profilers.

    https://django-debug-toolbar.readthedocs.io

    Django-silk

    If you are developing an API without any HTML pages rendered by Django, django-debug-toolbar won’t provide much help. This is where django-silk shines, in my humble opinion: it provides many of the same metrics and information on a separate page that can be inspected to debug problems and find performance bottlenecks.

    https://github.com/jazzband/django-silk

    Django-extensions

    This package is a collection of small scripts providing functionality that is frequently needed. It contains a set of management commands, such as shell_plus and runserver_plus (improved versions of the default ones), database visualization tools, debugger tags for the templates, abstract model classes, etc.

    https://django-extensions.readthedocs.io

    Django-mail-panel

    Finally, this one is an email panel for django-debug-toolbar that lets you inspect the emails sent while developing your website/webapp. This way you don’t have to configure another service to catch the emails, or read the messages in the terminal with django.core.mail.backends.console.EmailBackend, which is not very useful if you are working with HTML templates.

    https://github.com/scuml/django-mail-panel

  • Channels and Webhooks

    Django is an awesome web framework for python and does a really good job, either for building websites or web APIs using Rest Framework. One area where it usually fell short was dealing with asynchronous functionality; that wasn’t its original purpose and it wasn’t even a big thing on the web at the time of its creation.

    The world moved on, web-sockets became a thing and suddenly there was a need to handle persistent connections and to deal with other flows “instead of” (or along with) the traditional request-response scheme.

    In the last few years there have been several cumbersome solutions to integrate web-sockets with Django; some people even moved to other python solutions (losing many of the goodies) in order to support this real-time functionality. And it is not just web-sockets: it can be any other kind of persistent connection and/or asynchronous protocol, in a microservice architecture for example.

    Of all the alternatives, the most developer-friendly seems to be django-channels, since it lets you keep using familiar Django design patterns and integrates in a way that feels like it really is part of the framework itself. Last year django-channels saw the release of its second iteration, with a completely different internal design, and it seems stable enough to start building cool things with, so that is what we will do in this post.

    Webhook logger

    In this blog post I’m gonna explore the version 2 of the package and evaluate how difficult it can be to implement a simple flow using websockets.

    Most of the tutorials I find on the web about this subject try to demonstrate the capabilities of “channels” by implementing a simple real-time chat solution. For this blog post I will try something different and perhaps more useful, at least for developers.

    I will build a simple service to test and debug webhooks (in reality any type of HTTP request). The functionality is minimal and can be described like this:

    • The user visits the website and is given a unique callback URL
    • All requests sent to that callback URL are displayed in the user’s browser in real time, with all the information about that request.
    • The user can use that URL in any service that sends requests/webhooks as asynchronous notifications.
    • Many people can have the page open and receive information about the incoming requests at the same time.
    • No data is stored; if the user reloads the page, they can only see new requests.

    In the end the implementation will not differ much from those chat versions, but at least we will end up with something that can be quite handy.

    Note: The final result can be checked on Github, if you prefer to explore while reading the rest of the article.

    Setting up the Django project

    The basic setup is identical to any other Django project: we just create a new one using django-admin startproject webhook_logger and then create a new app using python manage.py startapp callbacks (in this case I just named the app callbacks).

    Since we will not store any information we can remove all database related stuff and even any other extra functionality that will not be used, such as authentication related middleware. I did this on my repository, but it is completely optional and not in the scope of this small post.

    Installing “django-channels”

    After the project is set up we can add the missing piece, the django-channels package, running pip install channels==2.1.6. Then we need to add it to the installed apps:

    INSTALLED_APPS = [
        "django.contrib.staticfiles", 
        "channels", 
    ]

    For this project we will use Redis as a backend for the channel layer, so we need to also install the channels-redis package and add the required configuration:

    CHANNEL_LAYERS = {
        "default": {
            "BACKEND": "channels_redis.core.RedisChannelLayer",
            "CONFIG": {"hosts": [(os.environ.get("REDIS_URL", "127.0.0.1"), 6379)]},
        }
    }

    The above snippet assumes you are running a Redis server instance on your machine, but you can configure it using an environment variable.

    Adding websocket functionality

    When using “django channels” our code will not differ much from a standard Django app: we will still have our views, our models, our templates, etc. For the asynchronous interactions and protocols outside the standard HTTP request-response style, we will use a new concept, the Consumer, with its own routing file outside of the default urls.py.

    So let’s add these new files and configurations to our app. First, inside our app, let’s create a consumers.py file with the following contents:

    # callbacks/consumers.py
    from channels.generic.websocket import WebsocketConsumer
    from asgiref.sync import async_to_sync
    import json
    
    
    class WebhookConsumer(WebsocketConsumer):
        def connect(self):
            self.callback = self.scope["url_route"]["kwargs"]["uuid"]
            async_to_sync(self.channel_layer.group_add)(self.callback, self.channel_name)
            self.accept()
    
        def disconnect(self, close_code):
            async_to_sync(self.channel_layer.group_discard)(
                self.callback, self.channel_name
            )
    
        def receive(self, text_data):
            # Discard all received data
            pass
    
        def new_request(self, event):
            self.send(text_data=json.dumps(event["data"]))

    Basically we extend the standard WebsocketConsumer and override its methods. A consumer instance will be created for each websocket connection made to the server. Let me explain a little bit what is going on in the above snippet:

    • connect – When a new websocket connection is made, we check which callback it wants to receive information about and attach the consumer to the related group (a group is a way to broadcast a message to several consumers).
    • disconnect – As the name suggests, when we lose a connection we remove the “consumer” from the group.
    • receive – This is the standard method for receiving any data sent by the other end of the connection (in this case the browser). Since we do not want to receive any data, let’s just discard it.
    • new_request – This is a custom method for handling data about a given request/webhook received by the system. These messages are submitted to the group with the type new_request.

    You might also be a little confused by that async_to_sync function, which is imported and used to call channel_layer methods, but the explanation is simple: since those methods are asynchronous and our consumer is standard synchronous code, we have to execute them synchronously. That function and sync_to_async are two very helpful utilities for dealing with these scenarios; for details about how they work, please check this blog post.
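    In spirit, calling asynchronous code from a synchronous method means driving the coroutine to completion before returning, similar to this plain-asyncio sketch (async_to_sync itself handles many more cases, such as being called from a thread while an event loop is already running elsewhere):

    ```python
    import asyncio


    async def fetch_greeting() -> str:
        # Stand-in for an asynchronous channel-layer method
        await asyncio.sleep(0)
        return "hello"


    def sync_caller() -> str:
        # Roughly what async_to_sync does in the simplest case:
        # run the coroutine to completion and hand back its result
        return asyncio.run(fetch_greeting())


    print(sync_caller())
    ```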

    Now that we have a working consumer, we need to take care of the routing so it is accessible to the outside world. Let’s add an app-level routing.py file:

    # callbacks/routing.py
    from django.conf.urls import url
    
    from .consumers import WebhookConsumer
    
    websocket_urlpatterns = [url(r"^ws/callback/(?P<uuid>[^/]+)/$", WebhookConsumer)]

    Here we use a very similar pattern to the well-known urlpatterns to link our consumer class to connections on a certain URL. In this case our users can connect to a URL that contains the id (uuid) of the callback they want to be notified about.

    Finally, for our consumer to be available to the public, we need to create a root routing file for our project. It looks like this:

    # <project_name>/routing.py
    from channels.routing import ProtocolTypeRouter, URLRouter
    from callbacks.routing import websocket_urlpatterns
    
    application = ProtocolTypeRouter({"websocket": URLRouter(websocket_urlpatterns)})

    Here we use the ProtocolTypeRouter as the main entry point, and what it does is:

    It lets you dispatch to one of a number of other ASGI applications based on the type value present in the scope. Protocols will define a fixed type value that their scope contains, so you can use this to distinguish between incoming connection types.

    Django Channels Documentation

    We just defined the websocket protocol and used the URLRouter to point to our previously defined websocket URLs.

    The rest of the app

    At this moment we are able to accept new websocket connections and push live data to the connected clients through the consumer’s new_request method. However, we do not yet have any information to send, since we haven’t created the endpoints that will receive the requests and forward their data to our consumers.

    For this purpose, let’s create a simple class-based view. It will receive any type of HTTP request (including the webhooks we want to inspect) and forward its data to the consumers that are listening for that specific uuid:

    # callbacks/views.py
    from asgiref.sync import async_to_sync
    from channels.layers import get_channel_layer
    from django.http import HttpResponse
    from django.views import View


    class CallbackView(View):
        def dispatch(self, request, *args, **kwargs):
            channel_layer = get_channel_layer()
            async_to_sync(channel_layer.group_send)(
                kwargs["uuid"], {"type": "new_request", "data": self._request_data(request)}
            )
            return HttpResponse()

    In the above snippet, we get the channel layer, send the request data to the group and return a successful response to the calling entity (let’s ignore what the self._request_data(request) call does and assume it returns all the relevant information we need).

    One important detail: the value of the type key in the data passed to the group_send call is the name of the method that will be called on the websocket consumer we defined earlier.

    Now we just need to expose this on our urls.py file and the core of our system is done.

    # <project_name>/urls.py
    
    from django.urls import path
    from callbacks.views import CallbackView
    
    urlpatterns = [
        path("<uuid>", CallbackView.as_view(), name="callback-submit"),
    ]

    The rest of our application is just standard Django web app development, which I will not cover in this blog post. You will need to create a page and use JavaScript to connect to the websocket. You can check a working example of this system at the following URL:

    http://webhook-logger.ovalerio.net

    For more details just check the code repository on Github.

    Deploying

    I’m not going to explore the details of deployment, but someone else wrote a pretty straightforward blog post on how to do it for production projects that use Django channels. You can check it here.

    Final thoughts

    With django-channels, building real-time web apps or projects that deal with protocols other than HTTP becomes really simple. I do think it is a great addition to the current ecosystem; it certainly is an option I will consider from now on for these tasks.

    Have you ever used it? Do you have any strong opinions about it? Let me know in the comments section.

    Final Note: Based on recent messages on the mailing list, it seems the project might suspend its development in the future if it doesn’t find new maintainers. It would definitely be a shame, since it has a lot of potential. Let’s see how it goes.

  • Looking for security issues on your python projects

    In today’s post I will introduce a few open-source tools, that can help you improve the security of any of your python projects and detect possible vulnerabilities early on.

    These tools are quite well known in the python community and used together will provide you with great feedback about common issues and pitfalls.

    Safety and Piprot

    As I discussed some time ago in a post about managing dependencies and the importance of checking them for known issues, in python there is a tool that compares the items of your requirements.txt with a database of known vulnerable versions. It is called safety (repository) and can be used like this:

    safety check --full-report -r requirements.txt

    If you already use pipenv, safety is already incorporated and can be used by running pipenv check (more info here).

    Since the older the dependencies are, the higher the probability of a certain package containing bugs and issues, another great tool that can help you with this is piprot (repository).

    It will check all items on your requirements.txt and tell you how outdated they are.

    Bandit

    The next tool in line is bandit, a static analyzer for python built by the OpenStack Security Project. It checks your codebase for common security problems and programming mistakes that might compromise your application.

    It will find cases of hardcoded passwords, bad SSL defaults, usage of eval, weak ciphers, different “injection” possibilities, etc.
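    As an illustration, a file with the lines below contains the kind of patterns bandit is designed to report (the file is obviously contrived; check bandit’s documentation for the full list of checks):

    ```python
    # insecure_example.py -- patterns a static analyzer like bandit flags
    PASSWORD = "hunter2"      # hardcoded password string
    result = eval("40 + 2")   # use of eval on a string
    print(result)
    ```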

    It doesn’t require much configuration and you can easily add it to your project. You can find more on the official repository.

    Python Taint

    This last one only applies if you are building a web application and requires a little bit more effort to integrate in your project (at its current state).

    Python Taint (pyt) is a static analyzer that tries to find spots where your code might be vulnerable to common types of problems that affect websites and web apps, such as SQL injection, cross-site scripting (XSS), etc.

    The repository can be found here.

    If you are using Django, after running pyt you might also want to run the built-in manage.py check command (as discussed in a previous post) to verify some framework-specific configurations of your project.


  • Browsing folders of markdown files

    If you are like me, you have a bunch of notes and documents written in markdown spread across many folders. Even the documentation of some projects involving many people is done this way and stored, for example, in a git repository. While it is easy to open a text editor to read these files, it is not the most pleasant experience, since the markup language was made to later generate readable documents in other formats (e.g. HTML).

    For many purposes, setting up the required configuration of tools to generate documentation (like mkdocs) is not practical, nor was that the initial intent when the documents were written. So last weekend I took a couple of hours and built a rough (and dirty) tool to help me navigate and read markdown documents with a more pleasant experience, using the browser (with GitHub-like styling).

    I called it mdvis and it is available for download through “pip”. Here’s what working with it looks like:

    It does not provide many features and is somewhat “green”, but it serves my current purposes. The program is open-source so you can check it here, in case you want to help improving it.

  • Django Friday Tips: Managing Dependencies

    This one is not specific to django, but it is very common during the development of any python project. Managing the contents of the requirements.txt file, which sometimes grows uncontrollably, can be a mess. One of the root causes is the common work-flow of creating a virtualenv, installing all the required libraries with pip and then doing something like:

    $ pip freeze > requirements.txt

    At the beginning this might work great, however soon you will need to change things and remove libraries. At this point, things start to get a little trickier, since you do not know which lines are direct dependencies of your project and which were installed only because a library you have since removed needed them. This leads to some tedious work in order to keep the dependency list clean.

    To solve this problem we can use pip-tools, which helps you declare the dependencies in a simple way and automatically generates the final requirements.txt. As shown in the project’s readme, we can declare the following requirements.in file:

    django
    requests
    pillow
    celery

    Then we generate our “official” requirements.txt with the pip-compile command, which will produce the following output:

    #
    # This file is autogenerated by pip-compile
    # Make changes in requirements.in, then run this to update:
    #
    #    pip-compile requirements.in
    #
    amqp==1.4.8               # via kombu
    anyjson==0.3.3            # via kombu
    billiard==3.3.0.22        # via celery
    celery==3.1.19
    django==1.9
    kombu==3.0.30             # via celery
    pillow==3.0.0
    pytz==2015.7              # via celery
    requests==2.8.1
    

    Now you can keep track of where all those libraries came from. Need to add or remove packages? Just run pip-compile again.

  • Newsletters for Python web developers

    The amount of new information added to the web each day is overwhelming, and trying to keep up with everything about a given topic can be a time-consuming process. One good way I found to tackle this problem, and to avoid wasting a good chunk of my day searching and filtering through lots of new content just to know what’s going on, was to subscribe to good resources that curate this material and send it to my inbox at the end of each week/month.

    Over time I found that the following 4 sources have continuously provided me with a selection of good and up-to-date content, summing up what I might have missed in the previous week/month related to Python and web development in general.

    Pycoders Weekly

    This weekly newsletter is not focused on the web but addresses what’s going on in the Python community, suggests good articles so you can level up your Python skills and showcases interesting projects or libraries.

    Url: http://pycoders.com/

    Django Round-Up

    This one comes out less frequently, but I found the quality of the content to be high. As its name shows, Django Round-Up focuses exclusively on content related to the web framework.

    Url: https://lincolnloop.com/django-round-up/

    HTML5 Weekly

    The first two were about the server side; with this one we move to the browser. HTML5 Weekly focuses on what can be done in the browser and where these technologies are heading.

    Url: http://html5weekly.com/

    JavaScript Weekly

    Being a web development post, we can’t leave JavaScript behind, at least for now. This newsletter gives you the latest news and tools related to this programming language.

    Url: http://javascriptweekly.com/

    I hope you like them. If you find them useful you might also want to follow my Django Collection bundle (which I described in this old post), where I collect useful material related to the Django web framework.

  • Moving to Python 3

    A week ago the support period for the last version of Python 2 was extended by 5 years (from 2015 to 2020), and this event once again ignited the discussion about the fragmentation of the Python ecosystem. Some are of the opinion that version 2 should have a 2.8 release, while others keep saying that the future is Python 3 and that this event will delay its adoption even more.

    The truth is, version 3 is already almost five and a half years old (released in December 2008) and it seems it still hasn’t won over enough users to dethrone the old version. While the first iterations of the new major version (3.0 to 3.2) met much criticism, the last 2 releases seem to have received very good reviews, and after many years the majority of the most important libraries and modules support Python 3 (this can be checked here).

    So, after some thought, I decided that it is time (maybe a little late) to change my default development version for new projects to Python 3, since it really is the future of this programming language and it is time to move on. There will be exceptions of course, like old projects that need to be maintained, or new ones where the requirements do not allow the new version or where the needed packages do not yet support Python 3.

    So let’s check what this “new” version has to offer.
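    To give a quick taste of what changes in practice, here is a minimal sketch (my own illustration, run under Python 3) of a few of the most visible differences compared to version 2:

```python
# print is now a function, not a statement
print("hello", "world", sep=", ")

# the division operator returns a float; use // for integer division
assert 3 / 2 == 1.5
assert 3 // 2 == 1

# strings are unicode by default and bytes are a separate type
text = "olá"
data = text.encode("utf-8")
assert isinstance(text, str)
assert isinstance(data, bytes)

# range() now returns a lazy object instead of building a list
numbers = range(3)
assert list(numbers) == [0, 1, 2]
```

    These are only the surface-level changes, but they are the ones you will trip over first when porting old code.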

  • Django Resources

    As I said in earlier posts on this blog, when I build websites or web apps with no technology impositions, I usually choose to do it in Python, and in most cases that’s equivalent to saying I choose to do it in Django.

    Over the last year, since I started using Bundlr, I’ve been aggregating resources like blog entries, tutorials and videos that I found useful and that could come in handy in the future.

    Today I’m sharing the collection here, since it might be helpful to someone else. I hope you like it, and if you know more references that should be included in the list, please share them in the comments or send me an email.

    The list can be found here.

    Edit July 2016: Since I removed my account, the list is no longer available on Bundlr. Check recent posts, it will be published again soon.

  • First experience with MOOC

    For the Internet, 2012 was definitely the year of the “massive open online courses”, with some online education startups stepping up to the big stage (Coursera, Udacity, etc.) and some well-known names coming up with their own initiatives (MIT, Harvard and Berkeley at Edx). So at the beginning of this year there were many opportunities to learn something new, or update your knowledge with college-level quality, where the only prerequisite is your motivation.

    So I decided to give it a try: in January I picked a topic that I wanted to learn/improve and signed up for it. The course wasn’t taken on any of the major sites that I previously mentioned, but the system was based on Edx. At the end of the month I started 10gen‘s 7-week course on “MongoDB for Developers” (given in Python) and followed the weekly classes flawlessly until the final exam in the middle of March.

    In the next few paragraphs I will describe my experience based on some notes that I took during that period; basically, I will tell what I liked and what I think should be improved.

    In the first week, for a course aimed at developers, the rhythm was kind of slow, with the instructors spending too much time on the basics of Python and on how to install some libraries. At first I thought everyone would think the same, but in the discussions I noticed that many of my fellow students didn’t even know how to install Python on their machines. Even though it was a nice thing to do, in my opinion previous Python experience should be a prerequisite for this kind of course.

    In the following weeks things started to get interesting, as we focused more on MongoDB and talked about its operations, design, performance, the aggregation framework, etc. Every week a new batch of 3 to 10 minute videos (with a few exceptions) was released, each covering a new concept or use case related to the week’s topic, plus some questions to make sure you understood what was being explained in each video. Personally, I liked this approach: I didn’t move to the next video until I completely understood the previous one, and if I had doubts it was as simple as watching the video again and using the discussions in case the doubts persisted. The responses to your questions were generally posted pretty fast, many times by the instructor but most of the time by fellow students.

    To complete the week you had to finish some homework that weighed 50% of your final grade. Some people complained that it was relatively easy to complete these tasks, but in my opinion the purpose of this homework is to certify that, at the end of each week, you understood the key concepts lectured, not to test the capacity and expertise of the participants.

    In the last week of the course you only had to complete the exam; the content posted by the instructor was optional and consisted of interviews with professionals talking about MongoDB deployments currently in production at Codecademy and Foursquare.

    One improvement that I would like to see in future courses is a discussion box per video, so you wouldn’t have to leave the video page to ask questions or answer the ones you know.

    In conclusion, I really liked the experience and I will certainly put my new “MongoDB” skills into action on a future project. Right now I’m already aiming at a new course for the summer (when my weekly schedule is lighter). If you have already taken one of these online courses, I would like to hear what you have to say about them. Feel free to use the comments.

  • Recovering your bookmarks

    Some time ago, while cleaning stuff on my computer, I decided to switch my browser to Opera and delete the version of Firefox that I was using at the time. While doing that and removing all the Firefox folders left behind, I accidentally erased all my bookmarks, and I didn’t have them synced with any online service. Well, that wasn’t good; I had references stored there that I wanted to keep.

    While trying to recover the file ‘places.sqlite’ I found a bookmark backup generated by Firefox. When I opened the file I found that it was a mess: basically it was a bunch of big JSON objects stored in one line, containing lots of garbage (I only needed the URLs).

    I kept that file until today, when I finally decided that I would put those bookmarks back in my browser. As Opera doesn’t import this kind of file, I made a little Python script that extracts the names and URLs from the backup and generates a single file that Opera can import, while keeping the folder structure.

    Well, it worked, so I thought it might be useful to someone else and pushed it to GitHub. If any of you ever have the same problem, give it a shot and use this “quick fix”. You can find it here with some instructions on how to use it. If you find any problem, use the comments or the GitHub issues.
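    For anyone curious about the general idea, the core of such a script is just a recursive walk over the backup’s JSON tree. The sketch below is not the actual script from the repository; it only assumes the usual Firefox backup layout, where folders carry a "children" list and individual bookmarks carry "title" and "uri" keys:

```python
import json

def extract_bookmarks(node, bookmarks=None):
    """Recursively walk a Firefox JSON bookmark backup and collect
    (title, url) pairs, ignoring everything else in the file."""
    if bookmarks is None:
        bookmarks = []
    if "uri" in node:  # this node is a bookmark entry
        bookmarks.append((node.get("title", ""), node["uri"]))
    for child in node.get("children", []):  # this node is a folder
        extract_bookmarks(child, bookmarks)
    return bookmarks

# Usage: load the backup file and print every saved URL
# with open("bookmarks.json") as backup:
#     for title, url in extract_bookmarks(json.load(backup)):
#         print(title, url)
```

    From there, writing the pairs out in whatever format the target browser imports is the easy part.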

  • Generators, Decorators and Metaclasses

    For some time now, I’ve been trying to improve my Python skills and learn a little more deeply how the language works. The objective of this quest is to write more efficient and structured code, because it seems to me that I’m not using the full potential of this programming language.

    Yesterday I found on Stack Overflow three answers from the same person to three different questions: one about the yield statement in Python, another about decorators and a third explaining metaclasses. The posts are long, but the explanations are very good, with several examples. I thought they were so good that I had to share them with those who are trying to learn more advanced Python. So here they are, in chronological order:
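    To give a quick taste of those three topics before you dive into the answers, here are some tiny examples of my own (not taken from the linked posts):

```python
import functools

# A generator: "yield" suspends the function and resumes it on the
# next iteration, producing values lazily instead of building a list
def countdown(n):
    while n > 0:
        yield n
        n -= 1

# A decorator: a function that takes a function and returns a
# wrapped version of it
def shout(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

@shout
def greet(name):
    return "hello %s" % name

# A metaclass: a class whose instances are classes; this one adds
# an attribute to every class created with it
class Tagged(type):
    def __new__(mcs, name, bases, namespace):
        namespace["tag"] = "created by Tagged"
        return super().__new__(mcs, name, bases, namespace)

class Thing(metaclass=Tagged):
    pass

print(list(countdown(3)))  # [3, 2, 1]
print(greet("bob"))        # HELLO BOB
print(Thing.tag)           # created by Tagged
```

    The linked answers go much deeper than this, of course, but these are the basic shapes of each feature.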

  • Simple JSON Parser for Python

    Some time ago I started to follow a blog that proposes some programming exercises every week, and I solved one of their problems (an old one, from 2009). So today I’m posting my solution here. Basically, they ask us to write a JSON parser in our favorite programming language, so I chose Python and tried to complete the task.

    For those who don’t know what JSON is:

    JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate.

    My implementation is quite simple and may contain some bugs (and is not optimized), so if you discover any error just leave a reply (we are always learning). Below is my code and a link to GitHub, where you can also comment on the code. In the following weeks I’ll try to solve more of their problems.

    class json_parser:
    
        def __init__(self, string):
            self.json_data = self.__remove_blanks(string)
            self.pointer = 0
    
        def __remove_blanks(self, string):
            new_list = []
            inside_string = False
            for i in list(string):
                if inside_string or i != ' ':
                    new_list.append(i)
                if i == '"':
                    inside_string = not inside_string
    
            return "".join(new_list)
    
        def __parse_obj(self):
            new_dic = {}
            self.pointer += 1
            while self.json_data[self.pointer] != '}':
                if self.json_data[self.pointer] == '"':
                    key = self.__parse_string()
                else:
                    raise Exception  # The only possible type of value for a key is String
    
                if self.json_data[self.pointer] == ':':
                    self.pointer += 1
                else:
                    raise Exception  # invalid object
    
                value = self.__parse_value()
                if value == -1:
                    return -1
    
                new_dic[key] = value
                if self.json_data[self.pointer] == ',':
                    self.pointer += 1
    
            self.pointer += 1
            return new_dic
    
        def __parse_array(self):
            new_array = []
            self.pointer += 1
            while self.json_data[self.pointer] != ']':
                value = self.__parse_value()
                if value == -1:
                    return -1
                else:
                    new_array.append(value)
    
                if self.json_data[self.pointer] == ',':
                    self.pointer += 1
            self.pointer += 1
            return new_array
    
        def __parse_string(self):
            self.pointer += 1
            start = self.pointer
            while self.json_data[self.pointer] != '"':
                self.pointer += 1
                if self.pointer == len(self.json_data):
                    raise Exception  # the string isn't closed
            self.pointer += 1
            return self.json_data[start:self.pointer - 1]
    
        def __parse_other(self):
            if self.json_data[self.pointer:self.pointer + 4] == 'true':
                self.pointer += 4
                return True
    
            if self.json_data[self.pointer:self.pointer + 4] == 'null':
                self.pointer += 4
                return None
    
            if self.json_data[self.pointer:self.pointer + 5] == 'false':
                self.pointer += 5
                return False
    
            start = self.pointer
            while (self.json_data[self.pointer].isdigit()) or (self.json_data[self.pointer] in (['-', '.', 'e', 'E'])):
                self.pointer += 1
    
            if '.' in self.json_data[start:self.pointer]:
                return float(self.json_data[start:self.pointer])
            else:
                return int(self.json_data[start:self.pointer])
    
        def __parse_value(self):
            try:
                if self.json_data[self.pointer] == '{':
                    new_value = self.__parse_obj()
                elif self.json_data[self.pointer] == '[':
                    new_value = self.__parse_array()
                elif self.json_data[self.pointer] == '"':
                    new_value = self.__parse_string()
                else:
                    new_value = self.__parse_other()
            except Exception:
                print 'Error:: Invalid Data Format, unknown character at position', self.pointer
                return -1
            return new_value
    
        def parse(self):
            if self.json_data[self.pointer] == '{' or self.json_data[self.pointer] == '[':
                final_object = self.__parse_value()
            else:
                print 'Error:: Invalid initial Data Format'
                final_object = None
    
            return final_object

    [EDIT: The previous code has several issues, so please do not use it. Python has many great packages to handle JSON documents the right way, like simplejson.]
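    For reference, this is how the same task is handled with the json module that ships with Python’s standard library (packages like simplejson expose essentially the same interface):

```python
import json

# Parsing a JSON document into Python objects
document = '{"name": "json", "simple": true, "versions": [1, 2]}'
data = json.loads(document)
assert data["name"] == "json"
assert data["simple"] is True
assert data["versions"] == [1, 2]

# And serializing back: a round trip preserves the data
assert json.loads(json.dumps(data)) == data
```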