
Simple JSON Parser for Python

Some time ago I started following a blog that proposes programming exercises every week, and I solved one of their problems (an old one, from 2009). Today I’m posting my solution to this week’s problem here. The task is to write a JSON parser in our favorite programming language, so I chose Python and tried to complete it.

For those who don’t know what JSON is:

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate.
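
As a small illustration (not part of the original exercise), this is roughly how a JSON document maps to Python values. The sketch uses the standard library's json module rather than the parser from this post, and the sample document is made up for the example.

```python
import json

# A made-up JSON document, used only for illustration.
document = (
    '{"name": "example", "tags": ["a", "b"], "count": 3, '
    '"ratio": 0.5, "ok": true, "extra": null}'
)

data = json.loads(document)

# Objects become dicts, arrays become lists, numbers become int or float,
# true/false become True/False and null becomes None.
print(type(data).__name__)
print(data["tags"], data["count"], data["ratio"], data["ok"], data["extra"])
```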

My implementation is quite simple, is not optimized, and may contain bugs, so if you find an error just leave a reply (we are always learning). Below is my code, along with a link to GitHub where you can also comment on it. In the following weeks I’ll try to solve more of their problems.

class json_parser:

    def __init__(self, string):
        self.json_data = self.__remove_blanks(string)
        self.pointer = 0

    def __remove_blanks(self, string):
        # Drop JSON whitespace (space, tab, newline, carriage return) that
        # appears outside of string literals.
        new_list = []
        inside_string = False
        for ch in string:
            if inside_string or ch not in ' \t\n\r':
                new_list.append(ch)
            if ch == '"':
                inside_string = not inside_string

        return "".join(new_list)

    def __parse_obj(self):
        new_dic = {}
        self.pointer += 1
        while self.json_data[self.pointer] != '}':
            if self.json_data[self.pointer] == '"':
                key = self.__parse_string()
            else:
                raise Exception  # The only possible type of value for a key is String

            if self.json_data[self.pointer] == ':':
                self.pointer += 1
            else:
                raise Exception  # invalid object

            value = self.__parse_value()
            if value == -1:
                return -1

            new_dic[key] = value
            if self.json_data[self.pointer] == ',':
                self.pointer += 1

        self.pointer += 1
        return new_dic

    def __parse_array(self):
        new_array = []
        self.pointer += 1
        while self.json_data[self.pointer] != ']':
            value = self.__parse_value()
            if value == -1:
                return -1
            else:
                new_array.append(value)

            if self.json_data[self.pointer] == ',':
                self.pointer += 1
        self.pointer += 1
        return new_array

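    def __parse_string(self):
        # Returns the raw text between the quotes. Escape sequences are not
        # interpreted, so an escaped quote (\") ends the string early.
        self.pointer += 1
        start = self.pointer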
        while self.json_data[self.pointer] != '"':
            self.pointer += 1
            if self.pointer == len(self.json_data):
                raise Exception  # the string isn't closed
        self.pointer += 1
        return self.json_data[start:self.pointer - 1]

    def __parse_other(self):
        if self.json_data[self.pointer:self.pointer + 4] == 'true':
            self.pointer += 4
            return True

        if self.json_data[self.pointer:self.pointer + 4] == 'null':
            self.pointer += 4
            return None

        if self.json_data[self.pointer:self.pointer + 5] == 'false':
            self.pointer += 5
            return False

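        start = self.pointer
        # Consume every character that may appear in a JSON number,
        # stopping at the end of the input.
        while (self.pointer < len(self.json_data)
               and self.json_data[self.pointer] in '+-.eE0123456789'):
            self.pointer += 1

        number = self.json_data[start:self.pointer]
        # A fraction or an exponent makes the value a float; otherwise it is an int.
        if '.' in number or 'e' in number or 'E' in number:
            return float(number)
        else:
            return int(number)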

    def __parse_value(self):
        try:
            if self.json_data[self.pointer] == '{':
                new_value = self.__parse_obj()
            elif self.json_data[self.pointer] == '[':
                new_value = self.__parse_array()
            elif self.json_data[self.pointer] == '"':
                new_value = self.__parse_string()
            else:
                new_value = self.__parse_other()
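        except Exception:
            # NOTE: -1 is used as the error sentinel, so callers cannot tell a
            # failed parse apart from a literal -1 in the input.
            print('Error:: Invalid Data Format, unknown character at position %d' % self.pointer)
            return -1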
        return new_value

    def parse(self):
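        # Only objects and arrays are accepted as the top-level value;
        # slicing keeps an empty input from raising IndexError.
        if self.json_data[self.pointer:self.pointer + 1] in ('{', '['):
            final_object = self.__parse_value()
        else:
            print('Error:: Invalid initial Data Format')
            final_object = None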

        return final_object

[EDIT: The previous code has several issues, so please do not use it. Python has many great packages to handle JSON documents the right way, like simplejson.]
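
For reference, here is a minimal sketch of that recommended route. It assumes the standard library's json module, which shares its loads interface with simplejson. The helper name parse_json and the sample inputs are made up for this example.

```python
import json


def parse_json(text):
    """Return the parsed value, or raise ValueError describing the problem.

    Hypothetical helper for illustration; it only wraps json.loads.
    """
    try:
        return json.loads(text)
    except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
        raise ValueError("Invalid JSON: %s" % err)


# Negative numbers, exponents, spaces inside strings and null are all handled.
print(parse_json('{"a": -1, "b": [1.5e3, "x y", null]}'))

# Malformed input raises an exception instead of returning a sentinel value.
try:
    parse_json('{"a": }')
except ValueError as err:
    print(err)
```

Raising an exception on bad input, instead of returning a special value such as -1, keeps every valid JSON value available as a legitimate result.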

By Gonçalo Valério

Software developer and owner of this blog. More in the "about" page.