Python, Reading and writing JSON with the package json

Introduction

This chapter is part of a set covering the essential features for learning Python quickly for immediate use. It shows how to read and write JSON data in a Python program with the system package json.

The Python environment

The Python environment is set up by sourcing the file $HOME/.python-3.8:

$HOME/.python-3.8
#!/bin/bash
export PYHOME=/opt/python/python-3.8
export PATH=$PYHOME/bin:$PATH
export LD_LIBRARY_PATH=$PYHOME/lib:$LD_LIBRARY_PATH
export PYTHONPATH=/opt/python/packages

sqlpac@vpsfrsqlpac2$ . $HOME/.python-3.8
sqlpac@vpsfrsqlpac2$ which python3
/opt/python/python-3.8/bin/python3
sqlpac@vpsfrsqlpac2$ which pip3
/opt/python/python-3.8/bin/pip3

virtualenv is installed and a fully isolated virtual environment is set up for the project:

sqlpac@vpsfrsqlpac2$ cd /home/sqlpac
sqlpac@vpsfrsqlpac2$ virtualenv /home/sqlpac/google
Using base prefix '/opt/python/python-3.8'
New python executable in /home/sqlpac/google/bin/python3.8
Also creating executable in /home/sqlpac/google/bin/python
Installing setuptools, pip, wheel...
done.
sqlpac@vpsfrsqlpac2$ source /home/sqlpac/google/bin/activate
(google) sqlpac@vpsfrsqlpac2:/home/sqlpac$

The JSON data sample

The JSON format returned by the Google Indexing API when requesting the status of a given URL is known:

{
  "url": "https://www.sqlpac.com/referentiel/docs/mariadb-columnstore-1.2.3-installation-standalone-ubuntu.html",
  "latestUpdate": 
  { "url": "https://www.sqlpac.com/referentiel/docs/mariadb-columnstore-1.2.3-installation-standalone-ubuntu.html",    
    "type": "URL_UPDATED", 
    "notifyTime": "2020-04-10T17:43:21.198591915Z"
  }
}

The environment variable $PRJ is set to the working directory:

(google) sqlpac@vpsfrsqlpac2:/home/sqlpac$ mkdir google/json
(google) sqlpac@vpsfrsqlpac2:/home/sqlpac$ export PRJ=/home/sqlpac/google/json

(google) sqlpac@vpsfrsqlpac2:/home/sqlpac$ cd $PRJ
(google) sqlpac@vpsfrsqlpac2:/home/sqlpac/google/json$

Let’s see how to handle JSON in a Python program.

System package json

The package json is part of the Python standard library, so nothing has to be installed; just import it:

$PRJ/handling-json.py
import json

That’s all!

Reading JSON data

The method loads: loading from a string variable

Use the method loads to load JSON from a string variable:

import json

response_json='{"a":1, "b":2}'

loaded_json = json.loads(response_json)

for key in loaded_json:
	print("key : %s, value: %s" % (key,loaded_json[key]))
(google) sqlpac@vpsfrsqlpac2:/home/sqlpac/google/json$ python3 handling-json.py
key : a, value: 1
key : b, value: 2

In real life, JSON rarely fits in a single-line string. A triple-quoted string defines a JSON string over multiple lines:

import json

response_json = '''{
  "url": "https://www.sqlpac.com/referentiel/docs/mariadb-columnstore-1.2.3-installation-standalone-ubuntu.html",
  "latestUpdate":
  { "url": "https://www.sqlpac.com/referentiel/docs/mariadb-columnstore-1.2.3-installation-standalone-ubuntu.html",
    "type": "URL_UPDATED",
    "notifyTime": "2020-04-10T17:43:21.198591915Z"
  },
  "isactive" : true,
  "floatvalue" : 1.2399,
  "intvalue" : 1,
  "ostypes" : ["linux","macos","windows"]
}'''

loaded_json = json.loads(response_json)
for key in loaded_json:
  print("%s %s %s" % (key, type(loaded_json[key]), loaded_json[key]))

Extra keys are added to the JSON sample for the demo: isactive, floatvalue, intvalue, ostypes.

Data types are also displayed with the function type(). The data types are the following:

Key           Type             Value
url           <class 'str'>    https://www.sqlpac.com/ref…
latestUpdate  <class 'dict'>   {'url': 'https://www.sqlpac.com/…', 'type': 'URL_UPDATED'…}
isactive      <class 'bool'>   True
floatvalue    <class 'float'>  1.2399
intvalue      <class 'int'>    1
ostypes       <class 'list'>   ['linux', 'macos', 'windows']
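
The same check can be scripted. The sketch below is only an illustration: the sample string and its keys are hypothetical, and a null value, which is not part of the sample above, is added to show that it is loaded as None.

```python
import json

# Hypothetical sample; "nothing" is an extra key added to show null handling
sample = '''{
  "name": "sqlpac", "count": 3, "ratio": 1.5, "active": true,
  "tags": ["a", "b"], "nested": {"k": "v"}, "nothing": null
}'''

loaded = json.loads(sample)

# Map each key to the name of the Python type it was loaded as
type_names = {key: type(value).__name__ for key, value in loaded.items()}

for key, name in type_names.items():
    print("%s -> %s" % (key, name))
```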

For readers used to JavaScript, the data types translate as follows:

JavaScript      Python
Object          dict
Array           list
String          str
Number (int)    int
Number (float)  float
true | false    True | False

A value is read using its key:

print(loaded_json["url"])
https://www.sqlpac.com/referentiel…

So naturally, we try the JavaScript dot notation syntax, but it does not work:

print(loaded_json.url)
Traceback (most recent call last):
  File "handling-json.py", line 23, in <module>
    print(loaded_json.url)
AttributeError: 'dict' object has no attribute 'url'

One way to get the dot notation is to create a class that loads the JSON data into its attributes:

import json

response_json = '''{
 "url": "https://www.sqlpac.com/referentiel/docs/mariadb-columnstore-1.2.3-installation-standalone-ubuntu.html",
 "latestUpdate":
 { "url": "https://www.sqlpac.com/referentiel/docs/mariadb-columnstore-1.2.3-installation-standalone-ubuntu.html",
   "type": "URL_UPDATED",
   "notifyTime": "2020-04-10T17:43:21.198591915Z"
 },
 "isactive" : true,
 "floatvalue" : 1.2399,
 "intvalue" : 1,
 "ostypes" : ["linux","macos","windows"]
}'''

class google():
	def __init__(self, data):
		self.__dict__ = json.loads(data)
	
google_answer = google(response_json)

print(google_answer.url)
print(google_answer.latestUpdate["type"])
https://www.sqlpac.com/referentiel…
URL_UPDATED

As expected, google_answer.latestUpdate.type is not available as it would be in JavaScript: nested objects remain dictionaries, so the syntax is google_answer.latestUpdate["type"]. Python is not JavaScript, and some programming habits must sometimes be left behind.
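
If dot notation is really wanted at every level, one possible approach (not the one used in this chapter) is the object_hook parameter of json.loads combined with types.SimpleNamespace from the standard library. A minimal sketch, with a shortened, hypothetical sample:

```python
import json
from types import SimpleNamespace

# Shortened, hypothetical version of the sample used above
response_json = '''{
  "url": "https://www.sqlpac.com/archives/2020",
  "latestUpdate": { "type": "URL_UPDATED" },
  "ostypes": ["linux", "macos", "windows"]
}'''

# object_hook is called for every JSON object decoded, innermost first,
# so nested objects become namespaces too
google_answer = json.loads(response_json,
                           object_hook=lambda d: SimpleNamespace(**d))

print(google_answer.url)
print(google_answer.latestUpdate.type)
print(google_answer.ostypes[0])
```

Keys that are not valid Python identifiers cannot be reached with dot notation this way, so plain dictionaries remain the safer default.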

The method load : loading from a file

When JSON data is stored in a file, the method load is used:

import json

with open('json-data.json', 'r') as f:
    json_dict = json.load(f)

print(json_dict["url"])
https://www.sqlpac.com/referentiel…

As in the previous example, create a class to use the dot notation:

import json

class google():
	def __init__(self, filename):
		with open(filename, 'r') as f:
			self.__dict__ = json.load(f)
	

google_answer = google('json-data.json')
print(google_answer.url)
https://www.sqlpac.com/referentiel…

Handling malformed JSON data

Use try / except blocks to manage exceptions raised when loading malformed JSON data:

import json

with open('json-data.json') as f:
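	try:
		data = json.load(f)
	except json.JSONDecodeError as e:
		# Malformed JSON: report the parser message and stop with a non-zero status
		print("Exception raised | %s " % str(e))
		raise SystemExit(1)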
	
print(data["url"])
Exception raised | Expecting ',' delimiter: line 6 column 5 (char 306)
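
The exception raised by the json package on malformed data is json.JSONDecodeError, a subclass of ValueError, and it exposes the error location through the attributes msg, lineno, colno and pos. A small sketch, using a hypothetical malformed string rather than the file above:

```python
import json

# Hypothetical malformed input: the comma between the two members is missing
bad_json = '{"url": "1.html" "type": "URL_UPDATED"}'

error_msg = None
try:
    data = json.loads(bad_json)
except json.JSONDecodeError as e:
    data = None
    error_msg = e.msg
    # msg, lineno, colno and pos locate the error in the input
    print("Invalid JSON | %s (line %d, column %d, char %d)"
          % (e.msg, e.lineno, e.colno, e.pos))
```

Catching this specific exception, rather than Exception, lets unrelated errors such as a missing file surface normally.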

Duplicate key/value

What if a key is defined more than once?

{
	"url": "1.html",
	"url": 1
}

No exception is raised: the value and the data type kept are those of the last key/value pair read in the JSON data:

import json
…
print("value : %s, data type : %s" % (data["url"], type(data["url"]) ))
value : 1, data type : <class 'int'>
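
When silent overwriting is not acceptable, duplicates can be rejected with the object_pairs_hook parameter of json.loads, which receives each object as a list of (key, value) pairs. The helper reject_duplicates below is a hypothetical name, sketched for illustration only:

```python
import json

def reject_duplicates(pairs):
    """Build a dict from (key, value) pairs, refusing repeated keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError("Duplicate key: %s" % key)
        result[key] = value
    return result

duplicated_json = '{"url": "1.html", "url": 1}'

# Default behaviour: the last value wins silently
print(json.loads(duplicated_json))

message = None
try:
    json.loads(duplicated_json, object_pairs_hook=reject_duplicates)
except ValueError as e:
    message = str(e)
    print("Exception raised | %s" % message)
```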

Returning and writing JSON data

Let’s imagine we want to return the following "dummy" answer:

{
    "url": "https://www.sqlpac.com/archives/2020",
    "ostypes": [ "linux", "macos","windows"],
    "isactive": true,
    "price": "12$",
    "details": {
        "returncode": 1,
        "reason": "none"
    }
}

The method dumps

The method dumps returns a JSON string from a Python dictionary:

import json

response = {}

response["url"] = "https://www.sqlpac.com/archives/2020"
response["ostypes"] = ["linux","macos","windows"]
response["isactive"] = True
response["price"] = "12$"
response["details"] = { "returncode": 1, "reason":"none" }

str_response = json.dumps(response)
print(str_response)
{"url": "https://www.sqlpac.com/archives/2020", "ostypes": ["linux", "macos", "windows"], "isactive": true, "price": "12$", "details": {"returncode": 1, "reason": "none"}}

Data types are converted correctly on the way back:

Python        JavaScript
dict          Object
list          Array
str           String
int           Number (int)
float         Number (float)
True | False  true | false
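
A round trip makes the conversion visible. The dictionary below is hypothetical and also includes a tuple and None, which are not in the table above, to show that they are written as an array and as null:

```python
import json

# Hypothetical dictionary mixing the types listed above, plus a tuple and None
original = {"name": "sqlpac", "count": 3, "ratio": 1.5, "active": False,
            "tags": ("linux", "macos"), "nothing": None}

encoded = json.dumps(original)
print(encoded)

decoded = json.loads(encoded)
# The tuple comes back as a list: JSON only has arrays
print(type(decoded["tags"]))
```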

Human readable

Data is returned on a single line by default. Use the option indent to get a more human-readable format:

str_response = json.dumps(response, indent=4)
print(str_response)
{
    "url": "https://www.sqlpac.com/archives/2020",
    "ostypes": [
        "linux",
        "macos",
        "windows"
    ],
    "isactive": true,
    "price": "12$",
    "details": {
        "returncode": 1,
        "reason": "none"
    }
}

Unicode

What if there is a Unicode character, for example 12€ instead of 12$? The response looks like this:

…
        "price": "12\u20ac",
…

By default, json.dumps guarantees ASCII-only output: non-ASCII characters are escaped. Set the option ensure_ascii to False to leave Unicode characters untouched:

str_response = json.dumps(response, indent=4, ensure_ascii=False)
print(str_response)
…
        "price": "12€",
…
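
Both forms are valid JSON and carry the same data, which the following sketch checks on a one-key, hypothetical dictionary:

```python
import json

price = {"price": "12€"}

escaped = json.dumps(price)                      # default: ensure_ascii=True
literal = json.dumps(price, ensure_ascii=False)  # € kept as-is

print(escaped)
print(literal)

# Both strings decode to the same Python value
print(json.loads(escaped) == json.loads(literal))
```

The choice is therefore mostly about readability and about what the consumer of the output can handle.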

Sorting keys

json.dumps writes keys in the dictionary's insertion order, which Python guarantees since version 3.7. To get keys sorted by name instead, set sort_keys to True:

str_response = json.dumps(response, indent=4, ensure_ascii=False, sort_keys=True)
print(str_response)
{
    "details": {
        "reason": "none",
        "returncode": 1
    },
    "isactive": true,
    "ostypes": [
        "linux",
        "macos",
        "windows"
    ],
    "price": "12€",
    "url": "https://www.sqlpac.com/archives/2020"
}
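
Sorted keys are handy when two outputs must be compared. A minimal sketch, assuming two hypothetical dictionaries with the same content built in a different order:

```python
import json

response = {"url": "https://www.sqlpac.com/archives/2020",
            "isactive": True,
            "details": {"returncode": 1, "reason": "none"}}

# Same content, keys inserted in another order
reordered = {"details": {"reason": "none", "returncode": 1},
             "isactive": True,
             "url": "https://www.sqlpac.com/archives/2020"}

print(json.dumps(response))
print(json.dumps(response, sort_keys=True))

# With sort_keys=True both dictionaries serialize to the same string
same = json.dumps(response, sort_keys=True) == json.dumps(reordered, sort_keys=True)
print(same)
```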

The method dump

Use the method dump to write JSON data to a file. All the options described above for the method dumps are also available for the method dump:

# Explicit UTF-8: with ensure_ascii=False, characters such as € are written as-is
with open('response.json', 'w', encoding='utf-8') as f:
    json.dump(response, f, indent=4, ensure_ascii=False, sort_keys=False)
$PRJ/response.json
{
    "url": "https://www.sqlpac.com/archives/2020",
    "ostypes": [
        "linux",
        "macos",
        "windows"
    ],
    "isactive": true,
    "price": "12€",
    "details": {
        "returncode": 1,
        "reason": "none"
    }
}
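
To check that what was written can be read back, dump and load can be chained. The sketch below writes to a temporary directory (an assumption made so that nothing is left behind, not the $PRJ directory used above) and uses a shortened, hypothetical dictionary:

```python
import json
import os
import tempfile

response = {"url": "https://www.sqlpac.com/archives/2020", "price": "12€"}

# Temporary directory: removed automatically when the block ends
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "response.json")

    # Same explicit encoding for writing and reading
    with open(path, "w", encoding="utf-8") as f:
        json.dump(response, f, indent=4, ensure_ascii=False)

    with open(path, "r", encoding="utf-8") as f:
        reloaded = json.load(f)

print(reloaded == response)
```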

Conclusion

Serializing and deserializing JSON data is quite easy in Python, but JavaScript habits such as the dot notation must be set aside when handling loaded JSON data.