Python, application configuration : environment variables, ini and YAML files

Introduction

Application configuration is one of the essential features to master when learning Python quickly for immediate use.

Configuration values should never be hard coded in programs, whether they are written in Python or not.

Configuration can be retrieved from :

  • Environment variables
  • Ini files
  • JSON files
  • XML files
  • YAML files

This chapter shows how to read (and write) configuration data with Python from environment variables, from INI files with the module configparser, and from YAML files with the package PyYAML.

XML is not covered here. The XML format is less used nowadays: JSON and YAML are more human readable, and XML parsers are somewhat heavier.

The JSON format is not covered in this chapter either; a dedicated article is published on this topic : Python, Reading and writing JSON with the package json

Environment variables, module os

sqlpac@vpsfrsqlpac2$ export CFG=/home/sqlpac/cfg
sqlpac@vpsfrsqlpac2$ echo $CFG
/home/sqlpac/cfg

To read environment variables, import the module os and call the function getenv or the method get of the mapping os.environ :

import os

confdir = os.getenv('CFG')
homedir = os.environ.get('HOME')

print(confdir)
print(homedir)
/home/sqlpac/cfg
/home/sqlpac

When the environment variable does not exist, os.getenv and os.environ.get both return None.
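Both calls also accept a default value as second argument, returned instead of None when the variable is not set. A minimal sketch, where the variable name APP_CFG_DIR and the path /etc/myapp are hypothetical and only serve the illustration :

```python
import os

# APP_CFG_DIR and /etc/myapp are hypothetical names used for this sketch.
os.environ.pop('APP_CFG_DIR', None)        # make sure the variable is not set

confdir = os.getenv('APP_CFG_DIR', '/etc/myapp')       # default is returned
same = os.environ.get('APP_CFG_DIR', '/etc/myapp')     # same behaviour
missing = os.getenv('APP_CFG_DIR')                     # no default given: None

print(confdir, same, missing)
```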

The environment variable can also be retrieved directly with the syntax os.environ["Environment Variable"], without using the get methods.

import os
confdir = os.environ['CFG']

In this case, the exception KeyError must be handled when the environment variable does not exist, instead of testing whether None is returned as with the get methods :

import os

varenv = 'CFG2'

try:
	confdir = os.environ[varenv]
except KeyError:
	print('Environment variable %s does not exist' % (varenv))
Environment variable CFG2 does not exist
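When a variable is mandatory, the KeyError handling can be wrapped once in a small helper. The sketch below is one possible approach; the function name require_env and the variable APP_DEMO_CFG are hypothetical :

```python
import os

def require_env(name):
    """Return the value of a mandatory environment variable, or raise a clear error."""
    try:
        return os.environ[name]
    except KeyError:
        raise RuntimeError('Environment variable %s is not set' % name) from None

# APP_DEMO_CFG is a hypothetical variable, set here only for the demonstration.
os.environ['APP_DEMO_CFG'] = '/home/sqlpac/cfg'
print(require_env('APP_DEMO_CFG'))
```

The program then stops early with an explicit message instead of failing later on a None value.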

Most importantly : how to set an environment variable so that it is available in the sub programs (shell, Python…) ?

Just assign os.environ['var'] in the parent program : the environment variable is then inherited by the sub programs it launches.

subprogram.py
import os
import sys

print('%s : %s' % (sys.argv[0], os.getenv('VERSION')))
build.bash
#!/bin/bash
echo $0" : "$VERSION
import os

os.environ['VERSION']='4.2'

os.system('python3 subprogram.py')
os.system('./build.bash')
Python subprogram.py : 4.2
Shell ./build.bash : 4.2
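As an alternative to modifying os.environ, the variable can be passed to a single child process only. The sketch below uses subprocess.run with its env parameter; running an inline one-line child with sys.executable is a choice made here to keep the example self-contained :

```python
import os
import subprocess
import sys

# Copy the current environment and add VERSION for the child process only.
child_env = dict(os.environ, VERSION='4.2')

result = subprocess.run(
    [sys.executable, '-c', "import os; print(os.getenv('VERSION'))"],
    env=child_env,          # environment seen by the child
    capture_output=True,    # requires Python 3.7 or later
    text=True,
    check=True,
)
child_version = result.stdout.strip()
print(child_version)
```

The parent environment is left untouched, which can be preferable when several sub programs need different values.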

Ini files, module configparser

Reading an INI file

A sample INI file (nested sections are not supported) :

sqlpac.ini
[sqlpac]

version=5.8
verbosity=2
debug=false
user=sqlpac

wwwurl=https://www.sqlpac.com/
rpc=https://www.sqlpac.com/rpc-secure/

[referential]

dir=https://www.sqlpac.com/referentiel/docs

[googleindexing]

apikey=ApIkeYdGF_kBtPVdAwIM7F0Fu87qWMoykfyl9hfnG2
jsonfile=google-auth-indexing.json

scopes=https://www.googleapis.com/auth/indexing
	
notification=https://indexing.googleapis.com/v3/urlNotifications/metadata?url=
publish=https://indexing.googleapis.com/v3/urlNotifications:publish
		
[mobiletest]
serviceurl=https://searchconsole.googleapis.com/v1/urlTestingTools/mobileFriendlyTest:run

Use the module configparser to read an INI file. A parser object is created with configparser.ConfigParser() and its method read is called with the INI file path as argument :

import configparser

cfg = configparser.ConfigParser()
cfg.read('sqlpac.ini')

The method sections returns the section names as a list :

import configparser

cfg = configparser.ConfigParser()
cfg.read('sqlpac.ini')

print(cfg.sections())
['sqlpac', 'referential', 'googleindexing', 'mobiletest']

Values are retrieved with the usual dictionary syntax :

print(cfg['sqlpac']['debug'])
print(cfg['sqlpac']['version'])

for key in cfg['googleindexing']:
	print(cfg['googleindexing'][key])

false
5.8
ApIkeYdGF_kBtPVdAwIM7F0Fu87qWMoykfyl9hfnG2
google-auth-indexing.json
https://www.googleapis.com/auth/indexing
https://indexing.googleapis.com/v3/urlNotifications/metadata?url=
https://indexing.googleapis.com/v3/urlNotifications:publish

The config parser does not guess data types : every value is stored as a string. To convert to the right data type, use the appropriate getter method :

version = cfg['sqlpac'].getfloat('version')
verbosity = cfg['sqlpac'].getint('verbosity')
debug = cfg['sqlpac'].getboolean('debug')

As with a dictionary, the get methods accept a fallback value, returned when the key does not exist :

debug = cfg['sqlpac'].getboolean('debug', fallback=False)
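The typed getters and the fallback can be checked together in a short self-contained sketch. The INI content is loaded from a string with read_string to avoid depending on a file, and the key timeout is a hypothetical key deliberately absent from the section :

```python
import configparser

cfg = configparser.ConfigParser()
cfg.read_string("""
[sqlpac]
version=5.8
verbosity=2
debug=false
""")

section = cfg['sqlpac']
version = section.getfloat('version')      # '5.8'   -> float
verbosity = section.getint('verbosity')    # '2'     -> int
debug = section.getboolean('debug')        # 'false' -> bool
# 'timeout' is a hypothetical key, not defined above: the fallback is returned.
timeout = section.getint('timeout', fallback=30)

print(version, verbosity, debug, timeout)
```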

Using variables, interpolations

To avoid redundancy, variables can be used in INI files :

[sqlpac]

wwwurl=https://www.sqlpac.com
rpc=%(wwwurl)s/rpc-secure

By default, basic interpolation is enabled in ConfigParser. %(var)s is evaluated on demand, where var is defined in the same section (or in the special DEFAULT section), so there is no need to define and use the variables in a specific order.

print(cfg['sqlpac']['rpc'])
https://www.sqlpac.com/rpc-secure
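The on-demand evaluation can be verified with a small sketch in which rpc is deliberately declared before wwwurl. The option raw=True of the method get returns the stored text without interpolation :

```python
import configparser

cfg = configparser.ConfigParser()   # basic interpolation is the default
cfg.read_string("""
[sqlpac]
rpc=%(wwwurl)s/rpc-secure
wwwurl=https://www.sqlpac.com
""")

rpc = cfg['sqlpac']['rpc']                     # interpolated when accessed
raw_rpc = cfg.get('sqlpac', 'rpc', raw=True)   # stored text, not interpolated

print(rpc)
print(raw_rpc)
```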

Basic interpolation only resolves variables defined in the same section. When values must be resolved across sections, extended interpolation must be set on the config parser object. With extended interpolation, variables use the syntax ${section:var}; when the section part is omitted, the current section is used.

[sqlpac]

wwwurl=https://www.sqlpac.com
rpc=${wwwurl}/rpc-secure
           
[referential]
dir=${sqlpac:wwwurl}/referentiel/docs
import configparser
from configparser import ExtendedInterpolation

cfg = configparser.ConfigParser(interpolation=ExtendedInterpolation())
cfg.read('sqlpac.ini')

print(cfg['sqlpac']['rpc'])
print(cfg['referential']['dir'])
https://www.sqlpac.com/rpc-secure
https://www.sqlpac.com/referentiel/docs

Basic and extended interpolations are mutually exclusive : a parser uses one or the other, not both.

With basic and extended interpolation respectively, double the character % or $ to escape it in a directive value, otherwise it is treated as the start of an interpolation. The comments below are written on their own lines because inline comments are not enabled by default :

# % doubled to bypass basic interpolation
notif=50%% done
# $ doubled to bypass extended interpolation
price=$$10
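A short sketch to check the escaping with each interpolation mode. The section name demo is hypothetical; note that a single % is harmless when extended interpolation is used :

```python
import configparser
from configparser import ExtendedInterpolation

# Basic interpolation (default): %% yields a literal %
basic = configparser.ConfigParser()
basic.read_string("[demo]\nnotif=50%% done\n")

# Extended interpolation: $$ yields a literal $, a single % needs no escaping
extended = configparser.ConfigParser(interpolation=ExtendedInterpolation())
extended.read_string("[demo]\nprice=$$10\nnotif=50% done\n")

notif = basic['demo']['notif']
price = extended['demo']['price']
plain_percent = extended['demo']['notif']

print(notif, price, plain_percent)
```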

Delimiters, comments

The defaults for delimiters and comment prefixes are the following :

delimiters=('=', ':')
comment_prefixes=('#', ';')

The first occurrence of a delimiter in a line separates the key from the value. Comment prefixes are only recognized at the beginning of a line; inline comments are disabled unless inline_comment_prefixes is set.

These defaults can be overridden when creating the config parser object :

cfg = configparser.ConfigParser(interpolation=ExtendedInterpolation(),
                                delimiters=('=', ':', '~'))
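The sketch below combines a custom delimiter with inline comments, which are only recognized when the parameter inline_comment_prefixes is given. The section name demo and its keys are hypothetical :

```python
import configparser

cfg = configparser.ConfigParser(
    delimiters=('=', ':', '~'),
    comment_prefixes=('#', ';'),       # full-line comments (the defaults)
    inline_comment_prefixes=('#',),    # inline comments, disabled by default
)
cfg.read_string("""
# full-line comment
[demo]
user ~ sqlpac        # inline comment, stripped by the parser
url = https://www.sqlpac.com/
""")

user = cfg['demo']['user']
url = cfg['demo']['url']    # only the first delimiter splits key and value

print(user)
print(url)
```

Enabling inline comments has a cost: a value containing the prefix preceded by a space is truncated, which is presumably why it is off by default.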

Writing an INI file

Less common, but good to know : to write an INI file, fill the config parser object like a dictionary and call its method write :

import configparser

config = configparser.ConfigParser()

config['sqlpac'] = {}
config['sqlpac']['wwwurl'] = 'https://www.sqlpac.com'
config['sqlpac']['rpc'] = '${wwwurl}/rpc-secure'

with open('sqlpac2.ini', 'w') as cfgfile:
  config.write(cfgfile)
sqlpac2.ini
[sqlpac]
wwwurl = https://www.sqlpac.com
rpc = ${wwwurl}/rpc-secure

An alternative using the methods add_section and set :

import configparser

config = configparser.ConfigParser()

config.add_section('sqlpac')
config.set('sqlpac','wwwurl','https://www.sqlpac.com')
config.set('sqlpac','rpc','${wwwurl}/rpc-secure')

with open('sqlpac2.ini', 'w') as cfgfile:
  config.write(cfgfile)
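When the settings already exist as a Python dictionary, the method read_dict loads them in one call. In this sketch the output is written to an in-memory io.StringIO buffer instead of a file, and the basic interpolation syntax %(wwwurl)s is used so that the default parser can resolve it :

```python
import configparser
import io

settings = {
    'sqlpac': {
        'wwwurl': 'https://www.sqlpac.com',
        'rpc': '%(wwwurl)s/rpc-secure',
    },
}

config = configparser.ConfigParser()
config.read_dict(settings)

# In-memory stand-in for a file opened with open('sqlpac2.ini', 'w')
buffer = io.StringIO()
config.write(buffer)
ini_text = buffer.getvalue()

print(ini_text)
```

The method write stores the raw values: the placeholder is written as is and only resolved when the value is read.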

YAML

YAML : YAML Ain’t Markup Language. The format is even more human readable than JSON.

Translating the INI file to YAML format

Let’s write the previous sqlpac.ini file in YAML format :

sqlpac.yaml
sqlpac:
  version: 5.8
  verbosity: 2
  debug: false
  user: sqlpac
  wwwurl: https://www.sqlpac.com
  rpc: https://www.sqlpac.com/rpc-secure
  
referential:
  dir: https://www.sqlpac.com/referentiel/docs
  
googleindexing:
  apikey: ApIkeYdGF_kBtPVdAwIM7F0Fu87qWMoykfyl9hfnG2
  jsonfile: google-auth-indexing.json

  scopes: https://www.googleapis.com/auth/indexing
  
  rooturl: https://indexing.googleapis.com/v3
  
  endpoints:
    notification: https://indexing.googleapis.com/v3/urlNotifications/metadata?url=
    publish: https://indexing.googleapis.com/v3/urlNotifications:publish

mobiletest:
  serviceurl: https://searchconsole.googleapis.com/v1/urlTestingTools/mobileFriendlyTest:run

YAML brings a useful feature : a nested subsection such as endpoints can be introduced, which was not possible in the INI file.

The bad news : variables cannot be used as they were in the INI file with the config parser interpolation.

The YAML specification does not define variables. Anchors can be defined, but they only duplicate values : concatenation is not supported, and the syntax below raises an error :


sqlpac:
  wwwurl: &url https://www.sqlpac.com
  rpc: *url/rpc-secure
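One possible workaround is to expand placeholders in the application after loading. The sketch below is only an illustration: the dictionary stands in for the result of yaml.load, the ${wwwurl} placeholder is a convention of this example and not YAML syntax, and the function name expand_section is hypothetical :

```python
from string import Template

# Stand-in for the dictionary returned by yaml.load();
# ${wwwurl} is a convention of this sketch, not a YAML feature.
cfg = {
    'sqlpac': {
        'wwwurl': 'https://www.sqlpac.com',
        'rpc': '${wwwurl}/rpc-secure',
    },
}

def expand_section(section):
    """Replace ${key} placeholders with the string values of the same section."""
    values = {k: v for k, v in section.items() if isinstance(v, str)}
    return {
        k: Template(v).safe_substitute(values) if isinstance(v, str) else v
        for k, v in section.items()
    }

sqlpac = expand_section(cfg['sqlpac'])
print(sqlpac['rpc'])
```

The substitution is done in a single pass, so a placeholder referring to another placeholder is not resolved; unknown placeholders are left unchanged by safe_substitute.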

Installing PyYAML

The YAML parser is not part of the Python standard library. If it is not already installed, install the package PyYAML :

pip3 search PyYAML
PyYAML (5.3.1)                 - YAML parser and emitter for Python
pip3 install PyYAML
Successfully built PyYAML
Installing collected packages: PyYAML
Successfully installed PyYAML-5.3.1

Loading YAML files

To load a YAML file, import the module yaml and call the function load :

import yaml

with open("sqlpac.yaml", "r") as ymlfile:
    cfg = yaml.load(ymlfile, Loader=yaml.FullLoader)

print(cfg["googleindexing"]["endpoints"])
print(cfg["googleindexing"]["endpoints"]["notification"])
{'notification': 'https://indexing.googleapis.com/v3/urlNotifications/metadata?url=', 'publish': 'https://indexing.googleapis.com/v3/urlNotifications:publish'}
https://indexing.googleapis.com/v3/urlNotifications/metadata?url=

Since version 5.1, calling load without the Loader option raises a warning (and since version 6.0, the Loader argument is mandatory) :

YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe.
Please read https://msg.pyyaml.org/load for full details.

Valid values for the Loader option are :

  • BaseLoader : only loads the most basic YAML.
  • SafeLoader : loads a subset of the YAML language, safely. This is recommended for loading untrusted input.
  • FullLoader : loads the full YAML language. Avoids arbitrary code execution.
  • UnsafeLoader : the original Loader code that could be easily exploitable by untrusted data input.
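For a configuration file made of plain scalars, lists and dictionaries, SafeLoader is usually enough; yaml.safe_load is the shortcut for it. The sketch below, which assumes PyYAML is installed and loads from a string rather than a file, also shows that BaseLoader keeps every scalar as a string :

```python
import yaml

document = """
sqlpac:
  version: 5.8
  verbosity: 2
  debug: false
"""

# Shortcut for yaml.load(document, Loader=yaml.SafeLoader)
cfg = yaml.safe_load(document)

# BaseLoader does not resolve types: every scalar stays a string
base = yaml.load(document, Loader=yaml.BaseLoader)

print(cfg['sqlpac'])
print(base['sqlpac'])
```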

What about data types ? The appropriate data type is applied when loading : unlike INI files with configparser, no conversion is needed :

import yaml

with open("sqlpac.yaml", "r") as ymlfile:
    cfg = yaml.load(ymlfile, Loader=yaml.FullLoader)

for key in ('version','verbosity','debug'):
	print('%s : %s, %s' % (key, cfg["sqlpac"][key], type(cfg["sqlpac"][key])))
version : 5.8, <class 'float'>
verbosity : 2, <class 'int'>
debug : False, <class 'bool'>

The data type can be forced in the YAML file, for example to get the directive version as a string and not as a float :

sqlpac:
  version: !!str 5.8
…
version : 5.8, <class 'str'>

Writing YAML files

To write a YAML file, build a dictionary and use the dump method :

import yaml
cfgyaml = {}

cfgyaml["sqlpac"] = {}
cfgyaml["sqlpac"]["user"] = "sqlpac"
cfgyaml["sqlpac"]["wwwurl"] = "https://www.sqlpac.com"
cfgyaml["google"] = {}
cfgyaml["google"]["apis"] = ["googleindexing", "googleanalytics"]

with open("sqlpac2.yaml", "w") as f:
	yaml.dump(cfgyaml, f, sort_keys=False)
sqlpac2.yaml
sqlpac:
  user: sqlpac
  wwwurl: https://www.sqlpac.com
google:
  apis:
  - googleindexing
  - googleanalytics

The sort_keys option of the dump method is only available starting with PyYAML version 5.1, released in March 2019. By default, keys are sorted.

Conclusion : INI or YAML for the configuration file ?

When lists and nested dictionaries are used intensively in the configuration file, YAML is suitable, but remember that variables are not possible.

Furthermore, YAML is independent of any programming language, which helps if the file must be exchanged with other platforms.

An INI file with configparser is the best choice when variables are needed, but type conversions must then be managed, and the INI file becomes dependent on Python and configparser.