Post

Coding Agreement: Python

Coding Agreement: Python

Purpose of this document

This document sets a baseline to ensure code quality during our engagement. It provides guidelines on Python coding best practices. You will also find insights on responsibilities expected for a developer before opening a Pull Request, and for a reviewer to ensure a good and constructive review.

Code review

Code review is an important part of our job as software engineers. It helps other to grow while improving code quality and share understanding between developers.

Cheat sheets

Whether you are a junior software engineer or an experienced one, anyone can have valuable insights!

Here are below two cheat sheets to follow as a PR author and as PR reviewer.

Author cheat sheet

When a review is conducted be open to feedback.

You can find more detailed guidance as an author on the CSE Code With Engineering Playbook.

To go further, we can also look at the Azure SDK coding guidance that can be a great source of inspiration, depending on the project we are working on.

Reviewer cheat sheet

You can find more detailed guidance as a reviewer on the CSE Code With Engineering Playbook.

Tooling used

To enforce these rules, a set of tools will be used. Decisions on tools to use can be found in the corresponding ADR ..\/adr\/022_coding_tools.md.

Coding guidelines

We are going to use Python as development language. Here are below a set of best practices to respect while developing. These guidelines need to be followed by developers. Reviewers will also need to validate these guidelines during PR review.

Code Layout [MUST]

  • import: first standard libraries, then third party and finally local libraries. All group alphabetically sorted.
  • blank lines: two blank lines surrounding classes and top-level functions. Methods inside functions are surrounded by a single line.
  • indentation: use 4 spaces (most IDEs will convert tab into 4 spaces by default).
  • line length: maximum 88 chars.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# Standard libs
import std_lib_1
import std_lib_2

# Third party libs
import third_party_lib_1
import third_party_lib_2

# Local lib
from local_lib import class_1, function_1


# After two blank lines
def top_level_function(args: int) -> str:
    # Body

String quotes, whitespaces [MUST]

  • whitepace: avoid extra white spaces, use single white spaces around both sides of an operator, one after comma and none inside opening or closing parenthesis.
  • quotes: use double quotes. Single quotes can be used for strings if for instance you need to encapsulate json (which contains double quotes). But the rule is double quotes everywhere else.
1
2
3
4
x = "Hello world"
y = 15

print(x)

Naming convention [MUST]

  • language, spelling: Class name, function names and variables are written in English. Use grammatically correct names.
  • class name: name should start with an uppercase and follow the camlCase convention if more than two words.
  • function name:
    • lowercase, words separated by an underscore.
    • add self argument at first position if the method is a class’s method.
    • if the function’s name clashes with a reserved word, append underscore.
    • use 2 underscores at the beginning for private class’s methods.
    • use 1 underscore at the beginning of a private field.
    • specify the type of your input parameters
    • always provide a return type (use ‘None’ if ‘void’).
1
2
3
4
5
6
7
8
9
10
11
class CatalogInformation:
    def __init__(self, name: str) -> None:
        # Body constructor

    def get_metadata_count(self) -> int:
        # Body method
        return 1

    def __internal_check(self) -> bool:
        # Body private method
        return True

Comments / documentation [MUST]

Use comments, provide explanation on complex algorithms.

For the documentation, use the Sphinx style (official Python standard).

1
2
3
4
5
6
7
8
9
10
11
def is_valid(a: int, b: str) -> bool:
    """Explain what the function does

    :param a: what a is
    :type a: int
    :param b: what b is
    :type b: str

    :rtype: bool
    :return: what is the output of the function
    """

Tuples, Lists, Dictionaries [MUST]

Use tuples when data is non-changeable, dictionaries when you need to map things, and lists if your data can change later.

Functions can return multiple values, no need for a list:

1
2
3
4
5
6
7
def get_info(self) -> str, int:
    """Get the info. No need to return a List/Dict

    :rtype: str, int
    :return: what the str and the int are
    """
    return "hello world", 30

Use context managers [MUST]

Context managers are tool to use in situation where you need to run some code that has preconditions and postconditions.

For instance, when you read the content of a file, you need to ensure that you close the handle regardless of the success or the failure of the operation. With the with keyword you can achieve this:

1
2
3
4
5
6
7
8
9
10
11
with open(filename) as fd:
    process_file(fd)

# Note, that parenthesis are supported in Python 3.10 for context manager,
# useful when you have many 'with'
with (
    CtxManager1() as example1,
    CtxManager2() as example2,
    CtxManager3() as example3,
):
    ...

You can implement your own context manager may you need to execute actions in a certain order. Consider the case where you want to update a service configuration. You need first to stop the service, update the configuration then start the service again:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
class ServiceHandler:
    def __enter__(self) -> ServiceHandler:
        run("systemctl stop my.service")
        return self

    def __exit__(self, exc_type: str, ex_value: str, ex_traceback: str) -> None:
        run("systemctl start my.service")

def update_service_conf() -> None:
    # Body to update service's configuration

if (__name__ == '__main__'):
    with ServiceHandler():
        update_service_conf()

Use getters/setters for class attribute [SHOULD]

To add validation when getting or setting an attribute, let’s use getter / setter decorators for class’s attributes. This will also avoid accessing directly private class’s fields.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
class BoxSize:
    def __init__(self, width: float, length: float) -> None:
        self._width = self._length = None
        self.width = width
        self.length = length

    @property
    def width(self) -> float:
        return self._width

    @width.setter
    def width(self, width_value: float) -> None:
        if (width_value < 0):
            raise ValueError(f"{width_value} is an invalid value for width")
        self._width = width_value

    @property
    def length(self) -> float:
        return self._length

    @length.setter
    def length(self, length_value: float) -> None:
        if (length_value < 0):
            raise ValueError(f"{length_value} is an invalid value for length")
        self._length = length_value

if (__name__ == '__main__'):
    boxSize = BoxSize(-32, 32)
    print(boxSize.width)
    boxSize.length = -45
    print(boxSize.length)

If a class is created to only host attributes (to describe something for instance), the class attribute dataclass can be used to simplify the writting:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# Without the decorator
class Foo:
    def __init__(self, a: int, b: str, c: MyType):
        self.a = a
        self.b = b
        self.c = c

# could be rewritten as:
from dataclasses import dataclass


@dataclass
class Foo:
    a: int
    b: str
    c: MyType

Exceptions handling [MUST]

Hiding an exception or not properly anticipating potential errors (accessing an API for instance, network issues can arise) can lead to unexpected behaviors. Although a class or a function can affect default values to input parameters, not ensuring their validity, or not notifying the caller could lead to wrong results and difficulties to debug.

Each function has a logic, this logic must be followed by exceptions raised. For instance, if you have a function that get some data from an API, exceptions raised by this function should be logical: connection error, timeout etc. The exception must be raised at the right level of abstraction.

If you choose to propagate the exception to the caller, ensure that you do not expose sensitive information. Tracebacks of exception can contain sensitive details leading to exposing intellectual property.

Do not use exceptions as a go-to logic, meaning catching an exception and from the except calling other business code, the flow of the program will become harder to read. Exceptions are usually to notify the caller that something unexpected occured.

Finally, observability is an important engineering fundamental. Properly handling exceptions and managing observability (with AppInsight for instance) will lead to a more robust application and easy to debug when something not expected occurs.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
# Never do ...
try:
    process_data()
except:
    pass

# Encapsulate orginal exception trace
def process_data() -> None:
    try:
        do_something()
    except KeyError as e:
        # Raise a specific exception from do_something,
        # encapsulate trace to a custom exception
        raise MyApplicationException("Item not present") from e

Dependency injection [SHOULD]

Dependency injection is a technique where an object receives other objects that it depends on (instead of creating this dependency inside the object that needs it). This approach has several advantages:

  1. it helps managing complexity of codebase,
  2. it helps testing and maintenance,
  3. it is supported by many languages, including Python.

Consider the following example without dependency injection: the class MyConverter reads a json file from an Azure Blob storage using the SDK and converts it into another format:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
# This code is a snippet no everything is implemented (exceptions for instance)
from azure.storage.blob import BlobClient

# Class that reads a blob using the Azure SDK
class AzureBlobReader:
    def __init__(self, connection: str):
        self.connection = connection

    def get_json(self, container: str, blob: str) -> str:
        client = BlobClient.from_connection_string(
            conn_str=self.connection,
            container_name=container,
            blob_name=blob
        )

        blob_raw = client.download_blob()

        return json.loads(blob_raw.readall())

# Class responsible of converting the blob into another format
class MyConverter:
    def convert_payload(self) -> str:
        blob_reader = AzureBlobReader("connection_str")
        input_json = blob_reader.get_json("my_container", "my_blob")

        return self.transform(input_json)

if (__name__ == '__main__'):
    converter = MyConverter()
    output = converter.convert_payload()

The class MyConverter is tightly coupled to AzureBlobReader. Implementing dependency injection in the code above will result into:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
# This code is a snippet no everything is implemented (exceptions for instance)
from azure.storage.blob import BlobClient

# Class that reads a blob using the Azure SDK
class AzureBlobReader:
    def __init__(self, connection: str):
        self.connection = connection

    def get_json(self, container: str, blob: str) -> str:
        client = BlobClient.from_connection_string(
            conn_str=self.connection,
            container_name=container,
            blob_name=blob
        )

        blob_raw = client.download_blob()

        return json.loads(blob_raw.readall())

# Class responsible of converting the blob into another format
class MyConverter:
    def __init__(self, azure_blob_reader: AzureBlobReader):
        self.azure_blob_reader = azure_blob_reader

    def convert_payload(self) -> str:
        input_json = self.azure_blob_reader.get_json("my_container", "my_blob")

        return self.transform(input_json)

if (__name__ == '__main__'):
    converter = MyConverter(
        azure_blob_reader = AzureBlobReader("connection_str")
    )
    output = converter.convert_payload()

By decoupling these classes, code is easier to maintain and to read. It will also help building unit tests for the class MyConverter as we will be able to easily mock AzureBlobReader.

Unit tests [MUST]

Unit testing is a core tool for developers. They help us test our code, encourage good design practices, and reduce chances to have bugs hitting production. Unit tests can improve developer efficiency.

Unit tests should be:

  • Reliable: should be 100% reliable so failures indicate a bug in the code.
  • Fast: should run in milliseconds.
  • Isolated: removing all external dependencies ensures reliability and speed (Dependency injection for instance) for details).

Without entering into too many details, pytest provides a set of features helpful. Below some examples:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# Execute this test 3 times, with a, b, and c as input_value
@pytest.mark.parametrize("input_value", ["a", "b", "c"])
def test_(
    input_value
):
    # Test body

# Define a fixed baseline, executed when needed
@pytest.fixture
def items() -> List[str]:
    return ["item 1", "item 2"]

# 'items' will be execute, hence, will be a List[str] with 2 items
def test_items_count(items):
    assert len(items) == 2

Please refer to pytest site for more useful patterns.

This post is licensed under CC BY 4.0 by the author.