Milestone 1 of the gita project: basic CLI

This is the first milestone where we implement the command-line interface (CLI) with sub-commands. Other posts in this series are

In general, I won’t provide complete code for each commit, only hints, references, and snippets. This mimics the workplace situation where only requirements and partial instructions are given. If you get stuck, you can install gita to see the expected behavior and read its source code.

Before diving into the coding nitty-gritty, let’s first review project organization. A typical Python project has the following structure

gita
├── gita
│   ├── __init__.py
│   └── ...
├── tests
│   └── ...
├── requirements.txt
├── setup.py
└── ...

Here I use the same name for the project, the project root folder, and the source code folder. This is not a requirement, but rather a convenience. The source code folder has an __init__.py file to make it importable as a package. The tests folder contains test files. The requirements.txt file specifies the project’s dependencies on third-party Python packages, and setup.py defines the packaging details.

In addition, a project may contain

README.md: General information.
LICENSE: A must-have for any serious open-source project.
Makefile: Although Python code doesn’t need compilation, a Makefile is convenient for defining shortcuts.
MANIFEST.in: A list of non-Python files for packaging.
conftest.py: This file is specific to Pytest. It defines test fixtures and per-directory plugins.

As for the git repo setup, I recommend creating a repo on GitHub first. This remote repo is both an online backup and the ‘official’ copy for continuous integration (CI). Make sure to make your project public; otherwise, the CI tools used later in this milestone are not free.

After that, run git clone <remote address> locally. And we are ready to code.

v0.0.1: implement `add` and `rm` subcommands

In the first commit, we will implement the following behavior

$ gita add xxx
> xxx is added
$ gita rm yyy
> yyy is removed

Here nothing is added or removed. We only print out some text for visual inspection. The command-line interface (CLI) is like the project’s skeleton, and the ‘meat’ will be added later.

To get you started, here is a simple implementation of add. You can put it in main.py inside the source code folder.

import argparse

def add(args):
    print(f'{args.path} is added.')

if __name__ == '__main__':
    p = argparse.ArgumentParser(prog='gita')
    subparsers = p.add_subparsers(help='sub commands -h')

    p_add = subparsers.add_parser('add')
    p_add.add_argument('path')
    p_add.set_defaults(func=add)

    args = p.parse_args()
    if 'func' in args:
        args.func(args)
    else:
        p.print_help()

You can run it in the source code folder with

python3 main.py add hello

In this snippet, we attach sub-parsers to the root parser p. The parse_args() method collects all the user input in an argparse.Namespace object args, whose fields are read as attributes (args.path) and which supports membership tests such as 'func' in args. The sub-parser’s action is defined in the function add(), which becomes the value of args.func through the set_defaults() call.
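One way to see what parse_args() and set_defaults() produce is to hand the parser an explicit list instead of the real command line. The sketch below rebuilds the same parser as main.py above and inspects the result; the variable name no_cmd is only for illustration.

```python
import argparse


def add(args):
    print(f'{args.path} is added.')


# Same parser as in main.py
p = argparse.ArgumentParser(prog='gita')
subparsers = p.add_subparsers(help='sub commands -h')
p_add = subparsers.add_parser('add')
p_add.add_argument('path')
p_add.set_defaults(func=add)

# parse_args() accepts an explicit list instead of reading sys.argv[1:]
args = p.parse_args(['add', 'hello'])
print(vars(args)['path'])   # hello  (vars() exposes the Namespace as a dict)
print(args.func is add)     # True   (set_defaults() stored the handler as 'func')
args.func(args)             # dispatches to add(), which prints "hello is added."

# Without a sub-command, 'func' is never set; this is why main.py
# checks `'func' in args` before calling it.
no_cmd = p.parse_args([])
print('func' in no_cmd)     # False
```

Passing a list to parse_args() is also the trick that makes the CLI testable later in this milestone.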

Now it’s your turn to implement the rm sub-command. Optionally, you can skim through the official argparse documentation.

When it’s all working, run git commit -am "<commit message>" to commit the code change. Good commit messages start with an action verb; see the section titles for examples.

In general, it’s good practice to keep each commit small and self-contained. For example, it may

add a feature
fix a bug
refactor some logic

But not a complicated combination of them.

v0.0.2: enhance `add` sub-command

To make add and rm more meaningful, we will save the repo paths in ~/.config/gita/repo_path, where ~ is the user home directory. This location can be retrieved as

import os

path_file = os.path.join(os.path.expanduser('~'),
                         '.config', 'gita', 'repo_path')

Another improvement is to allow adding multiple paths at once, i.e.,

$ gita add /a/b/c/repo1 /d/e/repo2
> repo1 is added
> repo2 is added

Take a look at nargs='+' in the argparse document for this feature.

In addition, make sure you get the following details right

add path only if it’s an existing directory, see os.path.isdir()
add path only if it hasn’t been added before
add absolute path even if the user input is relative

As for the file format, each path can be on its own line in repo_path.
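Here is a minimal sketch of the enhanced add, assuming the one-path-per-line format above. The helper names read_paths(), add_repos(), and f_add() are hypothetical, and add_repos() takes the file location as a parameter only so that it can be exercised without touching your real ~/.config folder.

```python
import argparse
import os
import sys

# The location suggested above; helpers take it as a parameter for easy testing.
PATH_FILE = os.path.join(os.path.expanduser('~'), '.config', 'gita', 'repo_path')


def read_paths(path_file):
    """Return the stored repo paths (one per line), or [] if the file is missing."""
    if not os.path.isfile(path_file):
        return []
    with open(path_file) as f:
        return [line.strip() for line in f if line.strip()]


def add_repos(paths, path_file=PATH_FILE):
    """Append existing, not-yet-registered directories as absolute paths."""
    registered = read_paths(path_file)
    new_paths = []
    for p in paths:
        abs_p = os.path.abspath(p)  # relative input -> absolute path
        if not os.path.isdir(abs_p):
            print(f'{p} is not a directory', file=sys.stderr)
        elif abs_p not in registered and abs_p not in new_paths:
            new_paths.append(abs_p)  # skip anything already stored or repeated
    if new_paths:
        folder = os.path.dirname(path_file)
        if folder:
            os.makedirs(folder, exist_ok=True)  # ~/.config/gita may not exist yet
        with open(path_file, 'a') as f:
            for p in new_paths:
                f.write(p + '\n')
                print(f'{os.path.basename(p)} is added')
    return new_paths


def f_add(args):
    add_repos(args.paths)


def main(argv=None):
    p = argparse.ArgumentParser(prog='gita')
    subparsers = p.add_subparsers(help='sub commands -h')
    p_add = subparsers.add_parser('add')
    # nargs='+': one or more paths, collected into a list
    p_add.add_argument('paths', nargs='+', help='repo path(s) to add')
    p_add.set_defaults(func=f_add)
    args = p.parse_args(argv)
    if 'func' in args:
        args.func(args)
    else:
        p.print_help()


if __name__ == '__main__':
    main()
```

Opening the file in append mode keeps existing entries untouched; the duplicate check happens in memory before anything is written.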

If you are new to the os module, take a look at this article.

v0.0.3: add `ls` sub-command

In application programming interface (API) design, there is a term called “CRUD”. It stands for create, read, update, and delete, which together form the complete set of actions on persistent data. In the gita project, add is the C, rm is the D, and ls and ll (see next milestone) are the R. I made the design choice to drop the U.

An example output from ls sub-command is

$ gita ls
> repo1 repo2

Here the repo names are displayed instead of the repo paths, e.g., the path /a/b/repo1 has repo name repo1. Check out os.path.basename() for this purpose.

To display the full path, we can implement the following behavior

$ gita ls repo1
> /a/b/repo1

This means the ls sub-command takes an optional argument. Check out nargs='?' in the argparse module.

A naive implementation would simply display the content of repo_path. But what if the user moves or deletes a folder? It is better to show only the valid paths. We should also remove the invalid ones, since they are unlikely to become valid again. Think about where this should be done: probably not in ls, since ls is the R, which should not have side effects.
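A sketch of ls under the same assumptions (one absolute path per line). It filters out paths that are no longer directories but leaves repo_path untouched, so ls stays side-effect free; where to prune the file is still left to you. The names valid_paths(), ls_text(), and f_ls() are hypothetical, and exiting with status 1 for an unknown name is my own choice here.

```python
import argparse
import os
import sys

PATH_FILE = os.path.join(os.path.expanduser('~'), '.config', 'gita', 'repo_path')


def valid_paths(path_file):
    """Stored paths that are still directories. Read-only: the file is not rewritten."""
    if not os.path.isfile(path_file):
        return []
    with open(path_file) as f:
        paths = [line.strip() for line in f if line.strip()]
    return [p for p in paths if os.path.isdir(p)]


def ls_text(path_file, repo=None):
    """Repo names when repo is None, else that repo's full path (None if unknown)."""
    paths = valid_paths(path_file)
    if repo is None:
        # the repo name is the last path component, e.g. /a/b/repo1 -> repo1
        return ' '.join(os.path.basename(p) for p in paths)
    for p in paths:
        if os.path.basename(p) == repo:
            return p
    return None


def f_ls(args):
    text = ls_text(PATH_FILE, args.repo)
    if text is None:
        print(f'{args.repo} is not a registered repo', file=sys.stderr)
        sys.exit(1)
    print(text)


def main(argv=None):
    p = argparse.ArgumentParser(prog='gita')
    subparsers = p.add_subparsers(help='sub commands -h')
    p_ls = subparsers.add_parser('ls')
    # nargs='?': zero or one value; args.repo is None when it is omitted
    p_ls.add_argument('repo', nargs='?', help='show the full path of this repo')
    p_ls.set_defaults(func=f_ls)
    args = p.parse_args(argv)
    if 'func' in args:
        args.func(args)
    else:
        p.print_help()


if __name__ == '__main__':
    main()
```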

v0.0.4: implement `rm` sub-command

This is the D in CRUD API design. It deletes repo(s) from the repo_path file. For example,

$ gita rm repo2

The Unix philosophy says “no news is good news”. Thus we won’t give feedback if repo2 is deleted successfully. (Previously we printed some text just for debugging purposes.)

But what if repo2 doesn’t exist in repo_path? At the bare minimum, the program should exit with a non-zero error code without any traceback. Optionally you can print out some error messages to explain the situation.
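A minimal sketch of rm, assuming the same file format. rm_repo() and f_rm() are hypothetical names; the point to notice is that an unknown repo ends in sys.exit(1) with a one-line message instead of an uncaught exception. An alternative is argparse’s own parser.error(), which prints the usage line plus your message and exits with status 2.

```python
import argparse
import os
import sys

PATH_FILE = os.path.join(os.path.expanduser('~'), '.config', 'gita', 'repo_path')


def rm_repo(name, path_file=PATH_FILE):
    """Drop every stored path whose basename is `name`; return True if any was dropped."""
    if not os.path.isfile(path_file):
        return False
    with open(path_file) as f:
        paths = [line.strip() for line in f if line.strip()]
    kept = [p for p in paths if os.path.basename(p) != name]
    if len(kept) == len(paths):
        return False
    with open(path_file, 'w') as f:
        f.writelines(p + '\n' for p in kept)
    return True


def f_rm(args, path_file=PATH_FILE):
    if not rm_repo(args.repo, path_file):
        # No traceback: a short message on stderr and a non-zero exit code.
        print(f'gita: {args.repo} is not a registered repo', file=sys.stderr)
        sys.exit(1)
    # Success prints nothing: "no news is good news".


def main(argv=None):
    p = argparse.ArgumentParser(prog='gita')
    subparsers = p.add_subparsers(help='sub commands -h')
    p_rm = subparsers.add_parser('rm')
    p_rm.add_argument('repo', help='name of the repo to remove')
    p_rm.set_defaults(func=f_rm)
    args = p.parse_args(argv)
    if 'func' in args:
        args.func(args)
    else:
        p.print_help()


if __name__ == '__main__':
    main()
```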

v0.0.5: refactor repo parsing logic

So far we have two sub-commands sharing a common argument

gita ls [repo-name]
gita rm <repo-name>

Here the angle brackets denote a mandatory argument, and the square brackets an optional argument.

Both cases need a validity check, i.e., whether the user input is registered in repo_path. Hopefully you don’t have two pieces of code doing this. Redundant code causes redundant work, and bugs when requirements change. This is the “don’t repeat yourself” (DRY) principle.

To kill the redundancy, we can define a helper function to parse the repo_path file.

from typing import Dict

def get_repos() -> Dict[str, str]:
    ...

Here I use type hints to annotate the return value. The returned dictionary has repo names as keys and repo paths as values. Any user input outside this dictionary is invalid.

There is another argparse trick to apply here. The add_argument method has a choices keyword argument, which can help take care of the membership check.
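A sketch of how the two ideas fit together, assuming the one-path-per-line repo_path file. Only get_repos() comes from the text; build_parser() and the path_file parameter are hypothetical, and skipping paths that are no longer directories is my own choice. Because choices accepts any container, the dictionary returned by get_repos() can be passed directly, and argparse then checks membership against its keys.

```python
import argparse
import os
from typing import Dict

PATH_FILE = os.path.join(os.path.expanduser('~'), '.config', 'gita', 'repo_path')


def get_repos(path_file: str = PATH_FILE) -> Dict[str, str]:
    """Map repo name -> repo path, skipping paths that are no longer directories."""
    repos: Dict[str, str] = {}
    if os.path.isfile(path_file):
        with open(path_file) as f:
            for line in f:
                path = line.strip()
                if path and os.path.isdir(path):
                    repos[os.path.basename(path)] = path
    return repos


def build_parser(path_file: str = PATH_FILE) -> argparse.ArgumentParser:
    repos = get_repos(path_file)  # read once, shared by every sub-command
    p = argparse.ArgumentParser(prog='gita')
    subparsers = p.add_subparsers(help='sub commands -h')

    p_ls = subparsers.add_parser('ls')
    # Optional name; argparse rejects anything that is not a key of `repos`.
    p_ls.add_argument('repo', nargs='?', choices=repos)

    p_rm = subparsers.add_parser('rm')
    # Mandatory name, same membership check, no hand-written validation.
    p_rm.add_argument('repo', choices=repos)
    return p
```

An invalid name now makes argparse print an "invalid choice" error and exit with status 2, without a traceback. A side effect is that the registered names show up in the usage line of gita ls -h and gita rm -h.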

v0.0.6: add tests

Any serious project needs tests with high test coverage (I am thinking of >90%). To write Python tests, we have two choices: the unittest module in the standard library and the pytest module. I prefer the latter for its concision.

See this example from the unittest documentation

import unittest

class TestStringMethods(unittest.TestCase):

    def test_upper(self):
        self.assertEqual('foo'.upper(), 'FOO')

    def test_isupper(self):
        self.assertTrue('FOO'.isupper())
        self.assertFalse('Foo'.isupper())

The equivalent test in pytest is

class TestStringMethods:

    def test_upper(self):
        assert 'foo'.upper() == 'FOO'

    def test_isupper(self):
        assert 'FOO'.isupper()
        assert not 'Foo'.isupper()

A higher-level comparison between them can be found on this page. Since pytest is not in the Python standard library, we need to put it in requirements.txt with version information so that other programmers can install it using

pip3 install -r requirements.txt

There are many types of tests. The most common ones are

unit test: check a single function
integration test: check a piece of business logic

For the gita project, an integration test checks the behavior of a sub-command. We can refactor main.py to facilitate that

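def main(argv=None):
    p = argparse.ArgumentParser(prog='gita')
    ...
    # argv=None makes parse_args() read sys.argv[1:]; tests can pass a list instead
    args = p.parse_args(argv)
    ...

if __name__ == '__main__':
    main()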

This change allows us to pass command-line arguments to the main() function. Then in the test we can check the output using pytest’s capfd fixture.
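A sketch of such an integration test in pytest style. To keep it self-contained, the file carries a tiny stand-in for main() that only knows ls; in the real project you would import main from your package instead. It also assumes the path file is held in a module-level PATH_FILE variable, so that pytest’s built-in tmp_path and monkeypatch fixtures can redirect it away from your real ~/.config. Run it with pytest.

```python
# test_main.py: a pytest-style integration test sketch.
import argparse
import os

# --- hypothetical stand-in for gita/main.py; import your real main() instead ---
PATH_FILE = os.path.join(os.path.expanduser('~'), '.config', 'gita', 'repo_path')


def f_ls(args):
    with open(PATH_FILE) as f:
        paths = [line.strip() for line in f if line.strip()]
    print(' '.join(os.path.basename(p) for p in paths))


def main(argv=None):
    p = argparse.ArgumentParser(prog='gita')
    subparsers = p.add_subparsers(help='sub commands -h')
    p_ls = subparsers.add_parser('ls')
    p_ls.set_defaults(func=f_ls)
    args = p.parse_args(argv)
    if 'func' in args:
        args.func(args)
    else:
        p.print_help()


# --- the test itself ---
def test_ls(tmp_path, monkeypatch, capfd):
    # tmp_path: a fresh temporary directory (pathlib.Path) provided by pytest
    path_file = tmp_path / 'repo_path'
    path_file.write_text('/a/b/repo1\n/d/e/repo2\n')
    # Point PATH_FILE at the temporary file for this test only; with a real
    # package you would patch the attribute on the imported module instead.
    monkeypatch.setitem(globals(), 'PATH_FILE', str(path_file))

    main(['ls'])  # call the CLI entry point with explicit arguments

    out, err = capfd.readouterr()  # everything written to stdout/stderr so far
    assert out == 'repo1 repo2\n'
    assert err == ''
```

monkeypatch undoes the change when the test finishes, so one test cannot leak its configuration into the next.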

At a bare minimum, you should test the behavior of add, rm, ls with valid inputs. As for edge cases, make sure that

invalid paths cannot be added
the same path cannot be added multiple times in repo_path
the CLI does not generate traceback when
- rm a non-existing repo
- ls a non-existing repo

To collect test coverage data, check out the pytest-cov package.

v0.0.7: add continuous integration

A software project can have many components (possibly many repos too). Making sure that all the components work together is called integration. It involves building the full software, performing unit tests, integration tests, performance tests, etc. All these processes should be automated of course.

Continuous integration (CI) means integrating as often as possible. It helps to catch bugs early and prevent catastrophe. For a small project like gita, we can afford to integrate every commit. Specifically, we will perform automated testing in CI.

There are many CI tools that work with GitHub, and the one I use is Travis CI. Another popular choice is CircleCI. Both are free for open-source projects. Recently GitHub also released its own CI tool called GitHub Actions.

To set up Travis CI, register with your GitHub account and grant it access to your repo. Then include a .travis.yml file in the project root folder. It specifies the commands to set up, run, and clean up the tests. These commands are triggered by every commit pushed to the remote branches, and by pull requests. You can find more details in their documentation.

v0.0.8: package and release

A concept closely related to CI is continuous deployment (CD). Deployment basically means making the software or service available to users, and CD means deploying often so that user feedback can be collected quickly.

For a small Python project like gita, CD is probably overkill. In this section, we will implement manual deployment via the Python Package Index (PyPI). This will allow you to install your gita package using pip3 install gita1. (Since I already took the name gita, you will have to pick a different name, say gita1.)

There are two steps in deployment:

make installation files
upload to PyPI

We will also install some third-party tools for packaging and releasing

pip3 install twine setuptools

Remember to add them to requirements.txt too.

The purpose of installation is to enable the system python3 to find our source code. It also allows us to execute the CLI as gita, instead of python3 <path-to-source-folder>/main.py. For example, my installed gita command points to /usr/local/bin/gita, with the following content

#!/usr/local/bin/python3.6
# EASY-INSTALL-ENTRY-SCRIPT: 'gita==0.10.2','console_scripts','gita'
__requires__ = 'gita==0.10.2'
import re
import sys
from pkg_resources import load_entry_point

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
    sys.exit(
        load_entry_point('gita==0.10.2', 'console_scripts', 'gita')()
    )

The first line says that this file is to be run with a specific Python version. The script passes the user input to the entry point of the installed gita package.

Setting up the packaging is as easy as adding a setup.py file in the project root folder. A minimum example is

from setuptools import setup

setup(
    name='gita',
    packages=['gita'],
    version='0.0.8',
    entry_points={'console_scripts': ['gita = gita.main:main']},
    python_requires='~=3.6',
)

You can check out the documentation for more keywords.

To avoid name clashes, you should rename your project to something else, say “gita1”. Note that the entry point is the main function.
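The console_scripts entry 'gita = gita.main:main' reads as: command name gita, module gita.main, callable main. The generated script shown earlier looks up that function, calls it with no arguments, and passes its return value to sys.exit(). That is also why main(argv=None) works as an entry point: with no arguments, parse_args() falls back to sys.argv. The sketch below mimics only the name-resolution step with importlib; it is an illustration, not setuptools’ actual code, and parse_console_script() and resolve() are hypothetical helpers.

```python
import importlib
from typing import Any, Tuple


def parse_console_script(spec: str) -> Tuple[str, str]:
    """Split 'name = module:attr' into ('name', 'module:attr')."""
    name, _, target = spec.partition('=')
    return name.strip(), target.strip()


def resolve(target: str) -> Any:
    """Import the module before ':' and fetch the (possibly dotted) attribute after it."""
    module_name, _, attr = target.partition(':')
    obj: Any = importlib.import_module(module_name.strip())
    if attr:
        for part in attr.strip().split('.'):
            obj = getattr(obj, part)
    return obj


if __name__ == '__main__':
    print(parse_console_script('gita = gita.main:main'))  # ('gita', 'gita.main:main')
    # Demonstrated with a standard-library target, since gita1 may not be installed:
    basename = resolve('os.path:basename')
    print(basename('/a/b/repo1'))                          # repo1
```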

With setup.py in place, you can already install locally

pip3 install .

Here I assume you are at the project root folder.

If you want your code change to take effect immediately, install in the developer mode

pip3 install -e .

Instead of copying the source code into the system folder, it makes the installed package point to your source folder.

To make your gita package installable without source code, you need to upload the installation files to the Python Package Index (PyPI), which is the official repository for Python packages. Make sure to create an account on PyPI first.

To make the distribution files, run

python3 setup.py sdist

It will create a folder called dist with the source tarball file in it.

Uploading to PyPI is simply one command

twine upload dist/*

If no error occurs, you will see the release at https://pypi.org/project/<your-package-name>/. Note that overwriting an existing release is forbidden on PyPI. You need to bump up the version number in setup.py for each upload.

I put all these commands in a Makefile so I only need to remember aliases for each step.

v0.1: clean up and tag

This completes the first milestone. At this point, you can optionally tag the code base using

git tag v0.1
