I have recently been busy creating materials for teaching and learning Python. The motivation comes both from the trainings we host at Adimian and from my personal need to always have some interesting home project going on. The technology choice for the materials was obvious from the beginning: Jupyter Notebooks. In this blog post, I'll explain why Jupyter Notebooks are so awesome for educational purposes. In addition, I'll describe my personal workflow for writing them.
Let's start with a short explanation of Jupyter Notebooks. Briefly, they are documents which can contain rich text elements (e.g. pictures and markdown) and code that can be run directly inside the notebook. The content is split into cells and each cell has its own type (e.g. markdown or code). Notebooks are run with the Jupyter Notebook App, which is basically a client-server application that enables opening and editing the documents in a web browser. For executing code inside notebooks, there are kernels for different languages, such as Python, R and Scala. You can get the basic idea of notebooks if you browse through this blog post.
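To make the cell structure concrete: on disk, a notebook is just a JSON document. The sketch below builds a minimal two-cell notebook using only the standard library, following the version 4 notebook format. The cell contents are made up for illustration.

```python
import json

# A minimal notebook in the version 4 format: one markdown cell, one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Lists\n", "A list is an ordered collection."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,  # the cell has not been executed yet
            "outputs": [],            # filled in when the cell is executed
            "source": ["numbers = [1, 2, 3]\n", "print(sum(numbers))"],
        },
    ],
}

# Every cell declares its own type.
cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)

# This text, saved with an .ipynb extension, is what the Notebook App opens.
text = json.dumps(notebook, indent=1)
```

Note the `outputs` and `execution_count` fields of the code cell: this is where the results of running a cell are stored.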
Notebooks are especially widely adopted among data scientists and other scientific folks because of their capability for sharing data, visualizations, algorithms and plain text in a compact format. James Somers, a journalist at The Atlantic, even predicted that notebooks may become the preferred format of scientific papers in the near future.
In addition to the scientific community, notebooks are obviously well suited for learning purposes because theory and practice can be included in a single document. There's a significant difference between static code samples in PowerPoint slides or in a textbook and executable - even modifiable - code samples in a notebook. I believe everyone who has ever studied programming remembers the struggle of copying some example code from course materials into an editor and then executing it. The struggle gets even worse when the output does not match the one stated in the materials.
Implementing a post save hook
In addition to viewing notebooks (.ipynb files) locally in a browser by running a local notebook server, notebooks can also be viewed on GitHub and in nbviewer. However, in order to make the materials even more accessible, I also wanted to have HTML versions of the notebooks. Another source of motivation was that the HTML format has basically endless possibilities for fine-tuning the layout and even for adding functionality.
After a bit of googling, it turned out that it's possible to configure a post save hook which is triggered when the notebook is saved. For converting notebook files (.ipynb) to other formats, such as HTML, there's a tool called nbconvert. By combining these two pieces of information, it was straightforward to make a configuration that automatically generates an HTML version when a notebook is saved. The neat thing about the Jupyter Notebook configuration file is that it's a regular Python module, which makes it easy to add configuration related functionality.
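A minimal sketch of what such a hook could look like in the notebook configuration file, assuming the classic Notebook server, where the hook is registered on `FileContentsManager.post_save_hook` and called with the saved model, its path on disk and the contents manager. The helper name `nbconvert_command` is made up for this sketch, and the hook only does a plain HTML conversion.

```python
import os
import subprocess


def nbconvert_command(notebook_filename):
    """Build the nbconvert call that renders one notebook to HTML."""
    return ["jupyter", "nbconvert", "--to", "html", notebook_filename]


def post_save(model, os_path, contents_manager, **kwargs):
    """Convert a notebook to HTML each time it is saved."""
    if model["type"] != "notebook":
        return  # ignore plain files and directories
    directory, filename = os.path.split(os_path)
    # Run nbconvert in the notebook's directory so the .html lands next to it.
    subprocess.check_call(nbconvert_command(filename), cwd=directory or None)


# Jupyter provides get_config() when it loads the configuration file; the
# guard below only keeps this sketch importable outside of Jupyter.
try:
    c = get_config()  # noqa: F821
except NameError:
    c = None

if c is not None:
    c.FileContentsManager.post_save_hook = post_save
```

Because the configuration file is plain Python, the conversion logic can live in ordinary functions and grow from here, for example by passing extra options to nbconvert.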
After starting to write the materials, I soon realised that it would be nice to have the notebooks stored in a "fresh" state in the version control system. By fresh I mean that the code cells have not been executed, because otherwise the output of the execution is also stored and shown in the notebook. This way the learners have to execute the cells themselves to see the output. The best part of this approach is that learners can stop to think about what output they expect before actually revealing it. While browsing through notebook materials written by other people and found in the wild wild web, I noticed that people tend to store notebooks in an already executed state. I think that's suitable for certain types of content but definitely not for learning materials. Thus, I decided to keep mine in the fresh state.
For the HTML versions, I wanted to have the output available. However, based on the above reasoning, I did not want the output to be visible immediately when the HTML file is opened, e.g. in a browser. With
You can see an example of the toggle button here.
You can see the full post save hook implementation here.
I want to be sure that the code examples don't contain typos or other mistakes which would lead to exceptions when the code cells are run. Also, I want to be sure that the examples are valid in all the Python versions which I intend to support. There is a pytest plugin called nbval which can be used for testing that executing a notebook does not raise exceptions. By combining this with tox and Travis CI, I was able to configure a continuous integration system which runs whenever I push to the GitHub repository. Each CI run verifies that there are no exceptions when the notebooks are executed. The best part is that this is done against all the Python versions I want to test. This CI workflow ensures that there are no unpleasant surprises when the learners run the example code.
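To illustrate what "no exceptions when the notebooks are executed" means in practice, here is a deliberately simplified stand-in for that check. It is not how nbval works internally: nbval is run through pytest (`pytest --nbval` or `pytest --nbval-lax`) and executes the cells in a real kernel, whereas this sketch runs the code cells of a notebook dict with plain `exec()` in one shared namespace. The function name and the sample notebooks are made up for illustration.

```python
import traceback


def first_failing_cell(notebook):
    """Return (index, error_text) for the first code cell that raises, else None.

    Simplification: cells run via exec() in one shared namespace, so IPython
    magics and shell escapes are not supported.
    """
    namespace = {}
    for index, cell in enumerate(notebook["cells"]):
        if cell["cell_type"] != "code":
            continue
        source = "".join(cell["source"])
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception:
            return index, traceback.format_exc()
    return None


good = {"cells": [{"cell_type": "code", "source": ["x = 1\n", "y = x + 1"]}]}
bad = {
    "cells": [
        {"cell_type": "markdown", "source": ["Intro"]},
        {"cell_type": "code", "source": ["undefined_name + 1"]},
    ]
}

print(first_failing_cell(good))  # None: every cell ran cleanly
failure = first_failing_cell(bad)
print(failure[0])                # 1: the second cell raised a NameError
```

Running a check like this under each supported interpreter is the part that tox automates.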
Potential future improvements
After implementing the functionality for the post save hook, I've thought of some potential future enhancements. As mentioned earlier, I wanted to store the notebooks in a fresh, not executed state. With the current setup, this is not done automatically. One potential improvement would be to always clear the output when the notebook is saved. This should be easy to do because nbconvert seems to have a feature for it. However, while modifying a notebook, I would not like to see the output magically disappear when I hit save. To achieve this, one option could be to make the modifications in a working copy of the notebook, which would generate or update the actual notebook during save.
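The working copy idea could be sketched roughly as follows. This is only an assumption about how it might look, using the standard library on the notebook's JSON rather than nbconvert's own output clearing. The function names and both paths are hypothetical.

```python
import copy
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with all code cell results removed."""
    fresh = copy.deepcopy(notebook)
    for cell in fresh["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return fresh


def publish_fresh_copy(working_path, published_path):
    """Read the executed working copy and write the output-free notebook."""
    with open(working_path, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(published_path, "w", encoding="utf-8") as f:
        json.dump(strip_outputs(notebook), f, indent=1)
        f.write("\n")
```

Calling `publish_fresh_copy` from the post save hook would leave the working copy untouched, output included, while the notebook under version control stays fresh.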
Another enhancement could be to include example solutions for the exercises in version control. If I had the exercise notebook and the solution notebook as two separate files, I would basically have to make identical updates in two different places, for example when fixing a typo. For a self-respecting Python developer, this would of course be unacceptable. Thus, some kind of syncing mechanism would be needed. The tricky part is that the notebooks should be synced only partly - I obviously don't want to include the cells containing solutions in the exercise files. A potential technical solution for partial syncing could be to add a custom flag to the cell metadata of those cells which should not be synced. Then, in the syncing implementation, these flagged cells could be skipped.
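A rough sketch of the flag idea, simplified to one-way generation: the solution notebook is treated as the single source, and the exercise notebook is derived from it by dropping the flagged cells. The metadata key `solution` and the function name are hypothetical and not any Jupyter convention.

```python
import copy

SOLUTION_FLAG = "solution"  # hypothetical metadata key, not a Jupyter convention


def exercise_from_solution(solution_notebook):
    """Return a copy of the notebook without the cells flagged as solutions."""
    exercise = copy.deepcopy(solution_notebook)
    exercise["cells"] = [
        cell
        for cell in exercise["cells"]
        if not cell.get("metadata", {}).get(SOLUTION_FLAG, False)
    ]
    return exercise


solutions = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Sum the list."]},
        {
            "cell_type": "code",
            "metadata": {SOLUTION_FLAG: True},
            "source": ["total = sum([1, 2, 3])"],
        },
    ]
}

exercise = exercise_from_solution(solutions)
print(len(exercise["cells"]))  # 1: only the task description is left
```

With this approach a typo is fixed once, in the solution notebook, and the exercise notebook is regenerated.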
Based on my experience, notebooks are very well suited for teaching and learning Python. I hope university lecturers and other teachers have adopted them as part of their tool set as well.
Considering the learning experience, I think that notebooks are especially beneficial for beginners because they remove the need to learn a code editor while learning programming at the same time. At worst, learning Python (and programming in general) could mean that theory is in one place, examples are in some source code files, learners do their own experiments in other source code files, they execute the code from the command line, and finally, they make notes on a piece of paper. With Jupyter Notebooks, all of this can be done in one place.
From a content creator's perspective, the same benefit applies: everything can be put into one file. In addition, I found the post save hook and the CI workflow extremely helpful and relatively easy to set up. By automating all the boring stuff around the creation process, there is more time to focus on the quality of the content.