The musical paper about a Bach machine

I recently had the privilege of reading a paper about classical music and one of its all-time greats, Johann Sebastian Bach. With the year nearly over, it is one of the best papers I have read so far, and that is saying something given the great papers I have come across since I started reviewing and sharing them on this blog almost three months ago.

I emphasize again that these reviews are not intended to be official or authoritative on the papers I choose, and they never could be. But they are a good way to deepen my knowledge of Computer Science, Information Theory, Data Science and everything about the engineering and scientific topics surrounding those fields. In the process I try to convey my views and add a bit of value here and there, trusting that my own mind, and I hope that of everyone who reads and enjoys these posts, is capable of understanding and interpreting what is written. I am not excessively naive either: I acknowledge that some of the chosen topics are demanding enough to be a bit beyond my own abilities. Still, we must try to strike the right balance, learn something new and important along the way, and share it with others with little or no reserve.

Learning something new, interesting and important is what DeepBach: a Steerable Model for Bach chorales generation offers. The paper introduces a new statistical model, based on deep learning, applied to the art of Bach chorales. These pieces have proved challenging to model with a computer, in spite of repeated attempts by other researchers and practitioners, whose efforts have reached some milestones and advanced computer music for some time now. That is one of the reasons why the researchers from Sony Computer Science Laboratories, Paris and the Université Pierre et Marie Curie tried a new approach, one that relies on deep learning frameworks.


The composition of polyphonic chorale music in the style of J.S Bach has represented a major challenge in automatic music composition over the last decades. The art of Bach chorales composition involves combining four-part harmony with characteristic rhythmic patterns and typical melodic movements to produce musical phrases which begin, evolve and end (cadences) in a harmonious way. To our knowledge, no model so far was able to solve all these problems simultaneously using an agnostic machine-learning approach. This paper introduces DeepBach, a statistical model aimed at modeling polyphonic music and specifically four parts, hymn-like pieces. We claim that, after being trained on the chorale harmonizations by Johann Sebastian Bach, our model is capable of generating highly convincing chorales in the style of Bach. We evaluate how indistinguishable our generated chorales are from existing Bach chorales with a listening test. The results corroborate our claim. A key strength of DeepBach is that it is agnostic and flexible. Users can constrain the generation by imposing some notes, rhythms or cadences in the generated score. This allows users to harmonize user-defined melodies. DeepBach’s generation is fast, making it usable for interactive music composition applications. Several generation examples are provided and discussed from a musical point of view.

Introductory remarks

Deep learning, a much-lauded field of machine learning, has been enjoying a happy, hyped phase in recent years. The frameworks currently being developed and released as open source to as wide an audience as possible are nothing short of amazing technological and scientific feats in their own right. And the range of applications and implementations keeps increasing day by day. This is just one more: classical polyphonic music, which poses difficulties of synchronization, precision, timing and rhythmic accuracy. In the end this results in a constrained optimization problem of the sort deep learning frameworks have proved to be good at:


The difficulty, from a compositional point of view comes from the intricate interplay between harmony (notes sounding at the same time) and voice movements (how a single voice evolves through time). Furthermore, each voice has its own “style” and its own coherence. Finding a chorale-like reharmonization which combines Bach-like harmonic progressions with musically interesting melodic movements is a problem which often takes years of practice for musicians. From the point of view of automatic music generation , the first solution to this apparently highly combinatorial problem was proposed by [13] in 1988.

This problem is seen as a constraint satisfaction problem, where the system must fulfill numerous hand-crafted constraints characterizing the style of Bach. It is a rule-based expert system which contains no less than 300 rules and tries to reharmonize a given melody with a generate-and-test method and intelligent backtracking. Among the short examples presented at the end of the paper, some are flawless. Drawbacks of this method are, as stated by the author, the considerable effort to generate the rule base and the fact that the harmonizations produced “do not sound like Bach, except for occasional Bachian patterns and cadence formulas”. In our opinion, the requirement of an expert knowledge implies a lot of arbitrary choices. Furthermore, we have no idea about the variety and originality of the proposed solutions.

What is remarkable about these deep learning frameworks is that they can produce a chorale in the style of J.S. Bach without resorting to specialized prior knowledge of the music and without expert bias in the input; all that is needed is lots of training data and appropriate computational hardware (e.g. high-performance GPUs). Some recent attempts were promising, but the authors of the reviewed paper found shortcomings when these were applied to Bach chorales, which motivated their new approach:

Recently, agnostic approaches (requiring no knowledge about harmony, or the music by Bach) using neural networks have been investigated with promising results. In [8], chords are modeled with Restricted Boltzmann Machines (RBMs). Their temporal dependencies are learned using Recurrent Neural Networks (RNNs). Variations of these architectures have been developed, based on Long Short-Term Memory (LSTM) units [23] or GRUs (Gated Recurrent Units) [10]. These models, which work on piano roll representations of the music, are in our opinion too general to capture the specificity of Bach chorales. But one of their main drawback is their lack of flexibility. Generation is performed from left to right. A user cannot interact with the system: it is impossible to do reharmonization for instance which is the essentially how the corpus of Bach chorales was composed. Moreover, their invention capacity and non-plagiarism abilities are not demonstrated.

The most recent advances in chorale harmonization is arguably the BachBot model [22], a LSTM-based approach specifically designed to deal with Bach chorales. This approach relies on little musical knowledge (all chorales are transposed in a common key) and is able to produce high-quality chorale harmonizations. However, compared to our approach, this model is less general (produced chorales are all in the C key for instance) and less flexible (only the soprano can be fixed). Similarly to and independently of our work, the authors evaluate their model with an online Turing test to assess the efficiency of their model with promising results. They also take into account the fermata symbols (Fig. 2) which are indicators of the structure of the chorales.

The DeepBach model introduced in this paper is an LSTM (Long Short-Term Memory)-based framework. The methodological innovation concerns the sampling procedure, which is somewhat unconventional: a score is normally read from left to right, and the authors drop this ordering; they also model each voice separately. The result is surprisingly efficient and rich, and an improvement on previous attempts at style imitation:

In this paper we introduce DeepBach, a LSTM-based model capable of producing musically-appealing four-part chorales in the style of Bach. Contrary to other models based on RNNs, we do not sample from left to right and model each voice separately. This allows us to enforce user-defined constraints such as rhythm, notes, parts, chords and cadences. DeepBach is able to produce coherent musical phrases and provides, for instance, varied reharmonizations of melodies without plagiarism. Its core features are its reliance upon no knowledge, its speed, the possible interaction with users and the richness of harmonic ideas it proposes. Its efficiency opens up new ways of creating interesting Bach-like chorales for non experts similarly to what is proposed in [26] for leadsheets.




The DeepBach framework is described in more detail in what follows:

In this paper we introduce a new generative model which takes into account the distinction between voices. Sect. 2.1 indicates how we preprocessed the corpus of Bach chorale harmonizations and Sect. 2.2 presents the model’s architecture.

2.1 Data Representation


We represent a chorale as a tuple of six lists:

$$(\mathcal{V}_1, \mathcal{V}_2, \mathcal{V}_3, \mathcal{V}_4, \mathcal{S}, \mathcal{F}) \tag{1}$$

where the $\mathcal{V}_i$'s represent the four voices (soprano, alto, tenor and bass) to which we add two other lists: S the list of subdivisions and F the list of fermatas. All lists are indexed by a time index t and have equal size.




Since Bach chorales contains only simple time signatures, we discretize time with sixteenth notes, which means that each beat is subdivided into four equal parts. Since there is no smaller subdivision in Bach chorales, there is no loss of information in this process.

Each voice Vi contains the midi pitch of the played notes. It is a unique integer for each note, with no distinction between enharmonic equivalent notes. In order to represent rhythm in a compact way, we introduce an additional symbol to the pitches coding whether or not the preceding note is held. The subdivision list S contains the subdivision indexes of the beat. It is an integer between 1 and 4: there is no distinction between beats in a bar so that our model is able to deal with chorales with three and four beats per measure. The fermata list F indicates if there is a fermata symbol, see Fig. 2, over the current note, it is a Boolean value. If a fermata is placed over a note on the music sheet, we consider that it is active for all time indexes within the duration of the note.
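To make this encoding concrete for myself, here is a minimal Python sketch of it. It is only an illustration under my own assumptions: the input format (pairs of MIDI pitch and length in sixteenth notes), the `HOLD` symbol and all function names are hypothetical and are not taken from the paper or its source code.

```python
HOLD = "__"  # hypothetical symbol meaning "the preceding note is held"

def encode_voice(notes):
    """Expand (midi_pitch, length_in_sixteenths) pairs into one symbol per sixteenth."""
    symbols = []
    for pitch, length in notes:
        symbols.append(pitch)                   # the onset carries the MIDI pitch
        symbols.extend([HOLD] * (length - 1))   # remaining ticks are marked as held
    return symbols

def subdivisions(num_ticks):
    """Beat subdivision index cycling 1, 2, 3, 4 (the 'metronome' list S)."""
    return [t % 4 + 1 for t in range(num_ticks)]

def fermatas(lengths_and_flags):
    """Boolean list F: a fermata stays active for the whole duration of its note."""
    flags = []
    for length, has_fermata in lengths_and_flags:
        flags.extend([has_fermata] * length)
    return flags

# Soprano: one quarter note, then two eighth notes, the last one under a fermata.
soprano = encode_voice([(72, 4), (74, 2), (76, 2)])
S = subdivisions(len(soprano))
F = fermatas([(4, False), (2, False), (2, True)])
```

Because the hold symbol sits in the same list as the pitches, rhythm and melody end up in a single sequence per voice, and all lists keep the same length.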




Our choices are very general and do not involve expert knowledge about harmony or scales but are only mere observations of the corpus. The list S acts as a metronome. The list F is added since fermatas in Bach chorales indicate the end of each musical phrase. The use of fermata to this end is a specificity of Bach chorales that we want to take advantage of. Part 4 shows that this representation makes our model able to create convincing musical phrases in triple and quadruple simple time signatures.

2.2 Model Architecture


For clarity, we suppose in this section that our dataset is composed of only one chorale written as in Eq. 1. We introduce a family of probabilistic models p parametrized by a parameter θ on our representation defined in Sect. 2.1. We do not model probabilistically the sequences S nor F but consider them fixed. The negative log-likelihood of our data is thus defined by

$$-\log p(\mathcal{V}_1, \mathcal{V}_2, \mathcal{V}_3, \mathcal{V}_4 \mid \mathcal{S}, \mathcal{F}, \theta)$$

We need to find a parameter θ which minimizes this loss. In order to have a computationally tractable training criterion, we introduce the pseudolikelihood of our data [7,4]. This approach was successful in many real-life problems [14] and consists in an approximation of the negative log-likelihood function by the sum over all variables:

$$-\sum_{i}\sum_{t} \log p(\mathcal{V}_i^t \mid \mathcal{V}_{\setminus i,t}, \mathcal{S}, \mathcal{F}, \theta)$$

where $\mathcal{V}_i^t$ indicates the pitch of voice i at time index t and $\mathcal{V}_{\setminus i,t}$ the union of all $\mathcal{V}_i$'s except from the variable $\mathcal{V}_i^t$. This suggests to introduce four probabilistic models $p_i$ depending on parameter $\theta_i$, one for each voice, and to minimize their negative log-likelihood independently using the pseudolikelihood criterion. We obtain four problems of the form:

$$\min_{\theta_i} \; -\sum_{t} \log p_i(\mathcal{V}_i^t \mid \mathcal{V}_{\setminus i,t}, \mathcal{S}, \mathcal{F}, \theta_i), \quad \text{for } i \in [4]$$

The advantage with this formulation is that each model has to make predictions within a small range of integer values whose ranges correspond to the usual voice ranges.

(Footnote 6: We adopt the standard notation [N] to denote the set of integers {1, ..., N} for any integer N.)
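As a small aid to reading the pseudolikelihood criterion, here is a toy Python computation of the per-voice loss. Everything in it is an assumption of mine for illustration: `uniform_cond_prob` is a made-up stand-in for a trained model p_i, and the `VOICE_RANGES` are invented values, not the ranges used by the authors.

```python
import math

def neg_log_pseudolikelihood(voices, cond_prob, i):
    """Sum over time of -log p_i(V_i^t | all other variables) for voice i."""
    length = len(voices[i])
    return -sum(math.log(cond_prob(i, t, voices[i][t], voices)) for t in range(length))

# Hypothetical pitch ranges (MIDI numbers) for soprano, alto, tenor and bass.
VOICE_RANGES = [range(60, 82), range(53, 75), range(48, 70), range(36, 65)]

def uniform_cond_prob(i, t, pitch, voices):
    """Stand-in for a trained model: uniform over the range of voice i."""
    return 1.0 / len(VOICE_RANGES[i]) if pitch in VOICE_RANGES[i] else 1e-12

voices = [[72, 74], [67, 65], [60, 62], [48, 50]]
# One independent loss per voice, mirroring the four minimization problems.
losses = [neg_log_pseudolikelihood(voices, uniform_cond_prob, i) for i in range(4)]
total = sum(losses)
```

With a real model, `cond_prob` would look at the neighboring notes, the subdivisions and the fermatas; the uniform version only shows how the sum over time indexes is assembled and why a small per-voice range keeps each prediction problem small.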



I am a bit wary of copying too much of the paper here. But I could not help it: the writing is compelling, clear and emotive enough that I could not resist reproducing it. My own words would not fully convey the main points of this paper. Quite rightly so for The Information Age:



The aim of these models is to predict the pitch of one note knowing the value of its neighboring notes, the subdivision of the beat it is on and the presence of fermatas. We implement them using neural networks based on LSTMs [18,24]. For accurate predictions, we choose to use four neural networks: two stacks of LSTMs, one summing up past information and another summing up information coming from the future together with a non-recurrent neural network for notes occurring at the same time. Their outputs are merged and passed as the input of a fourth neural network whose output is $p_i(\mathcal{V}_i^t \mid \mathcal{V}_{\setminus i,t}, \mathcal{S}, \mathcal{F}, \theta_i)$. Figure 4a shows a graphical representation for one of these models. Details are provided in Sect. 2.4.
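To picture what each of the three input networks would receive, I find it useful to write down how the context around one note could be split. The sketch below is my own reading of the description, not the authors' code: the function name, the `window` parameter and the choice to reverse the future context are assumptions, and the lists S and F are left out for brevity.

```python
def split_context(voices, i, t, window):
    """Return (past, future, present) inputs for predicting voice i at time t."""
    length = len(voices[0])

    def column(s):
        # All voices at time index s, from soprano to bass.
        return [voice[s] for voice in voices]

    # Notes before t, in chronological order (input of the "past" recurrent stack).
    past = [column(s) for s in range(max(0, t - window), t)]
    # Notes after t, reversed so that a recurrent network reading it ends next to t.
    future = [column(s) for s in range(min(length - 1, t + window), t, -1)]
    # Notes of the other voices sounding at time t (input of the non-recurrent network).
    present = [voices[j][t] for j in range(len(voices)) if j != i]
    return past, future, present

voices = [
    [72, 74, 76, 77],  # soprano
    [67, 67, 69, 69],  # alto
    [60, 62, 64, 65],  # tenor
    [48, 50, 52, 53],  # bass
]
past, future, present = split_context(voices, i=0, t=1, window=2)
```

The note being predicted, here the soprano at time index 1, appears in none of the three inputs, which is what makes the output a conditional distribution over that single note.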


2.3 Generation


Generation is performed using Gibbs sampling [15]. In our case, this consists in the following algorithm:



Note that it is possible to make generation faster by making parallel Gibbs updates on GPU. Steps (3) to (5) from Alg. 1 can be run simultaneously to provide significant speedups. In Table 1 we show how the batch size (fixed number of parallel updates) influences the number of updates per second. Even if it known that this approach is biased [12] (since we can update simultaneously variables which are not conditionally independent), we experimentally observed that for small batch sizes (16 or 32), DeepBach still generates samples of great musicality while running ten times faster than the sequential version. This allows DeepBach to generate chorales in a few seconds.
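Since the listing of Alg. 1 is not reproduced in this post, here is a minimal sequential sketch of a Gibbs-style sampler in Python, consistent with the steps the text refers to (choose a voice, choose a time index, resample that note). It is only an illustration: `toy_conditional` is a made-up stand-in for the trained models, and the pitch ranges and names are my own assumptions.

```python
import random

def gibbs_sample(length, ranges, conditional, num_iterations, rng):
    """Gibbs-style sampling over a grid of pitches, one note updated per iteration."""
    # Initialise every voice with random pitches drawn from its own range.
    voices = [[rng.choice(list(r)) for _ in range(length)] for r in ranges]
    for _ in range(num_iterations):
        i = rng.randrange(len(voices))   # choose a voice
        t = rng.randrange(length)        # choose a time index
        candidates = list(ranges[i])
        weights = [conditional(i, t, p, voices) for p in candidates]
        # Resample the chosen note from the (unnormalised) conditional weights.
        voices[i][t] = rng.choices(candidates, weights=weights, k=1)[0]
    return voices

def toy_conditional(i, t, pitch, voices):
    """Hypothetical stand-in for p_i: favours pitches close to the previous note."""
    previous = voices[i][t - 1] if t > 0 else pitch
    return 1.0 / (1 + abs(pitch - previous))

# Invented MIDI ranges for soprano, alto, tenor and bass.
RANGES = [range(60, 82), range(53, 75), range(48, 70), range(36, 65)]
chorale = gibbs_sample(16, RANGES, toy_conditional, 2000, random.Random(0))
```

The parallel variant described in the quote would draw several (voice, time) pairs per iteration and resample them in one batch, which is where the bias the authors mention comes from.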


Concluding remarks


Without delving into too many of the details of this interesting paper, I will now sketch the main advantages that the authors highlight for the DeepBach model, which build on the implementation of Algorithm 1 (all the source code is available as open source at this GitHub repository), and then I finish this review with the authors' concluding remarks:

The advantage of this method is that we can enforce user-defined constraints by tweaking Alg. 1:

– instead of choosing voice i from 1 to 4 we can choose to fix the soprano and only resample voices from 2, 3 and 4 in step (3) in order to provide reharmonizations of the fixed melody

– we can choose the fermata list F in order to impose end of musical phrases at some places

– for any t and any i, we can fix specific ranges $\mathcal{R}_i^t$, subsets of the range of voice i, to restrict ourselves to some specific chorales by re-sampling $\mathcal{V}_i^t$ from $p_i(\mathcal{V}_i^t \mid \mathcal{V}_{\setminus i,t}, \mathcal{S}, \mathcal{F}, \theta_i, \mathcal{V}_i^t \in \mathcal{R}_i^t)$ at step (5). This allows us for instance to fix rhythm (since the hold symbol is encoded as a pitch), impose some chords in a soft manner or restrict the vocal ranges.
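The range restriction in the last constraint amounts to conditioning the model's output on the allowed subset. A minimal Python sketch of that idea follows; the function name, the dictionary format for the model output and the example probabilities are all hypothetical and chosen by me for illustration.

```python
import random

def resample_constrained(probabilities, allowed, rng):
    """Sample a pitch from `probabilities` restricted to the `allowed` subset."""
    candidates = [p for p in probabilities if p in allowed]
    weights = [probabilities[p] for p in candidates]
    if not candidates or sum(weights) == 0:
        raise ValueError("constraint leaves no pitch with positive probability")
    # random.choices renormalises the weights, i.e. it conditions on pitch in allowed.
    return rng.choices(candidates, weights=weights, k=1)[0]

# Made-up model output for one note: MIDI pitch -> probability.
model_output = {60: 0.1, 62: 0.2, 64: 0.4, 65: 0.2, 67: 0.1}
c_major_triad = {60, 64, 67}
rng = random.Random(1)
samples = [resample_constrained(model_output, c_major_triad, rng) for _ in range(200)]
```

Fixing a note completely is the special case where the allowed set holds a single pitch, and fixing the rhythm would correspond to allowing or forbidding the hold symbol at given time indexes.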

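For the parallel updates, the paper also describes choosing the time indexes that are resampled together so that:

– every time index is at distance at least δ from the other time indexes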

– configurations of time indexes satisfying the relation above are equally sampled.

This trick allows to assert that we do not update simultaneously a variable and its local context.
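One simple way to obtain time indexes with these two properties is rejection sampling: draw a whole set of indexes uniformly and throw the entire set away if any two are too close. The sketch below is my own illustration of that idea and may differ from the procedure the authors actually use; the function name and parameters are hypothetical.

```python
import random

def spaced_time_indexes(length, batch_size, delta, rng, max_tries=10_000):
    """Draw `batch_size` time indexes in [0, length) pairwise at least `delta` apart."""
    for _ in range(max_tries):
        candidate = sorted(rng.sample(range(length), batch_size))
        # Accept only if every pair of neighbouring indexes is far enough apart.
        if all(b - a >= delta for a, b in zip(candidate, candidate[1:])):
            return candidate
    raise RuntimeError("no valid configuration found; try a smaller batch or delta")

indexes = spaced_time_indexes(length=128, batch_size=4, delta=8, rng=random.Random(0))
```

Rejecting the whole set, rather than repairing it index by index, is what keeps all valid configurations equally likely, at the cost of more retries when the batch is large relative to the chorale length.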

As always I recommend that anyone interested read the full paper and discover all the implementation details, experimental results, linked websites, and the graphical, musical and mathematical niceties it contains. As an added bonus, maybe someone will from now on become a new Bach fan, a music aficionado or even a practitioner in the making. Everyone says that this is good for your brain and mental health…


Discussion and future work

We described DeepBach, a probabilistic model together with a sampling method which is flexible, efficient and provides musically convincing results even to the ears of professionals. The strength of our method, to our point of view, is the possibility to let users impose unary constraints, which is a feature often neglected in probabilistic models of music. We showed that DeepBach do not suffer from plagiarism while reproducing J.S. Bach’s style and can be used to generate musically convincing harmonizations. We now plan to develop a music sheet graphical editor on top of the music21 toolkit in order to make interactive composition using DeepBach easier. This method is not only applicable to Bach chorales but embraces a wide range of polyphonic chorale music, from Palestrina to Take 6.



