
The Neural Logic Machines


A neuro-symbolic model able to learn first-order logic relations on its own and solve some interesting problems, like sorting an array or the block world puzzle.

As promised, this week I have read:

Honghua Dong, Jiayuan Mao, Tian Lin, Chong Wang, Lihong Li, and Denny Zhou,

Neural Logic Machines,

Seventh International Conference on Learning Representations (ICLR 2019). https://iclr.cc/Conferences/2019/Schedule

https://doi.org/10.48550/arXiv.1904.11694

Compared to other papers I have read, I found it both very clear and very dense. While reading it I was highlighting every single paragraph; each piece seemed to give some important or useful information. Even if it is only 22 pages including the bibliography and the appendix, it will be very hard to summarize it in a single Sunday morning post; but unfortunately that is the short time I can devote to this activity.

I found this article while reading about Neuro-Symbolic systems, and as it is different from the other examples I have found, I wanted to have a look. Being Neuro-Symbolic, it is a neural-network-based model that also includes some form of symbolic reasoning over the extracted features. Reading the introduction, I see that the goal is also to have an inductive learning model: you do not program the logic rules, you provide examples and the model extracts the rules on its own! This is very fascinating, but of course it comes with some hard limitations. Notice also that it will extract some rules, but they will be in the form of neural weights and very likely they will not be human interpretable, even though the rules will be connected with logic operators.

The training of this model has to be similar to the way a human is trained: precisely because the NLM has to extract the rules on its own, it is not possible to start with difficult cases. This way of training is called curriculum learning: the examples are organized in a set of levels sorted by complexity. To pass a level, a model must reach a certain performance: if it does not reach it, it will be discarded, otherwise it will be kept and continue to the next level. Examples that make a model fail are very useful: they can be kept and added to the training material, so that the next generation will be better trained. Since the architecture is the same, the model instances are differentiated by the random seed used to initialize them. At the end of the paper there is a nice table showing that for complex problems it is hard to train a good model, as fewer of them are able to graduate!
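To make the idea concrete, here is a minimal sketch of such a curriculum loop. All the names here (make_model, train_one_level, evaluate, pass_threshold) are hypothetical placeholders, not taken from the paper:

```python
# Hypothetical sketch of curriculum training: candidate models differ only in
# their random seed, each one must reach a pass threshold on every level
# (levels are sorted by difficulty) to "graduate", and failing examples are
# recycled as extra training material.
def train_with_curriculum(make_model, levels, seeds,
                          train_one_level, evaluate, pass_threshold=0.95):
    """Return the models that pass every level (the "graduates").

    `levels` is a list of example lists, sorted from easy to hard;
    `train_one_level` and `evaluate` are supplied by the caller.
    """
    graduates = []
    for seed in seeds:
        model = make_model(seed)          # same architecture, different init
        hard_examples = []                # failures become extra training data
        passed_all = True
        for level in levels:
            train_one_level(model, level + hard_examples)
            accuracy, failures = evaluate(model, level)
            hard_examples.extend(failures)
            if accuracy < pass_threshold:
                passed_all = False        # this seed is discarded
                break
        if passed_all:
            graduates.append(model)
    return graduates
```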

The reference problem used for this NLM is the classic “block world”: some objects are laid on the ground, some others stacked on top of them; you can move an object only if it has nothing on top of it, and you can put it on the ground or on top of another clear object; you have a start configuration and a target configuration to be reached. It is the kind of problem that can be modeled using first-order logic clauses; but remember, the Neural Logic Machine (NLM) has to create the rules on its own, only some initial facts will be provided as training input.
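Just to fix ideas, a tiny block world state can be written down as a set of ground facts like the following (the predicate names on, on_table and clear are the usual ones from the planning literature, not necessarily the exact predicates used in the paper):

```python
# A tiny block world state: c is stacked on a, while a and b sit on the ground.
#   c
#   a   b
# ----------
state = {
    ("on", "c", "a"),        # c is on top of a
    ("on_table", "a"),       # a lies on the ground
    ("on_table", "b"),
    ("clear", "c"),          # nothing is on top of c, so c can be moved
    ("clear", "b"),
}

# The target configuration is just another set of ground facts, e.g.:
goal = {
    ("on", "b", "c"),
    ("on", "a", "b"),
    ("on_table", "c"),
    ("clear", "a"),
}
```

The NLM never sees hand-written rules such as “you may move x only if clear(x)”: it only sees states like these and has to induce the rules by itself.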

Let’s come to the limitations of this model: you have first-order logic (no rules on sets of objects, just on single instances). It can produce rules with the “exists” or “for all” quantifiers. It cannot produce recursive or cyclic rules. As meta-parameters you have to provide the maximum “arity” of the deduced rules (the maximum number of rule parameters) and the maximum number of rule-application steps, i.e. layers, used in a computation. These two limits are called Breadth and Depth respectively. The authors say NLM “realize Horn clauses in first-order logic”; to quickly learn more, see for instance https://en.wikipedia.org/wiki/Horn_clause

The computation is carried out using tensors that represent the probability of a relation being true: a true relation between objects a and b will be represented by 1, while a false one will be represented by 0. During the calculation these will be real numbers, but since they are generated by sigmoids they should not end up too close to 0.5.
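As a tiny illustration of this soft representation (a sketch of my own, not the paper’s code): a sigmoid maps large positive or negative pre-activations to values close to 1 or 0, i.e. to a confident “true” or “false”, while a pre-activation near zero gives an ambiguous 0.5.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Soft truth values of three hypothetical groundings of a predicate:
logits = np.array([6.0, -6.0, 0.2])
print(np.round(sigmoid(logits), 3))   # [0.998 0.002 0.55 ] -> true, false, ambiguous
```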

The initial facts, the arity-1 predicates, will be represented by a matrix whose dimensions are the number of input objects and the number of facts. For instance, if you have 3 objects and the facts “being red” and “being blue”, you will have a 3×2 matrix.

Binary predicates will be represented by [m, m-1, r] tensors, where m is the number of input instances and r is the number of binary relations. This can continue with ternary relations, [m, m-1, m-2, r], and so on. The fact is that you do not often need high-arity relations; usually ternary relations are enough, and you do not need a high-Breadth model.
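A concrete way to picture these tensors, using NumPy only for illustration and following the [m, m-1, r] convention above (the index layout is my own assumption, not the paper’s code):

```python
import numpy as np

m = 3                       # objects a, b, c -> indices 0, 1, 2

# Arity-1 facts ("is_red", "is_blue"): a 3x2 matrix of probabilities.
unary = np.zeros((m, 2))
unary[0, 0] = 1.0           # a is red
unary[1, 1] = 1.0           # b is blue
unary[2, 1] = 1.0           # c is blue

# One binary predicate ("on"): an m x (m-1) x 1 tensor indexed by ordered
# pairs of *distinct* objects; slot j of row x stands for the j-th object
# other than x.
binary = np.zeros((m, m - 1, 1))

def pair_slot(x, y):
    """Map the ordered pair (x, y), with y != x, to its slot in axis 1."""
    return y if y < x else y - 1

binary[0, pair_slot(0, 1), 0] = 1.0    # on(a, b) holds
print(unary.shape, binary.shape)       # (3, 2) (3, 2, 1)
```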

From this you can also see an important limitation of NLMs: they can work with single instances of objects (one-hot encoded) but they cannot work with real-valued input: they can reason with distinct objects in the block world, but they cannot reason about heart rates, pressures, temperatures, etc.

Now that it is clear that relation values between objects can be represented by tensors [m, m-1, …, r], it is time to understand how the computation evolves in the Depth of the model. There is a set of operations defined that allow composing these tensors. The computation is organized in layers. Each layer is implemented by a multi-layer perceptron: given the m possible objects, all the x-arity input combinations are produced and given as input to replicas of the same multi-layer perceptron. Having the same perceptron means that its parameters are shared across all the replicas, and there are not too many of them, like in a convolutional neural network. The “all possible input combinations” part causes problems: if you have 3 objects a, b and c, and a 2-arity relation, you have to compute it for (a,b), (a,c), (b,c), (b,a), (c,a), (c,b). If you do it with more objects you will see that this is a factorial explosion in the number of combinations. OK, machine learning models can have billions of parameters, but anything factorial is scary.
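To see how quickly the number of groundings grows, here is a small check in plain Python (just for illustration): the number of ordered r-tuples of distinct objects is m·(m-1)·…·(m-r+1), and the key point is that the shared perceptron has the same parameters no matter how many tuples it is applied to.

```python
from itertools import permutations
from math import perm

# Number of ordered r-tuples of distinct objects for a few sizes m:
for m in (3, 10, 50):
    print(m, [perm(m, r) for r in (1, 2, 3)])
# 3  [3, 6, 6]
# 10 [10, 90, 720]
# 50 [50, 2450, 117600]

# For 3 objects and a 2-arity relation, the same MLP is evaluated on:
print(list(permutations("abc", 2)))
# [('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]
```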

In the end, all possible inputs to a relation are generated, they pass in parallel through the same neural network, and the result is squashed by a sigmoid. The output is ready to be passed to the next level, but there is more to be done. So far we considered only arity r, but actually in a level the model also considers arities r-1 and r+1. It is therefore mixing 2-arity rules with 1-arity and 3-arity ones when computing one level. In this way it implements the and/or composition between predicates. At each level the rules are combined and the model can find more complex dependencies.

Notice also that the model implements reduction and expansion operations on predicates. With expansion, you consider that a relation like “moveable(x)” can implicitly be thought of as “for all y, moveable(x, y)”. Reduction is the opposite: you lose one arity in a relation and compute it in the sense of “there exists at least one” or “for all”.
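Under the [m, m-1, r] convention used above, these two operations can be sketched with NumPy like this (a simplification of what the paper describes: reduction uses max for “exists” and min for “for all”, expansion just repeats the values along a new object axis):

```python
import numpy as np

def expand(unary):
    """p(x) -> p'(x, y) = p(x) for every y != x; shape [m, r] -> [m, m-1, r]."""
    m = unary.shape[0]
    return np.repeat(unary[:, None, :], m - 1, axis=1)

def reduce_exists(binary):
    """q(x) = max over y of p(x, y); shape [m, m-1, r] -> [m, r]."""
    return binary.max(axis=1)

def reduce_forall(binary):
    """q(x) = min over y of p(x, y); shape [m, m-1, r] -> [m, r]."""
    return binary.min(axis=1)

# Example: from on(x, y) derive "x is on something", i.e. exists y: on(x, y).
m = 3
on = np.zeros((m, m - 1, 1))
on[0, 0, 0] = 1.0                      # on(a, b) holds
print(reduce_exists(on)[:, 0])         # [1. 0. 0.] -> only a is on something
```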

The model can combine relations with different arities, and also consider the “exists” and “for all” quantifiers. The more times you allow this composition, the more complex the rules you can compute. You may have noticed that tensors are of the form [m, m-1, r] and you may wonder how you can represent a relation like relation(x, x); well, you will have to deal with relation(x, y≠x) and relationSymmetric(x), so two relations are needed to implement it, but the NLM can still handle it. I really invite you to read the full article if you want to understand how the process works, because it is very clear but requires some pages to be explained.
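Putting the pieces together, one step of such a layer for the binary group could look roughly like this. It is a heavily simplified sketch of my own: a single shared linear map instead of a full MLP, no permutation of the object axes and no reduction from the ternary group, so it only illustrates the weight sharing and the arity mixing:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_layer(unary, binary, weights, bias):
    """Mix arity-1 and arity-2 predicates into new arity-2 predicates.

    The expanded unary tensor is concatenated with the binary one along the
    predicate axis, and the *same* linear map + sigmoid is applied to every
    ordered pair of objects.
    """
    m = unary.shape[0]
    expanded = np.repeat(unary[:, None, :], m - 1, axis=1)   # [m, m-1, r1]
    features = np.concatenate([expanded, binary], axis=-1)   # [m, m-1, r1+r2]
    return sigmoid(features @ weights + bias)                # [m, m-1, r_out]

rng = np.random.default_rng(0)
m, r1, r2, r_out = 4, 2, 3, 5
unary = rng.random((m, r1))
binary = rng.random((m, m - 1, r2))
weights = rng.standard_normal((r1 + r2, r_out))   # shared across all pairs
bias = np.zeros(r_out)
print(binary_layer(unary, binary, weights, bias).shape)   # (4, 3, 5)
```

The important point is that weights has shape [r1+r2, r_out], independent of m: that is exactly why the same trained model can later be applied to more objects.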

Once trained on a few possible input objects, the NLM can scale to a higher number of inputs: this is because the relations are arity-based, and scaling them to more objects just requires applying the same model to a higher number of permutations. This is a great property: a model trained on 20 instances can scale to 100 instances, but due to the factorial growth it cannot scale to huge numbers.

The paper evaluates NLM performance on the block world problem, and also on the family tree problem (who is the father of x, who is the maternal uncle of x). NLMs are able to learn rules that reach 100% success, but they have to be trained with curriculum learning, and for some problems more seeds have to be used, because the problems are complex and it is not guaranteed that a model graduates from the complex classes.

If you want to see NLMs in action, visit their page on Google: https://sites.google.com/view/neural-logic-machines; it is nice to see that this model is able to learn the rules needed to sort an array, find the shortest path in a graph, or solve the block world with many objects.

Written by Giovanni

March 14, 2023 at 12:30 pm

Posted in Varie
