Giovanni Bricconi

My site on WordPress.com

Neuro-Symbolic AI: three examples

leave a comment »

Some weeks ago I posted the neuro-symbolic concept learner article, and I wanted to know more about this approach. I then read about GRUs, used in that system, and this week it has been the time to learn about Neuro-Symbolic AI in general. I read

Neuro-Symbolic AI: An Emerging Class of AI Workloads and their Characterization. Zachary Susskind, Bryce Arden, Lizy K. John, Patrick Stockton, Eugene B. John

https://arxiv.org/abs/2109.06133

The question behind the article is: “is this new class of AI workloads much different from neural network/deep learning models?”. The authors first provide three reference Neuro-Symbolic systems, and then investigate their performance. Their first finding is that the symbolic part of the computation is much less parallelizable than the neural network part: the symbolic part operates on few parameters and has a complex control flow that is not suited to running on a GPU. Fortunately, the symbolic part is not the one that dominates the response time of these systems. The symbolic part manipulates the features extracted by the neural part, and this is why neuro-symbolic systems are more explainable: it is much easier to understand how the output is decided. This means these are really composite systems: one or more neural networks each focus on some sub-task, while the symbolic part reuses their outputs and produces the final answer. Another advantage is that neuro-symbolic systems can be trained with fewer samples than purely neural ones.

It is now time to introduce the three systems under examination. The Neuro-Symbolic Concept Learner (NSCL) is the same system I described a few weeks ago: its goal is to observe an image containing many solids of different shapes and colors and answer questions like “is the green cube made of the same material as the sphere?”. The Neuro-Symbolic Dynamic Reasoning (NSDR) system tackles a much more challenging task: observe a 5-second video where solids move, collide, or exit the scene, and answer a natural-language question like “what caused the sphere and the cylinder to collide?”. The third example is the Neural Logic Machine (NLM); it is quite different from the previous two because there is no separation between the symbolic and the neural part, yet it is still able to perform inductive learning and logical reasoning. NLMs are for instance able to solve Blocks World tasks: given a world representation and a set of rules, they decide the actions needed to reach the desired final state. The description provided was quite vague, so I plan to read the referenced paper in the coming weeks.
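To make the NSCL-style question answering concrete, here is a toy sketch of my own (not the paper's code, and with a made-up scene) of how a symbolic executor might answer such a question once a neural parser has already extracted per-object attributes:

```python
# Hypothetical output of a neural image parser: a list of objects with
# symbolic attributes. The real systems produce richer representations.
scene = [
    {"shape": "cube",   "color": "green", "material": "metal"},
    {"shape": "sphere", "color": "red",   "material": "metal"},
]

def filter_objs(objs, **attrs):
    """Keep the objects matching all given attribute/value pairs."""
    return [o for o in objs if all(o.get(k) == v for k, v in attrs.items())]

def same_material(a, b):
    """Compare the 'material' attribute of two objects."""
    return a["material"] == b["material"]

# "Is the green cube made of the same material as the sphere?"
cube = filter_objs(scene, shape="cube", color="green")[0]
sphere = filter_objs(scene, shape="sphere")[0]
print(same_material(cube, sphere))  # True for this made-up scene
```

Note the control flow here: small dictionaries, filtering, branching — exactly the kind of low-parallelism work the paper says does not fit a GPU.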

The NSCL is composed of a neural image parser extracting the objects' positions and features, a neural question parser processing the input question (using a GRU), and a symbolic executor that processes this information and provides the final answer. The NSDR is composed of a neural video frame parser segmenting the images into objects, a dynamics predictor that learns the physics rules needed to predict what will happen to colliding objects, a question parser, and a symbolic executor. The symbolic executor in NSCL works in a probabilistic way, so it can be trained like a neural network; this is not true for the NSDR, where the executor is a purely symbolic program that runs on the CPU as a separate sub-module.
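The probabilistic-versus-symbolic distinction can be illustrated with a small sketch of my own (the numbers and threshold are invented): a soft, differentiable filter keeps a probability per object and so can be trained end to end, while a hard symbolic filter commits to a discrete set and cannot pass gradients:

```python
# Hypothetical per-object probabilities that each object "is green",
# as a neural parser might output.
p_green = [0.9, 0.1, 0.05]

# NSCL-style soft filter: keep the continuous mask, so the executor
# stays differentiable and trainable like a neural network.
soft_mask = p_green

# NSDR-style hard filter: threshold into a discrete set of object
# indices; a plain symbolic program, no gradients flow through it.
hard_set = [i for i, p in enumerate(p_green) if p > 0.5]

print(soft_mask)  # [0.9, 0.1, 0.05]
print(hard_set)   # [0]
```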

To pursue their goal the authors had to profile these systems, and it was not always easy to set them up. For this reason they decided to completely replace some components with more manageable alternatives. This gives an interesting list of reusable modules:

Detectron2 is a frame parser that extracts object position, shape, material, size, color… OpenNMT is a question parser able to translate a natural-language question into a set of tokens. PropNet is a dynamics predictor able to predict collisions between objects that may enter and leave the scene.

The paper also describes a useful concept I had missed: computational intensity. Suppose you have to multiply an m×k matrix by a k×n one. In this case the number of operations will be O(mkn), because of the way the algorithm works. Conversely, the memory needed will be O(mk+kn+mn), so you obtain:

intensity = O(mkn) / O(mk + kn + mn)

(see page 5, Methodology paragraph)

When you multiply big, nearly square matrices you get a high intensity, and it is easier to parallelize the work. Conversely, if some dimensions of the matrices are very small (or they are even vectors), the intensity will be low and parallelization will be harder to introduce. The paper explains why the more symbolic parts of the computation are low-intensity and difficult or impossible to parallelize on a GPU; luckily their execution does not take as much time as the parallelizable part.
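Plugging numbers into the estimate above makes the contrast stark; here is a quick sketch (my own illustration, with arbitrary matrix sizes):

```python
def intensity(m, k, n):
    """Estimated computational intensity of an m*k by k*n matrix multiply:
    O(mkn) operations over O(mk + kn + mn) elements of memory traffic."""
    ops = m * k * n                # multiply-accumulate count
    mem = m * k + k * n + m * n    # elements read and written
    return ops / mem

# Big, nearly square matrices: high intensity, GPU-friendly.
print(intensity(1024, 1024, 1024))  # ~341 ops per element moved

# Matrix-vector product (n = 1): intensity near 1, memory-bound.
print(intensity(1024, 1024, 1))
```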

Now I have some more interesting links to investigate, and I will start with the NLM.

Written by Giovanni

March 5, 2023 at 8:19 pm

Posted in Varie

Leave a comment