Archive for May 2024
Leaky Integrate-and-Fire
About one year ago I attended a conference where a professor presented her team's work on simulating real neurons. I was curious about how biological neurons can be modeled, so I took the opportunity of a long train journey to browse a bit.
I started my search looking for “spiking neural network”. The most surprising thing I learned is that real neurons seem to do nothing most of the time. Every now and then they fire a spike to alert other neurons, but otherwise they sit idle. The frequency at which they fire is quite low, and this is probably one reason why the brain consumes so little energy. A quick search on Google shows that this firing rate is usually below 1 kHz, and often much lower. Conversely, the “artificial neurons” we use in deep learning models fire continuously and synchronously at gigahertz clock rates: every parameter needs to be evaluated at each clock cycle, and even when the output is zero we still recompute the output of every neuron in the network.
Biological neurons are often modeled as a capacitor with a reset circuit next to it. Each time an input spike is received, the capacitor charge increases. When the charge is high enough, a spike is emitted and the charge is reset. A more realistic model, the leaky integrate-and-fire, also puts a resistor in parallel with the capacitor. Its role is to slowly discharge the capacitor, clearing the effect of spikes received too long ago: input spikes are significant only if they arrive within a short time interval.
This tutorial gives a good presentation of the models and the formulas: https://compneuro.neuromatch.io/tutorials/W2D3_BiologicalNeuronModels/student/W2D3_Tutorial1.html
A biological neuron is composed of dendrites, root-like structures that receive the inputs from other neurons. The dendrites enter the body of the neuron cell, from which an axon exits: the axon is the output connector that propagates the spike to thousands of other neurons. Axons and dendrites do not touch directly; they are connected via synapses.

The leaky integrate-and-fire model boils down to this differential equation (in the notation of the tutorial above):

τ_m · du/dt = −(u(t) − u_rest) + R · I(t), with τ_m = R · C

The evolution of the membrane potential u(t) has two components: the leaky part on the left of the plus sign, governed by the internal resistor R, and the input-driven part R · I(t); the capacitance C enters through the time constant τ_m = R · C. u_rest is the resting potential, the value the voltage settles to when there are not enough input spikes.
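To make the model concrete, here is a minimal Euler simulation of a single LIF neuron. The parameter values (resistance, capacitance, threshold, input current) are illustrative choices of mine, not taken from the tutorial.

```python
import numpy as np

# Minimal Euler simulation of a single leaky integrate-and-fire neuron.
# All parameter values are illustrative, not taken from the tutorial.
def simulate_lif(I, dt=1e-4, R=1e7, C=1e-9, u_rest=-0.07, u_thresh=-0.05):
    """Integrate tau * du/dt = -(u - u_rest) + R * I(t) and return the
    voltage trace plus the time indices of the emitted spikes."""
    tau = R * C                      # membrane time constant (10 ms here)
    u = np.full(len(I), u_rest)      # voltage trace, starting at rest
    spikes = []
    for t in range(1, len(I)):
        du = (-(u[t - 1] - u_rest) + R * I[t]) * dt / tau
        u[t] = u[t - 1] + du
        if u[t] >= u_thresh:         # threshold crossed: emit a spike...
            spikes.append(t)
            u[t] = u_rest            # ...and reset the membrane potential
    return u, spikes

# A constant input current makes the neuron charge, fire, reset and repeat,
# producing a regular spike train (roughly 60 Hz with these values).
I = np.full(20000, 2.5e-9)           # 2 s of 2.5 nA input sampled at 0.1 ms
u, spikes = simulate_lif(I)
print(f"{len(spikes)} spikes emitted")
```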
The presentation suggests some other interesting points.
The spikes are all the same: the shape is identical and each spike is over in 1-2 ms. What seems to matter is only that the spike happened; there is no concept of a strong or weak spike. Conversely, artificial neurons output a real number, maybe normalized but with variability, not just 1 or 0.
There exist more sophisticated models that can explain another phenomenon, called adaptation. Suppose you feed the same constant input to a neuron: with the leaky integrate-and-fire model the neuron will charge, emit a spike, recharge, emit another spike, and so on, generating an output sequence with a fixed frequency. It seems nature does not like this behavior: a constant signal does not bring much information, so real neurons keep emitting spikes, but with a frequency that decreases. The spikes are still there, just less and less frequent.
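One common way to sketch adaptation in code (not necessarily the exact model from the presentation) is to add an adaptation current that grows at every spike and decays slowly; subtracting it from the input makes the spikes progressively rarer under a constant stimulus. Again, the parameters below are illustrative.

```python
import numpy as np

def simulate_adaptive_lif(I, dt=1e-4, R=1e7, C=1e-9, u_rest=-0.07,
                          u_thresh=-0.05, tau_w=0.1, b=1e-9):
    """Like the basic LIF above, plus an adaptation current w that grows at
    every spike and decays slowly, reducing the effective input over time."""
    tau = R * C
    u = np.full(len(I), u_rest)
    w = 0.0                          # adaptation current (amperes)
    spikes = []
    for t in range(1, len(I)):
        du = (-(u[t - 1] - u_rest) + R * (I[t] - w)) * dt / tau
        u[t] = u[t - 1] + du
        w -= w * dt / tau_w          # adaptation decays with time constant tau_w
        if u[t] >= u_thresh:
            spikes.append(t)
            u[t] = u_rest
            w += b                   # every spike strengthens the adaptation
    return u, spikes

I = np.full(20000, 2.5e-9)           # the same constant 2.5 nA input as before
_, spikes = simulate_adaptive_lif(I)
# The inter-spike intervals grow: the neuron "gets used to" the constant input.
print(np.diff(spikes))
```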
But how do biological neurons learn? With artificial neurons you have algorithms that use the gradient to adapt the neuron's coefficients until it produces the correct output. I wonder whether a model exists to explain how the dendrites and axons change in response to all these spikes. Maybe a good topic to explore on another train journey.
Large Language Model as a Judge

It is amazing what large language models can do with text, and with libraries like LangChain it is trivial to create chatbots. You can build a small application that answers questions from your internal documentation in a few hours. I did it with my docs and it works pretty well, even though I am not an expert in LLMs or machine learning. It is robust: without any tuning you get something working.
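For reference, such a chatbot really is only a handful of lines. Here is a minimal sketch using LangChain's classic RetrievalQA chain; the documentation path, chunk sizes and model name are placeholders of mine, not the actual setup of my tool.

```python
# Minimal retrieval-augmented chatbot over internal docs (LangChain classic API).
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Load the internal documentation and split it into overlapping chunks.
docs = TextLoader("internal_docs/xyz_framework.md").load()   # hypothetical path
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Index the chunks and build a question-answering chain on top of the retriever.
index = FAISS.from_documents(chunks, OpenAIEmbeddings())
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    retriever=index.as_retriever(search_kwargs={"k": 4}),
    return_source_documents=True,   # keep track of which pages were used
)

result = qa({"query": "What is the XYZ framework?"})
print(result["result"])
```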
But then you start to see something wrong. For instance, I stated in the prompt that I wanted to know the source of every answer. Nothing bad in that, no? To simple questions like “what is the XYZ framework” the creative LLM machine started replying “The XYZ framework does this and that and has been created by Mohammed…”. Ohhh, the word SOURCE also means author, and each page has an AUTHOR field. I don’t want the tool to start finger-pointing people. The same happened with an incident post-mortem page: “According to Sebastien, you should check the metric abc-def to verify if the system is slowing down…”
Given the context the answers were correct, but somebody may not be happy about what the machine can tell, and I don’t want to deal with personally identifiable information at all. Mine is an internal tool; if it were a public site this would have been a real problem.
Now the questions are: how can I be sure I have solved this issue? How can I be sure I won’t introduce it again? In the end the output of the chatbot is random and in natural language, so it is not a trivial problem where you can say that for question X the answer is exactly Y.
One idea would be to create a panel of questions and prompts to validate the answers: I would not evaluate each single answer myself, but trust the LLM to judge its own output. This is because, of course, I do not want to spend my time checking hundreds of questions every time I change a parameter.
Actually, evaluating a chatbot is much more complex: I started looking for articles about it and found the Ragas framework and this paper:
Lianmin Zheng et al., “Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena”, 37th Conference on Neural Information Processing Systems (NeurIPS 2023), Datasets and Benchmarks Track.
The authors explore the idea of using an LLM as a judge, comparing the answers of different LLMs. In their case they compare different models, but I think this could also be useful to evaluate the answers of different versions of your own chatbot.
In their case they take a set of questions and ask GPT-4 to decide whether the answer of system X is better than that of system Y. In my case I could change some parameter of my chatbot, for instance the number of context documents used, and decide which setting is best.
In their article they describe various approaches:
- Pairwise comparison. An LLM judge is presented with a question and two answers, and tasked to determine which one is better or declare a tie.
- Single answer grading. The LLM judge is asked to directly assign a score to a single answer.
- Reference-guided grading. When a reference solution is available, it is provided to the judge alongside the answer.
I do not like the second and third approaches much: asking for a score can result in high variability in the answers, and providing reference solutions is hard because you need to prepare those solutions.
Concerning the pairwise comparison the authors highlight some nontrivial issues:
- Position bias. The LLM can prefer the first answer over the second one just because of its position. So you should accept that X is better than Y only if swapping the answer positions leads to a consistent verdict (a sketch combining this check with the pairwise comparison follows this list).
- Verbosity bias. The LLM can prefer a verbose answer: the more content the better! This seems difficult to address.
- Self-enhancement bias. For instance, GPT-4 favors itself over other LLMs in about 10% of cases. This won’t affect my chatbot much, as I won’t switch from one model to another: I can only choose between GPT-3.5 and GPT-4, and it is evident which one is better.
- Limited logic and math capabilities. LLMs are still limited in these domains, so the judge will not be good enough on such questions.
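Putting the pairwise comparison and the position-swap check together, the judging loop could look like the rough sketch below. The judge prompt wording and the chatbot_v1 / chatbot_v2 callables are placeholders of mine, not taken from the paper; the judge is GPT-4 called through the openai client.

```python
# Rough sketch of pairwise comparison with position swapping, GPT-4 as judge.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are an impartial judge. Given a question and two answers,
reply with exactly "A" if answer A is better, "B" if answer B is better,
or "TIE" if they are of similar quality.

Question: {question}

Answer A: {answer_a}

Answer B: {answer_b}"""

def judge_once(question, answer_a, answer_b):
    """Ask the judge model for a single verdict on one ordering of the answers."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, answer_a=answer_a, answer_b=answer_b)}],
    )
    return response.choices[0].message.content.strip()

def judge_pair(question, answer_x, answer_y):
    """Declare a winner only if the verdict survives swapping the positions,
    otherwise count it as a tie (this mitigates position bias)."""
    first = judge_once(question, answer_x, answer_y)   # X shown as A
    second = judge_once(question, answer_y, answer_x)  # positions swapped
    if first == "A" and second == "B":
        return "X"
    if first == "B" and second == "A":
        return "Y"
    return "TIE"

# Usage: compare two configurations of the chatbot on a panel of questions.
# chatbot_v1 / chatbot_v2 are hypothetical callables returning an answer string.
# questions = ["What is the XYZ framework?", ...]
# results = [judge_pair(q, chatbot_v1(q), chatbot_v2(q)) for q in questions]
```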
In the end GPT-4 seems a good enough judge, so I plan to use pairwise comparison in the future as a method to assess the quality of each new chatbot release.