Archive for February 2024
ReLU or not ReLU?
The ReLU activation function is very popular in neural networks, but many more activation functions have been proposed since its introduction. This paper analyzes them from a theoretical point of view:
Deep Network Approximation: Beyond ReLU to Diverse Activation Functions. Shijun Zhang, Jianfeng Lu, Hongkai Zhao
Journal of Machine Learning Research 25 (2024) 1-39
There is a real zoo of activation functions: LeakyReLU, ReLU2, ELU, CELU, SELU, Softplus, GELU, SiLU, Swish, Mish, Sigmoid, Tanh, Arctan, Softsign, dSiLU, SRS… The paper defines some families of activation functions, called A1,k, A2 and A3, that are supersets of all those functions. These sets also include all the functions you can obtain by translation or reflection. Surprisingly, it has been possible to prove that you can approximate the output of a ReLU network as closely as you want with any of the listed activation functions, provided that you can afford to build a network 3 times wider and 2 times deeper. More formally:

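The paper gives the precise statement with explicit constants, norms and domains; a hedged paraphrase, using only the "3 times wider, 2 times deeper" figures quoted above, reads roughly as follows (here phi is a ReLU network of width N and depth L, and psi uses one of the activations in the families above):

```latex
% Hedged paraphrase, not the paper's exact wording: the approximation is
% stated here on a bounded domain such as [0,1]^d for illustration.
\forall \varepsilon > 0 \;\; \exists\, \psi_\varrho
\text{ with width} \leq 3N \text{ and depth} \leq 2L
\text{ such that } \sup_{x \in [0,1]^d}
\bigl| \psi_\varrho(x) - \phi_{\mathrm{ReLU}}(x) \bigr| \leq \varepsilon .
```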
Scared? Me too, but at least you know that you can play a bit with the activation and there is a bound to the damage you can do. Moreover, if you pick functions very similar to ReLU (ELU, CELU, SELU, Softplus, GELU, SiLU, Swish, and Mish) the bound is much tighter.
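To make "playing with the activation" concrete, here is a minimal PyTorch sketch (assuming PyTorch is installed; the layer sizes are arbitrary) where the activation is a constructor parameter, so you can swap ReLU for GELU, SiLU, and so on, and compare the results on your own problem:

```python
import torch
import torch.nn as nn

class SmallNet(nn.Module):
    """A tiny MLP whose activation function is a constructor parameter."""

    def __init__(self, activation: nn.Module = nn.ReLU()):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(16, 32),
            activation,
            nn.Linear(32, 32),
            activation,
            nn.Linear(32, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)

# Same architecture, different activations: ReLU, GELU, SiLU (a.k.a. Swish).
x = torch.randn(8, 16)
for act in (nn.ReLU(), nn.GELU(), nn.SiLU()):
    model = SmallNet(activation=act)
    print(act.__class__.__name__, model(x).shape)
```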
RAG Applications

As a non-native English speaker, it is always fun to try to understand the meaning of a new acronym; this time it is the turn of RAG applications. I knew about "ragtime" music, like this Ragtime Sea Shanty, and looking in the dictionary I saw that a rag is a piece of old cloth, especially one torn from a larger piece, typically used for cleaning things. Funny to see the above kind of picture associated with Retrieval-Augmented Generation (RAG).
What does Retrieval-Augmented Generation actually mean? It is a technique to generate content using AI tools like ChatGPT: instead of just asking the LLM "write a documentation page on how to shut down a nuclear reactor", you propose a prompt like "write a documentation page on how to shut down a nuclear reactor rephrasing the content of these 10 pages: page 1. The supermega control panel has 5 buttons…".
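As a sketch of the idea, here is how such an augmented prompt could be assembled in Python (the page texts and the wording of the template are made up for illustration; the retrieval step is described further below):

```python
# Hypothetical retrieved snippets; in a real application these would come
# from your own documentation via the retrieval step described below.
retrieved_pages = [
    "Page 1. The supermega control panel has 5 buttons...",
    "Page 2. The shutdown sequence must be confirmed by two operators...",
]

question = "Write a documentation page on how to shut down a nuclear reactor."

# Plain prompt: the LLM can rely only on what it learned during training.
plain_prompt = question

# RAG-style prompt: the same question, plus the retrieved pages as context.
context = "\n\n".join(retrieved_pages)
rag_prompt = (
    f"{question}\n"
    f"Rephrase and rely only on the content of the following pages:\n\n"
    f"{context}"
)

print(rag_prompt)
```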
What is the advantage of including these examples in the prompt? The large language model (LLM) may not know anything about your domain for various reasons: maybe you want it to write about something too recent, which was not part of its training data. Another reason may be that you work in a very specific domain, where the documentation is not public and therefore the LLM has no knowledge of it.
In my case my team has a private web site where we host our internal documentation: we describe there APIs, troubleshooting guides, internal procedures, meeting minutes, etc. The LLM does not know anything about our project, so it cannot possibly provide detailed answers about it.
But how can you retrieve the right 10 pages to embed in the LLM prompt? If you are familiar with search engines like Solr or Elasticsearch you may have many ideas right now: just export the HTML from the site, preprocess it with BeautifulSoup to convert it into plain text, and then index it…
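The extraction step could look like this minimal sketch (assuming the requests and beautifulsoup4 packages; the URL and the chunk size are placeholders):

```python
import requests
from bs4 import BeautifulSoup

def page_to_chunks(url: str, chunk_size: int = 1000) -> list[str]:
    """Download an HTML page, strip the markup, and split the text into chunks."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")

    # Drop script/style tags, then collapse the rest into plain text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    text = " ".join(soup.get_text(separator=" ").split())

    # Naive fixed-size splitting; real pipelines often split on headings or paragraphs.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

chunks = page_to_chunks("https://internal.example.com/docs/api.html")  # placeholder URL
print(len(chunks), "chunks")
```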
Today you can do better than that. You no longer have just LLMs that answer questions, you also have models that compute the embeddings of a text, like these from OpenAI. You take an HTML page from your site, split it into multiple pieces, ask for the embeddings of each piece, and store them in a specialized engine like ChromaDB. When a question is asked, you compute the embeddings of the question, ask ChromaDB for the 10/20 most similar contents it knows, and format a prompt for ChatGPT that contains those snippets.
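Putting the pieces together, a minimal end-to-end sketch could look like this (assuming the openai and chromadb packages, an OPENAI_API_KEY in the environment, and the chunks produced above; the model names, the example question and the number of results are illustrative choices):

```python
import chromadb
from openai import OpenAI

ai = OpenAI()  # reads OPENAI_API_KEY from the environment
db = chromadb.Client()
collection = db.create_collection("internal-docs")

def embed(texts: list[str]) -> list[list[float]]:
    """Ask OpenAI for one embedding vector per input text."""
    response = ai.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in response.data]

# Indexing: store each chunk together with its embedding.
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embed(chunks),
)

# Querying: embed the question, fetch the most similar chunks, build the prompt.
question = "Which are the mandatory parameters of the /orders API?"  # made-up example
results = collection.query(query_embeddings=embed([question]), n_results=10)
context = "\n\n".join(results["documents"][0])

answer = ai.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Answer using only the provided documentation."},
        {"role": "user", "content": f"{question}\n\nDocumentation:\n{context}"},
    ],
)
print(answer.choices[0].message.content)
```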
It is amazing: we started chatting with our documentation! We can ask how to get a security token, what the mandatory parameters of a specific API are, on which storage accounts we are putting that kind of data…
Why do you need to split your contents into snippets? Simple: an LLM prompt has a size limit, you cannot write an encyclopedia-sized prompt each time and expect an answer… GPT-3.5 accepts fewer tokens but is very fast; when trying GPT-4 we had to wait a lot before seeing the full response. You also need to consider that GPT-4 costs much more than GPT-3.5.
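To see how quickly a prompt fills up, you can count its tokens with OpenAI's tiktoken library before sending the request (a small sketch; the model name and the limit are illustrative, check the actual limits of the model you use):

```python
import tiktoken

def fits_in_context(prompt: str, model: str = "gpt-3.5-turbo", limit: int = 4096) -> bool:
    """Count the tokens of a prompt and check it against an assumed context limit."""
    encoding = tiktoken.encoding_for_model(model)
    n_tokens = len(encoding.encode(prompt))
    print(f"{n_tokens} tokens (limit assumed to be {limit})")
    return n_tokens <= limit

fits_in_context("Write a documentation page on how to shut down a nuclear reactor.")
```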
Will you trust the LLM company not to collect your private material and use it later as they want? You had better read the user agreement! At least you are not sending the whole material at once to fine-tune a new LLM model.
Some frameworks can help you build this kind of application: Microsoft’s Semantic Kernel and LangChain. I tried the first one, but personally I am not very happy with its Python support; maybe it is better to start with LangChain next time.
World AI Cannes Festival 2024
The event was held from February 8th to February 10th in Cannes, on the French Riviera.

While the 2023 edition was highly democratized, this time the organizers decided to grant access to the interesting talks only to paying ticket holders. The company I work for also bought fewer tickets so, sadly, I did not get as much out of this event as I wanted. The organizers also changed the mobile app they had last year: in 2023 it was so easy to find out what was going on across the exhibition, while this time it became a mess: you had to check each single room to see what was planned… I hope in 2025 they will change their minds.
One interesting talk I saw was “NeuroAI: Advancing Artificial Intelligence through Brain-Inspired Innovations”. The speakers described some interesting research paths towards building more efficient AI hardware. The human brain consumes a ridiculously small amount of energy, comparable to a candle light; by contrast, modern hardware is really energy-intensive, requiring hundreds of watts to operate. To make things worse, the current trend is just to build bigger models using more hardware and more training data. It will be impossible to keep up this trend, as the energy demand will quickly become huge: 85 TWh by 2027 according to this Scientific American article.
More efficient hardware could probably mimic what the brain is doing: natural neurons fire at an incredibly low rate, around 30 Hz. Neurons receive signals and aggregate them, producing an output only once a certain threshold is reached. The output spike does not last long and does not consume much energy.
It is possible to use a few transistors to implement a similar behavior; this could become a unit that can be cloned millions of times, realizing in hardware a huge array of neurons. Training these devices would then mean letting them learn the correct thresholds and input connections. By contrast, current architectures need to continuously compute values for each neuron output even when the output is not required.
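As a toy illustration of the threshold-and-spike behavior described above, here is a minimal leaky integrate-and-fire neuron simulated in Python (the leak factor, threshold and input signal are made-up values, not a model of any specific hardware):

```python
import numpy as np

def leaky_integrate_and_fire(inputs: np.ndarray, threshold: float = 1.0,
                             leak: float = 0.95) -> list[int]:
    """Accumulate the input, leak a bit at each step, spike when the
    membrane potential crosses the threshold, then reset."""
    potential = 0.0
    spike_times = []
    for t, current in enumerate(inputs):
        potential = leak * potential + current  # integrate with leakage
        if potential >= threshold:              # threshold reached: emit a spike
            spike_times.append(t)
            potential = 0.0                     # reset after the spike
    return spike_times

rng = np.random.default_rng(0)
input_current = rng.uniform(0.0, 0.3, size=200)  # arbitrary input signal
print("spikes at steps:", leaky_integrate_and_fire(input_current))
```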
One more energy-intensive task is moving data around: current architectures have separate memory units and computation units. The required data has to be moved from memory to the GPU and back, and this requires a lot of energy.
IBM did some experiments, and it seems that using this neuromorphic pattern gives some advantages over traditional LSTM architectures. You can visit Stanislaw Wozniak’s page to find some references.
I then followed a few other presentations, and some commercial demos. AMD’s demo about real-time video processing was both appealing and frightening. You can now buy devices that can process video input streams at up to 90 frames per second. You can buy a software stack that allows you to view the camera flow and decide that on some video areas you want to perform some actions: for instance, you point a camera at a car access gate and decide to use a region of the view to look for car registration numbers. You could stack these operations on the remote device, or turn your attention to a central system that now just receives events such as “the car xx123yy has passed here at 12:05”.
The person setting up this stack of operations does not need to be a rocket scientist: a nice UI can help him/her set up a video-driven application. Moreover, you could react to events and change the pipeline of some other cameras to perform other operations. In Singapore, parking facilities are using this to detect free spots and guide VIP customers to their places… or maybe to prevent non-VIP customers from picking reserved slots. So far so good, but what prevents some malicious entity from having cameras looking for a specific person, and being reprogrammed to detect his activity while moving around a building? Do we have to fear that a camera at the workplace starts detecting how many times we go to the toilet? Unfortunately it does not seem sci-fi anymore… Field Proven Smart City Video Analytics Solution