
Our blog, ideas, opinions

Here you can find some of our ideas, opinions and more. The full list is available on Medium.

Topics

Here is a list of the topics we have written about. Where a link to the source is available, click on it and you will be redirected.

Getting reliable information while on a tight budget

I am a consultant and I work on my own. I own a small company and have advised many clients, from individuals to large, publicly traded corporations, over the last 10 years.

Part of my work requires analyzing public documents, such as corporate registries, sanctions databases, news (of course) and other public sources, such as information published by regulators.

There is a lot of information available, and at the same time there is not. If you have a big budget, it is easy: you pay and you get a lot of information. Or you can hire an expensive company, which will (maybe) have access to expensive databases and will (likely) have juniors working for you.

But if you are on a “normal” budget or, as often happens, on a tight one, and you need an experienced consultant, you may be out of luck.

I have often been in such situations. As a professional, more often than not you are willing to work beyond the budget to help your client, knowing that there will be no direct financial reward for your effort. Professionally, however, it can be *very* rewarding, for me at least.

But I have also been in situations where the budget was not really the issue. The issue was either the available tools, the availability of the information, or a combination of both.

Some months ago I decided to build my own research tool, and I ended up with a web application called NC Data.

While my tool is not yet complete (I aim to cover all of Europe, or at least most of Europe’s main countries), it already helps me search, draw timelines and create arbitrary network graphs without sending information to a third party (everything is local). I can also process my entire “work session” with AI, which is amazing.

I use the tool daily and I am planning to add more features and sources.

You can check some videos on our YouTube channel:

https://www.youtube.com/@nurnbergconsulting

Since I think others may find the tool useful, I added a subscription option, so that anyone on a tight budget (or not, but why spend money needlessly?) can have access to a tool that lets them search, print reports based on official information, create timelines and graphs, and ask an AI to assist with or deepen the analysis.

Enjoy it, and feel free to contact me with suggestions or feature requests.

You can also read this story on Medium (which is nicer).

What a ride! Parsing the Spanish corporate registry

What a headache…

I work as a consultant. My main activity is analyzing documents or relationships to provide an assessment of risks or opportunities. This is delicate work, because the quality of the assessment depends strictly on the availability and reliability of the information, as well as on the ability to set the correct context and weigh nuances. It’s a sort of back-and-forth between different contexts, each with its own peculiarities and weight.

But I have done this for years and feel comfortable doing it. I also enjoy it. That said, to understand a context, I like to plot network graphs to get a bigger picture of the relationships. Given the constraints imposed by the reliability of the information, this is not always an easy task, which is why I eventually wrote my own application.

Whenever I can, I use the APIs of the available corporate registries. This is possible for France, the United Kingdom and Italy, to name a few, but not for Spain. The Spanish corporate registry does not provide an API, and the only thing available is a daily update of the Spanish official gazette. The registry is accessible, of course, but with severe limitations, both in terms of the information provided and of user experience (and cost, obviously).

So I decided to try to parse the daily releases of the Spanish gazette. I wrote a parser in Python, which seemed to work straight away, except that the parsed text ended up full of errors. The problem was caused by many edge cases, which turned out to be not exactly edge cases but rather a real mess. To cut a long story short, it took me nearly three months to come up with a viable solution. By viable I mean an error rate of approximately 0.025%, which I should be able to lower a bit in the next few iterations, though I expect to fix manually a few hundred cases that are not manageable with the parser (at least, not for me).

After all this work I feel satisfied, and my graphs now build fast and with only a few errors (hopefully minor ones). However, every single day while working on the parser, I thought about the fact that corporate data should be available through APIs in all European countries. Besides finding it outrageous that a State does not understand that public information should be truly public, I wonder whether there is statistical data analyzing the correlation between openness of data access and societal development.

Out of curiosity, do you think there is a positive correlation between data openness and societal development? Let me know!

You can read this on Medium, too!

How I experimented with AI (and went straight for a cup of red wine)

Anthropic first, then ChatGPT

Last summer I was on holiday in the South of Spain. I was working on my application and decided to try an experiment with the newly available LLMs. I had just been granted access to a developer API key from Anthropic and I already had an OpenAI API key, so I could test the two models on the same novel approach.

TL;DR For some reason, Anthropic’s Claude worked better in this specific use case

Since I was working on rendering network graphs in my application to analyze corporate relationships, I asked myself how an LLM would perform at:

  • Extracting entities and relationships from an arbitrary text (well, I knew this already: both LLMs performed well at entity extraction)

  • Once the entities and the relationships were extracted, would the models plot the network graph directly? What would they actually do?

  • Would it be straightforward to implement? Would it be usable, and would the output be insightful? How would it compare to a deterministic network graph rendering?

I got the answer to my questions the same day. And I got more answers the day after.

The first thing I did was choose an arbitrary text; in my case, a random newspaper article in Italian. To make sure the entities could be recognized more easily, also for debugging purposes, I chose an article about a coffee company wanting to expand its business abroad.

I prepared a frontend, a graph component and serverless functions to call the API endpoints of the two LLMs. The API endpoints responded correctly, so I was ready for the second step: how could I make sure the response came back in a form my graph component could use? My graph component uses the vis.js library, and the response had to be structured so that the library could render the graph.

TL;DR It’s all about the prompt (and some cleaning of the response)

This is when things became interesting. I tried to “force” the LLMs to return a structure in a specific JSON format. After a few attempts, it turned out that Anthropic’s Claude returned the response in the correct format more easily. The key was to hardcode an example structure in the prompt, keeping in mind that the example must be simple and clear. I tried a few examples and eventually noticed that both LLMs failed to return clean JSON because of their insistence on including a polite introduction to the JSON: “Here is your JSON data:…”, “The data you asked is presented below”, and similar. This was unnerving.
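
To illustrate the idea of hardcoding an example structure in the prompt, here is a minimal sketch in Python. The actual prompt used in the application is not shown in this post, so the example graph, the wording of the instructions and the `build_prompt` helper are all assumptions made for illustration. The `nodes`/`edges` layout with `id`, `label`, `from` and `to` keys mirrors the shape that vis.js networks consume.

```python
import json

# Hypothetical example structure (not the original prompt from the application).
# It is kept deliberately small so the model has a simple, clear pattern to copy.
EXAMPLE_GRAPH = {
    "nodes": [
        {"id": 1, "label": "Example Coffee S.p.A."},
        {"id": 2, "label": "Mario Rossi"},
    ],
    "edges": [
        {"from": 2, "to": 1, "label": "CEO of"},
    ],
}


def build_prompt(text: str) -> str:
    """Return a prompt that embeds one small example of the expected JSON."""
    example = json.dumps(EXAMPLE_GRAPH, indent=2)
    return (
        "Extract the entities and their relationships from the text below.\n"
        "Reply with a single JSON object and nothing else, "
        "using exactly this structure:\n"
        f"{example}\n\n"
        f"Text:\n{text}"
    )


if __name__ == "__main__":
    print(build_prompt("An Italian coffee company plans to expand abroad."))
```

Even with an instruction such as "nothing else", a model may still wrap the JSON in a friendly sentence, so a prompt like this is best treated as a first line of defense only.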

Ultimately, in both cases, I had to clean the response before passing it to the graph component, even though I had hoped the hardcoded prompt would suffice (my idea was to plot the graph with the click of a button). It did not, so cleaning the response turned out to be an essential step.
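
The cleaning step could look roughly like the sketch below. This is not the code used in the application, only one possible approach under stated assumptions: it assumes the reply contains a JSON object somewhere after the polite introduction, and the `extract_json` helper name is hypothetical. It relies on `json.JSONDecoder.raw_decode` from the Python standard library, which parses a JSON value starting at a given position and ignores whatever follows it.

```python
import json


def extract_json(response: str) -> dict:
    """Return the first JSON object embedded in an LLM reply.

    Any text before or after the object (introductions, closing remarks)
    is ignored. Raises ValueError if no parsable object is found.
    """
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at `start`
            # and tolerates trailing text after it.
            obj, _end = decoder.raw_decode(response, start)
            return obj
        except json.JSONDecodeError:
            # This brace did not open valid JSON; try the next one.
            start = response.find("{", start + 1)
    raise ValueError("no JSON object found in response")


if __name__ == "__main__":
    reply = (
        "Here is your JSON data:\n"
        '{"nodes": [{"id": 1, "label": "A"}], "edges": []}\n'
        "Let me know if you need anything else!"
    )
    print(extract_json(reply))
```

Scanning for the first opening brace that parses cleanly is more forgiving than stripping a fixed list of introductory phrases, since the wording of those phrases changes from one reply to the next.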

Once the response was cleaned, I was set. Passing the response to the vis.js library worked well and yes, LLMs like Anthropic’s Claude and ChatGPT can plot a network graph from arbitrary text.
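
Before handing the cleaned data to the graph component, a small sanity check can catch the most common problems, such as duplicate nodes or edges pointing at entities the model never listed. The sketch below is an assumption about how such a check might look, not the application's actual code. The `validate_graph` name is hypothetical, and the input is assumed to use the `nodes` (`id`, `label`) and `edges` (`from`, `to`) keys that vis.js networks expect.

```python
def validate_graph(graph: dict) -> dict:
    """Drop duplicate nodes and edges that reference unknown node ids."""
    nodes = {}
    for node in graph.get("nodes", []):
        if "id" not in node or node["id"] in nodes:
            continue  # skip nodes without an id and repeated ids
        # Fall back to the id as label when the model omitted one.
        label = str(node.get("label", node["id"]))
        nodes[node["id"]] = {"id": node["id"], "label": label}

    # Keep only edges whose two endpoints are known nodes.
    edges = [
        edge
        for edge in graph.get("edges", [])
        if edge.get("from") in nodes and edge.get("to") in nodes
    ]
    return {"nodes": list(nodes.values()), "edges": edges}


if __name__ == "__main__":
    raw = {
        "nodes": [{"id": 1, "label": "A"}, {"id": 1, "label": "dup"}, {"id": 2}],
        "edges": [{"from": 1, "to": 2}, {"from": 1, "to": 3}],
    }
    print(validate_graph(raw))
```

The result can be serialized with `json.dumps` and returned to the frontend, where the two lists map directly onto the node and edge collections of the graph.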

Now, since copying and pasting text is not my normal use case, I prepared a component to upload a PDF (and now also an Excel) file and plot the network graph from an arbitrary PDF or Excel file.

This is the result with Anthropic’s Claude:

And below is the result with OpenAI’s ChatGPT, using the same file. Notice that the first graph is not well rendered, but the second one is, after simply clicking again on the “Get a graph” button:

I then iterated a bit, updating the models and adding interactivity with the documents (up to 5 documents can be graphed at a time) and with the graph. Now I can not only get a network graph, but also ask the LLMs about the graph, request a subset of the information in graph format, or generate tables, which can be easily exported in Excel format and seamlessly pasted into a spreadsheet for further use.

And yes, after seeing the first graph render nicely I went straight for a cup of good Spanish red wine.
