Where can I get a GDPR-compliant API as an alternative to OpenAI?

It may well be somewhat more expensive than using the OpenAI API, but it shouldn't cost 50 times as much.

medmonk
5 months ago

If it has to be GDPR compliant and you really want to be sure that no data can flow to third parties, the most sensible option is to host things on your own server. However, whether operating or renting your own server is worthwhile for you at all depends on the intended use.

There are also providers such as Aleph Alpha or Spherex. It would be useful to know what such an API is supposed to be used for. Is it for internal services, a product of your own, or something else entirely? Last but not least, HuggingFace also comes to mind, but I'm not absolutely sure about that one.

medmonk
5 months ago
Reply to  Jeremy Edberg

The additional context helps, and in your case other solutions come into question. I don't know what your hardware and computing power look like, but I would look more towards local LLMs.

I run Ollama with OpenWebUI via Docker and can get a lot done with stored prompts. If you want to go further, much more is possible, for example with n8n for AI automation.

The good thing about all the technologies mentioned is that your data is not passed on to third parties. Everything runs locally on your computer, on a home server, or alternatively on a separate server that you rent externally.

Even if you only use Ollama, you can access it "out of the box" thanks to its API. For example, I have integrated it into Obsidian and can quickly, easily and comfortably search all my notes.

Using shortcuts, I can create a summary, draft an email, or work with other data. Even as a code assistant, everything runs locally without leaving my network.
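
To give an idea of what that looks like in practice, here is a minimal sketch of querying a local Ollama instance over its HTTP API; it assumes Ollama's default port 11434 and a model that has already been pulled (the model name "llama3.1" is just an example):

import requests

# Minimal sketch: query a locally running Ollama instance over its HTTP API.
# Assumes Ollama listens on its default port 11434 and that the model
# "llama3.1" has already been pulled (ollama pull llama3.1).
OLLAMA_URL = "http://localhost:11434/api/generate"

def ask_local_llm(prompt: str, model: str = "llama3.1") -> str:
    response = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]

if __name__ == "__main__":
    # Example: summarize a note without any data leaving the machine.
    print(ask_local_llm("Summarize the following note in two sentences: ..."))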

medmonk
5 months ago

No problem, glad to help.

"At our company, it's not allowed to cost anything."

Who doesn't know it, this constant whining. ;) I wish you every success in negotiating and making it palatable to your employer. They could even save money if the subscription costs went away.

Anyway, good luck. Feel free to report back later on whether and how it was solved. If you have further questions, just get in touch again.

medmonk
5 months ago

There are smaller LLMs such as Phi that are relatively compact and optimized so that they can also be run on weaker hardware. However, I would plan for at least 16 GB of RAM so that everything runs smoothly.

What might be an option for you is buying a somewhat newer thin client or the like. Alternatively, upgrade the RAM to at least 16 GB and then test for yourself which LLMs run acceptably on it.

Otherwise, rent a (v)server for ~50 € a month and run the LLMs there with significantly more performance. Via OpenWebUI you can also work with profiles, as you know from other web services.

The data still remains with you. The essential difference is that it does not run locally but on your own server. Everything else stays the same, except that it runs through an external server.
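
For illustration, the same client code works whether Ollama runs on your own machine or on a rented (v)server; here is a sketch using Ollama's OpenAI-compatible endpoint (the hostname is a placeholder, and in practice you would put the server behind TLS and authentication, since Ollama itself does none):

from openai import OpenAI

# Sketch: only the base URL changes between a local and a remote Ollama.
# "my-vserver.example" is a placeholder for your own rented server.
client = OpenAI(
    base_url="http://my-vserver.example:11434/v1",  # or http://localhost:11434/v1
    api_key="ollama",  # required by the client library, ignored by Ollama
)

reply = client.chat.completions.create(
    model="phi3",  # a small model that fits on modest hardware; must be pulled first
    messages=[{"role": "user", "content": "Draft a short reply to this email: ..."}],
)
print(reply.choices[0].message.content)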

LUCKY1ONE
5 months ago

As far as I know, OpenAI at least states that they do not use data from paying customers for training. But I wouldn't put my hand in the fire for that, even if they say so.

Epiktetos
5 months ago

Thanks to EU regulations such as the AI Act, there are virtually no European competitors to OpenAI, and there won't be any in the foreseeable future. If you process personal data such as e-mails, you realistically won't get around running an LLM on your own infrastructure.

However, for processing and answering emails, Phi from Microsoft or comparably small LLMs (~2.7 billion parameters) may be too small. I don't think anything sensible will come out of that, especially if you use GPT-4o or GPT-4 as the benchmark. If you're used to the output quality of GPT-4o mini, Llama 3.1 70b from Meta would be very similar in quality, but open source and thus runnable on your own hardware (e.g. with Ollama). An old office PC isn't enough for that, though. Meta recommends at least an 8-core CPU, 32 GB of RAM and a GPU at least on the level of the Nvidia 3000 series. Somewhat smaller systems also work, you will just be waiting quite a long time for the answers.

There are some vendors who host the big open-source models for you and which you can access via API with usage-based billing (just like OpenAI), for example Replicate. Then you don't have to buy your own hardware. Unfortunately, I don't know of a European provider.
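
As a rough sketch of how such usage-based access looks with Replicate's Python client (the exact model slug and input parameter names are assumptions here and should be checked on replicate.com):

import replicate  # pip install replicate; expects REPLICATE_API_TOKEN in the environment

# Sketch: call a hosted open-source model with usage-based billing.
# "meta/meta-llama-3-70b-instruct" is used as an example slug.
output = replicate.run(
    "meta/meta-llama-3-70b-instruct",
    input={"prompt": "Answer this email politely: ...", "max_tokens": 512},
)

# Language models on Replicate stream their output as chunks of text.
print("".join(output))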

Epiktetos
5 months ago
Reply to  Jeremy Edberg

Phi 3.5 mini achieves a score of 55.4% (out of a maximum of 100%) in the MMLU benchmark (a fairly broad benchmark that measures an LLM's world knowledge and language understanding). GPT-4o mini achieves about 82%. That's quite an extreme difference. The quality of a language model correlates quite strongly with the number of parameters: the more parameters, the better, but the higher the system requirements. Phi 3.5 mini has 3.8 billion parameters (I assumed 2.7 in my original answer, so it has a bit more after all); for GPT-4o mini the number is not known, but it is definitely far higher. Llama 3.1 70b (70 billion parameters) achieves a score of 86% and is thus objectively even better than GPT-4o mini.

The measurement methods differ here and there (few-shot / multi-shot / CoT), so take everything with a pinch of salt. But in principle this gives a good overview of how these models compare with each other.

A possible compromise would be Llama 3.1 8b (8 billion parameters), which achieves a score of 73%. However, as I said, at least a GPU from the 3000 series should be installed.

In principle, however, you have to try out what makes sense for your application. It may well be that a small language model like Llama 3.1 8b, or even something smaller like Phi mini, works well for your use case. Thanks to Ollama, this can be tested locally relatively easily, as long as your own machine has enough juice.
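
A quick way to try this out is to run the same prompt against a couple of model sizes on a local Ollama instance and compare the answers and response times; a small sketch (model names are examples and must be pulled first):

import requests

# Sketch: compare different model sizes locally via Ollama on the same prompt.
# Pull the models first, e.g. ollama pull phi3:mini, ollama pull llama3.1:8b.
PROMPT = "Reply to the following customer email in a friendly, formal tone: ..."

for model in ["phi3:mini", "llama3.1:8b"]:
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=300,
    )
    r.raise_for_status()
    data = r.json()
    # total_duration is reported in nanoseconds
    print(f"--- {model} ({data['total_duration'] / 1e9:.1f}s) ---")
    print(data["response"][:500])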

medmonk
5 months ago
Reply to  Epiktetos

You've forgotten Llama 3.2 with 1b and 3b here, which also perform relatively well and run fine on weaker hardware. For those, 16 GB of RAM and no dedicated GPU are enough.

Alternatively, Llama 3.1 8b instead of the 70b, and you'd have to check on HuggingFace how the numbers compare. And instead of 32 GB of RAM and a dedicated GPU, a Mac Mini M1 or newer would be an option.

That said, I've already run the models mentioned on a thin client as a test. I don't want to contradict you, and ideally more performance would be good. But it also works on a smaller scale, even without 8 cores, a GPU, etc.

A dedicated server with a 14-core CPU and 64 GB of RAM also costs "only" around €50 per month. As long as a dozen employees aren't all accessing it and it's enough for you, you could make do with that and host everything yourself.

Best regards, medmonk

Epiktetos
5 months ago
Reply to  medmonk

I honestly don't believe that 3b or even 1b models (which are actually intended mainly for edge/on-device applications) are sufficient for the use case he described (drafting e-mails). Especially when the model is also supposed to spit out its output in a structured format given a correspondingly long input.

I tested this with Llama 3.1 8b Instruct q8_0 and an e-mail of average length. Despite being asked to return only the reply to the e-mail as plain text, it started with "Certainly! […] Here's an appropriate answer to the provided e-mail:". Whether that matters depends on how the answer is processed further down the line. Llama 3.1 70b Instruct had no problem with it.
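
One thing that sometimes helps smaller models stick to a structured format is forcing JSON output via Ollama's format parameter; here is a sketch of that approach (model name and JSON keys are just examples, not what was tested above):

import json
import requests

# Sketch: suppress chatty preambles by requesting JSON output from Ollama
# and parsing the result. The keys "subject" and "body" are arbitrary examples.
prompt = (
    "Reply to the e-mail below. Return only JSON with the keys "
    '"subject" and "body".\n\nE-mail:\n...'
)

r = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1:8b", "prompt": prompt, "format": "json", "stream": False},
    timeout=300,
)
r.raise_for_status()
reply = json.loads(r.json()["response"])
print(reply["subject"])
print(reply["body"])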

Perhaps this would be possible with fine-tuning; then these very small models would be worth considering.

But the Apple Silicon devices are definitely a good recommendation; we also use those machines instead of dedicated GPUs.