Sovereign AI: Using LLMs Without Sacrificing Privacy - The Sovereign Computing Show (SOV013)

Tuesday, April 29, 2025

AI assistants like ChatGPT and Claude are powerful tools, but they come with significant privacy trade-offs. In this episode, Jordan Bravo and Stephen DeLorme explore practical approaches to using AI without surrendering your data to big tech companies. They compare privacy-focused third-party services that use confidential computing (like Maple) and local storage options (like Venice.AI) before diving into running open-source models entirely on your own hardware with tools like Ollama, GPT4All, and LM Studio. They also reveal how your Smart TV might take screenshots of what you're watching through Automatic Content Recognition (ACR) and share steps to disable this intrusive tracking.

Chapters

  • 00:00 Introduction to The Sovereign Computing Show
  • 00:42 ATL BitLab Sponsorship Information
  • 01:45 Welcome and Show Contact Information
  • 02:09 Smart TVs and Automatic Content Recognition (ACR)
  • 03:58 How ACR Surveillance Works in Smart TVs
  • 05:23 The Creepy Reality of TV Screenshot Tracking
  • 08:33 Solutions for Smart TV Privacy Concerns
  • 10:47 Unplugging Your Smart TV from the Internet
  • 11:51 Main Topic: Using AI and LLMs Privately
  • 12:44 Understanding LLMs vs. Other Generative AI
  • 14:51 The Privacy Problem with Major LLM Providers
  • 16:44 Private Third-Party AI Providers:
  • 16:44 - Maple and Confidential Computing
  • 22:32 - Venice.AI with Local Storage
  • 27:28 - Kagi AI's Privacy Trade-offs
  • 30:49 The Privacy Spectrum of AI Services
  • 33:38 Self-Hosting LLMs and Local Models:
  • 34:22 - Ollama for Running Local Models
  • 37:25 - Running Models Without Internet Connection
  • 38:43 - OpenWebUI for Graphical Interface
  • 41:35 - GPT4All for User-Friendly Local AI
  • 43:03 - LM Studio with Integrated Interface
  • 44:55 Hardware Limitations for Local LLMs
  • 46:15 Local Image Generation:
  • 46:47 - Stable Diffusion Web UI
  • 48:09 - ComfyUI for Artist-Friendly Workflows
  • 51:50 ATL BitLab AI Meetup Information
  • 53:11 Conclusion and Contact Information
  • 53:40 Show Outro and Support Details

Links

Transcript

Stephen DeLorme: [00:00:00] Sarah, a loving and caring girlfriend. A loving and caring girlfriend. She will do dot, dot, dot. I'm just like, is this the end times, Jordan?

Jordan Bravo: Oh yeah, I saw an article that said if you have an AI girlfriend and it's not local only, then your AI girlfriend is cheating on you, which is kind of true.

Stephen DeLorme: Yeah, that's so true. But also, that's a funny joke. But oh my God, we are in the end times.

Jordan Bravo: Welcome to the Sovereign Computing Show, presented by ATL BitLab. I'm Jordan Bravo, and this is a podcast where we teach you how to take back control of your devices. Sovereign Computing means you own your technology, not the other way around.

Stephen DeLorme: This episode is sponsored by ATL BitLab. ATL BitLab is Atlanta's freedom tech hacker space. We have co working desks, conference rooms, event space, maker tools, and tons of coffee. There is a very active community here in the lab. Every Wednesday night is Bitcoin night [00:01:00] here in Atlanta. We also have meetups for cyber security, artificial intelligence, decentralized identity, product design, and more.

We offer day passes and nomad passes for people who need to use the lab only occasionally, as well as memberships for people who plan to use the lab more regularly, such as myself. One of the best things about having a BitLab membership isn't the amenities, it's the people. Surrounding yourself with a community helps you learn faster and helps you build better.

Your creativity becomes amplified when you work in this space, that's what I think at least. If you're interested in becoming a member or supporting this space, please visit us at atlbitlab.com. That's A-T-L-B-I-T-L-A-B dot com. Alright, on to our show.

Jordan Bravo: Welcome to the Sovereign Computing Show. I'm Jordan Bravo, and I'm here today with Stephen DeLorme.

Stephen DeLorme: Sup!

Jordan Bravo: We wanna remind you that you can contact the show by boosting in via Fountain FM and searching for [00:02:00] ATL BitLab. If you send us a Boostagram with a message and some sats attached, we will read it on the show, take your feedback, and respond.

If you don't wanna send in a boost with Fountain or any other method, you can also email us at sovereign@atlbitlab.com and we will read your feedback and respond to it as well. There's an article that we want to talk about today, and it is about Smart TVs. This is a ZDNet article, and we will link it in the show notes so you can go into detail later based on your specific brand of Smart TV. There's something called ACR, Automatic Content Recognition. It's a tool that tracks your viewing habits, it's built into the Smart TV's software, and it's on by default. This article talks a little bit about what it [00:03:00] is. Essentially, it's a protocol or standard that Smart TVs use to collect data about your viewing habits.

And this can be used for ad tracking and surveillance. It's pretty standard nowadays. I think when most of us hear something like this, we're not surprised anymore, because we just assume every device we buy is tracking us. But Smart TVs are one of the things we have less control over.

Smart TVs are very much not sovereign computers. They are completely captured devices that we have little control over. So unless you're a real hardware-hacking type person who likes to get in there and tinker with things and flash firmware, which I am not, I'm purely a software person, that's beyond my ability, and I would guess most people's as well.

But if you are [00:04:00] just buying an off-the-shelf Smart TV, it most likely has ACR enabled. So this ZDNet article goes into the steps you can take to disable it for all the different manufacturers. I'm not gonna go through the steps, because it's just tedious and it wouldn't make for good listening.

But if you, Stephen, can you scroll down to the steps and just kind of show all the brands that they list?

Yeah. So we have Samsung, LG, Sony, Hisense, TCL, and other Roku-powered TVs. Okay, so six different brands in this article right here.

Yeah, and so if you have one of those brands, this will give you a step-by-step on how to disable it. It's just another way to take a little bit more privacy back. Obviously, the ideal would be a dumb TV rather than a Smart TV, where you can then plug in [00:05:00] a device you have more control over. But if you have a Smart TV, it's probably a good idea to go through and confirm that this is disabled. And if your TV is a brand that's not on here, my guess is there's still a way to do it. So use a search engine, ask AI, that's gonna be your friend if you wanna do that. Stephen, do you have any additional thoughts about this?

Stephen DeLorme: Uh, this is creepy. I mean, I'm actually a little surprised at how this works. I was kind of perusing the article here and reading about how ACR works: ACR does this by capturing continuous screenshots and cross-referencing them with a vast database of media content and advertisements. ACR can capture and identify up to 7,200 images per hour, approximately two images per second. That's wild. So I mean, you can have a lot of apps that come pre-installed on these Smart [00:06:00] TVs, like Netflix and YouTube and other content streaming services, and I was not under any illusion that I wasn't being tracked by those service providers, right? You know, YouTube and Netflix are going to try and record as much data as they can about your viewing habits and what you watch. And I guess when you brought this article up, I kind of assumed that ACR was just some kind of spec for those sorts of services to share information with marketers or with the TV manufacturer about what is being watched. That's not what it looks like. It looks like the TV itself is occasionally taking screenshots of what you're watching and comparing them with a database of other media content and advertisements to identify the content you're watching.
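
To put that capture rate in perspective, the arithmetic from the figures quoted in the article works out like this. A quick back-of-the-envelope sketch; the only input is the 7,200-images-per-hour number:

```python
# Back-of-the-envelope math on the ACR capture rate quoted above.
IMAGES_PER_HOUR = 7200
SECONDS_PER_HOUR = 3600

images_per_second = IMAGES_PER_HOUR / SECONDS_PER_HOUR  # the "two per second" figure

# A typical three-hour evening of viewing:
screenshots_per_evening = IMAGES_PER_HOUR * 3

print(images_per_second)        # 2.0
print(screenshots_per_evening)  # 21600
```

So a single evening in front of the TV can mean tens of thousands of screenshots matched against that database.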

Jordan Bravo: Yeah, it's almost like an agreed-upon surveillance API [00:07:00] that's agnostic to the app, and advertisers and other parties can just hook into it and glean this information.

Stephen DeLorme: At this point, I don't even know if it's agreed upon. I don't know if that would make it better or worse for me, but it seems like these TVs can just take a screenshot and record, and that happens at the level of the TV operating system. I'm not really sure there's anything that Netflix or YouTube or Prime Video could do to stop a screenshot from being taken at the level of the TV operating system.

Jordan Bravo: Yeah, I see what you're saying. So this is less about the apps and more about the OS level and the manufacturer level.

Stephen DeLorme: Yeah. Again, this is my first time reading about this, so I'm not sure, but that's at least my understanding from skimming this article. Let me put it this way: if the TV manufacturer was collaborating with all of the major over-the-top [00:08:00] streaming providers, they wouldn't need to take screenshots every two seconds. They could just say, "Hey, YouTube. Hey, Prime Video. Share with me what the user is watching." The fact that they're having to take a screenshot, or have the capability to take a screenshot, every two seconds and identify what you're looking at tells me that it's almost like a separate surveillance apparatus.

Jordan Bravo: Yeah, that sounds accurate.

Stephen DeLorme: Well, I'm definitely going to see if this is a setup on my TV and turn this shit off.

Jordan Bravo: I do wanna talk about some takeaways from this in addition to what the article lists. So let's think outside the box here and shift the paradigm, to use some business jargon. Taking these steps to disable ACR is definitely a good first step, but let's think about why we're in this situation in the first place.

Well, it's because we have a quote-unquote Smart TV with an [00:09:00] operating system and applications that we have no control over. So the most ideal sovereign computing device when it comes to TVs would be a dumb TV, a TV without the Smart TV capability built in. For many years, that's what a regular TV was.

It was basically just a monitor: you plug in your HDMI or whatever interface, and it literally just pipes video data from the source device. However, now you have a whole little computer with an operating system and apps developed by companies like Netflix, et cetera. And so now there's a whole layer of stuff going on that we don't really know much about, above and beyond straight hardware and firmware translation of data. And this, of course, has opened up the door for stuff like [00:10:00] surveillance at both the operating system level and the application level.

So what can you do about this? Well, you can opt not to use the Smart capabilities. It's hard to find TVs these days that don't have Smart capabilities, in fact, it's probably impossible, but you don't have to use them. If you do not connect your Smart TV to the internet, it can never spy on you. I mean, it can try to spy on you, but it can never send that data anywhere. So if you wanna keep everything local on your TV, that's the best way to do it, in my opinion. And what you do is you just treat it like a good old-fashioned TV: you plug in HDMI from whatever source you're using, and let the TV just be a TV.

Now, if you are plugging something in like an Apple TV or a Roku or an Amazon Fire Stick, you will still have some surveillance problems at the app level, [00:11:00] but at least the TV itself is not spying on you with shenanigans like this ACR. And we can talk about ways to mitigate the app level, too.

I'm thinking maybe this requires a whole episode in itself, or a deep dive in another episode. So let's leave it there for now, and we'll get into it in another episode.

Stephen DeLorme: Yeah, that sounds like a good plan. So if anybody knows anything more about ACR, maybe if any of the conclusions I jumped to there were inaccurate, you can boost in and let us know. And whatever your concerns may be about the privacy around content streaming apps, boost in and let us know.

We'll talk about it in the future some more.

Jordan Bravo: Don't forget, you can also email us at sovereign@atlbitlab.com.

Alright, so today's main topic that we're gonna dive into is how to use AIs and LLMs privately. [00:12:00] LLMs are a booming topic these days, and rightly so; they enable some really revolutionary productivity. Typically, when we use these LLMs, we're interacting with a web app front-end that connects to LLMs running on a company's servers, and these companies... go ahead.

Stephen DeLorme: I wanna, sorry, I wanna jump in real quick and just define for anyone who you know doesn't spend their time in tech all the time. LLM is a Large Language Model and basically the technology under the hood that powers a chat bot like ChatGPT or Claude or something like that.

Jordan Bravo: Yes. Thank you. And so when people talk about AI in popular culture and the news, they're typically referring to LLMs, Large Language Models. Would image generation fall under the category of [00:13:00] LLM, or is that a separate category?

Stephen DeLorme: Separate category, but related. In my mind, I categorize it as generative AI.

Jordan Bravo: Yeah.

Stephen DeLorme: AI that generates stuff. Some of those are Large Language Models, which are kind of like text to text: text input, text output. And then you have the art models. I'm not sure exactly what the technical term for that is.

It might be diffusion model, I think, but that's text to image. And you've got a whole variety of different kinds of generative models: audio to text for transcription, text to video, image to video. So there's a whole wide universe of generative AI models. But yeah, I think the one that's most common among the majority of people, at least in the Western world or the tech-adjacent world, is LLM chatbots.

Jordan Bravo: Got it. Thanks, that makes sense. So you have the [00:14:00] broad category of AI, and under that you might have generative AI. And under that, one subcategory would be image generation and another might be Large Language Models.

Stephen DeLorme: Yep.

Jordan Bravo: Okay. So within those two broad categories, LLMs are typically what people think of: ChatGPT, Grok, I'm blanking on the other ones now, Google Gemini, Claude. Exactly. And these are the ones where you're typing into a chat interface. You send a question, it goes to the company's servers, they run it through an LLM, they send it back, and you get a text response, or maybe it writes some code for you. Typically, that's what we're all familiar with.

Now, the problem we would like to tackle here is: how do we take advantage of the productivity [00:15:00] that LLMs give us without completely doxing all of our information, or just feeding it to the surveillance machine? It's worth noting that OpenAI, for those of you who are unaware, is not open; the name is sort of a misnomer, probably intentionally, but that's beside the point. OpenAI, Google, Microsoft's GitHub Copilot, and the rest: these companies' whole business model is collecting as much data as possible, using it to train these LLMs, and then using it for surveillance, for adware and ad targeting, and for data collection and sharing. So all of these things are par for the course for these companies. We should not be surprised that they're doing this anymore; we've been paying attention to this stuff for long enough. Something to note is that an ex-chief of the NSA is now on the board of directors of [00:16:00] OpenAI. So you can see there's a direct government-surveillance-to-big-tech pipeline. That, again, shouldn't surprise us, but it's still disconcerting to see.

So how do we use AI more privately? Well, there are two different ways you can go about it. The first one we're gonna talk about is maybe the easier way, the one that doesn't require as much technical know-how and setup, and that's using more private AI providers. This is not gonna be a completely trustless approach. It's a trust-minimized approach, shifting trust to more privacy-reputable providers.

So the first one we wanna talk about is called Maple. I think the website is trymaple.ai, and this is a cool project. It's using a technology called Confidential [00:17:00] Computing. A little bit of trivia: it was created by some of the founders that originally created Mutiny Wallet, and after Mutiny Wallet was shuttered as a project, most of its creators started working on Maple. Maple is an AI chat, just like ChatGPT, for example, but where it differs is that they're using the confidential computing hardware on Amazon. For the technically minded folks out there, this is called AWS Nitro, and it is a trusted, or trustless, computing platform. Is that the right term, Stephen?

Stephen DeLorme: Uh, I think some people call it that. I would call it confidential computing. I think they're roughly the same thing.

Jordan Bravo: Some of you might have heard the term secure enclave when it comes to hardware on your iPhone or Android [00:18:00] phone or something like that. And this is a similar concept, where they're manufacturing specific hardware to run cryptography in a way that minimizes the trust you place in whoever's running it.

So if you wanna run AI on professional hardware, which all of these companies do so that the user doesn't have to run their own expensive hardware, you have to run a server. And the problem is, as a user, how do we know that server is secure and not leaking all of the data? Well, the way that Maple and this confidential computing approach works is at a hardware level. Again, you're shifting the trust here: now the hardware manufacturing process has to be trusted. But assuming that process is not corrupted, when you're running on Amazon servers without confidential computing, you could have [00:19:00] a malicious sysadmin actually see all of the data and the computing that's taking place.

But with something like confidential computing on Amazon's AWS Nitro, it's essentially a black box to the sysadmin or whoever's administering it. And it is provably secure: cryptographically, provably running the same code that it's attesting to. So Stephen has just demonstrated Maple by asking it about confidential computing. Is there anything in there that you think is useful, that you wanna talk about?
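
The attestation idea described here can be sketched in a few lines. This is a deliberately simplified illustration, not the actual AWS Nitro protocol: real Nitro attestation uses signed CBOR documents containing PCR measurements, verified against AWS's certificate chain, and every name below is hypothetical.

```python
import hashlib

# Simplified sketch: the enclave reports a cryptographic measurement
# (a hash) of the code it is running, and the client compares it against
# the hash of the code it expects. All names here are illustrative.

def measure(code: bytes) -> str:
    """Hash of the enclave's code; stands in for a Nitro PCR value."""
    return hashlib.sha384(code).hexdigest()

def verify_attestation(reported_measurement: str, expected_code: bytes) -> bool:
    # Because the service's code is open source, the client can recompute
    # the expected measurement independently and compare.
    return reported_measurement == measure(expected_code)

published_code = b"def handle_chat(prompt): ..."  # the audited, open-source build

# An honest enclave reports the measurement of the published code:
honest = verify_attestation(measure(published_code), published_code)
# A tampered enclave (say, one that logs your chats) measures differently:
tampered = verify_attestation(measure(b"evil logging build"), published_code)
print(honest, tampered)  # True False
```

The point Jordan is making is exactly this comparison: if the measurement doesn't match the published code, the client can refuse to send any data.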

Stephen DeLorme: Not particularly. I think it's a deep topic. Since what we've been doing on this show thus far has been kind of actionable, sovereign computing things, I don't know that I want to dive too deep into some of these technical details, because I think your [00:20:00] explanation is pretty much sufficient. As long as they're running the right hardware and all of that, then in theory it's provable that the compute is indeed confidential. Personally, I think it's a good way to move forward as a society, because I would like to see more products built on top of these things. Not everyone's gonna self-host a file server, right? And so I think that if you adopt confidential computing as a startup or as a company or whatever, you can offer that greater level of privacy to your users, and the users can actually benefit from massive amounts of compute in the cloud. It's kind of nice having cloud computers that power a lot of the services we use on a [00:21:00] daily basis; it's almost a shared resource. You can benefit from those compute clusters when you need them, without needing to run that stuff locally. But the trade-off is privacy. So confidential computing is one of those things that, to me, just seems like a no-brainer in terms of being able to move forward.

As far as this service goes, I will say I do like Maple, and I love Open Secret, the company that makes the technology behind Maple. Having said that, I think right now the only model they run, if I'm not mistaken, is Llama 70B, and I have asked it some complicated questions before and it kind of stumbled a little bit. If you are used to using GPT-4o or GPT-o1, you might find the Llama model a little bit lacking, or if you're a fan of the Claude models, you might find it a little bit lacking. Having said that, this is kind of a proof of concept for Open Secret. If you're asking this thing less complicated, less [00:22:00] mathematical questions, I think Maple will totally get the job done for you, and there's nothing to prevent the Maple project from hosting DeepSeek or other more advanced models in the future. So as a proof of concept of running an LLM in a confidential compute environment, it basically works.

Jordan Bravo: Yeah, that's a good point. It's a new project, so they're not feature-complete yet; they're gonna add more in the future. But it is nice to see them doing this. It's fully open source, and so it's a great project to keep an eye on.

The next third-party LLM provider that's more privacy-focused that we wanna talk about is Venice, Venice.AI. This one was started by Erik Voorhees, who is fairly well known in the Bitcoin and crypto community. They focus on privacy, [00:23:00] with a UX trade-off. What I mean by that is they do not sync your chat history. So if you were to open this in a browser on one computer, sign in and do some chats, and then sign in on a different browser on a different computer, it's not gonna sync that chat history over. But the upside is that your chats and your history are not being sent to Venice, or at least not being saved in any meaningful way.

Now again, we have to trust that they're actually doing what they say they're doing. But their whole business model is to be a private LLM provider, so they're more trustworthy, I would say, than one of these other companies that makes no promises about privacy. In fact, their terms of service are quite [00:24:00] the opposite. One of the cool things about Venice is that they have a bunch of models to choose from, and I've actually paid for a pro membership. I just wanted to try it out because you can pay with Bitcoin, and that's another way you're really getting a lot of privacy. When you pay for ChatGPT or Claude or any of these, you have to use a credit card or a debit card, and that's inevitably tied to your account. You can use something like privacy.com to have a layer of privacy between your bank account and one of these companies, but ultimately there is a link there. So being able to pay with Bitcoin on Venice.AI, that to me is a huge selling point.

Stephen DeLorme: It looks like they also have image generation built in here too, which is kind of cool.

Jordan Bravo: They do have image generation. We're gonna talk about another open source way to do image generation, but this would be [00:25:00] competitive with something like, I'm blanking on the name now. What's the big image generation service?

Stephen DeLorme: Uh, well, MidJourney is one of my faves, if...

Jordan Bravo: That's the one I was thinking of. MidJourney.

Stephen DeLorme: There you go, though. That was actually a very snappy image generation. I asked it to make a cypherpunk hackery hacker soldering his own circuit board because he doesn't trust Intel. And we got this guy in a black shirt. He's not soldering, he's really using one of them skinny screwdrivers, but he's kind of peering at us over the rim of his glasses, like, "I don't trust you. Get the fuck out of my lab."

Jordan Bravo: That's actually kind of a cool image. I've played around with this before and it's decent. If you go to the image styles, you can see a bunch of cool options.

Stephen DeLorme: Oh, this is weird. Oh, and I get it, because they're called Venice. All of the [00:26:00] examples in the image styles are these, I don't know, Italian Renaissance, masquerade-ball masks.

Jordan Bravo: I think it's called Venice for the same reason that Allen Farrington has that post and then book, Bitcoin is Venice. It's about the Renaissance era in Venice, and how we're now experiencing a new renaissance with things like Bitcoin and sovereign-computing-type technology advances.

Stephen DeLorme: I was trying to get it to give it to me in a GTA style, but whatever, I won't waste your time playing with this.

Jordan Bravo: Yeah, no, this is fun. I definitely recommend people take a look at it and try it out. I've used it to some good effect, and it feels good knowing that my searches and history are more private. But like you said, these open source models, while competitive in some respects with the closed source ones, they're not making gazillions [00:27:00] of dollars, and so you often see the features and the UX lag behind ChatGPT and Claude and that kind of thing.

Stephen DeLorme: I gotta say, though, this one actually seems like a competitor. This isn't quite MidJourney level, but it seems to be catching up. I might've missed it when you said it earlier, but do they offer any insight as to how this runs? Is this also in a confidential compute environment?

Jordan Bravo: This is not using confidential computing. This is more like they're just proxying it to their own server, and they're promising that they don't log your chat history. So this is more of a shifting of trust. But again, they're basing their business model on providing privacy; that's why you're paying them or using their service in the first place. So that's what you're going on there.

And then the other thing, like I mentioned: your whole chat history and everything is in [00:28:00] local storage. So, ostensibly, your chat history specifically is not being sent to or saved on their servers.

Stephen DeLorme: Got it. Okay. That makes sense. Oh yeah. Different trust model than with Maple.

Jordan Bravo: Yep. The next service I wanted to mention is called Kagi. K-A-G-I. Kagi is a newer search engine; that's their main product. They advertise themselves as a privacy-first search engine that you actually pay for to get premium search. But they also have an AI assistant, and if you buy their pro subscription, you get access to the Kagi AI. While I have not used this one personally, I've heard a lot of good things about it, and supposedly it's got some good UX and a pretty powerful model.

Stephen DeLorme: So you can use, it looks like, all of the [00:29:00] Claude models, a bunch of OpenAI models, Mistral, Google, Meta, Alibaba, and DeepSeek, of course. Looks like they have a lot of different LLMs available. And on that note, one thing worth mentioning here with Kagi, and I'm trying to see if this applies to Venice as well. When it comes to something like Venice, it looks like they're running Llama, Mistral, Dolphin, DeepSeek, so they're probably hosting all those models on their own. We go to Kagi, and we can see that they have LLMs available in the assistant. Some of these they might be self-hosting, like perhaps the DeepSeek or the Llama models. But for the Anthropic and OpenAI models, and I don't know about Google Gemini, but maybe that one as well, they're probably just [00:30:00] using the API for those services. So if you use Kagi and you choose an Anthropic or OpenAI model, your data is still being sent back to Anthropic and OpenAI.

Having said that, it may be that Kagi doesn't identify all of your separate chats to those services. For example, if I were to use GPT-4o from OpenAI inside of Kagi and I had 10 separate chats, OpenAI might see that as 10 completely different users, or it might see that data all mixed in with all of Kagi's requests. Whereas if I went through OpenAI directly, I would of course have an account that I log in with, and I would have 10 different chat histories with ChatGPT.

So I think it's great, but just throwing it out there that if you do use this service and you choose one of these proprietary models, you're [00:31:00] still sending some data back to them. So a word of caution there.

Jordan Bravo: That's a great point. There's a spectrum here, right? Using OpenAI or Anthropic directly is probably the least private, on one end of the spectrum. If you use Kagi, then Kagi is acting kind of like a proxy between you and Anthropic or OpenAI. That would be a little bit more private because, like you said, it's sort of like a VPN provider, where the VPN provider has a whole bunch of users and all of their traffic is mixed together and anonymized.

And in the same way here, all of Kagi's users going to Anthropic, let's say, all of that traffic is gonna look like it's coming from Kagi rather than from you as an individual, and so your searches and chat history are not gonna be tied to your individual account. But like Stephen was saying, the things you send [00:32:00] are still gonna be on Anthropic's servers, on OpenAI's servers, because by necessity they have to be processed there.
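
The proxy effect described here can be illustrated with a toy model. The field names and the key are made up for illustration; the point is just that the upstream provider sees a single identity for all of the forwarded traffic:

```python
# Toy illustration of the "proxy mixes users" point: the upstream provider
# only sees the proxy's identity, not the individual users behind it.
requests_from_users = [
    {"user": "alice", "prompt": "best JS chart library?"},
    {"user": "bob",   "prompt": "explain monads"},
]

# The proxy strips user identity and forwards everything under its own
# (hypothetical) API key:
forwarded = [
    {"api_key": "proxy-api-key", "prompt": r["prompt"]}
    for r in requests_from_users
]

upstream_sees = {f["api_key"] for f in forwarded}
print(upstream_sees)  # {'proxy-api-key'}
```

But note that the prompts themselves pass through unchanged, which is exactly why the dox-yourself warning below still applies.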

Stephen DeLorme: Yeah, and depending on what your privacy threat model is, you just have to be careful that you don't dox yourself with your question. If you're asking something like, "What is a good JavaScript library for doing XYZ thing?", well, that might be a little more innocuous.

You know, there's a lot of JavaScript developers out there in the world, right? But if you ask something like, "What are good restaurant options near 123 Main Street, Townsville, USA?", well, you've just given out your address right there. So you've potentially identified yourself.

Jordan Bravo: Yep, exactly. And so let's keep moving along that spectrum from least sovereign to more sovereign computing. The next one after Kagi would be something like Venice, [00:33:00] which we just talked about: they're running open source models on their own servers, and they're purposely not logging your history or doing data collection on it, but in the end you're still having to send data to their server. Slightly further along the sovereign computing spectrum would be Maple, which we talked about first, because they're doing the same thing, you have to send data to them, but they're using confidential computing hardware and software to run it in a way that minimizes the trust you have to place in them.

Then let's talk about moving even further along that spectrum to more private and sovereign computing with LLMs, and that's running models locally or self-hosting. I say those as two different things because you can run a model locally, where your client lives. So for example, you could just run it on your laptop, where your laptop [00:34:00] has both the interface that you type into and the model itself. But if you want to go for a more advanced setup, you could self-host the model on a server and then connect to it from any client: your laptop, your mobile device, et cetera.

So the first thing we're gonna look at is called Ollama. Ollama is a program that you can download on Mac, Windows, or Linux, and once it's installed, it's got an interface where you can select any open source model. And Stephen, go ahead and click on the models. Let's take a look at all the options that they have. So they've got things like DeepSeek, Llama, Mistral, and these are all the well-known large language models that are [00:35:00] fully available, free and open source.

Once you have Ollama installed, you can search through and download any of these models, and it's gonna run completely locally. I believe it's just a CLI when you first download it, but we're gonna show in a moment another thing you can run that will give you a GUI in your browser, the same way you would expect from ChatGPT or one of these other options.

Stephen DeLorme: And I'll go ahead and say that even the CLI was, I thought, pretty easy to use. Having said that, I'm a little more comfortable with CLIs than the average person, but as far as CLIs go, this one was pretty easy, I thought. When it comes to downloading models, there's a popular website you can go to called Hugging Face. And Hugging Face is, I don't know how to describe it. Like

Jordan Bravo: They're, they're kind of like the GitHub for open source models.

Stephen DeLorme: That's exactly what I was gonna say. It's like the GitHub for [00:36:00] open source models. The problem is that whenever I go and find a model that I want to use, I have no idea how to start using it, because I'm just not one of these crack machine learning engineers. They're always like, "Oh, here's instructions on how to use this model," and then they give you a bunch of, I don't know, Python commands. Well, this one's in Chinese because it's DeepSeek, but there's no consistent way to just install all these models, at least that I've found. I found it very, very confusing. I'm used to the web development world, where everything is run through a package manager like NPM or, back in the day, Composer. I just don't know how to work with any of the instructions on Hugging Face.

So Ollama to me is just like a dream, because once you've installed it on your computer, you can just open up your terminal on Linux or Mac and type ollama list, and it'll list out all the models you have installed. And you can say [00:37:00] ollama pull and the name of the model, and it'll download it. Then you just type ollama run and the model name, and it starts up a chat, and the chat is entirely local to your computer. So yeah, I thought the whole thing was just a dream to use.

Ollama made it very, very quick and easy for me to get into running models locally, where Hugging Face was an obstacle to someone of my skill set.
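For listeners who want to try what Stephen is describing, the whole workflow is just a few terminal commands. A sketch, assuming Ollama is installed and using llama3.2 as an example model name; it skips itself gracefully if Ollama isn't present:

```shell
# Sketch of the Ollama CLI workflow; does nothing if Ollama isn't installed.
if ! command -v ollama >/dev/null 2>&1; then
  echo "ollama not installed; see https://ollama.com for installers"
else
  ollama list                      # list the models you have downloaded
  ollama pull llama3.2             # download a model from the Ollama registry
  ollama run llama3.2 "Say hello"  # one-shot prompt; omit the prompt text for an interactive chat
fi
```

Models you pull this way are stored and run entirely on your own disk and hardware.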

Jordan Bravo: You said something that I think is key and worth repeating, which is that this runs completely locally on your machine. So if you turned on airplane mode, or just disconnected from the internet completely, you could still run this and get results. This is not living on any other server, and it's not reaching out to any other computer.

Stephen DeLorme: That's actually a fun exercise: pair programming with an AI in a local environment on an airplane.

Jordan Bravo: Exactly. So maybe you're on an airplane with your laptop, and for whatever reason they [00:38:00] don't have internet you can connect to. You could just use a local LLM like that. Now, as Stephen mentioned, you have to interact with this via the command line. And for somebody who doesn't wanna do that, there's a great option called OpenWebUI. You can think of this as the graphical counterpart to Ollama. You run it, it runs in a browser, and it talks to Ollama on the backend. So in the same way, it's all running locally, but you're getting that web browser GUI you would expect from one of these other offerings.

The cool thing about this is that you can decouple the front end from the back end. So if you wanted to run Ollama on your server, which is always on, you could then connect to it from any device remotely, from your laptop or [00:39:00] mobile. And boom, now you have a fully mobile, totally private, self-hosted LLM experience.
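As a deployment sketch of that decoupled setup: OpenWebUI's documented Docker install accepts an OLLAMA_BASE_URL environment variable pointing at wherever Ollama is running. The hostname below is a placeholder, and this is a config fragment to adapt, not to run blindly:

```shell
# Sketch: OpenWebUI in Docker, pointed at an Ollama backend on another machine.
# Ollama's API listens on port 11434 by default; replace my-home-server with your host.
docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://my-home-server:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
# Then browse to http://localhost:3000 from any device that can reach this machine.
```

With Ollama on an always-on server, any laptop or phone that can reach that UI gets the self-hosted chat experience Jordan describes.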

Stephen DeLorme: I somehow found myself on the OpenWebUI homepage and, I mean, this looks like a really great UI. I'm seeing the section here that says top models, and number one is Based Dolphin Mixtral, a Dolphin model with a special system prompt. That's fine. Number two is Codewriter.

That's fine. Number six, Sarah, a loving and caring girlfriend. Uh, a loving and caring girlfriend. She will do dot, dot, dot. I'm just like, is this the end times, Jordan?

Jordan Bravo: Oh yeah, I saw an article that said if you have an AI girlfriend and it's not local only, then your AI girlfriend is cheating on you, which is kind of true.

Stephen DeLorme: Yeah, that's so true. But also, that's a funny joke. But oh my God, we are the end times.

Jordan Bravo: Yeah.

Stephen DeLorme: Back to the topic. Yes, this looks really [00:40:00] cool. I mean, you know, they have a, a fun looking little website. Um. They have a white paper. Everybody's got a white paper these days. Um, but, uh, you know, that's boring.

I'm not gonna go through that. But yeah, the screenshots of this look really nice. I remember you demoing this to me several months ago, and it looked good, and it looks like it has grown as a project since then. It's a really simple looking interface that reminds me of ChatGPT. It doesn't present any challenging user experience. So it looks cool. And again, this gets into web hosting and all of that kind of stuff, so I think to use this you kind of need to be a little bit comfortable deploying something. That being said, you could probably deploy this to something like Netlify or Vercel in a somewhat automated way that wouldn't really require you to manage a server directly. But if you do have the [00:41:00] capabilities to self-host both Ollama and this UI, either at your home or on a server, you could provide a local chatbot for your friends and family, something like that.

Jordan Bravo: And I know that StartOS by Start9 has a one-click install for, I believe it's called Free GPT. But anyway, if you just look in the StartOS marketplace, they have an LLM you can run locally on your Start9 server.

Stephen DeLorme: Very cool.

Jordan Bravo: The next one I wanna talk about for running a local model is GPT4All. And this one is similar to Ollama in that you download the app for Mac, Windows, or Linux and then you can choose which models you wanna run. I think where this one differs a little bit, and I could be wrong, is that it comes with a graphical [00:42:00] interface.

Do you know if that's correct?

Stephen DeLorme: You know. I'm actually not familiar with this one. I'm really not. I have heard the name Nomic before. It's like, the company logo that's on here, but I don't really know much about them. So yeah, I'm not familiar with this one.

Jordan Bravo: Okay. Well, for the audience, let's just say that this one is worth looking at. Um, ah, okay. It looks like

Stephen DeLorme: it does come with a quickstart page. Mm-hmm.

It says install GPT4All for your operating system and open the application. You can download it for Windows, Mac, or Linux. And then, yeah, it's got this nice little UI. It's light mode with some green garnishes, and there's a tab where I can manage all my chats.

And then there's another tab where I can choose my models and all that. I'm really kind of curious what the LocalDocs thing is, but it looks cool.

Jordan Bravo: Yeah, so this is another great option. I've been meaning to try this one [00:43:00] out, but it looks great.

The next one we're gonna talk about for local LLMs is LM Studio, Language Model Studio. This one also has its own graphical interface that you can download for Mac, Windows, and Linux, and you can get started right out of the box with it.

Um, Stephen, do you have any experience with this one?

Stephen DeLorme: Yeah, I've run this one before. It's pretty handy, because the downloading of the models and the running of the models is all bundled into one interface, which makes it pretty accessible and easy if you want to experiment with running these models. Here's a word of caution, not just for LM Studio but really for Ollama and all these projects: the easier it is to download the models, the more you just want to experiment and run them, and then you start to [00:44:00] max out your hard drive space pretty quickly. So be careful with that; make sure you have plenty of room on your hard drive before you click download, 'cause it's easy to get trigger happy.

Some of these things will be miniature models that are like two gigabytes, and other ones will be like 70 gigabytes. At one point I was trying to download, like, the most hardcore uber DeepSeek model, and it ended up being like a 70 gigabyte download, and I had enough room on my hard drive, and then I was like, "Wait a minute, this computer only has 32GB of RAM. I don't even think there's any way I can actually run this model, 'cause there's not enough room in RAM for this entire model." I think that's how that stuff works. So yeah, these tools are really great and they make it way easier to download, but you also gotta make sure you have the system resources to back it up.
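The rough arithmetic behind Stephen's "wait a minute" moment: to run a model, its weights have to fit in RAM (or VRAM), and the weight size is roughly the parameter count times the bytes per weight, about 2 bytes at FP16 and about 0.5 bytes with 4-bit quantization, plus some overhead for context. A quick back-of-the-envelope check:

```shell
# Estimate weight memory for a 70-billion-parameter model.
# Rule of thumb: memory ≈ parameters × bytes per weight (ignoring context overhead).
params_billions=70
awk -v p="$params_billions" 'BEGIN {
  printf "FP16 (2 bytes/weight):    ~%.0f GB\n", p * 2    # ~140 GB
  printf "4-bit (0.5 bytes/weight): ~%.0f GB\n", p * 0.5  # ~35 GB
}'
```

Either way, that's far more than a 32GB machine can hold, which is why the download fit on disk but the model couldn't actually run.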

But yeah, I can vouch for LM Studio. It's a pretty cool project, and it's easy to use.

Jordan Bravo: I guess that's a good time to point out the downside of running [00:45:00] your LLMs locally, which is, let me say it this way: your performance and your user experience are gonna be very dependent on your hardware. When you're accessing it on some other company's servers, they're running absurdly expensive hardware to be able to process all of these huge, high-parameter LLMs, which is gonna give you the most accurate results.

But when you're running them locally, typically people are running on a laptop or desktop, and unless you have some seriously powerful dedicated GPUs, you're not gonna see great performance. So just be aware that you're gonna see some slowdown if you try to run these really large models. However, as you saw when we were scrolling through the Ollama models, there are much smaller models that are specifically designed to run on consumer hardware. And so you might get, let's say, [00:46:00] 90% of the effectiveness of those larger models with much lower resource requirements. So give those a try and play around with the right balance of model size, accuracy, and performance, and I think you'll find a good middle ground.

Stephen DeLorme: Yep.

Jordan Bravo: Okay, one last thing we want to discuss. We already covered local LLMs for chat-based interfaces, but we wanna talk about image generation. We mentioned before that Venice.AI can do it, and then you have these more proprietary offerings such as, again, I'm blanking on the main one.

Stephen, if you could help me out there.

Stephen DeLorme: Oh, like for image generation? Yeah. I mean, you've got like, MidJourney

Jordan Bravo: Mm-hmm.

Stephen DeLorme: You know, scenario and all those kinds of companies,

Jordan Bravo: Yes, exactly. But you might've heard about Stable Diffusion being an open source model for image generation. Well, there's a popular tool called Stable Diffusion Web UI [00:47:00] that you can host, and it allows you to interact with your locally hosted Stable Diffusion model in the same way we talked about OpenWebUI interacting with your local LLM. This will give you that same experience of being able to type into a web UI, telling the model what kind of image you wanna generate, and it'll spit out an image for you. And so for those of you watching, we have a screenshot here on the screen, and it's showing how you can tweak the different parameters and inputs, and it'll give you different images based on that.
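For the curious, the Stable Diffusion Web UI discussed here is, assuming they mean the widely used AUTOMATIC1111 project, typically a clone-and-run setup. A sketch of the install steps (treat this as a setup fragment, not something to run blindly; the first launch downloads several gigabytes of dependencies, and you still need a model checkpoint):

```shell
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
./webui.sh    # webui-user.bat on Windows; serves the UI at http://127.0.0.1:7860
```

Once it's up, the browser interface exposes the prompt box and the sampling parameters visible in the screenshot.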

Stephen DeLorme: Another one maybe worth highlighting here is ComfyUI. We actually had a gentleman give a demo of ComfyUI at one of the AI meetups at ATL BitLab, and it's still going through post-production, [00:48:00] but that should be on the ATL BitLab YouTube. Depending on what your needs are, you might actually want to look at this.

Like if you're more of an artist or designer who's used to working in tools like Adobe Photoshop and After Effects, and Blender, Cinema 4D, all that kind of stuff, this might be a tool you'd want to use. Because, well, it says right here: customize your workflow with custom nodes. So it actually has an interface that's more familiar to the graphics software you might already know.

So it does let you download models and run them locally. And then you can create nodes inside of your interface, which is somewhat similar to how you might structure stuff in Blender or DaVinci Resolve or one of those kinds of programs. I think even C4D has a node editor now. And so you make up different nodes for, like, run it through this model, then apply these tweaks and customizations to it, and all of that. So it's kind of trying [00:49:00] to bridge the world of the traditional designer, compositor, and effects artist software with the generative art world. But if you're more on the developer side of things, I think the Stable Diffusion Web UI is also a great fit, because it exposes, in one interface, all of the different numeric parameters that make the model function under the hood.

Jordan Bravo: I don't wanna get too far off topic, but for someone like me who is not experienced in, graphical editing and that kind of thing, what does a node refer to in this context?

Stephen DeLorme: Well, it'd probably be easiest if I just pulled up an image real quick. Let's see, where should I go? How do I get to Brave Search? I'll just pull up an example of, like, a Blender [00:50:00] node editor, maybe. So an example here: we have this image that I found, oh, it just went away. So I have this image up on screen of a Blender shader editor. And basically what we're trying to do, in this case, is the person is trying to make this cool, like, gold texture covered in fungus or whatever to put on this statue here.

And to do that, they're starting here with some kind of texture, and then they're running it through this node right here, which changes the color to be more orange or gold. And then they're running that into this shader thing that applies things like glossiness and reflectivity, or the lack thereof. So it's kind of a way of structuring the data in this step-by-step flow that's very repeatable. Yeah, we're getting into complicated graphic arts [00:51:00] territory here, but it's basically a common standard: a lot of advanced graphics applications have this node-based editing system for getting what you want, where you start with inputs on the left-hand side, they go through all the different nodes, and then you end up with a nice looking output on the right-hand side.

Jordan Bravo: And for people listening, this image looks kind of like a diagram or a flow chart with, with different rectangles all tied together in various ways.

All right. Well, thanks for that, Stephen. That's all we have to cover on LLMs and hosting them locally. If you would like to know more about this topic or anything else AI-related, ATL BitLab hosts a monthly AI meetup. Stephen, is there anything you'd like to say about that?

Stephen DeLorme: Yeah, it's on the second Monday of every month as of the date of this recording, but always check the ATL BitLab website or Meetup for the most current details. It's [00:52:00] definitely one of our hottest meetups right now. It's kind of a grab bag of different stuff every month, because there's so much excitement about AI that so many people in the Atlanta area have projects they're hacking on, and so it ends up being really fun. Sometimes we'll discuss a paper that just came out. Other times we'll have people demoing projects: one time a guy demoed machine learning by training a pterodactyl skeleton. He also had a car driving around on the floor that was powered by an LLM, and we were typing in commands and telling it to hunt for a particular object in ATL BitLab. Another time somebody demoed how to jailbreak Claude. So yeah, it's pretty fascinating, and you never know what you're gonna get, but it's always a good time.

Jordan Bravo: You can find out more about that at atlbitlab.com. Remember, you can also email us at sovereign@atlbitlab.com, and look for The Sovereign [00:53:00] Computing Show when you search for ATL BitLab on fountain.fm or any other podcast player. Anything else you wanted to talk about, Stephen?

Stephen DeLorme: I think that's about it. I might have some errata that, uh, I think of, uh, over the weekend for, uh, next time. But, I think I'm good for now.

Jordan Bravo: All right. Let us know what you think of the topic. Do you host your own LLMs? Do you use any of these private providers? Are you using the non-private providers? We would love to hear about it and discuss it more. Thanks everybody, and we'll see you next time.

Stephen DeLorme: Catch you later!

Hey, thanks for listening. I hope you enjoyed this episode. If you want to learn more about anything that we discussed, you can look for links in the show notes that should be in your podcast player, or you can go to atlbitlab.com/podcast. On a final note, if you found this information useful and you want to help support us, [00:54:00] you can always send us a tip in Bitcoin.

Your support really helps us so that we can keep bringing you content like this. All right. Catch you later.