The TypeScript x AI organizers and speakers: Luca Becker, Carl Assmann, Luisa Peter, Lucas L. Treffenstädt, Benjamin Behringer, Alexander Opalic, Johannes Loher
Talk Abstract
Nearly every AI-enabled product in 2025 has a summarizing function, but they all rely on third-party services to produce the summary. This raises major privacy concerns and is prohibitive for many of us in the community. We present a small React/TypeScript app that lets you summarize even longer texts on your local machine, or anywhere else, with a Large Language Model of your choosing. We motivate the app and show it in a live demo with local and remote LLMs. We discuss different approaches to prompting and compare the quality of summaries across various LLMs. Finally, we walk through the implementation and dive into the most interesting parts of the code.
LLMs are huge, slow to run, and require terabytes of RAM on the newest graphics cards, right? Well, maybe not. In this article we compare the performance of two popular current LLM families, llama 3.x and deepseek-r1, on a variety of consumer hardware, from laptop GPUs to dual Nvidia 5090s. We test quantized and full-precision models and show which ones fit into the memory of your graphics card. We also test Apple's M-series chips and explore what can be achieved with older but cheaper server hardware.
Test Setup
We test two of the currently popular LLM families, llama 3 and deepseek-r1, in quantized and 16-bit floating-point (fp16) versions to gauge their performance on consumer hardware. We run the models with ollama 0.6.5 and open-webui as the frontend, and record the response-token measurements after executing the following query:
I need a summary of the book “War and Peace”. Please write at least 500 words.
For each test, the model fits into the memory of the graphics card (VRAM). Mobile devices are allowed to cool down before each test run.
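If you want to reproduce such a measurement without a frontend, ollama's REST API reports the generated token count and generation time directly. The following is a minimal sketch, assuming a local ollama instance on the default port; the eval_count and eval_duration fields come from ollama's /api/generate response.

import requests

PROMPT = ('I need a summary of the book "War and Peace". '
          'Please write at least 500 words.')

def tokens_per_second(model: str) -> float:
    # /api/generate returns eval_count (number of generated tokens)
    # and eval_duration (generation time in nanoseconds)
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=600,
    )
    response.raise_for_status()
    data = response.json()
    return data["eval_count"] / (data["eval_duration"] / 1e9)

print(f"llama3.2:1b: {tokens_per_second('llama3.2:1b'):.1f} tokens/s")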
Tested Models
In this article we focus on two of the currently popular LLMs:
Llama 3, developed by Meta, represents the latest evolution in open-source large language models, building upon its predecessors with enhanced capabilities, improved efficiency, and greater contextual understanding. It comes in various sizes, ranging from smaller, more efficient models suitable for edge computing and personal devices, to larger, more powerful variants designed for complex tasks and extensive data processing. Its versatility makes it well suited for applications such as chatbots, content generation, coding assistance, and research. For the AI community, Llama 3 marks a significant step toward democratizing advanced AI technology, fostering innovation, and enabling broader access to sophisticated AI tools and research opportunities.
DeepSeek-R1, developed by DeepSeek AI, is a cutting-edge large language model designed to excel at complex reasoning tasks such as mathematics, coding, and technical problem solving. It comes in multiple variants, including a family of smaller distilled models based on Qwen and Llama, making it accessible on a wide range of hardware. Its primary uses include step-by-step reasoning, code generation, debugging, and detailed technical explanations. For the AI community, DeepSeek-R1 represents a significant advancement in open reasoning models, and its distilled variants contribute to the broader adoption of AI-driven solutions on modest hardware.
Model               Variant        Precision   VRAM Size
llama3.2:1b         instruct       Q8_0        2.7GB
llama3.2:3b         instruct       Q4_K_M      4GB
llama3.1:8b         instruct       Q4_K_M      6.9GB
llama3.3:70b        instruct       Q4_K_M      49GB
llama3.2:1b         instruct       fp16        3.9GB
llama3.2:3b         instruct       fp16        8.5GB
llama3.1:8b         instruct       fp16        17GB
deepseek-r1:7b      qwen distill   Q4_K_M      6GB
deepseek-r1:8b      llama distill  Q4_K_M      6.9GB
deepseek-r1:14b     qwen distill   Q4_K_M      11GB
deepseek-r1:32b     qwen distill   Q4_K_M      25GB
deepseek-r1:70b     qwen distill   Q4_K_M      49GB
deepseek-r1:1.5b    qwen distill   fp16        4.2GB
deepseek-r1:7b      qwen distill   fp16        16GB
deepseek-r1:14b     qwen distill   fp16        32GB
Table 1: Tested LLMs
Table 1 shows the tested LLMs. Ollama uses quantized models by default; if you specify a more detailed tag, you can choose different quantizations as well as the non-quantized floating-point versions. We selected llama 3.x and deepseek-r1 for their popularity and started with the smallest versions parameter-wise, then moved to models with higher parameter counts until they would no longer fit into any of the test systems' VRAM. The last column in the table shows the size of each model when loaded into VRAM, as reported by ollama ps. We can already see here that not all models will fit into the VRAM of the test systems from Table 2.
In our earlier post “Will AI Kill The Blogosphere?” we discussed the impact of AI on content creation on the internet and how to react to this change. Now we want to find out what the AI itself thinks of the subject and of our previous article. We created a rebuttal using the following prompt:
Read the given article and write a response. Evaluate the premise and conclusion in detail. Draw your own conclusion on the subject. Be honest and don't hold back. Try to provide value for the reader.
The following is Grok 3's answer. Enjoy!
Response to the Article: AI’s Impact on Content Creation and Knowledge Sharing
The article “The Impact of AI on Content Creation and Knowledge Sharing” explores how artificial intelligence is reshaping the landscape of content creation and the dissemination of knowledge. Below, I’ll evaluate the article’s premise and conclusion in detail, offer my own perspective, and provide actionable insights for readers navigating this evolving terrain. Let’s dive in.
There is a whole business segment out there of people making money writing blogs. Either they earn money directly from ads, affiliate links, subscriptions, or donations, or they grow the blog to a certain viewership and then sell it to someone who needs a platform to promote their product. There are marketplaces for this, like Flippa and Motion Invest, to name two, and YouTube channels like Income School that teach you how to do it. And although the multiples for selling such a blog are relatively low at around 2-3, if you can grow the blog fast enough, and maybe run more than one at a time, it is still a profitable business.
At the same time, there are tens of thousands of users on Stack Overflow, answering the millions of new questions per year for free. These people are not getting paid but receive a different kind of reward: recognition. Apparently, a strong enough motivator.
But with the advent of AI models and their liberal use of copyrighted material, or just plain piracy, both kinds of incentives are threatened: if, on the one hand, AI scrapes your content so people no longer need to visit your page to generate clicks or recognize your name, and, on the other hand, AI can directly create the kind of content you provide without needing your input, then why put in the time and effort to create content, especially when you depend on being able to sell it for money? Consequently, there are various signs that content creation on Stack Overflow is dropping (1, 2, 3), and blog sales are contracting as well.
So you want to follow the hype and generate some images with Stability AI's shiny new Stable Diffusion 3.5 model (SD 3.5). You find the model on Hugging Face, and hey, there is a code example to try it out. And it works!?
Absolutely not!
Inconveniently, there are a lot more steps to take and considerations to make before your Python script will generate an AI image, especially on consumer hardware with little VRAM. This article shows how to really do it, even on a laptop GPU with only 6GB of VRAM. As such, it is an adapted collection of material available elsewhere on the web.
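As a preview of the general approach: with little VRAM, the key ideas are to quantize the large transformer and to offload idle pipeline components to system RAM. Here is a minimal sketch using the diffusers library; treat the model variant, parameters, and prompt as illustrative assumptions rather than the exact recipe.

import torch
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel, StableDiffusion3Pipeline

# Assumes you have accepted the model license on Hugging Face and are logged in
model_id = "stabilityai/stable-diffusion-3.5-medium"

# Load the transformer, the largest component, in 4-bit NF4 precision
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = SD3Transformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)

pipe = StableDiffusion3Pipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # keep only the active component in VRAM

image = pipe(
    "a lighthouse on a cliff at dusk, oil painting",
    num_inference_steps=28,
    guidance_scale=4.5,
).images[0]
image.save("lighthouse.png")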
Docker Desktop just got pricier again. Let's explore some ways to replace at least part of its functionality, like running Docker containers and doing networking. This guide is for the Windows operating system, as that is where users are most likely to use Docker Desktop.
We will use the Hyper-V virtualization solution already present on Windows and show how to integrate your Docker Desktop replacement into your environment.
I bought myself a Tesla Model 3 in August of 2023. I did it mostly because I was bored, but also to take advantage of the government subsidies available at the time. Hey, if the government wants to spend my taxes, they should spend them on me, right? Right off the bat: the car is good. It has its flaws, like most cars, but also a lot of nice features to make up for them. Some quirks also arise from the new EV technology itself, and neither Tesla nor the car can do anything to fix them. But there are strong opinions on EVs from both supporters and opponents of the technology. Here I want to share my observations on some of the most common prejudices, along with some learnings of my own.
So they sent you an Excel sheet with a bunch of contacts, and two days later your call history looks like you are taking part in a code-deciphering challenge? Maybe you should have converted them to contacts in your phone? OK, so you tried, but that didn't work: Apple Addressbook fails silently, you cannot trust web services with customer data, and after paying for some apps on the internet you discover that some of them cannot even open Excel sheets without failing. And of course no one will type hundreds of contacts into their phone manually.
But there is another way: just automate it yourself. It's remarkably simple using Python. In this article we demonstrate how to read an address list from an Excel sheet in xlsx format (the Excel standard since 2007) and output vCard 3.0 files that one can import into their phone or email/PIM app of choice.
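To make this concrete right away, here is a minimal sketch of the approach using openpyxl. The column layout (last name, first name, phone, email) is an assumption; adapt the unpacking to your sheet.

from openpyxl import load_workbook

# Assumed file name and column order: last name, first name, phone, email
wb = load_workbook("contacts.xlsx", read_only=True)
ws = wb.active

# Skip the header row, then write one vCard 3.0 file per contact
for i, (last, first, phone, email) in enumerate(
        ws.iter_rows(min_row=2, values_only=True), start=1):
    vcard = "\r\n".join([
        "BEGIN:VCARD",
        "VERSION:3.0",
        f"N:{last};{first};;;",
        f"FN:{first} {last}",
        f"TEL;TYPE=CELL:{phone}",
        f"EMAIL;TYPE=INTERNET:{email}",
        "END:VCARD",
        "",
    ])
    with open(f"contact_{i:03d}.vcf", "w", encoding="utf-8") as f:
        f.write(vcard)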
Large Language Models (LLMs) like GPT-4 are trained on vast datasets that include man pages, readmes, forum questions and discussions, source code, and other sources of command-line tool documentation. Given a set of requirements, one can query an LLM to predict a command line that will perform the required task.
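Conceptually, the core of such a tool is tiny. Here is a minimal sketch of the idea using the openai Python package; the model name and prompts are illustrative assumptions, not please-cli's actual implementation.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def suggest_command(requirement: str) -> str:
    """Ask a chat model to turn a plain-English requirement
    into a single shell command."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Reply with a single Linux shell command that "
                        "fulfills the user's request. Output only the command."},
            {"role": "user", "content": requirement},
        ],
    )
    return response.choices[0].message.content.strip()

print(suggest_command("convert a.jpeg to avif and upscale it to 200%"))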
please-cli is a wrapper around GPT-4 that can help you translate your requirements into a shell command. Let’s start with an example:
benjamin@asterix:~# please convert a.jpeg to avif and upscale it to 200%
💡 Command: convert a.jpeg -resize 200% a.avif
❗ What should I do? [use arrow keys or initials to navigate]
> [I] Invoke
  [C] Copy to clipboard
  [Q] Ask a question
  [A] Abort
Well, this looks promising, and the command actually works. please-cli also gives you some handy shortcuts to immediately invoke or copy the command, and you can ask questions about it directly. In the following sections we will look at some other examples and see whether we can find limitations of the script generation.
For weeks now I had been getting a sync exception from my DAVx app, indicating a problem when syncing my CalDAV calendar from SOGo. As Thunderbird was not affected, I managed to ignore the problem for quite some time. But then I noticed that some newer appointments were no longer being synced. So I had to investigate.
DAVx allows you to export debug information, where the error is shown as:
SYNCHRONIZATION INFO
Account: Account {name=benjamin@example.com, type=bitfire.at.davdroid}
Authority: com.android.calendar

EXCEPTION
at.bitfire.dav4jvm.exception.DavException: Received multi-get response without calendar data
        at at.bitfire.davdroid.syncadapter.CalendarSyncManager$downloadRemote$1$1$onResponse$1.invoke(CalendarSyncManager.kt:9)
        at at.bitfire.davdroid.syncadapter.CalendarSyncManager$downloadRemote$1$1$onResponse$1.invoke(CalendarSyncManager.kt:1)
        at at.bitfire.davdroid.syncadapter.SyncManager.responseExceptionContext(SyncManager.kt:13)
        at at.bitfire.davdroid.syncadapter.CalendarSyncManager$downloadRemote$1$1.onResponse(CalendarSyncManager.kt:18)
        at at.bitfire.dav4jvm.Response$Companion.parse(Response.kt:308)
        ...
A few lines later the log file specifies the remote source as:
I tried to delete the calendar from my device and set it up again, but to no avail. After a bit of digging, I got the impression that something was wrong with one particular calendar entry. Unfortunately, the log file doesn't say anything about the entry other than the link, so I can't look at it in Thunderbird or the SOGo web interface. But since I am running my own SOGo instance, I am able to check the SOGo database entry directly.
SOGo organizes data in a series of tables named sogo${username}${hash}. The c_name column of each row corresponds to the event name from the URL, so we can look for the event and its content in the tables:
MariaDB [sogo]> select c_content from sogobenjamin0010de46299 where c_name like '%2930%';
+-----------+
| c_content |
+-----------+
|           |
+-----------+
1 row in set (0.005 sec)
Apparently the event just has an empty c_content field, which is what threw off DAVx. As I don't know anything else about the event and there is nothing to restore, I just deleted the entry from the table.
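A statement along these lines does the job; the table name and c_name pattern are taken from the select query above:

MariaDB [sogo]> delete from sogobenjamin0010de46299 where c_name like '%2930%';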