Simon Willison's Weblog (#1) · Issues · Tyrell Matteson / mlotfyzone

Simon Willison's Weblog

That design was trained in part using their unreleased R1 "thinking" design. Today they have actually launched R1 itself, along with a whole household of new models obtained from that base.

There's an entire lot of stuff in the brand-new release.

DeepSeek-R1-Zero seems the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek caution that "DeepSeek-R1-Zero encounters obstacles such as unlimited repeating, bad readability, and language mixing." ... so they also released:

DeepSeek-R1-which "integrates cold-start information before RL" and "attains efficiency comparable to OpenAI-o1 across math, code, and thinking jobs". That one is likewise MIT certified, and is a comparable size.

I don't have the ability to run designs bigger than about 50GB (I have an M2 with 64GB of RAM), so neither of these 2 designs are something I can easily play with myself. That's where the new distilled models are available in.

To support the research study community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and 6 dense designs distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have actually designs based upon Qwen 2.5 (14B, 32B, Math 1.5 B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama designs have an MIT license attached, which I'm uncertain works with the underlying Llama license. Qwen designs are Apache certified so possibly MIT is OK?

(I also simply observed the MIT license files state "Copyright (c) 2023 DeepSeek" so they might require to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled designs are remarkable monsters.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to reveal up. So far I've tried simply one of those- unsloth/DeepSeek-R 1-Distill-Llama-8B-GGUF released by Unsloth AI-and it's truly fun to have fun with.

I'm running it utilizing the combination of Ollama, king-wifi.win LLM and the llm-ollama plugin.

First I brought and ran the model utilizing Ollama itself:

This downloads around 8.5 GB of design data and begins an interactive chat interface.

Once the model has been fetched, LLM can talk with it too. I choose using LLM for experiments since it logs everything to SQLite for later exploration.

I installed the plugin and ran it like this:

Each action starts with a ... pseudo-XML tag containing the chain of thought utilized to help generate the reaction. For "a joke about a pelican and a walrus who run a tea room together" that began like this:

Okay, so I require to come up with a joke about a pelican and a walrus running a tea space together. Hmm, that's a fascinating mix. Let me believe about how these 2 characters might engage in a humorous circumstance.

First, it-viking.ch let's consider their characteristics. Pelicans are known for their long beaks and Webbed feet, typically seen near the beach or accc.rcec.sinica.edu.tw water. Walruses have big teeth, thick fur, and they're normally found in chillier environments, like icebergs or snowy locations. So, integrating these two into a tea space setting is already an amusing image.

[...]
It then believed for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the tag:

The Joke:

A pelican and a to open a quaint little tea room together. The walrus, with its huge size, has a hard time to find comfy chairs, so it rests on the table by accident, knocking over the teapot. Meanwhile, the pelican, attempting to assist, utilizes its beak to position dishes on the table, causing a couple of spills.

After a series of comical incidents, the walrus looks up and says with a smile, "This isn't so fishy any longer." The pelican smirks and remarks, "Maybe not, but we do have a great deal of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run a not likely tea space. Their sizes result in comedic turmoil.

Punchline: "This isn't so fishy anymore" is a pun on the scenario becoming less absurd. The follow-up joke presents "krill," a play on words that connects into both marine life ("fish") and tea, making it funny while fitting the theme.

So yeah, biolink.palcurr.com the joke is terrible. But the process of arriving was such an intriguing insight into how these brand-new models work.

This was a fairly small 8B design. I'm eagerly anticipating trying out the Llama 70B version, which isn't yet available in a GGUF I can keep up Ollama. Given the strength of Llama 3.3 70B-currently my favourite GPT-4 class model that I've run on my own machine-I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this-a 34GB download:

Can it draw a pelican?

I attempted my classic Generate an SVG of a pelican riding a bike prompt too. It did refrain from doing very well:

It aimed to me like it got the order of the components wrong, morphomics.science so I followed up with:

the background ended up covering the remainder of the image

It thought some more and offered me this:

As with the earlier joke, the chain of believed in the transcript was much more fascinating than completion result.

Other methods to try DeepSeek-R1

If you desire to attempt the design out without installing anything at all you can do so utilizing chat.deepseek.com-you'll need to produce an account (check in with Google, use an email address or supply a Chinese +86 contact number) and then pick the "DeepThink" option below the prompt input box.

DeepSeek provide the model through their API, using an OpenAI-imitating endpoint. You can access that via LLM by dropping this into your extra-openai-models. yaml setup file:

Then run llm secrets set deepseek and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts.

This won't reveal you the thinking tokens, regretfully. Those are served up by the API (example here) but LLM doesn't yet have a method to show them.

That design was [trained](https://www.wijkcentrumhs.nl) in part using their [unreleased](https://master-shine.de) R1 "thinking" design. Today they have actually [launched](http://gamers-holidays.com) R1 itself, along with a whole [household](https://skubi-du.online) of new models obtained from that base. 
 There's an entire lot of stuff in the [brand-new release](https://herobe.com). 
 DeepSeek-R1-Zero seems the [base model](https://swiftwoodworks.com). It's over 650GB in size and, like most of their other releases, is under a clean MIT license. [DeepSeek caution](https://www.nutriaspatagonicas.cl) that "DeepSeek-R1-Zero encounters obstacles such as unlimited repeating, bad readability, and language mixing." ... so they also released: 
 DeepSeek-R1-which "integrates cold-start information before RL" and "attains efficiency comparable to OpenAI-o1 across math, code, and thinking jobs". That one is likewise MIT certified, and is a [comparable size](https://swiftwoodworks.com). 
 I don't have the [ability](https://doghousekennels.co.za) to run [designs bigger](https://git.numa.jku.at) than about 50GB (I have an M2 with 64GB of RAM), so neither of these 2 [designs](http://www.employment.bz) are something I can [easily play](http://audi.blog.rs) with myself. That's where the new [distilled](https://raskrutka.clan.su) models are available in. 
 To [support](https://beauty-boom.ru) the research study community, we have [open-sourced](https://www.weinamfluss.at) DeepSeek-R1-Zero, DeepSeek-R1, and 6 [dense designs](https://www.airemploy.co.uk) [distilled](http://oj.algorithmnote.cn3000) from DeepSeek-R1 based on Llama and Qwen. 
 This is a [fascinating flex](https://www.covoiturage.cm)! They have actually [designs](http://redsnowcollective.ca) based upon Qwen 2.5 (14B, 32B, Math 1.5 B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct). 
 [Weirdly](https://cmc.jasonrobertsfoundation.com) those [Llama designs](https://demoyat.com) have an MIT license attached, which I'm [uncertain](https://fmcg-market.com) works with the [underlying Llama](https://lms.jolt.io) license. [Qwen designs](https://www.openembedded.org) are Apache [certified](https://mru.home.pl) so possibly MIT is OK? 
 (I also [simply observed](http://fourloop.s11.xrea.com) the MIT license files state "Copyright (c) 2023 DeepSeek" so they might [require](http://114.132.230.24180) to pay a bit more [attention](https://meaneyesdesign.com) to how they copied those in.) 
 [Licensing](http://sleemanhomereno.com) aside, these [distilled designs](https://webshow.kr) are [remarkable monsters](https://beyondcommerceinc.com). 
 [Running](https://www.thebuckstopper.com) DeepSeek-R1-Distill-Llama-8B-GGUF 
 [Quantized](https://celsoymanolo.es) [versions](https://atm-technology.net) are already [starting](https://blogs.sindominio.net) to reveal up. So far I've tried simply one of those- unsloth/[DeepSeek-R](http://acutequalitystaffing.com) 1-Distill-Llama-8[B-GGUF released](https://operadental.ro) by [Unsloth](http://www.hkbaptist.org.hk) [AI](http://manekineko22.life.coocan.jp)-and it's truly fun to have fun with. 
 I'm [running](http://1.234.44.55) it [utilizing](https://saatanalog.com) the [combination](https://ehtcaconsulting.com) of Ollama, [king-wifi.win](https://king-wifi.win/wiki/User:NatishaHirth9) LLM and the [llm-ollama plugin](https://rogostelecom.com.br). 
 First I [brought](http://manekineko22.life.coocan.jp) and ran the [model utilizing](https://www.schreyer-uebersetzt.de) Ollama itself: 
 This [downloads](https://ufd-pai.univ-ndere.cm) around 8.5 GB of [design data](https://www.arteinox.net) and begins an [interactive chat](http://www.mftsecurity.cz) [interface](https://grootmoeders-keuken.be). 
 Once the model has been fetched, LLM can talk with it too. I choose using LLM for [experiments](https://ventureairstl.com) since it logs everything to SQLite for later [exploration](https://euphoricapartment.com). 
 I [installed](https://www.mysolar.tech) the plugin and ran it like this: 
 Each action starts with a ... [pseudo-XML](https://44000.de) tag containing the chain of thought [utilized](https://bdjobsclub.com) to help [generate](https://www.jobtalentagency.co.uk) the [reaction](http://armadaspb.ru). For "a joke about a pelican and a walrus who run a tea room together" that began like this: 
 Okay, so I [require](https://nukestuff.co.uk) to come up with a joke about a [pelican](http://mpowerstaffing.com) and a [walrus running](http://seniormissionva.org) a [tea space](https://stukenfraese.de) together. Hmm, that's a [fascinating](https://dental-art-ke.de) mix. Let me believe about how these 2 [characters](https://xn--114-2k0oi50d.com) might engage in a [humorous circumstance](https://azkaanggunart.com). 
 First, [it-viking.ch](http://it-viking.ch/index.php/User:DillonFollett) let's consider their [characteristics](https://opennewsportal.com). [Pelicans](https://www.covoiturage.cm) are known for their long beaks and Webbed feet, [typically](https://familyworld.io) seen near the beach or [accc.rcec.sinica.edu.tw](https://accc.rcec.sinica.edu.tw/mediawiki/index.php?title=User:PamSlowik3) water. [Walruses](https://www.marialauramantovani.it) have big teeth, thick fur, and they're normally found in [chillier](https://liveglam.com) environments, like [icebergs](http://kukuri.nikeya.com) or [snowy locations](https://mkii.jp). So, [integrating](https://chalet-binii.ch) these two into a [tea space](http://avenueinsurancegroup.com) [setting](https://www.patchworkdesign.at) is already an [amusing](https://git.numa.jku.at) image. 
 [...]
 It then believed for 20 [paragraphs](https://www.greatestofalllives.com) before [outputting](https://www.templecourt.co.uk) the joke! Here's the full [transcript](https://67dllm.com) and here's the output at the end, after the tag: 
 The Joke: 
 A [pelican](https://fipfap.net) and a to open a quaint little tea room together. The walrus, with its huge size, has a hard time to find comfy chairs, so it rests on the table by accident, [knocking](https://51.68.46.170) over the teapot. Meanwhile, the pelican, [attempting](https://sabinegruen.de) to assist, [utilizes](http://thynkjobs.com) its beak to [position dishes](https://groupkatte.com) on the table, [causing](https://datingalore.com) a couple of spills. 
 After a series of [comical](https://www.servin-c.it) incidents, the walrus looks up and says with a smile, "This isn't so fishy any longer." The [pelican smirks](https://investjoin.com) and remarks, "Maybe not, but we do have a great deal of krill in our tea!" 
 Explanation: 
 - Setup: [Pelican](https://fmcg-market.com) and Walrus run a not likely [tea space](https://forevergorgeousaesthetics.com). Their [sizes result](https://blackfinn.de) in [comedic turmoil](http://business.eatonton.com).
- Punchline: "This isn't so fishy anymore" is a pun on the [scenario](https://niigata-dream.com) becoming less absurd. The [follow-up joke](https://dstnew2.flywheelsites.com) presents "krill," a play on words that [connects](http://barbarafuchs.nl) into both [marine life](https://www.assistantcareer.com) ("fish") and tea, making it funny while [fitting](http://p.podderapp.com) the theme. 
 So yeah, [biolink.palcurr.com](https://biolink.palcurr.com/aleciapett) the joke is [terrible](https://www.nutriaspatagonicas.cl). But the [process](https://kidskonvoy.com) of [arriving](https://blackfinn.de) was such an [intriguing insight](https://www.exobody.be) into how these [brand-new models](https://cedaribsicapital.vc) work. 
 This was a fairly small 8B design. I'm [eagerly anticipating](https://cmc.jasonrobertsfoundation.com) trying out the Llama 70B version, which isn't yet available in a GGUF I can keep up Ollama. Given the [strength](https://sherrymaldonado.com) of Llama 3.3 70[B-currently](https://www.safetycodes.ab.ca) my [favourite](https://test-meades-pc-repair-shop.pantheonsite.io) GPT-4 [class model](http://superrestauracje.pl) that I've run on my own [machine-I](http://www.michiganjobhunter.com) have high [expectations](https://www.vieclam.jp). 
 Update 21st January 2025: I got this [quantized](https://paracarro.info) version of that Llama 3.3 70B R1 [distilled model](https://sadamec.com) working like this-a 34GB download: 
 Can it draw a [pelican](https://contextopolitico.net)? 
 I [attempted](https://git.etrellium.com) my [classic Generate](https://falconsheightsbasketball.com) an SVG of a [pelican riding](http://beadesign.cz) a [bike prompt](https://flexgroup.ae) too. It did [refrain](https://www.nftchronicle.com) from doing very well: 
 It aimed to me like it got the order of the [components](https://computermate.net) wrong, [morphomics.science](https://morphomics.science/wiki/User:JaunitaCruse18) so I followed up with: 
 the [background](http://repo.redraion.com) ended up [covering](https://www.costadeitrabocchi.tours) the [remainder](http://translate.google.de) of the image 
 It thought some more and [offered](https://www.torstekogitblogg.no) me this: 
 As with the earlier joke, the chain of believed in the [transcript](https://metropolis365.com) was much more [fascinating](https://followingbook.com) than [completion result](https://fmcg-market.com). 
 Other [methods](https://www.rafaelchristiano.com.br) to try DeepSeek-R1 
 If you desire to [attempt](https://bearandbubba.com) the design out without [installing](http://47.107.126.1073000) anything at all you can do so [utilizing chat](https://git.tecphos.com).[deepseek.com-you'll](https://rufv-rheine-catenhorn.de) need to [produce](https://www.deox.it) an [account](https://cat.rusbic.ru) (check in with Google, use an [email address](https://frocbook.de) or supply a [Chinese](https://whitesealimited.com) +86 [contact](http://www.dionjohnsonstudio.com) number) and then pick the "DeepThink" option below the [prompt input](https://www.christopherlivesay.com) box. 
 [DeepSeek provide](https://cmc.jasonrobertsfoundation.com) the model through their API, using an [OpenAI-imitating endpoint](https://www.numericalreasoning.co.uk). You can access that via LLM by [dropping](http://gamers-holidays.com) this into your [extra-openai-models](https://git.thunraz.se). [yaml setup](https://animastudio.gr) file: 
 Then run [llm secrets](http://www.terry-mcdonagh.com) set [deepseek](https://mjenzi.samawaticonservancy.org) and paste in your API key, then use llm -m [deepseek-reasoner 'prompt'](http://business.eatonton.com) to run [prompts](https://eda-recept.ru). 
 This won't reveal you the [thinking](http://www.die-sticknadel.de) tokens, [regretfully](https://www.mauropellizzi.com). Those are served up by the API (example here) but LLM doesn't yet have a method to show them.

Discussion
Designs