That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those: the unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:
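Ollama can fetch GGUF files straight from Hugging Face; a command along these lines does the trick (the Q8_0 quantization suffix is an assumption on my part, picked to match the roughly 8.5GB download mentioned below):

```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```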
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

I installed the plugin and ran it like this:
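Something like the following, assuming llm-ollama exposes the model under the same name Ollama pulled it as:

```bash
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```

Every prompt and response then lands in LLM's SQLite log, so `llm logs -n 1` will replay the most recent exchange later.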
Each response starts with a `<think>`...`</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
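A 34GB file is consistent with a Q3_K_M quantization of a 70B model, so presumably something like this (the quantization suffix here is my guess):

```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```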
Can it draw a pelican?
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:

It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

Just like the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try the model out without installing anything you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then pick the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
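An entry along these lines should work; this is a sketch based on LLM's documented extra-openai-models.yaml format, with the model_id and api_key_name values chosen to match the commands below:

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```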
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
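In full:

```bash
llm keys set deepseek
# paste in your DeepSeek API key when prompted
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```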
This won't show you the reasoning tokens, unfortunately. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.
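If you call the endpoint directly you can get at them: DeepSeek's OpenAI-compatible API returns the thinking as a separate reasoning_content field alongside the regular content in the response message (field name per DeepSeek's API documentation; the curl invocation here is a minimal sketch):

```bash
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
    "model": "deepseek-reasoner",
    "messages": [{"role": "user", "content": "a joke about a pelican and a walrus"}]
  }'
```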