Best settings for koboldai reddit. Discussion for the KoboldAI story generation client.
It's an abbreviation.

I'm using an RTX 2060 with 6 GB VRAM and 32 GB RAM.

If you're in the mood for exploring new models, you might want to try the new Tiefighter 13B model, which is comparable to, if not better than, Mythomax for me. …5 temp as a starting point with a repetition penalty of 1.…

Post-Llama 2 everything is still a bit fresh, but the recent Airochronos-l2-13B is promising.

After some testing and learning the program, I am currently using the 8GB Erebus model. This works fine. I have tried some different sampler presets and settings, but to be honest I have no clue what I am doing :-) Can anyone recommend some good settings for sampler/DRY?

Does anyone have any suggestions on setting up text generation and image generation in general? I get low-consistency replies and image…

I'm running a 13B q5_k_m model on a laptop with a Ryzen 7 5700U and 16 GB of RAM (no dedicated GPU), and I wanted to ask how I can maximize my performance.

For Top-p you should use a value of around 0.…

Use GPT-4 or an entirely different API such as Kobold, Mancer, or NovelAI.

The NSFW ones don't really have adventure training, so your best bet is probably Nerys 13B.

Hopefully someone more experienced than me will reply in detail, but I think that's a difficult question to answer, because response time varies tremendously with the model ("13B or 20B" is a big difference) and with settings such as the length of reply to generate and the number of tokens.

Once you've got it installed, check out the top bar.

I've tried multiple settings, lowering max tokens etc., and the responses are just not good. I used Erebus 13B on my PC and tried this model in Colab, and noticed that coherence is noticeably worse than in the standalone version.

"Best" is subjective. There are quite a few models 8GB and under; I've been playing around with Facebook's 2.…
If I had the pick of all the models that have featured on the Horde, my top pick would be Emerhyst-20B and the long one named something like 'RemmChatInverted'. I like it the most because my stories have…

Personally I don't use any, but as far as I know they're situational and depend on the model you use. I think they were created for specific models, so keep that in mind: what worked great for someone else won't necessarily work for you if you don't at least use the same model. This might help you regardless: https://github.…

KoboldAI is not an AI on its own; it's a project where you bring an AI model yourself.

CLBlast is one of the slower solutions, but on Windows it's the best option for you thanks to the prompt acceleration we provide.

Rest default, except I often crank the remembered tokens down a bit to free up enough memory to generate 3 responses.

What are the best settings to make the AI more coherent in Skein 6B?

Presets: you can try these, and maybe adjust them to your liking. Try others if you want to experiment. A mix of different types and genres (story, adventure and chat) were created.

In a way it's like ChatGPT, but more advanced: choose your preferred AI model, gameplay style, and settings to create the AI experience you want.

Not to be rude to the other people on this thread, but wow, do people routinely have no idea how the software they're interacting with actually works.

Side note: I have 10 GB VRAM; 20B models run at about 3 tokens per second on my machine.

What I shouldn't do is write something like "Story: fantasy setting with medieval elements. Characters: Elrund the high elf, Gimli the dwarf."

KoboldAI users have more freedom than character cards provide; that's why the fields are missing.

I can't think of any setting that made a difference at all.

The most robust would be either the 30B or the one linked by the guy with numbers for a username.

Some of the OPT models get wicked with the repetition penalty score too high.
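Since repetition penalty comes up all through this thread, here is a toy sketch of what the setting actually does (my own illustration, not code from KoboldAI): before sampling, the scores of tokens that appeared in the recent window (the "rep pen range") are pushed down, so 1.0 means "off" and higher values discourage repeats. The divide-positive/multiply-negative convention below follows common llama.cpp-style implementations; exact details vary by backend.

```python
def apply_rep_pen(logits: dict, recent_tokens: set, rep_pen: float) -> dict:
    # Penalize every token seen in the recent window.
    out = dict(logits)
    for tok in recent_tokens:
        if tok in out:
            # Dividing a positive score (or multiplying a negative one)
            # always makes the token less likely for rep_pen > 1.
            out[tok] = out[tok] / rep_pen if out[tok] > 0 else out[tok] * rep_pen
    return out

logits = {"the": 2.0, "cat": 1.0, "sat": -0.5}
penalized = apply_rep_pen(logits, recent_tokens={"the", "sat"}, rep_pen=2.0)
print(penalized)  # {'the': 1.0, 'cat': 1.0, 'sat': -1.0}
```

This is also why very high penalties "get wicked": every already-used word, including necessary ones like names and pronouns, gets suppressed.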
Sometimes that's KoboldAI; often it's Koboldcpp or Aphrodite.

Use the settings dropdown and start tinkering.

**So what is SillyTavern?** Tavern is a user interface you can install on your computer (and Android phones) that lets you interact with text-generation AIs and chat/roleplay with characters you or the community create.

But it is kinda rigid, and often gives the exact same responses on reroll.

To run the 7B model fully from memory, the estimated RAM requirement is 32 GB.

It's a browser-based front-end for AI-assisted writing, storytelling and dungeon adventures.

Not the (Silly) Taverns, please. Oobabooga, KoboldAI, Koboldcpp, GPT4All, LocalAI, cloud in the sky? I don't know, you tell me. (I use Oobabooga nowadays.)

My issue is that I load the model successfully, but the BLAS batch is only 256, half the default 512.

Try putting K sampling on. …7. Disable all other samplers.

It is very advisable to use Koboldcpp instead of KoboldAI United if you are using it for RP, as it is faster and not as buggy as United.

…but it is capable of finding the exact value when appropriate.

It will hence be renamed to KoboldAI Lite.

…sh file and name it something memorable.

I've experimented a lot and I'm still not sure what settings work best.

For instance, let's say I set top-k=2, and behind the scenes the AI evaluates the most likely next words: hat=35%, chair=22%, rug=15%, and so on.

You can even combine LoRAs if you want to mix certain styles, settings and elements.

Good contenders for me were gpt-medium and the "Novel" model, AI Dungeon's model_v5 (16-bit), and the smaller GPT-Neos.
It can go from dumb to genius.

In today's AI world, VRAM is the most important parameter.

What happens when the same setting has different values in both programs? Which one will the text generation follow?

Models finetuned entirely on English text, and LoRAs, will be less likely to handle other languages even as well as the base models do.

…11 Rep Penalty, 1024 Repetition Penalty Range tokens, 322 Amount Generation tokens, and 1394 Context Size tokens.

I'll take my temp higher when needed, but .…

Maybe up the temperature a little.

My only experience with models that large was that I could barely fit 16 layers on my 3060 card and had to split the rest into normal RAM (about 19 GB), which resulted in about 110 seconds per generation (at the default output tokens).

Does something like this exist for KoboldAI specifically? (And before someone jumps in with "google it": I'm here because I Googled it and didn't see anything obvious.)

Selecting one from the dropdown list will change the model settings to match the preset. Progress bar: above the submit button there is now a progress bar for text generation and model loading.

…1, Rep pen range 1024, Rep pen slope 0.…

I've been trying: neutralize all samplers. …75 and 2 should be default, so just set the 3rd to 0.… …7, Top-p 0.…

Round brackets are good for specifying technical info, like the exact height or weight of things.

…1 ETA, TEMP 3, Tokegen 4096 for 8182 context setting in Lite.

For 13B models, Manticore and HyperMantis are excellent. Choose one of the two.

As a result, most members and contributors of the KoboldAI community choose not to use these sites, and opt for more privacy-friendly solutions such as the KoboldAI UI itself or third-party software such as SillyTavern.

Context Size: 1124 (if you have enough VRAM, increase the value; if not, lower it!)
Temperature: 1.5, and Top K Sampling at 60-80.

I need a guide or advice for setting up the 30B Erebus.

…7B Novel/Adventure hybrid model, a free GooseAI trial is a good way to go!

If your video card is an Nvidia, you should get good performance out of 13B models like those.

To do this, on the page of the selected model, click the "Copy model name to clipboard" square icon next to the model name highlighted in bold.

…1 - L2-70b q4 - 8192 in koboldcpp, x2 ROPE [1.0 + 32000], MIROSTAT 2, 8.0 TAU, 0.1 ETA, TEMP 3, Tokegen 4096 for 8182 context setting in Lite.

If so, then try these settings: Amount generation: 128 tokens.

I've been having good results using models based on Chronos or Hermes, and the model I'm using, Mythologic L2, seems pretty good too.

If you open the web interface at localhost:5001 (or wherever), hit the Settings button, and at the bottom of the dialog box, for 'Format' select 'Instruct Mode'.

Fully compatible with all Horde models.

Pre-Llama 2, Chronos-Hermes-13B and Airoboros are also worth giving a shot.

After spending the first several days systematically trying to hone in on the best settings for the 4-bit GPTQ version of this model with exllama (and the previous several weeks for other L2 models) and never settling on consistently high…

I have also managed to separate the settings files for the OpenAI/GooseAI models, so you can define your favorite settings for each of them.

Generally, a higher B number means the LLM was trained on more data and will be more coherent and better able to follow a conversation, but it's also slower and/or needs a more expensive computer to run quickly.
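The settings people quote in this thread map directly onto fields of the KoboldAI/KoboldCpp HTTP API that the local web interface on port 5001 exposes. A sketch of a request (field names are the ones the API expects; treat the chosen values as illustrative starting points, not recommendations):

```python
import json
import urllib.request

def build_generate_payload(prompt: str) -> dict:
    return {
        "prompt": prompt,
        "max_context_length": 2048,  # how much history the model sees
        "max_length": 128,           # "Amount generation" tokens
        "temperature": 0.7,
        "top_p": 0.9,
        "top_k": 60,
        "rep_pen": 1.1,
        "rep_pen_range": 1024,
    }

def generate(prompt: str, base_url: str = "http://127.0.0.1:5001") -> str:
    # POSTs to a locally running KoboldAI/KoboldCpp instance.
    req = urllib.request.Request(
        base_url + "/api/v1/generate",
        data=json.dumps(build_generate_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"][0]["text"]

# With a server running locally:
# generate("You enter the tavern and")
```

This is also why front-ends like SillyTavern can override the back-end's sliders: whatever values arrive in this request body are what the generation actually uses.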
Xwin-Mlewd is an experimental merge of layers from Xwin and MLewd, which itself is made from Pygmalion, SLERP merges, Huggin, Kimiko, and several other LoRAs, all of which are essentially 100% English-language.

So here's a brand-new release and a few backdated changelogs! Changelog of KoboldAI Lite, 9 Mar 2023: added a new feature, Quick Play Scenarios! Created 11 brand-new original scenario prompts for use in KoboldAI.

Their answers are always longer, better, and less repetitive.

This is the part I still struggle with: finding a good balance between speed and intelligence.

What are the best settings for Mixtral? Should I leave VRAM for context, or should I offload as many layers as possible? I am using the Q5_K_M quant and this hardware: 32 GB DDR4-3400, RX 7600 XT 16 GB, Ryzen 5 5600.

I saved custom settings for it within koboldcpp, and it works very well.

When it's ready, it will open a browser window with the KoboldAI Lite UI.

Sure! I personally use Calibrated Pyg 6B.

Now things will diverge a bit between Koboldcpp and KoboldAI.

33B Airochronos is way faster (on CPU) than 33B Airoboros, more flexible, and gives varied responses that…

Not personally. I'm going to assume your KoboldAI is…

One FAQ string confused me: "Kobold lost, Ooba won."

…2, putting it a little lower to…

The article is from 2020, but a 175-billion-parameter model doesn't get created overnight.

I just wanna use my phone and do NSFW stories, but no matter how many times I try, the text just gets repetitive and isn't what I want to write.

My settings for Q4, Q5, and Q6 worked for me, each delivering ideal/good/ideal output respectively.
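The top-k=2 "hat/chair" example given earlier, and the top-k vs. top-p contrast that keeps coming up, boil down to two small filters over the next-token distribution. A toy illustration of my own (not from any KoboldAI source): top-k keeps the K most likely tokens, top-p keeps the smallest set whose probabilities sum to at least p.

```python
def top_k_filter(probs: dict, k: int) -> dict:
    # Keep only the k most likely tokens.
    kept = sorted(probs, key=probs.get, reverse=True)[:k]
    return {tok: probs[tok] for tok in kept}

def top_p_filter(probs: dict, p: float) -> dict:
    # Keep the smallest most-likely-first set whose mass reaches p.
    kept, total = {}, 0.0
    for tok in sorted(probs, key=probs.get, reverse=True):
        kept[tok] = probs[tok]
        total += probs[tok]
        if total >= p:
            break
    return kept

probs = {"hat": 0.35, "chair": 0.22, "sky": 0.18, "rug": 0.15, "dog": 0.10}
print(top_k_filter(probs, 2))   # {'hat': 0.35, 'chair': 0.22}
print(top_p_filter(probs, 0.5)) # {'hat': 0.35, 'chair': 0.22}
```

The practical difference: top-k always keeps a fixed number of options, while top-p keeps more options when the model is uncertain (flat distribution) and fewer when it is confident, which is why many people here prefer it.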
For example, if I want to write a fantasy story, I will write a few paragraphs to start the story in the style I want.

Top-k is only taking "hat" and "chair", since they're the top two with my top-k=2.

Temperature around 1-1.… The default "disabled" values for those settings are: 0, 1, 1, 0.

Rope Scale = 0.5, Rope Base = 10000. Load SillyTavern. Under Kobold settings, unlock Context Size.

Depending on what your use case is, you could also consider Mythalion, Tiefighter, or one of their variants. It seems every model you try has some settings it likes more than others.

It doesn't have repetition at all, and doesn't speak for your character at all if you get your settings dialed in.

What you're looking for is 'streaming'. If you hit 'enable streaming' on Kobold, and then in the generation settings on Tavern, it should start displaying your…

Since the TPU Colab problem was fixed, I finally gave it a try. With the settings: 0.9, Rep pen 1.1, Rep pen range 1024, Rep pen slope 0.…

For 13B, Airoboros had been my favorite, because it follows instructions and character cards best of all the 13B models I've tried, and gives very well-written responses.

If you want to try the latest still-in-development stuff, 4bit/GPTQ supports Llama (Facebook's) models that can be even bigger.

No idea about chat, but I use Skein 20B as a writing assistant.

For Erebus 13B, the defaults are decent. …95 temp, 1.…

Settings are saved in a new format as well. You can also save your settings in the old UI by just changing something (practically anything) and it'll trigger a save as normal.

Hit the Settings button.

I am serious when I say that 7B models would be the easiest and best for now.

Top-p like that just blocks the least probable words, whereas top-k only allows the top, say, 40 words or whatever the setting is, and Mirostat is near-deterministic; it drastically lowers the number of possible…
The next version of KoboldAI is ready for a wider audience, so we are proud to release an even bigger community made update than the last one. They're both relatively easy to understand and configure while giving quality output, with top-a being a bit more advanced. In my experience, the 2. This setting allows for a more flexible version of sampling, where the number of words chosen from the most likely options is automatically determined based on the likelihood distribution of the options, but instead of choosing the top P or K words, it chooses all words with probabilities above a certain threshold. Reddit community and fansite for the free-to-play third-person co-op action shooter, Warframe. The Erebus/shinen models like "tail-free sampling. ' If you hit 'enable streaming' on Kobold and then in the generation settings on Tavern, it should start displaying your Q: Does KoboldAI have custom models support? A: Yes, it does. On top of that they force you to sign in, which means they have identifiable information that can be tied to the story. Given that these P40's are $200 each, and given that KoboldAI/pytorch is good at spreading the load, (and, big "given", that using PCI-e 1x to 16x riser cards won't starve the GPU cores), it should be possible to scale VRAM up for a reasonable amount of money. The model will mostly spit out more common language like "Bob is tall. 9 to 0. It generates really good dialogue for me, and writes first person good. It's been a while so I imagine you've already found the answer, but the 'B' version is related to how big the LLM is. I use SillyTavern as my front end 99% of the time, and have pretty much switched to text-generation-webui for running models. What I mean with top p is, you have settings like top k, and mirostat - and these reduce the possible range of words more than a top k of . 75, 2. a simple google search could have confirmed that. No need to change setting every day to get good result. 
If you don't want to screw around with all this, then Fimbulvetr and Stheno have default context sizes of 4K and 8K, which are still at least as good as Tiefighter.

If you can run 30B 4-bit models, WizardLM-Uncensored is quite good. 13B models should run fine.

InfinityRP-v1-7B: it's little, but damn, it's good.

One thing you could try is playing with the sampler settings.

I only use Kobold when running 4-bit models locally on my aging PC.

Also, since the notebook started its life as just a Mythomax notebook, the default model is Mythomax, so…

Best settings and tips for 20B-Erebus? I want to get the best out of the AI. I already struggle to change from chat mode to adventure mode to add some actions, so the main question is: I want some tips or suggestions to get the most out of the AI quality, if possible. Thanks!

No setting is gonna save you from GPT-3.…

…exe to run it, and have a ZIP file in softprompts for some tweaking.

I had surprisingly good results with just a few short stories and 30 minutes of training on a 7B model.

Your KoboldAI IP address with /api behind it.

…03, but "hand" could score 0.…

You can run any AI model (up to 20B size) that can generate text from the Huggingface website.

…9 temp, ~0.95 top sampling and ~1.…

I personally feel like KoboldAI has the worst frontend, so I don't even use it when I'm using KoboldAI to run a model.

It was a decent bit of effort to set up (maybe 25 mins?) and then takes a decent bit of effort to run, because you have to prompt it in a more specific way, rather than GPT-4, where you can be really lazy with how you write the prompts and it still gets…

Thanks! Nice, looks like some of those modules got downloaded 50k times, so I guess it's pretty popular.
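An earlier comment describes samplers that keep "all words with probabilities above a certain threshold". In current KoboldCpp/SillyTavern terms that is Min P (threshold relative to the most likely token) and Top A (threshold relative to the square of the top probability). A toy sketch of my own, not taken from either codebase, so treat the exact formulas as my reading of those samplers:

```python
def min_p_filter(probs: dict, min_p: float) -> dict:
    # Keep tokens whose probability is at least min_p * (top probability).
    cutoff = min_p * max(probs.values())
    return {t: p for t, p in probs.items() if p >= cutoff}

def top_a_filter(probs: dict, a: float) -> dict:
    # Keep tokens whose probability is at least a * (top probability)^2.
    cutoff = a * max(probs.values()) ** 2
    return {t: p for t, p in probs.items() if p >= cutoff}

probs = {"hat": 0.40, "chair": 0.20, "rug": 0.05, "sky": 0.01}
print(sorted(min_p_filter(probs, 0.1)))  # cutoff 0.04, drops "sky"
print(sorted(top_a_filter(probs, 0.2)))  # cutoff 0.032, drops "sky"
```

Because both cutoffs scale with the top probability, the kept set automatically widens when the model is unsure and narrows when it is confident, which is the "more flexible version of sampling" the comment is describing.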
There are some settings for Noromaid in particular that you can import in SillyTavern, though you can also play with newer things like Dynamic Temperature. Welcome to the official subreddit of the PC Master Race / PCMR! All PC-related content is welcome, including build help, tech support, and any doubt one might have about PC ownership. Edit: it's official! u/henk717 has blessed us with a subdomain for this webui. This menu is broken into three major areas: Home, Settings, and Interface In the Home tab you’ll see buttons to load the model, Create/Load/Save games, as well as a few other key settings (auto save and story mode). 5 Rope Base = 10000 Load Silly Tavern Under Kobold Settings unlock Context Size. bin and dropping it into kolboldcpp. You should then be able to chat with that character just like in Ooba if you do find a . Notice how the curves intersect in the graph at that point. Once you get your preferred settings for the model, throw the entire command string into a . 7B OPT model, and found it extremely good (mostly ‘only the start’, then it gets worse as it goes further with more text). I have tested several models on KoboldAI Lite, and can by personal experience say that Tiefighter and Psyfighter are among the best models for SFW and NSFW roleplaying on the site. 16/1. confusion because apparently Koboldcpp, KoboldAI, and using pygmalion changes things and terms are very context specific. KoboldAI United can now run 13B models on the GPU Colab! They are not yet in the menu but all your favorites from the TPU colab and beyond should work (Copy their Huggingface name's not the colab names). Good evening! I've come to this subreddit to ask about Horde mode with KoboldAI. Or check it out in the app stores Best of Reddit; Topics; Content Policy Just take the character description text from this pastebin, paste it into KoboldAI, and enable Chat mode. Set DRY to 0. 
And the AI's people can typically run at home are very small by comparison because it is expensive to both use and train larger models. i had my account prior to october so i have access to jllm, which is currently free to use to those who are given access and hopefully that won't change, but like i said, access to the llm is periodically given out in waves so you'll get access eventually (but we don't know when that'll be next), but you'll unfortunately just have to wait. Koboldcpp Setting for 30b Erebus . You can get Stheno here. 0. although some jailbreak might make it do less poetic, but it is not 100% if you dont like it at all. I was just wondering, what's your favorite model to use and why? I know there are explanations on the page for each but I want to know how your personal experiences with them compare. You could try Psy fighter. But for the big model, assuming it's really good , "car" might score 0. It uses Oobabooga as a backend, so make sure to use the correct API option, and if you have a new enough version of SillyTavern, make sure to check openai_streaming, so that you get the right API type. 95. Which models and settings are best for fantasy short stories and novels? Keep in mind I don't expect use KoboldAI to actually write the story, but I do hope it can at least help me get past the occasional mental/stumbling/writer's block and fill an otherwise blank page with a prototype paragraph or two I can tweak and correct. I'm using KoboldAI instead of the horde, so your results may vary. If Pyg6b works, I’d also recommend looking at Wizards Uncensored 13b, the-bloke has ggml versions on Huggingface. 11 Rep Penalty, 1024 Repetition Penalty Range Tokens, 322 Amount generation Tokens, and 1394 Context Size Tokens ST does not have any 'character' applied when you use it. Second of all, change the bot you are using. 
AI datasets and is the best for the RP format, but I also read on the forums that 13B models are much better, and I ran GGML variants of regular LLama, Vicuna, and a few others and they did answer more logically and match the prescribed character was much better, but all answers were in simple chat or story generation (visible in Top K, Top P, Typical P, Top A - All those samplers affect the amount of tokens used at different stages of inferencing. They were training GPT3 before GPT2 was released. Now we are in the start menu, firstly we need to connect TavernAI With a backend for it to work. If you can run a 13b you can probably also run a 20b, and MxLewd-l2-20b is very good for most purposes. 3 depending on the model and situation. 1 for me. Depening on settings and text amount it should take a couple of minutes or a few hours. Specs of your system,, model your trying to load, and your current settings would be most helpful. It handles storywriting and roleplay excellently, is uncensored, and can do most instruct tasks as well. As the other guy said, try using another model. most recently updated is a 4bit quantized version of the 13B model (which would require 0cc4m's fork of KoboldAI, I think. I also liked Kuro-Lotus-10. 2. It's been a while since I've updated on the Reddit side. I get replies within 10 seconds that are pretty darn good. Or check it out in the app stores Home; Popular; TOPICS Discussion for the KoboldAI story generation client. Then we got the models to run on your CPU. Use a Q3 GGUF quant and offload all layers to GPU for good speed or use higher quants and offload less layers for slower responses but better quality. I've recently installed the KoboldAI United snapshot from henk717's github page. 95, certainly no less than 0. Also worth noting is that OccultSage's cassandra model is currently a GooseAI exclusive, so if you would like this flexible 2. I’m literally reporting how the program seemed to work when I tried it. 
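The quant advice above (a Q3 GGUF fully offloaded for speed vs. a higher quant partly offloaded for quality) is ultimately just a memory budget. A rough planning sketch of my own, using assumed bits-per-weight figures (real GGUF sizes vary by quant scheme and model, so check the actual file size):

```python
# Assumed, approximate bits-per-weight for common GGUF quants.
BITS_PER_WEIGHT = {"Q3_K_M": 3.9, "Q4_K_M": 4.8, "Q5_K_M": 5.5, "Q6_K": 6.6, "Q8_0": 8.5}

def approx_size_gb(n_params_b: float, quant: str) -> float:
    # GB for a model with n_params_b billion parameters.
    return n_params_b * BITS_PER_WEIGHT[quant] / 8

def layers_that_fit(total_layers: int, model_gb: float, free_vram_gb: float) -> int:
    # Naive even split: how many layers fit in the VRAM you have free.
    per_layer = model_gb / total_layers
    return min(total_layers, int(free_vram_gb / per_layer))

size = approx_size_gb(13, "Q5_K_M")   # roughly 9 GB for a 13B model
print(round(size, 1))
print(layers_that_fit(41, size, 6.0))  # layers a card with 6 GB free might take
```

Context buffers and scratch memory come on top of the weights, so in practice you offload a few layers fewer than this kind of estimate suggests.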
IDEAL - KoboldCPP Airoboros GGML v1. The more text you have, the better. Characters: Elrund the high elf, Gimli the dwarf. 8 (makes the model uncreative), nor more than 0. Ngl it’s mostly for nsfw and other chatbot things, I have a 3060 with 12gb of vram, 32gb of ram, and a Ryzen 7 5800X, I’m hoping for speeds of around 10-15sec with using tavern and koboldcpp. The system did Pyg 6b was great, I ran it through koboldcpp and then SillyTavern so I could make my characters how I wanted (there’s also a good Pyg 6b preset in silly taverns settings). "Car" won't feature in the top-10, so setting K=10 would prevent both for the smaller model. KoboldAI Settings (SillyTavern) I need some help here, can y'all recommend me what I should change? comments sorted by Best Top New Controversial Q&A Hey all, ive been having trouble with setting up Kobold ai the past few days. The Settings tab contains all the settings that affect text generation. At 13b, there are a lot of good options. From veteran players to newcomers, this community is a great place to learn and connect. Embarrassing note: These are the best models I've found for monster girl roleplays. At the bottom of the screen, after generating, you can see the Horde volunteer who served you and the AI model used. Deal is: there are many new models marked as "(United)" and I was wondering if any of these models have better AI dungeon like experien ce. Currently I go ~0. net Cumulative Recent Changelog of KoboldAI Lite up to 4 Feb 2023: Added a brand new AESTHETIC messenger-style UI for chat mode, which you can toggle in the settings. 95 is what i've done most of my character testing on, and it feels just about right for a lot of general use scenes. Whatever program you are using, it most certainly has settings to pick your context and to enable/disable Flashattention. 
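The RoPE settings quoted in this thread ("Rope Scale = 0.5, Rope Base = 10000", "x2 ROPE [1.0 + 32000]") correspond to the two common context-stretching knobs in KoboldCpp: linear scaling, where positions are compressed by trained context / target context (0.5 stretches a 4096 model to 8192), and NTK-style scaling, where the scale stays 1.0 and the rope base is raised instead (e.g. 10000 to 32000 for roughly 2x). A small sketch of the linear variant; the helper names are mine:

```python
def linear_rope_scale(trained_ctx: int, target_ctx: int) -> float:
    # Factor by which positions are compressed so that target_ctx
    # positions fit into the range the model was trained on.
    return trained_ctx / target_ctx

def scaled_position(pos: int, scale: float) -> float:
    return pos * scale

scale = linear_rope_scale(4096, 8192)
print(scale)                         # 0.5
print(scaled_position(8000, scale))  # 4000.0, back inside the trained range
```

Either way, remember to also raise the context slider (and "Tokegen"/max context in Lite) to match, or the stretched window goes unused.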
Actually, I am new to chat AI in general, something I hadn't really looked at much but had my mind blown a few days ago and am now speeding down the rabbit hole. It's as easy as that to as far as I know. Which is going to be 0. i'll look into it. Though, just mess around with the settings and try it out for yourself. koboldai. You should see 2 Tabs. By settings you mean things like Temp. Quite a complex situation so bear with me, overloading your vram is going to be the worst option at all times. Editing settings files and boosting the token count or "max_length" as settings puts it past the slider 2048 limit - it seems to be coherent and stable remembering arbitrary details longer however 5K excess results in console reporting everything from random errors to honest out of memory errors about 20+ minutes of active use. P. If trying to use a card that has lots of extra outputs, like an rpg giving out stats and keeping track and changing them, it has to be one of the 70B models. It also Literally just added it to my Mythomax notebook here a few minutes ago. Right now this ai is a bit more complicated than the web stuff I've done. Seems to depend a LOT on the character. 50) Repetition Penalty: 1. I have 32GB RAM, Ryzen 5800x CPU, and 6700 XT GPU. I can attest to it being excellent with a breadth of options, addons and a sleek interface. Q2. The definitions of that character heavily impact how the AI responses work. Faster and more reliable performance: Running AI models locally on your own For GPT-NEOX 20B and Erebus 13B? Or just in general what settings do you use? The defaults are decent. r/KoboldAI: Discussion for the KoboldAI story generation client. These models don't ignore their non-human features. The model usually does a good job of following along and taking it from there. Your game saves will load normally in either the new or old ui, but if you save in the new ui those saves will be a new v2 save and will not load in the old ui. 
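On editing settings files to push "max_length" past the UI slider's 2048 limit, as described above: conceptually it is just a JSON edit before relaunching. The file path and key below are illustrative assumptions, not the exact names every KoboldAI version uses, so inspect your own install's settings file first:

```python
import json
import tempfile
from pathlib import Path

def bump_max_length(settings_path: str, new_max: int) -> dict:
    # Hypothetical example: read the settings JSON (or start fresh),
    # raise the token limit, and write it back.
    path = Path(settings_path)
    settings = json.loads(path.read_text()) if path.exists() else {}
    settings["max_length"] = new_max
    path.write_text(json.dumps(settings, indent=2))
    return settings

demo_path = str(Path(tempfile.gettempdir()) / "kobold_settings_demo.json")
print(bump_max_length(demo_path, 4096)["max_length"])  # 4096
```

As the comment warns, going far past what the model and your memory can handle trades stability for recall: a few thousand extra tokens may work, while a 5K excess produced errors and out-of-memory crashes for that user.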
since your running the program, KoboldAI, on your local computer and venus is a hosted website not related to your computer, you'll need to create a link to the open internet that venus can access. But Kobold not lost, It's great for it's purposes, and have a nice features, like World Info, it has much more user-friendly interface, and it has no problem with "can't load (no matter what loader I use) most of 100% working models". AMD doesn't have ROCM for windows for whatever reason. But for some characters totally different values work better. true. json character you want to use in koboldai, paste /r/pathoftitans is the official Path of Titans reddit community. 37, 1. That's the most likely cause of the problem. Just enable Adventure mode in the settings and start your actions with You. So if it is backend dependant it can depend on which backend it is hooked up to. Depending if you're on winblows, Linux, or Mac. I think default is 512 so look for that in settings and raise it. 17 is the successor to 0. But as is usual sometimes the AI is incredible, sometime it misses the plot entirely. It seems to universally work better with K sampling on though. Last question, if anyone knows how to save the settings so I don't need to My overall thoughts on kobold are - the writing quality was impressive and made sense in about 90% of messages, 10% required edits. 1 rep penalty. Example: 127. Best models for Android devices . And better than openai because no shakespeare. 99 range (don't go up to 1 since it disables it), Top A Sampling to around 0. I've got 20 cores allocated to this machine, I also have 90gb RAM allocated to it, I've adjusted the settings for better CPU inference. When I tried KoboldAI last year the best I could run was 6B. 07. So I'm running Pigmalion-6b. BAT files or . , Top P, Top K, etc. 99 (makes it do nothing). Firstly, you need to get a token. As for long output, you can change that in settings. 
It will inheret some NSFW stuff from its base model and it has softer NSFW training still within it. net. The reason why the responses are not as good as openai is because it requires the context size to be lower. Occam's Vulkan will bring a speed increase and its currently KoboldAI is as its creators describe it "Your gateway to GPT writing". Edit: I’m really baffled at the downvotes. Is it just my imagination? Or do I need to use other settings? I used the same settings as the standalone version (except for the maximum number of As far as personal settings go i am not the guy to ask since the defaults are based on my settings. Repetition Penality In KoboldCPP Set context size to 8192 Under Tokens select custom rope scaling. net's version of KoboldAI Lite is sending your messages to volunteers running a variety of different backends. (1. 21, 1. 17 to avoid confusion. It's also possible that you didn't properly load the model on the Kobold UI by assigning "layers" to your GPU or you don't have enough VRAM for the regular 6B model and you should use the 4-bit version. Posted by u/dog_is_cat_now - 10 votes and 3 comments First off, I would recommend not using KoboldAi horde, and instead switch to KoboldAI proper (hosted on google colab if you cannot do it locally). What? And why? I’m a little annoyed with the recent Oobabooga update doesn’t feel as easy going as before loads of here are settings guess what they do. Don't put repetition penalty higher than 1. It depends a lot on exactly the type of stuff you're doing, and your hardware capabilities. Disable all other samplers. Set Min P to 0. Which is now available at https://lite. A place for DM's to discuss the module KoboldAI Lite is the frontend UI to KoboldAI/KoboldCpp (the latter is the succeeding fork) and is not the AI itself. So just to name a few the following can be pasted in the model name field: - KoboldAI/OPT-13B-Nerys-v2 - KoboldAI/fairseq-dense-13B-Janeway 8. 
There are some ways to get around it at least for stable diffusion like onnx or shark but I don't know if text generation has been added into them yet or not. *As long as a bigger model exists, you should transition to a bigger model size rather than quantize any bigger than 6-bit, if you're aiming for best quality for a certain model size. sh files. 8, 1. Yes the model is 175Billion parameters. Set Temperature to 2, Top P sampling in the 0. You can leave it alone, or choose model(s) from the AI button at the top. May we see your full generation settings on the Tavern UI? It's possible there's something on there that's messing with the AI responses. 7B models take about 6GB of VRAM, so they fit on your GPU, the generation times should be less than 10 seconds (on my RTX 3060 is 4 s). Untuned models, opt, fairseq are pretty good for generic tasks, the larger, the better (whatever fits into vram is best, but layering into system ram may be tolerable, depending on the use case) Reply reply the GPT3, which seems to be the best AI model out there according to what I've been reading from people like you guys People keep saying that, but the reality is that there are multiple differently sized models that are all GPT-3. Just mentioning a quick update for the horde webui v3 update, probably the last update for quite some time. Good news: I am not new to IT (Endpoint Engineer, broad background for security, spend a lot of time writing PowerShell) Bad news: I am new to running AI locally. I have a 4GB Nvidia card and an AMD card, and the only way I could load most models was to disable the Nvidia in settings before I started. 37 (Also good results but !not as good as with 1. I've been using ST with koboldcpp and I just noticed that in the koboldAI interface there are also settings like temperature, top p, repetition penalty and such which are the same as ST. Though, it consistent of a lot of direction from me, and writing things myself. 
Going to have to give us a bit more to go on, if you're wanting us to help troubleshoot.

Pygmalion 7B is the model that was trained on C.AI data.

Maybe you saw that you need to put in a KoboldAI token to use it in Janitor.

If anyone has any additional recommendations for SillyTavern settings to change, let me know, but I'm assuming I should probably ask over on their subreddit instead of here.

I'm new to KoboldAI and have been playing around with different GPU/TPU models on Colab. They are the best of the best AI models currently available.

The API link is http://127.0.0.1:5001/api. Make sure you load a model in the normal KoboldAI interface first before using the API link.

There's a finetune of Manticore that just came out yesterday, specifically geared toward chat, if that's your thing.

I am now trying to accept the fact that either Dolly V2 3B or RedPajamas...

So as a first guess, try to split it 13 layers GPU, 19 layers in the RAM, and 0 layers disk cache (KoboldAI provides a handy settings GUI for you to configure this).

Each model has its own eccentricities.

When you import a character card into KoboldAI Lite it automatically populates the right fields, so you can see which style it has put things into the memory and replicate it yourself if you like.

For Windows, if you have AMD it's just not going to work.

That's just a plan B from the driver to prevent the software from crashing, and it's so slow that most of our power users disable the ability altogether in the VRAM settings.

You'll need either 24GB of VRAM (like an RTX 3090 or 4090) to run it on GPU, or 32GB of RAM to run it on CPU.
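The GPU/RAM layer split suggested above can be estimated rather than guessed. This is a hypothetical sketch of the idea, not KoboldAI's actual code: given a per-layer size estimate and a VRAM budget, put as many layers on the GPU as fit (keeping a reserve for the context cache) and leave the rest in system RAM. Both the 0.35 GB/layer figure and the 1 GB reserve are illustrative assumptions.

```python
def split_layers(total_layers: int, layer_gb: float, vram_gb: float,
                 reserve_gb: float = 1.0) -> tuple[int, int]:
    # Fit whole layers into the VRAM left after the reserve; everything
    # that doesn't fit stays in system RAM (disk cache kept at 0).
    gpu = max(0, min(total_layers, int((vram_gb - reserve_gb) // layer_gb)))
    return gpu, total_layers - gpu  # (GPU layers, RAM layers)

# e.g. a 32-layer 13B model at a guessed ~0.35 GB/layer on a 6 GB card:
print(split_layers(32, 0.35, 6.0))   # (14, 18)
```

In practice you tune the split empirically: raise GPU layers until loading fails or generation slows down, then back off one or two.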
I'm currently going to test running a 70B model with max context size (possibly) with the settings I already have set up, to see the tok/s.

KoboldAI doesn't use softprompts etc. Saying this because in Discord, had lots of...

In the top right there should be three lines, also known as a burger menu; click on it.

In 1.16 we noticed that the version numbering on Reddit did not match the version numbers inside KoboldAI, and in this release we will streamline this to just 1.17 to avoid confusion.

I've tried both koboldcpp (CLBlast) and koboldcpp_rocm (hipBLAS (ROCm)).

There I heard from someone that an RTX 3090 can run 13B, but if that's true I wonder if there are any optimization settings I need to enable and what my context limit would be.

You must select the character from the character list (by clicking the namecard icon at top right).

Try starting with K sampling at 40, temperature at 0.72, and top-p at 0.9.

Tavern is very sensitive to bot definitions, and they can make or break your conversation.

Tail Free Sampling: no idea. I'd recommend trying out top-p OR top-a.

On top of that, they force you to sign in, which means they have identifiable information that can be tied to the story.

...7B and Iambe-RP-DARE-20b-DENSE.
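To make the top-p (nucleus) recommendation concrete, here is a toy filtering sketch over a small made-up probability distribution. It is illustrative only, not any backend's actual sampler code: sort candidates by probability, keep the smallest set whose cumulative mass reaches p, then renormalise.

```python
def top_p_filter(probs: dict[str, float], p: float = 0.9) -> dict[str, float]:
    kept, total = {}, 0.0
    for tok, pr in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = pr
        total += pr
        if total >= p:           # stop once the nucleus covers mass p
            break
    norm = sum(kept.values())    # renormalise the surviving tokens
    return {tok: pr / norm for tok, pr in kept.items()}

dist = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
print(top_p_filter(dist, p=0.9))   # "zebra" is cut from the candidate pool
```

Unlike top-k, the number of surviving tokens adapts: a confident distribution keeps few candidates, a flat one keeps many, which is why top-p is often preferred for prose.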
So, my personal settings are: Calibrated Pyg 6B, 0. Then you just run one short...

Hey all. If you're willing to do a bit more work, 8-bit mode will let you run 13B just barely.

You can turn on MultiGen in...

Top-k takes the exact user-defined number of what the AI sees as the most likely responses, counting from the top down.

You can just delete the settings in your Google Drive if you want a fully clear preset after messing around.

Click the 2nd plug icon, select the KoboldAI API, and hit the connect button when you have Koboldcpp running.

Mythomax is reliable in most cases, although my current go-to is Xwin-Mlewd-13b.

In the quick presets dropdown, select Godlike. (Another user suggested this setting for writing and I found it works well for me.)

I know you can find characters for other platforms and convert them (which is fine), but it seems like KoboldAI is more immersive than Ooba (with world memory and whatnot).

I've always liked adventure models and have been using Google Colab for running KoboldAI. GPU layers I've set as 14.

Hi all! I already have a full-featured setup on my PC with Oobabooga and tons of extensions, but yesterday I installed koboldcpp on my Galaxy Note 10 via termux.
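The top-k description above translates directly into code. This is a toy sketch with a made-up distribution, not the backend's real sampler: keep exactly the k most likely candidates and renormalise, so the model can only pick from that fixed-size pool.

```python
def top_k_filter(probs: dict[str, float], k: int = 40) -> dict[str, float]:
    # Keep the k most likely tokens, counting from the top down,
    # then renormalise so the survivors' probabilities sum to 1.
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    norm = sum(pr for _, pr in top)
    return {tok: pr / norm for tok, pr in top}

dist = {"sword": 0.4, "shield": 0.3, "potion": 0.2, "rock": 0.1}
print(top_k_filter(dist, k=2))   # only "sword" and "shield" survive
```

A k of 40, as suggested in one of the replies above, keeps sampling varied on a real vocabulary while still cutting off the long tail of unlikely tokens.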