This looks pretty neat. Just spotted in the docs that it has an MCP server too, however, I haven't found anything in the docs about using a locally hosted model. Running this on a box in the corner of the office would be great, but external AI providers would be a deal breaker.
How does this compare to ingesting all your code into some RAG tool and using that in a chat? I understand the citations part, which is a cool feature indeed, but especially tools for graph-RAG, such as graphiti https://github.com/getzep/graphiti can deliver so much more information that can be stored in a graph versus the code-repository alone, such as info about collaborators, infrastructure, metrics, logs, etc. pp.
bshzzle 48 days ago [-]
You certainly could create an embedding of your code and then hook it up to Open WebUI or equivalent as a chat interface - we've actually spoken to some teams that have rolled their own custom solution like that!
From a product POV: our main focus with Sourcebot is providing a world-class DX and UX so that it is really easy to use. Practically speaking, for DX: a sys-admin should be able to throw Sourcebot up into their cluster in minutes with minimal maintenance overhead. For UX: provide a snappy interface that is minimal and gets out of your way.
From a technology POV: vector embeddings (and techniques like graph-RAG) are definitely something we are going to investigate as a means of improving the agent's ability to find relevant context fast. Bringing in additional context sources (like git history, logs, GitHub issues, etc.) is also something we plan to investigate. It's a really fascinating problem :)
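For anyone curious what the roll-your-own route looks like, here is a minimal sketch of the retrieval half: embed each code chunk, embed the question, and rank chunks by cosine similarity before handing the top hits to a chat model. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, and `CHUNKS`, `cosine`, and `top_k` are hypothetical names. It is not how Sourcebot works.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query: str, chunks: dict[str, str], k: int = 2) -> list[str]:
    """Return the names of the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda name: cosine(q, embed(chunks[name])), reverse=True)
    return ranked[:k]

# Hypothetical "repository": file name -> chunk of text extracted from it.
CHUNKS = {
    "auth.py": "def login(user, password): verify password hash and create session",
    "search.py": "def search(query): run regex search over indexed repositories",
    "billing.py": "def charge(customer, amount): create invoice and charge card",
}

if __name__ == "__main__":
    # The retrieved chunks would then be pasted into the chat model's prompt.
    print(top_k("how does regex search work", CHUNKS, k=1))
```

In a real setup the term counts would be replaced by vectors from an embedding model and the linear scan by a vector index, but the ranking step has the same shape.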
bravura 48 days ago [-]
I was very excited for a strong off-the-shelf code vector embedding search tool.
I wanted to encourage you to explore that direction, since it's a) very powerful, b) annoying to hand-roll, and thus c) sorely needed as open source.
cobbzilla 48 days ago [-]
Love this idea. The docs are good, I just need to read them better :)
Trying it out now. Keep it fully open source and nicely pluggable and I'll keep being a fan!
Yes, thanks! I opened an issue on your support site. I got stuck on a file ownership error when trying to mount local repos. Excited to try it if I can get it to work :)
cobbzilla 34 days ago [-]
I figured a late reply is better than none — I was able to get sourcebot running on my private Gitea repos, and it’s great! I appreciate the responsiveness from the devs!
perelin 45 days ago [-]
Just recently discovered Devin's DeepWikis and love them. Same idea, talk to your repo, right? What does Sourcebot do differently / better? https://deepwiki.org/
bshzzle 45 days ago [-]
Yea it's a similar idea - DeepWiki has the generated "wiki" part which we think is really cool (and maybe we'll add something similar in the future). The core chat experience is the same idea - we had some UX inspiration since we think they nailed the experience.
DeepWiki's context retrieval seems to be more sophisticated. I'm speculating, but I imagine they are using the generated wiki + embeddings, which probably gives them higher recall over the codebase, vs. how we are using precision search.
Sourcebot has more "IDE" features built into the product like a file explorer and code navigation, which makes it easier to use the AI-generated answer as a jumping off point for further code exploration.
prepend 48 days ago [-]
So can I use Functional Source licensed code in internal products if I’m a commercial org?
msukkarieh 48 days ago [-]
hey I'm Michael (the other cofounder). If the products are purely internal[1] then you're able to use, modify, and distribute the code as you please (even if you're a commercial org). If you have any additional questions about the license feel free to reach out at license@sourcebot.dev
The Fair Source website is a great resource to learn more: https://fair.io/
[1] The only restriction on the code is that it cannot be used for a commercial product that substitutes for our software. We have a few teams that have connected Sourcebot into internal dev dashboards! This is 100% allowed by the license
hahaxdxd123 48 days ago [-]
I got this set up and working in basically 5 minutes. Going to try to set it up at work. Super cool! It seems like the open source version already has a bunch of features, how do you plan on making sure you can sustainably support it?
bshzzle 48 days ago [-]
Awesome, glad to hear! We are monetizing enterprise features like audit logging and SSO. The core product will remain free and under an FSL license.
I'm using OIDC SSO (via Pocket ID) just for my own sanity. I don't want or need multiple sets of credentials for my home lab applications.
skybrian 47 days ago [-]
Why not use a password manager instead?
cweagans 47 days ago [-]
That is an orthogonal solution to SSO. I have many apps in my home lab. It doesn't make sense to have individual credentials for everything, even if it is effectively free to keep track of them. Rotating dozens of passwords (even spread out over time) is not my idea of a fun day, nor is supporting individual logins for friends/family who use the apps in my network.
SSO is the quick and easy way, especially when other people are involved.
dchuk 48 days ago [-]
In reading the docs, it doesn't look like the MCP server supports the Ask Sourcebot capability. Is that correct or am I missing something in the docs? Is that planned to be added?
bshzzle 48 days ago [-]
Yea they are currently separate - the MCP server exposes the same tools that Ask Sourcebot uses, but the actual LLM call happens on the MCP client. It would be interesting to merge them though - maybe have an Exa-style MCP tool that lets MCP clients ask questions similar to how we are doing it with Ask Sourcebot.
Would be great to hear more about your use case though.
er0k 47 days ago [-]
congrats guys, this new feature looks really cool :)
witnessme 47 days ago [-]
I see you use the Zoekt project for code search. Why did you choose this over alternatives, and how has your experience been so far?
bshzzle 45 days ago [-]
We went with Zoekt because it is full-featured (it's fast, supports regex, search filters, streamed search, etc.), and is a mature project. Sourcegraph, GitLab, and other large companies use it, so it felt like a safe choice. Overall our experience has been positive - maybe I'll write a blog post about it :)
pkz1234 47 days ago [-]
Just tried it, very cool!
cuzinluver 49 days ago [-]
Love that it’s free to use
Alifatisk 48 days ago [-]
I thought this had something to do with Perplexity
bshzzle 48 days ago [-]
We used Perplexity as a mental mapping since there is some overlap, e.g., LLMs using search and citing their sources, it's a webapp, etc.
[1] https://platform.openai.com/docs/api-reference/chat
We are using the Vercel AI SDK, which supports Ollama via a community provider, but that provider doesn't support V5 yet (which Sourcebot is on): https://v5.ai-sdk.dev/providers/community-providers/ollama
Thanks for the support!