Decentralizing DevRel
DX @ HuggingFace: Scaling open source ML community 200% a year with No OKRs & No Meetings
Following the ZIRP DevRel post, the community has had many great discussions on where DevRel needs to go next. DXTips exists to share this tacit niche industry knowledge. DX@X is our new async interview series with DevRel leaders, bringing more perspectives on the state of the art in DX and DevRel. We're excited to kick it off with Omar Sanseviero, Chief Llama Officer at HuggingFace!
HuggingFace is well known for being an incredible steward of the open source ML community, building critical infrastructure through hypergrowth (from 780k to 2.3m repos on the HF Hub in the past year) and doing so profitably. They are also a rare startup whose large online community translates into massive multi-thousand-person meetups all over the world.
Request for Suggestions: Who else would you like to hear from? Let us know on X/Twitter and join our Newsletter to get the next issue!
Introduction to Omar and HuggingFace
Intro: Hey Omar! Let’s assume people know the surface level of HuggingFace - it’s the largest AI community which collaborates on open source models, datasets, and applications, with paid compute and enterprise solutions. What does a Chief Llama Officer do at HF?
Memes! More seriously, my title might translate to “Head of Platform and Community” at another company, although the scope of what I do is quite broad. There are two aspects to my role:
- Leadership: Within HF, my role involves horizontal and vertical leadership. Vertically, I direct a family of teams (Dev Advocacy Engineering, On-device ML, Moonshot Factory, Argilla - our most recent acquisition). Horizontally, our team sits at the intersection of Open Source, Product, and the external community (+ sometimes research). In my day-to-day, I aim to identify potential high-impact areas, connect dots across teams at HF and the community, and unblock people to succeed.
- IC: HF has a very bottom-up leadership culture. This, combined with a meeting-less async culture (example, example, example, old HBS case study), allows folks in leadership positions to dedicate significant time to technical and meaningful contributions to different projects. A significant part of my role involves collaborating with partners to release new models, such as the latest Llama and Gemma models. Each release is unique, intense, and fun, and I quite enjoy being deeply involved in the entire process.
Followup: What do you think is under-appreciated about HF’s open source work?
Hugging Face is a community-centric company. It's hard to overstate how community-centric we are. Some examples:
- We prioritize giving the spotlight to community members and collaborators as much as possible.
- We provide compute and no-strings-attached cash grants (including but not limited to the $10m ZeroGPU program) to community members and communities. For example, Eleuther, Boris from Dall-e Mini, and lucidrains have all been sponsored by HF in the past, allowing them to keep doing their cool work without financial constraints.
- We help maintain open-source libraries from other groups (e.g., sentence-transformers and bitsandbytes) and closely collaborate with other tools (e.g., LM Eval Harness and OpenCLIP).
Our approach to working with other groups and open-source platforms and libraries is always collaborative. We view ourselves as "the Switzerland" of the ML community, actively contributing to and supporting the ML ecosystem. We want the community to be successful and grow the pie.
So, one aspect of HF that I think is underappreciated is the extent of the support and collaboration with the community. Many see the outputs—like models and libraries—but might not realize the significant behind-the-scenes effort that the team puts into fostering the thriving ecosystem.
DevRel at HuggingFace: Metrics and Velocity
Credibility/Success: What are some “real” metrics that you track that point to HF’s devrel success?
DevRel comes in all kinds of flavors in the industry. Some DevRel teams are part of a marketing function, and some sit within a monetization function. At Hugging Face, DevRel sits between the open-source and product organizations and is primarily an engineering function.
This means HF DevRel's goal is increased usage of the Hub platform and open-source tools, rather than revenue. Two of our north stars are the number of repositories on HF and logged-in usage of the Hub platform. For example, the Hub has 2.3 million repositories today, compared to 780k repositories a year ago. (Of course, we can look at everything with more granularity, e.g., the number of Spaces, which grew from 187k in June last year to 650k this year.)
Each team member works on different topics (e.g., Computer Vision, Audio ML, ML for 3D CV, etc.), so we jointly define some metrics that we would like to see move based on our efforts. We prioritize usage-based metrics (number of repos, downloads, installs) over visibility-driven ones (GitHub stars, Twitter likes, views), which are also valuable but not the main motivation of our work.
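(Editor's note: these usage signals are public, so anyone can track them. Here is a minimal sketch of pulling them with the official huggingface_hub client; the repo and org ids below are placeholders, not ones HF necessarily tracks.)

```python
# Minimal sketch: pulling public usage metrics via the huggingface_hub client.
# The repo and org names below are placeholders.
from huggingface_hub import HfApi

api = HfApi()

# Per-repo signals: downloads (usage-based) vs. likes (visibility-based)
info = api.model_info("your-org/your-model")
print(f"downloads: {info.downloads}, likes: {info.likes}")

# A rough client-side tally of model repos published by one organization
models = list(api.list_models(author="your-org"))
print(f"{len(models)} model repos")
```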
That said, I'm skeptical of cultures that overemphasize metrics (of course, this is nuanced and depends on a lot of context). From my experience at Google and from looking at other startups, I've seen the downsides of measuring too much too early. Metrics are an imperfect proxy for impact, and they are game-able. Cultures that prioritize metrics above all risk losing sight of user needs and making the wrong decisions (e.g., improving a metric for a performance review rather than for genuine user benefit). Some DevRel activities might not move any metric immediately but have long-term impact.
One of the most rewarding moments came after two years of building connections with Spanish-speaking folks, when we initiated an exciting Alpaca translation effort involving Argilla, Platzi (a Colombian edtech), and many community super-users. This 'Avengers assemble' moment is becoming more frequent as we foster stronger relationships with practitioners, communities, and organizations. Examples of these are Zephyr ORPO (KAIST + HF + Argilla), QLoRA (University of Washington), and the very recent AI Math Olympiad winner (Numina + HF).
DevRel @ HF: What was your reaction to the ZIRP DevRel article? What’s different at HF?
As mentioned before, DevRel comes in all kinds of flavors. There were things I could relate to, especially two points:
- I'm a bit skeptical of the impact of traveling to conferences. While conferences can be impactful if approached strategically, the highest impact usually doesn't come from giving a talk. Instead, it often comes from the connections made and the behind-the-scenes collaborations, which require a different mindset. We support conference travel (and people do travel a lot), but we encourage team members to attend with an impact-driven mindset, aiming to achieve concrete things beyond just attending an event.
- Lack of OKRs. We do not have OKRs. The ML ecosystem moves incredibly fast, so we need to be nimble and action-driven. Gemini Nano was added to Chrome? Let's figure out how to run it and release some docs. Model 504B is coming out next month? Let's make sure it's usable by the community. Although exciting, this comes with downsides: priorities can and will change, planning becomes challenging, and maintaining focus can be difficult in the chaos of the current ML space.
That said, I think HF DevRel has been successful overall for a couple of reasons:
- It's an engineering-centric function. Day-to-day activities might involve fine-tuning models to get a training script right, collaborating on a research project, or finding out why a model is not quantizing well to 4 bits (see the sketch after this answer). Our users are engineers and researchers, so it's essential that we are in their world to understand them.
- It's a decentralized function. Although we have a dedicated DevRel team, everyone at HF, from research to success engineers, is expected to do activities usually associated with DevRel: you'll see everyone engaging on social media, creating content (YouTube, blog posts, etc.), and giving presentations. If you build a feature or tool, you're responsible for its visibility and growth (of course, with support and guidance from others). Marketing your own work could involve writing a blog post or a technical deep dive, crafting some beautiful notebooks, and yes, sometimes making memes. If you visit the HF blog, you'll see content from all across the company. Rather than "outsourcing" these responsibilities to a third team (either a marketing or a DevRel team in many companies), HF members are encouraged and expected to own their work, end to end. FineWeb is an amazing example of how this can be successful.
The two points combined lead to very genuine and scalable relationships. Rather than a competitive culture, we've fostered a culture in which people are excited to collaborate both internally and externally and ready to amplify the amazing work being done by the community.
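(Editor's note: as a flavor of that engineering-centric day-to-day, here is a minimal sketch of the kind of 4-bit loading you'd poke at when a model "is not quantizing well", using transformers with bitsandbytes. The model id is a placeholder.)

```python
# Minimal sketch: load a model in 4-bit (NF4) via transformers + bitsandbytes
# and eyeball its generations against expectations. Model id is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-org/your-model"
tokenizer = AutoTokenizer.from_pretrained(model_id)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # normalized float 4
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model_4bit.device)
output = model_4bit.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```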
Followup question on DevRel Velocity: I notice that OKRs very rarely prioritize moving fast. What has worked/not worked in encouraging your team members to move fast (other than the obvious intrinsic motivation)?
- What works well: Beyond intrinsic motivation, which is indeed a strong factor, collective momentum plays a big role. Being surrounded by a group of smart, driven individuals working on the latest ML advancements creates an environment where progress is both expected and contagious. This collaborative atmosphere builds a sense of urgency and encourages everyone to push forward together.
- What does not work well: On the flip side, a lack of structured planning and clear OKRs can affect some people. While flexibility is highly prized in the industry, it can lead to ambiguity and confusion about expectations, making it harder for new team members to get up to speed quickly. This can result in onboarding challenges and potential mismatches in cultural fit. Each team is a bit different, but there's a balance between agility and more structured goal-setting that can help everyone thrive.
Managing Open Source Community: Explore vs Exploit
Open Source Engineering and Community: HF maintains a -lot- of open source work with only (~200?) employees. How do you organize the different projects you work on, and how does the community engagement work?
Yes, we're a relatively small team (215 people) maintaining a large number of libraries ourselves: demos (Gradio), data (datasets, Argilla, distilabel), modeling (transformers, diffusers, timm, peft, Candle, tokenizers, accelerate, Parler-TTS, transformers.js), production (TGI and TEI), and research-related (lerobot, alignment handbook), plus we support community libraries (bitsandbytes, sentence-transformers, and others, as mentioned above).
There are some key strategies that have worked well for us:
- Strong async culture. We mostly communicate through Slack and GitHub, enabling collaboration across different projects. This fosters transparency, allowing everyone to gain visibility into other projects.
- Flexible organizational and role boundaries. People from different teams can contribute wherever needed. For example, when we were preparing for the Llama 2 release, people from all kinds of teams contributed to make sure the model was in good shape and usable by the community. It's quite powerful to see different teams working organically to make things happen without having to go through bureaucracy or process management.
- (Other points mentioned above, such as being collaboration- and community-centric.)
- Pragmatism. Let me dive into this one in the next answer :)
Prioritization: There are a lot of interesting directions in ML and only so much time and resources. How do you decide -what- to invest in? And what to cut? Because you're decentralized - what do managers decide vs leave to ICs?
That's a great question and likely one of our biggest challenges. As you said, there are many interesting directions, and the ecosystem is changing quickly. We see new players, from new libraries and startups to new organizations releasing models.
In general, I like to apply the concept of exploration/exploitation from Reinforcement Learning. This involves two main stages:
- Exploration: We do small projects or comms to gauge their potential impact and community interest. This allows us to experiment without committing too many people or too much time.
- Exploitation: Based on the knowledge gained from the exploration stage, we focus our efforts on things we are more confident will have a significant impact. This involves scaling up successful projects and allocating more resources to areas with proven value.
Of course, it's never as simple as that (the exploration rate ϵ is variable), and it's usually cyclical (exploration -> exploitation -> exploration), but it's a good mental framework to have. Some projects are heavy in exploration by nature (for example, exploring a very niche domain or community), and others might require a larger time investment (which tends to happen in research-oriented projects).
The second point, related to the above, is pragmatism. That means being willing to pause or stop projects if they aren't having the expected impact. For example, spending days making a YouTube video that gets a few hundred views may not be worthwhile unless it leads to some very valuable or targeted outcome. It can be sad to spend weeks building an open-source library and then see no engagement or adoption. What is worse, however, is to keep pushing and pushing for a tool that might lack product-market fit.
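(Editor's note: to make the analogy concrete, here is a toy epsilon-greedy sketch of the framing applied to picking project areas. Everything in it, from the areas to the impact signal, is hypothetical; it's a mental model, not how HF actually allocates work.)

```python
# Toy epsilon-greedy sketch of explore/exploit applied to project areas.
# All names and payoffs are hypothetical illustrations.
import random

areas = {"audio": 0.0, "3d": 0.0, "on-device": 0.0}  # running impact estimates
counts = {a: 0 for a in areas}

def pick_area(epsilon: float) -> str:
    if random.random() < epsilon:            # explore: small bet on something new
        return random.choice(list(areas))
    return max(areas, key=areas.get)         # exploit: double down on the best bet

def record_outcome(area: str, observed_impact: float) -> None:
    counts[area] += 1
    # incremental mean of the observed impact per area
    areas[area] += (observed_impact - areas[area]) / counts[area]

# Keep epsilon high early (many small bets), decay it as confidence grows.
for quarter in range(8):
    epsilon = max(0.1, 0.8 * (0.7 ** quarter))
    choice = pick_area(epsilon)
    record_outcome(choice, observed_impact=random.random())  # stand-in signal
```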
Failure is a part of the process for all of us. The key is to learn from it, understand what went well and what didn't, and know when to pivot or stop. This pragmatic approach helps us stay focused on what truly matters.
Followup Question: Let’s apply Explore-Exploit. Just to pick on a specific, visible example that has caught my (swyx’s) eye recently, VB (Vaibhav) has staked out a very notable position as “the audio guy” on AI Twitter. Always the first to have great takes on anything in audio, shipping insanely-fast-whisper and the TTS Arena, and goodness knows what else I don’t even know about. He of course also does other open source AI work, e.g. on LLMs. Was there a top-down decision to focus on Audio? It must have been… But I’m also equally sure that audio doesn’t drive nearly as much revenue for HF as, say, LLMs or Diffusion models (Apolinario). So… great hire, but how did you decide to invest in audio in the first place? Is there any calculation driven by the GTM/Product/Sales side of HF?
It might sound surprising, but audio (with VB) was a very validated area we wanted to invest in, while diffusion/art (Poli) was a very experimental area.
For audio, back in 2022, we saw a significant wave of OS libraries (SpeechBrain, ESPNet, Asteroid, etc.) and interesting research (Whisper, XLS-R by Meta, etc.). We were actively organizing community sprints with free GPUs to help people fine-tune speech recognition models in their languages. There was a lot going on that led to the decision to hire a DA for the role (apart from the MLE in the open source team already working on the topic). VB was working in audio during his master's and had already engaged with us through different efforts. Despite being somewhat junior in the open ML space, he had very strong alignment with the open ML culture and mindset, which allowed him to scale his impact. Since joining, VB has expanded beyond audio, leading different collaborations and integrations, including recent work with Georgi on llama.cpp. Now, VB is even hiring an intern to support the ML ecosystem for audio!
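(Editor's note: for context on what work like insanely-fast-whisper builds on, the transformers ASR pipeline already supports chunked, batched Whisper inference. A minimal sketch, assuming a GPU and a local audio file.)

```python
# Minimal sketch: fast Whisper inference with the transformers ASR pipeline,
# chunking long audio into 30s windows and batching them for throughput.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16,
    device="cuda:0",  # assumes a GPU is available
)

result = asr("audio.wav", chunk_length_s=30, batch_size=8, return_timestamps=True)
print(result["text"])
```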
For diffusion/art, Poli was our first Moonshot MLE, hired to make "ML for art as accessible and open source as possible." This was before the hype around Stable Diffusion. We hired him because of his strong cultural alignment, his contributions to early HF Spaces, and his being a Gradio super-user. At that time, while the area was more experimental, the impact on Spaces and the potential of diffusion models (like latent diffusion by CompVis) showed promising signs. As a power (and somewhat early) user of Spaces, he also brought lots of product ideas on making Spaces more successful.
In summary, our decision to invest in audio was based on clear community and research validation as well as growth potential. In contrast, MLxArt was a more experimental exploration that showed early impact and ended up being a very high-impact area.
Sometimes both intersect! Talking about AI x music with will.i.am is definitely a highlight of last year.
Calls to Action: Insights, Tooling, Hiring, Research
Open Questions: What are you looking for help with? What questions do you want answered that would help you get to your “next level” (whatever that means to you)?
Hugging Face's core audience has traditionally been people with ML experience, but we've seen more and more developers without an ML background who want to incorporate ML into their projects or learn about ML. These developers often feel overwhelmed by the complexity of ML and the speed of the ecosystem. While the community has introduced new tools and APIs to simplify things, and we have exciting features coming soon, there's still much to be done to lower the entry barriers further. I'm looking for insights and suggestions on how we can make our tools even more accessible to non-ML developers. (Editor: some might call these AI Engineers?)
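(Editor's note: as a reference point for what "accessible" can mean today, with the transformers pipeline API a first working result is already roughly a three-line sketch; the model download and sensible defaults are handled automatically. The open question above is how to get the rest of the ML workflow this close to zero.)

```python
# Minimal sketch: sentiment analysis with no ML knowledge required.
# pipeline() picks a sensible default model and downloads it on first run.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("Hugging Face makes ML surprisingly approachable."))
# e.g. [{'label': 'POSITIVE', 'score': 0.999...}]
```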
Additionally, we're expanding our team and are looking for individuals with strong developer empathy and technical skills based in the Bay Area. If you're interested or know someone who might be, please reach out!
Request for Startups/Tools: You recently acquired Argilla for collaborating on high quality datasets — what else do you wish people worked on? (that would be useful to the ecosystem from your POV)
Some topics I'm interested in (not necessarily for a startup):
In Research:
- more distillation experiments and OS tooling
- densification of sparse (MoE) models
- quantization (sub-1-bit for MoEs, <8-bit fine-tuning)
- tooling on speculative decoding strategies (see the sketch after this list)
- more people trying the KTO alignment algorithm (which, unlike DPO/PPO-style RLHF, removes the need for paired preference data)
- true multimodality (2+ input modalities and 2+ output modalities, e.g., text+image+video to text+image in a unified model)
There are early trends in all of these already.
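(Editor's note: on the speculative decoding point, transformers already ships assisted generation, where a small draft model proposes tokens that the larger model verifies. A minimal sketch; the OPT pair below is illustrative, and any two models sharing a tokenizer should work.)

```python
# Minimal sketch: speculative decoding ("assisted generation") in transformers.
# A small draft model proposes tokens; the large model verifies them, which
# speeds up decoding without changing the output distribution.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

main_ckpt = "facebook/opt-6.7b"     # main model
draft_ckpt = "facebook/opt-125m"    # draft model (same tokenizer family)

tokenizer = AutoTokenizer.from_pretrained(main_ckpt)
model = AutoModelForCausalLM.from_pretrained(
    main_ckpt, torch_dtype=torch.float16, device_map="auto"
)
assistant = AutoModelForCausalLM.from_pretrained(
    draft_ckpt, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Speculative decoding works by", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, assistant_model=assistant, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```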
More generally: we want more developer-friendly ML tooling (i.e., making it super easy for any developer to use ML, not just LLMs). If you can speak both the language of a discipline (biochemistry, chemistry, materials science, health) and the language of ML, and communicate and work well with both audiences, you're a unicorn and can do very impactful things, not just in the ML domain but in other industries.
Thank you for your time, Omar!
CTA from DXTips: Who else would you like to hear from? Let us know on X/Twitter and join our Newsletter to get the next issue!