MLOps / Gen AI World is a unique collaborative event for the ML/Gen AI community, comprising over 20,000 ML researchers, engineers, scientists, and entrepreneurs across several disciplines.

Drawing on the real-life experiences of practitioners, the Steering Committee has selected the top applications, achievements, and knowledge areas to highlight across the event.

Come expand your network with ML/GenAI experts and further your own personal & professional development in this exciting and rewarding field.

The MLOps World initiative is dedicated to promoting the effective, efficient, and responsible development of AI/ML/Gen AI across all industries, and to helping practitioners, researchers, and entrepreneurs fast-track their learning and build rewarding careers in the field.

40+ Technical Workshops and Industry Case Studies

Renaissance Austin Hotel (Marriott), 9721 Arboretum Blvd,
Austin, TX 78759, United States

Thursday, November 7th, 2024 - 9:00 AM to 5:00 PM
Friday, November 8th, 2024 - 9:00 AM to 6:00 PM CST

Sessions are guided and selected by our committee to ensure the lessons and takeaways across these two days will help you put more models into production environments effectively, responsibly, and efficiently.

Trusted by over 20,000 ML/GenAI practitioners.

Co-located alongside

Pass Includes

Access to All Workshops

Private and Co-Working Space On-Site

Food, Drinks and Evening Events

All Video Content Included

Join us if you're working on

Business & Stakeholder Alignment

Deployment & Integration

Ethics, Governance & Compliance

Future Trends & Benchmarks

Infrastructure & Scalability

Introduction to MLOps & GenAI

Model Dev, Training, Architecture

Performance Optimization & Efficiency

Real-World Apps & Case Studies

Security & Privacy

Sponsors and Partners

Thank you to the sponsors of the 5th Annual MLOps World, taking place alongside the Generative AI Summit.

Platinum Sponsor

Gold Sponsors

Silver Sponsors

Bronze Sponsors

Start-Up Corner

Community Partners

Interested in exhibiting/sponsoring? Contact faraz@mlopsworld.com for details.

Get inspired

We’ve planned four tracks to tackle different perspectives, with speakers from around the world.

Tracks include:

Industry Case Studies

Technical Workshops

Research and Technical Lessons

Business Alignment

Get skilled-up

Designed for anyone working with ML/AI

Pick the workshops and sessions to hone your skills.

Gain familiarity with:

  • Agentic Model Infrastructure for Scalability
  • Different Types of LLM Evaluation Methods and which work best for your use-case
  • Quantization, Distillation, & other Techniques for more Cost-effective, Efficient Model Hosting
  • Different Types of RAG Implementation Strategies

and MORE!

Explore the city. Build your community


"I can barely begin to explain how much your invite changed my life.
Not only did I learn a TON at the conference, but the knowledge I gained completely changed the trajectory of my career."

See 2024 MLOps/Gen AI World Steering Committee

Become a partner

Email for Brochure: faraz@mlopsworld.com

Talk: Open-Ended and AI-Generating Algorithms in the Era of Foundation Models

Presenter:
Jeff Clune, Professor, Computer Science, University of British Columbia; CIFAR AI Chair, Vector; Senior Research Advisor, DeepMind

About the Speaker:
Jeff Clune is a Professor of computer science at the University of British Columbia, a Canada CIFAR AI Chair at the Vector Institute, and a Senior Research Advisor at DeepMind. Jeff focuses on deep learning, including deep reinforcement learning. Previously he was a research manager at OpenAI, a Senior Research Manager and founding member of Uber AI Labs (formed after Uber acquired a startup he helped lead), the Harris Associate Professor in Computer Science at the University of Wyoming, and a Research Scientist at Cornell University. He received degrees from Michigan State University (PhD, master’s) and the University of Michigan (bachelor’s). More on Jeff’s research can be found at JeffClune.com or on Twitter (@jeffclune). Since 2015, he has won the Presidential Early Career Award for Scientists and Engineers from the White House, published two papers in Nature and one in PNAS, won an NSF CAREER award, received Outstanding Paper of the Decade and Distinguished Young Investigator awards, received two test-of-time awards, and had best paper awards, oral presentations, and invited talks at the top machine learning conferences (NeurIPS, CVPR, ICLR, and ICML). His research is regularly covered in the press, including the New York Times, NPR, the New Yorker, CNN, NBC, Wired, the BBC, the Economist, Science, Nature, National Geographic, the Atlantic, and the New Scientist.

Talk Track: Virtual Talk

Talk Technical Level: 3/7

Talk Abstract:
Foundation models (e.g. large language models) create exciting new opportunities in our longstanding quests to produce open-ended and AI-generating algorithms, wherein agents can truly keep innovating and learning forever. In this talk I will share some of our recent work harnessing the power of foundation models to make progress in these areas. I will cover our recent work on OMNI (Open-endedness via Models of human Notions of Interestingness), Video Pre-Training (VPT), Thought Cloning, Automatically Designing Agentic Systems, and The AI Scientist.

What You’ll Learn
TBA

Talk: Building Agentic and Multi-Agent Systems with LangGraph

Presenters:
Greg Loughnane, Co-Founder, AI Makerspace | Chris Alexiuk, Co-Founder & CTO, AI Makerspace

About the Speaker:
Dr. Greg Loughnane is the Co-Founder & CEO of AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. Since 2021 he has built and led industry-leading Machine Learning education programs. Previously, he worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and an ML researcher. He loves trail running and is based in Dayton, Ohio.

Chris Alexiuk is the Co-Founder & CTO at AI Makerspace, where he is an instructor for their AI Engineering Bootcamp. Previously, he was a Founding Machine Learning Engineer, Data Scientist, and ML curriculum developer and instructor. He’s a YouTube content creator whose motto is “Build, build, build!” He loves Dungeons & Dragons and is based in Toronto, Canada.

Talk Track: Workshop

Talk Technical Level: 4/7

Talk Abstract:
2024 is the year of agents, agentic RAG, and multi-agent systems!

This year, people and companies aim to build more complex LLM applications and models; namely, ones that are ever more capable of leveraging context and reasoning. For applications to leverage context well, they must provide useful input to the context window (e.g., in-context learning), through direct prompting or through search and retrieval (e.g., Retrieval Augmented Generation, or RAG). To leverage reasoning is to leverage the Reasoning-Action (ReAct) pattern, and to be “agentic” or “agent-like.” Another way to think about agents is that they enhance search and retrieval through the intelligent use of tools or services.

The best-practice tool in the industry for building complex LLM applications is LangChain. To build agents as part of the LangChain framework, we leverage LangGraph, which allows us to bake cyclical reasoning loops into our application logic. LangChain v0.2, the latest version of the leading orchestration tooling, directly incorporates LangGraph, the engine that powers stateful (and even fully autonomous) agent cycles.

In this session, we’ll break down all the concepts and code you need to understand and build the industry-standard agentic and multi-agent systems, from soup to nuts.
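
For readers who want a feel for the construct before the session, here is a minimal, hedged LangGraph sketch of a cyclical reasoning loop. It is illustrative only: the "reason" node is a stub standing in for an LLM call, and the state schema is our own, not the presenters’.

```python
# A minimal LangGraph sketch of a cyclical reasoning loop.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class AgentState(TypedDict):
    question: str
    steps: int
    answer: str

def reason(state: AgentState) -> dict:
    # A real agent would call an LLM (and possibly tools) here.
    steps = state["steps"] + 1
    answer = state["answer"]
    if steps >= 2:  # pretend two reasoning passes suffice
        answer = f"Answer to: {state['question']}"
    return {"steps": steps, "answer": answer}

def should_continue(state: AgentState) -> str:
    # The cycle: loop back into "reason" until an answer exists.
    return "end" if state["answer"] else "continue"

graph = StateGraph(AgentState)
graph.add_node("reason", reason)
graph.add_edge(START, "reason")
graph.add_conditional_edges("reason", should_continue,
                            {"continue": "reason", "end": END})
app = graph.compile()

result = app.invoke({"question": "What is agentic RAG?",
                     "steps": 0, "answer": ""})
print(result["answer"])
```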

What You’ll Learn
– A review of the basic prototyping patterns of GenAI, including Prompt Engineering, RAG, Fine-Tuning, and Agents
– The core ideas and constructs to build agentic and multi-agent applications with LangGraph
– ⛓️ Build custom agent applications with LangGraph
– 🤖 Develop multi-agent workflows with LangGraph

Talk: Unleashing the Algorithm Genie: AI as the Ultimate Inventor

Presenter:
Jepson Taylor, Former Chief AI Strategist at DataRobot & Dataiku; VEOX Inc

About the Speaker:
Jepson is a popular speaker in the AI space, having been invited to give AI talks to companies like SpaceX, Red Bull, Goldman Sachs, Amazon, and various branches of the US government. Jepson’s applied career has covered semiconductor, quant finance, HR analytics, deep-learning startup, and AI platform companies. Jepson co-founded and sold his deep-learning company Zeff.ai to DataRobot in 2020 and later joined Dataiku as their Chief AI Strategist. He is currently launching a new AI company focused on the next generation of AI, called VEOX Inc.

Talk Track: Research or Advanced Technical

Talk Technical Level: 3/7

Talk Abstract:
Prepare to have your understanding of AI capabilities turned upside down. Jepson Taylor presents groundbreaking advancements in the field of generative algorithms, where AI systems now possess the ability to invent and optimize their own algorithms. This talk explores how adaptive workflows can produce thousands of novel solutions daily, effectively automating the role of the AI researcher. Through engaging demonstrations, attendees will explore the vast potential of this technology to accelerate innovation across all sectors. Discover how these self-evolving systems are set to redefine the boundaries of what’s possible in technology and learn how you can start incorporating these concepts into your own work.

What You’ll Learn
– Cutting-edge advancements in multi-agent systems and their role in driving AI innovation
– The paradigm shift from prompt engineering to goal engineering in AI development
– The power and potential of bespoke algorithms versus general-purpose solutions
– How generative algorithms are revolutionizing the field of AI research and development
– Practical insights into implementing automated innovation systems for rapid solution generation
– Strategies for integrating self-evolving AI systems into various industry applications
– Real-world examples and case studies of generative algorithms in action

Talk: Optimizing AI/ML Workflows on Kubernetes: Advanced Techniques and Integration

Presenter:
Anu Reddy, Senior Software Engineer, Google

About the Speaker:
Anu is a senior software engineer working on optimizing Google Kubernetes Engine for techniques like RAG and supporting popular AI/ML frameworks and tools such as Ray.

Talk Track: Advanced Technical/Research

Talk Technical Level: 6/7

Talk Abstract:
Explore advanced technical strategies for optimizing AI/ML workflows on Kubernetes, the world’s leading open-source container orchestration platform. This session will cover techniques for integrating open-source AI tools across a wide range of workflows, including training, inference, and prompt engineering (RAG, agents); managing multi-cluster environments; and ensuring cost-effective resource utilization. Participants will gain deep insights into how Kubernetes supports flexible and scalable AI/ML infrastructure, with specific examples of using Kubernetes-native tools like Kueue for job queuing and Ray for distributed computing. The session will also highlight the use of NVIDIA GPUs, TPUs, and advanced workload management strategies, with Google Kubernetes Engine (GKE) as an illustrative example.
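
As a hedged illustration of one tool named above (not the presenter’s code), here is a minimal Ray fan-out sketch; on GKE this would typically run inside a Ray cluster, e.g. via KubeRay, but the same API runs locally.

```python
# Sketch of distributed inference fan-out with Ray.
import ray

ray.init()  # connects to an existing cluster if present, else runs locally

@ray.remote(num_cpus=1)  # request num_gpus=1 instead for GPU workers
def embed(batch: list[str]) -> list[int]:
    # Placeholder for a real embedding/inference call.
    return [len(text) for text in batch]

batches = [["query one"], ["query two"], ["query three"]]
futures = [embed.remote(b) for b in batches]  # scheduled across the cluster
print(ray.get(futures))
```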


What You’ll Learn
– Advanced techniques for optimizing AI/ML workflows on Kubernetes
– Integration of open-source AI tools within Kubernetes environments
– Strategies for managing multi-cluster AI/ML deployments and optimizing resource utilization

Talk: Can Long-Context LLMs Truly Use Their Full Context Window Effectively?

Presenter:
Lavanya Gupta, Senior Applied AI/ML Associate | CMU Grad | Gold Medalist | Tech Speaker, JPMorgan Chase & Co.

About the Speaker:
I am Lavanya, a graduate of Carnegie Mellon University (CMU), Language Technologies Institute (LTI), and a passionate AI/ML industrial researcher with 5+ years of experience. I am also an avid tech speaker and have delivered several talks and participated in panel discussions at conferences like Women in Data Science (WiDS), Women in Machine Learning (WiML), PyData, and TensorFlow User Group (TFUG). In addition, I am dedicated to providing mentorship via collaborations with multiple organizations like Anita Borg.

Talk Track: Advanced Technical/Research

Talk Technical Level: 6/7

Talk Abstract:
Recently there has been a growing interest in extending the context length (input window size) of large language models (LLMs), aiming to effectively process and reason over long input documents as large as 128K tokens (i.e., ~200 pages of a book). Long-context large language models (LC LLMs) promise to increase the reliability of LLMs in real-world tasks. Most model-provider benchmarks champion the idea that LC LLMs are getting better and smarter with time. However, these claims often fall short in real-world applications.

In this session, we evaluate the performance of the state-of-the-art GPT-4 suite of LC LLMs in solving a series of progressively challenging tasks on a real-world financial news dataset, using an improvised version of the popular “needle-in-a-haystack” paradigm. We see that leading LC LLMs exhibit brittleness at longer context lengths, even for simple tasks, with performance deteriorating sharply as task complexity increases. At longer context lengths, these state-of-the-art models experience catastrophic failures in instruction following, resulting in degenerate outputs. Prompt ablations also expose a continued sensitivity to both the placement of the task instruction in the context window and minor markdown formatting.

Overall, we will address the following questions in our session:
1. Does performance depend on the choice of prompting?
2. Can models reliably use their full context?
3. Does performance depend on the complexity of the underlying task?
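
For context ahead of the session, here is a minimal sketch of the classic needle-in-a-haystack setup the talk builds on. It is illustrative only: `call_llm`, the needle text, and the filler are our stand-ins, not the speaker’s harness.

```python
# Embed a known fact ("needle") at a chosen depth inside filler context,
# then check whether the model under test can retrieve it.
NEEDLE = "The special magic number is 48721."
FILLER = "Grass is green and the sky is blue. " * 2000  # long context

def build_haystack(depth: float, context_chars: int) -> str:
    """Place the needle at `depth` (0.0 = start, 1.0 = end)."""
    haystack = FILLER[:context_chars]
    pos = int(len(haystack) * depth)
    return haystack[:pos] + " " + NEEDLE + " " + haystack[pos:]

def call_llm(prompt: str) -> str:
    # Stand-in: a real harness would call the model under test here.
    return "48721" if "48721" in prompt else "unknown"

for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
    prompt = (build_haystack(depth, 50_000)
              + "\n\nWhat is the special magic number?")
    answer = call_llm(prompt)
    print(f"depth={depth:.2f} correct={'48721' in answer}")
```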

What You’ll Learn
Key learnings:
1. Complete understanding of the popular “Needle-in-a-Haystack” paradigm
2. Learning the shortcomings of the traditional “Needle-in-a-Haystack” setup and how to improve it for real-world applications
3. Can state-of-the-art long-context LLMs truly use their full context window effectively?
4. Are models able to perform equally well at both short-context and long-context tasks?

Talk: LeRobot: Democratizing Robotics

Presenter:
Remi Cadene, ML for Robotics, Hugging Face

About the Speaker:
I build next-gen robots at Hugging Face. Before, I was a research scientist at Tesla on Autopilot and Optimus. Academically, I did some postdoctoral studies at Brown University and my PhD at Sorbonne.

My scientific interest lies in understanding the underlying mechanisms of intelligence. My research is focused on learning human behaviors with neural networks. I am working on novel architectures, learning approaches, theoretical frameworks, and explainability methods. I like to contribute to open-source projects and to read about neuroscience!

Talk Track: Virtual Talk

Talk Technical Level: 3/7

Talk Abstract:
Learn about how LeRobot aims to lower the barrier of entry to robotics, and how you can get started!

What You’ll Learn
1. What LeRobot’s mission is.
2. Ways in which LeRobot aims to lower the barrier of entry to robotics.
3. How you can get started with your own robot.
4. How you can get involved in LeRobot’s development.

Talk: From ML Repository to ML Production Pipeline

Presenters:
Jakub Witkowski, IT Expert, Roche Informatics | Dariusz Adamczyk, IT Expert, Roche Informatics

About the Speaker:
Jakub Witkowski, PhD, is a data scientist and MLOps engineer with experience spanning various industries, including consulting, media, and pharmaceuticals. At Roche, he focuses on understanding the needs of data scientists to help them make their work and models production-ready. He achieves this by providing comprehensive frameworks and upskilling opportunities.

Dariusz is a DevOps and MLOps engineer. He has experience in various industries such as public cloud computing, telecommunications, and pharmaceuticals. At Roche, he focuses on infrastructure and the process of deploying machine learning models into production.

Talk Track: Virtual Talk

Talk Technical Level: 4/7

Talk Abstract:
In the pRED MLOps team, we collaborate closely with research scientists to transition their machine learning models into a production environment seamlessly. Through our efforts, we have developed a robust framework that standardises and scales this process effectively. In this talk, we will provide an in-depth look at our framework, the tools we leverage, and the challenges we overcome in this journey.

What You’ll Learn
– How to create a framework for moving ML code to production
– What can be automated in this process (the role of containerisation, CI/CD, and building reusable components for repeated tasks)
– Which tools are important for the dev team
– The most important challenges to tackle in this process

Talk: Large Language Model Training and Serving at LinkedIn

Presenter:
Dre Olgiati, Distinguished Engineer, AI/ML, LinkedIn

About the Speaker:
Dre is a Distinguished Engineer at LinkedIn, where he leads wide-ranging initiatives relevant to large model training, serving, MLOps and more.

Talk Track: Case Study

Talk Technical Level: 4/7

Talk Abstract:
In this talk, Dre will describe some of the fundamental challenges and solutions faced by the LinkedIn team as they build innovative products based on LLMs and agents.

What You’ll Learn
How do I build scalable training and serving solutions for large language models (LLMs)? What are the challenges in scaling LLM training and serving?

Talk: Measuring the Minds of Machines: Evaluating Generative AI Systems

Presenter:
Jineet Doshi, Staff Data Scientist/AI Lead, Intuit

About the Speaker:
Jineet Doshi is an award-winning AI lead and engineer with over 7 years of experience. He has a proven track record of leading successful AI projects and building machine learning models from design to production across various domains, which have impacted millions of customers and significantly improved business metrics, leading to millions of dollars of impact. He is currently an AI Lead at Intuit, where he is one of the architects of their Generative AI platform, which was featured on Forbes and Wall Street.

Jineet has also delivered guest lectures at Stanford University and UCLA on Applied AI. He is on the Advisory Board of the University of San Francisco’s AI Program. He holds multiple patents in the field, has advised numerous AI startups, and has co-chaired workshops at top AI conferences like KDD.

Talk Track: Case Study

Talk Technical Level: 3/7

Talk Abstract:
Evaluating LLMs is essential to establishing trust before deploying them to production. Even post-deployment, evaluation is essential to ensure LLM outputs meet expectations, making it a foundational part of LLMOps. However, evaluating LLMs remains an open problem. Unlike traditional machine learning models, LLMs can perform a wide variety of tasks, such as writing poems, Q&A, and summarization. This leads to the question: how do you evaluate a system with such broad intelligence capabilities? This talk covers the various approaches for evaluating LLMs, along with the pros and cons of each. It also covers evaluating LLMs for safety and security, and the need for a holistic approach to evaluating these very capable models.

What You’ll Learn
The audience will learn why evaluating GenAI systems is fundamental yet remains an open problem; get a broad overview of different techniques for evaluating GenAI systems (including some state-of-the-art ones), along with the pros and cons of each; see how other ML practitioners are doing LLM evals; and learn techniques for evaluating safety and security.

Talk: Generative AI Infrastructure at Lyft

Presenter:
Konstantin Gizdarski, ML Engineering, Lyft

About the Speaker:
Konstantin is an engineer at Lyft where he has worked on expanding the company’s capabilities in machine learning. Originally from Bulgaria, Konstantin grew up in the San Francisco Bay Area and attended Northeastern in Boston as an undergraduate.

Talk Track: Case Study

Talk Technical Level: 4/7

Talk Abstract:
In this talk, we will present the Gen AI infrastructure stack at Lyft.

We will talk about what components we used that were already part of our ML platform to support AI applications such as:
– model training
– model serving

Next, we will talk about some of the novel AI related components we built:
– AI vendor gateway
– custom clients
– LLM evaluation
– PII-preserving infrastructure

Finally, we will share one or two use-cases that have been utilizing Gen AI at Lyft.

What You’ll Learn
You will learn how to evolve an ML Platform into an AI Platform.

Talk: Multimodal LLMs for Product Taxonomy at Shopify

Presenter:
Kshetrajna Raghavan, Senior Staff ML Engineer, Shopify

About the Speaker:
With over 12 years of industry experience spanning healthcare, ad tech, and retail, Kshetrajna Raghavan has spent the last four years at Shopify building cutting-edge machine learning products that make life easier for merchants. From Product Taxonomy Classification to Image Search and Financial Forecasting, Kshetrajna has tackled a variety of impactful projects. Their favorite? The Product Taxonomy Classification model, a game-changer for Shopify’s data infrastructure and merchant tools.

Armed with a Master’s in Operations Research from Florida Institute of Technology, Kshetrajna brings a robust technical background to the table.

When not diving into data, Kshetrajna loves jamming on guitars, tinkering with electric guitar upgrades, hanging out with two large dogs, and conquering video game worlds.

Talk Track: Case Study

Talk Technical Level: 4/7

Talk Abstract:
At Shopify we fine-tune and deploy large vision language models in production to make millions of predictions a day, leveraging different open-source tooling to achieve this. In this talk we walk through how we went about doing it for a generative AI use case at Shopify’s scale.

What You’ll Learn
a. Getting to Know Vision Language Models:

The Basics: We’ll kick things off with a quick rundown of what vision language models are and how they work.
Cool Uses: Dive into some awesome ways these models are being used in e-commerce, especially at Shopify.
b. Fine-Tuning and Deployment:

Tweaking the Models: Learn the ins and outs of fine-tuning these big models for specific tasks.
Going Live: Tips and tricks for deploying these models so they can handle millions of predictions every day without breaking a sweat.
c. Open Source Tools:

Tool Talk: How to pick the right open-source tools for different stages of your model journey.
Smooth Integration: Real-life examples of how we fit these tools into our workflows at Shopify.
d. Scaling Up and Speeding Up:

Scaling Challenges: The hurdles we faced when scaling these models and how we jumped over them.
Speed Boosts: Techniques to keep things running fast and smooth in a production setting.
e. Generative AI Case Study:

Deep Dive: A step-by-step look at a specific generative AI project we tackled at Shopify, from start to finish.
Key Takeaways: What we learned along the way and how you can apply these lessons to your own projects.

Talk: Toyota's Generative AI Journey

Presenter:
Ravi Chandu Ummadisetti, Generative AI Architect, Toyota

About the Speaker:
Ravi Chandu Bio (Generative AI Architect): Ravi Chandu Ummadisetti is a distinguished Generative AI Architect with over a decade of experience, known for his pivotal role in advancing AI initiatives at Toyota Motor North America. His expertise in AI/ML methodologies has driven significant improvements across Toyota’s operations, including a 75% reduction in production downtime and the development of secure, AI-powered applications. Ravi’s work at Toyota, spanning manufacturing optimization, legal automation, and corporate AI solutions, showcases his ability to deliver impactful, data-driven strategies that enhance efficiency and drive innovation. His technical proficiency and leadership have earned him recognition as a key contributor to Toyota’s AI success.

Kordel France Bio (AI Architect): Kordel brings a diverse background of experiences in robotics and AI from both academia and industry. He has multiple patents in advanced sensor design and spent much of the past few years founding and building a successful sensor startup that enables the sense of smell for robotics. He is on the board of multiple startups and continues to further his AI knowledge as an AI Architect at Toyota.

Eric Swei Bio (Senior Generative AI Architect): Boasting an impressive career spanning over two decades, Eric Swei is an accomplished polymath in the tech arena, with deep-seated expertise as a full stack developer, system architect, integration architect, and specialist in computer vision, alongside his profound knowledge in generative AI, data science, IoT, and cognitive technologies.

At the forefront as the Generative AI Architect at Toyota, Eric leads a formidable team in harnessing the power of generative AI. Their innovative endeavors are not only enhancing Toyota’s technological prowess but also redefining the future of automotive solutions with cutting-edge AI integration.

Stephen Ellis Bio (Technical Generative AI Product Manager): Stephen has 10 years of experience in research strategy and the application of emerging technologies for companies ranging from startups to Fortune 50 enterprises. He was formerly Director of the North Texas Blockchain Alliance, where he led the cultivation of Blockchain and Cryptocurrency competencies among software developers, C-level executives, and private investment advisors. He was also formerly the CTO of Plymouth Artificial Intelligence, which researched and developed future applications of AI; in this capacity he advised companies on building platforms that seek to leverage emerging technologies for new business cases. He is currently a Technical Product Manager at Toyota Motor North America, focused on enabling generative AI solutions for various groups across the enterprise to drive transformation in developing new mobility solutions and enterprise operations.

Talk Track: Case Study

Talk Technical Level: 2/7

Talk Abstract:
Team Toyota will delve into their innovative journey with generative AI in automotive design, with the talk exploring how the Toyota research integrates traditional engineering constraints with state-of-the-art generative AI techniques, enhancing designers’ capabilities while ensuring safety and performance considerations.

What You’ll Learn
1. Toyota’s Innovation Legacy
2. Leveraging LLMs in Automotive – battery, vehicle, manufacturing, etc.
3. Failures in Generative AI projects
4. Education to business stakeholders

Talk: HybridRAG: Merging Knowledge Graphs with Vector Retrieval for Efficient Information Extraction

Presenter:
Bhaskarjit Sarmah, Vice President, BlackRock

About the Speaker:
As a Vice President and Data Scientist at BlackRock, I apply my machine learning skills and domain knowledge to build innovative solutions for the world’s largest asset manager. I have over 10 years of experience in data science, spanning multiple industries and domains such as retail, airlines, media, entertainment, and BFSI.

At BlackRock, I am responsible for developing and deploying machine learning algorithms to enhance the liquidity risk analytics framework, identify price-making opportunities in the securities lending market, and create an early warning system using network science to detect regime change in markets. I also leverage my expertise in natural language processing and computer vision to extract insights from unstructured data sources and generate actionable reports. My mission is to use data and technology to empower investors and drive better financial outcomes.

Talk Track: Virtual Talk

Talk Technical Level: 7/7

Talk Abstract:
In this session we will introduce HybridRAG, a novel approach that combines Knowledge Graphs (KGs) and Vector Retrieval Augmented Generation (VectorRAG) to improve information extraction from financial documents. HybridRAG addresses challenges in analyzing financial documents, such as domain-specific language and complex data formats, which traditional RAG methods often struggle with. By integrating Knowledge Graphs, HybridRAG provides a structured representation of financial data, thereby enhancing the accuracy and relevance of the generated answers. Experimental results demonstrate that HybridRAG outperforms both VectorRAG and GraphRAG individually in terms of retrieval accuracy and answer generation.
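
As a rough, hedged sketch of the idea (not the presenter’s implementation), HybridRAG’s core move is to merge context from a vector index and a knowledge graph before generation. Both retrievers and the generator below are illustrative stubs.

```python
# The point of the sketch is the merge step in hybrid_rag().
def vector_retrieve(query: str, k: int = 3) -> list[str]:
    # Stand-in for similarity search against an embedding index.
    return [f"vector passage {i} for '{query}'" for i in range(k)]

def graph_retrieve(query: str) -> list[str]:
    # Stand-in for a KG lookup, e.g. a Cypher query over extracted triplets.
    return [f"(AcmeCorp)-[:REPORTED]->(Q2 revenue: $1.2B)  # re: {query}"]

def generate(prompt: str) -> str:
    # Stand-in for the LLM answer-generation call.
    return f"answer grounded in {len(prompt.splitlines())} prompt lines"

def hybrid_rag(query: str) -> str:
    # The HybridRAG move: union the two context sources before generation.
    context = vector_retrieve(query) + graph_retrieve(query)
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"
    return generate(prompt)

print(hybrid_rag("What was AcmeCorp's Q2 revenue?"))
```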

What You’ll Learn
Key learnings from this session will include an understanding of the integration of Knowledge Graphs (KGs) and Vector Retrieval Augmented Generation (VectorRAG) to enhance information extraction from financial documents. The paper addresses challenges posed by domain-specific language and complex data formats in financial documents, which are often not well-handled by general-purpose language models. The HybridRAG approach demonstrates improved retrieval accuracy and answer generation compared to using VectorRAG or GraphRAG alone, highlighting its effectiveness in generating contextually relevant answers. Although the focus is on financial documents, the techniques discussed have broader applications, offering insights into the wider utility of HybridRAG beyond the financial domain.

Talk: Evaluating LLM-Judge Evaluations: Best Practices

Presenter:
Aishwarya Naresh Reganti, Applied Scientist, Amazon

About the Speaker:
Aishwarya is an Applied Scientist in the Amazon Search Science and AI org. She works on developing large-scale graph-based ML techniques that improve Amazon Search quality, trust, and recommendations. She obtained her Master’s degree in Computer Science (MCDS) from Carnegie Mellon’s Language Technologies Institute, Pittsburgh. Aishwarya has over 6 years of hands-on machine learning experience and 20+ publications in top-tier conferences like AAAI, ACL, CVPR, NeurIPS, and EACL. She has worked on a wide spectrum of problems involving large-scale graph neural networks, machine translation, multimodal summarization, social media and social networks, human-centric ML, artificial social intelligence, code-mixing, etc. She has also mentored several Masters and PhD students in the aforementioned areas. Aishwarya serves as a reviewer at various NLP and Graph ML conferences like ACL, EMNLP, AAAI, and LoG. She has had the opportunity to work with some of the best minds in both academia and industry through collaborations and internships at Microsoft Research, the University of Michigan, NTU Singapore, IIIT-Delhi, NTNU Norway, the University of South Carolina, and others.

Talk Track: In-Person Workshop

Talk Technical Level: 5/7

Talk Abstract:
The use of LLM-based judges has become common for evaluating scenarios where labeled data is not available or where a straightforward test set evaluation isn’t feasible. However, this approach brings the challenge of ensuring that your LLM judge is properly calibrated and aligns with your evaluation goals. In this talk, I will discuss some best practices to prevent what I call the “AI Collusion Problem,” where multiple AI entities collaborate to produce seemingly good metrics but end up reinforcing each other’s biases or errors. This creates a ripple effect.
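
One concrete calibration practice in this spirit (our illustration, not necessarily the speaker’s method) is to score the judge’s agreement against a small human-labeled audit set before trusting it at scale. The labels below are hypothetical.

```python
# Check judge-human agreement with Cohen's kappa before scaling up.
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels on the same 10 audit examples (1 = good answer).
human_labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
judge_labels = [1, 0, 1, 1, 1, 1, 0, 0, 1, 0]

kappa = cohen_kappa_score(human_labels, judge_labels)
print(f"judge-human agreement (kappa): {kappa:.2f}")

# Low agreement is a warning sign of collusion-style failure modes:
# the judge can look consistent while drifting from human intent.
if kappa < 0.6:
    print("Recalibrate the judge prompt or add few-shot anchors.")
```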

What You’ll Learn
– Gain insight into what LLM judges are and the components that make them effective tools for evaluating complex use cases.
– Understand the AI Collusion problem in context of evaluation and how it can create a ripple effect of errors.
– Explore additional components and calibration techniques that help maintain the integrity and accuracy of evaluations.

Talk: On-Device ML for LLMs: Post-training Optimization Techniques with T5 and Beyond

Presenter:
Sri Raghu Malireddi, Senior Machine Learning Engineer, Grammarly

About the Speaker:
Sri Raghu Malireddi is a Senior Machine Learning Engineer at Grammarly, working on On-Device Machine Learning. He specializes in deploying and optimizing Large Language Models (LLMs) on-device, focusing on improving system performance and algorithm efficiency. He has played a key role in the on-device personalization of the Grammarly Keyboard. Before joining Grammarly, he was a Senior Software Engineer and Tech Lead at Microsoft, working on several key initiatives for deploying machine learning models in Microsoft Office products.

Talk Track: Advanced Technical/Research

Talk Technical Level: 4/7

Talk Abstract:
This session explores the practical aspects of implementing Large Language Models (LLMs) on devices, focusing on models such as T5 and its modern variations. Deploying ML models on devices presents significant challenges due to limited computational resources and power constraints. However, On-Device ML is crucial as it reduces dependency on cloud services, enhances privacy, and lowers latency.

Optimizing LLMs for on-device deployment requires advanced techniques to balance performance and efficiency. Grammarly is at the forefront of On-Device ML, continuously innovating to deliver high-quality language tools. This presentation offers valuable insights for anyone interested in the practical implementation of on-device machine learning using LLMs, drawing on Grammarly’s industry application insights.

The topics that will be covered as part of this talk are –
– Techniques for optimizing performance and reducing inference latency in LLMs – quantization, pruning, layer fusion, etc. (a minimal sketch of one such technique follows this list)
– Methods to develop efficient and scalable AI solutions on edge devices.
– Addressing common challenges in deploying LLMs to edge devices – over-the-air updates, logging, and debugging issues in production.
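
As a minimal sketch of the first topic (our example, not Grammarly’s code), post-training dynamic quantization in PyTorch shrinks Linear-layer weights to int8; the tiny model below is a stand-in for a T5-class network.

```python
# Post-training dynamic quantization: int8 weights, fp32 activations.
import torch
import torch.nn as nn

model = nn.Sequential(  # stand-in for a much larger encoder/decoder
    nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)
with torch.no_grad():
    out = quantized(x)  # int8 weight matmuls at inference time
print(out.shape)  # torch.Size([1, 512])
```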


What You’ll Learn
TBA

Talk: AI in Financial Services: Emerging Trends and Opportunities

Presenter:
Awais Bajwa, Head of Data & AI Banking, Bank of America

About the Speaker:
TBA

Talk Track: TBA

Talk Technical Level: 3/7

Talk Abstract:
TBA

What You’ll Learn
TBA

Talk: A Practical Guide to Efficient AI

Presenter:
Shelby Heinecke, Senior AI Research Manager, Salesforce

About the Speaker:
Dr. Shelby Heinecke leads an AI research team at Salesforce. Shelby’s team develops cutting-edge AI for product and research in emerging directions including autonomous agents, LLMs, and on-device AI. Prior to leading her team, Shelby was a Senior AI Research Scientist focusing on robust recommendation systems and productionizing AI models. Shelby earned her Ph.D. in Mathematics from the University of Illinois at Chicago, specializing in machine learning theory. She also holds an M.S. in Mathematics from Northwestern and a B.S. in Mathematics from MIT. Website: www.shelbyh.ai

Talk Track: Research or Advanced Technical

Talk Technical Level: 3/7

Talk Abstract:
In the past two years, we’ve witnessed a whirlwind of AI breakthroughs powered by extremely large and resource-demanding models. And as engineers and practitioners, we are now faced with deploying these AI models at scale in resource constrained environments, from cloud to on-device. In this talk, we will first identify key sources of inefficiency in AI models. Then, we will discuss techniques and practical tools to improve efficiency, from model architecture selection, to quantization, to prompt optimization.

What You’ll Learn
TBA

Talk: Best Practices for Fine-Tuning LLMs

Presenter:
Maxime Labonne, Senior Staff Machine Learning Scientist, Liquid AI

About the Speaker:
Maxime Labonne is a Senior Staff Machine Learning Scientist at Liquid AI, serving as the head of post-training. He holds a Ph.D. in Machine Learning from the Polytechnic Institute of Paris and is recognized as a Google Developer Expert in AI/ML.

An active blogger, he has made significant contributions to the open-source community, including the LLM Course on GitHub, tools such as LLM AutoEval, and several state-of-the-art models like NeuralBeagle and Phixtral. He is the author of the best-selling book “Hands-On Graph Neural Networks Using Python,” published by Packt.

Connect with him on X and LinkedIn.

Talk Track: Applied Case Studies

Talk Technical Level: 5/7

Talk Abstract:
Fine-tuning LLMs is a fundamental technique for companies to customize models for their specific needs. In this talk, we will introduce fine-tuning and best practices associated with it. We’ll explore how to create a high-quality data generation pipeline, discuss fine-tuning techniques using popular libraries, explain how model merging works, and present the best ways to evaluate LLMs.
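
As one hedged example of fine-tuning with popular libraries (our sketch; the base model and hyperparameters are illustrative, not the speaker’s choices), a LoRA setup with Hugging Face PEFT looks like this:

```python
# LoRA fine-tuning setup with Hugging Face PEFT.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # small stand-in model

lora = LoraConfig(
    r=8,                         # rank of the low-rank adapters
    lora_alpha=16,               # adapter scaling factor
    target_modules=["c_attn"],   # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically under 1% of base parameters
```

From here, the wrapped model trains like any transformers model; merged or adapter-only checkpoints can then be evaluated and compared, which connects to the evaluation and merging themes of the talk.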

What You’ll Learn
Best practices for fine-tuning, creating a high-quality data generation pipeline, fine-tuning techniques, best fine-tuning libraries, how to do model merging, and evaluation methods for fine-tuned models.

Talk: Revolutionizing Venture Capital: Leveraging Generative AI for Enhanced Decision-Making and Strategic

Presenter:
Yuvaraj Tankala, AI Engineer and Venture Capital Innovator, Share Ventures

About the Speaker:
TBA

Talk Track: TBA

Talk Technical Level: 3/7

Talk Abstract:
TBA

What You’ll Learn
TBA

Talk: Demystifying Multi-Agent Patterns

Presenter:
Pablo Salvador Lopez, Principal AI Architect, Microsoft

About the Speaker:
As a seasoned engineer with extensive experience in AI and machine learning, I possess a unique blend of skills in full-stack data science, machine learning, and software engineering, complemented by a solid foundation in mathematics. My expertise lies in designing, deploying, and monitoring AI/ML software products at scale, adhering to MLOps/LLMOps and best practices in software engineering.

Having previously led the MLOps practice at Concentrix Catalyst and the ML Engineering global team at Levi Strauss & Co., I have developed a profound understanding of implementing real-time and batch time ML solutions for several Fortune 500 enterprises. This experience has significantly enhanced my ability to manage big data and leverage cloud engineering, particularly with Azure’s AI, GCP and AWS.

Currently, at Microsoft, as a Principal Technical Member of the prestigious AI Global Black Belt team, I am dedicated to empowering the world’s largest enterprises with cutting-edge generative AI and machine learning solutions. My role involves driving transformative outcomes through the adoption of the latest AI technologies and demystifying the most complex architectural and development patterns. Additionally, I am actively involved in shaping the industry’s direction in LLMOps and contributing to open source by publishing impactful software and AI solutions.

Talk Track: Applied Case Studies

Talk Technical Level: 5/7

Talk Abstract:
How to successfully build and productionize a multi-agent architecture with Semantic Kernel and AutoGen.

What You’ll Learn
The audience will learn how to build a multi-agent architecture following best practices using open-source technology like Semantic-Kernel and Autogen. This session will accelerate the journey from single-agent to multi-agent systems and how to productionize these systems to scale using best practices for LLMs in production.
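
For a feel of the building blocks, here is a minimal two-agent sketch with AutoGen (pyautogen). The model name and API key are placeholders, and a production multi-agent system would add routing, memory, and guardrails around this loop.

```python
# A minimal two-agent conversation with AutoGen.
from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-4o-mini",
                               "api_key": "YOUR_API_KEY"}]}  # placeholder

assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",       # fully automated exchange
    code_execution_config=False,    # no local code execution in this sketch
    max_consecutive_auto_reply=2,
)

# The proxy drives the conversation; the assistant replies until done.
user_proxy.initiate_chat(assistant,
                         message="Outline a multi-agent design for document QA.")
```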

Talk: Enabling Safe Enterprise Adoption of Generative AI

Presenter:
John Hearty, Head of AI Governance, Mastercard

About the Speaker:
TBD

Talk Track: Case Study

Talk Technical Level: 6/7

Talk Abstract:
At Mastercard, we have over a decade of experience leveraging AI, with a mature AI Governance program that provides oversight and enables the fair, effective, and transparent use and development of AI solutions. However, Generative AI has brought new challenges and risks, which have made us rethink our processes.

What You’ll Learn:
We will discuss Mastercard’s journey to setting up our AI Governance Program, and how we’ve adapted it to meet the demands of emerging technology.
We will also discuss:
– How we have operationalized responsible AI development
– The new possibilities that Generative AI brings, as well as the challenges and how we have adapted to them
– Ways of leveraging this technology in a safe and effective way
– Lessons learned from a relatively small team enabling a major enterprise (the importance of strategic partnerships!)
– Scaling enterprise-wide adoption of consistent governance frameworks and risk management techniques for GenAI, focusing on process and scale

Talk: How to Run Your Own LLMs, From Silicon to Service

Presenter:
Charles Frye, AI Engineer, Modal Labs

About the Speaker:
Charles teaches people to build data, ML, and AI applications. He got his PhD from the University of California, Berkeley, in 2020 for work on the geometry of neural network optimization. He has since worked as an educator and evangelist for neural network applications at Weights & Biases, Full Stack Deep Learning, and now Modal Labs.

Talk Track: Advanced Technical/Research

Talk Technical Level: 6/7

Talk Abstract:
In this talk, AI Engineer Charles Frye will discuss the stack for running your own LLM inference service. We’ll cover: compute options like CPUs, GPUs, TPUs, & LPUs; model options like Qwen & LLaMA; inference server options like TensorRT-LLM, vLLM, & SGLang; and observability options like the OTel stack, LangSmith, W&B Weave, & Braintrust.
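
As a hedged sketch of one point in that stack (our example, not the talk’s code), serving an open-weight model with vLLM’s offline API looks like this; the model name is illustrative, and a real service would front this with an HTTP server.

```python
# Offline generation with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2-0.5B-Instruct")  # small model for the sketch
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["Explain KV-cache paging in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```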

What You’ll Learn
Everything about serving LLMs, from the latest and greatest open source software tooling to the fundamental principles that drive engineering constraints across the stack.

Talk: From Paper to Production in 30 Minutes: Implementing code-less Gen AI Research

Presenter:
Aarushi Kansal, AI Engineer, AutoGPT

About the Speaker:
Aarushi is a passionate and seasoned AI engineer, currently working at AutoGPT, one of the most popular projects on GitHub, which aims to democratize AI. Previously she initiated and led Generative AI at Bumble as a principal engineer. She has also been a software engineer at iconic companies such as ThoughtWorks, Deliveroo, and Tier Mobility.

Talk Track: Advanced Technical/Research

Talk Technical Level: 5/7

Talk Abstract:
New research papers appear in the Gen AI space almost every other day, on prompting, RAG, different models, and different ways to fine-tune. Often they come with no code, so in this talk we’re going to go through and implement a research paper in 30 minutes.

What You’ll Learn
In this talk the audience will learn how to take a research paper, implement it quickly (within 30 minutes), and then evaluate whether it’s actually useful for their work or not.

Talk: Scaling Vector Database Usage Without Breaking the Bank: Quantization and Adaptive Retrieval

Presenter:
Zain Hasan, Senior ML Developer Advocate, Weaviate

About the Speaker:
Zain Hasan is a Senior Developer Advocate at Weaviate, an open-source vector database. He is an engineer and data scientist by training who pursued his undergraduate and graduate work at the University of Toronto building artificially intelligent assistive technologies. He then founded his company, developing a digital health platform that leveraged machine learning to remotely monitor chronically ill patients. More recently he practiced as a consultant senior data scientist in Toronto. He is passionate about open-source software, education, community, and machine learning, and has delivered workshops and talks at multiple events and conferences.

Talk Track: Case Study

Talk Technical Level: 3/7

Talk Abstract:
Everybody loves vector search, and enterprises now see its value thanks to the popularity of LLMs and RAG. The problem is that prod-level deployment of vector search requires boatloads of both CPU (for search) and GPU (for inference) compute. The bottom line is that, if deployed incorrectly, vector search can be prohibitively expensive compared to classical alternatives.

The solution: quantizing vectors and performing adaptive retrieval. These techniques allow you to scale applications into production by allowing you to balance and tune memory costs, latency performance, and retrieval accuracy very reliably.

I’ll talk about how you can perform real-time billion-scale vector search on your laptop! This includes covering different quantization techniques, including product, binary, scalar, and matryoshka quantization, which can be used to compress vectors, trading off memory requirements for accuracy. I’ll also introduce the concept of adaptive retrieval, where you first perform cheap, hardware-optimized, low-accuracy search to identify retrieval candidates using compressed vectors, followed by a slower, higher-accuracy search to rescore and correct.

These quantization techniques, when used with well-thought-out adaptive retrieval, can lead to a 32x reduction in memory cost requirements at the cost of ~5% loss in retrieval recall in your RAG stack.
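
A hedged numpy sketch of that two-stage idea follows: binary quantization for a cheap first pass, then rescoring a shortlist with the full-precision vectors. The data is random, and the 32x figure refers to memory for the compressed index (1 bit per float32 dimension).

```python
# Binary quantization + adaptive retrieval, end to end on toy data.
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(10_000, 256)).astype(np.float32)
query = rng.normal(size=256).astype(np.float32)

# Offline: binary-quantize every vector (1 bit/dim, ~32x smaller index).
docs_bin = docs > 0
query_bin = query > 0

# Stage 1: fast, low-accuracy search via Hamming distance; over-fetch.
hamming = (docs_bin != query_bin).sum(axis=1)
shortlist = np.argsort(hamming)[:100]

# Stage 2: rescore only the shortlist with full-precision dot products.
sims = docs[shortlist] @ query
top10 = shortlist[np.argsort(-sims)[:10]]
print(top10)
```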

What You’ll Learn
TBA

Talk: GenAI ROI: From Pilot to Profit

Presenter:
Ilyas Iyoob, Faculty, University of Texas; Head of Research, Kyndryl; Venture Partner, Clutch VC

About the Speaker:
Dr. Ilyas Iyoob is faculty of Data Science and Artificial Intelligence in the Cockrell School of Engineering at the University of Texas. He pioneered the seamless interaction between machine learning and operations research in the fields of autonomous computing, health-tech, and fin-tech. Previously, Dr. Iyoob helped build a cloud computing AI startup and successfully sold it to IBM. He currently advises over a dozen venture-funded companies and serves as the Global Head of Research at Kyndryl (IBM spinoff). He has earned a number of patents and industry recognition for applied Artificial Intelligence and was awarded the prestigious World Mechanics prize by the University of London.

Talk Track: Case Study

Talk Technical Level: 1/7

Talk Abstract:
In this session, we will dive deep into the real-world ROI of Generative AI, moving beyond pilot projects and into scalable, value-driving solutions. With real-world examples from our enterprise implementations, we reveal the hidden costs, unexpected value, and key metrics that truly matter when measuring success. We will also explore practical steps to overcome “pilot paralysis” and strategies for balancing innovation with cost control.

What You’ll Learn
Whether you’re a decision-maker or AI leader, this session will provide actionable insights on how to make GenAI work for your business, ensuring it delivers measurable impact and not just hype.

Talk: Build with Mistral

Presenter:
Sophia Yang, Head of Developer Relations, Mistral AI

About the Speaker:
Sophia Yang is the Head of Developer Relations at Mistral AI, where she leads developer education, developer ecosystem partnerships, and community engagement. She is passionate about the AI community and the open-source community, and she is committed to empowering their growth and learning. She holds an M.S. in Computer Science, an M.S. in Statistics, and a Ph.D. in Educational Psychology from The University of Texas at Austin.

Talk Track: Advanced Technical/Research

Talk Technical Level: 1/7

Talk Abstract:
In the rapidly evolving landscape of Artificial Intelligence (AI), open source and openness have emerged as crucial factors in fostering innovation, transparency, and accountability. Mistral AI’s release of open-weight models has sparked significant adoption and demand, highlighting the importance of open source and customization in building AI applications. This talk focuses on the Mistral AI model landscape, the benefits of open source and customization, and the opportunities for building AI applications using Mistral models.

What You’ll Learn
TBA

Talk: Fast and Reproducible: Taming AI/ML Dependencies

Presenter:
Savin Goyal, Co-founder & CTO, Outerbounds

About the Speaker:
Savin is the co-founder and CTO of Outerbounds, where his team is building the modern ML stack to accelerate the impact of data science. Previously, he was at Netflix, where he built and open-sourced Metaflow, a full-stack framework for data science.

Talk Track: Advanced Technical/Research

Talk Technical Level: 3/7

Talk Abstract:
Careful management of software dependencies is one of the most underrated parts of ML and AI systems, despite being critically important for the stability of production deployments as well as for the speed of development. For many years, we have worked with the wider Python package management community (pip, conda, rattler, uv, and many more) and multiple organizations (Netflix, Amazon, Goldman Sachs, and many more) to advance the state of the art in dependency management for ML/AI platforms, including our open-source framework Metaflow.

In this talk, we’ll explore common pitfalls in dependency management and their impact on ML projects, from unexpected results due to package changes to the challenges of reproducing environments across different machines. We’ll cover issues ranging from the complexities of scaling dependencies in distributed cloud environments to performance regressions from seemingly innocuous updates, highlighting why robust dependency management is crucial for production ML systems.

We’ll share our learnings and demonstrate how we address key challenges in building robust and maintainable ML systems, such as:
– Creating fast, stable, and reproducible environments for quick experimentation
– Scaling cross-platform execution to the cloud with automatic dependency handling
– Auditing the full software supply chain for security and compliance

We’ll also demo some of our recent work which enables baking very large container images in just a few seconds, significantly accelerating the prototyping and experimentation cycle for ML practitioners.
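
As a small, hedged illustration of reproducible environments in Metaflow (the versions below are ours, purely illustrative):

```python
# Pinned, reproducible per-flow environments in Metaflow.
from metaflow import FlowSpec, step, conda_base

@conda_base(python="3.10.12", libraries={"pandas": "2.1.4"})
class TrainFlow(FlowSpec):

    @step
    def start(self):
        import pandas as pd  # resolved inside the pinned environment
        self.version = pd.__version__
        self.next(self.end)

    @step
    def end(self):
        print("trained with pandas", self.version)

if __name__ == "__main__":
    TrainFlow()
```

Under these assumptions, running `python train_flow.py --environment=conda run` resolves each step inside the pinned environment, so results reproduce across machines.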

What You’ll Learn
This talk explores the critical yet often overlooked role of software dependency management in ML and AI systems. Drawing from years of collaboration with the Python package management community and major organizations, the speakers will share insights on common pitfalls in dependency management and their impact on ML projects, as well as recent innovations in rapid container image creation for accelerated ML experimentation.

Talk: Driving GenAI Success in Production: Proven Approaches for Data Quality, Context, and Logging

Presenter:
Alison Cossette, Developer Advocate, Neo4j

About the Speaker:
Alison Cossette is a dynamic Data Science Strategist, Educator, and Podcast Host. As a Developer Advocate at Neo4j specializing in Graph Data Science, she brings a wealth of expertise to the field. With her strong technical background and exceptional communication skills, Alison bridges the gap between complex data science concepts and practical applications. Alison’s passion for responsible AI shines through in her work. She actively promotes ethical and transparent AI practices and believes in the transformative potential of responsible AI for industries and society. Through her engagements with industry professionals, policymakers, and the public, she advocates for the responsible development and deployment of AI technologies. She is currently a volunteer member of the US Department of Commerce – National Institute of Standards and Technology’s Generative AI Public Working Group. Alison’s academic journey includes Master of Science in Data Science studies, specializing in Artificial Intelligence, at Northwestern University and research with the Stanford University Human-Computer Interaction Crowd Research Collective. Alison combines academic knowledge with real-world experience and leverages this expertise to educate and empower individuals and organizations in the field of data science. Overall, Alison Cossette’s multifaceted background, commitment to responsible AI, and expertise in data science make her a respected figure in the field. Through her role as a Developer Advocate at Neo4j and her podcast, she continues to drive innovation, education, and responsible practices in the exciting realm of data science and AI.

Talk Track: Advanced Technical/Research

Talk Technical Level: 2/7

Talk Abstract:
Generative AI is a part of our everyday work now, but folks are still struggling to realize business value in production.

Key Themes:

Methodical Precision in Data Quality and Dataset Construction for RAG Excellence: Uncover an integrated methodology for refining, curating, and constructing datasets that form the bedrock of transformative GenAI applications. Specifically, focus on the six key aspects crucial for Retrieval-Augmented Generation (RAG) excellence.

Navigating Non-Semantic Context with Awareness: Explore the infusion of non-semantic context through graph databases while understanding the nuanced limitations of the Cosine Similarity distance metric. Recognize its constraints in certain contexts and the importance of informed selection in the quest for enhanced data richness.

The Logging Imperative: Recognize the strategic significance of logging in the GenAI landscape. From application health to profound business insights, discover how meticulous logging practices unlock valuable information and contribute to strategic decision-making.

Key Takeaways:

6 Requirements for GenAI Data Quality

Adding non-semantic context, including an awareness of limitations in distance metrics like Cosine Similarity.

The strategic significance of logging for application health and insightful business analytics.

Join us on this methodologically rich exploration, “Beyond Vectors,” engineered to take your GenAI practices beyond the current Vector Database norms, unlocking a new frontier in GenAI evolution with transformative tools and methods!

What You’ll Learn
TBA

Talk: Building Trust in AI Systems

Presenter:
Joseph Tenini, Principal Data Scientist, Universal Music Group

About the Speaker:
Joseph Tenini has worked in data science for over a decade in a variety of industries including healthcare, publishing, digital marketing, and entertainment. He has developed, deployed, and managed the lifecycle of a variety of ML-enabled products in many different settings. His specific expertise lies in recommender systems, reinforcement learning, and process improvement. He holds a PhD in Mathematics from the University of Georgia.

Talk Track: Business Strategy

Talk Technical Level: 3/7

Talk Abstract:
As builders of AI and ML systems, much time and effort is spent in building our own trust with the technology we are developing. This can take the form of model accuracy metrics, compute efficiency, and core functionality achieved. There is another, often more daunting, step to be considered: building trust in the technology with non-technical users and other stakeholders who will be impacted by its adoption.

In this talk, we explore four pillars of building trust in AI systems with non-technical stakeholders:
1. Describing performance relative to an interpretable and intuitive baseline.
2. Quantifying uncertainty as part of the delivery process.
3. Sharing “the why” in non-binary decision processes.
4. Designing for 2nd order process effects.

After this talk, machine learning practitioners and managers will be equipped to build trust in the products they develop – enabling maximum value and impact from their work.
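To make pillar 2 concrete, here is a minimal sketch, assuming a regression-style model with held-out residuals available; the bootstrap interval is our illustration, not a method prescribed by the talk:

```python
# Illustrative sketch: report an interval alongside the point estimate
# instead of a bare number, using resampled held-out residuals.
import random

random.seed(0)
errors = [random.gauss(0, 2.0) for _ in range(500)]  # held-out residuals

def prediction_interval(point: float, residuals: list[float], alpha: float = 0.1):
    draws = sorted(point + random.choice(residuals) for _ in range(2000))
    lo = draws[int(len(draws) * alpha / 2)]
    hi = draws[int(len(draws) * (1 - alpha / 2))]
    return lo, hi

lo, hi = prediction_interval(42.0, errors)
print(f"forecast: 42.0 (90% interval: {lo:.1f} to {hi:.1f})")
```

Communicating the interval, not just the point, is one direct way to put uncertainty into the delivery process for non-technical stakeholders.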

What You’ll Learn
TBA

Talk: Building State-of-the-Art Chatbot Using Open Source Models and Composite Systems

Presenter:
Urmish Thakker, Director of Machine Learning, SambaNova Systems

About the Speaker:
Urmish leads the LLM Team at SambaNova Systems. The LLM team at SambaNova focuses on understanding how to train and evaluate HHH-aligned large language models, adapting LLMs to enterprise use-cases, and HW-SW co-design of LLMs to enable efficient training and inference. Before SambaNova, he held various engineering and research roles at Arm, AMD, and Texas Instruments. He also helped drive the TinyML Performance Working Group in MLPerf, contributing to the development of key benchmarks for IoT ML. Urmish has 35+ publications and patents focusing on efficient deep learning and LLMs. His papers have been published at top ML and HW conferences like NeurIPS, ICLR, and ISCA, and he has been an invited speaker at various top universities and industry-academia summits. He completed his master's at the University of Wisconsin-Madison and his bachelor's at the Birla Institute of Technology and Science.

Talk Track: Advanced Technical/Research

Talk Technical Level: 4/7

Talk Abstract:
Open-source LLMs like Llama 2 and BLOOM have enabled widespread development of enterprise LLM applications. As model adoption has matured over time, we have seen a rise in LLMs specialized for narrow domains, tasks, or modalities. By adopting such specialization, these models are able to outperform far larger proprietary or open models. For example, 7B-70B Llama experts like UniNER, TabLLaMA, NexusRaven, and SambaCoder-nsql-llama2 can outperform GPT-4 on NER, function calling, tabular data, and Text2SQL tasks. Many more such examples exist and can be found in open source. However, one unique feature that larger proprietary models offer is a single endpoint that takes a user query and provides a response. These responses can sometimes also include a chain of tasks that was solved to get to such a response.

The question we try to answer in this research is whether we can build a composite LLM system from open-source checkpoints that effectively provides the same usability as a larger proprietary model. This includes taking a user request and mapping it to a single checkpoint, or a group of checkpoints, that can solve the request and serve the user. Our work indicates that such a composite is indeed possible. We show this by building a new state-of-the-art model based on various adaptations of the Mistral 7B model. Using unique ensembling methods, this composite model outperforms Gemma-7B, Mixtral-8x7B, Llama2-70B, Qwen-72B, Falcon-180B, and BLOOM-176B at the effective inference cost of a <10B-parameter model.
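As a rough sketch of the composite idea (our illustration; the routing heuristic and model names are placeholders, not SambaNova's implementation), a single endpoint can classify each request and dispatch it to a specialist checkpoint:

```python
# Illustrative sketch: one endpoint, many specialist checkpoints.
EXPERTS = {
    "ner": "uniner-7b",
    "text2sql": "sambacoder-nsql-llama2",
    "function_calling": "nexusraven-13b",
}

def classify_task(query: str) -> str:
    # Stand-in for a learned router; real systems use a classifier model.
    q = query.lower()
    if "sql" in q or "table" in q:
        return "text2sql"
    if "call the api" in q or "function" in q:
        return "function_calling"
    return "ner"

def route(query: str) -> str:
    checkpoint = EXPERTS[classify_task(query)]
    return f"[{checkpoint}] would handle: {query}"

print(route("Write SQL to count orders per region"))
```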

What You’ll Learn
TBA

Talk: AI Features Demand Evidence-Based Decisions

Presenter:
Connor Joyce, Senior User Researcher, Microsoft and Author of “Bridging Intentions to Impact”

About the Speaker:
Connor Joyce is the author of “Bridging Intentions to Impact” and a Senior User Researcher on the Microsoft Copilot Team, where he is advancing the design of AI-enhanced features. Passionate about driving meaningful change, Connor advocates that companies adopt an Impact Mindset, ensuring that products not only change behavior to satisfy user needs but also drive positive business outcomes. He is a contributor to numerous publications, advises emerging startups, and lectures at the University of Pennsylvania. Based in Seattle, Connor enjoys exploring the outdoors with his dog, Chai, and is a local event organizer.

Talk Track: Business Strategy

Talk Technical Level: 5/7

Talk Abstract:
We are in the midst of a technology paradigm shift, and there is significant pressure on product teams to build Generative AI (GenAI) into their products. Navigating these uncharted waters requires decisions based on a deep understanding of user needs to ensure that this new technology is leveraged in the most beneficial way for both users and the business. This presentation emphasizes the necessity of creating a demand for insights by product teams and the democratization of evidence creation. Doing both can be achieved by defining features in a way that highlights the evidence supporting why they should work. By using the novel User Outcome Connection, teams can naturally identify what data is known and unknown about a feature. This framework makes the pursuit of new research to fill the gaps more straightforward, ensuring a solid foundation for decision-making.

By developing User Outcome Connection frameworks for key features, teams can design solutions that appropriately and effectively incorporate GenAI. This will be showcased through B2B and B2C examples illustrating the practical application and transformative potential of this approach.

What You’ll Learn
Attendees will learn how using the User Outcome Connection framework for key features enables the strategic use of GenAI where it truly adds value. By the end of this session, participants will be equipped with actionable steps to adopt evidence-based frameworks, ensuring their products meet the evolving demands of technology and user expectations. Join this session to learn how to navigate the AI paradigm shift with evidence-based decisions and design truly impactful AI-enhanced features.

Talk: Agentic Workflows in Cybersecurity

Presenter:
Dattaraj Rao, Chief Data Scientist, Persistent

About the Speaker:
TBA

Talk Track: TBA

Talk Technical Level: 3/7

Talk Abstract:
TBA

What You’ll Learn
TBA

Talk: ML Deployment at Faire: Predicting the Future, Serving the Present

Presenter:
Harshit Agarwal, Senior Machine Learning Engineer, Faire Wholesale Inc

About the Speaker:
TBA

Talk Track: Advanced Technical/Research

Talk Technical Level: 5/7

Talk Abstract:
How Faire transitioned a traditional infrastructure into a modern, flexible model deployment and serving stack that supports a range of model types, while ensuring operational excellence and scalability in a dynamic e-commerce environment.

Over the past few years at Faire, we have overhauled our ML serving infrastructure, moving from hosting XGBoost models in a monolithic service to a flexible and powerful ML deployment and serving stack that powers all types of models, small and big.

In this talk, we’ll cover how we set up a system that makes it easy to migrate, deploy, scale, and manage different types of models. Key points will include how we set up infrastructure as code and CI/CD pipelines for smooth deployment, automated testing, and created user-friendly tools for managing model releases. We’ll also touch on how we built in observability and monitoring to keep an eye on model performance and reliability.
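As a minimal sketch of one building block of such a stack (our illustration, not Faire's code; the endpoint names are hypothetical), a model server can expose a health check and emit latency telemetry on every request:

```python
# Illustrative sketch: a tiny model-serving endpoint with observability hooks.
import logging
import time

from fastapi import FastAPI
from pydantic import BaseModel

logger = logging.getLogger("model_server")
app = FastAPI()

class PredictRequest(BaseModel):
    features: list[float]

@app.get("/health")
def health() -> dict:
    # Used by the deployment pipeline and load balancer for readiness checks.
    return {"status": "ok"}

@app.post("/predict")
def predict(req: PredictRequest) -> dict:
    start = time.perf_counter()
    score = sum(req.features)  # stand-in for a real model call
    latency_ms = (time.perf_counter() - start) * 1000
    logger.info("prediction served", extra={"latency_ms": latency_ms})
    return {"score": score, "latency_ms": latency_ms}
```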

Come and learn how Faire’s ML serving stack helps our team quickly bring new ideas to life, while also maintaining the operational stability needed for a growing marketplace.

What You’ll Learn
1. How to best structure an ML serving and deployment infrastructure
2. How to build testing and observability into your deployment and serving infra
3. How to build production-grade tools that your data scientists and MLEs will love
4. How we serve users at scale, and the design choices we made

Talk: Building AI Infrastructure for the GenAI Wave

Presenter:
Shreya Rajpal, CEO & Co-Founder, Guardrails AI

About the Speaker:
Shreya Rajpal is the CEO of Guardrails AI, an open source platform developed to ensure increased safety, reliability and robustness of large language models in real-world applications. Her expertise spans a decade in the field of machine learning and AI. Most recently, she was the founding engineer at Predibase, where she led the ML infrastructure team. In earlier roles, she was part of the cross-functional ML team within Apple’s Special Projects Group and developed computer vision models for autonomous driving perception systems at Drive.ai.

Talk Abstract:
As Generative AI (GenAI) continues to revolutionize industries, it brings a new set of risks and challenges. This talk focuses on building robust AI infrastructure to manage and mitigate these risks. We will explore the multifaceted nature of GenAI risks and the essential infrastructure components to address them effectively. Key topics include implementing real-time monitoring systems to identify anomalies and biases, designing audit trails for enhanced transparency, and developing adaptive security measures to combat emerging threats.

The presentation will also cover governance strategies for GenAI, and the integration of ethical AI frameworks to support responsible development and deployment. This talk is tailored for CISOs, AI ethics officers, ML engineers, and IT architects aiming to build secure and responsible GenAI systems.
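As a minimal sketch of one such safeguard (our illustration, not the Guardrails AI API; the patterns are placeholders), an output validator can block responses containing obvious PII and record an audit entry for every decision:

```python
# Illustrative sketch: validate LLM output and keep an audit trail.
import re

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US-SSN-like pattern
    re.compile(r"\b\d{16}\b"),             # bare card-number-like pattern
]

audit_log: list[dict] = []

def guard_output(response: str) -> str:
    flagged = any(p.search(response) for p in PII_PATTERNS)
    audit_log.append({"response_chars": len(response), "flagged": flagged})
    return "[REDACTED: policy violation]" if flagged else response

print(guard_output("Your SSN is 123-45-6789"))
print(audit_log)
```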

Talk: Revolutionizing the Skies: MLOps Case Study of LATAM Airlines

Presenters:
Michael Haacke Concha, MLOps Lead, LATAM Airlines
Diego Castillo Warnken, Staff Machine Learning Engineer, LATAM Airlines

About the Speaker:
Michael Haacke Concha is the Lead Machine Learning Engineer of the centralized MLOps team at LATAM Airlines. He holds both a Bachelor’s and a Master’s degree in Theoretical Physics from Pontificia Universidad Católica de Chile (PUC). Over his three years at LATAM Airlines, he developed an archival and retrieval system for aircraft black-box data to support analytics. He then played a key role in building the framework for integrating the Iguazio MLOps platform within the company. In the past year, he has been leading the development of a new platform using Vertex AI on GCP.

Prior to joining LATAM Airlines, Michael worked as a data scientist on the ATLAS experiment at the Large Hadron Collider (LHC), where he contributed to various studies, including the search for a long-lived Dark Photon and a Heavy Higgs.

Diego Castillo is a Consultant Machine Learning Engineer at Neuralworks, currently on assignment as Staff in LATAM Airlines, where he plays a pivotal role within the decentralized Data & AI Operations team. A graduate of the University of Chile with a degree in Electrical Engineering, Diego has excelled in cross-functional roles, driving the seamless integration of machine learning models into large-scale production environments. As a Staff Machine Learning Engineer at LATAM, he not only leads and mentors other MLEs but also shapes the technical direction across key business areas.

Throughout his career at LATAM Airlines, Diego has significantly impacted diverse domains, including Cargo, Customer Care, and the App and Landing Page teams. More recently, he has been supporting the migration of the internal MLOps framework from Iguazio to Vertex AI on GCP.

With a comprehensive expertise spanning the entire machine learning lifecycle, Diego brings a wealth of experience from previous roles, including Data Scientist, Backend Developer, and Data Engineer, making him a versatile leader in the AI space.

Talk Track: Applied Case Studies

Talk Technical Level: 2/7

Talk Abstract:
This talk explores how LATAM Airlines leveraged MLOps to revolutionize its operations and achieve financial gains in the hundreds of millions of dollars. By integrating machine learning models into daily workflows and automating the deployment and management processes, LATAM Airlines was able to optimize tariffs, enhance customer experiences, and streamline maintenance operations. The talk will highlight key MLOps strategies employed, such as continuous integration and delivery of ML models and real-time data processing. Attendees will gain insights into the tangible benefits of MLOps, including cost savings, operational efficiencies, and revenue growth, showcasing how strategic ML operations can create substantial value in the airline industry.

What You’ll Learn
You will gain insight into how a scalable, decentralized tech team grows inside LATAM Airlines thanks to technology and organizational structure. You will also learn about some of the successful use cases from our MLOps ecosystem.

Talk: Supercharge ML Teams: ZenML's Real World Impact in the MLOps Jungle

Presenter:
Adam Probst, CEO & Co-Founder, ZenML

About the Speaker:
Adam Probst is the Co-founder and CEO of ZenML, an open-source MLOps framework simplifying machine learning pipelines. He holds a degree in Mechanical Engineering and studied at both Stanford University and the Technical University of Munich. Before co-founding ZenML, Adam gained valuable experience in the ML startup world within the commercial vehicle industry. Driven by a passion for customer-centric solutions, Adam is obsessed with unlocking the tangible benefits of MLOps for businesses.

Talk Abstract:
In the complex ecosystem of machine learning operations, teams often find themselves entangled in a dense jungle of tools, workflows, and infrastructure challenges. This talk explores how ZenML, an open-source MLOps framework, is cutting through the underbrush to create clear paths for ML teams to thrive.

We’ll dive into real-world case studies demonstrating how ZenML has empowered organizations to streamline their ML pipelines, from experimentation to production. Attendees will learn how ZenML addresses common pain points such as reproducibility, scalability, and collaboration, enabling teams to focus on innovation rather than operational overhead.

Key topics include:

Navigating the MLOps tooling landscape with ZenML as your compass
Achieving seamless transitions from laptop to cloud deployments
Enhancing team productivity through standardized, yet flexible, ML workflows
Lessons learned from implementing ZenML in diverse industry settings

Whether you’re a data scientist, ML engineer, or team lead, you’ll gain practical insights on how to leverage ZenML to supercharge your ML initiatives and conquer the MLOps jungle.
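For a flavor of what this looks like in code, here is a minimal sketch using ZenML's public step and pipeline decorators (details vary by version; the steps are placeholders). The pipeline definition stays the same whether it runs on a laptop or a cloud stack:

```python
# Illustrative sketch: a tiny ZenML pipeline built from decorated steps.
from zenml import pipeline, step

@step
def load_data() -> list[float]:
    return [1.0, 2.0, 3.0]

@step
def train_model(data: list[float]) -> float:
    return sum(data) / len(data)  # stand-in for real training

@pipeline
def training_pipeline():
    data = load_data()
    train_model(data)

if __name__ == "__main__":
    training_pipeline()  # executes on whichever ZenML stack is active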

Talk: Effective Workflows for Delivering Production-Ready LLM Apps

Presenter:
Ariel Kleiner, CEO & Co-Founder, Inductor

About the Speaker:
Ariel Kleiner is the CEO and founder of Inductor, which enables teams to deliver production-ready LLM applications significantly faster, more easily, and more reliably. Ariel was previously at Google AI, co-founded Idiomatic, and holds a PhD in computer science (specifically, in machine learning) from UC Berkeley.

Talk Abstract:
Going from an idea to an LLM application that is actually production-ready (i.e., high-quality, trustworthy, cost-effective) is difficult and time-consuming. In particular, LLM applications require iterative development driven by experimentation and evaluation, as well as navigating a large design space (with respect to model selection, prompting, retrieval augmentation, fine-tuning, and more). The only way to build a high-quality LLM application is to iterate and experiment your way to success, powered by data and rigorous evaluation; it is essential to then also observe and understand live usage to detect issues and fuel further improvement. In this talk, we cover the prototype-evaluate-improve-observe workflow that we’ve found to work well, along with actionable insights on how to apply it in practice.
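As a minimal sketch of the evaluate step in this workflow (our illustration; the test cases and scoring heuristic are hypothetical, not Inductor's API), candidate prompts can be scored against a small suite of checks:

```python
# Illustrative sketch: score prompt variants against a small test suite.
TEST_CASES = [
    {"input": "Summarize: the cat sat on the mat.", "must_contain": "cat"},
    {"input": "Summarize: revenue grew 10% in Q3.", "must_contain": "revenue"},
]

def app(prompt_template: str, text: str) -> str:
    # Stand-in for a real LLM call that would use prompt_template.
    return text

def evaluate(prompt_template: str) -> float:
    passed = sum(
        case["must_contain"] in app(prompt_template, case["input"])
        for case in TEST_CASES
    )
    return passed / len(TEST_CASES)

for template in ["Summarize briefly: {text}", "TL;DR: {text}"]:
    print(template, "->", evaluate(template))
```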

Talk: Finding the Hidden Drivers of AI Business Value

Presenter:
Jakob Frick, CEO & Co-Founder, Radiant

About the Speaker:
Jakob Frick is the CTO and Co-founder of Radiant AI. Before that, he worked at Palantir Technologies across a range of areas, from COVID vaccine distribution work with the NHS, to national-scale cyber defense, to model integration across platforms. Earlier, he worked on open-source software at JP Morgan Chase.

Talk Abstract:
How do you know how well your AI products are actually working? In this talk, we will explore how companies are looking beyond evaluations to tie LLM activity to their business outcomes. We’ll look at case studies and examples from the field, as well as a framework for identifying the metrics that really move the needle in creating value with Generative AI.