2026 Abstracts

 
 
 
 
 

Keynote


From Models to Teammates: Operating, Monitoring and Trusting Agentic AI in Production

Akriti Chadda - Senior Applied Scientist

Agentic AI systems are changing not just how machine learning works, but how teams think, communicate, and make decisions. Unlike traditional models, agents plan, act, and adapt over time, often in non-deterministic ways. This creates a new leadership challenge: how do you build trust in systems that don't behave predictably, and how do you explain their risks, limitations, and failures to non-technical stakeholders?

This talk focuses on the human and organizational skills required to operate agentic AI in production. It explores how to communicate uncertainty, set realistic expectations and influence decision-making when metrics are incomplete and failures are subtle. Attendees will learn practical frameworks for framing agent behavior, aligning cross-functional teams and pushing back on over-automation when agents are the wrong abstraction.

Grounded in real-world production experience, this session helps data professionals grow beyond technical execution into thoughtful leadership, equipping them to guide teams, stakeholders, and organizations through the complexities of deploying agentic AI responsibly and effectively.

 
 
 
 

Invited Speakers


Every LLM Call Counts: The environmental cost of AI, and how data scientists can reduce it

Catherine Nelson - Data Scientist, ML Engineer, Author, Consultant

AI comes with a big environmental cost. Training and serving AI models consumes vast amounts of electricity, water, and raw materials. Already, AI accounts for around 15% of data center energy usage, and the energy demands of AI are projected to double by 2030. But as data scientists using AI, there are some things we can do to reduce our environmental footprint.

In this talk, I'll summarize the latest data on AI's environmental impact, looking in particular at OpenAI, Google, and Anthropic. I'll also highlight what data the providers aren't disclosing, and why that lack of transparency makes it harder for us to make good choices.

In the second half of the talk, I'll give you actionable steps you can take to reduce the impact when you're using an AI model. I'll show you techniques including prompt optimization, model selection strategies, and caching that reduce both environmental impact and costs. I'll also talk about how good evaluation data is essential for these techniques. You'll learn which models are most efficient, how model choices affect emissions, and you'll gain practical knowledge to make more sustainable choices.
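One of the techniques above, caching, can be sketched in a few lines: if the same prompt recurs, serve the stored response instead of issuing a new model call. This is an illustrative sketch under stated assumptions, not the speaker's implementation; `call_model` is a stand-in for any provider SDK call.

```python
import hashlib
import json

# A minimal in-memory response cache keyed on (model, prompt).
_cache: dict[str, str] = {}

def cached_completion(prompt: str, model: str, call_model) -> str:
    """Return a cached response when the same prompt/model pair repeats,
    avoiding a redundant (energy-consuming) API call."""
    key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt, model)
    return _cache[key]

# Usage with a stubbed model call that records how often it runs:
calls = []
def fake_model(prompt, model):
    calls.append(prompt)
    return f"echo: {prompt}"

cached_completion("summarize report", "small-model", fake_model)
cached_completion("summarize report", "small-model", fake_model)  # served from cache
```

Production caches would also bound memory and expire entries, but even this simple pattern eliminates repeated identical calls.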

 

How to Work with Your PM (When They Don't Speak AI)

Shaili Guru - AI product leader and educator

You've built something promising. You understand the technical tradeoffs. But your PM keeps asking for timelines you can't commit to, scope that doesn't make sense, or success metrics that miss the point.

After more than a decade on the PM side, I can tell you they're not being difficult on purpose. Data science projects just don't follow the rules they learned managing traditional software. That mismatch causes real problems.

I keep hearing the same frustrations from data scientists. Being treated like a request machine. PMs who have no clue what's easy versus hard in ML. Agile processes that try to cram research into two-week sprints. Work that stays invisible until it ships.

But here's what most data scientists don't see. Your PM is getting squeezed too. They're being asked to deliver roadmaps, defend your project against competing priorities, and translate your work for leadership (usually without the context they need to do it well).

This talk is about bridging that gap. I'll share what PMs are actually worried about when they push for certainty, why they frame things the way they do, and what helps them advocate for your work when you're not in the room.

We'll cover how to reframe uncertainty as risk, how to make the exploration phase work visible, and how to build a real partnership with your PM. Not just a transactional one.

You'll leave with approaches you can use in your next sprint planning: language that lands, ways to build trust, and how to educate without being condescending.

 

Leveraging AI to Support Evidence-Based Wildlife and Permit Management

Sridevi Narayana Wagle - Machine Learning Engineer, Pacific Northwest National Laboratory

The Hanford Site played a central role in the Manhattan Project, producing an extensive corpus of scientific, engineering, and operational records spanning multiple decades. These materials, ranging from technical reports and engineering drawings to photographic documentation, are essential for contemporary nuclear research, environmental remediation, and historical analysis. While this corpus is publicly accessible through the DOE Declassified Document Retrieval System (DDRS), its analytical value is significantly constrained by inconsistent metadata, limited document-level indexing, heterogeneous file formats, and the lack of full-text search capabilities.

We present a scalable AI-based framework for multimodal archival exploration that integrates semantic search, automated metadata enrichment, and interactive large language model (LLM) interfaces. Using AWS Bedrock embeddings in combination with the Claude 3.5 Sonnet model, our system extracts structured entities, infers relationships, generates technical summaries, and supports conversational querying over text and image-based content. The pipeline processed approximately 1.5 TB of legacy data, including 4 million TIF files, over 70,000 images, and 1,300 PDF documents. Automated deduplication, document reconstruction, and page-level segmentation enabled fine-grained indexing and embedding of previously inaccessible technical details.

The resulting multimodal search platform supports fuzzy matching, retrieval, and contextual filtering, allowing users to locate specific chemical compounds, process descriptions, construction specifications, or equipment references embedded deep within scanned reports or imagery. The AI-driven interface dynamically generates follow-on research questions and interactive knowledge graphs that expose cross-document linkages, enabling new forms of exploratory analysis across historical nuclear workflows and environmental impact data.
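As an illustration of the kind of fuzzy matching described above (not the platform's actual implementation), even Python's standard library can tolerate OCR noise in entity lookups. The index terms below are invented examples:

```python
from difflib import get_close_matches

# Hypothetical index of entity strings extracted from scanned reports.
index_terms = [
    "plutonium-uranium extraction",
    "reduction-oxidation process",
    "tributyl phosphate",
    "fuel cladding specification",
]

def fuzzy_lookup(query: str, terms: list[str], cutoff: float = 0.6) -> list[str]:
    """Return index terms whose similarity to the query exceeds `cutoff`,
    tolerating OCR noise and spelling variants."""
    return get_close_matches(query.lower(), terms, n=3, cutoff=cutoff)

print(fuzzy_lookup("tributyl phospate", index_terms))  # misspelled query still matches
```

A production system would pair this with embedding-based semantic search, but character-level fuzziness remains useful for misspelled chemical names and scan artifacts.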

This work demonstrates a methodology for transforming complex, low-accessibility scientific archives into AI-ready knowledge systems. Beyond Hanford, the approach establishes a technical foundation for applying advanced AI-driven discovery to other unique DOE collections, accelerating research, improving archival usability, and supporting future innovation.

 
 
 
 

 

When Time Tells: Using Sequence Modeling to Understand Transfer Student Retention

Hoda Soltani - Civil Engineer and Data Scientist, University of Oklahoma

This session presents an end-to-end predictive analytics framework for modeling student dropout, with a focus on transfer students at a four-year university. Student dropout remains one of the most multifaceted and pressing challenges in higher education, arising from a complex interplay of academic, social, economic, and institutional factors that limit both individual potential and broader social mobility.

This study conducts school-level retention prediction using university-specific administrative datasets that are not publicly available. Transfer students--those who have previously earned academic credit at another postsecondary institution--represent a large and academically diverse population whose successful integration into four-year institutions requires timely, evidence-based support informed by both historical academic pathways and early university performance.

The session presents a comprehensive predictive framework examining the academic histories and first-term outcomes of transfer students admitted to Engineering, Business, and Arts and Sciences over a three-year observation period. Drawing on principles from educational data mining, the analysis incorporates multidimensional features including sociodemographic attributes, pre-transfer coursework, enrollment intensity, academic load, financial aid, campus employment, and early indicators of academic engagement.

The modeling pipeline integrates supervised learning for binary classification (retained vs. non-retained), clustering methods to identify latent student subpopulations, and model-interpretation tools to support transparency. Central to the framework is the use of sequence modeling techniques--such as recurrent neural networks, gated recurrent units, and attention-based architectures--to capture temporal dependencies in students' academic trajectories. Rather than relying on static or summary-based features, these models learn patterns across semester-by-semester course enrollments and performance, enabling more accurate and earlier identification of dropout risk by modeling the order, timing, and evolution of academic behaviors. Methodological challenges, including class imbalance, overfitting, and domain-informed feature engineering, are explicitly addressed.
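As a concrete illustration of one preprocessing step such sequence models require (a generic sketch, not this study's pipeline): students have different numbers of enrolled semesters, so semester-by-semester records are typically right-padded to a fixed length with a parallel mask so the RNN/GRU/attention model can ignore the padding.

```python
def pad_sequences(seqs, max_len=None, pad_value=0.0):
    """Right-pad variable-length sequences of feature vectors and return a
    parallel 0/1 mask marking real vs. padded semesters."""
    max_len = max_len or max(len(s) for s in seqs)
    padded, masks = [], []
    for s in seqs:
        n_pad = max_len - len(s)
        padded.append(list(s) + [[pad_value] * len(s[0])] * n_pad)
        masks.append([1] * len(s) + [0] * n_pad)
    return padded, masks

# Two hypothetical students: 3 semesters vs. 1, two features each (GPA, credits).
students = [[[3.2, 12], [3.5, 15], [3.1, 9]], [[2.8, 6]]]
padded, masks = pad_sequences(students)
```

The mask is what lets an attention layer or recurrent model distinguish a short academic history from a long one without conflating padding with real behavior.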

 

A Data Science Approach to Quantifying Fish Passage Through Dams, Assessing Fish Injury, and Advancing Fisheries Research

Erin Zionce - Data Scientist, Pacific Northwest National Laboratory

Sandy Rech - Earth Scientist, Pacific Northwest National Laboratory

Dams disrupt the natural life cycles of migratory riverine fish, posing significant challenges to their survival. Addressing these connectivity issues requires interdisciplinary collaboration between biologists and data scientists. At Pacific Northwest National Laboratory (PNNL), researchers integrate ecological expertise with data science to study fish passage and survival. PNNL’s fish passage projects focus on anadromous fish species such as salmonids, using various tagging methods (e.g., radio telemetry (RT), balloon-tagging) and injury assessments to analyze migration through dams.

Two studies conducted at U.S. Army Corps of Engineers operated dams – Mud Mountain Dam (MMD) in Washington State and Foster Dam in Oregon – used RT to evaluate fish passage and survival. At MMD, adult Chinook salmon (Oncorhynchus tshawytscha) implanted with RT tags were tracked as they returned to spawning grounds from the ocean via a Fish Passage Facility. At Foster Dam, RT-tagged juveniles were monitored to evaluate survival rates and travel times during ocean-bound migration. Efforts to automate fish tracking and streamline data analysis aimed to reduce manual data processing. However, challenges like the noisy nature of RT data and unpredictable fish behavior required tailored algorithms to ensure accurate results.

A third study conducted at Howard A. Hanson Dam (HAHD) in Washington State used balloon-tagging with complementary injury assessments to evaluate the biological consequences of dam passage through specific routes. Traditional injury assessment methods at HAHD rely on intensive fish handling and manual assessment, introducing potential human bias, variability, and stress to fish. As an extension of this study, a proof-of-concept approach leveraging AI-driven image analysis was developed to automate and standardize injury assessments, while reducing human bias and minimizing stress to fish.
Together, these three projects demonstrate the importance of interdisciplinary research to improve the evaluation of fish passage and survival, and to support the conservation of salmonid populations.

 

AI Beyond English: Building Multi-Lingual and Non-English AI Solutions

Rachel Wagner-Kaiser - Director, NLP Data Scientist

We will address the core challenges technical teams face when building effective AI solutions for non-English languages, reinforced by real-life examples. We will outline the complexity of non-English data, from tackling non-Latin character sets and low-resource languages to the practical hurdles of transforming unstructured data (like images and audio) into usable text. We will also survey different technical approaches, including the complexity of language detection and cross-language processing techniques. The session will also analyze the current role and limitations of LLMs across diverse languages. We will conclude with best practices for designing and deploying high-performance, multilingual NLP systems that deliver value for practical business use cases.
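As one small example of why language detection is harder than it looks, a first-pass script check can be built from the Unicode character database alone. This heuristic is illustrative only; it identifies the writing system, not the language, and is no substitute for a full language-identification model:

```python
import unicodedata
from collections import Counter

def dominant_script(text: str) -> str:
    """Guess the dominant writing system by inspecting Unicode character
    names -- a cheap first-pass signal before full language detection."""
    scripts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            # Unicode names begin with the script, e.g. "CYRILLIC SMALL LETTER A".
            scripts[name.split(" ")[0]] += 1
    return scripts.most_common(1)[0][0] if scripts else "UNKNOWN"

print(dominant_script("Привет мир"))   # CYRILLIC
print(dominant_script("Hello world"))  # LATIN
```

The limits show up immediately: script does not determine language (Serbian uses two scripts; dozens of languages share Latin), which is exactly the kind of complexity the session addresses.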

 

Agentic AI as Your Personal Wellness Coach

Riya Joshi - Data and Applied Scientist, Microsoft AI

In today's fast-paced world, maintaining healthy daily routines is challenging, yet critical for overall well-being. Traditional wellness applications largely offer passive tracking and generic recommendations, leaving users to make decisions without actionable guidance. This talk introduces an agentic AI framework designed to proactively optimize daily habits by integrating wearable data, predictive modeling, and real-time decision-making.

Our system collects physiological and activity data from wearable devices such as the Apple Watch via HealthKit APIs. A lightweight iOS frontend captures and transmits data to a Python-based backend, where the agentic AI resides. The AI models the user's current state--including sleep quality, fatigue, heart rate variability, and activity trends--and predicts near-future wellness metrics. Using a combination of rule-based policies and reinforcement learning, the agent recommends and delivers personalized interventions, such as exercise prompts, diet adjustments, or sleep optimization strategies. Notifications and actionable guidance are pushed back to the user in real time, creating a closed-loop feedback system that continuously adapts to the user's behavior and goals.
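A toy sketch of the rule-based half of such a policy is below. All thresholds and state fields are invented for illustration; the system described above combines rules like these with reinforcement learning rather than using them alone.

```python
# Map a modeled user state to a single intervention via simple wellness rules.
# Thresholds here are illustrative, not clinical guidance.

def recommend(state: dict) -> str:
    """Pick an intervention from simple, ordered wellness rules."""
    if state.get("sleep_hours", 8) < 6:
        return "sleep_optimization"
    if state.get("hrv_trend", 0) < 0 and state.get("fatigue", 0) > 0.7:
        return "rest_day"
    if state.get("active_minutes", 0) < 20:
        return "exercise_prompt"
    return "no_action"

print(recommend({"sleep_hours": 5.5}))                      # sleep_optimization
print(recommend({"sleep_hours": 7, "active_minutes": 10}))  # exercise_prompt
```

Rule ordering encodes priorities (sleep debt before activity nudges), which is one of the design choices an RL layer can later learn to override per user.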

This agentic approach transforms wellness applications from passive trackers into proactive personal coaches. By demonstrating a real-time prototype integrating wearable data, predictive modeling, and dynamic intervention, this work highlights the potential of agentic AI to improve habit adherence, enhance physical and mental health, and empower users with personalized, adaptive decision support. This talk will cover the system architecture, agentic decision logic, and potential avenues for future research in AI-driven wellness, making it accessible to both technical and non-technical audiences.

 

Developing hybrid KG-LLM solutions for reliable information extraction

Anahita Pakiman - Senior Knowledge Graph Engineer & Semantic Architect, Amazon

Qualification processes in industrial settings require accurate, equipment-specific inspection criteria from technical documentation and perfect execution to ensure deliverable quality and minimize post-launch downtime and claims. This presents a challenging science problem: how to extract and generate reliable inspection recommendations from heterogeneous data sources, without hallucinations.

We developed a hybrid Knowledge Graph - LLM solution that addresses fundamental limitations of LLM-only approaches. Initial LLM-only approaches exhibited significant hallucinations, generating unreliable inspection values and recommendations that couldn't be validated against domain constraints, prompting our hybrid KG-LLM solution.

Our methodology employs a domain-specific KG that captures semantic relationships between equipment types, failures and inspection requirements. This domain-specific KG extracts and links entities from diverse historical data sources, including failures and unstructured technical documentation, creating a comprehensive semantic network for constrained generation.

By using graph patterns to constrain LLM inputs, we transformed the task from open generation to structured information insertion, significantly reducing hallucinations. Results demonstrate substantial improvement in inspection recommendation accuracy and consistency, while maintaining extraction efficiency. The methodology offers generalizable findings for bridging structured and unstructured data in domains requiring high precision in AI outputs.
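A minimal sketch of this constrained-generation pattern shows the shape of the idea: the LLM selects from KG-approved values, and anything outside them is rejected. The graph contents and the `ask_llm` stub below are invented for illustration, not the authors' actual schema.

```python
# Hypothetical knowledge graph: (equipment, failure mode) -> allowed values.
KG = {
    ("pump", "seal_failure"): {"methods": ["dye penetrant", "visual"], "interval_days": [30, 90]},
    ("valve", "corrosion"):   {"methods": ["ultrasonic"],              "interval_days": [180]},
}

def constrained_recommendation(equipment: str, failure: str, ask_llm) -> dict:
    """Constrain generation to values present in the KG; reject anything else
    instead of passing a hallucinated value downstream."""
    allowed = KG[(equipment, failure)]
    answer = ask_llm(equipment, failure, allowed)  # LLM may only pick from `allowed`
    if answer["method"] not in allowed["methods"]:
        raise ValueError("hallucinated inspection method")
    return answer

# Stub "LLM" that just picks the first allowed option:
stub = lambda eq, f, allowed: {"method": allowed["methods"][0],
                               "interval_days": allowed["interval_days"][0]}
print(constrained_recommendation("pump", "seal_failure", stub))
```

The key move is that the graph supplies the candidate set before generation and validates after it, turning the LLM into a selector/filler rather than an open-ended generator.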

 

From Bots to Bookings: Agentic AI in the Real World @ Expedia

Emma Rosenthal - Data Scientist, Expedia Group

Stephanie Chen - Senior Manager, Data Science, Expedia Group

Agentic AI is reshaping how we build and interact with data systems - and at Expedia, we're harnessing its power to redefine both customer experiences and internal workflows. In this session, we'll share a two-fold perspective on practical implementations of agentic AI in industry.

First, we'll explore how Expedia is integrating conversational AI into our checkout experience. By enabling users to connect Expedia with ChatGPT, travelers can browse products and book trips directly through AI-driven workflows. We'll discuss the architecture behind this integration, the strategies for measuring AI-driven user behavior, and the challenges and opportunities of embedding agentic AI into a high-stakes e-commerce environment.

Second, we'll turn inward to examine how AI is transforming the way data scientists work. From intelligent agents that automate repetitive tasks to workflow optimizations and productivity "hacks", we'll showcase how AI tools are accelerating analytics, improving decision-making, and freeing teams to focus on high-value insights. Attendees will gain practical ideas for leveraging AI in their own organizations--whether to enhance customer-facing products or to streamline internal processes.

Join us for a candid look at the promise and limitations of agentic AI in real-world applications, and learn how Expedia is navigating this rapidly evolving landscape to deliver smarter experiences for travelers and data professionals alike.

 

From Individual Contributor to Data Leader: How to Unblock your team & Influence Strategy

Shikha Verma - Senior Manager, Analytics, Toast

The transition from individual contributor to manager is one of the most challenging career shifts in data science--particularly for women, who represent only 26% of data science roles and an even smaller fraction of technical leadership in the US. This gap widens at the management level, where many talented women ICs hesitate to pursue leadership or struggle with the transition because the playbook is unclear.

This session shares hard-won lessons from my journey as a PhD-trained data scientist turned manager of a team of 5. I'll address the identity crisis many of us face: "If I'm not the best technically anymore, what's my value?" and provide a concrete roadmap for becoming a leader who is technical enough to unblock your team and strategic enough to influence the roadmap.

You'll walk away with clear frameworks to apply immediately:

-- The 70-20-10 rule: How to evolve your time allocation across technical work, enablement, and strategy
-- The "Am I the Bottleneck?" test: Weekly self-assessment to identify where you're helping vs. hindering
-- The "Strategic Value" filter: Prioritization framework for ruthless decision-making
-- The "Technical Enough" checklist: Know when to dive deep vs. delegate

Actionable insights on:

-- What to unlearn from the IC mindset (value = output → value = team's multiplied impact)
-- Where to stay technical for high leverage (design reviews, unblocking) vs. where to let go (being the fastest coder)
-- How to build strategic influence through stakeholder mapping, translating analytics to business language, and saying no effectively

This is for any woman in analytics considering leadership, newly managing, or struggling with the IC-manager balance. Leave with a clear mental model and practical tools to accelerate your transition and lead with confidence.

 

GeoAI for the Built Environment: Siting and Permitting

Anastasia Bernat - Senior Data Scientist, Pacific Northwest National Laboratory

How do we make sure AI doesn't get lost in space, especially when agencies need to make coordinated decisions on a myriad of environmental reviews for projects planned on U.S. lands? Too often these projects are delayed or over budget due to poor coordination with a variety of federal, state, and local laws dependent on the geographic location of the project site. However, geospatial artificial intelligence (GeoAI) has the potential to transform the pace and precision of permitting.

PermitAI is a multimodal large language model testbed led by the Pacific Northwest National Laboratory that uses GeoAI to streamline the environmental permitting review process by turning millions of permitting maps into structured geointelligence. Digitization efforts focus on turning vast document and map repositories amassed through the National Environmental Policy Act (NEPA) into a spatially coherent Geographic Information System (GIS) dataset that charts decades of environmental review across agencies, scales, and formats. This includes capturing key geospatial data that agencies fundamentally rely on to scope baseline environmental conditions, communicate alternatives, weigh footprint constraints, and track mitigations. By then enriching, automating, and generating geospatial data from vast and heterogeneous government georegistries, GeoAI can rapidly build cohesive spatial reasoning for streamlined interagency coordination.

This presentation will highlight a GeoAI data pipeline that is transforming how agencies geovisualize and analyze permitting data, reducing time spent navigating static documents and setting the foundations for integrating historic NEPA GIS layers into modern, information-rich digital permitting platforms and decision-support systems.

 

Soft Skills Are Not Optional: Why Early-Career Data Professionals Need Them Most

Swapnil Agrawal - Data Scientist, Microsoft

A common belief among early-career professionals is that soft skills are something to worry about later, once you become a manager or a leader. Early on, the focus is often placed solely on technical excellence: writing better code, building better models, and delivering accurate results. While technical skills are essential, they are only the starting point.

In reality, soft skills matter more than ever at the beginning of a career. Early-career data professionals frequently work in ambiguous environments, collaborate across teams, and translate complex insights to non-technical stakeholders. Without strong communication, collaboration, and storytelling skills, even the best analysis can fail to influence decisions or create impact.

This talk challenges the myth that soft skills are only relevant for managers and leaders. Drawing from real experiences transitioning from entry-level to mid-level roles, the session demonstrates how early investment in communication, influence, and collaboration accelerates career growth, increases visibility, and builds trust with stakeholders.

Attendees will learn practical techniques to structure compelling data stories, align analysis with business goals, handle pushback on insights, and influence decisions without formal authority. The talk also explores how to demonstrate leadership behaviors--such as ownership, clarity, and empathy--regardless of title.

By reframing soft skills as career accelerators rather than optional extras, this session equips early-career data professionals to maximize their impact, navigate complex organizations, and grow faster and more intentionally in their careers.

AI As Your Personal Data Science Intern

Nandita Krishnan - Data Scientist

Many data professionals find themselves spending more time debugging AI-generated code than they would have spent writing it themselves, defeating the entire purpose. And then there are those who have abandoned AI tools entirely after repeated, frustrating experiences. This gap between AI’s potential and its practical application in real-world data science work remains frustratingly wide. 

This talk bridges that gap by providing practical strategies for leveraging agentic AI, such as Cursor and Claude Code, as your ‘personal intern’: one that can actually excel when you give the right amount of supervision and guidance. Drawing from hands-on experience implementing AI tools for building machine learning models, creating automated pipelines, and generating analysis visualizations, I’ll share concrete strategies that separate productive AI use from time-wasting rabbit holes. 

You’ll learn how to identify which tasks benefit most from AI assistance and which are better done manually. I’ll demonstrate prompt and context engineering techniques, including dos and don’ts, to help you avoid common pitfalls. We’ll explore how to establish verification workflows that catch errors early, implement guardrails that prevent catastrophic mistakes, and create feedback loops that improve AI output over time. I‘ll also talk about how to leverage built-in memory features to maintain context across sessions, configure custom rules that enforce your coding standards automatically, and use context files to give AI the proper background knowledge for your specific projects. This is about making agentic AI a reliable partner in your daily work, not just another technology to manage.


 

Forecasting You: How Data Science Powers Personalized Marketing

Ojasvi Khanna - Data Scientist

From the ad-sponsored content we scroll past, to the products we are shown, to the emails that land in our inboxes, personalized ads are quietly influencing countless everyday customer buying decisions. Personalization is no longer a niche application of data science--it is the backbone of modern digital marketing. Today, the $650 billion industry of personalized marketing is being rapidly reshaped by AI and is projected to grow beyond $1.5 trillion by 2035.

While personalized systems often feel intuitive or even magical, the reality is more nuanced. At its core, personalization is a forecasting problem: making informed but uncertain predictions about what future you will do. Understanding this framing helps demystify why personalized marketing works when it does--and why it sometimes fails.

This talk breaks down personalized marketing from the ground up. It explains what data science models power these systems, and how recent advances in AI are accelerating both their scale and their impact. The session covers foundational ideas, modeling approaches, and emerging AI-driven use cases, offering a high-level understanding of how data scientists and AI professionals model, execute, monitor, and evaluate such personalized marketing campaigns in industry.

Lastly, this talk also tackles an important but often overlooked question: why better models and more powerful AI do not always lead to better outcomes. Such challenges reveal a critical truth for data professionals--performance improvements on paper do not always translate into healthier, more meaningful real-world impact.

Although the examples focus on personalized marketing, the lessons extend far beyond it. This session is designed to equip data professionals with a clearer, more grounded understanding of how AI-driven personalization works--and how to build predictive systems that are not just smarter, but more thoughtful and responsible.

 

More Than a Retrain: How to Monitor, Diagnose, and Explain Drift in Production ML Models

Aashreen Raorane - Senior Data Scientist

Model degradation in production rarely comes from a single failure--it emerges through subtle shifts in data, upstream pipelines, or behavioral patterns. In real-world environments, a retrain alone doesn't fix these issues. What teams need is a systematic way to detect, diagnose, and explain drift.

This session presents a practical, tool-agnostic framework for understanding model drift based on lessons learned from validating and comparing production models. We will cover:

  1. Detection: identifying feature drift, prediction distribution changes, and version-to-version inconsistencies through input/output checks

  2. Diagnosis: tracing issues to upstream data shifts, schema changes, data quality problems, or model logic mismatches

  3. Explanation: translating technical findings into clear narratives for stakeholders to support retraining, rollback, or remediation decisions
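One simple, tool-agnostic statistic consistent with the detection step is the Population Stability Index (PSI), which compares a production sample's distribution against a training-time reference. The sketch below is illustrative (1-D numeric features, equal-width bins, the commonly cited 0.2 alarm threshold), not a complete monitoring system:

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a reference (training) sample and a
    production sample; values above ~0.2 are commonly treated as drift."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[-1] = float("inf")  # catch values above the reference max

    def frac(sample, i):
        n = sum(1 for x in sample if edges[i] <= x < edges[i + 1])
        return max(n / len(sample), 1e-6)  # floor to avoid log(0)

    return sum((frac(actual, i) - frac(expected, i))
               * math.log(frac(actual, i) / frac(expected, i))
               for i in range(bins))

train = [i / 100 for i in range(100)]          # roughly uniform on [0, 1)
shifted = [0.5 + i / 200 for i in range(100)]  # mass moved to the upper half
print(round(psi(train, train), 4))  # identical distributions score ~0
print(psi(train, shifted) > 0.2)    # shifted distribution trips the alarm
```

Computed per feature and per model version, a statistic like this gives the input/output checks in step 1 a concrete, thresholdable signal without any dedicated MLOps infrastructure.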

Attendees will gain actionable techniques for monitoring model health and ensuring ML systems remain accurate, stable, and trustworthy over time--without requiring advanced ML Ops infrastructure.


 

Beyond the Prompt: Building Autonomous AI Agents for High-Stakes Adversarial Environments such as Finance, Fraud & Abuse

Sneha Sivakumar - Product Leader

This talk provides practical, proven methods for building AI agents for real-world, high-stakes business processes such as combating fraud. We move past simple chatbots to explore how autonomous agents can reason, pivot, and act to stop bad actors in real time. We also address how to integrate with existing systems and how to prepare your data for a successful launch. The talk will cover (1) how to choose a business problem ready for the agentic revolution, (2) data preparation and labelling, (3) achieving a high, 99% precision, and (4) launching and learning from the agent outcomes.
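One common way a precision target like 99% is operationalized (shown here as an illustrative sketch, not the speaker's method) is to tune the decision threshold on labeled validation scores, keeping the lowest threshold that still meets the target:

```python
def threshold_for_precision(scores, labels, target=0.99):
    """Return the smallest score threshold whose precision on the labeled
    validation data still meets `target`; None if unattainable."""
    best = None
    for t in sorted(set(scores), reverse=True):
        flagged = [label for s, label in zip(scores, labels) if s >= t]
        precision = sum(flagged) / len(flagged)
        if precision >= target:
            best = t  # keep lowering the threshold while precision holds
        else:
            break
    return best

# Toy validation data: model scores and true fraud labels (1 = fraud).
print(threshold_for_precision([0.95, 0.9, 0.8, 0.7, 0.6],
                              [1, 1, 1, 0, 1], target=0.9))
```

The trade-off is recall: everything below the chosen threshold goes uninspected or to a human queue, which is why the problem-selection step matters.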

 

Biomanufacturing for a better world

Erin Wilson - Data Scientist

Industrial biomanufacturing. The phrase doesn’t quite evoke “social good” imagery, but with a little more context, I hope it will next time you hear it! Biomanufacturing is a sector of industry that aims to produce valuable materials by harnessing the vast catalog of molecules made by Nature. While these molecules already exist in natural forms, many require painstaking, unsustainable, extractive processes to isolate key ingredients at scale. We can be more clever: by taking genetic instructions from organisms that naturally produce a useful molecule and installing those instructions in a microbe, we can grow tanks of microbes that “brew” such molecules instead. An ecosystem of biomanufacturing companies is already growing: many feed microbes with renewable inputs, like sugar, while others capitalize on waste streams from other industries, such as gaseous carbon emissions.

To make a dent in climate change, biomanufacturing needs to get big. Industrial scale! While steel pipe networks wrapping around building-size bioreactors may contrast typical leafy green motifs of environmental sustainability work, industrial biomanufacturing is poised for social impact. We can reduce environmental harms caused by agricultural land use and pollution, provide alternatives that displace fossil carbon-based products, and even capture and repurpose carbon emissions before they enter the atmosphere. Biomanufacturing beautifully blends the mechanical and microbial for sustainability.

Many biomanufacturing approaches are maturing, but most are critically held back by underdeveloped data practices. We need better measurement equipment that can detect small-but-critical changes in biological systems; software that can predict and alert when such changes will trigger upsets to operations; deep domain experts that can ALSO troubleshoot with effective data analysis and visualization. Properly applied data science and engineering can help create a clearer window into the complexities of industrial biomanufacturing, and accelerate the field’s progress towards a healthier, more sustainable planet.

 

Designing Reliable Agentic AI Systems: Design Patterns for Production

Harsheeta Venkoba Rao - Founding Software Engineer

Agentic AI systems promise autonomy, adaptability, and powerful multi-step reasoning, but deploying them in production introduces challenges that traditional machine learning systems were never designed to handle. As systems move beyond single prompts to stateful, tool-using workflows, teams often encounter unpredictable behavior, silent quality degradation, and growing concerns around reliability, cost, and trust. This talk focuses on why these challenges emerge and how data professionals can think more systematically about building agentic AI systems that are reliable, observable, and safe.

The session begins by establishing a clear and accessible understanding of what makes a system “agentic,” contrasting prompt-based LLM pipelines with systems that reason over time, interact with external tools, and maintain context across steps. Using real-world scenarios, the talk highlights why agentic systems behave differently from traditional models and why familiar evaluation and monitoring approaches are often insufficient.

The core of the talk introduces design patterns for production agentic systems, emphasizing principles rather than tools. It explores how thoughtful system architecture, intentional monitoring, and well-placed guardrails can improve reliability without limiting usefulness. Monitoring is framed as a design choice rather than an afterthought, helping teams detect issues early, understand system behavior, and maintain confidence as systems evolve.

The talk concludes by examining the tradeoffs between autonomy and control and offering guidance on when agentic architectures add meaningful value and when simpler approaches may be more effective. Attendees will leave with a clear mental model for agentic AI systems, an understanding of why reliability is difficult but achievable, and practical principles they can apply when designing, evaluating, or deploying agentic systems in production environments.

 

Model Context Protocol (MCP): The Next Frontier of Generative AI

Neelam Koshiya - Principal Applied AI Architect

As generative AI moves from experimentation to enterprise-scale adoption, the need for structure, control, and context becomes critical. Enter the Model Context Protocol (MCP)--a new paradigm that standardizes how applications communicate with foundation models using modular, context-rich instructions. MCP enables safer, more interpretable, and reusable GenAI workflows by separating business logic from prompts and embedding policy and governance into interactions.

MCP has evolved into the universal "USB-C for AI," solving the critical "disconnected models" problem by standardizing how Large Language Models (LLMs) securely access diverse enterprise data and tools. This session explores why organizations are replacing brittle, custom-coded connectors with this model-agnostic layer to eliminate vendor lock-in, and how its client-server architecture--built on JSON-RPC--enables seamless integration with platforms like Claude, Bedrock, and Microsoft Copilot. Attendees will learn what capabilities can be unlocked through MCP's core primitives--Resources, Tools, and Prompts--and when to leverage the 10,000+ public integrations already available in the ecosystem to move from pilot projects to full-scale, agentic production deployments.
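As a concrete illustration of the client-server architecture described above, MCP messages are plain JSON-RPC 2.0 payloads. The sketch below constructs two such requests in Python; `tools/list` and `tools/call` are core MCP methods, while the tool name and arguments shown are invented for illustration.

```python
import json

# MCP clients and servers exchange JSON-RPC 2.0 messages.
# A client discovers a server's tools, then invokes one by name.
list_tools = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

call_tool = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "query_orders",              # hypothetical tool name
        "arguments": {"customer_id": "C-42"},  # hypothetical arguments
    },
}

# Serialize for transport (stdio or HTTP, depending on the server).
print(json.dumps(call_tool, indent=2))
```

Because every integration speaks this same envelope, swapping one model host or tool server for another changes the endpoint, not the protocol.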

 

Surfacing Hidden Potential: ML-Driven Selection and Causal Inference for Rare Event Prediction in Partner Ecosystems

Somang (So) Han - Data Scientist

We present a machine learning framework for prioritizing high-potential partners in a large-scale ecosystem where meaningful business events are extremely rare.

To validate model-driven selection, we design a hybrid experimental framework combining randomized controlled trials for ML-selected partners with Synthetic Difference-in-Differences (SDID) for heuristic-selected partners where randomization is infeasible. A key challenge is statistical power under rare events and anticipated 50% non-compliance. To enable decision-making under these constraints, we adopt Bayesian inference with conjugate Beta-Binomial updating and Monte Carlo sampling for compliance-adjusted causal estimands (ITT, LATE). Rather than frequentist significance thresholds, we apply posterior probability thresholds calibrated to internal experimentation standards, enabling principled decisions when classical power is unattainable.
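A minimal sketch of the Beta-Binomial updating and Monte Carlo step described above follows; the priors, event counts, and compliance rate are hypothetical illustrations, not the authors' actual values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rare-event counts for treatment and control partners.
treat_events, treat_n = 12, 4000
ctrl_events, ctrl_n = 5, 4000
compliance_rate = 0.5  # anticipated 50% non-compliance

# Conjugate Beta-Binomial updating: a Beta(1, 1) prior plus Binomial
# likelihood yields a Beta posterior over each group's event rate.
post_treat = rng.beta(1 + treat_events, 1 + treat_n - treat_events, 100_000)
post_ctrl = rng.beta(1 + ctrl_events, 1 + ctrl_n - ctrl_events, 100_000)

# Monte Carlo samples of the compliance-adjusted causal estimands.
itt = post_treat - post_ctrl     # intent-to-treat effect
late = itt / compliance_rate     # local average treatment effect (LATE)

# Decision rule: posterior probability of a positive effect, compared
# against an internal threshold rather than a frequentist p-value.
prob_positive = (itt > 0).mean()
print(f"P(ITT > 0) = {prob_positive:.3f}, median LATE = {np.median(late):.5f}")
```

The posterior-probability threshold replaces a significance test, which is what makes decisions possible when rare events leave classical power out of reach.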

This work contributes methodologies for (1) rare event prediction under extreme class imbalance, (2) hybrid causal inference combining RCT and quasi-experimental approaches, and (3) Bayesian decision frameworks for resource-constrained experiments.

 

Taxonomy-Agnostic Hybrid Recommendation System for Procurement Classification

Ayushi Das - Data Scientist

Spend classification is a core capability in procurement management, enabling strategic sourcing, supplier negotiation, cost optimization, and accurate financial reporting. However, classifying purchase orders (POs) and invoices remains challenging due to noisy and unstructured inputs, limited labeled data, evolving taxonomies, and ambiguous category definitions. Traditional supervised approaches struggle to generalize across such complex procurement environments.

This paper presents a taxonomy-agnostic, supplier-aware dual-expert recommendation architecture that combines LLMs with trained embedding-based semantic retrieval for robust procurement classification. The system leverages hierarchically grounded taxonomy descriptions, automatically generated and refined using LLMs, to improve semantic alignment between items and category scopes. Domain-specific embedding models are trained to enhance semantic search accuracy across noisy item descriptions, invoice text, and taxonomy metadata.

The dual-expert design consists of: (1) a retrieval expert that performs hybrid semantic search over taxonomy data, historical procurement records, and supplier intelligence, including normalized supplier descriptions and LLM-generated supplier tags; and (2) a fine-tuned LLM-based reranking expert that performs item-centric classification using structured reasoning, forced decision logic, and supplier-based validation signals. Prompt optimization of the reranking expert improves ranking precision and decision consistency without requiring model fine-tuning or retraining.
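The retrieve-then-rerank flow of the dual-expert design can be sketched as follows. This is a toy stand-in under stated assumptions: a bag-of-words cosine replaces the trained embedding model, a rule on supplier tags replaces the LLM reranker, and the taxonomy entries are invented.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy embedding: bag of lowercase tokens (real systems use trained models).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical taxonomy with generated scope descriptions.
taxonomy = {
    "IT Hardware": "laptops monitors servers computer hardware peripherals",
    "Office Supplies": "paper pens staplers folders office consumables",
    "Facilities": "cleaning maintenance repairs building services",
}

def retrieve(item: str, k: int = 2) -> list:
    # Retrieval expert: rank taxonomy categories by semantic similarity.
    q = embed(item)
    ranked = sorted(taxonomy, key=lambda c: cosine(q, embed(taxonomy[c])),
                    reverse=True)
    return ranked[:k]

def rerank(item: str, candidates: list, supplier_tags: set) -> str:
    # Stand-in for the LLM reranking expert: prefer a candidate consistent
    # with supplier metadata, else keep the retrieval expert's top pick.
    for c in candidates:
        if c in supplier_tags:
            return c
    return candidates[0]

item = "laptops for new engineering hires"
candidates = retrieve(item)
best = rerank(item, candidates, supplier_tags={"IT Hardware"})
print(candidates, "->", best)
```

The key design point survives the simplification: retrieval narrows a large taxonomy cheaply, and the reranker spends its reasoning budget only on the shortlisted candidates.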

The system is evaluated across multiple taxonomies and achieves 85% top-5 accuracy on POs and 90-95% on invoices, outperforming the strongest baseline by approximately 20% and 51%, respectively. Recent enhancements yield 75-80% top-1 accuracy in real-world invoice classification. Error analysis indicates that remaining failures primarily arise from taxonomy design limitations, such as overlapping categories and insufficiently defined scopes.

Beyond accuracy gains, this work contributes automated taxonomy scope generation, supplier-aware classification via LLM-derived metadata, and a scalable, production-ready framework that adapts to evolving taxonomies and unstructured data without retraining, demonstrating strong applicability across enterprise procurement environments.

 

When (and When Not) to Leverage Agentic AI: Practical Lessons from Building Projects and Autonomous Data Workflows

Alisha Gala - Senior Data Scientist

Booma Sowkarthiga Balasubramani - Senior Data Scientist

Jingyi Du - Principal Data Science Manager

Agentic AI is increasingly promoted as the default paradigm for intelligent systems, promising autonomy, flexibility, and productivity. Yet as agent-based designs move from demos into real-world data and decision workflows, teams encounter under-discussed challenges: unclear evaluation metrics, hidden operational costs, reliability risks, and ambiguous boundaries between human and machine control.

This talk takes a pragmatic view of Agentic AI through three concrete case vignettes:

  1. Creative matching pipeline (image→theme→verse): A deterministic workflow outperforms agentic orchestration on latency, predictability, and explainability—illustrating when agentic behavior is not needed.

  2. Experimentation-analysis agent: Reads specifications, aggregates key metrics, and produces rollout recommendations—but doesn’t execute rollout decisions. Instead, the system surfaces tradeoffs, confidence signals, and guardrail checks for human review. This shows how Agentic AI adds value as analytical decision support without crossing into autonomous control in high-risk contexts.

  3. Autonomous anomaly investigation agent: Plans queries, revises hypotheses, and proposes remediation. It demonstrates genuine agentic properties—and failure modes that grow as control is delegated: error amplification from early wrong assumptions, persuasive but false confidence, observability gaps, and evaluation blindness when teams track completion rather than decision quality, stability, and human override cost.

Across these examples, we analyze Agentic AI as a spectrum of architectural decisions—from simple pipelines to systems that adapt their level of autonomy under uncertainty. We share a decision framework for when autonomy earns its place (including stakes, reversibility, observability) and an evaluation toolkit that operationalizes success beyond task completion (correct-action rate, steps/time saved, latency inflation, etc.).
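The kind of evaluation the toolkit argues for, scoring decision quality rather than mere task completion, can be sketched as below. The episode fields and baseline numbers are hypothetical placeholders, not metrics from the talk's actual systems.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    completed: bool
    action_correct: bool     # judged against a rubric, not just "finished"
    agent_seconds: float
    baseline_seconds: float  # time the manual/deterministic path would take

# Hypothetical logged episodes from an agent run.
episodes = [
    Episode(True, True, 40.0, 300.0),
    Episode(True, False, 55.0, 300.0),   # completed, but wrong decision
    Episode(False, False, 120.0, 300.0),
]

completion_rate = sum(e.completed for e in episodes) / len(episodes)
correct_action_rate = sum(e.action_correct for e in episodes) / len(episodes)
time_saved = sum(e.baseline_seconds - e.agent_seconds for e in episodes)

print(f"completion={completion_rate:.2f} "
      f"correct={correct_action_rate:.2f} saved={time_saved:.0f}s")
```

Tracking both rates side by side surfaces exactly the failure mode the third vignette warns about: a high completion rate can mask a much lower correct-action rate.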

Attendees will leave with actionable criteria for deciding when Agentic AI is justified, methods for evaluating its real impact, and ways to avoid common pitfalls, including a clear rubric for defending “no agent” designs when they are the safer, faster, and more reliable choice.