Article
The rise of Artificial Intelligence (AI) is often met with polarized views, with one side heralding its potential to revolutionize the world and the other warning of its perceived threat to professionals and to humanity as a whole. However, there is a different and far more nuanced concern, one that isn't about AI replacing humans but about AI turning people, particularly new learners, into “mindless zombies.” This phenomenon, which we'll call the "AI Zombocalypse," is characterized by professionals becoming overly reliant on AI tools and ultimately losing their critical thinking and problem-solving abilities. While it may sound hyperbolic, this trend is not just an abstract possibility but a present danger, particularly for those just starting their careers. They are at risk of developing shallow, unstructured thinking patterns that lack the depth, creativity, and analytical rigor necessary to solve complex problems. This article explores why AI-induced mindlessness is a greater threat than AI itself, and why the current generation of learners is uniquely vulnerable to it.

The Alluring Power of AI and the Danger Beneath

AI tools are incredibly effective at getting things done quickly, which creates a sense of exhilaration, especially for those who are new to a field. They provide results that look polished on the surface and offer an illusion of completeness. But there is often a catch: when you dig deeper into these AI-generated results, you frequently find the same ideas repeated in different forms, a lack of originality, or a vacuousness that becomes apparent on closer inspection. In short, AI can deliver quantity at the expense of quality, producing content that may look good on paper but fails to hold water under critical evaluation.
This allure of quick, seemingly accurate solutions is akin to a drug: an instant gratification that is hard to resist, especially for new learners who are keen to make an impression or solve a problem quickly. However, just as a drug masks underlying issues rather than solving them, AI can obscure the learner's understanding, often bypassing essential skills in critical thinking, debugging, and problem decomposition.

Evidence of the Problem: Uplevel's Findings

The issue of blind AI reliance is supported by real-world data. A study conducted by Uplevel examined about 800 developers over three months of using GitHub Copilot, the AI-powered coding assistant from Microsoft-owned GitHub. The results were stark: there were "no significant improvements for developers" using Copilot compared with the previous three months without it, and, in fact, 41% more bugs were introduced when using AI assistance. This suggests that, in this study, AI assistance delivered no measurable productivity gain and may even have harmed code quality. New learners are particularly prone to these pitfalls, as they may lack the ability to properly vet AI-generated solutions and instead accept them blindly. This reinforces the point that, far from enhancing developer productivity, AI can actually hinder the development of the critical coding and debugging skills that are essential for quality work.

The Disappearance of Debugging Skills and the “Streetlight Effect”

Debugging is a skill that separates a good programmer or problem-solver from a mediocre one. It requires systematically breaking down a problem, placing breakpoints, adding logging, and continuously analyzing the state of the system to understand what is going wrong. However, the rise of AI-assisted development tools is eroding this foundational skill. Instead of trying to understand the issue and experimenting with possible solutions, learners are too quick to turn to AI for an answer.
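To make the debugging habits described above concrete, here is a small Python sketch of hypothesis-driven debugging. The moving-average function, its off-by-one bug, and the `check` helper are all hypothetical, invented for illustration: the logging calls expose intermediate state at each step, and each `check` call tests one hypothesis on a case small enough to verify by hand.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("debug-demo")


def moving_average_buggy(values, window):
    """Hypothetical buggy version: range(len(values) - window) stops one window early."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window)]


def moving_average(values, window):
    """Fixed version: a list of length n has n - window + 1 full windows."""
    averages = []
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        avg = sum(chunk) / window
        # Log the intermediate state so each step can be inspected.
        log.debug("start=%d chunk=%s avg=%.2f", start, chunk, avg)
        averages.append(avg)
    return averages


def check(fn, values, window, expected):
    """Test one hypothesis on an input small enough to compute by hand."""
    actual = fn(values, window)
    log.debug("%s(%s, %d) -> %s (expected %s)",
              fn.__name__, values, window, actual, expected)
    return actual == expected


# Hypothesis: "the last window is dropped." The smallest input that exposes it
# is a single full window: [1, 2, 3] with window 3 should average to 2.0.
print(check(moving_average_buggy, [1, 2, 3], 3, [2.0]))  # False: returns []
print(check(moving_average, [1, 2, 3], 3, [2.0]))        # True
```

Shrinking the failing case until the bug is obvious turns a vague "the output looks wrong" into a specific, testable claim about the code.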
In this context, we often see the "streetlight effect," where learners act like the proverbial drunkard who searches for their keys only where there is light, not necessarily where they dropped them. They focus on where the AI's solution shines, regardless of whether it is the right place to look. The AI provides a suggestion, and instead of critically evaluating it, they implement it blindly, often without truly understanding the underlying problem or even the solution. This behavior discourages deep, analytical thinking and stunts their growth as problem-solvers.

A Symptom of a Broader Problem

The erosion of debugging skills is not just about software; it reflects a broader loss of critical thinking that will affect every field as AI tools become ubiquitous. In a world dominated by AI, the human role will shift from doing the work to guiding AI when it makes mistakes. This guiding role requires strong analytical skills to track state, validate solutions, and detect errors, and those are exactly the skills being dulled by over-reliance on AI for immediate answers. The issue is not limited to debugging; it represents a deeper problem: losing the ability to critically analyze, question, and break down complex issues.

The Double Whammy for New Learners

New learners are facing a perfect storm. On one hand, they are struggling to find jobs in a post-COVID world where companies are adjusting their expectations, downsizing, and assuming AI will bring productivity gains. Tools like Cursor, Replit Agent, Devin, and All Hands are reducing the need for large entry-level engineering teams by automating many programming and administrative tasks. On the other hand, the very skills that new learners need to stand out (critical thinking, complex problem-solving, and the ability to debug effectively) are being eroded by their dependence on AI. Rather than developing mental models to decompose complex problems into manageable subtasks, they lean on AI to do the heavy lifting.
AI's involvement can be particularly insidious because, unlike traditional learning, it does not encourage a systematic approach to problem-solving. It hands over prepackaged solutions that make sense on a superficial level but fail to build the cognitive pathways necessary for long-term understanding. In a sense, AI is like the Ring that Gollum calls "my precious" in The Lord of the Rings, offering a shortcut that feels empowering but ultimately leads to an addiction that diminishes the user's abilities and critical thinking.

The "Idiocracy" Parallel

The 2006 satirical film Idiocracy depicted a world where society's intellectual rigor had been dulled to an extreme degree, leaving humans incapable of critical thought and complex problem-solving. Eerily, this future seems to be materializing faster than we anticipated, particularly as AI tools make it easier for people to avoid thinking for themselves. Just as Idiocracy is often jokingly credited with foreseeing the popularity of Crocs footwear, it also anticipated a world where intellectual complacency would become the norm, thanks to technology and now AI.

What Needs to Change: A Call for Cognitive Resilience

AI is clearly here to stay, and its benefits are undeniable. But we must address how AI is affecting new learners and professionals before it is too late. To avoid an AI Zombocalypse, learners need to be taught not just how to use AI, but how to use it responsibly and critically. This includes:

1. Encouraging Debugging as a Core Skill: Developers must learn to debug effectively, which involves breaking down problems, questioning assumptions, and methodically testing hypotheses. Simply pasting in AI solutions without understanding their implications is counterproductive.

2. Promoting Deep Problem-Solving Over Superficial Solutions: AI often offers quick fixes, but educators and mentors need to stress the importance of deeply understanding the problems at hand. Learners should be encouraged to decompose problems into smaller, manageable tasks and to critically analyze AI suggestions before implementing them.

3. Fostering a Healthy Skepticism Toward AI Solutions: Learners should be trained to view AI as a tool, not an infallible oracle. It is crucial to cross-check AI-generated suggestions against one's own understanding of the problem rather than accepting AI's word as gospel.

4. Building Resilience and Self-Reliance in Learning: New learners should be encouraged to struggle and to learn from their struggles. Over-reliance on AI shortcuts hampers the development of the problem-solving tenacity that is crucial in the long run.

Conclusion

The threat posed by AI is not its power to replace humans but its ability to make humans complacent, uncritical, and reliant on easy solutions. The real danger is the rise of "AI zombies": professionals and learners who have lost their cognitive edge and cannot think critically or solve problems without AI's hand-holding. As technology continues to advance, our educational systems and professional development practices must adapt to emphasize critical thinking, deep problem-solving, and debugging skills that resist the allure of AI's quick fixes. The future will belong to those who use AI thoughtfully, critically, and responsibly, not to those who let AI think for them.
5 min read
authors:
Rohit Aggarwal
Harpreet Singh

Article
As large language models (LLMs) become central to more and more applications, robust tools to monitor, evaluate, and optimize these models are more important than ever. Two standout platforms that have emerged in this landscape are Opik and LangSmith. Both offer powerful features for developing and managing LLM applications, yet they cater to distinct needs and workflows. In this blog, we'll dive into a comprehensive comparison of Opik and LangSmith, examining their key features, strengths, and weaknesses. My recent experiments with both tools, focused on classifying emotions in Twitter data, provided valuable insights, particularly in terms of usability. I conducted two primary experiments: one centered on prompt refinement and the other on model comparison. Through these experiments, I aimed to highlight ease of use as a critical factor in choosing the right platform for your LLM projects.

Overview of Opik

Opik is an advanced, open-source platform designed for logging, viewing, and evaluating large language model (LLM) traces throughout both development and production. Its primary objective is to give developers detailed insights to debug, evaluate, and optimize LLM applications effectively. Opik also provides an SDK for direct use: you can simply set up your account and start using it.

Key Features of Opik:

Self-Hosting Options: Opik offers flexible deployment options for both local and production environments. It supports local deployment via Docker Compose and scalable deployments using Kubernetes, making it adaptable to different scales of use.

Comprehensive Tracing: Opik enables comprehensive logging and trace viewing, allowing developers to annotate traces and track LLM behavior in both local and distributed environments. This gives greater visibility into model performance and helps identify issues quickly during both development and production.
Integrated Evaluation Tools: Opik provides a set of built-in evaluation metrics, including heuristic performance measures and relevance assessments. It also supports metrics for detecting hallucinations and moderating content, and users can define custom metrics based on specific application needs.

Testing Frameworks: Opik integrates with Pytest, giving developers a framework to test their LLM applications thoroughly so that models are rigorously evaluated before deployment.

Integration: Opik simplifies logging, viewing, and evaluating LLM traces with a robust set of integrations, including:
OpenAI: Log all OpenAI LLM calls for easy tracking.
LangChain: Capture logs from LangChain interactions.
LlamaIndex: Monitor LlamaIndex LLM performance.
Ollama: Integrate logging for Ollama LLMs.
Predibase: Fine-tune and serve open-source LLMs while logging their usage.
Ragas: Evaluate Retrieval Augmented Generation (RAG) pipelines effectively.

Overall, Opik's rich set of tools and integrations makes it a powerful asset for developers working with LLMs, offering end-to-end support for debugging, optimizing, and scaling LLM applications.

* You can access a comprehensive exploration of Opik from this link.

Overview of LangSmith

LangSmith is a comprehensive platform designed to streamline the development, debugging, testing, and monitoring of production-grade LLM applications. It bridges the gap between traditional software development processes and the unique challenges posed by LLMs, particularly around handling non-deterministic, complex workflows.

Key Features of LangSmith:

Advanced Tracing Capabilities: LangSmith excels at tracing LLM applications, providing detailed insight into the sequence of calls and the inputs and outputs at each step. It supports code annotations for automatic trace generation, with options to toggle tracing on or off as needed.
Developers can also control trace sampling rates so that they log only what is necessary, which is particularly useful in high-volume applications. The platform can trace multimodal interactions (e.g., text and image inputs) and distributed systems, giving a holistic view of an application's performance.

Dataset Management: LangSmith offers powerful dataset management, allowing developers to create and curate datasets for testing and evaluation. This feature supports few-shot learning experiments, which are essential for optimizing LLM performance. Developers can also organize experiments and results by dataset for better analysis and insights.

Evaluation Metrics: Built-in evaluators enable both automated and manual testing of LLM outputs, supporting metrics such as relevance, accuracy, harmfulness, hallucination, and more. LangSmith's evaluation tools can assess how changes in prompts or model configurations affect overall performance.

Playground and Prompts: LangSmith includes an interactive playground that lets developers tweak and experiment with prompts in real time. This environment is user-friendly and removes friction from the iteration process, helping teams rapidly optimize their application's behavior.

Scalability: LangSmith is built on a cloud architecture capable of handling LLM applications at large scale. It supports robust data retention policies, and its monitoring tools help keep applications running efficiently and cost-effectively, even under heavy use.

Usability: Comparative Experiments

I explored the usability of Opik and LangSmith on the task of classifying emotions in Twitter data, through two main experiments: one focused on prompt refinement and the other on model comparison. Here's a breakdown of my findings, emphasizing ease of use rather than performance.
For the prompt refinement experiment, I used the Emotion dataset from Twitter to classify tweets into happiness, sadness, or neutral categories. Both platforms required only an API key and client initialization for setup, which was straightforward. For the model comparison experiment, I applied the best-performing prompt from the first experiment to compare two models: gpt-4o-mini and claude-3-sonnet.

Open-Source Flexibility vs. Closed-Source Stability

Opik:

Open-Source: Opik is an open-source platform, giving developers the freedom to access, modify, and customize its source code. This flexibility fosters a collaborative environment where developers can contribute to the platform, improve it, and tailor it to their specific project needs.

Customization: The open-source nature allows Opik users to implement unique, project-specific features or adjustments, which is valuable for teams with highly specialized requirements. This community-driven development model also allows the platform to evolve continuously based on user contributions.

Ideal for Developers Seeking Flexibility: Opik is well-suited to teams or individuals who prefer to control their tools and customize them to their workflow. It offers full transparency and adaptability, empowering developers to iterate on the platform as they wish.

LangSmith:

Closed-Source: LangSmith, on the other hand, is a proprietary, closed-source platform. While this restricts customization compared to Opik, it offers the advantage of a more stable and streamlined platform. LangSmith's closed-source nature ensures that updates are consistent and cohesive, with dedicated support to maintain the platform's performance and reliability.

Stability and Support: Being closed-source allows LangSmith to provide a more stable user experience, which is particularly important for enterprise users.
It ensures regular updates, dedicated customer support, and a fully integrated suite of tools that work seamlessly together.

Ideal for Enterprises Seeking Stability: Enterprises or teams that prioritize stability and dedicated support may prefer LangSmith. The closed-source model can provide peace of mind that the platform will continue to function reliably, with cohesive updates and minimal disruption.

Self-hosting

Opik:

Local Installation: Opik offers a local installation option that is quick to set up and lets developers get started immediately. However, this local setup is not intended for production environments, as it lacks the robustness required for large-scale operations. It is suitable for quick testing and experimentation: it runs behind a local URL and requires only basic SDK configuration to interact with the self-hosted instance, which makes it very user-friendly for small-scale or short-term tasks.

Kubernetes Installation: For production-ready deployment, Opik supports installation via Kubernetes. This option allows for scalability and ensures that all of Opik's core functionality, such as tracing and evaluation, is available in a more stable environment. Despite the production readiness of the Kubernetes setup, Opik lacks certain user management features in its self-hosted mode, which might be a drawback for larger teams that need detailed access control. There is also no mention of built-in storage options for Opik's self-hosted mode, implying that developers may need to set up external storage solutions for data management.

Managed Options: For organizations seeking reduced maintenance, Opik provides managed deployment options through Comet. This allows teams to focus on development and analysis without worrying about infrastructure maintenance.
LangSmith:

Docker and Kubernetes Support: LangSmith can be self-hosted via Docker or Kubernetes, making it suitable for both controlled cloud environments and large-scale production deployments. This flexibility allows LangSmith to cater to different organizational needs, from small startups to large enterprises.

Componentized Architecture: LangSmith's architecture is more complex than Opik's, comprising multiple components, including the Frontend, Backend, Platform Backend, Playground, and Queue. This makes LangSmith highly modular and scalable, but it also requires more infrastructure management. The need to expose the Frontend for UI and API access adds to the operational complexity.

Storage Bundling: Unlike Opik, LangSmith bundles storage services by default, making it easier for teams to get started without configuring external storage systems. Users can still configure external storage if their project demands it.

Enterprise Focus: LangSmith is designed with large, security-conscious enterprises in mind, and its multi-component infrastructure is intended to support complex, secure environments. However, this also means LangSmith may have a higher maintenance overhead than simpler platforms like Opik; the added complexity requires careful configuration and management to keep all components operating smoothly.

Tracing

Opik: Opik offers versatile tracing options, allowing you to log traces to the Comet LLM Evaluation platform via either the REST API or the Opik Python SDK. It supports integrations with a variety of tools, including LangChain, LlamaIndex, Ragas, Ollama, and Predibase, making it a flexible choice for developers who want to track LLM performance across multiple frameworks.

LangSmith: LangSmith provides tracing support primarily with LangChain, Vercel AI, and LangGraph.
While it may have fewer integrations than Opik, LangSmith compensates with more advanced, low-level tracing features. This can benefit users who require in-depth analysis and customization in their LLM evaluations.

[Figure: Opik Tracing]
[Figure: LangSmith Tracing]

As shown, LangSmith lets you view more detailed information, including input, total tokens used, latency, feedback (i.e., evaluation score), metadata, and more. In contrast, Opik shows more limited information: mainly input, output, scores, and metadata.

Here's a detailed comparison of Opik's tracing and LangSmith's tracing based on their dashboard visuals:

Similarities:

Tracing and Logging of Inputs/Outputs: Both Opik and LangSmith provide a clear breakdown of the input and output logs for evaluation tasks. Each platform displays detailed information about the input prompts and the model-generated outputs, which is essential for understanding the context and accuracy of the LLM response. The platforms also show additional details like feedback scores (Opik) or evaluation metrics (LangSmith), enabling users to assess performance in an organized format.

Structured Presentation: Both dashboards offer a structured format in which evaluation tasks are broken down into sections like "Input/Output," "Feedback Scores," and "Metadata." This lets users navigate easily through the various components of the model evaluation.

Status Indicators: Both platforms highlight the success/failure status of each evaluation task, which is useful for quickly identifying which tasks succeeded and which may need further investigation.

Differences:

Visualization of Trace Details: Opik provides a simplified view of the trace spans, focusing on essential data such as input and output in a straightforward format. The left panel of the Opik dashboard groups spans hierarchically but is relatively simple. LangSmith, however, offers a more detailed tracing breakdown.
It displays additional technical details like token usage, latency, and trace spans with granular timing (e.g., 0.2s). The dashboard offers richer metadata and breakdowns at a more technical level, making it more suitable for in-depth performance analysis.

Feedback and Evaluation: Opik allows quick feedback scores and custom metrics within the same pane, and summarizes them in the CLI or notebook interface. The evaluation task is shown with simple input/output YAML formatting. LangSmith focuses more on detailed feedback evaluations: it provides more elaborate evaluation results, including a link to the platform dashboard for viewing advanced statistics and data visualizations.

Visual Complexity: LangSmith has a more sophisticated interface, with more detailed trace spans and multiple evaluation layers. This visual complexity can provide more powerful insights but may require more effort to navigate. Opik is more minimalist, prioritizing simplicity in its presentation, which could be more user-friendly for developers who prefer a lightweight and efficient interface.

Evaluation

Opik: Opik simplifies the process of defining metrics, allowing users to initialize them and pass them as parameters during evaluation. It supports both heuristic and LLM-as-a-judge metrics, with the added flexibility to create custom metrics tailored to specific needs. This approach makes it accessible for developers who want to assess their LLM applications efficiently. Opik also summarizes results directly in the CLI or notebook, giving easy access to insights on the fly.

LangSmith: LangSmith requires a more hands-on approach to metric definition. In LangSmith, evaluators are functions that score application performance based on specific examples from your dataset and the outputs generated during execution. Each evaluator returns an EvaluationResult, which includes a key, a score, and a comment.
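As an illustration of the evaluator shape just described (a function that compares a run's output with a dataset example and returns a key, a score, and a comment), here is a minimal, library-free Python sketch for the emotion-classification task. `EvalResult` is a hypothetical stand-in dataclass, not LangSmith's `EvaluationResult` class, and `correct_label` does not follow LangSmith's actual evaluator signature; the label-normalization rules are also our own assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalResult:
    """Hypothetical stand-in for an evaluation result: key, score, comment."""
    key: str
    score: float
    comment: Optional[str] = None


def normalize(label: str) -> str:
    """Trim whitespace and trailing '.'/'!' and lower-case, so 'Happiness.' matches 'happiness'."""
    return label.strip().strip(".!").lower()


def correct_label(predicted: str, expected: str) -> EvalResult:
    """Score 1.0 when the model's emotion label matches the reference label, else 0.0."""
    match = normalize(predicted) == normalize(expected)
    comment = None if match else f"expected {expected!r}, got {predicted!r}"
    return EvalResult(key="correct_label", score=1.0 if match else 0.0, comment=comment)


print(correct_label("Happiness.", "happiness"))
print(correct_label("neutral", "sadness"))
```

Keeping the scoring rule in a small, pure function like this makes it easy to unit-test on its own before adapting it to either platform's evaluation API.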
LangSmith provides a link to its dashboard for viewing results, which, while informative, requires navigating away from the immediate workflow.

[Figure: Opik Evaluation]
[Figure: LangSmith Evaluation]

Both LangSmith and Opik provide overall metric scores as well as scores for each individual dataset item. In short, both platforms present evaluation results in a similar way; the main difference lies in how metrics are set up. In Opik the setup is straightforward, while in LangSmith it takes more effort to configure.

Here's a detailed comparison of Opik's and LangSmith's evaluation views based on their dashboard visuals:

Similarities:

Experiment Tracking: Both Opik and LangSmith provide a clear overview of experiments run on datasets. Each experiment is tracked with a unique identifier or name, and results are logged in a structured manner. Both display the correctness of the evaluation (precision, recall, or label correctness) in a way that lets users immediately grasp how the model performed on each dataset item.

Metric Display: Both systems display evaluation metrics for each experiment, such as precision, recall, and other relevant scores. This enables developers to gauge how well a specific model or experiment performed on specific performance indicators.

Dataset Connection: In both systems, experiments are linked to datasets, which allows for context-driven evaluation. This connection lets users quickly refer back to the dataset and see how the model performed on each data point.

Differences:

Visualization of Metrics:
Opik: In the Opik evaluation dashboard, metrics such as context precision and recall are displayed prominently at the top of the interface. Each dataset entry is evaluated on these metrics, and results are presented for each item. The emphasis is on immediate metric visibility for each input/output pair in the dataset.
LangSmith: LangSmith provides an aggregate view of experiment performance. Rather than breaking down individual metrics per dataset entry, it focuses on experiment-level metrics such as Correct Label scores across multiple runs. This is useful for general performance comparisons between different models or experiment configurations over time. You can also view metrics for each dataset entry by clicking on a specific experiment.

Detailed Experiment Comparison:
LangSmith: The LangSmith evaluation dashboard provides an overview of multiple experiments at once, listing them with splits, repetitions, and correctness scores. This lets users quickly compare how different model versions or setups performed relative to one another, which is ideal for tracking improvements or regressions over time.
Opik: The Opik evaluation dashboard focuses on individual metrics for each input. It presents a more fine-grained evaluation, especially when comparing precision and recall for specific inputs, but it lacks a broad overview of multiple experiments at a glance.

Dataset

Opik: Opik presents a more straightforward view of dataset information, displaying inputs and expected outputs clearly. However, it lacks the advanced visualization capabilities found in LangSmith, which may limit users' ability to identify trends and insights quickly.

LangSmith: LangSmith excels at offering advanced visualization features that clearly show trends and evaluation metrics within the dataset tab. It provides rich support for datasets, allowing users to view the experiments run on a dataset, perform pairwise experiments, and work with various formats, including key-value pairs, LLM, and chat data. This comprehensive approach makes it easier to analyze a dataset and the evaluations run against it.
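The difference between the per-item view and the experiment-level view described above can be sketched without either SDK. The following hypothetical example uses a made-up four-tweet dataset in the input/expected-output shape both tools display, plus two rule-based stand-ins for "prompt versions" (no LLM is called). It scores each item, then averages the scores per experiment, which is roughly the information the two dashboards lay out differently. The tweets, labels, and function names are illustrative assumptions, not data or results from the experiments in this article.

```python
from statistics import mean

# Hypothetical toy dataset: each item has an input and an expected label.
DATASET = [
    {"input": "Best day ever, we won the finals!", "expected": "happiness"},
    {"input": "Missing my grandma so much today.", "expected": "sadness"},
    {"input": "The bus leaves at 9am.", "expected": "neutral"},
    {"input": "Can't stop smiling after that call", "expected": "happiness"},
]


def classifier_v1(text: str) -> str:
    """Stand-in for a first-draft prompt: a single crude keyword rule."""
    return "happiness" if "!" in text else "neutral"


def classifier_v2(text: str) -> str:
    """Stand-in for a refined prompt: slightly richer keyword rules."""
    lowered = text.lower()
    if any(word in lowered for word in ("won", "smiling", "best")):
        return "happiness"
    if any(word in lowered for word in ("missing", "sad", "cry")):
        return "sadness"
    return "neutral"


def run_experiment(name, classify, dataset):
    """Score every item (per-item view), then average the scores (aggregate view)."""
    item_scores = [1.0 if classify(row["input"]) == row["expected"] else 0.0
                   for row in dataset]
    return {"experiment": name, "item_scores": item_scores,
            "accuracy": mean(item_scores)}


for result in (run_experiment("prompt-v1", classifier_v1, DATASET),
               run_experiment("prompt-v2", classifier_v2, DATASET)):
    print(result["experiment"], result["item_scores"], result["accuracy"])
```

In an actual run, the classifier functions would be replaced by calls to the models under comparison, and the platform would store the per-item scores and the aggregate for each experiment.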
[Figure: Opik Dataset]
[Figure: LangSmith Dataset]

As shown, LangSmith lets you see how many experiments were run on a dataset, along with their metric scores and other details. In contrast, Opik only provides information about the dataset and its items.

Here's a detailed comparison of Opik's dataset and LangSmith's dataset based on their dashboard visuals:

Similarities:

Sentiment Dataset: Both dashboards display a dataset with inputs and expected outputs; each dataset item includes the original input and the expected label.

Dataset Structure: Both platforms show the dataset in a structured table format in which inputs and expected outputs are clearly listed. This ensures transparency and consistency in dataset management on both platforms.

Support for Experimentation: Both platforms support running experiments on datasets. They allow users to test different models or model versions and compare performance on these input/output pairs.

Differences:

Visualization:
Opik Dataset: The Opik dataset interface is minimalistic, showing only the input/output pairs. It lacks advanced visualization capabilities, focusing instead on clear data entries for developers to reference.
LangSmith Dataset: In contrast, the LangSmith dataset interface provides rich visualizations. For example, it shows a chart of experiments, enabling users to see evaluation results over time or across multiple experiments. This gives users better analytical tools for tracking model performance trends.

Experiment Features:
Opik Dataset: The Opik interface offers simplicity, focusing on basic dataset information and expected outcomes. While it supports dataset-based evaluations, it lacks advanced tools for conducting complex experiments directly from the interface.
LangSmith Dataset: LangSmith offers more advanced options for conducting experiments, such as pairwise experiments and the ability to add evaluators and generate new examples.
It also supports few-shot learning, giving users more flexibility to perform sophisticated analyses on their datasets.

Customization and Flexibility: LangSmith offers more features for interacting with datasets, such as tagging dataset versions, adding new examples, and generating examples. These features make it easier for users to experiment with their datasets and modify them on the go, offering more flexibility and control over data. Opik, on the other hand, is streamlined for straightforward dataset management and lacks these interactive features, focusing on simplicity and clarity.

* You can access the code and other details of this comparison from this link.

The table below summarizes the functionality supported in Opik vs. LangSmith:

Feature/Functionality | Opik | LangSmith
Open-Source | ✅ | ❌
Self-hosting Support | ✅ | ✅
Dataset | ✅ | ✅
Tracing | ✅ | ✅
Evaluation | ✅ | ✅
Pytest Integration | ✅ | ❌
OpenAI Support | ✅ | ✅
LangChain Support | ✅ | ✅
LlamaIndex Support | ✅ | ❌
Ollama Support | ✅ | ❌
Predibase Support | ✅ | ❌
Ragas Support | ✅ | ❌
LangGraph Cloud Support | ❌ | ✅
Own Prompt Management | ❌ | ❌
Capture Human Feedback | ❌ | ✅
Advanced Monitoring & Automations | ❌ | ✅

Conclusion

Both Opik and LangSmith offer valuable tools for large language model (LLM) application development, but they cater to different user needs and contexts. Opik is well-suited to developers who appreciate open-source flexibility and a user-friendly setup. Its straightforward metric definition, extensive integrations, and ease of use make it ideal for quick implementations and individual projects. However, it falls short in several areas critical for enterprise use, such as advanced dataset management, sophisticated monitoring, and built-in support for human feedback. Opik's limited tracing capabilities and basic logging features may hinder comprehensive performance analysis and compliance with privacy regulations, which are vital in larger team environments.
LangSmith, in contrast, excels in enterprise settings where stability, scalability, and comprehensive monitoring are essential. Its advanced tracing capabilities, rich dataset management, and detailed visualization features support deeper analysis and collaboration among stakeholders. Its tracing options include the ability to log images and handle sensitive data, and its built-in automation tools let teams respond proactively to issues, a necessity in high-stakes production settings. LangSmith's closed-source model streamlines updates and support, letting teams focus on development rather than maintenance. These features are crucial for organizations aiming to deploy production-grade applications.

For AI researchers and engineers working on personal projects, Opik offers a flexible and accessible environment for experimentation and learning; its open-source nature allows customization without the constraints of a closed-source system. Conversely, AI engineers in enterprise environments will benefit from LangSmith's production-oriented features, including stability, extensive support, and advanced monitoring.

Ultimately, the choice between Opik and LangSmith depends on the user's context. Opik is a great fit for individuals and small teams focused on exploration, while LangSmith is the better option for organizations building scalable, production-ready applications. Aligning your toolset with your project requirements and long-term goals is essential in the fast-moving landscape of AI development.
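Both tools let you run experiments over a dataset of inputs and expected outputs and compare metric scores across model versions. The minimal, library-agnostic sketch below shows what such an experiment computes. It does not use the Opik or LangSmith APIs; the dataset, the two keyword "models", and the `exact_match` metric are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DatasetItem:
    input: str            # the original text
    expected_output: str  # the expected sentiment label


# Hypothetical sentiment dataset: input / expected-label pairs, as both dashboards display them.
DATASET: List[DatasetItem] = [
    DatasetItem("I love this product!", "positive"),
    DatasetItem("This is the worst service ever.", "negative"),
    DatasetItem("Absolutely fantastic experience.", "positive"),
    DatasetItem("I am not happy with the delivery.", "negative"),
]


def model_v1(text: str) -> str:
    """Hypothetical baseline classifier: negative only if the word 'worst' appears."""
    return "negative" if "worst" in text.lower() else "positive"


def model_v2(text: str) -> str:
    """Hypothetical revised classifier that also catches a few negated phrases."""
    cues = ("worst", "not happy", "terrible")
    return "negative" if any(cue in text.lower() for cue in cues) else "positive"


def exact_match(predicted: str, expected: str) -> float:
    """Metric: 1.0 when the predicted label equals the expected label, else 0.0."""
    return 1.0 if predicted == expected else 0.0


def run_experiment(name: str, model: Callable[[str], str], dataset: List[DatasetItem]) -> Dict[str, object]:
    """Score one model version on every dataset item and aggregate into a single metric."""
    scores = [exact_match(model(item.input), item.expected_output) for item in dataset]
    return {"experiment": name, "accuracy": sum(scores) / len(scores), "items": len(scores)}


# Two experiments on the same dataset, comparable side by side.
results = [run_experiment("v1", model_v1, DATASET), run_experiment("v2", model_v2, DATASET)]
for r in results:
    print(f"{r['experiment']}: accuracy={r['accuracy']:.2f} over {r['items']} items")
```

In LangSmith, per-experiment scores like these are what the experiments chart plots; in Opik, the dataset view itself shows only the items.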
6 min read
authors:
Sumit Mishra
Rohit Aggarwal
Harpreet Singh

Article
Apple's recent advancements in Edge AI, branded "Apple Intelligence," are setting new standards for AI on edge devices (such as mobile phones, tablets, and laptops) and shaping user expectations across the technology landscape. By embedding AI capabilities directly in iPhones, iPads, and Macs, Apple emphasizes privacy, low latency, and efficiency. This strategy lets tasks like image generation, text rewriting, and voice commands be processed locally on the device, offering faster, more reliable, and more secure interactions without constant reliance on the cloud.

Apple is not alone in this focus. Other major players, such as Microsoft, Google, Facebook, and Samsung, are also working on running AI on edge devices. While Edge AI offers many benefits, it also presents challenges, including the need for more powerful hardware and limits on model size. To address these challenges and enable efficient on-device AI, technologies like WebLLM (for running large language models in web browsers), MLC (Machine Learning Compilation, for optimizing AI models), and WebGPU (a low-level graphics and compute API for web browsers) are being actively developed, with contributions from a wide range of companies, including top tech giants. The WebGPU API, which serves as the backbone for running WebLLM models efficiently in the browser, is already supported or rolling out across major browsers like Chrome, Firefox, and Safari.

Given how rapidly these technologies are developing, and that they are expected to power a significant portion of future mobile and web applications, it is worth understanding how they work. In the following sections, we explain WebLLM, MLC, and WebGPU in detail and illustrate their deployment with a practical WebLLM chat example that runs directly on your device.

WebLLM

WebLLM is a high-performance, in-browser inference engine for Large Language Models (LLMs).
It lets developers deploy and run large language models directly in the browser, using WebGPU for hardware acceleration, without any server support. It is open-source and available on GitHub here. WebLLM manages the overall inference process, which includes:

Tokenization: Converting natural-language input into a format suitable for model processing.
Model Management: Downloading model weights and loading them into browser memory, where they are stored efficiently, often in a quantized format.
Inference and Detokenization: Interfacing with MLC for computational tasks and converting the results back into human-readable form.

WebLLM is designed to be compatible with the OpenAI API, so developers can use the same interface they would use with OpenAI, including streaming outputs, JSON-mode generation, and function calling (still in progress at the time of writing).

Key features include:

In-Browser Inference Using WebGPU: Hardware-accelerated LLM inference directly in the browser.
Compatibility with the OpenAI API: Straightforward integration using existing OpenAI-style functionality.
Structured JSON Generation: JSON-mode structured generation for applications that require schema-based outputs.
Extensive Model Support: Native support for a variety of models, including Llama, Phi, Mistral, and Qwen, with the ability to integrate custom models in the MLC format.
Real-Time Interactions & Streaming: Support for interactive applications such as chat completions, with real-time text generation.
Performance Optimization with Web Workers & Service Workers: Efficient computation and model lifecycle management by offloading tasks to separate browser threads.

MLC LLM (Machine Learning Compilation for Large Language Models)

MLC LLM is a specialized component of the MLC ecosystem, designed to optimize the inference of Large Language Models (LLMs) across various platforms, including browsers, desktops, and mobile devices.
It is a machine learning compiler and high-performance deployment engine for large language models: it compiles and prepares LLMs for efficient execution on the underlying hardware. Throughout this explanation, we refer to MLC LLM as "MLC."

MLC works closely with WebLLM, receiving tokenized inputs and preparing computational tasks optimized for the available hardware. These tasks are compiled into efficient GPU kernels, CPU instructions, or WebGPU shaders so that LLM inference runs effectively across platforms. The goal of MLC is to bring high-performance LLM deployment natively to web browsers, desktops, and mobile devices. MLC is open-source and available on GitHub here, providing tools for efficient execution of LLMs across different environments, including browsers and native platforms.

Platform-Specific Optimization

MLC adapts to various hardware and platform needs to enable efficient LLM inference. The project's stated mission is to enable everyone to develop, optimize, and deploy AI models natively on everyone's platforms. Key features:

GPU Support for Major Manufacturers (AMD, NVIDIA, Apple, Intel): MLC optimizes execution for different GPU types using APIs such as Vulkan, ROCm, CUDA, and Metal, depending on the platform and available hardware.
Browser Support with WebGPU & WASM: MLC runs natively in web browsers by leveraging WebGPU and WebAssembly, providing hardware-accelerated inference directly in the browser.
Mobile Platform Support: On iOS devices, MLC uses Metal for efficient execution on Apple GPUs, while on Android devices it uses OpenCL to support Adreno and Mali GPUs.

MLCEngine: A Unified Inference Engine

At the heart of MLC is MLCEngine, a high-performance inference engine that runs across various platforms and provides the backbone for executing LLMs efficiently.
MLCEngine offers OpenAI-compatible APIs for easy integration into various environments, including REST servers, Python applications, JavaScript, and mobile platforms. With MLC, developers can deploy LLMs across browsers, desktops, and mobile devices while benefiting from hardware-specific optimization.

WebGPU

WebGPU is the hardware acceleration layer that enables efficient LLM inference in the browser. It interfaces directly with MLC, executing the optimized kernels or instructions that MLC prepares for the available hardware resources (GPUs or CPUs). WebGPU is responsible for:

Parallel Computation & Memory Transfers: Performing the necessary computations and managing memory efficiently to support fast inference of large models.
Fallback to CPU when GPU is Unavailable: When no GPU is available, computation may still proceed on the CPU (for example, through a software fallback adapter where the browser provides one), though performance will be much lower.

By providing a direct bridge between model operations and hardware execution, WebGPU is critical for achieving the performance that real-time LLM inference requires in web applications.

Illustration with WebLLM Chat Using Llama 3.2

This section walks through how WebLLM Chat uses Llama 3.2 for real-time AI conversations in the browser. It covers each step from user interaction to model response, showing how WebGPU and MLC LLM are used to optimize performance. The following diagram extends the earlier one to show how Llama 3.2 can be used for chat interactions with WebLLM.

Step-by-Step Flow of WebLLM Chat with Llama 3.2

Initialization & Model Loading

Interface & Model Selection: Open WebLLM Chat in the browser. The user selects Llama 3.2 from the available models.
Upon selection, the model weights are downloaded (if not already cached) and loaded into memory.
Progress Feedback: WebLLM Chat reports model-loading progress in real time, so the user knows when Llama 3.2 is ready for conversation.

Tokenization & User Input

Input & Tokenization: The user types a query into WebLLM Chat. The input is tokenized for Llama 3.2 inference, converting natural language into a sequence of tokens the model understands.
Responsive UI Through Web Workers: To keep the UI smooth and responsive, WebLLM uses Web Workers to offload computation from the main thread, so input is processed without UI lag.

Inference & WebGPU Acceleration

Model Execution & Hardware Utilization: WebLLM uses MLC LLM to manage computations, leveraging WebGPU to run inference on available GPUs for faster response generation.
Real-Time Response Generation: The model streams its response token by token as it is generated, and WebLLM Chat displays the results incrementally, so users can interact with the model in real time.

Inference Output & Structure

Standard Chat Output: By default, Llama 3.2 produces plain-text responses suitable for typical chat interactions. The responses are detokenized and presented to the user in natural language.
Structured Outputs (JSON Mode): If structured data is required (e.g., JSON), WebLLM Chat can be configured to return such responses. This is useful when queries must be answered in a fixed format (e.g., a structured list or a dictionary of items). How reliably a model produces well-formed structured output depends on the model, for example whether it has been fine-tuned for it, so you may need to validate structured outputs in the interface.
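The validation step mentioned above can be as simple as parsing the reply, checking required keys and types, and re-requesting output when the check fails. The sketch below is written in Python only to illustrate the logic (WebLLM itself is a JavaScript library). `SCHEMA`, `validate_structured_output`, and `generate_validated` are hypothetical names, and `generate` stands in for whatever call produces the model's JSON-mode reply.

```python
import json
from typing import Callable, Optional

# Hypothetical schema for a "structured list" answer: required key -> expected Python type.
SCHEMA = {"title": str, "items": list}


def validate_structured_output(raw: str, schema: dict) -> Optional[dict]:
    """Return the parsed JSON object if it has every required key with the right type, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not JSON at all (e.g. the model added chatty prose)
    if not isinstance(data, dict):
        return None  # valid JSON, but not an object
    for key, expected_type in schema.items():
        if not isinstance(data.get(key), expected_type):
            return None  # missing key or wrong type
    return data


def generate_validated(generate: Callable[[], str], schema: dict, max_attempts: int = 3) -> Optional[dict]:
    """Re-request output until it validates; give up (return None) after max_attempts."""
    for _ in range(max_attempts):
        parsed = validate_structured_output(generate(), schema)
        if parsed is not None:
            return parsed
    return None


# Usage with a stand-in for the model: the first reply is prose, the second is valid JSON.
replies = iter([
    "Sure! Here is your list: apple, pear",
    '{"title": "Fruits", "items": ["apple", "pear"]}',
])
result = generate_validated(lambda: next(replies), SCHEMA)
print(result)  # {'title': 'Fruits', 'items': ['apple', 'pear']}
```

Capping retries with `max_attempts` keeps a model that never produces valid JSON from looping forever; the caller decides what to do with a `None` result.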
Lifecycle Management

Lifecycle Management & Caching: Model weights and configurations are cached locally after the initial load, making subsequent sessions faster. Web Workers handle computation so that inference does not interrupt the chat's responsiveness.

Mermaid code for diagram 1:

```mermaid
graph TD
    A[Web Application] <-->|Real-Time User Input/Output| B[WebLLM]
    B <-->|Model Management, Tokenization & Inference Requests| D[MLC]
    D <-->|Compiled & Optimized Computation Tasks for GPU/CPU| C[WebGPU]
    C -->|Delegate to Hardware| E[Discrete GPU]
    C -->|Or Fallback to CPU| F[Fallback to CPU]
    E -->|Execution Results| C
    F -->|Execution Results| C
    C -->|Computation Results| D
    D -->|Inference Results| B
    B -->|Detokenization & User Output| A
    style A fill:#FFA07A,stroke:#333,stroke-width:2px,color:#000000
    style B fill:#A0D8EF,stroke:#333,stroke-width:2px,color:#000000
    style C fill:#FFD700,stroke:#333,stroke-width:2px,color:#000000
    style D fill:#98FB98,stroke:#333,stroke-width:2px,color:#000000
    style E fill:#DDA0DD,stroke:#333,stroke-width:2px,color:#000000
    style F fill:#DDA0DD,stroke:#333,stroke-width:2px,color:#000000
    classDef default stroke:#333,stroke-width:2px,color:#000000
```

Mermaid code for diagram 2:

```mermaid
graph TD
    A[Web Application] <-->|User Input & Output| B[WebLLM Chat Interface]
    B <-->|Tokenization & Inference Requests| D[MLC LLM Engine]
    D <-->|Optimized Computations for GPU/CPU| C[WebGPU Interface]
    C -->|Delegate Computations| E[Discrete GPU]
    C -->|Fallback to CPU| F[CPU Processing]
    E -->|Execution Results| C
    F -->|Execution Results| C
    C -->|Compute Results| D
    D -->|Inference Results| B
    B -->|Streamed Responses| A
    %% Note on Validation for Structured Outputs
    B -->|If Required: Validate & Reprocess| G[Validate Structured Output]
    %% Styling the Nodes for Clarity
    style A fill:#FFA07A,stroke:#333,stroke-width:2px,color:#000000
    style B fill:#A0D8EF,stroke:#333,stroke-width:2px,color:#000000
    style C fill:#FFD700,stroke:#333,stroke-width:2px,color:#000000
    style D fill:#98FB98,stroke:#333,stroke-width:2px,color:#000000
    style E fill:#DDA0DD,stroke:#333,stroke-width:2px,color:#000000
    style F fill:#DDA0DD,stroke:#333,stroke-width:2px,color:#000000
    style G fill:#FF6347,stroke:#333,stroke-width:2px,color:#000000
    classDef default stroke:#333,stroke-width:2px,color:#000000
```
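The caching behavior in the Lifecycle Management step works like this: weights are downloaded only if no cached copy exists, then reused in later sessions. The sketch below illustrates the pattern in Python. It is an analogy only: WebLLM keeps weights in browser storage rather than a directory. `load_weights`, the cache file naming, and the `download` callable are hypothetical, and the model ID is illustrative.

```python
import tempfile
from pathlib import Path
from typing import Callable


def load_weights(model_id: str, cache_dir: Path, download: Callable[[str], bytes]) -> bytes:
    """Return model weights, calling `download` only when no cached copy exists."""
    cache_file = cache_dir / f"{model_id}.bin"
    if cache_file.exists():
        return cache_file.read_bytes()       # cache hit: no network access needed
    weights = download(model_id)             # cache miss: fetch the weights once
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(weights)          # persist for subsequent sessions
    return weights


# Usage with a fake downloader that records how often it is called.
download_calls = []


def fake_download(model_id: str) -> bytes:
    download_calls.append(model_id)
    return b"fake-weights"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "model-cache"
    first = load_weights("Llama-3.2-1B-Instruct", cache, fake_download)   # downloads
    second = load_weights("Llama-3.2-1B-Instruct", cache, fake_download)  # served from cache

print(len(download_calls))  # 1
```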
6 min read
authors:
Rohit Aggarwal
Harpreet Singh

Article
1. Identifying and Analyzing Distraction Points

AI agents tend to go astray, especially in the design phase, for a variety of reasons, such as:

Spending excessive time on non-critical or low-priority sub-tasks that don't directly contribute to the agent's primary objectives.
Engaging in excessive reflection on secondary components that do not directly affect task performance.
Struggling to resolve conflicting or ambiguous information, which can lead to confusion or suboptimal decisions.

Human oversight, complemented by automated monitoring tools, plays a critical role in recognizing when an AI agent becomes distracted or sidetracked, particularly through over-reflection or pursuit of tangential paths. In a Human-in-the-Loop (HITL) framework, humans can intervene at various stages of the agent's reasoning process to detect these moments of distraction and correct course.

When identifying and analyzing distraction points in an agent's reasoning process, humans should apply critical thinking strategies such as chunking, hierarchical organization, pattern recognition, and abstraction. Chunking breaks the agent's complex reasoning flow into manageable segments, making it easier to pinpoint where distractions occur. Organizing these segments hierarchically further reduces cognitive load, letting humans focus on fewer elements at a time. This top-down decomposition isolates distraction points within specific parts of the flow, so humans aren't overwhelmed by the entire process at once. Within this structured hierarchy, pattern recognition becomes more efficient, as humans can more easily spot recurring behaviors or common points where the agent gets sidetracked. Once these patterns are identified, they can be simplified further through abstraction, which combines common flows and reduces the number of individual flows that need to be addressed.
This abstraction not only streamlines the analysis but also makes it easier to apply corrective measures across similar distraction points. The process is iterative: as human operators gain more insight into where distractions occur, they can refine the hierarchy and segmentation, improving their conceptualization of the agent's flow over time. Domain knowledge and task frequency further guide this process, helping humans prioritize flows that occur often or are critical to the agent's objectives. Together, these techniques let humans efficiently detect, analyze, and address distractions in the agent's reasoning flow, such as repeatedly choosing low-priority actions or misinterpreting key goals.

2. Setting Rule-Based Guidance to Avoid Stuck States

Once distraction or misprioritization points are identified, humans can design specific rule-based guidance, such as thresholds or priority rules, to help the agent stay focused on its core objectives. Rule-based systems act as guardrails, providing structure that keeps the agent from being distracted by irrelevant or low-priority tasks.

Defining Action Thresholds: Humans can set reflection thresholds based on task complexity or importance, ensuring the agent moves forward after a reasonable amount of time spent on reflection.
Task Prioritization Rules: Humans can encode priority rules, such as task-scoring systems or hierarchical structures, to help the agent distinguish between critical and tertiary tasks. For example:
"Always prioritize goal completion over error correction unless the error is critical to task success or safety."
"Allocate a percentage of resources to core tasks, adjusting dynamically based on task importance, with minimal resources allocated to secondary tasks."
Timeout Mechanisms: Rule-based timeouts can ensure that agents do not spend too long on low-priority tasks.
If an agent is stuck reflecting on a minor issue for too long, the system can trigger a timeout, prompting the agent to stop and reassess its priorities or to initiate predefined fallback actions.
Flow-Based Rules: For specific workflows, humans can create step-by-step rules that guide the agent through the key stages of the task, keeping it progressing toward the ultimate objective even when it encounters distractions. If the agent starts deviating from the intended flow, these rules can nudge it back on track.

Humans can also establish hierarchical guiding principles that break constraints down from general to specific, so the agent focuses on fewer constraints at a time and the complexity of its reasoning is reduced. By analyzing recurring patterns where the agent tends to get stuck, humans can design preemptive rules that address these distraction points based on past observations. Through abstraction, humans can generalize rules across multiple workflows, so the agent applies the same principles in various contexts with minimal adjustment. For instance, rather than addressing each case where the agent becomes sidetracked individually, abstracted rules can cover a range of similar scenarios, allowing the agent to handle recurring distractions with minimal human intervention.

3. Managing Excessive or Inappropriate Tool-Calling

Excessive or inappropriate tool-calling is a significant challenge for AI agents, particularly in workflows that require interaction with external systems, where an agent may call a tool even though internal reasoning would be more efficient or appropriate. Like over-reflection, overuse of tools leads to inefficiencies such as wasted computational resources, increased latency, and distraction from the agent's primary objectives.
Tool-Calling Limits: Humans can set dynamic limits on how often an agent can call a tool within a given period, adjusting these limits based on task context or performance feedback. This prevents agents from wasting computational resources and time by repeatedly calling tools when internal reasoning could provide a quicker or more efficient solution.
Contextual Tool Use: Humans can establish rules that define appropriate contexts for tool usage, such as task-specific thresholds or constraints based on complexity or resource requirements. This teaches the agent when a tool is necessary and when it should rely on its own reasoning, either through rule-based systems or by training the agent with reinforcement learning techniques.
Fallback Mechanisms: If an agent calls a tool and fails to progress, rule-based fallback mechanisms can interrupt the cycle and, based on predefined criteria, prompt the agent to escalate the issue by requesting human feedback, switch tools, or revert to internal reasoning.

Effective management of excessive or inappropriate tool-calling requires several critical thinking strategies. Humans must evaluate when tool use is necessary, guiding the agent to recognize when to rely on internal reasoning, while ensuring that both tools and reasoning mechanisms are optimized for task efficiency. Analytical thinking is crucial for evaluating the LLM output against the task's goal: by breaking the output into its core components and comparing them with the goal criteria, critical thinkers can assess its accuracy, relevance, and completeness, ensuring the output is both factually correct and aligned with the task's objectives.
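The tool-calling limits and fallback mechanisms described above can be sketched as a small guard around an agent step. This is a minimal sketch under stated assumptions, not part of any agent framework: tools are plain Python callables, `ToolCallLimiter` and `run_step` are hypothetical names, a sliding time window implements the limit, and "progress" is approximated by a non-empty tool result.

```python
import time
from collections import deque
from typing import Callable, Deque, List


class ToolCallLimiter:
    """Allow at most `max_calls` tool calls within any sliding window of `window_s` seconds."""

    def __init__(self, max_calls: int, window_s: float, clock: Callable[[], float] = time.monotonic):
        self.max_calls = max_calls   # could be adjusted at runtime from task context or feedback
        self.window_s = window_s
        self.clock = clock
        self._calls: Deque[float] = deque()

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self._calls and now - self._calls[0] >= self.window_s:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False


def run_step(task: str, tools: List[Callable[[str], str]], limiter: ToolCallLimiter,
             internal_reasoning: Callable[[str], str]) -> str:
    """Try tools in order; switch tools on failure, and fall back to internal reasoning."""
    for tool in tools:
        if not limiter.allow():
            break                    # limit reached: stop calling tools for now
        try:
            result = tool(task)
        except Exception:
            continue                 # tool failed: switch to the next tool
        if result:                   # hypothetical progress check: any non-empty result
            return result
    # Fallback: revert to internal reasoning (escalating to a human could slot in here instead).
    return internal_reasoning(task)


# Usage with a controllable clock and three hypothetical tools.
fake_now = [0.0]
limiter = ToolCallLimiter(max_calls=2, window_s=60, clock=lambda: fake_now[0])


def flaky_search(query: str) -> str:
    raise TimeoutError("search timed out")


def empty_lookup(query: str) -> str:
    return ""


def calculator(query: str) -> str:
    return "42"


def reason_internally(task: str) -> str:
    return f"answered {task!r} without tools"


answer = run_step("lookup", [flaky_search, empty_lookup, calculator], limiter, reason_internally)
fake_now[0] = 61.0  # a minute later, the window has reset
answer_later = run_step("lookup", [calculator], limiter, reason_internally)
print(answer)        # answered 'lookup' without tools
print(answer_later)  # 42
```

In the first call, the failing search and the empty lookup use up the budget of two calls, so the agent reverts to internal reasoning instead of calling a third tool. Injecting the clock keeps the limiter testable without real waiting.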
Further, reasoning allows humans to pinpoint where the output falls short, whether through missing information or failure to meet specific goals. Recognizing these gaps is essential for choosing appropriate tools or identifying changes needed in a tool's design or in the agent's reasoning framework. This involves assessing the strengths and limitations of available tools, weighing factors such as efficiency and relevance, and determining which tool is best suited to fill the identified gaps. Once a tool is selected, problem-solving also applies when the tool fails, for example by returning incomplete data, processing too slowly, or producing incorrect results. Critical thinkers must diagnose whether the failure is tool-related or task-specific and determine appropriate fallback actions, such as switching to a different tool or reverting to internal reasoning. Finally, reasoning aids in selecting the best fallback option by comparing the effectiveness and potential of various strategies, often requiring real-time analysis to minimize workflow disruption. By considering multiple approaches (retrying, switching tools, or returning to internal logic), critical thinkers keep the agent on track toward its goal despite obstacles in the workflow.

4. Generating Synthetic Data for Training Agentic Reasoning and Tool-Calling

In addition to rule-based guidance, synthetic data, such as simulated task scenarios or artificially generated datasets, can be used to train AI agents to reason through tasks effectively. By simulating complex, domain-specific scenarios and potential distractions, synthetic data helps the agent learn to prioritize key tasks and balance internal reasoning with external tool use.

Scenario Generation: Synthetic data can simulate edge cases, such as rare, anomalous, or highly complex situations, where the agent might become distracted or misuse tools.
This allows the agent to learn how to identify important objectives, avoid over-focusing on irrelevant details, and generalize these lessons across various types of tasks and distractions.
Tool-Calling Optimization: Training agents on synthetic data can improve their understanding of when to call external tools (e.g., APIs or databases) by simulating the conditions and thresholds that make tool use necessary or redundant. The agent can learn:
To call tools only when required for task completion.
To avoid excessive or redundant tool calls that waste resources or introduce delays.
Balancing Reflection and Tool Use: Training the agent on scenarios that combine reflection and tool use helps it learn when reflection should lead to action or tool invocation, optimizing decision-making through reinforcement learning or iterative feedback.

Synthetic data can be used to fine-tune models and expand the agent's reasoning abilities, improving how it manages tool-calling. Synthetic data has also played a significant role in recent advances in foundation models. Critical thinking skills such as deductive reasoning, combined with domain-specific expertise, are key for humans guiding AI tools in scenario generation, allowing the simulation of edge cases where the agent might get distracted or misuse tools. These scenarios help the agent learn to prioritize important objectives by incorporating factors like time constraints and task hierarchies, steering it away from irrelevant details. Analytical thinking is then needed to assess how well the agent responds to these scenarios, often through a combination of automated performance metrics and human analysis, to identify areas for improvement.
Developing scenarios that teach efficient decision-making without over-relying on tools or reflection, especially through iterative design and feedback, helps the agent optimize its reasoning process.
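As a concrete illustration of labelled synthetic scenarios and the automated metrics used to assess an agent on them, here is a minimal Python sketch. The goal and distractor templates, the `Scenario` fields, and the baseline policy are hypothetical; real pipelines would typically have domain experts, or an LLM guided by them, produce far richer scenarios.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Scenario:
    goal: str
    distractors: List[str]   # irrelevant details injected to tempt the agent off course
    needs_tool: bool         # label: is an external tool call actually required?
    expected_action: str     # "call_tool" or "answer_directly"


# Hypothetical templates; in practice domain experts (or an LLM they guide) would supply these.
GOALS = [
    ("What is 17 * 3?", False),                        # internal reasoning suffices
    ("What is today's EUR/USD exchange rate?", True),  # needs live data
    ("Summarize the paragraph the user pasted.", False),
    ("How many open tickets does customer 812 have?", True),
]
DISTRACTORS = [
    "The user mentions they are on vacation next week.",
    "An unrelated log line warns about a deprecated config flag.",
    "The request misspells the product name.",
]


def generate_scenarios(n: int, seed: int = 0, max_distractors: int = 2) -> List[Scenario]:
    """Build n labelled scenarios; the fixed seed makes the dataset reproducible."""
    rng = random.Random(seed)
    scenarios = []
    for _ in range(n):
        goal, needs_tool = rng.choice(GOALS)
        distractors = rng.sample(DISTRACTORS, rng.randint(0, max_distractors))
        action = "call_tool" if needs_tool else "answer_directly"
        scenarios.append(Scenario(goal, distractors, needs_tool, action))
    return scenarios


def score_policy(policy: Callable[[Scenario], str], scenarios: List[Scenario]) -> Dict[str, float]:
    """Automated metrics: how often the policy picks the labelled action, and wasted tool calls."""
    actions = [policy(s) for s in scenarios]
    correct = sum(a == s.expected_action for a, s in zip(actions, scenarios))
    redundant = sum(a == "call_tool" and not s.needs_tool for a, s in zip(actions, scenarios))
    return {"accuracy": correct / len(scenarios), "redundant_tool_calls": redundant}


def always_call_tool(scenario: Scenario) -> str:
    """A deliberately over-eager baseline policy."""
    return "call_tool"


data = generate_scenarios(20, seed=42)
print(score_policy(always_call_tool, data))
```

The over-eager baseline scores well only on scenarios that truly need a tool, and the `redundant_tool_calls` count surfaces exactly the wasteful behavior this section describes.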
6 min read
authors:
Rohit Aggarwal

Article
Before looking at why many GenAI projects fail, it helps to have a simple overview of what these models can do. GenAI models like ChatGPT have been trained on vast datasets that include diverse publicly available text sources, such as Wikipedia, websites, books, and other digital texts. They can answer a wide range of questions based on this knowledge, as you may have experienced when using ChatGPT. Beyond GenAI's general knowledge, you can also provide your own context (i.e., your specific knowledge, data, or documents) along with your instructions, asking GenAI to extract information or generate output based on that context. With the right instructions and context, GenAI can produce outputs that closely resemble human decision-making. While GenAI's potential is significant, its success depends on how it is deployed, and many projects fail because of misunderstandings about its capabilities and proper usage.

1. Approaching GenAI as Process Automation Instead of Process Redesign

A common pitfall in deploying GenAI is treating it like traditional automation, where predefined processes are automated step by step. GenAI isn't just about automating existing processes; it can transform workflows, especially in areas requiring human decision-making and creativity. When businesses focus solely on automation and fail to redesign processes, they miss the opportunity to unlock GenAI's real potential: augmenting human intelligence. Traditional systems analysis and design approaches focused on capturing predefined, deterministic workflows meant to be used by humans. Now that GenAI augments many human decision-making steps, new workflows must be designed to accommodate this shift. For example, when generating customer support FAQs, a customer support executive previously had to manually review past support tickets, reference existing FAQs, and create new ones.
In the GenAI paradigm, such manual-search interfaces are no longer necessary. Instead, the focus should be on determining what information to retrieve from past customer support tickets via APIs and integrating GenAI models to analyze it and generate FAQs automatically. Rather than building interfaces for humans to sift through data, the emphasis should be on designing systems that let GenAI access and process information directly from databases or APIs. For instance, you can build an application that connects GenAI models to your customer support database so they can automatically extract common issues, analyze sentiment, and generate draft FAQs or support documents. The human role then shifts to reviewing and refining the generated outputs, ensuring they meet quality standards and align with the company's messaging. This paradigm shift requires businesses to redesign their workflows to be GenAI-centric, building processes around GenAI's strengths in data processing and content generation. If you're unsure how to plan for GenAI systems, you may want to go through the GenAI planning framework here.

2. Over-Reliance on a Single Inadequate Prompt to Handle Complex Human Decision-Making

There is often an over-expectation that simply feeding AI a prompt will produce sophisticated decision-making outputs, which underestimates the complexity of human judgment. In many decision-making processes, humans navigate multiple steps, often using intuition and expertise to assess situations in parts. To replicate this with AI, the decision-making process usually needs to be broken down, with humans providing clear step-by-step instructions for the AI to follow. Stepwise prompting tackles the entire multi-step process within a single prompt. For moderately complex tasks, this can be efficient because it reduces the number of interactions with the model.
However, for more complex tasks involving numerous steps and intricate explanations, it may be less effective. In such cases, you may need to break the task into a sequence of sub-tasks and use a different prompt for each one. This process is called Prompt chaining. You can read more about it here. Besides providing clear instructions, offering the model a few examples can help it better understand your expectations. This technique is known as Few-shot prompting or In-context learning. You can read more about it here. Integrating all these ideas into one prompt can become complex and overwhelming. We have found Metadata prompting highly effective for handling this complexity by separating concerns: first explaining the instructions while assuming all variables or constructs are predefined, and then explaining those variables and constructs in detail. You can read more about it here. Even after following these best practices, issues may still arise, such as semantically incorrect words, modifier ambiguity, or instructions in the wrong order. You can learn how to systematically iterate on and improve prompts to resolve these issues here.

Another reason to use Prompt chaining is the limited generation capacity of AI models. While these models can process large amounts of input, their ability to generate content is more limited. If you try to generate content covering too many topics at once, quality may degrade into overly simplistic or "listicle"-style output. To maintain quality, it is often necessary to break tasks into smaller sub-tasks and use a more focused prompt for each, which reduces the load on the model and yields better results for each segment of the task.

3. Assuming Models Need to Be Fine-Tuned Rather Than Using Proper Prompting Techniques

A significant misunderstanding is the assumption that GenAI models need to be fine-tuned for every task.
This often leads organizations to commit to fine-tuning unnecessarily, adding substantial cost and complexity to their AI projects. In reality, many tasks can be handled effectively with advanced prompting techniques such as Metadata prompting, Few-shot learning, Stepwise prompting, and Prompt chaining, all without fine-tuning. Fine-tuning not only slows the process down by requiring the creation and management of training data but also complicates deployment and inference. Teams often fine-tune models independently, fragmenting their efforts, when a common base model with heterogeneous adapters would allow better resource utilization and system flexibility. Heterogeneous Parameter-Efficient Fine-Tuning (PEFT) adapters can be applied in batches, optimizing resource usage. Read more about using heterogeneous PEFT adapters here.

4. Over-Reliance on AI Autonomy

A growing trend in AI development is designing autonomous AI agents using approaches such as ReAct with Reflection, Language Agent Tree Search (LATS), and others. While these designs can be effective for certain use cases, such as simple Q&A with tools like Google Search when retrieval-augmented generation (RAG) systems are insufficient, they pose challenges when applied to complex tasks that require nuanced reasoning and decision-making. When GenAI systems are used to augment human decision-making, over-reliance on these agentic designs can cause several issues: the models may over-reflect, lose focus, struggle to distinguish critical from irrelevant details, or overuse external tools instead of drawing on their own knowledge. As a result, costs can escalate rapidly, and the quality of output on complex tasks often deteriorates. To mitigate these issues, well-thought-out guardrails and interventions are necessary.
These guardrails help define task scope, keep AI models on track, and improve governance, reducing the risks associated with unmonitored AI autonomy. Without these measures, autonomous AI systems may underperform on complex tasks and fail to deliver the expected value.
5. Misaligned Goal Setting and Success Metrics for New GenAI Processes
GenAI enables entirely new processes that were previously impossible or too resource-intensive for traditional AI or human-driven systems. These innovations include automated creative content generation, context-aware conversational agents, and intelligent document synthesis (e.g., creating detailed reports, legal contracts, or tailored marketing content from minimal inputs). GenAI can also facilitate dynamic decision-making by generating and iterating on multiple solutions in real time, which traditional AI systems cannot do effectively without substantial human input. However, this new potential presents a significant challenge: defining appropriate goals and success metrics. Organizations often struggle to set realistic objectives that take full advantage of GenAI's strengths because they are anchored in conventional process thinking. Since GenAI can fundamentally change how work is performed, companies must redefine what success looks like and select the goals that offer the highest ROI. Misunderstanding or underestimating these possibilities often leads to poorly chosen objectives and a failure to realize the full impact GenAI could offer.
6. Lack of Context Provided in Prompts
One of the major reasons GenAI projects fail is the lack of adequate context in prompts. While humans draw on years of experience, domain knowledge, and exposure to various settings to interpret ambiguous information, GenAI models rely solely on the data they've been trained on and the explicit details provided in prompts. When important context is missing, AI models may generate responses that are too vague, irrelevant, or even incorrect.
For instance, planning a weekly social media calendar requires tacit knowledge of which types of posts work best and which underperform within a specific industry. Without this background and context, GenAI systems may struggle to generate high-quality content and produce generic, low-engagement posts. By incorporating tacit industry knowledge into the prompt, you can guide the AI to create more relevant and impactful content. You can learn how to include tacit knowledge in prompts for more effective results here.
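To make the idea of supplying context concrete, here is a minimal Python sketch that assembles a social media calendar prompt with an explicit block of tacit industry knowledge. The function name, its parameters, and the sample insights are hypothetical placeholders for illustration. In practice, the insights would come from your own team's experience, and the resulting string would be sent to whichever model you use.

```python
# Hypothetical helper: the name, parameters, and sample insights are illustrative assumptions.


def build_calendar_prompt(industry: str, tacit_knowledge: list[str], weekly_goal: str) -> str:
    """Return a prompt that states the task and spells out domain context explicitly."""
    # Turn each piece of tacit knowledge into a bullet the model can refer to
    context_block = "\n".join(f"- {insight}" for insight in tacit_knowledge)
    return (
        f"Plan a 7-day social media calendar for a company in the {industry} industry.\n"
        f"Goal for the week: {weekly_goal}\n\n"
        "Use the following knowledge about what works and what underperforms in this industry:\n"
        f"{context_block}\n\n"
        "For each day, give the platform, the post type, and a one-line post idea."
    )


if __name__ == "__main__":
    print(
        build_calendar_prompt(
            industry="specialty coffee",
            tacit_knowledge=[
                "Placeholder insight: short behind-the-scenes videos outperform product photos.",
                "Placeholder insight: discount-only posts get little engagement.",
            ],
            weekly_goal="promote a new single-origin bean",
        )
    )
```

Keeping the context in a separate list makes it easier to review and update the domain knowledge without rewriting the task instruction itself.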
authors:
Rohit Aggarwal
Harpreet Singh

Article
1. Asking the model to explain its reasoning
Asking an AI model to generate explanations for its labels or recommendations can significantly enhance output quality by promoting deeper reasoning and analysis. This approach, closely related to chain-of-thought prompting, encourages the model to articulate its decision-making process, which can reveal and potentially correct flaws in its reasoning. By requiring explanations, the model is pushed toward a more thorough understanding of the context and toward reasoning patterns that more closely resemble human reasoning. This process can help mitigate biases, improve transparency, and ultimately lead to more thoughtful, well-justified outputs. Additionally, when the explanation is written before the label, the act of explaining can reinforce the model's grasp of the concepts and relationships involved in that response. It does not update the model's weights, however, so the benefit does not carry over to later requests. This technique not only enhances the model's ability to handle complex tasks but also provides valuable insight into its decision-making process, fostering greater trust and understanding between AI systems and their users.
2. Iterate and Refine
The "Iterate and Refine" guideline in prompt engineering highlights the need for continuous testing, evaluation, and enhancement of the prompts used with AI models such as ChatGPT, to optimize the efficiency and accuracy of their responses. This iterative process involves experimenting with various prompts, analyzing the AI's responses, and refining the prompts based on their performance to gradually improve response quality and relevance. Acknowledging the trial and error involved is essential, because crafting the perfect prompt often requires multiple attempts given the complexities of human language and AI interpretation. Initial attempts may not fully convey the needed context or specificity, which calls for prompt adjustments.
Ask for the LLM's understanding of the prompt
Start by ensuring the AI comprehends your prompt correctly. This step involves not just asking for understanding but also an iterative refinement process.
Here's the expanded process:
a. Initial Query: Use this specific prompt, followed by your draft prompt, to get the AI's initial understanding:
Provide your understanding of the following prompt for an AI tool:
b. Analyze the Response: Carefully review the AI's explanation of your prompt. Look for any misinterpretations, gaps in understanding, or areas where the AI's interpretation doesn't align with your intent.
c. Iterative Refinement: Use the edit option to change your original prompt. Update the prompt to incorporate better wording or explanations you see in the AI's output. When you save the updated prompt, the AI will give you another explanation. Review this new explanation carefully.
d. Decision Point: If you see the need for further minor changes, repeat the process from step c. If you're satisfied with the AI's understanding and feel no further changes are necessary, proceed to the next step in the prompt engineering process.
It may take you 2-3 iterations to fix your prompt. This iterative refinement within the first step is crucial because it allows you to:
- Gain insights into how the AI interprets your language
- Incrementally improve your prompt based on the AI's feedback
- Ensure a solid foundation of mutual understanding before moving on to more complex refinements
An alternative to manual editing is to let the LLM handle the rewrite. Once you've confirmed that the AI's understanding is good and that it hasn't misconstrued or deviated much from your intent, ask it to improve the prompt:
Rewrite the prompt to make it better
Or:
Evaluate the structure of the following content, focusing on improving its organization and presentation. Avoid adding or suggesting new information--your task is to reframe the existing content for better clarity and flow.
This collaborative approach can lead to unexpected insights and refinements.
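If you run this check through an API instead of a chat window, the loop in steps a to d can be scripted. The following Python sketch is a minimal illustration under stated assumptions: `llm` is a hypothetical stand-in for whatever function sends text to your model, and `review` stands in for the human judgment in steps b to d. It returns an edited prompt, or `None` when no further changes are needed. Only the wording of the understanding request comes from the text above.

```python
from typing import Callable, Optional

# Wording taken from step a of the process
UNDERSTANDING_REQUEST = "Provide your understanding of the following prompt for an AI tool:"


def understanding_check(draft_prompt: str) -> str:
    """Step a: wrap the draft prompt in a request for the model's interpretation."""
    return f"{UNDERSTANDING_REQUEST}\n\n{draft_prompt}"


def refine_prompt(
    draft_prompt: str,
    llm: Callable[[str], str],
    review: Callable[[str, str], Optional[str]],
    max_iterations: int = 3,
) -> str:
    """Repeat steps a-d until the reviewer accepts the prompt or the budget runs out.

    `llm` (hypothetical) sends text to a model and returns its reply.
    `review` (hypothetical) is the human step: given the prompt and the model's
    explanation, it returns an edited prompt, or None when no changes are needed.
    """
    prompt = draft_prompt
    for _ in range(max_iterations):
        explanation = llm(understanding_check(prompt))  # steps a and b
        revised = review(prompt, explanation)  # step c: edit the prompt
        if revised is None:  # step d: satisfied, move on
            break
        prompt = revised
    return prompt


if __name__ == "__main__":
    # Stub model and scripted reviewer, used only to show the control flow
    def stub_llm(text: str) -> str:
        return "The prompt asks me to summarize a report."

    scripted_edits = iter(["Summarize the report in three bullet points.", None])
    final = refine_prompt("Summarize the report.", stub_llm, lambda p, e: next(scripted_edits))
    print(final)  # Summarize the report in three bullet points.
```

The default `max_iterations=3` mirrors the 2-3 passes suggested above. If the budget runs out, the model has not yet explained the last revision, so it is worth one final manual check.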
Addressing potential uncertainties
Next, address any subjectivity or unclear elements in your prompt that could lead to unreliable results, especially when the context differs from your test cases.
Identify potential uncertainties
You can begin by using the LLM to identify uncertainties in your instructions with the prompt given below. This question helps you pinpoint areas where your prompt might be open to interpretation or lack specificity.
Is there any subjectivity in the prompt or something unclear for an AI tool
Asking the LLM to guess answers for uncertainties
Instead of manually working out how to make your instructions more specific for each identified uncertainty, you can ask the LLM to make educated guesses about potential answers or solutions. This approach capitalizes on the model's capabilities and can save you time and effort. With the prompt given below, you are essentially outsourcing part of the problem-solving process to the AI. This not only generates potential solutions but also shows how the model might interpret and respond to ambiguities in your prompt, further informing your refinement process.
Make your best guess and try to answer subjectivities you identified in the last response
The LLM's feedback on uncertainties is mostly valuable, but at times it overdoes it and lists frivolous points, so you will want to separate the good points from the rest.
Ask the LLM to rewrite the prompt to address uncertainties
Based on the insights gained, you can ask the LLM to rewrite your prompt to address the identified issues. Mention the specific points you want incorporated and leave out the rest. By incorporating points selectively, you prevent the prompt from becoming overly complex or veering off track because of the AI's occasional tendency to over-elaborate.
Rewrite the prompt to address the following points:
- point 1
- point 2
While these guidelines provide a solid foundation, don't hesitate to experiment with different phrasings, structures, and approaches. Each use case may require unique tweaks to achieve optimal results. By combining thoughtful design with systematic testing and refinement, you can create highly effective prompt templates that maximize the capabilities of LLMs in your workflow.
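The three follow-up prompts for handling uncertainties (identify, guess, selectively rewrite) work as turns in a single conversation. Here is a minimal Python sketch of that sequence. `chat` is a hypothetical stand-in for a chat-model call that takes the full message history and returns the reply, and `select` represents your manual filtering of the model's suggestions. Putting the draft prompt in the same message as the first question is also an assumption. In the workflow above, the prompt may already be in the conversation from the understanding step.

```python
from typing import Callable

Message = dict[str, str]

# Prompt wording taken from the steps above
IDENTIFY = "Is there any subjectivity in the prompt or something unclear for an AI tool"
GUESS = "Make your best guess and try to answer subjectivities you identified in the last response"
REWRITE = "Rewrite the prompt to address the following points:"


def address_uncertainties(
    draft_prompt: str,
    chat: Callable[[list[Message]], str],
    select: Callable[[str], list[str]],
) -> str:
    """Run the identify -> guess -> selective-rewrite sequence in one conversation.

    `chat` (hypothetical) takes the whole message history and returns the reply.
    `select` (hypothetical) is the human step that keeps only the useful points.
    """
    history: list[Message] = [{"role": "user", "content": f"{draft_prompt}\n\n{IDENTIFY}"}]

    def ask() -> str:
        # Send the history so far and record the model's reply in it
        reply = chat(history)
        history.append({"role": "assistant", "content": reply})
        return reply

    ask()  # 1. identify uncertainties
    history.append({"role": "user", "content": GUESS})
    guesses = ask()  # 2. let the model propose answers
    chosen = select(guesses)  # 3. keep only the points you agree with
    bullets = "\n".join(f"- {point}" for point in chosen)
    history.append({"role": "user", "content": f"{REWRITE}\n{bullets}"})
    return ask()  # 4. the rewritten prompt
```

Passing only the selected points to the final turn is what keeps the rewrite from absorbing the frivolous suggestions the model sometimes makes.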
authors:
Rohit Aggarwal
Harpreet Singh

Article
Zero Shot Prompting
Zero-shot prompting is a technique used with large language models (LLMs) like GPT (Generative Pre-trained Transformer) that enables the model to undertake tasks it hasn't been explicitly trained on. It involves presenting a task to a language model without any task-specific examples or training. The model is expected to understand and execute the task based solely on its pre-existing knowledge and the general instructions provided in the prompt. We communicate with the model using a prompt that explains what we want to achieve, and the model uses the knowledge it acquired during pre-training on a vast amount of text data to infer the best way to complete the task. This capability is pivotal for several reasons:
Versatility and Adaptability: It allows models to handle a wide range of tasks without the need for fine-tuning or retraining, making them highly versatile and adaptable to new challenges. Whether it's sentiment analysis, summarization, or question answering, the model adapts to the prompts provided.
Cost Efficiency: Reducing the need for large, annotated datasets for every new task saves significant resources in data collection and annotation.
Generalization: It demonstrates the model's ability to generalize from its training data to new, unseen tasks, highlighting its understanding of language and concepts.
Example of Zero-Shot Prompting
Let's consider the task of sentiment classification. Here's how you would set up your prompt:
Task: Sentiment classification
Classes: Positive, neutral, negative
Text: "That shot selection was awesome."
Prompt: “Classify the following text into one of these sentiment categories: positive, neutral, negative. Text: "That shot selection was awesome."”
The model's response would likely be "positive" because it has learned from its training data that the word "awesome" is associated with positive sentiment.
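Here is a minimal Python sketch of the sentiment example above. It builds a zero-shot prompt that contains only the instruction, the allowed labels, and the input text, with no solved examples. It also maps the model's free-text reply back onto a label. The function names and the extra "Answer with the category name only" line are assumptions added to make the reply easier to parse. The call to an actual model is left out.

```python
from typing import Optional

LABELS = ["positive", "neutral", "negative"]


def zero_shot_prompt(text: str, labels: list[str]) -> str:
    """Instruction plus input only: no solved examples, which is what makes it zero-shot."""
    return (
        "Classify the following text into one of these sentiment categories: "
        f"{', '.join(labels)}.\n"
        "Answer with the category name only.\n\n"
        f'Text: "{text}"'
    )


def parse_label(response: str, labels: list[str]) -> Optional[str]:
    """Map the model's reply back to an allowed label, or None if it doesn't match."""
    # Drop surrounding whitespace, periods, and quotes, then compare case-insensitively
    cleaned = response.strip().strip('."').lower()
    return cleaned if cleaned in labels else None


if __name__ == "__main__":
    print(zero_shot_prompt("That shot selection was awesome.", LABELS))
    print(parse_label("Positive.", LABELS))  # positive
```

Returning `None` for replies that do not match a label makes it easy to detect and retry cases where the model answers in prose instead of with a category.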
Few Shot Prompting
Few-shot prompting is a technique used to guide large language models (LLMs), such as ChatGPT and Llama, to perform specific tasks or understand particular contexts using only a small number of examples. In few-shot prompting, you provide the model with a few carefully selected examples (typically between 2 and 10) that demonstrate both the input and the desired output of the task. These examples help the model infer the pattern or context of the task, which it then attempts to generalize to new, unseen inputs. It's important to note that the model does not update its internal weights during few-shot prompting. The model temporarily "learns" or infers patterns from the provided examples but discards this information once the interaction is over.
Example 1:
Input: "Do you have the latest model of the XYZ smartphone in stock?"
Response: "Thank you for your inquiry. Yes, we have the latest XYZ smartphone model available. Would you like to place an order?"
Example 2:
Input: "Is the ABC laptop available in your store?"
Response: "Thank you for reaching out. The ABC laptop is currently out of stock, but we expect new shipments to arrive next month. Can we notify you when it's available?"
Your task:
Input: "Can you tell me if you have the DEF headphones in stock?"
Response:
In this scenario, the model is provided with two examples of customer inquiries regarding product availability, along with the corresponding email responses. In the first example, the product is in stock, and the response includes an offer to place an order. In the second example, the product is out of stock, and the response offers to notify the customer when it becomes available. When the model is tasked with generating a response to a new inquiry about DEF headphones, it applies the pattern observed in the previous examples to craft an appropriate reply.
This might involve confirming the product's availability and suggesting next steps if it's in stock, or explaining that the product is out of stock and offering alternatives or a notification service. This approach enables the model to understand the context of customer service in a business setting and to generate responses that are both relevant and considerate of the customer's needs.
Exemplars (Examples)
Exemplars are specific instances that demonstrate how a task should be performed, helping to guide machine learning models, especially in few-shot scenarios. Here's how few-shot prompting can be approached using exemplars for a business task, such as drafting email responses to customer inquiries about product availability, while avoiding common pitfalls:
Ensure Exemplar Consistency: All exemplars should follow a consistent format and structure. This consistency helps the model understand the task and apply what it infers to new inputs effectively.
Select Relevant Exemplars: Choose exemplars directly related to the task at hand. Irrelevant exemplars can confuse the model, leading to inaccurate outputs.
Diversify Your Exemplars: To give the model a broad understanding of the task, include a range of exemplars that cover various scenarios and outcomes related to the task. This diversity helps the model handle different inputs more effectively.
Keep Exemplars Simple and Clear: While it's important to capture the complexity of the task, overly complicated exemplars can confuse the model. Aim for clarity and simplicity so the model can easily learn from the examples provided.
Optimize the Number of Exemplars: Balance is key. Too few exemplars may not give the model enough information to understand the task, while too many can overwhelm it. Adjust the number of exemplars based on the task's complexity and the model's performance.
Incorporate Contextual Clues in Exemplars: Providing clear instructions and relevant context within your exemplars is crucial. These clues help the model understand the task better and generate more accurate outputs.
Many-shot Prompting
Many-shot prompting is a variant of few-shot prompting where, instead of using a handful of examples (e.g., around 10), you use several hundred (e.g., 500-800). Models with large context windows, such as Gemma, can accommodate many examples in a single prompt. However, a significant downside of using such large context windows is the increased computational cost and slower inference. With this many examples, it may be more efficient to fine-tune the model directly, avoiding the repeated cost of processing a long context on every inference call.
In-Context Learning
In-context learning refers to a large language model's ability to perform tasks by interpreting examples provided in the input prompt, without updating its internal parameters. Few-shot prompting and many-shot prompting are both forms of in-context learning. Despite the term "learning," the model doesn't actually update its weights or retain information beyond the current interaction. Instead, it temporarily infers patterns or rules from the examples in the prompt and discards this inferred knowledge once the interaction concludes.
Metadata Prompting
Metadata prompting is an approach designed to simplify and streamline the process of instructing large language models (LLMs). It applies the principles of modularity and separation of concerns to prompt engineering, making communication with LLMs more effective. Traditionally, prompts often combine task descriptions with explanations of the various entities involved, resulting in complex and cluttered instructions. The core principle of metadata prompting is to separate the task description from entity explanations.
It encourages users to start by clearly defining the main task, using all necessary entities without worrying about explaining them. To distinguish entities within the task description, they are enclosed in backticks (`). This allows for a focused and concise task description while clearly marking which terms will be explained later. After the task is clearly defined, each entity that requires explanation is described separately in a JSON-style key-value format: the entity names serve as keys, with their explanations as the corresponding values. This structured approach offers several benefits:
It creates a clear separation between the task description and entity explanations.
It makes prompts easier to understand, modify, and maintain.
It helps visualize connections between different parts of the task more effectively.
It reduces clutter in the main task description.
It introduces modularity, allowing for easier updates and reuse of entity explanations across different prompts.
By structuring prompts in this way, metadata prompting aims to create more efficient, readable, and adaptable instructions for AI models, ultimately improving the quality of AI-generated outputs and making the process of working with LLMs more user-friendly.
As an example, consider a situation where a user wants to assign custom tags to each paragraph in a long document. Because an LLM can only handle a limited number of tokens, the document would need to be partitioned into segments. Yet, for every segment, crucial context such as the document's title, headings, and preceding paragraphs must be provided. Traditional prompting methods might fall short here, as LLMs could have difficulty distinguishing this metadata from the main content. Metadata prompting, in contrast, offers a more straightforward way to communicate it:
Tag each of `target-paragraphs` with one of the `tags` considering `article-title`, `headings` and `preceding-paragraphs`.
tags: """
tagA: definition of tag A
tagB: definition of tag B
""",
article-title: """Article title""",
headings: """
h1: heading with type Heading 1
h2: heading with type Heading 2
""",
preceding-paragraphs: """Provide 2 paragraphs that come before the target paragraphs to give more context""",
target-paragraphs: """Provide the paragraphs you want to tag"""
Thanks to the strong natural language understanding (NLU) and in-context learning abilities of LLMs, AI agents typically use text as the interface between components to plan, use external tools, evaluate, reflect, and improve without additional training.
Chain of Thought (CoT) prompting
Chain-of-thought prompting is a technique used to encourage language models to break down complex problems into a series of smaller, interconnected steps or thoughts, mimicking the way humans reason through problems. A language model is prompted to generate a series of short sentences that mimic the reasoning process a person might use to solve a task. The process involves three main steps:
Step-by-step reasoning: Instead of directly providing the final answer, the model generates a series of intermediate reasoning steps that guide it towards the solution by breaking the problem into smaller, more manageable parts.
Intermediate outputs: At each step, the model generates an intermediate output that serves as a building block for the next step in the chain of thought. These outputs can be partial solutions, relevant information, or logical connections.
Final output: After generating the intermediate steps, the model combines the information to produce the final answer or solution to the original prompt.
There are several approaches to prompting a model to generate intermediate reasoning steps in a chain of thought. The most common, and the one used in the original paper by Wei et al. (2022), is few-shot learning.
In this approach, the model is provided with a few examples of problems along with their corresponding chains of thought and final answers. The model applies the same reasoning pattern to new, unseen problems, relying on its ability to generalize from a small number of examples. In their experiments, Wei et al. (2022) provided the model with examples of problems, each demonstrating the step-by-step reasoning process.
[Figure: example few-shot chain-of-thought exemplars from Wei et al. (2022). Source: Paper link]
Note: For automating the selection of exemplars, Auto-CoT is a good read: Paper link, Good summary article
When presented with a new question, the model uses these examples as a reference to generate its own chain of thought and final answer. The authors found that this few-shot approach led to significant improvements in the model's performance on various reasoning tasks, including arithmetic, commonsense reasoning, and symbolic manipulation. The generated chains of thought also provided valuable insights into the model's reasoning process, making its outputs more interpretable and trustworthy.
Typical implementation (questions 1 to n are the few-shot exemplars with their respective reasonings and answers; the last question is the new one to be solved):
Question: {question 1}
Reasoning: Let's think step-by-step. {reasoning 1}
Answer: {answer 1}
...
Question: {question n}
Reasoning: Let's think step-by-step. {reasoning n}
Answer: {answer n}
Question: {new question}
Reasoning: Let's think step-by-step.
Other approaches to prompting a model to generate intermediate reasoning steps include:
1. Zero-shot Chain of Thought: Appending a phrase such as "Let's think step by step", "Break down your reasoning into clear steps", or "Take a deep breath and work on this problem step-by-step" to the original prompt encourages the model to break its reasoning into a series of logical, intermediate steps rather than attempting to reach the final answer in one leap.
2. Structured prompts: Prompts that include placeholders for intermediate reasoning steps, which the model is asked to fill in along with the final answer. For instance, a prompt might be structured as follows:
Question: [Original question]
Step 1: [Placeholder for first reasoning step]
Step 2: [Placeholder for second reasoning step]
...
Step N: [Placeholder for final reasoning step]
Answer: [Placeholder for final answer]
The model fills in the placeholders with relevant intermediate steps and the final answer.
How is it Different from Standard Prompting?
Standard prompting might involve asking a model a direct question and receiving a direct answer, without any explanation of the steps taken to reach that answer. CoT prompting, on the other hand, explicitly asks the model to show its work, providing a step-by-step breakdown of its reasoning. This not only leads to more accurate answers in many cases but also provides an explanation that can help users understand the model's thought process.
Business Example: Enhancing Customer Support with RAG and CoT
Consider an online retailer implementing a chatbot equipped with RAG and chain-of-thought prompting to handle customer inquiries. A customer asks a complicated question about a product's features, its compatibility with other devices, and the return policy.
Logical Processing: Through chain-of-thought prompting, the chatbot first breaks the query into sub-questions: What are the product's key features? Which devices are compatible? What is the return policy?
Retrieval: For each sub-question, the chatbot retrieves relevant documents and processes the information sequentially, starting with product features, moving to compatibility, and finally addressing the return policy. At each step, it synthesizes information from the retrieved documents and previous reasoning steps.
Final Response: The chatbot compiles its findings into a comprehensive response that clearly explains the product's features, its compatibility with specific devices, and the return policy, offering a detailed and helpful answer to the customer's inquiry.
This example illustrates how chain-of-thought prompting in RAG changes the way LLMs handle complex queries, enabling them to provide more accurate, detailed, and contextually relevant responses. By mimicking human-like reasoning and adaptability, this approach significantly enhances the capabilities of AI in business applications, particularly in areas requiring deep understanding and nuanced responses.
Other Prompting types
Stepwise prompting: Please read here
Prompt chaining: Please read here
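The few-shot chain-of-thought template described earlier in this article can be assembled from a list of solved exemplars with a short helper. This is a minimal Python sketch. The `Exemplar` class, the function name, and the sample arithmetic question are hypothetical, and the resulting string would be sent to your model as a single prompt.

```python
from dataclasses import dataclass

STEP_CUE = "Let's think step-by-step."


@dataclass
class Exemplar:
    """One solved example: a question, its worked reasoning, and the final answer."""

    question: str
    reasoning: str
    answer: str


def few_shot_cot_prompt(exemplars: list[Exemplar], new_question: str) -> str:
    """Lay out exemplars in the Question/Reasoning/Answer template, then the new question."""
    blocks = [
        f"Question: {ex.question}\nReasoning: {STEP_CUE} {ex.reasoning}\nAnswer: {ex.answer}"
        for ex in exemplars
    ]
    # The new question stops at the reasoning cue, so the model writes its own chain of thought
    blocks.append(f"Question: {new_question}\nReasoning: {STEP_CUE}")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    examples = [
        Exemplar(
            question="A box holds 4 pens. How many pens are in 3 boxes?",
            reasoning="Each box holds 4 pens, so 3 boxes hold 3 * 4 = 12 pens.",
            answer="12",
        ),
    ]
    print(few_shot_cot_prompt(examples, "A shelf holds 6 books. How many books are on 5 shelves?"))
```

Keeping the exemplars as data rather than pasting them into one long string makes it easy to swap in different exemplars per task, or to pick them automatically as Auto-CoT does.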
authors:
Rohit Aggarwal
Harpreet Singh

Article
A well-structured approach is essential when planning an AI system, balancing high-level conceptualization with practical implementation details. Creating a high-level roadmap is a crucial step in project planning, and it begins with a critical first step: clearly defining your system's goals, inputs, and outputs. This foundation will guide all subsequent development decisions and help ensure your AI system effectively addresses its intended purpose.
It's important to understand that this planning process is inherently iterative. In your first attempt, you'll likely encounter many unanswered questions and gaps in your understanding. This is normal and expected. The key is not to get bogged down by these uncertainties initially. Instead, focus on completing the following steps, even if there are gaps in your thought process. Your initial roadmap will be imperfect, but it will provide a good starting point. The idea is to iterate over the planning phase multiple times; with each pass, your understanding of the problem will deepen and your design will improve. You'll start with a broad structure, intentionally ignoring gaps and unanswered questions at first. The important part is to try to create a basic plan despite your uncertainties.
When you feel stuck, focus on what you already know and can act upon. Keep moving forward with the aspects of your plan that are clear and defined. This approach maintains momentum and often leads to insights about the less certain parts of your project. As you work, keep a separate list of uncertainties, unknowns, and areas that need further investigation. This allows you to track uncertainties without letting them halt your progress. Regularly review and update this list as you gain new information.
Also, consider temporarily reducing the scope of your project or outcome. Focus on creating a simplified version that captures the core essence of what you're trying to achieve.
This "minimum viable product" approach allows you to make tangible progress and gain valuable insights. As you complete this scaled-down version, you'll develop a better understanding of the project's complexities and challenges. From there, you can gradually expand the scope, adding more components or advanced features in a controlled, iterative manner. Each iteration allows you to fill in more details, address previously identified gaps, and incorporate new insights.
Step 1: Define the Goal, Input, and Output
Start by clearly articulating the main objective of your AI system, including the broad input it will process and the output it will generate. This step sets the foundation for your entire project.
Defining the Goal: What the AI system will accomplish. When crafting your goal, ensure it's specific, actionable, and aligned with the overall problem you're trying to solve. A well-structured goal outlines the system's purpose or action and states its intended outcome. Consider the problem you're solving, who you're solving it for, and any key constraints. Here is a template for writing the goal of your AI system:
Create an AI system that [performs specific function] on [inputs] and produces [output].
Defining the Input: What information the AI system will need to achieve that goal. This involves determining what data is relevant and available. Input data may be structured (e.g., spreadsheets, databases, CSV files) or unstructured (e.g., PDFs, Word documents, emails, websites, customer chats). Often, you will need to extract features from the raw data, and this can be one of the steps in your AI system plan. At this stage, you don't have to work out the feature engineering details; the scope is limited to understanding what data is needed and where it will come from, and ensuring it's accessible. Your identified inputs should cover everything the AI system needs to perform its intended function effectively.
Defining the Output: What the AI system will produce, in terms of both content and format. This includes determining the output format, deciding on the required level of detail, and considering how end-users will use the information. Plan for result interpretation and explanation, ensuring the output is actionable and understandable. The output definition should align closely with the system's goals and user needs, providing clear, relevant, and valuable information that directly addresses the problem the AI system is designed to solve.
Stakeholder Involvement: Throughout this definition process, it's crucial to involve relevant stakeholders. This may include end-users, domain experts, managers, and other key decision-makers. Their input helps ensure that the system's goals, inputs, and outputs align with real-world needs and constraints. Stakeholders can provide valuable insights into:
1. The specific problems the AI system should address
2. The types of data available and any limitations in data access
3. The most useful forms of output for decision-making processes
4. Potential challenges or considerations in implementing the system
By involving stakeholders early in the planning process, you are more likely to build something useful, avoid rework, and achieve a better ROI. This initial planning stage sets the foundation for your entire AI project. By clearly defining your goals, inputs, and outputs, with input from key stakeholders, you create a solid framework that will guide your development process and help ensure your AI system meets its intended objectives.
Let's walk through a practical example of how to apply the principles discussed in this document. We'll consider a scenario where you've been tasked with creating a Generative AI system that takes a Microsoft Word document as input and generates a PowerPoint presentation from it.
After consulting with your manager and relevant stakeholders, you've developed the following:
Goal: Create an AI system that creates PowerPoint slides for a given book chapter
Input: A book chapter in MS Word format
Output: PowerPoint slides for this book chapter
Step 2: Identify main steps
This process involves identifying the main steps that will guide your project from inception to completion. These steps should represent the major phases or milestones in your system's operation, providing a framework for more detailed planning later. There are a few effective approaches to identifying these steps, each offering unique advantages.
Back-to-start: One method is to begin with the end goal in mind and work backwards. Visualize what your completed AI system should accomplish, then reverse-engineer the process to identify the major components or processes needed to reach that objective. As you do this, consider the logical sequence of these steps. Ask yourself: What needs to happen first? Which steps depend on the completion of others?
Component Listing and Ordering: A second strategy involves listing all the components you think may be needed, ordering them logically, and then refining the list by keeping or dropping components as needed. This more flexible, brainstorming-oriented approach can bring out creative solutions and help identify parallel processes or steps that don't fit into a strict linear sequence.
Gap analysis: Another valuable approach is to analyze the gap between the steps identified so far, or between the latest identified step and the input. This can help uncover missing intermediate steps, ensure a logical flow from input to output, and reveal potential challenges or complexities that weren't immediately apparent.
In practice, a combination of these approaches often yields the most robust and comprehensive planning process.
By viewing the problem from different angles, you can potentially develop more innovative and effective solutions.

Regardless of the method used, it's crucial to note that at this stage, you should not worry about how to implement these components or processes. The focus is solely on identifying what steps need to happen, not on determining how to make those steps happen. Implementation details will come later in the planning process. For now, concentrate on creating a high-level overview of the necessary stages in your AI system's development.

Continuing with the example, let's walk through the thought process of deriving our main steps:

Starting Point: Generate a PowerPoint Presentation
We begin by considering our end goal: a PowerPoint presentation generated from a Word document.
Question: What's the last step needed to achieve this?
Answer: We need a step that takes finalized slide content and creates a PPT file.
Step Identified: generatePPT

Generating Slide Content
Before we can generate the PPT, we need the content for each slide.
Initial Thought: Can we simply feed the entire document to an LLM and ask it to generate slides?
Problem Identified: This approach could be too complex for the LLM due to the "lost in the middle" problem with long contexts, where models tend to use information in the middle of a long input less reliably than information near its beginning or end.
Solution: Break down the task into smaller, more manageable steps.

Breaking Down the Content Generation Task
How can we make the task easier for the LLM?
Idea: Instead of processing the whole document at once, we can work with smaller segments.
Question: How do we determine these segments?
Answer: We can use the document's structure, specifically its main topics.

Identifying Document Structure
Before we can segment the content, we need to understand its structure.
Step Identified: extractTopics
This step will analyze the document to identify main topics and their hierarchy.

Segmenting Content
Once we have the topics, we can divide the document content accordingly.
Step Identified: extractContentSegment
This step will associate relevant content with each identified topic.

Generating Slide Content for Each Topic
Now we can generate slide content more effectively, focusing on one topic at a time.
Step Identified: generateSlideContent
This approach allows the LLM to work with smaller, more focused chunks of information, potentially improving accuracy and relevance.

Initial Data Extraction
We realize that we need the document content before any processing can occur.
Question: What's the very first step in our process?
Answer: We need to extract the content from the Word document.
Step Identified: extractContentFromWordDoc

By walking through this thought process, we've arrived at a logical sequence of steps:
1. extractContentFromWordDoc
2. extractTopics
3. extractContentSegment
4. generateSlideContent
5. generatePPT

Step 3: Identify Inputs, Outputs, and Repetition for Each Step
After identifying the main steps in your AI system, the next crucial task is to determine the inputs required, outputs produced, and repetition structure for each step. This process helps you understand the flow of data through your system and identify dependencies between steps, creating a clear data pipeline. It also helps you discover any missing processes.

For each step, specify:
- Inputs: What data or information does this step need to perform its function?
- Outputs: What results or transformed data does this step produce?
- Repetition: Does this step need to be repeated for multiple items or documents?

When documenting this information, use a structured format. For example:

    step1Output = step1Name(input1, input2)
    step2Output = step2Name(step1Output, input3)

Note that inputs for later steps often include outputs from previous steps, illustrating the data flow through your system.

Handling Repetition: Be mindful of potential limitations in processing large amounts of data, especially when working with language models.
You may need to break down large content into portions and then process each portion through the same set of subsequent steps. To account for this, indicate which steps need to be repeated and for what entities. Use indentation (shifting text to the right) to show steps that need to be repeated, and don't indent steps that do not need to be repeated.

Let's apply this step to our ongoing example of creating an AI system that generates a PowerPoint presentation from a Word document. We'll break down each step, identifying its inputs, outputs, any repetition, and, importantly, the intuition behind each step:

    content = extractContentFromWordDoc(wordDocFilePath)
    topics = extractTopics(content)
    For each topic in topics:
        contentSegment = extractContentSegment(topic, content)
        slides = generateSlideContent(topic, contentSegment)
    ppt = generatePPT(slides)

1. `extractContentFromWordDoc`:
   - Input: `wordDocFilePath` (the file path of the Word document)
   - Output: `content` (the extracted text content from the Word document)
   - Repetition: Performed once for the entire document
   - Intuition: We start by extracting the raw text from the Word document. This step is crucial because it converts the potentially complex Word format into plain text that our AI system can more easily process. It's the foundation for all subsequent steps.
2. `extractTopics`:
   - Input: `content` (the extracted text from the previous step)
   - Output: `topics` (a list or structure of main topics identified in the content)
   - Repetition: Performed once for the entire content
   - Intuition: By identifying the main topics, we create a high-level structure for our presentation. This step mimics how a human might skim a document to understand its main points before creating slides. It helps ensure our final presentation will be well-organized and cover all key areas.
3. `For each topic in topics`:
   - Intuition: This loop allows us to process each topic individually, which is crucial for managing complexity.
Instead of trying to create an entire presentation at once (which could overwhelm our AI), we break it down into more manageable topic-sized chunks. This approach aligns with how humans typically create presentations, focusing on one section at a time.
4. `extractContentSegment`:
   - Inputs: `topic` (a single topic from the list of topics), `content` (the full text content)
   - Output: `contentSegment` (the portion of content relevant to the current topic)
   - Repetition: Repeated for each topic
   - Intuition: This step is about focusing on relevant information. For each topic, we extract only the content that's pertinent. This helps manage the amount of text our AI needs to process at once, reducing the risk of information overload and improving the relevance of generated slides.
5. `generateSlideContent`:
   - Inputs: `topic` (the current topic), `contentSegment` (the relevant content for this topic)
   - Output: `slides` (the content for slides related to this topic)
   - Repetition: Repeated for each topic
   - Intuition: Here's where the AI creates the actual slide content, i.e., the slide titles and their bullet points. By working with one topic and its relevant content at a time, we allow the AI to focus deeply on each section of the presentation. This approach helps ensure that each set of slides is coherent and properly represents its topic.
6. `generatePPT`:
   - Input: `slides` (all the slide content generated from the previous steps)
   - Output: `ppt` (the final PowerPoint presentation)
   - Repetition: Performed once, after all slides have been generated
   - Intuition: This final step compiles all our generated content into a cohesive PowerPoint presentation.

This structure effectively breaks down the complex task of creating a presentation into more manageable steps.
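The Step 3 pseudocode maps almost line for line onto Python. The skeleton below is a sketch only: it uses snake_case versions of the step names, and every function body is a hypothetical placeholder (plain string handling standing in for Word parsing, the LLM calls, and PPT generation). Treating lines that end with a colon as topic headings is an assumption made purely so the skeleton runs. What it does illustrate is the repetition structure: steps outside the loop run once, and steps inside the loop run once per topic.

```python
from typing import List


def extract_content_from_word_doc(word_doc_text: str) -> str:
    # Placeholder: a real step would read a .docx file; here we accept its text directly.
    return word_doc_text.strip()


def extract_topics(content: str) -> List[str]:
    # Placeholder for an LLM call: treat lines ending in ':' as topic headings.
    return [line[:-1] for line in content.splitlines() if line.endswith(":")]


def extract_content_segment(topic: str, content: str) -> str:
    # Placeholder: collect the lines after the topic heading until the next heading.
    lines = content.splitlines()
    start = lines.index(topic + ":") + 1
    segment = []
    for line in lines[start:]:
        if line.endswith(":"):
            break
        segment.append(line)
    return "\n".join(segment)


def generate_slide_content(topic: str, content_segment: str) -> List[str]:
    # Placeholder for an LLM call: one slide per topic, one bullet per non-empty line.
    bullets = [line.strip() for line in content_segment.splitlines() if line.strip()]
    return [f"**{topic}**\n" + "\n".join(bullets)]


def generate_ppt(slides: List[str]) -> str:
    # Placeholder: a real step would write a .pptx file; here we just join slide texts.
    return "\n\n".join(slides)


def run_pipeline(word_doc_text: str) -> str:
    content = extract_content_from_word_doc(word_doc_text)  # runs once
    topics = extract_topics(content)                        # runs once
    slides: List[str] = []
    for topic in topics:                                    # repeated for each topic
        content_segment = extract_content_segment(topic, content)
        slides += generate_slide_content(topic, content_segment)
    return generate_ppt(slides)                             # runs once, after the loop
```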
The process mimics how a human might approach the task: first understanding the overall content, then identifying main topics, focusing on one topic at a time to create relevant slides, and finally compiling everything into a complete presentation. By using a loop to process topics individually, we address the potential limitation of handling large amounts of data. This approach helps manage the workload for our AI system, potentially improving the accuracy and relevance of the generated slides.

Step 4: Assign Tool Type for Each Step
After identifying the main steps, their inputs, outputs, and repetition structure, the next crucial task is to assign appropriate tool types to each step. This process helps bridge the gap between high-level planning and implementation, allowing you to think about which steps can be accomplished through coding versus those that require AI models or other specialized tools.

For our ongoing example of creating an AI system that generates a PowerPoint presentation from a Word document, let's assign tool types to each step:

    content = py_extractContentFromWordDoc(wordDocFilePath)
    topics = llm_extractTopics(content)
    For each topic in topics:
        contentSegment = py_extractContentSegment(topic, content)
        slides = llm_generateSlideContent(topic, contentSegment)
    ppt = py_generatePPT(slides)

Let's break down each step with its assigned tool type and the rationale behind the choice:
1. `py_extractContentFromWordDoc`:
   - Tool Type: Python function (py_)
   - Rationale: Extracting text from a Word document is a well-defined task that can be efficiently handled by existing Python libraries like python-docx. This doesn't require the complexity of an AI model and is better suited for a straightforward Python script.
2. `llm_extractTopics`:
   - Tool Type: Language Model function (llm_)
   - Rationale: Identifying main topics from a body of text requires understanding context and content, which is well-suited for a language model.
This task benefits from the natural language processing capabilities of an LLM.
3. `py_extractContentSegment`:
   - Tool Type: Python function (py_)
   - Rationale: Once we have the topics and the full content, extracting relevant segments might seem like a straightforward text processing task. However, a significant challenge arises: topics can appear multiple times throughout the content, and a simple text-matching script wouldn't be able to accurately determine where each topic segment begins and ends.

   To address this, we can enhance our approach by requesting additional information from the LLM in the previous step. Specifically, we can ask the LLM to provide not just the topics, but also markers (such as the starting lines) for each topic. This additional context allows us to precisely identify where each topic segment begins, greatly simplifying the extraction process.

   To implement this improved approach, we need to modify our workflow slightly. Here's how the revised flow would look:

    content = py_extractContentFromWordDoc(wordDocFilePath)
    topicsAndMarkers = llm_extractTopicsAndMarkers(content)
    For each {topic, marker} in topicsAndMarkers:
        markerEnd = get marker for the next topic in topicsAndMarkers
        contentSegment = py_extractContentSegment(marker, markerEnd, content)
        slides = llm_generateSlideContent(topic, contentSegment)
    ppt = py_generatePPT(slides)

   - {topic, marker} in topicsAndMarkers: topicsAndMarkers can be a list or array of tuples, where each tuple contains two elements: a topic and its corresponding marker (starting line). The curly braces {} in the for-loop syntax indicate that we unpack each tuple into its topic and its marker.
   - markerEnd is the marker of the next topic in the topicsAndMarkers list. For every topic except the last one, markerEnd is the next topic's marker. For the last topic there is no next marker, so markerEnd can be None, meaning the segment runs to the end of the document.
4. `llm_generateSlideContent`:
   - Tool Type: Language Model function (llm_)
   - Rationale: Creating concise, relevant slide content from a segment of text requires understanding and summarizing information, which is a strength of language models. This step benefits from the natural language generation capabilities of an LLM.
5. `py_generatePPT`:
   - Tool Type: Python function (py_)
   - Rationale: Creating a PowerPoint file from structured slide content is a well-defined task that can be handled efficiently by Python libraries like python-pptx. This is more about file manipulation than natural language processing, making it suitable for a Python script.

By assigning these tool types, we've created a more detailed roadmap for implementation. This approach allows us to leverage the strengths of different tools: using Python for well-defined, programmatic tasks and language models for tasks requiring natural language understanding and generation.

This step in the planning process helps identify which parts of the system will require AI model integration and which parts can be handled by more traditional programming approaches. It provides a clearer picture of the technical requirements for each step and helps in resource allocation and task delegation during the implementation phase.

Remember, as emphasized in the document, this planning process is iterative. As you delve deeper into implementation, you may find that some tool type assignments need to be adjusted. The key is to maintain flexibility while progressively refining your plan based on new insights and challenges encountered during development.

Step 5: Iterative Refinement
Repeat steps 2-4 iteratively, refining your plan each time. Planning an AI system is rarely a linear process. This step encourages you to review and refine your plan multiple times, each iteration bringing more clarity and detail to your system design.

During each iteration:
- Review the logical flow of your system.
Does each step naturally lead to the next?
- Check for consistency between steps, especially in terms of inputs and outputs.
- Ensure that all aspects of your initial goal are addressed.
- Look for opportunities to simplify your design, or to avoid using an AI model altogether.
- Refine your choice of tools and function names based on a deeper understanding of the requirements.

Don't be afraid to make significant changes if you identify better approaches. The goal is to have a comprehensive, well-thought-out plan before you start implementation.

Step 6: Create Prompt Templates
Once your design is finalized, write down prompt templates for the LLM-type steps. While writing prompt templates, you will likely use many, if not all, of the inputs you identified for the step. You should also pay close attention to the step's objective and desired output when formulating your prompt. These elements should guide the structure and content of your prompt template.

A critical consideration in prompt engineering is managing complexity. While modern LLMs have impressive reasoning capabilities, their performance can degrade when faced with overly complex prompts requiring multiple operations. Hence, it's essential to take an iterative approach to prompt design. Test your prompts in an LLM interface to gauge their effectiveness. This hands-on testing allows you to quickly identify whether you need to break a step further into smaller, simpler sub-steps.

Prompt for extractTopicsAndMarkers
Because a topic can repeat in the document, the topic name alone may not be a good marker for dividing the document into portions. Hence, we use the opening words of each key topic's section as its marker (the prompt below asks for the first ten words). The opening words of the next key topic signify the end of the section for a given key topic.

Analyze the given document and extract key topics, following these guidelines:
1. Key Topic Identification:
   - Topics should represent major sections or themes in the document.
   - Each key topic should be substantial enough for at least one slide with 3-5 bullet points, potentially spanning multiple slides.
   - Topics should be broad enough to encompass multiple related points but specific enough to avoid overlap.
   - Identify topics in the order they appear in the document.
   - Consider a new topic when there's a clear shift in the main subject, signaled by transitional phrases, new headings, or a distinct change in content focus.
   - If a topic recurs, don't create a new entry unless it's substantially expanded upon.
2. Key Topic Documentation:
   - For each key topic, create a detailed name that sums up the idea of the section or theme it represents.
   - Next, provide the first ten words of the section that the key topic represents.
3. Provide the output in the following format:
**key topic 1**
first ten words of the section or theme that the key topic 1 represents
**key topic 2**
first ten words of the section or theme that the key topic 2 represents

Document to analyze:
'''
{{content}}
'''

Prompt for generateSlideContent
You will be given a key topic and a document portion that provides details about the key topic. Your task is to create slides based on the document portion. Follow these steps:
1. Identify the relevant section of the document between the given starting lines.
2. Analyze this section and create slides with titles and bullet points.

Guidelines:
- The number of slides can be as few as one and as many as 10, depending on the amount of non-repetitive information in the relevant section of the key topic.
- Present slides in the order that the information appears in the document.
- Each slide should have 4-6 concise bullet points, each containing a single key idea or fact.
- Use concise phrases or short sentences for bullet points, focusing on conveying key information clearly and succinctly.
- If information seems relevant to multiple topics, include it in the current topic's slides, as it appears first in the document.
- Avoid redundancy across slides within the same key topic.

Output Format:
**paste slide title here**
paste point 1 here
paste point 2 here
paste point 3 here

Inputs:
Key Topic: '''{{topic}}'''
Document portion: '''
{{contentSegment}}
'''

Please create slides based on the document portion, following the guidelines provided. Ensure that the slides comprehensively cover the key topic without unnecessary repetition.

We will need to process the response generated by llm_extractTopicsAndMarkers and convert it into JSON format for subsequent steps. Let us also save the LLM outputs locally, which will help us evaluate them.

    content = py_extractContentFromWordDoc(wordDocFilePath)
    extractTopicsMarkersPrompt = py_generatePrompt(extractTopicsMarkersPromptTemplate, vars={content})
    topicsAndMarkers = llm_extractTopicsAndMarkers(extractTopicsMarkersPrompt)
    py_saveFile(topicsAndMarkersFilePath, topicsAndMarkers)
    topicsMarkersJson = py_convertTextToJson(topicsAndMarkers)
    topicWithContent = ""
    slides = ""
    For each i, {topic, marker} in topicsMarkersJson:
        startPosition = py_getMarkerPosition(marker, content)
        If i is not the last index:
            {markerEnd} = topicsMarkersJson[i+1]
            endPosition = py_getMarkerPosition(markerEnd, content)
        Else:
            endPosition = length of content
        contentSegment = py_extractContentSegment(startPosition, endPosition, content)
        topicWithContent = topicWithContent + "\n\n**" + topic + "**\n" + contentSegment
        generateSlideContentPrompt = py_generatePrompt(generateSlideContentPromptTemplate, vars={topic, contentSegment})
        slides = slides + llm_generateSlideContent(generateSlideContentPrompt)
    py_saveFile(topicWithContentFilePath, topicWithContent)
    py_saveFile(slideContentFilePath, slides)
    ppt = py_generatePPT(slides)

Step 7: Detailed Implementation Requirements (optional)
For each step, describe the key operations that need to be performed. This step helps you dive deeper into each broad step, bridging the gap between high-level steps and specific implementation.
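As an example of spelling out operations, here is one possible sketch of three Python steps from the Step 6 pseudocode: py_convertTextToJson, py_getMarkerPosition, and py_extractContentSegment. The snake_case names, the whitespace-tolerant regex match, returning -1 for a marker that cannot be found, and skipping such topics are all assumptions of this sketch. LLMs do not always copy the first ten words exactly, so a real implementation may need fuzzier matching.

```python
import json
import re
from typing import Dict, List, Optional


def convert_text_to_json(llm_output: str) -> str:
    """Parse '**topic**' lines, each followed by its marker line, into a JSON list."""
    items: List[Dict[str, str]] = []
    lines = [line.strip() for line in llm_output.splitlines() if line.strip()]
    for i, line in enumerate(lines):
        if line.startswith("**") and line.endswith("**") and i + 1 < len(lines):
            items.append({"topic": line.strip("*").strip(), "marker": lines[i + 1]})
    return json.dumps(items)


def get_marker_position(marker: str, content: str, start: int = 0) -> int:
    """Index where the marker text begins at or after `start`, or -1 if absent.
    Runs of whitespace in the marker match any run of whitespace in the content."""
    words = marker.split()
    if not words:
        return -1
    pattern = r"\s+".join(re.escape(word) for word in words)
    match = re.search(pattern, content[start:])
    return start + match.start() if match else -1


def extract_content_segment(start: int, end: Optional[int], content: str) -> str:
    """Slice one section; end=None means the segment runs to the end of the document."""
    return content[start:end].strip()


def segment_by_markers(llm_output: str, content: str) -> List[Dict[str, str]]:
    topics = json.loads(convert_text_to_json(llm_output))
    positions: List[int] = []
    search_from = 0
    for item in topics:
        pos = get_marker_position(item["marker"], content, search_from)
        positions.append(pos)
        if pos >= 0:
            search_from = pos + 1  # topics appear in document order
    segments = []
    for i, item in enumerate(topics):
        start = positions[i]
        if start < 0:
            continue  # marker not found: skip rather than guess boundaries
        # markerEnd: next topic's position, or None for the last topic
        end = next((p for p in positions[i + 1:] if p >= 0), None)
        segments.append({"topic": item["topic"],
                         "segment": extract_content_segment(start, end, content)})
    return segments
```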
By outlining the operations within each step, you could potentially use LLMs to write the complete code. Depending on how well you describe the operations, your code accuracy could be more than ninety percent. Having said that, debugging and integrating code into a broader multi-tier architecture requires coding experience.

For the implementation of PowerPoint creation:
1. Download the zipped file and extract it.
2. Make sure you have VSCode installed, with Python set up in it. (Refer to this tutorial if needed: https://chatgpt.com/share/dfa8bbc9-dc52-4a2b-bd31-90154fad2b3d)
3. Create a .env file in the folder containing:
   ANTHROPIC_API_KEY=""
4. Add your API key: https://console.anthropic.com/dashboard
5. Place a doc.docx file in the folder.

Exercise: AI System for Analyzing Reviews
Let us understand the implementation with the following example:

Step 1: Define the Goal, Input, and Output
Goal: Create an AI system that analyzes customer reviews for products to extract sentiment and key features, providing insights for product improvement and customer satisfaction.
Input: A list of products and their associated customer reviews.
Output: A summary report for each product containing sentiment analysis, key features mentioned, and overall insights.
Step 2: Identify Broad Step Names
collectReviews
extractInfo
analyzeUsingML
generateReport

Step 3: Identify Inputs, Outputs, and Repetition for Each Step

    For each product in products:
        reviews = collectReviews(product)
        For each review in reviews:
            featuresAndSentiments = extractInfo(review)
        mlModelResults = analyzeUsingML(featuresAndSentiments for all reviews)
    finalReport = generateReport(mlModelResults for all products)

Step 4: Assign Tool Type for Each Step

    For each product in products:
        reviews = api_collectReviews(product)
        For each review in reviews:
            featuresAndSentiments = llm_extractInfo(review)
    mlModelResults = py_analyzeUsingML(featuresAndSentiments for all products and reviews)
    finalReport = llm_generateReport(mlModelResults)

Check Model's Understanding of AI Implementation Plan
The following prompt will help you evaluate the model's understanding of your AI system implementation plan. Insert the outputs from Step 1 and Step 4 in the designated areas within the prompt.

Analyze the provided goal, inputs, output, and pseudo-code for an AI system. Generate explanations by following the steps given below:
1) function list: Get all the functions mentioned in the pseudo-code. Function names in the pseudo-code have prefixes such as py, llm, ml, and api. Following is the definition of prefixes:
py<function>: suggests that this function is a python method
llm<function>: suggests that this function makes an API call to an LLM
ml<function>: suggests that this function calls a machine learning model
api<function>: suggests that this function calls an external API
2) pseudo-code explanation: Explain the pseudo-code and its flow.
3) function explanations: Generate an explanation for each function in the pseudo-code covering these details:
a. Expected input parameters
b. Expected output
c. List of operations that need to be done in the function.
Output Format: Provide the explanations in the following structure:
**pseudo-code explanation
<pseudo-code explanation>
**<function 1>
<function 1 explanation>
**<function 2>
<function 2 explanation>

Goal: <paste goal here>
Inputs: <paste inputs here>
Output: <paste output here>
pseudo code:
"""
<paste pseudo code>
"""
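To connect the exercise's Step 4 pseudo-code to code, here is a minimal runnable sketch. Everything inside the functions is a stand-in: api_collect_reviews reads from a hypothetical in-memory dict instead of a real reviews API, llm_extract_info uses a tiny hypothetical keyword lexicon instead of an LLM call, py_analyze_using_ml is plain counting rather than a trained model, and llm_generate_report formats text instead of calling an LLM. Only the nested loops and the data flow mirror the pseudo-code, including running the analysis once over all products and reviews.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Hypothetical sample data standing in for a real reviews API.
REVIEWS_DB: Dict[str, List[str]] = {
    "Blender X": ["Great motor, noisy lid", "Great motor and easy cleanup"],
    "Kettle Y": ["Slow to boil", "Sleek design but slow"],
}
# Hypothetical lexicons standing in for the LLM's judgement.
POSITIVE = {"great", "easy", "sleek"}
NEGATIVE = {"noisy", "slow"}
FEATURES = {"motor", "lid", "cleanup", "boil", "design"}


def api_collect_reviews(product: str) -> List[str]:
    return REVIEWS_DB.get(product, [])


def llm_extract_info(review: str) -> dict:
    # Stand-in for an LLM call that returns features and sentiment for one review.
    words = {w.strip(",.").lower() for w in review.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {"features": sorted(words & FEATURES), "sentiment": sentiment}


def py_analyze_using_ml(results: Dict[str, List[dict]]) -> Dict[str, dict]:
    # Stand-in for an ML model: aggregate sentiment counts and feature mentions.
    analysis = {}
    for product, infos in results.items():
        analysis[product] = {
            "sentiment_counts": Counter(info["sentiment"] for info in infos),
            "top_features": Counter(f for info in infos for f in info["features"]).most_common(2),
        }
    return analysis


def llm_generate_report(ml_results: Dict[str, dict]) -> str:
    # Stand-in for an LLM call that writes the per-product summary.
    lines = []
    for product, stats in ml_results.items():
        counts = dict(stats["sentiment_counts"])
        features = ", ".join(f for f, _ in stats["top_features"])
        lines.append(f"{product}: sentiment {counts}; most mentioned: {features}")
    return "\n".join(lines)


def run(products: List[str]) -> str:
    features_and_sentiments: Dict[str, List[dict]] = defaultdict(list)
    for product in products:                     # outer loop: once per product
        for review in api_collect_reviews(product):  # inner loop: once per review
            features_and_sentiments[product].append(llm_extract_info(review))
    ml_model_results = py_analyze_using_ml(features_and_sentiments)  # once, all products
    return llm_generate_report(ml_model_results)
```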
16 min read
authors:
Rohit Aggarwal
Harpreet Singh

TL;DR: When working with complex, multi-step tasks in language models, Prompt Chaining offers precise control by breaking the process into discrete steps, while Stepwise Prompting combines all steps into a single prompt for efficiency. Start with stepwise prompting for simpler tasks, and switch to prompt chaining if quality or consistency declines.

Prompting for multi-step processes
Prompting often involves instructing models to produce responses for complex, multi-step processes. Try writing a prompt for the following scenario, which involves processing a document through three main steps:
1. Summarizing each paragraph of the document in one line.
2. Extracting key points from the entire document.
3. Organizing the generated summary lines under their respective key points in the order they appear in the original document.

This task requires a combination of summarization, key point extraction, and organizational skills. It's a complex process that involves understanding the document's structure, content, and main themes. The two popular ways to handle such tasks are Prompt Chaining and Stepwise Prompting.

Prompt chaining
Prompt chaining is an approach for refining outputs from large language models (LLMs) that involves using a series of discrete prompts to guide the model through different phases of a task. This method allows for a more structured and controlled approach to refinement, with each step having a specific focus. By breaking down complex tasks into smaller, more manageable prompts, prompt chaining provides greater flexibility and precision in directing the LLM's output. For example, in a text summarization task, one prompt might focus on extracting key points, another on organizing them coherently, and a final prompt on polishing the language.

Stepwise prompting
Stepwise prompting, on the other hand, integrates all phases of the task within a single prompt.
This approach attempts to guide the LLM through the entire process in one go, challenging the model to generate a longer and more complex output based on a single set of instructions. While simpler to implement, stepwise prompting may be less effective for complex tasks that benefit from a more granular approach. For instance, when summarizing a lengthy academic paper, a single stepwise prompt might struggle to capture all the nuances of content selection, organization, and style that separate prompts in a chain could address individually.

Further, you can improve clarity by assigning step names followed by step explanations. For example:
1) Analysis: Analyze the given text for key themes.
2) Summary: Summarize the themes identified in `Analysis`.
3) Conclusion: Draw conclusions based on the `Summary`.
Using backticks to reference previous outputs leaves no ambiguity about which output is meant.

When to Use Step Names in Stepwise Prompts?
Use step names when the output of a step will be referenced later in the prompt or in subsequent prompts. Step names help with organization and add clarity, especially for tasks with multiple interdependent steps. Additionally, they can help the model more effectively associate output variables with the specific steps in which they are extracted.

When Might Step Names Be Optional?
1. Simple, Linear Tasks: For straightforward tasks with clear progression, step names might be unnecessary.
2. Short Prompts: In brief prompts with only 2-3 steps, numbering alone might suffice.

Best Practices
- Be consistent: If you use step names, use them for all steps in the prompt.
- Keep names short and descriptive: Use clear, concise labels that indicate the purpose of each step.

Illustration of Stepwise Prompting and Prompt Chaining: Generate Document Outline
Let's create prompts for the scenario explained earlier using both techniques:

Prompt Chaining: For prompt chaining, we'll break down the task into three separate prompts, each focusing on a specific subtask.
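Mechanically, a chain is a sequence of LLM calls in which later prompts are filled with earlier outputs. The sketch below wires abbreviated versions of the three prompts that follow into a chain. The `llm` parameter is a hypothetical callable (prompt in, text out) standing in for whatever client you use; passing it in keeps the chain testable offline. Calls 1 and 2 do not depend on each other, so they could also run in parallel; only call 3 needs both results, and each intermediate output can be inspected or saved separately.

```python
from typing import Callable

SUMMARIZE = ("For each paragraph in the given document, create a one-line summary "
             "in a numbered list.\n\nDocument:\n{document}")
KEY_POINTS = ("Based on the entire document, identify the main key points as a "
              "bulleted list.\n\nDocument:\n{document}")
ORGANIZE = ("Organize the summary lines from `Summary list` under their most relevant "
            "key points from `Key point list`, keeping the original order.\n\n"
            "Document:\n{document}\n\nSummary list:\n{summary_list}\n\n"
            "Key point list:\n{key_points}")


def chain(document: str, llm: Callable[[str], str]) -> str:
    summary_list = llm(SUMMARIZE.format(document=document))  # call 1: paragraph summaries
    key_points = llm(KEY_POINTS.format(document=document))   # call 2: key points
    # Call 3 receives the outputs of calls 1 and 2 as inputs.
    return llm(ORGANIZE.format(document=document,
                               summary_list=summary_list,
                               key_points=key_points))
```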
Prompt 1 (Paragraph Summarization):
You are tasked with summarizing a document. For each paragraph in the given document, create a one-line summary that captures its main idea. Please provide these summary lines in a numbered list, with each number corresponding to the paragraph number in the original document.

Prompt 2 (Key Point Extraction):
Based on the entire document, identify and list the main key points. These should be the overarching themes or crucial ideas that span multiple paragraphs. Present these key points in a bulleted list.

Prompt 3 (Organization):
You will be provided with two lists: one containing one-line summaries of each paragraph, and another containing key points extracted from the document. Go through the `Document`. Your task is to organize the summary lines from `Summary list` under their most relevant key points from `Key point list`. Maintain the original order of the summary lines within each key point. Present the result as a structured list with key points as main headings and relevant summary lines as sub-points. Output can be in the following format:
**<key point 1>
<summary line 1>
<summary line 2>
**<key point 2>
<summary line 3>
<summary line 4>
<summary line 5>
Ensure that all summary lines are included and that they maintain their original numbering order within each key point category.
Document: It is the document that needs to be converted into a topic-wise `Summary list`
Summary list: It is a list of one-line summaries of each paragraph in the `Document`
Key point list: It is a list of key points covered in the `Document`

Stepwise prompting: The prompt below demonstrates how we can replicate, within a single prompt, the three distinct steps previously used in prompt chaining.

You are tasked with analyzing, summarizing, and organizing the content of a given document. Please follow these steps in order:
1. Read document: Carefully read through the entire document.
2. Summary list: Create a one-line summary for each paragraph, capturing its main idea. Number these summaries according to the paragraph they represent.
3. Key point list: Identify the main key points of the entire document. These should be overarching themes or crucial ideas that span multiple paragraphs.
4. Key point wise Summary list: Organize the `Summary list` under their most relevant key points from `Key point list`. Maintain the original order of the summary lines within each key point. Present the result as a structured list with key points as main headings and relevant summary lines as sub-points. The output should be in the following format:
**<key point 1>
<summary line 1>
<summary line 2>
**<key point 2>
<summary line 3>
<summary line 4>
<summary line 5>
Ensure that all summary lines are included and that they maintain their original numbering order within each key point category.

Missing paragraphs or lines
Notice the last line: "Ensure that all summary lines are included and that they maintain their original numbering order within each key point category." Despite our repeated efforts to ensure the AI model analyzes every paragraph and line, it continues to overlook some. Models may treat multiple paragraphs as one paragraph, or may skip paragraphs completely. This problem is much more common when processing long documents. Similarly, if you have to process each line in a paragraph, the model may combine lines or skip them.

The solution is to break the document into smaller chunks and process each chunk separately. Further, I've seen better results when I provide a numbered list of paragraphs and ask the model to generate its output as a numbered list corresponding to each paragraph. This makes it easier for the AI model to keep track of paragraphs. The same goes for line processing within paragraphs.
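The two fixes described above, numbering paragraphs and processing the document in chunks, can be sketched as follows. Splitting paragraphs on blank lines, the default chunk size of five paragraphs, and the `[n]` label format are assumptions chosen for illustration. The final helper checks the model's numbered output and reports any paragraph numbers it skipped, so those paragraphs can be re-sent.

```python
import re
from typing import List, Set


def split_paragraphs(document: str) -> List[str]:
    # Assumption: paragraphs are separated by one or more blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def numbered_chunks(paragraphs: List[str], chunk_size: int = 5) -> List[str]:
    # Label each paragraph with its global number so numbering continues across chunks.
    chunks = []
    for start in range(0, len(paragraphs), chunk_size):
        block = paragraphs[start:start + chunk_size]
        chunks.append("\n\n".join(f"[{start + i + 1}] {p}" for i, p in enumerate(block)))
    return chunks


def missing_numbers(expected: range, model_output: str) -> List[int]:
    # Collect numbers written as "3." or "[3]" at the start of output lines,
    # then report the expected paragraph numbers the model did not cover.
    seen: Set[int] = {int(n) for n in
                      re.findall(r"^\s*\[?(\d+)[\].]", model_output, re.MULTILINE)}
    return [n for n in expected if n not in seen]
```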
Stepwise prompting vs Prompt Chaining
Here's a comparison of prompt chaining and stepwise prompting in table format:

| Aspect | Prompt Chaining | Stepwise Prompting |
|---|---|---|
| Execution | Runs the LLM multiple times, with each step focusing on a specific subtask | Completes all phases within a single generation, requiring only one run of the LLM |
| Complexity and Control | Allows for more precise control over each phase of the task, but requires more comprehensive prompts from humans | Uses a simpler prompt containing sequential steps, but challenges the LLM to generate a longer and more complex output |
| Effectiveness | Generally yields better results, especially in text summarization tasks | Might produce a simulated refinement process rather than a genuine one, potentially limiting its effectiveness |
| Task Breakdown | Excels at breaking down complex tasks into smaller, more manageable prompts | Attempts to handle the entire task in a single, more complex prompt |
| Iterative Improvement | Allows for easier iteration and improvement of individual steps in the process | Less flexible for targeted improvements without modifying the entire prompt |
| Resource Usage | May require more computational resources due to multiple LLM runs | More efficient in terms of API calls or processing time |
| Learning Curve | Higher initial complexity for prompt designers, but potentially more intuitive for complex tasks | Simpler to implement initially, but may be challenging to optimize for complex tasks |

Recommendation for choosing between the two: I recommend starting with stepwise prompting, as it is a more cost-effective solution and requires less engineering effort compared to prompt chaining. However, if you notice a decline in quality or inconsistent results, switching to prompt chaining will be necessary.

Which approach do you prefer when working with LLMs? Let's discuss!
authors:
Rohit Aggarwal
Harpreet Singh

Article
Introduction
In our previous articles, we explored the limitations of in-context learning and the motivations behind fine-tuning large language models (LLMs). We highlighted the growing need for solutions that can provide task-specific optimizations without the computational overhead traditionally associated with full model fine-tuning. This led us to our exploration of Low-Rank Adaptation (LoRA) in "LoRA Demystified: Optimizing Language Models with Low-Rank Adaptation," where we delved into the intricacies of this groundbreaking technique that has revolutionized how we fine-tune LLMs.

Now, we take the next logical step in our journey: scaling LoRA for production environments. As organizations increasingly rely on specialized language models for a variety of tasks, a new challenge emerges: how can we serve multiple LoRA-adapted models simultaneously without sacrificing performance or breaking the bank? This article answers that question by introducing cutting-edge techniques for multi-tenant LoRA serving, enabling the deployment of thousands of fine-tuned LLMs on a single GPU.

We'll explore the evolution from basic LoRA implementation to advanced serving strategies, focusing on:
- The challenges of batching different task types in a multi-tenant environment
- Innovative solutions like Segmented Gather Matrix-Vector Multiplication (SGMV)
- The concept and implementation of heterogeneous continuous batching
- Practical examples using state-of-the-art tools like LoRAX

By the end of this article, you'll have a comprehensive understanding of how to leverage LoRA at scale, opening new possibilities for efficient, cost-effective deployment of multiple specialized language models in production environments.

A Brief Recap of LoRA
Before we dive into the complexities of multi-tenant serving, let's quickly recap the key idea behind LoRA:
- Keep the pretrained model's weights intact.
- Add small, trainable matrices to each layer of the Transformer architecture.
- Use rank decomposition to keep these additional matrices low-rank.

This approach offers several significant advantages:
- Drastically Reduced Parameter Count: By focusing on low-rank updates, LoRA significantly reduces the number of parameters that need to be trained. This makes fine-tuning more efficient and less resource-intensive.
- Preserved Base Model: Since the original model weights remain unchanged, you can easily switch between different LoRA adaptations or revert to the base model without any loss of information.
- Cost-Effective Customization: The reduced computational requirements make it feasible to create multiple customized LoRA models tailored to specific needs, even with limited resources.
- Competitive Performance: Despite its simplicity, LoRA often achieves performance comparable to full fine-tuning across a wide range of tasks.

LoRA's efficiency and effectiveness have made it a cornerstone of modern LLM deployment strategies. However, to truly leverage its power in production environments, we need to address the challenges of serving multiple LoRA-adapted models simultaneously. This is where batching strategies come into play.

Challenges in Batching Different Task Types
As we move towards deploying multiple LoRA-adapted models in production, we encounter a new set of challenges, particularly when it comes to batching requests efficiently. Let's explore these challenges and why traditional batching approaches fall short.

The GPU Utilization Imperative
Graphics Processing Units (GPUs) are expensive and limited resources. Efficient GPU utilization is crucial for cost-effective deployment of LLMs. As highlighted by Yu et al. in their 2022 study, batching is one of the most effective methods for consolidating workloads to enhance performance and GPU utilization.

The Naive Approach: Separate Queues
A straightforward approach to handling multiple LoRA-adapted models would be to batch workloads separately for each task type or adapter.
This method involves:
- Segregating tasks into queues based on their type or associated adapter.
- Waiting for each queue to reach a specific size (batch size) before processing.

However, this approach leads to several significant drawbacks:
- Resource Underutilization: The system might have idle resources even when there are enough tasks of different types for a batch, simply because it's waiting for individual queues to fill. This significantly reduces overall throughput.
- Unpredictable Performance: Performance becomes highly dependent on the arrival rate of each task type. Less frequent tasks can cause long delays in their respective queues, potentially holding up dependent tasks waiting for completion.
- Scalability Issues: Adding new task types or adapters requires creating new queues, increasing management complexity and potentially leading to more idle periods with less frequent queues.
- Latency Spikes: Tasks might experience high latency if they arrive when their queue is nearly empty, as they'll have to wait for the queue to fill before being processed.

Here's a simplified Python example illustrating the challenges of this naive approach:

```python
import queue
import time


class NaiveBatchingSystem:
    def __init__(self, batch_size=32):
        self.queues = {}  # one FIFO queue per task type / adapter
        self.batch_size = batch_size

    def add_task(self, task_type, task):
        if task_type not in self.queues:
            self.queues[task_type] = queue.Queue()
        self.queues[task_type].put(task)

    def process_batches(self, max_rounds=3):
        # A real server would loop indefinitely; max_rounds keeps the demo finite.
        for _ in range(max_rounds):
            for task_type, task_queue in self.queues.items():
                if task_queue.qsize() >= self.batch_size:
                    batch = [task_queue.get() for _ in range(self.batch_size)]
                    print(f"Processing batch of {task_type} tasks")
                    # Process the batch...
                else:
                    # Only full queues are processed, so this task type keeps
                    # waiting even if other queues hold tasks that could share a batch.
                    print(f"Waiting for more {task_type} tasks...")
            time.sleep(1)  # Avoid busy-waiting


# Usage
batcher = NaiveBatchingSystem()
batcher.add_task("math", "2 + 2")
batcher.add_task("translation", "Hello in French")
batcher.process_batches()
```

In this demo, both tasks wait indefinitely: each queue holds a single task, far below the batch size of 32. The same thing happens at scale, where tasks of many types could together fill a batch while every individual queue stays below the threshold.

These challenges highlight the need for a more sophisticated approach to batching, one that can efficiently consolidate multi-tenant LoRA serving workloads onto a small number of GPUs while maximizing overall utilization. To address these challenges, researchers have developed innovative techniques like Segmented Gather Matrix-Vector Multiplication (SGMV).

Segmented Gather Matrix-Vector Multiplication (SGMV)
Chen et al. introduced SGMV in 2023, as part of the Punica system, as a novel CUDA kernel designed specifically for multi-tenant LoRA serving. SGMV enables the batching of GPU operations, allowing multiple distinct LoRA models to be executed concurrently.

How SGMV Works
At its core, SGMV optimizes the matrix multiplication operations that are central to LoRA adapters. Here's a simplified explanation of how it works:
- Segmentation: Instead of launching a separate computation for each LoRA adapter, SGMV splits a single batch into segments, where each segment contains the requests that use the same adapter.
- Gather: For each segment, it gathers the relevant adapter's weights based on the incoming requests.
- Batched Multiplication: The gathered weights are then used in a batched matrix-vector multiplication operation, leveraging the GPU's parallel processing capabilities.

Benefits of SGMV
By leveraging SGMV, we can:
- Process Multiple Adapters Concurrently: Different LoRA models can be executed in parallel, improving overall system performance and resource utilization.
- Eliminate Queue-Based Bottlenecks: SGMV allows for grouping requests for different adapters together, avoiding the need for separate queues for each adapter or task type.
- Maintain Continuous Processing: The system can process tasks constantly, regardless of type, keeping the processing flow continuous and avoiding delays from waiting for specific task types to accumulate.
- Improve Throughput and Consistency: Heterogeneous continuous batching significantly improves overall throughput and maintains consistent performance even with a growing number of different tasks or adapters.

While the actual implementation of SGMV is complex and involves low-level GPU programming, its effects can be observed at the system level.

Heterogeneous Continuous Batching in LoRAX
LoRAX, an open-source Multi-LoRA inference server, represents a significant leap forward in the efficient deployment of multiple fine-tuned language models. At its core, LoRAX leverages the power of SGMV to achieve heterogeneous continuous batching, optimizing overall system throughput while maintaining low latency.

Key Components of LoRAX
LoRAX's architecture is built around three fundamental components that enable its heterogeneous batching capabilities:
- Dynamic Adapter Loading: LoRAX doesn't require all adapters to be pre-loaded into GPU memory. Instead, it dynamically downloads and loads adapters onto the GPU as requests arrive. This on-demand loading ensures efficient use of GPU memory and allows the system to handle a large number of different adapters without blocking other requests.
- Continuous Batching: Unlike traditional batching systems that wait for a fixed batch size, LoRAX employs a token-based approach to manage batching. It dynamically groups requests into batches based on available GPU memory and desired latency, ensuring a continuous flow of processing.
- Asynchronous Adapter Scheduling: A background thread in LoRAX manages adapter offloading and loading, minimizing the performance impact of swapping adapters in and out of GPU memory.
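To make the segmented-gather idea behind SGMV more concrete, here is a plain-Python reference for the arithmetic such a kernel performs. The names `sgmv`, `lora_delta`, `seg_starts`, and `seg_adapter` are hypothetical and do not correspond to the Punica or LoRAX APIs. The real kernel processes the whole batch in one GPU launch; this sketch only spells out what gets computed, assuming requests for the same adapter have already been grouped into contiguous segments. Because a LoRA update is `x @ A @ B`, a mixed-adapter batch needs two such segmented passes: one with the A matrices, then one with the B matrices.

```python
def matvec(x, W):
    """Row vector x (length n) times matrix W (n x m), using plain lists."""
    return [sum(x[k] * W[k][j] for k in range(len(x))) for j in range(len(W[0]))]


def sgmv(xs, weights, seg_starts, seg_adapter):
    """Reference semantics of a segmented gather matrix-vector multiply.

    xs          -- batch of input row vectors, grouped so each segment is contiguous
    weights     -- pool of per-adapter matrices to gather from
    seg_starts  -- boundaries: segment s covers rows seg_starts[s] .. seg_starts[s+1]-1
    seg_adapter -- seg_adapter[s] is the index into `weights` used by segment s
    """
    ys = []
    for s, adapter in enumerate(seg_adapter):
        W = weights[adapter]  # gather: pick this segment's adapter weights
        for i in range(seg_starts[s], seg_starts[s + 1]):
            ys.append(matvec(xs[i], W))  # one mat-vec per request in the segment
    return ys


def lora_delta(xs, As, Bs, seg_starts, seg_adapter):
    """LoRA update x @ A @ B for a batch that mixes adapters: two segmented passes."""
    hidden = sgmv(xs, As, seg_starts, seg_adapter)    # shrink: input_dim -> r
    return sgmv(hidden, Bs, seg_starts, seg_adapter)  # expand: r -> output_dim


# Two toy adapters (input_dim=2, r=1, output_dim=2); rows 0-1 use adapter 0, row 2 adapter 1.
As = [[[1], [0]], [[0], [1]]]
Bs = [[[1, 1]], [[2, 0]]]
xs = [[1, 2], [3, 4], [5, 6]]
print(lora_delta(xs, As, Bs, seg_starts=[0, 2, 3], seg_adapter=[0, 1]))
```

Each request is multiplied only by its own adapter's weights, yet all three requests travel through the same call; that property is what lets a server avoid per-adapter queues.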
Implementation Example
Let's look at a simplified example of how LoRAX handles a batch of tasks using the lorax-client with Flask:

```python
from concurrent.futures import ThreadPoolExecutor

import requests
from flask import Flask, jsonify, request
from lorax import Client

app = Flask(__name__)

# Configuration
LORAX_ENDPOINT = "http://127.0.0.1:8080"  # Replace with your LoRAX server endpoint
CALLBACK_URL = "http://localhost:5001/uploadresponse/"  # Replace with your callback endpoint
MAX_WORKERS = 8  # Assumed value: how many prompts to keep in flight at once

# Initialize the LoRAX client
lorax_client = Client(LORAX_ENDPOINT)


def generate_one(prompt_data):
    """Send a single prompt to LoRAX, optionally routed to a specific adapter."""
    response = lorax_client.generate(
        prompt_data["prompt"],
        adapter_id=prompt_data.get("adapter_id"),  # None -> base model
        max_new_tokens=prompt_data.get("max_new_tokens", 64),  # assumed default if omitted
        # ... other parameters
    )
    return response.dict()


@app.route("/lorax/upload", methods=["POST"])
def upload_batch():
    """
    Handles batch upload requests.
    """
    try:
        # Parse the request body
        data = request.get_json()
        batch_id = data.get("batchId")
        prompts = data.get("data")
        if not batch_id or not prompts:
            return jsonify({"message": "Missing batchId or data"}), 400

        # Send the prompts to LoRAX concurrently, so they are in flight at the
        # same time and LoRAX can batch them together across adapters.
        with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
            responses = list(pool.map(generate_one, prompts))

        # Trigger the callback
        callback_data = {"batchId": batch_id, "response": responses}
        requests.post(CALLBACK_URL, json=callback_data, timeout=30)

        return jsonify({"message": "Batch processed successfully"}), 200
    except Exception as e:
        print(f"Error processing batch: {e}")
        return jsonify({"message": "Error processing batch"}), 500


if __name__ == "__main__":
    app.run(debug=True, port=5001)
```

This implementation showcases several key aspects of LoRAX's heterogeneous continuous batching:
- Batch of Tasks: The Flask server receives a batch of tasks as a JSON payload. Each task includes a prompt, an optional adapter ID, and the maximum number of tokens to generate.
- LoRAX Client: The server uses the lorax-client library to communicate with the LoRAX server, abstracting away the complexities of heterogeneous batching.
- Heterogeneous Batching: Notice that the server doesn't need to filter or sort prompts by adapter ID. LoRAX handles this internally, dynamically grouping tasks based on available resources and efficiently managing adapter loading.
- Dynamic Adapter Loading: If an adapter specified in a request isn't already loaded, LoRAX will download and load it on demand, allowing for efficient use of GPU memory.
- Concurrent Processing: LoRAX can only batch requests that are in flight at the same time, so the prompts in a batch are submitted concurrently rather than one after another. Each individual call still blocks until its own generation finishes.

Testing with curl
To test this implementation, you can use a curl command like this:

```shell
curl -X POST -H "Content-Type: application/json" \
  -d '{"batchId": "10001", "data": [
    {
      "prompt": "[INST] Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? [/INST]",
      "adapter_id": "vineetsharma/qlora-adapter-Mistral-7B-Instruct-v0.1-gsm8k",
      "max_new_tokens": 64
    },
    {
      "prompt": "[INST] Write a SQL query to answer the question based on the table schema.\n\n context: CREATE TABLE table_name_74 (icao VARCHAR, airport VARCHAR)\n\n question: Name the ICAO for lilongwe international airport [/INST]",
      "adapter_id": "ai2sql/ai2sql_mistral_7b",
      "max_new_tokens": 128
    },
    {
      "prompt": "[INST] What is the capital of France? Provide a brief history. [/INST]",
      "adapter_id": "vineetsharma/qlora-adapter-Mistral-7B-Instruct-v0.1-gsm8k",
      "max_new_tokens": 128
    }
  ]}' \
  http://localhost:5001/lorax/upload
```

This curl command sends a POST request to the Flask server's /lorax/upload endpoint with a batch of three prompts: a math word problem and a general-knowledge question, both routed to the GSM8K math adapter, and a text-to-SQL task routed to an SQL adapter. The batch therefore mixes two different LoRA adapters.

LoRAX's heterogeneous continuous batching shines in this scenario. It can handle this diverse set of tasks, loading adapters as needed, and process requests for different adapters in the same batch. This approach can significantly improve throughput and keep latency low, even when dealing with a mix of task types and adapters.
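The scheduling idea behind heterogeneous continuous batching can be approximated with a small simulation. This is a hypothetical sketch, not LoRAX's implementation: `Request`, `AdapterCache`, and `run` are illustrative names, "loading" an adapter only records it in an LRU cache that stands in for GPU memory, and generating a token is modeled as decrementing a counter. Unlike the `NaiveBatchingSystem` above, there is a single waiting queue, and requests join the running batch as soon as a slot frees up, whichever adapter they need.

```python
from collections import OrderedDict, deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: str
    adapter: str
    tokens_left: int  # tokens still to generate (stand-in for real decoding)


class AdapterCache:
    """LRU cache standing in for 'adapters resident in GPU memory'."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.loaded = OrderedDict()
        self.loads = 0

    def ensure(self, adapter: str, pinned=frozenset()) -> None:
        if adapter in self.loaded:
            self.loaded.move_to_end(adapter)  # mark as most recently used
            return
        if len(self.loaded) >= self.capacity:
            # Evict the least recently used adapter that no running request needs.
            victim = next((a for a in self.loaded if a not in pinned), None)
            if victim is None:
                raise RuntimeError("all loaded adapters are in use")
            del self.loaded[victim]
        self.loaded[adapter] = True  # "load" the adapter
        self.loads += 1


def run(requests, max_batch: int, cache: AdapterCache) -> list[tuple[int, str]]:
    """Simulate continuous batching; returns (step, request id) in completion order."""
    waiting = deque(requests)
    running: list[Request] = []
    finished = []
    step = 0
    while waiting or running:
        # Admit new requests whenever slots are free, regardless of adapter.
        while waiting and len(running) < max_batch:
            req = waiting.popleft()
            cache.ensure(req.adapter, pinned={r.adapter for r in running})
            running.append(req)
        step += 1
        # One decoding step for every running request: mixed adapters, one batch.
        for req in running:
            req.tokens_left -= 1
            if req.tokens_left == 0:
                finished.append((step, req.rid))
        running = [r for r in running if r.tokens_left > 0]
    return finished
```

Adapters used by running requests are pinned, so eviction only removes adapters that no running request needs; with `capacity >= max_batch` such an adapter always exists. A production server would also cap the batch by GPU memory or token budget rather than by a fixed request count.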
By leveraging LoRAX and its implementation of heterogeneous continuous batching, we can efficiently serve multiple fine-tuned LLMs in production, overcoming the challenges of traditional batching methods and maximizing GPU utilization.

References
- Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., ... & Chen, W. (2022). LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.
- Yu, C., Han, S., Shen, H., Gao, Y., & Li, J. (2022). PaLM-Coder: Improving Large Language Model Based Program Synthesis Through Batching and Speculative Execution. arXiv preprint arXiv:2212.08272.
- Chen, Z., Jiang, Y., Luo, Y., Liu, X., Ji, S., & Gong, Z. (2023). LoRAX: A High-Performance Multi-Tenant LoRA Inference Server. arXiv preprint arXiv:2311.03285.
- Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
- Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M. A., Lacroix, T., ... & Lample, G. (2023). LLaMA: Open and Efficient Foundation Language Models. arXiv preprint arXiv:2302.13971.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).
- Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140), 1-67.
- Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv preprint arXiv:2305.14314.
- Zhang, S., Roller, S., Goyal, N., Artetxe, M., Chen, M., Chen, S., ... & Pasunuru, R. (2022). OPT: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068.
- Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., ... & Sifre, L. (2022). Training compute-optimal large language models. arXiv preprint arXiv:2203.15556.
- Chen, L., Ye, Z., Wu, Y., Zhuo, D., Ceze, L., & Krishnamurthy, A. (2023). Punica: Multi-Tenant LoRA Serving.
- Zhao, J., Wang, T., Abid, W., Angus, G., Garg, A., Kinnison, J., Sherstinsky, A., Molino, P., Addair, T., & Rishi, D. (2024). LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report.
authors:
Akshat Patil
Rohit Aggarwal
Harpreet Singh

Article
1. Introduction to LoRA
Hey there, language model enthusiasts! Today, we're diving into the fascinating world of LoRA - Low-Rank Adaptation. If you've been keeping up with the latest trends in fine-tuning large language models, you've probably heard this term buzzing around. But what exactly is LoRA, and why should you care? Let's break it down!

Have you ever wished you could fine-tune a massive language model without breaking the bank or waiting for days? Enter LoRA - the game-changing technique that's revolutionizing how we adapt large language models. Buckle up, because we're about to embark on a journey that will demystify LoRA and show you how it's reshaping the landscape of language model optimization.

Imagine being able to tailor a behemoth language model to your specific needs without the hefty computational costs typically associated with fine-tuning. That's the magic of LoRA, or Low-Rank Adaptation. In a world where AI models are growing exponentially in size and complexity, LoRA emerges as a beacon of efficiency, allowing us to adapt these digital giants with surgical precision.

In this article, we're going to pull back the curtain on LoRA. We'll start by unraveling what LoRA is and why it's causing such a stir in the AI community. Then, we'll roll up our sleeves and dive into the nitty-gritty of how LoRA works, from its clever use of low-rank matrix decomposition to its seamless integration with pre-trained models. But we won't stop at theory. We'll guide you through implementing LoRA in PyTorch, breaking down the process into manageable chunks. You'll learn how to create LoRA layers, wrap them around your favorite pre-trained model, and orchestrate a forward pass that leverages the power of LoRA.
We'll also explore best practices for using LoRA, from choosing the right rank parameter to optimizing the scaling factor. And for those ready to push the boundaries, we'll delve into advanced techniques that can take your LoRA implementations to the next level.

Whether you're an AI researcher looking to streamline your model adaptation process, a developer aiming to make the most of limited computational resources, or simply an enthusiast curious about the cutting edge of language model optimization, this article has something for you. So, are you ready to unlock the potential of LoRA and revolutionize how you work with large language models? Let's dive in and demystify LoRA together!

What is LoRA?
LoRA, short for Low-Rank Adaptation, is a clever technique that's revolutionizing how we fine-tune large language models. Introduced by Hu et al. in 2022, LoRA allows us to adapt pre-trained models to specific tasks without the hefty computational cost typically associated with full fine-tuning. At its core, LoRA works by adding small, trainable matrices to each layer of the Transformer architecture. These matrices are decomposed into low-rank representations, hence the name. The beauty of this approach is that it keeps the original pre-trained model weights untouched while introducing a minimal number of new parameters to learn.

Benefits of LoRA for Language Model Fine-Tuning
Now, you might be wondering, "Why should I use LoRA instead of traditional fine-tuning?" Great question! Here are some compelling reasons:
- Efficiency: LoRA dramatically reduces the number of trainable parameters, making fine-tuning faster and less resource-intensive.
- Cost-effectiveness: With fewer parameters to train, you can save on computational costs and energy consumption.
- Flexibility: LoRA allows you to create multiple task-specific adaptations of a single base model without the need for full fine-tuning each time.
- Performance: Despite its simplicity, LoRA often achieves performance comparable to full fine-tuning, and sometimes better, on many tasks.

2. Understanding the LoRA Architecture
Before we dive into the implementation, let's take a moment to understand how LoRA works under the hood. This knowledge will help you appreciate the elegance of the technique and make informed decisions when using it.

Low-Rank Matrix Decomposition
The key idea behind LoRA is low-rank matrix decomposition. In linear algebra, a matrix of rank r can be written exactly as the product of two thinner matrices (one with r columns and one with r rows), and many large matrices can be well approximated this way with a small r. LoRA leverages this concept to create efficient adaptations.

Instead of learning a full matrix of weights for each layer, LoRA introduces two smaller matrices, A and B. The adaptation is then computed as the product of these matrices, scaled by a small factor. Mathematically, it looks like this:

LoRA adaptation = α * (A * B)

Where:
- A is a matrix of size (input_dim, r)
- B is a matrix of size (r, output_dim)
- r is the rank, typically much smaller than input_dim and output_dim
- α is a scaling factor (the original paper writes this factor as α/r, with α a constant, which reduces the need to retune hyperparameters when r changes)

This decomposition allows us to capture the most important directions of change in the weight space using far fewer parameters: a full update needs input_dim × output_dim values, while A and B together need only r × (input_dim + output_dim).

Integration with Pre-Trained Models
LoRA integrates seamlessly with pre-trained models. Here's how it works:
- The original weights of the pre-trained model are frozen (not updated during training).
- LoRA layers are added in parallel to the existing linear layers in the model.
- During the forward pass, the outputs of the original layer and the LoRA layer are summed.
- Only the LoRA layers are updated during training, leaving the base model untouched.

This approach allows us to adapt the model's behavior without modifying its original knowledge, resulting in efficient and effective fine-tuning.

3. Implementing LoRA in PyTorch
Now that we understand the theory, let's roll up our sleeves and implement LoRA in PyTorch!
We'll break this down into three main components: the LoRA Layer, the LoRA Model, and the forward pass.

3.1 LoRA Layer Implementation
First, let's create our LoRA Layer. This is where the magic happens!

```python
import torch
import torch.nn as nn


class LoRALayer(nn.Module):
    def __init__(self, in_features, out_features, rank=4):
        super().__init__()
        # A starts random and B starts at zero, so the update A @ B starts at exactly zero
        self.lora_A = nn.Parameter(torch.randn(in_features, rank))
        self.lora_B = nn.Parameter(torch.zeros(rank, out_features))
        self.scaling = 0.01

    def forward(self, x):
        # (..., in_features) @ (in_features, rank) @ (rank, out_features)
        return self.scaling * (x @ self.lora_A @ self.lora_B)
```

Let's break this down:
- We define a new `LoRALayer` class that inherits from `nn.Module`.
- In the constructor, we create two parameter matrices: `lora_A` and `lora_B`. Notice that `lora_A` is initialized randomly, while `lora_B` starts as all zeros. Because `lora_B` is zero, the LoRA output is zero at the start, so the wrapped model initially behaves exactly like the base model.
- The `scaling` factor is set to 0.01. It controls how strongly the learned update affects the layer's output, which helps keep the LoRA adaptation subtle as training gets under way.
- In the forward pass, we compute the LoRA adaptation by multiplying the input `x` with `lora_A` and `lora_B`, then scaling the result.

3.2 LoRA Model Implementation
Now that we have our LoRA Layer, let's create a LoRA Model that wraps around our base pre-trained model:

```python
class LoRAModel(nn.Module):
    def __init__(self, base_model, rank=4):
        super().__init__()
        self.base_model = base_model
        self.lora_layers = nn.ModuleDict()

        # Add a LoRA layer alongside every linear layer of the base model.
        # list() snapshots the modules before any hooks are registered.
        for name, module in list(self.base_model.named_modules()):
            if isinstance(module, nn.Linear):
                # ModuleDict keys cannot contain "." or be empty, so flatten the name
                key = name.replace(".", "_") or "root"
                lora = LoRALayer(module.in_features, module.out_features, rank)
                self.lora_layers[key] = lora
                # Add this LoRA layer's output to the linear layer's output
                module.register_forward_hook(self._make_hook(lora))

    @staticmethod
    def _make_hook(lora):
        # A forward hook that returns a value replaces the module's output
        def hook(module, inputs, output):
            return output + lora(inputs[0])
        return hook
```

Here's what's happening:
- We create a `LoRAModel` class that takes a `base_model` as input.
- We iterate through all modules in the base model, looking for linear layers.
- For each linear layer, we create a corresponding LoRA layer, store it in our `lora_layers` dictionary, and register a forward hook that adds the LoRA output to that layer's output.

This version attaches LoRA to every `nn.Linear`. In practice you would usually apply it selectively, typically to the attention and feed-forward layers of a Transformer (the original LoRA paper adapted only the attention projection matrices).

3.3 LoRA Model Forward Pass
Finally, let's implement the forward pass by adding this method to the `LoRAModel` class:

```python
    def forward(self, x):
        # The hooks registered in __init__ add each LoRA output to its linear
        # layer's output, so running the base model applies every adaptation.
        return self.base_model(x)
```

In this forward pass:
- We call the base model exactly as we would without LoRA, so data flows through its layers in their normal order.
- Every linear layer with a LoRA layer triggers its hook, which adds the scaled LoRA output to the layer's normal output.
- Modules without LoRA run unchanged.

This implementation ensures that the LoRA adaptations are applied exactly where we want them, while leaving the rest of the model unchanged.

4. Using the LoRA Model
Great job! Now that we have our LoRA model implemented, let's talk about how to use it effectively.
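Before training, it helps to know how many parameters the optimizer will actually update. Using the shapes from Section 2 (A is input_dim × r, B is r × output_dim), one layer adds r × (input_dim + output_dim) trainable values, versus input_dim × output_dim for a full update of the same weight matrix. The helper names below are hypothetical, and bias terms are ignored.

```python
def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Trainable values LoRA adds to one linear layer: A is (in x r), B is (r x out)."""
    return rank * (in_features + out_features)


def full_params(in_features: int, out_features: int) -> int:
    """Weights updated when the same layer is fully fine-tuned (bias ignored)."""
    return in_features * out_features


def compare_trainable(layer_shapes, rank: int):
    """Return (LoRA params, full params, ratio) summed over (in, out) layer shapes."""
    lora = sum(lora_params(i, o, rank) for i, o in layer_shapes)
    full = sum(full_params(i, o) for i, o in layer_shapes)
    return lora, full, lora / full


# Example: four 4096 x 4096 projections adapted with rank 8.
print(compare_trainable([(4096, 4096)] * 4, rank=8))
```

For a single 4096 × 4096 projection with r = 8, LoRA trains 65,536 values instead of 16,777,216, about 0.39% of the full update. Doubling r doubles the LoRA count, which is the trade-off behind the rank advice in the Best Practices section.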
Training Process
Training a LoRA model is similar to training any other PyTorch model, with a few key differences.

Freeze the base model parameters:
```python
for param in model.base_model.parameters():
    param.requires_grad = False
```
Only optimize the LoRA parameters:
```python
optimizer = torch.optim.AdamW(model.lora_layers.parameters(), lr=1e-3)
```
Train as usual, but remember that you're only updating the LoRA layers:
```python
model.train()
for epoch in range(num_epochs):
    # Assumes the dataloader yields (inputs, targets) pairs
    for inputs, targets in dataloader:
        optimizer.zero_grad()
        output = model(inputs)
        loss = criterion(output, targets)
        loss.backward()
        optimizer.step()
```

Inference with LoRA-Adapted Models
When it's time to use your LoRA-adapted model for inference, you can simply use it like any other PyTorch model:
```python
model.eval()
with torch.no_grad():
    output = model(input_data)
```
The beauty of LoRA is that you can easily switch between different adaptations by swapping the LoRA layers, all while keeping the same base model.

5. Best Practices
As you start experimenting with LoRA, keep these best practices in mind:

Choosing the Rank Parameter
The rank parameter (r) in LoRA determines the complexity of the adaptation. A higher rank allows for more expressive adaptations but increases the number of parameters. Start with a small rank (e.g., 4 or 8) and increase it if needed.

Scaling Factor Optimization
The scaling factor (α) in the LoRA layer can significantly impact performance. While we set it to 0.01 in our example, you might want to treat it as a hyperparameter and tune it for your specific task.

Performance Comparisons
When it is feasible, compare your LoRA-adapted model's performance with a fully fine-tuned model. In many cases, LoRA can achieve comparable or better results with far fewer parameters, but it's essential to verify this for your specific use case.

6. Advanced LoRA Techniques
Ready to take your LoRA skills to the next level?
Here are some advanced techniques to explore:

Hyperparameter Tuning for the Scaling Factor
Instead of using a fixed scaling factor, you can make it learnable by replacing the fixed value in the `LoRALayer` constructor:
```python
self.scaling = nn.Parameter(torch.ones(1))
```
This allows the model to adjust the impact of the LoRA adaptation during training. Note that this starts the scale at 1.0 rather than 0.01; `torch.full((1,), 0.01)` keeps the original starting value.

Selective Application of LoRA
You might not need to apply LoRA to every layer. Experiment with applying it only to specific layers (e.g., only to attention layers) to find the best trade-off between adaptation and efficiency.

Freezing Base Model Parameters
We touched on this earlier, but it's crucial to ensure your base model parameters are frozen:
```python
for param in model.base_model.parameters():
    param.requires_grad = False
```
This ensures that only the LoRA parameters are updated during training.

And there you have it! You're now equipped with the knowledge to implement and use LoRA for optimizing language models. Remember, the key to mastering LoRA is experimentation. Don't be afraid to try different configurations and see what works best for your specific use case. Happy adapting, and may your language models be ever more efficient and effective!

Summary
In this article, we've demystified LoRA (Low-Rank Adaptation), a powerful technique for optimizing large language models. We explored how LoRA enables efficient fine-tuning by introducing small, trainable matrices to pre-trained models, dramatically reducing computational costs while maintaining performance. We delved into the LoRA architecture, explaining its use of low-rank matrix decomposition and seamless integration with pre-trained models. We then provided a step-by-step guide to implementing LoRA in PyTorch, covering the creation of LoRA layers, wrapping them around base models, and executing forward passes.

Key takeaways include:
- LoRA offers a cost-effective and flexible approach to adapting large language models.
- Implementing LoRA involves creating specialized layers and integrating them with existing model architectures.
- Best practices such as choosing appropriate rank parameters and optimizing scaling factors are crucial for success.
- Advanced techniques like learnable scaling factors and selective application can further enhance LoRA's effectiveness.

As AI models continue to grow in size and complexity, techniques like LoRA become increasingly valuable. Whether you're an AI researcher, developer, or enthusiast, LoRA opens up new possibilities for working with large language models.
authors:
Akshat Patil
Rohit Aggarwal
Harpreet Singh

Article
Introduction
Artificial Intelligence (AI) is transforming industries across the globe, and its influence on social media management is particularly noteworthy. AI agents can analyze vast amounts of data, uncovering patterns and trends that are otherwise difficult to detect.

As part of my Master of Science degree program in Information Systems at the University of Utah, I faced a pivotal decision: complete a certification course or take on a capstone project. Drawn to the challenge and potential for growth, I chose the latter. Under the guidance of Prof. Rohit Aggarwal from the Information Systems Department at the David Eccles School of Business, I embarked on an exciting journey to build an AI agent capable of revolutionizing social media posts. Little did I know that this project would push me far beyond the boundaries of my classroom knowledge and into the realm of practical, cutting-edge AI application.

Seizing the Opportunity: The Beginning of My AI Journey
The project's scope was ambitious: develop an AI system that could analyze historical social media content and generate future posts to drive engagement. I chose Instagram as the primary platform, focusing on strategies employed by industry leaders like Ogilvy, BBDO, AKQA, and MCCANN. These companies, known for their expertise in brand promotion, provided valuable insights into audience engagement that I could leverage in my AI agent.

My task involved collecting data from company websites and social media platforms through web scraping, processing it, and utilizing AI models to extract meaningful themes. The ultimate goal was to generate future posts that would drive engagement and align with each company's brand. This project would test not only my technical skills but also my ability to understand and apply marketing strategies in a digital context.

The Journey: Building the AI Agent

Data Collection and Preprocessing
My first major hurdle was data collection through web scraping.
While I had some exposure to Python in my classes, this project demanded a level of expertise I hadn't yet achieved. I spent countless hours poring over YouTube tutorials, documentation, and online forums to master the intricacies of web scraping. I learned to use Python libraries like Instaloader to obtain data from Instagram pages, and Selenium and BeautifulSoup to scrape company websites. This process yielded valuable information, including captions, shares, likes, and comments from each company's Instagram account.

After collecting the data, I moved on to preprocessing. I cleaned the data by removing duplicates and null values, converted dates to a datetime format, and prepared it for analysis. Ensuring that the data was accurate and well-organized was crucial for setting a solid foundation for theme extraction. This step taught me the importance of data quality in AI projects, a concept that was only briefly touched upon in my coursework.

Theme Extraction and Grouping
With the data ready, I conducted theme extraction using the Gemma2b model, which I ran locally through Ollama. This was a significant leap from the basic machine learning concepts I had learned in class. I employed zero-shot prompting, a method where I asked the model to perform tasks it hadn't been explicitly trained on. By providing suggested themes, I guided Gemma2b to extract relevant themes from the Instagram posts, such as 'Product Announcement' and 'Customer Story.'

Once I extracted the themes, I grouped and normalized them. I used Gemma2b to categorize the themes into more concise groups, ensuring that similar themes like 'Customer Story' and 'Customer Stories' were treated as one. This normalization was essential for analyzing the data consistently at scale, and it taught me about the nuances of natural language processing and the importance of context in AI-driven text analysis.
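A minimal sketch of the grouping and scoring steps, under stated assumptions: the project used Gemma2b to group themes, whereas this sketch swaps in a simple rule-based normalization (lowercasing, trimming, and singularizing the last word), and the post fields `theme`, `likes`, `shares`, and `comments` are hypothetical names. The scoring follows the engagement analysis described in this article: sum likes, shares, and comments per theme and keep the top 10.

```python
from collections import Counter


def normalize_theme(theme: str) -> str:
    """Rule-based stand-in for LLM grouping: case-fold, trim, singularize the last word."""
    words = theme.strip().lower().split()
    if not words:
        return ""
    last = words[-1]
    if last.endswith("ies"):
        last = last[:-3] + "y"  # "stories" -> "story"
    elif last.endswith("s") and not last.endswith("ss"):
        last = last[:-1]  # "announcements" -> "announcement", but keep "business"
    return " ".join(words[:-1] + [last])


def top_themes(posts, k: int = 10):
    """Sum likes + shares + comments per normalized theme; return the k highest."""
    scores = Counter()
    for post in posts:
        theme = normalize_theme(post["theme"])
        scores[theme] += post["likes"] + post["shares"] + post["comments"]
    return scores.most_common(k)


posts = [
    {"theme": "Customer Story", "likes": 10, "shares": 2, "comments": 3},
    {"theme": "Customer Stories", "likes": 5, "shares": 0, "comments": 1},
    {"theme": "Product Announcement", "likes": 7, "shares": 1, "comments": 0},
]
print(top_themes(posts))
```

Rules like these catch simple variants such as 'Customer Story' versus 'Customer Stories' but not synonyms or rephrasings, which is where an LLM-based grouping earns its extra cost.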
Engagement Analysis and Generating Future Posts

Next, I conducted an engagement analysis by calculating a score for each theme based on likes, shares, and comments. Summing these metrics allowed me to identify the top 10 themes across all companies. The analysis revealed which themes drove engagement and how companies such as Ogilvy and AKQA were leveraging them. This step required me to combine my understanding of social media metrics with data analysis techniques, bridging the gap between marketing concepts and technical implementation. Armed with this analysis, I used Gemma 2B to generate future social media posts. I based these posts on the successful strategies I had identified, with suggestions for images, videos, captions, and hashtags. I also included a predicted engagement score for each post to help social media managers plan their content. This phase of the project was particularly exciting because it let me see the practical application of AI in content creation, something far beyond what I had learned in my classes. To make my AI agent accessible, I developed an interactive interface using Streamlit. This user-friendly application allowed social media managers to interact with the model, generate posts, and visualize engagement predictions. Creating the interface pushed me to learn about web application development and user experience design, areas that were entirely new to me but crucial for making the agent practical and usable.

Challenges I Faced

Throughout this project, I encountered numerous challenges that pushed me far beyond what I had learned in my classes:

Web Scraping Implementation: Despite my theoretical knowledge of web scraping in Python, this project demanded practical application at a much higher level. I had to build my skills through intensive study of YouTube tutorials and extensive reading on the subject, including its legal implications, to ensure compliance.
Model Selection and Deployment: I initially explored quantized models for local execution, learning a great deal about their capabilities and limitations. After considering various options, including GPU-dependent models, I settled on Gemma 2B with Ollama because it ran within my local machine's resources. This decision came after I tried Google Colab's enhanced GPU environment, which proved financially unfeasible for my project's scope.

Development Environment Setup: Setting up the working environment posed its own challenges. I opted for Visual Studio Code, which provided a robust platform for structuring and debugging the code that worked with the large language model. This choice significantly improved my workflow efficiency but required me to learn a new development environment.

Data Processing and Analysis: Cleaning the data and merging CSV files presented initial hurdles, which I overcame by writing Python scripts to streamline these processes. The most significant challenge was extracting themes from the large dataset with Gemma 2B, which required substantial computation time. To address this, I used a high-RAM system and implemented checkpoints in my code to save progress during these long runs.

Model Fine-tuning and Result Validation: To ensure the extracted themes matched the desired format, I implemented a training method that used sample themes. This was followed by a meticulous manual review to verify the accuracy and relevance of the extracted themes.

Post-processing and Application Development: Once I had extracted the themes, I used the model to group them into categories and align them with engagement metrics. I also used Gemma 2B to generate weekly posts designed to resonate with the target audience. The final step was developing a Streamlit application that generates responses to user prompts, providing a user-friendly interface to the project's insights.
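The theme grouping and engagement scoring described above can also be sketched briefly. This is a hedged illustration, not the project's code: the author grouped themes with Gemma 2B, whereas this sketch swaps in a simple rule-based normalizer (lowercasing, collapsing whitespace, and crudely singularizing the last word) purely to show why near-duplicates such as 'Customer Story' and 'Customer Stories' should be merged before scoring. The function names and input shape (a `theme` label plus integer `likes`, `shares`, and `comments` per post) are hypothetical; the score itself is the plain sum of likes, shares, and comments described in the text.

```python
from collections import defaultdict


def normalize_theme(label):
    """Naive rule-based stand-in for model-based grouping (hypothetical).

    Lowercases, collapses whitespace, and crudely singularizes the last word
    so that variants like "Customer Stories" map to "Customer Story".
    """
    words = label.lower().split()
    if not words:
        return ""
    last = words[-1]
    if last.endswith("ies") and len(last) > 3:
        last = last[:-3] + "y"  # "stories" -> "story"
    elif last.endswith("s") and not last.endswith("ss"):
        last = last[:-1]  # "announcements" -> "announcement"
    words[-1] = last
    return " ".join(word.capitalize() for word in words)


def top_themes(posts, n=10):
    """Score each normalized theme as likes + shares + comments; return the top n."""
    scores = defaultdict(int)
    for post in posts:
        theme = normalize_theme(post["theme"])
        if not theme:  # skip posts left without a usable label
            continue
        scores[theme] += post["likes"] + post["shares"] + post["comments"]
    # Highest score first; ties broken alphabetically for a stable ordering.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:n]
```

Because grouping happens before summing, the two customer-story variants contribute to a single score instead of splitting it across two entries in the top-10 list. The rules here are deliberately crude (for example, "News" would become "New"), which is one reason a model-based grouping step is attractive on real data.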
Lessons Learned and Conclusion

Despite the difficulties, this project taught me invaluable lessons. I honed my coding skills, learned the intricacies of web scraping, and gained hands-on experience with machine learning models. The project also emphasized the importance of adaptability, communication, and project management, skills that are crucial for success in any professional setting. Building this AI agent was a transformative experience. It not only equipped me with technical skills but also prepared me for future roles in AI and data analytics. The project demonstrated the potential of AI to enhance social media management and underscored the importance of understanding data in order to make informed decisions. Looking ahead, I'm excited about the possibilities AI offers and the role I can play in shaping this technology. Beyond technical skills, this experience ignited a passion for creating AI solutions that make a real difference in how businesses understand and interact with their digital audiences. Building my first AI agent has laid a solid foundation for future projects, and I am eager to keep learning and growing in this dynamic field.
5 min read
authors:
Ololade Olaitan
Rohit Aggarwal
Harpreet Singh

Article
Opinion: Teaching AI & Critical Thinking

The opinions expressed herein are derived from our research and our own experiences in developing several AI agents, observing student engagement across different versions of our AI classes, and engaging in discussions within AI committees and with attendees of BizAI 2024.

The Growing Importance of Critical Thinking in the AI Era

In the new era of artificial intelligence (AI) and large language models (LLMs), critical thinking skills have become more important than ever. A 2022 survey by the World Economic Forum revealed that 78% of executives believed critical thinking would be a top-three skill for employees in the next five years, up from 56% in 2020 [1]. As AI systems become more advanced and capable of performing a wide range of tasks, it is crucial for humans to develop and maintain strong critical thinking abilities so that they can leverage these tools effectively and make informed decisions.

The Necessity of Human Insight for Effective AI Utilization

One key reason critical thinking is so valuable in this context is that LLMs excel at providing information and executing tasks based on their training data, but they often struggle with higher-level reasoning, problem decomposition, and decision-making. While an LLM can generate code, write articles, or answer questions, it may not understand the broader context or implications of the task at hand. This is where human critical thinking comes into play. For example, consider a company that wants to develop a new product. An LLM can assist by generating ideas, conducting market research, and even creating a project plan. However, it is up to human decision-makers to critically evaluate the generated ideas, assess their feasibility and potential impact, and redirect the AI model wherever it has made mistakes or overlooked something significant, such as company values, long-term goals, and potential risks.
Cultivating Critical Thinking Skills for AI Collaboration

Moreover, as the capabilities of LLMs reduce the value of knowing "how" to perform a task, the value of knowing "what" to do and "why" increases. Because AI can handle much of the "how," professionals are free to focus on the "what" and the "why." By developing strong critical thinking abilities, professionals can collaborate effectively with AI systems, leveraging their strengths while compensating for their limitations. This synergy between human reasoning and AI capabilities has the potential to make professionals more productive, bring costs down, and help companies grow manifold. However, critical thinking skills must be actively cultivated and practiced. As professors, we need to find ways to equip students with the tools and training necessary to thrive in an AI-driven world. Let us consider an example of how AI and critical thinking can be taught in tandem.

Example: Teaching AI and Critical Thinking in Tandem

In one of our courses, we teach students how to use AI models effectively to augment their thought process and to plan AI agents that revamp business processes. Students explore how to plan an AI agent that learns the tacit knowledge experts develop over years of experience, and how another AI agent can then use this tacit knowledge, together with Retrieval Augmented Generation (RAG), as part of its context to generate decisions or content that mimic an expert's complex decision-making. Through this process, students not only learn technical skills related to AI and LLMs but also develop essential critical thinking abilities such as problem decomposition, strategic planning, and effective communication. They learn to view AI as a tool that augments and enhances their own thinking, rather than as a replacement for human judgment and decision-making. They also gain a better understanding of the limitations of AI models.
These AI models solve many "how" problems that professionals previously had to spend significant time learning, planning, and working on. However, the models also come with their own set of challenges, such as limited context windows, limited reasoning abilities, and variability in responses. Hence, students need to prepare for AI integration in the workplace in a way that accounts for these limitations.

Educating students to remain in control

Teaching students to view LLMs as highly knowledgeable assistants that sometimes get confused and need direction is a valuable approach. It encourages students to take an active role in guiding and correcting the AI, rather than simply accepting its outputs at face value. They recognize that while AI can provide valuable insights and generate ideas, it is ultimately up to humans to critically evaluate and act upon that information. This understanding helps students develop a healthy and productive relationship with AI, one in which they are in control and can effectively leverage these tools to support their own learning and growth.

Intellectual laziness & associated risks

While collaboration between humans and AI presents numerous opportunities, it is essential to be aware of its potential drawbacks and risks. As AI models become more advanced and capable, there is a genuine concern that some early learners may become overly dependent on these tools. This over-reliance could diminish their critical thinking and problem-solving abilities, possibly fostering "intellectual laziness." Individuals might become less inclined to learn and explore new concepts on their own, relying instead on AI for answers. Further, they may lose faith in their own judgment and stop questioning the AI model's output. In one of our research studies, we observed this behavior among early-career software developers who had come to rely too heavily on AI models.
This situation could widen the divide between those who use AI to boost their productivity and those who lean on it too heavily. To counter these risks, we need to stress critical engagement with AI alongside fostering critical thinking abilities. We should encourage students to actively scrutinize and question AI outputs, and help them see that excessive reliance on AI can limit the depth of their understanding and their personal growth. By advocating a strategy that values AI tools and independent thinking equally, we can guide learners through this new landscape successfully. Looking toward the future, the increasing importance of critical thinking skills in the AI era will have significant implications for job markets and educational curricula. Professionals who can collaborate effectively with AI systems and leverage their capabilities will be in high demand. Hence, faculty will need to adapt their programs to ensure that students understand the importance of using AI as a tool to augment their thinking, not as a replacement for it. Further, we must rethink our courses to place more emphasis on the "what," challenging students to apply their critical thinking skills to real-world problems and decision-making scenarios.

Invite our colleagues for collaboration

This is not a trivial task, and it will require collaboration and idea-sharing among faculty members. We have been actively exploring these issues and would greatly value our colleagues' perspectives and insights on this topic. We welcome further discussion and encourage you to reach out to us to share your thoughts and experiences.

Disclaimer

It's important to note that these insights are primarily anecdotal and have not undergone scientific scrutiny. Additionally, the research involving developers in which we noted instances of intellectual laziness has not yet been validated through peer review.

References

[1] World Economic Forum. (2022). The Future of Jobs Report 2022. Geneva, Switzerland.
5 min read
authors:
Rohit Aggarwal
Harpreet Singh
