Before examining why many GenAI projects fail, it helps to have a plain-language overview of what these models can do. GenAI models such as ChatGPT are trained on vast datasets of publicly available text, including Wikipedia, websites, books, and other digital sources. These models can answer a wide range of questions based on this knowledge, as you may have experienced when using ChatGPT. Beyond leveraging GenAI's general knowledge, you can also supply your own context (i.e., your specific knowledge, data, or documents) alongside your instructions, asking GenAI to extract information or generate output grounded in that context. With the right instructions and context, GenAI can produce outputs that closely resemble human decision-making.
While GenAI’s potential is significant, its success depends on how it's deployed, and many projects fail because of misunderstandings about its capabilities and proper usage.
A common pitfall in deploying GenAI is that organizations often treat it like traditional automation, where predefined processes are automated step-by-step. However, GenAI isn’t just about automating existing processes; it has the potential to transform workflows, especially in areas requiring human decision-making and creativity. When businesses focus solely on automation and fail to redesign processes, they miss the opportunity to unlock GenAI’s real potential—augmenting human intelligence.
Traditional systems analysis and design focused on capturing predefined, deterministic workflows meant to be operated by humans. Now that GenAI can augment many human decision-making steps, workflows must be redesigned to accommodate this shift. For example, to produce customer support FAQs, a customer support executive previously had to manually review past support tickets, reference existing FAQs, and write new entries. In the GenAI paradigm, such interfaces for manual searching are no longer necessary. Instead, the focus shifts to determining what information to retrieve from past support tickets via APIs and integrating GenAI models to analyze that information and generate FAQs automatically.
Rather than creating interfaces for humans to sift through data, the emphasis should be on designing systems that allow GenAI to access and process information directly from databases or APIs. For instance, you can develop an application where GenAI models are connected to your customer support database, allowing them to automatically extract common issues, analyze sentiment, and generate draft FAQs or support documents. The human role then shifts to reviewing and refining the outputs generated by GenAI, ensuring they meet quality standards and align with the company’s messaging.
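To make this concrete, here is a minimal sketch of such a pipeline. The `fetch_recent_tickets` helper is hypothetical (it stands in for your support database or API call), and the OpenAI client is used only as an example provider; any GenAI API would fit the same shape.

```python
# A minimal sketch of a GenAI-centric FAQ pipeline. fetch_recent_tickets() is
# a hypothetical helper over your support database/API; the OpenAI chat
# completions API is used as one example provider.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def fetch_recent_tickets(limit: int = 50) -> list[str]:
    """Hypothetical: pull resolved ticket summaries from your support system."""
    raise NotImplementedError("Replace with your database or API call")


def draft_faqs(tickets: list[str]) -> str:
    prompt = (
        "You are a support documentation assistant.\n"
        "From the resolved support tickets below, identify the most common "
        "issues and draft an FAQ entry (question + answer) for each.\n\n"
        "Tickets:\n" + "\n".join(f"- {t}" for t in tickets)
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # example model; use whichever your provider offers
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# A human reviewer then edits the drafted FAQs before publication.
```

The design point is that no search interface for humans is built at all; the human touchpoint moves to the end of the pipeline, reviewing and refining the drafts.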
This paradigm shift requires businesses to redesign their workflows to be more GenAI-centric. Processes should be built around the capabilities of GenAI, leveraging its strengths in data processing and content generation. If you're unsure how to plan for GenAI systems, you may want to go through the GenAI planning framework here.
There is often an over-expectation that simply feeding AI a prompt will produce sophisticated decision-making outputs, which underestimates the complexity of human judgment. In many decision-making processes, humans work through multiple steps, drawing on intuition and expertise to assess a situation in parts. To replicate this with AI, the decision-making process often needs to be broken down, with humans providing clear step-by-step instructions for the AI to follow. Stepwise prompting does this within a single prompt: the entire multi-step process is spelled out as explicit, ordered steps. For moderately complex tasks, this can be efficient because it reduces the number of interactions with the model. For more complex tasks involving numerous steps and intricate explanations, however, a single prompt becomes less effective. In such cases, you may need to break the task into a sequence of sub-tasks, using a separate prompt for each. This technique is called Prompt chaining. You can read more about it here.
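As a rough illustration, here is a minimal Prompt chaining sketch. The `call_llm` function is a stand-in for whichever model API you use, and the sub-tasks are illustrative:

```python
# A minimal prompt-chaining sketch: each sub-task gets its own focused prompt,
# and the output of one step becomes the input to the next.
def call_llm(prompt: str) -> str:
    """Stand-in for your model call (OpenAI, Anthropic, a local model, etc.)."""
    raise NotImplementedError


def summarize_ticket(ticket_text: str) -> str:
    return call_llm(f"Summarize the customer's core issue in two sentences:\n\n{ticket_text}")


def classify_issue(summary: str) -> str:
    return call_llm(f"Classify this issue as billing, technical, or account:\n\n{summary}")


def draft_faq(summary: str, category: str) -> str:
    return call_llm(f"Write an FAQ entry (question + answer) for this {category} issue:\n\n{summary}")


def ticket_to_faq(ticket_text: str) -> str:
    summary = summarize_ticket(ticket_text)   # step 1: condense
    category = classify_issue(summary)        # step 2: classify using step 1's output
    return draft_faq(summary, category)       # step 3: generate from both prior steps
```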
Additionally, besides providing clear instructions, offering a few examples to the model can help it better understand your expectations. This technique is known as Few-shot prompting or In-context learning. You can read more about it here.
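For illustration, a few-shot prompt might look like the sketch below; the worked examples are placeholders, but their role is to anchor the output format and tone before the real input appears:

```python
# A few-shot prompt: a handful of worked examples precede the actual input,
# showing the model the expected format. The examples are illustrative.
FEW_SHOT_PROMPT = """Rewrite each support ticket as an FAQ question.

Ticket: "I was charged twice this month and need a refund."
FAQ question: "What should I do if I see a duplicate charge on my bill?"

Ticket: "The mobile app crashes whenever I open my order history."
FAQ question: "Why does the app crash when I view my order history?"

Ticket: "{ticket}"
FAQ question:"""

prompt = FEW_SHOT_PROMPT.format(ticket="I can't reset my password from the login page.")
# Send `prompt` to your model of choice; the examples anchor the output format.
```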
Integrating all these ideas in a prompt can get complex and overwhelming. We have found Metadata prompting to be highly effective for handling this complexity by separating concerns: first, focusing on explaining the instructions while assuming that all variables or constructs are predefined, and then later explaining those variables/constructs in detail. You can read more about it here. Even after following these best practices, issues may still arise—such as using semantically incorrect words, introducing modifier ambiguity, or mixing up the order of instructions. You can learn how to systematically iterate and improve prompts to resolve these issues here.
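As a sketch of the separation of concerns that Metadata prompting encourages, the instructions below refer to named constructs as if they were already defined, and the constructs are specified separately afterwards; the construct names and definitions are illustrative:

```python
# A sketch of metadata prompting as described above: instructions first,
# written against named constructs, with the constructs defined afterwards.
INSTRUCTIONS = """Using TICKET_SUMMARIES and BRAND_VOICE, draft one FAQ entry
per distinct issue. Each entry must follow FAQ_FORMAT."""

DEFINITIONS = """Definitions:
- TICKET_SUMMARIES: the list of resolved-ticket summaries appended below.
- BRAND_VOICE: friendly, concise, no jargon, second person ("you").
- FAQ_FORMAT: a question followed by an answer of at most 80 words."""

summaries = ["App crashes on the order history page.", "Duplicate billing charge."]
prompt = f"{INSTRUCTIONS}\n\n{DEFINITIONS}\n\nTICKET_SUMMARIES:\n" + "\n".join(
    f"- {s}" for s in summaries
)
```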
Another reason to use Prompt chaining is the generation capacity of AI models. While these models can process large amounts of input, their ability to generate content is more limited. If you attempt to generate content covering too many topics at once, quality may degrade into overly simplistic or "listicle"-style output. To maintain high-quality results, it is often necessary to break the task into smaller sub-tasks and use a focused prompt for each; this reduces the cognitive load on the AI and yields better results for each segment of the task.
A significant misunderstanding is the assumption that GenAI models need to be fine-tuned for every task. This often leads organizations to unnecessarily commit to fine-tuning, adding substantial cost and complexity to their AI projects. In reality, many tasks can be handled effectively using advanced prompting techniques such as Metadata prompting, Few-shot learning, Stepwise prompting, and Prompt chaining—all without the need for fine-tuning.
Fine-tuning models not only slows down the process by requiring the creation and management of training data but also complicates the deployment and inference stages. Teams often fine-tune models independently, resulting in fragmented efforts, when using a common base model with heterogeneous adapters would allow better resource utilization and system flexibility. Heterogeneous Parameter-Efficient Fine-Tuning (PEFT) adapters can be served in batches on a shared base model, optimizing resource usage. Read more about using heterogeneous PEFT adapters here.
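As a rough sketch of the shared-base-model idea, the Hugging Face peft library lets you attach several task-specific adapters to one base model and switch between them per request; the model and adapter paths below are placeholders:

```python
# Serving multiple task-specific LoRA adapters on one shared base model with
# the Hugging Face peft library. Model and adapter paths are illustrative.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Attach a first adapter, then load further adapters onto the same base.
model = PeftModel.from_pretrained(base, "acme/faq-drafting-lora", adapter_name="faq")
model.load_adapter("acme/sentiment-lora", adapter_name="sentiment")

model.set_adapter("faq")        # route FAQ-drafting requests to this adapter
# ... run generation ...
model.set_adapter("sentiment")  # switch tasks without reloading the base weights
```

Note that this sketch only switches adapters between requests; mixing heterogeneous adapters within a single batch requires a specialized serving stack.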
A growing trend in AI development is designing autonomous AI agents using approaches such as ReAct with Reflection, Language Agent Tree Search (LATS), and others. While these designs can be effective for certain use cases, such as simple Q&A using tools like Google Search when retrieval-augmented generation (RAG) systems are insufficient, they pose challenges when applied to more complex tasks that require nuanced reasoning and decision-making.
When GenAI systems are used to augment human decision-making, over-reliance on these agentic designs can lead to several issues: the models may over-reflect, lose focus, struggle to differentiate between critical and irrelevant details, or overuse external tools rather than leveraging their own knowledge. As a result, costs can rapidly escalate, and the quality of output on complex tasks often deteriorates.
To mitigate these issues, well-thought-out guardrails and interventions are necessary. These guardrails help define task scope, keep AI models on track, and improve governance, reducing risks associated with unmonitored AI autonomy. Without these measures, autonomous AI systems may underperform on complex tasks and fail to deliver the expected value.
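A guardrail can be as simple as a budget and an allow-list enforced around the agent's tool calls, so the agent cannot loop on reflection or wander outside the task scope. The sketch below is illustrative, not a complete governance layer; all names are assumptions:

```python
# A minimal guardrail sketch: cap the number of tool calls an agent may make
# and restrict it to an allow-listed set of tools.
class ToolBudgetGuardrail:
    def __init__(self, allowed_tools: set[str], max_calls: int = 5):
        self.allowed_tools = allowed_tools
        self.max_calls = max_calls
        self.calls = 0

    def check(self, tool_name: str) -> None:
        if tool_name not in self.allowed_tools:
            raise PermissionError(f"Tool '{tool_name}' is outside the task scope")
        self.calls += 1
        if self.calls > self.max_calls:
            raise RuntimeError("Tool-call budget exhausted; escalate to a human")


guard = ToolBudgetGuardrail(allowed_tools={"search_tickets", "fetch_faq"}, max_calls=5)
guard.check("search_tickets")  # called before each tool invocation in the agent loop
```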
GenAI enables entirely new processes that were previously impossible or too resource-intensive for traditional AI or human-driven systems. These innovations include automated creative content generation, context-aware conversational agents, and intelligent document synthesis (e.g., creating detailed reports, legal contracts, or tailored marketing content based on minimal inputs). GenAI can also facilitate dynamic decision-making by generating and iterating on multiple solutions in real-time, which traditional AI systems cannot handle effectively without substantial human input.
However, this new potential presents a significant challenge: defining appropriate goals and success metrics. Organizations often struggle to set realistic objectives that take full advantage of GenAI’s strengths because they are anchored in conventional process thinking. Since GenAI can fundamentally change how work is performed, companies must redefine what success looks like and select goals that offer the highest ROI. Misunderstanding or underestimating these possibilities often leads to poorly chosen objectives and a failure to fully realize the impact GenAI could offer.
One of the major reasons GenAI projects fail is the lack of adequate context in prompts. While humans draw on years of experience, domain knowledge, and exposure to various settings to interpret ambiguous information, GenAI models rely solely on the data they’ve been trained on and the explicit details provided in prompts. When important context is missing, AI models may generate responses that are too vague, irrelevant, or even incorrect.
For instance, planning a weekly social media calendar requires tacit knowledge of what types of posts work best and which ones underperform within a specific industry domain. Without this background and context, GenAI systems may struggle to generate high-quality content, leading to generic, low-engagement posts. By incorporating tacit industry knowledge into the prompt, you can guide the AI to create more relevant and impactful content. You can learn how to include tacit knowledge in prompts for more effective results here.
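As an illustration, a prompt for the weekly social media calendar might fold that tacit knowledge in explicitly; the "what works" lines below are placeholders for what a practitioner in your industry would actually know:

```python
# Folding tacit domain knowledge into a prompt. The knowledge lines are
# illustrative placeholders, not real engagement data.
TACIT_KNOWLEDGE = """What we know works in our industry:
- How-to carousels and short checklists get the highest engagement.
- Posts published Tuesday-Thursday mornings outperform weekends.
- Pure promotional posts underperform; lead with a customer problem.
"""

prompt = f"""{TACIT_KNOWLEDGE}
Using the guidance above, draft a weekly social media calendar
(one post idea per weekday) for a B2B accounting software company.
For each post, give the format, the hook, and the best time to publish."""
```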