Planning an AI system calls for a structured approach that balances high-level conceptualization with practical implementation details. The first step in building a high-level roadmap is to clearly define your system's goals, inputs, and outputs. This foundation guides all subsequent development decisions and helps ensure your AI system addresses its intended purpose.
It's important to understand that this planning process is inherently iterative. In your first attempt, you'll likely encounter many unanswered questions and gaps in your understanding. This is normal and expected. The key is not to get bogged down by these uncertainties initially. Instead, focus on completing the following steps, even if there are gaps in your thought process. Your initial roadmap will be imperfect, but it will provide a good starting point. Plan to iterate over the planning phase multiple times: with each pass, your understanding of the problem will deepen and your design will improve. The important part is to create a basic plan despite your uncertainties.
When you feel stuck, focus on what you already know and can act upon. Keep moving forward with the aspects of your plan that are clear and defined. This approach maintains momentum and often leads to insights about the less certain parts of your project. As you work, keep a separate list of uncertainties, unknowns, and areas that need further investigation. This allows you to track uncertainties without letting them halt your progress. Regularly review and update this list as you gain new information. Also, consider temporarily reducing the scope of your project or outcome. Focus on creating a simplified version that captures the core essence of what you're trying to achieve. This "minimum viable product" approach allows you to make tangible progress and gain valuable insights. As you complete this scaled-down version, you'll develop a better understanding of the project's complexities and challenges. From there, you can gradually expand the scope, adding more components or advanced features in a controlled, iterative manner. Each iteration allows you to fill in more details, address previously identified gaps, and incorporate new insights.
Start by clearly articulating the main objective of your AI system, including the broad input it will process and the output it will generate. This step sets the foundation for your entire project.
Defining the Goal: What the AI system will accomplish.
When crafting your goal, ensure it's specific, actionable, and aligned with the overall problem you're trying to solve. A well-structured goal outlines the system's purpose or action and states its intended outcome. Consider the problem you're solving, who you're solving it for, and any key constraints. Here is a template to write the goal of your AI system:
Create an AI system that [performs specific function] on [inputs] and produces [output].
Defining the Input: What information the AI system will need to achieve that goal.
This process involves determining what data is relevant and available. Input data may be structured (e.g., spreadsheets, databases, CSV files) or unstructured (e.g., PDFs, Word documents, emails, websites, customer chats). Often you will need to extract features from the raw data, and that extraction can be one of the steps in your AI system plan. At this stage, you don't need to work out the feature engineering details; the scope is limited to understanding what data is needed, where it will come from, and whether it is accessible.
Your identified inputs should cover everything the AI system needs to perform its intended function effectively.
Defining the Output: What the AI system will produce, in both content and format.
This includes determining the output format, deciding on the required level of detail, and considering how end-users will utilize the information.
Plan for result interpretation and explanation, ensuring the output is actionable and understandable. The output definition should align closely with the system's goals and user needs, providing clear, relevant, and valuable information that directly addresses the problem the AI system is designed to solve.
Stakeholder Involvement:
Throughout this definition process, it's crucial to involve relevant stakeholders. This may include end-users, domain experts, managers, and other key decision-makers. Their input helps ensure that the system's goals, inputs, and outputs align with real-world needs and constraints. Stakeholders can provide valuable insights into:
1. The specific problems the AI system should address
2. The types of data available and any limitations in data access
3. The most useful forms of output for decision-making processes
4. Potential challenges or considerations in implementing the system
By involving stakeholders early in the planning process, you are more likely to build something useful, avoid rework, and deliver a better return on investment.
This initial planning stage sets the foundation for your entire AI project. By clearly defining your goals, inputs, and outputs—with input from key stakeholders—you create a solid framework that will guide your development process and help ensure your AI system meets its intended objectives.
Let's walk through a practical example of how to apply the principles discussed in this document. We'll consider a scenario where you've been tasked with creating a Generative AI system that takes a Microsoft Word document as input and generates a PowerPoint presentation from it. After consulting with your manager and relevant stakeholders, you've developed the following:
Goal: Create an AI system that creates PowerPoint slides for a given book chapter.
Input: A book chapter in MS Word format.
Output: PowerPoint slides for this book chapter.
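One lightweight way to keep this definition close to the eventual code is to record it as a small data structure. The sketch below is only an illustration: the `SystemSpec` class and its field names are assumptions introduced here, not part of the planning method itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemSpec:
    """Hypothetical container for the goal, input, and output of a planned system."""
    goal: str
    input_description: str
    output_description: str

    def summary(self) -> str:
        # Render the spec as the three labelled lines used in the text.
        return (
            f"Goal: {self.goal}\n"
            f"Input: {self.input_description}\n"
            f"Output: {self.output_description}"
        )


spec = SystemSpec(
    goal="Create an AI system that creates PowerPoint slides for a given book chapter",
    input_description="A book chapter in MS Word format",
    output_description="PowerPoint slides for this book chapter",
)
print(spec.summary())
```

Keeping the three items in one place makes it easier to revisit them on each planning pass.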
This process involves identifying the main steps that will guide your project from inception to completion. These steps should represent the major phases or milestones in your system's operation, providing a framework for more detailed planning later. There are a few effective approaches to identifying these steps, each offering unique advantages.
Back-to-start: One method is to begin with the end goal in mind and work backwards. Visualize what your completed AI system should accomplish, then reverse-engineer the process to identify the major components or processes needed to reach that objective. As you do this, consider the logical sequence of these steps. Ask yourself: What needs to happen first? Which steps depend on the completion of others?
Component Listing and Ordering: A second strategy involves listing out all potential components that you think may be needed, ordering them logically, and then refining the list by keeping or dropping components as needed. This more flexible, brainstorming-oriented approach can bring out creative solutions and help identify parallel processes or steps that don't necessarily fit into a strict linear sequence.
Gap analysis: Another valuable approach is to analyze the gap between the steps identified so far, or between the latest identified step and the input. This gap analysis method can help uncover missing intermediate steps, ensure a logical flow from input to output, and reveal potential challenges or complexities that weren't immediately apparent.
In practice, a combination of these approaches often yields the most robust and comprehensive planning process. By viewing the problem from different angles, you can potentially develop more innovative and effective solutions.
Regardless of the method used, it's crucial to note that at this stage, you should not worry about how to implement these components or processes. The focus is solely on identifying what steps need to happen, not on determining how to make those steps happen. Implementation details will come later in the planning process. For now, concentrate on creating a high-level overview of the necessary stages in your AI system's development.
Continuing with the example, let's walk through the thought process of deriving our main steps:
By walking through this thought process, we've arrived at a logical sequence of steps:
extractContentFromWordDoc
extractTopics
extractContentSegment
generateSlideContent
generatePPT
After identifying the main steps in your AI system, the next crucial task is to determine the inputs required, outputs produced, and repetition structure for each step. This process helps you understand the flow of data through your system and identify dependencies between steps, creating a clear data pipeline. It also helps with discovering any missing processes.
For each step, specify the inputs it requires, the outputs it produces, and whether it needs to be repeated.
When documenting this information, use a structured format. For example:
step1Output = step1Name(input1, input2)
step2Output = step2Name(step1Output, input3)
Note that inputs for later steps often include outputs from previous steps, illustrating the data flow through your system.
Handling Repetition:
Be mindful of potential limitations in processing large amounts of data, especially when working with language models. You may need to break large content into portions and then process each portion through the same set of subsequent steps. To account for this, indicate which steps need to be repeated and for which entities. Use indentation (shifting text to the right) to show steps that are repeated, and leave non-repeated steps unindented.
Let's apply this step to our ongoing example of creating an AI system that generates a PowerPoint presentation from a Word document. We'll break down each step, identifying its inputs, outputs, any repetition, and importantly, the intuition behind each step:
content = extractContentFromWordDoc(wordDocFilePath)
topics = extractTopics(content)
For each topic in topics:
    contentSegment = extractContentSegment(topic, content)
    slides = generateSlideContent(topic, contentSegment)
ppt = generatePPT(slides)
1. `extractContentFromWordDoc`:
- Input: `wordDocFilePath` (the file path of the Word document)
- Output: `content` (the extracted text content from the Word document)
- Repetition: Performed once for the entire document
- Intuition: We start by extracting the raw text from the Word document. This step is crucial because it converts the potentially complex Word format into plain text that our AI system can more easily process. It's the foundation for all subsequent steps.
2. `extractTopics`:
- Input: `content` (the extracted text from the previous step)
- Output: `topics` (a list or structure of main topics identified in the content)
- Repetition: Performed once for the entire content
- Intuition: By identifying the main topics, we create a high-level structure for our presentation. This step mimics how a human might skim a document to understand its main points before creating slides. It helps ensure our final presentation will be well-organized and cover all key areas.
3. `For each topic in topics:`:
- Intuition: This loop allows us to process each topic individually, which is crucial for managing complexity. Instead of trying to create an entire presentation at once (which could overwhelm our AI), we break it down into more manageable topic-sized chunks. This approach aligns with how humans typically create presentations, focusing on one section at a time.
4. `extractContentSegment`:
- Inputs: `topic` (a single topic from the list of topics), `content` (the full text content)
- Output: `contentSegment` (the portion of content relevant to the current topic)
- Repetition: Repeated for each topic
- Intuition: This step is about focusing on relevant information. For each topic, we extract only the content that's pertinent. This helps manage the amount of text our AI needs to process at once, reducing the risk of information overload and improving the relevance of generated slides.
5. `generateSlideContent`:
- Inputs: `topic` (the current topic), `contentSegment` (the relevant content for this topic)
- Output: `slides` (the content for slides related to this topic)
- Repetition: Repeated for each topic
- Intuition: Here's where the AI creates the actual slide content, i.e., the slide titles and their bullet points. By working with one topic and its relevant content at a time, we allow the AI to focus deeply on each section of the presentation. This approach helps ensure that each set of slides is coherent and properly represents its topic.
6. `generatePPT`:
- Input: `slides` (all the slide content generated from the previous steps)
- Output: `ppt` (the final PowerPoint presentation)
- Repetition: Performed once, after all slides have been generated
- Intuition: This final step compiles all our generated content into a cohesive PowerPoint presentation.
This structure effectively breaks down the complex task of creating a presentation into more manageable steps. The process mimics how a human might approach the task: first understanding the overall content, then identifying main topics, focusing on one topic at a time to create relevant slides, and finally compiling everything into a complete presentation.
By using a loop to process topics individually, we address the potential limitation of handling large amounts of data. This approach helps manage the workload for our AI system, potentially improving the accuracy and relevance of the generated slides.
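The data flow described above can be sketched as an ordinary Python function. This is a minimal illustration rather than an implementation: each step is passed in as a callable so the wiring can be exercised with stand-ins, the snake_case names are assumptions, and accumulating the slides of every topic into one list is also an assumption about how the per-topic results are combined.

```python
from typing import Callable, List


def run_pipeline(
    word_doc_path: str,
    extract_content: Callable[[str], str],
    extract_topics: Callable[[str], List[str]],
    extract_segment: Callable[[str, str], str],
    generate_slides: Callable[[str, str], List[str]],
    generate_ppt: Callable[[List[str]], str],
) -> str:
    """Wire the five steps together; every step is supplied by the caller."""
    content = extract_content(word_doc_path)   # performed once
    topics = extract_topics(content)           # performed once
    all_slides: List[str] = []
    for topic in topics:                       # repeated for each topic
        segment = extract_segment(topic, content)
        all_slides.extend(generate_slides(topic, segment))
    return generate_ppt(all_slides)            # performed once, after the loop


# Trivial stand-ins, used only to show the flow of data between steps.
result = run_pipeline(
    "chapter.docx",
    extract_content=lambda path: "Intro text. Methods text.",
    extract_topics=lambda content: ["Intro", "Methods"],
    extract_segment=lambda topic, content: f"{topic} text.",
    generate_slides=lambda topic, segment: [f"{topic}: {segment}"],
    generate_ppt=lambda slides: " | ".join(slides),
)
print(result)
```

Passing the steps in as parameters keeps the skeleton independent of how each step is eventually implemented.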
After identifying the main steps, their inputs, outputs, and repetition structure, the next crucial task is to assign appropriate tool types to each step. This process helps bridge the gap between high-level planning and implementation, allowing you to think about which steps can be accomplished through coding versus those that require AI models or other specialized tools.
For our ongoing example of creating an AI system that generates a PowerPoint presentation from a Word document, let's assign tool types to each step:
content = py_extractContentFromWordDoc(wordDocFilePath)
topics = llm_extractTopics(content)
For each topic in topics:
    contentSegment = py_extractContentSegment(topic, content)
    slides = llm_generateSlideContent(topic, contentSegment)
ppt = py_generatePPT(slides)
Let's break down each step with its assigned tool type and the rationale behind the choice:
1. `py_extractContentFromWordDoc`:
- Tool Type: Python function (py_)
- Rationale: Extracting text from a Word document is a well-defined task that can be efficiently handled by existing Python libraries like python-docx. This doesn't require the complexity of an AI model and is better suited for a straightforward Python script.
2. `llm_extractTopics`:
- Tool Type: Language Model function (llm_)
- Rationale: Identifying main topics from a body of text requires understanding context and content, which is well-suited for a language model. This task benefits from the natural language processing capabilities of an LLM.
3. `py_extractContentSegment`:
- Tool Type: Python function (py_)
- Rationale: Once we have the topics and the full content, extracting relevant segments might seem like a straightforward text processing task. However, a significant challenge arises: topics can appear multiple times throughout the content, and a simple text-matching script wouldn't be able to accurately determine where each topic segment begins and ends. To address this, we can enhance our approach by requesting additional information from the LLM in the previous step. Specifically, we can ask the LLM to provide not just the topics, but also markers (such as the starting lines) for each topic. This additional context allows us to precisely identify where each topic segment begins, greatly simplifying the extraction process. To implement this improved approach, we need to modify our workflow slightly. Here's how the revised flow would look:
content = py_extractContentFromWordDoc(wordDocFilePath)
topicsAndMarkers = llm_extractTopicsAndMarkers(content)
For each {topic, marker} in topicsAndMarkers:
    markerEnd = get marker for the next topic in topicsAndMarkers
    contentSegment = py_extractContentSegment(marker, markerEnd, content)
    slides = llm_generateSlideContent(topic, contentSegment)
ppt = py_generatePPT(slides)
{topic, marker} in topicsAndMarkers:
topicsAndMarkers can be a list or array of tuples, where each tuple contains two elements: a topic and its corresponding marker (starting line). The curly braces {} in the for-loop syntax indicate tuple unpacking, so each iteration receives one topic together with its marker.
markerEnd is the marker of the next topic in the topicsAndMarkers list.
For the last topic there is no next marker, so markerEnd can be none and the segment runs to the end of the content.
4. `llm_generateSlideContent`:
- Tool Type: Language Model function (llm_)
- Rationale: Creating concise, relevant slide content from a segment of text requires understanding and summarizing information, which is a strength of language models. This step benefits from the natural language generation capabilities of an LLM.
5. `py_generatePPT`:
- Tool Type: Python function (py_)
- Rationale: Creating a PowerPoint file from structured slide content is a well-defined task that can be handled efficiently by Python libraries like python-pptx. This is more about file manipulation than natural language processing, making it suitable for a Python script.
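The rationale for step 1 names python-docx as a suitable library. To show what that step involves without any installation, the sketch below swaps in a standard-library approach instead: a .docx file is a ZIP archive whose main text is stored in `word/document.xml`. The function name mirrors the pseudo-code step but is otherwise an assumption, and the sketch reads only the main document body, not headers, footers, or footnotes.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_content_from_word_doc(source) -> str:
    """Return the paragraph text of a .docx file, one paragraph per line.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for paragraph in root.iter(f"{W_NS}p"):
        # A paragraph's text is spread over one or more <w:t> elements.
        text = "".join(node.text or "" for node in paragraph.iter(f"{W_NS}t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)
```

For real documents a dedicated library is usually the more robust choice, since it also handles styles, tables, and other structure that this sketch flattens.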
By assigning these tool types, we've created a more detailed roadmap for implementation. This approach allows us to leverage the strengths of different tools: using Python for well-defined, programmatic tasks and language models for tasks requiring natural language understanding and generation.
This step in the planning process helps identify which parts of the system will require AI model integration and which parts can be handled by more traditional programming approaches. It provides a clearer picture of the technical requirements for each step and helps in resource allocation and task delegation during the implementation phase.
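Because the tool type is encoded in each step's prefix, the split between traditional code and model calls can be read off the plan mechanically. The following sketch assumes the `py_` and `llm_` prefixes used above; the `group_steps_by_tool` helper and the labels in `TOOL_TYPES` are hypothetical names chosen for illustration.

```python
import re
from collections import defaultdict

# Prefix -> human-readable tool type (labels are illustrative).
TOOL_TYPES = {"py": "Python function", "llm": "Language model call"}

# Matches names such as py_generatePPT( and captures the prefix and step name.
STEP_PATTERN = re.compile(r"\b(" + "|".join(TOOL_TYPES) + r")_(\w+)\(")


def group_steps_by_tool(pseudo_code: str) -> dict:
    """Map each tool type to the step names that carry its prefix."""
    groups = defaultdict(list)
    for prefix, name in STEP_PATTERN.findall(pseudo_code):
        label = TOOL_TYPES[prefix]
        if name not in groups[label]:  # list each step only once
            groups[label].append(name)
    return dict(groups)


plan = """
content = py_extractContentFromWordDoc(wordDocFilePath)
topics = llm_extractTopics(content)
For each topic in topics:
    contentSegment = py_extractContentSegment(topic, content)
    slides = llm_generateSlideContent(topic, contentSegment)
ppt = py_generatePPT(slides)
"""
print(group_steps_by_tool(plan))
```

A grouping like this gives a quick count of how many steps depend on a language model, which can inform resource allocation.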
Remember, as emphasized in the document, this planning process is iterative. As you delve deeper into implementation, you may find that some tool type assignments need to be adjusted. The key is to maintain flexibility while progressively refining your plan based on new insights and challenges encountered during development.
Repeat steps 2-4 iteratively, refining your plan each time. Planning an AI system is rarely a linear process. This step encourages you to review and refine your plan multiple times, each iteration bringing more clarity and detail to your system design. During each iteration:
Don't be afraid to make significant changes if you identify better approaches. The goal is to have a comprehensive, well-thought-out plan before you start implementation.
Once your design is finalized, write prompt templates for the LLM-type steps. When writing a prompt template, you will likely use many, if not all, of the inputs you identified for the step. Pay close attention to the step's objective and desired output as well. These elements should guide the structure and content of your prompt template.
A critical consideration in prompt engineering is managing complexity. While modern LLMs have impressive reasoning capabilities, their performance can degrade when faced with overly complex prompts requiring multiple operations. Hence, it's essential to take an iterative approach to prompt design. Test your prompts on an LLM interface to gauge their effectiveness. This hands-on testing allows you to quickly identify whether you need to break a step further into smaller, simpler sub-steps.
Because a topic can recur in the document, the topic name by itself is not a reliable marker for dividing the document into portions. Instead, we use the opening words of each key topic's section as its marker (the prompt below asks for the first ten words). The opening words of the next key topic mark the end of the section for the current key topic.
Analyze the given document and extract key topics, following these guidelines:
1. Key Topic Identification:
- Topics should represent major sections or themes in the document.
- Each key topic should be substantial enough for at least one slide with 3-5 bullet points, potentially spanning multiple slides.
- Topics should be broad enough to encompass multiple related points but specific enough to avoid overlap.
- Identify topics in the order they appear in the document.
- Consider a new topic when there's a clear shift in the main subject, signaled by transitional phrases, new headings, or a distinct change in content focus.
- If a topic recurs, don't create a new entry unless it's substantially expanded upon.
2. Key Topic Documentation:
- For each key topic, create a detailed name that sums up the idea of the section or theme it represents.
- Next, provide the first ten words of the section that the key topic represents.
3. Provide the output in the following format:
**key topic 1**
first ten words of the section or theme that the key topic 1 represents
**key topic 2**
first ten words of the section or theme that the key topic 2 represents
Document to analyze:
'''
{{content}}
'''
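The output format requested in this prompt is regular enough to parse with plain string handling. The sketch below shows one way to turn such a response into a list of records; the function name and the `topic` and `marker` keys are assumptions, and it presumes the model followed the format (a `**topic**` line followed by a marker line).

```python
import json
import re


def convert_text_to_json(llm_output: str) -> list:
    """Parse '**topic**' lines, each followed by a marker line, into dicts."""
    entries = []
    current = None
    for raw_line in llm_output.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        heading = re.fullmatch(r"\*\*(.+?)\*\*", line)
        if heading:
            current = {"topic": heading.group(1).strip(), "marker": ""}
            entries.append(current)
        elif current is not None and not current["marker"]:
            # The first non-empty line after a heading is taken as the marker.
            current["marker"] = line
    return entries


sample = (
    "**Planning Basics**\n"
    "A well-structured approach is essential when planning an AI system\n"
    "\n"
    "**Defining Inputs**\n"
    "This process involves determining what data is relevant and available\n"
)
print(json.dumps(convert_text_to_json(sample), indent=2))
```

Model responses do not always follow the requested format exactly, so checking the parsed result (for example, that every entry has a non-empty marker) before using it is a sensible precaution.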
You will be given a key topic and a document portion that provides detail about the key topic. Your task is to create slides based on the document portion. Follow these steps:
1. Read the document portion provided for the key topic.
2. Analyze this section and create slides with titles and bullet points.
Guidelines:
- The number of slides can be as few as one and as many as 10, depending on the amount of non-repetitive information in the relevant section of the key topic.
- Present slides in the order that the information appears in the document.
- Each slide should have 4-6 concise bullet points, each containing a single key idea or fact.
- Use concise phrases or short sentences for bullet points, focusing on conveying key information clearly and succinctly.
- If information seems relevant to multiple topics, include it in the current topic's slides, as it appears first in the document.
- Avoid redundancy across slides within the same key topic.
Output Format:
**paste slide title here**
paste point 1 here
paste point 2 here
paste point 3 here
Inputs:
Key Topic: '''{{topic}}'''
Document portion:'''
{{contentSegment}}
'''
Please create slides based on the document portion, following the guidelines provided. Ensure that the slides comprehensively cover the key topic without unnecessary repetition.
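Both templates mark their inputs with double curly braces, such as `{{topic}}` and `{{contentSegment}}`. Filling them in is a small string-substitution task, sketched below. The `generate_prompt` name and its dictionary argument are assumptions; raising an error for a placeholder without a value is a design choice made here so that an incomplete prompt is never sent silently.

```python
import re


def generate_prompt(template: str, variables: dict) -> str:
    """Replace each {{name}} placeholder with the matching value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"No value supplied for placeholder '{name}'")
        return str(variables[name])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


template = "Key Topic: '''{{topic}}'''\nDocument portion:'''\n{{contentSegment}}\n'''"
prompt = generate_prompt(
    template,
    {
        "topic": "Defining Inputs",
        "contentSegment": "Input data may be structured or unstructured.",
    },
)
print(prompt)
```

Using a replacement function rather than a replacement string means backslashes in the document text are inserted literally instead of being interpreted as escapes.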
We will need to process the response generated by llm_extractTopicsAndMarkers and convert it into JSON format for subsequent steps. We will also save the LLM outputs locally, which helps us evaluate them.
content = py_extractContentFromWordDoc(wordDocFilePath)
extractTopicsMarkersPrompt = py_generatePrompt(extractTopicsMarkersPromptTemplate, vars={content})
topicsAndMarkers = llm_extractTopicsAndMarkers(extractTopicsMarkersPrompt)
py_saveFile(topicsAndMarkersFilePath, topicsAndMarkers)
topicsMarkersJson = py_convertTextToJson(topicsAndMarkers)
For each i, {topic, marker} in topicsMarkersJson:
    startPosition = py_getMarkerPosition(marker, content)
    If i is the last index, endPosition = end of content; otherwise:
        {marker: nextMarker} = topicsMarkersJson[i+1]
        endPosition = py_getMarkerPosition(nextMarker, content)
    contentSegment = py_extractContentSegment(startPosition, endPosition, content)
    topicWithContent = topicWithContent + "\n\n**" + topic + "**\n" + contentSegment
    generateSlideContentPrompt = py_generatePrompt(generateSlideContentPromptTemplate, vars={topic, contentSegment})
    slides = slides + llm_generateSlideContent(generateSlideContentPrompt)
py_saveFile(topicWithContentFilePath, topicWithContent)
py_saveFile(slideContentFilePath, slides)
ppt = py_generatePPT(slides)
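The marker lookup and segment extraction inside the loop can be sketched as follows. This is an illustration under two assumptions that real model output may not satisfy: each marker appears verbatim in the content, and the markers occur in document order. The function names and the `topic` and `marker` keys are hypothetical.

```python
from typing import List, Optional, Tuple


def get_marker_position(marker: str, content: str, start: int = 0) -> Optional[int]:
    """Return the index of `marker` in `content` at or after `start`, or None."""
    position = content.find(marker, start)
    return position if position != -1 else None


def split_by_markers(entries: List[dict], content: str) -> List[Tuple[str, str]]:
    """Pair each topic with the text from its marker up to the next marker."""
    starts = []
    cursor = 0
    for entry in entries:
        position = get_marker_position(entry["marker"], content, cursor)
        if position is None:
            raise ValueError(f"Marker not found for topic '{entry['topic']}'")
        starts.append(position)
        cursor = position + len(entry["marker"])  # search later markers after this one
    segments = []
    for i, entry in enumerate(entries):
        # The last topic has no next marker, so its segment runs to the end.
        end = starts[i + 1] if i + 1 < len(entries) else len(content)
        segments.append((entry["topic"], content[starts[i]:end].strip()))
    return segments


content = "Alpha starts here. More alpha. Beta starts here. More beta."
entries = [
    {"topic": "A", "marker": "Alpha starts"},
    {"topic": "B", "marker": "Beta starts"},
]
print(split_by_markers(entries, content))
```

Searching for each marker only after the previous one guards against a marker phrase that also occurs earlier in the document.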
For each step, describe the key operations that need to be performed. This step helps you dive deeper into each broad step, bridging the gap between high-level steps and specific implementation. By outlining the operations within each step, you could potentially use LLMs to write the complete code. Depending on how well you describe the operations, your code accuracy could be more than ninety percent. Having said that, debugging and integrating code in the broader multi-tier architecture requires coding experience.
For implementation of PowerPoint creation:
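As one possible illustration of the operations involved, the sketch below first parses slide text in the output format of the slide prompt (a `**title**` line followed by bullet lines) and then writes the slides with python-pptx, the library named earlier. The function names and the slide dictionary shape are assumptions, and the layout index relies on python-pptx's default template, where index 1 is the "Title and Content" layout.

```python
import re
from typing import Dict, List


def parse_slides(slide_text: str) -> List[Dict]:
    """Turn '**title**' lines followed by bullet lines into slide dicts."""
    slides: List[Dict] = []
    for raw_line in slide_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        heading = re.fullmatch(r"\*\*(.+?)\*\*", line)
        if heading:
            slides.append({"title": heading.group(1).strip(), "bullets": []})
        elif slides:
            # Drop one leading list marker ("-", "*", or "•") if the model added it.
            slides[-1]["bullets"].append(re.sub(r"^[-*•]\s+", "", line))
    return slides


def generate_ppt(slides: List[Dict], output_path: str) -> str:
    """Write the slides to a .pptx file (requires: pip install python-pptx)."""
    from pptx import Presentation  # imported lazily so parsing works without it

    presentation = Presentation()
    layout = presentation.slide_layouts[1]  # "Title and Content" in the default template
    for slide_data in slides:
        slide = presentation.slides.add_slide(layout)
        slide.shapes.title.text = slide_data["title"]
        text_frame = slide.placeholders[1].text_frame
        for index, bullet in enumerate(slide_data["bullets"]):
            # A text frame starts with one empty paragraph; reuse it for the first bullet.
            paragraph = text_frame.paragraphs[0] if index == 0 else text_frame.add_paragraph()
            paragraph.text = bullet
    presentation.save(output_path)
    return output_path
```

Separating parsing from file creation means the parsing logic can be checked on saved LLM outputs without producing a file each time.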
Let us understand the implementation with the following example:
Goal: Create an AI system that analyzes customer reviews for products to extract sentiment and key features, providing insights for product improvement and customer satisfaction.
Input: A list of products and their associated customer reviews.
Output: A summary report for each product containing sentiment analysis, key features mentioned, and overall insights.
collectReviews
extractInfo
analyzeUsingML
generateReport
For each product in products:
    reviews = collectReviews(product)
    For each review in reviews:
        featuresAndSentiments = extractInfo(review)
    mlModelResults = analyzeUsingML(featuresAndSentiments for all reviews)
finalReport = generateReport(mlModelResults for all products)
For each product in products:
    reviews = api_collectReviews(product)
    For each review in reviews:
        featuresAndSentiments = llm_extractInfo(review)
mlModelResults = py_analyzeUsingML(featuresAndSentiments for all products and reviews)
finalReport = llm_generateReport(mlModelResults)
The following prompt will help you evaluate the model's understanding of your AI system implementation plan. Insert the outputs from Step 1 and Step 4 in the designated areas within the prompt.
Analyze the provided goal, inputs, output, and pseudo-code for an AI system. Generate explanations by following the steps given below:
1) function list: Get all the functions mentioned in the pseudo-code. Function names in the pseudo-code have prefixes such as py_, llm_, and ml_. The prefixes are defined as follows:
py_<function>: indicates that this function is a Python method
llm_<function>: indicates that this function makes an API call to an LLM
ml_<function>: indicates that this function calls a machine learning model
2) pseudo-code explanation: Explain the pseudo-code and its flow.
3) function explanations: Generate explanation for each function in the pseudo-code covering detail: a. Expected input parameters b. Expected output c. list of operations that need to be done in the function.
Output Format: Provide the explanations in the following structure:
*pseudo-code explanation
<pseudo-code explanation>
**<function 1>
<function 1 explanation>
**<function 2>
<function 2 explanation>
Goal: <paste goal here>
Inputs: <paste inputs here>
Output: <paste output here>
pseudo code: """
<paste pseudo code>
"""