
Author: Legacy Edge Team

  • How to Prepare Sales Data for an AI Pricing Model


    Introduction: Why Clean, Well-Prepared Data Is the Secret Ingredient in AI Pricing

    As distributors across every industry look to gain a competitive edge, AI-powered pricing models are becoming one of the most powerful tools available. These models can uncover hidden patterns in historical transactions, predict customer sensitivity to price changes, and recommend optimized prices that protect margin while staying competitive.

    But before an algorithm can learn anything, it needs clean, well-structured data. Most distributors already sit on a goldmine of information — product catalogs, customer order histories, cost data, and supplier terms — yet these valuable records are often scattered, inconsistent, or incomplete. That’s why the first step in building a successful model is learning how to prepare sales data for an AI pricing model.

    In this article, we’ll walk through how to collect, clean, and enhance your existing sales and operations data so it’s ready for machine learning. By the end, you’ll understand which data sources matter most, how to transform them into model-ready inputs, and how better data can translate directly into smarter, more profitable pricing decisions.

    The Data Distributors Already Have (and Why It’s a Goldmine)

    The good news for most distributors is that the foundation for an AI pricing model is already sitting in your systems — it just needs to be unlocked. Every quote and sales order tells a story about what your customers value, what they’re willing to pay, and how your prices perform in the market. By gathering this information into a clean, structured dataset, you can train a machine learning model to detect patterns that no human could ever spot at scale.

    Here are some of the most valuable types of data that distributors already possess:

    • Sales transactions: Each line item — product, gross profit margin, and customer — forms the backbone of your training data. These records show how price interacts with real-world buying behavior.
    • Product information: Descriptions, SKUs, product categories and groups, stock or non-stock status, and cost data help the model understand relationships between products and margin structures.
    • Customer data: Attributes such as industry, region, customer tier and/or customer type (e.g., contractor vs. OEM) allow the model to personalize pricing recommendations.
    • Supplier and cost data: Fluctuating supplier prices and terms can be key variables when predicting optimal selling prices.
    • Historical quotes and win/loss data: This often-overlooked data is extremely valuable for understanding price sensitivity and competitive dynamics. Valuable information includes quoted lead times, quantity on hand at the time of quote, quantity quoted, outcome (won or lost) and salesperson.
    • Seasonality and time-based data: Sales patterns by month, quarter, or season help the model adjust for demand cycles.

    When all of this information is combined, it becomes a dynamic pricing engine waiting to happen. The next step is ensuring that this data is clean, consistent, and machine-readable — which is where the real work (and value) begins.
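
    As a concrete illustration, here is a minimal pandas sketch of pulling these sources together into one flat table. The file names and join keys (sales.csv, products.csv, customers.csv, product_id, customer_id) are hypothetical placeholders; map them to your own ERP and CRM extracts.

    import pandas as pd

    # Hypothetical extracts from your ERP/CRM; adjust names and keys to your schema
    sales = pd.read_csv("sales.csv")          # transactions: product_id, customer_id, qty, price, cost
    products = pd.read_csv("products.csv")    # product_id, category, stock_status
    customers = pd.read_csv("customers.csv")  # customer_id, tier, region, customer_type

    # Join everything into a single table keyed on product and customer
    df = (sales
          .merge(products, on="product_id", how="left")
          .merge(customers, on="customer_id", how="left"))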

    Cleaning and Normalizing: Turning Raw Data into Model-Ready Input

    Raw sales data, no matter how rich, is rarely ready for machine learning. It’s full of duplicates, missing fields, inconsistent formats, and outdated records. Before your AI model can recognize pricing patterns, it first has to trust the data it’s being trained on. That’s why data cleaning and normalization are so critical — they transform messy sales records into a structured, reliable dataset that a pricing algorithm can actually learn from.

    Here are the most important steps in preparing distributor data for an AI pricing model:

    • Remove duplicates and errors. Repeated invoice lines or miskeyed prices can distort model training. Even a few outliers can cause the model to learn incorrect pricing relationships.
    • Handle missing or incomplete data. When costs, quantities, customer category, product category or dates are missing, use business logic or statistical methods to fill gaps — or remove unusable rows entirely. Consistency is more important than volume.
    • Fix or remove records with invalid data (prices or costs at or below zero, negative lead-time days, negative quantities, etc.). Exclude records for internal transactions (for example, quotes or sales tied to internal transfers).
    • Normalize units and currencies. Distributors often sell the same product in different units (e.g., cases, boxes, or singles). Convert all transactions into a common base unit and currency so the model can compare apples to apples.
    • Align product and customer identifiers. Standardize SKUs, product categories, and customer IDs across all systems (ERP, CRM, quoting tools). A single, unified key for each entity prevents confusion during model training.
    • Tokenize categorical data. Many AI models can’t directly read text fields like “Region = Midwest” or “Customer Type = Contractor.” Tokenization — assigning numeric or encoded values — allows these labels to become usable inputs.
    • Group numerical fields into bins. Continuous fields such as “quantity sold” or “order size” can be bucketed into ranges (e.g., 1–10, 11–50, 51–100) to help the model identify threshold effects, such as volume-based discount behavior.
    • Detect and treat outliers. An occasional “$0.01” sale or “10,000-unit” order can throw off training results. Flag and investigate these before feeding them into your model.
    • Remove any quotes or sales with pre-determined pricing (for example, sales based on pre-agreed contracts or price sheets).
    • Remove any quotes for items that have never been won, as the model might price these items too aggressively.
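
    To make a few of these steps concrete, here is a minimal pandas sketch. The column names (price, cost, qty, customer_category) are assumptions; substitute your own schema.

    # Drop exact duplicate rows
    df = df.drop_duplicates()

    # Remove records with invalid values: prices/costs at or below zero, negative quantities
    df = df[(df["price"] > 0) & (df["cost"] > 0) & (df["qty"] > 0)]

    # Fill a missing categorical field with an explicit "unknown" token
    df["customer_category"] = df["customer_category"].fillna("unknown")

    # Flag extreme order sizes (beyond the 99.9th percentile) for manual review
    qty_cap = df["qty"].quantile(0.999)
    outliers = df[df["qty"] > qty_cap]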

    By the end of this stage, your raw data becomes a standardized, trustworthy foundation. Only then can it reveal the true signals behind pricing performance — signals that a well-trained AI model can amplify into real margin improvement.

    Feature Engineering for Better Predictions

    Once your data is clean and consistent, the next step is to make it more informative. Feature engineering is the process of transforming raw data into new variables (or “features”) that help your AI pricing model recognize the subtle factors influencing customer behavior.

    Think of it as giving the model more context — the same way an experienced sales rep instinctively knows that a contractor ordering 1,000 units in May behaves differently from a retail customer ordering ten units in December.

    Here are some practical ways distributors can enhance their datasets through feature engineering:

    • Create ratio-based features. Calculating fields such as margin percentage, discount from list price, or average revenue per customer helps the model see relationships that aren’t obvious in raw sales data.
    • Add time-based context. Derived features like days since last purchase, month of year, or season capture repeat buying patterns and seasonal demand.
    • Segment by customer and product attributes. Creating flags or encoded values such as “key account”, “preferred supplier”, or “new product launch” gives the model behavioral cues.
    • Aggregate transactional history. Summarizing data into higher-level metrics — like average order size or total spend per quarter — helps smooth out noise and reveal long-term trends.
    • Use tokenized and bucketed fields. Earlier steps like tokenizing categories or binning order quantities now become the building blocks for modeling how price elasticity changes across segments.
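
    The sketch below shows what a few of these derivations might look like in pandas, again assuming hypothetical column names (price, cost, qty, quote_date, region):

    # Ratio-based feature: margin percentage
    df["margin_pct"] = (df["price"] - df["cost"]) / df["price"] * 100

    # Time-based context
    df["quote_date"] = pd.to_datetime(df["quote_date"])
    df["month"] = df["quote_date"].dt.month

    # Bucket quantities into ranges so the model can learn threshold effects
    df["qty_bucket"] = pd.cut(df["qty"], bins=[0, 10, 50, 100, float("inf")],
                              labels=["1-10", "11-50", "51-100", "100+"])

    # Tokenize a categorical field into numeric codes
    df["region_token"] = df["region"].astype("category").cat.codes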

    Good feature engineering transforms your sales database from a record of past transactions into a simulation of your market dynamics. When these enhanced features are used to train your AI pricing model, it doesn’t just learn what happened — it begins to infer why.

    From Raw Data to Model-Ready: An Example Schema

    To make this more tangible, let’s look at how typical distributor data evolves from raw quotes to model-ready training data.

    Typical raw quote data:

    Quote Id | Quote Date | Customer Id | Product Id | Qty Quoted | UoM  | Quoted GP | SalesPerson Id
    1001     | 2029-01-02 | CUST-001   | PROD-010   | 100        | EA   | 13.5%     | SALES-100
    1002     | 2029-01-02 | CUST-002   | PROD-010   | 1          | CASE | 13.2%     | (missing)

    Normalize and populate missing data:

    Quote Id | Quote Date | Customer Id | Product Id | Qty Quoted | UoM | Quoted GP | SalesPerson Id
    1001     | 2029-01-02 | CUST-001   | PROD-010   | 100        | EA  | 13.5%     | SALES-100
    1002     | 2029-01-02 | CUST-002   | PROD-010   | 250        | EA  | 13.2%     | SALES-104

    Link related sales orders to identify won and lost quotes:

    Quote Id | Quote Date | Customer Id | Product Id | Qty Quoted | UoM | Quoted GP | SalesPerson Id | Outcome
    1001     | 2029-01-02 | CUST-001   | PROD-010   | 100        | EA  | 13.5%     | SALES-100      | lost
    1002     | 2029-01-02 | CUST-002   | PROD-010   | 250        | EA  | 13.2%     | SALES-104      | won

    Add additional details about the customer, product, etc:

    Quote Id | Quote Date | Customer Id | Product Id | Qty Quoted | UoM | Quoted GP | SalesPerson Id | Outcome | Customer Tier | Customer Region | Product Category
    1001 | 2029-01-02 | CUST-001 | PROD-010 | 100 | EA | 13.5% | SALES-100 | lost | 1 | West | Fasteners
    1002 | 2029-01-02 | CUST-002 | PROD-010 | 250 | EA | 13.2% | SALES-104 | won | 3 | North | Bolts

    Enhance & Engineer the data:

    Quote Id | Quote Date | Customer Id | Product Id | Qty Quoted | UoM | Quoted GP | SalesPerson Id | Outcome | Customer Tier | Customer Region | Product Category | Qty Bucket | Discount from List | Month
    1001 | 2029-01-02 | CUST-001 | PROD-010 | 100 | EA | 13.5% | SALES-100 | lost | 1 | West | Fasteners | 0-100 | 5% | 1
    1002 | 2029-01-02 | CUST-002 | PROD-010 | 250 | EA | 13.2% | SALES-104 | won | 3 | North | Bolts | 101-500 | 12% | 1

    At this stage:

    • Row-level ratios like margin% and discount% are new columns in the same table.
    • Aggregated metrics (e.g., customer lifetime value) may be calculated separately and merged back in by key (e.g., customer_id).
    • Tokenized fields allow categorical data to be processed numerically.
    • Bucketed fields (like quantity ranges) help the model learn threshold effects such as volume discounts.

    The result is a flattened, model-ready table where each row represents one transaction, but each column encodes valuable business knowledge. This is what modern AI pricing models are trained on — a single, rich, structured dataset that reflects both transactional detail and business context.
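
    For instance, the aggregated metrics mentioned above might be computed and merged back in by key roughly like this (a sketch using the column names from the example schema):

    # Summarize per-customer history, then join it back onto each quote row
    cust_stats = (df.groupby("customer_id")
                    .agg(avg_qty_quoted=("qty_quoted", "mean"),
                         total_quotes=("quote_id", "count"))
                    .reset_index())
    df = df.merge(cust_stats, on="customer_id", how="left")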

    Enhancing Data with External Context

    Even the cleanest internal dataset can only describe what’s already happened inside your business. To train an AI pricing model that reacts to the market, not just your history, you’ll want to enrich your data with external signals. These contextual factors help the model recognize the why behind pricing shifts—things like seasonality, supplier volatility, or regional demand patterns.

    Here are a few powerful types of external data you can integrate:

    • Market and commodity indexes. For distributors whose costs depend on raw materials (steel, copper, resin, etc.), linking supplier prices to public commodity indexes gives the model a real-world cost baseline.
    • Freight and logistics costs. Adding average freight rates or fuel costs by region can help the model understand variations in delivered pricing and margin erosion.
    • Economic indicators. Regional GDP growth, interest rates, or housing starts can all influence industrial demand. Including these variables lets your model anticipate pricing pressure before it shows up in sales data.
    • Weather and seasonality. For sectors tied to climate (HVAC, landscaping, construction materials), temperature or precipitation data can reveal when demand spikes are most likely.
    • Competitor or market pricing. Even limited competitive intelligence—such as average market prices from a benchmarking service—helps the model learn where your price points sit in context.
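
    As one example of the first point, a daily commodity index can be attached to each quote by date with a time-based join. This is a sketch only; the file copper_index.csv and its date and index_value columns are assumptions.

    import pandas as pd

    copper = pd.read_csv("copper_index.csv", parse_dates=["date"]).sort_values("date")
    df = df.sort_values("quote_date")

    # Attach the most recent index value on or before each quote date
    df = pd.merge_asof(df, copper, left_on="quote_date", right_on="date", direction="backward")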

    When combined with your cleaned and engineered sales data, these external signals transform your pricing model from a reactive tool into a forward-looking one. The model can then spot correlations your teams might miss, like how freight volatility or regional construction activity subtly shifts price sensitivity.

    Ultimately, data enrichment bridges the gap between your transactional reality and the economic environment you operate in. That’s where the predictive power of AI becomes a genuine strategic advantage.

    Data Governance and Ongoing Maintenance

    Preparing your data for an AI pricing model isn’t a one-time task — it’s an ongoing discipline. The moment you start training models, your data pipeline becomes part of your daily operations. If the data feeding the model degrades, so will the model’s accuracy and trustworthiness.

    Here are the key practices every distributor should adopt to keep their data healthy:

    • Standardize data entry and definitions. Ensure that all departments use consistent product categories, customer classifications, and units of measure. Small inconsistencies compound quickly in large datasets.
    • Monitor for data drift. Over time, market conditions and internal processes change. Regularly compare current data distributions (like average margins or order sizes) to historical ones to spot shifts that may require retraining the model.
    • Schedule periodic audits. Review data integrity quarterly or semiannually. This might include sampling transactions for accuracy, checking for missing fields, and validating that external feeds (like commodity prices) are still updating correctly.
    • Document data lineage. Keep a record of where each data source originates, what transformations are applied, and who owns it. This transparency makes troubleshooting and compliance far easier down the line.
    • Retrain the model on a schedule. Even a perfectly prepared dataset becomes outdated as the market evolves. Set a cadence for retraining your AI pricing model—monthly, quarterly, or annually—depending on your sales volume and industry volatility.
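
    A drift check does not need to be elaborate. Here is a minimal sketch that compares a recent period's margin distribution against a historical baseline; the cutoff date, column names, and threshold are all illustrative.

    # Compare the recent margin distribution against the training-era baseline
    baseline = df[df["quote_date"] < "2029-01-01"]["margin_pct"]
    recent = df[df["quote_date"] >= "2029-01-01"]["margin_pct"]

    shift = abs(recent.mean() - baseline.mean()) / baseline.std()
    if shift > 0.25:  # tune this threshold to your own tolerance
        print("Margin distribution has drifted; consider retraining the model.")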

    Strong governance ensures that the effort you put into collecting, cleaning, and enhancing your data continues to pay off. Over time, this steady flow of reliable, enriched information becomes your most valuable competitive asset — powering not just pricing optimization, but smarter forecasting, inventory management, and customer insights across the business.

    Conclusion: Turning Clean Data into Profitable Intelligence

    For distributors, success with artificial intelligence begins long before the first line of code. The real magic happens when clean, consistent, and enriched data becomes the foundation for smarter decision-making. By collecting, preparing, and enhancing your sales data, you’re not just creating a dataset — you’re building a digital model of how your market behaves.

    With that foundation in place, you’re ready to take the next step: transforming your prepared data into a working AI pricing model. In the next article — How to Build an AI Pricing Model Using Machine Learning in Python — we’ll walk through how to feed this data into a machine learning framework, train your model, and start generating optimized pricing recommendations that boost both margin and competitiveness.

    Investing in data preparation today means unlocking long-term pricing intelligence tomorrow — a true strategic edge in an increasingly data-driven distribution landscape.

  • How to Build an AI Pricing Model Using Machine Learning in Python


    In today’s competitive markets, pricing has become one of the most powerful levers for profitability — and one of the hardest to get right. Traditional pricing methods, like simple margin targets or “last price quoted” rules, can overlook complex relationships between cost, demand, competition, and customer behavior.

    That’s where Machine Learning (ML) comes in. By analyzing large volumes of historical quote and sales data, an AI Pricing Model can uncover hidden patterns — such as how product category, lead time, customer tier, or market conditions influence the probability of winning a quote or achieving a target gross profit.

    In this post, we’ll walk through how to build an AI Pricing Model using Machine Learning in Python, using tools like pandas, scikit-learn, and XGBoost. You’ll learn not just how the model works, but also why it can outperform traditional pricing logic — and how to interpret the model’s predictions in a way that’s practical for business decision-making.

    By the end, you’ll understand how to:

    • Prepare and clean pricing data for machine learning,
    • Train and evaluate a predictive pricing model,
    • Measure model accuracy using metrics like RMSE, R², and AUC,
    • and apply your AI pricing model to generate optimized prices that balance margin and win probability.

    Why Machine Learning Works for Pricing Optimization

    At its core, pricing optimization is about understanding how different factors — such as cost, competition, customer type, quantity, and lead time — influence a buyer’s willingness to pay and your ability to win profitable deals. The challenge is that these relationships are rarely linear or static. They can shift over time, differ across product categories, and interact in subtle ways that are hard to detect with traditional rule-based logic or spreadsheets.

    That’s exactly where Machine Learning (ML) excels.

    A Machine Learning model for pricing learns directly from historical quote and sales data. It doesn’t rely on hard-coded formulas; instead, it identifies patterns and correlations that may not be obvious to humans. For example, it might learn that:

    • A certain customer tier consistently pays higher prices for small-quantity orders,
    • Or that short lead times increase win probability but reduce achievable margins,
    • Or that specific product categories are more price-sensitive during certain market conditions.

    Because ML models can analyze millions of records at once, they’re capable of understanding these complex interactions and weighting them appropriately. The result is an AI Pricing Model that predicts outcomes such as:

    • Win probability for a given quote price, or
    • Expected gross profit (GP%) given market and customer conditions.

    With those predictions, pricing teams can simulate “what-if” scenarios — such as, “What happens to win probability if we increase price by 3%?” — and make more confident, data-driven decisions.

    Dynamic Learning and Adaptability

    Another reason Machine Learning works so well for pricing optimization is its ability to evolve. As new data is collected — from market shifts, supply chain changes, or customer behavior — the model can be retrained to learn from recent patterns. This makes it inherently adaptive, unlike static pricing rules that quickly become outdated.

    From Insight to Action

    When combined with explainability tools like SHAP (SHapley Additive exPlanations), businesses can even interpret why the model made certain pricing recommendations. This transparency builds trust and ensures that AI-driven pricing decisions are both accurate and explainable — not just “black box” outputs.

    Two Core Models Behind an AI Pricing Framework: Regression and Classification

    A strong AI Pricing Model using Machine Learning in Python usually isn’t just one model — it’s a combination of two that work together to predict both what price to quote and how likely you are to win at that price. These are called regression models and classification models, and each plays a distinct but complementary role.


    1. The Regression Model — “What GP% should we expect (or recommend)?”

    Think of the regression model as the price-setting brain of your AI Pricing system.

    It answers questions like:

    “Given these conditions — product group, customer tier, lead time, and market — what gross profit percentage (GP%) should we expect on this quote?”

    The model learns from past quotes where you know both the inputs (features such as customer, product, and quantity) and the outcome (the GP% you actually achieved).

    Over time, it learns relationships like:

    • “High-volume orders tend to have lower GP%,”
    • “Certain customers consistently negotiate tighter margins,”
    • “Stock items can carry higher GP% due to faster availability.”

    By understanding these patterns, the regression model can predict the most reasonable or competitive GP% for a new quote — effectively recommending a data-driven price point that aligns with historical success patterns.


    2. The Classification Model — “What’s the probability we’ll win this quote?”

    If the regression model helps you set the price, the classification model helps you evaluate the risk and opportunity of that price.

    This model predicts a win probability — essentially answering:

    “Given this quote’s characteristics and price level, what’s the likelihood that the customer will award us the order?”

    It learns from historical data labeled as won or lost. For each quote, it examines factors such as:

    • Quoted GP% (price competitiveness),
    • Customer relationship or tier,
    • Lead time,
    • Product category or market conditions.

    The output is a probability — for example, “There’s a 72% chance of winning this quote at this price.”

    With that, your pricing system can balance profit vs. win likelihood, enabling smarter trade-offs — such as lowering margin slightly on a high-probability deal or holding firm on price when the odds of winning are already low.
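
    For example, a quote carrying $1,000 of margin at a 72% win probability has an expected margin of $1,000 × 0.72 = $720. Cutting price so the margin falls to $900 only pays off if it lifts the win probability above 80%, since $900 × 0.80 = $720.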


    When You Combine the Two

    When you use both models together, you get a complete AI Pricing Framework:

    • The Regression Model recommends a target price or GP%,
    • The Classification Model estimates the win probability at that price,
    • And together they create a feedback loop that helps your team find the optimal price point — the one that maximizes both revenue and likelihood of success.

    This combination mirrors what experienced sales or pricing analysts do intuitively — except the AI can analyze millions of records and update itself continuously as new data comes in.

    Tools and Frameworks in Python for Building an AI Pricing Model

    One of the biggest advantages of building an AI Pricing Model using Machine Learning in Python is that the Python ecosystem already provides powerful, production-ready libraries for every stage of the workflow — from cleaning your data to training, evaluating, and explaining the model’s predictions.

    Below are the key tools you’ll use, and the role each one plays in developing a pricing optimization framework.


    1. pandas — The Data Preparation Workhorse

    Before you can train a model, your quote and sales data needs to be cleaned and structured.

    That’s where pandas comes in. It’s a Python library designed for working with tabular data — like your pricing history — using simple, spreadsheet-like commands.

    With pandas, you can:

    • Load CSVs or Excel files into DataFrames,
    • Handle missing or invalid data,
    • Create new features (e.g., “lead_time_days” or “customer_tier”),
    • Filter, sort, and group data by product or customer,
    • Join multiple datasets (e.g., cost data with quote history).

    In short: pandas is where you prepare your pricing dataset for Machine Learning.

    import pandas as pd
    
    df = pd.read_csv("quotes.csv", parse_dates=["expected_ship_date", "quote_date"])
    df['lead_time_days'] = (df['expected_ship_date'] - df['quote_date']).dt.days
    

    2. scikit-learn — The Foundation for Machine Learning

    Once your data is ready, scikit-learn provides the essential tools for building and evaluating models.

    It includes algorithms for both regression and classification, along with utilities for:

    • Splitting your dataset into training and test sets,
    • Scaling and encoding data,
    • Evaluating model accuracy using metrics like RMSE, MAE, and R² (for regression) or AUC (for classification),
    • Building pipelines that make your ML workflow repeatable and organized.

    Even if you later switch to more advanced models like XGBoost, scikit-learn remains the framework that ties everything together.

    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error, r2_score
    

    3. XGBoost — High-Performance Predictive Modeling

    For real-world pricing problems, where data can include millions of quotes and dozens of features, XGBoost (Extreme Gradient Boosting) is a top performer.

    It’s a gradient-boosted decision tree algorithm known for:

    • Handling nonlinear relationships (e.g., between GP% and quantity),
    • Managing missing values gracefully,
    • Delivering high accuracy and fast training times.

    In your AI Pricing Model, XGBoost is typically used for both:

    • Regression → predicting the expected GP% for a quote,
    • Classification → predicting the probability of winning at that price.

    Its robustness and interpretability make it one of the most trusted algorithms for business-critical ML applications.

    import xgboost as xgb
    
    reg_model = xgb.XGBRegressor()
    clf_model = xgb.XGBClassifier()
    

    4. SHAP (SHapley Additive exPlanations) — Explaining the Model

    Even the most accurate AI pricing model isn’t useful if you can’t explain why it makes a certain recommendation.

    That’s where SHAP comes in. SHAP values quantify how much each feature — such as customer tier, lead time, or product group — contributes to a specific prediction.

    For example:

    • “Lead time contributed +1.2% to the GP% recommendation,”
    • “Customer tier lowered the win probability by 8%.”

    With SHAP visualizations, pricing analysts can see exactly what drives the model’s logic, turning complex AI outputs into actionable business insights.

    import shap
    
    explainer = shap.TreeExplainer(reg_model)
    shap_values = explainer.shap_values(X_test)
    shap.summary_plot(shap_values, X_test)
    

    Bringing It All Together

    When combined, these Python tools provide a complete end-to-end solution for building an AI Pricing Model:

    Stage                | Goal                                       | Tool
    Data preparation     | Clean and organize quote data              | pandas
    Model training       | Build regression and classification models | scikit-learn, XGBoost
    Model evaluation     | Measure accuracy and predictive power      | scikit-learn metrics
    Model explainability | Visualize feature importance and logic     | SHAP

    Together, these frameworks turn raw pricing history into a living, learning system — one that continuously refines your pricing strategy based on data, not gut feel.

    Step-by-Step: How to Build an AI Pricing Model Using Machine Learning in Python

    This walkthrough shows how to build the two-model framework:

    1. a Regression model to recommend a target GP%, and
    2. a Classification model to estimate win probability at a given price.

    We’ll use pandas, scikit-learn, XGBoost, and SHAP.

    Mini data-sanity checklist (save the deep dive for the next post):

    • Remove or flag obvious data errors (negative quantities/costs, GP% > 100, etc.).
    • Avoid leakage: for the win model, only use features available before the decision (e.g., do not use “won/lost” derived fields or post-quote info).
    • Ensure time awareness: train on older data, test on newer (or do time-based CV).
    • Encode categories (customer tier, product group/category) and handle missing values.

    1) Setup & Load Data

    import pandas as pd
    import numpy as np
    
    # Load your quotes dataset
    df = pd.read_csv("quotes.csv", parse_dates=["quote_date"], low_memory=False)
    
    # Example expected columns (adjust to your schema):
    # 'quoted_price', 'quoted_quantity', 'cost', 'quoted_gp_pct', 'won_flag',
    # 'product_group', 'product_category', 'customer_tier', 'lead_time_days',
    # 'is_stock_item', 'on_hand_qty_at_quote'
    

    Light cleaning:

    # Basic filters/sanity
    df = df.dropna(subset=["quoted_gp_pct", "won_flag", "product_group", "customer_tier"])
    df = df[(df["quoted_gp_pct"] > -20) & (df["quoted_gp_pct"] < 100)]  # tweak if needed
    
    # Ensure types
    df["is_stock_item"] = df["is_stock_item"].astype(int)  # 0/1
    df["won_flag"] = df["won_flag"].astype(int)  # 0/1
    

    2) Feature Sets for Each Model

    • Regression (target = quoted_gp_pct)
      Inputs that influence achievable margin: ['product_group', 'product_category', 'customer_tier', 'lead_time_days', 'is_stock_item', 'on_hand_qty_at_quote']
    • Classification (target = won_flag)
      Include the price signal (e.g., quoted_gp_pct) plus context features: ['quoted_gp_pct', 'product_group', 'product_category', 'customer_tier', 'lead_time_days', 'is_stock_item', 'on_hand_qty_at_quote']

    Tip: Keep feature names aligned between models so the system is easy to maintain. Avoid feature leakage: never give a model a feature that reveals the answer (for example, do not give the regression model the actual achieved price or GP% as an input, and do not give the classification model the outcome). Also remember that whatever you choose as features is exactly the information you will need to supply to the models later when using them to predict outcomes.


    3) Train/Test Split (time-aware if possible)

    # Option A: random split (simple)
    from sklearn.model_selection import train_test_split
    
    reg_features = ['product_group','product_category','customer_tier',
                    'lead_time_days','is_stock_item','on_hand_qty_at_quote']
    clf_features = ['quoted_gp_pct'] + reg_features
    
    X_reg = df[reg_features]
    y_reg = df["quoted_gp_pct"]
    
    X_clf = df[clf_features]
    y_clf = df["won_flag"]
    
    Xr_train, Xr_test, yr_train, yr_test = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)
    Xc_train, Xc_test, yc_train, yc_test = train_test_split(X_clf, y_clf, test_size=0.2, random_state=42)
    
    # Option B (recommended for production): split by date so test is newer period
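    # A minimal sketch of Option B (assumes 'quote_date' was parsed as datetime):
    # cutoff = df["quote_date"].quantile(0.8)   # oldest 80% of quotes train the model
    # train_mask = df["quote_date"] <= cutoff
    # Xr_train, Xr_test = X_reg[train_mask], X_reg[~train_mask]
    # yr_train, yr_test = y_reg[train_mask], y_reg[~train_mask]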
    

    4) Preprocessing Pipelines

    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.pipeline import Pipeline
    from sklearn.metrics import mean_absolute_error, r2_score, mean_squared_error, roc_auc_score
    import xgboost as xgb
    import numpy as np
    
    cat_cols = ['product_group','product_category','customer_tier']
    num_cols = ['lead_time_days','is_stock_item','on_hand_qty_at_quote']
    
    preprocessor = ColumnTransformer(
        transformers=[
            ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
            ("num", "passthrough", num_cols)
        ]
    )
    

    5) Train the Regression Model (GP% recommender)

    reg_model = xgb.XGBRegressor(
        n_estimators=600,
        max_depth=6,
        learning_rate=0.05,
        subsample=0.9,
        colsample_bytree=0.9,
        random_state=42,
        n_jobs=-1
    )
    
    reg_pipe = Pipeline([
        ("prep", preprocessor),
        ("model", reg_model)
    ])
    
    reg_pipe.fit(Xr_train, yr_train)
    
    # Evaluate
    yr_pred = reg_pipe.predict(Xr_test)
    rmse = np.sqrt(mean_squared_error(yr_test, yr_pred))
    mae = mean_absolute_error(yr_test, yr_pred)
    r2 = r2_score(yr_test, yr_pred)
    
    print(f"Regression — RMSE: {rmse:.3f} | MAE: {mae:.3f} | R²: {r2:.3f}")
    

    6) Train the Classification Model (win probability)

    clf_model = xgb.XGBClassifier(
        n_estimators=600,
        max_depth=6,
        learning_rate=0.05,
        subsample=0.9,
        colsample_bytree=0.9,
        random_state=42,
        n_jobs=-1,
        eval_metric="auc"
    )
    
    # Preprocessor is the same structure, but includes quoted_gp_pct as numeric
    cat_cols_c = cat_cols
    num_cols_c = ['quoted_gp_pct','lead_time_days','is_stock_item','on_hand_qty_at_quote']
    
    preprocessor_clf = ColumnTransformer(
        transformers=[
            ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols_c),
            ("num", "passthrough", num_cols_c)
        ]
    )
    
    clf_pipe = Pipeline([
        ("prep", preprocessor_clf),
        ("model", clf_model)
    ])
    
    clf_pipe.fit(Xc_train, yc_train)
    
    # Evaluate
    yc_pred_proba = clf_pipe.predict_proba(Xc_test)[:,1]
    auc = roc_auc_score(yc_test, yc_pred_proba)
    print(f"Classification — AUC: {auc:.3f}")
    

    7) Explainability with SHAP (optional but recommended)

    import shap
    
    # For tree-based models, explain on the transformed matrix
    # Grab a small sample to keep plots fast
    sample = Xr_test.sample(n=min(2000, len(Xr_test)), random_state=42)
    
    # Fit a TreeExplainer on the trained XGB model inside the pipeline
    # We need the model object (reg_model) and the transformed features
    X_sample_transformed = reg_pipe.named_steps["prep"].transform(sample)
    explainer = shap.TreeExplainer(reg_pipe.named_steps["model"])
    shap_values = explainer.shap_values(X_sample_transformed)
    
    # Summary plot (run in notebooks)
    # shap.summary_plot(shap_values, X_sample_transformed, feature_names=reg_pipe.named_steps["prep"].get_feature_names_out())
    

    Tip: For reports, capture SHAP bar plots for top features affecting GP% and win probability. This builds trust with commercial teams.


    8) Put the Models to Work: Recommend a Price & Simulate Win Probability

    Flow in production for a new quote:

    1. Use the regression model to recommend a baseline GP%.
    2. Convert GP% → price (based on cost).
    3. Create a small price ladder around that recommendation (±2–5 percentage points).
    4. For each rung, compute win probability via the classification model.
    5. Pick the rung that meets your business objective (e.g., maximize expected margin = margin × win_prob, or enforce a minimum win probability).

    Example:

    def gp_to_price(cost, gp_pct):
        # gp_pct as percentage number, e.g., 25 means 25%
        return cost / (1 - gp_pct/100.0)
    
    def simulate_ladder(row, reg_pipe, clf_pipe, ladder_pts=(-4,-2,0,2,4)):
        # 1) Predict baseline GP%
        reg_input = row[reg_features].to_frame().T
        gp_base = float(reg_pipe.predict(reg_input)[0])  # predict returns an array; take the scalar
    
        results = []
        for delta in ladder_pts:
            gp_try = max(min(gp_base + delta, 95), -5)  # clamp
            price_try = gp_to_price(row["cost"], gp_try)
    
            clf_input = row[clf_features].copy()
            clf_input["quoted_gp_pct"] = gp_try
            win_prob = float(clf_pipe.predict_proba(clf_input.to_frame().T)[0, 1])
    
            margin = price_try - row["cost"]
            expected_margin = margin * win_prob
    
            results.append({
                "gp_pct": round(gp_try,2),
                "price": round(price_try,2),
                "win_prob": round(win_prob,3),
                "expected_margin": round(expected_margin,2)
            })
        return pd.DataFrame(results).sort_values("expected_margin", ascending=False)
    
    # Example usage on a single quote row (replace with real row)
    # row = df.iloc[0]
    # ladder = simulate_ladder(row, reg_pipe, clf_pipe)
    # display(ladder)
    

    9) Save & Load Models

    import joblib
    joblib.dump(reg_pipe, "gp_regression_pipe.joblib")
    joblib.dump(clf_pipe, "win_classifier_pipe.joblib")
    
    # Later
    # reg_pipe = joblib.load("gp_regression_pipe.joblib")
    # clf_pipe = joblib.load("win_classifier_pipe.joblib")
    

    10) Production Tips

    • Time-based validation: use rolling windows to ensure robustness across market regimes.
    • Segmented models: consider separate models by product category if behavior differs drastically.
    • Guardrails: enforce GP% floors/ceilings by customer tier or category.
    • Monitoring: track drift in feature distributions and periodic re-train cadence (monthly/quarterly).
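
    To make the guardrails idea concrete, here is a minimal sketch that clamps a recommended GP% against per-tier floors. The floor values, default, and function name are illustrative, not a fixed convention.

    # Illustrative GP% floors by customer tier; replace with your own pricing policy
    GP_FLOORS = {1: 10.0, 2: 12.0, 3: 15.0}

    def apply_guardrails(gp_pct, customer_tier, ceiling=60.0):
        floor = GP_FLOORS.get(customer_tier, 12.0)  # default floor for unknown tiers
        return min(max(gp_pct, floor), ceiling)

    # e.g., apply_guardrails(8.4, customer_tier=2) returns 12.0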

    Conclusion: Turning Data Into Dynamic Pricing Decisions

    Building an AI pricing model with Python transforms pricing from guesswork into a measurable, data-driven strategy. By combining regression and classification models with tools like pandas, scikit-learn, XGBoost, and SHAP, businesses can predict both the optimal price and the probability of winning at that price — all while understanding why the model makes its recommendations. The result is a pricing framework that adapts to changing markets, maximizes profit margins, and empowers your team to make confident, intelligent decisions. As AI continues to reshape competitive industries, developing your own machine learning pricing model isn’t just a technical advantage — it’s a strategic one.

  • Creating a custom private chatbot for your Business


    How to create an agent within Microsoft 365 with proprietary knowledge sources for answering questions

    In every organization, knowledge lives in scattered places — tucked inside PDFs, SharePoint folders, policy manuals, and the collective minds of employees. The problem isn’t a lack of information; it’s the daily friction of finding it. That’s where creating a custom chatbot for your business powered by your company’s proprietary knowledge becomes transformative. Instead of digging through folders or pinging colleagues for answers, employees can simply ask: “What’s our policy for client data retention?” or “How do I reset the field service tablet?” and get an instant, accurate reply drawn directly from the organization’s verified sources.

    This kind of system doesn’t just save time; it elevates the reliability of information. By connecting the chatbot to internal documents, you ensure that every answer reflects the company’s latest guidance — not the internet’s best guess. Microsoft’s Copilot Studio makes this easier than ever, allowing you to build custom agents that understand your domain, your language, and your processes. Think of it as creating your own ChatGPT — except it knows your business.

    It’s worth noting that Copilot Studio agents are, by design, internal tools. They live inside your Microsoft 365 environment and are accessible only to authenticated users within your organization. That limitation is intentional — it keeps sensitive company data secure. While Microsoft is gradually expanding options for public-facing deployment, the real power today lies in using these custom copilots to boost efficiency, reduce knowledge bottlenecks, and centralize institutional wisdom safely behind your firewall.

    In this post, we’ll walk through how to create a custom LLM-based agent using Microsoft’s Copilot Studio, upload your proprietary knowledge documents, and configure it to answer real business questions with authority — effectively turning your company’s documentation into a living, conversational assistant.

    What you’ll build

    A Microsoft agent that references internal/proprietary knowledge sources when answering questions.

    Prerequisite: Microsoft Office 365

    Step-by-step

    1. Create Your Custom Chatbot Agent

        In Microsoft’s Office 365, click on the ‘Create agent’ link located in the left-hand menu bar:

        The two-panel Agent configuration window will appear:


        On the left side is the Describe tab. Entering information into this tab will allow Copilot Studio to do some automatic setup behind the curtain. Here you can tell Copilot in natural language what kind of agent you want to build. Think of it as the agent’s “origin story.” You can start by typing a natural-language description (“I want an AI that answers customer service questions from my company’s manuals”) or select a template from Microsoft’s starter options, such as Career Coach, Customer Insights Assistant, or Idea Coach. Each template comes preloaded with example intents and tone settings, which can save time if your use case fits a common pattern.

        Entering information into the Describe tab isn’t mandatory, but it’s like having an AI co-designer for the first five minutes. Entering even a one-sentence goal (“Build an agent that answers technical support questions from my company’s service manuals”) can prime the system with an initial personality and intent structure. Skipping straight to Configure gives you a blank slate — ideal for advanced users who prefer full creative control, but slightly more work for beginners.

        The right side of the interface is a workspace preview for the agent: a design canvas that gradually fills in as the agent is configured. Later, as knowledge documents are added and additional instructions are given, it becomes a workspace where you can test and refine the agent’s behavior.

        2. Configure the agent

        Click on the Configure tab (just beside Describe). This is where you’ll complete the configuration and link your proprietary knowledge documents — PDFs, manuals, spreadsheets, or databases — so your custom LLM can answer questions using real company data rather than generic web knowledge. Here’s where you can make your agent shine with insider expertise.

        The Configure tab initially looks like this:

        On the left-hand side of this screen, you’ll see three main sections that matter most:

        Template Dropdown

        At the very top, you’ll find the Template option. If you selected one earlier (like Career Coach or Customer Insights Assistant), you’ll see it listed here. Templates act as preconfigured blueprints that come with example intents and tone settings. Choosing None gives you a blank slate — ideal for building a fully custom agent powered by your company’s documents and internal data.

        Pro Tip: If your use case is highly specific — such as a compliance advisor or technical support agent for internal systems — start with “None.” This avoids baked-in behaviors that can conflict with your proprietary content. Templates are best for quick prototypes, not precision builds.

        Details Panel (Name and Description)

        This is where your agent’s public identity takes shape.

        • Name: The title your users will see when they interact with the chatbot. Choose something meaningful — for example, Legacy Edge Knowledge Assistant or HR InfoBot.
        • Description: A one-sentence summary that tells users what the agent does. This description also guides the underlying model: it acts as metadata to prime its understanding of its domain and purpose.
        • Icon: Click the pencil button to change the icon that is shown for the agent.

        Pro Tip: Before writing your agent’s description, spend a few minutes crafting a mission statement in plain English — one or two sentences that describe what your agent should know and how it should respond. Write the description as if you’re briefing a new employee. The clearer the description, the sharper the responses. Keep the description under two sentences and use active verbs (“provides”, “retrieves”, “summarizes”). This helps both Copilot Studio and human users instantly understand the agent’s scope — and prevents it from over-generalizing into unrelated areas.

        Instructions Panel

        This large text field is the most powerful section — it defines how your agent thinks and behaves. Here you’ll describe:

        • What the agent should do
        • How it should sound (professional, conversational, technical, etc.)
        • Any rules or boundaries it must follow

        In essence, this is your agent’s system prompt — the foundational guidance the LLM (large language model) reads before interacting with users. Microsoft calls this “Instructions,” but think of it as the “mission script” for your AI.

        Example:

        “You are an internal support assistant for Legacy Edge. Your job is to answer employee questions using official company knowledge documents. Always cite the relevant document when responding. If a question is outside your scope, respond: ‘I don’t have that information available internally.’ Maintain a helpful and factual tone.”

        Pro Tip: Treat the Instructions like programming logic written in plain English. Every word affects how the agent behaves. Avoid vague directions like “Be smart and friendly.” Instead, define purpose, style, and boundaries clearly — especially when the agent handles proprietary data.

        What Happens Next

        Once you’ve completed these fields, you’re ready to click Create in the top-right corner. That’s the moment when Copilot Studio actually instantiates your agent — giving it a workspace and enabling the right-hand chat window for live testing. From there, you’ll move on to uploading your internal knowledge documents and fine-tuning how the agent retrieves and summarizes information.

        Note: if you click an option that takes you away from the configuration screen after the chatbot has been created, you can always return to the configuration screen by clicking the three dots next to the name of your chatbot (your new chatbot should appear in the left-hand menu under ‘Agents’) and selecting Edit.

        3. Add knowledge sources

        Once your agent has a name, description, and behavior defined, it’s time to give it something to know. The Knowledge section is where you connect your internal data sources — the documents, files, or URLs that the chatbot will draw from when answering questions.

        When you click inside the Knowledge field, you’ll see two main options:

        1. Upload or link content directly, such as PDFs or Microsoft Office files.
        2. Reference existing online sources, such as OneDrive, SharePoint, or public URLs.

        Each approach has its strengths, and the best option depends on how dynamic your information is.

        Uploading Files Directly

        Uploading files gives your chatbot a static snapshot of the knowledge at the time of upload. This is ideal for material that doesn’t change often — things like standard operating procedures, troubleshooting guides, or technical reference manuals.

        • Pros: Simple setup, fast ingestion, no dependency on external systems.
        • Cons: You’ll need to re-upload documents when content changes.

        Linking to OneDrive or SharePoint

        Referencing cloud-based storage (OneDrive or SharePoint) lets your Copilot dynamically access the latest versions of those files. This is the better choice if your documentation is frequently updated by multiple teams.

        • Pros: Always current, centralizes updates, ideal for shared corporate documentation.
        • Cons: Requires consistent access permissions; if the file is moved or renamed, the link breaks.

        Pro Tip: When you upload files directly, those documents are ingested, indexed locally, and pre-processed into searchable text embeddings optimized for retrieval, which shaves a second or two off response times. Usually a hybrid approach is best: upload static core documents that do not change frequently (SOPs, policy documents, troubleshooting guides) and link dynamic resources like “Known Issues” spreadsheets or service logs stored in SharePoint.


        What Types of Documents Work Best

        Copilot Studio supports a variety of document formats, but not all files are equal in usefulness. Here’s how they stack up:

        • PDFs: Excellent for finalized procedures, policies, and manuals. They preserve layout and structure, making them easy for the model to parse.
        • Word Documents (.docx): Great for technical explanations, onboarding guides, or FAQs. They’re readable and editable, so you can easily update knowledge content.
        • Excel Spreadsheets: Useful for structured data — such as error code lookups, pricing matrices, or configuration values. However, keep them simple; the model reads cell text, not formulas or visuals.
        • PowerPoint Files: Fine for summarizing visuals or presenting workflows, but only the text portions (titles and speaker notes) are parsed.
        • URLs: Handy if your company hosts documentation externally (like a support portal or documentation site). Be sure the pages don’t require authentication; Copilot can only read public URLs.

        Pro Tip: Avoid uploading raw exports, complex tables, or engineering drawings. Instead, upload or link documents that explain those materials in text — for example, a “Product Configuration Reference Guide” is far more useful than a schematic file with no context.


        Choosing the Right Knowledge Sources

        When deciding what to feed your Copilot, prioritize documents that provide context and clarity. Good knowledge candidates include:

        • Standard Operating Procedures (SOPs) — Yes. These are gold; they define the official way things are done.
        • Employee Manuals — Only if your chatbot handles HR or policy questions. Otherwise, they clutter the knowledge base.
        • Technical Specifications — Excellent if they contain plain-language explanations and troubleshooting details.
        • Product Catalogs — Useful for sales or support bots, but less valuable if they’re mostly images or SKU tables without text descriptions.
        • Drawings or Engineering Blueprints — Generally not effective; the model can’t interpret images or CAD formats.

        Pro Tip: Before uploading, skim each document and ask, “Would I hand this to a new hire to help them solve a problem?” If the answer is yes, it probably belongs in your chatbot’s knowledge base.


        Prioritizing Knowledge Sources

        After adding your files or links, toggle “Prioritize the knowledge sources you added for agent knowledge-based queries” to on. This instructs the agent to rely on your internal content first before using its general reasoning abilities — a crucial setting for keeping responses accurate and aligned with company policy.

        Once your documents are added and prioritized, your Copilot now has a mind of its own — filled with Legacy Edge’s proprietary knowledge, ready to deliver accurate, document-sourced answers.

        Pro Tip: Keep your uploaded documents clean and well-labeled; Microsoft’s ingestion process works best when file names and content headers clearly reflect the topics they cover. Think “Product_FAQ_June2025.pdf” instead of “final_v3_reallyfinal.pdf.”

        Enabling Capabilities and Suggested Prompts

        Once your knowledge documents are connected, scroll down to the Capabilities and Suggested Prompts sections. These settings shape how your Copilot processes information and how users interact with it.


        Capabilities

        The Capabilities section controls the advanced tools your chatbot can use. Currently, Copilot Studio offers two main options: Code Interpreter and Image Generator.

        Code Interpreter

        When this option is enabled, your agent can perform lightweight calculations, analyze data, or process snippets of code. For technical or engineering environments, this can be incredibly useful — for example, parsing error logs, formatting configuration strings, or doing quick math related to system parameters.

        • When to use it: Enable it if your chatbot will handle any queries involving formulas, data analysis, or troubleshooting automation scripts.
        • When to skip it: Leave it off if your agent deals strictly with procedural knowledge or policies (it adds no benefit to general knowledge lookup tasks).

        Example query that would need Code Interpreter:

        User: “What’s the average system response time if server A reports 310ms and server B reports 270ms?”

        EdgeAssist: “The average is 290 milliseconds.”

        Pro Tip: Activating the Code Interpreter doesn’t make the agent a full Python environment — it’s more like a quick analytical assistant. Keep it on for demos involving data-driven reasoning, but test responses carefully before deploying it to production.


        Image Generator

        Turning this on allows your Copilot to generate simple images — useful in creative or presentation contexts. For an internal technical support bot like EdgeAssist, it’s not essential, but you might enable it to demonstrate how Copilot can illustrate workflows, diagrams, or UI steps.

        • When to use it: Helpful for documentation, onboarding, or training use cases.
        • When to skip it: Disable for performance-critical technical support bots — image generation can add unnecessary complexity.

        Suggested Prompts

        The Suggested Prompts section helps guide users by showing examples of what they can ask the chatbot. Think of this as a built-in conversation starter menu — especially helpful for new users who aren’t sure what to type.

        When you click “Add a suggested prompt,” you can pre-fill several example questions. These prompts don’t just improve usability; they quietly teach your users the kinds of queries your chatbot is designed to handle.

        Examples:

        • “How do I fix a timeout error in the automation module?”
        • “What are the steps to reset a client’s API key?”
        • “Where can I find the latest release notes for the analytics engine?”
        • “What causes a configuration mismatch during deployment?”

        Pro Tip: Use this feature like a mini onboarding experience and show a variety of uses. For example, include one prompt that demonstrates the agent’s range (“Show me how to configure a new client system”), one that shows its precision (“What’s the fix for error code 421?”), and one that shows its ability to answer an often-asked non-technical question (“What should I do if documentation doesn’t cover a client issue?”).

        The Suggested Prompts appear when users open a new chat, reducing the learning curve and encouraging meaningful engagement from the start.


        Wrapping Up This Step

        The Capabilities and Suggested Prompts features may seem small, but they’re what turn a static assistant into an interactive teammate. Enabling capabilities extends what your Copilot can do, while thoughtful suggested prompts guide users toward what it should do best.

        At this stage, your internal chatbot can reason, retrieve knowledge, and communicate clearly with your team. The next step is deployment: deciding where employees can access it (in Teams, on your internal portal, or embedded within your support dashboard).

        4. Share the Agent

        Once you’ve configured the Agent’s knowledge, capabilities, and prompts, it’s time to make it available to your team. Publishing in Microsoft Copilot Studio is straightforward, but understanding the access options ensures your chatbot stays secure and reaches the right audience.


        Publishing the Agent

        At the top right of the configuration screen, you’ll see the Publish button (or Update, if you’ve already published once). Clicking this triggers Microsoft Copilot Studio to build and deploy your agent inside your Microsoft 365 environment.

        Once published, your chatbot is ready to use — but by default, it’s internal only. That means it’s available to authenticated users within your Microsoft Entra (formerly Azure Active Directory) tenant.

        After publishing, you can access your deployment options under the Share button (next to Update).


        Sharing Options

        When you click Share, Copilot Studio will show you several ways to make your agent accessible:

        1. Microsoft Teams Integration

        This is the most common deployment path for internal copilots. With a single click, you can add EdgeAssist to your organization’s Teams environment. Users can chat with it just like they would with a colleague — perfect for quick troubleshooting or referencing technical documentation mid-meeting.

        • Pros: Frictionless access; no separate login; easy for employees to adopt.
        • Cons: Limited visibility outside Teams unless embedded elsewhere.

        2. Web App Access

        Copilot Studio also lets you deploy your chatbot as a standalone web app — a simple, branded chat interface your employees can access via a unique URL.

        • Pros: Works anywhere within your organization; easy to bookmark or embed in an internal support portal.
        • Cons: Still requires Microsoft login; cannot be accessed anonymously by external users.

        3. Embedding via Power Pages or Internal Websites

        For a more integrated experience, you can embed your chatbot directly into your company’s intranet or Power Pages site using an iframe or web component provided by Microsoft.

        • Pros: Seamless integration into existing systems or dashboards.
        • Cons: Requires Power Platform permissions and possibly some admin setup.

        About Public Access

        At this time, Microsoft Copilot Studio is designed primarily for internal copilots. That means public, anonymous access is turned off by default. While Microsoft has begun rolling out limited preview features for anonymous sharing, most organizations (especially those handling proprietary data) will want to keep this feature disabled for security reasons.

        For demonstration or testing with external partners, you can manually invite specific Microsoft accounts with limited permissions — just make sure those users are added as “guests” to your tenant.

        Pro Tip: Keep your first deployment internal. Once your team has used the chatbot and verified that responses are accurate, you can later publish an external-facing version (for clients or website visitors) using Microsoft Power Pages with controlled access.


        Verifying the Chatbot

        After publishing, click New Chat in the right-hand preview pane to test it in action. Ask a few real-world questions drawn from your uploaded knowledge base to confirm:

        • It retrieves the correct documents.
        • Responses are accurate and concise.
        • The tone and scope align with your “Instructions.”

        If any answers seem off, go back to the Configure tab, adjust the instructions or add more knowledge sources, and hit Update to republish. You can iterate as often as needed — each publish creates a new version that instantly replaces the old one.

        Pro Tip: If the chatbot returns wrong information, chat with it until it acknowledges the mistake. If it doesn’t volunteer an explanation, ask it why the error occurred. Then ask it: “Give me the exact instruction statement that I should add to this Agent’s Instructions so that this mistake never happens again.” Add that statement to the Agent’s Instructions, hit Update, and test with more questions to confirm that the new instruction improved accuracy. Repeat this pattern until the chatbot’s responses are acceptable.


        Wrapping Up

        Publishing marks the transition from design to deployment — from blueprint to working AI teammate. Whether it lives in Teams, on your internal portal, or embedded in your support dashboard, your custom chatbot now becomes a living resource: a searchable, conversational layer on top of your company’s collective expertise.

        Your next frontier? Expanding the chatbot’s role — connecting it to workflows, integrating it with service tickets, or even giving it multi-agent collaboration powers to route complex client issues automatically.

      1. AI-Powered Email Automation

        AI-Powered Email Automation

        How to use AI to determine the sender’s intent from Office 365 emails

        Inbound email still drives a shocking amount of work inside most companies. Examples you’ll recognize:

        • “Please quote 250 EA of PN-4321.”
        • “Attached is PO-98765.”
        • “What’s the status on order 12345?”
        • “Send tracking for yesterday’s shipment.”

        If a human must open each message, interpret it, save attachments, run a script, and route it to the right place, you lose hours and introduce errors.

        Classifying the intent of each email the moment it arrives unlocks automation. Your flow can branch to send quotes, process purchase orders, trigger status lookups, and more—consistently and at scale.

        What you’ll build

        A Microsoft Power Automate cloud flow that:

        1. Reads new emails
        2. Converts HTML to text
        3. Uses an LLM prompt to categorize the message
        4. Parses the JSON response from the LLM
        5. Gates on confidence
        6. Switches on intent
        7. Takes appropriate actions based on the sender’s intent

        Prerequisites: Microsoft Office 365 Outlook and Microsoft Power Automate (cloud flows)

        High-level flow

        Trigger → HTML to text → Run an AI prompt → Parse JSON response → Condition (confidence) → Switch (intent) → Per-intent actions

        Step-by-step

        1. Create the flow and set the trigger

        In Microsoft Power Automate, select My Flows and then New Flow – Automated cloud flow.

        Give the new flow a descriptive name, and choose the trigger: When a new email arrives (V3). Click Create.

        The flow will be shown in Map View.

        Click the first (and only) action shown in the Map View of the flow to reveal the Settings panel, which will usually appear on the left side of the screen.

        Under the Advanced parameters dropdown list, select Folder, then choose the email folder you want the trigger to monitor (usually Inbox).

        Set any other Advanced parameters that apply to your automation. Include Attachments is another commonly changed parameter; it is often set to Yes.
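
        For reference, the snippet below sketches (in Python, purely for illustration) the two trigger outputs this flow relies on later. Subject and Body are real outputs of this trigger; the example values are invented.

        # Illustrative shape of the trigger outputs used later in this flow.
        # Subject and Body exist on the trigger; these sample values are made up.
        trigger_output = {
            "Subject": "Request for quote",  # hypothetical subject line
            "Body": "<p>Please quote <b>250 EA</b> of PN-4321.</p>",  # HTML body
        }
        print(trigger_output["Subject"])  # Request for quote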

        2. Convert the email body to text

        Emails are often sent in HTML format. The automation will be more reliable if we convert the body of those emails to text.

        Add a second action by clicking the “+” button below the first. Search for the action named Html to text and click it to add it to the flow.

        In the Settings panel for the new action, click in the Content box, then click the blue lightning bolt icon to insert dynamic content.

        A list of available dynamic content will appear. From the list of dynamic content, select Body from the previous step.

        Tip: There are quite a few outputs from the previous step that are not shown. If Body is not listed, click See More to see the additional dynamic content options.

        This will cause the body of the email to be dynamically inserted into the Content that will be converted to text.
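
        If you are curious what the Html to text action is doing, here is a rough Python equivalent, assuming the third-party beautifulsoup4 package; the flow itself handles this step with no code at all.

        # Rough Python equivalent of the built-in Html to text action.
        # Assumes: pip install beautifulsoup4
        from bs4 import BeautifulSoup

        html_body = "<p>Please quote <b>250 EA</b> of PN-4321.</p>"
        plain_text = BeautifulSoup(html_body, "html.parser").get_text(separator=" ", strip=True)
        print(plain_text)  # Please quote 250 EA of PN-4321.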

        3. Use AI to categorize the email, returning a JSON response

        Add another action to the flow. Select the action named Run a prompt.

        In the Settings panel for the new action, under the Parameters tab, click the drop-down arrow under Prompt and select New custom prompt.

        Insert the following prompt into the Instructions textbox:

        You are an assistant that categorizes inbound emails.
        Your task: Analyze the email subject and body, then return ONLY valid JSON that matches this schema:
        {
          "intent": "quote_request | purchase_order | other",
          "confidence": 0.0-1.0,
          "summary": "A one-sentence summary of the email"
        }
        Definitions:
        - "quote_request": The sender is asking for pricing, a quote, or availability for one or more items.
        - "purchase_order": The sender is sending a Purchase Order (PO) in response to a quote or pricing you previously sent them.
        - "other": Anything that is not a quote request or a purchase order.
        Guidelines:
        - Always output JSON only, with no explanatory text.
        - confidence should be between 0.0 and 1.0 indicating how certain you are of the classification.
        - summary should be a short, plain English sentence.
        EMAIL_SUBJECT:
        EMAIL_BODY:

        Tip: This example shows how you can automatically process purchase orders and respond to customer quote requests. Customize the above prompt to your needs. For example, change the list of intents to match the intents that you want the AI to categorize. When you do so, make sure you modify the expected JSON schema in step #4 below to match your changes.
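
        To make the expected output concrete, here is a minimal Python sketch of the kind of response a well-behaved model should return for the first example email above; the exact confidence value and wording will vary from run to run.

        import json

        # Hypothetical raw text returned by the Run a Prompt action for
        # "Please quote 250 EA of PN-4321." (actual model output will vary).
        llm_output = """
        {
          "intent": "quote_request",
          "confidence": 0.97,
          "summary": "The sender is requesting a quote for 250 units of part PN-4321."
        }
        """

        result = json.loads(llm_output)
        print(result["intent"])      # quote_request
        print(result["confidence"])  # 0.97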

        Dynamically inserting the email subject and body into the prompt

        You’ll notice that there is no text after the EMAIL_SUBJECT and EMAIL_BODY tags in the above prompt.

        Place your cursor after EMAIL_SUBJECT in the prompt, add a space, and then enter a forward-slash “/” character. Doing so will bring up a context menu.

        Select Text to choose a text input, then name the input email_subject (you do not need to enter any sample data). Click Close on the context window.

        You’ll notice a new placeholder inserted in the prompt next to EMAIL_SUBJECT with the name email_subject, and at the bottom of the Instructions textbox you will see ‘1 input’.

        Follow the same pattern to add another Text input named email_body next to the text EMAIL_BODY. Now you should see ‘2 inputs’ at the bottom of the Instructions textbox.

        Click Save at the bottom right of the screen.

        The two inputs (Email_body and Email_subject) are now available to be set under the Parameters tab.

        Click in the text box for Email_body, click the blue lightning bolt icon, and select The plain text content output from the previous Html to text step.

        Select the Subject from the When a new email arrives (V3) action as the dynamic content for the Email_subject.
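
        Conceptually, the wiring you just completed is simple string substitution: at run time, the flow drops the real subject and body into the prompt where the two inputs sit. Below is a minimal Python sketch of that idea; the subject line is invented for illustration.

        # Conceptual equivalent of the email_subject / email_body inputs: the flow
        # substitutes real values from the trigger into the prompt at run time.
        PROMPT_TEMPLATE = (
            "You are an assistant that categorizes inbound emails.\n"
            "...\n"  # the full instructions from step #3 go here; if you use
                     # str.format, escape any literal { } in them as {{ }}
            "EMAIL_SUBJECT: {email_subject}\n"
            "EMAIL_BODY: {email_body}\n"
        )

        prompt = PROMPT_TEMPLATE.format(
            email_subject="Request for quote",             # hypothetical subject
            email_body="Please quote 250 EA of PN-4321.",  # plain text from step #2
        )
        print(prompt)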

        4. Parse the JSON returned from the LLM

        Add a Parse JSON action to the flow.

        Tip: Actions can be configured to run only after the success or failure of previous actions. For complicated actions that have a real chance of failing (like the Run a Prompt action above), configure the next action to fire only upon successful completion of the previous one. When hardening this solution for production, consider changing the Settings of the Parse JSON action so that it fires only if the Run a Prompt action succeeds, and define another action to handle the error path if the previous action fails or times out.

        Under its Parameters, select the Text output from the Run a Prompt action as the dynamic content for the Content parameter.

        The action’s other parameter describes the JSON schema that the response should match. Enter the following into Schema:

        {
          "type": "object",
          "properties": {
            "intent": {
              "type": "string",
              "enum": ["quote_request","purchase_order","other"]
            },
            "confidence": { "type": "number" },
            "summary": { "type": "string" }
          },
          "required": ["intent","confidence","summary"]
        }

        Tip: Don’t forget to modify the above schema to match any changes you made to the list of intents in step #3 above.
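
        If you customize the intents, it can help to sanity-check the schema against a sample response before wiring everything up. The sketch below does this in Python with the third-party jsonschema package; inside Power Automate, the Parse JSON action performs the equivalent check for you.

        # Validate a sample LLM response against the schema above.
        # Assumes: pip install jsonschema
        from jsonschema import ValidationError, validate

        schema = {
            "type": "object",
            "properties": {
                "intent": {"type": "string", "enum": ["quote_request", "purchase_order", "other"]},
                "confidence": {"type": "number"},
                "summary": {"type": "string"},
            },
            "required": ["intent", "confidence", "summary"],
        }

        sample = {"intent": "purchase_order", "confidence": 0.93, "summary": "The sender attached a purchase order."}

        try:
            validate(instance=sample, schema=schema)
            print("Sample matches the schema.")
        except ValidationError as err:
            print(f"Schema mismatch: {err.message}")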

        5. Gate on confidence

        Add a Condition action to the flow. In its Parameters, click in the box labeled Choose a value, click the blue lightning bolt icon to insert dynamic content, and choose the Body confidence output from the Parse JSON action.

        Set the condition to is greater than or equal to and enter 0.90.

        Tip: Tune this confidence score as needed.

        If False: you have several options here. You can leave this branch empty, effectively ignoring emails in the monitored folder that the AI was not highly confident matched a specified intent, or you can add actions that move the email to another folder for a human to review.

        If True: continue to the next step, Switch on intent.

        6. Switch on intent

        Add a Switch action to the flow under the True condition.

        In the On parameter, click in the box labeled Choose a value, then click the blue lightning bolt icon and select Body intent from the Parse JSON action as dynamic content.

        In the Map View, there is a “+” button below the Switch action, as well as a Default case.

        Click the “+” button to the left of the Default case to add a case. In the Equals parameter, enter one of the intents, such as “quote_request”.

        Tip: For readability in the Map View, click ‘Case’ to change the name of the case from ‘Case’ to ‘Quote_Request’.

        Follow the same pattern to add the other cases below the Switch action, for example, “purchase_order”.
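
        Taken together, steps #5 and #6 form a gate-then-route pattern. Here is that logic mirrored in plain Python, for illustration only; the handler functions are hypothetical stand-ins for the per-intent actions you will add in the next step.

        # Python mirror of the Condition (step #5) and Switch (step #6) actions.
        # Requires Python 3.10+ for match/case; the handlers are hypothetical.

        def handle_quote_request(summary: str) -> None:
            print(f"Quote_Request case: {summary}")

        def handle_purchase_order(summary: str) -> None:
            print(f"Purchase_Order case: {summary}")

        def route(result: dict, threshold: float = 0.90) -> None:
            if result["confidence"] < threshold:
                print("Below threshold: ignore, or move to a folder for review.")
                return
            match result["intent"]:
                case "quote_request":
                    handle_quote_request(result["summary"])
                case "purchase_order":
                    handle_purchase_order(result["summary"])
                case _:
                    print("Default case: no matching intent.")

        route({"intent": "quote_request", "confidence": 0.97,
               "summary": "The sender is requesting a quote for 250 units of PN-4321."})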

        7. Continue the flow for each intent

        Click the “+” button under each case (including the ‘Default’ case, if needed) to add the actions you want for that intent. An excellent next step in the flow may be to extract a list of items from the email content. Refer to the article How to Extract Items and Quantities from Emails, PDFs and Excel using AI to learn how to perform that step.

        Summary

        This post has shown how to achieve Office 365 email automation using Microsoft Power Automate and AI to automatically classify incoming emails by intent, such as quote requests, purchase orders, or other messages. This method of AI email categorization will save your business time, reduce errors, and streamline the processes that emails trigger.