实验报告

您可以通过以下两种主要方式生成实验报告:

  • 直接实验报告:查询 experiment 资源以获取指标。此选项可在单个响应中提供对照组和实验组的指标,以及提升幅度和 p 值等统计比较数据。这是报告广告系列内实验的唯一方式。
  • 广告系列报告:查询 campaign 资源以获取指标,使用 campaign.experiment_type 区分基础广告系列和实验广告系列。此选项仅适用于使用单独的对照组广告系列和实验组广告系列的实验,例如系统管理的实验。

本指南主要介绍直接实验报告,该报告与支持报告的所有实验类型兼容。

直接实验报告

您可以直接查询 experiment 资源,以检索对照组和实验组之间的效果指标和统计比较结果。

指标和统计显著性

对于点击次数、展示次数、费用、转化次数和转化价值等核心指标,experiment 资源会在同一行中同时提供实验组指标(例如 metrics.clicks)和对照组指标(例如 metrics.control_clicks)。

它还提供了一些字段,可帮助您评估各实验组之间任何差异的统计显著性:

  • metrics.*_p_value:如果实验对相应指标没有实际影响,出现观测结果的概率。p 值越小,统计显著性越高。
  • metrics.*_point_estimate:实验组相对于对照组在给定指标方面的估计升幅(正或负)。metrics.*_point_estimatemargin_of_error 一起描述了一个置信区间,该区间具有针对所估计差异的指定置信水平。所估计的量为(实验组 / 对照组 - 1)。点估计值是置信区间的中心。
  • metrics.*_margin_of_error:置信区间的半径,以 point_estimate 为中心。它是针对指定置信度计算的,而置信度取决于实验类型。

experiment 资源支持以下核心指标字段,包括处理组值、对照组值和前面列出的统计字段:

  • clicks
  • impressions
  • cost_micros
  • conversions
  • cost_per_conversion
  • conversion_value
  • conversion_value_per_cost

对于转化,统计字段可通过以下 absolute_change 字段获得,而不是以相对值的形式获得:

如需帮助构建针对 experiment 资源的有效查询,请使用 Google Ads 查询构建器工具。

查询示例

以下 GAQL 查询可检索实验的关键指标:

SELECT
  experiment.experiment_id,
  experiment.name,
  experiment.type,
  metrics.clicks,
  metrics.control_clicks,
  metrics.clicks_point_estimate,
  metrics.clicks_margin_of_error,
  metrics.clicks_p_value,
  metrics.conversions,
  metrics.control_conversions,
  metrics.conversions_absolute_change_point_estimate,
  metrics.conversions_absolute_change_margin_of_error,
  metrics.conversions_absolute_change_p_value
FROM experiment
WHERE experiment.experiment_id = EXPERIMENT_ID

解读结果

您可以使用 p 值、点估计值和边际误差字段来确定实验是否产生了具有统计显著性的结果。例如,如果 conversions_absolute_change_p_value 低于您选择的阈值(例如,对于 95% 的置信度,阈值为 0.05),且 conversions_absolute_change_point_estimate - conversions_absolute_change_margin_of_error 大于零,则表示实验组在转化方面的效果明显优于对照组。

以下 Python 代码段展示了如何根据 p 值和提升效果估计值评估结果:

Java

private void evaluateExperiment(
    GoogleAdsClient googleAdsClient, long customerId, GoogleAdsRow row) {
  Metrics metrics = row.getMetrics();
  String experimentResourceName = row.getExperiment().getResourceName();

  // 1. Evaluate conversion success as a primary success signal if available.
  // - Point Estimate: Represents the estimated average lift or difference in conversions.
  // - Margin of Error: Outlines the confidence interval bounds. Note that the margin_of_error
  //   provided by the API is calculated for a preset confidence level which is set based on the
  //   experiment type.
  // - Lower Bound: (Point Estimate - Margin of Error). If this value is above 0,
  //   we have statistical significance that performance has improved.
  double convPValue = metrics.getConversionsAbsoluteChangePValue();
  double convLift = metrics.getConversionsAbsoluteChangePointEstimate();
  double convError = metrics.getConversionsAbsoluteChangeMarginOfError();
  double convLowerBound = convLift - convError;

  if (convPValue <= P_VALUE_THRESHOLD) {
    if (convLowerBound > 0) {
      System.out.printf(
          "Significant Success: Conversions increased. Even at the lower bound, the lift is %.2f."
              + " Promoting changes.%n",
          convLowerBound);
      promoteExperiment(googleAdsClient, customerId, experimentResourceName);
      return;
    } else if ((convLift + convError) < 0) {
      System.out.printf(
          "Significant Decline: Even the upper bound (%.2f) is below zero. Ending experiment.%n",
          convLift + convError);
      endExperiment(googleAdsClient, customerId, experimentResourceName);
      return;
    }
  }

  // 2. Fall back to evaluating click metrics if conversions are inconclusive.
  double clickPValue = metrics.getClicksPValue();
  double clickLift = metrics.getClicksPointEstimate();
  double clickError = metrics.getClicksMarginOfError();
  double clickLowerBound = clickLift - clickError;

  if (clickPValue <= P_VALUE_THRESHOLD && clickLowerBound > 0) {
    System.out.printf("Click volume is significantly up (+%.1f%%).%n", clickLift * 100);

    // Graduation is only supported for separate campaign experiments, not
    // intra-campaign experiments where there is no separate treatment campaign.
    ExperimentType experimentType = row.getExperiment().getType();
    if (experimentType != ExperimentType.ADOPT_BROAD_MATCH_KEYWORDS
        && experimentType != ExperimentType.ADOPT_AI_MAX) {
      System.out.println("Graduating treatment campaign for further manual analysis.");
      graduateExperiment(googleAdsClient, customerId, experimentResourceName);
    } else {
      System.out.println(
          "Intra-campaign trial detected: graduation is not supported. Continuing to run the"
              + " experiment to gather more conversion data.");
    }
  } else {
    // 3. Print status if no action was taken.
    System.out.printf(
        "Inconclusive: No significant lift in Conversions (p=%.2f) or Clicks (p=%.2f). Current"
            + " estimated lift: %.2f +/- %.2f. Allowing the experiment to continue running.%n",
        convPValue, clickPValue, convLift, convError);
  }
}

      

C#

private static void EvaluateExperiment(GoogleAdsClient client, long customerId, GoogleAdsRow row)
{
    // This function evaluates performance metrics and immediately takes action
    // to update the experiment's status (promote, end, or graduate) if
    // statistical significance thresholds are met.
    var metrics = row.Metrics;
    string experimentResourceName = row.Experiment.ResourceName;

    bool hasConvMetrics = metrics.HasConversionsAbsoluteChangePValue
        && metrics.HasConversionsAbsoluteChangePointEstimate
        && metrics.HasConversionsAbsoluteChangeMarginOfError;

    bool hasClickMetrics = metrics.HasClicksPValue
        && metrics.HasClicksPointEstimate
        && metrics.HasClicksMarginOfError;

    // 1. Evaluate conversion success as a primary success signal if available.
    // - Point Estimate: Represents the estimated average lift or difference in conversions.
    // - Margin of Error: Outlines the confidence interval bounds. Note that the margin_of_error
    //   provided by the API is calculated for a preset confidence level which is set based on
    //   the experiment type.
    // - Lower Bound: (Point Estimate - Margin of Error). If this value is above 0,
    //   we have statistical significance that performance has improved.
    if (hasConvMetrics)
    {
        double convPValue = metrics.ConversionsAbsoluteChangePValue;
        double convLift = metrics.ConversionsAbsoluteChangePointEstimate;
        double convError = metrics.ConversionsAbsoluteChangeMarginOfError;
        double convLowerBound = convLift - convError;

        if (convPValue <= P_VALUE_THRESHOLD)
        {
            if (convLowerBound > 0)
            {
                Console.WriteLine(
                    $"Significant Success: Conversions increased. Even at the lower" +
                    $" bound, the lift is {convLowerBound:F2}. Promoting changes.");
                PromoteExperiment(client, customerId, experimentResourceName);
                return;
            }
            else if ((convLift + convError) < 0)
            {
                Console.WriteLine(
                    $"Significant Decline: Even the upper bound ({convLift + convError:F2}) " +
                    $"is below zero. Ending experiment.");
                EndExperiment(client, customerId, experimentResourceName);
                return;
            }
        }
    }

    // 2. Evaluate click volume as a secondary signal.
    // This is helpful as an early indicator or for lower-volume accounts.
    if (hasClickMetrics)
    {
        double clickPValue = metrics.ClicksPValue;
        double clickLift = metrics.ClicksPointEstimate;
        double clickError = metrics.ClicksMarginOfError;
        double clickLowerBound = clickLift - clickError;

        if (clickPValue <= P_VALUE_THRESHOLD && clickLowerBound > 0)
        {
            // We have a directional winner: high confidence in more traffic,
            // but not enough data to confirm conversion impact yet.
            Console.WriteLine(
                $"Click volume is significantly up (+{clickLift * 100:F1}%).");

            // Graduation is only supported for separate campaign experiments, not
            // intra-campaign experiments where there is no separate treatment campaign.
            if (row.Experiment.Type != ExperimentType.AdoptBroadMatchKeywords
                && row.Experiment.Type != ExperimentType.AdoptAiMax)
            {
                Console.WriteLine("Graduating treatment campaign for further manual analysis.");
                GraduateExperiment(client, customerId, experimentResourceName);
            }
            else
            {
                Console.WriteLine(
                    "Intra-campaign trial detected: graduation is not supported. " +
                    "Continuing to run the experiment to gather more conversion data.");
            }
            return;
        }
    }

    // 3. Print status if no action was taken.
    if (hasConvMetrics || hasClickMetrics)
    {
        string convStatus = hasConvMetrics
            ? $"Conversions (p={metrics.ConversionsAbsoluteChangePValue:F2}, " +
              $"lift={metrics.ConversionsAbsoluteChangePointEstimate:F2} +/- " +
              $"{metrics.ConversionsAbsoluteChangeMarginOfError:F2})"
            : "Conversions (not populated)";

        string clickStatus = hasClickMetrics
            ? $"Clicks (p={metrics.ClicksPValue:F2}, " +
              $"lift={metrics.ClicksPointEstimate:F2} +/- " +
              $"{metrics.ClicksMarginOfError:F2})"
            : "Clicks (not populated)";

        Console.WriteLine(
            $"Inconclusive: No significant action taken. {convStatus}, {clickStatus}. " +
            "Allowing the experiment to continue running.");
    }
    else
    {
        Console.WriteLine(
            "Conversion and click performance metrics are not yet populated. " +
            "Allowing the experiment to continue running.");
    }
}
      

PHP

This example is not yet available in PHP; you can take a look at the other languages.
    

Python

def evaluate_experiment(
    client: GoogleAdsClient, customer_id: str, row: GoogleAdsRow
) -> None:
    """Evaluates the performance of the experiment and updates it accordingly
    (for example, promotes, ends, or graduates).

    Checks conversion and click metrics against statistical significance thresholds
    to determine the appropriate action to take on the experiment.

    Args:
        client: an initialized GoogleAdsClient instance.
        customer_id: a client customer ID.
        row: a GoogleAdsRow containing the experiment and metrics.
    """
    # This function evaluates performance metrics and immediately takes action
    # to update the experiment's status (promote, end, or graduate) if
    # statistical significance thresholds are met.
    metrics = row.metrics
    experiment_resource_name = row.experiment.resource_name

    has_conv_metrics = (
        "conversions_absolute_change_p_value" in metrics
        and "conversions_absolute_change_point_estimate" in metrics
        and "conversions_absolute_change_margin_of_error" in metrics
    )
    has_click_metrics = (
        "clicks_p_value" in metrics
        and "clicks_point_estimate" in metrics
        and "clicks_margin_of_error" in metrics
    )

    # 1. Evaluate conversion success as a primary success signal if available.
    # - Point Estimate: Represents the estimated average lift or difference in conversions.
    # - Margin of Error: Outlines the confidence interval bounds. Note that the margin_of_error provided by the API is calculated for a preset confidence level which is set based on the experiment type.
    # - Lower Bound: (Point Estimate - Margin of Error). If this value is above 0,
    #   we have statistical significance that performance has improved.
    if has_conv_metrics:
        conv_p_value = metrics.conversions_absolute_change_p_value
        conv_lift = metrics.conversions_absolute_change_point_estimate
        conv_error = metrics.conversions_absolute_change_margin_of_error
        conv_lower_bound = conv_lift - conv_error

        if conv_p_value <= P_VALUE_THRESHOLD:
            if conv_lower_bound > 0:
                print(
                    "Significant Success: Conversions increased. Even at the lower"
                    f" bound, the lift is {conv_lower_bound:.2f}. Promoting"
                    " changes."
                )
                promote_experiment(
                    client, customer_id, experiment_resource_name
                )
                return
            elif (conv_lift + conv_error) < 0:
                print(
                    "Significant Decline: Even the upper bound"
                    f" ({conv_lift + conv_error:.2f}) is below zero. Ending"
                    " experiment."
                )
                end_experiment(client, customer_id, experiment_resource_name)
                return

        # 2. Evaluate click volume as a secondary signal.
        # This is helpful as an early indicator or for lower-volume accounts.
        click_p_value = metrics.clicks_p_value
        click_lift = metrics.clicks_point_estimate
        click_error = metrics.clicks_margin_of_error
        click_lower_bound = click_lift - click_error

        if click_p_value <= P_VALUE_THRESHOLD and click_lower_bound > 0:
            # We have a directional winner: high confidence in more traffic,
            # but not enough data to confirm conversion impact yet.
            print(f"Click volume is significantly up (+{click_lift*100:.1f}%).")

            # Graduation is only supported for separate campaign experiments, not
            # intra-campaign experiments where there is no separate treatment campaign.
            experiment_type_name = row.experiment.type_.name
            if (
                experiment_type_name != "ADOPT_BROAD_MATCH_KEYWORDS"
                and experiment_type_name != "ADOPT_AI_MAX"
            ):
                print(
                    "Graduating treatment campaign for further manual analysis."
                )
                graduate_experiment(
                    client, customer_id, experiment_resource_name
                )
            else:
                print(
                    "Intra-campaign trial detected: graduation is not supported. "
                    "Continuing to run the experiment to gather more conversion data."
                )
            return

    # 3. Print status if no action was taken.
    if has_conv_metrics or has_click_metrics:
        conv_status = (
            f"Conversions (p={metrics.conversions_absolute_change_p_value:.2f}, "
            f"lift={metrics.conversions_absolute_change_point_estimate:.2f} +/- "
            f"{metrics.conversions_absolute_change_margin_of_error:.2f})"
            if has_conv_metrics
            else "Conversions (not populated)"
        )
        click_status = (
            f"Clicks (p={metrics.clicks_p_value:.2f}, "
            f"lift={metrics.clicks_point_estimate:.2f} +/- "
            f"{metrics.clicks_margin_of_error:.2f})"
            if has_click_metrics
            else "Clicks (not populated)"
        )
        print(
            f"Inconclusive: No significant action taken. {conv_status}, {click_status}."
            " Allowing the experiment to continue running."
        )
    else:
        print(
            "Conversion and click performance metrics are not yet populated. "
            "Allowing the experiment to continue running."
        )
      

Ruby

This example is not yet available in Ruby; you can take a look at the other languages.
    

Perl

This example is not yet available in Perl; you can take a look at the other languages.
    

curl

相较于广告系列报告的优势

与单独查询广告系列报告相比,直接实验报告具有以下优势:

  1. 集中式指标:在一行中检索对照组和实验组的指标。
  2. 统计置信度数据:提供计算出的 p 值、点估计值和误差范围。
  3. 效率:无需手动联接或比较多个报告中的结果。
  4. 支持广告系列内实验:这是比较广告系列内实验的对照组与实验组的唯一方法,此类实验的流量是在单个广告系列内分配的。

广告系列报告

对于创建单独的实验组广告系列(例如 SEARCH_CUSTOM)的实验,您可以查询 campaign 资源,并使用 campaign.experiment_type 来识别 BASE(对照组)和 EXPERIMENT(实验组)广告系列。如果您需要以更精细的级别(例如按广告组或关键字)细分指标,或者查看 experiment 资源中未提供的广告系列元数据,此方法会非常有用。不过,您需要手动执行效果比较和统计计算。

您无法使用广告系列级报告来比较广告系列内实验的各组,因为流量分配是在单个广告系列内部进行的。 针对广告系列内实验查询 campaign 时,只会返回汇总总数。

最佳做法

  • 选择合适的置信度:设置较低的 p 值阈值可以更快地提供方向性指导,尤其是在预算或转化量较低的情况下。95% 的置信度(p 值 <= 0.05)被认为是学术标准,可能更适合在较长时间内获得更准确的结果。
  • 让实验运行足够长的时间:让实验至少运行 4 周,以考虑每周的效果周期、转化延迟和学习期。
  • 留出磨合时间:对于采用自动出价的广告系列或测试新功能的广告系列,请忽略前 1-2 周的数据,以便出价模型和流量水平重新校准到拆分后的状态。
  • 采用 50/50 的分配比例:采用 50/50 的流量分配比例通常可以最快地获得具有统计显著性的结果。
  • 提前安排:将实验开始日期设置为未来 3-7 天,以便为广告审核和审批流程预留时间。
  • 在任何给定时间,每个广告系列只能运行一项实验。