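You can generate experiment reports in two main ways:
- Direct experiment reporting: Query the experiment resource for metrics. This option returns control and treatment metrics in a single response, along with statistical comparisons such as lift and p-values. It is the only way to report on intra-campaign experiments.
- Campaign reporting: Query the campaign resource for metrics, using campaign.experiment_type to distinguish base campaigns from experiment campaigns. This option applies only to experiments that use separate control and treatment campaigns, such as system-managed experiments.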
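This guide focuses on direct experiment reporting, which is compatible with every experiment type that supports reporting.
Direct experiment reporting
You can query the experiment resource directly to retrieve performance metrics and statistical comparisons between the control and treatment arms.
Metrics and statistical significance
For core metrics such as clicks, impressions, cost, conversions, and conversion value, the experiment resource provides both the treatment arm metric (for example, metrics.clicks) and the control arm metric (for example, metrics.control_clicks) in the same row.
It also provides fields that help you assess the statistical significance of any difference between the arms: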
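- metrics.*_p_value: The probability of observing the results if the experiment had no actual effect on the metric. A lower p-value indicates higher statistical significance.
- metrics.*_point_estimate: The estimated lift (positive or negative) of the treatment arm relative to the control arm for the given metric. Together with margin_of_error, it describes the confidence interval for the estimated difference at the specified confidence level. The quantity being estimated is (treatment / control - 1). The point estimate is the center of the confidence interval.
- metrics.*_margin_of_error: The radius of the confidence interval, which is centered on point_estimate. It is computed for a specified confidence level, and that confidence level depends on the experiment type.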
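To illustrate how the point estimate and margin of error fit together, the following minimal sketch builds the confidence interval for a relative lift. The names LiftInterval and lift_interval and the sample numbers are hypothetical and not part of the API; in practice the inputs would come from fields such as metrics.clicks_point_estimate and metrics.clicks_margin_of_error.

```python
from typing import NamedTuple


class LiftInterval(NamedTuple):
    """Confidence interval for (treatment / control - 1)."""

    lower: float
    upper: float

    def excludes_zero(self) -> bool:
        # The interval lies entirely above or entirely below zero.
        return self.lower > 0 or self.upper < 0


def lift_interval(point_estimate: float, margin_of_error: float) -> LiftInterval:
    """Builds the interval centered on the point estimate.

    The margin of error is the radius of the interval, so the bounds are
    point_estimate - margin_of_error and point_estimate + margin_of_error.
    """
    if margin_of_error < 0:
        raise ValueError("margin_of_error must be non-negative")
    return LiftInterval(
        lower=point_estimate - margin_of_error,
        upper=point_estimate + margin_of_error,
    )


# Hypothetical values: an estimated lift of 8% with a margin of 5 points.
interval = lift_interval(0.08, 0.05)
print(f"Lift is between {interval.lower:+.1%} and {interval.upper:+.1%}")
print("Interval excludes zero:", interval.excludes_zero())
```

The confidence level behind the interval is not an input here: per the field descriptions, it is preset according to the experiment type.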
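The experiment resource supports the following core metric fields, each with a treatment value, a control value, and the statistical fields listed earlier:
- clicks
- impressions
- cost_micros
- conversions
- cost_per_conversion
- conversion_value
- conversion_value_per_cost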
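For conversions, the statistical fields are provided through the following absolute_change fields rather than as relative values:
- metrics.conversions_absolute_change_p_value: The p-value for the null hypothesis that the experiment has no effect on the absolute change in conversions. Ranges from 0 to 1.
- metrics.conversions_absolute_change_point_estimate: The point estimate of the experiment's effect on the absolute change in conversions.
- metrics.conversions_absolute_change_margin_of_error: The margin of error of the estimate of the experiment's effect on the absolute change in conversions.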
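For help building valid queries against the experiment resource, use the Google Ads Query Builder tool.
Example query
The following GAQL query retrieves key metrics for an experiment: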
SELECT
  experiment.experiment_id,
  experiment.name,
  experiment.type,
  metrics.clicks,
  metrics.control_clicks,
  metrics.clicks_point_estimate,
  metrics.clicks_margin_of_error,
  metrics.clicks_p_value,
  metrics.conversions,
  metrics.control_conversions,
  metrics.conversions_absolute_change_point_estimate,
  metrics.conversions_absolute_change_margin_of_error,
  metrics.conversions_absolute_change_p_value
FROM experiment
WHERE experiment.experiment_id = EXPERIMENT_ID
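One way to issue a query like this from Python is sketched below. It assumes an already configured GoogleAdsClient from the google-ads library and calls the search method of its GoogleAdsService. The helper names build_experiment_query and fetch_experiment_row are hypothetical and exist only for illustration.

```python
from typing import Any, Optional

# Fields taken from the example query.
_FIELDS = (
    "experiment.experiment_id",
    "experiment.name",
    "experiment.type",
    "metrics.clicks",
    "metrics.control_clicks",
    "metrics.clicks_point_estimate",
    "metrics.clicks_margin_of_error",
    "metrics.clicks_p_value",
    "metrics.conversions",
    "metrics.control_conversions",
    "metrics.conversions_absolute_change_point_estimate",
    "metrics.conversions_absolute_change_margin_of_error",
    "metrics.conversions_absolute_change_p_value",
)


def build_experiment_query(experiment_id: int) -> str:
    """Returns the GAQL text for one experiment (hypothetical helper)."""
    # Coerce to int so that arbitrary text cannot end up in the query.
    experiment_id = int(experiment_id)
    select_list = ",\n  ".join(_FIELDS)
    return (
        f"SELECT\n  {select_list}\n"
        "FROM experiment\n"
        f"WHERE experiment.experiment_id = {experiment_id}"
    )


def fetch_experiment_row(
    client: Any, customer_id: str, experiment_id: int
) -> Optional[Any]:
    """Returns the first row for the experiment, or None if there is none.

    Assumes `client` is an initialized GoogleAdsClient.
    """
    ga_service = client.get_service("GoogleAdsService")
    response = ga_service.search(
        customer_id=customer_id, query=build_experiment_query(experiment_id)
    )
    for row in response:
        return row
    return None
```

Because the control and treatment metrics arrive in the same row, a single returned row is enough for the evaluation that follows.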
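Interpreting results
You can use the p-value, point estimate, and margin of error fields to determine whether an experiment produced a statistically significant result. For example, if conversions_absolute_change_p_value is below your chosen threshold (for example, 0.05 for 95% confidence) and conversions_absolute_change_point_estimate - conversions_absolute_change_margin_of_error is greater than zero, the treatment arm performed significantly better than the control arm on conversions.
The following Python snippet shows how to evaluate results based on the p-value and the lift estimates: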
Python
# Note: P_VALUE_THRESHOLD, promote_experiment, end_experiment, and
# graduate_experiment are not defined in this excerpt; they are assumed to be
# provided by the surrounding sample.
def evaluate_experiment(
    client: GoogleAdsClient, customer_id: str, row: GoogleAdsRow
) -> None:
    """Evaluates the performance of the experiment.

    Args:
        client: an initialized GoogleAdsClient instance.
        customer_id: a client customer ID.
        row: a GoogleAdsRow containing the experiment arm and metrics.
    """
    metrics = row.metrics
    experiment_resource_name = row.experiment.resource_name

    # 1. Evaluate conversion success as a primary success signal.
    # - Point Estimate: Represents the estimated average lift or difference
    #   in conversions.
    # - Margin of Error: Outlines the confidence interval bounds. Note that
    #   the margin_of_error provided by the API is calculated for a preset
    #   confidence level which is set based on the experiment type.
    # - Lower Bound: (Point Estimate - Margin of Error). If this value is
    #   above 0, we have statistical significance that performance has
    #   improved.
    conv_p_value = metrics.conversions_absolute_change_p_value
    conv_lift = metrics.conversions_absolute_change_point_estimate
    conv_error = metrics.conversions_absolute_change_margin_of_error
    conv_lower_bound = conv_lift - conv_error

    if conv_p_value <= P_VALUE_THRESHOLD:
        if conv_lower_bound > 0:
            print(
                "Significant Success: Conversions increased. Even at the lower"
                f" bound, the lift is {conv_lower_bound:.2f}. Promoting"
                " changes."
            )
            promote_experiment(client, customer_id, experiment_resource_name)
            return
        elif (conv_lift + conv_error) < 0:
            print(
                "Significant Decline: Even the upper bound"
                f" ({conv_lift + conv_error:.2f}) is below zero. Ending"
                " experiment."
            )
            end_experiment(client, customer_id, experiment_resource_name)
            return

    # 2. Evaluate click volume as a secondary signal.
    # This is helpful as an early indicator or for lower-volume accounts.
    click_p_value = metrics.clicks_p_value
    click_lift = metrics.clicks_point_estimate
    click_error = metrics.clicks_margin_of_error
    click_lower_bound = click_lift - click_error

    if click_p_value <= P_VALUE_THRESHOLD and click_lower_bound > 0:
        # We have a directional winner: high confidence in more traffic,
        # but not enough data to confirm conversion impact yet.
        print(
            f"Click volume is significantly up (+{click_lift*100:.1f}%). "
            "Graduating treatment for further manual analysis."
        )
        # Graduate if it's a separate campaign test.
        # This keeps the high-volume treatment running independently.
        # Intra-campaign experiments (like ADOPT_BROAD_MATCH_KEYWORDS and
        # ADOPT_AI_MAX) run directly within the base campaign, meaning there
        # is only a single campaign involved and no separate treatment
        # campaign to graduate. Therefore, graduation is not supported for
        # intra-campaign experiments.
        experiment_type_name = row.experiment.type_.name
        if (
            experiment_type_name != "ADOPT_BROAD_MATCH_KEYWORDS"
            and experiment_type_name != "ADOPT_AI_MAX"
        ):
            graduate_experiment(client, customer_id, experiment_resource_name)
        else:
            print(
                "Intra-campaign trial detected: Graduation is not supported"
                " because there is only one campaign. Continuing to run to"
                " gather more conversion data."
            )
    else:
        # Both conversions and clicks are noisy.
        print(
            "Inconclusive: No significant lift in Conversions"
            f" (p={conv_p_value:.2f}) or Clicks (p={click_p_value:.2f})."
            f" Current estimated lift: {conv_lift:.2f} +/- {conv_error:.2f}."
            " Continue running."
        )
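Advantages over campaign reporting
Compared with querying campaign reports separately, direct experiment reporting offers several advantages:
- Centralized metrics: Retrieve control and treatment metrics in a single row.
- Statistical confidence data: Get precomputed p-values, point estimates, and margins of error.
- Efficiency: No need to manually join or compare results from multiple reports.
- Support for intra-campaign experiments: This is the only way to compare the control and treatment arms of an intra-campaign experiment, where traffic is split within a single campaign.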
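Campaign reporting
For experiments that create a separate treatment campaign (for example, SEARCH_CUSTOM), you can query the campaign resource and use campaign.experiment_type to identify the BASE (control) and EXPERIMENT (treatment) campaigns. This approach is useful when you need to segment metrics at a finer level (for example, by ad group or keyword) or view campaign metadata that the experiment resource doesn't provide. However, you need to perform the performance comparison and the statistical calculations yourself.
You can't use campaign-level reporting to compare the arms of an intra-campaign experiment, because the traffic split happens inside a single campaign.
Querying campaign for an intra-campaign experiment returns only aggregate totals.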
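Since the comparison is manual with campaign reporting, a sketch like the following may help. It pairs an illustrative GAQL query with a function that computes the relative lift in conversion rate and a two-sided p-value from a pooled two-proportion z-test. The z-test is a textbook stand-in chosen for illustration and is not claimed to be the method behind the statistics on the experiment resource. The query text, the function name, and the sample numbers are all assumptions.

```python
import math
from typing import Tuple

# Illustrative query (an assumption, not taken from the guide). In practice
# you would also filter to the specific base and experiment campaigns.
CAMPAIGN_QUERY = """
SELECT
  campaign.id,
  campaign.name,
  campaign.experiment_type,
  metrics.clicks,
  metrics.conversions
FROM campaign
WHERE campaign.experiment_type IN ('BASE', 'EXPERIMENT')
"""


def compare_conversion_rates(
    control_clicks: float,
    control_conversions: float,
    treatment_clicks: float,
    treatment_conversions: float,
) -> Tuple[float, float]:
    """Returns (relative_lift, two_sided_p_value) for conversions per click.

    Treats conversions per click as a binomial proportion, which is only an
    approximation: conversions can be fractional or exceed one per click.
    """
    if control_clicks <= 0 or treatment_clicks <= 0:
        raise ValueError("both arms need at least one click")
    rate_c = control_conversions / control_clicks
    rate_t = treatment_conversions / treatment_clicks
    # Same quantity as the API's point estimate: treatment / control - 1.
    lift = rate_t / rate_c - 1 if rate_c > 0 else float("nan")

    pooled = (control_conversions + treatment_conversions) / (
        control_clicks + treatment_clicks
    )
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / control_clicks + 1 / treatment_clicks)
    )
    if std_err == 0:
        return lift, 1.0
    z = (rate_t - rate_c) / std_err
    # Two-sided p-value of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return lift, p_value


# Hypothetical totals for the BASE and EXPERIMENT campaigns.
lift, p_value = compare_conversion_rates(10_000, 500, 10_000, 600)
print(f"Relative lift: {lift:+.1%}, p-value: {p_value:.4f}")
```

Results from a manual test like this can differ from the values reported on the experiment resource, so treat them as a rough cross-check.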
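Best practices
- Choose an appropriate confidence level: A lower confidence level (that is, a higher p-value threshold) can provide directional guidance sooner, especially when budgets or conversion volumes are low. A 95% confidence level (p-value <= 0.05) is considered the academic standard and may be better suited to getting more accurate results over a longer period.
- Let the experiment run long enough: Run the experiment for at least 4 weeks to account for weekly performance cycles, conversion lag, and learning periods.
- Allow a ramp-up period: For campaigns that use automated bidding or test new features, disregard the first 1-2 weeks of data so that bidding models and traffic levels can recalibrate to the split.
- Use a 50/50 split: A 50/50 traffic split usually reaches statistically significant results the fastest.
- Schedule ahead: Set the experiment start date 3-7 days in the future to leave time for ad review and approval.
- Only one experiment can run per campaign at any given time.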
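The timing guidance above can be turned into concrete dates with a small helper such as the one sketched below. The function name, its defaults, and the choice to count the 4 weeks from the start date (ramp-up included) are assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import NamedTuple


class ExperimentPlan(NamedTuple):
    start: date           # first day the experiment serves
    analysis_start: date  # first day whose data you would consider
    earliest_end: date    # earliest day to draw conclusions


def plan_experiment(
    today: date,
    lead_days: int = 5,
    ramp_up_weeks: int = 2,
    min_run_weeks: int = 4,
) -> ExperimentPlan:
    """Derives key dates from the timing recommendations.

    lead_days follows the "3-7 days in the future" recommendation.
    ramp_up_weeks is the initial period whose data is disregarded.
    min_run_weeks is the minimum run length, counted from the start date
    (an assumption; the guidance does not say whether ramp-up is included).
    """
    if not 3 <= lead_days <= 7:
        raise ValueError("lead_days should be between 3 and 7")
    if ramp_up_weeks >= min_run_weeks:
        raise ValueError("ramp-up must be shorter than the total run")
    start = today + timedelta(days=lead_days)
    return ExperimentPlan(
        start=start,
        analysis_start=start + timedelta(weeks=ramp_up_weeks),
        earliest_end=start + timedelta(weeks=min_run_weeks),
    )


plan = plan_experiment(date(2025, 1, 1))
print(plan.start, plan.analysis_start, plan.earliest_end)
```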