Perform exploratory data analysis
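After you collect your data, perform an exploratory data analysis (EDA) to find and address any data quality issues. This is a critical step in the marketing mix modeling (MMM) process because it lets you confirm that the data accurately represents your marketing efforts, customer responses, and other relevant metrics. Correcting the issues that EDA uncovers improves the reliability of the model output.
The basic process for performing an EDA is: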
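- Run a data review to identify any missing or incomplete data.
- Fix missing values in your raw input files.
- Evaluate the accuracy of the data.
- Correct any anomalies, outliers, or inaccuracies in the data.
- Check the correlation between your KPI, media, and control variables.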
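There are many ways to approach EDA, so Meridian doesn't provide visualizations for this process. We recommend that you find the right balance for your needs between a thorough, granular analysis that gives greater confidence and a quick check of high-level data that gives less detailed insight.
Consider these guidelines as you produce your own visualizations to assist with your EDA: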
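Check data completeness: Check for missing values in the data. You can create charts that show the percentage of data completeness for each variable (channel), then investigate the variables that show as incomplete.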
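As a rough illustration of the completeness check, the following sketch computes the percentage of non-missing values per variable using only the Python standard library. The function name `completeness_pct`, the variable names, and the sample values are hypothetical, and the sketch assumes that missing values appear as `None` or NaN. It is not part of Meridian.

```python
import math


def completeness_pct(columns):
    """Return the percentage of non-missing values for each variable."""

    def is_missing(value):
        # Treat None and float NaN as missing; zeros count as present.
        return value is None or (isinstance(value, float) and math.isnan(value))

    return {
        name: 100.0 * sum(not is_missing(v) for v in values) / len(values)
        for name, values in columns.items()
    }


# Hypothetical weekly data, one list of values per variable.
data = {
    "tv_spend": [120.0, 95.0, None, 110.0],
    "search_spend": [40.0, 42.0, 38.0, 41.0],
    "kpi_revenue": [900.0, float("nan"), 870.0, None],
}

# List the least complete variables first so they are investigated first.
for name, pct in sorted(completeness_pct(data).items(), key=lambda kv: kv[1]):
    print(f"{name}: {pct:.0f}% complete")
```

The resulting dictionary can be passed to whichever charting tool you prefer to draw a bar chart of completeness per variable.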
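To further refine your EDA, you can create visualizations that show the number of observations by year, month, week, and weekday. Look for any time period with unexpectedly few observations.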
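The observation counts can be sketched as follows, assuming daily data with one `datetime.date` per observation. The helper `observation_counts` and the sample dates are made up for illustration. Comparing each weekday against the most frequent one is just one simple way to spot gaps.

```python
from collections import Counter
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def observation_counts(dates):
    """Count observations per year, month, ISO week, and weekday."""
    return {
        "year": Counter(d.year for d in dates),
        "month": Counter((d.year, d.month) for d in dates),
        # isocalendar() yields (ISO year, ISO week, ISO weekday).
        "week": Counter(tuple(d.isocalendar())[:2] for d in dates),
        "weekday": Counter(WEEKDAYS[d.weekday()] for d in dates),
    }


# Hypothetical daily data: four weeks with two Sundays missing.
start = date(2024, 1, 1)
missing = {date(2024, 1, 14), date(2024, 1, 21)}
dates = [start + timedelta(days=i) for i in range(28)]
dates = [d for d in dates if d not in missing]

counts = observation_counts(dates)
expected = max(counts["weekday"].values())
for day, n in counts["weekday"].items():
    if n < expected:
        print(f"{day}: {n} observations (expected {expected})")
```

Each counter maps directly onto a bar chart, which makes a shortfall in a particular week or weekday easy to see.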
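Check data accuracy: Ensure that the data is accurate and free from anomalies or outliers that could skew results. Visualizations for checking accuracy can include comparing the share of media spend for each channel and plotting the trend of a channel to spot anything unusual. You can compare these visualizations against the media plan, or work with the marketing team, to help determine whether the data is accurate and granular enough.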
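Alongside a trend chart, a numeric screen can point to values worth a closer look. The sketch below uses a modified z-score based on the median absolute deviation (MAD). This is one common heuristic chosen here for illustration, not a method that Meridian prescribes. The function `flag_outliers`, the 3.5 threshold, and the spend values are assumptions.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread to compare against; inspect such a series manually.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Hypothetical weekly spend for one channel; week 4 looks like a data entry error.
weekly_spend = [100, 104, 98, 101, 1030, 99, 102, 97]

for i in flag_outliers(weekly_spend):
    print(f"week {i}: spend {weekly_spend[i]} looks unusual")
```

A flagged value is only a prompt to check against the media plan or with the marketing team. A genuine campaign burst can look the same as a typo.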
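Check channel size: Look at each channel's share of spend. Channels with a very small share of spend might be difficult to estimate, so consider combining them with other channels.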
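The share-of-spend check can be sketched as follows. The function `spend_share`, the channel names, and the 2% cutoff `MIN_SHARE` are hypothetical. The source does not state how small a share is too small, so treat the cutoff as a placeholder to tune for your data.

```python
def spend_share(spend_by_channel):
    """Return each channel's share of total spend as a fraction of 1."""
    totals = {ch: sum(vals) for ch, vals in spend_by_channel.items()}
    grand_total = sum(totals.values())
    return {ch: total / grand_total for ch, total in totals.items()}


# Hypothetical spend per channel over three periods.
spend = {
    "tv": [500.0, 450.0, 550.0],
    "search": [300.0, 320.0, 280.0],
    "podcast": [10.0, 0.0, 15.0],
}
MIN_SHARE = 0.02  # Hypothetical cutoff, not a Meridian recommendation.

shares = spend_share(spend)
small = [ch for ch, share in shares.items() if share < MIN_SHARE]

for ch, share in shares.items():
    note = " (candidate for combining)" if ch in small else ""
    print(f"{ch}: {share:.1%}{note}")
```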
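Check the variability of each channel's media execution: Channels with low variability in media execution (impressions, clicks, and so on) might be difficult to estimate. If you have relevant prior information for such a channel, consider using a custom prior.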
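One way to put a number on variability is the coefficient of variation, which is the standard deviation divided by the mean. The sketch below assumes that measure along with a hypothetical 5% cutoff. Neither comes from Meridian, and the channel names and impression counts are invented.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return the population standard deviation divided by the mean."""
    avg = mean(values)
    # A zero mean (for example, a channel that never ran) has no defined CV.
    return pstdev(values) / avg if avg else float("nan")


# Hypothetical weekly impressions per channel.
impressions = {
    "display": [1000, 1010, 990, 1000],
    "video": [200, 1500, 0, 900],
}
MIN_CV = 0.05  # Hypothetical cutoff for "low variability".

cv = {ch: coefficient_of_variation(vals) for ch, vals in impressions.items()}
low_variability = [ch for ch, value in cv.items() if value < MIN_CV]

for ch, value in cv.items():
    print(f"{ch}: CV = {value:.3f}")
```

Because the measure is unitless, channels measured in impressions and channels measured in clicks can be compared on the same scale.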
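Check correlation between variables: Although correlation between KPI, media, and control variables is not required, creating visualizations to check for correlation can be helpful in the following use cases:
- Measuring the correlation between media and control variables to see whether there is any unexpected relationship. This can help you decide whether to keep or remove a media or control variable.
- Identifying multicollinearity. When two or more media or control variables are highly correlated with each other, they create multicollinearity, which can make it difficult for regression models to calculate the impact of the collinear variables. By identifying multicollinearity in your data review, you can decide which variables to include in or exclude from your model.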
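A pairwise correlation screen is a simple first pass at both use cases. The sketch below computes Pearson correlations by hand and lists the pairs above a cutoff. The helper names, the 0.9 threshold, and the sample series are assumptions for illustration. Pairwise correlation can also miss multicollinearity that involves three or more variables together.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for each pair with |r| >= threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs


# Hypothetical media and control variables over five periods.
variables = {
    "tv_impressions": [10, 20, 30, 40, 50],
    "radio_impressions": [12, 19, 33, 41, 48],
    "temperature": [15, 9, 14, 8, 12],
}

pairs = highly_correlated_pairs(variables)
for name_a, name_b, r in pairs:
    print(f"{name_a} vs {name_b}: r = {r}")
```

The same pairwise values can fill a correlation heatmap, which is often easier to scan than a list when there are many variables.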
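After you have confidence that your data is accurate and complete, you can load the data using a supported format, and then create your model.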
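Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-29 UTC.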