Links to Other Parts
Conclusion
Congratulations on getting through the weeds of an example marketing data analysis!
If you’re reading this, then you’ve seen the 5-step analysis process in action:
In Step 1, you saw how to form a good research question
In Step 2, you followed along to see how a data scientist would take a first look through a descriptive statistics readout for a dataset
In Step 3, you identified strange-looking data series and saw how to analyze them for outliers and measurement artifacts
In Step 4, you saw an example analysis test if two variables meaningfully relate in our banking dataset
Finally in Step 5, you saw some ways to test for and avoid common statistical pitfalls after finding an analysis answer
The first time you go form and answer your own research question may feel scary, but you now have a flashlight to guide you in the dark.1
If your company has recently implemented or is considering a new self-service tool, don’t be afraid of getting your hands dirty in data analysis, even if you don’t have a data science background.
With the steps above, any marketer can take advantage of the new power at their disposal to better serve their customers and rise above competitors.
If you’re interested in additional help getting up to speed on a newly-installed tool, how to drive outcomes with data-driven analysis, or want to have an answer to your specific questions, reach out to me via email or comment!
Code and Relevant Links
Throughout this tutorial, we worked through a data extract from a Portuguese bank’s call center from 2008 to 2013. The initial analysis worked to determine which factors surrounding a direct marketing call were most responsible for closing a sale of long-term deposits.
The factors that were identified as most important in the paper were removed in the dataset, so we will be looking through potential other factors to test to increase sales.
To be as cross-platform as possible, relevant calculations and code will be available for reference in the above Jupyter Notebook (unfortunately, I cannot upload an ipynb to this platform, so email me if you want a copy).
Assumptions:
There is no seasonality in the data
Our only success factor is number of deposits completed, not the amount within each deposit
Useful Links:
Use the bank-additional-full.csv file
Appendix:
Hypothesis Question Examples:
A few questions that could lead to a testable hypothesis in our dataset could be:
Does a positive change in the 3-month Euribor rate positively correlate to a higher likelihood of making a long-term deposit?
Does having previously been in a campaign make someone more likely to make a deposit?
Are there any ages that are most likely to make a deposit when contacted?
Does the contact method (fixed-line telephone or cell) affect deposit rate?
Some questions are important to answer for the sake of your business but cannot be resolved by data analysis. A few examples:
Are we spending our budget well?
Spending it well relative to what? Until there are specific goals or metrics to measure against, data analysis can’t answer this question
How can we be more innovative?
Innovation usually involves approaching a customer need from a new perspective. Analyzing and optimizing the current ways that you do business is unlikely to lead to a radical new perspective. I’ve written more on the subject previously here.
How can we improve the bottom line?
This question is too broad to answer with data analysis. In order to approach this, break the question into more digestible parts repeatedly until an answerable question emerges. An example ordering is below
How can we improve profit?
How can we reduce costs?
How can we reduce our inventory holding costs?
How can we better predict our seasonal demand to reduce product holding time?
What times in the year does our inventory ordering regularly overshoot the demand?
Are we people first?
If you’re asking yourself this, the answer is no
Data analysis can’t answer this question anyway
Descriptive Statistics Examples:
These often include, but are not limited to, the following measurements on an example set of (1, 2, 1, 7, 55, 1, 1, 4, 5, 7, 4, 7, 8, 9, 10):
Range, minimum, and maximum
What the extremes and dimensions your dataset covers
For our example, the minimum would be 1, the maximum would be 55, and the range would be 55-1 or 54
Mean
Also known as the ‘arithmetic mean’ or just ‘average’, this value reflects the ‘center of mass’ of the data set
For our example, the mean, ~8.43, is the sum of all values, 118, divided by the count of all values, 14
Median
Sample in the middle of an ordered population. This value reflects the center of measurement for the data set. Often this can be useful when there are many measurement values far away from the mean
For our example, the set—when ordered—isÂ
(1, 1, 1, 1, 2, 4, 4, 5, 7, 7, 7, 8, 9, 10, 55), so the median is 5
Mode
Most commonly appearing value in a set
For our example, there are 4 instances of the value ‘1’,
(1, 1, 1, 1, 2, 4, 4, 5, 7, 7, 7, 8, 9, 10, 55), so 1 is the mode
Variance
Variance measures the distance between values and the mean. If there is a larger variance, then you’ll need more sample measurements to prove statistical significance
To perform this calculation, further reading is available here
For our example, the variance is 177.84, indicating a large dispersion from the mean values
Skew
Skewedness describes the shape of a distribution and if it leans more left or right of a sample bell curve. A highly skewed distribution may require different analysis methods than a non-skewed distribution.
To perform this calculation, further reading is available here
For our example, the skewedness is 3.51—above 1—indicating a right-skewed distribution due to the outlier of 55
High-Level Step 4 Approach for a New Dataset
Use the knowledge of the data gathered in Steps 1-3 to choose an appropriate statistical method, do the initial analysis on part of your data set to determine effect size and significance relevant to your research question, and see if the trends observed carry forward for the portion of the dataset withheld from the analysis. Â
This step can take many forms depending on your dataset and goals, but if the previous steps have been completed, this analysis should be well-defined and straightforward.
Here is a PDF of the whole series combined.