Links to Other Parts
Link to Part 2 [Current Article]
Introduction
Congratulations! By starting this part, you’ve decided to move beyond the high-level framework of Part 1 and are going off into the forest on your own.
This part sets you up with the data in case you want to follow along on your own (Python code will be published in the Appendix section), links to background research if you’re interested, and walks through the first few steps.
Getting Started
Throughout this tutorial, we will be working through a data extract from a Portuguese bank’s call center from 2008 to 2013. The initial analysis worked to determine which factors surrounding a direct marketing call were most responsible for closing a sale of long-term deposits.
The factors that the paper identified as most important were removed from the dataset, so we will look through other potential factors that could be tested to increase sales.
To be as cross-platform as possible, relevant calculations and code will be available for reference in a Jupyter Notebook linked when the Appendix is published.
Assumptions:
There is no seasonality in the data
Our only success factor is number of deposits completed, not the amount within each deposit
Useful Links:
Use the bank-additional-full.csv file
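If you are following along, a first step is simply getting the file into Python. The sketch below is one minimal way to do it, not the notebook's code. It assumes the public copy of bank-additional-full.csv is semicolon-delimited with quoted text fields, and the helper name `load_calls` and the file path are made up for illustration. A tiny made-up sample in the same layout stands in for the real file so the snippet runs without the download.

```python
import csv
import io


def load_calls(source):
    """Read a semicolon-delimited extract into a list of dicts.

    `source` is any open text file or file-like object.
    Assumption: fields are separated by ';' and text is double-quoted.
    """
    reader = csv.DictReader(source, delimiter=";")
    return list(reader)


# Made-up sample rows in the assumed layout (not real data from the bank).
SAMPLE = io.StringIO(
    '"age";"contact";"y"\n'
    '56;"telephone";"no"\n'
    '41;"cellular";"yes"\n'
)

calls = load_calls(SAMPLE)
print(len(calls), "rows; columns:", list(calls[0].keys()))

# With the real file (the path is an assumption about where you saved it):
# with open("bank-additional-full.csv", newline="") as f:
#     calls = load_calls(f)
```

Every value comes back as a string, so numeric columns need converting before you do arithmetic on them.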
1. Orient: Design A Real-World End Goal for Analysis
“Would you tell me, please, which way I ought to go from here?”
“That depends a good deal on where you want to get to,” said the Cat.
“I don’t much care where—” said Alice.
“Then it doesn’t matter which way you go,” said the Cat.
Lewis Carroll, “Alice’s Adventures in Wonderland”
Before charging off in a random direction looking for treasure, it’s best to figure out which way is north and orient yourself in the landscape.
In the same way, before running any regressions, opening any data visualizations, or even touching the keyboard, decide on a real-world goal to orient your analysis against.
You should aim to eventually generate a hypothesis that is testable, falsifiable, and would meaningfully affect business performance.
Examples of well-formed vs vague questions will be listed in the Appendix.
For our analysis, we will be determining the answer to the following question: Does the contact method affect deposit rate?
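To make the question concrete, it helps to write down what "deposit rate" means: deposits completed divided by calls made, computed separately for each contact method. The sketch below is one way to express that, under a few assumptions that are mine rather than the article's: the contact method is stored in a column named `contact`, the outcome in a column named `y` with the values "yes" and "no", and the function name is hypothetical. The rows are made up for illustration and are not figures from the real dataset.

```python
from collections import defaultdict


def deposit_rate_by_contact(rows, contact_col="contact", outcome_col="y"):
    """Return {contact method: (calls, deposits, deposit rate)}.

    Counts completed deposits only, not deposit amounts.
    Column names and the "yes" label are assumptions about the data layout.
    """
    calls = defaultdict(int)
    deposits = defaultdict(int)
    for row in rows:
        method = row[contact_col]
        calls[method] += 1
        if row[outcome_col] == "yes":
            deposits[method] += 1
    return {m: (calls[m], deposits[m], deposits[m] / calls[m]) for m in calls}


# Made-up rows for illustration only.
toy_rows = [
    {"contact": "cellular", "y": "yes"},
    {"contact": "cellular", "y": "no"},
    {"contact": "cellular", "y": "no"},
    {"contact": "cellular", "y": "yes"},
    {"contact": "telephone", "y": "no"},
    {"contact": "telephone", "y": "no"},
    {"contact": "telephone", "y": "no"},
    {"contact": "telephone", "y": "yes"},
]

rates = deposit_rate_by_contact(toy_rows)
for method, (n_calls, n_deposits, rate) in rates.items():
    print(f"{method}: {n_deposits}/{n_calls} deposits ({rate:.0%})")
```

A gap between the two rates is only a description of the data. Whether it is large enough to rule out chance is a separate question that needs a significance test, which is what keeps the hypothesis falsifiable.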
Don’t let the blank slate of a full dataset scare you. View it as an opportunity to better understand your customers in ways others haven’t seen yet.
As you explore the data to answer your chosen research question, new questions worth pursuing may appear. Record them and walk through the framework again after you’ve either answered your original question or determined that the initial goal is obsolete.
⚠️ A cautionary note: Don’t be dissuaded from exploring new areas in the data just because they don’t connect to your initial goal. If you discover interesting relationships that no one has noticed before, add them to a backlog of items to research.
Join me for Part 3, where you’ll learn how to generate and read summary statistics like a professional analyst.
If you know anyone who could benefit from learning how to manage their own analytics, feel free to send them a link to the article.