Sufficiency test
Helps with prioritization. Differentiates between behaviors we believe to be causal (because we’d abandon them if they didn’t actually turn out that way). Creates zones of accountability.
Yesterday, I wrote about using an Alternative Universe exercise to get stakeholders focused on behaviors. Today, I’ll share a second tactic: Outliers.
Let’s use a different example and focus on an internal use case: “I want people to be more inclusive in meetings.” And for variety, our stakeholder can be a glasses-wearing CEO named Satya.
The Outliers exercise relies on a natural human bias: our tendency to remember vivid extremes better than averages. This is useful because by using real exemplars of actual outliers, it helps stakeholders connect with behaviors as observed, rather than hypothesized.
I generally start with the positive version. “Alright, Satya, I want you to think about the most inclusive person you know. Do you have them in your mind? Great; tell me a bit about them.”
Satya will usually start with the person’s demographics, which you can ignore unless that is all he mentions, in which case you’ll need to prompt him: “You know he’s inclusive because he’s Black? Are all Black people inclusive?”
Usually, though, he’ll mention some behaviors in passing and those are what you want to call out and emphasize. “Oh, so he calls out when men repeat ideas that a woman has already said. Is that what you mean by inclusive: someone who calls out idea attribution?”
I generally suggest waiting until he’s finished his initial description, as he may mention several behaviors without prompting. You’re not actually trying to finalize a selection here, just get a list of possibilities, so the more behaviors you can pull out, the greater the chance that you’ll find the sufficient one. You can worry about narrowing down later as you actually write your behavioral statement.
I also think it is worthwhile to do the negative version. “OK, let’s try something different. I want you to think of the least inclusive person you know and tell me about them.”
This is useful because sometimes our end goal isn’t getting people to do a desirable behavior but rather stopping them from doing an undesirable behavior; there are plenty of inclusive practices that are about the absence of a bias. And because of our innate tendency to focus on promoting pressures, focusing on a negative outlier can help close that blindspot.
I tend to use Outliers over Alternative Universes when the subject is serious, as it doesn’t rely so much on the entertainment factor to hold interest; talking about exemplars is inherently fascinating. And because Satya has no access to the exemplars’ cognitions and emotions, he’s forced to rely on observable behaviors, which makes it easier to move people away from concepts like “loving a product”.
But the focus on exemplars comes at a cost: because it uses real people, the Outliers exercise often struggles with niche behaviors where the stakeholders have no personal experience. If Satya has never actually seen an inclusive leader, how is he supposed to describe them?
Tomorrow’s exercise, The Genie, addresses this by moving back into the realm of fantasy.
One of the hardest jobs that applied behavioral scientists have is getting stakeholders to focus on behaviors, rather than emotions or cognitions. Over the years, I’ve come to rely on three tactics for creating a behavioral outcome and while none of them is a silver bullet, at least one of them usually manages to do the job.
They all work on the same central premise: that people find it easier to make decisions through comparisons. They’re designed to be entertaining (high promoting pressure) but also simple enough that anyone in the room can use and understand them (low inhibiting pressure). And they all start with an emotion or cognition that a stakeholder expresses.
For our examples, let’s use an emotional outcome: “I want users to be obsessed with our product.” And because it feels appropriate, let’s say that was expressed by a turtleneck-wearing CEO named Steve.
The first tactic is using Alternative Universes.
“We’ve all seen a sci-fi movie where there are alternative universes. There is Original Steve and Steve Prime, and you know Steve Prime is the bad Steve because he has a mustache. And then he shaves his mustache and they end up in some dramatic fight scene where you have to figure out which Steve is the original so you can shoot the other one and keep them from taking over the multiverse.”
“Now Steve and Steve Prime are identical in every possible way except Steve Prime is from a universe that is obsessed with our product. You’ve got to shoot him…how would you know he is from the universe that is obsessed and Original Steve isn’t?”
Now we’ve got people laughing (entertaining, check!) and everyone can relate (easy, check!) and you can start to push on the behavioral bit. You’ve got two jobs: be the buzzer when someone says something that isn’t observable and provide suggestions of potential behaviors if people are struggling.
So if someone says “Steve Prime loves the phone!” you have to be quick to shut that down. “Bzzzzz, you just killed Original Steve and now we’re doomed.” (Bonus points if you sing “I don’t know what love is…but I want you to show me!”) Be funny but firm on this; you have to shut down anything that isn’t a physically observable behavior.
If people aren’t getting there, you can always make suggestions. “What about owning our phone? Original Steve doesn’t own one of our phones, Steve Prime does…is that enough?”
The ‘enough’ part of that is key – is owning the phone alone enough to say that someone is truly obsessed? If both of them owned our phone, would they both be obsessed? Why or why not? Provocative questions are key, because disagreement is good. If everyone rushes to the behavior, that is likely a false consensus and will come back to bite you later.
Tomorrow we’ll look at the second tactic I use, Outliers.
Delight without satisfaction is addiction. And so when we design to make people feel happy in the moment, we must be mindful that it also enhances their long-term happiness, or risk creating a suboptimal world.
Over the weekend, designer Taurean Bryant posted about his hatred for the term “delight” in design. And Justin Maxwell shared an anecdote about the Mint.com team being bewildered by “Design for Delight” as an OKR at Intuit after they were acquired. Yet given the option, I think we’d all prefer a delightful experience. So why the dissonance?
Let’s start by defining terms. Generally, hedonic psychologists think of happiness as made up of two parts: delight (a momentary experience; I often think of the first lick of an ice cream cone) and satisfaction (a long-term experience; I imagine myself watching my sleeping son, thinking my life is good).
One of the oddities of hedonics is that the two don’t correlate particularly strongly. In studies where people are randomly pinged to rate their in-the-moment delight and then asked about their satisfaction at the end of the day, the two emerge as distinct: some people have many moments of delight but are highly unsatisfied, while others aren’t particularly delighted but are very satisfied generally (I may be a strong outlier in this category).
Since they are distinct, the perfect product would both delight and satisfy me. But that is hard to achieve and so design leaders frequently pronounce edicts about what they perceive as the deficit. I have no doubt that some well-meaning Intuit leader thought “Well, our product is very satisfying but not particularly delightful, so let’s lean into introducing more delightful moments.”
Which makes sense, as long as it is contextualized. In general, I don’t buy Intuit products to be delighted; I have Netflix for that. I just want to file my taxes so I can be satisfied. So what Intuit really wants is to be maximally satisfying and delightful enough. The result would be an app that files my taxes correctly and doesn’t make me totally hate the in-the-moment experience.
That is a cohesive, achievable design strategy. You can say that finishing filing your taxes and having them accepted is the satisfying behavior and advancing to the next screen is the delightful behavior, then divide your team to focus on designs that maximize the rate of each. In the event of conflicting needs, filing your taxes wins.
The converse is true for Netflix: be maximally delightful and satisfying enough. The result is an entertainment experience that also, on occasion, makes me reflect on some of the deeper truths of my life. Again, those are differing behaviors that can be designed for separately.
Because in reality, the problem isn’t with the introduction of delight into design, but rather a misunderstanding of when and why it matters. Every product needs both but can only maximize one, so they have to be viewed and communicated by leaders as a tradeoff of both resources and features.
Yesterday, I wrote about Lenny Rachitsky’s attempt to figure out which companies produce the best PMs and the problems with his analysis. But today’s point is more important: even if you correct for method errors, this data doesn’t really answer the question he is asking. It may, however, tell PMs, particularly those underrepresented in tech, what companies to avoid.
A brief reminder of method: he looked at PMs who have left a company and what happens across 7 career categories, like how quickly they are promoted in their next job.
But using only alumni data introduces significant confounds. And he acknowledges this deep in one section of his analysis: “Another explanation is that the best PMs at FAANG companies are happy and don’t leave, and so we don’t see their trajectories in the data.”
This is very much burying the lede. All good insights start with removing as much systematic bias as you can from your sample, and looking at only alumni means ignoring all the reasons that PMs choose to stay at a company. Many of the companies have experienced layoffs. Some are better at retaining leaders vs juniors. Companies that are newer have less time to show attrition.
And these biases aren’t random. Take Average Time to First Promotion. In his view, lower is better: it means the company turned you into a talented PM. But it could just as easily mean that a company systematically underpromotes top talent, causing them to leave and get quickly promoted elsewhere. This is particularly true for underrepresented people, who are most likely to be overlooked.
But what if, by combining Time to Promotion and Leadership, we try to find companies that systematically underpromote talented people?
There are caveats. Smaller companies might not have as much room to promote people and we don’t have data in this sample to control for that. Cross-validating with another dataset (like time in role without promotion before leaving) and qualitative interviews would go a long way.
But let’s say we do believe the combined metric is reasonable. Where should folks choose to work?
Worst first; these companies appear to systematically overlook top talent.
46. Discord
45. Deel
44. Revolut
43. Scale AI
42. Plaid
Both Revolut and Plaid made Rachitsky’s Top 5 Best Companies. This is why looking at things through an inclusive lens is so important: otherwise you might not just give random advice but advice that is actively bad for disadvantaged groups.
The best companies?
Maybe large companies with more formal processes are better at mitigating promotion bias. Maybe they just have more room to grow. Maybe underrepresented people can more freely transfer internally away from biased managers there. I don’t feel strongly enough about this data to feel like I know the answer.
And that’s really the point: good causal analysis matters and being wrong can worsen systematic issues. So if you have a large audience, be particularly careful what you say, and don’t rely on others to check your work.
Newsletter expert Lenny Rachitsky tried to figure out which companies produce the best product managers, using a very large dataset. And not only did he arrive at the wrong conclusion, he inadvertently produced a list of places you probably shouldn’t work, particularly if you’re underrepresented in tech.
Preamble: I don’t know Rachitsky but our mutuals paint him as reasonable. And he had a chance to comment on this draft and gave it his blessing. The point isn’t to drag him – it is to understand how and why we have to do better when creating insights.
So today I’ll explain how his analysis actually miscategorizes the top companies. Tomorrow I’ll talk about why that doesn’t matter, because the data isn’t suited to answer his question anyway. But it can potentially answer another important question.
Let’s start with his stated intention: to guide people to great companies, where PMs learn the craft and go on to have stellar careers. To do that, he looked at PMs who have left a company and what happens across 7 categories: Total promotions, Fastest immediate promotion, Fastest rise to leadership, Highest rate of CPOs, HOPs, First PM hires, and Founders.
He then ranks each category and looks at the Top Ten in each. His overall Top 5?
1. Revolut
2. N26
3. eBay
4. Plaid
5. Intercom
Using rankings is his first mistake. If the difference between 1st and 2nd is small but 2nd and 3rd is huge, using ordinals obscures that. So you need to normalize to standard deviations with Z-scores, looking at how far companies are from the average.
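To make the ranking problem concrete, here is a minimal sketch in Python. The company names and scores are invented for illustration and are not from Rachitsky’s dataset; the helper names `ranks` and `z_scores` are mine. It shows how ordinal ranks hide the size of a gap that Z-scores preserve.

```python
from statistics import mean, pstdev

# Hypothetical "total promotions per alumnus" scores; invented for illustration.
scores = {"A": 2.00, "B": 1.98, "C": 1.20, "D": 1.10, "E": 1.05}

def ranks(values):
    """Ordinal rank, 1 = highest score."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}

def z_scores(values):
    """Distance from the mean, in (population) standard deviations."""
    mu = mean(values.values())
    sigma = pstdev(values.values())
    return {name: (value - mu) / sigma for name, value in values.items()}

company_ranks = ranks(scores)
company_z = z_scores(scores)

for name in scores:
    print(f"{name}: rank {company_ranks[name]}, z = {company_z[name]:+.2f}")
```

In this made-up data, A and B are one rank apart and so are B and C, but the Z-scores show that A and B are nearly tied while C is far behind both.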
Which leads to his second mistake: differences in population size. Deel has 79 former PMs on LinkedIn; Microsoft has 32k. Given the rarity of events like becoming a founder, this creates massive data anomalies if even one more person takes that path at Deel.
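A rough sketch of why population size matters, in Python. The alumni counts (79 and roughly 32,000) come from the paragraph above; the founder counts are hypothetical, chosen only to show what a single extra founder does to each company’s rate.

```python
def rate_per_thousand(events, population):
    """Events per 1,000 alumni."""
    return 1000 * events / population

deel_alumni = 79           # from the post
microsoft_alumni = 32_000  # "32k" from the post

# Hypothetical founder counts, before and after one more alum becomes a founder.
deel_shift = rate_per_thousand(2, deel_alumni) - rate_per_thousand(1, deel_alumni)
microsoft_shift = (
    rate_per_thousand(401, microsoft_alumni) - rate_per_thousand(400, microsoft_alumni)
)

print(f"Deel: +{deel_shift:.2f} founders per 1,000 alumni")
print(f"Microsoft: +{microsoft_shift:.3f} founders per 1,000 alumni")
```

Under these assumptions, one additional founder moves the smaller company’s rate hundreds of times more than it moves the larger one’s, which is enough to reshuffle a ranking on its own.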
Finally, he assumes independence among his criteria. But this isn’t true. Take PMs from Discord. They’re almost dead last in terms of becoming founders but top the list in fastest promotions. Why? Because you can’t get promoted if you become a founder!
In fact, all of the data is substantially correlated. Average Time To Promotion and Average Time To Leadership have a correlation of 0.96; they might as well be the same list.
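For readers who want to check this kind of overlap themselves, here is a minimal sketch of a Pearson correlation in Python. The two lists are invented numbers standing in for per-company averages; they are not the data behind the 0.96 figure, and `pearson` is a hand-rolled helper rather than anything from the original analysis.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical average years per company; invented for illustration.
time_to_promotion = [1.8, 2.1, 2.4, 2.9, 3.3]
time_to_leadership = [4.0, 4.6, 5.1, 6.2, 6.8]

r = pearson(time_to_promotion, time_to_leadership)
print(f"r = {r:.2f}")
```

When two criteria correlate this strongly, counting them as separate categories effectively counts the same signal twice.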
There seem to be two discriminant criteria: Rate of Founder/Head of Product/CPO/First Hire (Early Stage) and Average Time to Promotions/To Reach Leadership/Total Promotions (Late Stage).
So if we adjust to these criteria and use Z-scores, who actually comes out on top?
This isn’t entirely different from Rachitsky’s list but it isn’t the same either. And with 230k followers and over $2m a year in newsletter revenue, what he says matters to a lot of people. So it is worth saying the right thing.
But it gets worse: even if you correct for bad statistics, this data doesn’t actually answer the question he is asking and may actually be an anti-signal that disproportionately affects underrepresented PMs. Tomorrow, I’ll talk about why.
During breakfast on Monday, my son turned to me and said “Last night was really fun. Thank you for having them over.” That is an exact quote and, I think we can all agree, an utterly bizarre thing for a nine-year-old to say. It was sandwiched between two long soliloquies about anime, so I’m fairly sure he’s still my child and not a clone.
“Them” was Di Le, Asli Aydin, Misti Cain, and Diana Wolosin. We all met (along with the absent but beloved Kevin Bethune) at the speaker dinner for DDX San Diego and it became a rollicking WhatsApp group that resulted in a followup dinner at my house. I asked them before tagging but won’t say more; the Chatham House rule applies.
My son is used to explicit discussions of diversity, because my co-parent, my partner, and I are all aligned on being clear with him about our beliefs and how we came to them. And most parents spend at least some anxious midnight hours worrying about how we talk to our children.
Ditto executives. CEOs often have multi-person executive communications teams responsible for helping them with messaging for employees, investors, and the public.
But I know of few executives who have an equally sized team responsible for making sure they are behaving in line with that communication. Exec comms is a role; exec behavior change isn’t. And this represents a misallocation of resources.
Imagine four CEOs. CEO A talks about diversity and has a diverse set of leaders around them. B talks about diversity but doesn’t have it on their team. C doesn’t talk about diversity but has a diverse team. D neither talks about diversity nor has a diverse team.
Intuitively, A is the best and D is the worst; as with any 2×2, the matching corners are easy. And in the mixed corners, intuitively C is better than B: it is better to walk than talk, given the choice.
The surprising finding is that B is actually often just as bad as D, and in some situations may even be worse. This isn’t irrational. If someone doesn’t do or talk about something, the possibility at least exists that they are unaware and that they might change their behavior with awareness. But if they talk about it and don’t do it, that possibility is closed off; our brain naturally says “if they are aware but still not doing it, there must be good reasons I shouldn’t do it either.”
There are exceptions: leaders can talk about their struggle with a behavior rather than an accomplished reality and that tends to reduce the anti-signal. But ultimately, behaviors rule. Which means we need to be investing both time and money in making sure that we are behaving in ways that are congruent with the messages we send.
At home, you can use the dinner party test. Who was the last group of non-family people that your kid saw around your table? Do they accurately represent the values and beliefs you have expressed? Certainly those four women do and I am grateful for their friendship.
At work, consider the same. What shows people that you mean what you say? Are you sure you’re doing it?
Over the last few years, I’ve done thousands of office hours: first-come-first-served blocks that mentees set the goals for. And this is the dominant model for how mentorship is viewed in the workplace today: a 1:1 meeting, set on a calendar, with an explicit agenda.
I’m also the dad of a wonderful, quirky nine-year-old. He does not schedule his growth around my calendar and follows no agenda. Guidance happens spontaneously, as part of the context of our time together. And this used to be the dominant model of work mentorship, against the backdrop of co-labor: a more experienced worker paired with a less experienced worker, doing a task together, with the knowledge transfer occurring contextually. As work became increasingly differentiated, the meeting model became the norm.
But it doesn’t have to be that way. My son is nine and finally old enough to play video games designed for adults. Our first outing has been Valheim, a Viking-themed sandbox survival game, and it has created a context for conversations that I cannot imagine having outside of co-labor.
This week, my partner decided she wants to play with us. But in order for her to not die immediately, she needs armor that matches our level. Which means we need troll hide. So my son had to form a plan to collaboratively hunt trolls and then act in a coordinated fashion, with one of us as bait while the other focuses on shooting the monster.
We talk as we do this, sometimes about the tasks of the game but also about life and his experiences more generally. Sometimes the mentorship is explicit (I’m teaching about how the meta-game works), sometimes it happens implicitly (I’m listening and nudging as he rambles), but both happen organically.
Contrast that with the modern work environment. Often we do our work alone, because increasing profitability means reducing labor costs means using technology as our co-laborer (this will get worse, not better, with AI). We pop out of this solo labor for meetings, of which mentorship is one.
But what if mentorship reintroduced co-labor? In a perfect world, this would be role-relevant: we would pair code or write a communication together or design an intervention. But it doesn’t have to be. It can be as simple as introducing a quick game, a 60-minute hackathon on an orthogonal business problem, or anything that creates a context within which mentorship can happen. To borrow an educational phrase, “guide on the side, rather than sage on the stage.”
This doesn’t mean returning to the office. But it does mean being more deliberate about planning for context, which is the third participant in any mentorship experience. Because if the goal of mentorship is to encourage growth, we need much better soil than we have today.
Last week, someone was murdered. In reality, many people were murdered last week but someone leapt to mind the moment you read it. On social media, people argue about the validity of who you think about and why you should have thought of someone else. I have never seen these arguments at a funeral.
I have been an executive at a health insurance company. I neither approved nor denied claims directly, but was responsible for the call center that handled talking to both customers and providers. Claims are denied on the presence and absence of evidence; my team was part of the evidence-gathering process.
Every claim is a research project. As with science, you start with the default that the claim should be denied and then gather evidence to reject that null hypothesis. Over time, heuristics form and then become laws, as immutable as gravity, because you cannot afford to revalidate them every time. So every flu shot claim, even if you’ve already had one this year, gets approved.
During my time as an executive, I am absolutely certain we denied claims that we should have approved. I am also certain that we approved claims that should have been denied. These mistakes happen in insurance for the same reasons they happen in science: our sources of information are imperfect and our judgement fallible.
Mistakes were made. I have made many mistakes. Many mistakes are mine.
(I pause here to make dinner for my son, who has a father that is still alive and has not yet been murdered.)
(Sometimes, my son drops his bowl, scattering soup across the kitchen. I am angry. He protests that it was an accident. I point out that it was an avoidable accident, that he made choices that made the accident more likely, that the accident was a matter of policy. We settle there, hug, clean up the food. I do not murder my son.)
(What if he dropped the bowl on purpose, because he is not a fan of soup? What if the dog gave him a big-eyed look in the certain knowledge that spilled soup increases Rover Oral Integration, also known as doggy dinner? What if, in spilling the soup, he burns my leg, does me harm, forces me to take out a loan to pay for more food and burn cream?)
(Should I kill him then?)
Science does best when free from outside influence. I have been a health insurance executive and I believe that people are best served by a single payer system run by the government. I believe this because evidence exists from the study of our own system and from the practical experience of other countries.
I enact this belief by voting. Sometimes my candidates win; sometimes they do not. Sometimes the candidate that wins says publicly that they will turn more of our healthcare system over to companies. I believe he will.
I also enact this belief by being visible. Sometimes, being visible gets you a nasty note. Sometimes, it gets you a death threat. Sometimes, you ask the police to do an extra pass by your house at night and they do it. Not to shoot you but to shoot those who would shoot you, because you are white and male and affluent and because someone cares if you die.
I would work in health insurance again. I would try to make it better. I would be glad when we got a single payer system. I would find new problems to solve. Because it is core to my beliefs that no one deserves to die. Not the murdered. Not the murderer. Not the murdered murderer.
(My son will spill the soup again. He will be a grown man some day. Aggravator and aggrieved, please do not kill him, through action or inaction. There are already enough ways to die.)
People often seem to forget the root of compensation is compensate: a force exerted in order to counterbalance an opposing force. Think of it like a balance scale; something is loaded on one side, so we compensate by loading the other.
But what exactly is it that we’re loading?
There are generally two frames that occur in a variety of contexts. The negative frame suggests that what is loaded is injury, compensating by recompense. For example, work creates harm by costing me effort and my paycheck is restorative justice for that harm. This justice view often creates a wage-style system with tightly-defined borders; I give you 8 hours, you give me $200.
The positive frame, in contrast, suggests that what is loaded is value, which is then compensated by shifting value. For example, work is a value-producing activity performed in sync with others and my paycheck is my share of that collective value. This value view often creates a variable-style system with aligned incentives.
Neither frame is right or wrong, they’re just different. But how you choose to engage with them will dramatically change the work choices you make.
For example, I have tended toward the value frame and while it is tempting to think of that as a reflection simply of white collar work, I’ve also had some very mundane jobs: IT repair, bouncer, retail, farm work. And I know plenty of white collar workers with a justice frame, who treat their job as something inflicted upon them.
Because I’ve chosen the value frame, I tend to only accept direct payment where I am certain I am creating value. I don’t charge for speaking or advising, for example, because both activities have potential, rather than realized, value; my advice may or may not be good and so the value it creates is highly variable. To capture the potential value, I often joke with founders that if they sell for billions, they can buy me a boat.
If I instead chose the justice frame, then I would likely charge for far more activities. If I fly around the world to speak at your event, you owe me for that effort; if it took me an hour to advise you, you owe me for that lost time.
Because I have chosen a way of viewing my work, it is difficult to write about both sides empathically; no doubt others may better express why they see their paycheck as a form of restorative justice.
But what is important is that like all frames, these can be shifted. Every workday is an opportunity to choose a different frame for your compensation, with both direct and indirect consequences for that choice. So be deliberate and do not let the frame be imposed upon you; you can decide how you want to view the relationship between the two sides of the scale.