Episode 2 – Why Innovation Teams have to be Measured Differently and How to do it – Podcast Transcript

Episode Transcript

Tristan: What I’m really trying to measure with Insight velocity is whether or not they are running experimentation in a way that produces data and insight.

Elijah: Welcome to the Innovation Metrics podcast, where we geek out about innovation management. We bring you insights on how to measure innovation, innovation accounting, and managing the uncertain process of developing new, sustainable and profitable business models. In today’s episode, we are exploring how innovation teams should be measured. You can find links to important topics covered in this episode and information about the guests and hosts in the show notes, or go to our blog on innovationmetrics.co. I’m very excited to welcome Tristan Kromer on the show today. As an innovation coach and founder of Kromatic, Tristan works with innovation teams and leaders to create amazing products and build startup ecosystems. He has worked with companies from early-stage startups with zero revenue to enterprise companies with more than a billion dollars in revenue, like Unilever, Salesforce, LinkedIn, and the list goes on. Tristan has worked with more than 30 technology accelerators and ecosystem programs around the world. I recommend you all check out his website, kromatic.com, for one of the best innovation blogs and other resources. Hope you find this episode to be fun and insightful. Hi, Tristan, great to have you here.

Tristan: Great to be here, Elijah.

Elijah: Today we’re talking about how we better measure teams in an innovation context, innovation teams specifically in large organizations. And to kick it off, I would like to ask you why do we even need any different form of measurement for an innovation team?

Tristan: It’s a pretty interesting kind of way to approach things, because I guess the question we need to answer first then, is how are we measuring teams right now?

Elijah: Yes.

Tristan: So what is an innovation team versus, I guess, a normal product team? How are they measured right now?

Elijah: Yeah, that would be good. Let’s dig into that.

Tristan: Right, so I guess what I see most often is that both innovation teams and normal teams are often just measured by what Amazon would call an output metric. How much revenue do you have? How many users do you have? What you or I might call the lagging indicator, the thing that has a lot of different variables going into it, but only one kind of important thing the company really cares about coming out, which is the number of dollars and the number of users. Or if it’s a nonprofit or other agency, it might be mission impact, the number of lives saved, the number of cigarettes ripped from the mouths of children and that sort of thing. These are the sort of indicators that I think are not irrelevant. They’re absolutely relevant, but they’re very hard to control for and they’re not exactly what’s the most useful thing to measure for teams. Right. They’re not something that the team can.

Elijah: Teams in general or for innovation teams, we’re talking about teams in general.

Tristan: I think for both. I do think there is a heavy distinction between teams in general and innovation teams. But if you just look at teams in general, for a media site or a media business, a lagging indicator might be the number of eyeballs that hit a podcast. We were actually talking before we started recording about Yoga with Adrian, right? She has nine point however many million subscribers, which seems really awesome. And each one of those videos has a lot of views, which seems very good. But there are a lot of variables that go into that. There’s the topic, there’s the user, there’s when you post, there’s how long it is. There are a lot of different variables that go into how successful an individual post would be. But a media company might judge its writers or content producers, say, on the volume or the velocity of things that they’re publishing. That might be a better metric because that might tell you at least, well, we’re posting a lot of stuff. Or they might say something along the lines of, well, I want the highest ratio of views per post. Right? I don’t necessarily just want a lot of posts, I want the highest quality posts possible. Different media groups might have different approaches. A long-tail media group might just care about velocity, not necessarily the number of views per post. But these are the sort of input metrics that the content producer in this case can control directly. I can either just publish more frequently or I can really focus on understanding my customer audience and post the exact right post to get the highest engagement and the highest number of eyeballs, because I’m focusing really heavily on quality, not quantity. That’s an example of what Amazon would call an input metric, or what we might just call a leading indicator of success. Something that is highly correlated to the lagging indicator.
Meaning the more posts we put up, the more eyeballs we get; the higher the quality of the posts we put up, the more eyeballs. So it’s something that we can control right now that we hope will lead to the thing that the company cares about later: number of eyeballs, dollar signs, lives saved.
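
[Editor’s note: the two input metrics described above, publishing velocity and views per post, can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The `Post` record, the function names, and the sample numbers are hypothetical and do not come from the episode.]

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical record for one published piece of content."""
    title: str
    views: int


def publishing_velocity(posts, weeks):
    """Posts published per week: the volume-style input metric."""
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    return len(posts) / weeks


def views_per_post(posts):
    """Average views per post: the quality-style ratio."""
    if not posts:
        return 0.0
    return sum(p.views for p in posts) / len(posts)


# Made-up sample data covering a four-week period.
posts = [Post("a", 1200), Post("b", 300), Post("c", 900)]
velocity = publishing_velocity(posts, weeks=4)
ratio = views_per_post(posts)
print(f"{velocity:.2f} posts per week, {ratio:.0f} views per post")
```

Which of the two numbers a team tracks depends on the approach Tristan describes: a long-tail publisher might watch only the first, a quality-focused one the second.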

Elijah: So the thing that a team can control.

Tristan: Yeah, I think that’s very important in any sort of measurement, and it is something that I think is lacking in a lot of companies and even more so on innovation teams, to your kind of original point. But it’s the same issue, it’s just a different type of metric that the innovation team needs to deal with because they don’t necessarily know what the good leading indicator that will eventually lead to revenue is, because it’s an unknown. So when they don’t know the causality, picking something randomly is probably not going to be helpful.

Elijah: Right?

Tristan: Yeah.

Elijah: So we have the things they can control, and then like in traditional teams, and then obviously we’re measuring in the first place so we can make decisions. We’re not just measuring for the sake of measuring. I hope that’s what we’re talking about here, at least today.

Tristan: Yeah. We’re not talking about data mining, right. Where we just measure everything and hope to find something later. That’s not very feasible for most innovation teams, at least. Yeah. Or develop dashboard is a big factor. Yeah.

Elijah: Develop dashboards to please somebody in whatever position. We want to impact behavior and come to a fair judgment, or a fairer judgment, of whether somebody did the job well or not, I suppose, yeah.

Tristan: We want to give a measure to teams, whether they’re innovation teams or anyone, really, that most directly correlates to things that they can control, and things that they can control that will hopefully be correlated with the good outcome. So if somebody who’s running a marathon has a bad race day, and their kids woke them up at 5:00 in the morning and 3:00 in the morning and 7:00 in the morning and 4:23 in the morning, they might just have a bad day. And yes, we could judge them on the outcome of that race. But more realistically, could we try and judge their performance and give them encouragement based on the number of times they go to practice, or their regular average pace through, whatever, their 5K training sessions? I’m obviously not a runner, so I don’t know what all the leading indicators would be, but I would want to try and set up the metrics for performance based on their practice, not just the outcome of the race. Right. Although, we know that their endorsement dollars depend on them winning the race, unfortunately. Yeah.

Elijah: Maybe not when they’re just starting out the training, though.

Tristan: Right.

Elijah: Maybe not when they’re... Isn’t there this analogy about when we start measuring? Who put it out? That was more product related. Yeah. When you think about trying to measure a product, it’s like when you measure Usain Bolt, or when you measure somebody who just starts out running against the metric of Usain Bolt. Do you remember that analogy? Was that Tendayi or somebody who spoke about it?

Tristan: I don’t, but it’s obviously a very clear metaphor, or rather analogy. At least it’s a very clear comparison in that case. Right. I think the metaphor that’s probably the most appropriate for innovation teams is you don’t want to judge a two-year-old by their SAT scores. There you go. Right. Because when somebody’s just starting out running and you want to compare them to the speed of Usain Bolt, it’s not realistic that they’re going to beat that benchmark. But it’s at least the same measurement that you would measure all runners by. Right. Because it is just a question of how fast you go across the finish line. But a child’s SAT scores are not going to tell you anything about their intellectual development. At two years old, their SAT scores are going to be zero. Just flat out zero. So maybe in that case, the analogy to a marathon would be if I was starting to run today and you checked out my time to run a marathon, it would be infinite because I cannot run a marathon. I probably couldn’t run three blocks. Maybe if there were zombies chasing me, I could maybe get to four or five blocks before running out of breath. But I’m never going to run a marathon right now. What’s something that’s in my control, that is an interim step to that? That could be just getting out every day and running. Can I start practicing now? If I start practicing, that won’t guarantee that I’ll be able to run a marathon, let alone win a marathon. But I will never be able to run a marathon unless I start practicing and unless I start running every day. So that is going to be a good first step to doing anything in that domain of running. And I think the same applies to innovation. I cannot guarantee that any innovation team will win. There’s no way I can predict which ones will win and which ones will not. I mean, I sat next to Instagram, and I thought they had a terrible idea.
I sat next to them in Dogpatch Labs, and I just was utterly disinterested in their idea because I’m utterly disinterested in photographs. But what I can kind of tell is that teams that experiment and release product very, very quickly seem to have a much better chance. And there is some data that’s coming out of Harvard that’s starting to back that up. I’m not sure who, but I think Tom Eisenmann just released a bunch of data that was saying something along those lines.

Elijah: Great. We should post that along with the podcast if we can find it. Fantastic.

Tristan: Absolutely right. So that sort of data is finally coming out, like ten years after the Lean Startup was released. And it’s kind of what we expect, right? Because the more times you release and the faster you release product iterations, the faster you’re going to learn. And that makes you much more likely to have a success when you can test things, learn from the data, and release the new, improved version.

Elijah: And I guess there’s another fundamental aspect to, again, what is their fault? What could be seen as a mistake from a team, as opposed to setting strategy, saying, we only do AI-powered, blockchain-enabled, I don’t know, what is it? Motorbikes or whatever it is right now, and then it’s not really a day. And they’re basically testing that strategy, in a sense, with what they’re doing, and that has nothing to do with them.

Tristan: Well, strategy, I think, is testable, and it is measurable in some ways, but it’s kind of similar in that you have to test it and measure it in different ways. You have to isolate the variables if you’re going to understand what’s going on. Strategy is so complex and can be impacted by so many things. Like we were saying, people can’t execute, the weather turns bad, and so you have a tough time running your marathon. You got the wrong shoes that day, you just put your foot in the pothole. Whatever the case may be, there are a lot of different things that go wrong and that are simply beyond your control. So you have to try and isolate them. If you really want to measure something like, do we have the right strategy? But I know that a team, but.

Elijah: If we have it or not, the.

Tristan: Strategy doesn’t do anything.

Elijah: Right. Yeah, but fundamentally, it’s not the team’s issue at all, potentially at all to say the company said run in this direction and that was just not the direction to run in. Yeah, I was trying to make that point more.

Tristan: Yes. Got it. Yeah. If the strategy is out of their control, then that would be unfair to measure them against.

Elijah: But that’s probably our primer for them. Maybe that should be the next topic, measuring strategy rather than anything else. Super juicy.

Tristan: Sure. I mean, that’s definitely a hard one. I have thoughts on it, and I think there are things you can measure that are kind of leading indicators, but it is a hard one.

Elijah: Yeah. Okay, great. So we have established the problem sphere quite a bit, so let’s dive into what we can do now. How can we measure more effectively, and more fairly as well? I guess there are different aspects, their behavior as well as the product. Maybe let’s start with the behavior first. So what would be a good measure of a team in that sense?

Tristan: Yeah, I think what you’re talking about is kind of separating the measurement of the product from the measurement of the team or the performance of the team, which I think is a very smart point. Right. Because the team, the innovation team at least, if we’re talking about innovation, could perform very well but, as you said, be going in the wrong direction, like the wrong strategy, or just things don’t work. Something random happened and things don’t work. There was too much competition. The trend went the other way. There was a global pandemic. Something happened that prevented the project from succeeding, even though the team did everything right.

Elijah: Yeah.

Tristan: So I think you and I are probably both on the same page in that we like to first measure just the experimental velocity. Like, is that team putting something out there every week? And then, of course, the insight velocity: are they learning something every week from what they put out there? Whether it’s an interview or a product of some type that gets into the hands of the customer and generates some sort of data as to whether or not the customer has a problem, likes the solution, or anything like that. Anything that will tell you if there is some semblance of product market fit there, great.

Elijah: So we have experimentation velocity and we have insight velocity, and I think those terms, they fly around, and I would love to dig into them a bit here. Maybe first of all, how many organizations do you know that use those measures? Is there a fair amount? Is that growing? Is that growing at a certain rate?

Tristan: That’s a very good question. Honestly, I couldn’t tell you. I know that everybody I work with uses those measures because I will count them. I know that some organizations that I’ve worked with have tried to establish those metrics at scale so that they can measure teams across regions and see if the teams in Europe are performing better or worse than the teams in Latin America. That is a challenge, I think. It’s not one of those things that makes a huge amount of sense to put too much effort in, but it’s a good kind of traffic light measurement, I think. It doesn’t make a lot of sense to say that this team ran eight experiments and this other team ran four experiments, therefore the team that ran eight experiments is somehow better than the team that ran four. Because you start getting into these very silly detailed arguments about, well, my four experiments were more important and were quantitative instead of qualitative data, and the data that we generated was much better. So then you start arguing about the story points or the insight points that have been generated there, and then it just kind of becomes too gamified. But in terms of a traffic light system, I know that a team not running any experiments, zero experiments each week, is a red light. Maybe one experiment per week or two experiments a month, that’s probably a yellow light. And one experiment per week, that’s great. Green light, team’s doing well. One or more, that’s a green light for me.
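
[Editor’s note: a minimal sketch of the traffic light idea in Python, assuming one reading of the thresholds above: no experiments at all is red, an average of at least one experiment per week is green, and anything in between (for example two experiments a month) is yellow. The function name and the exact cut-offs are our assumptions, not a formula given in the episode.]

```python
def traffic_light(experiments, weeks):
    """Classify a team's experiment velocity over a period of whole weeks.

    Assumed thresholds: zero experiments is red, an average of one or
    more per week is green, anything in between is yellow.
    """
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    if experiments < 0:
        raise ValueError("experiments cannot be negative")
    if experiments == 0:
        return "red"
    # Integer comparison: experiments >= weeks means at least one per week.
    return "green" if experiments >= weeks else "yellow"


# Example: three hypothetical teams observed over four weeks.
for team, count in [("Team A", 0), ("Team B", 2), ("Team C", 5)]:
    print(team, traffic_light(count, weeks=4))
```

The sketch deliberately stops at three colours. As Tristan points out, ranking a team with eight experiments above a team with four invites arguments about whose experiments counted for more.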

Elijah: Yeah, I think you wrote once that if they run one experiment a week, it doesn’t necessarily... that’s not what you wrote, but I might add that it doesn’t necessarily mean that they’re learning the right thing at the right time.

Tristan: No.

Elijah: Yeah, but if they don’t, we definitely know something’s wrong, right? At least in most industries, most fields. So that’s interesting, but can we not potentially compare at some point? Or is there not value in the information if two teams run the same type of experiment and one is doing it cheaper, for example?

Tristan: Oh, sure, yeah.

Elijah: So then we have suddenly like interesting insights that can where we do continuous improvement, in a sense, on the way we learn and push efficiency. I don’t know how much future that’s really what I’m wondering here. And you’re probably one of the best people in the world to ask right now, given that you have worked with a lot of teams and I don’t know how utopian that kind of thinking is, I find it very exciting to imagine a world where that exists. And not, again, not to punish a team, not to say, like, hey, you’re paying twice as much here for customer insights or whatever, but to say internally, we can do better. We can train each other, we can help each other to push this to the next level.

Tristan: Yeah, I do think that would be great. That would be like the dream dashboard. If I could see exactly how many dollars that each user interview costs and how much each insight from that user interview cost. I’m too much of a pragmatist. I think that if I use a very rough scale, I know that if a team is outsourcing their user research and their customer insights to some sort of marketing agency or research agency, and they’re paying $50,000 and that research agency is going out and interviewing five people, that seems like too much. That’s way too much.

Elijah: Not for the research agency, not for the research agency.

Tristan: It’s a good business to be in. But on the other hand, I’m not going to quibble between one team that spent $25 a person on a Starbucks card for their interviews and another that paid for a $100 gift certificate to Best Buy. It’s not worth analyzing at that level of detail. So I am concerned. I sit on a number of growth boards, and a growth board, for listeners that may not be familiar with the term, is just like an investor board. It’s a group of people that allocate funding to different innovation projects. So if an innovation team comes to that growth board and says, we want $50,000 to run some customer interviews, then yes, that will raise some red flags in my head, and I’ll certainly ask, why is it going to cost that much to go talk to five people? I know it’s a pandemic and all, but could you not go out on the street with a mask on and try and talk to people? But anyway, it’s not worth measuring at too fine a level of detail because, especially with innovation projects, no two are alike. And even two A/B tests on a website are going to be a little bit different. So if the cost varies a little bit, screw it.

Elijah: Okay, let’s see. Let’s take a step back. So we have experimentation velocity. We have insight velocity. Those two are different. I think we should go into that. But do you want to say something? Do you ever split experimentation and learning velocity? Sorry, that is what you referred to earlier, when somebody has a bad day because they don’t sleep because of children. That’s me right now.

Tristan: Elijah is tired.

Elijah: I am tired.

Tristan: So experimentation, research, like a bad kid day.

Elijah: I need to do some yoga with Adrian.

Tristan: Yeah, some yoga with Adrian. One more view for Adrian.

Elijah: One more view for Adrian. Big shout out to our sponsor, yoga with Adrian. No.

Tristan: Innovation. Yoga with Adrian. We will be generating new poses. New poses on the fly. Wow.

Elijah: We could run some experiments with her, right?

Tristan: Yeah, absolutely.

Elijah: Okay. I think you were the first one who ever made really nicely clear, for me at least, the difference between an experiment and research. I think in your daily practice you refer to everything as an experiment, even though you’re clear there is a difference. Maybe we should talk about that. Do you want to quickly explain the difference, and when we use it, how we use it?

Tristan: Sure. Colloquially, we often say experiment, and we include things like customer discovery under that bucket. Customer discovery, or talking to customers and doing interviews, or rather listening to customers with interviews, we might call an experiment, but it’s not a very precise definition. It’s not so important to get the terms right, but it is important to acknowledge that there is a distinction between what we would call generative research and an evaluative experiment. And the main difference is that in an evaluative experiment, you are attempting to evaluate whether a hypothesis is true or false, meaning the buzzwords there would be validate or invalidate something. Right. So I believe that a green call to action on our landing page will be more effective than a blue button. Therefore, I have a prediction. My prediction is that our conversion rate will increase by 10%. I’m going to launch that landing page with an A/B test, and I’m going to see if my prediction is true or false. If it’s false, my hypothesis will be invalidated. That’s an experiment, right. One where you kind of get this yes or no answer at the end, which tells you if you’re right or if you’re wrong. Hopefully one of those two. Sometimes it just tells you that you don’t have enough data, but that’s the idea; the goal is to tell you true or false. Whereas with generative research, there is no such goal. Your goal is literally to generate ideas or to generate one clear idea or one clear hypothesis that you can then go and test. So when we go and interview customers, in the sense of generative research and customer discovery, we’re really just going out and trying to listen to them. And hopefully we’re going to come away with five new ideas about who our customers are, what their pain points might be, what type of marketing channels they frequent, what magazines they read, what television channels they watch, what YouTube channels they watch. Yoga with Adrian? Of course.
So we’re just looking for new ideas, or we’re looking to narrow our idea down to one thing specifically that we can then go and test. Now, in generative research, it does sometimes happen that you had an assumption there, and you thought that out of 20 people in Silicon Valley, at least ten of them would know who Adrian was and what her yoga show is all about. But you speak to 20 of them and zero people know that. I still have no idea who that really is, but I’m just going to.

Elijah: Assume she’s fixed that after we fixed it. After the podcast.

Tristan: Right after this, I’m going to do some yoga with Adrian. But if you did interview 20 people from Silicon Valley and your hypothesis was that everybody knows who yoga with Adrian is, all you have to do is talk to one person like me to realize that that cannot possibly be true. Right. There is at least one person who has not heard of Adrian’s yoga classes. So you might accidentally invalidate something with generative research, but the purpose is to generate ideas, not invalidate them. Even if I have an assumption that I’m going to put this survey out in, let’s say, Siberia in December, and I fully expect that the most annoying thing is going to be the utter, bitter, freezing cold, it’s still generative research, right? Because I might get a few answers that say something like, well, the potholes in the road, it’s still open ended. There’s still this possibility that I might generate some answers even with a strong assumption. But if you change the question and make it a closed ended question or something that’s highly structured to elicit these types of responses and say, are you concerned about the weather? Yes or no? And I set what I would call a fail condition or some people might call a success condition or a benchmark. I want 70% of people to say yes in order for me to believe that the general population has an issue with the weather and therefore we should work on innovating weather control devices or perhaps just umbrellas, which are the MVP of a weather control device. Right. That is an evaluative experiment.
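
[Editor’s note: the closed-ended weather question with a 70% fail condition can be written down as a small check. The sketch below is an illustration under assumptions. The function name, the `min_responses` guard for the "not enough data" case, and the sample counts are hypothetical. Only the 70% threshold comes from the example above.]

```python
def evaluate_closed_question(yes_count, total, threshold_pct=70, min_responses=20):
    """Evaluate a yes/no survey question against a preset fail condition.

    Returns "validated" if at least threshold_pct percent answered yes,
    "invalidated" if fewer did, and "inconclusive" when there are fewer
    than min_responses answers (an assumed minimum sample size).
    """
    if total < 0 or yes_count < 0 or yes_count > total:
        raise ValueError("counts must satisfy 0 <= yes_count <= total")
    if total < min_responses:
        return "inconclusive"
    # Integer arithmetic avoids floating-point surprises at the boundary.
    if yes_count * 100 >= threshold_pct * total:
        return "validated"
    return "invalidated"


# "Are you concerned about the weather? Yes or no?"
print(evaluate_closed_question(yes_count=15, total=20))  # 75% said yes
print(evaluate_closed_question(yes_count=10, total=20))  # 50% said yes
print(evaluate_closed_question(yes_count=3, total=4))    # too few answers
```

The point of fixing the threshold before collecting answers is that the result gives a yes or no. An open-ended question has no such threshold, which is what keeps it generative.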

Elijah: Okay, cool. Thank you. So since we have that a bit clearer when you then measure right. It’s the annoying theme and topic of the show for those who haven’t realized.

Tristan: You haven’t gotten that, you’re in trouble.

Elijah: So do you differentiate there? Do you at some point go, most of the activity this team does, no matter where they’re at, is they go out and they talk and they talk again, and they don’t run, quote, experiments? Is there value in that insight, from your perspective at least? That’s really the question. Should that be measured?

Tristan: That’s interesting. I honestly hadn’t thought to do that in a deliberate fashion. I think that could be an interesting question at scale. And I can kind of tell you what my sort of anecdotal assumptions would be there, in that if a team is still running customer interviews after three months, something is wrong. Something’s very wrong at that point. Or actually, let me rephrase: I would like them to still be running customer interviews after three months. But if they are only running customer interviews and they’re not building anything at that point, or they haven’t tried to run a concierge test or do a solution interview or do a paper prototype or anything like that, or even just do a landing page test, if they haven’t run any evaluative experiments or solution generative research at that point, yeah, then that’s a big problem. I haven’t thought to measure it that way, but honestly, it’s never come up. It would be a good research project. I don’t know if any academics are listening, but that would be a very interesting research project, and data that I’d be interested in. But of course I only see the projects that I see, and those projects that I see, I am kicking in the butt to move as fast as possible and to not just run interviews, but actually validate the demand with some sort of value proposition test pretty quickly.

Elijah: Now this is great because not just if the ratio changes, but do they continue running interviews, that might be even more important. When I think about it, that’s not what I thought about. Right. That might be even more important inside together.

Tristan: Yeah, I mean, it gets a little screwy at some point of scale, right? Because at some point, if the team is becoming 20 people, 30 people, then of course we might not expect everybody in the team to be running interviews every single week. But we said a good team is kind of rotating people on and off the customer support line and generally engaging customers frequently. Now that you’ve mentioned that, though, I would be really interested in data to see if teams that use a greater variety of experiment types are in general more successful. And my hypothesis here would be yes, absolutely. Because a team that is just running A/B tests, and is just coming up with an idea and putting up a landing page, is going to be far less successful than a team that does customer interviews, generates a landing page based on those customer interviews, and then runs that A/B test, then runs a concierge test, then runs a Wizard of Oz test. Generally, that’s what I advocate. I think you advocate that. I’m pretty sure David Bland advocates that as well. Most everybody I know goes that route, and I’m very resistant to teams that have a preferred method of only running surveys and they just want to run surveys. That’s not a team I want to bet on. So I’d be very interested in that data. Again, fingers crossed that some academic with a lot of funding is listening to this. Please run that research report.

Elijah: Okay, great. Nice. So I wanted to go into experimentation and insight velocity, because that can get tricky, and I want to see if we can assist people. But in order to measure this whole thing, maybe let’s quickly mention that it needs to be recorded. Right. So, a report card. When you think about the difference, maybe if you just record the type of experiment or the type of research you’re doing, then you have the data. You probably don’t need to split it in a report card: is this an experiment or is this a research piece? That should be sufficient. Right?

Tristan: Yeah. I mean, if you had the experiment card, the A4 or whatever you want to call it, and you had the data on that card and you could parse it up and slice and dice it, then you would be able to tell all that very quickly. And that, in fact, is what I’ve done with innovation programs that are scaling up, is just to look at the data very roughly and see what’s going on without doing any hugely sophisticated data analysis. I’ll look and see what type of experiment is happening, even if it’s not written down. I will pull up the Mural board, the digital whiteboard where teams are working, and just look each week to see, are they doing retrospectives? Another, I think, very important measure of an innovation team, or any team for that matter, is do they do a retrospective once a week? That’s a very good thing. If I can see that on their boards, I won’t necessarily go through and again write everything down and count everything up. But it is something I’ll check where, if I see a warning sign from a team, like they are not running experiments once a week, they’re not generating insights, or I just hear through the grapevine that there are some people that are concerned about that team, I will look on their digital whiteboard. I’ll look in their experiment reports. I’ll see, are there experiment reports? Is there a digital whiteboard? How many experiments are they running? Are they running retrospectives? Are there any lessons learned? The worst thing for me is finding an experiment card and seeing that there are actually no results written down on that experiment card. That’s a bad sign to me because even if they actually ran the experiment, it means the data is sitting in somebody’s head, is not accessible, is not being shared, and if that person gets hit by a bus, the team has a problem. These are all kind of warning signs that I can use.

Elijah: Or more likely leaves the company. Right. And I guess that’s massive.

Tristan: Yes, it was for dramatic effect.

Elijah: It was very well done, but I’m just saying.

Tristan: Yes, exactly that’s the more common thing is they walked out of the building with all those insights, and suddenly they’re competing against the parent company.
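
[Editor’s note: the warning signs Tristan looks for (a week with no experiment, a week with no retrospective, an experiment card with no results written down) could be checked mechanically if the cards were stored as data. The sketch below assumes a hypothetical `ExperimentCard` record and week numbering. It is not a description of any tool mentioned in the episode.]

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExperimentCard:
    """Hypothetical record of one experiment or research activity."""
    name: str
    kind: str                      # "generative" or "evaluative"
    week: int                      # week number in which it was run
    results: Optional[str] = None  # what was learned, if written down


def warning_signs(cards, retro_weeks, weeks_observed):
    """List simple warning signs for weeks 1..weeks_observed."""
    signs = []
    weeks_with_experiments = {card.week for card in cards}
    for week in range(1, weeks_observed + 1):
        if week not in weeks_with_experiments:
            signs.append(f"week {week}: no experiment recorded")
        if week not in retro_weeks:
            signs.append(f"week {week}: no retrospective recorded")
    for card in cards:
        # A card without results means the insight lives in someone's head.
        if not card.results or not card.results.strip():
            signs.append(f"{card.name}: no results written down")
    return signs


cards = [
    ExperimentCard("customer interviews", "generative", 1, "pain point: scheduling"),
    ExperimentCard("landing page test", "evaluative", 2),
]
signs = warning_signs(cards, retro_weeks={1}, weeks_observed=2)
for sign in signs:
    print(sign)
```

Recording `kind` on every card also covers Elijah’s earlier point: once the type is written down, the split between generative research and evaluative experiments can be counted later without a separate report.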

Elijah: Right, okay. We don’t know if the team will return monetary value to the company. But at least their primary purpose is to ascertain knowledge, right? To bring back insights, and so it belongs to the company in a sense.

Tristan: Right.

Elijah: And so you do want to give it to the company or to the entity, and the most plausible vehicle seems to be to write it down in a report card, right?

Tristan: Yeah, I think so. Unfortunately, I haven’t seen any really great tools that would allow the sort of data that you described in your utopia. There aren’t great tools for recording experiments on a regular basis. There are some tools like Glider, and I use Trello in my team, and I’ve hacked together some systems in Google Drive for some other people. But none of these tools are really ideal. They don’t make for really good knowledge bases. That’s a whole nother can of worms for innovation teams: they’re just not very good at saving their insights and sharing them. And the tools that are supposed to be well designed for this just aren’t very good. Teams hate them. They feel like they’re forced to use them. I haven’t seen anything great, actually. Have you? Have you seen anything like...

Elijah: No, that’s the reason for the podcast. I just wanted to find...

Tristan: It would be useful.

Elijah: Yeah. I find it utterly frustrating. I know I allow myself to dream a little bit, and I know it’s nothing that’s going to be implemented at scale today, but I think we should be allowed to dream about that at times. I think this is very interesting. And you basically still recommend using Trello in combination with Google Drive or something like that. Is that still your...

Tristan: Jein. Yes and no. That’s a secret German word, for those of you who don’t speak German. I don’t necessarily recommend any tool. I try and use the tool that already exists in whatever company I’m dealing with. So if they’re using Microsoft Teams and they want to use a Word doc, we’ll use a Word doc. If they’re going to use Google Drive, let’s use Google Drive. I’ve got enough on my hands. If I’m trying to get a team to run faster and run experiments, that’s plenty of behavior change, right? So I don’t want to tack on learning a new tool. I’d rather just use something that people are comfortable with. Trello is pretty easy to use, so if teams aren’t using anything, Trello is a pretty good one. I do think Google Docs is in general very good, but Trello is very poorly searchable for this sort of thing. You have to spend a lot of time categorizing your insights into different columns and different boards and tagging them in some way. It’s kind of a pain. Google Drive is a little bit better just because it is Google, so they usually have very good keyword search, but at scale, none of these tools work very well, frankly. Yeah.

Elijah: Or you use something like a PDF report card and attach it to the Trello card, and that’s the first step? Or what do you do?

Tristan: No. So if you’re doing something like a PDF or a Word doc, putting it in some sort of tool that’s going to allow it to be keyword searchable is pretty key. That’s why Google Drive works decently. With Google Drive, I can shove a PowerPoint into a folder, and if I want to write my experiments down in PowerPoint, that’s fine. If you want to do them in PDF, that’s fine. If you want to do them in Google Docs, that’s fine. It’s all searchable because Google is indexing every single word. Of course, I need to train my teams to use the same words. That’s the biggest problem, because if I have a team in Buenos Aires and a team in Beijing that are targeting the same persona, or customer archetype, I need them to call that persona Sam the shopper or Samantha the shopper or whatever the case may be. That way, when somebody is looking for insights about Samantha, they’re going to find them. They’re going to find both the insights from Beijing and the insights from Buenos Aires, if there is a persona that crosses those two geographic regions. So that’s the difficulty. Basically, it’s the human part, getting everybody to speak the same language.
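
The shared-vocabulary problem Tristan describes can be illustrated with a small sketch. This is not a tool or method mentioned in the episode: the alias table, the tag name `samantha-the-shopper`, and the sample notes are all hypothetical, and the idea of mapping several spellings to one canonical tag is an assumption about how a team might soften the problem rather than something Tristan recommends.

```python
# Hypothetical shared vocabulary: one canonical persona tag mapped to the
# different names that teams might use for the same persona.
PERSONA_ALIASES = {
    "samantha-the-shopper": ["samantha the shopper", "sam the shopper"],
}


def persona_tags(text):
    """Return the canonical persona tags whose aliases appear in the text."""
    lowered = text.lower()
    return {
        tag
        for tag, aliases in PERSONA_ALIASES.items()
        if any(alias in lowered for alias in aliases)
    }


def find_insights(notes, tag):
    """Return every note that mentions the persona under any known alias."""
    return [note for note in notes if tag in persona_tags(note["text"])]


# Made-up placeholder notes from two offices using different persona names.
notes = [
    {"office": "Buenos Aires", "text": "Interview notes on Sam the Shopper"},
    {"office": "Beijing", "text": "Survey results for Samantha the shopper"},
    {"office": "Beijing", "text": "Notes on a different persona"},
]

found = find_insights(notes, "samantha-the-shopper")
print([note["office"] for note in found])
```

An alias table like this only helps with names someone has already listed, so it complements the human part of the problem rather than replacing it.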

Elijah: Great. I think that was fantastic. Just had a noise in the background. I think that’s fine. Yeah, great. Fantastic. So coming back to experimentation, maybe two things. We’ve been speaking for quite a while now, which I enjoy. Coming back to experimentation versus insight velocity, do you want to speak to that? Because you run one experiment, but you might have three insights and so on. I think it’s a confusing topic. Can you speak to that, and to how you decide that this was one insight and that was two insights?

Tristan: Okay, yeah. I think it’s very important to acknowledge that both in terms of experiment velocity and insight velocity, I really count it as one or more. I am not counting the actual number of experiments. I am not counting the actual number of insights because I don’t care, because the insight of we have achieved product market fit is clearly weightier than the insight of I ran this comprehension test and 80% of people seem to understand my value proposition.

Elijah: Right?

Tristan: Obviously the insight of 80% of people have purchased my value proposition, or purchased my widget, is clearly more valuable than just running a comprehension test and knowing that people understand it. But again, you really just get diminishing returns in terms of the value of this information. As long as the team is learning something, I don’t want to sit around trying to judge which insight is more valuable or which one is worth five points and which one is worth four points. It’s not worth the time. I’d rather focus on just getting the team to run as fast as possible. If they do that, then I worry about trying to get them to go in the right direction. I’m not going to quibble about which insight is more valuable.

Elijah: So is that my brain or did I hear that you’re not differentiating really between experimentation and insight velocity?

Tristan: So, just in terms of how you measure them, right? Sort of at what level of fidelity? One or more experiments per week and one or more insights per experiment, and I’m happy. What I’m really trying to measure with insight velocity is whether or not they are running experimentation in a way that produces data and insights. Okay, that’s it. Really, at scale, what I’m measuring is the percentage of experiments that generate an insight, and I would like that number to be as close to 100% as possible. And just to be clear, I’m using experiment in the broad sense of generative or evaluative. But if they run something and they don’t learn anything, it means the experiment has failed somehow. Not failed as in you’ve invalidated the hypothesis, but you have failed to validate or invalidate. It means you ran the experiment and the results were inconclusive. So that’s what we want to avoid when we run experiments. The general premise here is that you must run experiments of some type in order to generate data, and you’re not going to generate any insights sitting around thinking about it super, super hard. You’re going to generate a lot of ideas maybe, but you’re not going to generate any insights. An insight is something that is valid information, that has data backing it up in some sense. And you and I, and I think a lot of other people, hopefully most of the people listening to this, believe that the more insights you have, the more likely you are to achieve product market fit and eventually get the impact or revenue that you are looking for. That it is a prerequisite. That the odds of you succeeding without any insights to begin with are very poor. You might just take a stab in the dark and say, I’m not going to do any research, I’m just going to build this giant thing and throw it out there. But at the end of the day, you’re going to get at least one insight, which is: did anybody purchase it or not?
And then you’ll know. So we believe that more insights increase your likelihood of succeeding. Therefore, the thing that generates insights is experiments or research. Therefore, we want to run experiments and research every week, and then we can kind of go up the chain from there. So run experiments. The highest percentage of experiments possible should generate insights, and hopefully those insights will lead to a product or service that succeeds.
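
The at-scale measure described here, the percentage of experiments that generate one or more insights, can be sketched in a few lines. This is only an illustration under stated assumptions: the `Experiment` record, its field names, and the sample data are hypothetical and do not come from any tool mentioned in the episode. An experiment with an empty insight list stands for an inconclusive result.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Hypothetical record of one experiment, as it might sit on a card."""
    team: str
    name: str
    insights: list = field(default_factory=list)  # empty list = inconclusive


def insight_rate(experiments):
    """Share of experiments (0.0 to 1.0) that produced one or more insights.

    Insights are deliberately not weighted or added up: an experiment
    either generated at least one insight or it did not.
    """
    if not experiments:
        return 0.0
    productive = sum(1 for e in experiments if len(e.insights) >= 1)
    return productive / len(experiments)


# Made-up sample data for illustration only.
experiments = [
    Experiment("team-a", "comprehension test", ["most people understood the offer"]),
    Experiment("team-a", "landing page test", []),  # ran, but inconclusive
    Experiment("team-b", "customer interviews", ["insight one", "insight two"]),
    Experiment("team-b", "pre-order test", ["nobody pre-ordered"]),
]

print(f"{insight_rate(experiments):.0%} of experiments generated an insight")
```

Note that an invalidated hypothesis still counts as an insight in this sketch; only experiments with nothing written down pull the number away from 100%.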

Elijah: Thank you for clearing that up. Here’s another interesting one. I think there’s this concept out there, and this could be another KPI for the team itself, that you actually want them to fail a certain number of experiments. The idea behind that is that if the experiments don’t fail at least a certain number of times, then they’re not pushing themselves beyond the knowledge that they currently have. And therefore we can assume that if every experiment works out, let’s say, if everything is, so to say, validated, then they’re not trying anything new. They work around what they, or at least the organization, already know. And that’s not really what we’re trying to achieve with transformational innovation, with brand new business models. So we could say there’s something going on. Do you want to elaborate on that? And then maybe say, do you measure that? Do you look into that?

Tristan: Yeah, no, I think that’s a great point, and I just want to make it as clear as possible to our dear listeners that when we say the experiment fails, what Elijah is talking about in this case is validating or invalidating the hypothesis. Right? So the hypothesis may be incorrect, but the experiment has still succeeded by proving something incorrect. What we’re saying is that we don’t want to see 100% of hypotheses turn out to be true. That certainly means that you’re not testing anything super exciting. So if you’re testing the difference between a product image, let’s say a pair of shoes on your website, versus a smiling person holding up that pair of shoes, that is a pretty boring experiment, right? Because there’s plenty of data out there that says pictures of products with smiling people are going to sell more. Unless you’re trying to sell funeral services or something like that, in which case you probably should not have smiling people in that context, but hey, you never know. But the point being that I think you’re absolutely right. You don’t want to see all of your hypotheses succeed. I have never specifically measured that at scale. It is something I will look at when talking with a team. Typically, again, when I’m engaged with teams or when I’m sitting on a growth board, I’m seeing their hypotheses before they run the experiment. Sometimes I’ll let a team throw an easy pass, so to speak. Maybe the team’s feeling down and they just want to get an experiment that looks like success. They want to run an experiment on a hypothesis that seems pretty easy, right? That’s a decent reason to give the team a pass at just testing something that they know is probably going to be true.
But if they’re doing that over and over again, if I see that this team is asking for $100,000 and they want to test three things that are super obvious, like people like cereal or people love ice cream, that’s not worth testing, and it’s certainly not worth $100,000. I want to know that there is a kind of return on my investment. And the return I’m looking for is insights. And if your insight is that people like ice cream, that is not super impressive. Now, again, I’m not going to count the story point value of that insight, but I might count that as zero. That’s not an insight. It’s not an insight if you already knew it.

Elijah: Potentially. This is very exciting. I find it a very exciting topic. On the one hand, the concept of failure is nearly abused now, I think. It’s, oh, we just failed, and it’s like, yeah, but you shouldn’t have, in the sense that you should have never tried that. You should have never run that test; that hypothesis should have never been tested.

Tristan: You mean for a hypothesis that is just obviously false?

Elijah: Obviously false, or it’s obviously clear, or it’s not the most risky one. Right. Like, this is just not the type of thing you should have learned, and it’s just like, no, we failed. I don’t want to go too deep into that. I think that’s its own podcast, right?

Tristan: Yeah.

Elijah: So that’s the whole concept of failure. But on the other hand, it prevents us from really looking at failure in the most positive sense. And I thought, when you start contemplating the idea that you need to fail at least X percent of your experiments, now, people will hack that. Again, I’m pretty sure people will hack that. But it gives that freedom to really do something silly, to really go out there and try something really new and really disruptive and actually, truly fail, in a sense, with all the best intentions behind it.

Tristan: Yeah, I think you’re kind of digging into a few things in there that are probably worth a lot more academic research than we can afford with our time. But the one thing you said there was that people might game these metrics if you measure them on them, and I think that is absolutely true. One of the things we sometimes say in my team when we’re talking about teams that we coach is: measure, but don’t count. That is sort of a double entendre there. Like, literally don’t count the number of experiments; just one or more is great. But also, don’t count it as a KPI for the team. I don’t like the team staring at the dashboard, so to speak, or looking at a leaderboard and saying, oh, we only ran three experiments and they ran seven experiments. I think that can encourage the sort of behavior that you’re describing, which is another reason for me to just count one or more insights or one or more experiments, because it’s not a race to run the largest number of really low-value experiments. Ultimately, you want to make progress on the product, right? So experimentation velocity and insight velocity and even retrospective velocity, these are all things that measure behaviors that the team is in control of and that we think will be a leading indicator of project success. But ultimately, especially for an innovation team, the team is going to need to generate a product dashboard, or a project dashboard, I should say, in case it’s a service. And they’re going to have to figure out what their own leading indicators of success in that product or service are. Whether that’s the acquisition rate or the cost to acquire a customer, or the conversion rate on the homepage, or the retention rate, there are going to be a lot of different numbers that they’re going to have to figure out, and they’ll have to figure out which is their greatest lever for success, depending on the project.
For some projects, like SaaS products, retention is a very important metric. For a media company, it’s going to be virality. For social media companies also, viral coefficient is probably the most important leading indicator of project success. But those teams are going to have to figure out their project success based on the actual project that they’re working on. So these measures that we’ve been discussing are pretty generic. They are generic measures of behavior, but ultimately, the team is going to have to generate their own dashboard, and that should supplant these behavioral metrics at some level.

Elijah: Okay, so for that, we probably need to talk more about how we measure products.

Tristan: Yeah, absolutely.

Elijah: In another session, and tie it back to this. We probably have to stop there, as much as it would just organically flow into that at this point.

Tristan: Sure.

Elijah: But maybe let’s recap. So we have experimentation velocity, we have insight velocity, and we have retrospectives, or retrospective velocity.

Tristan: I guess you could call it retrospective velocity. It sounds kind of funny, really funny. Velocity is not even the right word, but it’s just about doing all of those things. And at scale, you would measure the percentage of teams that are running retrospectives or the percentage of teams that are running one or more experiments per week. But when you’re looking at an individual team, it’s just kind of a binary state. You could look at the percentage of weeks where the team ran one or more experiments. I like to look at a four-week cadence, a four-week rolling cadence, the last 30 days: in what percentage of weeks was the team able to generate one experiment, generate one insight, and run a retrospective? Yeah, that’s the way you measure it on a team basis.
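
The two views described in this answer, a rolling four-week percentage for a single team and a percentage of teams at scale, can be sketched as follows. The `WeekLog` record, the function names, and the sample logs are hypothetical and used only for illustration. The sketch treats each week as a binary state (one or more experiments, one or more insights, a retrospective held), in keeping with the "one or more" framing used throughout the conversation.

```python
from dataclasses import dataclass


@dataclass
class WeekLog:
    """Hypothetical weekly record for one team."""
    experiments: int
    insights: int
    retrospective: bool


def hit_cadence(week):
    """Binary check: one or more experiments, one or more insights, a retro."""
    return week.experiments >= 1 and week.insights >= 1 and week.retrospective


def rolling_cadence(weeks, window=4):
    """Share of the last `window` weeks in which the team hit the cadence."""
    recent = weeks[-window:]
    if not recent:
        return 0.0
    return sum(1 for w in recent if hit_cadence(w)) / len(recent)


def share_of_teams_experimenting(logs_by_team):
    """At scale: share of teams with one or more experiments in their latest week."""
    teams = [weeks for weeks in logs_by_team.values() if weeks]
    if not teams:
        return 0.0
    return sum(1 for weeks in teams if weeks[-1].experiments >= 1) / len(teams)


# Made-up logs, oldest week first.
logs_by_team = {
    "team-a": [
        WeekLog(2, 1, True),
        WeekLog(0, 0, True),   # no experiment this week
        WeekLog(1, 3, True),
        WeekLog(1, 1, False),  # skipped the retrospective
        WeekLog(3, 2, True),
    ],
    "team-b": [WeekLog(1, 1, True) for _ in range(4)],
    "team-c": [WeekLog(0, 0, False)],
}

for team, weeks in logs_by_team.items():
    print(team, f"{rolling_cadence(weeks):.0%}")
print(f"{share_of_teams_experimenting(logs_by_team):.0%} of teams ran an experiment")
```

Because the check is binary, a team that runs seven experiments in a week scores the same as a team that runs one, which avoids rewarding a race to run many low-value experiments.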

Elijah: Yeah. Okay, fantastic. So that would be three when we look at it. And what are some other ones maybe too? I’ll try not to get into them too deeply. I think we’re hitting an hour soon.

Tristan: But I think the other obvious thing that everybody should be measuring is qualitative data here. Right. Like, don’t just rely on these numbers. These numbers are potentially good traffic lights to tell you something is going very wrong. But a good sprint demo every week for all your teams, where teams are demonstrating what they’ve done in the past week and what they’ve learned in a quick five-minute presentation, is going to be extremely valuable to you. You will generate more insights by watching the qualitative presentations than you will just staring at the dashboard. Those dashboards are good to be aware of. As a coach, if I’m working with one team, I kind of keep those metrics in my mind, and I am kind of measuring them as I go. At scale, they’re very useful when I can’t attend a sprint demo in both Melbourne and New York City. If I can’t do that, well, those metrics are very good for high-level, large-scale programs, but most people aren’t really dealing with that. So look at the qualitative data.

Elijah: Great. I think that’s it. Well, thank you, Tristan. I think we’ll wrap up the topic here.

Tristan: Sure.

Elijah: That’s it. Cool.
