
DevSecOps: The Broken or Blurred Lines of Defense

Author: John Willis
Published March 14, 2023 in features
35 min read

With the modern patterns and practices of DevOps and DevSecOps, it’s no longer clear who the front-line owners are. Today, most organizations’ internal audit processes involve a lot of toil and deliver low efficacy. This is something John has referred to in previous presentations as “Security and Compliance Theater.”

In this talk, filmed at Exploring DevOps, Security, Audit compliance and Thriving in the Digital Age, John takes a deep dive into DevSecOps and what effective governance will look like as regulation and automation continue to have competing impacts on the way software is delivered. 

He’ll ask how we came to the current pass, with references to well-known risk and compliance failures at Equifax, Knight Capital, Capital One, and SolarWinds.

Full Transcript:

So if you think back in time, my evolution of trying to get people to change the way they think about what we do in security from a DevOps perspective starts with the Abraham Wald story. You have probably heard it, you just haven’t heard it with his name.

So during World War Two, there were a bunch of mathematicians and statisticians whose job was to figure out how to do weight distribution and repair of fighter planes that came back with bullet holes. Then Abraham Wald one day woke up and said – and this is really the definition of survivorship bias – “Wait a minute, we’re repairing the planes where the bullet holes are? They’re the ones that are coming back. We should be repairing the planes where the bullet holes aren’t, because they’re the ones that aren’t coming back.”

I think that’s a great metaphor for the way we think about security: maybe we’re looking in the wrong place? And so I asked this meta question about three or four years ago – which I hope makes your brain hurt a little bit, but in the Abraham Wald way – which was “What if DevSecOps happened before DevOps?”

Well, the world would be different. Because if you think about it – I’m pro DevSecOps, I think everybody should have a good DevSecOps reference architecture – basically what happened was we did all this DevOps work, and then we put an overlay of security on it, and that’s good, it’s necessary. But maybe we already had this bias of bullet holes when we were thinking about that?

What if we started with security? What if some security person said, “I’ve got a great way to do security. I’m going to call it DevSecOps!” and we started in that order? Could things be different? Would we be thinking differently, or have we not thought differently? So Shannon Lietz, who is one of my mentors – she wrote the DevSecOps Manifesto – coined the term “everyone is responsible for security”. 

We were talking at the break about these three lines of defense. So I don’t come from an auditor background. All I know is that I get brought into all these companies and they would ask, “Hey John, look at our DevSecOps reference architecture!” And I’d go, “Well, that’s awesome,” and then we would have a conversation.

“Yeah, we buy the three lines of defense model.”

“Erm yeah, that one is not so awesome!”

Because Andrew Clay Shafer, in the earliest days of DevOps, pre-DevSecOps, made this beautiful caricature that described the original DevOps problem statement. There was a wall of confusion – and some of you people look like you might be close to as old as I am – but there was a day when developers would throw their code over the wall and operations would catch it and say, “It doesn’t work!” And the other side would say, “No, you broke it, it works!”

And this would go on for weeks and weeks and weeks, and Andrew would talk about figuring out a way to break that wall. And in the original DevOps days there were these beautiful stories of developers and operations working together in collaboration, and there’s a whole industry that’s been built out of it.

So busting the wall becomes a metaphor for any non-collaborative groups in an organization. And so, thinking about the problem statement that drove DevOps, I asked: where are we now with the problem statement of what happens in a large organization between first line, second line, and third line? The way I view it when I have these conversations, the second line is by definition a buffer for the third line.

Second line has no way to communicate with the first. And this is what “dev” and “ops” looked like 15 years ago: we didn’t have the tools. We didn’t even have the cognitive mapping to have those discussions. We didn’t even know we should have those concerns. In Investments Unlimited, we have a short description of how I’m not going to go to the Institute of Internal Auditors and say, “Hey, I’m John Willis, you’ve never heard of me, get rid of three lines of defense!”

That ain’t happening. But what I am going to say is, just like we do a separation, can we reframe – and we did this a little bit in the book – the conversation of how we think about this? Why can’t we create that DSL I talked about, where the second line can meet the first line in design and requirements?

And here’s the kicker, right? I don’t claim to be a genius about all things. But what I do know is, in every bank and every company I’ve walked into, what’s the purpose of that second line? They basically make sure the first line does the job it needs to do. And what’s the job they need to do? Protect the brand.

That’s it, right? Everything falls on protecting the brand. When you’re Equifax and you lose $5 billion of market cap in a day, or you’re another company called Knight Capital – the second largest high-frequency trading company on the NYSE – and you lose $440 million in 45 minutes and are out of business in 24 hours. That’s what they’re supposed to prevent.

And our relationship now is to basically hide things from them. That’s got to change. And that’s why we get into the likes of Equifax, Knight Capital, and Capital One. So what do you do?

I had this idea when I started thinking about security. Have you ever been to RSA? You go into the exhibition hall at RSA and you’re like, “I’ve gotta get out of here!” There are too many lights, too many vendors, it’s just too confusing. It was almost impossible to come up with a taxonomy for security because there are just so many ways to discuss it and look at it. So I started thinking about how to make it simple.

Could I come up with it? And like I always say – and I’ll keep saying it until I get punched in the face for saying it – there’s a post-cloud-native world, and we can use cloud native as the marker to make the conversation simpler. I’m not implying that security doesn’t exist for legacy and mainframes, it certainly does. But we could have a simpler conversation if we just assumed there was a line where we could say everything to the right is cloud native.

And so with that, I will tell you that what we need to do, and what we do, are inconsistent. What we need to do is prove we’re safe; what we actually have is usually some form of subjective audit against our records – records that tell stories about things and might lead to screen prints.

And then how do we demonstrate it? We have our poor second line, internal auditors, or external auditors trying to figure out what all those subjective descriptions mean. We need to do both: build a system that can prove we’re safe and be very consistent in what we demonstrate. And that’s the whole point of something like Kosli, or just in general this idea of digitally signed, immutable data that represents what we say we do.

So then the audit isn’t a subjective 40-day conversation, it’s a one-second look at the SHA and we’re done. So we move from implicit security models to explicit proof models, and we change subjective to objective and then verifiable. Back to the cloud-native model: if you can accept there’s a post-cloud-native world, then I can give you a simple taxonomy for thinking about security without having to think about it in, like, 40 horizontal and 50 vertical ways.
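To make that idea concrete, here is a minimal sketch of what “digitally signed, immutable data” plus a one-second verification can look like. This is illustrative only, not Kosli’s actual implementation; the record fields are invented, and it assumes the Python cryptography package.

```python
# Minimal sketch of a signed attestation record -- illustrative only,
# not Kosli's actual implementation. Requires the 'cryptography' package.
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def attest(evidence: dict, signing_key: Ed25519PrivateKey) -> dict:
    """Hash the evidence and sign the digest; store the record immutably."""
    payload = json.dumps(evidence, sort_keys=True).encode()
    digest = hashlib.sha256(payload).hexdigest()
    signature = signing_key.sign(digest.encode())
    return {"evidence": evidence, "sha256": digest, "signature": signature.hex()}

def verify(record: dict, public_key) -> bool:
    """The auditor's 'one-second look at the SHA'."""
    payload = json.dumps(record["evidence"], sort_keys=True).encode()
    if hashlib.sha256(payload).hexdigest() != record["sha256"]:
        return False  # evidence was altered after attestation
    try:
        public_key.verify(bytes.fromhex(record["signature"]), record["sha256"].encode())
        return True
    except InvalidSignature:
        return False

key = Ed25519PrivateKey.generate()
record = attest({"artifact": "app:1.4.2", "tests": "passed"}, key)
assert verify(record, key.public_key())
```

The point is the shift the talk describes: the audit question becomes a mechanical digest-and-signature check rather than a subjective conversation.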

I worked with a couple of groups and I started hearing from these CISOs, and I said, “I don’t want to call it a taxonomy, but we can look at it as risk, defense, and trust, and we can look at it as a transition from subjective, to objective, to verifiable.”

So in the last presentation we went through risk in 20 minutes – from change through attestation. I didn’t talk about continuous verification, but there are some really interesting products that are basically trying to use Chaos Monkey-like tools to go beyond just breaking things. For example: this port should never be open… let’s just open the port.

If this vulnerability should never have got through the pipeline, let’s launch an application that has that vulnerability, right? So there’s some really interesting continuous verification there. I’ll spend a little more time on that. But then on defense, it’s table stakes that you detect and respond – Azure and all that stuff.

And then everybody’s basically trying to build a data lake right now, a cyber data lake – that’s the hip thing to do. I’m not making fun of it, it’s required, but there’s some real thought process that isn’t happening about how you build a cyber data lake that isn’t just a bunch of junk. So there are a couple of vendors and projects that are thinking about: can we normalize data on ingest, as it comes out of the provider?

So for example, you take a message that might come from Amazon; the same message might come from Google Cloud or from Oracle. It might mean the same thing – like increased privileges – but the message is completely different. There’s no normalization, so if you shove that all the way to the right into a cyber data lake, you’re going to have a hard time figuring out what the message even is, let alone that each one has a different meta definition for the ID and all that. And at some point you really want to attach that to a NIST control or a MITRE ATT&CK framework tactic. So let’s do all that on the left side, and there’s some good work happening there.
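As a sketch of that normalization idea: the event shapes below loosely follow AWS CloudTrail and GCP audit logs, but the mapping table, the common schema, and the action list are hypothetical stand-ins, not a real product’s logic.

```python
# Hedged sketch: normalize provider-specific audit events into one schema
# before they land in the cyber data lake. Field shapes loosely follow
# CloudTrail and GCP audit logs; the schema itself is invented.
NORMALIZERS = {
    "aws": lambda e: {
        "actor": e["userIdentity"]["arn"],
        "action": e["eventName"],            # e.g. "AttachUserPolicy"
        "time": e["eventTime"],
    },
    "gcp": lambda e: {
        "actor": e["protoPayload"]["authenticationInfo"]["principalEmail"],
        "action": e["protoPayload"]["methodName"],  # e.g. "SetIamPolicy"
        "time": e["timestamp"],
    },
}

# Map normalized actions onto a shared label -- here the MITRE ATT&CK
# "Privilege Escalation" tactic (TA0004) -- so queries work across clouds.
ESCALATION_ACTIONS = {"AttachUserPolicy", "SetIamPolicy"}

def normalize(provider: str, event: dict) -> dict:
    record = NORMALIZERS[provider](event)
    if record["action"] in ESCALATION_ACTIONS:
        record["attack_tactic"] = "TA0004"  # Privilege Escalation
    return record
```

Doing this “on the left side” means the lake receives one queryable shape instead of three vendor dialects to correlate after the fact.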

And then the trust thing is interesting too, because of what we saw with software-defined networking. When Mike said I sold a company to Docker, what I actually did is I had this crazy idea of ‘Could we do software-defined networking in containers?’ And we did it. It was literally me and this guy who pretty much invented software-defined networking; we built it. And, as you know, there’s this whole idea of how you do trust and build around it. If you think about SDN, it was changing a north-south paradigm of how traffic came in and out to east-west.

If you looked at some of the traffic patterns going back 15 or 20 years, 90% of your traffic was north-south. And then, the more you got into highly scaled services, service mesh, all that stuff, it flipped. It went to 80% east-west.

And we built a network around that. Well, I believe we have to do that for trust now. And we already see evidence of this when we get into Kubernetes and clusters and stuff like that. We’re seeing things like SPIFFE and SPIRE, some of the new service mesh stuff, ambient mesh – I am throwing out a lot of terms – but there is this possibility that, instead of building this on-or-off north-south trust, we could create ephemeral trust within a cluster, and it goes away.

So even things like secrets management – I think Vault is a great product today, but that stuff could happen at the mesh level, where a secret just exists in this pod or cluster for the life of the cluster. And by the way, you’re in or you’re out – you’re authorized for that cluster or you’re not.

So I think there’s incredibly interesting stuff around what I call doing trust differently. And zero trust is table stakes, right? I’m talking about going to a level of trust where the world is going to be – I don’t know if it’s Kubernetes – but it’s definitely going to be cluster-based compute, and then we could build our trust around that model. I know it sounds crazy, but hey.

So, risk differently. We talked about this in Investments Unlimited. This was the Capital One breach, which is fun for everybody except Capital One! Basically this was Struts 2 and Jakarta. Oh, wait a minute, this is actually Equifax, but that’s fine. So what happened was there was a vulnerability in one of the Struts 2 libraries, which almost everybody uses, where even as an unauthenticated user, you put a command in and it runs. And if that was their system, you could do whatever you want – this is what I told you about the breach.

But this one’s a little more interesting, this is Capital One’s breach. If you follow aviation disasters, this is like the Air France 447 of computing, if that makes any sense to anybody. What was interesting about this one is they were basically rolling IDSs, so there was this window of anywhere from seconds to about five minutes where an adversary could poke in. There was this woman, a crypto miner, who basically runs a billion curls a day looking for one person to put up a proxy with the defaults on. And this team that was in a hurry got an exception and put up a proxy that kept the defaults – and on this one proxy, the default bypass was on.

So this crypto miner got really lucky. Because the IDSs were rolling, they popped through and hit that IP address – it’s a hardwired address that anybody who has ever worked with Amazon Web Services knows is the metadata server – so the request looked like capitalone.com?url=<metadata server>. And because they were in a hurry, they had probably cut and pasted some VPC definitions from Stack Overflow. The proxy was privileged, so the attacker got right through, was able to dump the credentials, and assumed superuser power.

Meanwhile, some developers had left 100 million business credit card applications in an S3 bucket. Here’s where it gets even worse. For business credit cards, PCI DSS requires Social Security numbers to be tokenized, but it doesn’t require the corporation ID to be. I’m sure it’s everywhere, but basically half of the S corps are small businesses that use the Social Security number as the corporation ID. So again, there are just all these loopholes that happen. And that attack is called server-side request forgery.
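To show why the metadata server mattered in that SSRF, here is an illustrative sketch. The 169.254.169.254 address and the IMDSv2 token flow are real AWS mechanics; everything else is a simplified stand-in, it only does anything from inside an EC2 instance, and it should obviously not be pointed at systems you don’t own.

```python
# Illustration of why the metadata server matters in an SSRF, and why
# IMDSv2 mitigates it. Runs only from inside an EC2 instance.
import requests

IMDS = "http://169.254.169.254"

# IMDSv1: credentials answer to a single GET -- exactly the kind of
# request a forged ?url= parameter through a proxy can produce.
creds_url = f"{IMDS}/latest/meta-data/iam/security-credentials/"

# IMDSv2: the caller must first obtain a session token via a PUT with a
# custom header, then present that token on every read. A typical SSRF
# through a proxied GET can do neither, which closes the hole.
token = requests.put(
    f"{IMDS}/latest/api/token",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    timeout=2,
).text
role_names = requests.get(
    creds_url,
    headers={"X-aws-ec2-metadata-token": token},
    timeout=2,
).text
```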

I was actually brought into SolarWinds. One of the authors of the book worked for a big five consultancy and they wanted to get the contract for the clean-up job, so I was brought in to talk about automated governance. Again, we can make fun of SolarWinds all day long, but every software company out there is basically as bad as they are. By the way, all the software that you’re buying – and now that I don’t actually work for a software company (we’re SaaS-based, we’re good!) – you look at what SolarWinds was and it was terrible. The pipelines were just horrendous.

And so I go in talking about advanced stuff and they’re like, “No, no, we’ve just got to get DevOps!” So they weren’t really that interested. But I thought I’d be Johnny-on-the-spot and go in there with that CrowdStrike MITRE ATT&CK analysis and say, “Alright, I’m going to really show these guys what they should use.”

Because basically what happened was the adversary got into the Microsoft compiler step of the build. These are supply chain attacks, and they are the really scary ones – they’re not even going after you, they’re going after whoever you’re delivering stuff to. So they got in there. And by the way, SolarWinds apparently rolled their logs after 18 months, so they don’t even know how long the adversary was inside. They could have been there for years.

So CrowdStrike did a really good analysis, and one of the things I caught – in fact it’s in our demo, that’s why I sent our demo – was that they weren’t checking the image SHA. So what happened is the malware watched for MSBuild and started injecting nefarious code, so that when the product went out to the Department of Defense or a bank, it had this open backdoor in it.

And a table-stakes attestation would be: if it’s a clean image or a JAR file, take a baseline SHA and be able to compare before and after, to see whether it is what it should have been – and there are other ways to detect it too. The other really interesting thing, and the reason this idea of automated governance has to have an immutable, non-tamperable data store, is that they went in and actually created logs. That’s really scary, if they get to live in your company.

And by the way, they’re in your company right now – don’t think they’re not. They may not have found a way to do real damage yet, but you are incredibly naive if you don’t think there are adversaries sitting in your corporation. There’s polymorphic malware – I once spent 20 minutes explaining how polymorphic malware works – they are in your company. The question is how hard it is, or what opportunities arise – the Air France 447 chain that lets them get to the next step, and the next step. If they’re really smart, this is where it gets really scary.
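Here is what that table-stakes baseline check might look like in the simplest possible form – a sketch, not anyone’s real release gate; the baseline map and paths are invented.

```python
# A table-stakes attestation sketch: record a digest for a clean image
# or JAR at build time, then compare before release. Illustrative only.
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_baseline(path: str, baseline: dict) -> bool:
    """baseline maps artifact path -> digest recorded at build time."""
    if sha256_of(path) != baseline.get(path):
        # The artifact changed between build and release -- exactly the
        # gap a build-time implant exploits.
        return False
    return True
```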

They can actually tamper with the logs to remove the evidence of what they did. When all was said and done, the Equifax story is really interesting. I know a lot of people who worked at Equifax, and their external auditors were the thing that drove almost everybody to quit in the two years after the breach: they wanted everybody to prove the negative. In other words, they were saying: we survived a nightmare, because the adversaries didn’t change the data.

That’s the scary thing. It’s one thing to have an adversary that dumps a bunch of confidential data out in the wild – that’s not good, it’s going to hurt the brand. But you’ll go out of business if they change your system-of-record data and publish that. If you’re a bank and I change your accounts… Remember, they were in Marriott for five years in that breach.

So if they’re really smart – and this is evidence that they do this – not only might they mutate your data, they’ll mutate the evidence of the data. That’s why it has to be in an immutable, non-tamperable store.
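A hash chain is the classic way to make a store tamper-evident: each entry commits to the one before it, so rewriting history breaks every later link. A minimal sketch (real systems add signatures and replicated storage on top):

```python
# Sketch of a tamper-evident (hash-chained) evidence log. Each entry
# commits to the previous one, so editing history breaks the chain.
import hashlib
import json

def append(log: list, event: dict) -> None:
    prev = log[-1]["hash"] if log else "0" * 64
    body = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    log.append({"prev": prev, "event": event,
                "hash": hashlib.sha256(body.encode()).hexdigest()})

def verify(log: list) -> bool:
    prev = "0" * 64
    for entry in log:
        body = json.dumps({"prev": prev, "event": entry["event"]}, sort_keys=True)
        if entry["prev"] != prev or \
           entry["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False  # someone rewrote history
        prev = entry["hash"]
    return True
```

An adversary who mutates a record, or the evidence of a record, now has to recompute every subsequent hash – which is exactly what an externally anchored root (Merkle-tree systems like Sigstore, mentioned below) makes impossible.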

Defense differently: again, I talked a lot about this. You have to think about the left side – don’t just build a cyber data lake. There are some really good opportunities to think about how you ingest at the provider level. And there are a couple of providers now building these interesting SDKs – it’s called automated cloud governance, from a group out in New York called ONUG – where you can basically use these SDKs from Microsoft, Oracle, IBM, and Google to start attaching things like NIST metadata, and you can normalize the messages themselves. So by the time it gets to the data lake, you’re not burning an incredible amount of compute trying to correlate.

And trust differently: zero trust is table stakes. But I think the really interesting stuff – certainly NIST 800-207 – and the good news is, when I first was writing this, SPIFFE and SPIRE were just external projects. Now they’re actually built into Istio, and Envoy, and service mesh, so they’re all there.

But Sigstore is really interesting – a Merkle tree-based solution that deserves to be looked at. The thing that I’m trying to get Mike and James and everybody really excited about is what’s coming down the pike. And here’s the thing: in the past, we in IT had this buffer. Our first-line people knew that, against the adversaries and the auditors, we were ahead of them. We’ve got Kubernetes; they won’t figure out all the dangers, the ghosts and dragons in Kubernetes, until next year. We’ve been living in that delayed buffer. Well, now what’s happening is the people who write the Google stuff, like service mesh, Istio, and Envoy, are writing – or getting contracted by NIST to write – the documentation.

So now some of the incredibly dangerous stuff that’s in Istio and Envoy, which is the service mesh part of Kubernetes, is well documented in plain English, easily read, and both the adversaries and the auditors can see that there’s something called a blue-green deploy. Traffic control like that used to only happen at layer three; now stuff can happen all the way up at layer seven. And layer three stuff, switch config, is very hard for an adversary to get into your system and tamper with.

But now an adversary just has to find one leaky API or the YAML file, and they can basically say, “You know what? I’m going to take 1% of all the payment traffic and send it to some ‘latest’ version.”

I ask people, “Have you ever heard of Envoy? Do you even turn on the Envoy access log?” “What’s that?” So that means there are banks running production, customer-payment or funds-based stuff in service mesh – and they have no evidence of it. So this is, for people like Kosli and us who want to get ahead of the curve, a treasure trove. It’s stuff that’s going to happen so fast, and people aren’t even ready.
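The “1% of payment traffic” reroute above would be visible in the mesh config itself. As a hedged sketch of what auditing that might look like: this assumes PyYAML and the standard Istio VirtualService shape (spec.http[].route[].destination/weight); the 5% alert threshold is invented for illustration.

```python
# Hedged sketch: flag low-weight traffic splits in an Istio
# VirtualService manifest -- the shape a stealthy 1% reroute would take.
import yaml

def suspicious_splits(manifest: str, max_minor_weight: int = 5) -> list:
    findings = []
    vs = yaml.safe_load(manifest)
    for http_route in vs.get("spec", {}).get("http", []):
        for r in http_route.get("route", []):
            weight = r.get("weight")
            if weight is not None and weight <= max_minor_weight:
                findings.append(
                    f"{r['destination']['host']} receives {weight}% of traffic"
                )
    return findings
```

Paired with an attested history of who changed the YAML and when, a finding like this is evidence rather than a mystery.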

And again, the takeaway is: we had the luxury of thinking the adversaries wouldn’t figure this stuff out because it’s so advanced. But the people who write Envoy, Istio, and that stuff are now writing the documentation on how it works, and the adversaries are not stupid. When you tell them there’s something called a blue-green deploy, they might not know what it is, but once they realize it’s a reroute of traffic, they’ll know exactly what to do with it.

By the way, that’s a GPT-3 image; all I put in was Bacciagalupe as John Willis, and that’s what I got. And the only thing I will say is – and this is probably worth a drink of water – I think the internet thinks I’m a clown. So that’s OK!

We’ve got some time for a Q&A, so I’ll bring a couple of chairs up and we can have a bit of a fireside chat.

If you have any questions, put them in via the QR code on your lanyard. So before we get into the questions from the audience, I’d like to pick up on what you were saying about the network stuff. Because I have to say, when you started talking about this – Istio and Envoy – my reaction was: can we just stick with what we’ve got for now? But the more I started thinking about it, the more I thought, “Oh wait, hold on, this is quite interesting.” Because again it goes back to the DevOps story: it’s another example of things that used to live in another department in the business, where the developers get so pissed off with it that they decide we’re going to put this in software now. First it was builds, then security, deployments, cloud, containers. Time after time we talk about everything as code, but it’s really developers doing civil disobedience against other parts of the org in some way. So networking is one area, but some of the conversations I’ve had this week are also about data. Maybe you could say a bit about that?

Oh, yeah. I mean, that’s another thing. So John Rzeszotarski implemented this thing, and one of the first interesting conversations happened after he built it. Our whole thought process was that this was for the software supply chain. And it turns out one of the API development teams saw it, did a pull request, and said, “Hey, we built a system for API development workflow.” “Ooh, that’s interesting!”

Because this isn’t really only for software-delivery workflows; it’s a workflow that shows evidence of the decisions you made in defining an API, right? Like leaky APIs, all that stuff. And that sort of opened up this idea that it’s just a model for workflow evidence.

And so I started thinking about what else we are doing. And right at that time this concept of data ops was starting. Go back 15 or 20 years – let’s say 20 years ago – there were a lot of big banks where the way they put software into production was: they just put the software into production.

And then DevOps came, CI and CD, and now it’d be very rare – probably pretty close to criminal – if that happened in a large bank today. There are people that bypass the system, but in general I don’t run into any financial organizations that don’t have some form of pipeline.

But those same organizations have developers that stuff 145 million business credit card applications into an S3 bucket through no operational pattern. And so this movement of data ops asks: could we do a workflow for how we move data? We’ve been doing the data warehouse forever, but now, whenever a developer wants to get some data, there should be a process: how do you tokenize it? Does it get ETL’d? What’s the evidence when it’s sitting in an S3 bucket?

So imagine evidence all the way through your processing. Say you’re taking raw data: maybe there’s an ETL process, maybe you’re tokenizing, maybe it’s going through Kafka doing this, this, and this, and it winds up here. What if you were keeping this Kosli-style attestational evidence all the way through, so that when it goes to the S3 bucket you could reject it – just like you can reject a build, fail a build? Or even better, have a scanner scanning all the S3 buckets looking for any data that doesn’t have verifiable evidence metadata.
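That scanner idea is simple to sketch. The boto3 calls below are real S3 APIs, but the evidence tag name is hypothetical and a real system would verify the digest against an attestation store rather than just check that the tag exists.

```python
# Sketch of the scanner: walk S3 objects and flag any that lack a
# verifiable evidence tag. Tag name is hypothetical; requires boto3
# and credentials with read access to the bucket.
import boto3

def untracked_objects(bucket: str, evidence_tag: str = "evidence-sha") -> list:
    s3 = boto3.client("s3")
    missing = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            tags = s3.get_object_tagging(Bucket=bucket, Key=obj["Key"])
            tag_keys = {t["Key"] for t in tags["TagSet"]}
            if evidence_tag not in tag_keys:
                missing.append(obj["Key"])  # data with no provenance
    return missing
```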

Again, the two worlds have to mature and meet together, and I think the more conversations happen about data ops, the more it puts us – or anybody doing this model – in a better place, where that kind of evidence naturally happens. I’ve seen people talk about doing it for modeling, for example Monte Carlo modeling: what were the decisions that you made, and what’s the data that shows it? Once the model runs, it’s force majeure, right? Once it’s been trained it’s going to do what it does. Now, if it does really bad stuff, at least I can show evidence of the decisions we made when we were building the model.

This gentleman had a question. I know you, you gave a great presentation the other night!

Thanks! I was just thinking about the information within the data. The situation we’re in is that the regulations keep changing – everything changes, right? So even if we have this tokenization or verification of the data that you’re using, whatever that is in the architecture, if the regulations change, what are you going to do about it? That’s what I was thinking. Because if you don’t scan for it but you know where it is, that means you can go out and pick it out. So the GDPR regulations: OK, we can’t keep it for six months anymore, it’s only three.

If you get the meta, it will tell you everything.

Then you know where you have what so you can actually change on the spot.

Here’s the beauty part of that: it’s the same thing with software delivery, right? Remember I said earlier, the beauty of having that DSL associated as an artifact in the evidence chain is that if the requirements today are that you had to have this, this, and this, and in six months there’s some executive order where we realize you had to have that, that, and that, it’s point-in-time evidence, because the artifact is part of the evidence. So when you’re looking at that data or that record, the evidence says you only had to have this.

Well, it’s even more true with data. With data, you might have reclassified. I did some work with Nike – you want to talk about how interesting their data classification is in the cloud? You might not know who Michael Jordan is because you don’t follow American basketball, but the ownership of his data at Nike is cherished like a bank’s account data. So: data classification, but then how do you mature the meta around it? And I think that’s a great point – if the policy changes from six months to three months, and you have the meta tagged (which this model works really well for), then you could just scan around and say, “OK, we got rid of all the data that’s been sitting around for four months, which used to be fine for six months and now should be three.”
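Here is a sketch of that point-in-time idea: freeze the policy that applied when the evidence was created, so later rule changes are judged against the right version. All names and the retention numbers are illustrative, not any real compliance schema.

```python
# Sketch of "point in time" evidence: snapshot the policy that applied
# when the record was created. Retention sweeps use the *current*
# policy; audits of past handling use the frozen one. Names illustrative.
from datetime import datetime, timedelta

def make_evidence(data_id: str, policy: dict) -> dict:
    return {
        "data_id": data_id,
        "created": datetime.utcnow().isoformat(),
        "policy_at_creation": dict(policy),  # frozen with the evidence
    }

def over_retention(evidence: dict, current_policy: dict, now: datetime) -> bool:
    created = datetime.fromisoformat(evidence["created"])
    return now - created > timedelta(days=current_policy["retention_days"])

policy_v1 = {"retention_days": 180}  # six months
policy_v2 = {"retention_days": 90}   # regulation tightened to three
record = make_evidence("customer-42", policy_v1)
# When the rule changes, sweep with policy_v2 and delete what is overdue;
# the frozen policy_at_creation still proves past handling was compliant.
print(over_retention(record, policy_v2, datetime.utcnow()))
```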

I think just to add to all of this, I agree with everything that’s been said. But we know from the SRE book and from Google that 70% of system outages are due to changes in a live system. In the DevOps world we focus a lot on changes that are deployments, but there’s so much change that isn’t a deployment, right? There’s a database migration, or a user is provisioned, or somebody needs to fix up a record in a transaction, or something. It’s just so much more. But it’s the same thing, right? The data currently is siloed, ephemeral, disconnected.

And we talked about this the other day. What are the basics? I’ll just throw out the basic four – it’s probably five, maybe six – but what are the basic four about audit? (See the sketch after the list.)

  1. When did the change happen?
  2. Who did the change?
  3. Who approved the change?
  4. And then usually some variant of: was it successful, and was there a backup plan?
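Those four answers fit in one uniform record, whatever the kind of change. A minimal sketch – field names are illustrative, not any particular tool’s schema:

```python
# The "basic four" as a uniform change record -- the same shape whether
# the change is a deployment, a database migration, or a data move.
from dataclasses import dataclass

@dataclass(frozen=True)
class ChangeRecord:
    when: str           # 1. when did the change happen?
    who: str            # 2. who did the change?
    approved_by: str    # 3. who approved the change?
    successful: bool    # 4. was it successful...
    backup_plan: str    #    ...and was there a backup plan?

rec = ChangeRecord(
    when="2023-03-14T10:02:00Z",
    who="alice",
    approved_by="change-advisory",
    successful=True,
    backup_plan="restore snapshot db-2023-03-13",
)
```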

And that’s whether it’s data, whether it’s a software artifact, whether it’s configuration. And again, when the auditors come in and ask about this artifact – which is some library – we still, without something like Kosli or a solution like that, spend a lot of time grabbing a lot of stuff to prove it.

But when they ask us that same question about databases, I can tell you the answer is chaos. Because, one, we don’t use data ops as a model; and two, if we had data ops, we could actually be more aligned with giving the evidence of who made the change. Those should be standard in the delivery of any workflow, whether it’s an API, Monte Carlo modeling, data ops, or software delivery.

100% agree. But in the interest of getting through some of the other questions, we had a question from Axel which I think is quite interesting: where do you think the CISO should sit in the organization, both in terms of the formal setup and the activities, and where they do them from? That’s an interesting question.

I had a great conversation in the break, so I’ll give you my first uneducated answer. And it’s not too uneducated, because Mark Schwartz – another writer for IT Revolution who has written a bunch of books – one of his books is A Seat At The Table. It’s an interesting book that asks: are you really an IT company if your CIO doesn’t have a real seat at the table?

And actually, what I do when I go into large companies – not that I’m talking to the CEO; I do get to work with CIOs quite often, but not CEOs, I don’t dress well enough for that – the question I like to find the answer to is: where does your CIO sit today? Do they sit at the kiddies’ table or the big grown-up table? Because if they’re not at the grown-up table, I don’t care how much you tell the world you’re a data company or a software company – you’re not a software company. He makes that point really well. And then he says – and this might offend some people – that creating a chief data officer is basically an admission that you’re doing it terribly, because why isn’t that part of the CIO? Is data not part of information technology?

So my only point is – the John Willis answer is – call it CIO or whatever you want, but they all should be aligned. Why is security completely segregated? Compliance and risk is over here, the CISO is here, and the CIO is here – is security not information technology? Now, you pointed out that there are some requirements where they have to be firewalled, but then I go back to: John Willis doesn’t say get rid of the three lines of defense – I say we have to reframe the way we do things. So if I can’t change you structurally, I’m not going to get rid of the three lines of defense, but I am going to ask you, every time I talk to you, “Why isn’t the second line in design and requirements?” until you either tell me to get lost or you finally say, “OK, they’re going to start showing up, John.” So I think somewhere in there is how you solve the problem where it’s hardwired regulation: you work around it by reframing the mindset and the collaboration.

But I think it’s quite an interesting concept as well, because I know some banks, even in this room, whose second line doesn’t report internally – it reports to the board as an independent control function, which makes a lot of sense. But it’s interesting that you would treat information security as an external control function, rather than an internal cultural thing that you need to do.

Yeah, that’s part of the legacy of our companies. I’d say five years into the 10-year DevOps journey: oh my goodness, we forgot to bring security along. Our industry talks about bleeding-edge information. I’ve seen CTOs at banks like Citibank say “We need to be more like Google” on, like, the third slide. Fifteen slides later they have a slide that says, “Do more with less.” No, that’s not how Google does business! They don’t do more with less – they hire incredibly expensive people. When a person tries to leave Google for a startup, they basically add about $1,000,000 to their yearly salary. So they don’t do more with less.

I was really surprised by the IT budget at places like JPMorgan. It’s incredible how much money they spend – it’s more than Google.

A good friend of mine – I can’t say who it is – tells me that when you fire up a backup IBM mainframe, you immediately have to write a $1,000,000 check to IBM. And by the way, there are products like NetView – there are millions and millions of dollars that go into legacy budgets. But yes, the big banks – JPMorgan, Goldman Sachs – Goldman have been trying to figure out quantum for trading applications. They put an incredible amount of investment money into bleeding-edge tech. When I was at Docker, they were literally the first large financial institution going all in on figuring out how they could use containers for tier one trading applications. So they definitely do spend money.

Great, OK. So, another question from an anonymous source – we have a whistle here in the room! How do we overcome skepticism and resistance among non-tech stakeholders who can’t imagine life without a CAB? I have some opinions!

It all goes back to trust. There are actually a couple of really good books written by Scott Prugh, who is part of the Gene Kim tribe of people. There’s a methodical way to do it, and it all comes back to creating the trust model. It sounds simple, but it could be exactly what we’re talking about.

One of the things I had to take out of the slide deck because I couldn’t do it in 20 minutes: what got me really interested in working with Topo Pal – he was the first fellow at Capital One – is that back in 2017 he wrote a blog article about how they did their pipelines. It’s a great article, still out there and still very relevant – if you want it I can get you a link. He defined what he called 16 gates, and the idea was that they told the developers, “If you can provide evidence for these 16 things, and we can pick it up, you don’t have to go to the CAB.”
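The mechanics of that bargain are easy to sketch: if the pipeline has machine-readable proof for every required gate, the change auto-approves; otherwise it goes to the CAB. The gate names below are invented stand-ins, not Topo Pal’s actual 16.

```python
# Sketch of the "16 gates" bargain: evidence for every required control
# means the change skips the CAB. Gate names are invented stand-ins.
REQUIRED_GATES = {"unit-tests", "static-analysis", "sbom", "peer-review"}

def cab_required(evidence: dict) -> bool:
    """evidence maps gate name -> True if proof was collected."""
    satisfied = {gate for gate, ok in evidence.items() if ok}
    return not REQUIRED_GATES.issubset(satisfied)

evidence = {"unit-tests": True, "static-analysis": True,
            "sbom": True, "peer-review": True}
print("Go to CAB?", cab_required(evidence))  # False -> auto-approve
```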

So the first model is: the way you get rid of the CAB is trusted data, right? And there are ways to create trust. I heard somebody say recently that their auditors don’t want to hear anything about SHAs or anything like that. But what are those same auditors assuming when they ask questions about funds?

Because that tells you it’s all encrypted. And if it’s not, they’ve got way worse problems than worrying about what we do, you know? So it’s how you frame things. If you go to a second line and you talk about SHAs, and crypto, and how we use Vault to do this, you’re going to lose them. But if you explain it in a way that says, “The way we protect our system-of-record data, data like our banking information, is the same model we’re using here,” that reframes the conversation to, “Oh, I get it. Yeah, that makes sense.”

I think we’ve got a question in the audience.

There was a comment just before you started this about the trust model, because I’m thinking that is what’s important. If we skip the part about governance coming down and go back to DevOps, we need to have a little legitimacy. I think developers need to have a mandate, or they need to feel a legitimacy with the auditors or the ones controlling them, so that they can give away the data, give away the code – the 16-gates-of-trust kind of thing is really important.

And I have an example, if you want to hear it. I wrote a master’s thesis on the security police in Norway, because they had to do a complete reorg after the terror attacks we had on 22 July a few years back. And my question to them was: how do you trust an organizational change? What they did first was ask all the departments and department heads what they needed to work. And they all said more money, more people, and then I’ll fix it. And then they fired all of them – literally, they had to apply for their own jobs. So the solution was that they asked everybody who worked at the very root level of the organization: what do you need to work? And they said, “Well, I need to talk to my colleague more. I need to sit in the same room. We need to establish the value chains from the bottom and then up.” So they did that, and they did it all internally, without any external company auditing them. And it’s a completely different matter.

Don’t even get me started on Dr Deming, because we will not end the day. But probably one of the greatest research projects of the 21st century in our industry is Project Aristotle, by Google. They asked one question: how do you create great teams? And the single answer – although there was a ton of data; they talked to anthropologists, they talked to software engineers, they gathered an incredible wealth of material to figure out this question – was psychological safety.

And if you think about the umbrella of psychological safety, it includes everything you just talked about. Because if I’m a junior female worker in a corporation that’s been around for 30 years, that has a bunch of fat old men like me, can that person say, “I don’t think that’s going to work,” and not get, “You’ve only been here for a week! How would you know?!” In a psychologically safe organization, the response would be, “We need to take a look at that.”

So it’s easy to say we need to collaborate, but you can’t have collaboration until you take into account diversity and all these things you have to break down. And again, some of the best, strongest research that has ever happened in our industry comes out of something Google did. There are some really great resources for people who want to track psychological safety. I think it’s the number one thing.

I’ll get on my meta-horse: put me in front of the CEO and make me king for a day, where they’re forced to listen to me, and there are two things I would tell them they have to do systemically in that organization.

One is pervasive psychological safety throughout the whole company. And the second: I’d want them to pervasively create a systemic mindset around systems thinking. Those are the two things I would create, and I tell you, everything else would fall into place.

Well, John, you have literally been here for a week! And in the interests of creating some psychological safety, it’s time for a break. So we’re going to break for 10 minutes, I believe, and then we’ll come back with some more talks. See you soon.


