

I woke up in the middle of the night with this question grinding on me in a dream:

How to optimally allocate resources between operating security processes (i.e., applying patches) and process automation (i.e., automating the process of applying patches to minimize resource requirement).

Unknown parent

@risottobias it's like a whole field of econometrics for security resource planning is emerging before my eyes
in reply to Merry Jerry πŸŽ„πŸŽ…πŸ•Žβ›„οΈβ„οΈ

Always automate something repetitive. It's not necessary to do it straight away, but once the same process has been repeated 3 or 4 times manually, it should be clear how much it sucks the life out of a person, how error-prone it is, and that it isn't the best use of anyone's time. If we don't see that at first, that's okay; just don't put it off over and over, and don't pay much attention to the voice that says effort spent automating a task is somehow less valuable than the effort of doing a repetitive task manually over and over.
If it’s not a repetitive task, it can still be worth automating, especially if there are hundreds or thousands of steps that need to be done the same way with different parameters. It is after all why computers were invented.
The things we learn from creating automation are far more valuable than the things we learn from doing something manually over and over.
in reply to spmatich :blobcoffee:

@spmatich strongly disagree. The number of times I have seen people spend weeks automating something that takes 5 minutes is way too high. So is the number of systems that chose insane behavior that was easy to automate over correct behavior that was not, or product code bending over backwards to track things for "automated UI tests" that don't actually test the UI, etc.

Trying to boil this kind of design and UX tradeoff down to "always do X" is just mistaken, and the fact that it's sometimes more "fun" to do the automating than the manual part means people usually go too far the other way…

in reply to Billy O'Neal

@malwareminigun @spmatich Not everything's worth automating, but ppl tend to under- rather than over-automate.

As an SRE, a surprising amount of value comes from looking at what the devs still do by hand and going "Really?!" until they're embarrassed.

It's best if adding automation helps put simple, programmatic interfaces on everything. This not only helps with future automation, but general scripting.

If you don't have that kind of environment, it can be miserable and unprofitable.

in reply to Billy O'Neal

@malwareminigun @spmatich I guess it depends on your role & workplace.

In my previous workplace as a dev, there wasn't much to automate. In retrospect in a couple of places it'd have really helped, but mostly meh.

In the same place, infra was a joke. To get a VM you'd raise a ticket, and a human would create it for you. A waste of my time and theirs. Definitely worth automating.

OTOH, as an SRE on an exponentially growing product, it was automate or drown. And it helped reliability.

in reply to Simon Frankau

@sgf @malwareminigun Basically all a cloud provider is, is a collection of API endpoints driving automated create/update/destroy operations on infra. All of their large customers use tools like Terraform, and infrastructure as code is the rule. CI/CD pipelines are just automation stacks. As a dev, I find some of my most useful skills are the ability to automate large numbers of tests on hundreds of thousands of slightly different payloads, to replay those payloads in different environments to make sure the code works reliably, and then to zero in on the one-in-10,000 case where something blew up. So for me automation is not just about repeating something manual; it's about making stuff work reliably. It's an important part of defensive programming.
I'm not here to criticise what others are saying. I'm advocating for having automation skills if you want to be valued in our industry, and for valuing those skills in others.
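The "many slightly different payloads, then zero in on the one that blew up" workflow can be sketched as a small harness. Everything here is hypothetical (the `handler` function, the payload shapes); it only illustrates the pattern of generating variants, replaying them, and collecting the failing cases rather than stopping at the first one.

```python
import itertools

def handler(payload):
    # Hypothetical function under test: normalizes a payload.
    return {"user": payload["user"].lower(), "n": int(payload["n"])}

def generate_payloads():
    """Generate many slightly different payloads (a tiny stand-in
    for the hundreds-of-thousands case)."""
    users = ["Alice", "BOB", "carol"]
    ns = ["0", "1", "999999"]
    for user, n in itertools.product(users, ns):
        yield {"user": user, "n": n}

# Replay every variant; record failures instead of stopping, so the
# rare blow-up case can be isolated and re-run on its own afterwards.
failures = []
for i, payload in enumerate(generate_payloads()):
    try:
        out = handler(payload)
        assert out["n"] >= 0
    except Exception as exc:
        failures.append((i, payload, exc))
```

The same loop can be pointed at a different environment by swapping the `handler` (or the endpoint it calls) while keeping the payload generator fixed.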
in reply to spmatich :blobcoffee:

@spmatich @sgf Sure, and I'm not saying that I don't value automation. If you need to do the same thing with subtle differences in 1000 places, absolutely, automate that.

All I'm saying is that it's an engineering / judgement call. 'It's automated or it does not exist' leads to the aforementioned brittle and useless UI tests, or stupid and wrong behavior that the system has because that was automatable while the correct behavior was not.

An example of the latter is systems that reassign a bug back to the person who opened it when the bug is resolved. In 12 years I can probably count on one hand the number of times that was the right person to close the bug. But 'assign to the opener' is easy to automate, so a lot of systems do that.

in reply to Billy O'Neal

@malwareminigun @spmatich

Oh, wow, yeah, I don't think anyone here is advocating for "automatically do things that people don't want and wouldn't do manually".

On the other hand, what's the alternative to bad UI tests? I would assume it's either "don't do them at all, they're useless", or good automation, and that "test the UI manually every time" isn't a good option?

in reply to Simon Frankau

@sgf @spmatich in the example I ran into, they were worse than useless. The only example I’ve heard of doing UI tests vaguely well is the chromium folks, but for them a 1 pixel difference is actually potentially breaking, so they effectively use screenshots.

For most folks I think the correct approach is to get as much of the product out of β€˜UI only’ as possible and back that up with a small number of manual sanity / smoke tests before release.

That does mean that there are parts of the product which are hypothetically untested, but in practice the likelihood of everything else passing but some UI part just being randomly disconnected is low.

(Which is exactly what we do in github.com/microsoft/vcpkg-too… )

If your product exists primarily for the UI, like a game, god help you :(. Though I note those kinds of things tend to just have lots of manual testing. It sucks, but the alternative of flaky tests can be worse…

in reply to Billy O'Neal

@malwareminigun @spmatich Games are an interesting one. Yes, there's a shed-load of manual testing, but they also care about a) reproducing found bugs & b) regression tests.

I've heard that many modern games are built for testability. That is, their core is built around an explicit kind-of event loop into which events are funneled, and from which output is generated. User input, network input, timing information, all unified, and the game logic runs deterministically from that...
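The deterministic-core idea described here can be sketched in a few lines. This is an illustrative toy, not any real engine's API: all inputs (user, network, timing) are funneled in as uniform events, so a recorded log replays to the identical state.

```python
class GameCore:
    """Deterministic core: state evolves only through step(), so a
    recorded event log replays to exactly the same final state."""

    def __init__(self):
        self.x = 0        # toy player position
        self.ticks = 0    # timing, funneled in as events too

    def step(self, event):
        if event["kind"] == "tick":       # timing information
            self.ticks += 1
        elif event["kind"] == "move":     # user input
            self.x += event["dx"]
        return {"x": self.x, "ticks": self.ticks}

def replay(log):
    """Re-run a recorded event log; returns the final state."""
    core = GameCore()
    state = None
    for event in log:
        state = core.step(event)
    return state

log = [{"kind": "tick"}, {"kind": "move", "dx": 3}, {"kind": "tick"}]
assert replay(log) == replay(log)   # same log, same state, every time
```

Because the core never reads the clock or the input devices directly, a tester's session log doubles as a regression test, and diffing two replays pinpoints where behavior diverged.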

in reply to Simon Frankau

@sgf @spmatich You can test several kinds of game logic, but you aren't likely to be able to test for "there's a visual glitch in this part of that level when you do XYZ".

It's part of why games often ship in a state that is so buggy :(

in reply to Simon Frankau

@malwareminigun @spmatich

Testers run with event logging on & logs can be replayed. Obviously this doesn't help with bugs outside that core loop, or e.g. memory corruption, but it handles a lot.

As a side effect, it allows time-travel debuggers, makes diff tools possible etc. So, while it's quite an effort to build, there are significant benefits.

Manual testing is still needed. Although in this approach you might be able to pretend it's "recording test traces". :p

in reply to Simon Frankau

If it's possible to boil a game down to an 'event log' then sure, that's effectively the 'get as much out of the UI as you can', but I guarantee you that shaders are not going through event logs.
This entry was edited (4 months ago)
in reply to Simon Frankau

@sgf @spmatich In an SRE context I've seen both really good and really bad automation. Really good cases that caught real bugs, and really bad cases that caught no bugs and actually *caused* bugs resulting from the extra instrumentation added for it in the product.

The point with automated testing is, if the test is broken, the likelihood that there is a bug in the product needs to be high. Automated tests that check real expected input and salient output of a component are great, and in a lot of web scenarios are relatively easy to do. After all, the entire user experience is going to be through the straw that is request/response and lots of things are likely to be structured.

Automated tests that, for example, check which SQL query comes out of a product are garbage. They break in the face of legitimate changes, which makes folks question the legitimacy of every test failure. But in-memory databases are hard and automating "this exact SQL comes out" is easy, so a lot of the time people do that.
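The contrast between pinning the SQL text and checking the salient output can be shown with a toy fake database. The `fetch_active_users` function and `FakeDb` are hypothetical, invented for illustration.

```python
def fetch_active_users(db):
    # Hypothetical data-access function under test.
    return db.query("SELECT id FROM users WHERE active = 1 ORDER BY id")

class FakeDb:
    """Tiny in-memory stand-in that also records the SQL it receives."""
    def __init__(self, rows):
        self.rows = rows
        self.last_sql = None

    def query(self, sql):
        self.last_sql = sql
        return sorted(r["id"] for r in self.rows if r["active"])

db = FakeDb([{"id": 2, "active": True}, {"id": 1, "active": False}])
users = fetch_active_users(db)

# Brittle: pins the exact SQL text, so any legitimate rewrite
# (reformatting, a refactor to a JOIN) fails the test.
brittle_ok = db.last_sql == "SELECT id FROM users WHERE active = 1 ORDER BY id"

# Robust: asserts the salient output -- which users come back.
robust_ok = users == [2]
```

The brittle assertion passes today, but only by coincidence with the current implementation; the robust one keeps passing across any query rewrite that preserves behavior.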

in reply to Billy O'Neal

@sgf @spmatich Another example: there was a bug in an installer that caused a critical DLL to not get deployed. A manual smoke test would have caught 'the product does not start' immediately. But the automated test harness came with the same DLL which papered over the problem.
in reply to Billy O'Neal

@malwareminigun @spmatich Owww! That's messy.

I guess nowadays you have a better chance of catching such things by running the installer inside a VM and having all the test gear live outside, but still.

I find this a really tough call, because "just add a few manual tests" soon adds up. I'd rather the takeaway be "ok, let's get really good at automated testing" rather than "we have to rely on manual testing", but I can see people making the other call.

in reply to Merry Jerry πŸŽ„πŸŽ…πŸ•Žβ›„οΈβ„οΈ

Which is cheaper, and which reduces the largest risk exposures more effectively? Satisfying regulators has a bigger fiscal impact than stopping other types of threat actors.
in reply to Merry Jerry πŸŽ„πŸŽ…πŸ•Žβ›„οΈβ„οΈ

Automate, unless you have a compelling reason not to.

For example, last year we had basically 3 months to go out and replace our SIEM. That happened manually because there was no other way to make it happen. In the year-plus since, we have very nearly completely automated our deployment processes. If we had to do the same thing today it might take a week or two, and our goal is to get that down to a day by continuing to expand automation.

Also the primary benefit for us of automation is automated testing via pipelines that look for failures (preferably end to end testing). So our pitch is essentially that if you want that SOC available 24/7, you want to give us the resources to sink into automation so that we never deploy bad configs and disaster recovery is minutes instead of hours (it's literally hitting a play button in Gitlab). Side benefit: we basically never get 2AM calls from the SOC.

Unknown parent

ajn142
now granted, I’m arguing your example and not the entire class of problems your example represents.
Unknown parent

ajn142
@hotsoup that’s part of why I favor automating via a tool instead of building your own duct tape and baling wire solutions. SCCM/Intune/ManageEngine/etc., admins are semi-fungible, and any tool worth what you’re paying shouldn’t be breaking without constant maintenance.
in reply to Merry Jerry πŸŽ„πŸŽ…πŸ•Žβ›„οΈβ„οΈ

Let me drop another dimension into the discussion: Resource fluctuation with varying competence level. Process automation is essentially writing down the process in a closed language. This massively reduces the room for interpretation, which usually stems from the different ~competence~ experience levels.

To give a rather practical example: The opsec engineer, you could blindly trust to have everything patched, left the company to start their early farming retirement or for a competitor because the artificial head count limitation prevented you from reducing their workload. After half a year you finally get an entry level replacement fresh from college from HR for the same salary. When your process is written down (and accidentally also to be executed by a computer) it is waaaay faster and easier to bring the new person up to speed.

This is the reason why I massively like infrastructure as code. Yeah, we can also execute it and get our infrastructure. But I actually want to have it written down (at best even structured), so my forgetful brain and the new person can go back to the code and just read why we did it this way.

Additionally, it is way easier to identify which parts aren't DRY and which steps of the process are the bad or costly ones.

This entry was edited (4 months ago)
in reply to Merry Jerry πŸŽ„πŸŽ…πŸ•Žβ›„οΈβ„οΈ

xkcd.com/1205/, but that's assuming you want to break even in five years.
I guess the hard part is deciding this horizon :)
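The xkcd 1205 arithmetic is simple enough to write down directly: time saved per run, times runs over the horizon, gives the most you should spend automating. The function below is just that arithmetic, with the comic's five-year horizon as the default.

```python
def breakeven_hours(minutes_saved_per_run, runs_per_week, horizon_weeks=260):
    """Maximum automation effort (in hours) that still pays for itself
    over the horizon. xkcd 1205 assumes a 5-year (~260-week) horizon."""
    return minutes_saved_per_run * runs_per_week * horizon_weeks / 60

# Shaving 5 minutes off a task done every workday (5x/week):
budget = breakeven_hours(5, 5)   # ~108 hours over five years
```

Shrinking `horizon_weeks` is exactly the "deciding this horizon" problem: over 26 weeks instead of 260, the same task only justifies about 11 hours of automation work.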

Also, in real-life situations you're capacity-bound, and need something like:
- a way to modulate available capacity, e.g. temporarily adding people to the team
- accepting the risk of not running the manual process as often as you should, to free capacity, which may be dangerous

Very interesting problem space indeed :)

This entry was edited (4 months ago)
in reply to Merry Jerry πŸŽ„πŸŽ…πŸ•Žβ›„οΈβ„οΈ

Don't you have normal dreams like "I was making pancakes, but they were made of car tires and when I turned them over each one had the face of some kid I remember from 1st grade on it."
in reply to I Thought I Saw A 2

@ithoughtisawa2 the other dream I had last night which I can remember is being in the airport and a lady walking up to me and saying β€œdad, can you recommend a good restaurant for dinner with my friends?”
My reply was β€œI’m your dad? I have to talk to your mom about this. BTW, P.F. Chang’s is right off the train in terminal A.”

(Note, I do not have a daughter)

Other than that, my dreams are all being on planes that are crashing, having various parts of my house come alive and eat my family members, having to fix ungodly complicated machines, and so on. This one was one of the better dreams I have had as far back as I can remember.
