3 rules of thumb when it comes to automation

There is a race between a human and a robot (as seen below). Who is going to win this race? Well, we all know that it will be the robot. Now imagine the person is Usain Bolt. Who is going to win that race? Maybe Usain Bolt, the first time. Let's race again. Who will win now? Maybe Usain again. Let's race one last time. Now who will win? :) Usain does get tired, guys. Trust me, the robot will win.

New Picture

Very recently I ran a survey of big brands and agencies on the topic of DevOps and automation. What was very surprising was how they responded to a couple of questions related to automation.

First question: what is the most important initiative for them when it comes to DevOps? The answer from more than 90% was automation.

Second question: what have they seen not working effectively in their organisation? The answer from more than 90% was automation.

Now that makes you wonder why the most important initiative of the company is not working effectively.

After doing some more research on this and seeing how we are doing automation for some of the brands, here are my 3 rules of thumb when it comes to automation

  • First rule. Think of automation as automation with NoOps. Assume you have to automate with no operations team to support you, i.e. if you are automating build and deployment, then automate assuming there are no infrastructure people to execute any of the commands; if you are automating environment creation, then assume it will all happen with one click; if you are automating testing, then assume there will be no tester to execute the tests or validate the outputs. You know what I am saying. Think NoOps. This will make you think differently about every aspect of automation
  • Second rule. If there is anything you are doing twice, then consider automating it. If there is anything you are doing three times, then automate it. You cannot go wrong if you apply this rule
  • Third rule. Now the trap: even after doing the above, some people still do not succeed. This is because they approach automation very manually :P. As a rule, please make sure you use the modern tools which are available for automation rather than going ahead and creating those automation scripts by hand. If you do this by hand, then invariably the effort to maintain those scripts will outweigh the effort to do things manually in the first place. You can refer to all of the modern-day tools over here
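To make the second rule concrete, here is a minimal Python sketch. It reads the rule as "done twice means consider automating, done three times means automate". The task names and the manual_log list are made up purely for illustration.

```python
from collections import Counter


def automation_advice(times_done: int) -> str:
    """Map how often a task was done by hand to the second rule's advice."""
    if times_done >= 3:
        return "automate"
    if times_done >= 2:
        return "consider automating"
    return "keep manual for now"


# Hypothetical log of tasks performed by hand (illustrative names only).
manual_log = ["deploy", "deploy", "deploy", "create-env", "create-env", "rotate-logs"]

# Count repetitions per task and attach the advice to each one.
advice = {task: automation_advice(count) for task, count in Counter(manual_log).items()}
print(advice)
```

Even a simple tally like this, kept for a few weeks, tends to show where the automation effort should go first.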

Since I am fond of car analogies, Google’s self-driving car is a great example of how all of the above rules are applied.

The car works without anyone sitting in the driver’s seat (NoOps), and it makes use of all the modern tools and sensors to do so. Imagine the accuracy of automation needed to make sure it still works in traffic, in a tunnel, without a GPS signal, with road closures and the like

I would love to hear your thoughts if you have come across anything else in this space. Automation with NoOps is one of my passions. Is it one of yours?


How to build a platform which will survive Black Friday

I still remember George Colony from Forrester saying, “Throughput and processing integrity will not be the key considerations; the magic will center on overall customer experience”. Is he right? Let’s see

As you know, online retailers struggled to meet the demand of Black Friday in the UK (28th Nov 2014). The websites of John Lewis, Tesco, Boots, Currys, Argos and GAME, along with many others, crashed or struggled within a few hours of the peak. Some of them came back, while for others the problems persisted almost until noon. Currys even went to the extent of creating queues, with waiting times of hours before users were allowed in. In the “Age of the customer”, where a retailer’s conversion drops even if the response time for pages increases by a couple of seconds, it is odd that users have to wait for hours.

All of these websites have a great customer experience (yes, there is always room for improvement), but maybe their lack of enterprise architecture and stability was exposed.

This is where, I believe, we need to challenge George. I believe the statement should be “Throughput and processing integrity will remain the key considerations; the magic will center on overall customer experience”. This magic is enabled by a combination of enterprise technology and creative technology

Here are some best practices and key considerations for making sure you can scale for peak

  • Always use marketing levers to control the traffic to the site. For example, there is no need to increase paid search in the first half hour of the sale when you are going to get traffic anyway from word of mouth and the hype created around Black Friday
  • Isolate your front end, i.e. websites, from the legacy backend. Most places will have modern digital platforms connecting to legacy (sometimes mainframe) systems. These legacy systems do not scale, and thus they create a bottleneck on how many orders your front-end systems can take. You should be in a state where you can take orders and batch them even when no legacy systems, including payments, are working
  • Use your caching strategy cleverly. Please refer to the details on caching strategy. Make sure your cache TTLs (time to live) are set as high as is practical for the business scenario. You will need a different TTL strategy for peak days as compared to BAU. This will make sure your servers are processing only the things they need to process
  • Make sure you have provision for dynamic bursting for your servers. Your architecture should scale horizontally. In case user load exceeds your projection, you should be able to burst into the cloud or your test environments to make full use of the environments
  • Kill switch for functionalities. You should be able to disable functionalities one at a time so you can create as much room as possible for the things which matter
  • Most important thing: monitor, monitor, monitor. Make sure you monitor real user timings and figure out if there is any impact on the end user. You need some RUM tools for this. Worst case, at least rely on synthetic user testing, as it will provide some good insights. Monitoring your log files for 4XX errors is also not a bad idea
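Two of the levers above, peak-day TTLs and kill switches, are easy to picture in code. The following is only a sketch under my own assumptions: the page types, TTL values and feature names are hypothetical, and a real platform would keep these settings in its CDN or configuration system rather than in a Python module.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical TTLs in seconds; real values depend on the business scenario.
TTL_SECONDS = {
    "bau": {"product_page": 300, "category_page": 600},
    "peak": {"product_page": 1800, "category_page": 3600},
}


def cache_ttl(page_type: str, peak_day: bool) -> int:
    """Pick the TTL from the peak or BAU profile for a page type."""
    profile = "peak" if peak_day else "bau"
    return TTL_SECONDS[profile][page_type]


@dataclass
class KillSwitches:
    """Tracks which functionalities have been switched off to free up capacity."""
    disabled: Set[str] = field(default_factory=set)

    def disable(self, feature: str) -> None:
        self.disabled.add(feature)

    def is_enabled(self, feature: str) -> bool:
        return feature not in self.disabled


switches = KillSwitches()
switches.disable("recommendations")  # switch off one functionality at a time

print(cache_ttl("product_page", peak_day=True))
print(switches.is_enabled("recommendations"), switches.is_enabled("checkout"))
```

The point of the design is that both levers are data, not code changes, so they can be pulled in the middle of a peak without a deployment.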

In summary, you have a lot of levers, and with a good, well-thought-through enterprise architecture you can have good peak days and, most importantly, happy customers

Please let me know if you have found anything else which has worked for you

Smaller frequent releases

 

I have often seen that large organisations have such heavy processes that they go for monthly or quarterly releases or, even worse, they release every 6 months. This means they are doing more in each release. Because of this, they keep accumulating risk rather than splitting the risk over many small changes

If you are not finding time to complete your definition of ready or definition of done with simple things like code reviews, design reviews, demos, unit testing and the like, then it is a symptom that you are not only making lots of changes in a release but also trying to run multiple releases in parallel

You will be surprised that just by shortening the release window and reducing the work in parallel, you can smooth out so many of those processes

release

The advantages of shortening the release cycle and reducing the work in parallel are simple

  • As you are doing less in a given release, you are less worried about regression cycles
  • You have a laser-focused team that works on propagating changes to production and takes pride in seeing things working for end users
  • Doing an MVT for a new change is much easier if we are dealing with 2 changes as opposed to 100 changes

One of the biggest capabilities you need for frequent releases is the ability to deploy to production faster. Your deployment needs to be at least 90% automated to go in this direction. The other thing to invest in is testing automation, but that is not mandatory, as you are minimising regression risk by making small changes
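One simple way to check the 90% figure is to list your deployment steps and mark which ones run without a human. This is a rough sketch: the step names are hypothetical, and counting steps equally is my own simplification (you may prefer to weight them by effort).

```python
# Hypothetical deployment steps; True means the step runs without a human.
deployment_steps = {
    "build artefact": True,
    "run unit tests": True,
    "publish artefact": True,
    "provision environment": True,
    "apply configuration": True,
    "migrate database": True,
    "deploy application": True,
    "run smoke tests": True,
    "switch traffic": True,
    "sign off release": False,  # still a manual approval in this example
}


def automation_ratio(steps: dict) -> float:
    """Share of deployment steps that are automated, between 0 and 1."""
    return sum(steps.values()) / len(steps)


def ready_for_frequent_releases(steps: dict, threshold: float = 0.9) -> bool:
    """Apply the 'at least 90% automated' rule of thumb."""
    return automation_ratio(steps) >= threshold


print(automation_ratio(deployment_steps), ready_for_frequent_releases(deployment_steps))
```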

Please share your thoughts on your experience in this space

Engineering the “Delivery model”

Whenever we engineer any delivery team, process or structure, we need to measure success by two factors: the outcomes they have achieved and the habits they exhibit. Let’s elaborate on what I mean by both Outcome and Habits.

Outcome

When engineering the delivery model for your team, the following three factors should be considered as outcomes

outcomes

 

Habits – 7 habits of successful delivery teams

I have covered the 7 habits of highly ineffective enterprises, so we should certainly avoid those. Also refer to embracing change and being aware of interpretations

These are the 7 habits of highly successful delivery teams that we should embed in our teams

  • Work smart – Always think of innovative ways to reduce waste, and focus on automating anything which you do more than 2 times
  • Focus on quality – Focus on the smooth flow of work and on delivering high-quality code. Never compromise on doneness at each step of the workflow. But please note that the focus is on the Minimum Viable Product (MVP) rather than trying to go for a Rolls-Royce
  • Self-organising – Make your own decisions. Rely on coaching from leaders rather than direct answers
  • Finishing work – Take pride in finishing work rather than working on multiple items in parallel
  • Openness – Be open, honest and direct to build trust. This is the way you and your team will improve
  • Collaborative – Focus on the team’s performance rather than individual performance
  • Data driven – Rely on data over opinion or subjective analysis

Engineering the delivery model

Let’s look at the different phases in the delivery model: what they are, what the recommended practices are, and the impact on other teams in the enterprise organisation

sprints

The 4 common phases we need to understand are Define, Deliver, Acceptance and Cutover (or Go live)

Define

“Design is a complex, iterative process. Initial design solutions are usually wrong, and certainly not optimal”
  • The deliverable is scoped out and designed at the Define stage. The process begins with a workshop that brings together key stakeholders, including software developers, the operations team, testers, business analysts, product owners, project managers and architects
  • This is where we capture the primary business driver for the change. This is broadly called an EPIC as per the Scaled Agile Framework model (the relationship is EPIC -> Features -> Stories)
  • All documents produced for a product will be cumulative, i.e. existing product documents will be enhanced to incorporate these new requirements. For new projects this would be a brand new set of documents.
  • The goal is to define features for each EPIC in this phase
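The EPIC -> Features -> Stories relationship mentioned above can be pictured as a small data model. This is just an illustrative sketch: the class and field names are my own, not something prescribed by the Scaled Agile Framework, and the example epic is invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Story:
    title: str
    acceptance_criteria: List[str] = field(default_factory=list)


@dataclass
class Feature:
    name: str
    stories: List[Story] = field(default_factory=list)


@dataclass
class Epic:
    business_driver: str  # the primary business driver captured in Define
    features: List[Feature] = field(default_factory=list)

    def story_count(self) -> int:
        """Total number of stories across all features of this epic."""
        return sum(len(feature.stories) for feature in self.features)


# Hypothetical epic: Define produces the features, stories follow in the sprints.
epic = Epic(
    business_driver="Let customers check out as guests",
    features=[
        Feature("Guest checkout form", [Story("Enter delivery address", ["Postcode is validated"])]),
        Feature("Order confirmation email"),
    ],
)
print(len(epic.features), epic.story_count())
```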

Deliverables

  • Technical approach detailing impact to high level design and other systems
  • Features in alignment with the Minimum Viable Product (MVP) for a release
  • Testing approach, if there is any change from normal
  • Business value
  • High level estimates/points
  • Dependencies with other product teams

Impacts to other teams

Environment teams – Plan a new environment if needed, depending on the non-functional requirements

Operations team – Define any operational acceptance test scenarios and come up with a list of alerts which need to be set up

Security – Identify involvement in future phases, including the need for any additional testing in the hardening phase

Performance – Identify the performance impact and work towards mitigating it


Delivery

This phase consists of the majority of the activity necessary to produce an outcome. The input to this phase is the output of the Define phase. The phase is executed as a series of sprints. Each sprint builds upon the previous one, allowing work to be validated and progressed. The top features of the backlog are picked up and executed. The definition of the MVP may/will/should change as part of this phase as you learn more from user feedback

Sprints can be conducted using the Scrum methodology and follow Scrum ceremonies like sprint planning, daily stand-ups, sprint retrospectives, grooming and reviews. Multiple teams can be folded into a scrum of scrums for managing cross-product team dependencies

There should be a common notion of “done”. The “definition of done” is a transparent and mutually agreed list of criteria that must be adhered to for the stories to be marked as complete

All major testing, including integration, component and system tests, is part of this phase. Acceptance tests (please see below) and non-functional testing should be started in this phase in a staggered manner, with the aim of completing as much as possible in this phase. E.g. the acceptance tests for sprint 1 must be complete by sprint 2, those for sprint 2 by sprint 3, and so on. The last sprint is the only one which should remain to be covered in the acceptance phase for the release
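The staggered pattern is easier to see when written out. Here is a minimal sketch that builds the schedule for a release with a given number of sprints; the function name and the labels are my own.

```python
def acceptance_schedule(sprint_count: int) -> dict:
    """Map each sprint to where its acceptance tests should be completed."""
    schedule = {}
    for sprint in range(1, sprint_count + 1):
        if sprint < sprint_count:
            # Acceptance tests for sprint n are completed during sprint n + 1.
            schedule[f"Sprint {sprint}"] = f"Sprint {sprint + 1}"
        else:
            # Only the last sprint spills over into the acceptance phase.
            schedule[f"Sprint {sprint}"] = "Acceptance phase"
    return schedule


for built_in, accepted_in in acceptance_schedule(4).items():
    print(f"{built_in}: acceptance tests complete by {accepted_in}")
```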

Epics are converted into stories with acceptance criteria, and then stories are further broken down into tasks. Each task should be as measurable as possible. For example, if it is code then we should measure code quality metrics, code coverage, unit test execution and the like

Also, the definition of done should be broken down into several “dones” so it is easy to monitor progress
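Breaking "done" into several smaller dones could look like the sketch below. The stage names are hypothetical examples; your team would agree its own list.

```python
from typing import List, Set

# Hypothetical "dones"; each team agrees its own list.
DONE_STAGES: List[str] = [
    "code reviewed",
    "unit tests pass",
    "acceptance criteria met",
    "documentation updated",
]


def progress(completed: Set[str]) -> str:
    """Report how many of the agreed dones a story has reached."""
    reached = [stage for stage in DONE_STAGES if stage in completed]
    return f"{len(reached)}/{len(DONE_STAGES)} done"


def is_done(completed: Set[str]) -> bool:
    """A story is complete only when every agreed done is ticked off."""
    return all(stage in completed for stage in DONE_STAGES)


story_state = {"code reviewed", "unit tests pass"}
print(progress(story_state), is_done(story_state))
```

Because each done is a yes/no check, progress can be reported as a simple count instead of a gut feeling.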

Team

In a multiple-location scenario, communication is one of the most important things to take into account. Talk, talk, talk. You can never communicate too much. Avoid too many emails and make use of the phone, Skype or Google Hangouts

In a multiple-vendor scenario, make sure you create an environment where they work as one team. You really don’t want people to have their own agendas, otherwise you will drift from the main objective.

Deliverables

  • Convert Features into Stories with acceptance criteria
  • Updated functional specification documentation and wireframes/VD. This is quite important for multiple-location teams and, more importantly, it will also serve as your knowledge base for support teams and for sprint teams doing future development
  • Updated HLD or equivalent technical documentation
  • Updated automated testing scripts
  • Working code

Impacts to other teams

Environment team – Any new environment or component needed will be provisioned

Operations team – Operational service documentation updated

Security – Updated tools or test scripts. Assessment completed to confirm that the release is fit for purpose and will not damage the brand

“The best programmers are up to 28 times better than the worst programmers” 

Acceptance

This phase accepts the MVP that comes from the ‘Delivery In Sprints’ phase. The product goes through rigorous testing by users and Ops. It is also tested to ensure that it complies with non-functional requirements. Typical tests done in this phase include the spillover acceptance tests and non-functional tests (e.g. volume and performance, availability etc.) for the stories completed in the last sprints.

By absorbing the majority of tests into the preceding ‘Delivery In Sprints’, the process is accelerated. It can be further sped up through automation. In advanced teams, approval can be automatic, further shortening the cycle.

End users and BAs must be involved in the acceptance/regression testing so that the complete E2E business process is executed

Volume and performance testing is conducted on the E2E flow to understand any implications

A high degree of automation is essential to ensure that the application moves through this phase very quickly.

Typically this phase should not take more than 1-2 weeks


 

Cutover or go live

In this phase, the hardened application moves into the staging environment. The environment is prepared with data from production. The solution is ‘sanity tested’ in the staging environment and certified, mostly by the operations team. The existing sanity test suite is enhanced to include the new features. In mature organisations, this phase can be completely automated, although a small number of random checks will be carried out before it is cut over to LIVE

If any rollback is needed, then this is also practised in this environment
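An automated version of this gate could be as simple as the sketch below: run every sanity check against staging and only go live if all of them pass. The check names are hypothetical, and the lambdas stand in for real checks (HTTP calls, database queries and so on).

```python
from typing import Callable, Dict


def run_sanity_suite(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run every sanity check and record whether it passed."""
    return {name: check() for name, check in checks.items()}


def cutover_decision(results: Dict[str, bool]) -> str:
    """Go live only if every sanity check passed; otherwise roll back."""
    return "go live" if all(results.values()) else "roll back"


# Hypothetical checks; stand-ins for real probes of the staging environment.
staging_checks = {
    "home page loads": lambda: True,
    "basket accepts an item": lambda: True,
    "new feature is reachable": lambda: False,  # simulated failure
}

results = run_sanity_suite(staging_checks)
print(results, cutover_decision(results))
```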

Impacts to other teams

Environment teams – Start setting up the production environment for the release

Operations teams – Get ready for the release by setting up new monitoring and alerts, configuring existing tools etc.


Optimisations

soptimisations

Define: The Define phase should be based on the principle of “good enough”. Try and cover as much as possible in the sprints

Deliver: In order to get the most benefit from agility, it is imperative to do the majority of the work in increments

Acceptance: The Acceptance and Cutover phases combined should take less than a couple of days in extremely advanced large organisations. The rule is “If it can be automated then it should be automated”

 

 

 

 

Survey: Large organisations want to embrace continuous delivery

Technology leaders from big brands and agencies took this survey, and the results were very interesting

100% of the leaders working for these organisations want to adopt DevOps and continuous delivery. So, as we always used to say, it is not a question of “Why” but of “How”

The best definition of DevOps and continuous delivery was:

“DevOps is about getting everyone in the organisation (particularly Development and IT Operations) pulling in the same direction by (a) setting common goals and priorities, (b) sharing the responsibility for success and failure across the organisation, and (c) constantly improving communication channels, processes, infrastructure and tools to remove bottlenecks and allow a consistent flow of value to the business’ customers. Continuous delivery is the automation of the software development process from code construction through live production deployment. Simply stated, both DevOps and Continuous Delivery are about reducing cycle time, improving product quality and productivity – bring working, high quality software to production faster”

Other learnings from the subjective questions are documented as part of the Engineering the delivery model posts

This picture summarises the key statistical findings

survey

 

 

 

Continuous delivery tools landscape

Since almost one tool is released every day in this space, it becomes impossible to keep up to date with the tools, and there is a fear that really good, tried and tested tools get lost among the newcomers.

The objective of this infographic is to show the different categories of these tools and the most common tools in each category which are used by enterprise organizations and are thus tried and tested.

If you are working in such an organization and have discovered a new tool which you are using and which is working well, then please do post a comment so I can update the infographic to include it.

Description of each category (mostly from Wikipedia)

Code repository – A repository is a term used by most of the different source control tools to refer to the collection of source code. A source control instance can have many repositories. Usually a repository contains a project, or a group of projects that are closely related. Distinct projects would be a good example where you’d want to make use of multiple repositories.

Continuous integration – Continuous Integration is a software development practice where members of a team integrate their work frequently.

Static code analysis – Static program analysis is the analysis of computer software that is performed without actually executing programs.

Code coverage – In computer science, code coverage is a measure used to describe the degree to which the source code of a program is tested by a particular test suite.

Configuration management – In software engineering, software configuration management is the task of tracking and controlling changes in the software, part of the larger cross-discipline field of configuration management. SCM practices include revision control and the establishment of baselines. If something goes wrong, SCM can determine what was changed and who changed it. If a configuration is working well, SCM can determine how to replicate it across many hosts
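To illustrate the "determine what was changed" part, here is a small sketch that compares a known-good baseline with the current configuration. The configuration keys and values are made up, and real SCM tools work on versioned files rather than in-memory dictionaries.

```python
MISSING = "<absent>"  # marker for a key that exists on only one side


def config_diff(baseline: dict, current: dict) -> dict:
    """Return {key: (baseline value, current value)} for every key that differs."""
    changes = {}
    for key in sorted(set(baseline) | set(current)):
        old = baseline.get(key, MISSING)
        new = current.get(key, MISSING)
        if old != new:
            changes[key] = (old, new)
    return changes


# Hypothetical configurations for illustration only.
baseline = {"max_connections": 200, "cache_ttl": 300, "debug": False}
current = {"max_connections": 500, "cache_ttl": 300, "log_level": "info"}

print(config_diff(baseline, current))
```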

Application lifecycle management – Includes change management and issue management

Monitoring – Operational monitoring of the application, providing operational intelligence

Application release automation – Includes application release and deployment

Testing automation – In software testing, test automation is the use of special software to control the execution of tests and the comparison of actual outcomes with predicted outcomes
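At its core this is nothing more than running the code and comparing the actual outcome with the predicted one. The sketch below shows that loop without any testing framework; basket_total is a made-up function under test, and in practice you would use one of the tools from the infographic instead of writing your own runner.

```python
def run_cases(func, cases):
    """Execute func for each (args, expected) pair and collect any mismatches."""
    failures = []
    for args, expected in cases:
        actual = func(*args)
        if actual != expected:
            failures.append((args, expected, actual))
    return failures


def basket_total(prices, discount_percent):
    """Hypothetical function under test: total price after a percentage discount."""
    return round(sum(prices) * (100 - discount_percent) / 100, 2)


cases = [
    (([10.0, 5.0], 0), 15.0),   # no discount
    (([10.0, 5.0], 10), 13.5),  # 10% off
]

failures = run_cases(basket_total, cases)
print("all passed" if not failures else failures)
```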

 

Maturity model

A maturity model helps you to see the bigger picture and assess where you are as an organization within the DevOps and Continuous Delivery space, so that it is easier to assess how much more ground you need to cover.

It can also be used as a reporting tool as part of the process, to understand whether improvements are being made overall in the transformation.

There are multiple DevOps maturity models which are already defined, and the goal is certainly not to repeat them here. My goal is to provide a more objective assessment and help in creating an action plan in a more interactive fashion.

Some existing DevOps Maturity models include:

Based on my personal experience and research, for large enterprises there are 10 key factors to measure, and most of the action plan will revolve around these 10 factors.

Please download the maturity model and select your level in Column D for each of the 10 factors. This will give you your score. Based on the current state, you can then select the areas of your organization which you want to include in the action plan.
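If you prefer to see the idea in code, the sketch below mimics the exercise. It is not the formula used in the downloadable model: the factor names are placeholders, the 1 to 5 scale and the averaging are my assumptions for illustration.

```python
def maturity_score(levels: dict, max_level: int = 5) -> float:
    """Overall score as a percentage of the maximum possible level (assumed scale)."""
    return 100 * sum(levels.values()) / (max_level * len(levels))


def action_plan_candidates(levels: dict, count: int = 3) -> list:
    """Pick the lowest-scoring factors as candidates for the action plan."""
    return sorted(levels, key=levels.get)[:count]


# Placeholder factor names and self-assessed levels (1 = lowest, 5 = highest).
levels = {
    "Factor 1": 3, "Factor 2": 2, "Factor 3": 4, "Factor 4": 1, "Factor 5": 3,
    "Factor 6": 2, "Factor 7": 5, "Factor 8": 3, "Factor 9": 2, "Factor 10": 4,
}

print(maturity_score(levels), action_plan_candidates(levels))
```

Repeating the exercise every quarter gives you a simple trend line for reporting on the transformation.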

Please do not hesitate to get in touch if I can be of any assistance in this small exercise.

Maturity model 1 3