Automated Solutions

Automation Engineer to the Core!

"Test Automation is the key to dependable mobile and web applications. I make it my business to have computers work for us and not the other way around."

Understanding State with Elixir Agents

A few blog posts back I mentioned how difficult managing state can be while programming. Next-generation programming languages are starting to build in standardized approaches that solve this state-change problem more elegantly. I've found myself writing more and more Elixir lately, and I am sold on the concept of Agents. I get to write waaaay less code to take an object or a piece of data through different states, without the hassle of figuring out the best way to do it. This blog post shows a quick example of what I mean.

One of the most common examples of something that changes state all the time is the bank account! A bank account's state can change for better or for worse. Modeling a bank account in Elixir is a great way to show how simple this state management process is, and I will use the Agent module to illustrate it.

For a basic bank account we are going to need five functions:

  1. open - Opens the bank account
  2. close - Closes the bank account
  3. withdraw - Takes money from bank account
  4. deposit - Adds money to bank account
  5. balance - Gives current balance of bank account

In an effort to not get too complicated I'll start with these. 

- Start with a module named BankAccount.

- The open function calls Agent.start/2, giving it a function that returns the starting value of 0, as well as the name of the current module.

I always like to give my processes names because I don't like manually passing a PID around to every function I want to use. I am, after all, an automation engineer, so wherever I can find a shortcut, I will!! Let's write the close function!

- Print a message to let the user know the account is being closed.

- Then call Agent.stop/3, which terminates a running agent. __MODULE__ references the current module, which in this case is BankAccount. The second parameter, :normal, is the exit reason the agent shuts down with. Because nothing is blowing up, and the close is requested by the user of the module, :normal is appropriate. The last parameter defaults to :infinity and doesn't need to be there, but I listed it anyway to make clear which timeout I want.

It's always necessary to have a way to gracefully stop a process, and that's the whole point of the function above. In this case, a user might need to close their account for whatever reason, and this is a quick way to do it. Let's actually increase our net worth by building our deposit function.

- deposit takes one argument called amount.

- We then call Agent.update/3 and tell the BankAccount agent to replace its current state with the current state plus the given amount.

I absolutely love function guards in Elixir. They save me from having to put too much logic in the function body to guard against bad parameters. In this case, we want to make sure the amount given is an integer. It's also important to note that it doesn't make a whole lot of sense to make a deposit of less than $1, so it's smart to add another guard against bogus zero or negative amounts. In a perfect world we would be done! Our bank accounts would just increase in value over time given the way our BankAccount currently works, but unfortunately, we have to provide a way for users to take money out of our account (BILLS, BILLS, BILLS). Let us painfully write this function next.

- Once again we pass an amount, but we need to make sure it's at least 1; it doesn't make sense to allow any value below 1.

- Again we call Agent.update/3, this time replacing the current state with the current state minus the amount given by the caller.

In my opinion, this is a good use case for Elixir's if macro. I am aware that you could use pattern matching to clean this function up, but I usually only resort to pattern matching when I expect a program to make more than two possible decisions. In this case, you either have enough money in the bank to make the withdrawal, or you don't. If you don't have enough money, I want to raise an exception. This exception is important because if we ever wanted to add a supervisor for this BankAccount process, the reason for a sudden failure of the BankAccount would be made absolutely clear in the log output. Now, we don't want to leave the balance of our account a mystery, so we need a way to display it. Let's finish up this module with the final function.

- Once again we rely on Agent.get/3 to read the current state of the BankAccount agent and bind that value to current_balance.

- The last step is to take the current balance and display it to the user
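The original post showed each function as a screenshot, so here is a minimal sketch of the whole module reconstructed from the descriptions above. It assumes the guards are is_integer(amount) and amount >= 1, and the messages printed by close and balance are my own placeholder wording; treat it as an approximation, not the exact original code.

```elixir
defmodule BankAccount do
  # Minimal sketch reconstructed from the descriptions above. The agent is
  # registered under the module name, so callers never pass a pid around.

  # Start an agent whose initial state (the balance) is 0.
  def open do
    Agent.start(fn -> 0 end, name: __MODULE__)
  end

  # Stop the agent with the :normal reason and an explicit :infinity timeout.
  # The printed message is placeholder wording.
  def close do
    IO.puts("Closing your account...")
    Agent.stop(__MODULE__, :normal, :infinity)
  end

  # Guards reject non-integers and amounts below 1.
  def deposit(amount) when is_integer(amount) and amount >= 1 do
    Agent.update(__MODULE__, fn balance -> balance + amount end)
  end

  def withdraw(amount) when is_integer(amount) and amount >= 1 do
    # The balance check and the update are two separate agent calls; with
    # concurrent callers, Agent.get_and_update/3 would make them atomic.
    if amount <= Agent.get(__MODULE__, fn balance -> balance end) do
      Agent.update(__MODULE__, fn balance -> balance - amount end)
    else
      raise ArgumentError, "insufficient funds to withdraw $#{amount}"
    end
  end

  # Read the state, show it to the user, and return it.
  def balance do
    current_balance = Agent.get(__MODULE__, fn balance -> balance end)
    IO.puts("Your current balance is $#{current_balance}")
    current_balance
  end
end
```

In IEx, calling BankAccount.open(), then BankAccount.deposit(100), BankAccount.withdraw(30), and BankAccount.balance() should print "Your current balance is $70" and return 70. A withdrawal larger than the balance raises ArgumentError, and deposit(0) fails its guard with a FunctionClauseError.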

Bill Gates utilizing this module would look something like this....

- Module interaction

The output should look like this below:

-BankAccount output

This is pretty cool because the agent tracks the state on its own. All you have to do is give it values. When I first understood the real reasons for Agents, I began to see many use cases for them, especially in QA automation. State can now be managed independently of normal program flow, which is HUGE! In languages that don't have state management carefully designed into them, it is much harder to scale a program without major bugs and issues cropping up that you had no idea were present. Both Elixir and Clojure sidestep that nightmare by providing a standard way to handle state manipulation through agents. This way, you don't have to write tons of code to track and manage state yourself, or expose details of how and why a certain state gets updated to too many other objects. This is a huge win for programming, and it makes programming much more pleasurable than it used to be.

 

 

How We Evolved our QA TestScripts into a Distributed TestService with Elixir!

I joined the Elixir bandwagon toward the end of last year, and now I have a personal mission to use this technology to advance QA testing into its much-needed next phase. As I write this, I believe Clojure and Elixir are the only two development ecosystems in a position to advance QA automation effectively. In this post I'm going to explain how I've set up our system architecture at Spongecell, using Elixir, Robot Framework, and a little imagination, to build a robust test service known as DUEY.

There are 5 main parts that make up the full test service. Each part has a specific responsibility, and the cool part is that users of the test service only need to know which commands to send to it. That is it. This way there is no confusion and no extra training, and each part can be updated independently as our product changes and as the ad tech industry evolves. Updates can happen without the user even being aware of them. A visual representation of this architecture is presented below.

Test Service Architecture

(A quick rant about what a godsend Elixir is, thanks to José Valim.) Our test service needed to be usable by anyone or anything in the company. This was a huge requirement, and it could not have been done without Elixir/Erlang's BEAM environment. With these technologies we have been able to distribute our test automation tools to anyone, regardless of their office location or time zone. In addition, because this service is exposed to so many people and offices, it's bound to get a bad request from someone or something, but thanks to the BEAM's fault-tolerance features, our service restarts and lets the user know that they probably need to choose from a list of appropriate requests.

Here is how the architecture breaks down!

  • The User

As I said before, the user can be anyone or anything: a Jenkins server, a QA tester, a developer, or any random company employee. They decide what they want to test and send that request to the test service, which is always up and waiting for commands from interested parties.

  • The Test Service

The test service, built entirely in Elixir using the GenServer and GenEvent behaviors, is designed to listen for many types of requests, from mobile automation all the way to API and web integration test requests. Once it gets a valid request that it knows how to handle, it begins to carry out that action for the user. To do that, many things need to happen after the request, involving the other pieces of the architecture. The user never even realizes that these things are happening, nor should they.
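The post doesn't include the service's code, so the following is only a hypothetical sketch of the request-handling idea: a named GenServer that answers commands it knows and, for anything else, replies with the list of valid commands. The module name TestService and the :ping and :check_ad_tag commands are illustrative assumptions, and the real DUEY also relied on supervision to restart after bad requests, which this sketch leaves out.

```elixir
defmodule TestService do
  # Hypothetical sketch of a DUEY-style request handler. Command names are
  # illustrative; a real service would hand work off to the test libraries.
  use GenServer

  @commands [:ping, :check_ad_tag]

  # Register under the module name so callers never pass a pid around.
  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, Keyword.put_new(opts, :name, __MODULE__))
  end

  def request(command, args \\ []), do: GenServer.call(__MODULE__, {command, args})

  @impl true
  def init(:ok), do: {:ok, %{handled: 0}}

  @impl true
  def handle_call({:ping, _args}, _from, state) do
    {:reply, {:ok, :pong}, %{state | handled: state.handled + 1}}
  end

  def handle_call({:check_ad_tag, [url]}, _from, state) when is_binary(url) do
    # Placeholder result: just acknowledges the tag that would be checked.
    {:reply, {:ok, {:queued, url}}, %{state | handled: state.handled + 1}}
  end

  # Anything else is a bad request: tell the caller what is accepted.
  def handle_call({_command, _args}, _from, state) do
    {:reply, {:error, {:invalid_request, @commands}}, state}
  end
end
```

Replying with the accepted commands keeps the user informed without crashing the server; crashing and letting a supervisor restart the process is the alternative the post describes.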

  • The Robot Framework

The Robot Framework is a generic DSL that lets users write test steps in a plain-English-like syntax. This layer is used by the QA team. It allows every team member to write their own tests without needing to code in a specific programming language. It also saves developers from having to worry too much about implementing integration tests for their features, because that is handled as part of the QA cycle. The catch is that, for this layer to be effective, the test logic behind the English-based commands at the Robot level has to be implemented somewhere. This is where the test libraries come in.

  • The Test Libraries

The test libraries expose themselves as modules that can be imported by the Robot Framework. This level is where all the heavy lifting happens, and where I spend most of my time coding up the generic steps it might take to complete a specific task. This layer was originally written in Python, but because of the Python 2/3 versioning hell that ensued for us, all the logic has been moved over to Elixir. Users of the Robot Framework don't need to know how the functions do what they do; they only need to know the module, the function name, and its inputs. These functions are then exposed to the Robot Framework for QA members to use while writing their test scripts.

  • The Data Manager

The elephant in the room for any tester is data setup. I sidestepped this headache by building a data manager that knows how to communicate with the test libraries and serve up any test data needed to carry out a specific test task. The data source can be anything from Kafka, to Cassandra, to MySQL, to a generic endpoint, or a flat-out CSV file!!! The user should NEVER have to worry about data setup for a test. If that burden is ever placed on a user, then you are missing out on the full benefits of automation.
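A behaviour is one idiomatic Elixir way to let a data manager hide where data comes from. This hypothetical sketch defines a DataSource contract and one CSV-backed implementation (reading an inline string instead of a file, with made-up sample rows); Kafka, Cassandra, or MySQL sources would implement the same fetch/1 callback. None of these module names come from the post.

```elixir
defmodule DataSource do
  # Hypothetical contract: every source (Kafka, Cassandra, MySQL, an HTTP
  # endpoint, a CSV file...) answers the same fetch/1 call.
  @callback fetch(key :: String.t()) :: {:ok, [map()]} | {:error, term()}
end

defmodule CsvSource do
  @behaviour DataSource

  # Made-up sample rows; a real source would read a file instead.
  @csv """
  city,tag_id
  Paris,101
  London,102
  """

  @impl true
  def fetch(city) do
    [header | rows] = String.split(@csv, "\n", trim: true)
    keys = String.split(header, ",")

    # Turn each row into a map keyed by the header columns, then filter.
    rows
    |> Enum.map(fn row -> keys |> Enum.zip(String.split(row, ",")) |> Map.new() end)
    |> Enum.filter(fn record -> record["city"] == city end)
    |> case do
      [] -> {:error, :not_found}
      records -> {:ok, records}
    end
  end
end

defmodule DataManager do
  # Test libraries ask the manager for data without knowing which source answers.
  def fetch(source, key) when is_atom(source), do: source.fetch(key)
end
```

Swapping CsvSource for another module that implements DataSource changes where the data comes from without touching the test libraries that call DataManager.fetch/2.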

  • The System Under Test

This can be anything: a web app, a mobile app, or an ad tag server. After each layer has done its job, the service takes all that information to the system under test and performs its action. When the action is completed, a response is sent back to the user saying either that all is OK or that a specific thing is wrong.

This whole loop, from request to response, takes 2 to 5 seconds on average, depending on how complicated the request is. At times we have had thousands of ad tags checked in just under 2 seconds!!! Here is a quick use case in IEx showing what it's like to use our new system to check live ad tags on the web.

Screenshot (2016-02-16): checking live ad tags from IEx

DUEY is simply a node that can take a whole bunch of commands. Using concurrency, the checks above normally take less than 2 seconds. They previously took 15 seconds when the checker was written in Python!!
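The post credits concurrency for the speedup but doesn't show how the checks fan out. One common Elixir approach is Task.async_stream/3; in this hypothetical sketch, check_fun stands in for whatever actually checks a single ad tag, and the timeout and concurrency values are arbitrary assumptions.

```elixir
defmodule AdTagChecker do
  # Hypothetical sketch: run check_fun on every tag concurrently and pair
  # each tag with its result. check_fun stands in for a real ad tag check.
  def check_all(tags, check_fun, max_concurrency \\ System.schedulers_online() * 4) do
    tags
    |> Task.async_stream(check_fun,
      max_concurrency: max_concurrency,
      timeout: 5_000,
      on_timeout: :kill_task
    )
    # Results come back in input order by default, so zip them with the tags.
    |> Enum.zip(tags)
    |> Enum.map(fn
      {{:ok, result}, tag} -> {tag, result}
      # A task killed after the timeout shows up as {:exit, :timeout}.
      {{:exit, reason}, tag} -> {tag, {:error, reason}}
    end)
  end
end
```

Because the tasks run at the same time, total wall-clock time is driven by the slowest checks rather than the sum of all of them, and :kill_task turns a hung check into an error entry instead of crashing the caller.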

Honestly, the sky is the limit with Elixir. As QA automation engineers we should really start investigating these next-generation technologies, because the growing complexity of automated test system requirements will crush you without knowledge of them. I'm a huge fan of automation and automated systems, and to me Elixir gives you many tools straight out of the box that let you build them effectively while keeping up with the complexity of modern system requirements. Your team will never see the complexity involved; all they will know is that integration testing is automated and way less of a pain than it used to be. Developers and stakeholders get rapid feedback, and the QA team gets the satisfaction of having a service that catches many failures in each product! As I said before on Twitter, Elixir is now my go-to tool for automation projects.

Stay tuned for my open source library, Robot Remote Server Elixir! It will allow you to execute Robot Framework commands written specifically in Elixir!!

 

The Changing Face of Test Automation

Almost anything that has its own identity will change its set of behaviors over time. Nothing stays the same in the tech field, and the same is true of test automation. This fact creates its own set of problems when trying to build a solid test suite that will be useful for the company that initiates it.

When testing anything, you need a control group. A control group should stay the same so that you can draw clear conclusions about your experimental group. If that's true, how do you make test automation work in a meaningful way when things change so often? Furthermore, for automation to succeed, the experimental group has to be tested well enough to avoid the death of any test automation project... false positives.

Engineers like certainty, and so do the consumers who use their products on the daily, whether those products are software or hardware. In the great pursuit of this certainty, test automation has shown its face. It has become a pressing need for enlightened organizations that want to keep delivering software at lightning speed without a big drop in product quality. Interestingly, some companies have not yet realized the importance of this need.

What does this mean for test automation engineers? It means that we now have to consider these changes and factor them into our test design decisions. It means that, as test engineers, we have to approach these automation problems like mad scientists rather than like regular QA testers verifying user functionality. It means that our testing algorithms have to become much more intelligent. Gone are the days when we could ensure quality based on basic pass/fail results of user features. Why? Because the software we release is no longer as simple as it was in the '90s, and its features will likely change just as quickly as they were written. The smarter our software gets, the smarter our testing techniques need to get before we can comfortably tell companies that their product is ready for a mass audience.

Doing this successfully is a two-step process:

  1. Identify the control group (Why and What are you testing...)
  2. Design smart algorithms around your experimental group to verify the software under test

In my next blog post I will illustrate this procedure with real-world examples.

 

The Automated Selfie.....

During test automation, things are going to go wrong. That doesn't have to rain on your parade, though. While working on different projects at HotelTonight, I decided to overhaul the way we do error handling in a few of our key test suites. This post focuses on a mobile app failure scenario.

On average, the mobile suite I've put together takes about 15 minutes to complete a few key regression tasks. I expect that runtime to increase in the near future. With test runtime increasing, and more and more complexity being added to the code base, the need for smart error handling rears its head.

There have been a number of times when tests were running and a failure happened. Debugging these errors became an absolute pain, especially if no one was around to see exactly what the app state was during those failures. To tackle this problem on the mobile side, I decided to do two things.

1.) Make the reported errors way more descriptive and less vague (more on this in the next post).
2.) Snap an image of what the app state was like when the error was encountered.

Together, these two solutions make for an amazingly smart tool for debugging automation failures. Debugging errors has gone from tedious to absolutely painless for us. The time it takes me or anyone else to figure out what happened has dropped from 20 minutes of investigation to less than 1 minute in most cases.

Here's a brief example....

Let's say that we have some kind of outage during a test run. These outages can be caused by anything from DB migrations, to bad API requests, to plain silly mistakes. Let's also say that I went out for a cup of coffee while the tests were happily doing their thing. When I return, I see something like this in my test report......


We can easily see what type of error it is, but we have no idea why it was thrown while searching for a random city in the app. Luckily, the automated selfie was in place. Let's check out the screenshots directory....


BINGO! We've got a little more information: the time, date, and month, as well as the exact test action it was performing. That tells me when, but I still need a little more info on what the app witnessed. If we open the screenshot .png, we see this...



Now this is more like it. We clearly see exactly what the test saw while it was running. We had a failure during our same day booking test at the exact time of the screenshot, and we see exactly what the app was showing during the test run. This technique makes it much easier to approach a group of engineers about failures. This information gives us a great place to start investigating.

How does all this work with Appium? It takes just a few lines of code in the RSpec spec_helper.rb. Here is the code that makes it all work...


In short, after each scenario that raises an exception, take a screenshot and place it in my specified directory. I've also added timestamps and test descriptions to the file names to make the reason for each image much clearer.
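The hook itself lives in Ruby, but the naming idea (a timestamp plus the test description) carries over to any language. Here is a hypothetical sketch of such a naming helper in Elixir, the language used elsewhere on this blog; the exact format is my assumption, not the one used at HotelTonight, and Calendar.strftime/2 needs Elixir 1.11 or later.

```elixir
defmodule ScreenshotName do
  # Hypothetical naming scheme: "<timestamp>_<test description>.png", so each
  # image says when the failure happened and which test was running.
  def build(description, %NaiveDateTime{} = at) do
    stamp = Calendar.strftime(at, "%Y-%m-%d_%H-%M-%S")

    slug =
      description
      |> String.downcase()
      # Collapse spaces and punctuation into single underscores.
      |> String.replace(~r/[^a-z0-9]+/, "_")
      |> String.trim("_")

    "#{stamp}_#{slug}.png"
  end
end
```

For example, ScreenshotName.build("Same day booking in Paris", ~N[2016-02-16 11:45:26]) returns "2016-02-16_11-45-26_same_day_booking_in_paris.png", and the leading timestamp keeps a directory listing sorted by failure time.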

As you can imagine, this folder will keep growing as more failures happen. The last thing I want is a directory full of failed-test images that I no longer need to reference, so I've written a task called "rake clear_screenshots" that clears them out. Here is a code example of that...
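The rake task's code isn't reproduced here. As a rough equivalent of what such a cleanup does, this hypothetical Elixir function deletes every .png in a given directory and reports how many files it removed; the module name and the *.png pattern are assumptions.

```elixir
defmodule Screenshots do
  # Hypothetical cleanup: delete every .png in `dir` (non-recursive) and
  # return the number of files removed. Other files are left alone.
  def clear(dir) do
    files = Path.wildcard(Path.join(dir, "*.png"))
    Enum.each(files, &File.rm!/1)
    length(files)
  end
end
```

Limiting the pattern to *.png means anything else kept in the directory (a README or .gitkeep, for example) survives the cleanup.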



Once this is run you'll see the directory below is now empty again, ready for the next test run.





The Importance of Test Automation

During any release of a major product, the testing cycle becomes increasingly important. Automation is not the be-all and end-all for verifying software quality, but it sure does the trick for minimizing the load on test teams. As I mentioned before, at HotelTonight we use automated test strategies for a number of our behavior verifications. One question I've heard repeatedly is "How does one go about determining which tests should be automated?" More importantly, "How do you implement automation in your test strategy once you've identified what to automate?" I'm going to use this blog post to answer both questions by taking a real-world scenario and illustrating how one could use automation to address it.

Within the HotelTonight mobile app, our objective is to give our users the best options for booking last-minute hotels. The thing is, these hotels aren't only in the US; we also need to serve users hotel options in other countries. Serving users in other countries means dealing with other time zones. If our market opens at, say... 9am for our users, that's a piece of cake to test manually from a San Francisco user's perspective. It's not until you think about 9am local time for a user outside the Pacific time zone that testing this behavior gets a bit complex.

Let's also say that QA is given the task of checking whether our markets show users hotels when they are open, and show the correct message during times when they are closed. Ideally, QA gets a list of those cities' opening and closing times and then goes through the app, checking whether it shows the correct results at the appropriate time for each city's time zone.

As you can imagine, this could turn into a tedious process. After all, there are many cities on the planet and many time zones; the US alone has several that would need to be considered. Running through these time zone checks every time a market opens, to verify that inventory is available every day for our users, can be cumbersome for QA, especially when some of those markets open at 12am or 1am Pacific time!!! This sounds like a great candidate for automation.

How do we implement test automation to tackle this problem? The answer breaks down into these steps:

  1. Gather a list of major cities that represent different time zones (Paris, London, New York, San Francisco).
  2. Determine how to represent the 9am to 2am market open and close window relative to where the tests will be running (in this case, the Pacific time zone).
  3. Most importantly, decide what the automated tests should expect to see while exercising these behaviors. If the market is closed, the user needs to see the appropriate message. If the market is open, they had better see hotels, or we have a problem!

Using Appium to host the app and the power of RSpec to drive the tests, let's look at what our automation logic looks like for the four cities listed above. Keep in mind that the times use a 24-hour format from the Pacific time zone perspective. This keeps the tests valid even during daylight saving time changes!
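The original RSpec logic isn't reproduced here, so this is a hypothetical Elixir sketch of the core decision: given the current hour in Pacific time, is a city's market open, and what should the test expect to see? The Pacific-hour windows are my own assumptions, derived from the usual offsets (Paris 9 hours, London 8 hours, and New York 3 hours ahead of Pacific) for a 9am to 2am local market; they shift by an hour during the weeks when US and European daylight saving dates don't line up. The :hotels_listed and :market_closed_message labels are also hypothetical.

```elixir
defmodule MarketWindow do
  # Assumed Pacific-time windows ({open_hour, close_hour}, 24-hour clock) for a
  # 9am-2am local market, using the usual offsets from Pacific time.
  @windows %{
    "San Francisco" => {9, 2},
    "New York" => {6, 23},
    "London" => {1, 18},
    "Paris" => {0, 17}
  }

  # Is the market for `city` open at the given Pacific hour (0..23)?
  def open?(city, pacific_hour) when pacific_hour in 0..23 do
    {open_hour, close_hour} = Map.fetch!(@windows, city)
    in_window?(pacific_hour, open_hour, close_hour)
  end

  # What the test should expect to see (hypothetical labels).
  def expectation(city, pacific_hour) do
    if open?(city, pacific_hour), do: :hotels_listed, else: :market_closed_message
  end

  # Ordinary same-day window, e.g. 6..23.
  defp in_window?(hour, open_hour, close_hour) when open_hour < close_hour,
    do: hour >= open_hour and hour < close_hour

  # Window that wraps past midnight, e.g. 9am-2am.
  defp in_window?(hour, open_hour, close_hour),
    do: hour >= open_hour or hour < close_hour
end
```

The wrap-around clause is what lets one rule cover San Francisco's 9am to 2am window, where "open" means either after the opening hour or before the closing hour.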


As you can see, we are searching for hotels in the four major cities I've chosen. The catch is that these automated tests can run at any time, so they need to be smart enough to handle each city's open/close time window. Beyond understanding the time window, the tests also need to know that they will sometimes see a different message, especially when they run during hours when a specific market is closed. These tests can now be added to a scheduler and run at any time of day, seven days a week, with no human interaction whatsoever! Check out how these automated tests look in action below!!