Load, performance, and stress testing considerations
Testing to validate scalability
When designing and building tests to validate an application's scalability (e.g. load, performance, and stress tests), there are a few important considerations to review. This article details some of them, though not all will apply in every situation.
1. Identify goals for the test
It is important to fully identify the overall goal of each test, rather than just assume that all participants have the same understanding. For example, load and stress tests are not the same — their goals are different, and your approach to them must be different.
Load tests typically have a goal of ‘steady state’, meaning that they are designed to represent common scenarios that are experienced during day-to-day usage of the application under test (AUT).
Stress tests, on the other hand, are designed to find the breaking point of the AUT (e.g. let's keep adding more and more users until the AUT is no longer functioning). This type of test can be used to ensure the AUT is future-proof. For example, can your system handle the extra load if you need to add 5,000 additional users to the system in 6 months due to an acquisition?
Due to the differing purpose of different kinds of tests, documenting the goal of each test is critical. You may also run iterations that test different aspects, which may have unique goals. Factors such as success criteria (e.g. did we meet our service level agreements under our typical load?), environment details, and more should also be documented to ensure that all participants have the same understanding as to why you are conducting the test.
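One lightweight way to keep the goal, success criteria, and environment details in front of every participant is to record them in a small structured object. The sketch below is only an illustration: the `TestPlan` class, its field names, and the sample values are hypothetical and not part of any load testing tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TestPlan:
    """Hypothetical record of what a single test run is meant to prove."""
    name: str
    kind: str                                   # e.g. "load" or "stress"
    goal: str
    success_criteria: List[str] = field(default_factory=list)
    environment_notes: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of fields that have not been filled in yet."""
        return [key for key, value in vars(self).items() if not value]


# Illustrative values only; real criteria come from your own SLAs.
plan = TestPlan(
    name="Day-to-day order entry",
    kind="load",
    goal="Confirm SLAs hold under typical weekday load",
    success_criteria=["'Add order' meets its agreed response time"],
)
print(plan.missing_fields())
```

A check like `missing_fields()` can be run before the test starts, so an undocumented environment is caught before results are collected.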
2. Match the test environment to production
While it might not always be possible due to costs and resources, the test environment for scalability testing should be as close to that of production as possible. Many factors contribute to the performance of an application, so the closer the production environment can be represented in testing, the more accurate the test will be. Other types of testing, such as functional testing, depend far less on a production-like environment.
When building the test environment, it's not only hardware "power" that should match production. Application, operating system, third-party factors (e.g. database/Java version), and test data should be reflective as well. When the production environment can't be matched, the differences between production and test should be accurately documented.
Note that extrapolating from an environment that is “smaller” than production will not give accurate results. For example, if you build an environment that is roughly half as powerful as production in terms of computing power/data, it doesn’t mean that you can take the results of the scalability test and double them to predict what will happen in production. The relationship between components is complex, with small changes often having big effects. Thus, the closer you can get to production, the more accurate the test.
3. Identify the top 5-7 transactions
There are often many different paths through an application, and attempting to test them all is unmanageable. However, on a day-to-day basis there tend to be between 5 and 7 transactions that are used most often. This list of 5-7 transactions is what should be represented in the load test, though you may test those transactions under different scenarios. For example, you might test “adding an order” under load conditions as well as under stress conditions, but the steps of the transaction should remain the same.
Identifying which 5-7 transactions (no more than 10) to test can be done in multiple ways. If it’s a brand new application, the business analyst or product owner will often scope the top transactions based on use case data gathered when designing the application. Note that once the application has been released, it’s important to review these transactions to see if they are actually representative of usage. For example, the business analyst may have determined that a user will go from screen A to B to C and build a load test script to represent that. However, if usage data shows that users actually go from screen A directly to C, then the script should be updated to reflect this.
Top transactions for applications that are in use can often be identified by mining data from application monitoring solutions. These systems will provide real data on where users are spending most of their time by counting the transactions/screens over a given period of time. The "high usage" transactions become the candidates for scalability testing.
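The counting described above can be sketched in a few lines. This example assumes the monitoring solution can export one transaction name per recorded hit; the `top_transactions` function and the sample transaction names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_transactions(events: Iterable[str], limit: int = 7) -> List[Tuple[str, int]]:
    """Count transaction names and return the most frequently used ones."""
    return Counter(events).most_common(limit)


# Hypothetical export from a monitoring tool: one transaction name per hit.
sample_events = [
    "view_order", "add_order", "view_order", "login",
    "search", "view_order", "add_order",
]

for name, hits in top_transactions(sample_events, limit=2):
    print(f"{name}: {hits} hits")
```

The default limit of 7 mirrors the 5-7 transactions recommended in this section. Choose the export period carefully, since a single unusual day can skew the ranking.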
4. Run different tests for different cycles
When looking at the usage pattern of a given application, you may find that it is subject to periodic scenarios that require different tests. For example, a retail company may conduct two tests for the top transactions, one to represent normal day-to-day usage and one to represent the additional load the app will experience during the holidays. Both scenarios are valid and happen throughout the course of a single year and should be tested independently. Other industries and applications have similar peak periods (e.g. medical = open enrollment, financial = close of year).
These cycles are not only yearly; cycles can happen daily, weekly, and even monthly. For example, a banking application might see a spike in traffic at lunch time, when end users check their accounts. User timezones can have an effect as well. If your user base is in two different timezones, you may see an uptick in load at the beginning of each timezone's business day. These can also help define test scenarios.
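Daily cycles like these can often be spotted by grouping request timestamps by hour. The following is a minimal sketch, assuming the timestamps have already been converted to a single timezone; the `busiest_hours` function and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def busiest_hours(timestamps: Iterable[datetime], limit: int = 3) -> List[Tuple[int, int]]:
    """Return (hour_of_day, request_count) pairs for the busiest hours."""
    return Counter(ts.hour for ts in timestamps).most_common(limit)


# Hypothetical request timestamps, all in the same timezone.
sample = [
    datetime(2024, 3, 4, 9, 5),
    datetime(2024, 3, 4, 12, 1),
    datetime(2024, 3, 4, 12, 20),
    datetime(2024, 3, 4, 12, 45),
    datetime(2024, 3, 4, 17, 30),
    datetime(2024, 3, 4, 9, 50),
]

for hour, count in busiest_hours(sample, limit=2):
    print(f"{hour:02d}:00 - {count} requests")
```

Running the same grouping separately for each user timezone would show whether the start-of-day upticks overlap or arrive as distinct peaks.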
5. Choose the right load testing solutions
There are many different load testing solutions offered by various vendors. Choosing the right one for your load test is important.
Paid solutions, such as Micro Focus LoadRunner, offer extremely rich features and diagnostics. They can also be configured to be highly representative of real scenarios, but come with licensing costs.
Open-source solutions, such as Gatling or JMeter, may have limitations (such as platform support) but can often achieve the goals of your testing.
6. Conduct transactions “as a user would”
When creating the transactions for tests, it’s important to capture and represent how users actually use the system. Often, as users navigate through a business transaction, they will require some "think time". For example, when a user arrives on a page, they don’t instantly click the link to the next page; they may take 2-3 seconds to find and click the right link. This "think time" should be included in the load test. This will spread the load in a more realistic fashion and better represent real usage.
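Most load testing tools have their own way of configuring think time, so the following is only a tool-independent sketch of the idea. The `run_virtual_user` function is hypothetical; the 2-3 second defaults come from the example above, and the `sleep` parameter exists so the pause can be replaced when trying the code out.

```python
import random
import time
from typing import Callable, List, Optional, Sequence


def run_virtual_user(
    steps: Sequence[Callable[[], None]],
    min_think: float = 2.0,
    max_think: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Run each step in order, pausing for a random think time between steps.

    Returns the think times used so they can be logged with the results.
    """
    rng = rng or random.Random()
    pauses = []
    for index, step in enumerate(steps):
        step()
        if index < len(steps) - 1:          # no pause after the final step
            pause = rng.uniform(min_think, max_think)
            sleep(pause)
            pauses.append(pause)
    return pauses


# Example with placeholder steps and a no-op sleep so it finishes instantly.
visited = []
pauses = run_virtual_user(
    [lambda: visited.append("home"), lambda: visited.append("orders")],
    sleep=lambda seconds: None,
)
print(visited, [round(p, 2) for p in pauses])
```

Randomizing the pause within a range, rather than using one fixed value, avoids having every virtual user send requests in lockstep.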
It’s often said that removing “think time” and caching will simulate a larger load test, but this is not true. Removing these to run the transactions as fast as possible will not predict higher user counts. You can't assume that running a ten-user load test without including factors such as "think time" to be the same as running a 150-user load test. There are too many elements that can change under real scaling conditions.
There are exceptions to this rule. For example, you might want to measure the difference in transaction times between a single node and a multi-node cluster. As long as the goal is only to confirm that a single node is slower than a cluster, and the actual transaction times don’t matter, the test is valid.
7. Ramp slow and let the environment stabilize
For both load and stress tests, it’s important to ramp in increments and to let the environment stabilize before ramping up again. Many systems employ some level of data caching that can greatly affect performance. Measurements of the original non-cached transaction and the cached one can vary greatly.
A scalability plan should factor in these considerations. For example, for a 5,000-user load test, you may first ramp to 250 users and run at this state for ten minutes so that data is properly cached. Then, after ten minutes, you can ramp to 1,000 users and begin the process again.
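A stepped ramp like this can be written down as a simple schedule before it is entered into a load testing tool. The sketch below is illustrative only: `ramp_schedule` is a hypothetical helper, the 2,500-user level is an assumed intermediate step, and the time taken to ramp between levels is ignored.

```python
from typing import List, Sequence, Tuple


def ramp_schedule(levels: Sequence[int], hold_minutes: int = 10) -> List[Tuple[int, int]]:
    """Return (start_minute, user_count) pairs for a stepped ramp.

    Each level is held for hold_minutes before the next ramp begins.
    """
    if list(levels) != sorted(levels):
        raise ValueError("levels must be in increasing order")
    return [(index * hold_minutes, users) for index, users in enumerate(levels)]


# 250 and 1,000 come from the example above; 2,500 is an assumed step.
schedule = ramp_schedule([250, 1000, 2500, 5000])
for start, users in schedule:
    print(f"minute {start:>3}: hold at {users} users")
```

Measurements taken during each hold period, after the caches have warmed, are the ones to compare against the success criteria.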
This approach may also be affected by the first point above, the goal of the test.
8. Monitor to find the root causes of issues
While the goal of a scalability test is to simply test the performance under certain conditions, if the test fails, it’s important to identify why and where the problem occurred.
The same application monitoring solutions that are used in production should be used during the test. This will allow for root cause analysis to be performed if the goal of the test is not met.
If you're using a paid load testing solution, the data from the monitoring solutions can often be correlated with that of the load testing solution, making root cause analysis significantly easier.
9. Represent your user base geographically
Most applications have users that are spread over a wide geographic area. Users from remote locations will have different experiences than those that are in the same building as the AUT’s hardware.
Factors such as latency, tunneling through a VPN, additional network hops, etc. can greatly affect how a remote user experiences the system. Therefore, it’s important to represent these remote locations during testing.
Scalability testing solutions often provide small agents that can be installed at a remote location. During the scalability test, testers can assign a portion of their users to be run from the remote agent. You can then send and execute the scalability scripts at the remote location and measure the results there. This means that you're able to get the difference in transaction times between a local location and a remote location.
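Once timings have been collected from both places, the comparison itself is straightforward. This is a minimal sketch, assuming each agent reports transaction times in milliseconds; `compare_locations` and the sample numbers are hypothetical.

```python
from statistics import median
from typing import Dict, Sequence


def compare_locations(local_ms: Sequence[float], remote_ms: Sequence[float]) -> Dict[str, float]:
    """Compare median transaction times measured locally and at a remote agent."""
    local_median = median(local_ms)
    remote_median = median(remote_ms)
    return {
        "local_median_ms": local_median,
        "remote_median_ms": remote_median,
        "difference_ms": remote_median - local_median,
    }


# Hypothetical timings for the same transaction from two locations.
report = compare_locations(local_ms=[120, 130, 125], remote_ms=[310, 290, 300])
print(report)
```

The median is used here because a few slow outliers can distort an average; a percentile comparison would be a reasonable alternative.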
10. Test early and test thoroughly
Scalability testing is generally performed towards the end of an application's build lifecycle, just before its release to production. However, issues uncovered during this late-stage testing can often cause delays to production release. Smaller and more focused scalability testing conducted during development can often uncover issues early in the cycle.
Smaller load tests (fewer than 30 users) can be built into the development process to test individual components and integrated transactions. DevOps teams typically build these tests in separate pipelines.
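A small test of this kind does not need a full load testing product. The following sketch uses only the Python standard library to run a transaction from a handful of concurrent workers; `run_small_load_test` and `fake_transaction` are hypothetical, and a real test would call the component under development instead of sleeping.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_small_load_test(
    transaction: Callable[[], None], users: int = 10, iterations: int = 5
) -> List[float]:
    """Run the transaction from concurrent workers and return all timings in seconds."""

    def one_user() -> List[float]:
        timings = []
        for _ in range(iterations):
            start = time.perf_counter()
            transaction()
            timings.append(time.perf_counter() - start)
        return timings

    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(one_user) for _ in range(users)]
        return [t for future in futures for t in future.result()]


def fake_transaction() -> None:
    """Placeholder for a real call to the component being tested."""
    time.sleep(0.001)


timings = run_small_load_test(fake_transaction, users=5, iterations=3)
print(f"{len(timings)} transactions, slowest took {max(timings):.4f} s")
```

In a pipeline, the slowest or median timing could be compared against an agreed threshold so that a regression fails the build.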
Additional solutions may need to be used to help replicate other components that aren’t yet built. For example, if component A calls component B but component B is still under development, component B can be virtualized using appropriate technologies to simulate realistic responses without the need for the actual component.
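Dedicated service virtualization products usually simulate the missing component at the network level. As a much simpler in-process illustration of the same idea, the sketch below replaces an unfinished component B with a stub that returns canned responses. All names here (`InventoryService`, `StubInventoryService`, `can_fulfil`) are hypothetical.

```python
from typing import Dict, Protocol


class InventoryService(Protocol):
    """Interface that component B is expected to provide once it is built."""

    def stock_level(self, sku: str) -> int: ...


class StubInventoryService:
    """Stand-in for component B that returns canned stock levels."""

    def __init__(self, canned: Dict[str, int], default: int = 0) -> None:
        self._canned = canned
        self._default = default

    def stock_level(self, sku: str) -> int:
        return self._canned.get(sku, self._default)


def can_fulfil(sku: str, quantity: int, inventory: InventoryService) -> bool:
    """Component A logic that depends on component B."""
    return inventory.stock_level(sku) >= quantity


stub = StubInventoryService({"widget": 12})
print(can_fulfil("widget", 10, stub))   # enough stock in the canned data
print(can_fulfil("gadget", 1, stub))    # unknown SKU falls back to the default
```

Because component A only depends on the interface, the stub can be swapped for the real component B without changing the test.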
While it’s still important to run a full scalability test once changes to the application are completed, these smaller scale tests can help you uncover and remediate issues early on.