Customer Service - The Applause Quality Score

For brands that want to consistently measure and improve the quality of their releases over time, Applause has developed the industry’s first and only quality benchmark built for the enterprise: the Applause Quality Score (AQS). With AQS, it is easy to see how your quality is trending over time and to determine which factors are impacting your release quality. Leveraging 10-plus years of historical quality data along with your testing results, we create a customized quality score for each build your company releases. You can view analytics, identify trends and verify coverage to make informed decisions.

To make full use of the score when evaluating a build, take three components into account: the Applause Quality Score (AQS), its matching Confidence Level (CL) and the underlying test result data on which both are based.

What the Applause Quality Score is

The AQS is a calculated value ranging from 0 to 100 that describes the quality of the testing results for a product or build across one or more test cycles, based on the testing done and the results collected.

How AQS Helps You

With AQS, development teams can understand the level of quality they achieve build over build, helping them make a data-driven decision about when a build is ready for release. The purpose of the AQS is to empower you to make critical release/don’t release decisions in a fact-based, quick and easy manner.

AQS Scores

AQS provides three distinct, weighted scores for users to assess their software quality:

Structured testing score (Test Cases) - This score reflects the quality of your build based on the results of structured test case runs. With test cases covering critical user flows, a poor score here can indicate when you’re suffering from regressions and customers are at risk of poor experiences that impact the bottom line. The Test Cases score is only available to customers who use Applause for structured test cases
Exploratory testing score (Issues) - This rating assesses issues testers discovered outside of structured test cases and how those issues are triaged to make informed release decisions and potentially improve testing strategy. As exploratory testing incorporates testing unexpected user flows, this rating gives you perspective on the ways customers will actually use your product. To receive an Issues score, you must conduct exploratory tests with Applause
Combined (Quality Score) - For users with access to both the Test Cases sub-score and the Issues sub-score, the combined Quality Score is a weighted score that reflects the testing profile of the product, or the relative amount of structured and exploratory testing under assessment. The combined score therefore provides a blended assessment of quality
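
To make the idea of a weighted, blended score concrete, here is a minimal sketch. It is an illustration only and not Applause's actual algorithm: the function name `combined_quality_score`, the `structured_share` weight and the simple linear blend are all assumptions made for this example.

```python
def combined_quality_score(test_cases_score, issues_score, structured_share):
    """Blend two sub-scores (each 0-100) into one illustrative combined score.

    structured_share is a hypothetical weight between 0 and 1 standing in for
    the share of structured testing in the product's testing profile.
    """
    if not 0.0 <= structured_share <= 1.0:
        raise ValueError("structured_share must be between 0 and 1")
    for score in (test_cases_score, issues_score):
        if not 0 <= score <= 100:
            raise ValueError("sub-scores must be between 0 and 100")
    # Assumed linear blend: the more structured testing, the more the
    # Test Cases sub-score counts toward the combined value.
    blended = (structured_share * test_cases_score
               + (1.0 - structured_share) * issues_score)
    return round(blended, 1)


# A product tested mostly with structured test cases (70% structured).
print(combined_quality_score(92, 78, 0.7))
```

With a blend like this, the combined score always lies between the two sub-scores and moves toward whichever kind of testing dominates the profile.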

Read on to learn more about how the Applause Quality Score ratings work, how you can use AQS to make informed decisions, and how to access your information via the Applause Product Excellence Platform.

How AQS Works

Just as several bureaus provide credit scores that offer an overall picture of a person’s credit, the individual and combined AQS ratings give you a customized view of software quality, including details to identify weak points or negative trends.

The AQS leverages sophisticated data science models based on several factors for both structured test cases and exploratory tests. The AQS algorithm combines the subjective and objective areas of testing, with heightened value placed on the results that are most relevant to you and your customers, to provide a robust, quantitative assessment of digital quality based on historical and current data.

How Structured Test Cases are Calculated

AQS analyzes structured test cases according to:

Pass/fail status of the structured tests - The goal is for all structured tests to pass, but the algorithm takes into account the types of test cases that fail, as well as the overall coverage of the structured tests
Failed regression tests - If a test passed on a previous build, but failed on a subsequent build, it has a greater impact on the rating because it affects existing functionality. Introducing issues to existing functionality is more problematic than failing to introduce new functionality
The number of test cases executed - Every build might not have all test cases executed every time. Our algorithm takes into account the number of test cases that were executed, and adjusts to ensure your Test Cases Sub-Score is not artificially inflated by conducting fewer tests
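
The three factors above can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the AQS algorithm: the function `toy_test_cases_score`, the `regression_weight` of 2.0 and the way coverage scales the result are hypothetical choices that only show the direction of each effect.

```python
def toy_test_cases_score(passed, failed_new, failed_regression,
                         total_in_suite, regression_weight=2.0):
    """Toy 0-100 score: regressions hurt more, low coverage cannot inflate it."""
    executed = passed + failed_new + failed_regression
    if executed == 0 or executed > total_in_suite:
        raise ValueError("executed test count must be between 1 and total_in_suite")
    # Assumption: a failed regression counts as regression_weight ordinary failures.
    weighted_failures = failed_new + regression_weight * failed_regression
    pass_quality = passed / (passed + weighted_failures)
    # Assumption: scale by the share of the suite that was actually executed,
    # so running fewer tests does not artificially raise the score.
    coverage = executed / total_in_suite
    return round(100 * pass_quality * coverage, 1)


same_failures_no_regressions = toy_test_cases_score(90, 8, 0, 100)
with_regressions = toy_test_cases_score(90, 6, 2, 100)
half_the_suite = toy_test_cases_score(45, 3, 1, 100)
print(same_failures_no_regressions, with_regressions, half_the_suite)
```

In this sketch, the build with two regressions scores lower than a build with the same number of ordinary failures, and running only half the suite lowers the score further even though the pass rate is unchanged.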

How Issues are Calculated

For the exploratory testing (Issues) sub-score, the algorithm assesses:

Bug rating, as defined by the severity and value of the issues - The project team assesses every bug’s severity and value as part of their triaging process, and then you have the opportunity to mark its priority. The algorithm uses these categories to identify which bugs have a greater impact on that build’s score, and reflects that priority in the score
The number and validity of those collected issues - If you mark an issue as verified and fixed, it has a less severe impact on the rating than an unfixed and/or unverified bug
A reasonable estimate of the number of bugs against Applause benchmarks - The algorithm adjusts the rating according to the number of bugs it would expect to be discovered based on our database of results from all Applause customers. This allows for an accurate comparison of results specific to the life cycle phase, product type and the industry it serves. As you run more exploratory test cycles with Applause, we continually add that data into our database, enabling customers to receive a more meaningful AQS rating over time
The relative number of bugs in a build against your own product’s benchmarks - By observing historical data for your specific product, the algorithm estimates an expected number of bugs for your build, which it can then compare to past results at different life cycle stages, release scopes and development stages
Duration of testing, from start to finish - This data, compared to past durations, enables the algorithm to adjust for deviations in the time spent testing. If the build was allotted less time for testing than previous builds, our algorithm adjusts to ensure a meaningful score

Results Calculation Summary

As you can see, the Applause Quality Score does not judge all bugs the same.

A more severe or valuable bug (for example, a bug that causes the app to crash or significantly impacts the user’s experience) will lower the Issues score more than a less severe or less valuable bug that has minimal significance to the app’s functionality or user experience
Likewise, a higher number of failed regressions than usual will result in a poorer structured test score

AQS calculates quality scores in near real time.

How AQS is Presented

Although the AQS is based on sophisticated data science models, the end result is a single metric that falls into one of three ranges:

High - 90 to 100
Medium - 66 to 89
Low - 0 to 65
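
If you export scores and want to label them yourself, the three ranges above can be expressed as a small lookup. This is a minimal sketch: the function name `aqs_range` is hypothetical, and treating any fractional value below 66 as Low or below 90 as Medium is an assumption, since the ranges are stated in whole numbers.

```python
def aqs_range(score):
    """Map an AQS value (0-100) to the range label it falls into."""
    if not 0 <= score <= 100:
        raise ValueError("AQS must be between 0 and 100")
    if score >= 90:
        return "High"
    if score >= 66:
        return "Medium"
    return "Low"


for value in (95, 89, 66, 65):
    print(value, aqs_range(value))
```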

How to View AQS for Your Product

Log in to the Applause platform and navigate to “Products”.
Locate the relevant product from the products list.
Select the product name.
The activity dashboard for the product will be displayed, offering information general to the product, as well as per-build “cards” containing the build’s AQS, CL and issue distribution at a high level.
To drill into more details about a specific build, click on the “See more details” link at the bottom of the build card.

Ideas to Think About as You View a Certain Build’s AQS

Evaluate the build’s AQS against the ranges above. Naturally, you may be aiming to get as close to a "perfect score" as possible
Since the score is calculated from the scope of testing done and the results obtained, make sure you also understand the nature of the reported issues: their severity and value, type, components and status. Over time, you’ll be able to spot irregularities that point you toward root causes
As not all products and builds are created equal, it is also imperative to compare the build’s score with those of preceding builds; a specific build might not generate a “high” score from the beginning, so a steady, positive trend is certainly a valid goal to maintain
As you review the build-over-build trends, try to bring in information about these builds that Applause may not fully know, such as testing done outside of the Applause platform, personnel changes and other interfering factors. As we keep enhancing the data science and machine learning models behind the AQS, it will account for these and other uncertainties more effectively and accurately
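
Comparing a build with its predecessors can be as simple as looking at the change in score from one build to the next. The sketch below assumes you have a list of AQS values in build order; the helper names `score_deltas` and `is_steady_positive_trend`, and the definition of "steady, positive" as never dropping and ending higher than it started, are assumptions for illustration.

```python
def score_deltas(scores):
    """Return the build-over-build change in AQS for a list in build order."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


def is_steady_positive_trend(scores):
    """True if the score never drops between builds and ends above where it began."""
    deltas = score_deltas(scores)
    return bool(deltas) and all(d >= 0 for d in deltas) and scores[-1] > scores[0]


history = [58, 63, 63, 71]  # hypothetical AQS values for four consecutive builds
print(score_deltas(history))              # [5, 0, 8]
print(is_steady_positive_trend(history))  # True
```

None of these builds reaches the High range, yet the trend is moving in the right direction, which is the kind of signal worth tracking.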

What the AQS Confidence Level (CL) is

Next to the Quality Score, you will also find an indication of the level of confidence we have in our calculations. The Confidence Level (CL) for an individual build is based on the scale and scope of the testing conducted for the build, such as duration and coverage, as well as the breadth of historical data collected on previous builds of the product or app.

How AQS CL Helps You

Reviewing the CL is key to transforming AQS into actionable insights. Because the scope of testing changes while the build is being tested, considering the CL will help you understand how reliable the AQS for a given build is at a given moment in time.

While a low AQS may lead you to hold off on releasing the build (for instance, because it’s too “buggy”), a low CL may lead you to intensify the testing so that more data can be collected: testing for longer, across more devices or regions, with more individuals, and so on.

How AQS CL is Presented

Once calculated, the CL is presented as one of four available values:

High
Medium
Low
Very Low
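
The way AQS and CL are read together, as described earlier, can be sketched as a simple rule of thumb. This is an illustration only: Applause does not prescribe these rules, and the function `suggested_next_step`, the cutoff of 65 (the top of the Low range) and the returned wording are all assumptions.

```python
CL_VALUES = ("High", "Medium", "Low", "Very Low")


def suggested_next_step(aqs, confidence_level, low_aqs_cutoff=65):
    """Hypothetical rule of thumb combining a build's AQS with its CL."""
    if confidence_level not in CL_VALUES:
        raise ValueError(f"confidence_level must be one of {CL_VALUES}")
    if not 0 <= aqs <= 100:
        raise ValueError("AQS must be between 0 and 100")
    # Check confidence first: with little data behind it, the score
    # itself is not yet a reliable basis for a release decision.
    if confidence_level in ("Low", "Very Low"):
        return "intensify testing to collect more data"
    if aqs <= low_aqs_cutoff:
        return "consider holding the release"
    return "review issue details and trends before deciding"


print(suggested_next_step(92, "Very Low"))
print(suggested_next_step(60, "High"))
print(suggested_next_step(92, "High"))
```

Checking the CL before the score is a design choice in this sketch: a high AQS backed by very little testing is treated as a reason to test more, not as a green light.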