Updating a mission-critical application often evokes collective anxiety, because any failure or moment of downtime can disrupt the business. You need to make sure that your updates go off without a hitch. We recently put our near-zero-downtime update process for Pega Infinity™ to the test with our Global Operations Console. With proper planning, rigorous testing, and the engagement of both Pega Cloud’s Site Reliability Team and Pega Product Engineering, we updated the console from Pega Platform 8.5.3 to Pega Platform 8.6.0 in 14 business days without impacting client operations.
What is Pega’s Global Operations Console?
Pega’s Global Operations Console is the Pega application we built to operate Pega Cloud. It is Pega running Pega. The console serves as the management plane for Pega Cloud, governing and enabling our cloud service. Its functions include:
- Access control for operations management
- Audit and conflict detection
- Resource management
- Provisioning
- Decommissioning
- Maintenance (updates, patching, hotfixes, infrastructure, database, etc.)
- Software updates to add new and enhanced features
- Configuration management
- 24x7 alert monitoring
- Troubleshooting and diagnostics
- Client self-service (through My Pega Cloud)
Given the console’s role, this application is the definition of mission-critical for Pega. If the console goes down, My Pega Cloud goes down with it, and no Pega Cloud client can access self-service features such as restarts or log downloads.
Updating Pega on Pega Cloud
The console is a Pega Cloud application, just like any client application, so we follow the same update process that we recommend to our clients as best practice. Updates to the console must be completed quickly, with high quality, and without client downtime. This means we need to update the platform without disrupting critical operations such as patch installs, platform updates, infrastructure updates, and restarts.
The team followed these high-level steps:
- Request a clone of the staging environment.
- The Pega Cloud Site Reliability Team updates the cloned application to the latest version, in this case, from Pega Platform 8.5.3 to Pega Platform 8.6.0.
- The console team performs UAT on the updated, cloned application through automated regression testing.
- Any identified bugs in the updated application are addressed.
- The console team provides their Go/No Go decision to the site reliability team.
- The production environment is updated during the maintenance window.
- The lower environments (staging and development) are updated following production.
- The cloned environment is decommissioned.
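The ordering and gates in these steps can be sketched in a few lines of Python. This is a hypothetical model, not a Pega or Pega Cloud API: the `Step` enum, the `run_update` function, and its inputs are illustrative names assumed here. It encodes two rules from the process. First, the update halts before production while any bug found during UAT on the clone remains unresolved. Second, production is updated before the lower environments.

```python
from enum import Enum


class Step(Enum):
    # Hypothetical labels for the high-level steps, in the order listed above.
    CLONE_STAGING = "Request a clone of the staging environment"
    UPDATE_CLONE = "Site Reliability Team updates the clone (e.g. 8.5.3 -> 8.6.0)"
    UAT_ON_CLONE = "Console team runs UAT / regression tests on the clone"
    FIX_BUGS = "Address identified bugs"
    GO_NO_GO = "Console team gives its Go/No Go decision"
    UPDATE_PRODUCTION = "Update production during the maintenance window"
    UPDATE_LOWER_ENVS = "Update staging and development"
    DECOMMISSION_CLONE = "Decommission the cloned environment"


def run_update(bugs_found: list[str], fixed: set[str], go: bool) -> list[Step]:
    """Return the steps completed under this sketch's assumed gating rules.

    bugs_found: bug IDs identified during UAT on the clone (hypothetical input).
    fixed: the subset of those IDs that have been fixed and retested.
    go: the Go/No Go decision handed to the Site Reliability Team.
    """
    completed = [Step.CLONE_STAGING, Step.UPDATE_CLONE, Step.UAT_ON_CLONE]

    if bugs_found:
        completed.append(Step.FIX_BUGS)

    # Gate 1: the update is suspended while any identified bug is still open.
    if any(bug not in fixed for bug in bugs_found):
        return completed

    completed.append(Step.GO_NO_GO)

    # Gate 2: a No Go stops the process before production is touched.
    # (What happens to the clone after a No Go is not covered by the steps above.)
    if not go:
        return completed

    # Production first, then the lower environments, then clean up the clone.
    completed += [
        Step.UPDATE_PRODUCTION,
        Step.UPDATE_LOWER_ENVS,
        Step.DECOMMISSION_CLONE,
    ]
    return completed
```

For example, `run_update(["b1", "b2", "b3"], {"b1", "b2", "b3"}, go=True)` walks through all eight steps. With an unresolved bug, the same call stops at `Step.FIX_BUGS` and production is never updated. Keeping the gates as early returns makes it explicit that nothing reaches production until both conditions hold.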
What happened
The console application team kicked off the update process by requesting a clone of its pre-production (staging) environment. Once the environment was available, the team began testing. Testing on the cloned environment identified three bugs that would have affected production. Two were local bugs that the console team fixed and retested. The third was an issue that was reported to Pega’s product management team. Like other client-reported bugs, it was repaired through a branch update so that the update could move forward, and the fix was fully tested and incorporated into our first 8.6 patch release (8.6.1). Importantly, the update process was suspended while the bugs were addressed. Because testing took place in a cloned environment rather than an existing development environment, the development team continued work on its current sprint, unaffected by the update process.
One of the most important objectives of updating the console was to ensure that Pega Cloud clients could operate without impact. Client applications had to remain available, and patch installs, platform updates, and restarts had to proceed normally throughout the update period. Pega Cloud’s Site Reliability Team, which is responsible for cloud operations, even performed infrastructure updates on Pega Cloud during that period. Most importantly, Pega Cloud clients did not experience any production impact.
During the update process, console users could create and read cases, and all pre-scheduled background processing activities continued without interruption.
Best Practices
Achieving seamless updates for mission-critical applications becomes a repeatable process when you follow proven best practices. Pega’s team followed these key principles throughout the process:
- Plan ahead. It goes without saying, but keeping platform users informed of upcoming maintenance through in-application messaging was critical to preparing them.
- Build a comprehensive user acceptance test plan, save it, and build on it as your applications evolve. Testing should include automated tests, exploratory testing, and manual testing.
- Delay feature work during the test period.
- Validate that the cloned systems are configured to work with external systems impacting your application.
- Halt the update until bugs are addressed. Pega will work with you to determine the best way to fix each issue promptly.
- Update using Pega’s clone environment methodology. This approach is enabled in part by the cloned systems that Pega Cloud provides for software updates. It benefits both the duration of your update process and the effectiveness of your development teams during the update. From a time perspective, updating lower environments first often means waiting for the next available maintenance window before you can update production. Across Pega Cloud clients, those that elect to update lower environments first see a 200% increase in the duration of their update process. Second, production-first updates do not impact developer productivity, because Pega regression-tests each release for backward compatibility.
By following these best practices, you can make significant updates to your applications without impacting operations.
Visit Pega Community for more information about keeping your Pega solutions up to date, in both Pega Cloud Services and on-premises environments.