How Picsart Turned Pre-Production Alert Noise into Automated Reliability Improvements

Customer Story: Picsart

Turning Operational Signals into Continuous Infrastructure Improvements.

Picsart is the world’s largest AI-powered creative platform for photo editing, video editing, and graphic design. With more than 130 million monthly active users, it serves as a comprehensive tool for creating social media content, digital art, and marketing materials.

picsart.com

1,000+ employees

Miami, FL (HQ)

Konstantin Lalafaryan

Chief Information Officer (CIO)

Picsart

Modern cloud platforms generate enormous amounts of operational data, but the real challenge is turning those signals into improvements. OpsWorker helped us convert operational findings directly into code-level fixes, allowing our teams to continuously strengthen the platform.

Key Results

90% reduction in low-value alert noise
100% of prioritized alerts converted into investigation actions
60% reduction in manual infrastructure fixes
30% reduction in over-provisioned K8s resources

Picsart operates a large cloud-native platform that powers creative tools used by millions of users worldwide. The platform consists of hundreds of microservices running on Kubernetes and managed by platform engineering teams responsible for reliability, scalability, and operational standards.

To support developer productivity and platform governance, Picsart relies on Git-based workflows with GitLab and uses Backstage as a centralized developer portal and service catalog.

As the platform grew, so did operational complexity. Hundreds of alerts were generated daily in pre-production environments, making it increasingly difficult to distinguish between real issues and low-value noise. At the same time, platform teams were spending a significant amount of time performing repetitive configuration fixes and enforcing platform standards across repositories.

To address these challenges, Picsart partnered with OpsWorker, an AI SRE platform designed to transform operational signals into automated infrastructure improvements.

Key Use Cases at Picsart

Converting Alerts into Infrastructure Fixes

Alerts become triggers for permanent system improvements instead of temporary band-aids.

Correcting misconfigured alert rules
Adjusting alert thresholds intelligently
Improving service resilience configuration
Resolving dependency-related issues

Kubernetes Resource Optimization

Identify opportunities to optimize Kubernetes resource configuration based on historical behavior.

Adjusting CPU and memory limits
Configuring Horizontal Pod Autoscaling (HPA)
Introducing Vertical Pod Autoscaling (VPA)
Analyzing traffic spikes & workload patterns

Platform Standardization Drift

Continuously analyze service configuration and compare it with established platform practices.

Propose PRs to align with Backstage templates
Maintain consistent standards across teams
Eliminate manual audits of repositories
Automate compliance enforcement

The Challenge: When Alerts Become Noise

Like many rapidly scaling cloud-native platforms, Picsart relied heavily on monitoring and alerting systems to maintain operational visibility.

However, over time the pre-production environment began generating hundreds of alerts daily. Many of these alerts were triggered by configuration issues, threshold misalignment, or temporary service conditions during testing.

This created a classic operational challenge: alert fatigue. When most alerts are not actionable, engineers gradually stop paying attention to them. Alerts lose credibility as signals, and teams begin ignoring large portions of monitoring output.

This creates several risks:

• Important alerts may be missed
• Pre-production issues remain unresolved
• Configuration problems propagate into production
• Platform engineers spend time manually triaging repetitive operational issues

Missed opportunities:

Many alerts represented small but necessary improvements, such as:

• adjusting alert thresholds
• fixing service configuration
• improving resilience settings
• updating scaling policies

Implementing these improvements required engineers to manually investigate the issue, determine the fix, and create pull requests across repositories. This process created significant operational toil for platform teams.

The Solution: Converting Alerts into Infrastructure Improvements

Alert → AI Investigation → Root Cause → Pull Request → System Improvement

When an alert occurs, OpsWorker automatically investigates the underlying signals, correlating telemetry, infrastructure state, and service dependencies. If the system identifies a configuration issue or improvement opportunity, it generates a pull request containing the recommended fix.

Engineers can review and merge the change through their existing Git workflows.

To maintain governance and control, platform teams define which namespaces allow automatic pull request generation and which receive only suggestions that require manual approval.
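The namespace-level control described above can be pictured as a small policy lookup. The following is a minimal sketch, not OpsWorker's actual configuration or API: the `Mode` enum, the `Finding` type, the `NAMESPACE_POLICY` table, and the namespace names are all hypothetical, and the sketch assumes that a namespace without an explicit policy falls back to the more conservative mode.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    AUTO_PR = "auto_pr"        # a pull request is opened automatically
    SUGGEST_ONLY = "suggest"   # a suggestion is posted; a human decides


@dataclass(frozen=True)
class Finding:
    namespace: str
    service: str
    summary: str


# Hypothetical policy table maintained by the platform team.
NAMESPACE_POLICY = {
    "staging": Mode.AUTO_PR,
    "payments-preprod": Mode.SUGGEST_ONLY,
}


def route_finding(finding: Finding, policy=NAMESPACE_POLICY) -> Mode:
    """Pick the delivery mode for a finding.

    Namespaces missing from the policy table get the safer default
    (suggestion only), which is an assumption of this sketch.
    """
    return policy.get(finding.namespace, Mode.SUGGEST_ONLY)


finding = Finding("staging", "image-resizer", "alert threshold too low")
print(route_finding(finding).value)
```

Defaulting unknown namespaces to suggestion-only keeps a newly created namespace from receiving automated changes before the platform team has reviewed it.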

Instead of ignoring alerts, the system converts them into actionable improvements that strengthen the platform over time.

Use Case 2: Kubernetes Resource Optimization

As Picsart’s platform expanded, managing Kubernetes resource efficiency became increasingly complex.

Different services had varying runtime characteristics, traffic patterns, and scaling requirements. Over time this led to inconsistent resource allocation across workloads.

Some services were significantly over-provisioned, consuming more CPU and memory than required, while others lacked appropriate scaling configuration. This created both infrastructure inefficiencies and potential reliability risks during traffic spikes.

Manually optimizing Kubernetes resources across hundreds of services proved difficult for platform teams. It required analyzing historical telemetry, understanding workload behavior, and adjusting scaling configurations across many repositories.

OpsWorker continuously analyzes:

historical resource utilization
workload patterns and traffic spikes
runtime behavior
service dependency patterns

Recommendations generated:

adjusting CPU and memory limits
configuring Horizontal Pod Autoscaling (HPA)
introducing Vertical Pod Autoscaling (VPA)

These improvements are delivered as pull requests, allowing platform engineers to review and merge resource optimization changes directly through Git workflows.

This approach allows Kubernetes resource configuration to evolve continuously based on real operational data.
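To make the idea of deriving resource settings from historical behavior concrete, here is a minimal sketch of one common right-sizing heuristic: take a high percentile of observed usage and add headroom. This is an illustration only; the source does not describe how OpsWorker computes its recommendations, and the function names, the 95th percentile, and the 25% headroom are assumptions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers (pct is 1-100)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_cpu_request(usage_millicores, current_request, pct=95, headroom=1.25):
    """Suggest a CPU request in millicores from observed usage plus headroom."""
    target = math.ceil(percentile(usage_millicores, pct) * headroom)
    return {
        "current": current_request,
        "recommended": target,
        "over_provisioned": current_request > target,
    }


# Hypothetical usage samples (millicores) for one workload.
usage = [100, 120, 110, 130, 200, 115, 105, 125, 118, 122]
print(recommend_cpu_request(usage, current_request=1000))
```

A result like this would then be expressed as a change to the workload's manifest and proposed as a pull request, so the engineer reviewing it can see both the current and the suggested value.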

Use Case 3: Platform Standardization Drift

As the number of services grew, platform standardization drift began appearing across repositories.

Without automated enforcement mechanisms, services gradually diverged from recommended platform practices. Differences appeared in areas such as:

configuration structure
scaling policies
resilience settings
service dependency configuration

Maintaining consistent platform standards across hundreds of repositories became increasingly difficult for platform engineering teams.

OpsWorker continuously analyzes service configuration and compares it with platform practices defined in Backstage and internal engineering guidelines.

When deviations are detected, the system proposes pull requests to align services with recommended configurations.

This allows platform teams to maintain consistent standards across the platform without requiring manual audits of hundreds of repositories.
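Drift detection of this kind amounts to comparing each service's configuration against a baseline. The sketch below shows one simple way to do that with nested dictionaries. It is an assumption-laden illustration rather than OpsWorker's method: `find_drift` and the example keys are hypothetical, and a real baseline would come from the platform's Backstage templates and engineering guidelines.

```python
def find_drift(baseline, actual, path=""):
    """Return (path, expected, found) for every baseline setting that differs.

    Keys present only in the service config are ignored: the baseline
    defines what is required, not everything that is allowed.
    """
    drift = []
    for key, expected in baseline.items():
        where = f"{path}.{key}" if path else key
        if key not in actual:
            drift.append((where, expected, None))
        elif isinstance(expected, dict) and isinstance(actual[key], dict):
            drift.extend(find_drift(expected, actual[key], where))
        elif actual[key] != expected:
            drift.append((where, expected, actual[key]))
    return drift


# Hypothetical baseline and service configuration.
baseline = {
    "replicas": 2,
    "resources": {"limits": {"memory": "512Mi"}},
    "probes": {"readiness": True},
}
service = {
    "replicas": 2,
    "resources": {"limits": {"memory": "2Gi"}},
}

for where, expected, found in find_drift(baseline, service):
    print(f"{where}: expected {expected!r}, found {found!r}")
```

Each reported path maps naturally to one line of a proposed change, which keeps the resulting pull request small and easy to review.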

Continuous Reliability Improvement

Operational Signals → AI Investigation → Pull Request generated → Stronger Platform

Instead of treating alerts as temporary signals requiring manual intervention, the platform now converts operational findings into reviewable code improvements.

This creates a continuous improvement loop where the platform becomes more resilient over time, entirely driven by real operational data.

“With OpsWorker, Picsart transformed operational signals into a scalable mechanism for continuously improving platform reliability, efficiency, and engineering productivity.”

Ready to start?

Shape the Future of AI-Driven SRE

Turn operational signals into continuous reliability improvements. Join forward-thinking engineering teams today.

Key Results

Key Use Cases at Picsart

Converting Alerts into Infrastructure Fixes

Kubernetes Resource Optimization

Platform Standardization Drift

The Challenge: When Alerts Become Noise

This creates several risks:

Missed opportunities:

The Solution: Converting Alerts into Infrastructure Improvements

Use Case 2: Kubernetes Resource Optimization

OpsWorker continuously analyzes:

Recommendations generated:

Use Case 3: Platform Standardization Drift

Continuous Reliability Improvement
