
How Picsart Turned Pre-Production Alert Noise into Automated Reliability Improvements

Customer Story: Picsart

Turning Operational Signals into Continuous Infrastructure Improvements.

Picsart

The world’s largest AI-powered creative platform for photo editing, video editing, and graphic design. With more than 130 million monthly active users, it serves as a comprehensive tool for creating social media content, digital art, and marketing materials.
picsart.com
1,000+ employees
Miami, FL (HQ)
Konstantin Lalafaryan

Chief Information Officer (CIO), Picsart

Modern cloud platforms generate enormous amounts of operational data, but the real challenge is turning those signals into improvements. OpsWorker helped us convert operational findings directly into code-level fixes, allowing our teams to continuously strengthen the platform.

Key Results

Measurable improvements across platform operations.

  • 90% reduction in low-value alert noise
  • 100% of prioritized alerts converted into investigation actions
  • 60% reduction in manual infrastructure fixes
  • 30% reduction in over-provisioned Kubernetes resources

Picsart operates a large cloud-native platform that powers creative tools used by millions of users worldwide. The platform consists of hundreds of microservices running on Kubernetes and managed by platform engineering teams responsible for reliability, scalability, and operational standards.

To support developer productivity and platform governance, Picsart relies on Git-based workflows with GitLab and uses Backstage as a centralized developer portal and service catalog.

As the platform grew, so did operational complexity. Hundreds of alerts were generated daily in pre-production environments, making it increasingly difficult to distinguish between real issues and low-value noise. At the same time, platform teams were spending a significant amount of time performing repetitive configuration fixes and enforcing platform standards across repositories.

To address these challenges, Picsart partnered with OpsWorker, an AI SRE platform designed to transform operational signals into automated infrastructure improvements.

Key Use Cases at Picsart

How OpsWorker drives value across the engineering organization.

Converting Alerts into Infrastructure Fixes

Alerts become triggers for permanent system improvements instead of temporary band-aids.

  • Correcting misconfigured alert rules
  • Adjusting alert thresholds intelligently
  • Improving service resilience configuration
  • Resolving dependency-related issues

Kubernetes Resource Optimization

Identify opportunities to optimize Kubernetes resource configuration based on historical behavior.

  • Adjusting CPU and memory limits
  • Configuring Horizontal Pod Autoscaling (HPA)
  • Introducing Vertical Pod Autoscaling (VPA)
  • Analyzing traffic spikes & workload patterns

Platform Standardization Drift

Continuously analyze service configuration and compare it with established platform practices.

  • Propose PRs to align with Backstage templates
  • Maintain consistent standards across teams
  • Eliminate manual audits of repositories
  • Automate compliance enforcement

The Challenge: When Alerts Become Noise

Like many rapidly scaling cloud-native platforms, Picsart relied heavily on monitoring and alerting systems to maintain operational visibility.

However, over time the pre-production environment began generating hundreds of alerts daily. Many of these alerts were triggered by configuration issues, threshold misalignment, or temporary service conditions during testing.

This created a classic operational challenge: alert fatigue. When most alerts are not actionable, engineers gradually stop paying attention to them. Alerts lose credibility as signals, and teams begin ignoring large portions of monitoring output.

This creates several risks:

  • Important alerts may be missed
  • Pre-production issues remain unresolved
  • Configuration problems propagate into production
  • Platform engineers spend time manually triaging repetitive operational issues

Missed opportunities:

Many alerts represented small but necessary improvements, such as:

  • adjusting alert thresholds
  • fixing service configuration
  • improving resilience settings
  • updating scaling policies

Implementing these improvements required engineers to manually investigate the issue, determine the fix, and create pull requests across repositories. This process created significant operational toil for platform teams.
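To illustrate the "determine the fix" step for one of the categories above (threshold misalignment), the sketch below derives a candidate alert threshold from historical metric samples and measures how often the current and proposed thresholds would have fired. It is a hypothetical heuristic, not OpsWorker's method: the nearest-rank percentile, the 99th-percentile default, and the 1.2x headroom factor are assumptions chosen for the example.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile (q in (0, 100]) of a non-empty sample."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def suggest_threshold(history: Sequence[float], q: float = 99.0,
                      headroom: float = 1.2) -> float:
    """Hypothetical heuristic: place the threshold above the q-th percentile
    of historical values, scaled by a multiplicative headroom factor."""
    return percentile(history, q) * headroom


def firing_rate(history: Sequence[float], threshold: float) -> float:
    """Fraction of historical samples that would have exceeded the threshold."""
    return sum(v > threshold for v in history) / len(history)
```

A proposed threshold whose historical firing rate is near zero, next to a noticeably higher rate for the current threshold, is the kind of evidence a reviewer could check in a threshold-change pull request.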

Georgy Khachatryan

Head of SRE, Picsart

The most valuable shift was operational discipline. Alerts are no longer something we ignore. They now trigger investigations and improvements that are captured as pull requests, making reliability improvements part of our engineering workflow.

The Solution: Converting Alerts into Infrastructure Improvements

Picsart integrated OpsWorker into its operational and development workflows to transform how platform reliability improvements are discovered and implemented.

OpsWorker connects to Kubernetes environments, monitoring systems, Git repositories, and developer portals to continuously analyze operational signals and propose improvements.

The core operational workflow introduced by OpsWorker is simple but powerful:

Alert → AI Investigation → Root Cause → Pull Request → System Improvement

When an alert occurs, OpsWorker automatically investigates the underlying signals, correlating telemetry, infrastructure state, and service dependencies. If the system identifies a configuration issue or improvement opportunity, it generates a pull request containing the recommended fix.

Engineers can review and merge the change through their existing Git workflows.

To maintain governance and control, platform teams define the namespaces in which OpsWorker may open pull requests automatically and those in which fixes are only offered as suggestions that require manual approval.
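The investigation outcome and the namespace governance rule can be summarized as a small routing decision. The sketch below is hypothetical: the Finding and NamespacePolicy types, the glob-pattern matching of namespaces, and the three outcome labels are illustrative names, not OpsWorker's API.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """Hypothetical result of an automated alert investigation."""
    alert_name: str
    namespace: str
    root_cause: str
    proposed_patch: Optional[str]  # None when no configuration-level fix was identified


@dataclass(frozen=True)
class NamespacePolicy:
    """Hypothetical governance policy: glob patterns for namespaces where
    pull requests may be opened without a prior approval step."""
    auto_pr_patterns: tuple[str, ...]

    def allows_auto_pr(self, namespace: str) -> bool:
        # Case-sensitive glob match, independent of the host OS.
        return any(fnmatchcase(namespace, p) for p in self.auto_pr_patterns)


def route(finding: Finding, policy: NamespacePolicy) -> str:
    """Decide what happens to a finding after investigation."""
    if finding.proposed_patch is None:
        # Root cause is recorded, but there is no code change to propose.
        return "report-only"
    if policy.allows_auto_pr(finding.namespace):
        # Opted-in namespace: open a pull request; engineers still review and merge it.
        return "open-pull-request"
    # Otherwise surface the fix as a suggestion that needs manual approval.
    return "suggest-for-approval"
```

Keeping the policy as data (a list of patterns) rather than code means platform teams can widen or narrow automatic pull request generation without changing the routing logic.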

Instead of ignoring alerts, the system converts them into actionable improvements that strengthen the platform over time.

Use Case 2: Kubernetes Resource Optimization

As Picsart’s platform expanded, managing Kubernetes resource efficiency became increasingly complex.

Different services had varying runtime characteristics, traffic patterns, and scaling requirements. Over time, this led to inconsistent resource allocation across workloads.

Some services were significantly over-provisioned, consuming more CPU and memory than required, while others lacked appropriate scaling configuration. This created both infrastructure inefficiencies and potential reliability risks during traffic spikes.

Manually optimizing Kubernetes resources across hundreds of services proved difficult for platform teams. It required analyzing historical telemetry, understanding workload behavior, and adjusting scaling configurations across many repositories.

OpsWorker continuously analyzes:

  • historical resource utilization
  • workload patterns and traffic spikes
  • runtime behavior
  • service dependency patterns
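One way to turn historical utilization into concrete CPU and memory values is a percentile-based rightsizing rule, sketched below. This is an assumed heuristic rather than OpsWorker's documented algorithm: requests follow the 95th percentile of observed usage plus 25% headroom, and the memory limit follows the observed peak plus the same headroom, because a container that exceeds its memory limit is killed rather than throttled. Units (millicores, MiB) and all function names are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def nearest_rank_percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile (q in (0, 100]) of a non-empty sample."""
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


@dataclass(frozen=True)
class Recommendation:
    cpu_request_m: int      # CPU request in millicores
    memory_request_mi: int  # memory request in MiB
    memory_limit_mi: int    # memory limit in MiB


def recommend(cpu_m: Sequence[float], mem_mi: Sequence[float],
              q: float = 95.0, headroom: float = 1.25) -> Recommendation:
    """Hypothetical rightsizing rule: requests track the q-th percentile of
    observed usage; the memory limit tracks the observed peak."""
    if not cpu_m or not mem_mi:
        raise ValueError("need usage samples for both CPU and memory")
    return Recommendation(
        cpu_request_m=math.ceil(nearest_rank_percentile(cpu_m, q) * headroom),
        memory_request_mi=math.ceil(nearest_rank_percentile(mem_mi, q) * headroom),
        memory_limit_mi=math.ceil(max(mem_mi) * headroom),
    )


def over_provisioned_ratio(current_request: float, recommended: float) -> float:
    """How many times larger the current request is than the recommendation."""
    return current_request / recommended
```

With 20 samples where CPU usage sits at 100m and briefly reaches 200m, this rule recommends a 125m request; a service currently requesting 500m would show a 4x over-provisioning ratio.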

The resulting recommendations include:

  • adjusting CPU and memory limits
  • configuring Horizontal Pod Autoscaling (HPA)
  • introducing Vertical Pod Autoscaling (VPA)
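For an HPA recommendation, the pull request content would be a HorizontalPodAutoscaler manifest. The sketch below builds an autoscaling/v2 manifest that scales a Deployment on average CPU utilization, and includes the core replica calculation from the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), clamped to the configured bounds. It omits the controller's tolerance band and stabilization windows, and the deployment name, namespace, replica bounds, and CPU target are hypothetical inputs.

```python
import json
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int,
                     max_replicas: int) -> int:
    """Core HPA rule: ceil(current * current_metric / target_metric), clamped.
    The real controller's tolerance and stabilization windows are omitted."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(raw, max_replicas))


def hpa_manifest(deployment: str, namespace: str, min_replicas: int,
                 max_replicas: int, cpu_target_percent: int) -> dict:
    """Build an autoscaling/v2 HPA for a Deployment; utilization is measured
    relative to the pods' CPU requests."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("require 1 <= min_replicas <= max_replicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": deployment, "namespace": namespace},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment",
                               "name": deployment},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {"name": "cpu",
                             "target": {"type": "Utilization",
                                        "averageUtilization": cpu_target_percent}},
            }],
        },
    }


if __name__ == "__main__":
    # JSON is a valid Kubernetes manifest format; a repository might store YAML instead.
    print(json.dumps(hpa_manifest("search-api", "preprod", 2, 10, 70), indent=2))
```

For example, four replicas running at 90% average CPU against a 60% target would scale to six under this rule.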

These improvements are delivered as pull requests, allowing platform engineers to review and merge resource optimization changes directly through Git workflows.

This approach allows Kubernetes resource configuration to evolve continuously based on real operational data.

Use Case 3: Platform Standardization Drift

As the number of services grew, platform standardization drift began appearing across repositories.

Without automated enforcement mechanisms, services gradually diverged from recommended platform practices. Differences appeared in areas such as:

  • configuration structure
  • scaling policies
  • resilience settings
  • service dependency configuration

Maintaining consistent platform standards across hundreds of repositories became increasingly difficult for platform engineering teams.

OpsWorker continuously analyzes service configuration and compares it with platform practices defined in Backstage and internal engineering guidelines.

When deviations are detected, the system proposes pull requests to align services with recommended configurations.
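The comparison step can be pictured as a recursive diff of a service's configuration against a template. In this hypothetical sketch, both are plain nested dictionaries (for example, parsed from YAML); keys that a service adds beyond the template are allowed, while missing or overridden template values are reported as drift. This story does not describe how OpsWorker reads Backstage templates or internal guidelines, so the data model and function names here are assumptions.

```python
from typing import Any, Iterator, Tuple


class _Missing:
    """Sentinel marking a template key that the service configuration lacks."""

    def __repr__(self) -> str:
        return "<missing>"


MISSING = _Missing()


def diff_against_template(template: Any, actual: Any,
                          path: str = "") -> Iterator[Tuple[str, Any, Any]]:
    """Yield (path, expected, actual) for each template value the service
    config lacks or overrides. Keys present only in the service config are
    treated as allowed extensions; non-dict values (including lists) are
    compared as a whole."""
    if isinstance(template, dict):
        if not isinstance(actual, dict):
            yield (path or "/", template, actual)
            return
        for key, expected in template.items():
            child = f"{path}/{key}"
            if key not in actual:
                yield (child, expected, MISSING)
            else:
                yield from diff_against_template(expected, actual[key], child)
    elif template != actual:
        yield (path or "/", template, actual)
```

Each yielded entry maps naturally onto one change in a proposed alignment pull request, which keeps the pull request description traceable back to the template.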

This allows platform teams to maintain consistent standards across the platform without requiring manual audits of hundreds of repositories.

Continuous Reliability Improvement

By integrating operational intelligence directly into development workflows, Picsart introduced a new model for maintaining platform reliability.

Operational Signals → AI Investigation → Pull Request Generated → Stronger Platform

Instead of treating alerts as temporary signals requiring manual intervention, the platform now converts operational findings into reviewable code improvements.

This creates a continuous improvement loop where the platform becomes more resilient over time, entirely driven by real operational data.

“With OpsWorker, Picsart transformed operational signals into a scalable mechanism for continuously improving platform reliability, efficiency, and engineering productivity.”
OpsWorker © 2026. All Rights Reserved