All Case Studies
Cloud / Storage · 18 days

Cloud File Hosting: 6 PB Data and 99.99% Availability

A major international cloud file hosting was growing faster than its infrastructure. Manual server management with 3M users became a critical risk. We migrated everything to IaC, auto-scaling, and CI/CD — in 18 days.

TerraformPackerAWS S3Load BalancersAuto ScalingGitLab CIMySQLPrometheusGrafana
6 PB
Data at 99.99% availability
3M
Active users
12K
Concurrent connections
1.5 Gbps
Peak traffic
-75%
Deployment time
-30%
Operational costs

Problem

The service served 3 million users and stored petabytes of data, but infrastructure was managed manually. Each deployment took up to 4 hours and required manual engineer intervention.

During peak loads, scaling took 2-3 hours. The risk of human error when working with production data was unacceptably high.

  • Manual management of 50+ servers
  • No auto-scaling for peak loads
  • 4-hour deploys, downtime risk
  • No unified monitoring

Solution

We migrated all infrastructure to Terraform: every server, load balancer, S3 bucket is described as code. Packer images ensure environment consistency. Auto Scaling Groups respond to load automatically.

GitLab CI/CD provides zero-downtime deploys via rolling updates. Prometheus + Grafana give full visibility into 12K+ concurrent connections in real time.

  • Terraform IaC for all infrastructure
  • Auto Scaling Groups + Packer images
  • GitLab CI/CD, 60-minute deploys
  • Prometheus + Grafana monitoring

Implementation Timeline

Phase 1
3 days

Audit & Architecture Design

  • Current infrastructure audit and bottleneck analysis
  • Target IaC architecture design
  • Auto-scaling strategy definition
Phase 2
7 days

IaC & Auto Scaling

  • Terraform modules for full infrastructure
  • Packer images for auto-scaling groups
  • Load Balancer and health check configuration
Phase 3
5 days

CI/CD & Storage

  • GitLab CI/CD pipelines for zero-downtime deploys
  • AWS S3 and PDS integration for petabyte storage
  • MySQL optimization for high-load queries
Phase 4
3 days

Monitoring & Optimization

  • Prometheus + Grafana dashboards for 12K+ connections
  • Performance degradation alerts
  • Cost optimization: Reserved Instances + Spot

Before & After

MetricBeforeAfterChange
New version deployment~4 hours (manual)~60 minutes (CI/CD)-75%
Peak load scalingManual, 2-3 hoursAuto, 5-10 minutes-95%
Service availability99.5%99.99%+0.49%
Operational costsBaseline-30% of baseline-30%
Concurrent connections~3K12K++4x

Key Architecture Decisions

🏗️

IaC-First Approach

100% of infrastructure described in Terraform. Any server can be recreated in minutes. Zero manually created resources.

📦

Immutable Images

Packer builds ready AMI images with pre-installed software. Auto Scaling Group launches identical instances — no configuration drift.

🗄️

Tiered Storage

Hot data on SSD instances, cold data in S3. PDS for object storage. MySQL with read replicas for metadata. Result: 6 PB at minimal cost.

"We didn't believe it was possible to migrate a petabyte storage to IaC in 18 days without a single hour of downtime. Now our deployment takes one hour instead of four, and scaling happens automatically."

— CTO, International Cloud File Hosting

Similar challenge?

Tell us about your infrastructure — we'll propose a concrete plan within 24 hours.