Success Story | Intelligence

Modernizing Data Orchestration and Warehouse for a Technical Support Company

Our client is a technical support company serving businesses and consumers. Its services cover Windows, macOS, iOS, and Android, as well as connected and smart devices, and are delivered by the company's remote, full-time workforce based mainly in the U.S.

How Did It All Start

As a technical support company, the client relies heavily on insights from data generated by logs, call center conversations, notes, support pages, and other sources to make better business decisions. They built their data architecture on Amazon Web Services during their first years of existence, with Amazon Redshift as their data warehouse. Their current process loads and transforms data into the warehouse six times a day from sources such as the website, call center data, relational databases, and third-party APIs. The business is growing fast, and so are its data storage and compute needs.

Our client started to have performance issues, including long query wait times, crashes, and lockups. To keep things moving, they had to restart their Redshift cluster, kick off manual runs of Apache Airflow, manually cancel queries, and limit the scheduled runs in Airflow.

Our client wanted to consider new architectures for their ETL processes that would allow them to scale faster and at lower cost. They asked TVS Next to run an assessment and proof of concept (PoC) of a new workload orchestration service and the Snowflake Data Warehouse, and to compare them with their current technology, Amazon Redshift.

What Did We Do

We devised a plan to benchmark the Redshift-to-Snowflake migration on the following criteria:

Estimate the effort to translate the SQL scripts:

Snowflake and Redshift both support very similar dialects of ANSI SQL, and Snowflake, for the most part, supports a superset of the functionality included in Redshift. Our goal here was to detect the unexpected inconsistencies that only surface when you get your hands dirty.
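
As a rough illustration of that validation step, the sketch below runs each original query and its translation against the two warehouses and compares the results. It is a minimal sketch rather than our actual harness: the connection details are placeholders, the QUERIES list would in practice be generated from the client's translated ETL scripts, and it assumes the psycopg2 and snowflake-connector-python packages.

    import psycopg2                 # Redshift speaks the PostgreSQL wire protocol
    import snowflake.connector      # official Snowflake connector for Python

    # Hypothetical connection details -- substitute real credentials.
    redshift = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="analytics", user="etl_user", password="secret",
    )
    snowflake_conn = snowflake.connector.connect(
        account="example_account", user="etl_user", password="secret",
        warehouse="BENCH_WH", database="ANALYTICS", schema="PUBLIC",
    )

    # Pairs of (original Redshift SQL, translated Snowflake SQL).
    # Placeholder pair shown; the real list comes from the translated scripts.
    QUERIES = [
        ("SELECT COUNT(*) FROM events", "SELECT COUNT(*) FROM events"),
    ]

    for rs_sql, sf_sql in QUERIES:
        with redshift.cursor() as rs_cur:
            rs_cur.execute(rs_sql)
            rs_rows = rs_cur.fetchall()
        sf_cur = snowflake_conn.cursor()
        try:
            sf_cur.execute(sf_sql)
            sf_rows = sf_cur.fetchall()
        finally:
            sf_cur.close()
        print("OK" if rs_rows == sf_rows else "MISMATCH", "--", sf_sql)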

Do a performance test:

Measure query response times for the different pipelines used in the client's environment and compare them against Redshift's performance.
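
A simple way to collect those timings, under the same hypothetical connections as the previous sketch, is to wrap each execution in a wall-clock timer and take the median of several runs so one-off outliers do not dominate:

    import statistics
    import time

    def time_query(conn, sql, runs=5):
        """Return the median wall-clock execution time in seconds."""
        samples = []
        for _ in range(runs):
            cur = conn.cursor()
            start = time.perf_counter()
            cur.execute(sql)
            cur.fetchall()              # include fetching in the measurement
            samples.append(time.perf_counter() - start)
            cur.close()
        return statistics.median(samples)

    # Hypothetical usage: compare one pipeline query on both warehouses.
    # print("Redshift :", time_query(redshift, rs_sql))
    # print("Snowflake:", time_query(snowflake_conn, sf_sql))

In practice, Snowflake's result cache should be disabled for such runs (ALTER SESSION SET USE_CACHED_RESULT = FALSE) so that repeat executions measure compute rather than cache hits.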

Estimate the cost of migrating the data warehouse:

The Snowflake pay-per-use pricing model is very different from Redshift's fixed pricing, so we carried out a dedicated cost/performance analysis to estimate the cost of running the client's entire ETL.
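
Snowflake bills per second of warehouse uptime, with a 60-second minimum each time a warehouse resumes, at a credits-per-hour rate that doubles with each warehouse size; the price per credit depends on edition and region. The sketch below shows the resulting back-of-the-envelope arithmetic; the per-credit price and the workload figures are hypothetical.

    # Back-of-the-envelope Snowflake compute cost, a minimal sketch.
    CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}
    PRICE_PER_CREDIT = 3.00  # hypothetical USD figure; varies by edition/region

    def run_cost(size, seconds):
        """Cost of one warehouse run, with Snowflake's 60-second minimum."""
        billable = max(seconds, 60.0)       # per-second billing, 60s minimum
        credits = CREDITS_PER_HOUR[size] * billable / 3600.0
        return credits * PRICE_PER_CREDIT

    # Six daily loads of a 20-minute ETL on a Medium warehouse (hypothetical):
    daily = 6 * run_cost("M", 20 * 60)
    print(f"~${daily:.2f}/day, ~${daily * 30:.2f}/month")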

This benchmarking helped us to:

Iron out SQL syntax issues early

Determine the optimal warehouse size, allowing us to execute the jobs in less time at a similar cost (see the sizing sketch after this list)

Measure ETL execution time across Snowflake editions (Standard vs. Enterprise) and warehouse sizes by weighing cost against time

Execute parallel queries in a multi-cluster environment
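
To explore that sizing question, a loop like the one below resizes a warehouse between runs and times the same job at each size. It reuses the hypothetical snowflake_conn, time_query, and run_cost helpers from the earlier sketches, and ETL_JOB_SQL is a placeholder for one of the client's pipeline queries; only the ALTER WAREHOUSE statement is standard Snowflake DDL.

    # Time one representative ETL query at several warehouse sizes.
    ETL_JOB_SQL = "..."  # placeholder for one of the client's pipeline queries

    SIZES = {"XSMALL": "XS", "SMALL": "S", "MEDIUM": "M", "LARGE": "L"}
    for alter_value, cost_key in SIZES.items():
        cur = snowflake_conn.cursor()
        # Resizing between runs is plain Snowflake DDL.
        cur.execute(f"ALTER WAREHOUSE BENCH_WH SET WAREHOUSE_SIZE = '{alter_value}'")
        cur.close()
        seconds = time_query(snowflake_conn, ETL_JOB_SQL, runs=3)
        print(f"{alter_value:>6}: {seconds:7.1f}s  ~${run_cost(cost_key, seconds):.2f}/run")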

Cost and performance were essential for the client, but maintaining an EC2-based Airflow deployment was also hard, and migrating the orchestration to AWS Step Functions proved straightforward. The existing monolithic Python ETL scripts were migrated to AWS Lambda, and the single-instance Docker containers running on EC2 were moved to Amazon ECS.
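
To give a flavor of that refactoring, one step of a monolithic script might become a small Lambda handler like the sketch below, with Step Functions passing state between steps. The extract and load_to_snowflake helpers and the event fields are hypothetical stand-ins, not the client's actual code.

    def extract(source):
        """Hypothetical stand-in for one extraction step of the old script."""
        return [{"source": source, "value": 42}]

    def load_to_snowflake(rows):
        """Hypothetical stand-in for the load step; returns rows written."""
        return len(rows)

    def handler(event, context):
        """AWS Lambda entry point. Step Functions supplies `event` and
        feeds the return value to the next state in the workflow."""
        rows = extract(event["source"])
        written = load_to_snowflake(rows)
        return {"source": event["source"], "rows_written": written}

Each such handler becomes a Task state in the Step Functions state machine, so the orchestration logic that previously lived in Airflow DAGs moves into the state machine definition.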

Architecture

The Business Outcome

Instant availability of new data

Lower query wait times

Less report downtime

Ability to run data workloads multiple times a day and copy data to Snowflake

91%

reduction in Airflow workflow completion time to copy data into Snowflake

Speak to an expert today

We have the expertise to help enterprises of all sizes take on their greatest challenges.

Let us show you how our services can help you create a better experience for your customers.