Success Story | Intelligence

Data Lake Modernization to drive cost-efficiency and faster decision-making for a retail customer

Our client is one of the largest and most valuable free membership loyalty programs in the world, connecting savvy shoppers with more than 3,500 online merchants and services to give them a hassle-free way to save on everyday purchases. Founded in 1997, the client has nearly 15 million active members who have earned over $1.5 billion in cash back rewards.

The client runs insights and analytics on its current on-premises Hadoop platform, leveraging data from disparate sources on shopper behaviour, pricing, commissions, and more to create compelling offers and a satisfying service for its customers.

As multiple business units began running various workloads and on-demand/ad-hoc analytics on the platform, it started to take a toll on disks and processors, pushing the cluster toward its maximum capacity.

As data volume grew, demand on platform resources such as disk access, memory, and CPU time from resource-intensive processes across business units kept increasing. The existing environment could not scale to meet these needs, and operations engineers were spending most of their time troubleshooting hardware and service failures. The client needed a new environment for its data, and a new way to query it, that would ensure efficient and responsive analysis during peak demand periods.

How Did It All Start

TVS Next ran a four-week POC for the customer, transferring 15 TB of data from the on-premises Hadoop clusters to Amazon S3 while keeping the existing Hadoop environment in service.

What Did We Do

The TVS Next team restructured the client's data infrastructure and migrated its data warehouse workloads from an on-premises Cloudera Hadoop cluster to AWS, using a combination of services including Amazon S3, Amazon EMR, and Snowflake. The on-premises enterprise Hadoop cluster comprised 270 TB of total storage. Data marts were built with Snowflake on AWS, and AtScale provided query performance optimization and a single, virtualized view of the data delivered as Data-as-a-Service.

The TVS Next team leveraged its Elastic Data Platform solution accelerators to build a job orchestration platform that ingests data (original data store layer), processes it (data warehouse layer), and stores it in the data warehouse, using both near-real-time and batch jobs. The data ingestion layer pulled data from sources such as PostgreSQL, MySQL, and Apache Kafka into Snowflake on AWS.
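As a hedged illustration of the pattern described above (the job names, fields, and connector lists here are hypothetical, not taken from the client's actual Elastic Data Platform framework), an ingestion job in such a platform can be declared as a small, self-describing configuration object that the orchestrator validates and schedules:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a declarative ingestion-job definition; the real
# framework's schema is not shown in this story.
@dataclass
class IngestionJob:
    name: str                 # unique job identifier
    source: str               # e.g. "postgresql", "mysql", "kafka"
    target: str               # e.g. "snowflake"
    schedule: str             # cron expression for batch, "streaming" for near-real-time
    tables: list = field(default_factory=list)

    # Connectors the sketch supports (illustrative, mirroring the story's list).
    SUPPORTED_SOURCES = {"postgresql", "mysql", "kafka"}
    SUPPORTED_TARGETS = {"snowflake", "kafka"}

    def validate(self) -> bool:
        """Reject jobs that reference connectors the platform does not support."""
        return (self.source in self.SUPPORTED_SOURCES
                and self.target in self.SUPPORTED_TARGETS)

# A new user schedules a job by describing it, not by touching framework internals.
job = IngestionJob(
    name="orders_to_snowflake",
    source="postgresql",
    target="snowflake",
    schedule="*/15 * * * *",   # every 15 minutes
    tables=["orders", "order_items"],
)
print(job.validate())  # True: both connectors are supported
```

Keeping job definitions declarative like this is what lets new users onboard jobs without learning the framework internals.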

The data processing layer read data from Snowflake, processed it, and wrote the results to the final data warehouse layer; processing jobs were written in Scala, Python, and SQL. The job orchestration framework was designed with ease of use and scalability in mind: dockerizing the framework and deploying it on Amazon Elastic Kubernetes Service (EKS) allowed it to scale gracefully.

A new user could schedule and configure a job without getting deeply involved in the underlying structure. An Airflow-based orchestration workflow was designed to pull real-time user data (login data, order transactions, etc.) and ingest it into Snowflake. The framework supported sources such as PostgreSQL, MySQL, and Apache Kafka, with Snowflake and Apache Kafka as targets. Other jobs read data from Snowflake, performed heavy transformations, combined multiple tables, and wrote the results back to Snowflake. Approximately 600 jobs were migrated and now run in the cloud on the job orchestration framework.
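With roughly 600 interdependent jobs, an Airflow-style orchestrator must execute each job only after its upstream jobs finish. As a minimal sketch of that idea (the job names and dependency graph below are invented for illustration, not the client's actual DAGs), Python's standard-library `graphlib` can produce a valid execution order for such a graph:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each job maps to the set of jobs it must run after.
jobs = {
    "ingest_orders":      set(),                     # raw pull from an OLTP source
    "ingest_logins":      set(),                     # raw pull from a Kafka topic
    "transform_orders":   {"ingest_orders"},         # heavy transformation in Snowflake
    "combine_user_facts": {"transform_orders", "ingest_logins"},  # joins multiple tables
}

# A topological order is any linear schedule that respects every dependency;
# this is the core scheduling guarantee an orchestrator like Airflow provides.
order = list(TopologicalSorter(jobs).static_order())
print(order)
```

In the real system, Airflow evaluates such a graph continuously and runs independent branches (here, the two ingest jobs) in parallel rather than strictly in sequence.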

Architecture

The Business Outcome

- 30% drop in load on the on-premises computing cluster
- 68% improvement in real-time feedback & live monitoring
- 98% faster delivery & release cycles
- 0 downtime due to infrastructure updates, with no negative impact on business
- 270 TB of data migrated to a hybrid cloud platform with Snowflake


Speak to an expert today

We have the expertise to help enterprises of all sizes take on their greatest challenges.

Let us show you how our services can help you create a better experience for your customers.