TVS Next

Retail Application

Data Engineering for a Leading Retailer

Faster product life cycles and increasingly complex operations are pushing retailers to use big data analytics to understand their supply chains and product distribution and to reduce costs. Many retailers know well the intense pressure to optimize asset utilization, budgets, performance, and service quality. Optimizing these is essential to gaining a competitive edge and driving better business performance.

The Need

  • Aggregate data into a single source of truth
  • Reduce the time taken to run a batch process
  • Overcome challenges in deriving analysis and meaningful insights
  • Help the business comprehend current operational and financial metrics, which took over 24 hours to produce at the time

The key to increasing operational efficiency with data engineering platforms is to use them to unlock insights about trends, patterns, and outliers that can improve decisions, drive better operational performance, and save millions of dollars.

What was done

We configured, implemented, integrated, and deployed a structured enterprise data warehouse that provided a comprehensive set of subject areas for further analysis, focusing on Sales, Purchase, Inventory, HR, and Finance.

The solution architecture addresses the major business needs as follows:

1) Enterprise data warehouse containing clean and transformed data as a single source of truth

2) Historical data going back up to 3 years

3) Volume capability to handle and process a million rows per week

4) Aggregated tables and granular data, with staging capabilities for auditing

5) Initial and delta load capabilities based on delta extraction rules

6) Star schema-based solution for performance efficiency

7) Enterprise open-source solution for ETL

8) Data integration server

9) Distributed processing capabilities with scalability
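
To make point 5 concrete, here is a minimal sketch of an initial load followed by a delta load. It assumes the delta extraction rule is a simple "modified since the last load" watermark; the case study does not say which rule was used. The function name `delta_extract`, the `updated_at` column, and the sample rows are all hypothetical.

```python
from datetime import datetime


def delta_extract(rows, last_loaded_at):
    """Return the rows changed after the watermark, plus the new watermark.

    Assumed rule (not from the case study): a row qualifies when its
    updated_at timestamp is later than the previous load's watermark.
    """
    changed = [r for r in rows if r["updated_at"] > last_loaded_at]
    new_watermark = max((r["updated_at"] for r in changed), default=last_loaded_at)
    return changed, new_watermark


# Hypothetical source rows, for illustration only.
source_rows = [
    {"sale_id": 1, "amount": 120.0, "updated_at": datetime(2024, 1, 1)},
    {"sale_id": 2, "amount": 75.5, "updated_at": datetime(2024, 1, 8)},
    {"sale_id": 3, "amount": 42.0, "updated_at": datetime(2024, 1, 15)},
]

# Initial load: the watermark starts at the earliest timestamp, so every row qualifies.
initial_rows, watermark = delta_extract(source_rows, datetime.min)

# Later, one existing row is corrected and one new row arrives in the source.
source_rows[0] = {"sale_id": 1, "amount": 125.0, "updated_at": datetime(2024, 1, 20)}
source_rows.append({"sale_id": 4, "amount": 19.9, "updated_at": datetime(2024, 1, 22)})

# Delta load: only rows touched since the previous watermark are extracted.
delta_rows, watermark = delta_extract(source_rows, watermark)
```

With a rule like this, the weekly batch only has to move the rows that changed, which is one common way to shorten batch run time.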


How was this done?

Data Warehouse Modeling

Our first step was data warehouse modeling: identifying possible subject areas from the source datasets provided, and creating the relevant data warehouse with subject-area-specific data marts and aggregates. A hybrid model (star and snowflake) was created for the relational structures, along with storage models for unstructured or semi-structured data.
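
As an illustration of the star-schema side of such a model, the sketch below builds a tiny Sales data mart with one fact table, two dimension tables, and one aggregate table. It uses Python's built-in `sqlite3` module purely for convenience. The table names, column names, and sample rows are hypothetical and are not taken from the actual project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One fact table surrounded by dimension tables: the basic star shape.
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT,
    month     TEXT
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    quantity    INTEGER,
    amount      REAL
);
""")

# Hypothetical sample data, for illustration only.
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Shirt", "Apparel"), (2, "Kettle", "Home")],
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20240101, "2024-01-01", "2024-01"), (20240102, "2024-01-02", "2024-01")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 20240101, 2, 40.0), (1, 20240102, 1, 20.0), (2, 20240102, 3, 90.0)],
)

# Aggregate table: pre-computed totals so reports avoid scanning granular facts.
conn.execute("""
CREATE TABLE agg_sales_by_category_month AS
SELECT p.category,
       d.month,
       SUM(f.quantity) AS total_quantity,
       SUM(f.amount)   AS total_amount
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_key = f.product_key
JOIN dim_date    AS d ON d.date_key = f.date_key
GROUP BY p.category, d.month
""")

summary = conn.execute(
    "SELECT category, month, total_quantity, total_amount "
    "FROM agg_sales_by_category_month ORDER BY category"
).fetchall()
```

A snowflake variant would further normalize a dimension, for example by moving `category` out of `dim_product` into its own table.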

Data Integration

We brought together structured and semi-structured data from sources such as RDBMS systems and structured text files on the FTP server. The target data environment was an RDBMS-based solution.
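
A minimal sketch of this kind of integration, under stated assumptions: an in-memory SQLite database stands in for the source RDBMS, an in-memory CSV string stands in for a structured text file fetched from the FTP server, and a second SQLite database stands in for the RDBMS target. The staging table names and the `source_system` audit column are hypothetical, and the open-source ETL tool used in the project is not shown.

```python
import csv
import io
import sqlite3

# Stand-in for the source RDBMS (hypothetical table and rows).
source_db = sqlite3.connect(":memory:")
source_db.execute("CREATE TABLE purchases (purchase_id INTEGER, sku TEXT, cost REAL)")
source_db.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [(1, "SKU-1", 10.0), (2, "SKU-2", 4.5)],
)

# Stand-in for a structured text file downloaded from the FTP server.
ftp_file = io.StringIO("sku,on_hand\nSKU-1,12\nSKU-2,0\n")

# Stand-in for the RDBMS-based target, with staging tables that record
# where each row came from so loads can be audited later.
target_db = sqlite3.connect(":memory:")
target_db.executescript("""
CREATE TABLE stg_purchases (purchase_id INTEGER, sku TEXT, cost REAL, source_system TEXT);
CREATE TABLE stg_inventory (sku TEXT, on_hand INTEGER, source_system TEXT);
""")

# Extract from the source RDBMS and load into staging.
purchase_rows = source_db.execute(
    "SELECT purchase_id, sku, cost FROM purchases"
).fetchall()
target_db.executemany(
    "INSERT INTO stg_purchases VALUES (?, ?, ?, 'rdbms')", purchase_rows
)

# Parse the text file, cast its fields to the right types, and load into staging.
inventory_rows = [
    (row["sku"], int(row["on_hand"])) for row in csv.DictReader(ftp_file)
]
target_db.executemany(
    "INSERT INTO stg_inventory VALUES (?, ?, 'ftp_file')", inventory_rows
)
target_db.commit()

# Row counts per staging table, useful as a simple load audit.
staged_counts = {
    table: target_db.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("stg_purchases", "stg_inventory")
}
```

Landing both sources in staging tables first keeps the raw rows available for auditing before they are transformed into the warehouse model.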