Latency = delay. It’s the time it takes to send information from one point to the next, usually measured in milliseconds (ms). It’s sometimes referred to as ping.
In retail and e-commerce, it’s life and death.
Levi Strauss, the iconic American clothing manufacturer, was in the process of migrating databases from multiple vendors (Oracle and Microsoft) to AWS.
As is common with migration efforts, the new AWS environment was experiencing slow response times, which led to lost transactions. To put this in perspective, a one-minute outage on Black Friday with subsequent data loss would cost Levi’s $45,000.
About The Customer
Chances are good that you’ve worn a Levi’s garment in the past 72 hours. The company holds US Patent #139121, which gave it the right to make copper-rivet-reinforced denim blue jeans. Contrary to common lore, the company did not sell blue jeans to Gold Rush prospectors, and in fact did not sell what we now consider its iconic brand of pants until 1890. Originally marketed to western laborers and workers, the pants were considered an essential commodity during World War II and were sold only to people in the defense industry. Popularized by greasers, cowboys, mods and hippies in the 1950s and 1960s, Levi’s is one of America’s iconic brands alongside Coca-Cola and Disney.
Levi’s had created a Redshift database environment that collected data from multiple sources, both transactional and analytical. The purpose of the warehouse was to provide a trove of data that could be analyzed, reported on and used to make business decisions ahead of planned events or key sales periods. To fulfill that purpose, the warehouse needed to ingest data from multiple transactional and analytical database platforms such as Oracle and SQL Server, transform the ingested data and load it into Redshift in a timely manner, all while providing business-critical reports to the key personnel who relied on them. The system as designed could not fulfill these basic functions because it was constantly crashing. In essence, it lacked three characteristics: predictability, stability and scalability.
- Levi’s customer data hub (CDH) was the primary analytics platform for customer data.
- CDH on Redshift ran daily data pipelines to load and transform data in Redshift.
- The majority of these ELT processes ran during the 12:00 AM to 4:00 AM PST window.
- The analytics team depended on this data for ad hoc analysis, Tableau dashboards and other analytics processes.
- Jobs were getting stuck.
- mLogica identified locks blocking a few of their critical ELT jobs.
- The challenges:
  - manual process
  - lacking resources
  - short window
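To make the lock contention above concrete, here is a minimal sketch of how blocked jobs might be traced back to the sessions holding the locks. It assumes lock records shaped like the rows of a lock-monitoring view (Redshift's SVV_TRANSACTIONS system view, for example, reports a pid, a relation and a granted flag per lock). The function name `find_blockers` and the sample rows are hypothetical and do not represent mLogica's actual tooling.

```python
from collections import defaultdict


def find_blockers(lock_rows):
    """Map each waiting pid to the pids that hold a granted lock on the same relation.

    Each row is a dict with the (assumed) keys: pid, relation, lock_mode, granted.
    """
    # First pass: collect which sessions currently hold a lock on each relation.
    holders = defaultdict(set)
    for row in lock_rows:
        if row["granted"]:
            holders[row["relation"]].add(row["pid"])

    # Second pass: for every lock request still waiting, list the holders in its way.
    blocked = {}
    for row in lock_rows:
        if not row["granted"]:
            blocking = holders[row["relation"]] - {row["pid"]}
            if blocking:
                blocked.setdefault(row["pid"], set()).update(blocking)
    return blocked


# Hypothetical sample: session 202 waits on a table that session 101 has locked.
sample_rows = [
    {"pid": 101, "relation": 5001, "lock_mode": "AccessExclusiveLock", "granted": True},
    {"pid": 202, "relation": 5001, "lock_mode": "ShareRowExclusiveLock", "granted": False},
    {"pid": 303, "relation": 6002, "lock_mode": "AccessShareLock", "granted": True},
]

blockers = find_blockers(sample_rows)
print(blockers)
```

Running a check like this on a schedule during the nightly load window would replace a manual inspection step, which matters when the window is short and staff are limited.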
Levi’s wanted to bring in more data from internal and third-party data sources and open the analytics platform to more users globally.
The mLogica team began with a detailed analysis of the environment, reviewing the Redshift design and architecture as well as the key processes and tools behind the system's required functionality, to establish the baseline issues affecting stability. The analysis showed that, because of the original design, key processes that needed to complete were in constant contention with each other, which made the system unstable and caused it to crash. mLogica recommended modifying the processes, applying new configuration settings to the supporting tools, and adopting a new deployment architecture better suited to the system's requirements and to the performance goals outlined early in the engagement. mLogica proposed this resolution to the Customer in a best practices and recommendations document, along with a conceptual architecture to improve end-to-end data pipeline performance, including DMS and Redshift service optimization. The recommendations were accepted and the new configuration was deployed, which achieved the three goals:
- Predictability: processes finish on time with no contention.
- Stability: the system meets its uptime targets.
- Scalability: the system can scale to meet peak demand.
mLogica implemented the recommended end-state architecture, which separates workloads between Aurora and Redshift: insert, delete and update processing is directed to Aurora; transformational processing is performed at the DMS level (ETL) or at the Aurora level (ELT); and Redshift is used only for analytics.
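The workload split can be sketched as a simple routing rule. The example below is an illustration only, assuming statements can be classified by their leading SQL verb; the function name `route_statement` and the target labels are hypothetical, and a real deployment would more likely route at the connection or job-scheduler level.

```python
# Statements that modify rows go to the transactional store (Aurora).
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE"}


def route_statement(sql: str) -> str:
    """Return "aurora" for row-modifying statements and "redshift" for analytic reads."""
    stripped = sql.strip()
    if not stripped:
        raise ValueError("empty statement")

    # Classify by the first keyword only; this is a deliberate simplification.
    verb = stripped.split(None, 1)[0].upper()
    if verb in WRITE_VERBS:
        return "aurora"
    if verb == "SELECT":
        return "redshift"
    raise ValueError(f"no routing rule for statement type: {verb}")


print(route_statement("INSERT INTO orders VALUES (1)"))
print(route_statement("SELECT count(*) FROM sales"))
```

Keeping writes off Redshift in this way is what removes the contention between load jobs and analytic queries: the two classes of work no longer compete for locks on the same tables.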
After the new architecture was implemented, mLogica directed the users to Redshift based on their use cases. Batch processing was also directed accordingly. This eliminated user and processing restrictions.
This optimization enabled the Customer to separate dev/QA from the Production instance for both Redshift and Aurora as part of the proposed end-state architecture. Additionally, HA/DR was implemented and CloudWatch was used to monitor various aspects of the Redshift cluster. Because the final deployment will be global, these best practices also enabled the Customer to define data residency requirements, review compliance risks and create mitigation plans, and plan for data security risks, including data privacy and GDPR requirements.