Here we go again.
This is the third episode of our series on how we are ingesting retailers’ data for our beloved customers.
If you did not read our previous post about this topic, consider spending a few minutes catching up on it.
In the following paragraphs, we will show you how our new process to update the assortment of our stores works and our journey to get there.
From the previous episode
In the previous episode, we showed you how we reached coordination in our price update flow. We achieved it by means of a semaphore stored on Redis: each time it gets accessed, we decrease its value. Once it reaches 0, the next stage of our pipeline begins.
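To make the countdown idea concrete, here is a minimal sketch. It assumes a client that exposes an atomic `decr` (as Redis's DECR command does) and uses a small in-memory stand-in so it runs without a server. The key name `price_update:pending` and the helper `task_finished` are hypothetical, not taken from our pipeline.

```python
import threading


class InMemoryCounter:
    """Stand-in for a Redis client: only the atomic `decr` used below."""

    def __init__(self, initial: dict) -> None:
        self._values = dict(initial)
        self._lock = threading.Lock()

    def decr(self, key: str) -> int:
        # Mirrors Redis DECR: decrement atomically and return the new value.
        with self._lock:
            self._values[key] -= 1
            return self._values[key]


def task_finished(client, key: str) -> bool:
    """Decrease the semaphore; True only for the caller that brings it to 0."""
    return client.decr(key) == 0


# Hypothetical setup: three tasks must finish before the next stage starts.
client = InMemoryCounter({"price_update:pending": 3})
results = [task_finished(client, "price_update:pending") for _ in range(3)]
print(results)
```

Only the last caller sees `True`, so exactly one task is responsible for kicking off the next stage.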
We also covered our Conversion Optimizer, a machine learning algorithm.
As soon as prices are updated, the next step of our pipeline can start.
Is there really more?
Yes, prices are no joke to us, but keeping them updated is not all there is.
One core value of our e-commerce platform is to let our customers browse products in an intuitive way.
For this reason, we divide the assortment of our stores into many levels of categories.
1, 2, 3, 4, 5…? 🔢
At the time we set those up, we did not know how many levels there were going to be.
So, we structured them to be as flexible as possible. We managed to do it by introducing a table where records are linked recursively:
|1||NULL||0||Fruit & Vegetables|
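As an illustration of what "linked recursively" can look like, the sketch below builds a self-referencing table and walks a subtree with a recursive query. The column names (`id`, `parent_id`, `level`, `name`) are our guess at the four columns shown above, every row except the first is made up, and SQLite is used only so the example runs standalone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE category (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES category(id),  -- NULL for a root
        level     INTEGER NOT NULL,
        name      TEXT NOT NULL
    );
    INSERT INTO category VALUES
        (1, NULL, 0, 'Fruit & Vegetables'),
        (2, 1,    1, 'Fruit'),       -- hypothetical row
        (3, 2,    2, 'Apples'),      -- hypothetical row
        (4, 1,    1, 'Vegetables');  -- hypothetical row
    """
)


def subtree(conn, root_id):
    """Return (id, level, name) for root_id and all of its descendants."""
    rows = conn.execute(
        """
        WITH RECURSIVE tree(id, level, name) AS (
            SELECT id, level, name FROM category WHERE id = ?
            UNION ALL
            SELECT c.id, c.level, c.name
            FROM category AS c JOIN tree AS t ON c.parent_id = t.id
        )
        SELECT id, level, name FROM tree ORDER BY id
        """,
        (root_id,),
    )
    return rows.fetchall()


for row in subtree(conn, 1):
    print(row)
```

Because each record only points to its parent, any depth of nesting fits in the same table, at the cost of a recursive walk whenever the tree is read.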
All was running smoothly. We were happy with the flexibility we had, ready to face any sort of custom tree of categories. 🦸‍♂️
As it happens, the future proved us wrong: we optimized for flexibility but we soon found out we needed scalability more. After some time, as we grew our customer base and the store pool, the system became very slow. It was barely usable.
This was worsening day after day, as we kept growing.
The problem 💣
It took us a while to isolate the issue: was it the database? Was it the category structure? Was it a poorly performing query?
It turned out that the slowness we faced was caused by how we read the data. 🎉
In particular, the problem was in the queries performed when presenting the main page of a store. Here, we load the first levels of the categories with the most popular items.
We assumed we could read the data the same way we stored it. This was a fair assumption when the load on the system was light. But as the load increased, the slow queries became slower and our loading time grew.
Separation of concerns 🪚
Who says we need to write and read the data in the same way?
We trusted in the well-known principle of the separation of concerns and we applied it to our schema:
- We keep writing the data in this flexible table in the same way
- To improve the user experience, we re-work the way we present it
To make the reads faster, we added some tables where we loaded the data ready to be presented to our users:
- FETCH1: the third level we want to prepare
- FETCH0: our second level categories
- TREE0: the highest view, the top level categories of a store
- TREE: a view that connects them all
These tables do not handle much: they connect a store to its best products for each category level.
However, this small task avoided many lookup queries and allowed us to reduce the load time from 8s to well below 1s!
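The idea of preparing the data ahead of the read can be sketched in a few lines. This is not the schema of the tables listed above: the input rows, the `best` cut-off, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical write-side rows: (store_id, top_category, product, popularity).
ASSORTMENT = [
    (1, "Fruit & Vegetables", "Bananas", 90),
    (1, "Fruit & Vegetables", "Apples", 75),
    (1, "Fruit & Vegetables", "Leeks", 10),
    (1, "Bakery", "Baguette", 60),
    (2, "Bakery", "Croissant", 80),
]


def build_read_model(rows, best=2):
    """Precompute, per store and category, the `best` most popular products."""
    grouped = defaultdict(list)
    for store_id, category, product, popularity in rows:
        grouped[(store_id, category)].append((popularity, product))

    model = defaultdict(dict)
    for (store_id, category), items in grouped.items():
        top = sorted(items, reverse=True)[:best]
        model[store_id][category] = [product for _, product in top]
    return dict(model)


# Built once, ahead of time, instead of on every page view.
READ_MODEL = build_read_model(ASSORTMENT)


def main_page(store_id):
    """Serving a store's main page becomes a single lookup."""
    return READ_MODEL[store_id]


print(main_page(1))
```

The sorting and grouping work is paid once at generation time, so the read path no longer depends on how deep or wide the category tree is.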
Problem solved, but… 🕵️‍♂️
Although the problem was solved and our users were happy, the process had more issues that were waiting to be uncovered.
For instance, the way these tables were generated was not the best. We needed a solution, and we needed it fast, but we always knew that we would have to revisit it.
Those tables were compiled in the most intuitive way: drop them, create them, fill them with some queries. This meant downtime, so the process could only run at night, when our customers were tucked in. 🛌
Moreover, the tables were initially running on the MEMORY engine of MySQL, which supports only table-level locks and does not play nice with multi-threaded operations.
We kept growing, making the issues we just described worse. More users and more stores meant a longer running time for this process.
Given all this, we decided to build an improved version of this part of the flow and to make it future-proof.
Yet another solution 🔨
To handle our growth, the whole generation chain was revisited. We decided to take the time to have a system that:
- Has no impact on our customers
- Can be launched at any time
- Can be triggered by external services and tools
- Performs well even with thousands of [concurrent] stores
We found our solution in an event-driven system that runs the generation code for one store at a time. As the new process is store-based, we have a tool that is inherently faster and less likely to impact our users. Also, because it is event-driven, it can be plugged into other flows. This makes it possible to trigger it both programmatically and at will.
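A minimal sketch of the per-store, event-driven shape follows. It uses an in-process queue as a stand-in for whatever message broker carries the events. The event type `assortment.regenerate`, the payload layout, and the function names are hypothetical.

```python
import queue

events = queue.Queue()
read_model = {}  # store_id -> precomputed main-page data


def regenerate_store(store_id):
    """Placeholder for the real generation code; touches one store only."""
    read_model[store_id] = f"assortment for store {store_id}"


def request_regeneration(store_id):
    """Called by any producer: another pipeline stage, a tool, a script."""
    events.put({"type": "assortment.regenerate", "store_id": store_id})


def drain(pending):
    """Process pending events, one store per event; return handled store ids."""
    handled = []
    while True:
        try:
            event = pending.get_nowait()
        except queue.Empty:
            return handled
        regenerate_store(event["store_id"])
        handled.append(event["store_id"])


request_regeneration(42)
request_regeneration(7)
handled = drain(events)
print(handled)
```

Since each event names a single store, a failure or a slow run affects that store only, and any service able to publish an event can request a refresh.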
Was it really needed? 🤔
As we grew, we started taking notes about the failures and the bottlenecks of our system, to prioritize them. The previous version of this process had not failed for more than a year, but the damage a failure could cause would have been catastrophic (e.g. all stores empty, no orders could be placed).
So we decided to get it done as soon as there was room – luckily we did.
As the new project was released and deployed on our canary environment, we had a big crash on the legacy one! 💥
The new version allowed us to recover in minutes from what would have taken the previous system an entire day.
The future is bright 🚀
Oh, we have so many ideas on how it can be further improved!
Before all that, though, we want to prove even further that it is a very useful tool. We want to integrate it in as many places as we can. For example, without any further development, this tool could save our business team days of waiting time by making any change to the assortment instantaneous.
On top of that, it runs by store and this means we can experiment and do the same for our price update procedures: making them modular, fast, and resourceful!
As I said, many ideas: they may well deserve an article of their own…
A lot of our processes have been improved and simplified recently: this is just one of many examples of how we keep improving our very own product.
We strive to create software that is not only efficient for our stakeholders and customers, but that solves problems they don’t even know they will have.
If you think you are brave enough, and you have what it takes to solve them, visit our careers site to see our current openings.
Stay tuned for the next episode!🖖