A Brief History of Price Update (Part 2)

Reading Time: 7 minutes

Here we go again.
This is the second episode of our docuseries about how we are managing retailers’ data for our beloved customers.
If you did not read our previous post about this topic, you should spend a few minutes catching up on it.
With this part, we would like to introduce the hidden aspect of our product and I promise, it is not a buzzword: Machine Learning.
In the following paragraphs, we are going to show new technical improvements and the analytics aspects under the hood.

From the previous episode

The entire process is asynchronous.
All the price producers are sending raw data into our FIFO queue, where consumers are fetching and inserting bulk data into our “temporary” table.
The final instruction sync tables via a single heavy UPDATE query.
Everything under tons of fallback systems to prevent a Single point of failure, we are handling prices after all.

Under Control

The first problem with this amount of messages: X Prices per Y Stores, is the coordination.
How can we ensure that a flow is completed when everything is asynchronous?
We introduced the concept of Semaphores 🚦.
If we know at the beginning how many subsets of events we will create, we can create a reference with the value and decrease it every time an event is processed.
In this scenario, we know the number of the active Stores (eg. Y) we are going to parse, thus we set a Redis Semaphore with Y as a value via SETNX command.
When a single Store is over, we decrement that shared variable via DECR command.
Finally, when the value is void we can assume that everything has been processed.
We could sum up everything like this:

  • Set the 🚦 with the number of “foobar” you have to handle.
    In our case, this is the number of active Stores we have to read.
  • Per every subset, you could send other events if you want.
    We are creating X events within a list of Prices for that Store.
  • When everything is done, decrease the shared variable.
    We know when the cursor of the Store is empty and we will decrease the value.
  • If the 🚦 is (void|empty|zero|null), we know that everything ran correctly 🎉.

The second problem about this flow is that if any particular Store cannot be executed for some reason, the whole process is stuck.
Thus we added the concept of a time-slot: if after N seconds no new messages are pushed into the queue we could assume that the Store-based process is stuck and we reduce the Semaphore.
Of course, we have to guarantee that a particular process will not send an event about his status anymore otherwise, we will decrease the value twice.

If you get all the steps, you might use this pattern to add sequences even if the flow is distributed and asynchronous.
In particular, we added different layers to our batch operator.
I am going to introduce you to the last layer we added: Conversion Optimizer.

Conversion Optimizer

The aim of this project is to show to customer the most appetizing configuration of an item.
To achieve the goal of the project, we added new temporary tables to use and achieve the very same concept and speed described in the previous blog post.
When we update all items’ data, we are iterating over the new changes with another series of updates led by our formula, assuming the same asynchronicity and fault-tolerance.

However, our current tech-stack cannot handle the new needs of the Analytics team.
So, following our Agile culture and MVP pattern, we decided to use AWS Lambda.
Basically, we are using the Lambdas as an external data provider for our queues and tables.
As we did for the retailer, the concept for us is the same, we are handling this provider as our partner.

Items are updated daily, basing their values mainly on customer behaviour.
On our platform, customers define their behaviour choosing one item instead of another one on the same page.
In this way, we know that customers have seen both items, deciding which one to buy.
In fact, the metric considered to define changes in items’ info is conversion rate.
It is defined, at the product level, as the number of item quantities “added to cart” over the total number of times it has been seen (known as product impressions).

Item conversion rate
CR formula

This quantity gives us an idea about customer preferences on single items.
The algorithm tests the significance of rate changes over different points.
The aim is to evaluate the reaction of a customer to a new configuration of the item and, as a consequence, investigate preferences.
Collecting different data points, the final goal is to optimize the conversion rate and find the best configuration of each item.

This optimization problem is solved by exploring items’ info space, trying different configuration of items.
It allows us to compare different scenarios and find the best possible solution to optimize cart composition following customer sensitivity.

As mentioned above, the items’ optimization app is running on AWS services.
The main service used is Lambda, where the code implementing the algorithm is deployed. Other used services are:

  • S3: to store data needed to correctly update items
  • API Gateway: to make application callable from Everli servers, “dispatching” lambdas based on the HTTP verb of the request
  • DynamoDB: a store state manager
  • CloudWatch: to store and easily analyse application logs

To make items update daily, we have scheduled a data retrieve function to download the starting point at a store level.
Then during the upload described above and in the previous article, data ingested from partners is sent to the app and store state is updated.
All of the data is saved into a S3 bucket and DynamoDB state is updated.
At the end of the entire process, during the UPDATE step, the app is called on its GET endpoint and it returns all the items that need to be updated.

In the flow below we summarised the AWS app behaviour:

Prices in AWS
AWS Infrastructure

What are right now our priorities?

Since right now we have to “append” a new process in a pipeline way and as you can imagine, we have a limited processing window during the night, we decided to test a different behaviour of our process: Priority.
We are doing different things but using the same data.
Instead of waiting for the first part of the process to start the second one and so on, we are trying to interlace events.
When the batch request has been handled, we immediately send the Conversion Optimizer events with items’ data, removing downtime between both processes.
A downside of this pattern is that during a spike of messages inside our FIFO queue we will handle events from different processes one after the other.

In our mind, if the queue has Z messages, Z / 2 from retailers and Z / 2 for Conversion Optimizer, we achieve it by applying RabbitMQ priority queues: internally, the messages are sent with a different priority and the consumer will always try the one with the highest.
However, this being transparent in our logic, while the queue will always be one, it is as if messages are sorted now.
Via RabbitMQ this is possible saving a maximum value inside the queue and setting the right value per message.
ProTip: if you have your queue running, you have to destroy it BEFORE adding the x-max-priority, but first enjoy tons of Slack alerts.

The flow with 2 producers and 1 consumer is something like this:

  • producer side
    1. producer A at t0 => push 1, priority 10
    2. producer B at t1 => push 2, priority 0
    3. producer A at t2 => push 3, priority 10
    4. producer A at t3 => push 4, priority 10
  • consumer side
    1. consumer C, fetch 1 at t1
    2. consumer C, fetch 3 at t2
    3. consumer C, fetch 4 at t3
    4. consumer C, fetch 2 at t4
2 producers, 1 consumer

There are few more considerations to take into account, such as time and a number of consumers.
E.g. the flow with 2 producers and 1 consumer with time consideration:

  • producer side
    1. producer A at t0 => push 1, priority 10
    2. producer B at t1 => push 2, priority 0
    3. producer A at t5 => push 3, priority 10
    4. producer A at t6 => push 4, priority 10
  • consumer side
    1. consumer C, fetch 1 at t1
    2. consumer C, fetch 2 at t2
    3. consumer C, fetch 3 at t6
    4. consumer C, fetch 4 at t7
2 producers, 1 consumer and different time T

E.g. the flow with 2 producers and 2 consumers:

  • producer side
    1. producer A at t0 => push 1, priority 10
    2. producer B at t1 => push 2, priority 0
    3. producer A at t5 => push 3, priority 10
    4. producer A at t6 => push 4, priority 10
  • consumer side
    1. consumer C, fetch 1 at t1
    2. consumer D, fetch 2 at t1
    3. consumer C, fetch 3 at t2
    4. consumer D, fetch 4 at t2
2 producers, 2 consumers

Day after day we are moving the limit ahead by ourself.
If I am looking at myself six years ago, when we started this journey, it was not possible for me to understand how many complex things we could achieve.
We are keeping a lookout for developers to help us solve problems like this, and if you think you are brave enough and you have what it takes to solve it, visit our careers site to see our current openings.
Stay tuned for the next episode 🖖.

Ciao! 🚀🚀🚀

Authors: @hex7c0 @MattiaU

Leave a Reply

Your email address will not be published. Required fields are marked *