The context

 

I bumped into this issue like I did with many others – working on a large catalog which needs frequent price updates. In my particular case, we are talking about a catalog of 50k products spread across 4 active websites (each website with a single active store view) on a Magento Enterprise 1.14.2.0 platform.

The products receive their selling prices from their original providers via ERP calls and a product’s price can change multiple times during the day. Keep in mind that these updates must be executed at an interval of 15 mins and over 5k products can change their price data in one synchronization call, which means a fast mechanism needs to be in place in order to cope with the huge amount of data.

The development of the solution was a team effort as I worked together with my colleague, Andra. We split the implementation into two large components: ‘the API processing’, which I took care of, and ‘the API post-processing’implemented by Andra. In this article, the two of us will, in turns, guide you towards the implementation of a fast mass price update for your product catalog.

The API processing

 

Since the Magento SOAP v2 API was used in the project for operations like stock or products’ general data updates, I thought I should keep this consistent and build a SOAP API v2 method that will save the price and special price data per store for a list of given product SKUs.

The strategy

 

My plan was to inject the values directly into the database as I knew exactly which attributes were going to be changed (‘price’, ‘special_price’, ‘special_from_date’ and ‘special_to_date’) and which particular tables were involved (‘catalog_product_entity_decimal’ and ‘catalog_product_entity_datetime’).

I didn’t worry about the partial reindexing process of the affected products, since the triggers used in this process are set at database level. Any change to the targeted tables will trigger the partial reindex.

I took this approach as I knew Magento was very slow when performing a ‘save()’ operation on a product, and the usage of the ‘updateAttributes()’ method would be incomplete without the post product saving processing and still slower than a direct values injection. Basically, I wanted to make ‘my way on the highway’.

I recommend this approach if you know what kind of data processing is triggered after your product is normally saved in the context of your project. In my case, there wasn’t any 3rd party extension post processing, only Magento out-of-the-box actions, but we will talk about this later.

The new API operations module

 

So, let’s get to work! I started by creating a new module called ‘Evozon_API’ which would hold my new API method (I am assuming you already know how to create a custom Magento module and that you also understand the basic Magento SOAP API concepts and components, therefore I will not insist on these aspects).

The API configuration files

 

Under the ‘etc/’ folder, along with the obvious ‘config.xml’, other configuration files required for the API method declaration were introduced:

I added the ‘api.xml’ file in order to define the class and method that will handle the SOAP API call, as well as the API ACL resources:

Then I added the ‘wsdl.xml’ file, which describes the new method’s functionality. This file defines:

  • the complex typeswhich represent the request and response internal structures;
  • the messagespassed (the request and the response);
  • the port typewhich describes the API call with input (request) and output (response) messages;
  • the bindingand the service:
If you are using the Magento SOAP API with WS-I compliance enabled (‘Services -> Magento Core API -> WS-I Compliance’), you need to define the ‘wsi.xml’ configuration file, since the WSDL nodes structure is different than the one in the ‘wsdl.xml’ file. This was not my case so, for the sake of simplicity, I will proceed with my implementation without the WS-I compliance.

The API model classes

 

I first created the ‘Evozon_API_Model_Product_Price_Api_V2’ class, which performs the mass price update. As previously mentioned by the configurations inside ‘api.xml’, this class includes a method called ‘multiupdatePrice()’, which receives an array of product prices data and returns a result array. This is the method that is going to do all the hard work.

This class extends the ‘Evozon_API_Model_Product_Price_Api’, which would have been the class used for SOAP v1 calls as you may have already guessed. In our example, we use only SOAP v2, but I left ‘an open door’ for future implementations, if at some point we need to implement the same mass update method for the v1 mechanism.

For now, the method ‘multiupdatePrice()’ exists in this class, but it doesn’t do anything :). Other than that, the class includes database connection resources retrieval, which will later be used by the child class ‘Evozon_API_Model_Product_Price_Api_V2’.

Tape rewind

 

Let’s get back to the child class ‘Evozon_API_Model_Product_Price_Api_V2’ and break the ‘multiupdatePrice()’ method implementation into smaller and easier to digest chunks. I will keep my focus on this method and not overload you with the full class code, as each of the used methods will be described below anyway.

Call to method ‘_init()’ is adding values to protected properties of the current class, which will be used later in the implementation of the price data saving:

Validate your data

 

Before changing the products’ prices, we need to validate the data in order to make sure that only correct and secure data is saved inside the database. The call to ‘_validateData()’ points to the class ‘Evozon_API_Model_Product_Price_Validator’, which will be used for all data checks (I added this class as a stand-alone one, no need to extend anything).

From my point of view, the validation process should consist of two major parts: static and dynamic data validations.

The ‘Static data validation’ part, as I called it, includes simple checks which confirm whether the data provided is correct or not, that is if all mandatory fields are included in the provided XML and all included fields have the corresponding data type:

  • ‘sku’ field is mandatory, since this is the common denominator used by the two systems when referencing the same product entity;
  • ‘store_id’ is also mandatory, since we are saving product prices on the website by referencing the ID of the store associated to it;
  • ‘price’ field should be included every time and must be a numeric value greater than 0 (we don’t allow 0 priced products);
  • same goes for ‘special_price’ value if it’s included in the call;
  • date fields ‘special_from_date’ and ‘special_to_date’ must have a correct date format (I used ‘Y-m-d’) if included in the call.

Some sample snippets:

The array ‘$this->_result’, with its particular keys in ‘ – ’ format, is used in order to identify the validation violations and to report them back to the API class handler, and then back to the caller. You can choose your own keys in order to specify which product data and on which store is faulty.

The ‘Dynamic data validation’ part will check whether the provided data matches products and store ids from Magento’s database. All these checks are performed by executing Zend style select statements in order to provide a good performance. So we will check:

  • if given SKUs correspond to existing products;
  • if the provided store IDs match stores which are assigned to existing websites;
  • if the products with the given SKUs are assigned to the websites referenced by the given store IDs.

A few examples:

Any product data items which don’t satisfy the validation conditions should be rejected and reported back to the caller with a specific error message. All product data items which satisfy the conditions should be saved in the Magento database and successful updates should be reported back to the caller also.

In my first implementation attempt, the API method returned an array of successfully saved products or a fault code and message every time the provided data was not correct (‘all or nothing’ approach). This mechanism induced a ‘bottleneck’ effect, since no product price was saved until the faulty data was corrected.

In a later improvement, I modified the implementation so that the correct data is saved and faulty chunks are returned with a specific error message referencing a list of SKUs and store IDs for a particular incorrect behavior, instead of using API fault messages. This approach eased the data check on the caller side, while not blocking the price synchronization process.

Database operations

Once the data is validated, we can move forward. We got to the fun part, which is the data saving. By now you know we are building a fast mechanism, but you may have wondered how we were going to do this in an optimal way. The answer: Zend style update queries to the database.

We will prepare the data first, as we need to know what kind of operation is required (insert/update or delete), which are the tables and attributes involved in these operations.

We are going to save the price by referencing the ‘store_id’. The price attribute has website scope in our case, so our changes will set the price per website actually (there isn’t a customization to set price on an individual store per se). We are using the store_id directly in order to avoid a secondary computation regarding extraction of the data and also for consistency reasons since all other existing SOAP calls manipulating products are performed by usage of ‘store_id’.

Note: no, you cannot delete the price attribute value of a product by API 🙂

Now let’s save this to the database. Take a look below:

I’m sure you’ve noticed that I’m using a transaction for my database changes upon each product entity. The reasoning behind that implementation targets the usage of multiple tables and multiple attributes. We don’t want partial data to be saved on our product, do we? So I am using a transaction in order to encapsulate all database writing operations concerning a specific product.

Another key aspect of this implementation is the ‘evozon_api_price_update’ event dispatch in the initial multiupdatePrice() method, which will allow other implementations to hook observers to it and to process the recently updated products. What kind of processing? My colleague Andra will tell you all about it 🙂

The API post-processing operations

 

The snag we hit at this stage of the implementation had to do with promotions – we noticed that, after the prices were updated via API, the discounts that should have applied to them still used the old prices as reference. This was because the ‘Mage_CatalogRule’ module, which updates the ‘catalogrule_product_price’ table in accordance with product price changes, was not aware that the product data had changed. Had we modified the product prices in the backend, this module’s observer method applyAllRulesOnProduct() would have been triggered and would have adjusted the prices accordingly. As this was not the case, a solution had to be implemented.

The solution I decided on was creating an observer for the event dispatched in the multiupdatePrice() method. This observer, in theory, could have simply mimicked the behaviour of the ‘Mage_CatalogRule’ module observer, but this was undesirable because:

  • it would have meant instantiating the product model for every updated product;
  • somewhere down the line, it would have involved recalculating the rule-product relationships in the ‘catalogrule_product’ table, which was unnecessary in our case.

Given the above issues, I chose to use the same approach as the cron that performs the price calculations on a daily basis, i.e. call the Mage_CatalogRule_Model_Resource_Rule::applyAllRules() method, providing the successfully updated products’ IDs as arguments.

Are we there yet?

 

If your store is using flat tables (which it should), there is one last missing piece in the implementation. The indexer that updates the data in the ‘catalog_product_index_price’ table, the ‘Mage_CatalogRule_Model_Action_Index_Refresh_Row’ class, will, at some point, create a temporary table to use as a data source. This method will use the flat table data if possible, otherwise it will retrieve the price from the ‘catalog_product_entity_decimal’ table.

Unfortunately, this method produced inconsistent results. If the updated value was present in the flat table, the values subsequently inserted into the ‘catalogrule_product_price’ table and the ‘catalog_product_index_price’ table were all consistent. Otherwise, discrepancies manifested in the ‘catalog_product_index_price’ table, where the product’s price was the updated price, whereas the final price was the price calculated based on the outdated value in the flat table. In the end, I chose to rewrite the method and force the indexer to always query the ‘catalog_product_entity_decimal’, which definitely contains the correct value, as this is where the API injects the price. To achieve this, I removed the ‘if’ statement and only preserved the code found on the ‘else’ branch in the rewritten method.

Conclusions

 

It’s been quite a ride writing the above described solution. We began by asking ourselves “how on Earth will we ever build a solution suitable for such a large catalog with so many updates?” and we ended up thinking that this was actually not so bad at all. 🙂

The process followed the well known trial&error pattern. Many tears, laughs and testing hours were spent building this synchronization mechanism, but it was worth it: the above described implementation is deployed on a production system where it processes 5k products under one minute (100 products/second).

What do you think?

By Diana Botean & Andra Barsoianu

This article was originally published on magento.evozon.com