This article guides you toward the implementation of a fast mass price update for your product catalog.

The context

I bumped into this issue as I did with many others. I was working on a large catalog that needed frequent price updates. In my particular case, we are talking about a catalog of 50k products spread across 4 active websites (each website with a single active store view) on a Magento Enterprise 1.14.2.0 platform.

The products receive their selling prices from their original providers via ERP calls. A product’s price can change multiple times during the day. Keep in mind that these updates must be executed at an interval of 15 mins. And, over 5k products can change their price data in one synchronization call. That means a fast mechanism needs to be in place in order to cope with the huge amount of data.

The development of the solution was a team effort as I worked together with my colleague, Andra. We split the implementation into two large components: ‘the API processing’, which I took care of, and ‘the API post-processing’ implemented by Andra. In this article, the two of us will, in turn, guide you toward the implementation of a fast mass price update for your product catalog.

The API processing

We used Magento SOAP v2 API in the project for operations like stock or products’ general data updates. So, I thought I should keep this consistent and build a SOAP API v2 method that will save the price and special price data per store for a list of given product SKUs.

The strategy

My plan was to inject the values directly into the database. I decided this because I knew exactly which attributes were going to be changed (‘price’, ‘special_price’, ‘special_from_date’ and ‘special_to_date’). I also knew which particular tables were involved (‘catalog_product_entity_decimal’ and ‘catalog_product_entity_datetime’).

The triggers used in this process are set at database level, I didn’t worry about the partial reindexing process of the affected products. Any change to the targeted tables will trigger the partial reindex.

I took this approach as I knew Magento was very slow when performing a ‘save()’ operation on a product. And the usage of the ‘updateAttributes()’ method would be incomplete without the post-product saving processing. And it would be still slower than a direct values injection. Basically, I wanted to make ‘my way on the highway’.

I recommend this approach if you know what kind of data processing is triggered after your product is normally saved in the context of your project. In my case, there wasn’t any 3rd party extension post-processing. There were only Magento out-of-the-box actions. But, we will talk about this later.

The new API operations module

So, let’s get to work! I started by creating a new module called ‘Evozon_API’ which would hold my new API method (I am assuming you already know how to create a custom Magento module and that you also understand the basic Magento SOAP API concepts and components, therefore I will not insist on these aspects).

The API configuration files

Under the ‘etc/’ folder, along with the obvious ‘config.xml’, other configuration files required for the API method declaration were introduced:

I added the ‘api.xml’ file in order to define the class and method that will handle the SOAP API call, as well as the API ACL resources:

<?xml version="1.0"?>
<config>
    <api>
        <resources>
            <product_price translate="title" module="evozon_api">
                <title>Update product prices</title>
                <model>evozon_api/product_price_api</model>
                <acl>multiupdate_price</acl>
                <methods>
                    <multiupdate translate="title" module="evozon_api">
                        <title>Update product prices bulk</title>
                        <method>multiupdatePrice</method>
                    </multiupdate>
                </methods>
            </product_price>
        </resources>
        <v2>
            <resources_function_prefix>
                <product_price>productPrice</product_price>
            </resources_function_prefix>
        </v2>
        <acl>
            <resources>
                <multiupdate_price translate="title" module="evozon_api">
                    <title>Evozon API - product price multiupdate</title>
                    <sort_order>1</sort_order>
                </multiupdate_price>
            </resources>
        </acl>
    </api>
</config>

Then I added the ‘wsdl.xml’ file, which describes the new method’s functionality. This file defines:

  • the complex types which represent the request and response internal structures;
  • the messages passed (the request and the response);
  • the port type which describes the API call with input (request) and output (response) messages;
  • the binding and the service:
<?xml version="1.0" encoding="UTF-8"?>
<definitions xmlns:typens="urn:{{var wsdl.name}}" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" xmlns="http://schemas.xmlsoap.org/wsdl/" name="{{var wsdl.name}}" targetnamespace="urn:{{var wsdl.name}}">
    <types>
        <schema xmlns="http://www.w3.org/2001/XMLSchema" targetnamespace="urn:Magento">
            <import namespace="http://schemas.xmlsoap.org/soap/encoding/" schemalocation="http://schemas.xmlsoap.org/soap/encoding/">
            <complextype name="productPriceMultiUpdateEntity">
                <all>
                    <element name="sku" type="xsd:string">
                    <element name="price" type="xsd:string">
                    <element name="special_price" type="xsd:string">
                    <element name="special_from_date" type="xsd:string">
                    <element name="special_to_date" type="xsd:string">
                    <element name="store_id" type="xsd:string">
                </element></element></element></element></element></element></all>
            </complextype>
            <complextype name="productPriceMultiUpdateEntityArray">
                <complexcontent>
                    <restriction base="soapenc:Array">
                        <attribute ref="soapenc:arrayType" wsdl:arraytype="typens:productPriceMultiUpdateEntity[]">
                    </attribute></restriction>
                </complexcontent>
            </complextype>
            <complextype name="productPriceMultiUpdateResultEntity">
                <all>
                    <element name="sku" type="xsd:string">
                    <element name="store_id" type="xsd:string">
                    <element name="error" type="xsd:string">
                </element></element></element></all>
            </complextype>
            <complextype name="productPriceMultiUpdateEntityResultArray">
                <complexcontent>
                    <restriction base="soapenc:Array">
                        <attribute ref="soapenc:arrayType" wsdl:arraytype="typens:productPriceMultiUpdateResultEntity[]">
                    </attribute></restriction>
                </complexcontent>
            </complextype>
        </import></schema>
    </types>
    <message name="productPriceMultiUpdateRequest">
        <part name="sessionId" type="xsd:string">
        <part name="products" type="typens:productPriceMultiUpdateEntityArray">
    </part></part></message>
    <message name="productPriceMultiUpdateResponse">
        <part name="result" type="typens:productPriceMultiUpdateEntityResultArray">
    </part></message>
    <porttype name="{{var wsdl.handler}}PortType">
        <operation name="productPriceMultiUpdate">
            <documentation>Multiupdate product price</documentation>
            <input message="typens:productPriceMultiUpdateRequest">
            <output message="typens:productPriceMultiUpdateResponse">
        </output></operation>
    </porttype>
    <binding name="{{var wsdl.handler}}Binding" type="typens:{{var wsdl.handler}}PortType">
        <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http">
        <operation name="productPriceMultiUpdate">
            <soap:operation soapaction="urn:{{var wsdl.handler}}Action">
            <input>
                <soap:body namespace="urn:{{var wsdl.name}}" use="encoded" encodingstyle="http://schemas.xmlsoap.org/soap/encoding/">
             
            <output>
                <soap:body namespace="urn:{{var wsdl.name}}" use="encoded" encodingstyle="http://schemas.xmlsoap.org/soap/encoding/">
            </soap:body></output>
        </soap:body></soap:operation></operation>
    </soap:binding></binding>
    <service name="{{var wsdl.name}}Service">
        <port name="{{var wsdl.handler}}Port" binding="typens:{{var wsdl.handler}}Binding">
            <soap:address location="{{var wsdl.url}}">
        </soap:address></port>
    </service>
</definitions>

If you are using the Magento SOAP API with WS-I compliance enabled (‘Services -> Magento Core API -> WS-I Compliance’), you need to define the ‘wsi.xml’ configuration file, since the WSDL nodes structure is different than the one in the ‘wsdl.xml’ file. This was not my case so, for the sake of simplicity, I will proceed with my implementation without the WS-I compliance.

The API model classes

I first created the ‘Evozon_API_Model_Product_Price_Api_V2’ class, which performs the mass price update. As previously mentioned by the configurations inside ‘api.xml’, this class includes a method called ‘multiupdatePrice()’, which receives an array of product prices data and returns a result array. This is the method that is going to do all the hard work.

class Evozon_API_Model_Product_Price_Api_V2 extends Evozon_API_Model_Product_Price_Api
{
    //....
 
    /**
     * Will perform product prices bulk update operations and will return an array containing
     * the SKUs of successfully updated products as well as the those included in incorrect 
     * provided data chunks
     *
     * @author Diana Costin <‍diana.costin@evozon.com>
     * @param array $products
     * @return  array
     */
    public function multiupdatePrice($products)
    {
        $this->_init();
         
        $processedData = $productIds = [];        
 
        foreach ($products as $product) {
            $validationResult = $this->_validateData($product);
            $error = "";
            if (!isset($validationResult[$product->sku."-".$product->store_id]['error']) &&
                isset($validationResult[$product->sku."-".$product->store_id]['product_id'])) {
 
                $productId = $validationResult[$product->sku."-".$product->store_id]['product_id'];
                unset($validationResult[$product->sku."-".$product->store_id]['product_id']);
                $saved = $this->_updateProductPrices($product, $productId);
 
                if ($saved) {
                    $productIds[] = $productId;
                }
            } else {
                if (!empty($validationResult[$product->sku."-".$product->store_id]['error'])) {
                    $error = $validationResult[$product->sku."-".$product->store_id]['error'];
                }
            }
            $processedData[]= array(
                "sku"      => $product->sku,
                "store_id" => $product->store_id,
                "error"    => $error
            );
        }
 
        if (!empty($productIds)) {
            // expose successfully updated product ids and overall processed data via event
            Mage::dispatchEvent('evozon_api_price_update', array(
                'product_data' => $processedData,
                'product_ids'  => $productIds
            ));
 
 
        }
        return $processedData;
    }
 
   //...
}

This class extends the ‘Evozon_API_Model_Product_Price_Api’, which would have been the class used for SOAP v1 calls as you may have already guessed. In our example, we use only SOAP v2, but I left ‘an open door’ for future implementations if at some point we need to implement the same mass update method for the v1 mechanism.

For now, the method ‘multiupdatePrice()’ exists in this class, but it doesn’t do anything :). Other than that, the class includes database connection resources retrieval, which will later be used by the child class ‘Evozon_API_Model_Product_Price_Api_V2’.

class Evozon_API_Model_Product_Price_Api extends Mage_Api_Model_Resource_Abstract
{
    protected $_resource;
    protected $_writeConnection;
    protected $_readConnection;
 
    /**
     * Initialize DB connection
     *
     * @author Diana Costin <‍diana.costin@evozon.com>
     * @return $this
     */
    protected function _initResource()
    {
        if (is_null($this->_resource)) {
            $this->_resource = Mage::getSingleton('core/resource');
        }
        if (is_null($this->_writeConnection)) {
            $this->_writeConnection = $this->_resource->getConnection('core_write');
        }
        if (is_null($this->_readConnection)) {
            $this->_readConnection = $this->_resource->getConnection('core_read');
        }
 
        return $this;
    }
 
    /**
     * Update product prices in bulk -- method unavailable for SOAP V1
     *
     * @param array $products
     * @return boolean
     */
    public function multiupdatePrice($products)
    {
    }
}

Tape rewind

Let’s get back to the child class ‘Evozon_API_Model_Product_Price_Api_V2’ and break the ‘multiupdatePrice()’ method implementation into smaller and easier-to-digest chunks. I will keep my focus on this method and not overload you with the full class code, as each of the used methods will be described below anyway.

Call to method ‘_init()’ is adding values to protected properties of the current class, which will be used later in the implementation of the price data saving:

/**
 * Will initialize protected properties
 *
 * @author Diana Costin <‍diana.costin@evozon.com>
 * @return array
 */
protected function _init()
   {
       $this->_initResource();
       $this->_attributeIds = $this->_getAttributeIds();
       $this->_productTypeId = Mage::getSingleton('eav/config')->getEntityType(Mage_Catalog_Model_Product::ENTITY)->getId();
       $this->_decimalTable = $this->_resource->getTableName('catalog_product_entity_decimal');
       $this->_datetimeTable = $this->_resource->getTableName('catalog_product_entity_datetime');
 
       return $this;
   }
/**
 * Will retrieve the product price attribute ids as an associative array of attribute_code => attribute_id
 *
 * @author Diana Costin <‍diana.costin@evozon.com>
 * @return array
 */
private function _getAttributeIds()
{
    $productEntityTable = $this->_writeConnection->getTableName('eav_attribute');
    $select = $this->_writeConnection->select()
        ->from(array('eav' => $productEntityTable), array('attribute_code', 'attribute_id'))
        ->where('eav.attribute_code IN (?)', array("price", "special_price", "special_from_date", "special_to_date"));
    $attributesData = $this->_writeConnection->fetchPairs($select);
 
    return $attributesData;
}

Validate your data

Before changing the products’ prices, we need to validate the data in order to make sure that only correct and secure data is saved inside the database. The call to ‘_validateData()’ points to the class ‘Evozon_API_Model_Product_Price_Validator’, which will be used for all data checks (I added this class as a stand-alone one, no need to extend anything).

/**
 * Will check if the received data is valid
 *
 * @author Diana Costin <‍diana.costin@evozon.com>
 * @param  StdClass $product
 * @return int valid product ID
 */
private function _validateData($product)
{
    /** @type Evozon_API_Model_Product_Price_Validator $validator */
    $validator = Mage::getSingleton("evozon_api/product_price_validator");
 
    // check XML data
    $checkResult = $validator->validate($product);
 
    return $checkResult;
}

From my point of view, the validation process should consist of two major parts: static and dynamic data validations.

The ‘Static data validation’ part, as I called it, includes simple checks which confirm whether the data provided is correct or not, that is if all mandatory fields are included in the provided XML and all included fields have the corresponding data type:

  • ‘sku’ field is mandatory, since this is the common denominator used by the two systems when referencing the same product entity;
  • ‘store_id’ is also mandatory, since we are saving product prices on the website by referencing the ID of the store associated to it;
  • ‘price’ field should be included every time and must be a numeric value greater than 0 (we don’t allow 0 priced products);
  • same goes for ‘special_price’ value if it’s included in the call;
  • date fields ‘special_from_date’ and ‘special_to_date’ must have a correct date format (I used ‘Y-m-d’) if included in the call.

Some sample snippets:

/**
 * Will check if the given product price data is valid
 *
 * @author Diana Costin <‍diana.costin@evozon.com>
 * @param $product
 * @return array
 */
private function _checkStaticData($product)
{
    $helper = Mage::helper("evozon_api");
 
    if (!isset($this->_result[$product->sku."-".$product->store_id]['error'])) {
        $this->_result[$product->sku."-".$product->store_id]['error'] = null;
    }
    // check mandatory fields
    if (empty($product->sku)) {
        $this->_result[$product->sku."-".$product->store_id]['error'] .= 
           "The SKU field is mandatory, no SKU provided.";
    }
 
    if (strlen($product->price) === 0) {
        $this->_result[$product->sku."-".$product->store_id]['error'] .= 
           "The price field is mandatory, no price provided. ";
    }
 
    if (strlen($product->store_id) === 0) {
        $this->_result[$product->sku."-".$product->store_id]['error'] .= 
           "The store ID field is mandatory, no store ID provided. ";
    }
 
    if (!is_numeric($product->price)) {
        $this->_result[$product->sku."-".$product->store_id]['error'] .=
           sprintf( "The price field should be numeric, '%s' is not a number. ", $product->price);
    }
 
    if ($product->price <= 0) {
        $this->_result[$product->sku."-".$product->store_id]['error'] .=
           "The price should not be less than or equal to 0.";
    }
 
    // check optional fields if they are given as input
    if (property_exists($product, 'special_price') && strlen($product->special_price)) {
        if (!is_numeric($product->special_price)) {
            $this->_result[$product->sku."-".$product->store_id]['error'] .=
                sprintf("If provided, the special price field should be numeric, '%s' is not a number. ",
                    $product->special_price
                );
        }
    }
 
    if (property_exists($product, 'special_price') && strlen($product->special_price)) {
        if ($product->special_price <= 0) {
            $this->_result[$product->sku."-".$product->store_id]['error'] .=
               "The special price should not be less than or equal to 0.";
        }
    }
 
    if (property_exists($product, 'special_from_date') && strlen($product->special_from_date)) {
        if (!$this->_isValidDate($product->special_from_date)) {
            $this->_result[$product->sku."-".$product->store_id]['error'] .=
                sprintf("If provided, the ‘special from’ field should be date('Y-m-d'), '%s' is not a valid date format. ",
                    $product->special_from_date
                );
        }
    }
 
    if (property_exists($product, 'special_to_date') && strlen($product->special_to_date)) {
        if (!$this->_isValidDate($product->special_to_date)) {
            $this->_result[$product->sku."-".$product->store_id]['error'] .=
               sprintf("If provided, the ‘special to’ field should be date ('Y-m-d'), '%s' is not a valid date format. ",
                    $product->special_to_date
                );
        }
    }
 
    return $this->_result;
}
 
/**
 * Will check if the given date is a valid one
 *
 * @author Diana Costin <‍diana.costin@evozon.com>
 * @param string $date
 * @return  bool
 */
private function _isValidDate($date)
{
    $createdDate = DateTime::createFromFormat('Y-m-d', $date);
    return $createdDate && $createdDate->format('Y-m-d') === $date;
}

The array ‘$this->_result’, with its particular keys in ‘ – ’ format, is used in order to identify the validation violations and to report them back to the API class handler, and then back to the caller. You can choose your own keys in order to specify which product data and on which store is faulty.

The ‘Dynamic data validation’

part will check whether the provided data matches products and store ids from Magento’s database. All these checks are performed by executing Zend style select statements in order to provide a good performance. So we will check:

  • if given SKUs correspond to existing products;
  • if the provided store IDs match stores which are assigned to existing websites;
  • if the products with the given SKUs are assigned to the websites referenced by the given store IDs.

A few examples:

/**
 * Will check if the given SKU is a valid product SKU
 *
 * @author Diana Costin <‍diana.costin@evozon.com>
 * @param string $sku
 * @return bool[/fusion_text][/fusion_text][/fusion_builder_column][/fusion_builder_column][/fusion_builder_row][/fusion_builder_container][/fusion_builder_row][/fusion_builder_container]

int */ protected function _checkIfProductExists($sku) { $productEntityTable = $this->_readConnection->getTableName('catalog_product_entity'); $select = $this->_readConnection->select() ->from(array('ce' => $productEntityTable), array('entity_id')) ->where('ce.sku=?', $sku); $productId = $this->_readConnection->fetchOne($select); if (empty($productId)){ return false; } return $productId; }

/**
 * Will check if the given ID is a valid store ID
 *
 * @author Diana Costin <‍diana.costin@evozon.com>
 * @param  int $storeId
 * @return bool
 */
public function checkIfStoreExists($storeId)
{
    $productEntityTable = $this->_readConnection->getTableName('core_store');
    $select = $this->_readConnection->select()
        ->from(array('cs' => $productEntityTable), array('code'))
        ->where('cs.store_id=?', $storeId);
    $storeCode = $this->_readConnection->fetchOne($select);
 
   return !empty($storeCode); 
}
/**
 * Will check if the given product is assigned to the given website
 *
 * @author Diana Costin <‍diana.costin@evozon.com>
 * @param  int $websiteId
 * @param  int $productId
 * @return bool
 */
public function checkIfProductIsAssignedToWebsite($websiteId, $productId)
{
    if ($websiteId == 0) {
        return true;
    }
 
    $productWebsiteTable = $this->_readConnection->getTableName('catalog_product_website');
 
    $select = $this->_readConnection->select()
        ->from(array('cpw' => $productWebsiteTable), array('website_id'))
        ->where('cpw.website_id=?', $websiteId)
        ->where('cpw.product_id=?', $productId);
    $storeCode = $this->_readConnection->fetchOne($select);
 
    return !empty($storeCode); 
}
/**
 * Will retrieve a websiteId based on a given storeId
 *
 * @author Diana Costin <‍diana.costin@evozon.com>
 * @param $storeId
 * @return bool

int */ private function _getWebsiteIdByStoreId($storeId) { // cache values if (is_null($this->_storesArray)) { $storeTable = $this->_readConnection->getTableName('core_store'); $select = $this->_readConnection->select() ->from(array('cs' => $storeTable), array('store_id', 'website_id'));

$this->_storesArray = $this->_readConnection->fetchPairs($select); }

if (!isset($this->_storesArray[$storeId])) { return false; } return $this->_storesArray[$storeId]; }

Any product data items which don’t satisfy the validation conditions should be rejected and reported back to the caller with a specific error message. All product data items which satisfy the conditions should be saved in the Magento database and successful updates should be reported back to the caller also.

In my first implementation attempt, the API method returned an array of successfully saved products or a fault code and message every time the provided data was not correct (‘all or nothing’ approach). This mechanism induced a ‘bottleneck’ effect, since no product price was saved until the faulty data was corrected.

In a later improvement, I modified the implementation so that the correct data is saved and faulty chunks are returned with a specific error message referencing a list of SKUs and store IDs for a particular incorrect behavior, instead of using API fault messages. This approach eased the data check on the caller side, while not blocking the price synchronization process.

Database operations

Once the data is validated, we can move forward. We got to the fun part, which is the data saving. By now you know we are building a fast mechanism, but you may have wondered how we were going to do this in an optimal way. The answer: Zend style update queries to the database.

We will prepare the data first, as we need to know what kind of operation is required (insert/update or delete), which are the tables and attributes involved in these operations.

We are going to save the price by referencing the ‘store_id’. The price attribute has website scope in our case, so our changes will set the price per website actually (there isn’t a customization to set price on an individual store per se). We are using the store_id directly in order to avoid a secondary computation regarding extraction of the data and also for consistency reasons since all other existing SOAP calls manipulating products are performed by usage of ‘store_id’.

/**
 * Prepare data for multiple update queries
 *
 * @author Diana Costin <‍diana.costin@evozon.com>
 * @param  StdClass $product
 * @return array
 */
private function _prepareProductData($product)
{       
    $preparedDeleteData = [];
     
    $preparedUpdateData['price'] = array(
        'tableName'    => $this->_decimalTable,
        'entityTypeId' => $this->_productTypeId,
        'attributeId'  => $this->_attributeIds['price'],
        'storeId'      => $product->store_id,
        'value'        => $product->price
    );
 
    if (property_exists($product, 'special_price')) {
        if (strlen($product->special_price)) {
            $preparedUpdateData['special_price'] = array(
                'tableName'    => $this->_decimalTable,
                'entityTypeId' => $this->_productTypeId,
                'attributeId'  => $this->_attributeIds['special_price'],
                'storeId'      => $product->store_id,
                'value'        => $product->special_price
            );
        } else {
            $preparedDeleteData['special_price'] = array(
                'tableName'   => $this->_decimalTable,
                'attributeId' => $this->_attributeIds['special_price'],
                'storeId'     => $product->store_id
            );
        }
    }
    if (property_exists($product, 'special_from_date')) {
        if (strlen($product->special_from_date)) {
            $preparedUpdateData['special_from_date'] = array(
                'tableName'    => $this->_datetimeTable,
                'entityTypeId' => $this->_productTypeId,
                'attributeId'  => $this->_attributeIds['special_from_date'],
                'storeId'      => $product->store_id,
                'value'        => date($product->special_from_date)
            );
        } else {
            $preparedDeleteData['special_from_date'] = array(
                'tableName'   => $this->_datetimeTable,
                'attributeId' => $this->_attributeIds['special_from_date'],
                'storeId'     => $product->store_id
            );
        }
    }
    if (property_exists($product, 'special_to_date')) {
        if (strlen($product->special_from_date)) {
            $preparedUpdateData['special_to_date'] = array(
                'tableName'    => $this->_datetimeTable,
                'entityTypeId' => $this->_productTypeId,
                'attributeId'  => $this->_attributeIds['special_to_date'],
                'storeId'      => $product->store_id,
                'value'        => date($product->special_to_date)
            );
        } else {
            $preparedDeleteData['special_to_date'] = array(
                'tableName'   => $this->_datetimeTable,
                'attributeId' => $this->_attributeIds['special_to_date'],
                'storeId'     => $product->store_id
            );
        }
    }
 
    $result['delete']          = $preparedDeleteData;
    $result['insertDuplicate'] = $preparedUpdateData;
 
    return $result;
}

Note: no, you cannot delete the price attribute value of a product by API 🙂

Now let’s save this to the database. Take a look below:

/**
 * Will perform the price update on a given product ID and data supplied by the SOAP call
 *
 * @author Diana Costin <‍diana.costin@evozon.com>
 * @param  StdClass $product
 * @param  int $productId
 * @return bool
 */
private function _updateProductPrices($product, $productId)
{
    $productDataArray       = $this->_prepareProductData($product);
    $hasInsertUpdateData = isset($productDataArray['insertDuplicate']);
    $hasDeleteData           = isset($productDataArray['delete']);

try { if ($hasDeleteData