Post-indexing transformations

Post-indexing transformations can help you optimize your data:

  • Data cleaning. Correct inconsistencies or errors discovered after indexing, rename attributes, or change data types.
  • A/B testing. Create variations of attributes for testing different ranking strategies.
  • Index optimizations. Bucket, compute new values, convert units of measure.
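
As a rough illustration of the kind of per-record logic these use cases involve, the following Python sketch renames an attribute, changes a data type, converts a unit, and buckets a value. It is not the connector’s own transformation syntax: the function name `transform` and the attributes `prize`, `weight_g`, `weight_kg`, and `price_bucket` are hypothetical.

```python
from typing import Any


def transform(record: dict[str, Any]) -> dict[str, Any]:
    """Return a cleaned copy of one record (all attribute names are hypothetical)."""
    out = dict(record)

    # Data cleaning: rename a misspelled attribute and change its type to a number.
    if "prize" in out:
        out["price"] = out.pop("prize")
    if "price" in out:
        out["price"] = float(out["price"])

    # Index optimization: convert grams to kilograms.
    if "weight_g" in out:
        out["weight_kg"] = out.pop("weight_g") / 1000

    # Index optimization: bucket the price (the threshold of 50 is arbitrary).
    if "price" in out:
        out["price_bucket"] = "low" if out["price"] < 50 else "high"

    return out


cleaned = transform({"objectID": "1", "prize": "19.90", "weight_g": 750})
```

The function returns a copy rather than mutating its input, which makes it easier to compare a record before and after the transformation.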

To apply post-indexing transformations, use the Fetch from Algolia connector.

Get started

  1. Go to the Algolia dashboard and select your Algolia application.
  2. On the left sidebar, select Data sources.
  3. On the Connectors page, select the Fetch from Algolia connector, and click Connect.
  4. Choose the index to transform.
  5. Configure your transformation: create a new transformation or reuse an existing one.
  6. Create the task. You can select two types of tasks:
    • On demand. You can manually trigger the task from the Task page in the dashboard, or using the API.
    • Scheduled. You can select a schedule from the list, or enter a custom schedule with a cron expression.

Costs

The Fetch from Algolia connector browses your index and re-applies the changes.

Depending on the indexing strategy you choose, this adds read and write operations. These operations might lead to costs, depending on your plan.

If you’re on an Algolia plan that charges for read and write operations, carefully choose the delay between connector runs.
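
To get a feel for how the delay affects volume, the sketch below estimates how many records a connector touches per day. It assumes, purely for illustration, that every run reads and rewrites every record in the index. How those reads and writes are counted and billed depends on your plan, and the function name and figures are hypothetical.

```python
def records_touched_per_day(record_count: int, delay_minutes: int) -> int:
    """Estimate records processed per day, assuming each run touches every record."""
    runs_per_day = (24 * 60) // delay_minutes
    return record_count * runs_per_day


# Hypothetical index of 50,000 records.
hourly = records_touched_per_day(50_000, 60)        # 24 runs per day
daily = records_touched_per_day(50_000, 24 * 60)    # 1 run per day
```

Under these assumptions, an hourly schedule processes 24 times as many records as a daily one.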

Performance considerations

If you don’t need an Algolia-managed transformation step in your pipeline, prefer sending records to Algolia the traditional way, because the connector adds overhead to your ingestion process.

Constraints

  • Index size: no limit on the index size.
  • Duration: each run cannot take longer than 60 minutes.

How to select the right indexing strategy

Data drift occurs when you have different sources of truth for your data. To avoid it, carefully select your indexing strategy when using the Fetch from Algolia connector. This connector works by fetching data from your Algolia index and applying transformations, so it’s crucial to understand how it interacts with your primary record ingestion method (for example, the API or other connectors).

Key limitation for real-time updates and full record updates

Be careful when using the Fetch from Algolia connector with ingestion methods that use the full record updates action with real-time updates. Full record updates replace the entire record. If a real-time update runs after the connector reads data from your index but before it completes the transformations, any transformation applied by the connector is lost. That’s because a real-time update will overwrite the entire record, including the transformed fields.
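
The effect can be simulated with an in-memory dictionary standing in for the index. This is a minimal sketch, not Algolia client code: `full_record_update`, the `sku-1` record, and the 10% discount are assumptions made for the example.

```python
def full_record_update(index: dict, record: dict) -> None:
    """Replace the stored record entirely (models a full record update)."""
    index[record["objectID"]] = dict(record)


index: dict = {}
full_record_update(index, {"objectID": "sku-1", "price": 100})

# The connector reads the record, computes a new field, and writes it back.
snapshot = dict(index["sku-1"])
index["sku-1"]["discounted_price"] = round(snapshot["price"] * 0.9, 2)

# A real-time full record update then arrives from the primary ingestion method.
full_record_update(index, {"objectID": "sku-1", "price": 120})

# The whole record was replaced, so the transformed field is gone.
transformed_field_survived = "discounted_price" in index["sku-1"]
```

After the last step, the record contains only `objectID` and `price`, and `transformed_field_survived` is `False`.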

The following Algolia ecommerce integrations use the full record updates by default:

Other connectors and API clients using partial updates

Connectors and API clients using partial updates will work correctly with the Fetch from Algolia connector. Partial updates ensure that only the specified attributes are modified, preserving other attributes and avoiding conflicts.
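
The same in-memory model shows why partial updates are safer. In this sketch, `partial_update` and the record contents are hypothetical, and a dictionary stands in for the index.

```python
def partial_update(index: dict, object_id: str, attributes: dict) -> None:
    """Merge only the given attributes into the stored record."""
    index.setdefault(object_id, {"objectID": object_id}).update(attributes)


# A record that the connector has already enriched with discounted_price.
index = {"sku-1": {"objectID": "sku-1", "price": 100, "discounted_price": 90.0}}

# The primary ingestion method changes only the attributes it owns.
partial_update(index, "sku-1", {"price": 120, "inventory": 7})
```

The `discounted_price` attribute is preserved, although it still reflects the old price until the connector runs again.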

To avoid data loss, carefully select your indexing strategy when using the Fetch from Algolia connector and prefer partial updates whenever possible.

  • Partial record updates (Recommended) allow creating, updating, and deleting records during the connector run. However, concurrent field updates will be overwritten.
  • Full record updates completely replace individual records, overriding changes made during the connector run. You should avoid this strategy.
  • Full reindexing completely replaces your index, overriding any changes made during the connector run. You should avoid this indexing strategy.

Example scenario of conflicts

The following scenario illustrates a potential issue when an API client and the Fetch from Algolia connector both update the same index:

  1. API client:
    • Ingests product data, including price, category, and inventory.
    • Uses “Partial Update” action.
    • Runs on a cron job every hour.
  2. Fetch from Algolia Connector:
    • Fetches data from the Algolia index and adds a new field, discounted_price, based on the price.
    • Uses “Partial Update” action.

Potential Issue

An issue can arise if the Fetch from Algolia connector runs for a long time. For example, suppose the API client updates a product’s price at 10 AM, and the Fetch from Algolia connector starts at 10:07 AM and finishes after 11 AM. At 11 AM, the API client updates the price again, so when the connector writes its result, the discounted_price attribute is based on an outdated price.
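
This timeline can be replayed step by step with a dictionary standing in for the index. The sketch assumes a 10% discount rule, a finish time of 11:05 AM, and hypothetical names (`partial_update`, `discount`, `sku-1`); none of these come from the connector itself.

```python
index = {"sku-1": {"objectID": "sku-1", "price": 0}}


def partial_update(object_id: str, attributes: dict) -> None:
    """Merge only the given attributes into the stored record."""
    index[object_id].update(attributes)


def discount(price: int) -> int:
    """Hypothetical rule: 10% off, in whole currency units."""
    return price * 90 // 100


# 10:00 AM: the API client sets the price.
partial_update("sku-1", {"price": 100})

# 10:07 AM: the connector starts and reads the record.
price_seen_by_connector = index["sku-1"]["price"]

# 11:00 AM: the API client updates the price while the connector is still running.
partial_update("sku-1", {"price": 80})

# 11:05 AM: the connector finishes and writes a value computed from its 10:07 AM read.
partial_update("sku-1", {"discounted_price": discount(price_seen_by_connector)})

record = index["sku-1"]
is_stale = record["discounted_price"] != discount(record["price"])
```

Both writes are partial updates, so no attribute is lost, but `discounted_price` ends up derived from a price that is no longer in the index.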

Recommendation

To avoid such issues, ensure that the Fetch from Algolia connector never runs concurrently with any other indexing operation. Alternatively, you can programmatically trigger your connector after your other indexing operations.
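
One way to sequence the two steps is a small orchestration function that triggers the connector only after the primary indexing step has returned. This is a sketch under stated assumptions: `index_primary_data` and `trigger_connector_task` are hypothetical callables that you would implement yourself, for example by wrapping your indexing job and the API call that triggers an on-demand task.

```python
from typing import Callable


def run_pipeline(
    index_primary_data: Callable[[], None],
    trigger_connector_task: Callable[[], None],
) -> list[str]:
    """Run primary indexing first, then trigger the connector task."""
    steps = []

    # Assumption: this call blocks until the primary indexing has fully completed.
    index_primary_data()
    steps.append("primary indexing finished")

    # Only now is it safe to let the connector read the index.
    trigger_connector_task()
    steps.append("connector task triggered")

    return steps


# Usage with stand-in callables that only record the order of execution.
events: list[str] = []
steps = run_pipeline(
    lambda: events.append("index"),
    lambda: events.append("connector"),
)
```

The ordering guarantee only holds if the first callable really waits for indexing to finish, and if no other indexing job starts while the connector task is running.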
