Guide: Performance Tuning¶

When working with large datasets, the performance of your data import can become critical. This guide covers the key parameters and strategies you can use to tune the import process for maximum speed and efficiency.

The primary way to control performance is by adjusting the parameters passed to the fluvo import command, which you can set in the params dictionary in your transform.py script.

Choosing the Right Protocol¶

The easiest performance win is choosing the right RPC protocol. For Odoo 10 and newer, switching from XML-RPC to JSON-RPC can provide approximately 30% faster imports.

CLI Option: --protocol
Config Key: protocol
Default: xmlrpc

Protocol Comparison¶

Protocol	Odoo Version	Performance	Security
`xmlrpc`	8+ (all)	Baseline	HTTP
`xmlrpcs`	8+ (all)	Baseline	HTTPS
`jsonrpc`	10+	~30% faster	HTTP
`jsonrpcs`	10+	~30% faster	HTTPS
`json2`	19+	Best	HTTP
`json2s`	19+	Best	HTTPS

Why JSON-RPC is Faster¶

Smaller payloads: JSON is more compact than XML
Faster parsing: Python’s JSON parser is highly optimized
Simpler data types: JSON maps directly onto Python's basic types (dict, list, str, numbers, booleans, and None)
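The payload-size point can be checked locally with Python's standard library. The sketch below serializes the same made-up batch of records once as an XML-RPC call and once as a JSON-RPC-style body, then compares the byte counts. The record contents, the method name, and the JSON envelope layout are illustrative assumptions, not what fluvo actually sends.

```python
import json
import xmlrpc.client

# Made-up sample batch; the field names are illustrative only.
records = [
    {"id": f"partner_{i}", "name": f"Customer {i}", "is_company": True, "credit": i * 10}
    for i in range(100)
]

# XML-RPC wraps every value in typed tags (<member>, <value>, <string>, ...).
xml_payload = xmlrpc.client.dumps((records,), methodname="load")

# A JSON-RPC 2.0 style envelope carrying the same data (assumed layout).
json_payload = json.dumps(
    {"jsonrpc": "2.0", "method": "call", "params": {"args": [records]}, "id": 1}
)

xml_size = len(xml_payload.encode("utf-8"))
json_size = len(json_payload.encode("utf-8"))
print(f"XML-RPC: {xml_size} bytes, JSON-RPC: {json_size} bytes")
```

The exact ratio depends on your data, so treat the numbers printed here as an indication rather than a benchmark.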

Example¶

# Switch to JSON-RPC for better performance
fluvo import --protocol jsonrpc --connection-file conf/connection.conf ...

Or set it permanently in your config file:

[Connection]
hostname = odoo.example.com
database = mydb
login = admin
password = secret
protocol = jsonrpc

Recommendation

For production imports on Odoo 10+, always use jsonrpcs (JSON-RPC over HTTPS) for both security and performance.

Using Multiple Workers¶

The most significant performance gain comes from parallel processing. The import client can run multiple “worker” processes simultaneously, each handling a chunk of the data.

CLI Option: --worker
params Key: 'worker'
Default: 1

By increasing the number of workers, you can leverage multiple CPU cores on the machine running the import script and on the Odoo server itself.

Example¶

To use 4 parallel processes for an import:

# In your transform.py script

import_params = {
    'model': 'sale.order',
    'worker': 4, # Use 4 workers
    # ... other params
}

processor.process(
    mapping=my_mapping,
    filename_out='data/sale_order.csv',
    params=import_params
)

This will add the --worker=4 flag to the command in your generated load.sh script.

Trade-offs and Considerations¶

CPU Cores: A good rule of thumb is to set the number of workers to be equal to, or slightly less than, the number of available CPU cores on your Odoo server.
Database Deadlocks: The biggest risk with multiple workers is the potential for database deadlocks. This can happen if two workers try to write records that depend on each other at the same time. The library’s two-pass error handling system is designed to mitigate this.

Tuning Workers for Your Server¶

The optimal number of workers depends on your Odoo server’s database connection pool. Check these settings in your odoo.conf:

db_maxconn: Maximum database connections per Odoo worker (default: 64)
workers: Number of Odoo worker processes

Recommended formula: --worker = db_maxconn / odoo_workers

For example, with db_maxconn = 64 and workers = 4:

Maximum safe value: 64 / 4 = 16 workers

# For a server with 4 Odoo workers and db_maxconn=64
fluvo import --worker 12 --protocol jsonrpc ...
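The formula is simple enough to compute by hand, but a small helper makes the "start lower and increase" advice concrete. This is a minimal sketch: the function names and the 0.75 headroom factor are illustrative assumptions, not part of fluvo.

```python
def max_safe_workers(db_maxconn: int, odoo_workers: int) -> int:
    """Upper bound for --worker using the guide's formula: db_maxconn / odoo_workers."""
    if db_maxconn < 1 or odoo_workers < 1:
        raise ValueError("db_maxconn and odoo_workers must be positive")
    # Integer division, and never suggest fewer than one worker.
    return max(1, db_maxconn // odoo_workers)


def suggested_workers(db_maxconn: int, odoo_workers: int, headroom: float = 0.75) -> int:
    """A starting point below the maximum; headroom=0.75 is an illustrative choice."""
    return max(1, int(max_safe_workers(db_maxconn, odoo_workers) * headroom))


print(max_safe_workers(64, 4))   # 16
print(suggested_workers(64, 4))  # 12
```

With db_maxconn = 64 and 4 Odoo workers this reproduces the ceiling of 16 and the more cautious value of 12 used in the command above.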

Warning

Setting --worker too high can exhaust your database connection pool, causing “too many connections” errors. Start with a lower value and increase gradually while monitoring server performance.

Solving Concurrent Updates with `--groupby`¶

The --groupby option is a powerful feature designed to solve the “race condition” problem that occurs during high-performance, multi-worker imports.

CLI Option: --groupby
params Key: 'groupby' (Note: use groupby, not split)
Default: None

The Problem: A Race Condition¶

Imagine you are using multiple workers to import contacts that all link to the same parent company.

Worker 1 takes a contact and tries to update “Company A”.
At the exact same time, Worker 2 takes another contact and also tries to update “Company A”.

The database locks the company record for Worker 1, so when Worker 2 tries to access it, it fails with a “concurrent update” error.

The Solution: The “Sorting Hat”¶

The --groupby option acts like a “sorting hat.” Before the import begins, it looks at the column you specify (e.g., parent_id/id) and ensures that all records with the same value in that column are sent to the exact same worker.

This guarantees that two different workers will never try to update the same parent record at the same time, which removes this source of "concurrent update" errors for the grouped column.
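The idea can be sketched in a few lines of plain Python. This is not fluvo's implementation, only an illustration of the principle: rows are first grouped by the chosen column, and then whole groups (never individual rows) are handed to workers. The function name and the "largest group first, least-loaded worker" balancing rule are assumptions made for the example.

```python
from collections import defaultdict


def partition_by_key(rows, key, n_workers):
    """Assign rows to workers so that rows sharing row[key] stay together."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    # Hand out whole groups, largest first, to the currently least-loaded worker.
    buckets = [[] for _ in range(n_workers)]
    for _, members in sorted(groups.items(), key=lambda kv: -len(kv[1])):
        target = min(buckets, key=len)
        target.extend(members)
    return buckets


rows = [
    {"id": "C1", "parent_id/id": "A"},
    {"id": "C2", "parent_id/id": "B"},
    {"id": "C3", "parent_id/id": "A"},
]
for n, bucket in enumerate(partition_by_key(rows, "parent_id/id", 2), start=1):
    print(f"Worker {n}:", [r["id"] for r in bucket])
# Worker 1: ['C1', 'C3']
# Worker 2: ['C2']
```

Because a group is never split, the number of useful partitions is bounded by the number of distinct values in the column, which is why a low-cardinality column limits parallelism.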

Visualizing the Difference¶

---
config:
  theme: redux
---
graph TD
    subgraph subGraph0["Without --groupby (High Risk of Error)"]
        A["Records:<br>C1 (Parent A)<br>C2 (Parent B)<br>C3 (Parent A)"] --> B{Random Distribution}
        B --> W1["Worker 1 gets C1"]
        B --> W2["Worker 2 gets C3"]
        B --> W3["Worker 3 gets C2"]
        W1 -- "tries to update" --> P_A(("Parent A"))
        W2 -- "tries to update" --> P_A
        W3 -- "updates" --> P_B(("Parent B"))
        P_A --> X["ERROR<br>Concurrent Update"]
    end
    subgraph subGraph1["With --groupby=parent_id/id (Safe)"]
        C["Records:<br>C1 (Parent A)<br>C2 (Parent B)<br>C3 (Parent A)"] --> D{Smart Distribution}
        D -- "parent_id = A" --> W3b["Worker 1 gets C1, C3"]
        D -- "parent_id = B" --> W4b["Worker 2 gets C2"]
        W3b --> S1[("Update Parent A")]
        W4b --> S2[("Update Parent B")]
        S1 & S2 --> Y(["SUCCESS"])
    end
    style W1 fill:#FFF9C4
    style W2 fill:#C8E6C9
    style W3 fill:#FFE0B2
    style W3b fill:#FFF9C4
    style W4b fill:#C8E6C9
    style D fill:#BBDEFB
    style B fill:#BBDEFB
    style subGraph0 fill:transparent
    style subGraph1 fill:transparent
    style Y stroke:#00C853

Example¶

To safely import contacts in parallel, grouped by their parent company:

# In your transform.py script

import_params = {
    'model': 'res.partner',
    'worker': 4,
    # This is the crucial part
    'groupby': 'parent_id/id', # The internal key is 'groupby'
}

This will add --groupby=parent_id/id to your generated load.sh script.

When to use `--groupby` (and when not to)¶

--groupby only matters for parallel imports — it trades a little parallelism for safety. Reach for it when all of these hold:

You run with --worker greater than 1. With a single worker there is no concurrency, so --groupby does nothing useful.
Your records share a written/locked related record — e.g. many res.partner contacts under the same company, or order lines on the same order. That sharing is what triggers the “concurrent update” errors.
The grouping column has many distinct values, each with several rows, so partitioning still leaves real parallelism.

Avoid --groupby when:

You import single-threaded (--worker 1) — it is pure overhead.
Records are independent (no shared parent/relation) — there is nothing to serialize.
The column has few distinct values (e.g. a two-value status). Grouping then creates a couple of huge serial partitions, which kills parallelism and holds locks on each shared row longer — making contention worse, not better. Group by the column that is actually contended (usually the shared *_id/id), not just any column.

Important

--groupby is not for import ordering. Getting a child imported after its parent is handled by the two-pass deferral (--auto-defer / --deferred-fields), not --groupby. Use --groupby for lock contention in parallel mode and --auto-defer for relational correctness — they compose.

Letting Fluvo choose the column: `--auto-groupby`¶

If you would rather not pick the column yourself, add --auto-groupby (and leave --groupby unset). During pre-flight, Fluvo inspects the data and picks a non-self many2one column to partition by. Among columns with real duplication (at least two rows per target), it chooses the one with the highest cardinality — i.e. the most distinct targets. That deliberately avoids the low-cardinality trap described above: it groups contended writes together while keeping the largest number of partitions, so parallelism is preserved. It is conservative (a column needs real duplication and more than one group, otherwise it groups by nothing), off by default, and never overrides an explicit --groupby.

import_params = {
    'model': 'res.partner',
    'worker': 4,
    'auto_groupby': True,  # Fluvo picks the deadlock-avoidance column for you
}
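The selection rule described above can be approximated as follows. This is a hedged sketch, not fluvo's code: it assumes the candidate list already contains only non-self many2one columns, and it reads "real duplication" as "at least one target appears in two or more rows", which is one possible interpretation.

```python
from collections import Counter


def pick_groupby_column(rows, candidate_columns):
    """Return the qualifying column with the most distinct values, or None."""
    best_column, best_cardinality = None, 0
    for column in candidate_columns:
        counts = Counter(row[column] for row in rows if row.get(column))
        # Assumed reading of "real duplication": some target has 2+ rows.
        has_duplication = any(n >= 2 for n in counts.values())
        if not has_duplication or len(counts) < 2:
            continue  # nothing to serialize, or only a single group
        if len(counts) > best_cardinality:
            best_column, best_cardinality = column, len(counts)
    return best_column


rows = [
    {"parent_id/id": p, "state_id/id": s, "user_id/id": f"user_{i}"}
    for i, (p, s) in enumerate(
        [("A", "X"), ("A", "X"), ("B", "X"), ("B", "Y"), ("C", "Y"), ("C", "Y")]
    )
]
print(pick_groupby_column(rows, ["parent_id/id", "state_id/id", "user_id/id"]))
# parent_id/id
```

Here user_id/id is skipped because every value is unique, and parent_id/id wins over state_id/id because it yields three partitions instead of two.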

Understanding Batch Size (`--size`)¶

The --size option is one of the most critical parameters for controlling the performance and reliability of your imports. In simple terms, it controls how many records are processed in a single database transaction.

To understand why this is so important, think of it like going through a checkout at a grocery store.

CLI Option: --size
params Key: 'size'
Default: 1000 (or the default set in the application’s configuration)

The Default Odoo Behavior: One Big Basket¶

When you use Odoo’s standard import wizard, it’s like putting all of your items (every single row in your file) into one giant shopping basket. This “all-or-nothing” approach has two major problems:

Transaction Timeouts: The Odoo server has a time limit to process your entire basket. If you have too many items (a very large file), it might take too long, and the server will give up with a “Transaction timed out” error. None of your records are imported.
Single Point of Failure: If just one record in your giant basket is “bad” (e.g., a missing price), the server rejects the entire basket. All of your other perfectly good records are rejected along with the single bad one.

How `--size` Solves the Problem: Multiple Small Baskets¶

The fluvo library allows you to break up your import into smaller, more manageable chunks. When you use --size 100, you are telling the tool to use multiple, smaller baskets, each containing only 100 items.

This solves both problems:

Each small basket is processed very quickly, avoiding server timeouts.
If one small basket has a bad record, only that basket of 100 records is rejected. All the other baskets are still successfully imported.
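The effect of the basket size on failure isolation can be simulated without an Odoo server. The sketch below is a toy model under stated assumptions: each batch is treated as one all-or-nothing transaction, and the validity check (a missing price) is a made-up stand-in for whatever makes a real record fail.

```python
def import_in_batches(records, size, is_valid):
    """Simulate all-or-nothing transactions of `size` records each."""
    imported, rejected = [], []
    for start in range(0, len(records), size):
        batch = records[start:start + size]
        # One bad record rolls back only the batch (transaction) it belongs to.
        if all(is_valid(record) for record in batch):
            imported.extend(batch)
        else:
            rejected.extend(batch)
    return imported, rejected


def has_price(record):
    return record["price"] is not None


records = [{"name": f"Item {i}", "price": 9.99} for i in range(1000)]
records[250]["price"] = None  # a single bad record

for size in (1000, 100):
    ok, failed = import_in_batches(records, size, has_price)
    print(f"size={size}: {len(ok)} imported, {len(failed)} rejected")
# size=1000: 0 imported, 1000 rejected
# size=100: 900 imported, 100 rejected
```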

Visualizing the Difference¶

---
config:
  theme: redux
---
flowchart TD
    subgraph subGraph0["Default Odoo Import (One Big Basket)"]
        B{"One Large Transaction<br>Size=1000"}
        A["1000 Records"]
        D@{ label: "FAIL<br>All 1000 records rejected" }
        C["Odoo Database"]
    end
    subgraph subGraph1["fluvo with --size=100 (Multiple Small Baskets)"]
        F{"Transaction 1<br>100 records"}
        E["1000 Records"]
        G["Odoo Database"]
        H{"Transaction 2<br>100 records"}
        I@{ label: "FAIL<br>Only 100 records rejected" }
        J["...continues with Transaction 3"]
    end
    A --> B
    B -- Single Error --> D
    B -- No Errors --> C
    E --> F
    F --> G & H
    H -- Single Error --> I
    H -- No Errors --> G
    I --> J
    J --> G
    D@{ shape: rect}
    C@{ shape: cyl}
    G@{ shape: cyl}
    I@{ shape: rect}
    style C fill:#AA00FF
    style G fill:#AA00FF
    style subGraph0 fill:transparent
    style subGraph1 fill:transparent

Trade-offs and Considerations¶

Larger Batch Size: Can be faster as it reduces the overhead of creating database transactions, but consumes more memory. If one record in a large batch fails, Odoo may reject the entire batch.
Smaller Batch Size: More resilient to individual record errors and consumes less memory, but can be slower due to increased network overhead.
WAN Performance: For slow networks, sending smaller chunks of data is often more stable than sending one massive payload.

Handling Server Timeouts (`limit-time-real`)¶

A common source of import failures, especially with large or complex data, is the Odoo server’s built-in request timeout.

What it is: Odoo servers have a configuration parameter called limit-time-real (written limit_time_real in odoo.conf) which defines the maximum wall-clock time (in seconds) a worker may spend on a single request before it is automatically terminated. The default value is 120 seconds (2 minutes).
The Problem: If a single batch of records takes longer than this limit to process (due to complex computations, custom logic, or a very large batch size), the server will kill the process, and your import will fail for that batch.
The Solution: The solution is to reduce the batch size using the --size option. By sending fewer records in each transaction, you ensure that each individual transaction can be completed well within the server’s time limit.

Tip: If your imports are failing with “timeout” or “connection closed” errors, the first thing you should try is reducing the --size value (e.g., from 1000 down to 200 or 100).
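If you have timed a small test import, you can turn the time limit into a rough ceiling for --size. The helper below is a back-of-the-envelope sketch: it assumes processing time grows linearly with the number of records, and both the function name and the 50% safety factor are illustrative choices rather than fluvo settings.

```python
def max_batch_size(seconds_per_record: float,
                   limit_time_real: float = 120.0,
                   safety_factor: float = 0.5) -> int:
    """Largest --size whose estimated duration stays within a fraction of the limit."""
    if seconds_per_record <= 0:
        raise ValueError("seconds_per_record must be positive")
    budget = limit_time_real * safety_factor  # leave room for slow records
    return max(1, int(budget / seconds_per_record))


# Example: a test run showed about 0.25 s of server time per record.
print(max_batch_size(0.25))  # 240
```

Records with heavy computed fields or custom logic can take far longer than the average, so it is worth rounding the result down further when the data is uneven.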

Adaptive Throttling (`--adaptive-throttle`)¶

For long-running imports or when working with servers under variable load, the --adaptive-throttle option provides intelligent, automatic performance tuning.

CLI Option: --adaptive-throttle
Default: Disabled

What It Does¶

When enabled, the import client monitors server response times and automatically adjusts both:

Delays between batches - Adds pauses when the server is slow
Batch sizes - Dynamically splits batches when the server is stressed

Health States and Behavior¶

The throttle controller categorizes server health into four states based on response times:

Health State	Response Time	Batch Size	Delay
Healthy	< 2s	100%	0s
Degraded	2-5s	75%	0.5s
Stressed	5-10s	50%	2s
Overloaded	> 10s	25%	5s

How Batch Scaling Works¶

When the server health degrades, the throttle controller automatically splits batches:

Original batch size: 100 records
Server health: STRESSED (50% multiplier)
Actual batch size: 50 records (split into 2 sub-batches)

The controller logs these adjustments:

INFO: Adaptive batch scaling: reducing batch size from 100 to 50 (server health: STRESSED)
INFO: Adaptive batch scaling: restored to full batch size 100 (server health: HEALTHY)
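The health table translates naturally into a small lookup. The following sketch mirrors the documented thresholds, multipliers, and delays; the class and function names are hypothetical, and how the real controller treats values exactly on a boundary (2s, 5s, 10s) or smooths successive measurements is not specified here, so that part is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThrottleState:
    name: str
    batch_multiplier: float
    delay_seconds: float


# (exclusive upper bound on response time in seconds, state), per the health table.
_STATES = [
    (2.0, ThrottleState("HEALTHY", 1.00, 0.0)),
    (5.0, ThrottleState("DEGRADED", 0.75, 0.5)),
    (10.0, ThrottleState("STRESSED", 0.50, 2.0)),
]
_OVERLOADED = ThrottleState("OVERLOADED", 0.25, 5.0)


def classify(response_time: float) -> ThrottleState:
    """Map a measured response time to a health state (boundary handling assumed)."""
    for upper_bound, state in _STATES:
        if response_time < upper_bound:
            return state
    return _OVERLOADED


def effective_batch_size(base_size: int, response_time: float) -> int:
    """Scale the configured batch size by the state's multiplier, never below 1."""
    return max(1, int(base_size * classify(response_time).batch_multiplier))


print(classify(7.0).name, effective_batch_size(100, 7.0))  # STRESSED 50
```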

When to Use It¶

Long imports (1000+ records) where server load may vary
Shared servers where other users/processes compete for resources
Production environments where you want to avoid overloading the server
Unreliable networks where timeouts are common

Example¶

# Enable adaptive throttling for a large import
fluvo import \
    --connection-file conf/connection.conf \
    --file data/products.csv \
    --model product.product \
    --size 100 \
    --adaptive-throttle

Note

Adaptive throttling is conservative by default. It prioritizes stability over speed, making it ideal for production imports where reliability is more important than raw performance.

Mapper Performance¶

The choice of mappers can impact performance.

Fast Mappers: Most mappers, like val, const, concat, and num, are extremely fast as they operate only on the data in the current row.
Slow Mappers: The mapper.relation function should be used with caution. For every single row, it performs a live search request to the Odoo database, which can be very slow for large datasets.

Recommendation: If you need to map values based on data in Odoo, it is much more performant to first export the necessary mapping data from Odoo (e.g., using fluvo export) into a Python dictionary or a separate CSV file, and then use the much faster mapper.map_val or other in-memory lookups to do the translation.
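As an illustration of the in-memory approach, the sketch below builds a lookup dictionary once from an exported CSV and then translates every source row without touching the network. The CSV content, column names, and source rows are made up for the example, and the snippet uses only the standard library rather than fluvo's mapper API.

```python
import csv
import io

# Stand-in for a file exported from Odoo beforehand; columns are illustrative.
exported_csv = io.StringIO(
    "id,name\n"
    "base.be,Belgium\n"
    "base.fr,France\n"
)

# Build the lookup once: each later translation is a dictionary access,
# not a search request to the server.
country_xmlid_by_name = {row["name"]: row["id"] for row in csv.DictReader(exported_csv)}

source_rows = [
    {"Name": "Acme", "Country": "France"},
    {"Name": "Globex", "Country": "Belgium"},
    {"Name": "Initech", "Country": "Atlantis"},  # no match in the export
]
for row in source_rows:
    row["country_id/id"] = country_xmlid_by_name.get(row["Country"], "")
    print(row["Name"], "->", row["country_id/id"] or "(unresolved)")
```

Unmatched values are left empty here so they can be reviewed separately; whether to fail, skip, or default such rows is a choice for your own transform script.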

Minimizing Odoo’s ORM Work¶

For most imports the wall-clock time is dominated by what Odoo’s ORM does on the server — resolving each relational value with a name_search, recomputing fields, firing mail/tracking side-effects — not by the client sending data. The biggest wins come from doing that work on the client, in Polars, before load().

Pre-resolve relations (`--resolve-relation`)¶

When a column holds a natural key (a country name, a partner reference), Odoo would normally run a name_search per row to turn it into a database id. Instead, fluvo can resolve the whole column in one vectorized Polars join against a cached id-map of the related model, and hand load() an already-resolved field/id column — so Odoo performs no name_search for that field.

# 'country' column holds res.country codes -> resolve into country_id, no name_search.
fluvo import --connection-file conf/connection.conf --file partners.csv \
    --model res.partner \
    --resolve-relation country:res.country:code:country_id

Format: source_column:model:key_field:relation_field[:xmlid|dbid] (repeatable). xmlid (default) is portable; dbid is fastest (zero server resolution) but database-specific. The id-map is cached to parquet and reused across runs. From a transform script, the same is available as Processor.resolve_relation(...).
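To make the format string easier to read, here is a small parser that splits a specification into its named parts and applies the documented xmlid default. It is a hypothetical helper written for this guide, not a function exported by fluvo, and its validation rules are assumptions.

```python
from typing import NamedTuple

SPEC_FORMAT = "source_column:model:key_field:relation_field[:xmlid|dbid]"


class RelationSpec(NamedTuple):
    source_column: str
    model: str
    key_field: str
    relation_field: str
    mode: str  # "xmlid" (portable) or "dbid" (database-specific)


def parse_relation_spec(spec: str) -> RelationSpec:
    """Split one --resolve-relation value into its components."""
    parts = spec.split(":")
    if len(parts) == 4:
        parts.append("xmlid")  # the documented default
    if len(parts) != 5 or not all(parts):
        raise ValueError(f"expected {SPEC_FORMAT}, got {spec!r}")
    if parts[4] not in ("xmlid", "dbid"):
        raise ValueError(f"mode must be 'xmlid' or 'dbid', got {parts[4]!r}")
    return RelationSpec(*parts)


print(parse_relation_spec("country:res.country:code:country_id"))
```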

Measured: importing 2,000 res.partner records with country_id ran in 6.96s the naive way (Odoo name_search per row) versus 4.27s pre-resolved — a 1.6× speedup even though the data has only 5 distinct countries (which Odoo caches). The win grows with relation cardinality: a column where most values are distinct (suppliers, categories, partner refs) gets no server-side name_search caching, so pre-resolving it saves far more.

Skip unchanged records (`--skip-unchanged`)¶

On a re-import, fluvo can fetch the current field values, compare them to the incoming rows with a vectorized Polars anti-join, and send only the rows that are new or changed. Re-running an unchanged dataset then sends ~0 rows.

fluvo import --connection-file conf/connection.conf --file partners.csv \
    --model res.partner --skip-unchanged
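Conceptually, the comparison keeps a row only if its id is unknown or at least one compared field differs. The sketch below shows that logic in plain Python as a stand-in for the vectorized Polars anti-join mentioned above; the function name and sample data are illustrative, and real comparisons also have to normalize types (for example "1" versus 1), which is omitted here.

```python
def rows_to_send(incoming, current_by_id, fields):
    """Keep rows that are new or whose compared fields differ from the server's."""
    changed = []
    for row in incoming:
        existing = current_by_id.get(row["id"])
        if existing is None or any(row[f] != existing.get(f) for f in fields):
            changed.append(row)
    return changed


# Values currently stored on the server, keyed by external id (made-up data).
current = {
    "partner_1": {"name": "Acme", "city": "Ghent"},
    "partner_2": {"name": "Globex", "city": "Lyon"},
}
incoming = [
    {"id": "partner_1", "name": "Acme", "city": "Ghent"},    # unchanged
    {"id": "partner_2", "name": "Globex", "city": "Paris"},  # changed city
    {"id": "partner_3", "name": "Initech", "city": "Oslo"},  # new record
]
print([r["id"] for r in rows_to_send(incoming, current, ["name", "city"])])
# ['partner_2', 'partner_3']
```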

Suppress side-effects (default) and auto-clean¶

By default fluvo imports with tracking_disable, mail_create_nolog, and mail_notrack set, so Odoo skips chatter/tracking work (override any of them with --context '{"tracking_disable": false}'). --auto-clean applies safe, type-aware coercions (whitespace, null tokens, booleans) on the client before load; an uncoercible value routes that row to the fail file rather than aborting the batch.

These optimizations are all opt-in (except the default side-effect suppression) and correctness-preserving: the resulting Odoo state is identical to a naive import.

Performance Strategy for Relational Data (Automatic Two-Pass Import)¶

A common performance trap when importing data is writing to relational fields (like parent_id) that have an inverse relation (like child_ids). In a single pass, updating the parent_id for 500 child records could cause Odoo to re-write the child_ids list on the single parent record 500 times, slowing the import to a crawl.

fluvo solves this problem automatically with its smart, two-pass import engine.

The New Workflow: Automatic and Efficient¶

When the pre-flight check detects a self-referential or many2many field, the importer automatically switches to this high-performance, two-pass strategy:

Pass 1 (Create): The tool automatically excludes the relational fields from the initial import. It then uses the fast, multi-threaded load method to create all the base records. This completely avoids the slow, cascading update problem.
Pass 2 (Write): After all records have been created, the tool performs a second, multi-threaded write pass that efficiently sets the relational fields (e.g., parent_id) on all the newly created records.

Because this process is automatic, you no longer need to manually use --ignore as a performance workaround for these types of fields.
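The two passes can be illustrated with a toy in-memory "database". This is a simplified model under stated assumptions, not fluvo's engine: a dictionary stands in for Odoo, the function name is hypothetical, and only a single deferred field is handled. It shows why a child row may appear before its parent in the file without causing a failure.

```python
def two_pass_import(rows, deferred_field="parent_id"):
    """Create all records first, then write the deferred relation."""
    db = {}
    # Pass 1 (create): every record is created without the relational field.
    for row in rows:
        db[row["id"]] = {k: v for k, v in row.items() if k not in ("id", deferred_field)}
    # Pass 2 (write): every target now exists, so the relation can be set.
    for row in rows:
        parent = row.get(deferred_field)
        if parent:
            if parent not in db:
                raise KeyError(f"{row['id']} references unknown record {parent}")
            db[row["id"]][deferred_field] = parent
    return db


rows = [
    {"id": "child_1", "name": "Child 1", "parent_id": "parent_1"},  # listed before its parent
    {"id": "parent_1", "name": "Parent", "parent_id": None},
]
db = two_pass_import(rows)
print(db["child_1"])  # {'name': 'Child 1', 'parent_id': 'parent_1'}
```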

The Correct Use of `--ignore`¶

With the new smart importer, the --ignore option should be used for its original purpose: to completely exclude a column from the import process. Use it for source columns that you do not want to be sent to Odoo in either Pass 1 or Pass 2.

CLI Option: --ignore
params Key: 'ignore'

# In your transform.py script

# The mapping still defines the relationship
my_mapping = {
    'id': mapper.m2o_map('child_', 'Ref'),
    'name': mapper.val('Name'),
    'parent_id/id': mapper.m2o_map('parent_', 'ParentRef'), # Define the mapping
}

# The params tell the client to IGNORE the parent_id/id field during import
import_params = {
    'model': 'res.partner',
    'ignore': 'parent_id/id', # The field to ignore for direct import
}

processor.process(
    mapping=my_mapping,
    filename_out='data/contacts.csv',
    params=import_params
)

This will generate a load.sh script with the --ignore=parent_id/id flag. The import client will then skip this column entirely: it is sent to Odoo in neither Pass 1 nor Pass 2.
