Need

Deploying the Semarchy xDM Native App on Snowflake requires sizing several Snowflake resources appropriately to deliver the expected performance while optimizing infrastructure costs.

Unlike a traditional xDM deployment, where the application server and database are managed independently, the Native App relies on Snowflake services to provide both application execution and database processing.

Sizing therefore involves evaluating three independent components:

Snowpark Container Services Compute Pools, which host the xDM application runtime.
Snowflake Virtual Warehouses, which execute SQL processing and integration workloads.
Snowflake Storage, which stores application data and metadata.

Each of these components has different sizing drivers and scaling mechanisms. Understanding how they interact is essential when planning a new deployment or expanding an existing one.

This article provides practical guidance to help architects, Snowflake administrators, and implementation teams estimate an appropriate starting configuration and identify when scaling adjustments are required.

Summarized Solution

The xDM Native App should be sized by considering each Snowflake resource independently.

Compute Pools should be sized according to application workload, including user sessions, APIs, workflows, security model complexity, and memory consumption.
Virtual Warehouses should be sized according to SQL workload, including integration jobs, matching operations, joins, transformations, and concurrent processing.
Storage should be estimated from expected data volumes, enrichment strategy, historization, and anticipated growth.

As a general recommendation:

Start with a modest configuration.
Measure actual CPU, memory, and SQL performance.
Increase Compute Pool size when the application runtime becomes the bottleneck.
Increase Warehouse size when SQL execution becomes the bottleneck.
Separate interactive and batch workloads whenever possible.
Reassess storage estimates after the first implementation iterations using actual production data.

Detailed Solution

Understanding the xDM Native App Architecture

The xDM Native App relies on three distinct Snowflake resources.

Compute Pool

The Compute Pool hosts the xDM application runtime.

It is responsible for:

User sessions
REST APIs
Workflow execution
Model loading
Application orchestration

Virtual Warehouse

Virtual Warehouses execute SQL processing inside Snowflake, including:

Integration jobs
Matching and consolidation
Data transformations
User queries
Analytics

Storage

Storage contains:

Business data
Metadata
Match tables
Workflow data
Historical records

Each component scales independently and therefore should be sized independently.

Horizontal vs Vertical Scaling

Horizontal Scaling

Traditional xDM deployments often scale horizontally by adding application nodes behind a load balancer.

Additional nodes can be dedicated to specific workloads, such as REST APIs, interactive users, or background processing, or can simply increase overall concurrency.

The xDM Native App currently supports a single active application instance.

Consequently, horizontal scaling is not available.

The Compute Pool therefore behaves similarly to a single virtual machine, and its instance class determines the amount of available CPU and memory.

Although small Compute Pools may sustain moderate concurrent activity, there is no universal sizing rule because performance depends heavily on:

Data model complexity
Security model
Workflow configuration
Business views
API workload
User concurrency

Vertical Scaling

Most processing scalability is achieved at the database level through Snowflake Virtual Warehouses.

While the Compute Pool manages application execution, the Warehouse performs the SQL work.

This includes:

Bulk ingestion
Integration jobs
Match and merge operations
Transformations
User queries

As data volumes increase, warehouse sizing becomes the primary factor affecting processing performance.

Compute Pool Sizing

Several factors influence Compute Pool sizing.

Security Model Complexity

One of the largest consumers of application memory is the security model.

Memory usage increases with the number of distinct role combinations that users may impersonate.

Reducing unnecessary role combinations and simplifying workflow role mappings decreases memory consumption and leaves more resources available for user sessions.
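As a rough illustration of what "distinct role combinations" means, the following minimal sketch counts the unique sets of roles held by users. The user names, role names, and the `distinct_role_combinations` helper are hypothetical and are not part of xDM; the point is only that users sharing the same set of roles contribute a single combination.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical role assignments; names are illustrative only.
user_roles: Dict[str, Set[str]] = {
    "alice": {"DataSteward", "Approver"},
    "bob": {"Approver", "DataSteward"},   # same combination as alice
    "carol": {"Viewer"},
    "dave": {"DataSteward"},
}


def distinct_role_combinations(assignments: Dict[str, Set[str]]) -> Set[FrozenSet[str]]:
    """Return the unique role sets; order of roles within a set does not matter."""
    return {frozenset(roles) for roles in assignments.values()}


combinations = distinct_role_combinations(user_roles)
print(len(combinations))  # four users, three distinct combinations
```

Consolidating users onto fewer shared role sets lowers this count, which is the quantity the memory guidance above refers to.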

Entity Views

Default entity views consume runtime resources.

If some default views are not required by the application, disabling them reduces both CPU and memory usage, particularly for UI-intensive deployments.

CPU vs Memory

Two different bottlenecks can occur.

Memory Bottleneck

Typical symptoms include:

Out-of-memory errors
Frequent garbage collection
Application instability
Random slowdowns

Memory should generally be considered the first limiting factor.

CPU Bottleneck

When sufficient memory is available, CPU becomes the limiting factor.

Typical symptoms, which appear even though memory usage remains stable, include:

Increasing response times
Slow API calls
Sluggish UI performance
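The two bottleneck profiles described above can be turned into a simple triage rule. The sketch below is an assumption, not an xDM or Snowflake feature: the `classify_bottleneck` function and its threshold values are hypothetical and should be tuned to the metrics observed in your environment. Memory is checked first, in line with the guidance that it is generally the first limiting factor.

```python
def classify_bottleneck(memory_used_ratio: float,
                        cpu_used_ratio: float,
                        memory_threshold: float = 0.85,
                        cpu_threshold: float = 0.80) -> str:
    """Classify the likely Compute Pool bottleneck from utilization ratios (0.0 to 1.0).

    Thresholds are illustrative assumptions, not vendor recommendations.
    """
    if memory_used_ratio >= memory_threshold:
        return "memory"
    if cpu_used_ratio >= cpu_threshold:
        return "cpu"
    return "none"


print(classify_bottleneck(0.90, 0.95))  # memory pressure takes precedence
print(classify_bottleneck(0.50, 0.90))
print(classify_bottleneck(0.50, 0.30))
```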

Practical Recommendation

For applications primarily serving interactive users and moderate API traffic, start with a Small or Medium Compute Pool and monitor CPU and memory usage.

For API-intensive environments with higher concurrency, consider selecting a larger Compute Pool from the beginning, since additional application instances cannot currently be added.

Virtual Warehouse Sizing

Warehouse sizing depends primarily on SQL workload rather than the number of connected users.

Important sizing factors include:

Volume of processed data
Number of joins
Sorting operations
Grouping operations
Window functions
Ranking
Deduplication logic
Concurrent integration jobs

Large matching operations and integration jobs typically benefit from larger warehouse sizes.

Scale Up vs Scale Out

Snowflake provides two complementary scaling strategies.

Scale Up

Increasing warehouse size allocates more compute resources to each query.

This generally improves performance for:

Large joins
Heavy transformations
Match and merge jobs
Complex SQL processing

If a single integration or matching job is too slow, scaling up is usually the appropriate first step.

Scale Out

Multi-cluster warehouses increase concurrency by adding clusters.

They do not necessarily reduce the execution time of a single query.

Instead, they allow more queries to execute simultaneously.

Consider multi-cluster warehouses when:

Multiple integration jobs execute concurrently
Interactive users compete with batch processing
Queries spend significant time waiting in queue
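The choice between scaling up and scaling out can be sketched as a small heuristic over query timings. This is a minimal illustration under stated assumptions: the `QueryStats` structure, the `scaling_hints` function, and both thresholds are hypothetical, and the timings would in practice come from whatever query history source you monitor.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryStats:
    execution_seconds: float  # time spent actually running
    queued_seconds: float     # time spent waiting for warehouse capacity


def scaling_hints(stats: List[QueryStats],
                  queue_share_threshold: float = 0.2,
                  slow_query_seconds: float = 600.0) -> List[str]:
    """Suggest 'scale out' when queueing dominates, 'scale up' when single queries are slow."""
    hints: List[str] = []
    if not stats:
        return hints
    total_exec = sum(s.execution_seconds for s in stats)
    total_queue = sum(s.queued_seconds for s in stats)
    total = total_exec + total_queue
    # A large share of time spent queued points to a concurrency problem.
    if total > 0 and total_queue / total >= queue_share_threshold:
        hints.append("scale out")
    # A very long-running individual query points to insufficient per-query compute.
    if max(s.execution_seconds for s in stats) >= slow_query_seconds:
        hints.append("scale up")
    return hints


batch_sample = [QueryStats(1200.0, 5.0), QueryStats(30.0, 0.0)]
interactive_sample = [QueryStats(20.0, 30.0), QueryStats(25.0, 40.0)]
print(scaling_hints(batch_sample))
print(scaling_hints(interactive_sample))
```

Both hints can be returned together, since a warehouse may suffer from slow individual jobs and from queueing at the same time.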

Separating Workloads

Although the application runtime consists of a single instance, database workloads can still be isolated.

A common architecture consists of dedicated warehouses such as:

WH_APP for UI interactions and REST APIs
WH_BATCH for integration jobs and heavy processing

Integration jobs can be configured to use a dedicated datasource connected to a specific warehouse by using the job parameter:

PARAM_DATASOURCE_NAME_SUFFIX

Separating workloads improves predictability and prevents heavy batch processing from affecting interactive users.

Estimating Storage Requirements

Storage estimation depends on several design decisions.

Important factors include:

Number of entities
Entity type
Average record size
Number of standardized attributes
Number of technical attributes
Match keys
Historization
Update frequency

Storage requirements can vary significantly depending on the implementation.

For example:

Preserving both source and standardized values consumes considerably more storage than overwriting source values.
Match keys, phonetic values, concatenated strings, and technical attributes increase row size.
Historization increases storage proportionally to update frequency.

Estimating Average Record Size

Average row size should represent the expected stored content rather than the maximum defined column size.

For example:

A VARCHAR(500) column does not typically contain 500 characters.

If the average stored value is approximately 20 characters, base the estimate on roughly 20 bytes (assuming single-byte characters) rather than 500.
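The difference between the two approaches can be made concrete with a short calculation. The column profile below is entirely hypothetical, and the one-byte-per-character default is an assumption that does not hold for multi-byte text.

```python
from typing import List, Tuple

# Hypothetical profile: (column name, declared max length, average stored length).
columns: List[Tuple[str, int, int]] = [
    ("FIRST_NAME", 500, 12),
    ("LAST_NAME", 500, 14),
    ("EMAIL", 500, 28),
    ("CITY", 200, 20),
]


def estimated_row_bytes(cols: List[Tuple[str, int, int]], bytes_per_char: int = 1) -> int:
    """Row size based on average stored lengths (the recommended basis)."""
    return sum(avg_len * bytes_per_char for _, _, avg_len in cols)


def declared_row_bytes(cols: List[Tuple[str, int, int]], bytes_per_char: int = 1) -> int:
    """Row size based on declared maximum lengths (overestimates heavily)."""
    return sum(max_len * bytes_per_char for _, max_len, _ in cols)


print(estimated_row_bytes(columns))  # 74
print(declared_row_bytes(columns))   # 1700
```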

Estimating from Existing Data

When sample source files are available, a practical estimate can be obtained by dividing the file size by the number of records.

This provides an approximate raw row size.

Additional storage should then be added for:

Standardized attributes
Enriched values
Technical columns
Match keys

Many implementations preserve both source and standardized values.

In such cases, a standardization ratio of approximately two means the stored data volume roughly doubles because each business attribute is accompanied by its standardized counterpart.
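The file-based estimate and the standardization ratio combine into a simple calculation. The following is a minimal sketch: the function names, the sample file figures, and the flat per-row overhead for technical columns and match keys are assumptions chosen for illustration.

```python
def raw_row_bytes(file_size_bytes: int, record_count: int) -> float:
    """Approximate raw row size: file size divided by number of records."""
    if record_count <= 0:
        raise ValueError("record_count must be positive")
    return file_size_bytes / record_count


def stored_row_bytes(raw_bytes: float,
                     standardization_ratio: float = 2.0,
                     technical_overhead_bytes: float = 0.0) -> float:
    """Apply the standardization ratio, then add a flat allowance for
    technical columns and match keys (hypothetical value)."""
    return raw_bytes * standardization_ratio + technical_overhead_bytes


# Hypothetical sample: a 250 MB source file containing 1,000,000 records.
raw = raw_row_bytes(250_000_000, 1_000_000)
stored = stored_row_bytes(raw, standardization_ratio=2.0, technical_overhead_bytes=100.0)
print(raw, stored)  # 250.0 600.0
```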

Growth Estimation

Storage estimation should also include:

Initial data volume
Expected daily inserts
Daily updates
Record historization
Expected retention period

When match ratios are unknown, conservative default assumptions should be used initially and refined after the first production iterations.
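The growth factors listed above can be combined into a first-order projection. This sketch rests on simplifying assumptions that should be stated plainly: every update to a historized entity adds one historical row that is kept for the whole retention period, inserted rows are never deleted, and compression is ignored. The function name and all sample figures are hypothetical.

```python
def projected_storage_bytes(initial_records: int,
                            daily_inserts: int,
                            daily_updates: int,
                            row_bytes: float,
                            retention_days: int,
                            historized: bool = True) -> float:
    """Estimate uncompressed storage after `retention_days` of activity."""
    current_rows = initial_records + daily_inserts * retention_days
    # Assumption: one extra historical row per update when historization is on.
    history_rows = daily_updates * retention_days if historized else 0
    return (current_rows + history_rows) * row_bytes


with_history = projected_storage_bytes(1_000_000, 1_000, 5_000, 600.0, 365)
without_history = projected_storage_bytes(1_000_000, 1_000, 5_000, 600.0, 365,
                                          historized=False)
print(with_history / 1e9, without_history / 1e9)  # size in GB (decimal)
```

Comparing the two results shows how strongly historization combined with a high update frequency can dominate the estimate.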

Repository Database (Snowflake Postgres) Sizing

In addition to Compute Pools, Virtual Warehouses, and Storage, deployments using Snowflake Postgres as the xDM repository database require their own sizing consideration.

Compute Family

Snowflake Postgres instances are sized at creation time by selecting a compute family.

We recommend starting with a **STANDARD_XL** compute family, which provides a resource profile roughly comparable to a traditional **4 CPU / 16 GB RAM** PostgreSQL deployment.

High Availability

- High Availability is **not required** for the repository in non-production environments.

- We recommend **enabling High Availability for production** deployments.

PostgreSQL Configuration Settings

We recommend using the **Snowflake Postgres default settings**.

Custom tuning parameters should not be applied unless:

- A specific need is identified during testing, or

- Tuning is explicitly advised by Snowflake.

Storage

**20 GB** of storage should be sufficient as a starting point.

General Guidance

These recommendations should be treated as a **baseline starting configuration**. Final sizing should be validated against actual workload characteristics observed in your environment, consistent with the sizing approach used for Compute Pools, Warehouses, and Storage elsewhere in this article.

Storage Costs

Although hybrid tables store data in multiple formats internally, storage costs generally remain significantly lower than compute costs.

For most Native App deployments, the primary cost drivers are:

Compute Pools
Virtual Warehouses
AI services such as Cortex

Applications making extensive use of workflows should also consider that current workflow execution relies on polling mechanisms, which may reduce opportunities for warehouse auto-suspend.

Best Practices

Start with conservative Compute Pool and Warehouse sizes.
Monitor actual CPU, memory, and query execution metrics before increasing capacity.
Optimize the security model to reduce memory consumption.
Disable unused entity views whenever possible.
Separate interactive and batch workloads across different warehouses.
Scale Warehouse size when individual SQL jobs are slow.
Scale Warehouse concurrency when query queueing becomes frequent.
Re-evaluate storage estimates after the first implementation iterations using actual production data.
