Data Loading
This first post explores some crucial decisions you can make in data model design to
ensure good performance far into the future. The most important decisions you make to
ease the loading of big data can happen much earlier in your project, when you are
designing your data model. Some choices in object structure, fields, and relationships
may fit your business requirements, but cause problems later during data loading or
operations. This post highlights four common issues: object bloat, slow queries, record
ownership skew, and parent/child data skew.
The Solution
When business requirements demand large objects, there is a platform feature that can
significantly improve performance: the “skinny table.” For example, after careful
analysis of the fields that a report requires, you can use a skinny table to hold just those
fields. The report will then calculate and display much faster because the underlying
query scans fewer fields.
Put It Into Practice
To minimize the impact of large objects on future queries, list views, and reports,
consider the following when designing your data model:
• if requirements do call for an object with a large number of fields, ask Salesforce
Support to help you create a skinny table that contains only the required fields
The Solution
You can anticipate and prevent query performance from degrading by understanding
and using indexed fields in your queries. The Salesforce platform includes its own query
optimizer that takes advantage of indexed fields to create the most efficient execution
plan for a given query. The handy cheat sheet below shows which fields Salesforce
always indexes in objects.
In addition to these standard fields, Salesforce also allows customers to create custom
indexes on most fields. The exceptions are: non-deterministic formula fields, multi-select
picklists, currency fields in a multi-currency org, long text area and rich text area fields,
and binary fields (type blob, file or encrypted text).
• ask Salesforce Support to help you create custom indexes to speed up specific
queries
You can find more information on how the Salesforce query optimizer takes advantage
of standard and custom indexes to speed query execution in the doc “Best Practices for
Deployments with Large Data Volumes” in the reference section below.
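The exception list above can be turned into a quick pre-check before you contact Support. The following is a minimal sketch, not a platform API: the Field class, the type labels, and the function name are all hypothetical and exist only to encode the exceptions named in this post.

```python
from dataclasses import dataclass

# Field types this post lists as ineligible for custom indexes.
# The labels are hypothetical names for this sketch, not Salesforce API values.
NON_INDEXABLE_TYPES = {
    "multiselect_picklist",
    "long_text_area",
    "rich_text_area",
    "blob",
    "file",
    "encrypted_text",
}


@dataclass
class Field:
    name: str
    field_type: str
    deterministic: bool = True  # only meaningful for formula fields


def can_request_custom_index(field: Field, multi_currency_org: bool = False) -> bool:
    """Return True if the field is not on the exception list from this post."""
    if field.field_type in NON_INDEXABLE_TYPES:
        return False
    if field.field_type == "formula" and not field.deterministic:
        return False
    if field.field_type == "currency" and multi_currency_org:
        return False
    return True


# Example: a long text area is excluded, a plain text field is not.
print(can_request_custom_index(Field("Notes__c", "long_text_area")))
print(can_request_custom_index(Field("Region__c", "text")))
```

A field that passes this check is only a candidate; whether an index actually helps a given query is still decided by the query optimizer.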
Force.com Record Ownership Skew
The concept of record ownership is a very powerful feature for managing record access
on the Salesforce platform. When individual users own the records they create, the role
hierarchy makes sure managers have access to the data owned by their subordinates.
But when a single user owns a very high percentage of the data for any one object,
Force.com must perform large sharing recalculations when you move that user in the
hierarchy. These recalculations can be even more expensive when you add that user to, or
remove that user from, a role or public group that uses a sharing rule to make its data
visible to other users in the organization.
The Solution
Avoid assigning a single user as owner of a large amount of data whenever possible.
• Design your ownership strategy from the beginning so that users own the data
they create, then use the role hierarchy and sharing rules to provide access to
others.
• When you must have a single owner for a large amount of data, place that user in
their own role at the top of the hierarchy, use sharing to provide access for other
users, and don’t move that user to a new role.
For more information about managing data ownership skew, consult the doc “Architect
Salesforce Record Ownership Skew for Peak Performance in Large Data Volume
Environments” below.
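One way to spot ownership skew before it causes trouble is to measure each owner's share of an object's records in an export. The sketch below assumes you already have a list of owner IDs, one per record; the function name and the 10% threshold are hypothetical choices for illustration, since this post only speaks of "a very high percentage."

```python
from collections import Counter


def find_skewed_owners(owner_ids, max_share=0.1):
    """Return {owner_id: share} for owners holding more than max_share of records.

    max_share is a hypothetical threshold chosen for illustration.
    """
    counts = Counter(owner_ids)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        owner: count / total
        for owner, count in counts.items()
        if count / total > max_share
    }


# Example: one integration user owns 8 of 10 records.
owners = ["integration_user"] * 8 + ["jean", "thomas"]
print(find_skewed_owners(owners))
```

Owners flagged this way are candidates for the placement described above: their own role at the top of the hierarchy, with sharing used to grant access to others.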
For example, in the diagram above Jean is transferring the Contact “Bob Smith” to her
teammate, Thomas. So Thomas should gain access to the parent Universal Containers
Account, and Jean should lose access. But if Jean owns any of the other 299,999
contacts under the Account, she should retain access. To resolve this situation, the
platform has to check every one of these Contacts to make sure Jean is not the owner.
This underlying operation can take a substantial amount of time.
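To see why this check is expensive, consider a simplified model of it. This is an illustration only, not the platform's actual implementation, and the function name is hypothetical: deciding whether Jean keeps access to the parent means looking for any remaining child she owns, and when she owns none, every child has to be examined before the answer is known.

```python
def owner_still_has_child(child_owner_ids, owner_id):
    """Return True if owner_id still owns at least one child record.

    In the worst case (the owner owns none), every child is examined.
    """
    return any(owner == owner_id for owner in child_owner_ids)


# After the transfer, Jean owns none of the remaining 299,999 contacts,
# so the scan has to visit all of them before it can answer.
remaining_owners = ["someone_else"] * 299_999
print(owner_still_has_child(remaining_owners, "jean"))
```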
The Solution
To avoid issues with parent/child data skew, salesforce.com recommends that you keep
the number of child records assigned to a single parent below 10,000.
• Plan your data model with enough Accounts to keep the parent/child ratio below
10,000, and distribute new child records across these Accounts as the child
records are created.
• Engage an architect from Salesforce Strategic Services to help you design the
best way to manage the initial configuration and growth over time.
For more information about managing parent/child data skew, consult the doc
“Reducing Lock Contention by Avoiding Account Data Skews” below.
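The 10,000 guideline can be applied both when planning and while loading. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, and assigning each new child to the least-loaded parent is just one simple distribution strategy, not one prescribed by this post.

```python
import math
from collections import Counter

MAX_CHILDREN = 10_000  # recommendation from this post: stay below this


def min_parents_needed(total_children, limit=MAX_CHILDREN):
    """Smallest number of parents that keeps every parent below `limit`."""
    return math.ceil(total_children / (limit - 1))


def skewed_parents(parent_ids, limit=MAX_CHILDREN):
    """Return {parent_id: child_count} for parents at or above `limit`."""
    counts = Counter(parent_ids)
    return {p: n for p, n in counts.items() if n >= limit}


def pick_parent(child_counts, limit=MAX_CHILDREN):
    """Pick the least-loaded parent for a new child, or None if all are full."""
    parent = min(child_counts, key=child_counts.get, default=None)
    if parent is None or child_counts[parent] + 1 >= limit:
        return None
    return parent


# 300,000 contacts need at least 31 accounts to stay below 10,000 each.
print(min_parents_needed(300_000))
```

When pick_parent returns None, every existing Account is at capacity and a new parent record is needed before loading more children.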
Summary
When planning a Salesforce implementation that will serve a large number of users with
a large amount of data, design your data model to build scalability in from the beginning.
Key elements to consider include:
• Keep objects as lean as possible; where you can’t, consider using skinny tables.
Related Resources
• Best Practices for Deployments with Large Data Volumes
• Architect Salesforce Record Ownership Skew for Peak Performance in Large Data
Volume Environments
This post outlines the steps that you can take to maximize loading efficiency when you
need to get your application data into Force.com as quickly as possible.
2. Identify the minimal data set and configuration required to implement those
operations.
• Workflow rules, validation rules, and triggers – These are powerful tools for
making sure data entered during daily operations is clean and includes
appropriate relationships between records. Unfortunately, they can also slow
down processing if they are enabled during massive data loads. We will cover
these rules and triggers in greater detail in the upcoming post about suspending
events that fire on insert.
• Parent records with master-detail children – You won’t be able to load child
records if the parents don’t already exist. We will cover this topic in detail in the
upcoming post about sequencing load operations.
• Record owners (users) – In most cases, your records will be owned by individual
users, and the owners need to exist in the system before you can load the data.
• Role hierarchy – You might think that loading would be faster if the owners of
your records were not members of the role hierarchy. But in almost all cases, the
performance would be the same, and loading with the hierarchy in place would be
considerably faster if you were loading portal accounts. So there would be no benefit to
deferring this aspect of configuration.
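The dependencies in the list above (owners before the records they own, parents before their master-detail children, and the role hierarchy in place up front) amount to an ordering problem. A minimal sketch using Python's standard graphlib module follows; the object names and the exact dependency map are assumptions made for illustration and are not taken from this post.

```python
from graphlib import TopologicalSorter

# Each key must be loaded after everything in its set.
# Object names and dependencies are illustrative assumptions.
depends_on = {
    "Role": set(),
    "User": {"Role"},                # role hierarchy is configured first
    "Account": {"User"},             # owners must exist before their records
    "Contact": {"Account", "User"},  # parents must exist before children
}

# static_order() yields each object only after all of its prerequisites.
load_order = list(TopologicalSorter(depends_on).static_order())
print(load_order)
```

If the map ever contains a circular dependency, TopologicalSorter raises graphlib.CycleError, which is a useful early warning that the load cannot be sequenced as described.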
Summary
When preparing for a very large Force.com implementation, you want to transfer legacy
data onto the platform efficiently. By stripping down your initial configuration to only
those items required for data integrity between objects, you can greatly increase the
speed of a massive initial data load. As always, you should test these recommendations
in a sandbox organization to identify how these lean methods can best benefit your
business needs.
Related Resources
• Extreme Force.com Data Loading, Part 1: Tune Your Data Model