ETL — Extract, Transform, Load — is the process of pulling data from source systems, cleaning and normalizing it, then writing it to a central warehouse. For wealth management firms, ETL is the infrastructure layer that makes unified analytics, compliance reporting, and AI applications possible.
What Is ETL — and Why Does It Matter for Wealth Management?
ETL breaks into three discrete steps that every data pipeline must execute, regardless of the tools involved:
Extract: Pull from Source Systems
Extraction is the process of connecting to each source system and pulling data out. For wealth management firms, this means authenticating against CRM APIs, polling SFTP servers for custodian flat files, calling portfolio system endpoints, and fetching market data feeds. Each connection is unique: different authentication mechanisms, different rate limits, different data formats, different update frequencies.
Extraction is deceptively difficult. APIs throttle requests. SFTP servers go offline. Authentication tokens expire. File formats change without notice. A robust extraction layer handles all of this gracefully — retrying failed calls, detecting format drift, and alerting when data stops flowing.
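The graceful-retry behavior described above can be sketched in a few lines. This is a minimal illustration, not a production extraction layer: `fetch_with_retry` and the throttling API it calls are hypothetical, and a real implementation would also distinguish retryable errors (throttling, timeouts) from permanent ones (bad credentials) and emit alerts when retries are exhausted.

```python
import random
import time

def fetch_with_retry(fetch, max_attempts=4, base_delay=1.0):
    """Call an extraction function, retrying transient failures with
    exponential backoff plus jitter. Re-raises after the final attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_attempts:
                raise
            # Back off 1x, 2x, 4x... the base delay, with jitter to avoid
            # synchronized retries hammering a recovering endpoint.
            time.sleep(base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.5))

# Simulate an API that throttles the first two calls, then succeeds.
calls = {"n": 0}
def flaky_api():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("429 Too Many Requests")
    return {"positions": 847}

result = fetch_with_retry(flaky_api, base_delay=0.01)
```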
Transform: Clean, Normalize, and Map
Raw data from source systems is rarely usable. A client might appear as "John A. Smith" in Salesforce, "SMITH JOHN" in a custodian feed, and "J. Smith (Trust)" in a portfolio system. Account numbers use different formats. Security identifiers mix CUSIPs, ISINs, and tickers. Dates arrive in different time zones. Null values mean different things in different systems.
The transform step resolves these inconsistencies. It normalizes entity names, maps identifiers to canonical formats, calculates derived fields (household AUM, advisor attribution, performance metrics), and validates data quality before it reaches the warehouse. This is where domain expertise — specifically, knowledge of wealth management data models — creates or destroys value.
Load: Write to the Warehouse
The load step writes transformed data to the destination — typically a cloud data warehouse like Snowflake. Loading strategies vary: full refreshes replace all data on each run, incremental loads append or update only changed records, and upsert patterns handle both inserts and updates atomically. The right strategy depends on data volume, source system capabilities, and latency requirements.
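The upsert pattern is easiest to see in miniature. The sketch below models the merge logic in plain Python dictionaries rather than warehouse SQL (in Snowflake this would be a `MERGE` statement); the table rows and key names are illustrative only.

```python
def upsert(warehouse, incoming, key="account_id"):
    """Merge incoming records into an existing table in one pass:
    update rows whose key already exists, insert rows whose key does not."""
    by_key = {row[key]: row for row in warehouse}
    for row in incoming:
        # Existing fields are kept; incoming fields overwrite on conflict.
        by_key[row[key]] = {**by_key.get(row[key], {}), **row}
    return list(by_key.values())

warehouse = [
    {"account_id": "A1", "market_value": 100_000.0},
    {"account_id": "A2", "market_value": 250_000.0},
]
changed = [
    {"account_id": "A2", "market_value": 260_000.0},  # update
    {"account_id": "A3", "market_value": 50_000.0},   # insert
]
merged = upsert(warehouse, changed)
```

An incremental load then only needs to extract the changed records, which matters when custodian files carry millions of unchanged positions.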
ELT: The Modern Variant
Traditional ETL transforms data before loading it. Modern ELT flips the order: raw data loads first, then transforms run inside the warehouse using SQL and tools like dbt. ELT is winning for three reasons. First, Snowflake's compute engine handles transformations at scale without a separate transformation server. Second, loading raw data first preserves a complete audit trail — every raw record is available for compliance review. Third, transforms can be updated, fixed, or re-run without re-extracting from source systems.
Why Wealth Management ETL Is Uniquely Hard
Generic ETL tools — Fivetran, Stitch, even custom Airflow pipelines — work well for standard SaaS applications. Wealth management breaks all the assumptions those tools are built on.
Heterogeneous Source Types
Most modern SaaS applications offer clean REST APIs. Wealth management systems do not. Data arrives in every format imaginable: REST APIs with OAuth, SOAP endpoints, SFTP flat files (CSV, pipe-delimited, fixed-width), FTP batch exports, email attachments, direct database connections, and in some cases, screen scraping against legacy portals that predate API access. A single firm might have five different integration modalities running simultaneously.
Constant Schema Changes from Vendors
Custodians, portfolio systems, and CRM vendors push updates constantly. When Schwab adds a column to a custodian file or Orion renames an API field, every pipeline that depends on that data breaks. In a self-built stack, these breaks create immediate incidents requiring engineering response. Firms that build their own pipelines quickly discover that maintenance — not initial development — is the dominant ongoing cost.
Identity Resolution During Transform
Wealth management firms deal with complex entity relationships: individual clients, joint accounts, trusts, entities, households, and advisor teams. The same human appears under different names, different account structures, and different identifiers across every system in the stack. Resolving these identities correctly — connecting all of "John Smith's" accounts across Orion, Salesforce, and Schwab into a unified household view — requires wealth-specific matching logic that generic transform tools do not provide out of the box.
Compliance Audit Requirements
Unlike consumer data pipelines, wealth management ETL operates under regulatory oversight. FINRA, SEC, and state regulators can demand proof of data lineage: where did this number come from, when was it last updated, who accessed it, and was it altered? A compliant pipeline preserves the complete chain of custody from raw source record to final analytics table — something generic ETL tools often omit.
Mixed Batch and Real-Time Requirements
Not all data moves at the same speed. Custodian files arrive overnight in batch. CRM events should propagate in near-real-time for advisor workflows. Market data updates tick-by-tick. Performance calculations run on end-of-day prices. A production wealth management pipeline must handle all of these latency profiles simultaneously, routing each source to the appropriate ingestion pattern without mixing them.
Common Pipeline Sources in the Advisor Tech Stack
Understanding what you are extracting from — and the specific integration characteristics of each source — is the first step in designing a reliable pipeline architecture.
- Custodian Feeds: Schwab, Fidelity, Pershing — positions, transactions, and balances, typically delivered overnight as flat files over SFTP in custodian-specific formats.
- CRM Systems: Salesforce and other advisor CRMs — client, household, and activity records pulled via REST APIs, often needed in near-real-time for advisor workflows.
- Portfolio Management Systems: Orion, Tamarac, Black Diamond — account, position, and performance data retrieved from API endpoints.
- Financial Planning Tools: eMoney, MoneyGuidePro — plan, goal, and cash flow data that rounds out the unified client view.
Additional Sources
- Market Data: Bloomberg, Refinitiv, Morningstar — reference data for securities, benchmarks, and pricing used in performance calculation and risk analytics.
- Billing Platforms: Orion Billing, Tamarac, Advisor Billing — fee schedules, invoice data, and revenue recognition records essential for firm financial reporting.
- Compliance Systems: Smarsh, Global Relay — communication archiving and surveillance data for regulatory audit trails.
- Operational Tools: DocuSign, Calendly, Microsoft 365 — activity data that enriches advisor productivity and client engagement metrics.
Build vs. Buy: The True Cost of DIY Pipelines
The standard DIY approach combines Airflow for orchestration, dbt for transforms, Fivetran or custom Python for extraction, and Snowflake as the warehouse. On paper, this stack is capable. In practice, the gap between proof-of-concept and production-grade reliability is measured in years, not months. A platform like Milemarker eliminates the need for a separate dbt layer entirely — extraction, transformation, and loading are handled natively as a single managed pipeline.
| Dimension | Build (DIY) | Milemarker Platform |
|---|---|---|
| Time to first production data | 6–12 months per integration | 8–16 weeks total |
| Engineering headcount | 1–2 senior data engineers minimum | No dedicated engineering required |
| Connector library | Build each from scratch | 130+ pre-built, maintained connectors |
| Vendor API changes | Breaks pipeline, requires manual fix | Handled automatically, no downtime |
| Wealth management transforms | Must build identity resolution, household mapping, security normalization | Pre-built wealth data model included |
| Compliance audit trail | Must design and build separately | Built-in data lineage and audit logging |
| Ongoing maintenance | Permanent engineering allocation | Managed by Milemarker |
The hidden cost of DIY is maintenance. A typical Airflow/dbt stack for 10 wealth management integrations requires an estimated 40 to 60 hours of engineering per month to maintain — handling API changes, schema drift, failed runs, data quality incidents, and infrastructure updates. That maintenance burden never decreases as the vendor landscape continues to evolve. A platform like Milemarker absorbs this maintenance entirely — extraction, transformation, and data model updates are managed as part of the service.
The build-vs-buy calculus changes when you account for opportunity cost. Every month a data engineering team spends maintaining custodian file parsers is a month not spent building the analytics, AI models, and reporting capabilities that create competitive advantage.
The Transformation Layer: Where Domain Expertise Matters
Generic ETL tools can move data. They cannot understand it. The transformation layer is where wealth management domain knowledge separates a reliable production pipeline from a brittle data transfer.
Identity Resolution Across Systems
A real-world wealth management firm has a client who appears in four systems under four different representations: "John Andrew Smith" in Salesforce, "SMITH, JOHN A" in the Schwab custodian file, "John Smith (Revocable Trust)" in Orion, and "jsmith@email.com" in the planning tool. These are all the same person.
Generic ETL tools treat these as four separate records. A wealth-specific transform layer applies probabilistic matching across name variants, cross-references account numbers that appear in multiple systems, matches tax IDs where available, and uses address normalization to confirm physical identity. The result is a unified client record with all accounts, all relationships, and all data linked correctly.
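A first-pass version of that matching logic can be sketched with the standard library. This is a deliberately simplified illustration: `normalize` and `same_person` are hypothetical helpers, and a production transform layer would confirm near-threshold matches with tax IDs, cross-referenced account numbers, and normalized addresses as described above.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Canonicalize a name variant: strip trust suffixes, reorder
    'LAST, FIRST' forms, drop punctuation, collapse whitespace."""
    name = name.split("(")[0]                 # "John Smith (Revocable Trust)"
    if "," in name:                           # "SMITH, JOHN A" -> "JOHN A SMITH"
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    return " ".join(name.lower().replace(".", "").split())

def same_person(a, b, threshold=0.7):
    """Probabilistic first pass: fuzzy-match the normalized variants and
    flag pairs above the similarity threshold as probable matches."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

match = same_person("SMITH, JOHN A", "John Smith (Revocable Trust)")  # True
```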
Security Identifier Normalization
Securities appear under different identifiers in different systems. Custodians use CUSIPs. Portfolio systems often use internal IDs. Market data providers use ISINs. Trading systems use tickers. A complete security master maps all of these to a canonical identifier, handling corporate actions (mergers, splits, ticker changes) that invalidate historical mappings over time.
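One well-defined piece of that mapping can be shown concretely: a US ISIN is the country code plus the 9-character CUSIP plus a check digit defined by ISO 6166. The sketch below derives the ISIN from a CUSIP; a full security master would layer ticker and internal-ID mappings, plus corporate-action history, on top of this.

```python
def isin_from_cusip(cusip, country="US"):
    """Derive an ISIN from a 9-character CUSIP: prefix the ISO country
    code, then append the ISO 6166 Luhn-style check digit."""
    body = country + cusip
    # Letters expand to two digits (A=10 ... Z=35) before the Luhn pass.
    digits = "".join(str(int(c, 36)) for c in body)
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 0:        # double every second digit, rightmost first
            d *= 2
        total += d // 10 + d % 10   # sum the digits of each product
    return body + str((10 - total % 10) % 10)

# Apple's CUSIP maps to its well-known ISIN.
aapl = isin_from_cusip("037833100")  # "US0378331005"
```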
Household AUM Calculation
AUM is not a raw field in any source system — it is a calculation. Total household AUM requires summing market values across all accounts, all custodians, and all portfolio systems attributed to the household, then adjusting for accounts managed by other advisors within the household relationship. This calculation requires identity resolution (knowing which accounts belong to which household) and security normalization (knowing the market value of each position) before it can execute correctly.
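Once those two prerequisites exist, the calculation itself is a grouped sum. In this sketch, `household_of` stands in for the identity-resolution output and the position rows for normalized, multi-custodian data; both are illustrative.

```python
def household_aum(positions, household_of):
    """Sum market value per household across all accounts and custodians,
    using the account-to-household map produced by identity resolution."""
    totals = {}
    for p in positions:
        hh = household_of[p["account_id"]]
        totals[hh] = totals.get(hh, 0.0) + p["market_value"]
    return totals

# Accounts from different systems, already resolved to households.
household_of = {"A1": "H-SMITH", "A2": "H-SMITH", "A3": "H-JONES"}
positions = [
    {"account_id": "A1", "market_value": 500_000.0},    # Schwab IRA
    {"account_id": "A2", "market_value": 1_200_000.0},  # Orion trust account
    {"account_id": "A3", "market_value": 300_000.0},
]
totals = household_aum(positions, household_of)  # {"H-SMITH": 1_700_000.0, ...}
```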
Advisor Team Attribution
Large RIAs have complex advisor team structures: lead advisors, service advisors, relationship managers, and business development officers all associated with the same client. Attribution models for revenue, AUM, and activity vary by firm. The transform layer must apply the firm's specific attribution logic to produce reporting that accurately reflects advisor performance and team contribution.
Performance Calculation Inputs
Time-weighted and money-weighted performance calculations require precisely sequenced transaction data, accurate pricing at each transaction date, and correct treatment of dividends, splits, and contributions. The transform layer must validate data completeness and sequencing before performance calculations run — because a single missing transaction corrupts the entire return series.
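The time-weighted case illustrates why sequencing matters: the return series must be split at every external cash flow. A minimal sketch, assuming the transform layer has already produced correctly ordered sub-period valuations:

```python
def time_weighted_return(periods):
    """Geometrically link sub-period returns, where each sub-period runs
    between external cash flows: r_i = end_value / start_value."""
    growth = 1.0
    for start, end in periods:
        growth *= end / start
    return growth - 1.0

# Portfolio starts at 100k and grows to 110k; the client contributes 40k
# (new base 150k), which grows to 157.5k. The series splits at the flow,
# so the contribution itself does not inflate the return.
periods = [(100_000.0, 110_000.0), (150_000.0, 157_500.0)]
twr = time_weighted_return(periods)  # 1.10 * 1.05 - 1 = 0.155
```

Drop one transaction and the sub-period boundaries shift, which is exactly how a single missing record corrupts every linked return after it.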
Pipeline Monitoring, Reliability, and Compliance
A pipeline that runs successfully 95 percent of the time is not a production pipeline — it is a liability. In wealth management, data failures have direct consequences: advisors working from stale data make incorrect recommendations, compliance teams cannot meet reporting deadlines, and billing errors create client disputes.
Alerting on Failures
Every extraction, transformation, and load step should emit structured logs and fire alerts on failure. Alerts must be actionable: what failed, why, what data is missing, and what downstream processes are affected. Vague "pipeline failed" notifications are useless; specific "Schwab SFTP file not received by 6:00 AM, positions data for 847 accounts is stale" notifications enable immediate response.
Data Quality Checks
Automated quality checks run after each transform step, validating that data meets expected parameters: AUM totals reconcile within tolerance, transaction counts match source system records, no accounts have gone to zero unexpectedly, and required fields are populated. These checks catch data quality issues before they reach downstream consumers — analysts, advisors, and compliance teams who rely on the warehouse.
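A post-transform gate along those lines might look like the following sketch. The check names, tolerance, and row shape are illustrative; a production pipeline would halt the load and alert on any returned failure.

```python
def run_quality_checks(warehouse_aum, source_aum, rows, tolerance=0.001):
    """Validate transformed data before it loads: AUM reconciles within
    tolerance, required fields are populated, and no open account has
    unexpectedly gone to zero. Returns a list of failure codes."""
    failures = []
    if abs(warehouse_aum - source_aum) > tolerance * source_aum:
        failures.append("aum_out_of_tolerance")
    for row in rows:
        if not row.get("account_id"):
            failures.append("missing_account_id")
        elif row["market_value"] == 0 and not row.get("closed"):
            failures.append(f"unexpected_zero:{row['account_id']}")
    return failures

rows = [
    {"account_id": "A1", "market_value": 500_000.0},
    {"account_id": "A2", "market_value": 0.0},  # open account gone to zero
]
failures = run_quality_checks(1_000_500.0, 1_000_000.0, rows)  # within 0.1% AUM tolerance
```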
Reconciliation
Daily reconciliation compares warehouse data against source system records to detect drift. A position that changes in the portfolio system but not in the warehouse indicates an extraction failure. A transaction that appears in the custodian file but not in the warehouse indicates a parsing error. Reconciliation surfaces these discrepancies systematically rather than waiting for a user to notice incorrect data.
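A daily reconciliation pass reduces to a keyed comparison between the two record sets. The sketch below surfaces the two failure modes described above (extraction failure and parsing error) under hypothetical row shapes:

```python
def reconcile(source_rows, warehouse_rows, key="account_id", field="market_value"):
    """Compare source-of-truth records against warehouse records and
    surface drift: rows missing from the warehouse and rows whose
    values disagree."""
    wh = {r[key]: r for r in warehouse_rows}
    discrepancies = []
    for s in source_rows:
        w = wh.get(s[key])
        if w is None:
            discrepancies.append((s[key], "missing_in_warehouse"))
        elif w[field] != s[field]:
            discrepancies.append((s[key], "value_mismatch"))
    return discrepancies

source = [
    {"account_id": "A1", "market_value": 500_000.0},
    {"account_id": "A2", "market_value": 260_000.0},  # changed at the custodian
    {"account_id": "A3", "market_value": 50_000.0},   # never extracted
]
warehouse = [
    {"account_id": "A1", "market_value": 500_000.0},
    {"account_id": "A2", "market_value": 250_000.0},  # stale copy
]
drift = reconcile(source, warehouse)
```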
Audit Trails for Compliance
Regulatory exams require firms to produce evidence of data integrity: when was a record last updated, what was its source, who accessed it, and was it modified? A compliant pipeline architecture maintains immutable audit logs at every stage — raw ingestion, transform, and load — so that any data point in the warehouse can be traced back to its original source record with a complete chain of custody.
Extraction Monitoring
Track every source connection — SFTP polls, API calls, file receipts — with success/failure status, record counts, and latency metrics.
Transform Validation
Row-level data quality rules run after each transform step. Failed validations halt the load and alert the operations team before bad data reaches the warehouse.
Reconciliation Reports
Daily comparison of warehouse totals against source system records, surfacing discrepancies by account, custodian, and data type for rapid investigation.
Immutable Audit Log
Every raw record preserved with ingestion timestamp, source system, file or API reference, and processing status. Available for regulatory examination without reconstruction.
Modern Architecture: From Source Systems to Analytics-Ready Data
The modern wealth management data architecture follows a clear pipeline from heterogeneous source systems to analytics-ready tables that power BI dashboards, reporting, and AI applications.
Raw Layer: Land Everything
Source data lands in a raw schema in Snowflake exactly as received — no transformations, no filtering, no modification. Flat files land as structured tables with original column names preserved. API responses land as JSON or normalized rows with source metadata attached. This raw layer is immutable: records are never updated or deleted, only appended. The result is a complete historical record of every piece of data that has ever been ingested.
Transformation: Milemarker's Built-In Data Models
Unlike a DIY stack where you'd bolt on dbt as a separate transformation layer, Milemarker handles data transformation natively as part of its pipeline. Extraction, normalization, identity resolution, and schema mapping happen in a single managed process — no separate orchestration tool, no maintaining SQL model files, no debugging dependency chains. Milemarker's pre-built wealth management data models cover the core entities: clients, households, accounts, securities, transactions, performance, and advisor attribution — all maintained by Milemarker and extensible with firm-specific logic.
Analytics-Ready Tables
After transforms run, analytics-ready tables contain clean, normalized, joined data that downstream consumers can query directly. A business intelligence analyst can query `household_aum_daily` without knowing anything about custodian file formats. An AI engineer can train a client churn model against `client_activity_features` without building data pipelines first. The separation between pipeline infrastructure and analytical consumption is what makes the modern architecture productive.
Downstream Consumers
- Business Intelligence: Tableau, Power BI, Looker, and custom dashboards query analytics-ready tables via Snowflake's native connectors.
- Automated Reporting: Compliance reports, client performance statements, and board decks pull from pre-computed summary tables on a scheduled basis.
- AI and Machine Learning: Feature stores, training datasets, and inference pipelines draw from the same normalized warehouse, ensuring models train on production-quality data.
- Operational Workflows: CRM automations, advisor alerts, and client outreach triggers read from near-real-time tables to act on data as it arrives.
Conclusion
ETL infrastructure is the foundation that every wealth management analytics, reporting, and AI initiative is built on. Firms that get it right — reliable pipelines, clean transforms, complete audit trails — move faster and compete on insights. Firms that get it wrong spend their engineering capacity on maintenance instead of building advantage.
Modern ELT architecture on Snowflake, powered by wealth-specific connectors and built-in transformation models, delivers the clean, normalized, audit-ready data that wealth management firms need. The question is whether to build it from scratch — stitching together Airflow, dbt, Fivetran, and custom code — or deploy a platform like Milemarker that handles extraction, transformation, and loading as a single managed pipeline with 130+ pre-built connectors.
Milemarker's 130+ maintained connectors, pre-built wealth data models, and managed pipeline infrastructure let firms skip the 6 to 12 months of build time and go directly to the analytics and AI applications that create competitive advantage.