Snowflake Cortex for Financial Services. AI Where Your Data Already Lives.

Snowflake Cortex is a set of AI functions that run inside Snowflake — LLM functions (COMPLETE, SUMMARIZE, EXTRACT_ANSWER, SENTIMENT), Cortex Search (vector + hybrid), Document AI, and Cortex Analyst (text-to-SQL). No data leaves your warehouse to reach the model.

This page is part of Milemarker's Snowflake cluster for financial services. See the Snowflake for Financial Services pillar for the full landscape, or explore related capabilities: Snowflake Data Sharing and the Snowflake Marketplace.

Why AI Belongs in the Warehouse, Not Beside It

The standard architecture for AI in financial services has been: extract data from your systems, send it to an external API, receive the model's output, and store it somewhere. This pattern is operationally simple but creates a governance gap that regulators and compliance teams are increasingly focused on. The moment data leaves your controlled environment to reach a model endpoint, it is outside your security boundary, outside your audit trail, and subject to the data handling policies of whoever operates that endpoint.

Data Movement Is a Governance Failure

For wealth management firms, the sensitivity of the data makes this gap unacceptable at scale. Client names, account values, Social Security numbers, and investment profiles are all potentially in scope for any AI workflow that touches client records. Sending that data to an external LLM API, even a reputable one, means the data has left your environment. Your SOC 2 auditors want to know where regulated data goes. Cortex's answer is that it never goes anywhere: inference runs inside Snowflake, on data that never leaves your account.

The AI-readiness conversation in wealth management typically focuses on data quality and normalization. But AI readiness also means architectural readiness — having an inference layer that respects your security boundary. Cortex is that layer for firms already on Snowflake.

0: Data egress events per Cortex inference call. Inference runs inside your Snowflake account.

SQL: The interface for Cortex functions. No new language, no SDK, no separate service to manage.

Full: Audit trail. Every Cortex function call is logged in Snowflake Access History.

What Cortex Can Do for a Wealth Firm

Cortex functions are callable as SQL expressions, which means any team with access to Snowflake can apply AI to the data they already work with. The following represent the highest-value applications for wealth management operations and advisory practices.

Pre-Meeting Client Briefings

Summarize the last 12 months of CRM interaction notes, account changes, and life events into a structured briefing before an advisor's client review. The advisor walks into the meeting with context that would otherwise take 30 minutes to manually compile.

IPS Extraction from PDFs

Apply Document AI to ingest investment policy statements, extract structured data — target allocations, prohibited securities, risk tolerance bands — and populate your compliance monitoring database automatically. IPS updates propagate into compliance rules without manual rekeying.
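As a rough sketch of the extraction step, assuming a Document AI model build has already been trained and published in your account: the names ips_model and ips_stage are hypothetical, and the version number 1 is a placeholder for whichever build version you have published.

```sql
-- Hypothetical names: ips_model (a Document AI model build) and
-- ips_stage (an internal stage holding IPS PDFs, with a directory table enabled).
SELECT
    relative_path,
    ips_model!PREDICT(
        GET_PRESIGNED_URL(@ips_stage, relative_path),
        1                              -- model build version (placeholder)
    ) AS extracted_json
FROM DIRECTORY(@ips_stage)
WHERE relative_path ILIKE '%.pdf';
```

Each row carries a JSON object with the values the model extracted from that document. A downstream step would map those values into the compliance monitoring tables.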

Email and Note Sentiment

Run SENTIMENT scoring on incoming client communication to surface at-risk relationships before formal complaint or attrition. Advisors see a rolling sentiment score for each client household — a data point that no manual review process can produce at scale.
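A minimal sketch of a rolling household score, assuming a hypothetical client_communications table with household_id, message_text, and received_at columns. SENTIMENT returns a score between -1 (negative) and 1 (positive), so the average gives a simple per-household trend signal.

```sql
-- client_communications and its columns are assumed names for illustration.
SELECT
    household_id,
    AVG(SNOWFLAKE.CORTEX.SENTIMENT(message_text)) AS avg_sentiment_90d,
    COUNT(*)                                      AS messages_scored
FROM client_communications
WHERE received_at >= DATEADD('day', -90, CURRENT_DATE())
  AND message_text IS NOT NULL
GROUP BY household_id
ORDER BY avg_sentiment_90d ASC;   -- most negative households first
```

Sorting ascending puts the households most likely to need attention at the top. The 90-day window is an arbitrary choice for the sketch.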

Semantic Research Search

Index firm-generated research, analyst commentary, and external reports using Cortex Search. Operations and advisory staff search for answers in natural language across thousands of documents — without reading every file or knowing which documents contain the relevant section.

Risk Anomaly Explanations

When a portfolio triggers a risk alert — concentration breach, drawdown threshold, factor exposure limit — use COMPLETE to generate a plain-language explanation of what changed and why, ready for advisor or compliance review without waiting for a quant team to write the narrative.
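One way this could look in SQL, assuming a hypothetical risk_alerts table that already holds the alert type, threshold, observed value, and largest contributing position. The model name 'mistral-large' is an assumption; substitute whichever Cortex model is enabled in your region.

```sql
-- risk_alerts and its columns are assumed names for illustration.
-- Note: if any concatenated column is NULL, the whole prompt becomes NULL.
SELECT
    alert_id,
    SNOWFLAKE.CORTEX.COMPLETE(
        'mistral-large',
        'You are assisting a compliance reviewer. In three sentences or fewer, ' ||
        'explain in plain language what changed and why this alert fired. ' ||
        'Alert type: ' || alert_type ||
        '. Threshold: ' || TO_VARCHAR(threshold_value) ||
        '. Observed value: ' || TO_VARCHAR(observed_value) ||
        '. Largest contributing position: ' || top_position || '.'
    ) AS explanation
FROM risk_alerts
WHERE alert_date = CURRENT_DATE();
```

The generated text is a draft for advisor or compliance review, not a substitute for it.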

Natural-Language Reporting

Use Cortex Analyst to let operations, compliance, and advisory staff ask questions about firm data in plain English. Questions like "what is our average household AUM by advisor this quarter?" return accurate, SQL-generated answers without requiring anyone to write a query or open a BI tool.

Cortex in SQL

Cortex LLM functions are standard SQL.

Cortex functions can be called directly from SELECT statements, CTEs, and stored procedures. Teams that already know SQL can apply AI to their data immediately, with no Python environment, no API client library, and no external service to configure.

    -- Summarize the last 6 months of advisor notes for client review prep
    SELECT household_id,
           SNOWFLAKE.CORTEX.SUMMARIZE(
             LISTAGG(note_text, ' | ') WITHIN GROUP (ORDER BY note_date)
           ) AS client_briefing
    FROM crm_notes
    WHERE note_date >= DATEADD('month', -6, CURRENT_DATE())
    GROUP BY household_id;

Cortex Search and Vector Search

Traditional keyword search fails on financial content because the vocabulary of wealth management is imprecise. A note that says "the client is cautious about market exposure" and a query for "conservative risk tolerance" share no keywords but convey the same concept. Cortex Search solves this by representing text as vectors in high-dimensional space — similar meanings cluster together regardless of exact word choice.
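To make the vector idea concrete, the two phrases from the paragraph above can be embedded and compared directly. This sketch assumes the 'snowflake-arctic-embed-m' embedding model is available in your region. Cortex Search manages embeddings for you, so this is only an illustration of the underlying mechanism.

```sql
-- Embed two phrases that share no keywords and measure how close they are.
SELECT VECTOR_COSINE_SIMILARITY(
    SNOWFLAKE.CORTEX.EMBED_TEXT_768(
        'snowflake-arctic-embed-m',
        'the client is cautious about market exposure'
    ),
    SNOWFLAKE.CORTEX.EMBED_TEXT_768(
        'snowflake-arctic-embed-m',
        'conservative risk tolerance'
    )
) AS similarity;
```

A value closer to 1 indicates closer meaning. Comparing the result against an unrelated phrase pair shows the contrast.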

RAG Patterns Over Firm Content

Retrieval-augmented generation (RAG) is the architecture that makes Cortex Search operationally useful. Instead of asking an LLM to answer from its training data alone, RAG retrieves the most relevant documents from your corpus, feeds them into the prompt as context, and asks the model to answer from that context. The model's response is grounded in your firm's actual data rather than general world knowledge.

For wealth management, RAG over Cortex Search enables use cases that closed-corpus LLMs cannot: answering questions about a specific client's situation using that client's actual records, explaining a portfolio decision using the firm's own research at the time, surfacing relevant precedent from the firm's compliance history. See the AI agents in wealth management page for how these patterns connect to orchestrated workflows.
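The retrieve-then-generate flow can be sketched in a single statement. This assumes a Cortex Search service named firm_docs_search already exists over a chunk_text column (both names are hypothetical), and that 'mistral-large' is an enabled model. SEARCH_PREVIEW is intended for testing and requires constant arguments; a production application would more likely query the service through its REST or Python API.

```sql
-- Step 1: retrieve the top matching chunks from the (assumed) search service.
WITH hits AS (
    SELECT r.value['chunk_text']::STRING AS chunk_text
    FROM TABLE(FLATTEN(
        input => PARSE_JSON(
            SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
                'firm_docs_search',
                '{"query": "firm view on rate sensitivity in municipal bonds", "columns": ["chunk_text"], "limit": 3}'
            )
        )['results']
    )) AS r
)
-- Step 2: pass the retrieved text to the model as context for the answer.
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'mistral-large',
    'Answer the question using only the context below. ' ||
    'If the context does not contain the answer, say so.' ||
    '\n\nContext:\n' || LISTAGG(chunk_text, '\n---\n') ||
    '\n\nQuestion: What is the firm view on rate sensitivity in municipal bonds?'
) AS grounded_answer
FROM hits;
```

Instructing the model to answer only from the supplied context is what keeps the response grounded in firm content.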

Hybrid Search

Cortex Search supports hybrid search that combines vector similarity with keyword matching. This is the right approach for financial content where precision matters alongside recall. A search for "Regulation Best Interest documentation for the Henderson account" benefits from vector similarity (understanding the regulatory concept) and from keyword matching (returning results that actually mention the Henderson account specifically). Hybrid search provides both without requiring two separate search systems.
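A sketch of how such a service might be defined and queried. The table firm_document_chunks, the warehouse search_wh, the service name firm_docs_search, and the doc_type attribute are all assumed names for illustration.

```sql
-- Define a search service over pre-chunked firm documents (hypothetical table).
CREATE OR REPLACE CORTEX SEARCH SERVICE firm_docs_search
  ON chunk_text
  ATTRIBUTES doc_type
  WAREHOUSE = search_wh
  TARGET_LAG = '1 hour'
  AS (
    SELECT chunk_text, doc_type, doc_id
    FROM firm_document_chunks
  );

-- Query it: the search text drives the ranking, and the filter
-- restricts results to documents tagged as compliance material.
SELECT PARSE_JSON(
    SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
        'firm_docs_search',
        '{
           "query": "Regulation Best Interest documentation for the Henderson account",
           "columns": ["chunk_text", "doc_id"],
           "filter": {"@eq": {"doc_type": "compliance"}},
           "limit": 5
         }'
    )
)['results'] AS results;
```

The attribute filter is optional. It narrows the candidate set before ranking, which helps when a query should only ever return one class of document.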

Cortex Analyst: Text-to-SQL for Non-Technical Teams

One of the persistent limitations of analytics in wealth management is that the people who most need answers — advisors, compliance officers, operations managers — are not the people who can write SQL. The result is a queue of data requests that flows through analytics or IT teams, adding days or weeks to the cycle time for basic business questions.

Cortex Analyst addresses this directly. It accepts a natural-language question, consults a semantic model that maps business concepts to your warehouse schema, generates a SQL query, executes it, and returns the result. The semantic model is the critical piece: it defines what "AUM" means, what "client" means, what "advisor" means in the context of your specific data model — eliminating the ambiguity that makes naive text-to-SQL unreliable.

Democratized Analytics Across the Firm

When Cortex Analyst is deployed against Milemarker's wealth management data model, advisors and operations staff can ask questions like "which clients have not had a scheduled review in the last 12 months?" or "what is the aggregate equity allocation for clients over 70?" and receive accurate, SQL-generated answers in seconds. The analytics team shifts from being a query production service to being a semantic model maintainer — a much higher-value role.
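For illustration, the first question might resolve to SQL along these lines. The households and client_reviews tables are hypothetical; the actual query depends on how the semantic model maps "client" and "scheduled review" to your schema.

```sql
-- Households with no review on record, or none in the last 12 months.
SELECT
    h.household_id,
    h.household_name,
    MAX(r.review_date) AS last_review_date
FROM households h
LEFT JOIN client_reviews r
       ON r.household_id = h.household_id
GROUP BY h.household_id, h.household_name
HAVING MAX(r.review_date) IS NULL
    OR MAX(r.review_date) < DATEADD('month', -12, CURRENT_DATE());
```

The LEFT JOIN matters here: households that have never had a review should appear in the answer, not drop out of it.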

Explore how this capability fits into a broader wealth management data platform and the Snowflake for RIAs and Snowflake for asset managers use cases.

Cortex vs. External LLM APIs

This is an honest comparison. Cortex and external LLM APIs each have genuine strengths, and the right architecture for most firms is not one or the other — it is a combination, with Cortex handling the majority of high-volume, governed workflows and external APIs handling tasks that require frontier model capability.

Dimension	Cortex (In-Warehouse)	External LLM API
Data governance	Data stays inside Snowflake; no egress; audited by Access History	Data sent to external endpoint; subject to provider's data retention policy; not audited in Snowflake
Data residency	Inference in the Snowflake region where your data lives	Data routed to provider's inference infrastructure; region may differ from your data region
Model capability	Curated open-weight and proprietary models; strong for summarization, extraction, classification, RAG	Access to frontier models (GPT-4o, Claude Opus, Gemini Ultra); stronger for complex reasoning and nuanced generation
Cost model	Snowflake credits per million tokens; no separate API contract; no egress cost	Per-token pricing from API provider; plus data egress cost; plus engineering to manage API client
Latency	Comparable to external API for batch; may be slower for real-time single-prompt use cases	Low latency for interactive single-prompt; scales well with streaming
Interface	Standard SQL — no additional SDK or environment required	REST API or SDK; requires application code to call and parse results
Audit trail	Built-in — every Cortex call in Access History with user, time, and data accessed	Depends on provider's logging; not integrated with Snowflake access logging

The practical guidance: use Cortex for high-volume, data-intensive workflows — summarizing thousands of client notes, scoring sentiment across millions of records, extracting structured data from large document sets. Use external APIs for interactive, high-judgment tasks — generating investment narrative for a specific client, drafting a complex compliance letter, synthesizing a research argument from multiple conflicting sources. Milemarker Navigator is designed precisely for this combination, using Cortex for warehouse-native operations and Claude for reasoning tasks that benefit from frontier model capability.

Where Milemarker Fits

Milemarker Navigator runs Claude-powered AI agents inside Snowflake using Cortex and Snowpark. Milemarker positions itself as a Snowflake partner, not a competitor: it uses the Cortex layer for in-warehouse functions (summarization, extraction, vector search) and external Claude inference via Snowpark for tasks that require more sophisticated reasoning, while keeping the data model and orchestration inside the Snowflake security boundary.

The practical implication for a wealth firm: deploying Milemarker's platform means that Cortex capabilities are available against Milemarker's pre-normalized wealth data model from day one. Firms do not need to build their own semantic layer, their own vector index, or their own document processing pipeline. The Snowflake for Financial Services infrastructure and the AI layer land together.

01. Pre-Built Semantic Model

Cortex Analyst is deployable against Milemarker's wealth data model without custom semantic model authoring. Wealth-specific concepts (household, advisor, AUM, sleeve) are pre-defined.

02. Cortex Search Index

Milemarker includes a configurable Cortex Search index over CRM notes, client documents, and firm research as part of the platform deployment.

03. Document AI Pipelines

Pre-built Document AI workflows for common wealth management documents (IPS, client agreements, custodian statements) extract structured data directly into the normalized data model.

04. Navigator Agent Orchestration

Milemarker Navigator orchestrates Cortex functions alongside Claude inference via Snowpark, giving firms access to both in-warehouse AI and frontier model capability from a single platform.

05. Governed by Design

Every AI function in the Milemarker platform, whether Cortex or external, operates under Snowflake's RBAC framework. No AI workflow bypasses the access controls that govern the underlying data.

06. Milemarker Augments Rather Than Replaces

Milemarker is designed to extend Snowflake's capabilities for wealth management by providing the data model, integration library, and AI orchestration layer that make Snowflake and Cortex immediately productive.

Frequently Asked Questions

Is Snowflake Cortex secure for sensitive financial data?

Yes. Cortex runs AI inference inside Snowflake's security boundary — your data does not leave your Snowflake account to reach a model. All queries are subject to the same role-based access controls (RBAC) that govern standard SQL queries. Cortex maintains Snowflake's SOC 2 Type II posture, and every Cortex function call is logged in Snowflake's Access History. For firms subject to GLBA, SEC, or FINRA data governance requirements, Cortex eliminates the data residency risk of sending financial data to external API endpoints.
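As a sketch of how that access control can be tightened: Cortex LLM functions are gated by the SNOWFLAKE.CORTEX_USER database role, which is granted to PUBLIC by default. The role advisor_analytics and the table crm_notes below are hypothetical names.

```sql
-- Run with a role that can manage these grants (typically ACCOUNTADMIN).
-- Remove the default account-wide access to Cortex functions.
REVOKE DATABASE ROLE SNOWFLAKE.CORTEX_USER FROM ROLE PUBLIC;

-- Allow only a specific role to call Cortex functions (hypothetical role name).
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE advisor_analytics;

-- Data access is still governed by ordinary grants on the underlying tables.
GRANT SELECT ON TABLE crm_notes TO ROLE advisor_analytics;
```

A role needs both grants to summarize client notes: permission to call the function and permission to read the data.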

Which AI models are available through Snowflake Cortex?

Snowflake Cortex provides access to a curated set of hosted models including Mistral (7B, Large), Llama (3.1, 3.2, 3.3 in various sizes), Reka Core and Flash, Jamba-Instruct, and Arctic Embed for vector search. The model catalog evolves as Snowflake adds new offerings. Firms do not need API keys or separate contracts with model providers — Cortex handles licensing through Snowflake's consumption billing. Claude and OpenAI models are not available natively in Cortex, though Milemarker Navigator uses Claude via Snowpark external functions for specific use cases.

How much does Snowflake Cortex cost?

Cortex functions are billed in Snowflake credits, priced per million tokens processed. Costs vary by model — smaller models like Mistral 7B are significantly cheaper per token than larger models like Llama 3.3 70B. For most wealth management use cases (summarization, extraction, semantic search over firm content), monthly Cortex costs for a mid-size firm typically run $200 to $2,000 depending on volume and model selection. These costs are additive to base Snowflake compute and storage costs.
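Before committing to a workload, input volume can be estimated with COUNT_TOKENS. This sketch assumes a hypothetical crm_notes table and the 'mistral-7b' model. Converting the token count to credits requires the current per-model rate from Snowflake's consumption table, which is not reproduced here.

```sql
-- Estimate input tokens for six months of notes under a given model's tokenizer.
SELECT
    COUNT(*)                                                     AS notes,
    SUM(SNOWFLAKE.CORTEX.COUNT_TOKENS('mistral-7b', note_text))  AS total_input_tokens
FROM crm_notes
WHERE note_date >= DATEADD('month', -6, CURRENT_DATE())
  AND note_text IS NOT NULL;
```

Functions that generate text also bill for output tokens, so this gives a floor on consumption for those functions, not a full estimate.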

Can I fine-tune models in Cortex on my own data?

Snowflake Cortex supports fine-tuning for select models, allowing firms to train task-specific model variants on their own data without that data leaving Snowflake. For most wealth management use cases (structured extraction, summarization, classification), prompt engineering and retrieval-augmented generation (RAG) via Cortex Search are usually the better starting point: they are significantly cheaper and faster to deploy, and they often match or outperform fine-tuning.

Can I bring my own model to Cortex?

Snowflake supports custom model deployment through Snowpark Container Services, which allows firms to run their own containerized model inference inside Snowflake's infrastructure. This is distinct from the Cortex hosted model catalog but uses the same security and governance boundary. For most financial services firms, the hosted Cortex catalog provides sufficient capability without the operational overhead of managing custom model deployments. BYO model via Snowpark Containers is best suited to firms with proprietary models trained on domain-specific financial data.

What is Cortex Analyst and who is it for?

Cortex Analyst is Snowflake's text-to-SQL capability — it allows non-technical users to ask natural-language questions against a defined semantic model and receive SQL-generated answers. For wealth management, this means advisors or operations staff can ask "which clients have more than 60% equity exposure?" or "what is our total AUM by advisor this quarter?" without writing SQL. Cortex Analyst requires defining a semantic model, which Milemarker provides as part of the platform deployment against the pre-built wealth data model.

How does Cortex compare to calling external LLM APIs directly?

External API calls require sending data out of your Snowflake environment to reach the model, which creates a governance gap. Cortex keeps inference inside the boundary. The trade-off is model selection: external APIs offer frontier models with stronger reasoning capability, while Cortex offers a curated set of open-weight and proprietary models. For tasks requiring frontier capability, Milemarker Navigator uses external Claude inference via Snowpark, handled under Anthropic's enterprise data policies, while keeping orchestration and the data model inside Snowflake.

What is Cortex Search and how does it differ from keyword search?

Cortex Search indexes text content using vector embeddings, enabling semantic similarity search — finding content that is conceptually related to a query, not just lexically matching keywords. A query for "clients concerned about estate planning" will surface notes that discuss inheritance, beneficiary designations, and trust structures — not just notes containing the exact phrase. Cortex Search supports hybrid search combining vector similarity and keyword matching for optimal recall in financial content.