PII discovery · Anonymization · Synthetic data · Runs in your environment

Your company holds personal data in places nobody has thought to look.

VestraData was built after watching the same pattern repeat inside regulated organisations. Production is locked down and audited. Test environments, shared drives, and last quarter's data extract often contain the same personal data with none of the same controls. The liability is identical. The visibility is not. We built this to close that gap.
Runs in your environment · No data egress · Air-gap ready · GDPR · HIPAA · PCI-DSS

Review in progress — Source: prod-postgres-01 · Schema: public · Tables: users, payments, patients · Scan ID: VD-2026-04-0912

Field              | Type                    | Detected as    | Rows   | Risk   | Confidence | Status
users.email        | VARCHAR(255) · NOT NULL | PERSONAL_EMAIL | 14,823 | High   | 99.1%      | Masked
patients.full_name | TEXT · NULLABLE         | FULL_NAME      | 8,441  | High   | 97.8%      | Review
payments.card_no   | VARCHAR(19) · NOT NULL  | CREDIT_CARD    | 22,109 | Medium | 94.2%      | Scoped
contacts.mobile    | VARCHAR(20) · NULLABLE  | PHONE_NUMBER   | 6,201  | High   | 98.5%      | Masked
The reality

The controlled system is rarely the only place the data exists.

Most organisations have one environment that is locked down and audited, and several others that contain the same personal data with none of the same controls.

The test environment, the staging server, the extract someone sent last quarter, and the spreadsheet in a shared drive often carry the same liability as production. The audit trail does not.

Most tools either require your data to leave your building or only scan what they can see. VestraData is built for the places where personal data actually hides.

The platform

Everything runs on the same detection engine.

One detection layer supports discovery, de-identification, synthetic data, and AI controls. Findings, policy, and audit history stay consistent.

01
PII discovery and de-identification

See where sensitive data actually lives, then decide what to do with it.

VestraData connects to databases, file stores, and cloud buckets, then shows which fields need review. Teams can document findings, apply de-identification rules, and generate audit evidence without moving data into a third-party service.

Detects obfuscated column names. A field called "usr_eml" is flagged as personal email, not ignored.
Multi-pass adaptive sampling avoids full table reads during discovery.
Supports masking, redaction, hashing, format-preserving transforms, and synthetic replacement.
Works across structured and unstructured sources in one review model.
02
Synthetic data generation

Give engineering and analytics teams useful data without widening production exposure.

Synthetic data only works if it still behaves like the original system. VestraData preserves enough of the statistical shape and table relationships for testing, analytics, and model development without handing teams live customer records.

Preserves column distributions and cross-table relationships.
Maintains foreign key integrity so the data still joins correctly.
Exports to staging databases, object storage, and ML pipelines.
Supports scheduled refreshes instead of one-off manual copies.
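A minimal sketch of the foreign-key-preserving idea, assuming a simple users/payments schema (the field names, key scheme, and resampling strategy are hypothetical): parent rows receive fresh synthetic keys, child rows only ever reference keys that exist in the parent, and values are resampled from the empirical distribution.

```python
import random

def synthesize(users, payments, seed=0):
    """FK-preserving sketch: remap parent keys, keep child linkage."""
    rng = random.Random(seed)
    # Fresh synthetic primary keys for the parent table.
    key_map = {u["id"]: 1000 + i for i, u in enumerate(users)}
    synth_users = [{"id": key_map[u["id"]], "country": u["country"]}
                   for u in users]
    # Children keep their original parent linkage (remapped), so the
    # payments-per-user distribution survives and joins still work.
    amounts = [p["amount"] for p in payments]
    synth_payments = [
        {"id": 5000 + i,
         "user_id": key_map[p["user_id"]],   # referential integrity
         "amount": rng.choice(amounts)}      # empirical resampling
        for i, p in enumerate(payments)
    ]
    return synth_users, synth_payments
```

The design choice worth noting: remapping real keys (rather than generating unrelated ones) is what keeps cross-table cardinalities intact without retaining any real identifier values.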
03
Data airlock

Documents leave the organisation clean, or they do not leave.

Connect VestraData to a document store and it prepares governed copies in the background. When someone needs to share a file with a partner or upload it to an external tool, the safer version is already available.

Monitors SharePoint, Google Drive, Dropbox, and S3 for new or changed files.
Creates governed copies indexed by file hash for fast retrieval.
Keeps mapping and review context for audit purposes.
Reduces manual review work at the point of sharing.
04
VestraShield

Personal data should be reviewed before it reaches any external AI tool.

Teams are already using ChatGPT, Claude, and Copilot. VestraShield adds a control point at the browser so prompts and uploads can be warned on, blocked, or transformed before they leave the user's machine.

Applies policy to prompts and uploaded documents.
Can warn, block, or transform content before submission.
Runs with the same detection logic as the rest of the platform.
Keeps AI controls aligned with existing privacy policy instead of creating a parallel process.
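A minimal sketch of the warn/block/transform decision, with assumed detection patterns and an assumed per-category policy (these patterns and thresholds are illustrative, not VestraShield's detection logic):

```python
import re

PATTERNS = {
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "PERSONAL_EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

# Assumed policy; precedence when multiple findings: block > transform > warn.
POLICY = {"CREDIT_CARD": "block", "PERSONAL_EMAIL": "transform"}
SEVERITY = {"block": 3, "transform": 2, "warn": 1, "allow": 0}

def apply_policy(prompt: str) -> tuple[str, str]:
    """Return (action, outbound_text) for a prompt before submission."""
    findings = [k for k, p in PATTERNS.items() if p.search(prompt)]
    action = max((POLICY.get(f, "warn") for f in findings),
                 key=SEVERITY.get, default="allow")
    if action == "block":
        return "block", ""  # nothing leaves the user's machine
    if action == "transform":
        out = prompt
        for f in findings:
            out = PATTERNS[f].sub(f"[{f}]", out)  # redact in place
        return "transform", out
    return action, prompt
```

The point of the sketch is the control flow: the decision happens before submission, so the external tool only ever receives the transformed text, or nothing at all.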
The workflow

From connection to audit report.

The goal is not just to detect sensitive data. It is to move from first connection to a defensible outcome without creating a second privacy problem along the way.

01

Connect

Add a source and define the first review boundary. Credentials are encrypted and isolated before any scan runs.

encrypted_credential
tenant_scoped
source_scoped
02

Discover

Run an initial pass to map schemas, flag likely sensitive fields, and show where deeper review is worth the time.

adaptive_sample
schema_map
no_full_scan
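One way the no-full-scan sampling could work, as a sketch (the pass size, confidence target, and stopping rule are assumptions for illustration): sample in passes and stop as soon as the PII hit-rate estimate is stable, rather than reading the whole column.

```python
import math
import random

def estimate_hit_rate(column, looks_like_pii, rng=None,
                      pass_size=200, max_passes=10, target_ci=0.05):
    """Grow the sample pass by pass; stop early once the estimate
    is tight enough, so the full table is never read."""
    rng = rng or random.Random(0)
    hits, n = 0, 0
    for _ in range(max_passes):
        for value in rng.sample(column, min(pass_size, len(column))):
            n += 1
            hits += looks_like_pii(value)
        p = hits / n
        # 95% normal-approximation confidence half-width.
        half_width = 1.96 * math.sqrt(p * (1 - p) / n)
        if half_width < target_ci:
            break
    return p, n
```

On a uniform column (all hits or all misses) this stops after a single pass; the multi-pass budget is only spent where the signal is genuinely mixed.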
03

Review

Review findings with field evidence, confidence, and risk so teams can decide what needs action.

field_evidence
risk_ranked
review_queue
04

Act

Apply the right control: de-identify, generate a safe export, or prepare a governed copy for downstream use.

mask
synthetic_export
governed_copy
05

Prove

Keep the decision trail so teams can show what changed and which policy or reviewer approved the outcome.

audit_logged
policy_linked
report_ready
Deployment

Your environment. Not ours.

All three deployment models share one property: the privacy boundary stays with you. That matters more to serious buyers than any headline claim about AI.

On-premises and air-gap

For organisations that cannot allow operational data egress

VestraData runs inside your data centre with no internet access at runtime. The ML models are bundled in the install package, licensing uses an offline key, and LDAP and SAML authentication are included.

Docker Compose · Helm / Kubernetes · LDAP · SAML · Offline license · Bundled ML models · No phone-home
Cloud appliance

Deploy into your own cloud account

Available on AWS, Azure, and GCP Marketplace. You control networking, IAM, and storage. VestraData does not take over the environment boundary.

AWS Marketplace · Azure Marketplace · GCP Marketplace · Terraform · CloudFormation
SDK

Embed controls inside your own workflow

Use the generated REST client or the embedded ML stack in-process when privacy controls need to sit inside an existing pipeline.

Python · Node.js · Java · .NET · OpenAPI
Who this fits

The teams where getting this wrong has real consequences.

Not every company needs this level of control. These usually do.

Healthcare

NHS and private health

Health systems where patient data is spread across clinical platforms, exports, and shared files, and audit evidence has to be available on demand.

Air-gap deployment
NHS DSPT and HIPAA
No cloud egress
Financial services

Banks and investment firms

PCI-DSS scope reduction, safer non-production data, and tighter controls around how sensitive records move into engineering and analytics workflows.

PCI-DSS scope reduction
LDAP and SAML auth
Synthetic data for ML
Legal

Law firms and accountancies

Client documents contain information that cannot reach external AI tools unchecked. Review and control need to happen before those files are shared or uploaded.

Document airlock
VestraShield
Client privilege audit trail
Data platforms

Data marketplaces

Review inbound datasets for personal data before publishing or downstream reuse, with tenant isolation between each customer's data.

SDK integration
Multi-tenant isolation
Event-driven scanning
Engineering

Dev and test teams

Realistic test data that behaves like production, without handing developers live personal data or relying on brittle one-off masking scripts.

FK-preserving subsets
Direct staging DB import
Scheduled refresh
ML engineering

Model and data science teams

Training and evaluation datasets that are statistically faithful to production without normalising access to live personal data.

Differential privacy
S3 and Parquet export
Distribution preserved
The demo

Here is exactly what happens when you book a session.

Not a slide deck. Not a sandboxed environment with fabricated data. We connect to something real in your organisation and you see actual findings.

Minutes 0-5
We start with one real source
Usually a read-only database credential, file store, or bucket that is representative enough to answer whether the product fits your environment.
Minutes 5-20
We run the discovery scan
You watch it happen live. The schema map builds in real time. Findings appear as the scan progresses. No prepared screenshots.
Minutes 20-35
We walk through the findings
What was found, where, the risk level, and the confidence score. We explain any finding you want to understand in more depth.
Minutes 35-45
You pressure-test the fit
Deployment, controls, source coverage, air-gap requirements, and what a narrow pilot in your environment would actually involve.

After the session, you should know whether the deployment model works, whether the first workflow is meaningful, and whether a pilot is justified.

Book a session

See what is in your data.

The first session should answer three things quickly: can this run inside your boundary, does it fit the first workflow you actually care about, and is a pilot in your environment worth doing. If those answers are not clear by the end, it was not a useful session.

Start with architecture, controls, and operating model. The sales layer can wait.
What the session should answer
Can this run inside your boundary without weakening your deployment model?
Does it fit the first workflow you actually care about, not a generic demo path?
What would a narrow thirty-day pilot in your environment actually involve?
What should you expect after the session: a findings report, a controls review, and a next-step decision?
Design partner programme

Working with a small number of organisations before the public launch.

We are working closely with early design partners in healthcare, financial services, legal, and data platforms. The point is not early access for its own sake. It is to shape the product against real operating constraints before broader launch.

If your organisation has a genuine privacy or compliance problem and wants direct input into how the product evolves, this is the right way to engage before launch.

Design partners get direct engineering access, early capabilities, and a shorter loop from feedback to changes.

Open slots by sector

Healthcare / NHS
Air-gapped clinical network or NHS trust with DSPT requirements.
Open now
Financial services
Bank or investment firm with PCI-DSS or synthetic data requirements.
Open now
Legal
Law firm or accountancy sharing documents with external AI tools.
Next cohort
Data platforms
Platform ingesting third-party datasets that require PII scanning.
Limited