PII discovery · Anonymization · Synthetic data · Runs in your environment

Your company holds personal data in places nobody has thought to look.

VestraData was built after watching the same pattern repeat inside regulated organisations. Production is locked down and audited. Test environments, shared drives, and last quarter's data extract often contain the same personal data with none of the same controls. The liability is identical. The visibility is not. We built this to close that gap.
Runs in your environment · No data egress · Air-gap ready · GDPR · HIPAA · PCI-DSS

Review in progress — Source: prod-postgres-01 · Schema: public · Tables: users, payments, patients · Scan ID: VD-2026-04-0912

Field              | Type                    | Detected as    | Rows   | Risk   | Confidence | Status
users.email        | VARCHAR(255) · NOT NULL | PERSONAL_EMAIL | 14,823 | High   | 99.1%      | Masked
patients.full_name | TEXT · NULLABLE         | FULL_NAME      | 8,441  | High   | 97.8%      | Review
payments.card_no   | VARCHAR(19) · NOT NULL  | CREDIT_CARD    | 22,109 | Medium | 94.2%      | Scoped
contacts.mobile    | VARCHAR(20) · NULLABLE  | PHONE_NUMBER   | 6,201  | High   | 98.5%      | Masked
The reality

The controlled system is rarely the only place the data exists.

Most organisations have one environment that is locked down and audited, and several others that contain the same personal data with none of the same controls.

The test environment, the staging server, the extract someone sent last quarter, and the spreadsheet in a shared drive often carry the same liability as production. The audit trail does not.

Most tools either require your data to leave your building or only scan what they can see. VestraData is built for the places where personal data actually hides.

The platform

Everything runs on the same detection engine.

One detection layer supports discovery, de-identification, synthetic data, and AI controls. Findings, policy, and audit history stay consistent.

01
PII discovery and de-identification

See where sensitive data actually lives, then decide what to do with it.

VestraData connects to databases, file stores, and cloud buckets, then shows which fields need review. Teams can document findings, apply de-identification rules, and generate audit evidence without moving data into a third-party service.

Detects obfuscated column names. A field called "usr_eml" is flagged as personal email, not ignored.
Multi-pass adaptive sampling avoids full table reads during discovery.
Supports masking, redaction, hashing, format-preserving transforms, and synthetic replacement.
Works across structured and unstructured sources in one review model.
02
Synthetic data generation

Give engineering and analytics teams useful data without widening production exposure.

Synthetic data only works if it still behaves like the original system. VestraData preserves enough of the statistical shape and table relationships for testing, analytics, and model development without handing teams live customer records.

Preserves column distributions and cross-table relationships.
Maintains foreign key integrity so the data still joins correctly.
Exports to staging databases, object storage, and ML pipelines.
Supports scheduled refreshes instead of one-off manual copies.
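A minimal sketch of the foreign-key-preserving idea, assuming a simple users/payments schema (the field names, key scheme, and resampling strategy are hypothetical): parent rows receive fresh synthetic keys, child rows only ever reference keys that exist in the parent, and values are resampled from the empirical distribution.

```python
import random

def synthesize(users, payments, seed=0):
    """FK-preserving sketch: remap parent keys, keep child linkage."""
    rng = random.Random(seed)
    # Fresh synthetic primary keys for the parent table.
    key_map = {u["id"]: 1000 + i for i, u in enumerate(users)}
    synth_users = [{"id": key_map[u["id"]], "country": u["country"]}
                   for u in users]
    # Children keep their original parent linkage (remapped), so the
    # payments-per-user distribution survives and joins still work.
    amounts = [p["amount"] for p in payments]
    synth_payments = [
        {"id": 5000 + i,
         "user_id": key_map[p["user_id"]],   # referential integrity
         "amount": rng.choice(amounts)}      # empirical resampling
        for i, p in enumerate(payments)
    ]
    return synth_users, synth_payments
```

The design choice worth noting: remapping real keys (rather than generating unrelated ones) is what keeps cross-table cardinalities intact without retaining any real identifier values.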
03
Data airlock

Documents leave the organisation clean, or they do not leave.

Connect VestraData to a document store and it prepares governed copies in the background. When someone needs to share a file with a partner or upload it to an external tool, the safer version is already available.

Monitors SharePoint, Google Drive, Dropbox, and S3 for new or changed files.
Creates governed copies indexed by file hash for fast retrieval.
Keeps mapping and review context for audit purposes.
Reduces manual review work at the point of sharing.
04
VestraShield

Personal data should be reviewed before it reaches any external AI tool.

Teams are already using ChatGPT, Claude, and Copilot. VestraShield adds a control point at the browser so prompts and uploads can be warned on, blocked, or transformed before they leave the user's machine.

Applies policy to prompts and uploaded documents.
Can warn, block, or transform content before submission.
Runs with the same detection logic as the rest of the platform.
Keeps AI controls aligned with existing privacy policy instead of creating a parallel process.
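A minimal sketch of the warn/block/transform decision, with assumed detection patterns and an assumed per-category policy (these patterns and thresholds are illustrative, not VestraShield's detection logic):

```python
import re

PATTERNS = {
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "PERSONAL_EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

# Assumed policy; precedence when multiple findings: block > transform > warn.
POLICY = {"CREDIT_CARD": "block", "PERSONAL_EMAIL": "transform"}
SEVERITY = {"block": 3, "transform": 2, "warn": 1, "allow": 0}

def apply_policy(prompt: str) -> tuple[str, str]:
    """Return (action, outbound_text) for a prompt before submission."""
    findings = [k for k, p in PATTERNS.items() if p.search(prompt)]
    action = max((POLICY.get(f, "warn") for f in findings),
                 key=SEVERITY.get, default="allow")
    if action == "block":
        return "block", ""  # nothing leaves the user's machine
    if action == "transform":
        out = prompt
        for f in findings:
            out = PATTERNS[f].sub(f"[{f}]", out)  # redact in place
        return "transform", out
    return action, prompt
```

The point of the sketch is the control flow: the decision happens before submission, so the external tool only ever receives the transformed text, or nothing at all.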
The workflow

From connection to audit report.

The goal is not just to detect sensitive data. It is to move from first connection to a defensible outcome without creating a second privacy problem along the way.

01

Connect

Add a source and define the first review boundary. Credentials are encrypted and isolated before any scan runs.

encrypted_credential
tenant_scoped
source_scoped
02

Discover

Run an initial pass to map schemas, flag likely sensitive fields, and show where deeper review is worth the time.

adaptive_sample
schema_map
no_full_scan
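One way the no-full-scan sampling could work, as a sketch (the pass size, confidence target, and stopping rule are assumptions for illustration): sample in passes and stop as soon as the PII hit-rate estimate is stable, rather than reading the whole column.

```python
import math
import random

def estimate_hit_rate(column, looks_like_pii, rng=None,
                      pass_size=200, max_passes=10, target_ci=0.05):
    """Grow the sample pass by pass; stop early once the estimate
    is tight enough, so the full table is never read."""
    rng = rng or random.Random(0)
    hits, n = 0, 0
    for _ in range(max_passes):
        for value in rng.sample(column, min(pass_size, len(column))):
            n += 1
            hits += looks_like_pii(value)
        p = hits / n
        # 95% normal-approximation confidence half-width.
        half_width = 1.96 * math.sqrt(p * (1 - p) / n)
        if half_width < target_ci:
            break
    return p, n
```

On a uniform column (all hits or all misses) this stops after a single pass; the multi-pass budget is only spent where the signal is genuinely mixed.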
03

Review

Review findings with field evidence, confidence, and risk so teams can decide what needs action.

field_evidence
risk_ranked
review_queue
04

Act

Apply the right control: de-identify, generate a safe export, or prepare a governed copy for downstream use.

mask
synthetic_export
governed_copy
05

Prove

Keep the decision trail so teams can show what changed and which policy or reviewer approved the outcome.

audit_logged
policy_linked
report_ready
Deployment

Your environment. Not ours.

All three deployment models share one property: the privacy boundary stays with you. That matters more to serious buyers than any headline claim about AI.

On-premises and air-gap

For organisations that cannot allow operational data egress

VestraData runs inside your data centre with no internet access at runtime. The ML models are bundled in the install package, licensing uses an offline key, and LDAP and SAML authentication are included.

Docker Compose · Helm / Kubernetes · LDAP · SAML · Offline license · Bundled ML models · No phone-home
Cloud appliance

Deploy into your own cloud account

Available on AWS, Azure, and GCP Marketplace. You control networking, IAM, and storage. VestraData does not take over the environment boundary.

AWS Marketplace · Azure Marketplace · GCP Marketplace · Terraform · CloudFormation
SDK

Embed controls inside your own workflow

Use the generated REST client or the embedded ML stack in-process when privacy controls need to sit inside an existing pipeline.

Python · Node.js · Java · .NET · OpenAPI
Who this fits

The teams where getting this wrong has real consequences.

Not every company needs this level of control. These usually do.

Healthcare

NHS and private health

Health systems where patient data is spread across clinical platforms, exports, and shared files, and audit evidence has to be available on demand.

Air-gap deployment
NHS DSPT and HIPAA
No cloud egress
Financial services

Banks and investment firms

PCI-DSS scope reduction, safer non-production data, and tighter controls around how sensitive records move into engineering and analytics workflows.

PCI-DSS scope reduction
LDAP and SAML auth
Synthetic data for ML
Legal

Law firms and accountancies

Client documents contain information that cannot reach external AI tools unchecked. Review and control need to happen before those files are shared or uploaded.

Document airlock
VestraShield
Client privilege audit trail
Data platforms

Data marketplaces

Review inbound datasets for personal data before publishing or downstream reuse, with tenant isolation between each customer's data.

SDK integration
Multi-tenant isolation
Event-driven scanning
Engineering

Dev and test teams

Realistic test data that behaves like production, without handing developers live personal data or relying on brittle one-off masking scripts.

FK-preserving subsets
Direct staging DB import
Scheduled refresh
ML engineering

Model and data science teams

Training and evaluation datasets that are statistically faithful to production without normalising access to live personal data.

Differential privacy
S3 and Parquet export
Distribution preserved
The demo

Here is exactly what happens when you book a session.

Not a slide deck. Not a sandboxed environment with fabricated data. We connect to something real in your organisation and you see actual findings.

Minutes 0-5
We start with one real source
Usually a read-only database credential, file store, or bucket that is representative enough to answer whether the product fits your environment.
Minutes 5-20
We run the discovery scan
You watch it happen live. The schema map builds in real time. Findings appear as the scan progresses. No prepared screenshots.
Minutes 20-35
We walk through the findings
What was found, where, the risk level, and the confidence score. We explain any finding you want to understand in more depth.
Minutes 35-45
You pressure-test the fit
Deployment, controls, source coverage, air-gap requirements, and what a narrow pilot in your environment would actually involve.

After the session, you should know whether the deployment model works, whether the first workflow is meaningful, and whether a pilot is justified.

Book a session

See what is in your data.

The first session should answer three things quickly: can this run inside your boundary, does it fit the first workflow you actually care about, and is a pilot in your environment worth doing. If those answers are not clear by the end, it was not a useful session.

Start with architecture, controls, and operating model. The sales layer can wait.
What the session should answer
Can this run inside your boundary without weakening your deployment model?
Does it fit the first workflow you actually care about, not a generic demo path?
What would a narrow thirty-day pilot in your environment actually involve?
What should you expect after the session: a findings report, a controls review, and a next-step decision?
Design partner programme

Working with a small number of organisations before the public launch.

We are working closely with early design partners in healthcare, financial services, legal, and data platforms. The point is not early access for its own sake. It is to shape the product against real operating constraints before broader launch.

If your organisation has a genuine privacy or compliance problem and wants direct input into how the product evolves, this is the right way to engage before launch.

Design partners get direct engineering access, early capabilities, and a shorter loop from feedback to changes.

Open slots by sector

Healthcare / NHS
Air-gapped clinical network or NHS trust with DSPT requirements.
Open now
Financial services
Bank or investment firm with PCI-DSS or synthetic data requirements.
Open now
Legal
Law firm or accountancy sharing documents with external AI tools.
Next cohort
Data platforms
Platform ingesting third-party datasets that require PII scanning.
Limited