Methodology
How we turn raw records into usable property intelligence.
Turkish real estate data is fragmented across TKGM, municipal zoning authorities, TÜİK, BDDK, MTA and EFEHR. We normalize, reconcile and version it through a six-stage pipeline that records coverage and missingness explicitly at every stage.
The pipeline
Six stages, zero shortcuts.
Raw source records, files, geometry and period data.
Immutable evidence, manifest and source trace.
Type, geometry, identity and date standards.
Decision-ready domain outputs.
Property, parcel, district and market context.
Report, screening result or JSON payload.
Stage detail
What happens at each layer.
01
Raw Ingestion
Source documents are ingested without modification. TKGM cadastral records, municipal zoning PDFs, statistical releases, geological rasters, and project manifests are all stored in their original form with an ingestion timestamp and source identifier.
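As a rough illustration of storing a document unmodified alongside an ingestion timestamp and source identifier, here is a minimal sketch. The `RawRecord` type, the `ingest` function and the SHA-256 content hash are assumptions made for this example, not a description of the production schema.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class RawRecord:
    source_id: str         # hypothetical identifier, e.g. "tkgm"
    ingested_at: datetime  # ingestion timestamp (UTC)
    sha256: str            # content hash (an assumption of this sketch)
    payload: bytes         # original bytes, stored as received


def ingest(source_id: str, payload: bytes,
           now: Optional[datetime] = None) -> RawRecord:
    """Wrap raw bytes with provenance metadata without altering them."""
    timestamp = now or datetime.now(timezone.utc)
    digest = hashlib.sha256(payload).hexdigest()
    return RawRecord(source_id, timestamp, digest, payload)
```

Hashing the payload at ingestion time is one way to let later stages confirm that the stored bytes are still the ones that were received.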
02
Bronze Evidence
Each raw record is parsed into structured evidence fragments. Field types are validated, encoding issues are fixed, and each fragment is tagged with its source family, record type, and ingestion batch.
03
Silver Normalization
Evidence fragments are normalized to canonical field definitions. Address text is parsed and geocoded. Parcel IDs are resolved to canonical property identifiers. Zoning codes are mapped to a unified taxonomy. Units and date formats are standardized.
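To make the normalization step concrete, the sketch below standardizes date strings and maps zoning codes to a shared vocabulary. The taxonomy entries, the accepted date formats and the function names are hypothetical placeholders chosen for illustration.

```python
from datetime import datetime
from typing import Optional

# Hypothetical mapping from source zoning labels to a unified taxonomy.
ZONING_TAXONOMY = {
    "konut": "residential",
    "ticaret": "commercial",
    "sanayi": "industrial",
}

# Date layouts assumed to appear in source records (an assumption).
DATE_FORMATS = ("%d.%m.%Y", "%Y-%m-%d", "%d/%m/%Y")


def normalize_date(text: str) -> Optional[str]:
    """Return an ISO 8601 date string, or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_zoning(code: str) -> Optional[str]:
    """Map a source zoning label to the unified taxonomy, or None."""
    # Note: str.lower() is not locale-aware, so Turkish dotted and
    # dotless I would need dedicated handling in a real implementation.
    return ZONING_TAXONOMY.get(code.strip().lower())
```

Returning `None` for unparseable input, rather than a guessed value, keeps the gap visible so a later stage can mark the field explicitly.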
04
Gold Aggregation
Normalized fields from multiple source families are merged per canonical property. Conflicts are resolved by source priority and recency. Missing fields are explicitly marked: inferred when cross-source evidence permits, unknown otherwise.
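One way to picture conflict resolution by source priority and recency is the sketch below. It assumes priority is compared first and recency only breaks ties, which the text does not specify. The priority table and the `Candidate` type are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence

# Hypothetical priority table: lower number wins.
SOURCE_PRIORITY = {"tkgm": 0, "municipality": 1, "listing": 2}


@dataclass(frozen=True)
class Candidate:
    source: str        # source family that produced the value
    value: str         # normalized field value
    observed_on: date  # date of the source observation


def resolve(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Pick one candidate: best source priority, then most recent."""
    if not candidates:
        return None  # caller marks the field as missing
    lowest = len(SOURCE_PRIORITY)  # unknown sources rank last
    return min(
        candidates,
        key=lambda c: (
            SOURCE_PRIORITY.get(c.source, lowest),
            -c.observed_on.toordinal(),  # newer dates sort first
        ),
    )
```

Because the sort key is fully determined by the inputs, the same set of candidates always yields the same winner.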
05
Intelligence Profile
The gold aggregate is assembled into a property intelligence profile: canonical identity, zoning and building context, risk signals, market activity, macro context, and a full provenance trace for every field.
06
Dossier / API Serving
Intelligence profiles are served as structured API payloads, human-readable reports, or batch exports. Every output carries the same Dossier Evidence Contract: source, freshness, confidence and coverage state per field.
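The per-field contract could be represented along the following lines. This is only a sketch: the class names, the JSON key names and the `observed` state are assumptions, while `inferred`, `unknown` and `unsupported` come from the states named on this page.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Dict, Optional


class CoverageState(Enum):
    OBSERVED = "observed"        # hypothetical name for a sourced value
    INFERRED = "inferred"
    UNKNOWN = "unknown"
    UNSUPPORTED = "unsupported"


@dataclass(frozen=True)
class FieldEvidence:
    value: Any                    # the field value, or None when missing
    source: Optional[str]         # source family that supplied the value
    as_of: Optional[str]          # freshness, as an ISO 8601 date string
    confidence: Optional[float]   # 0.0 to 1.0 (scale is an assumption)
    coverage: CoverageState


def to_payload(fields: Dict[str, FieldEvidence]) -> str:
    """Serialize fields to JSON with the same envelope for every field."""
    out = {}
    for name, evidence in fields.items():
        entry = asdict(evidence)
        entry["coverage"] = evidence.coverage.value  # enum to plain string
        out[name] = entry
    return json.dumps(out, sort_keys=True)
```

A missing field still appears in the payload with its coverage state set, so consumers can tell "not known" apart from "not supported" without checking for absent keys.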
Core principles
What we do not compromise on.
Explicit missingness
Every field that cannot be populated from direct source evidence carries an explicit state: unknown, inferred, or unsupported. We never interpolate to hide gaps.
Source provenance
Every field in every output traces back to a specific source, ingestion batch and timestamp. Nothing is produced without a traceable origin.
Determinism
The same property at the same pipeline version and the same data state always produces the same output. Our pipeline is not probabilistic at the record level.
Scoped coverage
We only claim coverage we can back with real data. Geographies and field types not yet supported are listed as unsupported, not silently omitted.
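The determinism principle above can be checked mechanically by fingerprinting an output together with the inputs that define it. The sketch below is one possible approach. The function name, the parameter names and the choice of SHA-256 over canonical JSON are assumptions for illustration.

```python
import hashlib
import json
from typing import Any, Dict


def output_fingerprint(property_id: str, pipeline_version: str,
                       data_state_id: str, profile: Dict[str, Any]) -> str:
    """Hash an output with its defining inputs in a canonical form."""
    body = json.dumps(
        {
            "property_id": property_id,
            "pipeline_version": pipeline_version,
            "data_state_id": data_state_id,  # hypothetical snapshot id
            "profile": profile,
        },
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # fixed whitespace
        ensure_ascii=False,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```

Two runs for the same property, pipeline version and data state should produce identical fingerprints. A mismatch would indicate that some non-deterministic step has entered the pipeline.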